Empty trees in Git
After using Git for a little while, there is a reasonable chance you will run across the following hash:
4b825dc642cb6eb9a060e54bf8d69288fbee4904
So where does it come from, and why should you care?
Where does the hash come from?
Every Git repository, even an empty one, recognizes this hash. This can be verified with git show:
$ git show 4b825dc642cb6eb9a060e54bf8d69288fbee4904
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
So where does this hash come from? Internally, Git keeps track of a few different object types. The most fundamental is the blob, which represents file content. Blobs are referenced by tree objects, which represent directories, and commit objects in turn reference tree objects. The diagram below gives a quick overview of this:
The diagram above is taken from Pro Git and licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported license.
Note: the Git Internals chapter from Pro Git has more info on objects if you're still curious.
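To make the object chain concrete, here is a minimal sketch that builds a throwaway repository and asks Git for the type of each object. The file name greeting.txt and the Example identity are assumptions made up for this illustration; git cat-file is the standard plumbing command for inspecting objects.

```shell
#!/bin/sh
# Sketch: follow the commit -> tree -> blob chain in a throwaway repository.
# The file name and commit identity below are illustrative assumptions.
repo=$(mktemp -d)
git -C "$repo" init -q
echo 'hello' > "$repo/greeting.txt"
git -C "$repo" add greeting.txt
git -C "$repo" -c user.name='Example' -c user.email='example@example.com' \
    commit -q -m 'Add greeting'

# Ask Git what kind of object each name resolves to.
commit_type=$(git -C "$repo" cat-file -t HEAD)
tree_type=$(git -C "$repo" cat-file -t 'HEAD^{tree}')
blob_type=$(git -C "$repo" cat-file -t HEAD:greeting.txt)
echo "$commit_type -> $tree_type -> $blob_type"

# Pretty-print the commit (which names its tree) and the tree (which names the blob).
git -C "$repo" cat-file -p HEAD
git -C "$repo" cat-file -p 'HEAD^{tree}'
```

The commit's first line is "tree <hash>", and the tree listing holds one entry pointing at the blob for greeting.txt.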
So how does the hash fit in? It is the hash of an empty tree.
This can be verified by creating an object hash for either /dev/null or an empty string:
$ git hash-object -t tree /dev/null
4b825dc642cb6eb9a060e54bf8d69288fbee4904
$ echo -n '' | git hash-object -t tree --stdin
4b825dc642cb6eb9a060e54bf8d69288fbee4904
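The same value can be reproduced without Git at all, which shows why the hash is identical everywhere. Git hashes a small header ("<type> <size>" followed by a NUL byte) plus the object content, and an empty tree has no content. The sketch below assumes a repository using the default SHA-1 object format and a sha1sum binary (GNU coreutils; other systems may ship shasum instead).

```shell
#!/bin/sh
# Sketch: hash the header Git would write for an empty tree.
# "tree" is the object type, "0" is the content length, \000 is the NUL separator.
computed=$(printf 'tree 0\000' | sha1sum | cut -d ' ' -f 1)
echo "$computed"
# prints 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Repositories created with the SHA-256 object format use a different hash function, so their empty tree ID differs. That is one reason to compute it with git hash-object rather than hard-coding it.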
Using the hash
The empty tree hash is often used with git diff. For example, if you wanted to check for whitespace errors in a directory, you could use the --check option and compare HEAD against the empty tree:
$ git diff $(git hash-object -t tree /dev/null) HEAD --check -- po
po/ca.po:7: trailing whitespace.
+# Terminologia i criteris utilitzats
po/ru.po:4: trailing whitespace.
+#
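For readers who want to try this without a real project at hand, the following sketch reproduces the idea in a throwaway repository. The file notes.txt and its contents are invented for the example. Diffing from the empty tree makes every tracked file appear as newly added, which is why --check ends up inspecting every line.

```shell
#!/bin/sh
# Sketch: diff HEAD against the empty tree in a throwaway repository.
# notes.txt and the commit identity are illustrative assumptions.
repo=$(mktemp -d)
git -C "$repo" init -q
printf 'clean line\ntrailing space \n' > "$repo/notes.txt"
git -C "$repo" add notes.txt
git -C "$repo" -c user.name='Example' -c user.email='example@example.com' \
    commit -q -m 'Add notes'

empty_tree=$(git -C "$repo" hash-object -t tree /dev/null)

# Relative to the empty tree, every tracked file shows up in the diff.
all_files=$(git -C "$repo" diff --name-only "$empty_tree" HEAD)

# git diff --check exits non-zero when it finds whitespace problems.
if git -C "$repo" diff --check "$empty_tree" HEAD; then
    check_status=clean
else
    check_status=dirty
fi
echo "$all_files: $check_status"
```

Because --check signals problems through its exit status, the same comparison can gate a script or a CI step.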
The empty tree hash is also very useful when writing Git hooks. A fairly common pattern is to validate new commits before accepting them, with code similar to the following:
for changed_file in $(git diff --cached --name-only --diff-filter=ACM HEAD)
do
    if ! validate_file "$changed_file"; then
        echo "Aborting commit"
        exit 1
    fi
done
This works fine if there are previous commits. However, the HEAD reference will not exist if no commits have been made. To get around this, the empty tree hash can be used when checking the initial commit:
if git rev-parse --verify -q HEAD > /dev/null; then
    against=HEAD
else
    # Initial commit: diff against an empty tree object
    against="$(git hash-object -t tree /dev/null)"
fi

for changed_file in $(git diff --cached --name-only --diff-filter=ACM "$against")
do
    if ! validate_file "$changed_file"; then
        echo "Aborting commit"
        exit 1
    fi
done
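The fallback can be exercised end to end in a repository that has no commits yet. In the sketch below, validate_file is a hypothetical stand-in that rejects files containing a tab character, and the file names are made up. A real hook would substitute its own check and would likely inspect the staged content rather than the working-tree copy.

```shell
#!/bin/sh
# Sketch: the initial-commit fallback, run against a repository with no commits.

# Hypothetical validator (an assumption for this demo): fail if the file contains a tab.
validate_file() {
    ! grep -q "$(printf '\t')" "$1"
}

repo=$(mktemp -d)
git -C "$repo" init -q
printf 'no tabs here\n' > "$repo/good.txt"
printf 'has\ta tab\n' > "$repo/bad.txt"
git -C "$repo" add good.txt bad.txt

# HEAD does not resolve yet, so the else branch is taken.
if git -C "$repo" rev-parse --verify -q HEAD > /dev/null; then
    against=HEAD
else
    against=$(git -C "$repo" hash-object -t tree /dev/null)
fi

# Collect failures instead of exiting, so the result can be inspected.
rejected=''
for changed_file in $(git -C "$repo" diff --cached --name-only --diff-filter=ACM "$against")
do
    if ! validate_file "$repo/$changed_file"; then
        rejected="$rejected $changed_file"
    fi
done
echo "diffed against: $against"
echo "rejected:$rejected"
```

As with the loop in the text, the word splitting in for ... in $(...) assumes file names without spaces or newlines.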