Empty trees in Git
After using Git for a little while, there is a reasonable chance you will run across the following hash:
4b825dc642cb6eb9a060e54bf8d69288fbee4904
So where does it come from, and why should you care?
Where does the hash come from?
Every Git repository, even an empty one, can resolve this hash. This can be verified with git show:
$ git show 4b825dc642cb6eb9a060e54bf8d69288fbee4904
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
So where does this hash come from? Internally, Git keeps track of a few different object types. The most fundamental object is a blob, which represents file content. Blobs are referenced by tree objects, which represent directories, and commit objects in turn reference tree objects. The diagram below gives a quick overview of this:
The diagram above is taken from Pro Git and licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported license.
Note: the Git Internals chapter from Pro Git has more info on objects if you're still curious.
So how does the hash fit in? It is the hash of an empty tree. This can be verified by creating an object hash for either /dev/null or an empty string:
$ git hash-object -t tree /dev/null
4b825dc642cb6eb9a060e54bf8d69288fbee4904
$ echo -n '' | git hash-object -t tree --stdin
4b825dc642cb6eb9a060e54bf8d69288fbee4904
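The same value can also be reproduced without Git. The sketch below assumes Git's default SHA-1 object format, in which an object ID is the SHA-1 of a header (the object type, a space, the content length in bytes, and a NUL byte) followed by the content. An empty tree has no content, so only the header is hashed. The function name empty_tree_sha1 is made up for illustration, and sha1sum (or shasum as a fallback) is assumed to be installed:

```shell
# Hash the bytes of an empty tree object by hand: the header "tree 0"
# followed by a NUL byte, with no tree entries after it.
# Assumption: the repository uses SHA-1 object names (Git's default).
empty_tree_sha1() {
    if command -v sha1sum >/dev/null 2>&1; then
        printf 'tree 0\0' | sha1sum | cut -d ' ' -f 1
    else
        # Fallback for systems that ship shasum instead of sha1sum.
        printf 'tree 0\0' | shasum -a 1 | cut -d ' ' -f 1
    fi
}

empty_tree_sha1
```

This should print the same 40-character hash as the git hash-object commands above, which is why the value is identical in every SHA-1 repository.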
Using the hash
The empty tree hash is often used with git diff. For example, if you wanted to check for whitespace errors in a directory, you could use the --check option and compare HEAD against the empty tree:
$ git diff $(git hash-object -t tree /dev/null) HEAD --check -- po
po/ca.po:7: trailing whitespace.
+# Terminologia i criteris utilitzats
po/ru.po:4: trailing whitespace.
+#
The empty tree hash is also very useful when writing Git hooks. A fairly common pattern is to validate new commits before accepting them, with code similar to the following:
for changed_file in $(git diff --cached --name-only --diff-filter=ACM HEAD)
do
    if ! validate_file "$changed_file"; then
        echo "Aborting commit"
        exit 1
    fi
done
This works fine if there are previous commits; however, the HEAD reference will not exist if no commits have been made. To get around this, the empty tree hash can be used when checking the initial commit:
if git rev-parse --verify -q HEAD > /dev/null; then
    against=HEAD
else
    # Initial commit: diff against an empty tree object
    against="$(git hash-object -t tree /dev/null)"
fi
for changed_file in $(git diff --cached --name-only --diff-filter=ACM "$against")
do
    if ! validate_file "$changed_file"; then
        echo "Aborting commit"
        exit 1
    fi
done
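To see the fallback in action, the following sketch sets up a throwaway repository with no commits and lists the staged files by diffing the index against the empty tree. The temporary directory and the file name example.txt are placeholders chosen for illustration, and git and mktemp are assumed to be available:

```shell
# Create a scratch repository that has staged content but no commits yet.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
git init -q .
echo 'hello' > example.txt    # hypothetical file, used only for this demo
git add example.txt

# HEAD does not resolve in a repository without commits, so the else
# branch is taken and the empty tree is used as the comparison base.
if git rev-parse --verify -q HEAD > /dev/null; then
    against=HEAD
else
    against=$(git hash-object -t tree /dev/null)
fi

# Every staged file shows up as added relative to the empty tree.
staged=$(git diff --cached --name-only --diff-filter=ACM "$against")
echo "$staged"
```

Because nothing exists in the empty tree, the staged file is reported as an addition, so a hook would validate it on the very first commit as well.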