Short answer: No, Git always records the entire file.
Longer answer: Okay, that's not quite true. Logically, Git always records the entire file. In the storage backend, however, Git performs delta compression across all files from all revisions when it packs objects, so it can even take advantage of similar content between different files and across the entire history of all branches, not just the parent commit. And since the network protocol and the storage backend share the same format ("pack files"), you get the same efficiency for push and fetch.
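To make the idea of delta compression concrete, here is a minimal Python sketch. It is not Git's actual delta algorithm or pack format. It only illustrates the general principle of describing one version as "copy these ranges from a base, insert these new bytes". The names make_delta, apply_delta and literal_bytes are made up for this illustration, and difflib stands in for whatever matching strategy a real implementation uses.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list:
    """Describe target as copy-from-base and insert-literal instructions."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))      # reuse bytes from base
        elif j2 > j1:
            ops.append(("insert", target[j1:j2]))  # bytes that are new in target
        # a pure "delete" needs no instruction: those base bytes are never copied
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target from the base plus the delta instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


def literal_bytes(ops: list) -> int:
    """Count the bytes that had to be stored verbatim in the delta."""
    return sum(len(op[1]) for op in ops if op[0] == "insert")


# Two revisions of a file that differ in a single line (made-up content).
v1 = b"".join(b"line %d of the file\n" % i for i in range(50))
v2 = v1.replace(b"line 25 of", b"LINE 25 OF")

delta = make_delta(v1, v2)
assert apply_delta(v1, delta) == v2  # the full file is always recoverable
print(f"target: {len(v2)} bytes, literal bytes in delta: {literal_bytes(delta)}")
```

The point of the sketch is that the complete second revision can be reconstructed at any time, even though only a handful of its bytes are stored literally. That is why "logically the entire file" and "physically a small delta" are not in conflict.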
However, it is important to remember that this is an internal implementation detail of the storage backend. It is not a part of the object model. The object model is that each commit contains the entire tree.
This is Git's object model:
blob: a bytestream. Basically a file, but only its contents; it doesn't have a name. In this way, Git works like a Unix filesystem: files don't have names; rather, directories associate names with files.
tree: a flat(!!!) list of (mode, name, {tree|blob}) triples. This is the equivalent of a Unix directory. It associates names and modes (mainly executable or not) with blobs or trees, i.e. trees can be nested recursively.
commit: a pointer to a tree and pointers to zero, one, or many parent commits. Also contains two identity strings (author and committer), each with its own timestamp, and most importantly, the commit message.
(local tag, called a "lightweight tag" in Git's documentation): technically, not a Git object. Just a ref, i.e. a local file pointing to a commit.
annotated tag: contains a pointer to an object (usually a commit), a name, the tagger's identity and timestamp, and an annotation message.
signed tag: an annotated tag whose message has a cryptographic signature (e.g. GPG) appended to it. It is the same kind of tag object, not a separate object that wraps or duplicates an annotated tag.
note: a piece of text that can be attached to any Git object. This can be used to add arbitrary user-defined metadata to any Git object, e.g. a CI server could attach code coverage results to commits, a bug tracker could attach ticket links to the commits which fix a bug, a web server could attach MIME types to blobs, a release management system could attach go/no-go votes to annotated tags, …
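The object model above can be sketched as a toy content-addressed store in Python. This is only an illustration under stated assumptions: the serialization used for hashing here is invented and is not Git's real on-disk format, and the helper names put, blob, tree and commit are hypothetical. What it does show is that every commit points to a complete tree, while unchanged blobs are shared between commits.

```python
import hashlib

store = {}  # object ID -> (kind, payload); stands in for the object database


def put(kind: str, payload: str) -> str:
    """Store an object under a hash of its kind and content (toy format)."""
    oid = hashlib.sha1(f"{kind}\n{payload}".encode()).hexdigest()
    store[oid] = (kind, payload)
    return oid


def blob(content: str) -> str:
    # Only contents, no file name.
    return put("blob", content)


def tree(entries: dict) -> str:
    # entries maps name -> (mode, object ID); a single flat level.
    lines = [f"{mode} {name} {oid}" for name, (mode, oid) in sorted(entries.items())]
    return put("tree", "\n".join(lines))


def commit(tree_id: str, parents: list, message: str) -> str:
    header = [f"tree {tree_id}"] + [f"parent {p}" for p in parents]
    return put("commit", "\n".join(header) + "\n\n" + message)


# First commit: README plus src/main.py
readme = blob("Hello\n")
main_v1 = blob("print('v1')\n")
src_v1 = tree({"main.py": ("100644", main_v1)})
root_v1 = tree({"README": ("100644", readme), "src": ("40000", src_v1)})
c1 = commit(root_v1, [], "initial")

# Second commit: only src/main.py changes, README is untouched
main_v2 = blob("print('v2')\n")
src_v2 = tree({"main.py": ("100644", main_v2)})
root_v2 = tree({"README": ("100644", readme), "src": ("40000", src_v2)})
c2 = commit(root_v2, [c1], "update main.py")

# 2 commits + 4 trees + 3 blobs: the README blob is stored only once
print(len(store))  # prints 9
```

Both root trees list README, so each commit logically contains the whole project, but both entries refer to the same blob ID. Changing one file creates a new blob plus new trees along the path up to the root, and nothing else.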
Note that only blobs actually contain file data. The rest is just pointers. And blobs don't have names, which means that as long as a blob has the same content, it is the same blob, and thus only exists once in the object store. In fact, it even exists only once in the entire Git universe! For example, the FSF's GPL COPYING file, as long as you keep it unmodified, will be the exact same blob, even in totally unrelated repositories!
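You can check the "same content, same blob" claim yourself. The following Python sketch computes a blob ID the way Git does in a SHA-1 repository: the hash covers a short header ("blob", the content length, a NUL byte) followed by the raw content. The function name git_blob_id is made up for this example, and repositories using the SHA-256 object format would produce different IDs.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Blob ID as computed in a SHA-1 repository: sha1("blob <size>\\0" + content)."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Neither a file name nor a repository is an input, only the bytes themselves.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
print(git_blob_id(b"hello world\n"))
```

The result should match what git hash-object prints for a file with the same bytes. Because the name and the repository never enter the hash, an unmodified copy of a file gets the same ID wherever it is committed.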