archive: document output stability concerns

Message ID 20230203080629.31492-1-ray@ameretat.dev (mailing list archive)
State New, archived

Commit Message

Raymond E. Pasco Feb. 3, 2023, 8:06 a.m. UTC
In 4f4be00d302 (archive-tar: use internal gzip by default), the 'git
archive' command switched to using an internal compression filter
implemented with zlib rather than invoking a 'gzip' binary, for the
'.tar.gz' / '.tgz' output formats.

This change brought to light a common misconception that the output of
'git archive' is intended to be byte-for-byte stable. While this is not
the case, stable archive output is desirable for many applications; we
discuss concerns related to output stability and suggest ways in which
the user can control the compression used with the
"tar.<format>.command" configuration option.

Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
---
I think that something along these lines should be included in the
docs, while the behavior itself is kept as it is. If it is later
decided to stabilize the output, e.g. by vendoring a blessed zlib
version forever, the current state as of 2.38 is the best starting
point. Reverting a useful change because of external breakage that
already has a solution, while still not promising stability, seems
like a poor choice.
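
As a rough illustration of the workaround described in the patch, the
following sketch builds a throwaway repository, reads back the commit ID
stored in the `tar` output, and routes the `tar.gz` format through an
external gzip by way of `tar.tar.gz.command`. The repository contents,
identity settings, and file names are placeholders, and `gzip -cn` is
only one possible choice of command; the output is only as stable as
the gzip binary that ends up being invoked.

```shell
#!/bin/sh
set -eu

# Throwaway repository; every name and path below is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.invalid"
echo "hello" >file.txt
git add file.txt
git commit -q -m "initial commit"

# The tar output for a commit carries the commit ID in a pax header;
# git get-tar-commit-id reads it back from the archive stream.
embedded_id=$(git archive --format=tar HEAD | git get-tar-commit-id)
echo "commit ID embedded in the archive: $embedded_id"

# Replace the internal zlib-based filter for tar.gz with an external
# command.  -c writes to stdout; -n keeps the original file name and
# timestamp out of the gzip header.
git config tar.tar.gz.command 'gzip -cn'

# Two runs on the same commit with the same gzip binary.
git archive --format=tar.gz -o first.tar.gz HEAD
git archive --format=tar.gz -o second.tar.gz HEAD

# cmp exits non-zero (and set -e aborts the script) if the files differ.
cmp first.tar.gz second.tar.gz
```

Byte-for-byte agreement across machines would additionally require the
same gzip binary on each of them, which this sketch does not attempt to
arrange.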

 Documentation/git-archive.txt | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

Patch

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index 60c040988b..77acdacdf8 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -178,6 +178,41 @@  appropriate export-ignore in its `.gitattributes`), adjust the checked out
 option.  Alternatively you can keep necessary attributes that should apply
 while archiving any tree in your `$GIT_DIR/info/attributes` file.
 
+[[STABILITY]]
+STABILITY
+---------
+
+'git archive' does not guarantee that precisely identical archive files
+will be produced for invocations on the same commit or tree.
+
+'git archive' uses an internal implementation of `tar` archiving
+for the `tar` format; when given a commit, it records the commit ID
+in a global extended pax header.  For the `tgz` and `tar.gz` formats,
+the `tar` output is passed through an internal compression filter,
+which 'git archive' implements using the zlib it is linked against.
+
+If the commit ID of the "same" commit is different, for instance in the
+case of an object format migration from SHA-1 to SHA-256, the `tar`
+archive will necessarily differ due to including a different ID.
+
+The output of the compression filter is less stable than the output
+of the `tar` implementation, because different versions of zlib may
+produce different compressed bytes for the same input. The internal
+filter can be replaced with a command of the user's choosing through
+the `tar.<format>.command` configuration option; for instance, a
+particular gzip binary provided by the user could be specified here
+for consistent output.
+
+The `tar` format used by 'git archive' is unlikely to change
+frequently, but is not guaranteed to be completely stable; its output
+will remain identical at least within the same Git version.
+
+The `zip` format has similar concerns to the `tar.gz` and `tgz`
+formats; ZIP archiving is implemented internally, but the Deflate
+compression used relies on the linked zlib. However, because archiving
+and compression are combined into a single operation, there is no
+user-specifiable filter command for the `zip` format.
+
 EXAMPLES
 --------
 `git archive --format=tar --prefix=junk/ HEAD | (cd /var/tmp/ && tar xf -)`::