diff mbox series

[6/9] archive: use "gzip -cn" for stability, not "git archive gzip"

Message ID patch-6.9-34c7ce73099-20230202T093212Z-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series git archive: use gzip again by default, document output stabilty | expand

Commit Message

Ævar Arnfjörð Bjarmason Feb. 2, 2023, 9:32 a.m. UTC
This reverts and amends [1] so that we don't use "git archive gzip" by
default, but only fall back on it when we cannot invoke "gzip".

As noted in the discussion at [2] that commit first released with
v2.38.0 caused widespread breakage in the wild: Hosting sites like
GitHub tend to offer a feature to download tagged releases as
archives, which are generated by some variant of "git archive
--format=tgz".

Downstream distributors then tend to (re-)download those archives
as-is, hardcoding their known hash their packaging systems. See [3],
[4] etc. for reports of those systems breaking in conjunction with
[1].

The reason for "why" is entirely missing from the commit message for
[1], but as seen in the question about that in [5] and reply at [6] at
the time it was to "avoid a run[time] dependency; the build/test
dependency remains.".

It's not immediately apparent what the second part of that is
referring to, as [1] also removed the "GZIP" prerequisite from some
tests. The answer is that we still have other tests that need "GZIP",
but those are invoking "gzip(1)" explicitly.

In any case, whatever promises we make in the future about the
stability and non-stability of "git archive" output (or the derived
compressed artifact), this amount of fallout isn't worth it to get to
the stated goal in [1].

Let's instead default to "gzip -cn" again, but if we can't find it
fall back on "git archive gzip". Note that we'll only fallback if that
"gzip -cn" is ours, not if it comes from the user's own
"tar.<format>.command" configuration.

If we do need the fallback we'll warn about it. No such warning will
be emitted if the user has explicitly asked for "git archive gzip".

1. 4f4be00d302 (archive-tar: use internal gzip by default, 2022-06-15)
2. https://lore.kernel.org/git/a812a664-67ea-c0ba-599f-cb79e2d96694@gmail.com/
3. https://github.com/Homebrew/homebrew-core/issues/121877
4. https://github.com/bazel-contrib/SIG-rules-authors/issues/11
5. https://lore.kernel.org/git/220615.86wndhwt9a.gmgdl@evledraar.gmail.com/
6. https://lore.kernel.org/git/3ed80afd-34b3-afd8-5ffb-0187a4475ee1@web.de/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config/tar.txt |  8 ++++++--
 archive-tar.c                | 20 +++++++++++++++-----
 2 files changed, 21 insertions(+), 7 deletions(-)
diff mbox series

Patch

diff --git a/Documentation/config/tar.txt b/Documentation/config/tar.txt
index 5456fc617a2..37f24baa73a 100644
--- a/Documentation/config/tar.txt
+++ b/Documentation/config/tar.txt
@@ -16,8 +16,12 @@  tar.<format>.command::
 	to the command (e.g., `-9`).
 +
 The `tar.gz` and `tgz` formats are defined automatically and use the
-magic command `git archive gzip` by default, which invokes an internal
-implementation of gzip.
+command `gzip -cn` by default. An internal gzip implementation can be
+used by specifying the value `git archive gzip`.
++
+If 'gzip -cn' cannot be executed we'll fall back on `git archive gzip`
+with a warning, if you don't have a gzip(1) and would like to use the
+internal `git archive gzip` without warning, configure it explicitly.
 +
 The automatically defined commands do not invoke the shell, avoiding
 the minor overhead of an extra sh(1) process.
diff --git a/archive-tar.c b/archive-tar.c
index dfc133deac7..26efb911ebc 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -464,6 +464,7 @@  static void tgz_write_block(const void *data)
 }
 
 static const char internal_gzip_command[] = "git archive gzip";
+static const char gzip_cn_command[] = "gzip -cn";
 
 static int gzip_internally(const struct archiver *ar,
 			   struct archiver_args *args)
@@ -494,12 +495,15 @@  static int write_tar_filter_archive(const struct archiver *ar,
 {
 	struct strbuf cmd = STRBUF_INIT;
 	struct child_process filter = CHILD_PROCESS_INIT;
+	int filter_is_gzip_cn = 0;
 	int r;
 
 	if (!ar->filter_command)
 		BUG("tar-filter archiver called with no filter defined");
 
-	if (!strcmp(ar->filter_command, internal_gzip_command))
+	if (!strcmp(ar->filter_command, gzip_cn_command))
+		filter_is_gzip_cn = 1;
+	else if (!strcmp(ar->filter_command, internal_gzip_command))
 		return gzip_internally(ar, args);
 
 	strbuf_addstr(&cmd, ar->filter_command);
@@ -515,8 +519,14 @@  static int write_tar_filter_archive(const struct archiver *ar,
 	filter.in = -1;
 	filter.silent_exec_failure = 1;
 
-	if (start_command(&filter) < 0)
-		die_errno(_("unable to start '%s' filter"), cmd.buf);
+	if (start_command(&filter) < 0) {
+		if (!filter_is_gzip_cn)
+			die_errno(_("unable to start '%s' filter"), cmd.buf);
+
+		warning_errno(_("unable to start '%s' filter, falling back to '%s'"),
+			      cmd.buf, internal_gzip_command);
+		return gzip_internally(ar, args);
+	}
 	close(1);
 	if (dup2(filter.in, 1) < 0)
 		die_errno(_("unable to redirect descriptor"));
@@ -544,9 +554,9 @@  void init_tar_archiver(void)
 	int configured = 1;
 	register_archiver(&tar_archiver);
 
-	tar_filter_config("tar.tgz.command", internal_gzip_command, NULL);
+	tar_filter_config("tar.tgz.command", gzip_cn_command, NULL);
 	tar_filter_config("tar.tgz.remote", "true", NULL);
-	tar_filter_config("tar.tar.gz.command", internal_gzip_command, NULL);
+	tar_filter_config("tar.tar.gz.command", gzip_cn_command, NULL);
 	tar_filter_config("tar.tar.gz.remote", "true", NULL);
 	git_config(git_tar_config, &configured);
 	for (i = 0; i < nr_tar_filters; i++) {