[v2] archive: initialize archivers earlier

Message ID bc6f20274dfe11f1451745e0accb065544cc59ca.1540244445.git.steadmon@google.com
State Superseded

Commit Message

Josh Steadmon Oct. 22, 2018, 9:48 p.m. UTC
Initialize archivers as soon as possible when running git-archive and
git-upload-archive. Various non-obvious behavior depends on having the
archivers initialized, such as determining the desired archival format
from the provided filename.

Since 08716b3c11 ("archive: refactor file extension format-guessing",
2011-06-21), archive_format_from_filename() has used the registered
archivers to match filenames (provided via --output) to archival
formats. However, when git-archive is executed with --remote, format
detection happens before the archivers have been registered. This causes
archives from remotes to always be generated as TAR files, regardless of
the actual filename (unless an explicit --format is provided).

This patch fixes that behavior; archival format is determined properly
from the output filename, even when --remote is used.
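The ordering dependency can be modeled in a few lines of C. This is a hypothetical, simplified sketch, not git's code: the struct layout, the fixed-size table, the hard-coded format list, the suffix rule in match_extension(), and the helper format_for_output() are all assumptions made for illustration. Only the names register_archiver(), init_archivers(), and archive_format_from_filename() echo the functions in archive.c. It shows the point of the patch: filename-based detection can only match formats that are already registered, so asking too early silently falls back to tar.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for git's struct archiver. */
struct archiver {
	const char *name; /* format name, also used as the filename suffix */
};

#define MAX_ARCHIVERS 16 /* hypothetical fixed capacity */

static const struct archiver *archivers[MAX_ARCHIVERS];
static int nr_archivers;

static void register_archiver(const struct archiver *ar)
{
	assert(nr_archivers < MAX_ARCHIVERS);
	archivers[nr_archivers++] = ar;
}

/* Plays the role of init_archivers(): register every format up front. */
static void init_archivers(void)
{
	static const struct archiver tar = { "tar" };
	static const struct archiver zip = { "zip" };
	static const struct archiver tgz = { "tgz" };
	static const struct archiver tar_gz = { "tar.gz" };

	register_archiver(&tar);
	register_archiver(&zip);
	register_archiver(&tgz);
	register_archiver(&tar_gz);
}

/* Assumed rule: true if filename is "<non-empty prefix>.<ext>". */
static int match_extension(const char *filename, const char *ext)
{
	int prefixlen = (int)strlen(filename) - (int)strlen(ext);

	if (prefixlen < 2 || filename[prefixlen - 1] != '.')
		return 0;
	return !strcmp(filename + prefixlen, ext);
}

/* Only formats that are already registered can ever be matched. */
static const char *archive_format_from_filename(const char *filename)
{
	int i;

	for (i = 0; i < nr_archivers; i++)
		if (match_extension(filename, archivers[i]->name))
			return archivers[i]->name;
	return NULL;
}

/* Hypothetical helper: no match (or an empty registry) means tar. */
static const char *format_for_output(const char *filename)
{
	const char *format = archive_format_from_filename(filename);

	return format ? format : "tar";
}
```

Calling format_for_output("d5.tgz") before init_archivers() returns "tar" because the table is still empty; after init_archivers(), the same call returns "tgz". That is the --remote failure mode described above, in miniature.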

Signed-off-by: Josh Steadmon <steadmon@google.com>
Helped-by: Jeff King <peff@peff.net>
---
 archive.c                | 9 ++++++---
 archive.h                | 1 +
 builtin/archive.c        | 2 ++
 builtin/upload-archive.c | 1 +
 t/t5000-tar-tree.sh      | 6 ++++++
 5 files changed, 16 insertions(+), 3 deletions(-)

Comments

Jeff King Oct. 22, 2018, 10:35 p.m. UTC | #1
On Mon, Oct 22, 2018 at 02:48:11PM -0700, steadmon@google.com wrote:

> Initialize archivers as soon as possible when running git-archive and
> git-upload-archive. Various non-obvious behavior depends on having the
> archivers initialized, such as determining the desired archival format
> from the provided filename.
> 
> Since 08716b3c11 ("archive: refactor file extension format-guessing",
> 2011-06-21), archive_format_from_filename() has used the registered
> archivers to match filenames (provided via --output) to archival
> formats. However, when git-archive is executed with --remote, format
> detection happens before the archivers have been registered. This causes
> archives from remotes to always be generated as TAR files, regardless of
> the actual filename (unless an explicit --format is provided).
> 
> This patch fixes that behavior; archival format is determined properly
> from the output filename, even when --remote is used.
> 
> Signed-off-by: Josh Steadmon <steadmon@google.com>
> Helped-by: Jeff King <peff@peff.net>

Thanks, this looks good overall.

A few minor comments (that I'm not even sure are worth re-rolling for):

> diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
> index 25d9116356..3f35ebcfe8 100644
> --- a/builtin/upload-archive.c
> +++ b/builtin/upload-archive.c
> @@ -43,6 +43,7 @@ int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
>  	}
>  
>  	/* parse all options sent by the client */
> +	init_archivers();
>  	return write_archive(sent_argv.argc, sent_argv.argv, prefix,
>  			     the_repository, NULL, 1);
>  }

This seems to separate the comment from what it describes. Any reason
not to just init_archivers() closer to the top of the function here
(probably after the enter_repo() call)?

> diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
> index 2a97b27b0a..3e95fdf660 100755
> --- a/t/t5000-tar-tree.sh
> +++ b/t/t5000-tar-tree.sh
> @@ -206,6 +206,12 @@ test_expect_success 'git archive with --output, override inferred format' '
>  	test_cmp_bin b.tar d4.zip
>  '
>  
> +test_expect_success GZIP 'git archive with --output and --remote uses expected format' '
> +	git archive --output=d5.tgz --remote=. HEAD &&
> +	gzip -d -c < d5.tgz > d5.tar &&
> +	test_cmp_bin b.tar d5.tar
> +'

This nicely tests the more-interesting tgz case. But unfortunately it
won't run on machines without the GZIP prerequisite. I'd think that
would really be _most_ machines, but is it worth having a separate zip
test to cover machines without gzip? I guess that just creates the
opposite problem: not everybody has ZIP.

-Peff

Josh Steadmon Oct. 22, 2018, 11:51 p.m. UTC | #2
On 2018.10.22 18:35, Jeff King wrote:
> On Mon, Oct 22, 2018 at 02:48:11PM -0700, steadmon@google.com wrote:
> 
> > Initialize archivers as soon as possible when running git-archive and
> > git-upload-archive. Various non-obvious behavior depends on having the
> > archivers initialized, such as determining the desired archival format
> > from the provided filename.
> > 
> > Since 08716b3c11 ("archive: refactor file extension format-guessing",
> > 2011-06-21), archive_format_from_filename() has used the registered
> > archivers to match filenames (provided via --output) to archival
> > formats. However, when git-archive is executed with --remote, format
> > detection happens before the archivers have been registered. This causes
> > archives from remotes to always be generated as TAR files, regardless of
> > the actual filename (unless an explicit --format is provided).
> > 
> > This patch fixes that behavior; archival format is determined properly
> > from the output filename, even when --remote is used.
> > 
> > Signed-off-by: Josh Steadmon <steadmon@google.com>
> > Helped-by: Jeff King <peff@peff.net>
> 
> Thanks, this looks good overall.
> 
> A few minor comments (that I'm not even sure are worth re-rolling for):
> 
> > diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
> > index 25d9116356..3f35ebcfe8 100644
> > --- a/builtin/upload-archive.c
> > +++ b/builtin/upload-archive.c
> > @@ -43,6 +43,7 @@ int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
> >  	}
> >  
> >  	/* parse all options sent by the client */
> > +	init_archivers();
> >  	return write_archive(sent_argv.argc, sent_argv.argv, prefix,
> >  			     the_repository, NULL, 1);
> >  }
> 
> This seems to separate the comment from what it describes. Any reason
> not to just init_archivers() closer to the top of the function here
> (probably after the enter_repo() call)?

Ack, fixed.


> > diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
> > index 2a97b27b0a..3e95fdf660 100755
> > --- a/t/t5000-tar-tree.sh
> > +++ b/t/t5000-tar-tree.sh
> > @@ -206,6 +206,12 @@ test_expect_success 'git archive with --output, override inferred format' '
> >  	test_cmp_bin b.tar d4.zip
> >  '
> >  
> > +test_expect_success GZIP 'git archive with --output and --remote uses expected format' '
> > +	git archive --output=d5.tgz --remote=. HEAD &&
> > +	gzip -d -c < d5.tgz > d5.tar &&
> > +	test_cmp_bin b.tar d5.tar
> > +'
> 
> This nicely tests the more-interesting tgz case. But unfortunately it
> won't run on machines without the GZIP prerequisite. I'd think that
> would really be _most_ machines, but is it worth having a separate zip
> test to cover machines without gzip? I guess that just creates the
> opposite problem: not everybody has ZIP.

Added a test to compare the file lists from the .zip file to the
reference .tar file. I'm not sure if this is the best way to do things,
but it at least verifies that a .zip is produced. However, it's brittle
if the output of "zip -sf" changes. Let me know if you have a better
idea.

Jeff King Oct. 23, 2018, 12:06 a.m. UTC | #3
On Mon, Oct 22, 2018 at 04:51:27PM -0700, Josh Steadmon wrote:

> > > +test_expect_success GZIP 'git archive with --output and --remote uses expected format' '
> > > +	git archive --output=d5.tgz --remote=. HEAD &&
> > > +	gzip -d -c < d5.tgz > d5.tar &&
> > > +	test_cmp_bin b.tar d5.tar
> > > +'
> > 
> > This nicely tests the more-interesting tgz case. But unfortunately it
> > won't run on machines without the GZIP prerequisite. I'd think that
> > would really be _most_ machines, but is it worth having a separate zip
> > test to cover machines without gzip? I guess that just creates the
> > opposite problem: not everybody has ZIP.
> 
> Added a test to compare the file lists from the .zip file to the
> reference .tar file. I'm not sure if this is the best way to do things,
> but it at least verifies that a .zip is produced. However, it's brittle
> if the output of "zip -sf" changes. Let me know if you have a better
> idea.

I wonder if we could do something more black-box. What we really care
about here is not the exact output, but rather that "-o foo.zip"
produces the same output as "--format zip". Could we do that without
even relying on ZIP?

I think it should follow even for tgz, because we use "-n" for a
repeatable output. But there we are relying on an external gzip just to
_create_ the file, so we'd still need the GZIP prereq.
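The black-box idea relies on git archive producing byte-identical output for the same tree and format, so the check reduces to a byte-for-byte comparison; in the test suite that is what test_cmp_bin does (essentially cmp(1)). As an illustration only, here is a minimal C sketch of such a comparison; streams_identical() is a hypothetical helper, not part of git.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Compare two streams byte for byte from their current positions.
 * Returns 1 if both yield exactly the same bytes and reach EOF together,
 * 0 on the first difference or on a read error.
 */
static int streams_identical(FILE *a, FILE *b)
{
	int ca, cb;

	assert(a != NULL && b != NULL);
	do {
		ca = getc(a);
		cb = getc(b);
		if (ca != cb)
			return 0; /* differing byte, or one stream ended early */
	} while (ca != EOF);

	/* Both hit EOF; make sure neither stopped because of an error. */
	return !ferror(a) && !ferror(b);
}
```

For two invocations that should agree, such as "-o d4.zip" versus "--format=zip", a result of 1 is what the test expects; the comparison says nothing about whether the bytes form a valid zip, which is why it can run without ZIP installed.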

Hmm. Looks like we already have a similar test in t5003. So maybe just:

diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index 55c7870997..cf19f56924 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -158,11 +158,16 @@ test_expect_success 'git archive --format=zip with --output' \
     'git archive --format=zip --output=d2.zip HEAD &&
     test_cmp_bin d.zip d2.zip'
 
-test_expect_success 'git archive with --output, inferring format' '
+test_expect_success 'git archive with --output, inferring format (local)' '
 	git archive --output=d3.zip HEAD &&
 	test_cmp_bin d.zip d3.zip
 '
 
+test_expect_success 'git archive with --output, inferring format (remote)' '
+	git archive --remote=. --output=d4.zip HEAD &&
+	test_cmp_bin d.zip d4.zip
+'
+
 test_expect_success \
     'git archive --format=zip with prefix' \
     'git archive --format=zip --prefix=prefix/ HEAD >e.zip'

which I think exposes the bug and can run everywhere?

-Peff

Josh Steadmon Oct. 23, 2018, 12:23 a.m. UTC | #4
On 2018.10.22 20:06, Jeff King wrote:
> On Mon, Oct 22, 2018 at 04:51:27PM -0700, Josh Steadmon wrote:
> 
> > > > +test_expect_success GZIP 'git archive with --output and --remote uses expected format' '
> > > > +	git archive --output=d5.tgz --remote=. HEAD &&
> > > > +	gzip -d -c < d5.tgz > d5.tar &&
> > > > +	test_cmp_bin b.tar d5.tar
> > > > +'
> > > 
> > > This nicely tests the more-interesting tgz case. But unfortunately it
> > > won't run on machines without the GZIP prerequisite. I'd think that
> > > would really be _most_ machines, but is it worth having a separate zip
> > > test to cover machines without gzip? I guess that just creates the
> > > opposite problem: not everybody has ZIP.
> > 
> > Added a test to compare the file lists from the .zip file to the
> > reference .tar file. I'm not sure if this is the best way to do things,
> > but it at least verifies that a .zip is produced. However, it's brittle
> > if the output of "zip -sf" changes. Let me know if you have a better
> > idea.
> 
> I wonder if we could do something more black-box. What we really care
> about here is not the exact output, but rather that "-o foo.zip"
> produces the same output as "--format zip". Could we do that without
> even relying on ZIP?
> 
> I think it should follow even for tgz, because we use "-n" for a
> repeatable output. But there we are relying on an external gzip just to
> _create_ the file, so we'd still need the GZIP prereq.
> 
> Hmm. Looks like we already have a similar test in t5003. So maybe just:
> 
> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> index 55c7870997..cf19f56924 100755
> --- a/t/t5003-archive-zip.sh
> +++ b/t/t5003-archive-zip.sh
> @@ -158,11 +158,16 @@ test_expect_success 'git archive --format=zip with --output' \
>      'git archive --format=zip --output=d2.zip HEAD &&
>      test_cmp_bin d.zip d2.zip'
>  
> -test_expect_success 'git archive with --output, inferring format' '
> +test_expect_success 'git archive with --output, inferring format (local)' '
>  	git archive --output=d3.zip HEAD &&
>  	test_cmp_bin d.zip d3.zip
>  '
>  
> +test_expect_success 'git archive with --output, inferring format (remote)' '
> +	git archive --remote=. --output=d4.zip HEAD &&
> +	test_cmp_bin d.zip d4.zip
> +'
> +
>  test_expect_success \
>      'git archive --format=zip with prefix' \
>      'git archive --format=zip --prefix=prefix/ HEAD >e.zip'
> 
> which I think exposes the bug and can run everywhere?

Makes sense, thanks!

Patch

diff --git a/archive.c b/archive.c
index c1870105eb..ce0f8a0362 100644
--- a/archive.c
+++ b/archive.c
@@ -29,6 +29,12 @@  void register_archiver(struct archiver *ar)
 	archivers[nr_archivers++] = ar;
 }
 
+void init_archivers(void)
+{
+	init_tar_archiver();
+	init_zip_archiver();
+}
+
 static void format_subst(const struct commit *commit,
                          const char *src, size_t len,
                          struct strbuf *buf)
@@ -531,9 +537,6 @@  int write_archive(int argc, const char **argv, const char *prefix,
 	git_config_get_bool("uploadarchive.allowunreachable", &remote_allow_unreachable);
 	git_config(git_default_config, NULL);
 
-	init_tar_archiver();
-	init_zip_archiver();
-
 	args.repo = repo;
 	argc = parse_archive_args(argc, argv, &ar, &args, name_hint, remote);
 	if (!startup_info->have_repository) {
diff --git a/archive.h b/archive.h
index d4f97a00f5..21ac010699 100644
--- a/archive.h
+++ b/archive.h
@@ -43,6 +43,7 @@  extern void register_archiver(struct archiver *);
 
 extern void init_tar_archiver(void);
 extern void init_zip_archiver(void);
+extern void init_archivers(void);
 
 typedef int (*write_archive_entry_fn_t)(struct archiver_args *args,
 					const struct object_id *oid,
diff --git a/builtin/archive.c b/builtin/archive.c
index e74f675390..d2455237ce 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -97,6 +97,8 @@  int cmd_archive(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, local_opts, NULL,
 			     PARSE_OPT_KEEP_ALL);
 
+	init_archivers();
+
 	if (output)
 		create_output_file(output);
 
diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
index 25d9116356..3f35ebcfe8 100644
--- a/builtin/upload-archive.c
+++ b/builtin/upload-archive.c
@@ -43,6 +43,7 @@  int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
 	}
 
 	/* parse all options sent by the client */
+	init_archivers();
 	return write_archive(sent_argv.argc, sent_argv.argv, prefix,
 			     the_repository, NULL, 1);
 }
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 2a97b27b0a..3e95fdf660 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -206,6 +206,12 @@  test_expect_success 'git archive with --output, override inferred format' '
 	test_cmp_bin b.tar d4.zip
 '
 
+test_expect_success GZIP 'git archive with --output and --remote uses expected format' '
+	git archive --output=d5.tgz --remote=. HEAD &&
+	gzip -d -c < d5.tgz > d5.tar &&
+	test_cmp_bin b.tar d5.tar
+'
+
 test_expect_success 'git archive --list outside of a git repo' '
 	nongit git archive --list
 '