
[2/2] archive: avoid spawning `gzip`

Message ID 44d5371ae6808ec40e8f52c3dc258a85c878b27e.1555110278.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Series Avoid spawning gzip in git archive

Commit Message

Johannes Schindelin via GitGitGadget April 12, 2019, 11:04 p.m. UTC
From: Rohit Ashiwal <rohit.ashiwal265@gmail.com>

As we already link to the zlib library, we can perform the compression
without even requiring gzip on the host machine.

Signed-off-by: Rohit Ashiwal <rohit.ashiwal265@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 archive-tar.c | 34 +++++++++++++++++++++++++---------
 1 file changed, 25 insertions(+), 9 deletions(-)

Comments

Jeff King April 13, 2019, 1:51 a.m. UTC | #1
On Fri, Apr 12, 2019 at 04:04:40PM -0700, Rohit Ashiwal via GitGitGadget wrote:

> From: Rohit Ashiwal <rohit.ashiwal265@gmail.com>
> 
> As we already link to the zlib library, we can perform the compression
> without even requiring gzip on the host machine.

Very cool. It's nice to drop a dependency, and this should be a bit more
efficient, too.

> diff --git a/archive-tar.c b/archive-tar.c
> index ba37dad27c..5979ed14b7 100644
> --- a/archive-tar.c
> +++ b/archive-tar.c
> @@ -466,18 +466,34 @@ static int write_tar_filter_archive(const struct archiver *ar,
>  	filter.use_shell = 1;
>  	filter.in = -1;
>  
> -	if (start_command(&filter) < 0)
> -		die_errno(_("unable to start '%s' filter"), argv[0]);
> -	close(1);
> -	if (dup2(filter.in, 1) < 0)
> -		die_errno(_("unable to redirect descriptor"));
> -	close(filter.in);
> +	if (!strcmp("gzip -cn", ar->data)) {

I wondered how you were going to kick this in, since users can define
arbitrary filters. I think it's kind of neat to automagically convert
"gzip -cn" (which also happens to be the default). But I think we should
mention that in the Documentation, in case somebody tries to use a
custom version of gzip and wonders why it isn't kicking in.

Likewise, it might make sense in the tests to put a poison gzip in the
$PATH so that we can be sure we're using our internal code, and not just
calling out to gzip (on platforms that have it, of course).

The alternative is that we could use a special token like ":zlib" or
something to indicate that the internal implementation should be used
(and then tweak the baked-in default, too). That might be less
surprising for users, but most people would still get the benefit since
they'd be using the default config.

> +		char outmode[4] = "wb\0";

This looks sufficiently magical that it might merit a comment. I had to
look in the zlib header file to learn that this is just a normal
stdio-style mode. But we can't just do:

  gzip = gzdopen(fd, "wb");

because we want to (maybe) append a compression level. It's also
slightly confusing that it explicitly includes a NUL, but later:

> +		if (args->compression_level >= 0 && args->compression_level <= 9)
> +			outmode[2] = '0' + args->compression_level;

we may overwrite that and assume that outmode[3] is also a NUL. Which it
is, because of how C initialization works. But that means we also do not
need the "\0" in the initializer.

Dropping that may make it slightly less jarring (any time I see a
backslash escape in an initializer, I assume I'm in for some binary
trickery, but this turns out to be much more mundane).
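
(To spell out the C rule at work here: a char array initialized from a
string literal shorter than the array has all remaining elements
zero-initialized, so the trailing NUL comes for free:)

  char outmode[4] = "wb"; /* holds 'w', 'b', '\0', '\0' */
  outmode[2] = '9';       /* now "wb9"; outmode[3] is still '\0' */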

I'd also consider just using a strbuf:

  struct strbuf outmode = STRBUF_INIT;

  strbuf_addstr(&outmode, "wb");
  if (args->compression_level >= 0 && args->compression_level <= 9)
	strbuf_addch(&outmode, '0' + args->compression_level);

That's overkill in a sense, but it saves us having to deal with
manually-counted offsets, and this code is only run once per program
invocation, so the efficiency shouldn't matter.

> +		gzip = gzdopen(fileno(stdout), outmode);
> +		if (!gzip)
> +			die(_("Could not gzdopen stdout"));

Is there a way to get a more specific error from zlib? I'm less
concerned about gzdopen here (which should never fail), and more about
the writing and closing steps. I don't see anything good for gzdopen(),
but...

> +	if (gzip) {
> +		if (gzclose(gzip) != Z_OK)
> +			die(_("gzclose failed"));

...according to zlib.h, here the returned int is meaningful. And if
Z_ERRNO, we should probably use die_errno() to give a better message.
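
(A sketch of that suggestion, going by the return values documented in
zlib.h -- Z_ERRNO means the failure came from the OS, so errno is
meaningful there, while any other non-Z_OK code is zlib's own:)

  int ret = gzclose(gzip);
  if (ret == Z_ERRNO)
	die_errno(_("gzclose failed"));
  else if (ret != Z_OK)
	die(_("gzclose failed (zlib error %d)"), ret);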

> [...]

That was a lot of little nits, but the overall shape of the patch looks
good to me (and I think the goal is obviously good). Thanks for working
on it.

-Peff
René Scharfe April 13, 2019, 10:01 p.m. UTC | #2
On 13.04.2019 at 03:51, Jeff King wrote:
> On Fri, Apr 12, 2019 at 04:04:40PM -0700, Rohit Ashiwal via GitGitGadget wrote:
>
>> From: Rohit Ashiwal <rohit.ashiwal265@gmail.com>
>>
>> As we already link to the zlib library, we can perform the compression
>> without even requiring gzip on the host machine.
>
> Very cool. It's nice to drop a dependency, and this should be a bit more
> efficient, too.

Getting rid of dependencies is good, and using zlib is the obvious way to
generate .tgz files. Last time I tried something like that, a separate gzip
process was faster, though -- at least on Linux [1].  How does this one
fare?

Doing compression in its own thread may be a good idea.

René


[1] http://public-inbox.org/git/4AAAC8CE.8020302@lsrfire.ath.cx/
brian m. carlson April 13, 2019, 10:16 p.m. UTC | #3
On Fri, Apr 12, 2019 at 09:51:02PM -0400, Jeff King wrote:
> I wondered how you were going to kick this in, since users can define
> arbitrary filters. I think it's kind of neat to automagically convert
> "gzip -cn" (which also happens to be the default). But I think we should
> mention that in the Documentation, in case somebody tries to use a
> custom version of gzip and wonders why it isn't kicking in.
> 
> Likewise, it might make sense in the tests to put a poison gzip in the
> $PATH so that we can be sure we're using our internal code, and not just
> calling out to gzip (on platforms that have it, of course).
> 
> The alternative is that we could use a special token like ":zlib" or
> something to indicate that the internal implementation should be used
> (and then tweak the baked-in default, too). That might be less
> surprising for users, but most people would still get the benefit since
> they'd be using the default config.

I agree that a special value (or NULL, if that's possible) would be
nicer here. That way, if someone does specify a custom gzip, we honor
it, and it serves to document the code better. For example, if someone
symlinked pigz to gzip and used "gzip -cn", then they might not get the
parallelization benefits they expected.

I'm fine overall with the idea of bringing the compression into the
binary using zlib, provided that we preserve the "-n" behavior
(producing reproducible archives).
Jeff King April 15, 2019, 9:35 p.m. UTC | #4
On Sun, Apr 14, 2019 at 12:01:10AM +0200, René Scharfe wrote:

> >> As we already link to the zlib library, we can perform the compression
> >> without even requiring gzip on the host machine.
> >
> > Very cool. It's nice to drop a dependency, and this should be a bit more
> > efficient, too.
> 
> Getting rid of dependencies is good, and using zlib is the obvious way to
> generate .tgz files. Last time I tried something like that, a separate gzip
> process was faster, though -- at least on Linux [1].  How does this one
> fare?

I'd expect a separate gzip to be faster in wall-clock time for a
multi-core machine, but overall consume more CPU. I'm slightly surprised
that your timings show that it actually wins on total CPU, too.

Here are best-of-five times for "git archive --format=tar.gz HEAD" on
linux.git (the machine is a quad-core):

  [before, separate gzip]
  real	0m21.501s
  user	0m26.148s
  sys	0m0.619s

  [after, internal gzwrite]
  real	0m25.156s
  user	0m25.059s
  sys	0m0.096s

which does show what I expect (longer overall, but less total CPU).

Which one you prefer depends on your situation, of course. A user on a
workstation with multiple cores probably cares most about end-to-end
latency and using all of their available horsepower. A server hosting
repositories and receiving many unrelated requests probably cares more
about total CPU (though the differences there are small enough that it
may not even be worth having a config knob to un-parallelize it).

> Doing compression in its own thread may be a good idea.

Yeah. It might even make the patch simpler, since I'd expect it to be
implemented with start_async() and a descriptor, making it look just
like a gzip pipe to the caller. :)
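
(For illustration, a minimal sketch of that shape -- essentially what
René's patch further down the thread does; zlib_compress is a
hypothetical async callback that reads tar data from its input fd and
gzwrite()s it to stdout:)

  struct async filter;

  memset(&filter, 0, sizeof(filter));
  filter.proc = zlib_compress; /* int (*)(int in, int out, void *data) */
  filter.in = -1;              /* ask start_async() to create the pipe */

  if (start_async(&filter))
	die(_("unable to fork off internal gzip"));
  /* ... write the tar stream to filter.in instead of fd 1 ... */
  close(filter.in);
  if (finish_async(&filter))
	die(_("error in internal gzip"));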

-Peff
Jeff King April 15, 2019, 9:36 p.m. UTC | #5
On Sat, Apr 13, 2019 at 10:16:46PM +0000, brian m. carlson wrote:

> > The alternative is that we could use a special token like ":zlib" or
> > something to indicate that the internal implementation should be used
> > (and then tweak the baked-in default, too). That might be less
> > surprising for users, but most people would still get the benefit since
> > they'd be using the default config.
> 
> I agree that a special value (or NULL, if that's possible) would be
> nicer here. That way, if someone does specify a custom gzip, we honor
> it, and it serves to document the code better. For example, if someone
> symlinked pigz to gzip and used "gzip -cn", then they might not get the
> parallelization benefits they expected.

Thanks for spelling that out. I had a vague feeling somebody might be
surprised, but I didn't know if people actually did stuff like
symlinking pigz to gzip (though it makes perfect sense to do so).

> I'm fine overall with the idea of bringing the compression into the
> binary using zlib, provided that we preserve the "-n" behavior
> (producing reproducible archives).

I just assumed that gzwrite() would have the "-n" behavior, but it's
definitely worth double-checking.

-Peff
Johannes Schindelin April 26, 2019, 2:47 p.m. UTC | #6
Hi Peff,

On Fri, 12 Apr 2019, Jeff King wrote:

> On Fri, Apr 12, 2019 at 04:04:40PM -0700, Rohit Ashiwal via GitGitGadget wrote:
>
> > From: Rohit Ashiwal <rohit.ashiwal265@gmail.com>
> >
> > As we already link to the zlib library, we can perform the compression
> > without even requiring gzip on the host machine.
>
> Very cool. It's nice to drop a dependency, and this should be a bit more
> efficient, too.

Sadly, no, as René intuited and as your testing shows: there seems to be a
~15% penalty for compressing in the same thread as producing the data to
be compressed.

Given that it reduces the number of dependencies, and given that it might
be better to rely on the external command `pigz -cn` if speed really
matters, I still think it makes sense to switch the default, though.

> > diff --git a/archive-tar.c b/archive-tar.c
> > index ba37dad27c..5979ed14b7 100644
> > --- a/archive-tar.c
> > +++ b/archive-tar.c
> > @@ -466,18 +466,34 @@ static int write_tar_filter_archive(const struct archiver *ar,
> >  	filter.use_shell = 1;
> >  	filter.in = -1;
> >
> > -	if (start_command(&filter) < 0)
> > -		die_errno(_("unable to start '%s' filter"), argv[0]);
> > -	close(1);
> > -	if (dup2(filter.in, 1) < 0)
> > -		die_errno(_("unable to redirect descriptor"));
> > -	close(filter.in);
> > +	if (!strcmp("gzip -cn", ar->data)) {
>
> I wondered how you were going to kick this in, since users can define
> arbitrary filters. I think it's kind of neat to automagically convert
> "gzip -cn" (which also happens to be the default). But I think we should
> mention that in the Documentation, in case somebody tries to use a
> custom version of gzip and wonders why it isn't kicking in.
>
> Likewise, it might make sense in the tests to put a poison gzip in the
> $PATH so that we can be sure we're using our internal code, and not just
> calling out to gzip (on platforms that have it, of course).
>
> The alternative is that we could use a special token like ":zlib" or
> something to indicate that the internal implementation should be used
> (and then tweak the baked-in default, too). That might be less
> surprising for users, but most people would still get the benefit since
> they'd be using the default config.

I went with `:zlib`.

> > +		char outmode[4] = "wb\0";
>
> This looks sufficiently magical that it might merit a comment. I had to
> look in the zlib header file to learn that this is just a normal
> stdio-style mode. But we can't just do:
>
>   gzip = gzdopen(fd, "wb");
>
> because we want to (maybe) append a compression level. It's also
> slightly confusing that it explicitly includes a NUL, but later:
>
> > +		if (args->compression_level >= 0 && args->compression_level <= 9)
> > +			outmode[2] = '0' + args->compression_level;
>
> we may overwrite that and assume that outmode[3] is also a NUL. Which it
> is, because of how C initialization works. But that means we also do not
> need the "\0" in the initializer.
>
> Dropping that may make it slightly less jarring (any time I see a
> backslash escape in an initializer, I assume I'm in for some binary
> trickery, but this turns out to be much more mundane).
>
> I'd also consider just using a strbuf:
>
>   struct strbuf outmode = STRBUF_INIT;
>
>   strbuf_addstr(&outmode, "wb");
>   if (args->compression_level >= 0 && args->compression_level <= 9)
> 	strbuf_addch(&outmode, '0' + args->compression_level);
>
> That's overkill in a sense, but it saves us having to deal with
> manually-counted offsets, and this code is only run once per program
> invocation, so the efficiency shouldn't matter.

I'll change that, too, as it seems that `pigz` allows compression levels
higher than 9, in which case we need `strbuf_addf()` anyway. I will not
adjust the condition `<= 9`, of course, as zlib is still limited that way.
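
(A minimal sketch of that adjustment, reusing the names from the patch
under discussion:)

  struct strbuf outmode = STRBUF_INIT;

  strbuf_addstr(&outmode, "wb");
  if (args->compression_level >= 0 && args->compression_level <= 9)
	strbuf_addf(&outmode, "%d", args->compression_level);
  gzip = gzdopen(fileno(stdout), outmode.buf);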

> > +		gzip = gzdopen(fileno(stdout), outmode);
> > +		if (!gzip)
> > +			die(_("Could not gzdopen stdout"));
>
> Is there a way to get a more specific error from zlib? I'm less
> concerned about gzdopen here (which should never fail), and more about
> the writing and closing steps. I don't see anything good for gzdopen(),
> but...

Sadly, I did not find anything there.

> > +	if (gzip) {
> > +		if (gzclose(gzip) != Z_OK)
> > +			die(_("gzclose failed"));
>
> ...according to zlib.h, here the returned int is meaningful. And if
> Z_ERRNO, we should probably use die_errno() to give a better message.

Okay.

Thanks,
Dscho
Johannes Schindelin April 26, 2019, 2:51 p.m. UTC | #7
Hi Peff,

On Mon, 15 Apr 2019, Jeff King wrote:

> On Sun, Apr 14, 2019 at 12:01:10AM +0200, René Scharfe wrote:
>
> > >> As we already link to the zlib library, we can perform the compression
> > >> without even requiring gzip on the host machine.
> > >
> > > Very cool. It's nice to drop a dependency, and this should be a bit more
> > > efficient, too.
> >
> > Getting rid of dependencies is good, and using zlib is the obvious way to
> > generate .tgz files. Last time I tried something like that, a separate gzip
> > process was faster, though -- at least on Linux [1].  How does this one
> > fare?
>
> I'd expect a separate gzip to be faster in wall-clock time for a
> multi-core machine, but overall consume more CPU. I'm slightly surprised
> that your timings show that it actually wins on total CPU, too.

If performance is really a concern, you'll be much better off using `pigz`
than `gzip`.

> Here are best-of-five times for "git archive --format=tar.gz HEAD" on
> linux.git (the machine is a quad-core):
>
>   [before, separate gzip]
>   real	0m21.501s
>   user	0m26.148s
>   sys	0m0.619s
>
>   [after, internal gzwrite]
>   real	0m25.156s
>   user	0m25.059s
>   sys	0m0.096s
>
> which does show what I expect (longer overall, but less total CPU).
>
> Which one you prefer depends on your situation, of course. A user on a
> workstation with multiple cores probably cares most about end-to-end
> latency and using all of their available horsepower. A server hosting
> repositories and receiving many unrelated requests probably cares more
> about total CPU (though the differences there are small enough that it
> may not even be worth having a config knob to un-parallelize it).

I am a bit sad that this is so noticeable. Nevertheless, I think that
dropping the dependency is worth it, in particular given that `gzip` is
not exactly fast to begin with (you really should switch to `pigz` or to a
faster compression algorithm if you are interested in speed).

> > Doing compression in its own thread may be a good idea.
>
> Yeah. It might even make the patch simpler, since I'd expect it to be
> implemented with start_async() and a descriptor, making it look just
> like a gzip pipe to the caller. :)

Sadly, it does not really look like it is simpler.

And when going in the direction of multi-threaded compression anyway,
the `pigz` trick of compressing 32kB chunks in parallel is an interesting
idea, too.

All of this, however, is outside of the purview of this (still relatively
simple) patch series.

Ciao,
Dscho
Johannes Schindelin April 26, 2019, 2:54 p.m. UTC | #8
Hi brian,

On Sat, 13 Apr 2019, brian m. carlson wrote:

> On Fri, Apr 12, 2019 at 09:51:02PM -0400, Jeff King wrote:
> > I wondered how you were going to kick this in, since users can define
> > arbitrary filters. I think it's kind of neat to automagically convert
> > "gzip -cn" (which also happens to be the default). But I think we should
> > mention that in the Documentation, in case somebody tries to use a
> > custom version of gzip and wonders why it isn't kicking in.
> >
> > Likewise, it might make sense in the tests to put a poison gzip in the
> > $PATH so that we can be sure we're using our internal code, and not just
> > calling out to gzip (on platforms that have it, of course).
> >
> > The alternative is that we could use a special token like ":zlib" or
> > something to indicate that the internal implementation should be used
> > (and then tweak the baked-in default, too). That might be less
> > surprising for users, but most people would still get the benefit since
> > they'd be using the default config.
>
> I agree that a special value (or NULL, if that's possible) would be
> nicer here. That way, if someone does specify a custom gzip, we honor
> it, and it serves to document the code better. For example, if someone
> symlinked pigz to gzip and used "gzip -cn", then they might not get the
> parallelization benefits they expected.

I went with `:zlib`. The `NULL` value would not really work, as there is
no way to specify that via `tar.tgz.command`.

About the symlinked thing: I do not really care to support such
hacks. If you want a different compressor than the default (which can
change), you should specify it specifically.

> I'm fine overall with the idea of bringing the compression into the
> binary using zlib, provided that we preserve the "-n" behavior
> (producing reproducible archives).

Thanks for voicing this concern. I had a look at zlib's source code, and
it looks like it requires an extra function call (that we don't call) to
make the resulting file non-reproducible. In other words, it has the
opposite default behavior from `gzip`.
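
(For the record, the zlib detail in question, as far as I can tell from
zlib.h: with no explicit header set, zlib writes an mtime of 0 into the
gzip header, and embedding a real timestamp would take an opt-in call
along these lines -- which we never make; `strm` stands for the
underlying gzip-wrapped deflate stream:)

  gz_header head;

  memset(&head, 0, sizeof(head));
  head.time = (uLong)time(NULL);  /* this would break reproducibility */
  if (deflateSetHeader(&strm, &head) != Z_OK)
	die(_("deflateSetHeader failed"));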

Ciao,
Dscho
René Scharfe April 27, 2019, 9:59 a.m. UTC | #9
On 26.04.19 at 16:51, Johannes Schindelin wrote:
> Hi Peff,
>
> On Mon, 15 Apr 2019, Jeff King wrote:
>
>> On Sun, Apr 14, 2019 at 12:01:10AM +0200, René Scharfe wrote:
>>
>>>>> As we already link to the zlib library, we can perform the compression
>>>>> without even requiring gzip on the host machine.
>>>>
>>>> Very cool. It's nice to drop a dependency, and this should be a bit more
>>>> efficient, too.
>>>
>>> Getting rid of dependencies is good, and using zlib is the obvious way to
>>> generate .tgz files. Last time I tried something like that, a separate gzip
>>> process was faster, though -- at least on Linux [1].  How does this one
>>> fare?
>>
>> I'd expect a separate gzip to be faster in wall-clock time for a
>> multi-core machine, but overall consume more CPU. I'm slightly surprised
>> that your timings show that it actually wins on total CPU, too.

My initial expectation back then was that moving data between processes
is costly and that compressing in-process would improve the overall
performance.  Your expectation is more in line with what I then actually
saw.  The difference in total CPU time wasn't that big, perhaps just
noise.

> If performance is really a concern, you'll be much better off using `pigz`
> than `gzip`.

Performance is always a concern, but on the other hand I haven't seen any
complaints about slow archiving so far.

>> Here are best-of-five times for "git archive --format=tar.gz HEAD" on
>> linux.git (the machine is a quad-core):
>>
>>    [before, separate gzip]
>>    real	0m21.501s
>>    user	0m26.148s
>>    sys	0m0.619s
>>
>>    [after, internal gzwrite]
>>    real	0m25.156s
>>    user	0m25.059s
>>    sys	0m0.096s
>>
>> which does show what I expect (longer overall, but less total CPU).

I get similar numbers with hyperfine:

Benchmark #1: git archive --format=tar.gz HEAD >/dev/null
  Time (mean ± σ):     16.683 s ±  0.451 s    [User: 20.230 s, System: 0.375 s]
  Range (min … max):   16.308 s … 17.852 s    10 runs

Benchmark #2: ~/src/git/git-archive --format=tar.gz HEAD >/dev/null
  Time (mean ± σ):     19.898 s ±  0.228 s    [User: 19.825 s, System: 0.073 s]
  Range (min … max):   19.627 s … 20.355 s    10 runs

Benchmark #3: git archive --format=zip HEAD >/dev/null
  Time (mean ± σ):     16.449 s ±  0.075 s    [User: 16.340 s, System: 0.109 s]
  Range (min … max):   16.326 s … 16.611 s    10 runs

#1 is git v2.21.0, #2 is with the two patches applied, #3 is v2.21.0
again, but with zip output, just to put things into perspective.

>> Which one you prefer depends on your situation, of course. A user on a
>> workstation with multiple cores probably cares most about end-to-end
>> latency and using all of their available horsepower. A server hosting
>> repositories and receiving many unrelated requests probably cares more
>> about total CPU (though the differences there are small enough that it
>> may not even be worth having a config knob to un-parallelize it).
>
> I am a bit sad that this is so noticeable. Nevertheless, I think that
> dropping the dependency is worth it, in particular given that `gzip` is
> not exactly fast to begin with (you really should switch to `pigz` or to a
> faster compression if you are interested in speed).

We could import pigz verbatim; it's just 11K lines of code total. :)

>>> Doing compression in its own thread may be a good idea.
>>
>> Yeah. It might even make the patch simpler, since I'd expect it to be
>> implemented with start_async() and a descriptor, making it look just
>> like a gzip pipe to the caller. :)
>
> Sadly, it does not really look like it is simpler.

I have to agree -- at least I was unable to pull off the stdout
plumbing trick.  Is there a way?  But it doesn't look too bad, and
the performance is closer to using the real gzip:

Benchmark #1: ~/src/git/git-archive --format=tar.gz HEAD >/dev/null
  Time (mean ± σ):     17.300 s ±  0.198 s    [User: 20.825 s, System: 0.356 s]
  Range (min … max):   17.042 s … 17.638 s    10 runs

This is with the following patch:

---
 archive-tar.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 59 insertions(+), 4 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index 3e53aac1e6..c889b84c2c 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -38,11 +38,13 @@ static int write_tar_filter_archive(const struct archiver *ar,
 #define USTAR_MAX_MTIME 077777777777ULL
 #endif

+static int out_fd = 1;
+
 /* writes out the whole block, but only if it is full */
 static void write_if_needed(void)
 {
 	if (offset == BLOCKSIZE) {
-		write_or_die(1, block, BLOCKSIZE);
+		write_or_die(out_fd, block, BLOCKSIZE);
 		offset = 0;
 	}
 }
@@ -66,7 +68,7 @@ static void do_write_blocked(const void *data, unsigned long size)
 		write_if_needed();
 	}
 	while (size >= BLOCKSIZE) {
-		write_or_die(1, buf, BLOCKSIZE);
+		write_or_die(out_fd, buf, BLOCKSIZE);
 		size -= BLOCKSIZE;
 		buf += BLOCKSIZE;
 	}
@@ -101,10 +103,10 @@ static void write_trailer(void)
 {
 	int tail = BLOCKSIZE - offset;
 	memset(block + offset, 0, tail);
-	write_or_die(1, block, BLOCKSIZE);
+	write_or_die(out_fd, block, BLOCKSIZE);
 	if (tail < 2 * RECORDSIZE) {
 		memset(block, 0, offset);
-		write_or_die(1, block, BLOCKSIZE);
+		write_or_die(out_fd, block, BLOCKSIZE);
 	}
 }

@@ -434,6 +436,56 @@ static int write_tar_archive(const struct archiver *ar,
 	return err;
 }

+static int internal_gzip(int in, int out, void *data)
+{
+	int *levelp = data;
+	gzFile gzip = gzdopen(1, "wb");
+	if (!gzip)
+		die(_("gzdopen failed"));
+	if (gzsetparams(gzip, *levelp, Z_DEFAULT_STRATEGY) != Z_OK)
+		die(_("unable to set compression level"));
+
+	for (;;) {
+		char buf[BLOCKSIZE];
+		ssize_t read = xread(in, buf, sizeof(buf));
+		if (read < 0)
+			die_errno(_("read failed"));
+		if (read == 0)
+			break;
+		if (gzwrite(gzip, buf, read) != read)
+			die(_("gzwrite failed"));
+	}
+
+	if (gzclose(gzip) != Z_OK)
+		die(_("gzclose failed"));
+	close(in);
+	return 0;
+}
+
+static int write_tar_gzip_archive(const struct archiver *ar,
+				  struct archiver_args *args)
+{
+	struct async filter;
+	int r;
+
+	memset(&filter, 0, sizeof(filter));
+	filter.proc = internal_gzip;
+	filter.data = &args->compression_level;
+	filter.in = -1;
+
+	if (start_async(&filter))
+		die(_("unable to fork off internal gzip"));
+	out_fd = filter.in;
+
+	r = write_tar_archive(ar, args);
+
+	close(out_fd);
+	if (finish_async(&filter))
+		die(_("error in internal gzip"));
+
+	return r;
+}
+
 static int write_tar_filter_archive(const struct archiver *ar,
 				    struct archiver_args *args)
 {
@@ -445,6 +497,9 @@ static int write_tar_filter_archive(const struct archiver *ar,
 	if (!ar->data)
 		BUG("tar-filter archiver called with no filter defined");

+	if (!strcmp(ar->data, "gzip -cn"))
+		return write_tar_gzip_archive(ar, args);
+
 	strbuf_addstr(&cmd, ar->data);
 	if (args->compression_level >= 0)
 		strbuf_addf(&cmd, " -%d", args->compression_level);
--
2.21.0
René Scharfe April 27, 2019, 5:39 p.m. UTC | #10
On 27.04.19 at 11:59, René Scharfe wrote:
> On 26.04.19 at 16:51, Johannes Schindelin wrote:
>>
>> On Mon, 15 Apr 2019, Jeff King wrote:
>>
>>> On Sun, Apr 14, 2019 at 12:01:10AM +0200, René Scharfe wrote:
>>>
>>>> Doing compression in its own thread may be a good idea.
>>>
>>> Yeah. It might even make the patch simpler, since I'd expect it to be
>>> implemented with start_async() and a descriptor, making it look just
>>> like a gzip pipe to the caller. :)
>>
>> Sadly, it does not really look like it is simpler.
>
> I have to agree -- at least I was unable to pull off the stdout
> plumbing trick.

The simplest solution is of course to not touch the archive code.  The
patch below makes that possible:

Benchmark #1: ~/src/git/git -c tar.tgz.command=~/src/git/git-gzip archive --format=tgz HEAD >/dev/null
  Time (mean ± σ):     17.256 s ±  0.299 s    [User: 20.380 s, System: 0.294 s]
  Range (min … max):   16.940 s … 17.804 s    10 runs

Curious to see how this looks on other systems and platforms.

And perhaps the buffer size needs to be tuned.

-- >8 --
Subject: [PATCH] add git gzip

Add a cheap gzip lookalike based on zlib for systems that don't have
(or want) the real thing.  It can be used e.g. to generate tgz files
using git archive and its configuration options tar.tgz.command and
tar.tar.gz.command, without any other external dependency.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
---
 .gitignore       |  1 +
 Makefile         |  1 +
 builtin.h        |  1 +
 builtin/gzip.c   | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
 command-list.txt |  1 +
 git.c            |  1 +
 6 files changed, 69 insertions(+)
 create mode 100644 builtin/gzip.c

diff --git a/.gitignore b/.gitignore
index 44c74402c8..e550868219 100644
--- a/.gitignore
+++ b/.gitignore
@@ -71,6 +71,7 @@
 /git-gc
 /git-get-tar-commit-id
 /git-grep
+/git-gzip
 /git-hash-object
 /git-help
 /git-http-backend
diff --git a/Makefile b/Makefile
index 9f1b6e8926..2b34f1a4aa 100644
--- a/Makefile
+++ b/Makefile
@@ -1075,6 +1075,7 @@ BUILTIN_OBJS += builtin/fsck.o
 BUILTIN_OBJS += builtin/gc.o
 BUILTIN_OBJS += builtin/get-tar-commit-id.o
 BUILTIN_OBJS += builtin/grep.o
+BUILTIN_OBJS += builtin/gzip.o
 BUILTIN_OBJS += builtin/hash-object.o
 BUILTIN_OBJS += builtin/help.o
 BUILTIN_OBJS += builtin/index-pack.o
diff --git a/builtin.h b/builtin.h
index b78ab6e30b..abc34cc9d0 100644
--- a/builtin.h
+++ b/builtin.h
@@ -170,6 +170,7 @@ extern int cmd_fsck(int argc, const char **argv, const char *prefix);
 extern int cmd_gc(int argc, const char **argv, const char *prefix);
 extern int cmd_get_tar_commit_id(int argc, const char **argv, const char *prefix);
 extern int cmd_grep(int argc, const char **argv, const char *prefix);
+extern int cmd_gzip(int argc, const char **argv, const char *prefix);
 extern int cmd_hash_object(int argc, const char **argv, const char *prefix);
 extern int cmd_help(int argc, const char **argv, const char *prefix);
 extern int cmd_index_pack(int argc, const char **argv, const char *prefix);
diff --git a/builtin/gzip.c b/builtin/gzip.c
new file mode 100644
index 0000000000..90a98c44ce
--- /dev/null
+++ b/builtin/gzip.c
@@ -0,0 +1,64 @@
+#include "cache.h"
+#include "builtin.h"
+#include "parse-options.h"
+
+static const char * const gzip_usage[] = {
+	N_("git gzip [-NUM]"),
+	NULL
+};
+
+static int level_callback(const struct option *opt, const char *arg, int unset)
+{
+	int *levelp = opt->value;
+	int value;
+	const char *endp;
+
+	if (unset)
+		BUG("switch -NUM cannot be negated");
+
+	value = strtol(arg, (char **)&endp, 10);
+	if (*endp)
+		BUG("switch -NUM cannot be non-numeric");
+
+	*levelp = value;
+	return 0;
+}
+
+#define BUFFERSIZE (64 * 1024)
+
+int cmd_gzip(int argc, const char **argv, const char *prefix)
+{
+	gzFile gz;
+	int level = Z_DEFAULT_COMPRESSION;
+	struct option options[] = {
+		OPT_NUMBER_CALLBACK(&level, N_("compression level"),
+				    level_callback),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, options, gzip_usage, 0);
+	if (argc > 0)
+		usage_with_options(gzip_usage, options);
+
+	gz = gzdopen(1, "wb");
+	if (!gz)
+		die(_("unable to gzdopen stdout"));
+
+	if (gzsetparams(gz, level, Z_DEFAULT_STRATEGY) != Z_OK)
+		die(_("unable to set compression level %d"), level);
+
+	for (;;) {
+		char buf[BUFFERSIZE];
+		ssize_t read_bytes = xread(0, buf, sizeof(buf));
+		if (read_bytes < 0)
+			die_errno(_("unable to read from stdin"));
+		if (read_bytes == 0)
+			break;
+		if (gzwrite(gz, buf, read_bytes) != read_bytes)
+			die(_("gzwrite failed"));
+	}
+
+	if (gzclose(gz) != Z_OK)
+		die(_("gzclose failed"));
+	return 0;
+}
diff --git a/command-list.txt b/command-list.txt
index 3a9af104b5..755848842c 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -99,6 +99,7 @@ git-gc                                  mainporcelain
 git-get-tar-commit-id                   plumbinginterrogators
 git-grep                                mainporcelain           info
 git-gui                                 mainporcelain
+git-gzip                                purehelpers
 git-hash-object                         plumbingmanipulators
 git-help                                ancillaryinterrogators          complete
 git-http-backend                        synchingrepositories
diff --git a/git.c b/git.c
index 50da125c60..48f7fc6c56 100644
--- a/git.c
+++ b/git.c
@@ -510,6 +510,7 @@ static struct cmd_struct commands[] = {
 	{ "gc", cmd_gc, RUN_SETUP },
 	{ "get-tar-commit-id", cmd_get_tar_commit_id, NO_PARSEOPT },
 	{ "grep", cmd_grep, RUN_SETUP_GENTLY },
+	{ "gzip", cmd_gzip },
 	{ "hash-object", cmd_hash_object },
 	{ "help", cmd_help },
 	{ "index-pack", cmd_index_pack, RUN_SETUP_GENTLY | NO_PARSEOPT },
--
2.21.0
Johannes Schindelin April 29, 2019, 9:25 p.m. UTC | #11
Hi René,


On Sat, 27 Apr 2019, René Scharfe wrote:

> On 27.04.19 at 11:59, René Scharfe wrote:
> > On 26.04.19 at 16:51, Johannes Schindelin wrote:
> >>
> >> On Mon, 15 Apr 2019, Jeff King wrote:
> >>
> >>> On Sun, Apr 14, 2019 at 12:01:10AM +0200, René Scharfe wrote:
> >>>
> >>>> Doing compression in its own thread may be a good idea.
> >>>
> >>> Yeah. It might even make the patch simpler, since I'd expect it to
> >>> be implemented with start_async() and a descriptor, making it look
> >>> just like a gzip pipe to the caller. :)
> >>
> >> Sadly, it does not really look like it is simpler.
> >
> > I have to agree -- at least I was unable to pull off the stdout
> > plumbing trick.
>
> The simplest solution is of course to not touch the archive code.

We could do that, of course, and we could avoid adding a new command that
we have to support for eternity by introducing a command mode for `git
archive` instead (think: `git archive --gzip -9`), and marking that
command mode clearly as an internal implementation detail.

But since the performance is still not quite on par with `gzip`, I would
actually rather not, and really, just punt on that one, stating that
people interested in higher performance should use `pigz`.

And who knows, maybe nobody will complain at all about the performance?
It's not like `gzip` is really, really fast (IIRC LZO blows gzip out of
the water, speed-wise).

And if we get "bug" reports about this, we

1) have a very easy workaround:

	git config --global tar.tgz.command 'gzip -cn'

2) could always implement a pigz-like multi-threading solution.

I strongly expect a YAGNI here, though.

Ciao,
Dscho
René Scharfe May 1, 2019, 5:45 p.m. UTC | #12
Hello Dscho,

On 29.04.19 at 23:25, Johannes Schindelin wrote:
> On Sat, 27 Apr 2019, René Scharfe wrote:
>> The simplest solution is of course to not touch the archive code.
>
> We could do that, of course, and we could avoid adding a new command that
> we have to support for eternity by introducing a command mode for `git
> archive` instead (think: `git archive --gzip -9`), and marking that
> command mode clearly as an internal implementation detail.

adding gzip as the 142nd git command and 18th pure helper *would* be a
bit embarrassing, in particular for a command that's not directly
related to version control and readily available on all platforms.
Exposing it as a (hidden?) archive sub-command might be better.

> But since the performance is still not quite on par with `gzip`, I would
> actually rather not, and really, just punt on that one, stating that
> people interested in higher performance should use `pigz`.

Here are my performance numbers for generating .tar.gz files again:

master, using gzip(1):
  Time (mean ± σ):     16.683 s ±  0.451 s    [User: 20.230 s, System: 0.375 s]
  Range (min … max):   16.308 s … 17.852 s    10 runs

using zlib sequentially:
  Time (mean ± σ):     19.898 s ±  0.228 s    [User: 19.825 s, System: 0.073 s]
  Range (min … max):   19.627 s … 20.355 s    10 runs

using zlib asynchronously:
  Time (mean ± σ):     17.300 s ±  0.198 s    [User: 20.825 s, System: 0.356 s]
  Range (min … max):   17.042 s … 17.638 s    10 runs

using a gzip-lookalike:
  Time (mean ± σ):     17.256 s ±  0.299 s    [User: 20.380 s, System: 0.294 s]
  Range (min … max):   16.940 s … 17.804 s    10 runs

The last two have comparable system time, ca. 1% more user time and
ca. 5% longer duration.  The second one has much better system time
and 2% less user time and 19% longer duration.  Hmm.

> And who knows, maybe nobody will complain at all about the performance?

Probably.  And popular tarballs would be cached anyway, I guess.

So I'll send comments on your series later this week.

René
Jeff King May 1, 2019, 6:18 p.m. UTC | #13
On Wed, May 01, 2019 at 07:45:05PM +0200, René Scharfe wrote:

> > But since the performance is still not quite on par with `gzip`, I would
> > actually rather not, and really, just punt on that one, stating that
> > people interested in higher performance should use `pigz`.
> 
> Here are my performance numbers for generating .tar.gz files again:
> 
> master, using gzip(1):
>   Time (mean ± σ):     16.683 s ±  0.451 s    [User: 20.230 s, System: 0.375 s]
>   Range (min … max):   16.308 s … 17.852 s    10 runs
> 
> using zlib sequentially:
>   Time (mean ± σ):     19.898 s ±  0.228 s    [User: 19.825 s, System: 0.073 s]
>   Range (min … max):   19.627 s … 20.355 s    10 runs
> 
> using zlib asynchronously:
>   Time (mean ± σ):     17.300 s ±  0.198 s    [User: 20.825 s, System: 0.356 s]
>   Range (min … max):   17.042 s … 17.638 s    10 runs
> 
> using a gzip-lookalike:
>   Time (mean ± σ):     17.256 s ±  0.299 s    [User: 20.380 s, System: 0.294 s]
>   Range (min … max):   16.940 s … 17.804 s    10 runs
> 
> The last two have comparable system time, ca. 1% more user time and
> ca. 5% longer duration.  The second one has much better system time
> and 2% less user time and 19% longer duration.  Hmm.

I think the start_async() one seems like a good option. It reclaims most
of the (wall-clock) performance, isn't very much code, and doesn't leave
any ugly user-visible traces.

I'd be fine to see it come later, though, on top of the patches Dscho is
sending. Even though changing to sequential zlib is technically a change
in behavior, the existing behavior wasn't really planned. And given the
wall-clock versus CPU time tradeoff, it's not entirely clear that one
solution is better than the other.

> > And who knows, maybe nobody will complain at all about the performance?
> 
> Probably.  And popular tarballs would be cached anyway, I guess.

At GitHub we certainly do cache the git-archive output. We'd also be
just fine with the sequential solution. We generally turn down
pack.threads to 1, and keep our CPUs busy by serving multiple users
anyway.

So whatever has the lowest overall CPU time is generally preferable, but
the times are close enough that I don't think we'd care much either way
(and it's probably not worth having a config option or similar).

-Peff
Ævar Arnfjörð Bjarmason May 2, 2019, 8:20 p.m. UTC | #14
On Fri, Apr 26 2019, Johannes Schindelin wrote:

> Hi brian,
>
> On Sat, 13 Apr 2019, brian m. carlson wrote:
>
>> On Fri, Apr 12, 2019 at 09:51:02PM -0400, Jeff King wrote:
>> > I wondered how you were going to kick this in, since users can define
>> > arbitrary filters. I think it's kind of neat to automagically convert
>> > "gzip -cn" (which also happens to be the default). But I think we should
>> > mention that in the Documentation, in case somebody tries to use a
>> > custom version of gzip and wonders why it isn't kicking in.
>> >
>> > Likewise, it might make sense in the tests to put a poison gzip in the
>> > $PATH so that we can be sure we're using our internal code, and not just
>> > calling out to gzip (on platforms that have it, of course).
>> >
>> > The alternative is that we could use a special token like ":zlib" or
>> > something to indicate that the internal implementation should be used
>> > (and then tweak the baked-in default, too). That might be less
>> > surprising for users, but most people would still get the benefit since
>> > they'd be using the default config.
>>
>> I agree that a special value (or NULL, if that's possible) would be
>> nicer here. That way, if someone does specify a custom gzip, we honor
>> it, and it serves to document the code better. For example, if someone
>> symlinked pigz to gzip and used "gzip -cn", then they might not get the
>> parallelization benefits they expected.
>
> I went with `:zlib`. The `NULL` value would not really work, as there is
> no way to specify that via `tar.tgz.command`.
>
> About the symlinked thing: I do not really want to care to support such
> hacks.

It's the standard way by which a lot of systems do this, e.g. on my
Debian box:

    $ find /{,s}bin /usr/{,s}bin -type l -exec file {} \;|grep /etc/alternatives|wc -l
    108

To write this E-Mail I'm invoking one such symlink :)

> If you want a different compressor than the default (which can
> change), you should specify it specifically.

You might want to do so system-wide, or per program.

I don't care about this for gzip myself, just pointing out it *is* a
thing people use.

>> I'm fine overall with the idea of bringing the compression into the
>> binary using zlib, provided that we preserve the "-n" behavior
>> (producing reproducible archives).
>
> Thanks for voicing this concern. I had a look at zlib's source code, and
> it looks like it requires an extra function call (that we don't call) to
> make the resulting file non-reproducible. In other words, it has the
> opposite default behavior from `gzip`.

Just commenting on the overall thread: I like René's "new built-in"
patch best.

You mentioned "new command that we have to support for eternity". I
think calling it "git gzip" is a bad idea. We'd make it "git
archive--gzip" or "git archive--helper", and we could hide building it
behind some compat flag.

Then we'd carry no if/else internal/external code, and the portability
issue that started this would be addressed, no?

As a bonus we could also drop the "GZIP" prereq from the test suite
entirely and just put that "gzip" in $PATH for the purposes of the
tests.

I spied on your yet-to-be-submitted patches and you could drop GZIP from
the "git archive" tests; we'd still need it in
t/t5562-http-backend-content-length.sh, though not if we had a "gzip"
compat helper.

There's also a long-standing bug/misfeature in git-archive that I wonder
about: when you combine --format with --remote, you can only generate
e.g. tar.gz if the remote is OK with it; if it says no, you can't, even
if it supports "tar" and you could do the "gz" part locally. Would such a
patch be harder with :zlib than if we always just spewed out to external
"gzip" after satisfying some criteria?
Johannes Schindelin May 3, 2019, 8:49 p.m. UTC | #15
Hi Ævar,

On Thu, 2 May 2019, Ævar Arnfjörð Bjarmason wrote:

> On Fri, Apr 26 2019, Johannes Schindelin wrote:
>
> > On Sat, 13 Apr 2019, brian m. carlson wrote:
> >
> >> On Fri, Apr 12, 2019 at 09:51:02PM -0400, Jeff King wrote:
> >> > I wondered how you were going to kick this in, since users can
> >> > define arbitrary filters. I think it's kind of neat to
> >> > automagically convert "gzip -cn" (which also happens to be the
> >> > default). But I think we should mention that in the Documentation,
> >> > in case somebody tries to use a custom version of gzip and wonders
> >> > why it isn't kicking in.
> >> >
> >> > Likewise, it might make sense in the tests to put a poison gzip in
> >> > the $PATH so that we can be sure we're using our internal code, and
> >> > not just calling out to gzip (on platforms that have it, of
> >> > course).
> >> >
> >> > The alternative is that we could use a special token like ":zlib"
> >> > or something to indicate that the internal implementation should be
> >> > used (and then tweak the baked-in default, too). That might be less
> >> > surprising for users, but most people would still get the benefit
> >> > since they'd be using the default config.
> >>
> >> I agree that a special value (or NULL, if that's possible) would be
> >> nicer here. That way, if someone does specify a custom gzip, we honor
> >> it, and it serves to document the code better. For example, if
> >> someone symlinked pigz to gzip and used "gzip -cn", then they might
> >> not get the parallelization benefits they expected.
> >
> > I went with `:zlib`. The `NULL` value would not really work, as there
> > is no way to specify that via `tar.tgz.command`.
> >
> > About the symlinked thing: I do not really want to care to support
> > such hacks.
>
> It's the standard way by which a lot of systems do this, e.g. on my
> Debian box:
>
>     $ find /{,s}bin /usr/{,s}bin -type l -exec file {} \;|grep /etc/alternatives|wc -l
>     108
>
> To write this E-Mail I'm invoking one such symlink :)

I am well aware of the way Debian-based systems handle alternatives, and I
myself also use something similar to write this E-Mail (but it is not a
symlink, it is a Git alias).

But that's not the hack that I was talking about.

The hack I meant was: if you symlink `gzip` to `pigz` in your `PATH` *and
then expect `git archive --format=tgz` to pick that up*.

As far as I am concerned, the fact that `git archive --format=tgz` spawns
`gzip` to perform the compression is an implementation detail, and not
something that users should feel they can rely on.

> > If you want a different compressor than the default (which can
> > change), you should specify it specifically.
>
> You might want to do so system-wide, or for each program at a time.
>
> I don't care about this for gzip myself, just pointing out it *is* a
> thing people use.

Sure.

> >> I'm fine overall with the idea of bringing the compression into the
> >> binary using zlib, provided that we preserve the "-n" behavior
> >> (producing reproducible archives).
> >
> > Thanks for voicing this concern. I had a look at zlib's source code,
> > and it looks like it requires an extra function call (that we don't
> > call) to make the resulting file non-reproducible. In other words, it
> > has the opposite default behavior from `gzip`.
>
> Just commenting on the overall thread: I like René's "new built-in"
> patch best.

I guess we now have two diverging votes: yours for the `git archive --gzip`
"built-in" and Peff's for the async code ;-)

> You mentioned "new command that we have to support for eternity". I
> think calling it "git gzip" is a bad idea. We'd make it "git
> archive--gzip" or "git archive--helper", and we could hide building it
> behind some compat flag.
>
> Then we'd carry no if/else internal/external code, and the portability
> issue that started this would be addressed, no?

Sure.

The async version would leave the door wide open for implementing pigz'
trick to multi-thread the compression, though.

> As a bonus we could also drop the "GZIP" prereq from the test suite
> entirely and just put that "gzip" in $PATH for the purposes of the
> tests.
>
> I spied on your yet-to-be-submitted patches and you could drop GZIP from
> the "git archive" tests, but we'd still need it in
> t/t5562-http-backend-content-length.sh, but not if we had a "gzip"
> compat helper.

We need it at least once for *decompressing* the `--format=tgz` output in
order to compare it to the `--format=tar` output. Besides, I think it is
really important to keep the test that verifies that the output is correct
(i.e. that gzip can decompress it).

> There's also a long-standing bug/misfeature in git-archive that I wonder
> about: When you combine --format with --remote you can only generate
> e.g. tar.gz if the remote is OK with it, if it says no you can't even if
> it supports "tar" and you could do the "gz" part locally. Would such a
> patch be harder with :zlib than if we always just spewed out to external
> "gzip" after satisfying some criteria?

I think it would be precisely the same: you'd still use the same "filter"
code path.

Ciao,
Dscho
Jeff King May 3, 2019, 8:52 p.m. UTC | #16
On Fri, May 03, 2019 at 10:49:17PM +0200, Johannes Schindelin wrote:

> I am well aware of the way Debian-based systems handle alternatives, and I
> myself also use something similar to write this E-Mail (but it is not a
> symlink, it is a Git alias).
> 
> But that's not the hack that I was talking about.
> 
> The hack I meant was: if you symlink `gzip` to `pigz` in your `PATH` *and
> then expect `git archive --format=tgz` to pick that up*.
> 
> As far as I am concerned, the fact that `git archive --format=tgz` spawns
> `gzip` to perform the compression is an implementation detail, and not
> something that users should feel they can rely on.

I'd agree with you more if we didn't document a user-facing config
variable that claims to run "gzip" from the system.

> > Just commenting on the overall thread: I like René's "new built-in"
> > patch best.
> 
> I guess we now have to diverging votes: yours for the `git archive --gzip`
> "built-in" and Peff's for the async code ;-)

For the record, I am fine with any of the solutions (including just
doing the single-thread bit you already have and letting René do what he
likes on top).

-Peff
René Scharfe June 10, 2019, 10:44 a.m. UTC | #17
On 01.05.19 at 20:18, Jeff King wrote:
> On Wed, May 01, 2019 at 07:45:05PM +0200, René Scharfe wrote:
>
>>> But since the performance is still not quite on par with `gzip`, I would
>>> actually rather not, and really, just punt on that one, stating that
>>> people interested in higher performance should use `pigz`.
>>
>> Here are my performance numbers for generating .tar.gz files again:

OK, tried one more version, with pthreads (patch at the end).  Also
redid all measurements for better comparability; everything is faster
now for some reason (perhaps due to a compiler update? clang version
7.0.1-8 now):

master, using gzip(1):
Benchmark #1: git archive --format=tgz HEAD
  Time (mean ± σ):     15.697 s ±  0.246 s    [User: 19.213 s, System: 0.386 s]
  Range (min … max):   15.405 s … 16.103 s    10 runs

using zlib sequentially:
Benchmark #1: git archive --format=tgz HEAD
  Time (mean ± σ):     19.191 s ±  0.408 s    [User: 19.091 s, System: 0.100 s]
  Range (min … max):   18.802 s … 19.877 s    10 runs

using a gzip-lookalike:
Benchmark #1: git archive --format=tgz HEAD
  Time (mean ± σ):     16.289 s ±  0.218 s    [User: 19.485 s, System: 0.337 s]
  Range (min … max):   16.020 s … 16.555 s    10 runs

using zlib with start_async:
Benchmark #1: git archive --format=tgz HEAD
  Time (mean ± σ):     16.516 s ±  0.334 s    [User: 20.282 s, System: 0.383 s]
  Range (min … max):   16.166 s … 17.283 s    10 runs

using zlib in a separate thread (that's the new one):
Benchmark #1: git archive --format=tgz HEAD
  Time (mean ± σ):     16.310 s ±  0.237 s    [User: 20.075 s, System: 0.173 s]
  Range (min … max):   15.983 s … 16.790 s    10 runs

> I think the start_async() one seems like a good option. It reclaims most
> of the (wall-clock) performance, isn't very much code, and doesn't leave
> any ugly user-visible traces.

The pthreads numbers look a bit better still.  The patch is huge though,
because it duplicates almost everything.  It was easier that way; a real
patch series would extract functions that can be used both with static
and allocated headers first, and keep everything in archive-tar.c.

> I'd be fine to see it come later, though, on top of the patches Dscho is
> sending. Even though changing to sequential zlib is technically a change
> in behavior, the existing behavior wasn't really planned. And given the
> wall-clock versus CPU time tradeoff, it's not entirely clear that one
> solution is better than the other.

The current behavior is not an accident; the synchronous method was
rejected in 2009 because it was slower [1].  Redid the measurements
with v1.6.5-rc0 and the old patch [2], but they would only compile with
gcc (Debian 8.3.0-6) for me, so it's not directly comparable to the
numbers above:

v1.6.5-rc0:
Benchmark #1: ../git/git-archive HEAD | gzip
  Time (mean ± σ):     16.051 s ±  0.486 s    [User: 19.514 s, System: 0.341 s]
  Range (min … max):   15.416 s … 17.001 s    10 runs

v1.6.5-rc0 + [2]:
Benchmark #1: ../git/git-archive --format=tar.gz HEAD
  Time (mean ± σ):     19.684 s ±  0.374 s    [User: 19.601 s, System: 0.060 s]
  Range (min … max):   19.082 s … 20.177 s    10 runs

User time is still slightly higher, but the difference is in the noise.

[1] http://public-inbox.org/git/4AAAC8CE.8020302@lsrfire.ath.cx/
[2] http://public-inbox.org/git/4AA97B61.6030301@lsrfire.ath.cx/

>>> And who knows, maybe nobody will complain at all about the performance?
>>
>> Probably.  And popular tarballs would be cached anyway, I guess.
>
> At GitHub we certainly do cache the git-archive output. We'd also be
> just fine with the sequential solution. We generally turn down
> pack.threads to 1, and keep our CPUs busy by serving multiple users
> anyway.
>
> So whatever has the lowest overall CPU time is generally preferable, but
> the times are close enough that I don't think we'd care much either way
> (and it's probably not worth having a config option or similar).

Moving back to 2009 and reducing the number of utilized cores both feel
weird, but the sequential solution *is* the most obvious, easiest and
(by a narrow margin) lightest one if gzip(1) is not an option anymore.

Anyway, the threading patch:

---
 Makefile      |   1 +
 archive-tar.c |  11 +-
 archive-tgz.c | 452 ++++++++++++++++++++++++++++++++++++++++++++++++++
 archive.h     |   4 +
 4 files changed, 465 insertions(+), 3 deletions(-)
 create mode 100644 archive-tgz.c

diff --git a/Makefile b/Makefile
index 8a7e235352..ed649ac18d 100644
--- a/Makefile
+++ b/Makefile
@@ -834,6 +834,7 @@ LIB_OBJS += alloc.o
 LIB_OBJS += apply.o
 LIB_OBJS += archive.o
 LIB_OBJS += archive-tar.o
+LIB_OBJS += archive-tgz.o
 LIB_OBJS += archive-zip.o
 LIB_OBJS += argv-array.o
 LIB_OBJS += attr.o
diff --git a/archive-tar.c b/archive-tar.c
index 3e53aac1e6..929eb58235 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -15,7 +15,9 @@
 static char block[BLOCKSIZE];
 static unsigned long offset;

-static int tar_umask = 002;
+int tar_umask = 002;
+
+static const char internal_gzip[] = "git archive gzip";

 static int write_tar_filter_archive(const struct archiver *ar,
 				    struct archiver_args *args);
@@ -445,6 +447,9 @@ static int write_tar_filter_archive(const struct archiver *ar,
 	if (!ar->data)
 		BUG("tar-filter archiver called with no filter defined");

+	if (!strcmp(ar->data, internal_gzip))
+		return write_tgz_archive(ar, args);
+
 	strbuf_addstr(&cmd, ar->data);
 	if (args->compression_level >= 0)
 		strbuf_addf(&cmd, " -%d", args->compression_level);
@@ -483,9 +488,9 @@ void init_tar_archiver(void)
 	int i;
 	register_archiver(&tar_archiver);

-	tar_filter_config("tar.tgz.command", "gzip -cn", NULL);
+	tar_filter_config("tar.tgz.command", internal_gzip, NULL);
 	tar_filter_config("tar.tgz.remote", "true", NULL);
-	tar_filter_config("tar.tar.gz.command", "gzip -cn", NULL);
+	tar_filter_config("tar.tar.gz.command", internal_gzip, NULL);
 	tar_filter_config("tar.tar.gz.remote", "true", NULL);
 	git_config(git_tar_config, NULL);
 	for (i = 0; i < nr_tar_filters; i++) {
diff --git a/archive-tgz.c b/archive-tgz.c
new file mode 100644
index 0000000000..ae219e1cc0
--- /dev/null
+++ b/archive-tgz.c
@@ -0,0 +1,452 @@
+#include "cache.h"
+#include "config.h"
+#include "tar.h"
+#include "archive.h"
+#include "object-store.h"
+#include "streaming.h"
+
+#define RECORDSIZE	(512)
+#define BLOCKSIZE	(RECORDSIZE * 20)
+
+static gzFile gzip;
+static size_t offset;
+
+/*
+ * This is the max value that a ustar size header can specify, as it is fixed
+ * at 11 octal digits. POSIX specifies that we switch to extended headers at
+ * this size.
+ *
+ * Likewise for the mtime (which happens to use a buffer of the same size).
+ */
+#if ULONG_MAX == 0xFFFFFFFF
+#define USTAR_MAX_SIZE ULONG_MAX
+#else
+#define USTAR_MAX_SIZE 077777777777UL
+#endif
+#if TIME_MAX == 0xFFFFFFFF
+#define USTAR_MAX_MTIME TIME_MAX
+#else
+#define USTAR_MAX_MTIME 077777777777ULL
+#endif
+
+static void tgz_write(const void *data, size_t size)
+{
+	const char *p = data;
+	while (size) {
+		size_t to_write = size;
+		if (to_write > UINT_MAX)
+			to_write = UINT_MAX;
+		if (gzwrite(gzip, p, to_write) != to_write)
+			die(_("gzwrite failed"));
+		p += to_write;
+		size -= to_write;
+		offset = (offset + to_write) % BLOCKSIZE;
+	}
+}
+
+static void tgz_finish_record(void)
+{
+	size_t tail = offset % RECORDSIZE;
+	if (tail) {
+		size_t to_seek = RECORDSIZE - tail;
+		if (gzseek(gzip, to_seek, SEEK_CUR) < 0)
+			die(_("gzseek failed"));
+		offset = (offset + to_seek) % BLOCKSIZE;
+	}
+}
+
+static void tgz_write_trailer(void)
+{
+	size_t to_seek = BLOCKSIZE - offset;
+	if (to_seek < 2 * RECORDSIZE)
+		to_seek += BLOCKSIZE;
+	if (gzseek(gzip, to_seek, SEEK_CUR) < 0)
+		 die(_("gzseek failed"));
+	if (gzflush(gzip, Z_FINISH) != Z_OK)
+		die(_("gzflush failed"));
+}
+
+struct work_item {
+	void *buffer;
+	size_t size;
+	int finish_record;
+};
+
+#define TODO_SIZE 64
+struct work_item todo[TODO_SIZE];
+static int todo_start;
+static int todo_end;
+static int todo_done;
+static int all_work_added;
+static pthread_mutex_t tar_mutex;
+static pthread_t thread;
+
+static void tar_lock(void)
+{
+	pthread_mutex_lock(&tar_mutex);
+}
+
+static void tar_unlock(void)
+{
+	pthread_mutex_unlock(&tar_mutex);
+}
+
+static pthread_cond_t cond_add;
+static pthread_cond_t cond_write;
+static pthread_cond_t cond_result;
+
+static void add_work(void *buffer, size_t size, int finish_record)
+{
+	tar_lock();
+
+	while ((todo_end + 1) % ARRAY_SIZE(todo) == todo_done)
+		pthread_cond_wait(&cond_write, &tar_mutex);
+
+	todo[todo_end].buffer = buffer;
+	todo[todo_end].size = size;
+	todo[todo_end].finish_record = finish_record;
+
+	todo_end = (todo_end + 1) % ARRAY_SIZE(todo);
+
+	pthread_cond_signal(&cond_add);
+	tar_unlock();
+}
+
+static struct work_item *get_work(void)
+{
+	struct work_item *ret = NULL;
+
+	tar_lock();
+	while (todo_start == todo_end && !all_work_added)
+		pthread_cond_wait(&cond_add, &tar_mutex);
+
+	if (todo_start != todo_end || !all_work_added) {
+		ret = &todo[todo_start];
+		todo_start = (todo_start + 1) % ARRAY_SIZE(todo);
+	}
+	tar_unlock();
+	return ret;
+}
+
+static void work_done(void)
+{
+	tar_lock();
+	todo_done = (todo_done + 1) % ARRAY_SIZE(todo);
+	pthread_cond_signal(&cond_write);
+
+	if (all_work_added && todo_done == todo_end)
+		pthread_cond_signal(&cond_result);
+	tar_unlock();
+}
+
+static void *run(void *arg)
+{
+	for (;;) {
+		struct work_item *w = get_work();
+		if (!w)
+			break;
+		tgz_write(w->buffer, w->size);
+		free(w->buffer);
+		if (w->finish_record)
+			tgz_finish_record();
+		work_done();
+	}
+	return NULL;
+}
+
+static void start_output_thread(void)
+{
+	int err;
+
+	pthread_mutex_init(&tar_mutex, NULL);
+	pthread_cond_init(&cond_add, NULL);
+	pthread_cond_init(&cond_write, NULL);
+	pthread_cond_init(&cond_result, NULL);
+
+	memset(todo, 0, sizeof(todo));
+
+	err = pthread_create(&thread, NULL, run, NULL);
+	if (err)
+		die(_("failed to create thread: %s"), strerror(err));
+}
+
+static void wait_for_output_thread(void)
+{
+	tar_lock();
+	all_work_added = 1;
+
+	while (todo_done != todo_end)
+		pthread_cond_wait(&cond_result, &tar_mutex);
+
+	pthread_cond_broadcast(&cond_add);
+	tar_unlock();
+
+	pthread_join(thread, NULL);
+
+	pthread_mutex_destroy(&tar_mutex);
+	pthread_cond_destroy(&cond_add);
+	pthread_cond_destroy(&cond_write);
+	pthread_cond_destroy(&cond_result);
+}
+
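+/*
+ * Stream a blob that we chose not to load into memory whole, handing
+ * it to the output thread in BLOCKSIZE * 10 chunks.
+ */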
+static int stream_blob(const struct object_id *oid)
+{
+	struct git_istream *st;
+	enum object_type type;
+	unsigned long sz;
+	ssize_t readlen;
+	size_t chunk_size = BLOCKSIZE * 10;
+
+	st = open_istream(oid, &type, &sz, NULL);
+	if (!st)
+		return error(_("cannot stream blob %s"), oid_to_hex(oid));
+	for (;;) {
+		char *buf = xmalloc(chunk_size);
+		readlen = read_istream(st, buf, chunk_size);
+		if (readlen <= 0) {
+			free(buf);
+			break;
+		}
+		sz -= readlen;
+		add_work(buf, readlen, !sz);
+	}
+	close_istream(st);
+	return readlen;
+}
+
+/*
+ * pax extended header records have the format "%u %s=%s\n".  %u contains
+ * the size of the whole string (including the %u), the first %s is the
+ * keyword, the second one is the value.  This function constructs such a
+ * string and appends it to a struct strbuf.
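+ * For example, keyword "path" with value "foo" becomes "12 path=foo\n",
+ * where 12 is the length of the whole record, its own digits included.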
+ */
+static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword,
+				     const char *value, unsigned int valuelen)
+{
+	int len, tmp;
+
+	/* "%u %s=%s\n" */
+	len = 1 + 1 + strlen(keyword) + 1 + valuelen + 1;
+	for (tmp = len; tmp > 9; tmp /= 10)
+		len++;
+
+	strbuf_grow(sb, len);
+	strbuf_addf(sb, "%u %s=", len, keyword);
+	strbuf_add(sb, value, valuelen);
+	strbuf_addch(sb, '\n');
+}
+
+/*
+ * Like strbuf_append_ext_header, but for numeric values.
+ */
+static void strbuf_append_ext_header_uint(struct strbuf *sb,
+					  const char *keyword,
+					  uintmax_t value)
+{
+	char buf[40]; /* big enough for 2^128 in decimal, plus NUL */
+	int len;
+
+	len = xsnprintf(buf, sizeof(buf), "%"PRIuMAX, value);
+	strbuf_append_ext_header(sb, keyword, buf, len);
+}
+
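+/*
+ * The ustar checksum is the byte sum of the whole header, with the
+ * chksum field itself counted as eight spaces.
+ */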
+static unsigned int ustar_header_chksum(const struct ustar_header *header)
+{
+	const unsigned char *p = (const unsigned char *)header;
+	unsigned int chksum = 0;
+	while (p < (const unsigned char *)header->chksum)
+		chksum += *p++;
+	chksum += sizeof(header->chksum) * ' ';
+	p += sizeof(header->chksum);
+	while (p < (const unsigned char *)header + sizeof(struct ustar_header))
+		chksum += *p++;
+	return chksum;
+}
+
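+/*
+ * Find a '/' to split the path at, such that the tail fits into
+ * header->name and the head into header->prefix; returns the length
+ * of the prefix.
+ */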
+static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen)
+{
+	size_t i = pathlen;
+	if (i > 1 && path[i - 1] == '/')
+		i--;
+	if (i > maxlen)
+		i = maxlen;
+	do {
+		i--;
+	} while (i > 0 && path[i] != '/');
+	return i;
+}
+
+static void prepare_header(struct archiver_args *args,
+			   struct ustar_header *header,
+			   unsigned int mode, unsigned long size)
+{
+	xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 07777);
+	xsnprintf(header->size, sizeof(header->size), "%011"PRIoMAX,
+		  S_ISREG(mode) ? (uintmax_t)size : (uintmax_t)0);
+	xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) args->time);
+
+	xsnprintf(header->uid, sizeof(header->uid), "%07o", 0);
+	xsnprintf(header->gid, sizeof(header->gid), "%07o", 0);
+	strlcpy(header->uname, "root", sizeof(header->uname));
+	strlcpy(header->gname, "root", sizeof(header->gname));
+	xsnprintf(header->devmajor, sizeof(header->devmajor), "%07o", 0);
+	xsnprintf(header->devminor, sizeof(header->devminor), "%07o", 0);
+
+	memcpy(header->magic, "ustar", 6);
+	memcpy(header->version, "00", 2);
+
+	xsnprintf(header->chksum, sizeof(header->chksum), "%07o", ustar_header_chksum(header));
+}
+
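+/*
+ * Extended header records are stored as a tar entry of their own: a
+ * header with typeflag 'x' followed by the records as file data.
+ */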
+static void write_extended_header(struct archiver_args *args,
+				  const struct object_id *oid,
+				  struct strbuf *extended_header)
+{
+	size_t size;
+	char *buffer = strbuf_detach(extended_header, &size);
+	struct ustar_header *header = xcalloc(1, sizeof(*header));
+	unsigned int mode;
+	*header->typeflag = TYPEFLAG_EXT_HEADER;
+	mode = 0100666;
+	xsnprintf(header->name, sizeof(header->name), "%s.paxheader",
+		  oid_to_hex(oid));
+	prepare_header(args, header, mode, size);
+	add_work(header, sizeof(*header), 1);
+	add_work(buffer, size, 1);
+}
+
+static int write_tar_entry(struct archiver_args *args,
+			   const struct object_id *oid,
+			   const char *path, size_t pathlen,
+			   unsigned int mode)
+{
+	struct ustar_header *header = xcalloc(1, sizeof(*header));
+	struct strbuf ext_header = STRBUF_INIT;
+	unsigned int old_mode = mode;
+	unsigned long size, size_in_header;
+	void *buffer;
+	int err = 0;
+
+	if (S_ISDIR(mode) || S_ISGITLINK(mode)) {
+		*header->typeflag = TYPEFLAG_DIR;
+		mode = (mode | 0777) & ~tar_umask;
+	} else if (S_ISLNK(mode)) {
+		*header->typeflag = TYPEFLAG_LNK;
+		mode |= 0777;
+	} else if (S_ISREG(mode)) {
+		*header->typeflag = TYPEFLAG_REG;
+		mode = (mode | ((mode & 0100) ? 0777 : 0666)) & ~tar_umask;
+	} else {
+		return error(_("unsupported file mode: 0%o (SHA1: %s)"),
+			     mode, oid_to_hex(oid));
+	}
+	if (pathlen > sizeof(header->name)) {
+		size_t plen = get_path_prefix(path, pathlen,
+					      sizeof(header->prefix));
+		size_t rest = pathlen - plen - 1;
+		if (plen > 0 && rest <= sizeof(header->name)) {
+			memcpy(header->prefix, path, plen);
+			memcpy(header->name, path + plen + 1, rest);
+		} else {
+			xsnprintf(header->name, sizeof(header->name), "%s.data",
+				  oid_to_hex(oid));
+			strbuf_append_ext_header(&ext_header, "path",
+						 path, pathlen);
+		}
+	} else
+		memcpy(header->name, path, pathlen);
+
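+	/* defer blobs over big_file_threshold to streaming instead of loading them */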
+	if (S_ISREG(mode) && !args->convert &&
+	    oid_object_info(args->repo, oid, &size) == OBJ_BLOB &&
+	    size > big_file_threshold)
+		buffer = NULL;
+	else if (S_ISLNK(mode) || S_ISREG(mode)) {
+		enum object_type type;
+		buffer = object_file_to_archive(args, path, oid, old_mode, &type, &size);
+		if (!buffer)
+			return error(_("cannot read %s"), oid_to_hex(oid));
+	} else {
+		buffer = NULL;
+		size = 0;
+	}
+
+	if (S_ISLNK(mode)) {
+		if (size > sizeof(header->linkname)) {
+			xsnprintf(header->linkname, sizeof(header->linkname),
+				  "see %s.paxheader", oid_to_hex(oid));
+			strbuf_append_ext_header(&ext_header, "linkpath",
+						 buffer, size);
+		} else
+			memcpy(header->linkname, buffer, size);
+	}
+
+	size_in_header = size;
+	if (S_ISREG(mode) && size > USTAR_MAX_SIZE) {
+		size_in_header = 0;
+		strbuf_append_ext_header_uint(&ext_header, "size", size);
+	}
+
+	prepare_header(args, header, mode, size_in_header);
+
+	if (ext_header.len > 0) {
+		write_extended_header(args, oid, &ext_header);
+	}
+	add_work(header, sizeof(*header), 1);
+	if (S_ISREG(mode) && size > 0) {
+		if (buffer)
+			add_work(buffer, size, 1);
+		else
+			err = stream_blob(oid);
+	} else
+		free(buffer);
+	return err;
+}
+
+static void write_global_extended_header(struct archiver_args *args)
+{
+	const struct object_id *oid = args->commit_oid;
+	struct strbuf ext_header = STRBUF_INIT;
+	struct ustar_header *header = xcalloc(1, sizeof(*header));
+	unsigned int mode;
+	size_t size;
+	char *buffer;
+
+	if (oid)
+		strbuf_append_ext_header(&ext_header, "comment",
+					 oid_to_hex(oid),
+					 the_hash_algo->hexsz);
+	if (args->time > USTAR_MAX_MTIME) {
+		strbuf_append_ext_header_uint(&ext_header, "mtime",
+					      args->time);
+		args->time = USTAR_MAX_MTIME;
+	}
+
+	if (!ext_header.len)
+		return;
+
+	buffer = strbuf_detach(&ext_header, &size);
+	*header->typeflag = TYPEFLAG_GLOBAL_HEADER;
+	mode = 0100666;
+	xsnprintf(header->name, sizeof(header->name), "pax_global_header");
+	prepare_header(args, header, mode, size);
+	add_work(header, sizeof(*header), 1);
+	add_work(buffer, size, 1);
+}
+
+int write_tgz_archive(const struct archiver *ar, struct archiver_args *args)
+{
+	int level = args->compression_level;
+	int err = 0;
+
+	gzip = gzdopen(1, "wb");
+	if (!gzip)
+		return error(_("gzdopen failed"));
+	if (gzsetparams(gzip, level, Z_DEFAULT_STRATEGY) != Z_OK)
+		return error(_("unable to set compression level %d"), level);
+
+	start_output_thread();
+	write_global_extended_header(args);
+	err = write_archive_entries(args, write_tar_entry);
+	if (err)
+		return err;
+	wait_for_output_thread();
+	tgz_write_trailer();
+	return err;
+}
diff --git a/archive.h b/archive.h
index e60e3dd31c..b00afa1a9f 100644
--- a/archive.h
+++ b/archive.h
@@ -45,6 +45,10 @@ void init_tar_archiver(void);
 void init_zip_archiver(void);
 void init_archivers(void);

+extern int tar_umask;
+
+int write_tgz_archive(const struct archiver *ar, struct archiver_args *args);
+
 typedef int (*write_archive_entry_fn_t)(struct archiver_args *args,
 					const struct object_id *oid,
 					const char *path, size_t pathlen,
--
2.22.0
Jeff King June 13, 2019, 7:16 p.m. UTC | #18
On Mon, Jun 10, 2019 at 12:44:54PM +0200, René Scharfe wrote:

> On 01.05.19 at 20:18, Jeff King wrote:
> > On Wed, May 01, 2019 at 07:45:05PM +0200, René Scharfe wrote:
> >
> >>> But since the performance is still not quite on par with `gzip`, I would
> >>> actually rather not, and really, just punt on that one, stating that
> >>> people interested in higher performance should use `pigz`.
> >>
> >> Here are my performance numbers for generating .tar.gz files again:
> 
> OK, tried one more version, with pthreads (patch at the end).  Also
> redid all measurements for better comparability; everything is faster
> now for some reason (perhaps due to a compiler update? clang version
> 7.0.1-8 now):

Hmm. Interesting that using pthreads is still slower than just shelling
out to gzip:

> master, using gzip(1):
> Benchmark #1: git archive --format=tgz HEAD
>   Time (mean ± σ):     15.697 s ±  0.246 s    [User: 19.213 s, System: 0.386 s]
>   Range (min … max):   15.405 s … 16.103 s    10 runs
> [...]
> using zlib in a separate thread (that's the new one):
> Benchmark #1: git archive --format=tgz HEAD
>   Time (mean ± σ):     16.310 s ±  0.237 s    [User: 20.075 s, System: 0.173 s]
>   Range (min … max):   15.983 s … 16.790 s    10 runs

I wonder if zlib is just slower. Or if the cost of context switching
is somehow higher than just dumping big chunks over a pipe. In
particular, our gzip-alike is still faster than pthreads:

> using a gzip-lookalike:
> Benchmark #1: git archive --format=tgz HEAD
>   Time (mean ± σ):     16.289 s ±  0.218 s    [User: 19.485 s, System: 0.337 s]
>   Range (min … max):   16.020 s … 16.555 s    10 runs

though it looks like the timings do overlap.

> > At GitHub we certainly do cache the git-archive output. We'd also be
> > just fine with the sequential solution. We generally turn down
> > pack.threads to 1, and keep our CPUs busy by serving multiple users
> > anyway.
> >
> > So whatever has the lowest overall CPU time is generally preferable, but
> > the times are close enough that I don't think we'd care much either way
> > (and it's probably not worth having a config option or similar).
> 
> Moving back to 2009 and reducing the number of utilized cores both feel
> weird, but the sequential solution *is* the most obvious, easiest and
> (by a narrow margin) lightest one if gzip(1) is not an option anymore.

It sounds like we resolved to give the "internal gzip" its own name
(whether that is a gzip-alike command, or a special token we recognize
to trigger the internal code). So maybe we could continue to default to
"gzip -cn", and platforms where shipping gzip is a pain could do
otherwise (e.g. Windows, but also anybody else who wants to set
NO_EXTERNAL_GZIP or detect it from autoconf).
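
Something like this in init_tar_archiver() is what I have in mind (just
a sketch, reusing the internal_gzip token from the patch above; the
NO_EXTERNAL_GZIP knob does not exist yet):

  #ifdef NO_EXTERNAL_GZIP
  	/* no gzip(1) to rely on; compress via the linked-in zlib */
  	tar_filter_config("tar.tgz.command", internal_gzip, NULL);
  	tar_filter_config("tar.tar.gz.command", internal_gzip, NULL);
  #else
  	tar_filter_config("tar.tgz.command", "gzip -cn", NULL);
  	tar_filter_config("tar.tar.gz.command", "gzip -cn", NULL);
  #endif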

-Peff
diff mbox series

Patch

diff --git a/archive-tar.c b/archive-tar.c
index ba37dad27c..5979ed14b7 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -466,18 +466,34 @@ static int write_tar_filter_archive(const struct archiver *ar,
 	filter.use_shell = 1;
 	filter.in = -1;
 
-	if (start_command(&filter) < 0)
-		die_errno(_("unable to start '%s' filter"), argv[0]);
-	close(1);
-	if (dup2(filter.in, 1) < 0)
-		die_errno(_("unable to redirect descriptor"));
-	close(filter.in);
+	if (!strcmp("gzip -cn", ar->data)) {
+		char outmode[4] = "wb\0";
+
+		if (args->compression_level >= 0 && args->compression_level <= 9)
+			outmode[2] = '0' + args->compression_level;
+
+		gzip = gzdopen(fileno(stdout), outmode);
+		if (!gzip)
+			die(_("could not gzdopen stdout"));
+	} else {
+		if (start_command(&filter) < 0)
+			die_errno(_("unable to start '%s' filter"), argv[0]);
+		close(1);
+		if (dup2(filter.in, 1) < 0)
+			die_errno(_("unable to redirect descriptor"));
+		close(filter.in);
+	}
 
 	r = write_tar_archive(ar, args);
 
-	close(1);
-	if (finish_command(&filter) != 0)
-		die(_("'%s' filter reported error"), argv[0]);
+	if (gzip) {
+		if (gzclose(gzip) != Z_OK)
+			die(_("gzclose failed"));
+	} else {
+		close(1);
+		if (finish_command(&filter) != 0)
+			die(_("'%s' filter reported error"), argv[0]);
+	}
 
 	strbuf_release(&cmd);
 	return r;