diff mbox series

[1/1] upload-pack.c: fix filter spec quoting bug

Message ID 20210122142137.21161-2-jacob@gitlab.com (mailing list archive)
State Superseded
Headers show
Series upload-pack.c: fix filter spec quoting bug | expand

Commit Message

Jacob Vosmaer Jan. 22, 2021, 2:21 p.m. UTC
This fixes a bug that occurs when you combine partial clone and
uploadpack.packobjectshook. You can reproduce it as follows:

git clone -u 'git -c uploadpack.allowfilter '\
'-c uploadpack.packobjectshook=" exec" '\
'upload-pack' --filter=blob:none --no-local \
src.git dst.git

Be careful with the line endings because this has a long quoted string
as the -u argument. Note that there is an intentional space before
'exec'. Without that space, run-command.c tries to be smart and the
command fails for the wrong reason.

The error I get when I run this is:

Cloning into '/tmp/broken'...
remote: fatal: invalid filter-spec ''blob:none''
error: git upload-pack: git-pack-objects died with error.
fatal: git upload-pack: aborting due to possible repository corruption on the remote side.
remote: aborting due to possible repository corruption on the remote side.
fatal: early EOF
fatal: index-pack failed

The problem is an unnecessary and harmful layer of quoting. I tried
digging through the history of this function and I think this quoting
was there from the start. My best guess is that it stems from a
misunderstanding of what use_shell=1 means. The code seems to assume
it means "arguments get joined into one big string, then fed to
/bin/sh". But that is not what it means: use_shell=1 means that the
first argument in the arguments array may be a shell script and if so
should be passed to /bin/sh. All other arguments are passed as normal
arguments.

The solution is simple: never quote the filter spec.
---
 upload-pack.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

Comments

Jeff King Jan. 22, 2021, 8:32 p.m. UTC | #1
On Fri, Jan 22, 2021 at 03:21:37PM +0100, Jacob Vosmaer wrote:

> This fixes a bug that occurs when you combine partial clone and
> uploadpack.packobjectshook. You can reproduce it as follows:
> 
> git clone -u 'git -c uploadpack.allowfilter '\
> '-c uploadpack.packobjectshook=" exec" '\
> 'upload-pack' --filter=blob:none --no-local \
> src.git dst.git
> 
> Be careful with the line endings because this has a long quoted string
> as the -u argument. Note that there is an intentional space before
> 'exec'. Without that space, run-command.c tries to be smart and the
> command fails for the wrong reason.

The "-u" command is run with a shell, so:

  git clone \
    -u 'git -c uploadpack.allowfilter \
            -c uploadpack.packobjectshook=env \
	    upload-pack' \
  --filter=blob:none --no-local src.git dst.git

may be a more readable version. I also found the use of " exec" clever,
but rather subtle; you need the extra space so that our "don't bother
using a shell" run-command optimization does not kick in. I replaced it
with "env" here, which is a slightly more canonical way of running a
sub-program that does not rely on shell builtins.

But all of this should be added as a new test, probably in t5544 with
the other pack-objects hook tests.

> The problem is an unnecessary and harmful layer of quoting. I tried
> digging through the history of this function and I think this quoting
> was there from the start. My best guess is that it stems from a
> misunderstanding of what use_shell=1 means. The code seems to assume
> it means "arguments get joined into one big string, then fed to
> /bin/sh". But that is not what it means: use_shell=1 means that the
> first argument in the arguments array may be a shell script and if so
> should be passed to /bin/sh. All other arguments are passed as normal
> arguments.

Yeah, that is exactly right. "use_shell" just means that the command is
(possibly) run with a shell. Quoting for any extra arguments is handled
automatically.

I think you're correct that this was broken from the start in 10ac85c785
(upload-pack: add object filtering for partial clone, 2017-12-08).
That's even before the use_shell was added, and then later it was pushed
into that conditional by 0b6069fe0a (fetch-pack: test support excluding
large blobs, 2017-12-08). Presumably because the non-hook path would not
have worked at all, and that was the first time any of it was actually
tested. ;)

(I've cc'd authors of those commits as an FYI; I think both were
relatively new to the project at the time so misunderstanding this
subtlety of run-command is not too surprising).

I'm somewhat embarrassed to say that despite being the one who added the
pack-objects hook 4 years ago, we still have not switched over to it at
GitHub from our custom patch (the reason is just mundane; there's some
other adjustments that would have to happen and nobody has ever quite
gotten around to it). Presumably you are looking to use it at GitLab.
Just beware that you are probably treading new-ish ground, so there may
be other bugs like this lurking.

> diff --git a/upload-pack.c b/upload-pack.c
> index 3b66bf92ba..eae1fdbc55 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -305,14 +305,7 @@ static void create_pack_file(struct upload_pack_data *pack_data,
>  	if (pack_data->filter_options.choice) {
>  		const char *spec =
>  			expand_list_objects_filter_spec(&pack_data->filter_options);
> -		if (pack_objects.use_shell) {
> -			struct strbuf buf = STRBUF_INIT;
> -			sq_quote_buf(&buf, spec);
> -			strvec_pushf(&pack_objects.args, "--filter=%s", buf.buf);
> -			strbuf_release(&buf);
> -		} else {
> -			strvec_pushf(&pack_objects.args, "--filter=%s", spec);
> -		}
> +		strvec_pushf(&pack_objects.args, "--filter=%s", spec);

Yep, this looks like the right fix. I think with an addition to the test
suite, this will be good to go.

-Peff
Junio C Hamano Jan. 22, 2021, 10:10 p.m. UTC | #2
Jeff King <peff@peff.net> writes:

> On Fri, Jan 22, 2021 at 03:21:37PM +0100, Jacob Vosmaer wrote:
>
>> This fixes a bug that occurs when you combine partial clone and
>> uploadpack.packobjectshook. You can reproduce it as follows:
> ...
> I'm somewhat embarrassed to say that despite being the one who added the
> pack-objects hook 4 years ago, we still have not switched over to it at
> GitHub from our custom patch (the reason is just mundane; there's some
> other adjustments that would have to happen and nobody has ever quite
> gotten around to it). Presumably you are looking to use it at GitLab.
> Just beware that you are probably treading new-ish ground, so there may
> be other bugs like this lurking.
>
>> diff --git a/upload-pack.c b/upload-pack.c
>> index 3b66bf92ba..eae1fdbc55 100644
>> --- a/upload-pack.c
>> +++ b/upload-pack.c
>> @@ -305,14 +305,7 @@ static void create_pack_file(struct upload_pack_data *pack_data,
>>  	if (pack_data->filter_options.choice) {
>>  		const char *spec =
>>  			expand_list_objects_filter_spec(&pack_data->filter_options);
>> -		if (pack_objects.use_shell) {
>> -			struct strbuf buf = STRBUF_INIT;
>> -			sq_quote_buf(&buf, spec);
>> -			strvec_pushf(&pack_objects.args, "--filter=%s", buf.buf);
>> -			strbuf_release(&buf);
>> -		} else {
>> -			strvec_pushf(&pack_objects.args, "--filter=%s", spec);
>> -		}
>> +		strvec_pushf(&pack_objects.args, "--filter=%s", spec);
>
> Yep, this looks like the right fix. I think with an addition to the test
> suite, this will be good to go.

Yeah, that looks simpler and better.  It does deserve new tests.

Thanks, both.
Jacob Vosmaer Jan. 25, 2021, 5:14 p.m. UTC | #3
Thanks, I changed the reproducing command to use 'env' and I added a
test case in t5544.


Best regards,

Jacob Vosmaer
GitLab, Inc.

On Fri, Jan 22, 2021 at 9:32 PM Jeff King <peff@peff.net> wrote:
>
> On Fri, Jan 22, 2021 at 03:21:37PM +0100, Jacob Vosmaer wrote:
>
> > This fixes a bug that occurs when you combine partial clone and
> > uploadpack.packobjectshook. You can reproduce it as follows:
> >
> > git clone -u 'git -c uploadpack.allowfilter '\
> > '-c uploadpack.packobjectshook=" exec" '\
> > 'upload-pack' --filter=blob:none --no-local \
> > src.git dst.git
> >
> > Be careful with the line endings because this has a long quoted string
> > as the -u argument. Note that there is an intentional space before
> > 'exec'. Without that space, run-command.c tries to be smart and the
> > command fails for the wrong reason.
>
> The "-u" command is run with a shell, so:
>
>   git clone \
>     -u 'git -c uploadpack.allowfilter \
>             -c uploadpack.packobjectshook=env \
>             upload-pack' \
>   --filter=blob:none --no-local src.git dst.git
>
> may be a more readable version. I also found the use of " exec" clever,
> but rather subtle; you need the extra space so that our "don't bother
> using a shell" run-command optimization does not kick in. I replaced it
> with "env" here, which is a slightly more canonical way of running a
> sub-program that does not rely on shell builtins.
>
> But all of this should be added as a new test, probably in t5544 with
> the other pack-objects hook tests.
>
> > The problem is an unnecessary and harmful layer of quoting. I tried
> > digging through the history of this function and I think this quoting
> > was there from the start. My best guess is that it stems from a
> > misunderstanding of what use_shell=1 means. The code seems to assume
> > it means "arguments get joined into one big string, then fed to
> > /bin/sh". But that is not what it means: use_shell=1 means that the
> > first argument in the arguments array may be a shell script and if so
> > should be passed to /bin/sh. All other arguments are passed as normal
> > arguments.
>
> Yeah, that is exactly right. "use_shell" just means that the command is
> (possibly) run with a shell. Quoting for any extra arguments is handled
> automatically.
>
> I think you're correct that this was broken from the start in 10ac85c785
> (upload-pack: add object filtering for partial clone, 2017-12-08).
> That's even before the use_shell was added, and then later it was pushed
> into that conditional by 0b6069fe0a (fetch-pack: test support excluding
> large blobs, 2017-12-08). Presumably because the non-hook path would not
> have worked at all, and that was the first time any of it was actually
> tested. ;)
>
> (I've cc'd authors of those commits as an FYI; I think both were
> relatively new to the project at the time so misunderstanding this
> subtlety of run-command is not too surprising).
>
> I'm somewhat embarrassed to say that despite being the one who added the
> pack-objects hook 4 years ago, we still have not switched over to it at
> GitHub from our custom patch (the reason is just mundane; there's some
> other adjustments that would have to happen and nobody has ever quite
> gotten around to it). Presumably you are looking to use it at GitLab.
> Just beware that you are probably treading new-ish ground, so there may
> be other bugs like this lurking.
>
> > diff --git a/upload-pack.c b/upload-pack.c
> > index 3b66bf92ba..eae1fdbc55 100644
> > --- a/upload-pack.c
> > +++ b/upload-pack.c
> > @@ -305,14 +305,7 @@ static void create_pack_file(struct upload_pack_data *pack_data,
> >       if (pack_data->filter_options.choice) {
> >               const char *spec =
> >                       expand_list_objects_filter_spec(&pack_data->filter_options);
> > -             if (pack_objects.use_shell) {
> > -                     struct strbuf buf = STRBUF_INIT;
> > -                     sq_quote_buf(&buf, spec);
> > -                     strvec_pushf(&pack_objects.args, "--filter=%s", buf.buf);
> > -                     strbuf_release(&buf);
> > -             } else {
> > -                     strvec_pushf(&pack_objects.args, "--filter=%s", spec);
> > -             }
> > +             strvec_pushf(&pack_objects.args, "--filter=%s", spec);
>
> Yep, this looks like the right fix. I think with an addition to the test
> suite, this will be good to go.
>
> -Peff
Jacob Vosmaer Jan. 25, 2021, 5:41 p.m. UTC | #4
On Fri, Jan 22, 2021 at 9:32 PM Jeff King <peff@peff.net> wrote:
> I also found the use of " exec" clever,
> but rather subtle; you need the extra space so that our "don't bother
> using a shell" run-command optimization does not kick in. I replaced it
> with "env" here, which is a slightly more canonical way of running a
> sub-program that does not rely on shell builtins.
Yes good idea, the exec thing is too clever for its own good.


> But all of this should be added as a new test, probably in t5544 with
> the other pack-objects hook tests.

Did that, hope it is what you had in mind.


> I'm somewhat embarrassed to say that despite being the one who added the
> pack-objects hook 4 years ago, we still have not switched over to it at
> GitHub from our custom patch (the reason is just mundane; there's some
> other adjustments that would have to happen and nobody has ever quite
> gotten around to it). Presumably you are looking to use it at GitLab.
> Just beware that you are probably treading new-ish ground, so there may
> be other bugs like this lurking.

Thanks for the heads up!

Jacob
diff mbox series

Patch

diff --git a/upload-pack.c b/upload-pack.c
index 3b66bf92ba..eae1fdbc55 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -305,14 +305,7 @@  static void create_pack_file(struct upload_pack_data *pack_data,
 	if (pack_data->filter_options.choice) {
 		const char *spec =
 			expand_list_objects_filter_spec(&pack_data->filter_options);
-		if (pack_objects.use_shell) {
-			struct strbuf buf = STRBUF_INIT;
-			sq_quote_buf(&buf, spec);
-			strvec_pushf(&pack_objects.args, "--filter=%s", buf.buf);
-			strbuf_release(&buf);
-		} else {
-			strvec_pushf(&pack_objects.args, "--filter=%s", spec);
-		}
+		strvec_pushf(&pack_objects.args, "--filter=%s", spec);
 	}
 	if (uri_protocols) {
 		for (i = 0; i < uri_protocols->nr; i++)