ref-filter: don't look for objects when outside of a repository
diff mbox series

Message ID 20180922141145.10558-1-szeder.dev@gmail.com
State New
Headers show
Series
  • ref-filter: don't look for objects when outside of a repository
Related show

Commit Message

SZEDER Gábor Sept. 22, 2018, 2:11 p.m. UTC
The command 'git ls-remote --sort=authordate <remote>' segfaults when
run outside of a repository, ever since the introduction of its
'--sort' option in 1fb20dfd8e (ls-remote: create '--sort' option,
2018-04-09).

While in general the 'git ls-remote' command can be run outside of a
repository just fine, its '--sort=<key>' option with certain keys does
require access to the referenced objects.  This sorting is implemented
using the generic ref-filter sorting facility, which already handles
missing objects gracefully with the appropriate 'missing object
deadbeef for HEAD' message.  However, being generic means that it
checks replace refs while trying to retrieve an object, and while
doing so it accesses the 'git_replace_ref_base' variable, which has
not been initialized and is still a NULL pointer when outside of a
repository, thus causing the segfault.

Make ref-filter more careful and only attempt to retrieve an object
when we are in a repository.  Also add a test to ensure that 'git
ls-remote --sort' fails gracefully when executed outside of a
repository.

Reported-by: H.Merijn Brand <h.m.brand@xs4all.nl>
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---

I'm not quite sure that this is the best place to add this check...
but hey, it's a Saturday afternoon after all ;)

 ref-filter.c         | 3 ++-
 t/t5512-ls-remote.sh | 6 ++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

Comments

Junio C Hamano Sept. 24, 2018, 4:15 p.m. UTC | #1
SZEDER Gábor <szeder.dev@gmail.com> writes:

> The command 'git ls-remote --sort=authordate <remote>' segfaults when
> run outside of a repository, ever since the introduction of its
> '--sort' option in 1fb20dfd8e (ls-remote: create '--sort' option,
> 2018-04-09).
>
> While in general the 'git ls-remote' command can be run outside of a
> repository just fine, its '--sort=<key>' option with certain keys does
> require access to the referenced objects.  This sorting is implemented
> using the generic ref-filter sorting facility, which already handles
> missing objects gracefully with the appropriate 'missing object
> deadbeef for HEAD' message.  However, being generic means that it
> checks replace refs while trying to retrieve an object, and while
> doing so it accesses the 'git_replace_ref_base' variable, which has
> not been initialized and is still a NULL pointer when outside of a
> repository, thus causing the segfault.
>
> Make ref-filter more careful and only attempt to retrieve an object
> when we are in a repository.  Also add a test to ensure that 'git
> ls-remote --sort' fails gracefully when executed outside of a
> repository.

OK.  So by forcing get_object() return an error, we do the same to
populate_value() which in turn will make get_ref_atgom_value return
an error and cmp_ref_sorting() will notice and die.

I think that is the best we could do.

>
> Reported-by: H.Merijn Brand <h.m.brand@xs4all.nl>
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>
> I'm not quite sure that this is the best place to add this check...
> but hey, it's a Saturday afternoon after all ;)
>
>  ref-filter.c         | 3 ++-
>  t/t5512-ls-remote.sh | 6 ++++++
>  2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/ref-filter.c b/ref-filter.c
> index e1bcb4ca8a..3555bc29e7 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -1473,7 +1473,8 @@ static int get_object(struct ref_array_item *ref, int deref, struct object **obj
>  		oi->info.sizep = &oi->size;
>  		oi->info.typep = &oi->type;
>  	}
> -	if (oid_object_info_extended(the_repository, &oi->oid, &oi->info,
> +	if (!have_git_dir() ||
> +	    oid_object_info_extended(the_repository, &oi->oid, &oi->info,
>  				     OBJECT_INFO_LOOKUP_REPLACE))
>  		return strbuf_addf_ret(err, -1, _("missing object %s for %s"),
>  				       oid_to_hex(&oi->oid), ref->refname);
> diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
> index bc5703ff9b..7dd081da01 100755
> --- a/t/t5512-ls-remote.sh
> +++ b/t/t5512-ls-remote.sh
> @@ -302,4 +302,10 @@ test_expect_success 'ls-remote works outside repository' '
>  	nongit git ls-remote dst.git
>  '
>  
> +test_expect_success 'ls-remote --sort fails gracefully outside repository' '
> +	# Use a sort key that requires access to the referenced objects.
> +	nongit test_must_fail git ls-remote --sort=authordate "$TRASH_DIRECTORY" 2>err &&
> +	test_i18ngrep "^fatal: missing object" err
> +'
> +
>  test_done
Jeff King Sept. 24, 2018, 6:17 p.m. UTC | #2
On Sat, Sep 22, 2018 at 04:11:45PM +0200, SZEDER Gábor wrote:

> The command 'git ls-remote --sort=authordate <remote>' segfaults when
> run outside of a repository, ever since the introduction of its
> '--sort' option in 1fb20dfd8e (ls-remote: create '--sort' option,
> 2018-04-09).
> 
> While in general the 'git ls-remote' command can be run outside of a
> repository just fine, its '--sort=<key>' option with certain keys does
> require access to the referenced objects.  This sorting is implemented
> using the generic ref-filter sorting facility, which already handles
> missing objects gracefully with the appropriate 'missing object
> deadbeef for HEAD' message.  However, being generic means that it
> checks replace refs while trying to retrieve an object, and while
> doing so it accesses the 'git_replace_ref_base' variable, which has
> not been initialized and is still a NULL pointer when outside of a
> repository, thus causing the segfault.
> 
> Make ref-filter more careful and only attempt to retrieve an object
> when we are in a repository.  Also add a test to ensure that 'git
> ls-remote --sort' fails gracefully when executed outside of a
> repository.

This all makes sense, and I think your fix is going in the right
direction.

But...

> I'm not quite sure that this is the best place to add this check...
> but hey, it's a Saturday afternoon after all ;)

I also wonder about this. For refs, we already catch these cases at a
low-level and BUG(). That's better than a segfault, and I suspect we
should be doing the same here in oid_object_info_extended(). But that
just shifts the segfault to a BUG().

For the refs code, we've generally tried to catch things at a high-level
and report a more human-friendly error explaining the situation. So
doing the same thing here would mean adding code to ls-remote. But I
think the plumbing gets pretty tricky, since it has no way to ask
ref-filter "hey, are we doing to need to look at objects?".

That's a thing that I think ref-filter _should_ support (it knows it
after having parsed the format string). But it probably ought to come
along with other refactoring, and shouldn't hold up this fix.

So this probably _is_ a reasonable place to check it. However...

> diff --git a/ref-filter.c b/ref-filter.c
> index e1bcb4ca8a..3555bc29e7 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -1473,7 +1473,8 @@ static int get_object(struct ref_array_item *ref, int deref, struct object **obj
>  		oi->info.sizep = &oi->size;
>  		oi->info.typep = &oi->type;
>  	}
> -	if (oid_object_info_extended(the_repository, &oi->oid, &oi->info,
> +	if (!have_git_dir() ||
> +	    oid_object_info_extended(the_repository, &oi->oid, &oi->info,
>  				     OBJECT_INFO_LOOKUP_REPLACE))
>  		return strbuf_addf_ret(err, -1, _("missing object %s for %s"),
>  				       oid_to_hex(&oi->oid), ref->refname);

Would we perhaps want to give the user a hint that the object is not
really missing, but rather that we're not in a repository? E.g.,
something like:

  if (!have_git_dir())
	return strbuf_addf_ret(err, -1, "format specifier requires a repository");
  if (oid_object_info_extended(...))
	return ...;

?

-Peff
SZEDER Gábor Sept. 24, 2018, 9:20 p.m. UTC | #3
On Mon, Sep 24, 2018 at 02:17:22PM -0400, Jeff King wrote:
> On Sat, Sep 22, 2018 at 04:11:45PM +0200, SZEDER Gábor wrote:
 
> > diff --git a/ref-filter.c b/ref-filter.c
> > index e1bcb4ca8a..3555bc29e7 100644
> > --- a/ref-filter.c
> > +++ b/ref-filter.c
> > @@ -1473,7 +1473,8 @@ static int get_object(struct ref_array_item *ref, int deref, struct object **obj
> >  		oi->info.sizep = &oi->size;
> >  		oi->info.typep = &oi->type;
> >  	}
> > -	if (oid_object_info_extended(the_repository, &oi->oid, &oi->info,
> > +	if (!have_git_dir() ||
> > +	    oid_object_info_extended(the_repository, &oi->oid, &oi->info,
> >  				     OBJECT_INFO_LOOKUP_REPLACE))
> >  		return strbuf_addf_ret(err, -1, _("missing object %s for %s"),
> >  				       oid_to_hex(&oi->oid), ref->refname);
> 
> Would we perhaps want to give the user a hint that the object is not
> really missing, but rather that we're not in a repository? E.g.,
> something like:
> 
>   if (!have_git_dir())
> 	return strbuf_addf_ret(err, -1, "format specifier requires a repository");
>   if (oid_object_info_extended(...))
> 	return ...;
> 
> ?

I think it makes sense.

I wanted to preserve the error message, because the description of
'--sort=<key>' in 'Documentation/git-ls-remote.txt' explicitly
mentions it, and I added the condition at this place because I didn't
want to duplicate the construction of the error message.

However, if we go for a more informative error message, then wouldn't
it be better to add this condition in populate_value() before it even
calls get_object()?  Then we could also add the problematic format
specifier to the error message (I think, but didn't actually check),
just in case someone specified multiple sort keys.
Jeff King Sept. 24, 2018, 9:30 p.m. UTC | #4
On Mon, Sep 24, 2018 at 11:20:34PM +0200, SZEDER Gábor wrote:

> > Would we perhaps want to give the user a hint that the object is not
> > really missing, but rather that we're not in a repository? E.g.,
> > something like:
> > 
> >   if (!have_git_dir())
> > 	return strbuf_addf_ret(err, -1, "format specifier requires a repository");
> >   if (oid_object_info_extended(...))
> > 	return ...;
> > 
> > ?
> 
> I think it makes sense.
> 
> I wanted to preserve the error message, because the description of
> '--sort=<key>' in 'Documentation/git-ls-remote.txt' explicitly
> mentions it, and I added the condition at this place because I didn't
> want to duplicate the construction of the error message.

Ah, I didn't realize we actually documented that. And perhaps it is more
consistent, too: you'd get different results from running "ls-remote"
outside a repository versus one that just doesn't have the objects from
the other side.

> However, if we go for a more informative error message, then wouldn't
> it be better to add this condition in populate_value() before it even
> calls get_object()?  Then we could also add the problematic format
> specifier to the error message (I think, but didn't actually check),
> just in case someone specified multiple sort keys.

Yeah, that probably would be a better place. Though your response also
has made me think that maybe just sticking with the "missing object"
response is reasonable. I don't have a strong opinion between the two.

-Peff
Junio C Hamano Sept. 25, 2018, 8:57 p.m. UTC | #5
SZEDER Gábor <szeder.dev@gmail.com> writes:

> However, if we go for a more informative error message, then wouldn't
> it be better to add this condition in populate_value() before it even
> calls get_object()?  Then we could also add the problematic format
> specifier to the error message (I think, but didn't actually check),
> just in case someone specified multiple sort keys.

Even though I suspect that verify_ref_format() is the logically the
right place to do this (after all, it is about seeing if the format
makes sense, and a format that requires an object access used
outside a repository should trigger an verification error), doing
that in populate_value() probably strikes the best balance, I would
think.

Patch
diff mbox series

diff --git a/ref-filter.c b/ref-filter.c
index e1bcb4ca8a..3555bc29e7 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1473,7 +1473,8 @@  static int get_object(struct ref_array_item *ref, int deref, struct object **obj
 		oi->info.sizep = &oi->size;
 		oi->info.typep = &oi->type;
 	}
-	if (oid_object_info_extended(the_repository, &oi->oid, &oi->info,
+	if (!have_git_dir() ||
+	    oid_object_info_extended(the_repository, &oi->oid, &oi->info,
 				     OBJECT_INFO_LOOKUP_REPLACE))
 		return strbuf_addf_ret(err, -1, _("missing object %s for %s"),
 				       oid_to_hex(&oi->oid), ref->refname);
diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
index bc5703ff9b..7dd081da01 100755
--- a/t/t5512-ls-remote.sh
+++ b/t/t5512-ls-remote.sh
@@ -302,4 +302,10 @@  test_expect_success 'ls-remote works outside repository' '
 	nongit git ls-remote dst.git
 '
 
+test_expect_success 'ls-remote --sort fails gracefully outside repository' '
+	# Use a sort key that requires access to the referenced objects.
+	nongit test_must_fail git ls-remote --sort=authordate "$TRASH_DIRECTORY" 2>err &&
+	test_i18ngrep "^fatal: missing object" err
+'
+
 test_done