[7/9] ls-refs: ignore very long ref-prefix counts

Message ID YUDBokYvEBnzwsIN@coredump.intra.peff.net (mailing list archive)
State New, archived
Series: reducing memory allocations for v2 servers

Commit Message

Jeff King Sept. 14, 2021, 3:37 p.m. UTC
Because each "ref-prefix" capability from the client comes in its own
pkt-line, there's no limit to the number of them that a misbehaving
client may send. We read them all into a strvec, which means the client
can waste arbitrary amounts of our memory by just sending us "ref-prefix
foo" over and over.

One possible solution is to just drop the connection when the limit is
reached. If we set it high enough, then only misbehaving or malicious
clients would hit it. But "high enough" is vague, and it's unfriendly if
we guess wrong and a legitimate client hits this.

But we can do better. Since supporting the ref-prefix capability is
optional anyway, the client has to further cull the response based on
their own patterns. So we can simply ignore the patterns once we cross a
certain threshold. Note that we have to ignore _all_ patterns, not just
the ones past our limit (since otherwise we'd send too little data).

The limit here is fairly arbitrary, and probably much higher than anyone
would need in practice. It might be worth limiting it further, if only
because we check it linearly (so with "m" local refs and "n" patterns,
we do "m * n" string comparisons). But if we care about optimizing this,
an even better solution may be a more advanced data structure anyway.

I didn't bother making the limit configurable, since it's so high and
since Git should behave correctly in either case. It wouldn't be too
hard to do, but it makes both the code and documentation more complex.

Signed-off-by: Jeff King <peff@peff.net>
---
We're perhaps bending "optional" a little here. The client does know if
we said "yes, we support ref-prefix" and until now, that meant they
could trust us to cull. But no version of Git has ever relied on that
(we tell the transport code "if you can limit by these prefixes, go for
it" but then just post-process the result).

The other option is that we could just say "no, you're sending too many
prefixes" and hangup. This seemed friendlier to me (though either way, I
really find it quite unlikely anybody would legitimately hit this
limit).

 ls-refs.c            | 19 +++++++++++++++++--
 t/t5701-git-serve.sh | 31 +++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+), 2 deletions(-)

Comments

Taylor Blau Sept. 14, 2021, 5:18 p.m. UTC | #1
On Tue, Sep 14, 2021 at 11:37:06AM -0400, Jeff King wrote:
> Because each "ref-prefix" capability from the client comes in its own
> pkt-line, there's no limit to the number of them that a misbehaving
> client may send. We read them all into a strvec, which means the client
> can waste arbitrary amounts of our memory by just sending us "ref-prefix
> foo" over and over.
>
> One possible solution is to just drop the connection when the limit is
> reached. If we set it high enough, then only misbehaving or malicious
> clients would hit it. But "high enough" is vague, and it's unfriendly if
> we guess wrong and a legitimate client hits this.
>
> But we can do better. Since supporting the ref-prefix capability is
> optional anyway, the client has to further cull the response based on
> their own patterns. So we can simply ignore the patterns once we cross a
> certain threshold. Note that we have to ignore _all_ patterns, not just
> the ones past our limit (since otherwise we'd send too little data).

Right, because each ref-prefix line *adds* references to the advertised
set instead of culling them out. So as soon as we start ignoring even a
single ref-prefix line, we have to ignore all of them. Makes sense.
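
Concretely, that matching amounts to something like this (a sketch going by
the existing comment in ls-refs.c, not necessarily the literal code):

    /*
     * A ref is advertised if it matches at least one of the client's
     * prefixes, or if the client gave no prefixes at all.
     */
    static int ref_match(const struct strvec *prefixes, const char *refname)
    {
            int i;

            if (!prefixes->nr)
                    return 1; /* no restriction provided */

            for (i = 0; i < prefixes->nr; i++)
                    if (starts_with(refname, prefixes->v[i]))
                            return 1;

            return 0;
    }

which is also where the "m * n" comparisons you mention come from, since it
runs once per local ref.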

> The limit here is fairly arbitrary, and probably much higher than anyone
> would need in practice. It might be worth limiting it further, if only
> because we check it linearly (so with "m" local refs and "n" patterns,
> we do "m * n" string comparisons). But if we care about optimizing this,
> an even better solution may be a more advanced data structure anyway.
>
> I didn't bother making the limit configurable, since it's so high and
> since Git should behave correctly in either case. It wouldn't be too
> hard to do, but it makes both the code and documentation more complex.

Agreed. I don't think it's worth making it configurable because the
limit is so absurdly high that probably nobody will ever want to tweak
it.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
> We're perhaps bending "optional" a little here. The client does know if
> we said "yes, we support ref-prefix" and until now, that meant they
> could trust us to cull. But no version of Git has ever relied on that
> (we tell the transport code "if you can limit by these prefixes, go for
> it" but then just post-process the result).
>
> The other option is that we could just say "no, you're sending too many
> prefixes" and hangup. This seemed friendlier to me (though either way, I
> really find it quite unlikely anybody would legitimately hit this
> limit).

FWIW, either (dropping the connection or the approach you took here)
would have been fine with me, but I find it unlikely that any real users
will notice ;).

>  ls-refs.c            | 19 +++++++++++++++++--
>  t/t5701-git-serve.sh | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 48 insertions(+), 2 deletions(-)
>
> diff --git a/ls-refs.c b/ls-refs.c
> index a1a0250607..839fb0caa9 100644
> --- a/ls-refs.c
> +++ b/ls-refs.c
> @@ -40,6 +40,12 @@ static void ensure_config_read(void)
>  	config_read = 1;
>  }
>
> +/*
> + * The maximum number of "ref-prefix" lines we'll allow the client to send.
> + * If they go beyond this, we'll avoid using the prefix feature entirely.
> + */
> +#define MAX_ALLOWED_PREFIXES 65536
> +
>  /*
>   * Check if one of the prefixes is a prefix of the ref.
>   * If no prefixes were provided, all refs match.
> @@ -141,6 +147,7 @@ static int ls_refs_config(const char *var, const char *value, void *data)
>  int ls_refs(struct repository *r, struct packet_reader *request)
>  {
>  	struct ls_refs_data data;
> +	int too_many_prefixes = 0;
>
>  	memset(&data, 0, sizeof(data));
>  	strvec_init(&data.prefixes);
> @@ -156,8 +163,16 @@ int ls_refs(struct repository *r, struct packet_reader *request)
>  			data.peel = 1;
>  		else if (!strcmp("symrefs", arg))
>  			data.symrefs = 1;
> -		else if (skip_prefix(arg, "ref-prefix ", &out))
> -			strvec_push(&data.prefixes, out);
> +		else if (skip_prefix(arg, "ref-prefix ", &out)) {
> +			if (too_many_prefixes) {
> +				/* ignore any further ones */
> +			} else if (data.prefixes.nr >= MAX_ALLOWED_PREFIXES) {
> +				strvec_clear(&data.prefixes);
> +				too_many_prefixes = 1;
> +			} else {
> +				strvec_push(&data.prefixes, out);
> +			}
> +		}

The order of this if-statement is a little odd to me, but obviously
correct. I might have written:

    if (too_many_prefixes)
      continue;

    if (data.prefixes.nr < MAX_ALLOWED_PREFIXES) {
      strvec_push(&data.prefixes, out);
    } else {
      too_many_prefixes = 1;
      strvec_clear(&data.prefixes);
    }

But certainly what you wrote here works just fine (so this is a cosmetic
comment, and not a functional one).

>  		else if (!strcmp("unborn", arg))
>  			data.unborn = allow_unborn;
>  	}
> diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
> index 930721f053..b095bfa0ac 100755
> --- a/t/t5701-git-serve.sh
> +++ b/t/t5701-git-serve.sh
> @@ -158,6 +158,37 @@ test_expect_success 'refs/heads prefix' '
>  	test_cmp expect actual
>  '
>
> +test_expect_success 'ignore very large set of prefixes' '
> +	# generate a large number of ref-prefixes that we expect
> +	# to match nothing; the value here exceeds MAX_ALLOWED_PREFIXES
> +	# from ls-refs.c.
> +	{
> +		echo command=ls-refs &&
> +		echo object-format=$(test_oid algo)
> +		echo 0001 &&
> +		perl -le "print \"refs/heads/$_\" for (1..65536+1)" &&
> +		echo 0000
> +	} |
> +	test-tool pkt-line pack >in &&
> +
> +	# and then confirm that we see unmatched prefixes anyway (i.e.,
> +	# that the prefix was not applied).
> +	cat >expect <<-EOF &&
> +	$(git rev-parse HEAD) HEAD
> +	$(git rev-parse refs/heads/dev) refs/heads/dev
> +	$(git rev-parse refs/heads/main) refs/heads/main
> +	$(git rev-parse refs/heads/release) refs/heads/release
> +	$(git rev-parse refs/tags/annotated-tag) refs/tags/annotated-tag
> +	$(git rev-parse refs/tags/one) refs/tags/one
> +	$(git rev-parse refs/tags/two) refs/tags/two

You could have written this as a loop over the unmatched prefixes, but I
vastly prefer the result you came up with, which is much more explicit
and doesn't require readers to parse out what the loop does.

So this part looks very good to me.

Thanks,
Taylor
Jeff King Sept. 14, 2021, 5:23 p.m. UTC | #2
On Tue, Sep 14, 2021 at 11:37:06AM -0400, Jeff King wrote:

> The limit here is fairly arbitrary, and probably much higher than anyone
> would need in practice. It might be worth limiting it further, if only
> because we check it linearly (so with "m" local refs and "n" patterns,
> we do "m * n" string comparisons). But if we care about optimizing this,
> an even better solution may be a more advanced data structure anyway.

The limit I picked is 65536, because it seemed round and high. But note
that somebody can put up to almost-64k in a single ref-prefix line,
which means ultimately you can allocate 4GB. I do wonder if dropping
this to something like 1024 might be reasonable.
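
Back-of-the-envelope, assuming the usual just-under-64k pkt-line payload:

  65536 prefixes * ~65500 bytes each =~ 4GB of attacker-controlled strvec data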

In practice I'd expect it to be a handful in most cases (refs/heads/*,
refs/tags/*, HEAD). But if you do something like:

  git fetch $remote 1 2 3 4 5 6 7 ...

then we'll prefix-expand those names with the usual lookup rules into
refs/1, refs/heads/1, refs/2, refs/heads/2, and so on.
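
For a single name like "1", that expansion puts something like this on the
wire (illustrative; the exact set follows the usual ref lookup rules):

  ref-prefix 1
  ref-prefix refs/1
  ref-prefix refs/tags/1
  ref-prefix refs/heads/1
  ref-prefix refs/remotes/1
  ref-prefix refs/remotes/1/HEAD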

At some point it becomes silly and works counter to the purpose of the
optimization (you send more prefix constraints than the actual ref
advertisement, not to mention that client bandwidth may not be
symmetric). I'm not sure what we want to declare as a reasonable limit.

And this is just about protecting the server; probably it makes sense
for the client to realize it's going to send a ridiculous number of
prefixes and just skip the feature entirely (since that's what actually
saves the bandwidth).

-Peff
Jeff King Sept. 14, 2021, 5:27 p.m. UTC | #3
On Tue, Sep 14, 2021 at 01:18:06PM -0400, Taylor Blau wrote:

> > @@ -156,8 +163,16 @@ int ls_refs(struct repository *r, struct packet_reader *request)
> >  			data.peel = 1;
> >  		else if (!strcmp("symrefs", arg))
> >  			data.symrefs = 1;
> > -		else if (skip_prefix(arg, "ref-prefix ", &out))
> > -			strvec_push(&data.prefixes, out);
> > +		else if (skip_prefix(arg, "ref-prefix ", &out)) {
> > +			if (too_many_prefixes) {
> > +				/* ignore any further ones */
> > +			} else if (data.prefixes.nr >= MAX_ALLOWED_PREFIXES) {
> > +				strvec_clear(&data.prefixes);
> > +				too_many_prefixes = 1;
> > +			} else {
> > +				strvec_push(&data.prefixes, out);
> > +			}
> > +		}
> 
> The order of this if-statement is a little odd to me, but obviously
> correct. I might have written:
> 
>     if (too_many_prefixes)
>       continue;
> 
>     if (data.prefixes.nr < MAX_ALLOWED_PREFIXES) {
>       strvec_push(&data.prefixes, out);
>     } else {
>       too_many_prefixes = 1;
>       strvec_clear(&data.prefixes);
>     }
> 
> But certainly what you wrote here works just fine (so this is a cosmetic
> comment, and not a functional one).

My view of it was: check every case that may avoid us pushing a prefix,
and then finally push one. But that may have been related to my goal in
writing the patch. :)

> > +test_expect_success 'ignore very large set of prefixes' '
> > +	# generate a large number of ref-prefixes that we expect
> > +	# to match nothing; the value here exceeds MAX_ALLOWED_PREFIXES
> > +	# from ls-refs.c.
> > +	{
> > +		echo command=ls-refs &&
> > +		echo object-format=$(test_oid algo)
> > +		echo 0001 &&
> > +		perl -le "print \"refs/heads/$_\" for (1..65536+1)" &&
> > +		echo 0000
> > +	} |
> > +	test-tool pkt-line pack >in &&
> > +
> > +	# and then confirm that we see unmatched prefixes anyway (i.e.,
> > +	# that the prefix was not applied).
> > +	cat >expect <<-EOF &&
> > +	$(git rev-parse HEAD) HEAD
> > +	$(git rev-parse refs/heads/dev) refs/heads/dev
> > +	$(git rev-parse refs/heads/main) refs/heads/main
> > +	$(git rev-parse refs/heads/release) refs/heads/release
> > +	$(git rev-parse refs/tags/annotated-tag) refs/tags/annotated-tag
> > +	$(git rev-parse refs/tags/one) refs/tags/one
> > +	$(git rev-parse refs/tags/two) refs/tags/two
> 
> You could have written this as a loop over the unmatched prefixes, but I
> vastly prefer the result you came up with, which is much more explicit
> and doesn't require readers to parse out what the loop does.

I actually think that:

  git for-each-ref --format='%(objectname) %(refname)' >expect

is pretty readable (and is much more efficient, and nicely avoids the
master/main brittle-ness, which I ran into while backporting this). But:

  - this matches what the rest of the script does

  - for-each-ref doesn't report on HEAD, so we have to add that in
    separately

  - the "pkt-line unpack" will include the flush packet, so we'd have to
    add that in, too.

-Peff
Martin Ågren Sept. 14, 2021, 7:06 p.m. UTC | #4
On Tue, 14 Sept 2021 at 17:38, Jeff King <peff@peff.net> wrote:
>
> One possible solution is to just drop the connection when the limit is
> reached. If we set it high enough, then only misbehaving or malicious
> clients would hit it. But "high enough" is vague, and it's unfriendly if
> we guess wrong and a legitimate client hits this.
>
> But we can do better. Since supporting the ref-prefix capability is
> optional anyway, the client has to further cull the response based on
> their own patterns. So we can simply ignore the patterns once we cross a
> certain threshold. Note that we have to ignore _all_ patterns, not just
> the ones past our limit (since otherwise we'd send too little data).

This all makes sense to me. At some point, we should be able to go "I
don't know what you're trying to do, but let me just ignore all this
craziness and instead try to give you a useful result sooner rather than
later".

I do wonder if we should document that the client can't trust us to
actually do all this culling. In general, I find that it's a matter of
hygiene for the client to do its own checks, but with this change they
actually *need* to do them. (Unless they know our limit and that they're
on the right side of it, but that kind of magic is even less hygienic.)

> +               else if (skip_prefix(arg, "ref-prefix ", &out)) {
> +                       if (too_many_prefixes) {
> +                               /* ignore any further ones */
> +                       } else if (data.prefixes.nr >= MAX_ALLOWED_PREFIXES) {
> +                               strvec_clear(&data.prefixes);
> +                               too_many_prefixes = 1;
> +                       } else {
> +                               strvec_push(&data.prefixes, out);
> +                       }
> +               }

Is it easier to reason about with something like this
(whitespace-damaged) on top?

diff --git a/ls-refs.c b/ls-refs.c
index 839fb0caa9..b3101ff361 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -147,7 +147,6 @@ static int ls_refs_config(const char *var, const char *value, void *data)
 int ls_refs(struct repository *r, struct packet_reader *request)
 {
        struct ls_refs_data data;
-       int too_many_prefixes = 0;

        memset(&data, 0, sizeof(data));
        strvec_init(&data.prefixes);
@@ -164,14 +163,8 @@ int ls_refs(struct repository *r, struct packet_reader *request)
                else if (!strcmp("symrefs", arg))
                        data.symrefs = 1;
                else if (skip_prefix(arg, "ref-prefix ", &out)) {
-                       if (too_many_prefixes) {
-                               /* ignore any further ones */
-                       } else if (data.prefixes.nr >= MAX_ALLOWED_PREFIXES) {
-                               strvec_clear(&data.prefixes);
-                               too_many_prefixes = 1;
-                       } else {
+                       if (data.prefixes.nr <= MAX_ALLOWED_PREFIXES)
                                strvec_push(&data.prefixes, out);
-                       }
                }
                else if (!strcmp("unborn", arg))
                        data.unborn = allow_unborn;
@@ -180,6 +173,9 @@ int ls_refs(struct repository *r, struct packet_reader *request)
        if (request->status != PACKET_READ_FLUSH)
                die(_("expected flush after ls-refs arguments"));

+       if (data.prefixes.nr > MAX_ALLOWED_PREFIXES)
+               strvec_clear(&data.prefixes);
+
        send_possibly_unborn_head(&data);
        if (!data.prefixes.nr)
                strvec_push(&data.prefixes, "");

Maybe even name the macro TOO_MANY_PREFIXES (and bump it by one)
to make the logic instead be

        if (data.prefixes.nr < TOO_MANY_PREFIXES)
                strvec_push(&data.prefixes, out);
 ...
        if (data.prefixes.nr >= TOO_MANY_PREFIXES)
                strvec_clear(&data.prefixes);

Just a thought. I'm reaching to try to find a way to improve this
series. ;-) It was a nice read.


Martin
Jeff King Sept. 14, 2021, 7:22 p.m. UTC | #5
On Tue, Sep 14, 2021 at 09:06:55PM +0200, Martin Ågren wrote:

> > But we can do better. Since supporting the ref-prefix capability is
> > optional anyway, the client has to further cull the response based on
> > their own patterns. So we can simply ignore the patterns once we cross a
> > certain threshold. Note that we have to ignore _all_ patterns, not just
> > the ones past our limit (since otherwise we'd send too little data).
> 
> This all makes sense to me. At some point, we should be able to go "I
> don't know what you're trying to do, but let me just ignore all this
> craziness and instead try to give you a useful result sooner rather than
> later".
> 
> I do wonder if we should document that the client can't trust us to
> actually do all this culling. In general, I find that it's a matter of
> hygiene for the client to do its own checks, but with this change they
> actually *need* to do them. (Unless they know our limit and that they're
> on the right side of it, but that kind of magic is even less hygienic.)

Perhaps we could say so more explicitly in the v2 protocol spec. I'll
take a look.

> > +               else if (skip_prefix(arg, "ref-prefix ", &out)) {
> > +                       if (too_many_prefixes) {
> > +                               /* ignore any further ones */
> > +                       } else if (data.prefixes.nr >= MAX_ALLOWED_PREFIXES) {
> > +                               strvec_clear(&data.prefixes);
> > +                               too_many_prefixes = 1;
> > +                       } else {
> > +                               strvec_push(&data.prefixes, out);
> > +                       }
> > +               }
> 
> Is it easier to reason about with something like this
> (whitespace-damaged) on top?

You're the second person to complain about this if-else chain. I'll take
the hint. ;)

> diff --git a/ls-refs.c b/ls-refs.c
> index 839fb0caa9..b3101ff361 100644
> --- a/ls-refs.c
> +++ b/ls-refs.c
> @@ -147,7 +147,6 @@ static int ls_refs_config(const char *var, const char *value, void *data)
>  int ls_refs(struct repository *r, struct packet_reader *request)
>  {
>         struct ls_refs_data data;
> -       int too_many_prefixes = 0;
> 
>         memset(&data, 0, sizeof(data));
>         strvec_init(&data.prefixes);
> @@ -164,14 +163,8 @@ int ls_refs(struct repository *r, struct packet_reader *request)
>                 else if (!strcmp("symrefs", arg))
>                         data.symrefs = 1;
>                 else if (skip_prefix(arg, "ref-prefix ", &out)) {
> -                       if (too_many_prefixes) {
> -                               /* ignore any further ones */
> -                       } else if (data.prefixes.nr >= MAX_ALLOWED_PREFIXES) {
> -                               strvec_clear(&data.prefixes);
> -                               too_many_prefixes = 1;
> -                       } else {
> +                       if (data.prefixes.nr <= MAX_ALLOWED_PREFIXES)
>                                 strvec_push(&data.prefixes, out);
> -                       }
>                 }

Hmm. At first I liked this, because it reduces the number of cases (and
variables!). But there's something really subtle going on here. I
thought at first it should be "<", but you are intentionally
over-allocating by one entry to indicate the overflow. I.e., you've
essentially stuffed the too_many_prefixes boolean into the count.
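
(E.g., with a limit of 3 for the sake of argument: the first three prefixes
bring the count to 3, the fourth still passes "nr <= 3" and bumps it to 4,
every later one is dropped, and the final "nr > 3" check spots the overflow
and clears the list.)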

> @@ -180,6 +173,9 @@ int ls_refs(struct repository *r, struct packet_reader *request)
>         if (request->status != PACKET_READ_FLUSH)
>                 die(_("expected flush after ls-refs arguments"));
> 
> +       if (data.prefixes.nr > MAX_ALLOWED_PREFIXES)
> +               strvec_clear(&data.prefixes);
> +

This is far from the parser, but I think that's OK. I'd probably couple
it with a comment explaining why we need to clear rather than using what
we got.
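
Something like this, perhaps:

  /*
   * If the client sent more prefixes than we are willing to match
   * against, pretend it sent none at all: we advertise everything,
   * and it has to cull the result on its side anyway.
   */
  if (data.prefixes.nr > MAX_ALLOWED_PREFIXES)
          strvec_clear(&data.prefixes);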

> Maybe even name the macro TOO_MANY_PREFIXES (and bump it by one)
> to make the logic instead be
> 
>         if (data.prefixes.nr < TOO_MANY_PREFIXES)
>                 strvec_push(&data.prefixes, out);
>  ...
>         if (data.prefixes.nr >= TOO_MANY_PREFIXES)
>                 strvec_clear(&data.prefixes);

At first I thought this was just being cute, but it's an attempt to
compensate for the off-by-one subtlety in the early check. I'll give it
some thought. I kind of like it, but the fact that it took me a minute
or three to be sure the code is correct makes me worried it's being too
clever.

-Peff
Jeff King Sept. 14, 2021, 10:09 p.m. UTC | #6
On Tue, Sep 14, 2021 at 11:37:06AM -0400, Jeff King wrote:

> +test_expect_success 'ignore very large set of prefixes' '
> +	# generate a large number of ref-prefixes that we expect
> +	# to match nothing; the value here exceeds MAX_ALLOWED_PREFIXES
> +	# from ls-refs.c.
> +	{
> +		echo command=ls-refs &&
> +		echo object-format=$(test_oid algo)
> +		echo 0001 &&
> +		perl -le "print \"refs/heads/$_\" for (1..65536+1)" &&
> +		echo 0000
> +	} |
> +	test-tool pkt-line pack >in &&

Yuck. While double-checking some refactoring, I realized this test does
not actually generate the correct input!

It omits the "ref-prefix" header. _And_ it accidentally expands $_ in
the shell rather than in perl.
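
For the record, the corrected line needs to include the capability name and
keep the shell from eating the "$_", something like:

  perl -le "print \"ref-prefix refs/heads/\$_\" for (1..65536+1)" &&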

The test does work once corrected. And in fact I had originally written
it correctly as a "test_seq | sed" pipeline, but generating 64k+ lines
in the shell seemed a bit much (we do avoid sub-processes, but the "set
-x" output is unwieldy).

I'll fix it in a re-roll.

-Peff
Taylor Blau Sept. 14, 2021, 10:11 p.m. UTC | #7
On Tue, Sep 14, 2021 at 06:09:45PM -0400, Jeff King wrote:
> On Tue, Sep 14, 2021 at 11:37:06AM -0400, Jeff King wrote:
>
> > +test_expect_success 'ignore very large set of prefixes' '
> > +	# generate a large number of ref-prefixes that we expect
> > +	# to match nothing; the value here exceeds MAX_ALLOWED_PREFIXES
> > +	# from ls-refs.c.
> > +	{
> > +		echo command=ls-refs &&
> > +		echo object-format=$(test_oid algo)
> > +		echo 0001 &&
> > +		perl -le "print \"refs/heads/$_\" for (1..65536+1)" &&
> > +		echo 0000
> > +	} |
> > +	test-tool pkt-line pack >in &&
>
> Yuck. While double-checking some refactoring, I realized this test does
> not actually generate the correct input!
>
> It omits the "ref-prefix" header. _And_ it accidentally expands $_ in
> the shell rather than in perl.

Hah, nice find. You'd think that one of us would have caught it earlier
given that we both discussed it.

Thanks,
Taylor
Jeff King Sept. 14, 2021, 10:15 p.m. UTC | #8
On Tue, Sep 14, 2021 at 06:11:32PM -0400, Taylor Blau wrote:

> On Tue, Sep 14, 2021 at 06:09:45PM -0400, Jeff King wrote:
> > On Tue, Sep 14, 2021 at 11:37:06AM -0400, Jeff King wrote:
> >
> > > +test_expect_success 'ignore very large set of prefixes' '
> > > +	# generate a large number of ref-prefixes that we expect
> > > +	# to match nothing; the value here exceeds MAX_ALLOWED_PREFIXES
> > > +	# from ls-refs.c.
> > > +	{
> > > +		echo command=ls-refs &&
> > > +		echo object-format=$(test_oid algo)
> > > +		echo 0001 &&
> > > +		perl -le "print \"refs/heads/$_\" for (1..65536+1)" &&
> > > +		echo 0000
> > > +	} |
> > > +	test-tool pkt-line pack >in &&
> >
> > Yuck. While double-checking some refactoring, I realized this test does
> > not actually generate the correct input!
> >
> > It omits the "ref-prefix" header. _And_ it accidentally expands $_ in
> > the shell rather than in perl.
> 
> Hah, nice find. You'd think that one of us would have caught it earlier
> given that we both discussed it.

Really, I'd have thought that ls-refs would complain about a totally
bogus capability. I'll see if I can fix that, as well.

-Peff

Patch

diff --git a/ls-refs.c b/ls-refs.c
index a1a0250607..839fb0caa9 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -40,6 +40,12 @@  static void ensure_config_read(void)
 	config_read = 1;
 }
 
+/*
+ * The maximum number of "ref-prefix" lines we'll allow the client to send.
+ * If they go beyond this, we'll avoid using the prefix feature entirely.
+ */
+#define MAX_ALLOWED_PREFIXES 65536
+
 /*
  * Check if one of the prefixes is a prefix of the ref.
  * If no prefixes were provided, all refs match.
@@ -141,6 +147,7 @@  static int ls_refs_config(const char *var, const char *value, void *data)
 int ls_refs(struct repository *r, struct packet_reader *request)
 {
 	struct ls_refs_data data;
+	int too_many_prefixes = 0;
 
 	memset(&data, 0, sizeof(data));
 	strvec_init(&data.prefixes);
@@ -156,8 +163,16 @@  int ls_refs(struct repository *r, struct packet_reader *request)
 			data.peel = 1;
 		else if (!strcmp("symrefs", arg))
 			data.symrefs = 1;
-		else if (skip_prefix(arg, "ref-prefix ", &out))
-			strvec_push(&data.prefixes, out);
+		else if (skip_prefix(arg, "ref-prefix ", &out)) {
+			if (too_many_prefixes) {
+				/* ignore any further ones */
+			} else if (data.prefixes.nr >= MAX_ALLOWED_PREFIXES) {
+				strvec_clear(&data.prefixes);
+				too_many_prefixes = 1;
+			} else {
+				strvec_push(&data.prefixes, out);
+			}
+		}
 		else if (!strcmp("unborn", arg))
 			data.unborn = allow_unborn;
 	}
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index 930721f053..b095bfa0ac 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -158,6 +158,37 @@  test_expect_success 'refs/heads prefix' '
 	test_cmp expect actual
 '
 
+test_expect_success 'ignore very large set of prefixes' '
+	# generate a large number of ref-prefixes that we expect
+	# to match nothing; the value here exceeds MAX_ALLOWED_PREFIXES
+	# from ls-refs.c.
+	{
+		echo command=ls-refs &&
+		echo object-format=$(test_oid algo)
+		echo 0001 &&
+		perl -le "print \"refs/heads/$_\" for (1..65536+1)" &&
+		echo 0000
+	} |
+	test-tool pkt-line pack >in &&
+
+	# and then confirm that we see unmatched prefixes anyway (i.e.,
+	# that the prefix was not applied).
+	cat >expect <<-EOF &&
+	$(git rev-parse HEAD) HEAD
+	$(git rev-parse refs/heads/dev) refs/heads/dev
+	$(git rev-parse refs/heads/main) refs/heads/main
+	$(git rev-parse refs/heads/release) refs/heads/release
+	$(git rev-parse refs/tags/annotated-tag) refs/tags/annotated-tag
+	$(git rev-parse refs/tags/one) refs/tags/one
+	$(git rev-parse refs/tags/two) refs/tags/two
+	0000
+	EOF
+
+	test-tool serve-v2 --stateless-rpc <in >out &&
+	test-tool pkt-line unpack <out >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'peel parameter' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=ls-refs