[v2,07/11] ls-refs: ignore very long ref-prefix counts

Message ID	YUE1hz/KKM0XebCP@coredump.intra.peff.net (mailing list archive)
State	Superseded
Headers	show Return-Path: <git-owner@kernel.org> Date: Tue, 14 Sep 2021 19:51:35 -0400 From: Jeff King <peff@peff.net> To: git@vger.kernel.org Cc: Junio C Hamano <gitster@pobox.com>, Taylor Blau <me@ttaylorr.com>, Martin =?utf-8?b?w4VncmVu?= <martin.agren@gmail.com>, =?utf-8?b?w4Z2YXIgQXJuZmo=?= =?utf-8?b?w7Zyw7A=?= Bjarmason <avarab@gmail.com> Subject: [PATCH v2 07/11] ls-refs: ignore very long ref-prefix counts Message-ID: <YUE1hz/KKM0XebCP@coredump.intra.peff.net> References: <YUE1alo58cGyTw6/@coredump.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <YUE1alo58cGyTw6/@coredump.intra.peff.net> Precedence: bulk
Series	limit memory allocations for v2 servers \| expand [v2,0/11] limit memory allocations for v2 servers [v2,01/11] serve: rename is_command() to parse_command() [v2,02/11] serve: return capability "value" from get_capability() [v2,03/11] serve: add "receive" method for v2 capabilities table [v2,04/11] serve: provide "receive" function for object-format capability [v2,05/11] serve: provide "receive" function for session-id capability [v2,06/11] serve: drop "keys" strvec [v2,07/11] ls-refs: ignore very long ref-prefix counts [v2,08/11] docs/protocol-v2: clarify some ls-refs ref-prefix details [v2,09/11] serve: reject bogus v2 "command=ls-refs=foo" [v2,10/11] serve: reject commands used as capabilities [v2,11/11] ls-refs: reject unknown arguments

Message ID

YUE1hz/KKM0XebCP@coredump.intra.peff.net (mailing list archive)

State

Superseded

Headers

Date: Tue, 14 Sep 2021 19:51:35 -0400
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Taylor Blau <me@ttaylorr.com>, Martin
	=?utf-8?b?w4VncmVu?= <martin.agren@gmail.com>, =?utf-8?b?w4Z2YXIgQXJuZmo=?=
	=?utf-8?b?w7Zyw7A=?= Bjarmason <avarab@gmail.com>
Subject: [PATCH v2 07/11] ls-refs: ignore very long ref-prefix counts
Message-ID: <YUE1hz/KKM0XebCP@coredump.intra.peff.net>
References: <YUE1alo58cGyTw6/@coredump.intra.peff.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <YUE1alo58cGyTw6/@coredump.intra.peff.net>
Precedence: bulk

Series

limit memory allocations for v2 servers | expand

Commit Message

Jeff King Sept. 14, 2021, 11:51 p.m. UTC

Because each "ref-prefix" capability from the client comes in its own
pkt-line, there's no limit to the number of them that a misbehaving
client may send. We read them all into a strvec, which means the client
can waste arbitrary amounts of our memory by just sending us "ref-prefix
foo" over and over.

One possible solution is to just drop the connection when the limit is
reached. If we set it high enough, then only misbehaving or malicious
clients would hit it. But "high enough" is vague, and it's unfriendly if
we guess wrong and a legitimate client hits this.

But we can do better. Since supporting the ref-prefix capability is
optional anyway, the client has to further cull the response based on
their own patterns. So we can simply ignore the patterns once we cross a
certain threshold. Note that we have to ignore _all_ patterns, not just
the ones past our limit (since otherwise we'd send too little data).

The limit here is fairly arbitrary, and probably much higher than anyone
would need in practice. It might be worth limiting it further, if only
because we check it linearly (so with "m" local refs and "n" patterns,
we do "m * n" string comparisons). But if we care about optimizing this,
an even better solution may be a more advanced data structure anyway.

I didn't bother making the limit configurable, since it's so high and
since Git should behave correctly in either case. It wouldn't be too
hard to do, but it makes both the code and documentation more complex.

Signed-off-by: Jeff King <peff@peff.net>
---
 ls-refs.c            | 20 ++++++++++++++++++--
 t/t5701-git-serve.sh | 31 +++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+), 2 deletions(-)

Comments

Taylor Blau Sept. 15, 2021, 4:16 a.m. UTC | #1

On Tue, Sep 14, 2021 at 07:51:35PM -0400, Jeff King wrote:
> @@ -156,15 +162,25 @@ int ls_refs(struct repository *r, struct packet_reader *request)
>  			data.peel = 1;
>  		else if (!strcmp("symrefs", arg))
>  			data.symrefs = 1;
> -		else if (skip_prefix(arg, "ref-prefix ", &out))
> -			strvec_push(&data.prefixes, out);
> +		else if (skip_prefix(arg, "ref-prefix ", &out)) {
> +			if (data.prefixes.nr < TOO_MANY_PREFIXES)
> +				strvec_push(&data.prefixes, out);
> +		}
>  		else if (!strcmp("unborn", arg))
>  			data.unborn = allow_unborn;
>  	}
>
>  	if (request->status != PACKET_READ_FLUSH)
>  		die(_("expected flush after ls-refs arguments"));
>
> +	/*
> +	 * If we saw too many prefixes, we must avoid using them at all; as
> +	 * soon as we have any prefix, they are meant to form a comprehensive
> +	 * list.
> +	 */
> +	if (data.prefixes.nr >= TOO_MANY_PREFIXES)
> +		strvec_clear(&data.prefixes);
> +

Great, I find this version even better than what I suggested where the
case where the list already has too many prefixes `continue`s through
the loop before calling strvec_add().

Just noting aloud, the `>` part of this conditional will never happen
(since data.prefixes.nr will be at most equal to TOO_MANY_PREFIXES, but
never larger than it).

Obviously not wrong, but I figured it'd be worth mentioning in case it
caught the attention of other reviewers.

Thanks,
Taylor

Eric Sunshine Sept. 15, 2021, 5 a.m. UTC | #2

On Tue, Sep 14, 2021 at 7:51 PM Jeff King <peff@peff.net> wrote:
> diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
> @@ -158,6 +158,37 @@ test_expect_success 'refs/heads prefix' '
> +test_expect_success 'ignore very large set of prefixes' '
> +       # generate a large number of ref-prefixes that we expect
> +       # to match nothing; the value here exceeds TOO_MANY_PREFIXES
> +       # from ls-refs.c.
> +       {
> +               echo command=ls-refs &&
> +               echo object-format=$(test_oid algo)
> +               echo 0001 &&
> +               perl -le "print \"ref-prefix refs/heads/\$_\" for (1..65536)" &&
> +               echo 0000
> +       } |
> +       test-tool pkt-line pack >in &&

Broken &&-chain in {...}.

(Granted, it doesn't matter in this case since the exit code gets lost
down the pipe, but better to be consistent about it.)

Jeff King Sept. 15, 2021, 4:39 p.m. UTC | #3

On Wed, Sep 15, 2021 at 12:16:04AM -0400, Taylor Blau wrote:

> > +	/*
> > +	 * If we saw too many prefixes, we must avoid using them at all; as
> > +	 * soon as we have any prefix, they are meant to form a comprehensive
> > +	 * list.
> > +	 */
> > +	if (data.prefixes.nr >= TOO_MANY_PREFIXES)
> > +		strvec_clear(&data.prefixes);
> > +
> 
> Great, I find this version even better than what I suggested where the
> case where the list already has too many prefixes `continue`s through
> the loop before calling strvec_add().
> 
> Just noting aloud, the `>` part of this conditional will never happen
> (since data.prefixes.nr will be at most equal to TOO_MANY_PREFIXES, but
> never larger than it).
> 
> Obviously not wrong, but I figured it'd be worth mentioning in case it
> caught the attention of other reviewers.

Yeah, your analysis is right. Long ago I read some advice (I think from
K&R, either the C book or a related article) that suggested always using
bounding checks like this rather than equality, because they do the
right thing even if the variable ends up with a surprising value.

I can certainly see an argument against it (like that if you're so
worried about it, maybe you ought to detect it and figure out who is
setting the variable to be unexpectedly high). But it seems like a
reasonable defensive measure to me (against an off-by-one mis-analysis,
or against the earlier code changing in the future).

-Peff

Jeff King Sept. 15, 2021, 4:40 p.m. UTC | #4

On Wed, Sep 15, 2021 at 01:00:01AM -0400, Eric Sunshine wrote:

> On Tue, Sep 14, 2021 at 7:51 PM Jeff King <peff@peff.net> wrote:
> > diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
> > @@ -158,6 +158,37 @@ test_expect_success 'refs/heads prefix' '
> > +test_expect_success 'ignore very large set of prefixes' '
> > +       # generate a large number of ref-prefixes that we expect
> > +       # to match nothing; the value here exceeds TOO_MANY_PREFIXES
> > +       # from ls-refs.c.
> > +       {
> > +               echo command=ls-refs &&
> > +               echo object-format=$(test_oid algo)
> > +               echo 0001 &&
> > +               perl -le "print \"ref-prefix refs/heads/\$_\" for (1..65536)" &&
> > +               echo 0000
> > +       } |
> > +       test-tool pkt-line pack >in &&
> 
> Broken &&-chain in {...}.

Thanks, will fix.

> (Granted, it doesn't matter in this case since the exit code gets lost
> down the pipe, but better to be consistent about it.)

Yep. I think what happened is that I started to convert this to a
here-doc to match the nearby tests, but then realized that expanding the
multi-line bit was weird (even more so when it was a shell loop with a
pipe, before I switched it to perl).

-Peff

diff --git a/ls-refs.c b/ls-refs.c
index a1a0250607..18c4f41e87 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -40,6 +40,12 @@  static void ensure_config_read(void)
 	config_read = 1;
 }
 
+/*
+ * If we see this many or more "ref-prefix" lines from the client, we consider
+ * it "too many" and will avoid using the prefix feature entirely.
+ */
+#define TOO_MANY_PREFIXES 65536
+
 /*
  * Check if one of the prefixes is a prefix of the ref.
  * If no prefixes were provided, all refs match.
@@ -156,15 +162,25 @@  int ls_refs(struct repository *r, struct packet_reader *request)
 			data.peel = 1;
 		else if (!strcmp("symrefs", arg))
 			data.symrefs = 1;
-		else if (skip_prefix(arg, "ref-prefix ", &out))
-			strvec_push(&data.prefixes, out);
+		else if (skip_prefix(arg, "ref-prefix ", &out)) {
+			if (data.prefixes.nr < TOO_MANY_PREFIXES)
+				strvec_push(&data.prefixes, out);
+		}
 		else if (!strcmp("unborn", arg))
 			data.unborn = allow_unborn;
 	}
 
 	if (request->status != PACKET_READ_FLUSH)
 		die(_("expected flush after ls-refs arguments"));
 
+	/*
+	 * If we saw too many prefixes, we must avoid using them at all; as
+	 * soon as we have any prefix, they are meant to form a comprehensive
+	 * list.
+	 */
+	if (data.prefixes.nr >= TOO_MANY_PREFIXES)
+		strvec_clear(&data.prefixes);
+
 	send_possibly_unborn_head(&data);
 	if (!data.prefixes.nr)
 		strvec_push(&data.prefixes, "");
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index 930721f053..3bc96ebcde 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -158,6 +158,37 @@  test_expect_success 'refs/heads prefix' '
 	test_cmp expect actual
 '
 
+test_expect_success 'ignore very large set of prefixes' '
+	# generate a large number of ref-prefixes that we expect
+	# to match nothing; the value here exceeds TOO_MANY_PREFIXES
+	# from ls-refs.c.
+	{
+		echo command=ls-refs &&
+		echo object-format=$(test_oid algo)
+		echo 0001 &&
+		perl -le "print \"ref-prefix refs/heads/\$_\" for (1..65536)" &&
+		echo 0000
+	} |
+	test-tool pkt-line pack >in &&
+
+	# and then confirm that we see unmatched prefixes anyway (i.e.,
+	# that the prefix was not applied).
+	cat >expect <<-EOF &&
+	$(git rev-parse HEAD) HEAD
+	$(git rev-parse refs/heads/dev) refs/heads/dev
+	$(git rev-parse refs/heads/main) refs/heads/main
+	$(git rev-parse refs/heads/release) refs/heads/release
+	$(git rev-parse refs/tags/annotated-tag) refs/tags/annotated-tag
+	$(git rev-parse refs/tags/one) refs/tags/one
+	$(git rev-parse refs/tags/two) refs/tags/two
+	0000
+	EOF
+
+	test-tool serve-v2 --stateless-rpc <in >out &&
+	test-tool pkt-line unpack <out >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'peel parameter' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=ls-refs

[v2,07/11] ls-refs: ignore very long ref-prefix counts

Commit Message

Comments

Patch