mbox series

[0/9] reducing memory allocations for v2 servers

Message ID YUC/6n1hhUbMJiLw@coredump.intra.peff.net (mailing list archive)
Headers show
Series reducing memory allocations for v2 servers | expand

Message

Jeff King Sept. 14, 2021, 3:29 p.m. UTC
While looking at [1], I noticed that v2 servers will read a few bits of
client input into strvecs. Even though we expect these to be small-ish,
there's nothing preventing a client from sending us a bunch of junk and
wasting memory.

This series changes that, putting a cap on how much data we'll receive.
The two spots are the "capabilities" list we receive (before we even
dispatch to a particular command like ls-refs or fetch), and the
ref-prefix list we receive for ls-refs.

To deal with the capabilities issue, we'll just handle each capability
line as it comes in, rather than storing a list. This requires a bit of
cleanup, which is why it takes up the first 6 patches. It also needs a
few other cleanups from ab/serve-cleanup (and so this series is based on
that). The dependencies there are both textual (e.g., using designated
initializers in the capabilities table) and functional (dropping the
"keys" parameter from v2 command() functions).

Patch 7 fixes the ls-refs issue by degrading when we see too many
prefixes (which works because the capability is optional in the first
place).

The final two patches aren't strictly necessary. They're a tightening of
the protocol against some bogus input that I noticed while doing the
earlier cleanups. That bogus input isn't really hurting anything, but I
think it makes sense to reject nonsense from the client rather than
ignoring it. The obvious risk is that some client for some reason would
send us nonsense, but I don't think Git ever has.

Once again, these are based on ab/serve-cleanup, which is currently in
next.

  [1/9]: serve: rename is_command() to parse_command()
  [2/9]: serve: return capability "value" from get_capability()
  [3/9]: serve: add "receive" method for v2 capabilities table
  [4/9]: serve: provide "receive" function for object-format capability
  [5/9]: serve: provide "receive" function for session-id capability
  [6/9]: serve: drop "keys" strvec
  [7/9]: ls-refs: ignore very long ref-prefix counts
  [8/9]: serve: reject bogus v2 "command=ls-refs=foo"
  [9/9]: serve: reject commands used as capabilities

 ls-refs.c            |  19 ++++++-
 serve.c              | 120 +++++++++++++++++++++++--------------------
 t/t5701-git-serve.sh |  60 ++++++++++++++++++++++
 3 files changed, 142 insertions(+), 57 deletions(-)

-Peff

[1] https://lore.kernel.org/git/YT54CNYgtGcqexwq@coredump.intra.peff.net/

Comments

Taylor Blau Sept. 14, 2021, 5:30 p.m. UTC | #1
On Tue, Sep 14, 2021 at 11:29:46AM -0400, Jeff King wrote:
> While looking at [1], I noticed that v2 servers will read a few bits of
> client input into strvecs. Even though we expect these to be small-ish,
> there's nothing preventing a client from sending us a bunch of junk and
> wasting memory.
>
> This series changes that, putting a cap on how much data we'll receive.
> The two spots are the "capabilities" list we receive (before we even
> dispatch to a particular command like ls-refs or fetch), and the
> ref-prefix list we receive for ls-refs.
>
> [...]

Thanks. I reviewed this series carefully and all of my comments were
either of the form "you could have written it this way, but I'm equally
happy with what you wrote here" or "this behavior change won't affect
real users and so I'm OK with it".

    Reviewed-by: Taylor Blau <me@ttaylorr.com>

Thanks,
Taylor
Junio C Hamano Sept. 14, 2021, 6 p.m. UTC | #2
Jeff King <peff@peff.net> writes:

> While looking at [1], I noticed that v2 servers will read a few bits of
> client input into strvecs. Even though we expect these to be small-ish,
> there's nothing preventing a client from sending us a bunch of junk and
> wasting memory.

The title of the topic says "reducing", but it smells more to me
like this is about "limiting" allocations to protect ourselves.  Am
I reading the series correctly?

> This series changes that, putting a cap on how much data we'll receive.
> The two spots are the "capabilities" list we receive (before we even
> dispatch to a particular command like ls-refs or fetch), and the
> ref-prefix list we receive for ls-refs.
>
> To deal with the capabilities issue, we'll just handle each capability
> line as it comes in, rather than storing a list. This requires a bit of
> cleanup, which is why it takes up the first 6 patches. It also needs a
> few other cleanups from ab/serve-cleanup (and so this series is based on
> that). The dependencies there are both textual (e.g., using designated
> initializers in the capabilities table) and functional (dropping the
> "keys" parameter from v2 command() functions).
>
> Patch 7 fixes the ls-refs issue by degrading when we see too many
> prefixes (which works because the capability is optional in the first
> place).
>
> The final two patches aren't strictly necessary. They're a tightening of
> the protocol against some bogus input that I noticed while doing the
> earlier cleanups. That bogus input isn't really hurting anything, but I
> think it makes sense to reject nonsense from the client rather than
> ignoring it. The obvious risk is that some client for some reason would
> send us nonsense, but I don't think Git ever has.

Your cover letters (as proposed log messages for individual commits)
are always to the point and very pleasant to read.  Very much
appreciated.

Thanks.
Jeff King Sept. 14, 2021, 6:38 p.m. UTC | #3
On Tue, Sep 14, 2021 at 11:00:56AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > While looking at [1], I noticed that v2 servers will read a few bits of
> > client input into strvecs. Even though we expect these to be small-ish,
> > there's nothing preventing a client from sending us a bunch of junk and
> > wasting memory.
> 
> The title of the topic says "reducing", but it smells more to me
> like this is about "limiting" allocations to protect ourselves.  Am
> I reading the series correctly?

Yeah, I think "limiting" is probably a better word. We will still
allocate for ref-prefix, of course, but we'll limit the number of
allocations. We did "reduce" the number of spots that allocate, since
capabilities no longer do so. :)

But I think your understanding of the series is correct.

-Peff
Ævar Arnfjörð Bjarmason Sept. 15, 2021, 12:25 a.m. UTC | #4
On Tue, Sep 14 2021, Jeff King wrote:

> While looking at [1], I noticed that v2 servers will read a few bits of
> client input into strvecs. Even though we expect these to be small-ish,
> there's nothing preventing a client from sending us a bunch of junk and
> wasting memory.

[...]

>
> [1] https://lore.kernel.org/git/YT54CNYgtGcqexwq@coredump.intra.peff.net/

When I first read this I expected it to be a link to
https://lore.kernel.org/git/YPCsDLoiiAG%2FC%2Fft@coredump.intra.peff.net/;
i.e. that the object-info leak discussion back in July had encouraged
you to work on this ... :)
Jeff King Sept. 15, 2021, 4:41 p.m. UTC | #5
On Wed, Sep 15, 2021 at 02:25:49AM +0200, Ævar Arnfjörð Bjarmason wrote:

> On Tue, Sep 14 2021, Jeff King wrote:
> 
> > While looking at [1], I noticed that v2 servers will read a few bits of
> > client input into strvecs. Even though we expect these to be small-ish,
> > there's nothing preventing a client from sending us a bunch of junk and
> > wasting memory.
> 
> [...]
> 
> >
> > [1] https://lore.kernel.org/git/YT54CNYgtGcqexwq@coredump.intra.peff.net/
> 
> When I first read this I expected it to be a link to
> https://lore.kernel.org/git/YPCsDLoiiAG%2FC%2Fft@coredump.intra.peff.net/;
> i.e. that the object-info leak discussion back in July had encouraged
> you to work on this ... :)

Nope, I got terrified by the "yes | upload-pack" example I showed
earlier. :)

I was really worried it could turn into a heap overflow, but it turns
out that it cannot. But I still think tightening things up (and avoiding
any memory-consumption attacks) is worth doing.

-Peff