diff mbox series

[8/9] upload-pack: use PARSE_OBJECT_SKIP_HASH_CHECK in more places

Message ID 20240228223903.GH1158131@coredump.intra.peff.net (mailing list archive)
State Accepted
Commit a6ca601cdf82dda85db86c12973f2ecd746cb4bd
Headers show
Series bound upload-pack memory allocations | expand

Commit Message

Jeff King Feb. 28, 2024, 10:39 p.m. UTC
In commit 0bc2557951 (upload-pack: skip parse-object re-hashing of
"want" objects, 2022-09-06), we optimized the parse_object() calls for
v2 "want" lines from the client so that they avoided parsing blobs, and
so that they used the commit-graph rather than parsing commit objects
from scratch.

We should extend that to two other spots:

  1. We parse "have" objects in the got_oid() function. These won't
     generally be non-commits (unlike "want" lines from a partial
     clone). But we still benefit from the use of the commit-graph.

  2. For v0, the "want" lines are parsed in receive_needs(). These are
     also less likely to be non-commits because by default they have to
     be ref tips. There are config options you might set to allow
     non-tip objects, but you'd mostly do so to support partial clones,
     and clients recent enough to support partial clone will generally
     speak v2 anyway.

So I don't expect this change to improve performance much for day-to-day
operations. But both are possible denial-of-service vectors, where an
attacker can waste our time by sending over a large number of objects to
parse (of course we may waste even more time serving a pack to them, but
we try as much as possible to optimize that in pack-objects; we should
do what we can here in upload-pack, too).

With this patch, running p5600 with GIT_TEST_PROTOCOL_VERSION=0 shows
similar results to what we saw in 0bc2557951 (which ran with the v2
protocol by default). Here are the numbers for linux.git:

  Test                          HEAD^                 HEAD
  -----------------------------------------------------------------------------
  5600.3: checkout of result    50.91(87.95+2.93)     41.75(79.00+3.18) -18.0%

Or for a more extreme (and malicious) case, we can claim to "have" every
blob in git.git over the v0 protocol:

  $ {
      echo "0032want $(git rev-parse HEAD)"
      printf 0000
      git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
      perl -alne 'print "0032have $F[0]" if $F[1] eq "blob"'
    } >input

  $ time ./git.old upload-pack . <input >/dev/null
  real	0m52.951s
  user	0m51.633s
  sys	0m1.304s

  $ time ./git.new upload-pack . <input >/dev/null
  real	0m0.261s
  user	0m0.156s
  sys	0m0.105s

(Note that these don't actually compute a pack because of the hacky
protocol usage, so those numbers are representing the raw blob-parsing
effort done by upload-pack).

Signed-off-by: Jeff King <peff@peff.net>
---
 upload-pack.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff mbox series

Patch

diff --git a/upload-pack.c b/upload-pack.c
index 3970bb9b30..b721155442 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -469,7 +469,8 @@  static void create_pack_file(struct upload_pack_data *pack_data,
 static int do_got_oid(struct upload_pack_data *data, const struct object_id *oid)
 {
 	int we_knew_they_have = 0;
-	struct object *o = parse_object(the_repository, oid);
+	struct object *o = parse_object_with_flags(the_repository, oid,
+						   PARSE_OBJECT_SKIP_HASH_CHECK);
 
 	if (!o)
 		die("oops (%s)", oid_to_hex(oid));
@@ -1148,7 +1149,8 @@  static void receive_needs(struct upload_pack_data *data,
 			free(client_sid);
 		}
 
-		o = parse_object(the_repository, &oid_buf);
+		o = parse_object_with_flags(the_repository, &oid_buf,
+					    PARSE_OBJECT_SKIP_HASH_CHECK);
 		if (!o) {
 			packet_writer_error(&data->writer,
 					    "upload-pack: not our ref %s",