Message ID | 20240228223907.GI1158131@coredump.intra.peff.net
---|---
State | Accepted
Commit | 6cd05e768b7e54ca48b16fb0214df4c70aecd46c
Series | bound upload-pack memory allocations
On Wed, Feb 28, 2024 at 05:39:07PM -0500, Jeff King wrote:

> When a client sends us a "want" or "have" line, we call parse_object() to get an object struct. If the object is a tree, then the parsed state means that tree->buffer points to the uncompressed contents of the tree. But we don't really care about it. We only really need to parse commits and tags; for trees and blobs, the important output is just a "struct object" with the correct type.
>
> But much worse, we do not ever free that tree buffer. It's not leaked in the traditional sense, in that we still have a pointer to it from the global object hash. But if the client requests many trees, we'll hold all of their contents in memory at the same time.
>
> Nobody really noticed because it's rare for clients to directly request a tree. It might happen for a lightweight tag pointing straight at a tree, or it might happen for a "tree:depth" partial clone filling in missing trees.
>
> But it's also possible for a malicious client to request a lot of trees, causing upload-pack's memory to balloon. For example, without this patch, requesting every tree in git.git like:
>
>   pktline() {
>     local msg="$*"
>     printf "%04x%s\n" $((1+4+${#msg})) "$msg"
>   }
>
>   want_trees() {
>     pktline command=fetch
>     printf 0001
>     git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
>     while read oid type; do
>       test "$type" = "tree" || continue
>       pktline want $oid
>     done
>     pktline done
>     printf 0000
>   }
>
>   want_trees | GIT_PROTOCOL=version=2 valgrind --tool=massif ./git upload-pack . >/dev/null
>
> shows a peak heap usage of ~3.7GB. Which is just about the sum of the sizes of all of the uncompressed trees. For linux.git, it's closer to 17GB.
>
> So the obvious thing to do is to call free_tree_buffer() after we realize that we've parsed a tree. We know that upload-pack won't need it later. But let's push the logic into parse_object_with_flags(), telling it to discard the tree buffer immediately. There are two reasons for this. One, all of the relevant call-sites already call the with_options variant to pass the SKIP_HASH flag. So it actually ends up as less code than manually free-ing in each spot. And two, it enables an extra optimization that I'll discuss below.
>
> I've touched all of the sites that currently use SKIP_HASH in upload-pack. That drops the peak heap of the upload-pack invocation above from 3.7GB to ~24MB.
>
> I've also modified the caller in get_reference(); a partial clone benefits from its use in pack-objects for the reasons given in 0bc2557951 (upload-pack: skip parse-object re-hashing of "want" objects, 2022-09-06), where we were measuring blob requests. But note that the results of get_reference() are used for traversing, as well; so we really would _eventually_ use the tree contents. That makes this at first glance a space/time tradeoff: we won't hold all of the trees in memory at once, but we'll have to reload them each when it comes time to traverse.
>
> And here's where our extra optimization comes in. If the caller is not going to immediately look at the tree contents, and it doesn't care about checking the hash, then parse_object() can simply skip loading the tree entirely, just like we do for blobs! And now it's not a space/time tradeoff in get_reference() anymore. It's just a lazy-load: we're delaying reading the tree contents until it's time to actually traverse them one by one.
>
> And of course for upload-pack, this optimization means we never load the trees at all, saving lots of CPU time. Timing the "every tree from git.git" request above shows upload-pack dropping from 32 seconds of CPU to 19 (the remainder is mostly due to pack-objects actually sending the pack; timing just the upload-pack portion shows we go from 13s to ~0.28s).
>
> These are all highly gamed numbers, of course. For real-world partial-clone requests we're saving only a small bit of time in practice. But it does help harden upload-pack against malicious denial-of-service attacks.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  object.c      | 14 ++++++++++++++
>  object.h      |  1 +
>  revision.c    |  3 ++-
>  upload-pack.c |  9 ++++++---
>  4 files changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/object.c b/object.c
> index e6a1c4d905..f11c59ac0c 100644
> --- a/object.c
> +++ b/object.c
> @@ -271,6 +271,7 @@ struct object *parse_object_with_flags(struct repository *r,
>  				       enum parse_object_flags flags)
>  {
>  	int skip_hash = !!(flags & PARSE_OBJECT_SKIP_HASH_CHECK);
> +	int discard_tree = !!(flags & PARSE_OBJECT_DISCARD_TREE);
>  	unsigned long size;
>  	enum object_type type;
>  	int eaten;
> @@ -298,6 +299,17 @@ struct object *parse_object_with_flags(struct repository *r,
>  		return lookup_object(r, oid);
>  	}
>
> +	/*
> +	 * If the caller does not care about the tree buffer and does not
> +	 * care about checking the hash, we can simply verify that we
> +	 * have the on-disk object with the correct type.
> +	 */
> +	if (skip_hash && discard_tree &&
> +	    (!obj || obj->type == OBJ_TREE) &&
> +	    oid_object_info(r, oid, NULL) == OBJ_TREE) {
> +		return &lookup_tree(r, oid)->object;
> +	}

The other condition for blobs does the same, but the condition here confuses me. Why do we call `oid_object_info()` if we have already figured out that `obj->type == OBJ_TREE`? Feels like wasted effort if the in-memory object has been determined to be a tree already anyway.

I'd rather have expected it to look like the following:

    if (skip_hash && discard_tree &&
        ((obj && obj->type == OBJ_TREE) ||
         (!obj && oid_object_info(r, oid, NULL)) == OBJ_TREE)) {
            return &lookup_tree(r, oid)->object;
    }

Am I missing some side effect that `oid_object_info()` provides?

Patrick

> +
>  	buffer = repo_read_object_file(r, oid, &type, &size);
>  	if (buffer) {
>  		if (!skip_hash &&
> @@ -311,6 +323,8 @@ struct object *parse_object_with_flags(struct repository *r,
>  					     buffer, &eaten);
>  		if (!eaten)
>  			free(buffer);
> +		if (discard_tree && type == OBJ_TREE)
> +			free_tree_buffer((struct tree *)obj);
>  		return obj;
>  	}
>  	return NULL;
> diff --git a/object.h b/object.h
> index 114d45954d..c7123cade6 100644
> --- a/object.h
> +++ b/object.h
> @@ -197,6 +197,7 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet);
>   */
>  enum parse_object_flags {
>  	PARSE_OBJECT_SKIP_HASH_CHECK = 1 << 0,
> +	PARSE_OBJECT_DISCARD_TREE = 1 << 1,
>  };
>  struct object *parse_object(struct repository *r, const struct object_id *oid);
>  struct object *parse_object_with_flags(struct repository *r,
> diff --git a/revision.c b/revision.c
> index 2424c9bd67..b10f63a607 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -381,7 +381,8 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
>
>  	object = parse_object_with_flags(revs->repo, oid,
>  					 revs->verify_objects ? 0 :
> -					 PARSE_OBJECT_SKIP_HASH_CHECK);
> +					 PARSE_OBJECT_SKIP_HASH_CHECK |
> +					 PARSE_OBJECT_DISCARD_TREE);
>
>  	if (!object) {
>  		if (revs->ignore_missing)
> diff --git a/upload-pack.c b/upload-pack.c
> index b721155442..761af4a532 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -470,7 +470,8 @@ static int do_got_oid(struct upload_pack_data *data, const struct object_id *oid
>  {
>  	int we_knew_they_have = 0;
>  	struct object *o = parse_object_with_flags(the_repository, oid,
> -						   PARSE_OBJECT_SKIP_HASH_CHECK);
> +						   PARSE_OBJECT_SKIP_HASH_CHECK |
> +						   PARSE_OBJECT_DISCARD_TREE);
>
>  	if (!o)
>  		die("oops (%s)", oid_to_hex(oid));
> @@ -1150,7 +1151,8 @@ static void receive_needs(struct upload_pack_data *data,
>  		}
>
>  		o = parse_object_with_flags(the_repository, &oid_buf,
> -					    PARSE_OBJECT_SKIP_HASH_CHECK);
> +					    PARSE_OBJECT_SKIP_HASH_CHECK |
> +					    PARSE_OBJECT_DISCARD_TREE);
>  		if (!o) {
>  			packet_writer_error(&data->writer,
>  					    "upload-pack: not our ref %s",
> @@ -1467,7 +1469,8 @@ static int parse_want(struct packet_writer *writer, const char *line,
>  			    "expected to get oid, not '%s'", line);
>
>  		o = parse_object_with_flags(the_repository, &oid,
> -					    PARSE_OBJECT_SKIP_HASH_CHECK);
> +					    PARSE_OBJECT_SKIP_HASH_CHECK |
> +					    PARSE_OBJECT_DISCARD_TREE);
>
>  		if (!o) {
>  			packet_writer_error(writer,
> --
> 2.44.0.rc2.424.gbdbf4d014b
On Mon, Mar 04, 2024 at 09:33:57AM +0100, Patrick Steinhardt wrote:

> > +	if (skip_hash && discard_tree &&
> > +	    (!obj || obj->type == OBJ_TREE) &&
> > +	    oid_object_info(r, oid, NULL) == OBJ_TREE) {
> > +		return &lookup_tree(r, oid)->object;
> > +	}
>
> The other condition for blobs does the same, but the condition here confuses me. Why do we call `oid_object_info()` if we have already figured out that `obj->type == OBJ_TREE`? Feels like wasted effort if the in-memory object has been determined to be a tree already anyway.
>
> I'd rather have expected it to look like the following:
>
>     if (skip_hash && discard_tree &&
>         ((obj && obj->type == OBJ_TREE) ||
>          (!obj && oid_object_info(r, oid, NULL)) == OBJ_TREE)) {
>             return &lookup_tree(r, oid)->object;
>     }
>
> Am I missing some side effect that `oid_object_info()` provides?

Calling oid_object_info() will make sure the on-disk object exists and has the expected type. Keep in mind that an in-memory "struct object" may have a type that was just implied by another reference. E.g., if a commit references some object X in its tree field, then we'll call lookup_tree(X) to get a "struct tree" without actually touching the odb at all. When it comes time to parse that object, that's when we'll see if we really have it and if it's a tree.

In the case of skip_hash (and discard_tree) it might be OK to skip both of those checks. If we do, I think we should probably do the same for blobs (in the skip_hash case, we could just return the object we found already).

But I'd definitely prefer to do that as a separate step (if at all).

-Peff
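[Editor's note] Jeff's distinction between an implied in-memory type and the verified on-disk type can be sketched with a small toy model. This is illustrative Python only, not git's C code; `odb`, `lookup`, and `parse` are invented names standing in for the object database, `lookup_tree()`-style lookups, and `parse_object()` respectively:

```python
# Toy model of git's in-memory object hash (illustrative only; not git code).
# lookup() records a type merely *implied* by a referencing object, without
# touching the "odb"; parse() is the first point where the on-disk type is
# actually checked, which is what oid_object_info() provides in the patch.

odb = {"abc123": "blob"}   # what is actually on disk
obj_hash = {}              # in-memory structs, keyed by oid

def lookup(oid, implied_type):
    """Create or find an in-memory struct; no disk access happens here."""
    return obj_hash.setdefault(
        oid, {"oid": oid, "type": implied_type, "parsed": False})

def parse(oid):
    """Verify the object exists on disk and has the implied type."""
    obj = obj_hash[oid]
    on_disk = odb.get(oid)
    if on_disk is None:
        raise KeyError(f"missing object {oid}")
    if on_disk != obj["type"]:
        raise TypeError(f"object {oid} is a {on_disk}, not a {obj['type']}")
    obj["parsed"] = True
    return obj

# A referencing object implies "abc123" is a tree; lookup succeeds blindly...
t = lookup("abc123", "tree")
# ...but parsing notices that the on-disk object is really a blob.
try:
    parse("abc123")
except TypeError as e:
    print(e)
```

This is why trusting `obj->type` alone, as in Patrick's suggested condition, would skip the existence and type verification that `oid_object_info()` performs.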
On Mon, Mar 04, 2024 at 04:57:36AM -0500, Jeff King wrote:

> On Mon, Mar 04, 2024 at 09:33:57AM +0100, Patrick Steinhardt wrote:
>
> > > +	if (skip_hash && discard_tree &&
> > > +	    (!obj || obj->type == OBJ_TREE) &&
> > > +	    oid_object_info(r, oid, NULL) == OBJ_TREE) {
> > > +		return &lookup_tree(r, oid)->object;
> > > +	}
> >
> > The other condition for blobs does the same, but the condition here confuses me. Why do we call `oid_object_info()` if we have already figured out that `obj->type == OBJ_TREE`? Feels like wasted effort if the in-memory object has been determined to be a tree already anyway.
> >
> > I'd rather have expected it to look like the following:
> >
> >     if (skip_hash && discard_tree &&
> >         ((obj && obj->type == OBJ_TREE) ||
> >          (!obj && oid_object_info(r, oid, NULL)) == OBJ_TREE)) {
> >             return &lookup_tree(r, oid)->object;
> >     }
> >
> > Am I missing some side effect that `oid_object_info()` provides?
>
> Calling oid_object_info() will make sure the on-disk object exists and has the expected type. Keep in mind that an in-memory "struct object" may have a type that was just implied by another reference. E.g., if a commit references some object X in its tree field, then we'll call lookup_tree(X) to get a "struct tree" without actually touching the odb at all. When it comes time to parse that object, that's when we'll see if we really have it and if it's a tree.
>
> In the case of skip_hash (and discard_tree) it might be OK to skip both of those checks. If we do, I think we should probably do the same for blobs (in the skip_hash case, we could just return the object we found already).
>
> But I'd definitely prefer to do that as a separate step (if at all).

Thanks for the explanation!

Patrick
diff --git a/object.c b/object.c
index e6a1c4d905..f11c59ac0c 100644
--- a/object.c
+++ b/object.c
@@ -271,6 +271,7 @@ struct object *parse_object_with_flags(struct repository *r,
 				       enum parse_object_flags flags)
 {
 	int skip_hash = !!(flags & PARSE_OBJECT_SKIP_HASH_CHECK);
+	int discard_tree = !!(flags & PARSE_OBJECT_DISCARD_TREE);
 	unsigned long size;
 	enum object_type type;
 	int eaten;
@@ -298,6 +299,17 @@ struct object *parse_object_with_flags(struct repository *r,
 		return lookup_object(r, oid);
 	}
 
+	/*
+	 * If the caller does not care about the tree buffer and does not
+	 * care about checking the hash, we can simply verify that we
+	 * have the on-disk object with the correct type.
+	 */
+	if (skip_hash && discard_tree &&
+	    (!obj || obj->type == OBJ_TREE) &&
+	    oid_object_info(r, oid, NULL) == OBJ_TREE) {
+		return &lookup_tree(r, oid)->object;
+	}
+
 	buffer = repo_read_object_file(r, oid, &type, &size);
 	if (buffer) {
 		if (!skip_hash &&
@@ -311,6 +323,8 @@ struct object *parse_object_with_flags(struct repository *r,
 					     buffer, &eaten);
 		if (!eaten)
 			free(buffer);
+		if (discard_tree && type == OBJ_TREE)
+			free_tree_buffer((struct tree *)obj);
 		return obj;
 	}
 	return NULL;
diff --git a/object.h b/object.h
index 114d45954d..c7123cade6 100644
--- a/object.h
+++ b/object.h
@@ -197,6 +197,7 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet);
  */
 enum parse_object_flags {
 	PARSE_OBJECT_SKIP_HASH_CHECK = 1 << 0,
+	PARSE_OBJECT_DISCARD_TREE = 1 << 1,
 };
 struct object *parse_object(struct repository *r, const struct object_id *oid);
 struct object *parse_object_with_flags(struct repository *r,
diff --git a/revision.c b/revision.c
index 2424c9bd67..b10f63a607 100644
--- a/revision.c
+++ b/revision.c
@@ -381,7 +381,8 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
 
 	object = parse_object_with_flags(revs->repo, oid,
 					 revs->verify_objects ? 0 :
-					 PARSE_OBJECT_SKIP_HASH_CHECK);
+					 PARSE_OBJECT_SKIP_HASH_CHECK |
+					 PARSE_OBJECT_DISCARD_TREE);
 
 	if (!object) {
 		if (revs->ignore_missing)
diff --git a/upload-pack.c b/upload-pack.c
index b721155442..761af4a532 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -470,7 +470,8 @@ static int do_got_oid(struct upload_pack_data *data, const struct object_id *oid
 {
 	int we_knew_they_have = 0;
 	struct object *o = parse_object_with_flags(the_repository, oid,
-						   PARSE_OBJECT_SKIP_HASH_CHECK);
+						   PARSE_OBJECT_SKIP_HASH_CHECK |
+						   PARSE_OBJECT_DISCARD_TREE);
 
 	if (!o)
 		die("oops (%s)", oid_to_hex(oid));
@@ -1150,7 +1151,8 @@ static void receive_needs(struct upload_pack_data *data,
 		}
 
 		o = parse_object_with_flags(the_repository, &oid_buf,
-					    PARSE_OBJECT_SKIP_HASH_CHECK);
+					    PARSE_OBJECT_SKIP_HASH_CHECK |
+					    PARSE_OBJECT_DISCARD_TREE);
 		if (!o) {
 			packet_writer_error(&data->writer,
 					    "upload-pack: not our ref %s",
@@ -1467,7 +1469,8 @@ static int parse_want(struct packet_writer *writer, const char *line,
 			    "expected to get oid, not '%s'", line);
 
 		o = parse_object_with_flags(the_repository, &oid,
-					    PARSE_OBJECT_SKIP_HASH_CHECK);
+					    PARSE_OBJECT_SKIP_HASH_CHECK |
+					    PARSE_OBJECT_DISCARD_TREE);
 
 		if (!o) {
 			packet_writer_error(writer,
When a client sends us a "want" or "have" line, we call parse_object() to get an object struct. If the object is a tree, then the parsed state means that tree->buffer points to the uncompressed contents of the tree. But we don't really care about it. We only really need to parse commits and tags; for trees and blobs, the important output is just a "struct object" with the correct type.

But much worse, we do not ever free that tree buffer. It's not leaked in the traditional sense, in that we still have a pointer to it from the global object hash. But if the client requests many trees, we'll hold all of their contents in memory at the same time.

Nobody really noticed because it's rare for clients to directly request a tree. It might happen for a lightweight tag pointing straight at a tree, or it might happen for a "tree:depth" partial clone filling in missing trees.

But it's also possible for a malicious client to request a lot of trees, causing upload-pack's memory to balloon. For example, without this patch, requesting every tree in git.git like:

  pktline() {
    local msg="$*"
    printf "%04x%s\n" $((1+4+${#msg})) "$msg"
  }

  want_trees() {
    pktline command=fetch
    printf 0001
    git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
    while read oid type; do
      test "$type" = "tree" || continue
      pktline want $oid
    done
    pktline done
    printf 0000
  }

  want_trees | GIT_PROTOCOL=version=2 valgrind --tool=massif ./git upload-pack . >/dev/null

shows a peak heap usage of ~3.7GB. Which is just about the sum of the sizes of all of the uncompressed trees. For linux.git, it's closer to 17GB.

So the obvious thing to do is to call free_tree_buffer() after we realize that we've parsed a tree. We know that upload-pack won't need it later. But let's push the logic into parse_object_with_flags(), telling it to discard the tree buffer immediately. There are two reasons for this. One, all of the relevant call-sites already call the with_options variant to pass the SKIP_HASH flag. So it actually ends up as less code than manually free-ing in each spot. And two, it enables an extra optimization that I'll discuss below.

I've touched all of the sites that currently use SKIP_HASH in upload-pack. That drops the peak heap of the upload-pack invocation above from 3.7GB to ~24MB.

I've also modified the caller in get_reference(); a partial clone benefits from its use in pack-objects for the reasons given in 0bc2557951 (upload-pack: skip parse-object re-hashing of "want" objects, 2022-09-06), where we were measuring blob requests. But note that the results of get_reference() are used for traversing, as well; so we really would _eventually_ use the tree contents. That makes this at first glance a space/time tradeoff: we won't hold all of the trees in memory at once, but we'll have to reload them each when it comes time to traverse.

And here's where our extra optimization comes in. If the caller is not going to immediately look at the tree contents, and it doesn't care about checking the hash, then parse_object() can simply skip loading the tree entirely, just like we do for blobs! And now it's not a space/time tradeoff in get_reference() anymore. It's just a lazy-load: we're delaying reading the tree contents until it's time to actually traverse them one by one.

And of course for upload-pack, this optimization means we never load the trees at all, saving lots of CPU time. Timing the "every tree from git.git" request above shows upload-pack dropping from 32 seconds of CPU to 19 (the remainder is mostly due to pack-objects actually sending the pack; timing just the upload-pack portion shows we go from 13s to ~0.28s).

These are all highly gamed numbers, of course. For real-world partial-clone requests we're saving only a small bit of time in practice. But it does help harden upload-pack against malicious denial-of-service attacks.

Signed-off-by: Jeff King <peff@peff.net>
---
 object.c      | 14 ++++++++++++++
 object.h      |  1 +
 revision.c    |  3 ++-
 upload-pack.c |  9 ++++++---
 4 files changed, 23 insertions(+), 4 deletions(-)
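[Editor's note] The pkt-line framing used by the reproduction script above can be mirrored in Python. The 4-hex-digit length prefix covers the length field itself, the payload, and the trailing newline, which is exactly the shell helper's `$((1+4+${#msg}))`. A minimal sketch, not git's implementation:

```python
def pktline(msg: str) -> str:
    """Frame one protocol v2 pkt-line: a 4-hex-digit length that counts
    the length field (4), the payload, and the trailing newline (1),
    matching the shell helper's $((1+4+${#msg}))."""
    return "%04x%s\n" % (4 + len(msg) + 1, msg)

# "command=fetch" is 13 bytes; 4 + 13 + 1 = 18 = 0x12.
print(pktline("command=fetch"), end="")
```

Special packets like the delimiter `0001` and the flush `0000` are not length-prefixed payloads, which is why the script emits them with bare `printf` rather than through `pktline`.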