diff mbox series

[1/3] is_promisor_object(): free tree buffer after parsing

Message ID YHVFKgn7WN76QnRz@coredump.intra.peff.net (mailing list archive)
State Accepted
Commit fcc07e980b344a813b47358d0aa823d07c7ac744
Headers show
Series low-hanging performance fruit with promisor packs | expand

Commit Message

Jeff King April 13, 2021, 7:15 a.m. UTC
To get the list of all promisor objects, we not only include all objects
in promisor packs, but also parse each of those objects to see which
objects they reference. After parsing a tree object, the tree->buffer
field will remain populated until we explicitly free it. So in a partial
clone of blob:none, for example, we are essentially reading every tree
in the repository (since they're all in the initial promisor pack), and
keeping all of their uncompressed contents in memory at once.

This patch frees the tree buffers after we've finished marking all of
their reachable objects. We shouldn't need to do this for any other
object type. While we are using some extra memory to store the structs,
no other object type stores the whole contents in its parsed form (we do
sometimes hold on to commit buffers, but less so these days due to
commit graphs, plus most commands which care about promisor objects turn
off the save_commit_buffer global).

Even for a moderate-sized repository like git.git, this patch drops the
peak heap (as measured by massif) for git-fsck from ~1.7GB to ~138MB.
Fsck is a good candidate for measuring here because it doesn't interact
with the promisor code except to call is_promisor_object(), so we can
isolate just this problem.

The added perf test shows only a tiny improvement on my machine for
git.git, since 1.7GB isn't enough to cause any real memory pressure:

  Test                                 HEAD^               HEAD
  --------------------------------------------------------------------------------
  5600.4: fsck                         21.26(20.90+0.35)   20.84(20.79+0.04) -2.0%

With linux.git the absolute change is a bit bigger, though still a small
percentage:

  Test                          HEAD^                 HEAD
  -----------------------------------------------------------------------------
  5600.4: fsck                  262.26(259.13+3.12)   254.92(254.62+0.29) -2.8%

I didn't have the patience to run it under massif with linux.git, but
it's probably on the order of about 14GB improvement, since that's the
sum of the sizes of all of the uncompressed trees (but still isn't
enough to create memory pressure on this particular machine, which has
64GB of RAM). Smaller machines would probably see a bigger effect on
runtime (and sadly our perf suite does not measure peak heap).

Signed-off-by: Jeff King <peff@peff.net>
---
 packfile.c                    | 1 +
 t/perf/p5600-partial-clone.sh | 4 ++++
 2 files changed, 5 insertions(+)

Comments

Junio C Hamano April 13, 2021, 8:17 p.m. UTC | #1
Jeff King <peff@peff.net> writes:

> The added perf test shows only a tiny improvement on my machine for
> git.git, since 1.7GB isn't enough to cause any real memory pressure:
>
>   Test                                 HEAD^               HEAD
>   --------------------------------------------------------------------------------
>   5600.4: fsck                         21.26(20.90+0.35)   20.84(20.79+0.04) -2.0%
>
> With linux.git the absolute change is a bit bigger, though still a small
> percentage:
>
>   Test                          HEAD^                 HEAD
>   -----------------------------------------------------------------------------
>   5600.4: fsck                  262.26(259.13+3.12)   254.92(254.62+0.29) -2.8%
>
> I didn't have the patience to run it under massif with linux.git, but
> it's probably on the order of about 14GB improvement, since that's the
> sum of the sizes of all of the uncompressed trees (but still isn't
> enough to create memory pressure on this particular machine, which has
> 64GB of RAM). Smaller machines would probably see a bigger effect on
> runtime (and sadly our perf suite does not measure peak heap).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  packfile.c                    | 1 +
>  t/perf/p5600-partial-clone.sh | 4 ++++
>  2 files changed, 5 insertions(+)
>
> diff --git a/packfile.c b/packfile.c
> index 8668345d93..b79cbc8cd4 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -2247,6 +2247,7 @@ static int add_promisor_object(const struct object_id *oid,
>  			return 0;
>  		while (tree_entry_gently(&desc, &entry))
>  			oidset_insert(set, &entry.oid);
> +		free_tree_buffer(tree);
>  	} else if (obj->type == OBJ_COMMIT) {
>  		struct commit *commit = (struct commit *) obj;
>  		struct commit_list *parents = commit->parents;

Hmph, does an added free() without removing one later mean we've
been leaking?

Nicely done.  Thanks.


> diff --git a/t/perf/p5600-partial-clone.sh b/t/perf/p5600-partial-clone.sh
> index 3e04bd2ae1..754aaec3dc 100755
> --- a/t/perf/p5600-partial-clone.sh
> +++ b/t/perf/p5600-partial-clone.sh
> @@ -23,4 +23,8 @@ test_perf 'checkout of result' '
>  	git -C worktree checkout -f
>  '
>  
> +test_perf 'fsck' '
> +	git -C bare.git fsck
> +'
> +
>  test_done
Jeff King April 14, 2021, 5:18 a.m. UTC | #2
On Tue, Apr 13, 2021 at 01:17:55PM -0700, Junio C Hamano wrote:

> > diff --git a/packfile.c b/packfile.c
> > index 8668345d93..b79cbc8cd4 100644
> > --- a/packfile.c
> > +++ b/packfile.c
> > @@ -2247,6 +2247,7 @@ static int add_promisor_object(const struct object_id *oid,
> >  			return 0;
> >  		while (tree_entry_gently(&desc, &entry))
> >  			oidset_insert(set, &entry.oid);
> > +		free_tree_buffer(tree);
> >  	} else if (obj->type == OBJ_COMMIT) {
> >  		struct commit *commit = (struct commit *) obj;
> >  		struct commit_list *parents = commit->parents;
> 
> Hmph, does an added free() without removing one later mean we've
> been leaking?

Yes. Though perhaps not technically a leak, in that we are still holding
on to the "struct tree" entries via the obj_hash table. But nobody was
freeing them at all until the end of the program.

I actually think it may be a mistake for "struct tree" to have
buffer/len fields at all. It is a slight convenience to be able to pass
them around with the struct, but it makes the expected lifetime much
more confusing. In practice, all code wants to deal with one tree at a
time, then drop the buffer when it's done (we might hold several when
recursing through subtrees, but we'd never hold more than the distance
from the leaf to the root, and each recursive invocation of something
like process_tree() is holding exactly one tree buffer).

It may not be worth the trouble to try to clean it up at this point,
though.

-Peff
diff mbox series

Patch

diff --git a/packfile.c b/packfile.c
index 8668345d93..b79cbc8cd4 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2247,6 +2247,7 @@  static int add_promisor_object(const struct object_id *oid,
 			return 0;
 		while (tree_entry_gently(&desc, &entry))
 			oidset_insert(set, &entry.oid);
+		free_tree_buffer(tree);
 	} else if (obj->type == OBJ_COMMIT) {
 		struct commit *commit = (struct commit *) obj;
 		struct commit_list *parents = commit->parents;
diff --git a/t/perf/p5600-partial-clone.sh b/t/perf/p5600-partial-clone.sh
index 3e04bd2ae1..754aaec3dc 100755
--- a/t/perf/p5600-partial-clone.sh
+++ b/t/perf/p5600-partial-clone.sh
@@ -23,4 +23,8 @@  test_perf 'checkout of result' '
 	git -C worktree checkout -f
 '
 
+test_perf 'fsck' '
+	git -C bare.git fsck
+'
+
 test_done