Message ID | 20190907050439.GB23904@sigill.intra.peff.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | a few commit-graph improvements | expand |
On Sat, Sep 07, 2019 at 01:04:40AM -0400, Jeff King wrote: > The commit-graph tool may read a lot of commits, but it only cares about > parsing their metadata (parents, trees, etc) and doesn't ever show the > messages to the user. And so it should not need save_commit_buffer, > which is meant for holding onto the object data of parsed commits so > that we can show them later. In fact, it's quite harmful to do so. > According to massif, the max heap of "git commit-graph write > --reachable" in linux.git before/after this patch (removing the commit > graph file in between) goes from ~1.1GB to ~270MB. > > Which isn't surprising, since the difference is about the sum of the > uncompressed sizes of all commits in the repository, and this was > equivalent to leaking them. > > This obviously helps if you're under memory pressure, but even without > it, things go faster. My before/after times for that command (without > massif) went from 12.521s to 11.874s, a speedup of ~5%. > > Signed-off-by: Jeff King <peff@peff.net> > --- > We didn't actually notice this on linux.git, but rather on a repository > with 130 million commits (don't ask). With this patch, I was able to > generate the commit-graph file with a peak heap of ~25GB, which is ~200 > bytes per commit. > > I'll bet we could do better with some effort, but obviously this case > was just pathological. For most cases this should be cheaper than a > normal repack (which probably spends that much memory on each object, > not just commits). > > builtin/commit-graph.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c > index 57863619b7..052696f1af 100644 > --- a/builtin/commit-graph.c > +++ b/builtin/commit-graph.c > @@ -251,6 +251,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix) > builtin_commit_graph_usage, > PARSE_OPT_STOP_AT_NON_OPTION); > > + save_commit_buffer = 0; > + This looks exactly right to me. We had discussed a little bit off-list about where you might place this line, but I think that the spot you picked is perfect: as late as possible. Thankfully, the option parsing code here doesn't load any commits (though even if it did, I don't think that turning on/off 'save_commit_buffer' would really make much of a difference). So, the patch here looks obviously correct, and I don't think it needs a test or anything like that... besides: what is there to test? :). > if (argc > 0) { > if (!strcmp(argv[0], "read")) > return graph_read(argc, argv); > -- > 2.23.0.474.gb1abd76f7a Thanks, Taylor
On Sat, Sep 07, 2019 at 02:56:36PM -0400, Taylor Blau wrote: > So, the patch here looks obviously correct, and I don't think it needs a > test or anything like that... besides: what is there to test? :). There's no functional change, so as long as this has coverage in the regular suite (and I think it does), there's no point in adding a new functional test. However, it may make sense to get some coverage of commit graphs in the perf suite, so that we could detect regressions there. I think that would make sense as a separate series, though, not attached to this particular fix. (It would also be cool if the perf suite could record peak memory usage, but that would take a fair bit of refactoring). -Peff
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 57863619b7..052696f1af 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -251,6 +251,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix) builtin_commit_graph_usage, PARSE_OPT_STOP_AT_NON_OPTION); + save_commit_buffer = 0; + if (argc > 0) { if (!strcmp(argv[0], "read")) return graph_read(argc, argv);
The commit-graph tool may read a lot of commits, but it only cares about parsing their metadata (parents, trees, etc) and doesn't ever show the messages to the user. And so it should not need save_commit_buffer, which is meant for holding onto the object data of parsed commits so that we can show them later. In fact, it's quite harmful to do so. According to massif, the max heap of "git commit-graph write --reachable" in linux.git before/after this patch (removing the commit graph file in between) goes from ~1.1GB to ~270MB. Which isn't surprising, since the difference is about the sum of the uncompressed sizes of all commits in the repository, and this was equivalent to leaking them. This obviously helps if you're under memory pressure, but even without it, things go faster. My before/after times for that command (without massif) went from 12.521s to 11.874s, a speedup of ~5%. Signed-off-by: Jeff King <peff@peff.net> --- We didn't actually notice this on linux.git, but rather on a repository with 130 million commits (don't ask). With this patch, I was able to generate the commit-graph file with a peak heap of ~25GB, which is ~200 bytes per commit. I'll bet we could do better with some effort, but obviously this case was just pathological. For most cases this should be cheaper than a normal repack (which probably spends that much memory on each object, not just commits). builtin/commit-graph.c | 2 ++ 1 file changed, 2 insertions(+)