From patchwork Sat Sep 7 05:01:33 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Jeff King
X-Patchwork-Id: 11136223
Date: Sat, 7 Sep 2019 01:01:33 -0400
From: Jeff King
To: git@vger.kernel.org
Cc: SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Taylor Blau
Subject: [PATCH 1/2] commit-graph: don't show progress percentages while
 expanding reachable commits
Message-ID: <20190907050132.GA23904@sigill.intra.peff.net>
References: <20190907045848.GA24515@sigill.intra.peff.net>
Content-Disposition: inline
In-Reply-To: <20190907045848.GA24515@sigill.intra.peff.net>
Sender: git-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: git@vger.kernel.org

From: SZEDER Gábor

Commit 49bbc57a57 (commit-graph write: emit a percentage for all
progress, 2019-01-19) was a bit overeager when it added progress
percentages to the "Expanding reachable commits in commit graph" phase
as well, because most of the time the number of commits that phase has
to iterate over is not known in advance and grows significantly, and,
consequently, we end up with nonsensical numbers:

  $ git commit-graph write --reachable
  Expanding reachable commits in commit graph: 138606% (824706/595), done.
  [...]

  $ git rev-parse v5.0 | git commit-graph write --stdin-commits
  Expanding reachable commits in commit graph: 81264400% (812644/1), done.
  [...]

Even worse, because the percentage grows so quickly, the progress code
outputs much more often than it should (because it ticks every second,
or every 1%), slowing the whole process down. My time for "git
commit-graph write --reachable" on linux.git went from 13.463s to
12.521s with this patch, ~7% savings.

Therefore, don't show progress percentages in the "Expanding reachable
commits in commit graph" phase.

Note that the current code does sometimes do the right thing, if we
picked up all commits initially (e.g., omitting "--reachable" in a
fully-packed repository would get the correct count without any parent
traversal). So it may be possible to come up with a way to tell when we
could use a percentage here. But in the meantime, let's make sure we
robustly avoid printing nonsense.

Signed-off-by: SZEDER Gábor
Signed-off-by: Jeff King
---
Compared to the original from:

  https://public-inbox.org/git/20190322102817.19708-1-szeder.dev@gmail.com/

I rebased it to handle code movement, added in the timing data, and
tried to summarize the discussion from the thread.

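As a rough illustration of the failure mode described above (a toy
model, not git's code): if the total is sampled once before the loop,
but the list keeps growing while it is walked, the computed percentage
runs far past 100. All names below (percent_of, walk_linear_history,
format_progress) are hypothetical and exist only for this sketch. The
"total of 0 means count only" convention is assumed here to mirror what
the patch relies on.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage as a naive meter would compute it from a fixed total. */
unsigned long percent_of(unsigned long done, unsigned long total)
{
	assert(total > 0);
	return done * 100 / total;
}

struct walk_result {
	unsigned long initial_total; /* list size when the meter started */
	unsigned long visited;       /* entries actually iterated over */
};

/*
 * Toy model of expanding a linear history of `chain_len` commits: the
 * list starts with only the tip, and visiting an entry appends its
 * parent (if any) to the end of the same list.
 */
struct walk_result walk_linear_history(unsigned long chain_len)
{
	struct walk_result res = { 0, 0 };
	unsigned long nr = chain_len ? 1 : 0; /* current list length */
	unsigned long i;

	res.initial_total = nr;
	for (i = 0; i < nr; i++) {
		res.visited = i + 1;
		if (nr < chain_len)
			nr++; /* "parent" appended while we iterate */
	}
	return res;
}

/* A total of 0 means "unknown": print the bare count, no percentage. */
int format_progress(char *buf, size_t len, const char *title,
		    unsigned long done, unsigned long total)
{
	assert(buf && title);
	if (total)
		return snprintf(buf, len, "%s: %lu%% (%lu/%lu)", title,
				percent_of(done, total), done, total);
	return snprintf(buf, len, "%s: %lu", title, done);
}
```

In this model, walk_linear_history(812644) starts from a list of one
entry and visits 812644, so percent_of() on those two numbers gives
81264400, the same shape as the --stdin-commits output quoted above.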
 commit-graph.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index f2888c203b..d6a5c8cf1c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1050,7 +1050,7 @@ static void close_reachable(struct write_commit_graph_context *ctx)
 	if (ctx->report_progress)
 		ctx->progress = start_delayed_progress(
 					_("Expanding reachable commits in commit graph"),
-					ctx->oids.nr);
+					0);
 	for (i = 0; i < ctx->oids.nr; i++) {
 		display_progress(ctx->progress, i + 1);
 		commit = lookup_commit(ctx->r, &ctx->oids.list[i]);

From patchwork Sat Sep 7 05:04:40 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeff King
X-Patchwork-Id: 11136225
Date: Sat, 7 Sep 2019 01:04:40 -0400
From: Jeff King
To: git@vger.kernel.org
Cc: SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Taylor Blau
Subject: [PATCH 2/2] commit-graph: turn off save_commit_buffer
Message-ID: <20190907050439.GB23904@sigill.intra.peff.net>
References: <20190907045848.GA24515@sigill.intra.peff.net>
Content-Disposition: inline
In-Reply-To: <20190907045848.GA24515@sigill.intra.peff.net>
Sender: git-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: git@vger.kernel.org

The commit-graph tool may read a lot of commits, but it only cares about
parsing their metadata (parents, trees, etc) and doesn't ever show the
messages to the user. And so it should not need save_commit_buffer,
which is meant for holding onto the object data of parsed commits so
that we can show them later. In fact, it's quite harmful to do so.

According to massif, the max heap of "git commit-graph write
--reachable" in linux.git before/after this patch (removing the commit
graph file in between) goes from ~1.1GB to ~270MB. Which isn't
surprising, since the difference is about the sum of the uncompressed
sizes of all commits in the repository, and this was equivalent to
leaking them.

This obviously helps if you're under memory pressure, but even without
it, things go faster. My before/after times for that command (without
massif) went from 12.521s to 11.874s, a speedup of ~5%.

Signed-off-by: Jeff King
---
We didn't actually notice this on linux.git, but rather on a repository
with 130 million commits (don't ask). With this patch, I was able to
generate the commit-graph file with a peak heap of ~25GB, which is ~200
bytes per commit.

I'll bet we could do better with some effort, but obviously this case
was just pathological. For most cases this should be cheaper than a
normal repack (which probably spends that much memory on each object,
not just commits).

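To make the memory argument above concrete, here is a minimal sketch of
the trade-off, assuming a toy parser rather than git's real one. The
flag keep_raw_buffer, struct toy_commit, toy_parse() and toy_release()
are all hypothetical names invented for this illustration. The flag
only plays the role that save_commit_buffer plays in the description
above: when set, every parsed commit keeps a copy of its full text,
even though only the parent count is ever used.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag standing in for the role of save_commit_buffer. */
int keep_raw_buffer = 1;

struct toy_commit {
	size_t parent_count; /* the metadata we actually need */
	char *raw;           /* full object text, or NULL if not kept */
};

/*
 * Parse the header of a NUL-terminated commit `text` into `c`, counting
 * "parent " lines. Returns the number of bytes still held by the
 * retained raw copy (0 when nothing is kept).
 */
size_t toy_parse(struct toy_commit *c, const char *text)
{
	const char *p = text;
	size_t len;

	assert(c && text);
	len = strlen(text);
	c->parent_count = 0;
	c->raw = NULL;

	/* Header lines end at the first blank line. */
	while (*p && *p != '\n') {
		const char *eol = strchr(p, '\n');
		if (!strncmp(p, "parent ", 7))
			c->parent_count++;
		if (!eol)
			break;
		p = eol + 1;
	}

	if (!keep_raw_buffer)
		return 0;
	c->raw = malloc(len + 1);
	if (!c->raw)
		return 0;
	memcpy(c->raw, text, len + 1);
	return len + 1;
}

/* Drop any retained copy of the raw text. */
void toy_release(struct toy_commit *c)
{
	free(c->raw);
	c->raw = NULL;
}
```

With the flag left on, the bytes returned by toy_parse() add up across
every commit visited, which is the "sum of the uncompressed sizes"
effect described above. With it off, the parsed metadata is identical
and nothing extra stays on the heap.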
 builtin/commit-graph.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 57863619b7..052696f1af 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -251,6 +251,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 			     builtin_commit_graph_usage,
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 
+	save_commit_buffer = 0;
+
 	if (argc > 0) {
 		if (!strcmp(argv[0], "read"))
 			return graph_read(argc, argv);