From patchwork Thu Aug 5 11:25:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 12421027 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0E8D1C4338F for ; Thu, 5 Aug 2021 11:25:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E282661108 for ; Thu, 5 Aug 2021 11:25:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241000AbhHELZq (ORCPT ); Thu, 5 Aug 2021 07:25:46 -0400 Received: from out3-smtp.messagingengine.com ([66.111.4.27]:55903 "EHLO out3-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240623AbhHELZm (ORCPT ); Thu, 5 Aug 2021 07:25:42 -0400 Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id 20B435C00C2; Thu, 5 Aug 2021 07:25:28 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute5.internal (MEProxy); Thu, 05 Aug 2021 07:25:28 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=date :from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; s=fm1; bh=WuoF964+NMkZPHlCpgvNh99sdy3 /bDL1xWbmx3G4QfQ=; b=TYzh82ARzCmhNoP1sPDqJptekuMlsRP5ploOBFbqMeC cvlWt0Mcn+bqn1MxaO9fa8GTu8WpZZ/iGz+xOJpECRFuNDu6Gri7T3jCYiYH+rrp NCogiK1MK772zx5eTDpglRJ8DtU1/TBl3GF7Q8PuhkhbxxGEOm9M4exRHxM960sf RmyahIL64LF2iuRBjLgOPsOR0HEB8LJinKWFtYiWcSH4yLbx6EJsx7mm9vOYgN9V Zr8v9sCILTgMbzoKJ/wcm+kLSwq0PjYBGrqY9quPTXY91i9zPjRGQAS7xApf+ksE CrHROvmB7nJYeqovHzbw3Iig+evV+KscRGl8oVHLN2g== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-me-proxy :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm3; bh=WuoF96 4+NMkZPHlCpgvNh99sdy3/bDL1xWbmx3G4QfQ=; b=Z7RPJhG1v7YoSlzJwNzZLd 5Dxtmz/vDhGdfNckC4YwVejElLcdDPZrf3cAtqCEAMAKg6DpMEX6UNfBa1MzNzel og8OKcxrzA88M6t+z4VfYU6Jtrwci49TNN9exqe+PpSplh0sxYb7T9XnnYmJE65f eCkfQfbhsi3F3Hz/DeUs3Ers9MR7K8r0KKcrOiH6J0v5oOY5PrPpyjFCe4J4uB56 8c0TOeyHubrRp9rlp+U15K2XKp8Lz6R2AKhiaIOnQJlY6zVebjmUzxByYRylM+kj mpss8mDOqg3ykqOJwNsosibu4CTKNPFOeeTZ9R67okDYESPg+vPn0ujV0ZgMzu3Q == X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvtddrieelgdefjecutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpeffhffvuffkfhggtggujgesghdtreertddtvdenucfhrhhomheprfgrthhrihgt khcuufhtvghinhhhrghrughtuceophhssehpkhhsrdhimheqnecuggftrfgrthhtvghrnh epheeghfdtfeeuffehkefgffduleffjedthfdvjeektdfhhedvlefgtefgvdettdfhnecu vehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepphhssehpkh hsrdhimh X-ME-Proxy: Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 5 Aug 2021 07:25:26 -0400 (EDT) Received: from localhost (ncase [10.192.0.11]) by vm-mail.pks.im (OpenSMTPD) with ESMTPSA id 153a1d56 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Thu, 5 Aug 2021 11:25:25 +0000 (UTC) Date: Thu, 5 Aug 2021 13:25:24 +0200 From: Patrick Steinhardt To: git@vger.kernel.org Cc: Jeff King , Felipe Contreras , SZEDER =?iso-8859-1?q?G=E1bor?= , Chris Torek , =?iso-8859-1?q?=C6var_Arnfj=F6r=F0?= Bjarmason , Junio C Hamano , Taylor Blau Subject: [PATCH v4 1/6] revision: separate walk and unsorted flags Message-ID: <67232910acf4a248654060ad51a17af37acba0fb.1628162156.git.ps@pks.im> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The `--no-walk` flag supports two modes: either it sorts the revisions given as input input or it doesn't. This is reflected in a single `no_walk` flag, which reflects one of the three states "walk", "don't walk but without sorting" and "don't walk but with sorting". Split up the flag into two separate bits, one indicating whether we should walk or not and one indicating whether the input should be sorted or not. This will allow us to more easily introduce a new flag `--unsorted-input`, which only impacts the sorting bit. Signed-off-by: Patrick Steinhardt --- builtin/log.c | 2 +- builtin/revert.c | 3 ++- revision.c | 9 +++++---- revision.h | 7 ++----- 4 files changed, 10 insertions(+), 11 deletions(-) diff --git a/builtin/log.c b/builtin/log.c index 3d7717ba5c..f75d87e8d7 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -637,7 +637,7 @@ int cmd_show(int argc, const char **argv, const char *prefix) repo_init_revisions(the_repository, &rev, prefix); rev.diff = 1; rev.always_show_header = 1; - rev.no_walk = REVISION_WALK_NO_WALK_SORTED; + rev.no_walk = 1; rev.diffopt.stat_width = -1; /* Scale to real terminal size */ memset(&opt, 0, sizeof(opt)); diff --git a/builtin/revert.c b/builtin/revert.c index 237f2f18d4..2e13660e4b 100644 --- a/builtin/revert.c +++ b/builtin/revert.c @@ -191,7 +191,8 @@ static int run_sequencer(int argc, const char **argv, struct replay_opts *opts) struct setup_revision_opt s_r_opt; opts->revs = xmalloc(sizeof(*opts->revs)); repo_init_revisions(the_repository, opts->revs, NULL); - opts->revs->no_walk = REVISION_WALK_NO_WALK_UNSORTED; + opts->revs->no_walk = 1; + opts->revs->unsorted_input = 1; if (argc < 2) usage_with_options(usage_str, options); if (!strcmp(argv[1], "-")) diff --git a/revision.c b/revision.c index cddd0542a6..86bbcd10d2 100644 --- a/revision.c +++ b/revision.c @@ -2651,16 +2651,17 @@ static int handle_revision_pseudo_opt(const char *submodule, } else if (!strcmp(arg, "--not")) { *flags ^= UNINTERESTING | BOTTOM; } else if (!strcmp(arg, "--no-walk")) { - revs->no_walk = REVISION_WALK_NO_WALK_SORTED; + revs->no_walk = 1; } else if (skip_prefix(arg, "--no-walk=", &optarg)) { /* * Detached form ("--no-walk X" as opposed to "--no-walk=X") * not allowed, since the argument is optional. */ + revs->no_walk = 1; if (!strcmp(optarg, "sorted")) - revs->no_walk = REVISION_WALK_NO_WALK_SORTED; + revs->unsorted_input = 0; else if (!strcmp(optarg, "unsorted")) - revs->no_walk = REVISION_WALK_NO_WALK_UNSORTED; + revs->unsorted_input = 1; else return error("invalid argument to --no-walk"); } else if (!strcmp(arg, "--do-walk")) { @@ -3584,7 +3585,7 @@ int prepare_revision_walk(struct rev_info *revs) if (!revs->reflog_info) prepare_to_use_bloom_filter(revs); - if (revs->no_walk != REVISION_WALK_NO_WALK_UNSORTED) + if (!revs->unsorted_input) commit_list_sort_by_date(&revs->commits); if (revs->no_walk) return 0; diff --git a/revision.h b/revision.h index fbb068da9f..0c65a760ee 100644 --- a/revision.h +++ b/revision.h @@ -79,10 +79,6 @@ struct rev_cmdline_info { } *rev; }; -#define REVISION_WALK_WALK 0 -#define REVISION_WALK_NO_WALK_SORTED 1 -#define REVISION_WALK_NO_WALK_UNSORTED 2 - struct oidset; struct topo_walk_info; @@ -129,7 +125,8 @@ struct rev_info { /* Traversal flags */ unsigned int dense:1, prune:1, - no_walk:2, + no_walk:1, + unsorted_input:1, remove_empty_trees:1, simplify_history:1, show_pulls:1, From patchwork Thu Aug 5 11:25:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 12421029 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-20.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A1B22C4320A for ; Thu, 5 Aug 2021 11:25:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 718506112D for ; Thu, 5 Aug 2021 11:25:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241004AbhHELZr (ORCPT ); Thu, 5 Aug 2021 07:25:47 -0400 Received: from out3-smtp.messagingengine.com ([66.111.4.27]:44057 "EHLO out3-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240992AbhHELZp (ORCPT ); Thu, 5 Aug 2021 07:25:45 -0400 Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailout.nyi.internal (Postfix) with ESMTP id 846815C0057; Thu, 5 Aug 2021 07:25:31 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute3.internal (MEProxy); Thu, 05 Aug 2021 07:25:31 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=date :from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; s=fm1; bh=wIdT187urxsZ+ugFCKm5eRmv8lR s/tUi4ybukcDS7F8=; b=QfIB4nWElak8JnxEds+WslvjrkIOd122w9weletMUJm 5btuMjJOwz8Et9YgSj7LoE+w8Z9YP7sygWZENpF1kdGRapLFx4yWalZvBPxyPhYX bWYKhzsizwvGNbyt7ORs+znzyJDn9DgSljjicbvFoTnG9QKqDeWlQfhOEJL+/ESz xDXKzkNBLkK5LK9yJlazm9qpC2M4MsuB0mi3l/4lkZHWyOwf3y+wHB7aWsnLWt/Z Jh0esty0TRwyCEHCycHDdGye3YeUCrqTeOMbPjd85kLfNlJp/aXTTYDgpiGf3c4K E/oAmLRM8vpSCvLPxFsy9eVq2usjHfnK2qqyZRGYXIQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-me-proxy :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm3; bh=wIdT18 7urxsZ+ugFCKm5eRmv8lRs/tUi4ybukcDS7F8=; b=IR98Vl0MzLBxcpr2xsDSIF FXnAAqRS/omLU5hpg8OzIEE4TxtkNV3+RaqYUq5ihbQkMKzcuooZnQCD6HRDGT3i cjy94tLNM1yBCOt7k+3D3OW2toU4j6XqYt54TyHrW8wF5Kw5wXUG2khgZ3DGpL5w BYSFPxhPtiLeOEcdOVbiLmaLERCEKiybD9IerYdGQTwXtfP6RqEaxzlVBsd5+VMf 2s1cPt/iWBvHqhtf1eia1w3uSgLx3RiZeCq/1Ol63cxu9fgQTTWcoe2UlcBcY6Gs k8X2Dw/WmsoEAOmhG8d6kO30vJgpWLoQO9VrLLcD/LRQTDEvJGAKqd1j2kdxFoyA == X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvtddrieelgdefkecutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpeffhffvuffkfhggtggujgesghdtreertddtjeenucfhrhhomheprfgrthhrihgt khcuufhtvghinhhhrghrughtuceophhssehpkhhsrdhimheqnecuggftrfgrthhtvghrnh epveduueeitefgieevvefgtdeuueevveeggeevgfegvdeuleeivdelleelkeeuvddunecu ffhomhgrihhnpehgihhtlhgrsgdrtghomhenucevlhhushhtvghrufhiiigvpedtnecurf grrhgrmhepmhgrihhlfhhrohhmpehpshesphhkshdrihhm X-ME-Proxy: Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 5 Aug 2021 07:25:30 -0400 (EDT) Received: from localhost (ncase [10.192.0.11]) by vm-mail.pks.im (OpenSMTPD) with ESMTPSA id a3568f38 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Thu, 5 Aug 2021 11:25:30 +0000 (UTC) Date: Thu, 5 Aug 2021 13:25:28 +0200 From: Patrick Steinhardt To: git@vger.kernel.org Cc: Jeff King , Felipe Contreras , SZEDER =?iso-8859-1?q?G=E1bor?= , Chris Torek , =?iso-8859-1?q?=C6var_Arnfj=F6r=F0?= Bjarmason , Junio C Hamano , Taylor Blau Subject: [PATCH v4 2/6] connected: do not sort input revisions Message-ID: <9d7f484907e2bd2492e6676238579e9f0c6ed374.1628162156.git.ps@pks.im> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In order to compute whether objects reachable from a set of tips are all connected, we do a revision walk with these tips as positive references and `--not --all`. `--not --all` will cause the revision walk to load all preexisting references as uninteresting, which can be very expensive in repositories with many references. Benchmarking the git-rev-list(1) command highlights that by far the most expensive single phase is initial sorting of the input revisions: after all references have been loaded, we first sort commits by author date. In a real-world repository with about 2.2 million references, it makes up about 40% of the total runtime of git-rev-list(1). Ultimately, the connectivity check shouldn't really bother about the order of input revisions at all. We only care whether we can actually walk all objects until we hit the cut-off point. So sorting the input is a complete waste of time. Introduce a new "--unsorted-input" flag to git-rev-list(1) which will cause it to not sort the commits and adjust the connectivity check to always pass the flag. This results in the following speedups, executed in a clone of gitlab-org/gitlab [1]: Benchmark #1: git rev-list --objects --quiet --not --all --not $(cat newrev) Time (mean ± σ): 7.639 s ± 0.065 s [User: 7.304 s, System: 0.335 s] Range (min … max): 7.543 s … 7.742 s 10 runs Benchmark #2: git rev-list --unsorted-input --objects --quiet --not --all --not $newrev Time (mean ± σ): 4.995 s ± 0.044 s [User: 4.657 s, System: 0.337 s] Range (min … max): 4.909 s … 5.048 s 10 runs Summary 'git rev-list --unsorted-input --objects --quiet --not --all --not $(cat newrev)' ran 1.53 ± 0.02 times faster than 'git rev-list --objects --quiet --not --all --not $newrev' [1]: https://gitlab.com/gitlab-org/gitlab.git. Note that not all refs are visible to clients. Signed-off-by: Patrick Steinhardt --- Documentation/rev-list-options.txt | 8 ++++++- connected.c | 1 + revision.c | 13 ++++++++-- t/t6000-rev-list-misc.sh | 38 ++++++++++++++++++++++++++++++ 4 files changed, 57 insertions(+), 3 deletions(-) diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index 24569b06d1..b7bd27e171 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -968,6 +968,11 @@ list of the missing objects. Object IDs are prefixed with a ``?'' character. objects. endif::git-rev-list[] +--unsorted-input:: + Show commits in the order they were given on the command line instead + of sorting them in reverse chronological order by commit time. Cannot + be combined with `--no-walk` or `--no-walk=sorted`. + --no-walk[=(sorted|unsorted)]:: Only show the given commits, but do not traverse their ancestors. This has no effect if a range is specified. If the argument @@ -975,7 +980,8 @@ endif::git-rev-list[] given on the command line. Otherwise (if `sorted` or no argument was given), the commits are shown in reverse chronological order by commit time. - Cannot be combined with `--graph`. + Cannot be combined with `--graph`. Cannot be combined with + `--unsorted-input` if `sorted` or no argument was given. --do-walk:: Overrides a previous `--no-walk`. diff --git a/connected.c b/connected.c index b18299fdf0..b5f9523a5f 100644 --- a/connected.c +++ b/connected.c @@ -106,6 +106,7 @@ int check_connected(oid_iterate_fn fn, void *cb_data, if (opt->progress) strvec_pushf(&rev_list.args, "--progress=%s", _("Checking connectivity")); + strvec_push(&rev_list.args, "--unsorted-input"); rev_list.git_cmd = 1; rev_list.env = opt->env; diff --git a/revision.c b/revision.c index 86bbcd10d2..793f76a509 100644 --- a/revision.c +++ b/revision.c @@ -2256,6 +2256,10 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg } else if (!strcmp(arg, "--author-date-order")) { revs->sort_order = REV_SORT_BY_AUTHOR_DATE; revs->topo_order = 1; + } else if (!strcmp(arg, "--unsorted-input")) { + if (revs->no_walk && !revs->unsorted_input) + die(_("--unsorted-input is incompatible with --no-walk and --no-walk=sorted")); + revs->unsorted_input = 1; } else if (!strcmp(arg, "--early-output")) { revs->early_output = 100; revs->topo_order = 1; @@ -2651,6 +2655,8 @@ static int handle_revision_pseudo_opt(const char *submodule, } else if (!strcmp(arg, "--not")) { *flags ^= UNINTERESTING | BOTTOM; } else if (!strcmp(arg, "--no-walk")) { + if (revs->unsorted_input) + die(_("--no-walk is incompatible with --no-walk=unsorted and --unsorted-input")); revs->no_walk = 1; } else if (skip_prefix(arg, "--no-walk=", &optarg)) { /* @@ -2658,9 +2664,12 @@ static int handle_revision_pseudo_opt(const char *submodule, * not allowed, since the argument is optional. */ revs->no_walk = 1; - if (!strcmp(optarg, "sorted")) + if (!strcmp(optarg, "sorted")) { + if (revs->unsorted_input) + die(_("--no-walk=sorted is incompatible with --no-walk=unsorted " + "and --unsorted-input")); revs->unsorted_input = 0; - else if (!strcmp(optarg, "unsorted")) + } else if (!strcmp(optarg, "unsorted")) revs->unsorted_input = 1; else return error("invalid argument to --no-walk"); diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh index 12def7bcbf..8e213eb413 100755 --- a/t/t6000-rev-list-misc.sh +++ b/t/t6000-rev-list-misc.sh @@ -169,4 +169,42 @@ test_expect_success 'rev-list --count --objects' ' test_line_count = $count actual ' +test_expect_success 'rev-list --unsorted-input results in different sorting' ' + git rev-list --unsorted-input HEAD HEAD~ >first && + git rev-list --unsorted-input HEAD~ HEAD >second && + ! test_cmp first second && + sort first >first.sorted && + sort second >second.sorted && + test_cmp first.sorted second.sorted +' + +test_expect_success 'rev-list --unsorted-input compatible with --no-walk=unsorted' ' + git rev-list --unsorted-input --no-walk=unsorted HEAD HEAD~ >actual && + git rev-parse HEAD >expect && + git rev-parse HEAD~ >>expect && + test_cmp expect actual +' + +test_expect_success 'rev-list --unsorted-input incompatible with --no-walk=sorted' ' + cat >expect <<-EOF && + fatal: --no-walk is incompatible with --no-walk=unsorted and --unsorted-input + EOF + test_must_fail git rev-list --unsorted-input --no-walk HEAD 2>error && + test_cmp expect error && + + cat >expect <<-EOF && + fatal: --no-walk=sorted is incompatible with --no-walk=unsorted and --unsorted-input + EOF + test_must_fail git rev-list --unsorted-input --no-walk=sorted HEAD 2>error && + test_cmp expect error && + + cat >expect <<-EOF && + fatal: --unsorted-input is incompatible with --no-walk and --no-walk=sorted + EOF + test_must_fail git rev-list --no-walk --unsorted-input HEAD 2>error && + test_cmp expect error && + test_must_fail git rev-list --no-walk=sorted --unsorted-input HEAD 2>error && + test_cmp expect error +' + test_done From patchwork Thu Aug 5 11:25:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 12421031 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 963B7C4338F for ; Thu, 5 Aug 2021 11:25:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7357161108 for ; Thu, 5 Aug 2021 11:25:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241012AbhHELZu (ORCPT ); Thu, 5 Aug 2021 07:25:50 -0400 Received: from out3-smtp.messagingengine.com ([66.111.4.27]:52783 "EHLO out3-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241003AbhHELZu (ORCPT ); Thu, 5 Aug 2021 07:25:50 -0400 Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.nyi.internal (Postfix) with ESMTP id 0AEF85C0045; Thu, 5 Aug 2021 07:25:36 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute1.internal (MEProxy); Thu, 05 Aug 2021 07:25:36 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=date :from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; s=fm1; bh=z/2OyS59uNq2K07hpBzSlfZPwmz kvL+aHGjFrh896Zk=; b=psYi014jsA1/k8iE9e75mVb5JpTcr55O33y7s3ApsW5 E894eAIvbZ2srIGTYGgJDvt+SjV56ayHe4mlAot/qXFPZLNLY3J5Dp9X12GPY21i unux9d0gmCBU9WgUEgW40DcjK6yv5woZdaNiS1K1U9UFLZ85FbZpdkqPypn/NUSP B9VMvyOAJVyEM6K6xyE9pwgCzjL7Ol2V4M6RsScL+BmHUv4cFgFNsL+fG4sZyyyQ ENGTvbvqMi8iN/k2nVKzNEVZgzLJEz+PF0HttrDUCn0M5Xp2jd+gTSlhE5fThK3m mfZ/l8MWtm0b3DSRxlAffI9nT63sZt9u49csBOs1OWg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-me-proxy :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm3; bh=z/2OyS 59uNq2K07hpBzSlfZPwmzkvL+aHGjFrh896Zk=; b=u9lgqxdB8Wo1zLkdSz0uyS xCMcrPUagvZ6MNBypgIGBCntKVKAmjcg9PuoO4scnKaonae/hdJMy6V/CS5QmyGl dWehM/+11TR+3c3FCinppetYZeYDWGPUTAnzIAamradRrvDgQ2Lx98fFwz98lZr9 TpynksZR1nyL71bI4DArXh4DkqkAcXvAzX+0U0ij/W8SDrEPnnJlKecic5lO0iid RUl7Du7y6ajD/FaaiCoFiBSsZZ5Nl+vWWDYOE+N2SeNKPZIKTxnmkNYAHcQm7U2P yKFux06PT9IBo/9OgO05kbIiPNWfmEXN7Ta8m4FXhQi5t8eWAy8m7fbRixApkSrw == X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvtddrieelgdefkecutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpeffhffvuffkfhggtggujgesghdtreertddtjeenucfhrhhomheprfgrthhrihgt khcuufhtvghinhhhrghrughtuceophhssehpkhhsrdhimheqnecuggftrfgrthhtvghrnh ephefgjeeuveejteduhefgffefffdvjeefjeeivdekfffgkeeugfehveetueefleeknecu vehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepphhssehpkh hsrdhimh X-ME-Proxy: Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 5 Aug 2021 07:25:34 -0400 (EDT) Received: from localhost (ncase [10.192.0.11]) by vm-mail.pks.im (OpenSMTPD) with ESMTPSA id a1b37944 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Thu, 5 Aug 2021 11:25:34 +0000 (UTC) Date: Thu, 5 Aug 2021 13:25:32 +0200 From: Patrick Steinhardt To: git@vger.kernel.org Cc: Jeff King , Felipe Contreras , SZEDER =?iso-8859-1?q?G=E1bor?= , Chris Torek , =?iso-8859-1?q?=C6var_Arnfj=F6r=F0?= Bjarmason , Junio C Hamano , Taylor Blau Subject: [PATCH v4 3/6] revision: stop retrieving reference twice Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When queueing up references for the revision walk, `handle_one_ref()` will resolve the reference's object ID via `get_reference()` and then queue the ID as pending object via `add_pending_oid()`. But given that `add_pending_oid()` is only a thin wrapper around `add_pending_object()` which fist calls `get_reference()`, we effectively resolve the reference twice and thus duplicate some of the work. Fix the issue by instead calling `add_pending_object()` directly, which takes the already-resolved object as input. In a repository with lots of refs, this translates into a near 10% speedup: Benchmark #1: HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev Time (mean ± σ): 5.015 s ± 0.038 s [User: 4.698 s, System: 0.316 s] Range (min … max): 4.970 s … 5.089 s 10 runs Benchmark #2: HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev Time (mean ± σ): 4.606 s ± 0.029 s [User: 4.260 s, System: 0.345 s] Range (min … max): 4.565 s … 4.657 s 10 runs Summary 'HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev' ran 1.09 ± 0.01 times faster than 'HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev' Signed-off-by: Patrick Steinhardt --- revision.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/revision.c b/revision.c index 793f76a509..0d99413856 100644 --- a/revision.c +++ b/revision.c @@ -1534,7 +1534,7 @@ static int handle_one_ref(const char *path, const struct object_id *oid, object = get_reference(cb->all_revs, path, oid, cb->all_flags); add_rev_cmdline(cb->all_revs, object, path, REV_CMD_REF, cb->all_flags); - add_pending_oid(cb->all_revs, path, oid, cb->all_flags); + add_pending_object(cb->all_revs, object, path); return 0; } From patchwork Thu Aug 5 11:25:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 12421033 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B018C4338F for ; Thu, 5 Aug 2021 11:25:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1EB7A61108 for ; Thu, 5 Aug 2021 11:25:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241015AbhHELZz (ORCPT ); Thu, 5 Aug 2021 07:25:55 -0400 Received: from out3-smtp.messagingengine.com ([66.111.4.27]:43967 "EHLO out3-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241003AbhHELZy (ORCPT ); Thu, 5 Aug 2021 07:25:54 -0400 Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id 572375C0045; Thu, 5 Aug 2021 07:25:40 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute5.internal (MEProxy); Thu, 05 Aug 2021 07:25:40 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=date :from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; s=fm1; bh=UOwEP93xR/+5B2SAHmS3fbywrQL MKr5fGDoOOCMCTyM=; b=Ru6uGgcCgBPuXjLgulmiLDJEYH3SPdP2juWx3z76tBq jNudoKu8vwAylL9qImgEFgY9bFDCTAoRkpdICuV9BPgQ4lFn1aQzLhplKISBbSO4 LF5HoYErdOzynKzcKs56gytYCEt+mSR8xOFZjI5rwGVS4uKddROIka3EqxEY992b y7yU4sYx5MlrWGAlbPwy5fIsNpqagR4AcI/LK5rf8x+fh8oGHPUnQbB2BNSPOOW3 BVey20EY423dT/m6tC8pIqiIAoBkYLDRnEfC5d9H+jV1DgNeDTrLZUwnweTpH38o YW7nWvjvWVtaewY/sLbxYQuK1GCvKkEB1uR5tQNGUuw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-me-proxy :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm3; bh=UOwEP9 3xR/+5B2SAHmS3fbywrQLMKr5fGDoOOCMCTyM=; b=KMdTMPWKCvSnEbfIM4sEUr NPpIFFChe7lhFRvSsLZ7LfkpzEr9BL1V16xU+H6WNRbg24vPnH1TldE7KwjFbbO2 YJe17fKdxwUvw/7O46bO4tyg5qTHE6JOT7yE963WAj7SCS6XDGcG+VAaAsqSZf2Y 7E02fgEgiCojhbzV63igSsCup/2JEpCSfeHi2R3QqegpW+Sd3t0QClE0O6AJbX4d NzGTdKt7osDVB3oq1lmKDdoUdpB4324U5MKMxpRs2X0iFnYSxoDOEmPYoaqcF4Ln uqVE2H/iMTQySWaudqcL9pxhONfJX58GXDfkW2v30pYE/e+gY5C+4XFPSOllCZCw == X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvtddrieelgdefjecutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpeffhffvuffkfhggtggujgesghdtreertddtjeenucfhrhhomheprfgrthhrihgt khcuufhtvghinhhhrghrughtuceophhssehpkhhsrdhimheqnecuggftrfgrthhtvghrnh ephefgjeeuveejteduhefgffefffdvjeefjeeivdekfffgkeeugfehveetueefleeknecu vehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepphhssehpkh hsrdhimh X-ME-Proxy: Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 5 Aug 2021 07:25:39 -0400 (EDT) Received: from localhost (ncase [10.192.0.11]) by vm-mail.pks.im (OpenSMTPD) with ESMTPSA id 620dfca0 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Thu, 5 Aug 2021 11:25:38 +0000 (UTC) Date: Thu, 5 Aug 2021 13:25:36 +0200 From: Patrick Steinhardt To: git@vger.kernel.org Cc: Jeff King , Felipe Contreras , SZEDER =?iso-8859-1?q?G=E1bor?= , Chris Torek , =?iso-8859-1?q?=C6var_Arnfj=F6r=F0?= Bjarmason , Junio C Hamano , Taylor Blau Subject: [PATCH v4 4/6] revision: avoid loading object headers multiple times Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When loading references, we try to optimize loading of commits by using the commit graph. To do so, we first need to determine whether the object actually is a commit or not, which is why we always execute `oid_object_info()` first. Like this, we'll unpack the object header of each object first. This pattern can be quite inefficient in case many references point to the same commit: if the object didn't end up in the cached objects, then we'll repeatedly unpack the same object header, even if we've already seen the object before. Optimize this pattern by using `lookup_unknown_object()` first in order to determine whether we've seen the object before. If so, then we don't need to re-parse the header but can directly use its object information and thus gain a modest performance improvement. Executed in a real-world repository with around 2.2 million references: Benchmark #1: HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev Time (mean ± σ): 4.771 s ± 0.238 s [User: 4.440 s, System: 0.330 s] Range (min … max): 4.539 s … 5.219 s 10 runs Benchmark #2: HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev Time (mean ± σ): 4.454 s ± 0.037 s [User: 4.122 s, System: 0.332 s] Range (min … max): 4.375 s … 4.496 s 10 runs Summary 'HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev' ran 1.07 ± 0.05 times faster than 'HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev' The downside is that `lookup_unknown_object()` is forced to always allocate an object such that it's big enough to host all object types' structs and thus we may waste memory here. This tradeoff is probably worth it though considering the following struct sizes: - commit: 72 bytes - tree: 56 bytes - blob: 40 bytes - tag: 64 bytes Assuming that in almost all repositories, most references will point to either a tag or a commit, we'd have a modest increase in memory consumption of about 12.5% here. Note that on its own, this patch may not seem like a clear win. But it is a prerequisite for the following patch, which will result in another 37% speedup. Signed-off-by: Patrick Steinhardt --- revision.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/revision.c b/revision.c index 0d99413856..25f4784fdd 100644 --- a/revision.c +++ b/revision.c @@ -359,14 +359,22 @@ static struct object *get_reference(struct rev_info *revs, const char *name, const struct object_id *oid, unsigned int flags) { - struct object *object; + struct object *object = lookup_unknown_object(revs->repo, oid); + + if (object->type == OBJ_NONE) { + int type = oid_object_info(revs->repo, oid, NULL); + if (type < 0 || !object_as_type(object, type, 1)) { + object = NULL; + goto out; + } + } /* * If the repository has commit graphs, repo_parse_commit() avoids * reading the object buffer, so use it whenever possible. */ - if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT) { - struct commit *c = lookup_commit(revs->repo, oid); + if (object->type == OBJ_COMMIT) { + struct commit *c = (struct commit *) object; if (!repo_parse_commit(revs->repo, c)) object = (struct object *) c; else @@ -375,6 +383,7 @@ static struct object *get_reference(struct rev_info *revs, const char *name, object = parse_object(revs->repo, oid); } +out: if (!object) { if (revs->ignore_missing) return object; From patchwork Thu Aug 5 11:25:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 12421035 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C7F75C4338F for ; Thu, 5 Aug 2021 11:25:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id ADDED61102 for ; Thu, 5 Aug 2021 11:25:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241010AbhHELZ6 (ORCPT ); Thu, 5 Aug 2021 07:25:58 -0400 Received: from out3-smtp.messagingengine.com ([66.111.4.27]:41861 "EHLO out3-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241003AbhHELZ6 (ORCPT ); Thu, 5 Aug 2021 07:25:58 -0400 Received: from compute6.internal (compute6.nyi.internal [10.202.2.46]) by mailout.nyi.internal (Postfix) with ESMTP id 25AC95C0057; Thu, 5 Aug 2021 07:25:44 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute6.internal (MEProxy); Thu, 05 Aug 2021 07:25:44 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=date :from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; s=fm1; bh=m1wieKDI9SUKAVqVuZgCY//laQW 8dsM1fv4J2VSA/UY=; b=U1FO/1VyesJ+LrdQ1Btuaf+ETQPJf6kPXvDND7C5W1v oFalaSmazrv3VWElYHiT0rEaC4uFaHRMoJUU7zHTIfBNi7bHENDx56JU5VTPwkvd SEv8u/RaIAoFNNHxyF3uCxDzfcVE8GM9QQ2Iafr+MNfFbMbd0SRVjqPypx04L7bE HPBhNI5oRz0xbxdI/Nh8vdmdDnlociAmBo/NwoKyS/Hw46L9JbSgQAOw2S+rC32D K7puzSKfibneowDFc73emsWLQ096dbhhB209l7mW7ELay1q3c7Qiaoypqp9otjib griCPRNBVJnlX2aFZf1X2IxcEGUyiwkolqGdiXeeDLQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-me-proxy :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm3; bh=m1wieK DI9SUKAVqVuZgCY//laQW8dsM1fv4J2VSA/UY=; b=kMDXNsWXsg1hEuKLEcnRCh RQ33PjsLvOMOqkGCxdRojTaXmVruyR8mszoVfmOl491yCNaAA+gMsBkkD8hwRTZ1 G4vIHQxG0el32CNu1qITdsFYqzC6q38wgBmQLDP9wVvIOesZ+T4DS6WIWI6L81U/ 0Zoap2XDl+5a+vopSTHLKMB0586/cr3yNPNEJyWrVa/UNs1WiC8TqxiLUrmJxDxA +wKN4PbuZuZhvBSL6NndJ4k2/99pmmZN6jCiOQCzMKqyBi8kjajUZ3FPLAbqj3yV /dU1E9JHH2ZVJ7JnHoid342eKGGHo6nTAgk+Iz+Un63wWPsZ62pGJwELpfOl8GFw == X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvtddrieelgdefkecutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpeffhffvuffkfhggtggujgesghdtreertddtvdenucfhrhhomheprfgrthhrihgt khcuufhtvghinhhhrghrughtuceophhssehpkhhsrdhimheqnecuggftrfgrthhtvghrnh epheeghfdtfeeuffehkefgffduleffjedthfdvjeektdfhhedvlefgtefgvdettdfhnecu vehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepphhssehpkh hsrdhimh X-ME-Proxy: Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 5 Aug 2021 07:25:42 -0400 (EDT) Received: from localhost (ncase [10.192.0.11]) by vm-mail.pks.im (OpenSMTPD) with ESMTPSA id 9bf19aa9 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Thu, 5 Aug 2021 11:25:42 +0000 (UTC) Date: Thu, 5 Aug 2021 13:25:41 +0200 From: Patrick Steinhardt To: git@vger.kernel.org Cc: Jeff King , Felipe Contreras , SZEDER =?iso-8859-1?q?G=E1bor?= , Chris Torek , =?iso-8859-1?q?=C6var_Arnfj=F6r=F0?= Bjarmason , Junio C Hamano , Taylor Blau Subject: [PATCH v4 5/6] commit-graph: split out function to search commit position Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The function `find_commit_in_graph()` assumes that the caller has passed an object which was already determined to be a commit given that it will access the commit's graph position, which is stored in a commit slab. In a subsequent patch, we want to search for an object ID though without knowing whether it is a commit or not, which is not currently possible. Split out the logic to search the commit graph for a given object ID to prepare for this change. This commit also renames the function to `find_commit_pos_in_graph()`, which more accurately reflects what this function does. Furthermore, in order to allow for the searched object ID to be const, we need to adjust `bsearch_graph()`'s signature to accept a constant object ID as input, too. Signed-off-by: Patrick Steinhardt --- commit-graph.c | 55 +++++++++++++++++++++++++++----------------------- 1 file changed, 30 insertions(+), 25 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 3860a0d847..8c4c7262c8 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -723,7 +723,7 @@ void close_commit_graph(struct raw_object_store *o) o->commit_graph = NULL; } -static int bsearch_graph(struct commit_graph *g, struct object_id *oid, uint32_t *pos) +static int bsearch_graph(struct commit_graph *g, const struct object_id *oid, uint32_t *pos) { return bsearch_hash(oid->hash, g->chunk_oid_fanout, g->chunk_oid_lookup, g->hash_len, pos); @@ -864,25 +864,30 @@ static int fill_commit_in_graph(struct repository *r, return 1; } -static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos) +static int search_commit_pos_in_graph(const struct object_id *id, struct commit_graph *g, uint32_t *pos) +{ + struct commit_graph *cur_g = g; + uint32_t lex_index; + + while (cur_g && !bsearch_graph(cur_g, id, &lex_index)) + cur_g = cur_g->base_graph; + + if (cur_g) { + *pos = lex_index + cur_g->num_commits_in_base; + return 1; + } + + return 0; +} + +static int find_commit_pos_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos) { uint32_t graph_pos = commit_graph_position(item); if (graph_pos != COMMIT_NOT_FROM_GRAPH) { *pos = graph_pos; return 1; } else { - struct commit_graph *cur_g = g; - uint32_t lex_index; - - while (cur_g && !bsearch_graph(cur_g, &(item->object.oid), &lex_index)) - cur_g = cur_g->base_graph; - - if (cur_g) { - *pos = lex_index + cur_g->num_commits_in_base; - return 1; - } - - return 0; + return search_commit_pos_in_graph(&item->object.oid, g, pos); } } @@ -895,7 +900,7 @@ static int parse_commit_in_graph_one(struct repository *r, if (item->object.parsed) return 1; - if (find_commit_in_graph(item, g, &pos)) + if (find_commit_pos_in_graph(item, g, &pos)) return fill_commit_in_graph(r, item, g, pos); return 0; @@ -921,7 +926,7 @@ void load_commit_graph_info(struct repository *r, struct commit *item) uint32_t pos; if (!prepare_commit_graph(r)) return; - if (find_commit_in_graph(item, r->objects->commit_graph, &pos)) + if (find_commit_pos_in_graph(item, r->objects->commit_graph, &pos)) fill_commit_graph_info(item, r->objects->commit_graph, pos); } @@ -1091,9 +1096,9 @@ static int write_graph_chunk_data(struct hashfile *f, edge_value += ctx->new_num_commits_in_base; else if (ctx->new_base_graph) { uint32_t pos; - if (find_commit_in_graph(parent->item, - ctx->new_base_graph, - &pos)) + if (find_commit_pos_in_graph(parent->item, + ctx->new_base_graph, + &pos)) edge_value = pos; } @@ -1122,9 +1127,9 @@ static int write_graph_chunk_data(struct hashfile *f, edge_value += ctx->new_num_commits_in_base; else if (ctx->new_base_graph) { uint32_t pos; - if (find_commit_in_graph(parent->item, - ctx->new_base_graph, - &pos)) + if (find_commit_pos_in_graph(parent->item, + ctx->new_base_graph, + &pos)) edge_value = pos; } @@ -1235,9 +1240,9 @@ static int write_graph_chunk_extra_edges(struct hashfile *f, edge_value += ctx->new_num_commits_in_base; else if (ctx->new_base_graph) { uint32_t pos; - if (find_commit_in_graph(parent->item, - ctx->new_base_graph, - &pos)) + if (find_commit_pos_in_graph(parent->item, + ctx->new_base_graph, + &pos)) edge_value = pos; } From patchwork Thu Aug 5 11:25:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 12421037 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 60BE7C4338F for ; Thu, 5 Aug 2021 11:25:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 421CB6112D for ; Thu, 5 Aug 2021 11:25:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241016AbhHEL0D (ORCPT ); Thu, 5 Aug 2021 07:26:03 -0400 Received: from out3-smtp.messagingengine.com ([66.111.4.27]:45369 "EHLO out3-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240808AbhHEL0C (ORCPT ); Thu, 5 Aug 2021 07:26:02 -0400 Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id 90B7F5C008E; Thu, 5 Aug 2021 07:25:48 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute5.internal (MEProxy); Thu, 05 Aug 2021 07:25:48 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=date :from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; s=fm1; bh=mzvGiBrhHrgbfMgcKD5j1UozQZB duOVUts4/SP/2L9k=; b=L3k1l0XvLX+Pa9r9wAWKX7tSq0NvaEL5/OswEzZnTr5 21GERK5koD1qP0in2OpMT5xXV69hm6k1yucxcRmuqiSlCesZjLSMpXvvyF7Zr/iA yG2avby/AVJhRpgSMoqmxzhATv3LaLoBv9CMRKNMXkPSEEkcw2+GtVg2YNpdUmZY 831/jQ8X4kOOLZXQCJ1Y+YV/h0yd8B6RZoh62nRn+4napQT77E35vgOLixm2oya4 Qda1zq8Vc/T4N14ta2ldDi4A18c6RlKv7s6T9DVrlwUW2vCWS/QzlnaNOAoJW0Rk DtidT2d8fGsy59FI5blvFR/6bxAl61y90tllA6eLvow== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-me-proxy :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm3; bh=mzvGiB rhHrgbfMgcKD5j1UozQZBduOVUts4/SP/2L9k=; b=J1zYDqBAt+4ngQmkswI9+E 5ELsgK2fiWVtAQGxyQRd2mixu6wry+d3NVkMhSoltCcmO8y2B+m15+0Dg2Jt/I6K lclwtWd4SOWU1nvzHQM6wb4DUZuuzdXQzXgZWLp9wqkkkZ5bTmHMQHGMgcrfn3KE ttWvzqiwir/GD0e0CKC54c2wsPq9AD5O9lWUPXejUwq3s9v0kHHJ+vv5ku/NEmEL TSNeVggahlWIm+Jfq+XBd/tIrLYn/WfEUKc2YCjFMHh9vXFnY1e26S5TiJILM6vP UZDw8nH7p9AO+WIP7ZS2oSXWV3zXKzSE2AV1Dm+mAVFj1GbrxOVd/S5AgWePCsPQ == X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvtddrieelgdefjecutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpeffhffvuffkfhggtggujgesghdtreertddtjeenucfhrhhomheprfgrthhrihgt khcuufhtvghinhhhrghrughtuceophhssehpkhhsrdhimheqnecuggftrfgrthhtvghrnh ephefgjeeuveejteduhefgffefffdvjeefjeeivdekfffgkeeugfehveetueefleeknecu vehluhhsthgvrhfuihiivgepudenucfrrghrrghmpehmrghilhhfrhhomhepphhssehpkh hsrdhimh X-ME-Proxy: Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 5 Aug 2021 07:25:47 -0400 (EDT) Received: from localhost (ncase [10.192.0.11]) by vm-mail.pks.im (OpenSMTPD) with ESMTPSA id 38af3e60 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Thu, 5 Aug 2021 11:25:46 +0000 (UTC) Date: Thu, 5 Aug 2021 13:25:45 +0200 From: Patrick Steinhardt To: git@vger.kernel.org Cc: Jeff King , Felipe Contreras , SZEDER =?iso-8859-1?q?G=E1bor?= , Chris Torek , =?iso-8859-1?q?=C6var_Arnfj=F6r=F0?= Bjarmason , Junio C Hamano , Taylor Blau Subject: [PATCH v4 6/6] revision: avoid hitting packfiles when commits are in commit-graph Message-ID: <900c5a9c60282e0fb57840305cbab013ffa398f1.1628162156.git.ps@pks.im> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When queueing references in git-rev-list(1), we try to either reuse an already parsed object or alternatively we load the object header from disk in order to determine its type. This is inefficient for commits though in cases where we have a commit graph available: instead of hitting the real object on disk to determine its type, we may instead search the object graph for the object ID. In case it's found, we can directly fill in the commit object, otherwise we can still hit the disk to determine the object's type. Expose a new function `parse_commit_in_graph_gently()`, which fills in an object of unknown type in case we find its object ID in the graph. This provides a big performance win in cases where there is a commit-graph available in the repository in case we load lots of references. The following has been executed in a real-world repository with about 2.2 million refs: Benchmark #1: HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev Time (mean ± σ): 4.508 s ± 0.039 s [User: 4.131 s, System: 0.377 s] Range (min … max): 4.455 s … 4.576 s 10 runs Benchmark #2: HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev Time (mean ± σ): 3.072 s ± 0.031 s [User: 2.707 s, System: 0.365 s] Range (min … max): 3.040 s … 3.144 s 10 runs Summary 'HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev' ran 1.47 ± 0.02 times faster than 'HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev' Signed-off-by: Patrick Steinhardt --- commit-graph.c | 26 ++++++++++++++++++++++++++ commit-graph.h | 7 +++++++ revision.c | 17 +++++++++++++++-- 3 files changed, 48 insertions(+), 2 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 8c4c7262c8..cc7136243d 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -891,6 +891,32 @@ static int find_commit_pos_in_graph(struct commit *item, struct commit_graph *g, } } +int parse_commit_in_graph_gently(struct repository *repo, struct object *object) +{ + struct commit *commit; + uint32_t pos; + + if (object->parsed) { + if (object->type != OBJ_COMMIT) + return -1; + return 0; + } + + if (!repo->objects->commit_graph) + return -1; + + if (!search_commit_pos_in_graph(&object->oid, repo->objects->commit_graph, &pos)) + return -1; + + commit = object_as_type(object, OBJ_COMMIT, 1); + if (!commit) + return -1; + if (!fill_commit_in_graph(repo, commit, repo->objects->commit_graph, pos)) + return -1; + + return 0; +} + static int parse_commit_in_graph_one(struct repository *r, struct commit_graph *g, struct commit *item) diff --git a/commit-graph.h b/commit-graph.h index 96c24fb577..e2e93b0ee1 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -40,6 +40,13 @@ int open_commit_graph(const char *graph_file, int *fd, struct stat *st); */ int parse_commit_in_graph(struct repository *r, struct commit *item); +/* + * Given an object of unknown type, try to fill in the object in case it is a + * commit part of the commit-graph. Returns 0 if the object is a parsed commit + * or if it could be filled in via the commit graph, otherwise it returns -1. + */ +int parse_commit_in_graph_gently(struct repository *repo, struct object *object); + /* * It is possible that we loaded commit contents from the commit buffer, * but we also want to ensure the commit-graph content is correctly diff --git a/revision.c b/revision.c index 25f4784fdd..318b43026e 100644 --- a/revision.c +++ b/revision.c @@ -362,8 +362,21 @@ static struct object *get_reference(struct rev_info *revs, const char *name, struct object *object = lookup_unknown_object(revs->repo, oid); if (object->type == OBJ_NONE) { - int type = oid_object_info(revs->repo, oid, NULL); - if (type < 0 || !object_as_type(object, type, 1)) { + /* + * It's likely that the reference points to a commit, so we + * first try to look it up via the commit-graph. If successful, + * then we know it's a commit and don't have to unpack the + * object header. We still need to assert that the object + * exists, but given that we don't request any info about the + * object this is a lot faster than `oid_object_info()`. + */ + if (parse_commit_in_graph_gently(revs->repo, object) < 0) { + int type = oid_object_info(revs->repo, oid, NULL); + if (type < 0 || !object_as_type(object, type, 1)) { + object = NULL; + goto out; + } + } else if (!repo_has_object_file(revs->repo, oid)) { object = NULL; goto out; }