From patchwork Mon Oct 23 11:27:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 13432770 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 37B2115E84 for ; Mon, 23 Oct 2023 11:27:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b="lnxWsnS0"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="djq+BGZr" Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 178D0E8 for ; Mon, 23 Oct 2023 04:27:20 -0700 (PDT) Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailout.west.internal (Postfix) with ESMTP id 977DE32007F9; Mon, 23 Oct 2023 07:27:16 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute3.internal (MEProxy); Mon, 23 Oct 2023 07:27:17 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=cc:cc :content-type:content-type:date:date:from:from:in-reply-to :message-id:mime-version:reply-to:sender:subject:subject:to:to; s=fm3; t=1698060436; x=1698146836; bh=XROFSa9bm7OWnP90sJr3WgPmY 1e89ZTnfYqoxQcFOvU=; b=lnxWsnS0g/zDu3dZqqEhvQRHUP9Vac2/KZyxwolnw pQhFHopJVmmVQg7aLMnFib1kyb20L0ONvFwNHURFpwyN4iYELMvoD+9CTAchboy0 F2WnSvsXUotSGYs3ZARt6MYMxzWUarpYX/8RqW3Yv94KOlzS17KEHiRoSRk5Gn1r B/pwg4ojzVOBv/hkN9mDb95mD6vwBGZR3Hk5pAsmVrE4gwIDU9s+RCgw9qAAnlRY Bvq2825jf6yQoPJ5ZU32m27qF+TdJeTlXKJI8+OHnBdp6gI3FCE907ao457Zkrz0 TjoyS/f0EiqmShHQlXPnCFxiA//+T9FiBvJZsJI3+nQ8g== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:message-id :mime-version:reply-to:sender:subject:subject:to:to:x-me-proxy :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t= 1698060436; x=1698146836; bh=XROFSa9bm7OWnP90sJr3WgPmY1e89ZTnfYq oxQcFOvU=; b=djq+BGZrDvTOJr329nYOizjpR0pC7PJygJ7wgH/TbVxpSY18ckZ 5iX7P4pPMUiZrKa3Xm+8bSvmTaQIMoFMf6V2RwjGoW+ymUeVQ4Y4PIMLaxvi2RuT PKeqQ2yz0Tepa7qtiSxIyti/qacGr+AJpAmcB4e8okwniEOQQkpcF65172Z2Sg1c 7aZ8hThc29t/BPhiGUR0Uykt3llTuATQDH/RbtiN2sw2+74seicM/pVd7TOHM5eG q85ZwOYNj22HIGNYinV/mvzaJx5GxQiYANf2Hr646Sc4CoiNgU3rWTsRM6RhgAWk 96/T6ezeooYW+CZMdY1/0YgwRWXuvX8Q2CQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvkedrkeeigdeflecutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpeffhffvvefukfggtggusehgtderredttdejnecuhfhrohhmpefrrghtrhhitghk ucfuthgvihhnhhgrrhguthcuoehpshesphhkshdrihhmqeenucggtffrrghtthgvrhhnpe eghfefveeuleehtdekteevfeejfeekteeiudeujedtleektdeltdektedttdegueenucff ohhmrghinhepphhkshdrihhmnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpe hmrghilhhfrhhomhepphhssehpkhhsrdhimh X-ME-Proxy: Feedback-ID: i197146af:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 23 Oct 2023 07:27:14 -0400 (EDT) Received: by vm-mail (OpenSMTPD) with ESMTPSA id 10da9b5d (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Mon, 23 Oct 2023 11:27:10 +0000 (UTC) Date: Mon, 23 Oct 2023 13:27:11 +0200 From: Patrick Steinhardt To: git@vger.kernel.org Cc: Karthik Nayak , Junio C Hamano , Taylor Blau , Jeff King Subject: [PATCH v2 0/2] commit-graph: detect commits missing in ODB Message-ID: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline Hi, this is version 2 of my patch series to more readily detect commits parsed from the commit graph which are missing in the object database. I've split this out into a separate thread as version 1 was sent in reply to a different series to extend git-rev-list(1)'s `--missing` option so that I don't continue to hijack this thread. Changes compared to v1: - I've added a preparatory patch that introduced a new GIT_COMMIT_GRAPH_PARANOIA environment variable as suggested by Peff. This envvar is retrofitted to the preexisting check in `lookup_commit_in_graph()` so that the behaviour for this sanity check is consistent. - `repo_parse_commit_internal()` now also honors this new envvar. - I've amended the commit message in v2 to include the benchmark that demonstrates the performance regression. - We now un-parse the commit when parsing it via the commit graph succeeded, but it doesn't exist in the ODB. Thanks for all the feedback and discussion around this. Patrick [1]: Patrick Steinhardt (2): commit-graph: introduce envvar to disable commit existence checks commit: detect commits that exist in commit-graph but not in the ODB Documentation/git.txt | 9 ++++++++ commit-graph.c | 6 +++++- commit-graph.h | 6 ++++++ commit.c | 16 +++++++++++++- t/t5318-commit-graph.sh | 48 +++++++++++++++++++++++++++++++++++++++++ 5 files changed, 83 insertions(+), 2 deletions(-) Range-diff against v1: -: ---------- > 1: a89c435528 commit-graph: introduce envvar to disable commit existence checks 1: 6ec1e340f8 ! 2: 0476d48555 commit: detect commits that exist in commit-graph but not in the ODB @@ Commit message behaviour by checking for object existence via the object database, as well. + This check of course comes with a performance penalty. The following + benchmarks have been executed in a clone of linux.git with stable tags + added: + + Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master) + Time (mean ± σ): 2.913 s ± 0.018 s [User: 2.363 s, System: 0.548 s] + Range (min … max): 2.894 s … 2.950 s 10 runs + + Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency) + Time (mean ± σ): 3.834 s ± 0.052 s [User: 3.276 s, System: 0.556 s] + Range (min … max): 3.780 s … 3.961 s 10 runs + + Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master) + Time (mean ± σ): 13.841 s ± 0.084 s [User: 13.152 s, System: 0.687 s] + Range (min … max): 13.714 s … 13.995 s 10 runs + + Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency) + Time (mean ± σ): 13.762 s ± 0.116 s [User: 13.094 s, System: 0.667 s] + Range (min … max): 13.645 s … 14.038 s 10 runs + + Summary + git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran + 1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency) + 4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency) + 4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master) + + We look at a ~30% regression in general, but in general we're still a + whole lot faster than without the commit graph. To counteract this, the + new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar. + Signed-off-by: Patrick Steinhardt ## commit.c ## +@@ + #include "shallow.h" + #include "tree.h" + #include "hook.h" ++#include "parse.h" + + static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **); + @@ commit.c: int repo_parse_commit_internal(struct repository *r, return -1; if (item->object.parsed) return 0; - if (use_commit_graph && parse_commit_in_graph(r, item)) + if (use_commit_graph && parse_commit_in_graph(r, item)) { -+ if (!has_object(r, &item->object.oid, 0)) ++ static int object_paranoia = -1; ++ ++ if (object_paranoia == -1) ++ object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1); ++ ++ if (object_paranoia && !has_object(r, &item->object.oid, 0)) { ++ unparse_commit(r, &item->object.oid); + return quiet_on_missing ? -1 : + error(_("commit %s exists in commit-graph but not in the object database"), + oid_to_hex(&item->object.oid)); ++ } ++ return 0; + } @@ commit.c: int repo_parse_commit_internal(struct repository *r, return quiet_on_missing ? -1 : ## t/t5318-commit-graph.sh ## -@@ t/t5318-commit-graph.sh: test_expect_success 'overflow during generation version upgrade' ' +@@ t/t5318-commit-graph.sh: test_expect_success 'stale commit cannot be parsed when given directly' ' ) ' -+test_expect_success 'commit exists in commit-graph but not in object database' ' ++test_expect_success 'stale commit cannot be parsed when traversing graph' ' + test_when_finished "rm -rf repo" && + git init repo && + ( @@ t/t5318-commit-graph.sh: test_expect_success 'overflow during generation version + oid=$(git rev-parse B) && + rm .git/objects/"$(test_oid_to_path "$oid")" && + ++ # Again, we should be able to parse the commit when not ++ # being paranoid about commit graph staleness... ++ GIT_COMMIT_GRAPH_PARANOIA=false git rev-parse HEAD~2 && ++ # ... but fail when we are paranoid. + test_must_fail git rev-parse HEAD~2 2>error && + grep "error: commit $oid exists in commit-graph but not in the object database" error + )