From patchwork Mon Oct 23 11:27:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 13432771 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 37BCC15E85 for ; Mon, 23 Oct 2023 11:27:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b="ty8U0mLw"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="Tfqbx1G7" Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E7F8AC1 for ; Mon, 23 Oct 2023 04:27:20 -0700 (PDT) Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.west.internal (Postfix) with ESMTP id 0C498320097D; Mon, 23 Oct 2023 07:27:19 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute1.internal (MEProxy); Mon, 23 Oct 2023 07:27:20 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=cc:cc :content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to; s=fm3; t=1698060439; x=1698146839; bh=lF CK0KTX+2Y/XXO2kKtdk0mvtPkS7kkoKSa2QgR1v0Q=; b=ty8U0mLwUtzJ5JuJxW i/l8zN4PVY8o2R135vc0Bx9tTU6bfSusOSeKcu3YfAAggnbeWBFhUIiIBqNDXEbc 74fo5L2ucg7cj2cCakYuHVdbl0ZzgpxigzYOBVFoLuS1+/kC30yCZmgU/k54/KMV 9e1XS7HDWmKPuh0AOESjO9Uh9rthRajwB/gi0gdSMzrnqE6lK5OTeFQrLVPOOJvk d2MamEIVATOkTqZWTf/upCGP+f/I/ljYh6/uAAhnMYDRPM46mdegpBGo59QVNMJr n817WafAucKB0gKwD8t7m2KkiUAJeWCJX7M8+OxiLpvMhL65vseKZx/mxmy141Fp FUoQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1698060439; x=1698146839; bh=lFCK0KTX+2Y/X XO2kKtdk0mvtPkS7kkoKSa2QgR1v0Q=; b=Tfqbx1G7GDiC/m9EFxKJVQIYiQNV8 /gby+C9aEjTj09aApps4NE2eHXaBoEKJqXYqrsGnVtXUIKJHdc+gVJInVi/I+KZo RPxbuiTZze2ehICykv4aMQaBmkF1+EtjSaRp4WAKMfvEjvaVPpNePjYVkpSttG8+ xZ4wE9g4xnH4IWxYkL4qtO3NCCX+Tk0yAOwQfwNjkjkMZW/DXKcCvoxlhkVeO3yd chUC86IQe2kM4PqbgLXo8EnQRzQ5eKteRlm2Uncu2B8m/ahEUFelEL4c6AsEkbT7 IiTadhueKh1NyyR68y2IbEtqTg6CpPI/zSWwTZfUa+tHO8b6SG2AHsEiA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvkedrkeeigdeflecutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpeffhffvvefukfhfgggtuggjsehgtderredttddvnecuhfhrohhmpefrrghtrhhi tghkucfuthgvihhnhhgrrhguthcuoehpshesphhkshdrihhmqeenucggtffrrghtthgvrh hnpeeukedtvedtffevleejtefgheehieegkeeluddvfeefgeehgfeltddtheejleffteen ucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpehpshesph hkshdrihhm X-ME-Proxy: Feedback-ID: i197146af:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 23 Oct 2023 07:27:18 -0400 (EDT) Received: by vm-mail (OpenSMTPD) with ESMTPSA id c1d978b3 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Mon, 23 Oct 2023 11:27:15 +0000 (UTC) Date: Mon, 23 Oct 2023 13:27:16 +0200 From: Patrick Steinhardt To: git@vger.kernel.org Cc: Karthik Nayak , Junio C Hamano , Taylor Blau , Jeff King Subject: [PATCH v2 1/2] commit-graph: introduce envvar to disable commit existence checks Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Our `lookup_commit_in_graph()` helper tries to look up commits from the commit graph and, if it doesn't exist there, falls back to parsing it from the object database instead. This is intended to speed up the lookup of any such commit that exists in the database. There is an edge case though where the commit exists in the graph, but not in the object database. To avoid returning such stale commits the helper function thus double checks that any such commit parsed from the graph also exists in the object database. This makes the function safe to use even when commit graphs aren't updated regularly. We're about to introduce the same pattern into other parts of our code base though, namely `repo_parse_commit_internal()`. Here the extra sanity check is a bit of a tougher sell: `lookup_commit_in_graph()` was a newly introduced helper, and as such there was no performance hit by adding this sanity check. If we added `repo_parse_commit_internal()` with that sanity check right from the beginning as well, this would probably never have been an issue to begin with. But by retrofitting it with this sanity check now we do add a performance regression to preexisting code, and thus there is a desire to avoid this or at least give an escape hatch. In practice, there is no inherent reason why either of those functions should have the sanity check whereas the other one does not: either both of them are able to detect this issue or none of them should be. This also means that the default of whether we do the check should likely be the same for both. To err on the side of caution, we thus rather want to make `repo_parse_commit_internal()` stricter than to loosen the checks that we already have in `lookup_commit_in_graph()`. The escape hatch is added in the form of a new GIT_COMMIT_GRAPH_PARANOIA environment variable that mirrors GIT_REF_PARANOIA. If enabled, which is the default, we will double check that commits looked up in the commit graph via `lookup_commit_in_graph()` also exist in the object database. This same check will also be added in `repo_parse_commit_internal()`. Signed-off-by: Patrick Steinhardt --- Documentation/git.txt | 9 +++++++++ commit-graph.c | 6 +++++- commit-graph.h | 6 ++++++ t/t5318-commit-graph.sh | 21 +++++++++++++++++++++ 4 files changed, 41 insertions(+), 1 deletion(-) diff --git a/Documentation/git.txt b/Documentation/git.txt index 11228956cd..22c2b537aa 100644 --- a/Documentation/git.txt +++ b/Documentation/git.txt @@ -911,6 +911,15 @@ for full details. should not normally need to set this to `0`, but it may be useful when trying to salvage data from a corrupted repository. +`GIT_COMMIT_GRAPH_PARANOIA`:: + If this Boolean environment variable is set to false, ignore the + case where commits exist in the commit graph but not in the + object database. Normally, Git will check whether commits loaded + from the commit graph exist in the object database to avoid + issues with stale commit graphs, but this check comes with a + performance penalty. The default is `1` (i.e., be paranoid about + stale commits in the commit graph). + `GIT_ALLOW_PROTOCOL`:: If set to a colon-separated list of protocols, behave as if `protocol.allow` is set to `never`, and each of the listed diff --git a/commit-graph.c b/commit-graph.c index fd2f700b2e..12ec31902e 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -939,14 +939,18 @@ int repo_find_commit_pos_in_graph(struct repository *r, struct commit *c, struct commit *lookup_commit_in_graph(struct repository *repo, const struct object_id *id) { + static int object_paranoia = -1; struct commit *commit; uint32_t pos; + if (object_paranoia == -1) + object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1); + if (!prepare_commit_graph(repo)) return NULL; if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos)) return NULL; - if (!has_object(repo, id, 0)) + if (object_paranoia && !has_object(repo, id, 0)) return NULL; commit = lookup_commit(repo, id); diff --git a/commit-graph.h b/commit-graph.h index 20ada7e891..bd4289620c 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -8,6 +8,12 @@ #define GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE "GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE" #define GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS "GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS" +/* + * This environment variable controls whether commits looked up via the + * commit graph will be double checked to exist in the object database. + */ +#define GIT_COMMIT_GRAPH_PARANOIA "GIT_COMMIT_GRAPH_PARANOIA" + /* * This method is only used to enhance coverage of the commit-graph * feature in the test suite with the GIT_TEST_COMMIT_GRAPH and diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index ba65f17dd9..c0cc454538 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -821,4 +821,25 @@ test_expect_success 'overflow during generation version upgrade' ' ) ' +test_expect_success 'stale commit cannot be parsed when given directly' ' + test_when_finished "rm -rf repo" && + git init repo && + ( + cd repo && + test_commit A && + test_commit B && + git commit-graph write --reachable && + + oid=$(git rev-parse B) && + rm .git/objects/"$(test_oid_to_path "$oid")" && + + # Verify that it is possible to read the commit from the + # commit graph when not being paranoid, ... + GIT_COMMIT_GRAPH_PARANOIA=false git rev-list B && + # ... but parsing the commit when double checking that + # it actually exists in the object database should fail. + test_must_fail git rev-list -1 B + ) +' + test_done