diff mbox series

[v2,3/3] commit: don't lazy-fetch commits

Message ID c5fe42deb04285b85d5354a57e90bf9410cc2420.1670373420.git.jonathantanmy@google.com (mailing list archive)
State Superseded
Headers show
Series Don't lazy-fetch commits when parsing them | expand

Commit Message

Jonathan Tan Dec. 7, 2022, 12:40 a.m. UTC
When parsing commits, fail fast when the commit is missing or
corrupt, instead of attempting to fetch them. This is done by inlining
repo_read_object_file() and setting the flag that prevents fetching.

This is motivated by a situation in which through a bug (not necessarily
through Git), there was corruption in the object store of a partial
clone. In this particular case, the problem was exposed when "git gc"
tried to expire reflogs, which calls repo_parse_commit(), which triggers
fetches of the missing commits.

(There are other possible solutions to this problem including passing an
argument from "git gc" to "git reflog" to inhibit all lazy fetches, but
I think that this fix is at the wrong level - fixing "git reflog" means
that this particular command works fine, or so we think (it will fail if
it somehow needs to read a legitimately missing blob, say, a .gitmodules
file), but fixing repo_parse_commit() will fix a whole class of bugs.)

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 commit.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

Comments

Junio C Hamano Dec. 7, 2022, 1:17 a.m. UTC | #1
Jonathan Tan <jonathantanmy@google.com> writes:

> +	/*
> +	 * Git does not support partial clones that exclude commits, so set
> +	 * OBJECT_INFO_SKIP_FETCH_OBJECT to fail fast when an object is missing.
> +	 */
> +	int flags = OBJECT_INFO_LOOKUP_REPLACE | OBJECT_INFO_SKIP_FETCH_OBJECT |
> +		OBJECT_INFO_DIE_IF_CORRUPT;

That's a quite helpful comment.
Jeff King Dec. 7, 2022, 6:47 a.m. UTC | #2
On Tue, Dec 06, 2022 at 04:40:53PM -0800, Jonathan Tan wrote:

> @@ -516,8 +527,8 @@ int repo_parse_commit_internal(struct repository *r,
>  		return 0;
>  	if (use_commit_graph && parse_commit_in_graph(r, item))
>  		return 0;
> -	buffer = repo_read_object_file(r, &item->object.oid, &type, &size);
> -	if (!buffer)
> +
> +	if (oid_object_info_extended(r, &item->object.oid, &oi, flags) < 0)

Nice. And this swap-out is much more obviously correct in this version
of the series because read_object_file_extended() is now clearly a thin
wrapper around oid_object_info_extended(), after your patch 2.

I actually think it would be beneficial to do a bit more refactoring
there to eliminate read_object() entirely (in favor of just having the
two callers use oid_object_info_extended() directly), and having
repo_read_object_extended() pass OBJECT_INFO_LOOKUP_REPLACE instead of
doing its own lookup.

But as that's all orthogonal to your goal, I don't mind if we punt on it
for now. We can do it later on top.

-Peff
diff mbox series

Patch

diff --git a/commit.c b/commit.c
index 572301b80a..a02723f06b 100644
--- a/commit.c
+++ b/commit.c
@@ -508,6 +508,17 @@  int repo_parse_commit_internal(struct repository *r,
 	enum object_type type;
 	void *buffer;
 	unsigned long size;
+	struct object_info oi = {
+		.typep = &type,
+		.sizep = &size,
+		.contentp = &buffer,
+	};
+	/*
+	 * Git does not support partial clones that exclude commits, so set
+	 * OBJECT_INFO_SKIP_FETCH_OBJECT to fail fast when an object is missing.
+	 */
+	int flags = OBJECT_INFO_LOOKUP_REPLACE | OBJECT_INFO_SKIP_FETCH_OBJECT |
+		OBJECT_INFO_DIE_IF_CORRUPT;
 	int ret;
 
 	if (!item)
@@ -516,8 +527,8 @@  int repo_parse_commit_internal(struct repository *r,
 		return 0;
 	if (use_commit_graph && parse_commit_in_graph(r, item))
 		return 0;
-	buffer = repo_read_object_file(r, &item->object.oid, &type, &size);
-	if (!buffer)
+
+	if (oid_object_info_extended(r, &item->object.oid, &oi, flags) < 0)
 		return quiet_on_missing ? -1 :
 			error("Could not read %s",
 			     oid_to_hex(&item->object.oid));