diff mbox series

revision: ignore non-existent objects in resolve-undo list

Message ID 20221018175530.086c8c74@apus (mailing list archive)
State New, archived
Headers show
Series revision: ignore non-existent objects in resolve-undo list | expand

Commit Message

Mathias Rav Oct. 18, 2022, 3:55 p.m. UTC
Garbage collection could inadvertently prune blobs mentioned only in the
resolve-undo extension prior to the bugfix in 5a5ea141e7
("revision: mark blobs needed for resolve-undo as reachable", 2022-06-09).

If a repository is affected by this bug, an obscure error can occur in
`git gc` after updating to a version of git that has the bugfix:

	$ git gc
	Enumerating objects: 327687, done.
	Counting objects: 100% (327687/327687), done.
	Delta compression using up to 8 threads
	Compressing objects: 100% (70883/70883), done.
	fatal: unable to read 616c8d17f4625f227708aae480e71233f7f58dce
	fatal: failed to run repack

A similar error occurs in `git rev-list --objects --indexed-objects`.

Fix the error by emitting a warning when the resolve-undo list mentions
objects that do not exist and then ignoring the nonexistent object.

The bugfix 5a5ea141e7 already contained code to emit this warning,
but since the code used lookup_blob() (and not parse_object()),
it would only warn in the unlikely scenario where the resolve-undo list
mentions an existing object that is not a blob.

I have encountered this error on two different clones of a large
mono-repository in checkouts with dozens of worktrees that see frequent
rebasing (causing a lot of git gc churn) and frequent merge conflicts
during rebasing, leading to resolve-undo lists in the index.
Somehow it seems the resolve-undo lists in the index persist after the
merge conflicts are resolved, which makes the error more frequent than
if the resolve-undo lists were only populated during the merge conflict.

Signed-off-by: Mathias Rav <m@git.strova.dk>
Fixes: 5a5ea141e7
Reported-by: Paul Wagland <pwagland@gmail.com>
---
 revision.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Junio C Hamano Oct. 18, 2022, 4:32 p.m. UTC | #1
Mathias Rav <m@git.strova.dk> writes:

> Garbage collection could inadvertently prune blobs mentioned only in the
> resolve-undo extension prior to the bugfix in 5a5ea141e7
> ("revision: mark blobs needed for resolve-undo as reachable", 2022-06-09).

Older versions of Git did not consider blobs referenced by the
resolve-undo as reachable, and allowed "git gc" to expire them out
of existence.  And "git fsck" in these versions did report that
these blobs are unreachable.

Newer versions of Git on the other hand do consider these blobs as
reachable, so "git gc" would not expire them.  And "git fsck" would
complain when they are missing, because by definition we should not
lose reachable objects.

The error discussed recently on the list was only because older
version was used to "git gc" away blobs that are still in use.

I think the right solution for such a transitory error is not to
hide the problem and pretend that such a blob reference does not
exist, which is what ...

> Fix the error by emitting a warning when the resolve-undo list mentions
> objects that do not exist and then ignoring the nonexistent object.

... this approach is about.  I think it is backwards to sweep the
problem under the rug without fixing the underlying problem.

We should instead be removing the reference that is no longer even
usable for the purpose of resolve-undo, e.g. when "rerere forget
<pathspec>" reads from the resolve-undo extension to recreate the
conflicts.

Perhaps "git reflog --state-fix" is a good model to follow.  Back
when the option was introduced, we found that there was a buggy
implementation of "git gc" that did not consider commits referenced
by reflog entries reachable and removed them, breaking "git reflog".
The solution was to remove these reflog entries that accidentally
lost commits that they reference because they no longer are usable.

The manual procedure Peff gave in the thread does work OK, but if it
makes it more friendly, a new option to "update-index" to fix the
index file by removing things that refer to missing objects would
not be a bad idea.

Thanks.
Junio C Hamano Oct. 18, 2022, 4:40 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

>> Fix the error by emitting a warning when the resolve-undo list mentions
>> objects that do not exist and then ignoring the nonexistent object.
>
> ... this approach is about.  I think it is backwards to sweep the
> problem under the rug without fixing the underlying problem.
>
> We should instead be removing the reference that is no longer even
> usable for the purpose of resolve-undo, e.g. when "rerere forget
> <pathspec>" reads from the resolve-undo extension to recreate the
> conflicts.

Ah, I take half of it back.  "instead" -> "in addition".

The patch corresponds to revision.c::handle_one_reflog_commit() that
skips the object referenced by a reflog entry that no longer exists
as part of the solution to the same problem as "reflog --stale-fix"
solved, and needs to be an integral half of the solution to the
"older gc lose blobs referenced by resolve-undo extension" problem.

And the patch goes in the right direction.  It is a bit sad that it
now has to do parse_object() but in the normal case, the object
referenced should be a blob that exists, for which the cost of
parsing it would be none (just setting .parsed member to true), so
it should be OK.

Thanks.  Will queue.
Jeff King Oct. 18, 2022, 8:29 p.m. UTC | #3
On Tue, Oct 18, 2022 at 09:40:01AM -0700, Junio C Hamano wrote:

> And the patch goes in the right direction.  It is a bit sad that it
> now has to do parse_object() but in the normal case, the object
> referenced should be a blob that exists, for which the cost of
> parsing it would be none (just setting .parsed member to true), so
> it should be OK.

This isn't quite true. parse_object() will still inflate the object
contents to check the sha1. I think has_object_file() is probably the
right thing here. We want to know if the object is missing entirely.

We'd not notice corrupted bytes, of course, but that is OK. Traversal
does not open blobs we reach via trees, either. For pack-objects, we
rely on either:

  - for repacking to disk, we check the pack crc for already-packed
    objects (which avoids inflating them). For loose objects, we'll
    inflate them later when we convert them to packed form.

  - for packing to stdout for fetch/push, the receiver is expected to
    check the sha1 via index-pack, etc.

So I think just checking "do we have it? If not, gently skip it" is the
right thing here. And in the long run we'd hopefully remove that code,
as "we don't have it" becomes less "this was probably gc'd with an older
version of git" to "oops, there is a bug in Git that lost this object".

I notice that 5a5ea141e7 (revision: mark blobs needed for resolve-undo
as reachable, 2022-06-09) uses parse_object() in the fsck code path.
That _might_ be better as lookup_object(), as earlier stages of fsck
would have checked the bytes of each object and created an in-memory
object struct. Though I guess in that sense, it doesn't matter;
parse_object() will hit lookup_object() first and see that in-memory
struct.

-Peff
diff mbox series

Patch

diff --git a/revision.c b/revision.c
index 36e31942ce..03bc45bef1 100644
--- a/revision.c
+++ b/revision.c
@@ -1720,18 +1720,18 @@  static void add_resolve_undo_to_pending(struct index_state *istate, struct rev_i
 		if (!ru)
 			continue;
 		for (i = 0; i < 3; i++) {
-			struct blob *blob;
+			struct object *obj;
 
 			if (!ru->mode[i] || !S_ISREG(ru->mode[i]))
 				continue;
 
-			blob = lookup_blob(revs->repo, &ru->oid[i]);
-			if (!blob) {
+			obj = parse_object(revs->repo, &ru->oid[i]);
+			if (!obj) {
 				warning(_("resolve-undo records `%s` which is missing"),
 					oid_to_hex(&ru->oid[i]));
 				continue;
 			}
-			add_pending_object_with_path(revs, &blob->object, "",
+			add_pending_object_with_path(revs, obj, "",
 						     ru->mode[i], path);
 		}
 	}