diff mbox series

[3/3] fsck: mention file path for index errors

Message ID Y/hxW9i9GyKblNV4@coredump.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series fsck index files from all worktrees | expand

Commit Message

Jeff King Feb. 24, 2023, 8:12 a.m. UTC
If we encounter an error in an index file, we may say something like:

  error: 1234abcd: invalid sha1 pointer in resolve-undo

But if you have multiple worktrees, each with its own index, it can be
very helpful to know which file had the problem. So let's pass that path
down through the various index-fsck functions and use it where
appropriate. After this patch you should get something like:

  error: 1234abcd: invalid sha1 pointer in resolve-undo of .git/worktrees/wt/index

That's a bit verbose, but since the point is that you shouldn't see this
normally, we're better to err on the side of more details.

I've also added the index filename to the name used by "fsck
--name-objects", which will show up if we find the object to be missing,
etc. This is bending the rules a little there, as the option claims to
write names that can be fed to rev-parse. But there is no revision
syntax to access the index of another worktree, so the best we can do is
make up something that a human will probably understand.

I did take care to retain the existing ":file" syntax for the current
worktree. So the uglier output should kick in only when it's actually
necessary. See the included tests for examples of both forms.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c  | 42 +++++++++++++++++++++++++++---------------
 t/t1450-fsck.sh | 20 +++++++++++++++++++-
 2 files changed, 46 insertions(+), 16 deletions(-)

Comments

Eric Sunshine May 11, 2023, 6:39 a.m. UTC | #1
On 2/24/23 3:12 AM, Jeff King wrote:
> If we encounter an error in an index file, we may say something like:
> 
>    error: 1234abcd: invalid sha1 pointer in resolve-undo
> 
> But if you have multiple worktrees, each with its own index, it can be
> very helpful to know which file had the problem. So let's pass that path
> down through the various index-fsck functions and use it where
> appropriate. After this patch you should get something like:
> 
>    error: 1234abcd: invalid sha1 pointer in resolve-undo of .git/worktrees/wt/index
> 
> That's a bit verbose, but since the point is that you shouldn't see this
> normally, we're better to err on the side of more details.
> 
> I've also added the index filename to the name used by "fsck
> --name-objects", which will show up if we find the object to be missing,
> etc. This is bending the rules a little there, as the option claims to
> write names that can be fed to rev-parse. But there is no revision
> syntax to access the index of another worktree, so the best we can do is
> make up something that a human will probably understand.
>
> I did take care to retain the existing ":file" syntax for the current
> worktree. So the uglier output should kick in only when it's actually
> necessary. See the included tests for examples of both forms.

This made me think of the work Duy did[1,2] to make it possible to 
reference per-worktree refs from other worktrees which allows one to 
say, for instance:

     git rev-parse main-worktree/HEAD:somefile
     git rev-parse worktrees/foo/HEAD:somefile

but, of course, that special syntax doesn't extend to "index", so your 
made-up syntax is probably good enough.

[1]: 3a3b9d8cde (refs: new ref types to make per-worktree refs visible 
to all worktrees, 2018-10-21)

[2]: ab3e1f78ae (revision.c: better error reporting on ref from 
different worktrees, 2018-10-21)

> Signed-off-by: Jeff King <peff@peff.net>
> ---
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> @@ -795,7 +797,8 @@ static int fsck_resolve_undo(struct index_state *istate)
> -static void fsck_index(struct index_state *istate)
> +static void fsck_index(struct index_state *istate, const char *index_path,
> +		       int is_main_index)

This adds an `is_main_index` flag, but...

> @@ -993,12 +998,19 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
> +			if (read_index_from(&istate, path,
>   					    get_worktree_git_dir(wt)) > 0)
> -				fsck_index(&istate);
> +				fsck_index(&istate, path, wt->is_current);

...this accesses `is_current`, the value of which is "true" only for the 
worktree in which the Git command was run, which is not necessarily the 
main worktree. The main worktree, on the other hand, is guaranteed to be 
the first entry returned by get_worktrees(), so shouldn't this instead be:

     worktrees = get_worktrees();
     for (p = worktrees; *p; p++) {
         ...
         fsck_index(&istate, path, p == worktrees);
         ...
     }
     free_worktrees(worktrees);

Or am I fundamentally misunderstanding something?
Jeff King May 11, 2023, 4:17 p.m. UTC | #2
On Thu, May 11, 2023 at 02:39:59AM -0400, Eric Sunshine wrote:

> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> > diff --git a/builtin/fsck.c b/builtin/fsck.c
> > @@ -795,7 +797,8 @@ static int fsck_resolve_undo(struct index_state *istate)
> > -static void fsck_index(struct index_state *istate)
> > +static void fsck_index(struct index_state *istate, const char *index_path,
> > +		       int is_main_index)
> 
> This adds an `is_main_index` flag, but...
> 
> > @@ -993,12 +998,19 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
> > +			if (read_index_from(&istate, path,
> >   					    get_worktree_git_dir(wt)) > 0)
> > -				fsck_index(&istate);
> > +				fsck_index(&istate, path, wt->is_current);
> 
> ...this accesses `is_current`, the value of which is "true" only for the
> worktree in which the Git command was run, which is not necessarily the main
> worktree. The main worktree, on the other hand, is guaranteed to be the
> first entry returned by get_worktrees(), so shouldn't this instead be:
> 
>     worktrees = get_worktrees();
>     for (p = worktrees; *p; p++) {
>         ...
>         fsck_index(&istate, path, p == worktrees);
>         ...
>     }
>     free_worktrees(worktrees);
> 
> Or am I fundamentally misunderstanding something?

I think "current" is what we want here, since the point was to return
the short-but-syntactically-correct ":path-in-index" for the current
worktree, which is where "rev-parse :path-in-index", etc, would look
when resolving that name.

So the code is working as intended, but I may have misused the term
"main" with respect to other worktree code. I didn't even know that was
a concept, not having dealt much with worktrees.

Maybe it's worth s/main/current/ here (and I guess in t1450)?

-Peff
Eric Sunshine May 11, 2023, 4:28 p.m. UTC | #3
On Thu, May 11, 2023 at 12:17 PM Jeff King <peff@peff.net> wrote:
> On Thu, May 11, 2023 at 02:39:59AM -0400, Eric Sunshine wrote:
> > > +static void fsck_index(struct index_state *istate, const char *index_path,
> > > +                  int is_main_index)
> >
> > This adds an `is_main_index` flag, but...
> >
> > > @@ -993,12 +998,19 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
> > > +                           fsck_index(&istate, path, wt->is_current);
> >
> > ...this accesses `is_current`, the value of which is "true" only for the
> > worktree in which the Git command was run, which is not necessarily the main
> > worktree. The main worktree, on the other hand, is guaranteed to be the
> > first entry returned by get_worktrees(), so shouldn't this instead be:
> >
> >     for (p = worktrees; *p; p++) {
> >         fsck_index(&istate, path, p == worktrees);
>
> I think "current" is what we want here, since the point was to return
> the short-but-syntactically-correct ":path-in-index" for the current
> worktree, which is where "rev-parse :path-in-index", etc, would look
> when resolving that name.

Okay, that makes sense.

> So the code is working as intended, but I may have misused the term
> "main" with respect to other worktree code. I didn't even know that was
> a concept, not having dealt much with worktrees.
>
> Maybe it's worth s/main/current/ here (and I guess in t1450)?

Yes, s/main/current/ probably would be helpful for future readers of
the code. It's unfortunate that the term "current" can ambiguously
also be read as meaning "the up-to-date index" or "the present-time
index" as opposed to "the index in this directory/worktree", which is
the intention here. But "current" is consistent with the existing
`struct worktree.is_current`, so hopefully should not be too
confusing.
Jeff King May 11, 2023, 5:01 p.m. UTC | #4
On Thu, May 11, 2023 at 12:28:45PM -0400, Eric Sunshine wrote:

> > So the code is working as intended, but I may have misused the term
> > "main" with respect to other worktree code. I didn't even know that was
> > a concept, not having dealt much with worktrees.
> >
> > Maybe it's worth s/main/current/ here (and I guess in t1450)?
> 
> Yes, s/main/current/ probably would be helpful for future readers of
> the code. It's unfortunate that the term "current" can ambiguously
> also be read as meaning "the up-to-date index" or "the present-time
> index" as opposed to "the index in this directory/worktree", which is
> the intention here. But "current" is consistent with the existing
> `struct worktree.is_current`, so hopefully should not be too
> confusing.

I think in this context it should be pretty clear. Do you want to
prepare a patch?

-Peff
Andreas Schwab June 1, 2023, 12:15 p.m. UTC | #5
On Feb 24 2023, Jeff King wrote:

> If we encounter an error in an index file, we may say something like:
>
>   error: 1234abcd: invalid sha1 pointer in resolve-undo
>
> But if you have multiple worktrees, each with its own index, it can be
> very helpful to know which file had the problem. So let's pass that path
> down through the various index-fsck functions and use it where
> appropriate. After this patch you should get something like:
>
>   error: 1234abcd: invalid sha1 pointer in resolve-undo of .git/worktrees/wt/index

That is still suboptimal, because there is no obvious mapping from the
internal worktree name to the directory where it lives (git worktree
list doesn't mention the internal name).  If you have several worktrees
with the same base name in different places, the name under
.git/worktrees is just made unique by appending a number.  Normally you
would want to change to the affected worktree directory to repair it.
Jeff King June 1, 2023, 2:04 p.m. UTC | #6
On Thu, Jun 01, 2023 at 02:15:39PM +0200, Andreas Schwab wrote:

> On Feb 24 2023, Jeff King wrote:
> 
> > If we encounter an error in an index file, we may say something like:
> >
> >   error: 1234abcd: invalid sha1 pointer in resolve-undo
> >
> > But if you have multiple worktrees, each with its own index, it can be
> > very helpful to know which file had the problem. So let's pass that path
> > down through the various index-fsck functions and use it where
> > appropriate. After this patch you should get something like:
> >
> >   error: 1234abcd: invalid sha1 pointer in resolve-undo of .git/worktrees/wt/index
> 
> That is still suboptimal, because there is no obvious mapping from the
> internal worktree name to the directory where it lives (git worktree
> list doesn't mention the internal name).  If you have several worktrees
> with the same base name in different places, the name under
> .git/worktrees is just made unique by appending a number.  Normally you
> would want to change to the affected worktree directory to repair it.

I don't use worktrees all that much, and I never had to repair one of
these cases in the real world, but I would have imagined you'd chdir
into the affected .git directory to fix things (either by blowing away
the index, or by running Git commands inside there).

I don't think it would be too hard to print more information. The caller
of fsck_index() has the "struct worktree", which contains more path
information. But we'd need to figure out how to present it, as well as
which paths to show in fsck_cache_tree(), etc.

So I'd say "patches welcome" if anybody wants to figure out those
issues. :)

-Peff
Eric Sunshine June 29, 2023, 6:21 p.m. UTC | #7
On Thu, May 11, 2023 at 1:01 PM Jeff King <peff@peff.net> wrote:
> On Thu, May 11, 2023 at 12:28:45PM -0400, Eric Sunshine wrote:
> > Yes, s/main/current/ probably would be helpful for future readers of
> > the code. It's unfortunate that the term "current" can ambiguously
> > also be read as meaning "the up-to-date index" or "the present-time
> > index" as opposed to "the index in this directory/worktree", which is
> > the intention here. But "current" is consistent with the existing
> > `struct worktree.is_current`, so hopefully should not be too
> > confusing.
>
> I think in this context it should be pretty clear. Do you want to
> prepare a patch?

Done. As usual, I forgot to use --in-reply-to=<this-thread> when
sending the patch despite having gone through the effort of looking up
the relevant message-ID of this thread. Oh well. The patch is here[1].

[1]: https://lore.kernel.org/git/20230629181333.87465-1-ericsunshine@charter.net/
Junio C Hamano June 29, 2023, 7:37 p.m. UTC | #8
Eric Sunshine <sunshine@sunshineco.com> writes:

> On Thu, May 11, 2023 at 1:01 PM Jeff King <peff@peff.net> wrote:
>> On Thu, May 11, 2023 at 12:28:45PM -0400, Eric Sunshine wrote:
>> > Yes, s/main/current/ probably would be helpful for future readers of
>> > the code. It's unfortunate that the term "current" can ambiguously
>> > also be read as meaning "the up-to-date index" or "the present-time
>> > index" as opposed to "the index in this directory/worktree", which is
>> > the intention here. But "current" is consistent with the existing
>> > `struct worktree.is_current`, so hopefully should not be too
>> > confusing.
>>
>> I think in this context it should be pretty clear. Do you want to
>> prepare a patch?
>
> Done. As usual, I forgot to use --in-reply-to=<this-thread> when
> sending the patch despite having gone through the effort of looking up
> the relevant message-ID of this thread. Oh well. The patch is here[1].
>
> [1]: https://lore.kernel.org/git/20230629181333.87465-1-ericsunshine@charter.net/

I've queued your patch on top of the jk/fsck-indices-in-worktrees
topic as-is, but the earlier discussion in the thread shows that
Peff already is in agreement with the change, so I would not mind
amending in his Acked-by: later.

Thanks, anyway.
diff mbox series

Patch

diff --git a/builtin/fsck.c b/builtin/fsck.c
index ddd13cb2b3..e0974644a1 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -731,19 +731,19 @@  static int fsck_head_link(const char *head_ref_name,
 	return 0;
 }
 
-static int fsck_cache_tree(struct cache_tree *it)
+static int fsck_cache_tree(struct cache_tree *it, const char *index_path)
 {
 	int i;
 	int err = 0;
 
 	if (verbose)
-		fprintf_ln(stderr, _("Checking cache tree"));
+		fprintf_ln(stderr, _("Checking cache tree of %s"), index_path);
 
 	if (0 <= it->entry_count) {
 		struct object *obj = parse_object(the_repository, &it->oid);
 		if (!obj) {
-			error(_("%s: invalid sha1 pointer in cache-tree"),
-			      oid_to_hex(&it->oid));
+			error(_("%s: invalid sha1 pointer in cache-tree of %s"),
+			      oid_to_hex(&it->oid), index_path);
 			errors_found |= ERROR_REFS;
 			return 1;
 		}
@@ -754,11 +754,12 @@  static int fsck_cache_tree(struct cache_tree *it)
 			err |= objerror(obj, _("non-tree in cache-tree"));
 	}
 	for (i = 0; i < it->subtree_nr; i++)
-		err |= fsck_cache_tree(it->down[i]->cache_tree);
+		err |= fsck_cache_tree(it->down[i]->cache_tree, index_path);
 	return err;
 }
 
-static int fsck_resolve_undo(struct index_state *istate)
+static int fsck_resolve_undo(struct index_state *istate,
+			     const char *index_path)
 {
 	struct string_list_item *item;
 	struct string_list *resolve_undo = istate->resolve_undo;
@@ -781,8 +782,9 @@  static int fsck_resolve_undo(struct index_state *istate)
 
 			obj = parse_object(the_repository, &ru->oid[i]);
 			if (!obj) {
-				error(_("%s: invalid sha1 pointer in resolve-undo"),
-				      oid_to_hex(&ru->oid[i]));
+				error(_("%s: invalid sha1 pointer in resolve-undo of %s"),
+				      oid_to_hex(&ru->oid[i]),
+				      index_path);
 				errors_found |= ERROR_REFS;
 				continue;
 			}
@@ -795,7 +797,8 @@  static int fsck_resolve_undo(struct index_state *istate)
 	return 0;
 }
 
-static void fsck_index(struct index_state *istate)
+static void fsck_index(struct index_state *istate, const char *index_path,
+		       int is_main_index)
 {
 	unsigned int i;
 
@@ -816,12 +819,14 @@  static void fsck_index(struct index_state *istate)
 		obj = &blob->object;
 		obj->flags |= USED;
 		fsck_put_object_name(&fsck_walk_options, &obj->oid,
-				     ":%s", istate->cache[i]->name);
+				     "%s:%s",
+				     is_main_index ? "" : index_path,
+				     istate->cache[i]->name);
 		mark_object_reachable(obj);
 	}
 	if (istate->cache_tree)
-		fsck_cache_tree(istate->cache_tree);
-	fsck_resolve_undo(istate);
+		fsck_cache_tree(istate->cache_tree, index_path);
+	fsck_resolve_undo(istate, index_path);
 }
 
 static void mark_object_for_connectivity(const struct object_id *oid)
@@ -993,12 +998,19 @@  int cmd_fsck(int argc, const char **argv, const char *prefix)
 			struct worktree *wt = *p;
 			struct index_state istate =
 				INDEX_STATE_INIT(the_repository);
+			char *path;
 
-			if (read_index_from(&istate,
-					    worktree_git_path(wt, "index"),
+			/*
+			 * Make a copy since the buffer is reusable
+			 * and may get overwritten by other calls
+			 * while we're examining the index.
+			 */
+			path = xstrdup(worktree_git_path(wt, "index"));
+			if (read_index_from(&istate, path,
 					    get_worktree_git_dir(wt)) > 0)
-				fsck_index(&istate);
+				fsck_index(&istate, path, wt->is_current);
 			discard_index(&istate);
+			free(path);
 		}
 
 	}
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 3b70ad9e22..bca46378b2 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -1032,7 +1032,25 @@  test_expect_success 'fsck detects problems in worktree index' '
 	blob=$(git -C wt rev-parse :file) &&
 	remove_object $blob &&
 
-	test_must_fail git fsck
+	test_must_fail git fsck --name-objects >actual 2>&1 &&
+	cat >expect <<-EOF &&
+	missing blob $blob (.git/worktrees/wt/index:file)
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'fsck reports problems in main index without filename' '
+	test_when_finished "rm -f .git/index && git read-tree HEAD" &&
+	echo "this object will be removed to break the main index" >file &&
+	git add file &&
+	blob=$(git rev-parse :file) &&
+	remove_object $blob &&
+
+	test_must_fail git fsck --name-objects >actual 2>&1 &&
+	cat >expect <<-EOF &&
+	missing blob $blob (:file)
+	EOF
+	test_cmp expect actual
 '
 
 test_done