Message ID | Y/hxW9i9GyKblNV4@coredump.intra.peff.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fsck index files from all worktrees | expand |
On 2/24/23 3:12 AM, Jeff King wrote: > If we encounter an error in an index file, we may say something like: > > error: 1234abcd: invalid sha1 pointer in resolve-undo > > But if you have multiple worktrees, each with its own index, it can be > very helpful to know which file had the problem. So let's pass that path > down through the various index-fsck functions and use it where > appropriate. After this patch you should get something like: > > error: 1234abcd: invalid sha1 pointer in resolve-undo of .git/worktrees/wt/index > > That's a bit verbose, but since the point is that you shouldn't see this > normally, we're better to err on the side of more details. > > I've also added the index filename to the name used by "fsck > --name-objects", which will show up if we find the object to be missing, > etc. This is bending the rules a little there, as the option claims to > write names that can be fed to rev-parse. But there is no revision > syntax to access the index of another worktree, so the best we can do is > make up something that a human will probably understand. > > I did take care to retain the existing ":file" syntax for the current > worktree. So the uglier output should kick in only when it's actually > necessary. See the included tests for examples of both forms. This made me think of the work Duy did[1,2] to make it possible to reference per-worktree refs from other worktrees which allows one to say, for instance: git rev-parse main-worktree/HEAD:somefile git rev-parse worktrees/foo/HEAD:somefile but, of course, that special syntax doesn't extend to "index", so your made-up syntax is probably good enough. [1]: 3a3b9d8cde (refs: new ref types to make per-worktree refs visible to all worktrees, 2018-10-21) [2]: ab3e1f78ae (revision.c: better error reporting on ref from different worktrees, 2018-10-21) > Signed-off-by: Jeff King <peff@peff.net> > --- > diff --git a/builtin/fsck.c b/builtin/fsck.c > @@ -795,7 +797,8 @@ static int fsck_resolve_undo(struct index_state *istate) > -static void fsck_index(struct index_state *istate) > +static void fsck_index(struct index_state *istate, const char *index_path, > + int is_main_index) This adds an `is_main_index` flag, but... > @@ -993,12 +998,19 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) > + if (read_index_from(&istate, path, > get_worktree_git_dir(wt)) > 0) > - fsck_index(&istate); > + fsck_index(&istate, path, wt->is_current); ...this accesses `is_current`, the value of which is "true" only for the worktree in which the Git command was run, which is not necessarily the main worktree. The main worktree, on the other hand, is guaranteed to be the first entry returned by get_worktrees(), so shouldn't this instead be: worktrees = get_worktrees(); for (p = worktrees; *p; p++) { ... fsck_index(&istate, path, p == worktrees); ... } free_worktrees(worktrees); Or am I fundamentally misunderstanding something?
On Thu, May 11, 2023 at 02:39:59AM -0400, Eric Sunshine wrote: > > Signed-off-by: Jeff King <peff@peff.net> > > --- > > diff --git a/builtin/fsck.c b/builtin/fsck.c > > @@ -795,7 +797,8 @@ static int fsck_resolve_undo(struct index_state *istate) > > -static void fsck_index(struct index_state *istate) > > +static void fsck_index(struct index_state *istate, const char *index_path, > > + int is_main_index) > > This adds an `is_main_index` flag, but... > > > @@ -993,12 +998,19 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) > > + if (read_index_from(&istate, path, > > get_worktree_git_dir(wt)) > 0) > > - fsck_index(&istate); > > + fsck_index(&istate, path, wt->is_current); > > ...this accesses `is_current`, the value of which is "true" only for the > worktree in which the Git command was run, which is not necessarily the main > worktree. The main worktree, on the other hand, is guaranteed to be the > first entry returned by get_worktrees(), so shouldn't this instead be: > > worktrees = get_worktrees(); > for (p = worktrees; *p; p++) { > ... > fsck_index(&istate, path, p == worktrees); > ... > } > free_worktrees(worktrees); > > Or am I fundamentally misunderstanding something? I think "current" is what we want here, since the point was to return the short-but-syntactically-correct ":path-in-index" for the current worktree, which is where "rev-parse :path-in-index", etc, would look when resolving that name. So the code is working as intended, but I may have misused the term "main" with respect to other worktree code. I didn't even know that was a concept, not having dealt much with worktrees. Maybe it's worth s/main/current/ here (and I guess in t1450)? -Peff
On Thu, May 11, 2023 at 12:17 PM Jeff King <peff@peff.net> wrote: > On Thu, May 11, 2023 at 02:39:59AM -0400, Eric Sunshine wrote: > > > +static void fsck_index(struct index_state *istate, const char *index_path, > > > + int is_main_index) > > > > This adds an `is_main_index` flag, but... > > > > > @@ -993,12 +998,19 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) > > > + fsck_index(&istate, path, wt->is_current); > > > > ...this accesses `is_current`, the value of which is "true" only for the > > worktree in which the Git command was run, which is not necessarily the main > > worktree. The main worktree, on the other hand, is guaranteed to be the > > first entry returned by get_worktrees(), so shouldn't this instead be: > > > > for (p = worktrees; *p; p++) { > > fsck_index(&istate, path, p == worktrees); > > I think "current" is what we want here, since the point was to return > the short-but-syntactically-correct ":path-in-index" for the current > worktree, which is where "rev-parse :path-in-index", etc, would look > when resolving that name. Okay, that makes sense. > So the code is working as intended, but I may have misused the term > "main" with respect to other worktree code. I didn't even know that was > a concept, not having dealt much with worktrees. > > Maybe it's worth s/main/current/ here (and I guess in t1450)? Yes, s/main/current/ probably would be helpful for future readers of the code. It's unfortunate that the term "current" can ambiguously also be read as meaning "the up-to-date index" or "the present-time index" as opposed to "the index in this directory/worktree", which is the intention here. But "current" is consistent with the existing `struct worktree.is_current`, so hopefully should not be too confusing.
On Thu, May 11, 2023 at 12:28:45PM -0400, Eric Sunshine wrote: > > So the code is working as intended, but I may have misused the term > > "main" with respect to other worktree code. I didn't even know that was > > a concept, not having dealt much with worktrees. > > > > Maybe it's worth s/main/current/ here (and I guess in t1450)? > > Yes, s/main/current/ probably would be helpful for future readers of > the code. It's unfortunate that the term "current" can ambiguously > also be read as meaning "the up-to-date index" or "the present-time > index" as opposed to "the index in this directory/worktree", which is > the intention here. But "current" is consistent with the existing > `struct worktree.is_current`, so hopefully should not be too > confusing. I think in this context it should be pretty clear. Do you want to prepare a patch? -Peff
On Feb 24 2023, Jeff King wrote: > If we encounter an error in an index file, we may say something like: > > error: 1234abcd: invalid sha1 pointer in resolve-undo > > But if you have multiple worktrees, each with its own index, it can be > very helpful to know which file had the problem. So let's pass that path > down through the various index-fsck functions and use it where > appropriate. After this patch you should get something like: > > error: 1234abcd: invalid sha1 pointer in resolve-undo of .git/worktrees/wt/index That is still suboptimal, because there is no obvious mapping from the internal worktree name to the directory where it lives (git worktree list doesn't mention the internal name). If you have several worktrees with the same base name in different places, the name under .git/worktrees is just made unique by appending a number. Normally you would want to change to the affected worktree directory to repair it.
On Thu, Jun 01, 2023 at 02:15:39PM +0200, Andreas Schwab wrote: > On Feb 24 2023, Jeff King wrote: > > > If we encounter an error in an index file, we may say something like: > > > > error: 1234abcd: invalid sha1 pointer in resolve-undo > > > > But if you have multiple worktrees, each with its own index, it can be > > very helpful to know which file had the problem. So let's pass that path > > down through the various index-fsck functions and use it where > > appropriate. After this patch you should get something like: > > > > error: 1234abcd: invalid sha1 pointer in resolve-undo of .git/worktrees/wt/index > > That is still suboptimal, because there is no obvious mapping from the > internal worktree name to the directory where it lives (git worktree > list doesn't mention the internal name). If you have several worktrees > with the same base name in different places, the name under > .git/worktrees is just made unique by appending a number. Normally you > would want to change to the affected worktree directory to repair it. I don't use worktrees all that much, and I never had to repair one of these cases in the real world, but I would have imagined you'd chdir into the affected .git directory to fix things (either by blowing away the index, or by running Git commands inside there). I don't think it would be too hard to print more information. The caller of fsck_index() has the "struct worktree", which contains more path information. But we'd need to figure out how to present it, as well as which paths to show in fsck_cache_tree(), etc. So I'd say "patches welcome" if anybody wants to figure out those issues. :) -Peff
On Thu, May 11, 2023 at 1:01 PM Jeff King <peff@peff.net> wrote: > On Thu, May 11, 2023 at 12:28:45PM -0400, Eric Sunshine wrote: > > Yes, s/main/current/ probably would be helpful for future readers of > > the code. It's unfortunate that the term "current" can ambiguously > > also be read as meaning "the up-to-date index" or "the present-time > > index" as opposed to "the index in this directory/worktree", which is > > the intention here. But "current" is consistent with the existing > > `struct worktree.is_current`, so hopefully should not be too > > confusing. > > I think in this context it should be pretty clear. Do you want to > prepare a patch? Done. As usual, I forgot to use --in-reply-to=<this-thread> when sending the patch despite having gone through the effort of looking up the relevant message-ID of this thread. Oh well. The patch is here[1]. [1]: https://lore.kernel.org/git/20230629181333.87465-1-ericsunshine@charter.net/
Eric Sunshine <sunshine@sunshineco.com> writes: > On Thu, May 11, 2023 at 1:01 PM Jeff King <peff@peff.net> wrote: >> On Thu, May 11, 2023 at 12:28:45PM -0400, Eric Sunshine wrote: >> > Yes, s/main/current/ probably would be helpful for future readers of >> > the code. It's unfortunate that the term "current" can ambiguously >> > also be read as meaning "the up-to-date index" or "the present-time >> > index" as opposed to "the index in this directory/worktree", which is >> > the intention here. But "current" is consistent with the existing >> > `struct worktree.is_current`, so hopefully should not be too >> > confusing. >> >> I think in this context it should be pretty clear. Do you want to >> prepare a patch? > > Done. As usual, I forgot to use --in-reply-to=<this-thread> when > sending the patch despite having gone through the effort of looking up > the relevant message-ID of this thread. Oh well. The patch is here[1]. > > [1]: https://lore.kernel.org/git/20230629181333.87465-1-ericsunshine@charter.net/ I've queued your patch on top of the jk/fsck-indices-in-worktrees topic as-is, but the earlier discussion in the thread shows that Peff already is in agreement with the change, so I would not mind amending in his Acked-by: later. Thanks, anyway.
diff --git a/builtin/fsck.c b/builtin/fsck.c index ddd13cb2b3..e0974644a1 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -731,19 +731,19 @@ static int fsck_head_link(const char *head_ref_name, return 0; } -static int fsck_cache_tree(struct cache_tree *it) +static int fsck_cache_tree(struct cache_tree *it, const char *index_path) { int i; int err = 0; if (verbose) - fprintf_ln(stderr, _("Checking cache tree")); + fprintf_ln(stderr, _("Checking cache tree of %s"), index_path); if (0 <= it->entry_count) { struct object *obj = parse_object(the_repository, &it->oid); if (!obj) { - error(_("%s: invalid sha1 pointer in cache-tree"), - oid_to_hex(&it->oid)); + error(_("%s: invalid sha1 pointer in cache-tree of %s"), + oid_to_hex(&it->oid), index_path); errors_found |= ERROR_REFS; return 1; } @@ -754,11 +754,12 @@ static int fsck_cache_tree(struct cache_tree *it) err |= objerror(obj, _("non-tree in cache-tree")); } for (i = 0; i < it->subtree_nr; i++) - err |= fsck_cache_tree(it->down[i]->cache_tree); + err |= fsck_cache_tree(it->down[i]->cache_tree, index_path); return err; } -static int fsck_resolve_undo(struct index_state *istate) +static int fsck_resolve_undo(struct index_state *istate, + const char *index_path) { struct string_list_item *item; struct string_list *resolve_undo = istate->resolve_undo; @@ -781,8 +782,9 @@ static int fsck_resolve_undo(struct index_state *istate) obj = parse_object(the_repository, &ru->oid[i]); if (!obj) { - error(_("%s: invalid sha1 pointer in resolve-undo"), - oid_to_hex(&ru->oid[i])); + error(_("%s: invalid sha1 pointer in resolve-undo of %s"), + oid_to_hex(&ru->oid[i]), + index_path); errors_found |= ERROR_REFS; continue; } @@ -795,7 +797,8 @@ static int fsck_resolve_undo(struct index_state *istate) return 0; } -static void fsck_index(struct index_state *istate) +static void fsck_index(struct index_state *istate, const char *index_path, + int is_main_index) { unsigned int i; @@ -816,12 +819,14 @@ static void fsck_index(struct index_state *istate) obj = &blob->object; obj->flags |= USED; fsck_put_object_name(&fsck_walk_options, &obj->oid, - ":%s", istate->cache[i]->name); + "%s:%s", + is_main_index ? "" : index_path, + istate->cache[i]->name); mark_object_reachable(obj); } if (istate->cache_tree) - fsck_cache_tree(istate->cache_tree); - fsck_resolve_undo(istate); + fsck_cache_tree(istate->cache_tree, index_path); + fsck_resolve_undo(istate, index_path); } static void mark_object_for_connectivity(const struct object_id *oid) @@ -993,12 +998,19 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) struct worktree *wt = *p; struct index_state istate = INDEX_STATE_INIT(the_repository); + char *path; - if (read_index_from(&istate, - worktree_git_path(wt, "index"), + /* + * Make a copy since the buffer is reusable + * and may get overwritten by other calls + * while we're examining the index. + */ + path = xstrdup(worktree_git_path(wt, "index")); + if (read_index_from(&istate, path, get_worktree_git_dir(wt)) > 0) - fsck_index(&istate); + fsck_index(&istate, path, wt->is_current); discard_index(&istate); + free(path); } } diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index 3b70ad9e22..bca46378b2 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -1032,7 +1032,25 @@ test_expect_success 'fsck detects problems in worktree index' ' blob=$(git -C wt rev-parse :file) && remove_object $blob && - test_must_fail git fsck + test_must_fail git fsck --name-objects >actual 2>&1 && + cat >expect <<-EOF && + missing blob $blob (.git/worktrees/wt/index:file) + EOF + test_cmp expect actual +' + +test_expect_success 'fsck reports problems in main index without filename' ' + test_when_finished "rm -f .git/index && git read-tree HEAD" && + echo "this object will be removed to break the main index" >file && + git add file && + blob=$(git rev-parse :file) && + remove_object $blob && + + test_must_fail git fsck --name-objects >actual 2>&1 && + cat >expect <<-EOF && + missing blob $blob (:file) + EOF + test_cmp expect actual ' test_done
If we encounter an error in an index file, we may say something like: error: 1234abcd: invalid sha1 pointer in resolve-undo But if you have multiple worktrees, each with its own index, it can be very helpful to know which file had the problem. So let's pass that path down through the various index-fsck functions and use it where appropriate. After this patch you should get something like: error: 1234abcd: invalid sha1 pointer in resolve-undo of .git/worktrees/wt/index That's a bit verbose, but since the point is that you shouldn't see this normally, we're better to err on the side of more details. I've also added the index filename to the name used by "fsck --name-objects", which will show up if we find the object to be missing, etc. This is bending the rules a little there, as the option claims to write names that can be fed to rev-parse. But there is no revision syntax to access the index of another worktree, so the best we can do is make up something that a human will probably understand. I did take care to retain the existing ":file" syntax for the current worktree. So the uglier output should kick in only when it's actually necessary. See the included tests for examples of both forms. Signed-off-by: Jeff King <peff@peff.net> --- builtin/fsck.c | 42 +++++++++++++++++++++++++++--------------- t/t1450-fsck.sh | 20 +++++++++++++++++++- 2 files changed, 46 insertions(+), 16 deletions(-)