From patchwork Fri Feb 24 08:07:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13151028 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C643DC61DA3 for ; Fri, 24 Feb 2023 08:07:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229547AbjBXIHf (ORCPT ); Fri, 24 Feb 2023 03:07:35 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55880 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229452AbjBXIHe (ORCPT ); Fri, 24 Feb 2023 03:07:34 -0500 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 548E055070 for ; Fri, 24 Feb 2023 00:07:33 -0800 (PST) Received: (qmail 4230 invoked by uid 109); 24 Feb 2023 08:07:32 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Fri, 24 Feb 2023 08:07:32 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 31716 invoked by uid 111); 24 Feb 2023 08:07:32 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 24 Feb 2023 03:07:32 -0500 Authentication-Results: peff.net; auth=none Date: Fri, 24 Feb 2023 03:07:32 -0500 From: Jeff King To: Johannes Sixt Cc: Git Mailing List Subject: [PATCH 1/3] fsck: factor out index fsck Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The code to fsck an index operates directly on the_index. Let's move it into its own function in preparation for handling the index files from other worktrees. Since we now have only a single reference to the_index, let's drop our USE_THE_INDEX_VARIABLE definition and just use the_repository.index directly. That's a minor cleanup, but also ensures that we didn't miss any references when moving the code into fsck_index(). Signed-off-by: Jeff King --- builtin/fsck.c | 54 ++++++++++++++++++++++++++++---------------------- 1 file changed, 30 insertions(+), 24 deletions(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index d207bd909b..fa101e0db2 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -1,4 +1,3 @@ -#define USE_THE_INDEX_VARIABLE #include "builtin.h" #include "cache.h" #include "repository.h" @@ -796,6 +795,35 @@ static int fsck_resolve_undo(struct index_state *istate) return 0; } +static void fsck_index(struct index_state *istate) +{ + unsigned int i; + + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(istate); + for (i = 0; i < istate->cache_nr; i++) { + unsigned int mode; + struct blob *blob; + struct object *obj; + + mode = istate->cache[i]->ce_mode; + if (S_ISGITLINK(mode)) + continue; + blob = lookup_blob(the_repository, + &istate->cache[i]->oid); + if (!blob) + continue; + obj = &blob->object; + obj->flags |= USED; + fsck_put_object_name(&fsck_walk_options, &obj->oid, + ":%s", istate->cache[i]->name); + mark_object_reachable(obj); + } + if (istate->cache_tree) + fsck_cache_tree(istate->cache_tree); + fsck_resolve_undo(istate); +} + static void mark_object_for_connectivity(const struct object_id *oid) { struct object *obj = lookup_unknown_object(the_repository, oid); @@ -959,29 +987,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) verify_index_checksum = 1; verify_ce_order = 1; repo_read_index(the_repository); - /* TODO: audit for interaction with sparse-index. */ - ensure_full_index(&the_index); - for (i = 0; i < the_index.cache_nr; i++) { - unsigned int mode; - struct blob *blob; - struct object *obj; - - mode = the_index.cache[i]->ce_mode; - if (S_ISGITLINK(mode)) - continue; - blob = lookup_blob(the_repository, - &the_index.cache[i]->oid); - if (!blob) - continue; - obj = &blob->object; - obj->flags |= USED; - fsck_put_object_name(&fsck_walk_options, &obj->oid, - ":%s", the_index.cache[i]->name); - mark_object_reachable(obj); - } - if (the_index.cache_tree) - fsck_cache_tree(the_index.cache_tree); - fsck_resolve_undo(&the_index); + fsck_index(the_repository->index); } check_connectivity(); From patchwork Fri Feb 24 08:09:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13151029 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 385EDC61DA3 for ; Fri, 24 Feb 2023 08:10:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229685AbjBXIKB (ORCPT ); Fri, 24 Feb 2023 03:10:01 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58656 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229522AbjBXIJ7 (ORCPT ); Fri, 24 Feb 2023 03:09:59 -0500 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0BB9519F17 for ; Fri, 24 Feb 2023 00:09:58 -0800 (PST) Received: (qmail 4549 invoked by uid 109); 24 Feb 2023 08:09:58 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Fri, 24 Feb 2023 08:09:58 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 31727 invoked by uid 111); 24 Feb 2023 08:09:58 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 24 Feb 2023 03:09:58 -0500 Authentication-Results: peff.net; auth=none Date: Fri, 24 Feb 2023 03:09:57 -0500 From: Jeff King To: Johannes Sixt Cc: Git Mailing List Subject: [PATCH 2/3] fsck: check index files in all worktrees Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org We check the index file for the main worktree, but completely ignore the index files in other worktrees. These should be checked, too, as they are part of the repository state (and in particular, errors in those index files may cause repo-wide operations like "git gc" to complain). Reported-by: Johannes Sixt Signed-off-by: Jeff King --- I mostly cargo-culted the loop from add_index_objects_to_pending(), which is how they get included in git-prune, git-repack, etc. I'm not sure if we should be reporting something if we get a negative return. Last time I dug into this, I think I found that the index-reading code could not actually return an error (it dies instead). If we get zero, there are no entries to check. But I'm not sure if we'd still have extensions? So maybe this ought to be ignoring the return value from read_index_from() entirely. builtin/fsck.c | 17 +++++++++++++++-- t/t1450-fsck.sh | 12 ++++++++++++ 2 files changed, 27 insertions(+), 2 deletions(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index fa101e0db2..ddd13cb2b3 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -984,10 +984,23 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) } if (keep_cache_objects) { + struct worktree **p; + verify_index_checksum = 1; verify_ce_order = 1; - repo_read_index(the_repository); - fsck_index(the_repository->index); + + for (p = get_worktrees(); *p; p++) { + struct worktree *wt = *p; + struct index_state istate = + INDEX_STATE_INIT(the_repository); + + if (read_index_from(&istate, + worktree_git_path(wt, "index"), + get_worktree_git_dir(wt)) > 0) + fsck_index(&istate); + discard_index(&istate); + } + } check_connectivity(); diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index fdb886dfe4..3b70ad9e22 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -1023,4 +1023,16 @@ test_expect_success 'fsck error on gitattributes with excessive size' ' test_cmp expected actual ' +test_expect_success 'fsck detects problems in worktree index' ' + test_when_finished "git worktree remove -f wt" && + git worktree add wt && + + echo "this will be removed to break the worktree index" >wt/file && + git -C wt add file && + blob=$(git -C wt rev-parse :file) && + remove_object $blob && + + test_must_fail git fsck +' + test_done From patchwork Fri Feb 24 08:12:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13151030 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 16E37C64ED8 for ; Fri, 24 Feb 2023 08:12:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229642AbjBXIMa (ORCPT ); Fri, 24 Feb 2023 03:12:30 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60896 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229670AbjBXIM2 (ORCPT ); Fri, 24 Feb 2023 03:12:28 -0500 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8BB9C5C178 for ; Fri, 24 Feb 2023 00:12:12 -0800 (PST) Received: (qmail 4558 invoked by uid 109); 24 Feb 2023 08:12:11 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Fri, 24 Feb 2023 08:12:11 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 31767 invoked by uid 111); 24 Feb 2023 08:12:11 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 24 Feb 2023 03:12:11 -0500 Authentication-Results: peff.net; auth=none Date: Fri, 24 Feb 2023 03:12:11 -0500 From: Jeff King To: Johannes Sixt Cc: Git Mailing List Subject: [PATCH 3/3] fsck: mention file path for index errors Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org If we encounter an error in an index file, we may say something like: error: 1234abcd: invalid sha1 pointer in resolve-undo But if you have multiple worktrees, each with its own index, it can be very helpful to know which file had the problem. So let's pass that path down through the various index-fsck functions and use it where appropriate. After this patch you should get something like: error: 1234abcd: invalid sha1 pointer in resolve-undo of .git/worktrees/wt/index That's a bit verbose, but since the point is that you shouldn't see this normally, we're better to err on the side of more details. I've also added the index filename to the name used by "fsck --name-objects", which will show up if we find the object to be missing, etc. This is bending the rules a little there, as the option claims to write names that can be fed to rev-parse. But there is no revision syntax to access the index of another worktree, so the best we can do is make up something that a human will probably understand. I did take care to retain the existing ":file" syntax for the current worktree. So the uglier output should kick in only when it's actually necessary. See the included tests for examples of both forms. Signed-off-by: Jeff King --- builtin/fsck.c | 42 +++++++++++++++++++++++++++--------------- t/t1450-fsck.sh | 20 +++++++++++++++++++- 2 files changed, 46 insertions(+), 16 deletions(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index ddd13cb2b3..e0974644a1 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -731,19 +731,19 @@ static int fsck_head_link(const char *head_ref_name, return 0; } -static int fsck_cache_tree(struct cache_tree *it) +static int fsck_cache_tree(struct cache_tree *it, const char *index_path) { int i; int err = 0; if (verbose) - fprintf_ln(stderr, _("Checking cache tree")); + fprintf_ln(stderr, _("Checking cache tree of %s"), index_path); if (0 <= it->entry_count) { struct object *obj = parse_object(the_repository, &it->oid); if (!obj) { - error(_("%s: invalid sha1 pointer in cache-tree"), - oid_to_hex(&it->oid)); + error(_("%s: invalid sha1 pointer in cache-tree of %s"), + oid_to_hex(&it->oid), index_path); errors_found |= ERROR_REFS; return 1; } @@ -754,11 +754,12 @@ static int fsck_cache_tree(struct cache_tree *it) err |= objerror(obj, _("non-tree in cache-tree")); } for (i = 0; i < it->subtree_nr; i++) - err |= fsck_cache_tree(it->down[i]->cache_tree); + err |= fsck_cache_tree(it->down[i]->cache_tree, index_path); return err; } -static int fsck_resolve_undo(struct index_state *istate) +static int fsck_resolve_undo(struct index_state *istate, + const char *index_path) { struct string_list_item *item; struct string_list *resolve_undo = istate->resolve_undo; @@ -781,8 +782,9 @@ static int fsck_resolve_undo(struct index_state *istate) obj = parse_object(the_repository, &ru->oid[i]); if (!obj) { - error(_("%s: invalid sha1 pointer in resolve-undo"), - oid_to_hex(&ru->oid[i])); + error(_("%s: invalid sha1 pointer in resolve-undo of %s"), + oid_to_hex(&ru->oid[i]), + index_path); errors_found |= ERROR_REFS; continue; } @@ -795,7 +797,8 @@ static int fsck_resolve_undo(struct index_state *istate) return 0; } -static void fsck_index(struct index_state *istate) +static void fsck_index(struct index_state *istate, const char *index_path, + int is_main_index) { unsigned int i; @@ -816,12 +819,14 @@ static void fsck_index(struct index_state *istate) obj = &blob->object; obj->flags |= USED; fsck_put_object_name(&fsck_walk_options, &obj->oid, - ":%s", istate->cache[i]->name); + "%s:%s", + is_main_index ? "" : index_path, + istate->cache[i]->name); mark_object_reachable(obj); } if (istate->cache_tree) - fsck_cache_tree(istate->cache_tree); - fsck_resolve_undo(istate); + fsck_cache_tree(istate->cache_tree, index_path); + fsck_resolve_undo(istate, index_path); } static void mark_object_for_connectivity(const struct object_id *oid) @@ -993,12 +998,19 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) struct worktree *wt = *p; struct index_state istate = INDEX_STATE_INIT(the_repository); + char *path; - if (read_index_from(&istate, - worktree_git_path(wt, "index"), + /* + * Make a copy since the buffer is reusable + * and may get overwritten by other calls + * while we're examining the index. + */ + path = xstrdup(worktree_git_path(wt, "index")); + if (read_index_from(&istate, path, get_worktree_git_dir(wt)) > 0) - fsck_index(&istate); + fsck_index(&istate, path, wt->is_current); discard_index(&istate); + free(path); } } diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index 3b70ad9e22..bca46378b2 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -1032,7 +1032,25 @@ test_expect_success 'fsck detects problems in worktree index' ' blob=$(git -C wt rev-parse :file) && remove_object $blob && - test_must_fail git fsck + test_must_fail git fsck --name-objects >actual 2>&1 && + cat >expect <<-EOF && + missing blob $blob (.git/worktrees/wt/index:file) + EOF + test_cmp expect actual +' + +test_expect_success 'fsck reports problems in main index without filename' ' + test_when_finished "rm -f .git/index && git read-tree HEAD" && + echo "this object will be removed to break the main index" >file && + git add file && + blob=$(git rev-parse :file) && + remove_object $blob && + + test_must_fail git fsck --name-objects >actual 2>&1 && + cat >expect <<-EOF && + missing blob $blob (:file) + EOF + test_cmp expect actual ' test_done From patchwork Sun Feb 26 22:29:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13152512 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 00C7CC7EE2D for ; Sun, 26 Feb 2023 22:30:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229773AbjBZWak (ORCPT ); Sun, 26 Feb 2023 17:30:40 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47256 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229794AbjBZWac (ORCPT ); Sun, 26 Feb 2023 17:30:32 -0500 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 499F419F09 for ; Sun, 26 Feb 2023 14:29:55 -0800 (PST) Received: (qmail 1928 invoked by uid 109); 26 Feb 2023 22:29:44 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Sun, 26 Feb 2023 22:29:44 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 9560 invoked by uid 111); 26 Feb 2023 22:29:43 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Sun, 26 Feb 2023 17:29:43 -0500 Authentication-Results: peff.net; auth=none Date: Sun, 26 Feb 2023 17:29:43 -0500 From: Jeff King To: Junio C Hamano Cc: Johannes Sixt , Git Mailing List Subject: [PATCH 4/3] fsck: check even zero-entry index files Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Fri, Feb 24, 2023 at 09:30:44AM -0800, Junio C Hamano wrote: > So we had a separate worktree with its index pointing at an object > by its resolve-undo (or cache-tree) extension, but somehow lost that > object to gc (I agree with your assessment that it should no longer > happen since 2017). gc these days knows about looking at the index > of all worktrees, finds the issue, and stops for safety. fsck that > is run in the primary worktree may not have noticed but fsck run > from that worktree would notice the issue. > > Sounds like a frustrating one. > > Thanks, both, for finding and fixing. I saw that this hit next, but I had a few fixups that I had planned to squash in. I saw you got the leak-fix one, but I have one more. Since this is the end of the cycle, we _could_ just squash it in when we rewind next. But having now written it as a patch on top, I think the explanation kind of merits its own commit. -- >8 -- Subject: [PATCH] fsck: check even zero-entry index files In fb64ca526a (fsck: check index files in all worktrees, 2023-02-24), we swapped out a call to vanilla repo_read_index() for a series of read_index_from() calls, one per worktree. The code for the latter was copied from add_index_objects_to_pending(), which checks for a positive return value from the index reading function, and we do the same here in fsck now. But this is probably the wrong thing. I had interpreted the check as "don't operate on the index struct if there was an error". But in reality, if there is an error then the index-reading code will simply die (which admittedly is not great for fsck, but that is not a new problem). The return value here is actually the number of entries read. So it makes sense for add_index_objects_to_pending() to ignore a zero-entry index (there is nothing to add). But for fsck, we would still want to check any extensions, etc (though presumably it is unlikely to have them in an empty index, I don't think it's impossible). So we should ignore the return value from read_index_from() entirely. This matches the behavior before fb64ca526a, when we ignored the return value from repo_read_index(). Signed-off-by: Jeff King --- On top of jk/fsck-indices-in-worktrees. builtin/fsck.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index 1b032eebb1..64614b43b2 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -1007,9 +1007,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) * while we're examining the index. */ path = xstrdup(worktree_git_path(wt, "index")); - if (read_index_from(&istate, path, - get_worktree_git_dir(wt)) > 0) - fsck_index(&istate, path, wt->is_current); + read_index_from(&istate, path, get_worktree_git_dir(wt)); + fsck_index(&istate, path, wt->is_current); discard_index(&istate); free(path); }