Message ID | cover.1691773533.git.me@ttaylorr.com (mailing list archive) |
---|---|
Headers | show |
Series | commit-graph: fsck zero/non-zero generation number fixes | expand |
On Fri, Aug 11, 2023 at 01:05:39PM -0400, Taylor Blau wrote: > Here's a small reroll of a series that I sent which expanded on a patch > that Peff sent earlier in the thread to remove a section of unreachable > code that was noticed by Coverity in the `verify_one_commit_graph()` > function. > > Everything is the same in the first three patches. The fourth patch is > slightly modified to (in addition to flipping the conditional) extract > the mixed zero/non-zero generation number checks out to its own > function. > > The fifth patch is new, and avoids repeatedly warning about mixed > generation numbers by treating `generation_zero` as a bitfield. This all looks correct to me. Let me show what I thought the result might look like, just because I think the result is a bit simpler. We may be hitting diminishing returns on refactoring here, though, so if you're not wildly impressed, I'm happy enough if we go with what you've written. This applies on top of yours, but probably would replace patches 2, 4, and 5 (the flip-flop case isn't even really worth testing after this, since the message can obviously only be shown once). commit-graph.c | 42 +++++++++-------------------------- t/t5318-commit-graph.sh | 18 ++------------- 2 files changed, 13 insertions(+), 47 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 8d924b4509..079bbc8598 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -2560,45 +2560,19 @@ static void graph_report(const char *fmt, ...) va_end(ap); } -#define GENERATION_ZERO_EXISTS 1 -#define GENERATION_NUMBER_EXISTS 2 -#define GENERATION_NUMBER_BOTH_EXIST \ - (GENERATION_ZERO_EXISTS | GENERATION_NUMBER_EXISTS) - static int commit_graph_checksum_valid(struct commit_graph *g) { return hashfile_checksum_valid(g->data, g->data_len); } -static void verify_mixed_generation_numbers(struct commit_graph *g, - struct commit *graph_commit, - int *generation_zero) -{ - if (commit_graph_generation_from_graph(graph_commit)) { - if (*generation_zero & GENERATION_ZERO_EXISTS) - graph_report(_("commit-graph has non-zero generation " - "number for commit %s, but zero " - "elsewhere"), - oid_to_hex(&graph_commit->object.oid)); - *generation_zero |= GENERATION_NUMBER_EXISTS; - } else { - if (*generation_zero & GENERATION_NUMBER_EXISTS) - graph_report(_("commit-graph has generation number " - "zero for commit %s, but non-zero " - "elsewhere"), - oid_to_hex(&graph_commit->object.oid)); - *generation_zero |= GENERATION_ZERO_EXISTS; - } -} - static int verify_one_commit_graph(struct repository *r, struct commit_graph *g, struct progress *progress, uint64_t *seen) { uint32_t i, cur_fanout_pos = 0; struct object_id prev_oid, cur_oid; - int generation_zero = 0; + struct commit *seen_gen_zero = NULL, *seen_gen_non_zero = NULL; verify_commit_graph_error = verify_commit_graph_lite(g); if (verify_commit_graph_error) @@ -2704,11 +2678,12 @@ static int verify_one_commit_graph(struct repository *r, graph_report(_("commit-graph parent list for commit %s terminates early"), oid_to_hex(&cur_oid)); - if (generation_zero != GENERATION_NUMBER_BOTH_EXIST) - verify_mixed_generation_numbers(g, graph_commit, - &generation_zero); + if (!commit_graph_generation_from_graph(graph_commit)) + seen_gen_zero = graph_commit; + else + seen_gen_non_zero = graph_commit; - if (generation_zero & GENERATION_ZERO_EXISTS) + if (seen_gen_zero) continue; /* @@ -2734,6 +2709,11 @@ static int verify_one_commit_graph(struct repository *r, odb_commit->date); } + if (seen_gen_zero && seen_gen_non_zero) + graph_report(_("commit-graph has both zero and non-zero generations (e.g., commits %s and %s"), + oid_to_hex(&seen_gen_zero->object.oid), + oid_to_hex(&seen_gen_non_zero->object.oid)); + return verify_commit_graph_error; } diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 2626d41c94..ca5e2c87ae 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -620,26 +620,12 @@ test_expect_success 'detect incorrect chunk count' ' test_expect_success 'detect mixed generation numbers (non-zero to zero)' ' corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION_LAST "\0\0\0\0" \ - "but non-zero elsewhere" + "both zero and non-zero" ' test_expect_success 'detect mixed generation numbers (zero to non-zero)' ' corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\0\0\0\0" \ - "but zero elsewhere" -' - -test_expect_success 'detect mixed generation numbers (flip-flop)' ' - corrupt_graph_setup && - for pos in \ - $GRAPH_BYTE_COMMIT_GENERATION \ - $GRAPH_BYTE_COMMIT_GENERATION_LAST - do - printf "\0\0\0\0" | dd of="full/$objdir/info/commit-graph" bs=1 \ - seek="$pos" conv=notrunc || return 1 - done && - - test_must_fail git -C full commit-graph verify 2>err && - test 1 -eq "$(grep -c "generation number" err)" + "both zero and non-zero" ' test_expect_success 'git fsck (checks commit-graph when config set to true)' '
Jeff King <peff@peff.net> writes: > This applies on top of yours, but probably would replace patches 2, 4, > and 5 (the flip-flop case isn't even really worth testing after this, > since the message can obviously only be shown once). > > commit-graph.c | 42 +++++++++-------------------------- > t/t5318-commit-graph.sh | 18 ++------------- > 2 files changed, 13 insertions(+), 47 deletions(-) Quite an impressive amount of code reduction. I obviously like it. One very minor thing is that how much value are we getting by reporting the object names of one example from each camp, instead of just reporting a single bit "we have commits not counted and also counted their generations, which is an anomaly". Obviously it does not matter. Even if we stopped doing so, the code would not become much simpler. We'd just use a word with two bits instead of two pointers to existing in-core objects, which does not have meaningful performance implications either way. Thanks.
On Fri, Aug 11, 2023 at 12:28:49PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > This applies on top of yours, but probably would replace patches 2, 4, > > and 5 (the flip-flop case isn't even really worth testing after this, > > since the message can obviously only be shown once). > > > > commit-graph.c | 42 +++++++++-------------------------- > > t/t5318-commit-graph.sh | 18 ++------------- > > 2 files changed, 13 insertions(+), 47 deletions(-) > > Quite an impressive amount of code reduction. I obviously like it. > > One very minor thing is that how much value are we getting by > reporting the object names of one example from each camp, instead of > just reporting a single bit "we have commits not counted and also > counted their generations, which is an anomaly". > > Obviously it does not matter. Even if we stopped doing so, the code > would not become much simpler. We'd just use a word with two bits > instead of two pointers to existing in-core objects, which does not > have meaningful performance implications either way. Yeah, I wasn't sure if the commit names were valuable or not. Two bits would definitely work (though I have a slight preference for two boolean variables, just because I find the syntax easier to read). I don't think we've heard from Taylor, but I saw his original patches were in 'next'. I'm happy to clean up what I posted, but I'm also happy if we just merge what's in next and move on. -Peff
On Thu, Aug 17, 2023 at 03:51:08PM -0400, Jeff King wrote: > > One very minor thing is that how much value are we getting by > > reporting the object names of one example from each camp, instead of > > just reporting a single bit "we have commits not counted and also > > counted their generations, which is an anomaly". > > > > Obviously it does not matter. Even if we stopped doing so, the code > > would not become much simpler. We'd just use a word with two bits > > instead of two pointers to existing in-core objects, which does not > > have meaningful performance implications either way. > > Yeah, I wasn't sure if the commit names were valuable or not. Two bits > would definitely work (though I have a slight preference for two > boolean variables, just because I find the syntax easier to read). I think having a single example of both a commit with zero and non-zero generation is marginally useful. I think keeping track of two commit pointers is more straightforward than the bit-field, and it does not complicate things too much, so I think it is worth keeping. > I don't think we've heard from Taylor, but I saw his original patches > were in 'next'. I'm happy to clean up what I posted, but I'm also happy > if we just merge what's in next and move on. Sorry that this fell to the bottom of my queue, which I am just digging out of now that 2.42.0 has been tagged. I think that the clean-up you suggested is worthwhile. Let's replace what we have in 'next' with the reroll that I'm about to submit... Thanks, Taylor