mbox series

[v2,0/5] commit-graph: fsck zero/non-zero generation number fixes

Message ID cover.1691773533.git.me@ttaylorr.com (mailing list archive)
Headers show
Series commit-graph: fsck zero/non-zero generation number fixes | expand

Message

Taylor Blau Aug. 11, 2023, 5:05 p.m. UTC
Here's a small reroll of a series that I sent which expanded on a patch
that Peff sent earlier in the thread to remove a section of unreachable
code that was noticed by Coverity in the `verify_one_commit_graph()`
function.

Everything is the same in the first three patches. The fourth patch is
slightly modified to (in addition to flipping the conditional) extract
the mixed zero/non-zero generation number checks out to its own
function.

The fifth patch is new, and avoids repeatedly warning about mixed
generation numbers by treating `generation_zero` as a bitfield.

Thanks as always for your review!

Jeff King (1):
  commit-graph: verify swapped zero/non-zero generation cases

Taylor Blau (4):
  commit-graph: introduce `commit_graph_generation_from_graph()`
  t/t5318-commit-graph.sh: test generation zero transitions during fsck
  commit-graph: invert negated conditional, extract to function
  commit-graph: avoid repeated mixed generation number warnings

 commit-graph.c          | 48 ++++++++++++++++++++++++++++++++---------
 t/t5318-commit-graph.sh | 32 +++++++++++++++++++++------
 2 files changed, 64 insertions(+), 16 deletions(-)

Range-diff against v1:
1:  6ea610f7d2 < -:  ---------- commit-graph: invert negated conditional
-:  ---------- > 1:  701c198e19 commit-graph: introduce `commit_graph_generation_from_graph()`
-:  ---------- > 2:  9b9483893c commit-graph: verify swapped zero/non-zero generation cases
-:  ---------- > 3:  8679db3d0e t/t5318-commit-graph.sh: test generation zero transitions during fsck
-:  ---------- > 4:  32b5d69ebe commit-graph: invert negated conditional, extract to function
-:  ---------- > 5:  b82b15ebc8 commit-graph: avoid repeated mixed generation number warnings

Comments

Jeff King Aug. 11, 2023, 5:58 p.m. UTC | #1
On Fri, Aug 11, 2023 at 01:05:39PM -0400, Taylor Blau wrote:

> Here's a small reroll of a series that I sent which expanded on a patch
> that Peff sent earlier in the thread to remove a section of unreachable
> code that was noticed by Coverity in the `verify_one_commit_graph()`
> function.
> 
> Everything is the same in the first three patches. The fourth patch is
> slightly modified to (in addition to flipping the conditional) extract
> the mixed zero/non-zero generation number checks out to its own
> function.
> 
> The fifth patch is new, and avoids repeatedly warning about mixed
> generation numbers by treating `generation_zero` as a bitfield.

This all looks correct to me. Let me show what I thought the result
might look like, just because I think the result is a bit simpler.  We
may be hitting diminishing returns on refactoring here, though, so if
you're not wildly impressed, I'm happy enough if we go with what you've
written.

This applies on top of yours, but probably would replace patches 2, 4,
and 5 (the flip-flop case isn't even really worth testing after this,
since the message can obviously only be shown once).

 commit-graph.c          | 42 +++++++++--------------------------
 t/t5318-commit-graph.sh | 18 ++-------------
 2 files changed, 13 insertions(+), 47 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 8d924b4509..079bbc8598 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -2560,45 +2560,19 @@ static void graph_report(const char *fmt, ...)
 	va_end(ap);
 }
 
-#define GENERATION_ZERO_EXISTS 1
-#define GENERATION_NUMBER_EXISTS 2
-#define GENERATION_NUMBER_BOTH_EXIST \
-	(GENERATION_ZERO_EXISTS | GENERATION_NUMBER_EXISTS)
-
 static int commit_graph_checksum_valid(struct commit_graph *g)
 {
 	return hashfile_checksum_valid(g->data, g->data_len);
 }
 
-static void verify_mixed_generation_numbers(struct commit_graph *g,
-					    struct commit *graph_commit,
-					    int *generation_zero)
-{
-	if (commit_graph_generation_from_graph(graph_commit)) {
-		if (*generation_zero & GENERATION_ZERO_EXISTS)
-			graph_report(_("commit-graph has non-zero generation "
-				       "number for commit %s, but zero "
-				       "elsewhere"),
-				     oid_to_hex(&graph_commit->object.oid));
-		*generation_zero |= GENERATION_NUMBER_EXISTS;
-	} else {
-		if (*generation_zero & GENERATION_NUMBER_EXISTS)
-			graph_report(_("commit-graph has generation number "
-				       "zero for commit %s, but non-zero "
-				       "elsewhere"),
-				     oid_to_hex(&graph_commit->object.oid));
-		*generation_zero |= GENERATION_ZERO_EXISTS;
-	}
-}
-
 static int verify_one_commit_graph(struct repository *r,
 				   struct commit_graph *g,
 				   struct progress *progress,
 				   uint64_t *seen)
 {
 	uint32_t i, cur_fanout_pos = 0;
 	struct object_id prev_oid, cur_oid;
-	int generation_zero = 0;
+	struct commit *seen_gen_zero = NULL, *seen_gen_non_zero = NULL;
 
 	verify_commit_graph_error = verify_commit_graph_lite(g);
 	if (verify_commit_graph_error)
@@ -2704,11 +2678,12 @@ static int verify_one_commit_graph(struct repository *r,
 			graph_report(_("commit-graph parent list for commit %s terminates early"),
 				     oid_to_hex(&cur_oid));
 
-		if (generation_zero != GENERATION_NUMBER_BOTH_EXIST)
-			verify_mixed_generation_numbers(g, graph_commit,
-							&generation_zero);
+		if (!commit_graph_generation_from_graph(graph_commit))
+			seen_gen_zero = graph_commit;
+		else
+			seen_gen_non_zero = graph_commit;
 
-		if (generation_zero & GENERATION_ZERO_EXISTS)
+		if (seen_gen_zero)
 			continue;
 
 		/*
@@ -2734,6 +2709,11 @@ static int verify_one_commit_graph(struct repository *r,
 				     odb_commit->date);
 	}
 
+	if (seen_gen_zero && seen_gen_non_zero)
+		graph_report(_("commit-graph has both zero and non-zero generations (e.g., commits %s and %s"),
+			     oid_to_hex(&seen_gen_zero->object.oid),
+			     oid_to_hex(&seen_gen_non_zero->object.oid));
+
 	return verify_commit_graph_error;
 }
 
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 2626d41c94..ca5e2c87ae 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -620,26 +620,12 @@ test_expect_success 'detect incorrect chunk count' '
 
 test_expect_success 'detect mixed generation numbers (non-zero to zero)' '
 	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION_LAST "\0\0\0\0" \
-		"but non-zero elsewhere"
+		"both zero and non-zero"
 '
 
 test_expect_success 'detect mixed generation numbers (zero to non-zero)' '
 	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\0\0\0\0" \
-		"but zero elsewhere"
-'
-
-test_expect_success 'detect mixed generation numbers (flip-flop)' '
-	corrupt_graph_setup &&
-	for pos in \
-		$GRAPH_BYTE_COMMIT_GENERATION \
-		$GRAPH_BYTE_COMMIT_GENERATION_LAST
-	do
-		printf "\0\0\0\0" | dd of="full/$objdir/info/commit-graph" bs=1 \
-		seek="$pos" conv=notrunc || return 1
-	done &&
-
-	test_must_fail git -C full commit-graph verify 2>err &&
-	test 1 -eq "$(grep -c "generation number" err)"
+		"both zero and non-zero"
 '
 
 test_expect_success 'git fsck (checks commit-graph when config set to true)' '
Junio C Hamano Aug. 11, 2023, 7:28 p.m. UTC | #2
Jeff King <peff@peff.net> writes:

> This applies on top of yours, but probably would replace patches 2, 4,
> and 5 (the flip-flop case isn't even really worth testing after this,
> since the message can obviously only be shown once).
>
>  commit-graph.c          | 42 +++++++++--------------------------
>  t/t5318-commit-graph.sh | 18 ++-------------
>  2 files changed, 13 insertions(+), 47 deletions(-)

Quite an impressive amount of code reduction.  I obviously like it.

One very minor thing is that how much value are we getting by
reporting the object names of one example from each camp, instead of
just reporting a single bit "we have commits not counted and also
counted their generations, which is an anomaly".

Obviously it does not matter.  Even if we stopped doing so, the code
would not become much simpler.  We'd just use a word with two bits
instead of two pointers to existing in-core objects, which does not
have meaningful performance implications either way.

Thanks.
Jeff King Aug. 17, 2023, 7:51 p.m. UTC | #3
On Fri, Aug 11, 2023 at 12:28:49PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > This applies on top of yours, but probably would replace patches 2, 4,
> > and 5 (the flip-flop case isn't even really worth testing after this,
> > since the message can obviously only be shown once).
> >
> >  commit-graph.c          | 42 +++++++++--------------------------
> >  t/t5318-commit-graph.sh | 18 ++-------------
> >  2 files changed, 13 insertions(+), 47 deletions(-)
> 
> Quite an impressive amount of code reduction.  I obviously like it.
> 
> One very minor thing is that how much value are we getting by
> reporting the object names of one example from each camp, instead of
> just reporting a single bit "we have commits not counted and also
> counted their generations, which is an anomaly".
> 
> Obviously it does not matter.  Even if we stopped doing so, the code
> would not become much simpler.  We'd just use a word with two bits
> instead of two pointers to existing in-core objects, which does not
> have meaningful performance implications either way.

Yeah, I wasn't sure if the commit names were valuable or not. Two bits
would definitely work (though I have a slight preference for two
boolean variables, just because I find the syntax easier to read).

I don't think we've heard from Taylor, but I saw his original patches
were in 'next'. I'm happy to clean up what I posted, but I'm also happy
if we just merge what's in next and move on.

-Peff
Taylor Blau Aug. 21, 2023, 9:25 p.m. UTC | #4
On Thu, Aug 17, 2023 at 03:51:08PM -0400, Jeff King wrote:
> > One very minor thing is that how much value are we getting by
> > reporting the object names of one example from each camp, instead of
> > just reporting a single bit "we have commits not counted and also
> > counted their generations, which is an anomaly".
> >
> > Obviously it does not matter.  Even if we stopped doing so, the code
> > would not become much simpler.  We'd just use a word with two bits
> > instead of two pointers to existing in-core objects, which does not
> > have meaningful performance implications either way.
>
> Yeah, I wasn't sure if the commit names were valuable or not. Two bits
> would definitely work (though I have a slight preference for two
> boolean variables, just because I find the syntax easier to read).

I think having a single example of both a commit with zero and non-zero
generation is marginally useful. I think keeping track of two commit
pointers is more straightforward than the bit-field, and it does not
complicate things too much, so I think it is worth keeping.

> I don't think we've heard from Taylor, but I saw his original patches
> were in 'next'. I'm happy to clean up what I posted, but I'm also happy
> if we just merge what's in next and move on.

Sorry that this fell to the bottom of my queue, which I am just digging
out of now that 2.42.0 has been tagged.

I think that the clean-up you suggested is worthwhile. Let's replace
what we have in 'next' with the reroll that I'm about to submit...

Thanks,
Taylor