diff mbox series

[2/4] commit-graph: verify swapped zero/non-zero generation cases

Message ID 9b9483893c072961c5871bd0bae17a7098d73c06.1691699851.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series commit-graph: fsck zero/non-zero generation number fixes | expand

Commit Message

Taylor Blau Aug. 10, 2023, 8:37 p.m. UTC
From: Jeff King <peff@peff.net>

In verify_one_commit_graph(), we have code that complains when a commit
is found with a generation number of zero, and then later with a
non-zero number. It works like this:

  1. When we see an entry with generation zero, we set the
     generation_zero flag to GENERATION_ZERO_EXISTS.

  2. When we later see an entry with a non-zero generation, we complain
     if the flag is GENERATION_ZERO_EXISTS.

There's a matching GENERATION_NUMBER_EXISTS value, which in theory would
be used to find the case that we see the entries in the opposite order:

  1. When we see an entry with a non-zero generation, we set the
     generation_zero flag to GENERATION_NUMBER_EXISTS.

  2. When we later see an entry with a zero generation, we complain if
     the flag is GENERATION_NUMBER_EXISTS.

But that doesn't work; step 2 is implemented, but there is no step 1. We
never use NUMBER_EXISTS at all, and Coverity rightly complains that step
2 is dead code.

We can fix that by implementing that step 1.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Comments

Junio C Hamano Aug. 10, 2023, 9:36 p.m. UTC | #1
Taylor Blau <me@ttaylorr.com> writes:

> diff --git a/commit-graph.c b/commit-graph.c
> index c68f5c6b3a..acca753ce8 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -2686,9 +2686,12 @@ static int verify_one_commit_graph(struct repository *r,
>  				graph_report(_("commit-graph has generation number zero for commit %s, but non-zero elsewhere"),
>  					     oid_to_hex(&cur_oid));
>  			generation_zero = GENERATION_ZERO_EXISTS;
> -		} else if (generation_zero == GENERATION_ZERO_EXISTS)
> -			graph_report(_("commit-graph has non-zero generation number for commit %s, but zero elsewhere"),
> -				     oid_to_hex(&cur_oid));
> +		} else {
> +			if (generation_zero == GENERATION_ZERO_EXISTS)
> +				graph_report(_("commit-graph has non-zero generation number for commit %s, but zero elsewhere"),
> +					     oid_to_hex(&cur_oid));
> +			generation_zero = GENERATION_NUMBER_EXISTS;
> +		}

Hmph, doesn't this potentially cause us to emit the two reports
alternating, if we are unlucky enough to see a commit with 0
generation first (which will silently set gz to ZERO_EXISTS), then
another commit with non-zero generation (which will complain we saw
non-zero for the current one and earlier we saw zero elsewhere, and
then set gz to NUM_EXISTS), and then another commit with 0
generation (which will complain the other way, and set gz back again
to ZERO_EXISTS)?

I am tempted to say this gz business should be done with two bits
(seen zero bit and seen non-zero bit), and immediately after we see
both kinds, we should report once and stop making further reports,
but ...

>  		if (generation_zero == GENERATION_ZERO_EXISTS)
>  			continue;

... as I do not see what this "continue" is doing, I'd stop at
expressing my puzzlement ;-)

Thanks.
Jeff King Aug. 11, 2023, 3:01 p.m. UTC | #2
On Thu, Aug 10, 2023 at 02:36:06PM -0700, Junio C Hamano wrote:

> Taylor Blau <me@ttaylorr.com> writes:
> 
> > diff --git a/commit-graph.c b/commit-graph.c
> > index c68f5c6b3a..acca753ce8 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -2686,9 +2686,12 @@ static int verify_one_commit_graph(struct repository *r,
> >  				graph_report(_("commit-graph has generation number zero for commit %s, but non-zero elsewhere"),
> >  					     oid_to_hex(&cur_oid));
> >  			generation_zero = GENERATION_ZERO_EXISTS;
> > -		} else if (generation_zero == GENERATION_ZERO_EXISTS)
> > -			graph_report(_("commit-graph has non-zero generation number for commit %s, but zero elsewhere"),
> > -				     oid_to_hex(&cur_oid));
> > +		} else {
> > +			if (generation_zero == GENERATION_ZERO_EXISTS)
> > +				graph_report(_("commit-graph has non-zero generation number for commit %s, but zero elsewhere"),
> > +					     oid_to_hex(&cur_oid));
> > +			generation_zero = GENERATION_NUMBER_EXISTS;
> > +		}
> 
> Hmph, doesn't this potentially cause us to emit the two reports
> alternating, if we are unlucky enough to see a commit with 0
> generation first (which will silently set gz to ZERO_EXISTS), then
> another commit with non-zero generation (which will complain we saw
> non-zero for the current one and earlier we saw zero elsewhere, and
> then set gz to NUM_EXISTS), and then another commit with 0
> generation (which will complain the other way, and set gz back again
> to ZERO_EXISTS)?
> 
> I am tempted to say this gz business should be done with two bits
> (seen zero bit and seen non-zero bit), and immediately after we see
> both kinds, we should report once and stop making further reports,
> but ...

Yeah, I think you are right. It might be OK, in the sense that we would
show a different commit each time as we flip-flopped, but it's not clear
to me how valuable that is.

If the actual commit ids are not valuable, then we could just set bits
and then at the end of the loop produce one warning:

  if (seen_zero && seen_non_zero) {
	graph_report("oops, we saw both types");
  }

Certainly that would make the code less confusing to me. :) But I really
don't know if marking the individual commit is useful or not (on the
other hand, it cannot be perfect, since when we see a mismatch we do not
know if it was _this_ commit that is wrong and the previous one is
right, or if the previous one was wrong and this one was right). I guess
we could also save an example of each type and report them (i.e., make
seen_zero and seen_non_zero pointers to commits/oids).

> >  		if (generation_zero == GENERATION_ZERO_EXISTS)
> >  			continue;
> 
> ... as I do not see what this "continue" is doing, I'd stop at
> expressing my puzzlement ;-)

Yeah, I'm not sure on this bit. I had thought at first it was just
trying to avoid the rest of the loop for commits which are 0-generation.
But after Taylor's explanation that this is about whole files with
zero-generations, it makes sense that we would not do the rest of the
loop for any commit (it is already an error to have mixed zero/non-zero
entries, so the file fails verification).

In a "two bits" world, I think this just becomes:

  if (seen_zero)
	continue;

-Peff
Taylor Blau Aug. 11, 2023, 5:08 p.m. UTC | #3
On Fri, Aug 11, 2023 at 11:01:14AM -0400, Jeff King wrote:
> > Hmph, doesn't this potentially cause us to emit the two reports
> > alternating, if we are unlucky enough to see a commit with 0
> > generation first (which will silently set gz to ZERO_EXISTS), then
> > another commit with non-zero generation (which will complain we saw
> > non-zero for the current one and earlier we saw zero elsewhere, and
> > then set gz to NUM_EXISTS), and then another commit with 0
> > generation (which will complain the other way, and set gz back again
> > to ZERO_EXISTS)?
> >
> > I am tempted to say this gz business should be done with two bits
> > (seen zero bit and seen non-zero bit), and immediately after we see
> > both kinds, we should report once and stop making further reports,
> > but ...
>
> Yeah, I think you are right. It might be OK, in the sense that we would
> show a different commit each time as we flip-flopped, but it's not clear
> to me how valuable that is.
>
> If the actual commit ids are not valuable, then we could just set bits
> and then at the end of the loop produce one warning:
>
>   if (seen_zero && seen_non_zero) {
> 	graph_report("oops, we saw both types");
>   }

Thanks, both. I think that this is a very reasonable suggestion and an
improvement on the existing reporting. I made this change and sent the
revised series as a v2 lower down in the thread.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/commit-graph.c b/commit-graph.c
index c68f5c6b3a..acca753ce8 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -2686,9 +2686,12 @@  static int verify_one_commit_graph(struct repository *r,
 				graph_report(_("commit-graph has generation number zero for commit %s, but non-zero elsewhere"),
 					     oid_to_hex(&cur_oid));
 			generation_zero = GENERATION_ZERO_EXISTS;
-		} else if (generation_zero == GENERATION_ZERO_EXISTS)
-			graph_report(_("commit-graph has non-zero generation number for commit %s, but zero elsewhere"),
-				     oid_to_hex(&cur_oid));
+		} else {
+			if (generation_zero == GENERATION_ZERO_EXISTS)
+				graph_report(_("commit-graph has non-zero generation number for commit %s, but zero elsewhere"),
+					     oid_to_hex(&cur_oid));
+			generation_zero = GENERATION_NUMBER_EXISTS;
+		}
 
 		if (generation_zero == GENERATION_ZERO_EXISTS)
 			continue;