diff mbox series

[v2,2/4] commit-graph: fix ordering bug in generation numbers

Message ID cbcbf10e6994d469abc5463d08165e6a2ba73153.1646056423.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Commit-graph: Generation Number v2 Fixes | expand

Commit Message

Derrick Stolee Feb. 28, 2022, 1:53 p.m. UTC
From: Derrick Stolee <derrickstolee@github.com>

When computing the generation numbers for a commit-graph, we compute
the corrected commit dates and then check if their offsets from the
actual dates is too large to fit in the 32-bit Generation Data chunk.
However, there is a problem with this approach: if we have parsed the
generation data from the previous commit-graph, then we continue the
loop because the corrected commit date is already computed. This causes
an under-count in the number of overflow values.

It is incorrect to add an increment to num_generation_data_overflows
next to this 'continue' statement, because we might start
double-counting commits that are computed because of the depth-first
search walk from a commit with an earlier OID.

Instead, iterate over the full commit list at the end, checking the
offsets to see how many grow beyond the maximum value.

Update a test in t5318 to use a larger time value, which will help
demonstrate this bug in more cases. It still won't hit all potential
cases until the next change, which reenables reading generation numbers.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-graph.c          | 10 +++++++---
 t/t5318-commit-graph.sh |  4 ++--
 2 files changed, 9 insertions(+), 5 deletions(-)

Comments

Ævar Arnfjörð Bjarmason Feb. 28, 2022, 3:25 p.m. UTC | #1
On Mon, Feb 28 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
> [...]
> diff --git a/commit-graph.c b/commit-graph.c
> index 265c010122e..a19bd96c2ee 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -1556,12 +1556,16 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>  				if (current->date && current->date > max_corrected_commit_date)
>  					max_corrected_commit_date = current->date - 1;
>  				commit_graph_data_at(current)->generation = max_corrected_commit_date + 1;
> -
> -				if (commit_graph_data_at(current)->generation - current->date > GENERATION_NUMBER_V2_OFFSET_MAX)
> -					ctx->num_generation_data_overflows++;
>  			}
>  		}
>  	}
> +
> +	for (i = 0; i < ctx->commits.nr; i++) {
> +		struct commit *c = ctx->commits.list[i];
> +		timestamp_t offset = commit_graph_data_at(c)->generation - c->date;
> +		if (offset > GENERATION_NUMBER_V2_OFFSET_MAX)
> +			ctx->num_generation_data_overflows++;
> +	}
>  	stop_progress(&ctx->progress);
>  }
>  
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index 2b05026cf6d..f9bffe38013 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -467,10 +467,10 @@ test_expect_success 'warn on improper hash version' '
>  	)
>  '
>  
> -test_expect_success 'lower layers have overflow chunk' '
> +test_expect_success TIME_IS_64BIT,TIME_T_IS_64BIT 'lower layers have overflow chunk' '
>  	cd "$TRASH_DIRECTORY/full" &&
>  	UNIX_EPOCH_ZERO="@0 +0000" &&
> -	FUTURE_DATE="@2147483646 +0000" &&
> +	FUTURE_DATE="@4147483646 +0000" &&
>  	rm -f .git/objects/info/commit-graph &&
>  	test_commit --date "$FUTURE_DATE" future-1 &&
>  	test_commit --date "$UNIX_EPOCH_ZERO" old-1 &&

Isn't it worth splitting up this test instead, so that we can test cases
where 32 bit timestamps overflow without these new prereqs.

Unless I'm missing something that would just be a matter of splitting
this test into helper that takes that $FUTURE_DATE as an argument, then
running it for both timestamps, with TIME_IS_64BIT,TIME_T_IS_64BIT on
the 64 bit one.

Or maybe I don't get it, but it seems like we're throwing out some
carefully considered testing for 32 bit compatibility with the
proverbial bath water here....

Aside from that I wonder how this interacts with both:

    #define CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW (1ULL << 31)

And this existing code, where offset is timestamp_t, but
num_generation_data_overflows is an "int":

    offset = CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW | num_generation_data_overflows;

That proooobably does the right thing if int is say 32 bit, but
timestamp_t is 64 bit (does such an OS exist?), but maybe worth looking
at with a second pair of eyes...
diff mbox series

Patch

diff --git a/commit-graph.c b/commit-graph.c
index 265c010122e..a19bd96c2ee 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1556,12 +1556,16 @@  static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 				if (current->date && current->date > max_corrected_commit_date)
 					max_corrected_commit_date = current->date - 1;
 				commit_graph_data_at(current)->generation = max_corrected_commit_date + 1;
-
-				if (commit_graph_data_at(current)->generation - current->date > GENERATION_NUMBER_V2_OFFSET_MAX)
-					ctx->num_generation_data_overflows++;
 			}
 		}
 	}
+
+	for (i = 0; i < ctx->commits.nr; i++) {
+		struct commit *c = ctx->commits.list[i];
+		timestamp_t offset = commit_graph_data_at(c)->generation - c->date;
+		if (offset > GENERATION_NUMBER_V2_OFFSET_MAX)
+			ctx->num_generation_data_overflows++;
+	}
 	stop_progress(&ctx->progress);
 }
 
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 2b05026cf6d..f9bffe38013 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -467,10 +467,10 @@  test_expect_success 'warn on improper hash version' '
 	)
 '
 
-test_expect_success 'lower layers have overflow chunk' '
+test_expect_success TIME_IS_64BIT,TIME_T_IS_64BIT 'lower layers have overflow chunk' '
 	cd "$TRASH_DIRECTORY/full" &&
 	UNIX_EPOCH_ZERO="@0 +0000" &&
-	FUTURE_DATE="@2147483646 +0000" &&
+	FUTURE_DATE="@4147483646 +0000" &&
 	rm -f .git/objects/info/commit-graph &&
 	test_commit --date "$FUTURE_DATE" future-1 &&
 	test_commit --date "$UNIX_EPOCH_ZERO" old-1 &&