diff mbox series

commit-graph: fix truncated generation numbers

Message ID f8a0a869e8b0882f05cac49d78f49ba3553d3c44.1679904401.git.ps@pks.im (mailing list archive)
State Accepted
Commit d3af1c193d4d62668a9d7ea98e2ef3771ac4e65a
Headers show
Series commit-graph: fix truncated generation numbers | expand

Commit Message

Patrick Steinhardt March 27, 2023, 8:08 a.m. UTC
In 80c928d947 (commit-graph: simplify compute_generation_numbers(),
2023-03-20), the code to compute generation numbers was simplified to
use the same infrastructure as is used to compute topological levels.
This refactoring introduced a bug where the generation numbers are
truncated when they exceed UINT32_MAX because we explicitly cast the
computed generation number to `uint32_t`. This is not required though:
both the computed value and the field of `struct commit_graph_data` are
of the same type `timestamp_t` already, so casting to `uint32_t` will
cause truncation.

This cast can cause us to miscompute generation data overflows:

    1. Given a commit with no parents and committer date
       `UINT32_MAX + 1`.

    2. We compute its generation number as `UINT32_MAX + 1`, but
       truncate it to `1`.

    3. We calculate the generation offset via `$generation - $date`,
       which is thus `1 - (UINT32_MAX + 1)`. The computation underflows
       and we thus end up with an offset that is bigger than the maximum
       allowed offset.

As a result, we'd be writing generation data overflow information into
the commit-graph that is bogus and ultimately not even required.

Fix this bug by removing the needless cast.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---

This commit applies on top of cbfe360b14 (commit-reach: add
tips_reachable_from_bases(), 2023-03-20), which has recently been merged
to next.

 commit-graph.c                     | 2 +-
 t/t5328-commit-graph-64bit-time.sh | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

Comments

Junio C Hamano March 28, 2023, 5:45 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> In 80c928d947 (commit-graph: simplify compute_generation_numbers(),
> 2023-03-20), the code to compute generation numbers was simplified to
> use the same infrastructure as is used to compute topological levels.
> This refactoring introduced a bug where the generation numbers are
> truncated when they exceed UINT32_MAX because we explicitly cast the
> computed generation number to `uint32_t`. This is not required though:
> both the computed value and the field of `struct commit_graph_data` are
> of the same type `timestamp_t` already, so casting to `uint32_t` will
> cause truncation.
>
> This cast can cause us to miscompute generation data overflows:
>
>     1. Given a commit with no parents and committer date
>        `UINT32_MAX + 1`.
>
>     2. We compute its generation number as `UINT32_MAX + 1`, but
>        truncate it to `1`.
>
>     3. We calculate the generation offset via `$generation - $date`,
>        which is thus `1 - (UINT32_MAX + 1)`. The computation underflows
>        and we thus end up with an offset that is bigger than the maximum
>        allowed offset.
>
> As a result, we'd be writing generation data overflow information into
> the commit-graph that is bogus and ultimately not even required.
>
> Fix this bug by removing the needless cast.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>
> This commit applies on top of cbfe360b14 (commit-reach: add
> tips_reachable_from_bases(), 2023-03-20), which has recently been merged
> to next.

The patch is clearly explained and the change looks quite
straight-forward.  Derrick, Ack?

Thanks.
Derrick Stolee March 28, 2023, 6:32 p.m. UTC | #2
On 3/28/23 1:45 PM, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
>> In 80c928d947 (commit-graph: simplify compute_generation_numbers(),
>> 2023-03-20), the code to compute generation numbers was simplified to
>> use the same infrastructure as is used to compute topological levels.
>> This refactoring introduced a bug where the generation numbers are
>> truncated when they exceed UINT32_MAX because we explicitly cast the
>> computed generation number to `uint32_t`. This is not required though:
>> both the computed value and the field of `struct commit_graph_data` are
>> of the same type `timestamp_t` already, so casting to `uint32_t` will
>> cause truncation.
>>
>> This cast can cause us to miscompute generation data overflows:
>>
>>     1. Given a commit with no parents and committer date
>>        `UINT32_MAX + 1`.
>>
>>     2. We compute its generation number as `UINT32_MAX + 1`, but
>>        truncate it to `1`.
>>
>>     3. We calculate the generation offset via `$generation - $date`,
>>        which is thus `1 - (UINT32_MAX + 1)`. The computation underflows
>>        and we thus end up with an offset that is bigger than the maximum
>>        allowed offset.
>>
>> As a result, we'd be writing generation data overflow information into
>> the commit-graph that is bogus and ultimately not even required.
>>
>> Fix this bug by removing the needless cast.
>>
>> Signed-off-by: Patrick Steinhardt <ps@pks.im>
>> ---
>>
>> This commit applies on top of cbfe360b14 (commit-reach: add
>> tips_reachable_from_bases(), 2023-03-20), which has recently been merged
>> to next.
> 
> The patch is clearly explained and the change looks quite
> straight-forward.  Derrick, Ack?

Yes, looks good. What a silly mistake, but thanks for going
the extra mile to introduce a test that will prevent it in
the future.

Thanks,
-Stolee
Taylor Blau March 28, 2023, 6:39 p.m. UTC | #3
On Mon, Mar 27, 2023 at 10:08:25AM +0200, Patrick Steinhardt wrote:
> In 80c928d947 (commit-graph: simplify compute_generation_numbers(),
> 2023-03-20), the code to compute generation numbers was simplified to
> use the same infrastructure as is used to compute topological levels.
> This refactoring introduced a bug where the generation numbers are
> truncated when they exceed UINT32_MAX because we explicitly cast the
> computed generation number to `uint32_t`. This is not required though:
> both the computed value and the field of `struct commit_graph_data` are
> of the same type `timestamp_t` already, so casting to `uint32_t` will
> cause truncation.

Yes, well spotted and explained. Indeed, the `generation` field of the
`commit_graph_data` struct uses our custom `timestamp_t`, so this cast
is unnecessary.

And it's fine to have a value greater than 2<<32-1 here, since we can
represent values that require more than 32-bits of storage. For that we
rely on the extended offset table (in this case, GDA2).

> This cast can cause us to miscompute generation data overflows:
>
>     1. Given a commit with no parents and committer date
>        `UINT32_MAX + 1`.

FWIW, I don't think this is the only way to trigger this bug. It would
also work if there was a commit C whose maximum parent's generation
number is (2<<32-1), in which case C's generation number would be 2<<32,
and trigger our overflow here.

> This commit applies on top of cbfe360b14 (commit-reach: add
> tips_reachable_from_bases(), 2023-03-20), which has recently been merged
> to next.

This all looks good to me, I'd be happy to see this squashed into that
topic on 'next'.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/commit-graph.c b/commit-graph.c
index 172e679db1..b96509354e 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1565,7 +1565,7 @@  static timestamp_t get_generation_from_graph_data(struct commit *c, void *data)
 static void set_generation_v2(struct commit *c, timestamp_t t, void *data)
 {
 	struct commit_graph_data *g = commit_graph_data_at(c);
-	g->generation = (uint32_t)t;
+	g->generation = t;
 }
 
 static void compute_generation_numbers(struct write_commit_graph_context *ctx)
diff --git a/t/t5328-commit-graph-64bit-time.sh b/t/t5328-commit-graph-64bit-time.sh
index 093f0c067a..57e4d9c699 100755
--- a/t/t5328-commit-graph-64bit-time.sh
+++ b/t/t5328-commit-graph-64bit-time.sh
@@ -63,4 +63,13 @@  test_expect_success 'set up and verify repo with generation data overflow chunk'
 
 graph_git_behavior 'overflow 2' repo left right
 
+test_expect_success 'single commit with generation data exceeding UINT32_MAX' '
+	git init repo-uint32-max &&
+	cd repo-uint32-max &&
+	test_commit --date "@4294967297 +0000" 1 &&
+	git commit-graph write --reachable &&
+	graph_read_expect 1 "generation_data" &&
+	git commit-graph verify
+'
+
 test_done