Message ID | f8a0a869e8b0882f05cac49d78f49ba3553d3c44.1679904401.git.ps@pks.im (mailing list archive) |
---|---|
State | Accepted |
Commit | d3af1c193d4d62668a9d7ea98e2ef3771ac4e65a |
Headers | show |
Series | commit-graph: fix truncated generation numbers | expand |
Patrick Steinhardt <ps@pks.im> writes: > In 80c928d947 (commit-graph: simplify compute_generation_numbers(), > 2023-03-20), the code to compute generation numbers was simplified to > use the same infrastructure as is used to compute topological levels. > This refactoring introduced a bug where the generation numbers are > truncated when they exceed UINT32_MAX because we explicitly cast the > computed generation number to `uint32_t`. This is not required though: > both the computed value and the field of `struct commit_graph_data` are > of the same type `timestamp_t` already, so casting to `uint32_t` will > cause truncation. > > This cast can cause us to miscompute generation data overflows: > > 1. Given a commit with no parents and committer date > `UINT32_MAX + 1`. > > 2. We compute its generation number as `UINT32_MAX + 1`, but > truncate it to `1`. > > 3. We calculate the generation offset via `$generation - $date`, > which is thus `1 - (UINT32_MAX + 1)`. The computation underflows > and we thus end up with an offset that is bigger than the maximum > allowed offset. > > As a result, we'd be writing generation data overflow information into > the commit-graph that is bogus and ultimately not even required. > > Fix this bug by removing the needless cast. > > Signed-off-by: Patrick Steinhardt <ps@pks.im> > --- > > This commit applies on top of cbfe360b14 (commit-reach: add > tips_reachable_from_bases(), 2023-03-20), which has recently been merged > to next. The patch is clearly explained and the change looks quite straight-forward. Derrick, Ack? Thanks.
On 3/28/23 1:45 PM, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > >> In 80c928d947 (commit-graph: simplify compute_generation_numbers(), >> 2023-03-20), the code to compute generation numbers was simplified to >> use the same infrastructure as is used to compute topological levels. >> This refactoring introduced a bug where the generation numbers are >> truncated when they exceed UINT32_MAX because we explicitly cast the >> computed generation number to `uint32_t`. This is not required though: >> both the computed value and the field of `struct commit_graph_data` are >> of the same type `timestamp_t` already, so casting to `uint32_t` will >> cause truncation. >> >> This cast can cause us to miscompute generation data overflows: >> >> 1. Given a commit with no parents and committer date >> `UINT32_MAX + 1`. >> >> 2. We compute its generation number as `UINT32_MAX + 1`, but >> truncate it to `1`. >> >> 3. We calculate the generation offset via `$generation - $date`, >> which is thus `1 - (UINT32_MAX + 1)`. The computation underflows >> and we thus end up with an offset that is bigger than the maximum >> allowed offset. >> >> As a result, we'd be writing generation data overflow information into >> the commit-graph that is bogus and ultimately not even required. >> >> Fix this bug by removing the needless cast. >> >> Signed-off-by: Patrick Steinhardt <ps@pks.im> >> --- >> >> This commit applies on top of cbfe360b14 (commit-reach: add >> tips_reachable_from_bases(), 2023-03-20), which has recently been merged >> to next. > > The patch is clearly explained and the change looks quite > straight-forward. Derrick, Ack? Yes, looks good. What a silly mistake, but thanks for going the extra mile to introduce a test that will prevent it in the future. Thanks, -Stolee
On Mon, Mar 27, 2023 at 10:08:25AM +0200, Patrick Steinhardt wrote: > In 80c928d947 (commit-graph: simplify compute_generation_numbers(), > 2023-03-20), the code to compute generation numbers was simplified to > use the same infrastructure as is used to compute topological levels. > This refactoring introduced a bug where the generation numbers are > truncated when they exceed UINT32_MAX because we explicitly cast the > computed generation number to `uint32_t`. This is not required though: > both the computed value and the field of `struct commit_graph_data` are > of the same type `timestamp_t` already, so casting to `uint32_t` will > cause truncation. Yes, well spotted and explained. Indeed, the `generation` field of the `commit_graph_data` struct uses our custom `timestamp_t`, so this cast is unnecessary. And it's fine to have a value greater than 2<<32-1 here, since we can represent values that require more than 32-bits of storage. For that we rely on the extended offset table (in this case, GDA2). > This cast can cause us to miscompute generation data overflows: > > 1. Given a commit with no parents and committer date > `UINT32_MAX + 1`. FWIW, I don't think this is the only way to trigger this bug. It would also work if there was a commit C whose maximum parent's generation number is (2<<32-1), in which case C's generation number would be 2<<32, and trigger our overflow here. > This commit applies on top of cbfe360b14 (commit-reach: add > tips_reachable_from_bases(), 2023-03-20), which has recently been merged > to next. This all looks good to me, I'd be happy to see this squashed into that topic on 'next'. Thanks, Taylor
diff --git a/commit-graph.c b/commit-graph.c index 172e679db1..b96509354e 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1565,7 +1565,7 @@ static timestamp_t get_generation_from_graph_data(struct commit *c, void *data) static void set_generation_v2(struct commit *c, timestamp_t t, void *data) { struct commit_graph_data *g = commit_graph_data_at(c); - g->generation = (uint32_t)t; + g->generation = t; } static void compute_generation_numbers(struct write_commit_graph_context *ctx) diff --git a/t/t5328-commit-graph-64bit-time.sh b/t/t5328-commit-graph-64bit-time.sh index 093f0c067a..57e4d9c699 100755 --- a/t/t5328-commit-graph-64bit-time.sh +++ b/t/t5328-commit-graph-64bit-time.sh @@ -63,4 +63,13 @@ test_expect_success 'set up and verify repo with generation data overflow chunk' graph_git_behavior 'overflow 2' repo left right +test_expect_success 'single commit with generation data exceeding UINT32_MAX' ' + git init repo-uint32-max && + cd repo-uint32-max && + test_commit --date "@4294967297 +0000" 1 && + git commit-graph write --reachable && + graph_read_expect 1 "generation_data" && + git commit-graph verify +' + test_done
In 80c928d947 (commit-graph: simplify compute_generation_numbers(), 2023-03-20), the code to compute generation numbers was simplified to use the same infrastructure as is used to compute topological levels. This refactoring introduced a bug where the generation numbers are truncated when they exceed UINT32_MAX because we explicitly cast the computed generation number to `uint32_t`. This is not required though: both the computed value and the field of `struct commit_graph_data` are of the same type `timestamp_t` already, so casting to `uint32_t` will cause truncation. This cast can cause us to miscompute generation data overflows: 1. Given a commit with no parents and committer date `UINT32_MAX + 1`. 2. We compute its generation number as `UINT32_MAX + 1`, but truncate it to `1`. 3. We calculate the generation offset via `$generation - $date`, which is thus `1 - (UINT32_MAX + 1)`. The computation underflows and we thus end up with an offset that is bigger than the maximum allowed offset. As a result, we'd be writing generation data overflow information into the commit-graph that is bogus and ultimately not even required. Fix this bug by removing the needless cast. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- This commit applies on top of cbfe360b14 (commit-reach: add tips_reachable_from_bases(), 2023-03-20), which has recently been merged to next. commit-graph.c | 2 +- t/t5328-commit-graph-64bit-time.sh | 9 +++++++++ 2 files changed, 10 insertions(+), 1 deletion(-)