Message ID | a3a70a1edd0949ff3088fae625afa68fc61975df.1609154169.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Implement Corrected Commit Date | expand |
On 12/28/2020 6:16 AM, Abhishek Kumar via GitGitGadget wrote: > From: Abhishek Kumar <abhishekkumar8222@gmail.com> ... > +static void validate_mixed_generation_chain(struct commit_graph *g) > +{ > + int read_generation_data; > + > + if (!g) > + return; > + > + read_generation_data = !!g->chunk_generation_data; > + > + while (g) { > + g->read_generation_data = read_generation_data; > + g = g->base_graph; > + } > +} > + This method exists to say "use generation v2 if the top layer has it" and that helps with the future layer checks. > @@ -2239,6 +2263,7 @@ int write_commit_graph(struct object_directory *odb, > struct commit_graph *g = ctx->r->objects->commit_graph; > > while (g) { > + g->read_generation_data = 1; > g->topo_levels = &topo_levels; > g = g->base_graph; > } However, here you just turn them on automatically. I think the diff you want is here: struct commit_graph *g = ctx->r->objects->commit_graph; + validate_mixed_generation_chain(g); + while (g) { g->topo_levels = &topo_levels; g = g->base_graph; } But maybe you have a good reason for what you already have. I paid attention to this because I hit a problem in my local testing. After trying to reproduce it, I think the root cause is that I had a commit-graph that was written by an older version of your series, so it caused an unexpected pairing of an "offset required" bit but no offset chunk. Perhaps this diff is required in the proper place to avoid the segfault I hit, in the case of a malformed commit-graph file: diff --git a/commit-graph.c b/commit-graph.c index c8d7ed1330..d264c90868 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -822,6 +822,9 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, offset = (timestamp_t)get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index); if (offset & CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW) { + if (!g->chunk_generation_data_overflow) + die(_("commit-graph requires overflow generation data but has none")); + offset_pos = offset ^ CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW; graph_data->generation = get_be64(g->chunk_generation_data_overflow + 8 * offset_pos); } else Your tests in this patch seem very thorough, covering all the cases I could think to create this strange situation. I even tried creating cases where the overflow would be necessary. The following test actually fails on the "graph_read_expect 6" due to the extra chunk, not the 'write' process I was trying to trick into failure. diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 8e90f3423b..cfef8e52b9 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -453,6 +453,20 @@ test_expect_success 'prevent regression for duplicate commits across layers' ' git -C dup commit-graph verify ' +test_expect_success 'upgrade to generation data succeeds when there was none' ' + ( + cd dup && + rm -rf .git/objects/info/commit-graph* && + GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph \ + write --reachable && + GIT_COMMITTER_DATE="1980-01-01 00:00" git commit --allow-empty -m one && + GIT_COMMITTER_DATE="2090-01-01 00:00" git commit --allow-empty -m two && + GIT_COMMITTER_DATE="2000-01-01 00:00" git commit --allow-empty -m three && + git commit-graph write --reachable && + graph_read_expect 6 + ) +' + NUM_FIRST_LAYER_COMMITS=64 NUM_SECOND_LAYER_COMMITS=16 NUM_THIRD_LAYER_COMMITS=7 Thanks, -Stolee
On Tue, Dec 29, 2020 at 10:23:54PM -0500, Derrick Stolee wrote: > On 12/28/2020 6:16 AM, Abhishek Kumar via GitGitGadget wrote: > > From: Abhishek Kumar <abhishekkumar8222@gmail.com> > > ... > > > +static void validate_mixed_generation_chain(struct commit_graph *g) > > +{ > > + int read_generation_data; > > + > > + if (!g) > > + return; > > + > > + read_generation_data = !!g->chunk_generation_data; > > + > > + while (g) { > > + g->read_generation_data = read_generation_data; > > + g = g->base_graph; > > + } > > +} > > + > > This method exists to say "use generation v2 if the top layer has it" > and that helps with the future layer checks. > > > @@ -2239,6 +2263,7 @@ int write_commit_graph(struct object_directory *odb, > > struct commit_graph *g = ctx->r->objects->commit_graph; > > > > while (g) { > > + g->read_generation_data = 1; > > g->topo_levels = &topo_levels; > > g = g->base_graph; > > } > > However, here you just turn them on automatically. > > I think the diff you want is here: > > struct commit_graph *g = ctx->r->objects->commit_graph; > > + validate_mixed_generation_chain(g); > + > while (g) { > g->topo_levels = &topo_levels; > g = g->base_graph; > } > > But maybe you have a good reason for what you already have. > Thanks, that was an oversight. My (incorrect) reasoning at the time was: Since we are computing both topological levels and corrected commit dates, we can read corrected commit dates from layers with a GDAT chunk hidden below non-GDAT layer. But we end up storing both corrected commit date offsets (for a layers with GDAT chunk) and topological level (for layers without GDAT chunk) in the same slab with no way to distinguish between the two! > I paid attention to this because I hit a problem in my local testing. > After trying to reproduce it, I think the root cause is that I had a > commit-graph that was written by an older version of your series, so > it caused an unexpected pairing of an "offset required" bit but no > offset chunk. > > Perhaps this diff is required in the proper place to avoid the > segfault I hit, in the case of a malformed commit-graph file: > > diff --git a/commit-graph.c b/commit-graph.c > index c8d7ed1330..d264c90868 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -822,6 +822,9 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, > offset = (timestamp_t)get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index); > > if (offset & CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW) { > + if (!g->chunk_generation_data_overflow) > + die(_("commit-graph requires overflow generation data but has none")); > + > offset_pos = offset ^ CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW; > graph_data->generation = get_be64(g->chunk_generation_data_overflow + 8 * offset_pos); > } else > > Your tests in this patch seem very thorough, covering all the cases > I could think to create this strange situation. I even tried creating > cases where the overflow would be necessary. The following test actually > fails on the "graph_read_expect 6" due to the extra chunk, not the 'write' > process I was trying to trick into failure. > > diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh > index 8e90f3423b..cfef8e52b9 100755 > --- a/t/t5324-split-commit-graph.sh > +++ b/t/t5324-split-commit-graph.sh > @@ -453,6 +453,20 @@ test_expect_success 'prevent regression for duplicate commits across layers' ' > git -C dup commit-graph verify > ' > > +test_expect_success 'upgrade to generation data succeeds when there was none' ' > + ( > + cd dup && > + rm -rf .git/objects/info/commit-graph* && > + GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph \ > + write --reachable && > + GIT_COMMITTER_DATE="1980-01-01 00:00" git commit --allow-empty -m one && > + GIT_COMMITTER_DATE="2090-01-01 00:00" git commit --allow-empty -m two && > + GIT_COMMITTER_DATE="2000-01-01 00:00" git commit --allow-empty -m three && > + git commit-graph write --reachable && > + graph_read_expect 6 > + ) > +' I am not sure what this test adds over the existing generation data overflow related tests added in t5318-commit-graph.sh > + > NUM_FIRST_LAYER_COMMITS=64 > NUM_SECOND_LAYER_COMMITS=16 > NUM_THIRD_LAYER_COMMITS=7 > > Thanks, > -Stolee Thanks - Abhishek
On 1/10/2021 8:13 AM, Abhishek Kumar wrote: > On Tue, Dec 29, 2020 at 10:23:54PM -0500, Derrick Stolee wrote: >> Your tests in this patch seem very thorough, covering all the cases >> I could think to create this strange situation. I even tried creating >> cases where the overflow would be necessary. The following test actually >> fails on the "graph_read_expect 6" due to the extra chunk, not the 'write' >> process I was trying to trick into failure. >> >> diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh >> index 8e90f3423b..cfef8e52b9 100755 >> --- a/t/t5324-split-commit-graph.sh >> +++ b/t/t5324-split-commit-graph.sh >> @@ -453,6 +453,20 @@ test_expect_success 'prevent regression for duplicate commits across layers' ' >> git -C dup commit-graph verify >> ' >> >> +test_expect_success 'upgrade to generation data succeeds when there was none' ' >> + ( >> + cd dup && >> + rm -rf .git/objects/info/commit-graph* && >> + GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph \ >> + write --reachable && >> + GIT_COMMITTER_DATE="1980-01-01 00:00" git commit --allow-empty -m one && >> + GIT_COMMITTER_DATE="2090-01-01 00:00" git commit --allow-empty -m two && >> + GIT_COMMITTER_DATE="2000-01-01 00:00" git commit --allow-empty -m three && >> + git commit-graph write --reachable && >> + graph_read_expect 6 >> + ) >> +' > > I am not sure what this test adds over the existing generation data > overflow related tests added in t5318-commit-graph.sh Good point. -Stolee
diff --git a/commit-graph.c b/commit-graph.c index 629b2f17fbc..41a65d98738 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -610,6 +610,21 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, return graph_chain; } +static void validate_mixed_generation_chain(struct commit_graph *g) +{ + int read_generation_data; + + if (!g) + return; + + read_generation_data = !!g->chunk_generation_data; + + while (g) { + g->read_generation_data = read_generation_data; + g = g->base_graph; + } +} + struct commit_graph *read_commit_graph_one(struct repository *r, struct object_directory *odb) { @@ -618,6 +633,8 @@ struct commit_graph *read_commit_graph_one(struct repository *r, if (!g) g = load_commit_graph_chain(r, odb); + validate_mixed_generation_chain(g); + return g; } @@ -787,7 +804,7 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, date_low = get_be32(commit_data + g->hash_len + 12); item->date = (timestamp_t)((date_high << 32) | date_low); - if (g->chunk_generation_data) { + if (g->read_generation_data) { offset = (timestamp_t)get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index); if (offset & CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW) { @@ -2012,6 +2029,13 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx) if (i < ctx->num_commit_graphs_after) ctx->commit_graph_hash_after[i] = xstrdup(oid_to_hex(&g->oid)); + /* + * If the topmost remaining layer has generation data chunk, the + * resultant layer also has generation data chunk. + */ + if (i == ctx->num_commit_graphs_after - 2) + ctx->write_generation_data = !!g->chunk_generation_data; + i--; g = g->base_graph; } @@ -2239,6 +2263,7 @@ int write_commit_graph(struct object_directory *odb, struct commit_graph *g = ctx->r->objects->commit_graph; while (g) { + g->read_generation_data = 1; g->topo_levels = &topo_levels; g = g->base_graph; } @@ -2534,7 +2559,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) * also GENERATION_NUMBER_V1_MAX. Decrement to avoid extra logic * in the following condition. */ - if (!g->chunk_generation_data && max_generation == GENERATION_NUMBER_V1_MAX) + if (!g->read_generation_data && max_generation == GENERATION_NUMBER_V1_MAX) max_generation--; generation = commit_graph_generation(graph_commit); diff --git a/commit-graph.h b/commit-graph.h index 19a02001fde..ad52130883b 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -64,6 +64,7 @@ struct commit_graph { struct object_directory *odb; uint32_t num_commits_in_base; + unsigned int read_generation_data; struct commit_graph *base_graph; const uint32_t *chunk_oid_fanout; diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 587757b62d9..8e90f3423b8 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -453,4 +453,185 @@ test_expect_success 'prevent regression for duplicate commits across layers' ' git -C dup commit-graph verify ' +NUM_FIRST_LAYER_COMMITS=64 +NUM_SECOND_LAYER_COMMITS=16 +NUM_THIRD_LAYER_COMMITS=7 +NUM_FOURTH_LAYER_COMMITS=8 +NUM_FIFTH_LAYER_COMMITS=16 +SECOND_LAYER_SEQUENCE_START=$(($NUM_FIRST_LAYER_COMMITS + 1)) +SECOND_LAYER_SEQUENCE_END=$(($SECOND_LAYER_SEQUENCE_START + $NUM_SECOND_LAYER_COMMITS - 1)) +THIRD_LAYER_SEQUENCE_START=$(($SECOND_LAYER_SEQUENCE_END + 1)) +THIRD_LAYER_SEQUENCE_END=$(($THIRD_LAYER_SEQUENCE_START + $NUM_THIRD_LAYER_COMMITS - 1)) +FOURTH_LAYER_SEQUENCE_START=$(($THIRD_LAYER_SEQUENCE_END + 1)) +FOURTH_LAYER_SEQUENCE_END=$(($FOURTH_LAYER_SEQUENCE_START + $NUM_FOURTH_LAYER_COMMITS - 1)) +FIFTH_LAYER_SEQUENCE_START=$(($FOURTH_LAYER_SEQUENCE_END + 1)) +FIFTH_LAYER_SEQUENCE_END=$(($FIFTH_LAYER_SEQUENCE_START + $NUM_FIFTH_LAYER_COMMITS - 1)) + +# Current split graph chain: +# +# 16 commits (No GDAT) +# ------------------------ +# 64 commits (GDAT) +# +test_expect_success 'setup repo for mixed generation commit-graph-chain' ' + graphdir=".git/objects/info/commit-graphs" && + test_oid_cache <<-EOF && + oid_version sha1:1 + oid_version sha256:2 + EOF + git init mixed && + ( + cd mixed && + git config core.commitGraph true && + git config gc.writeCommitGraph false && + for i in $(test_seq $NUM_FIRST_LAYER_COMMITS) + do + test_commit $i && + git branch commits/$i || return 1 + done && + git commit-graph write --reachable --split && + graph_read_expect $NUM_FIRST_LAYER_COMMITS && + test_line_count = 1 $graphdir/commit-graph-chain && + for i in $(test_seq $SECOND_LAYER_SEQUENCE_START $SECOND_LAYER_SEQUENCE_END) + do + test_commit $i && + git branch commits/$i || return 1 + done && + GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph write --reachable --split=no-merge && + test_line_count = 2 $graphdir/commit-graph-chain && + test-tool read-graph >output && + cat >expect <<-EOF && + header: 43475048 1 $(test_oid oid_version) 4 1 + num_commits: $NUM_SECOND_LAYER_COMMITS + chunks: oid_fanout oid_lookup commit_metadata + EOF + test_cmp expect output && + git commit-graph verify && + cat $graphdir/commit-graph-chain + ) +' + +# The new layer will be added without generation data chunk as it was not +# present on the layer underneath it. +# +# 7 commits (No GDAT) +# ------------------------ +# 16 commits (No GDAT) +# ------------------------ +# 64 commits (GDAT) +# +test_expect_success 'do not write generation data chunk if not present on existing tip' ' + git clone mixed mixed-no-gdat && + ( + cd mixed-no-gdat && + for i in $(test_seq $THIRD_LAYER_SEQUENCE_START $THIRD_LAYER_SEQUENCE_END) + do + test_commit $i && + git branch commits/$i || return 1 + done && + git commit-graph write --reachable --split=no-merge && + test_line_count = 3 $graphdir/commit-graph-chain && + test-tool read-graph >output && + cat >expect <<-EOF && + header: 43475048 1 $(test_oid oid_version) 4 2 + num_commits: $NUM_THIRD_LAYER_COMMITS + chunks: oid_fanout oid_lookup commit_metadata + EOF + test_cmp expect output && + git commit-graph verify + ) +' + +# Number of commits in each layer of the split-commit graph before merge: +# +# 8 commits (No GDAT) +# ------------------------ +# 7 commits (No GDAT) +# ------------------------ +# 16 commits (No GDAT) +# ------------------------ +# 64 commits (GDAT) +# +# The top two layers are merged and do not have generation data chunk as layer below them does +# not have generation data chunk. +# +# 15 commits (No GDAT) +# ------------------------ +# 16 commits (No GDAT) +# ------------------------ +# 64 commits (GDAT) +# +test_expect_success 'do not write generation data chunk if the topmost remaining layer does not have generation data chunk' ' + git clone mixed-no-gdat mixed-merge-no-gdat && + ( + cd mixed-merge-no-gdat && + for i in $(test_seq $FOURTH_LAYER_SEQUENCE_START $FOURTH_LAYER_SEQUENCE_END) + do + test_commit $i && + git branch commits/$i || return 1 + done && + git commit-graph write --reachable --split --size-multiple 1 && + test_line_count = 3 $graphdir/commit-graph-chain && + test-tool read-graph >output && + cat >expect <<-EOF && + header: 43475048 1 $(test_oid oid_version) 4 2 + num_commits: $(($NUM_THIRD_LAYER_COMMITS + $NUM_FOURTH_LAYER_COMMITS)) + chunks: oid_fanout oid_lookup commit_metadata + EOF + test_cmp expect output && + git commit-graph verify + ) +' + +# Number of commits in each layer of the split-commit graph before merge: +# +# 16 commits (No GDAT) +# ------------------------ +# 15 commits (No GDAT) +# ------------------------ +# 16 commits (No GDAT) +# ------------------------ +# 64 commits (GDAT) +# +# The top three layers are merged and has generation data chunk as the topmost remaining layer +# has generation data chunk. +# +# 47 commits (GDAT) +# ------------------------ +# 64 commits (GDAT) +# +test_expect_success 'write generation data chunk if topmost remaining layer has generation data chunk' ' + git clone mixed-merge-no-gdat mixed-merge-gdat && + ( + cd mixed-merge-gdat && + for i in $(test_seq $FIFTH_LAYER_SEQUENCE_START $FIFTH_LAYER_SEQUENCE_END) + do + test_commit $i && + git branch commits/$i || return 1 + done && + git commit-graph write --reachable --split --size-multiple 1 && + test_line_count = 2 $graphdir/commit-graph-chain && + test-tool read-graph >output && + cat >expect <<-EOF && + header: 43475048 1 $(test_oid oid_version) 5 1 + num_commits: $(($NUM_SECOND_LAYER_COMMITS + $NUM_THIRD_LAYER_COMMITS + $NUM_FOURTH_LAYER_COMMITS + $NUM_FIFTH_LAYER_COMMITS)) + chunks: oid_fanout oid_lookup commit_metadata generation_data + EOF + test_cmp expect output + ) +' + +test_expect_success 'write generation data chunk when commit-graph chain is replaced' ' + git clone mixed mixed-replace && + ( + cd mixed-replace && + git commit-graph write --reachable --split=replace && + test_path_is_file $graphdir/commit-graph-chain && + test_line_count = 1 $graphdir/commit-graph-chain && + verify_chain_files_exist $graphdir && + graph_read_expect $(($NUM_FIRST_LAYER_COMMITS + $NUM_SECOND_LAYER_COMMITS)) && + git commit-graph verify + ) +' + test_done