diff mbox series

[v2,05/10] commit-graph: implement generation data chunk

Message ID cb797e20d79e9dcd3e0b953e0db3ed1defb9aa7c.1596941625.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Implement Corrected Commit Date | expand

Commit Message

Johannes Schindelin via GitGitGadget Aug. 9, 2020, 2:53 a.m. UTC
From: Abhishek Kumar <abhishekkumar8222@gmail.com>

As discovered by Ævar, we cannot increment graph version to
distinguish between generation numbers v1 and v2 [1]. Thus, one of
pre-requistes before implementing generation number was to distinguish
between graph versions in a backwards compatible manner.

We are going to introduce a new chunk called Generation Data chunk (or
GDAT). GDAT stores generation number v2 (and any subsequent versions),
whereas CDAT will still store topological level.

Old Git does not understand GDAT chunk and would ignore it, reading
topological levels from CDAT. New Git can parse GDAT and take advantage
of newer generation numbers, falling back to topological levels when
GDAT chunk is missing (as it would happen with a commit graph written
by old Git).

We introduce a test environment variable 'GIT_TEST_COMMIT_GRAPH_NO_GDAT'
which forces commit-graph file to be written without generation data
chunk to emulate a commit-graph file written by old Git.

[1]: https://lore.kernel.org/git/87a7gdspo4.fsf@evledraar.gmail.com/

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
---
 commit-graph.c                | 40 +++++++++++++++++++---
 commit-graph.h                |  2 ++
 t/README                      |  3 ++
 t/helper/test-read-graph.c    |  2 ++
 t/t4216-log-bloom.sh          |  4 +--
 t/t5318-commit-graph.sh       | 27 +++++++--------
 t/t5324-split-commit-graph.sh | 12 +++----
 t/t6600-test-reach.sh         | 62 +++++++++++++++++++----------------
 8 files changed, 99 insertions(+), 53 deletions(-)

Comments

Derrick Stolee Aug. 10, 2020, 4:28 p.m. UTC | #1
On 8/8/2020 10:53 PM, Abhishek Kumar via GitGitGadget wrote:
> From: Abhishek Kumar <abhishekkumar8222@gmail.com>
> 
> As discovered by Ævar, we cannot increment graph version to
> distinguish between generation numbers v1 and v2 [1]. Thus, one of
> pre-requistes before implementing generation number was to distinguish
> between graph versions in a backwards compatible manner.
> 
> We are going to introduce a new chunk called Generation Data chunk (or
> GDAT). GDAT stores generation number v2 (and any subsequent versions),
> whereas CDAT will still store topological level.
> 
> Old Git does not understand GDAT chunk and would ignore it, reading
> topological levels from CDAT. New Git can parse GDAT and take advantage
> of newer generation numbers, falling back to topological levels when
> GDAT chunk is missing (as it would happen with a commit graph written
> by old Git).

There is a philosophical problem with this patch, and I'm not sure
about the right way to fix it, or if there really is a problem at all.
At minimum, the commit message needs to be improved to make the issue
clear:

This version of the chunk does not store corrected commit date offsets!

This commit add a chunk named "GDAT" and fills it with topological
levels. This is _different_ than the intended final format. For that
reason, the commit-graph-format.txt document is not updated.

The reason I say this is a "philosophical" problem is that this patch
introduces a version of Git that has a different interpretation of the
GDAT chunk than the version presented two patches later. While this
version would never be released, it still exists in history and could
present difficulty if someone were to bisect on an issue with the GDAT
chunk (using external data, not data produced by the compiled binary
at that version).

The justification for this commit the way you did it is clear: there
is a lot of test fallout to just including a new chunk. The question
is whether it is enough to justify this "dummy" implementation for
now?

The tricky bit is the series of three patches starting with this
one.

1. The next patch "commit-graph: return 64-bit generation number" can
   be reordered to be before this patch, no problem. I don't think
   there will be any text conflicts _except_ inside the
   write_graph_chunk_generation_data() method introduced here.

2. The patch after that, "commit-graph: implement corrected commit date"
   only has a small dependence: it writes to the GDAT chunk and parses
   it out. If you remove the interaction with the GDAT chunk, then you
   still have the computation as part of compute_generation_numbers()
   that is valuable. You will need to be careful about the exit
   condition, though, since you also introduce the topo_level chunk.

Patches 5-7 could perhaps be reorganized as follows:

  i. commit-graph: return 64-bit generation number, as-is.

 ii. Add a topo_level slab that is parsed from CDAT. Modify
     compute_generation_numbers() to populate this value and modify
     write_graph_chunk_data() to read this value. Simultaneously
     populate the "generation" member with the same value.

iii. "commit-graph: implement corrected commit date" without any GDAT
     chunk interaction. Make sure the algorithm in
     compute_generation_numbers() walks commits if either topo_level or
     generation are unset. There is a trick here: the generation value
     _is_ set if the commit is parsed from the existing commit-graph!
     Is this case covered by the existing logic to not write GDAT when
     writing a split commit-graph file with a base that does not have
     GDAT? Note that the non-split case does not load the commit-graph
     for parsing, so the interesting case is "--split-replace". Worth
     a test (after we write the GDAT chunk), which you have in "commit-graph:
     handle mixed generation commit chains".

 iv. This patch, introducing the chunk and the read/write logic.

  v. Add the remaining patches.

Again, this is a complicated patch-reorganization. The hope is that
the end result is something that is easy to review as well as something
that produces an as-sane-as-possible history for future bisecters.

Perhaps other reviewers have similar feelings, or can say that I am
being too picky.

> We introduce a test environment variable 'GIT_TEST_COMMIT_GRAPH_NO_GDAT'
> which forces commit-graph file to be written without generation data
> chunk to emulate a commit-graph file written by old Git.

Thank you for introducing this. It really makes it clear what the
benefit is when looking at the t6600-test-reach.sh changes. However,
the changes to that script are more "here is an opportunity for extra
coverage" as opposed to a necessary change immediately upon creating
the GDAT chunk. That could be separated out and justified on its own.
Recall that the justification is that the new version of Git will
continue to work with commit-graph files without a GDAT chunk.

> +static int write_graph_chunk_generation_data(struct hashfile *f,
> +					      struct write_commit_graph_context *ctx)
> +{
> +	int i;
> +	for (i = 0; i < ctx->commits.nr; i++) {
> +		struct commit *c = ctx->commits.list[i];
> +		display_progress(ctx->progress, ++ctx->progress_cnt);
> +		hashwrite_be32(f, commit_graph_data_at(c)->generation);

Here is the "incorrect" data being written.

> +	}
> +
> +	return 0;
> +}
> +

> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -72,7 +72,7 @@ graph_git_behavior 'no graph' full commits/3 commits/1
>  graph_read_expect() {
>  	OPTIONAL=""
>  	NUM_CHUNKS=3
> -	if test ! -z $2
> +	if test ! -z "$2"

A subtle change, but important because we now have multiple "extra"
chunks possible here. Good.

>  graph_git_behavior 'bare repo with graph, commit 8 vs merge 1' bare commits/8 merge/1
> @@ -421,8 +421,9 @@ test_expect_success 'replace-objects invalidates commit-graph' '
>  
>  test_expect_success 'git commit-graph verify' '
>  	cd "$TRASH_DIRECTORY/full" &&
> -	git rev-parse commits/8 | git commit-graph write --stdin-commits &&
> -	git commit-graph verify >output
> +	git rev-parse commits/8 | GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph write --stdin-commits &&
> +	git commit-graph verify >output &&
> +	graph_read_expect 9 extra_edges
>  '

And it is this case as to why we don't just add "generation_data" to our
list of expected chunks.

> @@ -29,9 +29,9 @@ graph_read_expect() {
>  		NUM_BASE=$2
>  	fi
>  	cat >expect <<- EOF
> -	header: 43475048 1 1 3 $NUM_BASE
> +	header: 43475048 1 1 4 $NUM_BASE
>  	num_commits: $1
> -	chunks: oid_fanout oid_lookup commit_metadata
> +	chunks: oid_fanout oid_lookup commit_metadata generation_data

In this script, you _do_ add it to the default chunk list, which
saves some extra work in the rest of the tests. Good.


Thanks,
-Stolee
Abhishek Kumar Aug. 11, 2020, 11:03 a.m. UTC | #2
On Mon, Aug 10, 2020 at 12:28:10PM -0400, Derrick Stolee wrote:
> On 8/8/2020 10:53 PM, Abhishek Kumar via GitGitGadget wrote:
> > From: Abhishek Kumar <abhishekkumar8222@gmail.com>
> > 
> > As discovered by Ævar, we cannot increment graph version to
> > distinguish between generation numbers v1 and v2 [1]. Thus, one of
> > pre-requistes before implementing generation number was to distinguish
> > between graph versions in a backwards compatible manner.
> > 
> > We are going to introduce a new chunk called Generation Data chunk (or
> > GDAT). GDAT stores generation number v2 (and any subsequent versions),
> > whereas CDAT will still store topological level.
> > 
> > Old Git does not understand GDAT chunk and would ignore it, reading
> > topological levels from CDAT. New Git can parse GDAT and take advantage
> > of newer generation numbers, falling back to topological levels when
> > GDAT chunk is missing (as it would happen with a commit graph written
> > by old Git).
> 
> There is a philosophical problem with this patch, and I'm not sure
> about the right way to fix it, or if there really is a problem at all.
> At minimum, the commit message needs to be improved to make the issue
> clear:
> 
> This version of the chunk does not store corrected commit date offsets!
> 
> This commit add a chunk named "GDAT" and fills it with topological
> levels. This is _different_ than the intended final format. For that
> reason, the commit-graph-format.txt document is not updated.
> 
> The reason I say this is a "philosophical" problem is that this patch
> introduces a version of Git that has a different interpretation of the
> GDAT chunk than the version presented two patches later. While this
> version would never be released, it still exists in history and could
> present difficulty if someone were to bisect on an issue with the GDAT
> chunk (using external data, not data produced by the compiled binary
> at that version).
> 

Yes, that is correct. I did often wonder that our inference that "commit
graph has a generation data chunk implies commit graph stores corrected
commit date offsets" is not always true because of this "dummy"
implementation. 

> The justification for this commit the way you did it is clear: there
> is a lot of test fallout to just including a new chunk. The question
> is whether it is enough to justify this "dummy" implementation for
> now?
> 
> The tricky bit is the series of three patches starting with this
> one.
> 
> 1. The next patch "commit-graph: return 64-bit generation number" can
>    be reordered to be before this patch, no problem. I don't think
>    there will be any text conflicts _except_ inside the
>    write_graph_chunk_generation_data() method introduced here.
> 
> 2. The patch after that, "commit-graph: implement corrected commit date"
>    only has a small dependence: it writes to the GDAT chunk and parses
>    it out. If you remove the interaction with the GDAT chunk, then you
>    still have the computation as part of compute_generation_numbers()
>    that is valuable. You will need to be careful about the exit
>    condition, though, since you also introduce the topo_level chunk.
> 
> Patches 5-7 could perhaps be reorganized as follows:
> 
>   i. commit-graph: return 64-bit generation number, as-is.
> 
>  ii. Add a topo_level slab that is parsed from CDAT. Modify
>      compute_generation_numbers() to populate this value and modify
>      write_graph_chunk_data() to read this value. Simultaneously
>      populate the "generation" member with the same value.
> 
> iii. "commit-graph: implement corrected commit date" without any GDAT
>      chunk interaction. Make sure the algorithm in
>      compute_generation_numbers() walks commits if either topo_level or
>      generation are unset. There is a trick here: the generation value
>      _is_ set if the commit is parsed from the existing commit-graph!
>      Is this case covered by the existing logic to not write GDAT when
>      writing a split commit-graph file with a base that does not have
>      GDAT? Note that the non-split case does not load the commit-graph
>      for parsing, so the interesting case is "--split-replace". Worth
>      a test (after we write the GDAT chunk), which you have in "commit-graph:
>      handle mixed generation commit chains".
> 

Right, so at the end of this patch we compute corrected commit dates but
don't write them to graph file.

Although, writing ii. and iii. together in the same patch makes more
sense to me. Would it be hard to follow for someone who has no context
of this discussion?

>  iv. This patch, introducing the chunk and the read/write logic.
> 
>   v. Add the remaining patches.
> 
> Again, this is a complicated patch-reorganization. The hope is that
> the end result is something that is easy to review as well as something
> that produces an as-sane-as-possible history for future bisecters.
> 
> Perhaps other reviewers have similar feelings, or can say that I am
> being too picky.
> 

I can see how the reorganization helps us avoid a rather nasty
situation to be in. Should not be too hard to reorganize.

> > We introduce a test environment variable 'GIT_TEST_COMMIT_GRAPH_NO_GDAT'
> > which forces commit-graph file to be written without generation data
> > chunk to emulate a commit-graph file written by old Git.
> 
> ...
> 
> Thanks,
> -Stolee

Thanks
- Abhishek
Derrick Stolee Aug. 11, 2020, 12:27 p.m. UTC | #3
On 8/11/2020 7:03 AM, Abhishek Kumar wrote:
> On Mon, Aug 10, 2020 at 12:28:10PM -0400, Derrick Stolee wrote:
>> Patches 5-7 could perhaps be reorganized as follows:
>>
>>   i. commit-graph: return 64-bit generation number, as-is.
>>
>>  ii. Add a topo_level slab that is parsed from CDAT. Modify
>>      compute_generation_numbers() to populate this value and modify
>>      write_graph_chunk_data() to read this value. Simultaneously
>>      populate the "generation" member with the same value.
>>
>> iii. "commit-graph: implement corrected commit date" without any GDAT
>>      chunk interaction. Make sure the algorithm in
>>      compute_generation_numbers() walks commits if either topo_level or
>>      generation are unset. There is a trick here: the generation value
>>      _is_ set if the commit is parsed from the existing commit-graph!
>>      Is this case covered by the existing logic to not write GDAT when
>>      writing a split commit-graph file with a base that does not have
>>      GDAT? Note that the non-split case does not load the commit-graph
>>      for parsing, so the interesting case is "--split-replace". Worth
>>      a test (after we write the GDAT chunk), which you have in "commit-graph:
>>      handle mixed generation commit chains".
>>
> 
> Right, so at the end of this patch we compute corrected commit dates but
> don't write them to graph file.
> 
> Although, writing ii. and iii. together in the same patch makes more
> sense to me. Would it be hard to follow for someone who has no context
> of this discussion?

It is always easier to combine two patches than to split one into two.

With that in mind, I recommend starting with a split version and then
seeing how each patch looks. I think that these are "independent enough"
ideas that justify the separate patches.

>>  iv. This patch, introducing the chunk and the read/write logic.
>>
>>   v. Add the remaining patches.
>>
>> Again, this is a complicated patch-reorganization. The hope is that
>> the end result is something that is easy to review as well as something
>> that produces an as-sane-as-possible history for future bisecters.
>>
>> Perhaps other reviewers have similar feelings, or can say that I am
>> being too picky.
>>
> 
> I can see how the reorganization helps us avoid a rather nasty
> situation to be in. Should not be too hard to reorganize.

I hope not. Hopefully you get some more review on this version
before jumping in on such a big reorg (in case someone else has
a different opinion).

Thanks,
-Stolee
Taylor Blau Aug. 11, 2020, 6:58 p.m. UTC | #4
On Tue, Aug 11, 2020 at 08:27:41AM -0400, Derrick Stolee wrote:
> On 8/11/2020 7:03 AM, Abhishek Kumar wrote:
> > On Mon, Aug 10, 2020 at 12:28:10PM -0400, Derrick Stolee wrote:
> >> Patches 5-7 could perhaps be reorganized as follows:
> >>
> >>   i. commit-graph: return 64-bit generation number, as-is.
> >>
> >>  ii. Add a topo_level slab that is parsed from CDAT. Modify
> >>      compute_generation_numbers() to populate this value and modify
> >>      write_graph_chunk_data() to read this value. Simultaneously
> >>      populate the "generation" member with the same value.
> >>
> >> iii. "commit-graph: implement corrected commit date" without any GDAT
> >>      chunk interaction. Make sure the algorithm in
> >>      compute_generation_numbers() walks commits if either topo_level or
> >>      generation are unset. There is a trick here: the generation value
> >>      _is_ set if the commit is parsed from the existing commit-graph!
> >>      Is this case covered by the existing logic to not write GDAT when
> >>      writing a split commit-graph file with a base that does not have
> >>      GDAT? Note that the non-split case does not load the commit-graph
> >>      for parsing, so the interesting case is "--split-replace". Worth
> >>      a test (after we write the GDAT chunk), which you have in "commit-graph:
> >>      handle mixed generation commit chains".
> >>
> >
> > Right, so at the end of this patch we compute corrected commit dates but
> > don't write them to graph file.
> >
> > Although, writing ii. and iii. together in the same patch makes more
> > sense to me. Would it be hard to follow for someone who has no context
> > of this discussion?
>
> It is always easier to combine two patches than to split one into two.
>
> With that in mind, I recommend starting with a split version and then
> seeing how each patch looks. I think that these are "independent enough"
> ideas that justify the separate patches.
>
> >>  iv. This patch, introducing the chunk and the read/write logic.
> >>
> >>   v. Add the remaining patches.
> >>
> >> Again, this is a complicated patch-reorganization. The hope is that
> >> the end result is something that is easy to review as well as something
> >> that produces an as-sane-as-possible history for future bisecters.
> >>
> >> Perhaps other reviewers have similar feelings, or can say that I am
> >> being too picky.
> >>
> >
> > I can see how the reorganization helps us avoid a rather nasty
> > situation to be in. Should not be too hard to reorganize.
>
> I hope not. Hopefully you get some more review on this version
> before jumping in on such a big reorg (in case someone else has
> a different opinion).

I think the direction makes sense. We should avoid having the dummy
implementation of the GDAT chunk in the interim if at all possible (and
it seems like it is). What Stolee is proposing is what I'd suggest, too.

Please let us know if you need any help restructuring these patches.
Please make sure to give them a careful review since it is easy to move
a hunk into the wrong commit, or let a detail in the patch text become
out-of-date. Looking over "git log -p origin/master..HEAD" and "git
rebase -x make -j8 DEVELOPER=1 test' origin/master" never hurts ;).

> Thanks,
> -Stolee

Thanks,
Taylor
diff mbox series

Patch

diff --git a/commit-graph.c b/commit-graph.c
index fb6e2bf18f..d5da1e8028 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -38,11 +38,12 @@  void git_test_write_commit_graph_or_die(void)
 #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
 #define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
 #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
+#define GRAPH_CHUNKID_GENERATION_DATA 0x47444154 /* "GDAT" */
 #define GRAPH_CHUNKID_EXTRAEDGES 0x45444745 /* "EDGE" */
 #define GRAPH_CHUNKID_BLOOMINDEXES 0x42494458 /* "BIDX" */
 #define GRAPH_CHUNKID_BLOOMDATA 0x42444154 /* "BDAT" */
 #define GRAPH_CHUNKID_BASE 0x42415345 /* "BASE" */
-#define MAX_NUM_CHUNKS 7
+#define MAX_NUM_CHUNKS 8
 
 #define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
 
@@ -392,6 +393,13 @@  struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size)
 				graph->chunk_commit_data = data + chunk_offset;
 			break;
 
+		case GRAPH_CHUNKID_GENERATION_DATA:
+			if (graph->chunk_generation_data)
+				chunk_repeated = 1;
+			else
+				graph->chunk_generation_data = data + chunk_offset;
+			break;
+
 		case GRAPH_CHUNKID_EXTRAEDGES:
 			if (graph->chunk_extra_edges)
 				chunk_repeated = 1;
@@ -758,7 +766,10 @@  static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
 	date_low = get_be32(commit_data + g->hash_len + 12);
 	item->date = (timestamp_t)((date_high << 32) | date_low);
 
-	graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
+	if (g->chunk_generation_data)
+		graph_data->generation = get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index);
+	else
+		graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 }
 
 static inline void set_commit_tree(struct commit *c, struct tree *t)
@@ -951,7 +962,8 @@  struct write_commit_graph_context {
 		 report_progress:1,
 		 split:1,
 		 changed_paths:1,
-		 order_by_pack:1;
+		 order_by_pack:1,
+		 write_generation_data:1;
 
 	const struct split_commit_graph_opts *split_opts;
 	size_t total_bloom_filter_data_size;
@@ -1105,8 +1117,21 @@  static int write_graph_chunk_data(struct hashfile *f,
 	return 0;
 }
 
+static int write_graph_chunk_generation_data(struct hashfile *f,
+					      struct write_commit_graph_context *ctx)
+{
+	int i;
+	for (i = 0; i < ctx->commits.nr; i++) {
+		struct commit *c = ctx->commits.list[i];
+		display_progress(ctx->progress, ++ctx->progress_cnt);
+		hashwrite_be32(f, commit_graph_data_at(c)->generation);
+	}
+
+	return 0;
+}
+
 static int write_graph_chunk_extra_edges(struct hashfile *f,
-					 struct write_commit_graph_context *ctx)
+					  struct write_commit_graph_context *ctx)
 {
 	struct commit **list = ctx->commits.list;
 	struct commit **last = ctx->commits.list + ctx->commits.nr;
@@ -1710,6 +1735,12 @@  static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 	chunks[2].id = GRAPH_CHUNKID_DATA;
 	chunks[2].size = (hashsz + 16) * ctx->commits.nr;
 	chunks[2].write_fn = write_graph_chunk_data;
+	if (ctx->write_generation_data) {
+		chunks[num_chunks].id = GRAPH_CHUNKID_GENERATION_DATA;
+		chunks[num_chunks].size = sizeof(uint32_t) * ctx->commits.nr;
+		chunks[num_chunks].write_fn = write_graph_chunk_generation_data;
+		num_chunks++;
+	}
 	if (ctx->num_extra_edges) {
 		chunks[num_chunks].id = GRAPH_CHUNKID_EXTRAEDGES;
 		chunks[num_chunks].size = 4 * ctx->num_extra_edges;
@@ -2113,6 +2144,7 @@  int write_commit_graph(struct object_directory *odb,
 	ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0;
 	ctx->split_opts = split_opts;
 	ctx->total_bloom_filter_data_size = 0;
+	ctx->write_generation_data = !git_env_bool(GIT_TEST_COMMIT_GRAPH_NO_GDAT, 0);
 
 	if (flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS)
 		ctx->changed_paths = 1;
diff --git a/commit-graph.h b/commit-graph.h
index 701e3d41aa..cc232e0678 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -6,6 +6,7 @@ 
 #include "oidset.h"
 
 #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH"
+#define GIT_TEST_COMMIT_GRAPH_NO_GDAT "GIT_TEST_COMMIT_GRAPH_NO_GDAT"
 #define GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE "GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE"
 #define GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS "GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS"
 
@@ -67,6 +68,7 @@  struct commit_graph {
 	const uint32_t *chunk_oid_fanout;
 	const unsigned char *chunk_oid_lookup;
 	const unsigned char *chunk_commit_data;
+	const unsigned char *chunk_generation_data;
 	const unsigned char *chunk_extra_edges;
 	const unsigned char *chunk_base_graphs;
 	const unsigned char *chunk_bloom_indexes;
diff --git a/t/README b/t/README
index 70ec61cf88..6647ef132e 100644
--- a/t/README
+++ b/t/README
@@ -379,6 +379,9 @@  GIT_TEST_COMMIT_GRAPH=<boolean>, when true, forces the commit-graph to
 be written after every 'git commit' command, and overrides the
 'core.commitGraph' setting to true.
 
+GIT_TEST_COMMIT_GRAPH_NO_GDAT=<boolean>, when true, forces the
+commit-graph to be written without generation data chunk.
+
 GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=<boolean>, when true, forces
 commit-graph write to compute and write changed path Bloom filters for
 every 'git commit-graph write', as if the `--changed-paths` option was
diff --git a/t/helper/test-read-graph.c b/t/helper/test-read-graph.c
index 6d0c962438..1c2a5366c7 100644
--- a/t/helper/test-read-graph.c
+++ b/t/helper/test-read-graph.c
@@ -32,6 +32,8 @@  int cmd__read_graph(int argc, const char **argv)
 		printf(" oid_lookup");
 	if (graph->chunk_commit_data)
 		printf(" commit_metadata");
+	if (graph->chunk_generation_data)
+		printf(" generation_data");
 	if (graph->chunk_extra_edges)
 		printf(" extra_edges");
 	if (graph->chunk_bloom_indexes)
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index c21cc160f3..55c94e9ebd 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -33,11 +33,11 @@  test_expect_success 'setup test - repo, commits, commit graph, log outputs' '
 	git commit-graph write --reachable --changed-paths
 '
 graph_read_expect () {
-	NUM_CHUNKS=5
+	NUM_CHUNKS=6
 	cat >expect <<- EOF
 	header: 43475048 1 1 $NUM_CHUNKS 0
 	num_commits: $1
-	chunks: oid_fanout oid_lookup commit_metadata bloom_indexes bloom_data
+	chunks: oid_fanout oid_lookup commit_metadata generation_data bloom_indexes bloom_data
 	EOF
 	test-tool read-graph >actual &&
 	test_cmp expect actual
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 2804b0dd45..fef05c33d7 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -72,7 +72,7 @@  graph_git_behavior 'no graph' full commits/3 commits/1
 graph_read_expect() {
 	OPTIONAL=""
 	NUM_CHUNKS=3
-	if test ! -z $2
+	if test ! -z "$2"
 	then
 		OPTIONAL=" $2"
 		NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
@@ -99,14 +99,14 @@  test_expect_success 'exit with correct error on bad input to --stdin-commits' '
 	# valid commit and tree OID
 	git rev-parse HEAD HEAD^{tree} >in &&
 	git commit-graph write --stdin-commits <in &&
-	graph_read_expect 3
+	graph_read_expect 3 generation_data
 '
 
 test_expect_success 'write graph' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
 	test_path_is_file $objdir/info/commit-graph &&
-	graph_read_expect "3"
+	graph_read_expect "3" generation_data
 '
 
 test_expect_success POSIXPERM 'write graph has correct permissions' '
@@ -215,7 +215,7 @@  test_expect_success 'write graph with merges' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
 	test_path_is_file $objdir/info/commit-graph &&
-	graph_read_expect "10" "extra_edges"
+	graph_read_expect "10" "generation_data extra_edges"
 '
 
 graph_git_behavior 'merge 1 vs 2' full merge/1 merge/2
@@ -250,7 +250,7 @@  test_expect_success 'write graph with new commit' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
 	test_path_is_file $objdir/info/commit-graph &&
-	graph_read_expect "11" "extra_edges"
+	graph_read_expect "11" "generation_data extra_edges"
 '
 
 graph_git_behavior 'full graph, commit 8 vs merge 1' full commits/8 merge/1
@@ -260,7 +260,7 @@  test_expect_success 'write graph with nothing new' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
 	test_path_is_file $objdir/info/commit-graph &&
-	graph_read_expect "11" "extra_edges"
+	graph_read_expect "11" "generation_data extra_edges"
 '
 
 graph_git_behavior 'cleared graph, commit 8 vs merge 1' full commits/8 merge/1
@@ -270,7 +270,7 @@  test_expect_success 'build graph from latest pack with closure' '
 	cd "$TRASH_DIRECTORY/full" &&
 	cat new-idx | git commit-graph write --stdin-packs &&
 	test_path_is_file $objdir/info/commit-graph &&
-	graph_read_expect "9" "extra_edges"
+	graph_read_expect "9" "generation_data extra_edges"
 '
 
 graph_git_behavior 'graph from pack, commit 8 vs merge 1' full commits/8 merge/1
@@ -283,7 +283,7 @@  test_expect_success 'build graph from commits with closure' '
 	git rev-parse merge/1 >>commits-in &&
 	cat commits-in | git commit-graph write --stdin-commits &&
 	test_path_is_file $objdir/info/commit-graph &&
-	graph_read_expect "6"
+	graph_read_expect "6" "generation_data"
 '
 
 graph_git_behavior 'graph from commits, commit 8 vs merge 1' full commits/8 merge/1
@@ -293,7 +293,7 @@  test_expect_success 'build graph from commits with append' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git rev-parse merge/3 | git commit-graph write --stdin-commits --append &&
 	test_path_is_file $objdir/info/commit-graph &&
-	graph_read_expect "10" "extra_edges"
+	graph_read_expect "10" "generation_data extra_edges"
 '
 
 graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1
@@ -303,7 +303,7 @@  test_expect_success 'build graph using --reachable' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write --reachable &&
 	test_path_is_file $objdir/info/commit-graph &&
-	graph_read_expect "11" "extra_edges"
+	graph_read_expect "11" "generation_data extra_edges"
 '
 
 graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1
@@ -324,7 +324,7 @@  test_expect_success 'write graph in bare repo' '
 	cd "$TRASH_DIRECTORY/bare" &&
 	git commit-graph write &&
 	test_path_is_file $baredir/info/commit-graph &&
-	graph_read_expect "11" "extra_edges"
+	graph_read_expect "11" "generation_data extra_edges"
 '
 
 graph_git_behavior 'bare repo with graph, commit 8 vs merge 1' bare commits/8 merge/1
@@ -421,8 +421,9 @@  test_expect_success 'replace-objects invalidates commit-graph' '
 
 test_expect_success 'git commit-graph verify' '
 	cd "$TRASH_DIRECTORY/full" &&
-	git rev-parse commits/8 | git commit-graph write --stdin-commits &&
-	git commit-graph verify >output
+	git rev-parse commits/8 | GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph write --stdin-commits &&
+	git commit-graph verify >output &&
+	graph_read_expect 9 extra_edges
 '
 
 NUM_COMMITS=9
diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh
index 9b850ea907..6b25c3d9ce 100755
--- a/t/t5324-split-commit-graph.sh
+++ b/t/t5324-split-commit-graph.sh
@@ -14,11 +14,11 @@  test_expect_success 'setup repo' '
 	graphdir="$infodir/commit-graphs" &&
 	test_oid_init &&
 	test_oid_cache <<-EOM
-	shallow sha1:1760
-	shallow sha256:2064
+	shallow sha1:2132
+	shallow sha256:2436
 
-	base sha1:1376
-	base sha256:1496
+	base sha1:1408
+	base sha256:1528
 	EOM
 '
 
@@ -29,9 +29,9 @@  graph_read_expect() {
 		NUM_BASE=$2
 	fi
 	cat >expect <<- EOF
-	header: 43475048 1 1 3 $NUM_BASE
+	header: 43475048 1 1 4 $NUM_BASE
 	num_commits: $1
-	chunks: oid_fanout oid_lookup commit_metadata
+	chunks: oid_fanout oid_lookup commit_metadata generation_data
 	EOF
 	test-tool read-graph >output &&
 	test_cmp expect output
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 475564bee7..d14b129f06 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -55,10 +55,13 @@  test_expect_success 'setup' '
 	git show-ref -s commit-5-5 | git commit-graph write --stdin-commits &&
 	mv .git/objects/info/commit-graph commit-graph-half &&
 	chmod u+w commit-graph-half &&
+	GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph write --reachable &&
+	mv .git/objects/info/commit-graph commit-graph-no-gdat &&
+	chmod u+w commit-graph-no-gdat &&
 	git config core.commitGraph true
 '
 
-run_three_modes () {
+run_all_modes () {
 	test_when_finished rm -rf .git/objects/info/commit-graph &&
 	"$@" <input >actual &&
 	test_cmp expect actual &&
@@ -67,11 +70,14 @@  run_three_modes () {
 	test_cmp expect actual &&
 	cp commit-graph-half .git/objects/info/commit-graph &&
 	"$@" <input >actual &&
+	test_cmp expect actual &&
+	cp commit-graph-no-gdat .git/objects/info/commit-graph &&
+	"$@" <input >actual &&
 	test_cmp expect actual
 }
 
-test_three_modes () {
-	run_three_modes test-tool reach "$@"
+test_all_modes () {
+	run_all_modes test-tool reach "$@"
 }
 
 test_expect_success 'ref_newer:miss' '
@@ -80,7 +86,7 @@  test_expect_success 'ref_newer:miss' '
 	B:commit-4-9
 	EOF
 	echo "ref_newer(A,B):0" >expect &&
-	test_three_modes ref_newer
+	test_all_modes ref_newer
 '
 
 test_expect_success 'ref_newer:hit' '
@@ -89,7 +95,7 @@  test_expect_success 'ref_newer:hit' '
 	B:commit-2-3
 	EOF
 	echo "ref_newer(A,B):1" >expect &&
-	test_three_modes ref_newer
+	test_all_modes ref_newer
 '
 
 test_expect_success 'in_merge_bases:hit' '
@@ -98,7 +104,7 @@  test_expect_success 'in_merge_bases:hit' '
 	B:commit-8-8
 	EOF
 	echo "in_merge_bases(A,B):1" >expect &&
-	test_three_modes in_merge_bases
+	test_all_modes in_merge_bases
 '
 
 test_expect_success 'in_merge_bases:miss' '
@@ -107,7 +113,7 @@  test_expect_success 'in_merge_bases:miss' '
 	B:commit-5-9
 	EOF
 	echo "in_merge_bases(A,B):0" >expect &&
-	test_three_modes in_merge_bases
+	test_all_modes in_merge_bases
 '
 
 test_expect_success 'is_descendant_of:hit' '
@@ -118,7 +124,7 @@  test_expect_success 'is_descendant_of:hit' '
 	X:commit-1-1
 	EOF
 	echo "is_descendant_of(A,X):1" >expect &&
-	test_three_modes is_descendant_of
+	test_all_modes is_descendant_of
 '
 
 test_expect_success 'is_descendant_of:miss' '
@@ -129,7 +135,7 @@  test_expect_success 'is_descendant_of:miss' '
 	X:commit-7-6
 	EOF
 	echo "is_descendant_of(A,X):0" >expect &&
-	test_three_modes is_descendant_of
+	test_all_modes is_descendant_of
 '
 
 test_expect_success 'get_merge_bases_many' '
@@ -144,7 +150,7 @@  test_expect_success 'get_merge_bases_many' '
 		git rev-parse commit-5-6 \
 			      commit-4-7 | sort
 	} >expect &&
-	test_three_modes get_merge_bases_many
+	test_all_modes get_merge_bases_many
 '
 
 test_expect_success 'reduce_heads' '
@@ -166,7 +172,7 @@  test_expect_success 'reduce_heads' '
 			      commit-2-8 \
 			      commit-1-10 | sort
 	} >expect &&
-	test_three_modes reduce_heads
+	test_all_modes reduce_heads
 '
 
 test_expect_success 'can_all_from_reach:hit' '
@@ -189,7 +195,7 @@  test_expect_success 'can_all_from_reach:hit' '
 	Y:commit-8-1
 	EOF
 	echo "can_all_from_reach(X,Y):1" >expect &&
-	test_three_modes can_all_from_reach
+	test_all_modes can_all_from_reach
 '
 
 test_expect_success 'can_all_from_reach:miss' '
@@ -211,7 +217,7 @@  test_expect_success 'can_all_from_reach:miss' '
 	Y:commit-8-5
 	EOF
 	echo "can_all_from_reach(X,Y):0" >expect &&
-	test_three_modes can_all_from_reach
+	test_all_modes can_all_from_reach
 '
 
 test_expect_success 'can_all_from_reach_with_flag: tags case' '
@@ -234,7 +240,7 @@  test_expect_success 'can_all_from_reach_with_flag: tags case' '
 	Y:commit-8-1
 	EOF
 	echo "can_all_from_reach_with_flag(X,_,_,0,0):1" >expect &&
-	test_three_modes can_all_from_reach_with_flag
+	test_all_modes can_all_from_reach_with_flag
 '
 
 test_expect_success 'commit_contains:hit' '
@@ -250,8 +256,8 @@  test_expect_success 'commit_contains:hit' '
 	X:commit-9-3
 	EOF
 	echo "commit_contains(_,A,X,_):1" >expect &&
-	test_three_modes commit_contains &&
-	test_three_modes commit_contains --tag
+	test_all_modes commit_contains &&
+	test_all_modes commit_contains --tag
 '
 
 test_expect_success 'commit_contains:miss' '
@@ -267,8 +273,8 @@  test_expect_success 'commit_contains:miss' '
 	X:commit-9-3
 	EOF
 	echo "commit_contains(_,A,X,_):0" >expect &&
-	test_three_modes commit_contains &&
-	test_three_modes commit_contains --tag
+	test_all_modes commit_contains &&
+	test_all_modes commit_contains --tag
 '
 
 test_expect_success 'rev-list: basic topo-order' '
@@ -280,7 +286,7 @@  test_expect_success 'rev-list: basic topo-order' '
 		commit-6-2 commit-5-2 commit-4-2 commit-3-2 commit-2-2 commit-1-2 \
 		commit-6-1 commit-5-1 commit-4-1 commit-3-1 commit-2-1 commit-1-1 \
 	>expect &&
-	run_three_modes git rev-list --topo-order commit-6-6
+	run_all_modes git rev-list --topo-order commit-6-6
 '
 
 test_expect_success 'rev-list: first-parent topo-order' '
@@ -292,7 +298,7 @@  test_expect_success 'rev-list: first-parent topo-order' '
 		commit-6-2 \
 		commit-6-1 commit-5-1 commit-4-1 commit-3-1 commit-2-1 commit-1-1 \
 	>expect &&
-	run_three_modes git rev-list --first-parent --topo-order commit-6-6
+	run_all_modes git rev-list --first-parent --topo-order commit-6-6
 '
 
 test_expect_success 'rev-list: range topo-order' '
@@ -304,7 +310,7 @@  test_expect_success 'rev-list: range topo-order' '
 		commit-6-2 commit-5-2 commit-4-2 \
 		commit-6-1 commit-5-1 commit-4-1 \
 	>expect &&
-	run_three_modes git rev-list --topo-order commit-3-3..commit-6-6
+	run_all_modes git rev-list --topo-order commit-3-3..commit-6-6
 '
 
 test_expect_success 'rev-list: range topo-order' '
@@ -316,7 +322,7 @@  test_expect_success 'rev-list: range topo-order' '
 		commit-6-2 commit-5-2 commit-4-2 \
 		commit-6-1 commit-5-1 commit-4-1 \
 	>expect &&
-	run_three_modes git rev-list --topo-order commit-3-8..commit-6-6
+	run_all_modes git rev-list --topo-order commit-3-8..commit-6-6
 '
 
 test_expect_success 'rev-list: first-parent range topo-order' '
@@ -328,7 +334,7 @@  test_expect_success 'rev-list: first-parent range topo-order' '
 		commit-6-2 \
 		commit-6-1 commit-5-1 commit-4-1 \
 	>expect &&
-	run_three_modes git rev-list --first-parent --topo-order commit-3-8..commit-6-6
+	run_all_modes git rev-list --first-parent --topo-order commit-3-8..commit-6-6
 '
 
 test_expect_success 'rev-list: ancestry-path topo-order' '
@@ -338,7 +344,7 @@  test_expect_success 'rev-list: ancestry-path topo-order' '
 		commit-6-4 commit-5-4 commit-4-4 commit-3-4 \
 		commit-6-3 commit-5-3 commit-4-3 \
 	>expect &&
-	run_three_modes git rev-list --topo-order --ancestry-path commit-3-3..commit-6-6
+	run_all_modes git rev-list --topo-order --ancestry-path commit-3-3..commit-6-6
 '
 
 test_expect_success 'rev-list: symmetric difference topo-order' '
@@ -352,7 +358,7 @@  test_expect_success 'rev-list: symmetric difference topo-order' '
 		commit-3-8 commit-2-8 commit-1-8 \
 		commit-3-7 commit-2-7 commit-1-7 \
 	>expect &&
-	run_three_modes git rev-list --topo-order commit-3-8...commit-6-6
+	run_all_modes git rev-list --topo-order commit-3-8...commit-6-6
 '
 
 test_expect_success 'get_reachable_subset:all' '
@@ -372,7 +378,7 @@  test_expect_success 'get_reachable_subset:all' '
 			      commit-1-7 \
 			      commit-5-6 | sort
 	) >expect &&
-	test_three_modes get_reachable_subset
+	test_all_modes get_reachable_subset
 '
 
 test_expect_success 'get_reachable_subset:some' '
@@ -390,7 +396,7 @@  test_expect_success 'get_reachable_subset:some' '
 		git rev-parse commit-3-3 \
 			      commit-1-7 | sort
 	) >expect &&
-	test_three_modes get_reachable_subset
+	test_all_modes get_reachable_subset
 '
 
 test_expect_success 'get_reachable_subset:none' '
@@ -404,7 +410,7 @@  test_expect_success 'get_reachable_subset:none' '
 	Y:commit-2-8
 	EOF
 	echo "get_reachable_subset(X,Y)" >expect &&
-	test_three_modes get_reachable_subset
+	test_all_modes get_reachable_subset
 '
 
 test_done