diff mbox series

[v2,07/10] commit-graph: implement corrected commit date

Message ID bfe14732014807ff19f943cdf51068f0d3043c30.1596941625.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Implement Corrected Commit Date | expand

Commit Message

Linus Arver via GitGitGadget Aug. 9, 2020, 2:53 a.m. UTC
From: Abhishek Kumar <abhishekkumar8222@gmail.com>

With most of preparations done, let's implement corrected commit date
offset. We add a new commit-slab to store topogical levels while
writing commit graph and upgrade the generation member in struct
commit_graph_data to a 64-bit timestamp. We store topological levels to
ensure that older versions of Git will still have the performance
benefits from generation number v2.

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
---
 commit-graph.c | 89 ++++++++++++++++++++++++++++----------------------
 commit.h       |  1 +
 2 files changed, 51 insertions(+), 39 deletions(-)

Comments

Derrick Stolee Aug. 10, 2020, 2:23 p.m. UTC | #1
On 8/8/2020 10:53 PM, Abhishek Kumar via GitGitGadget wrote:
> From: Abhishek Kumar <abhishekkumar8222@gmail.com>
> 
> With most of preparations done, let's implement corrected commit date
> offset. We add a new commit-slab to store topogical levels while
> writing commit graph and upgrade the generation member in struct
> commit_graph_data to a 64-bit timestamp. We store topological levels to
> ensure that older versions of Git will still have the performance
> benefits from generation number v2.
> 
> Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
> ---

> @@ -767,7 +764,10 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
>  	item->date = (timestamp_t)((date_high << 32) | date_low);
>  
>  	if (g->chunk_generation_data)
> -		graph_data->generation = get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index);
> +	{
> +		graph_data->generation = item->date +
> +			(timestamp_t) get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index);
> +	}

You don't need curly braces here, since this is only one line
in the block. Even if you did, these braces are in the wrong
location.

There is a subtle issue with this interpretation, and it
involves the case where the following happens:

1. A new version of Git writes a commit-graph using the
   GDAT chunk.

2. An older version of Git adds a new layer without the
   GDAT chunk.

At that point, the tip commit-graph does not have GDAT,
so the commits in that layer will get "generation" set
with the topological level, which is likely to be much
lower than the corrected commit dates set in the
"generation" field for commits in the lower layer.

The crux of the issue is that we are only considering
the current layer when interpreting the generation number
value.

The patch below inserts a flag into fill_commit_graph_info()
corresponding to the "global" state of whether the top
commit-graph layer has a GDAT chunk. By your later protection
to not write GDAT chunks on top of commit-graphs without
a GDAT chunk, this top commit-graph has all of the information
we need for this check.

Thanks,
-Stolee

--- >8 ---

From 62189709fad3b051cedbd36193f5244fcce17e1f Mon Sep 17 00:00:00 2001
From: Derrick Stolee <dstolee@microsoft.com>
Date: Mon, 10 Aug 2020 10:06:47 -0400
Subject: [PATCH] commit-graph: use generation v2 only if entire chain does

Since there are released versions of Git that understand generation
numbers in the commit-graph's CDAT chunk but do not understand the GDAT
chunk, the following scenario is possible:

 1. "New" Git writes a commit-graph with the GDAT chunk.
 2. "Old" Git writes a split commit-graph on top without a GDAT chunk.

Because of the current use of inspecting the current layer for a
generation_data_chunk pointer, the commits in the lower layer will be
interpreted as having very large generation values (commit date plus
offset) compared to the generation numbers in the top layer (topological
level). This violates the expectation that the generation of a parent is
strictly smaller than the generation of a child.

It is difficult to expose this issue in a test. Since we _start_ with
artificially low generation numbers, any commit walk that prioritizes
generation numbers will walk all of the commits with high generation
number before walking the commits with low generation number. In all the
cases I tried, the commit-graph layers themselves "protect" any
incorrect behavior since none of the commits in the lower layer can
reach the commits in the upper layer.

This issue would manifest itself as a performance problem in this
case, especially with something like "git log --graph" since the low
generation numbers would cause the in-degree queue to walk all of the
commits in the lower layer before allowing the topo-order queue to write
anything to output (depending on the size of the upper layer).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index eb78af3dad..17623274d9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -762,7 +762,9 @@ static struct commit_list **insert_parent_or_die(struct repository *r,
 	return &commit_list_insert(c, pptr)->next;
 }
 
-static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, uint32_t pos)
+#define COMMIT_GRAPH_GENERATION_V2 (1 << 0)
+
+static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, uint32_t pos, int flags)
 {
 	const unsigned char *commit_data;
 	struct commit_graph_data *graph_data;
@@ -785,11 +787,9 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
 	date_low = get_be32(commit_data + g->hash_len + 12);
 	item->date = (timestamp_t)((date_high << 32) | date_low);
 
-	if (g->chunk_generation_data)
-	{
+	if (g->chunk_generation_data && (flags & COMMIT_GRAPH_GENERATION_V2))
 		graph_data->generation = item->date +
 			(timestamp_t) get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index);
-	}
 	else
 		graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 }
@@ -799,6 +799,10 @@ static inline void set_commit_tree(struct commit *c, struct tree *t)
 	c->maybe_tree = t;
 }
 
+/*
+ * In the case of a split commit-graph, this method expects the given
+ * commit-graph 'g' to be the top layer.
+ */
 static int fill_commit_in_graph(struct repository *r,
 				struct commit *item,
 				struct commit_graph *g, uint32_t pos)
@@ -808,11 +812,15 @@ static int fill_commit_in_graph(struct repository *r,
 	struct commit_list **pptr;
 	const unsigned char *commit_data;
 	uint32_t lex_index;
+	int flags = 0;
+
+	if (g->chunk_generation_data)
+		flags |= COMMIT_GRAPH_GENERATION_V2;
 
 	while (pos < g->num_commits_in_base)
 		g = g->base_graph;
 
-	fill_commit_graph_info(item, g, pos);
+	fill_commit_graph_info(item, g, pos, flags);
 
 	lex_index = pos - g->num_commits_in_base;
 	commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index;
@@ -904,10 +912,14 @@ int parse_commit_in_graph(struct repository *r, struct commit *item)
 void load_commit_graph_info(struct repository *r, struct commit *item)
 {
 	uint32_t pos;
+	int flags = 0;
+
 	if (!prepare_commit_graph(r))
 		return;
+	if (r->objects->commit_graph->chunk_generation_data)
+		flags |= COMMIT_GRAPH_GENERATION_V2;
 	if (find_commit_in_graph(item, r->objects->commit_graph, &pos))
-		fill_commit_graph_info(item, r->objects->commit_graph, pos);
+		fill_commit_graph_info(item, r->objects->commit_graph, pos, flags);
 }
 
 static struct tree *load_tree_for_commit(struct repository *r,
Abhishek Kumar Aug. 14, 2020, 4:59 a.m. UTC | #2
On Mon, Aug 10, 2020 at 10:23:32AM -0400, Derrick Stolee wrote:
> ....
> --- >8 ---
> 
> From 62189709fad3b051cedbd36193f5244fcce17e1f Mon Sep 17 00:00:00 2001
> From: Derrick Stolee <dstolee@microsoft.com>
> Date: Mon, 10 Aug 2020 10:06:47 -0400
> Subject: [PATCH] commit-graph: use generation v2 only if entire chain does
> 
> Since there are released versions of Git that understand generation
> numbers in the commit-graph's CDAT chunk but do not understand the GDAT
> chunk, the following scenario is possible:
> 
>  1. "New" Git writes a commit-graph with the GDAT chunk.
>  2. "Old" Git writes a split commit-graph on top without a GDAT chunk.
> 
> Because of the current use of inspecting the current layer for a
> generation_data_chunk pointer, the commits in the lower layer will be
> interpreted as having very large generation values (commit date plus
> offset) compared to the generation numbers in the top layer (topological
> level). This violates the expectation that the generation of a parent is
> strictly smaller than the generation of a child.
> 
> It is difficult to expose this issue in a test. Since we _start_ with
> artificially low generation numbers, any commit walk that prioritizes
> generation numbers will walk all of the commits with high generation
> number before walking the commits with low generation number. In all the
> cases I tried, the commit-graph layers themselves "protect" any
> incorrect behavior since none of the commits in the lower layer can
> reach the commits in the upper layer.
> 
> This issue would manifest itself as a performance problem in this
> case, especially with something like "git log --graph" since the low
> generation numbers would cause the in-degree queue to walk all of the
> commits in the lower layer before allowing the topo-order queue to write
> anything to output (depending on the size of the upper layer).
> 
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  commit-graph.c | 24 ++++++++++++++++++------
>  1 file changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index eb78af3dad..17623274d9 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -762,7 +762,9 @@ static struct commit_list **insert_parent_or_die(struct repository *r,
>  	return &commit_list_insert(c, pptr)->next;
>  }
>  
> -static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, uint32_t pos)
> +#define COMMIT_GRAPH_GENERATION_V2 (1 << 0)
> +
> +static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, uint32_t pos, int flags)
>  {
>  	const unsigned char *commit_data;
>  	struct commit_graph_data *graph_data;
> @@ -785,11 +787,9 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
>  	date_low = get_be32(commit_data + g->hash_len + 12);
>  	item->date = (timestamp_t)((date_high << 32) | date_low);
>  
> -	if (g->chunk_generation_data)
> -	{
> +	if (g->chunk_generation_data && (flags & COMMIT_GRAPH_GENERATION_V2))
>  		graph_data->generation = item->date +
>  			(timestamp_t) get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index);
> -	}
>  	else
>  		graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
>  }
> @@ -799,6 +799,10 @@ static inline void set_commit_tree(struct commit *c, struct tree *t)
>  	c->maybe_tree = t;
>  }
>  
> +/*
> + * In the case of a split commit-graph, this method expects the given
> + * commit-graph 'g' to be the top layer.
> + */

Unfortunately, this turns out to be an optimistic assumption. After
adding in changes from this patch and extended tests for
split-commit-graph [1], the tests fail with `git commit-graph verify` as
Git tries to compare topological level with corrected commit dates.

The problem lies in assuming that `g` is always the top layer, without
any way to assert if that's true. In case of `commit-graph verify`, we
update `g = g->base_graph` and verify recursively.

If we can assume such behavior is only the part of verify subcommand, I
can update the tests to no longer verify at the end.

[1]: https://lore.kernel.org/git/4043ffbc-84df-0cd6-5c75-af80383a56cf@gmail.com/

>  static int fill_commit_in_graph(struct repository *r,
>  				struct commit *item,
>  				struct commit_graph *g, uint32_t pos)
> @@ -808,11 +812,15 @@ static int fill_commit_in_graph(struct repository *r,
>  	struct commit_list **pptr;
>  	const unsigned char *commit_data;
>  	uint32_t lex_index;
> +	int flags = 0;
> +
> +	if (g->chunk_generation_data)
> +		flags |= COMMIT_GRAPH_GENERATION_V2;
>  
>  	while (pos < g->num_commits_in_base)
>  		g = g->base_graph;
>  
> -	fill_commit_graph_info(item, g, pos);
> +	fill_commit_graph_info(item, g, pos, flags);
>  
>  	lex_index = pos - g->num_commits_in_base;
>  	commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index;
> @@ -904,10 +912,14 @@ int parse_commit_in_graph(struct repository *r, struct commit *item)
>  void load_commit_graph_info(struct repository *r, struct commit *item)
>  {
>  	uint32_t pos;
> +	int flags = 0;
> +
>  	if (!prepare_commit_graph(r))
>  		return;
> +	if (r->objects->commit_graph->chunk_generation_data)
> +		flags |= COMMIT_GRAPH_GENERATION_V2;
>  	if (find_commit_in_graph(item, r->objects->commit_graph, &pos))
> -		fill_commit_graph_info(item, r->objects->commit_graph, pos);
> +		fill_commit_graph_info(item, r->objects->commit_graph, pos, flags);
>  }
>  
>  static struct tree *load_tree_for_commit(struct repository *r,
> -- 
> 2.28.0.38.gc6f546511c1
> 

I solved the issue by adding a new member to struct commit_graph
`read_generation_data` to maintain the "global" state of the entire
commit-graph chain instead.

The relevant changes are in validate_mixed_generation_chain(),
read_commit_graph_one() and fill_commit_graph_info().

Thanks
- Abhishek

--- >8 ---
From aaf8c27bfec6e110a8bb12173c2dd612a8c6b8b9 Mon Sep 17 00:00:00 2001
From: Abhishek Kumar <abhishekkumar8222@gmail.com>
Date: Thu, 6 Aug 2020 19:08:52 +0530
Subject: [PATCH] commit-graph: use generation v2 only if entire chain does

Since there are released versions of Git that understand generation
numbers in the commit-graph's CDAT chunk but do not understand the GDAT
chunk, the following scenario is possible:

1. "New" Git writes a commit-graph with the GDAT chunk.
2. "Old" Git writes a split commit-graph on top without a GDAT chunk.

Because of the current use of inspecting the current layer for a
chunk_generation_data pointer, the commits in the lower layer will be
interpreted as having very large generation values (commit date plus
offset) compared to the generation numbers in the top layer (topological
level). This violates the expectation that the generation of a parent is
strictly smaller than the generation of a child.

It is difficult to expose this issue in a test. Since we _start_ with
artificially low generation numbers, any commit walk that prioritizes
generation numbers will walk all of the commits with high generation
number before walking the commits with low generation number. In all the
cases I tried, the commit-graph layers themselves "protect" any
incorrect behavior since none of the commits in the lower layer can
reach the commits in the upper layer.

This issue would manifest itself as a performance problem in this case,
especially with something like "git log --graph" since the low
generation numbers would cause the in-degree queue to walk all of the
commits in the lower layer before allowing the topo-order queue to write
anything to output (depending on the size of the upper layer).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
---
 commit-graph.c                | 36 ++++++++++++++++--
 commit-graph.h                |  1 +
 t/t5324-split-commit-graph.sh | 70 +++++++++++++++++++++++++++++++++++
 3 files changed, 104 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index d0f977852b..10309f870f 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -597,6 +597,27 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r,
 	return graph_chain;
 }
 
+static void validate_mixed_generation_chain(struct repository *r)
+{
+	struct commit_graph *g = r->objects->commit_graph;
+	int read_generation_data = 1;
+
+	while (g) {
+		if (!g->chunk_generation_data) {
+			read_generation_data = 0;
+			break;
+		}
+		g = g->base_graph;
+	}
+
+	g = r->objects->commit_graph;
+
+	while (g) {
+		g->read_generation_data = read_generation_data;
+		g = g->base_graph;
+	}
+}
+
 struct commit_graph *read_commit_graph_one(struct repository *r,
 					   struct object_directory *odb)
 {
@@ -605,6 +626,8 @@ struct commit_graph *read_commit_graph_one(struct repository *r,
 	if (!g)
 		g = load_commit_graph_chain(r, odb);
 
+	validate_mixed_generation_chain(r);
+
 	return g;
 }
 
@@ -740,6 +763,8 @@ static struct commit_list **insert_parent_or_die(struct repository *r,
 	return &commit_list_insert(c, pptr)->next;
 }
 
+#define COMMIT_GRAPH_GENERATION_V2 (1 << 0)
+
 static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, uint32_t pos)
 {
 	const unsigned char *commit_data;
@@ -763,11 +788,9 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
 	date_low = get_be32(commit_data + g->hash_len + 12);
 	item->date = (timestamp_t)((date_high << 32) | date_low);
 
-	if (g->chunk_generation_data)
-	{
+	if (g->chunk_generation_data && g->read_generation_data)
 		graph_data->generation = item->date +
 			(timestamp_t) get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index);
-	}
 	else
 		graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 }
@@ -884,6 +907,7 @@ void load_commit_graph_info(struct repository *r, struct commit *item)
 	uint32_t pos;
 	if (!prepare_commit_graph(r))
 		return;
+
 	if (find_commit_in_graph(item, r->objects->commit_graph, &pos))
 		fill_commit_graph_info(item, r->objects->commit_graph, pos);
 }
@@ -2186,6 +2210,9 @@ int write_commit_graph(struct object_directory *odb,
 
 		g = ctx->r->objects->commit_graph;
 
+		if (g && !g->chunk_generation_data)
+			ctx->write_generation_data = 0;
+
 		while (g) {
 			ctx->num_commit_graphs_before++;
 			g = g->base_graph;
@@ -2204,6 +2231,9 @@ int write_commit_graph(struct object_directory *odb,
 
 		if (ctx->split_opts)
 			replace = ctx->split_opts->flags & COMMIT_GRAPH_SPLIT_REPLACE;
+
+		if (replace)
+			ctx->write_generation_data = 1;
 	}
 
 	ctx->approx_nr_objects = approximate_object_count();
diff --git a/commit-graph.h b/commit-graph.h
index f89614ecd5..305c332b7e 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -63,6 +63,7 @@ struct commit_graph {
 	struct object_directory *odb;
 
 	uint32_t num_commits_in_base;
+	uint32_t read_generation_data;
 	struct commit_graph *base_graph;
 
 	const uint32_t *chunk_oid_fanout;
diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh
index 531016f405..ac5e7783fb 100755
--- a/t/t5324-split-commit-graph.sh
+++ b/t/t5324-split-commit-graph.sh
@@ -424,4 +424,74 @@ done <<\EOF
 0600 -r--------
 EOF
 
+test_expect_success 'setup repo for mixed generation commit-graph-chain' '
+	mkdir mixed &&
+	graphdir=".git/objects/info/commit-graphs" &&
+	cd "$TRASH_DIRECTORY/mixed" &&
+	git init &&
+	git config core.commitGraph true &&
+	git config gc.writeCommitGraph false &&
+	for i in $(test_seq 3)
+	do
+		test_commit $i &&
+		git branch commits/$i || return 1
+	done &&
+	git reset --hard commits/1 &&
+	for i in $(test_seq 4 5)
+	do
+		test_commit $i &&
+		git branch commits/$i || return 1
+	done &&
+	git reset --hard commits/2 &&
+	for i in $(test_seq 6 10)
+	do
+		test_commit $i &&
+		git branch commits/$i || return 1
+	done &&
+	git commit-graph write --reachable --split &&
+	git reset --hard commits/2 &&
+	git merge commits/4 &&
+	git branch merge/1 &&
+	git reset --hard commits/4 &&
+	git merge commits/6 &&
+	git branch merge/2 &&
+	GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph write --reachable --split=no-merge &&
+	test-tool read-graph >output &&
+	cat >expect <<-EOF &&
+	header: 43475048 1 1 4 1
+	num_commits: 2
+	chunks: oid_fanout oid_lookup commit_metadata
+	EOF
+	test_cmp expect output &&
+	git commit-graph verify
+'
+
+test_expect_success 'does not write generation data chunk if not present on existing tip' '
+	cd "$TRASH_DIRECTORY/mixed" &&
+	git reset --hard commits/3 &&
+	git merge merge/1 &&
+	git merge commits/5 &&
+	git merge merge/2 &&
+	git branch merge/3 &&
+	git commit-graph write --reachable --split=no-merge &&
+	test-tool read-graph >output &&
+	cat >expect <<-EOF &&
+	header: 43475048 1 1 4 2
+	num_commits: 3
+	chunks: oid_fanout oid_lookup commit_metadata
+	EOF
+	test_cmp expect output &&
+	git commit-graph verify
+'
+
+test_expect_success 'writes generation data chunk when commit-graph chain is replaced' '
+	cd "$TRASH_DIRECTORY/mixed" &&
+	git commit-graph write --reachable --split=replace &&
+	test_path_is_file $graphdir/commit-graph-chain &&
+	test_line_count = 1 $graphdir/commit-graph-chain &&
+	verify_chain_files_exist $graphdir &&
+	graph_read_expect 15 &&
+	git commit-graph verify
+'
+
 test_done
Derrick Stolee Aug. 14, 2020, 12:24 p.m. UTC | #3
On 8/14/2020 12:59 AM, Abhishek Kumar wrote:
> I solved the issue by adding a new member to struct commit_graph
> `read_generation_data` to maintain the "global" state of the entire
> commit-graph chain instead.
> 
> The relevant changes are in validate_mixed_generation_chain(),
> read_commit_graph_one() and fill_commit_graph_info().

I think this is a good way to go. Adding that restriction about
the tip commit-graph was short-sighted of me and was likely to
break in the future.

I think your solution here to store extra state from the entire
chain into each layer makes a lot of sense.

Thanks!
-Stolee
diff mbox series

Patch

diff --git a/commit-graph.c b/commit-graph.c
index 42f3ec5460..d0f977852b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -65,6 +65,8 @@  void git_test_write_commit_graph_or_die(void)
 /* Remember to update object flag allocation in object.h */
 #define REACHABLE       (1u<<15)
 
+define_commit_slab(topo_level_slab, uint32_t);
+
 /* Keep track of the order in which commits are added to our list. */
 define_commit_slab(commit_pos, int);
 static struct commit_pos commit_pos = COMMIT_SLAB_INIT(1, commit_pos);
@@ -168,11 +170,6 @@  static int commit_gen_cmp(const void *va, const void *vb)
 	else if (generation_a > generation_b)
 		return 1;
 
-	/* use date as a heuristic when generations are equal */
-	if (a->date < b->date)
-		return -1;
-	else if (a->date > b->date)
-		return 1;
 	return 0;
 }
 
@@ -767,7 +764,10 @@  static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
 	item->date = (timestamp_t)((date_high << 32) | date_low);
 
 	if (g->chunk_generation_data)
-		graph_data->generation = get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index);
+	{
+		graph_data->generation = item->date +
+			(timestamp_t) get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index);
+	}
 	else
 		graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 }
@@ -948,6 +948,7 @@  struct write_commit_graph_context {
 	struct progress *progress;
 	int progress_done;
 	uint64_t progress_cnt;
+	struct topo_level_slab *topo_levels;
 
 	char *base_graph_name;
 	int num_commit_graphs_before;
@@ -1106,7 +1107,7 @@  static int write_graph_chunk_data(struct hashfile *f,
 		else
 			packedDate[0] = 0;
 
-		packedDate[0] |= htonl(commit_graph_data_at(*list)->generation << 2);
+		packedDate[0] |= htonl(*topo_level_slab_at(ctx->topo_levels, *list) << 2);
 
 		packedDate[1] = htonl((*list)->date);
 		hashwrite(f, packedDate, 8);
@@ -1123,8 +1124,13 @@  static int write_graph_chunk_generation_data(struct hashfile *f,
 	int i;
 	for (i = 0; i < ctx->commits.nr; i++) {
 		struct commit *c = ctx->commits.list[i];
+		timestamp_t offset = commit_graph_data_at(c)->generation - c->date;
 		display_progress(ctx->progress, ++ctx->progress_cnt);
-		hashwrite_be32(f, commit_graph_data_at(c)->generation);
+
+		if (offset > GENERATION_NUMBER_V2_OFFSET_MAX)
+			offset = GENERATION_NUMBER_V2_OFFSET_MAX;
+
+		hashwrite_be32(f, offset);
 	}
 
 	return 0;
@@ -1360,11 +1366,11 @@  static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 					_("Computing commit graph generation numbers"),
 					ctx->commits.nr);
 	for (i = 0; i < ctx->commits.nr; i++) {
-		uint32_t generation = commit_graph_data_at(ctx->commits.list[i])->generation;
+		uint32_t topo_level = *topo_level_slab_at(ctx->topo_levels, ctx->commits.list[i]);
 
 		display_progress(ctx->progress, i + 1);
-		if (generation != GENERATION_NUMBER_V1_INFINITY &&
-		    generation != GENERATION_NUMBER_ZERO)
+		if (topo_level != GENERATION_NUMBER_V1_INFINITY &&
+		    topo_level != GENERATION_NUMBER_ZERO)
 			continue;
 
 		commit_list_insert(ctx->commits.list[i], &list);
@@ -1372,29 +1378,38 @@  static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 			struct commit *current = list->item;
 			struct commit_list *parent;
 			int all_parents_computed = 1;
-			uint32_t max_generation = 0;
+			uint32_t max_level = 0;
+			timestamp_t max_corrected_commit_date = current->date - 1;
 
 			for (parent = current->parents; parent; parent = parent->next) {
-				generation = commit_graph_data_at(parent->item)->generation;
+				topo_level = *topo_level_slab_at(ctx->topo_levels, parent->item);
 
-				if (generation == GENERATION_NUMBER_V1_INFINITY ||
-				    generation == GENERATION_NUMBER_ZERO) {
+				if (topo_level == GENERATION_NUMBER_V1_INFINITY ||
+				    topo_level == GENERATION_NUMBER_ZERO) {
 					all_parents_computed = 0;
 					commit_list_insert(parent->item, &list);
 					break;
-				} else if (generation > max_generation) {
-					max_generation = generation;
+				} else {
+					struct commit_graph_data *data = commit_graph_data_at(parent->item);
+
+					if (topo_level > max_level)
+						max_level = topo_level;
+
+					if (data->generation > max_corrected_commit_date)
+						max_corrected_commit_date = data->generation;
 				}
 			}
 
 			if (all_parents_computed) {
 				struct commit_graph_data *data = commit_graph_data_at(current);
 
-				data->generation = max_generation + 1;
-				pop_commit(&list);
+				if (max_level > GENERATION_NUMBER_MAX - 1)
+					max_level = GENERATION_NUMBER_MAX - 1;
+
+				*topo_level_slab_at(ctx->topo_levels, current) = max_level + 1;
+				data->generation = max_corrected_commit_date + 1;
 
-				if (data->generation > GENERATION_NUMBER_MAX)
-					data->generation = GENERATION_NUMBER_MAX;
+				pop_commit(&list);
 			}
 		}
 	}
@@ -2132,6 +2147,7 @@  int write_commit_graph(struct object_directory *odb,
 	uint32_t i, count_distinct = 0;
 	int res = 0;
 	int replace = 0;
+	struct topo_level_slab topo_levels;
 
 	if (!commit_graph_compatible(the_repository))
 		return 0;
@@ -2146,6 +2162,9 @@  int write_commit_graph(struct object_directory *odb,
 	ctx->total_bloom_filter_data_size = 0;
 	ctx->write_generation_data = !git_env_bool(GIT_TEST_COMMIT_GRAPH_NO_GDAT, 0);
 
+	init_topo_level_slab(&topo_levels);
+	ctx->topo_levels = &topo_levels;
+
 	if (flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS)
 		ctx->changed_paths = 1;
 	if (!(flags & COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS)) {
@@ -2387,8 +2406,8 @@  int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 	for (i = 0; i < g->num_commits; i++) {
 		struct commit *graph_commit, *odb_commit;
 		struct commit_list *graph_parents, *odb_parents;
-		timestamp_t max_generation = 0;
-		timestamp_t generation;
+		timestamp_t max_parent_corrected_commit_date = 0;
+		timestamp_t corrected_commit_date;
 
 		display_progress(progress, i + 1);
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
@@ -2427,9 +2446,9 @@  int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 					     oid_to_hex(&graph_parents->item->object.oid),
 					     oid_to_hex(&odb_parents->item->object.oid));
 
-			generation = commit_graph_generation(graph_parents->item);
-			if (generation > max_generation)
-				max_generation = generation;
+			corrected_commit_date = commit_graph_generation(graph_parents->item);
+			if (corrected_commit_date > max_parent_corrected_commit_date)
+				max_parent_corrected_commit_date = corrected_commit_date;
 
 			graph_parents = graph_parents->next;
 			odb_parents = odb_parents->next;
@@ -2451,20 +2470,12 @@  int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 		if (generation_zero == GENERATION_ZERO_EXISTS)
 			continue;
 
-		/*
-		 * If one of our parents has generation GENERATION_NUMBER_MAX, then
-		 * our generation is also GENERATION_NUMBER_MAX. Decrement to avoid
-		 * extra logic in the following condition.
-		 */
-		if (max_generation == GENERATION_NUMBER_MAX)
-			max_generation--;
-
-		generation = commit_graph_generation(graph_commit);
-		if (generation != max_generation + 1)
-			graph_report(_("commit-graph generation for commit %s is %u != %u"),
+		corrected_commit_date = commit_graph_generation(graph_commit);
+		if (corrected_commit_date < max_parent_corrected_commit_date + 1)
+			graph_report(_("commit-graph generation for commit %s is %"PRItime" < %"PRItime),
 				     oid_to_hex(&cur_oid),
-				     generation,
-				     max_generation + 1);
+				     corrected_commit_date,
+				     max_parent_corrected_commit_date + 1);
 
 		if (graph_commit->date != odb_commit->date)
 			graph_report(_("commit date for commit %s in commit-graph is %"PRItime" != %"PRItime),
diff --git a/commit.h b/commit.h
index bc0732a4fe..bb846e0025 100644
--- a/commit.h
+++ b/commit.h
@@ -15,6 +15,7 @@ 
 #define GENERATION_NUMBER_V1_INFINITY 0xFFFFFFFF
 #define GENERATION_NUMBER_MAX 0x3FFFFFFF
 #define GENERATION_NUMBER_ZERO 0
+#define GENERATION_NUMBER_V2_OFFSET_MAX 0xFFFFFFFF
 
 struct commit_list {
 	struct commit *item;