diff mbox series

[1/3] commit-graph: let commit graph respect commit graft

Message ID 19fd72c34dcd1332df638d76b0b028e9d9da3d41.1662025272.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series list-object-filter: introduce depth filter | expand

Commit Message

ZheNing Hu Sept. 1, 2022, 9:41 a.m. UTC
From: ZheNing Hu <adlternative@gmail.com>

In repo_parse_commit_internal(), if we want to use
commit graph, it will call parse_commit_in_graph() to
parse commit's content from commit graph, otherwise
call repo_read_object_file() to parse commit's content
from commit object.

repo_read_object_file() will respect commit graft,
which can correctly amend commit's parents. But
parse_commit_in_graph() not. Inconsistencies here may
result in incorrect processing of shallow clone.

So let parse_commit_in_graph() respect commit graft as
repo_read_object_file() does, which can solve this problem.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 commit-graph.c | 36 ++++++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)

Comments

Derrick Stolee Sept. 1, 2022, 7:18 p.m. UTC | #1
On 9/1/2022 5:41 AM, ZheNing Hu via GitGitGadget wrote:
> From: ZheNing Hu <adlternative@gmail.com>
> 
> In repo_parse_commit_internal(), if we want to use
> commit graph, it will call parse_commit_in_graph() to
> parse commit's content from commit graph, otherwise
> call repo_read_object_file() to parse commit's content
> from commit object.
> 
> repo_read_object_file() will respect commit graft,
> which can correctly amend commit's parents. But
> parse_commit_in_graph() not. Inconsistencies here may
> result in incorrect processing of shallow clone.
> 
> So let parse_commit_in_graph() respect commit graft as
> repo_read_object_file() does, which can solve this problem.

If grafts or replace-objects exist, then the commit-graph
is disabled and this code will never be called. I would
expect a test case demonstrating the change in behavior
here, but that is impossible.

The commit-graph parsing should not be bogged down with
this logic.

Thanks,
-Stolee
ZheNing Hu Sept. 4, 2022, 5:57 a.m. UTC | #2
Derrick Stolee <derrickstolee@github.com> 于2022年9月2日周五 03:18写道:
>
> On 9/1/2022 5:41 AM, ZheNing Hu via GitGitGadget wrote:
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > In repo_parse_commit_internal(), if we want to use
> > commit graph, it will call parse_commit_in_graph() to
> > parse commit's content from commit graph, otherwise
> > call repo_read_object_file() to parse commit's content
> > from commit object.
> >
> > repo_read_object_file() will respect commit graft,
> > which can correctly amend commit's parents. But
> > parse_commit_in_graph() not. Inconsistencies here may
> > result in incorrect processing of shallow clone.
> >
> > So let parse_commit_in_graph() respect commit graft as
> > repo_read_object_file() does, which can solve this problem.
>
> If grafts or replace-objects exist, then the commit-graph
> is disabled and this code will never be called. I would
> expect a test case demonstrating the change in behavior
> here, but that is impossible.
>

Thanks for the clarification.
I don't really know what's the wrong here, but just let do a little test:

1. Revert this commit 19fd72c34dcd1332df638d76b0b028e9d9da3d41
$ git revert 19fd72

2. Clone the git repo
$ git clone --bare git@github.com:git/git.git

3. Write commit graph
$ git commit-graph write

4. Use the depth=<depth> to clone (depth=1)
$  git clone --no-checkout --no-local --=depth=1 git.git git1
Cloning into 'git1'...
remote: Enumerating objects: 4306, done.
remote: Counting objects: 100% (4306/4306), done.
remote: Compressing objects: 100% (3785/3785), done.

4.  Use the depth=<depth> to clone (depth=2)
$  git clone --no-checkout --no-local --=depth=2 git.git git2
Cloning into 'git2'...
remote: Enumerating objects: 4311, done.
remote: Counting objects: 100% (4311/4311), done.
remote: Compressing objects: 100% (3788/3788), done.

5. Use the depth filter to clone (depth=1)
$  git clone --no-checkout --no-local --filter=depth:1 git.git git3
Cloning into 'git3'...
remote: Enumerating objects: 4306, done.
remote: Counting objects: 100% (4306/4306), done.
remote: Compressing objects: 100% (3785/3785), done.

6. Use the depth filter to clone (depth=2)
$  git clone --no-checkout --no-local --filter=depth:2 git.git git4
Cloning into 'git4'...
remote: Enumerating objects: 322987, done.
remote: Counting objects: 100% (322987/322987), done.
remote: Compressing objects: 100% (77441/77441), done.

As we can see, when we use --filter=depth:<depth> (depth >= 2),
it seems like we clone a lot of objects. The result is significantly
different from git clone --depth=<depth> (depth >= 2).

So I debug it by reproducing the git pack-objects process:

I find there are different action between --filter=depth:<depth> and
 --depth=<depth> .

--filter=depth:<depth> will be successfully resolved commit parents in
parse_commit_in_graph(),

Call stack( cmd_pack_objects -> get_object_list -> traverse_commit_list ->
traverse_commit_list_filtered -> do_traverse -> get_revision ->
get_revision_internal -> get_revision_1 -> process_parents ->
repo_parse_commit_gently -> repo_parse_commit_internal ->
parse_commit_in_graph)

--depth=<depth> will failed in parse_commit_in_graph(), and call
repo_read_object_file() to resolved commit parents.

Call stack( cmd_pack_objects -> get_object_list -> traverse_commit_list ->
traverse_commit_list_filtered -> do_traverse -> get_revision ->
get_revision_internal -> get_revision_1 -> process_parents ->
repo_parse_commit_gently -> repo_parse_commit_internal ->
repo_read_object_file)

> The commit-graph parsing should not be bogged down with
> this logic.
>

So I try to fix this problem by let commit-graph respect commit-graft.
I don't know if I overlook something before...

> Thanks,
> -Stolee
>

Thanks,
ZheNing Hu
diff mbox series

Patch

diff --git a/commit-graph.c b/commit-graph.c
index f2a36032f84..89bb6f87079 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -820,6 +820,7 @@  static int fill_commit_in_graph(struct repository *r,
 	struct commit_list **pptr;
 	const unsigned char *commit_data;
 	uint32_t lex_index;
+	struct commit_graft *graft;
 
 	while (pos < g->num_commits_in_base)
 		g = g->base_graph;
@@ -833,31 +834,54 @@  static int fill_commit_in_graph(struct repository *r,
 
 	set_commit_tree(item, NULL);
 
+	graft = lookup_commit_graft(r, &item->object.oid);
+	if (graft)
+		r->parsed_objects->substituted_parent = 1;
+
 	pptr = &item->parents;
 
 	edge_value = get_be32(commit_data + g->hash_len);
 	if (edge_value == GRAPH_PARENT_NONE)
 		return 1;
-	pptr = insert_parent_or_die(r, g, edge_value, pptr);
+	if (!(graft && (graft->nr_parent < 0 || grafts_replace_parents)))
+		pptr = insert_parent_or_die(r, g, edge_value, pptr);
 
 	edge_value = get_be32(commit_data + g->hash_len + 4);
 	if (edge_value == GRAPH_PARENT_NONE)
 		return 1;
 	if (!(edge_value & GRAPH_EXTRA_EDGES_NEEDED)) {
-		pptr = insert_parent_or_die(r, g, edge_value, pptr);
-		return 1;
+		if (!(graft && (graft->nr_parent < 0 || grafts_replace_parents))) {
+			pptr = insert_parent_or_die(r, g, edge_value, pptr);
+			return 1;
+		}
 	}
 
 	parent_data_ptr = (uint32_t*)(g->chunk_extra_edges +
 			  4 * (uint64_t)(edge_value & GRAPH_EDGE_LAST_MASK));
 	do {
 		edge_value = get_be32(parent_data_ptr);
-		pptr = insert_parent_or_die(r, g,
-					    edge_value & GRAPH_EDGE_LAST_MASK,
-					    pptr);
+		if (!(graft && (graft->nr_parent < 0 || grafts_replace_parents))) {
+			pptr = insert_parent_or_die(r, g,
+						    edge_value & GRAPH_EDGE_LAST_MASK,
+						    pptr);
+		}
 		parent_data_ptr++;
 	} while (!(edge_value & GRAPH_LAST_EDGE));
 
+	if (graft) {
+		int i;
+		struct commit *new_parent;
+		for (i = 0; i < graft->nr_parent; i++) {
+			new_parent = lookup_commit(r,
+						   &graft->parent[i]);
+			if (!new_parent)
+				die(_("bad graft parent %s in commit %s"),
+				       oid_to_hex(&graft->parent[i]),
+				       oid_to_hex(&item->object.oid));
+			pptr = &commit_list_insert(new_parent, pptr)->next;
+		}
+	}
+
 	return 1;
 }