diff mbox series

[v5,1/1] commit-graph.c: no lazy fetch in lookup_commit_in_graph()

Message ID 3ffeed50deb2d292cef0a518085d60d22c5dd79b.1657604799.git.hanxin.hx@bytedance.com (mailing list archive)
State New, archived
Headers show
Series no lazy fetch in lookup_commit_in_graph() | expand

Commit Message

Han Xin July 12, 2022, 6:50 a.m. UTC
The commit-graph is used to opportunistically optimize accesses to
certain pieces of information on commit objects, and
lookup_commit_in_graph() tries to say "no" when the requested commit
does not locally exist by returning NULL, in which case the caller
can ask for (which may result in on-demand fetching from a promisor
remote) and parse the commit object itself.

However, it uses a wrong helper, repo_has_object_file(), to do so.
This helper not only checks if an object is mmediately available in
the local object store, but also tries to fetch from a promisor remote.
But the fetch machinery calls lookup_commit_in_graph(), thus causing an
infinite loop.

We should make lookup_commit_in_graph() expect that a commit given to it
can be legitimately missing from the local object store, by using the
has_object_file() helper instead.

Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
---
 commit-graph.c                             |  2 +-
 t/t5330-no-lazy-fetch-with-commit-graph.sh | 47 ++++++++++++++++++++++
 2 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100755 t/t5330-no-lazy-fetch-with-commit-graph.sh

Comments

Ævar Arnfjörð Bjarmason July 12, 2022, 9:50 a.m. UTC | #1
On Tue, Jul 12 2022, Han Xin wrote:

> The commit-graph is used to opportunistically optimize accesses to
> certain pieces of information on commit objects, and
> lookup_commit_in_graph() tries to say "no" when the requested commit
> does not locally exist by returning NULL, in which case the caller
> can ask for (which may result in on-demand fetching from a promisor
> remote) and parse the commit object itself.
>
> However, it uses a wrong helper, repo_has_object_file(), to do so.
> This helper not only checks if an object is mmediately available in
> the local object store, but also tries to fetch from a promisor remote.
> But the fetch machinery calls lookup_commit_in_graph(), thus causing an
> infinite loop.
>
> We should make lookup_commit_in_graph() expect that a commit given to it
> can be legitimately missing from the local object store, by using the
> has_object_file() helper instead.
>
> Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
> ---
>  commit-graph.c                             |  2 +-
>  t/t5330-no-lazy-fetch-with-commit-graph.sh | 47 ++++++++++++++++++++++
>  2 files changed, 48 insertions(+), 1 deletion(-)
>  create mode 100755 t/t5330-no-lazy-fetch-with-commit-graph.sh
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 92d4503336..2b04ef072d 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -898,7 +898,7 @@ struct commit *lookup_commit_in_graph(struct repository *repo, const struct obje
>  		return NULL;
>  	if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
>  		return NULL;
> -	if (!repo_has_object_file(repo, id))
> +	if (!has_object(repo, id, 0))
>  		return NULL;
>  
>  	commit = lookup_commit(repo, id);
> diff --git a/t/t5330-no-lazy-fetch-with-commit-graph.sh b/t/t5330-no-lazy-fetch-with-commit-graph.sh
> new file mode 100755
> index 0000000000..2cc7fd7a47
> --- /dev/null
> +++ b/t/t5330-no-lazy-fetch-with-commit-graph.sh
> @@ -0,0 +1,47 @@
> +#!/bin/sh
> +
> +test_description='test for no lazy fetch with the commit-graph'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'setup: prepare a repository with a commit' '
> +	git init with-commit &&
> +	test_commit -C with-commit the-commit &&
> +	oid=$(git -C with-commit rev-parse HEAD)
> +'
> +
> +test_expect_success 'setup: prepare a repository with commit-graph contains the commit' '
> +	git init with-commit-graph &&
> +	echo "$(pwd)/with-commit/.git/objects" \
> +		>with-commit-graph/.git/objects/info/alternates &&
> +	# create a ref that points to the commit in alternates
> +	git -C with-commit-graph update-ref refs/ref_to_the_commit "$oid" &&
> +	# prepare some other objects to commit-graph
> +	test_commit -C with-commit-graph something &&
> +	git -c gc.writeCommitGraph=true -C with-commit-graph gc &&
> +	test_path_is_file with-commit-graph/.git/objects/info/commit-graph
> +'
> +
> +test_expect_success 'setup: change the alternates to what without the commit' '
> +	git init --bare without-commit &&
> +	git -C with-commit-graph cat-file -e $oid &&
> +	echo "$(pwd)/without-commit/objects" \
> +		>with-commit-graph/.git/objects/info/alternates &&
> +	test_must_fail git -C with-commit-graph cat-file -e $oid
> +'
> +
> +test_expect_success 'fetch any commit from promisor with the usage of the commit graph' '
> +	# setup promisor and prepare any commit to fetch
> +	git -C with-commit-graph remote add origin "$(pwd)/with-commit" &&
> +	git -C with-commit-graph config remote.origin.promisor true &&
> +	git -C with-commit-graph config remote.origin.partialclonefilter blob:none &&
> +	test_commit -C with-commit any-commit &&
> +	anycommit=$(git -C with-commit rev-parse HEAD) &&
> +	GIT_TRACE="$(pwd)/trace.txt" \
> +		git -C with-commit-graph fetch origin $anycommit 2>err &&
> +	! grep "fatal: promisor-remote: unable to fork off fetch subprocess" err &&

This part seems quite odd, we tested the exit code, so here we're being
paranoid about not getting a specific "fatal" error message.

It seems more worthwhile to test the warnings we emit, which in this
case seem to be duplicated (but that's probably an existing issue...).


> +	grep "git fetch origin" trace.txt >actual &&
> +	test_line_count = 1 actual
> +'

I wondered if "test_subcomand" here would be better, i.e. fewer things
scraping GIT_TRACE, and using the machine-readable GIT_TRACE2_EVENT
instead...
Han Xin July 13, 2022, 1:26 a.m. UTC | #2
On Tue, Jul 12, 2022 at 5:52 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> > +test_expect_success 'fetch any commit from promisor with the usage of the commit graph' '
> > +     # setup promisor and prepare any commit to fetch
> > +     git -C with-commit-graph remote add origin "$(pwd)/with-commit" &&
> > +     git -C with-commit-graph config remote.origin.promisor true &&
> > +     git -C with-commit-graph config remote.origin.partialclonefilter blob:none &&
> > +     test_commit -C with-commit any-commit &&
> > +     anycommit=$(git -C with-commit rev-parse HEAD) &&
> > +     GIT_TRACE="$(pwd)/trace.txt" \
> > +             git -C with-commit-graph fetch origin $anycommit 2>err &&
> > +     ! grep "fatal: promisor-remote: unable to fork off fetch subprocess" err &&
>
> This part seems quite odd, we tested the exit code, so here we're being
> paranoid about not getting a specific "fatal" error message.
>
> It seems more worthwhile to test the warnings we emit, which in this
> case seem to be duplicated (but that's probably an existing issue...).
>

So you mean the grep here is redundant?

>
> > +     grep "git fetch origin" trace.txt >actual &&
> > +     test_line_count = 1 actual
> > +'
>
> I wondered if "test_subcomand" here would be better, i.e. fewer things
> scraping GIT_TRACE, and using the machine-readable GIT_TRACE2_EVENT
> instead...

I threw this question up front but got no response.

When using test_subcommand() we should give all the args,
if we remove or add any args later, this test case will always
pass even without this fix. So, is this test case still strict?

    GIT_TRACE2_EVENT="$(PWD)/trace.txt" \
        git -C with-commit-graph fetch origin $anycommit &&
    test_subcommand ! git -c fetch.negotiationAlgorithm=noop \
        fetch origin --no-tags --no-write-fetch-head \
        --recurse-submodules=no --filter=blob:none \
        --stdin <trace.txt

Existing usages are as follows, and they all have fewer parameters:

     test_subcommand ! git gc --quiet <run-config.txt &&

     test_subcommand ! git multi-pack-index write --no-progress <trace-A

     test_subcommand ! git pack-refs --all --prune \
          <incremental-daily.txt &&

Thanks
-Han Xin
diff mbox series

Patch

diff --git a/commit-graph.c b/commit-graph.c
index 92d4503336..2b04ef072d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -898,7 +898,7 @@  struct commit *lookup_commit_in_graph(struct repository *repo, const struct obje
 		return NULL;
 	if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
 		return NULL;
-	if (!repo_has_object_file(repo, id))
+	if (!has_object(repo, id, 0))
 		return NULL;
 
 	commit = lookup_commit(repo, id);
diff --git a/t/t5330-no-lazy-fetch-with-commit-graph.sh b/t/t5330-no-lazy-fetch-with-commit-graph.sh
new file mode 100755
index 0000000000..2cc7fd7a47
--- /dev/null
+++ b/t/t5330-no-lazy-fetch-with-commit-graph.sh
@@ -0,0 +1,47 @@ 
+#!/bin/sh
+
+test_description='test for no lazy fetch with the commit-graph'
+
+. ./test-lib.sh
+
+test_expect_success 'setup: prepare a repository with a commit' '
+	git init with-commit &&
+	test_commit -C with-commit the-commit &&
+	oid=$(git -C with-commit rev-parse HEAD)
+'
+
+test_expect_success 'setup: prepare a repository with commit-graph contains the commit' '
+	git init with-commit-graph &&
+	echo "$(pwd)/with-commit/.git/objects" \
+		>with-commit-graph/.git/objects/info/alternates &&
+	# create a ref that points to the commit in alternates
+	git -C with-commit-graph update-ref refs/ref_to_the_commit "$oid" &&
+	# prepare some other objects to commit-graph
+	test_commit -C with-commit-graph something &&
+	git -c gc.writeCommitGraph=true -C with-commit-graph gc &&
+	test_path_is_file with-commit-graph/.git/objects/info/commit-graph
+'
+
+test_expect_success 'setup: change the alternates to what without the commit' '
+	git init --bare without-commit &&
+	git -C with-commit-graph cat-file -e $oid &&
+	echo "$(pwd)/without-commit/objects" \
+		>with-commit-graph/.git/objects/info/alternates &&
+	test_must_fail git -C with-commit-graph cat-file -e $oid
+'
+
+test_expect_success 'fetch any commit from promisor with the usage of the commit graph' '
+	# setup promisor and prepare any commit to fetch
+	git -C with-commit-graph remote add origin "$(pwd)/with-commit" &&
+	git -C with-commit-graph config remote.origin.promisor true &&
+	git -C with-commit-graph config remote.origin.partialclonefilter blob:none &&
+	test_commit -C with-commit any-commit &&
+	anycommit=$(git -C with-commit rev-parse HEAD) &&
+	GIT_TRACE="$(pwd)/trace.txt" \
+		git -C with-commit-graph fetch origin $anycommit 2>err &&
+	! grep "fatal: promisor-remote: unable to fork off fetch subprocess" err &&
+	grep "git fetch origin" trace.txt >actual &&
+	test_line_count = 1 actual
+'
+
+test_done