diff mbox series

[v3,2/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph()

Message ID 3cdb1abd43779844b8e8dc094e2fd2da1adc461a.1656381667.git.hanxin.hx@bytedance.com (mailing list archive)
State Superseded
Headers show
Series no lazy fetch in lookup_commit_in_graph() | expand

Commit Message

Han Xin June 28, 2022, 2:02 a.m. UTC
The commit-graph is used to opportunistically optimize accesses to
certain pieces of information on commit objects, and
lookup_commit_in_graph() tries to say "no" when the requested commit
does not locally exist by returning NULL, in which case the caller
can ask for (which may result in on-demand fetching from a promisor
remote) and parse the commit object itself.

However, it uses a wrong helper, repo_has_object_file(), to do so.
This helper not only checks if an object is mmediately available in
the local object store, but also tries to fetch from a promisor remote.
But the fetch machinery calls lookup_commit_in_graph(), thus causing an
infinite loop.

We should make lookup_commit_in_graph() expect that a commit given to it
can be legitimately missing from the local object store, by using the
has_object_file() helper instead.

Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
---
 commit-graph.c                             |  2 +-
 t/t5329-no-lazy-fetch-with-commit-graph.sh | 53 ++++++++++++++++++++++
 2 files changed, 54 insertions(+), 1 deletion(-)
 create mode 100755 t/t5329-no-lazy-fetch-with-commit-graph.sh

Comments

Ævar Arnfjörð Bjarmason June 28, 2022, 7:49 a.m. UTC | #1
On Tue, Jun 28 2022, Han Xin wrote:

> The commit-graph is used to opportunistically optimize accesses to
> certain pieces of information on commit objects, and
> lookup_commit_in_graph() tries to say "no" when the requested commit
> does not locally exist by returning NULL, in which case the caller
> can ask for (which may result in on-demand fetching from a promisor
> remote) and parse the commit object itself.
>
> However, it uses a wrong helper, repo_has_object_file(), to do so.
> This helper not only checks if an object is mmediately available in
> the local object store, but also tries to fetch from a promisor remote.
> But the fetch machinery calls lookup_commit_in_graph(), thus causing an
> infinite loop.
>
> We should make lookup_commit_in_graph() expect that a commit given to it
> can be legitimately missing from the local object store, by using the
> has_object_file() helper instead.
>
> Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
> ---
>  commit-graph.c                             |  2 +-
>  t/t5329-no-lazy-fetch-with-commit-graph.sh | 53 ++++++++++++++++++++++
>  2 files changed, 54 insertions(+), 1 deletion(-)
>  create mode 100755 t/t5329-no-lazy-fetch-with-commit-graph.sh
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 2b52818731..2dd9bcc7ea 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -907,7 +907,7 @@ struct commit *lookup_commit_in_graph(struct repository *repo, const struct obje
>  		return NULL;
>  	if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
>  		return NULL;
> -	if (!repo_has_object_file(repo, id))
> +	if (!has_object(repo, id, 0))
>  		return NULL;
>  
>  	commit = lookup_commit(repo, id);
> diff --git a/t/t5329-no-lazy-fetch-with-commit-graph.sh b/t/t5329-no-lazy-fetch-with-commit-graph.sh
> new file mode 100755
> index 0000000000..d7877a5758
> --- /dev/null
> +++ b/t/t5329-no-lazy-fetch-with-commit-graph.sh
> @@ -0,0 +1,53 @@
> +#!/bin/sh
> +
> +test_description='test for no lazy fetch with the commit-graph'
> +
> +. ./test-lib.sh
> +
> +if ! test_have_prereq ULIMIT_PROCESSES

I think the prereq in 1/2 would be better off squashed into this commit.

> +then
> +	skip_all='skipping tests for no lazy fetch with the commit-graph, ulimit processes not available'
> +	test_done
> +fi
> +
> +test_expect_success 'setup: prepare a repository with a commit' '
> +	git init with-commit &&
> +	test_commit -C with-commit the-commit &&
> +	oid=$(git -C with-commit rev-parse HEAD)
> +'
> +
> +test_expect_success 'setup: prepare a repository with commit-graph contains the commit' '
> +	git init with-commit-graph &&
> +	echo "$(pwd)/with-commit/.git/objects" \

nit: you can use $PWD instead of $(pwd).

> +		>with-commit-graph/.git/objects/info/alternates &&
> +	# create a ref that points to the commit in alternates
> +	git -C with-commit-graph update-ref refs/ref_to_the_commit "$oid" &&
> +	# prepare some other objects to commit-graph
> +	test_commit -C with-commit-graph something &&
> +	git -c gc.writeCommitGraph=true -C with-commit-graph gc &&
> +	test_path_is_file with-commit-graph/.git/objects/info/commit-graph
> +'
> +
> +test_expect_success 'setup: change the alternates to what without the commit' '
> +	git init --bare without-commit &&

Maybe run a successful cat-file here...
> +	echo "$(pwd)/without-commit/objects" \
> +		>with-commit-graph/.git/objects/info/alternates &&
> +	test_must_fail git -C with-commit-graph cat-file -e $oid
...to show that it fails after the "echo" above?
> +'
> +
> +test_expect_success 'setup: prepare any commit to fetch' '
> +	test_commit -C with-commit any-commit &&
> +	anycommit=$(git -C with-commit rev-parse HEAD)

I think this would be better just added before a \n\n in the next test.
> +'
> +
> +test_expect_success 'fetch any commit from promisor with the usage of the commit graph' '
> +	git -C with-commit-graph remote add origin "$(pwd)/with-commit" &&
> +	git -C with-commit-graph config remote.origin.promisor true &&
> +	git -C with-commit-graph config remote.origin.partialclonefilter blob:none &&
> +	run_with_limited_processses env GIT_TRACE="$(pwd)/trace" \
> +		git -C with-commit-graph fetch origin $anycommit 2>err &&


> +	test_i18ngrep ! "fatal: promisor-remote: unable to fork off fetch subprocess" err &&
> +	test $(grep "fetch origin" trace | wc -l) -eq 1


Use "grep", not "test_i18ngrep", and this should use "test_line_count".

But actually better yet: this whole thing looks like it could use
"test_subcommand" instead, couldn't it?
Junio C Hamano June 28, 2022, 5:36 p.m. UTC | #2
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> +test_description='test for no lazy fetch with the commit-graph'
>> +
>> +. ./test-lib.sh
>> +
>> +if ! test_have_prereq ULIMIT_PROCESSES
>
> I think the prereq in 1/2 would be better off squashed into this commit.

Good thinking.  It also may make sense to implement it in this file,
without touching test-lib.sh at all.

>> +test_expect_success 'setup: prepare a repository with commit-graph contains the commit' '
>> +	git init with-commit-graph &&
>> +	echo "$(pwd)/with-commit/.git/objects" \
>> +		>with-commit-graph/.git/objects/info/alternates &&
>
> nit: you can use $PWD instead of $(pwd).

We can, and it would not make any difference on non-Windows.  

But which one should we use to cater to Windows?  $(pwd) is a full
path in Windows notation "C:\Program Files\Git\..." while $PWD is
MSYS style "/C/Program Files/Git/..." or something like that, IIRC?
Han Xin June 29, 2022, 2:08 a.m. UTC | #3
On Tue, Jun 28, 2022 at 3:53 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> > +     test_i18ngrep ! "fatal: promisor-remote: unable to fork off fetch subprocess" err &&
> > +     test $(grep "fetch origin" trace | wc -l) -eq 1
>
>
> Use "grep", not "test_i18ngrep", and this should use "test_line_count".
>
> But actually better yet: this whole thing looks like it could use
> "test_subcommand" instead, couldn't it?

When using test_subcommand() we should give all the args,
if we remove or add any args later, this test case will always
pass even without this fix. So, is this test case still strict?

    run_with_limited_processses env GIT_TRACE2_EVENT="$(PWD)/trace.txt" \
        git -C with-commit-graph fetch origin $anycommit &&
    test_subcommand ! git -c fetch.negotiationAlgorithm=noop \
        fetch origin --no-tags --no-write-fetch-head \
        --recurse-submodules=no --filter=blob:none \
        --stdin <trace.txt
Johannes Schindelin June 30, 2022, 12:21 p.m. UTC | #4
Hi Junio,

On Tue, 28 Jun 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
> >> +test_expect_success 'setup: prepare a repository with commit-graph contains the commit' '
> >> +	git init with-commit-graph &&
> >> +	echo "$(pwd)/with-commit/.git/objects" \
> >> +		>with-commit-graph/.git/objects/info/alternates &&
> >
> > nit: you can use $PWD instead of $(pwd).
>
> We can, and it would not make any difference on non-Windows.
>
> But which one should we use to cater to Windows?  $(pwd) is a full
> path in Windows notation "C:\Program Files\Git\..." while $PWD is
> MSYS style "/C/Program Files/Git/..." or something like that, IIRC?

Indeed, and since the `alternates` file is supposed to be read by
`git.exe`, a non-MSYS program, the original was good, and the nit
suggested the incorrect form.

Thank you for catching this before it was worsimproved,
Dscho
Ævar Arnfjörð Bjarmason June 30, 2022, 1:43 p.m. UTC | #5
On Thu, Jun 30 2022, Johannes Schindelin wrote:

> Hi Junio,
>
> On Tue, 28 Jun 2022, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>
>> >> +test_expect_success 'setup: prepare a repository with commit-graph contains the commit' '
>> >> +	git init with-commit-graph &&
>> >> +	echo "$(pwd)/with-commit/.git/objects" \
>> >> +		>with-commit-graph/.git/objects/info/alternates &&
>> >
>> > nit: you can use $PWD instead of $(pwd).
>>
>> We can, and it would not make any difference on non-Windows.
>>
>> But which one should we use to cater to Windows?  $(pwd) is a full
>> path in Windows notation "C:\Program Files\Git\..." while $PWD is
>> MSYS style "/C/Program Files/Git/..." or something like that, IIRC?
>
> Indeed, and since the `alternates` file is supposed to be read by
> `git.exe`, a non-MSYS program, the original was good, and the nit
> suggested the incorrect form.

I looked at t5615-alternate-env.sh which does the equivalent of:

	GIT_ALTERNATE_OBJECT_DIRECTORIES="$PWD/one.git/objects:$PWD/two.git/objects" \
        	git cat-file [...]

We run that test on all our platforms, does the $PWD form work in the
environment variable, but not when we write it to the "alternates" file?
Or is there some other subtlety there that I'm missing?
Junio C Hamano June 30, 2022, 3:40 p.m. UTC | #6
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Thu, Jun 30 2022, Johannes Schindelin wrote:
>
>> Hi Junio,
>>
>> On Tue, 28 Jun 2022, Junio C Hamano wrote:
>>
>>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>>
>>> >> +test_expect_success 'setup: prepare a repository with commit-graph contains the commit' '
>>> >> +	git init with-commit-graph &&
>>> >> +	echo "$(pwd)/with-commit/.git/objects" \
>>> >> +		>with-commit-graph/.git/objects/info/alternates &&
>>> >
>>> > nit: you can use $PWD instead of $(pwd).
>>>
>>> We can, and it would not make any difference on non-Windows.
>>>
>>> But which one should we use to cater to Windows?  $(pwd) is a full
>>> path in Windows notation "C:\Program Files\Git\..." while $PWD is
>>> MSYS style "/C/Program Files/Git/..." or something like that, IIRC?
>>
>> Indeed, and since the `alternates` file is supposed to be read by
>> `git.exe`, a non-MSYS program, the original was good, and the nit
>> suggested the incorrect form.
>
> I looked at t5615-alternate-env.sh which does the equivalent of:
>
> 	GIT_ALTERNATE_OBJECT_DIRECTORIES="$PWD/one.git/objects:$PWD/two.git/objects" \
>         	git cat-file [...]
>
> We run that test on all our platforms, does the $PWD form work in the
> environment variable, but not when we write it to the "alternates" file?
> Or is there some other subtlety there that I'm missing?

I am also curious to see a clear and concise explanation so that we
do not have to repeat this discussion later.  We have

 - When a test checks for an absolute path that a git command generated,
   construct the expected value using $(pwd) rather than $PWD,
   $TEST_DIRECTORY, or $TRASH_DIRECTORY. It makes a difference on
   Windows, where the shell (MSYS bash) mangles absolute path names.
   For details, see the commit message of 4114156ae9.

in t/README, but even with the log mesasge of 4114156a (Tests on
Windows: $(pwd) must return Windows-style paths, 2009-03-13) [*1*],
I have no idea what makes the thing you found in t5615 work and your
suggestion to use $PWD in the new one not work.

GIT_ALTERNATE_OBJECT_DIRECTORIES is a PATH_SEP (not necessarily a
colon) separated list, and I think the way t5615 uses it is broken
on Windows where PATH_SEP is defined as semicolon without the $PWD
vs $(pwd) issue.  Is the test checking the right thing?


[Footnote]

*1*

    Tests on Windows: $(pwd) must return Windows-style paths

    Many tests pass $(pwd) in some form to git and later test that the output
    of git contains the correct value of $(pwd). For example, the test of
    'git remote show' sets up a remote that contains $(pwd) and then the
    expected result must contain $(pwd).

    Again, MSYS-bash's path mangling kicks in: Plain $(pwd) uses the MSYS style
    absolute path /c/path/to/git. The test case would write this name into
    the 'expect' file. But when git is invoked, MSYS-bash converts this name to
    the Windows style path c:/path/to/git, and git would produce this form in
    the result; the test would fail.

    We fix this by passing -W to bash's pwd that produces the Windows-style
    path.

    There are a two cases that need an accompanying change:

    - In t1504 the value of $(pwd) becomes part of a path list. In this case,
      the lone 'c' in something like /foo:c:/path/to/git:/bar inhibits
      MSYS-bashes path mangling; IOW in this case we want the /c/path/to/git
      form to allow path mangling. We use $PWD instead of $(pwd), which always
      has the latter form.

    - In t6200, $(pwd) - the Windows style path - must be used to construct the
      expected result because that is the path form that git sees. (The change
      in the test itself is just for consistency: 'git fetch' always sees the
      Windows-style path, with or without the change.)

    Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Ævar Arnfjörð Bjarmason June 30, 2022, 6:47 p.m. UTC | #7
On Thu, Jun 30 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Thu, Jun 30 2022, Johannes Schindelin wrote:
>>
>>> Hi Junio,
>>>
>>> On Tue, 28 Jun 2022, Junio C Hamano wrote:
>>>
>>>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>>>
>>>> >> +test_expect_success 'setup: prepare a repository with commit-graph contains the commit' '
>>>> >> +	git init with-commit-graph &&
>>>> >> +	echo "$(pwd)/with-commit/.git/objects" \
>>>> >> +		>with-commit-graph/.git/objects/info/alternates &&
>>>> >
>>>> > nit: you can use $PWD instead of $(pwd).
>>>>
>>>> We can, and it would not make any difference on non-Windows.
>>>>
>>>> But which one should we use to cater to Windows?  $(pwd) is a full
>>>> path in Windows notation "C:\Program Files\Git\..." while $PWD is
>>>> MSYS style "/C/Program Files/Git/..." or something like that, IIRC?
>>>
>>> Indeed, and since the `alternates` file is supposed to be read by
>>> `git.exe`, a non-MSYS program, the original was good, and the nit
>>> suggested the incorrect form.
>>
>> I looked at t5615-alternate-env.sh which does the equivalent of:
>>
>> 	GIT_ALTERNATE_OBJECT_DIRECTORIES="$PWD/one.git/objects:$PWD/two.git/objects" \
>>         	git cat-file [...]
>>
>> We run that test on all our platforms, does the $PWD form work in the
>> environment variable, but not when we write it to the "alternates" file?
>> Or is there some other subtlety there that I'm missing?
>
> I am also curious to see a clear and concise explanation so that we
> do not have to repeat this discussion later.  We have
>
>  - When a test checks for an absolute path that a git command generated,
>    construct the expected value using $(pwd) rather than $PWD,
>    $TEST_DIRECTORY, or $TRASH_DIRECTORY. It makes a difference on
>    Windows, where the shell (MSYS bash) mangles absolute path names.
>    For details, see the commit message of 4114156ae9.
>
> in t/README, but even with the log mesasge of 4114156a (Tests on
> Windows: $(pwd) must return Windows-style paths, 2009-03-13) [*1*],
> I have no idea what makes the thing you found in t5615 work and your
> suggestion to use $PWD in the new one not work.
>
> GIT_ALTERNATE_OBJECT_DIRECTORIES is a PATH_SEP (not necessarily a
> colon) separated list, and I think the way t5615 uses it is broken
> on Windows where PATH_SEP is defined as semicolon without the $PWD
> vs $(pwd) issue.  Is the test checking the right thing?

Whatever th explanation is CI passed with a $(pwd) -> $PWD repacement in
the test being introduced here:
https://github.com/avar/git/runs/7136686130?check_suite_focus=true

Diff here:
https://github.com/avar/git/commit/606d0060a57b7021396919044c7696489b7835cd

So either $PWD is fine there, or our Windows CI doesn't reflect this
particular caveat on some Windows systems, or the test is erroneously
passing with an invalid value. Knowing which of those it is would be
very useful...
Johannes Schindelin July 1, 2022, 7:31 p.m. UTC | #8
Hi Junio,

On Thu, 30 Jun 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
> > On Thu, Jun 30 2022, Johannes Schindelin wrote:
> >
> >> On Tue, 28 Jun 2022, Junio C Hamano wrote:
> >>
> >>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> >>>
> >>> >> +test_expect_success 'setup: prepare a repository with commit-graph contains the commit' '
> >>> >> +	git init with-commit-graph &&
> >>> >> +	echo "$(pwd)/with-commit/.git/objects" \
> >>> >> +		>with-commit-graph/.git/objects/info/alternates &&
> >>> >
> >>> > nit: you can use $PWD instead of $(pwd).
> >>>
> >>> We can, and it would not make any difference on non-Windows.
> >>>
> >>> But which one should we use to cater to Windows?  $(pwd) is a full
> >>> path in Windows notation "C:\Program Files\Git\..." while $PWD is
> >>> MSYS style "/C/Program Files/Git/..." or something like that, IIRC?
> >>
> >> Indeed, and since the `alternates` file is supposed to be read by
> >> `git.exe`, a non-MSYS program, the original was good, and the nit
> >> suggested the incorrect form.
> >
> > I looked at t5615-alternate-env.sh which does the equivalent of:
> >
> > 	GIT_ALTERNATE_OBJECT_DIRECTORIES="$PWD/one.git/objects:$PWD/two.git/objects" \
> >         	git cat-file [...]
> >
> > We run that test on all our platforms, does the $PWD form work in the
> > environment variable, but not when we write it to the "alternates" file?
> > Or is there some other subtlety there that I'm missing?
>
> I am also curious to see a clear and concise explanation so that we
> do not have to repeat this discussion later.

Unfortunately, I do not see any way to explain this concisely: MSYS2 does
hard-to-explain things here, in the hopes to Do The Right Thing (most of
the time, anyways).

Whenever you call a non-MSYS program from an MSYS program (and remember,
an MSYS program is a program that uses the MSYS2 runtime that acts as a
POSIX emulation layer), "magic" things are done. In our context,
`bash.exe` is an MSYS program, and the non-MSYS program that is called is
`git.exe`.

So what are those "magic" things? The command-line arguments and the
environment variables are auto-converted: everything that looks like a
Unix-style path (or path list, like the `PATH` environment variable) is
converted to a Windows-style path or path list (on Windows, the colon
cannot be the separator in `PATH`, therefore the semicolon is used).

And this is where it gets _really_ tricky to explain what is going on:
what _does_ look like a Unix-style path? The exact rules are convoluted
and hard to explain, but they work _most of the time_. For example,
`/usr/bin:/hello` is converted to `C:\Program Files\Git\usr\bin;C:\Program
Files\Git\hello` or something like that. But `kernel.org:/home/gitster` is
not, because it looks more like an SSH path. Similarly, `C:/Program Files`
is interpreted as a Windows-style path, even if it could technically be a
Unix-style path list.

Now, if you call `git.exe -C /blabla <command>`, it works, because
`git.exe` is a non-MSYS program, therefore that `/blabla` is converted to
a Windows-style path before executing `git.exe`. However, when you write a
file via `echo /blabla >file`, that `echo` is either the Bash built-in, or
it is an MSYS program, and no argument conversion takes place. If you
_then_ ask `git.exe` to read and interpret the file as a path, it won't
know what to do with that Unix-style path.

You can substitute `$PWD` for `/blabla` in all of this, and it will hold
true just the same.

So what makes `pwd` special?

Well, `pwd.exe` itself is an MSYS program, so it would still report a path
that `git.exe` cannot understand. But in Git's test suite, we specifically
override `pwd` to be a shell function that calls `pwd.exe -W`, which does
output Windows-style paths.

The thing that makes that `GIT_*=$PWD git ...` call work is that the
environment is automagically converted because `git` is a non-MSYS
program. The thing that makes `echo $PWD >.git/objects/info/alternates`
not work is that `echo` _is_ an MSYS program (or Bash built-in, which is
the same thing here, for all practical purposes), so it writes the path
verbatim into that file, but then we expect `git.exe` to read this file
and interpret it as a list of paths.

Hopefully that clarifies the scenario a bit, even if it is far from a
concise explanation (I did edit this mail multiple times for clarity and
brevity, though, as I do with pretty much all of my mails).

Ciao,
Dscho

> We have
>
>  - When a test checks for an absolute path that a git command generated,
>    construct the expected value using $(pwd) rather than $PWD,
>    $TEST_DIRECTORY, or $TRASH_DIRECTORY. It makes a difference on
>    Windows, where the shell (MSYS bash) mangles absolute path names.
>    For details, see the commit message of 4114156ae9.
>
> in t/README, but even with the log mesasge of 4114156a (Tests on
> Windows: $(pwd) must return Windows-style paths, 2009-03-13) [*1*],
> I have no idea what makes the thing you found in t5615 work and your
> suggestion to use $PWD in the new one not work.
>
> GIT_ALTERNATE_OBJECT_DIRECTORIES is a PATH_SEP (not necessarily a
> colon) separated list, and I think the way t5615 uses it is broken
> on Windows where PATH_SEP is defined as semicolon without the $PWD
> vs $(pwd) issue.  Is the test checking the right thing?
>
>
> [Footnote]
>
> *1*
>
>     Tests on Windows: $(pwd) must return Windows-style paths
>
>     Many tests pass $(pwd) in some form to git and later test that the output
>     of git contains the correct value of $(pwd). For example, the test of
>     'git remote show' sets up a remote that contains $(pwd) and then the
>     expected result must contain $(pwd).
>
>     Again, MSYS-bash's path mangling kicks in: Plain $(pwd) uses the MSYS style
>     absolute path /c/path/to/git. The test case would write this name into
>     the 'expect' file. But when git is invoked, MSYS-bash converts this name to
>     the Windows style path c:/path/to/git, and git would produce this form in
>     the result; the test would fail.
>
>     We fix this by passing -W to bash's pwd that produces the Windows-style
>     path.
>
>     There are a two cases that need an accompanying change:
>
>     - In t1504 the value of $(pwd) becomes part of a path list. In this case,
>       the lone 'c' in something like /foo:c:/path/to/git:/bar inhibits
>       MSYS-bashes path mangling; IOW in this case we want the /c/path/to/git
>       form to allow path mangling. We use $PWD instead of $(pwd), which always
>       has the latter form.
>
>     - In t6200, $(pwd) - the Windows style path - must be used to construct the
>       expected result because that is the path form that git sees. (The change
>       in the test itself is just for consistency: 'git fetch' always sees the
>       Windows-style path, with or without the change.)
>
>     Signed-off-by: Johannes Sixt <j6t@kdbg.org>
>
>
Junio C Hamano July 1, 2022, 8:47 p.m. UTC | #9
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Whenever you call a non-MSYS program from an MSYS program (and remember,
> an MSYS program is a program that uses the MSYS2 runtime that acts as a
> POSIX emulation layer), "magic" things are done. In our context,
> `bash.exe` is an MSYS program, and the non-MSYS program that is called is
> `git.exe`.
>
> So what are those "magic" things? The command-line arguments and the
> environment variables are auto-converted: everything that looks like a
> Unix-style path (or path list, like the `PATH` environment variable) is
> converted to a Windows-style path or path list (on Windows, the colon
> cannot be the separator in `PATH`, therefore the semicolon is used).
>
> And this is where it gets _really_ tricky to explain what is going on:
> what _does_ look like a Unix-style path? The exact rules are convoluted
> and hard to explain, but they work _most of the time_. For example,
> `/usr/bin:/hello` is converted to `C:\Program Files\Git\usr\bin;C:\Program
> Files\Git\hello` or something like that. But `kernel.org:/home/gitster` is
> not, because it looks more like an SSH path. Similarly, `C:/Program Files`
> is interpreted as a Windows-style path, even if it could technically be a
> Unix-style path list.
>
> Now, if you call `git.exe -C /blabla <command>`, it works, because
> `git.exe` is a non-MSYS program, therefore that `/blabla` is converted to
> a Windows-style path before executing `git.exe`. However, when you write a
> file via `echo /blabla >file`, that `echo` is either the Bash built-in, or
> it is an MSYS program, and no argument conversion takes place. If you
> _then_ ask `git.exe` to read and interpret the file as a path, it won't
> know what to do with that Unix-style path.
>
> You can substitute `$PWD` for `/blabla` in all of this, and it will hold
> true just the same.
>
> So what makes `pwd` special?
>
> Well, `pwd.exe` itself is an MSYS program, so it would still report a path
> that `git.exe` cannot understand. But in Git's test suite, we specifically
> override `pwd` to be a shell function that calls `pwd.exe -W`, which does
> output Windows-style paths.
>
> The thing that makes that `GIT_*=$PWD git ...` call work is that the
> environment is automagically converted because `git` is a non-MSYS
> program. The thing that makes `echo $PWD >.git/objects/info/alternates`
> not work is that `echo` _is_ an MSYS program (or Bash built-in, which is
> the same thing here, for all practical purposes), so it writes the path
> verbatim into that file, but then we expect `git.exe` to read this file
> and interpret it as a list of paths.

----- 8< --------- 8< --------- 8< --------- 8< --------- 8< -----

> Hopefully that clarifies the scenario a bit, even if it is far from a
> concise explanation (I did edit this mail multiple times for clarity and
> brevity, though, as I do with pretty much all of my mails).

Certainly it does help.  Thanks.

I wonder if it makes sense to keep a copy of the bulk of your
response in t/ somewhere, and refer to it from t/README, to help
fellow non-Windows developers to avoid breaking tests on Windows
without knowing.
diff mbox series

Patch

diff --git a/commit-graph.c b/commit-graph.c
index 2b52818731..2dd9bcc7ea 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -907,7 +907,7 @@  struct commit *lookup_commit_in_graph(struct repository *repo, const struct obje
 		return NULL;
 	if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
 		return NULL;
-	if (!repo_has_object_file(repo, id))
+	if (!has_object(repo, id, 0))
 		return NULL;
 
 	commit = lookup_commit(repo, id);
diff --git a/t/t5329-no-lazy-fetch-with-commit-graph.sh b/t/t5329-no-lazy-fetch-with-commit-graph.sh
new file mode 100755
index 0000000000..d7877a5758
--- /dev/null
+++ b/t/t5329-no-lazy-fetch-with-commit-graph.sh
@@ -0,0 +1,53 @@ 
+#!/bin/sh
+
+test_description='test for no lazy fetch with the commit-graph'
+
+. ./test-lib.sh
+
+if ! test_have_prereq ULIMIT_PROCESSES
+then
+	skip_all='skipping tests for no lazy fetch with the commit-graph, ulimit processes not available'
+	test_done
+fi
+
+test_expect_success 'setup: prepare a repository with a commit' '
+	git init with-commit &&
+	test_commit -C with-commit the-commit &&
+	oid=$(git -C with-commit rev-parse HEAD)
+'
+
+test_expect_success 'setup: prepare a repository with commit-graph contains the commit' '
+	git init with-commit-graph &&
+	echo "$(pwd)/with-commit/.git/objects" \
+		>with-commit-graph/.git/objects/info/alternates &&
+	# create a ref that points to the commit in alternates
+	git -C with-commit-graph update-ref refs/ref_to_the_commit "$oid" &&
+	# prepare some other objects to commit-graph
+	test_commit -C with-commit-graph something &&
+	git -c gc.writeCommitGraph=true -C with-commit-graph gc &&
+	test_path_is_file with-commit-graph/.git/objects/info/commit-graph
+'
+
+test_expect_success 'setup: change the alternates to what without the commit' '
+	git init --bare without-commit &&
+	echo "$(pwd)/without-commit/objects" \
+		>with-commit-graph/.git/objects/info/alternates &&
+	test_must_fail git -C with-commit-graph cat-file -e $oid
+'
+
+test_expect_success 'setup: prepare any commit to fetch' '
+	test_commit -C with-commit any-commit &&
+	anycommit=$(git -C with-commit rev-parse HEAD)
+'
+
+test_expect_success 'fetch any commit from promisor with the usage of the commit graph' '
+	git -C with-commit-graph remote add origin "$(pwd)/with-commit" &&
+	git -C with-commit-graph config remote.origin.promisor true &&
+	git -C with-commit-graph config remote.origin.partialclonefilter blob:none &&
+	run_with_limited_processses env GIT_TRACE="$(pwd)/trace" \
+		git -C with-commit-graph fetch origin $anycommit 2>err &&
+	test_i18ngrep ! "fatal: promisor-remote: unable to fork off fetch subprocess" err &&
+	test $(grep "fetch origin" trace | wc -l) -eq 1
+'
+
+test_done