diff mbox series

[v2] submodules: fix of regression on fetching of non-init subsub-repo

Message ID 1607348819-61355-1-git-send-email-peter.kaestle@nokia.com (mailing list archive)
State Superseded
Headers show
Series [v2] submodules: fix of regression on fetching of non-init subsub-repo | expand

Commit Message

Peter Kaestle Dec. 7, 2020, 1:46 p.m. UTC
A regression has been introduced by a62387b (submodule.c: fetch in
submodules git directory instead of in worktree, 2018-11-28).

The scenario in which it triggers is when one has a remote repository
with a subrepository inside a subrepository like this:
superproject/middle_repo/inner_repo

Person A and B have both a clone of it, while Person B is not working
with the inner_repo and thus does not have it initialized in his working
copy.

Now person A introduces a change to the inner_repo and propagates it
through the middle_repo and the superproject.

Once person A pushed the changes and person B wants to fetch them using
"git fetch" on superproject level, B's git call will return with error
saying:

Could not access submodule 'inner_repo'
Errors during submodule fetch:
         middle_repo

Expectation is that in this case the inner submodule will be recognized
as uninitialized subrepository and skipped by the git fetch command.

This used to work correctly before 'a62387b (submodule.c: fetch in
submodules git directory instead of in worktree, 2018-11-28)'.

Starting with a62387b the code wants to evaluate "is_empty_dir()" inside
.git/modules for a directory only existing in the worktree, delivering
then of course wrong return value.

This patch ensures is_empty_dir() is getting the correct path of the
uninitialized submodule by concatenation of the actual worktree and the
name of the uninitialized submodule.

Furthermore a regression test case is added, which tests for recursive
fetches on a superproject with uninitialized sub repositories.  This
issue was leading to an infinite loop when doing a revert of a62387b.

Signed-off-by: Peter Kaestle <peter.kaestle@nokia.com>
CC: Junio C Hamano <gitster@pobox.com>
CC: Philippe Blain <levraiphilippeblain@gmail.com>
CC: Ralf Thielow <ralf.thielow@gmail.com>
CC: Eric Sunshine <sunshine@sunshineco.com>
---
 submodule.c                 |   7 ++-
 t/t5526-fetch-submodules.sh | 104 ++++++++++++++++++++++++++++++++++++
 2 files changed, 110 insertions(+), 1 deletion(-)

Comments

Philippe Blain Dec. 7, 2020, 6:42 p.m. UTC | #1
Hi Peter,

> Le 7 déc. 2020 à 08:46, Peter Kaestle <peter.kaestle@nokia.com> a écrit :
> 
> A regression has been introduced by a62387b (submodule.c: fetch in
> submodules git directory instead of in worktree, 2018-11-28).
> 
> The scenario in which it triggers is when one has a remote repository
> with a subrepository inside a subrepository like this:
> superproject/middle_repo/inner_repo

The correct terminology is "submodule", not "subrepository". 

Also, (minor point) I would just write "when one has a repository", 
as its simpler (the repository by itself is not "remote", it is only "remote" 
in relation the repositories that are cloned from it).

> Person A and B have both a clone of it, while Person B is not working
> with the inner_repo and thus does not have it initialized in his working
> copy.
> 
> Now person A introduces a change to the inner_repo and propagates it
> through the middle_repo and the superproject.
> 
> Once person A pushed the changes and person B wants to fetch them using
> "git fetch" on superproject level,

s/on/at the/

> B's git call will return with error
> saying:
> 
> Could not access submodule 'inner_repo'
> Errors during submodule fetch:
>         middle_repo
> 
> Expectation is that in this case the inner submodule will be recognized
> as uninitialized subrepository and skipped by the git fetch command.

here again, terminology: "as an uninitialized submodule" 

> This used to work correctly before 'a62387b (submodule.c: fetch in
> submodules git directory instead of in worktree, 2018-11-28)'.
> 
> Starting with a62387b the code wants to evaluate "is_empty_dir()" inside
> .git/modules for a directory only existing in the worktree, delivering
> then of course wrong return value.
> 
> This patch ensures is_empty_dir() is getting the correct path of the
> uninitialized submodule by concatenation of the actual worktree and the
> name of the uninitialized submodule.
> 
> Furthermore a regression test case is added, which tests for recursive
> fetches on a superproject with uninitialized sub repositories.
>  This
> issue was leading to an infinite loop when doing a revert of a62387b.

I would maybe add more details here, something like the following 
(we can cite your previous attempt, because it was merged to 'master'):

The first attempt to fix this regression, in 1b7ac4e6d4 (submodules: 
fix of regression on fetching of non-init subsub-repo, 2020-11-12), by simply
reverting a62387b, resulted in
an infinite loop of submodule fetches in the simpler case of a recursive fetch of a superproject with
uninitialized submodules, and so this commit was reverted in 7091499bc0 (Revert 
"submodules: fix of regression on fetching of non-init subsub-repo", 2020-12-02).
To prevent future breakages, also add a regression test for this scenario.

> 
> Signed-off-by: Peter Kaestle <peter.kaestle@nokia.com>
> CC: Junio C Hamano <gitster@pobox.com>
> CC: Philippe Blain <levraiphilippeblain@gmail.com>
> CC: Ralf Thielow <ralf.thielow@gmail.com>
> CC: Eric Sunshine <sunshine@sunshineco.com>
> ---
> submodule.c                 |   7 ++-
> t/t5526-fetch-submodules.sh | 104 ++++++++++++++++++++++++++++++++++++
> 2 files changed, 110 insertions(+), 1 deletion(-)
> 
> diff --git a/submodule.c b/submodule.c
> index b3bb59f066..b561445329 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1477,6 +1477,7 @@ static int get_next_submodule(struct child_process *cp,
> 			strbuf_release(&submodule_prefix);
> 			return 1;
> 		} else {
> +			struct strbuf empty_submodule_path = STRBUF_INIT;
> 
> 			fetch_task_release(task);
> 			free(task);
> @@ -1485,13 +1486,17 @@ static int get_next_submodule(struct child_process *cp,
> 			 * An empty directory is normal,
> 			 * the submodule is not initialized
> 			 */
> +			strbuf_addf(&empty_submodule_path, "%s/%s/",
> +							spf->r->worktree,
> +							ce->name);
> 			if (S_ISGITLINK(ce->ce_mode) &&
> -			    !is_empty_dir(ce->name)) {
> +			    !is_empty_dir(empty_submodule_path.buf)) {
> 				spf->result = 1;
> 				strbuf_addf(err,
> 					    _("Could not access submodule '%s'\n"),
> 					    ce->name);
> 			}
> +			strbuf_release(&empty_submodule_path);
> 		}
> 	}


Maybe a personal preference, but I would have gone for something a little simpler, like the following:


diff --git a/submodule.c b/submodule.c
index b3bb59f066..4200865174 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1486,7 +1486,7 @@ static int get_next_submodule(struct child_process *cp,
                         * the submodule is not initialized
                         */
                        if (S_ISGITLINK(ce->ce_mode) &&
-                           !is_empty_dir(ce->name)) {
+                           !is_empty_dir(repo_worktree_path(spf->r, "%s", ce->name))) {
                                spf->result = 1;
                                strbuf_addf(err,
                                            _("Could not access submodule '%s'\n"),

> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index dd8e423d25..666dd1e2b7 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -719,4 +719,108 @@ test_expect_success 'fetch new submodule commit intermittently referenced by sup
> 	)
> '
> 
> +add_commit_push () {
> +	dir="$1" &&
> +	msg="$2" &&
> +	shift 2 &&
> +	git -C "$dir" add "$@" &&
> +	git -C "$dir" commit -a -m "$msg" &&
> +	git -C "$dir" push
> +}
> +
> +compare_refs_in_dir () {
> +	fail= &&
> +	if test "x$1" = 'x!'
> +	then
> +		fail='!' &&
> +		shift
> +	fi &&
> +	git -C "$1" rev-parse --verify "$2" >expect &&
> +	git -C "$3" rev-parse --verify "$4" >actual &&
> +	eval $fail test_cmp expect actual
> +}
> +
> +
> +test_expect_success 'setup nested submodule fetch test' '
> +	# does not depend on any previous test setups
> +
> +	for repo in outer middle inner
> +	do
> +		git init --bare $repo &&
> +		git clone $repo ${repo}_content &&
> +		echo "$repo" >"${repo}_content/file" &&
> +		add_commit_push ${repo}_content "initial" file ||
> +		return 1
> +	done &&
> +
> +	git clone outer A &&
> +	git -C A submodule add "$pwd/middle" &&
> +	git -C A/middle/ submodule add "$pwd/inner" &&
> +	add_commit_push A/middle/ "adding inner sub" .gitmodules inner &&
> +	add_commit_push A/ "adding middle sub" .gitmodules middle &&
> +
> +	git clone outer B &&
> +	git -C B/ submodule update --init middle &&
> +
> +	compare_refs_in_dir A HEAD B HEAD &&
> +	compare_refs_in_dir A/middle HEAD B/middle HEAD &&
> +	test_path_is_file B/file &&
> +	test_path_is_file B/middle/file &&
> +	test_path_is_missing B/middle/inner/file &&
> +
> +	echo "change on inner repo of A" >"A/middle/inner/file" &&
> +	add_commit_push A/middle/inner "change on inner" file &&
> +	add_commit_push A/middle "change on inner" inner &&
> +	add_commit_push A "change on inner" middle
> +'
> +
> +test_expect_success 'fetching a superproject containing an uninitialized sub/sub project' '
> +	# depends on previous test for setup
> +
> +	git -C B/ fetch &&
> +	compare_refs_in_dir A origin/master B origin/master
> +'
> +
> +
> +test_expect_success 'setup recursive fetch with uninit submodule' '
> +	# does not depend on any previous test setups
> +
> +	git init main &&
> +	git init sub &&
> +
> +	>sub/file &&
> +	git -C sub add file &&
> +	git -C sub commit -m "add file" &&
> +	git -C sub rev-parse HEAD >expect &&
> +
> +	git -C main submodule add ../sub &&
> +	git -C main submodule init &&
> +	git -C main submodule update --checkout &&

These two steps are unnecessary as they are implicitly done by 'git submodule add'.
I think we could reflect real life a little bit more by cloning the superproject, and running
the 'recursive fetch with uninit submodule' test below in the clone.

> +	git -C main submodule status >out &&
> +	sed -e "s/^ //" -e "s/ sub .*$//" out >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'recursive fetch with uninit submodule' '
> +	# depends on previous test for setup
> +
> +	git -C main submodule deinit -f sub &&

Here you are deiniting the submodule, such that 
the Git directory will stay in .git/modules/sub. This is not the same thing
as a submodule that was never initialized ("uninitialized"), for which .git/modules/sub
will not yet exist. So maybe we could harden the tests by also testing
for that scenario ? I don't know... maybe the infinite loop only happens
if .git/modules/sub actually already exists. If so, the test name should be
"recursive fetch with deinitialized submodule", I think.

> +
> +	# In a regression the following git call will run into infinite recursion.
> +	# To handle that, we connect the grep command to the git call by a pipe
> +	# so that grep can kill the infinite recusion when detected.
> +	# The recursion creates git output like:
> +	# Fetching submodule sub
> +	# Fetching submodule sub/sub              <-- [1]
> +	# Fetching submodule sub/sub/sub
> +	# ...
> +	# [1] grep will trigger here and kill git by exiting and closing its stdin
> +
> +	! git -C main fetch --recurse-submodules 2>&1 |
> +		grep -v -m1 "Fetching submodule sub$" &&
> +	git -C main submodule status >out &&
> +	sed -e "s/^-//" -e "s/ sub$//" out >actual &&
> +	test_cmp expect actual
> +'
> +
> test_done


Thanks for working on that, and sorry for not having the time to comment before
you sent v2.

Cheers,

Philippe.
Junio C Hamano Dec. 7, 2020, 7:22 p.m. UTC | #2
Peter Kaestle <peter.kaestle@nokia.com> writes:

> +add_commit_push () {
> +	dir="$1" &&
> +	msg="$2" &&
> +	shift 2 &&
> +	git -C "$dir" add "$@" &&
> +	git -C "$dir" commit -a -m "$msg" &&
> +	git -C "$dir" push
> +}
> +
> +compare_refs_in_dir () {
> +	fail= &&
> +	if test "x$1" = 'x!'
> +	then
> +		fail='!' &&
> +		shift
> +	fi &&
> +	git -C "$1" rev-parse --verify "$2" >expect &&
> +	git -C "$3" rev-parse --verify "$4" >actual &&
> +	eval $fail test_cmp expect actual
> +}



> +test_expect_success 'fetching a superproject containing an uninitialized sub/sub project' '
> +	# depends on previous test for setup
> +
> +	git -C B/ fetch &&
> +	compare_refs_in_dir A origin/master B origin/master

Can we do this without relying on the name of the default branch?
Perhaps when outer, middle and inner are prepared, they can be
forced to be on the 'sample' (not 'master' nor 'main') branch, or
something like that?

> +test_expect_success 'setup recursive fetch with uninit submodule' '
> +	# does not depend on any previous test setups
> +
> +	git init main &&
> +	git init sub &&

"super vs sub" would give us a better contrast than "main vs sub",
and it would help reduce mistakes in the mechanical conversion of
"master" to "main" happening in another topic.

> +	# In a regression the following git call will run into infinite recursion.
> +	# To handle that, we connect the grep command to the git call by a pipe
> +	# so that grep can kill the infinite recusion when detected.
> +	# The recursion creates git output like:
> +	# Fetching submodule sub
> +	# Fetching submodule sub/sub              <-- [1]
> +	# Fetching submodule sub/sub/sub
> +	# ...
> +	# [1] grep will trigger here and kill git by exiting and closing its stdin

"trigger here and kill..." -> "stop reading and cause git to
eventually stop and die"

But we probably cannot use 'grep -m1' so it is a moot point.

> +
> +	! git -C main fetch --recurse-submodules 2>&1 |
> +		grep -v -m1 "Fetching submodule sub$" &&

Unfortunately, "grep -m<count>" is not even in POSIX, I would think.

What do we expect to happen in the correct case?

 - A line "Fetching submodule sub" and nothing else is given?  That
   feels a bit brittle (how are we making sure, in the presence of
   "2>&1", that we will not get any other output, like progress?)

 - "sub" is the only thing that appears on lines that begin with
   "Fetching submodule" (i.e. "Fetching submodule $something" where
   $something is not 'sub' is an error), and we allow other garbage
   in the output?  That would be a bit more robust than the above.

As you seem to be comfortable using "sed" below, perhaps use it to
extract the first few lines that say "^Fetching submodule " from the
output and stop, and check that the output has only one such line
about 'sub' and nothing else?

> +	git -C main submodule status >out &&
> +	sed -e "s/^-//" -e "s/ sub$//" out >actual &&
> +	test_cmp expect actual
Junio C Hamano Dec. 7, 2020, 7:43 p.m. UTC | #3
Philippe Blain <levraiphilippeblain@gmail.com> writes:

> Maybe a personal preference, but I would have gone for something a
> little simpler, like the following:
>
> diff --git a/submodule.c b/submodule.c
> index b3bb59f066..4200865174 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1486,7 +1486,7 @@ static int get_next_submodule(struct child_process *cp,
>                          * the submodule is not initialized
>                          */
>                         if (S_ISGITLINK(ce->ce_mode) &&
> -                           !is_empty_dir(ce->name)) {
> +                           !is_empty_dir(repo_worktree_path(spf->r, "%s", ce->name))) {

But then you leak the return value from repo_worktree_path(), no?

>> +test_expect_success 'recursive fetch with uninit submodule' '
>> +	# depends on previous test for setup
>> +
>> +	git -C main submodule deinit -f sub &&
>
> Here you are deiniting the submodule, such that 
> the Git directory will stay in .git/modules/sub. This is not the same thing
> as a submodule that was never initialized ("uninitialized"), for which .git/modules/sub
> will not yet exist. So maybe we could harden the tests by also testing
> for that scenario ? I don't know... maybe the infinite loop only happens
> if .git/modules/sub actually already exists. If so, the test name should be
> "recursive fetch with deinitialized submodule", I think.

Even if the original breakage happens only for deinitialized case,
it would be sensible to test uninitialized one as well, I would
think.

> Thanks for working on that, and sorry for not having the time to comment before
> you sent v2.

Thanks, all.  

It's not like we corral everybody to a single place on a single day
to work on a single thing.  Reviews ov v2 by reviewers who have not
seen v1 is totally expected and very much appreciated.
Junio C Hamano Dec. 7, 2020, 7:56 p.m. UTC | #4
Philippe Blain <levraiphilippeblain@gmail.com> writes:

> I would maybe add more details here, something like the following 
> (we can cite your previous attempt, because it was merged to 'master'):
>
> The first attempt to fix this regression, in 1b7ac4e6d4
> (submodules: fix of regression on fetching of non-init
> subsub-repo, 2020-11-12), by simply reverting a62387b, resulted in
> an infinite loop of submodule fetches in the simpler case of a
> recursive fetch of a superproject with uninitialized submodules,
> and so this commit was reverted in 7091499bc0 (Revert "submodules:
> fix of regression on fetching of non-init subsub-repo",
> 2020-12-02).  To prevent future breakages, also add a regression
> test for this scenario.

Forgot to mention in my other response, but I do find this a very
sensible addition.

Thanks.
Philippe Blain Dec. 7, 2020, 8:44 p.m. UTC | #5
Hi Junio,

> Le 7 déc. 2020 à 14:22, Junio C Hamano <gitster@pobox.com> a écrit :
> 
> Peter Kaestle <peter.kaestle@nokia.com> writes:
> 
>> +add_commit_push () {
>> +	dir="$1" &&
>> +	msg="$2" &&
>> +	shift 2 &&
>> +	git -C "$dir" add "$@" &&
>> +	git -C "$dir" commit -a -m "$msg" &&
>> +	git -C "$dir" push
>> +}
>> +
>> +compare_refs_in_dir () {
>> +	fail= &&
>> +	if test "x$1" = 'x!'
>> +	then
>> +		fail='!' &&
>> +		shift
>> +	fi &&
>> +	git -C "$1" rev-parse --verify "$2" >expect &&
>> +	git -C "$3" rev-parse --verify "$4" >actual &&
>> +	eval $fail test_cmp expect actual
>> +}
> 
> 
> 
>> +test_expect_success 'fetching a superproject containing an uninitialized sub/sub project' '
>> +	# depends on previous test for setup
>> +
>> +	git -C B/ fetch &&
>> +	compare_refs_in_dir A origin/master B origin/master
> 
> Can we do this without relying on the name of the default branch?
> Perhaps when outer, middle and inner are prepared, they can be
> forced to be on the 'sample' (not 'master' nor 'main') branch, or
> something like that?

Or, simpler, we could call "git remote set-head -a' 
in A and B in the setup script, which would make
origin/HEAD in A and B point to the default branch, 
such that the call here could be :

compare_refs_in_dir A origin/HEAD B origin/HEAD

Philippe.
Junio C Hamano Dec. 7, 2020, 9:02 p.m. UTC | #6
Philippe Blain <levraiphilippeblain@gmail.com> writes:

>>> +test_expect_success 'fetching a superproject containing an uninitialized sub/sub project' '
>>> +	# depends on previous test for setup
>>> +
>>> +	git -C B/ fetch &&
>>> +	compare_refs_in_dir A origin/master B origin/master
>> 
>> Can we do this without relying on the name of the default branch?
>> Perhaps when outer, middle and inner are prepared, they can be
>> forced to be on the 'sample' (not 'master' nor 'main') branch, or
>> something like that?
>
> Or, simpler, we could call "git remote set-head -a' 
> in A and B in the setup script, which would make
> origin/HEAD in A and B point to the default branch, 
> such that the call here could be :

The set-up prepares A and B by cloning from elsewhere, no?  Should
we even need a set-head call?

> compare_refs_in_dir A origin/HEAD B origin/HEAD

Yes, using HEAD would be another simple way to avoid having to rely
on the default behaviour.

THanks.
Junio C Hamano Dec. 7, 2020, 9:10 p.m. UTC | #7
Junio C Hamano <gitster@pobox.com> writes:

> Philippe Blain <levraiphilippeblain@gmail.com> writes:
>
>>>> +test_expect_success 'fetching a superproject containing an uninitialized sub/sub project' '
>>>> +	# depends on previous test for setup
>>>> +
>>>> +	git -C B/ fetch &&
>>>> +	compare_refs_in_dir A origin/master B origin/master
>>> 
>>> Can we do this without relying on the name of the default branch?
>>> Perhaps when outer, middle and inner are prepared, they can be
>>> forced to be on the 'sample' (not 'master' nor 'main') branch, or
>>> something like that?
>>
>> Or, simpler, we could call "git remote set-head -a' 
>> in A and B in the setup script, which would make
>> origin/HEAD in A and B point to the default branch, 
>> such that the call here could be :
>
> The set-up prepares A and B by cloning from elsewhere, no?  Should
> we even need a set-head call?

Ah, they are created by cloning an empty repository.  That explains
why.  Thanks.

>> compare_refs_in_dir A origin/HEAD B origin/HEAD
>
> Yes, using HEAD would be another simple way to avoid having to rely
> on the default behaviour.
>
> THanks.
Peter Kaestle Dec. 8, 2020, 8:46 a.m. UTC | #8
On 07.12.20 20:43, Junio C Hamano wrote:
> Philippe Blain <levraiphilippeblain@gmail.com> writes:
>> Thanks for working on that, and sorry for not having the time to comment before
>> you sent v2.
> 
> Thanks, all.
> 
> It's not like we corral everybody to a single place on a single day
> to work on a single thing.  Reviews ov v2 by reviewers who have not
> seen v1 is totally expected and very much appreciated.
> 

Thank you very much for all your comments, they're much appreciated.  I 
prefer spending more time on it now for getting it right, than to take 
another revert-rethink-refactor round.
It will take me some time today to work on them.
Peter Kaestle Dec. 8, 2020, 2:06 p.m. UTC | #9
On 07.12.20 19:42, Philippe Blain wrote:
> Hi Peter,
> 
>> Le 7 déc. 2020 à 08:46, Peter Kaestle <peter.kaestle@nokia.com> a écrit :
>>
>> A regression has been introduced by a62387b (submodule.c: fetch in
>> submodules git directory instead of in worktree, 2018-11-28).
>>
>> The scenario in which it triggers is when one has a remote repository
>> with a subrepository inside a subrepository like this:
>> superproject/middle_repo/inner_repo
> 
> The correct terminology is "submodule", not "subrepository".
> 
> Also, (minor point) I would just write "when one has a repository",
> as its simpler (the repository by itself is not "remote", it is only "remote"
> in relation the repositories that are cloned from it).

ok.



>> Person A and B have both a clone of it, while Person B is not working
>> with the inner_repo and thus does not have it initialized in his working
>> copy.
>>
>> Now person A introduces a change to the inner_repo and propagates it
>> through the middle_repo and the superproject.
>>
>> Once person A pushed the changes and person B wants to fetch them using
>> "git fetch" on superproject level,
> 
> s/on/at the/

ok.


>> B's git call will return with error
>> saying:
>>
>> Could not access submodule 'inner_repo'
>> Errors during submodule fetch:
>>          middle_repo
>>
>> Expectation is that in this case the inner submodule will be recognized
>> as uninitialized subrepository and skipped by the git fetch command.
> 
> here again, terminology: "as an uninitialized submodule"

ok.


>> This used to work correctly before 'a62387b (submodule.c: fetch in
>> submodules git directory instead of in worktree, 2018-11-28)'.
>>
>> Starting with a62387b the code wants to evaluate "is_empty_dir()" inside
>> .git/modules for a directory only existing in the worktree, delivering
>> then of course wrong return value.
>>
>> This patch ensures is_empty_dir() is getting the correct path of the
>> uninitialized submodule by concatenation of the actual worktree and the
>> name of the uninitialized submodule.
>>
>> Furthermore a regression test case is added, which tests for recursive
>> fetches on a superproject with uninitialized sub repositories.
>>   This
>> issue was leading to an infinite loop when doing a revert of a62387b.
> 
> I would maybe add more details here, something like the following
> (we can cite your previous attempt, because it was merged to 'master'):
> 
> The first attempt to fix this regression, in 1b7ac4e6d4 (submodules:
> fix of regression on fetching of non-init subsub-repo, 2020-11-12), by simply
> reverting a62387b, resulted in
> an infinite loop of submodule fetches in the simpler case of a recursive fetch of a superproject with
> uninitialized submodules, and so this commit was reverted in 7091499bc0 (Revert
> "submodules: fix of regression on fetching of non-init subsub-repo", 2020-12-02).
> To prevent future breakages, also add a regression test for this scenario.

Jip, I like that.


>>
>> Signed-off-by: Peter Kaestle <peter.kaestle@nokia.com>
>> CC: Junio C Hamano <gitster@pobox.com>
>> CC: Philippe Blain <levraiphilippeblain@gmail.com>
>> CC: Ralf Thielow <ralf.thielow@gmail.com>
>> CC: Eric Sunshine <sunshine@sunshineco.com>
>> ---
>> submodule.c                 |   7 ++-
>> t/t5526-fetch-submodules.sh | 104 ++++++++++++++++++++++++++++++++++++
>> 2 files changed, 110 insertions(+), 1 deletion(-)
>>
>> diff --git a/submodule.c b/submodule.c
>> index b3bb59f066..b561445329 100644
>> --- a/submodule.c
>> +++ b/submodule.c
>> @@ -1477,6 +1477,7 @@ static int get_next_submodule(struct child_process *cp,
>> 			strbuf_release(&submodule_prefix);
>> 			return 1;
>> 		} else {
>> +			struct strbuf empty_submodule_path = STRBUF_INIT;
>>
>> 			fetch_task_release(task);
>> 			free(task);
>> @@ -1485,13 +1486,17 @@ static int get_next_submodule(struct child_process *cp,
>> 			 * An empty directory is normal,
>> 			 * the submodule is not initialized
>> 			 */
>> +			strbuf_addf(&empty_submodule_path, "%s/%s/",
>> +							spf->r->worktree,
>> +							ce->name);
>> 			if (S_ISGITLINK(ce->ce_mode) &&
>> -			    !is_empty_dir(ce->name)) {
>> +			    !is_empty_dir(empty_submodule_path.buf)) {
>> 				spf->result = 1;
>> 				strbuf_addf(err,
>> 					    _("Could not access submodule '%s'\n"),
>> 					    ce->name);
>> 			}
>> +			strbuf_release(&empty_submodule_path);
>> 		}
>> 	}
> 
> 
> Maybe a personal preference, but I would have gone for something a little simpler, like the following:
> 
> 
> diff --git a/submodule.c b/submodule.c
> index b3bb59f066..4200865174 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1486,7 +1486,7 @@ static int get_next_submodule(struct child_process *cp,
>                           * the submodule is not initialized
>                           */
>                          if (S_ISGITLINK(ce->ce_mode) &&
> -                           !is_empty_dir(ce->name)) {
> +                           !is_empty_dir(repo_worktree_path(spf->r, "%s", ce->name))) {

I'm not deep enough into the git code to judge which approach is the 
better one.  From my perspective, being a foreigner to the git code, I 
like my proposed code more, as for me it's much easier to understand 
what's happening by having a meaningful variable name and without being 
forced to dig into outer functions first.
Also Junio C Hamano <gitster@pobox.com> is having some concerns, which I 
can't judge:

> But then you leak the return value from repo_worktree_path(), no?

Thus for v3 I'll stick to my proposal and when you'll review it, please 
discuss with each other whether I should go for a v4 using 
repo_worktree_path().

[...]

>> +
>> +test_expect_success 'setup recursive fetch with uninit submodule' '
>> +	# does not depend on any previous test setups
>> +
>> +	git init main &&
>> +	git init sub &&
>> +
>> +	>sub/file &&
>> +	git -C sub add file &&
>> +	git -C sub commit -m "add file" &&
>> +	git -C sub rev-parse HEAD >expect &&
>> +
>> +	git -C main submodule add ../sub &&
>> +	git -C main submodule init &&
>> +	git -C main submodule update --checkout &&
> 
> These two steps are unnecessary as they are implicitly done by 'git submodule add'.
> I think we could reflect real life a little bit more by cloning the superproject, and running
> the 'recursive fetch with uninit submodule' test below in the clone.

Yes, you're right, "...init" and "...update..." can be removed.


>> +	git -C main submodule status >out &&
>> +	sed -e "s/^ //" -e "s/ sub .*$//" out >actual &&
>> +	test_cmp expect actual
>> +'
>> +
>> +test_expect_success 'recursive fetch with uninit submodule' '
>> +	# depends on previous test for setup
>> +
>> +	git -C main submodule deinit -f sub &&
> 
> Here you are deiniting the submodule, such that
> the Git directory will stay in .git/modules/sub. This is not the same thing
> as a submodule that was never initialized ("uninitialized"), for which .git/modules/sub
> will not yet exist. So maybe we could harden the tests by also testing
> for that scenario ? I don't know... maybe the infinite loop only happens
> if .git/modules/sub actually already exists. If so, the test name should be
> "recursive fetch with deinitialized submodule", I think.

I added another test case for v3, which checks for this in case of never 
initialized submodule.  When executing the test, I can see that the 
infinite loop regression only occurs after doing the init followed by a 
deinit.  Thus renaming the test accordingly.
Peter Kaestle Dec. 8, 2020, 2:58 p.m. UTC | #10
On 07.12.20 20:22, Junio C Hamano wrote:
> Peter Kaestle <peter.kaestle@nokia.com> writes:
> 
>> +add_commit_push () {
>> +	dir="$1" &&
>> +	msg="$2" &&
>> +	shift 2 &&
>> +	git -C "$dir" add "$@" &&
>> +	git -C "$dir" commit -a -m "$msg" &&
>> +	git -C "$dir" push
>> +}
>> +
>> +compare_refs_in_dir () {
>> +	fail= &&
>> +	if test "x$1" = 'x!'
>> +	then
>> +		fail='!' &&
>> +		shift
>> +	fi &&
>> +	git -C "$1" rev-parse --verify "$2" >expect &&
>> +	git -C "$3" rev-parse --verify "$4" >actual &&
>> +	eval $fail test_cmp expect actual
>> +}
> 
> 
> 
>> +test_expect_success 'fetching a superproject containing an uninitialized sub/sub project' '
>> +	# depends on previous test for setup
>> +
>> +	git -C B/ fetch &&
>> +	compare_refs_in_dir A origin/master B origin/master
> 
> Can we do this without relying on the name of the default branch?
> Perhaps when outer, middle and inner are prepared, they can be
> forced to be on the 'sample' (not 'master' nor 'main') branch, or
> something like that?

Using origin/HEAD for compare_refs_in_dir should be fine without 
additional setup, as for the regression the "git -C B/ fetch" will fail 
and return with false (see description of the patch).  This 
compare_refs_in_dir is just for additional checking as you proposed in 
the mail:
https://public-inbox.org/git/xmqqk0uuct94.fsf@gitster.c.googlers.com/
------------8<-------------
> And from B that was an original copy of A with only the top and
> middle layer instantiated, you run "git fetch".  Are you happy as
> long as "git fetch" does not exit with non-zero status?  That is
> hard to believe---it may be a necessary condition for the command to
> exit with zero status, but you have other expectations, like what
> commit the remote tracking branch refs/remotes/origin/HEAD ought to
> be pointing at.  I think we should check that, too.
----------->8-------------


> 
>> +test_expect_success 'setup recursive fetch with uninit submodule' '
>> +	# does not depend on any previous test setups
>> +
>> +	git init main &&
>> +	git init sub &&
> 
> "super vs sub" would give us a better contrast than "main vs sub",
> and it would help reduce mistakes in the mechanical conversion of
> "master" to "main" happening in another topic.
> 

ok.


>> +	# In a regression the following git call will run into infinite recursion.
>> +	# To handle that, we connect the grep command to the git call by a pipe
>> +	# so that grep can kill the infinite recusion when detected.
>> +	# The recursion creates git output like:
>> +	# Fetching submodule sub
>> +	# Fetching submodule sub/sub              <-- [1]
>> +	# Fetching submodule sub/sub/sub
>> +	# ...
>> +	# [1] grep will trigger here and kill git by exiting and closing its stdin
> 
> "trigger here and kill..." -> "stop reading and cause git to
> eventually stop and die"
> 
> But we probably cannot use 'grep -m1' so it is a moot point.
> 
>> +
>> +	! git -C main fetch --recurse-submodules 2>&1 |
>> +		grep -v -m1 "Fetching submodule sub$" &&
> 
> Unfortunately, "grep -m<count>" is not even in POSIX, I would think.
> 
> What do we expect to happen in the correct case?

sigh, we can't use grep -m1.  Too bad, it was such a nice solution.

> 
>   - A line "Fetching submodule sub" and nothing else is given?  That
>     feels a bit brittle (how are we making sure, in the presence of
>     "2>&1", that we will not get any other output, like progress?)
> 
>   - "sub" is the only thing that appears on lines that begin with
>     "Fetching submodule" (i.e. "Fetching submodule $something" where
>     $something is not 'sub' is an error), and we allow other garbage
>     in the output?  That would be a bit more robust than the above.
> 
> As you seem to be comfortable using "sed" below, perhaps use it to
> extract the first few lines that say "^Fetching submodule " from the
> output and stop, and check that the output has only one such line
> about 'sub' and nothing else?

According to [1] posix sed offers equal possibility to quit like grep 
-m1 and I'll adopt:
$> yes posixgrepisnogoodforus | sed "/posix/q"

[1] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html


Looking at the other mails I think I processed over all open comments 
and will prepare for v3.  Thanks again.
diff mbox series

Patch

diff --git a/submodule.c b/submodule.c
index b3bb59f066..b561445329 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1477,6 +1477,7 @@  static int get_next_submodule(struct child_process *cp,
 			strbuf_release(&submodule_prefix);
 			return 1;
 		} else {
+			struct strbuf empty_submodule_path = STRBUF_INIT;
 
 			fetch_task_release(task);
 			free(task);
@@ -1485,13 +1486,17 @@  static int get_next_submodule(struct child_process *cp,
 			 * An empty directory is normal,
 			 * the submodule is not initialized
 			 */
+			strbuf_addf(&empty_submodule_path, "%s/%s/",
+							spf->r->worktree,
+							ce->name);
 			if (S_ISGITLINK(ce->ce_mode) &&
-			    !is_empty_dir(ce->name)) {
+			    !is_empty_dir(empty_submodule_path.buf)) {
 				spf->result = 1;
 				strbuf_addf(err,
 					    _("Could not access submodule '%s'\n"),
 					    ce->name);
 			}
+			strbuf_release(&empty_submodule_path);
 		}
 	}
 
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index dd8e423d25..666dd1e2b7 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -719,4 +719,108 @@  test_expect_success 'fetch new submodule commit intermittently referenced by sup
 	)
 '
 
+add_commit_push () {
+	dir="$1" &&
+	msg="$2" &&
+	shift 2 &&
+	git -C "$dir" add "$@" &&
+	git -C "$dir" commit -a -m "$msg" &&
+	git -C "$dir" push
+}
+
+compare_refs_in_dir () {
+	fail= &&
+	if test "x$1" = 'x!'
+	then
+		fail='!' &&
+		shift
+	fi &&
+	git -C "$1" rev-parse --verify "$2" >expect &&
+	git -C "$3" rev-parse --verify "$4" >actual &&
+	eval $fail test_cmp expect actual
+}
+
+
+test_expect_success 'setup nested submodule fetch test' '
+	# does not depend on any previous test setups
+
+	for repo in outer middle inner
+	do
+		git init --bare $repo &&
+		git clone $repo ${repo}_content &&
+		echo "$repo" >"${repo}_content/file" &&
+		add_commit_push ${repo}_content "initial" file ||
+		return 1
+	done &&
+
+	git clone outer A &&
+	git -C A submodule add "$pwd/middle" &&
+	git -C A/middle/ submodule add "$pwd/inner" &&
+	add_commit_push A/middle/ "adding inner sub" .gitmodules inner &&
+	add_commit_push A/ "adding middle sub" .gitmodules middle &&
+
+	git clone outer B &&
+	git -C B/ submodule update --init middle &&
+
+	compare_refs_in_dir A HEAD B HEAD &&
+	compare_refs_in_dir A/middle HEAD B/middle HEAD &&
+	test_path_is_file B/file &&
+	test_path_is_file B/middle/file &&
+	test_path_is_missing B/middle/inner/file &&
+
+	echo "change on inner repo of A" >"A/middle/inner/file" &&
+	add_commit_push A/middle/inner "change on inner" file &&
+	add_commit_push A/middle "change on inner" inner &&
+	add_commit_push A "change on inner" middle
+'
+
+test_expect_success 'fetching a superproject containing an uninitialized sub/sub project' '
+	# depends on previous test for setup
+
+	git -C B/ fetch &&
+	compare_refs_in_dir A origin/master B origin/master
+'
+
+
+test_expect_success 'setup recursive fetch with uninit submodule' '
+	# does not depend on any previous test setups
+
+	git init main &&
+	git init sub &&
+
+	>sub/file &&
+	git -C sub add file &&
+	git -C sub commit -m "add file" &&
+	git -C sub rev-parse HEAD >expect &&
+
+	git -C main submodule add ../sub &&
+	git -C main submodule init &&
+	git -C main submodule update --checkout &&
+	git -C main submodule status >out &&
+	sed -e "s/^ //" -e "s/ sub .*$//" out >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'recursive fetch with uninit submodule' '
+	# depends on previous test for setup
+
+	git -C main submodule deinit -f sub &&
+
+	# In a regression the following git call will run into infinite recursion.
+	# To handle that, we connect the grep command to the git call by a pipe
+	# so that grep can kill the infinite recusion when detected.
+	# The recursion creates git output like:
+	# Fetching submodule sub
+	# Fetching submodule sub/sub              <-- [1]
+	# Fetching submodule sub/sub/sub
+	# ...
+	# [1] grep will trigger here and kill git by exiting and closing its stdin
+
+	! git -C main fetch --recurse-submodules 2>&1 |
+		grep -v -m1 "Fetching submodule sub$" &&
+	git -C main submodule status >out &&
+	sed -e "s/^-//" -e "s/ sub$//" out >actual &&
+	test_cmp expect actual
+'
+
 test_done