diff mbox series

submodules: fix of regression on fetching of non-init subsub-repo

Message ID 1607095412-40109-1-git-send-email-peter.kaestle@nokia.com (mailing list archive)
State Superseded
Headers show
Series submodules: fix of regression on fetching of non-init subsub-repo | expand

Commit Message

Peter Kaestle Dec. 4, 2020, 3:23 p.m. UTC
A regression has been introduced by a62387b (submodule.c: fetch in
submodules git directory instead of in worktree, 2018-11-28).

The scenario in which it triggers is when one has a remote repository
with a subrepository inside a subrepository like this:
superproject/middle_repo/inner_repo

Person A and B have both a clone of it, while Person B is not working
with the inner_repo and thus does not have it initialized in his working
copy.

Now person A introduces a change to the inner_repo and propagates it
through the middle_repo and the superproject.

Once person A pushed the changes and person B wants to fetch them using
"git fetch" on superproject level, B's git call will return with error
saying:

Could not access submodule 'inner_repo'
Errors during submodule fetch:
         middle_repo

Expectation is that in this case the inner submodule will be recognized
as uninitialized subrepository and skipped by the git fetch command.

This used to work correctly before 'a62387b (submodule.c: fetch in
submodules git directory instead of in worktree, 2018-11-28)'.

Starting with a62387b the code wants to evaluate "is_empty_dir()" inside
.git/modules for a directory only existing in the worktree, delivering
then of course wrong return value.

This patch ensures is_empty_dir() is getting the correct path of the
uninitialized submodule by concatenation of the actual worktree and the
name of the uninitialized submodule.

Furthermore a regression test case is added, which tests for recursive
fetches on a superproject with uninitialized sub repositories.  This
issue was leading to an infinite loop when doing a revert of a62387b.

Signed-off-by: Peter Kaestle <peter.kaestle@nokia.com>
CC: Junio C Hamano <gitster@pobox.com>
CC: Philippe Blain <levraiphilippeblain@gmail.com>
CC: Ralf Thielow <ralf.thielow@gmail.com>
---
 submodule.c                 |  7 ++-
 t/t5526-fetch-submodules.sh | 94 +++++++++++++++++++++++++++++++++++++
 2 files changed, 100 insertions(+), 1 deletion(-)

Comments

Eric Sunshine Dec. 4, 2020, 6:06 p.m. UTC | #1
On Fri, Dec 4, 2020 at 10:25 AM Peter Kaestle <peter.kaestle@nokia.com> wrote:
> [...]
> Furthermore a regression test case is added, which tests for recursive
> fetches on a superproject with uninitialized sub repositories.  This
> issue was leading to an infinite loop when doing a revert of a62387b.

Just a few small comments (nothing comprehensive) from a quick scan of
the patch...

Mostly they are just minor style issues, not necessarily worth a
re-roll, but there is one actionable item.

> Signed-off-by: Peter Kaestle <peter.kaestle@nokia.com>
> ---
> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> @@ -719,4 +719,98 @@ test_expect_success 'fetch new submodule commit intermittently referenced by sup
> +add_commit_push () {
> +       dir="$1"
> +       msg="$2"
> +       shift 2

We typically recommend including these assignments in the &&-chain to
future-proof against someone later inserting code above them and not
realizing that that code is not part of the &&-chain, in which case if
the new code fails, the failure might go unnoticed.

> +       git -C "$dir" add "$@" &&
> +       git -C "$dir" commit -a -m "$msg" &&
> +       git -C "$dir" push
> +}
> +
> +compare_refs_in_dir () {
> +       fail= &&
> +       if test "x$1" = 'x!'
> +       then
> +               fail='!' &&
> +               shift
> +       fi &&
> +       git -C "$1" rev-parse --verify "$2" >expect &&
> +       git -C "$3" rev-parse --verify "$4" >actual &&
> +       eval $fail test_cmp expect actual
> +}

We have a test_cmp_rev() similar to this but it doesn't support -C as
some of our other test functions do. I briefly wondered if it would
make sense to extend it to understand -C, but even that wouldn't help
this case since compare_refs_in_dir() introduced here involves two
distinct directories. The need here is so special-purpose that it
likely would not make sense to upgrade test_cmp_rev() to accommodate
it. Okay.

> +test_expect_success 'setup nested submodule fetch test' '
> +       # does not depend on any previous test setups
> +
> +       for repo in outer middle inner
> +       do
> +               (
> +                       git init --bare $repo &&
> +                       git clone $repo ${repo}_content &&
> +                       echo "$repo" >"${repo}_content/file" &&
> +                       add_commit_push ${repo}_content "initial" file
> +               ) || return 1
> +       done &&

What is the purpose of the subshell here? Is it to ensure that commits
in each repo have identical timestamps? Or is it just for making the
&& and || expression more clear? If the latter, we normally don't
bother with the parentheses.

> +       git clone outer A &&
> +       git -C A submodule add "$pwd/middle" &&
> +       git -C A/middle/ submodule add "$pwd/inner" &&
> +       add_commit_push A/middle/ "adding inner sub" .gitmodules inner &&
> +       add_commit_push A/ "adding middle sub" .gitmodules middle &&
> +
> +       git clone outer B &&
> +       git -C B/ submodule update --init middle &&
> +
> +       compare_refs_in_dir A HEAD B HEAD &&
> +       compare_refs_in_dir A/middle HEAD B/middle HEAD &&
> +       test -f B/file &&
> +       test -f B/middle/file &&
> +       ! test -f B/middle/inner/file &&

These days we typically use test_path_exists() (or
test_path_is_file()) and test_path_is_missing() rather than bare
`test`.

> +test_expect_success 'setup recursive fetch with uninit submodule' '
> +       # does not depend on any previous test setups
> +
> +       git init main &&
> +       git init sub &&
> +
> +       touch sub/file &&

Unless the timestamp of the file is significant to the test, in which
case `touch` is used, we normally create empty files like this:

    >sub/file &&

> +test_expect_success 'recursive fetch with uninit submodule' '
> +       git -C main submodule deinit -f sub &&
> +       ! git -C main fetch --recurse-submodules |&
> +               grep -v -m1 "Fetching submodule sub$" &&

We want the test scripts to be portable, thus avoid Bashisms such as `|&`.

We also avoid placing a Git command upstream in a pipe since doing so
causes the exit code of the Git command to be lost. Instead, we would
normally send the Git output to a file and then send that file to
whatever would be downstream of the Git command in the pipe. So, a
mechanical rewrite of the above (without thinking too hard about it)
might be:

    git -C main fetch --recurse-submodules >out 2>&1 &&
    ! grep -v -m1 "Fetching submodule sub$" &&

> +       git -C main submodule status |
> +               sed -e "s/^-//" -e "s/ sub$//" >actual &&

Same comment about avoiding Git upstream in a pipe, so perhaps:

    git -C main submodule status >out &&
    sed -e "s/^-//" -e "s/ sub$//" out >actual &&

> +       test_cmp expect actual
> +'
Peter Kaestle Dec. 7, 2020, 8:28 a.m. UTC | #2
Hi Eric,

On 04.12.20 19:06, Eric Sunshine wrote:
> On Fri, Dec 4, 2020 at 10:25 AM Peter Kaestle <peter.kaestle@nokia.com> wrote:
>> [...]
>> Furthermore a regression test case is added, which tests for recursive
>> fetches on a superproject with uninitialized sub repositories.  This
>> issue was leading to an infinite loop when doing a revert of a62387b.
> 
> Just a few small comments (nothing comprehensive) from a quick scan of
> the patch...
> 
> Mostly they are just minor style issues, not necessarily worth a
> re-roll, but there is one actionable item.

thanks for your comments.  A new patch will follow soon.


>> Signed-off-by: Peter Kaestle <peter.kaestle@nokia.com>
>> ---
>> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
>> @@ -719,4 +719,98 @@ test_expect_success 'fetch new submodule commit intermittently referenced by sup
>> +add_commit_push () {
>> +       dir="$1"
>> +       msg="$2"
>> +       shift 2
> 
> We typically recommend including these assignments in the &&-chain to
> future-proof against someone later inserting code above them and not
> realizing that that code is not part of the &&-chain, in which case if
> the new code fails, the failure might go unnoticed.

ok

> 
>> +       git -C "$dir" add "$@" &&
>> +       git -C "$dir" commit -a -m "$msg" &&
>> +       git -C "$dir" push
>> +}
>> +
>> +compare_refs_in_dir () {
>> +       fail= &&
>> +       if test "x$1" = 'x!'
>> +       then
>> +               fail='!' &&
>> +               shift
>> +       fi &&
>> +       git -C "$1" rev-parse --verify "$2" >expect &&
>> +       git -C "$3" rev-parse --verify "$4" >actual &&
>> +       eval $fail test_cmp expect actual
>> +}
> 
> We have a test_cmp_rev() similar to this but it doesn't support -C as
> some of our other test functions do. I briefly wondered if it would
> make sense to extend it to understand -C, but even that wouldn't help
> this case since compare_refs_in_dir() introduced here involves two
> distinct directories. The need here is so special-purpose that it
> likely would not make sense to upgrade test_cmp_rev() to accommodate
> it. Okay.

Yes, I saw that there's a similar function and I tried to modify this 
one first.  Unfortunately this didn't work without touching much 
unaffected test code.  So I propose to continue with this additional 
function.


>> +test_expect_success 'setup nested submodule fetch test' '
>> +       # does not depend on any previous test setups
>> +
>> +       for repo in outer middle inner
>> +       do
>> +               (
>> +                       git init --bare $repo &&
>> +                       git clone $repo ${repo}_content &&
>> +                       echo "$repo" >"${repo}_content/file" &&
>> +                       add_commit_push ${repo}_content "initial" file
>> +               ) || return 1
>> +       done &&
> 
> What is the purpose of the subshell here? Is it to ensure that commits
> in each repo have identical timestamps? Or is it just for making the
> && and || expression more clear? If the latter, we normally don't
> bother with the parentheses.

It was intended to make the correlation of && and || clear.  I have 
experienced many cases in the past where things were screwed because it 
was not clearly understood by everybody.  I'll propose next patch 
without this subshell.

>> +       git clone outer A &&
>> +       git -C A submodule add "$pwd/middle" &&
>> +       git -C A/middle/ submodule add "$pwd/inner" &&
>> +       add_commit_push A/middle/ "adding inner sub" .gitmodules inner &&
>> +       add_commit_push A/ "adding middle sub" .gitmodules middle &&
>> +
>> +       git clone outer B &&
>> +       git -C B/ submodule update --init middle &&
>> +
>> +       compare_refs_in_dir A HEAD B HEAD &&
>> +       compare_refs_in_dir A/middle HEAD B/middle HEAD &&
>> +       test -f B/file &&
>> +       test -f B/middle/file &&
>> +       ! test -f B/middle/inner/file &&
> 
> These days we typically use test_path_exists() (or
> test_path_is_file()) and test_path_is_missing() rather than bare
> `test`.

ok.


>> +test_expect_success 'setup recursive fetch with uninit submodule' '
>> +       # does not depend on any previous test setups
>> +
>> +       git init main &&
>> +       git init sub &&
>> +
>> +       touch sub/file &&
> 
> Unless the timestamp of the file is significant to the test, in which
> case `touch` is used, we normally create empty files like this:
> 
>      >sub/file &&

ok.


> 
>> +test_expect_success 'recursive fetch with uninit submodule' '
>> +       git -C main submodule deinit -f sub &&
>> +       ! git -C main fetch --recurse-submodules |&
>> +               grep -v -m1 "Fetching submodule sub$" &&
> 
> We want the test scripts to be portable, thus avoid Bashisms such as `|&`. > We also avoid placing a Git command upstream in a pipe since doing so
> causes the exit code of the Git command to be lost. Instead, we would
> normally send the Git output to a file and then send that file to
> whatever would be downstream of the Git command in the pipe. So, a
> mechanical rewrite of the above (without thinking too hard about it)
> might be:
> 
>      git -C main fetch --recurse-submodules >out 2>&1 &&
>      ! grep -v -m1 "Fetching submodule sub$" &&

In general I agree, but for this special test case, it's required to 
have the two commands connected by a pipe, as the grep needs to kill the 
git call in error case.  Otherwise for this regression git would go for 
an infinite recursion loop.

Of course, we can go for a "git 2>&1 | grep" solution.


>> +       git -C main submodule status |
>> +               sed -e "s/^-//" -e "s/ sub$//" >actual &&
> 
> Same comment about avoiding Git upstream in a pipe, so perhaps:
> 
>      git -C main submodule status >out &&
>      sed -e "s/^-//" -e "s/ sub$//" out >actual &&
> 
>> +       test_cmp expect actual
>> +'

ok.
Eric Sunshine Dec. 7, 2020, 8:40 a.m. UTC | #3
On Mon, Dec 7, 2020 at 3:29 AM Peter Kästle <peter.kaestle@nokia.com> wrote:
> On 04.12.20 19:06, Eric Sunshine wrote:
> > On Fri, Dec 4, 2020 at 10:25 AM Peter Kaestle <peter.kaestle@nokia.com> wrote:
> >> +       ! git -C main fetch --recurse-submodules |&
> >> +               grep -v -m1 "Fetching submodule sub$" &&
> >
> > We want the test scripts to be portable, thus avoid Bashisms such as `|&`.
> >      git -C main fetch --recurse-submodules >out 2>&1 &&
> >      ! grep -v -m1 "Fetching submodule sub$" &&
>
> In general I agree, but for this special test case, it's required to
> have the two commands connected by a pipe, as the grep needs to kill the
> git call in error case.  Otherwise for this regression git would go for
> an infinite recursion loop.
>
> Of course, we can go for a "git 2>&1 | grep" solution.

In that case, an in-code comment explaining why the output of `git`
must be piped to `grep` would be helpful since it is not uncommon for
people to "modernize" test code as they are working in the vicinity,
and removing such pipes is one of the modernizations often made. A
comment would help avoid such a change.
diff mbox series

Patch

diff --git a/submodule.c b/submodule.c
index b3bb59f066..b561445329 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1477,6 +1477,7 @@  static int get_next_submodule(struct child_process *cp,
 			strbuf_release(&submodule_prefix);
 			return 1;
 		} else {
+			struct strbuf empty_submodule_path = STRBUF_INIT;
 
 			fetch_task_release(task);
 			free(task);
@@ -1485,13 +1486,17 @@  static int get_next_submodule(struct child_process *cp,
 			 * An empty directory is normal,
 			 * the submodule is not initialized
 			 */
+			strbuf_addf(&empty_submodule_path, "%s/%s/",
+							spf->r->worktree,
+							ce->name);
 			if (S_ISGITLINK(ce->ce_mode) &&
-			    !is_empty_dir(ce->name)) {
+			    !is_empty_dir(empty_submodule_path.buf)) {
 				spf->result = 1;
 				strbuf_addf(err,
 					    _("Could not access submodule '%s'\n"),
 					    ce->name);
 			}
+			strbuf_release(&empty_submodule_path);
 		}
 	}
 
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index dd8e423d25..04eb587862 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -719,4 +719,98 @@  test_expect_success 'fetch new submodule commit intermittently referenced by sup
 	)
 '
 
+add_commit_push () {
+	dir="$1"
+	msg="$2"
+	shift 2
+	git -C "$dir" add "$@" &&
+	git -C "$dir" commit -a -m "$msg" &&
+	git -C "$dir" push
+}
+
+compare_refs_in_dir () {
+	fail= &&
+	if test "x$1" = 'x!'
+	then
+		fail='!' &&
+		shift
+	fi &&
+	git -C "$1" rev-parse --verify "$2" >expect &&
+	git -C "$3" rev-parse --verify "$4" >actual &&
+	eval $fail test_cmp expect actual
+}
+
+
+test_expect_success 'setup nested submodule fetch test' '
+	# does not depend on any previous test setups
+
+	for repo in outer middle inner
+	do
+		(
+			git init --bare $repo &&
+			git clone $repo ${repo}_content &&
+			echo "$repo" >"${repo}_content/file" &&
+			add_commit_push ${repo}_content "initial" file
+		) || return 1
+	done &&
+
+	git clone outer A &&
+	git -C A submodule add "$pwd/middle" &&
+	git -C A/middle/ submodule add "$pwd/inner" &&
+	add_commit_push A/middle/ "adding inner sub" .gitmodules inner &&
+	add_commit_push A/ "adding middle sub" .gitmodules middle &&
+
+	git clone outer B &&
+	git -C B/ submodule update --init middle &&
+
+	compare_refs_in_dir A HEAD B HEAD &&
+	compare_refs_in_dir A/middle HEAD B/middle HEAD &&
+	test -f B/file &&
+	test -f B/middle/file &&
+	! test -f B/middle/inner/file &&
+
+	echo "change on inner repo of A" >"A/middle/inner/file" &&
+	add_commit_push A/middle/inner "change on inner" file &&
+	add_commit_push A/middle "change on inner" inner &&
+	add_commit_push A "change on inner" middle
+'
+
+test_expect_success 'fetching a superproject containing an uninitialized sub/sub project' '
+	# depends on previous test for setup
+
+	git -C B/ fetch &&
+	compare_refs_in_dir A origin/master B origin/master
+'
+
+
+test_expect_success 'setup recursive fetch with uninit submodule' '
+	# does not depend on any previous test setups
+
+	git init main &&
+	git init sub &&
+
+	touch sub/file &&
+	git -C sub add file &&
+	git -C sub commit -m "add file" &&
+	git -C sub rev-parse HEAD >expect &&
+
+	git -C main submodule add ../sub &&
+	git -C main submodule init &&
+	git -C main submodule update --checkout &&
+	git -C main submodule status |
+		sed -e "s/^ //" -e "s/ sub .*$//" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'recursive fetch with uninit submodule' '
+	# depends on previous test for setup
+
+	git -C main submodule deinit -f sub &&
+	! git -C main fetch --recurse-submodules |&
+		grep -v -m1 "Fetching submodule sub$" &&
+	git -C main submodule status |
+		sed -e "s/^-//" -e "s/ sub$//" >actual &&
+	test_cmp expect actual
+'
+
 test_done