Message ID | 1607348819-61355-1-git-send-email-peter.kaestle@nokia.com |
---|---|
State | Superseded |
Series | [v2] submodules: fix of regression on fetching of non-init subsub-repo |
Hi Peter,

> On 7 Dec 2020, at 08:46, Peter Kaestle <peter.kaestle@nokia.com> wrote:
>
> A regression has been introduced by a62387b (submodule.c: fetch in
> submodules git directory instead of in worktree, 2018-11-28).
>
> The scenario in which it triggers is when one has a remote repository
> with a subrepository inside a subrepository like this:
> superproject/middle_repo/inner_repo

The correct terminology is "submodule", not "subrepository".

Also, (minor point) I would just write "when one has a repository", as it's simpler (the repository by itself is not "remote", it is only "remote" in relation to the repositories that are cloned from it).

> Person A and B have both a clone of it, while Person B is not working
> with the inner_repo and thus does not have it initialized in his working
> copy.
>
> Now person A introduces a change to the inner_repo and propagates it
> through the middle_repo and the superproject.
>
> Once person A pushed the changes and person B wants to fetch them using
> "git fetch" on superproject level,

s/on/at the/

> B's git call will return with error
> saying:
>
> Could not access submodule 'inner_repo'
> Errors during submodule fetch:
> middle_repo
>
> Expectation is that in this case the inner submodule will be recognized
> as uninitialized subrepository and skipped by the git fetch command.

Here again, terminology: "as an uninitialized submodule".

> This used to work correctly before 'a62387b (submodule.c: fetch in
> submodules git directory instead of in worktree, 2018-11-28)'.
>
> Starting with a62387b the code wants to evaluate "is_empty_dir()" inside
> .git/modules for a directory only existing in the worktree, delivering
> then of course wrong return value.
>
> This patch ensures is_empty_dir() is getting the correct path of the
> uninitialized submodule by concatenation of the actual worktree and the
> name of the uninitialized submodule.
>
> Furthermore a regression test case is added, which tests for recursive
> fetches on a superproject with uninitialized sub repositories. This
> issue was leading to an infinite loop when doing a revert of a62387b.

I would maybe add more details here, something like the following (we can cite your previous attempt, because it was merged to 'master'):

    The first attempt to fix this regression, in 1b7ac4e6d4
    (submodules: fix of regression on fetching of non-init
    subsub-repo, 2020-11-12), by simply reverting a62387b, resulted in
    an infinite loop of submodule fetches in the simpler case of a
    recursive fetch of a superproject with uninitialized submodules,
    and so this commit was reverted in 7091499bc0 (Revert "submodules:
    fix of regression on fetching of non-init subsub-repo",
    2020-12-02). To prevent future breakages, also add a regression
    test for this scenario.

>
> Signed-off-by: Peter Kaestle <peter.kaestle@nokia.com>
> CC: Junio C Hamano <gitster@pobox.com>
> CC: Philippe Blain <levraiphilippeblain@gmail.com>
> CC: Ralf Thielow <ralf.thielow@gmail.com>
> CC: Eric Sunshine <sunshine@sunshineco.com>
> ---
>  submodule.c                 |   7 ++-
>  t/t5526-fetch-submodules.sh | 104 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 110 insertions(+), 1 deletion(-)
>
> diff --git a/submodule.c b/submodule.c
> index b3bb59f066..b561445329 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1477,6 +1477,7 @@ static int get_next_submodule(struct child_process *cp,
>  			strbuf_release(&submodule_prefix);
>  			return 1;
>  		} else {
> +			struct strbuf empty_submodule_path = STRBUF_INIT;
>  
>  			fetch_task_release(task);
>  			free(task);
> @@ -1485,13 +1486,17 @@ static int get_next_submodule(struct child_process *cp,
>  			 * An empty directory is normal,
>  			 * the submodule is not initialized
>  			 */
> +			strbuf_addf(&empty_submodule_path, "%s/%s/",
> +							spf->r->worktree,
> +							ce->name);
>  			if (S_ISGITLINK(ce->ce_mode) &&
> -			    !is_empty_dir(ce->name)) {
> +			    !is_empty_dir(empty_submodule_path.buf)) {
>  				spf->result = 1;
>  				strbuf_addf(err,
>  					    _("Could not access submodule '%s'\n"),
>  					    ce->name);
>  			}
> +			strbuf_release(&empty_submodule_path);
>  		}
>  	}

Maybe a personal preference, but I would have gone for something a little simpler, like the following:

diff --git a/submodule.c b/submodule.c
index b3bb59f066..4200865174 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1486,7 +1486,7 @@ static int get_next_submodule(struct child_process *cp,
 			 * the submodule is not initialized
 			 */
 			if (S_ISGITLINK(ce->ce_mode) &&
-			    !is_empty_dir(ce->name)) {
+			    !is_empty_dir(repo_worktree_path(spf->r, "%s", ce->name))) {
 				spf->result = 1;
 				strbuf_addf(err,
 					    _("Could not access submodule '%s'\n"),

> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index dd8e423d25..666dd1e2b7 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -719,4 +719,108 @@ test_expect_success 'fetch new submodule commit intermittently referenced by sup
>  	)
>  '
>  
> +add_commit_push () {
> +	dir="$1" &&
> +	msg="$2" &&
> +	shift 2 &&
> +	git -C "$dir" add "$@" &&
> +	git -C "$dir" commit -a -m "$msg" &&
> +	git -C "$dir" push
> +}
> +
> +compare_refs_in_dir () {
> +	fail= &&
> +	if test "x$1" = 'x!'
> +	then
> +		fail='!' &&
> +		shift
> +	fi &&
> +	git -C "$1" rev-parse --verify "$2" >expect &&
> +	git -C "$3" rev-parse --verify "$4" >actual &&
> +	eval $fail test_cmp expect actual
> +}
> +
> +
> +test_expect_success 'setup nested submodule fetch test' '
> +	# does not depend on any previous test setups
> +
> +	for repo in outer middle inner
> +	do
> +		git init --bare $repo &&
> +		git clone $repo ${repo}_content &&
> +		echo "$repo" >"${repo}_content/file" &&
> +		add_commit_push ${repo}_content "initial" file ||
> +		return 1
> +	done &&
> +
> +	git clone outer A &&
> +	git -C A submodule add "$pwd/middle" &&
> +	git -C A/middle/ submodule add "$pwd/inner" &&
> +	add_commit_push A/middle/ "adding inner sub" .gitmodules inner &&
> +	add_commit_push A/ "adding middle sub" .gitmodules middle &&
> +
> +	git clone outer B &&
> +	git -C B/ submodule update --init middle &&
> +
> +	compare_refs_in_dir A HEAD B HEAD &&
> +	compare_refs_in_dir A/middle HEAD B/middle HEAD &&
> +	test_path_is_file B/file &&
> +	test_path_is_file B/middle/file &&
> +	test_path_is_missing B/middle/inner/file &&
> +
> +	echo "change on inner repo of A" >"A/middle/inner/file" &&
> +	add_commit_push A/middle/inner "change on inner" file &&
> +	add_commit_push A/middle "change on inner" inner &&
> +	add_commit_push A "change on inner" middle
> +'
> +
> +test_expect_success 'fetching a superproject containing an uninitialized sub/sub project' '
> +	# depends on previous test for setup
> +
> +	git -C B/ fetch &&
> +	compare_refs_in_dir A origin/master B origin/master
> +'
> +
> +
> +test_expect_success 'setup recursive fetch with uninit submodule' '
> +	# does not depend on any previous test setups
> +
> +	git init main &&
> +	git init sub &&
> +
> +	>sub/file &&
> +	git -C sub add file &&
> +	git -C sub commit -m "add file" &&
> +	git -C sub rev-parse HEAD >expect &&
> +
> +	git -C main submodule add ../sub &&
> +	git -C main submodule init &&
> +	git -C main submodule update --checkout &&

These two steps are unnecessary as they are implicitly done by 'git submodule add'. I think we could reflect real life a little bit more by cloning the superproject, and running the 'recursive fetch with uninit submodule' test below in the clone.

> +	git -C main submodule status >out &&
> +	sed -e "s/^ //" -e "s/ sub .*$//" out >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'recursive fetch with uninit submodule' '
> +	# depends on previous test for setup
> +
> +	git -C main submodule deinit -f sub &&

Here you are deiniting the submodule, such that the Git directory will stay in .git/modules/sub. This is not the same thing as a submodule that was never initialized ("uninitialized"), for which .git/modules/sub will not yet exist. So maybe we could harden the tests by also testing for that scenario? I don't know... maybe the infinite loop only happens if .git/modules/sub actually already exists. If so, the test name should be "recursive fetch with deinitialized submodule", I think.

> +
> +	# In a regression the following git call will run into infinite recursion.
> +	# To handle that, we connect the grep command to the git call by a pipe
> +	# so that grep can kill the infinite recusion when detected.
> +	# The recursion creates git output like:
> +	# Fetching submodule sub
> +	# Fetching submodule sub/sub              <-- [1]
> +	# Fetching submodule sub/sub/sub
> +	# ...
> +	# [1] grep will trigger here and kill git by exiting and closing its stdin
> +
> +	! git -C main fetch --recurse-submodules 2>&1 |
> +		grep -v -m1 "Fetching submodule sub$" &&
> +	git -C main submodule status >out &&
> +	sed -e "s/^-//" -e "s/ sub$//" out >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_done

Thanks for working on that, and sorry for not having the time to comment before you sent v2.

Cheers,
Philippe.
Peter Kaestle <peter.kaestle@nokia.com> writes:

> +add_commit_push () {
> +	dir="$1" &&
> +	msg="$2" &&
> +	shift 2 &&
> +	git -C "$dir" add "$@" &&
> +	git -C "$dir" commit -a -m "$msg" &&
> +	git -C "$dir" push
> +}
> +
> +compare_refs_in_dir () {
> +	fail= &&
> +	if test "x$1" = 'x!'
> +	then
> +		fail='!' &&
> +		shift
> +	fi &&
> +	git -C "$1" rev-parse --verify "$2" >expect &&
> +	git -C "$3" rev-parse --verify "$4" >actual &&
> +	eval $fail test_cmp expect actual
> +}

> +test_expect_success 'fetching a superproject containing an uninitialized sub/sub project' '
> +	# depends on previous test for setup
> +
> +	git -C B/ fetch &&
> +	compare_refs_in_dir A origin/master B origin/master

Can we do this without relying on the name of the default branch? Perhaps when outer, middle and inner are prepared, they can be forced to be on the 'sample' (not 'master' nor 'main') branch, or something like that?

> +test_expect_success 'setup recursive fetch with uninit submodule' '
> +	# does not depend on any previous test setups
> +
> +	git init main &&
> +	git init sub &&

"super vs sub" would give us a better contrast than "main vs sub", and it would help reduce mistakes in the mechanical conversion of "master" to "main" happening in another topic.

> +	# In a regression the following git call will run into infinite recursion.
> +	# To handle that, we connect the grep command to the git call by a pipe
> +	# so that grep can kill the infinite recusion when detected.
> +	# The recursion creates git output like:
> +	# Fetching submodule sub
> +	# Fetching submodule sub/sub              <-- [1]
> +	# Fetching submodule sub/sub/sub
> +	# ...
> +	# [1] grep will trigger here and kill git by exiting and closing its stdin

"trigger here and kill..." -> "stop reading and cause git to eventually stop and die"

But we probably cannot use 'grep -m1' so it is a moot point.

> +
> +	! git -C main fetch --recurse-submodules 2>&1 |
> +		grep -v -m1 "Fetching submodule sub$" &&

Unfortunately, "grep -m<count>" is not even in POSIX, I would think.

What do we expect to happen in the correct case?

 - A line "Fetching submodule sub" and nothing else is given? That feels a bit brittle (how are we making sure, in the presence of "2>&1", that we will not get any other output, like progress?)

 - "sub" is the only thing that appears on lines that begin with "Fetching submodule" (i.e. "Fetching submodule $something" where $something is not 'sub' is an error), and we allow other garbage in the output? That would be a bit more robust than the above.

As you seem to be comfortable using "sed" below, perhaps use it to extract the first few lines that say "^Fetching submodule " from the output and stop, and check that the output has only one such line about 'sub' and nothing else?

> +	git -C main submodule status >out &&
> +	sed -e "s/^-//" -e "s/ sub$//" out >actual &&
> +	test_cmp expect actual
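Junio's second criterion (every "Fetching submodule" line must name 'sub', exactly one such line may appear, and any other output is tolerated) can be sketched as a small POSIX sh check. The helper name check_fetch_output and the two sample files are hypothetical, made up for illustration; in the real test the input would be whatever the fetch wrote after "2>&1".

```shell
#!/bin/sh
# Hypothetical helper (not part of git's test suite): succeed only if the
# captured fetch output has exactly one "Fetching submodule ..." line and
# that line names "sub". Other lines, such as progress, are ignored.
check_fetch_output () {
	# Keep just the submodule path from each "Fetching submodule" line.
	sed -n -e 's/^Fetching submodule //p' "$1" >fetched &&
	# $(...) drops the trailing newline, so a single line "sub" compares
	# equal, while zero lines or any extra line (e.g. "sub/sub") does not.
	test "$(cat fetched)" = sub
}

# Made-up sample outputs for a correct and for a regressed fetch.
printf '%s\n' 'Fetching submodule sub' 'unrelated progress line' >good.out
printf '%s\n' 'Fetching submodule sub' 'Fetching submodule sub/sub' >bad.out

check_fetch_output good.out && echo "good.out: accepted"
check_fetch_output bad.out || echo "bad.out: rejected"
```

Because only lines starting with "Fetching submodule " are inspected, stray progress or warning lines do not make the check fail, which avoids the brittleness of the first option.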
Philippe Blain <levraiphilippeblain@gmail.com> writes:

> Maybe a personal preference, but I would have gone for something a
> little simpler, like the following:
>
> diff --git a/submodule.c b/submodule.c
> index b3bb59f066..4200865174 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1486,7 +1486,7 @@ static int get_next_submodule(struct child_process *cp,
>  			 * the submodule is not initialized
>  			 */
>  			if (S_ISGITLINK(ce->ce_mode) &&
> -			    !is_empty_dir(ce->name)) {
> +			    !is_empty_dir(repo_worktree_path(spf->r, "%s", ce->name))) {

But then you leak the return value from repo_worktree_path(), no?

>> +test_expect_success 'recursive fetch with uninit submodule' '
>> +	# depends on previous test for setup
>> +
>> +	git -C main submodule deinit -f sub &&
>
> Here you are deiniting the submodule, such that
> the Git directory will stay in .git/modules/sub. This is not the same thing
> as a submodule that was never initialized ("uninitialized"), for which .git/modules/sub
> will not yet exist. So maybe we could harden the tests by also testing
> for that scenario? I don't know... maybe the infinite loop only happens
> if .git/modules/sub actually already exists. If so, the test name should be
> "recursive fetch with deinitialized submodule", I think.

Even if the original breakage happens only for deinitialized case, it would be sensible to test uninitialized one as well, I would think.

> Thanks for working on that, and sorry for not having the time to comment before
> you sent v2.

Thanks, all.

It's not like we corral everybody to a single place on a single day to work on a single thing. Reviews of v2 by reviewers who have not seen v1 are totally expected and very much appreciated.
Philippe Blain <levraiphilippeblain@gmail.com> writes:

> I would maybe add more details here, something like the following
> (we can cite your previous attempt, because it was merged to 'master'):
>
>     The first attempt to fix this regression, in 1b7ac4e6d4
>     (submodules: fix of regression on fetching of non-init
>     subsub-repo, 2020-11-12), by simply reverting a62387b, resulted in
>     an infinite loop of submodule fetches in the simpler case of a
>     recursive fetch of a superproject with uninitialized submodules,
>     and so this commit was reverted in 7091499bc0 (Revert "submodules:
>     fix of regression on fetching of non-init subsub-repo",
>     2020-12-02). To prevent future breakages, also add a regression
>     test for this scenario.

Forgot to mention in my other response, but I do find this a very sensible addition.

Thanks.
Hi Junio,

> On 7 Dec 2020, at 14:22, Junio C Hamano <gitster@pobox.com> wrote:
>
> Peter Kaestle <peter.kaestle@nokia.com> writes:
>
>> +add_commit_push () {
>> +	dir="$1" &&
>> +	msg="$2" &&
>> +	shift 2 &&
>> +	git -C "$dir" add "$@" &&
>> +	git -C "$dir" commit -a -m "$msg" &&
>> +	git -C "$dir" push
>> +}
>> +
>> +compare_refs_in_dir () {
>> +	fail= &&
>> +	if test "x$1" = 'x!'
>> +	then
>> +		fail='!' &&
>> +		shift
>> +	fi &&
>> +	git -C "$1" rev-parse --verify "$2" >expect &&
>> +	git -C "$3" rev-parse --verify "$4" >actual &&
>> +	eval $fail test_cmp expect actual
>> +}
>
>> +test_expect_success 'fetching a superproject containing an uninitialized sub/sub project' '
>> +	# depends on previous test for setup
>> +
>> +	git -C B/ fetch &&
>> +	compare_refs_in_dir A origin/master B origin/master
>
> Can we do this without relying on the name of the default branch?
> Perhaps when outer, middle and inner are prepared, they can be
> forced to be on the 'sample' (not 'master' nor 'main') branch, or
> something like that?

Or, simpler, we could call "git remote set-head -a"
in A and B in the setup script, which would make
origin/HEAD in A and B point to the default branch,
such that the call here could be:

compare_refs_in_dir A origin/HEAD B origin/HEAD

Philippe.
Philippe Blain <levraiphilippeblain@gmail.com> writes:

>>> +test_expect_success 'fetching a superproject containing an uninitialized sub/sub project' '
>>> +	# depends on previous test for setup
>>> +
>>> +	git -C B/ fetch &&
>>> +	compare_refs_in_dir A origin/master B origin/master
>>
>> Can we do this without relying on the name of the default branch?
>> Perhaps when outer, middle and inner are prepared, they can be
>> forced to be on the 'sample' (not 'master' nor 'main') branch, or
>> something like that?
>
> Or, simpler, we could call "git remote set-head -a"
> in A and B in the setup script, which would make
> origin/HEAD in A and B point to the default branch,
> such that the call here could be:

The set-up prepares A and B by cloning from elsewhere, no? Should we even need a set-head call?

> compare_refs_in_dir A origin/HEAD B origin/HEAD

Yes, using HEAD would be another simple way to avoid having to rely on the default behaviour.

Thanks.
Junio C Hamano <gitster@pobox.com> writes:

> Philippe Blain <levraiphilippeblain@gmail.com> writes:
>
>>>> +test_expect_success 'fetching a superproject containing an uninitialized sub/sub project' '
>>>> +	# depends on previous test for setup
>>>> +
>>>> +	git -C B/ fetch &&
>>>> +	compare_refs_in_dir A origin/master B origin/master
>>>
>>> Can we do this without relying on the name of the default branch?
>>> Perhaps when outer, middle and inner are prepared, they can be
>>> forced to be on the 'sample' (not 'master' nor 'main') branch, or
>>> something like that?
>>
>> Or, simpler, we could call "git remote set-head -a"
>> in A and B in the setup script, which would make
>> origin/HEAD in A and B point to the default branch,
>> such that the call here could be:
>
> The set-up prepares A and B by cloning from elsewhere, no? Should
> we even need a set-head call?

Ah, they are created by cloning an empty repository. That explains why.

Thanks.

>> compare_refs_in_dir A origin/HEAD B origin/HEAD
>
> Yes, using HEAD would be another simple way to avoid having to rely
> on the default behaviour.
>
> Thanks.
On 07.12.20 20:43, Junio C Hamano wrote:
> Philippe Blain <levraiphilippeblain@gmail.com> writes:
>> Thanks for working on that, and sorry for not having the time to comment before
>> you sent v2.
>
> Thanks, all.
>
> It's not like we corral everybody to a single place on a single day
> to work on a single thing. Reviews of v2 by reviewers who have not
> seen v1 are totally expected and very much appreciated.
>

Thank you very much for all your comments; they're much appreciated. I prefer spending more time on it now to get it right, rather than going through another revert-rethink-refactor round. It will take me some time today to work on them.
On 07.12.20 19:42, Philippe Blain wrote:
> Hi Peter,
>
>> On 7 Dec 2020, at 08:46, Peter Kaestle <peter.kaestle@nokia.com> wrote:
>>
>> A regression has been introduced by a62387b (submodule.c: fetch in
>> submodules git directory instead of in worktree, 2018-11-28).
>>
>> The scenario in which it triggers is when one has a remote repository
>> with a subrepository inside a subrepository like this:
>> superproject/middle_repo/inner_repo
>
> The correct terminology is "submodule", not "subrepository".
>
> Also, (minor point) I would just write "when one has a repository",
> as it's simpler (the repository by itself is not "remote", it is only "remote"
> in relation to the repositories that are cloned from it).

ok.

>> Person A and B have both a clone of it, while Person B is not working
>> with the inner_repo and thus does not have it initialized in his working
>> copy.
>>
>> Now person A introduces a change to the inner_repo and propagates it
>> through the middle_repo and the superproject.
>>
>> Once person A pushed the changes and person B wants to fetch them using
>> "git fetch" on superproject level,
>
> s/on/at the/

ok.

>> B's git call will return with error
>> saying:
>>
>> Could not access submodule 'inner_repo'
>> Errors during submodule fetch:
>> middle_repo
>>
>> Expectation is that in this case the inner submodule will be recognized
>> as uninitialized subrepository and skipped by the git fetch command.
>
> here again, terminology: "as an uninitialized submodule"

ok.

>> This used to work correctly before 'a62387b (submodule.c: fetch in
>> submodules git directory instead of in worktree, 2018-11-28)'.
>>
>> Starting with a62387b the code wants to evaluate "is_empty_dir()" inside
>> .git/modules for a directory only existing in the worktree, delivering
>> then of course wrong return value.
>>
>> This patch ensures is_empty_dir() is getting the correct path of the
>> uninitialized submodule by concatenation of the actual worktree and the
>> name of the uninitialized submodule.
>>
>> Furthermore a regression test case is added, which tests for recursive
>> fetches on a superproject with uninitialized sub repositories. This
>> issue was leading to an infinite loop when doing a revert of a62387b.
>
> I would maybe add more details here, something like the following
> (we can cite your previous attempt, because it was merged to 'master'):
>
> The first attempt to fix this regression, in 1b7ac4e6d4 (submodules:
> fix of regression on fetching of non-init subsub-repo, 2020-11-12), by simply
> reverting a62387b, resulted in
> an infinite loop of submodule fetches in the simpler case of a recursive fetch of a superproject with
> uninitialized submodules, and so this commit was reverted in 7091499bc0 (Revert
> "submodules: fix of regression on fetching of non-init subsub-repo", 2020-12-02).
> To prevent future breakages, also add a regression test for this scenario.

Yep, I like that.
>>
>> Signed-off-by: Peter Kaestle <peter.kaestle@nokia.com>
>> CC: Junio C Hamano <gitster@pobox.com>
>> CC: Philippe Blain <levraiphilippeblain@gmail.com>
>> CC: Ralf Thielow <ralf.thielow@gmail.com>
>> CC: Eric Sunshine <sunshine@sunshineco.com>
>> ---
>>  submodule.c                 |   7 ++-
>>  t/t5526-fetch-submodules.sh | 104 ++++++++++++++++++++++++++++++++++++
>>  2 files changed, 110 insertions(+), 1 deletion(-)
>>
>> diff --git a/submodule.c b/submodule.c
>> index b3bb59f066..b561445329 100644
>> --- a/submodule.c
>> +++ b/submodule.c
>> @@ -1477,6 +1477,7 @@ static int get_next_submodule(struct child_process *cp,
>>  			strbuf_release(&submodule_prefix);
>>  			return 1;
>>  		} else {
>> +			struct strbuf empty_submodule_path = STRBUF_INIT;
>>  
>>  			fetch_task_release(task);
>>  			free(task);
>> @@ -1485,13 +1486,17 @@ static int get_next_submodule(struct child_process *cp,
>>  			 * An empty directory is normal,
>>  			 * the submodule is not initialized
>>  			 */
>> +			strbuf_addf(&empty_submodule_path, "%s/%s/",
>> +							spf->r->worktree,
>> +							ce->name);
>>  			if (S_ISGITLINK(ce->ce_mode) &&
>> -			    !is_empty_dir(ce->name)) {
>> +			    !is_empty_dir(empty_submodule_path.buf)) {
>>  				spf->result = 1;
>>  				strbuf_addf(err,
>>  					    _("Could not access submodule '%s'\n"),
>>  					    ce->name);
>>  			}
>> +			strbuf_release(&empty_submodule_path);
>>  		}
>>  	}
>
> Maybe a personal preference, but I would have gone for something a little simpler, like the following:
>
> diff --git a/submodule.c b/submodule.c
> index b3bb59f066..4200865174 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1486,7 +1486,7 @@ static int get_next_submodule(struct child_process *cp,
>  			 * the submodule is not initialized
>  			 */
>  			if (S_ISGITLINK(ce->ce_mode) &&
> -			    !is_empty_dir(ce->name)) {
> +			    !is_empty_dir(repo_worktree_path(spf->r, "%s", ce->name))) {

I'm not familiar enough with the git code to judge which approach is better. From my perspective, as a newcomer to the git code, I prefer my proposed code: a meaningful variable name makes it much easier for me to understand what's happening, without having to dig into other functions first.

Also, Junio C Hamano <gitster@pobox.com> raised a concern that I can't judge:
> But then you leak the return value from repo_worktree_path(), no?

Thus for v3 I'll stick to my proposal; when you review it, please discuss with each other whether I should go for a v4 using repo_worktree_path().

[...]
>> +
>> +test_expect_success 'setup recursive fetch with uninit submodule' '
>> +	# does not depend on any previous test setups
>> +
>> +	git init main &&
>> +	git init sub &&
>> +
>> +	>sub/file &&
>> +	git -C sub add file &&
>> +	git -C sub commit -m "add file" &&
>> +	git -C sub rev-parse HEAD >expect &&
>> +
>> +	git -C main submodule add ../sub &&
>> +	git -C main submodule init &&
>> +	git -C main submodule update --checkout &&
>
> These two steps are unnecessary as they are implicitly done by 'git submodule add'.
> I think we could reflect real life a little bit more by cloning the superproject, and running
> the 'recursive fetch with uninit submodule' test below in the clone.

Yes, you're right, "...init" and "...update..." can be removed.

>> +	git -C main submodule status >out &&
>> +	sed -e "s/^ //" -e "s/ sub .*$//" out >actual &&
>> +	test_cmp expect actual
>> +'
>> +
>> +test_expect_success 'recursive fetch with uninit submodule' '
>> +	# depends on previous test for setup
>> +
>> +	git -C main submodule deinit -f sub &&
>
> Here you are deiniting the submodule, such that
> the Git directory will stay in .git/modules/sub. This is not the same thing
> as a submodule that was never initialized ("uninitialized"), for which .git/modules/sub
> will not yet exist. So maybe we could harden the tests by also testing
> for that scenario? I don't know... maybe the infinite loop only happens
> if .git/modules/sub actually already exists. If so, the test name should be
> "recursive fetch with deinitialized submodule", I think.

I added another test case for v3 that checks this for a never-initialized submodule. Running the tests, I can see that the infinite-loop regression only occurs after an init followed by a deinit, so I'm renaming the test accordingly.
On 07.12.20 20:22, Junio C Hamano wrote:
> Peter Kaestle <peter.kaestle@nokia.com> writes:
>
>> +add_commit_push () {
>> +	dir="$1" &&
>> +	msg="$2" &&
>> +	shift 2 &&
>> +	git -C "$dir" add "$@" &&
>> +	git -C "$dir" commit -a -m "$msg" &&
>> +	git -C "$dir" push
>> +}
>> +
>> +compare_refs_in_dir () {
>> +	fail= &&
>> +	if test "x$1" = 'x!'
>> +	then
>> +		fail='!' &&
>> +		shift
>> +	fi &&
>> +	git -C "$1" rev-parse --verify "$2" >expect &&
>> +	git -C "$3" rev-parse --verify "$4" >actual &&
>> +	eval $fail test_cmp expect actual
>> +}
>
>> +test_expect_success 'fetching a superproject containing an uninitialized sub/sub project' '
>> +	# depends on previous test for setup
>> +
>> +	git -C B/ fetch &&
>> +	compare_refs_in_dir A origin/master B origin/master
>
> Can we do this without relying on the name of the default branch?
> Perhaps when outer, middle and inner are prepared, they can be
> forced to be on the 'sample' (not 'master' nor 'main') branch, or
> something like that?

Using origin/HEAD in compare_refs_in_dir should be fine without additional setup, since in the regression case "git -C B/ fetch" itself fails and returns a non-zero exit status (see the patch description).

The compare_refs_in_dir call is just an additional check, as you proposed in this mail:
https://public-inbox.org/git/xmqqk0uuct94.fsf@gitster.c.googlers.com/

------------8<-------------
> And from B that was an original copy of A with only the top and
> middle layer instantiated, you run "git fetch". Are you happy as
> long as "git fetch" does not exit with non-zero status? That is
> hard to believe---it may be a necessary condition for the command to
> exit with zero status, but you have other expectations, like what
> commit the remote tracking branch refs/remotes/origin/HEAD ought to
> be pointing at. I think we should check that, too.
----------->8-------------

>
>> +test_expect_success 'setup recursive fetch with uninit submodule' '
>> +	# does not depend on any previous test setups
>> +
>> +	git init main &&
>> +	git init sub &&
>
> "super vs sub" would give us a better contrast than "main vs sub",
> and it would help reduce mistakes in the mechanical conversion of
> "master" to "main" happening in another topic.
>

ok.

>> +	# In a regression the following git call will run into infinite recursion.
>> +	# To handle that, we connect the grep command to the git call by a pipe
>> +	# so that grep can kill the infinite recusion when detected.
>> +	# The recursion creates git output like:
>> +	# Fetching submodule sub
>> +	# Fetching submodule sub/sub              <-- [1]
>> +	# Fetching submodule sub/sub/sub
>> +	# ...
>> +	# [1] grep will trigger here and kill git by exiting and closing its stdin
>
> "trigger here and kill..." -> "stop reading and cause git to
> eventually stop and die"
>
> But we probably cannot use 'grep -m1' so it is a moot point.
>
>> +
>> +	! git -C main fetch --recurse-submodules 2>&1 |
>> +		grep -v -m1 "Fetching submodule sub$" &&
>
> Unfortunately, "grep -m<count>" is not even in POSIX, I would think.
>
> What do we expect to happen in the correct case?

Sigh, we can't use grep -m1. Too bad; it was such a nice solution.

>
>  - A line "Fetching submodule sub" and nothing else is given? That
>    feels a bit brittle (how are we making sure, in the presence of
>    "2>&1", that we will not get any other output, like progress?)
>
>  - "sub" is the only thing that appears on lines that begin with
>    "Fetching submodule" (i.e. "Fetching submodule $something" where
>    $something is not 'sub' is an error), and we allow other garbage
>    in the output? That would be a bit more robust than the above.
>
> As you seem to be comfortable using "sed" below, perhaps use it to
> extract the first few lines that say "^Fetching submodule " from the
> output and stop, and check that the output has only one such line
> about 'sub' and nothing else?

According to [1], POSIX sed offers an equivalent way to quit early, like grep -m1, and I'll adopt it:

$> yes posixgrepisnogoodforus | sed "/posix/q"

[1] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html

Looking at the other mails, I think I have addressed all open comments and will prepare v3.

Thanks again.
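The early-quit idea can be tried without git at all. In the sketch below, fake_recursive_fetch is a hypothetical stand-in for the regressed "git fetch --recurse-submodules": it keeps printing ever-deeper "Fetching submodule sub/sub/..." lines. sed quits at the first nested line; the writer's next echo then fails (or SIGPIPE kills it), which matches Junio's wording "stop reading and cause git to eventually stop and die". The depth cap of 1000 is an assumption of this sketch, there only to keep it finite if SIGPIPE happens to be ignored.

```shell
#!/bin/sh
# Hypothetical stand-in for the regressed "git fetch --recurse-submodules":
# it prints an ever-deeper submodule path and never finishes by itself.
# With default signal handling it is killed by SIGPIPE once sed is gone;
# otherwise the loop ends when echo reports the write error.
# The depth cap is only a safety net for this sketch.
fake_recursive_fetch () {
	path=sub
	depth=0
	while test "$depth" -lt 1000 &&
		echo "Fetching submodule $path"
	do
		path="$path/sub"
		depth=$((depth + 1))
	done
}

# sed copies lines up to and including the first nested "sub/sub" line,
# then quits ("q"), closing the read end of the pipe.
fake_recursive_fetch 2>/dev/null |
	sed -e '/^Fetching submodule sub\/sub/q' >out

# A nested line in the captured output means the recursion is back.
if grep -q '^Fetching submodule sub/sub' out
then
	echo "infinite recursion detected"
fi
```

A correct fetch never prints the nested line, so sed copies its whole output and the grep finds nothing.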
diff --git a/submodule.c b/submodule.c
index b3bb59f066..b561445329 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1477,6 +1477,7 @@ static int get_next_submodule(struct child_process *cp,
 			strbuf_release(&submodule_prefix);
 			return 1;
 		} else {
+			struct strbuf empty_submodule_path = STRBUF_INIT;
 
 			fetch_task_release(task);
 			free(task);
@@ -1485,13 +1486,17 @@ static int get_next_submodule(struct child_process *cp,
 			 * An empty directory is normal,
 			 * the submodule is not initialized
 			 */
+			strbuf_addf(&empty_submodule_path, "%s/%s/",
+			            spf->r->worktree,
+			            ce->name);
 			if (S_ISGITLINK(ce->ce_mode) &&
-			    !is_empty_dir(ce->name)) {
+			    !is_empty_dir(empty_submodule_path.buf)) {
 				spf->result = 1;
 				strbuf_addf(err,
 					    _("Could not access submodule '%s'\n"),
 					    ce->name);
 			}
+			strbuf_release(&empty_submodule_path);
 		}
 	}
 
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index dd8e423d25..666dd1e2b7 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -719,4 +719,108 @@ test_expect_success 'fetch new submodule commit intermittently referenced by sup
 	)
 '
 
+add_commit_push () {
+	dir="$1" &&
+	msg="$2" &&
+	shift 2 &&
+	git -C "$dir" add "$@" &&
+	git -C "$dir" commit -a -m "$msg" &&
+	git -C "$dir" push
+}
+
+compare_refs_in_dir () {
+	fail= &&
+	if test "x$1" = 'x!'
+	then
+		fail='!' &&
+		shift
+	fi &&
+	git -C "$1" rev-parse --verify "$2" >expect &&
+	git -C "$3" rev-parse --verify "$4" >actual &&
+	eval $fail test_cmp expect actual
+}
+
+
+test_expect_success 'setup nested submodule fetch test' '
+	# does not depend on any previous test setups
+
+	for repo in outer middle inner
+	do
+		git init --bare $repo &&
+		git clone $repo ${repo}_content &&
+		echo "$repo" >"${repo}_content/file" &&
+		add_commit_push ${repo}_content "initial" file ||
+		return 1
+	done &&
+
+	git clone outer A &&
+	git -C A submodule add "$pwd/middle" &&
+	git -C A/middle/ submodule add "$pwd/inner" &&
+	add_commit_push A/middle/ "adding inner sub" .gitmodules inner &&
+	add_commit_push A/ "adding middle sub" .gitmodules middle &&
+
+	git clone outer B &&
+	git -C B/ submodule update --init middle &&
+
+	compare_refs_in_dir A HEAD B HEAD &&
+	compare_refs_in_dir A/middle HEAD B/middle HEAD &&
+	test_path_is_file B/file &&
+	test_path_is_file B/middle/file &&
+	test_path_is_missing B/middle/inner/file &&
+
+	echo "change on inner repo of A" >"A/middle/inner/file" &&
+	add_commit_push A/middle/inner "change on inner" file &&
+	add_commit_push A/middle "change on inner" inner &&
+	add_commit_push A "change on inner" middle
+'
+
+test_expect_success 'fetching a superproject containing an uninitialized sub/sub project' '
+	# depends on previous test for setup
+
+	git -C B/ fetch &&
+	compare_refs_in_dir A origin/master B origin/master
+'
+
+
+test_expect_success 'setup recursive fetch with uninit submodule' '
+	# does not depend on any previous test setups
+
+	git init main &&
+	git init sub &&
+
+	>sub/file &&
+	git -C sub add file &&
+	git -C sub commit -m "add file" &&
+	git -C sub rev-parse HEAD >expect &&
+
+	git -C main submodule add ../sub &&
+	git -C main submodule init &&
+	git -C main submodule update --checkout &&
+	git -C main submodule status >out &&
+	sed -e "s/^ //" -e "s/ sub .*$//" out >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'recursive fetch with uninit submodule' '
+	# depends on previous test for setup
+
+	git -C main submodule deinit -f sub &&
+
+	# In a regression the following git call will run into infinite recursion.
+	# To handle that, we connect the grep command to the git call by a pipe
+	# so that grep can kill the infinite recursion when detected.
+	# The recursion creates git output like:
+	# Fetching submodule sub
+	# Fetching submodule sub/sub              <-- [1]
+	# Fetching submodule sub/sub/sub
+	# ...
+	# [1] grep will trigger here and kill git by exiting and closing its stdin
+
+	! git -C main fetch --recurse-submodules 2>&1 |
+		grep -v -m1 "Fetching submodule sub$" &&
+	git -C main submodule status >out &&
+	sed -e "s/^-//" -e "s/ sub$//" out >actual &&
+	test_cmp expect actual
+'
+
 test_done
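The comment in the last test relies on ordinary pipe semantics: once `grep -v -m1` exits after the first line it selects, the pipe has no reader left, so the next write from the runaway git process raises SIGPIPE and terminates it instead of letting it recurse forever. The POSIX C sketch below (not part of the patch) reproduces that interaction in isolation; `runaway_writer()` and `reader_stops_runaway_writer()` are hypothetical names, and the writer only imitates the "Fetching submodule sub/sub/..." output quoted in the comment.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the runaway "git fetch": prints one
 * "Fetching submodule ..." line per level, descending forever. */
static void runaway_writer(int fd)
{
    char path[128] = "sub";
    char line[160];

    signal(SIGPIPE, SIG_DFL); /* a write to a reader-less pipe must kill us */
    for (;;) {
        int n = snprintf(line, sizeof(line), "Fetching submodule %s\n", path);

        assert(n > 0 && (size_t)n < sizeof(line));
        if (write(fd, line, (size_t)n) < 0)
            _exit(2); /* only reached if SIGPIPE did not kill us (EPIPE) */
        if (strlen(path) + 4 < sizeof(path))
            strcat(path, "/sub"); /* one level deeper, capped to the buffer */
    }
}

/* Forks the writer and reads its output like
 * `grep -v -m1 "Fetching submodule sub$"`: stop at the first line that is
 * not exactly "Fetching submodule sub", then close the read end.
 * Returns 1 if such a line was seen and the writer died of SIGPIPE,
 * 0 otherwise, -1 on setup errors. */
int reader_stops_runaway_writer(void)
{
    int fds[2], status, matched = 0;
    char buf[256];
    size_t len = 0;
    pid_t pid;

    if (pipe(fds) < 0)
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(fds[0]);
        runaway_writer(fds[1]);
        _exit(0); /* not reached: runaway_writer() never returns */
    }
    close(fds[1]);

    for (;;) {
        char c;
        ssize_t r = read(fds[0], &c, 1);

        if (r <= 0)
            break; /* EOF or error: no unexpected line was seen */
        if (c != '\n') {
            if (len < sizeof(buf) - 1)
                buf[len++] = c;
            continue;
        }
        buf[len] = '\0';
        len = 0;
        if (strcmp(buf, "Fetching submodule sub") != 0) {
            matched = 1; /* "sub/sub": recursion went one level too deep */
            break;
        }
    }
    close(fds[0]); /* the reader "exits": the pipe has no reader left */

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return matched && WIFSIGNALED(status) && WTERMSIG(status) == SIGPIPE;
}
```

`reader_stops_runaway_writer()` should return 1: the reader stops at "Fetching submodule sub/sub" and the writer is killed by SIGPIPE rather than looping. In the test itself, `!` only inspects grep's exit status; the SIGPIPE is what lets the pipeline finish when git has regressed.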
A regression was introduced by a62387b (submodule.c: fetch in
submodules git directory instead of in worktree, 2018-11-28).

The scenario in which it triggers is when one has a repository with a
submodule inside a submodule, like this:

    superproject/middle_repo/inner_repo

Persons A and B both have a clone of it, while person B is not working
with the inner_repo and thus does not have it initialized in his working
copy.

Now person A introduces a change to the inner_repo and propagates it
through the middle_repo and the superproject.

Once person A has pushed the changes and person B wants to fetch them
using "git fetch" at the superproject level, B's git call returns with
an error saying:

    Could not access submodule 'inner_repo'
    Errors during submodule fetch:
            middle_repo

The expectation is that in this case the inner submodule will be
recognized as an uninitialized submodule and skipped by the git fetch
command.

This used to work correctly before a62387b (submodule.c: fetch in
submodules git directory instead of in worktree, 2018-11-28).

Starting with a62387b, the code evaluates is_empty_dir() relative to
the submodule's git directory inside .git/modules, for a directory that
exists only in the worktree, and therefore gets the wrong return value.

This patch ensures that is_empty_dir() gets the correct path of the
uninitialized submodule by concatenating the path of the actual
worktree and the name of the uninitialized submodule.

Furthermore, regression tests are added for this scenario and for
recursive fetches on a superproject with uninitialized submodules; the
latter case led to an infinite loop when a62387b was reverted.

Signed-off-by: Peter Kaestle <peter.kaestle@nokia.com>
CC: Junio C Hamano <gitster@pobox.com>
CC: Philippe Blain <levraiphilippeblain@gmail.com>
CC: Ralf Thielow <ralf.thielow@gmail.com>
CC: Eric Sunshine <sunshine@sunshineco.com>
---
 submodule.c                 |   7 ++-
 t/t5526-fetch-submodules.sh | 104 ++++++++++++++++++++++++++++++++++++
 2 files changed, 110 insertions(+), 1 deletion(-)
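The path problem described in the commit message can be reproduced outside git with a small POSIX C sketch. `is_empty_dir_sketch()` is modeled on git's `is_empty_dir()` and assumes, as that function appears to do, that a directory which cannot be opened counts as "not empty"; `submodule_path_sketch()` and `demo_uninit_submodule_check()` are hypothetical helpers, and the `/tmp` layout merely stands in for a worktree and a git directory under `.git/modules`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Modeled on git's is_empty_dir(): 1 if `path` opens and holds nothing
 * but "." and "..", 0 otherwise (including when opendir() fails). */
static int is_empty_dir_sketch(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *e;
    int empty = 1;

    if (!dir)
        return 0;
    while ((e = readdir(dir)) != NULL) {
        if (strcmp(e->d_name, ".") != 0 && strcmp(e->d_name, "..") != 0) {
            empty = 0;
            break;
        }
    }
    closedir(dir);
    return empty;
}

/* Hypothetical helper mirroring the patch: builds "<worktree>/<name>/".
 * Returns 0 on success, -1 if the result does not fit into `out`. */
static int submodule_path_sketch(char *out, size_t size,
                                 const char *worktree, const char *name)
{
    int n;

    assert(worktree && name);
    n = snprintf(out, size, "%s/%s/", worktree, name);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Creates <tmp>/worktree/inner (an empty, uninitialized submodule) and
 * <tmp>/gitdir, then chdir()s into gitdir, as the fetch now runs inside
 * the submodule's git directory. Stores the result of the relative and of
 * the worktree-based lookup. Returns 0 on success, -1 on setup errors. */
int demo_uninit_submodule_check(int *relative_result, int *worktree_result)
{
    char root[] = "/tmp/submodule-sketch-XXXXXX";
    char worktree[64], gitdir[64], inner[128], path[128], oldcwd[4096];

    if (!getcwd(oldcwd, sizeof(oldcwd)) || !mkdtemp(root))
        return -1;
    snprintf(worktree, sizeof(worktree), "%s/worktree", root);
    snprintf(gitdir, sizeof(gitdir), "%s/gitdir", root);
    snprintf(inner, sizeof(inner), "%s/inner", worktree);
    if (mkdir(worktree, 0700) || mkdir(gitdir, 0700) || mkdir(inner, 0700))
        return -1;
    if (chdir(gitdir))
        return -1;

    /* Buggy lookup: "inner" is resolved against gitdir, where it is absent. */
    *relative_result = is_empty_dir_sketch("inner");
    /* Fixed lookup: anchor the submodule name at the worktree. */
    if (submodule_path_sketch(path, sizeof(path), worktree, "inner"))
        return -1;
    *worktree_result = is_empty_dir_sketch(path);

    /* Restore the cwd and remove the (empty) directories again. */
    if (chdir(oldcwd))
        return -1;
    rmdir(inner);
    rmdir(worktree);
    rmdir(gitdir);
    rmdir(root);
    return 0;
}
```

After `demo_uninit_submodule_check(&rel, &wt)`, `rel` should be 0 (the relative lookup misses the directory, which is what produced "Could not access submodule") and `wt` should be 1 (the worktree-anchored lookup sees the empty directory, so the uninitialized submodule is skipped).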