diff mbox series

[RFC,1/2] submodules: test for fetch of non-init subsub-repo

Message ID 1604910829-49109-2-git-send-email-peter.kaestle@nokia.com (mailing list archive)
State New, archived
Headers show
Series Handling regression introduced by a62387b | expand

Commit Message

Peter Kaestle Nov. 9, 2020, 8:33 a.m. UTC
This test case triggers a regression, which was introduced by
a62387b3fc9f5aeeb04a2db278121d33a9caafa7 in following setup:
outer_repo/middle_repo/inner_repo and a change in the remote of
inner_repo happens.  Then it's being fetched by a second clone of the
outer repo, in which the middle is initialized, but the inner is not.

This causes is_empty_dir() in submodule.c:get_next_submodule() to
check for a directory only existing in the actual worktree, while the
is_empty_dir() being called from .git/modules.

Signed-off-by: Peter Kaestle <peter.kaestle@nokia.com>
---
 t/t5526-fetch-submodules.sh | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

Comments

Junio C Hamano Nov. 9, 2020, 5:52 p.m. UTC | #1
Peter Kaestle <peter.kaestle@nokia.com> writes:

> This test case triggers a regression, which was introduced by
> a62387b3fc9f5aeeb04a2db278121d33a9caafa7 in following setup:

Minor nit.  Please refer to a commit like so:

a62387b3 (submodule.c: fetch in submodules git directory instead of in worktree, 2018-11-28)

That is what "git show -s --pretty=reference" gives for the commit.

If you have older git, "--format='%h (%s, %ad)' --date=short" would
work.

Instead of saying "if you follow this complex thing, it breaks and
it is a regression at there", please describe it as a regular bugfix
log message.  Describe the set-up first, explain the operation you'd
perform under the condition, and tell readers what your expected
outcome is.  Then tell readers what actually happens, and how that
is different from your expected outcome.  Additionally, tell readers
that it used to work before such and such commit broke it and what
the root cause of the breakage is.

What commit the set-up was broken is also an interesting piece of
information, but it is not as important in the overall picture.

Also, it probably is a better arrangement, after explaining how the
current system does not work in the log message, to have the code
fix in the same patch and add test to ensure the bug will stay
fixed, in a single patch.  That way, you do not have to start with
expect_failure and then flip the polarity to expect_success, which
is a horrible style for reviewers to understand the code fix because
the second "fix" step does not actually show the effect of what got
fixed in the patch (the test change shows the flip of the polarity
of the test plus only a few context lines and does not show what
behaviour change the "fix" causes).


> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index dd8e423..9fbd481 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -719,4 +719,42 @@ test_expect_success 'fetch new submodule commit intermittently referenced by sup
>  	)
>  '
>  
> +add_commit_push()
> +{

Style. 

    add_commit_push () {

cf. Documentation/CodingGuidelines.

> +	dir="$1"
> +	msg="$2"
> +	shift 2
> +	git -C "$dir" add "$@" &&
> +	git -C "$dir" commit -a -m "$msg" &&
> +	git -C "$dir" push
> +}
> +
> +test_expect_failure 'fetching a superproject containing an uninitialized sub/sub project' '
> +	# does not depend on any previous test setups
> +
> +	for repo in outer middle inner
> +	do
> +		git init --bare $repo &&
> +		git clone $repo ${repo}_content &&
> +		echo $repo > ${repo}_content/file &&

Style. 

    echo "$repo" >"${repo}_content/file" &&

cf. Documentation/CodingGuidelines.

> +		add_commit_push ${repo}_content "initial" file

If any of these iterations, except for the last one, fails in the
loop, you do not notice the breakage and go on to the next
iteration.  You'd need "|| return 1" at the end, perhaps.

So far, you created three bare repositories called outer, middle and
inner, and each of {outer,middle,inner}_content repositories is a
copy with a working tree of its counterpart.

> +	done &&
> +
> +	git clone outer A &&
> +	git -C A submodule add "$pwd/middle" &&
> +	git -C A/middle/ submodule add "$pwd/inner" &&

Hmph.  Is it essential to name these directories with full pathname
for the problem to reproduce, or would the issue also appear if
these repositories refer to each other with relative pathnames?
Just being curious---if it only breaks with one and succeeds with
the other, that deserves commenting here.

So far, you created A that is "outer", added "middle" as its
submodule and then added "inner" as a submodule of "middle".

Although it is not wrong per-se, it somehow feels a bit unnatural
that you didn't do all of the above in the working trees you created
in the previous step---I would have expected that middle_content
working tree would be used to add "inner" as its submodule, for
example.


> +	add_commit_push A/middle/ "adding inner sub" .gitmodules inner &&
> +	add_commit_push A/ "adding middle sub" .gitmodules middle &&

And then you conclude the addition of submodules by recording each
of these two "submodule add" events in a commit and push it out.

> +	git clone outer B &&
> +	git -C B/ submodule update --init middle &&

And then you clone the outer thing (which does not recursively
instantiate) from A, and instantiate the middle layer (which does
not recursively instantiate the bottom later, I presume?)

I _think_ the state here should be minimally validated in this test.
If you expect 'outer' and 'middle' are instantiated, perhaps check
its contents (e.g. do you have a thing called 'file'?  What does it
have in it?) and check the commit (e.g. does 'rev-parse HEAD' give
you the commit you expect?).  If you expect 'inner' is not
instantiated at this point, that should be vaildated as well.  If
anything, that would explain what your expectations are better than
any word in the proposed log message.

In any case, i presume that up to this point things work as expected
with or without the "fix" patch?  If so, the usual way we structure
these tests is to stop here and make that a single "setup" test.
Start the whole sequence above like so, perhaps.

    test_expect_success 'setup nested submodule fetch test' '
		...


and close here.

And then the "interesting" part of the test.  

> +	echo "change on inner repo of A" > A/middle/inner/file &&

Style.

> +	add_commit_push A/middle/inner "change on inner" file &&
> +	add_commit_push A/middle "change on inner" inner &&
> +	add_commit_push A "change on inner" middle &&

So you create a new commit in the bottom layer, propagate it up to
the middle layer, and to the outer layer.  Are these steps also what
you expect to succeed, or does the "regression" break any of these?
If these are still part of set-up that is expected to work, you
probably need to roll these up to the 'setup' step (with some
validation to express what the tests are expecting). From your
description, which did not say where exactly in this long sequence
you expect things to break, unfortunately no reader can tell, so
I'll leave the restructuring up to you.

> +
> +	git -C B/ fetch

And from B that was an original copy of A with only the top and
middle layer instantiated, you run "git fetch".  Are you happy as
long as "git fetch" does not exit with non-zero status?  That is
hard to believe---it may be a necessary condition for the command to
exit with zero status, but you have other expectations, like what
commit the remote tracking branch refs/remotes/origin/HEAD ought to
be pointing at.  I think we should check that, too.

> +'
> +
>  test_done
Peter Kaestle Nov. 11, 2020, 12:45 p.m. UTC | #2
On 09.11.20 18:52, Junio C Hamano wrote:
> Peter Kaestle <peter.kaestle@nokia.com> writes:
> 
>> This test case triggers a regression, which was introduced by
>> a62387b3fc9f5aeeb04a2db278121d33a9caafa7 in following setup:
> 
> Minor nit.  Please refer to a commit like so:
> 
> a62387b3 (submodule.c: fetch in submodules git directory instead of in worktree, 2018-11-28)
> 
> That is what "git show -s --pretty=reference" gives for the commit.
> 
> If you have older git, "--format='%h (%s, %ad)' --date=short" would
> work.

Thanks for this hint, this is really useful.


> Instead of saying "if you follow this complex thing, it breaks and
> it is a regression at there", please describe it as a regular bugfix
> log message.  Describe the set-up first, explain the operation you'd
> perform under the condition, and tell readers what your expected
> outcome is.  Then tell readers what actually happens, and how that
> is different from your expected outcome.  Additionally, tell readers
> that it used to work before such and such commit broke it and what
> the root cause of the breakage is.

hm... I did do this in the cover letter, maybe you missed it or I was 
not able to express myself good enough there.
Anyhow, I'll add it to the commit messages, which goes into the log.

Here is my proposal for a new commit message of the test case:

----8<----
A regression has been introduced by 'a62387b (submodule.c: fetch in 
submodules git directory instead of in worktree, 2018-11-28)'.

The scenario in which it triggers is when one has a remote repository 
with a subrepository inside a subrepository like this:
superproject/middle_repo/inner_repo

Person A and B have both a clone of it, while Person B is not working
with the inner_repo and thus does not have it initialized in his working
copy.

Now person A introduces a change to the inner_repo and propagates it 
through the middle_repo and the superproject.
Once person A pushed the changes and person B wants to fetch them using 
"git fetch" on superproject level, git will return with error saying:

Could not access submodule 'inner_repo'
Errors during submodule fetch:
         middle_repo

Expectation is that in this case the inner submodule will be recognized 
as uninitialized subrepository and skipped by the git fetch command.

This used to work correctly before 'a62387b (submodule.c: fetch in 
submodules git directory instead of in worktree, 2018-11-28)'.

Starting with a62387b the code wants to evaluate "is_empty_dir()" inside 
.git/modules for a directory only existing in the worktree, delivering 
then of course wrong return value.
---->8----


About the revert of the a62387b commit, which I proposed in the second 
patch, I'm not sure it's the right way.  The revert was simply my quick 
approach to fix it.  As I'm not fully aware of what the idea was behind 
handling the submodules inside .git/modules instead of the worktree, I 
don't know whether this is the best solution.  Maybe rethinking the 
whole get_next_submodule() algorithm or simply fixing the is_empty_dir() 
to use the worktree path will be a better solution.
--> We should discuss about this.


> What commit the set-up was broken is also an interesting piece of
> information, but it is not as important in the overall picture.
> 
> Also, it probably is a better arrangement, after explaining how the
> current system does not work in the log message, to have the code
> fix in the same patch and add test to ensure the bug will stay
> fixed, in a single patch.  That way, you do not have to start with
> expect_failure and then flip the polarity to expect_success, which
> is a horrible style for reviewers to understand the code fix because
> the second "fix" step does not actually show the effect of what got
> fixed in the patch (the test change shows the flip of the polarity
> of the test plus only a few context lines and does not show what
> behaviour change the "fix" causes).

Ok, will deliver the test and the fix proposal in a single patch.


>> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
>> index dd8e423..9fbd481 100755
>> --- a/t/t5526-fetch-submodules.sh
>> +++ b/t/t5526-fetch-submodules.sh
>> @@ -719,4 +719,42 @@ test_expect_success 'fetch new submodule commit intermittently referenced by sup
>>   	)
>>   '
>>   
>> +add_commit_push()
>> +{
> 
> Style.
> 
>      add_commit_push () {

ok.


> cf. Documentation/CodingGuidelines.
> 
>> +	dir="$1"
>> +	msg="$2"
>> +	shift 2
>> +	git -C "$dir" add "$@" &&
>> +	git -C "$dir" commit -a -m "$msg" &&
>> +	git -C "$dir" push
>> +}
>> +
>> +test_expect_failure 'fetching a superproject containing an uninitialized sub/sub project' '
>> +	# does not depend on any previous test setups
>> +
>> +	for repo in outer middle inner
>> +	do
>> +		git init --bare $repo &&
>> +		git clone $repo ${repo}_content &&
>> +		echo $repo > ${repo}_content/file &&
> 
> Style.
> 
>      echo "$repo" >"${repo}_content/file" &&

ok.


> cf. Documentation/CodingGuidelines.
> 
>> +		add_commit_push ${repo}_content "initial" file
> 
> If any of these iterations, except for the last one, fails in the
> loop, you do not notice the breakage and go on to the next
> iteration.  You'd need "|| return 1" at the end, perhaps.

yes, I definitely missed that.


> So far, you created three bare repositories called outer, middle and
> inner, and each of {outer,middle,inner}_content repositories is a
> copy with a working tree of its counterpart.
> 
>> +	done &&
>> +
>> +	git clone outer A &&
>> +	git -C A submodule add "$pwd/middle" &&
>> +	git -C A/middle/ submodule add "$pwd/inner" &&
> 
> Hmph.  Is it essential to name these directories with full pathname
> for the problem to reproduce, or would the issue also appear if
> these repositories refer to each other with relative pathnames?
> Just being curious---if it only breaks with one and succeeds with
> the other, that deserves commenting here.

Haven't tried that as the case was intended to simulate an environment, 
where one has remote repositories.  And with remote repositories, you 
have an url, which is kind of absolute path.  When reading the failing 
code, I doubt that it really matters.


> So far, you created A that is "outer", added "middle" as its
> submodule and then added "inner" as a submodule of "middle".
> 
> Although it is not wrong per-se, it somehow feels a bit unnatural
> that you didn't do all of the above in the working trees you created
> in the previous step---I would have expected that middle_content
> working tree would be used to add "inner" as its submodule, for
> example.

Not sure I got your concern, maybe it helps you to understand when I add 
this scenario description which we want to mimic:
The "bare" repos outer, middle and inner are created by an administrator 
on a remote server.  Person A is preparing the split of the sources for 
all the other users working in the environment by adding the submodules 
the way which is specified by the software architecture we intend to 
develop in.


>> +	add_commit_push A/middle/ "adding inner sub" .gitmodules inner &&
>> +	add_commit_push A/ "adding middle sub" .gitmodules middle &&
> 
> And then you conclude the addition of submodules by recording each
> of these two "submodule add" events in a commit and push it out.
> 
>> +	git clone outer B &&
>> +	git -C B/ submodule update --init middle &&
> 
> And then you clone the outer thing (which does not recursively
> instantiate) from A, and instantiate the middle layer (which does
> not recursively instantiate the bottom later, I presume?)

Yes, Person B is cloning into the outer layer without recursively going 
into all the submodules, just initializing the ones, which he is 
expected to work on.  In the tests scenario he's only working on the 
middle layer, but not on the inner one.


> I _think_ the state here should be minimally validated in this test.

Of course we could do so.  My intention was to keep it focused on the 
one thing which we needed to test.  Namely the fetch of an outer repo 
with an uninitialized sub-sub repo.


> If you expect 'outer' and 'middle' are instantiated, perhaps check
> its contents (e.g. do you have a thing called 'file'?  What does it
> have in it?) and check the commit (e.g. does 'rev-parse HEAD' give
> you the commit you expect?).  If you expect 'inner' is not
> instantiated at this point, that should be vaildated as well.  If
> anything, that would explain what your expectations are better than
> any word in the proposed log message.
> 
> In any case, i presume that up to this point things work as expected
> with or without the "fix" patch?  If so, the usual way we structure
> these tests is to stop here and make that a single "setup" test.
> Start the whole sequence above like so, perhaps.
> 
>      test_expect_success 'setup nested submodule fetch test' '
> 		...

Ok, got it, will refactor.


> And then the "interesting" part of the test.
> 
>> +	echo "change on inner repo of A" > A/middle/inner/file &&
> 
> Style.

ok.


>> +	add_commit_push A/middle/inner "change on inner" file &&
>> +	add_commit_push A/middle "change on inner" inner &&
>> +	add_commit_push A "change on inner" middle &&
> 
> So you create a new commit in the bottom layer, propagate it up to
> the middle layer, and to the outer layer.  Are these steps also what
> you expect to succeed, or does the "regression" break any of these?
> If these are still part of set-up that is expected to work, you
> probably need to roll these up to the 'setup' step (with some
> validation to express what the tests are expecting). From your
> description, which did not say where exactly in this long sequence
> you expect things to break, unfortunately no reader can tell, so
> I'll leave the restructuring up to you.

Yes those steps are also expected to succeed, it's just important that 
the initial clone of B happens before those pushes.  For your proposed 
restructuring this could also go into the setup step.  Leaving only one 
single command for the actual test to fail:


>> +
>> +	git -C B/ fetch
> 
> And from B that was an original copy of A with only the top and
> middle layer instantiated, you run "git fetch".  Are you happy as
> long as "git fetch" does not exit with non-zero status?  That is
> hard to believe---it may be a necessary condition for the command to
> exit with zero status, but you have other expectations, like what
> commit the remote tracking branch refs/remotes/origin/HEAD ought to
> be pointing at.  I think we should check that, too.

Checking for return code is the one thing which catches this regression, 
but checking whether all the repositories are at the correct HEAD is 
another thing which we probably want to have in for testing future 
changes on the respective part of the code.  Will add it.

Thank you very much for all the comments, I learned a lot by processing 
through them.  I'll send a patch v2 soon.
Philip Oakley Nov. 11, 2020, 5:22 p.m. UTC | #3
minor readability comment,

On 11/11/2020 12:45, Peter Kästle wrote:
> Here is my proposal for a new commit message of the test case:
>
> ----8<----
> A regression has been introduced by 'a62387b (submodule.c: fetch in
> submodules git directory instead of in worktree, 2018-11-28)'.
>
> The scenario in which it triggers is when one has a remote repository
> with a subrepository inside a subrepository like this:
> superproject/middle_repo/inner_repo
>
> Person A and B have both a clone of it, while Person B is not working
> with the inner_repo and thus does not have it initialized in his working
> copy.
>
> Now person A introduces a change to the inner_repo and propagates it
> through the middle_repo and the superproject.
> Once person A pushed the changes and person B wants to fetch them
> using "git fetch" on superproject level, git

It's not obviously obvious which person is doing this final 'git'
operation (it isn't attached to a particular person). Not sure if moving
the comma, or saying ", B's 'git fetch' will.." is the right choice. The
following sentences also feel as treating 'git' as person (trusted
friend;-).

> will return with error saying:
>
> Could not access submodule 'inner_repo'
> Errors during submodule fetch:
>         middle_repo
>
> Expectation is that in this case the inner submodule will be
> recognized as uninitialized subrepository and skipped by the git fetch
> command.
>
> This used to work correctly before 'a62387b (submodule.c: fetch in
> submodules git directory instead of in worktree, 2018-11-28)'.
>
> Starting with a62387b the code wants to evaluate "is_empty_dir()"
> inside .git/modules for a directory only existing in the worktree,
> delivering then of course wrong return value.
> ---->8----
--
Philip
Philippe Blain Nov. 11, 2020, 5:35 p.m. UTC | #4
Hi Junio,

> Le 9 nov. 2020 à 12:52, Junio C Hamano <gitster@pobox.com> a écrit :
> 
> Peter Kaestle <peter.kaestle@nokia.com> writes:
> 
>> This test case triggers a regression, which was introduced by
>> a62387b3fc9f5aeeb04a2db278121d33a9caafa7 in following setup:
> 
> Also, it probably is a better arrangement, after explaining how the
> current system does not work in the log message, to have the code
> fix in the same patch and add test to ensure the bug will stay
> fixed, in a single patch.  That way, you do not have to start with
> expect_failure and then flip the polarity to expect_success, which
> is a horrible style for reviewers to understand the code fix because
> the second "fix" step does not actually show the effect of what got
> fixed in the patch (the test change shows the flip of the polarity
> of the test plus only a few context lines and does not show what
> behaviour change the "fix" causes).

I had learned by browsing the list that this was the preferred way to
submit patches for bug fixes and regressions for this project, but I had
not yet read a good justification as to *why* it was preferred. 

Thanks for spelling it out; I think a quick paragraph about this somewhere
in SubmittingPatches would be a good addition for new contributors.

Cheers,

Philippe.
Philippe Blain Nov. 11, 2020, 7:24 p.m. UTC | #5
Hi Peter,


> Le 9 nov. 2020 à 03:33, Peter Kaestle <peter.kaestle@nokia.com> a écrit :
> 
> --- snip ---
> 
> +add_commit_push()
> +{
> +	dir="$1"
> +	msg="$2"
> +	shift 2
> +	git -C "$dir" add "$@" &&
> +	git -C "$dir" commit -a -m "$msg" &&
> +	git -C "$dir" push
> +}
> +
> +test_expect_failure 'fetching a superproject containing an uninitialized sub/sub project' '
> +	# does not depend on any previous test setups
> +
> +	for repo in outer middle inner
> +	do
> +		git init --bare $repo &&
> +		git clone $repo ${repo}_content &&
> +		echo $repo > ${repo}_content/file &&
> +		add_commit_push ${repo}_content "initial" file
> +	done &&
> +
> +	git clone outer A &&
> +	git -C A submodule add "$pwd/middle" &&
> +	git -C A/middle/ submodule add "$pwd/inner" &&
> +	add_commit_push A/middle/ "adding inner sub" .gitmodules inner &&
> +	add_commit_push A/ "adding middle sub" .gitmodules middle &&
> +
> +	git clone outer B &&
> +	git -C B/ submodule update --init middle &&
> +
> +	echo "change on inner repo of A" > A/middle/inner/file &&
> +	add_commit_push A/middle/inner "change on inner" file &&
> +	add_commit_push A/middle "change on inner" inner &&
> +	add_commit_push A "change on inner" middle &&

In addition to what Junio wrote, maybe your test could make better use of the 
functions provided by the test harness library. Take a look at t/README [1] for a 
partial list, and t/test-lib-functions.sh [2] for the full list. test_create_repo, in particular,
could be useful.

Cheers,

Philippe.


[1] https://github.com/git/git/blob/master/t/README#L743
[2] https://github.com/git/git/blob/master/t/test-lib-functions.sh
Junio C Hamano Nov. 11, 2020, 7:27 p.m. UTC | #6
Philippe Blain <levraiphilippeblain@gmail.com> writes:

> Thanks for spelling it out; I think a quick paragraph about this somewhere
> in SubmittingPatches would be a good addition for new contributors.

Patches welcome ;-)
diff mbox series

Patch

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index dd8e423..9fbd481 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -719,4 +719,42 @@  test_expect_success 'fetch new submodule commit intermittently referenced by sup
 	)
 '
 
+add_commit_push()
+{
+	dir="$1"
+	msg="$2"
+	shift 2
+	git -C "$dir" add "$@" &&
+	git -C "$dir" commit -a -m "$msg" &&
+	git -C "$dir" push
+}
+
+test_expect_failure 'fetching a superproject containing an uninitialized sub/sub project' '
+	# does not depend on any previous test setups
+
+	for repo in outer middle inner
+	do
+		git init --bare $repo &&
+		git clone $repo ${repo}_content &&
+		echo $repo > ${repo}_content/file &&
+		add_commit_push ${repo}_content "initial" file
+	done &&
+
+	git clone outer A &&
+	git -C A submodule add "$pwd/middle" &&
+	git -C A/middle/ submodule add "$pwd/inner" &&
+	add_commit_push A/middle/ "adding inner sub" .gitmodules inner &&
+	add_commit_push A/ "adding middle sub" .gitmodules middle &&
+
+	git clone outer B &&
+	git -C B/ submodule update --init middle &&
+
+	echo "change on inner repo of A" > A/middle/inner/file &&
+	add_commit_push A/middle/inner "change on inner" file &&
+	add_commit_push A/middle "change on inner" inner &&
+	add_commit_push A "change on inner" middle &&
+
+	git -C B/ fetch
+'
+
 test_done