[v2] submodule: mark submodules with update=none as inactive

Message ID 20210701225117.909892-1-sandals@crustytoothpaste.net (mailing list archive)
State New, archived

Commit Message

brian m. carlson July 1, 2021, 10:51 p.m. UTC
When the user recursively clones a repository with submodules and one or
more of those submodules is marked with the submodule.<name>.update=none
configuration, the submodule will end up being active.  This is a
problem because we will have skipped cloning or checking out the
submodule, and as a result, other commands, such as git reset or git
checkout, will fail if they are invoked with --recurse-submodules (or
when submodule.recurse is true).

This is obviously not the behavior the user wanted, so let's fix this by
specifically setting the submodule as inactive in this case when we're
initializing the repository.  That will make us properly ignore the
submodule when performing recursive operations.

We only do this when initializing a submodule, since git submodule
update can update the submodule with various options despite the setting
of "none" and we want those options to override it as they currently do.

Reported-by: Rose Kunkel <rose@rosekunkel.me>
Helped-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/submodule--helper.c |  6 ++++++
 t/t5601-clone.sh            | 24 ++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

Comments

Philippe Blain July 9, 2021, 8:26 p.m. UTC | #1
Hi brian,

[re-cc'ing Emily and Jonathan, whom Junio cc'ed in <xmqqeed2sdwc.fsf@gitster.g>
but who seem to have been dropped when you sent v1 and v2 of the patch]

On 2021-07-01 at 18:51, brian m. carlson wrote:
> When the user recursively clones a repository with submodules 

Here I would add:

", or runs 'git submodule update --init' after a
non-recursive clone of such a repository, "

> and one or
> more of those submodules is marked with the submodule.<name>.update=none
> configuration, the submodule 

"those submodules" would be clearer, I think.

> will end up being active.  This is a
> problem because we will have skipped cloning or checking out the
> submodule, and as a result, other commands, such as git reset or git
> checkout, will fail if they are invoked with --recurse-submodules (or
> when submodule.recurse is true).
> 
> This is obviously not the behavior the user wanted, so let's fix this by
> specifically setting the submodule as inactive in this case when we're
> initializing the repository.  That will make us properly ignore the
> submodule when performing recursive operations.
> 
> We only do this when initializing a submodule, 

Here for even more clarity I would add:

i.e. 'git submodule init' or 'git submodule update --init',

> since git submodule
> update can update the submodule with various options despite the setting
> of "none" and we want those options to override it as they currently do.
> 
> Reported-by: Rose Kunkel <rose@rosekunkel.me>
> Helped-by: Philippe Blain <levraiphilippeblain@gmail.com>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>   builtin/submodule--helper.c |  6 ++++++
>   t/t5601-clone.sh            | 24 ++++++++++++++++++++++++
>   2 files changed, 30 insertions(+)

As I said in my review of v1, I think this would warrant a mention in the doc.

In general, I think 'git-submodule(1)' could be more precise about which submodules
are touched by which subcommands. Since the topic that introduced the 'active' concept
was merged in a93dcb0a56 (Merge branch 'bw/submodule-is-active', 2017-03-30), these subcommands
only recurse into active submodules:

- init (with a big caveat, see below)
- sync
- update

The doc makes no mention of that for sync and update. sync says it synchronizes 'all'
submodules, and update says it updates 'registered' submodules ('registered' is not
defined formally anywhere either). And 'active' is mentioned in the description of
'init', but not defined. It would be good to explicitly say "see the 'Active submodules'
section in gitsubmodules(7) for a definition of 'active'", or something like that.

I'm not saying we need to fix that necessarily in this patch, I'm just noting
what my reading of the code and of the doc reveals.

> 
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index ae6174ab05..a3f8c45d97 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -686,6 +686,12 @@ static void init_submodule(const char *path, const char *prefix,
>   
>   		if (git_config_set_gently(sb.buf, upd))
>   			die(_("Failed to register update mode for submodule path '%s'"), displaypath);
> +
> +		if (sub->update_strategy.type == SM_UPDATE_NONE) {
> +			strbuf_reset(&sb);
> +			strbuf_addf(&sb, "submodule.%s.active", sub->name);
> +			git_config_set_gently(sb.buf, "false");
> +		}
>   	}
>   	strbuf_release(&sb);
>   	free(displaypath);

I did more testing with this patch applied and I fear it is not
completely sufficient. There are 2 main problems, I think.
The first is that the following still triggers the bug:

     git clone server client
     git -C client submodule update --init
     git -C client submodule init       # should be no-op, but isn't
     git -C client reset --hard --recurse-submodules

That's because:

1) 'git submodule init' operates on *all* submodules if 'submodule.active' is unset
     and no <path> is given
     (see submodule--helper.c::module_init, or the doc [1]).
2) 'git submodule init' sets 'submodule.$name.active' to true for the submodules
     on which it operates, unless already covered by 'submodule.active'
     (see submodule--helper.c::init_submodule)
3) the code we're adding to set 'active' to false if 'update=none' is only executed
    if 'submodule.c.update' is not yet in the config, so it gets skipped if we
    repeat 'git submodule init'. (I think this behaviour is sound).
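
The interaction of points 2) and 3) can be sketched as a small standalone C
model. This is not Git's code: 'struct sm_config' and 'model_init()' are
hypothetical names that only mirror the order of operations described above,
assuming 'submodule.active' is unset and no path is given.

```c
#include <stdbool.h>

/* Hypothetical model of the per-submodule entries in .git/config. */
struct sm_config {
	bool active;		/* submodule.<name>.active */
	bool update_registered;	/* submodule.<name>.update is present */
};

/* Model of one 'git submodule init' run with the patch applied. */
void model_init(struct sm_config *cfg, bool update_is_none)
{
	/* Point 2: init marks every submodule it operates on as active. */
	cfg->active = true;

	/* Point 3: the update mode is registered only once, so the
	 * patch's "set inactive" override also runs only once. */
	if (!cfg->update_registered) {
		cfg->update_registered = true;
		if (update_is_none)
			cfg->active = false;
	}
}
```

In this model, calling 'model_init()' twice on a submodule with 'update=none'
leaves it inactive after the first call and active after the second, which is
the sequence that re-triggers the bug.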

So that's unfortunate, and is also kind of contradictory to what the doc says
for 'git submodule init':
"This command does not alter existing information in .git/config.".
And just to be clear, the behaviour I describe above is already existing, the current
patch just makes it more obvious.

I think we could manage to change that behaviour a bit
in order to have 'submodule init' not modify the config for submodules which are already marked inactive,
*unless* they are explicitly matched by the pathspec on the command line.
So we would have:

     git clone server client; cd client
     git submodule init      # initial call sets 'submodule.c.active=false'
     git submodule init      # does not touch c, it's already marked inactive
     git submodule init c    # OK, we really want to mark it as active

To do that, we could use the same trick that we do in update_clone, i.e.

     if (pathspec.nr)
         info.explicit = 1;

where 'explicit' (tentative name) is a new field in 'struct init_cb', so that 'init_submodule'
knows if the current submodule was explicitly listed on the command line.
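
A minimal standalone sketch in C of the decision this would add, under stated
assumptions: it does not use Git's real structures, 'struct init_model',
'enum active_state' and 'init_should_activate()' are hypothetical names, and
the global 'submodule.active' pathspec is ignored. Only the field name
'explicit' comes from the suggestion above.

```c
#include <stdbool.h>

/* Hypothetical tri-state for 'submodule.<name>.active' in .git/config. */
enum active_state {
	ACTIVE_UNSET,	/* no 'submodule.<name>.active' entry yet */
	ACTIVE_TRUE,
	ACTIVE_FALSE
};

/* Hypothetical stand-in for the proposed new field in 'struct init_cb'. */
struct init_model {
	bool explicit;	/* set when a pathspec was given on the command line */
};

/*
 * Return true if 'git submodule init' should write
 * 'submodule.<name>.active=true' for this submodule under the proposal.
 */
bool init_should_activate(const struct init_model *info,
			  enum active_state current)
{
	/* Already inactive: only an explicit path may re-activate it. */
	if (current == ACTIVE_FALSE)
		return info->explicit;

	/* Already active: nothing to write. Unset: activate as before. */
	return current == ACTIVE_UNSET;
}
```

With this rule, a bare 'git submodule init' leaves an inactive submodule alone,
while 'git submodule init c' still switches it to active.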


Then there is a second thing. As stated in the commit message,
'git submodule update --checkout' should override the 'update=none'
setting and clone and checkout the submodule. But this behaviour
is broken by the code we're adding, because 'submodule update' only recurses into
active submodules! (see the call to 'is_submodule_active' in
submodule--helper.c::prepare_to_clone_next_submodule).

So this does not clone the submodule:

     git clone --recurse server client   # recursive clone
     git -C client submodule update --checkout  # should clone c, doesn't

Neither does this:
    
     git clone server client                   # non-recursive clone
     git -C client submodule update --init
     git -C client submodule update --checkout # should clone c, doesn't

But because of the first problem above, this works(!):

     git clone server client
     git -C client submodule update --init
     git -C client submodule update --init --checkout

Because in the third call, c is set to 'active' by init_submodule,
then is *not* skipped by prepare_to_clone_next_submodule.


So it's all a little bit complicated! But I think that with my suggestion above,
i.e. that 'git submodule init', in the absence of 'submodule.active', would
only switch inactive submodules to active if they are explicitly listed, then
we could get a saner behaviour, at the expense of having to explicitly init
'update=none' submodules on the command line if we really want to '--checkout':
     
     git clone server client
     git -C client submodule update --init        # first call: set c to inactive
     git -C client submodule update --init        # no-op
     git -C client submodule update --checkout    # does not clone c (currently quiet)
     git -C client submodule update --checkout c  # does not clone c, but warns (current behaviour)
     git -C client submodule init c               # sets c to active
     git -C client submodule update --checkout    # clones c

where the last two commands could be a single
'git submodule update --init --checkout c' and ideally the
4th command should also warn the user that they now have to explicitly 'init'
c if they want to check it out, which could simply mean tweaking the already
existing message in next_submodule_warn_missing to also check if
the current submodule has 'update=none' and then display the warning
(instead of just showing it if the submodule was listed on the command
line, which is the current behaviour). Additionally, the warning should
say "Maybe you want to use 'update --init %s'?", i.e. specify the path.
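
The warning condition described above could look roughly like the following
standalone C sketch. It is only an illustration: 'should_warn_missing()' and
'format_init_hint()' are hypothetical helpers, not functions in
submodule--helper.c, and the sketch assumes the submodule has already been
found to be inactive.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Current behaviour: warn only if the submodule was listed on the
 * command line. Proposed: also warn when it has 'update=none'.
 */
bool should_warn_missing(bool listed_on_cmdline, bool update_is_none)
{
	return listed_on_cmdline || update_is_none;
}

/* Format the suggested hint, including the submodule path. */
int format_init_hint(char *buf, size_t len, const char *path)
{
	return snprintf(buf, len,
			"Maybe you want to use 'update --init %s'?", path);
}
```

Here a plain 'git submodule update --checkout' on an inactive 'update=none'
submodule would produce the hint, instead of staying quiet.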


What do you think of my suggestions? I can help push this forward
by contributing patches if we agree that we should go forward with
this slight behaviour change in 'git submodule init' ...


> diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
> index c0688467e7..efe6b13be0 100755
> --- a/t/t5601-clone.sh
> +++ b/t/t5601-clone.sh
> @@ -752,6 +752,30 @@ test_expect_success 'batch missing blob request does not inadvertently try to fe
>   	git clone --filter=blob:limit=0 "file://$(pwd)/server" client
>   '
>   
> +test_expect_success 'clone with submodule with update=none is not active' '
> +	rm -rf server client &&
> +
> +	test_create_repo server &&
> +	echo a >server/a &&
> +	echo b >server/b &&
> +	git -C server add a b &&
> +	git -C server commit -m x &&
> +
> +	echo aa >server/a &&
> +	echo bb >server/b &&
> +	git -C server submodule add --name c "$(pwd)/repo_for_submodule" c &&
> +	git -C server config -f .gitmodules submodule.c.update none &&
> +	git -C server add a b c .gitmodules &&
> +	git -C server commit -m x &&
> +
> +	git clone --recurse-submodules server client &&
> +	git -C client config submodule.c.active >actual &&
> +	echo false >expected &&
> +	test_cmp actual expected &&
> +	# This would fail if the submodule were active, since it is not checked out.
> +	git -C client reset --recurse-submodules --hard
> +'

I think we might also want to test the non-recursive clone case,
i.e. 'git clone' and then 'git submodule update --init', as well as
subsequent calls to 'git submodule init' in light of my analysis above.

Also, the only place in the test suite that I could find where
'update=none' is tested is in t7406.35-38 in t7406-submodule-update.sh
so maybe it would make more sense to put the test(s) there?

Thanks,

Philippe.

[1] https://git-scm.com/docs/git-submodule#Documentation/git-submodule.txt-init--ltpathgt82308203
brian m. carlson July 11, 2021, 4:59 p.m. UTC | #2
On 2021-07-09 at 20:26:35, Philippe Blain wrote:
> Hi brian,
> 
> [re-cc'ing Emily and Jonathan, whom Junio cc'ed in <xmqqeed2sdwc.fsf@gitster.g>
> but who seem to have been dropped when you sent v1 and v2 of the patch]
> 
> On 2021-07-01 at 18:51, brian m. carlson wrote:
> > When the user recursively clones a repository with submodules
> 
> Here I would add:
> 
> ", or runs 'git submodule update --init' after a
> non-recursive clone of such a repository, "
> 
> > and one or
> > more of those submodules is marked with the submodule.<name>.update=none
> > configuration, the submodule
> 
> "those submodules" would be clearer, I think.

Sure, I can make that change.

> > will end up being active.  This is a
> > problem because we will have skipped cloning or checking out the
> > submodule, and as a result, other commands, such as git reset or git
> > checkout, will fail if they are invoked with --recurse-submodules (or
> > when submodule.recurse is true).
> > 
> > This is obviously not the behavior the user wanted, so let's fix this by
> > specifically setting the submodule as inactive in this case when we're
> > initializing the repository.  That will make us properly ignore the
> > submodule when performing recursive operations.
> > 
> > We only do this when initializing a submodule,
> 
> Here for even more clarity I would add:
> 
> i.e. 'git submodule init' or 'git submodule update --init',

Okay.

> > since git submodule
> > update can update the submodule with various options despite the setting
> > of "none" and we want those options to override it as they currently do.
> > 
> > Reported-by: Rose Kunkel <rose@rosekunkel.me>
> > Helped-by: Philippe Blain <levraiphilippeblain@gmail.com>
> > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> > ---
> >   builtin/submodule--helper.c |  6 ++++++
> >   t/t5601-clone.sh            | 24 ++++++++++++++++++++++++
> >   2 files changed, 30 insertions(+)
> 
> As I said in my review of v1, I think this would warrant a mention in the doc.
> 
> In general, I think 'git-submodule(1)' could be more precise about which submodules
> are touched by which subcommands. Since the topic that introduced the 'active' concept
> was merged in a93dcb0a56 (Merge branch 'bw/submodule-is-active', 2017-03-30), these subcommands
> only recurse into active submodules:
> 
> - init (with a big caveat, see below)
> - sync
> - update
> 
> The doc makes no mention of that for sync and update. sync says it synchronizes 'all'
> submodules, and update says it updates 'registered' submodules ('registered' is not
> defined formally anywhere either). And 'active' is mentioned in the description of
> 'init', but not defined. It would be good to explicitly say "see the 'Active submodules'
> section in gitsubmodules(7) for a definition of 'active'", or something like that.
> 
> I'm not saying we need to fix that necessarily in this patch, I'm just noting
> what my reading of the code and of the doc reveals.



> I did more testing with this patch applied and I fear it is not
> completely sufficient. There are 2 main problems, I think.
> The first is that the following still triggers the bug:
> 
>     git clone server client
>     git -C client submodule update --init
>     git -C client submodule init       # should be no-op, but isn't
>     git -C client reset --hard --recurse-submodules
> 
> That's because:
> 
> 1) 'git submodule init' operates on *all* submodules if 'submodule.active' is unset
>     and no <path> is given
>     (see submodule--helper.c::module_init, or the doc [1]).
> 2) 'git submodule init' sets 'submodule.$name.active' to true for the submodules
>     on which it operates, unless already covered by 'submodule.active'
>     (see submodule--helper.c::init_submodule)
> 3) the code we're adding to set 'active' to false if 'update=none' is only executed
>    if 'submodule.c.update' is not yet in the config, so it gets skipped if we
>    repeat 'git submodule init'. (I think this behaviour is sound).
> 
> So that's unfortunate, and is also kind of contradictory to what the doc says
> for 'git submodule init':
> "This command does not alter existing information in .git/config.".
> And just to be clear, the behaviour I describe above is already existing, the current
> patch just makes it more obvious.

Right, I noticed that.

> I think we could manage to change that behaviour a bit
> in order to have 'submodule init' not modify the config for submodules which are already marked inactive,
> *unless* they are explicitly matched by the pathspec on the command line.
> So we would have:
> 
>     git clone server client; cd client
>     git submodule init      # initial call sets 'submodule.c.active=false'
>     git submodule init      # does not touch c, it's already marked inactive
>     git submodule init c    # OK, we really want to mark it as active

We could also add an option, --all, to make it work on all submodules.
That's because previously "git submodule init" did work on all the
submodules in the repository in question, but it doesn't now because of
our change.

> So it's all a little bit complicated! But I think that with my suggestion above,
> i.e. that 'git submodule init', in the absence of 'submodule.active', would
> only switch inactive submodules to active if they are explicitly listed, then
> we could get a saner behaviour, at the expense of having to explicitly init
> 'update=none' submodules on the command line if we really want to '--checkout':
>     git clone server client
>     git -C client submodule update --init        # first call: set c to inactive
>     git -C client submodule update --init        # no-op
>     git -C client submodule update --checkout    # does not clone c (currently quiet)
>     git -C client submodule update --checkout c  # does not clone c, but warns (current behaviour)
>     git -C client submodule init c               # sets c to active
>     git -C client submodule update --checkout    # clones c
> 
> where the last two commands could be a single
> 'git submodule update --init --checkout c' and ideally the
> 4th command should also warn the user that they now have to explicitly 'init'
> c if they want to check it out, which could simply mean tweaking the already
> existing message in next_submodule_warn_missing to also check if
> the current submodule has 'update=none' and then display the warning
> (instead of just showing it if the submodule was listed on the command
> line, which is the current behaviour). Additionally, the warning should
> say "Maybe you want to use 'update --init %s'?", i.e. specify the path.
> 
> 
> What do you think of my suggestions? I can help push this forward
> by contributing patches if we agree that we should go forward with
> this slight behaviour change in 'git submodule init' ...

With the addition of an --all option to init, so users can get
behavior closer to the previous one, yes, that sounds good.
It would be great if you'd be willing to send a few patches.

> > diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
> > index c0688467e7..efe6b13be0 100755
> > --- a/t/t5601-clone.sh
> > +++ b/t/t5601-clone.sh
> > @@ -752,6 +752,30 @@ test_expect_success 'batch missing blob request does not inadvertently try to fe
> >   	git clone --filter=blob:limit=0 "file://$(pwd)/server" client
> >   '
> > +test_expect_success 'clone with submodule with update=none is not active' '
> > +	rm -rf server client &&
> > +
> > +	test_create_repo server &&
> > +	echo a >server/a &&
> > +	echo b >server/b &&
> > +	git -C server add a b &&
> > +	git -C server commit -m x &&
> > +
> > +	echo aa >server/a &&
> > +	echo bb >server/b &&
> > +	git -C server submodule add --name c "$(pwd)/repo_for_submodule" c &&
> > +	git -C server config -f .gitmodules submodule.c.update none &&
> > +	git -C server add a b c .gitmodules &&
> > +	git -C server commit -m x &&
> > +
> > +	git clone --recurse-submodules server client &&
> > +	git -C client config submodule.c.active >actual &&
> > +	echo false >expected &&
> > +	test_cmp actual expected &&
> > +	# This would fail if the submodule were active, since it is not checked out.
> > +	git -C client reset --recurse-submodules --hard
> > +'
> 
> I think we might also want to test the non-recursive clone case,
> i.e. 'git clone' and then 'git submodule update --init', as well as
> subsequent calls to 'git submodule init' in light of my analysis above.
> 
> Also, the only place in the test suite that I could find where
> 'update=none' is tested is in t7406.35-38 in t7406-submodule-update.sh
> so maybe it would make more sense to put the test(s) there?

Sure, I can do that.
Patch

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index ae6174ab05..a3f8c45d97 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -686,6 +686,12 @@  static void init_submodule(const char *path, const char *prefix,
 
 		if (git_config_set_gently(sb.buf, upd))
 			die(_("Failed to register update mode for submodule path '%s'"), displaypath);
+
+		if (sub->update_strategy.type == SM_UPDATE_NONE) {
+			strbuf_reset(&sb);
+			strbuf_addf(&sb, "submodule.%s.active", sub->name);
+			git_config_set_gently(sb.buf, "false");
+		}
 	}
 	strbuf_release(&sb);
 	free(displaypath);
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index c0688467e7..efe6b13be0 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -752,6 +752,30 @@  test_expect_success 'batch missing blob request does not inadvertently try to fe
 	git clone --filter=blob:limit=0 "file://$(pwd)/server" client
 '
 
+test_expect_success 'clone with submodule with update=none is not active' '
+	rm -rf server client &&
+
+	test_create_repo server &&
+	echo a >server/a &&
+	echo b >server/b &&
+	git -C server add a b &&
+	git -C server commit -m x &&
+
+	echo aa >server/a &&
+	echo bb >server/b &&
+	git -C server submodule add --name c "$(pwd)/repo_for_submodule" c &&
+	git -C server config -f .gitmodules submodule.c.update none &&
+	git -C server add a b c .gitmodules &&
+	git -C server commit -m x &&
+
+	git clone --recurse-submodules server client &&
+	git -C client config submodule.c.active >actual &&
+	echo false >expected &&
+	test_cmp actual expected &&
+	# This would fail if the submodule were active, since it is not checked out.
+	git -C client reset --recurse-submodules --hard
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd