clone: teach --single-branch and --branch during --recurse
diff mbox series

Message ID 20200108231900.192476-1-emilyshaffer@google.com
State New
Headers show
Series
  • clone: teach --single-branch and --branch during --recurse
Related show

Commit Message

Emily Shaffer Jan. 8, 2020, 11:19 p.m. UTC
Previously, performing "git clone --recurse-submodules --single-branch"
resulted in submodules cloning all branches even though the superproject
cloned only one branch. Pipe --single-branch and its friend, --branch,
through the submodule helper framework to make it to 'clone' later on.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
Note that 'branch' was already in use in git-submodules.sh, so
"submodule branch" aka 'sm_branch' was used to disambiguate the two.

 Documentation/git-submodule.txt    |  6 +++++-
 builtin/clone.c                    |  6 ++++++
 builtin/submodule--helper.c        | 28 +++++++++++++++++++++++++---
 git-submodule.sh                   | 17 ++++++++++++++++-
 t/t5617-clone-submodules-remote.sh | 26 ++++++++++++++++++++++++--
 5 files changed, 76 insertions(+), 7 deletions(-)

Comments

Emily Shaffer Jan. 8, 2020, 11:39 p.m. UTC | #1
On Wed, Jan 08, 2020 at 03:19:00PM -0800, Emily Shaffer wrote:

Bah, in my attempt to keep the subject brief I was aiming to write
"--recursive" but instead just wrote "--recurse" which is wrong. I can
push another version with the fix if desired.

> Previously, performing "git clone --recurse-submodules --single-branch"
> resulted in submodules cloning all branches even though the superproject
> cloned only one branch. Pipe --single-branch and its friend, --branch,
> through the submodule helper framework to make it to 'clone' later on.
> 
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> Note that 'branch' was already in use in git-submodules.sh, so
> "submodule branch" aka 'sm_branch' was used to disambiguate the two.
> 
>  Documentation/git-submodule.txt    |  6 +++++-
>  builtin/clone.c                    |  6 ++++++
>  builtin/submodule--helper.c        | 28 +++++++++++++++++++++++++---
>  git-submodule.sh                   | 17 ++++++++++++++++-
>  t/t5617-clone-submodules-remote.sh | 26 ++++++++++++++++++++++++--
>  5 files changed, 76 insertions(+), 7 deletions(-)
Jeff King Jan. 9, 2020, 8:11 a.m. UTC | #2
On Wed, Jan 08, 2020 at 03:19:00PM -0800, Emily Shaffer wrote:

> Subject: clone: teach --single-branch and --branch during --recurse

A minor nit, but this subject confused me for a moment. I think we'd
usually say "teach" for a new feature being implemented, and this is
really just about passing along the existing features. Something like:

  clone: pass --single-branch and --branch when recursing submodules

would have been a bit more obvious (to me anyway).

> Previously, performing "git clone --recurse-submodules --single-branch"
> resulted in submodules cloning all branches even though the superproject
> cloned only one branch. Pipe --single-branch and its friend, --branch,
> through the submodule helper framework to make it to 'clone' later on.

Since I don't really use submodules, I don't have much data or even
intuition to go on. But could this be a regression for some situations?
E.g., imagine I have a repo "parent" whose branch "foo" has a submodule
"child", but "child" only has a branch "bar". What happens now if I "git
clone --recurse-submodules --single-branch -b foo parent", and what will
happen after this patch?

I think it works before, but doesn't now.

Setting up like this:

  git init child
  (
  	cd child
  	git checkout -b bar
  	echo whatever >file
  	git add file
  	git commit -m 'child commit'
  )
  
  git init parent
  cd parent
  git checkout -b foo
  git submodule add $PWD/../child
  git commit -m 'add submodule'

if I use the current tip of master, I get:

  $ git clone --recurse-submodules --single-branch -b foo parent cur
  Cloning into 'cur'...
  done.
  Submodule 'child' (/home/peff/tmp/parent/../child) registered for path 'child'
  Cloning into '/home/peff/tmp/cur/child'...
  done.
  Submodule path 'child': checked out 'b5cbfcc9fec3b7d67e305468624fed2ba1aa4758'

  $ git -C cur/child log -1 --oneline | cat
  b5cbfcc child commit

with your patch, I get:

  $ git.compile clone --recurse-submodules --single-branch -b foo parent new
  Cloning into 'new'...
  done.
  Submodule 'child' (/home/peff/tmp/parent/../child) registered for path 'child'
  Cloning into '/home/peff/tmp/new/child'...
  warning: Could not find remote branch foo to clone.
  fatal: Remote branch foo not found in upstream origin
  fatal: clone of '/home/peff/tmp/parent/../child' into submodule path '/home/peff/tmp/new/child' failed
  Failed to clone 'child'. Retry scheduled
  Cloning into '/home/peff/tmp/new/child'...
  warning: Could not find remote branch foo to clone.
  fatal: Remote branch foo not found in upstream origin
  fatal: clone of '/home/peff/tmp/parent/../child' into submodule path '/home/peff/tmp/new/child' failed
  Failed to clone 'child' a second time, aborting

  $ git -C new/child log -1 --oneline | cat
  11acb3a add submodule

(there's nothing checked out in the submodule).

I have no idea how common this kind of thing would be, and I expect in
most cases your patch would do what people want. But we might need to be
better about retrying without those options when the first clone fails.

-Peff
Emily Shaffer Jan. 16, 2020, 10:38 p.m. UTC | #3
On Thu, Jan 09, 2020 at 03:11:50AM -0500, Jeff King wrote:
> On Wed, Jan 08, 2020 at 03:19:00PM -0800, Emily Shaffer wrote:
> 
> > Subject: clone: teach --single-branch and --branch during --recurse
> 
> A minor nit, but this subject confused me for a moment. I think we'd
> usually say "teach" for a new feature being implemented, and this is
> really just about passing along the existing features. Something like:
> 
>   clone: pass --single-branch and --branch when recursing submodules
> 
> would have been a bit more obvious (to me anyway).
> 
> > Previously, performing "git clone --recurse-submodules --single-branch"
> > resulted in submodules cloning all branches even though the superproject
> > cloned only one branch. Pipe --single-branch and its friend, --branch,
> > through the submodule helper framework to make it to 'clone' later on.
> 
> Since I don't really use submodules, I don't have much data or even
> intuition to go on. But could this be a regression for some situations?
> E.g., imagine I have a repo "parent" whose branch "foo" has a submodule
> "child", but "child" only has a branch "bar". What happens now if I "git
> clone --recurse-submodules --single-branch -b foo parent", and what will
> happen after this patch?
> 
> I think it works before, but doesn't now.
> 
> Setting up like this:
> 
>   git init child
>   (
>   	cd child
>   	git checkout -b bar
>   	echo whatever >file
>   	git add file
>   	git commit -m 'child commit'
>   )
>   
>   git init parent
>   cd parent
>   git checkout -b foo
>   git submodule add $PWD/../child
>   git commit -m 'add submodule'
> 
> if I use the current tip of master, I get:
> 
>   $ git clone --recurse-submodules --single-branch -b foo parent cur
>   Cloning into 'cur'...
>   done.
>   Submodule 'child' (/home/peff/tmp/parent/../child) registered for path 'child'
>   Cloning into '/home/peff/tmp/cur/child'...
>   done.
>   Submodule path 'child': checked out 'b5cbfcc9fec3b7d67e305468624fed2ba1aa4758'
> 
>   $ git -C cur/child log -1 --oneline | cat
>   b5cbfcc child commit
> 
> with your patch, I get:
> 
>   $ git.compile clone --recurse-submodules --single-branch -b foo parent new
>   Cloning into 'new'...
>   done.
>   Submodule 'child' (/home/peff/tmp/parent/../child) registered for path 'child'
>   Cloning into '/home/peff/tmp/new/child'...
>   warning: Could not find remote branch foo to clone.
>   fatal: Remote branch foo not found in upstream origin
>   fatal: clone of '/home/peff/tmp/parent/../child' into submodule path '/home/peff/tmp/new/child' failed
>   Failed to clone 'child'. Retry scheduled
>   Cloning into '/home/peff/tmp/new/child'...
>   warning: Could not find remote branch foo to clone.
>   fatal: Remote branch foo not found in upstream origin
>   fatal: clone of '/home/peff/tmp/parent/../child' into submodule path '/home/peff/tmp/new/child' failed
>   Failed to clone 'child' a second time, aborting
> 
>   $ git -C new/child log -1 --oneline | cat
>   11acb3a add submodule
> 
> (there's nothing checked out in the submodule).
> 
> I have no idea how common this kind of thing would be, and I expect in
> most cases your patch would do what people want. But we might need to be
> better about retrying without those options when the first clone fails.

Yeah, that's interesting. A retry sounds like a pretty solid approach,
although if someone's being cautious and using --single-branch I wonder
if that's really something they want (since that's avoiding grabbing
extraneous branches).

I suppose at the very least, --single-branch without --branch should
become recursive. Whether --branch should become recursive, I'm not
totally sure.

The two scenarios I see in conflict are this:
- Superproject branch "foo"
  - submodule A branch "foo"
  - submodule B branch "foo"
and
- Superproject branch "foo"
  - submodule A branch "bar"
  - submodule B branch "baz"

If we propagate --branch, the first scenario Just Works, and the second
scenario requires something like:

 git clone --recurse-submodules=no --single-branch --branch foo https://superproject
 git submodule update --init --single-branch --branch bar A/
 git submodule update --init --single-branch --branch baz B/

(I guess if the superproject knows what branch it wants for all the submodules,
you could also just do "git submodule update --init --single-branch" and
have it go and ask for all of them.)

If we don't propagate --branch, the second scenario Just Works, and the
first scenario requires something like:

 git clone --recurse-submodules=no --single-branch --branch foo https://superproject
 git submodule update --init --single-branch --branch foo

(That is, I think as long as 'update' takes --branch, even if it's not
passed along by 'clone', it should still work OK when delegating to
everyone.)

Let me know if I misunderstood what you were worried about.

I don't use submodules heavily for myself either. I'll try and ask
around a little to see what folks want, at least here. The ergonomics
seem pretty similar, so I guess it comes down to having explicit
documentation.

> 
> -Peff
Jeff King Jan. 17, 2020, 9:03 p.m. UTC | #4
On Thu, Jan 16, 2020 at 02:38:00PM -0800, Emily Shaffer wrote:

> > I have no idea how common this kind of thing would be, and I expect in
> > most cases your patch would do what people want. But we might need to be
> > better about retrying without those options when the first clone fails.
> 
> Yeah, that's interesting. A retry sounds like a pretty solid approach,
> although if someone's being cautious and using --single-branch I wonder
> if that's really something they want (since that's avoiding grabbing
> extraneous branches).
> 
> I suppose at the very least, --single-branch without --branch should
> become recursive. Whether --branch should become recursive, I'm not
> totally sure.

Yeah, I think it may be worth separating out how to think about the two
options. It's a lot more plausible to me that --single-branch would want
to recurse than --branch.

> The two scenarios I see in conflict are this:
> - Superproject branch "foo"
>   - submodule A branch "foo"
>   - submodule B branch "foo"
> and
> - Superproject branch "foo"
>   - submodule A branch "bar"
>   - submodule B branch "baz"
> 
> If we propagate --branch, the first scenario Just Works, and the second
> scenario requires something like:
> 
>  git clone --recurse-submodules=no --single-branch --branch foo https://superproject
>  git submodule update --init --single-branch --branch bar A/
>  git submodule update --init --single-branch --branch baz B/
> 
> (I guess if the superproject knows what branch it wants for all the submodules,
> you could also just do "git submodule update --init --single-branch" and
> have it go and ask for all of them.)
> 
> If we don't propagate --branch, the second scenario Just Works, and the
> first scenario requires something like:
> 
>  git clone --recurse-submodules=no --single-branch --branch foo https://superproject
>  git submodule update --init --single-branch --branch foo
> 
> (That is, I think as long as 'update' takes --branch, even if it's not
> passed along by 'clone', it should still work OK when delegating to
> everyone.)
> 
> Let me know if I misunderstood what you were worried about.

Right, that's exactly my concern. You're making one case work but
breaking the other.

It really seems like there's no particular reason to assume that the
superproject branch corresponds to the submodule branch (or even
submodules of submodules). I imagine it would in some cases (like trying
to replace the use of "repo" in Android), but that's just one model.

It would make more sense to me to either (or both):

  - make sure that .gitmodules has enough information about which branch
    to use for each submodule

  - offer an extra option for the default branch to use for any
    submodules. This is still not general enough to cover all situations
    (e.g., the bar/baz you showed above), but it at least makes it
    relatively easy to cover the simple cases, without breaking any
    existing ones.

-Peff
Emily Shaffer Jan. 27, 2020, 10:20 p.m. UTC | #5
On Fri, Jan 17, 2020 at 04:03:19PM -0500, Jeff King wrote:

> (like trying to replace the use of "repo" in Android)

Oops, you saw right through us ;)

> It would make more sense to me to either (or both):
> 
>   - make sure that .gitmodules has enough information about which branch
>     to use for each submodule

Hum. I don't work with them day to day, but aren't we already in that
state? Is that not what the 'branch' option for each submodule means?

> 
>   - offer an extra option for the default branch to use for any
>     submodules. This is still not general enough to cover all situations
>     (e.g., the bar/baz you showed above), but it at least makes it
>     relatively easy to cover the simple cases, without breaking any
>     existing ones.

Yeah, this is sort of the direction my mind went too - "not
--branch recursively, but --submodule-branch". But that breaks down when you've
got a nontrivial number of submodules, at which point you're gonna have
a hard time unless you've got the .gitmodules configured correctly.


Well, as for this patch, let me try it with just --single-branch and see
whether that works for the case the user reported. I can head back to
the drawing board if not.

 - Emily
Emily Shaffer Jan. 27, 2020, 10:49 p.m. UTC | #6
On Mon, Jan 27, 2020 at 02:20:19PM -0800, Emily Shaffer wrote:
> On Fri, Jan 17, 2020 at 04:03:19PM -0500, Jeff King wrote:
> 
> > (like trying to replace the use of "repo" in Android)
> 
> Oops, you saw right through us ;)
> 
> > It would make more sense to me to either (or both):
> > 
> >   - make sure that .gitmodules has enough information about which branch
> >     to use for each submodule
> 
> Hum. I don't work with them day to day, but aren't we already in that
> state? Is that not what the 'branch' option for each submodule means?

I've been corrected off-list that the 'branch' in .gitmodules is used
during 'git submodule update --remote', but not during 'git submodule
init' or 'git clone --recurse-submodules'. Then, for the problem in
discussion for this thread, it seems like a better choice is something
like 'git clone --recurse-submdoules --use-gitmodules' or whatever we
want to call it - e.g., rather than fetching the branch where the server
knows HEAD, ask the .gitmodules to figure out which branch?

It seems like that ought to live separately from --single-branch. In the
case where you very strictly only want to fetch one branch (not two
branches) I suppose you'd want something like 'git clone
--recurse-submodules --single-branch --branch=mysuperprojectbranch
--use-gitmodules' to make sure that only one branch per repo comes down.

With n submodules of various naming schemas, provenance, etc., I don't
think there's a good case for recursing --branch one way or another; it
seems like filling out some config is the way to go.

I guess we could also teach it to take some input like
--submodule-branch-spec=foo.txt, and/or a multiply provided
--submodule-branch foo=foobranch --submodule-branch bar/baz=bazbranch.

  [foo.txt]
  foo=foobranch
  bar/baz=bazbranch

With that approach, then someone gets a little more flexibility than
relying on what the .gitmodules has set up.

> >   - offer an extra option for the default branch to use for any
> >     submodules. This is still not general enough to cover all situations
> >     (e.g., the bar/baz you showed above), but it at least makes it
> >     relatively easy to cover the simple cases, without breaking any
> >     existing ones.
> 
> Yeah, this is sort of the direction my mind went too - "not
> --branch recursively, but --submodule-branch". But that breaks down when you've
> got a nontrivial number of submodules, at which point you're gonna have
> a hard time unless you've got the .gitmodules configured correctly.
> 
> 
> Well, as for this patch, let me try it with just --single-branch and see
> whether that works for the case the user reported. I can head back to
> the drawing board if not.

With only half the rework of my patch done, I'm starting to convince
myself it's not actually going to work :) Well, I'll still try and see.

 - Emily
Jeff King Jan. 27, 2020, 11 p.m. UTC | #7
On Mon, Jan 27, 2020 at 02:20:19PM -0800, Emily Shaffer wrote:

> > It would make more sense to me to either (or both):
> > 
> >   - make sure that .gitmodules has enough information about which branch
> >     to use for each submodule
> 
> Hum. I don't work with them day to day, but aren't we already in that
> state? Is that not what the 'branch' option for each submodule means?

Probably? :) I've never used the feature myself, but it does seem like
the right thing (and should definitely take precedence over any "-b"
option passed to the superproject).

> >   - offer an extra option for the default branch to use for any
> >     submodules. This is still not general enough to cover all situations
> >     (e.g., the bar/baz you showed above), but it at least makes it
> >     relatively easy to cover the simple cases, without breaking any
> >     existing ones.
> 
> Yeah, this is sort of the direction my mind went too - "not
> --branch recursively, but --submodule-branch". But that breaks down when you've
> got a nontrivial number of submodules, at which point you're gonna have
> a hard time unless you've got the .gitmodules configured correctly.

Right. Probably the right answer for that bar/baz case is to complain to
the superproject owner that they didn't put branch fields into their
.gitmodules. So...

> Well, as for this patch, let me try it with just --single-branch and see
> whether that works for the case the user reported. I can head back to
> the drawing board if not.

Yeah, that makes perfect sense to me.

-Peff
Jeff King Jan. 27, 2020, 11:10 p.m. UTC | #8
On Mon, Jan 27, 2020 at 02:49:14PM -0800, Emily Shaffer wrote:

> > >   - make sure that .gitmodules has enough information about which branch
> > >     to use for each submodule
> > 
> > Hum. I don't work with them day to day, but aren't we already in that
> > state? Is that not what the 'branch' option for each submodule means?
> 
> I've been corrected off-list that the 'branch' in .gitmodules is used
> during 'git submodule update --remote', but not during 'git submodule
> init' or 'git clone --recurse-submodules'. Then, for the problem in
> discussion for this thread, it seems like a better choice is something
> like 'git clone --recurse-submdoules --use-gitmodules' or whatever we
> want to call it - e.g., rather than fetching the branch where the server
> knows HEAD, ask the .gitmodules to figure out which branch?

Oof, I should have read this message before responding to the other one. ;)

> It seems like that ought to live separately from --single-branch. In the
> case where you very strictly only want to fetch one branch (not two
> branches) I suppose you'd want something like 'git clone
> --recurse-submodules --single-branch --branch=mysuperprojectbranch
> --use-gitmodules' to make sure that only one branch per repo comes down.
> 
> With n submodules of various naming schemas, provenance, etc., I don't
> think there's a good case for recursing --branch one way or another; it
> seems like filling out some config is the way to go.

Yeah, I do still think that it makes sense for clone to pass along
--single-branch, regardless, and then deal with branch selection problem
separately on top.

> I guess we could also teach it to take some input like
> --submodule-branch-spec=foo.txt, and/or a multiply provided
> --submodule-branch foo=foobranch --submodule-branch bar/baz=bazbranch.
> 
>   [foo.txt]
>   foo=foobranch
>   bar/baz=bazbranch
> 
> With that approach, then someone gets a little more flexibility than
> relying on what the .gitmodules has set up.

Yeah, I agree that the most general form is being able to specify the
mapping for each individually. At first I wondered why you'd ever _not_
want to just use the branches specified in .gitmodules. But I suppose
that gets embedded in the superproject history, which gets awkward as
those commits move between branches. E.g., for an android-like project,
you don't want to make a commit that says "use branch devel for all
submodules" on the devel branch of your superproject. Eventually that
will get merged to master, and you'd have to remember to switch it back
to "master".

So for the simple case, you probably do want to be able to say "use this
branch for cloning all submodules".

For the complex cases, you'd need that full mapping. But I think it may
be worth it to punt on that for now. Even if we eventually added such a
feature, I think we'd still want the simpler version anyway (because
when you _can_ use it, it's going to be much easier). So there's nothing
lost by doing the minimal thing now and waiting to see if more complex
use cases even show up.

Another thing occurs to me, though: should the binding of this submodule
default branch be written to disk (e.g., a config option)? I'm thinking
specifically that if you do:

  git clone --submodule-branch=devel -b devel superproject

and then later, you "git fetch" and find that somebody has added a new
submodule, you'd presumably want the devel branch of that, too?

-Peff
Emily Shaffer Jan. 28, 2020, 1:08 a.m. UTC | #9
On Mon, Jan 27, 2020 at 06:10:07PM -0500, Jeff King wrote:
> On Mon, Jan 27, 2020 at 02:49:14PM -0800, Emily Shaffer wrote:
> 
> > > >   - make sure that .gitmodules has enough information about which branch
> > > >     to use for each submodule
> > > 
> > > Hum. I don't work with them day to day, but aren't we already in that
> > > state? Is that not what the 'branch' option for each submodule means?
> > 
> > I've been corrected off-list that the 'branch' in .gitmodules is used
> > during 'git submodule update --remote', but not during 'git submodule
> > init' or 'git clone --recurse-submodules'. Then, for the problem in
> > discussion for this thread, it seems like a better choice is something
> > like 'git clone --recurse-submdoules --use-gitmodules' or whatever we
> > want to call it - e.g., rather than fetching the branch where the server
> > knows HEAD, ask the .gitmodules to figure out which branch?
> 
> Oof, I should have read this message before responding to the other one. ;)
> 
> > It seems like that ought to live separately from --single-branch. In the
> > case where you very strictly only want to fetch one branch (not two
> > branches) I suppose you'd want something like 'git clone
> > --recurse-submodules --single-branch --branch=mysuperprojectbranch
> > --use-gitmodules' to make sure that only one branch per repo comes down.
> > 
> > With n submodules of various naming schemas, provenance, etc., I don't
> > think there's a good case for recursing --branch one way or another; it
> > seems like filling out some config is the way to go.
> 
> Yeah, I do still think that it makes sense for clone to pass along
> --single-branch, regardless, and then deal with branch selection problem
> separately on top.

Sure; I've got that ready to send shortly. It seems to grab HEAD of the
remote for each submodule, and then checkout the specific commit ID the
superproject wants - in my test case, that commit ID was a direct
ancestor of 'master', so the single branch only got 'master'. I'm not
sure how it would work with a commit ID which doesn't exist in the
single branch that was fetched; I'll write a test and have a look.

> 
> > I guess we could also teach it to take some input like
> > --submodule-branch-spec=foo.txt, and/or a multiply provided
> > --submodule-branch foo=foobranch --submodule-branch bar/baz=bazbranch.
> > 
> >   [foo.txt]
> >   foo=foobranch
> >   bar/baz=bazbranch
> > 
> > With that approach, then someone gets a little more flexibility than
> > relying on what the .gitmodules has set up.
> 
> Yeah, I agree that the most general form is being able to specify the
> mapping for each individually. At first I wondered why you'd ever _not_
> want to just use the branches specified in .gitmodules. But I suppose
> that gets embedded in the superproject history, which gets awkward as
> those commits move between branches. E.g., for an android-like project,
> you don't want to make a commit that says "use branch devel for all
> submodules" on the devel branch of your superproject. Eventually that
> will get merged to master, and you'd have to remember to switch it back
> to "master".

Yeah, or I suppose I might be doing something weird, like wanting to run
integration tests for the whole project on changes in just one
submodule, or something.

> So for the simple case, you probably do want to be able to say "use this
> branch for cloning all submodules".

I think it still makes sense to call this out explicitly, yes? Or do you
think that should just be the default?

> 
> For the complex cases, you'd need that full mapping. But I think it may
> be worth it to punt on that for now. Even if we eventually added such a
> feature, I think we'd still want the simpler version anyway (because
> when you _can_ use it, it's going to be much easier). So there's nothing
> lost by doing the minimal thing now and waiting to see if more complex
> use cases even show up.
> 
> Another thing occurs to me, though: should the binding of this submodule
> default branch be written to disk (e.g., a config option)? I'm thinking
> specifically that if you do:
> 
>   git clone --submodule-branch=devel -b devel superproject
> 
> and then later, you "git fetch" and find that somebody has added a new
> submodule, you'd presumably want the devel branch of that, too?

This made me think - I wonder if it makes sense to take
--submodule-branch as a wildcarded spec instead. So in your case, I
could say,

  git clone --submodule-branch *=devel -b devel superproject

And then I don't need to do anything differently for 'git fetch' later.
This also opens the door for some repos getting special treatment:

  git clone --submodule-branch-file=foo.txt -b dev example

  foo.txt:
  curl=stable-1.2.3
  nlohmann=v2.28
  example-*=dev
  *=master

(specifying specific versions for some source dependencies, dev branches
for submodules which are associated with with 'example' superproject and
might be getting active development, and a wild guess for everything
else)

I think that also tends to match the glob-expansion configs we use for
other things. One thing sticking out to me about the idea of providing
--submodule-branch is that you need to know what's in the repo before
you clone it the first time, which being able to use globbing like this
kind of helps with. But then, I suppose if you don't know what you're
looking for, you're not also looking for a very precise filter on your
clone ;)

 - Emily
Jeff King Jan. 28, 2020, 1:31 a.m. UTC | #10
On Mon, Jan 27, 2020 at 05:08:41PM -0800, Emily Shaffer wrote:

> > Yeah, I do still think that it makes sense for clone to pass along
> > --single-branch, regardless, and then deal with branch selection problem
> > separately on top.
> 
> Sure; I've got that ready to send shortly. It seems to grab HEAD of the
> remote for each submodule, and then checkout the specific commit ID the
> superproject wants - in my test case, that commit ID was a direct
> ancestor of 'master', so the single branch only got 'master'. I'm not
> sure how it would work with a commit ID which doesn't exist in the
> single branch that was fetched; I'll write a test and have a look.

Yeah, it's definitely worth exploring how that works. I thought we had
some kind of fallback for when we didn't manage to fetch the object. But
maybe I am confusing it with the fallback for "we tried to fetch this
specific object, but the other side doesn't allow that, so we grabbed a
branch instead".

> > So for the simple case, you probably do want to be able to say "use this
> > branch for cloning all submodules".
> 
> I think it still makes sense to call this out explicitly, yes? Or do you
> think that should just be the default?

Yes, I think it should be a separate option from "--branch".

> This made me think - I wonder if it makes sense to take
> --submodule-branch as a wildcarded spec instead. So in your case, I
> could say,
> 
>   git clone --submodule-branch *=devel -b devel superproject
> 
> And then I don't need to do anything differently for 'git fetch' later.
> This also opens the door for some repos getting special treatment:
> 
>   git clone --submodule-branch-file=foo.txt -b dev example
> 
>   foo.txt:
>   curl=stable-1.2.3
>   nlohmann=v2.28
>   example-*=dev
>   *=master

If we write it all as config, I think things may get simpler. IIRC,
there is already submodule.*.foo config in .git/config (that can mirror
and override what's in .gitmodules).

So if we had some config option for "clone this branch for the submodule
instead of HEAD", then that means you can do:

  git clone -c submodule.foo.clonehead=devel ...

and the result would be used by the submodule code, but also saved for
future invocations. Likewise, if there's no "clonehead" config for a
particular submodule, if we fall back to submodule.defaultclonehead,
then you could do:

  git clone -c submodule.defaultclonehead=devel ...

and it would also be saved as the default for future submodules.  And
all without having to invent a new submodule-branch-file format.

The name "clonehead" isn't great. I'm not sure if this ought to be
submodule.*.branch (since I don't quite know what that's used for). I
think you'll have to explore that a bit.

> I think that also tends to match the glob-expansion configs we use for
> other things. One thing sticking out to me about the idea of providing
> --submodule-branch is that you need to know what's in the repo before
> you clone it the first time, which being able to use globbing like this
> kind of helps with. But then, I suppose if you don't know what you're
> looking for, you're not also looking for a very precise filter on your
> clone ;)

Yeah; the scheme I outlined above only allow specifying the value for
one submodule, or the fallback default. It wouldn't allow arbitrary
globbing. But I also suspect nobody wants that. If you know what the
submodules are, then you can set up config for each. If you don't, then
"everything" is the only glob that makes sense.

-Peff
Emily Shaffer Jan. 28, 2020, 2:10 a.m. UTC | #11
On Mon, Jan 27, 2020 at 08:31:39PM -0500, Jeff King wrote:
> On Mon, Jan 27, 2020 at 05:08:41PM -0800, Emily Shaffer wrote:
> 
> > > Yeah, I do still think that it makes sense for clone to pass along
> > > --single-branch, regardless, and then deal with branch selection problem
> > > separately on top.
> > 
> > Sure; I've got that ready to send shortly. It seems to grab HEAD of the
> > remote for each submodule, and then checkout the specific commit ID the
> > superproject wants - in my test case, that commit ID was a direct
> > ancestor of 'master', so the single branch only got 'master'. I'm not
> > sure how it would work with a commit ID which doesn't exist in the
> > single branch that was fetched; I'll write a test and have a look.
> 
> Yeah, it's definitely worth exploring how that works. I thought we had
> some kind of fallback for when we didn't manage to fetch the object. But
> maybe I am confusing it with the fallback for "we tried to fetch this
> specific object, but the other side doesn't allow that, so we grabbed a
> branch instead".

Ok, so I gave it a try. Some well-trimmed trace output:

1) git clone --recurse-submodules --single-branch <url> (the branch in
question is remote's HEAD)
 - Normal clone of superproject
 - git submodule--helper update-clone --progress --require-init
   --single-branch --
 - ultimately...
 - git clone --no-checkout --progress --separate-git-dir
   '/.../super_clone/.git/modules/sub'
   --single-branch --
   '/path/to/submodule/source'
   '/path/to/submodule/destination'
 - git checkout -f -q <ID of submodule's HEAD>

2) git clone --recurse-submodules --single-branch --branch other <url>
  'other' points to a commit of 'sub' which is not an ancestor of 'sub''s
  current HEAD.
 - Normal clone of superproject identical to 1)
 - git submodule--helper update-clone --progress --require-init
   --single-branch --
 - ultimately...
 - git clone --no-checkout --progress --separate-git-dir
   '/.../super_clone/.git/modules/sub'
   --single-branch --
   '/path/to/submodule/source'
   '/path/to/submodule/destination'
 - git fetch origin <ID of submodule's other branch>
 - git checkout -f -q <ID of submodule's other branch>

So, somewhere in the submodule machinery, it looks like we check if we
have the commit in question, and if not, we do another fetch. So in this
case, we reach to the server twice per submodule.

On the bright side, it doesn't fall over; on the dim side, I'd think
we could ask for this ref up front along with whatever branch HEAD is.
I thought there was a way we could tell the server we want 'master' as
well as '58c34ed'?

> 
> > > So for the simple case, you probably do want to be able to say "use this
> > > branch for cloning all submodules".
> > 
> > I think it still makes sense to call this out explicitly, yes? Or do you
> > think that should just be the default?
> 
> Yes, I think it should be a separate option from "--branch".
> 
> > This made me think - I wonder if it makes sense to take
> > --submodule-branch as a wildcarded spec instead. So in your case, I
> > could say,
> > 
> >   git clone --submodule-branch *=devel -b devel superproject
> > 
> > And then I don't need to do anything differently for 'git fetch' later.
> > This also opens the door for some repos getting special treatment:
> > 
> >   git clone --submodule-branch-file=foo.txt -b dev example
> > 
> >   foo.txt:
> >   curl=stable-1.2.3
> >   nlohmann=v2.28
> >   example-*=dev
> >   *=master
> 
> If we write it all as config, I think things may get simpler. IIRC,
> there is already submodule.*.foo config in .git/config (that can mirror
> and override what's in .gitmodules).

Hm. But at clone time, there is no .git/config yet, which is why I
proposed a file passed in at the command line. Although it does seem to
make sense to write down those preferences in the .git/config after.

I guess you could pass in configs at the command line, though, and then
you don't have to massage it to write your config after fetch.

> So if we had some config option for "clone this branch for the submodule
> instead of HEAD", then that means you can do:
> 
>   git clone -c submodule.foo.clonehead=devel ...
> 
> and the result would be used by the submodule code, but also saved for
> future invocations. Likewise, if there's no "clonehead" config for a
> particular submodule, if we fall back to submodule.defaultclonehead,
> then you could do:
> 
>   git clone -c submodule.defaultclonehead=devel ...
> 
> and it would also be saved as the default for future submodules.  And
> all without having to invent a new submodule-branch-file format.
> 
> The name "clonehead" isn't great.

Au contraire - it might be my new go-to insult. ;)

> I'm not sure if this ought to be submodule.*.branch (since I don't
> quite know what that's used for). I think you'll have to explore that
> a bit.
> 
> > I think that also tends to match the glob-expansion configs we use for
> > other things. One thing sticking out to me about the idea of providing
> > --submodule-branch is that you need to know what's in the repo before
> > you clone it the first time, which being able to use globbing like this
> > kind of helps with. But then, I suppose if you don't know what you're
> > looking for, you're not also looking for a very precise filter on your
> > clone ;)
> 
> Yeah; the scheme I outlined above only allow specifying the value for
> one submodule, or the fallback default. It wouldn't allow arbitrary
> globbing. But I also suspect nobody wants that. If you know what the
> submodules are, then you can set up config for each. If you don't, then
> "everything" is the only glob that makes sense.

Yeah, I suspect you're right and this fancy globbing falls under YAGNI.

 - Emily

Patch
diff mbox series

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 22425cbc76..8c516a9670 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -133,7 +133,7 @@  If you really want to remove a submodule from the repository and commit
 that use linkgit:git-rm[1] instead. See linkgit:gitsubmodules[7] for removal
 options.
 
-update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--] [<path>...]::
+update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--single-branch] [-b|--branch <name>] [--] [<path>...]::
 +
 --
 Update the registered submodules to match what the superproject
@@ -430,6 +430,10 @@  options carefully.
 	Clone new submodules in parallel with as many jobs.
 	Defaults to the `submodule.fetchJobs` option.
 
+--single-branch::
+	This option is only valid for the update command.
+	Clone only one branch during update, HEAD or --branch.
+
 <path>...::
 	Paths to submodule(s). When specified this will restrict the command
 	to only operate on the submodules found at the specified paths.
diff --git a/builtin/clone.c b/builtin/clone.c
index 6dee265cc9..293cef8b30 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -808,6 +808,12 @@  static int checkout(int submodule_progress)
 			argv_array_push(&args, "--no-fetch");
 		}
 
+		if (option_single_branch)
+			argv_array_push(&args, "--single-branch");
+
+		if (option_branch)
+			argv_array_pushf(&args, "--branch=%s", option_branch);
+
 		err = run_command_v_opt(args.argv, RUN_GIT_CMD);
 		argv_array_clear(&args);
 	}
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index c72931ecd7..92bd823d38 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1225,7 +1225,8 @@  static int module_deinit(int argc, const char **argv, const char *prefix)
 
 static int clone_submodule(const char *path, const char *gitdir, const char *url,
 			   const char *depth, struct string_list *reference, int dissociate,
-			   int quiet, int progress)
+			   int quiet, int progress, int single_branch,
+			   const char *branch)
 {
 	struct child_process cp = CHILD_PROCESS_INIT;
 
@@ -1247,6 +1248,10 @@  static int clone_submodule(const char *path, const char *gitdir, const char *url
 		argv_array_push(&cp.args, "--dissociate");
 	if (gitdir && *gitdir)
 		argv_array_pushl(&cp.args, "--separate-git-dir", gitdir, NULL);
+	if (single_branch)
+		argv_array_push(&cp.args, "--single-branch");
+	if (branch)
+		argv_array_pushl(&cp.args, "--branch", branch, NULL);
 
 	argv_array_push(&cp.args, "--");
 	argv_array_push(&cp.args, url);
@@ -1373,6 +1378,8 @@  static int module_clone(int argc, const char **argv, const char *prefix)
 	struct string_list reference = STRING_LIST_INIT_NODUP;
 	int dissociate = 0, require_init = 0;
 	char *sm_alternate = NULL, *error_strategy = NULL;
+	int single_branch = 0;
+	char *branch = NULL;
 
 	struct option module_clone_options[] = {
 		OPT_STRING(0, "prefix", &prefix,
@@ -1400,12 +1407,17 @@  static int module_clone(int argc, const char **argv, const char *prefix)
 			   N_("force cloning progress")),
 		OPT_BOOL(0, "require-init", &require_init,
 			   N_("disallow cloning into non-empty directory")),
+		OPT_BOOL(0, "single-branch", &single_branch,
+			 N_("clone only one branch, HEAD or --branch")),
+		OPT_STRING('b', "branch", &branch, "<branch>",
+			   N_("checkout <branch> instead of the remote's HEAD")),
 		OPT_END()
 	};
 
 	const char *const git_submodule_helper_usage[] = {
 		N_("git submodule--helper clone [--prefix=<path>] [--quiet] "
 		   "[--reference <repository>] [--name <name>] [--depth <depth>] "
+		   "[--single-branch] [-b | --branch <name>] "
 		   "--url <url> --path <path>"),
 		NULL
 	};
@@ -1438,7 +1450,7 @@  static int module_clone(int argc, const char **argv, const char *prefix)
 		prepare_possible_alternates(name, &reference);
 
 		if (clone_submodule(path, sm_gitdir, url, depth, &reference, dissociate,
-				    quiet, progress))
+				    quiet, progress, single_branch, branch))
 			die(_("clone of '%s' into submodule path '%s' failed"),
 			    url, path);
 	} else {
@@ -1562,6 +1574,8 @@  struct submodule_update_clone {
 	const char *depth;
 	const char *recursive_prefix;
 	const char *prefix;
+	int single_branch;
+	const char *branch;
 
 	/* to be consumed by git-submodule.sh */
 	struct update_clone_data *update_clone;
@@ -1578,7 +1592,7 @@  struct submodule_update_clone {
 };
 #define SUBMODULE_UPDATE_CLONE_INIT {0, MODULE_LIST_INIT, 0, \
 	SUBMODULE_UPDATE_STRATEGY_INIT, 0, 0, -1, STRING_LIST_INIT_DUP, 0, 0, \
-	NULL, NULL, NULL, \
+	NULL, NULL, NULL, 0, NULL,\
 	NULL, 0, 0, 0, NULL, 0, 0, 1}
 
 
@@ -1718,6 +1732,10 @@  static int prepare_to_clone_next_submodule(const struct cache_entry *ce,
 		argv_array_push(&child->args, "--dissociate");
 	if (suc->depth)
 		argv_array_push(&child->args, suc->depth);
+	if (suc->single_branch)
+		argv_array_push(&child->args, "--single-branch");
+	if (suc->branch)
+		argv_array_pushl(&child->args, "--branch", suc->branch, NULL);
 
 cleanup:
 	strbuf_reset(&displaypath_sb);
@@ -1897,6 +1915,10 @@  static int update_clone(int argc, const char **argv, const char *prefix)
 			    N_("force cloning progress")),
 		OPT_BOOL(0, "require-init", &suc.require_init,
 			   N_("disallow cloning into non-empty directory")),
+		OPT_BOOL(0, "single-branch", &suc.single_branch,
+			 N_("clone only one branch, HEAD or --branch")),
+		OPT_STRING('b', "branch", &suc.branch, "<branch>",
+			   N_("checkout <branch> instead of the remote's HEAD")),
 		OPT_END()
 	};
 
diff --git a/git-submodule.sh b/git-submodule.sh
index aaa1809d24..c2eadbb930 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -10,7 +10,7 @@  USAGE="[--quiet] [--cached]
    or: $dashless [--quiet] status [--cached] [--recursive] [--] [<path>...]
    or: $dashless [--quiet] init [--] [<path>...]
    or: $dashless [--quiet] deinit [-f|--force] (--all| [--] <path>...)
-   or: $dashless [--quiet] update [--init] [--remote] [-N|--no-fetch] [-f|--force] [--checkout|--merge|--rebase] [--[no-]recommend-shallow] [--reference <repository>] [--recursive] [--] [<path>...]
+   or: $dashless [--quiet] update [--init] [--remote] [-N|--no-fetch] [-f|--force] [--checkout|--merge|--rebase] [--[no-]recommend-shallow] [--reference <repository>] [--recursive] [--single-branch] [-b|--branch <name>] [--] [<path>...]
    or: $dashless [--quiet] set-branch (--default|--branch <branch>) [--] <path>
    or: $dashless [--quiet] set-url [--] <path> <newurl>
    or: $dashless [--quiet] summary [--cached|--files] [--summary-limit <n>] [commit] [--] [<path>...]
@@ -47,6 +47,8 @@  custom_name=
 depth=
 progress=
 dissociate=
+single_branch=
+sm_branch=
 
 die_if_unmatched ()
 {
@@ -526,6 +528,17 @@  cmd_update()
 		--jobs=*)
 			jobs=$1
 			;;
+		--single-branch)
+			single_branch=1
+			;;
+		-b|--branch)
+			case "$2" in '') usage ;; esac
+			sm_branch="--branch=$2"
+			shift
+			;;
+		-b=*|--branch=*)
+			sm_branch=$1
+			;;
 		--)
 			shift
 			break
@@ -555,6 +568,8 @@  cmd_update()
 		${dissociate:+"--dissociate"} \
 		${depth:+--depth "$depth"} \
 		${require_init:+--require-init} \
+		${single_branch:+--single-branch} \
+		${sm_branch:+"$sm_branch"} \
 		$recommend_shallow \
 		$jobs \
 		-- \
diff --git a/t/t5617-clone-submodules-remote.sh b/t/t5617-clone-submodules-remote.sh
index 37fcce9c40..390645897d 100755
--- a/t/t5617-clone-submodules-remote.sh
+++ b/t/t5617-clone-submodules-remote.sh
@@ -14,14 +14,16 @@  test_expect_success 'setup' '
 		cd sub &&
 		git init &&
 		test_commit subcommit1 &&
-		git tag sub_when_added_to_super
+		git tag sub_when_added_to_super &&
+		git branch other
 	) &&
 	git submodule add "file://$pwd/sub" sub &&
 	git commit -m "add submodule" &&
 	(
 		cd sub &&
 		test_commit subcommit2
-	)
+	) &&
+	git branch other
 '
 
 test_expect_success 'clone with --no-remote-submodules' '
@@ -51,4 +53,24 @@  test_expect_success 'check the default is --no-remote-submodules' '
 	)
 '
 
+test_expect_success 'clone with --single-branch' '
+	test_when_finished "rm -rf super_clone" &&
+	git clone --recurse-submodules --single-branch "file://$pwd/." super_clone &&
+	(
+		cd super_clone/sub &&
+		git branch -a >branches &&
+		test_must_fail grep other branches
+	)
+'
+
+test_expect_success 'clone with --single-branch and --branch' '
+	test_when_finished "rm -rf super_clone" &&
+	git clone --recurse-submodules --single-branch --branch other "file://$pwd/." super_clone &&
+	(
+		cd super_clone/sub &&
+		git branch -a >branches &&
+		test_must_fail grep master branches
+	)
+'
+
 test_done