diff mbox series

[v2] clone: pass --single-branch during --recurse-submodules

Message ID 20200128221736.9217-1-emilyshaffer@google.com (mailing list archive)
State New, archived
Headers show
Series [v2] clone: pass --single-branch during --recurse-submodules | expand

Commit Message

Emily Shaffer Jan. 28, 2020, 10:17 p.m. UTC
Previously, performing "git clone --recurse-submodules --single-branch"
resulted in submodules cloning all branches even though the superproject
cloned only one branch. Pipe --single-branch through the submodule
helper framework to make it to 'clone' later on.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
Removed references to --branch since v1.

As discussed in the thread for v1 of this patch, in cases when the
submodule commit referenced by the specified superproject branch isn't
the same as the HEAD of the submodule repo known by the server side,
this still works in kind of a non-obvious way. In these cases, first we
fetch the single branch that is the ancestor of the server's HEAD; then
we fetch the commit needed by the superproject (and its ancestry). So
while this change prevents us from fetching *all* branches on clone, it
doesn't necessarily limit us to a single branch as described.

To limit submodules to a single branch on clone, we need to know which
branch we want for each submodule; unfortunately it seems that just
propagating --branch doesn't necessarily help as there's not much
guarantee that the superproject branch name matches the submodule branch
name, or that all submodule branch names are the same. So, we need to
give a little more thought there.

 - Emily

 Documentation/git-submodule.txt    |  6 +++++-
 builtin/clone.c                    |  3 +++
 builtin/submodule--helper.c        | 17 ++++++++++++++---
 git-submodule.sh                   |  7 ++++++-
 t/t5617-clone-submodules-remote.sh | 13 ++++++++++++-
 5 files changed, 40 insertions(+), 6 deletions(-)

Comments

Jeff King Jan. 30, 2020, 10:23 a.m. UTC | #1
On Tue, Jan 28, 2020 at 02:17:36PM -0800, Emily Shaffer wrote:

> Previously, performing "git clone --recurse-submodules --single-branch"
> resulted in submodules cloning all branches even though the superproject
> cloned only one branch. Pipe --single-branch through the submodule
> helper framework to make it to 'clone' later on.

This makes sense to me, bearing in mind that I'm not at all a good
person to point out subtleties with submodules that could bite us.

> As discussed in the thread for v1 of this patch, in cases when the
> submodule commit referenced by the specified superproject branch isn't
> the same as the HEAD of the submodule repo known by the server side,
> this still works in kind of a non-obvious way. In these cases, first we
> fetch the single branch that is the ancestor of the server's HEAD; then
> we fetch the commit needed by the superproject (and its ancestry). So
> while this change prevents us from fetching *all* branches on clone, it
> doesn't necessarily limit us to a single branch as described.

Is it worth adding a test that we do the right thing here? Not so much
to prove that it works now, but to protect us against future changes. It
seems like the sort of thing that could get subtly broken.

The patch looks mostly good to me (my, that was a lot of plumbing
through that option); here are a few minor comments:

> -update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--] [<path>...]::
> +update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--single-branch] [--] [<path>...]::

This line is horrendously long. Not new in your patch, but I wonder if
the time might have come to break it up.

> +--single-branch::
> +	This option is only valid for the update command.
> +	Clone only one branch during update, HEAD or --branch.

For some reason my brain insists on parsing this second sentence as a
3-item list without an Oxford comma. I wonder if it would be more clear
as:

  Clone only one branch during update: HEAD or one specified by
  --branch.

or similar.

>  #define SUBMODULE_UPDATE_CLONE_INIT {0, MODULE_LIST_INIT, 0, \
>  	SUBMODULE_UPDATE_STRATEGY_INIT, 0, 0, -1, STRING_LIST_INIT_DUP, 0, 0, \
> -	NULL, NULL, NULL, \
> +	NULL, NULL, NULL, 0,\
>  	NULL, 0, 0, 0, NULL, 0, 0, 1}

Wow. Also not new in your patch, but I think we're moving towards the
use of C99 named initializers, which would make this a bit less
daunting (all of the NULL/0 items could be omitted!).

> +test_expect_success 'clone with --single-branch' '
> +	test_when_finished "rm -rf super_clone" &&
> +	git clone --recurse-submodules --single-branch "file://$pwd/." super_clone &&
> +	(
> +		cd super_clone/sub &&
> +		git branch -a >branches &&
> +		test_must_fail grep other branches
> +	)
> +'

Don't use test_must_fail with non-Git commands; you can just say "!  grep".

We usually try to avoid scripting around git-branch output (although I
find it pretty unlikely that future changes would break this particular
case). git-for-each-ref would be a better pick, but I wonder if:

  git rev-parse --verify origin/master &&
  test_must_fail git rev-parse --verify origin/other

might express the expectation more clearly.

-Peff
Emily Shaffer Feb. 21, 2020, 2:53 a.m. UTC | #2
On Thu, Jan 30, 2020 at 05:23:03AM -0500, Jeff King wrote:
> On Tue, Jan 28, 2020 at 02:17:36PM -0800, Emily Shaffer wrote:

Ouch. I forgot about this review for some time. Sorry :)

> 
> > Previously, performing "git clone --recurse-submodules --single-branch"
> > resulted in submodules cloning all branches even though the superproject
> > cloned only one branch. Pipe --single-branch through the submodule
> > helper framework to make it to 'clone' later on.
> 
> This makes sense to me, bearing in mind that I'm not at all a good
> person to point out subtleties with submodules that could bite us.

I wonder if it makes sense to ship this to our submodule-using masses
here for a little while and see how it works / whether anybody yells?
This might be too small for that kind of thing.

> 
> > As discussed in the thread for v1 of this patch, in cases when the
> > submodule commit referenced by the specified superproject branch isn't
> > the same as the HEAD of the submodule repo known by the server side,
> > this still works in kind of a non-obvious way. In these cases, first we
> > fetch the single branch that is the ancestor of the server's HEAD; then
> > we fetch the commit needed by the superproject (and its ancestry). So
> > while this change prevents us from fetching *all* branches on clone, it
> > doesn't necessarily limit us to a single branch as described.
> 
> Is it worth adding a test that we do the right thing here? Not so much
> to prove that it works now, but to protect us against future changes. It
> seems like the sort of thing that could get subtly broken.

What did you have in mind beyond the test I added already?

> 
> The patch looks mostly good to me (my, that was a lot of plumbing
> through that option); here are a few minor comments:
> 
> > -update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--] [<path>...]::
> > +update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--single-branch] [--] [<path>...]::
> 
> This line is horrendously long. Not new in your patch, but I wonder if
> the time might have come to break it up.

I dug around in Asciidoc doc and couldn't find a good way to do so. The
trailing :: means this command listing is done as a "definition list",
and I just didn't see any way to multiline an entry for such a thing. :(

> 
> > +--single-branch::
> > +	This option is only valid for the update command.
> > +	Clone only one branch during update, HEAD or --branch.
> 
> For some reason my brain insists on parsing this second sentence as a
> 3-item list without an Oxford comma. I wonder if it would be more clear
> as:
> 
>   Clone only one branch during update: HEAD or one specified by
>   --branch.
> 
> or similar.

Took it verbatim, I agree.

> 
> >  #define SUBMODULE_UPDATE_CLONE_INIT {0, MODULE_LIST_INIT, 0, \
> >  	SUBMODULE_UPDATE_STRATEGY_INIT, 0, 0, -1, STRING_LIST_INIT_DUP, 0, 0, \
> > -	NULL, NULL, NULL, \
> > +	NULL, NULL, NULL, 0,\
> >  	NULL, 0, 0, 0, NULL, 0, 0, 1}
> 
> Wow. Also not new in your patch, but I think we're moving towards the
> use of C99 named initializers, which would make this a bit less
> daunting (all of the NULL/0 items could be omitted!).

Hrm. Yeah, I'll add a quick patch before this one to do so. It's pretty
gross :)

> > +test_expect_success 'clone with --single-branch' '
> > +	test_when_finished "rm -rf super_clone" &&
> > +	git clone --recurse-submodules --single-branch "file://$pwd/." super_clone &&
> > +	(
> > +		cd super_clone/sub &&
> > +		git branch -a >branches &&
> > +		test_must_fail grep other branches
> > +	)
> > +'
> 
> Don't use test_must_fail with non-Git commands; you can just say "!  grep".
> 
> We usually try to avoid scripting around git-branch output (although I

ooof

> find it pretty unlikely that future changes would break this particular
> case). git-for-each-ref would be a better pick, but I wonder if:
> 
>   git rev-parse --verify origin/master &&
>   test_must_fail git rev-parse --verify origin/other
> 
> might express the expectation more clearly.

Sure, done.

Sorry again for the long wait, and thanks for the effort on the review.
New revision coming momentarily.

 - Emily
Jeff King Feb. 21, 2020, 3:16 a.m. UTC | #3
On Thu, Feb 20, 2020 at 06:53:06PM -0800, Emily Shaffer wrote:

> On Thu, Jan 30, 2020 at 05:23:03AM -0500, Jeff King wrote:
> > On Tue, Jan 28, 2020 at 02:17:36PM -0800, Emily Shaffer wrote:
> 
> Ouch. I forgot about this review for some time. Sorry :)

That's OK. I forgot about, too. :)

> > This makes sense to me, bearing in mind that I'm not at all a good
> > person to point out subtleties with submodules that could bite us.
> 
> I wonder if it makes sense to ship this to our submodule-using masses
> here for a little while and see how it works / whether anybody yells?
> This might be too small for that kind of thing.

Certainly putting it in front of more users gives us more confidence.
But I have a feeling the gotcha here would be that it does what _some_
set of users wants, but not so much for others. And your audience may
not be a representative sample. So certainly it doesn't hurt, but I'll
leave it you whether you think it's worthwhile.

I'm having trouble even coming up with a devil's advocate example for
the "others" in this case, though. Getting general exposure in "next"
and then "master" leading up to the release might also help, but I get
the impression that those audiences aren't necessarily comprehensive,
either.

> > > As discussed in the thread for v1 of this patch, in cases when the
> > > submodule commit referenced by the specified superproject branch isn't
> > > the same as the HEAD of the submodule repo known by the server side,
> > > this still works in kind of a non-obvious way. In these cases, first we
> > > fetch the single branch that is the ancestor of the server's HEAD; then
> > > we fetch the commit needed by the superproject (and its ancestry). So
> > > while this change prevents us from fetching *all* branches on clone, it
> > > doesn't necessarily limit us to a single branch as described.
> > 
> > Is it worth adding a test that we do the right thing here? Not so much
> > to prove that it works now, but to protect us against future changes. It
> > seems like the sort of thing that could get subtly broken.
> 
> What did you have in mind beyond the test I added already?

I think your test just covers the "happy path" that what we want to
fetch is on the tip of that single branch. But do we test a case where
the commit we need for a submodule isn't on the HEAD branch of that
submodule, and we "clone --single-branch --recurse-submodules" the
superproject and it works? I.e., from what you describe above it should
work, but it might be nice to make sure it continues to do so.

> > > -update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--] [<path>...]::
> > > +update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--single-branch] [--] [<path>...]::
> > 
> > This line is horrendously long. Not new in your patch, but I wonder if
> > the time might have come to break it up.
> 
> I dug around in Asciidoc doc and couldn't find a good way to do so. The
> trailing :: means this command listing is done as a "definition list",
> and I just didn't see any way to multiline an entry for such a thing. :(

Hmm, yeah. Personally I think it would be better as:

  update [options] [--] [<path>...]::
     ...
     The supported options are:

       --init::
         ...here's what init does...

but I guess some of them are repeated between commands, which would need
addressing. Anyway, it's clear this is way beyond the scope of your
patch, so let's forget about it for now.

-Peff
diff mbox series

Patch

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 5232407f68..10c42b752a 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -133,7 +133,7 @@  If you really want to remove a submodule from the repository and commit
 that use linkgit:git-rm[1] instead. See linkgit:gitsubmodules[7] for removal
 options.
 
-update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--] [<path>...]::
+update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--single-branch] [--] [<path>...]::
 +
 --
 Update the registered submodules to match what the superproject
@@ -430,6 +430,10 @@  options carefully.
 	Clone new submodules in parallel with as many jobs.
 	Defaults to the `submodule.fetchJobs` option.
 
+--single-branch::
+	This option is only valid for the update command.
+	Clone only one branch during update, HEAD or --branch.
+
 <path>...::
 	Paths to submodule(s). When specified this will restrict the command
 	to only operate on the submodules found at the specified paths.
diff --git a/builtin/clone.c b/builtin/clone.c
index 0fc89ae2b9..ec749a86c1 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -832,6 +832,9 @@  static int checkout(int submodule_progress)
 			argv_array_push(&args, "--no-fetch");
 		}
 
+		if (option_single_branch)
+			argv_array_push(&args, "--single-branch");
+
 		err = run_command_v_opt(args.argv, RUN_GIT_CMD);
 		argv_array_clear(&args);
 	}
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index c72931ecd7..50c9e23db1 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1225,7 +1225,7 @@  static int module_deinit(int argc, const char **argv, const char *prefix)
 
 static int clone_submodule(const char *path, const char *gitdir, const char *url,
 			   const char *depth, struct string_list *reference, int dissociate,
-			   int quiet, int progress)
+			   int quiet, int progress, int single_branch)
 {
 	struct child_process cp = CHILD_PROCESS_INIT;
 
@@ -1247,6 +1247,8 @@  static int clone_submodule(const char *path, const char *gitdir, const char *url
 		argv_array_push(&cp.args, "--dissociate");
 	if (gitdir && *gitdir)
 		argv_array_pushl(&cp.args, "--separate-git-dir", gitdir, NULL);
+	if (single_branch)
+		argv_array_push(&cp.args, "--single-branch");
 
 	argv_array_push(&cp.args, "--");
 	argv_array_push(&cp.args, url);
@@ -1373,6 +1375,7 @@  static int module_clone(int argc, const char **argv, const char *prefix)
 	struct string_list reference = STRING_LIST_INIT_NODUP;
 	int dissociate = 0, require_init = 0;
 	char *sm_alternate = NULL, *error_strategy = NULL;
+	int single_branch = 0;
 
 	struct option module_clone_options[] = {
 		OPT_STRING(0, "prefix", &prefix,
@@ -1400,12 +1403,15 @@  static int module_clone(int argc, const char **argv, const char *prefix)
 			   N_("force cloning progress")),
 		OPT_BOOL(0, "require-init", &require_init,
 			   N_("disallow cloning into non-empty directory")),
+		OPT_BOOL(0, "single-branch", &single_branch,
+			 N_("clone only one branch, HEAD or --branch")),
 		OPT_END()
 	};
 
 	const char *const git_submodule_helper_usage[] = {
 		N_("git submodule--helper clone [--prefix=<path>] [--quiet] "
 		   "[--reference <repository>] [--name <name>] [--depth <depth>] "
+		   "[--single-branch] "
 		   "--url <url> --path <path>"),
 		NULL
 	};
@@ -1438,7 +1444,7 @@  static int module_clone(int argc, const char **argv, const char *prefix)
 		prepare_possible_alternates(name, &reference);
 
 		if (clone_submodule(path, sm_gitdir, url, depth, &reference, dissociate,
-				    quiet, progress))
+				    quiet, progress, single_branch))
 			die(_("clone of '%s' into submodule path '%s' failed"),
 			    url, path);
 	} else {
@@ -1562,6 +1568,7 @@  struct submodule_update_clone {
 	const char *depth;
 	const char *recursive_prefix;
 	const char *prefix;
+	int single_branch;
 
 	/* to be consumed by git-submodule.sh */
 	struct update_clone_data *update_clone;
@@ -1578,7 +1585,7 @@  struct submodule_update_clone {
 };
 #define SUBMODULE_UPDATE_CLONE_INIT {0, MODULE_LIST_INIT, 0, \
 	SUBMODULE_UPDATE_STRATEGY_INIT, 0, 0, -1, STRING_LIST_INIT_DUP, 0, 0, \
-	NULL, NULL, NULL, \
+	NULL, NULL, NULL, 0,\
 	NULL, 0, 0, 0, NULL, 0, 0, 1}
 
 
@@ -1718,6 +1725,8 @@  static int prepare_to_clone_next_submodule(const struct cache_entry *ce,
 		argv_array_push(&child->args, "--dissociate");
 	if (suc->depth)
 		argv_array_push(&child->args, suc->depth);
+	if (suc->single_branch)
+		argv_array_push(&child->args, "--single-branch");
 
 cleanup:
 	strbuf_reset(&displaypath_sb);
@@ -1897,6 +1906,8 @@  static int update_clone(int argc, const char **argv, const char *prefix)
 			    N_("force cloning progress")),
 		OPT_BOOL(0, "require-init", &suc.require_init,
 			   N_("disallow cloning into non-empty directory")),
+		OPT_BOOL(0, "single-branch", &suc.single_branch,
+			 N_("clone only one branch, HEAD or --branch")),
 		OPT_END()
 	};
 
diff --git a/git-submodule.sh b/git-submodule.sh
index aaa1809d24..95f73ab775 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -10,7 +10,7 @@  USAGE="[--quiet] [--cached]
    or: $dashless [--quiet] status [--cached] [--recursive] [--] [<path>...]
    or: $dashless [--quiet] init [--] [<path>...]
    or: $dashless [--quiet] deinit [-f|--force] (--all| [--] <path>...)
-   or: $dashless [--quiet] update [--init] [--remote] [-N|--no-fetch] [-f|--force] [--checkout|--merge|--rebase] [--[no-]recommend-shallow] [--reference <repository>] [--recursive] [--] [<path>...]
+   or: $dashless [--quiet] update [--init] [--remote] [-N|--no-fetch] [-f|--force] [--checkout|--merge|--rebase] [--[no-]recommend-shallow] [--reference <repository>] [--recursive] [--single-branch] [--] [<path>...]
    or: $dashless [--quiet] set-branch (--default|--branch <branch>) [--] <path>
    or: $dashless [--quiet] set-url [--] <path> <newurl>
    or: $dashless [--quiet] summary [--cached|--files] [--summary-limit <n>] [commit] [--] [<path>...]
@@ -47,6 +47,7 @@  custom_name=
 depth=
 progress=
 dissociate=
+single_branch=
 
 die_if_unmatched ()
 {
@@ -526,6 +527,9 @@  cmd_update()
 		--jobs=*)
 			jobs=$1
 			;;
+		--single-branch)
+			single_branch=1
+			;;
 		--)
 			shift
 			break
@@ -555,6 +559,7 @@  cmd_update()
 		${dissociate:+"--dissociate"} \
 		${depth:+--depth "$depth"} \
 		${require_init:+--require-init} \
+		${single_branch:+--single-branch} \
 		$recommend_shallow \
 		$jobs \
 		-- \
diff --git a/t/t5617-clone-submodules-remote.sh b/t/t5617-clone-submodules-remote.sh
index 37fcce9c40..67f2b48659 100755
--- a/t/t5617-clone-submodules-remote.sh
+++ b/t/t5617-clone-submodules-remote.sh
@@ -14,7 +14,8 @@  test_expect_success 'setup' '
 		cd sub &&
 		git init &&
 		test_commit subcommit1 &&
-		git tag sub_when_added_to_super
+		git tag sub_when_added_to_super &&
+		git branch other
 	) &&
 	git submodule add "file://$pwd/sub" sub &&
 	git commit -m "add submodule" &&
@@ -51,4 +52,14 @@  test_expect_success 'check the default is --no-remote-submodules' '
 	)
 '
 
+test_expect_success 'clone with --single-branch' '
+	test_when_finished "rm -rf super_clone" &&
+	git clone --recurse-submodules --single-branch "file://$pwd/." super_clone &&
+	(
+		cd super_clone/sub &&
+		git branch -a >branches &&
+		test_must_fail grep other branches
+	)
+'
+
 test_done