
[v2] fetch: do not run a redundant fetch from submodule

Message ID xmqqk0alyqyj.fsf_-_@gitster.g
State Accepted
Commit 0353c6881890db1302f0f1bdf85c6076eed61113
Series [v2] fetch: do not run a redundant fetch from submodule

Commit Message

Junio C Hamano May 16, 2022, 11:53 p.m. UTC
When 7dce19d3 (fetch/pull: Add the --recurse-submodules option,
2010-11-12) introduced the "--recurse-submodules" option, the
approach taken was to fetch in submodules only once, after all of
the main fetching had succeeded (usually a fetch from a single
remote, but possibly a fetch from a group of remotes using
fetch_multiple()).  Later we added "--all" to fetch from all defined
remotes, which complicated things further.

If your project has a submodule and you run "git fetch
--recurse-submodules --all", you'd see a fetch for the top-level,
which invokes a fetch for the submodule, followed by yet another
fetch for the same submodule.  All but the last fetch for the
submodule come from the "git fetch --recurse-submodules" subprocesses
that are spawned via the fetch_multiple() interface, one per remote,
and the last fetch comes from the code at the end of cmd_fetch().

Because fetch_multiple() already fetches recursively from submodules
in each top-level fetch it spawns, the last fetch in the submodule
is redundant.  That final fetch only matters when fetch_one() has
interacted with a single remote at the top-level, because fetch_one()
does not fetch submodules by itself.
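To make the counting above concrete, here is a small C model (not Git's code) of how many times one submodule is fetched by "git fetch --all" with submodule recursion enabled and every fetch succeeding. The enum, the function names and the `patched` flag are hypothetical; the model only assumes what the message states: each fetch_multiple() child fetches the submodule once, fetch_one() does not fetch submodules by itself, and the final block in cmd_fetch() fetches the submodule once more whenever it runs.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, not Git's implementation: counts how many times
 * a single submodule is fetched by "git fetch --all" when submodule
 * recursion is enabled and every fetch succeeds.
 */

/* Which top-level code path cmd_fetch() ends up taking. */
enum fetch_path { PATH_FETCH_ONE, PATH_FETCH_MULTIPLE };

static enum fetch_path choose_path(int nr_remotes, bool patched)
{
	/* The patch adds the "group of one" shortcut for --all. */
	if (patched && nr_remotes == 1)
		return PATH_FETCH_ONE;
	return PATH_FETCH_MULTIPLE;
}

static int count_submodule_fetches(int nr_remotes, bool patched)
{
	enum fetch_path path;
	int count = 0;

	assert(nr_remotes >= 1);
	path = choose_path(nr_remotes, patched);

	/* Each "git fetch --recurse-submodules" child fetches the submodule. */
	if (path == PATH_FETCH_MULTIPLE)
		count += nr_remotes;

	/*
	 * The final block in cmd_fetch(): unconditional before the patch,
	 * and only after the single-remote path with it (the "remote" guard).
	 */
	if (!patched || path == PATH_FETCH_ONE)
		count += 1;

	return count;
}
```

With the patch modeled (`patched == true`), this gives 1 submodule fetch for one remote and 2 for two remotes, which is what the new t5526 tests expect from test_line_count; without it, the model gives 2 and 3.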

While we are at it, there is one optimization that exists when
dealing with a group of remotes but is missing when "--all" is used.
In the former case, when the group turns out to be a group of one,
we use the normal fetch_one() code path instead of spawning "git
fetch" as a subprocess via the fetch_multiple() interface.  Do the
same when handling "--all", if it turns out that we have only one
remote defined.
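The "group of one" shortcut can be sketched in isolation as below. The struct and function names are hypothetical stand-ins (Git's real code collects the names into a struct string_list); in the patch itself the check is simply `if (list.nr == 1) remote = remote_get(list.items[0].string);`, and a non-NULL remote later steers cmd_fetch() to fetch_one().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the list that for_each_remote() fills in
 * cmd_fetch(); Git's real code uses a struct string_list.
 */
struct remote_name_list {
	const char **names;
	size_t nr;
};

/*
 * Sketch of the "group of one" shortcut: if exactly one remote was
 * collected, return its name so the caller can take the fetch_one()
 * path; otherwise return NULL so the caller uses fetch_multiple().
 */
static const char *single_remote_name(const struct remote_name_list *list)
{
	assert(list != NULL);
	if (list->nr == 1)
		return list->names[0];
	return NULL;
}
```

Returning NULL rather than a flag mirrors how cmd_fetch() uses a NULL `remote` to mean "fetch from the whole list".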

Helped-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

So here is a second attempt.  It also demonstrates an interesting
quirk of range-diff: similar changes from the previous round show up
as applied to a different target (the new tests moved from t5617 to
t5526).

t5617 is much more cleanly organized than t5526, and we may want to
clean up the latter after the dust settles.

1:  006fe43da1 ! 1:  da0a4e341b fetch: do not run a redundant fetch from submodule
    @@ Commit message
         when handing "--all", if it turns out that we have only one remote
         defined.
     
    +    Helped-by: Glen Choo <chooglen@google.com>
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      ## builtin/fetch.c ##
    @@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix)
      			die(_("fetch --all does not make sense with refspecs"));
      		(void) for_each_remote(get_one_remote_for_fetch, &list);
     +
    -+		/* no point doing fetch_multiple() of one */
    ++		/* do not do fetch_multiple() of one */
     +		if (list.nr == 1)
     +			remote = remote_get(list.items[0].string);
      	} else if (argc == 0) {
    @@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix)
      	}
      
     -	if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
    ++
    ++	/*
    ++	 * This is only needed after fetch_one(), which does not fetch
    ++	 * submodules by itself.
    ++	 *
    ++	 * When we fetch from multiple remotes, fetch_multiple() has
    ++	 * already updated submodules to grab commits necessary for
    ++	 * the fetched history from each remote, so there is no need
    ++	 * to fetch submodules from here.
    ++	 */
     +	if (!result && remote && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
      		struct strvec options = STRVEC_INIT;
      		int max_children = max_jobs;
      
     
    - ## t/t5617-clone-submodules-remote.sh ##
    -@@ t/t5617-clone-submodules-remote.sh: test_expect_success 'clone with --single-branch' '
    + ## t/t5526-fetch-submodules.sh ##
    +@@ t/t5526-fetch-submodules.sh: test_expect_success 'fetch --recurse-submodules updates name-conflicted, unpopul
      	)
      '
      
     +test_expect_success 'fetch --all with --recurse-submodules' '
    -+	test_when_finished "rm -fr super_clone" &&
    -+	git clone --recurse-submodules srv.bare super_clone &&
    ++	test_when_finished "rm -fr src_clone" &&
    ++	git clone --recurse-submodules src src_clone &&
     +	(
    -+		cd super_clone &&
    ++		cd src_clone &&
     +		git config submodule.recurse true &&
     +		git config fetch.parallel 0 &&
     +		git fetch --all 2>../fetch-log
     +	) &&
    -+	grep "Fetching sub" fetch-log >fetch-subs &&
    ++	grep "^Fetching submodule sub$" fetch-log >fetch-subs &&
     +	test_line_count = 1 fetch-subs
     +'
     +
    - # do basic partial clone from "srv.bare"
    - # confirm partial clone was registered in the local config for super and sub.
    - test_expect_success 'clone with --filter' '
    ++test_expect_success 'fetch --all with --recurse-submodules with multiple' '
    ++	test_when_finished "rm -fr src_clone" &&
    ++	git clone --recurse-submodules src src_clone &&
    ++	(
    ++		cd src_clone &&
    ++		git remote add secondary ../src &&
    ++		git config submodule.recurse true &&
    ++		git config fetch.parallel 0 &&
    ++		git fetch --all 2>../fetch-log
    ++	) &&
    ++	grep "Fetching submodule sub" fetch-log >fetch-subs &&
    ++	test_line_count = 2 fetch-subs
    ++'
    ++
    + test_done


 builtin/fetch.c             | 16 +++++++++++++++-
 t/t5526-fetch-submodules.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 42 insertions(+), 1 deletion(-)

Comments

Glen Choo May 17, 2022, 4:47 p.m. UTC | #1
This version looks good to me, thanks :)

  Reviewed-by: Glen Choo <chooglen@google.com>

Junio C Hamano <gitster@pobox.com> writes:

> t5617 is much more cleanly organized than t5526, and we may want to
> clean up the latter after the dust settles.

Yeah, t5526 has so many tests for the 'core' functionality that it's
hard to fit something 'tangential' like "--all". I might touch it again
soon, so I'll keep this in mind.

> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index e3791f09ed..8b15c40bb2 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -2261,7 +2265,17 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  		result = fetch_multiple(&list, max_children);
>  	}
>  
> -	if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
> +
> +	/*
> +	 * This is only needed after fetch_one(), which does not fetch
> +	 * submodules by itself.
> +	 *
> +	 * When we fetch from multiple remotes, fetch_multiple() has
> +	 * already updated submodules to grab commits necessary for
> +	 * the fetched history from each remote, so there is no need
> +	 * to fetch submodules from here.
> +	 */
> +	if (!result && remote && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
>  		struct strvec options = STRVEC_INIT;
>  		int max_children = max_jobs;

Looks good; the comment is easier to understand than my suggestion for
sure.

> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index 43dada8544..a301b56db8 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -1125,4 +1125,31 @@ test_expect_success 'fetch --recurse-submodules updates name-conflicted, unpopul
>  	)
>  '
>  
> +test_expect_success 'fetch --all with --recurse-submodules' '
> +	test_when_finished "rm -fr src_clone" &&
> +	git clone --recurse-submodules src src_clone &&
> +	(
> +		cd src_clone &&
> +		git config submodule.recurse true &&
> +		git config fetch.parallel 0 &&
> +		git fetch --all 2>../fetch-log
> +	) &&
> +	grep "^Fetching submodule sub$" fetch-log >fetch-subs &&
> +	test_line_count = 1 fetch-subs
> +'
> +
> +test_expect_success 'fetch --all with --recurse-submodules with multiple' '
> +	test_when_finished "rm -fr src_clone" &&
> +	git clone --recurse-submodules src src_clone &&
> +	(
> +		cd src_clone &&
> +		git remote add secondary ../src &&
> +		git config submodule.recurse true &&
> +		git config fetch.parallel 0 &&
> +		git fetch --all 2>../fetch-log
> +	) &&
> +	grep "Fetching submodule sub" fetch-log >fetch-subs &&
> +	test_line_count = 2 fetch-subs
> +'
> +

Also looks good.

Junio C Hamano May 18, 2022, 3:53 p.m. UTC | #2
Glen Choo <chooglen@google.com> writes:

>> +
>> +	/*
>> +	 * This is only needed after fetch_one(), which does not fetch
>> +	 * submodules by itself.
>> +	 *
>> +	 * When we fetch from multiple remotes, fetch_multiple() has
>> +	 * already updated submodules to grab commits necessary for
>> +	 * the fetched history from each remote, so there is no need
>> +	 * to fetch submodules from here.
>> +	 */
>> +	if (!result && remote && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
>>  		struct strvec options = STRVEC_INIT;
>>  		int max_children = max_jobs;
>
> Looks good; the comment is easier to understand than my suggestion for
> sure.

Thanks.  Today's code has diverged too much from the original code I
wrote a long time ago (before submodules), and I needed an extra set
of eyeballs to double-check that what I (wishfully) wrote about how
the code works with submodules is in line with today's code ;-)

Patch

diff --git a/builtin/fetch.c b/builtin/fetch.c
index e3791f09ed..8b15c40bb2 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -2187,6 +2187,10 @@  int cmd_fetch(int argc, const char **argv, const char *prefix)
 		else if (argc > 1)
 			die(_("fetch --all does not make sense with refspecs"));
 		(void) for_each_remote(get_one_remote_for_fetch, &list);
+
+		/* do not do fetch_multiple() of one */
+		if (list.nr == 1)
+			remote = remote_get(list.items[0].string);
 	} else if (argc == 0) {
 		/* No arguments -- use default remote */
 		remote = remote_get(NULL);
@@ -2261,7 +2265,17 @@  int cmd_fetch(int argc, const char **argv, const char *prefix)
 		result = fetch_multiple(&list, max_children);
 	}
 
-	if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
+
+	/*
+	 * This is only needed after fetch_one(), which does not fetch
+	 * submodules by itself.
+	 *
+	 * When we fetch from multiple remotes, fetch_multiple() has
+	 * already updated submodules to grab commits necessary for
+	 * the fetched history from each remote, so there is no need
+	 * to fetch submodules from here.
+	 */
+	if (!result && remote && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
 		struct strvec options = STRVEC_INIT;
 		int max_children = max_jobs;
 
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 43dada8544..a301b56db8 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -1125,4 +1125,31 @@  test_expect_success 'fetch --recurse-submodules updates name-conflicted, unpopul
 	)
 '
 
+test_expect_success 'fetch --all with --recurse-submodules' '
+	test_when_finished "rm -fr src_clone" &&
+	git clone --recurse-submodules src src_clone &&
+	(
+		cd src_clone &&
+		git config submodule.recurse true &&
+		git config fetch.parallel 0 &&
+		git fetch --all 2>../fetch-log
+	) &&
+	grep "^Fetching submodule sub$" fetch-log >fetch-subs &&
+	test_line_count = 1 fetch-subs
+'
+
+test_expect_success 'fetch --all with --recurse-submodules with multiple' '
+	test_when_finished "rm -fr src_clone" &&
+	git clone --recurse-submodules src src_clone &&
+	(
+		cd src_clone &&
+		git remote add secondary ../src &&
+		git config submodule.recurse true &&
+		git config fetch.parallel 0 &&
+		git fetch --all 2>../fetch-log
+	) &&
+	grep "Fetching submodule sub" fetch-log >fetch-subs &&
+	test_line_count = 2 fetch-subs
+'
+
 test_done