diff mbox series

fetch: do not run a redundant fetch from submodule

Message ID xmqqczgd16wx.fsf_-_@gitster.g (mailing list archive)
State Superseded
Headers show
Series fetch: do not run a redundant fetch from submodule | expand

Commit Message

Junio C Hamano May 16, 2022, 9:53 p.m. UTC
When 7dce19d3 (fetch/pull: Add the --recurse-submodules option,
2010-11-12) introduced the "--recurse-submodule" option, the
approach taken was to perform fetches in submodules only once, after
all the main fetching (it may usually be a fetch from a single
remote, but it could be fetching from a group of remotes using
fetch_multiple()) succeeded.  Later we added "--all" to fetch from
all defined remotes, which complicated things even more.

If your project has a submodule, and you try to run "git fetch
--recurse-submodule --all", you'd see a fetch for the top-level,
which invokes another fetch for the submodule, followed by another
fetch for the same submodule.  All but the last fetch for the
submodule come from a "git fetch --recurse-submodules" subprocess
that is spawned via the fetch_multiple() interface for the remotes,
and the last fetch comes from the code at the end.

Because recursive fetching from submodules is done in each fetch for
the top-level in fetch_multiple(), the last fetch in the submodule
is redundant.  It only matters when fetch_one() interacts with a
single remote at the top-level.

While we are at it, there is one optimization that exists in dealing
with a group of remote, but is missing when "--all" is used.  In the
former, when the group turns out to be a group of one, instead of
spawning "git fetch" as a subprocess via the fetch_multiple()
interface, we use the normal fetch_one() code path.  Do the same
when handing "--all", if it turns out that we have only one remote
defined.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * So, here is a mixed thing that fixes the issue in two different
   ways, which makes it unsuitable to be the final patch.  Either
   "turning --all into a single fetch with optimizaiton" that is in
   the first hunk, or "don't do the final sweep unless doing a
   single fetch" that is in the second hunk, is sufficient to make
   the symptom disappear.  Of course, using them both does not break
   anything, but it somehow feels unsatisfactory, and makes readers
   feel that we should be able to do better.

   But the better thing being what Glen alluded to as "non-trivial
   not prohibitivey hard", I'll stop here.

 builtin/fetch.c                    |  6 +++++-
 t/t5617-clone-submodules-remote.sh | 13 +++++++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)

Comments

Glen Choo May 16, 2022, 10:56 p.m. UTC | #1
Junio C Hamano <gitster@pobox.com> writes:

> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index e3791f09ed..359321e731 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -2187,6 +2187,10 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  		else if (argc > 1)
>  			die(_("fetch --all does not make sense with refspecs"));
>  		(void) for_each_remote(get_one_remote_for_fetch, &list);
> +
> +		/* no point doing fetch_multiple() of one */
> +		if (list.nr == 1)
> +			remote = remote_get(list.items[0].string);
>  	} else if (argc == 0) {
>  		/* No arguments -- use default remote */
>  		remote = remote_get(NULL);
> @@ -2261,7 +2265,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  		result = fetch_multiple(&list, max_children);
>  	}
>  
> -	if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
> +	if (!result && remote && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
>  		struct strvec options = STRVEC_INIT;
>  		int max_children = max_jobs;

IMO "&& remote" is non-inuitive enough to warrant a comment, e.g.

  /* do not update submodules if fetch_multiple() was called */
  if (!result && remote && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {

but I suppose that the commit message explains this well enough.

> diff --git a/t/t5617-clone-submodules-remote.sh b/t/t5617-clone-submodules-remote.sh
> index ca8f80083a..8b6b482a39 100755
> --- a/t/t5617-clone-submodules-remote.sh
> +++ b/t/t5617-clone-submodules-remote.sh
> @@ -72,6 +72,19 @@ test_expect_success 'clone with --single-branch' '
>  	)
>  '
>  
> +test_expect_success 'fetch --all with --recurse-submodules' '
> +	test_when_finished "rm -fr super_clone" &&
> +	git clone --recurse-submodules srv.bare super_clone &&
> +	(
> +		cd super_clone &&
> +		git config submodule.recurse true &&
> +		git config fetch.parallel 0 &&
> +		git fetch --all 2>../fetch-log
> +	) &&
> +	grep "Fetching sub" fetch-log >fetch-subs &&
> +	test_line_count = 1 fetch-subs
> +'
> +

The test looks good, but I think it belongs in
t/t5526-fetch-submodules.sh. I don't see anything else in
t5617-clone-submodules-remote.sh that calls "git fetch".

In addition, I think we need one more test that adds another remote. In
the above test, we only have one remote, so we hit your new optimization
and already pass the test without the need for "&& remote".

Whereas this test fails if we remove "&& remote".

  test_expect_success 'fetch --all with --recurse-submodules and more remotes' '
    test_when_finished "rm -fr super_clone" &&
    git clone --recurse-submodules srv.bare super_clone &&
    cp -r srv.bare srv.bare2 &&
    (
      cd super_clone &&
      git config submodule.recurse true &&
      git config fetch.parallel 0 &&
      git remote add other ../srv.bare2 &&
      git fetch --all 2>../fetch-log
    ) &&
    grep "Fetching sub" fetch-log >fetch-subs &&
    test_line_count = 2 fetch-subs
  '
Junio C Hamano May 16, 2022, 11:33 p.m. UTC | #2
Glen Choo <chooglen@google.com> writes:

> IMO "&& remote" is non-inuitive enough to warrant a comment, e.g.
>
>   /* do not update submodules if fetch_multiple() was called */
>   if (!result && remote && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {

As I am not so familiar with this codepath when submodules are
involved, I need to be explained why having called fetch_multiple()
means there is no need to update submodules".

	/* 
	 * This is only needed after fetch_one(), which does not
         * fetch submodules by itself.
	 *
	 * fetch_multiple() has already updated submodules to grab
	 * commits necessary for the fetched history from each remote,
	 * so there is no need to fetch submodules from here.
	 */

perhaps?

> The test looks good, but I think it belongs in
> t/t5526-fetch-submodules.sh. I don't see anything else in
> t5617-clone-submodules-remote.sh that calls "git fetch".

Good eyes.

> In addition, I think we need one more test that adds another remote. In
> the above test, we only have one remote, so we hit your new optimization
> and already pass the test without the need for "&& remote".

True, too.
diff mbox series

Patch

diff --git a/builtin/fetch.c b/builtin/fetch.c
index e3791f09ed..359321e731 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -2187,6 +2187,10 @@  int cmd_fetch(int argc, const char **argv, const char *prefix)
 		else if (argc > 1)
 			die(_("fetch --all does not make sense with refspecs"));
 		(void) for_each_remote(get_one_remote_for_fetch, &list);
+
+		/* no point doing fetch_multiple() of one */
+		if (list.nr == 1)
+			remote = remote_get(list.items[0].string);
 	} else if (argc == 0) {
 		/* No arguments -- use default remote */
 		remote = remote_get(NULL);
@@ -2261,7 +2265,7 @@  int cmd_fetch(int argc, const char **argv, const char *prefix)
 		result = fetch_multiple(&list, max_children);
 	}
 
-	if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
+	if (!result && remote && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
 		struct strvec options = STRVEC_INIT;
 		int max_children = max_jobs;
 
diff --git a/t/t5617-clone-submodules-remote.sh b/t/t5617-clone-submodules-remote.sh
index ca8f80083a..8b6b482a39 100755
--- a/t/t5617-clone-submodules-remote.sh
+++ b/t/t5617-clone-submodules-remote.sh
@@ -72,6 +72,19 @@  test_expect_success 'clone with --single-branch' '
 	)
 '
 
+test_expect_success 'fetch --all with --recurse-submodules' '
+	test_when_finished "rm -fr super_clone" &&
+	git clone --recurse-submodules srv.bare super_clone &&
+	(
+		cd super_clone &&
+		git config submodule.recurse true &&
+		git config fetch.parallel 0 &&
+		git fetch --all 2>../fetch-log
+	) &&
+	grep "Fetching sub" fetch-log >fetch-subs &&
+	test_line_count = 1 fetch-subs
+'
+
 # do basic partial clone from "srv.bare"
 # confirm partial clone was registered in the local config for super and sub.
 test_expect_success 'clone with --filter' '