diff mbox series

[v3,09/10] fetch: fetch unpopulated, changed submodules

Message ID 20220224100842.95827-10-chooglen@google.com (mailing list archive)
State Superseded
Headers show
Series fetch --recurse-submodules: fetch unpopulated submodules | expand

Commit Message

Glen Choo Feb. 24, 2022, 10:08 a.m. UTC
"git fetch --recurse-submodules" only considers populated
submodules (i.e. submodules that can be found by iterating the index),
which makes "git fetch" behave differently based on which commit is
checked out. As a result, even if the user has initialized all submodules
correctly, they may not fetch the necessary submodule commits, and
commands like "git checkout --recurse-submodules" might fail.

Teach "git fetch" to fetch cloned, changed submodules regardless of
whether they are populated. This is in addition to the current behavior
of fetching populated submodules (which is always attempted regardless
of what was fetched in the superproject, or even if nothing was fetched
in the superproject).

A submodule may be encountered multiple times (via the list of
populated submodules or via the list of changed submodules). When this
happens, "git fetch" only reads the 'populated copy' and ignores the
'changed copy'. Amend the verify_fetch_result() test helper so that we
can assert on which 'copy' is being read.

Signed-off-by: Glen Choo <chooglen@google.com>
---
In the process of writing the new tests [1], I noticed some failures of
the form:

  # rm the submodule's working tree directory.
  $ git rm submodule
  [...]

  # Do a fetch that requires running a child process from the submodule.
  $ git fetch --recurse-submodules same-name-1 
  [...]

  # Fatal error tells us that we cannot chdir to the deleted working
    tree.
  fatal: cannot chdir to '../../../submodule': No such file or directory

This happens because submodules set/unset a value for core.worktree when
they are checked out/"un-checked out" (see submodule_move_head() and
connect_work_tree_and_git_dir()), but "git rm" doesn't know that
core.worktree should be updated.

I've worked around this by passing "--work-tree=." to the child process
[2], but this feels like a hack, especially because this bug should
affect all child processes in a "git rm"-ed submodule (this probably
includes the "git branch" processes in gc/branch-recurse-submodules, but
I haven't confirmed it yet). Some more comprehensive solutions that
could be future work are:

- Teach "git [add|rm]" to unset core.worktree (the reverse operation,
  "git restore", should already do the correct thing). This won't detect
  submodules removed with "rm -r" though.
- Teach submodule child processes to ignore stale core.worktree values.
- Do more things in-core instead of using child processes (avoiding the
  failing chdir() call).

I'm not sure what future work we should pursue, or even if the
"--work-tree=." workaround is even good, so I'd appreciate feedback
here.

[1] There is a similar, preexisting test that also removes the
submodules. However, that test isn't affected because it invokes "git
checkout" after doing "git rm".
[2] Since the submodule has a git dir but no working tree, I also tried
working around the bug by passing "--bare". However, this doesn't work
because work tree settings override "bare-ness" settings, as described
by t/t1510-repo-setup.sh.

 Documentation/fetch-options.txt |  26 ++--
 Documentation/git-fetch.txt     |  10 +-
 builtin/fetch.c                 |  14 +-
 submodule.c                     | 125 +++++++++++++--
 submodule.h                     |  12 +-
 t/t5526-fetch-submodules.sh     | 260 +++++++++++++++++++++++++++++++-
 6 files changed, 404 insertions(+), 43 deletions(-)

Comments

Junio C Hamano Feb. 24, 2022, 9:30 p.m. UTC | #1
Glen Choo <chooglen@google.com> writes:

> +	char *path;
>  	/* The submodule commits that have changed in the rev walk. */
>  	struct oid_array new_commits;
>  };
> @@ -818,6 +828,7 @@ struct changed_submodule_data {
>  static void changed_submodule_data_clear(struct changed_submodule_data *cs_data)
>  {
>  	oid_array_clear(&cs_data->new_commits);
> +	free(cs_data->path);

OK.

>  }
>  
>  static void collect_changed_submodules_cb(struct diff_queue_struct *q,
> @@ -865,6 +876,8 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
>  		if (!item->util)
>  			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
>  		cs_data = item->util;
> +		cs_data->super_oid = commit_oid;
> +		cs_data->path = xstrdup(p->two->path);

Iffy.  If item->util were populated already, wouldn't cs_data
already have its .path member pointing at an allocated piece of
memory?  Can we safely free it before assigning a new value, or does
somebody else still have a copy of .path and we cannot free it?
Junio C Hamano Feb. 25, 2022, 12:33 a.m. UTC | #2
Glen Choo <chooglen@google.com> writes:


> +static struct fetch_task *
> +get_fetch_task_from_changed(struct submodule_parallel_fetch *spf,
> +			    struct strbuf *err)
> +{
> +	for (; spf->changed_count < spf->changed_submodule_names.nr;
> +	     spf->changed_count++) {
> +		struct string_list_item item =
> +			spf->changed_submodule_names.items[spf->changed_count];
> +		struct changed_submodule_data *cs_data = item.util;
> +		struct fetch_task *task;
> +
> +		if (!is_tree_submodule_active(spf->r, cs_data->super_oid,cs_data->path))
> +			continue;

Where does this function come from?  I seem to be getting compilation errors.

> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index ee4dd5a4a9..639290d30d 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -15,8 +15,9 @@ pwd=$(pwd)
>  
>  check_sub() {

Style.  

	check_sub () {

>  	NEW_HEAD=$1 &&
> +	SUPER_HEAD=$2 &&
>  	cat <<-EOF >$pwd/expect.err.sub

Style.

	cat <<-EOF >"$pwd/expect.err.sub"

You may swap the order of redirection (having <<here-doc at the end
of the line might look more familiar to some people).  Try to do as
majority of surrounding code does.

Make sure you quote the redirection target filename if it involves
variable interpolation (see Documentation/CodingGuidelines, look for
"Redirection").

> +	cat <<-EOF > expect.err.combined &&

Style.

	cat <<-EOF >expect.err.combined &&

No SP between redirection operator and its target.

> +	sed -E "s/[0-9a-f]+\.\./OLD_HEAD\.\./" actual.err >actual.err.cmp &&

No ERE in sed.  "[0-9a-f][0-9a-f]*" instead of "[0-9a-f]+" should be
sufficient, I think.
Jonathan Tan Feb. 25, 2022, 12:39 a.m. UTC | #3
Glen Choo <chooglen@google.com> writes:
> In the process of writing the new tests [1], I noticed some failures of
> the form:
> 
>   # rm the submodule's working tree directory.
>   $ git rm submodule
>   [...]
> 
>   # Do a fetch that requires running a child process from the submodule.
>   $ git fetch --recurse-submodules same-name-1 
>   [...]
> 
>   # Fatal error tells us that we cannot chdir to the deleted working
>     tree.
>   fatal: cannot chdir to '../../../submodule': No such file or directory
> 
> This happens because submodules set/unset a value for core.worktree when
> they are checked out/"un-checked out" (see submodule_move_head() and
> connect_work_tree_and_git_dir()), but "git rm" doesn't know that
> core.worktree should be updated.
> 
> I've worked around this by passing "--work-tree=." to the child process
> [2], but this feels like a hack, especially because this bug should
> affect all child processes in a "git rm"-ed submodule (this probably
> includes the "git branch" processes in gc/branch-recurse-submodules, but
> I haven't confirmed it yet). 

Ah...that's a tricky bug. Thanks for finding it.

> Some more comprehensive solutions that
> could be future work are:
> 
> - Teach "git [add|rm]" to unset core.worktree (the reverse operation,
>   "git restore", should already do the correct thing). This won't detect
>   submodules removed with "rm -r" though.

This might work with the caveat you mentioned.

> - Teach submodule child processes to ignore stale core.worktree values.

This might work, coupled with Emily Shaffer's work on teaching
submodules to know that they're submodules (so we know when a stale
core.worktree can be safely ignored).

[1] https://lore.kernel.org/git/20220203215914.683922-1-emilyshaffer@google.com/

> - Do more things in-core instead of using child processes (avoiding the
>   failing chdir() call).

This might not work if the invocation needs to check the worktree (for
example, as far as I know, we won't delete a branch if it's currently
checked out in a worktree).

> I'm not sure what future work we should pursue, or even if the
> "--work-tree=." workaround is even good, so I'd appreciate feedback
> here.

I can't think of better solutions than what you listed, unfortunately. I
also can't think of a better workaround, but at least it's narrowly
scoped: we know that we're running on a submodule and that the operation
is not affected by a worktree (for example, we're fetching, but we know
we're not fetching with a refspec that updates a currently checked out
branch). Let's see what others have to say.

> @@ -865,6 +876,8 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
>  		if (!item->util)
>  			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
>  		cs_data = item->util;
> +		cs_data->super_oid = commit_oid;
> +		cs_data->path = xstrdup(p->two->path);

Junio mentioned the possibility of cs_data->path already being non-NULL
[1].

[1] https://lore.kernel.org/git/xmqqy220j6kf.fsf@gitster.g/

> @@ -1253,14 +1266,33 @@ void check_for_new_submodule_commits(struct object_id *oid)
>  	oid_array_append(&ref_tips_after_fetch, oid);
>  }
>  
> +/*
> + * Returns 1 if there is at least one submodule gitdir in
> + * $GIT_DIR/modules and 0 otherwise. This follows
> + * submodule_name_to_gitdir(), which looks for submodules in
> + * $GIT_DIR/modules, not $GIT_COMMON_DIR.
> + *
> + * A submodule can be moved to $GIT_DIR/modules manually by running "git
> + * submodule absorbgitdirs", or it may be initialized there by "git
> + * submodule update".
> + */
> +static int repo_has_absorbed_submodules(struct repository *r)
> +{
> +	struct strbuf buf = STRBUF_INIT;
> +
> +	strbuf_repo_git_path(&buf, r, "modules/");
> +	return file_exists(buf.buf) && !is_empty_dir(buf.buf);
> +}

buf needs to be released?

> @@ -1333,7 +1365,8 @@ int submodule_touches_in_range(struct repository *r,
>  }
>  
>  struct submodule_parallel_fetch {
> -	int count;
> +	int index_count;
> +	int changed_count;
>  	struct strvec args;
>  	struct repository *r;
>  	const char *prefix;

If we're sticking with these names, probably worth a comment. E.g.
"index_count" is the number of submodules in <name of field that this is
an index of> that we have processed, and likewise for "changed_count".

> @@ -1343,6 +1376,7 @@ struct submodule_parallel_fetch {
>  	int result;
>  
>  	struct string_list changed_submodule_names;
> +	struct string_list seen_submodule_names;
>  
>  	/* Pending fetches by OIDs */
>  	struct fetch_task **oid_fetch_tasks;

Also here - changed is the list that we generated from walking the
fetched superproject commits, and seen is the list of submodules we've
processed in <name of function>.

> @@ -1539,11 +1582,64 @@ get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)

[snip]

> +		/*
> +		 * NEEDSWORK: A submodule unpopulated by "git rm" will
> +		 * have core.worktree set, but the actual core.worktree
> +		 * directory won't exist, causing the child process to
> +		 * fail. Forcibly set --work-tree until we get smarter
> +		 * handling for core.worktree in unpopulated submodules.
> +		 */
> +		strvec_push(&task->git_args, "--work-tree=.");
> +		return task;
> +	}
> +	return NULL;
> +}

If we end up sticking to this workaround (which sounds reasonable to
me), the comment here probably should contain a lot of what was written
under the "---" in the commit message.

> +test_expect_success "'--recurse-submodules=on-demand' should fetch submodule commits if the submodule is changed but the index has no submodules" '

[snip]

> +# In downstream, init "submodule2", but do not check it out while
> +# fetching. This lets us assert that unpopulated submodules can be
> +# fetched.
> +test_expect_success 'setup downstream branch with other submodule' '
> +	mkdir submodule2 &&
> +	(
> +		cd submodule2 &&
> +		git init &&
> +		echo sub2content >sub2file &&
> +		git add sub2file &&
> +		git commit -a -m new &&
> +		git branch -M sub2
> +	) &&
> +	git checkout -b super-sub2-only &&
> +	git submodule add "$pwd/submodule2" submodule2 &&
> +	git commit -m "add sub2" &&
> +	git checkout super &&
> +	(
> +		cd downstream &&
> +		git fetch --recurse-submodules origin &&
> +		git checkout super-sub2-only &&
> +		# Explicitly run "git submodule update" because sub2 is new
> +		# and has not been cloned.
> +		git submodule update --init &&
> +		git checkout --recurse-submodules super
> +	)
> +'

Hmm...what is the difference between this and the original case in which
the index has no submodules? Both assert that unpopulated submodules
(submodules that cannot be found by iterating the index, as described in
your commit message) can be fetched.

> +	# Use test_cmp manually because verify_fetch_result does not
> +	# consider submodule2. All the repos should be fetched, but only
> +	# submodule2 should be read from a commit
> +	cat <<-EOF > expect.err.combined &&
> +	From $pwd/.
> +	   OLD_HEAD..$super_head  super           -> origin/super
> +	   OLD_HEAD..$super_sub2_only_head  super-sub2-only -> origin/super-sub2-only
> +	Fetching submodule submodule
> +	From $pwd/submodule
> +	   OLD_HEAD..$sub_head  sub        -> origin/sub
> +	Fetching submodule submodule/subdir/deepsubmodule
> +	From $pwd/deepsubmodule
> +	   OLD_HEAD..$deep_head  deep       -> origin/deep
> +	Fetching submodule submodule2 at commit $super_sub2_only_head
> +	From $pwd/submodule2
> +	   OLD_HEAD..$sub2_head  sub2       -> origin/sub2
> +	EOF
> +	sed -E "s/[0-9a-f]+\.\./OLD_HEAD\.\./" actual.err >actual.err.cmp &&
> +	test_cmp expect.err.combined actual.err.cmp
> +'

Could verify_fetch_result be modified to consider the new submodule
instead?

> +test_expect_success 'fetch --recurse-submodules updates name-conflicted, populated submodule' '
> +	test_when_finished "git -C same-name-downstream checkout master" &&
> +	(
> +		cd same-name-1 &&
> +		test_commit -C submodule --no-tag b1 &&
> +		git add submodule &&
> +		git commit -m "super-b1"
> +	) &&
> +	(
> +		cd same-name-2 &&
> +		test_commit -C submodule --no-tag b2 &&
> +		git add submodule &&
> +		git commit -m "super-b2"
> +	) &&
> +	(
> +		cd same-name-downstream &&
> +		# even though the .gitmodules is correct, we cannot
> +		# fetch from same-name-2
> +		git checkout same-name-2/master &&
> +		git fetch --recurse-submodules same-name-1 &&
> +		test_must_fail git fetch --recurse-submodules same-name-2

What's the error message printed to the user here? (Just from reading
the code, I would have expected this to succeed, with the submodule
fetch being from same-name-1's submodule since we're fetching submodules
by name, but apparently that is not the case.)

> +	(
> +		cd same-name-downstream/.git/modules/submodule &&
> +		# The submodule has core.worktree pointing to the "git
> +		# rm"-ed directory, overwrite the invalid value.
> +		git --work-tree=. cat-file -e $head1 &&
> +		test_must_fail git --work-tree=. cat-file -e $head2
> +	)

Regarding the worktree workaround, also say "see comment in
get_fetch_task() for more information" or something like that.
Glen Choo Feb. 25, 2022, 3:04 a.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> +	char *path;
>>  	/* The submodule commits that have changed in the rev walk. */
>>  	struct oid_array new_commits;
>>  };
>> @@ -818,6 +828,7 @@ struct changed_submodule_data {
>>  static void changed_submodule_data_clear(struct changed_submodule_data *cs_data)
>>  {
>>  	oid_array_clear(&cs_data->new_commits);
>> +	free(cs_data->path);
>
> OK.
>
>>  }
>>  
>>  static void collect_changed_submodules_cb(struct diff_queue_struct *q,
>> @@ -865,6 +876,8 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
>>  		if (!item->util)
>>  			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
>>  		cs_data = item->util;
>> +		cs_data->super_oid = commit_oid;
>> +		cs_data->path = xstrdup(p->two->path);
>
> Iffy.  If item->util were populated already, wouldn't cs_data
> already have its .path member pointing at an allocated piece of
> memory?  Can we safely free it before assigning a new value, or does
> somebody else still have a copy of .path and we cannot free it?

Great catch! This is a silly mistake, it looks like this because I
copied the pattern that we used to _append_ new commit oids, but
.super_oid and .path aren't appended, they're replaced.

But we don't even need to replace .super_oid and .path, we can use the
first values we encounter and ignore subsequent ones.
Glen Choo Feb. 25, 2022, 3:07 a.m. UTC | #5
Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>
>> +static struct fetch_task *
>> +get_fetch_task_from_changed(struct submodule_parallel_fetch *spf,
>> +			    struct strbuf *err)
>> +{
>> +	for (; spf->changed_count < spf->changed_submodule_names.nr;
>> +	     spf->changed_count++) {
>> +		struct string_list_item item =
>> +			spf->changed_submodule_names.items[spf->changed_count];
>> +		struct changed_submodule_data *cs_data = item.util;
>> +		struct fetch_task *task;
>> +
>> +		if (!is_tree_submodule_active(spf->r, cs_data->super_oid,cs_data->path))
>> +			continue;
>
> Where does this function come from?  I seem to be getting compilation errors.

Sorry, this was introduced in gc/branch-recurse-submodules, but I
neglected to mention that I used that as the base in v1.

>> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
>> index ee4dd5a4a9..639290d30d 100755
>> --- a/t/t5526-fetch-submodules.sh
>> +++ b/t/t5526-fetch-submodules.sh
>> @@ -15,8 +15,9 @@ pwd=$(pwd)
>>  
>>  check_sub() {
>
> Style.  
>
> 	check_sub () {
>
>>  	NEW_HEAD=$1 &&
>> +	SUPER_HEAD=$2 &&
>>  	cat <<-EOF >$pwd/expect.err.sub
>
> Style.
>
> 	cat <<-EOF >"$pwd/expect.err.sub"
>
> You may swap the order of redirection (having <<here-doc at the end
> of the line might look more familiar to some people).  Try to do as
> majority of surrounding code does.
>
> Make sure you quote the redirection target filename if it involves
> variable interpolation (see Documentation/CodingGuidelines, look for
> "Redirection").
>
>> +	cat <<-EOF > expect.err.combined &&
>
> Style.
>
> 	cat <<-EOF >expect.err.combined &&
>
> No SP between redirection operator and its target.
>
>> +	sed -E "s/[0-9a-f]+\.\./OLD_HEAD\.\./" actual.err >actual.err.cmp &&
>
> No ERE in sed.  "[0-9a-f][0-9a-f]*" instead of "[0-9a-f]+" should be
> sufficient, I think.

Thanks :)
Glen Choo Feb. 25, 2022, 3:46 a.m. UTC | #6
Jonathan Tan <jonathantanmy@google.com> writes:

> Glen Choo <chooglen@google.com> writes:
>> I'm not sure what future work we should pursue, or even if the
>> "--work-tree=." workaround is even good, so I'd appreciate feedback
>> here.
>
> I can't think of better solutions than what you listed, unfortunately. I
> also can't think of a better workaround, but at least it's narrowly
> scoped: we know that we're running on a submodule and that the operation
> is not affected by a worktree (for example, we're fetching, but we know
> we're not fetching with a refspec that updates a currently checked out
> branch). Let's see what others have to say.

Thanks for sharing your thoughts :)

>> @@ -1253,14 +1266,33 @@ void check_for_new_submodule_commits(struct object_id *oid)
>>  	oid_array_append(&ref_tips_after_fetch, oid);
>>  }
>>  
>> +/*
>> + * Returns 1 if there is at least one submodule gitdir in
>> + * $GIT_DIR/modules and 0 otherwise. This follows
>> + * submodule_name_to_gitdir(), which looks for submodules in
>> + * $GIT_DIR/modules, not $GIT_COMMON_DIR.
>> + *
>> + * A submodule can be moved to $GIT_DIR/modules manually by running "git
>> + * submodule absorbgitdirs", or it may be initialized there by "git
>> + * submodule update".
>> + */
>> +static int repo_has_absorbed_submodules(struct repository *r)
>> +{
>> +	struct strbuf buf = STRBUF_INIT;
>> +
>> +	strbuf_repo_git_path(&buf, r, "modules/");
>> +	return file_exists(buf.buf) && !is_empty_dir(buf.buf);
>> +}
>
> buf needs to be released?

Ah, thanks.

>> @@ -1333,7 +1365,8 @@ int submodule_touches_in_range(struct repository *r,
>>  }
>>  
>>  struct submodule_parallel_fetch {
>> -	int count;
>> +	int index_count;
>> +	int changed_count;
>>  	struct strvec args;
>>  	struct repository *r;
>>  	const char *prefix;
>
> If we're sticking with these names, probably worth a comment. E.g.
> "index_count" is the number of submodules in <name of field that this is
> an index of> that we have processed, and likewise for "changed_count".
>
>> @@ -1343,6 +1376,7 @@ struct submodule_parallel_fetch {
>>  	int result;
>>  
>>  	struct string_list changed_submodule_names;
>> +	struct string_list seen_submodule_names;
>>  
>>  	/* Pending fetches by OIDs */
>>  	struct fetch_task **oid_fetch_tasks;
>
> Also here - changed is the list that we generated from walking the
> fetched superproject commits, and seen is the list of submodules we've
> processed in <name of function>.

Makes sense.

>> @@ -1539,11 +1582,64 @@ get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
>
> [snip]
>
>> +		/*
>> +		 * NEEDSWORK: A submodule unpopulated by "git rm" will
>> +		 * have core.worktree set, but the actual core.worktree
>> +		 * directory won't exist, causing the child process to
>> +		 * fail. Forcibly set --work-tree until we get smarter
>> +		 * handling for core.worktree in unpopulated submodules.
>> +		 */
>> +		strvec_push(&task->git_args, "--work-tree=.");
>> +		return task;
>> +	}
>> +	return NULL;
>> +}
>
> If we end up sticking to this workaround (which sounds reasonable to
> me), the comment here probably should contain a lot of what was written
> under the "---" in the commit message.

I assume this includes documenting solutions (like your NEEDSWORK
comment on submodule_name_to_gitdir()) and why core.worktree isn't
usually a problem (because checkout et al do the right thing).

>> +test_expect_success "'--recurse-submodules=on-demand' should fetch submodule commits if the submodule is changed but the index has no submodules" '
>
> [snip]
>
>> +# In downstream, init "submodule2", but do not check it out while
>> +# fetching. This lets us assert that unpopulated submodules can be
>> +# fetched.
>> +test_expect_success 'setup downstream branch with other submodule' '
>> +	mkdir submodule2 &&
>> +	(
>> +		cd submodule2 &&
>> +		git init &&
>> +		echo sub2content >sub2file &&
>> +		git add sub2file &&
>> +		git commit -a -m new &&
>> +		git branch -M sub2
>> +	) &&
>> +	git checkout -b super-sub2-only &&
>> +	git submodule add "$pwd/submodule2" submodule2 &&
>> +	git commit -m "add sub2" &&
>> +	git checkout super &&
>> +	(
>> +		cd downstream &&
>> +		git fetch --recurse-submodules origin &&
>> +		git checkout super-sub2-only &&
>> +		# Explicitly run "git submodule update" because sub2 is new
>> +		# and has not been cloned.
>> +		git submodule update --init &&
>> +		git checkout --recurse-submodules super
>> +	)
>> +'
>
> Hmm...what is the difference between this and the original case in which
> the index has no submodules? Both assert that unpopulated submodules
> (submodules that cannot be found by iterating the index, as described in
> your commit message) can be fetched.

In the previous test, the index has no submodules (it's completely empty
in fact, so we don't iterate the index at all), but in this test, it
does. This lets us check that there aren't any buggy interactions when
both changed and index submodules are present.

I think such mistakes are pretty easy to introduce on accident - I made
one pre-v1 where I reused .count between both iterators (instead
of having .index_count and .changed_count). It passed the previous test
because we didn't care about the index, but it obviously wouldn't pass
this one.

>> +	# Use test_cmp manually because verify_fetch_result does not
>> +	# consider submodule2. All the repos should be fetched, but only
>> +	# submodule2 should be read from a commit
>> +	cat <<-EOF > expect.err.combined &&
>> +	From $pwd/.
>> +	   OLD_HEAD..$super_head  super           -> origin/super
>> +	   OLD_HEAD..$super_sub2_only_head  super-sub2-only -> origin/super-sub2-only
>> +	Fetching submodule submodule
>> +	From $pwd/submodule
>> +	   OLD_HEAD..$sub_head  sub        -> origin/sub
>> +	Fetching submodule submodule/subdir/deepsubmodule
>> +	From $pwd/deepsubmodule
>> +	   OLD_HEAD..$deep_head  deep       -> origin/deep
>> +	Fetching submodule submodule2 at commit $super_sub2_only_head
>> +	From $pwd/submodule2
>> +	   OLD_HEAD..$sub2_head  sub2       -> origin/sub2
>> +	EOF
>> +	sed -E "s/[0-9a-f]+\.\./OLD_HEAD\.\./" actual.err >actual.err.cmp &&
>> +	test_cmp expect.err.combined actual.err.cmp
>> +'
>
> Could verify_fetch_result be modified to consider the new submodule
> instead?

Since submodule2 is on the end of the file, I could modify
verify_fetch_result() to concatenate extra text on the end. But if it
weren't in the middle, we'd need to insert arbitrary text in the middle
of the file.

I can't think of a good way to do this without compromising test
readability, so I'll just do concatenation for now.

>> +test_expect_success 'fetch --recurse-submodules updates name-conflicted, populated submodule' '
>> +	test_when_finished "git -C same-name-downstream checkout master" &&
>> +	(
>> +		cd same-name-1 &&
>> +		test_commit -C submodule --no-tag b1 &&
>> +		git add submodule &&
>> +		git commit -m "super-b1"
>> +	) &&
>> +	(
>> +		cd same-name-2 &&
>> +		test_commit -C submodule --no-tag b2 &&
>> +		git add submodule &&
>> +		git commit -m "super-b2"
>> +	) &&
>> +	(
>> +		cd same-name-downstream &&
>> +		# even though the .gitmodules is correct, we cannot
>> +		# fetch from same-name-2
>> +		git checkout same-name-2/master &&
>> +		git fetch --recurse-submodules same-name-1 &&
>> +		test_must_fail git fetch --recurse-submodules same-name-2
>
> What's the error message printed to the user here? (Just from reading
> the code, I would have expected this to succeed, with the submodule
> fetch being from same-name-1's submodule since we're fetching submodules
> by name, but apparently that is not the case.)

Yeah, I think this might trip up some readers. The message is:

  From ../same-name-2
    b7ebb59..944b5ac  master     -> same-name-2/master
  Fetching submodule submodule
  fatal: git upload-pack: not our ref 7ff6874077503acb9d0a52e280aaed9748276319
  fatal: remote error: upload-pack: not our ref 7ff6874077503acb9d0a52e280aaed9748276319
  Errors during submodule fetch:
          submodule

Which, I believe, comes from how we fetch commits by oid:

  static int get_next_submodule(struct child_process *cp, struct strbuf *err,
              void *data, void **task_cb)
  [...]
    oid_array_for_each_unique(task->commits,
          append_oid_to_argv, &cp->args);

When the following is true:

- the submodule is found in the index
- we are fetching submodules unconditionally (--recurse-submodules=yes")
- no superproject commit "changes" the submodule

task->commits is empty, and we just fetch the from the submodule's
remote by name. But as long as any superproject commit "changes" the
submodule, we try to fetch by oid, which, as this test demonstrates, may
fail.

>
>> +	(
>> +		cd same-name-downstream/.git/modules/submodule &&
>> +		# The submodule has core.worktree pointing to the "git
>> +		# rm"-ed directory, overwrite the invalid value.
>> +		git --work-tree=. cat-file -e $head1 &&
>> +		test_must_fail git --work-tree=. cat-file -e $head2
>> +	)
>
> Regarding the worktree workaround, also say "see comment in
> get_fetch_task() for more information" or something like that.

Makes sense.
Junio C Hamano Feb. 26, 2022, 6:53 p.m. UTC | #7
A few tests added by this patch have been failing on one specific
job (linux-gcc ubuntu-latest) at GitHub CI.

https://github.com/git/git/runs/5341052811?check_suite_focus=true#step:5:3968
https://github.com/git/git/runs/5343133021?check_suite_focus=true#step:4:5520

    Side note: you may need to be logged in to GitHub to view them.
    These two use different versions of CI to show the test traces;
    in the latter you may have to click on right-facing rectangle on
    the line with label "5520" to see the breakage.

I think there is some baked-in assumption in the failing test what
the name of the initial branch by default is, which may be the reason
why this particular job fails while others don't.

Can you take a look at it?

Thanks.
Johannes Schindelin March 1, 2022, 8:24 p.m. UTC | #8
Hi,

On Sat, 26 Feb 2022, Junio C Hamano wrote:

> A few tests added by this patch have been failing on one specific
> job (linux-gcc ubuntu-latest) at GitHub CI.
>
> https://github.com/git/git/runs/5341052811?check_suite_focus=true#step:5:3968
> https://github.com/git/git/runs/5343133021?check_suite_focus=true#step:4:5520
>
>     Side note: you may need to be logged in to GitHub to view them.
>     These two use different versions of CI to show the test traces;
>     in the latter you may have to click on right-facing rectangle on
>     the line with label "5520" to see the breakage.
>
> I think there is some baked-in assumption in the failing test what
> the name of the initial branch by default is, which may be the reason
> why this particular job fails while others don't.
>
> Can you take a look at it?

The log says this:

-- snip --
[...]
  + git commit -m super-b2
  [main 00b85ba] super-b2
   Author: A U Thor <author@example.com>
   1 file changed, 1 insertion(+), 1 deletion(-)
  + cd same-name-downstream
  + git checkout same-name-2/master
  error: pathspec 'same-name-2/master' did not match any file(s) known to git
  error: last command exited with $?=1
  + git -C same-name-downstream checkout master
  error: pathspec 'master' did not match any file(s) known to git
  + eval_ret=1
  + :
  not ok 49 - fetch --recurse-submodules updates name-conflicted, populated submodule

  #
  #		test_when_finished "git -C same-name-downstream checkout master" &&
  #		(
  #			cd same-name-1 &&
  #			test_commit -C submodule --no-tag b1 &&
  #			git add submodule &&
  #			git commit -m "super-b1"
  #		) &&
  #		(
  #			cd same-name-2 &&
  #			test_commit -C submodule --no-tag b2 &&
  #			git add submodule &&
  #			git commit -m "super-b2"
  #		) &&
  #		(
  #			cd same-name-downstream &&
  #			# even though the .gitmodules is correct, we
  #			cannot
  #			# fetch from same-name-2
  #			git checkout same-name-2/master &&
  #			git fetch --recurse-submodules same-name-1 &&
  #			test_must_fail git fetch --recurse-submodules
  #			same-name-2
  #		) &&
  #		super_head1=$(git -C same-name-1 rev-parse HEAD) &&
  #		git -C same-name-downstream cat-file -e $super_head1 &&
  #
  #		super_head2=$(git -C same-name-2 rev-parse HEAD) &&
  #		git -C same-name-downstream cat-file -e $super_head2 &&
  #
  #		sub_head1=$(git -C same-name-1/submodule rev-parse HEAD)
  #		&&
  #		git -C same-name-downstream/submodule cat-file -e
  #		$sub_head1 &&
  #
  #		sub_head2=$(git -C same-name-2/submodule rev-parse HEAD)
  #		&&
  #		test_must_fail git -C same-name-downstream/submodule
  #		cat-file -e $sub_head2
  #
-- snap --

So yes, there is a lot of `master`ing going on.

I _think_ the remedy will be to use the `-b <branch-name>` option of `git
init` in
https://github.com/git/git/blob/82dd0cbc7fcf2985a3dcfbd99caa9f80626b00df/t/t5526-fetch-submodules.sh#L1015
in
https://github.com/git/git/blob/82dd0cbc7fcf2985a3dcfbd99caa9f80626b00df/t/t5526-fetch-submodules.sh#L1024
and in
https://github.com/git/git/blob/82dd0cbc7fcf2985a3dcfbd99caa9f80626b00df/t/t5526-fetch-submodules.sh#L1033
e.g.

	git -C submodule init -b main

At least that's how _I_ tried to address similar issues in the test suite
in the past.

Ciao,
Dscho
Junio C Hamano March 1, 2022, 8:32 p.m. UTC | #9
Junio C Hamano <gitster@pobox.com> writes:

> A few tests added by this patch have been failing on one specific
> job (linux-gcc ubuntu-latest) at GitHub CI.
>
> https://github.com/git/git/runs/5341052811?check_suite_focus=true#step:5:3968
> https://github.com/git/git/runs/5343133021?check_suite_focus=true#step:4:5520
>
>     Side note: you may need to be logged in to GitHub to view them.
>     These two use different versions of CI to show the test traces;
>     in the latter you may have to click on right-facing rectangle on
>     the line with label "5520" to see the breakage.
>
> I think there is some baked-in assumption in the failing test what
> the name of the initial branch by default is, which may be the reason
> why this particular job fails while others don't.
>
> Can you take a look at it?
>
> Thanks.

In case you haven't noticed, this is what I have near the tip of the
topic to fix it.

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index a395d2b979..9415a1e7c0 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -3,6 +3,8 @@
 
 test_description='Recursive "git fetch" for submodules'
 
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB=1
 export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB
Junio C Hamano March 1, 2022, 8:33 p.m. UTC | #10
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi,
>
> On Sat, 26 Feb 2022, Junio C Hamano wrote:
>
>> A few tests added by this patch have been failing on one specific
>> job (linux-gcc ubuntu-latest) at GitHub CI.
>>
>> https://github.com/git/git/runs/5341052811?check_suite_focus=true#step:5:3968
>> https://github.com/git/git/runs/5343133021?check_suite_focus=true#step:4:5520
>>
>>     Side note: you may need to be logged in to GitHub to view them.
>>     These two use different versions of CI to show the test traces;
>>     in the latter you may have to click on right-facing rectangle on
>>     the line with label "5520" to see the breakage.
>>
>> I think there is some baked-in assumption in the failing test what
>> the name of the initial branch by default is, which may be the reason
>> why this particular job fails while others don't.
>>
>> Can you take a look at it?
>
> The log says this:
> ...
> At least that's how _I_ tried to address similar issues in the test suite
> in the past.

Yes, I had a squashable fix/workaround queued since last night.
Glen Choo March 2, 2022, 11:25 p.m. UTC | #11
Junio C Hamano <gitster@pobox.com> writes:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>> Hi,
>>
>> On Sat, 26 Feb 2022, Junio C Hamano wrote:
>>
>>> A few tests added by this patch have been failing on one specific
>>> job (linux-gcc ubuntu-latest) at GitHub CI.
>>>
>>> https://github.com/git/git/runs/5341052811?check_suite_focus=true#step:5:3968
>>> https://github.com/git/git/runs/5343133021?check_suite_focus=true#step:4:5520
>>>
>>>     Side note: you may need to be logged in to GitHub to view them.
>>>     These two use different versions of CI to show the test traces;
>>>     in the latter you may have to click on right-facing rectangle on
>>>     the line with label "5520" to see the breakage.
>>>
>>> I think there is some baked-in assumption in the failing test what
>>> the name of the initial branch by default is, which may be the reason
>>> why this particular job fails while others don't.
>>>
>>> Can you take a look at it?
>>
>> The log says this:
>> ...
>> At least that's how _I_ tried to address similar issues in the test suite
>> in the past.
>
> Yes, I had a squashable fix/workaround queued since last night.

Thanks, both! I especially appreciate the pointers because I couldn't
remember how to set the default branch off the top of my head.
Jonathan Tan March 4, 2022, 11:46 p.m. UTC | #12
Glen Choo <chooglen@google.com> writes:
> >> +	# Use test_cmp manually because verify_fetch_result does not
> >> +	# consider submodule2. All the repos should be fetched, but only
> >> +	# submodule2 should be read from a commit
> >> +	cat <<-EOF > expect.err.combined &&
> >> +	From $pwd/.
> >> +	   OLD_HEAD..$super_head  super           -> origin/super
> >> +	   OLD_HEAD..$super_sub2_only_head  super-sub2-only -> origin/super-sub2-only
> >> +	Fetching submodule submodule
> >> +	From $pwd/submodule
> >> +	   OLD_HEAD..$sub_head  sub        -> origin/sub
> >> +	Fetching submodule submodule/subdir/deepsubmodule
> >> +	From $pwd/deepsubmodule
> >> +	   OLD_HEAD..$deep_head  deep       -> origin/deep
> >> +	Fetching submodule submodule2 at commit $super_sub2_only_head
> >> +	From $pwd/submodule2
> >> +	   OLD_HEAD..$sub2_head  sub2       -> origin/sub2
> >> +	EOF
> >> +	sed -E "s/[0-9a-f]+\.\./OLD_HEAD\.\./" actual.err >actual.err.cmp &&
> >> +	test_cmp expect.err.combined actual.err.cmp
> >> +'
> >
> > Could verify_fetch_result be modified to consider the new submodule
> > instead?
> 
> Since submodule2 is on the end of the file, I could modify
> verify_fetch_result() to concatenate extra text on the end. But if it
> weren't in the middle, we'd need to insert arbitrary text in the middle
> of the file.
> 
> I can't think of a good way to do this without compromising test
> readability, so I'll just do concatenation for now.

Looking at it, I think you can do it by adding a section that verifies
the "Fetching submodule submodule2" part if the file is present (so, no
change in behavior in the rest of the tests since they don't write this
file) and also modifying check_super to allow specification of the sub2
part (or making a new function for this).

> > What's the error message printed to the user here? (Just from reading
> > the code, I would have expected this to succeed, with the submodule
> > fetch being from same-name-1's submodule since we're fetching submodules
> > by name, but apparently that is not the case.)
> 
> Yeah, I think this might trip up some readers. The message is:
> 
>   From ../same-name-2
>     b7ebb59..944b5ac  master     -> same-name-2/master
>   Fetching submodule submodule
>   fatal: git upload-pack: not our ref 7ff6874077503acb9d0a52e280aaed9748276319
>   fatal: remote error: upload-pack: not our ref 7ff6874077503acb9d0a52e280aaed9748276319
>   Errors during submodule fetch:
>           submodule
> 
> Which, I believe, comes from how we fetch commits by oid:
> 
>   static int get_next_submodule(struct child_process *cp, struct strbuf *err,
>               void *data, void **task_cb)
>   [...]
>     oid_array_for_each_unique(task->commits,
>           append_oid_to_argv, &cp->args);
> 
> When the following is true:
> 
> - the submodule is found in the index
> - we are fetching submodules unconditionally (--recurse-submodules=yes")
> - no superproject commit "changes" the submodule
> 
> task->commits is empty, and we just fetch the from the submodule's
> remote by name. But as long as any superproject commit "changes" the
> submodule, we try to fetch by oid, which, as this test demonstrates, may
> fail.

Ah, so we try to fetch an OID from a submodule given by a fetched
commit, which is different from the submodule the client already has
locally. This might be a sign that we need to store more information
about the submodule so that we can print a clearer message. I haven't
looked into this deeply, but this might be possible by putting more
information in the util of changed_submodule_names, and when we have
already seen that submodule, to add more information to the util instead
of skipping it.
Jonathan Tan March 4, 2022, 11:53 p.m. UTC | #13
Glen Choo <chooglen@google.com> writes:
> >> +# In downstream, init "submodule2", but do not check it out while
> >> +# fetching. This lets us assert that unpopulated submodules can be
> >> +# fetched.
> >> +test_expect_success 'setup downstream branch with other submodule' '
> >> +	mkdir submodule2 &&
> >> +	(
> >> +		cd submodule2 &&
> >> +		git init &&
> >> +		echo sub2content >sub2file &&
> >> +		git add sub2file &&
> >> +		git commit -a -m new &&
> >> +		git branch -M sub2
> >> +	) &&
> >> +	git checkout -b super-sub2-only &&
> >> +	git submodule add "$pwd/submodule2" submodule2 &&
> >> +	git commit -m "add sub2" &&
> >> +	git checkout super &&
> >> +	(
> >> +		cd downstream &&
> >> +		git fetch --recurse-submodules origin &&
> >> +		git checkout super-sub2-only &&
> >> +		# Explicitly run "git submodule update" because sub2 is new
> >> +		# and has not been cloned.
> >> +		git submodule update --init &&
> >> +		git checkout --recurse-submodules super
> >> +	)
> >> +'
> >
> > Hmm...what is the difference between this and the original case in which
> > the index has no submodules? Both assert that unpopulated submodules
> > (submodules that cannot be found by iterating the index, as described in
> > your commit message) can be fetched.
> 
> In the previous test, the index has no submodules (it's completely empty
> in fact, so we don't iterate the index at all), but in this test, it
> does. This lets us check that there aren't any buggy interactions when
> both changed and index submodules are present.
> 
> I think such mistakes are pretty easy to introduce on accident - I made
> one pre-v1 where I reused .count between both iterators (instead
> of having .index_count and .changed_count). It passed the previous test
> because we didn't care about the index, but it obviously wouldn't pass
> this one.

In that case, describe this difference (one has no submodules in index,
one has other submodules in index) and maybe position this so that both
test cases (the no-submodule-in-index one and the
other-submodule-in-index one) are next to each other.
Glen Choo March 5, 2022, 12:22 a.m. UTC | #14
Jonathan Tan <jonathantanmy@google.com> writes:

> Glen Choo <chooglen@google.com> writes:
>> > What's the error message printed to the user here? (Just from reading
>> > the code, I would have expected this to succeed, with the submodule
>> > fetch being from same-name-1's submodule since we're fetching submodules
>> > by name, but apparently that is not the case.)
>> 
>> Yeah, I think this might trip up some readers. The message is:
>> 
>>   From ../same-name-2
>>     b7ebb59..944b5ac  master     -> same-name-2/master
>>   Fetching submodule submodule
>>   fatal: git upload-pack: not our ref 7ff6874077503acb9d0a52e280aaed9748276319
>>   fatal: remote error: upload-pack: not our ref 7ff6874077503acb9d0a52e280aaed9748276319
>>   Errors during submodule fetch:
>>           submodule
>> 
>> Which, I believe, comes from how we fetch commits by oid:
>> 
>>   static int get_next_submodule(struct child_process *cp, struct strbuf *err,
>>               void *data, void **task_cb)
>>   [...]
>>     oid_array_for_each_unique(task->commits,
>>           append_oid_to_argv, &cp->args);
>> 
>> When the following is true:
>> 
>> - the submodule is found in the index
>> - we are fetching submodules unconditionally (--recurse-submodules=yes")
>> - no superproject commit "changes" the submodule
>> 
>> task->commits is empty, and we just fetch the from the submodule's
>> remote by name. But as long as any superproject commit "changes" the
>> submodule, we try to fetch by oid, which, as this test demonstrates, may
>> fail.
>
> Ah, so we try to fetch an OID from a submodule given by a fetched
> commit, which is different from the submodule the client already has
> locally. This might be a sign that we need to store more information
> about the submodule so that we can print a clearer message. I haven't
> looked into this deeply, but this might be possible by putting more
> information in the util of changed_submodule_names, and when we have
> already seen that submodule, to add more information to the util instead
> of skipping it.

Storing the submodule URL might achieve this purpose, but if the URL
doesn't match, I think we'd want to skip the commit instead of trying to
fetch a commit from an unrelated URL. I don't know if this is a good
idea though yet - I haven't looked deeply into what Git uses the URL for
and whether users might want to change the URL even though the submodule
is the 'same' (e.g. pointing the URL to another remote instead of having
two completely different repos with the same submodule name).
diff mbox series

Patch

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index e967ff1874..38dad13683 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -185,15 +185,23 @@  endif::git-pull[]
 ifndef::git-pull[]
 --recurse-submodules[=yes|on-demand|no]::
 	This option controls if and under what conditions new commits of
-	populated submodules should be fetched too. It can be used as a
-	boolean option to completely disable recursion when set to 'no' or to
-	unconditionally recurse into all populated submodules when set to
-	'yes', which is the default when this option is used without any
-	value. Use 'on-demand' to only recurse into a populated submodule
-	when the superproject retrieves a commit that updates the submodule's
-	reference to a commit that isn't already in the local submodule
-	clone. By default, 'on-demand' is used, unless
-	`fetch.recurseSubmodules` is set (see linkgit:git-config[1]).
+	submodules should be fetched too. When recursing through submodules,
+	`git fetch` always attempts to fetch "changed" submodules, that is, a
+	submodule that has commits that are referenced by a newly fetched
+	superproject commit but are missing in the local submodule clone. A
+	changed submodule can be fetched as long as it is present locally e.g.
+	in `$GIT_DIR/modules/` (see linkgit:gitsubmodules[7]); if the upstream
+	adds a new submodule, that submodule cannot be fetched until it is
+	cloned e.g. by `git submodule update`.
++
+When set to 'on-demand', only changed submodules are fetched. When set
+to 'yes', all populated submodules are fetched and submodules that are
+both unpopulated and changed are fetched. When set to 'no', submodules
+are never fetched.
++
+When unspecified, this uses the value of `fetch.recurseSubmodules` if it
+is set (see linkgit:git-config[1]), defaulting to 'on-demand' if unset.
+When this option is used without any value, it defaults to 'yes'.
 endif::git-pull[]
 
 -j::
diff --git a/Documentation/git-fetch.txt b/Documentation/git-fetch.txt
index 550c16ca61..e9d364669a 100644
--- a/Documentation/git-fetch.txt
+++ b/Documentation/git-fetch.txt
@@ -287,12 +287,10 @@  include::transfer-data-leaks.txt[]
 
 BUGS
 ----
-Using --recurse-submodules can only fetch new commits in already checked
-out submodules right now. When e.g. upstream added a new submodule in the
-just fetched commits of the superproject the submodule itself cannot be
-fetched, making it impossible to check out that submodule later without
-having to do a fetch again. This is expected to be fixed in a future Git
-version.
+Using --recurse-submodules can only fetch new commits in submodules that are
+present locally e.g. in `$GIT_DIR/modules/`. If the upstream adds a new
+submodule, that submodule cannot be fetched until it is cloned e.g. by `git
+submodule update`. This is expected to be fixed in a future Git version.
 
 SEE ALSO
 --------
diff --git a/builtin/fetch.c b/builtin/fetch.c
index f7abbc31ff..faaf89f637 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -2122,13 +2122,13 @@  int cmd_fetch(int argc, const char **argv, const char *prefix)
 			max_children = fetch_parallel_config;
 
 		add_options_to_argv(&options);
-		result = fetch_populated_submodules(the_repository,
-						    &options,
-						    submodule_prefix,
-						    recurse_submodules,
-						    recurse_submodules_default,
-						    verbosity < 0,
-						    max_children);
+		result = fetch_submodules(the_repository,
+					  &options,
+					  submodule_prefix,
+					  recurse_submodules,
+					  recurse_submodules_default,
+					  verbosity < 0,
+					  max_children);
 		strvec_clear(&options);
 	}
 
diff --git a/submodule.c b/submodule.c
index 03af223aba..d60f877b1f 100644
--- a/submodule.c
+++ b/submodule.c
@@ -811,6 +811,16 @@  static const char *default_name_or_path(const char *path_or_name)
  * member of the changed submodule string_list_item.
  */
 struct changed_submodule_data {
+	/*
+	 * The first superproject commit in the rev walk that points to the
+	 * submodule.
+	 */
+	const struct object_id *super_oid;
+	/*
+	 * Path to the submodule in the superproject commit referenced
+	 * by 'super_oid'.
+	 */
+	char *path;
 	/* The submodule commits that have changed in the rev walk. */
 	struct oid_array new_commits;
 };
@@ -818,6 +828,7 @@  struct changed_submodule_data {
 static void changed_submodule_data_clear(struct changed_submodule_data *cs_data)
 {
 	oid_array_clear(&cs_data->new_commits);
+	free(cs_data->path);
 }
 
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
@@ -865,6 +876,8 @@  static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 		if (!item->util)
 			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
 		cs_data = item->util;
+		cs_data->super_oid = commit_oid;
+		cs_data->path = xstrdup(p->two->path);
 		oid_array_append(&cs_data->new_commits, &p->two->oid);
 	}
 }
@@ -1253,14 +1266,33 @@  void check_for_new_submodule_commits(struct object_id *oid)
 	oid_array_append(&ref_tips_after_fetch, oid);
 }
 
+/*
+ * Returns 1 if there is at least one submodule gitdir in
+ * $GIT_DIR/modules and 0 otherwise. This follows
+ * submodule_name_to_gitdir(), which looks for submodules in
+ * $GIT_DIR/modules, not $GIT_COMMON_DIR.
+ *
+ * A submodule can be moved to $GIT_DIR/modules manually by running "git
+ * submodule absorbgitdirs", or it may be initialized there by "git
+ * submodule update".
+ */
+static int repo_has_absorbed_submodules(struct repository *r)
+{
+	struct strbuf buf = STRBUF_INIT;
+
+	strbuf_repo_git_path(&buf, r, "modules/");
+	return file_exists(buf.buf) && !is_empty_dir(buf.buf);
+}
+
 static void calculate_changed_submodule_paths(struct repository *r,
 		struct string_list *changed_submodule_names)
 {
 	struct strvec argv = STRVEC_INIT;
 	struct string_list_item *name;
 
-	/* No need to check if there are no submodules configured */
-	if (!submodule_from_path(r, NULL, NULL))
+	/* No need to check if no submodules would be fetched */
+	if (!submodule_from_path(r, NULL, NULL) &&
+	    !repo_has_absorbed_submodules(r))
 		return;
 
 	strvec_push(&argv, "--"); /* argv[0] program name */
@@ -1333,7 +1365,8 @@  int submodule_touches_in_range(struct repository *r,
 }
 
 struct submodule_parallel_fetch {
-	int count;
+	int index_count;
+	int changed_count;
 	struct strvec args;
 	struct repository *r;
 	const char *prefix;
@@ -1343,6 +1376,7 @@  struct submodule_parallel_fetch {
 	int result;
 
 	struct string_list changed_submodule_names;
+	struct string_list seen_submodule_names;
 
 	/* Pending fetches by OIDs */
 	struct fetch_task **oid_fetch_tasks;
@@ -1353,6 +1387,7 @@  struct submodule_parallel_fetch {
 #define SPF_INIT { \
 	.args = STRVEC_INIT, \
 	.changed_submodule_names = STRING_LIST_INIT_DUP, \
+	.seen_submodule_names = STRING_LIST_INIT_DUP, \
 	.submodules_with_errors = STRBUF_INIT, \
 }
 
@@ -1390,6 +1425,7 @@  struct fetch_task {
 	const struct submodule *sub;
 	unsigned free_sub : 1; /* Do we need to free the submodule? */
 	const char *default_argv;
+	struct strvec git_args;
 
 	struct oid_array *commits; /* Ensure these commits are fetched */
 };
@@ -1425,6 +1461,8 @@  static void fetch_task_release(struct fetch_task *p)
 	if (p->repo)
 		repo_clear(p->repo);
 	FREE_AND_NULL(p->repo);
+
+	strvec_clear(&p->git_args);
 }
 
 static struct repository *get_submodule_repo_for(struct repository *r,
@@ -1463,6 +1501,9 @@  static struct fetch_task *fetch_task_create(struct submodule_parallel_fetch *spf
 		task->free_sub = 1;
 	}
 
+	if (string_list_lookup(&spf->seen_submodule_names, task->sub->name))
+		goto cleanup;
+
 	switch (get_fetch_recurse_config(task->sub, spf))
 	{
 	default:
@@ -1493,10 +1534,12 @@  static struct fetch_task *fetch_task_create(struct submodule_parallel_fetch *spf
 }
 
 static struct fetch_task *
-get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
+get_fetch_task_from_index(struct submodule_parallel_fetch *spf,
+			  struct strbuf *err)
 {
-	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
-		const struct cache_entry *ce = spf->r->index->cache[spf->count];
+	for (; spf->index_count < spf->r->index->cache_nr; spf->index_count++) {
+		const struct cache_entry *ce =
+			spf->r->index->cache[spf->index_count];
 		struct fetch_task *task;
 
 		if (!S_ISGITLINK(ce->ce_mode))
@@ -1511,7 +1554,7 @@  get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
 				strbuf_addf(err, _("Fetching submodule %s%s\n"),
 					    spf->prefix, ce->name);
 
-			spf->count++;
+			spf->index_count++;
 			return task;
 		} else {
 			struct strbuf empty_submodule_path = STRBUF_INIT;
@@ -1539,11 +1582,64 @@  get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
 	return NULL;
 }
 
+static struct fetch_task *
+get_fetch_task_from_changed(struct submodule_parallel_fetch *spf,
+			    struct strbuf *err)
+{
+	for (; spf->changed_count < spf->changed_submodule_names.nr;
+	     spf->changed_count++) {
+		struct string_list_item item =
+			spf->changed_submodule_names.items[spf->changed_count];
+		struct changed_submodule_data *cs_data = item.util;
+		struct fetch_task *task;
+
+		if (!is_tree_submodule_active(spf->r, cs_data->super_oid,cs_data->path))
+			continue;
+
+		task = fetch_task_create(spf, cs_data->path,
+					 cs_data->super_oid);
+		if (!task)
+			continue;
+
+		if (!task->repo) {
+			strbuf_addf(err, _("Could not access submodule '%s' at commit %s\n"),
+				    cs_data->path,
+				    find_unique_abbrev(cs_data->super_oid, DEFAULT_ABBREV));
+
+			fetch_task_release(task);
+			free(task);
+			continue;
+		}
+
+		if (!spf->quiet)
+			strbuf_addf(err,
+				    _("Fetching submodule %s%s at commit %s\n"),
+				    spf->prefix, task->sub->path,
+				    find_unique_abbrev(cs_data->super_oid,
+						       DEFAULT_ABBREV));
+
+		spf->changed_count++;
+		/*
+		 * NEEDSWORK: A submodule unpopulated by "git rm" will
+		 * have core.worktree set, but the actual core.worktree
+		 * directory won't exist, causing the child process to
+		 * fail. Forcibly set --work-tree until we get smarter
+		 * handling for core.worktree in unpopulated submodules.
+		 */
+		strvec_push(&task->git_args, "--work-tree=.");
+		return task;
+	}
+	return NULL;
+}
+
 static int get_next_submodule(struct child_process *cp, struct strbuf *err,
 			      void *data, void **task_cb)
 {
 	struct submodule_parallel_fetch *spf = data;
-	struct fetch_task *task = get_fetch_task(spf, err);
+	struct fetch_task *task =
+		get_fetch_task_from_index(spf, err);
+	if (!task)
+		task = get_fetch_task_from_changed(spf, err);
 
 	if (task) {
 		struct strbuf submodule_prefix = STRBUF_INIT;
@@ -1553,6 +1649,8 @@  static int get_next_submodule(struct child_process *cp, struct strbuf *err,
 		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
 		cp->git_cmd = 1;
 		strvec_init(&cp->args);
+		if (task->git_args.nr)
+			strvec_pushv(&cp->args, task->git_args.v);
 		strvec_pushv(&cp->args, spf->args.v);
 		strvec_push(&cp->args, task->default_argv);
 		strvec_push(&cp->args, "--submodule-prefix");
@@ -1564,6 +1662,7 @@  static int get_next_submodule(struct child_process *cp, struct strbuf *err,
 		*task_cb = task;
 
 		strbuf_release(&submodule_prefix);
+		string_list_insert(&spf->seen_submodule_names, task->sub->name);
 		return 1;
 	}
 
@@ -1678,11 +1777,11 @@  static int fetch_finish(int retvalue, struct strbuf *err,
 	return 0;
 }
 
-int fetch_populated_submodules(struct repository *r,
-			       const struct strvec *options,
-			       const char *prefix, int command_line_option,
-			       int default_option,
-			       int quiet, int max_parallel_jobs)
+int fetch_submodules(struct repository *r,
+		     const struct strvec *options,
+		     const char *prefix, int command_line_option,
+		     int default_option,
+		     int quiet, int max_parallel_jobs)
 {
 	int i;
 	struct submodule_parallel_fetch spf = SPF_INIT;
diff --git a/submodule.h b/submodule.h
index 784ceffc0e..61bebde319 100644
--- a/submodule.h
+++ b/submodule.h
@@ -88,12 +88,12 @@  int should_update_submodules(void);
  */
 const struct submodule *submodule_from_ce(const struct cache_entry *ce);
 void check_for_new_submodule_commits(struct object_id *oid);
-int fetch_populated_submodules(struct repository *r,
-			       const struct strvec *options,
-			       const char *prefix,
-			       int command_line_option,
-			       int default_option,
-			       int quiet, int max_parallel_jobs);
+int fetch_submodules(struct repository *r,
+		     const struct strvec *options,
+		     const char *prefix,
+		     int command_line_option,
+		     int default_option,
+		     int quiet, int max_parallel_jobs);
 unsigned is_submodule_modified(const char *path, int ignore_untracked);
 int submodule_uses_gitfile(const char *path);
 
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index ee4dd5a4a9..639290d30d 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -15,8 +15,9 @@  pwd=$(pwd)
 
 check_sub() {
 	NEW_HEAD=$1 &&
+	SUPER_HEAD=$2 &&
 	cat <<-EOF >$pwd/expect.err.sub
-	Fetching submodule submodule
+	Fetching submodule submodule${SUPER_HEAD:+ at commit $SUPER_HEAD}
 	From $pwd/submodule
 	   OLD_HEAD..$NEW_HEAD  sub        -> origin/sub
 	EOF
@@ -24,8 +25,9 @@  check_sub() {
 
 check_deep() {
 	NEW_HEAD=$1 &&
+	SUB_HEAD=$2 &&
 	cat <<-EOF >$pwd/expect.err.deep
-	Fetching submodule submodule/subdir/deepsubmodule
+	Fetching submodule submodule/subdir/deepsubmodule${SUB_HEAD:+ at commit $SUB_HEAD}
 	From $pwd/deepsubmodule
 	   OLD_HEAD..$NEW_HEAD  deep       -> origin/deep
 	EOF
@@ -418,6 +420,155 @@  test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 	verify_fetch_result actual.err
 '
 
+# Test that we can fetch submodules in other branches by running fetch
+# in a commit that has no submodules.
+test_expect_success 'setup downstream branch without submodules' '
+	(
+		cd downstream &&
+		git checkout --recurse-submodules -b no-submodules &&
+		git rm .gitmodules &&
+		git rm submodule &&
+		git commit -m "no submodules" &&
+		git checkout --recurse-submodules super
+	)
+'
+
+test_expect_success "'--recurse-submodules=on-demand' should fetch submodule commits if the submodule is changed but the index has no submodules" '
+	add_submodule_commits &&
+	add_superproject_commits &&
+	# Fetch the new superproject commit
+	(
+		cd downstream &&
+		git switch --recurse-submodules no-submodules &&
+		git fetch --recurse-submodules=on-demand >../actual.out 2>../actual.err
+	) &&
+	super_head=$(git rev-parse --short HEAD) &&
+	sub_head=$(git -C submodule rev-parse --short HEAD) &&
+	deep_head=$(git -C submodule/subdir/deepsubmodule rev-parse --short HEAD) &&
+
+	# assert that these are fetched from commits, not the index
+	check_sub $sub_head $super_head &&
+	check_deep $deep_head $sub_head &&
+
+	test_must_be_empty actual.out &&
+	verify_fetch_result actual.err
+'
+
+test_expect_success "'--recurse-submodules' should fetch submodule commits if the submodule is changed but the index has no submodules" '
+	add_submodule_commits &&
+	add_superproject_commits &&
+	# Fetch the new superproject commit
+	(
+		cd downstream &&
+		git switch --recurse-submodules no-submodules &&
+		git fetch --recurse-submodules >../actual.out 2>../actual.err
+	) &&
+	super_head=$(git rev-parse --short HEAD) &&
+	sub_head=$(git -C submodule rev-parse --short HEAD) &&
+	deep_head=$(git -C submodule/subdir/deepsubmodule rev-parse --short HEAD) &&
+
+	# assert that these are fetched from commits, not the index
+	check_sub $sub_head $super_head &&
+	check_deep $deep_head $sub_head &&
+
+	test_must_be_empty actual.out &&
+	verify_fetch_result actual.err
+'
+
+test_expect_success "'--recurse-submodules' should ignore changed, inactive submodules" '
+	add_submodule_commits &&
+	add_superproject_commits &&
+
+	# Fetch the new superproject commit
+	(
+		cd downstream &&
+		git switch --recurse-submodules no-submodules &&
+		git -c submodule.submodule.active=false fetch --recurse-submodules >../actual.out 2>../actual.err
+	) &&
+	test_must_be_empty actual.out &&
+	super_head=$(git rev-parse --short HEAD) &&
+	check_super $super_head &&
+	# Neither should be fetched because the submodule is inactive
+	rm expect.err.sub &&
+	rm expect.err.deep &&
+	verify_fetch_result actual.err
+'
+
+# In downstream, init "submodule2", but do not check it out while
+# fetching. This lets us assert that unpopulated submodules can be
+# fetched.
+test_expect_success 'setup downstream branch with other submodule' '
+	mkdir submodule2 &&
+	(
+		cd submodule2 &&
+		git init &&
+		echo sub2content >sub2file &&
+		git add sub2file &&
+		git commit -a -m new &&
+		git branch -M sub2
+	) &&
+	git checkout -b super-sub2-only &&
+	git submodule add "$pwd/submodule2" submodule2 &&
+	git commit -m "add sub2" &&
+	git checkout super &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules origin &&
+		git checkout super-sub2-only &&
+		# Explicitly run "git submodule update" because sub2 is new
+		# and has not been cloned.
+		git submodule update --init &&
+		git checkout --recurse-submodules super
+	)
+'
+
+test_expect_success "'--recurse-submodules' should fetch submodule commits in changed submodules and the index" '
+	# Create new commit in origin/super
+	add_submodule_commits &&
+	add_superproject_commits &&
+
+	# Create new commit in origin/super-sub2-only
+	git checkout super-sub2-only &&
+	(
+		cd submodule2 &&
+		test_commit --no-tag foo
+	) &&
+	git add submodule2 &&
+	git commit -m "new submodule2" &&
+
+	git checkout super &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules >../actual.out 2>../actual.err
+	) &&
+	test_must_be_empty actual.out &&
+	deep_head=$(git -C submodule/subdir/deepsubmodule rev-parse --short HEAD) &&
+	sub_head=$(git -C submodule rev-parse --short HEAD) &&
+	sub2_head=$(git -C submodule2 rev-parse --short HEAD) &&
+	super_head=$(git rev-parse --short HEAD) &&
+	super_sub2_only_head=$(git rev-parse --short super-sub2-only) &&
+
+	# Use test_cmp manually because verify_fetch_result does not
+	# consider submodule2. All the repos should be fetched, but only
+	# submodule2 should be read from a commit
+	cat <<-EOF > expect.err.combined &&
+	From $pwd/.
+	   OLD_HEAD..$super_head  super           -> origin/super
+	   OLD_HEAD..$super_sub2_only_head  super-sub2-only -> origin/super-sub2-only
+	Fetching submodule submodule
+	From $pwd/submodule
+	   OLD_HEAD..$sub_head  sub        -> origin/sub
+	Fetching submodule submodule/subdir/deepsubmodule
+	From $pwd/deepsubmodule
+	   OLD_HEAD..$deep_head  deep       -> origin/deep
+	Fetching submodule submodule2 at commit $super_sub2_only_head
+	From $pwd/submodule2
+	   OLD_HEAD..$sub2_head  sub2       -> origin/sub2
+	EOF
+	sed -E "s/[0-9a-f]+\.\./OLD_HEAD\.\./" actual.err >actual.err.cmp &&
+	test_cmp expect.err.combined actual.err.cmp
+'
+
 test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
 	add_submodule_commits &&
 	echo a >> file &&
@@ -860,4 +1011,109 @@  test_expect_success 'recursive fetch after deinit a submodule' '
 	test_cmp expect actual
 '
 
+test_expect_success 'setup repo with upstreams that share a submodule name' '
+	mkdir same-name-1 &&
+	(
+		cd same-name-1 &&
+		git init &&
+		test_commit --no-tag a
+	) &&
+	git clone same-name-1 same-name-2 &&
+	# same-name-1 and same-name-2 both add a submodule with the
+	# name "submodule"
+	(
+		cd same-name-1 &&
+		mkdir submodule &&
+		git -C submodule init &&
+		test_commit -C submodule --no-tag a1 &&
+		git submodule add "$pwd/same-name-1/submodule" &&
+		git add submodule &&
+		git commit -m "super-a1"
+	) &&
+	(
+		cd same-name-2 &&
+		mkdir submodule &&
+		git -C submodule init &&
+		test_commit -C submodule --no-tag a2 &&
+		git submodule add "$pwd/same-name-2/submodule" &&
+		git add submodule &&
+		git commit -m "super-a2"
+	) &&
+	git clone same-name-1 -o same-name-1 same-name-downstream &&
+	(
+		cd same-name-downstream &&
+		git remote add same-name-2 ../same-name-2 &&
+		git fetch --all &&
+		# init downstream with same-name-1
+		git submodule update --init
+	)
+'
+
+test_expect_success 'fetch --recurse-submodules updates name-conflicted, populated submodule' '
+	test_when_finished "git -C same-name-downstream checkout master" &&
+	(
+		cd same-name-1 &&
+		test_commit -C submodule --no-tag b1 &&
+		git add submodule &&
+		git commit -m "super-b1"
+	) &&
+	(
+		cd same-name-2 &&
+		test_commit -C submodule --no-tag b2 &&
+		git add submodule &&
+		git commit -m "super-b2"
+	) &&
+	(
+		cd same-name-downstream &&
+		# even though the .gitmodules is correct, we cannot
+		# fetch from same-name-2
+		git checkout same-name-2/master &&
+		git fetch --recurse-submodules same-name-1 &&
+		test_must_fail git fetch --recurse-submodules same-name-2
+	) &&
+	super_head1=$(git -C same-name-1 rev-parse HEAD) &&
+	git -C same-name-downstream cat-file -e $super_head1 &&
+
+	super_head2=$(git -C same-name-2 rev-parse HEAD) &&
+	git -C same-name-downstream cat-file -e $super_head2 &&
+
+	sub_head1=$(git -C same-name-1/submodule rev-parse HEAD) &&
+	git -C same-name-downstream/submodule cat-file -e $sub_head1 &&
+
+	sub_head2=$(git -C same-name-2/submodule rev-parse HEAD) &&
+	test_must_fail git -C same-name-downstream/submodule cat-file -e $sub_head2
+'
+
+test_expect_success 'fetch --recurse-submodules updates name-conflicted, unpopulated submodule' '
+	(
+		cd same-name-1 &&
+		test_commit -C submodule --no-tag c1 &&
+		git add submodule &&
+		git commit -m "super-c1"
+	) &&
+	(
+		cd same-name-2 &&
+		test_commit -C submodule --no-tag c2 &&
+		git add submodule &&
+		git commit -m "super-c2"
+	) &&
+	(
+		cd same-name-downstream &&
+		git checkout master &&
+		git rm .gitmodules &&
+		git rm submodule &&
+		git commit -m "no submodules" &&
+		git fetch --recurse-submodules same-name-1
+	) &&
+	head1=$(git -C same-name-1/submodule rev-parse HEAD) &&
+	head2=$(git -C same-name-2/submodule rev-parse HEAD) &&
+	(
+		cd same-name-downstream/.git/modules/submodule &&
+		# The submodule has core.worktree pointing to the "git
+		# rm"-ed directory, overwrite the invalid value.
+		git --work-tree=. cat-file -e $head1 &&
+		test_must_fail git --work-tree=. cat-file -e $head2
+	)
+'
+
 test_done