diff mbox series

[8/9] fetch: retry fetching submodules if needed objects were not fetched

Message ID 20181016181327.107186-9-sbeller@google.com (mailing list archive)
State New, archived
Headers show
Series Resending sb/submodule-recursive-fetch-gets-the-tip | expand

Commit Message

Stefan Beller Oct. 16, 2018, 6:13 p.m. UTC
Currently when git-fetch is asked to recurse into submodules, it dispatches
a plain "git-fetch -C <submodule-dir>" (with some submodule related options
such as prefix and recusing strategy, but) without any information of the
remote or the tip that should be fetched.

This works surprisingly well in some workflows (such as using submodules
as a third party library), while not so well in other scenarios, such
as in a Gerrit topic-based workflow, that can tie together changes
(potentially across repositories) on the server side. One of the parts
of such a Gerrit workflow is to download a change when wanting to examine
it, and you'd want to have its submodule changes that are in the same
topic downloaded as well. However these submodule changes reside in their
own repository in their own ref (refs/changes/<int>).

Retry fetching a submodule by object name if the object id that the
superproject points to, cannot be found.

This retrying does not happen when the "git fetch" done at the
superproject is not storing the fetched results in remote
tracking branches (i.e. instead just recording them to
FETCH_HEAD) in this step. A later patch will fix this.

builtin/fetch used to only inspect submodules when they were fetched
"on-demand", as in either on/off case it was clear whether the submodule
needs to be fetched. However to know whether we need to try fetching the
object ids, we need to identify the object names, which is done in this
function check_for_new_submodule_commits(), so we'll also run that code
in case the submodule recursion is set to "on".

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/fetch.c             |   9 +-
 submodule.c                 | 185 ++++++++++++++++++++++++++++++------
 t/t5526-fetch-submodules.sh |  16 ++++
 3 files changed, 177 insertions(+), 33 deletions(-)

Comments

Jonathan Tan Oct. 18, 2018, 12:39 a.m. UTC | #1
> Currently when git-fetch is asked to recurse into submodules, it dispatches
> a plain "git-fetch -C <submodule-dir>" (with some submodule related options
> such as prefix and recusing strategy, but) without any information of the
> remote or the tip that should be fetched.
> 
> This works surprisingly well in some workflows (such as using submodules
> as a third party library), while not so well in other scenarios, such
> as in a Gerrit topic-based workflow, that can tie together changes
> (potentially across repositories) on the server side. One of the parts
> of such a Gerrit workflow is to download a change when wanting to examine
> it, and you'd want to have its submodule changes that are in the same
> topic downloaded as well. However these submodule changes reside in their
> own repository in their own ref (refs/changes/<int>).

It seems that the pertinent situation is when, in the superproject, you
fetch a commit (which may be the target of a ref, or an ancestor of the
target of a ref) that points to a submodule commit that was not fetched
by the default refspec-less fetch that you describe in the first
paragraph. I would just describe it as follows:

  But this default fetch is not sufficient, as a newly fetched commit in
  the superproject could point to a commit in the submodule that is not
  in the default refspec. This is common in workflows like Gerrit's.
  When fetching a Gerrit change under review (from refs/changes/??), the
  commits in that change likely point to submodule commits that have not
  been merged to a branch yet.

Another thing you need to clarify is what happens if the fetch-by-commit
fails. Right now, it seems that it will make the whole thing fail, which
might be a surprising change in behavior.

> Retry fetching a submodule by object name if the object id that the
> superproject points to, cannot be found.

I don't really think of this as a retry - the first time, you're
fetching the default refspec, and the second time, with a specific list
of object IDs. Also, be consistent in the usage of "object name" and
"object id" - as far as I know, they are the same thing.

> This retrying does not happen when the "git fetch" done at the
> superproject is not storing the fetched results in remote
> tracking branches (i.e. instead just recording them to
> FETCH_HEAD) in this step. A later patch will fix this.

The test stores the result in a normal branch, not a remote tracking
branch. Is storing in a normal branch required?

Also, do you know why this is required? A naive reading of the patch
leads me to believe that this should work even if merely fetching to
FETCH_HEAD.

> +struct get_next_submodule_task {
> +	struct repository *repo;
> +	const struct submodule *sub;
> +	unsigned free_sub : 1; /* Do we need to free the submodule? */
> +	struct oid_array *commits;
> +};

Firstly, I don't think "commits" needs to be a pointer.

Document at least "commits". As far as I can tell, if NULL (or empty if
you take my suggestion), this means that this task is the first pass for
this particular submodule and the fetch needs to be done using the
default refspec. Otherwise, this task is the second pass for this
particular submodule and the fetch needs to be done passing the
contained OIDs.

> +static const struct submodule *get_default_submodule(const char *path)
> +{
> +	struct submodule *ret = NULL;
> +	const char *name = default_name_or_path(path);
> +
> +	if (!name)
> +		return NULL;
> +
> +	ret = xmalloc(sizeof(*ret));
> +	memset(ret, 0, sizeof(*ret));
> +	ret->path = name;
> +	ret->name = name;
> +
> +	return (const struct submodule *) ret;
> +}

What is a "default" submodule and why would you need one?

> +		task = get_next_submodule_task_create(spf->r, ce->name);
> +
> +		if (!task->sub) {
> +			free(task);
> +			continue;
>  		}

Will task->sub ever be NULL?

> +	if (spf->retry_nr) {
> +		struct get_next_submodule_task *task = spf->retry[spf->retry_nr - 1];
> +		struct strbuf submodule_prefix = STRBUF_INIT;
> +		spf->retry_nr--;
> +
> +		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, task->sub->path);
> +
> +		child_process_init(cp);
> +		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
> +		cp->git_cmd = 1;
> +		cp->dir = task->repo->gitdir;
> +
> +		argv_array_init(&cp->args);
> +		argv_array_pushv(&cp->args, spf->args.argv);
> +		argv_array_push(&cp->args, "on-demand");

This means that we need to trust that the last entry in spf->args can
take an "on-demand" parameter. Could we supply that argument here
explicitly instead?

> +		argv_array_push(&cp->args, "--submodule-prefix");
> +		argv_array_push(&cp->args, submodule_prefix.buf);
> +
> +		/* NEEDSWORK: have get_default_remote from s--h */

What is s--h?

> +static int commit_exists_in_sub(const struct object_id *oid, void *data)
> +{
> +	struct repository *subrepo = data;
> +
> +	enum object_type type = oid_object_info(subrepo, oid, NULL);
> +
> +	return type != OBJ_COMMIT;
> +}

Shouldn't the function name be commit_missing_in_sub?

>  static int fetch_finish(int retvalue, struct strbuf *err,
>  			void *cb, void *task_cb)
>  {
>  	struct submodule_parallel_fetch *spf = cb;
> +	struct get_next_submodule_task *task = task_cb;
> +	const struct submodule *sub;
> +
> +	struct string_list_item *it;
> +	struct oid_array *commits;
>  
>  	if (retvalue)
>  		spf->result = 1;
>  
> +	if (!task)
> +		return 0;

When will task be NULL?

> +
> +	sub = task->sub;
> +	if (!sub)
> +		goto out;

Same as above - when will sub be NULL?

> +	it = string_list_lookup(&spf->changed_submodule_names, sub->name);
> +	if (!it)
> +		goto out;

And "it" as well.

> +	commits = it->util;
> +	oid_array_filter(commits,
> +			 commit_exists_in_sub,
> +			 task->repo);
> +
> +	/* Are there commits that do not exist? */
> +	if (commits->nr) {
> +		/* We already tried fetching them, do not try again. */
> +		if (task->commits)
> +			return 0;

Clearer and more efficient if the check for task->commits was first
before all the filtering.

> +test_expect_success "fetch new commits on-demand when they are not reachable" '

"not reachable" confused me - they are indeed reachable, just not from
the default refspec.

> +	git checkout --detach &&
> +	C=$(git -C submodule commit-tree -m "new change outside refs/heads" HEAD^{tree}) &&
> +	git -C submodule update-ref refs/changes/1 $C &&
> +	git update-index --cacheinfo 160000 $C submodule &&
> +	git commit -m "updated submodule outside of refs/heads" &&
> +	D=$(git rev-parse HEAD) &&
> +	git update-ref refs/changes/2 $D &&
> +	(
> +		cd downstream &&
> +		git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
> +		git -C submodule cat-file -t $C &&
> +		git checkout --recurse-submodules FETCH_HEAD
> +	)
> +'

For maximum test coverage, I think this test should involve 2
submodules:

 B   C  E   F
  \ /    \ /
   A      D

and the upstream superproject having:

  G -> H -> I

The downstream superproject already has G, and is fetching I. In H, the
submodule gitlinks point to B and E respectively, and in I, the
submodule gitlinks point to C and F respectively. This ensures that both
multiple submodules work, and that submodule commits are also fetched
for interior superproject commits.
Stefan Beller Oct. 23, 2018, 10:37 p.m. UTC | #2
On Wed, Oct 17, 2018 at 5:40 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > Currently when git-fetch is asked to recurse into submodules, it dispatches
> > a plain "git-fetch -C <submodule-dir>" (with some submodule related options
> > such as prefix and recusing strategy, but) without any information of the
> > remote or the tip that should be fetched.
> >
> > This works surprisingly well in some workflows (such as using submodules
> > as a third party library), while not so well in other scenarios, such
> > as in a Gerrit topic-based workflow, that can tie together changes
> > (potentially across repositories) on the server side. One of the parts
> > of such a Gerrit workflow is to download a change when wanting to examine
> > it, and you'd want to have its submodule changes that are in the same
> > topic downloaded as well. However these submodule changes reside in their
> > own repository in their own ref (refs/changes/<int>).
>
> It seems that the pertinent situation is when, in the superproject, you
> fetch a commit (which may be the target of a ref, or an ancestor of the
> target of a ref) that points to a submodule commit that was not fetched
> by the default refspec-less fetch that you describe in the first
> paragraph. I would just describe it as follows:
>
>   But this default fetch is not sufficient, as a newly fetched commit in
>   the superproject could point to a commit in the submodule that is not
>   in the default refspec. This is common in workflows like Gerrit's.
>   When fetching a Gerrit change under review (from refs/changes/??), the
>   commits in that change likely point to submodule commits that have not
>   been merged to a branch yet.

Makes sense.

> Another thing you need to clarify is what happens if the fetch-by-commit
> fails. Right now, it seems that it will make the whole thing fail, which
> might be a surprising change in behavior.

But a positive surprise, I would assume?

> The test stores the result in a normal branch, not a remote tracking
> branch. Is storing in a normal branch required?

In the test we fetch from another repository, such that in the
repository-under-test this will show up in a remote tracking branch?

> Also, do you know why this is required? A naive reading of the patch
> leads me to believe that this should work even if merely fetching to
> FETCH_HEAD.

See the next patch, check_for_new_submodule_commits() is missing
for FETCH_HEAD.

>
> > +struct get_next_submodule_task {
> > +     struct repository *repo;
> > +     const struct submodule *sub;
> > +     unsigned free_sub : 1; /* Do we need to free the submodule? */
> > +     struct oid_array *commits;
> > +};
>
> Firstly, I don't think "commits" needs to be a pointer.
>
> Document at least "commits". As far as I can tell, if NULL (or empty if
> you take my suggestion), this means that this task is the first pass for
> this particular submodule and the fetch needs to be done using the
> default refspec. Otherwise, this task is the second pass for this
> particular submodule and the fetch needs to be done passing the
> contained OIDs.

Makes sense. I think I'll make it a non-pointer, but will introduce another
flag or counter for the phase.

>
> > +static const struct submodule *get_default_submodule(const char *path)
> > +{
> > +     struct submodule *ret = NULL;
> > +     const char *name = default_name_or_path(path);
> > +
> > +     if (!name)
> > +             return NULL;
> > +
> > +     ret = xmalloc(sizeof(*ret));
> > +     memset(ret, 0, sizeof(*ret));
> > +     ret->path = name;
> > +     ret->name = name;
> > +
> > +     return (const struct submodule *) ret;
> > +}
>
> What is a "default" submodule and why would you need one?

s/default/artificial/. Such a submodule is a submodule that has no
config in the .gitmodules file and its name == path.
We need to keep those around for historic reasons AFAICT, c.f.
c68f837576 (implement fetching of moved submodules, 2017-10-16)

> > +             task = get_next_submodule_task_create(spf->r, ce->name);
> > +
> > +             if (!task->sub) {
> > +                     free(task);
> > +                     continue;
> >               }
>
> Will task->sub ever be NULL?

Yes, if we fall back to these "default" submodule and just try if it
can be handled
as a submodule, but it cannot be handled as such,
get_next_submodule_task_create has

    task->sub = submodule_from_path(r, &null_oid, path);
    if (!task->sub) {
        task->sub = get_default_submodule(path);

and get_default_submodule can return NULL.


>
> > +     if (spf->retry_nr) {
> > +             struct get_next_submodule_task *task = spf->retry[spf->retry_nr - 1];
> > +             struct strbuf submodule_prefix = STRBUF_INIT;
> > +             spf->retry_nr--;
> > +
> > +             strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, task->sub->path);
> > +
> > +             child_process_init(cp);
> > +             prepare_submodule_repo_env_in_gitdir(&cp->env_array);
> > +             cp->git_cmd = 1;
> > +             cp->dir = task->repo->gitdir;
> > +
> > +             argv_array_init(&cp->args);
> > +             argv_array_pushv(&cp->args, spf->args.argv);
> > +             argv_array_push(&cp->args, "on-demand");
>
> This means that we need to trust that the last entry in spf->args can
> take an "on-demand" parameter. Could we supply that argument here
> explicitly instead?
>
> > +             argv_array_push(&cp->args, "--submodule-prefix");
> > +             argv_array_push(&cp->args, submodule_prefix.buf);
> > +
> > +             /* NEEDSWORK: have get_default_remote from s--h */
>
> What is s--h?

builtin/submodule--helper, will clarify

>
> > +static int commit_exists_in_sub(const struct object_id *oid, void *data)
> > +{
> > +     struct repository *subrepo = data;
> > +
> > +     enum object_type type = oid_object_info(subrepo, oid, NULL);
> > +
> > +     return type != OBJ_COMMIT;
> > +}
>
> Shouldn't the function name be commit_missing_in_sub?

yes.

>
> >  static int fetch_finish(int retvalue, struct strbuf *err,
> >                       void *cb, void *task_cb)
> >  {
> >       struct submodule_parallel_fetch *spf = cb;
> > +     struct get_next_submodule_task *task = task_cb;
> > +     const struct submodule *sub;
> > +
> > +     struct string_list_item *it;
> > +     struct oid_array *commits;
> >
> >       if (retvalue)
> >               spf->result = 1;
> >
> > +     if (!task)
> > +             return 0;
>
> When will task be NULL?
>
> > +
> > +     sub = task->sub;
> > +     if (!sub)
> > +             goto out;
>
> Same as above - when will sub be NULL?
>
> > +     it = string_list_lookup(&spf->changed_submodule_names, sub->name);
> > +     if (!it)
> > +             goto out;
>
> And "it" as well.

I'll add comments.

>
> > +     commits = it->util;
> > +     oid_array_filter(commits,
> > +                      commit_exists_in_sub,
> > +                      task->repo);
> > +
> > +     /* Are there commits that do not exist? */
> > +     if (commits->nr) {
> > +             /* We already tried fetching them, do not try again. */
> > +             if (task->commits)
> > +                     return 0;
>
> Clearer and more efficient if the check for task->commits was first
> before all the filtering.
>
> > +test_expect_success "fetch new commits on-demand when they are not reachable" '
>
> "not reachable" confused me - they are indeed reachable, just not from
> the default refspec.

Makes sense

>
> > +     git checkout --detach &&
> > +     C=$(git -C submodule commit-tree -m "new change outside refs/heads" HEAD^{tree}) &&
> > +     git -C submodule update-ref refs/changes/1 $C &&
> > +     git update-index --cacheinfo 160000 $C submodule &&
> > +     git commit -m "updated submodule outside of refs/heads" &&
> > +     D=$(git rev-parse HEAD) &&
> > +     git update-ref refs/changes/2 $D &&
> > +     (
> > +             cd downstream &&
> > +             git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
> > +             git -C submodule cat-file -t $C &&
> > +             git checkout --recurse-submodules FETCH_HEAD
> > +     )
> > +'
>
> For maximum test coverage, I think this test should involve 2
> submodules:
>
>  B   C  E   F
>   \ /    \ /
>    A      D
>
> and the upstream superproject having:
>
>   G -> H -> I
>
> The downstream superproject already has G, and is fetching I. In H, the
> submodule gitlinks point to B and E respectively, and in I, the
> submodule gitlinks point to C and F respectively. This ensures that both
> multiple submodules work, and that submodule commits are also fetched
> for interior superproject commits.

ok.
Jonathan Tan Oct. 23, 2018, 11:37 p.m. UTC | #3
> > Another thing you need to clarify is what happens if the fetch-by-commit
> > fails. Right now, it seems that it will make the whole thing fail, which
> > might be a surprising change in behavior.
> 
> But a positive surprise, I would assume?

Whether positive or negative, I think that this needs to be mentioned in
the commit message.

As for positive or negative, I tend to agree that it's positive - sure,
some previously successful fetches would now fail, but the results of
those fetches could not be recursively checked out anyway.

> > The test stores the result in a normal branch, not a remote tracking
> > branch. Is storing in a normal branch required?
> 
> In the test we fetch from another repository, such that in the
> repository-under-test this will show up in a remote tracking branch?

If that were true, I would expect that when this line:

> git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&

is replaced by this line:

> git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2 &&

then things would still work. The tests pass with the first line (after
I fixed a type mismatch) but not with the second. (Also I don't think a
remote-tracking branch is generated here - the output printed doesn't
indicate so, and refs/changes/2 is not a branch anyway.)

> > Also, do you know why this is required? A naive reading of the patch
> > leads me to believe that this should work even if merely fetching to
> > FETCH_HEAD.
> 
> See the next patch, check_for_new_submodule_commits() is missing
> for FETCH_HEAD.

I see in the next patch that there is an "if" branch in
store_updated_refs() without update_local_ref() in which
"check_for_new_submodule_commits(&rm->old_oid)" needs to be inserted. I
think this is a symptom that maybe check_for_new_submodule_commits()
needs to be extracted from update_local_ref() and put into
store_updated_refs() instead? In update_local_ref(), it is called on
ref->new_oid, which is actually the same as rm->old_oid anyway (there is
an oidcpy earlier).

> > > +static const struct submodule *get_default_submodule(const char *path)
> > > +{
> > > +     struct submodule *ret = NULL;
> > > +     const char *name = default_name_or_path(path);
> > > +
> > > +     if (!name)
> > > +             return NULL;
> > > +
> > > +     ret = xmalloc(sizeof(*ret));
> > > +     memset(ret, 0, sizeof(*ret));
> > > +     ret->path = name;
> > > +     ret->name = name;
> > > +
> > > +     return (const struct submodule *) ret;
> > > +}
> >
> > What is a "default" submodule and why would you need one?
> 
> s/default/artificial/. Such a submodule is a submodule that has no
> config in the .gitmodules file and its name == path.
> We need to keep those around for historic reasons AFAICT, c.f.
> c68f837576 (implement fetching of moved submodules, 2017-10-16)

Ah, OK. I would call it a fake submodule then, and copy over the "No
entry in .gitmodules?" comment.

> > Will task->sub ever be NULL?
> 
> Yes, if we fall back to these "default" submodule and just try if it
> can be handled
> as a submodule, but it cannot be handled as such,
> get_next_submodule_task_create has
> 
>     task->sub = submodule_from_path(r, &null_oid, path);
>     if (!task->sub) {
>         task->sub = get_default_submodule(path);
> 
> and get_default_submodule can return NULL.

Ah, yes you're right.
Stefan Beller Oct. 25, 2018, 9:42 p.m. UTC | #4
On Tue, Oct 23, 2018 at 4:38 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > > Another thing you need to clarify is what happens if the fetch-by-commit
> > > fails. Right now, it seems that it will make the whole thing fail, which
> > > might be a surprising change in behavior.
> >
> > But a positive surprise, I would assume?
>
> Whether positive or negative, I think that this needs to be mentioned in
> the commit message.
>
> As for positive or negative, I tend to agree that it's positive - sure,
> some previously successful fetches would now fail, but the results of
> those fetches could not be recursively checked out anyway.
>
> > > The test stores the result in a normal branch, not a remote tracking
> > > branch. Is storing in a normal branch required?
> >
> > In the test we fetch from another repository, such that in the
> > repository-under-test this will show up in a remote tracking branch?

I messed up there. Yes, we need to fetch into a normal branch
such that the logic of check_for_new_submodule_commits triggers
no matter where it is on the remote.

Your experiment below shows that we cannot fetch into FETCH_HEAD:

> If that were true, I would expect that when this line:
>
> > git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
>
> is replaced by this line:
>
> > git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2 &&
>
> then things would still work. The tests pass with the first line (after
> I fixed a type mismatch) but not with the second. (Also I don't think a
> remote-tracking branch is generated here - the output printed doesn't
> indicate so, and refs/changes/2 is not a branch anyway.)

> > > Also, do you know why this is required? A naive reading of the patch
> > > leads me to believe that this should work even if merely fetching to
> > > FETCH_HEAD.
> >
> > See the next patch, check_for_new_submodule_commits() is missing
> > for FETCH_HEAD.
>
> I see in the next patch that there is an "if" branch in
> store_updated_refs() without update_local_ref() in which
> "check_for_new_submodule_commits(&rm->old_oid)" needs to be inserted. I
> think this is a symptom that maybe check_for_new_submodule_commits()
> needs to be extracted from update_local_ref() and put into
> store_updated_refs() instead? In update_local_ref(), it is called on
> ref->new_oid, which is actually the same as rm->old_oid anyway (there is
> an oidcpy earlier).

I'll look into that.

> > > What is a "default" submodule and why would you need one?
> >
> > s/default/artificial/. Such a submodule is a submodule that has no
> > config in the .gitmodules file and its name == path.
> > We need to keep those around for historic reasons AFAICT, c.f.
> > c68f837576 (implement fetching of moved submodules, 2017-10-16)
>
> Ah, OK. I would call it a fake submodule then, and copy over the "No
> entry in .gitmodules?" comment.

"fake submodule" sounds like
http://debuggable.com/posts/git-fake-submodules:4b563ee4-f3cc-4061-967e-0e48cbdd56cb
which is what I think of when hearing fake submodules.
diff mbox series

Patch

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 61bec5d213..95c44bf6ff 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -700,8 +700,7 @@  static int update_local_ref(struct ref *ref,
 			what = _("[new ref]");
 		}
 
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref(msg, ref, 0);
 		format_display(display, r ? '!' : '*', what,
@@ -716,8 +715,7 @@  static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "..");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref("fast-forward", ref, 1);
 		format_display(display, r ? '!' : ' ', quickref.buf,
@@ -731,8 +729,7 @@  static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "...");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref("forced-update", ref, 1);
 		format_display(display, r ? '!' : '+', quickref.buf,
diff --git a/submodule.c b/submodule.c
index 30c06507e3..7246b776f3 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1141,8 +1141,12 @@  struct submodule_parallel_fetch {
 	int result;
 
 	struct string_list changed_submodule_names;
+	struct get_next_submodule_task **retry;
+	int retry_nr, retry_alloc;
 };
-#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP }
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, \
+		  STRING_LIST_INIT_DUP, \
+		  NULL, 0, 0}
 
 static void calculate_changed_submodule_paths(
 	struct submodule_parallel_fetch *spf)
@@ -1247,6 +1251,56 @@  static int get_fetch_recurse_config(const struct submodule *submodule,
 	return spf->default_option;
 }
 
+struct get_next_submodule_task {
+	struct repository *repo;
+	const struct submodule *sub;
+	unsigned free_sub : 1; /* Do we need to free the submodule? */
+	struct oid_array *commits;
+};
+
+static const struct submodule *get_default_submodule(const char *path)
+{
+	struct submodule *ret = NULL;
+	const char *name = default_name_or_path(path);
+
+	if (!name)
+		return NULL;
+
+	ret = xmalloc(sizeof(*ret));
+	memset(ret, 0, sizeof(*ret));
+	ret->path = name;
+	ret->name = name;
+
+	return (const struct submodule *) ret;
+}
+
+static struct get_next_submodule_task *get_next_submodule_task_create(
+	struct repository *r, const char *path)
+{
+	struct get_next_submodule_task *task = xmalloc(sizeof(*task));
+	memset(task, 0, sizeof(*task));
+
+	task->sub = submodule_from_path(r, &null_oid, path);
+	if (!task->sub) {
+		task->sub = get_default_submodule(path);
+		task->free_sub = 1;
+	}
+
+	return task;
+}
+
+static void get_next_submodule_task_release(struct get_next_submodule_task *p)
+{
+	if (p->free_sub)
+		free((void*)p->sub);
+	p->free_sub = 0;
+	p->sub = NULL;
+
+	if (p->repo)
+		repo_clear(p->repo);
+	FREE_AND_NULL(p->repo);
+}
+
 static struct repository *get_submodule_repo_for(struct repository *r,
 						 const struct submodule *sub)
 {
@@ -1273,39 +1327,35 @@  static struct repository *get_submodule_repo_for(struct repository *r,
 static int get_next_submodule(struct child_process *cp,
 			      struct strbuf *err, void *data, void **task_cb)
 {
-	int ret = 0;
 	struct submodule_parallel_fetch *spf = data;
 
 	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
-		struct strbuf submodule_prefix = STRBUF_INIT;
+		int recurse_config;
 		const struct cache_entry *ce = spf->r->index->cache[spf->count];
 		const char *default_argv;
-		const struct submodule *submodule;
-		struct repository *repo;
-		struct submodule default_submodule = SUBMODULE_INIT;
+		struct get_next_submodule_task *task;
 
 		if (!S_ISGITLINK(ce->ce_mode))
 			continue;
 
-		submodule = submodule_from_path(spf->r, &null_oid, ce->name);
-		if (!submodule) {
-			const char *name = default_name_or_path(ce->name);
-			if (name) {
-				default_submodule.path = name;
-				default_submodule.name = name;
-				submodule = &default_submodule;
-			}
+		task = get_next_submodule_task_create(spf->r, ce->name);
+
+		if (!task->sub) {
+			free(task);
+			continue;
 		}
 
-		switch (get_fetch_recurse_config(submodule, spf))
+		recurse_config = get_fetch_recurse_config(task->sub, spf);
+
+		switch (recurse_config)
 		{
 		default:
 		case RECURSE_SUBMODULES_DEFAULT:
 		case RECURSE_SUBMODULES_ON_DEMAND:
-			if (!submodule ||
+			if (!task->sub ||
 			    !string_list_lookup(
 					&spf->changed_submodule_names,
-					submodule->name))
+					task->sub->name))
 				continue;
 			default_argv = "on-demand";
 			break;
@@ -1316,12 +1366,12 @@  static int get_next_submodule(struct child_process *cp,
 			continue;
 		}
 
-		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, ce->name);
-		repo = get_submodule_repo_for(spf->r, submodule);
-		if (repo) {
+		task->repo = get_submodule_repo_for(spf->r, task->sub);
+		if (task->repo) {
+			struct strbuf submodule_prefix = STRBUF_INIT;
 			child_process_init(cp);
 			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
-			cp->dir = xstrdup(repo->gitdir);
+			cp->dir = task->repo->gitdir;
 			cp->git_cmd = 1;
 			if (!spf->quiet)
 				strbuf_addf(err, "Fetching submodule %s%s\n",
@@ -1330,18 +1380,51 @@  static int get_next_submodule(struct child_process *cp,
 			argv_array_pushv(&cp->args, spf->args.argv);
 			argv_array_push(&cp->args, default_argv);
 			argv_array_push(&cp->args, "--submodule-prefix");
+
+			strbuf_addf(&submodule_prefix, "%s%s/",
+						       spf->prefix,
+						       task->sub->path);
 			argv_array_push(&cp->args, submodule_prefix.buf);
 
-			repo_clear(repo);
-			free(repo);
-			ret = 1;
-		}
-		strbuf_release(&submodule_prefix);
-		if (ret) {
 			spf->count++;
+			*task_cb = task;
+
+			strbuf_release(&submodule_prefix);
 			return 1;
+		} else {
+			get_next_submodule_task_release(task);
+			free(task);
 		}
 	}
+
+	if (spf->retry_nr) {
+		struct get_next_submodule_task *task = spf->retry[spf->retry_nr - 1];
+		struct strbuf submodule_prefix = STRBUF_INIT;
+		spf->retry_nr--;
+
+		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, task->sub->path);
+
+		child_process_init(cp);
+		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
+		cp->git_cmd = 1;
+		cp->dir = task->repo->gitdir;
+
+		argv_array_init(&cp->args);
+		argv_array_pushv(&cp->args, spf->args.argv);
+		argv_array_push(&cp->args, "on-demand");
+		argv_array_push(&cp->args, "--submodule-prefix");
+		argv_array_push(&cp->args, submodule_prefix.buf);
+
+		/* NEEDSWORK: have get_default_remote from s--h */
+		argv_array_push(&cp->args, "origin");
+		oid_array_for_each_unique(task->commits,
+					  append_oid_to_argv, &cp->args);
+
+		*task_cb = task;
+		strbuf_release(&submodule_prefix);
+		return 1;
+	}
+
 	return 0;
 }
 
@@ -1349,20 +1432,68 @@  static int fetch_start_failure(struct strbuf *err,
 			       void *cb, void *task_cb)
 {
 	struct submodule_parallel_fetch *spf = cb;
+	struct get_next_submodule_task *task = task_cb;
 
 	spf->result = 1;
 
+	get_next_submodule_task_release(task);
 	return 0;
 }
 
+static int commit_exists_in_sub(const struct object_id *oid, void *data)
+{
+	struct repository *subrepo = data;
+
+	enum object_type type = oid_object_info(subrepo, oid, NULL);
+
+	return type != OBJ_COMMIT;
+}
+
 static int fetch_finish(int retvalue, struct strbuf *err,
 			void *cb, void *task_cb)
 {
 	struct submodule_parallel_fetch *spf = cb;
+	struct get_next_submodule_task *task = task_cb;
+	const struct submodule *sub;
+
+	struct string_list_item *it;
+	struct oid_array *commits;
 
 	if (retvalue)
 		spf->result = 1;
 
+	if (!task)
+		return 0;
+
+	sub = task->sub;
+	if (!sub)
+		goto out;
+
+	it = string_list_lookup(&spf->changed_submodule_names, sub->name);
+	if (!it)
+		goto out;
+
+	commits = it->util;
+	oid_array_filter(commits,
+			 commit_exists_in_sub,
+			 task->repo);
+
+	/* Are there commits that do not exist? */
+	if (commits->nr) {
+		/* We already tried fetching them, do not try again. */
+		if (task->commits)
+			return 0;
+
+		task->commits = commits;
+		ALLOC_GROW(spf->retry, spf->retry_nr + 1, spf->retry_alloc);
+		spf->retry[spf->retry_nr] = task;
+		spf->retry_nr++;
+		return 0;
+	}
+
+out:
+	get_next_submodule_task_release(task);
+
 	return 0;
 }
 
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 42692219a1..af12c50e7d 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -605,4 +605,20 @@  test_expect_success "fetch new commits when submodule got renamed" '
 	test_cmp expect actual
 '
 
+test_expect_success "fetch new commits on-demand when they are not reachable" '
+	git checkout --detach &&
+	C=$(git -C submodule commit-tree -m "new change outside refs/heads" HEAD^{tree}) &&
+	git -C submodule update-ref refs/changes/1 $C &&
+	git update-index --cacheinfo 160000 $C submodule &&
+	git commit -m "updated submodule outside of refs/heads" &&
+	D=$(git rev-parse HEAD) &&
+	git update-ref refs/changes/2 $D &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
+		git -C submodule cat-file -t $C &&
+		git checkout --recurse-submodules FETCH_HEAD
+	)
+'
+
 test_done