diff mbox series

[RFC/PATCH] Add fetch.updateHead option

Message ID 20201118091219.3341585-1-felipe.contreras@gmail.com (mailing list archive)
State New, archived
Headers show
Series [RFC/PATCH] Add fetch.updateHead option | expand

Commit Message

Felipe Contreras Nov. 18, 2020, 9:12 a.m. UTC
Users might change the behavior when running "git fetch" so that the
remote's HEAD symbolic ref is updated at certain point.

For example after running "git remote add" the remote HEAD is not
set like it is with "git clone".

Setting "fetch.updatehead = missing" would probably be a sensible
default that everyone would want, but for now the default behavior is to
never update HEAD, so there shouldn't be any functional changes.

For the next major version of Git, we might want to change this default.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
---

This is just a RFC, the tests are missing.

 Documentation/config/fetch.txt  |  4 +++
 Documentation/config/remote.txt |  3 ++
 builtin/fetch.c                 | 53 ++++++++++++++++++++++++++++++++-
 remote.c                        | 21 +++++++++++++
 remote.h                        | 11 +++++++
 5 files changed, 91 insertions(+), 1 deletion(-)

Comments

Ævar Arnfjörð Bjarmason Nov. 18, 2020, 9:30 a.m. UTC | #1
On Wed, Nov 18 2020, Felipe Contreras wrote:

> Users might change the behavior when running "git fetch" so that the
> remote's HEAD symbolic ref is updated at certain point.
>
> For example after running "git remote add" the remote HEAD is not
> set like it is with "git clone".
>
> Setting "fetch.updatehead = missing" would probably be a sensible
> default that everyone would want, but for now the default behavior is to
> never update HEAD, so there shouldn't be any functional changes.
>
> For the next major version of Git, we might want to change this default.
>
> Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
> ---
>
> This is just a RFC, the tests are missing.

I haven't taken much time to re-think through the patch/implications of
this, but I remember running into this and going through some pre-patch
investigation at some point.

It's really annoying in some cases that "clone" isn't creating the same
state as "remote". IIRC I was doing some heuristics to figure out the
remote branch name etc.

Isn't this something we can just change without an option? There were a
bunch of cases in clone/fetch that were different for no different
reasons, IIRC I patched one or two of those in the past. But I haven't
gone through the history of the feature and checked if it was
intentional.
Felipe Contreras Nov. 18, 2020, 9:43 a.m. UTC | #2
On Wed, Nov 18, 2020 at 3:30 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:

> I haven't taken much time to re-think through the patch/implications of
> this, but I remember running into this and going through some pre-patch
> investigation at some point.
>
> It's really annoying in some cases that "clone" isn't creating the same
> state as "remote". IIRC I was doing some heuristics to figure out the
> remote branch name etc.
>
> Isn't this something we can just change without an option? There were a
> bunch of cases in clone/fetch that were different for no different
> reasons, IIRC I patched one or two of those in the past. But I haven't
> gone through the history of the feature and checked if it was
> intentional.

Apparently remote/HEAD is supposed to be set manually, which is why
there is "git remote add -m master", and "git remote set-head origin
master".

Personally I don't see any point in that.

I think if no remote/HEAD is set (manually), it should be set
automatically on the first "git fetch", and that should mirror the
behavior of "git clone". This would be the equivalent of
"fetch.updatehead = missing", which in my opinion should be the
default.

This configuration was suggested by Jeff King, see:

https://lore.kernel.org/git/20201118020618.GE650959@coredump.intra.peff.net/

Personally I don't need a configuration, I would be happy if
"fetch.updatehead = missing" was the default.

Cheers.
Junio C Hamano Nov. 18, 2020, 3:53 p.m. UTC | #3
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Wed, Nov 18 2020, Felipe Contreras wrote:
>
>> Users might change the behavior when running "git fetch" so that the
>> remote's HEAD symbolic ref is updated at certain point.
>>
>> For example after running "git remote add" the remote HEAD is not
>> set like it is with "git clone".
>>
>> Setting "fetch.updatehead = missing" would probably be a sensible
>> default that everyone would want, but for now the default behavior is to
>> never update HEAD, so there shouldn't be any functional changes.
>>
>> For the next major version of Git, we might want to change this default.
>>
>> Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
>> ---
>>
>> This is just a RFC, the tests are missing.
>
> I haven't taken much time to re-think through the patch/implications of
> this, but I remember running into this and going through some pre-patch
> investigation at some point.
>
> It's really annoying in some cases that "clone" isn't creating the same
> state as "remote". IIRC I was doing some heuristics to figure out the
> remote branch name etc.
>
> Isn't this something we can just change without an option? There were a
> bunch of cases in clone/fetch that were different for no different
> reasons, IIRC I patched one or two of those in the past. But I haven't
> gone through the history of the feature and checked if it was
> intentional.

I think what Peff outlined earlier was reasonable.  "remote add -f",
since it talks with the remote, should be able to learn where their
HEAD points at and set it up.  "remote add" that does not talk to
the remote cannot do so and "fetch" could help but we should not
touch existing refs/remotes/$name/HEAD by default [*1*], as the
symref is meant to indicate the local choice of which one of their
branches is significant to _us_ and what "clone" does is merely to
give it the initial value.

But when interacting with a remote whose choice of HEAD is always
what the local user wants to follow, letting "git fetch" update
refs/remotes/$name/HEAD to a newly observed value would be a welcome
optional feature.

I haven't thought through what Jonathan (Nieder) said about an
option to fail a fetch when refs/remotes/$name/HEAD and the remote
HEAD "git fetch" observed do not match.  It may make sense in some
cases but not always, and I do not know what the right explanation
for the use case the mode is supposed to be a good match is.


[Footnote]

*1* This is important when you work in more than one repositories
with working tree and use fetch+rebase to keep them in sync.
Felipe Contreras Nov. 18, 2020, 7:04 p.m. UTC | #4
On Wed, Nov 18, 2020 at 9:53 AM Junio C Hamano <gitster@pobox.com> wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> > I haven't taken much time to re-think through the patch/implications of
> > this, but I remember running into this and going through some pre-patch
> > investigation at some point.
> >
> > It's really annoying in some cases that "clone" isn't creating the same
> > state as "remote". IIRC I was doing some heuristics to figure out the
> > remote branch name etc.
> >
> > Isn't this something we can just change without an option? There were a
> > bunch of cases in clone/fetch that were different for no different
> > reasons, IIRC I patched one or two of those in the past. But I haven't
> > gone through the history of the feature and checked if it was
> > intentional.
>
> I think what Peff outlined earlier was reasonable.  "remote add -f",
> since it talks with the remote, should be able to learn where their
> HEAD points at and set it up.  "remote add" that does not talk to
> the remote cannot do so and "fetch" could help but we should not
> touch existing refs/remotes/$name/HEAD by default [*1*], as the
> symref is meant to indicate the local choice of which one of their
> branches is significant to _us_ and what "clone" does is merely to
> give it the initial value.

The new suggested behavior (fetch.updatehead = missing) is that "git
fetch" touches $remote/HEAD *only* when it doesn't exist.

So if you set $remote/HEAD manually, you would not affected.

However, that behavior is clearly not ideal in the long term, since it
seems basically nobody uses that feature.

Either way I see no argument against adding this option, and making
the default fetch.updatehead = never, which doesn't change the current
behavior at all.

> But when interacting with a remote whose choice of HEAD is always
> what the local user wants to follow, letting "git fetch" update
> refs/remotes/$name/HEAD to a newly observed value would be a welcome
> optional feature.

That would be fetch.updatehead = always.

Cheers.
Jeff King Nov. 20, 2020, 11:52 p.m. UTC | #5
On Wed, Nov 18, 2020 at 03:12:19AM -0600, Felipe Contreras wrote:

> Users might change the behavior when running "git fetch" so that the
> remote's HEAD symbolic ref is updated at certain point.
> 
> For example after running "git remote add" the remote HEAD is not
> set like it is with "git clone".
> 
> Setting "fetch.updatehead = missing" would probably be a sensible
> default that everyone would want, but for now the default behavior is to
> never update HEAD, so there shouldn't be any functional changes.
> 
> For the next major version of Git, we might want to change this default.

Thanks for working on this. Regardless of whether we change the default
behavior, this seems like an obvious improvement (and I do think it
makes sense to eventually change the default; I'd even be OK switching
it to "missing" in the near term).

The implementation looks pretty straight-forward, but I have a few
comments below:

> --- a/Documentation/config/remote.txt
> +++ b/Documentation/config/remote.txt
> @@ -84,3 +84,6 @@ remote.<name>.promisor::
>  remote.<name>.partialclonefilter::
>  	The filter that will be applied when fetching from this
>  	promisor remote.
> +
> +remote.<name>.updateHead::
> +  Defines when to update the remote HEAD symbolic ref. See `fetch.updateHead`.

Nice. I think fetch.updateHead is likely to be the commonly used one,
but if this isn't hard to support, it's a nice bonus for flexibility.

> +static void update_head(int config, const struct ref *head, const struct remote *remote)
> +{
> +	struct strbuf ref = STRBUF_INIT, target = STRBUF_INIT;
> +	const char *r, *head_name = NULL;
> +
> +	if (!head || !head->symref || !remote)
> +		return;

I'd expect us to return early here, as well, if the config is set to
"never". I think your function will usually still do the right thing
(because we return early before writing), but it makes a pointless
lookup of the current origin/HEAD. But see below.

> +	strbuf_addf(&ref, "refs/remotes/%s/HEAD", remote->name);
> +	skip_prefix(head->symref, "refs/heads/", &head_name);

Should we bail or complain if this skip_prefix() doesn't match? I think
it would be a sign of weirdness on the remote side, since HEAD is never
supposed to point to anything except refs/heads/. But if we ever did see
it, it's probably better to bail than to point to
refs/remotes/origin/refs/foo/bar or whatever.

> +	strbuf_addf(&target, "refs/remotes/%s/%s", remote->name, head_name);
> +
> +	r = refs_resolve_ref_unsafe(get_main_ref_store(the_repository),
> +		ref.buf, RESOLVE_REF_READING,
> +		NULL, NULL);

This won't resolve a symref pointing to an unborn branch, so it would
count as "missing". I.e.:

  git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/nope
  git -c fetch.updatehead=missing fetch

will update it based on the remote HEAD.  I guess I could see some
argument for defining "missing" in that way, but I suspect it is not
what somebody in this situation would expect.

I think it's hard to get into this situation intentionally. If you clone
an empty repo, we won't write the symref at all (since we have nothing
to write into it), so both sides would be missing. But it's easy enough
to do the right thing; see symbolic-ref.c's check_symref() function
(basically drop the READING flag).

Likewise, I'd not expect to see a non-symref at that name, but what
should we do if we see one? I think overwrite it for "always", but not
for "missing". You can tell the difference by checking REF_ISSYMREF in
the returned flags, though the distinction may not matter if we follow
the semantics I just said.

> +	if (r) {
> +		if (config == FETCH_UPDATE_HEAD_MISSING)
> +			/* already present */
> +			return;
> +		else if (config == FETCH_UPDATE_HEAD_ALWAYS && !strcmp(r, target.buf))
> +			/* already up-to-date */
> +			return;
> +		else
> +			/* should never happen */
> +			return;
> +	}
> +
> +	if (create_symref(ref.buf, target.buf, "remote update head"))
> +		warning(_("could not set remote head"));
> +}

If we do update the symref, we should probably tell the user. Better
still if we can print it as part of the usual fetch output table.

> @@ -1382,8 +1420,18 @@ static int do_fetch(struct transport *transport,
>  				break;
>  			}
>  		}
> -	} else if (transport->remote && transport->remote->fetch.nr)
> +	} else if (transport->remote && transport->remote->fetch.nr) {
> +		if (transport->remote->update_head)
> +			update_head_config = transport->remote->update_head;
> +		else
> +			update_head_config = fetch_update_head;
> +
> +		need_update_head = update_head_config && update_head_config != FETCH_UPDATE_HEAD_NEVER;
> +
> +		if (need_update_head)
> +			strvec_push(&ref_prefixes, "HEAD");
>  		refspec_ref_prefixes(&transport->remote->fetch, &ref_prefixes);
> +	}

Good catch. We need this for:

  git fetch origin

since otherwise it doesn't ask about HEAD in a v2 exchange. What about:

  git fetch origin master

That won't report on HEAD either, even with your patch, because it hits
the part of the conditional before your "else if". What should it do?  I
can see an argument for "nothing, we only update head on full configured
fetches", but if so we should definitely make that clear in the
documentation. I can also see an argument for "always, if we happen to
have heard about it" (just like we opportunistically update tracking
refs even if they are fetched by name into FETCH_HEAD).

> diff --git a/remote.h b/remote.h
> index 3211abdf05..c16a2d7b2e 100644
> --- a/remote.h
> +++ b/remote.h

I wondered if we should be interacting with remote.[ch]'s
guess_remote_head() here, which is what powers "remote set-head -a", as
well as the initial clone decision. But I don't think it make sense. If
we were not explicitly given information about a symref, it does not
make much sense to guess without seeing all of the refs (which we may
well not, due to the v2 ref-prefix capability).

In most cases the distinction would not matter anyway. The "guess" part
comes in only for ancient pre-symref-capability versions of Git (which
we are unlikely to see and it's OK if they just don't make use of the
feature), or for remotes with detached HEADs (in which case not setting
our local origin/HEAD is probably the best thing, as a bad guess would
foul up a later use of the "missing" mode).

-Peff
Junio C Hamano Nov. 21, 2020, 12:28 a.m. UTC | #6
Jeff King <peff@peff.net> writes:

> Thanks for working on this. Regardless of whether we change the default
> behavior, this seems like an obvious improvement (and I do think it
> makes sense to eventually change the default; I'd even be OK switching
> it to "missing" in the near term).

I agree that "missing" would be an easy thing to take, and I do not
mind seeing it made the default in the near term.  It won't break
existing expectations too much, and can even be seen as a bugfix for
the current behaviour by making "init && fetch" a step closer to
"clone".  Beyond that to modify what the end user already has is a
much harder sell.  For some it may be an improvement, but for others
it would be a breaking change.


> The implementation looks pretty straight-forward, but I have a few
> comments below:
> ...
>> +	strbuf_addf(&ref, "refs/remotes/%s/HEAD", remote->name);
>> +	skip_prefix(head->symref, "refs/heads/", &head_name);
>
> Should we bail or complain if this skip_prefix() doesn't match? I think
> it would be a sign of weirdness on the remote side, ...

Yes, we should notice the weirdness and stop doing any further harm
to the local side.

>> +	strbuf_addf(&target, "refs/remotes/%s/%s", remote->name, head_name);
>> +
>> +	r = refs_resolve_ref_unsafe(get_main_ref_store(the_repository),
>> +		ref.buf, RESOLVE_REF_READING,
>> +		NULL, NULL);
>
> This won't resolve a symref pointing to an unborn branch, so it would
> count as "missing". I.e.:
>
>   git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/nope
>   git -c fetch.updatehead=missing fetch
>
> will update it based on the remote HEAD.  I guess I could see some
> argument for defining "missing" in that way, but I suspect it is not
> what somebody in this situation would expect.

What do we do in "git clone" of an empty repository with the current
branch not yet to be born?  Modern Git tells where the HEAD points at
even for unborn branch, so using that would be a natural thing to do.

> If we do update the symref, we should probably tell the user. Better
> still if we can print it as part of the usual fetch output table.

True.

>> +		if (need_update_head)
>> +			strvec_push(&ref_prefixes, "HEAD");
>>  		refspec_ref_prefixes(&transport->remote->fetch, &ref_prefixes);
>> +	}
>
> Good catch. We need this for:
>
>   git fetch origin
>
> since otherwise it doesn't ask about HEAD in a v2 exchange. What about:
>
>   git fetch origin master
>
> That won't report on HEAD either, even with your patch, because it hits
> the part of the conditional before your "else if". What should it do?  I
> can see an argument for "nothing, we only update head on full configured
> fetches", but if so we should definitely make that clear in the
> documentation. I can also see an argument for "always, if we happen to
> have heard about it" (just like we opportunistically update tracking
> refs even if they are fetched by name into FETCH_HEAD).

After seeing that all users got accustomed to seeing an explicit
fetch into FETCH_HEAD still update the remote-tracking branches, it
is more consistent and easier to understand by users if the "if we
happen to have heard about it" is used, I would think.  Documenting
the behaviour certainly is needed in a real version after this RFC.

> In most cases the distinction would not matter anyway. The "guess" part
> comes in only for ancient pre-symref-capability versions of Git (which
> we are unlikely to see and it's OK if they just don't make use of the
> feature),

I agree with you that at this point the versions of Git that does
not know about advertising symref can be left to rot.

> or for remotes with detached HEADs (in which case not setting
> our local origin/HEAD is probably the best thing, as a bad guess would
> foul up a later use of the "missing" mode).

Yes.  That would help the "missing" mode.  With a mode stronger than
"missing" that modifies what the local side has, detaching origin/HEAD
in a similar way as the remote has would be OK (although I do not
see me using that mode personally).

Thanks for a detailed review.
Jeff King Nov. 21, 2020, 12:40 a.m. UTC | #7
On Fri, Nov 20, 2020 at 04:28:45PM -0800, Junio C Hamano wrote:

> > This won't resolve a symref pointing to an unborn branch, so it would
> > count as "missing". I.e.:
> >
> >   git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/nope
> >   git -c fetch.updatehead=missing fetch
> >
> > will update it based on the remote HEAD.  I guess I could see some
> > argument for defining "missing" in that way, but I suspect it is not
> > what somebody in this situation would expect.
> 
> What do we do in "git clone" of an empty repository with the current
> branch not yet to be born?  Modern Git tells where the HEAD points at
> even for unborn branch, so using that would be a natural thing to do.

We don't seem to do anything:

  $ git init
  $ git clone . dst
  Cloning into 'dst'...
  warning: You appear to have cloned an empty repository.
  done.
  $ find dst/.git/refs
  dst/.git/refs
  dst/.git/refs/tags
  dst/.git/refs/heads

Likewise with --no-local.

I don't think we do advertise the symref in such a case. In v2, the
symref information is attached to individual lines in the ref
advertisement. And we don't advertise the unborn line (we could, but I
think we might need a special syntax for the oid field).

In v0, it comes in the capability section attached to the first line of
the advertisement, but it doesn't have to be about that particular line.
If there are no refs to advertise, we don't seem to send anything (I
_thought_ we sent a capabilities^{} line, but I think that is only
receive-pack; if we have no refs to fetch, then capabilities are not
interesting on the upload-pack side anyway).

But even if we do have a ref in v0, it looks like we don't advertise the
symref:

  $ git init
  $ git commit --allow-empty -m foo
  $ git checkout --orphan another-branch
  $ git-upload-pack .
  0104d4cebe701d3d7b36e6c383193e92ef4bd49ab2b0 refs/heads/mastermulti_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed object-format=sha1 agent=git/2.29.2.730.g3e418f96ba

We could likewise support it there, but I don't think modifying the v0
protocol at this point is that interesting.

-Peff
Felipe Contreras Nov. 21, 2020, 1:18 a.m. UTC | #8
On Fri, Nov 20, 2020 at 6:40 PM Jeff King <peff@peff.net> wrote:

> I don't think we do advertise the symref in such a case. In v2, the
> symref information is attached to individual lines in the ref
> advertisement. And we don't advertise the unborn line (we could, but I
> think we might need a special syntax for the oid field).

This may be worth considering changing.

If a hosting provider (e.g. GitHub) decides to configure an initial
branch (e.g. main) it would be nice for "git clone" to have
information about that initial branch so the user doesn't have to
change it manually to please the provider's aesthetics.

So the instructions could be:

  git clone $url .
  echo "# myproject" >> README.md
  git add README.md
  git commit -m "first commit"
  # git branch -M main # this step would not be necessary
  git push -u origin HEAD

Personally I don't care about this. But others might find it useful.

Cheers.
Felipe Contreras Nov. 21, 2020, 1:41 a.m. UTC | #9
On Fri, Nov 20, 2020 at 5:52 PM Jeff King <peff@peff.net> wrote:
> On Wed, Nov 18, 2020 at 03:12:19AM -0600, Felipe Contreras wrote:

> > +static void update_head(int config, const struct ref *head, const struct remote *remote)
> > +{
> > +     struct strbuf ref = STRBUF_INIT, target = STRBUF_INIT;
> > +     const char *r, *head_name = NULL;
> > +
> > +     if (!head || !head->symref || !remote)
> > +             return;
>
> I'd expect us to return early here, as well, if the config is set to
> "never". I think your function will usually still do the right thing
> (because we return early before writing), but it makes a pointless
> lookup of the current origin/HEAD. But see below.

If the config is set to "never", the function update_head is never
called, since the boolean need_update_head is never set in the outer
function.

I don't like the convolutedness of this approach, but couldn't think
of anything better.

> > +     strbuf_addf(&ref, "refs/remotes/%s/HEAD", remote->name);
> > +     skip_prefix(head->symref, "refs/heads/", &head_name);
>
> Should we bail or complain if this skip_prefix() doesn't match? I think
> it would be a sign of weirdness on the remote side, since HEAD is never
> supposed to point to anything except refs/heads/. But if we ever did see
> it, it's probably better to bail than to point to
> refs/remotes/origin/refs/foo/bar or whatever.

OK. An extra check doesn't hurt.

> > +     strbuf_addf(&target, "refs/remotes/%s/%s", remote->name, head_name);
> > +
> > +     r = refs_resolve_ref_unsafe(get_main_ref_store(the_repository),
> > +             ref.buf, RESOLVE_REF_READING,
> > +             NULL, NULL);
>
> This won't resolve a symref pointing to an unborn branch, so it would
> count as "missing". I.e.:
>
>   git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/nope
>   git -c fetch.updatehead=missing fetch
>
> will update it based on the remote HEAD.  I guess I could see some
> argument for defining "missing" in that way, but I suspect it is not
> what somebody in this situation would expect.
>
> I think it's hard to get into this situation intentionally. If you clone
> an empty repo, we won't write the symref at all (since we have nothing
> to write into it), so both sides would be missing. But it's easy enough
> to do the right thing; see symbolic-ref.c's check_symref() function
> (basically drop the READING flag).
>
> Likewise, I'd not expect to see a non-symref at that name, but what
> should we do if we see one? I think overwrite it for "always", but not
> for "missing". You can tell the difference by checking REF_ISSYMREF in
> the returned flags, though the distinction may not matter if we follow
> the semantics I just said.

All right. So, still checking if REF_ISSYMREF is set, just in case.

> > +     if (r) {
> > +             if (config == FETCH_UPDATE_HEAD_MISSING)
> > +                     /* already present */
> > +                     return;
> > +             else if (config == FETCH_UPDATE_HEAD_ALWAYS && !strcmp(r, target.buf))
> > +                     /* already up-to-date */
> > +                     return;
> > +             else
> > +                     /* should never happen */
> > +                     return;
> > +     }
> > +
> > +     if (create_symref(ref.buf, target.buf, "remote update head"))
> > +             warning(_("could not set remote head"));
> > +}
>
> If we do update the symref, we should probably tell the user. Better
> still if we can print it as part of the usual fetch output table.

Agreed.

> > @@ -1382,8 +1420,18 @@ static int do_fetch(struct transport *transport,
> >                               break;
> >                       }
> >               }
> > -     } else if (transport->remote && transport->remote->fetch.nr)
> > +     } else if (transport->remote && transport->remote->fetch.nr) {
> > +             if (transport->remote->update_head)
> > +                     update_head_config = transport->remote->update_head;
> > +             else
> > +                     update_head_config = fetch_update_head;
> > +
> > +             need_update_head = update_head_config && update_head_config != FETCH_UPDATE_HEAD_NEVER;
> > +
> > +             if (need_update_head)
> > +                     strvec_push(&ref_prefixes, "HEAD");
> >               refspec_ref_prefixes(&transport->remote->fetch, &ref_prefixes);
> > +     }
>
> Good catch. We need this for:
>
>   git fetch origin
>
> since otherwise it doesn't ask about HEAD in a v2 exchange. What about:
>
>   git fetch origin master
>
> That won't report on HEAD either, even with your patch, because it hits
> the part of the conditional before your "else if". What should it do?  I
> can see an argument for "nothing, we only update head on full configured
> fetches", but if so we should definitely make that clear in the
> documentation. I can also see an argument for "always, if we happen to
> have heard about it" (just like we opportunistically update tracking
> refs even if they are fetched by name into FETCH_HEAD).

I don't see the point in complicating the code for those corner-cases.

And I also don't see how HEAD can be fetched unless we specifically
ask for it. What would that command look like?


Also, there's two optimizations that apparently went unnoticed:

1. In the case of "missing". We could preemptively check if there's
already an "origin/HEAD" before adding "HEAD" to the prefixes (and
setting need_update_head). That would probably complicate the code.

2. Also in the case of "missing". There's no point in building the
"target" string buffer, since we are never going to compare it to
anything. Again, this would complicate the code.

I decided against these in v1 because as I already stated: it
complicates the code.

Plus, I forgot to release the string buffers (I'm glad nobody
noticed). That will be in v2.

Cheers.

Cheers.
Felipe Contreras Nov. 21, 2020, 1:53 a.m. UTC | #10
On Fri, Nov 20, 2020 at 6:28 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jeff King <peff@peff.net> writes:
>
> > Thanks for working on this. Regardless of whether we change the default
> > behavior, this seems like an obvious improvement (and I do think it
> > makes sense to eventually change the default; I'd even be OK switching
> > it to "missing" in the near term).
>
> I agree that "missing" would be an easy thing to take, and I do not
> mind seeing it made the default in the near term.  It won't break
> existing expectations too much, and can even be seen as a bugfix for
> the current behaviour by making "init && fetch" a step closer to
> "clone".  Beyond that to modify what the end user already has is a
> much harder sell.  For some it may be an improvement, but for others
> it would be a breaking change.

Changing the default to "missing" breaks a lot of tests. I already
have the fixes for the tests, but this can be seen as an indication
that the expectations of users would change.

I never knew of this $remote/HEAD feature, and searching forums some
people seem perplexed by its existence and ask how to remove it.

So, if the "missing" behavior is the one we are targeting (which I
argue we should), we probably need a warning before doing the flip, so
that users become aware of the feature and are not surprised by a
sudden $remote/HEAD popping (or repopping) seemingly out of nowhere.

Cheers.
Jeff King Nov. 24, 2020, 6:58 a.m. UTC | #11
On Fri, Nov 20, 2020 at 07:18:39PM -0600, Felipe Contreras wrote:

> On Fri, Nov 20, 2020 at 6:40 PM Jeff King <peff@peff.net> wrote:
> 
> > I don't think we do advertise the symref in such a case. In v2, the
> > symref information is attached to individual lines in the ref
> > advertisement. And we don't advertise the unborn line (we could, but I
> > think we might need a special syntax for the oid field).
> 
> This may be worth considering changing.

I agree it might be nice, but may not be worth the effort (it will
require a protocol extension).

However...

> If a hosting provider (e.g. GitHub) decides to configure an initial
> branch (e.g. main) it would be nice for "git clone" to have
> information about that initial branch so the user doesn't have to
> change it manually to please the provider's aesthetics.
> 
> So the instructions could be:
> 
>   git clone $url .
>   echo "# myproject" >> README.md
>   git add README.md
>   git commit -m "first commit"
>   # git branch -M main # this step would not be necessary
>   git push -u origin HEAD
> 
> Personally I don't care about this. But others might find it useful.

I don't think you have to do the rename here if you don't want to.
If there is already content on the "main" branch at GitHub, then clone
should pick that as the default branch for your local clone.

If there isn't, then you'll get whatever your local git-clone decides
on. But when you push, at least to GitHub, if the server-side HEAD
points to an unborn branch, it will re-point to the newly pushed branch.
So with the current version of Git, you would end up with "master" on
the remote side. And that should continue to work as it always has
(AFAIK any changes on GitHub's side are purely about default branch
names, such as when you add template files as part of the repo
initialization).

Of course if you _want_ "main" in both cases, you'd have to rename the
branch. But you would probably set init.defaultBranch in that case.

-Peff
Jeff King Nov. 24, 2020, 7:09 a.m. UTC | #12
On Fri, Nov 20, 2020 at 07:41:48PM -0600, Felipe Contreras wrote:

> > > +static void update_head(int config, const struct ref *head, const struct remote *remote)
> > > +{
> > > +     struct strbuf ref = STRBUF_INIT, target = STRBUF_INIT;
> > > +     const char *r, *head_name = NULL;
> > > +
> > > +     if (!head || !head->symref || !remote)
> > > +             return;
> >
> > I'd expect us to return early here, as well, if the config is set to
> > "never". I think your function will usually still do the right thing
> > (because we return early before writing), but it makes a pointless
> > lookup of the current origin/HEAD. But see below.
> 
> If the config is set to "never", the function update_head is never
> called, since the boolean need_update_head is never set in the outer
> function.

True.

> I don't like the convolutedness of this approach, but couldn't think
> of anything better.

In general, I think keeping as much logic as possible in update_head()
makes sense, rather than the caller. We don't need to avoid running
update_head() when the config is "never" if it is a true noop in that
case.

Unfortunately, we do still need to make a decision on whether to send a
request for "HEAD" to the other side in some cases, so some logic has to
be there.

But if we don't send such a request, but we _do_ find out about HEAD
somehow (e.g., because the user said "git fetch origin HEAD", which
seems like an obvious thing for someone to try in this case), we
probably should trigger the function.

> > since otherwise it doesn't ask about HEAD in a v2 exchange. What about:
> >
> >   git fetch origin master
> >
> > That won't report on HEAD either, even with your patch, because it hits
> > the part of the conditional before your "else if". What should it do?  I
> > can see an argument for "nothing, we only update head on full configured
> > fetches", but if so we should definitely make that clear in the
> > documentation. I can also see an argument for "always, if we happen to
> > have heard about it" (just like we opportunistically update tracking
> > refs even if they are fetched by name into FETCH_HEAD).
> 
> I don't see the point in complicating the code for those corner-cases.

I would think:

  git pull origin master

isn't that much of a corner case, and people would expect a HEAD update
to trigger (especially if that HEAD is master). That will run "git fetch
origin master" under the hood (and update both FETCH_HEAD and
refs/remotes/origin/master).

> And I also don't see how HEAD can be fetched unless we specifically
> ask for it. What would that command look like?

I think it would be nice if any of:

  - git fetch

  - git fetch origin

  - git fetch origin master

  - git fetch origin HEAD

triggered the auto-update, along with their git-pull equivalents. In the
first three, v2 protocol will require mentioning HEAD as a ref-prefix.

> Also, there's two optimizations that apparently went unnoticed:
> 
> 1. In the case of "missing". We could preemptively check if there's
> already an "origin/HEAD" before adding "HEAD" to the prefixes (and
> setting need_update_head). That would probably complicate the code.

Hmm. That might be worth doing, as it does involve extra network traffic
(and the server-side lookup is more complicated, as it looks up all
possible variants of HEAD). But it is only a few bytes. I guess it would
involve saving the value of origin/HEAD between the two calls, but that
doesn't seem all that hard.

But I'd also be OK with just unconditionally asking for HEAD (but as
above, I think it should happen even with a refspec, and anytime we hear
about HEAD we should consider updating the symref).

-Peff
diff mbox series

Patch

diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt
index 6af6f5edb2..93d6c59fac 100644
--- a/Documentation/config/fetch.txt
+++ b/Documentation/config/fetch.txt
@@ -94,3 +94,7 @@  fetch.writeCommitGraph::
 	merge and the write may take longer. Having an updated commit-graph
 	file helps performance of many Git commands, including `git merge-base`,
 	`git push -f`, and `git log --graph`. Defaults to false.
+
+fetch.updateHead::
+  Defines when to update the remote HEAD symbolic ref. Values are 'never',
+  'missing' (update only when HEAD is missing), and 'always'.
diff --git a/Documentation/config/remote.txt b/Documentation/config/remote.txt
index a8e6437a90..905661c7f7 100644
--- a/Documentation/config/remote.txt
+++ b/Documentation/config/remote.txt
@@ -84,3 +84,6 @@  remote.<name>.promisor::
 remote.<name>.partialclonefilter::
 	The filter that will be applied when fetching from this
 	promisor remote.
+
+remote.<name>.updateHead::
+  Defines when to update the remote HEAD symbolic ref. See `fetch.updateHead`.
diff --git a/builtin/fetch.c b/builtin/fetch.c
index f9c3c49f14..b47b06f001 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -55,6 +55,8 @@  static int fetch_prune_tags_config = -1; /* unspecified */
 static int prune_tags = -1; /* unspecified */
 #define PRUNE_TAGS_BY_DEFAULT 0 /* do we prune tags by default? */
 
+static int fetch_update_head = FETCH_UPDATE_HEAD_DEFAULT;
+
 static int all, append, dry_run, force, keep, multiple, update_head_ok;
 static int write_fetch_head = 1;
 static int verbosity, deepen_relative, set_upstream;
@@ -120,6 +122,9 @@  static int git_fetch_config(const char *k, const char *v, void *cb)
 		return 0;
 	}
 
+	if (!strcmp(k, "fetch.updatehead"))
+		return parse_update_head(&fetch_update_head, k, v);
+
 	return git_default_config(k, v, cb);
 }
 
@@ -1342,6 +1347,38 @@  static void backfill_tags(struct transport *transport, struct ref *ref_map)
 	}
 }
 
+static void update_head(int config, const struct ref *head, const struct remote *remote)
+{
+	struct strbuf ref = STRBUF_INIT, target = STRBUF_INIT;
+	const char *r, *head_name = NULL;
+
+	if (!head || !head->symref || !remote)
+		return;
+
+	strbuf_addf(&ref, "refs/remotes/%s/HEAD", remote->name);
+	skip_prefix(head->symref, "refs/heads/", &head_name);
+	strbuf_addf(&target, "refs/remotes/%s/%s", remote->name, head_name);
+
+	r = refs_resolve_ref_unsafe(get_main_ref_store(the_repository),
+		ref.buf, RESOLVE_REF_READING,
+		NULL, NULL);
+
+	if (r) {
+		if (config == FETCH_UPDATE_HEAD_MISSING)
+			/* already present */
+			return;
+		else if (config == FETCH_UPDATE_HEAD_ALWAYS && !strcmp(r, target.buf))
+			/* already up-to-date */
+			return;
+		else
+			/* should never happen */
+			return;
+	}
+
+	if (create_symref(ref.buf, target.buf, "remote update head"))
+		warning(_("could not set remote head"));
+}
+
 static int do_fetch(struct transport *transport,
 		    struct refspec *rs)
 {
@@ -1351,6 +1388,7 @@  static int do_fetch(struct transport *transport,
 	const struct ref *remote_refs;
 	struct strvec ref_prefixes = STRVEC_INIT;
 	int must_list_refs = 1;
+	int need_update_head = 0, update_head_config = 0;
 
 	if (tags == TAGS_DEFAULT) {
 		if (transport->remote->fetch_tags == 2)
@@ -1382,8 +1420,18 @@  static int do_fetch(struct transport *transport,
 				break;
 			}
 		}
-	} else if (transport->remote && transport->remote->fetch.nr)
+	} else if (transport->remote && transport->remote->fetch.nr) {
+		if (transport->remote->update_head)
+			update_head_config = transport->remote->update_head;
+		else
+			update_head_config = fetch_update_head;
+
+		need_update_head = update_head_config && update_head_config != FETCH_UPDATE_HEAD_NEVER;
+
+		if (need_update_head)
+			strvec_push(&ref_prefixes, "HEAD");
 		refspec_ref_prefixes(&transport->remote->fetch, &ref_prefixes);
+	}
 
 	if (tags == TAGS_SET || tags == TAGS_DEFAULT) {
 		must_list_refs = 1;
@@ -1427,6 +1475,9 @@  static int do_fetch(struct transport *transport,
 		goto cleanup;
 	}
 
+	if (need_update_head)
+		update_head(update_head_config, find_ref_by_name(remote_refs, "HEAD"), transport->remote);
+
 	if (set_upstream) {
 		struct branch *branch = branch_get("HEAD");
 		struct ref *rm;
diff --git a/remote.c b/remote.c
index 8a6dbbb903..dad8062561 100644
--- a/remote.c
+++ b/remote.c
@@ -298,6 +298,25 @@  static void read_branches_file(struct remote *remote)
 	remote->fetch_tags = 1; /* always auto-follow */
 }
 
+int parse_update_head(int *r, const char *var, const char *value)
+{
+	if (!r)
+		return -1;
+	else if (!value)
+		return config_error_nonbool(var);
+	else if (!strcmp(value, "never"))
+		*r = FETCH_UPDATE_HEAD_NEVER;
+	else if (!strcmp(value, "missing"))
+		*r = FETCH_UPDATE_HEAD_MISSING;
+	else if (!strcmp(value, "always"))
+		*r = FETCH_UPDATE_HEAD_ALWAYS;
+	else {
+		error(_("malformed value for %s: %s"), var, value);
+		return error(_("must be one of never, missing, or always"));
+	}
+	return 0;
+}
+
 static int handle_config(const char *key, const char *value, void *cb)
 {
 	const char *name;
@@ -418,6 +437,8 @@  static int handle_config(const char *key, const char *value, void *cb)
 					 key, value);
 	} else if (!strcmp(subkey, "vcs")) {
 		return git_config_string(&remote->foreign_vcs, key, value);
+	} else if (!strcmp(subkey, "updatehead")) {
+		return parse_update_head(&remote->update_head, key, value);
 	}
 	return 0;
 }
diff --git a/remote.h b/remote.h
index 3211abdf05..c16a2d7b2e 100644
--- a/remote.h
+++ b/remote.h
@@ -21,6 +21,13 @@  enum {
 	REMOTE_BRANCHES
 };
 
+enum {
+	FETCH_UPDATE_HEAD_DEFAULT = 0,
+	FETCH_UPDATE_HEAD_NEVER,
+	FETCH_UPDATE_HEAD_MISSING,
+	FETCH_UPDATE_HEAD_ALWAYS,
+};
+
 struct remote {
 	struct hashmap_entry ent;
 
@@ -62,6 +69,8 @@  struct remote {
 	int prune;
 	int prune_tags;
 
+	int update_head;
+
 	/**
 	 * The configured helper programs to run on the remote side, for
 	 * Git-native protocols.
@@ -372,4 +381,6 @@  int parseopt_push_cas_option(const struct option *, const char *arg, int unset);
 int is_empty_cas(const struct push_cas_option *);
 void apply_push_cas(struct push_cas_option *, struct remote *, struct ref *);
 
+int parse_update_head(int *r, const char *var, const char *value);
+
 #endif