diff mbox series

[v6,1/3] ls-refs: report unborn targets of symrefs

Message ID 411bbafe25f60c2ef9d9ab773e5eb046b8549b2d.1612231707.git.jonathantanmy@google.com (mailing list archive)
State Superseded
Headers show
Series Cloning with remote unborn HEAD | expand

Commit Message

Jonathan Tan Feb. 2, 2021, 2:14 a.m. UTC
When cloning, we choose the default branch based on the remote HEAD.
But if there is no remote HEAD reported (which could happen if the
target of the remote HEAD is unborn), we'll fall back to using our local
init.defaultBranch. Traditionally this hasn't been a big deal, because
most repos used "master" as the default. But these days it is likely to
cause confusion if the server and client implementations choose
different values (e.g., if the remote started with "main", we may choose
"master" locally, create commits there, and then the user is surprised
when they push to "master" and not "main").

To solve this, the remote needs to communicate the target of the HEAD
symref, even if it is unborn, and "git clone" needs to use this
information.

Currently, symrefs that have unborn targets (such as in this case) are
not communicated by the protocol. Teach Git to advertise and support the
"unborn" feature in "ls-refs" (by default, this is advertised, but
server administrators may turn this off through the lsrefs.allowunborn
config). This feature indicates that "ls-refs" supports the "unborn"
argument; when it is specified, "ls-refs" will send the HEAD symref with
the name of its unborn target.

This change is only for protocol v2. A similar change for protocol v0
would require independent protocol design (there being no analogous
position to signal support for "unborn") and client-side plumbing of the
data required, so the scope of this patch set is limited to protocol v2.

The client side will be updated to use this in a subsequent commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config.txt                |  2 +
 Documentation/config/lsrefs.txt         |  3 ++
 Documentation/technical/protocol-v2.txt | 10 ++++-
 ls-refs.c                               | 59 +++++++++++++++++++++++--
 ls-refs.h                               |  1 +
 serve.c                                 |  2 +-
 t/t5701-git-serve.sh                    |  2 +-
 7 files changed, 73 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/config/lsrefs.txt

Comments

Junio C Hamano Feb. 2, 2021, 4:55 p.m. UTC | #1
Jonathan Tan <jonathantanmy@google.com> writes:

> When cloning, we choose the default branch based on the remote HEAD.
> But if there is no remote HEAD reported (which could happen if the
> target of the remote HEAD is unborn), we'll fall back to using our local
> init.defaultBranch. Traditionally this hasn't been a big deal, because
> most repos used "master" as the default. But these days it is likely to
> cause confusion if the server and client implementations choose
> different values (e.g., if the remote started with "main", we may choose
> "master" locally, create commits there, and then the user is surprised
> when they push to "master" and not "main").
> ...
> The client side will be updated to use this in a subsequent commit.

Nicely explained.

> diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
> index 85daeb5d9e..4707511c10 100644
> --- a/Documentation/technical/protocol-v2.txt
> +++ b/Documentation/technical/protocol-v2.txt
> @@ -192,11 +192,19 @@ ls-refs takes in the following arguments:
>  	When specified, only references having a prefix matching one of
>  	the provided prefixes are displayed.
>  
> +If the 'unborn' feature is advertised the following argument can be
> +included in the client's request.
> +
> +    unborn
> +	The server may send symrefs pointing to unborn branches in the form
> +	"unborn <refname> symref-target:<target>".
> +

I somehow had an impression that this is done only for HEAD and no
other symrefs.

If this describes the ideal endgame state and the implementation at
the moment only covers what is practically the most useful (i.e.
HEAD), that is fine.

If we do not plan to support symrefs other than HEAD that are
dangling, that is fine as well, but then the description needs
updating, no?
Jonathan Tan Feb. 2, 2021, 6:34 p.m. UTC | #2
> > diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
> > index 85daeb5d9e..4707511c10 100644
> > --- a/Documentation/technical/protocol-v2.txt
> > +++ b/Documentation/technical/protocol-v2.txt
> > @@ -192,11 +192,19 @@ ls-refs takes in the following arguments:
> >  	When specified, only references having a prefix matching one of
> >  	the provided prefixes are displayed.
> >  
> > +If the 'unborn' feature is advertised the following argument can be
> > +included in the client's request.
> > +
> > +    unborn
> > +	The server may send symrefs pointing to unborn branches in the form
> > +	"unborn <refname> symref-target:<target>".
> > +
> 
> I somehow had an impression that this is done only for HEAD and no
> other symrefs.

Right now, that's true.

> If this describes the ideal endgame state and the implementation at
> the moment only covers what is practically the most useful (i.e.
> HEAD), that is fine.
> 
> If we do not plan to support symrefs other than HEAD that are
> dangling, that is fine as well, but then the description needs
> updating, no?

I'm not sure what the ideal endgame state is, but I could see how
sending all symlinks would be useful (e.g. if a client wanted to mirror
another repo with more fidelity). Right now I don't plan on adding
support for dangling symrefs other than HEAD, though. Maybe I'll update
it to something like:

  If HEAD is a symref pointing to an unborn branch, the server may send
  it in the form "unborn HEAD symref-target:<target>". In the future,
  this may be extended to other symrefs as well.

I think that there is a discussion point to be decided
(advertise/allow/ignore vs allow/ignore), so I'll wait for that before
sending v7.
Junio C Hamano Feb. 2, 2021, 10:17 p.m. UTC | #3
Jonathan Tan <jonathantanmy@google.com> writes:

> I'm not sure what the ideal endgame state is, but I could see how
> sending all symlinks would be useful (e.g. if a client wanted to mirror
> another repo with more fidelity). Right now I don't plan on adding
> support for dangling symrefs other than HEAD, though. Maybe I'll update
> it to something like:
>
>   If HEAD is a symref pointing to an unborn branch, the server may send
>   it in the form "unborn HEAD symref-target:<target>". In the future,
>   this may be extended to other symrefs as well.

Unless you plan to add support for symbolic refs that are not HEAD
in immediate future, "In the future, ..." is not even necessary to
say.  The users cannot expect to exploit the missing feature anyway,
and they cannot even plan to use it in near future.

I've been disturbed by the phrase "the server may send it" quite a
lot, actually.  In the original before the rewrite above, it was a
good cop-out excuse "no, we do not send symbolic refs other than
HEAD because we only say 'the server may' and do not promise
anything beyond that".  But now we are tightening the description
to HEAD that we do intend to support well, it probably is a good
idea to give users a promise a bit firmer than that.

    unborn If HEAD is a symref pointing to an unborn branch <b>, the
    server reports it as "unborn HEAD symref-target:refs/heads/<b>"
    in its response.

It would make it clear that by sending 'unborn' in the request, the
client is not just allowing the server to include the unborn
information in the response.  It is asking the server, that has
advertised that it is capable to do so, to exercise the feature.

> I think that there is a discussion point to be decided
> (advertise/allow/ignore vs allow/ignore), so I'll wait for that before
> sending v7.

What is the downside of having three choices (which allows phased
deployment, where everybody starts as capable of responding without
advertising in the first phase, and once everybody becomes capable
of responding, they start advertising) and the reason we might
prefer just allow/ignore instead?  Too much complexity?  It does not
help the real deployment as much in practice as it seems on paper?

I am not advocating three-choice option; I am neutral, but do not
see a good reason to reject it (while I can easily see a reason to
reject the other one).

Thanks.
Jonathan Tan Feb. 3, 2021, 1:04 a.m. UTC | #4
> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > I'm not sure what the ideal endgame state is, but I could see how
> > sending all symlinks would be useful (e.g. if a client wanted to mirror
> > another repo with more fidelity). Right now I don't plan on adding
> > support for dangling symrefs other than HEAD, though. Maybe I'll update
> > it to something like:
> >
> >   If HEAD is a symref pointing to an unborn branch, the server may send
> >   it in the form "unborn HEAD symref-target:<target>". In the future,
> >   this may be extended to other symrefs as well.
> 
> Unless you plan to add support for symbolic refs that are not HEAD
> in immediate future, "In the future, ..." is not even necessary to
> say.  The users cannot expect to exploit the missing feature anyway,
> and they cannot even plan to use it in near future.
> 
> I've been disturbed by the phrase "the server may send it" quite a
> lot, actually.  In the original before the rewrite above, it was a
> good cop-out excuse "no, we do not send symbolic refs other than
> HEAD because we only say 'the server may' and do not promise
> anything beyond that".  But now we are tightening the description
> to HEAD that we do intend to support well, it probably is a good
> idea to give users a promise a bit firmer than that.
> 
>     unborn If HEAD is a symref pointing to an unborn branch <b>, the
>     server reports it as "unborn HEAD symref-target:refs/heads/<b>"
>     in its response.
> 
> It would make it clear that by sending 'unborn' in the request, the
> client is not just allowing the server to include the unborn
> information in the response.  It is asking the server, that has
> advertised that it is capable to do so, to exercise the feature.

That makes sense. OK, I'll make the promise firmer.

> > I think that there is a discussion point to be decided
> > (advertise/allow/ignore vs allow/ignore), so I'll wait for that before
> > sending v7.
> 
> What is the downside of having three choices (which allows phased
> deployment, where everybody starts as capable of responding without
> advertising in the first phase, and once everybody becomes capable
> of responding, they start advertising) and the reason we might
> prefer just allow/ignore instead?  Too much complexity?  It does not
> help the real deployment as much in practice as it seems on paper?
> 
> I am not advocating three-choice option; I am neutral, but do not
> see a good reason to reject it (while I can easily see a reason to
> reject the other one).

Here's a reason from Peff's email [1] against advertise/allow/ignore (the "code
change" is a temporary hack that teaches Git to accept but not advertise
report-status-v2). Granted, he does say that this may be an oversimplification,
and in the overall email, he was arguing more for having this feature on by
default (whether we have advertise/allow/ignore, allow/ignore, or no config at
all) rather than for any specific configuration scheme.

  - one nice thing about the code change is that after the rollout is
    done, it's safe to make the code unconditional again, which makes
    it simpler to read/reason about.

    This may be oversimplifying it a bit, of course. On one platform, we
    know when the rollout is happening. But if it's something we ship
    upstream, then "rollout" may be on the jump from v2.28 to v2.29, or
    to v2.30, or v2.31, etc. You can never say "rollouts are done, and
    existing server versions know about this feature". So any upstream
    support like config has to stay forever.

To balance that out, from the same email [1], a slight argument against no
config at all:

  (I know there was also an indication that some people might want it off
  because they somehow want to have no HEAD at all. I don't find this
  particularly compelling, but even if it were, I think we could leave it
  the config as an escape hatch for such folks, but still default it to
  "on").

[1] https://lore.kernel.org/git/X9xJLWdFJfNJTn0p@coredump.intra.peff.net/

And an argument against the allow/ignore [2]:

  If we are not going to support config that helps you do an atomic
  deploy, then I don't really see the point of having config at all.
  Here are three plausible implementations I can conceive of:
  
    - allowUnborn is a tri-state for "accept-but-do-not-advertise",
      "accept-and-advertise", and "disallow". This helps with rollout in a
      cluster by setting it to the accept-but-do-not-advertise.  The
      default would be accept-and-advertise, which is what most servers
      would want. I don't really know why anyone would want "disallow".
  
    - allowUnborn is a bool for "accept-and-advertise" or "disallow". This
      doesn't help cluster rollout. I don't know why anyone would want to
      switch away from the default of accept-and-advertise.
  
    - allowUnborn is always on.
  
  The first one helps the cluster case, at the cost of introducing an
  extra config knob. The third one doesn't help that case, but is one less
  knob for server admins to think about. But the second one has a knob
  that I don't understand why anybody would tweak. It seems like the worst
  of both.

[2] https://lore.kernel.org/git/YBCitNb75rpnuW2L@coredump.intra.peff.net/
diff mbox series

Patch

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 6ba50b1104..d08e83a148 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -398,6 +398,8 @@  include::config/interactive.txt[]
 
 include::config/log.txt[]
 
+include::config/lsrefs.txt[]
+
 include::config/mailinfo.txt[]
 
 include::config/mailmap.txt[]
diff --git a/Documentation/config/lsrefs.txt b/Documentation/config/lsrefs.txt
new file mode 100644
index 0000000000..dcbec11aaa
--- /dev/null
+++ b/Documentation/config/lsrefs.txt
@@ -0,0 +1,3 @@ 
+lsrefs.allowUnborn::
+	Allow the server to send information about unborn symrefs during the
+	protocol v2 ref advertisement.
diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index 85daeb5d9e..4707511c10 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -192,11 +192,19 @@  ls-refs takes in the following arguments:
 	When specified, only references having a prefix matching one of
 	the provided prefixes are displayed.
 
+If the 'unborn' feature is advertised the following argument can be
+included in the client's request.
+
+    unborn
+	The server may send symrefs pointing to unborn branches in the form
+	"unborn <refname> symref-target:<target>".
+
 The output of ls-refs is as follows:
 
     output = *ref
 	     flush-pkt
-    ref = PKT-LINE(obj-id SP refname *(SP ref-attribute) LF)
+    obj-id-or-unborn = (obj-id | "unborn")
+    ref = PKT-LINE(obj-id-or-unborn SP refname *(SP ref-attribute) LF)
     ref-attribute = (symref | peeled)
     symref = "symref-target:" symref-target
     peeled = "peeled:" obj-id
diff --git a/ls-refs.c b/ls-refs.c
index a1e0b473e4..daf4e77a4a 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -7,6 +7,24 @@ 
 #include "pkt-line.h"
 #include "config.h"
 
+static int config_read;
+static int allow_unborn;
+
+static void ensure_config_read(void)
+{
+	if (config_read)
+		return;
+
+	if (repo_config_get_bool(the_repository, "lsrefs.allowunborn",
+				 &allow_unborn))
+		/*
+		 * If there is no such config, set it to 1 to allow it by
+		 * default.
+		 */
+		allow_unborn = 1;
+	config_read = 1;
+}
+
 /*
  * Check if one of the prefixes is a prefix of the ref.
  * If no prefixes were provided, all refs match.
@@ -32,6 +50,7 @@  struct ls_refs_data {
 	unsigned peel;
 	unsigned symrefs;
 	struct strvec prefixes;
+	unsigned unborn : 1;
 };
 
 static int send_ref(const char *refname, const struct object_id *oid,
@@ -47,7 +66,10 @@  static int send_ref(const char *refname, const struct object_id *oid,
 	if (!ref_match(&data->prefixes, refname_nons))
 		return 0;
 
-	strbuf_addf(&refline, "%s %s", oid_to_hex(oid), refname_nons);
+	if (oid)
+		strbuf_addf(&refline, "%s %s", oid_to_hex(oid), refname_nons);
+	else
+		strbuf_addf(&refline, "unborn %s", refname_nons);
 	if (data->symrefs && flag & REF_ISSYMREF) {
 		struct object_id unused;
 		const char *symref_target = resolve_ref_unsafe(refname, 0,
@@ -61,7 +83,7 @@  static int send_ref(const char *refname, const struct object_id *oid,
 			    strip_namespace(symref_target));
 	}
 
-	if (data->peel) {
+	if (data->peel && oid) {
 		struct object_id peeled;
 		if (!peel_ref(refname, &peeled))
 			strbuf_addf(&refline, " peeled:%s", oid_to_hex(&peeled));
@@ -74,6 +96,23 @@  static int send_ref(const char *refname, const struct object_id *oid,
 	return 0;
 }
 
+static void send_possibly_unborn_head(struct ls_refs_data *data)
+{
+	struct strbuf namespaced = STRBUF_INIT;
+	struct object_id oid;
+	int flag;
+	int oid_is_null;
+
+	strbuf_addf(&namespaced, "%sHEAD", get_git_namespace());
+	if (!resolve_ref_unsafe(namespaced.buf, 0, &oid, &flag))
+		return; /* bad ref */
+	oid_is_null = is_null_oid(&oid);
+	if (!oid_is_null ||
+	    (data->unborn && data->symrefs && (flag & REF_ISSYMREF)))
+		send_ref(namespaced.buf, oid_is_null ? NULL : &oid, flag, data);
+	strbuf_release(&namespaced);
+}
+
 static int ls_refs_config(const char *var, const char *value, void *data)
 {
 	/*
@@ -91,6 +130,7 @@  int ls_refs(struct repository *r, struct strvec *keys,
 
 	memset(&data, 0, sizeof(data));
 
+	ensure_config_read();
 	git_config(ls_refs_config, NULL);
 
 	while (packet_reader_read(request) == PACKET_READ_NORMAL) {
@@ -103,14 +143,27 @@  int ls_refs(struct repository *r, struct strvec *keys,
 			data.symrefs = 1;
 		else if (skip_prefix(arg, "ref-prefix ", &out))
 			strvec_push(&data.prefixes, out);
+		else if (!strcmp("unborn", arg))
+			data.unborn = allow_unborn;
 	}
 
 	if (request->status != PACKET_READ_FLUSH)
 		die(_("expected flush after ls-refs arguments"));
 
-	head_ref_namespaced(send_ref, &data);
+	send_possibly_unborn_head(&data);
 	for_each_namespaced_ref(send_ref, &data);
 	packet_flush(1);
 	strvec_clear(&data.prefixes);
 	return 0;
 }
+
+int ls_refs_advertise(struct repository *r, struct strbuf *value)
+{
+	if (value) {
+		ensure_config_read();
+		if (allow_unborn)
+			strbuf_addstr(value, "unborn");
+	}
+
+	return 1;
+}
diff --git a/ls-refs.h b/ls-refs.h
index 7b33a7c6b8..a99e4be0bd 100644
--- a/ls-refs.h
+++ b/ls-refs.h
@@ -6,5 +6,6 @@  struct strvec;
 struct packet_reader;
 int ls_refs(struct repository *r, struct strvec *keys,
 	    struct packet_reader *request);
+int ls_refs_advertise(struct repository *r, struct strbuf *value);
 
 #endif /* LS_REFS_H */
diff --git a/serve.c b/serve.c
index eec2fe6f29..ac20c72763 100644
--- a/serve.c
+++ b/serve.c
@@ -73,7 +73,7 @@  struct protocol_capability {
 
 static struct protocol_capability capabilities[] = {
 	{ "agent", agent_advertise, NULL },
-	{ "ls-refs", always_advertise, ls_refs },
+	{ "ls-refs", ls_refs_advertise, ls_refs },
 	{ "fetch", upload_pack_advertise, upload_pack_v2 },
 	{ "server-option", always_advertise, NULL },
 	{ "object-format", object_format_advertise, NULL },
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index a1f5fdc9fd..df29504161 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -12,7 +12,7 @@  test_expect_success 'test capability advertisement' '
 	cat >expect <<-EOF &&
 	version 2
 	agent=git/$(git version | cut -d" " -f3)
-	ls-refs
+	ls-refs=unborn
 	fetch=shallow
 	server-option
 	object-format=$(test_oid algo)