[v4,1/3] ls-refs: report unborn targets of symrefs

Message ID a66e50626ef9343e0351fc6cfe8abcf4f9eed7f3.1608673963.git.jonathantanmy@google.com (mailing list archive)
State New, archived
Series Cloning with remote unborn HEAD

Commit Message

Jonathan Tan Dec. 22, 2020, 9:54 p.m. UTC
When cloning, we choose the default branch based on the remote HEAD.
But if there is no remote HEAD reported (which could happen if the
target of the remote HEAD is unborn), we'll fall back to using our local
init.defaultBranch. Traditionally this hasn't been a big deal, because
most repos used "master" as the default. But these days it is likely to
cause confusion if the server and client implementations choose
different values (e.g., if the remote started with "main", we may choose
"master" locally, create commits there, and then the user is surprised
when they push to "master" and not "main").

To solve this, the remote needs to communicate the target of the HEAD
symref, even if it is unborn, and "git clone" needs to use this
information.

Currently, symrefs that have unborn targets (such as in this case) are
not communicated by the protocol. Teach Git to advertise and support the
"unborn" feature in "ls-refs" (guarded by the lsrefs.allowunborn
config). This feature indicates that "ls-refs" supports the "unborn"
argument; when it is specified, "ls-refs" will send the HEAD symref with
the name of its unborn target.
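
For illustration, the line sent for an unborn HEAD could be built as in
the sketch below. It assumes the "unborn <refname> symref-target:<target>"
form from the protocol documentation in this patch. format_ref_line() is
a hypothetical helper, not a function in ls-refs.c, and it leaves out
pkt-line framing.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the patch): format one ls-refs
 * output line into buf. hex_oid is NULL for an unborn ref, and
 * symref_target is NULL when no symref attribute is sent.
 * Returns the snprintf()-style length of the formatted line.
 */
int format_ref_line(char *buf, size_t len, const char *hex_oid,
		    const char *refname, const char *symref_target)
{
	/* An unborn ref has no object ID, so the literal "unborn" is sent. */
	const char *id = hex_oid ? hex_oid : "unborn";

	if (symref_target)
		return snprintf(buf, len, "%s %s symref-target:%s\n",
				id, refname, symref_target);
	return snprintf(buf, len, "%s %s\n", id, refname);
}
```

Called with a NULL object ID, "HEAD" and "refs/heads/main", this writes
"unborn HEAD symref-target:refs/heads/main" followed by a newline.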

This change is only for protocol v2. A similar change for protocol v0
would require independent protocol design (there being no analogous
position to signal support for "unborn") and client-side plumbing of the
data required, so the scope of this patch set is limited to protocol v2.

The client side will be updated to use this in a subsequent commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/config.txt                |  2 +
 Documentation/config/lsrefs.txt         |  3 ++
 Documentation/technical/protocol-v2.txt | 10 ++++-
 ls-refs.c                               | 52 +++++++++++++++++++++++--
 ls-refs.h                               |  1 +
 serve.c                                 |  2 +-
 6 files changed, 64 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/config/lsrefs.txt

Comments

Jeff King Jan. 21, 2021, 8:48 p.m. UTC | #1
On Tue, Dec 22, 2020 at 01:54:18PM -0800, Jonathan Tan wrote:

> -static int ls_refs_config(const char *var, const char *value, void *data)
> +static void send_possibly_unborn_head(struct ls_refs_data *data)
>  {
> +	struct strbuf namespaced = STRBUF_INIT;
> +	struct object_id oid;
> +	int flag;
> +	int oid_is_null;
> +
> +	memset(&oid, 0, sizeof(oid));
> +	strbuf_addf(&namespaced, "%sHEAD", get_git_namespace());
> +	resolve_ref_unsafe(namespaced.buf, 0, &oid, &flag);

It feels weird to call resolve_ref_unsafe() without checking the return
value. How do we detect errors?

I think the logic is that we make assumptions about which fields it will
touch (i.e., zeroing the flags, and not touching our zero'd oid), and
then check those. That feels a bit non-obvious and intimate with the
implementation, though (and was presumably the source of the "oops, we
need to clear the oid" bug between v3 and v4).

I feel like that deserves a comment, but I also wonder if:

  refname = resolve_ref_unsafe(namespaced.buf, 0, &oid, &flag);
  if (!refname)
	return; /* broken, bad name, not even a symref, etc */

  /*
   * now we can look at oid even if we didn't memset() it, because
   * a successful return from resolve_ref_unsafe() means that it
   * has cleared it if appropriate
   */
  oid_is_null = is_null_oid(&oid);
  ...etc...

> +	if (!oid_is_null ||
> +	    (data->unborn && data->symrefs && (flag & REF_ISSYMREF)))
> +		send_ref(namespaced.buf, oid_is_null ? NULL : &oid, flag, data);

It likewise feels a bit funny that we determine the symref name in the
earlier call to resolve_ref_unsafe(), but we do not pass it here (and in
fact, we'll end up looking it up again!).

But that is not much different than what we do for normal refs passed to
the send_ref() callback. It would be nice if the iteration could pass in
"by the way, here is the symref value" to avoid that. But in practice it
isn't a big deal, since we only do the lookup when we see the ISSYMREF
flag set. So typically it is only one or two extra ref resolutions.

> @@ -91,7 +118,7 @@ int ls_refs(struct repository *r, struct strvec *keys,
>  
>  	memset(&data, 0, sizeof(data));
>  
> -	git_config(ls_refs_config, NULL);
> +	git_config(ls_refs_config, &data);

You will probably not be surprised that I would suggest defaulting
data->allow_unborn to 1 before this config call. :)

> @@ -103,14 +130,31 @@ int ls_refs(struct repository *r, struct strvec *keys,
>  			data.symrefs = 1;
>  		else if (skip_prefix(arg, "ref-prefix ", &out))
>  			strvec_push(&data.prefixes, out);
> +		else if (data.allow_unborn && !strcmp("unborn", arg))
> +			data.unborn = 1;
>  	}

So if we have not set allow_unborn, we will not accept the client saying
"unborn". Which makes sense, because we would not have advertised it in
that case.

But we use the same boolean for advertising, too. So this loses the
"allow us to accept it, but not advertise it" logic that your earlier
versions had, doesn't it? And that is the important element for making
things work across a non-atomic deploy of versions.

This straight-boolean version works as long as you can atomically update
the _config_ on each version. But that seems like roughly the same
problem. (Having dealt with this on GitHub servers, I know they are not
equivalent, and depending on your infrastructure, it definitely _can_ be
easier to do one versus the other. But it seems like a funny place to
leave this upstream feature.)

Or is the intent that an unconfigured reader would silently ignore the
unborn flag in that case? That would at least not cause it to bail on
the client in a mixed-version environment. But it does feel like a
confusing result.

-Peff
Jonathan Tan Jan. 26, 2021, 6:13 p.m. UTC | #2
> On Tue, Dec 22, 2020 at 01:54:18PM -0800, Jonathan Tan wrote:
> 
> > -static int ls_refs_config(const char *var, const char *value, void *data)
> > +static void send_possibly_unborn_head(struct ls_refs_data *data)
> >  {
> > +	struct strbuf namespaced = STRBUF_INIT;
> > +	struct object_id oid;
> > +	int flag;
> > +	int oid_is_null;
> > +
> > +	memset(&oid, 0, sizeof(oid));
> > +	strbuf_addf(&namespaced, "%sHEAD", get_git_namespace());
> > +	resolve_ref_unsafe(namespaced.buf, 0, &oid, &flag);
> 
> It feels weird to call resolve_ref_unsafe() without checking the return
> value. How do we detect errors?
> 
> I think the logic is that we make assumptions about which fields it will
> touch (i.e., zeroing the flags, and not touching our zero'd oid), and
> then check those. That feels a bit non-obvious and intimate with the
> implementation, though (and was presumably the source of the "oops, we
> need to clear the oid" bug between v3 and v4).
> 
> I feel like that deserves a comment, but I also wonder if:
> 
>   refname = resolve_ref_unsafe(namespaced.buf, 0, &oid, &flag);
>   if (!refname)
> 	return; /* broken, bad name, not even a symref, etc */

From my reading of this part of refs_resolve_ref_unsafe():

                if (!(read_flags & REF_ISSYMREF)) {
                        if (*flags & REF_BAD_NAME) {
                                oidclr(oid);
                                *flags |= REF_ISBROKEN;
                        }
                        return refname;
                }

it seems that resolve_ref_unsafe() returns non-NULL if the ref is not a
symref but is otherwise valid. But this is exactly what we want -
send_possibly_unborn_head() must send HEAD in this situation anyway.
Thanks - I've switched to checking the return value.

(It was a bit confusing that refs_resolve_ref_unsafe() returns one of
its input arguments if it succeeds and NULL if it fails, but that's
outside the scope of this patch, I think.)
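
A minimal sketch of the resulting decision, under the assumption that the
resolver reports success for a symref whose target is unborn (leaving the
oid null and setting the symref flag). decide_head(), enum head_action
and SKETCH_REF_ISSYMREF are hypothetical names used only for this
illustration; the real code calls resolve_ref_unsafe() and send_ref().

```c
/* Stand-in for Git's REF_ISSYMREF; the value here is arbitrary. */
#define SKETCH_REF_ISSYMREF 0x01

enum head_action {
	HEAD_SKIP,        /* send nothing for HEAD */
	HEAD_SEND_OID,    /* send "<oid> HEAD ..." */
	HEAD_SEND_UNBORN  /* send "unborn HEAD symref-target:..." */
};

/*
 * resolved:    non-zero if the resolver returned non-NULL
 * oid_is_null: non-zero if the resolved object ID is all zeros
 * flag:        flags reported by the resolver
 * unborn:      client sent the "unborn" argument
 * symrefs:     client sent the "symrefs" argument
 */
enum head_action decide_head(int resolved, int oid_is_null, int flag,
			     int unborn, int symrefs)
{
	if (!resolved)
		return HEAD_SKIP; /* broken, bad name, etc. */
	if (!oid_is_null)
		return HEAD_SEND_OID;
	if (unborn && symrefs && (flag & SKETCH_REF_ISSYMREF))
		return HEAD_SEND_UNBORN;
	return HEAD_SKIP;
}
```

Checking the return value first means the later tests no longer depend on
which output fields the resolver touches on failure.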

> > +	if (!oid_is_null ||
> > +	    (data->unborn && data->symrefs && (flag & REF_ISSYMREF)))
> > +		send_ref(namespaced.buf, oid_is_null ? NULL : &oid, flag, data);
> 
> It likewise feels a bit funny that we determine the symref name in the
> earlier call to resolve_ref_unsafe(), but we do not pass it here (and in
> fact, we'll end up looking it up again!).
> 
> But that is not much different than what we do for normal refs passed to
> the send_ref() callback. It would be nice if the iteration could pass in
> "by the way, here is the symref value" to avoid that.

Yes, that would be nice.

> But in practice it
> isn't a big deal, since we only do the lookup when we see the ISSYMREF
> flag set. So typically it is only one or two extra ref resolutions.

OK.

> > @@ -91,7 +118,7 @@ int ls_refs(struct repository *r, struct strvec *keys,
> >  
> >  	memset(&data, 0, sizeof(data));
> >  
> > -	git_config(ls_refs_config, NULL);
> > +	git_config(ls_refs_config, &data);
> 
> You will probably not be surprised that I would suggest defaulting
> data->allow_unborn to 1 before this config call. :)

I don't think many people have made comments either way, so I'll go
ahead with defaulting it to true. I can see arguments for both sides.

> > @@ -103,14 +130,31 @@ int ls_refs(struct repository *r, struct strvec *keys,
> >  			data.symrefs = 1;
> >  		else if (skip_prefix(arg, "ref-prefix ", &out))
> >  			strvec_push(&data.prefixes, out);
> > +		else if (data.allow_unborn && !strcmp("unborn", arg))
> > +			data.unborn = 1;
> >  	}
> 
> So if we have not set allow_unborn, we will not accept the client saying
> "unborn". Which makes sense, because we would not have advertised it in
> that case.
> 
> But we use the same boolean for advertising, too. So this loses the
> "allow us to accept it, but not advertise it" logic that your earlier
> versions had, doesn't it?

Yes, it does.

> And that is the important element for making
> things work across a non-atomic deploy of versions.
> 
> This straight-boolean version works as long as you can atomically update
> the _config_ on each version. But that seems like roughly the same
> problem. (Having dealt with this on GitHub servers, I know they are not
> equivalent, and depending on your infrastructure, it definitely _can_ be
> easier to do one versus the other. But it seems like a funny place to
> leave this upstream feature.)

Well, I was just agreeing with what you said [1]. :-)

[1] https://lore.kernel.org/git/X9xJLWdFJfNJTn0p@coredump.intra.peff.net/

> Or is the intent that an unconfigured reader would silently ignore the
> unborn flag in that case? That would at least not cause it to bail on
> the client in a mixed-version environment. But it does feel like a
> confusing result.

Right now, an old server would ignore "unborn", yes. I'm not sure of
what the intent should be - tightening ls-refs and fetch to forbid
unknown arguments seems like a good idea to me.
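
One possible shape for such tightening, as a sketch only: the parser
returns an error for anything it does not recognize, so the caller can
reject the request instead of silently ignoring the argument.
sketch_parse_arg() and struct sketch_ls_refs_args are hypothetical names,
and the prefix is merely counted here rather than stored.

```c
#include <string.h>

struct sketch_ls_refs_args {
	unsigned peel : 1;
	unsigned symrefs : 1;
	unsigned unborn : 1;
	int nr_prefixes; /* the real code stores the prefixes themselves */
};

/*
 * Parse one ls-refs argument. Returns 0 if it was recognized and
 * recorded, or -1 if it is unknown (or is "unborn" while that feature
 * is not allowed), leaving the policy decision to the caller.
 */
int sketch_parse_arg(struct sketch_ls_refs_args *args, const char *arg,
		     int allow_unborn)
{
	static const char prefix[] = "ref-prefix ";

	if (!strcmp(arg, "peel"))
		args->peel = 1;
	else if (!strcmp(arg, "symrefs"))
		args->symrefs = 1;
	else if (!strncmp(arg, prefix, strlen(prefix)))
		args->nr_prefixes++;
	else if (allow_unborn && !strcmp(arg, "unborn"))
		args->unborn = 1;
	else
		return -1;
	return 0;
}
```

With this shape, a server that has "unborn" disabled reports it as an
unknown argument, which is exactly the case that matters during a
mixed-version rollout.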
Jeff King Jan. 26, 2021, 11:16 p.m. UTC | #3
On Tue, Jan 26, 2021 at 10:13:58AM -0800, Jonathan Tan wrote:

> (It was a bit confusing that refs_resolve_ref_unsafe() returns one of
> its input arguments if it succeeds and NULL if it fails, but that's
> outside the scope of this patch, I think.)

Yep. It would probably be much nicer for it to return a numeric success
code, and to take an optional strbuf into which to write the resolved
symref name (if the caller even cares about it). But definitely out of
scope for your patch.

> > This straight-boolean version works as long as you can atomically update
> > the _config_ on each version. But that seems like roughly the same
> > problem. (Having dealt with this on GitHub servers, I know they are not
> > equivalent, and depending on your infrastructure, it definitely _can_ be
> > easier to do one versus the other. But it seems like a funny place to
> > leave this upstream feature.)
> 
> Well, I was just agreeing with what you said [1]. :-)
> 
> [1] https://lore.kernel.org/git/X9xJLWdFJfNJTn0p@coredump.intra.peff.net/

Oh, I just need you to agree harder then. ;)

If we are not going to support config that helps you do an atomic
deploy, then I don't really see the point of having config at all.
Here are three plausible implementations I can conceive of:

  - allowUnborn is a tri-state for "accept-but-do-not-advertise",
    "accept-and-advertise", and "disallow". This helps with rollout in a
    cluster by setting it to accept-but-do-not-advertise.  The
    default would be accept-and-advertise, which is what most servers
    would want. I don't really know why anyone would want "disallow".

  - allowUnborn is a bool for "accept-and-advertise" or "disallow". This
    doesn't help cluster rollout. I don't know why anyone would want to
    switch away from the default of accept-and-advertise.

  - allowUnborn is always on.

The first one helps the cluster case, at the cost of introducing an
extra config knob. The third one doesn't help that case, but is one less
knob for server admins to think about. But the second one has a knob
that I don't understand why anybody would tweak. It seems like the worst
of both.

Perhaps there's a reason for setting "disallow" that I don't know. Or
perhaps you're happy to help the cluster case using a simple bool with
atomic config rollouts (which are outside the scope of Git itself).
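
To make the first option concrete, here is a rough sketch of a tri-state
setting. The value spellings ("advertise", "accept", "disallow") and all
of the names below are assumptions made for this illustration; nothing in
this series defines them.

```c
#include <string.h>

enum unborn_mode {
	UNBORN_DISALLOW,    /* neither advertise nor accept */
	UNBORN_ACCEPT_ONLY, /* accept-but-do-not-advertise */
	UNBORN_ADVERTISE    /* accept-and-advertise (the proposed default) */
};

/*
 * Parse a hypothetical tri-state config value. Returns 0 and sets
 * *mode on success; returns -1 and leaves *mode untouched otherwise.
 */
int parse_unborn_mode(const char *value, enum unborn_mode *mode)
{
	if (!value)
		return -1;
	if (!strcmp(value, "advertise"))
		*mode = UNBORN_ADVERTISE;
	else if (!strcmp(value, "accept"))
		*mode = UNBORN_ACCEPT_ONLY;
	else if (!strcmp(value, "disallow"))
		*mode = UNBORN_DISALLOW;
	else
		return -1;
	return 0;
}

/* Only the fully enabled mode puts "unborn" in the capability line. */
int unborn_should_advertise(enum unborn_mode mode)
{
	return mode == UNBORN_ADVERTISE;
}

/* Both enabled modes honor "unborn" when a client sends it. */
int unborn_should_accept(enum unborn_mode mode)
{
	return mode != UNBORN_DISALLOW;
}
```

In a cluster rollout, every server would first be set to the accept-only
mode, and switched to advertising only once all of them understand the
argument.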

> > Or is the intent that an unconfigured reader would silently ignore the
> > unborn flag in that case? That would at least not cause it to bail on
> > the client in a mixed-version environment. But it does feel like a
> > confusing result.
> 
> Right now, an old server would ignore "unborn", yes. I'm not sure of
> what the intent should be - tightening ls-refs and fetch to forbid
> unknown arguments seems like a good idea to me.

If we had just a bool (case 2 from above), and there was an
always-implied "accept unborn even if not advertised", then that _does_
let the config help out the cluster case (it just turns off
advertisements, basically making the bool "accept-but-do-not-advertise"
versus "disallow").

I don't love it. The protocol spec does say "don't ask for capability
foo if the server didn't say it knows about foo". We'd be loosening the
enforcement of that (if only for capabilities we _do_ in fact know
about), even though we don't know if it was due to a race, or if the
client is just misbehaving. But I wondered if that was the direction you
were going to try to solve your cluster-rollout problem.

-Peff

Patch

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 6ba50b1104..d08e83a148 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -398,6 +398,8 @@  include::config/interactive.txt[]
 
 include::config/log.txt[]
 
+include::config/lsrefs.txt[]
+
 include::config/mailinfo.txt[]
 
 include::config/mailmap.txt[]
diff --git a/Documentation/config/lsrefs.txt b/Documentation/config/lsrefs.txt
new file mode 100644
index 0000000000..dcbec11aaa
--- /dev/null
+++ b/Documentation/config/lsrefs.txt
@@ -0,0 +1,3 @@ 
+lsrefs.allowUnborn::
+	Allow the server to send information about unborn symrefs during the
+	protocol v2 ref advertisement.
diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index 85daeb5d9e..4707511c10 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -192,11 +192,19 @@  ls-refs takes in the following arguments:
 	When specified, only references having a prefix matching one of
 	the provided prefixes are displayed.
 
+If the 'unborn' feature is advertised the following argument can be
+included in the client's request.
+
+    unborn
+	The server may send symrefs pointing to unborn branches in the form
+	"unborn <refname> symref-target:<target>".
+
 The output of ls-refs is as follows:
 
     output = *ref
 	     flush-pkt
-    ref = PKT-LINE(obj-id SP refname *(SP ref-attribute) LF)
+    obj-id-or-unborn = (obj-id | "unborn")
+    ref = PKT-LINE(obj-id-or-unborn SP refname *(SP ref-attribute) LF)
     ref-attribute = (symref | peeled)
     symref = "symref-target:" symref-target
     peeled = "peeled:" obj-id
diff --git a/ls-refs.c b/ls-refs.c
index a1e0b473e4..ff61e704f1 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -32,6 +32,8 @@  struct ls_refs_data {
 	unsigned peel;
 	unsigned symrefs;
 	struct strvec prefixes;
+	unsigned allow_unborn : 1;
+	unsigned unborn : 1;
 };
 
 static int send_ref(const char *refname, const struct object_id *oid,
@@ -47,7 +49,10 @@  static int send_ref(const char *refname, const struct object_id *oid,
 	if (!ref_match(&data->prefixes, refname_nons))
 		return 0;
 
-	strbuf_addf(&refline, "%s %s", oid_to_hex(oid), refname_nons);
+	if (oid)
+		strbuf_addf(&refline, "%s %s", oid_to_hex(oid), refname_nons);
+	else
+		strbuf_addf(&refline, "unborn %s", refname_nons);
 	if (data->symrefs && flag & REF_ISSYMREF) {
 		struct object_id unused;
 		const char *symref_target = resolve_ref_unsafe(refname, 0,
@@ -74,8 +79,30 @@  static int send_ref(const char *refname, const struct object_id *oid,
 	return 0;
 }
 
-static int ls_refs_config(const char *var, const char *value, void *data)
+static void send_possibly_unborn_head(struct ls_refs_data *data)
 {
+	struct strbuf namespaced = STRBUF_INIT;
+	struct object_id oid;
+	int flag;
+	int oid_is_null;
+
+	memset(&oid, 0, sizeof(oid));
+	strbuf_addf(&namespaced, "%sHEAD", get_git_namespace());
+	resolve_ref_unsafe(namespaced.buf, 0, &oid, &flag);
+	oid_is_null = is_null_oid(&oid);
+	if (!oid_is_null ||
+	    (data->unborn && data->symrefs && (flag & REF_ISSYMREF)))
+		send_ref(namespaced.buf, oid_is_null ? NULL : &oid, flag, data);
+	strbuf_release(&namespaced);
+}
+
+static int ls_refs_config(const char *var, const char *value, void *cb_data)
+{
+	struct ls_refs_data *data = cb_data;
+
+	if (!strcmp("lsrefs.allowunborn", var))
+		data->allow_unborn = git_config_bool(var, value);
+
 	/*
 	 * We only serve fetches over v2 for now, so respect only "uploadpack"
 	 * config. This may need to eventually be expanded to "receive", but we
@@ -91,7 +118,7 @@  int ls_refs(struct repository *r, struct strvec *keys,
 
 	memset(&data, 0, sizeof(data));
 
-	git_config(ls_refs_config, NULL);
+	git_config(ls_refs_config, &data);
 
 	while (packet_reader_read(request) == PACKET_READ_NORMAL) {
 		const char *arg = request->line;
@@ -103,14 +130,31 @@  int ls_refs(struct repository *r, struct strvec *keys,
 			data.symrefs = 1;
 		else if (skip_prefix(arg, "ref-prefix ", &out))
 			strvec_push(&data.prefixes, out);
+		else if (data.allow_unborn && !strcmp("unborn", arg))
+			data.unborn = 1;
 	}
 
 	if (request->status != PACKET_READ_FLUSH)
 		die(_("expected flush after ls-refs arguments"));
 
-	head_ref_namespaced(send_ref, &data);
+	send_possibly_unborn_head(&data);
 	for_each_namespaced_ref(send_ref, &data);
 	packet_flush(1);
 	strvec_clear(&data.prefixes);
 	return 0;
 }
+
+int ls_refs_advertise(struct repository *r, struct strbuf *value)
+{
+	if (value) {
+		int allow_unborn_value;
+
+		if (!repo_config_get_bool(the_repository,
+					 "lsrefs.allowunborn",
+					 &allow_unborn_value) &&
+		    allow_unborn_value)
+			strbuf_addstr(value, "unborn");
+	}
+
+	return 1;
+}
diff --git a/ls-refs.h b/ls-refs.h
index 7b33a7c6b8..a99e4be0bd 100644
--- a/ls-refs.h
+++ b/ls-refs.h
@@ -6,5 +6,6 @@  struct strvec;
 struct packet_reader;
 int ls_refs(struct repository *r, struct strvec *keys,
 	    struct packet_reader *request);
+int ls_refs_advertise(struct repository *r, struct strbuf *value);
 
 #endif /* LS_REFS_H */
diff --git a/serve.c b/serve.c
index eec2fe6f29..ac20c72763 100644
--- a/serve.c
+++ b/serve.c
@@ -73,7 +73,7 @@  struct protocol_capability {
 
 static struct protocol_capability capabilities[] = {
 	{ "agent", agent_advertise, NULL },
-	{ "ls-refs", always_advertise, ls_refs },
+	{ "ls-refs", ls_refs_advertise, ls_refs },
 	{ "fetch", upload_pack_advertise, upload_pack_v2 },
 	{ "server-option", always_advertise, NULL },
 	{ "object-format", object_format_advertise, NULL },