Message ID | a66e50626ef9343e0351fc6cfe8abcf4f9eed7f3.1608673963.git.jonathantanmy@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | Cloning with remote unborn HEAD |
On Tue, Dec 22, 2020 at 01:54:18PM -0800, Jonathan Tan wrote:

> -static int ls_refs_config(const char *var, const char *value, void *data)
> +static void send_possibly_unborn_head(struct ls_refs_data *data)
>  {
> +	struct strbuf namespaced = STRBUF_INIT;
> +	struct object_id oid;
> +	int flag;
> +	int oid_is_null;
> +
> +	memset(&oid, 0, sizeof(oid));
> +	strbuf_addf(&namespaced, "%sHEAD", get_git_namespace());
> +	resolve_ref_unsafe(namespaced.buf, 0, &oid, &flag);

It feels weird to call resolve_ref_unsafe() without checking the return
value. How do we detect errors?

I think the logic is that we make assumptions about which fields it will
touch (i.e., zeroing the flags, and not touching our zero'd oid), and
then check those. That feels a bit non-obvious and intimate with the
implementation, though (and was presumably the source of the "oops, we
need to clear the oid" bug between v3 and v4).

I feel like that deserves a comment, but I also wonder if:

  refname = resolve_ref_unsafe(namespaced.buf, 0, &oid, &flag);
  if (!refname)
	return; /* broken, bad name, not even a symref, etc */

  /*
   * now we can look at oid even if we didn't memset() it, because
   * a successful return from resolve_ref_unsafe() means that it
   * has cleared it if appropriate
   */
  oid_is_null = is_null_oid(&oid);
  ...etc...

> +	if (!oid_is_null ||
> +	    (data->unborn && data->symrefs && (flag & REF_ISSYMREF)))
> +		send_ref(namespaced.buf, oid_is_null ? NULL : &oid, flag, data);

It likewise feels a bit funny that we determine the symref name in the
earlier call to resolve_ref_unsafe(), but we do not pass it here (and in
fact, we'll end up looking it up again!).

But that is not much different than what we do for normal refs passed to
the send_ref() callback. It would be nice if the iteration could pass in
"by the way, here is the symref value" to avoid that. But in practice it
isn't a big deal, since we only do the lookup when we see the ISSYMREF
flag set. So typically it is only one or two extra ref resolutions.

> @@ -91,7 +118,7 @@ int ls_refs(struct repository *r, struct strvec *keys,
>  
>  	memset(&data, 0, sizeof(data));
>  
> -	git_config(ls_refs_config, NULL);
> +	git_config(ls_refs_config, &data);

You will probably not be surprised that I would suggest defaulting
data->allow_unborn to 1 before this config call. :)

> @@ -103,14 +130,31 @@ int ls_refs(struct repository *r, struct strvec *keys,
>  			data.symrefs = 1;
>  		else if (skip_prefix(arg, "ref-prefix ", &out))
>  			strvec_push(&data.prefixes, out);
> +		else if (data.allow_unborn && !strcmp("unborn", arg))
> +			data.unborn = 1;
>  	}

So if we have not set allow_unborn, we will not accept the client saying
"unborn". Which makes sense, because we would not have advertised it in
that case.

But we use the same boolean for advertising, too. So this loses the
"allow us to accept it, but not advertise it" logic that your earlier
versions had, doesn't it? And that is the important element for making
things work across a non-atomic deploy of versions.

This straight-boolean version works as long as you can atomically update
the _config_ on each version. But that seems like roughly the same
problem (having dealt with this on GitHub servers, they are not
equivalent, and depending on your infrastructure, it definitely _can_ be
easier to do one versus the other. But it seems like a funny place to
leave this upstream feature).

Or is the intent that an unconfigured reader would silently ignore the
unborn flag in that case? That would at least not cause it to bail on
the client in a mixed-version environment. But it does feel like a
confusing result.

-Peff
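To make the return-value suggestion above concrete, here is a minimal,
self-contained sketch of the control flow. It does not use Git's real refs
API: sketch_resolve() is a mock that only imitates the contract described in
the review (NULL on failure, oid and flag filled in on success), and every
sketch_* name and the SKETCH_ISSYMREF constant are assumptions made for this
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_ISSYMREF 0x01 /* stand-in for REF_ISSYMREF */

struct sketch_oid {
	unsigned char hash[20];
};

/* Hypothetical description of what HEAD looks like on the server. */
struct sketch_head {
	const char *target; /* NULL models a missing or broken HEAD */
	int is_symref;
	int has_object;     /* 0 models an unborn target branch */
};

enum sketch_action { SKETCH_SKIP, SKETCH_SEND_OID, SKETCH_SEND_UNBORN };

/*
 * Mock resolver: returns NULL on failure; on success it always writes
 * both *oid and *flag, so the caller need not pre-zero them.
 */
static const char *sketch_resolve(const struct sketch_head *head,
				  struct sketch_oid *oid, int *flag)
{
	assert(head && oid && flag);
	if (!head->target)
		return NULL;
	memset(oid, 0, sizeof(*oid));
	if (head->has_object)
		oid->hash[0] = 1; /* any non-null id will do */
	*flag = head->is_symref ? SKETCH_ISSYMREF : 0;
	return head->target;
}

static int sketch_is_null_oid(const struct sketch_oid *oid)
{
	static const struct sketch_oid null_oid; /* zero-initialized */
	return !memcmp(oid, &null_oid, sizeof(null_oid));
}

/* Decide what to do with HEAD, checking the return value first. */
static enum sketch_action sketch_decide_head(const struct sketch_head *head,
					     int unborn, int symrefs)
{
	struct sketch_oid oid; /* deliberately not memset() beforehand */
	int flag;

	if (!sketch_resolve(head, &oid, &flag))
		return SKETCH_SKIP; /* broken, bad name, etc. */
	if (!sketch_is_null_oid(&oid))
		return SKETCH_SEND_OID;
	if (unborn && symrefs && (flag & SKETCH_ISSYMREF))
		return SKETCH_SEND_UNBORN;
	return SKETCH_SKIP;
}
```

The point of the sketch is only that oid and flag are read after a successful
return, so the caller no longer depends on which fields the resolver happens
to leave untouched on failure.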
> On Tue, Dec 22, 2020 at 01:54:18PM -0800, Jonathan Tan wrote:
>
> > -static int ls_refs_config(const char *var, const char *value, void *data)
> > +static void send_possibly_unborn_head(struct ls_refs_data *data)
> >  {
> > +	struct strbuf namespaced = STRBUF_INIT;
> > +	struct object_id oid;
> > +	int flag;
> > +	int oid_is_null;
> > +
> > +	memset(&oid, 0, sizeof(oid));
> > +	strbuf_addf(&namespaced, "%sHEAD", get_git_namespace());
> > +	resolve_ref_unsafe(namespaced.buf, 0, &oid, &flag);
>
> It feels weird to call resolve_ref_unsafe() without checking the return
> value. How do we detect errors?
>
> I think the logic is that we make assumptions about which fields it will
> touch (i.e., zeroing the flags, and not touching our zero'd oid), and
> then check those. That feels a bit non-obvious and intimate with the
> implementation, though (and was presumably the source of the "oops, we
> need to clear the oid" bug between v3 and v4).
>
> I feel like that deserves a comment, but I also wonder if:
>
>   refname = resolve_ref_unsafe(namespaced.buf, 0, &oid, &flag);
>   if (!refname)
> 	return; /* broken, bad name, not even a symref, etc */

From my reading of this part of refs_resolve_ref_unsafe():

	if (!(read_flags & REF_ISSYMREF)) {
		if (*flags & REF_BAD_NAME) {
			oidclr(oid);
			*flags |= REF_ISBROKEN;
		}
		return refname;
	}

it seems that resolve_ref_unsafe() returns non-NULL if the ref is not a
symref but is otherwise valid. But this is exactly what we want -
send_possibly_unborn_head() must send HEAD in this situation anyway.

Thanks - I've switched to checking the return value.

(It was a bit confusing that refs_resolve_ref_unsafe() returns one of
its input arguments if it succeeds and NULL if it fails, but that's
outside the scope of this patch, I think.)

> > +	if (!oid_is_null ||
> > +	    (data->unborn && data->symrefs && (flag & REF_ISSYMREF)))
> > +		send_ref(namespaced.buf, oid_is_null ? NULL : &oid, flag, data);
>
> It likewise feels a bit funny that we determine the symref name in the
> earlier call to resolve_ref_unsafe(), but we do not pass it here (and in
> fact, we'll end up looking it up again!).
>
> But that is not much different than what we do for normal refs passed to
> the send_ref() callback. It would be nice if the iteration could pass in
> "by the way, here is the symref value" to avoid that.

Yes, that would be nice.

> But in practice it
> isn't a big deal, since we only do the lookup when we see the ISSYMREF
> flag set. So typically it is only one or two extra ref resolutions.

OK.

> > @@ -91,7 +118,7 @@ int ls_refs(struct repository *r, struct strvec *keys,
> >  
> >  	memset(&data, 0, sizeof(data));
> >  
> > -	git_config(ls_refs_config, NULL);
> > +	git_config(ls_refs_config, &data);
>
> You will probably not be surprised that I would suggest defaulting
> data->allow_unborn to 1 before this config call. :)

I don't think many people have made comments either way, so I'll go
ahead with defaulting it to true. I can see arguments for both sides.

> > @@ -103,14 +130,31 @@ int ls_refs(struct repository *r, struct strvec *keys,
> >  			data.symrefs = 1;
> >  		else if (skip_prefix(arg, "ref-prefix ", &out))
> >  			strvec_push(&data.prefixes, out);
> > +		else if (data.allow_unborn && !strcmp("unborn", arg))
> > +			data.unborn = 1;
> >  	}
>
> So if we have not set allow_unborn, we will not accept the client saying
> "unborn". Which makes sense, because we would not have advertised it in
> that case.
>
> But we use the same boolean for advertising, too. So this loses the
> "allow us to accept it, but not advertise it" logic that your earlier
> versions had, doesn't it?

Yes, it does.

> And that is the important element for making
> things work across a non-atomic deploy of versions.
>
> This straight-boolean version works as long as you can atomically update
> the _config_ on each version. But that seems like roughly the same
> problem (having dealt with this on GitHub servers, they are not
> equivalent, and depending on your infrastructure, it definitely _can_ be
> easier to do one versus the other. But it seems like a funny place to
> leave this upstream feature).

Well, I was just agreeing with what you said [1]. :-)

[1] https://lore.kernel.org/git/X9xJLWdFJfNJTn0p@coredump.intra.peff.net/

> Or is the intent that an unconfigured reader would silently ignore the
> unborn flag in that case? That would at least not cause it to bail on
> the client in a mixed-version environment. But it does feel like a
> confusing result.

Right now, an old server would ignore "unborn", yes. I'm not sure of
what the intent should be - tightening ls-refs and fetch to forbid
unknown arguments seems like a good idea to me.
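As an illustration of the "forbid unknown arguments" idea mentioned at the
end of this reply, here is a rough standalone sketch of a strict argument
parser. It is not code from ls-refs.c: the sketch_* names and the -1 error
return are assumptions for this example, and a real server would have to
decide separately how to report the error to the client.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the request state kept in ls_refs_data. */
struct sketch_args {
	unsigned peel : 1;
	unsigned symrefs : 1;
	unsigned unborn : 1;
	int n_prefixes; /* a real server would store the prefixes */
};

/*
 * Returns 0 if the argument is known and permitted, -1 otherwise.
 * Unlike a lenient parser, an unknown or unadvertised argument is an
 * error rather than being silently ignored.
 */
static int sketch_parse_arg(struct sketch_args *args, int allow_unborn,
			    const char *arg)
{
	static const char prefix[] = "ref-prefix ";

	assert(args && arg);
	if (!strcmp(arg, "peel"))
		args->peel = 1;
	else if (!strcmp(arg, "symrefs"))
		args->symrefs = 1;
	else if (!strncmp(arg, prefix, strlen(prefix)))
		args->n_prefixes++;
	else if (allow_unborn && !strcmp(arg, "unborn"))
		args->unborn = 1;
	else
		return -1;
	return 0;
}
```

Note the trade-off discussed in the thread: with a strict parser, a client
that saw "unborn" advertised by one server in a mixed-version cluster and
then sent it to another would be rejected instead of quietly getting no
unborn HEAD.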
On Tue, Jan 26, 2021 at 10:13:58AM -0800, Jonathan Tan wrote:

> (It was a bit confusing that refs_resolve_ref_unsafe() returns one of
> its input arguments if it succeeds and NULL if it fails, but that's
> outside the scope of this patch, I think.)

Yep. It would probably be much nicer for it to return a numeric success
code, and to take an optional strbuf into which to write the resolved
symref name (if the caller even cares about it). But definitely out of
scope for your patch.

> > This straight-boolean version works as long as you can atomically update
> > the _config_ on each version. But that seems like roughly the same
> > problem (having dealt with this on GitHub servers, they are not
> > equivalent, and depending on your infrastructure, it definitely _can_ be
> > easier to do one versus the other. But it seems like a funny place to
> > leave this upstream feature).
>
> Well, I was just agreeing with what you said [1]. :-)
>
> [1] https://lore.kernel.org/git/X9xJLWdFJfNJTn0p@coredump.intra.peff.net/

Oh, I just need you to agree harder then. ;)

If we are not going to support config that helps you do an atomic
deploy, then I don't really see the point of having config at all. Here
are three plausible implementations I can conceive of:

  - allowUnborn is a tri-state for "accept-but-do-not-advertise",
    "accept-and-advertise", and "disallow". This helps with rollout in a
    cluster by setting it to accept-but-do-not-advertise. The default
    would be accept-and-advertise, which is what most servers would
    want. I don't really know why anyone would want "disallow".

  - allowUnborn is a bool for "accept-and-advertise" or "disallow". This
    doesn't help cluster rollout. I don't know why anyone would want to
    switch away from the default of accept-and-advertise.

  - allowUnborn is always on.

The first one helps the cluster case, at the cost of introducing an
extra config knob. The third one doesn't help that case, but is one less
knob for server admins to think about. But the second one has a knob
that I don't understand why anybody would tweak. It seems like the worst
of both.

Perhaps there's a reason for setting "disallow" that I don't know. Or
perhaps you're happy to help the cluster case using a simple bool with
atomic config rollouts (which are outside the scope of Git itself).

> > Or is the intent that an unconfigured reader would silently ignore the
> > unborn flag in that case? That would at least not cause it to bail on
> > the client in a mixed-version environment. But it does feel like a
> > confusing result.
>
> Right now, an old server would ignore "unborn", yes. I'm not sure of
> what the intent should be - tightening ls-refs and fetch to forbid
> unknown arguments seems like a good idea to me.

If we had just a bool (case 2 from above), and there was an
always-implied "accept unborn even if not advertised", then that _does_
let the config help out the cluster case (it just turns off
advertisements, basically making the bool "accept-but-do-not-advertise"
versus "disallow").

I don't love it. The protocol spec does say "don't ask for capability
foo if the server didn't say it knows about foo". We'd be loosening the
enforcement of that (if only for capabilities we _do_ in fact know
about), even though we don't know if it was due to a race, or if the
client is just misbehaving. But I wondered if that was the direction you
were going to try to solve your cluster-rollout problem.

-Peff
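The first option above (a tri-state) could be modeled roughly as follows.
This is only a sketch of the idea, not code from the series: the enum, the
function names, and the value strings "advertise", "allow" and "ignore" are
all assumptions chosen for this illustration.

```c
#include <assert.h>
#include <string.h>

enum sketch_unborn_mode {
	SKETCH_UNBORN_DISALLOW,    /* neither advertise nor accept */
	SKETCH_UNBORN_ACCEPT_ONLY, /* accept-but-do-not-advertise */
	SKETCH_UNBORN_ADVERTISE    /* accept-and-advertise (default) */
};

/*
 * Parse a hypothetical config value. A NULL value models "unset" and
 * yields the default. Returns 0 on success, -1 on an unknown value.
 */
static int sketch_parse_unborn_mode(const char *value,
				    enum sketch_unborn_mode *mode)
{
	assert(mode);
	if (!value || !strcmp(value, "advertise"))
		*mode = SKETCH_UNBORN_ADVERTISE;
	else if (!strcmp(value, "allow"))
		*mode = SKETCH_UNBORN_ACCEPT_ONLY;
	else if (!strcmp(value, "ignore"))
		*mode = SKETCH_UNBORN_DISALLOW;
	else
		return -1;
	return 0;
}

/* Used when building the capability advertisement. */
static int sketch_advertises_unborn(enum sketch_unborn_mode mode)
{
	return mode == SKETCH_UNBORN_ADVERTISE;
}

/* Used when a client sends the "unborn" argument. */
static int sketch_accepts_unborn(enum sketch_unborn_mode mode)
{
	return mode != SKETCH_UNBORN_DISALLOW;
}
```

With something like this, a cluster rollout could set every server to the
accept-only mode first, and switch to advertising only once all of them
understand the argument.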
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 6ba50b1104..d08e83a148 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -398,6 +398,8 @@ include::config/interactive.txt[]
 
 include::config/log.txt[]
 
+include::config/lsrefs.txt[]
+
 include::config/mailinfo.txt[]
 
 include::config/mailmap.txt[]
diff --git a/Documentation/config/lsrefs.txt b/Documentation/config/lsrefs.txt
new file mode 100644
index 0000000000..dcbec11aaa
--- /dev/null
+++ b/Documentation/config/lsrefs.txt
@@ -0,0 +1,3 @@
+lsrefs.allowUnborn::
+	Allow the server to send information about unborn symrefs during the
+	protocol v2 ref advertisement.
diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index 85daeb5d9e..4707511c10 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -192,11 +192,19 @@ ls-refs takes in the following arguments:
 	When specified, only references having a prefix matching one of
 	the provided prefixes are displayed.
 
+If the 'unborn' feature is advertised the following argument can be
+included in the client's request.
+
+    unborn
+	The server may send symrefs pointing to unborn branches in the form
+	"unborn <refname> symref-target:<target>".
+
 The output of ls-refs is as follows:
 
     output = *ref
 	     flush-pkt
-    ref = PKT-LINE(obj-id SP refname *(SP ref-attribute) LF)
+    obj-id-or-unborn = (obj-id | "unborn")
+    ref = PKT-LINE(obj-id-or-unborn SP refname *(SP ref-attribute) LF)
     ref-attribute = (symref | peeled)
     symref = "symref-target:" symref-target
     peeled = "peeled:" obj-id
diff --git a/ls-refs.c b/ls-refs.c
index a1e0b473e4..ff61e704f1 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -32,6 +32,8 @@ struct ls_refs_data {
 	unsigned peel;
 	unsigned symrefs;
 	struct strvec prefixes;
+	unsigned allow_unborn : 1;
+	unsigned unborn : 1;
 };
 
 static int send_ref(const char *refname, const struct object_id *oid,
@@ -47,7 +49,10 @@ static int send_ref(const char *refname, const struct object_id *oid,
 	if (!ref_match(&data->prefixes, refname_nons))
 		return 0;
 
-	strbuf_addf(&refline, "%s %s", oid_to_hex(oid), refname_nons);
+	if (oid)
+		strbuf_addf(&refline, "%s %s", oid_to_hex(oid), refname_nons);
+	else
+		strbuf_addf(&refline, "unborn %s", refname_nons);
 	if (data->symrefs && flag & REF_ISSYMREF) {
 		struct object_id unused;
 		const char *symref_target = resolve_ref_unsafe(refname, 0,
@@ -74,8 +79,30 @@ static int send_ref(const char *refname, const struct object_id *oid,
 	return 0;
 }
 
-static int ls_refs_config(const char *var, const char *value, void *data)
+static void send_possibly_unborn_head(struct ls_refs_data *data)
 {
+	struct strbuf namespaced = STRBUF_INIT;
+	struct object_id oid;
+	int flag;
+	int oid_is_null;
+
+	memset(&oid, 0, sizeof(oid));
+	strbuf_addf(&namespaced, "%sHEAD", get_git_namespace());
+	resolve_ref_unsafe(namespaced.buf, 0, &oid, &flag);
+	oid_is_null = is_null_oid(&oid);
+	if (!oid_is_null ||
+	    (data->unborn && data->symrefs && (flag & REF_ISSYMREF)))
+		send_ref(namespaced.buf, oid_is_null ? NULL : &oid, flag, data);
+	strbuf_release(&namespaced);
+}
+
+static int ls_refs_config(const char *var, const char *value, void *cb_data)
+{
+	struct ls_refs_data *data = cb_data;
+
+	if (!strcmp("lsrefs.allowunborn", var))
+		data->allow_unborn = git_config_bool(var, value);
+
 	/*
 	 * We only serve fetches over v2 for now, so respect only "uploadpack"
 	 * config. This may need to eventually be expanded to "receive", but we
@@ -91,7 +118,7 @@ int ls_refs(struct repository *r, struct strvec *keys,
 
 	memset(&data, 0, sizeof(data));
 
-	git_config(ls_refs_config, NULL);
+	git_config(ls_refs_config, &data);
 
 	while (packet_reader_read(request) == PACKET_READ_NORMAL) {
 		const char *arg = request->line;
@@ -103,14 +130,31 @@ int ls_refs(struct repository *r, struct strvec *keys,
 			data.symrefs = 1;
 		else if (skip_prefix(arg, "ref-prefix ", &out))
 			strvec_push(&data.prefixes, out);
+		else if (data.allow_unborn && !strcmp("unborn", arg))
+			data.unborn = 1;
 	}
 
 	if (request->status != PACKET_READ_FLUSH)
 		die(_("expected flush after ls-refs arguments"));
 
-	head_ref_namespaced(send_ref, &data);
+	send_possibly_unborn_head(&data);
 	for_each_namespaced_ref(send_ref, &data);
 	packet_flush(1);
 	strvec_clear(&data.prefixes);
 	return 0;
 }
+
+int ls_refs_advertise(struct repository *r, struct strbuf *value)
+{
+	if (value) {
+		int allow_unborn_value;
+
+		if (!repo_config_get_bool(the_repository,
+					 "lsrefs.allowunborn",
+					 &allow_unborn_value) &&
+		    allow_unborn_value)
+			strbuf_addstr(value, "unborn");
+	}
+
+	return 1;
+}
diff --git a/ls-refs.h b/ls-refs.h
index 7b33a7c6b8..a99e4be0bd 100644
--- a/ls-refs.h
+++ b/ls-refs.h
@@ -6,5 +6,6 @@ struct strvec;
 struct packet_reader;
 int ls_refs(struct repository *r, struct strvec *keys,
 	    struct packet_reader *request);
+int ls_refs_advertise(struct repository *r, struct strbuf *value);
 
 #endif /* LS_REFS_H */
diff --git a/serve.c b/serve.c
index eec2fe6f29..ac20c72763 100644
--- a/serve.c
+++ b/serve.c
@@ -73,7 +73,7 @@ struct protocol_capability {
 
 static struct protocol_capability capabilities[] = {
 	{ "agent", agent_advertise, NULL },
-	{ "ls-refs", always_advertise, ls_refs },
+	{ "ls-refs", ls_refs_advertise, ls_refs },
 	{ "fetch", upload_pack_advertise, upload_pack_v2 },
 	{ "server-option", always_advertise, NULL },
 	{ "object-format", object_format_advertise, NULL },
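For readers who want to see the wire format from the protocol-v2.txt hunk in
isolation, the following standalone sketch builds the payload of one ref line
according to the grammar "obj-id-or-unborn SP refname *(SP ref-attribute) LF".
It is not the patch's send_ref(): sketch_format_ref() is a hypothetical
helper, it handles only the symref-target attribute, and it leaves out the
pkt-line length prefix.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write one ls-refs output line into buf. A NULL hex_oid models an
 * unborn ref; a NULL symref_target omits the attribute. Returns the
 * number of bytes written, or -1 if buf is too small.
 */
static int sketch_format_ref(char *buf, size_t len, const char *hex_oid,
			     const char *refname, const char *symref_target)
{
	const char *id = hex_oid ? hex_oid : "unborn";
	int n;

	assert(buf && refname);
	if (symref_target)
		n = snprintf(buf, len, "%s %s symref-target:%s\n",
			     id, refname, symref_target);
	else
		n = snprintf(buf, len, "%s %s\n", id, refname);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Called with a NULL object id, the refname "HEAD" and the target
"refs/heads/main", this produces the line
"unborn HEAD symref-target:refs/heads/main" followed by LF, which matches
the form quoted in the documentation hunk.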
When cloning, we choose the default branch based on the remote HEAD.
But if there is no remote HEAD reported (which could happen if the
target of the remote HEAD is unborn), we'll fall back to using our local
init.defaultBranch. Traditionally this hasn't been a big deal, because
most repos used "master" as the default. But these days it is likely to
cause confusion if the server and client implementations choose
different values (e.g., if the remote started with "main", we may choose
"master" locally, create commits there, and then the user is surprised
when they push to "master" and not "main").

To solve this, the remote needs to communicate the target of the HEAD
symref, even if it is unborn, and "git clone" needs to use this
information.

Currently, symrefs that have unborn targets (such as in this case) are
not communicated by the protocol. Teach Git to advertise and support the
"unborn" feature in "ls-refs" (guarded by the lsrefs.allowunborn
config). This feature indicates that "ls-refs" supports the "unborn"
argument; when it is specified, "ls-refs" will send the HEAD symref with
the name of its unborn target.

This change is only for protocol v2. A similar change for protocol v0
would require independent protocol design (there being no analogous
position to signal support for "unborn") and client-side plumbing of the
data required, so the scope of this patch set is limited to protocol v2.

The client side will be updated to use this in a subsequent commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/config.txt                |  2 +
 Documentation/config/lsrefs.txt         |  3 ++
 Documentation/technical/protocol-v2.txt | 10 ++++-
 ls-refs.c                               | 52 +++++++++++++++++++++++--
 ls-refs.h                               |  1 +
 serve.c                                 |  2 +-
 6 files changed, 64 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/config/lsrefs.txt
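The client-side behavior that the commit message motivates can be sketched as
below. This is not the client code from the series (that arrives in a later
commit); sketch_pick_initial_branch() is a hypothetical helper, and the
assumption that only targets under "refs/heads/" are usable is made for this
example.

```c
#include <assert.h>
#include <string.h>

/*
 * Choose the name of the initial local branch for a clone.
 *
 * remote_head_target is the symref target reported for HEAD (for
 * example from an "unborn HEAD symref-target:..." line), or NULL if
 * the server reported nothing. local_default stands in for the value
 * of init.defaultBranch.
 */
static const char *sketch_pick_initial_branch(const char *remote_head_target,
					       const char *local_default)
{
	static const char heads[] = "refs/heads/";

	assert(local_default);
	if (remote_head_target &&
	    !strncmp(remote_head_target, heads, strlen(heads)))
		return remote_head_target + strlen(heads);
	return local_default; /* old behavior: fall back to local config */
}
```

With a server that reports "refs/heads/main" as the unborn target, the helper
yields "main" even when the local default is "master"; when nothing is
reported it falls back to the local default, which is the situation the
commit message describes as confusing.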