[v5,1/3] ls-refs: report unborn targets of symrefs

Message ID	32e16dfdbd3f54c0d4b39cbd555e66aa3950fffd.1611686656.git.jonathantanmy@google.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <git-owner@kernel.org> Sender: "jonathantanmy via sendgmr" <jonathantanmy@twelve4.c.googlers.com> Date: Tue, 26 Jan 2021 10:55:55 -0800 In-Reply-To: <cover.1611686656.git.jonathantanmy@google.com> Message-Id: <32e16dfdbd3f54c0d4b39cbd555e66aa3950fffd.1611686656.git.jonathantanmy@google.com> Mime-Version: 1.0 References: <20201208013121.677494-1-jonathantanmy@google.com> <cover.1611686656.git.jonathantanmy@google.com> Subject: [PATCH v5 1/3] ls-refs: report unborn targets of symrefs From: Jonathan Tan <jonathantanmy@google.com> To: git@vger.kernel.org Cc: Jonathan Tan <jonathantanmy@google.com>, Junio C Hamano <gitster@pobox.com> Content-Type: text/plain; charset="UTF-8" Precedence: bulk
Series	Cloning with remote unborn HEAD \| expand [v5,0/3] Cloning with remote unborn HEAD [v5,1/3] ls-refs: report unborn targets of symrefs [v5,2/3] connect, transport: encapsulate arg in struct [v5,3/3] clone: respect remote unborn HEAD

Jonathan Tan Jan. 26, 2021, 6:55 p.m. UTC

When cloning, we choose the default branch based on the remote HEAD.
But if there is no remote HEAD reported (which could happen if the
target of the remote HEAD is unborn), we'll fall back to using our local
init.defaultBranch. Traditionally this hasn't been a big deal, because
most repos used "master" as the default. But these days it is likely to
cause confusion if the server and client implementations choose
different values (e.g., if the remote started with "main", we may choose
"master" locally, create commits there, and then the user is surprised
when they push to "master" and not "main").

To solve this, the remote needs to communicate the target of the HEAD
symref, even if it is unborn, and "git clone" needs to use this
information.

Currently, symrefs that have unborn targets (such as in this case) are
not communicated by the protocol. Teach Git to advertise and support the
"unborn" feature in "ls-refs" (by default, this is advertised, but
server administrators may turn this off through the lsrefs.allowunborn
config). This feature indicates that "ls-refs" supports the "unborn"
argument; when it is specified, "ls-refs" will send the HEAD symref with
the name of its unborn target.

This change is only for protocol v2. A similar change for protocol v0
would require independent protocol design (there being no analogous
position to signal support for "unborn") and client-side plumbing of the
data required, so the scope of this patch set is limited to protocol v2.

The client side will be updated to use this in a subsequent commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config.txt                |  2 +
 Documentation/config/lsrefs.txt         |  3 ++
 Documentation/technical/protocol-v2.txt | 10 ++++-
 ls-refs.c                               | 53 +++++++++++++++++++++++--
 ls-refs.h                               |  1 +
 serve.c                                 |  2 +-
 t/t5701-git-serve.sh                    |  2 +-
 7 files changed, 66 insertions(+), 7 deletions(-)
 create mode 100644 Documentation/config/lsrefs.txt

Junio C Hamano Jan. 26, 2021, 9:38 p.m. UTC | #1

Jonathan Tan <jonathantanmy@google.com> writes:

> When cloning, we choose the default branch based on the remote HEAD.
> But if there is no remote HEAD reported (which could happen if the
> target of the remote HEAD is unborn), we'll fall back to using our local
> init.defaultBranch. Traditionally this hasn't been a big deal, because
> most repos used "master" as the default. But these days it is likely to
> cause confusion if the server and client implementations choose
> different values (e.g., if the remote started with "main", we may choose
> "master" locally, create commits there, and then the user is surprised
> when they push to "master" and not "main").
>
> To solve this, the remote needs to communicate the target of the HEAD
> symref, even if it is unborn, and "git clone" needs to use this
> information.
>
> Currently, symrefs that have unborn targets (such as in this case) are
> not communicated by the protocol. Teach Git to advertise and support the
> "unborn" feature in "ls-refs" (by default, this is advertised, but
> server administrators may turn this off through the lsrefs.allowunborn
> config). This feature indicates that "ls-refs" supports the "unborn"
> argument; when it is specified, "ls-refs" will send the HEAD symref with
> the name of its unborn target.
>
> This change is only for protocol v2. A similar change for protocol v0
> would require independent protocol design (there being no analogous
> position to signal support for "unborn") and client-side plumbing of the
> data required, so the scope of this patch set is limited to protocol v2.
>
> The client side will be updated to use this in a subsequent commit.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  Documentation/config.txt                |  2 +
>  Documentation/config/lsrefs.txt         |  3 ++
>  Documentation/technical/protocol-v2.txt | 10 ++++-
>  ls-refs.c                               | 53 +++++++++++++++++++++++--
>  ls-refs.h                               |  1 +
>  serve.c                                 |  2 +-
>  t/t5701-git-serve.sh                    |  2 +-
>  7 files changed, 66 insertions(+), 7 deletions(-)
>  create mode 100644 Documentation/config/lsrefs.txt
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 6ba50b1104..d08e83a148 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -398,6 +398,8 @@ include::config/interactive.txt[]
>  
>  include::config/log.txt[]
>  
> +include::config/lsrefs.txt[]
> +
>  include::config/mailinfo.txt[]
>  
>  include::config/mailmap.txt[]
> diff --git a/Documentation/config/lsrefs.txt b/Documentation/config/lsrefs.txt
> new file mode 100644
> index 0000000000..dcbec11aaa
> --- /dev/null
> +++ b/Documentation/config/lsrefs.txt
> @@ -0,0 +1,3 @@
> +lsrefs.allowUnborn::
> +	Allow the server to send information about unborn symrefs during the
> +	protocol v2 ref advertisement.
> diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
> index 85daeb5d9e..4707511c10 100644
> --- a/Documentation/technical/protocol-v2.txt
> +++ b/Documentation/technical/protocol-v2.txt
> @@ -192,11 +192,19 @@ ls-refs takes in the following arguments:
>  	When specified, only references having a prefix matching one of
>  	the provided prefixes are displayed.
>  
> +If the 'unborn' feature is advertised the following argument can be
> +included in the client's request.
> +
> +    unborn
> +	The server may send symrefs pointing to unborn branches in the form
> +	"unborn <refname> symref-target:<target>".
> +
>  The output of ls-refs is as follows:
>  
>      output = *ref
>  	     flush-pkt
> -    ref = PKT-LINE(obj-id SP refname *(SP ref-attribute) LF)
> +    obj-id-or-unborn = (obj-id | "unborn")
> +    ref = PKT-LINE(obj-id-or-unborn SP refname *(SP ref-attribute) LF)
>      ref-attribute = (symref | peeled)
>      symref = "symref-target:" symref-target
>      peeled = "peeled:" obj-id
> diff --git a/ls-refs.c b/ls-refs.c
> index a1e0b473e4..4077adeb6a 100644
> --- a/ls-refs.c
> +++ b/ls-refs.c
> @@ -32,6 +32,8 @@ struct ls_refs_data {
>  	unsigned peel;
>  	unsigned symrefs;
>  	struct strvec prefixes;
> +	unsigned allow_unborn : 1;
> +	unsigned unborn : 1;
>  };
>  
>  static int send_ref(const char *refname, const struct object_id *oid,
> @@ -47,7 +49,10 @@ static int send_ref(const char *refname, const struct object_id *oid,
>  	if (!ref_match(&data->prefixes, refname_nons))
>  		return 0;
>  
> -	strbuf_addf(&refline, "%s %s", oid_to_hex(oid), refname_nons);
> +	if (oid)
> +		strbuf_addf(&refline, "%s %s", oid_to_hex(oid), refname_nons);
> +	else
> +		strbuf_addf(&refline, "unborn %s", refname_nons);

When a call is made to this helper with NULL "oid", it unconditionally
sends the "refname" out as an 'unborn' thing.  If data->symrefs is not
true, or flag does not have REF_ISSYMREF set, then we'd end up
sending

    "unborn" SP refname LF

without any ref-attribute.  The caller is responsible for ensuring
that it passes sensible data->symrefs and flag when it passes
oid==NULL to this function, but it is OK because this is a private
helper.

OK.

>  	if (data->symrefs && flag & REF_ISSYMREF) {
>  		struct object_id unused;
>  		const char *symref_target = resolve_ref_unsafe(refname, 0,
> @@ -74,8 +79,30 @@ static int send_ref(const char *refname, const struct object_id *oid,
>  	return 0;
>  }
>  
> -static int ls_refs_config(const char *var, const char *value, void *data)
> +static void send_possibly_unborn_head(struct ls_refs_data *data)
>  {
> +	struct strbuf namespaced = STRBUF_INIT;
> +	struct object_id oid;
> +	int flag;
> +	int oid_is_null;
> +
> +	strbuf_addf(&namespaced, "%sHEAD", get_git_namespace());
> +	if (!resolve_ref_unsafe(namespaced.buf, 0, &oid, &flag))
> +		return; /* bad ref */
> +	oid_is_null = is_null_oid(&oid);
> +	if (!oid_is_null ||
> +	    (data->unborn && data->symrefs && (flag & REF_ISSYMREF)))
> +		send_ref(namespaced.buf, oid_is_null ? NULL : &oid, flag, data);

And this caller makes sure that send_ref()'s expectation holds.

> +	strbuf_release(&namespaced);
> +}
> +
> +static int ls_refs_config(const char *var, const char *value, void *cb_data)
> +{
> +	struct ls_refs_data *data = cb_data;
> +
> +	if (!strcmp("lsrefs.allowunborn", var))
> +		data->allow_unborn = git_config_bool(var, value);
> +
>  	/*
>  	 * We only serve fetches over v2 for now, so respect only "uploadpack"
>  	 * config. This may need to eventually be expanded to "receive", but we
> @@ -91,7 +118,8 @@ int ls_refs(struct repository *r, struct strvec *keys,
>  
>  	memset(&data, 0, sizeof(data));
>  
> -	git_config(ls_refs_config, NULL);
> +	data.allow_unborn = 1;
> +	git_config(ls_refs_config, &data);

The above is a usual sequence of "an unspecified allow-unborn
defaults to true, but the configuration can turn it off".  OK
>  
>  	while (packet_reader_read(request) == PACKET_READ_NORMAL) {
>  		const char *arg = request->line;
> @@ -103,14 +131,31 @@ int ls_refs(struct repository *r, struct strvec *keys,
>  			data.symrefs = 1;
>  		else if (skip_prefix(arg, "ref-prefix ", &out))
>  			strvec_push(&data.prefixes, out);
> +		else if (data.allow_unborn && !strcmp("unborn", arg))
> +			data.unborn = 1;

I think the use of &&-cascade is iffy here.  Even when we are *not*
accepting request for unborn, we should still parse it as such.
This does not matter in today's code, but it is a basic courtesy for
future developers who may add more "else if" after it.

IOW

		else if (!strcmp("unborn", arg)) {
			if (!data.allow_unborn)
				; /* we are not accepting the request */
			else
				data.unborn = 1;
		}

I wrote the above in longhand only for documentation purposes; in
practice, 

		else if (!strcmp("unborn", arg))
                	data.unborn = data.allow_unborn;

may suffice.

>  	}
>  
>  	if (request->status != PACKET_READ_FLUSH)
>  		die(_("expected flush after ls-refs arguments"));
>  
> -	head_ref_namespaced(send_ref, &data);
> +	send_possibly_unborn_head(&data);
>  	for_each_namespaced_ref(send_ref, &data);

And here is another caller of send_ref().  Are we sure that
send_ref()'s expectation is satisfied by this caller when the
iteration encounters a broken ref (e.g. refs/heads/broken not a
symref but names an object that does not exist and get_sha1()
yielding 0{40}), or a dangling symref (e.g. refs/remotes/origin/HEAD
pointing at something that does not exist)?

>  	packet_flush(1);
>  	strvec_clear(&data.prefixes);
>  	return 0;
>  }
> +
> +int ls_refs_advertise(struct repository *r, struct strbuf *value)
> +{
> +	if (value) {
> +		int allow_unborn_value;
> +
> +		if (repo_config_get_bool(the_repository,
> +					 "lsrefs.allowunborn",
> +					 &allow_unborn_value) ||
> +		    allow_unborn_value)
> +			strbuf_addstr(value, "unborn");
> +	}

This reads "when not explicitly disabled, stuff "unborn" in there".

It feels somewhat brittle that we have to read the same variable and
apply the same "default to true" logic in two places and have to
keep them in sync.  Is this because the decision to advertize or not
has to be made way before the code that is specific to the
implementation of ls-refs is run?

If ls_refs_advertise() is always called first before ls_refs(), I
wonder if it makes sense to reuse what we found out about the
configured (or left unconfigured) state here and use it when
ls_refs() gets called?  I know that the way serve.c infrastructure
calls "do we advertise?" helper from each protocol-element handler
is too narrow and does not allow us to pass such a necessary piece
of information but I view it as a misdesign that can be corrected
(and until that happens, we could use file-local static limited to
ls-refs.c).

> +	return 1;
> +}
> diff --git a/ls-refs.h b/ls-refs.h
> index 7b33a7c6b8..a99e4be0bd 100644
> --- a/ls-refs.h
> +++ b/ls-refs.h
> @@ -6,5 +6,6 @@ struct strvec;
>  struct packet_reader;
>  int ls_refs(struct repository *r, struct strvec *keys,
>  	    struct packet_reader *request);
> +int ls_refs_advertise(struct repository *r, struct strbuf *value);
>  
>  #endif /* LS_REFS_H */
> diff --git a/serve.c b/serve.c
> index eec2fe6f29..ac20c72763 100644
> --- a/serve.c
> +++ b/serve.c
> @@ -73,7 +73,7 @@ struct protocol_capability {
>  
>  static struct protocol_capability capabilities[] = {
>  	{ "agent", agent_advertise, NULL },
> -	{ "ls-refs", always_advertise, ls_refs },
> +	{ "ls-refs", ls_refs_advertise, ls_refs },
>  	{ "fetch", upload_pack_advertise, upload_pack_v2 },
>  	{ "server-option", always_advertise, NULL },
>  	{ "object-format", object_format_advertise, NULL },
> diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
> index a1f5fdc9fd..df29504161 100755
> --- a/t/t5701-git-serve.sh
> +++ b/t/t5701-git-serve.sh
> @@ -12,7 +12,7 @@ test_expect_success 'test capability advertisement' '
>  	cat >expect <<-EOF &&
>  	version 2
>  	agent=git/$(git version | cut -d" " -f3)
> -	ls-refs
> +	ls-refs=unborn
>  	fetch=shallow
>  	server-option
>  	object-format=$(test_oid algo)

Junio C Hamano Jan. 26, 2021, 11:03 p.m. UTC | #2

Junio C Hamano <gitster@pobox.com> writes:

> It feels somewhat brittle that we have to read the same variable and
> apply the same "default to true" logic in two places and have to
> keep them in sync.  Is this because the decision to advertize or not
> has to be made way before the code that is specific to the
> implementation of ls-refs is run?
>
> If ls_refs_advertise() is always called first before ls_refs(), I
> wonder if it makes sense to reuse what we found out about the
> configured (or left unconfigured) state here and use it when
> ls_refs() gets called?  I know that the way serve.c infrastructure
> calls "do we advertise?" helper from each protocol-element handler
> is too narrow and does not allow us to pass such a necessary piece
> of information but I view it as a misdesign that can be corrected
> (and until that happens, we could use file-local static limited to
> ls-refs.c).

After giving the above a bit more thought, here are a few random
thoughts around the area.

 * As "struct protocol_capability" indicates, we have <name of
   service, the logic to advertise, the logic to serve> as a
   three-tuple for services.  The serving logic should know what
   advertising logic advertised (or more precisely, what information
   advertising logic used to make that decision) so that they can
   work consistently.

   For that, there should be a mechanism that advertising logic can
   use to leave a note to serving logic, perhaps by adding a "void
   *" to both of these functions.  The advertising function would
   allocate a piece of memory it wants to use and returns the
   pointer to it to the caller in serve.c, and that pointer is given
   to the corresponding ls_refs() when it is called by serve.c.
   Then ls_refs_advertise can say "I found this configuration
   setting and decided to advertise" to later ls_refs() and the
   latter can say "ah, as you have advertised, I have to respond to
   such a request".

 * I am not sure if "lsrefs.allowunborn = yes/no" is a good way to
   configure this feature.  Wouldn't it be more natural to make this
   three-way, i.e. "lsrefs.unborn = advertise/serve/ignore", where
   the server operator can choose among (1) advertise the presence
   of the capability and respond to requests, (2) do not advertise
   the capability but if a request comes, respond to it, and (3) do
   not advertise and do not respond.  We could throw in 'deny' that
   causes the request to result in a failure but I do not care too
   deeply about that fourth option.

   Using such a configuration mechanism, ls_refs_advertise may leave
   the value of "lsrefs.unborn" (or lack thereof) it found and used
   to base its decision to advertise, for use by ls_refs.  ls_refs
   in turn can use the value found there to decide if it ignores or
   responds to the "unborn" request.

Jeff King Jan. 26, 2021, 11:20 p.m. UTC | #3

On Tue, Jan 26, 2021 at 01:38:49PM -0800, Junio C Hamano wrote:

> > @@ -103,14 +131,31 @@ int ls_refs(struct repository *r, struct strvec *keys,
> >  			data.symrefs = 1;
> >  		else if (skip_prefix(arg, "ref-prefix ", &out))
> >  			strvec_push(&data.prefixes, out);
> > +		else if (data.allow_unborn && !strcmp("unborn", arg))
> > +			data.unborn = 1;
> 
> I think the use of &&-cascade is iffy here.  Even when we are *not*
> accepting request for unborn, we should still parse it as such.
> This does not matter in today's code, but it is a basic courtesy for
> future developers who may add more "else if" after it.
> 
> IOW
> 
> 		else if (!strcmp("unborn", arg)) {
> 			if (!data.allow_unborn)
> 				; /* we are not accepting the request */
> 			else
> 				data.unborn = 1;
> 		}
> 
> I wrote the above in longhand only for documentation purposes; in
> practice, 
> 
> 		else if (!strcmp("unborn", arg))
>                 	data.unborn = data.allow_unborn;
> 
> may suffice.

Doing it that way is friendlier, but loosens enforcement of:

  Client will then send a space separated list of capabilities it wants
  to be in effect. The client MUST NOT ask for capabilities the server
  did not say it supports.

from Documentation/technical/protocol-capabilities.txt.

It does solve Jonathan's racy cluster-deploy problem, though. See the
discussion in the v4 thread (sorry, seems not to have hit the archive
yet, but hopefully this link will work soon):

  https://lore.kernel.org/git/YBCitNb75rpnuW2L@coredump.intra.peff.net/

-Peff

Junio C Hamano Jan. 26, 2021, 11:38 p.m. UTC | #4

Jeff King <peff@peff.net> writes:

> See the
> discussion in the v4 thread (sorry, seems not to have hit the archive
> yet, but hopefully this link will work soon):
>
>   https://lore.kernel.org/git/YBCitNb75rpnuW2L@coredump.intra.peff.net/

I guess vger having some sort of constipation, we (not you-and-me
but list participants as a whole) would be doing this kind of
back-and-force while untable to read what others have said. 

https://lore.kernel.org/git/xmqqmtwvz4g9.fsf@gitster.c.googlers.com/

will have the following.

> It feels somewhat brittle that we have to read the same variable and
> apply the same "default to true" logic in two places and have to
> keep them in sync.  Is this because the decision to advertize or not
> has to be made way before the code that is specific to the
> implementation of ls-refs is run?
>
> If ls_refs_advertise() is always called first before ls_refs(), I
> wonder if it makes sense to reuse what we found out about the
> configured (or left unconfigured) state here and use it when
> ls_refs() gets called?  I know that the way serve.c infrastructure
> calls "do we advertise?" helper from each protocol-element handler
> is too narrow and does not allow us to pass such a necessary piece
> of information but I view it as a misdesign that can be corrected
> (and until that happens, we could use file-local static limited to
> ls-refs.c).

After giving the above a bit more thought, here are a few random
thoughts around the area.

 * As "struct protocol_capability" indicates, we have <name of
   service, the logic to advertise, the logic to serve> as a
   three-tuple for services.  The serving logic should know what
   advertising logic advertised (or more precisely, what information
   advertising logic used to make that decision) so that they can
   work consistently.

   For that, there should be a mechanism that advertising logic can
   use to leave a note to serving logic, perhaps by adding a "void
   *" to both of these functions.  The advertising function would
   allocate a piece of memory it wants to use and returns the
   pointer to it to the caller in serve.c, and that pointer is given
   to the corresponding ls_refs() when it is called by serve.c.
   Then ls_refs_advertise can say "I found this configuration
   setting and decided to advertise" to later ls_refs() and the
   latter can say "ah, as you have advertised, I have to respond to
   such a request".

 * I am not sure if "lsrefs.allowunborn = yes/no" is a good way to
   configure this feature.  Wouldn't it be more natural to make this
   three-way, i.e. "lsrefs.unborn = advertise/serve/ignore", where
   the server operator can choose among (1) advertise the presence
   of the capability and respond to requests, (2) do not advertise
   the capability but if a request comes, respond to it, and (3) do
   not advertise and do not respond.  We could throw in 'deny' that
   causes the request to result in a failure but I do not care too
   deeply about that fourth option.

   Using such a configuration mechanism, ls_refs_advertise may leave
   the value of "lsrefs.unborn" (or lack thereof) it found and used
   to base its decision to advertise, for use by ls_refs.  ls_refs
   in turn can use the value found there to decide if it ignores or
   responds to the "unborn" request.

Ævar Arnfjörð Bjarmason Jan. 27, 2021, 1:28 a.m. UTC | #5

On Tue, Jan 26 2021, Jonathan Tan wrote:

> +If the 'unborn' feature is advertised the following argument can be
> +included in the client's request.
> +
> +    unborn
> +	The server may send symrefs pointing to unborn branches in the form
> +	"unborn <refname> symref-target:<target>".
> +

"branches" as in things under refs/heads/*? What should happen if you
send this for a refs/tags/* or refs/xyz/*? Maybe overly pedantic, but it
seems we have no other explicit mention of refs/{heads,tags}/ in
protocol-v2.txt before this[1].

1. Although as I've learned from another recent thread include-tag is
   magical for refs/tags/* only.

Jonathan Tan Jan. 29, 2021, 8:23 p.m. UTC | #6

> > @@ -47,7 +49,10 @@ static int send_ref(const char *refname, const struct object_id *oid,
> >  	if (!ref_match(&data->prefixes, refname_nons))
> >  		return 0;
> >  
> > -	strbuf_addf(&refline, "%s %s", oid_to_hex(oid), refname_nons);
> > +	if (oid)
> > +		strbuf_addf(&refline, "%s %s", oid_to_hex(oid), refname_nons);
> > +	else
> > +		strbuf_addf(&refline, "unborn %s", refname_nons);
> 
> When a call is made to this helper with NULL "oid", it unconditionally
> sends the "refname" out as an 'unborn' thing.  If data->symrefs is not
> true, or flag does not have REF_ISSYMREF set, then we'd end up
> sending
> 
>     "unborn" SP refname LF
> 
> without any ref-attribute.  The caller is responsible for ensuring
> that it passes sensible data->symrefs and flag when it passes
> oid==NULL to this function, but it is OK because this is a private
> helper.
> 
> OK.

Thanks for checking.

> >  	if (data->symrefs && flag & REF_ISSYMREF) {
> >  		struct object_id unused;
> >  		const char *symref_target = resolve_ref_unsafe(refname, 0,
> > @@ -74,8 +79,30 @@ static int send_ref(const char *refname, const struct object_id *oid,
> >  	return 0;
> >  }
> >  
> > -static int ls_refs_config(const char *var, const char *value, void *data)
> > +static void send_possibly_unborn_head(struct ls_refs_data *data)
> >  {
> > +	struct strbuf namespaced = STRBUF_INIT;
> > +	struct object_id oid;
> > +	int flag;
> > +	int oid_is_null;
> > +
> > +	strbuf_addf(&namespaced, "%sHEAD", get_git_namespace());
> > +	if (!resolve_ref_unsafe(namespaced.buf, 0, &oid, &flag))
> > +		return; /* bad ref */
> > +	oid_is_null = is_null_oid(&oid);
> > +	if (!oid_is_null ||
> > +	    (data->unborn && data->symrefs && (flag & REF_ISSYMREF)))
> > +		send_ref(namespaced.buf, oid_is_null ? NULL : &oid, flag, data);
> 
> And this caller makes sure that send_ref()'s expectation holds.

Thanks for checking.

> > +	strbuf_release(&namespaced);
> > +}
> > +
> > +static int ls_refs_config(const char *var, const char *value, void *cb_data)
> > +{
> > +	struct ls_refs_data *data = cb_data;
> > +
> > +	if (!strcmp("lsrefs.allowunborn", var))
> > +		data->allow_unborn = git_config_bool(var, value);
> > +
> >  	/*
> >  	 * We only serve fetches over v2 for now, so respect only "uploadpack"
> >  	 * config. This may need to eventually be expanded to "receive", but we
> > @@ -91,7 +118,8 @@ int ls_refs(struct repository *r, struct strvec *keys,
> >  
> >  	memset(&data, 0, sizeof(data));
> >  
> > -	git_config(ls_refs_config, NULL);
> > +	data.allow_unborn = 1;
> > +	git_config(ls_refs_config, &data);
> 
> The above is a usual sequence of "an unspecified allow-unborn
> defaults to true, but the configuration can turn it off".  OK

Later, you address this issue again so I'll comment there.

> > @@ -103,14 +131,31 @@ int ls_refs(struct repository *r, struct strvec *keys,
> >  			data.symrefs = 1;
> >  		else if (skip_prefix(arg, "ref-prefix ", &out))
> >  			strvec_push(&data.prefixes, out);
> > +		else if (data.allow_unborn && !strcmp("unborn", arg))
> > +			data.unborn = 1;
> 
> I think the use of &&-cascade is iffy here.  Even when we are *not*
> accepting request for unborn, we should still parse it as such.
> This does not matter in today's code, but it is a basic courtesy for
> future developers who may add more "else if" after it.
> 
> IOW
> 
> 		else if (!strcmp("unborn", arg)) {
> 			if (!data.allow_unborn)
> 				; /* we are not accepting the request */
> 			else
> 				data.unborn = 1;
> 		}
> 
> I wrote the above in longhand only for documentation purposes; in
> practice, 
> 
> 		else if (!strcmp("unborn", arg))
>                 	data.unborn = data.allow_unborn;
> 
> may suffice.

My thinking was (and is) that falling through in the case of a
disallowed argument (as opposed to a completely unrecognized argument)
makes it more straightforward later if we ever decide to tighten
validation of the ls-refs request - we would only have to put some code
at the end that reports back to the user.

If we write it as you suggest, we would have to remember to replace the
"we are not accepting the request" part (as in the comment in your
suggested code) with an error report, but perhaps that is a good thing -
we would be able to insert a custom error message instead of an
information-hiding "argument not supported".

I'm OK either way.

> >  	}
> >  
> >  	if (request->status != PACKET_READ_FLUSH)
> >  		die(_("expected flush after ls-refs arguments"));
> >  
> > -	head_ref_namespaced(send_ref, &data);
> > +	send_possibly_unborn_head(&data);
> >  	for_each_namespaced_ref(send_ref, &data);
> 
> And here is another caller of send_ref().  Are we sure that
> send_ref()'s expectation is satisfied by this caller when the
> iteration encounters a broken ref (e.g. refs/heads/broken not a
> symref but names an object that does not exist and get_sha1()
> yielding 0{40}), or a dangling symref (e.g. refs/remotes/origin/HEAD
> pointing at something that does not exist)?

I assume that by "this caller" you mean for_each_namespaced_ref(), since
you mention an iteration. I believe so - send_ref has been changed to
tolerate a NULL (as in (void*)0, not 0{40}) oid, and that is the only
change, so if it worked previously, it should still work now.

> >  	packet_flush(1);
> >  	strvec_clear(&data.prefixes);
> >  	return 0;
> >  }
> > +
> > +int ls_refs_advertise(struct repository *r, struct strbuf *value)
> > +{
> > +	if (value) {
> > +		int allow_unborn_value;
> > +
> > +		if (repo_config_get_bool(the_repository,
> > +					 "lsrefs.allowunborn",
> > +					 &allow_unborn_value) ||
> > +		    allow_unborn_value)
> > +			strbuf_addstr(value, "unborn");
> > +	}
> 
> This reads "when not explicitly disabled, stuff "unborn" in there".
> 
> It feels somewhat brittle that we have to read the same variable and
> apply the same "default to true" logic in two places and have to
> keep them in sync.  Is this because the decision to advertize or not
> has to be made way before the code that is specific to the
> implementation of ls-refs is run?
> 
> If ls_refs_advertise() is always called first before ls_refs(), I
> wonder if it makes sense to reuse what we found out about the
> configured (or left unconfigured) state here and use it when
> ls_refs() gets called?  I know that the way serve.c infrastructure
> calls "do we advertise?" helper from each protocol-element handler
> is too narrow and does not allow us to pass such a necessary piece
> of information but I view it as a misdesign that can be corrected
> (and until that happens, we could use file-local static limited to
> ls-refs.c).

Perhaps what I could do is have a static variable that tracks whether
config has been read and what the config is (or if the default variable
is used), and have each function call another function that sets that
static variable if config has not yet been read. I think that will
address this concern.

Junio C Hamano Jan. 29, 2021, 10:04 p.m. UTC | #7

Jonathan Tan <jonathantanmy@google.com> writes:

>> I think the use of &&-cascade is iffy here.  Even when we are *not*
>> accepting request for unborn, we should still parse it as such.
>> This does not matter in today's code, but it is a basic courtesy for
>> future developers who may add more "else if" after it.
>> 
>> IOW
>> 
>> 		else if (!strcmp("unborn", arg)) {
>> 			if (!data.allow_unborn)
>> 				; /* we are not accepting the request */
>> 			else
>> 				data.unborn = 1;
>> 		}
>> 
>> I wrote the above in longhand only for documentation purposes; in
>> practice, 
>> 
>> 		else if (!strcmp("unborn", arg))
>>                 	data.unborn = data.allow_unborn;
>> 
>> may suffice.
>
> My thinking was (and is) that falling through in the case of a
> disallowed argument (as opposed to a completely unrecognized argument)
> makes it more straightforward later if we ever decide to tighten
> validation of the ls-refs request - we would only have to put some code
> at the end that reports back to the user.

Sorry, I do not quite follow.  If "unborn" is conditionally allowed,
you can extend what I suggested above like so:

	if (we see we got an unborn request) {
-		if (allowed)
+		if (partially allowed)
+			record that we got unborn request and will
+			partially respond to it
+		else if (allowed)
			record that we got unborn request;
+		else
+			report that we don't accept unborn request;
	}

This will matter even more if you write more else-if.  The
downstream of else-if clauses are forced to interpret (and fail)
"unborn" request they are not interested in.

>> >  	if (request->status != PACKET_READ_FLUSH)
>> >  		die(_("expected flush after ls-refs arguments"));
>> >  
>> > -	head_ref_namespaced(send_ref, &data);
>> > +	send_possibly_unborn_head(&data);
>> >  	for_each_namespaced_ref(send_ref, &data);
>> 
>> And here is another caller of send_ref().  Are we sure that
>> send_ref()'s expectation is satisfied by this caller when the
>> iteration encounters a broken ref (e.g. refs/heads/broken not a
>> symref but names an object that does not exist and get_sha1()
>> yielding 0{40}), or a dangling symref (e.g. refs/remotes/origin/HEAD
>> pointing at something that does not exist)?
>
> I assume that by "this caller" you mean for_each_namespaced_ref(), since
> you mention an iteration. I believe so - send_ref has been changed to
> tolerate a NULL (as in (void*)0, not 0{40}) oid, and that is the only
> change, so if it worked previously, it should still work now.

So a dangling symref, e.g. "refs/remotes/origin/HEAD -> trunk" when
no "refs/remotes/origin/trunk" exists, is not reported to send_ref()
in the same way as an unborn "HEAD"?  I would have expected that we'd
report where it points at, and for that to work, you'd have to use
not just the vanilla send_ref() as the callback, but something that
knows how to do "are we expected to send unborn symrefs" logic, like
send_possibly_unborn_head does.

That "changed to tolerate ... should work" worries me.

If "for_each_namespaced_ref(send_ref, &data)" will never call send_ref()
with NULL (as in (void *)0) oid, then that would be OK, but if it
ends up calling with NULL somehow, it is responsible to ensure that
data->symrefs is true and flag has REF_ISSYMREF set, or send_ref()
would misbehave, (see the first part of your message, which I am
responding to), no?

Jonathan Tan Jan. 30, 2021, 3:55 a.m. UTC | #8

> Junio C Hamano <gitster@pobox.com> writes:
> 
> > It feels somewhat brittle that we have to read the same variable and
> > apply the same "default to true" logic in two places and have to
> > keep them in sync.  Is this because the decision to advertize or not
> > has to be made way before the code that is specific to the
> > implementation of ls-refs is run?
> >
> > If ls_refs_advertise() is always called first before ls_refs(), I
> > wonder if it makes sense to reuse what we found out about the
> > configured (or left unconfigured) state here and use it when
> > ls_refs() gets called?  I know that the way serve.c infrastructure
> > calls "do we advertise?" helper from each protocol-element handler
> > is too narrow and does not allow us to pass such a necessary piece
> > of information but I view it as a misdesign that can be corrected
> > (and until that happens, we could use file-local static limited to
> > ls-refs.c).
> 
> After giving the above a bit more thought, here are a few random
> thoughts around the area.
> 
>  * As "struct protocol_capability" indicates, we have <name of
>    service, the logic to advertise, the logic to serve> as a
>    three-tuple for services.  The serving logic should know what
>    advertising logic advertised (or more precisely, what information
>    advertising logic used to make that decision) so that they can
>    work consistently.
> 
>    For that, there should be a mechanism that advertising logic can
>    use to leave a note to serving logic, perhaps by adding a "void
>    *" to both of these functions.  The advertising function would
>    allocate a piece of memory it wants to use and returns the
>    pointer to it to the caller in serve.c, and that pointer is given
>    to the corresponding ls_refs() when it is called by serve.c.
>    Then ls_refs_advertise can say "I found this configuration
>    setting and decided to advertise" to later ls_refs() and the
>    latter can say "ah, as you have advertised, I have to respond to
>    such a request".

Usually the advertising is in the same file as the serving (true for
ls-refs and fetch, so far) so I think it's easier if they just
communicate on their own instead of through this "void *". I agree with
the communication idea, though.

>  * I am not sure if "lsrefs.allowunborn = yes/no" is a good way to
>    configure this feature.  Wouldn't it be more natural to make this
>    three-way, i.e. "lsrefs.unborn = advertise/serve/ignore", where
>    the server operator can choose among (1) advertise the presence
>    of the capability and respond to requests, (2) do not advertise
>    the capability but if a request comes, respond to it, and (3) do
>    not advertise and do not respond.  We could throw in 'deny' that
>    causes the request to result in a failure but I do not care too
>    deeply about that fourth option.
> 
>    Using such a configuration mechanism, ls_refs_advertise may leave
>    the value of "lsrefs.unborn" (or lack thereof) it found and used
>    to base its decision to advertise, for use by ls_refs.  ls_refs
>    in turn can use the value found there to decide if it ignores or
>    responds to the "unborn" request.

lsrefs.unborn = advertise/serve/ignore was how it was in version 2 [1]
(with different names) and I changed it due to Peff's suggestion, but
perhaps I landed up in the unhappy middle [2]. I think we should just
pick a standard and use it for this feature and whatever future features
may come. The distinction between advertise and serve ("allow" in my
version 2) is useful in some cases but is useless once migration has
occurred (and thus clutters the code), but perhaps it could be argued
that all servers need to do the migration at least once.

[1] https://lore.kernel.org/git/cover.1608084282.git.jonathantanmy@google.com/
[2] https://lore.kernel.org/git/YBCitNb75rpnuW2L@coredump.intra.peff.net/

Jonathan Tan Jan. 30, 2021, 4:04 a.m. UTC | #9

> 
> On Tue, Jan 26 2021, Jonathan Tan wrote:
> 
> > +If the 'unborn' feature is advertised the following argument can be
> > +included in the client's request.
> > +
> > +    unborn
> > +	The server may send symrefs pointing to unborn branches in the form
> > +	"unborn <refname> symref-target:<target>".
> > +
> 
> "branches" as in things under refs/heads/*? What should happen if you
> send this for a refs/tags/* or refs/xyz/*? Maybe overly pedantic, but it
> seems we have no other explicit mention of refs/{heads,tags}/ in
> protocol-v2.txt before this[1].
> 
> 1. Although as I've learned from another recent thread include-tag is
>    magical for refs/tags/* only.

Thanks for spotting this. Right now the server sends anything, but the
client only uses the information if it is a branch. I think this is the
most flexible approach so I'll keep it this way and document it
explicitly.

Jonathan Tan Feb. 2, 2021, 2:20 a.m. UTC | #10

> So a dangling symref, e.g. "refs/remotes/origin/HEAD -> trunk" when
> no "refs/remotes/origin/trunk" exists, is not reported to send_ref()
> in the same way as an unborn "HEAD"?

I've tried it, and yes, for_each_namespaced_ref() will not report it.
Looking at the code, I think it is packed_ref_iterator_advance() which
checks for broken objects and skips over them.

> I would have expected that we'd
> report where it points at, and for that to work, you'd have to use
> not just the vanilla send_ref() as the callback, but something that
> knows how to do "are we expected to send unborn symrefs" logic, like
> send_possibly_unborn_head does.
> 
> That "changed to tolerate ... should work" worries me.
> 
> If "for_each_namespaced_ref(send_ref, &data)" will never call send_ref()
> with NULL (as in (void *)0) oid, then that would be OK,

If it called send_ref() with (void *)0 OID, it would segfault with the
current code, which calls oid_to_hex() on the OID unconditionally.

> but if it
> ends up calling with NULL somehow, it is responsible to ensure that
> data->symrefs is true and flag has REF_ISSYMREF set, or send_ref()
> would misbehave, (see the first part of your message, which I am
> responding to), no?

If it did, then yes.

Junio C Hamano Feb. 2, 2021, 5 a.m. UTC | #11

Jonathan Tan <jonathantanmy@google.com> writes:

>> So a dangling symref, e.g. "refs/remotes/origin/HEAD -> trunk" when
>> no "refs/remotes/origin/trunk" exists, is not reported to send_ref()
>> in the same way as an unborn "HEAD"?
>
> I've tried it, and yes, for_each_namespaced_ref() will not report it.
> Looking at the code, I think it is packed_ref_iterator_advance() which
> checks for broken objects and skips over them.
>
>> I would have expected that we'd
>> report where it points at, and for that to work, you'd have to use
>> not just the vanilla send_ref() as the callback, but something that
>> knows how to do "are we expected to send unborn symrefs" logic, like
>> send_possibly_unborn_head does.
>> 
>> That "changed to tolerate ... should work" worries me.
>> 
>> If "for_each_namespaced_ref(send_ref, &data)" will never call send_ref()
>> with NULL (as in (void *)0) oid, then that would be OK,
>
> If it called send_ref() with (void *)0 OID, it would segfault with the
> current code, which calls oid_to_hex() on the OID unconditionally.
>
>> but if it
>> ends up calling with NULL somehow, it is responsible to ensure that
>> data->symrefs is true and flag has REF_ISSYMREF set, or send_ref()
>> would misbehave, (see the first part of your message, which I am
>> responding to), no?
>
> If it did, then yes.

So, in other words, the series is *only* about HEAD and no other
symrefs are reported when dangling as "unborn"?

[v5,1/3] ls-refs: report unborn targets of symrefs

Commit Message

Comments

Patch