diff mbox series

[3/4] promisor-remote: allow a server to advertise extra fields

Message ID 20250414160343.2216312-4-christian.couder@gmail.com (mailing list archive)
State New
Headers show
Series Make the "promisor-remote" capability support extra fields | expand

Commit Message

Christian Couder April 14, 2025, 4:03 p.m. UTC
For now the "promisor-remote" protocol capability can only pass "name"
and "url" information from a server to a client in the form
"name=<remote_name>,url=<remote_url>".

Let's make it possible to pass more information by introducing a new
"promisor.sendExtraFields" configuration variable. This variable
should contain a comma or space separated list of fields that will be
looked up in the configuration of the remote on the server to find the
values that will be passed to the client.

For example if "promisor.sendExtraFields" is set to "partialCloneFilter",
and the server has the "remote.<name>.partialCloneFilter" config
variable set to a value for a remote, then that value will be passed
in the form "partialCloneFilter=<value>" after the "name" and "url"
fields.

A following commit will allow the client to use the extra information
to decide if it accepts the remote or not. For now the client doesn't
do anything with the extra information it receives.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/config/promisor.adoc    | 15 ++++++
 Documentation/gitprotocol-v2.adoc     | 42 +++++++++------
 promisor-remote.c                     | 75 ++++++++++++++++++++++++---
 t/t5710-promisor-remote-capability.sh | 32 ++++++++++++
 4 files changed, 139 insertions(+), 25 deletions(-)

Comments

Junio C Hamano April 14, 2025, 10:04 p.m. UTC | #1
Christian Couder <christian.couder@gmail.com> writes:

>  
> +promisor.sendExtraFields::
> +	A comma or space separated list of additional remote related
> +	fields that a server will send while advertising its promisor
> +	remotes using the "promisor-remote" capability, see
> +	linkgit:gitprotocol-v2[5]. When a field named "bar" is part of
> +	this list and a corresponding "remote.foo.bar" config variable
> +	is set on the server to a non empty value, for example "baz",
> +	then the field and its value, so "bar=baz", will be sent when
> +	advertising the promisor remote "foo". This list has no effect
> +	unless the "promisor.advertise" config variable, see above, is
> +	set to "true", and if that's the case, then whatever this list
> +	contains, the "name" and "url" fields are advertised anyway
> +	and contain the remote name and URL respectively, so there is
> +	no need to add "name" or "url" to this list.

As a description of overall syntax of the protocol, this and ...


>  promisor.acceptFromServer::
>  	If set to "all", a client will accept all the promisor remotes
>  	a server might advertise using the "promisor-remote"
> diff --git a/Documentation/gitprotocol-v2.adoc b/Documentation/gitprotocol-v2.adoc
> index 5598c93e67..f649745837 100644
> --- a/Documentation/gitprotocol-v2.adoc
> +++ b/Documentation/gitprotocol-v2.adoc
> @@ -785,33 +785,39 @@ retrieving the header from a bundle at the indicated URI, and thus
>  save themselves and the server(s) the request(s) needed to inspect the
>  headers of that bundle or bundles.
>  
> -promisor-remote=<pr-infos>
> +promisor-remote=<pr-info>
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~
>  
>  The server may advertise some promisor remotes it is using or knows
>  about to a client which may want to use them as its promisor remotes,
> -instead of this repository. In this case <pr-infos> should be of the
> +instead of this repository. In this case <pr-info> should be of the
>  form:
>  
> -	pr-infos = pr-info | pr-infos ";" pr-info
> +	pr-info = pr-fields | pr-info ";" pr-info
>  
> -	pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url
> +	pr-fields = fld-key "=" fld-value | pr-fields "," pr-fields
>  
> -where `pr-name` is the urlencoded name of a promisor remote, and
> -`pr-url` the urlencoded URL of that promisor remote.
> +where all the `fld-key` and `fld-value` in a given `pr-fields` are
> +field keys and values related to a single promisor remote.

... this may work, but as we are defining protocol between two
parties, in order to ensure interoperable reimplementations, we need
to also specify semantics, what are the defined "fields", and what
each of them mean.  Proposing nebulous "with this framework your
imagination is the limit, you can invent anything" may work for
other parts of the system, but not for the part that is about
communication between two repositories.

IOW, we shouldn't be internally calling these "extra".  From the
point of view of "core" they may be "extra", but to the developers
and certainly to the end-users, they shouldn't be "extra" at all.
They are all supported parts of the system with defined semantics,
right?

Another reason why I hate seeing this nebulous "with this, we can
send anything extra" is because such a thing will have a wrong
security posture.  If we truly *need* to be able to carry
*anything*, we need to make sure how values are quoted/escaped, and
the code for dequoting/unescaping are robustly written to avoid
passing malformed input and misinterpreting it as something else,
which would give a new attack vector.  If we can enumerate supported
fields, their syntax and their possible values, we can make the
attack surface a lot smaller.

Thanks.
diff mbox series

Patch

diff --git a/Documentation/config/promisor.adoc b/Documentation/config/promisor.adoc
index 2638b01f83..bc08999733 100644
--- a/Documentation/config/promisor.adoc
+++ b/Documentation/config/promisor.adoc
@@ -9,6 +9,21 @@  promisor.advertise::
 	"false", which means the "promisor-remote" capability is not
 	advertised.
 
+promisor.sendExtraFields::
+	A comma or space separated list of additional remote related
+	fields that a server will send while advertising its promisor
+	remotes using the "promisor-remote" capability, see
+	linkgit:gitprotocol-v2[5]. When a field named "bar" is part of
+	this list and a corresponding "remote.foo.bar" config variable
+	is set on the server to a non empty value, for example "baz",
+	then the field and its value, so "bar=baz", will be sent when
+	advertising the promisor remote "foo". This list has no effect
+	unless the "promisor.advertise" config variable, see above, is
+	set to "true", and if that's the case, then whatever this list
+	contains, the "name" and "url" fields are advertised anyway
+	and contain the remote name and URL respectively, so there is
+	no need to add "name" or "url" to this list.
+
 promisor.acceptFromServer::
 	If set to "all", a client will accept all the promisor remotes
 	a server might advertise using the "promisor-remote"
diff --git a/Documentation/gitprotocol-v2.adoc b/Documentation/gitprotocol-v2.adoc
index 5598c93e67..f649745837 100644
--- a/Documentation/gitprotocol-v2.adoc
+++ b/Documentation/gitprotocol-v2.adoc
@@ -785,33 +785,39 @@  retrieving the header from a bundle at the indicated URI, and thus
 save themselves and the server(s) the request(s) needed to inspect the
 headers of that bundle or bundles.
 
-promisor-remote=<pr-infos>
+promisor-remote=<pr-info>
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 The server may advertise some promisor remotes it is using or knows
 about to a client which may want to use them as its promisor remotes,
-instead of this repository. In this case <pr-infos> should be of the
+instead of this repository. In this case <pr-info> should be of the
 form:
 
-	pr-infos = pr-info | pr-infos ";" pr-info
+	pr-info = pr-fields | pr-info ";" pr-info
 
-	pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url
+	pr-fields = fld-key "=" fld-value | pr-fields "," pr-fields
 
-where `pr-name` is the urlencoded name of a promisor remote, and
-`pr-url` the urlencoded URL of that promisor remote.
+where all the `fld-key` and `fld-value` in a given `pr-fields` are
+field keys and values related to a single promisor remote.
 
-In this case, if the client decides to use one or more promisor
-remotes the server advertised, it can reply with
-"promisor-remote=<pr-names>" where <pr-names> should be of the form:
+The server MUST advertise at least the "name" and "url" field keys
+along with the associated field values, which are the name of a valid
+remote and its URL, in each `pr-fields`.
 
-	pr-names = pr-name | pr-names ";" pr-name
+`fld-key` MUST start with an alphabetic character and contain only
+alphanumeric or '-' characters, and `fld-value` MUST be urlencoded.
+
+If the client decides to use one or more promisor remotes the server
+advertised, it can reply with "promisor-remote=<pr-names>" where
+<pr-names> should be of the form:
+
+	pr-names = pr-name | pr-names ";" pr-names
 
 where `pr-name` is the urlencoded name of a promisor remote the server
 advertised and the client accepts.
 
-Note that, everywhere in this document, `pr-name` MUST be a valid
-remote name, and the ';' and ',' characters MUST be encoded if they
-appear in `pr-name` or `pr-url`.
+Note that, everywhere in this document, the ';' and ',' characters
+MUST be encoded if they appear in `pr-name` or `fld-value`.
 
 If the server doesn't know any promisor remote that could be good for
 a client to use, or prefers a client not to use any promisor remote it
@@ -822,10 +828,12 @@  In this case, or if the client doesn't want to use any promisor remote
 the server advertised, the client shouldn't advertise the
 "promisor-remote" capability at all in its reply.
 
-The "promisor.advertise" and "promisor.acceptFromServer" configuration
-options can be used on the server and client side to control what they
-advertise or accept respectively. See the documentation of these
-configuration options for more information.
+On the server side, the "promisor.advertise" and
+"promisor.sendExtraFields" configuration options can be used to
+control what it advertises. On the client side, the
+"promisor.acceptFromServer" configuration option can be used to
+control what it accepts. See the documentation of these configuration
+options for more information.
 
 Note that in the future it would be nice if the "promisor-remote"
 protocol capability could be used by the server, when responding to
diff --git a/promisor-remote.c b/promisor-remote.c
index 0fb07f25af..424d88d4a1 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -314,6 +314,65 @@  static int allow_unsanitized(char ch)
 	return ch > 32 && ch < 127;
 }
 
+/*
+ * A valid extra field "foo" should correspond to a
+ * "remote.<name>.foo" config variable, so, like config variables
+ * keys, it should start with an alphabetic character and otherwise
+ * each character should satisfy is_config_key_char().
+ */
+static int valid_extra_field(struct string_list_item *item, void *cb_data)
+{
+	const char *field = item->string;
+	const char *config_key = (const char *)cb_data;
+
+	for (size_t i = 0; field[i]; i++)
+		if (i ? !is_config_key_char(field[i]) : !isalpha(field[i])) {
+			warning(_("invalid field '%s' in '%s' config"), field, config_key);
+			return 0;
+		}
+	return 1;
+}
+
+static char *fields_from_config(struct string_list *fields_list, const char *config_key)
+{
+	char *extras = NULL;
+
+	if (!git_config_get_string(config_key, &extras) && *extras) {
+		string_list_split_in_place(fields_list, extras, ", ", -1);
+		filter_string_list(fields_list, 0, valid_extra_field, (void *)config_key);
+	}
+
+	return extras;
+}
+
+static struct string_list *extra_fields_sent(void)
+{
+	static struct string_list fields_list = STRING_LIST_INIT_NODUP;
+	static int initialized = 0;
+
+	if (!initialized) {
+		fields_from_config(&fields_list, "promisor.sendExtraFields");
+		initialized = 1;
+	}
+
+	return &fields_list;
+}
+
+static void append_extra_fields(struct string_list *fields,
+				struct string_list *extra_fields,
+				const char *name)
+{
+	struct string_list_item *item;
+
+	for_each_string_list_item(item, extra_fields) {
+		char *key = xstrfmt("remote.%s.%s", name, item->string);
+		const char *val;
+		if (!git_config_get_string_tmp(key, &val) && *val)
+			string_list_append(fields, item->string)->util = (char *)val;
+		free(key);
+	}
+}
+
 /*
  * Linked list for promisor remotes.
  *
@@ -342,7 +401,8 @@  static void free_info_list(struct promisor_info *p)
  * remotes. For each promisor remote, some of its fields, starting
  * with "name" and "url", are put in the 'fields' string_list.
  */
-static struct promisor_info *promisor_info_list(struct repository *repo)
+static struct promisor_info *promisor_info_list(struct repository *repo,
+						struct string_list *extra_fields)
 {
 	struct promisor_info *infos = NULL;
 	struct promisor_info **last_info = &infos;
@@ -363,6 +423,9 @@  static struct promisor_info *promisor_info_list(struct repository *repo)
 			string_list_append(&new_info->fields, "name")->util = (char *)r->name;
 			string_list_append(&new_info->fields, "url")->util = (char *)url;
 
+			if (extra_fields)
+				append_extra_fields(&new_info->fields, extra_fields, r->name);
+
 			*last_info = new_info;
 			last_info = &new_info->next;
 		}
@@ -385,7 +448,7 @@  char *promisor_remote_info(struct repository *repo)
 	if (!advertise_promisors)
 		return NULL;
 
-	info_list = promisor_info_list(repo);
+	info_list = promisor_info_list(repo, extra_fields_sent());
 
 	if (!info_list)
 		return NULL;
@@ -502,7 +565,7 @@  static void filter_promisor_remote(struct repository *repo,
 		return;
 
 	if (accept != ACCEPT_ALL)
-		info_list = promisor_info_list(repo);
+		info_list = promisor_info_list(repo, NULL);
 
 	/* Parse remote info received */
 
@@ -519,13 +582,9 @@  static void filter_promisor_remote(struct repository *repo,
 		elems = strbuf_split(remotes[i], ',');
 
 		for (size_t j = 0; elems[j]; j++) {
-			int res;
 			strbuf_strip_suffix(elems[j], ",");
-			res = skip_prefix(elems[j]->buf, "name=", &remote_name) ||
+			if (!skip_prefix(elems[j]->buf, "name=", &remote_name))
 				skip_prefix(elems[j]->buf, "url=", &remote_url);
-			if (!res)
-				warning(_("unknown element '%s' from remote info"),
-					elems[j]->buf);
 		}
 
 		if (remote_name)
diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index b35b774235..26f3c63112 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -289,6 +289,38 @@  test_expect_success "clone with 'KnownUrl' and empty url, so not advertised" '
 	check_missing_objects server 1 "$oid"
 '
 
+test_expect_success "clone with promisor.sendExtraFields" '
+	git -C server config promisor.advertise true &&
+	test_when_finished "rm -rf client" &&
+
+	git -C server remote add otherLop "https://invalid.invalid"  &&
+	git -C server config remote.otherLop.id "fooBar" &&
+	git -C server config remote.otherLop.stuff "baz" &&
+	git -C server config remote.otherLop.partialCloneFilter "blob:limit=10k" &&
+	test_when_finished "git -C server remote remove otherLop" &&
+	git -C server config promisor.sendExtraFields "id, partialCloneFilter" &&
+	test_when_finished "git -C server config unset promisor.sendExtraFields" &&
+	test_when_finished "rm trace" &&
+
+	# Clone from server to create a client
+	GIT_TRACE_PACKET="$(pwd)/trace" GIT_NO_LAZY_FETCH=0 git clone \
+		-c remote.lop.promisor=true \
+		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+		-c remote.lop.url="file://$(pwd)/lop" \
+		-c promisor.acceptfromserver=All \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that extra fields are properly transmitted
+	ENCODED_URL=$(echo "file://$(pwd)/lop" | sed -e "s/ /%20/g") &&
+	PR1="name=lop,url=$ENCODED_URL,partialCloneFilter=blob:none" &&
+	PR2="name=otherLop,url=https://invalid.invalid,id=fooBar,partialCloneFilter=blob:limit=10k" &&
+	test_grep "clone< promisor-remote=$PR1;$PR2" trace &&
+	test_grep "clone> promisor-remote=lop;otherLop" trace &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
 test_expect_success "clone with promisor.advertise set to 'true' but don't delete the client" '
 	git -C server config promisor.advertise true &&