diff mbox series

[v3] promisor-remote: fix segfault when remote URL is missing

Message ID 20250312114628.2744747-1-christian.couder@gmail.com (mailing list archive)
State New
Headers show
Series [v3] promisor-remote: fix segfault when remote URL is missing | expand

Commit Message

Christian Couder March 12, 2025, 11:46 a.m. UTC
Using strvec_push() to push `NULL` into a 'strvec' results in a
segfault, because `xstrdup(NULL)` crashes.

So when an URL is missing from the config, let's push the remote name
instead of `NULL` into the 'strvec' that stores URLs. This is
similar to what other Git commands do. For example `git fetch` uses
the remote name to fetch if no url has been configured. Similarly
`git remote get-url` reports the remote name if no url has been
configured.

We leave improving the strvec API and/or xstrdup() for a future
separate effort.

Note that an empty URL can still be configured using something like
`git remote add foo ""`.

While at it, let's warn and reject the remote, in the 'KnownUrl' case,
when no URL or an empty URL is advertised by the server, or when an
empty URL is configured on the client for a remote name advertised by
the server and configured on the client. This is on par with a warning
already emitted when URLs are different in the same case.

Let's also warn if the remote is rejected because name and url are the
same, as it could mean the url has not been configured.

While at it, let's also use git_config_get_string_tmp() instead of
git_config_get_string() to simplify memory management.

Also let's spell "URL" with uppercase letters in all the warnings.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 promisor-remote.c                     | 46 ++++++++++++++++++---
 t/t5710-promisor-remote-capability.sh | 59 +++++++++++++++++++++++++++
 2 files changed, 100 insertions(+), 5 deletions(-)

Comments

Junio C Hamano March 12, 2025, 5:02 p.m. UTC | #1
Christian Couder <christian.couder@gmail.com> writes:

>  	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
> +		const char *url;
>  		char *url_key = xstrfmt("remote.%s.url", r->name);
>  
>  		strvec_push(names, r->name);
>  
> +		/*
> +		 * No URL defaults to the name of the remote, like
> +		 * elsewhere in Git (e.g. `git fetch` or `git remote
> +		 * get-url`). It's still possible that an empty URL is
> +		 * configured.
> +		 */

Not a huge deal as it is not telling any lies, but does the second
sentence need to be said?  An element in the urls strvec being an
empty string is not all that more interesting than it being an
incorrect or malformed URL to those who are reading this piece of
code, is it?  It is also possible that an unreachable URL or
misspelt URL is configured, but it is not a job of this piece of
code to worry about them, just like it is none of the business of
this code if the configured URL is an empty string, no?

> +		strvec_push(urls, git_config_get_string_tmp(url_key, &url) ? r->name : url);

More on this below.  Unlike "git fetch" and "git push" used as the
source and destination, the remote URL used in this context are
exposed to the outside world, and I am not sure the usual r->name
fallback makes sense.

>  		free(url_key);
>  	}
>  }
> @@ -356,7 +362,7 @@ char *promisor_remote_info(struct repository *repo)
>  			strbuf_addch(&sb, ';');
>  		strbuf_addstr(&sb, "name=");
>  		strbuf_addstr_urlencode(&sb, names.v[i], allow_unsanitized);
> -		if (urls.v[i]) {
> +		if (*urls.v[i]) {
>  			strbuf_addstr(&sb, ",url=");
>  			strbuf_addstr_urlencode(&sb, urls.v[i], allow_unsanitized);

We used to advertise an empty string name to the other end, but we
no longer do, which is a good hygiene to be strict on what we send
out.

But now our updated promisor_info_vecs() pushes our local name
r->name as a fallback. The idea of r->name fallback is to use it as
a local directory path for "git fetch" and friends, but the local
pathname has no meaning to the other side, does it?  Is it something
we want to let the other side even know???

What other uses do the name/url vectors prepared by
promisor_info_vecs() have?  Is it that we use them only to advertise
with this code, and then match with what they advertise?  If we are
not using these names and urls locally to fetch from in code paths,
I am inclined to suggest that promisor_info_vecs() should not shove
these fallback URLs (local directory name implicitly inferred) into
the names/urls vectors.

On the other hand, if other callsites that use the names/urls
obtained from that function do want to see such local pathnames, we
cannot lose information at the source, so we'd somehow need to
filter them at various places, I guess.  And this place that builds
up the string to be sent as capability response should be one of
these places that must filter.

> @@ -409,12 +415,42 @@ static int should_accept_remote(enum accept_promisor accept,
>  	if (accept != ACCEPT_KNOWN_URL)
>  		BUG("Unhandled 'enum accept_promisor' value '%d'", accept);
>  
> +	if (!remote_url) {
> +		warning(_("no URL advertised for remote '%s'"), remote_name);
> +		return 0;
> +	}

Except for the above "no URL advertised" warning and returning,
which is absolutely a good thing to do, I am still not sure how
relevant various checks for an empty string new code added by this
patch makes are ...

> +	if (!*remote_url) {
> +		/*
> +		 * This shouldn't happen with a Git server, but not
> +		 * sure how other servers will be implemented in the
> +		 * future.
> +		 */
> +		warning(_("empty URL advertised for remote '%s'"), remote_name);
> +		return 0;
> +	}
> +
> +	if (!*urls->v[i]) {
> +		warning(_("empty URL configured for remote '%s'"), remote_name);
> +		return 0;
> +	}
> +

... would it be so different to pass an empty string as to pass a
misspelt URL received from the other end?  Wouldn't the end result
the same (i.e., we thought we had a URL usable as a promisor remote,
but it turns out that we cannot reach it)?

>  	if (!strcmp(urls->v[i], remote_url))
>  		return 1;

Past this point, I am not sure what the points of these checks and
warnings are; even with these "problematic" remote_name and remote_url
combinations these warnings attempt to warn against are used, as long
as the above check said it is OK, we'd silently said "should accept"
already to the caller.

> -	warning(_("known remote named '%s' but with url '%s' instead of '%s'"),
> +	warning(_("known remote named '%s' but with URL '%s' instead of '%s'"),
>  		remote_name, urls->v[i], remote_url);
>  
> +	if (!strcmp(remote_name, urls->v[i]))

The 'i' was obtained by calling remote_nick_find(), which uses
strcasecmp() to find named remote (which I doubt it is a sensible
design by the way).  This code should be consistent with whatever
comparison used there.

> +		warning(_("remote name and URL are the same '%s', "
> +			  "maybe the URL is not configured locally"),
> +			remote_name);
> +
> +	if (!strcmp(remote_name, remote_url))

This is matching what r->name fallback did so it is correct to be
strcmp().  But (1) it may be way too late after the above "return
1", and (2) if we are *not* going to use it, perhaps we shouldn't
place it in the resulting strvec from promisor_info_vecs() in the
first place?

> +		warning(_("remote name and URL are the same '%s', "
> +			  "maybe the URL is not configured on the remote side"),
> +			remote_name);
> +
>  	return 0;
>  }
diff mbox series

Patch

diff --git a/promisor-remote.c b/promisor-remote.c
index 6a0a61382f..479b9a27af 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -323,13 +323,19 @@  static void promisor_info_vecs(struct repository *repo,
 	promisor_remote_init(repo);
 
 	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
-		char *url;
+		const char *url;
 		char *url_key = xstrfmt("remote.%s.url", r->name);
 
 		strvec_push(names, r->name);
-		strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);
 
-		free(url);
+		/*
+		 * No URL defaults to the name of the remote, like
+		 * elsewhere in Git (e.g. `git fetch` or `git remote
+		 * get-url`). It's still possible that an empty URL is
+		 * configured.
+		 */
+		strvec_push(urls, git_config_get_string_tmp(url_key, &url) ? r->name : url);
+
 		free(url_key);
 	}
 }
@@ -356,7 +362,7 @@  char *promisor_remote_info(struct repository *repo)
 			strbuf_addch(&sb, ';');
 		strbuf_addstr(&sb, "name=");
 		strbuf_addstr_urlencode(&sb, names.v[i], allow_unsanitized);
-		if (urls.v[i]) {
+		if (*urls.v[i]) {
 			strbuf_addstr(&sb, ",url=");
 			strbuf_addstr_urlencode(&sb, urls.v[i], allow_unsanitized);
 		}
@@ -409,12 +415,42 @@  static int should_accept_remote(enum accept_promisor accept,
 	if (accept != ACCEPT_KNOWN_URL)
 		BUG("Unhandled 'enum accept_promisor' value '%d'", accept);
 
+	if (!remote_url) {
+		warning(_("no URL advertised for remote '%s'"), remote_name);
+		return 0;
+	}
+
+	if (!*remote_url) {
+		/*
+		 * This shouldn't happen with a Git server, but not
+		 * sure how other servers will be implemented in the
+		 * future.
+		 */
+		warning(_("empty URL advertised for remote '%s'"), remote_name);
+		return 0;
+	}
+
+	if (!*urls->v[i]) {
+		warning(_("empty URL configured for remote '%s'"), remote_name);
+		return 0;
+	}
+
 	if (!strcmp(urls->v[i], remote_url))
 		return 1;
 
-	warning(_("known remote named '%s' but with url '%s' instead of '%s'"),
+	warning(_("known remote named '%s' but with URL '%s' instead of '%s'"),
 		remote_name, urls->v[i], remote_url);
 
+	if (!strcmp(remote_name, urls->v[i]))
+		warning(_("remote name and URL are the same '%s', "
+			  "maybe the URL is not configured locally"),
+			remote_name);
+
+	if (!strcmp(remote_name, remote_url))
+		warning(_("remote name and URL are the same '%s', "
+			  "maybe the URL is not configured on the remote side"),
+			remote_name);
+
 	return 0;
 }
 
diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index d2cc69a17e..05ae96d1f7 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -193,6 +193,25 @@  test_expect_success "clone with 'KnownName' and different remote names" '
 	initialize_server 1 "$oid"
 '
 
+test_expect_success "clone with 'KnownName' and missing URL in the config" '
+	git -C server config promisor.advertise true &&
+
+	# Clone from server to create a client
+	# Lazy fetching by the client from the LOP will fail because of the
+	# missing URL in the client config, so the server will have to lazy
+	# fetch from the LOP.
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c promisor.acceptfromserver=KnownName \
+		--no-local --filter="blob:limit=5k" server client &&
+	test_when_finished "rm -rf client" &&
+
+	# Check that the largest object is not missing on the server
+	check_missing_objects server 0 "" &&
+
+	# Reinitialize server so that the largest object is missing again
+	initialize_server 1 "$oid"
+'
+
 test_expect_success "clone with promisor.acceptfromserver set to 'KnownUrl'" '
 	git -C server config promisor.advertise true &&
 
@@ -228,6 +247,46 @@  test_expect_success "clone with 'KnownUrl' and different remote urls" '
 	initialize_server 1 "$oid"
 '
 
+test_expect_success "clone with 'KnownUrl' and url not configured on the server" '
+	git -C server config promisor.advertise true &&
+
+	git -C server config unset remote.lop.url &&
+	test_when_finished "git -C server config set remote.lop.url \"file://$(pwd)/lop\"" &&
+
+	# Clone from server to create a client
+	# It should fail because the client will reject the LOP as URLs are
+	# different, and the server cannot lazy fetch as the LOP URL is
+	# missing, so the remote name will be used instead which will fail.
+	test_must_fail env GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+		-c remote.lop.url="file://$(pwd)/lop" \
+		-c promisor.acceptfromserver=KnownUrl \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
+test_expect_success "clone with 'KnownUrl' and empty url, so not advertised" '
+	git -C server config promisor.advertise true &&
+
+	git -C server config set remote.lop.url "" &&
+	test_when_finished "git -C server config set remote.lop.url \"file://$(pwd)/lop\"" &&
+
+	# Clone from server to create a client
+	# It should fail because the client will reject the LOP as an empty URL is
+	# not advertised, and the server cannot lazy fetch as the LOP URL is empty,
+	# so the remote name will be used instead which will fail.
+	test_must_fail env GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+		-c remote.lop.url="file://$(pwd)/lop" \
+		-c promisor.acceptfromserver=KnownUrl \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
 test_expect_success "clone with promisor.advertise set to 'true' but don't delete the client" '
 	git -C server config promisor.advertise true &&