diff mbox series

[v4] promisor-remote: fix segfault when remote URL is missing

Message ID 20250313103859.817127-1-christian.couder@gmail.com (mailing list archive)
State New
Headers show
Series [v4] promisor-remote: fix segfault when remote URL is missing | expand

Commit Message

Christian Couder March 13, 2025, 10:38 a.m. UTC
Using strvec_push() to push `NULL` into a 'strvec' results in a
segfault, because `xstrdup(NULL)` crashes.

So when an URL is missing from the config, let's push the remote name
instead of `NULL` into the 'strvec' that stores URLs. This is
similar to what other Git commands do. For example `git fetch` uses
the remote name to fetch if no url has been configured. Similarly
`git remote get-url` reports the remote name if no url has been
configured.

We leave improving the strvec API and/or xstrdup() for a future
separate effort.

Note that an empty URL can still be configured using something like
`git remote add foo ""`.

While at it, let's warn and reject the remote, in the 'KnownUrl' case,
when no URL or an empty URL is advertised by the server, or when an
empty URL is configured on the client for a remote name advertised by
the server and configured on the client. This is on par with a warning
already emitted when URLs are different in the same case.

Let's also warn if the remote is rejected because name and url are the
same, as it could mean the url has not been configured.

While at it, let's also use git_config_get_string_tmp() instead of
git_config_get_string() to simplify memory management.

Also let's spell "URL" with uppercase letters in all the warnings.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---

Only 2 small changes since v3:

  - a sentence in a code comment has been removed,
  - a strcmp() has been replaced with strcasecmp()

Range diff since v3:

1:  ea3c8347de ! 1:  f94452eaa2 promisor-remote: fix segfault when remote URL is missing
    @@ promisor-remote.c: static void promisor_info_vecs(struct repository *repo,
      
     -          free(url);
     +          /*
    -+           * No URL defaults to the name of the remote, like
    -+           * elsewhere in Git (e.g. `git fetch` or `git remote
    -+           * get-url`). It's still possible that an empty URL is
    -+           * configured.
    ++           * No URL defaults to the name of the remote, like elsewhere
    ++           * in Git (e.g. `git fetch` or `git remote get-url`).
     +           */
     +          strvec_push(urls, git_config_get_string_tmp(url_key, &url) ? r->name : url);
     +
    @@ promisor-remote.c: static int should_accept_remote(enum accept_promisor accept,
     +  warning(_("known remote named '%s' but with URL '%s' instead of '%s'"),
                remote_name, urls->v[i], remote_url);
      
    -+  if (!strcmp(remote_name, urls->v[i]))
    ++  if (!strcasecmp(remote_name, urls->v[i]))
     +          warning(_("remote name and URL are the same '%s', "
     +                    "maybe the URL is not configured locally"),
     +                  remote_name);

 promisor-remote.c                     | 44 +++++++++++++++++---
 t/t5710-promisor-remote-capability.sh | 59 +++++++++++++++++++++++++++
 2 files changed, 98 insertions(+), 5 deletions(-)

Comments

Junio C Hamano March 13, 2025, 4:28 p.m. UTC | #1
Christian Couder <christian.couder@gmail.com> writes:

> While at it, let's warn and reject the remote, in the 'KnownUrl' case,
> when no URL or an empty URL is advertised by the server, or when an
> empty URL is configured on the client for a remote name advertised by
> the server and configured on the client. This is on par with a warning
> already emitted when URLs are different in the same case.

That explanation makes it unclear why we need a new one.  If the
configured and davertised are both empty and the same, according to
that "warning already emitted", that is not a warning-worthy event,
is it?

> Let's also warn if the remote is rejected because name and url are the
> same, as it could mean the url has not been configured.

Are we rejecting a remote _because_ r->name is used?  I thought the
code did something quite different.  We reject because the url does
not match, and then after that give an extra warning if remote nick
was used as a fallback URL.  Even if URL is configured as 'orogin'
for a remote with nick 'origin', the code would have rejected the
remote with the same logic in the same code path, wouldn't it?  It
is a bit confusiong to call such a situation "rejected because name
and URL are the same".

In any case, why do we want to keep these unconfigured remotes in
the list of candidate lop-remotes in the first place, and why do we
want to treat empty URLs as being so special, more special than say
a randomly misspelt URL?  I think I asked these questions on the
previous round and I do not think I saw them addressed at all in
this round.

Thanks.
Junio C Hamano March 13, 2025, 5:23 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> In any case, why do we want to keep these unconfigured remotes in
> the list of candidate lop-remotes in the first place, and why do we
> want to treat empty URLs as being so special, more special than say
> a randomly misspelt URL?  I think I asked these questions on the
> previous round and I do not think I saw them addressed at all in
> this round.

Ah, sorry, I saw this new round before seeing the response to v3
review, in which these are covered.
Christian Couder March 14, 2025, 2:10 p.m. UTC | #3
On Thu, Mar 13, 2025 at 5:28 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > While at it, let's warn and reject the remote, in the 'KnownUrl' case,
> > when no URL or an empty URL is advertised by the server, or when an
> > empty URL is configured on the client for a remote name advertised by
> > the server and configured on the client. This is on par with a warning
> > already emitted when URLs are different in the same case.
>
> That explanation makes it unclear why we need a new one.  If the
> configured and davertised are both empty and the same, according to
> that "warning already emitted", that is not a warning-worthy event,
> is it?

We have to check that remote_url is not NULL before using it in
strcmp(). If it is NULL, we need to reject the remote, and it makes
sense to warn before doing that with `return 0;` because we warn
otherwise when a remote is rejected to try to help diagnose things at
the end of the function.

And while we are checking that remote_url is not NULL and warning if
it is, it makes sense to also help diagnose the case where remote_url
is empty with something like:

    if (!remote_url || !*remote_url) {
        warning(_("no or empty URL advertised for remote '%s'"), remote_name);
        return 0;
    }

I have used the above in the next version. Also I think this part
deserves its own patch too, so it is in a separate patch in the next
version.

> > Let's also warn if the remote is rejected because name and url are the
> > same, as it could mean the url has not been configured.
>
> Are we rejecting a remote _because_ r->name is used?  I thought the
> code did something quite different.  We reject because the url does
> not match, and then after that give an extra warning if remote nick
> was used as a fallback URL.  Even if URL is configured as 'orogin'
> for a remote with nick 'origin', the code would have rejected the
> remote with the same logic in the same code path, wouldn't it?  It
> is a bit confusiong to call such a situation "rejected because name
> and URL are the same".

Yeah, I have removed the above code as it's not needed anyway if we
don't process the remotes when they don't have a non-empty URL
configured.

Thanks.
diff mbox series

Patch

diff --git a/promisor-remote.c b/promisor-remote.c
index 6a0a61382f..dd589dd4ea 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -323,13 +323,17 @@  static void promisor_info_vecs(struct repository *repo,
 	promisor_remote_init(repo);
 
 	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
-		char *url;
+		const char *url;
 		char *url_key = xstrfmt("remote.%s.url", r->name);
 
 		strvec_push(names, r->name);
-		strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);
 
-		free(url);
+		/*
+		 * No URL defaults to the name of the remote, like elsewhere
+		 * in Git (e.g. `git fetch` or `git remote get-url`).
+		 */
+		strvec_push(urls, git_config_get_string_tmp(url_key, &url) ? r->name : url);
+
 		free(url_key);
 	}
 }
@@ -356,7 +360,7 @@  char *promisor_remote_info(struct repository *repo)
 			strbuf_addch(&sb, ';');
 		strbuf_addstr(&sb, "name=");
 		strbuf_addstr_urlencode(&sb, names.v[i], allow_unsanitized);
-		if (urls.v[i]) {
+		if (*urls.v[i]) {
 			strbuf_addstr(&sb, ",url=");
 			strbuf_addstr_urlencode(&sb, urls.v[i], allow_unsanitized);
 		}
@@ -409,12 +413,42 @@  static int should_accept_remote(enum accept_promisor accept,
 	if (accept != ACCEPT_KNOWN_URL)
 		BUG("Unhandled 'enum accept_promisor' value '%d'", accept);
 
+	if (!remote_url) {
+		warning(_("no URL advertised for remote '%s'"), remote_name);
+		return 0;
+	}
+
+	if (!*remote_url) {
+		/*
+		 * This shouldn't happen with a Git server, but not
+		 * sure how other servers will be implemented in the
+		 * future.
+		 */
+		warning(_("empty URL advertised for remote '%s'"), remote_name);
+		return 0;
+	}
+
+	if (!*urls->v[i]) {
+		warning(_("empty URL configured for remote '%s'"), remote_name);
+		return 0;
+	}
+
 	if (!strcmp(urls->v[i], remote_url))
 		return 1;
 
-	warning(_("known remote named '%s' but with url '%s' instead of '%s'"),
+	warning(_("known remote named '%s' but with URL '%s' instead of '%s'"),
 		remote_name, urls->v[i], remote_url);
 
+	if (!strcasecmp(remote_name, urls->v[i]))
+		warning(_("remote name and URL are the same '%s', "
+			  "maybe the URL is not configured locally"),
+			remote_name);
+
+	if (!strcmp(remote_name, remote_url))
+		warning(_("remote name and URL are the same '%s', "
+			  "maybe the URL is not configured on the remote side"),
+			remote_name);
+
 	return 0;
 }
 
diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index d2cc69a17e..05ae96d1f7 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -193,6 +193,25 @@  test_expect_success "clone with 'KnownName' and different remote names" '
 	initialize_server 1 "$oid"
 '
 
+test_expect_success "clone with 'KnownName' and missing URL in the config" '
+	git -C server config promisor.advertise true &&
+
+	# Clone from server to create a client
+	# Lazy fetching by the client from the LOP will fail because of the
+	# missing URL in the client config, so the server will have to lazy
+	# fetch from the LOP.
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c promisor.acceptfromserver=KnownName \
+		--no-local --filter="blob:limit=5k" server client &&
+	test_when_finished "rm -rf client" &&
+
+	# Check that the largest object is not missing on the server
+	check_missing_objects server 0 "" &&
+
+	# Reinitialize server so that the largest object is missing again
+	initialize_server 1 "$oid"
+'
+
 test_expect_success "clone with promisor.acceptfromserver set to 'KnownUrl'" '
 	git -C server config promisor.advertise true &&
 
@@ -228,6 +247,46 @@  test_expect_success "clone with 'KnownUrl' and different remote urls" '
 	initialize_server 1 "$oid"
 '
 
+test_expect_success "clone with 'KnownUrl' and url not configured on the server" '
+	git -C server config promisor.advertise true &&
+
+	git -C server config unset remote.lop.url &&
+	test_when_finished "git -C server config set remote.lop.url \"file://$(pwd)/lop\"" &&
+
+	# Clone from server to create a client
+	# It should fail because the client will reject the LOP as URLs are
+	# different, and the server cannot lazy fetch as the LOP URL is
+	# missing, so the remote name will be used instead which will fail.
+	test_must_fail env GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+		-c remote.lop.url="file://$(pwd)/lop" \
+		-c promisor.acceptfromserver=KnownUrl \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
+test_expect_success "clone with 'KnownUrl' and empty url, so not advertised" '
+	git -C server config promisor.advertise true &&
+
+	git -C server config set remote.lop.url "" &&
+	test_when_finished "git -C server config set remote.lop.url \"file://$(pwd)/lop\"" &&
+
+	# Clone from server to create a client
+	# It should fail because the client will reject the LOP as an empty URL is
+	# not advertised, and the server cannot lazy fetch as the LOP URL is empty,
+	# so the remote name will be used instead which will fail.
+	test_must_fail env GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+		-c remote.lop.url="file://$(pwd)/lop" \
+		-c promisor.acceptfromserver=KnownUrl \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
 test_expect_success "clone with promisor.advertise set to 'true' but don't delete the client" '
 	git -C server config promisor.advertise true &&