diff mbox series

[v2] promisor-remote: fix segfault when remote URL is missing

Message ID 20250311152413.1059343-1-christian.couder@gmail.com (mailing list archive)
State Superseded
Headers show
Series [v2] promisor-remote: fix segfault when remote URL is missing | expand

Commit Message

Christian Couder March 11, 2025, 3:24 p.m. UTC
Using strvec_push() to push `NULL` into a 'strvec' results in a
segfault, because `xstrdup(NULL)` crashes.

So when an URL is missing from the config, let's push an empty string
instead of `NULL` into the 'strvec' that stores URLs.

We could have modified strvec_push() to behave like
strvec_push_nodup() and accept `NULL`, but it's not clear that it's
the right thing to do for the strvec API. 'strvec' is a kind of NULL
terminated array that is designed to be compatible with 'argv'
variables used on the command line. So we might want to disallow
pushing any `NULL` in it instead.

It's also not clear if `xstrdup(NULL)` should crash or BUG or just
return NULL.

For all these reasons, let's just focus on fixing the issue in
"promisor-remote.c" and let's leave improving the strvec API and/or
xstrdup() for a future effort.

While at it let's warn and reject the remote, in the 'KnownUrl'
case, when no URL is advertised by the server or no URL is
configured on the client for a remote name advertised by the server
and configured on the client. This is on par with a warning already
emitted when URLs are different in the same case.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 promisor-remote.c                     | 20 +++++++++++---
 t/t5710-promisor-remote-capability.sh | 39 +++++++++++++++++++++++++++
 2 files changed, 56 insertions(+), 3 deletions(-)

Comments

Junio C Hamano March 11, 2025, 4:59 p.m. UTC | #1
Christian Couder <christian.couder@gmail.com> writes:

> Using strvec_push() to push `NULL` into a 'strvec' results in a
> segfault, because `xstrdup(NULL)` crashes.
>
> So when an URL is missing from the config, let's push an empty string
> instead of `NULL` into the 'strvec' that stores URLs.
>

> We could have modified strvec_push() to behave like
> strvec_push_nodup() and accept `NULL`, but it's not clear that it's
> the right thing to do for the strvec API. 'strvec' is a kind of NULL
> terminated array that is designed to be compatible with 'argv'
> variables used on the command line. So we might want to disallow
> pushing any `NULL` in it instead.
>
> It's also not clear if `xstrdup(NULL)` should crash or BUG or just
> return NULL.

Yup, the above two paragraphs are irrelevant, I would think.

What we could have done may be to ignore (or error out) a
configuration entry that lacks URL as an error.  After all, isn't
this caused by a misconfiguration?  How such a misconfiguration is
swept under the rug, whether with an empty string or NULL, is
secondary, I would think.

> For all these reasons, let's just focus on fixing the issue in
> "promisor-remote.c" and let's leave improving the strvec API and/or
> xstrdup() for a future effort.

Absolutely.

> While at it let's warn and reject the remote, in the 'KnownUrl'
> case, when no URL is advertised by the server or no URL is
> configured on the client for a remote name advertised by the server
> and configured on the client. This is on par with a warning already
> emitted when URLs are different in the same case.

Yup.
Junio C Hamano March 11, 2025, 8:48 p.m. UTC | #2
Christian Couder <christian.couder@gmail.com> writes:

> +	GIT_NO_LAZY_FETCH=0 test_must_fail git clone -c remote.lop.promisor=true \

This one triggers test-lint violation.

diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index 23203814d6..4c5c3c7656 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -257,7 +257,7 @@ test_expect_success "clone with 'KnownUrl' and url not advertised" '
 	# It should fail because the client will reject the LOP
 	# as URLs are different, and the server cannot lazy fetch as
 	# the LOP URL is missing.
-	GIT_NO_LAZY_FETCH=0 test_must_fail git clone -c remote.lop.promisor=true \
+	test_must_fail env GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
 		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
 		-c remote.lop.url="file://$(pwd)/lop" \
 		-c promisor.acceptfromserver=KnownUrl \
Jeff King March 11, 2025, 11:06 p.m. UTC | #3
On Tue, Mar 11, 2025 at 04:24:13PM +0100, Christian Couder wrote:

> Using strvec_push() to push `NULL` into a 'strvec' results in a
> segfault, because `xstrdup(NULL)` crashes.
> 
> So when an URL is missing from the config, let's push an empty string
> instead of `NULL` into the 'strvec' that stores URLs.

Is a configured remote with out a url key really a missing url, though?
In other contexts it defaults to the name of the remote. E.g.:

  # make a repo so "foo" is a valid url
  git init foo
  git -C foo commit --allow-empty bar

  # configure a fetch refspec, but no url!
  git init
  git config remote.foo.fetch '+refs/heads/*:refs/remotes/foo/*'

  # now fetching will use the configured refspec with a url of "foo"
  git fetch foo

  # and git-remote will report it, along with its url
  git remote ;# shows "foo"
  git remote --get-url foo ;# also shows "foo"

This is obviously a weird thing to be doing, so I admit I don't really
care all that much. But it feels like the most natural thing is just:

diff --git a/promisor-remote.c b/promisor-remote.c
index 6a0a61382f..761eb1dbd5 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -327,7 +327,7 @@ static void promisor_info_vecs(struct repository *repo,
 		char *url_key = xstrfmt("remote.%s.url", r->name);
 
 		strvec_push(names, r->name);
-		strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);
+		strvec_push(urls, git_config_get_string(url_key, &url) ? r->name : url);
 
 		free(url);
 		free(url_key);

> We could have modified strvec_push() to behave like
> strvec_push_nodup() and accept `NULL`, but it's not clear that it's
> the right thing to do for the strvec API. 'strvec' is a kind of NULL
> terminated array that is designed to be compatible with 'argv'
> variables used on the command line. So we might want to disallow
> pushing any `NULL` in it instead.
> 
> It's also not clear if `xstrdup(NULL)` should crash or BUG or just
> return NULL.

We have xstrdup_or_null() for the latter suggestion. There was some
light discussion at the time about having xstrdup(NULL) handle this
automatically:

  https://lore.kernel.org/git/20150112231231.GA4023@peff.net/

but it was mostly negative. I don't think anybody really dug into the
thought experiment beyond a general "it might propagate NULL places you
wouldn't expect" vibe, though.

For the same reason I'd be a little hesitant to bless NULLs inside
strvec structures. I think "nodup" allowing them is mostly an unintended
consequence.

> For all these reasons, let's just focus on fixing the issue in
> "promisor-remote.c" and let's leave improving the strvec API and/or
> xstrdup() for a future effort.

This part I certainly agree with. ;)

>  	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
> -		char *url;
> +		char *url = NULL;
> +		const char *url_pushed = "";
>  		char *url_key = xstrfmt("remote.%s.url", r->name);
>  
> +		if (!git_config_get_string(url_key, &url) && url)
> +			url_pushed = url;
> +
>  		strvec_push(names, r->name);
> -		strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);
> +		strvec_push(urls, url_pushed);
>  
>  		free(url);

Probably not super important, but while reading this I noticed that
using git_config_get_string_tmp() would make the memory management a
little simpler (since you do not need to free "url", you are free to
point it to at the empty string and do not need a separate url_pushed).

-Peff
Junio C Hamano March 11, 2025, 11:36 p.m. UTC | #4
Jeff King <peff@peff.net> writes:

> Is a configured remote with out a url key really a missing url, though?
> In other contexts it defaults to the name of the remote. E.g.:
>
>   # make a repo so "foo" is a valid url
>   git init foo
>   git -C foo commit --allow-empty bar
>
>   # configure a fetch refspec, but no url!
>   git init
>   git config remote.foo.fetch '+refs/heads/*:refs/remotes/foo/*'
>
>   # now fetching will use the configured refspec with a url of "foo"
>   git fetch foo
>
>   # and git-remote will report it, along with its url
>   git remote ;# shows "foo"
>   git remote --get-url foo ;# also shows "foo"

Yeah, that does sound like a more natural way to look at it.
Christian Couder March 12, 2025, 11:47 a.m. UTC | #5
On Tue, Mar 11, 2025 at 9:48 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > +     GIT_NO_LAZY_FETCH=0 test_must_fail git clone -c remote.lop.promisor=true \
>
> This one triggers test-lint violation.
>
> diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
> index 23203814d6..4c5c3c7656 100755
> --- a/t/t5710-promisor-remote-capability.sh
> +++ b/t/t5710-promisor-remote-capability.sh
> @@ -257,7 +257,7 @@ test_expect_success "clone with 'KnownUrl' and url not advertised" '
>         # It should fail because the client will reject the LOP
>         # as URLs are different, and the server cannot lazy fetch as
>         # the LOP URL is missing.
> -       GIT_NO_LAZY_FETCH=0 test_must_fail git clone -c remote.lop.promisor=true \
> +       test_must_fail env GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
>                 -c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
>                 -c remote.lop.url="file://$(pwd)/lop" \
>                 -c promisor.acceptfromserver=KnownUrl \

Sorry for forgetting to check that and thanks for the fix. I use it in
the next version.
Christian Couder March 12, 2025, 11:47 a.m. UTC | #6
On Wed, Mar 12, 2025 at 12:06 AM Jeff King <peff@peff.net> wrote:
>
> On Tue, Mar 11, 2025 at 04:24:13PM +0100, Christian Couder wrote:
>
> > Using strvec_push() to push `NULL` into a 'strvec' results in a
> > segfault, because `xstrdup(NULL)` crashes.
> >
> > So when an URL is missing from the config, let's push an empty string
> > instead of `NULL` into the 'strvec' that stores URLs.
>
> Is a configured remote with out a url key really a missing url, though?
> In other contexts it defaults to the name of the remote. E.g.:
>
>   # make a repo so "foo" is a valid url
>   git init foo
>   git -C foo commit --allow-empty bar
>
>   # configure a fetch refspec, but no url!
>   git init
>   git config remote.foo.fetch '+refs/heads/*:refs/remotes/foo/*'
>
>   # now fetching will use the configured refspec with a url of "foo"
>   git fetch foo
>
>   # and git-remote will report it, along with its url
>   git remote ;# shows "foo"
>   git remote --get-url foo ;# also shows "foo"
>
> This is obviously a weird thing to be doing, so I admit I don't really
> care all that much. But it feels like the most natural thing is just:
>
> diff --git a/promisor-remote.c b/promisor-remote.c
> index 6a0a61382f..761eb1dbd5 100644
> --- a/promisor-remote.c
> +++ b/promisor-remote.c
> @@ -327,7 +327,7 @@ static void promisor_info_vecs(struct repository *repo,
>                 char *url_key = xstrfmt("remote.%s.url", r->name);
>
>                 strvec_push(names, r->name);
> -               strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);
> +               strvec_push(urls, git_config_get_string(url_key, &url) ? r->name : url);
>
>                 free(url);
>                 free(url_key);

Yeah, right I am using this in the next version. I have added warnings
to help debug this in the case a remote is rejected because urls are
different, as I think it could confuse users.

> > We could have modified strvec_push() to behave like
> > strvec_push_nodup() and accept `NULL`, but it's not clear that it's
> > the right thing to do for the strvec API. 'strvec' is a kind of NULL
> > terminated array that is designed to be compatible with 'argv'
> > variables used on the command line. So we might want to disallow
> > pushing any `NULL` in it instead.
> >
> > It's also not clear if `xstrdup(NULL)` should crash or BUG or just
> > return NULL.
>
> We have xstrdup_or_null() for the latter suggestion.

Yeah, I forgot about it. I think it makes sense to replace xstrdup()
with xstrdup_or_null() in strvec_push().

If we ever want a mode (possibly the default one) that forbids NULL in
strvec, we could add that on top. But right now as strvec_push_nodup()
accepts NULL, I think it makes sense for strvec_push() to accept NULL
too.

Anyway this is something we can work on after the release.

> There was some
> light discussion at the time about having xstrdup(NULL) handle this
> automatically:
>
>   https://lore.kernel.org/git/20150112231231.GA4023@peff.net/
>
> but it was mostly negative. I don't think anybody really dug into the
> thought experiment beyond a general "it might propagate NULL places you
> wouldn't expect" vibe, though.

I don't mind having both xstrdup() and xstrdup_or_null(). At least it
gives a hint to readers about NULL being expected or not.

> For the same reason I'd be a little hesitant to bless NULLs inside
> strvec structures. I think "nodup" allowing them is mostly an unintended
> consequence.

Yeah, but then if we ever need a strvec like struct that can contain
NULL, it would be kind of sad to have a separate struct with its own
files mostly duplicating the strvec code. I think we would then be
better with strvec having two modes, one accepting NULL and one
rejecting it.

> > For all these reasons, let's just focus on fixing the issue in
> > "promisor-remote.c" and let's leave improving the strvec API and/or
> > xstrdup() for a future effort.
>
> This part I certainly agree with. ;)
>
> >       for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
> > -             char *url;
> > +             char *url = NULL;
> > +             const char *url_pushed = "";
> >               char *url_key = xstrfmt("remote.%s.url", r->name);
> >
> > +             if (!git_config_get_string(url_key, &url) && url)
> > +                     url_pushed = url;
> > +
> >               strvec_push(names, r->name);
> > -             strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);
> > +             strvec_push(urls, url_pushed);
> >
> >               free(url);
>
> Probably not super important, but while reading this I noticed that
> using git_config_get_string_tmp() would make the memory management a
> little simpler (since you do not need to free "url", you are free to
> point it to at the empty string and do not need a separate url_pushed).

Yeah, I will use this in the next version.

Thanks for the review.
Christian Couder March 12, 2025, 11:48 a.m. UTC | #7
On Tue, Mar 11, 2025 at 5:59 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:

> > We could have modified strvec_push() to behave like
> > strvec_push_nodup() and accept `NULL`, but it's not clear that it's
> > the right thing to do for the strvec API. 'strvec' is a kind of NULL
> > terminated array that is designed to be compatible with 'argv'
> > variables used on the command line. So we might want to disallow
> > pushing any `NULL` in it instead.
> >
> > It's also not clear if `xstrdup(NULL)` should crash or BUG or just
> > return NULL.
>
> Yup, the above two paragraphs are irrelevant, I would think.

I have removed them.

Thanks.
diff mbox series

Patch

diff --git a/promisor-remote.c b/promisor-remote.c
index 6a0a61382f..92786d72b4 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -323,11 +323,15 @@  static void promisor_info_vecs(struct repository *repo,
 	promisor_remote_init(repo);
 
 	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
-		char *url;
+		char *url = NULL;
+		const char *url_pushed = "";
 		char *url_key = xstrfmt("remote.%s.url", r->name);
 
+		if (!git_config_get_string(url_key, &url) && url)
+			url_pushed = url;
+
 		strvec_push(names, r->name);
-		strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);
+		strvec_push(urls, url_pushed);
 
 		free(url);
 		free(url_key);
@@ -356,7 +360,7 @@  char *promisor_remote_info(struct repository *repo)
 			strbuf_addch(&sb, ';');
 		strbuf_addstr(&sb, "name=");
 		strbuf_addstr_urlencode(&sb, names.v[i], allow_unsanitized);
-		if (urls.v[i]) {
+		if (*urls.v[i]) {
 			strbuf_addstr(&sb, ",url=");
 			strbuf_addstr_urlencode(&sb, urls.v[i], allow_unsanitized);
 		}
@@ -409,6 +413,16 @@  static int should_accept_remote(enum accept_promisor accept,
 	if (accept != ACCEPT_KNOWN_URL)
 		BUG("Unhandled 'enum accept_promisor' value '%d'", accept);
 
+	if (!remote_url) {
+		warning(_("no URL advertised for remote '%s'"), remote_name);
+		return 0;
+	}
+
+	if (!*urls->v[i]) {
+		warning(_("no URL configured for remote '%s'"), remote_name);
+		return 0;
+	}
+
 	if (!strcmp(urls->v[i], remote_url))
 		return 1;
 
diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index d2cc69a17e..23203814d6 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -193,6 +193,25 @@  test_expect_success "clone with 'KnownName' and different remote names" '
 	initialize_server 1 "$oid"
 '
 
+test_expect_success "clone with 'KnownName' and missing URL in the config" '
+	git -C server config promisor.advertise true &&
+
+	# Clone from server to create a client
+	# Lazy fetching by the client from the LOP will fail because of the
+	# missing URL in the client config, so the server will have to lazy
+	# fetch from the LOP.
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c promisor.acceptfromserver=KnownName \
+		--no-local --filter="blob:limit=5k" server client &&
+	test_when_finished "rm -rf client" &&
+
+	# Check that the largest object is not missing on the server
+	check_missing_objects server 0 "" &&
+
+	# Reinitialize server so that the largest object is missing again
+	initialize_server 1 "$oid"
+'
+
 test_expect_success "clone with promisor.acceptfromserver set to 'KnownUrl'" '
 	git -C server config promisor.advertise true &&
 
@@ -228,6 +247,26 @@  test_expect_success "clone with 'KnownUrl' and different remote urls" '
 	initialize_server 1 "$oid"
 '
 
+test_expect_success "clone with 'KnownUrl' and url not advertised" '
+	git -C server config promisor.advertise true &&
+
+	git -C server config unset remote.lop.url &&
+	test_when_finished "git -C server config set remote.lop.url \"file://$(pwd)/lop\"" &&
+
+	# Clone from server to create a client
+	# It should fail because the client will reject the LOP
+	# as URLs are different, and the server cannot lazy fetch as
+	# the LOP URL is missing.
+	GIT_NO_LAZY_FETCH=0 test_must_fail git clone -c remote.lop.promisor=true \
+		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+		-c remote.lop.url="file://$(pwd)/lop" \
+		-c promisor.acceptfromserver=KnownUrl \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
 test_expect_success "clone with promisor.advertise set to 'true' but don't delete the client" '
 	git -C server config promisor.advertise true &&