Message ID | 20250311152413.1059343-1-christian.couder@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] promisor-remote: fix segfault when remote URL is missing | expand |
Christian Couder <christian.couder@gmail.com> writes: > Using strvec_push() to push `NULL` into a 'strvec' results in a > segfault, because `xstrdup(NULL)` crashes. > > So when an URL is missing from the config, let's push an empty string > instead of `NULL` into the 'strvec' that stores URLs. > > We could have modified strvec_push() to behave like > strvec_push_nodup() and accept `NULL`, but it's not clear that it's > the right thing to do for the strvec API. 'strvec' is a kind of NULL > terminated array that is designed to be compatible with 'argv' > variables used on the command line. So we might want to disallow > pushing any `NULL` in it instead. > > It's also not clear if `xstrdup(NULL)` should crash or BUG or just > return NULL. Yup, the above two paragraphs are irrelevant, I would think. What we could have done may be to ignore (or error out) a configuration entry that lacks URL as an error. After all, isn't this caused by a misconfiguration? How such a misconfiguration is swept under the rug, whether with an empty string or NULL, is secondary, I would think. > For all these reasons, let's just focus on fixing the issue in > "promisor-remote.c" and let's leave improving the strvec API and/or > xstrdup() for a future effort. Absolutely. > While at it let's warn and reject the remote, in the 'KnownUrl' > case, when no URL is advertised by the server or no URL is > configured on the client for a remote name advertised by the server > and configured on the client. This is on par with a warning already > emitted when URLs are different in the same case. Yup.
Christian Couder <christian.couder@gmail.com> writes:
> + GIT_NO_LAZY_FETCH=0 test_must_fail git clone -c remote.lop.promisor=true \
This one triggers test-lint violation.
diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index 23203814d6..4c5c3c7656 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -257,7 +257,7 @@ test_expect_success "clone with 'KnownUrl' and url not advertised" '
# It should fail because the client will reject the LOP
# as URLs are different, and the server cannot lazy fetch as
# the LOP URL is missing.
- GIT_NO_LAZY_FETCH=0 test_must_fail git clone -c remote.lop.promisor=true \
+ test_must_fail env GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
-c remote.lop.url="file://$(pwd)/lop" \
-c promisor.acceptfromserver=KnownUrl \
On Tue, Mar 11, 2025 at 04:24:13PM +0100, Christian Couder wrote: > Using strvec_push() to push `NULL` into a 'strvec' results in a > segfault, because `xstrdup(NULL)` crashes. > > So when an URL is missing from the config, let's push an empty string > instead of `NULL` into the 'strvec' that stores URLs. Is a configured remote with out a url key really a missing url, though? In other contexts it defaults to the name of the remote. E.g.: # make a repo so "foo" is a valid url git init foo git -C foo commit --allow-empty bar # configure a fetch refspec, but no url! git init git config remote.foo.fetch '+refs/heads/*:refs/remotes/foo/*' # now fetching will use the configured refspec with a url of "foo" git fetch foo # and git-remote will report it, along with its url git remote ;# shows "foo" git remote --get-url foo ;# also shows "foo" This is obviously a weird thing to be doing, so I admit I don't really care all that much. But it feels like the most natural thing is just: diff --git a/promisor-remote.c b/promisor-remote.c index 6a0a61382f..761eb1dbd5 100644 --- a/promisor-remote.c +++ b/promisor-remote.c @@ -327,7 +327,7 @@ static void promisor_info_vecs(struct repository *repo, char *url_key = xstrfmt("remote.%s.url", r->name); strvec_push(names, r->name); - strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url); + strvec_push(urls, git_config_get_string(url_key, &url) ? r->name : url); free(url); free(url_key); > We could have modified strvec_push() to behave like > strvec_push_nodup() and accept `NULL`, but it's not clear that it's > the right thing to do for the strvec API. 'strvec' is a kind of NULL > terminated array that is designed to be compatible with 'argv' > variables used on the command line. So we might want to disallow > pushing any `NULL` in it instead. > > It's also not clear if `xstrdup(NULL)` should crash or BUG or just > return NULL. We have xstrdup_or_null() for the latter suggestion. There was some light discussion at the time about having xstrdup(NULL) handle this automatically: https://lore.kernel.org/git/20150112231231.GA4023@peff.net/ but it was mostly negative. I don't think anybody really dug into the thought experiment beyond a general "it might propagate NULL places you wouldn't expect" vibe, though. For the same reason I'd be a little hesitant to bless NULLs inside strvec structures. I think "nodup" allowing them is mostly an unintended consequence. > For all these reasons, let's just focus on fixing the issue in > "promisor-remote.c" and let's leave improving the strvec API and/or > xstrdup() for a future effort. This part I certainly agree with. ;) > for (r = repo->promisor_remote_config->promisors; r; r = r->next) { > - char *url; > + char *url = NULL; > + const char *url_pushed = ""; > char *url_key = xstrfmt("remote.%s.url", r->name); > > + if (!git_config_get_string(url_key, &url) && url) > + url_pushed = url; > + > strvec_push(names, r->name); > - strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url); > + strvec_push(urls, url_pushed); > > free(url); Probably not super important, but while reading this I noticed that using git_config_get_string_tmp() would make the memory management a little simpler (since you do not need to free "url", you are free to point it to at the empty string and do not need a separate url_pushed). -Peff
Jeff King <peff@peff.net> writes: > Is a configured remote with out a url key really a missing url, though? > In other contexts it defaults to the name of the remote. E.g.: > > # make a repo so "foo" is a valid url > git init foo > git -C foo commit --allow-empty bar > > # configure a fetch refspec, but no url! > git init > git config remote.foo.fetch '+refs/heads/*:refs/remotes/foo/*' > > # now fetching will use the configured refspec with a url of "foo" > git fetch foo > > # and git-remote will report it, along with its url > git remote ;# shows "foo" > git remote --get-url foo ;# also shows "foo" Yeah, that does sound like a more natural way to look at it.
On Tue, Mar 11, 2025 at 9:48 PM Junio C Hamano <gitster@pobox.com> wrote: > > Christian Couder <christian.couder@gmail.com> writes: > > > + GIT_NO_LAZY_FETCH=0 test_must_fail git clone -c remote.lop.promisor=true \ > > This one triggers test-lint violation. > > diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh > index 23203814d6..4c5c3c7656 100755 > --- a/t/t5710-promisor-remote-capability.sh > +++ b/t/t5710-promisor-remote-capability.sh > @@ -257,7 +257,7 @@ test_expect_success "clone with 'KnownUrl' and url not advertised" ' > # It should fail because the client will reject the LOP > # as URLs are different, and the server cannot lazy fetch as > # the LOP URL is missing. > - GIT_NO_LAZY_FETCH=0 test_must_fail git clone -c remote.lop.promisor=true \ > + test_must_fail env GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \ > -c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \ > -c remote.lop.url="file://$(pwd)/lop" \ > -c promisor.acceptfromserver=KnownUrl \ Sorry for forgetting to check that and thanks for the fix. I use it in the next version.
On Wed, Mar 12, 2025 at 12:06 AM Jeff King <peff@peff.net> wrote: > > On Tue, Mar 11, 2025 at 04:24:13PM +0100, Christian Couder wrote: > > > Using strvec_push() to push `NULL` into a 'strvec' results in a > > segfault, because `xstrdup(NULL)` crashes. > > > > So when an URL is missing from the config, let's push an empty string > > instead of `NULL` into the 'strvec' that stores URLs. > > Is a configured remote with out a url key really a missing url, though? > In other contexts it defaults to the name of the remote. E.g.: > > # make a repo so "foo" is a valid url > git init foo > git -C foo commit --allow-empty bar > > # configure a fetch refspec, but no url! > git init > git config remote.foo.fetch '+refs/heads/*:refs/remotes/foo/*' > > # now fetching will use the configured refspec with a url of "foo" > git fetch foo > > # and git-remote will report it, along with its url > git remote ;# shows "foo" > git remote --get-url foo ;# also shows "foo" > > This is obviously a weird thing to be doing, so I admit I don't really > care all that much. But it feels like the most natural thing is just: > > diff --git a/promisor-remote.c b/promisor-remote.c > index 6a0a61382f..761eb1dbd5 100644 > --- a/promisor-remote.c > +++ b/promisor-remote.c > @@ -327,7 +327,7 @@ static void promisor_info_vecs(struct repository *repo, > char *url_key = xstrfmt("remote.%s.url", r->name); > > strvec_push(names, r->name); > - strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url); > + strvec_push(urls, git_config_get_string(url_key, &url) ? r->name : url); > > free(url); > free(url_key); Yeah, right I am using this in the next version. I have added warnings to help debug this in the case a remote is rejected because urls are different, as I think it could confuse users. > > We could have modified strvec_push() to behave like > > strvec_push_nodup() and accept `NULL`, but it's not clear that it's > > the right thing to do for the strvec API. 'strvec' is a kind of NULL > > terminated array that is designed to be compatible with 'argv' > > variables used on the command line. So we might want to disallow > > pushing any `NULL` in it instead. > > > > It's also not clear if `xstrdup(NULL)` should crash or BUG or just > > return NULL. > > We have xstrdup_or_null() for the latter suggestion. Yeah, I forgot about it. I think it makes sense to replace xstrdup() with xstrdup_or_null() in strvec_push(). If we ever want a mode (possibly the default one) that forbids NULL in strvec, we could add that on top. But right now as strvec_push_nodup() accepts NULL, I think it makes sense for strvec_push() to accept NULL too. Anyway this is something we can work on after the release. > There was some > light discussion at the time about having xstrdup(NULL) handle this > automatically: > > https://lore.kernel.org/git/20150112231231.GA4023@peff.net/ > > but it was mostly negative. I don't think anybody really dug into the > thought experiment beyond a general "it might propagate NULL places you > wouldn't expect" vibe, though. I don't mind having both xstrdup() and xstrdup_or_null(). At least it gives a hint to readers about NULL being expected or not. > For the same reason I'd be a little hesitant to bless NULLs inside > strvec structures. I think "nodup" allowing them is mostly an unintended > consequence. Yeah, but then if we ever need a strvec like struct that can contain NULL, it would be kind of sad to have a separate struct with its own files mostly duplicating the strvec code. I think we would then be better with strvec having two modes, one accepting NULL and one rejecting it. > > For all these reasons, let's just focus on fixing the issue in > > "promisor-remote.c" and let's leave improving the strvec API and/or > > xstrdup() for a future effort. > > This part I certainly agree with. ;) > > > for (r = repo->promisor_remote_config->promisors; r; r = r->next) { > > - char *url; > > + char *url = NULL; > > + const char *url_pushed = ""; > > char *url_key = xstrfmt("remote.%s.url", r->name); > > > > + if (!git_config_get_string(url_key, &url) && url) > > + url_pushed = url; > > + > > strvec_push(names, r->name); > > - strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url); > > + strvec_push(urls, url_pushed); > > > > free(url); > > Probably not super important, but while reading this I noticed that > using git_config_get_string_tmp() would make the memory management a > little simpler (since you do not need to free "url", you are free to > point it to at the empty string and do not need a separate url_pushed). Yeah, I will use this in the next version. Thanks for the review.
On Tue, Mar 11, 2025 at 5:59 PM Junio C Hamano <gitster@pobox.com> wrote: > > Christian Couder <christian.couder@gmail.com> writes: > > We could have modified strvec_push() to behave like > > strvec_push_nodup() and accept `NULL`, but it's not clear that it's > > the right thing to do for the strvec API. 'strvec' is a kind of NULL > > terminated array that is designed to be compatible with 'argv' > > variables used on the command line. So we might want to disallow > > pushing any `NULL` in it instead. > > > > It's also not clear if `xstrdup(NULL)` should crash or BUG or just > > return NULL. > > Yup, the above two paragraphs are irrelevant, I would think. I have removed them. Thanks.
diff --git a/promisor-remote.c b/promisor-remote.c index 6a0a61382f..92786d72b4 100644 --- a/promisor-remote.c +++ b/promisor-remote.c @@ -323,11 +323,15 @@ static void promisor_info_vecs(struct repository *repo, promisor_remote_init(repo); for (r = repo->promisor_remote_config->promisors; r; r = r->next) { - char *url; + char *url = NULL; + const char *url_pushed = ""; char *url_key = xstrfmt("remote.%s.url", r->name); + if (!git_config_get_string(url_key, &url) && url) + url_pushed = url; + strvec_push(names, r->name); - strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url); + strvec_push(urls, url_pushed); free(url); free(url_key); @@ -356,7 +360,7 @@ char *promisor_remote_info(struct repository *repo) strbuf_addch(&sb, ';'); strbuf_addstr(&sb, "name="); strbuf_addstr_urlencode(&sb, names.v[i], allow_unsanitized); - if (urls.v[i]) { + if (*urls.v[i]) { strbuf_addstr(&sb, ",url="); strbuf_addstr_urlencode(&sb, urls.v[i], allow_unsanitized); } @@ -409,6 +413,16 @@ static int should_accept_remote(enum accept_promisor accept, if (accept != ACCEPT_KNOWN_URL) BUG("Unhandled 'enum accept_promisor' value '%d'", accept); + if (!remote_url) { + warning(_("no URL advertised for remote '%s'"), remote_name); + return 0; + } + + if (!*urls->v[i]) { + warning(_("no URL configured for remote '%s'"), remote_name); + return 0; + } + if (!strcmp(urls->v[i], remote_url)) return 1; diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh index d2cc69a17e..23203814d6 100755 --- a/t/t5710-promisor-remote-capability.sh +++ b/t/t5710-promisor-remote-capability.sh @@ -193,6 +193,25 @@ test_expect_success "clone with 'KnownName' and different remote names" ' initialize_server 1 "$oid" ' +test_expect_success "clone with 'KnownName' and missing URL in the config" ' + git -C server config promisor.advertise true && + + # Clone from server to create a client + # Lazy fetching by the client from the LOP will fail because of the + # missing URL in the client config, so the server will have to lazy + # fetch from the LOP. + GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \ + -c promisor.acceptfromserver=KnownName \ + --no-local --filter="blob:limit=5k" server client && + test_when_finished "rm -rf client" && + + # Check that the largest object is not missing on the server + check_missing_objects server 0 "" && + + # Reinitialize server so that the largest object is missing again + initialize_server 1 "$oid" +' + test_expect_success "clone with promisor.acceptfromserver set to 'KnownUrl'" ' git -C server config promisor.advertise true && @@ -228,6 +247,26 @@ test_expect_success "clone with 'KnownUrl' and different remote urls" ' initialize_server 1 "$oid" ' +test_expect_success "clone with 'KnownUrl' and url not advertised" ' + git -C server config promisor.advertise true && + + git -C server config unset remote.lop.url && + test_when_finished "git -C server config set remote.lop.url \"file://$(pwd)/lop\"" && + + # Clone from server to create a client + # It should fail because the client will reject the LOP + # as URLs are different, and the server cannot lazy fetch as + # the LOP URL is missing. + GIT_NO_LAZY_FETCH=0 test_must_fail git clone -c remote.lop.promisor=true \ + -c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \ + -c remote.lop.url="file://$(pwd)/lop" \ + -c promisor.acceptfromserver=KnownUrl \ + --no-local --filter="blob:limit=5k" server client && + + # Check that the largest object is still missing on the server + check_missing_objects server 1 "$oid" +' + test_expect_success "clone with promisor.advertise set to 'true' but don't delete the client" ' git -C server config promisor.advertise true &&
Using strvec_push() to push `NULL` into a 'strvec' results in a segfault, because `xstrdup(NULL)` crashes. So when an URL is missing from the config, let's push an empty string instead of `NULL` into the 'strvec' that stores URLs. We could have modified strvec_push() to behave like strvec_push_nodup() and accept `NULL`, but it's not clear that it's the right thing to do for the strvec API. 'strvec' is a kind of NULL terminated array that is designed to be compatible with 'argv' variables used on the command line. So we might want to disallow pushing any `NULL` in it instead. It's also not clear if `xstrdup(NULL)` should crash or BUG or just return NULL. For all these reasons, let's just focus on fixing the issue in "promisor-remote.c" and let's leave improving the strvec API and/or xstrdup() for a future effort. While at it let's warn and reject the remote, in the 'KnownUrl' case, when no URL is advertised by the server or no URL is configured on the client for a remote name advertised by the server and configured on the client. This is on par with a warning already emitted when URLs are different in the same case. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> --- promisor-remote.c | 20 +++++++++++--- t/t5710-promisor-remote-capability.sh | 39 +++++++++++++++++++++++++++ 2 files changed, 56 insertions(+), 3 deletions(-)