Message ID | 20220504104601.136403-1-chriscool@tuxfamily.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] http: add custom hostname to IP address resolutions | expand |
On Wed, May 04, 2022 at 12:46:01PM +0200, Christian Couder wrote: > diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh > index f92c79c132..d97380be87 100755 > --- a/t/t5551-http-fetch-smart.sh > +++ b/t/t5551-http-fetch-smart.sh > @@ -567,4 +567,11 @@ test_expect_success 'client falls back from v2 to v0 to match server' ' > grep symref=HEAD:refs/heads/ trace > ' > > +test_expect_success 'passing hostname resolution information works' ' > + BOGUS_HOST=gitbogusexamplehost.com && minor nitpick, but better to use example.com here which is reserved for this type of uses and therefore unlikely to conflict with a possibly assigned domain. Carlo
On Thu, May 5, 2022 at 1:21 PM Carlo Marcelo Arenas Belón <carenas@gmail.com> wrote: > > On Wed, May 04, 2022 at 12:46:01PM +0200, Christian Couder wrote: > > diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh > > index f92c79c132..d97380be87 100755 > > --- a/t/t5551-http-fetch-smart.sh > > +++ b/t/t5551-http-fetch-smart.sh > > @@ -567,4 +567,11 @@ test_expect_success 'client falls back from v2 to v0 to match server' ' > > grep symref=HEAD:refs/heads/ trace > > ' > > > > +test_expect_success 'passing hostname resolution information works' ' > > + BOGUS_HOST=gitbogusexamplehost.com && > > minor nitpick, but better to use example.com here which is reserved for this > type of uses and therefore unlikely to conflict with a possibly assigned > domain. Sorry for not replying to this earlier, but Junio previously suggested the following: "Perhaps invent a totally bogus domain name, map that to localhost ::1, run a test server locally, and try to clone from that bogus domain?" (See: https://lore.kernel.org/git/xmqqfslrycvp.fsf@gitster.g/) I think "a totally bogus domain name" refers to something other than "example.com". Also "example.com" does seem to resolve to an IP address and even has an HTTP(S) server on it, while I think the purpose of the test would be to check that there is not even a valid DNS resolution when the new option is not used.
Christian Couder <christian.couder@gmail.com> writes: > "Perhaps invent a totally bogus domain name, map that to > localhost ::1, run a test server locally, and try to clone from that > bogus domain?" > > (See: https://lore.kernel.org/git/xmqqfslrycvp.fsf@gitster.g/) > > I think "a totally bogus domain name" refers to something other than > "example.com". I meant a domain that should not be used for purposes other than being examples in the real world, including "example.com". But RFC6761, which is an update to RFC2606, describes a set of properties that make .invalid nice domain to use, including: 1. Users are free to use "invalid" names as they would any other domain names. Users MAY assume that queries for "invalid" names will always return NXDOMAIN responses. 3. Name resolution APIs and libraries SHOULD recognize "invalid" names as special and SHOULD always return immediate negative responses. Name resolution APIs SHOULD NOT send queries for "invalid" names to their configured caching DNS server(s). Another possibility is ".test" but it is more for testing DNS, not application, i.e. 1. Users are free to use these test names as they would any other domain names. However, since there is no central authority responsible for use of test names, users SHOULD be aware that these names are likely to yield different results on different networks. 3. Name resolution APIs and libraries SHOULD NOT recognize test names as special and SHOULD NOT treat them differently. Name resolution APIs SHOULD send queries for test names to their configured caching DNS server(s). so for a code like what we are discussing, which would not want the names to be shown to DNS and yield any IP address, ".test" makes a poorer "bogus domain name" than ".invalid", I think. By the way, we seem to have references to .xz top-level domain, which appeared only in earlier drafts of what became RFC2606 (which was updated by RFC6761) in both documentation pages and tests. At some point we may want to update the former to ".example" and the latter to ".invalid" as a clean-up. > Also "example.com" does seem to resolve to an IP address and even has > an HTTP(S) server on it, while I think the purpose of the test would > be to check that there is not even a valid DNS resolution when the new > option is not used. Yup, that makes ".invalid" a better candidate, I think. Thanks.
On Thu, May 12, 2022 at 6:22 PM Junio C Hamano <gitster@pobox.com> wrote: > > Christian Couder <christian.couder@gmail.com> writes: > > > "Perhaps invent a totally bogus domain name, map that to > > localhost ::1, run a test server locally, and try to clone from that > > bogus domain?" > > > > (See: https://lore.kernel.org/git/xmqqfslrycvp.fsf@gitster.g/) > > > > I think "a totally bogus domain name" refers to something other than > > "example.com". > > I meant a domain that should not be used for purposes other than > being examples in the real world, including "example.com". Ok, thanks for the clarification and for copying the relevant RFC information below. > But RFC6761, which is an update to RFC2606, describes a set of > properties that make .invalid nice domain to use, including: > > 1. Users are free to use "invalid" names as they would any other > domain names. Users MAY assume that queries for "invalid" > names will always return NXDOMAIN responses. > > 3. Name resolution APIs and libraries SHOULD recognize "invalid" > names as special and SHOULD always return immediate negative > responses. Name resolution APIs SHOULD NOT send queries for > "invalid" names to their configured caching DNS server(s). I wonder if libcurl considers itself as a name resolution library or not. It has a DNS cache, so maybe in some ways it is. Also however it considers itself now, it could perhaps change in the future. Even if the current developers are against such a change, a new RFC might be more precise and specify something for libraries like libcurl which could make it change. So I am not so sure that using "invalid" is our best bet. > Another possibility is ".test" but it is more for testing DNS, not > application, i.e. In a way we are testing DNS, as we are actually testing libcurl's DNS caching and its CURLOPT_RESOLVE option (even if we also test that Git is correctly passing the config option to libcurl at the same time). > 1. Users are free to use these test names as they would any other > domain names. However, since there is no central authority > responsible for use of test names, users SHOULD be aware that > these names are likely to yield different results on different > networks. > > 3. Name resolution APIs and libraries SHOULD NOT recognize test > names as special and SHOULD NOT treat them differently. Name > resolution APIs SHOULD send queries for test names to their > configured caching DNS server(s). So with this we can at least expect that the way libcurl considers itself will have no impact on our tests. > so for a code like what we are discussing, which would not want the > names to be shown to DNS and yield any IP address, ".test" makes a > poorer "bogus domain name" than ".invalid", I think. I would think that there are risks in both cases. I am Ok with using any of the following in the test: BOGUS_HOST=gitbogusexamplehost.invalid # or BOGUS_HOST=gitbogusexamplehost.test The test passes for me either way. > By the way, we seem to have references to .xz top-level domain, > which appeared only in earlier drafts of what became RFC2606 (which > was updated by RFC6761) in both documentation pages and tests. At > some point we may want to update the former to ".example" and the > latter to ".invalid" as a clean-up. Yeah, good idea. > > Also "example.com" does seem to resolve to an IP address and even has > > an HTTP(S) server on it, while I think the purpose of the test would > > be to check that there is not even a valid DNS resolution when the new > > option is not used. > > Yup, that makes ".invalid" a better candidate, I think. Ok, I will use "gitbogusexamplehost.invalid" in the next iteration then.
diff --git a/Documentation/config/http.txt b/Documentation/config/http.txt index 7003661c0d..86f8a5978f 100644 --- a/Documentation/config/http.txt +++ b/Documentation/config/http.txt @@ -98,6 +98,22 @@ http.version:: - HTTP/2 - HTTP/1.1 +http.resolve:: + Hostname resolution information that will be used first when sending + HTTP requests. This information should be in one of the following + formats: + + - [+]HOST:PORT:ADDRESS[,ADDRESS] + - -HOST:PORT + ++ +The first format redirects all requests to the given `HOST:PORT` +to the provided `ADDRESS`(s). The second format clears all +previous config values for that `HOST:PORT` combination. To +allow easy overriding of all the settings inherited from the +system config, an empty value will reset all resolution +information to the empty list. + http.sslVersion:: The SSL version to use when negotiating an SSL connection, if you want to force the default. The available and default version diff --git a/http.c b/http.c index 229da4d148..7f3b7403ce 100644 --- a/http.c +++ b/http.c @@ -128,6 +128,8 @@ static struct curl_slist *pragma_header; static struct curl_slist *no_pragma_header; static struct string_list extra_http_headers = STRING_LIST_INIT_DUP; +static struct curl_slist *host_resolutions; + static struct active_request_slot *active_queue_head; static char *cached_accept_language; @@ -393,6 +395,18 @@ static int http_options(const char *var, const char *value, void *cb) return 0; } + if (!strcmp("http.resolve", var)) { + if (!value) { + return config_error_nonbool(var); + } else if (!*value) { + curl_slist_free_all(host_resolutions); + host_resolutions = NULL; + } else { + host_resolutions = curl_slist_append(host_resolutions, value); + } + return 0; + } + if (!strcmp("http.followredirects", var)) { if (value && !strcmp(value, "initial")) http_follow_config = HTTP_FOLLOW_INITIAL; @@ -1131,6 +1145,9 @@ void http_cleanup(void) curl_slist_free_all(no_pragma_header); no_pragma_header = NULL; + curl_slist_free_all(host_resolutions); + host_resolutions = NULL; + if (curl_http_proxy) { free((void *)curl_http_proxy); curl_http_proxy = NULL; @@ -1211,6 +1228,7 @@ struct active_request_slot *get_active_slot(void) if (curl_save_cookies) curl_easy_setopt(slot->curl, CURLOPT_COOKIEJAR, curl_cookie_file); curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, pragma_header); + curl_easy_setopt(slot->curl, CURLOPT_RESOLVE, host_resolutions); curl_easy_setopt(slot->curl, CURLOPT_ERRORBUFFER, curl_errorstr); curl_easy_setopt(slot->curl, CURLOPT_CUSTOMREQUEST, NULL); curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL); diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh index f92c79c132..d97380be87 100755 --- a/t/t5551-http-fetch-smart.sh +++ b/t/t5551-http-fetch-smart.sh @@ -567,4 +567,11 @@ test_expect_success 'client falls back from v2 to v0 to match server' ' grep symref=HEAD:refs/heads/ trace ' +test_expect_success 'passing hostname resolution information works' ' + BOGUS_HOST=gitbogusexamplehost.com && + BOGUS_HTTPD_URL=$HTTPD_PROTO://$BOGUS_HOST:$LIB_HTTPD_PORT && + test_must_fail git ls-remote "$BOGUS_HTTPD_URL/smart/repo.git" >/dev/null && + git -c "http.resolve=$BOGUS_HOST:$LIB_HTTPD_PORT:127.0.0.1" ls-remote "$BOGUS_HTTPD_URL/smart/repo.git" >/dev/null +' + test_done
Libcurl has a CURLOPT_RESOLVE easy option that allows the result of hostname resolution in the following format to be passed: [+]HOST:PORT:ADDRESS[,ADDRESS] This way, redirects and everything operating against the HOST+PORT will use the provided ADDRESS(s). The following format is also allowed to stop using hostname resolutions that have already been passed: -HOST:PORT See https://curl.se/libcurl/c/CURLOPT_RESOLVE.html for more details. Let's add a corresponding "http.resolve" config option that takes advantage of CURLOPT_RESOLVE. Each value configured for the "http.resolve" key is passed "as is" to curl through CURLOPT_RESOLVE, so it should be in one of the above 2 formats. This keeps the implementation simple and makes us consistent with libcurl's CURLOPT_RESOLVE, and with curl's corresponding `--resolve` command line option. The implementation uses CURLOPT_RESOLVE only in get_active_slot() which is called by all the HTTP request sending functions. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> --- Changes since v1 are the following: - rename the new config option to "http.resolve" - use "resolution" instead of "resolve" for the noun - use "format" instead of "form" - improved commit message and documentation - stop using a string_list and remove unnecessary related variables and functions - add a simple test Thanks to Junio and Carlo for the suggestions. As this version is very different from v1, I am not sure a range diff would be very useful as it would be very long compared to the size of the patch. Documentation/config/http.txt | 16 ++++++++++++++++ http.c | 18 ++++++++++++++++++ t/t5551-http-fetch-smart.sh | 7 +++++++ 3 files changed, 41 insertions(+)