[v2] http: add custom hostname to IP address resolutions

Message ID	20220504104601.136403-1-chriscool@tuxfamily.org (mailing list archive)
State	Superseded
Headers	show Return-Path: <git-owner@kernel.org> From: Christian Couder <christian.couder@gmail.com> To: git@vger.kernel.org Cc: Junio C Hamano <gitster@pobox.com>, Christian Couder <chriscool@tuxfamily.org>, Derrick Stolee <derrickstolee@github.com>, Carlo Arenas <carenas@gmail.com> Subject: [PATCH v2] http: add custom hostname to IP address resolutions Date: Wed, 4 May 2022 12:46:01 +0200 Message-Id: <20220504104601.136403-1-chriscool@tuxfamily.org> In-Reply-To: <20220502083639.610279-1-chriscool@tuxfamily.org> References: <20220502083639.610279-1-chriscool@tuxfamily.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	[v2] http: add custom hostname to IP address resolutions \| expand [v2] http: add custom hostname to IP address resolutions

Message ID

20220504104601.136403-1-chriscool@tuxfamily.org (mailing list archive)

State

Superseded

Headers

From: Christian Couder <christian.couder@gmail.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
        Christian Couder <chriscool@tuxfamily.org>,
        Derrick Stolee <derrickstolee@github.com>,
        Carlo Arenas <carenas@gmail.com>
Subject: [PATCH v2] http: add custom hostname to IP address resolutions
Date: Wed,  4 May 2022 12:46:01 +0200
Message-Id: <20220504104601.136403-1-chriscool@tuxfamily.org>
In-Reply-To: <20220502083639.610279-1-chriscool@tuxfamily.org>
References: <20220502083639.610279-1-chriscool@tuxfamily.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

[v2] http: add custom hostname to IP address resolutions | expand

Commit Message

Christian Couder May 4, 2022, 10:46 a.m. UTC

Libcurl has a CURLOPT_RESOLVE easy option that allows
the result of hostname resolution in the following
format to be passed:

	[+]HOST:PORT:ADDRESS[,ADDRESS]

This way, redirects and everything operating against the
HOST+PORT will use the provided ADDRESS(s).

The following format is also allowed to stop using
hostname resolutions that have already been passed:

	-HOST:PORT

See https://curl.se/libcurl/c/CURLOPT_RESOLVE.html for
more details.

Let's add a corresponding "http.resolve" config option
that takes advantage of CURLOPT_RESOLVE.

Each value configured for the "http.resolve" key is
passed "as is" to curl through CURLOPT_RESOLVE, so it
should be in one of the above 2 formats. This keeps the
implementation simple and makes us consistent with
libcurl's CURLOPT_RESOLVE, and with curl's corresponding
`--resolve` command line option.

The implementation uses CURLOPT_RESOLVE only in
get_active_slot() which is called by all the HTTP
request sending functions.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---

Changes since v1 are the following:

  - rename the new config option to "http.resolve" 
  - use "resolution" instead of "resolve" for the noun
  - use "format" instead of "form"
  - improved commit message and documentation
  - stop using a string_list and remove unnecessary
    related variables and functions
  - add a simple test

Thanks to Junio and Carlo for the suggestions.

As this version is very different from v1, I am not
sure a range diff would be very useful as it would
be very long compared to the size of the patch.

 Documentation/config/http.txt | 16 ++++++++++++++++
 http.c                        | 18 ++++++++++++++++++
 t/t5551-http-fetch-smart.sh   |  7 +++++++
 3 files changed, 41 insertions(+)

Comments

Carlo Marcelo Arenas Belón May 5, 2022, 11:21 a.m. UTC | #1

On Wed, May 04, 2022 at 12:46:01PM +0200, Christian Couder wrote:
> diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
> index f92c79c132..d97380be87 100755
> --- a/t/t5551-http-fetch-smart.sh
> +++ b/t/t5551-http-fetch-smart.sh
> @@ -567,4 +567,11 @@ test_expect_success 'client falls back from v2 to v0 to match server' '
>  	grep symref=HEAD:refs/heads/ trace
>  '
>  
> +test_expect_success 'passing hostname resolution information works' '
> +	BOGUS_HOST=gitbogusexamplehost.com &&

minor nitpick, but better to use example.com here which is reserved for this
type of uses and therefore unlikely to conflict with a possibly assigned
domain.

Carlo

Christian Couder May 12, 2022, 8:52 a.m. UTC | #2

On Thu, May 5, 2022 at 1:21 PM Carlo Marcelo Arenas Belón
<carenas@gmail.com> wrote:
>
> On Wed, May 04, 2022 at 12:46:01PM +0200, Christian Couder wrote:
> > diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
> > index f92c79c132..d97380be87 100755
> > --- a/t/t5551-http-fetch-smart.sh
> > +++ b/t/t5551-http-fetch-smart.sh
> > @@ -567,4 +567,11 @@ test_expect_success 'client falls back from v2 to v0 to match server' '
> >       grep symref=HEAD:refs/heads/ trace
> >  '
> >
> > +test_expect_success 'passing hostname resolution information works' '
> > +     BOGUS_HOST=gitbogusexamplehost.com &&
>
> minor nitpick, but better to use example.com here which is reserved for this
> type of uses and therefore unlikely to conflict with a possibly assigned
> domain.

Sorry for not replying to this earlier, but Junio previously suggested
the following:

"Perhaps invent a totally bogus domain name, map that to
localhost ::1, run a test server locally, and try to clone from that
bogus domain?"

(See: https://lore.kernel.org/git/xmqqfslrycvp.fsf@gitster.g/)

I think "a totally bogus domain name" refers to something other than
"example.com".

Also "example.com" does seem to resolve to an IP address and even has
an HTTP(S) server on it, while I think the purpose of the test would
be to check that there is not even a valid DNS resolution when the new
option is not used.

Junio C Hamano May 12, 2022, 4:22 p.m. UTC | #3

Christian Couder <christian.couder@gmail.com> writes:

> "Perhaps invent a totally bogus domain name, map that to
> localhost ::1, run a test server locally, and try to clone from that
> bogus domain?"
>
> (See: https://lore.kernel.org/git/xmqqfslrycvp.fsf@gitster.g/)
>
> I think "a totally bogus domain name" refers to something other than
> "example.com".

I meant a domain that should not be used for purposes other than
being examples in the real world, including "example.com".

But RFC6761, which is an update to RFC2606, describes a set of
properties that make .invalid nice domain to use, including:

 1.  Users are free to use "invalid" names as they would any other
     domain names.  Users MAY assume that queries for "invalid"
     names will always return NXDOMAIN responses.

 3.  Name resolution APIs and libraries SHOULD recognize "invalid"
     names as special and SHOULD always return immediate negative
     responses.  Name resolution APIs SHOULD NOT send queries for
     "invalid" names to their configured caching DNS server(s).

Another possibility is ".test" but it is more for testing DNS, not
application, i.e.

 1.  Users are free to use these test names as they would any other
     domain names.  However, since there is no central authority
     responsible for use of test names, users SHOULD be aware that
     these names are likely to yield different results on different
     networks.

 3.  Name resolution APIs and libraries SHOULD NOT recognize test
     names as special and SHOULD NOT treat them differently.  Name
     resolution APIs SHOULD send queries for test names to their
     configured caching DNS server(s).

so for a code like what we are discussing, which would not want the
names to be shown to DNS and yield any IP address, ".test" makes a
poorer "bogus domain name" than ".invalid", I think.

By the way, we seem to have references to .xz top-level domain,
which appeared only in earlier drafts of what became RFC2606 (which
was updated by RFC6761) in both documentation pages and tests.  At
some point we may want to update the former to ".example" and the
latter to ".invalid" as a clean-up.

> Also "example.com" does seem to resolve to an IP address and even has
> an HTTP(S) server on it, while I think the purpose of the test would
> be to check that there is not even a valid DNS resolution when the new
> option is not used.

Yup, that makes ".invalid" a better candidate, I think.

Thanks.

Christian Couder May 12, 2022, 6:57 p.m. UTC | #4

On Thu, May 12, 2022 at 6:22 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > "Perhaps invent a totally bogus domain name, map that to
> > localhost ::1, run a test server locally, and try to clone from that
> > bogus domain?"
> >
> > (See: https://lore.kernel.org/git/xmqqfslrycvp.fsf@gitster.g/)
> >
> > I think "a totally bogus domain name" refers to something other than
> > "example.com".
>
> I meant a domain that should not be used for purposes other than
> being examples in the real world, including "example.com".

Ok, thanks for the clarification and for copying the relevant RFC
information below.

> But RFC6761, which is an update to RFC2606, describes a set of
> properties that make .invalid nice domain to use, including:
>
>  1.  Users are free to use "invalid" names as they would any other
>      domain names.  Users MAY assume that queries for "invalid"
>      names will always return NXDOMAIN responses.
>
>  3.  Name resolution APIs and libraries SHOULD recognize "invalid"
>      names as special and SHOULD always return immediate negative
>      responses.  Name resolution APIs SHOULD NOT send queries for
>      "invalid" names to their configured caching DNS server(s).

I wonder if libcurl considers itself as a name resolution library or
not. It has a DNS cache, so maybe in some ways it is. Also however it
considers itself now, it could perhaps change in the future. Even if
the current developers are against such a change, a new RFC might be
more precise and specify something for libraries like libcurl which
could make it change.

So I am not so sure that using "invalid" is our best bet.

> Another possibility is ".test" but it is more for testing DNS, not
> application, i.e.

In a way we are testing DNS, as we are actually testing libcurl's DNS
caching and its CURLOPT_RESOLVE option (even if we also test that Git
is correctly passing the config option to libcurl at the same time).

>  1.  Users are free to use these test names as they would any other
>      domain names.  However, since there is no central authority
>      responsible for use of test names, users SHOULD be aware that
>      these names are likely to yield different results on different
>      networks.
>
>  3.  Name resolution APIs and libraries SHOULD NOT recognize test
>      names as special and SHOULD NOT treat them differently.  Name
>      resolution APIs SHOULD send queries for test names to their
>      configured caching DNS server(s).

So with this we can at least expect that the way libcurl considers
itself will have no impact on our tests.

> so for a code like what we are discussing, which would not want the
> names to be shown to DNS and yield any IP address, ".test" makes a
> poorer "bogus domain name" than ".invalid", I think.

I would think that there are risks in both cases. I am Ok with using
any of the following in the test:

BOGUS_HOST=gitbogusexamplehost.invalid # or
BOGUS_HOST=gitbogusexamplehost.test

The test passes for me either way.

> By the way, we seem to have references to .xz top-level domain,
> which appeared only in earlier drafts of what became RFC2606 (which
> was updated by RFC6761) in both documentation pages and tests.  At
> some point we may want to update the former to ".example" and the
> latter to ".invalid" as a clean-up.

Yeah, good idea.

> > Also "example.com" does seem to resolve to an IP address and even has
> > an HTTP(S) server on it, while I think the purpose of the test would
> > be to check that there is not even a valid DNS resolution when the new
> > option is not used.
>
> Yup, that makes ".invalid" a better candidate, I think.

Ok, I will use "gitbogusexamplehost.invalid" in the next iteration then.

diff --git a/Documentation/config/http.txt b/Documentation/config/http.txt
index 7003661c0d..86f8a5978f 100644
--- a/Documentation/config/http.txt
+++ b/Documentation/config/http.txt
@@ -98,6 +98,22 @@  http.version::
 	- HTTP/2
 	- HTTP/1.1
 
+http.resolve::
+	Hostname resolution information that will be used first when sending
+	HTTP requests.  This information should be in one of the following
+	formats:
+
+	- [+]HOST:PORT:ADDRESS[,ADDRESS]
+	- -HOST:PORT
+
++
+The first format redirects all requests to the given `HOST:PORT`
+to the provided `ADDRESS`(s). The second format clears all
+previous config values for that `HOST:PORT` combination.  To
+allow easy overriding of all the settings inherited from the
+system config, an empty value will reset all resolution
+information to the empty list.
+
 http.sslVersion::
 	The SSL version to use when negotiating an SSL connection, if you
 	want to force the default.  The available and default version
diff --git a/http.c b/http.c
index 229da4d148..7f3b7403ce 100644
--- a/http.c
+++ b/http.c
@@ -128,6 +128,8 @@  static struct curl_slist *pragma_header;
 static struct curl_slist *no_pragma_header;
 static struct string_list extra_http_headers = STRING_LIST_INIT_DUP;
 
+static struct curl_slist *host_resolutions;
+
 static struct active_request_slot *active_queue_head;
 
 static char *cached_accept_language;
@@ -393,6 +395,18 @@  static int http_options(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (!strcmp("http.resolve", var)) {
+		if (!value) {
+			return config_error_nonbool(var);
+		} else if (!*value) {
+			curl_slist_free_all(host_resolutions);
+			host_resolutions = NULL;
+		} else {
+			host_resolutions = curl_slist_append(host_resolutions, value);
+		}
+		return 0;
+	}
+
 	if (!strcmp("http.followredirects", var)) {
 		if (value && !strcmp(value, "initial"))
 			http_follow_config = HTTP_FOLLOW_INITIAL;
@@ -1131,6 +1145,9 @@  void http_cleanup(void)
 	curl_slist_free_all(no_pragma_header);
 	no_pragma_header = NULL;
 
+	curl_slist_free_all(host_resolutions);
+	host_resolutions = NULL;
+
 	if (curl_http_proxy) {
 		free((void *)curl_http_proxy);
 		curl_http_proxy = NULL;
@@ -1211,6 +1228,7 @@  struct active_request_slot *get_active_slot(void)
 	if (curl_save_cookies)
 		curl_easy_setopt(slot->curl, CURLOPT_COOKIEJAR, curl_cookie_file);
 	curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, pragma_header);
+	curl_easy_setopt(slot->curl, CURLOPT_RESOLVE, host_resolutions);
 	curl_easy_setopt(slot->curl, CURLOPT_ERRORBUFFER, curl_errorstr);
 	curl_easy_setopt(slot->curl, CURLOPT_CUSTOMREQUEST, NULL);
 	curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL);
diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index f92c79c132..d97380be87 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -567,4 +567,11 @@  test_expect_success 'client falls back from v2 to v0 to match server' '
 	grep symref=HEAD:refs/heads/ trace
 '
 
+test_expect_success 'passing hostname resolution information works' '
+	BOGUS_HOST=gitbogusexamplehost.com &&
+	BOGUS_HTTPD_URL=$HTTPD_PROTO://$BOGUS_HOST:$LIB_HTTPD_PORT &&
+	test_must_fail git ls-remote "$BOGUS_HTTPD_URL/smart/repo.git" >/dev/null &&
+	git -c "http.resolve=$BOGUS_HOST:$LIB_HTTPD_PORT:127.0.0.1" ls-remote "$BOGUS_HTTPD_URL/smart/repo.git" >/dev/null
+'
+
 test_done

[v2] http: add custom hostname to IP address resolutions

Commit Message

Comments

Patch