Message ID | pull.1767.git.1722009170590.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | http: do not ignore proxy path | expand |
"Ryan Hendrickson via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu> > > The documentation for `http.proxy` describes that option, and the > environment variables it overrides, as supporting "the syntax understood > by curl". curl allows SOCKS proxies to use a path to a Unix domain > socket, like `socks5h://localhost/path/to/socket.sock`. Git should > therefore include, if present, the path part of the proxy URL in what it > passes to libcurl. > > Signed-off-by: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu> > --- > http: do not ignore proxy path > > * Documentation: do I need to add anything? http.proxy documentation says The syntax thus is [protocol://][user[:password]@]proxyhost[:port]. but the updated code pays attention to what can come after the "host[:post]" part, does it not? > diff --git a/http.c b/http.c > index 623ed234891..0cd75986a6b 100644 > --- a/http.c > +++ b/http.c I am unfamiliar with this code path, so let me follow along aloud. > @@ -1265,7 +1265,13 @@ static CURL *get_curl_handle(void) > if (!proxy_auth.host) > die("Invalid proxy URL '%s'", curl_http_proxy); We grabbed the value from the configuration variable (or various environment variables like $http_proxy) in the curl_http_proxy variable, and then passed it to credential_from_url() to parse parts out of [protocol://][user[:password]@]proxyhost[:port][/p/a/t/h]. The parsed result is in proxy_auth struct and there is no hope it can be usable if the .host member is missing. > - curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host); We used to only use the .host member but the curl_http_proxy could have had a path after it, held in the .path member. > + if (proxy_auth.path) { > + struct strbuf proxy = STRBUF_INIT; > + strbuf_addf(&proxy, "%s/%s", proxy_auth.host, proxy_auth.path); > + curl_easy_setopt(result, CURLOPT_PROXY, proxy.buf); > + strbuf_release(&proxy); > + } else > + curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host); Style. If "if" side needs {braces} because it consists of multiple statements, the corresponding "else" side should also have {braces} around its body, even if it only has a single statement. If you have the proxy strbuf in a bit wider scope, then the above becomes if (proxy_auth.path) strbuf_addf(&proxy, "%s/%s", proxy_auth.host, proxy_auth.path); else strbuf_addstr(&proxy, proxy_auth.host); curl_easy_setopt(result, CURLOPT_PROXY, proxy.buf); strbuf_release(&proxy); which might (I am not decided) be easier to follow. Could existing users have been taking advantage of the fact that the extra /path at the end of http.proxy (and $http_proxy and friends) are ignored? For them, this change will appear as a regression. Other than that (and the lack of any documentation and test updates), looking good. Thanks.
At 2024-07-26 09:29-0700, Junio C Hamano <gitster@pobox.com> sent: > http.proxy documentation says > > The syntax thus is [protocol://][user[:password]@]proxyhost[:port]. > > but the updated code pays attention to what can come after the > "host[:post]" part, does it not? Correct; I'll add a "[/path]" to that construction. >> + if (proxy_auth.path) { >> + struct strbuf proxy = STRBUF_INIT; >> + strbuf_addf(&proxy, "%s/%s", proxy_auth.host, proxy_auth.path); >> + curl_easy_setopt(result, CURLOPT_PROXY, proxy.buf); >> + strbuf_release(&proxy); >> + } else >> + curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host); > > Style. If "if" side needs {braces} because it consists of multiple > statements, the corresponding "else" side should also have {braces} > around its body, even if it only has a single statement. > > If you have the proxy strbuf in a bit wider scope, then the above becomes > > if (proxy_auth.path) > strbuf_addf(&proxy, "%s/%s", proxy_auth.host, proxy_auth.path); > else > strbuf_addstr(&proxy, proxy_auth.host); > curl_easy_setopt(result, CURLOPT_PROXY, proxy.buf); > strbuf_release(&proxy); > > which might (I am not decided) be easier to follow. For what it's worth, I was following the precedent set by the if-statement starting at line 1256 (a few lines above this patch): > if (strstr(curl_http_proxy, "://")) > credential_from_url(&proxy_auth, curl_http_proxy); > else { > struct strbuf url = STRBUF_INIT; > strbuf_addf(&url, "http://%s", curl_http_proxy); > credential_from_url(&proxy_auth, url.buf); > strbuf_release(&url); > } <https://github.com/git/git/blob/ad57f148c6b5f8735b62238dda8f571c582e0e54/http.c#L1256> I have no problem with being inconsistent with surrounding code in these style choices; just let me know what I should do in light of that. > Could existing users have been taking advantage of the fact that the > extra /path at the end of http.proxy (and $http_proxy and friends) > are ignored? For them, this change will appear as a regression. That is possible, though I have difficulty imagining a scenario in which it would be intentional. What do you recommend I do about that possibility? R
Ryan Hendrickson <ryan.hendrickson@alum.mit.edu> writes: > For what it's worth, I was following the precedent set by the > if-statement starting at line 1256 (a few lines above this patch): It is worth nothing. Existing violation does not make it a good idea to add more of them. It makes it more work to clean them all up later. >> Could existing users have been taking advantage of the fact that the >> extra /path at the end of http.proxy (and $http_proxy and friends) >> are ignored? For them, this change will appear as a regression. > > That is possible, though I have difficulty imagining a scenario in > which it would be intentional. > > What do you recommend I do about that possibility? I have no idea. I already said that I am not familiar with this code path, and it is likely I am a worse person than you are to find a potential creative (ab)uses of the mechanism to take advantage of the fact that the current code ignores the path part. Having said that, given that http.proxy falls back to environment variables that have been established a lot longer than Git itself, like HTTPS_PROXY etc., if we _know_ that setting HTTPS_PROXY to such a value with /path part causes curl (or other popular programs) to try using such a value without stripping the path part (and even better if that causes them to fail), then such an observation would make a very good argument why it is extremely unlikely that existing users used such a setting, hence it is unlikely this patch would make a regression. Thanks.
On Fri, Jul 26, 2024 at 03:52:50PM +0000, Ryan Hendrickson via GitGitGadget wrote: > * Tests: I could use a pointer on how best to add a test for this. > Adding a case to t5564-http-proxy.sh seems straightforward but I > don't think httpd can be configured to listen to domain sockets; can > I use netcat? I don't offhand know of a way to test this without a custom program like netcat. If it's the only option, it's OK to use tools that might not be available everywhere as long as the tests are marked as optional with the appropriate prerequisite. You can find prior art by looking for test_lazy_prereq calls (e.g., the ones for GZIP or ZIPINFO are pretty straight-forward). I would warn that there are several not-quite-compatible variants of netcat floating around, which can create headaches. You might be better off with a short perl script using IO::Socket::UNIX or similar. > diff --git a/http.c b/http.c > index 623ed234891..0cd75986a6b 100644 > --- a/http.c > +++ b/http.c > @@ -1265,7 +1265,13 @@ static CURL *get_curl_handle(void) > if (!proxy_auth.host) > die("Invalid proxy URL '%s'", curl_http_proxy); > > - curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host); > + if (proxy_auth.path) { > + struct strbuf proxy = STRBUF_INIT; > + strbuf_addf(&proxy, "%s/%s", proxy_auth.host, proxy_auth.path); > + curl_easy_setopt(result, CURLOPT_PROXY, proxy.buf); > + strbuf_release(&proxy); > + } else > + curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host); The fields in the proxy_auth struct have been parsed from the url, with any url encoding removed. But then we paste them back together into a pseudo-url without doing any further encoding. Is that correct? I doubt that the host contains a "/", but if you had a path that contained a "%", then the URL form of that is going to be %25. Which is curl expecting to get here? I say "pseudo-url" because it is weird to slap a path on the end of the hostname but leave off the scheme, etc. Which kind of makes me wonder why we pass proxy_auth.host in the first place, and not simply the original curl_http_proxy. It looks like a weird interaction between 372370f167 (http: use credential API to handle proxy authentication, 2016-01-26) and 57415089bd (http: honor empty http.proxy option to bypass proxy, 2017-04-11). The former added a _second_ CURLOPT_PROXY call, and the latter removed the first one. I wonder if we could go back to passing the string straight to curl (as we did prior to 2016), and keeping the proxy_auth struct purely as a mechanism for gathering credentials. That would presumably fix your use case. And this is also perhaps an interesting data point for Junio's question about regressions (if people were passing paths, it used to work and was broken in 2016). -Peff
At 2024-07-26 17:11-0400, Jeff King <peff@peff.net> sent: > I would warn that there are several not-quite-compatible variants of > netcat floating around, which can create headaches. You might be better > off with a short perl script using IO::Socket::UNIX or similar. Ah, okay, thanks for the pointer! >> diff --git a/http.c b/http.c >> index 623ed234891..0cd75986a6b 100644 >> --- a/http.c >> +++ b/http.c >> @@ -1265,7 +1265,13 @@ static CURL *get_curl_handle(void) >> if (!proxy_auth.host) >> die("Invalid proxy URL '%s'", curl_http_proxy); >> >> - curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host); >> + if (proxy_auth.path) { >> + struct strbuf proxy = STRBUF_INIT; >> + strbuf_addf(&proxy, "%s/%s", proxy_auth.host, proxy_auth.path); >> + curl_easy_setopt(result, CURLOPT_PROXY, proxy.buf); >> + strbuf_release(&proxy); >> + } else >> + curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host); > > The fields in the proxy_auth struct have been parsed from the url, with > any url encoding removed. But then we paste them back together into a > pseudo-url without doing any further encoding. Is that correct? > > I doubt that the host contains a "/", but if you had a path that > contained a "%", then the URL form of that is going to be %25. Which is > curl expecting to get here? Oh, I see! Yes, this is an issue with my patch: if I create a socket file named "%30", command-line curl wants http_proxy to contain "%2530", and patched Git wants http_proxy to contain "%252530". Good edge case to put in a test. > I wonder if we could go back to passing the string straight to curl (as > we did prior to 2016), and keeping the proxy_auth struct purely as a > mechanism for gathering credentials. Hmm, that would be nice, but I think curl doesn't deal well with the extra case that Git supports of specifying a username but no password. It causes one of the existing tests to fail if I pass the string straight through. On top of that, all of those starts_with tests for checking the protocol are written quite loosely, so in practice Git "supports" the protocols "socks://" and "socksonmyfeet://" by mapping them both to SOCKS4, and curl would not like it if it received those strings directly. So given that Git wants to handle the protocol and the credentials, it makes sense that only the host and the path are passed to curl. I just have to make sure that they are correctly re-encoded. R
On Fri, Jul 26, 2024 at 06:43:29PM -0400, Ryan Hendrickson wrote: > > I wonder if we could go back to passing the string straight to curl (as > > we did prior to 2016), and keeping the proxy_auth struct purely as a > > mechanism for gathering credentials. > > Hmm, that would be nice, but I think curl doesn't deal well with the extra > case that Git supports of specifying a username but no password. It causes > one of the existing tests to fail if I pass the string straight through. OK, thanks for trying. It would have been nice, but I'm not surprised that there's an unusual interaction. > On top of that, all of those starts_with tests for checking the protocol are > written quite loosely, so in practice Git "supports" the protocols > "socks://" and "socksonmyfeet://" by mapping them both to SOCKS4, and curl > would not like it if it received those strings directly. > > So given that Git wants to handle the protocol and the credentials, it makes > sense that only the host and the path are passed to curl. I just have to > make sure that they are correctly re-encoded. Makes sense. -Peff
diff --git a/http.c b/http.c index 623ed234891..0cd75986a6b 100644 --- a/http.c +++ b/http.c @@ -1265,7 +1265,13 @@ static CURL *get_curl_handle(void) if (!proxy_auth.host) die("Invalid proxy URL '%s'", curl_http_proxy); - curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host); + if (proxy_auth.path) { + struct strbuf proxy = STRBUF_INIT; + strbuf_addf(&proxy, "%s/%s", proxy_auth.host, proxy_auth.path); + curl_easy_setopt(result, CURLOPT_PROXY, proxy.buf); + strbuf_release(&proxy); + } else + curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host); var_override(&curl_no_proxy, getenv("NO_PROXY")); var_override(&curl_no_proxy, getenv("no_proxy")); curl_easy_setopt(result, CURLOPT_NOPROXY, curl_no_proxy);