diff mbox series

[v4] http: do not ignore proxy path

Message ID pull.1767.v4.git.1722489776279.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series [v4] http: do not ignore proxy path | expand

Commit Message

Ryan Hendrickson Aug. 1, 2024, 5:22 a.m. UTC
From: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu>

The documentation for `http.proxy` describes that option, and the
environment variables it overrides, as supporting "the syntax understood
by curl". curl allows SOCKS proxies to use a path to a Unix domain
socket, like `socks5h://localhost/path/to/socket.sock`. Git should
therefore include, if present, the path part of the proxy URL in what it
passes to libcurl.

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu>
---
    http: do not ignore proxy path
    
    Re earlier discussion about regressions, it turns out that curl has only
    supported this syntax since 2022 [https://curl.se/ch/7.84.0.html].
    Earlier versions of curl, if provided an http_proxy that contained a
    path, would also ignore it. curl didn't seem to consider this a breaking
    change when the feature was introduced
    [https://github.com/curl/curl/pull/8668], though; is that a valid
    argument for Git not to lose sleep over it either?
    
    A test has been added, but unfortunately it is not being run on any of
    GitHub's CI workflows, because the runner images in use all have
    versions of libcurl that are too old. It works on my machine, but
    unknown troubles may await someone else.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1767%2Frhendric%2Frhendric%2Fhttp-proxy-path-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1767/rhendric/rhendric/http-proxy-path-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1767

Range-diff vs v3:

 1:  7a58da7102e ! 1:  507fb75c1a6 http: do not ignore proxy path
     @@ http.c: static CURL *get_curl_handle(void)
       		curl_easy_setopt(result, CURLOPT_PROXY, "");
       	} else if (curl_http_proxy) {
      +		struct strbuf proxy = STRBUF_INIT;
     ++
       		if (starts_with(curl_http_proxy, "socks5h"))
       			curl_easy_setopt(result,
       				CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5_HOSTNAME);
     @@ http.c: static CURL *get_curl_handle(void)
      +		strbuf_addstr(&proxy, proxy_auth.host);
      +		if (proxy_auth.path) {
      +			curl_version_info_data *ver = curl_version_info(CURLVERSION_NOW);
     ++
      +			if (ver->version_num < 0x075400)
      +				die("libcurl 7.84 or later is required to support paths in proxy URLs");
      +
     @@ t/t5564-http-proxy.sh: test_expect_success 'clone can prompt for proxy password'
      +	mkfifo socks_output &&
      +	{
      +		"$PERL_PATH" "$TEST_DIRECTORY/socks4-proxy.pl" "$1" >socks_output &
     -+		socks_pid=$!
     ++		echo $! > "$TRASH_DIRECTORY/socks.pid"
      +	} &&
      +	read line <socks_output &&
      +	test "$line" = ready
      +}
      +
     -+test_expect_success PERL 'try to start SOCKS proxy' '
     -+	# The %30 tests that the correct amount of percent-encoding is applied
     -+	# to the proxy string passed to curl.
     -+	if start_socks %30.sock
     -+	then
     -+		test_set_prereq SOCKS_PROXY
     -+	fi
     -+'
     ++# The %30 tests that the correct amount of percent-encoding is applied to the
     ++# proxy string passed to curl.
     ++test_lazy_prereq SOCKS_PROXY 'test_have_prereq PERL && start_socks "$TRASH_DIRECTORY/%30.sock"'
     ++
     ++test_atexit 'test ! -e "$TRASH_DIRECTORY/socks.pid" || kill "$(cat "$TRASH_DIRECTORY/socks.pid")"'
      +
      +test_expect_success SOCKS_PROXY 'clone via Unix socket' '
      +	test_when_finished "rm -rf clone" &&
     @@ t/t5564-http-proxy.sh: test_expect_success 'clone can prompt for proxy password'
      +	}
      +'
      +
     -+test_expect_success SOCKS_PROXY 'stop SOCKS proxy' 'kill "$socks_pid"'
     -+
      +test_expect_success 'Unix socket requires socks*:' '
      +	! git clone -c http.proxy=localhost/path https://example.com/repo.git 2>err && {
      +		grep "^fatal: Invalid proxy URL '\''localhost/path'\'': only SOCKS proxies support paths" err ||


 Documentation/config/http.txt |  4 +--
 http.c                        | 24 +++++++++++++++++-
 t/socks4-proxy.pl             | 48 +++++++++++++++++++++++++++++++++++
 t/t5564-http-proxy.sh         | 41 ++++++++++++++++++++++++++++++
 4 files changed, 114 insertions(+), 3 deletions(-)
 create mode 100644 t/socks4-proxy.pl


base-commit: 39bf06adf96da25b87c9aa7d35a32ef3683eb4a4

Comments

Jeff King Aug. 1, 2024, 6:04 a.m. UTC | #1
On Thu, Aug 01, 2024 at 05:22:56AM +0000, Ryan Hendrickson via GitGitGadget wrote:

> From: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu>
> 
> The documentation for `http.proxy` describes that option, and the
> environment variables it overrides, as supporting "the syntax understood
> by curl". curl allows SOCKS proxies to use a path to a Unix domain
> socket, like `socks5h://localhost/path/to/socket.sock`. Git should
> therefore include, if present, the path part of the proxy URL in what it
> passes to libcurl.
> 
> Co-authored-by: Jeff King <peff@peff.net>
> Signed-off-by: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu>

Thanks for crediting me. I'll add my:

 Signed-off-by: Jeff King <peff@peff.net>

to be explicit that the proxy script is under the DCO.

> -		curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host);
> +		strbuf_addstr(&proxy, proxy_auth.host);
> +		if (proxy_auth.path) {
> +			curl_version_info_data *ver = curl_version_info(CURLVERSION_NOW);
> +
> +			if (ver->version_num < 0x075400)
> +				die("libcurl 7.84 or later is required to support paths in proxy URLs");
> +
> +			if (!starts_with(proxy_auth.protocol, "socks"))
> +				die("Invalid proxy URL '%s': only SOCKS proxies support paths",
> +				    curl_http_proxy);
> +
> +			if (strcasecmp(proxy_auth.host, "localhost"))
> +				die("Invalid proxy URL '%s': host must be localhost if a path is present",
> +				    curl_http_proxy);
> +
> +			strbuf_addch(&proxy, '/');
> +			strbuf_add_percentencode(&proxy, proxy_auth.path, 0);

This all looks good to me. I wondered briefly whether
proxy_auth.protocol could ever be NULL, but I think we refuse to parse
such a URL.

> +start_socks() {
> +	mkfifo socks_output &&
> +	{
> +		"$PERL_PATH" "$TEST_DIRECTORY/socks4-proxy.pl" "$1" >socks_output &
> +		echo $! > "$TRASH_DIRECTORY/socks.pid"
> +	} &&
> +	read line <socks_output &&
> +	test "$line" = ready
> +}
> +
> +# The %30 tests that the correct amount of percent-encoding is applied to the
> +# proxy string passed to curl.
> +test_lazy_prereq SOCKS_PROXY 'test_have_prereq PERL && start_socks "$TRASH_DIRECTORY/%30.sock"'

OK, I see you figured out that the lazy prereq requires giving the full
path to the socket. :) I had forgotten that we also run the prereq in a
subshell to avoid side effects, but you caught that, as well.

All of this to me is good evidence that the non-lazy version you had
originally is a better approach. But I don't think it's worth spending
time fighting over, so I'm OK either way.

> +test_atexit 'test ! -e "$TRASH_DIRECTORY/socks.pid" || kill "$(cat "$TRASH_DIRECTORY/socks.pid")"'

Ah, good. I had meant to mention cleaning up at the end (as we do for
git-daemon and apache), but forgot. I'm glad you included it here.

> +test_expect_success SOCKS_PROXY 'clone via Unix socket' '
> +	test_when_finished "rm -rf clone" &&
> +	test_config_global http.proxy "socks4://localhost$PWD/%2530.sock" && {
> +		{
> +			GIT_TRACE_CURL=$PWD/trace git clone "$HTTPD_URL/smart/repo.git" clone 2>err &&
> +			grep -i "SOCKS4 request granted." trace
> +		} ||
> +		grep "^fatal: libcurl 7\.84 or later" err
> +	}
> +'

Looks good. Usually I do not bother with escaping "." in grep messages,
as it is already a loose match and it keeps the test easier to read. But
it is OK to do so.

We do have a test_grep wrapper which gives nicer output when the match
fails, but maybe that is a bad choice here, since we accept either of
the two messages. (Likewise for using test_cmp, I suppose).

The use of "||" in the command-chaining is unusual enough that it's
probably worth calling out either in a comment or in the commit message.

> +test_expect_success 'Unix socket requires socks*:' '
> +	! git clone -c http.proxy=localhost/path https://example.com/repo.git 2>err && {
> +		grep "^fatal: Invalid proxy URL '\''localhost/path'\'': only SOCKS proxies support paths" err ||
> +		grep "^fatal: libcurl 7\.84 or later" err
> +	}
> +'
> +
> +test_expect_success 'Unix socket requires localhost' '
> +	! git clone -c http.proxy=socks4://127.0.0.1/path https://example.com/repo.git 2>err && {
> +		grep "^fatal: Invalid proxy URL '\''socks4://127\.0\.0\.1/path'\'': host must be localhost if a path is present" err ||
> +		grep "^fatal: libcurl 7\.84 or later" err
> +	}
> +'
> +

Likewise here, I'd probably just match the single-quotes with "." for
readability (but it's OK to write it as you did). You can (since a week
or so ago) also use a here-doc body like:

  test_expect_success 'Unix socket requires socks*:' - <<\EOT
	...
	test_grep "^fatal: Invalid proxy URL 'localhost/path': only SOCKS proxies support paths" err ||
	...
  EOT

but I'm OK with it either way. The same "||" comment applies.

-Peff
Junio C Hamano Aug. 1, 2024, 5:04 p.m. UTC | #2
Jeff King <peff@peff.net> writes:

> On Thu, Aug 01, 2024 at 05:22:56AM +0000, Ryan Hendrickson via GitGitGadget wrote:
>
>> From: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu>
>> 
>> The documentation for `http.proxy` describes that option, and the
>> environment variables it overrides, as supporting "the syntax understood
>> by curl". curl allows SOCKS proxies to use a path to a Unix domain
>> socket, like `socks5h://localhost/path/to/socket.sock`. Git should
>> therefore include, if present, the path part of the proxy URL in what it
>> passes to libcurl.
>> 
>> Co-authored-by: Jeff King <peff@peff.net>
>> Signed-off-by: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu>
>
> Thanks for crediting me. I'll add my:
>
>  Signed-off-by: Jeff King <peff@peff.net>
>
> to be explicit that the proxy script is under the DCO.

OK, I'll amend it while queuing this v4.

Thanks.

>> +# The %30 tests that the correct amount of percent-encoding is applied to the
>> +# proxy string passed to curl.
>> +test_lazy_prereq SOCKS_PROXY 'test_have_prereq PERL && start_socks "$TRASH_DIRECTORY/%30.sock"'
>
> OK, I see you figured out that the lazy prereq requires giving the full
> path to the socket. :) I had forgotten that we also run the prereq in a
> subshell to avoid side effects, but you caught that, as well.

;-)

> All of this to me is good evidence that the non-lazy version you had
> originally is a better approach. But I don't think it's worth spending
> time fighting over, so I'm OK either way.

I'd be OK either way, too.

Thanks, both.
diff mbox series

Patch

diff --git a/Documentation/config/http.txt b/Documentation/config/http.txt
index 162b33fc52f..a14371b5c96 100644
--- a/Documentation/config/http.txt
+++ b/Documentation/config/http.txt
@@ -5,8 +5,8 @@  http.proxy::
 	proxy string with a user name but no password, in which case git will
 	attempt to acquire one in the same way it does for other credentials. See
 	linkgit:gitcredentials[7] for more information. The syntax thus is
-	'[protocol://][user[:password]@]proxyhost[:port]'. This can be overridden
-	on a per-remote basis; see remote.<name>.proxy
+	'[protocol://][user[:password]@]proxyhost[:port][/path]'. This can be
+	overridden on a per-remote basis; see remote.<name>.proxy
 +
 Any proxy, however configured, must be completely transparent and must not
 modify, transform, or buffer the request or response in any way.  Proxies which
diff --git a/http.c b/http.c
index 623ed234891..6c6cc5c822a 100644
--- a/http.c
+++ b/http.c
@@ -1227,6 +1227,8 @@  static CURL *get_curl_handle(void)
 		 */
 		curl_easy_setopt(result, CURLOPT_PROXY, "");
 	} else if (curl_http_proxy) {
+		struct strbuf proxy = STRBUF_INIT;
+
 		if (starts_with(curl_http_proxy, "socks5h"))
 			curl_easy_setopt(result,
 				CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5_HOSTNAME);
@@ -1265,7 +1267,27 @@  static CURL *get_curl_handle(void)
 		if (!proxy_auth.host)
 			die("Invalid proxy URL '%s'", curl_http_proxy);
 
-		curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host);
+		strbuf_addstr(&proxy, proxy_auth.host);
+		if (proxy_auth.path) {
+			curl_version_info_data *ver = curl_version_info(CURLVERSION_NOW);
+
+			if (ver->version_num < 0x075400)
+				die("libcurl 7.84 or later is required to support paths in proxy URLs");
+
+			if (!starts_with(proxy_auth.protocol, "socks"))
+				die("Invalid proxy URL '%s': only SOCKS proxies support paths",
+				    curl_http_proxy);
+
+			if (strcasecmp(proxy_auth.host, "localhost"))
+				die("Invalid proxy URL '%s': host must be localhost if a path is present",
+				    curl_http_proxy);
+
+			strbuf_addch(&proxy, '/');
+			strbuf_add_percentencode(&proxy, proxy_auth.path, 0);
+		}
+		curl_easy_setopt(result, CURLOPT_PROXY, proxy.buf);
+		strbuf_release(&proxy);
+
 		var_override(&curl_no_proxy, getenv("NO_PROXY"));
 		var_override(&curl_no_proxy, getenv("no_proxy"));
 		curl_easy_setopt(result, CURLOPT_NOPROXY, curl_no_proxy);
diff --git a/t/socks4-proxy.pl b/t/socks4-proxy.pl
new file mode 100644
index 00000000000..4c3a35c0083
--- /dev/null
+++ b/t/socks4-proxy.pl
@@ -0,0 +1,48 @@ 
+use strict;
+use IO::Select;
+use IO::Socket::UNIX;
+use IO::Socket::INET;
+
+my $path = shift;
+
+unlink($path);
+my $server = IO::Socket::UNIX->new(Listen => 1, Local => $path)
+	or die "unable to listen on $path: $!";
+
+$| = 1;
+print "ready\n";
+
+while (my $client = $server->accept()) {
+	sysread $client, my $buf, 8;
+	my ($version, $cmd, $port, $ip) = unpack 'CCnN', $buf;
+	next unless $version == 4; # socks4
+	next unless $cmd == 1; # TCP stream connection
+
+	# skip NUL-terminated id
+	while (sysread $client, my $char, 1) {
+		last unless ord($char);
+	}
+
+	# version(0), reply(5a == granted), port (ignored), ip (ignored)
+	syswrite $client, "\x00\x5a\x00\x00\x00\x00\x00\x00";
+
+	my $remote = IO::Socket::INET->new(PeerHost => $ip, PeerPort => $port)
+		or die "unable to connect to $ip/$port: $!";
+
+	my $io = IO::Select->new($client, $remote);
+	while ($io->count) {
+		for my $fh ($io->can_read(0)) {
+			for my $pair ([$client, $remote], [$remote, $client]) {
+				my ($from, $to) = @$pair;
+				next unless $fh == $from;
+
+				my $r = sysread $from, my $buf, 1024;
+				if (!defined $r || $r <= 0) {
+					$io->remove($from);
+					next;
+				}
+				syswrite $to, $buf;
+			}
+		}
+	}
+}
diff --git a/t/t5564-http-proxy.sh b/t/t5564-http-proxy.sh
index bb35b87071d..039d7fc748e 100755
--- a/t/t5564-http-proxy.sh
+++ b/t/t5564-http-proxy.sh
@@ -39,4 +39,45 @@  test_expect_success 'clone can prompt for proxy password' '
 	expect_askpass pass proxuser
 '
 
+start_socks() {
+	mkfifo socks_output &&
+	{
+		"$PERL_PATH" "$TEST_DIRECTORY/socks4-proxy.pl" "$1" >socks_output &
+		echo $! > "$TRASH_DIRECTORY/socks.pid"
+	} &&
+	read line <socks_output &&
+	test "$line" = ready
+}
+
+# The %30 tests that the correct amount of percent-encoding is applied to the
+# proxy string passed to curl.
+test_lazy_prereq SOCKS_PROXY 'test_have_prereq PERL && start_socks "$TRASH_DIRECTORY/%30.sock"'
+
+test_atexit 'test ! -e "$TRASH_DIRECTORY/socks.pid" || kill "$(cat "$TRASH_DIRECTORY/socks.pid")"'
+
+test_expect_success SOCKS_PROXY 'clone via Unix socket' '
+	test_when_finished "rm -rf clone" &&
+	test_config_global http.proxy "socks4://localhost$PWD/%2530.sock" && {
+		{
+			GIT_TRACE_CURL=$PWD/trace git clone "$HTTPD_URL/smart/repo.git" clone 2>err &&
+			grep -i "SOCKS4 request granted." trace
+		} ||
+		grep "^fatal: libcurl 7\.84 or later" err
+	}
+'
+
+test_expect_success 'Unix socket requires socks*:' '
+	! git clone -c http.proxy=localhost/path https://example.com/repo.git 2>err && {
+		grep "^fatal: Invalid proxy URL '\''localhost/path'\'': only SOCKS proxies support paths" err ||
+		grep "^fatal: libcurl 7\.84 or later" err
+	}
+'
+
+test_expect_success 'Unix socket requires localhost' '
+	! git clone -c http.proxy=socks4://127.0.0.1/path https://example.com/repo.git 2>err && {
+		grep "^fatal: Invalid proxy URL '\''socks4://127\.0\.0\.1/path'\'': host must be localhost if a path is present" err ||
+		grep "^fatal: libcurl 7\.84 or later" err
+	}
+'
+
 test_done