diff mbox series

remote-curl: die on server-side errors

Message ID 9a89e54e79593f6455b52e01d802695362f4ec21.1542062657.git.steadmon@google.com
State New, archived
Headers show
Series remote-curl: die on server-side errors | expand

Commit Message

Josh Steadmon Nov. 12, 2018, 10:44 p.m. UTC
When a smart HTTP server sends an error message via pkt-line,
remote-curl will fail to detect the error (which usually results in
incorrectly falling back to dumb-HTTP mode).

This patch adds a check in discover_refs() for server-side error
messages, as well as a test case for this issue.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 remote-curl.c                   | 4 +++-
 t/lib-httpd.sh                  | 1 +
 t/lib-httpd/apache.conf         | 4 ++++
 t/lib-httpd/error-smart-http.sh | 3 +++
 t/t5551-http-fetch-smart.sh     | 5 +++++
 5 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 t/lib-httpd/error-smart-http.sh

Comments

Stefan Beller Nov. 12, 2018, 10:55 p.m. UTC | #1
On Mon, Nov 12, 2018 at 2:45 PM <steadmon@google.com> wrote:
>
> When a smart HTTP server sends an error message via pkt-line,
> remote-curl will fail to detect the error (which usually results in
> incorrectly falling back to dumb-HTTP mode).
>
> This patch adds a check in discover_refs() for server-side error
> messages, as well as a test case for this issue.
>
> Signed-off-by: Josh Steadmon <steadmon@google.com>

Reviewed-by: Stefan Beller <sbeller@google.com>
Junio C Hamano Nov. 13, 2018, 2:52 a.m. UTC | #2
steadmon@google.com writes:

> When a smart HTTP server sends an error message via pkt-line,
> remote-curl will fail to detect the error (which usually results in
> incorrectly falling back to dumb-HTTP mode).
>
> This patch adds a check in discover_refs() for server-side error
> messages, as well as a test case for this issue.
>
> Signed-off-by: Josh Steadmon <steadmon@google.com>
> ---
>  remote-curl.c                   | 4 +++-
>  t/lib-httpd.sh                  | 1 +
>  t/lib-httpd/apache.conf         | 4 ++++
>  t/lib-httpd/error-smart-http.sh | 3 +++
>  t/t5551-http-fetch-smart.sh     | 5 +++++
>  5 files changed, 16 insertions(+), 1 deletion(-)
>  create mode 100644 t/lib-httpd/error-smart-http.sh
>
> diff --git a/remote-curl.c b/remote-curl.c
> index 762a55a75f..bb3a86505e 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -436,7 +436,9 @@ static struct discovery *discover_refs(const char *service, int for_push)
>  	} else if (maybe_smart &&
>  		   last->len > 5 && starts_with(last->buf + 4, "version 2")) {
>  		last->proto_git = 1;
> -	}
> +	} else if (maybe_smart && last->len > 5 &&
> +		   starts_with(last->buf + 4, "ERR "))
> +		die(_("remote error: %s"), last->buf + 8);

Two observations and a half.

 - All of these if/else if/ blocks (currently 2, now you added the
   third one) are to special case the "maybe_smart" case.  Perhaps
   the whole thing should be restrucutred to

	if (maybe_smart) {
		if ...
		else if ...
		else if ...
	}

 - All conditions skip the first four bytes in last->buf and does
   starts_with (the first branch is "starts_with("#") in disguise).
   Can we do something to the hardcoded magic number 4, which looks
   somewhat irritating?

 - This is less of the problem with the suggested change in the
   first bullet item above, but the current code structure makes
   readers wonder if maybe_start that starts as 1 is turned off
   somewhere in the if/else if/ cascade when we find out that we are
   not dealing with smart HTTP after all.  That however is not what
   is happening.  The "maybe" variable is primarily there to avoid
   repeating the logic that determined its initial value in the
   early part of the function.  The last->proto_git field is used to
   record if we are using smart HTTP or not.

Otherwise the change looks sensible.


> diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
> index a8729f8232..4e946623bb 100644
> --- a/t/lib-httpd.sh
> +++ b/t/lib-httpd.sh
> @@ -131,6 +131,7 @@ prepare_httpd() {
>  	mkdir -p "$HTTPD_DOCUMENT_ROOT_PATH"
>  	cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH"
>  	install_script broken-smart-http.sh
> +	install_script error-smart-http.sh
>  	install_script error.sh
>  	install_script apply-one-time-sed.sh
>  
> diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
> index 581c010d8f..6de2bc775c 100644
> --- a/t/lib-httpd/apache.conf
> +++ b/t/lib-httpd/apache.conf
> @@ -117,6 +117,7 @@ Alias /auth/dumb/ www/auth/dumb/
>  </LocationMatch>
>  ScriptAliasMatch /smart_*[^/]*/(.*) ${GIT_EXEC_PATH}/git-http-backend/$1
>  ScriptAlias /broken_smart/ broken-smart-http.sh/
> +ScriptAlias /error_smart/ error-smart-http.sh/
>  ScriptAlias /error/ error.sh/
>  ScriptAliasMatch /one_time_sed/(.*) apply-one-time-sed.sh/$1
>  <Directory ${GIT_EXEC_PATH}>
> @@ -125,6 +126,9 @@ ScriptAliasMatch /one_time_sed/(.*) apply-one-time-sed.sh/$1
>  <Files broken-smart-http.sh>
>  	Options ExecCGI
>  </Files>
> +<Files error-smart-http.sh>
> +	Options ExecCGI
> +</Files>
>  <Files error.sh>
>    Options ExecCGI
>  </Files>
> diff --git a/t/lib-httpd/error-smart-http.sh b/t/lib-httpd/error-smart-http.sh
> new file mode 100644
> index 0000000000..0a1af318f7
> --- /dev/null
> +++ b/t/lib-httpd/error-smart-http.sh
> @@ -0,0 +1,3 @@
> +printf "Content-Type: text/%s\n" "html"
> +echo
> +printf "%s" "0019ERR server-side error"
> diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
> index 8630b0cc39..ba83e567e5 100755
> --- a/t/t5551-http-fetch-smart.sh
> +++ b/t/t5551-http-fetch-smart.sh
> @@ -429,5 +429,10 @@ test_expect_success 'GIT_TRACE_CURL_NO_DATA prevents data from being traced' '
>  	! grep "=> Send data" err
>  '
>  
> +test_expect_success 'server-side error detected' '
> +	test_must_fail git clone $HTTPD_URL/error_smart/repo.git 2>actual &&
> +	grep "server-side error" actual
> +'
> +
>  stop_httpd
>  test_done
Junio C Hamano Nov. 13, 2018, 3:02 a.m. UTC | #3
steadmon@google.com writes:

> When a smart HTTP server sends an error message via pkt-line,
> remote-curl will fail to detect the error (which usually results in
> incorrectly falling back to dumb-HTTP mode).
>
> This patch adds a check in discover_refs() for server-side error
> messages, as well as a test case for this issue.
>
> Signed-off-by: Josh Steadmon <steadmon@google.com>
> ---

Forgot to mention one procedural comment.

As you can see in the To: line of this reply, your MUA is placing
only the e-mail address without name on your From: line.

Preferrably I'd like to see the same string as your sign-off on the
"From:" line in your messages for a bit of human touch ;-) Can you
tweak your MUA to make that happen?

The second preference is to add an in-body header (i.e. as the first
line of the body of the message) so that the body of the message
starts like this:

    From: Josh Steadmon <steadmon@google.com>

    When a smart HTTP server sends an error message via pkt-line,
    remote-curl will fail to detect the error (which usually results in
    incorrectly falling back to dumb-HTTP mode).

    This patch adds a check in discover_refs() for server-side error
    messages, as well as a test case for this issue.

    Signed-off-by: Josh Steadmon <steadmon@google.com>
    ---
     remote-curl.c                   | 4 +++-
     t/lib-httpd.sh                  | 1 +
     t/lib-httpd/apache.conf         | 4 ++++

Either way would make sure that the resulting patch's author line
will be attributed correctly to the same identity as who is signing
it off first as the author.

Thanks.
Jeff King Nov. 13, 2018, 2:26 p.m. UTC | #4
On Mon, Nov 12, 2018 at 02:44:56PM -0800, steadmon@google.com wrote:

> When a smart HTTP server sends an error message via pkt-line,
> remote-curl will fail to detect the error (which usually results in
> incorrectly falling back to dumb-HTTP mode).
> 
> This patch adds a check in discover_refs() for server-side error
> messages, as well as a test case for this issue.

Aside from the reformatting of the conditional that Junio mentioned,
this seems pretty good to me. But while looking at that, I found a few
things, some old and some new. :)

> diff --git a/remote-curl.c b/remote-curl.c
> index 762a55a75f..bb3a86505e 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -436,7 +436,9 @@ static struct discovery *discover_refs(const char *service, int for_push)
>  	} else if (maybe_smart &&
>  		   last->len > 5 && starts_with(last->buf + 4, "version 2")) {
>  		last->proto_git = 1;
> -	}
> +	} else if (maybe_smart && last->len > 5 &&
> +		   starts_with(last->buf + 4, "ERR "))
> +		die(_("remote error: %s"), last->buf + 8);

The magic "4" here and in the existing "version 2" check is because we
are expecting pkt-lines. The original conditional always calls
packed_read_line_buf() and will complain if we didn't actually get a
pkt-line.

Should we confirm that we got a real packet-line? Or at least that those
first four are even plausible hex chars?

I admit that it's pretty unlikely that the server is going to fool us
here. It would need something like "foo ERRORS ARE FUN!". And even then
we'd report an error (whereas the correct behavior is falling back to
dumb http, but we know that won't work anyway because that's not a valid
ref advertisement). So I doubt this is really a bug per se, but it might
make it easier to understand what's going on if we actually parsed the
packet.

Similarly, we seem eager to accept "version 2" even if we are only
expecting v0. I know you have another series working in that direction,
but I don't think it touches this "proto_git". I guess accepting
"version 2" as "we're speaking git protocol" and then barfing later with
"wait, I didn't mean to speak v2" is probably OK.

-Peff
Junio C Hamano Nov. 13, 2018, 2:30 p.m. UTC | #5
steadmon@google.com writes:

> When a smart HTTP server sends an error message via pkt-line,
> remote-curl will fail to detect the error (which usually results in
> incorrectly falling back to dumb-HTTP mode).

OK, that is a valid thing to worry about.

>
> This patch adds a check in discover_refs() for server-side error
> messages, as well as a test case for this issue.

This makes me wonder if discoer_refs() is the only place where we
ought to be checking for ERR packets but we are not.  Are there
other places that we also need a similar fix?
Josh Steadmon Nov. 13, 2018, 10:15 p.m. UTC | #6
On 2018.11.13 12:02, Junio C Hamano wrote:
> steadmon@google.com writes:
> 
> > When a smart HTTP server sends an error message via pkt-line,
> > remote-curl will fail to detect the error (which usually results in
> > incorrectly falling back to dumb-HTTP mode).
> >
> > This patch adds a check in discover_refs() for server-side error
> > messages, as well as a test case for this issue.
> >
> > Signed-off-by: Josh Steadmon <steadmon@google.com>
> > ---
> 
> Forgot to mention one procedural comment.
> 
> As you can see in the To: line of this reply, your MUA is placing
> only the e-mail address without name on your From: line.
> 
> Preferrably I'd like to see the same string as your sign-off on the
> "From:" line in your messages for a bit of human touch ;-) Can you
> tweak your MUA to make that happen?
> 
> The second preference is to add an in-body header (i.e. as the first
> line of the body of the message) so that the body of the message
> starts like this:
> 
>     From: Josh Steadmon <steadmon@google.com>
> 
>     When a smart HTTP server sends an error message via pkt-line,
>     remote-curl will fail to detect the error (which usually results in
>     incorrectly falling back to dumb-HTTP mode).
> 
>     This patch adds a check in discover_refs() for server-side error
>     messages, as well as a test case for this issue.
> 
>     Signed-off-by: Josh Steadmon <steadmon@google.com>
>     ---
>      remote-curl.c                   | 4 +++-
>      t/lib-httpd.sh                  | 1 +
>      t/lib-httpd/apache.conf         | 4 ++++
> 
> Either way would make sure that the resulting patch's author line
> will be attributed correctly to the same identity as who is signing
> it off first as the author.
> 
> Thanks.

This should be fixed for future patch submissions. Thanks for the
heads-up.
Josh Steadmon Nov. 13, 2018, 10:25 p.m. UTC | #7
On 2018.11.13 09:26, Jeff King wrote:
> On Mon, Nov 12, 2018 at 02:44:56PM -0800, steadmon@google.com wrote:
> 
> > When a smart HTTP server sends an error message via pkt-line,
> > remote-curl will fail to detect the error (which usually results in
> > incorrectly falling back to dumb-HTTP mode).
> > 
> > This patch adds a check in discover_refs() for server-side error
> > messages, as well as a test case for this issue.
> 
> Aside from the reformatting of the conditional that Junio mentioned,
> this seems pretty good to me. But while looking at that, I found a few
> things, some old and some new. :)
> 
> > diff --git a/remote-curl.c b/remote-curl.c
> > index 762a55a75f..bb3a86505e 100644
> > --- a/remote-curl.c
> > +++ b/remote-curl.c
> > @@ -436,7 +436,9 @@ static struct discovery *discover_refs(const char *service, int for_push)
> >  	} else if (maybe_smart &&
> >  		   last->len > 5 && starts_with(last->buf + 4, "version 2")) {
> >  		last->proto_git = 1;
> > -	}
> > +	} else if (maybe_smart && last->len > 5 &&
> > +		   starts_with(last->buf + 4, "ERR "))
> > +		die(_("remote error: %s"), last->buf + 8);
> 
> The magic "4" here and in the existing "version 2" check is because we
> are expecting pkt-lines. The original conditional always calls
> packed_read_line_buf() and will complain if we didn't actually get a
> pkt-line.
> 
> Should we confirm that we got a real packet-line? Or at least that those
> first four are even plausible hex chars?
> 
> I admit that it's pretty unlikely that the server is going to fool us
> here. It would need something like "foo ERRORS ARE FUN!". And even then
> we'd report an error (whereas the correct behavior is falling back to
> dumb http, but we know that won't work anyway because that's not a valid
> ref advertisement). So I doubt this is really a bug per se, but it might
> make it easier to understand what's going on if we actually parsed the
> packet.

Unfortunately we can't just directly parse the data in last->buf,
because other parts of the code are expecting to see the raw pkt-line
data there. I tried adding a duplicate pointer and length variable for
this data and parsing that through packet_read_line_buf(), but even
without using the results this apparently has side-effects that break
all of t5550 (and probably other tests as well). It also fails if I
completely duplicate last->buf into a new char* and call
packet_readline_buf() on that, so there's clearly some interaction in
the pkt-line guts that I'm not properly accounting for.

> Similarly, we seem eager to accept "version 2" even if we are only
> expecting v0. I know you have another series working in that direction,
> but I don't think it touches this "proto_git". I guess accepting
> "version 2" as "we're speaking git protocol" and then barfing later with
> "wait, I didn't mean to speak v2" is probably OK.
> 
> -Peff
Josh Steadmon Nov. 13, 2018, 10:28 p.m. UTC | #8
On 2018.11.13 23:30, Junio C Hamano wrote:
> steadmon@google.com writes:
> 
> > When a smart HTTP server sends an error message via pkt-line,
> > remote-curl will fail to detect the error (which usually results in
> > incorrectly falling back to dumb-HTTP mode).
> 
> OK, that is a valid thing to worry about.
> 
> >
> > This patch adds a check in discover_refs() for server-side error
> > messages, as well as a test case for this issue.
> 
> This makes me wonder if discoer_refs() is the only place where we
> ought to be checking for ERR packets but we are not.  Are there
> other places that we also need a similar fix?

Quite possibly; I'll review the other client code to see if there are
similar issues before sending v2.
Jeff King Nov. 14, 2018, 12:49 a.m. UTC | #9
On Tue, Nov 13, 2018 at 02:25:40PM -0800, Josh Steadmon wrote:

> > The magic "4" here and in the existing "version 2" check is because we
> > are expecting pkt-lines. The original conditional always calls
> > packed_read_line_buf() and will complain if we didn't actually get a
> > pkt-line.
> > 
> > Should we confirm that we got a real packet-line? Or at least that those
> > first four are even plausible hex chars?
> > 
> > I admit that it's pretty unlikely that the server is going to fool us
> > here. It would need something like "foo ERRORS ARE FUN!". And even then
> > we'd report an error (whereas the correct behavior is falling back to
> > dumb http, but we know that won't work anyway because that's not a valid
> > ref advertisement). So I doubt this is really a bug per se, but it might
> > make it easier to understand what's going on if we actually parsed the
> > packet.
> 
> Unfortunately we can't just directly parse the data in last->buf,
> because other parts of the code are expecting to see the raw pkt-line
> data there. I tried adding a duplicate pointer and length variable for
> this data and parsing that through packet_read_line_buf(), but even
> without using the results this apparently has side-effects that break
> all of t5550 (and probably other tests as well). It also fails if I
> completely duplicate last->buf into a new char* and call
> packet_readline_buf() on that, so there's clearly some interaction in
> the pkt-line guts that I'm not properly accounting for.

Yes, the packet_read_line_buf() interface will both advance the buf
pointer and decrement the length.  So if we want to "peek", we have to
do so with a copy (there's a peek function if you use the packet_reader
interface, but that might be overkill here).

You can rewrite it like this, which is a pretty faithful conversion and
passes the tests (but see below).

diff --git a/remote-curl.c b/remote-curl.c
index 762a55a75f..66c57c9d62 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -330,9 +330,78 @@ static int get_protocol_http_header(enum protocol_version version,
 	return 0;
 }
 
+static void check_smart_http(struct discovery *d, const char *service,
+			     struct strbuf *type)
+{
+	char *src_buf;
+	size_t src_len;
+	char pkt[LARGE_PACKET_MAX];
+	int pkt_len;
+	enum packet_read_status r;
+
+	/*
+	 * We speculatively try to read a packet, which means we must preserve
+	 * the original buf/len pair in some cases.
+	 */
+	src_buf = d->buf;
+	src_len = d->len;
+	r = packet_read_with_status(-1, &src_buf, &src_len,
+				    pkt, sizeof(pkt), &pkt_len,
+				    PACKET_READ_GENTLE |
+				    PACKET_READ_CHOMP_NEWLINE);
+
+	/*
+	 * This could probably just be handled as "not smart" like all the
+	 * other pkt-line error cases, but traditionally we've complained
+	 * about it (technically we used to do so only when we got the right
+	 * content-type, but it probably doesn't matter).
+	 */
+	if (r == PACKET_READ_FLUSH)
+		die("invalid server response; expected service, got flush packet");
+	if (r != PACKET_READ_NORMAL)
+		return; /* not smart */
+
+	if (pkt[0] == '#') {
+		/* v0 smart http */
+		struct strbuf exp = STRBUF_INIT;
+
+		strbuf_addf(&exp, "application/x-%s-advertisement", service);
+		if (strbuf_cmp(&exp, type)) {
+			strbuf_release(&exp);
+			return;
+		}
+
+		strbuf_reset(&exp);
+		strbuf_addf(&exp, "# service=%s", service);
+		if (strcmp(pkt, exp.buf))
+			die("invalid server response; got '%s'", pkt);
+
+		strbuf_release(&exp);
+
+		/*
+		 * The header can include additional metadata lines, up
+		 * until a packet flush marker.  Ignore these now, but
+		 * in the future we might start to scan them.
+		 */
+		while (packet_read_line_buf(&src_buf, &src_len, NULL))
+			;
+
+		d->buf = src_buf;
+		d->len = src_len;
+		d->proto_git = 1;
+
+	} else if (starts_with(pkt, "version 2")) {
+		/*
+		 * v2 smart http; do not consume version packet, which will
+		 * be handled elsewhere. Should we be checking the content-type
+		 * in this code-path, too?
+		 */
+		d->proto_git = 1;
+	}
+}
+
 static struct discovery *discover_refs(const char *service, int for_push)
 {
-	struct strbuf exp = STRBUF_INIT;
 	struct strbuf type = STRBUF_INIT;
 	struct strbuf charset = STRBUF_INIT;
 	struct strbuf buffer = STRBUF_INIT;
@@ -405,38 +474,8 @@ static struct discovery *discover_refs(const char *service, int for_push)
 	last->buf_alloc = strbuf_detach(&buffer, &last->len);
 	last->buf = last->buf_alloc;
 
-	strbuf_addf(&exp, "application/x-%s-advertisement", service);
-	if (maybe_smart &&
-	    (5 <= last->len && last->buf[4] == '#') &&
-	    !strbuf_cmp(&exp, &type)) {
-		char *line;
-
-		/*
-		 * smart HTTP response; validate that the service
-		 * pkt-line matches our request.
-		 */
-		line = packet_read_line_buf(&last->buf, &last->len, NULL);
-		if (!line)
-			die("invalid server response; expected service, got flush packet");
-
-		strbuf_reset(&exp);
-		strbuf_addf(&exp, "# service=%s", service);
-		if (strcmp(line, exp.buf))
-			die("invalid server response; got '%s'", line);
-		strbuf_release(&exp);
-
-		/* The header can include additional metadata lines, up
-		 * until a packet flush marker.  Ignore these now, but
-		 * in the future we might start to scan them.
-		 */
-		while (packet_read_line_buf(&last->buf, &last->len, NULL))
-			;
-
-		last->proto_git = 1;
-	} else if (maybe_smart &&
-		   last->len > 5 && starts_with(last->buf + 4, "version 2")) {
-		last->proto_git = 1;
-	}
+	if (maybe_smart)
+		check_smart_http(last, service, &type);
 
 	if (last->proto_git)
 		last->refs = parse_git_refs(last, for_push);
@@ -444,7 +483,6 @@ static struct discovery *discover_refs(const char *service, int for_push)
 		last->refs = parse_info_refs(last);
 
 	strbuf_release(&refs_url);
-	strbuf_release(&exp);
 	strbuf_release(&type);
 	strbuf_release(&charset);
 	strbuf_release(&effective_url);

So the few tricky things are:

  - the ordering with respect to checking the packet and the
    content-type is a little different here. Should v2 protocol be
    expecting that content-type, too? If so, then I think it would make
    sense to check that first, before even considering if there's a
    packet to read (and the current code is overly-permissive in that
    case).

  - there is no way to speculatively read a packet and not die() if the
    buffer doesn't have pkt-lines in it. So we'd additionally need the
    patch below (which annoyingly has to touch a bunch of switch
    statements to keep the compiler happy). If we _can_ check the
    content-type first, then all of that could go away (i.e., once we
    see the right content-type, we're confident enough in expecting
    packets to die() in the existing packet code-paths).

diff --git a/connect.c b/connect.c
index 24281b6082..1caafc3ce7 100644
--- a/connect.c
+++ b/connect.c
@@ -125,6 +125,8 @@ enum protocol_version discover_version(struct packet_reader *reader)
 	switch (packet_reader_peek(reader)) {
 	case PACKET_READ_EOF:
 		die_initial_contact(0);
+	case PACKET_READ_FORMAT_ERROR:
+		BUG("format error from non-gentle packet read");
 	case PACKET_READ_FLUSH:
 	case PACKET_READ_DELIM:
 		version = protocol_v0;
@@ -304,6 +306,8 @@ struct ref **get_remote_heads(struct packet_reader *reader,
 		switch (packet_reader_read(reader)) {
 		case PACKET_READ_EOF:
 			die_initial_contact(1);
+		case PACKET_READ_FORMAT_ERROR:
+			BUG("format error from non-gentle packet read");
 		case PACKET_READ_NORMAL:
 			len = reader->pktlen;
 			if (len > 4 && skip_prefix(reader->line, "ERR ", &arg))
diff --git a/pkt-line.c b/pkt-line.c
index 04d10bbd03..2daacc6188 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -293,7 +293,8 @@ static int get_packet_data(int fd, char **src_buf, size_t *src_size,
 
 	/* And complain if we didn't get enough bytes to satisfy the read. */
 	if (ret != size) {
-		if (options & PACKET_READ_GENTLE_ON_EOF)
+		if ((options & PACKET_READ_GENTLE_ON_EOF) ||
+		    (options & PACKET_READ_GENTLE))
 			return -1;
 
 		die(_("the remote end hung up unexpectedly"));
@@ -324,6 +325,10 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
 	len = packet_length(linelen);
 
 	if (len < 0) {
+		if (options & PACKET_READ_GENTLE) {
+			*pktlen = -1;
+			return PACKET_READ_FORMAT_ERROR;
+		}
 		die(_("protocol error: bad line length character: %.4s"), linelen);
 	} else if (!len) {
 		packet_trace("0000", 4, 0);
@@ -334,12 +339,21 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
 		*pktlen = 0;
 		return PACKET_READ_DELIM;
 	} else if (len < 4) {
+		if (options & PACKET_READ_GENTLE) {
+			*pktlen = -1;
+			return PACKET_READ_FORMAT_ERROR;
+		}
 		die(_("protocol error: bad line length %d"), len);
 	}
 
 	len -= 4;
-	if ((unsigned)len >= size)
+	if ((unsigned)len >= size) {
+		if (options & PACKET_READ_GENTLE) {
+			*pktlen = -1;
+			return PACKET_READ_FORMAT_ERROR;
+		}
 		die(_("protocol error: bad line length %d"), len);
+	}
 
 	if (get_packet_data(fd, src_buffer, src_len, buffer, len, options) < 0) {
 		*pktlen = -1;
diff --git a/pkt-line.h b/pkt-line.h
index 5b28d43472..7580f83cdc 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -44,7 +44,7 @@ int write_packetized_from_buf(const char *src_in, size_t len, int fd_out);
  * If src_buffer (or *src_buffer) is NULL, then data is read from the
  * descriptor "fd".
  *
- * If options does not contain PACKET_READ_GENTLE_ON_EOF, we will die under any
+ * If options does not contain PACKET_READ_GENTLE, we will die under any
  * of the following conditions:
  *
  *   1. Read error from descriptor.
@@ -56,6 +56,8 @@ int write_packetized_from_buf(const char *src_in, size_t len, int fd_out);
  *   4. Truncated output from the remote (e.g., we expected a packet but got
  *      EOF, or we got a partial packet followed by EOF).
  *
+ * If options does contain PACKET_READ_GENTLE, we'll instead return -1.
+ *
  * If options does contain PACKET_READ_GENTLE_ON_EOF, we will not die on
  * condition 4 (truncated input), but instead return -1. However, we will still
  * die for the other 3 conditions.
@@ -65,6 +67,7 @@ int write_packetized_from_buf(const char *src_in, size_t len, int fd_out);
  */
 #define PACKET_READ_GENTLE_ON_EOF (1u<<0)
 #define PACKET_READ_CHOMP_NEWLINE (1u<<1)
+#define PACKET_READ_GENTLE        (1u<<2)
 int packet_read(int fd, char **src_buffer, size_t *src_len, char
 		*buffer, unsigned size, int options);
 
@@ -79,6 +82,7 @@ enum packet_read_status {
 	PACKET_READ_NORMAL,
 	PACKET_READ_FLUSH,
 	PACKET_READ_DELIM,
+	PACKET_READ_FORMAT_ERROR,
 };
 enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
 						size_t *src_len, char *buffer,
diff --git a/remote-curl.c b/remote-curl.c
index 762a55a75f..66c57c9d62 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -1214,6 +1252,8 @@ static size_t proxy_in(char *buffer, size_t eltsize,
 		switch (packet_reader_read(&p->reader)) {
 		case PACKET_READ_EOF:
 			die("unexpected EOF when reading from parent process");
+		case PACKET_READ_FORMAT_ERROR:
+			BUG("format error from non-gentle packet read");
 		case PACKET_READ_NORMAL:
 			packet_buf_write_len(&p->request_buffer, p->reader.line,
 					     p->reader.pktlen);
diff --git a/serve.c b/serve.c
index bda085f09c..e88936262b 100644
--- a/serve.c
+++ b/serve.c
@@ -181,6 +181,8 @@ static int process_request(void)
 		switch (packet_reader_peek(&reader)) {
 		case PACKET_READ_EOF:
 			BUG("Should have already died when seeing EOF");
+		case PACKET_READ_FORMAT_ERROR:
+			BUG("format error from non-gentle packet read");
 		case PACKET_READ_NORMAL:
 			/* collect request; a sequence of keys and values */
 			if (is_command(reader.line, &command) ||
diff --git a/t/helper/test-pkt-line.c b/t/helper/test-pkt-line.c
index 282d536384..1522176a2f 100644
--- a/t/helper/test-pkt-line.c
+++ b/t/helper/test-pkt-line.c
@@ -37,6 +37,8 @@ static void unpack(void)
 		switch (reader.status) {
 		case PACKET_READ_EOF:
 			break;
+		case PACKET_READ_FORMAT_ERROR:
+			BUG("format error from non-gentle packet read");
 		case PACKET_READ_NORMAL:
 			printf("%s\n", reader.line);
 			break;
@@ -64,6 +66,8 @@ static void unpack_sideband(void)
 		switch (reader.status) {
 		case PACKET_READ_EOF:
 			break;
+		case PACKET_READ_FORMAT_ERROR:
+			BUG("format error from non-gentle packet read");
 		case PACKET_READ_NORMAL:
 			band = reader.line[0] & 0xff;
 			if (band < 1 || band > 2)
Jeff King Nov. 14, 2018, 7 a.m. UTC | #10
On Tue, Nov 13, 2018 at 07:49:15PM -0500, Jeff King wrote:

> Yes, the packet_read_line_buf() interface will both advance the buf
> pointer and decrement the length.  So if we want to "peek", we have to
> do so with a copy (there's a peek function if you use the packet_reader
> interface, but that might be overkill here).
> 
> You can rewrite it like this, which is a pretty faithful conversion and
> passes the tests (but see below).
> [...]

Here's a version which is less faithful, but I think does sensible
things in all cases, and is much easier to follow. I get a little
nervous just because it tightens some cases, and one never knows if
other implementations might be relying on the looseness. E.g.:

  - in the current code we'd still drop back to dumb http if the server
    tells us "application/x-git-upload-pack" but the initial pktline
    doesn't start with "#" (even though if it _does_ have "#" at
    position 5 but isn't a valid pktline, we'd complain!)

  - right now the v2 protocol does not require the server to say
    "application/x-git-upload-pack" for the content-type

This patch tightens both of those (I also made a few stylistic tweaks,
and added the ERR condition to show where it would go). I dunno. Part of
me sees this as a nice cleanup, but maybe it is better to just leave it
alone. A lot of these behaviors are just how it happens to work now, and
not part of the spec, but we don't know what might be relying on them.

diff --git a/remote-curl.c b/remote-curl.c
index 762a55a75f..1adb96311b 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -330,9 +330,61 @@ static int get_protocol_http_header(enum protocol_version version,
 	return 0;
 }
 
+static void check_smart_http(struct discovery *d, const char *service,
+			     struct strbuf *type)
+{
+	char *src_buf;
+	size_t src_len;
+	char *line;
+	const char *p;
+
+	if (!skip_prefix(type->buf, "application/x-", &p) ||
+	    !skip_prefix(p, service, &p) ||
+	    strcmp(p, "-advertisement"))
+		return;
+
+	/*
+	 * We speculatively try to read a packet, which means we must preserve
+	 * the original buf/len pair in some cases.
+	 */
+	src_buf = d->buf;
+	src_len = d->len;
+	line = packet_read_line_buf(&src_buf, &src_len, NULL);
+	if (!line)
+		die("invalid server response; expected service, got flush packet");
+
+	if (skip_prefix(line, "# service=", &p) && !strcmp(p, service)) {
+		/*
+		 * The header can include additional metadata lines, up
+		 * until a packet flush marker.  Ignore these now, but
+		 * in the future we might start to scan them.
+		 */
+		while (packet_read_line_buf(&src_buf, &src_len, NULL))
+			;
+
+		/*
+		 * v0 smart http; callers expect us to soak up the
+		 * service and header packets
+		 */
+		d->buf = src_buf;
+		d->len = src_len;
+		d->proto_git = 1;
+
+	} else if (starts_with(line, "version 2")) { /* should be strcmp()? */
+		/*
+		 * v2 smart http; do not consume version packet, which will
+		 * be handled elsewhere.
+		 */
+		d->proto_git = 1;
+	} else if (skip_prefix(line, "ERR ", &p)) {
+		die(_("remote error: %s"), p);
+	} else {
+		die("invalid server response; got '%s'", line);
+	}
+}
+
 static struct discovery *discover_refs(const char *service, int for_push)
 {
-	struct strbuf exp = STRBUF_INIT;
 	struct strbuf type = STRBUF_INIT;
 	struct strbuf charset = STRBUF_INIT;
 	struct strbuf buffer = STRBUF_INIT;
@@ -405,38 +457,8 @@ static struct discovery *discover_refs(const char *service, int for_push)
 	last->buf_alloc = strbuf_detach(&buffer, &last->len);
 	last->buf = last->buf_alloc;
 
-	strbuf_addf(&exp, "application/x-%s-advertisement", service);
-	if (maybe_smart &&
-	    (5 <= last->len && last->buf[4] == '#') &&
-	    !strbuf_cmp(&exp, &type)) {
-		char *line;
-
-		/*
-		 * smart HTTP response; validate that the service
-		 * pkt-line matches our request.
-		 */
-		line = packet_read_line_buf(&last->buf, &last->len, NULL);
-		if (!line)
-			die("invalid server response; expected service, got flush packet");
-
-		strbuf_reset(&exp);
-		strbuf_addf(&exp, "# service=%s", service);
-		if (strcmp(line, exp.buf))
-			die("invalid server response; got '%s'", line);
-		strbuf_release(&exp);
-
-		/* The header can include additional metadata lines, up
-		 * until a packet flush marker.  Ignore these now, but
-		 * in the future we might start to scan them.
-		 */
-		while (packet_read_line_buf(&last->buf, &last->len, NULL))
-			;
-
-		last->proto_git = 1;
-	} else if (maybe_smart &&
-		   last->len > 5 && starts_with(last->buf + 4, "version 2")) {
-		last->proto_git = 1;
-	}
+	if (maybe_smart)
+		check_smart_http(last, service, &type);
 
 	if (last->proto_git)
 		last->refs = parse_git_refs(last, for_push);
@@ -444,7 +466,6 @@ static struct discovery *discover_refs(const char *service, int for_push)
 		last->refs = parse_info_refs(last);
 
 	strbuf_release(&refs_url);
-	strbuf_release(&exp);
 	strbuf_release(&type);
 	strbuf_release(&charset);
 	strbuf_release(&effective_url);
Josh Steadmon Nov. 15, 2018, 9:51 p.m. UTC | #11
On 2018.11.14 02:00, Jeff King wrote:
> On Tue, Nov 13, 2018 at 07:49:15PM -0500, Jeff King wrote:
> 
> > Yes, the packet_read_line_buf() interface will both advance the buf
> > pointer and decrement the length.  So if we want to "peek", we have to
> > do so with a copy (there's a peek function if you use the packet_reader
> > interface, but that might be overkill here).
> > 
> > You can rewrite it like this, which is a pretty faithful conversion and
> > passes the tests (but see below).
> > [...]
> 
> Here's a version which is less faithful, but I think does sensible
> things in all cases, and is much easier to follow. I get a little
> nervous just because it tightens some cases, and one never knows if
> other implementations might be relying on the looseness. E.g.:
> 
>   - in the current code we'd still drop back to dumb http if the server
>     tells us "application/x-git-upload-pack" but the initial pktline
>     doesn't start with "#" (even though if it _does_ have "#" at
>     position 5 but isn't a valid pktline, we'd complain!)
> 
>   - right now the v2 protocol does not require the server to say
>     "application/x-git-upload-pack" for the content-type
> 
> This patch tightens both of those (I also made a few stylistic tweaks,
> and added the ERR condition to show where it would go). I dunno. Part of
> me sees this as a nice cleanup, but maybe it is better to just leave it
> alone. A lot of these behaviors are just how it happens to work now, and
> not part of the spec, but we don't know what might be relying on them.

At least according to the protocol-v2 and http-protocol docs, the
stricter behavior seems correct:

For the first point above, dumb servers should never use an
"application/x-git-*" content type (http-protocol.txt line 163-167).

For the second point, the docs require v2 servers to use
"application/x-git-*" content types. protocol-v2.txt lines 63-65 state
that v2 clients should make a smart http request, while
http-protocol.txt lines 247-252 state that a smart server's response
type must be "application/x-git-*".

Of course we don't know if other implementations follow the spec, but
ISTM that this patch at least doesn't contradict how we've promised the
protocols should work.

If no one has any objections, I'll include the diff below in v2. Thanks
for the help Jeff!

> diff --git a/remote-curl.c b/remote-curl.c
> index 762a55a75f..1adb96311b 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -330,9 +330,61 @@ static int get_protocol_http_header(enum protocol_version version,
>  	return 0;
>  }
>  
> +static void check_smart_http(struct discovery *d, const char *service,
> +			     struct strbuf *type)
> +{
> +	char *src_buf;
> +	size_t src_len;
> +	char *line;
> +	const char *p;
> +
> +	if (!skip_prefix(type->buf, "application/x-", &p) ||
> +	    !skip_prefix(p, service, &p) ||
> +	    strcmp(p, "-advertisement"))
> +		return;
> +
> +	/*
> +	 * We speculatively try to read a packet, which means we must preserve
> +	 * the original buf/len pair in some cases.
> +	 */
> +	src_buf = d->buf;
> +	src_len = d->len;
> +	line = packet_read_line_buf(&src_buf, &src_len, NULL);
> +	if (!line)
> +		die("invalid server response; expected service, got flush packet");
> +
> +	if (skip_prefix(line, "# service=", &p) && !strcmp(p, service)) {
> +		/*
> +		 * The header can include additional metadata lines, up
> +		 * until a packet flush marker.  Ignore these now, but
> +		 * in the future we might start to scan them.
> +		 */
> +		while (packet_read_line_buf(&src_buf, &src_len, NULL))
> +			;
> +
> +		/*
> +		 * v0 smart http; callers expect us to soak up the
> +		 * service and header packets
> +		 */
> +		d->buf = src_buf;
> +		d->len = src_len;
> +		d->proto_git = 1;
> +
> +	} else if (starts_with(line, "version 2")) { /* should be strcmp()? */
> +		/*
> +		 * v2 smart http; do not consume version packet, which will
> +		 * be handled elsewhere.
> +		 */
> +		d->proto_git = 1;
> +	} else if (skip_prefix(line, "ERR ", &p)) {
> +		die(_("remote error: %s"), p);
> +	} else {
> +		die("invalid server response; got '%s'", line);
> +	}
> +}
> +
>  static struct discovery *discover_refs(const char *service, int for_push)
>  {
> -	struct strbuf exp = STRBUF_INIT;
>  	struct strbuf type = STRBUF_INIT;
>  	struct strbuf charset = STRBUF_INIT;
>  	struct strbuf buffer = STRBUF_INIT;
> @@ -405,38 +457,8 @@ static struct discovery *discover_refs(const char *service, int for_push)
>  	last->buf_alloc = strbuf_detach(&buffer, &last->len);
>  	last->buf = last->buf_alloc;
>  
> -	strbuf_addf(&exp, "application/x-%s-advertisement", service);
> -	if (maybe_smart &&
> -	    (5 <= last->len && last->buf[4] == '#') &&
> -	    !strbuf_cmp(&exp, &type)) {
> -		char *line;
> -
> -		/*
> -		 * smart HTTP response; validate that the service
> -		 * pkt-line matches our request.
> -		 */
> -		line = packet_read_line_buf(&last->buf, &last->len, NULL);
> -		if (!line)
> -			die("invalid server response; expected service, got flush packet");
> -
> -		strbuf_reset(&exp);
> -		strbuf_addf(&exp, "# service=%s", service);
> -		if (strcmp(line, exp.buf))
> -			die("invalid server response; got '%s'", line);
> -		strbuf_release(&exp);
> -
> -		/* The header can include additional metadata lines, up
> -		 * until a packet flush marker.  Ignore these now, but
> -		 * in the future we might start to scan them.
> -		 */
> -		while (packet_read_line_buf(&last->buf, &last->len, NULL))
> -			;
> -
> -		last->proto_git = 1;
> -	} else if (maybe_smart &&
> -		   last->len > 5 && starts_with(last->buf + 4, "version 2")) {
> -		last->proto_git = 1;
> -	}
> +	if (maybe_smart)
> +		check_smart_http(last, service, &type);
>  
>  	if (last->proto_git)
>  		last->refs = parse_git_refs(last, for_push);
> @@ -444,7 +466,6 @@ static struct discovery *discover_refs(const char *service, int for_push)
>  		last->refs = parse_info_refs(last);
>  
>  	strbuf_release(&refs_url);
> -	strbuf_release(&exp);
>  	strbuf_release(&type);
>  	strbuf_release(&charset);
>  	strbuf_release(&effective_url);
diff mbox series

Patch

diff --git a/remote-curl.c b/remote-curl.c
index 762a55a75f..bb3a86505e 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -436,7 +436,9 @@  static struct discovery *discover_refs(const char *service, int for_push)
 	} else if (maybe_smart &&
 		   last->len > 5 && starts_with(last->buf + 4, "version 2")) {
 		last->proto_git = 1;
-	}
+	} else if (maybe_smart && last->len > 5 &&
+		   starts_with(last->buf + 4, "ERR "))
+		die(_("remote error: %s"), last->buf + 8);
 
 	if (last->proto_git)
 		last->refs = parse_git_refs(last, for_push);
diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index a8729f8232..4e946623bb 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -131,6 +131,7 @@  prepare_httpd() {
 	mkdir -p "$HTTPD_DOCUMENT_ROOT_PATH"
 	cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH"
 	install_script broken-smart-http.sh
+	install_script error-smart-http.sh
 	install_script error.sh
 	install_script apply-one-time-sed.sh
 
diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
index 581c010d8f..6de2bc775c 100644
--- a/t/lib-httpd/apache.conf
+++ b/t/lib-httpd/apache.conf
@@ -117,6 +117,7 @@  Alias /auth/dumb/ www/auth/dumb/
 </LocationMatch>
 ScriptAliasMatch /smart_*[^/]*/(.*) ${GIT_EXEC_PATH}/git-http-backend/$1
 ScriptAlias /broken_smart/ broken-smart-http.sh/
+ScriptAlias /error_smart/ error-smart-http.sh/
 ScriptAlias /error/ error.sh/
 ScriptAliasMatch /one_time_sed/(.*) apply-one-time-sed.sh/$1
 <Directory ${GIT_EXEC_PATH}>
@@ -125,6 +126,9 @@  ScriptAliasMatch /one_time_sed/(.*) apply-one-time-sed.sh/$1
 <Files broken-smart-http.sh>
 	Options ExecCGI
 </Files>
+<Files error-smart-http.sh>
+	Options ExecCGI
+</Files>
 <Files error.sh>
   Options ExecCGI
 </Files>
diff --git a/t/lib-httpd/error-smart-http.sh b/t/lib-httpd/error-smart-http.sh
new file mode 100644
index 0000000000..0a1af318f7
--- /dev/null
+++ b/t/lib-httpd/error-smart-http.sh
@@ -0,0 +1,3 @@ 
+printf "Content-Type: text/%s\n" "html"
+echo
+printf "%s" "0019ERR server-side error"
diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index 8630b0cc39..ba83e567e5 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -429,5 +429,10 @@  test_expect_success 'GIT_TRACE_CURL_NO_DATA prevents data from being traced' '
 	! grep "=> Send data" err
 '
 
+test_expect_success 'server-side error detected' '
+	test_must_fail git clone $HTTPD_URL/error_smart/repo.git 2>actual &&
+	grep "server-side error" actual
+'
+
 stop_httpd
 test_done