diff mbox series

[1/3] remote-curl: refactor smart-http discovery

Message ID 20181116084725.GA31603@sigill.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series remote-curl smart-http discovery cleanup | expand

Commit Message

Jeff King Nov. 16, 2018, 8:47 a.m. UTC
After making initial contact with an http server, we have to decide if
the server supports smart-http, and if so, which version. Our rules are
a bit inconsistent:

  1. For v0, we require that the content-type indicates a smart-http
     response. We also require the response to look vaguely like a
     pkt-line starting with "#". If one of those does not match, we fall
     back to dumb-http.

     But according to our http protocol spec[1]:

       Dumb servers MUST NOT return a return type starting with
       `application/x-git-`.

     If we see the expected content-type, we should consider it
     smart-http. At that point we can parse the pkt-line for real, and
     complain if it is not syntactically valid.

  2. For v2, we do not actually check the content-type. Our v2 protocol
     spec says[2]:

       When using the http:// or https:// transport a client makes a
       "smart" info/refs request as described in `http-protocol.txt`[...]

     and the http spec is clear that for a smart-http[3]:

       The Content-Type MUST be `application/x-$servicename-advertisement`.

     So it is required according to the spec.

These inconsistencies were easy to miss because of the way the original
code was written as an inline conditional. Let's pull it out into its
own function for readability, and improve a few things:

 - we now predicate the smart/dumb decision entirely on the presence of
   the correct content-type

 - we do a real pkt-line parse before deciding how to proceed (and die
   if it isn't valid)

 - use skip_prefix() for comparing service strings, instead of
   constructing expected output in a strbuf; this avoids dealing with
   memory cleanup

Note that this _is_ tightening what the client will allow. It's all
according to the spec, but it's possible that other implementations
might violate these. However, violating these particular rules seems
like an odd choice for a server to make.

[1] Documentation/technical/http-protocol.txt, l. 166-167
[2] Documentation/technical/protocol-v2.txt, l. 63-64
[3] Documentation/technical/http-protocol.txt, l. 247

Helped-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 remote-curl.c | 93 ++++++++++++++++++++++++++++++++-------------------
 1 file changed, 59 insertions(+), 34 deletions(-)

Comments

Josh Steadmon Nov. 16, 2018, 8:27 p.m. UTC | #1
On 2018.11.16 03:47, Jeff King wrote:
> After making initial contact with an http server, we have to decide if
> the server supports smart-http, and if so, which version. Our rules are
> a bit inconsistent:
> 
>   1. For v0, we require that the content-type indicates a smart-http
>      response. We also require the response to look vaguely like a
>      pkt-line starting with "#". If one of those does not match, we fall
>      back to dumb-http.
> 
>      But according to our http protocol spec[1]:
> 
>        Dumb servers MUST NOT return a return type starting with
>        `application/x-git-`.
> 
>      If we see the expected content-type, we should consider it
>      smart-http. At that point we can parse the pkt-line for real, and
>      complain if it is not syntactically valid.
> 
>   2. For v2, we do not actually check the content-type. Our v2 protocol
>      spec says[2]:
> 
>        When using the http:// or https:// transport a client makes a
>        "smart" info/refs request as described in `http-protocol.txt`[...]
> 
>      and the http spec is clear that for a smart-http[3]:
> 
>        The Content-Type MUST be `application/x-$servicename-advertisement`.
> 
>      So it is required according to the spec.
> 
> These inconsistencies were easy to miss because of the way the original
> code was written as an inline conditional. Let's pull it out into its
> own function for readability, and improve a few things:
> 
>  - we now predicate the smart/dumb decision entirely on the presence of
>    the correct content-type
> 
>  - we do a real pkt-line parse before deciding how to proceed (and die
>    if it isn't valid)
> 
>  - use skip_prefix() for comparing service strings, instead of
>    constructing expected output in a strbuf; this avoids dealing with
>    memory cleanup
> 
> Note that this _is_ tightening what the client will allow. It's all
> according to the spec, but it's possible that other implementations
> might violate these. However, violating these particular rules seems
> like an odd choice for a server to make.
> 
> [1] Documentation/technical/http-protocol.txt, l. 166-167
> [2] Documentation/technical/protocol-v2.txt, l. 63-64
> [3] Documentation/technical/http-protocol.txt, l. 247
> 
> Helped-by: Josh Steadmon <steadmon@google.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  remote-curl.c | 93 ++++++++++++++++++++++++++++++++-------------------
>  1 file changed, 59 insertions(+), 34 deletions(-)
> 
> diff --git a/remote-curl.c b/remote-curl.c
> index 762a55a75f..dd9bc41aa1 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -330,9 +330,65 @@ static int get_protocol_http_header(enum protocol_version version,
>  	return 0;
>  }
>  
> +static void check_smart_http(struct discovery *d, const char *service,
> +			     struct strbuf *type)
> +{
> +	char *src_buf;
> +	size_t src_len;
> +	char *line;
> +	const char *p;
> +
> +	/*
> +	 * If we don't see x-$service-advertisement, then it's not smart-http.
> +	 * But once we do, we commit to it and assume any other protocol
> +	 * violations are hard errors.
> +	 */
> +	if (!skip_prefix(type->buf, "application/x-", &p) ||
> +	    !skip_prefix(p, service, &p) ||
> +	    strcmp(p, "-advertisement"))
> +		return;
> +
> +	/*
> +	 * "Peek" at the first packet by using a separate buf/len pair; some
> +	 * cases below require us leaving the originals intact.
> +	 */
> +	src_buf = d->buf;
> +	src_len = d->len;
> +	line = packet_read_line_buf(&src_buf, &src_len, NULL);
> +	if (!line)
> +		die("invalid server response; expected service, got flush packet");
> +
> +	if (skip_prefix(line, "# service=", &p) && !strcmp(p, service)) {
> +		/*
> +		 * The header can include additional metadata lines, up
> +		 * until a packet flush marker.  Ignore these now, but
> +		 * in the future we might start to scan them.
> +		 */
> +		while (packet_read_line_buf(&src_buf, &src_len, NULL))
> +			;
> +
> +		/*
> +		 * v0 smart http; callers expect us to soak up the
> +		 * service and header packets
> +		 */
> +		d->buf = src_buf;
> +		d->len = src_len;
> +		d->proto_git = 1;
> +
> +	} else if (starts_with(line, "version 2")) {
> +		/*
> +		 * v2 smart http; do not consume version packet, which will
> +		 * be handled elsewhere.
> +		 */
> +		d->proto_git = 1;
> +
> +	} else {
> +		die("invalid server response; got '%s'", line);
> +	}
> +}
> +
>  static struct discovery *discover_refs(const char *service, int for_push)
>  {
> -	struct strbuf exp = STRBUF_INIT;
>  	struct strbuf type = STRBUF_INIT;
>  	struct strbuf charset = STRBUF_INIT;
>  	struct strbuf buffer = STRBUF_INIT;
> @@ -405,38 +461,8 @@ static struct discovery *discover_refs(const char *service, int for_push)
>  	last->buf_alloc = strbuf_detach(&buffer, &last->len);
>  	last->buf = last->buf_alloc;
>  
> -	strbuf_addf(&exp, "application/x-%s-advertisement", service);
> -	if (maybe_smart &&
> -	    (5 <= last->len && last->buf[4] == '#') &&
> -	    !strbuf_cmp(&exp, &type)) {
> -		char *line;
> -
> -		/*
> -		 * smart HTTP response; validate that the service
> -		 * pkt-line matches our request.
> -		 */
> -		line = packet_read_line_buf(&last->buf, &last->len, NULL);
> -		if (!line)
> -			die("invalid server response; expected service, got flush packet");
> -
> -		strbuf_reset(&exp);
> -		strbuf_addf(&exp, "# service=%s", service);
> -		if (strcmp(line, exp.buf))
> -			die("invalid server response; got '%s'", line);
> -		strbuf_release(&exp);
> -
> -		/* The header can include additional metadata lines, up
> -		 * until a packet flush marker.  Ignore these now, but
> -		 * in the future we might start to scan them.
> -		 */
> -		while (packet_read_line_buf(&last->buf, &last->len, NULL))
> -			;
> -
> -		last->proto_git = 1;
> -	} else if (maybe_smart &&
> -		   last->len > 5 && starts_with(last->buf + 4, "version 2")) {
> -		last->proto_git = 1;
> -	}
> +	if (maybe_smart)
> +		check_smart_http(last, service, &type);
>  
>  	if (last->proto_git)
>  		last->refs = parse_git_refs(last, for_push);
> @@ -444,7 +470,6 @@ static struct discovery *discover_refs(const char *service, int for_push)
>  		last->refs = parse_info_refs(last);
>  
>  	strbuf_release(&refs_url);
> -	strbuf_release(&exp);
>  	strbuf_release(&type);
>  	strbuf_release(&charset);
>  	strbuf_release(&effective_url);
> -- 
> 2.19.1.1636.gc7a073d580
> 

Looks good to me.

Reviewed-by: Josh Steadmon <steadmon@google.com>
Junio C Hamano Feb. 5, 2019, 11:29 p.m. UTC | #2
Jeff King <peff@peff.net> writes:

> After making initial contact with an http server, we have to decide if
> the server supports smart-http, and if so, which version. Our rules are
> a bit inconsistent:
> ...
>
>  - we now predicate the smart/dumb decision entirely on the presence of
>    the correct content-type
>
>  - we do a real pkt-line parse before deciding how to proceed (and die
>    if it isn't valid)
>
>  - use skip_prefix() for comparing service strings, instead of
>    constructing expected output in a strbuf; this avoids dealing with
>    memory cleanup
>
> Note that this _is_ tightening what the client will allow. It's all
> according to the spec, but it's possible that other implementations
> might violate these. However, violating these particular rules seems
> like an odd choice for a server to make.
>
> [1] Documentation/technical/http-protocol.txt, l. 166-167
> [2] Documentation/technical/protocol-v2.txt, l. 63-64
> [3] Documentation/technical/http-protocol.txt, l. 247
>
> Helped-by: Josh Steadmon <steadmon@google.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  remote-curl.c | 93 ++++++++++++++++++++++++++++++++-------------------
>  1 file changed, 59 insertions(+), 34 deletions(-)

It turns out that this has interactions with 01f9ec64 ("Use
packet_reader instead of packet_read_line", 2018-12-29) on the
ms/packet-err-check branch.  Can we get this rebased on top of
a more recent 'master'?

Thanks.
Jeff King Feb. 6, 2019, 7:16 p.m. UTC | #3
On Tue, Feb 05, 2019 at 03:29:18PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > After making initial contact with an http server, we have to decide if
> > the server supports smart-http, and if so, which version. Our rules are
> > a bit inconsistent:
> > ...
> >
> >  - we now predicate the smart/dumb decision entirely on the presence of
> >    the correct content-type
> >
> >  - we do a real pkt-line parse before deciding how to proceed (and die
> >    if it isn't valid)
> >
> >  - use skip_prefix() for comparing service strings, instead of
> >    constructing expected output in a strbuf; this avoids dealing with
> >    memory cleanup
> >
> > Note that this _is_ tightening what the client will allow. It's all
> > according to the spec, but it's possible that other implementations
> > might violate these. However, violating these particular rules seems
> > like an odd choice for a server to make.
> >
> > [1] Documentation/technical/http-protocol.txt, l. 166-167
> > [2] Documentation/technical/protocol-v2.txt, l. 63-64
> > [3] Documentation/technical/http-protocol.txt, l. 247
> >
> > Helped-by: Josh Steadmon <steadmon@google.com>
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> >  remote-curl.c | 93 ++++++++++++++++++++++++++++++++-------------------
> >  1 file changed, 59 insertions(+), 34 deletions(-)
> 
> It turns out that this has interactions with 01f9ec64 ("Use
> packet_reader instead of packet_read_line", 2018-12-29) on the
> ms/packet-err-check branch.  Can we get this rebased on top of
> a more recent 'master'?

Yep. Here it is.

Rather than a range-diff, which is quite large due to the code movement,
I'll include below the interesting hunk of a diff between the two
endpoints (i.e., what we would have seen applying the packet-err-check
changes on top of my code movement, which is more or less what I did to
generate it).

Josh's original 3/3 isn't needed anymore, since ms/packet-err-check
covers that case already. However, he did write tests, which
ms/packet-err-check does not have. So I've converted his final patch
into just a test addition.

  [1/3]: remote-curl: refactor smart-http discovery
  [2/3]: remote-curl: tighten "version 2" check for smart-http
  [3/3]: t5551: test server-side ERR packet

 remote-curl.c                   | 100 ++++++++++++++++++--------------
 t/lib-httpd.sh                  |   1 +
 t/lib-httpd/apache.conf         |   4 ++
 t/lib-httpd/error-smart-http.sh |   3 +
 t/t5551-http-fetch-smart.sh     |   5 ++
 5 files changed, 70 insertions(+), 43 deletions(-)
 create mode 100644 t/lib-httpd/error-smart-http.sh

Here's that hunk:

diff --git a/remote-curl.c b/remote-curl.c
index b1309f2bdc..bb7421023b 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -333,10 +334,8 @@ static int get_protocol_http_header(enum protocol_version version,
 static void check_smart_http(struct discovery *d, const char *service,
 			     struct strbuf *type)
 {
-	char *src_buf;
-	size_t src_len;
-	char *line;
 	const char *p;
+	struct packet_reader reader;
 
 	/*
 	 * If we don't see x-$service-advertisement, then it's not smart-http.
@@ -348,45 +347,42 @@ static void check_smart_http(struct discovery *d, const char *service,
 	    strcmp(p, "-advertisement"))
 		return;
 
-	/*
-	 * "Peek" at the first packet by using a separate buf/len pair; some
-	 * cases below require us leaving the originals intact.
-	 */
-	src_buf = d->buf;
-	src_len = d->len;
-	line = packet_read_line_buf(&src_buf, &src_len, NULL);
-	if (!line)
+	packet_reader_init(&reader, -1, d->buf, d->len,
+			   PACKET_READ_CHOMP_NEWLINE |
+			   PACKET_READ_DIE_ON_ERR_PACKET);
+	if (packet_reader_read(&reader) != PACKET_READ_NORMAL)
 		die("invalid server response; expected service, got flush packet");
 
-	if (skip_prefix(line, "# service=", &p) && !strcmp(p, service)) {
+	if (skip_prefix(reader.line, "# service=", &p) && !strcmp(p, service)) {
 		/*
 		 * The header can include additional metadata lines, up
 		 * until a packet flush marker.  Ignore these now, but
 		 * in the future we might start to scan them.
 		 */
-		while (packet_read_line_buf(&src_buf, &src_len, NULL))
-			;
+		for (;;) {
+			packet_reader_read(&reader);
+			if (reader.pktlen <= 0) {
+				break;
+			}
+		}
 
 		/*
 		 * v0 smart http; callers expect us to soak up the
 		 * service and header packets
 		 */
-		d->buf = src_buf;
-		d->len = src_len;
+		d->buf = reader.src_buffer;
+		d->len = reader.src_len;
 		d->proto_git = 1;
 
-	} else if (!strcmp(line, "version 2")) {
+	} else if (!strcmp(reader.line, "version 2")) {
 		/*
 		 * v2 smart http; do not consume version packet, which will
 		 * be handled elsewhere.
 		 */
 		d->proto_git = 1;
 
-	} else if (skip_prefix(line, "ERR ", &p)) {
-		die(_("remote error: %s"), p);
-
 	} else {
-		die("invalid server response; got '%s'", line);
+		die("invalid server response; got '%s'", reader.line);
 	}
 }
Junio C Hamano Feb. 6, 2019, 10:19 p.m. UTC | #4
Jeff King <peff@peff.net> writes:

> Yep. Here it is.
>
> Rather than a range-diff, which is quite large due to the code movement,
> I'll include below the interesting hunk of a diff between the two
> endpoints (i.e., what we would have seen applying the packet-err-check
> changes on top of my code movement, which is more or less what I did to
> generate it).

Thanks.  The hunks shown below are to use packet_reader interface in
check_smart_http(), which is exactly what we want to see in this
step, and what I was too lazy to recreate as a part of conflict
resolution ;-)

Thanks, queued.
diff mbox series

Patch

diff --git a/remote-curl.c b/remote-curl.c
index 762a55a75f..dd9bc41aa1 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -330,9 +330,65 @@  static int get_protocol_http_header(enum protocol_version version,
 	return 0;
 }
 
+static void check_smart_http(struct discovery *d, const char *service,
+			     struct strbuf *type)
+{
+	char *src_buf;
+	size_t src_len;
+	char *line;
+	const char *p;
+
+	/*
+	 * If we don't see x-$service-advertisement, then it's not smart-http.
+	 * But once we do, we commit to it and assume any other protocol
+	 * violations are hard errors.
+	 */
+	if (!skip_prefix(type->buf, "application/x-", &p) ||
+	    !skip_prefix(p, service, &p) ||
+	    strcmp(p, "-advertisement"))
+		return;
+
+	/*
+	 * "Peek" at the first packet by using a separate buf/len pair; some
+	 * cases below require us leaving the originals intact.
+	 */
+	src_buf = d->buf;
+	src_len = d->len;
+	line = packet_read_line_buf(&src_buf, &src_len, NULL);
+	if (!line)
+		die("invalid server response; expected service, got flush packet");
+
+	if (skip_prefix(line, "# service=", &p) && !strcmp(p, service)) {
+		/*
+		 * The header can include additional metadata lines, up
+		 * until a packet flush marker.  Ignore these now, but
+		 * in the future we might start to scan them.
+		 */
+		while (packet_read_line_buf(&src_buf, &src_len, NULL))
+			;
+
+		/*
+		 * v0 smart http; callers expect us to soak up the
+		 * service and header packets
+		 */
+		d->buf = src_buf;
+		d->len = src_len;
+		d->proto_git = 1;
+
+	} else if (starts_with(line, "version 2")) {
+		/*
+		 * v2 smart http; do not consume version packet, which will
+		 * be handled elsewhere.
+		 */
+		d->proto_git = 1;
+
+	} else {
+		die("invalid server response; got '%s'", line);
+	}
+}
+
 static struct discovery *discover_refs(const char *service, int for_push)
 {
-	struct strbuf exp = STRBUF_INIT;
 	struct strbuf type = STRBUF_INIT;
 	struct strbuf charset = STRBUF_INIT;
 	struct strbuf buffer = STRBUF_INIT;
@@ -405,38 +461,8 @@  static struct discovery *discover_refs(const char *service, int for_push)
 	last->buf_alloc = strbuf_detach(&buffer, &last->len);
 	last->buf = last->buf_alloc;
 
-	strbuf_addf(&exp, "application/x-%s-advertisement", service);
-	if (maybe_smart &&
-	    (5 <= last->len && last->buf[4] == '#') &&
-	    !strbuf_cmp(&exp, &type)) {
-		char *line;
-
-		/*
-		 * smart HTTP response; validate that the service
-		 * pkt-line matches our request.
-		 */
-		line = packet_read_line_buf(&last->buf, &last->len, NULL);
-		if (!line)
-			die("invalid server response; expected service, got flush packet");
-
-		strbuf_reset(&exp);
-		strbuf_addf(&exp, "# service=%s", service);
-		if (strcmp(line, exp.buf))
-			die("invalid server response; got '%s'", line);
-		strbuf_release(&exp);
-
-		/* The header can include additional metadata lines, up
-		 * until a packet flush marker.  Ignore these now, but
-		 * in the future we might start to scan them.
-		 */
-		while (packet_read_line_buf(&last->buf, &last->len, NULL))
-			;
-
-		last->proto_git = 1;
-	} else if (maybe_smart &&
-		   last->len > 5 && starts_with(last->buf + 4, "version 2")) {
-		last->proto_git = 1;
-	}
+	if (maybe_smart)
+		check_smart_http(last, service, &type);
 
 	if (last->proto_git)
 		last->refs = parse_git_refs(last, for_push);
@@ -444,7 +470,6 @@  static struct discovery *discover_refs(const char *service, int for_push)
 		last->refs = parse_info_refs(last);
 
 	strbuf_release(&refs_url);
-	strbuf_release(&exp);
 	strbuf_release(&type);
 	strbuf_release(&charset);
 	strbuf_release(&effective_url);