diff mbox series

[05/12] http: simplify parsing of remote objects/info/packs

Message ID 20190404232704.GE21839@sigill.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series a rabbit hole of update-server-info fixes | expand

Commit Message

Jeff King April 4, 2019, 11:27 p.m. UTC
We can use skip_prefix() and parse_oid_hex() to continuously increment
our pointer, rather than dealing with magic numbers. This also fixes a
few small shortcomings:

  - if we see a 'P' line that does not match our expectations, we'll
    leave our "i" counter in the middle of the line. So we'll parse:
    "P P P pack-1234.pack" as if there was just one "P" which was not
    intentional (though probably not that harmful).

  - if we see a line with the right prefix, suffix, and length, i.e.
    matching /P pack-.{40}.pack\n/, we'll interpret the middle part as
    hex without checking if it could be parsed. This could lead to us
    looking at uninitialized garbage in the hash array. In practice this
    means we'll just make a garbage request to the server which will
    fail, though it's interesting that a malicious server could convince
    us to leak 40 bytes of uninitialized stack to them.

  - the current code is picky about seeing a newline at the end of file,
    but we can easily be more liberal

Signed-off-by: Jeff King <peff@peff.net>
---
 http.c | 35 ++++++++++++++---------------------
 1 file changed, 14 insertions(+), 21 deletions(-)

Comments

René Scharfe April 5, 2019, 10:41 a.m. UTC | #1
Am 05.04.2019 um 01:27 schrieb Jeff King:
> We can use skip_prefix() and parse_oid_hex() to continuously increment
> our pointer, rather than dealing with magic numbers. This also fixes a
> few small shortcomings:
>
>   - if we see a 'P' line that does not match our expectations, we'll
>     leave our "i" counter in the middle of the line. So we'll parse:
>     "P P P pack-1234.pack" as if there was just one "P" which was not
>     intentional (though probably not that harmful).

How so?  The default case, which we'd fall through to, skips the rest
of such a line, doesn't it?

>
>   - if we see a line with the right prefix, suffix, and length, i.e.
>     matching /P pack-.{40}.pack\n/, we'll interpret the middle part as
>     hex without checking if it could be parsed. This could lead to us
>     looking at uninitialized garbage in the hash array. In practice this
>     means we'll just make a garbage request to the server which will
>     fail, though it's interesting that a malicious server could convince
>     us to leak 40 bytes of uninitialized stack to them.
>
>   - the current code is picky about seeing a newline at the end of file,
>     but we can easily be more liberal
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  http.c | 35 ++++++++++++++---------------------
>  1 file changed, 14 insertions(+), 21 deletions(-)
>
> diff --git a/http.c b/http.c
> index a32ad36ddf..2ef47bc779 100644
> --- a/http.c
> +++ b/http.c
> @@ -2147,11 +2147,11 @@ static int fetch_and_setup_pack_index(struct packed_git **packs_head,
>  int http_get_info_packs(const char *base_url, struct packed_git **packs_head)
>  {
>  	struct http_get_options options = {0};
> -	int ret = 0, i = 0;
> -	char *url, *data;
> +	int ret = 0;
> +	char *url;
> +	const char *data;
>  	struct strbuf buf = STRBUF_INIT;
> -	unsigned char hash[GIT_MAX_RAWSZ];
> -	const unsigned hexsz = the_hash_algo->hexsz;
> +	struct object_id oid;
>
>  	end_url_with_slash(&buf, base_url);
>  	strbuf_addstr(&buf, "objects/info/packs");
> @@ -2163,24 +2163,17 @@ int http_get_info_packs(const char *base_url, struct packed_git **packs_head)
>  		goto cleanup;
>
>  	data = buf.buf;
> -	while (i < buf.len) {
> -		switch (data[i]) {
> -		case 'P':
> -			i++;
> -			if (i + hexsz + 12 <= buf.len &&
> -			    starts_with(data + i, " pack-") &&
> -			    starts_with(data + i + hexsz + 6, ".pack\n")) {
> -				get_sha1_hex(data + i + 6, hash);
> -				fetch_and_setup_pack_index(packs_head, hash,
> -						      base_url);
> -				i += hexsz + 11;
> -				break;
> -			}
> -		default:
> -			while (i < buf.len && data[i] != '\n')
> -				i++;
> +	while (*data) {
> +		if (skip_prefix(data, "P pack-", &data) &&
> +		    !parse_oid_hex(data, &oid, &data) &&
> +		    skip_prefix(data, ".pack", &data) &&
> +		    (*data == '\n' || *data == '\0')) {
> +			fetch_and_setup_pack_index(packs_head, oid.hash, base_url);
> +		} else {
> +			data = strchrnul(data, '\n');
>  		}
> -		i++;
> +		if (*data)
> +			data++; /* skip past newline */

So much simpler, *and* converted to object_id -- I like it!

Parsing "P" and "pack-" together crosses logical token boundaries,
but that I don't mind it here.

René
Jeff King April 5, 2019, 6:11 p.m. UTC | #2
On Fri, Apr 05, 2019 at 12:41:27PM +0200, René Scharfe wrote:

> Am 05.04.2019 um 01:27 schrieb Jeff King:
> > We can use skip_prefix() and parse_oid_hex() to continuously increment
> > our pointer, rather than dealing with magic numbers. This also fixes a
> > few small shortcomings:
> >
> >   - if we see a 'P' line that does not match our expectations, we'll
> >     leave our "i" counter in the middle of the line. So we'll parse:
> >     "P P P pack-1234.pack" as if there was just one "P" which was not
> >     intentional (though probably not that harmful).
> 
> How so?  The default case, which we'd fall through to, skips the rest
> of such a line, doesn't it?

Oh, you're right. I didn't notice the fall-through, which is quite
subtle (I'm actually surprised -Wimplicit-fallthrough doesn't complain
about this).

I'll fix up the commit message (the cleanup is still very much worth it
for the garbage-oid issue, IMHO).

> > +	while (*data) {
> > +		if (skip_prefix(data, "P pack-", &data) &&
> > +		    !parse_oid_hex(data, &oid, &data) &&
> > +		    skip_prefix(data, ".pack", &data) &&
> > +		    (*data == '\n' || *data == '\0')) {
> > +			fetch_and_setup_pack_index(packs_head, oid.hash, base_url);
> > +		} else {
> > +			data = strchrnul(data, '\n');
> >  		}
> > -		i++;
> > +		if (*data)
> > +			data++; /* skip past newline */
> 
> So much simpler, *and* converted to object_id -- I like it!
> 
> Parsing "P" and "pack-" together crosses logical token boundaries,
> but that I don't mind it here.

Yeah, I was tempted to write:

  if (skip_prefix(data, "P ", &data) &&
      skip_prefix(data, "pack-", &data) &&
      ...

but that felt a little silly. I dunno. I guess it is probably not any
less efficient, because we'd expect skip_prefix() and its loop to get
inlined here anyway.

-Peff
René Scharfe April 5, 2019, 8:17 p.m. UTC | #3
Am 05.04.2019 um 20:11 schrieb Jeff King:
> On Fri, Apr 05, 2019 at 12:41:27PM +0200, René Scharfe wrote:
>> Parsing "P" and "pack-" together crosses logical token boundaries,
>> but that I don't mind it here.
>
> Yeah, I was tempted to write:
>
>   if (skip_prefix(data, "P ", &data) &&
>       skip_prefix(data, "pack-", &data) &&
>       ...
>
> but that felt a little silly. I dunno. I guess it is probably not any
> less efficient, because we'd expect skip_prefix() and its loop to get
> inlined here anyway.

Didn't think of inlining.  Clang unrolls the whole comparison (except on
powerpc64), but the other compilers available at the Compiler Explorer
website keep the consecutive calls separate: https://godbolt.org/z/7eTarV

René
diff mbox series

Patch

diff --git a/http.c b/http.c
index a32ad36ddf..2ef47bc779 100644
--- a/http.c
+++ b/http.c
@@ -2147,11 +2147,11 @@  static int fetch_and_setup_pack_index(struct packed_git **packs_head,
 int http_get_info_packs(const char *base_url, struct packed_git **packs_head)
 {
 	struct http_get_options options = {0};
-	int ret = 0, i = 0;
-	char *url, *data;
+	int ret = 0;
+	char *url;
+	const char *data;
 	struct strbuf buf = STRBUF_INIT;
-	unsigned char hash[GIT_MAX_RAWSZ];
-	const unsigned hexsz = the_hash_algo->hexsz;
+	struct object_id oid;
 
 	end_url_with_slash(&buf, base_url);
 	strbuf_addstr(&buf, "objects/info/packs");
@@ -2163,24 +2163,17 @@  int http_get_info_packs(const char *base_url, struct packed_git **packs_head)
 		goto cleanup;
 
 	data = buf.buf;
-	while (i < buf.len) {
-		switch (data[i]) {
-		case 'P':
-			i++;
-			if (i + hexsz + 12 <= buf.len &&
-			    starts_with(data + i, " pack-") &&
-			    starts_with(data + i + hexsz + 6, ".pack\n")) {
-				get_sha1_hex(data + i + 6, hash);
-				fetch_and_setup_pack_index(packs_head, hash,
-						      base_url);
-				i += hexsz + 11;
-				break;
-			}
-		default:
-			while (i < buf.len && data[i] != '\n')
-				i++;
+	while (*data) {
+		if (skip_prefix(data, "P pack-", &data) &&
+		    !parse_oid_hex(data, &oid, &data) &&
+		    skip_prefix(data, ".pack", &data) &&
+		    (*data == '\n' || *data == '\0')) {
+			fetch_and_setup_pack_index(packs_head, oid.hash, base_url);
+		} else {
+			data = strchrnul(data, '\n');
 		}
-		i++;
+		if (*data)
+			data++; /* skip past newline */
 	}
 
 cleanup: