[05/12] http: simplify parsing of remote objects/info/packs

Message ID	20190404232704.GE21839@sigill.intra.peff.net (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> Date: Thu, 4 Apr 2019 19:27:04 -0400 From: Jeff King <peff@peff.net> To: git@vger.kernel.org Subject: [PATCH 05/12] http: simplify parsing of remote objects/info/packs Message-ID: <20190404232704.GE21839@sigill.intra.peff.net> References: <20190404232104.GA27770@sigill.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20190404232104.GA27770@sigill.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk
Series	a rabbit hole of update-server-info fixes \| expand [0/12] a rabbit hole of update-server-info fixes [01/12] t5319: fix bogus cat-file argument [02/12] t5319: drop useless --buffer from cat-file [03/12] packfile: factor out .pack to .idx name conversion [04/12] packfile: check midx coverage with .idx rather than .pack [05/12] http: simplify parsing of remote objects/info/packs [06/12] server-info: fix blind pointer arithmetic [07/12] server-info: simplify cleanup in parse_pack_def() [08/12] server-info: use strbuf to read old info/packs file [09/12] server-info: drop nr_alloc struct member [10/12] packfile.h: drop extern from function declarations [11/12] server-info: drop objdirlen pointer arithmetic [12/12] update_info_refs(): drop unused force parameter

Message ID

20190404232704.GE21839@sigill.intra.peff.net (mailing list archive)

State

New, archived

Headers

Date: Thu, 4 Apr 2019 19:27:04 -0400
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 05/12] http: simplify parsing of remote objects/info/packs
Message-ID: <20190404232704.GE21839@sigill.intra.peff.net>
References: <20190404232104.GA27770@sigill.intra.peff.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20190404232104.GA27770@sigill.intra.peff.net>
Sender: git-owner@vger.kernel.org
Precedence: bulk

Series

a rabbit hole of update-server-info fixes | expand

Commit Message

Jeff King April 4, 2019, 11:27 p.m. UTC

We can use skip_prefix() and parse_oid_hex() to continuously increment
our pointer, rather than dealing with magic numbers. This also fixes a
few small shortcomings:

  - if we see a 'P' line that does not match our expectations, we'll
    leave our "i" counter in the middle of the line. So we'll parse:
    "P P P pack-1234.pack" as if there was just one "P" which was not
    intentional (though probably not that harmful).

  - if we see a line with the right prefix, suffix, and length, i.e.
    matching /P pack-.{40}.pack\n/, we'll interpret the middle part as
    hex without checking if it could be parsed. This could lead to us
    looking at uninitialized garbage in the hash array. In practice this
    means we'll just make a garbage request to the server which will
    fail, though it's interesting that a malicious server could convince
    us to leak 40 bytes of uninitialized stack to them.

  - the current code is picky about seeing a newline at the end of file,
    but we can easily be more liberal

Signed-off-by: Jeff King <peff@peff.net>
---
 http.c | 35 ++++++++++++++---------------------
 1 file changed, 14 insertions(+), 21 deletions(-)

Comments

René Scharfe April 5, 2019, 10:41 a.m. UTC | #1

Am 05.04.2019 um 01:27 schrieb Jeff King:
> We can use skip_prefix() and parse_oid_hex() to continuously increment
> our pointer, rather than dealing with magic numbers. This also fixes a
> few small shortcomings:
>
>   - if we see a 'P' line that does not match our expectations, we'll
>     leave our "i" counter in the middle of the line. So we'll parse:
>     "P P P pack-1234.pack" as if there was just one "P" which was not
>     intentional (though probably not that harmful).

How so?  The default case, which we'd fall through to, skips the rest
of such a line, doesn't it?

>
>   - if we see a line with the right prefix, suffix, and length, i.e.
>     matching /P pack-.{40}.pack\n/, we'll interpret the middle part as
>     hex without checking if it could be parsed. This could lead to us
>     looking at uninitialized garbage in the hash array. In practice this
>     means we'll just make a garbage request to the server which will
>     fail, though it's interesting that a malicious server could convince
>     us to leak 40 bytes of uninitialized stack to them.
>
>   - the current code is picky about seeing a newline at the end of file,
>     but we can easily be more liberal
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  http.c | 35 ++++++++++++++---------------------
>  1 file changed, 14 insertions(+), 21 deletions(-)
>
> diff --git a/http.c b/http.c
> index a32ad36ddf..2ef47bc779 100644
> --- a/http.c
> +++ b/http.c
> @@ -2147,11 +2147,11 @@ static int fetch_and_setup_pack_index(struct packed_git **packs_head,
>  int http_get_info_packs(const char *base_url, struct packed_git **packs_head)
>  {
>  	struct http_get_options options = {0};
> -	int ret = 0, i = 0;
> -	char *url, *data;
> +	int ret = 0;
> +	char *url;
> +	const char *data;
>  	struct strbuf buf = STRBUF_INIT;
> -	unsigned char hash[GIT_MAX_RAWSZ];
> -	const unsigned hexsz = the_hash_algo->hexsz;
> +	struct object_id oid;
>
>  	end_url_with_slash(&buf, base_url);
>  	strbuf_addstr(&buf, "objects/info/packs");
> @@ -2163,24 +2163,17 @@ int http_get_info_packs(const char *base_url, struct packed_git **packs_head)
>  		goto cleanup;
>
>  	data = buf.buf;
> -	while (i < buf.len) {
> -		switch (data[i]) {
> -		case 'P':
> -			i++;
> -			if (i + hexsz + 12 <= buf.len &&
> -			    starts_with(data + i, " pack-") &&
> -			    starts_with(data + i + hexsz + 6, ".pack\n")) {
> -				get_sha1_hex(data + i + 6, hash);
> -				fetch_and_setup_pack_index(packs_head, hash,
> -						      base_url);
> -				i += hexsz + 11;
> -				break;
> -			}
> -		default:
> -			while (i < buf.len && data[i] != '\n')
> -				i++;
> +	while (*data) {
> +		if (skip_prefix(data, "P pack-", &data) &&
> +		    !parse_oid_hex(data, &oid, &data) &&
> +		    skip_prefix(data, ".pack", &data) &&
> +		    (*data == '\n' || *data == '\0')) {
> +			fetch_and_setup_pack_index(packs_head, oid.hash, base_url);
> +		} else {
> +			data = strchrnul(data, '\n');
>  		}
> -		i++;
> +		if (*data)
> +			data++; /* skip past newline */

So much simpler, *and* converted to object_id -- I like it!

Parsing "P" and "pack-" together crosses logical token boundaries,
but that I don't mind it here.

René

Jeff King April 5, 2019, 6:11 p.m. UTC | #2

On Fri, Apr 05, 2019 at 12:41:27PM +0200, René Scharfe wrote:

> Am 05.04.2019 um 01:27 schrieb Jeff King:
> > We can use skip_prefix() and parse_oid_hex() to continuously increment
> > our pointer, rather than dealing with magic numbers. This also fixes a
> > few small shortcomings:
> >
> >   - if we see a 'P' line that does not match our expectations, we'll
> >     leave our "i" counter in the middle of the line. So we'll parse:
> >     "P P P pack-1234.pack" as if there was just one "P" which was not
> >     intentional (though probably not that harmful).
> 
> How so?  The default case, which we'd fall through to, skips the rest
> of such a line, doesn't it?

Oh, you're right. I didn't notice the fall-through, which is quite
subtle (I'm actually surprised -Wimplicit-fallthrough doesn't complain
about this).

I'll fix up the commit message (the cleanup is still very much worth it
for the garbage-oid issue, IMHO).

> > +	while (*data) {
> > +		if (skip_prefix(data, "P pack-", &data) &&
> > +		    !parse_oid_hex(data, &oid, &data) &&
> > +		    skip_prefix(data, ".pack", &data) &&
> > +		    (*data == '\n' || *data == '\0')) {
> > +			fetch_and_setup_pack_index(packs_head, oid.hash, base_url);
> > +		} else {
> > +			data = strchrnul(data, '\n');
> >  		}
> > -		i++;
> > +		if (*data)
> > +			data++; /* skip past newline */
> 
> So much simpler, *and* converted to object_id -- I like it!
> 
> Parsing "P" and "pack-" together crosses logical token boundaries,
> but that I don't mind it here.

Yeah, I was tempted to write:

  if (skip_prefix(data, "P ", &data) &&
      skip_prefix(data, "pack-", &data) &&
      ...

but that felt a little silly. I dunno. I guess it is probably not any
less efficient, because we'd expect skip_prefix() and its loop to get
inlined here anyway.

-Peff

René Scharfe April 5, 2019, 8:17 p.m. UTC | #3

Am 05.04.2019 um 20:11 schrieb Jeff King:
> On Fri, Apr 05, 2019 at 12:41:27PM +0200, René Scharfe wrote:
>> Parsing "P" and "pack-" together crosses logical token boundaries,
>> but that I don't mind it here.
>
> Yeah, I was tempted to write:
>
>   if (skip_prefix(data, "P ", &data) &&
>       skip_prefix(data, "pack-", &data) &&
>       ...
>
> but that felt a little silly. I dunno. I guess it is probably not any
> less efficient, because we'd expect skip_prefix() and its loop to get
> inlined here anyway.

Didn't think of inlining.  Clang unrolls the whole comparison (except on
powerpc64), but the other compilers available at the Compiler Explorer
website keep the consecutive calls separate: https://godbolt.org/z/7eTarV

René

diff --git a/http.c b/http.c
index a32ad36ddf..2ef47bc779 100644
--- a/http.c
+++ b/http.c
@@ -2147,11 +2147,11 @@  static int fetch_and_setup_pack_index(struct packed_git **packs_head,
 int http_get_info_packs(const char *base_url, struct packed_git **packs_head)
 {
 	struct http_get_options options = {0};
-	int ret = 0, i = 0;
-	char *url, *data;
+	int ret = 0;
+	char *url;
+	const char *data;
 	struct strbuf buf = STRBUF_INIT;
-	unsigned char hash[GIT_MAX_RAWSZ];
-	const unsigned hexsz = the_hash_algo->hexsz;
+	struct object_id oid;
 
 	end_url_with_slash(&buf, base_url);
 	strbuf_addstr(&buf, "objects/info/packs");
@@ -2163,24 +2163,17 @@  int http_get_info_packs(const char *base_url, struct packed_git **packs_head)
 		goto cleanup;
 
 	data = buf.buf;
-	while (i < buf.len) {
-		switch (data[i]) {
-		case 'P':
-			i++;
-			if (i + hexsz + 12 <= buf.len &&
-			    starts_with(data + i, " pack-") &&
-			    starts_with(data + i + hexsz + 6, ".pack\n")) {
-				get_sha1_hex(data + i + 6, hash);
-				fetch_and_setup_pack_index(packs_head, hash,
-						      base_url);
-				i += hexsz + 11;
-				break;
-			}
-		default:
-			while (i < buf.len && data[i] != '\n')
-				i++;
+	while (*data) {
+		if (skip_prefix(data, "P pack-", &data) &&
+		    !parse_oid_hex(data, &oid, &data) &&
+		    skip_prefix(data, ".pack", &data) &&
+		    (*data == '\n' || *data == '\0')) {
+			fetch_and_setup_pack_index(packs_head, oid.hash, base_url);
+		} else {
+			data = strchrnul(data, '\n');
 		}
-		i++;
+		if (*data)
+			data++; /* skip past newline */
 	}
 
 cleanup:

[05/12] http: simplify parsing of remote objects/info/packs

Commit Message

Comments

Patch