[12/20] commit-graph: check size of commit data chunk

Message ID	20231009210536.GL3282181@coredump.intra.peff.net (mailing list archive)
State	Accepted
Commit	b72df612afc12b46ea003732d739d7d746871773
Headers	Return-Path: <git-owner@vger.kernel.org>
	Date: Mon, 9 Oct 2023 17:05:36 -0400
	From: Jeff King <peff@peff.net>
	To: git@vger.kernel.org
	Cc: Taylor Blau <me@ttaylorr.com>
	Subject: [PATCH 12/20] commit-graph: check size of commit data chunk
	Message-ID: <20231009210536.GL3282181@coredump.intra.peff.net>
	References: <20231009205544.GA3281950@coredump.intra.peff.net>
	MIME-Version: 1.0
	Content-Type: text/plain; charset=utf-8
	Content-Disposition: inline
	In-Reply-To: <20231009205544.GA3281950@coredump.intra.peff.net>
	Precedence: bulk
Series	bounds-checks for chunk-based files
	[0/20] bounds-checks for chunk-based files
	[01/20] chunk-format: note that pair_chunk() is unsafe
	[02/20] t: add library for munging chunk-format files
	[03/20] midx: stop ignoring malformed oid fanout chunk
	[04/20] commit-graph: check size of oid fanout chunk
	[05/20] midx: check size of oid lookup chunk
	[06/20] commit-graph: check consistency of fanout table
	[07/20] midx: check size of pack names chunk
	[08/20] midx: enforce chunk alignment on reading
	[09/20] midx: check size of object offset chunk
	[10/20] midx: bounds-check large offset chunk
	[11/20] midx: check size of revindex chunk
	[12/20] commit-graph: check size of commit data chunk
	[13/20] commit-graph: detect out-of-bounds extra-edges pointers
	[14/20] commit-graph: bounds-check base graphs chunk
	[15/20] commit-graph: check size of generations chunk
	[16/20] commit-graph: bounds-check generation overflow chunk
	[17/20] commit-graph: check bounds when accessing BDAT chunk
	[18/20] commit-graph: check bounds when accessing BIDX chunk
	[19/20] commit-graph: detect out-of-order BIDX offsets
	[20/20] chunk-format: drop pair_chunk_unsafe()
	[21/20] t5319: make corrupted large-offset test more robust

Commit Message

Jeff King Oct. 9, 2023, 9:05 p.m. UTC

We expect a commit-graph file to have a fixed-size data record for each
commit in the file (and we know the number of commits to expect from the
size of the lookup table). If we encounter a file where this is too
small, we'll look past the end of the chunk (and possibly even off the
mapped memory).

We can fix this by checking the size up front when we record the
pointer.

The included test doesn't segfault, since it ends up reading bytes
from another chunk. But it produces nonsense results, since the values
it reads are garbage. Our test notices this by comparing the output to a
non-corrupted run of the same command (and of course we also check that
the expected error is printed to stderr).

Signed-off-by: Jeff King <peff@peff.net>
---
 commit-graph.c          | 12 +++++++++++-
 t/t5318-commit-graph.sh |  9 +++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)
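
The up-front check described in the commit message can be sketched in isolation. This is a minimal illustration, not Git's code: `struct toy_graph`, `TOY_DATA_WIDTH`, and `toy_read_commit_data()` are hypothetical stand-ins for `struct commit_graph`, `GRAPH_DATA_WIDTH`, and the callback added by the patch, and the 36-byte record width assumes a 20-byte hash plus 16 bytes of fixed data.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; not Git's real definitions. */
#define TOY_DATA_WIDTH 36 /* assumed: 20-byte hash + 16 bytes of data */

struct toy_graph {
	uint32_t num_commits;             /* known from the lookup table */
	const unsigned char *commit_data; /* set only if the chunk is valid */
};

/*
 * Record the chunk pointer only when the chunk holds exactly one
 * fixed-size record per commit. Returns 0 on success, -1 on error.
 */
static int toy_read_commit_data(const unsigned char *chunk_start,
				size_t chunk_size, struct toy_graph *g)
{
	/* Widen before multiplying so the product is computed as size_t. */
	if (chunk_size != (size_t)g->num_commits * TOY_DATA_WIDTH) {
		fprintf(stderr, "error: commit data chunk is wrong size\n");
		return -1;
	}
	g->commit_data = chunk_start;
	return 0;
}
```

Because the pointer is stored only after the size matches, later per-commit reads at `commit_data + i * TOY_DATA_WIDTH` stay inside the chunk for every `i < num_commits`.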

Comments

Taylor Blau Oct. 11, 2023, 6:46 p.m. UTC | #1

On Mon, Oct 09, 2023 at 05:05:36PM -0400, Jeff King wrote:
> We expect a commit-graph file to have a fixed-size data record for each
> commit in the file (and we know the number of commits to expect from the
> size of the lookup table). If we encounter a file where this is too
> small, we'll look past the end of the chunk (and possibly even off the
> mapped memory).
>
> We can fix this by checking the size up front when we record the
> pointer.
>
> The included test doesn't segfault, since it ends up reading bytes
> from another chunk. But it produces nonsense results, since the values
> it reads are garbage. Our test notices this by comparing the output to a
> non-corrupted run of the same command (and of course we also check that
> the expected error is printed to stderr).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  commit-graph.c          | 12 +++++++++++-
>  t/t5318-commit-graph.sh |  9 +++++++++
>  2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 472332f603..9b80bbd75b 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -340,6 +340,16 @@ static int graph_read_oid_lookup(const unsigned char *chunk_start,
>  	return 0;
>  }
>
> +static int graph_read_commit_data(const unsigned char *chunk_start,
> +				  size_t chunk_size, void *data)
> +{
> +	struct commit_graph *g = data;
> +	if (chunk_size != g->num_commits * GRAPH_DATA_WIDTH)

Should this be guarded with an st_mult? I think that GRAPH_DATA_WIDTH is
defined as (the_hash_algo->rawsz + 16), so I *think* that the expression
in the parenthesis would get done as a size_t, and then g->num_commits
would be widened to a size_t for the purposes of evaluating this
expression.

So I think that this is all OK in the sense that we'd never overflow
the 64-bit space, and having more than (2^64-1)/36 (roughly 512
quadrillion) commits is... unlikely ;-).

But it may be worth wrapping this computation in an st_mult() anyway to
avoid future readers having to think about this.

> +		return error("commit-graph commit data chunk is wrong size");
> +	g->chunk_commit_data = chunk_start;
> +	return 0;
> +}
> +
>  static int graph_read_bloom_data(const unsigned char *chunk_start,
>  				  size_t chunk_size, void *data)
>  {
> @@ -422,7 +432,7 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
>
>  	read_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, graph_read_oid_fanout, graph);
>  	read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph);
> -	pair_chunk_unsafe(cf, GRAPH_CHUNKID_DATA, &graph->chunk_commit_data);
> +	read_chunk(cf, GRAPH_CHUNKID_DATA, graph_read_commit_data, graph);

Here again would be a good use-case for a `pair_chunk_expect()`
function, but I don't want to beat a dead horse ;-).

Thanks,
Taylor
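
For readers unfamiliar with the idea behind st_mult(), an overflow-checked multiplication can be sketched as below. This is an assumption-laden illustration, not Git's helper: `checked_mult()` is a hypothetical name, and it returns an error code where Git's st_mult(), as far as I know, aborts the program on overflow instead.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper in the spirit of st_mult(): multiply two size_t
 * values and report overflow instead of silently wrapping.
 * Returns 0 and stores the product in *out on success, -1 on overflow.
 */
static int checked_mult(size_t a, size_t b, size_t *out)
{
	if (a && b > SIZE_MAX / a)
		return -1; /* a * b would not fit in size_t */
	*out = a * b;
	return 0;
}
```

A size check written this way would compute the expected chunk size with the helper first and compare `chunk_size` against the result only if the multiplication succeeded.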

Jeff King Oct. 11, 2023, 11:22 p.m. UTC | #2

On Wed, Oct 11, 2023 at 02:46:28PM -0400, Taylor Blau wrote:

> > +static int graph_read_commit_data(const unsigned char *chunk_start,
> > +				  size_t chunk_size, void *data)
> > +{
> > +	struct commit_graph *g = data;
> > +	if (chunk_size != g->num_commits * GRAPH_DATA_WIDTH)
> 
> Should this be guarded with an st_mult? I think that GRAPH_DATA_WIDTH is
> defined as (the_hash_algo->rawsz + 16), so I *think* that the expression
> in the parenthesis would get done as a size_t, and then g->num_commits
> would be widened to a size_t for the purposes of evaluating this
> expression.
> 
> So I think that this is all OK in the sense that we'd never overflow
> the 64-bit space, and having more than (2^64-1)/36 (roughly 512
> quadrillion) commits is... unlikely ;-).

Hmm, yeah, I think you are right, but I agree it's awfully subtle. There
is no reason somebody couldn't later change "rawsz" to a smaller type
(after all, we know it's going to be tiny), and it would be quite
surprising if that introduces an overflow in far-away code. We should
protect ourselves here.

-Peff
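
The subtlety discussed above is that the width of the multiplication depends on the operand types. The sketch below is illustrative only: both functions are hypothetical, and `uint32_t` stands in for a narrower "rawsz" type that somebody might introduce later. It assumes a platform where `int` is 32 bits, so that `uint32_t` operands are not promoted to a wider type.

```c
#include <stdint.h>

/* Both operands are 32 bits wide, so the product wraps modulo 2^32. */
static uint32_t product_narrow(uint32_t num_commits, uint32_t width)
{
	return num_commits * width;
}

/* Widening one operand first makes the multiplication happen in 64 bits. */
static uint64_t product_wide(uint32_t num_commits, uint32_t width)
{
	return (uint64_t)num_commits * width;
}
```

With 200,000,000 commits and a width of 36, the narrow product wraps to 2,905,032,704 while the wide one is the true 7,200,000,000. A size comparison against the wrapped value could accept a chunk that is far too small.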

diff --git a/commit-graph.c b/commit-graph.c
index 472332f603..9b80bbd75b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -340,6 +340,16 @@  static int graph_read_oid_lookup(const unsigned char *chunk_start,
 	return 0;
 }
 
+static int graph_read_commit_data(const unsigned char *chunk_start,
+				  size_t chunk_size, void *data)
+{
+	struct commit_graph *g = data;
+	if (chunk_size != g->num_commits * GRAPH_DATA_WIDTH)
+		return error("commit-graph commit data chunk is wrong size");
+	g->chunk_commit_data = chunk_start;
+	return 0;
+}
+
 static int graph_read_bloom_data(const unsigned char *chunk_start,
 				  size_t chunk_size, void *data)
 {
@@ -422,7 +432,7 @@  struct commit_graph *parse_commit_graph(struct repo_settings *s,
 
 	read_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, graph_read_oid_fanout, graph);
 	read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph);
-	pair_chunk_unsafe(cf, GRAPH_CHUNKID_DATA, &graph->chunk_commit_data);
+	read_chunk(cf, GRAPH_CHUNKID_DATA, graph_read_commit_data, graph);
 	pair_chunk_unsafe(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges);
 	pair_chunk_unsafe(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs);
 
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index d10658de9e..492460157d 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -870,4 +870,13 @@  test_expect_success 'reader notices out-of-bounds fanout' '
 	test_cmp expect.err err
 '
 
+test_expect_success 'reader notices too-small commit data chunk' '
+	check_corrupt_chunk CDAT clear 00000000 &&
+	cat >expect.err <<-\EOF &&
+	error: commit-graph commit data chunk is wrong size
+	error: commit-graph is missing the Commit Data chunk
+	EOF
+	test_cmp expect.err err
+'
+
 test_done
