diff mbox series

[12/20] commit-graph: check size of commit data chunk

Message ID 20231009210536.GL3282181@coredump.intra.peff.net (mailing list archive)
State Accepted
Commit b72df612afc12b46ea003732d739d7d746871773
Headers show
Series bounds-checks for chunk-based files | expand

Commit Message

Jeff King Oct. 9, 2023, 9:05 p.m. UTC
We expect a commit-graph file to have a fixed-size data record for each
commit in the file (and we know the number of commits to expct from the
size of the lookup table). If we encounter a file where this is too
small, we'll look past the end of the chunk (and possibly even off the
mapped memory).

We can fix this by checking the size up front when we record the
pointer.

The included test doesn't segfault, since it ends up reading bytes
from another chunk. But it produces nonsense results, since the values
it reads are garbage. Our test notices this by comparing the output to a
non-corrupted run of the same command (and of course we also check that
the expected error is printed to stderr).

Signed-off-by: Jeff King <peff@peff.net>
---
 commit-graph.c          | 12 +++++++++++-
 t/t5318-commit-graph.sh |  9 +++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)

Comments

Taylor Blau Oct. 11, 2023, 6:46 p.m. UTC | #1
On Mon, Oct 09, 2023 at 05:05:36PM -0400, Jeff King wrote:
> We expect a commit-graph file to have a fixed-size data record for each
> commit in the file (and we know the number of commits to expct from the
> size of the lookup table). If we encounter a file where this is too
> small, we'll look past the end of the chunk (and possibly even off the
> mapped memory).
>
> We can fix this by checking the size up front when we record the
> pointer.
>
> The included test doesn't segfault, since it ends up reading bytes
> from another chunk. But it produces nonsense results, since the values
> it reads are garbage. Our test notices this by comparing the output to a
> non-corrupted run of the same command (and of course we also check that
> the expected error is printed to stderr).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  commit-graph.c          | 12 +++++++++++-
>  t/t5318-commit-graph.sh |  9 +++++++++
>  2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 472332f603..9b80bbd75b 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -340,6 +340,16 @@ static int graph_read_oid_lookup(const unsigned char *chunk_start,
>  	return 0;
>  }
>
> +static int graph_read_commit_data(const unsigned char *chunk_start,
> +				  size_t chunk_size, void *data)
> +{
> +	struct commit_graph *g = data;
> +	if (chunk_size != g->num_commits * GRAPH_DATA_WIDTH)

Should this be guarded with an st_mult? I think that GRAPH_DATA_WIDTH is
defined as (the_hash_algo->rawsz + 16), so I *think* that the expression
in the parenthesis would get done as a size_t, and then g->num_commits
would be widened to a size_t for the purposes of evaluating this
expression.

So I think that this is all OK in the sense that we'd never underflow
the 64-bit space, and having more than 2^64-1/36 (some eighteen
quintillion) commits is... unlikely ;-).

But it may be worth wrapping this computation in an st_mult() anyway to
avoid future readers having to think about this.

> +		return error("commit-graph commit data chunk is wrong size");
> +	g->chunk_commit_data = chunk_start;
> +	return 0;
> +}
> +
>  static int graph_read_bloom_data(const unsigned char *chunk_start,
>  				  size_t chunk_size, void *data)
>  {
> @@ -422,7 +432,7 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
>
>  	read_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, graph_read_oid_fanout, graph);
>  	read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph);
> -	pair_chunk_unsafe(cf, GRAPH_CHUNKID_DATA, &graph->chunk_commit_data);
> +	read_chunk(cf, GRAPH_CHUNKID_DATA, graph_read_commit_data, graph);

Here again would be a good use-case for a `pair_chunk_expect()`
function, but I don't want to beat a dead horse ;-).

Thanks,
Taylor
Jeff King Oct. 11, 2023, 11:22 p.m. UTC | #2
On Wed, Oct 11, 2023 at 02:46:28PM -0400, Taylor Blau wrote:

> > +static int graph_read_commit_data(const unsigned char *chunk_start,
> > +				  size_t chunk_size, void *data)
> > +{
> > +	struct commit_graph *g = data;
> > +	if (chunk_size != g->num_commits * GRAPH_DATA_WIDTH)
> 
> Should this be guarded with an st_mult? I think that GRAPH_DATA_WIDTH is
> defined as (the_hash_algo->rawsz + 16), so I *think* that the expression
> in the parenthesis would get done as a size_t, and then g->num_commits
> would be widened to a size_t for the purposes of evaluating this
> expression.
> 
> So I think that this is all OK in the sense that we'd never underflow
> the 64-bit space, and having more than 2^64-1/36 (some eighteen
> quintillion) commits is... unlikely ;-).

Hmm, yeah, I think you are right, but I agree it's awfully subtle. There
is no reason somebody couldn't later change "rawsz" to a smaller type
(after all, we know it's going to be tiny), and it would be quite
surprising if that introduces an overflow in far-away code. We should
protect ourselves here.

-Peff
diff mbox series

Patch

diff --git a/commit-graph.c b/commit-graph.c
index 472332f603..9b80bbd75b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -340,6 +340,16 @@  static int graph_read_oid_lookup(const unsigned char *chunk_start,
 	return 0;
 }
 
+static int graph_read_commit_data(const unsigned char *chunk_start,
+				  size_t chunk_size, void *data)
+{
+	struct commit_graph *g = data;
+	if (chunk_size != g->num_commits * GRAPH_DATA_WIDTH)
+		return error("commit-graph commit data chunk is wrong size");
+	g->chunk_commit_data = chunk_start;
+	return 0;
+}
+
 static int graph_read_bloom_data(const unsigned char *chunk_start,
 				  size_t chunk_size, void *data)
 {
@@ -422,7 +432,7 @@  struct commit_graph *parse_commit_graph(struct repo_settings *s,
 
 	read_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, graph_read_oid_fanout, graph);
 	read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph);
-	pair_chunk_unsafe(cf, GRAPH_CHUNKID_DATA, &graph->chunk_commit_data);
+	read_chunk(cf, GRAPH_CHUNKID_DATA, graph_read_commit_data, graph);
 	pair_chunk_unsafe(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges);
 	pair_chunk_unsafe(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs);
 
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index d10658de9e..492460157d 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -870,4 +870,13 @@  test_expect_success 'reader notices out-of-bounds fanout' '
 	test_cmp expect.err err
 '
 
+test_expect_success 'reader notices too-small commit data chunk' '
+	check_corrupt_chunk CDAT clear 00000000 &&
+	cat >expect.err <<-\EOF &&
+	error: commit-graph commit data chunk is wrong size
+	error: commit-graph is missing the Commit Data chunk
+	EOF
+	test_cmp expect.err err
+'
+
 test_done