diff mbox series

[18/20] commit-graph: check bounds when accessing BIDX chunk

Message ID 20231009210553.GR3282181@coredump.intra.peff.net (mailing list archive)
State Accepted
Commit 581e0f8b189b5b3d50aae16be09941ab6fccf335
Headers show
Series bounds-checks for chunk-based files | expand

Commit Message

Jeff King Oct. 9, 2023, 9:05 p.m. UTC
We load the bloom_filter_indexes chunk using pair_chunk(), so we have no
idea how big it is. This can lead to out-of-bounds reads if it is
smaller than expected, since we index it based on the number of commits
found elsewhere in the graph file.

We can check the chunk size up front, like we do for CDAT and other
chunks with one fixed-size record per commit.

The test case demonstrates the problem. It actually won't segfault,
because we end up reading random data from the follow-on chunk (BDAT in
this case), and the bounds checks added in the previous patch complain.
But this is by no means assured, and you can craft a commit-graph file
with BIDX at the end (or a smaller BDAT) that does segfault.

Signed-off-by: Jeff King <peff@peff.net>
---
 commit-graph.c       | 16 ++++++++++++++--
 t/t4216-log-bloom.sh |  9 +++++++++
 2 files changed, 23 insertions(+), 2 deletions(-)

Comments

Taylor Blau Oct. 11, 2023, 7:15 p.m. UTC | #1
On Mon, Oct 09, 2023 at 05:05:53PM -0400, Jeff King wrote:
> We load the bloom_filter_indexes chunk using pair_chunk(), so we have no
> idea how big it is. This can lead to out-of-bounds reads if it is
> smaller than expected, since we index it based on the number of commits
> found elsewhere in the graph file.
>
> We can check the chunk size up front, like we do for CDAT and other
> chunks with one fixed-size record per commit.
>
> The test case demonstrates the problem. It actually won't segfault,
> because we end up reading random data from the follow-on chunk (BDAT in
> this case), and the bounds checks added in the previous patch complain.
> But this is by no means assured, and you can craft a commit-graph file
> with BIDX at the end (or a smaller BDAT) that does segfault.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  commit-graph.c       | 16 ++++++++++++++--
>  t/t4216-log-bloom.sh |  9 +++++++++
>  2 files changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index f7a42be6d0..1f334987b5 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -360,6 +360,18 @@ static int graph_read_generation_data(const unsigned char *chunk_start,
>  	return 0;
>  }
>
> +static int graph_read_bloom_index(const unsigned char *chunk_start,
> +				  size_t chunk_size, void *data)
> +{
> +	struct commit_graph *g = data;
> +	if (chunk_size != g->num_commits * 4) {

Here we probably should wrap `g->num_commits * 4` in an st_mult() call.
Having 4B commits is probably pretty unlikely in practice, but we have
definitely seen issues in the wild at GitHub where we have more than
2^32-1/20 commits in a single network.git.

> +		warning("commit-graph changed-path index chunk is too small");

Should this be marked for translation?

> +		return -1;
> +	}
> +	g->chunk_bloom_indexes = chunk_start;
> +	return 0;
> +}
> +
>  static int graph_read_bloom_data(const unsigned char *chunk_start,
>  				  size_t chunk_size, void *data)
>  {
> @@ -470,8 +482,8 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
>  	}
>
>  	if (s->commit_graph_read_changed_paths) {
> -		pair_chunk_unsafe(cf, GRAPH_CHUNKID_BLOOMINDEXES,
> -			   &graph->chunk_bloom_indexes);
> +		read_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
> +			   graph_read_bloom_index, graph);

Since we know g->num_commits by this point, this would be another spot
that would benefit from a `pair_chunk_expect()` function. But, again,
not trying to beat a dead horse here or anything ;-).

Thanks,
Taylor
diff mbox series

Patch

diff --git a/commit-graph.c b/commit-graph.c
index f7a42be6d0..1f334987b5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -360,6 +360,18 @@  static int graph_read_generation_data(const unsigned char *chunk_start,
 	return 0;
 }
 
+static int graph_read_bloom_index(const unsigned char *chunk_start,
+				  size_t chunk_size, void *data)
+{
+	struct commit_graph *g = data;
+	if (chunk_size != g->num_commits * 4) {
+		warning("commit-graph changed-path index chunk is too small");
+		return -1;
+	}
+	g->chunk_bloom_indexes = chunk_start;
+	return 0;
+}
+
 static int graph_read_bloom_data(const unsigned char *chunk_start,
 				  size_t chunk_size, void *data)
 {
@@ -470,8 +482,8 @@  struct commit_graph *parse_commit_graph(struct repo_settings *s,
 	}
 
 	if (s->commit_graph_read_changed_paths) {
-		pair_chunk_unsafe(cf, GRAPH_CHUNKID_BLOOMINDEXES,
-			   &graph->chunk_bloom_indexes);
+		read_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
+			   graph_read_bloom_index, graph);
 		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
 			   graph_read_bloom_data, graph);
 	}
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index 7a727bcddd..f6054cbb27 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -432,4 +432,13 @@  test_expect_success 'Bloom reader notices out-of-bounds filter offsets' '
 	grep "warning: ignoring out-of-range offset (4294967295) for changed-path filter at pos 3 of .git/objects/info/commit-graph" err
 '
 
+test_expect_success 'Bloom reader notices too-small index chunk' '
+	# replace the index with a single entry, making most
+	# lookups out-of-bounds
+	check_corrupt_graph BIDX clear 00000000 &&
+	echo "warning: commit-graph changed-path index chunk" \
+		"is too small" >expect.err &&
+	test_cmp expect.err err
+'
+
 test_done