Message ID | 20231009210553.GR3282181@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | 581e0f8b189b5b3d50aae16be09941ab6fccf335 |
Headers | show |
Series | bounds-checks for chunk-based files | expand |
On Mon, Oct 09, 2023 at 05:05:53PM -0400, Jeff King wrote: > We load the bloom_filter_indexes chunk using pair_chunk(), so we have no > idea how big it is. This can lead to out-of-bounds reads if it is > smaller than expected, since we index it based on the number of commits > found elsewhere in the graph file. > > We can check the chunk size up front, like we do for CDAT and other > chunks with one fixed-size record per commit. > > The test case demonstrates the problem. It actually won't segfault, > because we end up reading random data from the follow-on chunk (BDAT in > this case), and the bounds checks added in the previous patch complain. > But this is by no means assured, and you can craft a commit-graph file > with BIDX at the end (or a smaller BDAT) that does segfault. > > Signed-off-by: Jeff King <peff@peff.net> > --- > commit-graph.c | 16 ++++++++++++++-- > t/t4216-log-bloom.sh | 9 +++++++++ > 2 files changed, 23 insertions(+), 2 deletions(-) > > diff --git a/commit-graph.c b/commit-graph.c > index f7a42be6d0..1f334987b5 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -360,6 +360,18 @@ static int graph_read_generation_data(const unsigned char *chunk_start, > return 0; > } > > +static int graph_read_bloom_index(const unsigned char *chunk_start, > + size_t chunk_size, void *data) > +{ > + struct commit_graph *g = data; > + if (chunk_size != g->num_commits * 4) { Here we probably should wrap `g->num_commits * 4` in an st_mult() call. Having 4B commits is probably pretty unlikely in practice, but we have definitely seen issues in the wild at GitHub where we have more than 2^32-1/20 commits in a single network.git. > + warning("commit-graph changed-path index chunk is too small"); Should this be marked for translation? > + return -1; > + } > + g->chunk_bloom_indexes = chunk_start; > + return 0; > +} > + > static int graph_read_bloom_data(const unsigned char *chunk_start, > size_t chunk_size, void *data) > { > @@ -470,8 +482,8 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s, > } > > if (s->commit_graph_read_changed_paths) { > - pair_chunk_unsafe(cf, GRAPH_CHUNKID_BLOOMINDEXES, > - &graph->chunk_bloom_indexes); > + read_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES, > + graph_read_bloom_index, graph); Since we know g->num_commits by this point, this would be another spot that would benefit from a `pair_chunk_expect()` function. But, again, not trying to beat a dead horse here or anything ;-). Thanks, Taylor
diff --git a/commit-graph.c b/commit-graph.c index f7a42be6d0..1f334987b5 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -360,6 +360,18 @@ static int graph_read_generation_data(const unsigned char *chunk_start, return 0; } +static int graph_read_bloom_index(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct commit_graph *g = data; + if (chunk_size != g->num_commits * 4) { + warning("commit-graph changed-path index chunk is too small"); + return -1; + } + g->chunk_bloom_indexes = chunk_start; + return 0; +} + static int graph_read_bloom_data(const unsigned char *chunk_start, size_t chunk_size, void *data) { @@ -470,8 +482,8 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s, } if (s->commit_graph_read_changed_paths) { - pair_chunk_unsafe(cf, GRAPH_CHUNKID_BLOOMINDEXES, - &graph->chunk_bloom_indexes); + read_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES, + graph_read_bloom_index, graph); read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA, graph_read_bloom_data, graph); } diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 7a727bcddd..f6054cbb27 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -432,4 +432,13 @@ test_expect_success 'Bloom reader notices out-of-bounds filter offsets' ' grep "warning: ignoring out-of-range offset (4294967295) for changed-path filter at pos 3 of .git/objects/info/commit-graph" err ' +test_expect_success 'Bloom reader notices too-small index chunk' ' + # replace the index with a single entry, making most + # lookups out-of-bounds + check_corrupt_graph BIDX clear 00000000 && + echo "warning: commit-graph changed-path index chunk" \ + "is too small" >expect.err && + test_cmp expect.err err +' + test_done
We load the bloom_filter_indexes chunk using pair_chunk(), so we have no idea how big it is. This can lead to out-of-bounds reads if it is smaller than expected, since we index it based on the number of commits found elsewhere in the graph file. We can check the chunk size up front, like we do for CDAT and other chunks with one fixed-size record per commit. The test case demonstrates the problem. It actually won't segfault, because we end up reading random data from the follow-on chunk (BDAT in this case), and the bounds checks added in the previous patch complain. But this is by no means assured, and you can craft a commit-graph file with BIDX at the end (or a smaller BDAT) that does segfault. Signed-off-by: Jeff King <peff@peff.net> --- commit-graph.c | 16 ++++++++++++++-- t/t4216-log-bloom.sh | 9 +++++++++ 2 files changed, 23 insertions(+), 2 deletions(-)