Message ID | 20231009210550.GQ3282181@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | 920f400e919c7c51f81adc6989cdd52630220783 |
Headers | show |
Series | bounds-checks for chunk-based files | expand |
On Mon, Oct 09, 2023 at 05:05:50PM -0400, Jeff King wrote: > --- > bloom.c | 24 ++++++++++++++++++++++++ > commit-graph.c | 10 ++++++++++ > commit-graph.h | 1 + > t/t4216-log-bloom.sh | 28 ++++++++++++++++++++++++++++ > 4 files changed, 63 insertions(+) > > diff --git a/bloom.c b/bloom.c > index aef6b5fea2..61abad7f8c 100644 > --- a/bloom.c > +++ b/bloom.c > @@ -29,6 +29,26 @@ static inline unsigned char get_bitmask(uint32_t pos) > return ((unsigned char)1) << (pos & (BITS_PER_WORD - 1)); > } > > +static int check_bloom_offset(struct commit_graph *g, uint32_t pos, > + uint32_t offset) > +{ > + /* > + * Note that we allow offsets equal to the data size, which would set > + * our pointers at one past the end of the chunk memory. This is > + * necessary because the on-disk index points to the end of the > + * entries (so we can compute size by comparing adjacent ones). And > + * naturally the final entry's end is one-past-the-end of the chunk. > + */ > + if (offset <= g->chunk_bloom_data_size - BLOOMDATA_CHUNK_HEADER_SIZE) > + return 0; > + > + warning("ignoring out-of-range offset (%"PRIuMAX") for changed-path" > + " filter at pos %"PRIuMAX" of %s (chunk size: %"PRIuMAX")", > + (uintmax_t)offset, (uintmax_t)pos, > + g->filename, (uintmax_t)g->chunk_bloom_data_size); Should this be marked for translation? > + return -1; > +} > + > static int load_bloom_filter_from_graph(struct commit_graph *g, > struct bloom_filter *filter, > uint32_t graph_pos) > @@ -51,6 +71,10 @@ static int load_bloom_filter_from_graph(struct commit_graph *g, > else > start_index = 0; > > + if (check_bloom_offset(g, lex_pos, end_index) < 0 || > + check_bloom_offset(g, lex_pos - 1, start_index) < 0) Can lex_pos ever be zero? I can't think of any reason that it couldn't, and indeed the (elided) diff context shows that immediately preceding this if-statement is another that checks "if (lex_pos > 0)". So I think we'd want to avoid checking lex_pos - 1 if we know that lex_pos is zero. Not that any of this really matters, since the only thing we use the computed value for is showing the graph position in the warning() message. So seeing a bogus value there isn't the end of the world. But avoiding this check when lex_pos == 0 is straightforward, so is probably worth doing. (As an aside, we should be able to simplify that if statement to just "(if lex_pos)", since lex_pos will never be negative). > + return 0; > + > filter->len = end_index - start_index; > filter->data = (unsigned char *)(g->chunk_bloom_data + > sizeof(unsigned char) * start_index + > diff --git a/commit-graph.c b/commit-graph.c > index f446e76c28..f7a42be6d0 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -365,7 +365,17 @@ static int graph_read_bloom_data(const unsigned char *chunk_start, > { > struct commit_graph *g = data; > uint32_t hash_version; > + > + if (chunk_size < BLOOMDATA_CHUNK_HEADER_SIZE) { > + warning("ignoring too-small changed-path chunk" > + " (%"PRIuMAX" < %"PRIuMAX") in commit-graph file", > + (uintmax_t)chunk_size, > + (uintmax_t)BLOOMDATA_CHUNK_HEADER_SIZE); Ditto on the translation suggestion from above. Thanks, Taylor
On Wed, Oct 11, 2023 at 03:11:59PM -0400, Taylor Blau wrote: > > + if (check_bloom_offset(g, lex_pos, end_index) < 0 || > > + check_bloom_offset(g, lex_pos - 1, start_index) < 0) > > Can lex_pos ever be zero? I can't think of any reason that it couldn't, > and indeed the (elided) diff context shows that immediately preceding > this if-statement is another that checks "if (lex_pos > 0)". > > So I think we'd want to avoid checking lex_pos - 1 if we know that > lex_pos is zero. Not that any of this really matters, since the only > thing we use the computed value for is showing the graph position in the > warning() message. So seeing a bogus value there isn't the end of the > world. If lex_pos is 0, then we set start_index to 0 manually, which we know to be valid. So we can't trigger a bogus warning(). My thinking was that it was OK to just let this fall out naturally as it does here, since it makes the code shorter. But if we want to cover that case separately, I think you'd want to split the checks like: diff --git a/bloom.c b/bloom.c index 1474aa19fa..d9ad69ad82 100644 --- a/bloom.c +++ b/bloom.c @@ -65,16 +65,16 @@ static int load_bloom_filter_from_graph(struct commit_graph *g, lex_pos = graph_pos - g->num_commits_in_base; end_index = get_be32(g->chunk_bloom_indexes + 4 * lex_pos); + if (check_bloom_offset(g, lex_pos, end_index) < 0) + return 0; - if (lex_pos > 0) + if (lex_pos > 0) { start_index = get_be32(g->chunk_bloom_indexes + 4 * (lex_pos - 1)); - else + if (check_bloom_offset(g, lex_pos - 1, start_index) < 0) + return 0; + } else start_index = 0; - if (check_bloom_offset(g, lex_pos, end_index) < 0 || - check_bloom_offset(g, lex_pos - 1, start_index) < 0) - return 0; - if (end_index < start_index) { warning("ignoring decreasing changed-path index offsets" " (%"PRIuMAX" > %"PRIuMAX") for positions" -Peff
diff --git a/bloom.c b/bloom.c index aef6b5fea2..61abad7f8c 100644 --- a/bloom.c +++ b/bloom.c @@ -29,6 +29,26 @@ static inline unsigned char get_bitmask(uint32_t pos) return ((unsigned char)1) << (pos & (BITS_PER_WORD - 1)); } +static int check_bloom_offset(struct commit_graph *g, uint32_t pos, + uint32_t offset) +{ + /* + * Note that we allow offsets equal to the data size, which would set + * our pointers at one past the end of the chunk memory. This is + * necessary because the on-disk index points to the end of the + * entries (so we can compute size by comparing adjacent ones). And + * naturally the final entry's end is one-past-the-end of the chunk. + */ + if (offset <= g->chunk_bloom_data_size - BLOOMDATA_CHUNK_HEADER_SIZE) + return 0; + + warning("ignoring out-of-range offset (%"PRIuMAX") for changed-path" + " filter at pos %"PRIuMAX" of %s (chunk size: %"PRIuMAX")", + (uintmax_t)offset, (uintmax_t)pos, + g->filename, (uintmax_t)g->chunk_bloom_data_size); + return -1; +} + static int load_bloom_filter_from_graph(struct commit_graph *g, struct bloom_filter *filter, uint32_t graph_pos) @@ -51,6 +71,10 @@ static int load_bloom_filter_from_graph(struct commit_graph *g, else start_index = 0; + if (check_bloom_offset(g, lex_pos, end_index) < 0 || + check_bloom_offset(g, lex_pos - 1, start_index) < 0) + return 0; + filter->len = end_index - start_index; filter->data = (unsigned char *)(g->chunk_bloom_data + sizeof(unsigned char) * start_index + diff --git a/commit-graph.c b/commit-graph.c index f446e76c28..f7a42be6d0 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -365,7 +365,17 @@ static int graph_read_bloom_data(const unsigned char *chunk_start, { struct commit_graph *g = data; uint32_t hash_version; + + if (chunk_size < BLOOMDATA_CHUNK_HEADER_SIZE) { + warning("ignoring too-small changed-path chunk" + " (%"PRIuMAX" < %"PRIuMAX") in commit-graph file", + (uintmax_t)chunk_size, + (uintmax_t)BLOOMDATA_CHUNK_HEADER_SIZE); + return -1; + } + g->chunk_bloom_data = chunk_start; + g->chunk_bloom_data_size = chunk_size; hash_version = get_be32(chunk_start); if (hash_version != 1) diff --git a/commit-graph.h b/commit-graph.h index b373f15802..c6870274c5 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -101,6 +101,7 @@ struct commit_graph { size_t chunk_base_graphs_size; const unsigned char *chunk_bloom_indexes; const unsigned char *chunk_bloom_data; + size_t chunk_bloom_data_size; struct topo_level_slab *topo_levels; struct bloom_filter_settings *bloom_filter_settings; diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index fa9d32facf..7a727bcddd 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -5,6 +5,7 @@ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME . ./test-lib.sh +. "$TEST_DIRECTORY"/lib-chunk.sh GIT_TEST_COMMIT_GRAPH=0 GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0 @@ -404,4 +405,31 @@ test_expect_success 'Bloom generation backfills empty commits' ' ) ' +corrupt_graph () { + graph=.git/objects/info/commit-graph && + test_when_finished "rm -rf $graph" && + git commit-graph write --reachable --changed-paths && + corrupt_chunk_file $graph "$@" +} + +check_corrupt_graph () { + corrupt_graph "$@" && + git -c core.commitGraph=false log -- A/B/file2 >expect.out && + git -c core.commitGraph=true log -- A/B/file2 >out 2>err && + test_cmp expect.out out +} + +test_expect_success 'Bloom reader notices too-small data chunk' ' + check_corrupt_graph BDAT clear 00000000 && + echo "warning: ignoring too-small changed-path chunk" \ + "(4 < 12) in commit-graph file" >expect.err && + test_cmp expect.err err +' + +test_expect_success 'Bloom reader notices out-of-bounds filter offsets' ' + check_corrupt_graph BIDX 12 FFFFFFFF && + # use grep to avoid depending on exact chunk size + grep "warning: ignoring out-of-range offset (4294967295) for changed-path filter at pos 3 of .git/objects/info/commit-graph" err +' + test_done
When loading Bloom filters from a commit-graph file, we use the offset values in the BIDX chunk to index into the memory mapped for the BDAT chunk. But since we don't record how big the BDAT chunk is, we just trust that the BIDX offsets won't cause us to read outside of the chunk memory. A corrupted or malicious commit-graph file will cause us to segfault (in practice this isn't a very interesting attack, since commit-graph files are local-only, and the worst case is an out-of-bounds read). We can't fix this by checking the chunk size during parsing, since the data in the BDAT chunk doesn't have a fixed size (that's why we need the BIDX in the first place). So we'll fix it in two parts: 1. Record the BDAT chunk size during parsing, and then later check that the BIDX offsets we look up are within bounds. 2. Because the offsets are relative to the end of the BDAT header, we must also make sure that the BDAT chunk is at least as large as the expected header size. Otherwise, we overflow when trying to move past the header, even for an offset of "0". We can check this early, during the parsing stage. The error messages are rather verbose, but since this is not something you'd expect to see outside of severe bugs or corruption, it makes sense to err on the side of too many details. Sadly we can't mention the filename during the chunk-parsing stage, as we haven't set g->filename at this point, nor passed it down through the stack. Signed-off-by: Jeff King <peff@peff.net> --- bloom.c | 24 ++++++++++++++++++++++++ commit-graph.c | 10 ++++++++++ commit-graph.h | 1 + t/t4216-log-bloom.sh | 28 ++++++++++++++++++++++++++++ 4 files changed, 63 insertions(+)