Message ID | 20231009205544.GA3281950@coredump.intra.peff.net (mailing list archive) |
---|---|
Headers | show |
Series | bounds-checks for chunk-based files | expand |
On Mon, Oct 09, 2023 at 04:55:44PM -0400, Jeff King wrote: > bloom.c | 34 +++++++++ > chunk-format.c | 24 ++++-- > chunk-format.h | 9 ++- > commit-graph.c | 119 ++++++++++++++++++++++++----- > commit-graph.h | 4 + > midx.c | 68 +++++++++++++---- > midx.h | 3 + > pack-revindex.c | 13 +++- > t/lib-chunk.sh | 17 +++++ > t/lib-chunk/corrupt-chunk-file.pl | 66 ++++++++++++++++ > t/t4216-log-bloom.sh | 50 ++++++++++++ > t/t5318-commit-graph.sh | 76 +++++++++++++++++- > t/t5319-multi-pack-index.sh | 102 ++++++++++++++++++++++++- > t/t5324-split-commit-graph.sh | 20 ++++- > t/t5328-commit-graph-64bit-time.sh | 10 +++ > 15 files changed, 568 insertions(+), 47 deletions(-) > create mode 100644 t/lib-chunk.sh > create mode 100644 t/lib-chunk/corrupt-chunk-file.pl I reviewed this carefully (well, except for the new Perl script, for obvious[^1] reasons ;-)). Everything mostly looks good to me, though I had a handful of review comments throughout. Many of them are trivial (e.g. a number of warning() and error() strings should be marked for translation, etc.), but a couple of them I think are worth looking at. Most notably, I think that by the end of the series, I was convinced that having some kind of 'pair_chunk_expectsz()' or similar would be useful and eliminate a good chunk of the boilerplate you have to write to check the chunk size against an expected value when using read_chunk(). Otherwise, this looks great. I appreciate the care you took in finding and fixing these issues, as well as thoroughly documenting the process (and the security implications, or lack thereof). Thanks for working on this! Thanks, Taylor [^1]: That I may be the world's least competent Perl programmer.
On Wed, Oct 11, 2023 at 03:19:28PM -0400, Taylor Blau wrote: > I reviewed this carefully (well, except for the new Perl script, for > obvious[^1] reasons ;-)). > > Everything mostly looks good to me, though I > had a handful of review comments throughout. Many of them are trivial > (e.g. a number of warning() and error() strings should be marked for > translation, etc.), but a couple of them I think are worth looking at. Thanks for taking a look. I think it may make sense to come back on top and adjust a few of the commit messages, along with adding a few st_mult() overflow checks that you suggest. > Most notably, I think that by the end of the series, I was convinced > that having some kind of 'pair_chunk_expectsz()' or similar would be > useful and eliminate a good chunk of the boilerplate you have to write > to check the chunk size against an expected value when using > read_chunk(). This I'm less convinced by. In fact, I _almost_ just dropped pair_chunk() entirely. Adding an out-parameter for the size at least forces the caller to consider what to do with the size. But really, I think the right mindset is "we should be sanity-checking this chunk as we load it". And having a callback, even if it is a little bit of boilerplate, helps set that frame of mind. I dunno. Maybe that is all just programmer pseudo-psychology. But I also don't like that about half the calls to pair_chunk() can't do a size check (so we need two functions, or to make the "expect" parameter optional). -Peff