mbox series

[0/20] bounds-checks for chunk-based files

Message ID 20231009205544.GA3281950@coredump.intra.peff.net (mailing list archive)
Headers show
Series bounds-checks for chunk-based files | expand

Message

Jeff King Oct. 9, 2023, 8:55 p.m. UTC
As part of my -Wunused-parameter series, I noticed that a few callbacks
used with the chunk-format API ignored the "chunk_size" parameter. I had
initially just annotated these with UNUSED under the usual "well, not
all callbacks need all parameters" logic.

But if you think about it, a chunk callback that does not look at the
chunk size _must_ be buggy, as there is no way it could ensure that it
does not read past the end of the chunk. In a well-formed file this
isn't a problem, since most chunks have expected sizes (e.g., a
commit-graph has a fixed-size commit-data record for every commit
mentioned in its index chunk). But it is very easy to get out-of-bounds
reads for files that are not well-formed.

I think the security implications here are pretty minor. The only files
which use the chunk format are multi-pack-index and commit-graph,
neither of which we'd expect to receive over the network (so you'd need
to access an untrusted tarball, etc, to even see a malicious file). And
we'd never try to write to this memory (they're read-only mmaps of the
files). So your worst case is probably an unexpected out-of-bounds read
and segfault, on a file that is hard to put in front of the victim.

But I think it's still worth fixing. The extra checks aren't very
expensive, and let us handle bugs or corruption more gracefully, as
well.

It's a lot of patches because there are a lot of chunk types. ;) But
each one is hopefully pretty straightforward in isolation. I tried to
group similar chunks together (e.g., commit-graph and midx both have
OIDF and OIDL chunks), but otherwise just fixed the midx chunks in the
order they appear in the code, followed by the same for commit-graph.

  [01/20]: chunk-format: note that pair_chunk() is unsafe
  [02/20]: t: add library for munging chunk-format files
  [03/20]: midx: stop ignoring malformed oid fanout chunk
  [04/20]: commit-graph: check size of oid fanout chunk
  [05/20]: midx: check size of oid lookup chunk
  [06/20]: commit-graph: check consistency of fanout table
  [07/20]: midx: check size of pack names chunk
  [08/20]: midx: enforce chunk alignment on reading
  [09/20]: midx: check size of object offset chunk
  [10/20]: midx: bounds-check large offset chunk
  [11/20]: midx: check size of revindex chunk
  [12/20]: commit-graph: check size of commit data chunk
  [13/20]: commit-graph: detect out-of-bounds extra-edges pointers
  [14/20]: commit-graph: bounds-check base graphs chunk
  [15/20]: commit-graph: check size of generations chunk
  [16/20]: commit-graph: bounds-check generation overflow chunk
  [17/20]: commit-graph: check bounds when accessing BDAT chunk
  [18/20]: commit-graph: check bounds when accessing BIDX chunk
  [19/20]: commit-graph: detect out-of-order BIDX offsets
  [20/20]: chunk-format: drop pair_chunk_unsafe()

 bloom.c                            |  34 +++++++++
 chunk-format.c                     |  24 ++++--
 chunk-format.h                     |   9 ++-
 commit-graph.c                     | 119 ++++++++++++++++++++++++-----
 commit-graph.h                     |   4 +
 midx.c                             |  68 +++++++++++++----
 midx.h                             |   3 +
 pack-revindex.c                    |  13 +++-
 t/lib-chunk.sh                     |  17 +++++
 t/lib-chunk/corrupt-chunk-file.pl  |  66 ++++++++++++++++
 t/t4216-log-bloom.sh               |  50 ++++++++++++
 t/t5318-commit-graph.sh            |  76 +++++++++++++++++-
 t/t5319-multi-pack-index.sh        | 102 ++++++++++++++++++++++++-
 t/t5324-split-commit-graph.sh      |  20 ++++-
 t/t5328-commit-graph-64bit-time.sh |  10 +++
 15 files changed, 568 insertions(+), 47 deletions(-)
 create mode 100644 t/lib-chunk.sh
 create mode 100644 t/lib-chunk/corrupt-chunk-file.pl

-Peff

Comments

Taylor Blau Oct. 11, 2023, 7:19 p.m. UTC | #1
On Mon, Oct 09, 2023 at 04:55:44PM -0400, Jeff King wrote:
>  bloom.c                            |  34 +++++++++
>  chunk-format.c                     |  24 ++++--
>  chunk-format.h                     |   9 ++-
>  commit-graph.c                     | 119 ++++++++++++++++++++++++-----
>  commit-graph.h                     |   4 +
>  midx.c                             |  68 +++++++++++++----
>  midx.h                             |   3 +
>  pack-revindex.c                    |  13 +++-
>  t/lib-chunk.sh                     |  17 +++++
>  t/lib-chunk/corrupt-chunk-file.pl  |  66 ++++++++++++++++
>  t/t4216-log-bloom.sh               |  50 ++++++++++++
>  t/t5318-commit-graph.sh            |  76 +++++++++++++++++-
>  t/t5319-multi-pack-index.sh        | 102 ++++++++++++++++++++++++-
>  t/t5324-split-commit-graph.sh      |  20 ++++-
>  t/t5328-commit-graph-64bit-time.sh |  10 +++
>  15 files changed, 568 insertions(+), 47 deletions(-)
>  create mode 100644 t/lib-chunk.sh
>  create mode 100644 t/lib-chunk/corrupt-chunk-file.pl

I reviewed this carefully (well, except for the new Perl script, for
obvious[^1] reasons ;-)).

Everything mostly looks good to me, though I
had a handful of review comments throughout. Many of them are trivial
(e.g. a number of warning() and error() strings should be marked for
translation, etc.), but a couple of them I think are worth looking at.

Most notably, I think that by the end of the series, I was convinced
that having some kind of 'pair_chunk_expectsz()' or similar would be
useful and eliminate a good chunk of the boilerplate you have to write
to check the chunk size against an expected value when using
read_chunk().

Otherwise, this looks great. I appreciate the care you took in finding
and fixing these issues, as well as thoroughly documenting the process
(and the security implications, or lack thereof). Thanks for working on
this!

Thanks,
Taylor

[^1]: That I may be the world's least competent Perl programmer.
Jeff King Oct. 11, 2023, 11:31 p.m. UTC | #2
On Wed, Oct 11, 2023 at 03:19:28PM -0400, Taylor Blau wrote:

> I reviewed this carefully (well, except for the new Perl script, for
> obvious[^1] reasons ;-)).
> 
> Everything mostly looks good to me, though I
> had a handful of review comments throughout. Many of them are trivial
> (e.g. a number of warning() and error() strings should be marked for
> translation, etc.), but a couple of them I think are worth looking at.

Thanks for taking a look. I think it may make sense to come back on top
and adjust a few of the commit messages, along with adding a few
st_mult() overflow checks that you suggest.

> Most notably, I think that by the end of the series, I was convinced
> that having some kind of 'pair_chunk_expectsz()' or similar would be
> useful and eliminate a good chunk of the boilerplate you have to write
> to check the chunk size against an expected value when using
> read_chunk().

This I'm less convinced by. In fact, I _almost_ just dropped
pair_chunk() entirely. Adding an out-parameter for the size at least
forces the caller to consider what to do with the size. But really, I
think the right mindset is "we should be sanity-checking this chunk as
we load it". And having a callback, even if it is a little bit of
boilerplate, helps set that frame of mind.

I dunno. Maybe that is all just programmer pseudo-psychology. But I also
don't like that about half the calls to pair_chunk() can't do a size
check (so we need two functions, or to make the "expect" parameter
optional).

-Peff