Message ID | 20231009210514.GG3282181@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | 72a9a08283f1b56598d4af5efb8cd178d4150323 |
Headers | show |
Series | bounds-checks for chunk-based files | expand |
On Mon, Oct 09, 2023 at 05:05:14PM -0400, Jeff King wrote: > @@ -176,9 +176,16 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local > > cur_pack_name = (const char *)m->chunk_pack_names; > for (i = 0; i < m->num_packs; i++) { > + const char *end; > + size_t avail = m->chunk_pack_names_len - > + (cur_pack_name - (const char *)m->chunk_pack_names); > + This patch all looks good to me, but reading this hunk gave me a little bit of pause. I was wondering what might happen if chunk_pack_names_len was zero, and subtracting some other non-zero size_t from it might cause us to wrap around. But I think that's a non-issue here, since we'd set cur_pack_name to the beginning of the chunk, and compute avail as 0 - (m->chunk_pack_names - m->chunk_pack_names), and get 0 back, causing our memchr() lookup below to fail, and for us to report this chunk is garbage. And since cur_pack_name monotonically increases, I think that there is an inductive argument to be made that this subtraction is always safe to do. But it couldn't hurt to do something like: size_t read = cur_pack_name - (const char *)m->chunk_pack_names; size_t avail = m->chunk_pack_names_len; if (read > avail) die("..."); avail -= read; to make absolutely sure that we would never underflow here. Thanks, Taylor
On Wed, Oct 11, 2023 at 10:52:13AM -0400, Taylor Blau wrote: > On Mon, Oct 09, 2023 at 05:05:14PM -0400, Jeff King wrote: > > @@ -176,9 +176,16 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local > > > > cur_pack_name = (const char *)m->chunk_pack_names; > > for (i = 0; i < m->num_packs; i++) { > > + const char *end; > > + size_t avail = m->chunk_pack_names_len - > > + (cur_pack_name - (const char *)m->chunk_pack_names); > > + > > This patch all looks good to me, but reading this hunk gave me a little > bit of pause. I was wondering what might happen if chunk_pack_names_len > was zero, and subtracting some other non-zero size_t from it might cause > us to wrap around. > > But I think that's a non-issue here, since we'd set cur_pack_name to the > beginning of the chunk, and compute avail as 0 - (m->chunk_pack_names - > m->chunk_pack_names), and get 0 back, causing our memchr() lookup below > to fail, and for us to report this chunk is garbage. Right. If it is 0, then we should have 0 avail here in the first loop. I was more worried while writing this that: cur_pack_name = end + 1; later in the loop could get us an off-by-one. But we know we are always pointing to one past an available NUL there, so at most our subtraction will equal m->chunk_pack_names_len. > And since cur_pack_name monotonically increases, I think that there is > an inductive argument to be made that this subtraction is always safe to > do. But it couldn't hurt to do something like: > > size_t read = cur_pack_name - (const char *)m->chunk_pack_names; > size_t avail = m->chunk_pack_names_len; > > if (read > avail) > die("..."); > avail -= read; > > to make absolutely sure that we would never underflow here. You end up with two die() calls, then. One for "hey, we somehow read too far", and one for "hey, I ran out of data while reading the entry". And the first one cannot be triggered. IOW, I think your die() here is a BUG(). -Peff
diff --git a/midx.c b/midx.c index 62e4c03e79..ec585cae1b 100644 --- a/midx.c +++ b/midx.c @@ -157,7 +157,7 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local MIDX_HEADER_SIZE, m->num_chunks)) goto cleanup_fail; - if (pair_chunk_unsafe(cf, MIDX_CHUNKID_PACKNAMES, &m->chunk_pack_names)) + if (pair_chunk(cf, MIDX_CHUNKID_PACKNAMES, &m->chunk_pack_names, &m->chunk_pack_names_len)) die(_("multi-pack-index required pack-name chunk missing or corrupted")); if (read_chunk(cf, MIDX_CHUNKID_OIDFANOUT, midx_read_oid_fanout, m)) die(_("multi-pack-index required OID fanout chunk missing or corrupted")); @@ -176,9 +176,16 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local cur_pack_name = (const char *)m->chunk_pack_names; for (i = 0; i < m->num_packs; i++) { + const char *end; + size_t avail = m->chunk_pack_names_len - + (cur_pack_name - (const char *)m->chunk_pack_names); + m->pack_names[i] = cur_pack_name; - cur_pack_name += strlen(cur_pack_name) + 1; + end = memchr(cur_pack_name, '\0', avail); + if (!end) + die(_("multi-pack-index pack-name chunk is too short")); + cur_pack_name = end + 1; if (i && strcmp(m->pack_names[i], m->pack_names[i - 1]) <= 0) die(_("multi-pack-index pack names out of order: '%s' before '%s'"), diff --git a/midx.h b/midx.h index 5578cd7b83..5b2a7da043 100644 --- a/midx.h +++ b/midx.h @@ -32,6 +32,7 @@ struct multi_pack_index { int local; const unsigned char *chunk_pack_names; + size_t chunk_pack_names_len; const uint32_t *chunk_oid_fanout; const unsigned char *chunk_oid_lookup; const unsigned char *chunk_object_offsets; diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 2722e495b2..0a0ccec8a4 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -1083,4 +1083,15 @@ test_expect_success 'reader notices too-small oid lookup chunk' ' test_cmp expect err ' +test_expect_success 'reader notices too-small pack names chunk' ' + # There is no NUL to terminate the name here, so the + # chunk is too short. + corrupt_chunk PNAM clear 70656666 && + test_must_fail git log 2>err && + cat >expect <<-\EOF && + fatal: multi-pack-index pack-name chunk is too short + EOF + test_cmp expect err +' + test_done
We parse the pack-name chunk as a series of NUL-terminated strings. But since we don't look at the chunk size, there's nothing to guarantee that we don't parse off the end of the chunk (or even off the end of the mapped file). We can record the length, and then as we parse make sure that we never walk past it. The new test exercises the case, though note that it does not actually segfault before this patch. It hits a NUL byte somewhere in one of the other chunks, and comes up with a garbage pack name. You could construct one that reads out-of-bounds (e.g., a PNAM chunk at the end of file), but this case is simple and sufficient to check that we detect the problem. Signed-off-by: Jeff King <peff@peff.net> --- midx.c | 11 +++++++++-- midx.h | 1 + t/t5319-multi-pack-index.sh | 11 +++++++++++ 3 files changed, 21 insertions(+), 2 deletions(-)