diff mbox series

[07/20] midx: check size of pack names chunk

Message ID 20231009210514.GG3282181@coredump.intra.peff.net (mailing list archive)
State Accepted
Commit 72a9a08283f1b56598d4af5efb8cd178d4150323
Headers show
Series bounds-checks for chunk-based files | expand

Commit Message

Jeff King Oct. 9, 2023, 9:05 p.m. UTC
We parse the pack-name chunk as a series of NUL-terminated strings. But
since we don't look at the chunk size, there's nothing to guarantee that
we don't parse off the end of the chunk (or even off the end of the
mapped file).

We can record the length, and then as we parse make sure that we never
walk past it.

The new test exercises the case, though note that it does not actually
segfault before this patch. It hits a NUL byte somewhere in one of the
other chunks, and comes up with a garbage pack name. You could construct
one that reads out-of-bounds (e.g., a PNAM chunk at the end of file),
but this case is simple and sufficient to check that we detect the
problem.

Signed-off-by: Jeff King <peff@peff.net>
---
 midx.c                      | 11 +++++++++--
 midx.h                      |  1 +
 t/t5319-multi-pack-index.sh | 11 +++++++++++
 3 files changed, 21 insertions(+), 2 deletions(-)

Comments

Taylor Blau Oct. 11, 2023, 2:52 p.m. UTC | #1
On Mon, Oct 09, 2023 at 05:05:14PM -0400, Jeff King wrote:
> @@ -176,9 +176,16 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
>
>  	cur_pack_name = (const char *)m->chunk_pack_names;
>  	for (i = 0; i < m->num_packs; i++) {
> +		const char *end;
> +		size_t avail = m->chunk_pack_names_len -
> +				(cur_pack_name - (const char *)m->chunk_pack_names);
> +

This patch all looks good to me, but reading this hunk gave me a little
bit of pause. I was wondering what might happen if chunk_pack_names_len
was zero, and subtracting some other non-zero size_t from it might cause
us to wrap around.

But I think that's a non-issue here, since we'd set cur_pack_name to the
beginning of the chunk, and compute avail as 0 - (m->chunk_pack_names -
m->chunk_pack_names), and get 0 back, causing our memchr() lookup below
to fail, and for us to report this chunk is garbage.

And since cur_pack_name monotonically increases, I think that there is
an inductive argument to be made that this subtraction is always safe to
do. But it couldn't hurt to do something like:

    size_t read = cur_pack_name - (const char *)m->chunk_pack_names;
    size_t avail = m->chunk_pack_names_len;

    if (read > avail)
        die("...");
    avail -= read;

to make absolutely sure that we would never underflow here.

Thanks,
Taylor
Jeff King Oct. 11, 2023, 11:06 p.m. UTC | #2
On Wed, Oct 11, 2023 at 10:52:13AM -0400, Taylor Blau wrote:

> On Mon, Oct 09, 2023 at 05:05:14PM -0400, Jeff King wrote:
> > @@ -176,9 +176,16 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
> >
> >  	cur_pack_name = (const char *)m->chunk_pack_names;
> >  	for (i = 0; i < m->num_packs; i++) {
> > +		const char *end;
> > +		size_t avail = m->chunk_pack_names_len -
> > +				(cur_pack_name - (const char *)m->chunk_pack_names);
> > +
> 
> This patch all looks good to me, but reading this hunk gave me a little
> bit of pause. I was wondering what might happen if chunk_pack_names_len
> was zero, and subtracting some other non-zero size_t from it might cause
> us to wrap around.
> 
> But I think that's a non-issue here, since we'd set cur_pack_name to the
> beginning of the chunk, and compute avail as 0 - (m->chunk_pack_names -
> m->chunk_pack_names), and get 0 back, causing our memchr() lookup below
> to fail, and for us to report this chunk is garbage.

Right. If it is 0, then we should have 0 avail here in the first loop.

I was more worried while writing this that:

  cur_pack_name = end + 1;

later in the loop could get us an off-by-one. But we know we are always
pointing to one past an available NUL there, so at most our subtraction
will equal m->chunk_pack_names_len.

> And since cur_pack_name monotonically increases, I think that there is
> an inductive argument to be made that this subtraction is always safe to
> do. But it couldn't hurt to do something like:
> 
>     size_t read = cur_pack_name - (const char *)m->chunk_pack_names;
>     size_t avail = m->chunk_pack_names_len;
> 
>     if (read > avail)
>         die("...");
>     avail -= read;
> 
> to make absolutely sure that we would never underflow here.

You end up with two die() calls, then. One for "hey, we somehow read too
far", and one for "hey, I ran out of data while reading the entry". And
the first one cannot be triggered.

IOW, I think your die() here is a BUG().

-Peff
diff mbox series

Patch

diff --git a/midx.c b/midx.c
index 62e4c03e79..ec585cae1b 100644
--- a/midx.c
+++ b/midx.c
@@ -157,7 +157,7 @@  struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 				   MIDX_HEADER_SIZE, m->num_chunks))
 		goto cleanup_fail;
 
-	if (pair_chunk_unsafe(cf, MIDX_CHUNKID_PACKNAMES, &m->chunk_pack_names))
+	if (pair_chunk(cf, MIDX_CHUNKID_PACKNAMES, &m->chunk_pack_names, &m->chunk_pack_names_len))
 		die(_("multi-pack-index required pack-name chunk missing or corrupted"));
 	if (read_chunk(cf, MIDX_CHUNKID_OIDFANOUT, midx_read_oid_fanout, m))
 		die(_("multi-pack-index required OID fanout chunk missing or corrupted"));
@@ -176,9 +176,16 @@  struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 
 	cur_pack_name = (const char *)m->chunk_pack_names;
 	for (i = 0; i < m->num_packs; i++) {
+		const char *end;
+		size_t avail = m->chunk_pack_names_len -
+				(cur_pack_name - (const char *)m->chunk_pack_names);
+
 		m->pack_names[i] = cur_pack_name;
 
-		cur_pack_name += strlen(cur_pack_name) + 1;
+		end = memchr(cur_pack_name, '\0', avail);
+		if (!end)
+			die(_("multi-pack-index pack-name chunk is too short"));
+		cur_pack_name = end + 1;
 
 		if (i && strcmp(m->pack_names[i], m->pack_names[i - 1]) <= 0)
 			die(_("multi-pack-index pack names out of order: '%s' before '%s'"),
diff --git a/midx.h b/midx.h
index 5578cd7b83..5b2a7da043 100644
--- a/midx.h
+++ b/midx.h
@@ -32,6 +32,7 @@  struct multi_pack_index {
 	int local;
 
 	const unsigned char *chunk_pack_names;
+	size_t chunk_pack_names_len;
 	const uint32_t *chunk_oid_fanout;
 	const unsigned char *chunk_oid_lookup;
 	const unsigned char *chunk_object_offsets;
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 2722e495b2..0a0ccec8a4 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -1083,4 +1083,15 @@  test_expect_success 'reader notices too-small oid lookup chunk' '
 	test_cmp expect err
 '
 
+test_expect_success 'reader notices too-small pack names chunk' '
+	# There is no NUL to terminate the name here, so the
+	# chunk is too short.
+	corrupt_chunk PNAM clear 70656666 &&
+	test_must_fail git log 2>err &&
+	cat >expect <<-\EOF &&
+	fatal: multi-pack-index pack-name chunk is too short
+	EOF
+	test_cmp expect err
+'
+
 test_done