diff mbox series

[6/6] pack-bitmap.c: factor out `bitmap_index_seek_commit()`

Message ID 82049ed09e1695db644d1d4cf17557214e54dcea.1679342296.git.me@ttaylorr.com (mailing list archive)
State Accepted
Commit 3acd24d42afc1f51ec55a33abe6168ccd6c92efb
Headers show
Series pack-bitmap: miscellaneous mmap read hardening | expand

Commit Message

Taylor Blau March 20, 2023, 8:02 p.m. UTC
Factor out a common pattern within `lazy_bitmap_for_commit()` where we
seek to a given position (expecting to read the start of an individual
bitmap entry).

Both spots within `lazy_bitmap_for_commit()` emit a common error, so
factor out the whole routine into its own function to DRY things up a
little.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

Comments

Jeff King March 21, 2023, 6:13 p.m. UTC | #1
On Mon, Mar 20, 2023 at 04:02:55PM -0400, Taylor Blau wrote:

> Factor out a common pattern within `lazy_bitmap_for_commit()` where we
> seek to a given position (expecting to read the start of an individual
> bitmap entry).
> 
> Both spots within `lazy_bitmap_for_commit()` emit a common error, so
> factor out the whole routine into its own function to DRY things up a
> little.

OK, so this case makes more sense to me as a bounds-check, because we
are seeking to an arbitrary offset.

But...

> +static int bitmap_index_seek_commit(struct bitmap_index *bitmap_git,
> +				     struct object_id *oid,
> +				     size_t pos)
> +{
> +	const int bitmap_header_size = 6;
> +
> +	bitmap_index_seek(bitmap_git, pos, SEEK_SET);
> +
> +	if (bitmap_git->map_size - bitmap_git->map_pos < bitmap_header_size)
> +		return error(_("corrupt ewah bitmap: truncated header for "
> +			       "bitmap of commit \"%s\""),
> +			oid_to_hex(oid));
> +	return 0;
> +}

So here we seek to the position, and then make sure we have 6 bytes to
read, which makes sense. But in the seek step, are we worried that we
will seek to somewhere bogus? If so, we'll get a BUG(). But is that the
right thing if somebody feeds a bogus "pos" that moves past truncation?

I'm not 100% sure on where these offsets come from. But it looks like
they're coming from the bitmap lookup table. In which case a bogus value
there should be an error(), and not a BUG(), I would think.

-Peff
Taylor Blau March 21, 2023, 6:16 p.m. UTC | #2
On Tue, Mar 21, 2023 at 02:13:15PM -0400, Jeff King wrote:
> I'm not 100% sure on where these offsets come from. But it looks like
> they're coming from the bitmap lookup table. In which case a bogus value
> there should be an error(), and not a BUG(), I would think.

They do come from the lookup table, yes. I'm not sure that I agree that
bogus values here should be an error() or a BUG(), or if I even have a
strong preference between one and the other.

But I do think that trying to make it an error() makes it awkward for
all of the other callers that want it to be a BUG(), since the detail of
whether to call one or the other is private to bitmap_index_seek().

We *could* open-code it, introduce a variant of bitmap_index_seek(),
make it take an additional parameter specifying whether to call one over
the other, *or* check the bounds ourselves before even calling
bitmap_index_seek().

But none of those seem like great options to me, TBH. I would be just as
happy to leave this as a BUG(), to be honest.

Thanks,
Taylor
Jeff King March 21, 2023, 6:27 p.m. UTC | #3
On Tue, Mar 21, 2023 at 02:16:40PM -0400, Taylor Blau wrote:

> On Tue, Mar 21, 2023 at 02:13:15PM -0400, Jeff King wrote:
> > I'm not 100% sure on where these offsets come from. But it looks like
> > they're coming from the bitmap lookup table. In which case a bogus value
> > there should be an error(), and not a BUG(), I would think.
> 
> They do come from the lookup table, yes. I'm not sure that I agree that
> bogus values here should be an error() or a BUG(), or if I even have a
> strong preference between one and the other.

The usual philosophy we've applied is: a BUG() should not be
trigger-able, even if Git is fed bad data. A BUG() should indicate an
error in the program logic, and if we see one, there should be a code
fix that handles the case.

Whereas if I understand this correctly, if I corrupt the bitmap file on
disk, we'd trigger this BUG().

In many cases I think one could argue that it's kind of academic. But in
this case we should be able to say "oops, the bitmap file seems corrupt"
and skip using it, rather than bailing completely from the process.

> But I do think that trying to make it an error() makes it awkward for
> all of the other callers that want it to be a BUG(), since the detail of
> whether to call one or the other is private to bitmap_index_seek().
>
> We *could* open-code it, introduce a variant of bitmap_index_seek(),
> make it take an additional parameter specifying whether to call one over
> the other, *or* check the bounds ourselves before even calling
> bitmap_index_seek().

I'm mostly unconvinced of the value of bitmap_index_seek() doing
checking at all, because it is too late in most of the cases. In fact it
is only in this case that it is doing something useful, which makes me
think that the check should be open-coded here.

-Peff
Derrick Stolee March 24, 2023, 6:09 p.m. UTC | #4
On 3/21/2023 2:27 PM, Jeff King wrote:
> On Tue, Mar 21, 2023 at 02:16:40PM -0400, Taylor Blau wrote:
> 
>> On Tue, Mar 21, 2023 at 02:13:15PM -0400, Jeff King wrote:
>>> I'm not 100% sure on where these offsets come from. But it looks like
>>> they're coming from the bitmap lookup table. In which case a bogus value
>>> there should be an error(), and not a BUG(), I would think.
>>
>> They do come from the lookup table, yes. I'm not sure that I agree that
>> bogus values here should be an error() or a BUG(), or if I even have a
>> strong preference between one and the other.
> 
> The usual philosophy we've applied is: a BUG() should not be
> trigger-able, even if Git is fed bad data. A BUG() should indicate an
> error in the program logic, and if we see one, there should be a code
> fix that handles the case.
> 
> Whereas if I understand this correctly, if I corrupt the bitmap file on
> disk, we'd trigger this BUG().
> 
> In many cases I think one could argue that it's kind of academic. But in
> this case we should be able to say "oops, the bitmap file seems corrupt"
> and skip using it, rather than bailing completely from the process.

It's not just academic. BUG() statements kill the process without running
important cleanup steps like deleting open .lock files or outputting the
final traces. This can be especially problematic when we count on those
operations in order to recover a repository from such errors.
 
>> But I do think that trying to make it an error() makes it awkward for
>> all of the other callers that want it to be a BUG(), since the detail of
>> whether to call one or the other is private to bitmap_index_seek().
>>
>> We *could* open-code it, introduce a variant of bitmap_index_seek(),
>> make it take an additional parameter specifying whether to call one over
>> the other, *or* check the bounds ourselves before even calling
>> bitmap_index_seek().
> 
> I'm mostly unconvinced of the value of bitmap_index_seek() doing
> checking at all, because it is too late in most of the cases. In fact it
> is only in this case that it is doing something useful, which makes me
> think that the check should be open-coded here.

If we universally check whether bitmap_index_seek() works, then there
is value. It avoids the existing ad-hoc checks in favor of always-on
checks (as well as avoiding potential disconnects between the check
and the seeked position in the future).

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 38a3c6a3f9..9859f61a5a 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -156,6 +156,21 @@  static size_t bitmap_index_seek(struct bitmap_index *bitmap_git, size_t offset,
 	return bitmap_git->map_pos;
 }
 
+static int bitmap_index_seek_commit(struct bitmap_index *bitmap_git,
+				     struct object_id *oid,
+				     size_t pos)
+{
+	const int bitmap_header_size = 6;
+
+	bitmap_index_seek(bitmap_git, pos, SEEK_SET);
+
+	if (bitmap_git->map_size - bitmap_git->map_pos < bitmap_header_size)
+		return error(_("corrupt ewah bitmap: truncated header for "
+			       "bitmap of commit \"%s\""),
+			oid_to_hex(oid));
+	return 0;
+}
+
 /*
  * Read a bitmap from the current read position on the mmaped
  * index, and increase the read position accordingly
@@ -737,7 +752,6 @@  static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 	struct object_id *oid = &commit->object.oid;
 	struct ewah_bitmap *bitmap;
 	struct stored_bitmap *xor_bitmap = NULL;
-	const int bitmap_header_size = 6;
 	static struct bitmap_lookup_table_xor_item *xor_items = NULL;
 	static size_t xor_items_nr = 0, xor_items_alloc = 0;
 	static int is_corrupt = 0;
@@ -796,13 +810,10 @@  static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 
 	while (xor_items_nr) {
 		xor_item = &xor_items[xor_items_nr - 1];
-		bitmap_index_seek(bitmap_git, xor_item->offset, SEEK_SET);
 
-		if (bitmap_git->map_size - bitmap_git->map_pos < bitmap_header_size) {
-			error(_("corrupt ewah bitmap: truncated header for bitmap of commit \"%s\""),
-				oid_to_hex(&xor_item->oid));
+		if (bitmap_index_seek_commit(bitmap_git, &xor_item->oid,
+					     xor_item->offset) < 0)
 			goto corrupt;
-		}
 
 		bitmap_index_seek(bitmap_git,
 				  sizeof(uint32_t) + sizeof(uint8_t), SEEK_CUR);
@@ -816,12 +827,8 @@  static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 		xor_items_nr--;
 	}
 
-	bitmap_index_seek(bitmap_git, offset, SEEK_SET);
-	if (bitmap_git->map_size - bitmap_git->map_pos < bitmap_header_size) {
-		error(_("corrupt ewah bitmap: truncated header for bitmap of commit \"%s\""),
-			oid_to_hex(oid));
+	if (bitmap_index_seek_commit(bitmap_git, oid, offset) < 0)
 		goto corrupt;
-	}
 
 	/*
 	 * Don't bother reading the commit's index position or its xor