diff mbox series

[v2,01/24] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps

Message ID a18baeb0b42994ebcb216df5fe69459ba9a33795.1624314293.git.me@ttaylorr.com (mailing list archive)
State New, archived
Headers show
Series multi-pack reachability bitmaps | expand

Commit Message

Taylor Blau June 21, 2021, 10:24 p.m. UTC
The special `--test-bitmap` mode of `git rev-list` is used to compare
the result of an object traversal with a bitmap to check its integrity.
This mode does not, however, assert that the types of reachable objects
are stored correctly.

Harden this mode by teaching it to also check that each time an object's
bit is marked, the corresponding bit should be set in exactly one of the
type bitmaps (whose type matches the object's true type).

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

Comments

Ævar Arnfjörð Bjarmason June 24, 2021, 11:02 p.m. UTC | #1
On Mon, Jun 21 2021, Taylor Blau wrote:

> +	enum object_type bitmap_type = OBJ_NONE;
> +	int bitmaps_nr = 0;
> +
> +	if (bitmap_get(tdata->commits, pos)) {
> +		bitmap_type = OBJ_COMMIT;
> +		bitmaps_nr++;
> +	}
> +	if (bitmap_get(tdata->trees, pos)) {
> +		bitmap_type = OBJ_TREE;
> +		bitmaps_nr++;
> +	}
> +	if (bitmap_get(tdata->blobs, pos)) {
> +		bitmap_type = OBJ_BLOB;
> +		bitmaps_nr++;
> +	}
> +	if (bitmap_get(tdata->tags, pos)) {
> +		bitmap_type = OBJ_TAG;
> +		bitmaps_nr++;
> +	}

This made me wonder if this could be better with something like the
HAS_MULTI_BITS() macro, but that trick probably can't be applied here in
any shape or form :)

> +
> +	if (!bitmap_type)
> +		die("object %s not found in type bitmaps",
> +		    oid_to_hex(&obj->oid));

It feels a bit magical to use an enum and then assume to know the enum's
values, I know we do "type < 0" all over the place, but I'd think "if
(bitmap_type == OBJ_NONE)" would be better here....

> +
> +	if (bitmaps_nr > 1)
> +		die("object %s does not have a unique type",
> +		    oid_to_hex(&obj->oid));

Or just check the bitmaps_nr instead:

    if (!bitmaps_nr)
        die("found none");
    else if (bitmaps_nr > 1)
        ...;

Just bikeshedding...

> +
> +	if (bitmap_type != obj->type)
> +		die("object %s: real type %s, expected: %s",
> +		    oid_to_hex(&obj->oid),
> +		    type_name(obj->type),
> +		    type_name(bitmap_type));

To argue against myself (sort of) about that "== OBJ_NONE" above, if
we're not assuming that then it's sort of weird not to also assume that
type_name(type) won't return a NULL in the case of OBJ_NONE, which it
does (but this code guards against).
Taylor Blau July 14, 2021, 5:24 p.m. UTC | #2
On Fri, Jun 25, 2021 at 01:02:56AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Mon, Jun 21 2021, Taylor Blau wrote:
>
> > +	enum object_type bitmap_type = OBJ_NONE;
> > +	int bitmaps_nr = 0;
> > +
> > +	if (bitmap_get(tdata->commits, pos)) {
> > +		bitmap_type = OBJ_COMMIT;
> > +		bitmaps_nr++;
> > +	}
> > +	if (bitmap_get(tdata->trees, pos)) {
> > +		bitmap_type = OBJ_TREE;
> > +		bitmaps_nr++;
> > +	}
> > +	if (bitmap_get(tdata->blobs, pos)) {
> > +		bitmap_type = OBJ_BLOB;
> > +		bitmaps_nr++;
> > +	}
> > +	if (bitmap_get(tdata->tags, pos)) {
> > +		bitmap_type = OBJ_TAG;
> > +		bitmaps_nr++;
> > +	}
>
> This made me wonder if this could be better with something like the
> HAS_MULTI_BITS() macro, but that trick probably can't be applied here in
> any shape or form :)

Right; since we're looking at the same bit position in each of the
type-level bitmaps, we can't just OR them together, since all of the
bits are in the same place.

And really, the object_type enum doesn't have values that tell us the
type of an object by looking at just a single bit. So
HAS_MULTI_BITS(OBJ_BLOB) would return "true", since OBJ_BLOB is 3.

> > +
> > +	if (bitmap_type != obj->type)
> > +		die("object %s: real type %s, expected: %s",
> > +		    oid_to_hex(&obj->oid),
> > +		    type_name(obj->type),
> > +		    type_name(bitmap_type));
>
> To argue against myself (sort of) about that "== OBJ_NONE" above, if
> we're not assuming that then it's sort of weird not to also assume that
> type_name(type) won't return a NULL in the case of OBJ_NONE, which it
> does (but this code guards against).

I tend to agree. To restate what you're saying: by the time we get to
the type_name(bitmap_type) we know that bitmap_type is non-zero, so we
assume it's OK to call type_name() on it.

Of course, the object_type_strings does handle the zero argument, so
this is probably a little academic, but good to think through
nonetheless.

Thanks,
Taylor
Jeff King July 21, 2021, 9:45 a.m. UTC | #3
On Mon, Jun 21, 2021 at 06:24:59PM -0400, Taylor Blau wrote:

> The special `--test-bitmap` mode of `git rev-list` is used to compare
> the result of an object traversal with a bitmap to check its integrity.
> This mode does not, however, assert that the types of reachable objects
> are stored correctly.
> 
> Harden this mode by teaching it to also check that each time an object's
> bit is marked, the corresponding bit should be set in exactly one of the
> type bitmaps (whose type matches the object's true type).

Yep, makes sense, and the patch looks good.

> +{
> +	enum object_type bitmap_type = OBJ_NONE;
> [...]
> +
> +	if (!bitmap_type)
> +		die("object %s not found in type bitmaps",
> +		    oid_to_hex(&obj->oid));

I think the suggestion to do:

  if (bitmap_type == OBJ_NONE)

is reasonable here, as it assumes less about the enum. I do think
OBJ_BAD and OBJ_NONE were chosen with these kind of numeric comparisons
in mind, but there is no reason to rely on them in places we don't need
to.

-Peff
Taylor Blau July 21, 2021, 5:15 p.m. UTC | #4
On Wed, Jul 21, 2021 at 05:45:12AM -0400, Jeff King wrote:
> > +{
> > +	enum object_type bitmap_type = OBJ_NONE;
> > [...]
> > +
> > +	if (!bitmap_type)
> > +		die("object %s not found in type bitmaps",
> > +		    oid_to_hex(&obj->oid));
>
> I think the suggestion to do:
>
>   if (bitmap_type == OBJ_NONE)
>
> is reasonable here, as it assumes less about the enum. I do think
> OBJ_BAD and OBJ_NONE were chosen with these kind of numeric comparisons
> in mind, but there is no reason to rely on them in places we don't need
> to.

I had to double check your suggestion, because my first question was
"what if bitmap_type is OBJ_BAD?" We can call type_name() on OBJ_BAD, but
it will return NULL, and we use the return value in a format string
unconditionally.

So that would be a problem, but it's impossible for this to ever be
OBJ_BAD, because we only set it based on the type bitmaps; so it's
either a commit/tree/blob/tag, or none (but not bad).

I took your suggestion, thanks.

> -Peff

Thanks,
Taylor
diff mbox series

Patch

diff --git a/pack-bitmap.c b/pack-bitmap.c
index d90e1d9d8c..368fa59a42 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1301,10 +1301,52 @@  void count_bitmap_commit_list(struct bitmap_index *bitmap_git,
 struct bitmap_test_data {
 	struct bitmap_index *bitmap_git;
 	struct bitmap *base;
+	struct bitmap *commits;
+	struct bitmap *trees;
+	struct bitmap *blobs;
+	struct bitmap *tags;
 	struct progress *prg;
 	size_t seen;
 };
 
+static void test_bitmap_type(struct bitmap_test_data *tdata,
+			     struct object *obj, int pos)
+{
+	enum object_type bitmap_type = OBJ_NONE;
+	int bitmaps_nr = 0;
+
+	if (bitmap_get(tdata->commits, pos)) {
+		bitmap_type = OBJ_COMMIT;
+		bitmaps_nr++;
+	}
+	if (bitmap_get(tdata->trees, pos)) {
+		bitmap_type = OBJ_TREE;
+		bitmaps_nr++;
+	}
+	if (bitmap_get(tdata->blobs, pos)) {
+		bitmap_type = OBJ_BLOB;
+		bitmaps_nr++;
+	}
+	if (bitmap_get(tdata->tags, pos)) {
+		bitmap_type = OBJ_TAG;
+		bitmaps_nr++;
+	}
+
+	if (!bitmap_type)
+		die("object %s not found in type bitmaps",
+		    oid_to_hex(&obj->oid));
+
+	if (bitmaps_nr > 1)
+		die("object %s does not have a unique type",
+		    oid_to_hex(&obj->oid));
+
+	if (bitmap_type != obj->type)
+		die("object %s: real type %s, expected: %s",
+		    oid_to_hex(&obj->oid),
+		    type_name(obj->type),
+		    type_name(bitmap_type));
+}
+
 static void test_show_object(struct object *object, const char *name,
 			     void *data)
 {
@@ -1314,6 +1356,7 @@  static void test_show_object(struct object *object, const char *name,
 	bitmap_pos = bitmap_position(tdata->bitmap_git, &object->oid);
 	if (bitmap_pos < 0)
 		die("Object not in bitmap: %s\n", oid_to_hex(&object->oid));
+	test_bitmap_type(tdata, object, bitmap_pos);
 
 	bitmap_set(tdata->base, bitmap_pos);
 	display_progress(tdata->prg, ++tdata->seen);
@@ -1328,6 +1371,7 @@  static void test_show_commit(struct commit *commit, void *data)
 				     &commit->object.oid);
 	if (bitmap_pos < 0)
 		die("Object not in bitmap: %s\n", oid_to_hex(&commit->object.oid));
+	test_bitmap_type(tdata, &commit->object, bitmap_pos);
 
 	bitmap_set(tdata->base, bitmap_pos);
 	display_progress(tdata->prg, ++tdata->seen);
@@ -1375,6 +1419,10 @@  void test_bitmap_walk(struct rev_info *revs)
 
 	tdata.bitmap_git = bitmap_git;
 	tdata.base = bitmap_new();
+	tdata.commits = ewah_to_bitmap(bitmap_git->commits);
+	tdata.trees = ewah_to_bitmap(bitmap_git->trees);
+	tdata.blobs = ewah_to_bitmap(bitmap_git->blobs);
+	tdata.tags = ewah_to_bitmap(bitmap_git->tags);
 	tdata.prg = start_progress("Verifying bitmap entries", result_popcnt);
 	tdata.seen = 0;