t5304: add a test for pruning with bitmaps
diff mbox series

Message ID 20190418200827.GB15249@sigill.intra.peff.net
State New
Headers show
Series
  • t5304: add a test for pruning with bitmaps
Related show

Commit Message

Jeff King April 18, 2019, 8:08 p.m. UTC
On Thu, Apr 18, 2019 at 03:49:53PM -0400, Jeff King wrote:

> > This block after "if (bitmap_git)" is not exercised by the (non-performance)
> > test suite, so the rest of the code above is not tested, either. Could a test
> > of this "prune" capability be added to the regression tests around the bitmaps?
> 
> I have somewhat mixed feelings here.  We can add a test with a trivial
> bitmap to get code coverage here. But as with many optimizations
> (bitmaps, but also your commit graph work), we get a much more thorough
> correctness test by re-running the whole test suite (though we do not
> yet have one for running with bitmaps on), and a better test that the
> optimization is kicking in and working via the t/perf tests.
> 
> I dunno. I guess it does not hurt to at least to at least make sure this
> code is running in the normal suite. I don't think that will find the
> more interesting regressions, but at least saves us the from the most
> egregious ones ("oops, the whole thing segfaults because somebody
> changed the API" kinds of things).

So here's a test. This goes on top of jk/prune-optim (which is also
already in master).

-- >8 --
Subject: [PATCH] t5304: add a test for pruning with bitmaps

Commit fde67d6896 (prune: use bitmaps for reachability traversal,
2019-02-13) uses bitmaps for pruning when they're available, but only
covers this functionality in the t/perf tests. This makes a kind of
sense, since the point is that the behaviour is indistinguishable before
and after the patch, just faster.

But since the bitmap code path is not exercised at all in the regular
test suite, it leaves us open to a regression where the behavior does in
fact change. The most thorough way to test that would be running the
whole suite with bitmaps enabled. But we don't yet have a way to do
that, and anyway it's expensive to do so. Let's at least add a basic
test that exercises this path and make sure we prune an object we should
(and not one that we shouldn't).

That would hopefully catch the most obvious breakages early.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5304-prune.sh | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Derrick Stolee April 20, 2019, 1:01 a.m. UTC | #1
On 4/18/2019 4:08 PM, Jeff King wrote:
> On Thu, Apr 18, 2019 at 03:49:53PM -0400, Jeff King wrote:
>> I dunno. I guess it does not hurt to at least to at least make sure this
>> code is running in the normal suite. I don't think that will find the
>> more interesting regressions, but at least saves us the from the most
>> egregious ones ("oops, the whole thing segfaults because somebody
>> changed the API" kinds of things).

That's the level of coverage I was hoping to see. I won't be picky if
the OBJ_TAG case isn't hit or something.

> So here's a test. This goes on top of jk/prune-optim (which is also
> already in master).

[snip]

> +test_expect_success 'trivial prune with bitmaps enabled' '
> +	git repack -adb &&
> +	blob=$(echo bitmap-unreachable-blob | git hash-object -w --stdin) &&
> +	git prune --expire=now &&
> +	git cat-file -e HEAD &&
> +	test_must_fail git cat-file -e $blob
> +'
> +
>  test_done

This test does cover the 'mark_object_seen()' method, which I tested by
removing the "!" in the "if (!obj) die();" condition.

However, my first inclination was to remove the line

	obj->flags |= SEEN;

from the method, and I found that it still worked! This was confusing,
so I looked for places where the SEEN flag was added during bitmap walks,
and it turns out that the objects are marked as SEEN by prepare_bitmap_walk().

This means that we can remove a lot of the implementation from reachable.c
and have the same effect (and drop an iteration through the objects). See the
diff below.

Thoughts?
-Stolee

-->8--

diff --git a/reachable.c b/reachable.c
index 0d00a91de4..7d2762d47f 100644
--- a/reachable.c
+++ b/reachable.c
@@ -159,39 +159,6 @@ int add_unseen_recent_objects_to_traversal(struct rev_info *revs,
                                      FOR_EACH_OBJECT_LOCAL_ONLY);
 }

-static void *lookup_object_by_type(struct repository *r,
-                                  const struct object_id *oid,
-                                  enum object_type type)
-{
-       switch (type) {
-       case OBJ_COMMIT:
-               return lookup_commit(r, oid);
-       case OBJ_TREE:
-               return lookup_tree(r, oid);
-       case OBJ_TAG:
-               return lookup_tag(r, oid);
-       case OBJ_BLOB:
-               return lookup_blob(r, oid);
-       default:
-               die("BUG: unknown object type %d", type);
-       }
-}
-
-static int mark_object_seen(const struct object_id *oid,
-                            enum object_type type,
-                            int exclude,
-                            uint32_t name_hash,
-                            struct packed_git *found_pack,
-                            off_t found_offset)
-{
-       struct object *obj = lookup_object_by_type(the_repository, oid, type);
-       if (!obj)
-               die("unable to create object '%s'", oid_to_hex(oid));
-
-       obj->flags |= SEEN;
-       return 0;
-}
-
 void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
                            timestamp_t mark_recent, struct progress *progress)
 {
@@ -225,7 +192,10 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,

        bitmap_git = prepare_bitmap_walk(revs);
        if (bitmap_git) {
-               traverse_bitmap_commit_list(bitmap_git, mark_object_seen);
+               /*
+                * reachable objects are marked as SEEN
+                * by prepare_bitmap_walk(revs).
+                */
                free_bitmap_index(bitmap_git);
                return;
        }
Jeff King April 20, 2019, 3:24 a.m. UTC | #2
On Fri, Apr 19, 2019 at 09:01:50PM -0400, Derrick Stolee wrote:

> > +test_expect_success 'trivial prune with bitmaps enabled' '
> > +	git repack -adb &&
> > +	blob=$(echo bitmap-unreachable-blob | git hash-object -w --stdin) &&
> > +	git prune --expire=now &&
> > +	git cat-file -e HEAD &&
> > +	test_must_fail git cat-file -e $blob
> > +'
> > +
> >  test_done
> 
> This test does cover the 'mark_object_seen()' method, which I tested by
> removing the "!" in the "if (!obj) die();" condition.
> 
> However, my first inclination was to remove the line
> 
> 	obj->flags |= SEEN;
> 
> from the method, and I found that it still worked! This was confusing,
> so I looked for places where the SEEN flag was added during bitmap walks,
> and it turns out that the objects are marked as SEEN by prepare_bitmap_walk().

I think this is only _some_ objects. The full bitmap is created by
reading the on-disk bitmaps, and then filling in any necessary gaps by
doing a traditional traversal. We also set it on tip objects to de-dup
the initial revs->pending list.

Try running t5304 with this:

diff --git a/reachable.c b/reachable.c
index eba6f64e01..7ec73ef43f 100644
--- a/reachable.c
+++ b/reachable.c
@@ -191,6 +191,8 @@ static int mark_object_seen(const struct object_id *oid,
 	if (!obj)
 		die("unable to create object '%s'", oid_to_hex(oid));
 
+	if (!(obj->flags & SEEN))
+		die("seen flag not already there");
 	obj->flags |= SEEN;
 	return 0;
 }

which shows that we are indeed freshly setting SEEN on some objects.

Interestingly, I don't _think_ you can cause an object to get pruned
accidentally here, because:

  1. Any object that will miss its SEEN has to be in the bitmapped pack,
     and not found through normal traversal.

  2. git-prune only removes loose objects, not packed ones.

  3. Loose objects cannot be in the bitmapped pack. Therefore, no object
     can hit both cases (1) and (2).

Even if that were the end of the story, I don't think I'd want to rely
on it here. The post-condition of the function should be that SEEN is
set on all reachable objects, whether bitmaps are used or not. And we do
indeed use this elsewhere:

We'll later call prune_shallow(), which uses SEEN to discard unreachable
entries. I'm not sure it even makes sense to have a shallow repo with
bitmaps, though.  Obviously we're lacking the full graph, but I wouldn't
be surprised if the shallow code quietly pretends that all is well and
we generate bogus bitmaps or something. Or maybe it even mostly works as
long as you don't walk over the shallow cut-points.

mark_reachable_objects() is also used for reflog expiration with
--stale-fix. I tried generating a test that would fail with your patch,
but I think it's not quite possible, because --stale-fix will do a
follow-up walk of the graph to see which objects mentioned in the reflog
we have and do not have. So it doesn't actually break things, it just
makes them slower (because we erroneously fail to mark SEEN things that
it's then forced to walk).

So I don't _think_ your patch actually breaks anything user-visible, but
it's a bit like leaving a live grenade in your living room for somebody
to find.

And I think we are indeed covering lookup_object_by_type(), etc in the
t5304 test I added. So AFAICT all of the new code is covered after that?

-Peff
Derrick Stolee April 20, 2019, 9:01 p.m. UTC | #3
On 4/19/2019 11:24 PM, Jeff King wrote:
> Try running t5304 with this:
> 
> diff --git a/reachable.c b/reachable.c
> index eba6f64e01..7ec73ef43f 100644
> --- a/reachable.c
> +++ b/reachable.c
> @@ -191,6 +191,8 @@ static int mark_object_seen(const struct object_id *oid,
>  	if (!obj)
>  		die("unable to create object '%s'", oid_to_hex(oid));
>  
> +	if (!(obj->flags & SEEN))
> +		die("seen flag not already there");
>  	obj->flags |= SEEN;
>  	return 0;
>  }
> 
> which shows that we are indeed freshly setting SEEN on some objects.

Good point! Thanks for clearing that up for me.
 
> Interestingly, I don't _think_ you can cause an object to get pruned
> accidentally here, because:

[snip]

I appreciate the additional context that you gave (and I snipped). This
makes me comfortable accepting this test as an exercise of the code (to
guard against future changes that create failures like null-refs) and we
will need to rely on the performance suite to catch issues like removing
the SEEN markers that I had tested.

Thanks,
-Stolee

Patch
diff mbox series

diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index 1eeb828a15..df60f18fb8 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -341,4 +341,12 @@  test_expect_success 'prune: handle expire option correctly' '
 	git prune --no-expire
 '
 
+test_expect_success 'trivial prune with bitmaps enabled' '
+	git repack -adb &&
+	blob=$(echo bitmap-unreachable-blob | git hash-object -w --stdin) &&
+	git prune --expire=now &&
+	git cat-file -e HEAD &&
+	test_must_fail git cat-file -e $blob
+'
+
 test_done