diff mbox series

[3/3] pack-bitmap.c: use commit boundary during bitmap traversal

Message ID 91ed8c70f22dd2c47c60d650323579fc42cc7f2d.1682380788.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series pack-bitmap: boundary-based bitmap traversal | expand

Commit Message

Taylor Blau April 25, 2023, midnight UTC
When reachability bitmap coverage exists in a repository, Git will use a
different (and hopefully faster) traversal to compute revision walks.

Consider a set of positive and negative tips (which we'll refer to with
their standard bitmap parlance by, "wants", and "haves"). In order to
figure out what objects exist between the tips, the existing traversal
in prepare_bitmap_walk() looks something like:

  1. Consider if we can even compute the set of objects with bitmaps,
     and fall back to the usual traversal if we cannot. For example,
     pathspec limiting traversals can't be computed using bitmaps (since
     they don't know which objects are at which paths). The same is true
     of certain kinds of non-trivial object filters.

  2. If we can compute the traversal with bitmaps, partition the
     (dereferenced) tips into two object lists, "haves", and "wants",
     based on whether or not the objects have the UNINTERESTING flag,
     respectively.

  3. Fall back to the ordinary object traversal if either (a) there are
     no haves, (b) none of the haves are in the bitmapped pack or MIDX,
     or (c) there are no wants.

  4. Construct a reachability bitmap for the "haves" side by walking
     from the revision tips down to any existing bitmaps, OR-ing in any
     bitmaps as they are found.

  5. Then do the same for the "wants" side, stopping at any objects that
     appear in the "haves" bitmap.

  6. Filter the results if any object filter (that can be easily
     computed with bitmaps alone) was given, and then return back to the
     caller.

When there is good bitmap coverage relative to the traversal tips, this
walk is often significantly faster than an ordinary object traversal
because it can visit far fewer objects.

But in certain cases, it can be significantly *slower* than the usual
object traversal. Why? Because we need to compute complete bitmaps on
either side of the walk. If either one (or both) of the sides require
walking many (or all!) objects before they get to an existing bitmap,
the extra bitmap machinery is mostly or all overhead.

One of the benefits, however, is that even if the walk is slower, bitmap
traversals are guaranteed to provide an *exact* answer. Unlike the
traditional object traversal algorithm, which can over-count the results
by not opening trees for older commits, the bitmap walk builds an exact
reachability bitmap for either side, meaning the results are never
over-counted.

But producing non-exact results is OK for our traversal here (both in
the bitmap case and not), as long as the results are over-counted, not
under.

Relaxing the bitmap traversal to allow it to produce over-counted
results gives us the opportunity to make some significant improvements.
Instead of the above, the new algorithm only has to walk from the
*boundary* down to the nearest bitmap, instead of from each of the
UNINTERESTING tips.

The boundary-based approach still has degenerate cases, but we'll show
in a moment that it is often a significant improvement.

The new algorithm works as follows:

  1. Build a (partial) bitmap of the haves side by first OR-ing any
     bitmap(s) that already exist for UNINTERESTING commits between the
     haves and the boundary.

  2. For each commit along the boundary, add it as a fill-in traversal
     tip (where the traversal terminates once an existing bitmap is
     found), and perform fill-in traversal.

  3. Build up a complete bitmap of the wants side as usual, stopping any
     time we intersect the (partial) haves side.

  4. Return the results.

And is more-or-less equivalent to using the *old* algorithm with this
invocation:

    $ git rev-list --objects --boundary $WANTS --not $HAVES |
        perl -lne 'print $1 if /^-(.*)/' |
        xargs git rev-list --objects --use-bitmap-index $WANTS --not

The new result performs significantly better in many cases, particularly
when the distance from the boundary commit(s) to an existing bitmap is
shorter than the distance from (all of) the have tips to the nearest
bitmapped commit.

Note that when using the old bitmap traversal algorithm, the results can
be *slower* than without bitmaps! Under the new algorithm, the result is
computed faster with bitmaps than without (at the cost of over-counting
the true number of objects in a similar fashion as the non-bitmap
traversal):

    # (Computing objects unique to 'master' among all branches, not
    # using bitmaps).
    $ time git rev-list --count --objects master --not --exclude=master --branches
    54

    real	0m1.012s
    user	0m0.796s
    sys	0m0.214s

    # (Computing the same uniqueness query using the old bitmap
    # traversal algorithm.)
    $ time git rev-list --use-bitmap-index --count --objects master --not --exclude=master --branches
    42

    real	0m29.571s
    user	0m28.404s
    sys	0m1.164s

    # (Computing the same uniqueness query using the new bitmap
    # traversal algorithm.)
    $ time git.compile rev-list --count --objects --use-bitmap-index master --not --exclude=master --branches
    54

    real	0m2.279s
    user	0m2.023s
    sys	0m0.256s

The new algorithm is still slower than not using bitmaps at all, but it
is nearly a 13-fold improvement over the existing traversal.

In a more realistic setting (using my local copy of git.git), I can
observe a similar speed-up:

    $ ours="$(git branch --show-current)"
      argv="--count --objects $ours --not --exclude=$ours --branches"
      hyperfine \
        -n 'no bitmaps' "git.compile rev-list $argv" \
        -n 'existing bitmap traversal' "git rev-list --use-bitmap-index $argv" \
        -n 'this commit' "git.compile rev-list --use-bitmap-index $argv"
    Benchmark 1: no bitmaps
      Time (mean ± σ):      15.1 ms ±   4.1 ms    [User: 8.1 ms, System: 6.9 ms]
      Range (min … max):     7.4 ms …  21.8 ms    131 runs

    Benchmark 2: existing bitmap traversal
      Time (mean ± σ):      82.6 ms ±   9.2 ms    [User: 63.6 ms, System: 19.0 ms]
      Range (min … max):    73.8 ms … 105.4 ms    28 runs

    Benchmark 3: this commit
      Time (mean ± σ):      19.8 ms ±   3.1 ms    [User: 13.0 ms, System: 6.8 ms]
      Range (min … max):    17.7 ms …  38.6 ms    158 runs

    Summary
      'no bitmaps' ran
        1.31 ± 0.41 times faster than 'this commit'
        5.49 ± 1.62 times faster than 'existing bitmap traversal'

Here, the new algorithm is also still slower than not using bitmaps, but
represents a 4-fold improvement over the existing traversal in a more
modest example.

Since this algorithm was originally written (nearly a year and a half
ago, at the time of writing), the bitmap lookup table shipped, making
the new algorithm's result more competitive. A few other future
directions for improving bitmap traversal times beyond not using bitmaps
at all:

  - Decrease the cost to decompress and OR together many bitmaps
    together (particularly when enumerating the uninteresting side of
    the walk). Here we could explore more efficient bitmap storage
    techniques, like Roaring+Run and/or use SIMD instructions to speed
    up ORing them together.

  - Store pseudo-merge bitmaps, which could allow us to OR together
    fewer "summary" bitmaps (which would also help with the above).

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 194 ++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 158 insertions(+), 36 deletions(-)

Comments

Junio C Hamano April 25, 2023, 6:52 p.m. UTC | #1
Taylor Blau <me@ttaylorr.com> writes:

>   3. Fall back to the ordinary object traversal if either (a) there are
>      no haves, (b) none of the haves are in the bitmapped pack or MIDX,
>      or (c) there are no wants.

Tangents.  If there are no haves, wouldn't the answer by definition
everything that are reachable from all wants?  Is there a reason why
a bitmapped walk cannot fulfill such a request?  If there are no
wants, wouldn't the answer by definition an empty set?  Again, I am
not sure what the ordinary object traversal is expected to do in
such a case.

These questions are not at all important, as this part is an
explation of what happened in the old code.

> Relaxing the bitmap traversal to allow it to produce over-counted
> results gives us the opportunity to make some significant improvements.
> Instead of the above, the new algorithm only has to walk from the
> *boundary* down to the nearest bitmap, instead of from each of the
> UNINTERESTING tips.

Is it only me, or are all readers equally confused by the use of
the term "boundary" that hasn't been given any definition?

> And is more-or-less equivalent to using the *old* algorithm with this
> invocation:
>
>     $ git rev-list --objects --boundary $WANTS --not $HAVES |

It is especially confusing because the "--boundary" (at least as I
originally had invented it), i.e. a commit that is smudged with
UNINTERESTING bit, whose parent we did not bother to dig further to
paint with the same UNINTERESTING bit, is not something we start our
computation with; it is something we learn _after_ completing the
history traversing.  So I am utterly confused X-<.
Derrick Stolee April 25, 2023, 6:53 p.m. UTC | #2
On 4/24/2023 8:00 PM, Taylor Blau wrote:
> When reachability bitmap coverage exists in a repository, Git will use a
> different (and hopefully faster) traversal to compute revision walks.

Before getting into the code...

> Consider a set of positive and negative tips (which we'll refer to with
> their standard bitmap parlance by, "wants", and "haves"). In order to
> figure out what objects exist between the tips, the existing traversal
> in prepare_bitmap_walk() looks something like:

>   4. Construct a reachability bitmap for the "haves" side by walking
>      from the revision tips down to any existing bitmaps, OR-ing in any
>      bitmaps as they are found.
> 
>   5. Then do the same for the "wants" side, stopping at any objects that
>      appear in the "haves" bitmap.

One important thing about this older algorithm is that it will avoid
including trees and blobs that are reachable from the "haves" that
are not reachable from the boundary between "haves" and "wants". The
most obvious case is that someone force-pushed a branch after rewording
the commit messages but keeping the trees and blobs identical.

This case is rare, and usually the objects that would be repeated are
low enough in count and size (with deltas) that it's not a big deal, but
it is worth bringing up. I didn't see a reference to this side of the
difference in your message.

> One of the benefits, however, is that even if the walk is slower, bitmap
> traversals are guaranteed to provide an *exact* answer. Unlike the
> traditional object traversal algorithm, which can over-count the results
> by not opening trees for older commits, the bitmap walk builds an exact
> reachability bitmap for either side, meaning the results are never
> over-counted.

So the new algorithm is a new, third state of "not overcounting things
reachable beyond the commit-walk boundary" and "overcounting things
reachable from the wants..haves commit region".

> But producing non-exact results is OK for our traversal here (both in
> the bitmap case and not), as long as the results are over-counted, not
> under.

True, our response will not create a pack-file that cannot be applied
locally, but it might be larger than required.

For these reasons, I'm surprised that this patch completely replaces
the old algorithm for the new one. I would prefer to see a config
option that enables this new algorithm. It would be safer to deploy
that way, too.

> The new result performs significantly better in many cases, particularly
> when the distance from the boundary commit(s) to an existing bitmap is
> shorter than the distance from (all of) the have tips to the nearest
> bitmapped commit.

Aside: we may even gain benefits by adjusting the commits selected for
bitmaps by trying for this data shape. One way to attempt this is to
not focus on refs but on history common to "most refs" using first-
parent history as a good heuristic. If a commit appears in a lot of
first-parent histories, then it is more likely to be helpful to these
kinds of walks.
 
> Note that when using the old bitmap traversal algorithm, the results can
> be *slower* than without bitmaps! Under the new algorithm, the result is
> computed faster with bitmaps than without (at the cost of over-counting
> the true number of objects in a similar fashion as the non-bitmap
> traversal):
>
>     # (Computing objects unique to 'master' among all branches, not
>     # using bitmaps).
>     $ time git rev-list --count --objects master --not --exclude=master --branches
>     54

I like how you included the output of --count here. It might be interesting
to demonstrate the different forms of overcounting in a carefully constructed
test case, which would help us be sure that we are using the desired
algorithm during a test case.

>     real	0m1.012s  (no bitmaps)
>     real	0m29.571s (bitmaps, old)
>     real	0m2.279s  (bitmaps, new)

> The new algorithm is still slower than not using bitmaps at all, but it
> is nearly a 13-fold improvement over the existing traversal.
 
> In a more realistic setting (using my local copy of git.git), I can
> observe a similar speed-up:
...
> Here, the new algorithm is also still slower than not using bitmaps, but
> represents a 4-fold improvement over the existing traversal in a more
> modest example.

Interesting that the new bitmap algorithm is worse in all presented cases.

How "atypical" do we need to be in order for bitmaps to outperform the
tree-parsing algorithm? Do we need to try "--branches --not master~1000"
to replicate a stale single-branch clone getting every latest commit?

(I'll leave code review to a separate reply.)

Thanks,
-Stolee
Felipe Contreras May 2, 2023, 9:31 p.m. UTC | #3
Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:

> > Relaxing the bitmap traversal to allow it to produce over-counted
> > results gives us the opportunity to make some significant improvements.
> > Instead of the above, the new algorithm only has to walk from the
> > *boundary* down to the nearest bitmap, instead of from each of the
> > UNINTERESTING tips.
> 
> Is it only me, or are all readers equally confused by the use of
> the term "boundary" that hasn't been given any definition?
> 
> > And is more-or-less equivalent to using the *old* algorithm with this
> > invocation:
> >
> >     $ git rev-list --objects --boundary $WANTS --not $HAVES |
> 
> It is especially confusing because the "--boundary" (at least as I
> originally had invented it), i.e. a commit that is smudged with
> UNINTERESTING bit, whose parent we did not bother to dig further to
> paint with the same UNINTERESTING bit, is not something we start our
> computation with; it is something we learn _after_ completing the
> history traversing.  So I am utterly confused X-<.

I don't know if it's the case, but in my mind all the `--not $HAVES` are
boundaries. Some of these might be overshadowed by another boundary higher up
in the topology and thus not shown, so in a sense are "uninteresting
boundaries".

Perhaps because you invented `--boundary` you think a "boundary" is only an
interesting boundary which must be computed, and all the other are not true
boundaries.

Cheers.
Taylor Blau May 3, 2023, 9:42 p.m. UTC | #4
On Tue, Apr 25, 2023 at 11:52:25AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> >   3. Fall back to the ordinary object traversal if either (a) there are
> >      no haves, (b) none of the haves are in the bitmapped pack or MIDX,
> >      or (c) there are no wants.
>
> Tangents.  If there are no haves, wouldn't the answer by definition
> everything that are reachable from all wants?  Is there a reason why
> a bitmapped walk cannot fulfill such a request?  If there are no
> wants, wouldn't the answer by definition an empty set?  Again, I am
> not sure what the ordinary object traversal is expected to do in
> such a case.

Oops, (a) is just wrong. The old code is:

    /*
     * if we have a HAVES list, but none of those haves is contained
     * in the packfile that has a bitmap, we don't have anything to
     * optimize here
     */
    if (haves && !in_bitmapped_pack(bitmap_git, haves))
      goto cleanup;

which is itself an iffy optimization that I have toyed with the idea of
getting rid of over the years, since I am skeptical that it is doing
much even in the classic bitmap traversal.

But we only do that in_bitmapped_pack() if there are haves to begin
with, so (a) from my commit message is incorrect. Sorry about that.

> > And is more-or-less equivalent to using the *old* algorithm with this
> > invocation:
> >
> >     $ git rev-list --objects --boundary $WANTS --not $HAVES |
>
> It is especially confusing because the "--boundary" (at least as I
> originally had invented it), i.e. a commit that is smudged with
> UNINTERESTING bit, whose parent we did not bother to dig further to
> paint with the same UNINTERESTING bit, is not something we start our
> computation with; it is something we learn _after_ completing the
> history traversing.  So I am utterly confused X-<.

Hmm. This is from code that Peff and I wrote together a long time ago.
My recollection is that we only care about the UNINTERESTING commits
between the tips and boundary (as you described it), but no further, and
that we could discover that set during limit_list() alone.

But maybe not? I honestly cannot remember. Perhaps Peff does?

Thanks,
Taylor
Junio C Hamano May 3, 2023, 9:58 p.m. UTC | #5
Taylor Blau <me@ttaylorr.com> writes:

> Hmm. This is from code that Peff and I wrote together a long time ago.
> My recollection is that we only care about the UNINTERESTING commits
> between the tips and boundary (as you described it), but no further, and
> that we could discover that set during limit_list() alone.
>
> But maybe not? I honestly cannot remember. Perhaps Peff does?

Yes, that matches my understanding.

By the time you finish limit_list(), from the "revision traversal"
point of view, you have done all the hard work already, sifting
commits between the ones that are and are not interesting.  I am
surprised that whatever you learn only after reaching that point is
still helpful to optimize some operations.
diff mbox series

Patch

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 046240d072..c47c97f35b 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1048,7 +1048,7 @@  static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_git,
 	struct include_data incdata;
 	struct bitmap_show_data show_data;
 
-	if (base == NULL)
+	if (!base)
 		base = bitmap_new();
 
 	incdata.bitmap_git = bitmap_git;
@@ -1075,6 +1075,145 @@  static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_git,
 	return base;
 }
 
+static int obj_in_bitmap(struct bitmap_index *bitmap_git,
+			 struct object *obj, struct bitmap *bitmap)
+{
+	int pos;
+
+	if (!bitmap)
+		return 0;
+	pos = bitmap_position(bitmap_git, &obj->oid);
+	if (pos < 0)
+		return 0;
+	return bitmap_get(bitmap, pos);
+}
+
+static void show_boundary_commit(struct commit *commit, void *data)
+{
+	struct object_array *boundary = data;
+	if (!(commit->object.flags & BOUNDARY))
+		return;
+
+	add_object_array(&commit->object, "", boundary);
+}
+
+static void show_boundary_object(struct object *object,
+				 const char *name, void *data)
+{
+	BUG("should not be called");
+}
+
+static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
+					    struct rev_info *revs,
+					    struct object_list *roots)
+{
+	struct bitmap *base = NULL;
+	struct object_array boundary = OBJECT_ARRAY_INIT;
+	int any_missing = 0;
+	unsigned int i;
+	int tmp_blobs, tmp_trees, tmp_tags;
+
+	revs->ignore_missing_links = 1;
+	revs->collect_uninteresting = 1;
+
+	/*
+	 * OR in any existing reachability bitmaps among `roots` into `base`.
+	 */
+	while (roots) {
+		struct object *object = roots->item;
+		roots = roots->next;
+
+		if (object->type == OBJ_COMMIT &&
+		    !obj_in_bitmap(bitmap_git, object, base) &&
+		    add_commit_to_bitmap(bitmap_git, &base,
+					 (struct commit *)object)) {
+			object->flags |= SEEN;
+			continue;
+		}
+
+		any_missing = 1;
+	}
+
+	if (!any_missing)
+		goto cleanup;
+
+	tmp_blobs = revs->blob_objects;
+	tmp_trees = revs->tree_objects;
+	tmp_tags = revs->blob_objects;
+	revs->blob_objects = 0;
+	revs->tree_objects = 0;
+	revs->tag_objects = 0;
+
+	/*
+	 * We didn't have complete coverage of the roots. First OR in any
+	 * bitmaps that are UNINTERESTING between the tips and boundary.
+	 */
+	trace2_region_enter("pack-bitmap", "boundary-prepare", the_repository);
+	if (prepare_revision_walk(revs))
+		die("revision walk setup failed");
+	trace2_region_leave("pack-bitmap", "boundary-prepare", the_repository);
+
+	trace2_region_enter("pack-bitmap", "boundary-load-bitmaps", the_repository);
+	for (i = 0; i < revs->uninteresting_commits.nr; i++) {
+		struct object *obj = revs->uninteresting_commits.objects[i].item;
+		if (obj->type != OBJ_COMMIT)
+			BUG("unexpected non-commit %s marked uninteresting",
+			    oid_to_hex(&obj->oid));
+
+		if (obj_in_bitmap(bitmap_git, obj, base))
+			continue;
+
+		add_commit_to_bitmap(bitmap_git, &base, (struct commit *)obj);
+	}
+	trace2_region_leave("pack-bitmap", "boundary-load-bitmaps", the_repository);
+
+	/*
+	 * Then add the boundary commit(s) as fill-in traversal tips.
+	 */
+	trace2_region_enter("pack-bitmap", "boundary-traverse", the_repository);
+	revs->boundary = 1;
+	traverse_commit_list_filtered(revs,
+				      show_boundary_commit,
+				      show_boundary_object,
+				      &boundary, NULL);
+	revs->boundary = 0;
+	revs->blob_objects = tmp_blobs;
+	revs->tree_objects = tmp_trees;
+	revs->tag_objects = tmp_tags;
+
+	reset_revision_walk();
+	clear_object_flags(UNINTERESTING);
+	trace2_region_leave("pack-bitmap", "boundary-traverse", the_repository);
+
+	trace2_region_enter("pack-bitmap", "boundary-fill-in", the_repository);
+	if (boundary.nr) {
+		struct object *obj;
+		int needs_walk = 0;
+
+		for (i = 0; i < boundary.nr; i++) {
+			obj = boundary.objects[i].item;
+
+			if (obj_in_bitmap(bitmap_git, obj, base)) {
+				obj->flags |= SEEN;
+			} else {
+				add_pending_object(revs, obj, "");
+				needs_walk = 1;
+			}
+		}
+
+		if (needs_walk)
+			base = fill_in_bitmap(bitmap_git, revs, base, NULL);
+	}
+	trace2_region_leave("pack-bitmap", "boundary-fill-in", the_repository);
+
+
+cleanup:
+	revs->ignore_missing_links = 0;
+	revs->collect_uninteresting = 0;
+
+	return base;
+}
+
 static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 				   struct rev_info *revs,
 				   struct object_list *roots,
@@ -1140,8 +1279,21 @@  static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 		}
 	}
 
-	if (needs_walk)
+	if (needs_walk) {
+		/*
+		 * This fill-in traversal may walk over some objects
+		 * again, since we have already traversed in order to
+		 * find the boundary.
+		 *
+		 * But this extra walk should be extremely cheap, since
+		 * all commit objects are loaded into memory, and
+		 * because we skip walking to parents that are
+		 * UNINTERESTING, since it will be marked in the haves
+		 * bitmap already (or it has an on-disk bitmap, since
+		 * OR-ing it in covers all of its ancestors).
+		 */
 		base = fill_in_bitmap(bitmap_git, revs, base, seen);
+	}
 
 	return base;
 }
@@ -1257,25 +1409,6 @@  static void show_objects_for_type(
 	}
 }
 
-static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
-			     struct object_list *roots)
-{
-	while (roots) {
-		struct object *object = roots->item;
-		roots = roots->next;
-
-		if (bitmap_is_midx(bitmap_git)) {
-			if (bsearch_midx(&object->oid, bitmap_git->midx, NULL))
-				return 1;
-		} else {
-			if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0)
-				return 1;
-		}
-	}
-
-	return 0;
-}
-
 static struct bitmap *find_tip_objects(struct bitmap_index *bitmap_git,
 				       struct object_list *tip_objects,
 				       enum object_type type)
@@ -1583,14 +1716,6 @@  struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 			object_list_insert(object, &wants);
 	}
 
-	/*
-	 * if we have a HAVES list, but none of those haves is contained
-	 * in the packfile that has a bitmap, we don't have anything to
-	 * optimize here
-	 */
-	if (haves && !in_bitmapped_pack(bitmap_git, haves))
-		goto cleanup;
-
 	/* if we don't want anything, we're done here */
 	if (!wants)
 		goto cleanup;
@@ -1603,18 +1728,15 @@  struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	if (load_bitmap(bitmap_git) < 0)
 		goto cleanup;
 
-	object_array_clear(&revs->pending);
-
 	if (haves) {
-		revs->ignore_missing_links = 1;
-		haves_bitmap = find_objects(bitmap_git, revs, haves, NULL);
-		reset_revision_walk();
-		revs->ignore_missing_links = 0;
-
+		haves_bitmap = find_boundary_objects(bitmap_git, revs, haves);
 		if (!haves_bitmap)
 			BUG("failed to perform bitmap walk");
 	}
 
+	object_array_clear(&revs->pending);
+	reset_revision_walk();
+
 	wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap);
 
 	if (!wants_bitmap)