diff mbox series

[4/4] pack-bitmap: pass object filter to fill-in traversal

Message ID 65467a058e7dca6cf1e2db9cdab81513989b5db0.1587597151.git.me@ttaylorr.com (mailing list archive)
State New, archived
Headers show
Series pack-bitmap: use bitmaps for traversals with '--filter=tree:0' | expand

Commit Message

Taylor Blau April 22, 2020, 11:13 p.m. UTC
From: Jeff King <peff@peff.net>

Sometimes a bitmap traversal still has to walk some commits manually,
because those commits aren't included in the bitmap packfile (e.g., due
to a push or commit since the last full repack). If we're given an
object filter, we don't pass it down to this traversal. It's not
necessary for correctness because the bitmap code has its own filters to
post-process the bitmap result (which it must, to filter out the objects
that _are_ mentioned in the bitmapped packfile).

And with blob filters, there was no performance reason to pass along
those filters, either. The fill-in traversal could omit them from the
result, but it wouldn't save us any time to do so, since we'd still have
to walk each tree entry to see if it's a blob or not.

But now that we support tree filters, there's opportunity for savings. A
tree:depth=0 filter means we can avoid accessing trees entirely, since
we know we won't them (or any of the subtrees or blobs they point to).
The new test in p5310 shows this off (the "partial bitmap" state is one
where HEAD~100 and its ancestors are all in a bitmapped pack, but
HEAD~100..HEAD are not). Here are the results (run against linux.git):

  Test                                                  HEAD^               HEAD
  -------------------------------------------------------------------------------------------------
  [...]
  5310.16: rev-list with tree filter (partial bitmap)   0.19(0.17+0.02)     0.03(0.02+0.01) -84.2%

The absolute number of savings isn't _huge_, but keep in mind that we
only omitted 100 first-parent links (in the version of linux.git here,
that's 894 actual commits). In a more pathological case, we might have a
much larger proportion of non-bitmapped commits. I didn't bother
creating such a case in the perf script because the setup is expensive,
and this is plenty to show the savings as a percentage.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c                | 14 +++++++++-----
 t/perf/p5310-pack-bitmaps.sh |  5 +++++
 2 files changed, 14 insertions(+), 5 deletions(-)

Comments

Jeff King April 24, 2020, 5:42 a.m. UTC | #1
On Wed, Apr 22, 2020 at 05:13:35PM -0600, Taylor Blau wrote:

> From: Jeff King <peff@peff.net>
> 
> Sometimes a bitmap traversal still has to walk some commits manually,
> because those commits aren't included in the bitmap packfile (e.g., due
> to a push or commit since the last full repack). If we're given an
> object filter, we don't pass it down to this traversal. It's not
> necessary for correctness because the bitmap code has its own filters to
> post-process the bitmap result (which it must, to filter out the objects
> that _are_ mentioned in the bitmapped packfile).
> 
> And with blob filters, there was no performance reason to pass along
> those filters, either. The fill-in traversal could omit them from the
> result, but it wouldn't save us any time to do so, since we'd still have
> to walk each tree entry to see if it's a blob or not.
> 
> But now that we support tree filters, there's opportunity for savings. A
> tree:depth=0 filter means we can avoid accessing trees entirely, since
> we know we won't them (or any of the subtrees or blobs they point to).

s/won't them/won't include them/ perhaps

> diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
> index b629a211f9..95379b1d4e 100755
> --- a/t/perf/p5310-pack-bitmaps.sh
> +++ b/t/perf/p5310-pack-bitmaps.sh
> @@ -95,4 +95,9 @@ test_perf 'pack to file (partial bitmap)' '
>  	git pack-objects --use-bitmap-index --all pack2b </dev/null >/dev/null
>  '
>  
> +test_perf 'rev-list with tree filter (partial bitmap)' '
> +	git rev-list --use-bitmap-index --count --objects --all \
> +		--filter=tree:0 >/dev/null
> +'

This covers perf testing of this partial-bitmap state, but we shoudl
make sure that we are covering correctness, too. I think so, because
t6113 creates a similar state for all of its tests.

-Peff
Taylor Blau April 24, 2020, 4:54 p.m. UTC | #2
On Fri, Apr 24, 2020 at 01:42:27AM -0400, Jeff King wrote:
> On Wed, Apr 22, 2020 at 05:13:35PM -0600, Taylor Blau wrote:
>
> > From: Jeff King <peff@peff.net>
> >
> > Sometimes a bitmap traversal still has to walk some commits manually,
> > because those commits aren't included in the bitmap packfile (e.g., due
> > to a push or commit since the last full repack). If we're given an
> > object filter, we don't pass it down to this traversal. It's not
> > necessary for correctness because the bitmap code has its own filters to
> > post-process the bitmap result (which it must, to filter out the objects
> > that _are_ mentioned in the bitmapped packfile).
> >
> > And with blob filters, there was no performance reason to pass along
> > those filters, either. The fill-in traversal could omit them from the
> > result, but it wouldn't save us any time to do so, since we'd still have
> > to walk each tree entry to see if it's a blob or not.
> >
> > But now that we support tree filters, there's opportunity for savings. A
> > tree:depth=0 filter means we can avoid accessing trees entirely, since
> > we know we won't them (or any of the subtrees or blobs they point to).
>
> s/won't them/won't include them/ perhaps

Oops. Even though you wrote this patch, I clearly also should have
proofread it more carefully before sending it to the list ;).

Junio -- assuming that you are comfortable taking this series as-is (and
please let me know if you are not), would you mind fixing up this typo
as you apply it?

> > diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
> > index b629a211f9..95379b1d4e 100755
> > --- a/t/perf/p5310-pack-bitmaps.sh
> > +++ b/t/perf/p5310-pack-bitmaps.sh
> > @@ -95,4 +95,9 @@ test_perf 'pack to file (partial bitmap)' '
> >  	git pack-objects --use-bitmap-index --all pack2b </dev/null >/dev/null
> >  '
> >
> > +test_perf 'rev-list with tree filter (partial bitmap)' '
> > +	git rev-list --use-bitmap-index --count --objects --all \
> > +		--filter=tree:0 >/dev/null
> > +'
>
> This covers perf testing of this partial-bitmap state, but we shoudl
> make sure that we are covering correctness, too. I think so, because
> t6113 creates a similar state for all of its tests.

Yeah, we are covered there. The last three tests in t6613 cover
'--filter=tree:0', bot with and without specified objects (to make sure
that named objects don't get culled out by the filter), as well as the
'--filter=tree:1', to make sure that we didn't break that.

> -Peff

Thanks,
Taylor
diff mbox series

Patch

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 195ee8cad0..4077e731e8 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -506,7 +506,8 @@  static int should_include(struct commit *commit, void *_data)
 static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 				   struct rev_info *revs,
 				   struct object_list *roots,
-				   struct bitmap *seen)
+				   struct bitmap *seen,
+				   struct list_objects_filter_options *filter)
 {
 	struct bitmap *base = NULL;
 	int needs_walk = 0;
@@ -599,8 +600,9 @@  static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 		show_data.bitmap_git = bitmap_git;
 		show_data.base = base;
 
-		traverse_commit_list(revs, show_commit, show_object,
-				     &show_data);
+		traverse_commit_list_filtered(filter, revs,
+					      show_commit, show_object,
+					      &show_data, NULL);
 	}
 
 	return base;
@@ -999,7 +1001,8 @@  struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 
 	if (haves) {
 		revs->ignore_missing_links = 1;
-		haves_bitmap = find_objects(bitmap_git, revs, haves, NULL);
+		haves_bitmap = find_objects(bitmap_git, revs, haves, NULL,
+					    filter);
 		reset_revision_walk();
 		revs->ignore_missing_links = 0;
 
@@ -1007,7 +1010,8 @@  struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 			BUG("failed to perform bitmap walk");
 	}
 
-	wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap);
+	wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap,
+				    filter);
 
 	if (!wants_bitmap)
 		BUG("failed to perform bitmap walk");
diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index b629a211f9..95379b1d4e 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -95,4 +95,9 @@  test_perf 'pack to file (partial bitmap)' '
 	git pack-objects --use-bitmap-index --all pack2b </dev/null >/dev/null
 '
 
+test_perf 'rev-list with tree filter (partial bitmap)' '
+	git rev-list --use-bitmap-index --count --objects --all \
+		--filter=tree:0 >/dev/null
+'
+
 test_done