diff mbox series

[11/24] pack-bitmap: simplify `reuse_partial_packfile_from_bitmap()` signature

Message ID 432854b27c6731bd6ab1fa739b3a086ec0a90be8.1701198172.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series pack-objects: multi-pack verbatim reuse | expand

Commit Message

Taylor Blau Nov. 28, 2023, 7:08 p.m. UTC
The signature of `reuse_partial_packfile_from_bitmap()` currently takes
in a bitmap, as well as three output parameters (filled through
pointers, and passed as arguments), and also returns an integer result.

The output parameters are filled out with: (a) the packfile used for
pack-reuse, (b) the number of objects from that pack that we can reuse,
and (c) a bitmap indicating which objects we can reuse. The return value
is either -1 (when there are no objects to reuse), or 0 (when there is
at least one object to reuse).

Some of these parameters are redundant. Notably, we can infer from the
bitmap how many objects are reused by calling bitmap_popcount(). And we
can similar compute the return value based on that number as well.

As such, clean up the signature of this function to drop the "*entries"
parameter, as well as the int return value, since the single caller of
this function can infer these values themself.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 16 +++++++++-------
 pack-bitmap.c          | 16 +++++++---------
 pack-bitmap.h          |  7 +++----
 3 files changed, 19 insertions(+), 20 deletions(-)

Comments

Patrick Steinhardt Dec. 7, 2023, 1:13 p.m. UTC | #1
On Tue, Nov 28, 2023 at 02:08:24PM -0500, Taylor Blau wrote:
> The signature of `reuse_partial_packfile_from_bitmap()` currently takes
> in a bitmap, as well as three output parameters (filled through
> pointers, and passed as arguments), and also returns an integer result.
> 
> The output parameters are filled out with: (a) the packfile used for
> pack-reuse, (b) the number of objects from that pack that we can reuse,
> and (c) a bitmap indicating which objects we can reuse. The return value
> is either -1 (when there are no objects to reuse), or 0 (when there is
> at least one object to reuse).
> 
> Some of these parameters are redundant. Notably, we can infer from the
> bitmap how many objects are reused by calling bitmap_popcount(). And we
> can similar compute the return value based on that number as well.
> 
> As such, clean up the signature of this function to drop the "*entries"
> parameter, as well as the int return value, since the single caller of
> this function can infer these values themself.
> 
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  builtin/pack-objects.c | 16 +++++++++-------
>  pack-bitmap.c          | 16 +++++++---------
>  pack-bitmap.h          |  7 +++----
>  3 files changed, 19 insertions(+), 20 deletions(-)
> 
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 107154db34..2bb1b64e8f 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3946,13 +3946,15 @@ static int get_object_list_from_bitmap(struct rev_info *revs)
>  	if (!(bitmap_git = prepare_bitmap_walk(revs, 0)))
>  		return -1;
>  
> -	if (pack_options_allow_reuse() &&
> -	    !reuse_partial_packfile_from_bitmap(
> -			bitmap_git,
> -			&reuse_packfile,
> -			&reuse_packfile_objects,
> -			&reuse_packfile_bitmap)) {
> -		assert(reuse_packfile_objects);
> +	if (pack_options_allow_reuse())
> +		reuse_partial_packfile_from_bitmap(bitmap_git, &reuse_packfile,
> +						   &reuse_packfile_bitmap);
> +
> +	if (reuse_packfile) {
> +		reuse_packfile_objects = bitmap_popcount(reuse_packfile_bitmap);
> +		if (!reuse_packfile_objects)
> +			BUG("expected non-empty reuse bitmap");

We're now re-computing `bitmap_popcount()` for the bitmap a second time.
But I really don't think this is ever going to be a problem in practice
given that it only does a bunch of math. Any performance regression
would thus ultimately be drowned out by everything else.

In other words: this is probably fine.

Patrick
Taylor Blau Dec. 7, 2023, 2:36 p.m. UTC | #2
On Thu, Dec 07, 2023 at 02:13:19PM +0100, Patrick Steinhardt wrote:
> > +	if (reuse_packfile) {
> > +		reuse_packfile_objects = bitmap_popcount(reuse_packfile_bitmap);
> > +		if (!reuse_packfile_objects)
> > +			BUG("expected non-empty reuse bitmap");
>
> We're now re-computing `bitmap_popcount()` for the bitmap a second time.
> But I really don't think this is ever going to be a problem in practice
> given that it only does a bunch of math. Any performance regression
> would thus ultimately be drowned out by everything else.
>
> In other words: this is probably fine.

I definitely agree that any performance regression from calling
bitmap_popcount() twice would be drowned out by the rest of what
pack-objects is doing.

For what it's worth:

- The bitmap_popcount() call is a loop over ewah_bit_popcount64() for
  each of the allocated words. And the latter is more or less three
  copies of:

      b7:	55 55 55
      ba:	48 23 45 f8          	and    -0x8(%rbp),%rax
      be:	48 8b 55 f8          	mov    -0x8(%rbp),%rdx
      c2:	48 89 d1             	mov    %rdx,%rcx
      c5:	48 d1 e9             	shr    %rcx
      c8:	48 ba 55 55 55 55 55 	movabs $0x5555555555555555,%rdx
      cf:	55 55 55
      d2:	48 21 ca             	and    %rcx,%rdx
      d5:	48 01 d0             	add    %rdx,%rax
      d8:	48 89 45 f8          	mov    %rax,-0x8(%rbp)
      dc:	48 b8 33 33 33 33 33 	movabs $0x3333333333333333,%rax

  Followed by:

     144:	48 0f af c2          	imul   %rdx,%rax
     148:	48 c1 e8 38          	shr    $0x38,%rax
     14c:	5d                   	pop    %rbp
     14d:	c3                   	ret

  With the usual x86 ABI preamble and postamble. So this should be an
  extremely cheap function to compute.

- But, the earlier bitmap_popcount() call in
  reuse_partial_packfile_from_bitmap() is not necessary, since we only
  care whether or not there are _any_ bits set in the bitmap, not how
  many of them there are.

  So we could write something like `bitmap_empty(reuse)` instead, which
  would be much cheaper (again, not that I think we'll notice this
  either way, but throwing away the result of bitmap_popcount() and
  calling it twice does leave me a little unsatisfied).

So I think we could reasonably do something like:

--- 8< ---
diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index 7b525b1ecd..ac7e0af622 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -169,6 +169,15 @@ size_t bitmap_popcount(struct bitmap *self)
 	return count;
 }

+int bitmap_is_empty(struct bitmap *self)
+{
+	size_t i;
+	for (i = 0; i < self->word_alloc; i++)
+		if (self->words[i])
+			return 0;
+	return 1;
+}
+
 int bitmap_equals(struct bitmap *self, struct bitmap *other)
 {
 	struct bitmap *big, *small;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 7eb8b9b630..c11d76c6f3 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -189,5 +189,6 @@ void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other);
 void bitmap_or(struct bitmap *self, const struct bitmap *other);

 size_t bitmap_popcount(struct bitmap *self);
+int bitmap_is_empty(struct bitmap *self);

 #endif
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 614fc09a4e..e50b322779 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2045,7 +2045,7 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,

 	reuse_partial_packfile_from_bitmap_1(bitmap_git, packs, reuse);

-	if (!bitmap_popcount(reuse)) {
+	if (bitmap_is_empty(reuse)) {
 		free(packs);
 		bitmap_free(reuse);
 		return;
--- >8 ---

Thanks,
Taylor
diff mbox series

Patch

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 107154db34..2bb1b64e8f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3946,13 +3946,15 @@  static int get_object_list_from_bitmap(struct rev_info *revs)
 	if (!(bitmap_git = prepare_bitmap_walk(revs, 0)))
 		return -1;
 
-	if (pack_options_allow_reuse() &&
-	    !reuse_partial_packfile_from_bitmap(
-			bitmap_git,
-			&reuse_packfile,
-			&reuse_packfile_objects,
-			&reuse_packfile_bitmap)) {
-		assert(reuse_packfile_objects);
+	if (pack_options_allow_reuse())
+		reuse_partial_packfile_from_bitmap(bitmap_git, &reuse_packfile,
+						   &reuse_packfile_bitmap);
+
+	if (reuse_packfile) {
+		reuse_packfile_objects = bitmap_popcount(reuse_packfile_bitmap);
+		if (!reuse_packfile_objects)
+			BUG("expected non-empty reuse bitmap");
+
 		nr_result += reuse_packfile_objects;
 		nr_seen += reuse_packfile_objects;
 		display_progress(progress_state, nr_seen);
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 2ebe2c314e..614fc09a4e 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1988,10 +1988,9 @@  static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git
 	unuse_pack(&w_curs);
 }
 
-int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
-				       struct packed_git **packfile_out,
-				       uint32_t *entries,
-				       struct bitmap **reuse_out)
+void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
+					struct packed_git **packfile_out,
+					struct bitmap **reuse_out)
 {
 	struct repository *r = the_repository;
 	struct bitmapped_pack *packs = NULL;
@@ -2013,7 +2012,7 @@  int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 				warning(_("unable to load pack: '%s', disabling pack-reuse"),
 					bitmap_git->midx->pack_names[i]);
 				free(packs);
-				return -1;
+				return;
 			}
 			if (!pack.bitmap_nr)
 				continue; /* no objects from this pack */
@@ -2046,10 +2045,10 @@  int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 
 	reuse_partial_packfile_from_bitmap_1(bitmap_git, packs, reuse);
 
-	*entries = bitmap_popcount(reuse);
-	if (!*entries) {
+	if (!bitmap_popcount(reuse)) {
+		free(packs);
 		bitmap_free(reuse);
-		return -1;
+		return;
 	}
 
 	/*
@@ -2059,7 +2058,6 @@  int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 	bitmap_and_not(result, reuse);
 	*packfile_out = packs[0].p;
 	*reuse_out = reuse;
-	return 0;
 }
 
 int bitmap_walk_contains(struct bitmap_index *bitmap_git,
diff --git a/pack-bitmap.h b/pack-bitmap.h
index b7fa1a42a9..5bc1ca5b65 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -78,10 +78,9 @@  int test_bitmap_hashes(struct repository *r);
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 					 int filter_provided_objects);
 uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git);
-int reuse_partial_packfile_from_bitmap(struct bitmap_index *,
-				       struct packed_git **packfile,
-				       uint32_t *entries,
-				       struct bitmap **reuse_out);
+void reuse_partial_packfile_from_bitmap(struct bitmap_index *,
+					struct packed_git **packfile,
+					struct bitmap **reuse_out);
 int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping,
 			     kh_oid_map_t *reused_bitmaps, int show_progress);
 void free_bitmap_index(struct bitmap_index *);