Message ID | 432854b27c6731bd6ab1fa739b3a086ec0a90be8.1701198172.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | pack-objects: multi-pack verbatim reuse | expand |
On Tue, Nov 28, 2023 at 02:08:24PM -0500, Taylor Blau wrote: > The signature of `reuse_partial_packfile_from_bitmap()` currently takes > in a bitmap, as well as three output parameters (filled through > pointers, and passed as arguments), and also returns an integer result. > > The output parameters are filled out with: (a) the packfile used for > pack-reuse, (b) the number of objects from that pack that we can reuse, > and (c) a bitmap indicating which objects we can reuse. The return value > is either -1 (when there are no objects to reuse), or 0 (when there is > at least one object to reuse). > > Some of these parameters are redundant. Notably, we can infer from the > bitmap how many objects are reused by calling bitmap_popcount(). And we > can similar compute the return value based on that number as well. > > As such, clean up the signature of this function to drop the "*entries" > parameter, as well as the int return value, since the single caller of > this function can infer these values themself. > > Signed-off-by: Taylor Blau <me@ttaylorr.com> > --- > builtin/pack-objects.c | 16 +++++++++------- > pack-bitmap.c | 16 +++++++--------- > pack-bitmap.h | 7 +++---- > 3 files changed, 19 insertions(+), 20 deletions(-) > > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > index 107154db34..2bb1b64e8f 100644 > --- a/builtin/pack-objects.c > +++ b/builtin/pack-objects.c > @@ -3946,13 +3946,15 @@ static int get_object_list_from_bitmap(struct rev_info *revs) > if (!(bitmap_git = prepare_bitmap_walk(revs, 0))) > return -1; > > - if (pack_options_allow_reuse() && > - !reuse_partial_packfile_from_bitmap( > - bitmap_git, > - &reuse_packfile, > - &reuse_packfile_objects, > - &reuse_packfile_bitmap)) { > - assert(reuse_packfile_objects); > + if (pack_options_allow_reuse()) > + reuse_partial_packfile_from_bitmap(bitmap_git, &reuse_packfile, > + &reuse_packfile_bitmap); > + > + if (reuse_packfile) { > + reuse_packfile_objects = bitmap_popcount(reuse_packfile_bitmap); > + if (!reuse_packfile_objects) > + BUG("expected non-empty reuse bitmap"); We're now re-computing `bitmap_popcount()` for the bitmap a second time. But I really don't think this is ever going to be a problem in practice given that it only does a bunch of math. Any performance regression would thus ultimately be drowned out by everything else. In other words: this is probably fine. Patrick
On Thu, Dec 07, 2023 at 02:13:19PM +0100, Patrick Steinhardt wrote: > > + if (reuse_packfile) { > > + reuse_packfile_objects = bitmap_popcount(reuse_packfile_bitmap); > > + if (!reuse_packfile_objects) > > + BUG("expected non-empty reuse bitmap"); > > We're now re-computing `bitmap_popcount()` for the bitmap a second time. > But I really don't think this is ever going to be a problem in practice > given that it only does a bunch of math. Any performance regression > would thus ultimately be drowned out by everything else. > > In other words: this is probably fine. I definitely agree that any performance regression from calling bitmap_popcount() twice would be drowned out by the rest of what pack-objects is doing. For what it's worth: - The bitmap_popcount() call is a loop over ewah_bit_popcount64() for each of the allocated words. And the latter is more or less three copies of: b7: 55 55 55 ba: 48 23 45 f8 and -0x8(%rbp),%rax be: 48 8b 55 f8 mov -0x8(%rbp),%rdx c2: 48 89 d1 mov %rdx,%rcx c5: 48 d1 e9 shr %rcx c8: 48 ba 55 55 55 55 55 movabs $0x5555555555555555,%rdx cf: 55 55 55 d2: 48 21 ca and %rcx,%rdx d5: 48 01 d0 add %rdx,%rax d8: 48 89 45 f8 mov %rax,-0x8(%rbp) dc: 48 b8 33 33 33 33 33 movabs $0x3333333333333333,%rax Followed by: 144: 48 0f af c2 imul %rdx,%rax 148: 48 c1 e8 38 shr $0x38,%rax 14c: 5d pop %rbp 14d: c3 ret With the usual x86 ABI preamble and postamble. So this should be an extremely cheap function to compute. - But, the earlier bitmap_popcount() call in reuse_partial_packfile_from_bitmap() is not necessary, since we only care whether or not there are _any_ bits set in the bitmap, not how many of them there are. So we could write something like `bitmap_empty(reuse)` instead, which would be much cheaper (again, not that I think we'll notice this either way, but throwing away the result of bitmap_popcount() and calling it twice does leave me a little unsatisfied). So I think we could reasonably do something like: --- 8< --- diff --git a/ewah/bitmap.c b/ewah/bitmap.c index 7b525b1ecd..ac7e0af622 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -169,6 +169,15 @@ size_t bitmap_popcount(struct bitmap *self) return count; } +int bitmap_is_empty(struct bitmap *self) +{ + size_t i; + for (i = 0; i < self->word_alloc; i++) + if (self->words[i]) + return 0; + return 1; +} + int bitmap_equals(struct bitmap *self, struct bitmap *other) { struct bitmap *big, *small; diff --git a/ewah/ewok.h b/ewah/ewok.h index 7eb8b9b630..c11d76c6f3 100644 --- a/ewah/ewok.h +++ b/ewah/ewok.h @@ -189,5 +189,6 @@ void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other); void bitmap_or(struct bitmap *self, const struct bitmap *other); size_t bitmap_popcount(struct bitmap *self); +int bitmap_is_empty(struct bitmap *self); #endif diff --git a/pack-bitmap.c b/pack-bitmap.c index 614fc09a4e..e50b322779 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -2045,7 +2045,7 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, reuse_partial_packfile_from_bitmap_1(bitmap_git, packs, reuse); - if (!bitmap_popcount(reuse)) { + if (bitmap_is_empty(reuse)) { free(packs); bitmap_free(reuse); return; --- >8 --- Thanks, Taylor
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 107154db34..2bb1b64e8f 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -3946,13 +3946,15 @@ static int get_object_list_from_bitmap(struct rev_info *revs) if (!(bitmap_git = prepare_bitmap_walk(revs, 0))) return -1; - if (pack_options_allow_reuse() && - !reuse_partial_packfile_from_bitmap( - bitmap_git, - &reuse_packfile, - &reuse_packfile_objects, - &reuse_packfile_bitmap)) { - assert(reuse_packfile_objects); + if (pack_options_allow_reuse()) + reuse_partial_packfile_from_bitmap(bitmap_git, &reuse_packfile, + &reuse_packfile_bitmap); + + if (reuse_packfile) { + reuse_packfile_objects = bitmap_popcount(reuse_packfile_bitmap); + if (!reuse_packfile_objects) + BUG("expected non-empty reuse bitmap"); + nr_result += reuse_packfile_objects; nr_seen += reuse_packfile_objects; display_progress(progress_state, nr_seen); diff --git a/pack-bitmap.c b/pack-bitmap.c index 2ebe2c314e..614fc09a4e 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1988,10 +1988,9 @@ static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git unuse_pack(&w_curs); } -int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, - struct packed_git **packfile_out, - uint32_t *entries, - struct bitmap **reuse_out) +void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, + struct packed_git **packfile_out, + struct bitmap **reuse_out) { struct repository *r = the_repository; struct bitmapped_pack *packs = NULL; @@ -2013,7 +2012,7 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, warning(_("unable to load pack: '%s', disabling pack-reuse"), bitmap_git->midx->pack_names[i]); free(packs); - return -1; + return; } if (!pack.bitmap_nr) continue; /* no objects from this pack */ @@ -2046,10 +2045,10 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, reuse_partial_packfile_from_bitmap_1(bitmap_git, packs, reuse); - *entries = bitmap_popcount(reuse); - if (!*entries) { + if (!bitmap_popcount(reuse)) { + free(packs); bitmap_free(reuse); - return -1; + return; } /* @@ -2059,7 +2058,6 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, bitmap_and_not(result, reuse); *packfile_out = packs[0].p; *reuse_out = reuse; - return 0; } int bitmap_walk_contains(struct bitmap_index *bitmap_git, diff --git a/pack-bitmap.h b/pack-bitmap.h index b7fa1a42a9..5bc1ca5b65 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -78,10 +78,9 @@ int test_bitmap_hashes(struct repository *r); struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs, int filter_provided_objects); uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git); -int reuse_partial_packfile_from_bitmap(struct bitmap_index *, - struct packed_git **packfile, - uint32_t *entries, - struct bitmap **reuse_out); +void reuse_partial_packfile_from_bitmap(struct bitmap_index *, + struct packed_git **packfile, + struct bitmap **reuse_out); int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping, kh_oid_map_t *reused_bitmaps, int show_progress); void free_bitmap_index(struct bitmap_index *);
The signature of `reuse_partial_packfile_from_bitmap()` currently takes in a bitmap, as well as three output parameters (filled through pointers, and passed as arguments), and also returns an integer result. The output parameters are filled out with: (a) the packfile used for pack-reuse, (b) the number of objects from that pack that we can reuse, and (c) a bitmap indicating which objects we can reuse. The return value is either -1 (when there are no objects to reuse), or 0 (when there is at least one object to reuse). Some of these parameters are redundant. Notably, we can infer from the bitmap how many objects are reused by calling bitmap_popcount(). And we can similar compute the return value based on that number as well. As such, clean up the signature of this function to drop the "*entries" parameter, as well as the int return value, since the single caller of this function can infer these values themself. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- builtin/pack-objects.c | 16 +++++++++------- pack-bitmap.c | 16 +++++++--------- pack-bitmap.h | 7 +++---- 3 files changed, 19 insertions(+), 20 deletions(-)