diff mbox series

[4/5] pack-bitmap.c: avoid repeated `pack_pos_to_offset()` during reuse

Message ID afa25c8806b0e80f1d3ed46f29eb164cab4583ac.1724793201.git.me@ttaylorr.com (mailing list archive)
State Accepted
Commit db40e3c92b9ccd03d2263b12cf1824ffab8a1cce
Headers show
Series pack-objects: brown-paper-bag fixes for multi-pack reuse | expand

Commit Message

Taylor Blau Aug. 27, 2024, 9:13 p.m. UTC
When calling `try_partial_reuse()`, the (sole) caller from the function
`reuse_partial_packfile_from_bitmap_1()` has to translate its bit
position to a pack position.

In the MIDX bitmap case, the caller translates from the bit position, to
a position in the MIDX's pseudo-pack order (with `pack_pos_to_midx()`),
then get a pack offset (with `nth_midxed_offset()`) before finally
working backwards to get the pack position in the source pack by calling
`offset_to_pack_pos()`.

In the non-MIDX bitmap case, we can use the bit position as the pack
position directly (see the comment at the beginning of the
`reuse_partial_packfile_from_bitmap_1()` function for why).

In either case, the first thing that `try_partial_reuse()` does after
being called is determine the offset of the object at the given pack
position by calling `pack_pos_to_offset()`. But we already have that
information in the MIDX case!

Avoid re-computing that information by instead passing it in. In the
MIDX case, we already have that information stored. In the non-MIDX
case, the call to `pack_pos_to_offset()` moves from the function
`try_partial_reuse()` to its caller. In total, we'll save one call to
`pack_pos_to_offset()` when processing MIDX bitmaps.

(On my machine, there is a slight speed-up on the order of ~2ms, but it
is within the margin of error over 10 runs, so I think you'd have to
have a truly gigantic repository to confidently measure any significant
improvement here).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

Comments

Junio C Hamano Sept. 4, 2024, 6:54 p.m. UTC | #1
Taylor Blau <me@ttaylorr.com> writes:

> @@ -2055,17 +2055,18 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
>  			     struct bitmapped_pack *pack,
>  			     size_t bitmap_pos,
>  			     uint32_t pack_pos,
> +			     off_t offset,
>  			     struct bitmap *reuse,
>  			     struct pack_window **w_curs)
>  {
> -	off_t offset, delta_obj_offset;
> +	off_t delta_obj_offset;
>  	enum object_type type;
>  	unsigned long size;
>  
>  	if (pack_pos >= pack->p->num_objects)
>  		return -1; /* not actually in the pack */
>  
> -	offset = delta_obj_offset = pack_pos_to_offset(pack->p, pack_pos);

We are now passing redundant piece of information "offset" that
could be derived from two other parameters, which feels a bit icky,

I suspect that try_partial_reuse() can be taught not to require
pack_pos and instead work entirely on offset

 - The caller can pass a very large offset to represent "not
   actually in the pack" early return condition.

 - The logic to punt on a delta pointing backwards can be done by
   comparing the base_offset and offset, instead of comparing the
   base_pos and pack_pos.

but it may not be worth the effort, unless we are going to do
similar clean-up globally in all the codepaths that access
packfiles.

Thanks.
Taylor Blau Sept. 4, 2024, 7:28 p.m. UTC | #2
On Wed, Sep 04, 2024 at 11:54:18AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > @@ -2055,17 +2055,18 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
> >  			     struct bitmapped_pack *pack,
> >  			     size_t bitmap_pos,
> >  			     uint32_t pack_pos,
> > +			     off_t offset,
> >  			     struct bitmap *reuse,
> >  			     struct pack_window **w_curs)
> >  {
> > -	off_t offset, delta_obj_offset;
> > +	off_t delta_obj_offset;
> >  	enum object_type type;
> >  	unsigned long size;
> >
> >  	if (pack_pos >= pack->p->num_objects)
> >  		return -1; /* not actually in the pack */
> >
> > -	offset = delta_obj_offset = pack_pos_to_offset(pack->p, pack_pos);
>
> We are now passing redundant piece of information "offset" that
> could be derived from two other parameters, which feels a bit icky,

Yeah, I agree. The only spot we use pack_pos in this function is in the
hunk above, and in the check where we reject any deltas whose base has a
greater than or equal to bit position than the delta object's bit
position.

But I think both of those are redundant. The only caller in
reuse_partial_packfile_from_bitmap_1() only passes bit positions that
are set, so we'll never trip the bounds check above. For the latter
check, it would suffice to compare object offsets in the single-pack
case, since the bit positions corresponding to objects in a single pack
are derived from the ordering of their actual offset.

>  - The logic to punt on a delta pointing backwards can be done by
>    comparing the base_offset and offset, instead of comparing the
>    base_pos and pack_pos.

Ah, yes -- exactly ;-). I should have read the email in its entirety
before starting to respond.

> but it may not be worth the effort, unless we are going to do
> similar clean-up globally in all the codepaths that access
> packfiles.

I'm not so sure we can do it in all cases, but at least in the case of
try_partial_reuse(), doing something like this (which compiles and
passes "make test") is fine:

--- 8< ---
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 9d9b8c4bfb..fc51f298d5 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2054,7 +2054,6 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 static int try_partial_reuse(struct bitmap_index *bitmap_git,
 			     struct bitmapped_pack *pack,
 			     size_t bitmap_pos,
-			     uint32_t pack_pos,
 			     off_t offset,
 			     struct bitmap *reuse,
 			     struct pack_window **w_curs)
@@ -2063,9 +2062,6 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
 	enum object_type type;
 	unsigned long size;

-	if (pack_pos >= pack->p->num_objects)
-		return -1; /* not actually in the pack */
-
 	delta_obj_offset = offset;
 	type = unpack_object_header(pack->p, w_curs, &offset, &size);
 	if (type < 0)
@@ -2121,8 +2117,13 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
 			 * OFS_DELTA is the default). But let's double check to
 			 * make sure the pack wasn't written with odd
 			 * parameters.
+			 *
+			 * Since we're working on a single-pack bitmap, we can
+			 * use the object offset as a proxy for the bit
+			 * position, since the bits are ordered by their
+			 * positions within the pack.
 			 */
-			if (base_pos >= pack_pos)
+			if (base_offset >= offset)
 				return 0;
 			base_bitmap_pos = pack->bitmap_pos + base_pos;
 		}
@@ -2218,8 +2219,8 @@ static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git
 				ofs = pack_pos_to_offset(pack->p, pack_pos);
 			}

-			if (try_partial_reuse(bitmap_git, pack, bit_pos,
-					      pack_pos, ofs, reuse, &w_curs) < 0) {
+			if (try_partial_reuse(bitmap_git, pack, bit_pos, ofs,
+					      reuse, &w_curs) < 0) {
 				/*
 				 * try_partial_reuse indicated we couldn't reuse
 				 * any bits, so there is no point in trying more
--- >8 ---

Thanks,
Taylor
diff mbox series

Patch

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 218d7ac2eb..9d9b8c4bfb 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2055,17 +2055,18 @@  static int try_partial_reuse(struct bitmap_index *bitmap_git,
 			     struct bitmapped_pack *pack,
 			     size_t bitmap_pos,
 			     uint32_t pack_pos,
+			     off_t offset,
 			     struct bitmap *reuse,
 			     struct pack_window **w_curs)
 {
-	off_t offset, delta_obj_offset;
+	off_t delta_obj_offset;
 	enum object_type type;
 	unsigned long size;
 
 	if (pack_pos >= pack->p->num_objects)
 		return -1; /* not actually in the pack */
 
-	offset = delta_obj_offset = pack_pos_to_offset(pack->p, pack_pos);
+	delta_obj_offset = offset;
 	type = unpack_object_header(pack->p, w_curs, &offset, &size);
 	if (type < 0)
 		return -1; /* broken packfile, punt */
@@ -2184,6 +2185,7 @@  static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git
 		for (offset = 0; offset < BITS_IN_EWORD; offset++) {
 			size_t bit_pos;
 			uint32_t pack_pos;
+			off_t ofs;
 
 			if (word >> offset == 0)
 				break;
@@ -2198,7 +2200,6 @@  static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git
 
 			if (bitmap_is_midx(bitmap_git)) {
 				uint32_t midx_pos;
-				off_t ofs;
 
 				midx_pos = pack_pos_to_midx(bitmap_git->midx, bit_pos);
 				ofs = nth_midxed_offset(bitmap_git->midx, midx_pos);
@@ -2213,10 +2214,12 @@  static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git
 					BUG("advanced beyond the end of pack %s (%"PRIuMAX" > %"PRIu32")",
 					    pack_basename(pack->p), (uintmax_t)pack_pos,
 					    pack->p->num_objects);
+
+				ofs = pack_pos_to_offset(pack->p, pack_pos);
 			}
 
 			if (try_partial_reuse(bitmap_git, pack, bit_pos,
-					      pack_pos, reuse, &w_curs) < 0) {
+					      pack_pos, ofs, reuse, &w_curs) < 0) {
 				/*
 				 * try_partial_reuse indicated we couldn't reuse
 				 * any bits, so there is no point in trying more