diff mbox series

[16/24] pack-objects: prepare `write_reused_pack()` for multi-pack reuse

Message ID 67a8196978244b56d4f60861f140b78c59d15e30.1701198172.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series pack-objects: multi-pack verbatim reuse | expand

Commit Message

Taylor Blau Nov. 28, 2023, 7:08 p.m. UTC
The function `write_reused_pack()` within `builtin/pack-objects.c` is
responsible for performing pack-reuse on a single pack, and has two main
functions:

  - it dispatches a call to `write_reused_pack_verbatim()` to see if we
    can reuse portions of the packfile in whole-word chunks

  - for any remaining objects (that is, any objects that appear after
    the first "gap" in the bitmap), call write_reused_pack_one() on that
    object to record it for reuse.

Prepare this function for multi-pack reuse by removing the assumption
that the bit position corresponding to the first object being reused
from a given pack may not be at bit position zero.

The changes in this function are mostly straightforward. Initialize `i`
to the position of the first word to contain bits corresponding to that
reuse pack. In most situations, we throw the initialized value away,
since we end up replacing it with the return value from
write_reused_pack_verbatim(), moving us past the section of whole words
that we reused.

Likewise, modify the per-object loop to ignore any bits at the beginning
of the first word that do not belong to the pack currently being reused,
as well as skip to the "done" section once we have processed the last
bit corresponding to this pack.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Patrick Steinhardt Dec. 7, 2023, 1:13 p.m. UTC | #1
On Tue, Nov 28, 2023 at 02:08:37PM -0500, Taylor Blau wrote:
> The function `write_reused_pack()` within `builtin/pack-objects.c` is
> responsible for performing pack-reuse on a single pack, and has two main
> functions:
> 
>   - it dispatches a call to `write_reused_pack_verbatim()` to see if we
>     can reuse portions of the packfile in whole-word chunks
> 
>   - for any remaining objects (that is, any objects that appear after
>     the first "gap" in the bitmap), call write_reused_pack_one() on that
>     object to record it for reuse.
> 
> Prepare this function for multi-pack reuse by removing the assumption
> that the bit position corresponding to the first object being reused
> from a given pack may not be at bit position zero.

Is this double-negation intended? We remove the assumption that we start
reading at position zero, but the paragraph here states that we remove
the assumption that we do _not_ start at bit zero.

I'll stop reviewing here and will come back to this series somewhen next
week.

Patrick
Taylor Blau Dec. 7, 2023, 8:47 p.m. UTC | #2
On Thu, Dec 07, 2023 at 02:13:29PM +0100, Patrick Steinhardt wrote:
> On Tue, Nov 28, 2023 at 02:08:37PM -0500, Taylor Blau wrote:
> > The function `write_reused_pack()` within `builtin/pack-objects.c` is
> > responsible for performing pack-reuse on a single pack, and has two main
> > functions:
> >
> >   - it dispatches a call to `write_reused_pack_verbatim()` to see if we
> >     can reuse portions of the packfile in whole-word chunks
> >
> >   - for any remaining objects (that is, any objects that appear after
> >     the first "gap" in the bitmap), call write_reused_pack_one() on that
> >     object to record it for reuse.
> >
> > Prepare this function for multi-pack reuse by removing the assumption
> > that the bit position corresponding to the first object being reused
> > from a given pack may not be at bit position zero.
>
> Is this double-negation intended? We remove the assumption that we start
> reading at position zero, but the paragraph here states that we remove
> the assumption that we do _not_ start at bit zero.

Oops, great catch. I'll s/may not/must in the last paragraph to clarify.

> I'll stop reviewing here and will come back to this series somewhen next
> week.

Thanks as usual for your review -- I appreciate you digging through this
rather dense series.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 3b7704d062..b5e6f6377a 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1128,7 +1128,7 @@  static size_t write_reused_pack_verbatim(struct bitmapped_pack *reuse_packfile,
 static void write_reused_pack(struct bitmapped_pack *reuse_packfile,
 			      struct hashfile *f)
 {
-	size_t i = 0;
+	size_t i = reuse_packfile->bitmap_pos / BITS_IN_EWORD;
 	uint32_t offset;
 	off_t pack_start = hashfile_total(f) - sizeof(struct pack_header);
 	struct pack_window *w_curs = NULL;
@@ -1146,17 +1146,23 @@  static void write_reused_pack(struct bitmapped_pack *reuse_packfile,
 				break;
 
 			offset += ewah_bit_ctz64(word >> offset);
+			if (pos + offset < reuse_packfile->bitmap_pos)
+				continue;
+			if (pos + offset >= reuse_packfile->bitmap_pos + reuse_packfile->bitmap_nr)
+				goto done;
 			/*
 			 * Can use bit positions directly, even for MIDX
 			 * bitmaps. See comment in try_partial_reuse()
 			 * for why.
 			 */
-			write_reused_pack_one(reuse_packfile->p, pos + offset,
+			write_reused_pack_one(reuse_packfile->p,
+					      pos + offset - reuse_packfile->bitmap_pos,
 					      f, pack_start, &w_curs);
 			display_progress(progress_state, ++written);
 		}
 	}
 
+done:
 	unuse_pack(&w_curs);
 }