diff mbox series

[v2,2/7] pack-bitmap.c: propagate namehash values from existing bitmaps

Message ID 2cd2f3aa5e74a3aefd6924c528cbfb578a510d31.1631657157.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series pack-bitmap: permute existing namehash values | expand

Commit Message

Taylor Blau Sept. 14, 2021, 10:06 p.m. UTC
When an old bitmap exists while writing a new one, we load it and build
a "reposition" table which maps bit positions of objects from the old
bitmap to their respective positions in the new bitmap. This can help
when we encounter a commit which was selected in both the old and new
bitmap, since we only need to permute its bit (not recompute it from
scratch).

We do not, however, repurpose existing namehash values in the case of
the hash-cache extension. There has been thus far no good reason to do
so, since all of the namehash values for objects in the new bitmap would
be populated during the traversal that was just performed by
pack-objects when generating single-pack reachability bitmaps.

But this isn't the case for multi-pack bitmaps, which are written via
`git multi-pack-index write --bitmap` and do not perform any traversal.
In this case all namehash values are set to zero, but we don't even
bother to check the `pack.writeBitmapHashcache` option anyway, so it
fails to matter.

There are two approaches we could take to fill in non-zero hash-cache
values:

  - have either the multi-pack-index builtin run its own
    traversal to attempt to fill in some values, or let a hypothetical
    caller (like `pack-objects` when `repack` eventually drives the
    `multi-pack-index` builtin) fill in the values they found during
    their traversal

  - or copy any existing namehash values that were stored in an
    existing bitmap to their corresponding positions in the new bitmap

In a system where a repository is generally repacked with `git repack
--geometric=<d>` and occasionally repacked with `git repack -a`, the
hash-cache coverage will tend towards all objects.

Since populating the hash-cache is additive (i.e., doing so only helps
our delta search), any intermediate lack of full coverage is just fine.
So let's start by just propagating any values from the existing
hash-cache if we see one.

The next patch will respect the `pack.writeBitmapHashcache` option while
writing MIDX bitmaps, and then test this new behavior.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)
diff mbox series

Patch

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 04de387318..33a3732992 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1818,18 +1818,20 @@  uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 	for (i = 0; i < num_objects; ++i) {
 		struct object_id oid;
 		struct object_entry *oe;
+		uint32_t index_pos;
 
 		if (bitmap_is_midx(bitmap_git))
-			nth_midxed_object_oid(&oid,
-					      bitmap_git->midx,
-					      pack_pos_to_midx(bitmap_git->midx, i));
+			index_pos = pack_pos_to_midx(bitmap_git->midx, i);
 		else
-			nth_packed_object_id(&oid, bitmap_git->pack,
-					     pack_pos_to_index(bitmap_git->pack, i));
+			index_pos = pack_pos_to_index(bitmap_git->pack, i);
+		nth_bitmap_object_oid(bitmap_git, &oid, index_pos);
 		oe = packlist_find(mapping, &oid);
 
-		if (oe)
+		if (oe) {
 			reposition[i] = oe_in_pack_pos(mapping, oe) + 1;
+			if (bitmap_git->hashes && !oe->hash)
+				oe->hash = get_be32(bitmap_git->hashes + index_pos);
+		}
 	}
 
 	return reposition;