diff mbox series

[v4,04/13] pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs

Message ID 832fd0e8dc3a37e36b3d59085e448f8de84ce4b4.1741983492.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series midx: incremental multi-pack indexes, part two | expand

Commit Message

Taylor Blau March 14, 2025, 8:18 p.m. UTC
The pack-bitmap machinery uses `bitmap_for_commit()` to locate the
EWAH-compressed bitmap corresponding to some given commit object.

Teach this function about incremental MIDX bitmaps by teaching it to
recur on earlier bitmap layers when it fails to find a given commit in
the current layer.

The changes to do so are as follows:

  - Avoid initializing hash_pos at its declaration, since
    bitmap_for_commit() is now a recursive function and may receive a
    NULL bitmap_index pointer as its first argument.

  - In cases where we would previously return NULL (to indicate that a
    lookup failed and the given bitmap_index does not contain an entry
    corresponding to the given commit), recursively call the function on
    the previous bitmap layer.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

Comments

Jeff King March 18, 2025, 1:38 a.m. UTC | #1
On Fri, Mar 14, 2025 at 04:18:31PM -0400, Taylor Blau wrote:

> The pack-bitmap machinery uses `bitmap_for_commit()` to locate the
> EWAH-compressed bitmap corresponding to some given commit object.
> 
> Teach this function about incremental MIDX bitmaps by teaching it to
> recur on earlier bitmap layers when it fails to find a given commit in
> the current layer.
> 
> The changes to do so are as follows:
> 
>   - Avoid initializing hash_pos at its declaration, since
>     bitmap_for_commit() is now a recursive function and may receive a
>     NULL bitmap_index pointer as its first argument.
> 
>   - In cases where we would previously return NULL (to indicate that a
>     lookup failed and the given bitmap_index does not contain an entry
>     corresponding to the given commit), recursively call the function on
>     the previous bitmap layer.

This makes sense, though it does make me wonder if we could/should store
a (midx/pack,pos) pair. I.e., a master hash table stored once for the
whole midx stack. And then you wouldn't need to recurse; it would just
be a single lookup.

Or would that work badly with the lazy nature? You'd need to load all of
the layers to fill it (rather than doing each incrementally). OTOH, if
you ask for the bitmap for commit X you're eventually going to have to
figure out what's in all of the layers as soon as you have a miss and
have to check them all. And I think the lookup table extension is what's
supposed to make that cheap-ish.

I dunno. It has been a long time since I dug into this (and the whole
khash-of-commits thing is just a hack around the lack of the lookup
table in the first place, I think?).

-Peff
Taylor Blau March 19, 2025, 12:13 a.m. UTC | #2
On Mon, Mar 17, 2025 at 09:38:23PM -0400, Jeff King wrote:
> On Fri, Mar 14, 2025 at 04:18:31PM -0400, Taylor Blau wrote:
>
> > The pack-bitmap machinery uses `bitmap_for_commit()` to locate the
> > EWAH-compressed bitmap corresponding to some given commit object.
> >
> > Teach this function about incremental MIDX bitmaps by teaching it to
> > recur on earlier bitmap layers when it fails to find a given commit in
> > the current layer.
> >
> > The changes to do so are as follows:
> >
> >   - Avoid initializing hash_pos at its declaration, since
> >     bitmap_for_commit() is now a recursive function and may receive a
> >     NULL bitmap_index pointer as its first argument.
> >
> >   - In cases where we would previously return NULL (to indicate that a
> >     lookup failed and the given bitmap_index does not contain an entry
> >     corresponding to the given commit), recursively call the function on
> >     the previous bitmap layer.
>
> This makes sense, though it does make me wonder if we could/should store
> a (midx/pack,pos) pair. I.e., a master hash table stored once for the
> whole midx stack. And then you wouldn't need to recurse; it would just
> be a single lookup.
>
> Or would that work badly with the lazy nature? You'd need to load all of
> the layers to fill it (rather than doing each incrementally). OTOH, if
> you ask for the bitmap for commit X you're eventually going to have to
> figure out what's in all of the layers as soon as you have a miss and
> have to check them all. And I think the lookup table extension is what's
> supposed to make that cheap-ish.

I think that it's a good idea, though TBH I think there is even more
room for improvement there, like recording cache misses. I suspect the
details are fiddly enough that I'd rather tackle them outside of this
already-fiddly series, though ;-).

Thanks,
Taylor
diff mbox series

Patch

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 72fb11d014..615d5de85e 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -941,18 +941,21 @@  static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
 				      struct commit *commit)
 {
-	khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps,
-					   commit->object.oid);
+	khiter_t hash_pos;
+	if (!bitmap_git)
+		return NULL;
+
+	hash_pos = kh_get_oid_map(bitmap_git->bitmaps, commit->object.oid);
 	if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
 		struct stored_bitmap *bitmap = NULL;
 		if (!bitmap_git->table_lookup)
-			return NULL;
+			return bitmap_for_commit(bitmap_git->base, commit);
 
 		/* this is a fairly hot codepath - no trace2_region please */
 		/* NEEDSWORK: cache misses aren't recorded */
 		bitmap = lazy_bitmap_for_commit(bitmap_git, commit);
 		if (!bitmap)
-			return NULL;
+			return bitmap_for_commit(bitmap_git->base, commit);
 		return lookup_stored_bitmap(bitmap);
 	}
 	return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));