Message ID | 832fd0e8dc3a37e36b3d59085e448f8de84ce4b4.1741983492.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | midx: incremental multi-pack indexes, part two | expand |
On Fri, Mar 14, 2025 at 04:18:31PM -0400, Taylor Blau wrote: > The pack-bitmap machinery uses `bitmap_for_commit()` to locate the > EWAH-compressed bitmap corresponding to some given commit object. > > Teach this function about incremental MIDX bitmaps by teaching it to > recur on earlier bitmap layers when it fails to find a given commit in > the current layer. > > The changes to do so are as follows: > > - Avoid initializing hash_pos at its declaration, since > bitmap_for_commit() is now a recursive function and may receive a > NULL bitmap_index pointer as its first argument. > > - In cases where we would previously return NULL (to indicate that a > lookup failed and the given bitmap_index does not contain an entry > corresponding to the given commit), recursively call the function on > the previous bitmap layer. This makes sense, though it does make me wonder if we could/should store a (midx/pack,pos) pair. I.e., a master hash table stored once for the whole midx stack. And then you wouldn't need to recurse; it would just be a single lookup. Or would that work badly with the lazy nature? You'd need to load all of the layers to fill it (rather than doing each incrementally). OTOH, if you ask for the bitmap for commit X you're eventually going to have to figure out what's in all of the layers as soon as you have a miss and have to check them all. And I think the lookup table extension is what's supposed to make that cheap-ish. I dunno. It has been a long time since I dug into this (and the whole khash-of-commits thing is just a hack around the lack of the lookup table in the first place, I think?). -Peff
On Mon, Mar 17, 2025 at 09:38:23PM -0400, Jeff King wrote: > On Fri, Mar 14, 2025 at 04:18:31PM -0400, Taylor Blau wrote: > > > The pack-bitmap machinery uses `bitmap_for_commit()` to locate the > > EWAH-compressed bitmap corresponding to some given commit object. > > > > Teach this function about incremental MIDX bitmaps by teaching it to > > recur on earlier bitmap layers when it fails to find a given commit in > > the current layer. > > > > The changes to do so are as follows: > > > > - Avoid initializing hash_pos at its declaration, since > > bitmap_for_commit() is now a recursive function and may receive a > > NULL bitmap_index pointer as its first argument. > > > > - In cases where we would previously return NULL (to indicate that a > > lookup failed and the given bitmap_index does not contain an entry > > corresponding to the given commit), recursively call the function on > > the previous bitmap layer. > > This makes sense, though it does make me wonder if we could/should store > a (midx/pack,pos) pair. I.e., a master hash table stored once for the > whole midx stack. And then you wouldn't need to recurse; it would just > be a single lookup. > > Or would that work badly with the lazy nature? You'd need to load all of > the layers to fill it (rather than doing each incrementally). OTOH, if > you ask for the bitmap for commit X you're eventually going to have to > figure out what's in all of the layers as soon as you have a miss and > have to check them all. And I think the lookup table extension is what's > supposed to make that cheap-ish. I think that it's a good idea, though TBH I think there is even more room for improvement there, like recording cache misses. I suspect the details are fiddly enough that I'd rather tackle them outside of this already-fiddly series, though ;-). Thanks, Taylor
diff --git a/pack-bitmap.c b/pack-bitmap.c index 72fb11d014..615d5de85e 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -941,18 +941,21 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_ struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git, struct commit *commit) { - khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps, - commit->object.oid); + khiter_t hash_pos; + if (!bitmap_git) + return NULL; + + hash_pos = kh_get_oid_map(bitmap_git->bitmaps, commit->object.oid); if (hash_pos >= kh_end(bitmap_git->bitmaps)) { struct stored_bitmap *bitmap = NULL; if (!bitmap_git->table_lookup) - return NULL; + return bitmap_for_commit(bitmap_git->base, commit); /* this is a fairly hot codepath - no trace2_region please */ /* NEEDSWORK: cache misses aren't recorded */ bitmap = lazy_bitmap_for_commit(bitmap_git, commit); if (!bitmap) - return NULL; + return bitmap_for_commit(bitmap_git->base, commit); return lookup_stored_bitmap(bitmap); } return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
The pack-bitmap machinery uses `bitmap_for_commit()` to locate the EWAH-compressed bitmap corresponding to some given commit object. Teach this function about incremental MIDX bitmaps by teaching it to recur on earlier bitmap layers when it fails to find a given commit in the current layer. The changes to do so are as follows: - Avoid initializing hash_pos at its declaration, since bitmap_for_commit() is now a recursive function and may receive a NULL bitmap_index pointer as its first argument. - In cases where we would previously return NULL (to indicate that a lookup failed and the given bitmap_index does not contain an entry corresponding to the given commit), recursively call the function on the previous bitmap layer. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- pack-bitmap.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)