Message ID | eada1ffcfafc3fb57de80626e368672cb8b22318.1610129796.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | pack-revindex: prepare for on-disk reverse index | expand |
On Fri, Jan 08, 2021 at 01:18:06PM -0500, Taylor Blau wrote: > To prepare for on-disk reverse indexes, remove a spot in > 'offset_to_pack_pos()' that looks at the 'revindex' array in 'struct > packed_git'. > > Even though this use of the revindex pointer is within pack-revindex.c, > this clean up is still worth doing. Since the 'revindex' pointer will be > NULL when reading from an on-disk reverse index (instead the > 'revindex_data' pointer will be mmaped to the 'pack-*.rev' file), this > call-site would have to include a conditional to lookup the offset for > position 'mi' each iteration through the search. > > So instead of open-coding 'pack_pos_to_offset()', call it directly from > within 'offset_to_pack_pos()'. This definitely makes sense in the long run. I could take or leave it as a final patch in _this_ series (as opposed to the first patch in a subsequent series adding the rev files). > do { > const unsigned mi = lo + (hi - lo) / 2; > - if (revindex[mi].offset == ofs) { > + off_t got = pack_pos_to_offset(p, mi); They're both constant-time, so performance should be the same big-O. The function has extra BUG() checks. I doubt those are measurable in practice, though. -Peff
On Tue, Jan 12, 2021 at 04:37:09AM -0500, Jeff King wrote: > This definitely makes sense in the long run. I could take or leave it as > a final patch in _this_ series (as opposed to the first patch in a > subsequent series adding the rev files). > > > do { > > const unsigned mi = lo + (hi - lo) / 2; > > - if (revindex[mi].offset == ofs) { > > + off_t got = pack_pos_to_offset(p, mi); > > > They're both constant-time, so performance should be the same big-O. The > function has extra BUG() checks. I doubt those are measurable in > practice, though. Funny enough, I have moved this patch between the two so many times before submitting this. I tend to agree that I don't think it makes a difference in which series this patch goes, so I'm just as happy to leave it where it is and stop thinking about it ;-). If others have strong feelings, this can be dropped when queuing and I'll send it along as the first commit in the second series (which will have to be updated along with this one). > -Peff Thanks, Taylor
diff --git a/pack-revindex.c b/pack-revindex.c index 36ef276378..2cd9d632f1 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -178,20 +178,20 @@ int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos) { int lo = 0; int hi; - const struct revindex_entry *revindex; if (load_pack_revindex(p) < 0) return -1; hi = p->num_objects + 1; - revindex = p->revindex; do { const unsigned mi = lo + (hi - lo) / 2; - if (revindex[mi].offset == ofs) { + off_t got = pack_pos_to_offset(p, mi); + + if (got == ofs) { *pos = mi; return 0; - } else if (ofs < revindex[mi].offset) + } else if (ofs < got) hi = mi; else lo = mi + 1;
To prepare for on-disk reverse indexes, remove a spot in 'offset_to_pack_pos()' that looks at the 'revindex' array in 'struct packed_git'. Even though this use of the revindex pointer is within pack-revindex.c, this clean up is still worth doing. Since the 'revindex' pointer will be NULL when reading from an on-disk reverse index (instead the 'revindex_data' pointer will be mmaped to the 'pack-*.rev' file), this call-site would have to include a conditional to lookup the offset for position 'mi' each iteration through the search. So instead of open-coding 'pack_pos_to_offset()', call it directly from within 'offset_to_pack_pos()'. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- pack-revindex.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)