Message ID | d60411d524656f4680ac578765b2a8704325a060.1610129796.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | pack-revindex: prepare for on-disk reverse index | expand |
On 1/8/2021 1:17 PM, Taylor Blau wrote: > -int find_revindex_position(struct packed_git *p, off_t ofs) > +int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos) > { > int lo = 0; > - int hi = p->num_objects + 1; > - const struct revindex_entry *revindex = p->revindex; > + int hi; > + const struct revindex_entry *revindex; > + > + if (load_pack_revindex(p) < 0) > + return -1; > + > + hi = p->num_objects + 1; > + revindex = p->revindex; > > do { > const unsigned mi = lo + (hi - lo) / 2; > if (revindex[mi].offset == ofs) { > - return mi; > + *pos = mi; > + return 0; > } else if (ofs < revindex[mi].offset) > hi = mi; > else > @@ -189,20 +196,6 @@ int find_revindex_position(struct packed_git *p, off_t ofs) > return -1; > } > > -int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos) > -{ > - int ret; > - > - if (load_pack_revindex(p) < 0) > - return -1; > - > - ret = find_revindex_position(p, ofs); > - if (ret < 0) > - return -1; > - *pos = ret; > - return 0; > -} > - Not that this is new to the current patch, but this patch made me wonder if we should initialize *pos = -1 in the case of a failure to find the position? A correct caller should not use the value if they are checking for the fail-to-find case properly. But, I could see someone making a mistake and having trouble diagnosing the problem because their position variable was initialized to zero or a previous successful case. Thanks, -Stolee
On Mon, Jan 11, 2021 at 06:57:00AM -0500, Derrick Stolee wrote: > On 1/8/2021 1:17 PM, Taylor Blau wrote: > > -int find_revindex_position(struct packed_git *p, off_t ofs) > > +int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos) > > { > > int lo = 0; > > - int hi = p->num_objects + 1; > > - const struct revindex_entry *revindex = p->revindex; > > + int hi; > > + const struct revindex_entry *revindex; > > + > > + if (load_pack_revindex(p) < 0) > > + return -1; > > + > > + hi = p->num_objects + 1; > > + revindex = p->revindex; > > > > do { > > const unsigned mi = lo + (hi - lo) / 2; > > if (revindex[mi].offset == ofs) { > > - return mi; > > + *pos = mi; > > + return 0; > > } else if (ofs < revindex[mi].offset) > > hi = mi; > > else > > @@ -189,20 +196,6 @@ int find_revindex_position(struct packed_git *p, off_t ofs) > > return -1; > > } > > > > -int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos) > > -{ > > - int ret; > > - > > - if (load_pack_revindex(p) < 0) > > - return -1; > > - > > - ret = find_revindex_position(p, ofs); > > - if (ret < 0) > > - return -1; > > - *pos = ret; > > - return 0; > > -} > > - > > Not that this is new to the current patch, but this patch made me > wonder if we should initialize *pos = -1 in the case of a failure > to find the position? A correct caller should not use the value > if they are checking for the fail-to-find case properly. But, I > could see someone making a mistake and having trouble diagnosing > the problem because their position variable was initialized to > zero or a previous successful case. *pos = -1 may be more confusing than clarifying since pos is unsigned. It would be nice if there was a clear signal beyond returning a negative value. I guess you could take a double pointer here which would allow you to assign NULL, but that feels rather cumbersome as a means to catch callers who failed to check the return value. It does raise the argument of whether or not we should allow the program to continue at all if 'ret < 0' (i.e., 'offset_to_pack_pos()' either 'die()'s or returns a usable uint32_t), but I'm OK with the current behavior. > Thanks, > -Stolee Thanks, Taylor
On 1/11/2021 11:27 AM, Taylor Blau wrote: > On Mon, Jan 11, 2021 at 06:57:00AM -0500, Derrick Stolee wrote: >> Not that this is new to the current patch, but this patch made me >> wonder if we should initialize *pos = -1 in the case of a failure >> to find the position? A correct caller should not use the value >> if they are checking for the fail-to-find case properly. But, I >> could see someone making a mistake and having trouble diagnosing >> the problem because their position variable was initialized to >> zero or a previous successful case. > > *pos = -1 may be more confusing than clarifying since pos is unsigned. RIGHT. My bad. > It would be nice if there was a clear signal beyond returning a negative > value. I guess you could take a double pointer here which would allow > you to assign NULL, but that feels rather cumbersome as a means to catch > callers who failed to check the return value. > > It does raise the argument of whether or not we should allow the > program to continue at all if 'ret < 0' (i.e., 'offset_to_pack_pos()' > either 'die()'s or returns a usable uint32_t), but I'm OK with the > current behavior. I was thinking "*pos = -1" was a free way to "help" a developer who uses the API incorrectly, but it's _not_ free. Ignore me. Thanks, -Stolee
On Fri, Jan 08, 2021 at 01:17:57PM -0500, Taylor Blau wrote: > Now that all 'find_revindex_position()' callers have been removed (and > converted to the more descriptive 'offset_to_pack_pos()'), it is almost > safe to get rid of 'find_revindex_position()' entirely. Almost, except > for the fact that 'offset_to_pack_pos()' calls > 'find_revindex_position()'. > > Inline 'find_revindex_position()' into 'offset_to_pack_pos()', and > then remove 'find_revindex_position()' entirely. Sounds good. > This is a straightforward refactoring with one minor snag. > 'offset_to_pack_pos()' used to load the index before calling > 'find_revindex_position()'. That means that by the time > 'find_revindex_position()' starts executing, 'p->num_objects' can be > safely read. After inlining, be careful to not read 'p->num_objects' > until _after_ 'load_pack_revindex()' (which loads the index as a > side-effect) has been called. Good catch. We might want to drop the initialization of "lo": > int lo = 0; > - int hi = p->num_objects + 1; down to here: > + hi = p->num_objects + 1; to maintain symmetry (though it's quite a minor point). I notice these are signed ints, but we've taken care to use uint32_t elsewhere for positions. Shouldn't these be uint32_t, also (or at least unsigned)? -Peff
On Tue, Jan 12, 2021 at 04:32:39AM -0500, Jeff King wrote: > Good catch. We might want to drop the initialization of "lo": > > > int lo = 0; > > - int hi = p->num_objects + 1; > > down to here: > > > + hi = p->num_objects + 1; > to maintain symmetry (though it's quite a minor point). :-). I agree it's a minor point, but I think that the symmetry is nice, so it's worth doing. > I notice these are signed ints, but we've taken care to use uint32_t > elsewhere for positions. Shouldn't these be uint32_t, also (or at least > unsigned)? I'll let both of these be an raw unsigned, since the midpoint is already labeled as an unsigned. > -Peff Thanks, Taylor
On Tue, Jan 12, 2021 at 11:59:42AM -0500, Taylor Blau wrote: > > I notice these are signed ints, but we've taken care to use uint32_t > > elsewhere for positions. Shouldn't these be uint32_t, also (or at least > > unsigned)? > > I'll let both of these be an raw unsigned, since the midpoint is already > labeled as an unsigned. Yeah, I found that existing code doubly weird, since "mi" must by definition be bounded by "lo" and "hi". :) Looks like the bug was introduced in 92e5c77c37 (revindex: export new APIs, 2013-10-24), which hacked up the existing loop into a new function. We should have caught it then (back then we were a lot less careful about types, IMHO). Also, the subject line of that commit is giving me deja vu. ;) -Peff
diff --git a/pack-revindex.c b/pack-revindex.c index 4e42238906..9392c4be73 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -169,16 +169,23 @@ int load_pack_revindex(struct packed_git *p) return 0; } -int find_revindex_position(struct packed_git *p, off_t ofs) +int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos) { int lo = 0; - int hi = p->num_objects + 1; - const struct revindex_entry *revindex = p->revindex; + int hi; + const struct revindex_entry *revindex; + + if (load_pack_revindex(p) < 0) + return -1; + + hi = p->num_objects + 1; + revindex = p->revindex; do { const unsigned mi = lo + (hi - lo) / 2; if (revindex[mi].offset == ofs) { - return mi; + *pos = mi; + return 0; } else if (ofs < revindex[mi].offset) hi = mi; else @@ -189,20 +196,6 @@ int find_revindex_position(struct packed_git *p, off_t ofs) return -1; } -int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos) -{ - int ret; - - if (load_pack_revindex(p) < 0) - return -1; - - ret = find_revindex_position(p, ofs); - if (ret < 0) - return -1; - *pos = ret; - return 0; -} - uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos) { if (!p->revindex) diff --git a/pack-revindex.h b/pack-revindex.h index 07c1a7a3c8..b5dd114fd5 100644 --- a/pack-revindex.h +++ b/pack-revindex.h @@ -9,7 +9,6 @@ struct revindex_entry { }; int load_pack_revindex(struct packed_git *p); -int find_revindex_position(struct packed_git *p, off_t ofs); int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos); uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos);
Now that all 'find_revindex_position()' callers have been removed (and converted to the more descriptive 'offset_to_pack_pos()'), it is almost safe to get rid of 'find_revindex_position()' entirely. Almost, except for the fact that 'offset_to_pack_pos()' calls 'find_revindex_position()'. Inline 'find_revindex_position()' into 'offset_to_pack_pos()', and then remove 'find_revindex_position()' entirely. This is a straightforward refactoring with one minor snag. 'offset_to_pack_pos()' used to load the index before calling 'find_revindex_position()'. That means that by the time 'find_revindex_position()' starts executing, 'p->num_objects' can be safely read. After inlining, be careful to not read 'p->num_objects' until _after_ 'load_pack_revindex()' (which loads the index as a side-effect) has been called. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- pack-revindex.c | 29 +++++++++++------------------ pack-revindex.h | 1 - 2 files changed, 11 insertions(+), 19 deletions(-)