diff mbox series

[18/20] pack-revindex: remove unused 'find_revindex_position()'

Message ID d60411d524656f4680ac578765b2a8704325a060.1610129796.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series pack-revindex: prepare for on-disk reverse index | expand

Commit Message

Taylor Blau Jan. 8, 2021, 6:17 p.m. UTC
Now that all 'find_revindex_position()' callers have been removed (and
converted to the more descriptive 'offset_to_pack_pos()'), it is almost
safe to get rid of 'find_revindex_position()' entirely. Almost, except
for the fact that 'offset_to_pack_pos()' calls
'find_revindex_position()'.

Inline 'find_revindex_position()' into 'offset_to_pack_pos()', and
then remove 'find_revindex_position()' entirely.

This is a straightforward refactoring with one minor snag.
'offset_to_pack_pos()' used to load the index before calling
'find_revindex_position()'. That means that by the time
'find_revindex_position()' starts executing, 'p->num_objects' can be
safely read. After inlining, be careful to not read 'p->num_objects'
until _after_ 'load_pack_revindex()' (which loads the index as a
side-effect) has been called.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-revindex.c | 29 +++++++++++------------------
 pack-revindex.h |  1 -
 2 files changed, 11 insertions(+), 19 deletions(-)

Comments

Derrick Stolee Jan. 11, 2021, 11:57 a.m. UTC | #1
On 1/8/2021 1:17 PM, Taylor Blau wrote:
> -int find_revindex_position(struct packed_git *p, off_t ofs)
> +int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
>  {
>  	int lo = 0;
> -	int hi = p->num_objects + 1;
> -	const struct revindex_entry *revindex = p->revindex;
> +	int hi;
> +	const struct revindex_entry *revindex;
> +
> +	if (load_pack_revindex(p) < 0)
> +		return -1;
> +
> +	hi = p->num_objects + 1;
> +	revindex = p->revindex;
>  
>  	do {
>  		const unsigned mi = lo + (hi - lo) / 2;
>  		if (revindex[mi].offset == ofs) {
> -			return mi;
> +			*pos = mi;
> +			return 0;
>  		} else if (ofs < revindex[mi].offset)
>  			hi = mi;
>  		else
> @@ -189,20 +196,6 @@ int find_revindex_position(struct packed_git *p, off_t ofs)
>  	return -1;
>  }
>  
> -int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
> -{
> -	int ret;
> -
> -	if (load_pack_revindex(p) < 0)
> -		return -1;
> -
> -	ret = find_revindex_position(p, ofs);
> -	if (ret < 0)
> -		return -1;
> -	*pos = ret;
> -	return 0;
> -}
> -

Not that this is new to the current patch, but this patch made me
wonder if we should initialize *pos = -1 in the case of a failure
to find the position? A correct caller should not use the value
if they are checking for the fail-to-find case properly. But, I
could see someone making a mistake and having trouble diagnosing
the problem because their position variable was initialized to
zero or a previous successful case.

Thanks,
-Stolee
Taylor Blau Jan. 11, 2021, 4:27 p.m. UTC | #2
On Mon, Jan 11, 2021 at 06:57:00AM -0500, Derrick Stolee wrote:
> On 1/8/2021 1:17 PM, Taylor Blau wrote:
> > -int find_revindex_position(struct packed_git *p, off_t ofs)
> > +int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
> >  {
> >  	int lo = 0;
> > -	int hi = p->num_objects + 1;
> > -	const struct revindex_entry *revindex = p->revindex;
> > +	int hi;
> > +	const struct revindex_entry *revindex;
> > +
> > +	if (load_pack_revindex(p) < 0)
> > +		return -1;
> > +
> > +	hi = p->num_objects + 1;
> > +	revindex = p->revindex;
> >
> >  	do {
> >  		const unsigned mi = lo + (hi - lo) / 2;
> >  		if (revindex[mi].offset == ofs) {
> > -			return mi;
> > +			*pos = mi;
> > +			return 0;
> >  		} else if (ofs < revindex[mi].offset)
> >  			hi = mi;
> >  		else
> > @@ -189,20 +196,6 @@ int find_revindex_position(struct packed_git *p, off_t ofs)
> >  	return -1;
> >  }
> >
> > -int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
> > -{
> > -	int ret;
> > -
> > -	if (load_pack_revindex(p) < 0)
> > -		return -1;
> > -
> > -	ret = find_revindex_position(p, ofs);
> > -	if (ret < 0)
> > -		return -1;
> > -	*pos = ret;
> > -	return 0;
> > -}
> > -
>
> Not that this is new to the current patch, but this patch made me
> wonder if we should initialize *pos = -1 in the case of a failure
> to find the position? A correct caller should not use the value
> if they are checking for the fail-to-find case properly. But, I
> could see someone making a mistake and having trouble diagnosing
> the problem because their position variable was initialized to
> zero or a previous successful case.

*pos = -1 may be more confusing than clarifying since pos is unsigned.

It would be nice if there was a clear signal beyond returning a negative
value. I guess you could take a double pointer here which would allow
you to assign NULL, but that feels rather cumbersome as a means to catch
callers who failed to check the return value.

It does raise the argument of whether or not we should allow the
program to continue at all if 'ret < 0' (i.e., 'offset_to_pack_pos()'
either 'die()'s or returns a usable uint32_t), but I'm OK with the
current behavior.

> Thanks,
> -Stolee

Thanks,
Taylor
Derrick Stolee Jan. 11, 2021, 5:11 p.m. UTC | #3
On 1/11/2021 11:27 AM, Taylor Blau wrote:
> On Mon, Jan 11, 2021 at 06:57:00AM -0500, Derrick Stolee wrote:
>> Not that this is new to the current patch, but this patch made me
>> wonder if we should initialize *pos = -1 in the case of a failure
>> to find the position? A correct caller should not use the value
>> if they are checking for the fail-to-find case properly. But, I
>> could see someone making a mistake and having trouble diagnosing
>> the problem because their position variable was initialized to
>> zero or a previous successful case.
> 
> *pos = -1 may be more confusing than clarifying since pos is unsigned.

RIGHT. My bad.

> It would be nice if there was a clear signal beyond returning a negative
> value. I guess you could take a double pointer here which would allow
> you to assign NULL, but that feels rather cumbersome as a means to catch
> callers who failed to check the return value.
> 
> It does raise the argument of whether or not we should allow the
> program to continue at all if 'ret < 0' (i.e., 'offset_to_pack_pos()'
> either 'die()'s or returns a usable uint32_t), but I'm OK with the
> current behavior.

I was thinking "*pos = -1" was a free way to "help" a developer
who uses the API incorrectly, but it's _not_ free. Ignore me.

Thanks,
-Stolee
Jeff King Jan. 12, 2021, 9:32 a.m. UTC | #4
On Fri, Jan 08, 2021 at 01:17:57PM -0500, Taylor Blau wrote:

> Now that all 'find_revindex_position()' callers have been removed (and
> converted to the more descriptive 'offset_to_pack_pos()'), it is almost
> safe to get rid of 'find_revindex_position()' entirely. Almost, except
> for the fact that 'offset_to_pack_pos()' calls
> 'find_revindex_position()'.
> 
> Inline 'find_revindex_position()' into 'offset_to_pack_pos()', and
> then remove 'find_revindex_position()' entirely.

Sounds good.

> This is a straightforward refactoring with one minor snag.
> 'offset_to_pack_pos()' used to load the index before calling
> 'find_revindex_position()'. That means that by the time
> 'find_revindex_position()' starts executing, 'p->num_objects' can be
> safely read. After inlining, be careful to not read 'p->num_objects'
> until _after_ 'load_pack_revindex()' (which loads the index as a
> side-effect) has been called.

Good catch. We might want to drop the initialization of "lo":

>  	int lo = 0;
> -	int hi = p->num_objects + 1;

down to here:

> +	hi = p->num_objects + 1;

to maintain symmetry (though it's quite a minor point).

I notice these are signed ints, but we've taken care to use uint32_t
elsewhere for positions. Shouldn't these be uint32_t, also (or at least
unsigned)?

-Peff
Taylor Blau Jan. 12, 2021, 4:59 p.m. UTC | #5
On Tue, Jan 12, 2021 at 04:32:39AM -0500, Jeff King wrote:
> Good catch. We might want to drop the initialization of "lo":
>
> >  	int lo = 0;
> > -	int hi = p->num_objects + 1;
>
> down to here:
>
> > +	hi = p->num_objects + 1;

> to maintain symmetry (though it's quite a minor point).

:-). I agree it's a minor point, but I think that the symmetry is nice,
so it's worth doing.

> I notice these are signed ints, but we've taken care to use uint32_t
> elsewhere for positions. Shouldn't these be uint32_t, also (or at least
> unsigned)?

I'll let both of these be an raw unsigned, since the midpoint is already
labeled as an unsigned.

> -Peff

Thanks,
Taylor
Jeff King Jan. 13, 2021, 1:05 p.m. UTC | #6
On Tue, Jan 12, 2021 at 11:59:42AM -0500, Taylor Blau wrote:

> > I notice these are signed ints, but we've taken care to use uint32_t
> > elsewhere for positions. Shouldn't these be uint32_t, also (or at least
> > unsigned)?
> 
> I'll let both of these be an raw unsigned, since the midpoint is already
> labeled as an unsigned.

Yeah, I found that existing code doubly weird, since "mi" must by
definition be bounded by "lo" and "hi". :)

Looks like the bug was introduced in 92e5c77c37 (revindex: export new
APIs, 2013-10-24), which hacked up the existing loop into a new
function. We should have caught it then (back then we were a lot less
careful about types, IMHO).

Also, the subject line of that commit is giving me deja vu. ;)

-Peff
diff mbox series

Patch

diff --git a/pack-revindex.c b/pack-revindex.c
index 4e42238906..9392c4be73 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -169,16 +169,23 @@  int load_pack_revindex(struct packed_git *p)
 	return 0;
 }
 
-int find_revindex_position(struct packed_git *p, off_t ofs)
+int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
 {
 	int lo = 0;
-	int hi = p->num_objects + 1;
-	const struct revindex_entry *revindex = p->revindex;
+	int hi;
+	const struct revindex_entry *revindex;
+
+	if (load_pack_revindex(p) < 0)
+		return -1;
+
+	hi = p->num_objects + 1;
+	revindex = p->revindex;
 
 	do {
 		const unsigned mi = lo + (hi - lo) / 2;
 		if (revindex[mi].offset == ofs) {
-			return mi;
+			*pos = mi;
+			return 0;
 		} else if (ofs < revindex[mi].offset)
 			hi = mi;
 		else
@@ -189,20 +196,6 @@  int find_revindex_position(struct packed_git *p, off_t ofs)
 	return -1;
 }
 
-int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
-{
-	int ret;
-
-	if (load_pack_revindex(p) < 0)
-		return -1;
-
-	ret = find_revindex_position(p, ofs);
-	if (ret < 0)
-		return -1;
-	*pos = ret;
-	return 0;
-}
-
 uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos)
 {
 	if (!p->revindex)
diff --git a/pack-revindex.h b/pack-revindex.h
index 07c1a7a3c8..b5dd114fd5 100644
--- a/pack-revindex.h
+++ b/pack-revindex.h
@@ -9,7 +9,6 @@  struct revindex_entry {
 };
 
 int load_pack_revindex(struct packed_git *p);
-int find_revindex_position(struct packed_git *p, off_t ofs);
 
 int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos);
 uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos);