diff mbox series

[v2,15/20] for_each_object_in_pack(): convert to new revindex API

Message ID e7574763513294b71071b032d5cd3aa0976969dd.1610576604.git.me@ttaylorr.com (mailing list archive)
State Accepted
Commit b130aef65e492667229191dfd0689b74d88c6a84
Headers show
Series pack-revindex: prepare for on-disk reverse index | expand

Commit Message

Taylor Blau Jan. 13, 2021, 10:24 p.m. UTC
Avoid looking at the 'revindex' pointer directly and instead call
'pack_pos_to_index()'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 packfile.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Junio C Hamano Jan. 14, 2021, 6:43 a.m. UTC | #1
Taylor Blau <me@ttaylorr.com> writes:

> Avoid looking at the 'revindex' pointer directly and instead call
> 'pack_pos_to_index()'.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  packfile.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/packfile.c b/packfile.c
> index 936ab3def5..7bb1750934 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -2086,7 +2086,7 @@ int for_each_object_in_pack(struct packed_git *p,
>  		struct object_id oid;
>  
>  		if (flags & FOR_EACH_OBJECT_PACK_ORDER)
> -			pos = p->revindex[i].nr;
> +			pos = pack_pos_to_index(p, i);

It wasn't too bad before this series formally defined what
"position", "index" and "offset" mean, but now this has become
highly misleading. The variable "pos" here holds what we consider
"index" while "i" holds what we call "position" [*1*].

>  		else
>  			pos = i;

Perhaps renaming "uint32_t pos" to "nth" would avoid confusion?

-	if (nth_packed_object_id(&oid, p, pos) < 0)
+	if (nth_packed_object_id(&oid, p, nth) < 0)
		return error(...);


[Footnote]

*1* The nth_packed_object_id() call we make later using the value we
obtain here should be documented to take "index" as its last
parameter, now that is what we call the location in the index, which
is in object name order.
Taylor Blau Jan. 14, 2021, 5 p.m. UTC | #2
On Wed, Jan 13, 2021 at 10:43:37PM -0800, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Avoid looking at the 'revindex' pointer directly and instead call
> > 'pack_pos_to_index()'.
> >
> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > ---
> >  packfile.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/packfile.c b/packfile.c
> > index 936ab3def5..7bb1750934 100644
> > --- a/packfile.c
> > +++ b/packfile.c
> > @@ -2086,7 +2086,7 @@ int for_each_object_in_pack(struct packed_git *p,
> >  		struct object_id oid;
> >
> >  		if (flags & FOR_EACH_OBJECT_PACK_ORDER)
> > -			pos = p->revindex[i].nr;
> > +			pos = pack_pos_to_index(p, i);
>
> It wasn't too bad before this series formally defined what
> "position", "index" and "offset" mean, but now this has become
> highly misleading. The variable "pos" here holds what we consider
> "index" while "i" holds what we call "position" [*1*].
>
> >  		else
> >  			pos = i;
>
> Perhaps renaming "uint32_t pos" to "nth" would avoid confusion?

I agree that it can be confusing. Unfortunately in this spot, this
variable really does mean two things. If we set the
FOR_EACH_OBJECT_PACK_ORDER bit in our flags, then the caller really
wants the index position (and the objects to be delivered in pack
order). But if we didn't set it, then the caller wants it in index
order.

> -	if (nth_packed_object_id(&oid, p, pos) < 0)
> +	if (nth_packed_object_id(&oid, p, nth) < 0)
> 		return error(...);

This suggested diff makes me think that you understand all of that, so
I'm mostly saying this for the benefit of others that haven't looked at
this code closely in the recent past.

I'd be happy to send a replacement patch if you would like [1], but I'm
hopeful that this is clear enough since there isn't much code between
the declaration, assignment(s), and use of 'pos'.

Thanks,
Taylor

[1]: I understand your general disdain for single replacement patches,
but I'd like to avoid sending the other 19 patches if possible to avoid
delivering more mail to list subscribers than is necessary.
Jeff King Jan. 14, 2021, 7:33 p.m. UTC | #3
On Wed, Jan 13, 2021 at 10:43:37PM -0800, Junio C Hamano wrote:

> >  		if (flags & FOR_EACH_OBJECT_PACK_ORDER)
> > -			pos = p->revindex[i].nr;
> > +			pos = pack_pos_to_index(p, i);
> 
> It wasn't too bad before this series formally defined what
> "position", "index" and "offset" mean, but now this has become
> highly misleading. The variable "pos" here holds what we consider
> "index" while "i" holds what we call "position" [*1*].

I don't think "position" is a meaningful term by itself. I would say the
useful terms are "pack position", "index position", and "offset" (or
"pack offset" if you like). I don't think anything in the definitions
added by earlier patches contradicts that, but perhaps we can make it
more clear.

So "pos" in this case is not wrong. But I agree that it could stand to
be more clear. Saying "nth" does not help things IMHO (there is an "nth"
pack position, as well).

But maybe this makes it more clear (or possibly just the name change
without the comment):

diff --git a/packfile.c b/packfile.c
index de47c9f4f8..6035b80466 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2078,19 +2078,30 @@ int for_each_object_in_pack(struct packed_git *p,
 	}
 
 	for (i = 0; i < p->num_objects; i++) {
-		uint32_t pos;
+		uint32_t index_pos;
 		struct object_id oid;
 
+		/*
+		 * We are iterating "i" from 0 up to num_objects, but its
+		 * meaning may be different:
+		 *
+		 *   - in object-name order, it is the same as the index order
+		 *     given to us by nth_packed_object_id(), and we can use it
+		 *     directly
+		 *
+		 *   - in pack-order, it is pack position, which we must
+		 *     convert to an index position in order to get the oid.
+		 */
 		if (flags & FOR_EACH_OBJECT_PACK_ORDER)
-			pos = p->revindex[i].nr;
+			index_pos = p->revindex[i].nr;
 		else
-			pos = i;
+			index_pos = i;
 
-		if (nth_packed_object_id(&oid, p, pos) < 0)
+		if (nth_packed_object_id(&oid, p, index_pos) < 0)
 			return error("unable to get sha1 of object %u in %s",
-				     pos, p->pack_name);
+				     index_pos, p->pack_name);
 
-		r = cb(&oid, p, pos, data);
+		r = cb(&oid, p, index_pos, data);
 		if (r)
 			break;
 	}


> *1* The nth_packed_object_id() call we make later using the value we
> obtain here should be documented to take "index" as its last
> parameter, now that is what we call the location in the index, which
> is in object name order.

I would love to see the function given a more descriptive name. Having
worked on the bitmap code a lot, where the norm is pack-order, saying
"nth" is confusing and error-prone.

But I think that's out of scope for this series.

-Peff
Jeff King Jan. 14, 2021, 8:11 p.m. UTC | #4
On Thu, Jan 14, 2021 at 02:33:53PM -0500, Jeff King wrote:

> So "pos" in this case is not wrong. But I agree that it could stand to
> be more clear. Saying "nth" does not help things IMHO (there is an "nth"
> pack position, as well).
> 
> But maybe this makes it more clear (or possibly just the name change
> without the comment):

Here it is again, but with a signoff and commit message, and done on top
of your series (so if we agree this is a good resolution, it can just be
picked up on top, but I am also happy for it to be squashed into patch
15).

-- >8 --
Subject: [PATCH] for_each_object_in_pack(): clarify pack vs index ordering

We may return objects in one of two orders: how they appear in the .idx
(sorted by object id) or how they appear in the packfile itself. To
further complicate matters, we have two ordering variables, "i" and
"pos", and it is not clear to which order they apply.

Let's clarify this by using an unambiguous name where possible, and
leaving a comment for the variable that does double-duty.

Signed-off-by: Jeff King <peff@peff.net>
---
 packfile.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/packfile.c b/packfile.c
index 7bb1750934..35d50e2c38 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2082,19 +2082,31 @@ int for_each_object_in_pack(struct packed_git *p,
 	}
 
 	for (i = 0; i < p->num_objects; i++) {
-		uint32_t pos;
+		uint32_t index_pos;
 		struct object_id oid;
 
+		/*
+		 * We are iterating "i" from 0 up to num_objects, but its
+		 * meaning may be different, depending on the requested output
+		 * order:
+		 *
+		 *   - in object-name order, it is the same as the index order
+		 *     used by nth_packed_object_id(), so we can pass it
+		 *     directly
+		 *
+		 *   - in pack-order, it is pack position, which we must
+		 *     convert to an index position in order to get the oid.
+		 */
 		if (flags & FOR_EACH_OBJECT_PACK_ORDER)
-			pos = pack_pos_to_index(p, i);
+			index_pos = pack_pos_to_index(p, i);
 		else
-			pos = i;
+			index_pos = i;
 
-		if (nth_packed_object_id(&oid, p, pos) < 0)
+		if (nth_packed_object_id(&oid, p, index_pos) < 0)
 			return error("unable to get sha1 of object %u in %s",
-				     pos, p->pack_name);
+				     index_pos, p->pack_name);
 
-		r = cb(&oid, p, pos, data);
+		r = cb(&oid, p, index_pos, data);
 		if (r)
 			break;
 	}
Taylor Blau Jan. 14, 2021, 8:15 p.m. UTC | #5
On Thu, Jan 14, 2021 at 03:11:10PM -0500, Jeff King wrote:
> On Thu, Jan 14, 2021 at 02:33:53PM -0500, Jeff King wrote:
>
> > So "pos" in this case is not wrong. But I agree that it could stand to
> > be more clear. Saying "nth" does not help things IMHO (there is an "nth"
> > pack position, as well).
> >
> > But maybe this makes it more clear (or possibly just the name change
> > without the comment):
>
> Here it is again, but with a signoff and commit message, and done on top
> of your series (so if we agree this is a good resolution, it can just be
> picked up on top, but I am also happy for it to be squashed into patch
> 15).

Much appreciated. This looks good to me (and I have no opinion whether
it is picked up on top, or squashed into patch 15).

  Acked-by: Taylor Blau <me@ttaylorr.com>

Thanks,
Taylor
Junio C Hamano Jan. 14, 2021, 8:51 p.m. UTC | #6
Jeff King <peff@peff.net> writes:

>  	for (i = 0; i < p->num_objects; i++) {
> -		uint32_t pos;
> +		uint32_t index_pos;
> ...
>> *1* The nth_packed_object_id() call we make later using the value we
>> obtain here should be documented to take "index" as its last
>> parameter, now that is what we call the location in the index, which
>> is in object name order.
>
> I would love to see the function given a more descriptive name. Having
> worked on the bitmap code a lot, where the norm is pack-order, saying
> "nth" is confusing and error-prone.
>
> But I think that's out of scope for this series.

Yeah, an explicit index_pos (vs pack_order_pos) would be good names
to use, and nth_packed_object_id() can also use somewhere in its
name to hint that it is about the object name order, but I agree
that both are outside the scope of this series.

Thanks.
Junio C Hamano Jan. 15, 2021, 2:22 a.m. UTC | #7
Taylor Blau <me@ttaylorr.com> writes:

> On Thu, Jan 14, 2021 at 03:11:10PM -0500, Jeff King wrote:
>> On Thu, Jan 14, 2021 at 02:33:53PM -0500, Jeff King wrote:
>>
>> > So "pos" in this case is not wrong. But I agree that it could stand to
>> > be more clear. Saying "nth" does not help things IMHO (there is an "nth"
>> > pack position, as well).
>> >
>> > But maybe this makes it more clear (or possibly just the name change
>> > without the comment):
>>
>> Here it is again, but with a signoff and commit message, and done on top
>> of your series (so if we agree this is a good resolution, it can just be
>> picked up on top, but I am also happy for it to be squashed into patch
>> 15).
>
> Much appreciated. This looks good to me (and I have no opinion whether
> it is picked up on top, or squashed into patch 15).
>
>   Acked-by: Taylor Blau <me@ttaylorr.com>

OK, so I'll make this 21/20 and rebase the other one...
Taylor Blau Jan. 15, 2021, 2:29 a.m. UTC | #8
On Thu, Jan 14, 2021 at 06:22:56PM -0800, Junio C Hamano wrote:
> > Much appreciated. This looks good to me (and I have no opinion whether
> > it is picked up on top, or squashed into patch 15).
> >
> >   Acked-by: Taylor Blau <me@ttaylorr.com>
>
> OK, so I'll make this 21/20 and rebase the other one...

I'm happy if you want to apply this separately on top of both series,
since I think we all agree that this isn't strictly necessary to go
forward.

That said, if you do make this 21/20 and the rebase of the second series
requires any intervention, please let me know and I'll be happy to send
out a version myself.

In the meantime, I don't mind if you want to eject the latter topic from
seen while it picks up review. The alternative (keeping it in and
rebasing it forward), of course, is much appreciated.


Thanks,
Taylor
diff mbox series

Patch

diff --git a/packfile.c b/packfile.c
index 936ab3def5..7bb1750934 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2086,7 +2086,7 @@  int for_each_object_in_pack(struct packed_git *p,
 		struct object_id oid;
 
 		if (flags & FOR_EACH_OBJECT_PACK_ORDER)
-			pos = p->revindex[i].nr;
+			pos = pack_pos_to_index(p, i);
 		else
 			pos = i;