Message ID | e7574763513294b71071b032d5cd3aa0976969dd.1610576604.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Accepted |
Commit | b130aef65e492667229191dfd0689b74d88c6a84 |
Headers | show |
Series | pack-revindex: prepare for on-disk reverse index | expand |
Taylor Blau <me@ttaylorr.com> writes: > Avoid looking at the 'revindex' pointer directly and instead call > 'pack_pos_to_index()'. > > Signed-off-by: Taylor Blau <me@ttaylorr.com> > --- > packfile.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/packfile.c b/packfile.c > index 936ab3def5..7bb1750934 100644 > --- a/packfile.c > +++ b/packfile.c > @@ -2086,7 +2086,7 @@ int for_each_object_in_pack(struct packed_git *p, > struct object_id oid; > > if (flags & FOR_EACH_OBJECT_PACK_ORDER) > - pos = p->revindex[i].nr; > + pos = pack_pos_to_index(p, i); It wasn't too bad before this series formally defined what "position", "index" and "offset" mean, but now this has become highly misleading. The variable "pos" here holds what we consider "index" while "i" holds what we call "position" [*1*]. > else > pos = i; Perhaps renaming "uint32_t pos" to "nth" would avoid confusion? - if (nth_packed_object_id(&oid, p, pos) < 0) + if (nth_packed_object_id(&oid, p, nth) < 0) return error(...); [Footnote] *1* The nth_packed_object_id() call we make later using the value we obtain here should be documented to take "index" as its last parameter, now that is what we call the location in the index, which is in object name order.
On Wed, Jan 13, 2021 at 10:43:37PM -0800, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > > > Avoid looking at the 'revindex' pointer directly and instead call > > 'pack_pos_to_index()'. > > > > Signed-off-by: Taylor Blau <me@ttaylorr.com> > > --- > > packfile.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/packfile.c b/packfile.c > > index 936ab3def5..7bb1750934 100644 > > --- a/packfile.c > > +++ b/packfile.c > > @@ -2086,7 +2086,7 @@ int for_each_object_in_pack(struct packed_git *p, > > struct object_id oid; > > > > if (flags & FOR_EACH_OBJECT_PACK_ORDER) > > - pos = p->revindex[i].nr; > > + pos = pack_pos_to_index(p, i); > > It wasn't too bad before this series formally defined what > "position", "index" and "offset" mean, but now this has become > highly misleading. The variable "pos" here holds what we consider > "index" while "i" holds what we call "position" [*1*]. > > > else > > pos = i; > > Perhaps renaming "uint32_t pos" to "nth" would avoid confusion? I agree that it can be confusing. Unfortunately in this spot, this variable really does mean two things. If we set the FOR_EACH_OBJECT_PACK_ORDER bit in our flags, then the caller really wants the index position (and the objects to be delivered in pack order). But if we didn't set it, then the caller wants it in index order. > - if (nth_packed_object_id(&oid, p, pos) < 0) > + if (nth_packed_object_id(&oid, p, nth) < 0) > return error(...); This suggested diff makes me think that you understand all of that, so I'm mostly saying this for the benefit of others that haven't looked at this code closely in the recent past. I'd be happy to send a replacement patch if you would like [1], but I'm hopeful that this is clear enough since there isn't much code between the declaration, assignment(s), and use of 'pos'. Thanks, Taylor [1]: I understand your general disdain for single replacement patches, but I'd like to avoid sending the other 19 patches if possible to avoid delivering more mail to list subscribers than is necessary.
On Wed, Jan 13, 2021 at 10:43:37PM -0800, Junio C Hamano wrote: > > if (flags & FOR_EACH_OBJECT_PACK_ORDER) > > - pos = p->revindex[i].nr; > > + pos = pack_pos_to_index(p, i); > > It wasn't too bad before this series formally defined what > "position", "index" and "offset" mean, but now this has become > highly misleading. The variable "pos" here holds what we consider > "index" while "i" holds what we call "position" [*1*]. I don't think "position" is a meaningful term by itself. I would say the useful terms are "pack position", "index position", and "offset" (or "pack offset" if you like). I don't think anything in the definitions added by earlier patches contradicts that, but perhaps we can make it more clear. So "pos" in this case is not wrong. But I agree that it could stand to be more clear. Saying "nth" does not help things IMHO (there is an "nth" pack position, as well). But maybe this makes it more clear (or possibly just the name change without the comment): diff --git a/packfile.c b/packfile.c index de47c9f4f8..6035b80466 100644 --- a/packfile.c +++ b/packfile.c @@ -2078,19 +2078,30 @@ int for_each_object_in_pack(struct packed_git *p, } for (i = 0; i < p->num_objects; i++) { - uint32_t pos; + uint32_t index_pos; struct object_id oid; + /* + * We are iterating "i" from 0 up to num_objects, but its + * meaning may be different: + * + * - in object-name order, it is the same as the index order + * given to us by nth_packed_object_id(), and we can use it + * directly + * + * - in pack-order, it is pack position, which we must + * convert to an index position in order to get the oid. + */ if (flags & FOR_EACH_OBJECT_PACK_ORDER) - pos = p->revindex[i].nr; + index_pos = p->revindex[i].nr; else - pos = i; + index_pos = i; - if (nth_packed_object_id(&oid, p, pos) < 0) + if (nth_packed_object_id(&oid, p, index_pos) < 0) return error("unable to get sha1 of object %u in %s", - pos, p->pack_name); + index_pos, p->pack_name); - r = cb(&oid, p, pos, data); + r = cb(&oid, p, index_pos, data); if (r) break; } > *1* The nth_packed_object_id() call we make later using the value we > obtain here should be documented to take "index" as its last > parameter, now that is what we call the location in the index, which > is in object name order. I would love to see the function given a more descriptive name. Having worked on the bitmap code a lot, where the norm is pack-order, saying "nth" is confusing and error-prone. But I think that's out of scope for this series. -Peff
On Thu, Jan 14, 2021 at 02:33:53PM -0500, Jeff King wrote: > So "pos" in this case is not wrong. But I agree that it could stand to > be more clear. Saying "nth" does not help things IMHO (there is an "nth" > pack position, as well). > > But maybe this makes it more clear (or possibly just the name change > without the comment): Here it is again, but with a signoff and commit message, and done on top of your series (so if we agree this is a good resolution, it can just be picked up on top, but I am also happy for it to be squashed into patch 15). -- >8 -- Subject: [PATCH] for_each_object_in_pack(): clarify pack vs index ordering We may return objects in one of two orders: how they appear in the .idx (sorted by object id) or how they appear in the packfile itself. To further complicate matters, we have two ordering variables, "i" and "pos", and it is not clear to which order they apply. Let's clarify this by using an unambiguous name where possible, and leaving a comment for the variable that does double-duty. Signed-off-by: Jeff King <peff@peff.net> --- packfile.c | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/packfile.c b/packfile.c index 7bb1750934..35d50e2c38 100644 --- a/packfile.c +++ b/packfile.c @@ -2082,19 +2082,31 @@ int for_each_object_in_pack(struct packed_git *p, } for (i = 0; i < p->num_objects; i++) { - uint32_t pos; + uint32_t index_pos; struct object_id oid; + /* + * We are iterating "i" from 0 up to num_objects, but its + * meaning may be different, depending on the requested output + * order: + * + * - in object-name order, it is the same as the index order + * used by nth_packed_object_id(), so we can pass it + * directly + * + * - in pack-order, it is pack position, which we must + * convert to an index position in order to get the oid. + */ if (flags & FOR_EACH_OBJECT_PACK_ORDER) - pos = pack_pos_to_index(p, i); + index_pos = pack_pos_to_index(p, i); else - pos = i; + index_pos = i; - if (nth_packed_object_id(&oid, p, pos) < 0) + if (nth_packed_object_id(&oid, p, index_pos) < 0) return error("unable to get sha1 of object %u in %s", - pos, p->pack_name); + index_pos, p->pack_name); - r = cb(&oid, p, pos, data); + r = cb(&oid, p, index_pos, data); if (r) break; }
On Thu, Jan 14, 2021 at 03:11:10PM -0500, Jeff King wrote: > On Thu, Jan 14, 2021 at 02:33:53PM -0500, Jeff King wrote: > > > So "pos" in this case is not wrong. But I agree that it could stand to > > be more clear. Saying "nth" does not help things IMHO (there is an "nth" > > pack position, as well). > > > > But maybe this makes it more clear (or possibly just the name change > > without the comment): > > Here it is again, but with a signoff and commit message, and done on top > of your series (so if we agree this is a good resolution, it can just be > picked up on top, but I am also happy for it to be squashed into patch > 15). Much appreciated. This looks good to me (and I have no opinion whether it is picked up on top, or squashed into patch 15). Acked-by: Taylor Blau <me@ttaylorr.com> Thanks, Taylor
Jeff King <peff@peff.net> writes: > for (i = 0; i < p->num_objects; i++) { > - uint32_t pos; > + uint32_t index_pos; > ... >> *1* The nth_packed_object_id() call we make later using the value we >> obtain here should be documented to take "index" as its last >> parameter, now that is what we call the location in the index, which >> is in object name order. > > I would love to see the function given a more descriptive name. Having > worked on the bitmap code a lot, where the norm is pack-order, saying > "nth" is confusing and error-prone. > > But I think that's out of scope for this series. Yeah, an explicit index_pos (vs pack_order_pos) would be good names to use, and nth_packed_object_id() can also use somewhere in its name to hint that it is about the object name order, but I agree that both are outside the scope of this series. Thanks.
Taylor Blau <me@ttaylorr.com> writes: > On Thu, Jan 14, 2021 at 03:11:10PM -0500, Jeff King wrote: >> On Thu, Jan 14, 2021 at 02:33:53PM -0500, Jeff King wrote: >> >> > So "pos" in this case is not wrong. But I agree that it could stand to >> > be more clear. Saying "nth" does not help things IMHO (there is an "nth" >> > pack position, as well). >> > >> > But maybe this makes it more clear (or possibly just the name change >> > without the comment): >> >> Here it is again, but with a signoff and commit message, and done on top >> of your series (so if we agree this is a good resolution, it can just be >> picked up on top, but I am also happy for it to be squashed into patch >> 15). > > Much appreciated. This looks good to me (and I have no opinion whether > it is picked up on top, or squashed into patch 15). > > Acked-by: Taylor Blau <me@ttaylorr.com> OK, so I'll make this 21/20 and rebase the other one...
On Thu, Jan 14, 2021 at 06:22:56PM -0800, Junio C Hamano wrote: > > Much appreciated. This looks good to me (and I have no opinion whether > > it is picked up on top, or squashed into patch 15). > > > > Acked-by: Taylor Blau <me@ttaylorr.com> > > OK, so I'll make this 21/20 and rebase the other one... I'm happy if you want to apply this separately on top of both series, since I think we all agree that this isn't strictly necessary to go forward. That said, if you do make this 21/20 and the rebase of the second series requires any intervention, please let me know and I'll be happy to send out a version myself. In the meantime, I don't mind if you want to eject the latter topic from seen while it picks up review. The alternative (keeping it in and rebasing it forward), of course, is much appreciated. Thanks, Taylor
diff --git a/packfile.c b/packfile.c index 936ab3def5..7bb1750934 100644 --- a/packfile.c +++ b/packfile.c @@ -2086,7 +2086,7 @@ int for_each_object_in_pack(struct packed_git *p, struct object_id oid; if (flags & FOR_EACH_OBJECT_PACK_ORDER) - pos = p->revindex[i].nr; + pos = pack_pos_to_index(p, i); else pos = i;
Avoid looking at the 'revindex' pointer directly and instead call 'pack_pos_to_index()'. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- packfile.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)