diff mbox series

[14/20] unpack_entry(): convert to new revindex API

Message ID 13c49ed40ca72b7ab50939244616f0a90b5bf7f6.1610129796.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series pack-revindex: prepare for on-disk reverse index | expand

Commit Message

Taylor Blau Jan. 8, 2021, 6:17 p.m. UTC
Remove direct manipulation of the 'struct revindex_entry' type as well
as calls to the deprecated API in 'packfile.c:unpack_entry()'. Usual
clean-up is performed (replacing '->nr' with calls to
'pack_pos_to_index()' and so on). Add an additional check to make
sure that 'obj_offset()' points at a valid object.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 packfile.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

Comments

Jeff King Jan. 12, 2021, 9:22 a.m. UTC | #1
On Fri, Jan 08, 2021 at 01:17:39PM -0500, Taylor Blau wrote:

> diff --git a/packfile.c b/packfile.c
> index 469c8d4f57..34467ea4a3 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -1692,11 +1692,19 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
>  		}
>  
>  		if (do_check_packed_object_crc && p->index_version > 1) {
> -			struct revindex_entry *revidx = find_pack_revindex(p, obj_offset);
> -			off_t len = revidx[1].offset - obj_offset;
> -			if (check_pack_crc(p, &w_curs, obj_offset, len, revidx->nr)) {
> +			uint32_t pos, nr;

We have "pos" and "nr". What's the difference? :)

I think pack_pos and index_pos might be harder to get confused.

> +			off_t len;
> +
> +			if (offset_to_pack_pos(p, obj_offset, &pos) < 0) {
> +				data = NULL;
> +				goto out;
> +			}

Nice to see the error check here. As with the previous commit, we
probably want to error(), just as we would for errors below.

Do we also need to call mark_bad_packed_object()? I guess we can't,
because we only have the offset, and not the oid (the code below uses
nth_packed_object_id(), but it is relying on the revindex, which we know
just failed to work).

I'm just wondering if an error here is going to put us into an infinite
loop of retrying the lookup in the same pack over and over. Let's
see...our caller is ultimately packed_object_info(), but it too does not
have the oid. It returns an error up to do_oid_object_info_extended().
Which yes, does mark_bad_packed_object() itself. Good. So I think we are
fine, and arguably these lower-level calls to mark_bad_packed_object()
are not necessary. But they do not hurt either.

-Peff
Taylor Blau Jan. 12, 2021, 4:56 p.m. UTC | #2
On Tue, Jan 12, 2021 at 04:22:26AM -0500, Jeff King wrote:
> We have "pos" and "nr". What's the difference? :)
>
> I think pack_pos and index_pos might be harder to get confused.

Much clearer, thanks.

> > +			off_t len;
> > +
> > +			if (offset_to_pack_pos(p, obj_offset, &pos) < 0) {
> > +				data = NULL;
> > +				goto out;
> > +			}
>
> Nice to see the error check here. As with the previous commit, we
> probably want to error(), just as we would for errors below.

Yup, good call.

> Do we also need to call mark_bad_packed_object()? I guess we can't,
> because we only have the offset, and not the oid (the code below uses
> nth_packed_object_id(), but it is relying on the revindex, which we know
> just failed to work).
>
> I'm just wondering if an error here is going to put us into an infinite
> loop of retrying the lookup in the same pack over and over. Let's
> see...our caller is ultimately packed_object_info(), but it too does not
> have the oid. It returns an error up to do_oid_object_info_extended().
> Which yes, does mark_bad_packed_object() itself. Good. So I think we are
> fine, and arguably these lower-level calls to mark_bad_packed_object()
> are not necessary. But they do not hurt either.

Thanks, this rationale is helpful to have. I included an abridged
version of it in the patch message.

> -Peff

Thanks,
Taylor
diff mbox series

Patch

diff --git a/packfile.c b/packfile.c
index 469c8d4f57..34467ea4a3 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1692,11 +1692,19 @@  void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 		}
 
 		if (do_check_packed_object_crc && p->index_version > 1) {
-			struct revindex_entry *revidx = find_pack_revindex(p, obj_offset);
-			off_t len = revidx[1].offset - obj_offset;
-			if (check_pack_crc(p, &w_curs, obj_offset, len, revidx->nr)) {
+			uint32_t pos, nr;
+			off_t len;
+
+			if (offset_to_pack_pos(p, obj_offset, &pos) < 0) {
+				data = NULL;
+				goto out;
+			}
+
+			len = pack_pos_to_offset(p, pos + 1) - obj_offset;
+			nr = pack_pos_to_index(p, pos);
+			if (check_pack_crc(p, &w_curs, obj_offset, len, nr)) {
 				struct object_id oid;
-				nth_packed_object_id(&oid, p, revidx->nr);
+				nth_packed_object_id(&oid, p, nr);
 				error("bad packed object CRC for %s",
 				      oid_to_hex(&oid));
 				mark_bad_packed_object(p, oid.hash);
@@ -1779,11 +1787,11 @@  void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 			 * This is costly but should happen only in the presence
 			 * of a corrupted pack, and is better than failing outright.
 			 */
-			struct revindex_entry *revidx;
+			uint32_t pos;
 			struct object_id base_oid;
-			revidx = find_pack_revindex(p, obj_offset);
-			if (revidx) {
-				nth_packed_object_id(&base_oid, p, revidx->nr);
+			if (!(offset_to_pack_pos(p, obj_offset, &pos))) {
+				nth_packed_object_id(&base_oid, p,
+						     pack_pos_to_index(p, pos));
 				error("failed to read delta base object %s"
 				      " at offset %"PRIuMAX" from %s",
 				      oid_to_hex(&base_oid), (uintmax_t)obj_offset,