diff mbox series

[02/20] write_reuse_object(): convert to new revindex API

Message ID 00668523e1cd860f6de08dd7c5a2a54edc08b7b6.1610129796.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series pack-revindex: prepare for on-disk reverse index | expand

Commit Message

Taylor Blau Jan. 8, 2021, 6:16 p.m. UTC
First replace 'find_pack_revindex()' with its replacement
'offset_to_pack_pos()'. This prevents any bogus OFS_DELTA that may make
its way through until 'write_reuse_object()' from causing a bad memory
read (if 'revidx' is 'NULL')

Next, replace a direct access of '->nr' with the wrapper function
'pack_pos_to_index()'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

Comments

Jeff King Jan. 12, 2021, 8:47 a.m. UTC | #1
On Fri, Jan 08, 2021 at 01:16:48PM -0500, Taylor Blau wrote:

> First replace 'find_pack_revindex()' with its replacement
> 'offset_to_pack_pos()'. This prevents any bogus OFS_DELTA that may make
> its way through until 'write_reuse_object()' from causing a bad memory
> read (if 'revidx' is 'NULL')

Nice. Better abstraction, and we're catching more errors.

> @@ -436,10 +436,13 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
>  					      type, entry_size);
>  
>  	offset = entry->in_pack_offset;
> -	revidx = find_pack_revindex(p, offset);
> -	datalen = revidx[1].offset - offset;
> +	if (offset_to_pack_pos(p, offset, &pos) < 0)
> +		die(_("write_reuse_object: could not locate %s"),
> +		    oid_to_hex(&entry->idx.oid));

If we believe the offset is bogus, should we print that in the error
message, too? Something like:

  die("could not locate %s, expected at offset %"PRIuMAX" in pack %s",
      oid_to_hex(&entry->idx.oid), (uintmax_t)offset, p->pack_name);

> +	datalen = pack_pos_to_offset(p, pos + 1) - offset;

This "pos + 1" means we may be looking one past the end of the array.
That's OK (at least for now), because our revindex always puts in an
extra dummy value exactly for computing these kinds of byte-distances.
That might be worth documenting in the API header.

-Peff
Taylor Blau Jan. 12, 2021, 4:31 p.m. UTC | #2
On Tue, Jan 12, 2021 at 03:47:47AM -0500, Jeff King wrote:
> > @@ -436,10 +436,13 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
> >  					      type, entry_size);
> >
> >  	offset = entry->in_pack_offset;
> > -	revidx = find_pack_revindex(p, offset);
> > -	datalen = revidx[1].offset - offset;
> > +	if (offset_to_pack_pos(p, offset, &pos) < 0)
> > +		die(_("write_reuse_object: could not locate %s"),
> > +		    oid_to_hex(&entry->idx.oid));
>
> If we believe the offset is bogus, should we print that in the error
> message, too? Something like:
>
>   die("could not locate %s, expected at offset %"PRIuMAX" in pack %s",
>       oid_to_hex(&entry->idx.oid), (uintmax_t)offset, p->pack_name);

Good idea, thanks.

> > +	datalen = pack_pos_to_offset(p, pos + 1) - offset;
>
> This "pos + 1" means we may be looking one past the end of the array.
> That's OK (at least for now), because our revindex always puts in an
> extra dummy value exactly for computing these kinds of byte-distances.
> That might be worth documenting in the API header.

Yeah, I made sure to document that when I was touching up the last
patch. FWIW, that's a behavior that we're going to carry over even when
the reverse index is stored on-disk (not by writing four extra bytes
into the .rev file, but by handling queries for pos == p->num_objects
separately.)

> -Peff

Thanks,
Taylor
Jeff King Jan. 13, 2021, 1:02 p.m. UTC | #3
On Tue, Jan 12, 2021 at 11:31:40AM -0500, Taylor Blau wrote:

> > > +	datalen = pack_pos_to_offset(p, pos + 1) - offset;
> >
> > This "pos + 1" means we may be looking one past the end of the array.
> > That's OK (at least for now), because our revindex always puts in an
> > extra dummy value exactly for computing these kinds of byte-distances.
> > That might be worth documenting in the API header.
> 
> Yeah, I made sure to document that when I was touching up the last
> patch. FWIW, that's a behavior that we're going to carry over even when
> the reverse index is stored on-disk (not by writing four extra bytes
> into the .rev file, but by handling queries for pos == p->num_objects
> separately.)

I think the only reason to look past the end like that is to compute the
size of the final entry. So we _could_ abstract that away from the
callers with a separate function like:

  off_t pack_pos_to_size(struct packed_git *p, uint32_t pos);

But as long as the behavior of passing p->num_objects is documented, I
do not mind overly mind spelling it one way or the other.

-Peff
diff mbox series

Patch

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 2a00358f34..03d25db442 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -419,7 +419,7 @@  static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
 {
 	struct packed_git *p = IN_PACK(entry);
 	struct pack_window *w_curs = NULL;
-	struct revindex_entry *revidx;
+	uint32_t pos;
 	off_t offset;
 	enum object_type type = oe_type(entry);
 	off_t datalen;
@@ -436,10 +436,13 @@  static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
 					      type, entry_size);
 
 	offset = entry->in_pack_offset;
-	revidx = find_pack_revindex(p, offset);
-	datalen = revidx[1].offset - offset;
+	if (offset_to_pack_pos(p, offset, &pos) < 0)
+		die(_("write_reuse_object: could not locate %s"),
+		    oid_to_hex(&entry->idx.oid));
+	datalen = pack_pos_to_offset(p, pos + 1) - offset;
 	if (!pack_to_stdout && p->index_version > 1 &&
-	    check_pack_crc(p, &w_curs, offset, datalen, revidx->nr)) {
+	    check_pack_crc(p, &w_curs, offset, datalen,
+			   pack_pos_to_index(p, pos))) {
 		error(_("bad packed object CRC for %s"),
 		      oid_to_hex(&entry->idx.oid));
 		unuse_pack(&w_curs);