diff mbox series

[25/44] packfile: compute and use the index CRC offset

Message ID 20200513005424.81369-26-sandals@crustytoothpaste.net (mailing list archive)
State New, archived
Headers show
Series SHA-256 part 2/3: protocol functionality | expand

Commit Message

brian m. carlson May 13, 2020, 12:54 a.m. UTC
Both v2 pack index files and the v3 format specified as part of the
NewHash work have similar data starting at the CRC table.  Much of the
existing code wants to read either this table or the offset entries
following it, and in doing so computes the offset each time.

In order to share as much code between v2 and v3, compute the offset of
the CRC table and store it when the pack is opened.  Use this value to
compute offsets to not only the CRC table, but to the offset entries
beyond it.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/index-pack.c | 6 +-----
 object-store.h       | 1 +
 packfile.c           | 1 +
 3 files changed, 3 insertions(+), 5 deletions(-)

Comments

Martin Ă…gren May 16, 2020, 11:12 a.m. UTC | #1
On Wed, 13 May 2020 at 02:56, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> Both v2 pack index files and the v3 format specified as part of the
> NewHash work have similar data starting at the CRC table.  Much of the
> existing code wants to read either this table or the offset entries
> following it, and in doing so computes the offset each time.
>
> In order to share as much code between v2 and v3, compute the offset of
> the CRC table and store it when the pack is opened.  Use this value to
> compute offsets to not only the CRC table, but to the offset entries
> beyond it.

> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -1555,13 +1555,9 @@ static void read_v2_anomalous_offsets(struct packed_git *p,
>  {
>         const uint32_t *idx1, *idx2;
>         uint32_t i;
> -       const uint32_t hashwords = the_hash_algo->rawsz / sizeof(uint32_t);
>
>         /* The address of the 4-byte offset table */
> -       idx1 = (((const uint32_t *)p->index_data)
> -               + 2 /* 8-byte header */
> -               + 256 /* fan out */
> -               + hashwords * p->num_objects /* object ID table */
> +       idx1 = (((const uint32_t *)((const uint8_t *)p->index_data + p->crc_offset))
>                 + p->num_objects /* CRC32 table */
>                 );

This counts in four-byte words (so `+ 2` skips ahead 8B as the comment
notes). And that's why we need to use "rawsz/4".

Not new in this patch, but that outer pair of parenthesis just makes
this harder to read, IMHO. I keep scanning back and forth wondering,
"where is this whole thing going to get multiplied or something?"

  idx1 = (const uint32_t *)((const uint8_t *)p->index_data + p->crc_offset)
         + p->num_objects /* CRC32 table */;

The double-casting can be avoided with something like this, but I'm not
sure it's really any better:

  idx1 = (const uint32_t *)p->index_data
         + p->crc_offset/sizeof(uint32_t)
         + p->num_objects /* CRC32 table */;

> --- a/packfile.c
> +++ b/packfile.c
> @@ -178,6 +178,7 @@ int load_idx(const char *path, const unsigned int hashsz, void *idx_map,
>                      */
>                     (sizeof(off_t) <= 4))
>                         return error("pack too large for current definition of off_t in %s", path);
> +               p->crc_offset = 8 + 4 * 256 + nr * hashsz;
>         }
>
>         p->index_version = version;

It doesn't fit in the context, but `nr` will be assigned to
`p->num_objects`. And now we can just use `hashsz` without dividing by
4, so this does the same calculation as the old one above.


Martin
diff mbox series

Patch

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index f176dd28c8..7bea1fba52 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1555,13 +1555,9 @@  static void read_v2_anomalous_offsets(struct packed_git *p,
 {
 	const uint32_t *idx1, *idx2;
 	uint32_t i;
-	const uint32_t hashwords = the_hash_algo->rawsz / sizeof(uint32_t);
 
 	/* The address of the 4-byte offset table */
-	idx1 = (((const uint32_t *)p->index_data)
-		+ 2 /* 8-byte header */
-		+ 256 /* fan out */
-		+ hashwords * p->num_objects /* object ID table */
+	idx1 = (((const uint32_t *)((const uint8_t *)p->index_data + p->crc_offset))
 		+ p->num_objects /* CRC32 table */
 		);
 
diff --git a/object-store.h b/object-store.h
index d1e490f203..f439d47af8 100644
--- a/object-store.h
+++ b/object-store.h
@@ -70,6 +70,7 @@  struct packed_git {
 	size_t index_size;
 	uint32_t num_objects;
 	uint32_t num_bad_objects;
+	uint32_t crc_offset;
 	unsigned char *bad_object_sha1;
 	int index_version;
 	time_t mtime;
diff --git a/packfile.c b/packfile.c
index f4e752996d..6ab5233613 100644
--- a/packfile.c
+++ b/packfile.c
@@ -178,6 +178,7 @@  int load_idx(const char *path, const unsigned int hashsz, void *idx_map,
 		     */
 		    (sizeof(off_t) <= 4))
 			return error("pack too large for current definition of off_t in %s", path);
+		p->crc_offset = 8 + 4 * 256 + nr * hashsz;
 	}
 
 	p->index_version = version;