Message ID | 20190818200427.870753-27-sandals@crustytoothpaste.net (mailing list archive) |
---|---|
State | New, archived |
Series | object_id part 17 |
On 8/18/2019 4:04 PM, brian m. carlson wrote:
> Instead of hard-coding the hash size, use the_hash_algo to look up the
> hash size at runtime. Remove the #define constant which was used to
> hold the hash length, since writing the expression with the_hash_algo
> provide enough documentary value on its own.

Thanks for this change! It seems to be very similar to the one included
in the commit-graph, barring one small issue below (that we can follow-up
on later).

> diff --git a/midx.c b/midx.c
> index d649644420..f29afc0d2d 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -19,8 +19,7 @@
>  #define MIDX_BYTE_NUM_PACKS 8
>  #define MIDX_HASH_VERSION 1

This hash version "1" is the same as we used in the commit-graph. It's
a byte value from the file format, and we've already discussed how it
would have been better to use the 4-byte identifier, but that ship has
sailed. I'm just pointing this out to say that we are not done in this
file yet, but we can get to that when we want to test the midx with
multiple hash lengths.

>  #define MIDX_HEADER_SIZE 12
> -#define MIDX_HASH_LEN 20

The replacements of MIDX_HASH_LEN make sense.

Thanks!
-Stolee
On 2019-08-22 at 14:04:16, Derrick Stolee wrote:
> On 8/18/2019 4:04 PM, brian m. carlson wrote:
> > diff --git a/midx.c b/midx.c
> > index d649644420..f29afc0d2d 100644
> > --- a/midx.c
> > +++ b/midx.c
> > @@ -19,8 +19,7 @@
> >  #define MIDX_BYTE_NUM_PACKS 8
> >  #define MIDX_HASH_VERSION 1
>
> This hash version "1" is the same as we used in the commit-graph. It's
> a byte value from the file format, and we've already discussed how it
> would have been better to use the 4-byte identifier, but that ship has
> sailed. I'm just pointing this out to say that we are not done in this
> file yet, but we can get to that when we want to test the midx with
> multiple hash lengths.

My approach so far has been to assume everything in the .git directory
is in the same hash except for the translation functionality. Therefore,
it doesn't make sense to distinguish between hashes in the midx files,
because we'll never have files that differ in hash. So essentially the
MIDX_HASH_VERSION being 1 is "whatever hash is being used in the .git
directory", not just SHA-1.

In addition, the current multi-pack index format isn't capable (from my
reading of the documentation, at least) of handling multiple hash
algorithms at once. So we'd need a midx v2 format for folks who are
using SHA-256 with SHA-1 compatibility and we could then write separate
sets of object chunks with an appropriate format identifier, much like
the proposed pack index v3.
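To make the interpretation in the reply above concrete, here is a minimal sketch of a reader that treats version byte 1 as "whatever hash the repository uses" and takes the hash length from the repository rather than from the file. Everything prefixed with sketch_ or SKETCH_ is hypothetical and not taken from midx.c: the struct only models the rawsz field, and the header offset of the version byte is an illustrative assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for git's hash algorithm descriptor; only rawsz is modeled. */
struct sketch_hash_algo {
	const char *name;
	size_t rawsz; /* raw hash length in bytes */
};

static const struct sketch_hash_algo sketch_sha1 = { "sha1", 20 };
static const struct sketch_hash_algo sketch_sha256 = { "sha256", 32 };

/* Illustrative header offset of the version byte (an assumption for this sketch). */
#define SKETCH_BYTE_HASH_VERSION 5
/* Mirrors MIDX_HASH_VERSION from the patch context. */
#define SKETCH_HASH_VERSION 1

/*
 * Return the hash length to use for this file, or 0 if the version byte
 * is not the expected value. The byte is only checked against 1; the
 * length always comes from the repository's algorithm.
 */
static size_t sketch_midx_hash_len(const unsigned char *header,
				   const struct sketch_hash_algo *repo_algo)
{
	assert(header != NULL && repo_algo != NULL);
	if (header[SKETCH_BYTE_HASH_VERSION] != SKETCH_HASH_VERSION)
		return 0;
	return repo_algo->rawsz;
}
```

Under this reading, the same file header is accepted in a SHA-1 and in a SHA-256 repository, and only the repository setting decides how wide each object ID is.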
On 8/22/2019 10:17 PM, brian m. carlson wrote:
> On 2019-08-22 at 14:04:16, Derrick Stolee wrote:
>> On 8/18/2019 4:04 PM, brian m. carlson wrote:
>>> diff --git a/midx.c b/midx.c
>>> index d649644420..f29afc0d2d 100644
>>> --- a/midx.c
>>> +++ b/midx.c
>>> @@ -19,8 +19,7 @@
>>>  #define MIDX_BYTE_NUM_PACKS 8
>>>  #define MIDX_HASH_VERSION 1
>>
>> This hash version "1" is the same as we used in the commit-graph. It's
>> a byte value from the file format, and we've already discussed how it
>> would have been better to use the 4-byte identifier, but that ship has
>> sailed. I'm just pointing this out to say that we are not done in this
>> file yet, but we can get to that when we want to test the midx with
>> multiple hash lengths.
>
> My approach so far has been to assume everything in the .git directory
> is in the same hash except for the translation functionality. Therefore,
> it doesn't make sense to distinguish between hashes in the midx files,
> because we'll never have files that differ in hash. So essentially the
> MIDX_HASH_VERSION being 1 is "whatever hash is being used in the .git
> directory", not just SHA-1.
>
> In addition, the current multi-pack index format isn't capable (from my
> reading of the documentation, at least) of handling multiple hash
> algorithms at once. So we'd need a midx v2 format for folks who are
> using SHA-256 with SHA-1 compatibility and we could then write separate
> sets of object chunks with an appropriate format identifier, much like
> the proposed pack index v3.

Absolutely, it is not. It would be a great place to store a transition
table, when that is needed.

If we _never_ allow both hashes in the .git folder, then maybe we won't
ever need this and can rely on config options. I imagine that will be
tricky, and updating this byte should only help. We are not ready for
that, anyway.

Thanks,
-Stolee
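For contrast, the alternative hinted at in the reply above ("updating this byte should only help") could look roughly like the sketch below, where the version byte itself identifies the algorithm. This is an assumption-laden illustration, not something the thread defines: value 1 for SHA-1 follows the discussion, but value 2 for SHA-256 and all sketch_ names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical version-byte values; only 1 appears in the thread. */
enum {
	SKETCH_HASH_VERSION_SHA1 = 1,
	SKETCH_HASH_VERSION_SHA256 = 2 /* assumed for illustration */
};

/* Map a version byte to a raw hash length in bytes, or 0 if unknown. */
static size_t sketch_hash_len_for_version(unsigned char version)
{
	switch (version) {
	case SKETCH_HASH_VERSION_SHA1:
		return 20; /* SHA-1 digests are 20 bytes */
	case SKETCH_HASH_VERSION_SHA256:
		return 32; /* SHA-256 digests are 32 bytes */
	default:
		return 0;
	}
}

/*
 * Return 1 if the file's version byte names a known algorithm whose
 * length matches the repository's raw hash size, 0 otherwise.
 */
static int sketch_version_matches_repo(unsigned char version, size_t repo_rawsz)
{
	size_t len = sketch_hash_len_for_version(version);
	return len != 0 && len == repo_rawsz;
}
```

With a check like this, a reader could reject a file written for a different hash instead of relying only on configuration, which is the safety net the reply alludes to.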
diff --git a/midx.c b/midx.c
index d649644420..f29afc0d2d 100644
--- a/midx.c
+++ b/midx.c
@@ -19,8 +19,7 @@
 #define MIDX_BYTE_NUM_PACKS 8
 #define MIDX_HASH_VERSION 1
 #define MIDX_HEADER_SIZE 12
-#define MIDX_HASH_LEN 20
-#define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN)
+#define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + the_hash_algo->rawsz)
 
 #define MIDX_MAX_CHUNKS 5
 #define MIDX_CHUNK_ALIGNMENT 4
@@ -93,7 +92,7 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 	hash_version = m->data[MIDX_BYTE_HASH_VERSION];
 	if (hash_version != MIDX_HASH_VERSION)
 		die(_("hash version %u does not match"), hash_version);
-	m->hash_len = MIDX_HASH_LEN;
+	m->hash_len = the_hash_algo->rawsz;
 
 	m->num_chunks = m->data[MIDX_BYTE_NUM_CHUNKS];
 
@@ -234,7 +233,7 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result)
 {
 	return bsearch_hash(oid->hash, m->chunk_oid_fanout, m->chunk_oid_lookup,
-			    MIDX_HASH_LEN, result);
+			    the_hash_algo->rawsz, result);
 }
 
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
@@ -928,7 +927,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	cur_chunk++;
 	chunk_ids[cur_chunk] = MIDX_CHUNKID_OBJECTOFFSETS;
-	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * MIDX_HASH_LEN;
+	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * the_hash_algo->rawsz;
 
 	cur_chunk++;
 	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * MIDX_CHUNK_OFFSET_WIDTH;
@@ -976,7 +975,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 				break;
 
 			case MIDX_CHUNKID_OIDLOOKUP:
-				written += write_midx_oid_lookup(f, MIDX_HASH_LEN, entries, nr_entries);
+				written += write_midx_oid_lookup(f, the_hash_algo->rawsz, entries, nr_entries);
 				break;
 
 			case MIDX_CHUNKID_OBJECTOFFSETS:
Instead of hard-coding the hash size, use the_hash_algo to look up the
hash size at runtime. Remove the #define constant which was used to
hold the hash length, since writing the expression with the_hash_algo
provides enough documentary value on its own.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 midx.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)
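
As an illustration of the change described in the commit message, the following minimal sketch shows how size and offset arithmetic behaves once the hash length is looked up at runtime. The sketch_ types and the helper sketch_offset_after_oid_lookup are hypothetical; the_hash_algo here is a local stand-in for git's global of the same name, and only its rawsz field is modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for git's hash algorithm descriptor. */
struct sketch_hash_algo {
	size_t rawsz; /* raw hash length in bytes */
};

static const struct sketch_hash_algo sketch_sha1 = { 20 };
static const struct sketch_hash_algo sketch_sha256 = { 32 };

/* Local stand-in for git's global; points at the repository's algorithm. */
static const struct sketch_hash_algo *the_hash_algo = &sketch_sha1;

#define MIDX_HEADER_SIZE 12
/* Evaluated at runtime on each use, so it tracks the_hash_algo. */
#define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + the_hash_algo->rawsz)

/*
 * Offset of the chunk that follows the OID lookup chunk: the lookup
 * chunk holds nr_entries hashes of rawsz bytes each.
 */
static uint64_t sketch_offset_after_oid_lookup(uint64_t lookup_start,
					       uint32_t nr_entries)
{
	return lookup_start + (uint64_t)nr_entries * the_hash_algo->rawsz;
}
```

One consequence worth noting: after this change MIDX_MIN_SIZE is no longer a compile-time constant, so it can be used in runtime comparisons but not in places that require a constant expression, such as a static array size.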