[26/26] midx: switch to using the_hash_algo

Message ID 20190818200427.870753-27-sandals@crustytoothpaste.net
State New
Series
  • object_id part 17

Commit Message

brian m. carlson Aug. 18, 2019, 8:04 p.m. UTC
Instead of hard-coding the hash size, use the_hash_algo to look up the
hash size at runtime.  Remove the #define constant which was used to
hold the hash length, since writing the expression with the_hash_algo
provides enough documentary value on its own.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 midx.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)
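The change described in the commit message can be sketched as follows. This is a minimal, self-contained illustration, not code from the patch. The names `struct hash_algo_sketch`, `the_hash_algo_sketch` and `oid_lookup_chunk_size()` are hypothetical stand-ins. In git, the real type is `struct git_hash_algo`, and `the_hash_algo` refers to the current repository's algorithm. The sketch shows that `MIDX_MIN_SIZE` and the OID lookup chunk size follow whichever algorithm is selected at runtime, instead of a hard-coded 20.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for git's struct git_hash_algo. */
struct hash_algo_sketch {
	const char *name;
	size_t rawsz; /* length of a binary object ID in bytes */
};

static const struct hash_algo_sketch sha1_sketch = { "sha1", 20 };
static const struct hash_algo_sketch sha256_sketch = { "sha256", 32 };

/* Chosen at runtime; here it simply starts out as SHA-1. */
static const struct hash_algo_sketch *the_hash_algo_sketch = &sha1_sketch;

#define MIDX_HEADER_SIZE 12
/* No longer an integer constant expression: evaluated at each use. */
#define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + the_hash_algo_sketch->rawsz)

/* The OID lookup chunk stores one raw object ID per entry. */
static uint64_t oid_lookup_chunk_size(uint32_t nr_entries)
{
	return (uint64_t)nr_entries * the_hash_algo_sketch->rawsz;
}
```

One consequence of this design: because the macro now expands to a pointer dereference, it only works where it appears in ordinary runtime expressions. It can no longer serve as an array bound or in a static initializer.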

Comments

Derrick Stolee Aug. 22, 2019, 2:04 p.m. UTC | #1
On 8/18/2019 4:04 PM, brian m. carlson wrote:
> Instead of hard-coding the hash size, use the_hash_algo to look up the
> hash size at runtime.  Remove the #define constant which was used to
> hold the hash length, since writing the expression with the_hash_algo
> provides enough documentary value on its own.

Thanks for this change! It seems to be very similar to the one
included in the commit-graph, barring one small issue below
(which we can follow up on later).

> diff --git a/midx.c b/midx.c
> index d649644420..f29afc0d2d 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -19,8 +19,7 @@
>  #define MIDX_BYTE_NUM_PACKS 8
>  #define MIDX_HASH_VERSION 1

This hash version "1" is the same as we used in the commit-graph. It's
a byte value from the file format, and we've already discussed how it
would have been better to use the 4-byte identifier, but that ship has
sailed. I'm just pointing this out to say that we are not done in this
file yet, but we can get to that when we want to test the midx with
multiple hash lengths.
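The difference between a one-byte hash version and a four-byte identifier can be illustrated with a small sketch. It assumes the four-byte values git uses for `GIT_SHA1_FORMAT_ID` and `GIT_SHA256_FORMAT_ID`, which spell "sha1" and "s256" in ASCII. The macro and function names are hypothetical, and the sketch assumes the one-byte value 1 is read as SHA-1. A one-byte number needs an out-of-band table to mean anything, while a four-byte identifier names the algorithm directly.

```c
#include <stddef.h>
#include <stdint.h>

/* ASCII "sha1" and "s256", matching git's format ID values. */
#define SKETCH_SHA1_FORMAT_ID   0x73686131u
#define SKETCH_SHA256_FORMAT_ID 0x73323536u

/* Decode a big-endian 32-bit value, as stored in git's on-disk formats. */
static uint32_t get_be32_sketch(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* One-byte version: meaning comes from a table; assumed 1 = SHA-1. */
static size_t hash_len_from_version_byte(unsigned char version)
{
	return version == 1 ? 20 : 0; /* 0 = unknown */
}

/* Four-byte identifier: the value itself names the algorithm. */
static size_t hash_len_from_format_id(const unsigned char *field)
{
	switch (get_be32_sketch(field)) {
	case SKETCH_SHA1_FORMAT_ID:
		return 20;
	case SKETCH_SHA256_FORMAT_ID:
		return 32;
	default:
		return 0; /* unknown algorithm */
	}
}
```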

>  #define MIDX_HEADER_SIZE 12
> -#define MIDX_HASH_LEN 20

The replacements of MIDX_HASH_LEN make sense. Thanks!

-Stolee

brian m. carlson Aug. 23, 2019, 2:17 a.m. UTC | #2
On 2019-08-22 at 14:04:16, Derrick Stolee wrote:
> On 8/18/2019 4:04 PM, brian m. carlson wrote:
> > diff --git a/midx.c b/midx.c
> > index d649644420..f29afc0d2d 100644
> > --- a/midx.c
> > +++ b/midx.c
> > @@ -19,8 +19,7 @@
> >  #define MIDX_BYTE_NUM_PACKS 8
> >  #define MIDX_HASH_VERSION 1
> 
> This hash version "1" is the same as we used in the commit-graph. It's
> a byte value from the file format, and we've already discussed how it
> would have been better to use the 4-byte identifier, but that ship has
> sailed. I'm just pointing this out to say that we are not done in this
> file yet, but we can get to that when we want to test the midx with
> multiple hash lengths.

My approach so far has been to assume everything in the .git directory
is in the same hash except for the translation functionality. Therefore,
it doesn't make sense to distinguish between hashes in the midx files,
because we'll never have files that differ in hash.  So essentially the
MIDX_HASH_VERSION being 1 is "whatever hash is being used in the .git
directory", not just SHA-1.
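Under that interpretation, the loader only needs to check the version byte and then take the hash length from the repository. It does not read the length from the file. The following is a minimal sketch of that step, mirroring the check in `load_multi_pack_index()` in the patch. The function name `midx_hash_len_sketch()` is hypothetical, and `repo_rawsz` stands in for `the_hash_algo->rawsz`. The real code calls `die()` on a mismatch, whereas this sketch returns 0.

```c
#include <stddef.h>

#define MIDX_HASH_VERSION 1

/*
 * Hypothetical loader step: repo_rawsz stands in for the_hash_algo->rawsz.
 * Returns the hash length to use, or 0 if the version byte is unsupported
 * (the real code calls die() instead).
 */
static size_t midx_hash_len_sketch(unsigned char hash_version, size_t repo_rawsz)
{
	if (hash_version != MIDX_HASH_VERSION)
		return 0;
	/* Version 1 means "the repository's hash", whatever that is. */
	return repo_rawsz;
}
```

Note the trade-off. The version byte alone cannot reveal that a file was written with a different algorithm than the repository uses. This scheme relies on the assumption stated above, namely that every file in the .git directory uses the same hash.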

In addition, the current multi-pack index format isn't capable (from my
reading of the documentation, at least) of handling multiple hash
algorithms at once.  So we'd need a midx v2 format for folks who are
using SHA-256 with SHA-1 compatibility and we could then write separate
sets of object chunks with an appropriate format identifier, much like
the proposed pack index v3.

Derrick Stolee Aug. 23, 2019, 11:53 a.m. UTC | #3
On 8/22/2019 10:17 PM, brian m. carlson wrote:
> On 2019-08-22 at 14:04:16, Derrick Stolee wrote:
>> On 8/18/2019 4:04 PM, brian m. carlson wrote:
>>> diff --git a/midx.c b/midx.c
>>> index d649644420..f29afc0d2d 100644
>>> --- a/midx.c
>>> +++ b/midx.c
>>> @@ -19,8 +19,7 @@
>>>  #define MIDX_BYTE_NUM_PACKS 8
>>>  #define MIDX_HASH_VERSION 1
>>
>> This hash version "1" is the same as we used in the commit-graph. It's
>> a byte value from the file format, and we've already discussed how it
>> would have been better to use the 4-byte identifier, but that ship has
>> sailed. I'm just pointing this out to say that we are not done in this
>> file yet, but we can get to that when we want to test the midx with
>> multiple hash lengths.
> 
> My approach so far has been to assume everything in the .git directory
> is in the same hash except for the translation functionality. Therefore,
> it doesn't make sense to distinguish between hashes in the midx files,
> because we'll never have files that differ in hash.  So essentially the
> MIDX_HASH_VERSION being 1 is "whatever hash is being used in the .git
> directory", not just SHA-1.
> 
> In addition, the current multi-pack index format isn't capable (from my
> reading of the documentation, at least) of handling multiple hash
> algorithms at once.  So we'd need a midx v2 format for folks who are
> using SHA-256 with SHA-1 compatibility and we could then write separate
> sets of object chunks with an appropriate format identifier, much like
> the proposed pack index v3.

Absolutely, it is not capable of that. It would be a great place to store a transition
table, when that is needed.

If we _never_ allow both hashes in the .git folder, then maybe we won't
ever need this and can rely on config options. I imagine that will be
tricky, and updating this byte should only help. We are not ready for
that, anyway.

Thanks,
-Stolee

Patch

diff --git a/midx.c b/midx.c
index d649644420..f29afc0d2d 100644
--- a/midx.c
+++ b/midx.c
@@ -19,8 +19,7 @@ 
 #define MIDX_BYTE_NUM_PACKS 8
 #define MIDX_HASH_VERSION 1
 #define MIDX_HEADER_SIZE 12
-#define MIDX_HASH_LEN 20
-#define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN)
+#define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + the_hash_algo->rawsz)
 
 #define MIDX_MAX_CHUNKS 5
 #define MIDX_CHUNK_ALIGNMENT 4
@@ -93,7 +92,7 @@  struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 	hash_version = m->data[MIDX_BYTE_HASH_VERSION];
 	if (hash_version != MIDX_HASH_VERSION)
 		die(_("hash version %u does not match"), hash_version);
-	m->hash_len = MIDX_HASH_LEN;
+	m->hash_len = the_hash_algo->rawsz;
 
 	m->num_chunks = m->data[MIDX_BYTE_NUM_CHUNKS];
 
@@ -234,7 +233,7 @@  int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result)
 {
 	return bsearch_hash(oid->hash, m->chunk_oid_fanout, m->chunk_oid_lookup,
-			    MIDX_HASH_LEN, result);
+			    the_hash_algo->rawsz, result);
 }
 
 struct object_id *nth_midxed_object_oid(struct object_id *oid,
@@ -928,7 +927,7 @@  static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	cur_chunk++;
 	chunk_ids[cur_chunk] = MIDX_CHUNKID_OBJECTOFFSETS;
-	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * MIDX_HASH_LEN;
+	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * the_hash_algo->rawsz;
 
 	cur_chunk++;
 	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * MIDX_CHUNK_OFFSET_WIDTH;
@@ -976,7 +975,7 @@  static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 				break;
 
 			case MIDX_CHUNKID_OIDLOOKUP:
-				written += write_midx_oid_lookup(f, MIDX_HASH_LEN, entries, nr_entries);
+				written += write_midx_oid_lookup(f, the_hash_algo->rawsz, entries, nr_entries);
 				break;
 
 			case MIDX_CHUNKID_OBJECTOFFSETS:
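In `bsearch_midx()`, the patch passes `the_hash_algo->rawsz` as the width of each entry in the OID lookup chunk. The following is a minimal sketch of such a fanout-plus-binary-search lookup with a variable hash length. It is loosely modelled on git's `bsearch_hash()`, whose real implementation differs in its details. The function name `bsearch_oid_sketch()` is hypothetical. For simplicity, the fanout table here is in host byte order, whereas the on-disk table is big-endian.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * fanout[b] = number of object IDs whose first byte is <= b (256 entries).
 * table holds the sorted raw IDs back to back, hash_len bytes each.
 * Returns 1 and sets *pos if oid is found; otherwise returns 0 and sets
 * *pos to the index where oid would be inserted.
 */
static int bsearch_oid_sketch(const unsigned char *oid, const uint32_t *fanout,
			      const unsigned char *table, size_t hash_len,
			      uint32_t *pos)
{
	/* The fanout narrows the search to IDs sharing oid's first byte. */
	uint32_t lo = oid[0] ? fanout[oid[0] - 1] : 0;
	uint32_t hi = fanout[oid[0]];

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		/* Entries are hash_len bytes apart: 20 for SHA-1, 32 for SHA-256. */
		int cmp = memcmp(oid, table + (size_t)mid * hash_len, hash_len);

		if (!cmp) {
			*pos = mid;
			return 1;
		}
		if (cmp > 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	*pos = lo;
	return 0;
}
```

The stride is a function parameter, not a constant. The same search therefore works for either hash length, which is what lets the patch replace `MIDX_HASH_LEN` with a runtime value at this call site.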