[04/31] pack-bitmap: replace sha1_to_hex
diff mbox series

Message ID 20190212012256.1005924-5-sandals@crustytoothpaste.net
State New
Headers show
Series
  • Hash function transition part 16
Related show

Commit Message

brian m. carlson Feb. 12, 2019, 1:22 a.m. UTC
Replace the uses of sha1_to_hex in the pack bitmap code with hash_to_hex
to allow the use of SHA-256 as well.  Rename a few variables since they
are no longer limited to SHA-1.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 pack-bitmap-write.c | 6 +++---
 pack-bitmap.c       | 8 ++++----
 2 files changed, 7 insertions(+), 7 deletions(-)

Comments

Jeff King Feb. 12, 2019, 6:37 a.m. UTC | #1
On Tue, Feb 12, 2019 at 01:22:29AM +0000, brian m. carlson wrote:

> Replace the uses of sha1_to_hex in the pack bitmap code with hash_to_hex
> to allow the use of SHA-256 as well.  Rename a few variables since they
> are no longer limited to SHA-1.

Sounds good, although...

> -static uint32_t find_object_pos(const unsigned char *sha1)
> +static uint32_t find_object_pos(const unsigned char *hash)

Isn't this really just a "struct object_id"? Why don't we want to use
that here?

I suspect it may be partially because our khash is storing raw sha1s.
But shouldn't we also be converted that to store object_ids?

In particular:

> @@ -181,7 +181,7 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
>  	stored->root = root;
>  	stored->xor = xor_with;
>  	stored->flags = flags;
> -	oidread(&stored->oid, sha1);
> +	oidread(&stored->oid, hash);
>  
>  	hash_pos = kh_put_sha1(index->bitmaps, stored->oid.hash, &ret);

This last line (which is actually from the previous patch) is going to
always truncate the stored data to 20 bytes, isn't it?

I think we need to define a kh_oid. We have the "set" version already in
oidset.[ch]; I think we need to make that more public.

-Peff
brian m. carlson Feb. 13, 2019, midnight UTC | #2
On Tue, Feb 12, 2019 at 01:37:49AM -0500, Jeff King wrote:
> On Tue, Feb 12, 2019 at 01:22:29AM +0000, brian m. carlson wrote:
> > -static uint32_t find_object_pos(const unsigned char *sha1)
> > +static uint32_t find_object_pos(const unsigned char *hash)
> 
> Isn't this really just a "struct object_id"? Why don't we want to use
> that here?
> 
> I suspect it may be partially because our khash is storing raw sha1s.
> But shouldn't we also be converted that to store object_ids?

I think probably there are some more places that we could convert here.
There may have been one or two places that weren't convertible because
we ended up passing data from some sort of buffer around. I'll take
another look.

> In particular:
> 
> > @@ -181,7 +181,7 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
> >  	stored->root = root;
> >  	stored->xor = xor_with;
> >  	stored->flags = flags;
> > -	oidread(&stored->oid, sha1);
> > +	oidread(&stored->oid, hash);
> >  
> >  	hash_pos = kh_put_sha1(index->bitmaps, stored->oid.hash, &ret);
> 
> This last line (which is actually from the previous patch) is going to
> always truncate the stored data to 20 bytes, isn't it?

No, I don't think it does. The _sha1 variant stores pointers to unsigned
char, while the _oid variant stores the entire struct object_id (not
just a pointer to it). We don't care how much data the pointer points
to.

> I think we need to define a kh_oid. We have the "set" version already in
> oidset.[ch]; I think we need to make that more public.

I wrote this quite a bit before that code came in, which is probably why
I didn't do that originally. I seem to recall last time I looked at this
that there was some reason that hoisting this didn't work as I expected
due to header include order, but I'll take a look and see if I can
figure out a way to do this.
Jeff King Feb. 14, 2019, 4:41 a.m. UTC | #3
On Wed, Feb 13, 2019 at 12:00:07AM +0000, brian m. carlson wrote:

> On Tue, Feb 12, 2019 at 01:37:49AM -0500, Jeff King wrote:
> > On Tue, Feb 12, 2019 at 01:22:29AM +0000, brian m. carlson wrote:
> > > -static uint32_t find_object_pos(const unsigned char *sha1)
> > > +static uint32_t find_object_pos(const unsigned char *hash)
> > 
> > Isn't this really just a "struct object_id"? Why don't we want to use
> > that here?
> > 
> > I suspect it may be partially because our khash is storing raw sha1s.
> > But shouldn't we also be converted that to store object_ids?
> 
> I think probably there are some more places that we could convert here.
> There may have been one or two places that weren't convertible because
> we ended up passing data from some sort of buffer around. I'll take
> another look.

Thanks. I don't want to derail you too much if you have a series of
other changes on top. And moving to "hash" here is a step in the right
direction. But if we can take it all the way to object_id while we're
looking at it, I think that's preferable.

> > >  	hash_pos = kh_put_sha1(index->bitmaps, stored->oid.hash, &ret);
> > 
> > This last line (which is actually from the previous patch) is going to
> > always truncate the stored data to 20 bytes, isn't it?
> 
> No, I don't think it does. The _sha1 variant stores pointers to unsigned
> char, while the _oid variant stores the entire struct object_id (not
> just a pointer to it). We don't care how much data the pointer points
> to.

Oh, you're right. I was thinking it actually stored the 20-byte
sequences, but I was just reading it wrong. Sorry for the confusion.

> > I think we need to define a kh_oid. We have the "set" version already in
> > oidset.[ch]; I think we need to make that more public.
> 
> I wrote this quite a bit before that code came in, which is probably why
> I didn't do that originally. I seem to recall last time I looked at this
> that there was some reason that hoisting this didn't work as I expected
> due to header include order, but I'll take a look and see if I can
> figure out a way to do this.

Great, thanks!

-Peff

Patch
diff mbox series

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index c82fb01fd7..802ed62677 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -142,13 +142,13 @@  static inline void reset_all_seen(void)
 	seen_objects_nr = 0;
 }
 
-static uint32_t find_object_pos(const unsigned char *sha1)
+static uint32_t find_object_pos(const unsigned char *hash)
 {
-	struct object_entry *entry = packlist_find(writer.to_pack, sha1, NULL);
+	struct object_entry *entry = packlist_find(writer.to_pack, hash, NULL);
 
 	if (!entry) {
 		die("Failed to write bitmap index. Packfile doesn't have full closure "
-			"(object %s is missing)", sha1_to_hex(sha1));
+			"(object %s is missing)", hash_to_hex(hash));
 	}
 
 	return oe_in_pack_pos(writer.to_pack, entry);
diff --git a/pack-bitmap.c b/pack-bitmap.c
index c760913cea..6d6fa68563 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -169,7 +169,7 @@  static int load_bitmap_header(struct bitmap_index *index)
 
 static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
 					  struct ewah_bitmap *root,
-					  const unsigned char *sha1,
+					  const unsigned char *hash,
 					  struct stored_bitmap *xor_with,
 					  int flags)
 {
@@ -181,7 +181,7 @@  static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
 	stored->root = root;
 	stored->xor = xor_with;
 	stored->flags = flags;
-	oidread(&stored->oid, sha1);
+	oidread(&stored->oid, hash);
 
 	hash_pos = kh_put_sha1(index->bitmaps, stored->oid.hash, &ret);
 
@@ -189,7 +189,7 @@  static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
 	 * because the SHA1 already existed on the map. this is bad, there
 	 * shouldn't be duplicated commits in the index */
 	if (ret == 0) {
-		error("Duplicate entry in bitmap index: %s", sha1_to_hex(sha1));
+		error("Duplicate entry in bitmap index: %s", hash_to_hex(hash));
 		return NULL;
 	}
 
@@ -805,7 +805,7 @@  int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 
 		fprintf(stderr, "Failed to reuse at %d (%016llx)\n",
 			reuse_objects, result->words[i]);
-		fprintf(stderr, " %s\n", sha1_to_hex(sha1));
+		fprintf(stderr, " %s\n", hash_to_hex(sha1));
 	}
 #endif