Message ID | 20190212012256.1005924-5-sandals@crustytoothpaste.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Hash function transition part 16 | expand |
On Tue, Feb 12, 2019 at 01:22:29AM +0000, brian m. carlson wrote: > Replace the uses of sha1_to_hex in the pack bitmap code with hash_to_hex > to allow the use of SHA-256 as well. Rename a few variables since they > are no longer limited to SHA-1. Sounds good, although... > -static uint32_t find_object_pos(const unsigned char *sha1) > +static uint32_t find_object_pos(const unsigned char *hash) Isn't this really just a "struct object_id"? Why don't we want to use that here? I suspect it may be partially because our khash is storing raw sha1s. But shouldn't we also be converted that to store object_ids? In particular: > @@ -181,7 +181,7 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index, > stored->root = root; > stored->xor = xor_with; > stored->flags = flags; > - oidread(&stored->oid, sha1); > + oidread(&stored->oid, hash); > > hash_pos = kh_put_sha1(index->bitmaps, stored->oid.hash, &ret); This last line (which is actually from the previous patch) is going to always truncate the stored data to 20 bytes, isn't it? I think we need to define a kh_oid. We have the "set" version already in oidset.[ch]; I think we need to make that more public. -Peff
On Tue, Feb 12, 2019 at 01:37:49AM -0500, Jeff King wrote: > On Tue, Feb 12, 2019 at 01:22:29AM +0000, brian m. carlson wrote: > > -static uint32_t find_object_pos(const unsigned char *sha1) > > +static uint32_t find_object_pos(const unsigned char *hash) > > Isn't this really just a "struct object_id"? Why don't we want to use > that here? > > I suspect it may be partially because our khash is storing raw sha1s. > But shouldn't we also be converted that to store object_ids? I think probably there are some more places that we could convert here. There may have been one or two places that weren't convertible because we ended up passing data from some sort of buffer around. I'll take another look. > In particular: > > > @@ -181,7 +181,7 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index, > > stored->root = root; > > stored->xor = xor_with; > > stored->flags = flags; > > - oidread(&stored->oid, sha1); > > + oidread(&stored->oid, hash); > > > > hash_pos = kh_put_sha1(index->bitmaps, stored->oid.hash, &ret); > > This last line (which is actually from the previous patch) is going to > always truncate the stored data to 20 bytes, isn't it? No, I don't think it does. The _sha1 variant stores pointers to unsigned char, while the _oid variant stores the entire struct object_id (not just a pointer to it). We don't care how much data the pointer points to. > I think we need to define a kh_oid. We have the "set" version already in > oidset.[ch]; I think we need to make that more public. I wrote this quite a bit before that code came in, which is probably why I didn't do that originally. I seem to recall last time I looked at this that there was some reason that hoisting this didn't work as I expected due to header include order, but I'll take a look and see if I can figure out a way to do this.
On Wed, Feb 13, 2019 at 12:00:07AM +0000, brian m. carlson wrote: > On Tue, Feb 12, 2019 at 01:37:49AM -0500, Jeff King wrote: > > On Tue, Feb 12, 2019 at 01:22:29AM +0000, brian m. carlson wrote: > > > -static uint32_t find_object_pos(const unsigned char *sha1) > > > +static uint32_t find_object_pos(const unsigned char *hash) > > > > Isn't this really just a "struct object_id"? Why don't we want to use > > that here? > > > > I suspect it may be partially because our khash is storing raw sha1s. > > But shouldn't we also be converted that to store object_ids? > > I think probably there are some more places that we could convert here. > There may have been one or two places that weren't convertible because > we ended up passing data from some sort of buffer around. I'll take > another look. Thanks. I don't want to derail you too much if you have a series of other changes on top. And moving to "hash" here is a step in the right direction. But if we can take it all the way to object_id while we're looking at it, I think that's preferable. > > > hash_pos = kh_put_sha1(index->bitmaps, stored->oid.hash, &ret); > > > > This last line (which is actually from the previous patch) is going to > > always truncate the stored data to 20 bytes, isn't it? > > No, I don't think it does. The _sha1 variant stores pointers to unsigned > char, while the _oid variant stores the entire struct object_id (not > just a pointer to it). We don't care how much data the pointer points > to. Oh, you're right. I was thinking it actually stored the 20-byte sequences, but I was just reading it wrong. Sorry for the confusion. > > I think we need to define a kh_oid. We have the "set" version already in > > oidset.[ch]; I think we need to make that more public. > > I wrote this quite a bit before that code came in, which is probably why > I didn't do that originally. I seem to recall last time I looked at this > that there was some reason that hoisting this didn't work as I expected > due to header include order, but I'll take a look and see if I can > figure out a way to do this. Great, thanks! -Peff
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index c82fb01fd7..802ed62677 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -142,13 +142,13 @@ static inline void reset_all_seen(void) seen_objects_nr = 0; } -static uint32_t find_object_pos(const unsigned char *sha1) +static uint32_t find_object_pos(const unsigned char *hash) { - struct object_entry *entry = packlist_find(writer.to_pack, sha1, NULL); + struct object_entry *entry = packlist_find(writer.to_pack, hash, NULL); if (!entry) { die("Failed to write bitmap index. Packfile doesn't have full closure " - "(object %s is missing)", sha1_to_hex(sha1)); + "(object %s is missing)", hash_to_hex(hash)); } return oe_in_pack_pos(writer.to_pack, entry); diff --git a/pack-bitmap.c b/pack-bitmap.c index c760913cea..6d6fa68563 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -169,7 +169,7 @@ static int load_bitmap_header(struct bitmap_index *index) static struct stored_bitmap *store_bitmap(struct bitmap_index *index, struct ewah_bitmap *root, - const unsigned char *sha1, + const unsigned char *hash, struct stored_bitmap *xor_with, int flags) { @@ -181,7 +181,7 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index, stored->root = root; stored->xor = xor_with; stored->flags = flags; - oidread(&stored->oid, sha1); + oidread(&stored->oid, hash); hash_pos = kh_put_sha1(index->bitmaps, stored->oid.hash, &ret); @@ -189,7 +189,7 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index, * because the SHA1 already existed on the map. this is bad, there * shouldn't be duplicated commits in the index */ if (ret == 0) { - error("Duplicate entry in bitmap index: %s", sha1_to_hex(sha1)); + error("Duplicate entry in bitmap index: %s", hash_to_hex(hash)); return NULL; } @@ -805,7 +805,7 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, fprintf(stderr, "Failed to reuse at %d (%016llx)\n", reuse_objects, result->words[i]); - fprintf(stderr, " %s\n", sha1_to_hex(sha1)); + fprintf(stderr, " %s\n", hash_to_hex(sha1)); } #endif
Replace the uses of sha1_to_hex in the pack bitmap code with hash_to_hex to allow the use of SHA-256 as well. Rename a few variables since they are no longer limited to SHA-1. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> --- pack-bitmap-write.c | 6 +++--- pack-bitmap.c | 8 ++++---- 2 files changed, 7 insertions(+), 7 deletions(-)