Message ID | 20181008215701.779099-7-sandals@crustytoothpaste.net (mailing list archive) |
---|---|
State | New, archived |
Series | Hash function transition part 15 |
On Mon, Oct 8, 2018 at 2:57 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> Replace uses of GIT_SHA1_RAWSZ with references to the_hash_algo to avoid
> dependence on a particular hash length.

Unlike the previous patches, this is dealing directly with packfiles,
which (I would think) carry their own hash function selector?
(i.e. packfiles up to version 4 are sha1 hardcoded and version
5 and onwards will have a hash type field. Usually that hash type would
match what is in the_repository, but you could obtain packfiles
out of band, or the translation table that we plan to have might
be part of the packfile/idx file?)

>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  packfile.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/packfile.c b/packfile.c
> index 841b36182f..17f993b5bf 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -1121,13 +1121,14 @@ int unpack_object_header(struct packed_git *p,
>  void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1)
>  {
>  	unsigned i;
> +	const unsigned hashsz = the_hash_algo->rawsz;
>  	for (i = 0; i < p->num_bad_objects; i++)
> -		if (hasheq(sha1, p->bad_object_sha1 + GIT_SHA1_RAWSZ * i))
> +		if (hasheq(sha1, p->bad_object_sha1 + hashsz * i))
>  			return;
>  	p->bad_object_sha1 = xrealloc(p->bad_object_sha1,
>  				      st_mult(GIT_MAX_RAWSZ,
>  					      st_add(p->num_bad_objects, 1)));
> -	hashcpy(p->bad_object_sha1 + GIT_SHA1_RAWSZ * p->num_bad_objects, sha1);
> +	hashcpy(p->bad_object_sha1 + hashsz * p->num_bad_objects, sha1);
>  	p->num_bad_objects++;
>  }
>
On Mon, Oct 08, 2018 at 03:59:36PM -0700, Stefan Beller wrote:
> On Mon, Oct 8, 2018 at 2:57 PM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> >
> > Replace uses of GIT_SHA1_RAWSZ with references to the_hash_algo to avoid
> > dependence on a particular hash length.
>
> Unlike the previous patches, this is dealing directly with packfiles,
> which (I would think) carry their own hash function selector?
> (i.e. packfiles up to version 4 are sha1 hardcoded and version
> 5 and onwards will have a hash type field. Usually that hash type would
> match what is in the_repository, but you could obtain packfiles
> out of band, or the translation table that we plan to have might
> be part of the packfile/idx file?)

Yeah, the transition plan doesn't specify a format for pack files, but
we may end up needing one. We definitely have a specified format for
index files already, and that's where the translation table will be.
Anything other than the pack index and the loose object index in the
.git directory will have the same algorithm as the rest of the
repository, so technically we could use any pack format as long as it
lives in the .git directory.

This code is mostly here on an interim basis to let us compile with a
fully SHA-256 (no SHA-1) Git. Once that piece is done, we can move on
to a stage 4 Git, which can do either only SHA-256, or only SHA-1, where
we'll learn about various pack file formats and detecting the algorithm
from them.
On Tue, Oct 9, 2018 at 3:25 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On Mon, Oct 08, 2018 at 03:59:36PM -0700, Stefan Beller wrote:
> > On Mon, Oct 8, 2018 at 2:57 PM brian m. carlson
> > <sandals@crustytoothpaste.net> wrote:
> > >
> > > Replace uses of GIT_SHA1_RAWSZ with references to the_hash_algo to avoid
> > > dependence on a particular hash length.
> >
> > Unlike the previous patches, this is dealing directly with packfiles,
> > which (I would think) carry their own hash function selector?
> > (i.e. packfiles up to version 4 are sha1 hardcoded and version
> > 5 and onwards will have a hash type field. Usually that hash type would
> > match what is in the_repository, but you could obtain packfiles
> > out of band, or the translation table that we plan to have might
> > be part of the packfile/idx file?)
>
> Yeah, the transition plan doesn't specify a format for pack files, but
> we may end up needing one. We definitely have a specified format for
> index files already, and that's where the translation table will be.
> Anything other than the pack index and the loose object index in the
> .git directory will have the same algorithm as the rest of the
> repository, so technically we could use any pack format as long as it
> lives in the .git directory.
>
> This code is mostly here on an interim basis to let us compile with a
> fully SHA-256 (no SHA-1) Git. Once that piece is done, we can move on
> to a stage 4 Git, which can do either only SHA-256, or only SHA-1, where
> we'll learn about various pack file formats and detecting the algorithm
> from them.

This second paragraph really helps to put things into perspective, thanks!
I assume this interim base of code only applies to this patch?
(In that case maybe put it into the commit message?)
On Tue, Oct 09, 2018 at 03:34:17PM -0700, Stefan Beller wrote:
> On Tue, Oct 9, 2018 at 3:25 PM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> > This code is mostly here on an interim basis to let us compile with a
> > fully SHA-256 (no SHA-1) Git. Once that piece is done, we can move on
> > to a stage 4 Git, which can do either only SHA-256, or only SHA-1, where
> > we'll learn about various pack file formats and detecting the algorithm
> > from them.
>
> This second paragraph really helps to put things into perspective, thanks!
> I assume this interim base of code only applies to this patch?
> (In that case maybe put it into the commit message?)

That comment will apply to most of the changes to the packfile code,
whether in this series or in future series. However, after your
question, I was indeed going to put it into the commit message when I
reroll.
diff --git a/packfile.c b/packfile.c
index 841b36182f..17f993b5bf 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1121,13 +1121,14 @@ int unpack_object_header(struct packed_git *p,
 void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1)
 {
 	unsigned i;
+	const unsigned hashsz = the_hash_algo->rawsz;
 	for (i = 0; i < p->num_bad_objects; i++)
-		if (hasheq(sha1, p->bad_object_sha1 + GIT_SHA1_RAWSZ * i))
+		if (hasheq(sha1, p->bad_object_sha1 + hashsz * i))
 			return;
 	p->bad_object_sha1 = xrealloc(p->bad_object_sha1,
 				      st_mult(GIT_MAX_RAWSZ,
 					      st_add(p->num_bad_objects, 1)));
-	hashcpy(p->bad_object_sha1 + GIT_SHA1_RAWSZ * p->num_bad_objects, sha1);
+	hashcpy(p->bad_object_sha1 + hashsz * p->num_bad_objects, sha1);
 	p->num_bad_objects++;
 }
Replace uses of GIT_SHA1_RAWSZ with references to the_hash_algo to avoid
dependence on a particular hash length.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 packfile.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
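
For readers outside the Git tree, the pattern in the patch can be sketched as a standalone program fragment. This is only an illustration, not Git's code: `SKETCH_MAX_RAWSZ`, `struct sketch_hash_algo`, `struct bad_object_list` and both functions are hypothetical stand-ins for `GIT_MAX_RAWSZ`, `the_hash_algo` and the `bad_object_sha1` array, and plain `memcmp`/`memcpy`/`realloc` are assumed in place of `hasheq`, `hashcpy` and `xrealloc`. It shows entries being indexed by the hash length chosen at run time while, as in the patch, the buffer is still sized by the maximum hash length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for GIT_MAX_RAWSZ: the largest raw hash size. */
#define SKETCH_MAX_RAWSZ 32

/* Hypothetical stand-in for the_hash_algo; only the raw size is modelled. */
struct sketch_hash_algo {
	size_t rawsz;	/* raw hash length in bytes, e.g. 20 or 32 */
};

/* Flat array of hashes, stored back to back, rawsz bytes per entry. */
struct bad_object_list {
	unsigned char *hashes;
	size_t num;
};

/* Return 1 if hash is already recorded, 0 otherwise. */
int bad_object_list_contains(const struct bad_object_list *list,
			     const struct sketch_hash_algo *algo,
			     const unsigned char *hash)
{
	size_t i;

	for (i = 0; i < list->num; i++)
		if (!memcmp(hash, list->hashes + algo->rawsz * i, algo->rawsz))
			return 1;
	return 0;
}

/*
 * Record hash unless it is already present.
 * Returns 0 on success and -1 if the allocation fails.
 */
int bad_object_list_add(struct bad_object_list *list,
			const struct sketch_hash_algo *algo,
			const unsigned char *hash)
{
	unsigned char *grown;

	assert(algo->rawsz > 0 && algo->rawsz <= SKETCH_MAX_RAWSZ);
	if (bad_object_list_contains(list, algo, hash))
		return 0;

	/*
	 * Sized by the maximum hash length, as the patch does, so the
	 * buffer is large enough for any supported algorithm. Overflow
	 * checks (st_mult/st_add in Git) are omitted in this sketch.
	 */
	grown = realloc(list->hashes, SKETCH_MAX_RAWSZ * (list->num + 1));
	if (!grown)
		return -1;
	list->hashes = grown;

	/* Entries are placed using the run-time hash length. */
	memcpy(list->hashes + algo->rawsz * list->num, hash, algo->rawsz);
	list->num++;
	return 0;
}
```

A caller would start from a zero-initialised `struct bad_object_list`, pass the same `struct sketch_hash_algo` on every call, and `free()` the `hashes` pointer when done. Mixing hash lengths within one list would break the indexing, which is why the sketch takes the algorithm from a single source, much as the patch reads `the_hash_algo` once per call.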