
[06/14] packfile: express constants in terms of the_hash_algo

Message ID 20181008215701.779099-7-sandals@crustytoothpaste.net (mailing list archive)
State New, archived
Series Hash function transition part 15

Commit Message

brian m. carlson Oct. 8, 2018, 9:56 p.m. UTC
Replace uses of GIT_SHA1_RAWSZ with references to the_hash_algo to avoid
dependence on a particular hash length.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 packfile.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Stefan Beller Oct. 8, 2018, 10:59 p.m. UTC | #1
On Mon, Oct 8, 2018 at 2:57 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> Replace uses of GIT_SHA1_RAWSZ with references to the_hash_algo to avoid
> dependence on a particular hash length.

Unlike the previous patches, this is dealing directly with packfiles,
which (I would think) carry their own hash function selector?
(i.e. packfiles up to version 4 are sha1 hardcoded and version
5 and onwards will have a hash type field. Usually that hash type would
match what is in the_repository, but you could obtain packfiles
out of band, or the translation table that we plan to have might
be part of the packfile/idx file?)


>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  packfile.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/packfile.c b/packfile.c
> index 841b36182f..17f993b5bf 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -1121,13 +1121,14 @@ int unpack_object_header(struct packed_git *p,
>  void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1)
>  {
>         unsigned i;
> +       const unsigned hashsz = the_hash_algo->rawsz;
>         for (i = 0; i < p->num_bad_objects; i++)
> -               if (hasheq(sha1, p->bad_object_sha1 + GIT_SHA1_RAWSZ * i))
> +               if (hasheq(sha1, p->bad_object_sha1 + hashsz * i))
>                         return;
>         p->bad_object_sha1 = xrealloc(p->bad_object_sha1,
>                                       st_mult(GIT_MAX_RAWSZ,
>                                               st_add(p->num_bad_objects, 1)));
> -       hashcpy(p->bad_object_sha1 + GIT_SHA1_RAWSZ * p->num_bad_objects, sha1);
> +       hashcpy(p->bad_object_sha1 + hashsz * p->num_bad_objects, sha1);
>         p->num_bad_objects++;
>  }
>

brian m. carlson Oct. 9, 2018, 10:25 p.m. UTC | #2
On Mon, Oct 08, 2018 at 03:59:36PM -0700, Stefan Beller wrote:
> On Mon, Oct 8, 2018 at 2:57 PM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> >
> > Replace uses of GIT_SHA1_RAWSZ with references to the_hash_algo to avoid
> > dependence on a particular hash length.
> 
> Unlike the previous patches, this is dealing directly with packfiles,
> which (I would think) carry their own hash function selector?
> (i.e. packfiles up to version 4 are sha1 hardcoded and version
> 5 and onwards will have a hash type field. Usually that hash type would
> match what is in the_repository, but you could obtain packfiles
> out of band, or the translation table that we plan to have might
> be part of the packfile/idx file?)

Yeah, the transition plan doesn't specify a format for pack files, but
we may end up needing one.  We definitely have a specified format for
index files already, and that's where the translation table will be.
Anything other than the pack index and the loose object index in the
.git directory will have the same algorithm as the rest of the
repository, so technically we could use any pack format as long as it
lives in the .git directory.

This code is mostly here on an interim basis to let us compile with a
fully SHA-256 (no SHA-1) Git.  Once that piece is done, we can move on
to a stage 4 Git, which can do either only SHA-256, or only SHA-1, where
we'll learn about various pack file formats and detecting the algorithm
from them.

Stefan Beller Oct. 9, 2018, 10:34 p.m. UTC | #3
On Tue, Oct 9, 2018 at 3:25 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On Mon, Oct 08, 2018 at 03:59:36PM -0700, Stefan Beller wrote:
> > On Mon, Oct 8, 2018 at 2:57 PM brian m. carlson
> > <sandals@crustytoothpaste.net> wrote:
> > >
> > > Replace uses of GIT_SHA1_RAWSZ with references to the_hash_algo to avoid
> > > dependence on a particular hash length.
> >
> > Unlike the previous patches, this is dealing directly with packfiles,
> > which (I would think) carry their own hash function selector?
> > (i.e. packfiles up to version 4 are sha1 hardcoded and version
> > 5 and onwards will have a hash type field. Usually that hash type would
> > match what is in the_repository, but you could obtain packfiles
> > out of band, or the translation table that we plan to have might
> > be part of the packfile/idx file?)
>
> Yeah, the transition plan doesn't specify a format for pack files, but
> we may end up needing one.  We definitely have a specified format for
> index files already, and that's where the translation table will be.
> Anything other than the pack index and the loose object index in the
> .git directory will have the same algorithm as the rest of the
> repository, so technically we could use any pack format as long as it
> lives in the .git directory.
>
> This code is mostly here on an interim basis to let us compile with a
> fully SHA-256 (no SHA-1) Git.  Once that piece is done, we can move on
> to a stage 4 Git, which can do either only SHA-256, or only SHA-1, where
> we'll learn about various pack file formats and detecting the algorithm
> from them.

This second paragraph really helps to put things into perspective, thanks!
I assume this interim base of code only applies to this patch?
(In that case maybe put it into the commit message?)

brian m. carlson Oct. 9, 2018, 10:54 p.m. UTC | #4
On Tue, Oct 09, 2018 at 03:34:17PM -0700, Stefan Beller wrote:
> On Tue, Oct 9, 2018 at 3:25 PM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> > This code is mostly here on an interim basis to let us compile with a
> > fully SHA-256 (no SHA-1) Git.  Once that piece is done, we can move on
> > to a stage 4 Git, which can do either only SHA-256, or only SHA-1, where
> > we'll learn about various pack file formats and detecting the algorithm
> > from them.
> 
> This second paragraph really helps to put things into perspective, thanks!
> I assume this interim base of code only applies to this patch?
> (In that case maybe put it into the commit message?)

That comment will apply to most of the changes to the packfile code,
whether in this series or in future series.  However, after your
question, I was indeed going to put it into the commit message when I
reroll.

Patch

diff --git a/packfile.c b/packfile.c
index 841b36182f..17f993b5bf 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1121,13 +1121,14 @@  int unpack_object_header(struct packed_git *p,
 void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1)
 {
 	unsigned i;
+	const unsigned hashsz = the_hash_algo->rawsz;
 	for (i = 0; i < p->num_bad_objects; i++)
-		if (hasheq(sha1, p->bad_object_sha1 + GIT_SHA1_RAWSZ * i))
+		if (hasheq(sha1, p->bad_object_sha1 + hashsz * i))
 			return;
 	p->bad_object_sha1 = xrealloc(p->bad_object_sha1,
 				      st_mult(GIT_MAX_RAWSZ,
 					      st_add(p->num_bad_objects, 1)));
-	hashcpy(p->bad_object_sha1 + GIT_SHA1_RAWSZ * p->num_bad_objects, sha1);
+	hashcpy(p->bad_object_sha1 + hashsz * p->num_bad_objects, sha1);
 	p->num_bad_objects++;
 }
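
The indexing scheme the patch moves to can be illustrated outside of Git. The following is a minimal standalone sketch, not Git's code: `demo_hash_algo`, `demo_pack`, `demo_mark_bad` and `DEMO_MAX_RAWSZ` are hypothetical stand-ins for `the_hash_algo`, `struct packed_git`, `mark_bad_packed_object()` and `GIT_MAX_RAWSZ`, and plain `memcmp`/`memcpy`/`realloc` replace `hasheq`/`hashcpy`/`xrealloc`. It shows entries stored `rawsz` bytes apart, with the stride read at run time, while the buffer is still sized by the maximum hash length as in the patch. That over-allocates but stays in bounds as long as `rawsz` never exceeds the maximum.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for Git's types; these are not the real definitions. */
#define DEMO_MAX_RAWSZ 32	/* assumed largest raw hash size (SHA-256) */

struct demo_hash_algo {
	size_t rawsz;		/* raw hash length in bytes: 20 for SHA-1, 32 for SHA-256 */
};

struct demo_pack {
	unsigned char *bad_hashes;	/* num_bad entries stored rawsz bytes apart */
	size_t num_bad;
};

/*
 * Record a hash in the pack's bad-object list.
 * Returns 1 if newly recorded, 0 if already present, -1 if allocation fails.
 */
static int demo_mark_bad(struct demo_pack *p, const struct demo_hash_algo *algo,
			 const unsigned char *hash)
{
	const size_t hashsz = algo->rawsz;	/* stride chosen at run time */
	unsigned char *grown;
	size_t i;

	assert(hashsz > 0 && hashsz <= DEMO_MAX_RAWSZ);

	/* Linear scan: entry i starts at offset hashsz * i. */
	for (i = 0; i < p->num_bad; i++)
		if (!memcmp(hash, p->bad_hashes + hashsz * i, hashsz))
			return 0;

	/*
	 * Size the buffer by the maximum hash length, mirroring the patch.
	 * Overflow checking (st_mult/st_add in Git) is omitted in this sketch.
	 */
	grown = realloc(p->bad_hashes, DEMO_MAX_RAWSZ * (p->num_bad + 1));
	if (!grown)
		return -1;
	p->bad_hashes = grown;

	memcpy(p->bad_hashes + hashsz * p->num_bad, hash, hashsz);
	p->num_bad++;
	return 1;
}
```

A caller would pass an algorithm descriptor with `rawsz` set to 20 or 32; marking the same hash twice leaves `num_bad` unchanged on the second call.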