Message ID | 20191111090418.GB12545@sigill.intra.peff.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | getting rid of sha1_to_hex() | expand |
On Mon, Nov 11, 2019 at 04:04:18AM -0500, Jeff King wrote: > There's only a single caller left of sha1_to_hex(), since everybody now > uses oid_to_hex() instead. This case is in the sha1dc wrapper, where we > print a hex sha1 when we find a collision. This one will always be sha1, > regardless of the current hash algorithm, so we can't use oid_to_hex() Nit: s/oid_to_hex/hash_to_hex/ We can't use oid_to_hex() because we don't have a 'struct object_id' in the first place, as sha1dc only ever deals with 20 unsigned chars. > here. In practice we'd probably not be running sha1 at all if it isn't > the current algorithm, but it's possible we might still occasionally > need to compute a sha1 in a post-sha256 world. > > Since sha1_to_hex() is just a wrapper for hash_to_hex_algop(), let's > call that ourselves. There's value in getting rid of the sha1-specific > wrapper to de-clutter the global namespace, and to make sure nobody uses > it (and as with sha1_to_hex_r() in the previous patch, we'll drop the > coccinelle transformations, too). > diff --git a/sha1dc_git.c b/sha1dc_git.c > index e0cc9d988c..5c300e812e 100644 > --- a/sha1dc_git.c > +++ b/sha1dc_git.c > @@ -19,7 +19,7 @@ void git_SHA1DCFinal(unsigned char hash[20], SHA1_CTX *ctx) > if (!SHA1DCFinal(hash, ctx)) > return; > die("SHA-1 appears to be part of a collision attack: %s", > - sha1_to_hex(hash)); > + hash_to_hex_algop(hash, &hash_algos[GIT_HASH_SHA1])); > } > > /* > -- > 2.24.0.739.gb5632e4929
On Mon, Nov 11, 2019 at 03:18:05PM +0100, SZEDER Gábor wrote: > On Mon, Nov 11, 2019 at 04:04:18AM -0500, Jeff King wrote: > > There's only a single caller left of sha1_to_hex(), since everybody now > > uses oid_to_hex() instead. This case is in the sha1dc wrapper, where we > > print a hex sha1 when we find a collision. This one will always be sha1, > > regardless of the current hash algorithm, so we can't use oid_to_hex() > > Nit: s/oid_to_hex/hash_to_hex/ > > We can't use oid_to_hex() because we don't have a 'struct object_id' > in the first place, as sha1dc only ever deals with 20 unsigned chars. Ah, you're right. I admit I am still getting up to speed on all of the new hash-agnostic versions of the various functions. -Peff
Jeff King <peff@peff.net> writes: > On Mon, Nov 11, 2019 at 03:18:05PM +0100, SZEDER Gábor wrote: > >> On Mon, Nov 11, 2019 at 04:04:18AM -0500, Jeff King wrote: >> > There's only a single caller left of sha1_to_hex(), since everybody now >> > uses oid_to_hex() instead. This case is in the sha1dc wrapper, where we >> > print a hex sha1 when we find a collision. This one will always be sha1, >> > regardless of the current hash algorithm, so we can't use oid_to_hex() >> >> Nit: s/oid_to_hex/hash_to_hex/ >> >> We can't use oid_to_hex() because we don't have a 'struct object_id' >> in the first place, as sha1dc only ever deals with 20 unsigned chars. > > Ah, you're right. I admit I am still getting up to speed on all of the > new hash-agnostic versions of the various functions. Thanks. I've amended this one and the range diff since the pushout yesterday looks like this. 1: 8a030f1796 ! 1: 02d21d4117 hex: drop sha1_to_hex() @@ Commit message hex: drop sha1_to_hex() There's only a single caller left of sha1_to_hex(), since everybody now - uses oid_to_hex() instead. This case is in the sha1dc wrapper, where we + uses hash_to_hex() instead. This case is in the sha1dc wrapper, where we print a hex sha1 when we find a collision. This one will always be sha1, - regardless of the current hash algorithm, so we can't use oid_to_hex() + regardless of the current hash algorithm, so we can't use hash_to_hex() here. In practice we'd probably not be running sha1 at all if it isn't the current algorithm, but it's possible we might still occasionally need to compute a sha1 in a post-sha256 world. @@ cache.h: int hex_to_bytes(unsigned char *binary, const char *hex, size_t len); * buffers, making it safe to make multiple calls for a single statement, like: * - * printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two)); -+ * printf("%s -> %s", oid_to_hex(one), oid_to_hex(two)); ++ * printf("%s -> %s", hash_to_hex(one), hash_to_hex(two)); */ char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash, const struct git_hash_algo *); char *oid_to_hex_r(char *out, const struct object_id *oid);
On Tue, Nov 12, 2019 at 01:13:58PM +0900, Junio C Hamano wrote: > >> We can't use oid_to_hex() because we don't have a 'struct object_id' > >> in the first place, as sha1dc only ever deals with 20 unsigned chars. > > > > Ah, you're right. I admit I am still getting up to speed on all of the > > new hash-agnostic versions of the various functions. > > Thanks. I've amended this one and the range diff since the pushout > yesterday looks like this. Thanks. This first hunk is what I would have done: > 1: 8a030f1796 ! 1: 02d21d4117 hex: drop sha1_to_hex() > @@ Commit message > hex: drop sha1_to_hex() > > There's only a single caller left of sha1_to_hex(), since everybody now > - uses oid_to_hex() instead. This case is in the sha1dc wrapper, where we > + uses hash_to_hex() instead. This case is in the sha1dc wrapper, where we > print a hex sha1 when we find a collision. This one will always be sha1, > - regardless of the current hash algorithm, so we can't use oid_to_hex() > + regardless of the current hash algorithm, so we can't use hash_to_hex() > here. In practice we'd probably not be running sha1 at all if it isn't > the current algorithm, but it's possible we might still occasionally > need to compute a sha1 in a post-sha256 world. This second one is OK, but not entirely necessary: > @@ cache.h: int hex_to_bytes(unsigned char *binary, const char *hex, size_t len); > * buffers, making it safe to make multiple calls for a single statement, like: > * > - * printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two)); > -+ * printf("%s -> %s", oid_to_hex(one), oid_to_hex(two)); > ++ * printf("%s -> %s", hash_to_hex(one), hash_to_hex(two)); > */ > char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash, const struct git_hash_algo *); > char *oid_to_hex_r(char *out, const struct object_id *oid); This one-liner leaves the types of "one" and "two" unspecified. :) So it's not wrong to use hash_to_hex(), but maybe it's better to be pushing people towards oid_to_hex() as their first choice? It probably doesn't matter too much either way. -Peff
On Tue, Nov 12, 2019 at 05:57:59AM -0500, Jeff King wrote: > On Tue, Nov 12, 2019 at 01:13:58PM +0900, Junio C Hamano wrote: > > > >> We can't use oid_to_hex() because we don't have a 'struct object_id' > > >> in the first place, as sha1dc only ever deals with 20 unsigned chars. > > > > > > Ah, you're right. I admit I am still getting up to speed on all of the > > > new hash-agnostic versions of the various functions. > > > > Thanks. I've amended this one and the range diff since the pushout > > yesterday looks like this. > > Thanks. This first hunk is what I would have done: > > > 1: 8a030f1796 ! 1: 02d21d4117 hex: drop sha1_to_hex() > > @@ Commit message > > hex: drop sha1_to_hex() > > > > There's only a single caller left of sha1_to_hex(), since everybody now > > - uses oid_to_hex() instead. This case is in the sha1dc wrapper, where we > > + uses hash_to_hex() instead. This case is in the sha1dc wrapper, where we > > print a hex sha1 when we find a collision. This one will always be sha1, > > - regardless of the current hash algorithm, so we can't use oid_to_hex() > > + regardless of the current hash algorithm, so we can't use hash_to_hex() > > here. In practice we'd probably not be running sha1 at all if it isn't > > the current algorithm, but it's possible we might still occasionally > > need to compute a sha1 in a post-sha256 world. > > This second one is OK, but not entirely necessary: > > > @@ cache.h: int hex_to_bytes(unsigned char *binary, const char *hex, size_t len); > > * buffers, making it safe to make multiple calls for a single statement, like: > > * > > - * printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two)); > > -+ * printf("%s -> %s", oid_to_hex(one), oid_to_hex(two)); > > ++ * printf("%s -> %s", hash_to_hex(one), hash_to_hex(two)); > > */ > > char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash, const struct git_hash_algo *); > > char *oid_to_hex_r(char *out, const struct object_id *oid); > > This one-liner leaves the types of "one" and "two" unspecified. :) So > it's not wrong to use hash_to_hex(), but maybe it's better to be pushing > people towards oid_to_hex() as their first choice? It probably doesn't > matter too much either way. Yeah, most (over 96%) of the hashes that we print are actually object ids, so oid_to_hex() is the right function to use most of the time. And because of this the updated "since everybody uses hash_to_hex()" in the first hunk sounds a bit wrong, because barely anybody actually uses hash_to_hex().
Jeff King <peff@peff.net> writes: >> @@ cache.h: int hex_to_bytes(unsigned char *binary, const char *hex, size_t len); >> * buffers, making it safe to make multiple calls for a single statement, like: >> * >> - * printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two)); >> -+ * printf("%s -> %s", oid_to_hex(one), oid_to_hex(two)); >> ++ * printf("%s -> %s", hash_to_hex(one), hash_to_hex(two)); >> */ >> char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash, const struct git_hash_algo *); >> char *oid_to_hex_r(char *out, const struct object_id *oid); > > This one-liner leaves the types of "one" and "two" unspecified. :) So > it's not wrong to use hash_to_hex(), but maybe it's better to be pushing > people towards oid_to_hex() as their first choice? It probably doesn't > matter too much either way. The pre-context of that comment reads: * Convert a binary hash to its hex equivalent. The `_r` variant is reentrant, * and writes the NUL-terminated output to the buffer `out`, which must be at * least `GIT_MAX_HEXSZ + 1` bytes, and returns a pointer to out for * convenience. so I think the intent of the example that used to use sha1_to_hex() has been the raw bytestring that is the object name, not its form that is encapsulated in the "struct object_id()".
On Tue, Nov 12, 2019 at 12:44:58PM +0100, SZEDER Gábor wrote: > > > 1: 8a030f1796 ! 1: 02d21d4117 hex: drop sha1_to_hex() > > > @@ Commit message > > > hex: drop sha1_to_hex() > > > > > > There's only a single caller left of sha1_to_hex(), since everybody now > > > - uses oid_to_hex() instead. This case is in the sha1dc wrapper, where we > > > + uses hash_to_hex() instead. This case is in the sha1dc wrapper, where we > > > print a hex sha1 when we find a collision. This one will always be sha1, > > > - regardless of the current hash algorithm, so we can't use oid_to_hex() > > > + regardless of the current hash algorithm, so we can't use hash_to_hex() > > > here. In practice we'd probably not be running sha1 at all if it isn't > > > the current algorithm, but it's possible we might still occasionally > > > need to compute a sha1 in a post-sha256 world. > [...] > And because of this the updated "since everybody uses hash_to_hex()" > in the first hunk sounds a bit wrong, because barely anybody actually > uses hash_to_hex(). You're right, I should have read more carefully. -Peff
On Tue, Nov 12, 2019 at 08:49:44PM +0900, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > >> @@ cache.h: int hex_to_bytes(unsigned char *binary, const char *hex, size_t len); > >> * buffers, making it safe to make multiple calls for a single statement, like: > >> * > >> - * printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two)); > >> -+ * printf("%s -> %s", oid_to_hex(one), oid_to_hex(two)); > >> ++ * printf("%s -> %s", hash_to_hex(one), hash_to_hex(two)); > >> */ > >> char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash, const struct git_hash_algo *); > >> char *oid_to_hex_r(char *out, const struct object_id *oid); > > > > This one-liner leaves the types of "one" and "two" unspecified. :) So > > it's not wrong to use hash_to_hex(), but maybe it's better to be pushing > > people towards oid_to_hex() as their first choice? It probably doesn't > > matter too much either way. > > The pre-context of that comment reads: > > * Convert a binary hash to its hex equivalent. The `_r` variant is reentrant, > * and writes the NUL-terminated output to the buffer `out`, which must be at > * least `GIT_MAX_HEXSZ + 1` bytes, and returns a pointer to out for > * convenience. > > so I think the intent of the example that used to use sha1_to_hex() > has been the raw bytestring that is the object name, not its form > that is encapsulated in the "struct object_id()". I guess you're keying on the phrase "binary hash" there (the "GIT_MAX_HEXZ" bits only apply to the "_r" variants anyway). I'd read it as encompassing all of the functions below, including oid_to_hex(). But I'm OK with it either way. -Peff
Jeff King <peff@peff.net> writes: > On Tue, Nov 12, 2019 at 08:49:44PM +0900, Junio C Hamano wrote: > >> Jeff King <peff@peff.net> writes: >> >> >> @@ cache.h: int hex_to_bytes(unsigned char *binary, const char *hex, size_t len); >> >> * buffers, making it safe to make multiple calls for a single statement, like: >> >> * >> >> - * printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two)); >> >> -+ * printf("%s -> %s", oid_to_hex(one), oid_to_hex(two)); >> >> ++ * printf("%s -> %s", hash_to_hex(one), hash_to_hex(two)); >> >> */ >> >> char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash, const struct git_hash_algo *); >> >> char *oid_to_hex_r(char *out, const struct object_id *oid); >> > >> > This one-liner leaves the types of "one" and "two" unspecified. :) So >> > it's not wrong to use hash_to_hex(), but maybe it's better to be pushing >> > people towards oid_to_hex() as their first choice? It probably doesn't >> > matter too much either way. >> >> The pre-context of that comment reads: >> >> * Convert a binary hash to its hex equivalent. The `_r` variant is reentrant, >> * and writes the NUL-terminated output to the buffer `out`, which must be at >> * least `GIT_MAX_HEXSZ + 1` bytes, and returns a pointer to out for >> * convenience. >> >> so I think the intent of the example that used to use sha1_to_hex() >> has been the raw bytestring that is the object name, not its form >> that is encapsulated in the "struct object_id()". > > I guess you're keying on the phrase "binary hash" there (the > "GIT_MAX_HEXZ" bits only apply to the "_r" variants anyway). I'd read it > as encompassing all of the functions below, including oid_to_hex(). But > I'm OK with it either way. Yes. "binary hash" is about "unsigned char[]". I think that's historical accident---we added "struct object_id *" variants without updating the comment. Here is another try. The "everybody uses oid_to_hex" in the log message has also been updated. 1: 8a030f1796 ! 1: b19f3fe9dd hex: drop sha1_to_hex() @@ Metadata ## Commit message ## hex: drop sha1_to_hex() - There's only a single caller left of sha1_to_hex(), since everybody now - uses oid_to_hex() instead. This case is in the sha1dc wrapper, where we - print a hex sha1 when we find a collision. This one will always be sha1, - regardless of the current hash algorithm, so we can't use oid_to_hex() - here. In practice we'd probably not be running sha1 at all if it isn't - the current algorithm, but it's possible we might still occasionally + There's only a single caller left of sha1_to_hex(), since everybody + that has an object name in "unsigned char[]" now uses hash_to_hex() + instead. + + This case is in the sha1dc wrapper, where we print a hex sha1 when + we find a collision. This one will always be sha1, regardless of the + current hash algorithm, so we can't use hash_to_hex() here. In + practice we'd probably not be running sha1 at all if it isn't the + current algorithm, but it's possible we might still occasionally need to compute a sha1 in a post-sha256 world. Since sha1_to_hex() is just a wrapper for hash_to_hex_algop(), let's @@ Commit message it (and as with sha1_to_hex_r() in the previous patch, we'll drop the coccinelle transformations, too). - The sha1_to_hex() function is mentioned in a comment; we can easily swap - that out for oid_to_hex() to give a better example. It's also mentioned - in some test vectors in t4100, but that's not runnable code, so there's - no point in trying to clean it up. + The sha1_to_hex() function is mentioned in a comment; we can easily + swap that out for oid_to_hex() to give a better example. Also + update the comment that was left stale when we added "struct + object_id *" as a way to name an object and added functions to + convert it to hex. + + The function is also mentioned in some test vectors in t4100, but + that's not runnable code, so there's no point in trying to clean it + up. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> ## cache.h ## +@@ cache.h: int get_oid_hex(const char *hex, struct object_id *sha1); + int hex_to_bytes(unsigned char *binary, const char *hex, size_t len); + + /* +- * Convert a binary hash to its hex equivalent. The `_r` variant is reentrant, ++ * Convert a binary hash in "unsigned char []" or an object name in ++ * "struct object_id *" to its hex equivalent. The `_r` variant is reentrant, + * and writes the NUL-terminated output to the buffer `out`, which must be at + * least `GIT_MAX_HEXSZ + 1` bytes, and returns a pointer to out for + * convenience. @@ cache.h: int hex_to_bytes(unsigned char *binary, const char *hex, size_t len); * The non-`_r` variant returns a static buffer, but uses a ring of 4 * buffers, making it safe to make multiple calls for a single statement, like: * - * printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two)); ++ * printf("%s -> %s", hash_to_hex(one), hash_to_hex(two)); + * printf("%s -> %s", oid_to_hex(one), oid_to_hex(two)); */ char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash, const struct git_hash_algo *);
On Wed, Nov 13, 2019 at 10:09:23AM +0900, Junio C Hamano wrote: > Yes. "binary hash" is about "unsigned char[]". I think that's > historical accident---we added "struct object_id *" variants without > updating the comment. > > Here is another try. The "everybody uses oid_to_hex" in the log > message has also been updated. This looks good to me (and thank you for cleaning up my messy commit message). > 1: 8a030f1796 ! 1: b19f3fe9dd hex: drop sha1_to_hex() > @@ Metadata > ## Commit message ## > hex: drop sha1_to_hex() > > - There's only a single caller left of sha1_to_hex(), since everybody now > - uses oid_to_hex() instead. This case is in the sha1dc wrapper, where we > - print a hex sha1 when we find a collision. This one will always be sha1, > - regardless of the current hash algorithm, so we can't use oid_to_hex() > - here. In practice we'd probably not be running sha1 at all if it isn't > - the current algorithm, but it's possible we might still occasionally > + There's only a single caller left of sha1_to_hex(), since everybody > + that has an object name in "unsigned char[]" now uses hash_to_hex() > + instead. That makes sense. I mentioned oid_to_hex() originally because most of the callers _did_ switch from sha1_to_hex(oid->hash) to oid_to_hex(oid). But that happened far enough in the past that the more interesting change is using the generic hash_to_hex(). -Peff
diff --git a/cache.h b/cache.h index 6a4eb221b3..a2ab10503f 100644 --- a/cache.h +++ b/cache.h @@ -1459,12 +1459,11 @@ int hex_to_bytes(unsigned char *binary, const char *hex, size_t len); * The non-`_r` variant returns a static buffer, but uses a ring of 4 * buffers, making it safe to make multiple calls for a single statement, like: * - * printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two)); + * printf("%s -> %s", oid_to_hex(one), oid_to_hex(two)); */ char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash, const struct git_hash_algo *); char *oid_to_hex_r(char *out, const struct object_id *oid); char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *); /* static buffer result! */ -char *sha1_to_hex(const unsigned char *sha1); /* same static buffer */ char *hash_to_hex(const unsigned char *hash); /* same static buffer */ char *oid_to_hex(const struct object_id *oid); /* same static buffer */ diff --git a/contrib/coccinelle/object_id.cocci b/contrib/coccinelle/object_id.cocci index 6c0d21d8e2..ddf4f22bd7 100644 --- a/contrib/coccinelle/object_id.cocci +++ b/contrib/coccinelle/object_id.cocci @@ -10,21 +10,6 @@ struct object_id *OIDPTR; - is_null_sha1(OIDPTR->hash) + is_null_oid(OIDPTR) -@@ -struct object_id OID; -@@ -- sha1_to_hex(OID.hash) -+ oid_to_hex(&OID) - -@@ -identifier f != oid_to_hex; -struct object_id *OIDPTR; -@@ - f(...) {<... -- sha1_to_hex(OIDPTR->hash) -+ oid_to_hex(OIDPTR) - ...>} - @@ struct object_id OID; @@ diff --git a/hex.c b/hex.c index 8c3f06a192..fd7f00c43f 100644 --- a/hex.c +++ b/hex.c @@ -103,11 +103,6 @@ char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *a return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop); } -char *sha1_to_hex(const unsigned char *sha1) -{ - return hash_to_hex_algop(sha1, &hash_algos[GIT_HASH_SHA1]); -} - char *hash_to_hex(const unsigned char *hash) { return hash_to_hex_algop(hash, the_hash_algo); diff --git a/sha1dc_git.c b/sha1dc_git.c index e0cc9d988c..5c300e812e 100644 --- a/sha1dc_git.c +++ b/sha1dc_git.c @@ -19,7 +19,7 @@ void git_SHA1DCFinal(unsigned char hash[20], SHA1_CTX *ctx) if (!SHA1DCFinal(hash, ctx)) return; die("SHA-1 appears to be part of a collision attack: %s", - sha1_to_hex(hash)); + hash_to_hex_algop(hash, &hash_algos[GIT_HASH_SHA1])); } /*
There's only a single caller left of sha1_to_hex(), since everybody now uses oid_to_hex() instead. This case is in the sha1dc wrapper, where we print a hex sha1 when we find a collision. This one will always be sha1, regardless of the current hash algorithm, so we can't use oid_to_hex() here. In practice we'd probably not be running sha1 at all if it isn't the current algorithm, but it's possible we might still occasionally need to compute a sha1 in a post-sha256 world. Since sha1_to_hex() is just a wrapper for hash_to_hex_algop(), let's call that ourselves. There's value in getting rid of the sha1-specific wrapper to de-clutter the global namespace, and to make sure nobody uses it (and as with sha1_to_hex_r() in the previous patch, we'll drop the coccinelle transformations, too). The sha1_to_hex() function is mentioned in a comment; we can easily swap that out for oid_to_hex() to give a better example. It's also mentioned in some test vectors in t4100, but that's not runnable code, so there's no point in trying to clean it up. Signed-off-by: Jeff King <peff@peff.net> --- cache.h | 3 +-- contrib/coccinelle/object_id.cocci | 15 --------------- hex.c | 5 ----- sha1dc_git.c | 2 +- 4 files changed, 2 insertions(+), 23 deletions(-)