[v2,03/13] hex: introduce functions to print arbitrary hashes

Message ID 20181015021900.1030041-4-sandals@crustytoothpaste.net
State New
Series
  • Base SHA-256 implementation

Commit Message

brian m. carlson Oct. 15, 2018, 2:18 a.m. UTC
Currently, we have functions that turn an arbitrary SHA-1 value or an
object ID into hex format, either using a static buffer or with a
user-provided buffer.  Add variants of these functions that can handle
an arbitrary hash algorithm, specified by a constant.  Update the
documentation as well.

While we're at it, remove the "extern" keyword from this family of
functions, since it's not needed and our style now recommends against
it.

We use the variant taking the algorithm structure pointer as the
internal variant, since taking an algorithm pointer is the easiest way
to handle all of the variants in use.

Note that we maintain these functions because there are hashes which
must change based on the hash algorithm in use but are not object IDs
(such as pack checksums).

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h | 15 +++++++++------
 hex.c   | 37 +++++++++++++++++++++++++++++--------
 2 files changed, 38 insertions(+), 14 deletions(-)
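
A minimal sketch of the layering described in the commit message, assuming a simplified descriptor table: one internal worker takes a pointer to the algorithm descriptor, a public `_r` entry point selects the algorithm by integer index, and a fixed SHA-1 wrapper sits on top. The names `struct hash_algo`, `algos[]`, `HASH_*`, `hex_algop_r`, `hex_algo_r`, and `sha1_hex_r` are hypothetical stand-ins for Git's `struct git_hash_algo`, `hash_algos[]`, `GIT_HASH_*`, and the functions in this patch; only the raw sizes (20 bytes for SHA-1, 32 for SHA-256) are meant to match reality.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Git's struct git_hash_algo. */
struct hash_algo {
	const char *name;
	size_t rawsz;           /* length of the binary hash in bytes */
};

/* Hypothetical indices, modelled loosely on Git's GIT_HASH_* constants. */
enum { HASH_UNKNOWN, HASH_SHA1, HASH_SHA256, HASH_NALGOS };

static const struct hash_algo algos[HASH_NALGOS] = {
	{ "unknown", 0 },
	{ "sha1", 20 },
	{ "sha256", 32 },
};

#define MAX_HEXSZ (2 * 32)      /* large enough for any entry above */

/* Internal worker: takes a descriptor pointer, like hash_to_hex_algop_r(). */
static char *hex_algop_r(char *out, const unsigned char *hash,
			 const struct hash_algo *algop)
{
	static const char hex[] = "0123456789abcdef";
	size_t i;

	for (i = 0; i < algop->rawsz; i++) {
		out[2 * i] = hex[hash[i] >> 4];      /* high nibble */
		out[2 * i + 1] = hex[hash[i] & 0xf]; /* low nibble */
	}
	out[2 * algop->rawsz] = '\0';
	return out;
}

/* Public entry point: algorithm selected by integer index. */
char *hex_algo_r(char *out, const unsigned char *hash, int algo)
{
	assert(algo > HASH_UNKNOWN && algo < HASH_NALGOS);
	return hex_algop_r(out, hash, &algos[algo]);
}

/* Fixed-algorithm wrapper, analogous to sha1_to_hex_r(). */
char *sha1_hex_r(char *out, const unsigned char *sha1)
{
	return hex_algo_r(out, sha1, HASH_SHA1);
}
```

A caller formatting something that is not an object ID, such as a pack checksum, would pass the index explicitly (for example `hex_algo_r(buf, csum, HASH_SHA256)`) rather than relying on a repository-wide default.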

Comments

Junio C Hamano Oct. 16, 2018, 1:54 a.m. UTC | #1
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> diff --git a/cache.h b/cache.h
> index d508f3d4f8..a13d14ce0a 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1361,9 +1361,9 @@ extern int get_oid_hex(const char *hex, struct object_id *sha1);
>  extern int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
>  
>  /*
> - * Convert a binary sha1 to its hex equivalent. The `_r` variant is reentrant,
> + * Convert a binary hash to its hex equivalent. The `_r` variant is reentrant,
>   * and writes the NUL-terminated output to the buffer `out`, which must be at
> - * least `GIT_SHA1_HEXSZ + 1` bytes, and returns a pointer to out for
> + * least `GIT_MAX_HEXSZ + 1` bytes, and returns a pointer to out for
>   * convenience.
>   *
>   * The non-`_r` variant returns a static buffer, but uses a ring of 4
> @@ -1371,10 +1371,13 @@ extern int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
>   *
>   *   printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two));
>   */
> -extern char *sha1_to_hex_r(char *out, const unsigned char *sha1);
> -extern char *oid_to_hex_r(char *out, const struct object_id *oid);
> -extern char *sha1_to_hex(const unsigned char *sha1);	/* static buffer result! */
> -extern char *oid_to_hex(const struct object_id *oid);	/* same static buffer as sha1_to_hex */
> +char *hash_to_hex_algo_r(char *buffer, const unsigned char *hash, int algo);
> +char *sha1_to_hex_r(char *out, const unsigned char *sha1);
> +char *oid_to_hex_r(char *out, const struct object_id *oid);
> +char *hash_to_hex_algo(const unsigned char *hash, int algo);	/* static buffer result! */
> +char *sha1_to_hex(const unsigned char *sha1);			/* same static buffer */
> +char *hash_to_hex(const unsigned char *hash);			/* same static buffer */
> +char *oid_to_hex(const struct object_id *oid);			/* same static buffer */

Even though in hex.c I see a mixture of *_algo and *_algop helper
functions, I see only "algo" variants above.  Is it our official
stance to use primarily the integer index into the algo array when
specifying the hash, and when a caller into 'multi-hash' API happens
to have other things, it should use functions in 2/13 to convert it
to the canonical "int algo" beforehand?

I am not saying it is bad or good to choose the index into the algo
array as the primary way to specify the algorithm.  I think it is a
good idea to pick one and stick to it, and I wanted to make sure
that the choice we made here is clearly communicated to any future
developer who reads this code.

> +char *sha1_to_hex(const unsigned char *sha1)
> +{
> +	return hash_to_hex_algo(sha1, GIT_HASH_SHA1);
> +}
> +
> +char *hash_to_hex(const unsigned char *hash)
> +{
> +	return hash_to_hex_algo(hash, hash_algo_by_ptr(the_hash_algo));
>  }
>  
>  char *oid_to_hex(const struct object_id *oid)
>  {
> -	return sha1_to_hex(oid->hash);
> +	return hash_to_hex_algo(oid->hash, hash_algo_by_ptr(the_hash_algo));
>  }

Having said the above, seeing the use of hash_algo_by_ptr() here
makes me suspect if it makes more sense to use the algop as the
primary way to specify which algorithm the caller wants to use.
IOW, making the set of helpers in 02/13 to allow querying by name,
format-id, or the integer index and have them all return a pointer
to "const struct git_hash_algo".  Two immediate downsides I can see
are that it exposes the actual structure to the callers (but is it
really a problem?  Outside callers learn hash sizes etc. by accessing
its fields anyway without treating the algo struct as opaque.), and
passing an 8-byte pointer may be more costly than passing a small
integer index that ranges between 0 and 1 at most (assuming that
we'd only use SHA-1 and "the current NewHash" in the code).
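
Junio's alternative, making the descriptor pointer the primary handle, could look roughly like the sketch below. It assumes every lookup helper returns `const struct hash_algo *` (NULL when the algorithm is unknown) and that the hex routine accepts that pointer directly; `algo_by_name`, `hex_algop_r`, and the table are hypothetical and do not reproduce the helpers from 02/13.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor, exposed to callers in this design. */
struct hash_algo {
	const char *name;
	size_t rawsz;
};

static const struct hash_algo algos[] = {
	{ "sha1", 20 },
	{ "sha256", 32 },
};
#define NALGOS (sizeof(algos) / sizeof(algos[0]))

/* Every lookup hands back a descriptor pointer, or NULL if unknown. */
const struct hash_algo *algo_by_name(const char *name)
{
	size_t i;

	for (i = 0; i < NALGOS; i++)
		if (!strcmp(algos[i].name, name))
			return &algos[i];
	return NULL;
}

/* Callers pass the pointer straight through; no index conversion. */
char *hex_algop_r(char *out, const unsigned char *hash,
		  const struct hash_algo *algop)
{
	static const char hex[] = "0123456789abcdef";
	size_t i;

	assert(algop != NULL);
	for (i = 0; i < algop->rawsz; i++) {
		out[2 * i] = hex[hash[i] >> 4];
		out[2 * i + 1] = hex[hash[i] & 0xf];
	}
	out[2 * algop->rawsz] = '\0';
	return out;
}
```

Here the caller can read `algop->rawsz` directly, which is the exposure Junio mentions, and the common path needs no `hash_algo_by_ptr()` round trip.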

brian m. carlson Oct. 17, 2018, 11:49 p.m. UTC | #2
On Tue, Oct 16, 2018 at 10:54:23AM +0900, Junio C Hamano wrote:
> Even though in hex.c I see a mixture of *_algo and *_algop helper
> functions, I see only "algo" variants above.  Is it our official
> stance to use primarily the integer index into the algo array when
> specifying the hash, and when a caller into 'multi-hash' API happens
> to have other things, it should use functions in 2/13 to convert it
> to the canonical "int algo" beforehand?

That was my intention, yes.

> I am not saying it is bad or good to choose the index into the algo
> array as the primary way to specify the algorithm.  I think it is a
> good idea to pick one and stick to it, and I wanted to make sure
> that the choice we made here is clearly communicated to any future
> developer who reads this code.

Yeah, that was my feeling as well.  I wanted to pick something fixed and
stick to it.

> Having said the above, seeing the use of hash_algo_by_ptr() here
> makes me suspect if it makes more sense to use the algop as the
> primary way to specify which algorithm the caller wants to use.
> IOW, making the set of helpers in 02/13 to allow querying by name,
> format-id, or the integer index and have them all return a pointer
> to "const struct git_hash_algo".  Two immediate downsides I can see
> are that it exposes the actual structure to the callers (but is it
> really a problem?  Outside callers learn hash sizes etc. by accessing
> its fields anyway without treating the algo struct as opaque.), and
> passing an 8-byte pointer may be more costly than passing a small
> integer index that ranges between 0 and 1 at most (assuming that
> we'd only use SHA-1 and "the current NewHash" in the code).

I thought about this.  The one downside to this is that we can't use
those values anywhere we need an integer constant expression, like a
switch.  I suppose that just means we need hash_algo_by_ptr in those
cases, and not everywhere else, which would make the code cleaner.
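
The switch limitation can be illustrated with a small sketch, assuming the descriptors live in one array so that a pointer can be mapped back to its index by subtraction (the role `hash_algo_by_ptr()` plays in the patch). `algo_by_ptr`, `describe`, and the table are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct hash_algo {
	const char *name;
	size_t rawsz;
};

/* Hypothetical indices into the descriptor table. */
enum { HASH_UNKNOWN, HASH_SHA1, HASH_SHA256, HASH_NALGOS };

static const struct hash_algo algos[HASH_NALGOS] = {
	{ "unknown", 0 },
	{ "sha1", 20 },
	{ "sha256", 32 },
};

/* Recover the integer index; only valid for pointers into algos[]. */
int algo_by_ptr(const struct hash_algo *p)
{
	assert(p >= algos && p < algos + HASH_NALGOS);
	return (int)(p - algos);
}

/*
 * A switch cannot dispatch on the pointer itself: the controlling
 * expression must have integer type and each case label must be an
 * integer constant expression, so map the pointer to its index first.
 */
const char *describe(const struct hash_algo *p)
{
	switch (algo_by_ptr(p)) {
	case HASH_SHA1:
		return "legacy";
	case HASH_SHA256:
		return "new";
	default:
		return "unknown";
	}
}
```

With a pointer-primary API, the conversion is confined to spots like `describe()` that genuinely need an integer, which matches the "only in those cases" outcome described above.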

Let me reroll with that change, and we'll see if we like it better.  If
we don't, I can pull the old version out of history.

Patch

diff --git a/cache.h b/cache.h
index d508f3d4f8..a13d14ce0a 100644
--- a/cache.h
+++ b/cache.h
@@ -1361,9 +1361,9 @@  extern int get_oid_hex(const char *hex, struct object_id *sha1);
 extern int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
 
 /*
- * Convert a binary sha1 to its hex equivalent. The `_r` variant is reentrant,
+ * Convert a binary hash to its hex equivalent. The `_r` variant is reentrant,
  * and writes the NUL-terminated output to the buffer `out`, which must be at
- * least `GIT_SHA1_HEXSZ + 1` bytes, and returns a pointer to out for
+ * least `GIT_MAX_HEXSZ + 1` bytes, and returns a pointer to out for
  * convenience.
  *
  * The non-`_r` variant returns a static buffer, but uses a ring of 4
@@ -1371,10 +1371,13 @@  extern int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
  *
  *   printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two));
  */
-extern char *sha1_to_hex_r(char *out, const unsigned char *sha1);
-extern char *oid_to_hex_r(char *out, const struct object_id *oid);
-extern char *sha1_to_hex(const unsigned char *sha1);	/* static buffer result! */
-extern char *oid_to_hex(const struct object_id *oid);	/* same static buffer as sha1_to_hex */
+char *hash_to_hex_algo_r(char *buffer, const unsigned char *hash, int algo);
+char *sha1_to_hex_r(char *out, const unsigned char *sha1);
+char *oid_to_hex_r(char *out, const struct object_id *oid);
+char *hash_to_hex_algo(const unsigned char *hash, int algo);	/* static buffer result! */
+char *sha1_to_hex(const unsigned char *sha1);			/* same static buffer */
+char *hash_to_hex(const unsigned char *hash);			/* same static buffer */
+char *oid_to_hex(const struct object_id *oid);			/* same static buffer */
 
 /*
  * Parse a 40-character hexadecimal object ID starting from hex, updating the
diff --git a/hex.c b/hex.c
index 10af1a29e8..080597ad3f 100644
--- a/hex.c
+++ b/hex.c
@@ -73,14 +73,15 @@  int parse_oid_hex(const char *hex, struct object_id *oid, const char **end)
 	return ret;
 }
 
-char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
+static inline char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,
+					const struct git_hash_algo *algop)
 {
 	static const char hex[] = "0123456789abcdef";
 	char *buf = buffer;
 	int i;
 
-	for (i = 0; i < the_hash_algo->rawsz; i++) {
-		unsigned int val = *sha1++;
+	for (i = 0; i < algop->rawsz; i++) {
+		unsigned int val = *hash++;
 		*buf++ = hex[val >> 4];
 		*buf++ = hex[val & 0xf];
 	}
@@ -89,20 +90,40 @@  char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
 	return buffer;
 }
 
-char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+char *hash_to_hex_algo_r(char *buffer, const unsigned char *hash, int algo)
 {
-	return sha1_to_hex_r(buffer, oid->hash);
+	return hash_to_hex_algop_r(buffer, hash, &hash_algos[algo]);
 }
 
-char *sha1_to_hex(const unsigned char *sha1)
+char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
+{
+	return hash_to_hex_algo_r(buffer, sha1, GIT_HASH_SHA1);
+}
+
+char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+{
+	return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo);
+}
+
+char *hash_to_hex_algo(const unsigned char *hash, int algo)
 {
 	static int bufno;
 	static char hexbuffer[4][GIT_MAX_HEXSZ + 1];
 	bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer);
-	return sha1_to_hex_r(hexbuffer[bufno], sha1);
+	return hash_to_hex_algo_r(hexbuffer[bufno], hash, algo);
+}
+
+char *sha1_to_hex(const unsigned char *sha1)
+{
+	return hash_to_hex_algo(sha1, GIT_HASH_SHA1);
+}
+
+char *hash_to_hex(const unsigned char *hash)
+{
+	return hash_to_hex_algo(hash, hash_algo_by_ptr(the_hash_algo));
 }
 
 char *oid_to_hex(const struct object_id *oid)
 {
-	return sha1_to_hex(oid->hash);
+	return hash_to_hex_algo(oid->hash, hash_algo_by_ptr(the_hash_algo));
 }
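
The patch keeps the existing ring of four static buffers for the non-`_r` variants. The reduced sketch below is fixed at a 20-byte (SHA-1-sized) input for brevity. It shows why two results can safely appear in one `printf()` call and when an earlier result is overwritten. `to_hex` and `to_hex_r` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define RAWSZ 20                /* assume SHA-1-sized input */
#define HEXSZ (2 * RAWSZ)

/* Reentrant form: caller supplies at least HEXSZ + 1 bytes. */
static char *to_hex_r(char *out, const unsigned char *hash)
{
	static const char hex[] = "0123456789abcdef";
	size_t i;

	assert(out != NULL && hash != NULL);
	for (i = 0; i < RAWSZ; i++) {
		out[2 * i] = hex[hash[i] >> 4];
		out[2 * i + 1] = hex[hash[i] & 0xf];
	}
	out[HEXSZ] = '\0';
	return out;
}

/* Rotates through four buffers, so up to four results may be live at once. */
char *to_hex(const unsigned char *hash)
{
	static int bufno;
	static char ring[4][HEXSZ + 1];

	bufno = (bufno + 1) % 4;
	return to_hex_r(ring[bufno], hash);
}
```

Each call advances to the next buffer, so a returned pointer stays valid until four more calls have been made; the fifth call reuses its slot. Callers that need to keep a result longer should use the `_r` variant with their own `GIT_MAX_HEXSZ + 1` byte buffer.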