mbox series

[v2,00/13] Base SHA-256 implementation

Message ID 20181015021900.1030041-1-sandals@crustytoothpaste.net (mailing list archive)
Headers show
Series Base SHA-256 implementation | expand

Message

brian m. carlson Oct. 15, 2018, 2:18 a.m. UTC
This series provides an actual SHA-256 implementation and wires it up,
along with some housekeeping patches to make it suitable for testing.

New in this version is a patch which reverts the change to limit hashcmp
to 20 bytes.  I've taken care to permit the compiler to inline as much
as possible for efficiency, but it would be helpful to see what the
performance impact of these changes is on real-life workflows, or with
MSVC and other non-GCC and non-clang compilers.  The resulting change is
uglier and more duplicative than I wanted, but oh, well.

I didn't attempt to pull in the full complement of code from libtomcrypt
to try to show the changes made, since that would have involved
importing a significant quantity of code in order to make things work.

I realize the increase to GIT_MAX_HEXSZ will result in an increase in
memory usage, but we're going to need to do it at some point, and the
sooner the code is in the codebase, the sooner people can play around
with it and test it.

This piece should be the final piece of preparatory work required for a
fully functioning SHA-256-only Git.  Additional work should be able to
come into the testsuite and codebase without needing to build on work in
any series after this one.

v1, which was RFC, can be seen at
https://public-inbox.org/git/20180829005857.980820-1-sandals@crustytoothpaste.net/.

Changes from v1:
* Add a hash_to_hex function mirroring sha1_to_hex, but for
  the_hash_algo.
* Strip commit message explanation about why we chose SHA-256.
* Rebase on master
* Strip leading whitespace from commit message.
* Improve commit-graph patch to cover new code added since v1.
* Be more honest about the scope of work involved in porting the SHA-256
  implementation out of libtomcrypt.
* Revert change to limit hashcmp to 20 bytes.

brian m. carlson (13):
  sha1-file: rename algorithm to "sha1"
  sha1-file: provide functions to look up hash algorithms
  hex: introduce functions to print arbitrary hashes
  cache: make hashcmp and hasheq work with larger hashes
  t: add basic tests for our SHA-1 implementation
  t: make the sha1 test-tool helper generic
  sha1-file: add a constant for hash block size
  t/helper: add a test helper to compute hash speed
  commit-graph: convert to using the_hash_algo
  Add a base implementation of SHA-256 support
  sha256: add an SHA-256 implementation using libgcrypt
  hash: add an SHA-256 implementation using OpenSSL
  commit-graph: specify OID version for SHA-256

 Makefile                              |  22 ++++
 cache.h                               |  51 +++++---
 commit-graph.c                        |  40 +++---
 hash.h                                |  41 +++++-
 hex.c                                 |  37 ++++--
 sha1-file.c                           |  70 +++++++++-
 sha256/block/sha256.c                 | 180 ++++++++++++++++++++++++++
 sha256/block/sha256.h                 |  26 ++++
 sha256/gcrypt.h                       |  30 +++++
 t/helper/test-hash-speed.c            |  61 +++++++++
 t/helper/{test-sha1.c => test-hash.c} |  19 +--
 t/helper/test-sha1.c                  |  52 +-------
 t/helper/test-sha256.c                |   7 +
 t/helper/test-tool.c                  |   2 +
 t/helper/test-tool.h                  |   4 +
 t/t0014-hash.sh                       |  54 ++++++++
 16 files changed, 592 insertions(+), 104 deletions(-)
 create mode 100644 sha256/block/sha256.c
 create mode 100644 sha256/block/sha256.h
 create mode 100644 sha256/gcrypt.h
 create mode 100644 t/helper/test-hash-speed.c
 copy t/helper/{test-sha1.c => test-hash.c} (66%)
 create mode 100644 t/helper/test-sha256.c
 create mode 100755 t/t0014-hash.sh

Range-diff against v1:
 1:  0a96c59452 =  1:  804ec2fd27 sha1-file: rename algorithm to "sha1"
 2:  65f9feba41 =  2:  5196e00b26 sha1-file: provide functions to look up hash algorithms
 3:  b6d960121d !  3:  5873510d0a hex: introduce functions to print arbitrary hashes
    @@ -13,9 +13,12 @@
         it.
     
         We use the variant taking the algorithm structure pointer as the
    -    internal variant, since in the future we'll want to replace sha1_to_hex
    -    with a hash_to_hex that handles the_hash_algo, and taking an algorithm
    -    pointer is the easiest way to handle all of the variants in use.
    +    internal variant, since taking an algorithm pointer is the easiest way
    +    to handle all of the variants in use.
    +
    +    Note that we maintain these functions because there are hashes which
    +    must change based on the hash algorithm in use but are not object IDs
    +    (such as pack checksums).
     
         Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
     
    @@ -47,6 +50,7 @@
     +char *oid_to_hex_r(char *out, const struct object_id *oid);
     +char *hash_to_hex_algo(const unsigned char *hash, int algo);	/* static buffer result! */
     +char *sha1_to_hex(const unsigned char *sha1);			/* same static buffer */
    ++char *hash_to_hex(const unsigned char *hash);			/* same static buffer */
     +char *oid_to_hex(const struct object_id *oid);			/* same static buffer */
      
      /*
    @@ -108,6 +112,11 @@
     +char *sha1_to_hex(const unsigned char *sha1)
     +{
     +	return hash_to_hex_algo(sha1, GIT_HASH_SHA1);
    ++}
    ++
    ++char *hash_to_hex(const unsigned char *hash)
    ++{
    ++	return hash_to_hex_algo(hash, hash_algo_by_ptr(the_hash_algo));
      }
      
      char *oid_to_hex(const struct object_id *oid)
 -:  ---------- >  4:  e22bf99c93 cache: make hashcmp and hasheq work with larger hashes
 4:  984698c42a =  5:  02cbcae270 t: add basic tests for our SHA-1 implementation
 5:  61632644bd !  6:  e9ac5f265a t: make the sha1 test-tool helper generic
    @@ -14,7 +14,7 @@
      --- a/Makefile
      +++ b/Makefile
     @@
    - TEST_BUILTINS_OBJS += test-dump-split-index.o
    + TEST_BUILTINS_OBJS += test-dump-untracked-cache.o
      TEST_BUILTINS_OBJS += test-example-decorate.o
      TEST_BUILTINS_OBJS += test-genrandom.o
     +TEST_BUILTINS_OBJS += test-hash.o
    @@ -147,7 +147,7 @@
      --- a/t/helper/test-tool.h
      +++ b/t/helper/test-tool.h
     @@
    - int cmd__wildmatch(int argc, const char **argv);
    + #endif
      int cmd__write_cache(int argc, const char **argv);
      
     +int cmd_hash_impl(int ac, const char **av, int algo);
 6:  79d5246af6 =  7:  44c9f27f9d sha1-file: add a constant for hash block size
 7:  a83ceb35a8 =  8:  59be7f1f37 t/helper: add a test helper to compute hash speed
 8:  ef3c5f3370 !  9:  09f1286769  commit-graph: convert to using the_hash_algo
    @@ -1,6 +1,6 @@
     Author: brian m. carlson <sandals@crustytoothpaste.net>
     
    -     commit-graph: convert to using the_hash_algo
    +    commit-graph: convert to using the_hash_algo
     
         Instead of using hard-coded constants for object sizes, use
         the_hash_algo to look them up.  In addition, use a function call to look
    @@ -12,6 +12,12 @@
      --- a/commit-graph.c
      +++ b/commit-graph.c
     @@
    + #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
    + #define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
    + 
    +-#define GRAPH_DATA_WIDTH 36
    ++#define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
    + 
      #define GRAPH_VERSION_1 0x1
      #define GRAPH_VERSION GRAPH_VERSION_1
      
 9:  298b15b577 ! 10:  cdec8c1b49 Add a base implementation of SHA-256 support
    @@ -3,39 +3,14 @@
         Add a base implementation of SHA-256 support
     
         SHA-1 is weak and we need to transition to a new hash function.  For
    -    some time, we have referred to this new function as NewHash.
    -
    -    The selection criteria for NewHash specify that it should (a) be 256
    -    bits in length, (b) have high quality implementations available, (c)
    -    should match Git's needs in terms of security, and (d) ideally, be fast
    -    to compute.
    -
    -    SHA-256 has a variety of high quality implementations across various
    -    libraries.  It is implemented by every cryptographic library we support
    -    and is available on every platform and in almost every programming
    -    language.  It is often highly optimized, since it is commonly used in
    -    TLS and elsewhere.  Additionally, there are various command line
    -    utilities that implement it, which is useful for educational and testing
    -    purposes.
    -
    -    SHA-256 is presently considered secure and has received a reasonable
    -    amount of cryptanalysis in the literature.  It is, admittedly, not
    -    resistant to length extension attacks, but Git object storage is immune
    -    to those due to the length field at the beginning.
    -
    -    SHA-256 is somewhat slower to compute than SHA-1 in software.  However,
    -    since our default SHA-1 implementation is collision-detecting, a
    -    reasonable cryptographic library implementation of SHA-256 will actually
    -    be faster than SHA-256.  In addition, modern ARM and AMD processors (and
    -    some Intel processors) contain instructions for implementing SHA-256 in
    -    hardware, making it the fastest possible option.
    -
    -    There are other reasons to select SHA-256.  With signed commits and
    -    tags, it's possible to use SHA-256 for signatures and therefore have to
    -    rely on only one hash algorithm for security.
    +    some time, we have referred to this new function as NewHash.  Recently,
    +    we decided to pick SHA-256 as NewHash.
     
         Add a basic implementation of SHA-256 based off libtomcrypt, which is in
    -    the public domain.  Optimize it and tidy it somewhat.
    +    the public domain.  Optimize it and restructure it to meet our coding
    +    standards.  Place it in a directory called "sha256" where it and any
    +    future implementations can live so as to avoid a proliferation of
    +    implementation directories.
     
         Wire up SHA-256 in the list of hash algorithms, and add a test that the
         algorithm works correctly.
    @@ -51,8 +26,8 @@
      +++ b/Makefile
     @@
      TEST_BUILTINS_OBJS += test-scrap-cache-tree.o
    - TEST_BUILTINS_OBJS += test-sha1-array.o
      TEST_BUILTINS_OBJS += test-sha1.o
    + TEST_BUILTINS_OBJS += test-sha1-array.o
     +TEST_BUILTINS_OBJS += test-sha256.o
      TEST_BUILTINS_OBJS += test-sigchain.o
      TEST_BUILTINS_OBJS += test-strcmp-offset.o
    @@ -459,8 +434,8 @@
      +++ b/t/helper/test-tool.c
     @@
      	{ "scrap-cache-tree", cmd__scrap_cache_tree },
    - 	{ "sha1-array", cmd__sha1_array },
      	{ "sha1", cmd__sha1 },
    + 	{ "sha1-array", cmd__sha1_array },
     +	{ "sha256", cmd__sha256 },
      	{ "sigchain", cmd__sigchain },
      	{ "strcmp-offset", cmd__strcmp_offset },
    @@ -471,8 +446,8 @@
      +++ b/t/helper/test-tool.h
     @@
      int cmd__scrap_cache_tree(int argc, const char **argv);
    - int cmd__sha1_array(int argc, const char **argv);
      int cmd__sha1(int argc, const char **argv);
    + int cmd__sha1_array(int argc, const char **argv);
     +int cmd__sha256(int argc, const char **argv);
      int cmd__sigchain(int argc, const char **argv);
      int cmd__strcmp_offset(int argc, const char **argv);
10:  c52b4be13c = 11:  d085802a89 sha256: add an SHA-256 implementation using libgcrypt
11:  225ec075d1 = 12:  4e4daf4000 hash: add an SHA-256 implementation using OpenSSL
12:  04a3ac6f29 = 13:  f206f45426 commit-graph: specify OID version for SHA-256

Comments

Junio C Hamano Oct. 16, 2018, 2 a.m. UTC | #1
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> I realize the increase to GIT_MAX_HEXSZ will result in an increase in
> memory usage, but we're going to need to do it at some point, and the
> sooner the code is in the codebase, the sooner people can play around
> with it and test it.

Thanks.
Junio C Hamano Oct. 16, 2018, 4:01 a.m. UTC | #2
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

>  t/t0014-hash.sh                       |  54 ++++++++
>  create mode 100755 t/t0014-hash.sh

If I am not mistaken, 0014 is already used by another topic in
flight, and will cause test-lint failure on 'pu'.
Duy Nguyen Oct. 16, 2018, 3:39 p.m. UTC | #3
On Mon, Oct 15, 2018 at 4:21 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> This series provides an actual SHA-256 implementation and wires it up,
> along with some housekeeping patches to make it suitable for testing.
>
> New in this version is a patch which reverts the change to limit hashcmp
> to 20 bytes.  I've taken care to permit the compiler to inline as much
> as possible for efficiency, but it would be helpful to see what the
> performance impact of these changes is on real-life workflows, or with
> MSVC and other non-GCC and non-clang compilers.  The resulting change is
> uglier and more duplicative than I wanted, but oh, well.
>
> I didn't attempt to pull in the full complement of code from libtomcrypt
> to try to show the changes made, since that would have involved
> importing a significant quantity of code in order to make things work.
>
> I realize the increase to GIT_MAX_HEXSZ will result in an increase in
> memory usage, but we're going to need to do it at some point, and the
> sooner the code is in the codebase, the sooner people can play around
> with it and test it.
>
> This piece should be the final piece of preparatory work required for a
> fully functioning SHA-256-only Git.  Additional work should be able to
> come into the testsuite and codebase without needing to build on work in
> any series after this one.

Thanks again for keeping working on this. I'm still sick and can't
really do a deep review. With that in mind, the patches look good.
brian m. carlson Oct. 16, 2018, 10:45 p.m. UTC | #4
On Tue, Oct 16, 2018 at 01:01:07PM +0900, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> >  t/t0014-hash.sh                       |  54 ++++++++
> >  create mode 100755 t/t0014-hash.sh
> 
> If I am not mistaken, 0014 is already used by another topic in
> flight, and will cause test-lint failure on 'pu'.

This series was written a while back.  I'll rename it.  Feel free to
wait for v3 before picking it up.