mbox series

[v5,00/34] crypto: crypto API library interfaces for WireGuard

Message ID 20191108122240.28479-1-ardb@kernel.org (mailing list archive)
Headers show
Series crypto: crypto API library interfaces for WireGuard | expand

Message

Ard Biesheuvel Nov. 8, 2019, 12:22 p.m. UTC
This series implements the crypto library abstractions that are needed to
incorporate WireGuard into the mainline kernel.

Changes since v4:
- Address most review feedback from Eric, with the exception of the remark
  about libraries being selectable by the user - this is something we need
  to revisit in the context of moving to weak references or static calls to
  make accelerated versions of libraries loadable at any time. (Currently,
  loading an accelerated version at runtime will not supersede calls to the
  generic routines in the kernel proper, which is counterintuitive, and this
  is currently being addressed by making the generic library versions only
  selectable as modules if the accelerated ones are selected as modules as
  well)
- Align the generic blake2s Kconfig symbols, filenames etc with the recently
  added blake2b driver.
- Rewrote the blake2s selftest for better coverage of key length and input
  length combinations, and added a HMAC selftest as well.
- Rename blake2s_hmac() to blake2s256_hmac(), and drop the digest length
  argument, which was not implemented correctly, and never deviates from
  the full length in practice anyway.
- Update to more recent version of the blake2s x86 Zinc code

Changes since v3:
- Unify the way the generic vs arch libraries are organized between ChaCha20
  and Poly1305 on the one hand and Curve25519 and Blake2s on the other.
  All are now made up of a generic library, a generic crypto API driver
  (skcipher for [X]ChaCha, shash for Poly1305 and Blake2s and kpp for
  Curve25519) and optional per-arch versions providing both the library and
  the crypto API interfaces while potentially relying on the generic *library*
  only as a fallback (and not on the generic crypto API driver). Implementations
  of the libary interface that don't require the fallback don't pull in the
  generic code at all, but the generic crypto API drivers are tied to the
  generic implementations directly (this is necessary since we fuzz test the
  accelerated implementations against the generic implementations)
- Provide testmgr test vectors for the Curve25519 and Blake2s crypto API
  drivers that were added in this revision. This also required some changes
  to the KPP test routines so we can test for failures as well.
- Update to the latest version of Andy Polyakov's Poly1305 implementation for
  MIPS that incorporates Rene's improvements for 32r2
- Remove logic in the x86 and ARM implementations of ChaCha and Poly1305 to
  prefer the non-SIMD path for short inputs. This is no longer necessary, and
  even undesirable since it forced ChaCha20Poly1305's ChaCha pass generating
  the Poly1305 nonce to always take the slower scalar path.

Changes since v2:
- Reduce the cc: audience a bit, since I assumed that not everyone is
  interested in discussing the details of this.
- Incorporate scalar ARM code for ChaCha, and the 64-bit MIPS code for
  Poly1305. NOTE: the Cryptogams MIPS code now supports 32-bit MIPS as well,
  and not just 32r2, so I omitted Rene's Poly1305 implementation for now, and
  used Andy's code for everything.
- Incorporate NEON opt-out for Cortex-A5/A7. Note that the code is still
  exposed via the crypto API, but with a low prioririty, so it is still
  available and still gets test coverage, but is not used by default.
- Use static keys (*not* static calls) in the SIMD and bmi2/adx drivers to
  keep track of which implementation is being used, to avoid the memory
  load on each call.
- Defer using weak references or static calls until the dust around this has
  settled. Instead, rely on Kconfig constraints and symbol dependencies to
  ensure that the arch code is always used when it is loaded. This means
  you can only opt out of using the arch code if you disable it in Kconfig
  but this is something I can live with for now.
- Refactor the Curve25519 glue code slightly so that the call sites branch to
  the arch or generic code directly.
- Split up the Poly1305 refactoring patches so they can be reviewed more
  easily.

Changes since RFC/v1:
- dropped the WireGuard patch itself, and the followup patches - since the
  purpose was to illustrate the extent of the required changes, there is no
  reason to keep including them.
- import the MIPS 32r2 versions of ChaCha and Poly1305, but expose both the
  crypto API and library interfaces so that not only WireGuard but also IPsec
  and Adiantum can benefit immediately. (The latter required adding support for
  the reduced round version of ChaCha to the MIPS asm code)
- fix up various minor kconfig/build issues found in randconfig testing
  (thanks Arnd!)

Patches can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=wireguard-crypto-library-api-v5

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Miller <davem@davemloft.net>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Martin Willi <martin@strongswan.org>
Cc: Rene van Dorst <opensource@vdorst.com>
Cc: David Sterba <dsterba@suse.com>

Ard Biesheuvel (27):
  crypto: tidy up lib/crypto Kconfig and Makefile
  crypto: chacha - move existing library code into lib/crypto
  crypto: x86/chacha - depend on generic chacha library instead of
    crypto driver
  crypto: x86/chacha - expose SIMD ChaCha routine as library function
  crypto: arm64/chacha - depend on generic chacha library instead of
    crypto driver
  crypto: arm64/chacha - expose arm64 ChaCha routine as library function
  crypto: arm/chacha - import Eric Biggers's scalar accelerated ChaCha
    code
  crypto: arm/chacha - remove dependency on generic ChaCha driver
  crypto: arm/chacha - expose ARM ChaCha routine as library function
  crypto: mips/chacha - wire up accelerated 32r2 code from Zinc
  crypto: chacha - unexport chacha_generic routines
  crypto: poly1305 - move core routines into a separate library
  crypto: x86/poly1305 - unify Poly1305 state struct with generic code
  crypto: poly1305 - expose init/update/final library interface
  crypto: x86/poly1305 - depend on generic library not generic shash
  crypto: x86/poly1305 - expose existing driver as poly1305 library
  crypto: arm64/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON
    implementation
  crypto: arm/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON
    implementation
  crypto: mips/poly1305 - incorporate OpenSSL/CRYPTOGAMS optimized
    implementation
  int128: move __uint128_t compiler test to Kconfig
  crypto: testmgr - add test cases for Blake2s
  crypto: blake2s - implement generic shash driver
  crypto: curve25519 - add kpp selftest
  crypto: curve25519 - implement generic KPP driver
  crypto: lib/curve25519 - work around Clang stack spilling issue
  crypto: chacha20poly1305 - import construction and selftest from Zinc
  crypto: lib/chacha20poly1305 - reimplement crypt_from_sg() routine

Jason A. Donenfeld (7):
  crypto: mips/chacha - import 32r2 ChaCha code from Zinc
  crypto: BLAKE2s - generic C library implementation and selftest
  crypto: BLAKE2s - x86_64 SIMD implementation
  crypto: Curve25519 - generic C library implementations
  crypto: Curve25519 - x86_64 library and KPP implementations
  crypto: arm - import Bernstein and Schwabe's Curve25519 ARM
    implementation
  crypto: arm/Curve25519 - wire up NEON implementation

 arch/arm/crypto/Kconfig                   |   16 +-
 arch/arm/crypto/Makefile                  |   17 +-
 arch/arm/crypto/chacha-glue.c             |  343 +
 arch/arm/crypto/chacha-neon-glue.c        |  202 -
 arch/arm/crypto/chacha-scalar-core.S      |  460 ++
 arch/arm/crypto/curve25519-core.S         | 2062 ++++++
 arch/arm/crypto/curve25519-glue.c         |  127 +
 arch/arm/crypto/poly1305-armv4.pl         | 1236 ++++
 arch/arm/crypto/poly1305-core.S_shipped   | 1158 +++
 arch/arm/crypto/poly1305-glue.c           |  276 +
 arch/arm64/Kconfig                        |    2 +-
 arch/arm64/crypto/Kconfig                 |    9 +-
 arch/arm64/crypto/Makefile                |   10 +-
 arch/arm64/crypto/chacha-neon-glue.c      |   81 +-
 arch/arm64/crypto/poly1305-armv8.pl       |  913 +++
 arch/arm64/crypto/poly1305-core.S_shipped |  835 +++
 arch/arm64/crypto/poly1305-glue.c         |  237 +
 arch/mips/Makefile                        |    2 +-
 arch/mips/crypto/Makefile                 |   18 +
 arch/mips/crypto/chacha-core.S            |  497 ++
 arch/mips/crypto/chacha-glue.c            |  150 +
 arch/mips/crypto/poly1305-glue.c          |  203 +
 arch/mips/crypto/poly1305-mips.pl         | 1273 ++++
 arch/riscv/Kconfig                        |    2 +-
 arch/x86/Kconfig                          |    2 +-
 arch/x86/crypto/Makefile                  |    3 +
 arch/x86/crypto/blake2s-core.S            |  258 +
 arch/x86/crypto/blake2s-glue.c            |  233 +
 arch/x86/crypto/chacha_glue.c             |  181 +-
 arch/x86/crypto/curve25519-x86_64.c       | 2475 +++++++
 arch/x86/crypto/poly1305_glue.c           |  199 +-
 crypto/Kconfig                            |   71 +-
 crypto/Makefile                           |    2 +
 crypto/adiantum.c                         |    5 +-
 crypto/blake2s_generic.c                  |  171 +
 crypto/chacha_generic.c                   |   84 +-
 crypto/curve25519-generic.c               |   90 +
 crypto/ecc.c                              |    2 +-
 crypto/nhpoly1305.c                       |    3 +-
 crypto/poly1305_generic.c                 |  228 +-
 crypto/testmgr.c                          |   30 +
 crypto/testmgr.h                          | 1520 +++-
 include/crypto/blake2s.h                  |  106 +
 include/crypto/chacha.h                   |   83 +-
 include/crypto/chacha20poly1305.h         |   48 +
 include/crypto/curve25519.h               |   71 +
 include/crypto/internal/blake2s.h         |   24 +
 include/crypto/internal/chacha.h          |   43 +
 include/crypto/internal/poly1305.h        |   58 +
 include/crypto/poly1305.h                 |   69 +-
 init/Kconfig                              |    4 +
 lib/Makefile                              |    3 +-
 lib/crypto/Kconfig                        |  130 +
 lib/crypto/Makefile                       |   42 +-
 lib/crypto/blake2s-generic.c              |  111 +
 lib/crypto/blake2s-selftest.c             |  622 ++
 lib/crypto/blake2s.c                      |  126 +
 lib/{ => crypto}/chacha.c                 |   20 +-
 lib/crypto/chacha20poly1305-selftest.c    | 7393 ++++++++++++++++++++
 lib/crypto/chacha20poly1305.c             |  369 +
 lib/crypto/curve25519-fiat32.c            |  864 +++
 lib/crypto/curve25519-hacl64.c            |  788 +++
 lib/crypto/curve25519.c                   |   25 +
 lib/crypto/libchacha.c                    |   35 +
 lib/crypto/poly1305.c                     |  232 +
 lib/ubsan.c                               |    2 +-
 lib/ubsan.h                               |    2 +-
 67 files changed, 26148 insertions(+), 808 deletions(-)
 create mode 100644 arch/arm/crypto/chacha-glue.c
 delete mode 100644 arch/arm/crypto/chacha-neon-glue.c
 create mode 100644 arch/arm/crypto/chacha-scalar-core.S
 create mode 100644 arch/arm/crypto/curve25519-core.S
 create mode 100644 arch/arm/crypto/curve25519-glue.c
 create mode 100644 arch/arm/crypto/poly1305-armv4.pl
 create mode 100644 arch/arm/crypto/poly1305-core.S_shipped
 create mode 100644 arch/arm/crypto/poly1305-glue.c
 create mode 100644 arch/arm64/crypto/poly1305-armv8.pl
 create mode 100644 arch/arm64/crypto/poly1305-core.S_shipped
 create mode 100644 arch/arm64/crypto/poly1305-glue.c
 create mode 100644 arch/mips/crypto/chacha-core.S
 create mode 100644 arch/mips/crypto/chacha-glue.c
 create mode 100644 arch/mips/crypto/poly1305-glue.c
 create mode 100644 arch/mips/crypto/poly1305-mips.pl
 create mode 100644 arch/x86/crypto/blake2s-core.S
 create mode 100644 arch/x86/crypto/blake2s-glue.c
 create mode 100644 arch/x86/crypto/curve25519-x86_64.c
 create mode 100644 crypto/blake2s_generic.c
 create mode 100644 crypto/curve25519-generic.c
 create mode 100644 include/crypto/blake2s.h
 create mode 100644 include/crypto/chacha20poly1305.h
 create mode 100644 include/crypto/curve25519.h
 create mode 100644 include/crypto/internal/blake2s.h
 create mode 100644 include/crypto/internal/chacha.h
 create mode 100644 include/crypto/internal/poly1305.h
 create mode 100644 lib/crypto/Kconfig
 create mode 100644 lib/crypto/blake2s-generic.c
 create mode 100644 lib/crypto/blake2s-selftest.c
 create mode 100644 lib/crypto/blake2s.c
 rename lib/{ => crypto}/chacha.c (88%)
 create mode 100644 lib/crypto/chacha20poly1305-selftest.c
 create mode 100644 lib/crypto/chacha20poly1305.c
 create mode 100644 lib/crypto/curve25519-fiat32.c
 create mode 100644 lib/crypto/curve25519-hacl64.c
 create mode 100644 lib/crypto/curve25519.c
 create mode 100644 lib/crypto/libchacha.c
 create mode 100644 lib/crypto/poly1305.c

Comments

Herbert Xu Nov. 15, 2019, 6:07 a.m. UTC | #1
On Fri, Nov 08, 2019 at 01:22:06PM +0100, Ard Biesheuvel wrote:
> This series implements the crypto library abstractions that are needed to
> incorporate WireGuard into the mainline kernel.
> 
> Changes since v4:
> - Address most review feedback from Eric, with the exception of the remark
>   about libraries being selectable by the user - this is something we need
>   to revisit in the context of moving to weak references or static calls to
>   make accelerated versions of libraries loadable at any time. (Currently,
>   loading an accelerated version at runtime will not supersede calls to the
>   generic routines in the kernel proper, which is counterintuitive, and this
>   is currently being addressed by making the generic library versions only
>   selectable as modules if the accelerated ones are selected as modules as
>   well)
> - Align the generic blake2s Kconfig symbols, filenames etc with the recently
>   added blake2b driver.
> - Rewrote the blake2s selftest for better coverage of key length and input
>   length combinations, and added a HMAC selftest as well.
> - Rename blake2s_hmac() to blake2s256_hmac(), and drop the digest length
>   argument, which was not implemented correctly, and never deviates from
>   the full length in practice anyway.
> - Update to more recent version of the blake2s x86 Zinc code
> 
> Changes since v3:
> - Unify the way the generic vs arch libraries are organized between ChaCha20
>   and Poly1305 on the one hand and Curve25519 and Blake2s on the other.
>   All are now made up of a generic library, a generic crypto API driver
>   (skcipher for [X]ChaCha, shash for Poly1305 and Blake2s and kpp for
>   Curve25519) and optional per-arch versions providing both the library and
>   the crypto API interfaces while potentially relying on the generic *library*
>   only as a fallback (and not on the generic crypto API driver). Implementations
>   of the libary interface that don't require the fallback don't pull in the
>   generic code at all, but the generic crypto API drivers are tied to the
>   generic implementations directly (this is necessary since we fuzz test the
>   accelerated implementations against the generic implementations)
> - Provide testmgr test vectors for the Curve25519 and Blake2s crypto API
>   drivers that were added in this revision. This also required some changes
>   to the KPP test routines so we can test for failures as well.
> - Update to the latest version of Andy Polyakov's Poly1305 implementation for
>   MIPS that incorporates Rene's improvements for 32r2
> - Remove logic in the x86 and ARM implementations of ChaCha and Poly1305 to
>   prefer the non-SIMD path for short inputs. This is no longer necessary, and
>   even undesirable since it forced ChaCha20Poly1305's ChaCha pass generating
>   the Poly1305 nonce to always take the slower scalar path.
> 
> Changes since v2:
> - Reduce the cc: audience a bit, since I assumed that not everyone is
>   interested in discussing the details of this.
> - Incorporate scalar ARM code for ChaCha, and the 64-bit MIPS code for
>   Poly1305. NOTE: the Cryptogams MIPS code now supports 32-bit MIPS as well,
>   and not just 32r2, so I omitted Rene's Poly1305 implementation for now, and
>   used Andy's code for everything.
> - Incorporate NEON opt-out for Cortex-A5/A7. Note that the code is still
>   exposed via the crypto API, but with a low prioririty, so it is still
>   available and still gets test coverage, but is not used by default.
> - Use static keys (*not* static calls) in the SIMD and bmi2/adx drivers to
>   keep track of which implementation is being used, to avoid the memory
>   load on each call.
> - Defer using weak references or static calls until the dust around this has
>   settled. Instead, rely on Kconfig constraints and symbol dependencies to
>   ensure that the arch code is always used when it is loaded. This means
>   you can only opt out of using the arch code if you disable it in Kconfig
>   but this is something I can live with for now.
> - Refactor the Curve25519 glue code slightly so that the call sites branch to
>   the arch or generic code directly.
> - Split up the Poly1305 refactoring patches so they can be reviewed more
>   easily.
> 
> Changes since RFC/v1:
> - dropped the WireGuard patch itself, and the followup patches - since the
>   purpose was to illustrate the extent of the required changes, there is no
>   reason to keep including them.
> - import the MIPS 32r2 versions of ChaCha and Poly1305, but expose both the
>   crypto API and library interfaces so that not only WireGuard but also IPsec
>   and Adiantum can benefit immediately. (The latter required adding support for
>   the reduced round version of ChaCha to the MIPS asm code)
> - fix up various minor kconfig/build issues found in randconfig testing
>   (thanks Arnd!)
> 
> Patches can be found here:
> https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=wireguard-crypto-library-api-v5
> 
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: David Miller <davem@davemloft.net>
> Cc: Jason A. Donenfeld <Jason@zx2c4.com>
> Cc: Samuel Neves <sneves@dei.uc.pt>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Eric Biggers <ebiggers@google.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Martin Willi <martin@strongswan.org>
> Cc: Rene van Dorst <opensource@vdorst.com>
> Cc: David Sterba <dsterba@suse.com>
> 
> Ard Biesheuvel (27):
>   crypto: tidy up lib/crypto Kconfig and Makefile
>   crypto: chacha - move existing library code into lib/crypto
>   crypto: x86/chacha - depend on generic chacha library instead of
>     crypto driver
>   crypto: x86/chacha - expose SIMD ChaCha routine as library function
>   crypto: arm64/chacha - depend on generic chacha library instead of
>     crypto driver
>   crypto: arm64/chacha - expose arm64 ChaCha routine as library function
>   crypto: arm/chacha - import Eric Biggers's scalar accelerated ChaCha
>     code
>   crypto: arm/chacha - remove dependency on generic ChaCha driver
>   crypto: arm/chacha - expose ARM ChaCha routine as library function
>   crypto: mips/chacha - wire up accelerated 32r2 code from Zinc
>   crypto: chacha - unexport chacha_generic routines
>   crypto: poly1305 - move core routines into a separate library
>   crypto: x86/poly1305 - unify Poly1305 state struct with generic code
>   crypto: poly1305 - expose init/update/final library interface
>   crypto: x86/poly1305 - depend on generic library not generic shash
>   crypto: x86/poly1305 - expose existing driver as poly1305 library
>   crypto: arm64/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON
>     implementation
>   crypto: arm/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON
>     implementation
>   crypto: mips/poly1305 - incorporate OpenSSL/CRYPTOGAMS optimized
>     implementation
>   int128: move __uint128_t compiler test to Kconfig
>   crypto: testmgr - add test cases for Blake2s
>   crypto: blake2s - implement generic shash driver
>   crypto: curve25519 - add kpp selftest
>   crypto: curve25519 - implement generic KPP driver
>   crypto: lib/curve25519 - work around Clang stack spilling issue
>   crypto: chacha20poly1305 - import construction and selftest from Zinc
>   crypto: lib/chacha20poly1305 - reimplement crypt_from_sg() routine
> 
> Jason A. Donenfeld (7):
>   crypto: mips/chacha - import 32r2 ChaCha code from Zinc
>   crypto: BLAKE2s - generic C library implementation and selftest
>   crypto: BLAKE2s - x86_64 SIMD implementation
>   crypto: Curve25519 - generic C library implementations
>   crypto: Curve25519 - x86_64 library and KPP implementations
>   crypto: arm - import Bernstein and Schwabe's Curve25519 ARM
>     implementation
>   crypto: arm/Curve25519 - wire up NEON implementation
> 
>  arch/arm/crypto/Kconfig                   |   16 +-
>  arch/arm/crypto/Makefile                  |   17 +-
>  arch/arm/crypto/chacha-glue.c             |  343 +
>  arch/arm/crypto/chacha-neon-glue.c        |  202 -
>  arch/arm/crypto/chacha-scalar-core.S      |  460 ++
>  arch/arm/crypto/curve25519-core.S         | 2062 ++++++
>  arch/arm/crypto/curve25519-glue.c         |  127 +
>  arch/arm/crypto/poly1305-armv4.pl         | 1236 ++++
>  arch/arm/crypto/poly1305-core.S_shipped   | 1158 +++
>  arch/arm/crypto/poly1305-glue.c           |  276 +
>  arch/arm64/Kconfig                        |    2 +-
>  arch/arm64/crypto/Kconfig                 |    9 +-
>  arch/arm64/crypto/Makefile                |   10 +-
>  arch/arm64/crypto/chacha-neon-glue.c      |   81 +-
>  arch/arm64/crypto/poly1305-armv8.pl       |  913 +++
>  arch/arm64/crypto/poly1305-core.S_shipped |  835 +++
>  arch/arm64/crypto/poly1305-glue.c         |  237 +
>  arch/mips/Makefile                        |    2 +-
>  arch/mips/crypto/Makefile                 |   18 +
>  arch/mips/crypto/chacha-core.S            |  497 ++
>  arch/mips/crypto/chacha-glue.c            |  150 +
>  arch/mips/crypto/poly1305-glue.c          |  203 +
>  arch/mips/crypto/poly1305-mips.pl         | 1273 ++++
>  arch/riscv/Kconfig                        |    2 +-
>  arch/x86/Kconfig                          |    2 +-
>  arch/x86/crypto/Makefile                  |    3 +
>  arch/x86/crypto/blake2s-core.S            |  258 +
>  arch/x86/crypto/blake2s-glue.c            |  233 +
>  arch/x86/crypto/chacha_glue.c             |  181 +-
>  arch/x86/crypto/curve25519-x86_64.c       | 2475 +++++++
>  arch/x86/crypto/poly1305_glue.c           |  199 +-
>  crypto/Kconfig                            |   71 +-
>  crypto/Makefile                           |    2 +
>  crypto/adiantum.c                         |    5 +-
>  crypto/blake2s_generic.c                  |  171 +
>  crypto/chacha_generic.c                   |   84 +-
>  crypto/curve25519-generic.c               |   90 +
>  crypto/ecc.c                              |    2 +-
>  crypto/nhpoly1305.c                       |    3 +-
>  crypto/poly1305_generic.c                 |  228 +-
>  crypto/testmgr.c                          |   30 +
>  crypto/testmgr.h                          | 1520 +++-
>  include/crypto/blake2s.h                  |  106 +
>  include/crypto/chacha.h                   |   83 +-
>  include/crypto/chacha20poly1305.h         |   48 +
>  include/crypto/curve25519.h               |   71 +
>  include/crypto/internal/blake2s.h         |   24 +
>  include/crypto/internal/chacha.h          |   43 +
>  include/crypto/internal/poly1305.h        |   58 +
>  include/crypto/poly1305.h                 |   69 +-
>  init/Kconfig                              |    4 +
>  lib/Makefile                              |    3 +-
>  lib/crypto/Kconfig                        |  130 +
>  lib/crypto/Makefile                       |   42 +-
>  lib/crypto/blake2s-generic.c              |  111 +
>  lib/crypto/blake2s-selftest.c             |  622 ++
>  lib/crypto/blake2s.c                      |  126 +
>  lib/{ => crypto}/chacha.c                 |   20 +-
>  lib/crypto/chacha20poly1305-selftest.c    | 7393 ++++++++++++++++++++
>  lib/crypto/chacha20poly1305.c             |  369 +
>  lib/crypto/curve25519-fiat32.c            |  864 +++
>  lib/crypto/curve25519-hacl64.c            |  788 +++
>  lib/crypto/curve25519.c                   |   25 +
>  lib/crypto/libchacha.c                    |   35 +
>  lib/crypto/poly1305.c                     |  232 +
>  lib/ubsan.c                               |    2 +-
>  lib/ubsan.h                               |    2 +-
>  67 files changed, 26148 insertions(+), 808 deletions(-)
>  create mode 100644 arch/arm/crypto/chacha-glue.c
>  delete mode 100644 arch/arm/crypto/chacha-neon-glue.c
>  create mode 100644 arch/arm/crypto/chacha-scalar-core.S
>  create mode 100644 arch/arm/crypto/curve25519-core.S
>  create mode 100644 arch/arm/crypto/curve25519-glue.c
>  create mode 100644 arch/arm/crypto/poly1305-armv4.pl
>  create mode 100644 arch/arm/crypto/poly1305-core.S_shipped
>  create mode 100644 arch/arm/crypto/poly1305-glue.c
>  create mode 100644 arch/arm64/crypto/poly1305-armv8.pl
>  create mode 100644 arch/arm64/crypto/poly1305-core.S_shipped
>  create mode 100644 arch/arm64/crypto/poly1305-glue.c
>  create mode 100644 arch/mips/crypto/chacha-core.S
>  create mode 100644 arch/mips/crypto/chacha-glue.c
>  create mode 100644 arch/mips/crypto/poly1305-glue.c
>  create mode 100644 arch/mips/crypto/poly1305-mips.pl
>  create mode 100644 arch/x86/crypto/blake2s-core.S
>  create mode 100644 arch/x86/crypto/blake2s-glue.c
>  create mode 100644 arch/x86/crypto/curve25519-x86_64.c
>  create mode 100644 crypto/blake2s_generic.c
>  create mode 100644 crypto/curve25519-generic.c
>  create mode 100644 include/crypto/blake2s.h
>  create mode 100644 include/crypto/chacha20poly1305.h
>  create mode 100644 include/crypto/curve25519.h
>  create mode 100644 include/crypto/internal/blake2s.h
>  create mode 100644 include/crypto/internal/chacha.h
>  create mode 100644 include/crypto/internal/poly1305.h
>  create mode 100644 lib/crypto/Kconfig
>  create mode 100644 lib/crypto/blake2s-generic.c
>  create mode 100644 lib/crypto/blake2s-selftest.c
>  create mode 100644 lib/crypto/blake2s.c
>  rename lib/{ => crypto}/chacha.c (88%)
>  create mode 100644 lib/crypto/chacha20poly1305-selftest.c
>  create mode 100644 lib/crypto/chacha20poly1305.c
>  create mode 100644 lib/crypto/curve25519-fiat32.c
>  create mode 100644 lib/crypto/curve25519-hacl64.c
>  create mode 100644 lib/crypto/curve25519.c
>  create mode 100644 lib/crypto/libchacha.c
>  create mode 100644 lib/crypto/poly1305.c

All applied.  Thanks.
Herbert Xu Nov. 15, 2019, 9:09 a.m. UTC | #2
On Fri, Nov 15, 2019 at 09:56:17AM +0100, Jason A. Donenfeld wrote:
> Hey Herbert,
> 
> I've been traveling and got back home yesterday. I was going to review
> these this afternoon if there's still opportunity.

Hi Jason:

Last I checked there weren't any major objections to this patch-set.
If there are any remaining issues we can resolve them with new
patches on top of this series.

Thanks,
Jason A. Donenfeld Nov. 19, 2019, 3:18 p.m. UTC | #3
Hey Ard, Herbert, Dave,

The series looks fine. Ard -- thanks so much for picking up the work
and making this happen. As far as I'm concerned, this is "most" of
Zinc, simply without calling it "Zinc", and minus a few other things
that I think constitutes an okay compromise and good base for moving
forward.

Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>

The TODO list for me remains the same, and now I can get moving with that:

- Zinc's generic C implementation of poly1305, which is faster and has
separate implementations for u64 and u128.
- x86_64 ChaCha20 from Zinc. Will be fun to discuss with Martin and Andy.
- x86_64 Poly1305 from Zinc.
- Resurrecting the big_keys patch and receiving DavidH's review on that.
- WireGuard! Hurrah!

If you have any feedback on how you'd like this prioritized, please
pipe up. For example Dave - would you like WireGuard *now* or sometime
later? I can probably get that cooking this week, though I do have
some testing and fuzzing of it to do on top of the patches that just
landed in cryptodev.

Jason
Ard Biesheuvel Nov. 19, 2019, 3:34 p.m. UTC | #4
Hey Jason,

On Tue, 19 Nov 2019 at 16:18, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hey Ard, Herbert, Dave,
>
> The series looks fine. Ard -- thanks so much for picking up the work
> and making this happen. As far as I'm concerned, this is "most" of
> Zinc, simply without calling it "Zinc", and minus a few other things
> that I think constitutes an okay compromise and good base for moving
> forward.
>
> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
>

Thanks!

> The TODO list for me remains the same, and now I can get moving with that:
>
> - Zinc's generic C implementation of poly1305, which is faster and has
> separate implementations for u64 and u128.
> - x86_64 ChaCha20 from Zinc. Will be fun to discuss with Martin and Andy.
> - x86_64 Poly1305 from Zinc.

As I pointed out in the private discussions we had, there are two
aspects two AndyP's benchmarking that don't carry over 100% to the
Linux kernel:
- Every microarchitecture is given equal weight, regardless of the
likelihood that the code will actually run on it. This makes some
sense for OpenSSL, I guess, but not for the kernel.
- Benchmarks are typically based on the performance of the core
cryptographics transformation rather than a representative workload.
This is especially relevant for network use cases, where packet sizes
are not necessarily fixed and usually not a multiple of the block size
(as opposed to disk encryption, where every single call is the same
size and a power of 2)

So for future changes, could we please include performance numbers
based on realistic workloads?

> - Resurrecting the big_keys patch and receiving DavidH's review on that.

My concern here (but we can discuss it in the context of a repost) is
that this will pull the accelerated chacha20 and poly1305 code (if
enabled)  into the core kernel, given that big_keys is not a tristate
option. So perhaps we can park this one until we know how to proceed
with static calls or alternative approaches?

> - WireGuard! Hurrah!
>

I'm a bit surprised that this only appears at the end of your list :-)

> If you have any feedback on how you'd like this prioritized, please
> pipe up. For example Dave - would you like WireGuard *now* or sometime
> later? I can probably get that cooking this week, though I do have
> some testing and fuzzing of it to do on top of the patches that just
> landed in cryptodev.
>

We're at -rc8, and wireguard itself will not go via the crypto tree so
you should wait until after the merge window, rebase it onto -rc1 and
repost it.
Jason A. Donenfeld Nov. 19, 2019, 3:44 p.m. UTC | #5
On Tue, Nov 19, 2019 at 4:34 PM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> > - Zinc's generic C implementation of poly1305, which is faster and has
> > separate implementations for u64 and u128.

I assume your AndyP comment below didn't apply to this top item here.
This one should be fairly uncontroversial in your opinion, right?

> > - x86_64 ChaCha20 from Zinc. Will be fun to discuss with Martin and Andy.
> > - x86_64 Poly1305 from Zinc.
>
> As I pointed out in the private discussions we had, there are two
> aspects two AndyP's benchmarking that don't carry over 100% to the
> Linux kernel:
> - Every microarchitecture is given equal weight, regardless of the
> likelihood that the code will actually run on it. This makes some
> sense for OpenSSL, I guess, but not for the kernel.
> - Benchmarks are typically based on the performance of the core
> cryptographics transformation rather than a representative workload.
> This is especially relevant for network use cases, where packet sizes
> are not necessarily fixed and usually not a multiple of the block size
> (as opposed to disk encryption, where every single call is the same
> size and a power of 2)
>
> So for future changes, could we please include performance numbers
> based on realistic workloads?

Yea I share your concerns here. From preliminary results, I think the
Poly1305 code will be globally better, and I don't think we'll need an
abundance of discussion about it.

The ChaCha case is more interesting. I'll submit this with lots of
packet-sized microbenchmarks, as well as on-the-wire WireGuard
testing. Eric - I'm guessing you don't care too much about Adamantium
performance on x86 where people are probably better off with AES-XTS,
right? Are there other specific real world cases we care about? IPsec
is another one, but those concerns, packet-size wise, are more or less
the same as for WireGuard. But anyway, we can cross this bridge when
we come to it.


> > - WireGuard! Hurrah!
> >
> I'm a bit surprised that this only appears at the end of your list :-)

Haha, "last but not least" :)

>
> > If you have any feedback on how you'd like this prioritized, please
> > pipe up. For example Dave - would you like WireGuard *now* or sometime
> > later? I can probably get that cooking this week, though I do have
> > some testing and fuzzing of it to do on top of the patches that just
> > landed in cryptodev.
> >
>
> We're at -rc8, and wireguard itself will not go via the crypto tree so
> you should wait until after the merge window, rebase it onto -rc1 and
> repost it.

Thanks, yea, that makes sense. Netdev also has its own merge window
schedule that I should aim to meet. I guess, based on this if I'm
understanding correctly, we're looking at WireGuard for 5.5?

Jason
Ard Biesheuvel Nov. 19, 2019, 3:59 p.m. UTC | #6
On Tue, 19 Nov 2019 at 16:44, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> On Tue, Nov 19, 2019 at 4:34 PM Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
> > > - Zinc's generic C implementation of poly1305, which is faster and has
> > > separate implementations for u64 and u128.
>
> I assume your AndyP comment below didn't apply to this top item here.
> This one should be fairly uncontroversial in your opinion, right?
>

Yes.

> > > - x86_64 ChaCha20 from Zinc. Will be fun to discuss with Martin and Andy.
> > > - x86_64 Poly1305 from Zinc.
> >
> > As I pointed out in the private discussions we had, there are two
> > aspects two AndyP's benchmarking that don't carry over 100% to the
> > Linux kernel:
> > - Every microarchitecture is given equal weight, regardless of the
> > likelihood that the code will actually run on it. This makes some
> > sense for OpenSSL, I guess, but not for the kernel.
> > - Benchmarks are typically based on the performance of the core
> > cryptographics transformation rather than a representative workload.
> > This is especially relevant for network use cases, where packet sizes
> > are not necessarily fixed and usually not a multiple of the block size
> > (as opposed to disk encryption, where every single call is the same
> > size and a power of 2)
> >
> > So for future changes, could we please include performance numbers
> > based on realistic workloads?
>
> Yea I share your concerns here. From preliminary results, I think the
> Poly1305 code will be globally better, and I don't think we'll need an
> abundance of discussion about it.
>

Good.

> The ChaCha case is more interesting. I'll submit this with lots of
> packet-sized microbenchmarks, as well as on-the-wire WireGuard
> testing. Eric - I'm guessing you don't care too much about Adamantium
> performance on x86 where people are probably better off with AES-XTS,
> right? Are there other specific real world cases we care about? IPsec
> is another one, but those concerns, packet-size wise, are more or less
> the same as for WireGuard. But anyway, we can cross this bridge when
> we come to it.
>

As long as we can show that WireGuard really benefits without
regressing other users disproportionately, I think we're fine. There
are not /that/ many users to begin with ...

>
> > > - WireGuard! Hurrah!
> > >
> > I'm a bit surprised that this only appears at the end of your list :-)
>
> Haha, "last but not least" :)
>
> >
> > > If you have any feedback on how you'd like this prioritized, please
> > > pipe up. For example Dave - would you like WireGuard *now* or sometime
> > > later? I can probably get that cooking this week, though I do have
> > > some testing and fuzzing of it to do on top of the patches that just
> > > landed in cryptodev.
> > >
> >
> > We're at -rc8, and wireguard itself will not go via the crypto tree so
> > you should wait until after the merge window, rebase it onto -rc1 and
> > repost it.
>
> Thanks, yea, that makes sense. Netdev also has its own merge window
> schedule that I should aim to meet. I guess, based on this if I'm
> understanding correctly, we're looking at WireGuard for 5.5?
>

No, v5.6. The merge window for v5.5 will open this Sunday, so you'll
need to rebase on v5.5-rc1 once it is released, which will [hopefully]
have all the crypto pieces you need.

Note that this applies equally to the other changes: we can queue
performance tweaks in the crypto tree in parallel, and they will all
hit v5.6 at the same time.
Eric Biggers Nov. 19, 2019, 4:23 p.m. UTC | #7
On Tue, Nov 19, 2019 at 04:44:11PM +0100, Jason A. Donenfeld wrote:
> >
> > So for future changes, could we please include performance numbers
> > based on realistic workloads?
> 
> Yea I share your concerns here. From preliminary results, I think the
> Poly1305 code will be globally better, and I don't think we'll need an
> abundance of discussion about it.
> 
> The ChaCha case is more interesting. I'll submit this with lots of
> packet-sized microbenchmarks, as well as on-the-wire WireGuard
> testing. Eric - I'm guessing you don't care too much about Adamantium
> performance on x86 where people are probably better off with AES-XTS,
> right? Are there other specific real world cases we care about? IPsec
> is another one, but those concerns, packet-size wise, are more or less
> the same as for WireGuard. But anyway, we can cross this bridge when
> we come to it.

I'd like for Adiantum to continue to be accelerated on x86, but it doesn't have
to squeeze out all performance possible on x86, given that hardware AES support
is available there so most people will use that instead.  So if e.g. the ChaCha
implementation is still AVX2 accelerated, but it's primarily optimized for
networking packets rather than disk encryption, that would probably be fine.

- Eric
Jordan Glover Nov. 19, 2019, 9:43 p.m. UTC | #8
On Tuesday, November 19, 2019 4:23 PM, Eric Biggers <ebiggers@kernel.org> wrote:

> On Tue, Nov 19, 2019 at 04:44:11PM +0100, Jason A. Donenfeld wrote:
>
> > > So for future changes, could we please include performance numbers
> > > based on realistic workloads?
> >
> > Yea I share your concerns here. From preliminary results, I think the
> > Poly1305 code will be globally better, and I don't think we'll need an
> > abundance of discussion about it.
> > The ChaCha case is more interesting. I'll submit this with lots of
> > packet-sized microbenchmarks, as well as on-the-wire WireGuard
> > testing. Eric - I'm guessing you don't care too much about Adamantium
> > performance on x86 where people are probably better off with AES-XTS,
> > right? Are there other specific real world cases we care about? IPsec
> > is another one, but those concerns, packet-size wise, are more or less
> > the same as for WireGuard. But anyway, we can cross this bridge when
> > we come to it.
>
> I'd like for Adiantum to continue to be accelerated on x86, but it doesn't have
> to squeeze out all performance possible on x86, given that hardware AES support
> is available there so most people will use that instead. So if e.g. the ChaCha
> implementation is still AVX2 accelerated, but it's primarily optimized for
> networking packets rather than disk encryption, that would probably be fine.
>
> -   Eric

I'm interested in using Adamantium on x86 and I hope that you folks won't cripple it :(

Jordan