Message ID | 20191108122240.28479-1-ardb@kernel.org (mailing list archive) |
---|---|
Headers | show |
Series | crypto: crypto API library interfaces for WireGuard | expand |
On Fri, Nov 08, 2019 at 01:22:06PM +0100, Ard Biesheuvel wrote: > This series implements the crypto library abstractions that are needed to > incorporate WireGuard into the mainline kernel. > > Changes since v4: > - Address most review feedback from Eric, with the exception of the remark > about libraries being selectable by the user - this is something we need > to revisit in the context of moving to weak references or static calls to > make accelerated versions of libraries loadable at any time. (Currently, > loading an accelerated version at runtime will not supersede calls to the > generic routines in the kernel proper, which is counterintuitive, and this > is currently being addressed by making the generic library versions only > selectable as modules if the accelerated ones are selected as modules as > well) > - Align the generic blake2s Kconfig symbols, filenames etc with the recently > added blake2b driver. > - Rewrote the blake2s selftest for better coverage of key length and input > length combinations, and added a HMAC selftest as well. > - Rename blake2s_hmac() to blake2s256_hmac(), and drop the digest length > argument, which was not implemented correctly, and never deviates from > the full length in practice anyway. > - Update to more recent version of the blake2s x86 Zinc code > > Changes since v3: > - Unify the way the generic vs arch libraries are organized between ChaCha20 > and Poly1305 on the one hand and Curve25519 and Blake2s on the other. > All are now made up of a generic library, a generic crypto API driver > (skcipher for [X]ChaCha, shash for Poly1305 and Blake2s and kpp for > Curve25519) and optional per-arch versions providing both the library and > the crypto API interfaces while potentially relying on the generic *library* > only as a fallback (and not on the generic crypto API driver). Implementations > of the libary interface that don't require the fallback don't pull in the > generic code at all, but the generic crypto API drivers are tied to the > generic implementations directly (this is necessary since we fuzz test the > accelerated implementations against the generic implementations) > - Provide testmgr test vectors for the Curve25519 and Blake2s crypto API > drivers that were added in this revision. This also required some changes > to the KPP test routines so we can test for failures as well. > - Update to the latest version of Andy Polyakov's Poly1305 implementation for > MIPS that incorporates Rene's improvements for 32r2 > - Remove logic in the x86 and ARM implementations of ChaCha and Poly1305 to > prefer the non-SIMD path for short inputs. This is no longer necessary, and > even undesirable since it forced ChaCha20Poly1305's ChaCha pass generating > the Poly1305 nonce to always take the slower scalar path. > > Changes since v2: > - Reduce the cc: audience a bit, since I assumed that not everyone is > interested in discussing the details of this. > - Incorporate scalar ARM code for ChaCha, and the 64-bit MIPS code for > Poly1305. NOTE: the Cryptogams MIPS code now supports 32-bit MIPS as well, > and not just 32r2, so I omitted Rene's Poly1305 implementation for now, and > used Andy's code for everything. > - Incorporate NEON opt-out for Cortex-A5/A7. Note that the code is still > exposed via the crypto API, but with a low prioririty, so it is still > available and still gets test coverage, but is not used by default. > - Use static keys (*not* static calls) in the SIMD and bmi2/adx drivers to > keep track of which implementation is being used, to avoid the memory > load on each call. > - Defer using weak references or static calls until the dust around this has > settled. Instead, rely on Kconfig constraints and symbol dependencies to > ensure that the arch code is always used when it is loaded. This means > you can only opt out of using the arch code if you disable it in Kconfig > but this is something I can live with for now. > - Refactor the Curve25519 glue code slightly so that the call sites branch to > the arch or generic code directly. > - Split up the Poly1305 refactoring patches so they can be reviewed more > easily. > > Changes since RFC/v1: > - dropped the WireGuard patch itself, and the followup patches - since the > purpose was to illustrate the extent of the required changes, there is no > reason to keep including them. > - import the MIPS 32r2 versions of ChaCha and Poly1305, but expose both the > crypto API and library interfaces so that not only WireGuard but also IPsec > and Adiantum can benefit immediately. (The latter required adding support for > the reduced round version of ChaCha to the MIPS asm code) > - fix up various minor kconfig/build issues found in randconfig testing > (thanks Arnd!) > > Patches can be found here: > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=wireguard-crypto-library-api-v5 > > Cc: Herbert Xu <herbert@gondor.apana.org.au> > Cc: David Miller <davem@davemloft.net> > Cc: Jason A. Donenfeld <Jason@zx2c4.com> > Cc: Samuel Neves <sneves@dei.uc.pt> > Cc: Arnd Bergmann <arnd@arndb.de> > Cc: Eric Biggers <ebiggers@google.com> > Cc: Andy Lutomirski <luto@kernel.org> > Cc: Martin Willi <martin@strongswan.org> > Cc: Rene van Dorst <opensource@vdorst.com> > Cc: David Sterba <dsterba@suse.com> > > Ard Biesheuvel (27): > crypto: tidy up lib/crypto Kconfig and Makefile > crypto: chacha - move existing library code into lib/crypto > crypto: x86/chacha - depend on generic chacha library instead of > crypto driver > crypto: x86/chacha - expose SIMD ChaCha routine as library function > crypto: arm64/chacha - depend on generic chacha library instead of > crypto driver > crypto: arm64/chacha - expose arm64 ChaCha routine as library function > crypto: arm/chacha - import Eric Biggers's scalar accelerated ChaCha > code > crypto: arm/chacha - remove dependency on generic ChaCha driver > crypto: arm/chacha - expose ARM ChaCha routine as library function > crypto: mips/chacha - wire up accelerated 32r2 code from Zinc > crypto: chacha - unexport chacha_generic routines > crypto: poly1305 - move core routines into a separate library > crypto: x86/poly1305 - unify Poly1305 state struct with generic code > crypto: poly1305 - expose init/update/final library interface > crypto: x86/poly1305 - depend on generic library not generic shash > crypto: x86/poly1305 - expose existing driver as poly1305 library > crypto: arm64/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON > implementation > crypto: arm/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON > implementation > crypto: mips/poly1305 - incorporate OpenSSL/CRYPTOGAMS optimized > implementation > int128: move __uint128_t compiler test to Kconfig > crypto: testmgr - add test cases for Blake2s > crypto: blake2s - implement generic shash driver > crypto: curve25519 - add kpp selftest > crypto: curve25519 - implement generic KPP driver > crypto: lib/curve25519 - work around Clang stack spilling issue > crypto: chacha20poly1305 - import construction and selftest from Zinc > crypto: lib/chacha20poly1305 - reimplement crypt_from_sg() routine > > Jason A. Donenfeld (7): > crypto: mips/chacha - import 32r2 ChaCha code from Zinc > crypto: BLAKE2s - generic C library implementation and selftest > crypto: BLAKE2s - x86_64 SIMD implementation > crypto: Curve25519 - generic C library implementations > crypto: Curve25519 - x86_64 library and KPP implementations > crypto: arm - import Bernstein and Schwabe's Curve25519 ARM > implementation > crypto: arm/Curve25519 - wire up NEON implementation > > arch/arm/crypto/Kconfig | 16 +- > arch/arm/crypto/Makefile | 17 +- > arch/arm/crypto/chacha-glue.c | 343 + > arch/arm/crypto/chacha-neon-glue.c | 202 - > arch/arm/crypto/chacha-scalar-core.S | 460 ++ > arch/arm/crypto/curve25519-core.S | 2062 ++++++ > arch/arm/crypto/curve25519-glue.c | 127 + > arch/arm/crypto/poly1305-armv4.pl | 1236 ++++ > arch/arm/crypto/poly1305-core.S_shipped | 1158 +++ > arch/arm/crypto/poly1305-glue.c | 276 + > arch/arm64/Kconfig | 2 +- > arch/arm64/crypto/Kconfig | 9 +- > arch/arm64/crypto/Makefile | 10 +- > arch/arm64/crypto/chacha-neon-glue.c | 81 +- > arch/arm64/crypto/poly1305-armv8.pl | 913 +++ > arch/arm64/crypto/poly1305-core.S_shipped | 835 +++ > arch/arm64/crypto/poly1305-glue.c | 237 + > arch/mips/Makefile | 2 +- > arch/mips/crypto/Makefile | 18 + > arch/mips/crypto/chacha-core.S | 497 ++ > arch/mips/crypto/chacha-glue.c | 150 + > arch/mips/crypto/poly1305-glue.c | 203 + > arch/mips/crypto/poly1305-mips.pl | 1273 ++++ > arch/riscv/Kconfig | 2 +- > arch/x86/Kconfig | 2 +- > arch/x86/crypto/Makefile | 3 + > arch/x86/crypto/blake2s-core.S | 258 + > arch/x86/crypto/blake2s-glue.c | 233 + > arch/x86/crypto/chacha_glue.c | 181 +- > arch/x86/crypto/curve25519-x86_64.c | 2475 +++++++ > arch/x86/crypto/poly1305_glue.c | 199 +- > crypto/Kconfig | 71 +- > crypto/Makefile | 2 + > crypto/adiantum.c | 5 +- > crypto/blake2s_generic.c | 171 + > crypto/chacha_generic.c | 84 +- > crypto/curve25519-generic.c | 90 + > crypto/ecc.c | 2 +- > crypto/nhpoly1305.c | 3 +- > crypto/poly1305_generic.c | 228 +- > crypto/testmgr.c | 30 + > crypto/testmgr.h | 1520 +++- > include/crypto/blake2s.h | 106 + > include/crypto/chacha.h | 83 +- > include/crypto/chacha20poly1305.h | 48 + > include/crypto/curve25519.h | 71 + > include/crypto/internal/blake2s.h | 24 + > include/crypto/internal/chacha.h | 43 + > include/crypto/internal/poly1305.h | 58 + > include/crypto/poly1305.h | 69 +- > init/Kconfig | 4 + > lib/Makefile | 3 +- > lib/crypto/Kconfig | 130 + > lib/crypto/Makefile | 42 +- > lib/crypto/blake2s-generic.c | 111 + > lib/crypto/blake2s-selftest.c | 622 ++ > lib/crypto/blake2s.c | 126 + > lib/{ => crypto}/chacha.c | 20 +- > lib/crypto/chacha20poly1305-selftest.c | 7393 ++++++++++++++++++++ > lib/crypto/chacha20poly1305.c | 369 + > lib/crypto/curve25519-fiat32.c | 864 +++ > lib/crypto/curve25519-hacl64.c | 788 +++ > lib/crypto/curve25519.c | 25 + > lib/crypto/libchacha.c | 35 + > lib/crypto/poly1305.c | 232 + > lib/ubsan.c | 2 +- > lib/ubsan.h | 2 +- > 67 files changed, 26148 insertions(+), 808 deletions(-) > create mode 100644 arch/arm/crypto/chacha-glue.c > delete mode 100644 arch/arm/crypto/chacha-neon-glue.c > create mode 100644 arch/arm/crypto/chacha-scalar-core.S > create mode 100644 arch/arm/crypto/curve25519-core.S > create mode 100644 arch/arm/crypto/curve25519-glue.c > create mode 100644 arch/arm/crypto/poly1305-armv4.pl > create mode 100644 arch/arm/crypto/poly1305-core.S_shipped > create mode 100644 arch/arm/crypto/poly1305-glue.c > create mode 100644 arch/arm64/crypto/poly1305-armv8.pl > create mode 100644 arch/arm64/crypto/poly1305-core.S_shipped > create mode 100644 arch/arm64/crypto/poly1305-glue.c > create mode 100644 arch/mips/crypto/chacha-core.S > create mode 100644 arch/mips/crypto/chacha-glue.c > create mode 100644 arch/mips/crypto/poly1305-glue.c > create mode 100644 arch/mips/crypto/poly1305-mips.pl > create mode 100644 arch/x86/crypto/blake2s-core.S > create mode 100644 arch/x86/crypto/blake2s-glue.c > create mode 100644 arch/x86/crypto/curve25519-x86_64.c > create mode 100644 crypto/blake2s_generic.c > create mode 100644 crypto/curve25519-generic.c > create mode 100644 include/crypto/blake2s.h > create mode 100644 include/crypto/chacha20poly1305.h > create mode 100644 include/crypto/curve25519.h > create mode 100644 include/crypto/internal/blake2s.h > create mode 100644 include/crypto/internal/chacha.h > create mode 100644 include/crypto/internal/poly1305.h > create mode 100644 lib/crypto/Kconfig > create mode 100644 lib/crypto/blake2s-generic.c > create mode 100644 lib/crypto/blake2s-selftest.c > create mode 100644 lib/crypto/blake2s.c > rename lib/{ => crypto}/chacha.c (88%) > create mode 100644 lib/crypto/chacha20poly1305-selftest.c > create mode 100644 lib/crypto/chacha20poly1305.c > create mode 100644 lib/crypto/curve25519-fiat32.c > create mode 100644 lib/crypto/curve25519-hacl64.c > create mode 100644 lib/crypto/curve25519.c > create mode 100644 lib/crypto/libchacha.c > create mode 100644 lib/crypto/poly1305.c All applied. Thanks.
On Fri, Nov 15, 2019 at 09:56:17AM +0100, Jason A. Donenfeld wrote: > Hey Herbert, > > I've been traveling and got back home yesterday. I was going to review > these this afternoon if there's still opportunity. Hi Jason: Last I checked there weren't any major objections to this patch-set. If there are any remaining issues we can resolve them with new patches on top of this series. Thanks,
Hey Ard, Herbert, Dave,
The series looks fine. Ard -- thanks so much for picking up the work
and making this happen. As far as I'm concerned, this is "most" of
Zinc, simply without calling it "Zinc", and minus a few other things
that I think constitutes an okay compromise and good base for moving
forward.
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
The TODO list for me remains the same, and now I can get moving with that:
- Zinc's generic C implementation of poly1305, which is faster and has
separate implementations for u64 and u128.
- x86_64 ChaCha20 from Zinc. Will be fun to discuss with Martin and Andy.
- x86_64 Poly1305 from Zinc.
- Resurrecting the big_keys patch and receiving DavidH's review on that.
- WireGuard! Hurrah!
If you have any feedback on how you'd like this prioritized, please
pipe up. For example Dave - would you like WireGuard *now* or sometime
later? I can probably get that cooking this week, though I do have
some testing and fuzzing of it to do on top of the patches that just
landed in cryptodev.
Jason
Hey Jason, On Tue, 19 Nov 2019 at 16:18, Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > Hey Ard, Herbert, Dave, > > The series looks fine. Ard -- thanks so much for picking up the work > and making this happen. As far as I'm concerned, this is "most" of > Zinc, simply without calling it "Zinc", and minus a few other things > that I think constitutes an okay compromise and good base for moving > forward. > > Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> > Thanks! > The TODO list for me remains the same, and now I can get moving with that: > > - Zinc's generic C implementation of poly1305, which is faster and has > separate implementations for u64 and u128. > - x86_64 ChaCha20 from Zinc. Will be fun to discuss with Martin and Andy. > - x86_64 Poly1305 from Zinc. As I pointed out in the private discussions we had, there are two aspects two AndyP's benchmarking that don't carry over 100% to the Linux kernel: - Every microarchitecture is given equal weight, regardless of the likelihood that the code will actually run on it. This makes some sense for OpenSSL, I guess, but not for the kernel. - Benchmarks are typically based on the performance of the core cryptographics transformation rather than a representative workload. This is especially relevant for network use cases, where packet sizes are not necessarily fixed and usually not a multiple of the block size (as opposed to disk encryption, where every single call is the same size and a power of 2) So for future changes, could we please include performance numbers based on realistic workloads? > - Resurrecting the big_keys patch and receiving DavidH's review on that. My concern here (but we can discuss it in the context of a repost) is that this will pull the accelerated chacha20 and poly1305 code (if enabled) into the core kernel, given that big_keys is not a tristate option. So perhaps we can park this one until we know how to proceed with static calls or alternative approaches? > - WireGuard! Hurrah! > I'm a bit surprised that this only appears at the end of your list :-) > If you have any feedback on how you'd like this prioritized, please > pipe up. For example Dave - would you like WireGuard *now* or sometime > later? I can probably get that cooking this week, though I do have > some testing and fuzzing of it to do on top of the patches that just > landed in cryptodev. > We're at -rc8, and wireguard itself will not go via the crypto tree so you should wait until after the merge window, rebase it onto -rc1 and repost it.
On Tue, Nov 19, 2019 at 4:34 PM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > > - Zinc's generic C implementation of poly1305, which is faster and has > > separate implementations for u64 and u128. I assume your AndyP comment below didn't apply to this top item here. This one should be fairly uncontroversial in your opinion, right? > > - x86_64 ChaCha20 from Zinc. Will be fun to discuss with Martin and Andy. > > - x86_64 Poly1305 from Zinc. > > As I pointed out in the private discussions we had, there are two > aspects two AndyP's benchmarking that don't carry over 100% to the > Linux kernel: > - Every microarchitecture is given equal weight, regardless of the > likelihood that the code will actually run on it. This makes some > sense for OpenSSL, I guess, but not for the kernel. > - Benchmarks are typically based on the performance of the core > cryptographics transformation rather than a representative workload. > This is especially relevant for network use cases, where packet sizes > are not necessarily fixed and usually not a multiple of the block size > (as opposed to disk encryption, where every single call is the same > size and a power of 2) > > So for future changes, could we please include performance numbers > based on realistic workloads? Yea I share your concerns here. From preliminary results, I think the Poly1305 code will be globally better, and I don't think we'll need an abundance of discussion about it. The ChaCha case is more interesting. I'll submit this with lots of packet-sized microbenchmarks, as well as on-the-wire WireGuard testing. Eric - I'm guessing you don't care too much about Adamantium performance on x86 where people are probably better off with AES-XTS, right? Are there other specific real world cases we care about? IPsec is another one, but those concerns, packet-size wise, are more or less the same as for WireGuard. But anyway, we can cross this bridge when we come to it. > > - WireGuard! Hurrah! > > > I'm a bit surprised that this only appears at the end of your list :-) Haha, "last but not least" :) > > > If you have any feedback on how you'd like this prioritized, please > > pipe up. For example Dave - would you like WireGuard *now* or sometime > > later? I can probably get that cooking this week, though I do have > > some testing and fuzzing of it to do on top of the patches that just > > landed in cryptodev. > > > > We're at -rc8, and wireguard itself will not go via the crypto tree so > you should wait until after the merge window, rebase it onto -rc1 and > repost it. Thanks, yea, that makes sense. Netdev also has its own merge window schedule that I should aim to meet. I guess, based on this if I'm understanding correctly, we're looking at WireGuard for 5.5? Jason
On Tue, 19 Nov 2019 at 16:44, Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > On Tue, Nov 19, 2019 at 4:34 PM Ard Biesheuvel > <ard.biesheuvel@linaro.org> wrote: > > > - Zinc's generic C implementation of poly1305, which is faster and has > > > separate implementations for u64 and u128. > > I assume your AndyP comment below didn't apply to this top item here. > This one should be fairly uncontroversial in your opinion, right? > Yes. > > > - x86_64 ChaCha20 from Zinc. Will be fun to discuss with Martin and Andy. > > > - x86_64 Poly1305 from Zinc. > > > > As I pointed out in the private discussions we had, there are two > > aspects two AndyP's benchmarking that don't carry over 100% to the > > Linux kernel: > > - Every microarchitecture is given equal weight, regardless of the > > likelihood that the code will actually run on it. This makes some > > sense for OpenSSL, I guess, but not for the kernel. > > - Benchmarks are typically based on the performance of the core > > cryptographics transformation rather than a representative workload. > > This is especially relevant for network use cases, where packet sizes > > are not necessarily fixed and usually not a multiple of the block size > > (as opposed to disk encryption, where every single call is the same > > size and a power of 2) > > > > So for future changes, could we please include performance numbers > > based on realistic workloads? > > Yea I share your concerns here. From preliminary results, I think the > Poly1305 code will be globally better, and I don't think we'll need an > abundance of discussion about it. > Good. > The ChaCha case is more interesting. I'll submit this with lots of > packet-sized microbenchmarks, as well as on-the-wire WireGuard > testing. Eric - I'm guessing you don't care too much about Adamantium > performance on x86 where people are probably better off with AES-XTS, > right? Are there other specific real world cases we care about? IPsec > is another one, but those concerns, packet-size wise, are more or less > the same as for WireGuard. But anyway, we can cross this bridge when > we come to it. > As long as we can show that WireGuard really benefits without regressing other users disproportionately, I think we're fine. There are not /that/ many users to begin with ... > > > > - WireGuard! Hurrah! > > > > > I'm a bit surprised that this only appears at the end of your list :-) > > Haha, "last but not least" :) > > > > > > If you have any feedback on how you'd like this prioritized, please > > > pipe up. For example Dave - would you like WireGuard *now* or sometime > > > later? I can probably get that cooking this week, though I do have > > > some testing and fuzzing of it to do on top of the patches that just > > > landed in cryptodev. > > > > > > > We're at -rc8, and wireguard itself will not go via the crypto tree so > > you should wait until after the merge window, rebase it onto -rc1 and > > repost it. > > Thanks, yea, that makes sense. Netdev also has its own merge window > schedule that I should aim to meet. I guess, based on this if I'm > understanding correctly, we're looking at WireGuard for 5.5? > No, v5.6. The merge window for v5.5 will open this Sunday, so you'll need to rebase on v5.5-rc1 once it is released, which will [hopefully] have all the crypto pieces you need. Note that this applies equally to the other changes: we can queue performance tweaks in the crypto tree in parallel, and they will all hit v5.6 at the same time.
On Tue, Nov 19, 2019 at 04:44:11PM +0100, Jason A. Donenfeld wrote: > > > > So for future changes, could we please include performance numbers > > based on realistic workloads? > > Yea I share your concerns here. From preliminary results, I think the > Poly1305 code will be globally better, and I don't think we'll need an > abundance of discussion about it. > > The ChaCha case is more interesting. I'll submit this with lots of > packet-sized microbenchmarks, as well as on-the-wire WireGuard > testing. Eric - I'm guessing you don't care too much about Adamantium > performance on x86 where people are probably better off with AES-XTS, > right? Are there other specific real world cases we care about? IPsec > is another one, but those concerns, packet-size wise, are more or less > the same as for WireGuard. But anyway, we can cross this bridge when > we come to it. I'd like for Adiantum to continue to be accelerated on x86, but it doesn't have to squeeze out all performance possible on x86, given that hardware AES support is available there so most people will use that instead. So if e.g. the ChaCha implementation is still AVX2 accelerated, but it's primarily optimized for networking packets rather than disk encryption, that would probably be fine. - Eric
On Tuesday, November 19, 2019 4:23 PM, Eric Biggers <ebiggers@kernel.org> wrote: > On Tue, Nov 19, 2019 at 04:44:11PM +0100, Jason A. Donenfeld wrote: > > > > So for future changes, could we please include performance numbers > > > based on realistic workloads? > > > > Yea I share your concerns here. From preliminary results, I think the > > Poly1305 code will be globally better, and I don't think we'll need an > > abundance of discussion about it. > > The ChaCha case is more interesting. I'll submit this with lots of > > packet-sized microbenchmarks, as well as on-the-wire WireGuard > > testing. Eric - I'm guessing you don't care too much about Adamantium > > performance on x86 where people are probably better off with AES-XTS, > > right? Are there other specific real world cases we care about? IPsec > > is another one, but those concerns, packet-size wise, are more or less > > the same as for WireGuard. But anyway, we can cross this bridge when > > we come to it. > > I'd like for Adiantum to continue to be accelerated on x86, but it doesn't have > to squeeze out all performance possible on x86, given that hardware AES support > is available there so most people will use that instead. So if e.g. the ChaCha > implementation is still AVX2 accelerated, but it's primarily optimized for > networking packets rather than disk encryption, that would probably be fine. > > - Eric I'm interested in using Adamantium on x86 and I hope that you folks won't cripple it :( Jordan