diff mbox series

[net-next,v3,02/17] zinc: introduce minimal cryptography library

Message ID 20180911010838.8818-3-Jason@zx2c4.com (mailing list archive)
State Not Applicable
Delegated to: Herbert Xu
Headers show
Series None | expand

Commit Message

Jason A. Donenfeld Sept. 11, 2018, 1:08 a.m. UTC
Zinc stands for "Zinc Is Neat Crypto" or "Zinc as IN Crypto" or maybe
just "Zx2c4's INsane Cryptolib." It's also short, easy to type, and
plays nicely with the recent trend of naming crypto libraries after
elements. The guiding principle is "don't overdo it". It's less of a
library and more of a directory tree for organizing well-curated direct
implementations of cryptography primitives.

Zinc is a new cryptography API that is much more minimal and lower-level
than the current one. It intends to complement it and provide a basis
upon which the current crypto API might build, as the provider of
software implementations of cryptographic primitives. It is motivated by
three primary observations in crypto API design:

  * Highly composable "cipher modes" and related abstractions from
    90s cryptographers did not turn out to be as terrific an idea as
    hoped, leading to a host of API misuse problems.

  * Most programmers are afraid of crypto code, and so prefer to
    integrate it into libraries in a highly abstracted manner, so as to
    shield themselves from implementation details. Cryptographers, on
    the other hand, prefer simple direct implementations, which they're
    able to verify for high assurance and optimize in accordance with
    their expertise.

  * Overly abstracted and flexible cryptography APIs lead to a host of
    dangerous problems and performance issues. The kernel is in the
    business usually not of coming up with new uses of crypto, but
    rather implementing various constructions, which means it essentially
    needs a library of primitives, not a highly abstracted enterprise-ready
    pluggable system, with a few particular exceptions.

This last observation has seen itself play out several times over and
over again within the kernel:

  * The perennial move of actual primitives away from crypto/ and into
    lib/, so that users can actually call these functions directly with
    no overhead and without lots of allocations, function pointers,
    string specifier parsing, and general clunkiness. For example:
    sha256, chacha20, siphash, sha1, and so forth live in lib/ rather
    than in crypto/. Zinc intends to stop the cluttering of lib/ and
    introduce these direct primitives into their proper place, lib/zinc/.

  * An abundance of misuse bugs with the present crypto API that have
    been very unpleasant to clean up.

  * A hesitance to even use cryptography, because of the overhead and
    headaches involved in accessing the routines.

Zinc goes in a rather different direction. Rather than providing a
thoroughly designed and abstracted API, Zinc gives you simple functions,
which implement some primitive, or some particular and specific
construction of primitives. It is not dynamic in the least, though one
could imagine implementing a complex dynamic dispatch mechanism (such as
the current crypto API) on top of these basic functions. After all,
dynamic dispatch is usually needed for applications with cipher agility,
such as IPsec, dm-crypt, AF_ALG, and so forth, and the existing crypto
API will continue to play that role. However, Zinc will provide a non-
haphazard way of directly utilizing crypto routines in applications
that do have neither the need nor desire for abstraction and dynamic
dispatch.

It also organizes the implementations in a simple, straight-forward,
and direct manner, making it enjoyable and intuitive to work on.
Rather than moving optimized assembly implementations into arch/, it
keeps them all together in lib/zinc/, making it simple and obvious to
compare and contrast what's happening. This is, notably, exactly what
the lib/raid6/ tree does, and that seems to work out rather well. It's
also the pattern of most successful crypto libraries. The architecture-
specific glue-code is made a part of each translation unit, rather than
being in a separate one, so that generic and architecture-optimized code
are combined at compile-time, and incompatibility branches compiled out by
the optimizer.

All implementations have been extensively tested and fuzzed, and are
selected for their quality, trustworthiness, and performance. Wherever
possible and performant, formally verified implementations are used,
such as those from HACL* [1] and Fiat-Crypto [2]. The routines also take
special care to zero out secrets using memzero_explicit (and future work
is planned to have gcc do this more reliably and performantly with
compiler plugins). The performance of the selected implementations is
state-of-the-art and unrivaled on a broad array of hardware, though of
course we will continue to fine tune these to the hardware demands
needed by kernel contributors. Each implementation also comes with
extensive self-tests and crafted test vectors, pulled from various
places such as Wycheproof [9].

Regularity of function signatures is important, so that users can easily
"guess" the name of the function they want. Though, individual
primitives are oftentimes not trivially interchangeable, having been
designed for different things and requiring different parameters and
semantics, and so the function signatures they provide will directly
reflect the realities of the primitives' usages, rather than hiding it
behind (inevitably leaky) abstractions. Also, in contrast to the current
crypto API, Zinc functions can work on stack buffers, and can be called
with different keys, without requiring allocations or locking.

SIMD is used automatically when available, though some routines may
benefit from either having their SIMD disabled for particular
invocations, or to have the SIMD initialization calls amortized over
several invocations of the function, and so Zinc utilizes function
signatures enabling that in conjunction with the recently introduced
simd_context_t.

More generally, Zinc provides function signatures that allow just what
is required by the various callers. This isn't to say that users of the
functions will be permitted to pollute the function semantics with weird
particular needs, but we are trying very hard not to overdo it, and that
means looking carefully at what's actually necessary, and doing just that,
and not much more than that. Remember: practicality and cleanliness rather
than over-zealous infrastructure.

Zinc provides also an opening for the best implementers in academia to
contribute their time and effort to the kernel, by being sufficiently
simple and inviting. In discussing this commit with some of the best and
brightest over the last few years, there are many who are eager to
devote rare talent and energy to this effort.

Following the merging of this, I expect for the primitives that
currently exist in lib/ to work their way into lib/zinc/, after intense
scrutiny of each implementation, potentially replacing them with either
formally-verified implementations, or better studied and faster
state-of-the-art implementations.

Also following the merging of this, I expect for the old crypto API
implementations to be ported over to use Zinc for their software-based
implementations.

As Zinc is simply library code, its config options are un-menued, with
the exception of CONFIG_ZINC_DEBUG, which enables various selftests and
BUG_ONs.

[1] https://github.com/project-everest/hacl-star
[2] https://github.com/mit-plv/fiat-crypto
[3] https://cr.yp.to/ecdh.html
[4] https://cr.yp.to/chacha.html
[5] https://cr.yp.to/snuffle/xsalsa-20081128.pdf
[6] https://cr.yp.to/mac.html
[7] https://blake2.net/
[8] https://tools.ietf.org/html/rfc8439
[9] https://github.com/google/wycheproof

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: linux-crypto@vger.kernel.org
---
 MAINTAINERS       |  8 ++++++++
 lib/Kconfig       |  2 ++
 lib/Makefile      |  2 ++
 lib/zinc/Kconfig  | 20 ++++++++++++++++++++
 lib/zinc/Makefile |  8 ++++++++
 lib/zinc/main.c   | 31 +++++++++++++++++++++++++++++++
 6 files changed, 71 insertions(+)
 create mode 100644 lib/zinc/Kconfig
 create mode 100644 lib/zinc/Makefile
 create mode 100644 lib/zinc/main.c

Comments

Ard Biesheuvel Sept. 11, 2018, 10:08 a.m. UTC | #1
On 11 September 2018 at 03:08, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Zinc stands for "Zinc Is Neat Crypto" or "Zinc as IN Crypto" or maybe
> just "Zx2c4's INsane Cryptolib." It's also short, easy to type, and
> plays nicely with the recent trend of naming crypto libraries after
> elements. The guiding principle is "don't overdo it". It's less of a
> library and more of a directory tree for organizing well-curated direct
> implementations of cryptography primitives.
>
> Zinc is a new cryptography API that is much more minimal and lower-level
> than the current one. It intends to complement it and provide a basis
> upon which the current crypto API might build, as the provider of
> software implementations of cryptographic primitives. It is motivated by
> three primary observations in crypto API design:
>
>   * Highly composable "cipher modes" and related abstractions from
>     90s cryptographers did not turn out to be as terrific an idea as
>     hoped, leading to a host of API misuse problems.
>
>   * Most programmers are afraid of crypto code, and so prefer to
>     integrate it into libraries in a highly abstracted manner, so as to
>     shield themselves from implementation details. Cryptographers, on
>     the other hand, prefer simple direct implementations, which they're
>     able to verify for high assurance and optimize in accordance with
>     their expertise.
>
>   * Overly abstracted and flexible cryptography APIs lead to a host of
>     dangerous problems and performance issues. The kernel is in the
>     business usually not of coming up with new uses of crypto, but
>     rather implementing various constructions, which means it essentially
>     needs a library of primitives, not a highly abstracted enterprise-ready
>     pluggable system, with a few particular exceptions.
>
> This last observation has seen itself play out several times over and
> over again within the kernel:
>
>   * The perennial move of actual primitives away from crypto/ and into
>     lib/, so that users can actually call these functions directly with
>     no overhead and without lots of allocations, function pointers,
>     string specifier parsing, and general clunkiness. For example:
>     sha256, chacha20, siphash, sha1, and so forth live in lib/ rather
>     than in crypto/. Zinc intends to stop the cluttering of lib/ and
>     introduce these direct primitives into their proper place, lib/zinc/.
>
>   * An abundance of misuse bugs with the present crypto API that have
>     been very unpleasant to clean up.
>
>   * A hesitance to even use cryptography, because of the overhead and
>     headaches involved in accessing the routines.
>
> Zinc goes in a rather different direction. Rather than providing a
> thoroughly designed and abstracted API, Zinc gives you simple functions,
> which implement some primitive, or some particular and specific
> construction of primitives. It is not dynamic in the least, though one
> could imagine implementing a complex dynamic dispatch mechanism (such as
> the current crypto API) on top of these basic functions. After all,
> dynamic dispatch is usually needed for applications with cipher agility,
> such as IPsec, dm-crypt, AF_ALG, and so forth, and the existing crypto
> API will continue to play that role. However, Zinc will provide a non-
> haphazard way of directly utilizing crypto routines in applications
> that do have neither the need nor desire for abstraction and dynamic
> dispatch.
>
> It also organizes the implementations in a simple, straight-forward,
> and direct manner, making it enjoyable and intuitive to work on.
> Rather than moving optimized assembly implementations into arch/, it
> keeps them all together in lib/zinc/, making it simple and obvious to
> compare and contrast what's happening. This is, notably, exactly what
> the lib/raid6/ tree does, and that seems to work out rather well. It's
> also the pattern of most successful crypto libraries. The architecture-
> specific glue-code is made a part of each translation unit, rather than
> being in a separate one, so that generic and architecture-optimized code
> are combined at compile-time, and incompatibility branches compiled out by
> the optimizer.
>
> All implementations have been extensively tested and fuzzed, and are
> selected for their quality, trustworthiness, and performance. Wherever
> possible and performant, formally verified implementations are used,
> such as those from HACL* [1] and Fiat-Crypto [2]. The routines also take
> special care to zero out secrets using memzero_explicit (and future work
> is planned to have gcc do this more reliably and performantly with
> compiler plugins). The performance of the selected implementations is
> state-of-the-art and unrivaled on a broad array of hardware, though of
> course we will continue to fine tune these to the hardware demands
> needed by kernel contributors. Each implementation also comes with
> extensive self-tests and crafted test vectors, pulled from various
> places such as Wycheproof [9].
>
> Regularity of function signatures is important, so that users can easily
> "guess" the name of the function they want. Though, individual
> primitives are oftentimes not trivially interchangeable, having been
> designed for different things and requiring different parameters and
> semantics, and so the function signatures they provide will directly
> reflect the realities of the primitives' usages, rather than hiding it
> behind (inevitably leaky) abstractions. Also, in contrast to the current
> crypto API, Zinc functions can work on stack buffers, and can be called
> with different keys, without requiring allocations or locking.
>
> SIMD is used automatically when available, though some routines may
> benefit from either having their SIMD disabled for particular
> invocations, or to have the SIMD initialization calls amortized over
> several invocations of the function, and so Zinc utilizes function
> signatures enabling that in conjunction with the recently introduced
> simd_context_t.
>
> More generally, Zinc provides function signatures that allow just what
> is required by the various callers. This isn't to say that users of the
> functions will be permitted to pollute the function semantics with weird
> particular needs, but we are trying very hard not to overdo it, and that
> means looking carefully at what's actually necessary, and doing just that,
> and not much more than that. Remember: practicality and cleanliness rather
> than over-zealous infrastructure.
>
> Zinc provides also an opening for the best implementers in academia to
> contribute their time and effort to the kernel, by being sufficiently
> simple and inviting. In discussing this commit with some of the best and
> brightest over the last few years, there are many who are eager to
> devote rare talent and energy to this effort.
>
> Following the merging of this, I expect for the primitives that
> currently exist in lib/ to work their way into lib/zinc/, after intense
> scrutiny of each implementation, potentially replacing them with either
> formally-verified implementations, or better studied and faster
> state-of-the-art implementations.
>
> Also following the merging of this, I expect for the old crypto API
> implementations to be ported over to use Zinc for their software-based
> implementations.
>
> As Zinc is simply library code, its config options are un-menued, with
> the exception of CONFIG_ZINC_DEBUG, which enables various selftests and
> BUG_ONs.
>

In spite of the wall of text, you fail to point out exactly why the
existing AEAD API in unsuitable, and why fixing it is not an option.

As I pointed out in a previous version, I don't think we need a
separate crypto API/library in the kernel, and I don't think you have
convinced anyone else yet either.

Perhaps you can devote /your/ rare talent and energy to improving what
we already have for everybody's sake, rather than providing a
completely separate crypto stack that only benefits WireGuard (unless
you yourself port the existing crypto API software algorithms to this
crypto stack first and present *that* work as a convincing case in
itself)

I won't go into the 1000s lines of generated assembly again - you
already know my position on that topic.

Please refrain from sending a v4 with just a couple of more tweaks on
top - these are fundamental issues that require discussion before
there is any chance of this being merged.


> [1] https://github.com/project-everest/hacl-star
> [2] https://github.com/mit-plv/fiat-crypto
> [3] https://cr.yp.to/ecdh.html
> [4] https://cr.yp.to/chacha.html
> [5] https://cr.yp.to/snuffle/xsalsa-20081128.pdf
> [6] https://cr.yp.to/mac.html
> [7] https://blake2.net/
> [8] https://tools.ietf.org/html/rfc8439
> [9] https://github.com/google/wycheproof
>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Samuel Neves <sneves@dei.uc.pt>
> Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
> Cc: linux-crypto@vger.kernel.org
> ---
>  MAINTAINERS       |  8 ++++++++
>  lib/Kconfig       |  2 ++
>  lib/Makefile      |  2 ++
>  lib/zinc/Kconfig  | 20 ++++++++++++++++++++
>  lib/zinc/Makefile |  8 ++++++++
>  lib/zinc/main.c   | 31 +++++++++++++++++++++++++++++++
>  6 files changed, 71 insertions(+)
>  create mode 100644 lib/zinc/Kconfig
>  create mode 100644 lib/zinc/Makefile
>  create mode 100644 lib/zinc/main.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 2ef884b883c3..d2092e52320d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -16160,6 +16160,14 @@ Q:     https://patchwork.linuxtv.org/project/linux-media/list/
>  S:     Maintained
>  F:     drivers/media/dvb-frontends/zd1301_demod*
>
> +ZINC CRYPTOGRAPHY LIBRARY
> +M:     Jason A. Donenfeld <Jason@zx2c4.com>
> +M:     Samuel Neves <sneves@dei.uc.pt>
> +S:     Maintained
> +F:     lib/zinc/
> +F:     include/zinc/
> +L:     linux-crypto@vger.kernel.org
> +
>  ZPOOL COMPRESSED PAGE STORAGE API
>  M:     Dan Streetman <ddstreet@ieee.org>
>  L:     linux-mm@kvack.org
> diff --git a/lib/Kconfig b/lib/Kconfig
> index a3928d4438b5..3e6848269c66 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -485,6 +485,8 @@ config GLOB_SELFTEST
>           module load) by a small amount, so you're welcome to play with
>           it, but you probably don't need it.
>
> +source "lib/zinc/Kconfig"
> +
>  #
>  # Netlink attribute parsing support is select'ed if needed
>  #
> diff --git a/lib/Makefile b/lib/Makefile
> index ca3f7ebb900d..3f16e35d2c11 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -214,6 +214,8 @@ obj-$(CONFIG_PERCPU_TEST) += percpu_test.o
>
>  obj-$(CONFIG_ASN1) += asn1_decoder.o
>
> +obj-$(CONFIG_ZINC) += zinc/
> +
>  obj-$(CONFIG_FONT_SUPPORT) += fonts/
>
>  obj-$(CONFIG_PRIME_NUMBERS) += prime_numbers.o
> diff --git a/lib/zinc/Kconfig b/lib/zinc/Kconfig
> new file mode 100644
> index 000000000000..aa4f8d449d6b
> --- /dev/null
> +++ b/lib/zinc/Kconfig
> @@ -0,0 +1,20 @@
> +config ZINC
> +       tristate
> +       select CRYPTO_BLKCIPHER
> +       select VFP
> +       select VFPv3
> +       select NEON
> +       select KERNEL_MODE_NEON
> +
> +config ZINC_DEBUG
> +       bool "Zinc cryptography library debugging and self-tests"
> +       depends on ZINC
> +       help
> +         This builds a series of self-tests for the Zinc crypto library, which
> +         help diagnose any cryptographic algorithm implementation issues that
> +         might be at the root cause of potential bugs. It also adds various
> +         debugging traps.
> +
> +         Unless you're developing and testing cryptographic routines, or are
> +         especially paranoid about correctness on your hardware, you may say
> +         N here.
> diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
> new file mode 100644
> index 000000000000..dad47573de42
> --- /dev/null
> +++ b/lib/zinc/Makefile
> @@ -0,0 +1,8 @@
> +ccflags-y := -O3
> +ccflags-y += -Wframe-larger-than=8192
> +ccflags-y += -D'pr_fmt(fmt)=KBUILD_MODNAME ": " fmt'
> +ccflags-$(CONFIG_ZINC_DEBUG) += -DDEBUG
> +
> +zinc-y += main.o
> +
> +obj-$(CONFIG_ZINC) := zinc.o
> diff --git a/lib/zinc/main.c b/lib/zinc/main.c
> new file mode 100644
> index 000000000000..ceece33ff5a7
> --- /dev/null
> +++ b/lib/zinc/main.c
> @@ -0,0 +1,31 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
> + */
> +
> +#include <linux/init.h>
> +#include <linux/module.h>
> +
> +#ifdef DEBUG
> +#define selftest(which) do { \
> +       if (!which ## _selftest()) \
> +               return -ENOTRECOVERABLE; \
> +} while (0)
> +#else
> +#define selftest(which)
> +#endif
> +
> +static int __init mod_init(void)
> +{
> +       return 0;
> +}
> +
> +static void __exit mod_exit(void)
> +{
> +}
> +
> +module_init(mod_init);
> +module_exit(mod_exit);
> +MODULE_LICENSE("GPL v2");
> +MODULE_DESCRIPTION("Zinc cryptography library");
> +MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
> --
> 2.18.0
>
Greg KH Sept. 11, 2018, 2:56 p.m. UTC | #2
On Tue, Sep 11, 2018 at 12:08:56PM +0200, Ard Biesheuvel wrote:
> > As Zinc is simply library code, its config options are un-menued, with
> > the exception of CONFIG_ZINC_DEBUG, which enables various selftests and
> > BUG_ONs.
> >
> 
> In spite of the wall of text, you fail to point out exactly why the
> existing AEAD API in unsuitable, and why fixing it is not an option.
> 
> As I pointed out in a previous version, I don't think we need a
> separate crypto API/library in the kernel, and I don't think you have
> convinced anyone else yet either.

Um, then why do people keep sprinkling new crypto/hash code all around
the kernel tree?  It's because what we have as a crypto api is complex
and is hard to use for many in-kernel users.

Something like this new interface (zinc) is a much "saner" api for
writing kernel code that has to interact with crypto/hash primitives.

I see no reason why the existing crypto code can be redone to use the
zinc crypto primitives over time, making there only be one main location
for the crypto algorithms.  But to do it the other way around is pretty
much impossible given the complexities in the existing api that has been
created over time.

Not to say that the existing api is not a viable one, but ugh, really?
You have to admit it is a pain to try to use in any "normal" type of
"here's a bytestream, go give me a hash" type of method, right?

Also there is the added benefit that the crypto primitives here have
been audited by a number of people (so Jason stated), and they are
written in a way that the crypto community can more easily interact and
contribute to.  Which is _way_ better than what we have today.

So this gets my "stamp of approval" for whatever it is worth :)

thanks,

greg k-h
Jason A. Donenfeld Sept. 11, 2018, 9:22 p.m. UTC | #3
Hello Ard,

I realize you've put a lot of good and hard work into the existing
crypto API, and that my writing in these commit messages might be a
bit too bristly and dismissive of that hard work. So I think in a
sense it's understandable that you've responded as such here. But
hopefully I can address your concerns. One thing to keep in mind is
that Zinc endeavors to provide the basis for simple and direct
functions to software algorithms. This is fairly modest goal. Just
some functions that do some stuff in software. Around these you'll
still be able to have complicated and impressive dynamic dispatch and
asynchronous mechanisms such as the present crypto API. Zinc is merely
getting the software implementation side done in a very simple and
direct way. So I don't think there's a good reason for so much
antagonism, despite a perhaps overbearing tone of my commit messages.
Rather, I expect that we'll wind up working together on this quite a
bit down the line.

> In spite of the wall of text, you fail to point out exactly why the
> existing AEAD API in unsuitable, and why fixing it is not an option.

I thought I had addressed this. Firstly, there's a need for more than
just AEAD, but ignoring that, the AEAD API is a big full API that does
lots of things, makes allocations, parses descriptors, and so forth.
I'm sure this kind of highly-engineered approach will continue to
improve over time in that highly engineered direction. Zinc is doing
something a bit different: it's providing a series of simple functions
for various cryptographic routines. This is a considerably different
goal -- perhaps even a complementary one -- to the AEAD API.

> I don't think you have
> convinced anyone else yet either.

Please only speak for yourself and refrain from rhetoric like this,
which is patently false.

> Please refrain from sending a v4 with just a couple of more tweaks on
> top

Sorry, no, I'm not going to stop working hard on this because you're
wary of a new approach. I will continue to improve the submission
until it is mergable, and I do not intend to stop.

Anyway, it sounds like this whole thing may have ruffled your feathers
a bit. Will you be at Linux Plumbers Conference in November? I'm
planning on attending, and perhaps we could find some time there to
sit down and talk one on one a bit.

Regards,
Jason
Eric Biggers Sept. 11, 2018, 9:47 p.m. UTC | #4
On Tue, Sep 11, 2018 at 04:56:24PM +0200, Greg Kroah-Hartman wrote:
> On Tue, Sep 11, 2018 at 12:08:56PM +0200, Ard Biesheuvel wrote:
> > > As Zinc is simply library code, its config options are un-menued, with
> > > the exception of CONFIG_ZINC_DEBUG, which enables various selftests and
> > > BUG_ONs.
> > >
> > 
> > In spite of the wall of text, you fail to point out exactly why the
> > existing AEAD API in unsuitable, and why fixing it is not an option.
> > 
> > As I pointed out in a previous version, I don't think we need a
> > separate crypto API/library in the kernel, and I don't think you have
> > convinced anyone else yet either.
> 
> Um, then why do people keep sprinkling new crypto/hash code all around
> the kernel tree?  It's because what we have as a crypto api is complex
> and is hard to use for many in-kernel users.
> 
> Something like this new interface (zinc) is a much "saner" api for
> writing kernel code that has to interact with crypto/hash primitives.
> 
> I see no reason why the existing crypto code can be redone to use the
> zinc crypto primitives over time, making there only be one main location
> for the crypto algorithms.  But to do it the other way around is pretty
> much impossible given the complexities in the existing api that has been
> created over time.
> 
> Not to say that the existing api is not a viable one, but ugh, really?
> You have to admit it is a pain to try to use in any "normal" type of
> "here's a bytestream, go give me a hash" type of method, right?
> 
> Also there is the added benefit that the crypto primitives here have
> been audited by a number of people (so Jason stated), and they are
> written in a way that the crypto community can more easily interact and
> contribute to.  Which is _way_ better than what we have today.
> 
> So this gets my "stamp of approval" for whatever it is worth :)
> 

I think you mean you see no reason why it *cannot* be converted?  The
conversions definitely *should* be done, just like how some of the existing
crypto API algorithms like SHA-256 already wrap implementations in lib/.  In my
view, lib/zinc/ isn't fundamentally different from what we already have for some
algorithms.  So it's misguided to design/present it as some novel thing, which
unfortunately this patchset still does to a large extent.  (The actual new thing
is how architecture-specific implementations are handled.)

Of course, the real problem is that even after multiple revisions of this
patchset, there's still no actual conversions of the existing crypto API
algorithms over to use the new implementations.  "Zinc" is still completely
separate from the existing crypto API.

So, it's not yet clear that the conversions will actually work out without
problems that would require changes in "Zinc".  I don't think it makes sense to
merge all this stuff without doing the conversions, or at the very least
demonstrating how they will be done.

In particular, in its current form "Zinc" is useless for anyone that needs the
existing crypto API.  For example, for HPolyC,
(https://lkml.org/lkml/2018/8/6/857), I need to make improvements to ChaCha and
Poly1305 in the existing crypto API, e.g. to add support for XChaCha and
NEON-accelerated Poly1305.  Having completely separate ChaCha and Poly1305
implementations in Zinc doesn't help at all.  If anything, it makes things
harder because people will have to review/maintain both sets of implementations;
and when trying to make the improvements I need, I'll find myself in the middle
of a holy war between two competing camps who each have their own opinion about
The Right Way To Do Crypto, and their own crypto implementations and APIs in the
kernel.

- Eric
Jason A. Donenfeld Sept. 11, 2018, 10:02 p.m. UTC | #5
On Tue, Sep 11, 2018 at 3:47 PM Eric Biggers <ebiggers@kernel.org> wrote:
> Of course, the real problem is that even after multiple revisions of this
> patchset, there's still no actual conversions of the existing crypto API
> algorithms over to use the new implementations.  "Zinc" is still completely
> separate from the existing crypto API.

No this is not, "the real problem [...] after multiple revisions"
because I've offered to do this and stated pretty clearly my intent to
do so. But, as I've mentioned before, I'd really prefer to land this
series through net-next, and then after we can turn our attention to
integrating this into the existing crypto API. This series is already
big enough and I would really prefer not to further complicate it.

So, what you want is going to happen. There isn't some kind of
fundamental problem here. This is more of a discussion on
scheduling/trees/mergecycles than anything else.
Eric Biggers Sept. 11, 2018, 10:08 p.m. UTC | #6
On Tue, Sep 11, 2018 at 12:08:56PM +0200, Ard Biesheuvel wrote:
> 
> I won't go into the 1000s lines of generated assembly again - you
> already know my position on that topic.
> 

I'd strongly prefer the assembly to be readable too.  Jason, I'm not sure if
you've actually read through the asm from the OpenSSL implementations, but the
generated .S files actually do lose a lot of semantic information that was in
the original .pl scripts.  For example, in the Poly1305 NEON implementation
which I'm especially interested in (but you could check any of the other
generated files too), the original .pl script has register aliases showing the
meaning of each register.  Just grabbing a random hunk:

        vshr.u64    $T0,$D3,#26
        vmovn.i64   $D3#lo,$D3
         vshr.u64   $T1,$D0,#26
         vmovn.i64  $D0#lo,$D0
        vadd.i64    $D4,$D4,$T0     @ h3 -> h4
        vbic.i32    $D3#lo,#0xfc000000
          vsri.u32  $H4,$H3,#8      @ base 2^32 -> base 2^26
         vadd.i64   $D1,$D1,$T1     @ h0 -> h1
          vshl.u32  $H3,$H3,#18
         vbic.i32   $D0#lo,#0xfc000000

(Yes, it's still not *that* readable, but D0-D4 and H0-H4 map directly to d0-d4
and h0-h4 in the C implementation.  So someone familiar with Poly1305
implementations can figure it out.)

In contrast, the generated .S file just has the raw registers.  It's difficult
to remember what each register is used for.  In fact, someone who actually
wanted to figure it out would probably find themselves referring to the .pl
script -- which raises the question of why the .S file is the "source" and not
the .pl script...

        vshr.u64        q15,q8,#26
        vmovn.i64       d16,q8
         vshr.u64       q4,q5,#26
         vmovn.i64      d10,q5
        vadd.i64        q9,q9,q15               @ h3 -> h4
        vbic.i32        d16,#0xfc000000
          vsri.u32      q14,q13,#8              @ base 2^32 -> base 2^26
         vadd.i64       q6,q6,q4                @ h0 -> h1
          vshl.u32      q13,q13,#18
         vbic.i32       d10,#0xfc000000

- Eric
Andy Lutomirski Sept. 11, 2018, 10:16 p.m. UTC | #7
> On Sep 11, 2018, at 2:47 PM, Eric Biggers <ebiggers@kernel.org> wrote:
> 
>> On Tue, Sep 11, 2018 at 04:56:24PM +0200, Greg Kroah-Hartman wrote:
>> On Tue, Sep 11, 2018 at 12:08:56PM +0200, Ard Biesheuvel wrote:
>>>> As Zinc is simply library code, its config options are un-menued, with
>>>> the exception of CONFIG_ZINC_DEBUG, which enables various selftests and
>>>> BUG_ONs.
>>>> 
>>> 
>>> In spite of the wall of text, you fail to point out exactly why the
>>> existing AEAD API in unsuitable, and why fixing it is not an option.
>>> 
>>> As I pointed out in a previous version, I don't think we need a
>>> separate crypto API/library in the kernel, and I don't think you have
>>> convinced anyone else yet either.
>> 
>> Um, then why do people keep sprinkling new crypto/hash code all around
>> the kernel tree?  It's because what we have as a crypto api is complex
>> and is hard to use for many in-kernel users.
>> 
>> Something like this new interface (zinc) is a much "saner" api for
>> writing kernel code that has to interact with crypto/hash primitives.
>> 
>> I see no reason why the existing crypto code can be redone to use the
>> zinc crypto primitives over time, making there only be one main location
>> for the crypto algorithms.  But to do it the other way around is pretty
>> much impossible given the complexities in the existing api that has been
>> created over time.
>> 
>> Not to say that the existing api is not a viable one, but ugh, really?
>> You have to admit it is a pain to try to use in any "normal" type of
>> "here's a bytestream, go give me a hash" type of method, right?
>> 
>> Also there is the added benefit that the crypto primitives here have
>> been audited by a number of people (so Jason stated), and they are
>> written in a way that the crypto community can more easily interact and
>> contribute to.  Which is _way_ better than what we have today.
>> 
>> So this gets my "stamp of approval" for whatever it is worth :)
>> 
> 
> I think you mean you see no reason why it *cannot* be converted?  The
> conversions definitely *should* be done, just like how some of the existing
> crypto API algorithms like SHA-256 already wrap implementations in lib/.  In my
> view, lib/zinc/ isn't fundamentally different from what we already have for some
> algorithms.  So it's misguided to design/present it as some novel thing, which
> unfortunately this patchset still does to a large extent.  (The actual new thing
> is how architecture-specific implementations are handled.)
> 
> Of course, the real problem is that even after multiple revisions of this
> patchset, there's still no actual conversions of the existing crypto API
> algorithms over to use the new implementations.  "Zinc" is still completely
> separate from the existing crypto API.
> 

Jason, can you do one of these conversions as an example?

> So, it's not yet clear that the conversions will actually work out without
> problems that would require changes in "Zinc".  I don't think it makes sense to
> merge all this stuff without doing the conversions, or at the very least
> demonstrating how they will be done.
> 
> In particular, in its current form "Zinc" is useless for anyone that needs the
> existing crypto API.  For example, for HPolyC,
> (https://lkml.org/lkml/2018/8/6/857), I need to make improvements to ChaCha and
> Poly1305 in the existing crypto API, e.g. to add support for XChaCha and
> NEON-accelerated Poly1305.  Having completely separate ChaCha and Poly1305
> implementations in Zinc doesn't help at all.  If anything, it makes things
> harder because people will have to review/maintain both sets of implementations;
> and when trying to make the improvements I need, I'll find myself in the middle
> of a holy war between two competing camps who each have their own opinion about
> The Right Way To Do Crypto, and their own crypto implementations and APIs in the
> kernel.
> 
> - Eric
Jason A. Donenfeld Sept. 11, 2018, 10:18 p.m. UTC | #8
On Tue, Sep 11, 2018 at 4:16 PM Andy Lutomirski <luto@amacapital.net> wrote:
> Jason, can you do one of these conversions as an example?

My preference is really to leave that to a different patch series. But
if you think I *must*, then I shall.

Jason
Andy Lutomirski Sept. 11, 2018, 11:01 p.m. UTC | #9
> On Sep 11, 2018, at 3:18 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> 
>> On Tue, Sep 11, 2018 at 4:16 PM Andy Lutomirski <luto@amacapital.net> wrote:
>> Jason, can you do one of these conversions as an example?
> 
> My preference is really to leave that to a different patch series. But
> if you think I *must*, then I shall.
> 
> 

I think Ard’s point is valid: in the long run we don’t want two competing software implementations of each primitive. It clearly *should* be possible to make crypto API call into zinc for synchronous software operations, but a demonstration of how this actually works and that there isn’t some change to zinc to make it would well would be in order, I think.

IMO the right approach is do one conversion right away and save the rest for later.
Andrew Lunn Sept. 11, 2018, 11:30 p.m. UTC | #10
On Tue, Sep 11, 2018 at 04:02:52PM -0600, Jason A. Donenfeld wrote:
> On Tue, Sep 11, 2018 at 3:47 PM Eric Biggers <ebiggers@kernel.org> wrote:
> > Of course, the real problem is that even after multiple revisions of this
> > patchset, there's still no actual conversions of the existing crypto API
> > algorithms over to use the new implementations.  "Zinc" is still completely
> > separate from the existing crypto API.
> 
> No this is not, "the real problem [...] after multiple revisions"
> because I've offered to do this and stated pretty clearly my intent to
> do so. But, as I've mentioned before, I'd really prefer to land this
> series through net-next

Hi Jason

Just as an FYI:

1) I don't think anybody in netdev has taken a serious look at the
network code yet. There is little point until the controversial part
of the code, Zinc, has been sorted out.

2) I personally would be surprised if DaveM took this code without
having an Acked-by from the crypto subsystem people. In the same way,
i doubt the crypto people would take an Ethernet driver without having
DaveM's Acked-by.

       Andrew
David Miller Sept. 11, 2018, 11:57 p.m. UTC | #11
From: Andrew Lunn <andrew@lunn.ch>
Date: Wed, 12 Sep 2018 01:30:15 +0200

> Just as an FYI:
> 
> 1) I don't think anybody in netdev has taken a serious look at the
> network code yet. There is little point until the controversial part
> of the code, Zinc, has been sorted out.
> 
> 2) I personally would be surprised if DaveM took this code without
> having an Acked-by from the crypto subsystem people. In the same way,
> i doubt the crypto people would take an Ethernet driver without having
> DaveM's Acked-by.

Both of Andrew's statements are completely true.

I'm not looking at any of the networking bits until the crypto stuff
is fully sorted and fully supported and Ack'd by crypto folks.
Jason A. Donenfeld Sept. 12, 2018, 12:01 a.m. UTC | #12
Hi Andy,

On Tue, Sep 11, 2018 at 5:01 PM Andy Lutomirski <luto@amacapital.net> wrote:
> I think Ard’s point is valid: in the long run we don’t want two competing software implementations of each primitive. It clearly *should* be possible to make crypto API call into zinc for synchronous software operations, but a demonstration of how this actually works and that there isn’t some change to zinc to make it would well would be in order, I think.
>
> IMO the right approach is do one conversion right away and save the rest for later.

Alright, I'll go ahead and do this for v4. Thanks for the guidance.

Jason
Jason A. Donenfeld Sept. 12, 2018, 12:02 a.m. UTC | #13
On Tue, Sep 11, 2018 at 5:57 PM David Miller <davem@davemloft.net> wrote:
> Both of Andrew's statements are completely true.
>
> I'm not looking at any of the networking bits until the crypto stuff
> is fully sorted and fully supported and Ack'd by crypto folks.

Seems reasonable to me.

Jason
Jason A. Donenfeld Sept. 12, 2018, 4:29 a.m. UTC | #14
Hi Eric, Andy,

> On Tue, Sep 11, 2018 at 5:01 PM Andy Lutomirski <luto@amacapital.net> wrote:
> > I think Ard’s point is valid: in the long run we don’t want two competing software implementations of each primitive. It clearly *should* be possible to make crypto API call into zinc for synchronous software operations, but a demonstration of how this actually works and that there isn’t some change to zinc to make it would well would be in order, I think.
> >
> > IMO the right approach is do one conversion right away and save the rest for later.

It turned out to be fairly straight forward and quick to do. I'm on a
transatlantic plane so the connection is super spotty, but you can
check out the three commits added to:
<https://git.kernel.org/pub/scm/linux/kernel/git/zx2c4/linux.git/log?h=jd/wireguard>
-- the two "crypto:" commits and the one "random:" commit.
Alternatively, you can look directly at
<https://git.kernel.org/pub/scm/linux/kernel/git/zx2c4/linux.git/tree/crypto/poly1305_zinc.c?h=jd/wireguard>
and <https://git.kernel.org/pub/scm/linux/kernel/git/zx2c4/linux.git/tree/crypto/chacha20_zinc.c?h=jd/wireguard>.
I'll be refining and checking these over the next week or so, but the
basics were super easy to do.

Jason
Jason A. Donenfeld Sept. 12, 2018, 6:16 p.m. UTC | #15
Hi Eric,

On Wed, Sep 12, 2018 at 12:08 AM Eric Biggers <ebiggers@kernel.org> wrote:
> I'd strongly prefer the assembly to be readable too.  Jason, I'm not sure if
> you've actually read through the asm from the OpenSSL implementations, but the
> generated .S files actually do lose a lot of semantic information that was in
> the original .pl scripts.

The thing to keep in mind is that the .S was not directly and blindly
generated from the .pl. We started with the output of the .pl, and
then, particularly in the case of x86_64, worked with it a lot, and
now it's something a bit different. We've definitely spent a lot of
time reading that assembly.

I'll see if I can improve the readability with some register name
remapping on ARM. No guarantees, but I'll play a bit and see if I can
make it a bit better.

Jason
Ard Biesheuvel Sept. 12, 2018, 6:19 p.m. UTC | #16
On 12 September 2018 at 20:16, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Hi Eric,
>
> On Wed, Sep 12, 2018 at 12:08 AM Eric Biggers <ebiggers@kernel.org> wrote:
>> I'd strongly prefer the assembly to be readable too.  Jason, I'm not sure if
>> you've actually read through the asm from the OpenSSL implementations, but the
>> generated .S files actually do lose a lot of semantic information that was in
>> the original .pl scripts.
>
> The thing to keep in mind is that the .S was not directly and blindly
> generated from the .pl. We started with the output of the .pl, and
> then, particularly in the case of x86_64, worked with it a lot, and
> now it's something a bit different. We've definitely spent a lot of
> time reading that assembly.
>

Can we please have those changes as a separate patch? Preferably to
the .pl file rather than the .S file, so we can easily distinguish the
code from upstream from the code that you modified.

> I'll see if I can improve the readability with some register name
> remapping on ARM. No guarantees, but I'll play a bit and see if I can
> make it a bit better.
>
> Jason
Eric Biggers Sept. 12, 2018, 6:34 p.m. UTC | #17
On Wed, Sep 12, 2018 at 08:19:21PM +0200, Ard Biesheuvel wrote:
> On 12 September 2018 at 20:16, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> > Hi Eric,
> >
> > On Wed, Sep 12, 2018 at 12:08 AM Eric Biggers <ebiggers@kernel.org> wrote:
> >> I'd strongly prefer the assembly to be readable too.  Jason, I'm not sure if
> >> you've actually read through the asm from the OpenSSL implementations, but the
> >> generated .S files actually do lose a lot of semantic information that was in
> >> the original .pl scripts.
> >
> > The thing to keep in mind is that the .S was not directly and blindly
> > generated from the .pl. We started with the output of the .pl, and
> > then, particularly in the case of x86_64, worked with it a lot, and
> > now it's something a bit different. We've definitely spent a lot of
> > time reading that assembly.
> >
> 
> Can we please have those changes as a separate patch? Preferably to
> the .pl file rather than the .S file, so we can easily distinguish the
> code from upstream from the code that you modified.
> 
> > I'll see if I can improve the readability with some register name
> > remapping on ARM. No guarantees, but I'll play a bit and see if I can
> > make it a bit better.
> >
> > Jason

FWIW, yesterday I made a modified version of poly1305-armv4.pl that generates an
asm file that works in kernel mode.  The changes are actually pretty small, and
I think we can get them upstream into OpenSSL like they were for sha256-armv4.pl
and sha512-armv4.pl.  I'll start a thread with Andy Polyakov and you two.

But I don't have time to help with all the many OpenSSL asm files Jason is
proposing, just maybe poly1305-armv4 and chacha-armv4 for now.

- Eric
Ard Biesheuvel Sept. 12, 2018, 10:56 p.m. UTC | #18
On 11 September 2018 at 23:22, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Hello Ard,
>
> I realize you've put a lot of good and hard work into the existing
> crypto API, and that my writing in these commit messages might be a
> bit too bristly and dismissive of that hard work. So I think in a
> sense it's understandable that you've responded as such here. But
> hopefully I can address your concerns. One thing to keep in mind is
> that Zinc endeavors to provide the basis for simple and direct
> functions to software algorithms. This is fairly modest goal. Just
> some functions that do some stuff in software. Around these you'll
> still be able to have complicated and impressive dynamic dispatch and
> asynchronous mechanisms such as the present crypto API. Zinc is merely
> getting the software implementation side done in a very simple and
> direct way. So I don't think there's a good reason for so much
> antagonism, despite a perhaps overbearing tone of my commit messages.
> Rather, I expect that we'll wind up working together on this quite a
> bit down the line.
>

No worries, it takes more than this to piss me off. I do take pride in
my work, but I am perfectly aware that I am not a rare talent like
Andy Polyakov, which is actually why I have worked extensively with
him in the past to get kernel specific changes to the ARM/arm64 NEON
code into upstream OpenSSL, so that we could use the upstream code
unmodified and benefit from upstream review. [1] [2]

In this series, you are dumping a huge volume of unannotated,
generated asm into the kernel which has been modified [by you] to
[among other things?] adhere to the kernel API (without documenting
what the changes are exactly). How does that live up to the promise of
better, peer reviewed code?

Then there is the performance claim. We know for instance that the
OpenSSL ARM NEON code for ChaCha20 is faster on cores that happen to
possess a micro-architectural property that ALU instructions are
essentially free when they are interleaved with SIMD instructions. But
we also know that a) Cortex-A7, which is a relevant target, is not one
of those cores, and b) that chip designers are not likely to optimize
for that particular usage pattern so relying on it in generic code is
unwise in general.

I am also concerned about your claim that all software algorithms will
be moved into this crypto library. You are not specific about whose
responsibility it will be that this is going to happen in a timely
fashion. But more importantly, it is not clear at all how you expect
this to work for, e.g., h/w instruction based SHAxxx or AES in various
chaining modes, which should be used only on cores that implement
those instructions (note that on arm64, we have optional instructions
for AES, PMULL, SHA1, SHA256, SHA512, SHA3, SM3 and SM4). Are all
those implementations (only few of which will be used on a certain
core) going to be part of the monolithic library? What are the APIs
going to look like for block ciphers, taking chaining modes into
account?

I'm sure it is rather simple to port the crypto API implementation of
ChaCha20 to use your library. I am more concerned about how your
library is going to expand to cover all other software algorithms that
we currently use in the kernel.

>> In spite of the wall of text, you fail to point out exactly why the
>> existing AEAD API in unsuitable, and why fixing it is not an option.
>
> I thought I had addressed this. Firstly, there's a need for more than
> just AEAD, but ignoring that, the AEAD API is a big full API that does
> lots of things, makes allocations, parses descriptors, and so forth.
> I'm sure this kind of highly-engineered approach will continue to
> improve over time in that highly engineered direction. Zinc is doing
> something a bit different: it's providing a series of simple functions
> for various cryptographic routines. This is a considerably different
> goal -- perhaps even a complementary one -- to the AEAD API.
>

I understand how your crypto library is different from the crypto API.
But the question was why WireGuard cannot make use of the crypto APIs
AEAD interface. And note that this is an honest question: I know that
the crypto API is cumbersome to use in general, but it would be very
helpful to understand where exactly the impedance mismatch is between
what your code needs and what the crypto API provides.

>> I don't think you have
>> convinced anyone else yet either.
>
> Please only speak for yourself and refrain from rhetoric like this,
> which is patently false.
>

Fair enough. You have clearly convinced Greg :-)

>> Please refrain from sending a v4 with just a couple of more tweaks on
>> top
>
> Sorry, no, I'm not going to stop working hard on this because you're
> wary of a new approach. I will continue to improve the submission
> until it is mergable, and I do not intend to stop.
>

Of course. But please respond to all the concerns, not just the low
hanging fruit. I have already mentioned a couple of times that I
object to dumping large volumes of generated assembly into the kernel
tree without any annotation whatsoever, or any documentation about how
the generated code was hand tweaked before submission. You have not
responded to that concern yet.

> Anyway, it sounds like this whole thing may have ruffled your feathers
> a bit. Will you be at Linux Plumbers Conference in November? I'm
> planning on attending, and perhaps we could find some time there to
> sit down and talk one on one a bit.
>

That would be good, yes. I will be there.


[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e4e7f10bfc4069925e99cc4b428c3434e30b6c3f
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7918ecef073fe80eeb399a37d8d48561864eedf1
Andy Lutomirski Sept. 12, 2018, 11:45 p.m. UTC | #19
On Wed, Sep 12, 2018 at 3:56 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> I realize you've put a lot of good and hard work into the existing
> I am also concerned about your claim that all software algorithms will
> be moved into this crypto library. You are not specific about whose
> responsibility it will be that this is going to happen in a timely
> fashion. But more importantly, it is not clear at all how you expect
> this to work for, e.g., h/w instruction based SHAxxx or AES in various
> chaining modes, which should be used only on cores that implement
> those instructions (note that on arm64, we have optional instructions
> for AES, PMULL, SHA1, SHA256, SHA512, SHA3, SM3 and SM4). Are all
> those implementations (only few of which will be used on a certain
> core) going to be part of the monolithic library? What are the APIs
> going to look like for block ciphers, taking chaining modes into
> account?

I'm not convinced that there's any real need for *all* crypto
algorithms to move into lib/zinc or to move at all.  As I see it,
there are two classes of crypto algorithms in the kernel:

a) Crypto that is used by code that chooses its algorithm statically
and wants synchronous operations.  These include everything in
drivers/char/random.c, but also a bunch of various networking things
that are hardcoded and basically everything that uses stack buffers.
(This means it includes all the code that I broke when I did
VMAP_STACK.  Sign.)

b) Crypto that is used dynamically.  This includes dm-crypt
(aes-xts-plain64, aes-cbc-essiv, etc), all the ALG_IF interfaces, a
lot of IPSEC stuff, possibly KCM, and probably many more.  These will
get comparatively little benefit from being converted to a zinc-like
interface.  For some of these cases, it wouldn't make any sense at all
to convert them.  Certainly the ones that do async hardware crypto
using DMA engines will never look at all like zinc, even under the
hood.

I think that, as a short-term goal, it makes a lot of sense to have
implementations of the crypto that *new* kernel code (like Wireguard)
wants to use in style (a) that live in /lib, and it obviously makes
sense to consolidate their implementations with the crypto/
implementations in a timely manner.  As a medium-term goal, adding
more algorithms as needed for things that could use the simpler APIs
(Bluetooth, perhaps) would make sense.

But I see no reason at all that /lib should ever contain a grab-bag of
crypto implementations just for the heck of it.  They should have real
in-kernel users IMO.  And this means that there will probably always
be some crypto implementations in crypto/ for things like aes-xts.

--Andy
Ard Biesheuvel Sept. 13, 2018, 5:41 a.m. UTC | #20
On 13 September 2018 at 01:45, Andy Lutomirski <luto@kernel.org> wrote:
> On Wed, Sep 12, 2018 at 3:56 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> I realize you've put a lot of good and hard work into the existing
>> I am also concerned about your claim that all software algorithms will
>> be moved into this crypto library. You are not specific about whose
>> responsibility it will be that this is going to happen in a timely
>> fashion. But more importantly, it is not clear at all how you expect
>> this to work for, e.g., h/w instruction based SHAxxx or AES in various
>> chaining modes, which should be used only on cores that implement
>> those instructions (note that on arm64, we have optional instructions
>> for AES, PMULL, SHA1, SHA256, SHA512, SHA3, SM3 and SM4). Are all
>> those implementations (only few of which will be used on a certain
>> core) going to be part of the monolithic library? What are the APIs
>> going to look like for block ciphers, taking chaining modes into
>> account?
>
> I'm not convinced that there's any real need for *all* crypto
> algorithms to move into lib/zinc or to move at all.  As I see it,
> there are two classes of crypto algorithms in the kernel:
>
> a) Crypto that is used by code that chooses its algorithm statically
> and wants synchronous operations.  These include everything in
> drivers/char/random.c, but also a bunch of various networking things
> that are hardcoded and basically everything that uses stack buffers.
> (This means it includes all the code that I broke when I did
> VMAP_STACK.  Sign.)
>
> b) Crypto that is used dynamically.  This includes dm-crypt
> (aes-xts-plain64, aes-cbc-essiv, etc), all the ALG_IF interfaces, a
> lot of IPSEC stuff, possibly KCM, and probably many more.  These will
> get comparatively little benefit from being converted to a zinc-like
> interface.  For some of these cases, it wouldn't make any sense at all
> to convert them.  Certainly the ones that do async hardware crypto
> using DMA engines will never look at all like zinc, even under the
> hood.
>
> I think that, as a short-term goal, it makes a lot of sense to have
> implementations of the crypto that *new* kernel code (like Wireguard)
> wants to use in style (a) that live in /lib, and it obviously makes
> sense to consolidate their implementations with the crypto/
> implementations in a timely manner.  As a medium-term goal, adding
> more algorithms as needed for things that could use the simpler APIs
> (Bluetooth, perhaps) would make sense.
>
> But I see no reason at all that /lib should ever contain a grab-bag of
> crypto implementations just for the heck of it.  They should have real
> in-kernel users IMO.  And this means that there will probably always
> be some crypto implementations in crypto/ for things like aes-xts.
>

But one of the supposed selling points of this crypto library is that
it gives engineers who are frightened of crypto in general and the
crypto API in particular simple and easy to use crypto primitives
rather than having to jump through the crypto API's hoops.

A crypto library whose only encryption algorithm is a stream cipher
does *not* deliver on that promise, since it is only suitable for
cases where IVs are guaranteed not to be reused. You yourself were
bitten by the clunkiness of the crypto API when attempting to use the
SHA26 code, right? So shouldn't we move that into this crypto library
as well?

I think it is reasonable for WireGuard to standardize on
ChaCha20/Poly1305 only, although I have my concerns about the flag day
that will be required if this 'one true cipher' ever does turn out to
be compromised (either that, or we will have to go back in time and
add some kind of protocol versioning to existing deployments of
WireGuard)

And frankly, if the code were as good as the prose, we wouldn't be
having this discussion. Zinc adds its own clunky ways to mix arch and
generic code, involving GCC -include command line arguments and
#ifdefs everywhere. My review comments on this were completely ignored
by Jason.
Milan Broz Sept. 13, 2018, 6:39 a.m. UTC | #21
On 13/09/18 01:45, Andy Lutomirski wrote:
> On Wed, Sep 12, 2018 at 3:56 PM, Ard Biesheuvel
... 
> b) Crypto that is used dynamically.  This includes dm-crypt
> (aes-xts-plain64, aes-cbc-essiv, etc), all the ALG_IF interfaces, a
> lot of IPSEC stuff, possibly KCM, and probably many more.  These will
> get comparatively little benefit from being converted to a zinc-like
> interface.  For some of these cases, it wouldn't make any sense at all
> to convert them.  Certainly the ones that do async hardware crypto
> using DMA engines will never look at all like zinc, even under the
> hood.

Please note, that dm-crypt now uses not only block ciphers and modes,
but also authenticated encryption and hashes (for ESSIV and HMAC
in authenticated composed modes) and RNG (for random IV).
We use crypto API, including async variants (I hope correctly :)

There is a long time battle to move initialization vectors generators
from dm-crypt to crypto API. If there are any plans to use a new library,
this issue should be discussed as well.
(Some dm-crypt IV generators are disk encryption specific, some do more
that just IV so porting is not straightforward etc).

Related problem here is an optimization of chain of sectors encryption -
if we have new crypto API, it would be nice if can take chain of sectors
so possible implementation can process this chain in one batch
(every sector need to be tweaked by differently generated IV - and we
are back in problem above).
I think filesystem encryption uses the same pattern.

And btw, we use the same algorithms through AF_ALG in userspace (cryptsetup).

So please, if you mention dm-crypt, note that it is very complex
crypto API consumer :) And everything is dynamic, configurable through
dm-crypt options.

That said, I would be more than happy to help in experiments to porting dm-crypt
to any other crypto library, but if it doesn't not help with problems
mentioned above, I do not see any compelling reason for the new library for dm-crypt...

Milan
Jason A. Donenfeld Sept. 13, 2018, 2:15 p.m. UTC | #22
Hi Ard,

On Thu, Sep 13, 2018 at 12:56 AM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> In this series, you are dumping a huge volume of unannotated,
> generated asm into the kernel which has been modified [by you] to
> [among other things?] adhere to the kernel API (without documenting
> what the changes are exactly). How does that live up to the promise of
> better, peer reviewed code?

The code still benefits from the review that's gone into OpenSSL. It's
not modified in ways that would affect the cryptographic operations
being done. It's modified to be suitable for kernel space.

> Then there is the performance claim. We know for instance that the
> OpenSSL ARM NEON code for ChaCha20 is faster on cores that happen to
> possess a micro-architectural property that ALU instructions are
> essentially free when they are interleaved with SIMD instructions. But
> we also know that a) Cortex-A7, which is a relevant target, is not one
> of those cores, and b) that chip designers are not likely to optimize
> for that particular usage pattern so relying on it in generic code is
> unwise in general.

That's interesting. I'll bring this up with AndyP. FWIW, if you think
you have a real and compelling claim here, I'd be much more likely to
accept a different ChaCha20 implementation than I would be to accept a
different Poly1305 implementation. (It's a *lot* harder to screw up
ChaCha20 than it is to screw up Poly1305.)

> I am also concerned about your claim that all software algorithms will
> be moved into this crypto library.

I'll defer to Andy's response here, which I think is a correct one:
https://lkml.org/lkml/2018/9/13/27

The short answer is that Zinc is going to be adding the ciphers that
people want to use for normal reasons from normal code. For example,
after this merges, we'll next be working on moving the remaining
non-optimized C code out of lib/ that's called by places (such as
SHA2).

> You are not specific about whose
> responsibility it will be that this is going to happen in a timely
> fashion.

I thought I laid out the roadmap for this in the commit message. In
case I wasn't clear: my plan is to tackle lib/ after merging, and I
plan to do so in a timely manner. It's a pretty common tactic to keep
layering on tasks, "what about X?", "what about Y?", "I won't agree
unless Z!" -- when in reality kernel development and refactorings are
done incrementally. I've been around on this list contributing code
for long enough that you should have a decent amount of confidence
that I'm not just going to disappear working on this or something
insane like that. And neither are the two academic cryptographers CC'd
on this thread. So, as Andy said, we're going to be porting to Zinc
the primitives that are useful for the various applications of Zinc.
This means yes, we'll have SHA2 in there.

> chaining modes
> What are the APIs
> going to look like for block ciphers, taking chaining modes into
> account?

As mentioned in the commit message and numerous times, we're not
trying to make a win32-like crypto API here or to remake the existing
Linux crypto API. Rather we're providing libraries of specific
functions that are useful for various circumstances. For example, if
AES-GCM is desired at some point, then we'll have a similar API for
that as we do for ChaPoly -- one that takes buffers and one that takes
sg. Likewise, hash functions use the familiar init/update/final.
"Generic" chaining modes aren't really part of the equation or design
goals.

Again, I realize you've spent a long time working on the existing
crypto API, and so your questions and concerns are in the line of,
"how are we going to make Zinc look like the existing crypto API in
functionality?" But that's not what we're up to here. We have a
different and complementary design goal. I understand why you're
squirming, but please recognize we're working on different things.

> I'm sure it is rather simple to port the crypto API implementation of
> ChaCha20 to use your library. I am more concerned about how your
> library is going to expand to cover all other software algorithms that
> we currently use in the kernel.

The subset of algorithms we add will be developed with the same
methodology as the present ones. There is nothing making this
particularly difficult or even more difficult for other primitives
than it was for ChaCha20. It's especially easy, in fact, since we're
following similar design methodologies as the vast majority of other
cryptography libraries that have been developed. Namely, we're
creating simple things called "functions".

> Of course. But please respond to all the concerns,
> You have not
> responded to that concern yet.

Sorry, it's certainly not my intention. I've been on vacation with my
family for the last several weeks, and only returned home
sleep-deprived last night after 4 days of plane delays. I've now
rested and will resume working on this full-time and I'll try my best
to address concerns, and also go back through emails to find things I
might have missed. (First, though, I'm going to deal with getting back
the three suitcases the airline lost in transit...)

> > Anyway, it sounds like this whole thing may have ruffled your feathers
> > a bit. Will you be at Linux Plumbers Conference in November? I'm
> > planning on attending, and perhaps we could find some time there to
> > sit down and talk one on one a bit.
>
> That would be good, yes. I will be there.

Looking forward to talking to you there, and hopefully we can put to
rest any lingering concerns.

Jason
Jason A. Donenfeld Sept. 13, 2018, 2:18 p.m. UTC | #23
On Thu, Sep 13, 2018 at 1:45 AM Andy Lutomirski <luto@kernel.org> wrote:
> I'm not convinced that there's any real need for *all* crypto
> algorithms to move into lib/zinc or to move at all.  As I see it,
> there are two classes of crypto algorithms in the kernel:
>
> a) Crypto that is used by code that chooses its algorithm statically
> and wants synchronous operations.  These include everything in
> drivers/char/random.c, but also a bunch of various networking things
> that are hardcoded and basically everything that uses stack buffers.
> (This means it includes all the code that I broke when I did
> VMAP_STACK.  Sign.)

Right, exactly. This is what will wind up using Zinc. I'm working on
an example usage of this for v4 of the patch submission, which you can
ogle in a preview here if you're curious:

https://git.zx2c4.com/linux-dev/commit/?h=big_key_rewrite

28 insertions, 206 deletions :-D

> b) Crypto that is used dynamically.  This includes dm-crypt
> (aes-xts-plain64, aes-cbc-essiv, etc), all the ALG_IF interfaces, a
> lot of IPSEC stuff, possibly KCM, and probably many more.  These will
> get comparatively little benefit from being converted to a zinc-like
> interface.  For some of these cases, it wouldn't make any sense at all
> to convert them.  Certainly the ones that do async hardware crypto
> using DMA engines will never look at all like zinc, even under the
> hood.

Right, this is what the crypto API will continue to be used for.


> I think that, as a short-term goal, it makes a lot of sense to have
> implementations of the crypto that *new* kernel code (like Wireguard)
> wants to use in style (a) that live in /lib, and it obviously makes
> sense to consolidate their implementations with the crypto/
> implementations in a timely manner.  As a medium-term goal, adding
> more algorithms as needed for things that could use the simpler APIs
> (Bluetooth, perhaps) would make sense.

Agreed 100%. With regards to "consolidate their implementations" --
I've actually already done this after your urging yesterday, and so
that will be a part of v4.

> But I see no reason at all that /lib should ever contain a grab-bag of
> crypto implementations just for the heck of it.  They should have real
> in-kernel users IMO.  And this means that there will probably always
> be some crypto implementations in crypto/ for things like aes-xts.

Right, precisely.

Jason
Jason A. Donenfeld Sept. 13, 2018, 2:32 p.m. UTC | #24
On Thu, Sep 13, 2018 at 7:41 AM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> But one of the supposed selling points of this crypto library is that
> it gives engineers who are frightened of crypto in general and the
> crypto API in particular simple and easy to use crypto primitives
> rather than having to jump through the crypto API's hoops.

The goal is for engineers who want to specifically use algorithm X
from within the kernel in a non-dynamic way to be able to then use
algorithm X with a simple function call. The goal is not to open it up
to people who have no idea what they're doing; for that a NaCL-like
library with functions like "crypto_box_open" or something would fit
the bill; but that's also not what we're trying to do here. Please
don't confuse the design goals. The rest of your email is therefore a
bit of a straw man; cut the rhetoric out.

> A crypto library whose only encryption algorithm is a stream cipher
> does *not* deliver on that promise, since it is only suitable for
> cases where IVs are guaranteed not to be reused.

False. We also offer XChaCha20Poly1305, which takes a massive nonce,
suitable for random generation.

If there became a useful case for AES-PMAC-SIV or even AES-GCM or
something to that extent, then Zinc would add that as required. But
we're not going to start adding random ciphers unless they're needed.

> You yourself were
> bitten by the clunkiness of the crypto API when attempting to use the
> SHA26 code, right? So shouldn't we move that into this crypto library
> as well?

As stated in the initial commit, and in numerous other emails
stretching back a year, yes, sha256 and other things in lib/ are going
to be put into Zinc following the initial merge of Zinc. These changes
will happen incrementally, like everything else that happens in the
kernel. Sha256, in particular, is probably the first thing I'll port
post-merge.

> I think it is reasonable for WireGuard to standardize on
> ChaCha20/Poly1305 only, although I have my concerns about the flag day
> that will be required if this 'one true cipher' ever does turn out to
> be compromised (either that, or we will have to go back in time and
> add some kind of protocol versioning to existing deployments of
> WireGuard)

Those concerns are not valid and have already been addressed (to you,
I believe) on this mailing list and elsewhere. WireGuard is versioned,
hence there's no need to "add" versioning, and it is prepared to roll
out new cryptography in a subsequent version should there be any
issues. In other words, your concern is based on a misunderstanding of
the protocol. If you have issues, however, with the design decisions
of WireGuard, something that's been heavily discussed with members of
the linux kernel community, networking community, cryptography
community, and so forth, for the last 3 years, I invite you to bring
them up on <wireguard@lists.zx2c4.com>.

> And frankly, if the code were as good as the prose, we wouldn't be
> having this discussion.

Please cut out this rhetoric. That's an obviously unprovable
statement, but it probably isn't true anyway. I wish you'd stick to
technical concerns only, rather than what appears to be a desire to
derail this by any means necessary.

> Zinc adds its own clunky ways to mix arch and
> generic code, involving GCC -include command line arguments and
> #ifdefs everywhere. My review comments on this were completely ignored
> by Jason.

No, they were not ignored. v2 cleaned up the #ifdefs. v4 has already
cleaned up the makefile stuff and will be even cleaner. Good things
await, don't worry.

Jason
Jason A. Donenfeld Sept. 13, 2018, 2:34 p.m. UTC | #25
Hi Milan,

On Thu, Sep 13, 2018 at 8:40 AM Milan Broz <gmazyland@gmail.com> wrote:
> Please note, that dm-crypt now uses not only block ciphers and modes,
> but also authenticated encryption and hashes (for ESSIV and HMAC
> in authenticated composed modes) and RNG (for random IV).
> We use crypto API, including async variants (I hope correctly :)
>
> There is a long time battle to move initialization vectors generators
> from dm-crypt to crypto API. If there are any plans to use a new library,
> this issue should be discussed as well.
> (Some dm-crypt IV generators are disk encryption specific, some do more
> that just IV so porting is not straightforward etc).
>
> Related problem here is an optimization of chain of sectors encryption -
> if we have new crypto API, it would be nice if can take chain of sectors
> so possible implementation can process this chain in one batch
> (every sector need to be tweaked by differently generated IV - and we
> are back in problem above).
> I think filesystem encryption uses the same pattern.
>
> And btw, we use the same algorithms through AF_ALG in userspace (cryptsetup).
>
> So please, if you mention dm-crypt, note that it is very complex
> crypto API consumer :) And everything is dynamic, configurable through
> dm-crypt options.
>
> That said, I would be more than happy to help in experiments to porting dm-crypt
> to any other crypto library, but if it doesn't not help with problems
> mentioned above, I do not see any compelling reason for the new library for dm-crypt...

dm-crypt is probably a good consumer of the existing crypto API and
won't be impacted by the introduction of Zinc, which is really just
the exposure of a couple low level simple crypto functions, and not a
fancy API like the crypto API which dm-crypt happily uses.

Jason
Ard Biesheuvel Sept. 13, 2018, 3:04 p.m. UTC | #26
On 13 September 2018 at 16:15, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Hi Ard,
>
> On Thu, Sep 13, 2018 at 12:56 AM Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> In this series, you are dumping a huge volume of unannotated,
>> generated asm into the kernel which has been modified [by you] to
>> [among other things?] adhere to the kernel API (without documenting
>> what the changes are exactly). How does that live up to the promise of
>> better, peer reviewed code?
>
> The code still benefits from the review that's gone into OpenSSL. It's
> not modified in ways that would affect the cryptographic operations
> being done. It's modified to be suitable for kernel space.
>

So could we please at least have those changes as a separate patch then?

>> Then there is the performance claim. We know for instance that the
>> OpenSSL ARM NEON code for ChaCha20 is faster on cores that happen to
>> possess a micro-architectural property that ALU instructions are
>> essentially free when they are interleaved with SIMD instructions. But
>> we also know that a) Cortex-A7, which is a relevant target, is not one
>> of those cores, and b) that chip designers are not likely to optimize
>> for that particular usage pattern so relying on it in generic code is
>> unwise in general.
>
> That's interesting. I'll bring this up with AndyP. FWIW, if you think
> you have a real and compelling claim here, I'd be much more likely to
> accept a different ChaCha20 implementation than I would be to accept a
> different Poly1305 implementation. (It's a *lot* harder to screw up
> ChaCha20 than it is to screw up Poly1305.)
>

The question is really whether we want different implementations in
the crypto API and in zinc.

>> I am also concerned about your claim that all software algorithms will
>> be moved into this crypto library.
>
> I'll defer to Andy's response here, which I think is a correct one:
> https://lkml.org/lkml/2018/9/13/27
>
> The short answer is that Zinc is going to be adding the ciphers that
> people want to use for normal reasons from normal code. For example,
> after this merges, we'll next be working on moving the remaining
> non-optimized C code out of lib/ that's called by places (such as
> SHA2).
>

Excellent.

>> You are not specific about whose
>> responsibility it will be that this is going to happen in a timely
>> fashion.
>
> I thought I laid out the roadmap for this in the commit message. In
> case I wasn't clear: my plan is to tackle lib/ after merging, and I
> plan to do so in a timely manner. It's a pretty common tactic to keep
> layering on tasks, "what about X?", "what about Y?", "I won't agree
> unless Z!" -- when in reality kernel development and refactorings are
> done incrementally. I've been around on this list contributing code
> for long enough that you should have a decent amount of confidence
> that I'm not just going to disappear working on this or something
> insane like that. And neither are the two academic cryptographers CC'd
> on this thread. So, as Andy said, we're going to be porting to Zinc
> the primitives that are useful for the various applications of Zinc.
> This means yes, we'll have SHA2 in there.
>
>> chaining modes
>> What are the APIs
>> going to look like for block ciphers, taking chaining modes into
>> account?
>
> As mentioned in the commit message and numerous times, we're not
> trying to make a win32-like crypto API here or to remake the existing
> Linux crypto API. Rather we're providing libraries of specific
> functions that are useful for various circumstances. For example, if
> AES-GCM is desired at some point, then we'll have a similar API for
> that as we do for ChaPoly -- one that takes buffers and one that takes
> sg. Likewise, hash functions use the familiar init/update/final.
> "Generic" chaining modes aren't really part of the equation or design
> goals.
>
> Again, I realize you've spent a long time working on the existing
> crypto API, and so your questions and concerns are in the line of,
> "how are we going to make Zinc look like the existing crypto API in
> functionality?"

You are completely missing my point. I am not particularly invested in
the crypto API, and I share the concerns about its usability. That is
why I want to make sure that your solution actually results in a net
improvement for everybody, not just for WireGuard, in a maintainable
way.

> But that's not what we're up to here. We have a
> different and complementary design goal. I understand why you're
> squirming, but please recognize we're working on different things.
>
>> I'm sure it is rather simple to port the crypto API implementation of
>> ChaCha20 to use your library. I am more concerned about how your
>> library is going to expand to cover all other software algorithms that
>> we currently use in the kernel.
>
> The subset of algorithms we add will be developed with the same
> methodology as the present ones. There is nothing making this
> particularly difficult or even more difficult for other primitives
> than it was for ChaCha20. It's especially easy, in fact, since we're
> following similar design methodologies as the vast majority of other
> cryptography libraries that have been developed. Namely, we're
> creating simple things called "functions".
>
>> Of course. But please respond to all the concerns,
>> You have not
>> responded to that concern yet.
>
> Sorry, it's certainly not my intention. I've been on vacation with my
> family for the last several weeks, and only returned home
> sleep-deprived last night after 4 days of plane delays. I've now
> rested and will resume working on this full-time and I'll try my best
> to address concerns, and also go back through emails to find things I
> might have missed. (First, though, I'm going to deal with getting back
> the three suitcases the airline lost in transit...)
>
>> > Anyway, it sounds like this whole thing may have ruffled your feathers
>> > a bit. Will you be at Linux Plumbers Conference in November? I'm
>> > planning on attending, and perhaps we could find some time there to
>> > sit down and talk one on one a bit.
>>
>> That would be good, yes. I will be there.
>
> Looking forward to talking to you there, and hopefully we can put to
> rest any lingering concerns.
>
> Jason
Ard Biesheuvel Sept. 13, 2018, 3:07 p.m. UTC | #27
On 13 September 2018 at 16:18, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> On Thu, Sep 13, 2018 at 1:45 AM Andy Lutomirski <luto@kernel.org> wrote:
>> I'm not convinced that there's any real need for *all* crypto
>> algorithms to move into lib/zinc or to move at all.  As I see it,
>> there are two classes of crypto algorithms in the kernel:
>>
>> a) Crypto that is used by code that chooses its algorithm statically
>> and wants synchronous operations.  These include everything in
>> drivers/char/random.c, but also a bunch of various networking things
>> that are hardcoded and basically everything that uses stack buffers.
>> (This means it includes all the code that I broke when I did
>> VMAP_STACK.  Sign.)
>
> Right, exactly. This is what will wind up using Zinc. I'm working on
> an example usage of this for v4 of the patch submission, which you can
> ogle in a preview here if you're curious:
>
> https://git.zx2c4.com/linux-dev/commit/?h=big_key_rewrite
>
> 28 insertions, 206 deletions :-D
>

I must say, that actually looks pretty good.

>> b) Crypto that is used dynamically.  This includes dm-crypt
>> (aes-xts-plain64, aes-cbc-essiv, etc), all the ALG_IF interfaces, a
>> lot of IPSEC stuff, possibly KCM, and probably many more.  These will
>> get comparatively little benefit from being converted to a zinc-like
>> interface.  For some of these cases, it wouldn't make any sense at all
>> to convert them.  Certainly the ones that do async hardware crypto
>> using DMA engines will never look at all like zinc, even under the
>> hood.
>
> Right, this is what the crypto API will continue to be used for.
>
>
>> I think that, as a short-term goal, it makes a lot of sense to have
>> implementations of the crypto that *new* kernel code (like Wireguard)
>> wants to use in style (a) that live in /lib, and it obviously makes
>> sense to consolidate their implementations with the crypto/
>> implementations in a timely manner.  As a medium-term goal, adding
>> more algorithms as needed for things that could use the simpler APIs
>> (Bluetooth, perhaps) would make sense.
>
> Agreed 100%. With regards to "consolidate their implementations" --
> I've actually already done this after your urging yesterday, and so
> that will be a part of v4.
>
>> But I see no reason at all that /lib should ever contain a grab-bag of
>> crypto implementations just for the heck of it.  They should have real
>> in-kernel users IMO.  And this means that there will probably always
>> be some crypto implementations in crypto/ for things like aes-xts.
>
> Right, precisely.
>
> Jason
Andy Lutomirski Sept. 13, 2018, 3:26 p.m. UTC | #28
> On Sep 12, 2018, at 11:39 PM, Milan Broz <gmazyland@gmail.com> wrote:
> 
>> On 13/09/18 01:45, Andy Lutomirski wrote:
>> On Wed, Sep 12, 2018 at 3:56 PM, Ard Biesheuvel
> ... 
>> b) Crypto that is used dynamically.  This includes dm-crypt
>> (aes-xts-plain64, aes-cbc-essiv, etc), all the ALG_IF interfaces, a
>> lot of IPSEC stuff, possibly KCM, and probably many more.  These will
>> get comparatively little benefit from being converted to a zinc-like
>> interface.  For some of these cases, it wouldn't make any sense at all
>> to convert them.  Certainly the ones that do async hardware crypto
>> using DMA engines will never look at all like zinc, even under the
>> hood.
> 
> Please note, that dm-crypt now uses not only block ciphers and modes,
> but also authenticated encryption and hashes (for ESSIV and HMAC
> in authenticated composed modes) and RNG (for random IV).
> We use crypto API, including async variants (I hope correctly :)

Right. And all this is why I don’t think dm-crypt should use zinc, at least not any time soon.
Ard Biesheuvel Sept. 13, 2018, 3:42 p.m. UTC | #29
On 13 September 2018 at 16:32, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> On Thu, Sep 13, 2018 at 7:41 AM Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> But one of the supposed selling points of this crypto library is that
>> it gives engineers who are frightened of crypto in general and the
>> crypto API in particular simple and easy to use crypto primitives
>> rather than having to jump through the crypto API's hoops.
>
> The goal is for engineers who want to specifically use algorithm X
> from within the kernel in a non-dynamic way to be able to then use
> algorithm X with a simple function call. The goal is not to open it up
> to people who have no idea what they're doing; for that a NaCL-like
> library with functions like "crypto_box_open" or something would fit
> the bill; but that's also not what we're trying to do here. Please
> don't confuse the design goals. The rest of your email is therefore a
> bit of a straw man; cut the rhetoric out.
>
>> A crypto library whose only encryption algorithm is a stream cipher
>> does *not* deliver on that promise, since it is only suitable for
>> cases where IVs are guaranteed not to be reused.
>
> False. We also offer XChaCha20Poly1305, which takes a massive nonce,
> suitable for random generation.
>
> If there became a useful case for AES-PMAC-SIV or even AES-GCM or
> something to that extent, then Zinc would add that as required. But
> we're not going to start adding random ciphers unless they're needed.
>

I'd prefer it if all the accelerated software implementations live in
the same place. But I do strongly prefer arch code to live in
arch/$arch, simply because of the maintenance scope for arch
developers and maintainers.

I think AES-GCM is a useful example here. I really like the SIMD token
abstraction a lot, but I would like to understand how this would work
in Zinc if you have
a) a generic implementation
b) perhaps an arch specific scalar implementation
c) a pure NEON implementation
d) an implementation using AES instructions but not the PMULL instructions
e) an implementation that uses AES and PMULL instructions.

On arch/arm64 we support code patching, on other arches it may be
different. We also support udev loading of modules depending on CPU
capabilities, which means only the implementation you are likely to
use will be loaded, and other modules are disregarded.

I am not asking you to implement this now. But I do want to understand
how these use cases fit into Zinc. And note that this has nothing to
do with the crypto API: this could simply be a synchronous
aes_gcm_encrypt() library function that maps to any of the above at
runtime depending on the platform's capabilities.

>> You yourself were
>> bitten by the clunkiness of the crypto API when attempting to use the
>> SHA26 code, right? So shouldn't we move that into this crypto library
>> as well?
>
> As stated in the initial commit, and in numerous other emails
> stretching back a year, yes, sha256 and other things in lib/ are going
> to be put into Zinc following the initial merge of Zinc. These changes
> will happen incrementally, like everything else that happens in the
> kernel. Sha256, in particular, is probably the first thing I'll port
> post-merge.
>

Excellent.

>> I think it is reasonable for WireGuard to standardize on
>> ChaCha20/Poly1305 only, although I have my concerns about the flag day
>> that will be required if this 'one true cipher' ever does turn out to
>> be compromised (either that, or we will have to go back in time and
>> add some kind of protocol versioning to existing deployments of
>> WireGuard)
>
> Those concerns are not valid and have already been addressed (to you,
> I believe) on this mailing list and elsewhere. WireGuard is versioned,
> hence there's no need to "add" versioning, and it is prepared to roll
> out new cryptography in a subsequent version should there be any
> issues. In other words, your concern is based on a misunderstanding of
> the protocol. If you have issues, however, with the design decisions
> of WireGuard, something that's been heavily discussed with members of
> the linux kernel community, networking community, cryptography
> community, and so forth, for the last 3 years, I invite you to bring
> them up on <wireguard@lists.zx2c4.com>.
>

I'd prefer to have the discussion here, if you don't mind.

IIUC clients and servers need to run the same version of the protocol.
So does it require a flag day to switch the world over to the new
version when the old one is deprecated?

>> And frankly, if the code were as good as the prose, we wouldn't be
>> having this discussion.
>
> Please cut out this rhetoric. That's an obviously unprovable
> statement, but it probably isn't true anyway. I wish you'd stick to
> technical concerns only, rather than what appears to be a desire to
> derail this by any means necessary.
>

I apologize if I hit a nerve there, that was not my intention.

I am not an 'entrenched crypto API guy that is out to get you'. I am a
core ARM developer that takes an interest in crypto, shares your
concern about the usability of the crypto API, and tries to ensure
that what we end up is maintainable and usable for everybody.

>> Zinc adds its own clunky ways to mix arch and
>> generic code, involving GCC -include command line arguments and
>> #ifdefs everywhere. My review comments on this were completely ignored
>> by Jason.
>
> No, they were not ignored. v2 cleaned up the #ifdefs. v4 has already
> cleaned up the makefile stuff and will be even cleaner. Good things
> await, don't worry.
>

You know what? If you're up for it, let's not wait until Plumbers, but
instead, let's collaborate off list to get this into shape.
Jason A. Donenfeld Sept. 13, 2018, 3:45 p.m. UTC | #30
On Thu, Sep 13, 2018 at 5:04 PM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> > The code still benefits from the review that's gone into OpenSSL. It's
> > not modified in ways that would affect the cryptographic operations
> > being done. It's modified to be suitable for kernel space.
>
> So could we please at least have those changes as a separate patch then?

I'll experiment with a couple ways of trying to communicate with
precision what's been changed from OpenSSL for the next round of
patches.

> > That's interesting. I'll bring this up with AndyP. FWIW, if you think
> > you have a real and compelling claim here, I'd be much more likely to
> > accept a different ChaCha20 implementation than I would be to accept a
> > different Poly1305 implementation. (It's a *lot* harder to screw up
> > ChaCha20 than it is to screw up Poly1305.)
>
> The question is really whether we want different implementations in
> the crypto API and in zinc.

Per earlier in this discussion, I've already authored patches that
replaces the crypto API's implementations with simple calls to Zinc,
so that code isn't duplicated. These will be in v4 and you can comment
on the approach then.

> You are completely missing my point. I am not particularly invested in
> the crypto API, and I share the concerns about its usability. That is
> why I want to make sure that your solution actually results in a net
> improvement for everybody, not just for WireGuard, in a maintainable
> way.

Right, likewise. I've put quite a bit of effort into separating Zinc
into Zinc and not into something part of WireGuard. The motivation for
doing so is a decent amount of call sites all around the kernel that
I'd like to gradually fix up.
Jason A. Donenfeld Sept. 13, 2018, 3:58 p.m. UTC | #31
On Thu, Sep 13, 2018 at 5:43 PM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> I'd prefer it if all the accelerated software implementations live in
> the same place. But I do strongly prefer arch code to live in
> arch/$arch

Zinc follows the scheme of the raid6 code, as well as of most other
crypto libraries: code is grouped by cipher, making it easy for people
to work with and understand differing implementations. It also allows
us to trivially link these together at compile time rather than at
link time, which makes cipher selection much more efficient. It's
really much more maintainable this way.

> I think AES-GCM is a useful example here. I really like the SIMD token
> abstraction a lot, but I would like to understand how this would work
> in Zinc if you have
> a) a generic implementation
> b) perhaps an arch specific scalar implementation
> c) a pure NEON implementation
> d) an implementation using AES instructions but not the PMULL instructions
> e) an implementation that uses AES and PMULL instructions.

The same way that Zinc currently chooses between the five different
implementations for, say, x86_64 ChaCha20:

- Generic C scalar
- SSSE3
- AVX2
- AVX512F
- AVX512VL

We make a decision based on CPU capabilities, SIMD context, and input
length, and then choose the right function.

> You know what? If you're up for it, let's not wait until Plumbers, but
> instead, let's collaborate off list to get this into shape.

Sure, sounds good.

Jason
Ard Biesheuvel Sept. 14, 2018, 6:15 a.m. UTC | #32
On 13 September 2018 at 17:58, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> On Thu, Sep 13, 2018 at 5:43 PM Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> I'd prefer it if all the accelerated software implementations live in
>> the same place. But I do strongly prefer arch code to live in
>> arch/$arch
>
> Zinc follows the scheme of the raid6 code, as well as of most other
> crypto libraries: code is grouped by cipher, making it easy for people
> to work with and understand differing implementations. It also allows
> us to trivially link these together at compile time rather than at
> link time, which makes cipher selection much more efficient. It's
> really much more maintainable this way.
>
>> I think AES-GCM is a useful example here. I really like the SIMD token
>> abstraction a lot, but I would like to understand how this would work
>> in Zinc if you have
>> a) a generic implementation
>> b) perhaps an arch specific scalar implementation
>> c) a pure NEON implementation
>> d) an implementation using AES instructions but not the PMULL instructions
>> e) an implementation that uses AES and PMULL instructions.
>
> The same way that Zinc currently chooses between the five different
> implementations for, say, x86_64 ChaCha20:
>
> - Generic C scalar
> - SSSE3
> - AVX2
> - AVX512F
> - AVX512VL
>
> We make a decision based on CPU capabilities, SIMD context, and input
> length, and then choose the right function.
>

OK, so given random.c's future dependency on Zinc (for ChaCha20), and
the fact that Zinc is one monolithic piece of code, all versions of
all algorithms will always be statically linked into the kernel
proper. I'm not sure that is acceptable.

>> You know what? If you're up for it, let's not wait until Plumbers, but
>> instead, let's collaborate off list to get this into shape.
>
> Sure, sounds good.
>

BTW you haven't answered my question yet about what happens when the
WireGuard protocol version changes: will we need a flag day and switch
all deployments over at the same time?
Ard Biesheuvel Sept. 14, 2018, 6:21 a.m. UTC | #33
On 12 September 2018 at 20:34, Eric Biggers <ebiggers@kernel.org> wrote:
> On Wed, Sep 12, 2018 at 08:19:21PM +0200, Ard Biesheuvel wrote:
>> On 12 September 2018 at 20:16, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>> > Hi Eric,
>> >
>> > On Wed, Sep 12, 2018 at 12:08 AM Eric Biggers <ebiggers@kernel.org> wrote:
>> >> I'd strongly prefer the assembly to be readable too.  Jason, I'm not sure if
>> >> you've actually read through the asm from the OpenSSL implementations, but the
>> >> generated .S files actually do lose a lot of semantic information that was in
>> >> the original .pl scripts.
>> >
>> > The thing to keep in mind is that the .S was not directly and blindly
>> > generated from the .pl. We started with the output of the .pl, and
>> > then, particularly in the case of x86_64, worked with it a lot, and
>> > now it's something a bit different. We've definitely spent a lot of
>> > time reading that assembly.
>> >
>>
>> Can we please have those changes as a separate patch? Preferably to
>> the .pl file rather than the .S file, so we can easily distinguish the
>> code from upstream from the code that you modified.
>>
>> > I'll see if I can improve the readability with some register name
>> > remapping on ARM. No guarantees, but I'll play a bit and see if I can
>> > make it a bit better.
>> >
>> > Jason
>
> FWIW, yesterday I made a modified version of poly1305-armv4.pl that generates an
> asm file that works in kernel mode.  The changes are actually pretty small, and
> I think we can get them upstream into OpenSSL like they were for sha256-armv4.pl
> and sha512-armv4.pl.  I'll start a thread with Andy Polyakov and you two.
>
> But I don't have time to help with all the many OpenSSL asm files Jason is
> proposing, just maybe poly1305-armv4 and chacha-armv4 for now.
>

Thanks Eric. I reached out to Andy Polyakov off line, and he is happy
to work with us again on this, although he did point out that our
experiences on ARM may not extrapolate to x86_64, given the fact that
the perl sources there also contain parameterization for the calling
convention differences between Windows and SysV.
Jason A. Donenfeld Sept. 14, 2018, 9:53 a.m. UTC | #34
On Fri, Sep 14, 2018 at 8:15 AM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> OK, so given random.c's future dependency on Zinc (for ChaCha20), and
> the fact that Zinc is one monolithic piece of code, all versions of
> all algorithms will always be statically linked into the kernel
> proper. I'm not sure that is acceptable.

v4 already addresses that issue, actually. I'll post it shortly.

> BTW you haven't answered my question yet about what happens when the
> WireGuard protocol version changes: will we need a flag day and switch
> all deployments over at the same time?

No, that won't be necessary, necessarily. Peers are individually
versioned and the protocol is fairly flexible in this regard.
Andy Lutomirski Sept. 17, 2018, 4:09 a.m. UTC | #35
On Tue, Sep 11, 2018 at 4:57 PM David Miller <davem@davemloft.net> wrote:
>
> From: Andrew Lunn <andrew@lunn.ch>
> Date: Wed, 12 Sep 2018 01:30:15 +0200
>
> > Just as an FYI:
> >
> > 1) I don't think anybody in netdev has taken a serious look at the
> > network code yet. There is little point until the controversial part
> > of the code, Zinc, has been sorted out.
> >
> > 2) I personally would be surprised if DaveM took this code without
> > having an Acked-by from the crypto subsystem people. In the same way,
> > i doubt the crypto people would take an Ethernet driver without having
> > DaveM's Acked-by.
>
> Both of Andrew's statements are completely true.
>
> I'm not looking at any of the networking bits until the crypto stuff
> is fully sorted and fully supported and Ack'd by crypto folks.

So, as a process question, whom exactly are we waiting for:

CRYPTO API
M:      Herbert Xu <herbert@gondor.apana.org.au>
M:      "David S. Miller" <davem@davemloft.net>
L:      linux-crypto@vger.kernel.org

Herbert hasn't replied to any of these submissions.  You're the other
maintainer :)

To the extent that you (DaveM) want my ack, here's what I think of the
series so far:

The new APIs to the crypto primitives are good.  For code that wants
to do a specific known crypto operation, they are much, much more
pleasant to use than the existing crypto API.  The code cleanups in
the big keys patch speak for themselves IMO.  I have no problem with
the addition of a brand-new API to the kernel, especially when it's a
nice one like Zinc's, even if that API starts out with only a couple
of users.

Zinc's arrangement of arch code is just fine.  Sure, there are
arguments that putting arch-specific code in arch/ is better, but this
is mostly bikeshedding IMO.

There has been some discussion of the exact best way to handle
simd_relax() and some other minor nitpicks of API details.  This kind
of stuff doesn't need to block the series -- it can always be reworked
down the road if needed.

There are two things I don't like right now, though:

1. Zinc conflates the addition of a new API with the replacement of
some algorithm implementations.  This is problematic.  Look at the
recent benchmarks of ipsec before and after this series.  Apparently
big packets get faster and small packets get slower.  It would be
really nice to bisect the series to narrow down *where* the regression
came from, but, as currently structured, you can't.

The right way to do this is to rearrange the series.  First, the new
Zinc APIs should be added, and they should be backed with the
*existing* crypto code.  (If the code needs to be moved or copied to a
new location, so be it.  The patch will be messy because somehow the
Zinc API is going to have to dispatch to the arch-specific code, and
the way that the crypto API handles it is not exactly friendly to this
type of use.  So be it.)  Then another patch should switch the crypto
API to use the Zinc interface.  That patch, *by itself*, can be
benchmarked.  If it causes a regression for small ipsec packets, then
it can be tracked down relatively easily.  Once this is all done, the
actual crypto implementation can be changed, and that changed can be
reviewed on its own merits.

As a simplistic example, I wrote this code a while back:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=crypto/sha256_bpf&id=e9e12f056f2abed50a30b762db9185799f5864e6

and its two parents.  This series added a more Zinc-like API to
SHA256.  And it did it without replacing the SHA256 implementation.
Doing this for Zinc would be a bit more complication, since the arch
code would need to be invoked too, but it should be doable.

FWIW, Wireguard should not actually depend on the replacement of the
crypto implementation.

2. The new Zinc crypto implementations look like they're brand new.  I
realize that they have some history, some of them are derived from
OpenSSL, etc, but none of this is really apparent in the patches
themselves.  It would be great if the kernel could literally share the
same code as something like OpenSSL, since OpenSSL gets much more
attention than the kernel's crypto.  Failing that, it would be nice if
the patches made it more clear how the code differs from its origin.
At the very least, though, if the replacement of the crypto code were,
as above, a patch that just replaced the crypto code, it would be much
easier to review and benchmark intelligently.

--Andy
David Miller Sept. 17, 2018, 4:45 a.m. UTC | #36
From: Andy Lutomirski <luto@kernel.org>
Date: Sun, 16 Sep 2018 21:09:11 -0700

> CRYPTO API
> M:      Herbert Xu <herbert@gondor.apana.org.au>
> M:      "David S. Miller" <davem@davemloft.net>
> L:      linux-crypto@vger.kernel.org
> 
> Herbert hasn't replied to any of these submissions.  You're the other
> maintainer :)

Herbert is the primary crypto maintainer, I haven't done a serious
review of crypto code in ages.

So yes, Herbert review is what is important here.
Jason A. Donenfeld Sept. 17, 2018, 5:07 a.m. UTC | #37
Hey Andy,

Thanks a lot for your feedback.

On Mon, Sep 17, 2018 at 6:09 AM Andy Lutomirski <luto@kernel.org> wrote:
> 1. Zinc conflates the addition of a new API with the replacement of
> some algorithm implementations.  This is problematic.  Look at the
> recent benchmarks of ipsec before and after this series.  Apparently
> big packets get faster and small packets get slower.  It would be
> really nice to bisect the series to narrow down *where* the regression
> came from, but, as currently structured, you can't.
>
> The right way to do this is to rearrange the series.  First, the new
> Zinc APIs should be added, and they should be backed with the
> *existing* crypto code.  (If the code needs to be moved or copied to a
> new location, so be it.  The patch will be messy because somehow the
> Zinc API is going to have to dispatch to the arch-specific code, and
> the way that the crypto API handles it is not exactly friendly to this
> type of use.  So be it.)  Then another patch should switch the crypto
> API to use the Zinc interface.  That patch, *by itself*, can be
> benchmarked.  If it causes a regression for small ipsec packets, then
> it can be tracked down relatively easily.  Once this is all done, the
> actual crypto implementation can be changed, and that changed can be
> reviewed on its own merits.

That ipsec regression was less related to the implementation and more
related to calling kernel_fpu_begin() unnecessarily, something I've
now fixed. So I'm not sure that's such a good example. However, I can
try to implement Zinc over the existing assembly (Martin's and Ard's),
first, as you've described. This will be a pretty large amount of
work, but if you think it's worth it for the commit history, then I'll
do it.

> 2. The new Zinc crypto implementations look like they're brand new.  I
> realize that they have some history, some of them are derived from
> OpenSSL, etc, but none of this is really apparent in the patches
> themselves.

The whole point of going with these is that they _aren't_ brand new,
yet they are very fast. Eyeballs and fuzzer hours are important, and
AndyP's seems to get the most eyeballs and fuzzer hours, generally.

> it would be nice if
> the patches made it more clear how the code differs from its origin.
> At the very least, though, if the replacement of the crypto code were,
> as above, a patch that just replaced the crypto code, it would be much
> easier to review and benchmark intelligently.

You seem to have replied to the v3 thread, not the v4 thread. I've
already started to include lots of detail about the origins of the
code and [any] important differences in v4, and I'll continue to add
more detail for v5.

On <https://git.zx2c4.com/linux-dev/log/>, have a look at AndyP's x86_64 ones:
- zinc: ChaCha20 x86_64 implementation
- zinc: Poly1305 x86_64 implementation
For the arm/arm64 ones, the changes were even more trivial, so much so
that at Ard's urging, I included a cleaned-up diff inside the commit
message:
- zinc: ChaCha20 ARM and ARM64 implementations
- zinc: Poly1305 ARM and ARM64 implementations

How's that level of detail looking to you?

Thanks again for the review.

Regards,
Jason
Ard Biesheuvel Sept. 17, 2018, 5:26 a.m. UTC | #38
On 17 September 2018 at 06:09, Andy Lutomirski <luto@kernel.org> wrote:
> On Tue, Sep 11, 2018 at 4:57 PM David Miller <davem@davemloft.net> wrote:
>>
>> From: Andrew Lunn <andrew@lunn.ch>
>> Date: Wed, 12 Sep 2018 01:30:15 +0200
>>
>> > Just as an FYI:
>> >
>> > 1) I don't think anybody in netdev has taken a serious look at the
>> > network code yet. There is little point until the controversial part
>> > of the code, Zinc, has been sorted out.
>> >
>> > 2) I personally would be surprised if DaveM took this code without
>> > having an Acked-by from the crypto subsystem people. In the same way,
>> > i doubt the crypto people would take an Ethernet driver without having
>> > DaveM's Acked-by.
>>
>> Both of Andrew's statements are completely true.
>>
>> I'm not looking at any of the networking bits until the crypto stuff
>> is fully sorted and fully supported and Ack'd by crypto folks.
>
> So, as a process question, whom exactly are we waiting for:
>
> CRYPTO API
> M:      Herbert Xu <herbert@gondor.apana.org.au>
> M:      "David S. Miller" <davem@davemloft.net>
> L:      linux-crypto@vger.kernel.org
>
> Herbert hasn't replied to any of these submissions.  You're the other
> maintainer :)
>
> To the extent that you (DaveM) want my ack, here's what I think of the
> series so far:
>
> The new APIs to the crypto primitives are good.  For code that wants
> to do a specific known crypto operation, they are much, much more
> pleasant to use than the existing crypto API.  The code cleanups in
> the big keys patch speak for themselves IMO.  I have no problem with
> the addition of a brand-new API to the kernel, especially when it's a
> nice one like Zinc's, even if that API starts out with only a couple
> of users.
>
> Zinc's arrangement of arch code is just fine.  Sure, there are
> arguments that putting arch-specific code in arch/ is better, but this
> is mostly bikeshedding IMO.
>
> There has been some discussion of the exact best way to handle
> simd_relax() and some other minor nitpicks of API details.  This kind
> of stuff doesn't need to block the series -- it can always be reworked
> down the road if needed.
>
> There are two things I don't like right now, though:
>
> 1. Zinc conflates the addition of a new API with the replacement of
> some algorithm implementations.  This is problematic.  Look at the
> recent benchmarks of ipsec before and after this series.  Apparently
> big packets get faster and small packets get slower.  It would be
> really nice to bisect the series to narrow down *where* the regression
> came from, but, as currently structured, you can't.
>
> The right way to do this is to rearrange the series.  First, the new
> Zinc APIs should be added, and they should be backed with the
> *existing* crypto code.  (If the code needs to be moved or copied to a
> new location, so be it.  The patch will be messy because somehow the
> Zinc API is going to have to dispatch to the arch-specific code, and
> the way that the crypto API handles it is not exactly friendly to this
> type of use.  So be it.)  Then another patch should switch the crypto
> API to use the Zinc interface.  That patch, *by itself*, can be
> benchmarked.  If it causes a regression for small ipsec packets, then
> it can be tracked down relatively easily.  Once this is all done, the
> actual crypto implementation can be changed, and that changed can be
> reviewed on its own merits.
>
> As a simplistic example, I wrote this code a while back:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=crypto/sha256_bpf&id=e9e12f056f2abed50a30b762db9185799f5864e6
>
> and its two parents.  This series added a more Zinc-like API to
> SHA256.  And it did it without replacing the SHA256 implementation.
> Doing this for Zinc would be a bit more complication, since the arch
> code would need to be invoked too, but it should be doable.
>
> FWIW, Wireguard should not actually depend on the replacement of the
> crypto implementation.
>
> 2. The new Zinc crypto implementations look like they're brand new.  I
> realize that they have some history, some of them are derived from
> OpenSSL, etc, but none of this is really apparent in the patches
> themselves.  It would be great if the kernel could literally share the
> same code as something like OpenSSL, since OpenSSL gets much more
> attention than the kernel's crypto.  Failing that, it would be nice if
> the patches made it more clear how the code differs from its origin.
> At the very least, though, if the replacement of the crypto code were,
> as above, a patch that just replaced the crypto code, it would be much
> easier to review and benchmark intelligently.
>

OK, so let me summarize my remaining concerns as well. I may be a bit
more finicky than Andy, though.

First of all, the rate at which new revisions of this series are
appearing increases the review effort unnecessarily, especially given
that the latest version seemed to have some issues that would have
been spotted by a simple build test. I would like to urge Jason to
bear with us and bring this discussion to a close before resubmitting.

As far as I can tell (i.e., as a user not a network dev), WireGuard is
an excellent piece of code, and I would like to see it merged. I also
think there is little disagreement about the quality of the proposed
algorithm implementations and the usefulness of having a set of easy
to use solid crypto primitives in addition to or complementing the
current crypto API.

I do have some concerns over how the code is organized though:

* simd_relax() is currently not called by the crypto routines
themselves. This means that the worst case scheduling latency is
unbounded, which is unacceptable for the -rt kernel. The worst case
scheduling latency should never be proportional to the input size.
(Apologies for not spotting that earlier)

* Using a cute name for the crypto library [that will end up being the
preferred choice for casual use only] may confuse people, given that
we have lots of code in crypto/ already. I'd much prefer using, e.g.,
crypto/lib and crypto/api (regardless of where the arch specific
pieces live)

* I'd prefer the arch specific pieces to live under arch, but I can
live with keeping them in a single place, as long as the arch
maintainers have some kind of jurisdiction over them. I also think
there should be some overlap between the maintainership
responsibilities of the two generic pieces (api and lib).

* (Nit) The GCC command line -include'd .h files contain variable and
function definitions so they are actually .c files.

* (Nit) Referencing CONFIG_xxx macros from -include'd files adds the
implicit assumption that the -include appears after the -include of
kconfig.h.

* Adding /conditional/ -include's (or #include's) increases the size
of the validation space, which is why we generally prefer
unconditional includes (and static inline stubs), and 'if
(IS_ENABLED(CONFIG_xxx))' over #ifdef CONFIG_xxx

* The current organization of the code puts all available (for the
arch) versions of all routines into a single module, which can only be
built in once we update random.c to use Zinc's chacha20 routines. This
bloats the core kernel (which is a huge deal for embedded systems that
have very strict boot requirements). It also makes it impossible to
simply blacklist a module if you, for instance, prefer to use the
[potentially more power efficient] scalar code over the SIMD code when
using a distro kernel.

[To summarize the 4 points above, I'd much rather see a more
conventional organization where different parts are provided by
different modules. I don't think the argument that inlining is needed
for performance is actually valid, given that we have branch
predictors and static keys, and the arch SIMD code is built as
separate object files anyway]

* If we reuse source files from OpenSSL, we should use that actual
source which is the perlasm not the emitted assembler. Also, we should
work with Andy Polyakov (as I have done several times over the past 5+
years) to upstream the changes we apply to the kernel version of the
code. The same applies to code from other sources, btw, but I am not
personally familiar with them.

* If upstreaming the changes is not an option, they should be applied
as a separate patch and not hidden in a 5000 line patch without any
justification or documentation (but Jason is already working on that)
Andy Lutomirski Sept. 17, 2018, 2:51 p.m. UTC | #39
> On Sep 16, 2018, at 10:26 PM, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:

> 
> As far as I can tell (i.e., as a user not a network dev), WireGuard is
> an excellent piece of code, and I would like to see it merged. I also
> think there is little disagreement about the quality of the proposed
> algorithm implementations and the usefulness of having a set of easy
> to use solid crypto primitives in addition to or complementing the
> current crypto API.
> 
> I do have some concerns over how the code is organized though:
> 
> * simd_relax() is currently not called by the crypto routines
> themselves. This means that the worst case scheduling latency is
> unbounded, which is unacceptable for the -rt kernel. The worst case
> scheduling latency should never be proportional to the input size.
> (Apologies for not spotting that earlier)
> 
> * Using a cute name for the crypto library [that will end up being the
> preferred choice for casual use only] may confuse people, given that
> we have lots of code in crypto/ already. I'd much prefer using, e.g.,
> crypto/lib and crypto/api (regardless of where the arch specific
> pieces live)
> 
> * I'd prefer the arch specific pieces to live under arch, but I can
> live with keeping them in a single place, as long as the arch
> maintainers have some kind of jurisdiction over them. I also think
> there should be some overlap between the maintainership
> responsibilities of the two generic pieces (api and lib).
> 
> * (Nit) The GCC command line -include'd .h files contain variable and
> function definitions so they are actually .c files.

Hmm. I would suggest just getting rid of the -include magic entirely.  The resulting ifdef will be more comprehensible.


> * The current organization of the code puts all available (for the
> arch) versions of all routines into a single module, which can only be
> built in once we update random.c to use Zinc's chacha20 routines. This
> bloats the core kernel (which is a huge deal for embedded systems that
> have very strict boot requirements). It also makes it impossible to
> simply blacklist a module if you, for instance, prefer to use the
> [potentially more power efficient] scalar code over the SIMD code when
> using a distro kernel.

I think the module organization needs to change. It needs to be possible to have chacha20 built in but AES or whatever as a module.

> 
> [To summarize the 4 points above, I'd much rather see a more
> conventional organization where different parts are provided by
> different modules. I don't think the argument that inlining is needed
> for performance is actually valid, given that we have branch
> predictors and static keys, and the arch SIMD code is built as
> separate object files anyway]

I might have agreed before Spectre :(. Unfortunately, unless we do some magic, I think the code would look something like:

if (static_branch_likely(have_simd)) arch_chacha20();

...where arch_chacha20 is a *pointer*. And that will generate a retpoline and run very, very slowly.  (I just rewrote some of the x86 entry code to eliminate one retpoline. I got a 5% speedup on some tests according to the kbuild bot.)

So, if we really wanted modules, we’d need a new dynamic patching mechanism.

I would suggest instead adding two boot (or debugfs) options:

simd=off: disables simd_get using a static branch.

crypto_chacha20_nosimd: Does what it sounds like.
Andy Lutomirski Sept. 17, 2018, 2:53 p.m. UTC | #40
> On Sep 16, 2018, at 10:07 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> 
> Hey Andy,
> 
> Thanks a lot for your feedback.
> 
>> On Mon, Sep 17, 2018 at 6:09 AM Andy Lutomirski <luto@kernel.org> wrote:
>> 1. Zinc conflates the addition of a new API with the replacement of
>> some algorithm implementations.  This is problematic.  Look at the
>> recent benchmarks of ipsec before and after this series.  Apparently
>> big packets get faster and small packets get slower.  It would be
>> really nice to bisect the series to narrow down *where* the regression
>> came from, but, as currently structured, you can't.
>> 
>> The right way to do this is to rearrange the series.  First, the new
>> Zinc APIs should be added, and they should be backed with the
>> *existing* crypto code.  (If the code needs to be moved or copied to a
>> new location, so be it.  The patch will be messy because somehow the
>> Zinc API is going to have to dispatch to the arch-specific code, and
>> the way that the crypto API handles it is not exactly friendly to this
>> type of use.  So be it.)  Then another patch should switch the crypto
>> API to use the Zinc interface.  That patch, *by itself*, can be
>> benchmarked.  If it causes a regression for small ipsec packets, then
>> it can be tracked down relatively easily.  Once this is all done, the
>> actual crypto implementation can be changed, and that changed can be
>> reviewed on its own merits.
> 
> That ipsec regression was less related to the implementation and more
> related to calling kernel_fpu_begin() unnecessarily, something I've
> now fixed. So I'm not sure that's such a good example. However, I can
> try to implement Zinc over the existing assembly (Martin's and Ard's),
> first, as you've described. This will be a pretty large amount of
> work, but if you think it's worth it for the commit history, then I'll
> do it.

Ard, what do you think?  I think it would
be nice, but if the authors of that assembly are convinced it should be replaced, then this step is optional IMO.

> 
>> 2. The new Zinc crypto implementations look like they're brand new.  I
>> realize that they have some history, some of them are derived from
>> OpenSSL, etc, but none of this is really apparent in the patches
>> themselves.
> 
> The whole point of going with these is that they _aren't_ brand new,
> yet they are very fast. Eyeballs and fuzzer hours are important, and
> AndyP's seems to get the most eyeballs and fuzzer hours, generally.
> 
>> it would be nice if
>> the patches made it more clear how the code differs from its origin.
>> At the very least, though, if the replacement of the crypto code were,
>> as above, a patch that just replaced the crypto code, it would be much
>> easier to review and benchmark intelligently.
> 
> You seem to have replied to the v3 thread, not the v4 thread. I've
> already started to include lots of detail about the origins of the
> code and [any] important differences in v4, and I'll continue to add
> more detail for v5.

This is indeed better.  Ard’s reply covers this better.
Andy Lutomirski Sept. 17, 2018, 2:55 p.m. UTC | #41
> On Sep 16, 2018, at 9:45 PM, David Miller <davem@davemloft.net> wrote:
> 
> From: Andy Lutomirski <luto@kernel.org>
> Date: Sun, 16 Sep 2018 21:09:11 -0700
> 
>> CRYPTO API
>> M:      Herbert Xu <herbert@gondor.apana.org.au>
>> M:      "David S. Miller" <davem@davemloft.net>
>> L:      linux-crypto@vger.kernel.org
>> 
>> Herbert hasn't replied to any of these submissions.  You're the other
>> maintainer :)
> 
> Herbert is the primary crypto maintainer, I haven't done a serious
> review of crypto code in ages.
> 
> So yes, Herbert review is what is important here.

Would you accept Ard’s (and/or Eric Biggers’, perhaps, as an alternative)?  Or, if you really want Herbert’s review, can you ask him to review it?  (Hi Herbert!)
Jason A. Donenfeld Sept. 17, 2018, 2:59 p.m. UTC | #42
On Mon, Sep 17, 2018 at 4:54 PM Andy Lutomirski <luto@amacapital.net> wrote:
> Ard, what do you think?  I think it would
> be nice, but if the authors of that assembly are convinced it should be replaced, then this step is optional IMO.

I just spent several hours reworking everything for ChaCha. That was
the easy one. Poly1305 will be much harder. Work in progress, I guess,
but erf it's rough.
Jason A. Donenfeld Sept. 17, 2018, 2:59 p.m. UTC | #43
On Mon, Sep 17, 2018 at 4:56 PM Andy Lutomirski <luto@amacapital.net> wrote:
> Would you accept Ard’s (and/or Eric Biggers’, perhaps, as an alternative)?  Or, if you really want Herbert’s review, can you ask him to review it?  (Hi Herbert!)

Preferably Eric's.
Jason A. Donenfeld Sept. 17, 2018, 3:28 p.m. UTC | #44
On Mon, Sep 17, 2018 at 4:52 PM Andy Lutomirski <luto@amacapital.net> wrote:
> > * (Nit) The GCC command line -include'd .h files contain variable and
> > function definitions so they are actually .c files.
>
> Hmm. I would suggest just getting rid of the -include magic entirely.  The resulting ifdef will be more comprehensible.

I really don't think so, actually. The way the -include stuff works
now is that the glue code is inlined in the same place that the
assembly object file is added to the build object list, so it gels
together cleanly, as the thing is defined and set in one single place.
I could go back to the ifdefs - and even make them as clean as
possible - but I think that puts more things in more places and is
therefore more confusing. The -include system now works extremely
well.
Jason A. Donenfeld Sept. 17, 2018, 3:31 p.m. UTC | #45
On Mon, Sep 17, 2018 at 4:52 PM Andy Lutomirski <luto@amacapital.net> wrote:
> I think the module organization needs to change. It needs to be possible to have chacha20 built in but AES or whatever as a module.

Okay, I'll do that for v5.

> I might have agreed before Spectre :(. Unfortunately, unless we do some magic, I think the code would look something like:
>
> if (static_branch_likely(have_simd)) arch_chacha20();
>
> ...where arch_chacha20 is a *pointer*. And that will generate a retpoline and run very, very slowly.  (I just rewrote some of the x86 entry code to eliminate one retpoline. I got a 5% speedup on some tests according to the kbuild bot.)

Actually, the way it works now benefits from the compilers inliner and
the branch predictor. I benchmarked this without any retpoline
slowdowns, and the branch predictor becomes correct pretty much all
the time. We can tinker with this after the initial merge, if you
really want, but avoiding function pointers and instead using ordinary
branches really winds up being quite fast.
Jason A. Donenfeld Sept. 17, 2018, 3:52 p.m. UTC | #46
Hi Ard,

On Mon, Sep 17, 2018 at 7:26 AM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> OK, so let me summarize my remaining concerns as well. I may be a bit
> more finicky than Andy, though.

Yes, and generally hostile to this whole initiative since the
beginning. But I am very grateful for your reviews nonetheless, and
I'll do my best to incorporate as much as is reasonable.

> I would like to urge Jason to
> bear with us and bring this discussion to a close before resubmitting.

What I fear is that either:
- You don't like the Zinc initiative in one way or another, and the
desire to "keep discussing" and adding more things is less out of
their necessity and more out of a desire to stall it indefinitely.
- You're going to bikeshed and bikeshed and waste tons of time until
Zinc copies lots of the same design decisions from the present crypto
API.

I do sincerely hope these are only fears and not what actually is
going on. I'll do my best to take into serious consideration what you
say -- many are indeed extremely helpful --  but I certainly won't be
wholesale accepting all the things you've mentioned.

Nevertheless, I'll make sure v5 has a pretty healthy quantity of
improvements in it before resubmitting.

> * simd_relax() is currently not called by the crypto routines
> themselves. This means that the worst case scheduling latency is
> unbounded, which is unacceptable for the -rt kernel. The worst case
> scheduling latency should never be proportional to the input size.
> (Apologies for not spotting that earlier)

Good catch. I actually did this correct when porting the crypto API to
use Zinc (in those later top commits in v4), but I hadn't in the Zinc
code itself. I'll address this for v5.

> maintainership
> responsibilities

Samuel and I intend to maintain Zinc in lib/zinc/ and send future
updates to it through Greg's tree, as mentioned in the 00/ cover
letter. The maintainership of the existing crypto API won't change.

> * The current organization of the code puts all available (for the
> arch) versions of all routines into a single module, which can only be
> built in once we update random.c to use Zinc's chacha20 routines. This
> bloats the core kernel (which is a huge deal for embedded systems that
> have very strict boot requirements).

I'll split each Zinc primitive into a separate module for v5, per your
and Andy's desire. And the SIMD code is already toggle-able via a
Kconfig menu option.

> we should
> work with Andy Polyakov (as I have done several times over the past 5+
> years) to upstream the changes we apply to the kernel version of the
> code.

Indeed this is the intent.

> The same applies to code from other sources, btw, but I am not
> personally familiar with them.

Good news on this front:
- Rene wrote the MIPS code specifically for WireGuard, so that _is_ upstream.
- Samuel wrote the BLAKE2 assembly, and he's the co-maintainer of Zinc with me.
- I talk to Dan and Peter a decent amount about qhasm.
- I'm in very close contact with the team behind HACL*, and they're
treating Zinc as a target -- stylistically and with regards to kernel
requirements -- which means they're looking at what's happening in
this patchset and adjusting accordingly.


> * If upstreaming the changes is not an option, they should be applied
> as a separate patch and not hidden in a 5000 line patch without any
> justification or documentation (but Jason is already working on that)

Indeed this is already underway.

Thanks again for your review.

Jason
Andy Lutomirski Sept. 17, 2018, 4:06 p.m. UTC | #47
> On Sep 17, 2018, at 8:28 AM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> On Mon, Sep 17, 2018 at 4:52 PM Andy Lutomirski <luto@amacapital.net> wrote:
>>> * (Nit) The GCC command line -include'd .h files contain variable and
>>> function definitions so they are actually .c files.
>>
>> Hmm. I would suggest just getting rid of the -include magic entirely.  The resulting ifdef will be more comprehensible.
>
> I really don't think so, actually. The way the -include stuff works
> now is that the glue code is inlined in the same place that the
> assembly object file is added to the build object list, so it gels
> together cleanly, as the thing is defined and set in one single place.
> I could go back to the ifdefs - and even make them as clean as
> possible - but I think that puts more things in more places and is
> therefore more confusing. The -include system now works extremely
> well.

Is it really better than:

#ifdef CONFIG_X86_64
#include "whatever"
#endif

It seems a more obfuscated than needed to put the equivalent of that
into the Makefile, and I don't think people really like searching
through the Makefile to figure out why the code does what it does.
Andy Lutomirski Sept. 17, 2018, 4:07 p.m. UTC | #48
On Mon, Sep 17, 2018 at 8:32 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> On Mon, Sep 17, 2018 at 4:52 PM Andy Lutomirski <luto@amacapital.net> wrote:
> > I think the module organization needs to change. It needs to be possible to have chacha20 built in but AES or whatever as a module.
>
> Okay, I'll do that for v5.
>
> > I might have agreed before Spectre :(. Unfortunately, unless we do some magic, I think the code would look something like:
> >
> > if (static_branch_likely(have_simd)) arch_chacha20();
> >
> > ...where arch_chacha20 is a *pointer*. And that will generate a retpoline and run very, very slowly.  (I just rewrote some of the x86 entry code to eliminate one retpoline. I got a 5% speedup on some tests according to the kbuild bot.)
>
> Actually, the way it works now benefits from the compilers inliner and
> the branch predictor. I benchmarked this without any retpoline
> slowdowns, and the branch predictor becomes correct pretty much all
> the time. We can tinker with this after the initial merge, if you
> really want, but avoiding function pointers and instead using ordinary
> branches really winds up being quite fast.

Indeed.  What I'm saying is that you shouldn't refactor it this way
because it will be slow.  I agree it would be conceptually nice to be
able to blacklist a chacha20_x86_64 module to disable the asm, but I
think it would be very hard to get good performance.

--Andy
Jason A. Donenfeld Sept. 17, 2018, 4:16 p.m. UTC | #49
On Mon, Sep 17, 2018 at 6:14 PM Andy Lutomirski <luto@amacapital.net> wrote:
> Indeed.  What I'm saying is that you shouldn't refactor it this way
> because it will be slow.  I agree it would be conceptually nice to be
> able to blacklist a chacha20_x86_64 module to disable the asm, but I
> think it would be very hard to get good performance.

I hadn't understood your nosimd=1 command line suggestion the first
time through, but now I see what you were after. This would be really
easy to add. And I can do it for v5 if you want. But I'm kind of loath
to add too much stuff to the initial patchset. Do you think this is an
important feature to have for it? Or should I leave it for later?
Jason A. Donenfeld Sept. 17, 2018, 4:17 p.m. UTC | #50
On Mon, Sep 17, 2018 at 6:06 PM Andy Lutomirski <luto@amacapital.net> wrote:
> #ifdef CONFIG_X86_64
> #include "whatever"
> #endif
>
> It seems a more obfuscated than needed to put the equivalent of that
> into the Makefile, and I don't think people really like searching
> through the Makefile to figure out why the code does what it does.

Okay, I'll change it back to ifdefs for v5.
Andy Lutomirski Sept. 17, 2018, 4:18 p.m. UTC | #51
> On Sep 17, 2018, at 9:16 AM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> 
>> On Mon, Sep 17, 2018 at 6:14 PM Andy Lutomirski <luto@amacapital.net> wrote:
>> Indeed.  What I'm saying is that you shouldn't refactor it this way
>> because it will be slow.  I agree it would be conceptually nice to be
>> able to blacklist a chacha20_x86_64 module to disable the asm, but I
>> think it would be very hard to get good performance.
> 
> I hadn't understood your nosimd=1 command line suggestion the first
> time through, but now I see what you were after. This would be really
> easy to add. And I can do it for v5 if you want. But I'm kind of loath
> to add too much stuff to the initial patchset. Do you think this is an
> important feature to have for it? Or should I leave it for later?

I think it’s fine for later. It’s potentially useful for benchmarking and debugging.
Jason A. Donenfeld Sept. 17, 2018, 4:24 p.m. UTC | #52
On Mon, Sep 17, 2018 at 4:54 PM Andy Lutomirski <luto@amacapital.net> wrote:
> be nice, but if the authors of that assembly are convinced it should be replaced, then this step is optional IMO.

I think this actually makes the patchset and maintenance of it a lot
more confusing, so I'm going to abort doing this. I'd rather make the
convincing argument for the assembly anyway.
Jason A. Donenfeld Sept. 18, 2018, 12:56 a.m. UTC | #53
Hey Andy,

On Mon, Sep 17, 2018 at 6:18 PM Andy Lutomirski <luto@amacapital.net> wrote:
> I think it’s fine for later. It’s potentially useful for benchmarking and debugging.

I went ahead and added it anyway in the end. It was really quite easy
to do, and it sets a good template for future primitives that are
added to Zinc.

Meanwhile, I've fully triaged the regression Martin found, and in the
same test he performed, the Zinc code now runs faster than the old
code (which should be no surprise).

I've also finished getting rid of -include, per your suggestion, and
modularizing each algorithm to be loadable independently, per your
suggestion.

All and all, v5 is turning out really well, and I'll probably submit
it sometime soon. My tree was just added to the kbuild test bot, so
I'll hopefully get an email about any breakage *before* I submit this
time.

Regards,
Jason
Herbert Xu Sept. 18, 2018, 4:21 a.m. UTC | #54
Hi Jason:

Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> 
> On Mon, Sep 17, 2018 at 7:26 AM Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> OK, so let me summarize my remaining concerns as well. I may be a bit
>> more finicky than Andy, though.
> 
> Yes, and generally hostile to this whole initiative since the
> beginning. But I am very grateful for your reviews nonetheless, and
> I'll do my best to incorporate as much as is reasonable.
 
>> I would like to urge Jason to
>> bear with us and bring this discussion to a close before resubmitting.
> 
> What I fear is that either:
> - You don't like the Zinc initiative in one way or another, and the
> desire to "keep discussing" and adding more things is less out of
> their necessity and more out of a desire to stall it indefinitely.
> - You're going to bikeshed and bikeshed and waste tons of time until
> Zinc copies lots of the same design decisions from the present crypto
> API.

That may be your view but from what I've read Ard has been very
constructive in pointing out issues in your submission.  If your
response to criticism is to dismiss them as hostile then I'm afraid
that we will not be able to progress on this patch series.

Please keep in mind that this is a large project that has to support
multiple users on one hand (not just WireGuard) and complex hardware
acceleration drivers on the other.  Ard has been one of the most
prolific contributors to the crypto code and his review should be
taken seriously.

Thanks,
Jason A. Donenfeld Sept. 18, 2018, 4:26 a.m. UTC | #55
Hi Herbert,

On Tue, Sep 18, 2018 at 6:21 AM Herbert Xu <herbert@gondor.apana.org.au> wrote:
> That may be your view but from what I've read Ard has been very
> constructive in pointing out issues in your submission.
> Please keep in mind that this is a large project that has to support
> multiple users on one hand (not just WireGuard) and complex hardware
> acceleration drivers on the other.  Ard has been one of the most
> prolific contributors to the crypto code and his review should be
> taken seriously.

I'm well aware, which is why I also wrote:

"I do sincerely hope these are only fears and not what actually is
going on. I'll do my best to take into serious consideration what you
say -- many are indeed extremely helpful"

Worry not, I've been working around the clock to implement suggestions
from Ard and from others.

I'll be submitting v5 fairly soon, which will probably be a good spot
for you to review as well, if you're interested. I'm just going to
give kbuild testbot a few more hours on it first.

Regards,
Jason
Ard Biesheuvel Sept. 18, 2018, 4:06 p.m. UTC | #56
On 17 September 2018 at 07:53, Andy Lutomirski <luto@amacapital.net> wrote:
>
>
>
>> On Sep 16, 2018, at 10:07 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>>
>> Hey Andy,
>>
>> Thanks a lot for your feedback.
>>
>>> On Mon, Sep 17, 2018 at 6:09 AM Andy Lutomirski <luto@kernel.org> wrote:
>>> 1. Zinc conflates the addition of a new API with the replacement of
>>> some algorithm implementations.  This is problematic.  Look at the
>>> recent benchmarks of ipsec before and after this series.  Apparently
>>> big packets get faster and small packets get slower.  It would be
>>> really nice to bisect the series to narrow down *where* the regression
>>> came from, but, as currently structured, you can't.
>>>
>>> The right way to do this is to rearrange the series.  First, the new
>>> Zinc APIs should be added, and they should be backed with the
>>> *existing* crypto code.  (If the code needs to be moved or copied to a
>>> new location, so be it.  The patch will be messy because somehow the
>>> Zinc API is going to have to dispatch to the arch-specific code, and
>>> the way that the crypto API handles it is not exactly friendly to this
>>> type of use.  So be it.)  Then another patch should switch the crypto
>>> API to use the Zinc interface.  That patch, *by itself*, can be
>>> benchmarked.  If it causes a regression for small ipsec packets, then
>>> it can be tracked down relatively easily.  Once this is all done, the
>>> actual crypto implementation can be changed, and that changed can be
>>> reviewed on its own merits.
>>
>> That ipsec regression was less related to the implementation and more
>> related to calling kernel_fpu_begin() unnecessarily, something I've
>> now fixed. So I'm not sure that's such a good example. However, I can
>> try to implement Zinc over the existing assembly (Martin's and Ard's),
>> first, as you've described. This will be a pretty large amount of
>> work, but if you think it's worth it for the commit history, then I'll
>> do it.
>
> Ard, what do you think?  I think it would
> be nice, but if the authors of that assembly are convinced it should be replaced, then this step is optional IMO.
>

I don't think there is any problem with switching to faster code
immediately, as long as we have data that supports the claim  that it
is actually faster on hardware people care about. The arm64 ChaCha20
code in the kernel is slower than the OpenSSL code as far as I know,
so I have no problems whatsoever with dropping it. The ARM version,
however, is slower on Cortex-A7 (according to Eric's benchmarks),
which is the only 32-bit ARM core anybody cares about these days.

>>
>>> 2. The new Zinc crypto implementations look like they're brand new.  I
>>> realize that they have some history, some of them are derived from
>>> OpenSSL, etc, but none of this is really apparent in the patches
>>> themselves.
>>
>> The whole point of going with these is that they _aren't_ brand new,
>> yet they are very fast. Eyeballs and fuzzer hours are important, and
>> AndyP's seems to get the most eyeballs and fuzzer hours, generally.
>>
>>> it would be nice if
>>> the patches made it more clear how the code differs from its origin.
>>> At the very least, though, if the replacement of the crypto code were,
>>> as above, a patch that just replaced the crypto code, it would be much
>>> easier to review and benchmark intelligently.
>>
>> You seem to have replied to the v3 thread, not the v4 thread. I've
>> already started to include lots of detail about the origins of the
>> code and [any] important differences in v4, and I'll continue to add
>> more detail for v5.
>
> This is indeed better.  Ard’s reply covers this better.
Jason A. Donenfeld Sept. 18, 2018, 4:45 p.m. UTC | #57
Hi Ard,

On Tue, Sep 18, 2018 at 6:06 PM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> as long as we have data that supports the claim  that it
> is actually faster on hardware people care about.

Seems reasonable. I'll next be turning my attention back to ARM
performance. I'll try to gather some numbers. Expect data at some
point next week.

Regards,
Jason
Ard Biesheuvel Sept. 18, 2018, 6:53 p.m. UTC | #58
On 17 September 2018 at 08:52, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Hi Ard,
>
> On Mon, Sep 17, 2018 at 7:26 AM Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> OK, so let me summarize my remaining concerns as well. I may be a bit
>> more finicky than Andy, though.
>
> Yes, and generally hostile to this whole initiative since the
> beginning. But I am very grateful for your reviews nonetheless, and
> I'll do my best to incorporate as much as is reasonable.
>
>> I would like to urge Jason to
>> bear with us and bring this discussion to a close before resubmitting.
>
> What I fear is that either:
> - You don't like the Zinc initiative in one way or another, and the
> desire to "keep discussing" and adding more things is less out of
> their necessity and more out of a desire to stall it indefinitely.
> - You're going to bikeshed and bikeshed and waste tons of time until
> Zinc copies lots of the same design decisions from the present crypto
> API.
>

Given that you show no interest whatsoever in gaining an understanding
of the underlying requirements that we have to deal with in the crypto
API, the only way to get my point across is by repeatedly stating it
in response to your patches. Also, sending out a new series each time
with only half of the review comments addressed doesn't make me a
bikeshedder.

> I do sincerely hope these are only fears and not what actually is
> going on. I'll do my best to take into serious consideration what you
> say -- many are indeed extremely helpful --  but I certainly won't be
> wholesale accepting all the things you've mentioned.
>

I have pointed out to you numerous times (as has Eric) that the
ChaCha20 ARM code you are importing from the OpenSSL project has been
found to be slower on Cortex-A7, which represents the vast majority of
devices expected to be in the field in 1~2 years. Yet, we are at the
fifth revision now where you are replacing the existing code. Also, I
have asked you more than once to split out your changes to the
upstream OpenSSL code into separate patches so we can more easily
track them, but v5 now puts them in the commit log (again) [but in a
corrupted way - the preprocessor directives are filtered out by
git-commit], which means we cannot use git diff/blame etc to look at
them.

Upstreaming code is about taking an interest in other people's use
cases, and about choosing your battles. It is unfortunate that we have
spent all this time talking about a couple of crypto routines, while
the actual meat of your submission is in WireGuard itself, which I'm
sure you much rather talk about.



>> * simd_relax() is currently not called by the crypto routines
>> themselves. This means that the worst case scheduling latency is
>> unbounded, which is unacceptable for the -rt kernel. The worst case
>> scheduling latency should never be proportional to the input size.
>> (Apologies for not spotting that earlier)
>
> Good catch. I actually did this correct when porting the crypto API to
> use Zinc (in those later top commits in v4), but I hadn't in the Zinc
> code itself. I'll address this for v5.
>
>> maintainership
>> responsibilities
>
> Samuel and I intend to maintain Zinc in lib/zinc/ and send future
> updates to it through Greg's tree, as mentioned in the 00/ cover
> letter. The maintainership of the existing crypto API won't change.
>
>> * The current organization of the code puts all available (for the
>> arch) versions of all routines into a single module, which can only be
>> built in once we update random.c to use Zinc's chacha20 routines. This
>> bloats the core kernel (which is a huge deal for embedded systems that
>> have very strict boot requirements).
>
> I'll split each Zinc primitive into a separate module for v5, per your
> and Andy's desire. And the SIMD code is already toggle-able via a
> Kconfig menu option.
>
>> we should
>> work with Andy Polyakov (as I have done several times over the past 5+
>> years) to upstream the changes we apply to the kernel version of the
>> code.
>
> Indeed this is the intent.
>
>> The same applies to code from other sources, btw, but I am not
>> personally familiar with them.
>
> Good news on this front:
> - Rene wrote the MIPS code specifically for WireGuard, so that _is_ upstream.
> - Samuel wrote the BLAKE2 assembly, and he's the co-maintainer of Zinc with me.
> - I talk to Dan and Peter a decent amount about qhasm.
> - I'm in very close contact with the team behind HACL*, and they're
> treating Zinc as a target -- stylistically and with regards to kernel
> requirements -- which means they're looking at what's happening in
> this patchset and adjusting accordingly.
>
>
>> * If upstreaming the changes is not an option, they should be applied
>> as a separate patch and not hidden in a 5000 line patch without any
>> justification or documentation (but Jason is already working on that)
>
> Indeed this is already underway.
>
> Thanks again for your review.
>
> Jason
Jason A. Donenfeld Sept. 18, 2018, 8:36 p.m. UTC | #59
Hi Ard,

On Tue, Sep 18, 2018 at 11:53:11AM -0700, Ard Biesheuvel wrote:
> On 17 September 2018 at 08:52, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> > Hi Ard,
> >
> 
> Given that you show no interest whatsoever in gaining an understanding
> of the underlying requirements that we have to deal with in the crypto
> API, the only way to get my point across is by repeatedly stating it

Sorry if I've come across that way, but I am certainly interested in
gaining such an understanding of said requirements.

> I have pointed out to you numerous times (as has Eric) that the
> ChaCha20 ARM code you are importing from the OpenSSL project has been
> found to be slower on Cortex-A7, which represents the vast majority of
> devices expected to be in the field in 1~2 years.

I mentioned in the other thread that I intend immanently to begin
benching on a variety of ARM boards, and I should have numbers and
results and conclusions somewhat soon for the list.

My initial notion here was that it'd be better to go with AndyP's code
no matter what, and then later work with him on contributing said
improvements, and then port these back to the kernel. However, you and
Andy made a compelling point about code replacement -- that it's okay to
replace all in one go only if there are positive benchmark results. So I
think what I'll do to appease this is -- if the benchmarks are indeed
how Eric suggested -- stick with your faster code, and then follow up
with replacement plans after the merge. (I feel a bit more comfortable
with varying ChaCha code, because implementations tend to be pretty
straight forward and harder to screw up in subtle ways than, say,
poly1305 or curve25519.)

> have asked you more than once to split out your changes to the
> upstream OpenSSL code into separate patches so we can more easily
> track them, but v5 now puts them in the commit log (again) [but in a
> corrupted way - the preprocessor directives are filtered out by
> git-commit], which means we cannot use git diff/blame etc to look at
> them.

Didnt't realize this was so important to you. It's trivial to do, so
I'll do that for AndyP's implementations for the next revision.


> Upstreaming code is about taking an interest in other people's use
> cases, and about choosing your battles. It is unfortunate that we have
> spent all this time talking about a couple of crypto routines, while
> the actual meat of your submission is in WireGuard itself, which I'm
> sure you much rather talk about.

I don't find it unfortunate; getting the crypto right is of the utmost
importance.

Regards,
Jason
Ard Biesheuvel Sept. 19, 2018, 4:55 p.m. UTC | #60
On 18 September 2018 at 13:36, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Hi Ard,
>
> On Tue, Sep 18, 2018 at 11:53:11AM -0700, Ard Biesheuvel wrote:
>> On 17 September 2018 at 08:52, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>> > Hi Ard,
>> >
>>
>> Given that you show no interest whatsoever in gaining an understanding
>> of the underlying requirements that we have to deal with in the crypto
>> API, the only way to get my point across is by repeatedly stating it
>
> Sorry if I've come across that way, but I am certainly interested in
> gaining such an understanding of said requirements.
>

Excellent.

So you are probably aware that there is a big push in the industry
these days towards high-performance accelerators on a coherent fabric,
potentially with device side caches, and this is the main reason that
the crypto API abstractions are the way they are today.

So while standardizing on Chacha20Poly1305 in WireGuard [while still a
policy decision in my view] seems reasonable to me, the decision to
limit WireGuard to synchronous software implementations seems to me
like something we may want to revisit in the future. What is your view
on that? And is the ChaCha20/Poly1305 AEAD construction in WireGuard
identical to the one in RFC 7539, i.e., could an accelerator built for
the IPsec flavor of ChaCha20Poly1305 potentially be reused for
WireGuard?
diff mbox series

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index 2ef884b883c3..d2092e52320d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16160,6 +16160,14 @@  Q:	https://patchwork.linuxtv.org/project/linux-media/list/
 S:	Maintained
 F:	drivers/media/dvb-frontends/zd1301_demod*
 
+ZINC CRYPTOGRAPHY LIBRARY
+M:	Jason A. Donenfeld <Jason@zx2c4.com>
+M:	Samuel Neves <sneves@dei.uc.pt>
+S:	Maintained
+F:	lib/zinc/
+F:	include/zinc/
+L:	linux-crypto@vger.kernel.org
+
 ZPOOL COMPRESSED PAGE STORAGE API
 M:	Dan Streetman <ddstreet@ieee.org>
 L:	linux-mm@kvack.org
diff --git a/lib/Kconfig b/lib/Kconfig
index a3928d4438b5..3e6848269c66 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -485,6 +485,8 @@  config GLOB_SELFTEST
 	  module load) by a small amount, so you're welcome to play with
 	  it, but you probably don't need it.
 
+source "lib/zinc/Kconfig"
+
 #
 # Netlink attribute parsing support is select'ed if needed
 #
diff --git a/lib/Makefile b/lib/Makefile
index ca3f7ebb900d..3f16e35d2c11 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -214,6 +214,8 @@  obj-$(CONFIG_PERCPU_TEST) += percpu_test.o
 
 obj-$(CONFIG_ASN1) += asn1_decoder.o
 
+obj-$(CONFIG_ZINC) += zinc/
+
 obj-$(CONFIG_FONT_SUPPORT) += fonts/
 
 obj-$(CONFIG_PRIME_NUMBERS) += prime_numbers.o
diff --git a/lib/zinc/Kconfig b/lib/zinc/Kconfig
new file mode 100644
index 000000000000..aa4f8d449d6b
--- /dev/null
+++ b/lib/zinc/Kconfig
@@ -0,0 +1,20 @@ 
+config ZINC
+	tristate
+	select CRYPTO_BLKCIPHER
+	select VFP
+	select VFPv3
+	select NEON
+	select KERNEL_MODE_NEON
+
+config ZINC_DEBUG
+	bool "Zinc cryptography library debugging and self-tests"
+	depends on ZINC
+	help
+	  This builds a series of self-tests for the Zinc crypto library, which
+	  help diagnose any cryptographic algorithm implementation issues that
+	  might be at the root cause of potential bugs. It also adds various
+	  debugging traps.
+
+	  Unless you're developing and testing cryptographic routines, or are
+	  especially paranoid about correctness on your hardware, you may say
+	  N here.
diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
new file mode 100644
index 000000000000..dad47573de42
--- /dev/null
+++ b/lib/zinc/Makefile
@@ -0,0 +1,8 @@ 
+ccflags-y := -O3
+ccflags-y += -Wframe-larger-than=8192
+ccflags-y += -D'pr_fmt(fmt)=KBUILD_MODNAME ": " fmt'
+ccflags-$(CONFIG_ZINC_DEBUG) += -DDEBUG
+
+zinc-y += main.o
+
+obj-$(CONFIG_ZINC) := zinc.o
diff --git a/lib/zinc/main.c b/lib/zinc/main.c
new file mode 100644
index 000000000000..ceece33ff5a7
--- /dev/null
+++ b/lib/zinc/main.c
@@ -0,0 +1,31 @@ 
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+
+#ifdef DEBUG
+#define selftest(which) do { \
+	if (!which ## _selftest()) \
+		return -ENOTRECOVERABLE; \
+} while (0)
+#else
+#define selftest(which)
+#endif
+
+static int __init mod_init(void)
+{
+	return 0;
+}
+
+static void __exit mod_exit(void)
+{
+}
+
+module_init(mod_init);
+module_exit(mod_exit);
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("Zinc cryptography library");
+MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");