[RFC,00/20] AES cleanup

Message ID: 20190612124838.2492-1-ard.biesheuvel@linaro.org

Message

Ard Biesheuvel June 12, 2019, 12:48 p.m. UTC
This started out as an attempt to provide synchronous SIMD based GCM
on 32-bit ARM, but along the way, I ended up changing and cleaning up
so many things that it is more of a general AES cleanup now rather than
anything else.

This is posted as an RFC since using the mostly-fixed-time C code as
library code for general use may be something that needs some discussion.

On 32-bit ARM, asynchronous GCM can be provided by the following drivers:

                                              |  cycles per byte on low end Si
  gcm_base(ctr(aes-generic),ghash-generic)    |            65.3
  gcm_base(ctr-aes-neonbs,ghash-ce) [*]       |            27.7
  gcm_base(ctr-aes-ce,ghash-ce) [**]          |             3.7

  [*]  ghash-ce using vmull.p8 instructions
  [**] ghash-ce using optional vmull.p64 instructions

The third and fastest option is only available on 32-bit cores that implement
the v8 Crypto Extensions, which are rare, but the NEON-based runner-up is still
a huge improvement over the generic code, not only in terms of performance,
but also because it is time invariant (generic AES and generic GHASH are both
implemented using lookup tables, which are susceptible to cache timing
attacks).

However, when allocating the gcm(aes) transform in synchronous mode, we end up
with the generic code, because the NEON code has no handling for invocations
that occur from a context where the NEON cannot be used: it defers such
processing to a kthread, which is only permitted for asynchronous ciphers.
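
For reference, such a synchronous transform is requested by masking out
asynchronous implementations, roughly as below (illustrative sketch only,
the wrapper is made up; crypto_alloc_aead() and CRYPTO_ALG_ASYNC are the
existing API):

  #include <crypto/aead.h>

  static struct crypto_aead *alloc_sync_gcm(void)
  {
          /*
           * type = 0, mask = CRYPTO_ALG_ASYNC: only implementations that
           * can run synchronously are eligible, which on 32-bit ARM
           * currently means the table-based generic code.
           */
          return crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC);
  }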

So let's implement this fallback handling by reusing some of the logic that
has already been implemented for arm64. Note that these fallbacks are rarely
called in practice, but the API requires the functionality to be there.
This is implemented in patches 17-20.
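
The shape of that fallback is roughly the following (a sketch modelled on
the existing arm64 glue code, not copied from these patches;
ce_aes_encrypt_one() is a made-up name standing in for the NEON routine):

  #include <crypto/aes.h>
  #include <crypto/internal/simd.h>
  #include <asm/neon.h>

  static void aes_encrypt_one(const struct crypto_aes_ctx *ctx,
                              u8 *dst, const u8 *src)
  {
          if (!crypto_simd_usable()) {
                  /* e.g. called from softirq context while the NEON is in use */
                  aes_encrypt(ctx, dst, src);     /* fixed-time C library code */
                  return;
          }

          kernel_neon_begin();
          ce_aes_encrypt_one(ctx, dst, src);      /* hypothetical NEON helper */
          kernel_neon_end();
  }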

All the patches leading up to that are cleanups for the AES code, to reduce
the dependency on the generic table-based AES code or, in some cases, hardcoded
dependencies on the scalar arm64 asm code, which suffers from the same problem.
The series also removes redundant key expansion routines, and gets rid of the
x86 scalar asm code, which is a maintenance burden and is not actually faster
than the generic code built with a modern compiler.
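
The library interface itself is deliberately small; usage is along these
lines (function names as proposed for include/crypto/aes.h by this series,
the wrapper itself is just an example):

  #include <crypto/aes.h>
  #include <linux/string.h>

  static int encrypt_one_block(const u8 *key, unsigned int key_len,
                               const u8 in[AES_BLOCK_SIZE],
                               u8 out[AES_BLOCK_SIZE])
  {
          struct crypto_aes_ctx ctx;
          int err;

          err = aes_expandkey(&ctx, key, key_len);   /* 128/192/256-bit keys */
          if (err)
                  return err;

          aes_encrypt(&ctx, out, in);                /* one 16-byte block */

          memzero_explicit(&ctx, sizeof(ctx));       /* wipe the key schedule */
          return 0;
  }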

Ard Biesheuvel (20):
  crypto: arm/aes-ce - cosmetic/whitespace cleanup
  crypto: arm/aes - rename local routines to prevent future clashes
  crypto: aes/fixed-time - align key schedule with other implementations
  crypto: aes - create AES library based on the fixed time AES code
  crypto: x86/aes-ni - switch to generic for fallback and key routines
  crypto: x86/aes - drop scalar assembler implementations
  crypto: padlock/aes - switch to library version of key expansion
    routine
  crypto: cesa/aes - switch to library version of key expansion routine
  crypto: safexcel/aes - switch to library version of key expansion
    routine
  crypto: arm64/ghash - switch to AES library
  crypto: arm/aes-neonbs - switch to library version of key expansion
    routine
  crypto: arm64/aes-ccm - switch to AES library
  crypto: arm64/aes-neonbs - switch to library version of key expansion
    routine
  crypto: arm64/aes-ce - switch to library version of key expansion
    routine
  crypto: generic/aes - drop key expansion routine in favor of library
    version
  crypto: arm64/aes-ce-cipher - use AES library as fallback
  crypto: aes - move ctr(aes) non-SIMD fallback to AES library
  crypto: arm/aes-ce - provide a synchronous version of ctr(aes)
  crypto: arm/aes-neonbs - provide a synchronous version of ctr(aes)
  crypto: arm/ghash - provide a synchronous version

 arch/arm/crypto/Kconfig                        |   2 +-
 arch/arm/crypto/aes-ce-glue.c                  | 152 +++++---
 arch/arm/crypto/aes-cipher-glue.c              |   8 +-
 arch/arm/crypto/aes-neonbs-glue.c              |  62 ++-
 arch/arm/crypto/ghash-ce-glue.c                |  78 ++--
 arch/arm64/crypto/Kconfig                      |  10 +-
 arch/arm64/crypto/aes-ce-ccm-glue.c            |  18 +-
 arch/arm64/crypto/aes-ce-glue.c                |   7 +-
 arch/arm64/crypto/aes-cipher-glue.c            |  11 +-
 arch/arm64/crypto/aes-ctr-fallback.h           |  53 ---
 arch/arm64/crypto/aes-glue.c                   |  29 +-
 arch/arm64/crypto/aes-neonbs-glue.c            |  20 +-
 arch/arm64/crypto/ghash-ce-glue.c              |  30 +-
 arch/x86/crypto/Makefile                       |   4 -
 arch/x86/crypto/aes-i586-asm_32.S              | 362 -----------------
 arch/x86/crypto/aes-x86_64-asm_64.S            | 185 ---------
 arch/x86/crypto/aes_glue.c                     |  71 ----
 arch/x86/crypto/aesni-intel_glue.c             |  15 +-
 arch/x86/include/asm/crypto/aes.h              |  12 -
 crypto/Kconfig                                 |  53 +--
 crypto/aes_generic.c                           | 161 +-------
 crypto/aes_ti.c                                | 335 +---------------
 drivers/crypto/Kconfig                         |   6 +-
 drivers/crypto/inside-secure/safexcel_cipher.c |   2 +-
 drivers/crypto/marvell/cipher.c                |   2 +-
 drivers/crypto/padlock-aes.c                   |   2 +-
 include/crypto/aes.h                           |  47 ++-
 lib/crypto/Makefile                            |   3 +
 lib/crypto/aes.c                               | 409 ++++++++++++++++++++
 29 files changed, 754 insertions(+), 1395 deletions(-)
 delete mode 100644 arch/arm64/crypto/aes-ctr-fallback.h
 delete mode 100644 arch/x86/crypto/aes-i586-asm_32.S
 delete mode 100644 arch/x86/crypto/aes-x86_64-asm_64.S
 delete mode 100644 arch/x86/crypto/aes_glue.c
 delete mode 100644 arch/x86/include/asm/crypto/aes.h
 create mode 100644 lib/crypto/aes.c

Comments

Herbert Xu June 12, 2019, 1:58 p.m. UTC | #1
On Wed, Jun 12, 2019 at 02:48:18PM +0200, Ard Biesheuvel wrote:
>
> All the patches leading up to that are cleanups for the AES code, to reduce
> the dependency on the generic table based AES code, or in some cases, hardcoded
> dependencies on the scalar arm64 asm code which suffers from the same problem.
> It also removes redundant key expansion routines, and gets rid of the x86
> scalar asm code, which is a maintenance burden and is not actually faster than
> the generic code built with a modern compiler.

Nice, I like this a lot.

I presume you'll be converting the AES cipher users throughout
the kernel (such as net/ipv4/tcp_fastopen) at some point, right?

Thanks,
Ard Biesheuvel June 12, 2019, 2 p.m. UTC | #2
On Wed, 12 Jun 2019 at 15:58, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Wed, Jun 12, 2019 at 02:48:18PM +0200, Ard Biesheuvel wrote:
> >
> > All the patches leading up to that are cleanups for the AES code, to reduce
> > the dependency on the generic table based AES code, or in some cases, hardcoded
> > dependencies on the scalar arm64 asm code which suffers from the same problem.
> > It also removes redundant key expansion routines, and gets rid of the x86
> > scalar asm code, which is a maintenance burden and is not actually faster than
> > the generic code built with a modern compiler.
>
> Nice, I like this a lot.
>
> I presume you'll be converting the AES cipher users throughout
> the kernel (such as net/ipv4/tcp_fastopen) at some point, right?
>

Yes. I am currently surveying which users need to switch to a proper
mode, and which ones can just use the unoptimized library version
(such as tcp_fastopen).
Ard Biesheuvel June 12, 2019, 5:08 p.m. UTC | #3
On Wed, 12 Jun 2019 at 16:00, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>
> On Wed, 12 Jun 2019 at 15:58, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> >
> > On Wed, Jun 12, 2019 at 02:48:18PM +0200, Ard Biesheuvel wrote:
> > >
> > > All the patches leading up to that are cleanups for the AES code, to reduce
> > > the dependency on the generic table based AES code, or in some cases, hardcoded
> > > dependencies on the scalar arm64 asm code which suffers from the same problem.
> > > It also removes redundant key expansion routines, and gets rid of the x86
> > > scalar asm code, which is a maintenance burden and is not actually faster than
> > > the generic code built with a modern compiler.
> >
> > Nice, I like this a lot.
> >
> > I presume you'll be converting the AES cipher users throughout
> > the kernel (such as net/ipv4/tcp_fastopen) at some point, right?
> >
>
> Yes. I am currently surveying which users need to switch to a proper
> mode, and which ones can just use the unoptimized library version
> (such as tcp_fastopen).

Would anyone mind if I switch the TCP fastopen code to SipHash instead
of AES? I can see in the archives that Dave wasn't a fan at the time,
but for a MAC over ~16 bytes of input, it is actually a much better
choice than what we have now. And calling into the AES cipher twice, as
we do for IPv6 connections, is even worse, since you take the hit of
invoking the cipher twice in cases where it is provided by a SIMD-based
implementation.
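
For illustration, the SipHash variant would be along the lines of the
following (purely a sketch using the existing <linux/siphash.h> interface;
the helper and its exact inputs are made up):

  #include <linux/in6.h>
  #include <linux/siphash.h>

  static u64 fastopen_cookie_mac(const struct in6_addr *addr,
                                 const siphash_key_t *key)
  {
          /* one keyed pass over 16 bytes, no SIMD round trip needed */
          return siphash(addr, sizeof(*addr), key);
  }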