[RFC,v2,00/12] crypto: Adiantum support

Return-Path: <linux-fscrypt-owner@kernel.org>
From: Eric Biggers <ebiggers@kernel.org>
To: linux-crypto@vger.kernel.org
Cc: linux-fscrypt@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org, Herbert Xu <herbert@gondor.apana.org.au>,
    Paul Crowley <paulcrowley@google.com>, Greg Kaiser <gkaiser@google.com>,
    Michael Halcrow <mhalcrow@google.com>, "Jason A . Donenfeld" <Jason@zx2c4.com>,
    Samuel Neves <samuel.c.p.neves@gmail.com>,
    Tomer Ashur <tomer.ashur@esat.kuleuven.be>
Subject: [RFC PATCH v2 00/12] crypto: Adiantum support
Date: Mon, 15 Oct 2018 10:54:12 -0700
Message-Id: <20181015175424.97147-1-ebiggers@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-fscrypt-owner@vger.kernel.org
Precedence: bulk

Eric Biggers Oct. 15, 2018, 5:54 p.m. UTC

Hello,

We've been working to find a way to bring storage encryption to
entry-level Android devices like the inexpensive "Android Go" devices
sold in developing countries, and some smartwatches.  Unfortunately,
often these devices still ship with no encryption, since for cost
reasons they have to use older CPUs like ARM Cortex-A7; and these CPUs
lack the ARMv8 Cryptography Extensions, making AES-XTS much too slow.

We're trying to change this, since we believe encryption is for
everyone, not just those who can afford it.  And while it's unknown how
long CPUs without AES support will be around, there will likely always
be a "low end"; and in any case it's immensely valuable to provide a
software-optimized cipher that doesn't depend on hardware support.
Lack of hardware support should not be an excuse for no encryption.

But after an extensive search (e.g. see [1]) we were unable to find an
existing cipher that simultaneously meets the very strict performance
requirements on ARM processors, is secure (including having sufficient
security parameters as well as sufficient cryptanalysis of any
primitive(s) used), is suitable for practical use in dm-crypt and
fscrypt, *and* avoids any particularly controversial primitive.

Therefore, we (well, Paul Crowley did the real work) designed a new
encryption mode, Adiantum.  In essence, Adiantum makes it secure to use
the ChaCha stream cipher for disk encryption.  Adiantum is specified by
our paper here: https://eprint.iacr.org/2018/720.pdf ("Adiantum:
length-preserving encryption for entry-level processors").  Reference
code and test vectors are here: https://github.com/google/adiantum.
Most of the high-level concepts of Adiantum are not new; similar
existing modes include XCB, HCTR, and HCH.  Adiantum and these modes are
true wide-block modes (tweakable super-pseudorandom permutations), so
they actually provide a stronger notion of security than XTS.

Adiantum is an improved version of our previous algorithm, HPolyC [2].
Like HPolyC, Adiantum uses XChaCha12, two passes of an
ε-almost-∆-universal (εA∆U) hash function, and one AES-256 encryption of
a single 16-byte block.  On ARM Cortex-A7, on 4096-byte messages
Adiantum is about 4x faster than AES-256-XTS (about 5x for decryption),
and about 30% faster than Speck128/256-XTS.
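
To make the shape of the construction concrete, here is a toy Python sketch of
how the pieces named above fit together, as I understand the paper: one hash
pass, one block-cipher call on the last 16 bytes, one stream-cipher pass, and a
second hash pass.  Everything prefixed "toy_" is a hypothetical stand-in built
from hashlib (not XChaCha12, not AES-256, and not an εA∆U hash), so this only
illustrates the data flow and is in no way secure or compatible with the
patchset.

```python
import hashlib

BLOCK = 16                  # size of the right-hand part: one block-cipher block
MASK128 = (1 << 128) - 1    # additions and subtractions are taken mod 2^128


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_hash(key: bytes, tweak: bytes, data: bytes) -> int:
    # Stand-in for the keyed hash of (tweak, left part). NOT an εA∆U hash.
    h = hashlib.sha256(key + len(tweak).to_bytes(4, "little") + tweak + data)
    return int.from_bytes(h.digest()[:BLOCK], "little")


def toy_stream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Stand-in for the XChaCha12 keystream.
    return hashlib.shake_256(key + nonce).digest(length)


def _f(key: bytes, i: int, half: bytes) -> bytes:
    return hashlib.sha256(key + bytes([i]) + half).digest()[:8]


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Stand-in for AES-256 on one 16-byte block: a 4-round Feistel network.
    l, r = block[:8], block[8:]
    for i in range(4):
        l, r = r, xor(l, _f(key, i, r))
    return l + r


def toy_block_decrypt(key: bytes, block: bytes) -> bytes:
    l, r = block[:8], block[8:]
    for i in reversed(range(4)):
        l, r = xor(r, _f(key, i, l)), l
    return l + r


def encrypt(keys, tweak: bytes, pt: bytes) -> bytes:
    k_hash, k_block, k_stream = keys
    p_l, p_r = pt[:-BLOCK], pt[-BLOCK:]
    # Hash pass 1: fold the whole left part into the 16-byte right part.
    p_m = (int.from_bytes(p_r, "little") + toy_hash(k_hash, tweak, p_l)) & MASK128
    # The single block-cipher call.
    c_m = toy_block_encrypt(k_block, p_m.to_bytes(BLOCK, "little"))
    # Stream-cipher pass: the keystream nonce is derived from c_m.
    c_l = xor(p_l, toy_stream(k_stream, c_m, len(p_l)))
    # Hash pass 2: this time over the ciphertext left part.
    c_r = (int.from_bytes(c_m, "little") - toy_hash(k_hash, tweak, c_l)) & MASK128
    return c_l + c_r.to_bytes(BLOCK, "little")


def decrypt(keys, tweak: bytes, ct: bytes) -> bytes:
    k_hash, k_block, k_stream = keys
    c_l, c_r = ct[:-BLOCK], ct[-BLOCK:]
    c_m_int = (int.from_bytes(c_r, "little") + toy_hash(k_hash, tweak, c_l)) & MASK128
    c_m = c_m_int.to_bytes(BLOCK, "little")
    p_l = xor(c_l, toy_stream(k_stream, c_m, len(c_l)))
    p_m = toy_block_decrypt(k_block, c_m)
    p_r = (int.from_bytes(p_m, "little") - toy_hash(k_hash, tweak, p_l)) & MASK128
    return p_l + p_r.to_bytes(BLOCK, "little")
```

With this sketch, encrypting a 4096-byte buffer returns exactly 4096 bytes, and
flipping a single bit anywhere in the plaintext changes the whole ciphertext,
which is the wide-block behaviour that XTS does not provide.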

Adiantum is a construction, not a primitive.  Its security is reducible
to that of XChaCha12 and AES-256, subject to a security bound; the proof
is in Section 5 of our paper.  Therefore, one need not "trust" Adiantum;
one need only trust XChaCha12 and AES-256.  Note that of these two
primitives, AES-256 currently has the lower security margin.

Adiantum is ~20% faster than HPolyC, with no loss of security; in fact,
Adiantum's security bound is slightly better than HPolyC's.  It does
this by choosing a faster εA∆U hash function: it still uses Poly1305's
εA∆U hash function, but now a hash function from the "NH" family of hash
functions is used to "compress" the message by 32x first.  NH is εAU (as
shown in the UMAC paper[3]) but is over twice as fast as Poly1305.  Key
agility is reduced, but that's acceptable for disk encryption.
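
For readers who have not met NH before, the following Python sketch shows the
kind of computation involved and where the 32x figure comes from.  The
parameters (1024-byte message chunks, 16-byte units, 4 passes, 32-bit words) are
my reading of the paper and the patch and should be treated as assumptions; this
is an illustration, not the code in the patchset.

```python
import struct

NH_MESSAGE_BYTES = 1024      # assumed maximum input per NH invocation
NH_MESSAGE_UNIT = 16         # bytes consumed per step (four 32-bit words)
NH_PASSES = 4                # each pass yields one 64-bit sum
NH_KEY_WORDS = NH_MESSAGE_BYTES // 4 + 4 * (NH_PASSES - 1)

M32 = 0xFFFFFFFF
M64 = 0xFFFFFFFFFFFFFFFF


def nh(key_words, message: bytes) -> bytes:
    """Hash up to 1024 bytes down to 32 bytes using only adds and multiplies."""
    assert len(key_words) >= NH_KEY_WORDS
    assert 0 < len(message) <= NH_MESSAGE_BYTES
    assert len(message) % NH_MESSAGE_UNIT == 0
    sums = [0] * NH_PASSES
    for off in range(0, len(message), NH_MESSAGE_UNIT):
        m0, m1, m2, m3 = struct.unpack_from("<4I", message, off)
        base = off // 4
        for p in range(NH_PASSES):
            # Each pass uses the same message words with the key shifted by
            # four words.
            k = key_words[base + 4 * p: base + 4 * p + 4]
            # Word additions wrap mod 2^32; the products accumulate mod 2^64.
            sums[p] = (sums[p]
                       + ((m0 + k[0]) & M32) * ((m2 + k[2]) & M32)
                       + ((m1 + k[1]) & M32) * ((m3 + k[3]) & M32)) & M64
    return struct.pack("<4Q", *sums)


# 1024 bytes in, 32 bytes out: the 32x compression mentioned above.
digest = nh(list(range(NH_KEY_WORDS)), bytes(NH_MESSAGE_BYTES))
ratio = NH_MESSAGE_BYTES // len(digest)
```

The inner loop is just 32-bit adds and 32x32 -> 64-bit multiplies on independent
lanes, which is why it maps so naturally onto SIMD units.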

NH is also very simple, and it's easy to implement in SIMD assembly,
e.g. in ARM NEON.  Now, to get good performance only a SIMD
implementation of NH is required, not Poly1305.  Therefore, Adiantum can
be easier to port to new platforms than HPolyC, despite Adiantum's
slightly increased complexity.  For now this patchset only includes an
ARM32 NEON implementation of NH, but as a proof of concept I've also
written SSE2, AVX2, and ARM64 NEON implementations of NH; see
https://github.com/google/adiantum/tree/master/benchmark/src.

This patchset adds Adiantum to Linux's crypto API, focusing on generic
and ARM32 implementations.  Patches 1-7 add support for XChaCha20 and
XChaCha12.  Patches 8-10 add NHPoly1305 support, needed for Adiantum
hashing.  Patch 11 adds Adiantum support as a skcipher template.

Patch 12 adds Adiantum support to fscrypt ("file-based encryption").
In fscrypt, Adiantum is used for filename encryption as well as
contents encryption; since Adiantum is a SPRP, it fixes the information
leak when filenames share a common prefix.  We also take advantage of
Adiantum's support for long tweaks to include the per-inode nonce
directly in the tweak, which allows providing an option to skip the
per-file key derivation, providing even greater performance benefits.

As before, some of these patches conflict with the new "Zinc" crypto
library.  But I don't know when Zinc will be merged, so for now I've
continued to base this patchset on the current 'cryptodev'.

Again, for more details please read our paper:

    Adiantum: length-preserving encryption for entry-level processors
    (https://eprint.iacr.org/2018/720.pdf)

This patchset can also be found in git at
https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git
branch "adiantum-v2".

References:
  [1] https://www.spinics.net/lists/linux-crypto/msg33000.html
  [2] https://patchwork.kernel.org/cover/10558059/
  [3] https://fastcrypto.org/umac/umac_proc.pdf

Eric Biggers (12):
  crypto: chacha20-generic - add HChaCha20 library function
  crypto: chacha20-generic - add XChaCha20 support
  crypto: chacha20-generic - refactor to allow varying number of rounds
  crypto: chacha - add XChaCha12 support
  crypto: arm/chacha20 - add XChaCha20 support
  crypto: arm/chacha20 - refactor to allow varying number of rounds
  crypto: arm/chacha - add XChaCha12 support
  crypto: poly1305 - add Poly1305 core API
  crypto: nhpoly1305 - add NHPoly1305 support
  crypto: arm/nhpoly1305 - add NEON-accelerated NHPoly1305
  crypto: adiantum - add Adiantum support
  fscrypt: add Adiantum support

 Documentation/filesystems/fscrypt.rst         |  183 +-
 arch/arm/crypto/Kconfig                       |    7 +-
 arch/arm/crypto/Makefile                      |    6 +-
 ...hacha20-neon-core.S => chacha-neon-core.S} |   90 +-
 arch/arm/crypto/chacha-neon-glue.c            |  207 ++
 arch/arm/crypto/chacha20-neon-glue.c          |  127 -
 arch/arm/crypto/nh-neon-core.S                |  116 +
 arch/arm/crypto/nhpoly1305-neon-glue.c        |   78 +
 arch/arm64/crypto/chacha20-neon-glue.c        |   40 +-
 arch/x86/crypto/chacha20_glue.c               |   52 +-
 arch/x86/crypto/poly1305_glue.c               |   20 +-
 crypto/Kconfig                                |   46 +-
 crypto/Makefile                               |    4 +-
 crypto/adiantum.c                             |  648 ++++
 crypto/chacha20_generic.c                     |  137 -
 crypto/chacha20poly1305.c                     |   10 +-
 crypto/chacha_generic.c                       |  217 ++
 crypto/nhpoly1305.c                           |  288 ++
 crypto/poly1305_generic.c                     |  174 +-
 crypto/testmgr.c                              |   30 +
 crypto/testmgr.h                              | 2856 ++++++++++++++++-
 drivers/char/random.c                         |   51 +-
 fs/crypto/crypto.c                            |   35 +-
 fs/crypto/fname.c                             |   22 +-
 fs/crypto/fscrypt_private.h                   |   66 +-
 fs/crypto/keyinfo.c                           |  322 +-
 fs/crypto/policy.c                            |    5 +-
 include/crypto/chacha.h                       |   53 +
 include/crypto/chacha20.h                     |   27 -
 include/crypto/nhpoly1305.h                   |   74 +
 include/crypto/poly1305.h                     |   28 +-
 include/uapi/linux/fs.h                       |    4 +-
 lib/Makefile                                  |    2 +-
 lib/{chacha20.c => chacha.c}                  |   59 +-
 34 files changed, 5389 insertions(+), 695 deletions(-)
 rename arch/arm/crypto/{chacha20-neon-core.S => chacha-neon-core.S} (92%)
 create mode 100644 arch/arm/crypto/chacha-neon-glue.c
 delete mode 100644 arch/arm/crypto/chacha20-neon-glue.c
 create mode 100644 arch/arm/crypto/nh-neon-core.S
 create mode 100644 arch/arm/crypto/nhpoly1305-neon-glue.c
 create mode 100644 crypto/adiantum.c
 delete mode 100644 crypto/chacha20_generic.c
 create mode 100644 crypto/chacha_generic.c
 create mode 100644 crypto/nhpoly1305.c
 create mode 100644 include/crypto/chacha.h
 delete mode 100644 include/crypto/chacha20.h
 create mode 100644 include/crypto/nhpoly1305.h
 rename lib/{chacha20.c => chacha.c} (59%)

Jason A. Donenfeld Oct. 19, 2018, 3:58 p.m. UTC | #1

Hello Eric,

> As before, some of these patches conflict with the new "Zinc" crypto
> library.  But I don't know when Zinc will be merged, so for now I've
> continued to base this patchset on the current 'cryptodev'.

I'd appreciate it if you waited to merge this until you can rebase it
on top of Zinc. In fact, if you already want to build it on top of
Zinc, I'm happy to work with you on that in a shared repo or similar.
We can also hash out the details of that in person in Vancouver in a
few weeks. I think pushing this in before will create undesirable
churn for both of us.

> Therefore, we (well, Paul Crowley did the real work) designed a new
> encryption mode, Adiantum.  In essence, Adiantum makes it secure to use
> the ChaCha stream cipher for disk encryption.  Adiantum is specified by
> our paper here: https://eprint.iacr.org/2018/720.pdf ("Adiantum:
> length-preserving encryption for entry-level processors").  Reference
> code and test vectors are here: https://github.com/google/adiantum.
> Most of the high-level concepts of Adiantum are not new; similar
> existing modes include XCB, HCTR, and HCH.  Adiantum and these modes are
> true wide-block modes (tweakable super-pseudorandom permutations), so
> they actually provide a stronger notion of security than XTS.

Great, I'm very happy to see you've created such a high performance alternative.

Before merging this into the kernel, do you want to wait until you've
received some public review from academia?

Jason

Paul Crowley Oct. 19, 2018, 6:19 p.m. UTC | #2

On Fri, 19 Oct 2018 at 08:58, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Before merging this into the kernel, do you want to wait until you've
> received some public review from academia?

I would prefer not to wait. Unlike a new primitive whose strength can
only be known through attempts at cryptanalysis, Adiantum is a
construction based on
well-understood and trusted primitives; it is secure if the proof
accompanying it is correct. Given that (outside competitions or
standardization efforts) no-one ever issues public statements that
they think algorithms or proofs are good, what I'm expecting from
academia is silence :) The most we could hope for would be getting the
paper accepted at a conference, and we're pursuing that but there's a
good chance that won't happen simply because it's not very novel. It
basically takes existing ideas and applies them using a stream cipher
instead of a block cipher, and a faster hashing mode; it's also a
small update from HPolyC. I've had some private feedback that the
proof seems correct, and that's all I'm expecting to get.

Eric Biggers Oct. 19, 2018, 7:04 p.m. UTC | #3

Hi Jason,

On Fri, Oct 19, 2018 at 05:58:35PM +0200, Jason A. Donenfeld wrote:
> Hello Eric,
> 
> > As before, some of these patches conflict with the new "Zinc" crypto
> > library.  But I don't know when Zinc will be merged, so for now I've
> > continued to base this patchset on the current 'cryptodev'.
> 
> I'd appreciate it if you waited to merge this until you can rebase it
> on top of Zinc. In fact, if you already want to build it on top of
> Zinc, I'm happy to work with you on that in a shared repo or similar.
> We can also hash out the details of that in person in Vancouver in a
> few weeks. I think pushing this in before will create undesirable
> churn for both of us.
> 

I won't be at Plumbers, sorry!  For if/when it's needed, I'll start a version of
this based on Zinc.  The basic requirements are that we need (1) xchacha12 and
xchacha20 available as 'skciphers' in the crypto API, and (2) the poly1305_core
functions (see patch 08/12).  In principle, these can be implemented in Zinc.
The Adiantum template and all the NHPoly1305 stuff will be the same either way.
(Unless you'll want one or both of those moved to Zinc too.  To be honest, even
after your explanations I still don't have a clear idea of what is supposed to
go in Zinc and what isn't...)

However, for now I'm hesitant to completely abandon the current approach and bet
the farm on Zinc.  Zinc has a large scope and various controversies that haven't
yet been fully resolved to everyone's satisfaction, including unclear licenses
on some of the essential assembly files.  It's not appropriate to grind kernel
crypto development to a halt while everyone waits for Zinc.

So if Zinc is ready, then it makes sense for it to go first;
otherwise, it doesn't.  It's not yet clear which is the case.

Thanks,

- Eric

Ard Biesheuvel Oct. 20, 2018, 3:24 a.m. UTC | #4

On 20 October 2018 at 02:19, Paul Crowley <paulcrowley@google.com> wrote:
> On Fri, 19 Oct 2018 at 08:58, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>> Before merging this into the kernel, do you want to wait until you've
>> received some public review from academia?
>
> I would prefer not to wait. Unlike a new primitive whose strength can
> only be known through attempts at cryptanalysis, Adiantum is a
> construction based on
> well-understood and trusted primitives; it is secure if the proof
> accompanying it is correct. Given that (outside competitions or
> standardization efforts) no-one ever issues public statements that
> they think algorithms or proofs are good, what I'm expecting from
> academia is silence :) The most we could hope for would be getting the
> paper accepted at a conference, and we're pursuing that but there's a
> good chance that won't happen simply because it's not very novel. It
> basically takes existing ideas and applies them using a stream cipher
> instead of a block cipher, and a faster hashing mode; it's also a
> small update from HPolyC. I've had some private feedback that the
> proof seems correct, and that's all I'm expecting to get.

Hi Paul, Eric,

The Adiantum paper claims

"On an ARM Cortex-A7 processor, Adiantum decrypts 4096-byte messages
at 11 cycles per byte, five times faster than AES-256-XTS, with a
constant-time implementation."

which is surprising to me. The bit slicing NEON AES core runs at ~14
cycles per byte on a Cortex-A15 (when encrypting), so 55 cycles per
byte on A7 sounds rather high. Is it really that bad?

Also, the paper mentions that the second hash pass and the stream
cipher en/decryption pass could be executed in parallel, while your
implementation performs three distinct passes. Do you have any
estimates on the potential performance gain of implementing that? In
my experience (which is mostly A53 rather than A7 based, mind you),
removing memory accesses can help tremendously to speed up the
execution on low end cores.

Eric Biggers Oct. 20, 2018, 5:22 a.m. UTC | #5

Hi Ard,

On Sat, Oct 20, 2018 at 11:24:05AM +0800, Ard Biesheuvel wrote:
> On 20 October 2018 at 02:19, Paul Crowley <paulcrowley@google.com> wrote:
> > On Fri, 19 Oct 2018 at 08:58, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >> Before merging this into the kernel, do you want to wait until you've
> >> received some public review from academia?
> >
> > I would prefer not to wait. Unlike a new primitive whose strength can
> > only be known through attempts at cryptanalysis, Adiantum is a
> > construction based on
> > well-understood and trusted primitives; it is secure if the proof
> > accompanying it is correct. Given that (outside competitions or
> > standardization efforts) no-one ever issues public statements that
> > they think algorithms or proofs are good, what I'm expecting from
> > academia is silence :) The most we could hope for would be getting the
> > paper accepted at a conference, and we're pursuing that but there's a
> > good chance that won't happen simply because it's not very novel. It
> > basically takes existing ideas and applies them using a stream cipher
> > instead of a block cipher, and a faster hashing mode; it's also a
> > small update from HPolyC. I've had some private feedback that the
> > proof seems correct, and that's all I'm expecting to get.
> 
> Hi Paul, Eric,
> 
> The Adiantum paper claims
> 
> "On an ARM Cortex-A7 processor, Adiantum decrypts 4096-byte messages
> at 11 cycles per byte, five times faster than AES-256-XTS, with a
> constant-time implementation."
> 
> which is surprising to me. The bit slicing NEON AES core runs at ~14
> cycles per byte on a Cortex-A15 (when encrypting), so 55 cycles per
> byte on A7 sounds rather high. Is it really that bad?

Yes, it's really that slow, maybe because the NEON unit on Cortex-A7 isn't very
good.  Our figures are shown in the performance table in section 4.  Note that
the abstract is talking about AES-256-XTS.  AES-128-XTS is ~27% faster.  You can
also reproduce our performance results using our userspace benchmark program
from https://github.com/google/adiantum/tree/master/benchmark.  It uses a copy
of aes-neonbs-core.S from the kernel source tree.

> 
> Also, the paper mentions that the second hash pass and the stream
> cipher en/decryption pass could be executed in parallel, while your
> implementation performs three distinct passes. Do you have any
> estimates on the potential performance gain of implementing that? In
> my experience (which is mostly A53 rather than A7 based, mind you),
> removing memory accesses can help tremendously to speed up the
> execution on low end cores.

As a quick hack, on Cortex-A7 I timed "NH" without loading the message words.
It became about 10% faster.  My NEON-accelerated NH is already only about 1.3
cpb, so that means in theory not having to reload the message words would save
~0.13 cpb...  But Adiantum as a whole is ~11 cpb, so that suggests the
improvement would be only a bit over 1%.
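
Spelled out as a back-of-the-envelope calculation (a sketch using only the
rounded figures quoted in this thread; it assumes "10% faster" means roughly 10%
fewer cycles per byte for NH, and the variable names are just for illustration):

```python
nh_cpb = 1.3           # NEON-accelerated NH, cycles per byte
nh_gain = 0.10         # NH speedup when the message words are not reloaded
adiantum_cpb = 11.0    # Adiantum as a whole on Cortex-A7, 4096-byte messages

saved_cpb = nh_cpb * nh_gain                  # about 0.13 cpb
relative_gain = saved_cpb / adiantum_cpb      # about 0.012, i.e. a bit over 1%

# The "five times faster" claim from the abstract implies this AES-256-XTS
# decryption cost, which is the 55 cpb figure questioned above.
aes256_xts_cpb = 5 * adiantum_cpb

print(f"saved: {saved_cpb:.2f} cpb, overall gain: {relative_gain:.1%}")
```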

Maybe it could actually be better (for example, not having to map the pages
again could save a lot), but in practice considering the increased complexity as
well as that probably there wouldn't actually be enough registers to do
everything efficiently, it seemed it would cause far too much trouble to bother
yet (at least for the Linux kernel implementation; a two-pass implementation
could still be useful elsewhere, of course).

- Eric

Milan Broz Oct. 20, 2018, 10:26 a.m. UTC | #6

On 19/10/2018 21:04, Eric Biggers wrote:
> Hi Jason,
> 
> On Fri, Oct 19, 2018 at 05:58:35PM +0200, Jason A. Donenfeld wrote:
>> Hello Eric,
>>
>>> As before, some of these patches conflict with the new "Zinc" crypto
>>> library.  But I don't know when Zinc will be merged, so for now I've
>>> continued to base this patchset on the current 'cryptodev'.
>>
>> I'd appreciate it if you waited to merge this until you can rebase it
>> on top of Zinc. In fact, if you already want to build it on top of
>> Zinc, I'm happy to work with you on that in a shared repo or similar.
>> We can also hash out the details of that in person in Vancouver in a
>> few weeks. I think pushing this in before will create undesirable
>> churn for both of us.
>>
> 
> I won't be at Plumbers, sorry!  For if/when it's needed, I'll start a version of
> this based on Zinc.  The basic requirements are that we need (1) xchacha12 and
> xchacha20 available as 'skciphers' in the crypto API, and (2) the poly1305_core
> functions (see patch 08/12).  In principle, these can be implemented in Zinc.
> The Adiantum template and all the NHPoly1305 stuff will be the same either way.
> (Unless you'll want one or both of those moved to Zinc too.  To be honest, even
> after your explanations I still don't have a clear idea of what is supposed to
> go in Zinc and what isn't...)
> 
> However, for now I'm hesitant to completely abandon the current approach and bet
> the farm on Zinc.  Zinc has a large scope and various controversies that haven't
> yet been fully resolved to everyone's satisfaction, including unclear licenses
> on some of the essential assembly files.  It's not appropriate to grind kernel
> crypto development to a halt while everyone waits for Zinc.
> 
> So if Zinc is ready, then it makes sense for it to go first;
> otherwise, it doesn't.  It's not yet clear which is the case.

Does it mean that if Adiantum is based on Zinc, it can no longer be used
for FDE (dm-crypt)? IOW only file-based encryption is possible?

Adiantum (as in your current git branches on kernel.org) can be used for dm-crypt
without any changes (yes, I played with it :) and with some easy tricks directly
through cryptsetup/LUKS as well.

I think we should have this as an alternative to length-preserving wide-block
cipher modes for FDE.

Milan

Jason A. Donenfeld Oct. 20, 2018, 1:47 p.m. UTC | #7

Hi Milan,

On Sat, Oct 20, 2018 at 12:53 PM Milan Broz <gmazyland@gmail.com> wrote:
> Does it mean that if Adiantum is based on Zinc, it can no longer be used
> for FDE (dm-crypt)? IOW only file-based encryption is possible?

No, don't worry. All I had in mind was the software implementations of
chacha12 and so forth. There aren't any current plans at this point to
change the scaffolding underlying dm-crypt.

Jason

Eric Biggers Oct. 21, 2018, 10:23 p.m. UTC | #8

On Fri, Oct 19, 2018 at 12:04:11PM -0700, Eric Biggers wrote:
> Hi Jason,
> 
> On Fri, Oct 19, 2018 at 05:58:35PM +0200, Jason A. Donenfeld wrote:
> > Hello Eric,
> > 
> > > As before, some of these patches conflict with the new "Zinc" crypto
> > > library.  But I don't know when Zinc will be merged, so for now I've
> > > continued to base this patchset on the current 'cryptodev'.
> > 
> > I'd appreciate it if you waited to merge this until you can rebase it
> > on top of Zinc. In fact, if you already want to build it on top of
> > Zinc, I'm happy to work with you on that in a shared repo or similar.
> > We can also hash out the details of that in person in Vancouver in a
> > few weeks. I think pushing this in before will create undesirable
> > churn for both of us.
> > 
> 
> I won't be at Plumbers, sorry!  For if/when it's needed, I'll start a version of
> this based on Zinc.  The basic requirements are that we need (1) xchacha12 and
> xchacha20 available as 'skciphers' in the crypto API, and (2) the poly1305_core
> functions (see patch 08/12).  In principle, these can be implemented in Zinc.
> The Adiantum template and all the NHPoly1305 stuff will be the same either way.
> (Unless you'll want one or both of those moved to Zinc too.  To be honest, even
> after your explanations I still don't have a clear idea of what is supposed to
> go in Zinc and what isn't...)
> 
> However, for now I'm hesitant to completely abandon the current approach and bet
> the farm on Zinc.  Zinc has a large scope and various controversies that haven't
> yet been fully resolved to everyone's satisfaction, including unclear licenses
> on some of the essential assembly files.  It's not appropriate to grind kernel
> crypto development to a halt while everyone waits for Zinc.
> 
> So if Zinc is ready, then it makes sense for it to go first;
> otherwise, it doesn't.  It's not yet clear which is the case.
> 

I started a branch based on Zinc:
https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git,
branch "adiantum-zinc".

For Poly1305, for now I decided to just use the existing functions, passing 0
for the 16-byte element that is added at the end.  This causes some unnecessary
overhead, but it's not very much.  It also results in a much larger size of
'struct nhpoly1305_state', but that doesn't matter too much anymore either [1].

For ChaCha, I haven't yet updated all the "Zinc" assembly to support 12 rounds.
So far I've updated my ARM scalar implementation.  I still don't see how you
expect people to maintain the files like chacha20-x86_64.S from which all
comments, register aliases, etc. were removed in comparison to the original
OpenSSL code.  I find it very hard to understand what's going on from what is
nearly an 'objdump' output.  (I'll figure it out eventually, but it will take
some time.)  I don't see how dumping thousands of lines of undocumented,
generated assembly code into the kernel fits with your goals of "Zinc's focus is
on simplicity and clarity" and "inviting collaboration".  Note that the
OpenSSL-derived assembly files still have an unclear license as well.

I'm also still not a fan of the remaining duplication between "zinc" and
"crypto", e.g. we still have both crypto/chacha.h and zinc/chacha.h, and
separate tests for "zinc" and "crypto".  (I haven't yet gotten around to adding
"zinc tests" for XChaCha12, though I did add "crypto tests".  Note that "crypto
tests" are much easier to add, since all algorithms of the same type share a
common test framework -- not the case for Zinc.)

Of course, both myself and others have expressed concerns about these issues
previously too, yet they remain unaddressed, nor is there a documentation file
explaining things.  So please understand that until it's clear that Zinc is
ready, I still have to have Adiantum ready to go without Zinc, just in case.

Thanks,

- Eric

[1] Originally we were going to define Adiantum's hash function to be
    Poly1305(message_length || tweak_length || tweak || NH(message)), which
    would have made it desirable to export the Poly1305 state before NH, so that
    it could be imported for the second hash step to avoid redundantly hashing
    the message length and tweak.  But later we changed it to
    Poly1305(message_length || tweak) + Poly1305(NH(message)).
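
    To illustrate why the split form is convenient, here is a toy Python sketch.
    The names are hypothetical: "toy_poly" is a Poly1305-style polynomial
    evaluation without key clamping or the final pad, "toy_compress" is a
    SHA-256 stand-in for NH, and the use of two separate keys, the length
    encoding, and the reduction mod 2^128 are all assumptions made for the
    example rather than the definitions from the paper.

```python
import hashlib

P1305 = (1 << 130) - 5


def toy_poly(key: int, data: bytes) -> int:
    # Poly1305-style polynomial evaluation over 16-byte chunks. A toy only.
    acc = 0
    for i in range(0, len(data), 16):
        chunk = int.from_bytes(data[i:i + 16] + b"\x01", "little")
        acc = (acc + chunk) * key % P1305
    return acc


def toy_compress(message: bytes) -> bytes:
    # Stand-in for NH(message); SHA-256 is used only to get a short digest.
    return hashlib.sha256(message).digest()


def hash_header(k_header: int, message_len: int, tweak: bytes) -> int:
    # Depends only on the length and the tweak, so it is identical for the
    # plaintext-side and ciphertext-side hash passes.
    return toy_poly(k_header, message_len.to_bytes(16, "little") + tweak)


def hash_with_header(header: int, k_message: int, message: bytes) -> int:
    # header + Poly(NH(message)); the mod 2^128 reduction is an assumption.
    return (header + toy_poly(k_message, toy_compress(message))) % (1 << 128)


k_header, k_message = 0x1234567890ABCDEF, 0xFEDCBA0987654321
tweak = b"example-tweak"
left_plain = b"A" * 4080     # left part of the plaintext
left_cipher = b"B" * 4080    # left part of the ciphertext, same length

header = hash_header(k_header, len(left_plain), tweak)   # computed once
h1 = hash_with_header(header, k_message, left_plain)     # first hash pass
h2 = hash_with_header(header, k_message, left_cipher)    # second hash pass
```

    The header term is computed once and simply added in for each pass, so
    nothing about the message layout has to be parsed and no intermediate
    Poly1305 state has to be exported and re-imported.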

Jason A. Donenfeld Oct. 21, 2018, 10:51 p.m. UTC | #9

Hey Eric,

On Mon, Oct 22, 2018 at 12:23 AM Eric Biggers <ebiggers@kernel.org> wrote:
> I started a branch based on Zinc:

Nice to see. I'm heading to bed in a second, so I'll give this a
thorough read-through tomorrow, but some preliminary notes on your
comments:

> For Poly1305, for now I decided to just use the existing functions, passing 0
> for the 16-byte element that is added at the end.  This causes some unnecessary
> overhead, but it's not very much.  It also results in a much larger size of
> 'struct nhpoly1305_state', but that doesn't matter too much anymore either [1].
> [1] Originally we were going to define Adiantum's hash function to be
>     Poly1305(message_length || tweak_length || tweak || NH(message)), which
>     would have made it desirable to export the Poly1305 state before NH, so that
>     it could be imported for the second hash step to avoid redundantly hashing
>     the message length and tweak.  But later we changed it to
>     Poly1305(message_length || tweak) + Poly1305(NH(message)).

Out of curiosity, why this change?

> For ChaCha, I haven't yet updated all the "Zinc" assembly to support 12 rounds.
> So far I've updated my ARM scalar implementation.  I still don't see how you
> expect people to maintain the files like chacha20-x86_64.S from which all
> comments, register aliases, etc. were removed in comparison to the original
> OpenSSL code.

For at least the ARM[64] and MIPS64 code, I think it will be feasible
to import the .pl eventually. There's an open PR from Andy importing
some of the necessary changes. For the x86_64, that might be a little
trickier, but I can take another stab at it.

> I don't see how dumping thousands of lines of undocumented,
> generated assembly code into the kernel fits with your goals of "Zinc's focus is
> on simplicity and clarity" and "inviting collaboration".

It's not totally "undocumented" and totally "dumped"; that's a bit
hyperbolic. But I can understand it's not as friendly as we'd like.
I'll try to improve that.

> Note that the
> OpenSSL-derived assembly files still have an unclear license as well.

Andy's been pretty clear about the CRYPTOGAMS aspect with me. But, as
you pointed out on lkml and in the private thread, it hasn't yet
migrated over to the CRYPTOGAMS repo. I don't think this is a cause
for immediate concern, because it seems pretty certain it will wind up
there soon enough.

> (I haven't yet gotten around to adding
> "zinc tests" for XChaCha12, though I did add "crypto tests".  Note that "crypto
> tests" are much easier to add, since all algorithms of the same type share a
> common test framework -- not the case for Zinc.)

Actually the advantage of not working with a winding abstraction layer
is that specific tests can test particular aspects of particular
primitives -- for example, by looking at different chunking patterns.
It also enables you to write tests for internal, non-exported
functions.

> nor is there a documentation file
> explaining things.

Sorry, my bad on delaying that one. I'll be sure the Documentation/
stuff is ready before posting another series.

> So please understand that until it's clear that Zinc is
> ready, I still have to have Adiantum ready to go without Zinc, just in case.

Makes sense. I do really appreciate you taking the time, though, to
try this out with Zinc as well. Thanks for that.

Regards,
Jason

Tomer Ashur Oct. 22, 2018, 11:20 a.m. UTC | #10

> On 19-Oct-18 8:19 PM, Paul Crowley wrote:
>> I would prefer not to wait. Unlike a new primitive whose strength can
>> only be known through attempts at cryptanalysis, Adiantum is a
>> construction based on
>> well-understood and trusted primitives; it is secure if the proof
>> accompanying it is correct. Given that (outside competitions or
>> standardization efforts) no-one ever issues public statements that
>> they think algorithms or proofs are good, what I'm expecting from
>> academia is silence :) The most we could hope for would be getting the
>> paper accepted at a conference, and we're pursuing that but there's a
>> good chance that won't happen simply because it's not very novel. It
>> basically takes existing ideas and applies them using a stream cipher
>> instead of a block cipher, and a faster hashing mode; it's also a
>> small update from HPolyC. I've had some private feedback that the
>> proof seems correct, and that's all I'm expecting to get.
>
I tend to agree with Paul on this point. This is a place where academia
needs to improve. An attempt to do so is the Real World Crypto
conference (RWC; https://rwc.iacr.org/2019/), but the deadline for
submissions was October 1st. For HPolyC I asked a few people to take a
look at the construction and the consensus was that it seems secure but
that the proof style makes it hard to verify. I haven't had the time yet
to read the Adiantum paper (and I'm not a provable security person
anyway) but I suppose Paul took the comments he received on this into
account and that's the best we can hope for. Academia simply moves at a
different pace and has different incentives.

 Tomer

Paul Crowley Oct. 22, 2018, 5:17 p.m. UTC | #11

On Sun, 21 Oct 2018 at 15:52, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> > [1] Originally we were going to define Adiantum's hash function to be
> >     Poly1305(message_length || tweak_length || tweak || NH(message)), which
> >     would have made it desirable to export the Poly1305 state before NH, so that
> >     it could be imported for the second hash step to avoid redundantly hashing
> >     the message length and tweak.  But later we changed it to
> >     Poly1305(message_length || tweak) + Poly1305(NH(message)).
>
> Out of curiosity, why this change?

With the old system, Eric ended up implementing a function which took
"message_length || tweak_length || tweak || message" as input and
*parsed out* the lengths in the first 16 bytes to know when to start
applying NH. That struck me as not nice at all, and we worked together
to design something that fitted more naturally into the way that
crypto is done in the kernel.

With this change, the part that can be kept in common between the two
hashing stages is cleanly separated from the part that will be
different, and the Poly1305(NH(message)) construction is a relatively
clean thing by itself to be part of the Linux kernel, though by itself
it is only epsilon-almost-delta-universal over equal-length inputs so
it has to be combined with something else to handle varying-length
inputs. This is not too dissimilar from the caveats around GHASH which
is also part of the kernel.
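
A minimal sketch of the separation described above: the term that depends
only on the message length and tweak is computed once and reused by both
hashing stages, while the message term is recomputed. Everything here is an
assumption made for illustration: toy_hash() is a stand-in built on BLAKE2b
so the example runs, it is neither Poly1305 nor NH, and combining the two
terms by 128-bit addition with wraparound is assumed rather than taken from
the patches.

```python
import hashlib

MASK128 = (1 << 128) - 1


def toy_hash(key: bytes, data: bytes) -> int:
    """Keyed 128-bit stand-in hash (NOT Poly1305 and NOT NH)."""
    digest = hashlib.blake2b(data, key=key, digest_size=16).digest()
    return int.from_bytes(digest, "little")


def header_term(key_t: bytes, message_length: int, tweak: bytes) -> int:
    """Stand-in for Poly1305(message_length || tweak); computed once per request."""
    return toy_hash(key_t, message_length.to_bytes(16, "little") + tweak)


def combined_hash(header: int, key_m: bytes, message: bytes) -> int:
    """Stand-in for header + Poly1305(NH(message)), reduced to 128 bits."""
    return (header + toy_hash(key_m, message)) & MASK128


# Hypothetical keys and inputs, chosen only to exercise the functions.
key_t, key_m = b"T" * 32, b"M" * 32
tweak = bytes(32)
first_input = b"p" * 4080
second_input = b"c" * 4080

# The header term is shared by both hashing stages; only the message term changes.
header = header_term(key_t, len(first_input), tweak)
h_first = combined_hash(header, key_m, first_input)
h_second = combined_hash(header, key_m, second_input)
```

With the earlier single-input definition, the second stage would have had to
re-hash the length and tweak or import a saved hash state to avoid doing so.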

Eric Biggers Nov. 16, 2018, 9:52 p.m. UTC | #12

Hi Milan,

On Sat, Oct 20, 2018 at 12:26:20PM +0200, Milan Broz wrote:
> 
> Adiantum (as in your current git branches on kernel.org) can be used for dm-crypt
> without any changes (yes, I played with it :) and with some easy tricks directly
> through cryptsetup/LUKS as well.
> 
> I think we should have this as an alternative to length-preserving wide-block
> cipher modes for FDE.
> 

Yes, dm-crypt can use Adiantum by specifying the cipher as
"capi:adiantum(xchacha12,aes)-plain64".

But, I'm having trouble getting cryptsetup/LUKS to use Adiantum.
Using LUKS1, the following works:

    cryptsetup luksFormat /dev/$partition --cipher='capi:adiantum(xchacha12,aes)-plain64' --key-size 256

However, when possible we'd like people to use 4K sectors for better
performance, which I understand requires using the LUKS2 format along with
cryptsetup v2.0.0+ and Linux v4.12+.  But the following does *not* work:

    cryptsetup luksFormat /dev/$partition --cipher='capi:adiantum(xchacha12,aes)-plain64' --key-size 256 --type luks2 --sector-size 4096

The problem seems to be that when cryptsetup tries to encrypt the keyslot in
luks2_encrypt_to_storage(), it tries to use the algorithm via AF_ALG, but it
incorrectly requests "plain64(capi:adiantum(xchacha12,aes))" which fails.
It should request just "adiantum(xchacha12,aes)".
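
A minimal sketch of the name mapping described above, assuming a "capi:" spec
has the form "capi:<crypto API name>-<IV generator>[:<IV options>]". The
helper name capi_to_alg_name() is hypothetical and this is not cryptsetup's
code; it only shows that the name to request via AF_ALG is the part between
the prefix and the IV generator suffix.

```python
def capi_to_alg_name(spec: str) -> str:
    """Extract the crypto API algorithm name from a dm-crypt "capi:" spec.

    Hypothetical helper for illustration; not part of cryptsetup.
    """
    prefix = "capi:"
    if not spec.startswith(prefix):
        raise ValueError("not a capi: cipher spec")
    body = spec[len(prefix):]
    # The IV generator follows the last '-', e.g. "-plain64".
    alg, sep, ivmode = body.rpartition("-")
    if not sep or not alg or not ivmode:
        raise ValueError("missing IV generator suffix")
    return alg


print(capi_to_alg_name("capi:adiantum(xchacha12,aes)-plain64"))
```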

What are the "easy tricks" you had in mind -- do you mean there's already a way
to use Adiantum with cryptsetup, or do you mean that cryptsetup still needs to
be updated to fully support algorithms using the crypto API syntax?

Thanks,

- Eric

Milan Broz Nov. 17, 2018, 10:29 a.m. UTC | #13

On 16/11/2018 22:52, Eric Biggers wrote:
> Hi Milan,
> 
> On Sat, Oct 20, 2018 at 12:26:20PM +0200, Milan Broz wrote:
>>
>> Adiantum (as in your current git branches on kernel.org) can be used for dm-crypt
>> without any changes (yes, I played with it :) and with some easy tricks directly
>> through cryptsetup/LUKS as well.
>>
>> I think we should have this as an alternative to length-preserving wide-block
>> cipher modes for FDE.
>>
> 
> Yes, dm-crypt can use Adiantum by specifying the cipher as
> "capi:adiantum(xchacha12,aes)-plain64".
> 
> But, I'm having trouble getting cryptsetup/LUKS to use Adiantum.
> Using LUKS1, the following works:
> 
>     cryptsetup luksFormat /dev/$partition --cipher='capi:adiantum(xchacha12,aes)-plain64' --key-size 256
> 
> However, when possible we'd like people to use 4K sectors for better
> performance, which I understand requires using the LUKS2 format along with
> cryptsetup v2.0.0+ and Linux v4.12+.  But the following does *not* work:
> 
>     cryptsetup luksFormat /dev/$partition --cipher='capi:adiantum(xchacha12,aes)-plain64' --key-size 256 --type luks2 --sector-size 4096

Hi Eric,

actually I planned to test it and then reply to these patches with example cryptsetup
commands, but did not have time for it yet.
So thanks for a reminder ;-)

Recent cryptsetup supports sector-size even for plain devices.

You actually do not need to use the capi: prefix; Adiantum is a composition,
so "xchacha20,aes-adiantum-plain64" works as well (and it should work even for old cryptsetup).
(It is ugly, but it should be compatible.)

# cryptsetup open --type plain -c xchacha20,aes-adiantum-plain64 -s 256 --sector-size 4096 /dev/sdb test
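
The composition syntax above can be sketched as follows, assuming the
old-style spec has the form "cipher-chainmode[-ivmode]" and that the kernel
algorithm name is built as "chainmode(cipher)". The function name is
hypothetical, and the sketch ignores other variants of the old syntax such
as a key count after the cipher.

```python
def old_spec_to_alg_name(spec: str) -> str:
    """Map "cipher-chainmode[-ivmode]" to "chainmode(cipher)".

    Hypothetical helper for illustration; not dm-crypt or cryptsetup code.
    """
    parts = spec.split("-", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("expected cipher-chainmode[-ivmode]")
    cipher, chainmode = parts[0], parts[1]
    return f"{chainmode}({cipher})"


print(old_spec_to_alg_name("xchacha20,aes-adiantum-plain64"))
```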

For LUKS and benchmark, Adiantum needs to use a 32-byte IV. And we have this parameter,
unfortunately, hardcoded...
(I guess there is already a way to get this dynamically from the userspace crypto API now.)

So, I already added a patch to the devel branch for benchmark to support Adiantum a few days ago
https://gitlab.com/cryptsetup/cryptsetup/commit/bce567db461e558af7d735c694a50146db899709

This allows a trivial benchmark (but it just encrypts one big blob of data):

#  cryptsetup benchmark -c xchacha20,aes-adiantum -s 256
# Tests are approximate using memory only (no storage IO).
#            Algorithm |       Key |      Encryption |      Decryption
xchacha20,aes-adiantum        256b       146.6 MiB/s       148.0 MiB/s
...

# ./cryptsetup benchmark -c xchacha12,aes-adiantum -s 256
xchacha12,aes-adiantum        256b       181.7 MiB/s       184.6 MiB/s

For LUKS2, we need a similar change to the crypto API IV size (unfortunately it does not
fall back to the old keyslot handling, so LUKS2 does not currently work).

I quickly added a workaround that falls back to the default keyslot encryption for keyslots
in this case
https://gitlab.com/cryptsetup/cryptsetup/commit/29e87add5aac9d5eb0087881146988d9c4280915

then you can use LUKS2
# cryptsetup luksFormat --type luks2 --sector-size 4096 -c xchacha20,aes-adiantum-plain64 -s 256 /dev/sdb

(Example above will encrypt keyslots with AES-XTS and use Adiantum for data only.)

So, unfortunately yes, we need some small changes in cryptsetup for LUKS;
plain mode should work out of the box (with the syntax above).

Milan

Eric Biggers Nov. 19, 2018, 7:28 p.m. UTC | #14

Hi Milan,

On Sat, Nov 17, 2018 at 11:29:23AM +0100, Milan Broz wrote:
> On 16/11/2018 22:52, Eric Biggers wrote:
> > Hi Milan,
> > 
> > On Sat, Oct 20, 2018 at 12:26:20PM +0200, Milan Broz wrote:
> >>
> >> Adiantum (as in your current git branches on kernel.org) can be used for dm-crypt
> >> without any changes (yes, I played with it :) and with some easy tricks directly
> >> through cryptsetup/LUKS as well.
> >>
> >> I think we should have this as an alternative to length-preserving wide-block
> >> cipher modes for FDE.
> >>
> > 
> > Yes, dm-crypt can use Adiantum by specifying the cipher as
> > "capi:adiantum(xchacha12,aes)-plain64".
> > 
> > But, I'm having trouble getting cryptsetup/LUKS to use Adiantum.
> > Using LUKS1, the following works:
> > 
> >     cryptsetup luksFormat /dev/$partition --cipher='capi:adiantum(xchacha12,aes)-plain64' --key-size 256
> > 
> > However, when possible we'd like people to use 4K sectors for better
> > performance, which I understand requires using the LUKS2 format along with
> > cryptsetup v2.0.0+ and Linux v4.12+.  But the following does *not* work:
> > 
> >     cryptsetup luksFormat /dev/$partition --cipher='capi:adiantum(xchacha12,aes)-plain64' --key-size 256 --type luks2 --sector-size 4096
> 
> Hi Eric,
> 
> actually I planned to test it and then reply to these patches with example cryptsetup
> commands, but did not have time for it yet.
> So thanks for a reminder ;-)
> 
> Recent cryptsetup supports sector-size even for plain device.
> 
> You actually do not need to use capi: prefix, Adiantum is a composition,
> so "xchacha20,aes-adiantum-plain64" works as well (and it should work even for old cryptsetup).
> (It is ugly, but it should be compatible.)

Okay, good to know the "capi:" prefix is not needed.
That makes things slightly easier for us.

> 
> # cryptsetup open --type plain -c xchacha20,aes-adiantum-plain64 -s 256 --sector-size 4096 /dev/sdb test
> 
> For LUKS and benchmark, Adiantum need to use 32 bytes IV. And we have these parameter,
> unfortunately, hardcoded...
> (I guess there is already a way how to get this dynamically from userspace crypto API now.)
> 
> So, I already added patch to devel branch patch for benchmark to support Adiantum few days ago
> https://gitlab.com/cryptsetup/cryptsetup/commit/bce567db461e558af7d735c694a50146db899709
> 
> This allows trivial benchmark (but it just encrypts one big blob of data):
> 
> #  cryptsetup benchmark -c xchacha20,aes-adiantum -s 256
> # Tests are approximate using memory only (no storage IO).
> #            Algorithm |       Key |      Encryption |      Decryption
> xchacha20,aes-adiantum        256b       146.6 MiB/s       148.0 MiB/s
> ...
> 
> # ./cryptsetup benchmark -c xchacha12,aes-adiantum -s 256
> xchacha12,aes-adiantum        256b       181.7 MiB/s       184.6 MiB/s

Note that Adiantum benchmarks on x86 are misleading at the moment, since the
initial kernel patchset doesn't include SSE2 and AVX2 optimized XChaCha and
NHPoly1305.  To start, only C and arm32 NEON implementations are included.
Hence, on x86 Adiantum will appear much slower than it should be.  But I'm
planning to add the x86 and arm64 implementations, so it will get much faster.

> 
> For LUKS2, we need a similar change to cryptoAPI IV size (unfortunately it does not
> fallback to old keyslot handling, so LUKS2 does not work currently now).
> 
> I quickly added a workaround that fallbacks to default keyslot encryption for keyslots
> in this case
> https://gitlab.com/cryptsetup/cryptsetup/commit/29e87add5aac9d5eb0087881146988d9c4280915
> 
> then you can use LUKS2
> # cryptsetup luksFormat --type luks2 --sector-size 4096 -c xchacha20,aes-adiantum-plain64 -s 256 /dev/sdb
> 
> (Example above will encrypt keyslots with AES-XTS and use Adiantum for data only.)
> 
> So, unfortunately yes, we need some small changes in cryptsetup for LUKS;
> plain mode should work out of the box (with the syntax above).

I think that when using AF_ALG, cryptsetup should get the IV size from
/proc/crypto, or else have it hardcoded that "adiantum" uses 32-byte IVs.
(Actually, Adiantum can formally use any size IV, but we had to choose a
fixed size for Linux's crypto API.)
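
A minimal sketch of the /proc/crypto lookup suggested above, assuming the
file consists of blank-line-separated entries of "field : value" lines with
"name" and "ivsize" fields. The function name is hypothetical, and the
sample text, including its driver strings, is illustrative rather than
copied from a real system.

```python
from typing import Optional


def ivsize_from_proc_crypto(text: str, alg_name: str) -> Optional[int]:
    """Return the ivsize of the first entry named alg_name, or None if absent."""
    for entry in text.strip().split("\n\n"):
        fields = {}
        for line in entry.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if fields.get("name") == alg_name and "ivsize" in fields:
            return int(fields["ivsize"])
    return None


# Illustrative sample only; on a real system, read the text from /proc/crypto.
SAMPLE = """\
name         : adiantum(xchacha12,aes)
driver       : adiantum-example-driver
type         : skcipher
ivsize       : 32

name         : xts(aes)
driver       : xts-example-driver
type         : skcipher
ivsize       : 16
"""

print(ivsize_from_proc_crypto(SAMPLE, "adiantum(xchacha12,aes)"))
```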

Getting the IV size via CRYPTO_MSG_GETALG via NETLINK_CRYPTO is also an option,
but that requires the kconfig option CONFIG_CRYPTO_USER which isn't guaranteed
to be enabled even if CONFIG_CRYPTO_USER_API_SKCIPHER is.

Also: why is cryptsetup's default keyslot encryption AES-128-XTS instead of
AES-256-XTS?  People can choose a cipher with a 256-bit key strength such as
AES-256-XTS or Adiantum, so the keyslots should use at least that strength too.

Thanks,

- Eric

Milan Broz Nov. 19, 2018, 8:05 p.m. UTC | #15

Hi,

On 19/11/2018 20:28, Eric Biggers wrote:
> Note that Adiantum benchmarks on x86 are misleading at the moment, since the
> initial kernel patchset doesn't include SSE2 and AVX2 optimized XChaCha and
> NHPoly1305.  To start, only C and arm32 NEON implementations are included.
> Hence, on x86 Adiantum will appear much slower than it should be.  But I'm
> planning to add the x86 and arm64 implementations, so it will get much faster.

The posted benchmark was just an example (it was a 32-bit virtual machine on my
old laptop, so the numbers are misleading).

If Adiantum is going to be merged, I expect it can be used as an alternative
even on x86, so I expect more optimizations.

...
> I think that when using AF_ALG, cryptsetup should get the IV size from
> /proc/crypto, or else have it hardcoded that "adiantum" uses 32-byte IVs.
> (Actually, Adiantum can formally use any size IV, but we had to choose a
> fixed size for Linux's crypto API.)

I do not want to parse /proc/crypto (it needs to load the module first anyway)
and a proper API did not exist yet when I wrote this code (I think we were the first
real user of the userspace crypto API...)

> Getting the IV size via CRYPTO_MSG_GETALG via NETLINK_CRYPTO is also an option,
> but that requires the kconfig option CONFIG_CRYPTO_USER which isn't guaranteed
> to be enabled even if CONFIG_CRYPTO_USER_API_SKCIPHER is.

Yes. For now, I hardcode Adiantum IV size in cryptsetup and later we will try to
find a more generic way.

> Also: why is cryptsetup's default keyslot encryption AES-128-XTS instead of
> AES-256-XTS?  People can choose a cipher with a 256-bit key strength such as
> AES-256-XTS or Adiantum, so the keyslots should use at least that strength too.

It was inherited from the 256-bit default key (so 2x AES-128 in XTS).
It is still the default for LUKS1, but we should perhaps change it to double the
key size for XTS mode (at least for fallback keyslot encryption).
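
The arithmetic behind this can be sketched as follows, assuming the usual
XTS layout in which the supplied key is split into two equal halves, so the
underlying AES variant has half the total key size. The function name is
hypothetical.

```python
def xts_aes_variant(total_key_bits: int) -> str:
    """Return the AES variant implied by a total XTS key size in bits."""
    half = total_key_bits // 2
    if total_key_bits % 2 or half not in (128, 192, 256):
        raise ValueError("XTS key must be two equal AES keys")
    return f"AES-{half}"


print(xts_aes_variant(256))
print(xts_aes_variant(512))
```

So a 256-bit XTS key gives AES-128, and a 512-bit key is needed for AES-256.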

Anyway, we will release cryptsetup 2.0.6 very soon to fix one problem
in LUKS2, so I'll add the Adiantum IV size there as well so people can play with it.

Thanks,
Milan

p.s.
Reading the discussion about Zinc/Adiantum - I would perhaps prefer to merge
Adiantum first (if it is ready).
It is a new feature, I see it as a useful cipher alternative for dm-crypt and it can be
easily backported without Zinc to older kernels (I am testing it actually this way).

Jason A. Donenfeld Nov. 19, 2018, 8:30 p.m. UTC | #16

On Mon, Nov 19, 2018 at 9:05 PM Milan Broz <gmazyland@gmail.com> wrote:
> p.s.
> Reading the discussion about Zinc/Adiantum - I would perhaps prefer to merge
> Adiantum first (if it is ready).
> It is a new feature, I see it as a useful cipher alternative for dm-crypt and it can be
> easily backported without Zinc to older kernels (I am testing it actually this way).

Seems reasonable to me.

Jason
