[v2,0/2] crypto: x86/aes-ni-xts - recover and improve performance

Message ID 20201231164155.21792-1-ardb@kernel.org (mailing list archive)
State Not Applicable
Delegated to: Herbert Xu

Commit Message

Ard Biesheuvel Dec. 31, 2020, 4:41 p.m. UTC
The AES-NI implementation of XTS was significantly impacted by the retpoline
changes, because both its asm helper and the chaining mode glue library use
indirect calls when processing small quantities of data.

So let's fix this, by:
- creating a minimal, backportable fix that recovers most of the performance,
  by reducing the number of indirect calls substantially;
- for future releases, rewrite the XTS implementation completely, and replace
  the glue helper with a core asm routine that is more flexible, making the C
  code wrapper much more straightforward.

This results in a substantial performance improvement: around 2x for 1k and
4k blocks, and more than 3x for ~1k blocks that require ciphertext stealing
(benchmarked with tcrypt using 1420 byte blocks - full results below).

It also allows us to enable the same driver for i386.

Changes since v1:
- use 'test LEN, LEN' instead of 'cmp $0, LEN' to get shorter opcodes, as
  suggested by Uros
- rebase to get rid of false dependencies on other changes that are in flight.

NOTE: patch #2 depends on [0], which provides the permutation table used for
      ciphertext stealing

[0] https://lore.kernel.org/linux-crypto/20201207233402.17472-1-ardb@kernel.org/

Cc: Megha Dey <megha.dey@intel.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Uros Bizjak <ubizjak@gmail.com>

Ard Biesheuvel (2):
  crypto: x86/aes-ni-xts - use direct calls to and 4-way stride
  crypto: x86/aes-ni-xts - rewrite and drop indirections via glue helper

 arch/x86/crypto/aesni-intel_asm.S  | 353 ++++++++++++++++----
 arch/x86/crypto/aesni-intel_glue.c | 229 +++++++------
 crypto/Kconfig                     |   1 -
 3 files changed, 411 insertions(+), 172 deletions(-)

Comments

Herbert Xu Jan. 8, 2021, 4:42 a.m. UTC | #1
On Thu, Dec 31, 2020 at 05:41:53PM +0100, Ard Biesheuvel wrote:

All applied.  Thanks.

Patch

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 2054cd6f55cf..ac8b0d087927 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -994,12 +994,13 @@  static struct skcipher_alg aesni_skciphers[] = {
                        .cra_driver_name        = "__xts-aes-aesni",
                        .cra_priority           = 401,
                        .cra_flags              = CRYPTO_ALG_INTERNAL,
-                       .cra_blocksize          = AES_BLOCK_SIZE,
+                       .cra_blocksize          = 1,//AES_BLOCK_SIZE,
                        .cra_ctxsize            = XTS_AES_CTX_SIZE,
                        .cra_module             = THIS_MODULE,
                },
                .min_keysize    = 2 * AES_MIN_KEY_SIZE,
                .max_keysize    = 2 * AES_MAX_KEY_SIZE,
+               .chunksize      = AES_BLOCK_SIZE,
                .ivsize         = AES_BLOCK_SIZE,
                .setkey         = xts_aesni_setkey,
                .encrypt        = xts_encrypt,
diff --git a/crypto/xts.c b/crypto/xts.c
index 6c12f30dbdd6..7ade682f1241 100644
--- a/crypto/xts.c
+++ b/crypto/xts.c
@@ -416,11 +416,12 @@  static int xts_create(struct crypto_template *tmpl, struct rtattr **tb)
                goto err_free_inst;

        inst->alg.base.cra_priority = alg->base.cra_priority;
-       inst->alg.base.cra_blocksize = XTS_BLOCK_SIZE;
+       inst->alg.base.cra_blocksize = 1,//XTS_BLOCK_SIZE;
        inst->alg.base.cra_alignmask = alg->base.cra_alignmask |
                                       (__alignof__(u64) - 1);

        inst->alg.ivsize = XTS_BLOCK_SIZE;
+       inst->alg.chunksize = XTS_BLOCK_SIZE;
        inst->alg.min_keysize = crypto_skcipher_alg_min_keysize(alg) * 2;
        inst->alg.max_keysize = crypto_skcipher_alg_max_keysize(alg) * 2;