Message ID | 20201231164155.21792-1-ardb@kernel.org (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Herbert Xu |
On Thu, Dec 31, 2020 at 05:41:53PM +0100, Ard Biesheuvel wrote:
> The AES-NI implementation of XTS was impacted significantly by the retpoline
> changes, which is due to the fact that both its asm helper and the chaining
> mode glue library use indirect calls for processing small quantities of
> data.
>
> So let's fix this, by:
> - creating a minimal, backportable fix that recovers most of the performance,
>   by reducing the number of indirect calls substantially;
> - for future releases, rewriting the XTS implementation completely, and
>   replacing the glue helper with a core asm routine that is more flexible,
>   making the C code wrapper much more straightforward.
>
> This results in a substantial performance improvement: around ~2x for 1k and
> 4k blocks, and more than 3x for ~1k blocks that require ciphertext stealing
> (benchmarked with tcrypt using 1420-byte blocks - full results below)
>
> It also allows us to enable the same driver for i386.
>
> Changes since v1:
> - use 'test LEN, LEN' instead of 'cmp $0, LEN' to get shorter opcodes, as
>   suggested by Uros
> - rebase to get rid of false dependencies on other changes that are in flight.
>
> NOTE: patch #2 depends on [0], which provides the permutation table used for
> ciphertext stealing
>
> [0] https://lore.kernel.org/linux-crypto/20201207233402.17472-1-ardb@kernel.org/
>
> Cc: Megha Dey <megha.dey@intel.com>
> Cc: Eric Biggers <ebiggers@kernel.org>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Uros Bizjak <ubizjak@gmail.com>
>
> Ard Biesheuvel (2):
>   crypto: x86/aes-ni-xts - use direct calls to and 4-way stride
>   crypto: x86/aes-ni-xts - rewrite and drop indirections via glue helper
>
>  arch/x86/crypto/aesni-intel_asm.S  | 353 ++++++++++++++++----
>  arch/x86/crypto/aesni-intel_glue.c | 229 +++++++------
>  crypto/Kconfig                     |   1 -
>  3 files changed, 411 insertions(+), 172 deletions(-)

All applied.  Thanks.
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 2054cd6f55cf..ac8b0d087927 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -994,12 +994,13 @@ static struct skcipher_alg aesni_skciphers[] = {
 			.cra_driver_name	= "__xts-aes-aesni",
 			.cra_priority		= 401,
 			.cra_flags		= CRYPTO_ALG_INTERNAL,
-			.cra_blocksize		= AES_BLOCK_SIZE,
+			.cra_blocksize		= 1,//AES_BLOCK_SIZE,
 			.cra_ctxsize		= XTS_AES_CTX_SIZE,
 			.cra_module		= THIS_MODULE,
 		},
 		.min_keysize	= 2 * AES_MIN_KEY_SIZE,
 		.max_keysize	= 2 * AES_MAX_KEY_SIZE,
+		.chunksize	= AES_BLOCK_SIZE,
 		.ivsize		= AES_BLOCK_SIZE,
 		.setkey		= xts_aesni_setkey,
 		.encrypt	= xts_encrypt,
diff --git a/crypto/xts.c b/crypto/xts.c
index 6c12f30dbdd6..7ade682f1241 100644
--- a/crypto/xts.c
+++ b/crypto/xts.c
@@ -416,11 +416,12 @@ static int xts_create(struct crypto_template *tmpl, struct rtattr **tb)
 		goto err_free_inst;
 
 	inst->alg.base.cra_priority = alg->base.cra_priority;
-	inst->alg.base.cra_blocksize = XTS_BLOCK_SIZE;
+	inst->alg.base.cra_blocksize = 1,//XTS_BLOCK_SIZE;
 	inst->alg.base.cra_alignmask = alg->base.cra_alignmask |
 				       (__alignof__(u64) - 1);
 
 	inst->alg.ivsize = XTS_BLOCK_SIZE;
+	inst->alg.chunksize = XTS_BLOCK_SIZE;
 	inst->alg.min_keysize = crypto_skcipher_alg_min_keysize(alg) * 2;
 	inst->alg.max_keysize = crypto_skcipher_alg_max_keysize(alg) * 2;