| Message ID | 20210526100729.12939-6-ardb@kernel.org (mailing list archive) |
|---|---|
| State | Changes Requested |
| Delegated to | Herbert Xu |
| Series | running kernel mode SIMD with softirqs disabled |
On Wed, May 26, 2021 at 12:07:28PM +0200, Ard Biesheuvel wrote:
> AES-CCM (as used in WPA2 CCMP, for instance) typically involves
> authenticate-only data, and operates on a single network packet, and so
> the common case is for the authenticate, en/decrypt and finalize SIMD
> helpers to all be called exactly once in sequence. Since
> kernel_neon_end() now involves manipulation of the preemption state as
> well as the softirq mask state, let's reduce the number of times we are
> forced to call it to only once if we are handling this common case.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/crypto/aes-ce-ccm-core.S |  1 +
>  arch/arm64/crypto/aes-ce-ccm-glue.c | 74 +++++++++++---------
>  2 files changed, 43 insertions(+), 32 deletions(-)
>
> diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S
> index 99a028e298ed..8adff299fcd3 100644
> --- a/arch/arm64/crypto/aes-ce-ccm-core.S
> +++ b/arch/arm64/crypto/aes-ce-ccm-core.S
> @@ -124,6 +124,7 @@ SYM_FUNC_START(ce_aes_ccm_final)
>  SYM_FUNC_END(ce_aes_ccm_final)
>
>  	.macro	aes_ccm_do_crypt,enc
> +	cbz	x2, 5f
>  	ldr	x8, [x6, #8]			/* load lower ctr */
>  	ld1	{v0.16b}, [x5]			/* load mac */
> CPU_LE(	rev	x8, x8			)	/* keep swabbed ctr in reg */
> diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c
> index 54bd2494a000..98159f2c49ae 100644
> --- a/arch/arm64/crypto/aes-ce-ccm-glue.c
> +++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
> @@ -97,10 +97,8 @@ static int ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen)
>  static void ccm_update_mac(struct crypto_aes_ctx *key, u8 mac[], u8 const in[],
>  			   u32 abytes, u32 *macp)
>  {
> -	kernel_neon_begin();
>  	ce_aes_ccm_auth_data(mac, in, abytes, macp, key->key_enc,
>  			     num_rounds(key));
> -	kernel_neon_end();
>  }
[...]
> +	if (req->assoclen)
> +		ccm_calculate_auth_mac(req, mac);
> +

This still makes all the associated data be processed under a single
kernel_neon_begin() / kernel_neon_end() pair, even if there is a large amount of
it.  Shouldn't it be limited to a reasonable amount at a time, like 4K?
This sort of thing has been considered a bug before, e.g. see
commit 706024a52c6 ("crypto: arch/lib - limit simd usage to 4k chunks").

You could do the entire CCM operation under a single pair as long as there isn't
more than 4K of associated data.

- Eric
On Wed, 26 May 2021 at 19:14, Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Wed, May 26, 2021 at 12:07:28PM +0200, Ard Biesheuvel wrote:
> > AES-CCM (as used in WPA2 CCMP, for instance) typically involves
> > authenticate-only data, and operates on a single network packet, and so
> > the common case is for the authenticate, en/decrypt and finalize SIMD
> > helpers to all be called exactly once in sequence. Since
> > kernel_neon_end() now involves manipulation of the preemption state as
> > well as the softirq mask state, let's reduce the number of times we are
> > forced to call it to only once if we are handling this common case.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/arm64/crypto/aes-ce-ccm-core.S |  1 +
> >  arch/arm64/crypto/aes-ce-ccm-glue.c | 74 +++++++++++---------
> >  2 files changed, 43 insertions(+), 32 deletions(-)
> >
> > diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S
> > index 99a028e298ed..8adff299fcd3 100644
> > --- a/arch/arm64/crypto/aes-ce-ccm-core.S
> > +++ b/arch/arm64/crypto/aes-ce-ccm-core.S
> > @@ -124,6 +124,7 @@ SYM_FUNC_START(ce_aes_ccm_final)
> >  SYM_FUNC_END(ce_aes_ccm_final)
> >
> >  	.macro	aes_ccm_do_crypt,enc
> > +	cbz	x2, 5f
> >  	ldr	x8, [x6, #8]			/* load lower ctr */
> >  	ld1	{v0.16b}, [x5]			/* load mac */
> > CPU_LE(	rev	x8, x8			)	/* keep swabbed ctr in reg */
> > diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c
> > index 54bd2494a000..98159f2c49ae 100644
> > --- a/arch/arm64/crypto/aes-ce-ccm-glue.c
> > +++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
> > @@ -97,10 +97,8 @@ static int ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen)
> >  static void ccm_update_mac(struct crypto_aes_ctx *key, u8 mac[], u8 const in[],
> >  			   u32 abytes, u32 *macp)
> >  {
> > -	kernel_neon_begin();
> >  	ce_aes_ccm_auth_data(mac, in, abytes, macp, key->key_enc,
> >  			     num_rounds(key));
> > -	kernel_neon_end();
> >  }
> [...]
> > +	if (req->assoclen)
> > +		ccm_calculate_auth_mac(req, mac);
> > +
>
> This still makes all the associated data be processed under a single
> kernel_neon_begin() / kernel_neon_end() pair, even if there is a large amount of
> it.  Shouldn't it be limited to a reasonable amount at a time, like 4K?
> This sort of thing has been considered a bug before, e.g. see
> commit 706024a52c6 ("crypto: arch/lib - limit simd usage to 4k chunks").
>
> You could do the entire CCM operation under a single pair as long as there isn't
> more than 4K of associated data.
>

Good point. I'll add a separate patch for that.
diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S
index 99a028e298ed..8adff299fcd3 100644
--- a/arch/arm64/crypto/aes-ce-ccm-core.S
+++ b/arch/arm64/crypto/aes-ce-ccm-core.S
@@ -124,6 +124,7 @@ SYM_FUNC_START(ce_aes_ccm_final)
 SYM_FUNC_END(ce_aes_ccm_final)
 
 	.macro	aes_ccm_do_crypt,enc
+	cbz	x2, 5f
 	ldr	x8, [x6, #8]			/* load lower ctr */
 	ld1	{v0.16b}, [x5]			/* load mac */
CPU_LE(	rev	x8, x8			)	/* keep swabbed ctr in reg */
diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c
index 54bd2494a000..98159f2c49ae 100644
--- a/arch/arm64/crypto/aes-ce-ccm-glue.c
+++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
@@ -97,10 +97,8 @@ static int ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen)
 static void ccm_update_mac(struct crypto_aes_ctx *key, u8 mac[], u8 const in[],
 			   u32 abytes, u32 *macp)
 {
-	kernel_neon_begin();
 	ce_aes_ccm_auth_data(mac, in, abytes, macp, key->key_enc,
 			     num_rounds(key));
-	kernel_neon_end();
 }
 
 static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[])
@@ -157,35 +155,41 @@ static int ccm_encrypt(struct aead_request *req)
 	if (err)
 		return err;
 
-	if (req->assoclen)
-		ccm_calculate_auth_mac(req, mac);
-
 	/* preserve the original iv for the final round */
 	memcpy(buf, req->iv, AES_BLOCK_SIZE);
 
 	err = skcipher_walk_aead_encrypt(&walk, req, false);
+	if (unlikely(err))
+		return err;
 
-	while (walk.nbytes) {
+	kernel_neon_begin();
+
+	if (req->assoclen)
+		ccm_calculate_auth_mac(req, mac);
+
+	do {
 		u32 tail = walk.nbytes % AES_BLOCK_SIZE;
 
 		if (walk.nbytes == walk.total)
 			tail = 0;
 
-		kernel_neon_begin();
 		ce_aes_ccm_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
 				   walk.nbytes - tail, ctx->key_enc,
 				   num_rounds(ctx), mac, walk.iv);
-		kernel_neon_end();
 
-		err = skcipher_walk_done(&walk, tail);
-	}
-	if (!err) {
-		kernel_neon_begin();
-		ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
+		if (walk.nbytes == walk.total)
+			ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
+
 		kernel_neon_end();
-	}
-	if (err)
-		return err;
+
+		if (walk.nbytes) {
+			err = skcipher_walk_done(&walk, tail);
+			if (unlikely(err))
+				return err;
+			if (unlikely(walk.nbytes))
+				kernel_neon_begin();
+		}
+	} while (walk.nbytes);
 
 	/* copy authtag to end of dst */
 	scatterwalk_map_and_copy(mac, req->dst, req->assoclen + req->cryptlen,
@@ -209,35 +213,41 @@ static int ccm_decrypt(struct aead_request *req)
 	if (err)
 		return err;
 
-	if (req->assoclen)
-		ccm_calculate_auth_mac(req, mac);
-
 	/* preserve the original iv for the final round */
 	memcpy(buf, req->iv, AES_BLOCK_SIZE);
 
 	err = skcipher_walk_aead_decrypt(&walk, req, false);
+	if (unlikely(err))
+		return err;
+
+	kernel_neon_begin();
 
-	while (walk.nbytes) {
+	if (req->assoclen)
+		ccm_calculate_auth_mac(req, mac);
+
+	do {
 		u32 tail = walk.nbytes % AES_BLOCK_SIZE;
 
 		if (walk.nbytes == walk.total)
 			tail = 0;
 
-		kernel_neon_begin();
 		ce_aes_ccm_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
-				   walk.nbytes - tail, ctx->key_enc,
-				   num_rounds(ctx), mac, walk.iv);
-		kernel_neon_end();
+				   walk.nbytes - tail, ctx->key_enc,
+				   num_rounds(ctx), mac, walk.iv);
+
+		if (walk.nbytes == walk.total)
+			ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
 
-		err = skcipher_walk_done(&walk, tail);
-	}
-	if (!err) {
-		kernel_neon_begin();
-		ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
 		kernel_neon_end();
-	}
-	if (err)
-		return err;
+
+		if (walk.nbytes) {
+			err = skcipher_walk_done(&walk, tail);
+			if (unlikely(err))
+				return err;
+			if (unlikely(walk.nbytes))
+				kernel_neon_begin();
+		}
+	} while (walk.nbytes);
 
 	/* compare calculated auth tag with the stored one */
 	scatterwalk_map_and_copy(buf, req->src,
AES-CCM (as used in WPA2 CCMP, for instance) typically involves
authenticate-only data, and operates on a single network packet, and so
the common case is for the authenticate, en/decrypt and finalize SIMD
helpers to all be called exactly once in sequence. Since
kernel_neon_end() now involves manipulation of the preemption state as
well as the softirq mask state, let's reduce the number of times we are
forced to call it to only once if we are handling this common case.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/crypto/aes-ce-ccm-core.S |  1 +
 arch/arm64/crypto/aes-ce-ccm-glue.c | 74 +++++++++++---------
 2 files changed, 43 insertions(+), 32 deletions(-)