
arm64: crypto: increase AES interleave to 4x

Message ID: 1424366716-30439-1-git-send-email-ard.biesheuvel@linaro.org
State: Not Applicable
Delegated to: Herbert Xu

Commit Message

Ard Biesheuvel Feb. 19, 2015, 5:25 p.m. UTC
This patch increases the interleave factor for parallel AES modes
to 4x. This improves performance on Cortex-A57 by ~35%. This is
due to the 3-cycle latency of AES instructions on the A57's
relatively deep pipeline (compared to Cortex-A53 where the AES
instruction latency is only 2 cycles).
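Why the interleave factor tracks instruction latency: each block's rounds form a serial dependency chain. Assume, for illustration only, that the core can start one AES instruction per cycle and that each one has a latency of L cycles. A single block then leaves the unit idle for L-1 cycles between dependent rounds, and at least L independent blocks are needed to fill those slots. 2x is enough for the 2-cycle A53 but not for the 3-cycle A57. The C sketch below shows only the loop structure. `toy_round()` is a hypothetical stand-in for the AESE/AESMC pair, it is not AES, and this is not the kernel's aes-ce.S code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NROUNDS    10  /* AES-128 uses 10 rounds */
#define INTERLEAVE 4   /* analogous to -DINTERLEAVE=4 in the Makefile */

/* Hypothetical stand-in for one AES round (AESE + AESMC). This is NOT AES;
 * it only makes each round depend on the result of the previous one. */
static uint32_t toy_round(uint32_t state, uint32_t round_key)
{
    state ^= round_key;
    state = (state << 7) | (state >> 25);   /* rotate left by 7 */
    return state * 0x9E3779B1u;
}

/* One block at a time: round r+1 needs the result of round r, so a
 * pipelined unit with an L-cycle latency idles between dependent rounds. */
static void encrypt_serial(uint32_t *blocks, size_t n, const uint32_t *rk)
{
    for (size_t i = 0; i < n; i++)
        for (int r = 0; r < NROUNDS; r++)
            blocks[i] = toy_round(blocks[i], rk[r]);
}

/* INTERLEAVE blocks in lockstep: within one round the INTERLEAVE updates
 * are independent of each other, so they can enter the pipeline back to back. */
static void encrypt_interleaved(uint32_t *blocks, size_t n, const uint32_t *rk)
{
    size_t i = 0;

    for (; i + INTERLEAVE <= n; i += INTERLEAVE)
        for (int r = 0; r < NROUNDS; r++)
            for (int j = 0; j < INTERLEAVE; j++)
                blocks[i + j] = toy_round(blocks[i + j], rk[r]);

    /* Fewer than INTERLEAVE blocks left: fall back to the serial path. */
    encrypt_serial(blocks + i, n - i, rk);
}
```

Both functions produce identical output for every block. Only the order of independent operations changes, which is what lets the hardware overlap their latencies.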

At the same time, disable inline expansion of the core AES functions,
as the performance benefit of this feature is negligible.
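The INTERLEAVE_INLINE flag chooses between expanding the core block routine at every call site and calling one shared out-of-line copy. The C analogue below illustrates that trade-off. The real code is arm64 assembly. `CORE_BODY4`, `DO_BLOCK4X`, `block4x`, `mode_a` and `mode_b` are hypothetical names, and `__attribute__((noinline))` assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical body of the "core" 4-block routine (not real AES). */
#define CORE_BODY4(s, k)                 \
    do {                                 \
        (s)[0] ^= (k); (s)[1] ^= (k);    \
        (s)[2] ^= (k); (s)[3] ^= (k);    \
    } while (0)

#ifdef INTERLEAVE_INLINE
/* Inline expansion: each call site gets its own copy of the body. */
#define DO_BLOCK4X(s, k) CORE_BODY4(s, k)
#else
/* Out of line: a single shared copy, reached through a call. */
static __attribute__((noinline)) void block4x(uint32_t *s, uint32_t k)
{
    CORE_BODY4(s, k);
}
#define DO_BLOCK4X(s, k) block4x((s), (k))
#endif

/* Hypothetical mode routines: both are call sites of the core routine. */
static void mode_a(uint32_t *s, size_t nblocks, uint32_t k)
{
    for (size_t i = 0; i + 4 <= nblocks; i += 4)
        DO_BLOCK4X(s + i, k);
}

static void mode_b(uint32_t *s, size_t nblocks, uint32_t k)
{
    for (size_t i = 0; i + 4 <= nblocks; i += 4)
        DO_BLOCK4X(s + i, ~k);
}
```

Both builds compute the same results. The inline build trades a larger code footprint for avoiding a call and return per 4-block step. The commit message reports that, on this hardware, the benefit of that trade is negligible.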

  Measured on AMD Seattle (using tcrypt.ko mode=500 sec=1):

  Baseline (2x interleave, inline expansion)
  ------------------------------------------
  testing speed of async cbc(aes) (cbc-aes-ce) decryption
  test 4 (128 bit key, 8192 byte blocks): 95545 operations in 1 seconds
  test 14 (256 bit key, 8192 byte blocks): 68496 operations in 1 seconds

  This patch (4x interleave, no inline expansion)
  -----------------------------------------------
  testing speed of async cbc(aes) (cbc-aes-ce) decryption
  test 4 (128 bit key, 8192 byte blocks): 124735 operations in 1 seconds
  test 14 (256 bit key, 8192 byte blocks): 92328 operations in 1 seconds
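The tcrypt figures are operations per second on 8192-byte buffers, so they convert directly into throughput and a relative gain. The following small C helper performs that arithmetic. The function names are illustrative and are not part of tcrypt.

```c
#include <assert.h>

#define BLOCK_BYTES 8192.0  /* buffer size used by tests 4 and 14 above */

/* Throughput in MB/s (10^6 bytes per second) for an ops/second figure. */
static double mb_per_s(double ops_per_s)
{
    return ops_per_s * BLOCK_BYTES / 1e6;
}

/* Relative improvement of 'after' over 'before', in percent. */
static double gain_pct(double before, double after)
{
    assert(before > 0.0);
    return (after / before - 1.0) * 100.0;
}
```

With the numbers above, throughput for 128-bit keys rises from about 783 MB/s to about 1022 MB/s (roughly +30.6%). For 256-bit keys it rises from about 561 MB/s to about 756 MB/s (roughly +34.8%).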

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Will Deacon Feb. 20, 2015, 3:55 p.m. UTC | #1
On Thu, Feb 19, 2015 at 05:25:16PM +0000, Ard Biesheuvel wrote:

Fine by me. Shall I queue this via the arm64 tree?

Will

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ard Biesheuvel Feb. 20, 2015, 4:16 p.m. UTC | #2
On 20 February 2015 at 15:55, Will Deacon <will.deacon@arm.com> wrote:
> Fine by me. Shall I queue this via the arm64 tree?
>

Yes, please.

Patch

diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index 5720608c50b1..abb79b3cfcfe 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -29,7 +29,7 @@  aes-ce-blk-y := aes-glue-ce.o aes-ce.o
 obj-$(CONFIG_CRYPTO_AES_ARM64_NEON_BLK) += aes-neon-blk.o
 aes-neon-blk-y := aes-glue-neon.o aes-neon.o
 
-AFLAGS_aes-ce.o		:= -DINTERLEAVE=2 -DINTERLEAVE_INLINE
+AFLAGS_aes-ce.o		:= -DINTERLEAVE=4
 AFLAGS_aes-neon.o	:= -DINTERLEAVE=4
 
 CFLAGS_aes-glue-ce.o	:= -DUSE_V8_CRYPTO_EXTENSIONS