Message ID | 20210720034642.19230-1-tianjia.zhang@linux.alibaba.com |
---|---|
Series | Introduce x86 assembler accelerated implementation for SM4 algorithm |
On Tue, Jul 20, 2021 at 11:46:38AM +0800, Tianjia Zhang wrote:
> This patchset extracts the public SM4 algorithm as a separate library.
> At the same time, the arm64 accelerated SM4 implementation (sm4-ce) is
> adjusted to use this SM4 library. The patchset then introduces an
> x86_64 implementation accelerated with the AES-NI and AVX instruction
> sets.
>
> This optimization supports four SM4 modes: ECB, CBC, CFB, and CTR.
> Since CBC and CFB encryption cannot process multiple blocks in
> parallel, the optimization brings little benefit to those two cases.
> All self-tests pass.
>
> The core algorithm implementation is based on the SM4 AES-NI work in
> libgcrypt and by Markku-Juhani O. Saarinen at:
> https://github.com/mjosaarinen/sm4ni
>
> Benchmarked on Intel Xeon Cascade Lake, using tcrypt modes 218 and 518.
> The columns are block lengths in bytes; throughput is in Mb/s:
>
> sm4-generic   |     16     64    128    256   1024   1420   4096
>   ECB enc     |  40.99  46.50  48.05  48.41  49.20  49.25  49.28
>   ECB dec     |  41.07  46.99  48.15  48.67  49.20  49.25  49.29
>   CBC enc     |  37.71  45.28  46.77  47.60  48.32  48.37  48.40
>   CBC dec     |  36.48  44.82  46.43  47.45  48.23  48.30  48.36
>   CFB enc     |  37.94  44.84  46.12  46.94  47.57  47.46  47.68
>   CFB dec     |  37.50  42.84  43.74  44.37  44.85  44.80  44.96
>   CTR enc     |  39.20  45.63  46.75  47.49  48.09  47.85  48.08
>   CTR dec     |  39.64  45.70  46.72  47.47  47.98  47.88  48.06
> sm4-aesni-avx |     16     64    128    256   1024   1420   4096
>   ECB enc     |  33.75 134.47 221.64 243.43 264.05 251.58 258.13
>   ECB dec     |  34.02 134.92 223.11 245.14 264.12 251.04 258.33
>   CBC enc     |  38.85  46.18  47.67  48.34  49.00  48.96  49.14
>   CBC dec     |  33.54 131.29 223.88 245.27 265.50 252.41 263.78
>   CFB enc     |  38.70  46.10  47.58  48.29  49.01  48.94  49.19
>   CFB dec     |  32.79 128.40 223.23 244.87 265.77 253.31 262.79
>   CTR enc     |  32.58 122.23 220.29 241.16 259.57 248.32 256.69
>   CTR dec     |  32.81 122.47 218.99 241.54 258.42 248.58 256.61
>
> ---
> v3 changes:
> * Remove the single-block algorithm, which did not noticeably improve performance
> * Remove the accelerated SM4 key expansion, which is not performance-critical
> * Fix a warning in arm64/sm4-ce
>
> v2 changes:
> * SM4 library functions use the "sm4_" prefix instead of the "crypto_" prefix
> * sm4-aesni-avx provides accelerated implementations of four specific modes
> * tcrypt benchmark supports sm4-aesni-avx
> * Other fixes from review comments
>
> Tianjia Zhang (4):
>   crypto: sm4 - create SM4 library based on sm4 generic code
>   crypto: arm64/sm4-ce - Make dependent on sm4 library instead of
>     sm4-generic
>   crypto: x86/sm4 - add AES-NI/AVX/x86_64 implementation
>   crypto: tcrypt - add the asynchronous speed test for SM4
>
>  arch/arm64/crypto/Kconfig              |   2 +-
>  arch/arm64/crypto/sm4-ce-glue.c        |  20 +-
>  arch/x86/crypto/Makefile               |   3 +
>  arch/x86/crypto/sm4-aesni-avx-asm_64.S | 589 +++++++++++++++++++++++++
>  arch/x86/crypto/sm4_aesni_avx_glue.c   | 459 +++++++++++++++++++
>  crypto/Kconfig                         |  22 +
>  crypto/sm4_generic.c                   | 180 +-------
>  crypto/tcrypt.c                        |  26 +-
>  include/crypto/sm4.h                   |  25 +-
>  lib/crypto/Kconfig                     |   3 +
>  lib/crypto/Makefile                    |   3 +
>  lib/crypto/sm4.c                       | 176 ++++++++
>  12 files changed, 1330 insertions(+), 178 deletions(-)
>  create mode 100644 arch/x86/crypto/sm4-aesni-avx-asm_64.S
>  create mode 100644 arch/x86/crypto/sm4_aesni_avx_glue.c
>  create mode 100644 lib/crypto/sm4.c

All applied. Thanks.