[RFC,0/2] Eliminate the no-SIMD en/decryption fallbacks on x86


Message ID

20250220051325.340691-1-ebiggers@kernel.org (mailing list archive)

Headers

From: Eric Biggers <ebiggers@kernel.org>
To: x86@kernel.org
Cc: linux-crypto@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Ard Biesheuvel <ardb@kernel.org>,
	Ben Greear <greearb@candelatech.com>,
	Xiao Liang <shaw.leon@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	"Jason A . Donenfeld" <Jason@zx2c4.com>
Subject: [RFC PATCH 0/2] Eliminate the no-SIMD en/decryption fallbacks on x86
Date: Wed, 19 Feb 2025 21:13:23 -0800
Message-ID: <20250220051325.340691-1-ebiggers@kernel.org>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

Eliminate the no-SIMD en/decryption fallbacks on x86

Message

Eric Biggers Feb. 20, 2025, 5:13 a.m. UTC

The patchset can also be retrieved from:
 
    git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git x86-softirq-fpu-fix-v1

This patchset fixes a longstanding issue where kernel-mode FPU (i.e.,
SIMD) was not reliably usable in softirqs in x86, which was creating the
need for a fallback.  The fallback was really bad for performance, and
it even hurt performance for users that never encountered the edge case
where kernel-mode FPU was not usable.

This patchset aligns x86 with other architectures such as arm, arm64,
and riscv by making kernel-mode FPU work in softirqs reliably.  There
are a few possible ways to achieve that, and for now I just went with
the simplest way; see patch 1 for details.

Patch 2 eliminates all uses of the "crypto SIMD helper" from x86, as
patch 1 makes it unnecessary.  For the RFC it is just one big patch;
I'll probably split patch 2 up if this progresses past RFC status.
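
For readers unfamiliar with the helper, the difference can be illustrated
with a rough userspace model in C.  This is not kernel code and not taken
from the patches: every name in it is hypothetical, and it only models the
general shape described above, where each request first checks whether SIMD
is usable and is handed to a deferred path when it is not.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only.  Every name below is hypothetical; none of this
 * is kernel code or a kernel API.
 */

/* Models "kernel-mode FPU is not usable right now" as seen by a softirq. */
static bool fpu_busy;

static int simd_requests;	/* handled inline on the SIMD path */
static int deferred_requests;	/* handed off to the fallback path */

static void encrypt_simd(void)     { simd_requests++; }
static void encrypt_deferred(void) { deferred_requests++; }

/* Old shape: check on every request, fall back when SIMD is unavailable. */
static void softirq_encrypt_with_fallback(void)
{
	if (!fpu_busy)
		encrypt_simd();
	else
		encrypt_deferred();
}

/*
 * New shape: the softirq is assumed never to observe fpu_busy, so both
 * the per-request check and the fallback path go away.
 */
static void softirq_encrypt_direct(void)
{
	assert(!fpu_busy);
	encrypt_simd();
}

/* Send n requests down the old path; every k-th one hits the edge case. */
static void run_old(int n, int k)
{
	for (int i = 1; i <= n; i++) {
		fpu_busy = (i % k == 0);
		softirq_encrypt_with_fallback();
	}
	fpu_busy = false;
}
```

In this model, calling run_old(10, 5) sends two of the ten requests down
the fallback path, while softirq_encrypt_direct() has no such path at all.
The model says nothing about how patch 1 guarantees usability; it only
shows why the check and the fallback become dead code once that guarantee
holds.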

Performance results have been positive.  All en/decryption is now
slightly faster on x86, as it no longer takes a detour through
crypto/simd.c.  I get a 7% or 23% improvement for AES-XTS, for example.

I also benchmarked bidirectional IPsec, which has been claimed to often
hit the edge case where kernel-mode FPU was previously not usable in
softirq context.  Ultimately, I was not actually able to reproduce that
edge case being reached unless I reduced the number of CPUs to 1, in
which case it then started being occasionally reached.  Regardless, even
without that case being reached, IPsec throughput still improved by 2%.
In situations where that case was being reached, or where users required
a synchronous algorithm, a much larger improvement should be seen.

Eric Biggers (2):
  x86/fpu: make kernel-mode FPU reliably usable in softirqs
  crypto: x86 - stop using the SIMD helper

 arch/x86/crypto/Kconfig                    |  14 --
 arch/x86/crypto/aegis128-aesni-glue.c      |  13 +-
 arch/x86/crypto/aesni-intel_glue.c         | 168 ++++++++-------------
 arch/x86/crypto/aria_aesni_avx2_glue.c     |  22 +--
 arch/x86/crypto/aria_aesni_avx_glue.c      |  20 +--
 arch/x86/crypto/aria_gfni_avx512_glue.c    |  22 +--
 arch/x86/crypto/camellia_aesni_avx2_glue.c |  21 +--
 arch/x86/crypto/camellia_aesni_avx_glue.c  |  21 +--
 arch/x86/crypto/cast5_avx_glue.c           |  21 +--
 arch/x86/crypto/cast6_avx_glue.c           |  20 +--
 arch/x86/crypto/serpent_avx2_glue.c        |  21 +--
 arch/x86/crypto/serpent_avx_glue.c         |  21 +--
 arch/x86/crypto/serpent_sse2_glue.c        |  21 +--
 arch/x86/crypto/sm4_aesni_avx2_glue.c      |  30 ++--
 arch/x86/crypto/sm4_aesni_avx_glue.c       |  30 ++--
 arch/x86/crypto/twofish_avx_glue.c         |  21 +--
 arch/x86/include/asm/fpu/api.h             |  17 +--
 arch/x86/kernel/fpu/core.c                 |  37 ++---
 18 files changed, 180 insertions(+), 360 deletions(-)


base-commit: 0ad2507d5d93f39619fc42372c347d6006b64319
prerequisite-patch-id: ec1feea7e6f4d03e4e4c64c492197b89c957611a

Comments

Herbert Xu Feb. 21, 2025, 3:53 a.m. UTC | #1

Eric Biggers <ebiggers@kernel.org> wrote:
> The patchset can also be retrieved from:
> 
>    git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git x86-softirq-fpu-fix-v1
> 
> This patchset fixes a longstanding issue where kernel-mode FPU (i.e.,
> SIMD) was not reliably usable in softirqs in x86, which was creating the
> need for a fallback.  The fallback was really bad for performance, and
> it even hurt performance for users that never encountered the edge case
> where kernel-mode FPU was not usable.

Great work!
 
> I also benchmarked bidirectional IPsec, which has been claimed to often
> hit the edge case where kernel-mode FPU was previously not usable in
> softirq context.  Ultimately, I was not actually able to reproduce that
> edge case being reached unless I reduced the number of CPUs to 1, in
> which case it then started being occasionally reached.  Regardless, even
> without that case being reached, IPsec throughput still improved by 2%.
> In situations where that case was being reached, or where users required
> a synchronous algorithm, a much larger improvement should be seen.

You would need a situation where your CPU is maxed out by your
bandwidth, so on a physical box these days you would need 10GbE
at the minimum.

However, I used to be able to easily reproduce this using virtualisation
because there the bandwidth is essentially unlimited.  So perhaps
a KVM guest with a single CPU doing bidirectional IPsec to the host
should be enough to reproduce this case.

Thanks,