[1/3] crypto: x86/nh-avx2 - add missing vzeroupper

Message ID 20240406002610.37202-2-ebiggers@kernel.org (mailing list archive)
State Accepted
Delegated to: Herbert Xu
Series crypto: x86 - add missing vzeroupper instructions

Commit Message

Eric Biggers April 6, 2024, 12:26 a.m. UTC
From: Eric Biggers <ebiggers@google.com>

Since nh_avx2() uses ymm registers, execute vzeroupper before returning
from it.  This is necessary to avoid reducing the performance of SSE
code.

Fixes: 0f961f9f670e ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305")
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 arch/x86/crypto/nh-avx2-x86_64.S | 1 +
 1 file changed, 1 insertion(+)

Comments

Tim Chen April 9, 2024, 10:42 p.m. UTC | #1
On Fri, 2024-04-05 at 20:26 -0400, Eric Biggers wrote:
> > From: Eric Biggers <ebiggers@google.com>
> > 
> > Since nh_avx2() uses ymm registers, execute vzeroupper before returning
> > from it.  This is necessary to avoid reducing the performance of SSE
> > code.
> > 
> > Fixes: 0f961f9f670e ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305")
> > Signed-off-by: Eric Biggers <ebiggers@google.com>
> > ---
> >  arch/x86/crypto/nh-avx2-x86_64.S | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/arch/x86/crypto/nh-avx2-x86_64.S b/arch/x86/crypto/nh-avx2-x86_64.S
> > index ef73a3ab8726..791386d9a83a 100644
> > --- a/arch/x86/crypto/nh-avx2-x86_64.S
> > +++ b/arch/x86/crypto/nh-avx2-x86_64.S
> > @@ -152,7 +152,8 @@ SYM_TYPED_FUNC_START(nh_avx2)
> >  
> >  	vpaddq		T5, T4, T4
> >  	vpaddq		T1, T0, T0
> >  	vpaddq		T4, T0, T0
> >  	vmovdqu		T0, (HASH)
> > +	vzeroupper
> >  	RET
> >  SYM_FUNC_END(nh_avx2)

Acked-by: Tim Chen <tim.c.chen@linux.intel.com>

Patch

diff --git a/arch/x86/crypto/nh-avx2-x86_64.S b/arch/x86/crypto/nh-avx2-x86_64.S
index ef73a3ab8726..791386d9a83a 100644
--- a/arch/x86/crypto/nh-avx2-x86_64.S
+++ b/arch/x86/crypto/nh-avx2-x86_64.S
@@ -152,7 +152,8 @@  SYM_TYPED_FUNC_START(nh_avx2)
 
 	vpaddq		T5, T4, T4
 	vpaddq		T1, T0, T0
 	vpaddq		T4, T0, T0
 	vmovdqu		T0, (HASH)
+	vzeroupper
 	RET
 SYM_FUNC_END(nh_avx2)
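
For readers less familiar with the pattern, here is a minimal user-space sketch in C of the same idea: a routine that touches ymm registers clears their upper halves before returning, so that SSE code running afterwards does not pay an AVX-to-SSE transition penalty. This is not kernel code and is not part of the patch. The names sum_u64() and sum_u64_avx2() are hypothetical, and the sketch assumes GCC or Clang on x86. Compilers typically insert vzeroupper on their own in functions built for AVX, so the explicit _mm256_zeroupper() call is shown only for illustration; hand-written assembly such as nh_avx2() gets no such help, which is why the instruction has to be added by hand there.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical helper: sums n 64-bit values using 256-bit (ymm) adds. */
__attribute__((target("avx2")))
static uint64_t sum_u64_avx2(const uint64_t *p, size_t n)
{
	__m256i acc = _mm256_setzero_si256();
	uint64_t lanes[4];
	uint64_t total;
	size_t i;

	/* Process four 64-bit values per iteration in one ymm register. */
	for (i = 0; i + 4 <= n; i += 4)
		acc = _mm256_add_epi64(acc,
			_mm256_loadu_si256((const __m256i *)(p + i)));

	/* Fold the four lanes, then handle any leftover elements. */
	_mm256_storeu_si256((__m256i *)lanes, acc);
	total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
	for (; i < n; i++)
		total += p[i];

	/*
	 * Same step as the "vzeroupper" added before RET in nh_avx2():
	 * zero the upper 128 bits of all ymm registers before returning.
	 */
	_mm256_zeroupper();
	return total;
}
#endif

/* Hypothetical entry point: uses AVX2 when the CPU has it, else scalar. */
uint64_t sum_u64(const uint64_t *p, size_t n)
{
	uint64_t total = 0;
	size_t i;

#if defined(__x86_64__) || defined(__i386__)
	if (__builtin_cpu_supports("avx2"))
		return sum_u64_avx2(p, n);
#endif
	for (i = 0; i < n; i++)
		total += p[i];
	return total;
}
```

Callers simply use sum_u64(); the runtime check keeps the sketch usable on CPUs without AVX2, where the ymm path and the vzeroupper step are skipped entirely.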