
[RESEND,1/2] x86/locking: Use ALT_OUTPUT_SP() for percpu_{,try_}cmpxchg{64,128}_op()

Message ID 20250213191457.12377-1-ubizjak@gmail.com
State New
Series [RESEND,1/2] x86/locking: Use ALT_OUTPUT_SP() for percpu_{,try_}cmpxchg{64,128}_op()

Commit Message

Uros Bizjak Feb. 13, 2025, 7:14 p.m. UTC
The percpu_{,try_}cmpxchg{64,128}() macros use a CALL instruction inside
the asm statement in one of their alternatives. Use the ALT_OUTPUT_SP()
macro to add the required dependence on the stack pointer register.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
---
 arch/x86/include/asm/percpu.h | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

Comments

Dave Hansen Feb. 13, 2025, 8:43 p.m. UTC | #1
On 2/13/25 11:14, Uros Bizjak wrote:
> percpu_{,try_}cmpxchg{64,128}() macros use CALL instruction inside
> asm statement in one of their alternatives. Use ALT_OUTPUT_SP()
> macro to add required dependence on %esp register.

Is this just a pedantic fix? Or is there an actual impact to end users
that needs to be considered?

Basically, you've told me what the patch does, but not why anyone should
care or why it should be applied.

Uros Bizjak Feb. 13, 2025, 9:17 p.m. UTC | #2
On Thu, Feb 13, 2025 at 9:43 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 2/13/25 11:14, Uros Bizjak wrote:
> > percpu_{,try_}cmpxchg{64,128}() macros use CALL instruction inside
> > asm statement in one of their alternatives. Use ALT_OUTPUT_SP()
> > macro to add required dependence on %esp register.
>
> Is this just a pedantic fix? Or is there an actual impact to end users
> that needs to be considered?

When a CALL instruction is embedded in the asm statement, the compiler
doesn't know that the asm now depends on the stack pointer (or frame
pointer), so it is free to schedule the asm outside the function's frame
prologue/epilogue. Currently, this only triggers an objtool warning, but
if we ever compile the kernel with a red zone (IIRC, it was mentioned
that this is possible with a FRED-enabled kernel), the CALL, which pushes
its return address below the stack pointer, will clobber the red zone.
Please note that the alternative_call() family of functions,
__alternative_atomic64() and __arch_{,try_}cmpxchg64_emu() all use the
same macro for exactly the reason explained above.

OTOH, all recent x86_64 processors support the CMPXCHG16B instruction
(used for cmpxchg128), so the CALL alternative will rarely be used.

> Basically, you've told me what the patch does, but not why anyone should
> care or why it should be applied.

This is actually explained at length in the comment for
ASM_CALL_CONSTRAINT, which the ALT_OUTPUT_SP() macro uses.

Uros.

Dave Hansen Feb. 13, 2025, 10:54 p.m. UTC | #3
On 2/13/25 13:17, Uros Bizjak wrote:
>> Basically, you've told me what the patch does, but not why anyone should
>> care or why it should be applied.
> This is actually explained at length in the comment for
> ASM_CALL_CONSTRAINT, which ALT_OUTPUT_SP macro uses.

Great info, thanks! Could you give the patch another shot and include
this in the changelog, please? Better yet, you could paraphrase the
comment so that we don't have to go searching for it.

Christoph Lameter (Ampere) Feb. 14, 2025, 6:22 p.m. UTC | #4
On Thu, 13 Feb 2025, Uros Bizjak wrote:

> OTOH, all recent x86_64 processors support CMPXCHG128 insn, so the
> call alternative will be rarely used.

Do we still support processors without cmpxchg128? If not, then let's just
drop the calls from the kernel.

Uros Bizjak Feb. 14, 2025, 7:55 p.m. UTC | #5
On Fri, Feb 14, 2025 at 7:22 PM Christoph Lameter (Ampere)
<cl@gentwo.org> wrote:
>
> On Thu, 13 Feb 2025, Uros Bizjak wrote:
>
> > OTOH, all recent x86_64 processors support CMPXCHG128 insn, so the
> > call alternative will be rarely used.
>
> Do we still support processors without cmpxchg128? If not then lets just
> drop the calls from the kernel.

I'm not aware of any discussion about that decision.

Thanks,
Uros.

Patch

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index e525cd85f999..0ab991fba7de 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -350,9 +350,9 @@  do {									\
 									\
 	asm qual (ALTERNATIVE("call this_cpu_cmpxchg8b_emu",		\
 			      "cmpxchg8b " __percpu_arg([var]), X86_FEATURE_CX8) \
-		  : [var] "+m" (__my_cpu_var(_var)),			\
-		    "+a" (old__.low),					\
-		    "+d" (old__.high)					\
+		  : ALT_OUTPUT_SP([var] "+m" (__my_cpu_var(_var)),	\
+				  "+a" (old__.low),			\
+				  "+d" (old__.high))			\
 		  : "b" (new__.low),					\
 		    "c" (new__.high),					\
 		    "S" (&(_var))					\
@@ -381,10 +381,10 @@  do {									\
 	asm qual (ALTERNATIVE("call this_cpu_cmpxchg8b_emu",		\
 			      "cmpxchg8b " __percpu_arg([var]), X86_FEATURE_CX8) \
 		  CC_SET(z)						\
-		  : CC_OUT(z) (success),				\
-		    [var] "+m" (__my_cpu_var(_var)),			\
-		    "+a" (old__.low),					\
-		    "+d" (old__.high)					\
+		  : ALT_OUTPUT_SP(CC_OUT(z) (success),			\
+				  [var] "+m" (__my_cpu_var(_var)),	\
+				  "+a" (old__.low),			\
+				  "+d" (old__.high))			\
 		  : "b" (new__.low),					\
 		    "c" (new__.high),					\
 		    "S" (&(_var))					\
@@ -421,9 +421,9 @@  do {									\
 									\
 	asm qual (ALTERNATIVE("call this_cpu_cmpxchg16b_emu",		\
 			      "cmpxchg16b " __percpu_arg([var]), X86_FEATURE_CX16) \
-		  : [var] "+m" (__my_cpu_var(_var)),			\
-		    "+a" (old__.low),					\
-		    "+d" (old__.high)					\
+		  : ALT_OUTPUT_SP([var] "+m" (__my_cpu_var(_var)),	\
+				  "+a" (old__.low),			\
+				  "+d" (old__.high))			\
 		  : "b" (new__.low),					\
 		    "c" (new__.high),					\
 		    "S" (&(_var))					\
@@ -452,10 +452,10 @@  do {									\
 	asm qual (ALTERNATIVE("call this_cpu_cmpxchg16b_emu",		\
 			      "cmpxchg16b " __percpu_arg([var]), X86_FEATURE_CX16) \
 		  CC_SET(z)						\
-		  : CC_OUT(z) (success),				\
-		    [var] "+m" (__my_cpu_var(_var)),			\
-		    "+a" (old__.low),					\
-		    "+d" (old__.high)					\
+		  : ALT_OUTPUT_SP(CC_OUT(z) (success),			\
+				  [var] "+m" (__my_cpu_var(_var)),	\
+				  "+a" (old__.low),			\
+				  "+d" (old__.high))			\
 		  : "b" (new__.low),					\
 		    "c" (new__.high),					\
 		    "S" (&(_var))					\
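Whichever alternative is patched in, the CMPXCHG16B instruction or the CALL to this_cpu_cmpxchg16b_emu, the macros must observe the same result. The following single-threaded C model of the 128-bit case sketches that result, assuming the usual kernel convention that the plain cmpxchg variant returns the previous value while the try variant returns a success flag and refreshes the caller's old value on failure (the hunks above do not show the full macro bodies). The model_ names are hypothetical; atomicity and the emulation's interrupt handling are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit value split into halves, like old__.low/.high. */
struct model_u128 {
	uint64_t low;
	uint64_t high;
};

/*
 * Model of CMPXCHG16B semantics: if *mem equals *old, store new_ and
 * report success (ZF=1); otherwise copy *mem into *old (the instruction
 * loads RDX:RAX) and report failure (ZF=0).
 */
static bool model_cmpxchg16b(struct model_u128 *mem, struct model_u128 *old,
			     struct model_u128 new_)
{
	if (mem->low == old->low && mem->high == old->high) {
		*mem = new_;
		return true;
	}
	*old = *mem;
	return false;
}

/* cmpxchg-style: always returns the value that was in memory. */
static struct model_u128 model_cmpxchg128(struct model_u128 *mem,
					  struct model_u128 old,
					  struct model_u128 new_)
{
	model_cmpxchg16b(mem, &old, new_);
	return old;
}

/* try_cmpxchg-style: returns success and refreshes *old on failure. */
static bool model_try_cmpxchg128(struct model_u128 *mem,
				 struct model_u128 *old,
				 struct model_u128 new_)
{
	return model_cmpxchg16b(mem, old, new_);
}
```

Refreshing *old on failure is what makes the common retry loop `do { new = f(old); } while (!try_cmpxchg(&v, &old, new));` work without an extra reload.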