diff mbox series

[v3] x86/cpu: Use SERIALIZE in sync_core() when available

Message ID 20200806192531.25136-1-ricardo.neri-calderon@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series [v3] x86/cpu: Use SERIALIZE in sync_core() when available | expand

Commit Message

Ricardo Neri Aug. 6, 2020, 7:25 p.m. UTC
The SERIALIZE instruction gives software a way to force the processor to
complete all modifications to flags, registers and memory from previous
instructions and drain all buffered writes to memory before the next
instruction is fetched and executed. Thus, it serves the purpose of
sync_core(). Use it when available.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Cathy Zhang <cathy.zhang@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kyung Min Park <kyung.min.park@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: linux-edac@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Tony Luck <tony.luck@intel.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
This is a v3 from my two previous submissions [1], [2]. The first three
patches of the series have been merged into Linus' tree. Hence, I am
submitting only this patch for review.

[1]. https://lkml.org/lkml/2020/7/27/8
[2]. https://lkml.org/lkml/2020/8/4/1090

Changes since v2:
 * Support serialize with static_cpu_has() instead of using alternative
   runtime patching directly. (Borislav Petkov)

Changes since v1:
 * Support SERIALIZE using alternative runtime patching.
   (Peter Zijlstra, H. Peter Anvin)
 * Added a note to specify which version of binutils supports SERIALIZE.
   (Peter Zijlstra)
 * Verified that (::: "memory") is used. (H. Peter Anvin)
---
 arch/x86/include/asm/special_insns.h | 2 ++
 arch/x86/include/asm/sync_core.h     | 9 ++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

Comments

Dave Hansen Aug. 6, 2020, 7:57 p.m. UTC | #1
On 8/6/20 12:25 PM, Ricardo Neri wrote:
>  static inline void sync_core(void)
>  {
>  	/*
> -	 * There are quite a few ways to do this.  IRET-to-self is nice
> +	 * Hardware can do this for us if SERIALIZE is available. Otherwise,
> +	 * there are quite a few ways to do this.  IRET-to-self is nice

This seems to imply that things other than SERIALIZE aren't the hardware
doing this.  All of these methods are equally architecturally
serializing *and* provided by the hardware.

I also don't quite get the point of separating the comments from the
code.  Shouldn't this be:

	/*
	 * The SERIALIZE instruction is the most straightforward way to
	 * do this but it not universally available.
	 */
	if (static_cpu_has(X86_FEATURE_SERIALIZE)) {
		asm volatile(__ASM_SERIALIZE ::: "memory");
		return;
	}

	/*
	 * For all other processors, IRET-to-self is nice ...
	 */
	iret_to_self();
Ricardo Neri Aug. 6, 2020, 11:04 p.m. UTC | #2
On Thu, Aug 06, 2020 at 12:57:26PM -0700, Dave Hansen wrote:
> On 8/6/20 12:25 PM, Ricardo Neri wrote:
> >  static inline void sync_core(void)
> >  {
> >  	/*
> > -	 * There are quite a few ways to do this.  IRET-to-self is nice
> > +	 * Hardware can do this for us if SERIALIZE is available. Otherwise,
> > +	 * there are quite a few ways to do this.  IRET-to-self is nice
> 
> This seems to imply that things other than SERIALIZE aren't the hardware
> doing this.  All of these methods are equally architecturally
> serializing *and* provided by the hardware.

Indeed, I can see how the wording may imply that.

> 
> I also don't quite get the point of separating the comments from the
> code.  Shouldn't this be:
> 
> 	/*
> 	 * The SERIALIZE instruction is the most straightforward way to
> 	 * do this but it not universally available.
> 	 */

I regard the comment as describing all the considered options to for
architectural serialization. What about if I add SERIALIZE as another
option? I propose to discuss it towards the end of the comment:

	/*
	 * There are quite a few ways to do this.  IRET-to-self is nice
	 * because it works on every CPU, at any CPL (so it's compatible
	 * with paravirtualization), and it never exits to a hypervisor.
	 * The only down sides are that it's a bit slow (it seems to be
	 * a bit more than 2x slower than the fastest options) and that
	 * it unmasks NMIs.  The "push %cs" is needed because, in
	 * paravirtual environments, __KERNEL_CS may not be a valid CS
	 * value when we do IRET directly.
	 *
	 * In case NMI unmasking or performance ever becomes a problem,
	 * the next best option appears to be MOV-to-CR2 and an
	 * unconditional jump.  That sequence also works on all CPUs,
	 * but it will fault at CPL3 (i.e. Xen PV).
	 *
	 * CPUID is the conventional way, but it's nasty: it doesn't
	 * exist on some 486-like CPUs, and it usually exits to a
	 * hypervisor.
	 *
 	 * The SERIALIZE instruction is the most straightforward way to
 	 * do this as it does not clobber registers or exit to a
	 * hypervisor. However, it is not universally available.
 	 *
	 * Like all of Linux's memory ordering operations, this is a
	 * compiler barrier as well.
	 */

What do you think?

> 	if (static_cpu_has(X86_FEATURE_SERIALIZE)) {
> 		asm volatile(__ASM_SERIALIZE ::: "memory");
> 		return;
> 	}
> 
> 	/*
> 	 * For all other processors, IRET-to-self is nice ...
> 	 */
> 	iret_to_self();
Dave Hansen Aug. 6, 2020, 11:08 p.m. UTC | #3
On 8/6/20 4:04 PM, Ricardo Neri wrote:
> 	 * CPUID is the conventional way, but it's nasty: it doesn't
> 	 * exist on some 486-like CPUs, and it usually exits to a
> 	 * hypervisor.
> 	 *
>  	 * The SERIALIZE instruction is the most straightforward way to
>  	 * do this as it does not clobber registers or exit to a
> 	 * hypervisor. However, it is not universally available.
>  	 *
> 	 * Like all of Linux's memory ordering operations, this is a
> 	 * compiler barrier as well.
> 	 */
> 
> What do you think?

I like what I suggested.  :)

SERIALIZE is best where available.  Do it first, comment it by itself.

Then, go into the long discussion of the other alternatives.  They only
make sense when SERIALIZE isn't there, and the logic for selection there
is substantially more complicated.
Ricardo Neri Aug. 6, 2020, 11:39 p.m. UTC | #4
On Thu, Aug 06, 2020 at 04:08:47PM -0700, Dave Hansen wrote:
> On 8/6/20 4:04 PM, Ricardo Neri wrote:
> > 	 * CPUID is the conventional way, but it's nasty: it doesn't
> > 	 * exist on some 486-like CPUs, and it usually exits to a
> > 	 * hypervisor.
> > 	 *
> >  	 * The SERIALIZE instruction is the most straightforward way to
> >  	 * do this as it does not clobber registers or exit to a
> > 	 * hypervisor. However, it is not universally available.
> >  	 *
> > 	 * Like all of Linux's memory ordering operations, this is a
> > 	 * compiler barrier as well.
> > 	 */
> > 
> > What do you think?
> 
> I like what I suggested.  :)
> 
> SERIALIZE is best where available.  Do it first, comment it by itself.
> 
> Then, go into the long discussion of the other alternatives.  They only
> make sense when SERIALIZE isn't there, and the logic for selection there
> is substantially more complicated.

Sure Dave, I think this layout makes sense. I will rework the comments.

Thanks and BR,
Ricardo
diff mbox series

Patch

diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 59a3e13204c3..25cd67801dda 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -10,6 +10,8 @@ 
 #include <linux/irqflags.h>
 #include <linux/jump_label.h>
 
+/* Instruction opcode for SERIALIZE; supported in binutils >= 2.35. */
+#define __ASM_SERIALIZE ".byte 0xf, 0x1, 0xe8"
 /*
  * Volatile isn't enough to prevent the compiler from reordering the
  * read/write functions for the control registers and messing everything up.
diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
index fdb5b356e59b..b74580413a0b 100644
--- a/arch/x86/include/asm/sync_core.h
+++ b/arch/x86/include/asm/sync_core.h
@@ -5,6 +5,7 @@ 
 #include <linux/preempt.h>
 #include <asm/processor.h>
 #include <asm/cpufeature.h>
+#include <asm/special_insns.h>
 
 #ifdef CONFIG_X86_32
 static inline void iret_to_self(void)
@@ -54,7 +55,8 @@  static inline void iret_to_self(void)
 static inline void sync_core(void)
 {
 	/*
-	 * There are quite a few ways to do this.  IRET-to-self is nice
+	 * Hardware can do this for us if SERIALIZE is available. Otherwise,
+	 * there are quite a few ways to do this.  IRET-to-self is nice
 	 * because it works on every CPU, at any CPL (so it's compatible
 	 * with paravirtualization), and it never exits to a hypervisor.
 	 * The only down sides are that it's a bit slow (it seems to be
@@ -75,6 +77,11 @@  static inline void sync_core(void)
 	 * Like all of Linux's memory ordering operations, this is a
 	 * compiler barrier as well.
 	 */
+	if (static_cpu_has(X86_FEATURE_SERIALIZE)) {
+		asm volatile(__ASM_SERIALIZE ::: "memory");
+		return;
+	}
+
 	iret_to_self();
 }