[v3] x86/cpu: Use SERIALIZE in sync_core() when available

Message ID	20200806192531.25136-1-ricardo.neri-calderon@linux.intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=qw4z=BQ=vger.kernel.org=linux-edac-owner@kernel.org> IronPort-SDR: S93U1g4Kzx9e9Bladk0irKIg5lURiFrJrPoJOFr+7L/Am/fWtajNRexhSYKivPLTX0qsKJZ0V1 x2ixX9CuN9eA== IronPort-SDR: nVMEfMzQrFS2qrV8CO5TL2tsnf0ZZDvw2SEgJxnRR7yUkRFScyeLZ6jWA7IcaFWboeuR1ClA8t 7LrGnk3/ZCVw== From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> To: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@kernel.org>, Borislav Petkov <bp@suse.de>, Andy Lutomirski <luto@kernel.org>, x86@kernel.org Cc: Dave Hansen <dave.hansen@intel.com>, Tony Luck <tony.luck@intel.com>, Cathy Zhang <cathy.zhang@intel.com>, Fenghua Yu <fenghua.yu@intel.com>, "H. Peter Anvin" <hpa@zytor.com>, Kyung Min Park <kyung.min.park@intel.com>, "Peter Zijlstra (Intel)" <peterz@infradead.org>, "Ravi V. Shankar" <ravi.v.shankar@intel.com>, Sean Christopherson <sean.j.christopherson@intel.com>, linux-kernel@vger.kernel.org, Ricardo Neri <ricardo.neri@intel.com>, Ricardo Neri <ricardo.neri-calderon@linux.intel.com>, Dave Hansen <dave.hansen@linux.intel.com>, linux-edac@vger.kernel.org Subject: [PATCH v3] x86/cpu: Use SERIALIZE in sync_core() when available Date: Thu, 6 Aug 2020 12:25:31 -0700 Message-Id: <20200806192531.25136-1-ricardo.neri-calderon@linux.intel.com> Sender: linux-edac-owner@vger.kernel.org Precedence: bulk
Series	[v3] x86/cpu: Use SERIALIZE in sync_core() when available \| expand [v3] x86/cpu: Use SERIALIZE in sync_core() when available

Message ID

20200806192531.25136-1-ricardo.neri-calderon@linux.intel.com (mailing list archive)

State

New, archived

Headers

IronPort-SDR: 
 S93U1g4Kzx9e9Bladk0irKIg5lURiFrJrPoJOFr+7L/Am/fWtajNRexhSYKivPLTX0qsKJZ0V1
 x2ixX9CuN9eA==
IronPort-SDR: 
 nVMEfMzQrFS2qrV8CO5TL2tsnf0ZZDvw2SEgJxnRR7yUkRFScyeLZ6jWA7IcaFWboeuR1ClA8t
 7LrGnk3/ZCVw==
From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
To: Thomas Gleixner <tglx@linutronix.de>,
        Ingo Molnar <mingo@kernel.org>, Borislav Petkov <bp@suse.de>,
        Andy Lutomirski <luto@kernel.org>, x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>,
        Tony Luck <tony.luck@intel.com>,
        Cathy Zhang <cathy.zhang@intel.com>,
        Fenghua Yu <fenghua.yu@intel.com>,
        "H. Peter Anvin" <hpa@zytor.com>,
        Kyung Min Park <kyung.min.park@intel.com>,
        "Peter Zijlstra (Intel)" <peterz@infradead.org>,
        "Ravi V. Shankar" <ravi.v.shankar@intel.com>,
        Sean Christopherson <sean.j.christopherson@intel.com>,
        linux-kernel@vger.kernel.org,
        Ricardo Neri <ricardo.neri@intel.com>,
        Ricardo Neri <ricardo.neri-calderon@linux.intel.com>,
        Dave Hansen <dave.hansen@linux.intel.com>,
        linux-edac@vger.kernel.org
Subject: [PATCH v3] x86/cpu: Use SERIALIZE in sync_core() when available
Date: Thu,  6 Aug 2020 12:25:31 -0700
Message-Id: <20200806192531.25136-1-ricardo.neri-calderon@linux.intel.com>
Sender: linux-edac-owner@vger.kernel.org
Precedence: bulk

Series

[v3] x86/cpu: Use SERIALIZE in sync_core() when available | expand

Commit Message

Ricardo Neri Aug. 6, 2020, 7:25 p.m. UTC

The SERIALIZE instruction gives software a way to force the processor to
complete all modifications to flags, registers and memory from previous
instructions and drain all buffered writes to memory before the next
instruction is fetched and executed. Thus, it serves the purpose of
sync_core(). Use it when available.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Cathy Zhang <cathy.zhang@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kyung Min Park <kyung.min.park@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: linux-edac@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Tony Luck <tony.luck@intel.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
This is a v3 from my two previous submissions [1], [2]. The first three
patches of the series have been merged into Linus' tree. Hence, I am
submitting only this patch for review.

[1]. https://lkml.org/lkml/2020/7/27/8
[2]. https://lkml.org/lkml/2020/8/4/1090

Changes since v2:
 * Support serialize with static_cpu_has() instead of using alternative
   runtime patching directly. (Borislav Petkov)

Changes since v1:
 * Support SERIALIZE using alternative runtime patching.
   (Peter Zijlstra, H. Peter Anvin)
 * Added a note to specify which version of binutils supports SERIALIZE.
   (Peter Zijlstra)
 * Verified that (::: "memory") is used. (H. Peter Anvin)
---
 arch/x86/include/asm/special_insns.h | 2 ++
 arch/x86/include/asm/sync_core.h     | 9 ++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

Comments

Dave Hansen Aug. 6, 2020, 7:57 p.m. UTC | #1

On 8/6/20 12:25 PM, Ricardo Neri wrote:
>  static inline void sync_core(void)
>  {
>  	/*
> -	 * There are quite a few ways to do this.  IRET-to-self is nice
> +	 * Hardware can do this for us if SERIALIZE is available. Otherwise,
> +	 * there are quite a few ways to do this.  IRET-to-self is nice

This seems to imply that things other than SERIALIZE aren't the hardware
doing this.  All of these methods are equally architecturally
serializing *and* provided by the hardware.

I also don't quite get the point of separating the comments from the
code.  Shouldn't this be:

	/*
	 * The SERIALIZE instruction is the most straightforward way to
	 * do this but it not universally available.
	 */
	if (static_cpu_has(X86_FEATURE_SERIALIZE)) {
		asm volatile(__ASM_SERIALIZE ::: "memory");
		return;
	}

	/*
	 * For all other processors, IRET-to-self is nice ...
	 */
	iret_to_self();

Ricardo Neri Aug. 6, 2020, 11:04 p.m. UTC | #2

On Thu, Aug 06, 2020 at 12:57:26PM -0700, Dave Hansen wrote:
> On 8/6/20 12:25 PM, Ricardo Neri wrote:
> >  static inline void sync_core(void)
> >  {
> >  	/*
> > -	 * There are quite a few ways to do this.  IRET-to-self is nice
> > +	 * Hardware can do this for us if SERIALIZE is available. Otherwise,
> > +	 * there are quite a few ways to do this.  IRET-to-self is nice
> 
> This seems to imply that things other than SERIALIZE aren't the hardware
> doing this.  All of these methods are equally architecturally
> serializing *and* provided by the hardware.

Indeed, I can see how the wording may imply that.

> 
> I also don't quite get the point of separating the comments from the
> code.  Shouldn't this be:
> 
> 	/*
> 	 * The SERIALIZE instruction is the most straightforward way to
> 	 * do this but it not universally available.
> 	 */

I regard the comment as describing all the considered options to for
architectural serialization. What about if I add SERIALIZE as another
option? I propose to discuss it towards the end of the comment:

	/*
	 * There are quite a few ways to do this.  IRET-to-self is nice
	 * because it works on every CPU, at any CPL (so it's compatible
	 * with paravirtualization), and it never exits to a hypervisor.
	 * The only down sides are that it's a bit slow (it seems to be
	 * a bit more than 2x slower than the fastest options) and that
	 * it unmasks NMIs.  The "push %cs" is needed because, in
	 * paravirtual environments, __KERNEL_CS may not be a valid CS
	 * value when we do IRET directly.
	 *
	 * In case NMI unmasking or performance ever becomes a problem,
	 * the next best option appears to be MOV-to-CR2 and an
	 * unconditional jump.  That sequence also works on all CPUs,
	 * but it will fault at CPL3 (i.e. Xen PV).
	 *
	 * CPUID is the conventional way, but it's nasty: it doesn't
	 * exist on some 486-like CPUs, and it usually exits to a
	 * hypervisor.
	 *
 	 * The SERIALIZE instruction is the most straightforward way to
 	 * do this as it does not clobber registers or exit to a
	 * hypervisor. However, it is not universally available.
 	 *
	 * Like all of Linux's memory ordering operations, this is a
	 * compiler barrier as well.
	 */

What do you think?

> 	if (static_cpu_has(X86_FEATURE_SERIALIZE)) {
> 		asm volatile(__ASM_SERIALIZE ::: "memory");
> 		return;
> 	}
> 
> 	/*
> 	 * For all other processors, IRET-to-self is nice ...
> 	 */
> 	iret_to_self();

Dave Hansen Aug. 6, 2020, 11:08 p.m. UTC | #3

On 8/6/20 4:04 PM, Ricardo Neri wrote:
> 	 * CPUID is the conventional way, but it's nasty: it doesn't
> 	 * exist on some 486-like CPUs, and it usually exits to a
> 	 * hypervisor.
> 	 *
>  	 * The SERIALIZE instruction is the most straightforward way to
>  	 * do this as it does not clobber registers or exit to a
> 	 * hypervisor. However, it is not universally available.
>  	 *
> 	 * Like all of Linux's memory ordering operations, this is a
> 	 * compiler barrier as well.
> 	 */
> 
> What do you think?

I like what I suggested.  :)

SERIALIZE is best where available.  Do it first, comment it by itself.

Then, go into the long discussion of the other alternatives.  They only
make sense when SERIALIZE isn't there, and the logic for selection there
is substantially more complicated.

Ricardo Neri Aug. 6, 2020, 11:39 p.m. UTC | #4

On Thu, Aug 06, 2020 at 04:08:47PM -0700, Dave Hansen wrote:
> On 8/6/20 4:04 PM, Ricardo Neri wrote:
> > 	 * CPUID is the conventional way, but it's nasty: it doesn't
> > 	 * exist on some 486-like CPUs, and it usually exits to a
> > 	 * hypervisor.
> > 	 *
> >  	 * The SERIALIZE instruction is the most straightforward way to
> >  	 * do this as it does not clobber registers or exit to a
> > 	 * hypervisor. However, it is not universally available.
> >  	 *
> > 	 * Like all of Linux's memory ordering operations, this is a
> > 	 * compiler barrier as well.
> > 	 */
> > 
> > What do you think?
> 
> I like what I suggested.  :)
> 
> SERIALIZE is best where available.  Do it first, comment it by itself.
> 
> Then, go into the long discussion of the other alternatives.  They only
> make sense when SERIALIZE isn't there, and the logic for selection there
> is substantially more complicated.

Sure Dave, I think this layout makes sense. I will rework the comments.

Thanks and BR,
Ricardo

diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 59a3e13204c3..25cd67801dda 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -10,6 +10,8 @@ 
 #include <linux/irqflags.h>
 #include <linux/jump_label.h>
 
+/* Instruction opcode for SERIALIZE; supported in binutils >= 2.35. */
+#define __ASM_SERIALIZE ".byte 0xf, 0x1, 0xe8"
 /*
  * Volatile isn't enough to prevent the compiler from reordering the
  * read/write functions for the control registers and messing everything up.
diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
index fdb5b356e59b..b74580413a0b 100644
--- a/arch/x86/include/asm/sync_core.h
+++ b/arch/x86/include/asm/sync_core.h
@@ -5,6 +5,7 @@ 
 #include <linux/preempt.h>
 #include <asm/processor.h>
 #include <asm/cpufeature.h>
+#include <asm/special_insns.h>
 
 #ifdef CONFIG_X86_32
 static inline void iret_to_self(void)
@@ -54,7 +55,8 @@  static inline void iret_to_self(void)
 static inline void sync_core(void)
 {
 	/*
-	 * There are quite a few ways to do this.  IRET-to-self is nice
+	 * Hardware can do this for us if SERIALIZE is available. Otherwise,
+	 * there are quite a few ways to do this.  IRET-to-self is nice
 	 * because it works on every CPU, at any CPL (so it's compatible
 	 * with paravirtualization), and it never exits to a hypervisor.
 	 * The only down sides are that it's a bit slow (it seems to be
@@ -75,6 +77,11 @@  static inline void sync_core(void)
 	 * Like all of Linux's memory ordering operations, this is a
 	 * compiler barrier as well.
 	 */
+	if (static_cpu_has(X86_FEATURE_SERIALIZE)) {
+		asm volatile(__ASM_SERIALIZE ::: "memory");
+		return;
+	}
+
 	iret_to_self();
 }

[v3] x86/cpu: Use SERIALIZE in sync_core() when available

Commit Message

Comments

Patch