diff mbox series

[v12,01/37] x86/cpufeatures: Add the cpu feature bit for WRMSRNS

Message ID 20231003062458.23552-2-xin3.li@intel.com (mailing list archive)
State New, archived
Headers show
Series x86: enable FRED for x86-64 | expand

Commit Message

Li, Xin3 Oct. 3, 2023, 6:24 a.m. UTC
WRMSRNS is an instruction that behaves exactly like WRMSR, with
the only difference being that it is not a serializing instruction
by default. Under certain conditions, WRMSRNS may replace WRMSR to
improve performance.

Add the CPU feature bit for WRMSRNS.

Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/include/asm/cpufeatures.h       | 1 +
 tools/arch/x86/include/asm/cpufeatures.h | 1 +
 2 files changed, 2 insertions(+)

Comments

Borislav Petkov Nov. 8, 2023, 12:36 p.m. UTC | #1
On Mon, Oct 02, 2023 at 11:24:22PM -0700, Xin Li wrote:
> Subject: Re: [PATCH v12 01/37] x86/cpufeatures: Add the cpu feature bit for WRMSRNS
							  ^^^^

For all your text:

s/cpu/CPU/g

> WRMSRNS is an instruction that behaves exactly like WRMSR, with
> the only difference being that it is not a serializing instruction
> by default. Under certain conditions, WRMSRNS may replace WRMSR to
> improve performance.
> 
> Add the CPU feature bit for WRMSRNS.
> 
> Tested-by: Shan Kang <shan.kang@intel.com>
> Signed-off-by: Xin Li <xin3.li@intel.com>
> ---
>  arch/x86/include/asm/cpufeatures.h       | 1 +
>  tools/arch/x86/include/asm/cpufeatures.h | 1 +
>  2 files changed, 2 insertions(+)

It looks to me like you can merge the first three patches into one as
all they do is add that insn support.

Then, further down in the patchset, it says:

+	if (cpu_feature_enabled(X86_FEATURE_FRED)) {
+		/* WRMSRNS is a baseline feature for FRED. */

but WRMSRNS is not mentioned in the FRED spec "Document Number:
346446-005US, Revision: 5.0" which, according to

https://www.intel.com/content/www/us/en/content-details/780121/flexible-return-and-event-delivery-fred-specification.html

is the latest.

Am I looking at the wrong one?

> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 58cb9495e40f..330876d34b68 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -322,6 +322,7 @@
>  #define X86_FEATURE_FSRS		(12*32+11) /* "" Fast short REP STOSB */
>  #define X86_FEATURE_FSRC		(12*32+12) /* "" Fast short REP {CMPSB,SCASB} */
>  #define X86_FEATURE_LKGS		(12*32+18) /* "" Load "kernel" (userspace) GS */
> +#define X86_FEATURE_WRMSRNS		(12*32+19) /* "" Non-Serializing Write to Model Specific Register instruction */

						  /* "" Non-serializing WRMSR */

is more than enough.

And now I'm wondering: when you're adding a separate CPUID bit, then the
above should be

+       if (cpu_feature_enabled(X86_FEATURE_WRMSRNS)) {
+               /* WRMSRNS is a baseline feature for FRED. */

I see that you're adding a dependency:

+	{ X86_FEATURE_FRED,			X86_FEATURE_WRMSRNS   },

which then means you don't need the X86_FEATURE_WRMSRNS definition at
all and can use X86_FEATURE_FRED only.

So, what's up?

Thx.
Li, Xin3 Nov. 14, 2023, 12:43 a.m. UTC | #2
> Then, further down in the patchset, it says:
> 
> +	if (cpu_feature_enabled(X86_FEATURE_FRED)) {
> +		/* WRMSRNS is a baseline feature for FRED. */
> 
> but WRMSRNS is not mentioned in the FRED spec "Document Number:
> 346446-005US, Revision: 5.0" which, according to
> 
> https://www.intel.com/content/www/us/en/content-details/780121/flexible-
> return-and-event-delivery-fred-specification.html
> 
> is the latest.
> 
> Am I looking at the wrong one?

No.  tglx asked for it:
https://lkml.kernel.org/kvm/87y1h81ht4.ffs@tglx/

> 
> And now I'm wondering: when you're adding a separate CPUID bit, then the
> above should be
> 
> +       if (cpu_feature_enabled(X86_FEATURE_WRMSRNS)) {
> +               /* WRMSRNS is a baseline feature for FRED. */

Because we are doing 
		wrmsrns(MSR_IA32_FRED_RSP0, ...)
here, and X86_FEATURE_WRMSRNS doesn't guarantee MSR_IA32_FRED_RSP0 exists.

Or I missed something?

> 
> I see that you're adding a dependency:
> 
> +	{ X86_FEATURE_FRED,			X86_FEATURE_WRMSRNS   },
> 
> which then means you don't need the X86_FEATURE_WRMSRNS definition at all
> and can use X86_FEATURE_FRED only.
> 
> So, what's up?

FRED just gets the honor to introduce WRMSRNS and its first usage:
https://lkml.kernel.org/kvm/b05e3092-8ba3-f4e1-b5a3-2125944936fd@zytor.com/

Another patch set should replace WRMSR with WRMSRNS, with SERIALIZE added
when needed.

Sorry for the late response, it was a long weekend in the US.

Thanks!
    Xin
Borislav Petkov Nov. 14, 2023, 5:02 a.m. UTC | #3
On Tue, Nov 14, 2023 at 12:43:38AM +0000, Li, Xin3 wrote:
> No.  tglx asked for it:
> https://lkml.kernel.org/kvm/87y1h81ht4.ffs@tglx/

Aha

"According to the CPU folks FRED systems are guaranteed to have WRMSRNS -
I asked for that :). It's just not yet documented."

so I'm going to expect that to appear in the next FRED spec revision...

> Because we are doing 
> 		wrmsrns(MSR_IA32_FRED_RSP0, ...)
> here, and X86_FEATURE_WRMSRNS doesn't guarantee MSR_IA32_FRED_RSP0 exists.
> 
> Or I missed something?

Well, according to what I'm hearing and reading so far:

FRED means WRMSRNS
FRED means MSR_IA32_FRED_RSP0

and if you had to be precise, the code should do:

	if (cpu_feature_enabled(X86_FEATURE_FRED)) {
		if (cpu_feature_enabled(X86_FEATURE_WRMSRNS))
			wrmsrns(MSR_IA32_FRED_RSP0, (unsigned long)task_stack_page(task) + THREAD_SIZE);
		else
			wrmsr(MSR_IA32_FRED_RSP0, (unsigned long)task_stack_page(task) + THREAD_SIZE);
	}

but apparently FRED implies WRMSRNS - not documented anywhere currently
- so you can save yourself one check.

But your version checks FRED if it can do WRMSRNS while there's
a separate WRMSRNS flag and that made me wonder...

> Another patch set should replace WRMSR with WRMSRNS, with SERIALIZE added
> when needed.

I sense someone wants to optimize MSR writes ... :-)

Thx.
Li, Xin3 Nov. 14, 2023, 5:58 a.m. UTC | #4
> and if you had to be precise, the code should do:
> 
> 	if (cpu_feature_enabled(X86_FEATURE_FRED)) {
> 		if (cpu_feature_enabled(X86_FEATURE_WRMSRNS))
> 			wrmsrns(MSR_IA32_FRED_RSP0, (unsigned
> long)task_stack_page(task) + THREAD_SIZE);
> 		else
> 			wrmsr(MSR_IA32_FRED_RSP0, (unsigned
> long)task_stack_page(task) + THREAD_SIZE);
> 	}

This is exactly what tglx wanted to avoid.

And I love the idea "baseline", especially we have a ton of CPU features.

> 
> > Another patch set should replace WRMSR with WRMSRNS, with SERIALIZE
> > added when needed.
> 
> I sense someone wants to optimize MSR writes ... :-)

:-)
diff mbox series

Patch

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 58cb9495e40f..330876d34b68 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -322,6 +322,7 @@ 
 #define X86_FEATURE_FSRS		(12*32+11) /* "" Fast short REP STOSB */
 #define X86_FEATURE_FSRC		(12*32+12) /* "" Fast short REP {CMPSB,SCASB} */
 #define X86_FEATURE_LKGS		(12*32+18) /* "" Load "kernel" (userspace) GS */
+#define X86_FEATURE_WRMSRNS		(12*32+19) /* "" Non-Serializing Write to Model Specific Register instruction */
 #define X86_FEATURE_AMX_FP16		(12*32+21) /* "" AMX fp16 Support */
 #define X86_FEATURE_AVX_IFMA            (12*32+23) /* "" Support for VPMADD52[H,L]UQ */
 #define X86_FEATURE_LAM			(12*32+26) /* Linear Address Masking */
diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
index 798e60b5454b..1b9d86ba5bc2 100644
--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -318,6 +318,7 @@ 
 #define X86_FEATURE_FSRS		(12*32+11) /* "" Fast short REP STOSB */
 #define X86_FEATURE_FSRC		(12*32+12) /* "" Fast short REP {CMPSB,SCASB} */
 #define X86_FEATURE_LKGS		(12*32+18) /* "" Load "kernel" (userspace) GS */
+#define X86_FEATURE_WRMSRNS		(12*32+19) /* "" Non-Serializing Write to Model Specific Register instruction */
 #define X86_FEATURE_AMX_FP16		(12*32+21) /* "" AMX fp16 Support */
 #define X86_FEATURE_AVX_IFMA            (12*32+23) /* "" Support for VPMADD52[H,L]UQ */
 #define X86_FEATURE_LAM			(12*32+26) /* Linear Address Masking */