Message ID | 20250122013438.731416-2-kevinloughlin@google.com (mailing list archive)
---|---
State | New
Series | KVM: SEV: Prefer WBNOINVD over WBINVD for cache maintenance efficiency
On Wed, Jan 22, 2025 at 01:34:37AM +0000, Kevin Loughlin wrote:
> In line with WBINVD usage, add WBNOINVD helper functions. For the
> wbnoinvd() helper, fall back to WBINVD if X86_FEATURE_WBNOINVD is not
> present.
>
> Signed-off-by: Kevin Loughlin <kevinloughlin@google.com>
> ---
>  arch/x86/include/asm/smp.h           |  7 +++++++
>  arch/x86/include/asm/special_insns.h | 15 ++++++++++++++-
>  arch/x86/lib/cache-smp.c             | 12 ++++++++++++
>  3 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
> index ca073f40698f..ecf93a243b83 100644
> --- a/arch/x86/include/asm/smp.h
> +++ b/arch/x86/include/asm/smp.h
> @@ -112,6 +112,7 @@ void native_play_dead(void);
>  void play_dead_common(void);
>  void wbinvd_on_cpu(int cpu);
>  int wbinvd_on_all_cpus(void);
> +int wbnoinvd_on_all_cpus(void);
>
>  void smp_kick_mwait_play_dead(void);
>
> @@ -160,6 +161,12 @@ static inline int wbinvd_on_all_cpus(void)
>  	return 0;
>  }
>
> +static inline int wbnoinvd_on_all_cpus(void)
> +{
> +	wbnoinvd();
> +	return 0;
> +}
> +
>  static inline struct cpumask *cpu_llc_shared_mask(int cpu)
>  {
>  	return (struct cpumask *)cpumask_of(0);
> diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
> index 03e7c2d49559..94640c3491d7 100644
> --- a/arch/x86/include/asm/special_insns.h
> +++ b/arch/x86/include/asm/special_insns.h
> @@ -117,7 +117,20 @@ static inline void wrpkru(u32 pkru)
>
>  static __always_inline void wbinvd(void)
>  {
> -	asm volatile("wbinvd": : :"memory");
> +	asm volatile("wbinvd" : : : "memory");
> +}
> +
> +/*
> + * Cheaper version of wbinvd(). Call when caches
> + * need to be written back but not invalidated.
> + */
> +static __always_inline void wbnoinvd(void)
> +{
> +	/*
> +	 * Use the compatible but more destructive "invalidate"
> +	 * variant when no-invalidate is unavailable.
> +	 */
> +	alternative("wbinvd", "wbnoinvd", X86_FEATURE_WBNOINVD);

The minimal version of binutils the kernel supports is 2.25, which doesn't
know about WBNOINVD.

I think you need to do something like:

	alternative("wbinvd", ".byte 0xf3; wbinvd", X86_FEATURE_WBNOINVD);

Or propose to bump the minimal binutils version.
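[For illustration only: a minimal sketch of the byte-encoded fallback suggested above, assuming WBNOINVD's F3 0F 09 encoding (a REP prefix in front of WBINVD's 0F 09 opcode), so no assembler support for the mnemonic is required. The exact spelling used in the series may differ.]

	/*
	 * Sketch only: hand-encode WBNOINVD (F3 0F 09) so an old binutils
	 * that predates the mnemonic can still assemble the alternative.
	 * Plain WBINVD remains the fallback when X86_FEATURE_WBNOINVD is
	 * not set.
	 */
	static __always_inline void wbnoinvd(void)
	{
		alternative("wbinvd", ".byte 0xf3,0x0f,0x09",
			    X86_FEATURE_WBNOINVD);
	}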
On 1/22/25 01:32, Kirill A. Shutemov wrote:
> On Wed, Jan 22, 2025 at 01:34:37AM +0000, Kevin Loughlin wrote:
>> In line with WBINVD usage, add WBNOINVD helper functions. For the
>> wbnoinvd() helper, fall back to WBINVD if X86_FEATURE_WBNOINVD is not
>> present.
>>
>> Signed-off-by: Kevin Loughlin <kevinloughlin@google.com>
>> ---
>>  arch/x86/include/asm/smp.h           |  7 +++++++
>>  arch/x86/include/asm/special_insns.h | 15 ++++++++++++++-
>>  arch/x86/lib/cache-smp.c             | 12 ++++++++++++
>>  3 files changed, 33 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
>> index ca073f40698f..ecf93a243b83 100644
>> --- a/arch/x86/include/asm/smp.h
>> +++ b/arch/x86/include/asm/smp.h
>> @@ -112,6 +112,7 @@ void native_play_dead(void);
>>  void play_dead_common(void);
>>  void wbinvd_on_cpu(int cpu);
>>  int wbinvd_on_all_cpus(void);
>> +int wbnoinvd_on_all_cpus(void);
>>
>>  void smp_kick_mwait_play_dead(void);
>>
>> @@ -160,6 +161,12 @@ static inline int wbinvd_on_all_cpus(void)
>>  	return 0;
>>  }
>>
>> +static inline int wbnoinvd_on_all_cpus(void)
>> +{
>> +	wbnoinvd();
>> +	return 0;
>> +}
>> +
>>  static inline struct cpumask *cpu_llc_shared_mask(int cpu)
>>  {
>>  	return (struct cpumask *)cpumask_of(0);
>> diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
>> index 03e7c2d49559..94640c3491d7 100644
>> --- a/arch/x86/include/asm/special_insns.h
>> +++ b/arch/x86/include/asm/special_insns.h
>> @@ -117,7 +117,20 @@ static inline void wrpkru(u32 pkru)
>>
>>  static __always_inline void wbinvd(void)
>>  {
>> -	asm volatile("wbinvd": : :"memory");
>> +	asm volatile("wbinvd" : : : "memory");
>> +}
>> +
>> +/*
>> + * Cheaper version of wbinvd(). Call when caches
>> + * need to be written back but not invalidated.
>> + */
>> +static __always_inline void wbnoinvd(void)
>> +{
>> +	/*
>> +	 * Use the compatible but more destructive "invalidate"
>> +	 * variant when no-invalidate is unavailable.
>> +	 */
>> +	alternative("wbinvd", "wbnoinvd", X86_FEATURE_WBNOINVD);
>
> The minimal version of binutils the kernel supports is 2.25, which doesn't
> know about WBNOINVD.
>
> I think you need to do something like:
>
> 	alternative("wbinvd", ".byte 0xf3; wbinvd", X86_FEATURE_WBNOINVD);

I think "rep; wbinvd" would work as well.

Thanks,
Tom

>
> Or propose to bump the minimal binutils version.
>
On 1/22/25 11:39, Tom Lendacky wrote:
>> I think you need to do something like:
>>
>> 	alternative("wbinvd", ".byte 0xf3; wbinvd", X86_FEATURE_WBNOINVD);
> I think "rep; wbinvd" would work as well.

I don't want to bike shed this too much.

But, please, no.

The fact that wbnoinvd can be expressed as "rep; wbinvd" is an
implementation detail. Exposing how it's encoded in the ISA is only
going to add confusion. This:

static __always_inline void wbnoinvd(void)
{
	asm volatile(".byte 0xf3,0x0f,0x09\n\t": : :"memory");
}

is perfectly fine and a billion times less confusing than:

	asm volatile(".byte 0xf3; wbinvd\n\t": : :"memory");
or
	asm volatile("rep; wbinvd\n\t": : :"memory");

It only gets worse if it's mixed in an alternative() with the _actual_
wbinvd.

BTW, I don't think you should be compelled to use alternative() as
opposed to a good old:

	if (cpu_feature_enabled(X86_FEATURE_WBNOINVD))
		...
On Wed, Jan 22, 2025 at 3:16 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 1/22/25 11:39, Tom Lendacky wrote:
> >> I think you need to do something like:
> >>
> >> 	alternative("wbinvd", ".byte 0xf3; wbinvd", X86_FEATURE_WBNOINVD);
> > I think "rep; wbinvd" would work as well.
>
> I don't want to bike shed this too much.
>
> But, please, no.
>
> The fact that wbnoinvd can be expressed as "rep; wbinvd" is an
> implementation detail. Exposing how it's encoded in the ISA is only
> going to add confusion. This:
>
> static __always_inline void wbnoinvd(void)
> {
> 	asm volatile(".byte 0xf3,0x0f,0x09\n\t": : :"memory");
> }
>
> is perfectly fine and a billion times less confusing than:
>
> 	asm volatile(".byte 0xf3; wbinvd\n\t": : :"memory");
> or
> 	asm volatile("rep; wbinvd\n\t": : :"memory");
>
> It only gets worse if it's mixed in an alternative() with the _actual_
> wbinvd.

Works for me. I will explicitly use 0xf3,0x0f,0x09 in v5, which I will
send shortly.

> BTW, I don't think you should be compelled to use alternative() as
> opposed to a good old:
>
> 	if (cpu_feature_enabled(X86_FEATURE_WBNOINVD))
> 		...

Agreed, though I'm leaving it as alternative() for now (both because it
results in fewer checks and because that's what is used in the rest of
the file); please holler if you prefer otherwise. If so, my slight
preference in that case would be to update the whole file stylistically
in a separate commit.
On 1/22/25 16:06, Kevin Loughlin wrote:
>> BTW, I don't think you should be compelled to use alternative() as
>> opposed to a good old:
>>
>> 	if (cpu_feature_enabled(X86_FEATURE_WBNOINVD))
>> 		...
> Agreed, though I'm leaving it as alternative() for now (both because it
> results in fewer checks and because that's what is used in the rest of
> the file); please holler if you prefer otherwise. If so, my slight
> preference in that case would be to update the whole file
> stylistically in a separate commit.

alternative() can make a _lot_ of sense. It's extremely compact in the
code it generates. It messes with compiler optimization, of course, just
like any assembly. But, overall, it's great.

In this case, though, we don't care one bit about code generation or
performance. We're running the world's slowest instruction from an IPI.

As for consistency, special_insns.h is gloriously inconsistent. But only
two instructions use alternatives, and they *need* the asm syntax
because they're passing registers and meaningful constraints in.

The wbinvds don't get passed registers and their constraints are
trivial. This conditional:

	alternative_io(".byte 0x3e; clflush %0",
		       ".byte 0x66; clflush %0",
		       X86_FEATURE_CLFLUSHOPT,
		       "+m" (*(volatile char __force *)__p));

could be written like this:

	if (cpu_feature_enabled(X86_FEATURE_CLFLUSHOPT))
		asm volatile(".byte 0x3e; clflush %0",
			     "+m" (*(volatile char __force *)__p));
	else
		asm volatile(".byte 0x66; clflush %0",
			     "+m" (*(volatile char __force *)__p));

But that's _actively_ ugly. alternative() syntax there makes sense.
Here, it's not ugly at all:

	if (cpu_feature_enabled(X86_FEATURE_WBNOINVD))
		asm volatile(".byte 0xf3,0x0f,0x09\n\t": : :"memory");
	else
		wbinvd();

and it's actually more readable than the alternative() syntax.

So, please just do what makes the code look most readable. Performance
and consistency aren't important. I see absolutely nothing wrong with:

static __always_inline void raw_wbnoinvd(void)
{
	asm volatile(".byte 0xf3,0x0f,0x09\n\t": : :"memory");
}

void wbnoinvd(void)
{
	if (cpu_feature_enabled(X86_FEATURE_WBNOINVD))
		raw_wbnoinvd();
	else
		wbinvd();
}

... except the fact that cpu_feature_enabled() kinda sucks and needs
some work, but that's a whole other can of worms we can leave closed
today.
On Wed, Jan 22, 2025 at 4:33 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 1/22/25 16:06, Kevin Loughlin wrote:
> >> BTW, I don't think you should be compelled to use alternative() as
> >> opposed to a good old:
> >>
> >> 	if (cpu_feature_enabled(X86_FEATURE_WBNOINVD))
> >> 		...
> > Agreed, though I'm leaving it as alternative() for now (both because it
> > results in fewer checks and because that's what is used in the rest of
> > the file); please holler if you prefer otherwise. If so, my slight
> > preference in that case would be to update the whole file
> > stylistically in a separate commit.
>
> alternative() can make a _lot_ of sense. It's extremely compact in the
> code it generates. It messes with compiler optimization, of course, just
> like any assembly. But, overall, it's great.
>
> In this case, though, we don't care one bit about code generation or
> performance. We're running the world's slowest instruction from an IPI.
>
> As for consistency, special_insns.h is gloriously inconsistent. But only
> two instructions use alternatives, and they *need* the asm syntax
> because they're passing registers and meaningful constraints in.
>
> The wbinvds don't get passed registers and their constraints are
> trivial. This conditional:
>
> 	alternative_io(".byte 0x3e; clflush %0",
> 		       ".byte 0x66; clflush %0",
> 		       X86_FEATURE_CLFLUSHOPT,
> 		       "+m" (*(volatile char __force *)__p));
>
> could be written like this:
>
> 	if (cpu_feature_enabled(X86_FEATURE_CLFLUSHOPT))
> 		asm volatile(".byte 0x3e; clflush %0",
> 			     "+m" (*(volatile char __force *)__p));
> 	else
> 		asm volatile(".byte 0x66; clflush %0",
> 			     "+m" (*(volatile char __force *)__p));
>
> But that's _actively_ ugly. alternative() syntax there makes sense.
> Here, it's not ugly at all:
>
> 	if (cpu_feature_enabled(X86_FEATURE_WBNOINVD))
> 		asm volatile(".byte 0xf3,0x0f,0x09\n\t": : :"memory");
> 	else
> 		wbinvd();
>
> and it's actually more readable than the alternative() syntax.
>
> So, please just do what makes the code look most readable. Performance
> and consistency aren't important. I see absolutely nothing wrong with:
>
> static __always_inline void raw_wbnoinvd(void)
> {
> 	asm volatile(".byte 0xf3,0x0f,0x09\n\t": : :"memory");
> }
>
> void wbnoinvd(void)
> {
> 	if (cpu_feature_enabled(X86_FEATURE_WBNOINVD))
> 		raw_wbnoinvd();
> 	else
> 		wbinvd();
> }
>
> ... except the fact that cpu_feature_enabled() kinda sucks and needs
> some work, but that's a whole other can of worms we can leave closed
> today.

Thanks for the detailed explanation; you've convinced me. v6 coming up
shortly (using native_wbnoinvd() instead of raw_wbnoinvd(), as you
named the proposed wrapper in your reply to v5).
On Wed, Jan 22, 2025 at 4:58 PM Kevin Loughlin <kevinloughlin@google.com> wrote:
>
> On Wed, Jan 22, 2025 at 4:33 PM Dave Hansen <dave.hansen@intel.com> wrote:
> >
> > On 1/22/25 16:06, Kevin Loughlin wrote:
> > >> BTW, I don't think you should be compelled to use alternative() as
> > >> opposed to a good old:
> > >>
> > >> 	if (cpu_feature_enabled(X86_FEATURE_WBNOINVD))
> > >> 		...
> > > Agreed, though I'm leaving it as alternative() for now (both because it
> > > results in fewer checks and because that's what is used in the rest of
> > > the file); please holler if you prefer otherwise. If so, my slight
> > > preference in that case would be to update the whole file
> > > stylistically in a separate commit.
> >
> > alternative() can make a _lot_ of sense. It's extremely compact in the
> > code it generates. It messes with compiler optimization, of course, just
> > like any assembly. But, overall, it's great.
> >
> > In this case, though, we don't care one bit about code generation or
> > performance. We're running the world's slowest instruction from an IPI.
> >
> > As for consistency, special_insns.h is gloriously inconsistent. But only
> > two instructions use alternatives, and they *need* the asm syntax
> > because they're passing registers and meaningful constraints in.
> >
> > The wbinvds don't get passed registers and their constraints are
> > trivial. This conditional:
> >
> > 	alternative_io(".byte 0x3e; clflush %0",
> > 		       ".byte 0x66; clflush %0",
> > 		       X86_FEATURE_CLFLUSHOPT,
> > 		       "+m" (*(volatile char __force *)__p));
> >
> > could be written like this:
> >
> > 	if (cpu_feature_enabled(X86_FEATURE_CLFLUSHOPT))
> > 		asm volatile(".byte 0x3e; clflush %0",
> > 			     "+m" (*(volatile char __force *)__p));
> > 	else
> > 		asm volatile(".byte 0x66; clflush %0",
> > 			     "+m" (*(volatile char __force *)__p));
> >
> > But that's _actively_ ugly. alternative() syntax there makes sense.
> > Here, it's not ugly at all:
> >
> > 	if (cpu_feature_enabled(X86_FEATURE_WBNOINVD))
> > 		asm volatile(".byte 0xf3,0x0f,0x09\n\t": : :"memory");
> > 	else
> > 		wbinvd();
> >
> > and it's actually more readable than the alternative() syntax.
> >
> > So, please just do what makes the code look most readable. Performance
> > and consistency aren't important. I see absolutely nothing wrong with:
> >
> > static __always_inline void raw_wbnoinvd(void)
> > {
> > 	asm volatile(".byte 0xf3,0x0f,0x09\n\t": : :"memory");
> > }
> >
> > void wbnoinvd(void)
> > {
> > 	if (cpu_feature_enabled(X86_FEATURE_WBNOINVD))
> > 		raw_wbnoinvd();
> > 	else
> > 		wbinvd();
> > }
> >
> > ... except the fact that cpu_feature_enabled() kinda sucks and needs
> > some work, but that's a whole other can of worms we can leave closed
> > today.
>
> Thanks for the detailed explanation; you've convinced me. v6 coming up
> shortly (using native_wbnoinvd() instead of raw_wbnoinvd(), as you
> named the proposed wrapper in your reply to v5).

Actually, we may still want to use alternative() for the following
reason: Kirill noted in ad3fe525b950 ("x86/mm: Unify pgtable_l5_enabled
usage in early boot code") that cpu_feature_enabled() can't be used in
early boot code, which would make the wbnoinvd() implementation
incompatible with early boot code if it's ever wanted there. In
contrast, I believe alternative() will just fall back to WBINVD until
apply_alternatives() runs, at which point it will be replaced with
WBNOINVD if the processor supports it.
I'll still use the native_wbnoinvd() wrapper to clarify the encoding as
you suggested, but does this reasoning for keeping alternative() make
sense to you, Dave? Or am I missing something?
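[For reference, a sketch of the shape being converged on above; this is not the exact v6 patch, and the name native_wbnoinvd() is simply taken from the discussion. The idea: keep the explicit F3 0F 09 encoding behind a thin wrapper for readability, and have wbnoinvd() itself use alternative() so that the fallback before apply_alternatives() runs, or on CPUs without the feature, is plain WBINVD.]

	/* Raw, unconditional WBNOINVD, hand-encoded as F3 0F 09. */
	static __always_inline void native_wbnoinvd(void)
	{
		asm volatile(".byte 0xf3,0x0f,0x09" : : : "memory");
	}

	static __always_inline void wbnoinvd(void)
	{
		/*
		 * Patched to WBNOINVD by apply_alternatives() when the CPU
		 * supports it; until then (and without the feature) this
		 * executes the more destructive WBINVD.
		 */
		alternative("wbinvd", ".byte 0xf3,0x0f,0x09",
			    X86_FEATURE_WBNOINVD);
	}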
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index ca073f40698f..ecf93a243b83 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -112,6 +112,7 @@ void native_play_dead(void);
 void play_dead_common(void);
 void wbinvd_on_cpu(int cpu);
 int wbinvd_on_all_cpus(void);
+int wbnoinvd_on_all_cpus(void);
 
 void smp_kick_mwait_play_dead(void);
 
@@ -160,6 +161,12 @@ static inline int wbinvd_on_all_cpus(void)
 	return 0;
 }
 
+static inline int wbnoinvd_on_all_cpus(void)
+{
+	wbnoinvd();
+	return 0;
+}
+
 static inline struct cpumask *cpu_llc_shared_mask(int cpu)
 {
 	return (struct cpumask *)cpumask_of(0);
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 03e7c2d49559..94640c3491d7 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -117,7 +117,20 @@ static inline void wrpkru(u32 pkru)
 
 static __always_inline void wbinvd(void)
 {
-	asm volatile("wbinvd": : :"memory");
+	asm volatile("wbinvd" : : : "memory");
+}
+
+/*
+ * Cheaper version of wbinvd(). Call when caches
+ * need to be written back but not invalidated.
+ */
+static __always_inline void wbnoinvd(void)
+{
+	/*
+	 * Use the compatible but more destructive "invalidate"
+	 * variant when no-invalidate is unavailable.
+	 */
+	alternative("wbinvd", "wbnoinvd", X86_FEATURE_WBNOINVD);
 }
 
 static inline unsigned long __read_cr4(void)
diff --git a/arch/x86/lib/cache-smp.c b/arch/x86/lib/cache-smp.c
index 7af743bd3b13..7ac5cca53031 100644
--- a/arch/x86/lib/cache-smp.c
+++ b/arch/x86/lib/cache-smp.c
@@ -20,3 +20,15 @@ int wbinvd_on_all_cpus(void)
 	return 0;
 }
 EXPORT_SYMBOL(wbinvd_on_all_cpus);
+
+static void __wbnoinvd(void *dummy)
+{
+	wbnoinvd();
+}
+
+int wbnoinvd_on_all_cpus(void)
+{
+	on_each_cpu(__wbnoinvd, NULL, 1);
+	return 0;
+}
+EXPORT_SYMBOL(wbnoinvd_on_all_cpus);
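[As a usage illustration (hypothetical call site, not part of this patch): the new helper is a drop-in replacement wherever every CPU's caches are currently flushed with wbinvd_on_all_cpus() but the lines only need to be written back, not discarded.]

	/* Hypothetical call site, for illustration only. */

	/* Before: write back and invalidate every cache line on all CPUs. */
	wbinvd_on_all_cpus();

	/*
	 * After: write back dirty lines but leave them valid; degrades to
	 * WBINVD on CPUs without X86_FEATURE_WBNOINVD.
	 */
	wbnoinvd_on_all_cpus();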
In line with WBINVD usage, add WBNOINVD helper functions. For the
wbnoinvd() helper, fall back to WBINVD if X86_FEATURE_WBNOINVD is not
present.

Signed-off-by: Kevin Loughlin <kevinloughlin@google.com>
---
 arch/x86/include/asm/smp.h           |  7 +++++++
 arch/x86/include/asm/special_insns.h | 15 ++++++++++++++-
 arch/x86/lib/cache-smp.c             | 12 ++++++++++++
 3 files changed, 33 insertions(+), 1 deletion(-)