[3/3] x86: correct fencing around CLFLUSH

Message ID 7026f7df-1f70-0018-a6eb-b7e043b279d8@suse.com (mailing list archive)
State Superseded
Series x86: correct fencing around CLFLUSH (+some tidying)

Commit Message

Jan Beulich Feb. 23, 2022, 10:13 a.m. UTC
As noted in the context of 3330013e6739 ("VT-d / x86: re-arrange cache
syncing"): While cache_writeback() has the SFENCE on the correct side of
CLFLUSHOPT, flush_area_local() doesn't. While I can't prove it due to
lacking a copy of the old SDM version, I can only assume this placement
was a result of what had been described there originally. In any event,
recent versions of the SDM have been telling us otherwise.

For AMD (and Hygon, albeit there it's benign, since all their CPUs are
expected to support CLFLUSHOPT) the situation is more complex: MFENCE is
needed ahead and/or after CLFLUSH when the CPU doesn't also support
CLFLUSHOPT. (It's "and" in our case, as we cannot know what the caller's
needs are.)

Fixes: 623c720fc8da3 ("x86: use CLFLUSHOPT when available")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
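
To make the ordering requirements concrete, here is a minimal standalone
sketch (illustrative only, not the Xen code itself) of the two fencing
patterns described above: MFENCE on both sides of a CLFLUSH loop for
CLFLUSHOPT-less AMD/Hygon CPUs, and a trailing SFENCE for the weakly
ordered CLFLUSHOPT:

#include <stddef.h>

/* CLFLUSH on CLFLUSHOPT-less AMD/Hygon: fence on both sides. */
static void flush_clflush(const char *p, size_t n, size_t line)
{
    size_t i;

    asm volatile ( "mfence" ::: "memory" );   /* order earlier stores */
    for ( i = 0; i < n; i += line )
        asm volatile ( "clflush %0" :: "m" (p[i]) );
    asm volatile ( "mfence" ::: "memory" );   /* order the flushes themselves */
}

/* CLFLUSHOPT (any vendor): only a trailing SFENCE is needed. */
static void flush_clflushopt(const char *p, size_t n, size_t line)
{
    size_t i;

    for ( i = 0; i < n; i += line )
        asm volatile ( "clflushopt %0" :: "m" (p[i]) );
    asm volatile ( "sfence" ::: "memory" );   /* CLFLUSHOPT is weakly ordered */
}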

Comments

Andrew Cooper Feb. 23, 2022, 12:33 p.m. UTC | #1
On 23/02/2022 10:13, Jan Beulich wrote:
> As noted in the context of 3330013e6739 ("VT-d / x86: re-arrange cache
> syncing"): While cache_writeback() has the SFENCE on the correct side of
> CLFLUSHOPT, flush_area_local() doesn't. While I can't prove it due to
> lacking a copy of the old SDM version, I can only assume this placement
> was a result of what had been described there originally. In any event,
> recent versions of the SDM have been telling us otherwise.
>
> For AMD (and Hygon, albeit there it's benign, since all their CPUs are
> expected to support CLFLUSHOPT) the situation is more complex: MFENCE is
> needed ahead and/or after CLFLUSH when the CPU doesn't also support
> CLFLUSHOPT. (It's "and" in our case, as we cannot know what the caller's
> needs are.)
>
> Fixes: 623c720fc8da3 ("x86: use CLFLUSHOPT when available")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -346,9 +346,14 @@ void __init early_cpu_init(void)
>  	       c->x86_model, c->x86_model, c->x86_mask, eax);
>  
>  	if (c->cpuid_level >= 7)
> -		cpuid_count(7, 0, &eax, &ebx,
> +		cpuid_count(7, 0, &eax,
> +                            &c->x86_capability[FEATURESET_7b0],
>                              &c->x86_capability[FEATURESET_7c0], &edx);
>  
> +	if (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ||
> +	    cpu_has(c, X86_FEATURE_CLFLUSHOPT))
> +		setup_force_cpu_cap(X86_FEATURE_CLFLUSH_NO_MFENCE);

This is somewhat ugly, not only because it presumes that the early AMD
implementation peculiarities are common.

It also has a corner case that goes wrong when the BSP enumerates
CLFLUSHOPT but later CPUs don't.  In this case the workaround will be
disengaged even when it's not safe to.

Most importantly however, it makes the one current slow usecase (VT-d on
early Intel with only CLFLUSH) even slower.


I suggest inverting this workaround (and IMO, using the bug
infrastructure, because that's exactly what it is) and leaving a big
warning by the function saying "don't use on AMD before alternatives
have run" or something.  It's quite possibly a problem we'll never need
to solve in practice, although my plans for overhauling CPUID scanning
will probably fix it because we can move the first alternatives pass far
earlier as a consequence.


> +
>  	eax = cpuid_eax(0x80000000);
>  	if ((eax >> 16) == 0x8000 && eax >= 0x80000008) {
>  		ebx = eax >= 0x8000001f ? cpuid_ebx(0x8000001f) : 0;
> --- a/xen/arch/x86/flushtlb.c
> +++ b/xen/arch/x86/flushtlb.c
> @@ -245,12 +245,15 @@ unsigned int flush_area_local(const void
>               c->x86_clflush_size && c->x86_cache_size && sz &&
>               ((sz >> 10) < c->x86_cache_size) )
>          {
> -            alternative("", "sfence", X86_FEATURE_CLFLUSHOPT);
> +            alternative("mfence", , X86_FEATURE_CLFLUSH_NO_MFENCE);

As an aside, the absence of "" is a very weird parse, and only compiles
because this is a macro rather than a function.

I'd request that it stays, simply to make the code read more like regular C.

~Andrew
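
Concretely, the inversion suggested above might look like the following
sketch, assuming Xen's existing X86_BUG() synthetic numbering (the bug
number and the name X86_BUG_CLFLUSH_MFENCE are illustrative here):

/* xen/arch/x86/include/asm/cpufeatures.h: tag the problem, not its absence. */
#define X86_BUG_CLFLUSH_MFENCE X86_BUG(2) /* MFENCE needed to serialise CLFLUSH */

/* xen/arch/x86/cpu/common.c: only CLFLUSHOPT-less AMD/Hygon CPUs need it. */
if ( (c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) &&
     !cpu_has(c, X86_FEATURE_CLFLUSHOPT) )
    setup_force_cpu_cap(X86_BUG_CLFLUSH_MFENCE);

/*
 * xen/arch/x86/flushtlb.c: MFENCE is now patched *in* on affected CPUs,
 * so the unpatched default stays fast on the CLFLUSH-only Intel path.
 * NB: before alternatives have run, affected AMD CPUs execute no fence
 * here - hence the suggested "don't use before alternatives" warning.
 */
alternative("", "mfence", X86_BUG_CLFLUSH_MFENCE);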
Jan Beulich Feb. 24, 2022, 12:13 p.m. UTC | #2
On 23.02.2022 13:33, Andrew Cooper wrote:
> On 23/02/2022 10:13, Jan Beulich wrote:
>> --- a/xen/arch/x86/cpu/common.c
>> +++ b/xen/arch/x86/cpu/common.c
>> @@ -346,9 +346,14 @@ void __init early_cpu_init(void)
>>  	       c->x86_model, c->x86_model, c->x86_mask, eax);
>>  
>>  	if (c->cpuid_level >= 7)
>> -		cpuid_count(7, 0, &eax, &ebx,
>> +		cpuid_count(7, 0, &eax,
>> +                            &c->x86_capability[FEATURESET_7b0],
>>                              &c->x86_capability[FEATURESET_7c0], &edx);
>>  
>> +	if (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ||
>> +	    cpu_has(c, X86_FEATURE_CLFLUSHOPT))
>> +		setup_force_cpu_cap(X86_FEATURE_CLFLUSH_NO_MFENCE);
> 
> This is somewhat ugly, not only because it presumes that the early AMD
> implementation peculiarities are common.
> 
> It also has a corner case that goes wrong when the BSP enumerates
> CLFLUSHOPT but later CPUs don't.  In this case the workaround will be
> disengaged even when it's not safe to.

You realize though that this cannot be taken care of when alternatives
patching is involved? Are you suggesting to not use patching just to
deal with an asymmetry we don't really deal with anywhere else?

> Most importantly however, it makes the one current slow usecase (VT-d on
> early Intel with only CLFLUSH) even slower.
> 
> 
> I suggest inverting this workaround (and IMO, using the bug
> infrastructure, because that's exactly what it is) and leaving a big
> warning by the function saying "don't use on AMD before alternatives
> have run" or something.  It's quite possibly a problem we'll never need
> to solve in practice, although my plans for overhauling CPUID scanning
> will probably fix it because we can move the first alternatives pass far
> earlier as a consequence.

I've switched to marking this as a BUG, but I'm not sure about such a
comment: It really depends on the use whether it would be safe
without the MFENCEs. (We also aren't aware of problems, despite them
having been missing forever.) Furthermore it's not overly likely for
anyone to look here when adding a new use of FLUSH_CACHE. I'd
therefore rather consider it best effort behavior until patching has
taken place.

>> --- a/xen/arch/x86/flushtlb.c
>> +++ b/xen/arch/x86/flushtlb.c
>> @@ -245,12 +245,15 @@ unsigned int flush_area_local(const void
>>               c->x86_clflush_size && c->x86_cache_size && sz &&
>>               ((sz >> 10) < c->x86_cache_size) )
>>          {
>> -            alternative("", "sfence", X86_FEATURE_CLFLUSHOPT);
>> +            alternative("mfence", , X86_FEATURE_CLFLUSH_NO_MFENCE);
> 
> As an aside, the absence of "" is a very weird parse, and only compiles
> because this is a macro rather than a function.
> 
> I'd request that it stays, simply to make the code read more like regular C.

To be honest I was half expecting this feedback. For now I've put
back the quotes, but I have a change halfway ready which will
eliminate the need for quotes in the same cases where the
assembler macros don't require their use (as you may guess: by
using the assembler macros instead of maintaining redundant C
infrastructure). I guess at that point it would become a little
inconsistent if quotes were present just to express "empty". Also
if I'm not mistaken this isn't the only place where we make use of
macro arguments being allowed to be empty.

Jan
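
Jan's closing point is standard C: the preprocessor permits a macro
argument to be empty (C99 §6.10.3), in which case the corresponding
parameter simply expands to nothing. A minimal demonstration:

#include <stdio.h>

/* A parameter may receive an empty argument; it expands to nothing. */
#define SEQ(a, b, c) a b c

int main(void)
{
    /* Mirrors alternative("mfence", , FEATURE): the middle argument is empty,
     * so this expands to puts("m" "fence") and prints "mfence". */
    puts(SEQ("m", , "fence"));
    return 0;
}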

Patch

--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -346,9 +346,14 @@  void __init early_cpu_init(void)
 	       c->x86_model, c->x86_model, c->x86_mask, eax);
 
 	if (c->cpuid_level >= 7)
-		cpuid_count(7, 0, &eax, &ebx,
+		cpuid_count(7, 0, &eax,
+                            &c->x86_capability[FEATURESET_7b0],
                             &c->x86_capability[FEATURESET_7c0], &edx);
 
+	if (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ||
+	    cpu_has(c, X86_FEATURE_CLFLUSHOPT))
+		setup_force_cpu_cap(X86_FEATURE_CLFLUSH_NO_MFENCE);
+
 	eax = cpuid_eax(0x80000000);
 	if ((eax >> 16) == 0x8000 && eax >= 0x80000008) {
 		ebx = eax >= 0x8000001f ? cpuid_ebx(0x8000001f) : 0;
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -245,12 +245,15 @@  unsigned int flush_area_local(const void
              c->x86_clflush_size && c->x86_cache_size && sz &&
              ((sz >> 10) < c->x86_cache_size) )
         {
-            alternative("", "sfence", X86_FEATURE_CLFLUSHOPT);
+            alternative("mfence", , X86_FEATURE_CLFLUSH_NO_MFENCE);
             for ( i = 0; i < sz; i += c->x86_clflush_size )
                 alternative_input("ds; clflush %0",
                                   "data16 clflush %0",      /* clflushopt */
                                   X86_FEATURE_CLFLUSHOPT,
                                   "m" (((const char *)va)[i]));
+            alternative_2("mfence",
+                          , X86_FEATURE_CLFLUSH_NO_MFENCE,
+                          "sfence", X86_FEATURE_CLFLUSHOPT);
             flags &= ~FLUSH_CACHE;
         }
         else
@@ -274,6 +277,8 @@  void cache_writeback(const void *addr, u
     unsigned int clflush_size = current_cpu_data.x86_clflush_size ?: 16;
     const void *end = addr + size;
 
+    alternative("mfence", , X86_FEATURE_CLFLUSH_NO_MFENCE);
+
     addr -= (unsigned long)addr & (clflush_size - 1);
     for ( ; addr < end; addr += clflush_size )
     {
--- a/xen/arch/x86/include/asm/cpufeatures.h
+++ b/xen/arch/x86/include/asm/cpufeatures.h
@@ -24,7 +24,7 @@  XEN_CPUFEATURE(APERFMPERF,        X86_SY
 XEN_CPUFEATURE(MFENCE_RDTSC,      X86_SYNTH( 9)) /* MFENCE synchronizes RDTSC */
 XEN_CPUFEATURE(XEN_SMEP,          X86_SYNTH(10)) /* SMEP gets used by Xen itself */
 XEN_CPUFEATURE(XEN_SMAP,          X86_SYNTH(11)) /* SMAP gets used by Xen itself */
-/* Bit 12 - unused. */
+XEN_CPUFEATURE(CLFLUSH_NO_MFENCE, X86_SYNTH(12)) /* No MFENCE needed to serialize CLFLUSH */
 XEN_CPUFEATURE(IND_THUNK_LFENCE,  X86_SYNTH(13)) /* Use IND_THUNK_LFENCE */
 XEN_CPUFEATURE(IND_THUNK_JMP,     X86_SYNTH(14)) /* Use IND_THUNK_JMP */
 XEN_CPUFEATURE(SC_NO_BRANCH_HARDEN, X86_SYNTH(15)) /* (Disable) Conditional branch hardening */
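
As a reading aid, an illustrative C model (not Xen code, and assuming the
usual alternative_2() semantics where the later entry wins when both
features are set) of what the fences in flush_area_local() resolve to
after patching:

#include <stdbool.h>

/* What the pre-loop alternative() becomes after patching. */
static void fence_before(bool clflush_no_mfence)
{
    if ( !clflush_no_mfence )                    /* CLFLUSHOPT-less AMD/Hygon */
        asm volatile ( "mfence" ::: "memory" );
}

/*
 * What the post-loop alternative_2() becomes.  CLFLUSHOPT implies
 * CLFLUSH_NO_MFENCE per the common.c hunk, so the "sfence" replacement,
 * being patched last, takes effect on CLFLUSHOPT-capable CPUs.
 */
static void fence_after(bool clflush_no_mfence, bool clflushopt)
{
    if ( clflushopt )
        asm volatile ( "sfence" ::: "memory" );  /* order weakly ordered flushes */
    else if ( !clflush_no_mfence )
        asm volatile ( "mfence" ::: "memory" );  /* CLFLUSHOPT-less AMD/Hygon */
    /* else: CLFLUSH-only Intel needs no trailing fence at all. */
}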