[2/4] x86/spec-ctrl: Synthesize RSBA/RRSBA bits with older microcode

Message ID 20230526110656.4018711-4-andrew.cooper3@citrix.com (mailing list archive)
State New, archived

Commit Message

Andrew Cooper May 26, 2023, 11:06 a.m. UTC
In order to level a VM safely for migration, the toolstack needs to know the
RSBA/RRSBA properties of the CPU, whether or not they happen to be enumerated.

Synthesize the bits when missing.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
---
 xen/arch/x86/include/asm/cpufeature.h |  1 +
 xen/arch/x86/spec_ctrl.c              | 50 +++++++++++++++++++++++----
 2 files changed, 44 insertions(+), 7 deletions(-)

Comments

Jan Beulich May 30, 2023, 9:18 a.m. UTC | #1
On 26.05.2023 13:06, Andrew Cooper wrote:
> @@ -687,6 +697,32 @@ static bool __init retpoline_calculations(void)
>      if ( safe )
>          return true;
>  
> +    /*
> +     * The meaning of the RSBA and RRSBA bits has evolved over time.  The
> +     * agreed upon meaning at the time of writing (May 2023) is thus:
> +     *
> +     * - RSBA (RSB Alternative) means that an RSB may fall back to an
> +     *   alternative predictor on underflow.  Skylake uarch and later all have
> +     *   this property.  Broadwell too, when running microcode versions prior
> +     *   to Jan 2018.
> +     *
> +     * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces
> +     *   tagging of predictions with the mode in which they were learned.  So
> +     *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
> +     *
> +     * Some parts (Broadwell) are not expected to ever enumerate this
> +     * behaviour directly.  Other parts have differing enumeration with
> +     * microcode version.  Fix up Xen's idea, so we can advertise them safely
> +     * to guests, and so toolstacks can level a VM safely for migration.
> +     */

If the difference between the two is whether eIBRS is active (as you did
word it yet more explicitly in e.g. [1]), then ...

> + unsafe_maybe_fixup_rrsba:
> +    if ( !cpu_has_rrsba )
> +        setup_force_cpu_cap(X86_FEATURE_RRSBA);
> +
> + unsafe_maybe_fixup_rsba:
> +    if ( !cpu_has_rsba )
> +        setup_force_cpu_cap(X86_FEATURE_RSBA);
> +
>      return false;
>  }

... can both actually be active at the same time? IOW is there a "return
false" missing ahead of the 2nd label?

Not having looked at further patches yet it also strikes me as odd that
each of the two labels is used exactly once only. Leaving the shared
comment aside, imo this would then better avoid "goto".

Finally, what use are the two if()s? There's nothing wrong with forcing
a feature which is already available.

Jan

[1] https://lore.kernel.org/lkml/f43c3c33-f8b9-e764-709d-b3864d2bd9f8@citrix.com/
Andrew Cooper May 30, 2023, 10 a.m. UTC | #2
On 30/05/2023 10:18 am, Jan Beulich wrote:
> On 26.05.2023 13:06, Andrew Cooper wrote:
>> @@ -687,6 +697,32 @@ static bool __init retpoline_calculations(void)
>>      if ( safe )
>>          return true;
>>  
>> +    /*
>> +     * The meaning of the RSBA and RRSBA bits has evolved over time.  The
>> +     * agreed upon meaning at the time of writing (May 2023) is thus:
>> +     *
>> +     * - RSBA (RSB Alternative) means that an RSB may fall back to an
>> +     *   alternative predictor on underflow.  Skylake uarch and later all have
>> +     *   this property.  Broadwell too, when running microcode versions prior
>> +     *   to Jan 2018.
>> +     *
>> +     * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces
>> +     *   tagging of predictions with the mode in which they were learned.  So
>> +     *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
>> +     *
>> +     * Some parts (Broadwell) are not expected to ever enumerate this
>> +     * behaviour directly.  Other parts have differing enumeration with
>> +     * microcode version.  Fix up Xen's idea, so we can advertise them safely
>> +     * to guests, and so toolstacks can level a VM safely for migration.
>> +     */
> If the difference between the two is whether eIBRS is active (as you did
> word it yet more explicitly in e.g. [1]), then ...
>
>> + unsafe_maybe_fixup_rrsba:
>> +    if ( !cpu_has_rrsba )
>> +        setup_force_cpu_cap(X86_FEATURE_RRSBA);
>> +
>> + unsafe_maybe_fixup_rsba:
>> +    if ( !cpu_has_rsba )
>> +        setup_force_cpu_cap(X86_FEATURE_RSBA);
>> +
>>      return false;
>>  }
> ... can both actually be active at the same time? IOW is there a "return
> false" missing ahead of the 2nd label?

I've already got a question out to Intel to this effect.  (I didn't say
the enumeration made much sense...)

> Not having looked at further patches yet it also strikes me as odd that
> each of the two labels is used exactly once only. Leaving the shared
> comment aside, imo this would then better avoid "goto".

They're both used twice, not once.  You asked why it wasn't "return
safe;" in the previous patch?  Well this is why.

> Finally, what use are the two if()s? There's nothing wrong with forcing
> a feature which is already available.

It breaks is_forced_cpu_cap().

Also, I considered having a printk() here.  I've still got it around in
a debug patch, but I decided against it.

~Andrew
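
The point about is_forced_cpu_cap() can be illustrated with a toy model. The sketch below is not Xen's implementation; every name in it (toy_cpu, toy_force_cap, toy_is_forced, toy_fixup_guarded) is hypothetical. It assumes only that forcing a capability records a separate "forced" mark alongside the capability bit itself, so an unconditional force would mark a hardware-enumerated bit as synthesized.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of capability tracking; not Xen code. */
enum toy_cap { TOY_CAP_RSBA, TOY_CAP_RRSBA, TOY_NCAPS };

struct toy_cpu {
    bool present[TOY_NCAPS]; /* bit is visible, enumerated or synthesized */
    bool forced[TOY_NCAPS];  /* bit was set by software, not by hardware */
};

/* Unconditionally force a capability and remember that it was forced. */
static void toy_force_cap(struct toy_cpu *c, enum toy_cap cap)
{
    assert(cap < TOY_NCAPS);
    c->forced[cap] = true;
    c->present[cap] = true;
}

/* Report whether software synthesized the capability. */
static bool toy_is_forced(const struct toy_cpu *c, enum toy_cap cap)
{
    assert(cap < TOY_NCAPS);
    return c->forced[cap];
}

/* Guarded fixup, shaped like the if() in the patch: force only if missing. */
static void toy_fixup_guarded(struct toy_cpu *c, enum toy_cap cap)
{
    assert(cap < TOY_NCAPS);
    if ( !c->present[cap] )
        toy_force_cap(c, cap);
}
```

With the guard, a CPU that already enumerates the bit keeps toy_is_forced() false, while a CPU missing the bit ends up with the bit present and marked as forced. Calling toy_force_cap() directly on the first CPU would report it as forced even though the hardware enumerated it.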
Jan Beulich May 30, 2023, 10:19 a.m. UTC | #3
On 30.05.2023 12:00, Andrew Cooper wrote:
> On 30/05/2023 10:18 am, Jan Beulich wrote:
>> On 26.05.2023 13:06, Andrew Cooper wrote:
>>> @@ -687,6 +697,32 @@ static bool __init retpoline_calculations(void)
>>>      if ( safe )
>>>          return true;
>>>  
>>> +    /*
>>> +     * The meaning of the RSBA and RRSBA bits has evolved over time.  The
>>> +     * agreed upon meaning at the time of writing (May 2023) is thus:
>>> +     *
>>> +     * - RSBA (RSB Alternative) means that an RSB may fall back to an
>>> +     *   alternative predictor on underflow.  Skylake uarch and later all have
>>> +     *   this property.  Broadwell too, when running microcode versions prior
>>> +     *   to Jan 2018.
>>> +     *
>>> +     * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces
>>> +     *   tagging of predictions with the mode in which they were learned.  So
>>> +     *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
>>> +     *
>>> +     * Some parts (Broadwell) are not expected to ever enumerate this
>>> +     * behaviour directly.  Other parts have differing enumeration with
>>> +     * microcode version.  Fix up Xen's idea, so we can advertise them safely
>>> +     * to guests, and so toolstacks can level a VM safely for migration.
>>> +     */
>> If the difference between the two is whether eIBRS is active (as you did
>> word it yet more explicitly in e.g. [1]), then ...
>>
>>> + unsafe_maybe_fixup_rrsba:
>>> +    if ( !cpu_has_rrsba )
>>> +        setup_force_cpu_cap(X86_FEATURE_RRSBA);
>>> +
>>> + unsafe_maybe_fixup_rsba:
>>> +    if ( !cpu_has_rsba )
>>> +        setup_force_cpu_cap(X86_FEATURE_RSBA);
>>> +
>>>      return false;
>>>  }
>> ... can both actually be active at the same time? IOW is there a "return
>> false" missing ahead of the 2nd label?
> 
> I've already got a question out to Intel to this effect.  (I didn't say
> the enumeration made much sense...)
> 
>> Not having looked at further patches yet it also strikes me as odd that
>> each of the two labels is used exactly once only. Leaving the shared
>> comment aside, imo this would then better avoid "goto".
> 
> They're both used twice, not once.  You asked why it wasn't "return
> safe;" in the previous patch?  Well this is why.

Ouch, yes. The labels themselves are used just once, but there's
important fall-through from above here.

>> Finally, what use are the two if()s? There's nothing wrong with forcing
>> a feature which is already available.
> 
> It breaks is_forced_cpu_cap().

Hmm, yes, but is that important here? (If you decide to keep the if()s,
which I'm not opposed to, would you mind adding half a sentence to the
description or maybe a brief code comment?)

Jan

> Also, I considered having a printk() here.  I've still got it around in
> a debug patch, but I decided against it.
> 
> ~Andrew
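
The "used twice" point about the two labels can be sketched in isolation. The code below is a simplified stand-in for the control flow of the patch as posted, not the real retpoline_calculations(): the struct, its fields and the function name are hypothetical, the model switch is collapsed into two booleans, and the if() guards around the fixups are omitted. Each label is reached once by goto and once by fall-through. Whether both bits should end up set together is the open question raised in the review.

```c
#include <stdbool.h>

/* Hypothetical inputs and outputs; not Xen's data structures. */
struct toy_state {
    bool has_eibrs;     /* CPU offers eIBRS */
    bool skylake_like;  /* model listed as RSBA-only in the switch */
    bool model_safe;    /* model/microcode table says retpoline-safe */
    bool rsba, rrsba;   /* enumerated or synthesized bits */
};

/* Returns true if retpoline is "safe"; otherwise fixes up bits on exit. */
static bool toy_calculations(struct toy_state *s)
{
    if ( s->has_eibrs )
        goto fixup_rrsba;   /* first way into the first label */

    if ( s->rsba )
        return false;       /* already enumerated, nothing to fix up */

    if ( s->skylake_like )
        goto fixup_rsba;    /* first way into the second label */

    if ( s->model_safe )
        return true;

    /* Second way into the first label: fall-through from above. */
 fixup_rrsba:
    s->rrsba = true;

    /* Second way into the second label: fall-through from the first. */
 fixup_rsba:
    s->rsba = true;

    return false;
}
```

In this model, an eIBRS-capable CPU and an unlisted unsafe CPU both leave with rrsba and rsba set, a Skylake-like CPU leaves with only rsba set, and a safe CPU returns true with neither bit touched.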
Patch

diff --git a/xen/arch/x86/include/asm/cpufeature.h b/xen/arch/x86/include/asm/cpufeature.h
index 50235f098d70..08e3eedd1280 100644
--- a/xen/arch/x86/include/asm/cpufeature.h
+++ b/xen/arch/x86/include/asm/cpufeature.h
@@ -192,6 +192,7 @@  static inline bool boot_cpu_has(unsigned int feat)
 #define cpu_has_tsx_ctrl        boot_cpu_has(X86_FEATURE_TSX_CTRL)
 #define cpu_has_taa_no          boot_cpu_has(X86_FEATURE_TAA_NO)
 #define cpu_has_fb_clear        boot_cpu_has(X86_FEATURE_FB_CLEAR)
+#define cpu_has_rrsba           boot_cpu_has(X86_FEATURE_RRSBA)
 
 /* Synthesized. */
 #define cpu_has_arch_perfmon    boot_cpu_has(X86_FEATURE_ARCH_PERFMON)
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 0774d40627dd..2647784615cc 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -578,7 +578,10 @@  static bool __init check_smt_enabled(void)
     return false;
 }
 
-/* Calculate whether Retpoline is known-safe on this CPU. */
+/*
+ * Calculate whether Retpoline is known-safe on this CPU.  Synthesize missing
+ * RSBA/RRSBA bits when running with old microcode.
+ */
 static bool __init retpoline_calculations(void)
 {
     unsigned int ucode_rev = this_cpu(cpu_sig).rev;
@@ -592,13 +595,18 @@  static bool __init retpoline_calculations(void)
         return false;
 
     /*
-     * RSBA may be set by a hypervisor to indicate that we may move to a
-     * processor which isn't retpoline-safe.
-     *
      * Processors offering Enhanced IBRS are not guarenteed to be
      * repoline-safe.
      */
-    if ( cpu_has_rsba || cpu_has_eibrs )
+    if ( cpu_has_eibrs )
+        goto unsafe_maybe_fixup_rrsba;
+
+    /*
+     * RSBA is explicitly enumerated in some cases, but may also be set by a
+     * hypervisor to indicate that we may move to a processor which isn't
+     * retpoline-safe.
+     */
+    if ( cpu_has_rsba )
         return false;
 
     switch ( boot_cpu_data.x86_model )
@@ -648,6 +656,8 @@  static bool __init retpoline_calculations(void)
 
         /*
          * Skylake, Kabylake and Cannonlake processors are not retpoline-safe.
+         * Note: the eIBRS-capable steppings are filtered out earlier, so the
+         * remainder here are the ones which suffer only RSBA behaviour.
          */
     case 0x4e: /* Skylake M */
     case 0x55: /* Skylake X */
@@ -656,7 +666,7 @@  static bool __init retpoline_calculations(void)
     case 0x67: /* Cannonlake? */
     case 0x8e: /* Kabylake M */
     case 0x9e: /* Kabylake D */
-        return false;
+        goto unsafe_maybe_fixup_rsba;
 
         /*
          * Atom processors before Goldmont Plus/Gemini Lake are retpoline-safe.
@@ -687,6 +697,32 @@  static bool __init retpoline_calculations(void)
     if ( safe )
         return true;
 
+    /*
+     * The meaning of the RSBA and RRSBA bits has evolved over time.  The
+     * agreed upon meaning at the time of writing (May 2023) is thus:
+     *
+     * - RSBA (RSB Alternative) means that an RSB may fall back to an
+     *   alternative predictor on underflow.  Skylake uarch and later all have
+     *   this property.  Broadwell too, when running microcode versions prior
+     *   to Jan 2018.
+     *
+     * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces
+     *   tagging of predictions with the mode in which they were learned.  So
+     *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
+     *
+     * Some parts (Broadwell) are not expected to ever enumerate this
+     * behaviour directly.  Other parts have differing enumeration with
+     * microcode version.  Fix up Xen's idea, so we can advertise them safely
+     * to guests, and so toolstacks can level a VM safely for migration.
+     */
+ unsafe_maybe_fixup_rrsba:
+    if ( !cpu_has_rrsba )
+        setup_force_cpu_cap(X86_FEATURE_RRSBA);
+
+ unsafe_maybe_fixup_rsba:
+    if ( !cpu_has_rsba )
+        setup_force_cpu_cap(X86_FEATURE_RSBA);
+
     return false;
 }
 
@@ -1146,7 +1182,7 @@  void __init init_speculation_mitigations(void)
             thunk = THUNK_JMP;
     }
 
-    /* Determine if retpoline is safe on this CPU. */
+    /* Determine if retpoline is safe on this CPU.  Fix up RSBA/RRSBA enumerations. */
     retpoline_safe = retpoline_calculations();
 
     /*