diff mbox series

x86/cpu-policy: Hide x2APIC from PV guests

Message ID 20240229221448.2593171-1-andrew.cooper3@citrix.com (mailing list archive)
State New
Headers show
Series x86/cpu-policy: Hide x2APIC from PV guests | expand

Commit Message

Andrew Cooper Feb. 29, 2024, 10:14 p.m. UTC
PV guests can't write to MSR_APIC_BASE (in order to set EXTD), nor can they
access any of the x2APIC MSR range.  Therefore they mustn't see the x2APIC
CPUID bit saying that they can.

Right now, the host x2APIC flag filters into PV guests, meaning that PV guests
generally see x2APIC except on Zen1-and-older AMD systems.

Linux works around this by explicitly hiding the bit itself, and filtering
EXTD out of MSR_APIC_BASE reads.  NetBSD behaves more in the spirit of PV
guests, and entirely ignores the APIC when built as a PV guest.

Change the annotation from !A to !S.  This has a consequence of stripping it
out of both PV featuremasks.  However, as existing guests may have seen the
bit, set it back into the PV Max policy; a VM which saw the bit and is alive
enough to migrate will have ignored it one way or another.

Hiding x2APIC does also change the contents of leaf 0xb, but as the
information is nonsense to begin with, this is likely an improvement on the
status quo.  The blind reporting of APIC_ID = vCPU_ID * 2 isn't interlinked
with the host's topology structure, and the APIC_IDs are useless without an
MADT to start with.  Dom0 is the only PV VM to get an MADT but it's the host
one, meaning the two sets of APIC_IDs are from different address spaces.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>

v2:
 * Set x2APIC in PV max after applying the featuremask.
 * Drop the hunk for the default policy as it's handled by the A->S transform.
 * Rewrite the commit message.
---
 xen/arch/x86/cpu-policy.c                   | 11 +++++++++--
 xen/include/public/arch-x86/cpufeatureset.h |  2 +-
 2 files changed, 10 insertions(+), 3 deletions(-)

Comments

Roger Pau Monne March 1, 2024, 12:11 p.m. UTC | #1
On Thu, Feb 29, 2024 at 10:14:48PM +0000, Andrew Cooper wrote:
> PV guests can't write to MSR_APIC_BASE (in order to set EXTD), nor can they
> access any of the x2APIC MSR range.  Therefore they mustn't see the x2APIC
> CPUID bit saying that they can.
> 
> Right now, the host x2APIC flag filters into PV guests, meaning that PV guests
> generally see x2APIC except on Zen1-and-older AMD systems.
> 
> Linux works around this by explicitly hiding the bit itself, and filtering
> EXTD out of MSR_APIC_BASE reads.  NetBSD behaves more in the spirit of PV
> guests, and entirely ignores the APIC when built as a PV guest.
> 
> Change the annotation from !A to !S.  This has a consequence of stripping it
> out of both PV featuremasks.  However, as existing guests may have seen the
> bit, set it back into the PV Max policy; a VM which saw the bit and is alive
> enough to migrate will have ignored it one way or another.
> 
> Hiding x2APIC does also change the contents of leaf 0xb, but as the
> information is nonsense to begin with, this is likely an improvement on the
> status quo.  The blind reporting of APIC_ID = vCPU_ID * 2 isn't interlinked
> with the host's topology structure, and the APIC_IDs are useless without an
> MADT to start with.

AFAICT the APIC ID in CPUID leaf 0x1 %ebx is only reported to HVM
guests, not PV ones (see the dynamic section of guest_cpuid()).  This
paragraph might need adjusting then, as it reads to me as if PV
guests could also expect APIC_ID == vCPU_ID * 2.

With this change no x{,2}APIC ID gets exposed in CPUID for PV guests
unless it's for compat reasons for guests that already saw the x2APIC
feature.

> Dom0 is the only PV VM to get an MADT but it's the host
> one, meaning the two sets of APIC_IDs are from different address spaces.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.
Andrew Cooper March 1, 2024, 3:29 p.m. UTC | #2
On 01/03/2024 12:11 pm, Roger Pau Monné wrote:
> On Thu, Feb 29, 2024 at 10:14:48PM +0000, Andrew Cooper wrote:
>> PV guests can't write to MSR_APIC_BASE (in order to set EXTD), nor can they
>> access any of the x2APIC MSR range.  Therefore they mustn't see the x2APIC
>> CPUID bit saying that they can.
>>
>> Right now, the host x2APIC flag filters into PV guests, meaning that PV guests
>> generally see x2APIC except on Zen1-and-older AMD systems.
>>
>> Linux works around this by explicitly hiding the bit itself, and filtering
>> EXTD out of MSR_APIC_BASE reads.  NetBSD behaves more in the spirit of PV
>> guests, and entirely ignores the APIC when built as a PV guest.
>>
>> Change the annotation from !A to !S.  This has a consequence of stripping it
>> out of both PV featuremasks.  However, as existing guests may have seen the
>> bit, set it back into the PV Max policy; a VM which saw the bit and is alive
>> enough to migrate will have ignored it one way or another.
>>
>> Hiding x2APIC does also change the contents of leaf 0xb, but as the
>> information is nonsense to begin with, this is likely an improvement on the
>> status quo.  The blind reporting of APIC_ID = vCPU_ID * 2 isn't interlinked
>> with the host's topology structure, and the APIC_IDs are useless without an
>> MADT to start with.
> AFAICT the APIC ID in CPUID leaf 0x1 %ebx is only reported to HVM
> guests, not PV ones (see the dynamic section of guest_cpuid()).

"reported to" is complicated for PV with CPUID Masking, because host
details will leak in for both the xAPIC and x2APIC IDs.

> This paragraph might need adjusting then, as it reads to me as if PV
> guests could also expect APIC_ID == vCPU_ID * 2.
>
> With this change no x{,2}APIC ID gets exposed in CPUID for PV guests
> unless it's for compat reasons for guests that already saw the x2APIC
> feature.

I'll see if I can find some better wording.
>> Dom0 is the only PV VM to get an MADT but it's the host
>> one, meaning the two sets of APIC_IDs are from different address spaces.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

~Andrew
Jan Beulich March 4, 2024, 8:33 a.m. UTC | #3
On 29.02.2024 23:14, Andrew Cooper wrote:
> PV guests can't write to MSR_APIC_BASE (in order to set EXTD), nor can they
> access any of the x2APIC MSR range.  Therefore they mustn't see the x2APIC
> CPUID bit saying that they can.

This argumentation could then be used equally for the APIC bit. Why is it
correct to hide x2APIC, but not APIC?

> Right now, the host x2APIC flag filters into PV guests, meaning that PV guests
> generally see x2APIC except on Zen1-and-older AMD systems.
> 
> Linux works around this by explicitly hiding the bit itself, and filtering
> EXTD out of MSR_APIC_BASE reads.  NetBSD behaves more in the spirit of PV
> guests, and entirely ignores the APIC when built as a PV guest.
> 
> Change the annotation from !A to !S.  This has a consequence of stripping it
> out of both PV featuremasks.  However, as existing guests may have seen the
> bit, set it back into the PV Max policy; a VM which saw the bit and is alive
> enough to migrate will have ignored it one way or another.
> 
> Hiding x2APIC does also change the contents of leaf 0xb, but as the
> information is nonsense to begin with, this is likely an improvement on the
> status quo.  The blind reporting of APIC_ID = vCPU_ID * 2 isn't interlinked
> with the host's topology structure, and the APIC_IDs are useless without an
> MADT to start with.  Dom0 is the only PV VM to get an MADT but it's the host
> one, meaning the two sets of APIC_IDs are from different address spaces.

Aiui the field will now be seen as zero on all CPUs. Is all CPUs having the
same APIC ID there really better than them all having different ones?

Also while I agree that logically CPUID leaf 0xb EDX is tied to x2APIC being
available as a feature, nothing is said in this regard in the documentation
of that leaf. This in particular leaves open whether on a system with x2APIC
disabled in/by firmware the value would actually be zero. Yet that case could
be considered somewhat similar to the PV case here.

Jan
Andrew Cooper March 4, 2024, 12:21 p.m. UTC | #4
On 04/03/2024 8:33 am, Jan Beulich wrote:
> On 29.02.2024 23:14, Andrew Cooper wrote:
>> PV guests can't write to MSR_APIC_BASE (in order to set EXTD), nor can they
>> access any of the x2APIC MSR range.  Therefore they mustn't see the x2APIC
>> CPUID bit saying that they can.
> This argumentation could then be used equally for the APIC bit. Why is it
> correct to hide x2APIC, but not APIC?

In an ideal world we'd hide APIC too.

But Linux pvops doesn't tolerate it, and I'm not sure classic dom0 did
either.

>> Right now, the host x2APIC flag filters into PV guests, meaning that PV guests
>> generally see x2APIC except on Zen1-and-older AMD systems.
>>
>> Linux works around this by explicitly hiding the bit itself, and filtering
>> EXTD out of MSR_APIC_BASE reads.  NetBSD behaves more in the spirit of PV
>> guests, and entirely ignores the APIC when built as a PV guest.
>>
>> Change the annotation from !A to !S.  This has a consequence of stripping it
>> out of both PV featuremasks.  However, as existing guests may have seen the
>> bit, set it back into the PV Max policy; a VM which saw the bit and is alive
>> enough to migrate will have ignored it one way or another.
>>
>> Hiding x2APIC does also change the contents of leaf 0xb, but as the
>> information is nonsense to begin with, this is likely an improvement on the
>> status quo.  The blind reporting of APIC_ID = vCPU_ID * 2 isn't interlinked
>> with the host's topology structure, and the APIC_IDs are useless without an
>> MADT to start with.  Dom0 is the only PV VM to get an MADT but it's the host
>> one, meaning the two sets of APIC_IDs are from different address spaces.
> Aiui the field will now be seen as zero on all CPUs. Is all CPUs having the
> same APIC ID there really better than them all having different ones?

Again - we're taking something that was conditionally like this, and
making it unconditionally like this.

When x2APIC is hidden, so is the precondition for the data in this leaf
being valid.

> Also while I agree that logically CPUID leaf 0xb EDX is tied to x2APIC being
> available as a feature, nothing is said in this regard in the documentation
> of that leaf. This in particular leaves open whether on a system with x2APIC
> disabled in/by firmware the value would actually be zero. Yet that case could
> be considered somewhat similar to the PV case here.

I'm not aware of there being a capability to disable x2APIC in firmware.

The only choices you got were which mode to default to, and
(occasionally) whether to set the x2APIC opt-out in DMAR.

~Andrew
diff mbox series

Patch

diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index 15b49048fd55..c9b32bc17849 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -561,6 +561,14 @@  static void __init calculate_pv_max_policy(void)
     for ( i = 0; i < ARRAY_SIZE(fs); ++i )
         fs[i] &= pv_max_featuremask[i];
 
+    /*
+     * Xen at the time of writing (Feb 2024, 4.19 dev cycle) used to leak the
+     * host x2APIC capability into PV guests, but never supported the guest
+     * trying to turn x2APIC mode on.  Tolerate an incoming VM which saw the
+     * x2APIC CPUID bit and is alive enough to migrate.
+     */
+    __set_bit(X86_FEATURE_X2APIC, fs);
+
     /*
      * If Xen isn't virtualising MSR_SPEC_CTRL for PV guests (functional
      * availability, or admin choice), hide the feature.
@@ -854,11 +862,10 @@  void recalculate_cpuid_policy(struct domain *d)
     }
 
     /*
-     * Allow the toolstack to set HTT, X2APIC and CMP_LEGACY.  These bits
+     * Allow the toolstack to set HTT and CMP_LEGACY.  These bits
      * affect how to interpret topology information in other cpuid leaves.
      */
     __set_bit(X86_FEATURE_HTT, max_fs);
-    __set_bit(X86_FEATURE_X2APIC, max_fs);
     __set_bit(X86_FEATURE_CMP_LEGACY, max_fs);
 
     /*
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 7e184ce0e3f4..0374cec3a2af 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -123,7 +123,7 @@  XEN_CPUFEATURE(PCID,          1*32+17) /*H  Process Context ID */
 XEN_CPUFEATURE(DCA,           1*32+18) /*   Direct Cache Access */
 XEN_CPUFEATURE(SSE4_1,        1*32+19) /*A  Streaming SIMD Extensions 4.1 */
 XEN_CPUFEATURE(SSE4_2,        1*32+20) /*A  Streaming SIMD Extensions 4.2 */
-XEN_CPUFEATURE(X2APIC,        1*32+21) /*!A Extended xAPIC */
+XEN_CPUFEATURE(X2APIC,        1*32+21) /*!S Extended xAPIC */
 XEN_CPUFEATURE(MOVBE,         1*32+22) /*A  movbe instruction */
 XEN_CPUFEATURE(POPCNT,        1*32+23) /*A  POPCNT instruction */
 XEN_CPUFEATURE(TSC_DEADLINE,  1*32+24) /*S  TSC Deadline Timer */