diff mbox series

[3/3] x86: Use CpuidUserDis if an AMD HVM guest toggles CPUID faulting

Message ID 20230505175705.18098-4-alejandro.vallejo@cloud.com (mailing list archive)
State Superseded
Headers show
Series Add CpuidUserDis support | expand

Commit Message

Alejandro Vallejo May 5, 2023, 5:57 p.m. UTC
This is in order to aid guests of AMD hardware that we have exposed
CPUID faulting to. If they try to modify the Intel MSR that enables
the feature, trigger levelling so AMD's version of it (CpuidUserDis)
is used instead.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
 xen/arch/x86/msr.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Jan Beulich May 8, 2023, 1:18 p.m. UTC | #1
On 05.05.2023 19:57, Alejandro Vallejo wrote:
> This is in order to aid guests of AMD hardware that we have exposed
> CPUID faulting to. If they try to modify the Intel MSR that enables
> the feature, trigger levelling so AMD's version of it (CpuidUserDis)
> is used instead.
> 
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> ---
>  xen/arch/x86/msr.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)

Don't you also need to update cpu-policy.c:calculate_host_policy()
for the guest to actually know it can use the functionality? Which
in turn would appear to require some form of adjustment to
lib/x86/policy.c:x86_cpu_policies_are_compatible().

> --- a/xen/arch/x86/msr.c
> +++ b/xen/arch/x86/msr.c
> @@ -431,6 +431,13 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
>      {
>          bool old_cpuid_faulting = msrs->misc_features_enables.cpuid_faulting;
>  
> +        /*
> +         * The boot CPU must support Intel's CPUID faulting _or_
> +         * AMD's CpuidUserDis.
> +         */
> +        bool can_fault_cpuid = cpu_has_cpuid_faulting ||
> +                               boot_cpu_has(X86_FEATURE_CPUID_USER_DIS);

I'd like to suggest that in such a comment it not be emphasized that
it's the boot CPU (alone) we check. In fact I'm not convinced any
comment is needed here at all. I'm further inclined to suggest to
omit this (single-use) variable altogether.

Jan
Andrew Cooper May 9, 2023, 10:05 a.m. UTC | #2
On 08/05/2023 2:18 pm, Jan Beulich wrote:
> On 05.05.2023 19:57, Alejandro Vallejo wrote:
>> This is in order to aid guests of AMD hardware that we have exposed
>> CPUID faulting to. If they try to modify the Intel MSR that enables
>> the feature, trigger levelling so AMD's version of it (CpuidUserDis)
>> is used instead.
>>
>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
>> ---
>>  xen/arch/x86/msr.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
> Don't you also need to update cpu-policy.c:calculate_host_policy()
> for the guest to actually know it can use the functionality? Which
> in turn would appear to require some form of adjustment to
> lib/x86/policy.c:x86_cpu_policies_are_compatible().

I asked Alejandro to do it like this.

Advertising this to guests requires plumbing another MSR into the
infrastructure which isn't quite set up properly let, and is in flux
from my work.

For now, this just lets Xen enforce the policy over PV guests, which is
an improvement in and of itself.

~Andrew
Jan Beulich May 9, 2023, 2:41 p.m. UTC | #3
On 09.05.2023 12:05, Andrew Cooper wrote:
> On 08/05/2023 2:18 pm, Jan Beulich wrote:
>> On 05.05.2023 19:57, Alejandro Vallejo wrote:
>>> This is in order to aid guests of AMD hardware that we have exposed
>>> CPUID faulting to. If they try to modify the Intel MSR that enables
>>> the feature, trigger levelling so AMD's version of it (CpuidUserDis)
>>> is used instead.
>>>
>>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
>>> ---
>>>  xen/arch/x86/msr.c | 9 ++++++++-
>>>  1 file changed, 8 insertions(+), 1 deletion(-)
>> Don't you also need to update cpu-policy.c:calculate_host_policy()
>> for the guest to actually know it can use the functionality? Which
>> in turn would appear to require some form of adjustment to
>> lib/x86/policy.c:x86_cpu_policies_are_compatible().
> 
> I asked Alejandro to do it like this.
> 
> Advertising this to guests requires plumbing another MSR into the
> infrastructure which isn't quite set up properly let, and is in flux
> from my work.
> 
> For now, this just lets Xen enforce the policy over PV guests, which is
> an improvement in and of itself.

But as per the title this patch is about HVM guests (aiui the PV aspect
is taken care of already without the patch here). In any event - if the
omissions are intentional (for the time being), then I think that wants
mentioning in the description.

Jan
Alejandro Vallejo May 9, 2023, 2:57 p.m. UTC | #4
On Tue, May 09, 2023 at 04:41:49PM +0200, Jan Beulich wrote:
> > I asked Alejandro to do it like this.
> > 
> > Advertising this to guests requires plumbing another MSR into the
> > infrastructure which isn't quite set up properly let, and is in flux
> > from my work.
> > 
> > For now, this just lets Xen enforce the policy over PV guests, which is
> > an improvement in and of itself.
> 
> But as per the title this patch is about HVM guests (aiui the PV aspect
> is taken care of already without the patch here). In any event - if the
> omissions are intentional (for the time being), then I think that wants
> mentioning in the description.
> 
> Jan

HVM guests are always exposed the Intel interface (emulated if not natively
available). The HVM max policy forces it on, and I don't see anything in
the default policy overriding it. My attempt here was to let AMD guests use
the emulated Intel MSR and trigger levellling that would itself rely on
AMD's CpuidUserDis without guest intervention. That said, several cans of
worms exist in mantaining this internal routing. I'll get rid of that last
patch and leave HVM guests alone for the time being. They are functionally 
correct (albeit their CPUID take 2 faults, whereas 1 would suffice).

Alejandro
Jan Beulich May 10, 2023, 8:15 a.m. UTC | #5
On 09.05.2023 12:05, Andrew Cooper wrote:
> On 08/05/2023 2:18 pm, Jan Beulich wrote:
>> On 05.05.2023 19:57, Alejandro Vallejo wrote:
>>> This is in order to aid guests of AMD hardware that we have exposed
>>> CPUID faulting to. If they try to modify the Intel MSR that enables
>>> the feature, trigger levelling so AMD's version of it (CpuidUserDis)
>>> is used instead.
>>>
>>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
>>> ---
>>>  xen/arch/x86/msr.c | 9 ++++++++-
>>>  1 file changed, 8 insertions(+), 1 deletion(-)
>> Don't you also need to update cpu-policy.c:calculate_host_policy()
>> for the guest to actually know it can use the functionality? Which
>> in turn would appear to require some form of adjustment to
>> lib/x86/policy.c:x86_cpu_policies_are_compatible().
> 
> I asked Alejandro to do it like this.
> 
> Advertising this to guests requires plumbing another MSR into the
> infrastructure which isn't quite set up properly let, and is in flux
> from my work.

Maybe there was some misunderstanding here, as I realize only now. I
wasn't asking to expose the AMD feature, but instead I was after

    /* 0x000000ce  MSR_INTEL_PLATFORM_INFO */
    /* probe_cpuid_faulting() sanity checks presence of MISC_FEATURES_ENABLES */
    p->platform_info.cpuid_faulting = cpu_has_cpuid_faulting;

wanting to be extended by "|| boot_cpu_has(X86_FEATURE_CPUID_USER_DIS)".
That, afaict, has no connection to plumbing yet another MSR.

Jan
Alejandro Vallejo May 10, 2023, 10:52 a.m. UTC | #6
On Wed, May 10, 2023 at 10:15:31AM +0200, Jan Beulich wrote:
> On 09.05.2023 12:05, Andrew Cooper wrote:
> > On 08/05/2023 2:18 pm, Jan Beulich wrote:
> >> On 05.05.2023 19:57, Alejandro Vallejo wrote:
> >>> This is in order to aid guests of AMD hardware that we have exposed
> >>> CPUID faulting to. If they try to modify the Intel MSR that enables
> >>> the feature, trigger levelling so AMD's version of it (CpuidUserDis)
> >>> is used instead.
> >>>
> >>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> >>> ---
> >>>  xen/arch/x86/msr.c | 9 ++++++++-
> >>>  1 file changed, 8 insertions(+), 1 deletion(-)
> >> Don't you also need to update cpu-policy.c:calculate_host_policy()
> >> for the guest to actually know it can use the functionality? Which
> >> in turn would appear to require some form of adjustment to
> >> lib/x86/policy.c:x86_cpu_policies_are_compatible().
> > 
> > I asked Alejandro to do it like this.
> > 
> > Advertising this to guests requires plumbing another MSR into the
> > infrastructure which isn't quite set up properly let, and is in flux
> > from my work.
> 
> Maybe there was some misunderstanding here, as I realize only now. I
> wasn't asking to expose the AMD feature, but instead I was after
> 
>     /* 0x000000ce  MSR_INTEL_PLATFORM_INFO */
>     /* probe_cpuid_faulting() sanity checks presence of MISC_FEATURES_ENABLES */
>     p->platform_info.cpuid_faulting = cpu_has_cpuid_faulting;
> 
> wanting to be extended by "|| boot_cpu_has(X86_FEATURE_CPUID_USER_DIS)".
> That, afaict, has no connection to plumbing yet another MSR.
> 
> Jan

Let me backtrack a little. There's 2 different (but related) matters under
discussion:

 1. Trapping PV guest attempts to run the cpuid instruction
 2. Providing a virtualized CPUID faulting interface so a guest kernel
    can intercept the CPUID instructions of code running under it.

This series was meant to provide fix (1) on AMD hardware. I did go a bit
beyond in v1, trying to provide support for a unified faulting mechanism
on AMD, but providing a virtualized CPUID faulting interface needs to be
done a bit more carefully than that. Doing it partially now just adds tech
debt to be paid when CpuidUserDis is exposed to the domains.

Changing the policy to expose the Intel interface when AMD is the backend
falls on (2), which is probably better off done separately in a unified way.

v2 removes the changes to x86/msr.c so only (1) is addressed.

Does this make sense?

Alejandro
Jan Beulich May 10, 2023, 1:17 p.m. UTC | #7
On 10.05.2023 12:52, Alejandro Vallejo wrote:
> On Wed, May 10, 2023 at 10:15:31AM +0200, Jan Beulich wrote:
>> On 09.05.2023 12:05, Andrew Cooper wrote:
>>> On 08/05/2023 2:18 pm, Jan Beulich wrote:
>>>> On 05.05.2023 19:57, Alejandro Vallejo wrote:
>>>>> This is in order to aid guests of AMD hardware that we have exposed
>>>>> CPUID faulting to. If they try to modify the Intel MSR that enables
>>>>> the feature, trigger levelling so AMD's version of it (CpuidUserDis)
>>>>> is used instead.
>>>>>
>>>>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
>>>>> ---
>>>>>  xen/arch/x86/msr.c | 9 ++++++++-
>>>>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>>> Don't you also need to update cpu-policy.c:calculate_host_policy()
>>>> for the guest to actually know it can use the functionality? Which
>>>> in turn would appear to require some form of adjustment to
>>>> lib/x86/policy.c:x86_cpu_policies_are_compatible().
>>>
>>> I asked Alejandro to do it like this.
>>>
>>> Advertising this to guests requires plumbing another MSR into the
>>> infrastructure which isn't quite set up properly let, and is in flux
>>> from my work.
>>
>> Maybe there was some misunderstanding here, as I realize only now. I
>> wasn't asking to expose the AMD feature, but instead I was after
>>
>>     /* 0x000000ce  MSR_INTEL_PLATFORM_INFO */
>>     /* probe_cpuid_faulting() sanity checks presence of MISC_FEATURES_ENABLES */
>>     p->platform_info.cpuid_faulting = cpu_has_cpuid_faulting;
>>
>> wanting to be extended by "|| boot_cpu_has(X86_FEATURE_CPUID_USER_DIS)".
>> That, afaict, has no connection to plumbing yet another MSR.
> 
> Let me backtrack a little. There's 2 different (but related) matters under
> discussion:
> 
>  1. Trapping PV guest attempts to run the cpuid instruction
>  2. Providing a virtualized CPUID faulting interface so a guest kernel
>     can intercept the CPUID instructions of code running under it.
> 
> This series was meant to provide fix (1) on AMD hardware. I did go a bit
> beyond in v1, trying to provide support for a unified faulting mechanism
> on AMD, but providing a virtualized CPUID faulting interface needs to be
> done a bit more carefully than that. Doing it partially now just adds tech
> debt to be paid when CpuidUserDis is exposed to the domains.
> 
> Changing the policy to expose the Intel interface when AMD is the backend
> falls on (2), which is probably better off done separately in a unified way.
> 
> v2 removes the changes to x86/msr.c so only (1) is addressed.
> 
> Does this make sense?

Certainly. Nevertheless I would like to understand what Andrew meant,
even if only for staying better in sync with all the work he has been
doing (and is still planning to do) in the area of CPU policies and
leveling.

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index ecf126566d..984aedf180 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -431,6 +431,13 @@  int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
     {
         bool old_cpuid_faulting = msrs->misc_features_enables.cpuid_faulting;
 
+        /*
+         * The boot CPU must support Intel's CPUID faulting _or_
+         * AMD's CpuidUserDis.
+         */
+        bool can_fault_cpuid = cpu_has_cpuid_faulting ||
+                               boot_cpu_has(X86_FEATURE_CPUID_USER_DIS);
+
         rsvd = ~0ull;
         if ( cp->platform_info.cpuid_faulting )
             rsvd &= ~MSR_MISC_FEATURES_CPUID_FAULTING;
@@ -440,7 +447,7 @@  int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
 
         msrs->misc_features_enables.raw = val;
 
-        if ( v == curr && is_hvm_domain(d) && cpu_has_cpuid_faulting &&
+        if ( v == curr && is_hvm_domain(d) && can_fault_cpuid &&
              (old_cpuid_faulting ^ msrs->misc_features_enables.cpuid_faulting) )
             ctxt_switch_levelling(v);
         break;