diff mbox

[23/27] x86/cpuid: Move all leaf 7 handling into guest_cpuid()

Message ID 1483533584-8015-24-git-send-email-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

Andrew Cooper Jan. 4, 2017, 12:39 p.m. UTC
All per-domain policy data concerning leaf 7 is accurate.  Handle it all in
guest_cpuid() by reading out of the raw array block, and introduing a dynamic
adjustment for OSPKE.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
---
 xen/arch/x86/cpuid.c        | 41 +++++++++++++++++++++++++++++++++++++----
 xen/arch/x86/hvm/hvm.c      | 17 ++++-------------
 xen/arch/x86/traps.c        | 28 ++++------------------------
 xen/include/asm-x86/cpuid.h |  2 ++
 4 files changed, 47 insertions(+), 41 deletions(-)

Comments

Jan Beulich Jan. 5, 2017, 2:01 p.m. UTC | #1
>>> On 04.01.17 at 13:39, <andrew.cooper3@citrix.com> wrote:
> @@ -380,14 +385,42 @@ void guest_cpuid(const struct vcpu *v, unsigned int leaf,
>      case 0x80000000 ... 0x80000000 + CPUID_GUEST_NR_EXTD - 1:
>          if ( leaf > p->extd.max_leaf )
>              return;
> -        break;
> +        goto legacy;
>  
>      default:
>          return;
>      }
>  
> +    /* Skip dynamic adjustments if we are in the wrong context. */
> +    if ( v != curr )
> +        return;
> +
> +    /*
> +     * Second pass:
> +     * - Dynamic adjustments
> +     */
> +    switch ( leaf )
> +    {
> +    case 0x7:
> +        switch ( subleaf )
> +        {
> +        case 0:
> +            /* OSPKE clear in policy.  Fast-forward CR4 back in. */
> +            if ( (is_pv_vcpu(v)
> +                  ? v->arch.pv_vcpu.ctrlreg[4]
> +                  : v->arch.hvm_vcpu.guest_cr[4]) & X86_CR4_PKE )
> +                res->c |= cpufeat_mask(X86_FEATURE_OSPKE);

What's wrong with doing this adjustment when v != curr? By
the time the caller looks at the result, the state of guest
software controlled bits can't be relied upon anyway. Which
then raises the question whether a second switch() statement
for the a second pass is all that useful in the first place (I
realize this may depend on future plans of yours).

Jan
Andrew Cooper Jan. 5, 2017, 2:39 p.m. UTC | #2
On 05/01/17 14:01, Jan Beulich wrote:
>>>> On 04.01.17 at 13:39, <andrew.cooper3@citrix.com> wrote:
>> @@ -380,14 +385,42 @@ void guest_cpuid(const struct vcpu *v, unsigned int leaf,
>>      case 0x80000000 ... 0x80000000 + CPUID_GUEST_NR_EXTD - 1:
>>          if ( leaf > p->extd.max_leaf )
>>              return;
>> -        break;
>> +        goto legacy;
>>  
>>      default:
>>          return;
>>      }
>>  
>> +    /* Skip dynamic adjustments if we are in the wrong context. */
>> +    if ( v != curr )
>> +        return;
>> +
>> +    /*
>> +     * Second pass:
>> +     * - Dynamic adjustments
>> +     */
>> +    switch ( leaf )
>> +    {
>> +    case 0x7:
>> +        switch ( subleaf )
>> +        {
>> +        case 0:
>> +            /* OSPKE clear in policy.  Fast-forward CR4 back in. */
>> +            if ( (is_pv_vcpu(v)
>> +                  ? v->arch.pv_vcpu.ctrlreg[4]
>> +                  : v->arch.hvm_vcpu.guest_cr[4]) & X86_CR4_PKE )
>> +                res->c |= cpufeat_mask(X86_FEATURE_OSPKE);
> What's wrong with doing this adjustment when v != curr?

A guests %cr4 is stale if it is running elsewhere.

> By the time the caller looks at the result, the state of guest
> software controlled bits can't be relied upon anyway.

This particular adjustment can be done out of curr context, but others
are harder.  I have taken the approach that it is better to do nothing
consistently, than to expend effort filling in data we know is going to
be wrong for the caller.

(I hit a rats nest with the xstate leaf and dynamic %ebx's, which is why
those patches are still pending some more work, and I haven't yet
decided how to do the pv hardware domain leakage.)

> Which then raises the question whether a second switch() statement
> for the a second pass is all that useful in the first place (I
> realize this may depend on future plans of yours).

This switch statement will soon be far larger and more complicated than
the first-pass one, and I think it is important to separate the static
and dynamic nature of the two.

~Andrew
Jan Beulich Jan. 5, 2017, 2:55 p.m. UTC | #3
>>> On 05.01.17 at 15:39, <andrew.cooper3@citrix.com> wrote:
> On 05/01/17 14:01, Jan Beulich wrote:
>>>>> On 04.01.17 at 13:39, <andrew.cooper3@citrix.com> wrote:
>>> @@ -380,14 +385,42 @@ void guest_cpuid(const struct vcpu *v, unsigned int 
> leaf,
>>>      case 0x80000000 ... 0x80000000 + CPUID_GUEST_NR_EXTD - 1:
>>>          if ( leaf > p->extd.max_leaf )
>>>              return;
>>> -        break;
>>> +        goto legacy;
>>>  
>>>      default:
>>>          return;
>>>      }
>>>  
>>> +    /* Skip dynamic adjustments if we are in the wrong context. */
>>> +    if ( v != curr )
>>> +        return;
>>> +
>>> +    /*
>>> +     * Second pass:
>>> +     * - Dynamic adjustments
>>> +     */
>>> +    switch ( leaf )
>>> +    {
>>> +    case 0x7:
>>> +        switch ( subleaf )
>>> +        {
>>> +        case 0:
>>> +            /* OSPKE clear in policy.  Fast-forward CR4 back in. */
>>> +            if ( (is_pv_vcpu(v)
>>> +                  ? v->arch.pv_vcpu.ctrlreg[4]
>>> +                  : v->arch.hvm_vcpu.guest_cr[4]) & X86_CR4_PKE )
>>> +                res->c |= cpufeat_mask(X86_FEATURE_OSPKE);
>> What's wrong with doing this adjustment when v != curr?
> 
> A guests %cr4 is stale if it is running elsewhere.
> 
>> By the time the caller looks at the result, the state of guest
>> software controlled bits can't be relied upon anyway.
> 
> This particular adjustment can be done out of curr context, but others
> are harder.  I have taken the approach that it is better to do nothing
> consistently, than to expend effort filling in data we know is going to
> be wrong for the caller.

May I then suggest to add the early bailing at the time it actually
becomes necessary, or at the very least extend its comment to
make clear that this isn't always strictly needed?

Jan
diff mbox

Patch

diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index d140482..9181fc7 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -339,6 +339,7 @@  int init_domain_cpuid_policy(struct domain *d)
 void guest_cpuid(const struct vcpu *v, unsigned int leaf,
                  unsigned int subleaf, struct cpuid_leaf *res)
 {
+    const struct vcpu *curr = current;
     const struct domain *d = v->domain;
     const struct cpuid_policy *p = d->arch.cpuid;
 
@@ -348,6 +349,7 @@  void guest_cpuid(const struct vcpu *v, unsigned int leaf,
      * First pass:
      * - Dispatch the virtualised leaves to their respective handlers.
      * - Perform max_leaf/subleaf calculations, maybe returning early.
+     * - Fill in *res for leaves no longer handled on the legacy path.
      */
     switch ( leaf )
     {
@@ -358,17 +360,20 @@  void guest_cpuid(const struct vcpu *v, unsigned int leaf,
 #endif
         if ( leaf > p->basic.max_leaf )
             return;
-        break;
+        goto legacy;
 
     case 0x7:
         if ( subleaf > p->feat.max_subleaf )
             return;
+
+        BUG_ON(subleaf >= ARRAY_SIZE(p->feat.raw));
+        *res = p->feat.raw[subleaf];
         break;
 
     case 0xd:
         if ( subleaf > ARRAY_SIZE(p->xstate.raw) )
             return;
-        break;
+        goto legacy;
 
     case 0x40000000 ... 0x400000ff:
         if ( is_viridian_domain(d) )
@@ -380,14 +385,42 @@  void guest_cpuid(const struct vcpu *v, unsigned int leaf,
     case 0x80000000 ... 0x80000000 + CPUID_GUEST_NR_EXTD - 1:
         if ( leaf > p->extd.max_leaf )
             return;
-        break;
+        goto legacy;
 
     default:
         return;
     }
 
+    /* Skip dynamic adjustments if we are in the wrong context. */
+    if ( v != curr )
+        return;
+
+    /*
+     * Second pass:
+     * - Dynamic adjustments
+     */
+    switch ( leaf )
+    {
+    case 0x7:
+        switch ( subleaf )
+        {
+        case 0:
+            /* OSPKE clear in policy.  Fast-forward CR4 back in. */
+            if ( (is_pv_vcpu(v)
+                  ? v->arch.pv_vcpu.ctrlreg[4]
+                  : v->arch.hvm_vcpu.guest_cr[4]) & X86_CR4_PKE )
+                res->c |= cpufeat_mask(X86_FEATURE_OSPKE);
+            break;
+        }
+        break;
+    }
+
+    /* Done. */
+    return;
+
+ legacy:
     /* {pv,hvm}_cpuid() have this expectation. */
-    ASSERT(v == current);
+    ASSERT(v == curr);
 
     if ( is_pv_vcpu(v) || is_pvh_vcpu(v) )
     {
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 1dd92e3..4ded533 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3354,19 +3354,6 @@  void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
 
         break;
 
-    case 0x7:
-        if ( count == 0 )
-        {
-            *ebx = d->arch.cpuid->feat._7b0;
-            *ecx = d->arch.cpuid->feat._7c0;
-            *edx = d->arch.cpuid->feat._7d0;
-
-            /* OSPKE clear in policy.  Fast-forward CR4 back in. */
-            if ( v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PKE )
-                *ecx |= cpufeat_mask(X86_FEATURE_OSPKE);
-        }
-        break;
-
     case 0xb:
         /* Fix the x2APIC identifier. */
         *edx = v->vcpu_id * 2;
@@ -3543,6 +3530,10 @@  void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
         else
             *eax = 0;
         break;
+
+    case 0x7:
+        ASSERT_UNREACHABLE();
+        /* Now handled in guest_cpuid(). */
     }
 }
 
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index aed96c3..e2669b1 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1180,30 +1180,6 @@  void pv_cpuid(struct cpu_user_regs *regs)
         }
         break;
 
-    case 0x00000007:
-        if ( subleaf == 0 )
-        {
-            b = currd->arch.cpuid->feat._7b0;
-            c = currd->arch.cpuid->feat._7c0;
-            d = currd->arch.cpuid->feat._7d0;
-
-            if ( !is_pvh_domain(currd) )
-            {
-                /*
-                 * Delete the PVH condition when HVMLite formally replaces PVH,
-                 * and HVM guests no longer enter a PV codepath.
-                 */
-
-                /* OSPKE clear in policy.  Fast-forward CR4 back in. */
-                if ( curr->arch.pv_vcpu.ctrlreg[4] & X86_CR4_PKE )
-                    c |= cpufeat_mask(X86_FEATURE_OSPKE);
-            }
-        }
-        else
-            b = c = d = 0;
-        a = 0;
-        break;
-
     case 0x0000000a: /* Architectural Performance Monitor Features (Intel) */
         if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
              !vpmu_enabled(curr) )
@@ -1305,6 +1281,10 @@  void pv_cpuid(struct cpu_user_regs *regs)
     unsupported:
         a = b = c = d = 0;
         break;
+
+    case 0x7:
+        ASSERT_UNREACHABLE();
+        /* Now handled in guest_cpuid(). */
     }
 
     regs->rax = a;
diff --git a/xen/include/asm-x86/cpuid.h b/xen/include/asm-x86/cpuid.h
index 7363263..c7e9df5 100644
--- a/xen/include/asm-x86/cpuid.h
+++ b/xen/include/asm-x86/cpuid.h
@@ -81,12 +81,14 @@  struct cpuid_policy
      *   - {xcr0,xss}_{high,low}
      *
      * - Guest appropriate:
+     *   - All of the feat union
      *   - max_{,sub}leaf
      *   - All FEATURESET_* words
      *
      * Per-domain objects:
      *
      * - Guest accurate:
+     *   - All of the feat union
      *   - max_{,sub}leaf
      *   - All FEATURESET_* words
      *