diff mbox

[06/10] x86/cpuid: Handle leaf 0x6 in guest_cpuid()

Message ID 1487588434-4359-7-git-send-email-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

Andrew Cooper Feb. 20, 2017, 11 a.m. UTC
The thermal/performance leaf was previously hidden from HVM guests, but fully
visible to PV guests.  Most of the leaf refers to MSR availability, and there
is nothing an unprivileged PV guest can do with the information, so hide the
leaf entirely.

The PV MSR handling logic as minimal support for some thermal/perf operations
from the hardware domain, so leak through the implemented subset of features.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
---
 xen/arch/x86/cpuid.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

Comments

Jan Beulich Feb. 21, 2017, 5:25 p.m. UTC | #1
>>> On 20.02.17 at 12:00, <andrew.cooper3@citrix.com> wrote:
> The thermal/performance leaf was previously hidden from HVM guests, but fully
> visible to PV guests.  Most of the leaf refers to MSR availability, and there
> is nothing an unprivileged PV guest can do with the information, so hide the
> leaf entirely.
> 
> The PV MSR handling logic as minimal support for some thermal/perf operations

... has ...

> from the hardware domain, so leak through the implemented subset of 
> features.

Does it make sense to continue to special case PV hwdom here?
Should there perhaps be at least a fixme note?

Jan
Andrew Cooper Feb. 21, 2017, 5:40 p.m. UTC | #2
On 21/02/17 17:25, Jan Beulich wrote:
>>>> On 20.02.17 at 12:00, <andrew.cooper3@citrix.com> wrote:
>> The thermal/performance leaf was previously hidden from HVM guests, but fully
>> visible to PV guests.  Most of the leaf refers to MSR availability, and there
>> is nothing an unprivileged PV guest can do with the information, so hide the
>> leaf entirely.
>>
>> The PV MSR handling logic as minimal support for some thermal/perf operations
> ... has ...
>
>> from the hardware domain, so leak through the implemented subset of 
>> features.
> Does it make sense to continue to special case PV hwdom here?

Being able to play with these MSRs will be actively wrong for HVM
context.  It is already fairly wrong for PV context, as nothing prevents
you being rescheduled across pcpus while in the middle of a read/write
cycle on the MSRs.

> Should there perhaps be at least a fixme note?

One way or another, we have to invest some different mechanism of
providing real hardware details to the hardware domain which don't
collide with their vcpus.

~Andrew
Andrew Cooper Feb. 21, 2017, 5:44 p.m. UTC | #3
On 21/02/17 17:40, Andrew Cooper wrote:
> On 21/02/17 17:25, Jan Beulich wrote:
>>>>> On 20.02.17 at 12:00, <andrew.cooper3@citrix.com> wrote:
>>> The thermal/performance leaf was previously hidden from HVM guests, but fully
>>> visible to PV guests.  Most of the leaf refers to MSR availability, and there
>>> is nothing an unprivileged PV guest can do with the information, so hide the
>>> leaf entirely.
>>>
>>> The PV MSR handling logic as minimal support for some thermal/perf operations
>> ... has ...
>>
>>> from the hardware domain, so leak through the implemented subset of 
>>> features.
>> Does it make sense to continue to special case PV hwdom here?
> Being able to play with these MSRs will be actively wrong for HVM
> context.  It is already fairly wrong for PV context, as nothing prevents
> you being rescheduled across pcpus while in the middle of a read/write
> cycle on the MSRs.
>
>> Should there perhaps be at least a fixme note?
> One way or another, we have to invest some different mechanism of
> providing real hardware details to the hardware domain which don't
> collide with their vcpus.

s/invest/invent/.  Sorry for the confusion.

~Andrew
Jan Beulich Feb. 22, 2017, 7:31 a.m. UTC | #4
>>> On 21.02.17 at 18:40, <andrew.cooper3@citrix.com> wrote:
> On 21/02/17 17:25, Jan Beulich wrote:
>>>>> On 20.02.17 at 12:00, <andrew.cooper3@citrix.com> wrote:
>>> The thermal/performance leaf was previously hidden from HVM guests, but 
> fully
>>> visible to PV guests.  Most of the leaf refers to MSR availability, and 
> there
>>> is nothing an unprivileged PV guest can do with the information, so hide the
>>> leaf entirely.
>>>
>>> The PV MSR handling logic as minimal support for some thermal/perf operations
>> ... has ...
>>
>>> from the hardware domain, so leak through the implemented subset of 
>>> features.
>> Does it make sense to continue to special case PV hwdom here?
> 
> Being able to play with these MSRs will be actively wrong for HVM
> context.  It is already fairly wrong for PV context, as nothing prevents
> you being rescheduled across pcpus while in the middle of a read/write
> cycle on the MSRs.

So the MSRs in question are, afaics
- MSR_IA32_MPERF, MSR_IA32_APERF, MSR_IA32_PERF_CTL (all
  of which are is_cpufreq_controller() dependent)
- MSR_IA32_THERM_CONTROL, MSR_IA32_ENERGY_PERF_BIAS
  (both of which are is_pinned_vcpu() dependent)
For the latter your argument doesn't apply. For the former, I've
been wondering for a while whether we shouldn't do away with
"cpufreq=dom0-kernel".

>> Should there perhaps be at least a fixme note?
> 
> One way or another, we have to invest some different mechanism of
> providing real hardware details to the hardware domain which don't
> collide with their vcpus.

Hence the question whether to at least add a "fixme" note.

Jan
Andrew Cooper Feb. 22, 2017, 8:23 a.m. UTC | #5
On 22/02/17 07:31, Jan Beulich wrote:
>>>> On 21.02.17 at 18:40, <andrew.cooper3@citrix.com> wrote:
>> On 21/02/17 17:25, Jan Beulich wrote:
>>>>>> On 20.02.17 at 12:00, <andrew.cooper3@citrix.com> wrote:
>>>> The thermal/performance leaf was previously hidden from HVM guests, but 
>> fully
>>>> visible to PV guests.  Most of the leaf refers to MSR availability, and 
>> there
>>>> is nothing an unprivileged PV guest can do with the information, so hide the
>>>> leaf entirely.
>>>>
>>>> The PV MSR handling logic as minimal support for some thermal/perf operations
>>> ... has ...
>>>
>>>> from the hardware domain, so leak through the implemented subset of 
>>>> features.
>>> Does it make sense to continue to special case PV hwdom here?
>> Being able to play with these MSRs will be actively wrong for HVM
>> context.  It is already fairly wrong for PV context, as nothing prevents
>> you being rescheduled across pcpus while in the middle of a read/write
>> cycle on the MSRs.
> So the MSRs in question are, afaics
> - MSR_IA32_MPERF, MSR_IA32_APERF, MSR_IA32_PERF_CTL (all
>   of which are is_cpufreq_controller() dependent)
> - MSR_IA32_THERM_CONTROL, MSR_IA32_ENERGY_PERF_BIAS
>   (both of which are is_pinned_vcpu() dependent)
> For the latter your argument doesn't apply. For the former, I've
> been wondering for a while whether we shouldn't do away with
> "cpufreq=dom0-kernel".

Hmm.  All good points.  If I can get away without leaking any of this,
that would be ideal.  (Lets see what Linux thinks of such a setup.)

~Andrew
Andrew Cooper Feb. 22, 2017, 9:12 a.m. UTC | #6
On 22/02/17 08:23, Andrew Cooper wrote:
> On 22/02/17 07:31, Jan Beulich wrote:
>>>>> On 21.02.17 at 18:40, <andrew.cooper3@citrix.com> wrote:
>>> On 21/02/17 17:25, Jan Beulich wrote:
>>>>>>> On 20.02.17 at 12:00, <andrew.cooper3@citrix.com> wrote:
>>>>> The thermal/performance leaf was previously hidden from HVM guests, but 
>>> fully
>>>>> visible to PV guests.  Most of the leaf refers to MSR availability, and 
>>> there
>>>>> is nothing an unprivileged PV guest can do with the information, so hide the
>>>>> leaf entirely.
>>>>>
>>>>> The PV MSR handling logic as minimal support for some thermal/perf operations
>>>> ... has ...
>>>>
>>>>> from the hardware domain, so leak through the implemented subset of 
>>>>> features.
>>>> Does it make sense to continue to special case PV hwdom here?
>>> Being able to play with these MSRs will be actively wrong for HVM
>>> context.  It is already fairly wrong for PV context, as nothing prevents
>>> you being rescheduled across pcpus while in the middle of a read/write
>>> cycle on the MSRs.
>> So the MSRs in question are, afaics
>> - MSR_IA32_MPERF, MSR_IA32_APERF, MSR_IA32_PERF_CTL (all
>>   of which are is_cpufreq_controller() dependent)
>> - MSR_IA32_THERM_CONTROL, MSR_IA32_ENERGY_PERF_BIAS
>>   (both of which are is_pinned_vcpu() dependent)
>> For the latter your argument doesn't apply. For the former, I've
>> been wondering for a while whether we shouldn't do away with
>> "cpufreq=dom0-kernel".
> Hmm.  All good points.  If I can get away without leaking any of this,
> that would be ideal.  (Lets see what Linux thinks of such a setup.)

Linux seems fine without any of this leakage.  I have checked and C/P
state information is still propagated up to Xen.  I will drop the entire
dynamic adjustment.

~Andrew
Jan Beulich Feb. 22, 2017, 9:26 a.m. UTC | #7
>>> On 22.02.17 at 10:12, <andrew.cooper3@citrix.com> wrote:
> On 22/02/17 08:23, Andrew Cooper wrote:
>> On 22/02/17 07:31, Jan Beulich wrote:
>>>>>> On 21.02.17 at 18:40, <andrew.cooper3@citrix.com> wrote:
>>>> On 21/02/17 17:25, Jan Beulich wrote:
>>>>>>>> On 20.02.17 at 12:00, <andrew.cooper3@citrix.com> wrote:
>>>>>> The PV MSR handling logic as minimal support for some thermal/perf operations
>>>>>> from the hardware domain, so leak through the implemented subset of 
>>>>>> features.
>>>>> Does it make sense to continue to special case PV hwdom here?
>>>> Being able to play with these MSRs will be actively wrong for HVM
>>>> context.  It is already fairly wrong for PV context, as nothing prevents
>>>> you being rescheduled across pcpus while in the middle of a read/write
>>>> cycle on the MSRs.
>>> So the MSRs in question are, afaics
>>> - MSR_IA32_MPERF, MSR_IA32_APERF, MSR_IA32_PERF_CTL (all
>>>   of which are is_cpufreq_controller() dependent)
>>> - MSR_IA32_THERM_CONTROL, MSR_IA32_ENERGY_PERF_BIAS
>>>   (both of which are is_pinned_vcpu() dependent)
>>> For the latter your argument doesn't apply. For the former, I've
>>> been wondering for a while whether we shouldn't do away with
>>> "cpufreq=dom0-kernel".
>> Hmm.  All good points.  If I can get away without leaking any of this,
>> that would be ideal.  (Lets see what Linux thinks of such a setup.)
> 
> Linux seems fine without any of this leakage.

But is that for a broad range of versions, or just the one you had
to hand?

Jan
Andrew Cooper Feb. 27, 2017, 2:30 p.m. UTC | #8
On 22/02/17 09:26, Jan Beulich wrote:
>>>> On 22.02.17 at 10:12, <andrew.cooper3@citrix.com> wrote:
>> On 22/02/17 08:23, Andrew Cooper wrote:
>>> On 22/02/17 07:31, Jan Beulich wrote:
>>>>>>> On 21.02.17 at 18:40, <andrew.cooper3@citrix.com> wrote:
>>>>> On 21/02/17 17:25, Jan Beulich wrote:
>>>>>>>>> On 20.02.17 at 12:00, <andrew.cooper3@citrix.com> wrote:
>>>>>>> The PV MSR handling logic as minimal support for some thermal/perf operations
>>>>>>> from the hardware domain, so leak through the implemented subset of 
>>>>>>> features.
>>>>>> Does it make sense to continue to special case PV hwdom here?
>>>>> Being able to play with these MSRs will be actively wrong for HVM
>>>>> context.  It is already fairly wrong for PV context, as nothing prevents
>>>>> you being rescheduled across pcpus while in the middle of a read/write
>>>>> cycle on the MSRs.
>>>> So the MSRs in question are, afaics
>>>> - MSR_IA32_MPERF, MSR_IA32_APERF, MSR_IA32_PERF_CTL (all
>>>>   of which are is_cpufreq_controller() dependent)
>>>> - MSR_IA32_THERM_CONTROL, MSR_IA32_ENERGY_PERF_BIAS
>>>>   (both of which are is_pinned_vcpu() dependent)
>>>> For the latter your argument doesn't apply. For the former, I've
>>>> been wondering for a while whether we shouldn't do away with
>>>> "cpufreq=dom0-kernel".
>>> Hmm.  All good points.  If I can get away without leaking any of this,
>>> that would be ideal.  (Lets see what Linux thinks of such a setup.)
>> Linux seems fine without any of this leakage.
> But is that for a broad range of versions, or just the one you had
> to hand?

3.10 and 4.4 PVOps.  Looking at the 2.6.32-classic source, I can't see
anything which would be a problem.

~Andrew
diff mbox

Patch

diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 7862d62..2a5e011 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -167,6 +167,7 @@  static void recalculate_misc(struct cpuid_policy *p)
     p->basic.raw[0x4] = p->basic.raw[0x7] = p->basic.raw[0xd] = EMPTY_LEAF;
 
     p->basic.raw[0x5] = EMPTY_LEAF; /* MONITOR not exposed to guests. */
+    p->basic.raw[0x6] = EMPTY_LEAF; /* Therm/Power not exposed to guests. */
 
     p->basic.raw[0x8] = EMPTY_LEAF;
     p->basic.raw[0xc] = EMPTY_LEAF;
@@ -648,8 +649,7 @@  static void pv_cpuid(uint32_t leaf, uint32_t subleaf, struct cpuid_leaf *res)
         *res = EMPTY_LEAF;
         break;
 
-    case 0x0 ... 0x5:
-    case 0x7 ... 0x9:
+    case 0x0 ... 0x9:
     case 0xc ... XSTATE_CPUID:
     case 0x80000000 ... 0xffffffff:
         ASSERT_UNREACHABLE();
@@ -683,8 +683,7 @@  static void hvm_cpuid(uint32_t leaf, uint32_t subleaf, struct cpuid_leaf *res)
             res->a = (res->a & ~0xff) | 3;
         break;
 
-    case 0x0 ... 0x5:
-    case 0x7 ... 0x9:
+    case 0x0 ... 0x9:
     case 0xc ... XSTATE_CPUID:
     case 0x80000000 ... 0xffffffff:
         ASSERT_UNREACHABLE();
@@ -744,7 +743,7 @@  void guest_cpuid(const struct vcpu *v, uint32_t leaf,
             goto legacy;
 
         case 0x0 ... 0x3:
-        case 0x5:
+        case 0x5 ... 0x6:
         case 0x8 ... 0x9:
         case 0xc:
             *res = p->basic.raw[leaf];
@@ -942,6 +941,22 @@  void guest_cpuid(const struct vcpu *v, uint32_t leaf,
             *res = raw_policy.basic.raw[leaf];
         break;
 
+    case 0x6:
+        /*
+         * Leak the implemented-subset of therm/power features into PV
+         * hardware domain kernels.
+         */
+        if ( is_pv_domain(d) && is_hardware_domain(d) &&
+             guest_kernel_mode(v, guest_cpu_user_regs()) )
+        {
+            /* Temp, Turbo, ARAT. */
+            res->a = raw_policy.basic.raw[leaf].a & 0x00000007;
+
+            /* APERF/MPERF, perf/enery bias. */
+            res->c = raw_policy.basic.raw[leaf].c & 0x00000009;
+        }
+        break;
+
     case 0x7:
         switch ( subleaf )
         {