Message ID | 1487588434-4359-7-git-send-email-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
>>> On 20.02.17 at 12:00, <andrew.cooper3@citrix.com> wrote: > The thermal/performance leaf was previously hidden from HVM guests, but fully > visible to PV guests. Most of the leaf refers to MSR availability, and there > is nothing an unprivileged PV guest can do with the information, so hide the > leaf entirely. > > The PV MSR handling logic as minimal support for some thermal/perf operations ... has ... > from the hardware domain, so leak through the implemented subset of > features. Does it make sense to continue to special case PV hwdom here? Should there perhaps be at least a fixme note? Jan
On 21/02/17 17:25, Jan Beulich wrote: >>>> On 20.02.17 at 12:00, <andrew.cooper3@citrix.com> wrote: >> The thermal/performance leaf was previously hidden from HVM guests, but fully >> visible to PV guests. Most of the leaf refers to MSR availability, and there >> is nothing an unprivileged PV guest can do with the information, so hide the >> leaf entirely. >> >> The PV MSR handling logic as minimal support for some thermal/perf operations > ... has ... > >> from the hardware domain, so leak through the implemented subset of >> features. > Does it make sense to continue to special case PV hwdom here? Being able to play with these MSRs will be actively wrong for HVM context. It is already fairly wrong for PV context, as nothing prevents you being rescheduled across pcpus while in the middle of a read/write cycle on the MSRs. > Should there perhaps be at least a fixme note? One way or another, we have to invest some different mechanism of providing real hardware details to the hardware domain which don't collide with their vcpus. ~Andrew
On 21/02/17 17:40, Andrew Cooper wrote: > On 21/02/17 17:25, Jan Beulich wrote: >>>>> On 20.02.17 at 12:00, <andrew.cooper3@citrix.com> wrote: >>> The thermal/performance leaf was previously hidden from HVM guests, but fully >>> visible to PV guests. Most of the leaf refers to MSR availability, and there >>> is nothing an unprivileged PV guest can do with the information, so hide the >>> leaf entirely. >>> >>> The PV MSR handling logic as minimal support for some thermal/perf operations >> ... has ... >> >>> from the hardware domain, so leak through the implemented subset of >>> features. >> Does it make sense to continue to special case PV hwdom here? > Being able to play with these MSRs will be actively wrong for HVM > context. It is already fairly wrong for PV context, as nothing prevents > you being rescheduled across pcpus while in the middle of a read/write > cycle on the MSRs. > >> Should there perhaps be at least a fixme note? > One way or another, we have to invest some different mechanism of > providing real hardware details to the hardware domain which don't > collide with their vcpus. s/invest/invent/. Sorry for the confusion. ~Andrew
>>> On 21.02.17 at 18:40, <andrew.cooper3@citrix.com> wrote: > On 21/02/17 17:25, Jan Beulich wrote: >>>>> On 20.02.17 at 12:00, <andrew.cooper3@citrix.com> wrote: >>> The thermal/performance leaf was previously hidden from HVM guests, but > fully >>> visible to PV guests. Most of the leaf refers to MSR availability, and > there >>> is nothing an unprivileged PV guest can do with the information, so hide the >>> leaf entirely. >>> >>> The PV MSR handling logic as minimal support for some thermal/perf operations >> ... has ... >> >>> from the hardware domain, so leak through the implemented subset of >>> features. >> Does it make sense to continue to special case PV hwdom here? > > Being able to play with these MSRs will be actively wrong for HVM > context. It is already fairly wrong for PV context, as nothing prevents > you being rescheduled across pcpus while in the middle of a read/write > cycle on the MSRs. So the MSRs in question are, afaics - MSR_IA32_MPERF, MSR_IA32_APERF, MSR_IA32_PERF_CTL (all of which are is_cpufreq_controller() dependent) - MSR_IA32_THERM_CONTROL, MSR_IA32_ENERGY_PERF_BIAS (both of which are is_pinned_vcpu() dependent) For the latter your argument doesn't apply. For the former, I've been wondering for a while whether we shouldn't do away with "cpufreq=dom0-kernel". >> Should there perhaps be at least a fixme note? > > One way or another, we have to invest some different mechanism of > providing real hardware details to the hardware domain which don't > collide with their vcpus. Hence the question whether to at least add a "fixme" note. Jan
On 22/02/17 07:31, Jan Beulich wrote: >>>> On 21.02.17 at 18:40, <andrew.cooper3@citrix.com> wrote: >> On 21/02/17 17:25, Jan Beulich wrote: >>>>>> On 20.02.17 at 12:00, <andrew.cooper3@citrix.com> wrote: >>>> The thermal/performance leaf was previously hidden from HVM guests, but >> fully >>>> visible to PV guests. Most of the leaf refers to MSR availability, and >> there >>>> is nothing an unprivileged PV guest can do with the information, so hide the >>>> leaf entirely. >>>> >>>> The PV MSR handling logic as minimal support for some thermal/perf operations >>> ... has ... >>> >>>> from the hardware domain, so leak through the implemented subset of >>>> features. >>> Does it make sense to continue to special case PV hwdom here? >> Being able to play with these MSRs will be actively wrong for HVM >> context. It is already fairly wrong for PV context, as nothing prevents >> you being rescheduled across pcpus while in the middle of a read/write >> cycle on the MSRs. > So the MSRs in question are, afaics > - MSR_IA32_MPERF, MSR_IA32_APERF, MSR_IA32_PERF_CTL (all > of which are is_cpufreq_controller() dependent) > - MSR_IA32_THERM_CONTROL, MSR_IA32_ENERGY_PERF_BIAS > (both of which are is_pinned_vcpu() dependent) > For the latter your argument doesn't apply. For the former, I've > been wondering for a while whether we shouldn't do away with > "cpufreq=dom0-kernel". Hmm. All good points. If I can get away without leaking any of this, that would be ideal. (Lets see what Linux thinks of such a setup.) ~Andrew
On 22/02/17 08:23, Andrew Cooper wrote: > On 22/02/17 07:31, Jan Beulich wrote: >>>>> On 21.02.17 at 18:40, <andrew.cooper3@citrix.com> wrote: >>> On 21/02/17 17:25, Jan Beulich wrote: >>>>>>> On 20.02.17 at 12:00, <andrew.cooper3@citrix.com> wrote: >>>>> The thermal/performance leaf was previously hidden from HVM guests, but >>> fully >>>>> visible to PV guests. Most of the leaf refers to MSR availability, and >>> there >>>>> is nothing an unprivileged PV guest can do with the information, so hide the >>>>> leaf entirely. >>>>> >>>>> The PV MSR handling logic as minimal support for some thermal/perf operations >>>> ... has ... >>>> >>>>> from the hardware domain, so leak through the implemented subset of >>>>> features. >>>> Does it make sense to continue to special case PV hwdom here? >>> Being able to play with these MSRs will be actively wrong for HVM >>> context. It is already fairly wrong for PV context, as nothing prevents >>> you being rescheduled across pcpus while in the middle of a read/write >>> cycle on the MSRs. >> So the MSRs in question are, afaics >> - MSR_IA32_MPERF, MSR_IA32_APERF, MSR_IA32_PERF_CTL (all >> of which are is_cpufreq_controller() dependent) >> - MSR_IA32_THERM_CONTROL, MSR_IA32_ENERGY_PERF_BIAS >> (both of which are is_pinned_vcpu() dependent) >> For the latter your argument doesn't apply. For the former, I've >> been wondering for a while whether we shouldn't do away with >> "cpufreq=dom0-kernel". > Hmm. All good points. If I can get away without leaking any of this, > that would be ideal. (Lets see what Linux thinks of such a setup.) Linux seems fine without any of this leakage. I have checked and C/P state information is still propagated up to Xen. I will drop the entire dynamic adjustment. ~Andrew
>>> On 22.02.17 at 10:12, <andrew.cooper3@citrix.com> wrote: > On 22/02/17 08:23, Andrew Cooper wrote: >> On 22/02/17 07:31, Jan Beulich wrote: >>>>>> On 21.02.17 at 18:40, <andrew.cooper3@citrix.com> wrote: >>>> On 21/02/17 17:25, Jan Beulich wrote: >>>>>>>> On 20.02.17 at 12:00, <andrew.cooper3@citrix.com> wrote: >>>>>> The PV MSR handling logic as minimal support for some thermal/perf operations >>>>>> from the hardware domain, so leak through the implemented subset of >>>>>> features. >>>>> Does it make sense to continue to special case PV hwdom here? >>>> Being able to play with these MSRs will be actively wrong for HVM >>>> context. It is already fairly wrong for PV context, as nothing prevents >>>> you being rescheduled across pcpus while in the middle of a read/write >>>> cycle on the MSRs. >>> So the MSRs in question are, afaics >>> - MSR_IA32_MPERF, MSR_IA32_APERF, MSR_IA32_PERF_CTL (all >>> of which are is_cpufreq_controller() dependent) >>> - MSR_IA32_THERM_CONTROL, MSR_IA32_ENERGY_PERF_BIAS >>> (both of which are is_pinned_vcpu() dependent) >>> For the latter your argument doesn't apply. For the former, I've >>> been wondering for a while whether we shouldn't do away with >>> "cpufreq=dom0-kernel". >> Hmm. All good points. If I can get away without leaking any of this, >> that would be ideal. (Lets see what Linux thinks of such a setup.) > > Linux seems fine without any of this leakage. But is that for a broad range of versions, or just the one you had to hand? Jan
On 22/02/17 09:26, Jan Beulich wrote: >>>> On 22.02.17 at 10:12, <andrew.cooper3@citrix.com> wrote: >> On 22/02/17 08:23, Andrew Cooper wrote: >>> On 22/02/17 07:31, Jan Beulich wrote: >>>>>>> On 21.02.17 at 18:40, <andrew.cooper3@citrix.com> wrote: >>>>> On 21/02/17 17:25, Jan Beulich wrote: >>>>>>>>> On 20.02.17 at 12:00, <andrew.cooper3@citrix.com> wrote: >>>>>>> The PV MSR handling logic as minimal support for some thermal/perf operations >>>>>>> from the hardware domain, so leak through the implemented subset of >>>>>>> features. >>>>>> Does it make sense to continue to special case PV hwdom here? >>>>> Being able to play with these MSRs will be actively wrong for HVM >>>>> context. It is already fairly wrong for PV context, as nothing prevents >>>>> you being rescheduled across pcpus while in the middle of a read/write >>>>> cycle on the MSRs. >>>> So the MSRs in question are, afaics >>>> - MSR_IA32_MPERF, MSR_IA32_APERF, MSR_IA32_PERF_CTL (all >>>> of which are is_cpufreq_controller() dependent) >>>> - MSR_IA32_THERM_CONTROL, MSR_IA32_ENERGY_PERF_BIAS >>>> (both of which are is_pinned_vcpu() dependent) >>>> For the latter your argument doesn't apply. For the former, I've >>>> been wondering for a while whether we shouldn't do away with >>>> "cpufreq=dom0-kernel". >>> Hmm. All good points. If I can get away without leaking any of this, >>> that would be ideal. (Lets see what Linux thinks of such a setup.) >> Linux seems fine without any of this leakage. > But is that for a broad range of versions, or just the one you had > to hand? 3.10 and 4.4 PVOps. Looking at the 2.6.32-classic source, I can't see anything which would be a problem. ~Andrew
diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c index 7862d62..2a5e011 100644 --- a/xen/arch/x86/cpuid.c +++ b/xen/arch/x86/cpuid.c @@ -167,6 +167,7 @@ static void recalculate_misc(struct cpuid_policy *p) p->basic.raw[0x4] = p->basic.raw[0x7] = p->basic.raw[0xd] = EMPTY_LEAF; p->basic.raw[0x5] = EMPTY_LEAF; /* MONITOR not exposed to guests. */ + p->basic.raw[0x6] = EMPTY_LEAF; /* Therm/Power not exposed to guests. */ p->basic.raw[0x8] = EMPTY_LEAF; p->basic.raw[0xc] = EMPTY_LEAF; @@ -648,8 +649,7 @@ static void pv_cpuid(uint32_t leaf, uint32_t subleaf, struct cpuid_leaf *res) *res = EMPTY_LEAF; break; - case 0x0 ... 0x5: - case 0x7 ... 0x9: + case 0x0 ... 0x9: case 0xc ... XSTATE_CPUID: case 0x80000000 ... 0xffffffff: ASSERT_UNREACHABLE(); @@ -683,8 +683,7 @@ static void hvm_cpuid(uint32_t leaf, uint32_t subleaf, struct cpuid_leaf *res) res->a = (res->a & ~0xff) | 3; break; - case 0x0 ... 0x5: - case 0x7 ... 0x9: + case 0x0 ... 0x9: case 0xc ... XSTATE_CPUID: case 0x80000000 ... 0xffffffff: ASSERT_UNREACHABLE(); @@ -744,7 +743,7 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf, goto legacy; case 0x0 ... 0x3: - case 0x5: + case 0x5 ... 0x6: case 0x8 ... 0x9: case 0xc: *res = p->basic.raw[leaf]; @@ -942,6 +941,22 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf, *res = raw_policy.basic.raw[leaf]; break; + case 0x6: + /* + * Leak the implemented-subset of therm/power features into PV + * hardware domain kernels. + */ + if ( is_pv_domain(d) && is_hardware_domain(d) && + guest_kernel_mode(v, guest_cpu_user_regs()) ) + { + /* Temp, Turbo, ARAT. */ + res->a = raw_policy.basic.raw[leaf].a & 0x00000007; + + /* APERF/MPERF, perf/enery bias. */ + res->c = raw_policy.basic.raw[leaf].c & 0x00000009; + } + break; + case 0x7: switch ( subleaf ) {
The thermal/performance leaf was previously hidden from HVM guests, but fully visible to PV guests. Most of the leaf refers to MSR availability, and there is nothing an unprivileged PV guest can do with the information, so hide the leaf entirely. The PV MSR handling logic as minimal support for some thermal/perf operations from the hardware domain, so leak through the implemented subset of features. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Jan Beulich <JBeulich@suse.com> --- xen/arch/x86/cpuid.c | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-)