| Message ID | 1483533584-8015-24-git-send-email-andrew.cooper3@citrix.com (mailing list archive) |
|---|---|
| State | New, archived |
>>> On 04.01.17 at 13:39, <andrew.cooper3@citrix.com> wrote:
> @@ -380,14 +385,42 @@ void guest_cpuid(const struct vcpu *v, unsigned int leaf,
>      case 0x80000000 ... 0x80000000 + CPUID_GUEST_NR_EXTD - 1:
>          if ( leaf > p->extd.max_leaf )
>              return;
> -        break;
> +        goto legacy;
>
>      default:
>          return;
>      }
>
> +    /* Skip dynamic adjustments if we are in the wrong context. */
> +    if ( v != curr )
> +        return;
> +
> +    /*
> +     * Second pass:
> +     * - Dynamic adjustments
> +     */
> +    switch ( leaf )
> +    {
> +    case 0x7:
> +        switch ( subleaf )
> +        {
> +        case 0:
> +            /* OSPKE clear in policy.  Fast-forward CR4 back in. */
> +            if ( (is_pv_vcpu(v)
> +                  ? v->arch.pv_vcpu.ctrlreg[4]
> +                  : v->arch.hvm_vcpu.guest_cr[4]) & X86_CR4_PKE )
> +                res->c |= cpufeat_mask(X86_FEATURE_OSPKE);

What's wrong with doing this adjustment when v != curr?  By the time
the caller looks at the result, the state of guest software controlled
bits can't be relied upon anyway.

Which then raises the question whether a second switch() statement for
a second pass is all that useful in the first place (I realize this may
depend on future plans of yours).

Jan
On 05/01/17 14:01, Jan Beulich wrote:
>>>> On 04.01.17 at 13:39, <andrew.cooper3@citrix.com> wrote:
>> @@ -380,14 +385,42 @@ void guest_cpuid(const struct vcpu *v, unsigned int leaf,
>>      case 0x80000000 ... 0x80000000 + CPUID_GUEST_NR_EXTD - 1:
>>          if ( leaf > p->extd.max_leaf )
>>              return;
>> -        break;
>> +        goto legacy;
>>
>>      default:
>>          return;
>>      }
>>
>> +    /* Skip dynamic adjustments if we are in the wrong context. */
>> +    if ( v != curr )
>> +        return;
>> +
>> +    /*
>> +     * Second pass:
>> +     * - Dynamic adjustments
>> +     */
>> +    switch ( leaf )
>> +    {
>> +    case 0x7:
>> +        switch ( subleaf )
>> +        {
>> +        case 0:
>> +            /* OSPKE clear in policy.  Fast-forward CR4 back in. */
>> +            if ( (is_pv_vcpu(v)
>> +                  ? v->arch.pv_vcpu.ctrlreg[4]
>> +                  : v->arch.hvm_vcpu.guest_cr[4]) & X86_CR4_PKE )
>> +                res->c |= cpufeat_mask(X86_FEATURE_OSPKE);
> What's wrong with doing this adjustment when v != curr?

A guest's %cr4 is stale if it is running elsewhere.

> By the time the caller looks at the result, the state of guest
> software controlled bits can't be relied upon anyway.

This particular adjustment can be done out of curr context, but others
are harder.  I have taken the approach that it is better to do nothing
consistently, than to expend effort filling in data we know is going to
be wrong for the caller.

(I hit a rat's nest with the xstate leaf and dynamic %ebx's, which is
why those patches are still pending some more work, and I haven't yet
decided how to do the pv hardware domain leakage.)

> Which then raises the question whether a second switch() statement
> for a second pass is all that useful in the first place (I realize
> this may depend on future plans of yours).

This switch statement will soon be far larger and more complicated than
the first-pass one, and I think it is important to separate the static
and dynamic nature of the two.

~Andrew
>>> On 05.01.17 at 15:39, <andrew.cooper3@citrix.com> wrote:
> On 05/01/17 14:01, Jan Beulich wrote:
>>>>> On 04.01.17 at 13:39, <andrew.cooper3@citrix.com> wrote:
>>> @@ -380,14 +385,42 @@ void guest_cpuid(const struct vcpu *v, unsigned int leaf,
>>>      case 0x80000000 ... 0x80000000 + CPUID_GUEST_NR_EXTD - 1:
>>>          if ( leaf > p->extd.max_leaf )
>>>              return;
>>> -        break;
>>> +        goto legacy;
>>>
>>>      default:
>>>          return;
>>>      }
>>>
>>> +    /* Skip dynamic adjustments if we are in the wrong context. */
>>> +    if ( v != curr )
>>> +        return;
>>> +
>>> +    /*
>>> +     * Second pass:
>>> +     * - Dynamic adjustments
>>> +     */
>>> +    switch ( leaf )
>>> +    {
>>> +    case 0x7:
>>> +        switch ( subleaf )
>>> +        {
>>> +        case 0:
>>> +            /* OSPKE clear in policy.  Fast-forward CR4 back in. */
>>> +            if ( (is_pv_vcpu(v)
>>> +                  ? v->arch.pv_vcpu.ctrlreg[4]
>>> +                  : v->arch.hvm_vcpu.guest_cr[4]) & X86_CR4_PKE )
>>> +                res->c |= cpufeat_mask(X86_FEATURE_OSPKE);
>> What's wrong with doing this adjustment when v != curr?
>
> A guest's %cr4 is stale if it is running elsewhere.
>
>> By the time the caller looks at the result, the state of guest
>> software controlled bits can't be relied upon anyway.
>
> This particular adjustment can be done out of curr context, but others
> are harder.  I have taken the approach that it is better to do nothing
> consistently, than to expend effort filling in data we know is going to
> be wrong for the caller.

May I then suggest adding the early bailing at the time it actually
becomes necessary, or at the very least extending its comment to make
clear that this isn't always strictly needed?

Jan
```diff
diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index d140482..9181fc7 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -339,6 +339,7 @@ int init_domain_cpuid_policy(struct domain *d)
 void guest_cpuid(const struct vcpu *v, unsigned int leaf,
                  unsigned int subleaf, struct cpuid_leaf *res)
 {
+    const struct vcpu *curr = current;
     const struct domain *d = v->domain;
     const struct cpuid_policy *p = d->arch.cpuid;
 
@@ -348,6 +349,7 @@ void guest_cpuid(const struct vcpu *v, unsigned int leaf,
      * First pass:
      * - Dispatch the virtualised leaves to their respective handlers.
      * - Perform max_leaf/subleaf calculations, maybe returning early.
+     * - Fill in *res for leaves no longer handled on the legacy path.
      */
     switch ( leaf )
     {
@@ -358,17 +360,20 @@ void guest_cpuid(const struct vcpu *v, unsigned int leaf,
 #endif
         if ( leaf > p->basic.max_leaf )
             return;
-        break;
+        goto legacy;
 
     case 0x7:
         if ( subleaf > p->feat.max_subleaf )
             return;
+
+        BUG_ON(subleaf >= ARRAY_SIZE(p->feat.raw));
+        *res = p->feat.raw[subleaf];
         break;
 
     case 0xd:
         if ( subleaf > ARRAY_SIZE(p->xstate.raw) )
             return;
-        break;
+        goto legacy;
 
     case 0x40000000 ... 0x400000ff:
         if ( is_viridian_domain(d) )
@@ -380,14 +385,42 @@ void guest_cpuid(const struct vcpu *v, unsigned int leaf,
     case 0x80000000 ... 0x80000000 + CPUID_GUEST_NR_EXTD - 1:
         if ( leaf > p->extd.max_leaf )
             return;
-        break;
+        goto legacy;
 
     default:
         return;
     }
 
+    /* Skip dynamic adjustments if we are in the wrong context. */
+    if ( v != curr )
+        return;
+
+    /*
+     * Second pass:
+     * - Dynamic adjustments
+     */
+    switch ( leaf )
+    {
+    case 0x7:
+        switch ( subleaf )
+        {
+        case 0:
+            /* OSPKE clear in policy.  Fast-forward CR4 back in. */
+            if ( (is_pv_vcpu(v)
+                  ? v->arch.pv_vcpu.ctrlreg[4]
+                  : v->arch.hvm_vcpu.guest_cr[4]) & X86_CR4_PKE )
+                res->c |= cpufeat_mask(X86_FEATURE_OSPKE);
+            break;
+        }
+        break;
+    }
+
+    /* Done. */
+    return;
+
+ legacy:
     /* {pv,hvm}_cpuid() have this expectation. */
-    ASSERT(v == current);
+    ASSERT(v == curr);
 
     if ( is_pv_vcpu(v) || is_pvh_vcpu(v) )
     {
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 1dd92e3..4ded533 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3354,19 +3354,6 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
         break;
 
-    case 0x7:
-        if ( count == 0 )
-        {
-            *ebx = d->arch.cpuid->feat._7b0;
-            *ecx = d->arch.cpuid->feat._7c0;
-            *edx = d->arch.cpuid->feat._7d0;
-
-            /* OSPKE clear in policy.  Fast-forward CR4 back in. */
-            if ( v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PKE )
-                *ecx |= cpufeat_mask(X86_FEATURE_OSPKE);
-        }
-        break;
-
     case 0xb:
         /* Fix the x2APIC identifier. */
         *edx = v->vcpu_id * 2;
 
@@ -3543,6 +3530,10 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
         else
             *eax = 0;
         break;
+
+    case 0x7:
+        ASSERT_UNREACHABLE();
+        /* Now handled in guest_cpuid(). */
     }
 }
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index aed96c3..e2669b1 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1180,30 +1180,6 @@ void pv_cpuid(struct cpu_user_regs *regs)
         }
         break;
 
-    case 0x00000007:
-        if ( subleaf == 0 )
-        {
-            b = currd->arch.cpuid->feat._7b0;
-            c = currd->arch.cpuid->feat._7c0;
-            d = currd->arch.cpuid->feat._7d0;
-
-            if ( !is_pvh_domain(currd) )
-            {
-                /*
-                 * Delete the PVH condition when HVMLite formally replaces PVH,
-                 * and HVM guests no longer enter a PV codepath.
-                 */
-
-                /* OSPKE clear in policy.  Fast-forward CR4 back in. */
-                if ( curr->arch.pv_vcpu.ctrlreg[4] & X86_CR4_PKE )
-                    c |= cpufeat_mask(X86_FEATURE_OSPKE);
-            }
-        }
-        else
-            b = c = d = 0;
-        a = 0;
-        break;
-
     case 0x0000000a: /* Architectural Performance Monitor Features (Intel) */
         if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
              !vpmu_enabled(curr) )
@@ -1305,6 +1281,10 @@ void pv_cpuid(struct cpu_user_regs *regs)
     unsupported:
         a = b = c = d = 0;
         break;
+
+    case 0x7:
+        ASSERT_UNREACHABLE();
+        /* Now handled in guest_cpuid(). */
     }
 
     regs->rax = a;
diff --git a/xen/include/asm-x86/cpuid.h b/xen/include/asm-x86/cpuid.h
index 7363263..c7e9df5 100644
--- a/xen/include/asm-x86/cpuid.h
+++ b/xen/include/asm-x86/cpuid.h
@@ -81,12 +81,14 @@ struct cpuid_policy
  *   - {xcr0,xss}_{high,low}
  *
  * - Guest appropriate:
+ *   - All of the feat union
  *   - max_{,sub}leaf
  *   - All FEATURESET_* words
  *
  * Per-domain objects:
  *
  * - Guest accurate:
+ *   - All of the feat union
  *   - max_{,sub}leaf
  *   - All FEATURESET_* words
  *
```
All per-domain policy data concerning leaf 7 is accurate.  Handle it all in
guest_cpuid() by reading out of the raw array block, and introducing a
dynamic adjustment for OSPKE.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
---
 xen/arch/x86/cpuid.c        | 41 +++++++++++++++++++++++++++++++++++++----
 xen/arch/x86/hvm/hvm.c      | 17 ++++-------------
 xen/arch/x86/traps.c        | 28 ++++------------------------
 xen/include/asm-x86/cpuid.h |  2 ++
 4 files changed, 47 insertions(+), 41 deletions(-)