diff mbox

WHPX fixes an issue with CPUID 1 not returning CPUID_EXT_HYPERVISOR

Message ID 20180326170658.606-1-juterry@microsoft.com (mailing list archive)
State New, archived
Headers show

Commit Message

Zhijian Li (Fujitsu)" via March 26, 2018, 5:06 p.m. UTC
Implements the CPUID trap for CPUID 1 to include the
CPUID_EXT_HYPERVISOR flag in the ECX results. This was preventing some
older linux kernels from booting when trying to access MSR's that dont
make sense when virtualized.

Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
---
 target/i386/whpx-all.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 78 insertions(+), 1 deletion(-)

Comments

Eric Blake March 26, 2018, 6:31 p.m. UTC | #1
On 03/26/2018 12:06 PM, Justin Terry (VM) via Qemu-devel wrote:

[meta-comment]

Since we're already discussing this today, I noticed the list rewrites 
your headers due to SPF policies at microsoft.com, to the point that a 
blind 'git am' on the list message will attempt to attribute your patch 
to 'Justin Terry (VM) via Qemu-devel <qemu-devel@nongnu.org>' if the 
maintainer is not careful.

You may want to configure git on your end to emit an explicit 'From: 
Justin Terry (VM) <juterry@microsoft.com>' in the body of each message 
you send, to override the mailman overwrite of your messages to the 
list.  I don't know whether 'git config format.from "Your Name 
<address@example.com>"' is sufficient for this (the help for git 
format-patch mentions that it adds the From: line to the body only when 
it differs textually from the From: header, but doesn't mention a way to 
force the From: line in the body) even when the two start out the same.

> Implements the CPUID trap for CPUID 1 to include the
> CPUID_EXT_HYPERVISOR flag in the ECX results. This was preventing some
> older linux kernels from booting when trying to access MSR's that dont

s/dont/don't/

> make sense when virtualized.
> 
> Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
> ---
Eduardo Habkost March 28, 2018, 5:50 p.m. UTC | #2
On Mon, Mar 26, 2018 at 10:06:58AM -0700, Justin Terry (VM) wrote:
> Implements the CPUID trap for CPUID 1 to include the
> CPUID_EXT_HYPERVISOR flag in the ECX results. This was preventing some
> older linux kernels from booting when trying to access MSR's that dont
> make sense when virtualized.
> 
> Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
> ---
>  target/i386/whpx-all.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 78 insertions(+), 1 deletion(-)
> 
> diff --git a/target/i386/whpx-all.c b/target/i386/whpx-all.c
> index bf33d320bf..58435178a4 100644
> --- a/target/i386/whpx-all.c
> +++ b/target/i386/whpx-all.c
> @@ -911,12 +911,62 @@ static int whpx_vcpu_run(CPUState *cpu)
>              ret = 1;
>              break;
>  
> +        case WHvRunVpExitReasonX64Cpuid: {
> +            WHV_REGISTER_VALUE reg_values[5] = {0};
> +            WHV_REGISTER_NAME reg_names[5];
> +            UINT32 reg_count = 5;
> +            UINT64 rip, rax, rcx, rdx, rbx;
> +
> +            rip = vcpu->exit_ctx.VpContext.Rip +
> +                  vcpu->exit_ctx.VpContext.InstructionLength;
> +            switch (vcpu->exit_ctx.CpuidAccess.Rax) {
> +            case 1:
> +                rax = vcpu->exit_ctx.CpuidAccess.DefaultResultRax;
> +                /* Advertise that we are running on a hypervisor */
> +                rcx =
> +                    vcpu->exit_ctx.CpuidAccess.DefaultResultRcx |
> +                    CPUID_EXT_HYPERVISOR;
> +
> +                rdx = vcpu->exit_ctx.CpuidAccess.DefaultResultRdx;
> +                rbx = vcpu->exit_ctx.CpuidAccess.DefaultResultRbx;
> +                break;
> +            default:
> +                rax = vcpu->exit_ctx.CpuidAccess.DefaultResultRax;
> +                rcx = vcpu->exit_ctx.CpuidAccess.DefaultResultRcx;
> +                rdx = vcpu->exit_ctx.CpuidAccess.DefaultResultRdx;
> +                rbx = vcpu->exit_ctx.CpuidAccess.DefaultResultRbx;

Interesting, so the WHPX API already tries to provide default
values for the CPUID leaves.  Would it make sense to try and use
the values returned by cpu_x86_cpuid() in the future?

Is there a way to get the default CPUID results from the WHPX API
without calling WHvRunVirtualProcessor(), so QEMU can be aware of
what exactly the guest is seeing on CPUID?


> +            }
> +
> +            reg_names[0] = WHvX64RegisterRip;
> +            reg_names[1] = WHvX64RegisterRax;
> +            reg_names[2] = WHvX64RegisterRcx;
> +            reg_names[3] = WHvX64RegisterRdx;
> +            reg_names[4] = WHvX64RegisterRbx;
> +
> +            reg_values[0].Reg64 = rip;
> +            reg_values[1].Reg64 = rax;
> +            reg_values[2].Reg64 = rcx;
> +            reg_values[3].Reg64 = rdx;
> +            reg_values[4].Reg64 = rbx;
> +
> +            hr = WHvSetVirtualProcessorRegisters(whpx->partition,
> +                                                 cpu->cpu_index,
> +                                                 reg_names,
> +                                                 reg_count,
> +                                                 reg_values);
> +
> +            if (FAILED(hr)) {
> +                error_report("WHPX: Failed to set CpuidAccess state registers,"
> +                             " hr=%08lx", hr);
> +            }
> +            ret = 0;
> +            break;
> +        }
>          case WHvRunVpExitReasonNone:
>          case WHvRunVpExitReasonUnrecoverableException:
>          case WHvRunVpExitReasonInvalidVpRegisterValue:
>          case WHvRunVpExitReasonUnsupportedFeature:
>          case WHvRunVpExitReasonX64MsrAccess:
> -        case WHvRunVpExitReasonX64Cpuid:
>          case WHvRunVpExitReasonException:
>          default:
>              error_report("WHPX: Unexpected VP exit code %d",
> @@ -1272,6 +1322,33 @@ static int whpx_accel_init(MachineState *ms)
>          goto error;
>      }
>  
> +    memset(&prop, 0, sizeof(WHV_PARTITION_PROPERTY));
> +    prop.ExtendedVmExits.X64CpuidExit = 1;
> +    hr = WHvSetPartitionProperty(whpx->partition,
> +                                 WHvPartitionPropertyCodeExtendedVmExits,
> +                                 &prop,
> +                                 sizeof(WHV_PARTITION_PROPERTY));
> +
> +    if (FAILED(hr)) {
> +        error_report("WHPX: Failed to enable partition extended X64CpuidExit"
> +                     " hr=%08lx", hr);
> +        ret = -EINVAL;
> +        goto error;
> +    }
> +
> +    UINT32 cpuidExitList[] = {1};
> +    hr = WHvSetPartitionProperty(whpx->partition,
> +                                 WHvPartitionPropertyCodeCpuidExitList,
> +                                 cpuidExitList,
> +                                 RTL_NUMBER_OF(cpuidExitList) * sizeof(UINT32));
> +
> +    if (FAILED(hr)) {
> +        error_report("WHPX: Failed to set partition CpuidExitList hr=%08lx",
> +                     hr);
> +        ret = -EINVAL;
> +        goto error;
> +    }
> +
>      hr = WHvSetupPartition(whpx->partition);
>      if (FAILED(hr)) {
>          error_report("WHPX: Failed to setup partition, hr=%08lx", hr);
> -- 
> 2.11.0
>
Zhijian Li (Fujitsu)" via March 28, 2018, 8:48 p.m. UTC | #3
Hey Eduardo

Responses inline. Thanks!

> -----Original Message-----
> From: Eduardo Habkost <ehabkost@redhat.com>
> Sent: Wednesday, March 28, 2018 10:51 AM
> To: Justin Terry (VM) <juterry@microsoft.com>
> Cc: qemu-devel@nongnu.org; pbonzini@redhat.com; rth@twiddle.net
> Subject: Re: [PATCH] WHPX fixes an issue with CPUID 1 not returning
> CPUID_EXT_HYPERVISOR
> 
> On Mon, Mar 26, 2018 at 10:06:58AM -0700, Justin Terry (VM) wrote:
> > Implements the CPUID trap for CPUID 1 to include the
> > CPUID_EXT_HYPERVISOR flag in the ECX results. This was preventing some
> > older linux kernels from booting when trying to access MSR's that dont
> > make sense when virtualized.
> >
> > Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
> > ---
> >  target/i386/whpx-all.c | 79
> > +++++++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 78 insertions(+), 1 deletion(-)
> >
> > diff --git a/target/i386/whpx-all.c b/target/i386/whpx-all.c index
> > bf33d320bf..58435178a4 100644
> > --- a/target/i386/whpx-all.c
> > +++ b/target/i386/whpx-all.c
> > @@ -911,12 +911,62 @@ static int whpx_vcpu_run(CPUState *cpu)
> >              ret = 1;
> >              break;
> >
> > +        case WHvRunVpExitReasonX64Cpuid: {
> > +            WHV_REGISTER_VALUE reg_values[5] = {0};
> > +            WHV_REGISTER_NAME reg_names[5];
> > +            UINT32 reg_count = 5;
> > +            UINT64 rip, rax, rcx, rdx, rbx;
> > +
> > +            rip = vcpu->exit_ctx.VpContext.Rip +
> > +                  vcpu->exit_ctx.VpContext.InstructionLength;
> > +            switch (vcpu->exit_ctx.CpuidAccess.Rax) {
> > +            case 1:
> > +                rax = vcpu->exit_ctx.CpuidAccess.DefaultResultRax;
> > +                /* Advertise that we are running on a hypervisor */
> > +                rcx =
> > +                    vcpu->exit_ctx.CpuidAccess.DefaultResultRcx |
> > +                    CPUID_EXT_HYPERVISOR;
> > +
> > +                rdx = vcpu->exit_ctx.CpuidAccess.DefaultResultRdx;
> > +                rbx = vcpu->exit_ctx.CpuidAccess.DefaultResultRbx;
> > +                break;
> > +            default:
> > +                rax = vcpu->exit_ctx.CpuidAccess.DefaultResultRax;
> > +                rcx = vcpu->exit_ctx.CpuidAccess.DefaultResultRcx;
> > +                rdx = vcpu->exit_ctx.CpuidAccess.DefaultResultRdx;
> > +                rbx = vcpu->exit_ctx.CpuidAccess.DefaultResultRbx;
> 
> Interesting, so the WHPX API already tries to provide default values for the
> CPUID leaves.  Would it make sense to try and use the values returned by
> cpu_x86_cpuid() in the future?
> 
> Is there a way to get the default CPUID results from the WHPX API without
> calling WHvRunVirtualProcessor(), so QEMU can be aware of what exactly
> the guest is seeing on CPUID?

The platform now has two ways to interact with CPUID.

1. (As the code is doing now). At partition creation time you can register for specific CPUID exits and then respond to the CPUID with your custom answer or with the Hypervisor defaults that were forwarded to you. Unfortunately, QEMU has no way to know the Hypervisor default ahead of time but QEMU can make at least make a runtime decision about how to respond.
2. At partition creation time the platform allows QEMU to inject (set) the default responses for specific CPUID exits. This can now be done by setting the  `WHV_X64_CPUID_RESULT` in the `CpuidResultList` of `WHV_PARTITION_PROPERTY` to the exit values QEMU wants. So effectively you can know the answers ahead of time for any that you set but the answers are not dynamic.

The only issues/questions I have there are:

If we use [1] (like the code is now) I don't see any way to keep the exits in cpu_x86_cpuid() matched up with the registered exits to WHPX. This means that WHPX would not be called in these cases and would instead get the Hypervisor default rather than the answer from cpu_x86_cpuid().

If we use [2] to inject the answers at creation time WHPX needs access to the CPUX86State at accel init which also doesn't seem to be possible in QEMU today. WHPX could basically just call cpu_x86_cpuid() for each CPUID QEMU cares about and plumb the answer before start. This has the best performance as we avoid the additional exits but has an issue in that the results must be known ahead of time.

And, we could obviously use a hybrid of the two for cases we know. Do you have any ideas that I could try out here on how you would like to see this work?

Thanks,
Justin

> 
> 
> > +            }
> > +
> > +            reg_names[0] = WHvX64RegisterRip;
> > +            reg_names[1] = WHvX64RegisterRax;
> > +            reg_names[2] = WHvX64RegisterRcx;
> > +            reg_names[3] = WHvX64RegisterRdx;
> > +            reg_names[4] = WHvX64RegisterRbx;
> > +
> > +            reg_values[0].Reg64 = rip;
> > +            reg_values[1].Reg64 = rax;
> > +            reg_values[2].Reg64 = rcx;
> > +            reg_values[3].Reg64 = rdx;
> > +            reg_values[4].Reg64 = rbx;
> > +
> > +            hr = WHvSetVirtualProcessorRegisters(whpx->partition,
> > +                                                 cpu->cpu_index,
> > +                                                 reg_names,
> > +                                                 reg_count,
> > +                                                 reg_values);
> > +
> > +            if (FAILED(hr)) {
> > +                error_report("WHPX: Failed to set CpuidAccess state registers,"
> > +                             " hr=%08lx", hr);
> > +            }
> > +            ret = 0;
> > +            break;
> > +        }
> >          case WHvRunVpExitReasonNone:
> >          case WHvRunVpExitReasonUnrecoverableException:
> >          case WHvRunVpExitReasonInvalidVpRegisterValue:
> >          case WHvRunVpExitReasonUnsupportedFeature:
> >          case WHvRunVpExitReasonX64MsrAccess:
> > -        case WHvRunVpExitReasonX64Cpuid:
> >          case WHvRunVpExitReasonException:
> >          default:
> >              error_report("WHPX: Unexpected VP exit code %d", @@
> > -1272,6 +1322,33 @@ static int whpx_accel_init(MachineState *ms)
> >          goto error;
> >      }
> >
> > +    memset(&prop, 0, sizeof(WHV_PARTITION_PROPERTY));
> > +    prop.ExtendedVmExits.X64CpuidExit = 1;
> > +    hr = WHvSetPartitionProperty(whpx->partition,
> > +                                 WHvPartitionPropertyCodeExtendedVmExits,
> > +                                 &prop,
> > +                                 sizeof(WHV_PARTITION_PROPERTY));
> > +
> > +    if (FAILED(hr)) {
> > +        error_report("WHPX: Failed to enable partition extended
> X64CpuidExit"
> > +                     " hr=%08lx", hr);
> > +        ret = -EINVAL;
> > +        goto error;
> > +    }
> > +
> > +    UINT32 cpuidExitList[] = {1};
> > +    hr = WHvSetPartitionProperty(whpx->partition,
> > +                                 WHvPartitionPropertyCodeCpuidExitList,
> > +                                 cpuidExitList,
> > +                                 RTL_NUMBER_OF(cpuidExitList) *
> > + sizeof(UINT32));
> > +
> > +    if (FAILED(hr)) {
> > +        error_report("WHPX: Failed to set partition CpuidExitList hr=%08lx",
> > +                     hr);
> > +        ret = -EINVAL;
> > +        goto error;
> > +    }
> > +
> >      hr = WHvSetupPartition(whpx->partition);
> >      if (FAILED(hr)) {
> >          error_report("WHPX: Failed to setup partition, hr=%08lx",
> > hr);
> > --
> > 2.11.0
> >
> 
> --
> Eduardo
Eduardo Habkost April 3, 2018, 1:36 a.m. UTC | #4
On Wed, Mar 28, 2018 at 08:48:54PM +0000, Justin Terry (VM) wrote:
> Hey Eduardo
> 
> Responses inline. Thanks!
> 
> > -----Original Message-----
> > From: Eduardo Habkost <ehabkost@redhat.com>
> > Sent: Wednesday, March 28, 2018 10:51 AM
> > To: Justin Terry (VM) <juterry@microsoft.com>
> > Cc: qemu-devel@nongnu.org; pbonzini@redhat.com; rth@twiddle.net
> > Subject: Re: [PATCH] WHPX fixes an issue with CPUID 1 not returning
> > CPUID_EXT_HYPERVISOR
> > 
> > On Mon, Mar 26, 2018 at 10:06:58AM -0700, Justin Terry (VM) wrote:
> > > Implements the CPUID trap for CPUID 1 to include the
> > > CPUID_EXT_HYPERVISOR flag in the ECX results. This was preventing some
> > > older linux kernels from booting when trying to access MSR's that dont
> > > make sense when virtualized.
> > >
> > > Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
> > > ---
> > >  target/i386/whpx-all.c | 79
> > > +++++++++++++++++++++++++++++++++++++++++++++++++-
> > >  1 file changed, 78 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/target/i386/whpx-all.c b/target/i386/whpx-all.c index
> > > bf33d320bf..58435178a4 100644
> > > --- a/target/i386/whpx-all.c
> > > +++ b/target/i386/whpx-all.c
> > > @@ -911,12 +911,62 @@ static int whpx_vcpu_run(CPUState *cpu)
> > >              ret = 1;
> > >              break;
> > >
> > > +        case WHvRunVpExitReasonX64Cpuid: {
> > > +            WHV_REGISTER_VALUE reg_values[5] = {0};
> > > +            WHV_REGISTER_NAME reg_names[5];
> > > +            UINT32 reg_count = 5;
> > > +            UINT64 rip, rax, rcx, rdx, rbx;
> > > +
> > > +            rip = vcpu->exit_ctx.VpContext.Rip +
> > > +                  vcpu->exit_ctx.VpContext.InstructionLength;
> > > +            switch (vcpu->exit_ctx.CpuidAccess.Rax) {
> > > +            case 1:
> > > +                rax = vcpu->exit_ctx.CpuidAccess.DefaultResultRax;
> > > +                /* Advertise that we are running on a hypervisor */
> > > +                rcx =
> > > +                    vcpu->exit_ctx.CpuidAccess.DefaultResultRcx |
> > > +                    CPUID_EXT_HYPERVISOR;
> > > +
> > > +                rdx = vcpu->exit_ctx.CpuidAccess.DefaultResultRdx;
> > > +                rbx = vcpu->exit_ctx.CpuidAccess.DefaultResultRbx;
> > > +                break;
> > > +            default:
> > > +                rax = vcpu->exit_ctx.CpuidAccess.DefaultResultRax;
> > > +                rcx = vcpu->exit_ctx.CpuidAccess.DefaultResultRcx;
> > > +                rdx = vcpu->exit_ctx.CpuidAccess.DefaultResultRdx;
> > > +                rbx = vcpu->exit_ctx.CpuidAccess.DefaultResultRbx;
> > 
> > Interesting, so the WHPX API already tries to provide default values for the
> > CPUID leaves.  Would it make sense to try and use the values returned by
> > cpu_x86_cpuid() in the future?
> > 
> > Is there a way to get the default CPUID results from the WHPX API without
> > calling WHvRunVirtualProcessor(), so QEMU can be aware of what exactly
> > the guest is seeing on CPUID?
> 
> The platform now has two ways to interact with CPUID.
> 
> 1. (As the code is doing now). At partition creation time you
> can register for specific CPUID exits and then respond to the
> CPUID with your custom answer or with the Hypervisor defaults
> that were forwarded to you. Unfortunately, QEMU has no way to
> know the Hypervisor default ahead of time but QEMU can make at
> least make a runtime decision about how to respond.
> 2. At partition creation time the platform allows QEMU to
> inject (set) the default responses for specific CPUID exits.
> This can now be done by setting the  `WHV_X64_CPUID_RESULT` in
> the `CpuidResultList` of `WHV_PARTITION_PROPERTY` to the exit
> values QEMU wants. So effectively you can know the answers
> ahead of time for any that you set but the answers are not
> dynamic.
> 
> The only issues/questions I have there are:
> 
> If we use [1] (like the code is now) I don't see any way to
> keep the exits in cpu_x86_cpuid() matched up with the
> registered exits to WHPX. This means that WHPX would not be
> called in these cases and would instead get the Hypervisor
> default rather than the answer from cpu_x86_cpuid().

I assume you could call cpu_x86_cpuid() on every CPUID exit and
override the hypervisor default completely.  Not a very efficient
solution, but seems simple to implement.

But I'm still not sure if you really want to override the
hypervisor defaults completely (see below).

> 
> If we use [2] to inject the answers at creation time WHPX needs
> access to the CPUX86State at accel init which also doesn't seem
> to be possible in QEMU today. WHPX could basically just call
> cpu_x86_cpuid() for each CPUID QEMU cares about and plumb the
> answer before start. This has the best performance as we avoid
> the additional exits but has an issue in that the results must
> be known ahead of time.

You already have a hook into CPU initialization at
whpx_init_vcpu(), wouldn't it work for you?

> 
> And, we could obviously use a hybrid of the two for cases we
> know. Do you have any ideas that I could try out here on how
> you would like to see this work?

(2) matches very closely how it works on KVM, and using the
contents of cpu_x86_cpuid() (using either method) seems necessary
if you want to safely support live migration between different
host CPUs.

But there's one thing I would like to understand better, before
giving any advice: how important/useful are the hypervisor
defaults for your users?  The cost of making QEMU manage every
single CPUID bit is extra complexity.  The benefit you get in
return is letting management software + QEMU ensure guest ABI is
stable and nothing breaks on live migration.  So it depends on
the use cases you have in mind.
Paolo Bonzini April 5, 2018, 3:50 p.m. UTC | #5
On 28/03/2018 22:48, Justin Terry (VM) wrote:
> 1. (As the code is doing now). At partition creation time you can
> register for specific CPUID exits and then respond to the CPUID with
> your custom answer or with the Hypervisor defaults that were forwarded
> to you. Unfortunately, QEMU has no way to know the Hypervisor default
> ahead of time but QEMU can make at least make a runtime decision about
> how to respond.
>
> 2. At partition creation time the platform allows QEMU to inject (set)
> the default responses for specific CPUID exits.

... but it still cannot access the hypervisor defaults, right?

> This can now be done by
> setting the `WHV_X64_CPUID_RESULT` in the `CpuidResultList` of
> `WHV_PARTITION_PROPERTY` to the exit values QEMU wants. So effectively
> you can know the answers ahead of time for any that you set but the
> answers are not dynamic.
> 
> The only issues/questions I have there are:
> 
> If we use [1] (like the code is now) I don't see any way to keep the
> exits in cpu_x86_cpuid() matched up with the registered exits to WHPX.

You'd have to do that on a case-by-case basis.  For example, the number
of leaves can be the minimum of the hypervisor and QEMU values, or just
the QEMU value; for "feature" leaves the results will be the AND of the
QEMU and WHPX features; for the XSAVE/XSAVEC/XSAVES (size, offset)
tuples you have to use WHPX's; for family/model/stepping/name/vendor
your mileage may vary but I suppose you can just use WHPX's; and so on.

You can take a look at target/i386/cpu.c's array feature_word_info and
add a function like

void x86_is_feature_word(uint32_t eax, uint32_t ecx, int reg)
{
    for (w = 0; w < FEATURE_WORDS; w++) {
        FeatureWordInfo *wi = &feature_word_info[w];
        if (eax == wi->cpuid_eax &&
            (ecx == wi->cpuid_ecx || wi->cpuid_needs_ecx) &&
            reg == wi->cpuid_reg) {
            return true;
        }
    }
    return false;
}

so that the code in the end is

    switch (eax) {
        /* yadda yadda... code to special case leaves whose value
         * comes from WHPX.
         */
        ...
    default:
        if (x86_is_feature_word(eax, ecx, R_EAX)) {
            rax &= vcpu->exit_ctx.CpuidAccess.DefaultResultRax;
        }
        if (x86_is_feature_word(eax, ecx, R_EBX)) {
            rax &= vcpu->exit_ctx.CpuidAccess.DefaultResultRbx;
        }
        if (x86_is_feature_word(eax, ecx, R_ECX)) {
            rax &= vcpu->exit_ctx.CpuidAccess.DefaultResultRcx;
        }
        if (x86_is_feature_word(eax, ecx, R_EDX)) {
            rax &= vcpu->exit_ctx.CpuidAccess.DefaultResultRdx;
        }
    }

> If we use [2] to inject the answers at creation time WHPX needs access
> to the CPUX86State at accel init which also doesn't seem to be possible
> in QEMU today. WHPX could basically just call cpu_x86_cpuid() for each
> CPUID QEMU cares about and plumb the answer before start. This has the
> best performance as we avoid the additional exits but has an issue in
> that the results must be known ahead of time.

The earliest where you have access to that is x86_cpu_initfn.

Paolo
Eduardo Habkost April 16, 2018, 4:33 p.m. UTC | #6
On Thu, Apr 05, 2018 at 05:50:20PM +0200, Paolo Bonzini wrote:
> On 28/03/2018 22:48, Justin Terry (VM) wrote:
[...]
> > If we use [2] to inject the answers at creation time WHPX needs access
> > to the CPUX86State at accel init which also doesn't seem to be possible
> > in QEMU today. WHPX could basically just call cpu_x86_cpuid() for each
> > CPUID QEMU cares about and plumb the answer before start. This has the
> > best performance as we avoid the additional exits but has an issue in
> > that the results must be known ahead of time.
> 
> The earliest where you have access to that is x86_cpu_initfn.

x86_cpu_initfn() is the earliest you have access to the CPU
object, but note that the final CPUID bits (based on -cpu
options, accel data, and possibly other input) are known only
when x86_cpu_realizefn() is called.
Zhijian Li (Fujitsu)" via April 16, 2018, 8:44 p.m. UTC | #7
Hey Eduardo/Paolo,

I have not forgotten about your responses. I am working out how best to do this in our platform and will send a follow up patch (this one is already merged) to fully support the -cpu flag. It looks like all the pieces are in place between the two and we just need a bit of translation between the QEMU state at partition creation/start/execution to handle the various exits either by pre-setting the value or via the actual CPUID trap at runtime. Thank you for all your insights/input up to here. It has been really helpful. More to come.

Thanks,
Justin

> -----Original Message-----
> From: Eduardo Habkost <ehabkost@redhat.com>
> Sent: Monday, April 16, 2018 9:33 AM
> To: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Justin Terry (VM) <juterry@microsoft.com>; qemu-devel@nongnu.org;
> rth@twiddle.net
> Subject: Re: [PATCH] WHPX fixes an issue with CPUID 1 not returning
> CPUID_EXT_HYPERVISOR
> 
> On Thu, Apr 05, 2018 at 05:50:20PM +0200, Paolo Bonzini wrote:
> > On 28/03/2018 22:48, Justin Terry (VM) wrote:
> [...]
> > > If we use [2] to inject the answers at creation time WHPX needs
> > > access to the CPUX86State at accel init which also doesn't seem to
> > > be possible in QEMU today. WHPX could basically just call
> > > cpu_x86_cpuid() for each CPUID QEMU cares about and plumb the
> answer
> > > before start. This has the best performance as we avoid the
> > > additional exits but has an issue in that the results must be known ahead
> of time.
> >
> > The earliest where you have access to that is x86_cpu_initfn.
> 
> x86_cpu_initfn() is the earliest you have access to the CPU object, but note
> that the final CPUID bits (based on -cpu options, accel data, and possibly
> other input) are known only when x86_cpu_realizefn() is called.
> 
> --
> Eduardo
diff mbox

Patch

diff --git a/target/i386/whpx-all.c b/target/i386/whpx-all.c
index bf33d320bf..58435178a4 100644
--- a/target/i386/whpx-all.c
+++ b/target/i386/whpx-all.c
@@ -911,12 +911,62 @@  static int whpx_vcpu_run(CPUState *cpu)
             ret = 1;
             break;
 
+        case WHvRunVpExitReasonX64Cpuid: {
+            WHV_REGISTER_VALUE reg_values[5] = {0};
+            WHV_REGISTER_NAME reg_names[5];
+            UINT32 reg_count = 5;
+            UINT64 rip, rax, rcx, rdx, rbx;
+
+            rip = vcpu->exit_ctx.VpContext.Rip +
+                  vcpu->exit_ctx.VpContext.InstructionLength;
+            switch (vcpu->exit_ctx.CpuidAccess.Rax) {
+            case 1:
+                rax = vcpu->exit_ctx.CpuidAccess.DefaultResultRax;
+                /* Advertise that we are running on a hypervisor */
+                rcx =
+                    vcpu->exit_ctx.CpuidAccess.DefaultResultRcx |
+                    CPUID_EXT_HYPERVISOR;
+
+                rdx = vcpu->exit_ctx.CpuidAccess.DefaultResultRdx;
+                rbx = vcpu->exit_ctx.CpuidAccess.DefaultResultRbx;
+                break;
+            default:
+                rax = vcpu->exit_ctx.CpuidAccess.DefaultResultRax;
+                rcx = vcpu->exit_ctx.CpuidAccess.DefaultResultRcx;
+                rdx = vcpu->exit_ctx.CpuidAccess.DefaultResultRdx;
+                rbx = vcpu->exit_ctx.CpuidAccess.DefaultResultRbx;
+            }
+
+            reg_names[0] = WHvX64RegisterRip;
+            reg_names[1] = WHvX64RegisterRax;
+            reg_names[2] = WHvX64RegisterRcx;
+            reg_names[3] = WHvX64RegisterRdx;
+            reg_names[4] = WHvX64RegisterRbx;
+
+            reg_values[0].Reg64 = rip;
+            reg_values[1].Reg64 = rax;
+            reg_values[2].Reg64 = rcx;
+            reg_values[3].Reg64 = rdx;
+            reg_values[4].Reg64 = rbx;
+
+            hr = WHvSetVirtualProcessorRegisters(whpx->partition,
+                                                 cpu->cpu_index,
+                                                 reg_names,
+                                                 reg_count,
+                                                 reg_values);
+
+            if (FAILED(hr)) {
+                error_report("WHPX: Failed to set CpuidAccess state registers,"
+                             " hr=%08lx", hr);
+            }
+            ret = 0;
+            break;
+        }
         case WHvRunVpExitReasonNone:
         case WHvRunVpExitReasonUnrecoverableException:
         case WHvRunVpExitReasonInvalidVpRegisterValue:
         case WHvRunVpExitReasonUnsupportedFeature:
         case WHvRunVpExitReasonX64MsrAccess:
-        case WHvRunVpExitReasonX64Cpuid:
         case WHvRunVpExitReasonException:
         default:
             error_report("WHPX: Unexpected VP exit code %d",
@@ -1272,6 +1322,33 @@  static int whpx_accel_init(MachineState *ms)
         goto error;
     }
 
+    memset(&prop, 0, sizeof(WHV_PARTITION_PROPERTY));
+    prop.ExtendedVmExits.X64CpuidExit = 1;
+    hr = WHvSetPartitionProperty(whpx->partition,
+                                 WHvPartitionPropertyCodeExtendedVmExits,
+                                 &prop,
+                                 sizeof(WHV_PARTITION_PROPERTY));
+
+    if (FAILED(hr)) {
+        error_report("WHPX: Failed to enable partition extended X64CpuidExit"
+                     " hr=%08lx", hr);
+        ret = -EINVAL;
+        goto error;
+    }
+
+    UINT32 cpuidExitList[] = {1};
+    hr = WHvSetPartitionProperty(whpx->partition,
+                                 WHvPartitionPropertyCodeCpuidExitList,
+                                 cpuidExitList,
+                                 RTL_NUMBER_OF(cpuidExitList) * sizeof(UINT32));
+
+    if (FAILED(hr)) {
+        error_report("WHPX: Failed to set partition CpuidExitList hr=%08lx",
+                     hr);
+        ret = -EINVAL;
+        goto error;
+    }
+
     hr = WHvSetupPartition(whpx->partition);
     if (FAILED(hr)) {
         error_report("WHPX: Failed to setup partition, hr=%08lx", hr);