diff mbox

[v2,10/30] xen/x86: Annotate VM applicability in featureset

Message ID 1454679743-18133-11-git-send-email-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

Andrew Cooper Feb. 5, 2016, 1:42 p.m. UTC
Use attributes to specify whether a feature is applicable to be exposed to:
 1) All guests
 2) HVM guests
 3) HVM HAP guests

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v2: Annotate features using a magic comment and autogeneration.
---
 xen/include/public/arch-x86/cpufeatureset.h | 187 ++++++++++++++--------------
 xen/tools/gen-cpuid.py                      |  32 ++++-
 2 files changed, 127 insertions(+), 92 deletions(-)

Comments

Jan Beulich Feb. 12, 2016, 5:05 p.m. UTC | #1
>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> Use attributes to specify whether a feature is applicable to be exposed to:
>  1) All guests
>  2) HVM guests
>  3) HVM HAP guests

No provisions for PV-only or shadow-only features?

> +#define X86_FEATURE_MTRR          ( 0*32+12) /*S  Memory Type Range Registers */

Why is this being hidden from PV guests (namely Dom0)? Right now
this is being cleared for PHV only.

> +#define X86_FEATURE_VMXE          ( 1*32+ 5) /*S  Virtual Machine Extensions */

Shouldn't this get a "nested-only" class? Also don't we currently
require HAP for nested mode to work?

>  #define X86_FEATURE_OSXSAVE       ( 1*32+27) /*   OSXSAVE */

Leaving this untouched warrants at least a comment in the commit
message I would think.

> +#define X86_FEATURE_RDTSCP        ( 2*32+27) /*S  RDTSCP */

Hmm, I'm confused - on one hand we currently clear this bit for
PV guests, but otoh do_invalid_op() emulates it.

> +#define X86_FEATURE_LM            ( 2*32+29) /*A  Long Mode (x86-64) */
> [...]
> -#define X86_FEATURE_LAHF_LM       ( 3*32+ 0) /*   LAHF/SAHF in long mode */
> +#define X86_FEATURE_LAHF_LM       ( 3*32+ 0) /*A  LAHF/SAHF in long mode */

How do you intend to handle exposing these to 64-bit PV guests,
but not to 32-bit ones?

>  #define X86_FEATURE_EXTAPIC       ( 3*32+ 3) /*   Extended APIC space */

This currently is left untouched for HVM guests, and gets cleared
only when !cpu_has_apic (i.e. effectively never) for PV ones.

>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension (MONITORX/MWAITX) */

Why not exposed to HVM (also for _MWAIT as I now notice)?

Jan
Andrew Cooper Feb. 12, 2016, 5:42 p.m. UTC | #2
On 12/02/16 17:05, Jan Beulich wrote:
>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>> Use attributes to specify whether a feature is applicable to be exposed to:
>>  1) All guests
>>  2) HVM guests
>>  3) HVM HAP guests
> No provisions for PV-only or shadow-only features?

No.  There are not currently any, and I would prefer to avoid
introducing any.

Of course, if a plausible usecase comes along, we can certainly make
changes and accommodate.

>
>> +#define X86_FEATURE_MTRR          ( 0*32+12) /*S  Memory Type Range Registers */
> Why is this being hidden from PV guests (namely Dom0)? Right now
> this is being cleared for PHV only.

It is unilaterally hidden from PV domains other than dom0.  We have no
handling for them in emulate_privileged_op(), which means dom0 can't use
them anyway.

>
>> +#define X86_FEATURE_VMXE          ( 1*32+ 5) /*S  Virtual Machine Extensions */
> Shouldn't this get a "nested-only" class?

I am not sure that would be appropriate.  On the Intel side, this bit is
the only option in cpuid; the VT-x features need MSR-levelling, which is
moderately far away on my TODO list.

Having said that, the AMD side has all nested features in cpuid.  I
guess this is more a problem for whomever steps up and makes nested virt
a properly supported option, but this is way off at this point.

> Also don't we currently require HAP for nested mode to work?

Experimentally, the p2m lock contention caused by Shadow and Nested virt
caused Xen to fall over very frequently with watchdog timeouts.

Having said that, nothing formal is written down one way or another, and
it is possible in limited scenarios to make nested virt work without
hap.  FWIW, I would be happy with a blanket "no nested virt without HAP"
statement for Xen.

>
>>  #define X86_FEATURE_OSXSAVE       ( 1*32+27) /*   OSXSAVE */
> Leaving this untouched warrants at least a comment in the commit
> message I would think.

The handling for the magic bits

>
>> +#define X86_FEATURE_RDTSCP        ( 2*32+27) /*S  RDTSCP */
> Hmm, I'm confused - on one hand we currently clear this bit for
> PV guests, but otoh do_invalid_op() emulates it.

Urgh yes - I had forgotten about this gem.  I lacked sufficient tuits to
untangle the swamp which is the vtsc subsystem.

Currently, the dynamic vtsc setting controls whether the RDTSCP feature
flag is visible.

>
>> +#define X86_FEATURE_LM            ( 2*32+29) /*A  Long Mode (x86-64) */
>> [...]
>> -#define X86_FEATURE_LAHF_LM       ( 3*32+ 0) /*   LAHF/SAHF in long mode */
>> +#define X86_FEATURE_LAHF_LM       ( 3*32+ 0) /*A  LAHF/SAHF in long mode */
> How do you intend to handle exposing these to 64-bit PV guests,
> but not to 32-bit ones?

At the end of this series, the deep dependency logic used by the
toolstack, and some remaining dynamic checks in the intercept hooks.

By the end of my plans for full cpuid handling, Xen will create a policy
on each domaincreate, and keep it consistent with calls such as set_width.

>
>>  #define X86_FEATURE_EXTAPIC       ( 3*32+ 3) /*   Extended APIC space */
> This currently is left untouched for HVM guests, and gets cleared
> only when !cpu_has_apic (i.e. effectively never) for PV ones.

There is no HVM support for handling a guest trying to use EXTAPIC, and
PV guests don't get to play with the hardware APIC anyway.  As far as I
can tell, it has always been wrong to ever expose this feature.

>
>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension (MONITORX/MWAITX) */
> Why not exposed to HVM (also for _MWAIT as I now notice)?

Because that is a good chunk of extra work to support.  We would need to
use 4K monitor widths, and extra p2m handling.

~Andrew
Jan Beulich Feb. 15, 2016, 9:20 a.m. UTC | #3
>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
> On 12/02/16 17:05, Jan Beulich wrote:
>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>> +#define X86_FEATURE_VMXE          ( 1*32+ 5) /*S  Virtual Machine Extensions */
>> Shouldn't this get a "nested-only" class?
> 
> I am not sure that would be appropriate.  On the Intel side, this bit is
> the only option in cpuid; the VT-x features need MSR-levelling, which is
> moderately far away on my TODO list.
> 
> Having said that, the AMD side has all nested features in cpuid.  I
> guess this is more a problem for whomever steps up and makes nested virt
> a properly supported option, but this is way off at this point.

Okay then. My hope was that introducing the extra category
wouldn't be too much extra effort inside this series.

>> Also don't we currently require HAP for nested mode to work?
> 
> Experimentally, the p2m lock contention caused by Shadow and Nested virt
> caused Xen to fall over very frequently with watchdog timeouts.
> 
> Having said that, nothing formal is written down one way or another, and
> it is possible in limited scenarios to make nested virt work without
> hap.  FWIW, I would be happy with a blanket "no nested virt without HAP"
> statement for Xen.

Same here.

>>>  #define X86_FEATURE_OSXSAVE       ( 1*32+27) /*   OSXSAVE */
>> Leaving this untouched warrants at least a comment in the commit
>> message I would think.
> 
> The handling for the magic bits

Unfinished sentence?

>>> +#define X86_FEATURE_RDTSCP        ( 2*32+27) /*S  RDTSCP */
>> Hmm, I'm confused - on one hand we currently clear this bit for
>> PV guests, but otoh do_invalid_op() emulates it.
> 
> Urgh yes - I had forgotten about this gem.  I lacked sufficient tuits to
> untangle the swamp which is the vtsc subsystem.
> 
> Currently, the dynamic vtsc setting controls whether the RDTSCP feature
> flag is visible.

I don't see where that would be happening - all I see is a single

        __clear_bit(X86_FEATURE_RDTSCP % 32, &d);

in pv_cpuid().

>>> +#define X86_FEATURE_LM            ( 2*32+29) /*A  Long Mode (x86-64) */
>>> [...]
>>> -#define X86_FEATURE_LAHF_LM       ( 3*32+ 0) /*   LAHF/SAHF in long mode */
>>> +#define X86_FEATURE_LAHF_LM       ( 3*32+ 0) /*A  LAHF/SAHF in long mode */
>> How do you intend to handle exposing these to 64-bit PV guests,
>> but not to 32-bit ones?
> 
> At the end of this series, the deep dependency logic used by the
> toolstack, and some remaining dynamic checks in the intercept hooks.

Okay, I'll try to remember to look there once I get to that point in
the series.

>>>  #define X86_FEATURE_EXTAPIC       ( 3*32+ 3) /*   Extended APIC space */
>> This currently is left untouched for HVM guests, and gets cleared
>> only when !cpu_has_apic (i.e. effectively never) for PV ones.
> 
> There is no HVM support for handling a guest trying to use EXTAPIC, and
> PV guests don't get to play with the hardware APIC anyway.  As far as I
> can tell, it has always been wrong to ever expose this feature.

Well, that's a fair statement, but should be made in the commit
message (after all it's a behavioral change).

>>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension 
> (MONITORX/MWAITX) */
>> Why not exposed to HVM (also for _MWAIT as I now notice)?
> 
> Because that is a good chunk of extra work to support.  We would need to
> use 4K monitor widths, and extra p2m handling.

I don't understand: The base (_MWAIT) feature being exposed to
guests today, and kernels making use of the feature when available
suggests to me that things work. Are you saying you know
otherwise? (And if there really is a reason to mask the feature all of
the sudden, this should again be justified in the commit message.)

Jan
Andrew Cooper Feb. 15, 2016, 2:38 p.m. UTC | #4
On 15/02/16 09:20, Jan Beulich wrote:
>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>> +#define X86_FEATURE_VMXE          ( 1*32+ 5) /*S  Virtual Machine Extensions */
>>> Shouldn't this get a "nested-only" class?
>> I am not sure that would be appropriate.  On the Intel side, this bit is
>> the only option in cpuid; the VT-x features need MSR-levelling, which is
>> moderately far away on my TODO list.
>>
>> Having said that, the AMD side has all nested features in cpuid.  I
>> guess this is more a problem for whomever steps up and makes nested virt
>> a properly supported option, but this is way off at this point.
> Okay then. My hope was that introducing the extra category
> wouldn't be too much extra effort inside this series.
>>> Also don't we currently require HAP for nested mode to work?
>> Experimentally, the p2m lock contention caused by Shadow and Nested virt
>> caused Xen to fall over very frequently with watchdog timeouts.
>>
>> Having said that, nothing formal is written down one way or another, and
>> it is possible in limited scenarios to make nested virt work without
>> hap.  FWIW, I would be happy with a blanket "no nested virt without HAP"
>> statement for Xen.
> Same here.

Where possible, I am deliberately trying not to make policy changes
hidden inside a large functional series.

As far as nested virt specifically goes, I am still not sure what is the
best approach, so would prefer not to change things at this time.

>
>>>>  #define X86_FEATURE_OSXSAVE       ( 1*32+27) /*   OSXSAVE */
>>> Leaving this untouched warrants at least a comment in the commit
>>> message I would think.
>> The handling for the magic bits
> Unfinished sentence?

Oops yes, although it was going to be statement rather than a query.

>
>>>> +#define X86_FEATURE_RDTSCP        ( 2*32+27) /*S  RDTSCP */
>>> Hmm, I'm confused - on one hand we currently clear this bit for
>>> PV guests, but otoh do_invalid_op() emulates it.
>> Urgh yes - I had forgotten about this gem.  I lacked sufficient tuits to
>> untangle the swamp which is the vtsc subsystem.
>>
>> Currently, the dynamic vtsc setting controls whether the RDTSCP feature
>> flag is visible.
> I don't see where that would be happening - all I see is a single
>
>         __clear_bit(X86_FEATURE_RDTSCP % 32, &d);
>
> in pv_cpuid().

The HVM side is more dynamic.  Either way, the subsystem is a mess.

>
>>>> +#define X86_FEATURE_LM            ( 2*32+29) /*A  Long Mode (x86-64) */
>>>> [...]
>>>> -#define X86_FEATURE_LAHF_LM       ( 3*32+ 0) /*   LAHF/SAHF in long mode */
>>>> +#define X86_FEATURE_LAHF_LM       ( 3*32+ 0) /*A  LAHF/SAHF in long mode */
>>> How do you intend to handle exposing these to 64-bit PV guests,
>>> but not to 32-bit ones?
>> At the end of this series, the deep dependency logic used by the
>> toolstack, and some remaining dynamic checks in the intercept hooks.
> Okay, I'll try to remember to look there once I get to that point in
> the series.
>
>>>>  #define X86_FEATURE_EXTAPIC       ( 3*32+ 3) /*   Extended APIC space */
>>> This currently is left untouched for HVM guests, and gets cleared
>>> only when !cpu_has_apic (i.e. effectively never) for PV ones.
>> There is no HVM support for handling a guest trying to use EXTAPIC, and
>> PV guests don't get to play with the hardware APIC anyway.  As far as I
>> can tell, it has always been wrong to ever expose this feature.
> Well, that's a fair statement, but should be made in the commit
> message (after all it's a behavioral change).

I will include it in the commit message in the future.

>
>>>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension 
>> (MONITORX/MWAITX) */
>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>> Because that is a good chunk of extra work to support.  We would need to
>> use 4K monitor widths, and extra p2m handling.
> I don't understand: The base (_MWAIT) feature being exposed to
> guests today, and kernels making use of the feature when available
> suggests to me that things work. Are you saying you know
> otherwise? (And if there really is a reason to mask the feature all of
> the sudden, this should again be justified in the commit message.)

PV guests had it clobbered by Xen in traps.c

HVM guests have:

vmx.c:
    case EXIT_REASON_MWAIT_INSTRUCTION:
    case EXIT_REASON_MONITOR_INSTRUCTION:
    case EXIT_REASON_GETSEC:
       
/*                                                                                                                                                                              

         * We should never exit on GETSEC because CR4.SMXE is always 0
when                                                                                                             

         * running in guest context, and the CPU checks that before
getting                                                                                                             

         * as far as
vmexit.                                                                                                                                                            

         */
        WARN_ON(exit_reason == EXIT_REASON_GETSEC);
    hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
        break;

and svm.c:
    case VMEXIT_MONITOR:
    case VMEXIT_MWAIT:
        hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
        break;

I don't see how a guest could actually use this feature.

~Andrew
Jan Beulich Feb. 15, 2016, 2:50 p.m. UTC | #5
>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
> On 15/02/16 09:20, Jan Beulich wrote:
>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension 
>>> (MONITORX/MWAITX) */
>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>>> Because that is a good chunk of extra work to support.  We would need to
>>> use 4K monitor widths, and extra p2m handling.
>> I don't understand: The base (_MWAIT) feature being exposed to
>> guests today, and kernels making use of the feature when available
>> suggests to me that things work. Are you saying you know
>> otherwise? (And if there really is a reason to mask the feature all of
>> the sudden, this should again be justified in the commit message.)
> 
> PV guests had it clobbered by Xen in traps.c
> 
> HVM guests have:
> 
> vmx.c:
>     case EXIT_REASON_MWAIT_INSTRUCTION:
>     case EXIT_REASON_MONITOR_INSTRUCTION:
>[...]
>     hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>         break;
> 
> and svm.c:
>     case VMEXIT_MONITOR:
>     case VMEXIT_MWAIT:
>         hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>         break;
> 
> I don't see how a guest could actually use this feature.

Do you see the respective intercepts getting enabled anywhere?
(I don't outside of nested code, which I didn't check in detail.)

Jan
Andrew Cooper Feb. 15, 2016, 2:53 p.m. UTC | #6
On 15/02/16 14:50, Jan Beulich wrote:
>>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
>> On 15/02/16 09:20, Jan Beulich wrote:
>>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>>>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension 
>>>> (MONITORX/MWAITX) */
>>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>>>> Because that is a good chunk of extra work to support.  We would need to
>>>> use 4K monitor widths, and extra p2m handling.
>>> I don't understand: The base (_MWAIT) feature being exposed to
>>> guests today, and kernels making use of the feature when available
>>> suggests to me that things work. Are you saying you know
>>> otherwise? (And if there really is a reason to mask the feature all of
>>> the sudden, this should again be justified in the commit message.)
>> PV guests had it clobbered by Xen in traps.c
>>
>> HVM guests have:
>>
>> vmx.c:
>>     case EXIT_REASON_MWAIT_INSTRUCTION:
>>     case EXIT_REASON_MONITOR_INSTRUCTION:
>> [...]
>>     hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>         break;
>>
>> and svm.c:
>>     case VMEXIT_MONITOR:
>>     case VMEXIT_MWAIT:
>>         hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>         break;
>>
>> I don't see how a guest could actually use this feature.
> Do you see the respective intercepts getting enabled anywhere?
> (I don't outside of nested code, which I didn't check in detail.)

Yes - the intercepts are always enabled to prevent the guest actually
putting the processor to sleep.

~Andrew
Jan Beulich Feb. 15, 2016, 3:02 p.m. UTC | #7
>>> On 15.02.16 at 15:53, <andrew.cooper3@citrix.com> wrote:
> On 15/02/16 14:50, Jan Beulich wrote:
>>>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
>>> On 15/02/16 09:20, Jan Beulich wrote:
>>>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>>>>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension 
>>>>> (MONITORX/MWAITX) */
>>>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>>>>> Because that is a good chunk of extra work to support.  We would need to
>>>>> use 4K monitor widths, and extra p2m handling.
>>>> I don't understand: The base (_MWAIT) feature being exposed to
>>>> guests today, and kernels making use of the feature when available
>>>> suggests to me that things work. Are you saying you know
>>>> otherwise? (And if there really is a reason to mask the feature all of
>>>> the sudden, this should again be justified in the commit message.)
>>> PV guests had it clobbered by Xen in traps.c
>>>
>>> HVM guests have:
>>>
>>> vmx.c:
>>>     case EXIT_REASON_MWAIT_INSTRUCTION:
>>>     case EXIT_REASON_MONITOR_INSTRUCTION:
>>> [...]
>>>     hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>         break;
>>>
>>> and svm.c:
>>>     case VMEXIT_MONITOR:
>>>     case VMEXIT_MWAIT:
>>>         hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>         break;
>>>
>>> I don't see how a guest could actually use this feature.
>> Do you see the respective intercepts getting enabled anywhere?
>> (I don't outside of nested code, which I didn't check in detail.)
> 
> Yes - the intercepts are always enabled to prevent the guest actually
> putting the processor to sleep.

Hmm, you're right, somehow I've managed to ignore the relevant
lines grep reported. Yet - how do things work then, without the
MWAIT feature flag currently getting cleared?

Jan
Andrew Cooper Feb. 15, 2016, 3:41 p.m. UTC | #8
On 15/02/16 15:02, Jan Beulich wrote:
>>>> On 15.02.16 at 15:53, <andrew.cooper3@citrix.com> wrote:
>> On 15/02/16 14:50, Jan Beulich wrote:
>>>>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
>>>> On 15/02/16 09:20, Jan Beulich wrote:
>>>>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>>>>>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension 
>>>>>> (MONITORX/MWAITX) */
>>>>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>>>>>> Because that is a good chunk of extra work to support.  We would need to
>>>>>> use 4K monitor widths, and extra p2m handling.
>>>>> I don't understand: The base (_MWAIT) feature being exposed to
>>>>> guests today, and kernels making use of the feature when available
>>>>> suggests to me that things work. Are you saying you know
>>>>> otherwise? (And if there really is a reason to mask the feature all of
>>>>> the sudden, this should again be justified in the commit message.)
>>>> PV guests had it clobbered by Xen in traps.c
>>>>
>>>> HVM guests have:
>>>>
>>>> vmx.c:
>>>>     case EXIT_REASON_MWAIT_INSTRUCTION:
>>>>     case EXIT_REASON_MONITOR_INSTRUCTION:
>>>> [...]
>>>>     hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>         break;
>>>>
>>>> and svm.c:
>>>>     case VMEXIT_MONITOR:
>>>>     case VMEXIT_MWAIT:
>>>>         hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>         break;
>>>>
>>>> I don't see how a guest could actually use this feature.
>>> Do you see the respective intercepts getting enabled anywhere?
>>> (I don't outside of nested code, which I didn't check in detail.)
>> Yes - the intercepts are always enabled to prevent the guest actually
>> putting the processor to sleep.
> Hmm, you're right, somehow I've managed to ignore the relevant
> lines grep reported. Yet - how do things work then, without the
> MWAIT feature flag currently getting cleared?

I have never observed it being used.  Do you have some local patches in
the SLES hypervisor?

There is some gross layer violation in xen/enlighten.c to pretend that
MWAIT is present to trick the ACPI code into evaluating _CST() methods
to report back to Xen.  (This is yet another PV-ism which will cause a
headache for a DMLite dom0)

~Andrew
Konrad Rzeszutek Wilk Feb. 17, 2016, 7:02 p.m. UTC | #9
On Mon, Feb 15, 2016 at 03:41:41PM +0000, Andrew Cooper wrote:
> On 15/02/16 15:02, Jan Beulich wrote:
> >>>> On 15.02.16 at 15:53, <andrew.cooper3@citrix.com> wrote:
> >> On 15/02/16 14:50, Jan Beulich wrote:
> >>>>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
> >>>> On 15/02/16 09:20, Jan Beulich wrote:
> >>>>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
> >>>>>> On 12/02/16 17:05, Jan Beulich wrote:
> >>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
> >>>>>>>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension 
> >>>>>> (MONITORX/MWAITX) */
> >>>>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
> >>>>>> Because that is a good chunk of extra work to support.  We would need to
> >>>>>> use 4K monitor widths, and extra p2m handling.
> >>>>> I don't understand: The base (_MWAIT) feature being exposed to
> >>>>> guests today, and kernels making use of the feature when available
> >>>>> suggests to me that things work. Are you saying you know
> >>>>> otherwise? (And if there really is a reason to mask the feature all of
> >>>>> the sudden, this should again be justified in the commit message.)
> >>>> PV guests had it clobbered by Xen in traps.c
> >>>>
> >>>> HVM guests have:
> >>>>
> >>>> vmx.c:
> >>>>     case EXIT_REASON_MWAIT_INSTRUCTION:
> >>>>     case EXIT_REASON_MONITOR_INSTRUCTION:
> >>>> [...]
> >>>>     hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
> >>>>         break;
> >>>>
> >>>> and svm.c:
> >>>>     case VMEXIT_MONITOR:
> >>>>     case VMEXIT_MWAIT:
> >>>>         hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
> >>>>         break;
> >>>>
> >>>> I don't see how a guest could actually use this feature.
> >>> Do you see the respective intercepts getting enabled anywhere?
> >>> (I don't outside of nested code, which I didn't check in detail.)
> >> Yes - the intercepts are always enabled to prevent the guest actually
> >> putting the processor to sleep.
> > Hmm, you're right, somehow I've managed to ignore the relevant
> > lines grep reported. Yet - how do things work then, without the
> > MWAIT feature flag currently getting cleared?
> 
> I have never observed it being used.  Do you have some local patches in
> the SLES hypervisor?
> 
> There is some gross layer violation in xen/enlighten.c to pretend that
> MWAIT is present to trick the ACPI code into evaluating _CST() methods
> to report back to Xen.  (This is yet another PV-ism which will cause a
> headache for a DMLite dom0)

Yes indeed. CC-ing Roger, and Boris.

> 
> ~Andrew
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
Boris Ostrovsky Feb. 17, 2016, 7:58 p.m. UTC | #10
On 02/17/2016 02:02 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Feb 15, 2016 at 03:41:41PM +0000, Andrew Cooper wrote:
>> On 15/02/16 15:02, Jan Beulich wrote:
>>>>>> On 15.02.16 at 15:53, <andrew.cooper3@citrix.com> wrote:
>>>> On 15/02/16 14:50, Jan Beulich wrote:
>>>>>>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
>>>>>> On 15/02/16 09:20, Jan Beulich wrote:
>>>>>>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>>   #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension
>>>>>>>> (MONITORX/MWAITX) */
>>>>>>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>>>>>>>> Because that is a good chunk of extra work to support.  We would need to
>>>>>>>> use 4K monitor widths, and extra p2m handling.
>>>>>>> I don't understand: The base (_MWAIT) feature being exposed to
>>>>>>> guests today, and kernels making use of the feature when available
>>>>>>> suggests to me that things work. Are you saying you know
>>>>>>> otherwise? (And if there really is a reason to mask the feature all of
>>>>>>> the sudden, this should again be justified in the commit message.)
>>>>>> PV guests had it clobbered by Xen in traps.c
>>>>>>
>>>>>> HVM guests have:
>>>>>>
>>>>>> vmx.c:
>>>>>>      case EXIT_REASON_MWAIT_INSTRUCTION:
>>>>>>      case EXIT_REASON_MONITOR_INSTRUCTION:
>>>>>> [...]
>>>>>>      hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>          break;
>>>>>>
>>>>>> and svm.c:
>>>>>>      case VMEXIT_MONITOR:
>>>>>>      case VMEXIT_MWAIT:
>>>>>>          hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>          break;
>>>>>>
>>>>>> I don't see how a guest could actually use this feature.
>>>>> Do you see the respective intercepts getting enabled anywhere?
>>>>> (I don't outside of nested code, which I didn't check in detail.)
>>>> Yes - the intercepts are always enabled to prevent the guest actually
>>>> putting the processor to sleep.
>>> Hmm, you're right, somehow I've managed to ignore the relevant
>>> lines grep reported. Yet - how do things work then, without the
>>> MWAIT feature flag currently getting cleared?


We whitelist CPUID0x00000001.ecx features in 
libxc/xc_cpuid_x86.c:xc_cpuid_hvm_policy() so MWAIT is never set.

-boris


>> I have never observed it being used.  Do you have some local patches in
>> the SLES hypervisor?
>>
>> There is some gross layer violation in xen/enlighten.c to pretend that
>> MWAIT is present to trick the ACPI code into evaluating _CST() methods
>> to report back to Xen.  (This is yet another PV-ism which will cause a
>> headache for a DMLite dom0)
> Yes indeed. CC-ing Roger, and Boris.
>
>> ~Andrew
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
Roger Pau Monné Feb. 18, 2016, 3:02 p.m. UTC | #11
El 17/2/16 a les 20:02, Konrad Rzeszutek Wilk ha escrit:
> On Mon, Feb 15, 2016 at 03:41:41PM +0000, Andrew Cooper wrote:
>> On 15/02/16 15:02, Jan Beulich wrote:
>>>>>> On 15.02.16 at 15:53, <andrew.cooper3@citrix.com> wrote:
>>>> On 15/02/16 14:50, Jan Beulich wrote:
>>>>>>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
>>>>>> On 15/02/16 09:20, Jan Beulich wrote:
>>>>>>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension 
>>>>>>>> (MONITORX/MWAITX) */
>>>>>>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>>>>>>>> Because that is a good chunk of extra work to support.  We would need to
>>>>>>>> use 4K monitor widths, and extra p2m handling.
>>>>>>> I don't understand: The base (_MWAIT) feature being exposed to
>>>>>>> guests today, and kernels making use of the feature when available
>>>>>>> suggests to me that things work. Are you saying you know
>>>>>>> otherwise? (And if there really is a reason to mask the feature all of
>>>>>>> the sudden, this should again be justified in the commit message.)
>>>>>> PV guests had it clobbered by Xen in traps.c
>>>>>>
>>>>>> HVM guests have:
>>>>>>
>>>>>> vmx.c:
>>>>>>     case EXIT_REASON_MWAIT_INSTRUCTION:
>>>>>>     case EXIT_REASON_MONITOR_INSTRUCTION:
>>>>>> [...]
>>>>>>     hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>         break;
>>>>>>
>>>>>> and svm.c:
>>>>>>     case VMEXIT_MONITOR:
>>>>>>     case VMEXIT_MWAIT:
>>>>>>         hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>         break;
>>>>>>
>>>>>> I don't see how a guest could actually use this feature.
>>>>> Do you see the respective intercepts getting enabled anywhere?
>>>>> (I don't outside of nested code, which I didn't check in detail.)
>>>> Yes - the intercepts are always enabled to prevent the guest actually
>>>> putting the processor to sleep.
>>> Hmm, you're right, somehow I've managed to ignore the relevant
>>> lines grep reported. Yet - how do things work then, without the
>>> MWAIT feature flag currently getting cleared?
>>
>> I have never observed it being used.  Do you have some local patches in
>> the SLES hypervisor?
>>
>> There is some gross layer violation in xen/enlighten.c to pretend that
>> MWAIT is present to trick the ACPI code into evaluating _CST() methods
>> to report back to Xen.  (This is yet another PV-ism which will cause a
>> headache for a DMLite dom0)
> 
> Yes indeed. CC-ing Roger, and Boris.

Yes, all this is indeed not very nice, and we would ideally like to get
rid of it on PVHv2.

Could we use the acpica tools (acpidump/acpixtract/acpiexec/...) in
order to fetch this information from user-space and send it to Xen using
privcmd?

AFAIK those tools work on most OSes (or at least the ones we care about
as Dom0).

Roger.
Andrew Cooper Feb. 18, 2016, 3:12 p.m. UTC | #12
On 18/02/16 15:02, Roger Pau Monné wrote:
> El 17/2/16 a les 20:02, Konrad Rzeszutek Wilk ha escrit:
>> On Mon, Feb 15, 2016 at 03:41:41PM +0000, Andrew Cooper wrote:
>>> On 15/02/16 15:02, Jan Beulich wrote:
>>>>>>> On 15.02.16 at 15:53, <andrew.cooper3@citrix.com> wrote:
>>>>> On 15/02/16 14:50, Jan Beulich wrote:
>>>>>>>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
>>>>>>> On 15/02/16 09:20, Jan Beulich wrote:
>>>>>>>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension 
>>>>>>>>> (MONITORX/MWAITX) */
>>>>>>>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>>>>>>>>> Because that is a good chunk of extra work to support.  We would need to
>>>>>>>>> use 4K monitor widths, and extra p2m handling.
>>>>>>>> I don't understand: The base (_MWAIT) feature being exposed to
>>>>>>>> guests today, and kernels making use of the feature when available
>>>>>>>> suggests to me that things work. Are you saying you know
>>>>>>>> otherwise? (And if there really is a reason to mask the feature all of
>>>>>>>> the sudden, this should again be justified in the commit message.)
>>>>>>> PV guests had it clobbered by Xen in traps.c
>>>>>>>
>>>>>>> HVM guests have:
>>>>>>>
>>>>>>> vmx.c:
>>>>>>>     case EXIT_REASON_MWAIT_INSTRUCTION:
>>>>>>>     case EXIT_REASON_MONITOR_INSTRUCTION:
>>>>>>> [...]
>>>>>>>     hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>>         break;
>>>>>>>
>>>>>>> and svm.c:
>>>>>>>     case VMEXIT_MONITOR:
>>>>>>>     case VMEXIT_MWAIT:
>>>>>>>         hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>>         break;
>>>>>>>
>>>>>>> I don't see how a guest could actually use this feature.
>>>>>> Do you see the respective intercepts getting enabled anywhere?
>>>>>> (I don't outside of nested code, which I didn't check in detail.)
>>>>> Yes - the intercepts are always enabled to prevent the guest actually
>>>>> putting the processor to sleep.
>>>> Hmm, you're right, somehow I've managed to ignore the relevant
>>>> lines grep reported. Yet - how do things work then, without the
>>>> MWAIT feature flag currently getting cleared?
>>> I have never observed it being used.  Do you have some local patches in
>>> the SLES hypervisor?
>>>
>>> There is some gross layer violation in xen/enlighten.c to pretend that
>>> MWAIT is present to trick the ACPI code into evaluating _CST() methods
>>> to report back to Xen.  (This is yet another PV-ism which will cause a
>>> headache for a DMLite dom0)
>> Yes indeed. CC-ing Roger, and Boris.
> Yes, all this is indeed not very nice, and we would ideally like to get
> rid of it on PVHv2.
>
> Could we use the acpica tools (acpidump/acpixtract/acpiexec/...) in
> order to fetch this information from user-space and send it to Xen using
> privcmd?
>
> AFAIK those tools work on most OSes (or at least the ones we care about
> as Dom0).

In general, we can't rely on userspace evaluation of AML.

For trivial AML which evaluates to a constant, it could be interpreted
by userspace, but anything accessing system resources will need
evaluating by the kernel.

~Andrew
David Vrabel Feb. 18, 2016, 3:16 p.m. UTC | #13
On 18/02/16 15:02, Roger Pau Monné wrote:
> El 17/2/16 a les 20:02, Konrad Rzeszutek Wilk ha escrit:
>> On Mon, Feb 15, 2016 at 03:41:41PM +0000, Andrew Cooper wrote:
>>> On 15/02/16 15:02, Jan Beulich wrote:
>>>>>>> On 15.02.16 at 15:53, <andrew.cooper3@citrix.com> wrote:
>>>>> On 15/02/16 14:50, Jan Beulich wrote:
>>>>>>>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
>>>>>>> On 15/02/16 09:20, Jan Beulich wrote:
>>>>>>>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension 
>>>>>>>>> (MONITORX/MWAITX) */
>>>>>>>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>>>>>>>>> Because that is a good chunk of extra work to support.  We would need to
>>>>>>>>> use 4K monitor widths, and extra p2m handling.
>>>>>>>> I don't understand: The base (_MWAIT) feature being exposed to
>>>>>>>> guests today, and kernels making use of the feature when available
>>>>>>>> suggests to me that things work. Are you saying you know
>>>>>>>> otherwise? (And if there really is a reason to mask the feature all of
>>>>>>>> the sudden, this should again be justified in the commit message.)
>>>>>>> PV guests had it clobbered by Xen in traps.c
>>>>>>>
>>>>>>> HVM guests have:
>>>>>>>
>>>>>>> vmx.c:
>>>>>>>     case EXIT_REASON_MWAIT_INSTRUCTION:
>>>>>>>     case EXIT_REASON_MONITOR_INSTRUCTION:
>>>>>>> [...]
>>>>>>>     hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>>         break;
>>>>>>>
>>>>>>> and svm.c:
>>>>>>>     case VMEXIT_MONITOR:
>>>>>>>     case VMEXIT_MWAIT:
>>>>>>>         hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>>         break;
>>>>>>>
>>>>>>> I don't see how a guest could actually use this feature.
>>>>>> Do you see the respective intercepts getting enabled anywhere?
>>>>>> (I don't outside of nested code, which I didn't check in detail.)
>>>>> Yes - the intercepts are always enabled to prevent the guest actually
>>>>> putting the processor to sleep.
>>>> Hmm, you're right, somehow I've managed to ignore the relevant
>>>> lines grep reported. Yet - how do things work then, without the
>>>> MWAIT feature flag currently getting cleared?
>>>
>>> I have never observed it being used.  Do you have some local patches in
>>> the SLES hypervisor?
>>>
>>> There is some gross layer violation in xen/enlighten.c to pretend that
>>> MWAIT is present to trick the ACPI code into evaluating _CST() methods
>>> to report back to Xen.  (This is yet another PV-ism which will cause a
>>> headache for a DMLite dom0)
>>
>> Yes indeed. CC-ing Roger, and Boris.
> 
> Yes, all this is indeed not very nice, and we would ideally like to get
> rid of it on PVHv2.
> 
> Could we use the acpica tools (acpidump/acpixtract/acpiexec/...) in
> order to fetch this information from user-space and send it to Xen using
> privcmd?
> 
> AFAIK those tools work on most OSes (or at least the ones we care about
> as Dom0).

If that works, that would be great!

David
Boris Ostrovsky Feb. 18, 2016, 4:24 p.m. UTC | #14
On 02/18/2016 10:12 AM, Andrew Cooper wrote:
> On 18/02/16 15:02, Roger Pau Monné wrote:
>> El 17/2/16 a les 20:02, Konrad Rzeszutek Wilk ha escrit:
>>> On Mon, Feb 15, 2016 at 03:41:41PM +0000, Andrew Cooper wrote:
>>>> On 15/02/16 15:02, Jan Beulich wrote:
>>>>>>>> On 15.02.16 at 15:53, <andrew.cooper3@citrix.com> wrote:
>>>>>> On 15/02/16 14:50, Jan Beulich wrote:
>>>>>>>>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
>>>>>>>> On 15/02/16 09:20, Jan Beulich wrote:
>>>>>>>>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>>>>   #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension
>>>>>>>>>> (MONITORX/MWAITX) */
>>>>>>>>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>>>>>>>>>> Because that is a good chunk of extra work to support.  We would need to
>>>>>>>>>> use 4K monitor widths, and extra p2m handling.
>>>>>>>>> I don't understand: The base (_MWAIT) feature being exposed to
>>>>>>>>> guests today, and kernels making use of the feature when available
>>>>>>>>> suggests to me that things work. Are you saying you know
>>>>>>>>> otherwise? (And if there really is a reason to mask the feature all of
>>>>>>>>> the sudden, this should again be justified in the commit message.)
>>>>>>>> PV guests had it clobbered by Xen in traps.c
>>>>>>>>
>>>>>>>> HVM guests have:
>>>>>>>>
>>>>>>>> vmx.c:
>>>>>>>>      case EXIT_REASON_MWAIT_INSTRUCTION:
>>>>>>>>      case EXIT_REASON_MONITOR_INSTRUCTION:
>>>>>>>> [...]
>>>>>>>>      hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>>>          break;
>>>>>>>>
>>>>>>>> and svm.c:
>>>>>>>>      case VMEXIT_MONITOR:
>>>>>>>>      case VMEXIT_MWAIT:
>>>>>>>>          hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>>>          break;
>>>>>>>>
>>>>>>>> I don't see how a guest could actually use this feature.
>>>>>>> Do you see the respective intercepts getting enabled anywhere?
>>>>>>> (I don't outside of nested code, which I didn't check in detail.)
>>>>>> Yes - the intercepts are always enabled to prevent the guest actually
>>>>>> putting the processor to sleep.
>>>>> Hmm, you're right, somehow I've managed to ignore the relevant
>>>>> lines grep reported. Yet - how do things work then, without the
>>>>> MWAIT feature flag currently getting cleared?
>>>> I have never observed it being used.  Do you have some local patches in
>>>> the SLES hypervisor?
>>>>
>>>> There is some gross layer violation in xen/enlighten.c to pretend that
>>>> MWAIT is present to trick the ACPI code into evaluating _CST() methods
>>>> to report back to Xen.  (This is yet another PV-ism which will cause a
>>>> headache for a DMLite dom0)
>>> Yes indeed. CC-ing Roger, and Boris.
>> Yes, all this is indeed not very nice, and we would ideally like to get
>> rid of it on PVHv2.

We will have to come up with something else: AIUI the whole point of 
xen_check_mwait() is to come up with proper ECX and EDX values for the 
MWAIT CPUID leaf. Those value are expected to be reported from 
xen_cpuid() pv_op so that acpi_processor_ffh_state_probe_cpu() can set C 
states structures properly.

The problem is that PVH executes CPUID instruction natively. (And so 
this must have been broken on PVHv1 as well).

-boris


>>
>> Could we use the acpica tools (acpidump/acpixtract/acpiexec/...) in
>> order to fetch this information from user-space and send it to Xen using
>> privcmd?
>>
>> AFAIK those tools work on most OSes (or at least the ones we care about
>> as Dom0).
> In general, we can't rely on userspace evaluation of AML.
>
> For trivial AML which evaluates to a constant, it could be interpreted
> by userspace, but anything accessing system resources will need
> evaluating by the kernel.
>
> ~Andrew
Andrew Cooper Feb. 18, 2016, 4:48 p.m. UTC | #15
On 18/02/16 16:24, Boris Ostrovsky wrote:
>
>
> On 02/18/2016 10:12 AM, Andrew Cooper wrote:
>> On 18/02/16 15:02, Roger Pau Monné wrote:
>>> El 17/2/16 a les 20:02, Konrad Rzeszutek Wilk ha escrit:
>>>> On Mon, Feb 15, 2016 at 03:41:41PM +0000, Andrew Cooper wrote:
>>>>> On 15/02/16 15:02, Jan Beulich wrote:
>>>>>>>>> On 15.02.16 at 15:53, <andrew.cooper3@citrix.com> wrote:
>>>>>>> On 15/02/16 14:50, Jan Beulich wrote:
>>>>>>>>>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>> On 15/02/16 09:20, Jan Beulich wrote:
>>>>>>>>>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>>>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>>>>>   #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT
>>>>>>>>>>>>> extension
>>>>>>>>>>> (MONITORX/MWAITX) */
>>>>>>>>>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>>>>>>>>>>> Because that is a good chunk of extra work to support.  We
>>>>>>>>>>> would need to
>>>>>>>>>>> use 4K monitor widths, and extra p2m handling.
>>>>>>>>>> I don't understand: The base (_MWAIT) feature being exposed to
>>>>>>>>>> guests today, and kernels making use of the feature when
>>>>>>>>>> available
>>>>>>>>>> suggests to me that things work. Are you saying you know
>>>>>>>>>> otherwise? (And if there really is a reason to mask the
>>>>>>>>>> feature all of
>>>>>>>>>> the sudden, this should again be justified in the commit
>>>>>>>>>> message.)
>>>>>>>>> PV guests had it clobbered by Xen in traps.c
>>>>>>>>>
>>>>>>>>> HVM guests have:
>>>>>>>>>
>>>>>>>>> vmx.c:
>>>>>>>>>      case EXIT_REASON_MWAIT_INSTRUCTION:
>>>>>>>>>      case EXIT_REASON_MONITOR_INSTRUCTION:
>>>>>>>>> [...]
>>>>>>>>>      hvm_inject_hw_exception(TRAP_invalid_op,
>>>>>>>>> HVM_DELIVER_NO_ERROR_CODE);
>>>>>>>>>          break;
>>>>>>>>>
>>>>>>>>> and svm.c:
>>>>>>>>>      case VMEXIT_MONITOR:
>>>>>>>>>      case VMEXIT_MWAIT:
>>>>>>>>>          hvm_inject_hw_exception(TRAP_invalid_op,
>>>>>>>>> HVM_DELIVER_NO_ERROR_CODE);
>>>>>>>>>          break;
>>>>>>>>>
>>>>>>>>> I don't see how a guest could actually use this feature.
>>>>>>>> Do you see the respective intercepts getting enabled anywhere?
>>>>>>>> (I don't outside of nested code, which I didn't check in detail.)
>>>>>>> Yes - the intercepts are always enabled to prevent the guest
>>>>>>> actually
>>>>>>> putting the processor to sleep.
>>>>>> Hmm, you're right, somehow I've managed to ignore the relevant
>>>>>> lines grep reported. Yet - how do things work then, without the
>>>>>> MWAIT feature flag currently getting cleared?
>>>>> I have never observed it being used.  Do you have some local
>>>>> patches in
>>>>> the SLES hypervisor?
>>>>>
>>>>> There is some gross layer violation in xen/enlighten.c to pretend
>>>>> that
>>>>> MWAIT is present to trick the ACPI code into evaluating _CST()
>>>>> methods
>>>>> to report back to Xen.  (This is yet another PV-ism which will
>>>>> cause a
>>>>> headache for a DMLite dom0)
>>>> Yes indeed. CC-ing Roger, and Boris.
>>> Yes, all this is indeed not very nice, and we would ideally like to get
>>> rid of it on PVHv2.
>
> We will have to come up with something else: AIUI the whole point of
> xen_check_mwait() is to come up with proper ECX and EDX values for the
> MWAIT CPUID leaf. Those value are expected to be reported from
> xen_cpuid() pv_op so that acpi_processor_ffh_state_probe_cpu() can set
> C states structures properly.
>
> The problem is that PVH executes CPUID instruction natively. (And so
> this must have been broken on PVHv1 as well).

Currently, mwait is unusable by any domains, and will not be offered in
any cpuid policy.

How a particular dom0 goes about deciding to enumerate the ACPI objects
is its own business, but personally I think it is a layering violation
to have the enumeration of an existing ACPI object based on a feature bit.

Dom0, being suitably enlightened, should know that its job is to service
Xen when it comes to ACPI, and unilaterally collect and upload
everything it can.

~Andrew
Roger Pau Monné Feb. 18, 2016, 5:03 p.m. UTC | #16
El 18/2/16 a les 16:12, Andrew Cooper ha escrit:
> On 18/02/16 15:02, Roger Pau Monné wrote:
>> El 17/2/16 a les 20:02, Konrad Rzeszutek Wilk ha escrit:
>>> On Mon, Feb 15, 2016 at 03:41:41PM +0000, Andrew Cooper wrote:
>>>> On 15/02/16 15:02, Jan Beulich wrote:
>>>>>>>> On 15.02.16 at 15:53, <andrew.cooper3@citrix.com> wrote:
>>>>>> On 15/02/16 14:50, Jan Beulich wrote:
>>>>>>>>>> On 15.02.16 at 15:38, <andrew.cooper3@citrix.com> wrote:
>>>>>>>> On 15/02/16 09:20, Jan Beulich wrote:
>>>>>>>>>>>> On 12.02.16 at 18:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>> On 12/02/16 17:05, Jan Beulich wrote:
>>>>>>>>>>>>>> On 05.02.16 at 14:42, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>>>>  #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension 
>>>>>>>>>> (MONITORX/MWAITX) */
>>>>>>>>>>> Why not exposed to HVM (also for _MWAIT as I now notice)?
>>>>>>>>>> Because that is a good chunk of extra work to support.  We would need to
>>>>>>>>>> use 4K monitor widths, and extra p2m handling.
>>>>>>>>> I don't understand: The base (_MWAIT) feature being exposed to
>>>>>>>>> guests today, and kernels making use of the feature when available
>>>>>>>>> suggests to me that things work. Are you saying you know
>>>>>>>>> otherwise? (And if there really is a reason to mask the feature all of
>>>>>>>>> the sudden, this should again be justified in the commit message.)
>>>>>>>> PV guests had it clobbered by Xen in traps.c
>>>>>>>>
>>>>>>>> HVM guests have:
>>>>>>>>
>>>>>>>> vmx.c:
>>>>>>>>     case EXIT_REASON_MWAIT_INSTRUCTION:
>>>>>>>>     case EXIT_REASON_MONITOR_INSTRUCTION:
>>>>>>>> [...]
>>>>>>>>     hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>>>         break;
>>>>>>>>
>>>>>>>> and svm.c:
>>>>>>>>     case VMEXIT_MONITOR:
>>>>>>>>     case VMEXIT_MWAIT:
>>>>>>>>         hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>>>>>         break;
>>>>>>>>
>>>>>>>> I don't see how a guest could actually use this feature.
>>>>>>> Do you see the respective intercepts getting enabled anywhere?
>>>>>>> (I don't outside of nested code, which I didn't check in detail.)
>>>>>> Yes - the intercepts are always enabled to prevent the guest actually
>>>>>> putting the processor to sleep.
>>>>> Hmm, you're right, somehow I've managed to ignore the relevant
>>>>> lines grep reported. Yet - how do things work then, without the
>>>>> MWAIT feature flag currently getting cleared?
>>>> I have never observed it being used.  Do you have some local patches in
>>>> the SLES hypervisor?
>>>>
>>>> There is some gross layer violation in xen/enlighten.c to pretend that
>>>> MWAIT is present to trick the ACPI code into evaluating _CST() methods
>>>> to report back to Xen.  (This is yet another PV-ism which will cause a
>>>> headache for a DMLite dom0)
>>> Yes indeed. CC-ing Roger, and Boris.
>> Yes, all this is indeed not very nice, and we would ideally like to get
>> rid of it on PVHv2.
>>
>> Could we use the acpica tools (acpidump/acpixtract/acpiexec/...) in
>> order to fetch this information from user-space and send it to Xen using
>> privcmd?
>>
>> AFAIK those tools work on most OSes (or at least the ones we care about
>> as Dom0).
> 
> In general, we can't rely on userspace evaluation of AML.
> 
> For trivial AML which evaluates to a constant, it could be interpreted
> by userspace, but anything accessing system resources will need
> evaluating by the kernel.

Hm, I've took a look at the ACPI tables in one of my systems, and I'm 
not sure, but I guess the CPU related methods indeed must be executed 
by the kernel. I don't have much idea of ASL, but I guess the 
"Register" instruction means that a specific register must be poked, 
and it probably can't be done from user-space:

[...]
    Scope (\_PR.CPU0)
    {
        Name (_PPC, Zero)  // _PPC: Performance Present Capabilities
        Method (_PCT, 0, NotSerialized)  // _PCT: Performance Control
        {
            \_PR.CPU0._PPC = \_PR.CPPC
            If (((CFGD & One) && (PDC0 & One)))
            {
                Return (Package (0x02)
                {
                    ResourceTemplate ()
                    {
                        Register (FFixedHW, 
                            0x00,               // Bit Width
                            0x00,               // Bit Offset
                            0x0000000000000000, // Address
                            ,)
                    }, 

                    ResourceTemplate ()
                    {
                        Register (FFixedHW, 
                            0x00,               // Bit Width
                            0x00,               // Bit Offset
                            0x0000000000000000, // Address
                            ,)
                    }
                })
            }
        }

        Name (_PSS, Package (0x10)  // _PSS: Performance Supported States
        {
            Package (0x06)
            {
                0x00000834, 
                0x00003A98, 
                0x0000000A, 
                0x0000000A, 
                0x00001500, 
                0x00001500
            }, 

            Package (0x06)
            {
                0x000007D0, 
                0x00003708, 
                0x0000000A, 
                0x0000000A, 
                0x00001400, 
                0x00001400
            }, 
[...]

Do we have a formal list of what exactly does Xen want from ACPI that 
it cannot fetch itself?

I'm quite sure Xen cares about all the "Processor Vendor-Specific ACPI" 
[0], that should be _PCT, _CST and _PTC (located in \_PR_.CPUN._XXX).

Roger.

[0] http://www.intel.es/content/dam/www/public/us/en/documents/product-specifications/processor-vendor-specific-acpi-specification.pdf
Konrad Rzeszutek Wilk Feb. 18, 2016, 10:08 p.m. UTC | #17
. snip..
> >>>>> Hmm, you're right, somehow I've managed to ignore the relevant
> >>>>> lines grep reported. Yet - how do things work then, without the
> >>>>> MWAIT feature flag currently getting cleared?
> >>>> I have never observed it being used.  Do you have some local patches in
> >>>> the SLES hypervisor?
> >>>>
> >>>> There is some gross layer violation in xen/enlighten.c to pretend that
> >>>> MWAIT is present to trick the ACPI code into evaluating _CST() methods
> >>>> to report back to Xen.  (This is yet another PV-ism which will cause a
> >>>> headache for a DMLite dom0)
> >>> Yes indeed. CC-ing Roger, and Boris.
> >> Yes, all this is indeed not very nice, and we would ideally like to get
> >> rid of it on PVHv2.
> >>
> >> Could we use the acpica tools (acpidump/acpixtract/acpiexec/...) in
> >> order to fetch this information from user-space and send it to Xen using
> >> privcmd?
> >>
> >> AFAIK those tools work on most OSes (or at least the ones we care about
> >> as Dom0).
> > 
> > In general, we can't rely on userspace evaluation of AML.
> > 
> > For trivial AML which evaluates to a constant, it could be interpreted
> > by userspace, but anything accessing system resources will need
> > evaluating by the kernel.
> 
> Hm, I've took a look at the ACPI tables in one of my systems, and I'm 
... snip ..
> Do we have a formal list of what exactly does Xen want from ACPI that 
> it cannot fetch itself?
> 
> I'm quite sure Xen cares about all the "Processor Vendor-Specific ACPI" 
> [0], that should be _PCT, _CST and _PTC (located in \_PR_.CPUN._XXX).

Correct. But those values - especially _CST are modified by the firmware
depending on the ..[something, I can't actually find the code for it].

Here is an copy-n-paste of the code that sets the
generic ACPI code on the path to get the lower C-states:

commit 73c154c60be106b47f15d1111fc2d75cc7a436f2
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Mon Feb 13 22:26:32 2012 -0500

    xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it.
    
    For the hypervisor to take advantage of the MWAIT support it needs
    to extract from the ACPI _CST the register address. But the
    hypervisor does not have the support to parse DSDT so it relies on
    the initial domain (dom0) to parse the ACPI Power Management information
    and push it up to the hypervisor. The pushing of the data is done
    by the processor_harveset_xen module which parses the information that
    the ACPI parser has graciously exposed in 'struct acpi_processor'.
    
    For the ACPI parser to also expose the Cx states for MWAIT, we need
    to expose the MWAIT capability (leaf 1). Furthermore we also need to
    expose the MWAIT_LEAF capability (leaf 5) for cstate.c to properly
    function.
    
    The hypervisor could expose these flags when it traps the XEN_EMULATE_PREFIX
    operations, but it can't do it since it needs to be backwards compatible.
    Instead we choose to use the native CPUID to figure out if the MWAIT
    capability exists and use the XEN_SET_PDC query hypercall to figure out
    if the hypervisor wants us to expose the MWAIT_LEAF capability or not.
    
    Note: The XEN_SET_PDC query was implemented in c/s 23783:
    "ACPI: add _PDC input override mechanism".
    
    With this in place, instead of
     C3 ACPI IOPORT 415
    we get now
     C3:ACPI FFH INTEL MWAIT 0x20
    
    Note: The cpu_idle which would be calling the mwait variants for
idling
    never gets set b/c we set the default pm_idle to be the hypercall
variant.

> 
> Roger.
> 
> [0] http://www.intel.es/content/dam/www/public/us/en/documents/product-specifications/processor-vendor-specific-acpi-specification.pdf
diff mbox

Patch

diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 2748cfd..d10b725 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -55,139 +55,144 @@ 
  * Inverted: '!'
  *   This feature has its value in a featureset inverted, compared to how it
  *   is specified by vendor architecture manuals.
+ *
+ * Applicability to guests: 'A', 'S' or 'H'
+ *   'A' = All guests.
+ *   'S' = All HVM guests (not PV guests).
+ *   'H' = HVM HAP guests (not PV or HVM Shadow guests).
  */
 
 /* Intel-defined CPU features, CPUID level 0x00000001.edx, word 0 */
-#define X86_FEATURE_FPU           ( 0*32+ 0) /*   Onboard FPU */
-#define X86_FEATURE_VME           ( 0*32+ 1) /*   Virtual Mode Extensions */
-#define X86_FEATURE_DE            ( 0*32+ 2) /*   Debugging Extensions */
-#define X86_FEATURE_PSE           ( 0*32+ 3) /*   Page Size Extensions */
-#define X86_FEATURE_TSC           ( 0*32+ 4) /*   Time Stamp Counter */
-#define X86_FEATURE_MSR           ( 0*32+ 5) /*   Model-Specific Registers, RDMSR, WRMSR */
-#define X86_FEATURE_PAE           ( 0*32+ 6) /*   Physical Address Extensions */
-#define X86_FEATURE_MCE           ( 0*32+ 7) /*   Machine Check Architecture */
-#define X86_FEATURE_CX8           ( 0*32+ 8) /*   CMPXCHG8 instruction */
-#define X86_FEATURE_APIC          ( 0*32+ 9) /*   Onboard APIC */
-#define X86_FEATURE_SEP           ( 0*32+11) /*   SYSENTER/SYSEXIT */
-#define X86_FEATURE_MTRR          ( 0*32+12) /*   Memory Type Range Registers */
-#define X86_FEATURE_PGE           ( 0*32+13) /*   Page Global Enable */
-#define X86_FEATURE_MCA           ( 0*32+14) /*   Machine Check Architecture */
-#define X86_FEATURE_CMOV          ( 0*32+15) /*   CMOV instruction (FCMOVCC and FCOMI too if FPU present) */
-#define X86_FEATURE_PAT           ( 0*32+16) /*   Page Attribute Table */
-#define X86_FEATURE_PSE36         ( 0*32+17) /*   36-bit PSEs */
+#define X86_FEATURE_FPU           ( 0*32+ 0) /*A  Onboard FPU */
+#define X86_FEATURE_VME           ( 0*32+ 1) /*S  Virtual Mode Extensions */
+#define X86_FEATURE_DE            ( 0*32+ 2) /*A  Debugging Extensions */
+#define X86_FEATURE_PSE           ( 0*32+ 3) /*S  Page Size Extensions */
+#define X86_FEATURE_TSC           ( 0*32+ 4) /*A  Time Stamp Counter */
+#define X86_FEATURE_MSR           ( 0*32+ 5) /*A  Model-Specific Registers, RDMSR, WRMSR */
+#define X86_FEATURE_PAE           ( 0*32+ 6) /*A  Physical Address Extensions */
+#define X86_FEATURE_MCE           ( 0*32+ 7) /*A  Machine Check Architecture */
+#define X86_FEATURE_CX8           ( 0*32+ 8) /*A  CMPXCHG8 instruction */
+#define X86_FEATURE_APIC          ( 0*32+ 9) /*A  Onboard APIC */
+#define X86_FEATURE_SEP           ( 0*32+11) /*A  SYSENTER/SYSEXIT */
+#define X86_FEATURE_MTRR          ( 0*32+12) /*S  Memory Type Range Registers */
+#define X86_FEATURE_PGE           ( 0*32+13) /*S  Page Global Enable */
+#define X86_FEATURE_MCA           ( 0*32+14) /*A  Machine Check Architecture */
+#define X86_FEATURE_CMOV          ( 0*32+15) /*A  CMOV instruction (FCMOVCC and FCOMI too if FPU present) */
+#define X86_FEATURE_PAT           ( 0*32+16) /*A  Page Attribute Table */
+#define X86_FEATURE_PSE36         ( 0*32+17) /*S  36-bit PSEs */
 #define X86_FEATURE_PN            ( 0*32+18) /*   Processor serial number */
-#define X86_FEATURE_CLFLSH        ( 0*32+19) /*   CLFLUSH instruction */
+#define X86_FEATURE_CLFLSH        ( 0*32+19) /*A  CLFLUSH instruction */
 #define X86_FEATURE_DS            ( 0*32+21) /*   Debug Store */
-#define X86_FEATURE_ACPI          ( 0*32+22) /*   ACPI via MSR */
-#define X86_FEATURE_MMX           ( 0*32+23) /*   Multimedia Extensions */
-#define X86_FEATURE_FXSR          ( 0*32+24) /*   FXSAVE and FXRSTOR instructions */
-#define X86_FEATURE_XMM           ( 0*32+25) /*   Streaming SIMD Extensions */
-#define X86_FEATURE_XMM2          ( 0*32+26) /*   Streaming SIMD Extensions-2 */
+#define X86_FEATURE_ACPI          ( 0*32+22) /*A  ACPI via MSR */
+#define X86_FEATURE_MMX           ( 0*32+23) /*A  Multimedia Extensions */
+#define X86_FEATURE_FXSR          ( 0*32+24) /*A  FXSAVE and FXRSTOR instructions */
+#define X86_FEATURE_XMM           ( 0*32+25) /*A  Streaming SIMD Extensions */
+#define X86_FEATURE_XMM2          ( 0*32+26) /*A  Streaming SIMD Extensions-2 */
 #define X86_FEATURE_SELFSNOOP     ( 0*32+27) /*   CPU self snoop */
-#define X86_FEATURE_HT            ( 0*32+28) /*   Hyper-Threading */
+#define X86_FEATURE_HT            ( 0*32+28) /*A  Hyper-Threading */
 #define X86_FEATURE_ACC           ( 0*32+29) /*   Automatic clock control */
 #define X86_FEATURE_IA64          ( 0*32+30) /*   IA-64 processor */
 #define X86_FEATURE_PBE           ( 0*32+31) /*   Pending Break Enable */
 
 /* Intel-defined CPU features, CPUID level 0x00000001.ecx, word 1 */
-#define X86_FEATURE_XMM3          ( 1*32+ 0) /*   Streaming SIMD Extensions-3 */
-#define X86_FEATURE_PCLMULQDQ     ( 1*32+ 1) /*   Carry-less mulitplication */
+#define X86_FEATURE_XMM3          ( 1*32+ 0) /*A  Streaming SIMD Extensions-3 */
+#define X86_FEATURE_PCLMULQDQ     ( 1*32+ 1) /*A  Carry-less mulitplication */
 #define X86_FEATURE_DTES64        ( 1*32+ 2) /*   64-bit Debug Store */
 #define X86_FEATURE_MWAIT         ( 1*32+ 3) /*   Monitor/Mwait support */
 #define X86_FEATURE_DSCPL         ( 1*32+ 4) /*   CPL Qualified Debug Store */
-#define X86_FEATURE_VMXE          ( 1*32+ 5) /*   Virtual Machine Extensions */
+#define X86_FEATURE_VMXE          ( 1*32+ 5) /*S  Virtual Machine Extensions */
 #define X86_FEATURE_SMXE          ( 1*32+ 6) /*   Safer Mode Extensions */
 #define X86_FEATURE_EST           ( 1*32+ 7) /*   Enhanced SpeedStep */
 #define X86_FEATURE_TM2           ( 1*32+ 8) /*   Thermal Monitor 2 */
-#define X86_FEATURE_SSSE3         ( 1*32+ 9) /*   Supplemental Streaming SIMD Extensions-3 */
+#define X86_FEATURE_SSSE3         ( 1*32+ 9) /*A  Supplemental Streaming SIMD Extensions-3 */
 #define X86_FEATURE_CID           ( 1*32+10) /*   Context ID */
-#define X86_FEATURE_FMA           ( 1*32+12) /*   Fused Multiply Add */
-#define X86_FEATURE_CX16          ( 1*32+13) /*   CMPXCHG16B */
+#define X86_FEATURE_FMA           ( 1*32+12) /*A  Fused Multiply Add */
+#define X86_FEATURE_CX16          ( 1*32+13) /*A  CMPXCHG16B */
 #define X86_FEATURE_XTPR          ( 1*32+14) /*   Send Task Priority Messages */
 #define X86_FEATURE_PDCM          ( 1*32+15) /*   Perf/Debug Capability MSR */
-#define X86_FEATURE_PCID          ( 1*32+17) /*   Process Context ID */
+#define X86_FEATURE_PCID          ( 1*32+17) /*H  Process Context ID */
 #define X86_FEATURE_DCA           ( 1*32+18) /*   Direct Cache Access */
-#define X86_FEATURE_SSE4_1        ( 1*32+19) /*   Streaming SIMD Extensions 4.1 */
-#define X86_FEATURE_SSE4_2        ( 1*32+20) /*   Streaming SIMD Extensions 4.2 */
-#define X86_FEATURE_X2APIC        ( 1*32+21) /*   Extended xAPIC */
-#define X86_FEATURE_MOVBE         ( 1*32+22) /*   movbe instruction */
-#define X86_FEATURE_POPCNT        ( 1*32+23) /*   POPCNT instruction */
-#define X86_FEATURE_TSC_DEADLINE  ( 1*32+24) /*   TSC Deadline Timer */
-#define X86_FEATURE_AES           ( 1*32+25) /*   AES instructions */
-#define X86_FEATURE_XSAVE         ( 1*32+26) /*   XSAVE/XRSTOR/XSETBV/XGETBV */
+#define X86_FEATURE_SSE4_1        ( 1*32+19) /*A  Streaming SIMD Extensions 4.1 */
+#define X86_FEATURE_SSE4_2        ( 1*32+20) /*A  Streaming SIMD Extensions 4.2 */
+#define X86_FEATURE_X2APIC        ( 1*32+21) /*A  Extended xAPIC */
+#define X86_FEATURE_MOVBE         ( 1*32+22) /*A  movbe instruction */
+#define X86_FEATURE_POPCNT        ( 1*32+23) /*A  POPCNT instruction */
+#define X86_FEATURE_TSC_DEADLINE  ( 1*32+24) /*S  TSC Deadline Timer */
+#define X86_FEATURE_AES           ( 1*32+25) /*A  AES instructions */
+#define X86_FEATURE_XSAVE         ( 1*32+26) /*A  XSAVE/XRSTOR/XSETBV/XGETBV */
 #define X86_FEATURE_OSXSAVE       ( 1*32+27) /*   OSXSAVE */
-#define X86_FEATURE_AVX           ( 1*32+28) /*   Advanced Vector Extensions */
-#define X86_FEATURE_F16C          ( 1*32+29) /*   Half-precision convert instruction */
-#define X86_FEATURE_RDRAND        ( 1*32+30) /*   Digital Random Number Generator */
-#define X86_FEATURE_HYPERVISOR    ( 1*32+31) /*   Running under some hypervisor */
+#define X86_FEATURE_AVX           ( 1*32+28) /*A  Advanced Vector Extensions */
+#define X86_FEATURE_F16C          ( 1*32+29) /*A  Half-precision convert instruction */
+#define X86_FEATURE_RDRAND        ( 1*32+30) /*A  Digital Random Number Generator */
+#define X86_FEATURE_HYPERVISOR    ( 1*32+31) /*A  Running under some hypervisor */
 
 /* AMD-defined CPU features, CPUID level 0x80000001.edx, word 2 */
-#define X86_FEATURE_SYSCALL       ( 2*32+11) /*   SYSCALL/SYSRET */
-#define X86_FEATURE_MP            ( 2*32+19) /*   MP Capable. */
-#define X86_FEATURE_NX            ( 2*32+20) /*   Execute Disable */
-#define X86_FEATURE_MMXEXT        ( 2*32+22) /*   AMD MMX extensions */
-#define X86_FEATURE_FFXSR         ( 2*32+25) /*   FFXSR instruction optimizations */
-#define X86_FEATURE_PAGE1GB       ( 2*32+26) /*   1Gb large page support */
-#define X86_FEATURE_RDTSCP        ( 2*32+27) /*   RDTSCP */
-#define X86_FEATURE_LM            ( 2*32+29) /*   Long Mode (x86-64) */
-#define X86_FEATURE_3DNOWEXT      ( 2*32+30) /*   AMD 3DNow! extensions */
-#define X86_FEATURE_3DNOW         ( 2*32+31) /*   3DNow! */
+#define X86_FEATURE_SYSCALL       ( 2*32+11) /*A  SYSCALL/SYSRET */
+#define X86_FEATURE_MP            ( 2*32+19) /*A  MP Capable. */
+#define X86_FEATURE_NX            ( 2*32+20) /*A  Execute Disable */
+#define X86_FEATURE_MMXEXT        ( 2*32+22) /*A  AMD MMX extensions */
+#define X86_FEATURE_FFXSR         ( 2*32+25) /*A  FFXSR instruction optimizations */
+#define X86_FEATURE_PAGE1GB       ( 2*32+26) /*H  1Gb large page support */
+#define X86_FEATURE_RDTSCP        ( 2*32+27) /*S  RDTSCP */
+#define X86_FEATURE_LM            ( 2*32+29) /*A  Long Mode (x86-64) */
+#define X86_FEATURE_3DNOWEXT      ( 2*32+30) /*A  AMD 3DNow! extensions */
+#define X86_FEATURE_3DNOW         ( 2*32+31) /*A  3DNow! */
 
 /* AMD-defined CPU features, CPUID level 0x80000001.ecx, word 3 */
-#define X86_FEATURE_LAHF_LM       ( 3*32+ 0) /*   LAHF/SAHF in long mode */
+#define X86_FEATURE_LAHF_LM       ( 3*32+ 0) /*A  LAHF/SAHF in long mode */
 #define X86_FEATURE_CMP_LEGACY    ( 3*32+ 1) /*   If yes HyperThreading not valid */
-#define X86_FEATURE_SVM           ( 3*32+ 2) /*   Secure virtual machine */
+#define X86_FEATURE_SVM           ( 3*32+ 2) /*S  Secure virtual machine */
 #define X86_FEATURE_EXTAPIC       ( 3*32+ 3) /*   Extended APIC space */
-#define X86_FEATURE_CR8_LEGACY    ( 3*32+ 4) /*   CR8 in 32-bit mode */
-#define X86_FEATURE_ABM           ( 3*32+ 5) /*   Advanced bit manipulation */
-#define X86_FEATURE_SSE4A         ( 3*32+ 6) /*   SSE-4A */
-#define X86_FEATURE_MISALIGNSSE   ( 3*32+ 7) /*   Misaligned SSE mode */
-#define X86_FEATURE_3DNOWPREFETCH ( 3*32+ 8) /*   3DNow prefetch instructions */
+#define X86_FEATURE_CR8_LEGACY    ( 3*32+ 4) /*S  CR8 in 32-bit mode */
+#define X86_FEATURE_ABM           ( 3*32+ 5) /*A  Advanced bit manipulation */
+#define X86_FEATURE_SSE4A         ( 3*32+ 6) /*A  SSE-4A */
+#define X86_FEATURE_MISALIGNSSE   ( 3*32+ 7) /*A  Misaligned SSE mode */
+#define X86_FEATURE_3DNOWPREFETCH ( 3*32+ 8) /*A  3DNow prefetch instructions */
 #define X86_FEATURE_OSVW          ( 3*32+ 9) /*   OS Visible Workaround */
-#define X86_FEATURE_IBS           ( 3*32+10) /*   Instruction Based Sampling */
-#define X86_FEATURE_XOP           ( 3*32+11) /*   extended AVX instructions */
+#define X86_FEATURE_IBS           ( 3*32+10) /*S  Instruction Based Sampling */
+#define X86_FEATURE_XOP           ( 3*32+11) /*A  extended AVX instructions */
 #define X86_FEATURE_SKINIT        ( 3*32+12) /*   SKINIT/STGI instructions */
 #define X86_FEATURE_WDT           ( 3*32+13) /*   Watchdog timer */
-#define X86_FEATURE_LWP           ( 3*32+15) /*   Light Weight Profiling */
-#define X86_FEATURE_FMA4          ( 3*32+16) /*   4 operands MAC instructions */
+#define X86_FEATURE_LWP           ( 3*32+15) /*A  Light Weight Profiling */
+#define X86_FEATURE_FMA4          ( 3*32+16) /*A  4 operands MAC instructions */
 #define X86_FEATURE_NODEID_MSR    ( 3*32+19) /*   NodeId MSR */
-#define X86_FEATURE_TBM           ( 3*32+21) /*   trailing bit manipulations */
+#define X86_FEATURE_TBM           ( 3*32+21) /*A  trailing bit manipulations */
 #define X86_FEATURE_TOPOEXT       ( 3*32+22) /*   topology extensions CPUID leafs */
-#define X86_FEATURE_DBEXT         ( 3*32+26) /*   data breakpoint extension */
+#define X86_FEATURE_DBEXT         ( 3*32+26) /*A  data breakpoint extension */
 #define X86_FEATURE_MWAITX        ( 3*32+29) /*   MWAIT extension (MONITORX/MWAITX) */
 
 /* Intel-defined CPU features, CPUID level 0x0000000D:1.eax, word 4 */
-#define X86_FEATURE_XSAVEOPT      ( 4*32+ 0) /*   XSAVEOPT instruction */
-#define X86_FEATURE_XSAVEC        ( 4*32+ 1) /*   XSAVEC/XRSTORC instructions */
-#define X86_FEATURE_XGETBV1       ( 4*32+ 2) /*   XGETBV with %ecx=1 */
-#define X86_FEATURE_XSAVES        ( 4*32+ 3) /*   XSAVES/XRSTORS instructions */
+#define X86_FEATURE_XSAVEOPT      ( 4*32+ 0) /*A  XSAVEOPT instruction */
+#define X86_FEATURE_XSAVEC        ( 4*32+ 1) /*A  XSAVEC/XRSTORC instructions */
+#define X86_FEATURE_XGETBV1       ( 4*32+ 2) /*A  XGETBV with %ecx=1 */
+#define X86_FEATURE_XSAVES        ( 4*32+ 3) /*S  XSAVES/XRSTORS instructions */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0.ebx, word 5 */
-#define X86_FEATURE_FSGSBASE      ( 5*32+ 0) /*   {RD,WR}{FS,GS}BASE instructions */
-#define X86_FEATURE_TSC_ADJUST    ( 5*32+ 1) /*   TSC_ADJUST MSR available */
-#define X86_FEATURE_BMI1          ( 5*32+ 3) /*   1st bit manipulation extensions */
-#define X86_FEATURE_HLE           ( 5*32+ 4) /*   Hardware Lock Elision */
-#define X86_FEATURE_AVX2          ( 5*32+ 5) /*   AVX2 instructions */
-#define X86_FEATURE_SMEP          ( 5*32+ 7) /*   Supervisor Mode Execution Protection */
-#define X86_FEATURE_BMI2          ( 5*32+ 8) /*   2nd bit manipulation extensions */
-#define X86_FEATURE_ERMS          ( 5*32+ 9) /*   Enhanced REP MOVSB/STOSB */
-#define X86_FEATURE_INVPCID       ( 5*32+10) /*   Invalidate Process Context ID */
-#define X86_FEATURE_RTM           ( 5*32+11) /*   Restricted Transactional Memory */
+#define X86_FEATURE_FSGSBASE      ( 5*32+ 0) /*A  {RD,WR}{FS,GS}BASE instructions */
+#define X86_FEATURE_TSC_ADJUST    ( 5*32+ 1) /*S  TSC_ADJUST MSR available */
+#define X86_FEATURE_BMI1          ( 5*32+ 3) /*A  1st bit manipulation extensions */
+#define X86_FEATURE_HLE           ( 5*32+ 4) /*A  Hardware Lock Elision */
+#define X86_FEATURE_AVX2          ( 5*32+ 5) /*A  AVX2 instructions */
+#define X86_FEATURE_SMEP          ( 5*32+ 7) /*S  Supervisor Mode Execution Protection */
+#define X86_FEATURE_BMI2          ( 5*32+ 8) /*A  2nd bit manipulation extensions */
+#define X86_FEATURE_ERMS          ( 5*32+ 9) /*A  Enhanced REP MOVSB/STOSB */
+#define X86_FEATURE_INVPCID       ( 5*32+10) /*H  Invalidate Process Context ID */
+#define X86_FEATURE_RTM           ( 5*32+11) /*A  Restricted Transactional Memory */
 #define X86_FEATURE_CMT           ( 5*32+12) /*   Cache Monitoring Technology */
-#define X86_FEATURE_FPU_SEL       ( 5*32+13) /*!  FPU CS/DS stored as zero */
+#define X86_FEATURE_FPU_SEL       ( 5*32+13) /*!A FPU CS/DS stored as zero */
 #define X86_FEATURE_MPX           ( 5*32+14) /*   Memory Protection Extensions */
 #define X86_FEATURE_CAT           ( 5*32+15) /*   Cache Allocation Technology */
-#define X86_FEATURE_RDSEED        ( 5*32+18) /*   RDSEED instruction */
-#define X86_FEATURE_ADX           ( 5*32+19) /*   ADCX, ADOX instructions */
-#define X86_FEATURE_SMAP          ( 5*32+20) /*   Supervisor Mode Access Prevention */
-#define X86_FEATURE_PCOMMIT       ( 5*32+22) /*   PCOMMIT instruction */
-#define X86_FEATURE_CLFLUSHOPT    ( 5*32+23) /*   CLFLUSHOPT instruction */
-#define X86_FEATURE_CLWB          ( 5*32+24) /*   CLWB instruction */
-#define X86_FEATURE_SHA           ( 5*32+29) /*   SHA1 & SHA256 instructions */
+#define X86_FEATURE_RDSEED        ( 5*32+18) /*A  RDSEED instruction */
+#define X86_FEATURE_ADX           ( 5*32+19) /*A  ADCX, ADOX instructions */
+#define X86_FEATURE_SMAP          ( 5*32+20) /*S  Supervisor Mode Access Prevention */
+#define X86_FEATURE_PCOMMIT       ( 5*32+22) /*A  PCOMMIT instruction */
+#define X86_FEATURE_CLFLUSHOPT    ( 5*32+23) /*A  CLFLUSHOPT instruction */
+#define X86_FEATURE_CLWB          ( 5*32+24) /*A  CLWB instruction */
+#define X86_FEATURE_SHA           ( 5*32+29) /*A  SHA1 & SHA256 instructions */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0.ecx, word 6 */
-#define X86_FEATURE_PREFETCHWT1   ( 6*32+ 0) /*   PREFETCHWT1 instruction */
-#define X86_FEATURE_PKU           ( 6*32+ 3) /*   Protection Keys for Userspace */
+#define X86_FEATURE_PREFETCHWT1   ( 6*32+ 0) /*A  PREFETCHWT1 instruction */
+#define X86_FEATURE_PKU           ( 6*32+ 3) /*H  Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE         ( 6*32+ 4) /*   OS Protection Keys Enable */
 
 /* AMD-defined CPU features, CPUID level 0x80000007.edx, word 7 */
@@ -195,7 +200,7 @@ 
 #define X86_FEATURE_EFRO          ( 7*32+10) /*   APERF/MPERF Read Only interface */
 
 /* AMD-defined CPU features, CPUID level 0x80000008.ebx, word 8 */
-#define X86_FEATURE_CLZERO        ( 8*32+ 0) /*   CLZERO instruction */
+#define X86_FEATURE_CLZERO        ( 8*32+ 0) /*A  CLZERO instruction */
 
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 #endif /* !__XEN_PUBLIC_ARCH_X86_CPUFEATURESET_H__ */
diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
index 9e0cc34..5f0f892 100755
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -17,12 +17,18 @@  class State(object):
         # State parsed from input
         self.names = {} # Name => value mapping
         self.raw_inverted = []
+        self.raw_pv = []
+        self.raw_hvm_shadow = []
+        self.raw_hvm_hap = []
 
         # State calculated
         self.nr_entries = 0 # Number of words in a featureset
         self.common = 0 # Common features between 1d and e1d
         self.known = [] # All known features
         self.inverted = [] # Features with inverted representations
+        self.pv = []
+        self.hvm_shadow = []
+        self.hvm_hap = []
 
 def parse_definitions(state):
     """
@@ -32,7 +38,7 @@  def parse_definitions(state):
     feat_regex = re.compile(
         r"^#define X86_FEATURE_([A-Z0-9_]+)"
         "\s+\(([\s\d]+\*[\s\d]+\+[\s\d]+)\)"
-        "\s+/\*([!]*) .*$")
+        "\s+/\*([\w!]*) .*$")
 
     this = sys.modules[__name__]
 
@@ -69,6 +75,18 @@  def parse_definitions(state):
             if "!" in attr:
                 state.raw_inverted.append(val)
 
+            if "A" in attr:
+                state.raw_pv.append(val)
+                state.raw_hvm_shadow.append(val)
+                state.raw_hvm_hap.append(val)
+            elif "S" in attr:
+                state.raw_hvm_shadow.append(val)
+                state.raw_hvm_hap.append(val)
+            elif "H" in attr:
+                state.raw_hvm_hap.append(val)
+            else:
+                raise Fail("Unrecognised attributes '%s' for %s" % (attr, name))
+
 
 def featureset_to_uint32s(fs, nr):
     """ Represent a featureset as a list of C-compatible uint32_t's """
@@ -116,6 +134,9 @@  def crunch_numbers(state):
 
     state.common = featureset_to_uint32s(common_1d, 1)[0]
     state.inverted = featureset_to_uint32s(state.raw_inverted, nr_entries)
+    state.pv = featureset_to_uint32s(state.raw_pv, nr_entries)
+    state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries)
+    state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries)
 
 
 def write_results(state):
@@ -137,10 +158,19 @@  def write_results(state):
 #define INIT_KNOWN_FEATURES { \\\n%s\n}
 
 #define INIT_INVERTED_FEATURES { \\\n%s\n}
+
+#define INIT_PV_FEATURES { \\\n%s\n}
+
+#define INIT_HVM_SHADOW_FEATURES { \\\n%s\n}
+
+#define INIT_HVM_HAP_FEATURES { \\\n%s\n}
 """ % (state.nr_entries,
        state.common,
        format_uint32s(state.known, 4),
        format_uint32s(state.inverted, 4),
+       format_uint32s(state.pv, 4),
+       format_uint32s(state.hvm_shadow, 4),
+       format_uint32s(state.hvm_hap, 4),
        ))
 
     state.output.write(