
[v2,3/3] svm/nestedsvm: Introduce nested capabilities bit

Message ID 20240313122454.965566-4-george.dunlap@cloud.com (mailing list archive)
State New
Series AMD Nested Virt Preparation

Commit Message

George Dunlap March 13, 2024, 12:24 p.m. UTC
In order to make implementation and testing tractable, we will require
specific host functionality.  Add a nested_virt bit to hvm_funcs.caps,
and return an error if a domain is created with nested virt and this
bit isn't set.  Create VMX and SVM callbacks to be executed from
start_nested_svm(), which is guaranteed to execute after all
command-line options have been processed.

For VMX, start with always enabling it if HAP is present; this
shouldn't change current behavior.

For SVM, require some basic functionality, adding a document
explaining the rationale.

NB that only SVM CPUID bits 0-7 have been considered.  Bits 10-16 may
be considered in a follow-up patch.

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
---
v2:
 - Fixed typo in title
 - Added hvm_nested_virt_supported() def for !CONFIG_HVM
 - Rebased over previous changes
 - Tweak some wording in document
 - Require npt rather than nrips twice
 - Remove stray __init from header
 - Set caps.nested_virt from callback from nestedhvm_setup()

CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: George Dunlap <george.dunlap@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Julien Grall <julien@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: "Roger Pau Monné" <roger.pau@citrix.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
---
 docs/designs/nested-svm-cpu-features.md  | 111 +++++++++++++++++++++++
 xen/arch/x86/domain.c                    |   6 ++
 xen/arch/x86/hvm/nestedhvm.c             |  10 ++
 xen/arch/x86/hvm/svm/nestedsvm.c         |  14 +++
 xen/arch/x86/hvm/vmx/vvmx.c              |   8 ++
 xen/arch/x86/include/asm/hvm/hvm.h       |  16 +++-
 xen/arch/x86/include/asm/hvm/nestedhvm.h |   4 +
 7 files changed, 168 insertions(+), 1 deletion(-)

Comments

Jan Beulich March 18, 2024, 2:17 p.m. UTC | #1
On 13.03.2024 13:24, George Dunlap wrote:
> In order to make implementation and testing tractable, we will require
> specific host functionality.  Add a nested_virt bit to hvm_funcs.caps,
> and return an error if a domain is created with nested virt and this
> bit isn't set.  Create VMX and SVM callbacks to be executed from
> start_nested_svm(), which is guaranteed to execute after all

Nit: nestedhvm_setup() (or, with different wording, start_nested_{svm,vmx}()).

> command-line options have been processed.
> 
> For VMX, start with always enabling it if HAP is present; this
> shouldn't change current behavior.
> 
> For SVM, require some basic functionality, adding a document
> explaining the rationale.
> 
> NB that only SVM CPUID bits 0-7 have been considered.  Bits 10-16 may
> be considered in a follow-up patch.
> 
> Signed-off-by: George Dunlap <george.dunlap@cloud.com>
>[...]
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -673,6 +673,12 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
>           */
>          config->flags |= XEN_DOMCTL_CDF_oos_off;
>  
> +    if ( nested_virt && !hvm_nested_virt_supported() )
> +    {
> +        dprintk(XENLOG_INFO, "Nested virt requested but not available\n");
> +        return -EINVAL;        
> +    }
> +
>      if ( nested_virt && !hap )
>      {
>          dprintk(XENLOG_INFO, "Nested virt not supported without HAP\n");

As mentioned in reply to v1, I think what both start_nested_{svm,vmx}() check
is redundant with this latter check. I think that check isn't necessary there
(yet unlike suggested in reply to v1 I don't think anymore that the check here
can alternatively be dropped). And even if it was to be kept for some reason,
I'm having some difficulty seeing why vendor code needs to do that check, when
nestedhvm_setup() could do it for both of them.

> --- a/xen/arch/x86/hvm/nestedhvm.c
> +++ b/xen/arch/x86/hvm/nestedhvm.c
> @@ -150,6 +150,16 @@ static int __init cf_check nestedhvm_setup(void)
>      __clear_bit(0x80, shadow_io_bitmap[0]);
>      __clear_bit(0xed, shadow_io_bitmap[1]);
>  
> +    /* 
> +     * NB this must be called after all command-line processing has been
> +     * done, so that if (for example) HAP is disabled, nested virt is
> +     * disabled as well.
> +     */
> +    if ( cpu_has_vmx )
> +        start_nested_vmx(&hvm_funcs);
> +    else if ( cpu_has_svm )
> +        start_nested_svm(&hvm_funcs);

Instead of passing the pointer, couldn't you have the functions return
bool, and then set hvm_funcs.caps.nested_virt from that? Passing that
pointer looks somewhat odd to me.
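
A minimal sketch of that bool-returning alternative, for reference -- purely
illustrative, reusing the function names and cpu_has_svm_* predicates from the
patch rather than showing anything that was committed:

    /* In nestedhvm_setup(): set the capability from a boolean return value. */
    if ( cpu_has_vmx )
        hvm_funcs.caps.nested_virt = start_nested_vmx();
    else if ( cpu_has_svm )
        hvm_funcs.caps.nested_virt = start_nested_svm();

    /* The SVM side would then just return its requirements, e.g.: */
    bool __init start_nested_svm(void)
    {
        return hvm_funcs.caps.hap && cpu_has_svm_lbrv && cpu_has_svm_nrips &&
               cpu_has_svm_flushbyasid && cpu_has_svm_decode;
    }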

Is there a reason to use direct calls here rather than a true hook
(seeing that hvm_funcs must have been populated by the time we make it
here)? I understand we're (remotely) considering to switch away from
using hooks at some point, but right now this feels somewhat
inconsistent if not further justified.

> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -2816,6 +2816,14 @@ void nvmx_set_cr_read_shadow(struct vcpu *v, unsigned int cr)
>      __vmwrite(read_shadow_field, v->arch.hvm.nvcpu.guest_cr[cr]);
>  }
>  
> +void __init start_nested_vmx(struct hvm_function_table *hvm_function_table)
> +{
> +    /* TODO: Require hardware support before enabling nested virt */
> +    hvm_function_table->caps.nested_virt =
> +        hvm_function_table->caps.hap;
> +}
> +
> +

Nit: No double blank lines please.

Jan
George Dunlap March 27, 2024, 5:01 p.m. UTC | #2
On Mon, Mar 18, 2024 at 2:17 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 13.03.2024 13:24, George Dunlap wrote:
> > In order to make implementation and testing tractable, we will require
> > specific host functionality.  Add a nested_virt bit to hvm_funcs.caps,
> > and return an error if a domain is created with nested virt and this
> > bit isn't set.  Create VMX and SVM callbacks to be executed from
> > start_nested_svm(), which is guaranteed to execute after all
>
> Nit: nestedhvm_setup() (or, with different wording, start_nested_{svm,vmx}()).

Oops, fixed.

> > --- a/xen/arch/x86/domain.c
> > +++ b/xen/arch/x86/domain.c
> > @@ -673,6 +673,12 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
> >           */
> >          config->flags |= XEN_DOMCTL_CDF_oos_off;
> >
> > +    if ( nested_virt && !hvm_nested_virt_supported() )
> > +    {
> > +        dprintk(XENLOG_INFO, "Nested virt requested but not available\n");
> > +        return -EINVAL;
> > +    }
> > +
> >      if ( nested_virt && !hap )
> >      {
> >          dprintk(XENLOG_INFO, "Nested virt not supported without HAP\n");
>
> As mentioned in reply to v1, I think what both start_nested_{svm,vmx}() check
> is redundant with this latter check. I think that check isn't necessary there
> (yet unlike suggested in reply to v1 I don't think anymore that the check here
> can alternatively be dropped). And even if it was to be kept for some reason,
> I'm having some difficulty seeing why vendor code needs to do that check, when
> nestedhvm_setup() could do it for both of them.

I'm having a bit of trouble resolving the antecedents in this
paragraph (what "this" and "there" are referring to).

Are you saying that we should set hvm_funcs.caps.nested_virt to 'true'
even if the hardware doesn't support HAP, because we check that here?

That seems like a very strange way to go about things; hvm_funcs.caps
should reflect the actual capabilities of the hardware.  Suppose at
some point we want to expose nested virt capability to the toolstack
-- wouldn't it make more sense to be able to just read
`hvm_funcs.caps.nested_virt`, rather than having to remember to && it
with `hvm_funcs.caps.hap`?

And as you say, you can't get rid of the HAP check here, because we
also want to deny nested virt if the admin deliberately created a
guest in shadow mode on a system that has HAP available.  So it's not
redundant: one is checking the capabilities of the system, the other
is checking the requested guest configuration.

As to why to have each vendor independent code check for HAP -- there
are in fact two implementations of the code; it's nice to be able to
look in one place for each implementation to determine the
requirements.  Additionally, it would be possible, in the future, for
one of the nested virt implementations to enable shadow mode, while
the other one didn't.  The current arrangement makes that easy.

> > --- a/xen/arch/x86/hvm/nestedhvm.c
> > +++ b/xen/arch/x86/hvm/nestedhvm.c
> > @@ -150,6 +150,16 @@ static int __init cf_check nestedhvm_setup(void)
> >      __clear_bit(0x80, shadow_io_bitmap[0]);
> >      __clear_bit(0xed, shadow_io_bitmap[1]);
> >
> > +    /*
> > +     * NB this must be called after all command-line processing has been
> > +     * done, so that if (for example) HAP is disabled, nested virt is
> > +     * disabled as well.
> > +     */
> > +    if ( cpu_has_vmx )
> > +        start_nested_vmx(&hvm_funcs);
> > +    else if ( cpu_has_svm )
> > +        start_nested_svm(&hvm_funcs);
>
> Instead of passing the pointer, couldn't you have the functions return
> bool, and then set hvm_funcs.caps.nested_virt from that? Passing that
> pointer looks somewhat odd to me.

For one, it makes start_nested_XXX symmetric to start_XXX, which also
has access to the full hvm_funcs structure (albeit in the former case
because it's creating the structure).  For two, both of them need to
read caps.hap.  For three, it's just more flexible -- there may be
future things that we want to modify in the start_nested_*()
functions.  If we did as you suggest, and then added (say)
caps.nested_virt_needs_hap, then we'd either need to add a second
function, or refactor it to look like this.
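
A sketch of that growth path, using the hypothetical caps.nested_virt_needs_hap
bit mentioned above -- neither that field nor the helper below exists; they only
illustrate how the bool-returning variant would want an extra function per
capability:

    /* Hypothetical: each additional capability needs its own return value. */
    if ( cpu_has_svm )
    {
        hvm_funcs.caps.nested_virt = start_nested_svm();
        hvm_funcs.caps.nested_virt_needs_hap = nested_svm_needs_hap();
    }

whereas with the pointer-passing form, start_nested_svm() can simply set any
further caps fields itself.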

> Is there a reason to use direct calls here rather than a true hook
> (seeing that hvm_funcs must have been populated by the time we make it
> here)? I understand we're (remotely) considering to switch away from
> using hooks at some point, but right now this feels somewhat
> inconsistent if not further justified.

Again it mimics the calls to start_vmx/svm in hvm_enable.  But I could
look at adding a function pointer to see if it works.

 -George
Jan Beulich March 28, 2024, 6:44 a.m. UTC | #3
On 27.03.2024 18:01, George Dunlap wrote:
> On Mon, Mar 18, 2024 at 2:17 PM Jan Beulich <jbeulich@suse.com> wrote:
>> On 13.03.2024 13:24, George Dunlap wrote:
>>> --- a/xen/arch/x86/domain.c
>>> +++ b/xen/arch/x86/domain.c
>>> @@ -673,6 +673,12 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
>>>           */
>>>          config->flags |= XEN_DOMCTL_CDF_oos_off;
>>>
>>> +    if ( nested_virt && !hvm_nested_virt_supported() )
>>> +    {
>>> +        dprintk(XENLOG_INFO, "Nested virt requested but not available\n");
>>> +        return -EINVAL;
>>> +    }
>>> +
>>>      if ( nested_virt && !hap )
>>>      {
>>>          dprintk(XENLOG_INFO, "Nested virt not supported without HAP\n");
>>
>> As mentioned in reply to v1, I think what both start_nested_{svm,vmx}() check
>> is redundant with this latter check. I think that check isn't necessary there
>> (yet unlike suggested in reply to v1 I don't think anymore that the check here
>> can alternatively be dropped). And even if it was to be kept for some reason,
>> I'm having some difficulty seeing why vendor code needs to do that check, when
>> nestedhvm_setup() could do it for both of them.
> 
> I'm having a bit of trouble resolving the antecedents in this
> paragraph (what "this" and "there" are referring to).
> 
> Are you saying that we should set hvm_funcs.caps.nested_virt to 'true'
> even if the hardware doesn't support HAP, because we check that here?
> 
> That seems like a very strange way to go about things; hvm_funcs.caps
> should reflect the actual capabilities of the hardware.  Suppose at
> some point we want to expose nested virt capability to the toolstack
> -- wouldn't it make more sense to be able to just read
> `hvm_funcs.caps.nested_virt`, rather than having to remember to && it
> with `hvm_funcs.caps.hap`?
> 
> And as you say, you can't get rid of the HAP check here, because we
> also want to deny nested virt if the admin deliberately created a
> guest in shadow mode on a system that has HAP available.  So it's not
> redundant: one is checking the capabilities of the system, the other
> is checking the requested guest configuration.

Hmm, yes, you're right.

> As to why to have each vendor independent code check for HAP -- there
> are in fact two implementations of the code; it's nice to be able to
> look in one place for each implementation to determine the
> requirements.  Additionally, it would be possible, in the future, for
> one of the nested virt implementations to enable shadow mode, while
> the other one didn't.  The current arrangement makes that easy.

Well, first - how likely is this, when instead we've been considering
whether we could get rid of shadow mode? And then this is balancing
between ease of future changes vs avoiding redundancy where it can be
avoided. I'm not going to insist on centralizing the HAP check, but I
certainly wanted the option to be considered.

>>> --- a/xen/arch/x86/hvm/nestedhvm.c
>>> +++ b/xen/arch/x86/hvm/nestedhvm.c
>>> @@ -150,6 +150,16 @@ static int __init cf_check nestedhvm_setup(void)
>>>      __clear_bit(0x80, shadow_io_bitmap[0]);
>>>      __clear_bit(0xed, shadow_io_bitmap[1]);
>>>
>>> +    /*
>>> +     * NB this must be called after all command-line processing has been
>>> +     * done, so that if (for example) HAP is disabled, nested virt is
>>> +     * disabled as well.
>>> +     */
>>> +    if ( cpu_has_vmx )
>>> +        start_nested_vmx(&hvm_funcs);
>>> +    else if ( cpu_has_svm )
>>> +        start_nested_svm(&hvm_funcs);
>>
>> Instead of passing the pointer, couldn't you have the functions return
>> bool, and then set hvm_funcs.caps.nested_virt from that? Passing that
>> pointer looks somewhat odd to me.
> 
> For one, it makes start_nested_XXX symmetric to start_XXX, which also
> has access to the full hvm_funcs structure (albeit in the former case
> because it's creating the structure).

This last aspect is the crucial aspect: start{svm,vmx}() are indeed quite
special because of this. Everywhere else we have accessor helpers when
hvm_funcs needs consulting, e.g. ...

>  For two, both of them need to read caps.hap.

... caps _reads_ would want using (an) accessor(s), too.

>  For three, it's just more flexible -- there may be
> future things that we want to modify in the start_nested_*()
> functions.  If we did as you suggest, and then added (say)
> caps.nested_virt_needs_hap, then we'd either need to add a second
> function, or refactor it to look like this.

Right, I can see this as an argument when taking, as you do, speculation
on future needs into account. Albeit I'm then inclined to say that even
such modifications may better be done through accessor helpers.
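
Such an accessor might look like the sketch below -- hypothetical, modelled on
the existing hvm_*_supported() style readers in hvm.h rather than anything in
the tree:

    /* Hypothetical setter-style accessor for the nested virt capability. */
    static inline void hvm_set_nested_virt_supported(bool enable)
    {
        hvm_funcs.caps.nested_virt = enable;
    }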

>> Is there a reason to use direct calls here rather than a true hook
>> (seeing that hvm_funcs must have been populated by the time we make it
>> here)? I understand we're (remotely) considering to switch away from
>> using hooks at some point, but right now this feels somewhat
>> inconsistent if not further justified.
> 
> Again it mimics the calls to start_vmx/svm in hvm_enable.  But I could
> look at adding a function pointer to see if it works.

Andrew - do you have any input here towards those somewhat vague plans
of getting rid of the hook functions? I guess they're more relevant in
places where we can't easily use the altcall machinery (i.e. not here).

Jan
George Dunlap March 28, 2024, 10:57 a.m. UTC | #4
On Thu, Mar 28, 2024 at 6:44 AM Jan Beulich <jbeulich@suse.com> wrote:
> > As to why to have each vendor independent code check for HAP -- there
> > are in fact two implementations of the code; it's nice to be able to
> > look in one place for each implementation to determine the
> > requirements.  Additionally, it would be possible, in the future, for
> > one of the nested virt implementations to enable shadow mode, while
> > the other one didn't.  The current arrangement makes that easy.
>
> Well, first - how likely is this, when instead we've been considering
> whether we could get rid of shadow mode? And then this is balancing
> between ease of future changes vs avoiding redundancy where it can be
> avoided. I'm not going to insist on centralizing the HAP check, but I
> certainly wanted the option to be considered.

I considered it before replying to you; so consider it considered. :-)

> >>> --- a/xen/arch/x86/hvm/nestedhvm.c
> >>> +++ b/xen/arch/x86/hvm/nestedhvm.c
> >>> @@ -150,6 +150,16 @@ static int __init cf_check nestedhvm_setup(void)
> >>>      __clear_bit(0x80, shadow_io_bitmap[0]);
> >>>      __clear_bit(0xed, shadow_io_bitmap[1]);
> >>>
> >>> +    /*
> >>> +     * NB this must be called after all command-line processing has been
> >>> +     * done, so that if (for example) HAP is disabled, nested virt is
> >>> +     * disabled as well.
> >>> +     */
> >>> +    if ( cpu_has_vmx )
> >>> +        start_nested_vmx(&hvm_funcs);
> >>> +    else if ( cpu_has_svm )
> >>> +        start_nested_svm(&hvm_funcs);
> >>
> >> Instead of passing the pointer, couldn't you have the functions return
> >> bool, and then set hvm_funcs.caps.nested_virt from that? Passing that
> >> pointer looks somewhat odd to me.
> >
> > For one, it makes start_nested_XXX symmetric to start_XXX, which also
> > has access to the full hvm_funcs structure (albeit in the former case
> > because it's creating the structure).
>
> This last aspect is the crucial aspect: start{svm,vmx}() are indeed quite
> special because of this. Everywhere else we have accessor helpers when
> hvm_funcs needs consulting, e.g. ...
>
> >  For two, both of them need to read caps.hap.
>
> ... caps _reads_ would want using (an) accessor(s), too.
>
> >  For three, it's just more flexible -- there may be
> > future things that we want to modify in the start_nested_*()
> > functions.  If we did as you suggest, and then added (say)
> > caps.nested_virt_needs_hap, then we'd either need to add a second
> > function, or refactor it to look like this.
>
> Right, I can see this as an argument when taking, as you do, speculation
> on future needs into account. Albeit I'm then inclined to say that even
> such modifications may better be done through accessor helpers.

Why access it through accessor helpers?

I consider that it's the SVM and VMX "construction/setup" code
respectively which "own" hvm_funcs (as evidenced by the fact that they
create the structures and then return them); and I consider the
start_nested_{vmx/svm} to be a part of the same code; so I consider it
valid for them to have direct access to the structure.

> >> Is there a reason to use direct calls here rather than a true hook
> >> (seeing that hvm_funcs must have been populated by the time we make it
> >> here)? I understand we're (remotely) considering to switch away from
> >> using hooks at some point, but right now this feels somewhat
> >> inconsistent if not further justified.
> >
> > Again it mimics the calls to start_vmx/svm in hvm_enable.  But I could
> > look at adding a function pointer to see if it works.
>
> Andrew - do you have any input here towards those somewhat vague plans
> of getting rid of the hook functions? I guess they're more relevant in
> places where we can't easily use the altcall machinery (i.e. not here).

Rather than do all that work collecting information, why don't we just
check it in as it is?  Obviously if we're thinking about getting rid
of hook functions at some point anyway, it can't be all that bad.
There is an aesthetic justification for the lack of hook, in that it's
similar to the start_vmx/svm() functions.

So far I really don't think the remaining "open" changes we're
discussing are worth either of our efforts.  I don't plan to make any
of these changes unless another x86 maintainer seconds your request
(or you can convince me they're worth making).

(The other two changes -- correcting the function name in the commit
message, and removing the extra blank line -- I've already changed
locally, so could check in with your ack.)

 -George
Jan Beulich March 28, 2024, 11:32 a.m. UTC | #5
On 28.03.2024 11:57, George Dunlap wrote:
> On Thu, Mar 28, 2024 at 6:44 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>> --- a/xen/arch/x86/hvm/nestedhvm.c
>>>>> +++ b/xen/arch/x86/hvm/nestedhvm.c
>>>>> @@ -150,6 +150,16 @@ static int __init cf_check nestedhvm_setup(void)
>>>>>      __clear_bit(0x80, shadow_io_bitmap[0]);
>>>>>      __clear_bit(0xed, shadow_io_bitmap[1]);
>>>>>
>>>>> +    /*
>>>>> +     * NB this must be called after all command-line processing has been
>>>>> +     * done, so that if (for example) HAP is disabled, nested virt is
>>>>> +     * disabled as well.
>>>>> +     */
>>>>> +    if ( cpu_has_vmx )
>>>>> +        start_nested_vmx(&hvm_funcs);
>>>>> +    else if ( cpu_has_svm )
>>>>> +        start_nested_svm(&hvm_funcs);
>>>>
>>>> Instead of passing the pointer, couldn't you have the functions return
>>>> bool, and then set hvm_funcs.caps.nested_virt from that? Passing that
>>>> pointer looks somewhat odd to me.
>>>
>>> For one, it makes start_nested_XXX symmetric to start_XXX, which also
>>> has access to the full hvm_funcs structure (albeit in the former case
>>> because it's creating the structure).
>>
>> This last aspect is the crucial aspect: start{svm,vmx}() are indeed quite
>> special because of this. Everywhere else we have accessor helpers when
>> hvm_funcs needs consulting, e.g. ...
>>
>>>  For two, both of them need to read caps.hap.
>>
>> ... caps _reads_ would want using (an) accessor(s), too.
>>
>>>  For three, it's just more flexible -- there may be
>>> future things that we want to modify in the start_nested_*()
>>> functions.  If we did as you suggest, and then added (say)
>>> caps.nested_virt_needs_hap, then we'd either need to add a second
>>> function, or refactor it to look like this.
>>
>> Right, I can see this as an argument when taking, as you do, speculation
>> on future needs into account. Albeit I'm then inclined to say that even
>> such modifications may better be done through accessor helpers.
> 
> Why access it through accessor helpers?
> 
> I consider that it's the SVM and VMX "construction/setup" code
> respectively which "own" hvm_funcs (as evidenced by the fact that they
> create the structures and then return them); and I consider the
> start_nested_{vmx/svm} to be a part of the same code; so I consider it
> valid for them to have direct access to the structure.

That's one way of looking at it, yes. However, to me neither SVM nor VMX
code directly fiddle with hvm_funcs in the way you describe it. Instead
they return a pointer to their respective instances of struct
hvm_function_table. If the nested startup code would similarly alter
private struct instances of some sort, all would be fine.

May I suggest that you grep for hvm_funcs in what is there right now. You'll
find solely vmcs.c having a few uses of .caps, which likely are wrong (as in:
layering violations), too.

That said I can accept that the situation is slightly different here, when
generic code passes a pointer to the nested startup functions. To me at
least it still feels layering-violation-ish.

Jan
Jan Beulich April 3, 2024, 5:52 a.m. UTC | #6
On 28.03.2024 11:57, George Dunlap wrote:
> On Thu, Mar 28, 2024 at 6:44 AM Jan Beulich <jbeulich@suse.com> wrote:
>>> As to why to have each vendor independent code check for HAP -- there
>>> are in fact two implementations of the code; it's nice to be able to
>>> look in one place for each implementation to determine the
>>> requirements.  Additionally, it would be possible, in the future, for
>>> one of the nested virt implementations to enable shadow mode, while
>>> the other one didn't.  The current arrangement makes that easy.
>>
>> Well, first - how likely is this, when instead we've been considering
>> whether we could get rid of shadow mode? And then this is balancing
>> between ease of future changes vs avoiding redundancy where it can be
>> avoided. I'm not going to insist on centralizing the HAP check, but I
>> certainly wanted the option to be considered.
> 
> I considered it before replying to you; so consider it considered. :-)
> 
>>>>> --- a/xen/arch/x86/hvm/nestedhvm.c
>>>>> +++ b/xen/arch/x86/hvm/nestedhvm.c
>>>>> @@ -150,6 +150,16 @@ static int __init cf_check nestedhvm_setup(void)
>>>>>      __clear_bit(0x80, shadow_io_bitmap[0]);
>>>>>      __clear_bit(0xed, shadow_io_bitmap[1]);
>>>>>
>>>>> +    /*
>>>>> +     * NB this must be called after all command-line processing has been
>>>>> +     * done, so that if (for example) HAP is disabled, nested virt is
>>>>> +     * disabled as well.
>>>>> +     */
>>>>> +    if ( cpu_has_vmx )
>>>>> +        start_nested_vmx(&hvm_funcs);
>>>>> +    else if ( cpu_has_svm )
>>>>> +        start_nested_svm(&hvm_funcs);
>>>>
>>>> Instead of passing the pointer, couldn't you have the functions return
>>>> bool, and then set hvm_funcs.caps.nested_virt from that? Passing that
>>>> pointer looks somewhat odd to me.
>>>
>>> For one, it makes start_nested_XXX symmetric to start_XXX, which also
>>> has access to the full hvm_funcs structure (albeit in the former case
>>> because it's creating the structure).
>>
>> This last aspect is the crucial aspect: start{svm,vmx}() are indeed quite
>> special because of this. Everywhere else we have accessor helpers when
>> hvm_funcs needs consulting, e.g. ...
>>
>>>  For two, both of them need to read caps.hap.
>>
>> ... caps _reads_ would want using (an) accessor(s), too.
>>
>>>  For three, it's just more flexible -- there may be
>>> future things that we want to modify in the start_nested_*()
>>> functions.  If we did as you suggest, and then added (say)
>>> caps.nested_virt_needs_hap, then we'd either need to add a second
>>> function, or refactor it to look like this.
>>
>> Right, I can see this as an argument when taking, as you do, speculation
>> on future needs into account. Albeit I'm then inclined to say that even
>> such modifications may better be done through accessor helpers.
> 
> Why access it through accessor helpers?
> 
> I consider that it's the SVM and VMX "construction/setup" code
> respectively which "own" hvm_funcs (as evidenced by the fact that they
> create the structures and then return them); and I consider the
> start_nested_{vmx/svm} to be a part of the same code; so I consider it
> valid for them to have direct access to the structure.
> 
>>>> Is there a reason to use direct calls here rather than a true hook
>>>> (seeing that hvm_funcs must have been populated by the time we make it
>>>> here)? I understand we're (remotely) considering to switch away from
>>>> using hooks at some point, but right now this feels somewhat
>>>> inconsistent if not further justified.
>>>
>>> Again it mimics the calls to start_vmx/svm in hvm_enable.  But I could
>>> look at adding a function pointer to see if it works.
>>
>> Andrew - do you have any input here towards those somewhat vague plans
>> of getting rid of the hook functions? I guess they're more relevant in
>> places where we can't easily use the altcall machinery (i.e. not here).
> 
> Rather than do all that work collecting information, why don't we just
> check it in as it is?  Obviously if we're thinking about getting rid
> of hook functions at some point anyway, it can't be all that bad.
> There is an aesthetic justification for the lack of hook, in that it's
> similar to the start_vmx/svm() functions.
> 
> So far I really don't think the remaining "open" changes we're
> discussing are worth either of our efforts.  I don't plan to make any
> of these changes unless another x86 maintainer seconds your request
> (or you can convince me they're worth making).
> 
> (The other two changes -- correcting the function name in the commit
> message, and removing the extra blank line -- I've already changed
> locally, so could check in with your ack.)

After some mumbling to myself over the holidays
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan

Patch

diff --git a/docs/designs/nested-svm-cpu-features.md b/docs/designs/nested-svm-cpu-features.md
new file mode 100644
index 0000000000..837a96df05
--- /dev/null
+++ b/docs/designs/nested-svm-cpu-features.md
@@ -0,0 +1,111 @@ 
+# Nested SVM (AMD) CPUID requirements
+
+The first step in making nested SVM production-ready is to make sure
+that all features are implemented and well-tested.  To make this
+tractable, we will initially be limiting the "supported" range of
+nested virt to a specific subset of host and guest features.  This
+document describes the criteria for deciding on features, and the
+rationale behind each feature.
+
+For AMD, all virtualization-related features can be found in CPUID
+leaf 8000000A:edx.
+
+# Criteria
+
+- Processor support: At a minimum we want to support processors from
+  the last 5 years.  All things being equal, we'd prefer to cover
+  older processors than not.  Bits 0:7 were available in the very
+  earliest processors; and even through bit 15 we should be pretty
+  good support-wise.
+
+- Faithfulness to hardware: We need the behavior of the "virtual cpu"
+  from the L1 hypervisor's perspective to be as close as possible to
+  the original hardware.  In particular, the behavior of the hardware
+  on error paths 1) is not easy to understand or test, 2) can be the
+  source of surprising vulnerabilities.  (See XSA-7 for an example of a
+  case where subtle error-handling differences can open up a privilege
+  escalation.)  We should avoid emulating any bit of the hardware with
+  complex error paths if we can at all help it.
+
+- Cost of implementation: We want to minimize the cost of
+  implementation (where this includes bringing an existing sub-par
+  implementation up to speed).  All things being equal, we'll favor a
+  configuration which does not require any new implementation.
+
+- Performance: All things being equal, we'd prefer to choose a set of
+  L0 / L1 CPUID bits that perform faster rather than slower.
+
+
+# Bits
+
+- 0 `NP` *Nested Paging*: Required both for L0 and L1.
+
+  Based primarily on faithfulness and performance, as well as
+  potential cost of implementation.  Available on earliest hardware,
+  so no compatibility issues.
+
+- 1 `LbrVirt` *LBR / debugging virtualization*: Require for L0 and L1.
+
+  For L0 this is required for performance: There's no way to tell the
+  guests not to use the LBR-related registers; and if the guest does,
+  then you have to save and restore all LBR-related registers on
+  context switch, which is prohibitive.  Furthermore, the additional
+  emulation risks a security-relevant difference coming up.
+
+  Providing it to L1 when we have it in L0 is basically free, and
+  already implemented.
+
+  Just require it and provide it.
+
+- 2 `SVML` *SVM Lock*: Not required for L0, not provided to L1
+
+  Seems to be about enabling an operating system to prevent "blue
+  pill" attacks against itself.
+
+  Xen doesn't use it, nor provide it; so it would need to be
+  implemented.  The best way to protect a guest OS is to leave nested
+  virt disabled in the tools.
+
+- 3 `NRIPS` *NRIP Save*: Require for both L0 and L1
+
+  If NRIPS is not present, the software interrupt injection
+  functionality can't be used; and Xen has to emulate it.  That's
+  another source of potential security issues.  If hardware supports
+  it, then providing it to guest is basically free.
+
+- 4 `TscRateMsr`: Not required by L0, not provided to L1
+
+  The main putative use for this would be trying to maintain an
+  invariant TSC across cores with different clock speeds, or after a
+  migrate.  Unlike others, this doesn't have an error path to worry
+  about compatibility-wise; and according to tests done when nestedSVM
+  was first implemented, it's actually faster to emliate TscRateMSR in
+  the L0 hypervisor than for L1 to attempt to emulate it itself.
+
+  However, using this properly in L0 will take some implementation
+  effort; and composing it properly with L1 will take even more
+  effort.  Just leave it off for now.
+
+- 5 `VmcbClean` *VMCB Clean Bits*: Not required by L0, provide to L1
+
+  This is a pure optimization, both on the side of the L0 and L1.  The
+  implementation for L1 is entirely Xen-side, so can be provided even
+  on hardware that doesn't provide it.  And it's purely an
+  optimization, so could be "implemented" by ignoring the bits
+  entirely.
+
+  As such, we don't need to require it for L0; and as it's already
+  implemented, no reason not to provide it to L1.  Before this feature
+  was available those bits were marked SBZ ("should be zero"); setting
+  them was already advertised to cause unpredictable behavior.
+
+- 6 `FlushByAsid`: Require for L0, provide to L1
+
+  This is cheap and easy to use for L0 and to provide to the L1;
+  there's no reason not to just pass it through.
+
+- 7 `DecodeAssists`: Require for L0, provide to L1
+
+  Using it in L0 reduces the chance that we'll make some sort of error
+  in the decode path.  And if hardware supports it, it's easy enough
+  to provide to the L1.
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index bda853e3c9..a25f498265 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -673,6 +673,12 @@  int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
          */
         config->flags |= XEN_DOMCTL_CDF_oos_off;
 
+    if ( nested_virt && !hvm_nested_virt_supported() )
+    {
+        dprintk(XENLOG_INFO, "Nested virt requested but not available\n");
+        return -EINVAL;        
+    }
+
     if ( nested_virt && !hap )
     {
         dprintk(XENLOG_INFO, "Nested virt not supported without HAP\n");
diff --git a/xen/arch/x86/hvm/nestedhvm.c b/xen/arch/x86/hvm/nestedhvm.c
index 12bf7172b8..451c4da6d4 100644
--- a/xen/arch/x86/hvm/nestedhvm.c
+++ b/xen/arch/x86/hvm/nestedhvm.c
@@ -150,6 +150,16 @@  static int __init cf_check nestedhvm_setup(void)
     __clear_bit(0x80, shadow_io_bitmap[0]);
     __clear_bit(0xed, shadow_io_bitmap[1]);
 
+    /* 
+     * NB this must be called after all command-line processing has been
+     * done, so that if (for example) HAP is disabled, nested virt is
+     * disabled as well.
+     */
+    if ( cpu_has_vmx )
+        start_nested_vmx(&hvm_funcs);
+    else if ( cpu_has_svm )
+        start_nested_svm(&hvm_funcs);
+
     return 0;
 }
 __initcall(nestedhvm_setup);
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index a5319ab729..ad2e9f5c35 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -1666,3 +1666,17 @@  void svm_nested_features_on_efer_update(struct vcpu *v)
         }
     }
 }
+
+void __init start_nested_svm(struct hvm_function_table *hvm_function_table)
+{
+    /* 
+     * Required host functionality to support nested virt.  See
+     * docs/designs/nested-svm-cpu-features.md for rationale.
+     */
+    hvm_function_table->caps.nested_virt =
+        hvm_function_table->caps.hap && 
+        cpu_has_svm_lbrv &&
+        cpu_has_svm_nrips &&
+        cpu_has_svm_flushbyasid &&
+        cpu_has_svm_decode;
+}
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index ece0aa243a..ed058d9d2b 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -2816,6 +2816,14 @@  void nvmx_set_cr_read_shadow(struct vcpu *v, unsigned int cr)
     __vmwrite(read_shadow_field, v->arch.hvm.nvcpu.guest_cr[cr]);
 }
 
+void __init start_nested_vmx(struct hvm_function_table *hvm_function_table)
+{
+    /* TODO: Require hardware support before enabling nested virt */
+    hvm_function_table->caps.nested_virt =
+        hvm_function_table->caps.hap;
+}
+
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h
index 87a6935d97..e6f937fed7 100644
--- a/xen/arch/x86/include/asm/hvm/hvm.h
+++ b/xen/arch/x86/include/asm/hvm/hvm.h
@@ -97,7 +97,10 @@  struct hvm_function_table {
              singlestep:1,
             
              /* Hardware virtual interrupt delivery enable? */
-             virtual_intr_delivery:1;
+             virtual_intr_delivery:1,
+
+             /* Nested virt capabilities */
+             nested_virt:1;
     } caps;
 
     /*
@@ -654,6 +657,12 @@  static inline bool hvm_altp2m_supported(void)
     return hvm_funcs.caps.altp2m;
 }
 
+/* Returns true if we have the minimum hardware requirements for nested virt */
+static inline bool hvm_nested_virt_supported(void)
+{
+    return hvm_funcs.caps.nested_virt;
+}
+
 /* updates the current hardware p2m */
 static inline void altp2m_vcpu_update_p2m(struct vcpu *v)
 {
@@ -797,6 +806,11 @@  static inline bool hvm_hap_supported(void)
     return false;
 }
 
+static inline bool hvm_nested_virt_supported(void)
+{
+    return false;
+}
+
 static inline bool nhvm_vmcx_hap_enabled(const struct vcpu *v)
 {
     ASSERT_UNREACHABLE();
diff --git a/xen/arch/x86/include/asm/hvm/nestedhvm.h b/xen/arch/x86/include/asm/hvm/nestedhvm.h
index 56a2019e1b..0568acb25f 100644
--- a/xen/arch/x86/include/asm/hvm/nestedhvm.h
+++ b/xen/arch/x86/include/asm/hvm/nestedhvm.h
@@ -82,4 +82,8 @@  static inline bool vvmcx_valid(const struct vcpu *v)
     return vcpu_nestedhvm(v).nv_vvmcxaddr != INVALID_PADDR;
 }
 
+
+void start_nested_svm(struct hvm_function_table *);
+void start_nested_vmx(struct hvm_function_table *);
+
 #endif /* _HVM_NESTEDHVM_H */