diff mbox series

[7/9] x86/hvm: Disable MPX by default

Message ID 20200615141532.1927-8-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Headers show
Series XSA-320 follow for IvyBridge | expand

Commit Message

Andrew Cooper June 15, 2020, 2:15 p.m. UTC
Memory Protection eXtension support has been dropped from GCC and Linux, and
will be dropped from future Intel CPUs.

With all other default/max pieces in place, move MPX from default to max.
This means that VMs won't be offered it by default, but can explicitly opt
into using it via cpuid="host,mpx=1" in their vm.cfg file.

The difference as visible to the guest is:

  diff --git a/default b/mpx
  index 0e91765d6b..c8c33cd584 100644
  --- a/default
  +++ b/mpx
  @@ -13,15 +13,17 @@ Native cpuid:
     00000004:00000004 -> 00000000:00000000:00000000:00000000
     00000005:ffffffff -> 00000000:00000000:00000000:00000000
     00000006:ffffffff -> 00000000:00000000:00000000:00000000
  -  00000007:00000000 -> 00000000:009c2fbb:00000000:9c000400
  +  00000007:00000000 -> 00000000:009c6fbb:00000000:9c000400
     00000008:ffffffff -> 00000000:00000000:00000000:00000000
     00000009:ffffffff -> 00000000:00000000:00000000:00000000
     0000000a:ffffffff -> 00000000:00000000:00000000:00000000
     0000000b:ffffffff -> 00000000:00000000:00000000:00000000
     0000000c:ffffffff -> 00000000:00000000:00000000:00000000
  -  0000000d:00000000 -> 00000007:00000240:00000340:00000000
  +  0000000d:00000000 -> 0000001f:00000240:00000440:00000000
     0000000d:00000001 -> 0000000f:00000240:00000000:00000000
     0000000d:00000002 -> 00000100:00000240:00000000:00000000
  +  0000000d:00000003 -> 00000040:000003c0:00000000:00000000
  +  0000000d:00000004 -> 00000040:00000400:00000000:00000000
     40000000:ffffffff -> 40000005:566e6558:65584d4d:4d4d566e
     40000001:ffffffff -> 0004000e:00000000:00000000:00000000
     40000002:ffffffff -> 00000001:40000000:00000000:00000000

Adjust the legacy restore path in libxc to cope safely with pre-4.14 VMs.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Ian Jackson <Ian.Jackson@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Paul Durrant <paul@xen.org>

Dropped Jan's R-by from previous posting, as the patch has gained new
toolstack logic to avoid breaking migrate.
---
 tools/libxc/xc_cpuid_x86.c                  | 48 ++++++++++++++++++-----------
 xen/include/public/arch-x86/cpufeatureset.h |  2 +-
 2 files changed, 31 insertions(+), 19 deletions(-)

Comments

Jan Beulich June 16, 2020, 9:33 a.m. UTC | #1
On 15.06.2020 16:15, Andrew Cooper wrote:
> @@ -479,6 +497,18 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
>          goto out;
>      }
>  
> +    /*
> +     * Account for feature which have been disabled by default since Xen 4.13,
> +     * so migrated-in VM's don't risk seeing features disappearing.
> +     */
> +    if ( restore )
> +    {
> +        if ( di.hvm )
> +        {
> +            p->feat.mpx = test_bit(X86_FEATURE_MPX, host_featureset);

Why do you derive this from the host featureset instead of the max
one for the guest type? Also, while you modify p here, ...

> +        }
> +    }
> +
>      if ( featureset )
>      {
>          uint32_t disabled_features[FEATURESET_NR_ENTRIES],

... the code in this if()'s body ignores p altogether. I realize the
only caller of the function passes NULL for "featureset", but I'd
like to understand the rationale here anyway before giving an R-b.

Jan
Andrew Cooper June 16, 2020, 4:15 p.m. UTC | #2
On 16/06/2020 10:33, Jan Beulich wrote:
> On 15.06.2020 16:15, Andrew Cooper wrote:
>> @@ -479,6 +497,18 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
>>          goto out;
>>      }
>>  
>> +    /*
>> +     * Account for feature which have been disabled by default since Xen 4.13,
>> +     * so migrated-in VM's don't risk seeing features disappearing.
>> +     */
>> +    if ( restore )
>> +    {
>> +        if ( di.hvm )
>> +        {
>> +            p->feat.mpx = test_bit(X86_FEATURE_MPX, host_featureset);
> Why do you derive this from the host featureset instead of the max
> one for the guest type?

Because that is how the logic worked for 4.13.

Also, because we don't have easy access to the actual guest max
featureset at this point.  I could add two new sysctl subops to
get_featureset, but the reason for not doing so before are still
applicable now.

There is a theoretical case where host MPX is visible but guest max is
hidden, and that is down to the vmentry controls.  As this doesn't exist
in real hardware, I'm not terribly concerned about it.

>  Also, while you modify p here, ...
>
>> +        }
>> +    }
>> +
>>      if ( featureset )
>>      {
>>          uint32_t disabled_features[FEATURESET_NR_ENTRIES],
> ... the code in this if()'s body ignores p altogether.

That is correct.

> I realize the
> only caller of the function passes NULL for "featureset", but I'd
> like to understand the rationale here anyway before giving an R-b.

The meaning of 'featureset' is "here are the exact bits I want you to use".

It has existed since my original CPUID work in 4.6/4.7.  Currently, the
only user in XenServer, and its a stopgap on the way to the "proper"
solution which I've been working on for the past several years.

~Andrew
Jan Beulich June 17, 2020, 10:32 a.m. UTC | #3
On 16.06.2020 18:15, Andrew Cooper wrote:
> On 16/06/2020 10:33, Jan Beulich wrote:
>> On 15.06.2020 16:15, Andrew Cooper wrote:
>>> @@ -479,6 +497,18 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
>>>          goto out;
>>>      }
>>>  
>>> +    /*
>>> +     * Account for feature which have been disabled by default since Xen 4.13,
>>> +     * so migrated-in VM's don't risk seeing features disappearing.
>>> +     */
>>> +    if ( restore )
>>> +    {
>>> +        if ( di.hvm )
>>> +        {
>>> +            p->feat.mpx = test_bit(X86_FEATURE_MPX, host_featureset);
>> Why do you derive this from the host featureset instead of the max
>> one for the guest type?
> 
> Because that is how the logic worked for 4.13.
> 
> Also, because we don't have easy access to the actual guest max
> featureset at this point.  I could add two new sysctl subops to
> get_featureset, but the reason for not doing so before are still
> applicable now.
> 
> There is a theoretical case where host MPX is visible but guest max is
> hidden, and that is down to the vmentry controls.  As this doesn't exist
> in real hardware, I'm not terribly concerned about it.

I'd also see us allow features to be kept for the host, but masked
off of the/some guest feature sets, by way of a to-be-introduced
command line option.

I take your reply to mean that you agree that conceptually it
ought to be max which gets used here, but there's no practical
difference at this point.

>>  Also, while you modify p here, ...
>>
>>> +        }
>>> +    }
>>> +
>>>      if ( featureset )
>>>      {
>>>          uint32_t disabled_features[FEATURESET_NR_ENTRIES],
>> ... the code in this if()'s body ignores p altogether.
> 
> That is correct.
> 
>> I realize the
>> only caller of the function passes NULL for "featureset", but I'd
>> like to understand the rationale here anyway before giving an R-b.
> 
> The meaning of 'featureset' is "here are the exact bits I want you to use".

With validation to happen only in the hypervisor then, I suppose?

If for both parts my understanding is correct:
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan
Andrew Cooper June 17, 2020, 11:16 a.m. UTC | #4
On 17/06/2020 11:32, Jan Beulich wrote:
> On 16.06.2020 18:15, Andrew Cooper wrote:
>> On 16/06/2020 10:33, Jan Beulich wrote:
>>> On 15.06.2020 16:15, Andrew Cooper wrote:
>>>> @@ -479,6 +497,18 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
>>>>          goto out;
>>>>      }
>>>>  
>>>> +    /*
>>>> +     * Account for feature which have been disabled by default since Xen 4.13,
>>>> +     * so migrated-in VM's don't risk seeing features disappearing.
>>>> +     */
>>>> +    if ( restore )
>>>> +    {
>>>> +        if ( di.hvm )
>>>> +        {
>>>> +            p->feat.mpx = test_bit(X86_FEATURE_MPX, host_featureset);
>>> Why do you derive this from the host featureset instead of the max
>>> one for the guest type?
>> Because that is how the logic worked for 4.13.
>>
>> Also, because we don't have easy access to the actual guest max
>> featureset at this point.  I could add two new sysctl subops to
>> get_featureset, but the reason for not doing so before are still
>> applicable now.
>>
>> There is a theoretical case where host MPX is visible but guest max is
>> hidden, and that is down to the vmentry controls.  As this doesn't exist
>> in real hardware, I'm not terribly concerned about it.
> I'd also see us allow features to be kept for the host, but masked
> off of the/some guest feature sets, by way of a to-be-introduced
> command line option.

What kind of usecase do you have in mind for this?  We've got a load of
features which are blanket disabled for guests.  I suppose `ler` et al
ought to have an impact, except for the fact that LBR at that level
isn't architectural and always expected.

> I take your reply to mean that you agree that conceptually it
> ought to be max which gets used here, but there's no practical
> difference at this point.

If max were trivially available, I'd use use that, but its not, and host
is equivalent in the cases we care about.

>
>>>  Also, while you modify p here, ...
>>>
>>>> +        }
>>>> +    }
>>>> +
>>>>      if ( featureset )
>>>>      {
>>>>          uint32_t disabled_features[FEATURESET_NR_ENTRIES],
>>> ... the code in this if()'s body ignores p altogether.
>> That is correct.
>>
>>> I realize the
>>> only caller of the function passes NULL for "featureset", but I'd
>>> like to understand the rationale here anyway before giving an R-b.
>> The meaning of 'featureset' is "here are the exact bits I want you to use".
> With validation to happen only in the hypervisor then, I suppose?

Almost none of this logic has any validation in the toolstack side (that
which does, was added by me fairly recently).  Its going to be one of
several large changes in the new world.

> If for both parts my understanding is correct:
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks.

~Andrew
Jan Beulich June 17, 2020, 11:24 a.m. UTC | #5
On 17.06.2020 13:16, Andrew Cooper wrote:
> On 17/06/2020 11:32, Jan Beulich wrote:
>> On 16.06.2020 18:15, Andrew Cooper wrote:
>>> On 16/06/2020 10:33, Jan Beulich wrote:
>>>> On 15.06.2020 16:15, Andrew Cooper wrote:
>>>>> @@ -479,6 +497,18 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
>>>>>          goto out;
>>>>>      }
>>>>>  
>>>>> +    /*
>>>>> +     * Account for feature which have been disabled by default since Xen 4.13,
>>>>> +     * so migrated-in VM's don't risk seeing features disappearing.
>>>>> +     */
>>>>> +    if ( restore )
>>>>> +    {
>>>>> +        if ( di.hvm )
>>>>> +        {
>>>>> +            p->feat.mpx = test_bit(X86_FEATURE_MPX, host_featureset);
>>>> Why do you derive this from the host featureset instead of the max
>>>> one for the guest type?
>>> Because that is how the logic worked for 4.13.
>>>
>>> Also, because we don't have easy access to the actual guest max
>>> featureset at this point.  I could add two new sysctl subops to
>>> get_featureset, but the reason for not doing so before are still
>>> applicable now.
>>>
>>> There is a theoretical case where host MPX is visible but guest max is
>>> hidden, and that is down to the vmentry controls.  As this doesn't exist
>>> in real hardware, I'm not terribly concerned about it.
>> I'd also see us allow features to be kept for the host, but masked
>> off of the/some guest feature sets, by way of a to-be-introduced
>> command line option.
> 
> What kind of usecase do you have in mind for this?  We've got a load of
> features which are blanket disabled for guests.  I suppose `ler` et al
> ought to have an impact, except for the fact that LBR at that level
> isn't architectural and always expected.

What I was thinking of was the kind of "none of my guests should use
AVX512 - let me disable it globally, rather than individually in
each guest's config" approach. Of course AVX512 is something we use
in Xen only to emulate guest insns, but I think the example still
serves the purpose.

Jan
Andrew Cooper June 17, 2020, 11:28 a.m. UTC | #6
On 17/06/2020 12:24, Jan Beulich wrote:
> On 17.06.2020 13:16, Andrew Cooper wrote:
>> On 17/06/2020 11:32, Jan Beulich wrote:
>>> On 16.06.2020 18:15, Andrew Cooper wrote:
>>>> On 16/06/2020 10:33, Jan Beulich wrote:
>>>>> On 15.06.2020 16:15, Andrew Cooper wrote:
>>>>>> @@ -479,6 +497,18 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
>>>>>>          goto out;
>>>>>>      }
>>>>>>  
>>>>>> +    /*
>>>>>> +     * Account for feature which have been disabled by default since Xen 4.13,
>>>>>> +     * so migrated-in VM's don't risk seeing features disappearing.
>>>>>> +     */
>>>>>> +    if ( restore )
>>>>>> +    {
>>>>>> +        if ( di.hvm )
>>>>>> +        {
>>>>>> +            p->feat.mpx = test_bit(X86_FEATURE_MPX, host_featureset);
>>>>> Why do you derive this from the host featureset instead of the max
>>>>> one for the guest type?
>>>> Because that is how the logic worked for 4.13.
>>>>
>>>> Also, because we don't have easy access to the actual guest max
>>>> featureset at this point.  I could add two new sysctl subops to
>>>> get_featureset, but the reason for not doing so before are still
>>>> applicable now.
>>>>
>>>> There is a theoretical case where host MPX is visible but guest max is
>>>> hidden, and that is down to the vmentry controls.  As this doesn't exist
>>>> in real hardware, I'm not terribly concerned about it.
>>> I'd also see us allow features to be kept for the host, but masked
>>> off of the/some guest feature sets, by way of a to-be-introduced
>>> command line option.
>> What kind of usecase do you have in mind for this?  We've got a load of
>> features which are blanket disabled for guests.  I suppose `ler` et al
>> ought to have an impact, except for the fact that LBR at that level
>> isn't architectural and always expected.
> What I was thinking of was the kind of "none of my guests should use
> AVX512 - let me disable it globally, rather than individually in
> each guest's config" approach. Of course AVX512 is something we use
> in Xen only to emulate guest insns, but I think the example still
> serves the purpose.

We actually have AVX512 disabled by default in XenServer.  The perf
implications of letting 1 guest play with it is very severe.

Now I think about it, I'm tempted to recommend it moves out of default
generally.

~Andrew
Jan Beulich June 17, 2020, 11:41 a.m. UTC | #7
On 17.06.2020 13:28, Andrew Cooper wrote:
> We actually have AVX512 disabled by default in XenServer.  The perf
> implications of letting 1 guest play with it is very severe.
> 
> Now I think about it, I'm tempted to recommend it moves out of default
> generally.

Hmm, I'm tempted to ask whether you're kidding. This is the kind of
feature that I see no reason at all to move out of default. Imo we
shouldn't put in place policy like this - if anything shouldn't be
on by default, it should imo be because of limitations in our
handling (I've recently revived my UMIP emulation patch, which
comes to mind here) or because of uncertainty on some aspects (like
is the case for MOVDIR / ENQCMD for example). Anything else should
be left to the admins to decide.

Jan
Andrew Cooper June 17, 2020, 11:47 a.m. UTC | #8
On 17/06/2020 12:41, Jan Beulich wrote:
> On 17.06.2020 13:28, Andrew Cooper wrote:
>> We actually have AVX512 disabled by default in XenServer.  The perf
>> implications of letting 1 guest play with it is very severe.
>>
>> Now I think about it, I'm tempted to recommend it moves out of default
>> generally.
> Hmm, I'm tempted to ask whether you're kidding.

I'm very definitely not.

AVX512 is a disaster, perf wise on Skylake/CascadeLake, and its very
easy to cripple the entire system, including the other guest.

So much so that "better AVX512 frequency transitions" is a headline
feature in IceLake.

>  This is the kind of
> feature that I see no reason at all to move out of default. Imo we
> shouldn't put in place policy like this - if anything shouldn't be
> on by default, it should imo be because of limitations in our
> handling (I've recently revived my UMIP emulation patch, which
> comes to mind here) or because of uncertainty on some aspects (like
> is the case for MOVDIR / ENQCMD for example). Anything else should
> be left to the admins to decide.

"left to the admins to decide" does not mean "on by default".

"default" needs to be a sensible set, which migrates safely, and can't
be used to trivially DoS the rest of the system.  An admin can always
opt into allowing this DoS, but shouldn't have it by default.

~Andrew
diff mbox series

Patch

diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index e017abffce..5649913e69 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -436,6 +436,8 @@  int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
     xen_cpuid_leaf_t *leaves = NULL;
     struct cpuid_policy *p = NULL;
     uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
+    uint32_t host_featureset[FEATURESET_NR_ENTRIES] = {};
+    uint32_t len = ARRAY_SIZE(host_featureset);
 
     if ( xc_domain_getinfo(xch, domid, 1, &di) != 1 ||
          di.domid != domid )
@@ -458,6 +460,22 @@  int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
          (p = calloc(1, sizeof(*p))) == NULL )
         goto out;
 
+    /* Get the host policy. */
+    rc = xc_get_cpu_featureset(xch, XEN_SYSCTL_cpu_featureset_host,
+                               &len, host_featureset);
+    if ( rc )
+    {
+        /* Tolerate "buffer too small", as we've got the bits we need. */
+        if ( errno == ENOBUFS )
+            rc = 0;
+        else
+        {
+            PERROR("Failed to obtain host featureset");
+            rc = -errno;
+            goto out;
+        }
+    }
+
     /* Get the domain's default policy. */
     nr_msrs = 0;
     rc = xc_get_system_cpu_policy(xch, di.hvm ? XEN_SYSCTL_cpu_policy_hvm_default
@@ -479,6 +497,18 @@  int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
         goto out;
     }
 
+    /*
+     * Account for feature which have been disabled by default since Xen 4.13,
+     * so migrated-in VM's don't risk seeing features disappearing.
+     */
+    if ( restore )
+    {
+        if ( di.hvm )
+        {
+            p->feat.mpx = test_bit(X86_FEATURE_MPX, host_featureset);
+        }
+    }
+
     if ( featureset )
     {
         uint32_t disabled_features[FEATURESET_NR_ENTRIES],
@@ -530,24 +560,6 @@  int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
 
     if ( !di.hvm )
     {
-        uint32_t host_featureset[FEATURESET_NR_ENTRIES] = {};
-        uint32_t len = ARRAY_SIZE(host_featureset);
-
-        rc = xc_get_cpu_featureset(xch, XEN_SYSCTL_cpu_featureset_host,
-                                   &len, host_featureset);
-        if ( rc )
-        {
-            /* Tolerate "buffer too small", as we've got the bits we need. */
-            if ( errno == ENOBUFS )
-                rc = 0;
-            else
-            {
-                PERROR("Failed to obtain host featureset");
-                rc = -errno;
-                goto out;
-            }
-        }
-
         /*
          * On hardware without CPUID Faulting, PV guests see real topology.
          * As a consequence, they also need to see the host htt/cmp fields.
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 5ca35d9d97..af1b8a96a6 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -207,7 +207,7 @@  XEN_CPUFEATURE(INVPCID,       5*32+10) /*H  Invalidate Process Context ID */
 XEN_CPUFEATURE(RTM,           5*32+11) /*A  Restricted Transactional Memory */
 XEN_CPUFEATURE(PQM,           5*32+12) /*   Platform QoS Monitoring */
 XEN_CPUFEATURE(NO_FPU_SEL,    5*32+13) /*!  FPU CS/DS stored as zero */
-XEN_CPUFEATURE(MPX,           5*32+14) /*S  Memory Protection Extensions */
+XEN_CPUFEATURE(MPX,           5*32+14) /*s  Memory Protection Extensions */
 XEN_CPUFEATURE(PQE,           5*32+15) /*   Platform QoS Enforcement */
 XEN_CPUFEATURE(AVX512F,       5*32+16) /*A  AVX-512 Foundation Instructions */
 XEN_CPUFEATURE(AVX512DQ,      5*32+17) /*A  AVX-512 Doubleword & Quadword Instrs */