Message ID | 20200615141532.1927-8-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | XSA-320 follow for IvyBridge | expand |
On 15.06.2020 16:15, Andrew Cooper wrote: > @@ -479,6 +497,18 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore, > goto out; > } > > + /* > + * Account for feature which have been disabled by default since Xen 4.13, > + * so migrated-in VM's don't risk seeing features disappearing. > + */ > + if ( restore ) > + { > + if ( di.hvm ) > + { > + p->feat.mpx = test_bit(X86_FEATURE_MPX, host_featureset); Why do you derive this from the host featureset instead of the max one for the guest type? Also, while you modify p here, ... > + } > + } > + > if ( featureset ) > { > uint32_t disabled_features[FEATURESET_NR_ENTRIES], ... the code in this if()'s body ignores p altogether. I realize the only caller of the function passes NULL for "featureset", but I'd like to understand the rationale here anyway before giving an R-b. Jan
On 16/06/2020 10:33, Jan Beulich wrote: > On 15.06.2020 16:15, Andrew Cooper wrote: >> @@ -479,6 +497,18 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore, >> goto out; >> } >> >> + /* >> + * Account for feature which have been disabled by default since Xen 4.13, >> + * so migrated-in VM's don't risk seeing features disappearing. >> + */ >> + if ( restore ) >> + { >> + if ( di.hvm ) >> + { >> + p->feat.mpx = test_bit(X86_FEATURE_MPX, host_featureset); > Why do you derive this from the host featureset instead of the max > one for the guest type? Because that is how the logic worked for 4.13. Also, because we don't have easy access to the actual guest max featureset at this point. I could add two new sysctl subops to get_featureset, but the reason for not doing so before are still applicable now. There is a theoretical case where host MPX is visible but guest max is hidden, and that is down to the vmentry controls. As this doesn't exist in real hardware, I'm not terribly concerned about it. > Also, while you modify p here, ... > >> + } >> + } >> + >> if ( featureset ) >> { >> uint32_t disabled_features[FEATURESET_NR_ENTRIES], > ... the code in this if()'s body ignores p altogether. That is correct. > I realize the > only caller of the function passes NULL for "featureset", but I'd > like to understand the rationale here anyway before giving an R-b. The meaning of 'featureset' is "here are the exact bits I want you to use". It has existed since my original CPUID work in 4.6/4.7. Currently, the only user in XenServer, and its a stopgap on the way to the "proper" solution which I've been working on for the past several years. ~Andrew
On 16.06.2020 18:15, Andrew Cooper wrote: > On 16/06/2020 10:33, Jan Beulich wrote: >> On 15.06.2020 16:15, Andrew Cooper wrote: >>> @@ -479,6 +497,18 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore, >>> goto out; >>> } >>> >>> + /* >>> + * Account for feature which have been disabled by default since Xen 4.13, >>> + * so migrated-in VM's don't risk seeing features disappearing. >>> + */ >>> + if ( restore ) >>> + { >>> + if ( di.hvm ) >>> + { >>> + p->feat.mpx = test_bit(X86_FEATURE_MPX, host_featureset); >> Why do you derive this from the host featureset instead of the max >> one for the guest type? > > Because that is how the logic worked for 4.13. > > Also, because we don't have easy access to the actual guest max > featureset at this point. I could add two new sysctl subops to > get_featureset, but the reason for not doing so before are still > applicable now. > > There is a theoretical case where host MPX is visible but guest max is > hidden, and that is down to the vmentry controls. As this doesn't exist > in real hardware, I'm not terribly concerned about it. I'd also see us allow features to be kept for the host, but masked off of the/some guest feature sets, by way of a to-be-introduced command line option. I take your reply to mean that you agree that conceptually it ought to be max which gets used here, but there's no practical difference at this point. >> Also, while you modify p here, ... >> >>> + } >>> + } >>> + >>> if ( featureset ) >>> { >>> uint32_t disabled_features[FEATURESET_NR_ENTRIES], >> ... the code in this if()'s body ignores p altogether. > > That is correct. > >> I realize the >> only caller of the function passes NULL for "featureset", but I'd >> like to understand the rationale here anyway before giving an R-b. > > The meaning of 'featureset' is "here are the exact bits I want you to use". With validation to happen only in the hypervisor then, I suppose? If for both parts my understanding is correct: Reviewed-by: Jan Beulich <jbeulich@suse.com> Jan
On 17/06/2020 11:32, Jan Beulich wrote: > On 16.06.2020 18:15, Andrew Cooper wrote: >> On 16/06/2020 10:33, Jan Beulich wrote: >>> On 15.06.2020 16:15, Andrew Cooper wrote: >>>> @@ -479,6 +497,18 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore, >>>> goto out; >>>> } >>>> >>>> + /* >>>> + * Account for feature which have been disabled by default since Xen 4.13, >>>> + * so migrated-in VM's don't risk seeing features disappearing. >>>> + */ >>>> + if ( restore ) >>>> + { >>>> + if ( di.hvm ) >>>> + { >>>> + p->feat.mpx = test_bit(X86_FEATURE_MPX, host_featureset); >>> Why do you derive this from the host featureset instead of the max >>> one for the guest type? >> Because that is how the logic worked for 4.13. >> >> Also, because we don't have easy access to the actual guest max >> featureset at this point. I could add two new sysctl subops to >> get_featureset, but the reason for not doing so before are still >> applicable now. >> >> There is a theoretical case where host MPX is visible but guest max is >> hidden, and that is down to the vmentry controls. As this doesn't exist >> in real hardware, I'm not terribly concerned about it. > I'd also see us allow features to be kept for the host, but masked > off of the/some guest feature sets, by way of a to-be-introduced > command line option. What kind of usecase do you have in mind for this? We've got a load of features which are blanket disabled for guests. I suppose `ler` et al ought to have an impact, except for the fact that LBR at that level isn't architectural and always expected. > I take your reply to mean that you agree that conceptually it > ought to be max which gets used here, but there's no practical > difference at this point. If max were trivially available, I'd use use that, but its not, and host is equivalent in the cases we care about. > >>> Also, while you modify p here, ... >>> >>>> + } >>>> + } >>>> + >>>> if ( featureset ) >>>> { >>>> uint32_t disabled_features[FEATURESET_NR_ENTRIES], >>> ... the code in this if()'s body ignores p altogether. >> That is correct. >> >>> I realize the >>> only caller of the function passes NULL for "featureset", but I'd >>> like to understand the rationale here anyway before giving an R-b. >> The meaning of 'featureset' is "here are the exact bits I want you to use". > With validation to happen only in the hypervisor then, I suppose? Almost none of this logic has any validation in the toolstack side (that which does, was added by me fairly recently). Its going to be one of several large changes in the new world. > If for both parts my understanding is correct: > Reviewed-by: Jan Beulich <jbeulich@suse.com> Thanks. ~Andrew
On 17.06.2020 13:16, Andrew Cooper wrote: > On 17/06/2020 11:32, Jan Beulich wrote: >> On 16.06.2020 18:15, Andrew Cooper wrote: >>> On 16/06/2020 10:33, Jan Beulich wrote: >>>> On 15.06.2020 16:15, Andrew Cooper wrote: >>>>> @@ -479,6 +497,18 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore, >>>>> goto out; >>>>> } >>>>> >>>>> + /* >>>>> + * Account for feature which have been disabled by default since Xen 4.13, >>>>> + * so migrated-in VM's don't risk seeing features disappearing. >>>>> + */ >>>>> + if ( restore ) >>>>> + { >>>>> + if ( di.hvm ) >>>>> + { >>>>> + p->feat.mpx = test_bit(X86_FEATURE_MPX, host_featureset); >>>> Why do you derive this from the host featureset instead of the max >>>> one for the guest type? >>> Because that is how the logic worked for 4.13. >>> >>> Also, because we don't have easy access to the actual guest max >>> featureset at this point. I could add two new sysctl subops to >>> get_featureset, but the reason for not doing so before are still >>> applicable now. >>> >>> There is a theoretical case where host MPX is visible but guest max is >>> hidden, and that is down to the vmentry controls. As this doesn't exist >>> in real hardware, I'm not terribly concerned about it. >> I'd also see us allow features to be kept for the host, but masked >> off of the/some guest feature sets, by way of a to-be-introduced >> command line option. > > What kind of usecase do you have in mind for this? We've got a load of > features which are blanket disabled for guests. I suppose `ler` et al > ought to have an impact, except for the fact that LBR at that level > isn't architectural and always expected. What I was thinking of was the kind of "none of my guests should use AVX512 - let me disable it globally, rather than individually in each guest's config" approach. Of course AVX512 is something we use in Xen only to emulate guest insns, but I think the example still serves the purpose. Jan
On 17/06/2020 12:24, Jan Beulich wrote: > On 17.06.2020 13:16, Andrew Cooper wrote: >> On 17/06/2020 11:32, Jan Beulich wrote: >>> On 16.06.2020 18:15, Andrew Cooper wrote: >>>> On 16/06/2020 10:33, Jan Beulich wrote: >>>>> On 15.06.2020 16:15, Andrew Cooper wrote: >>>>>> @@ -479,6 +497,18 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore, >>>>>> goto out; >>>>>> } >>>>>> >>>>>> + /* >>>>>> + * Account for feature which have been disabled by default since Xen 4.13, >>>>>> + * so migrated-in VM's don't risk seeing features disappearing. >>>>>> + */ >>>>>> + if ( restore ) >>>>>> + { >>>>>> + if ( di.hvm ) >>>>>> + { >>>>>> + p->feat.mpx = test_bit(X86_FEATURE_MPX, host_featureset); >>>>> Why do you derive this from the host featureset instead of the max >>>>> one for the guest type? >>>> Because that is how the logic worked for 4.13. >>>> >>>> Also, because we don't have easy access to the actual guest max >>>> featureset at this point. I could add two new sysctl subops to >>>> get_featureset, but the reason for not doing so before are still >>>> applicable now. >>>> >>>> There is a theoretical case where host MPX is visible but guest max is >>>> hidden, and that is down to the vmentry controls. As this doesn't exist >>>> in real hardware, I'm not terribly concerned about it. >>> I'd also see us allow features to be kept for the host, but masked >>> off of the/some guest feature sets, by way of a to-be-introduced >>> command line option. >> What kind of usecase do you have in mind for this? We've got a load of >> features which are blanket disabled for guests. I suppose `ler` et al >> ought to have an impact, except for the fact that LBR at that level >> isn't architectural and always expected. > What I was thinking of was the kind of "none of my guests should use > AVX512 - let me disable it globally, rather than individually in > each guest's config" approach. Of course AVX512 is something we use > in Xen only to emulate guest insns, but I think the example still > serves the purpose. We actually have AVX512 disabled by default in XenServer. The perf implications of letting 1 guest play with it is very severe. Now I think about it, I'm tempted to recommend it moves out of default generally. ~Andrew
On 17.06.2020 13:28, Andrew Cooper wrote: > We actually have AVX512 disabled by default in XenServer. The perf > implications of letting 1 guest play with it is very severe. > > Now I think about it, I'm tempted to recommend it moves out of default > generally. Hmm, I'm tempted to ask whether you're kidding. This is the kind of feature that I see no reason at all to move out of default. Imo we shouldn't put in place policy like this - if anything shouldn't be on by default, it should imo be because of limitations in our handling (I've recently revived my UMIP emulation patch, which comes to mind here) or because of uncertainty on some aspects (like is the case for MOVDIR / ENQCMD for example). Anything else should be left to the admins to decide. Jan
On 17/06/2020 12:41, Jan Beulich wrote: > On 17.06.2020 13:28, Andrew Cooper wrote: >> We actually have AVX512 disabled by default in XenServer. The perf >> implications of letting 1 guest play with it is very severe. >> >> Now I think about it, I'm tempted to recommend it moves out of default >> generally. > Hmm, I'm tempted to ask whether you're kidding. I'm very definitely not. AVX512 is a disaster, perf wise on Skylake/CascadeLake, and its very easy to cripple the entire system, including the other guest. So much so that "better AVX512 frequency transitions" is a headline feature in IceLake. > This is the kind of > feature that I see no reason at all to move out of default. Imo we > shouldn't put in place policy like this - if anything shouldn't be > on by default, it should imo be because of limitations in our > handling (I've recently revived my UMIP emulation patch, which > comes to mind here) or because of uncertainty on some aspects (like > is the case for MOVDIR / ENQCMD for example). Anything else should > be left to the admins to decide. "left to the admins to decide" does not mean "on by default". "default" needs to be a sensible set, which migrates safely, and can't be used to trivially DoS the rest of the system. An admin can always opt into allowing this DoS, but shouldn't have it by default. ~Andrew
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c index e017abffce..5649913e69 100644 --- a/tools/libxc/xc_cpuid_x86.c +++ b/tools/libxc/xc_cpuid_x86.c @@ -436,6 +436,8 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore, xen_cpuid_leaf_t *leaves = NULL; struct cpuid_policy *p = NULL; uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1; + uint32_t host_featureset[FEATURESET_NR_ENTRIES] = {}; + uint32_t len = ARRAY_SIZE(host_featureset); if ( xc_domain_getinfo(xch, domid, 1, &di) != 1 || di.domid != domid ) @@ -458,6 +460,22 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore, (p = calloc(1, sizeof(*p))) == NULL ) goto out; + /* Get the host policy. */ + rc = xc_get_cpu_featureset(xch, XEN_SYSCTL_cpu_featureset_host, + &len, host_featureset); + if ( rc ) + { + /* Tolerate "buffer too small", as we've got the bits we need. */ + if ( errno == ENOBUFS ) + rc = 0; + else + { + PERROR("Failed to obtain host featureset"); + rc = -errno; + goto out; + } + } + /* Get the domain's default policy. */ nr_msrs = 0; rc = xc_get_system_cpu_policy(xch, di.hvm ? XEN_SYSCTL_cpu_policy_hvm_default @@ -479,6 +497,18 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore, goto out; } + /* + * Account for feature which have been disabled by default since Xen 4.13, + * so migrated-in VM's don't risk seeing features disappearing. + */ + if ( restore ) + { + if ( di.hvm ) + { + p->feat.mpx = test_bit(X86_FEATURE_MPX, host_featureset); + } + } + if ( featureset ) { uint32_t disabled_features[FEATURESET_NR_ENTRIES], @@ -530,24 +560,6 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore, if ( !di.hvm ) { - uint32_t host_featureset[FEATURESET_NR_ENTRIES] = {}; - uint32_t len = ARRAY_SIZE(host_featureset); - - rc = xc_get_cpu_featureset(xch, XEN_SYSCTL_cpu_featureset_host, - &len, host_featureset); - if ( rc ) - { - /* Tolerate "buffer too small", as we've got the bits we need. */ - if ( errno == ENOBUFS ) - rc = 0; - else - { - PERROR("Failed to obtain host featureset"); - rc = -errno; - goto out; - } - } - /* * On hardware without CPUID Faulting, PV guests see real topology. * As a consequence, they also need to see the host htt/cmp fields. diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h index 5ca35d9d97..af1b8a96a6 100644 --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -207,7 +207,7 @@ XEN_CPUFEATURE(INVPCID, 5*32+10) /*H Invalidate Process Context ID */ XEN_CPUFEATURE(RTM, 5*32+11) /*A Restricted Transactional Memory */ XEN_CPUFEATURE(PQM, 5*32+12) /* Platform QoS Monitoring */ XEN_CPUFEATURE(NO_FPU_SEL, 5*32+13) /*! FPU CS/DS stored as zero */ -XEN_CPUFEATURE(MPX, 5*32+14) /*S Memory Protection Extensions */ +XEN_CPUFEATURE(MPX, 5*32+14) /*s Memory Protection Extensions */ XEN_CPUFEATURE(PQE, 5*32+15) /* Platform QoS Enforcement */ XEN_CPUFEATURE(AVX512F, 5*32+16) /*A AVX-512 Foundation Instructions */ XEN_CPUFEATURE(AVX512DQ, 5*32+17) /*A AVX-512 Doubleword & Quadword Instrs */
Memory Protection eXtension support has been dropped from GCC and Linux, and will be dropped from future Intel CPUs. With all other default/max pieces in place, move MPX from default to max. This means that VMs won't be offered it by default, but can explicitly opt into using it via cpuid="host,mpx=1" in their vm.cfg file. The difference as visible to the guest is: diff --git a/default b/mpx index 0e91765d6b..c8c33cd584 100644 --- a/default +++ b/mpx @@ -13,15 +13,17 @@ Native cpuid: 00000004:00000004 -> 00000000:00000000:00000000:00000000 00000005:ffffffff -> 00000000:00000000:00000000:00000000 00000006:ffffffff -> 00000000:00000000:00000000:00000000 - 00000007:00000000 -> 00000000:009c2fbb:00000000:9c000400 + 00000007:00000000 -> 00000000:009c6fbb:00000000:9c000400 00000008:ffffffff -> 00000000:00000000:00000000:00000000 00000009:ffffffff -> 00000000:00000000:00000000:00000000 0000000a:ffffffff -> 00000000:00000000:00000000:00000000 0000000b:ffffffff -> 00000000:00000000:00000000:00000000 0000000c:ffffffff -> 00000000:00000000:00000000:00000000 - 0000000d:00000000 -> 00000007:00000240:00000340:00000000 + 0000000d:00000000 -> 0000001f:00000240:00000440:00000000 0000000d:00000001 -> 0000000f:00000240:00000000:00000000 0000000d:00000002 -> 00000100:00000240:00000000:00000000 + 0000000d:00000003 -> 00000040:000003c0:00000000:00000000 + 0000000d:00000004 -> 00000040:00000400:00000000:00000000 40000000:ffffffff -> 40000005:566e6558:65584d4d:4d4d566e 40000001:ffffffff -> 0004000e:00000000:00000000:00000000 40000002:ffffffff -> 00000001:40000000:00000000:00000000 Adjust the legacy restore path in libxc to cope safely with pre-4.14 VMs. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Jan Beulich <JBeulich@suse.com> CC: Wei Liu <wl@xen.org> CC: Roger Pau Monné <roger.pau@citrix.com> CC: Ian Jackson <Ian.Jackson@citrix.com> CC: Wei Liu <wl@xen.org> CC: Paul Durrant <paul@xen.org> Dropped Jan's R-by from previous posting, as the patch has gained new toolstack logic to avoid breaking migrate. --- tools/libxc/xc_cpuid_x86.c | 48 ++++++++++++++++++----------- xen/include/public/arch-x86/cpufeatureset.h | 2 +- 2 files changed, 31 insertions(+), 19 deletions(-)