diff mbox series

[v3,6/8] KVM:VMX: Load Guest CET via VMCS when CET is enabled in Guest

Message ID 20190225132716.6982-7-weijiang.yang@intel.com (mailing list archive)
State New, archived
Headers show
Series This patch-set is to enable Guest CET support | expand

Commit Message

Yang, Weijiang Feb. 25, 2019, 1:27 p.m. UTC
"Load Guest CET state" bit controls whether guest CET states
will be loaded at Guest entry. Before doing that, KVM needs
to check if CPU CET feature is available.

Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/vmx.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

Comments

Yang, Weijiang Feb. 28, 2019, 8:38 a.m. UTC | #1
On Thu, Feb 28, 2019 at 08:17:15AM -0800, Sean Christopherson wrote:
> On Mon, Feb 25, 2019 at 09:27:14PM +0800, Yang Weijiang wrote:
> > "Load Guest CET state" bit controls whether guest CET states
> > will be loaded at Guest entry. Before doing that, KVM needs
> > to check if CPU CET feature is available.
> > 
> > Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > ---
> >  arch/x86/kvm/vmx.c | 32 ++++++++++++++++++++++++++++++++
> >  1 file changed, 32 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 89ee086e1729..d32cee9ee079 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -55,6 +55,7 @@
> >  #include <asm/mmu_context.h>
> >  #include <asm/spec-ctrl.h>
> >  #include <asm/mshyperv.h>
> > +#include <asm/cet.h>
> >  
> >  #include "trace.h"
> >  #include "pmu.h"
> > @@ -4065,6 +4066,20 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
> >  	return !(val & ~valid_bits);
> >  }
> >  
> > +static int vmx_guest_cet_cap(struct kvm_vcpu *vcpu)
> > +{
> > +	u32 eax, ebx, ecx, edx;
> > +
> > +	/*
> > +	 * Guest CET can work as long as HW supports the feature, independent
> > +	 * to Host SW enabling status.
> > +	 */
> > +	cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> > +
> > +	return ((ecx & bit(X86_FEATURE_SHSTK)) |
> > +		(edx & bit(X86_FEATURE_IBT))) ? 1 : 0;
> 
> Given the holes in the (current) architecture/spec, I think KVM has to
> require both features to be supported in the guest to allow CR4.CET to
> be enabled.
The reason why I use a "OR" here is to keep CET enabling control the
same as that on host, right now on host, users can select to enable  SHSTK or IBT
feature by disabling the unexpected one. It's free to select SHSTK & IBT
or SHSTK | IBT.
> 
> Technically SHSTK and IBT can be enabled independently, but unless I'm
> missing something, supporting that in KVM (or any VMM) would be nasty
> and would likely degrade guest performance significantly.
> 
> MSRs IA32_U_CET and IA32_S_CET have enable bits for each CET feature.
> Presumably the bits for each feature are reserved if the feature is not
> supported, e.g. SH_STK_EN is reserved to zero if SHSTK isn't supported.
> This wouldn't be a problem except that IA32_U_CET and the shadow stack
> MSRs, e.g. IA32_PL*_SSP, can be saved/restored via XSAVES/XRSTORS.  The
> behavior is restricted by IA32_XSS, but again it's all or nothing, e.g.
> if any CET feature is supported then XSS_CET_{S,U} can be set.
> 
> For example, if a guest supported IBT and !SHSTK, and the guest enabled
> XSS_CET_{S,I}, KVM would need to trap XSAVES/XRSTORS to enforce that the
> SHSTK bits in XSS_CET_U aren't set.  And that doesn't even address the
> fact that the architecture defines the effects on the size of the XSAVE
> state area as being a bundled deal, e.g. IA32_XSS.CET_U=1 increases the
> size by 16 bytes (for IA32_U_CET and IA32_PL3_SSP) regardless of whether
> or not SHSTK is supported.  One would assume that IA32_PL3_SSP doesn't
> exist if shadow stacks are not supported by the CPU.
> 
> TL;DR: the architecture enumerates SHSTK and IBT independently, but
> the architecture effectively assumes they are bundled together.
> 
> > +}
> > +
> >  static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
> >  {
> >  	switch (msr->index) {
> > @@ -5409,6 +5424,23 @@ static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> >  			return 1;
> >  	}
> >  
> > +	/*
> > +	 * To enable Guest CET, check whether CPU CET feature is
> > +	 * available, if it's there, set Guest CET state loading bit
> > +	 * per CR4.CET status, otherwise, return a fault to Guest.
> > +	 */
> > +	if (vmx_guest_cet_cap(vcpu)) {
> 
> This is wrong, it's checking the host capabilities.  Use guest_cpuid_has()
> to query the guest capabilities.  E.g. CET can be supported in the host
> but not exposed to guest, in which case the CPUID bits will not be "set"
> for the guest.
>
you're right, guest_cpuid_has() is enough for CET checking here since
now guest CET enabling is independent to host CET state.

> > +		if (cr4 & X86_CR4_CET) {
> 
> No need for curly braces here, both the 'if' and 'else' contain a single
> statement.

will remove the braces.
> 
> > +			vmcs_set_bits(VM_ENTRY_CONTROLS,
> > +				      VM_ENTRY_LOAD_GUEST_CET_STATE);
> > +		} else {
> > +			vmcs_clear_bits(VM_ENTRY_CONTROLS,
> > +					VM_ENTRY_LOAD_GUEST_CET_STATE);
> > +		}
> > +	} else if (cr4 & X86_CR4_CET) {
> > +		return 1;
> > +	}
> > +
> >  	if (to_vmx(vcpu)->nested.vmxon && !nested_cr4_valid(vcpu, cr4))
> >  		return 1;
> >  
> > -- 
> > 2.17.1
> >
Sean Christopherson Feb. 28, 2019, 4:17 p.m. UTC | #2
On Mon, Feb 25, 2019 at 09:27:14PM +0800, Yang Weijiang wrote:
> "Load Guest CET state" bit controls whether guest CET states
> will be loaded at Guest entry. Before doing that, KVM needs
> to check if CPU CET feature is available.
> 
> Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/kvm/vmx.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 89ee086e1729..d32cee9ee079 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -55,6 +55,7 @@
>  #include <asm/mmu_context.h>
>  #include <asm/spec-ctrl.h>
>  #include <asm/mshyperv.h>
> +#include <asm/cet.h>
>  
>  #include "trace.h"
>  #include "pmu.h"
> @@ -4065,6 +4066,20 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
>  	return !(val & ~valid_bits);
>  }
>  
> +static int vmx_guest_cet_cap(struct kvm_vcpu *vcpu)
> +{
> +	u32 eax, ebx, ecx, edx;
> +
> +	/*
> +	 * Guest CET can work as long as HW supports the feature, independent
> +	 * to Host SW enabling status.
> +	 */
> +	cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> +
> +	return ((ecx & bit(X86_FEATURE_SHSTK)) |
> +		(edx & bit(X86_FEATURE_IBT))) ? 1 : 0;

Given the holes in the (current) architecture/spec, I think KVM has to
require both features to be supported in the guest to allow CR4.CET to
be enabled.

Technically SHSTK and IBT can be enabled independently, but unless I'm
missing something, supporting that in KVM (or any VMM) would be nasty
and would likely degrade guest performance significantly.

MSRs IA32_U_CET and IA32_S_CET have enable bits for each CET feature.
Presumably the bits for each feature are reserved if the feature is not
supported, e.g. SH_STK_EN is reserved to zero if SHSTK isn't supported.
This wouldn't be a problem except that IA32_U_CET and the shadow stack
MSRs, e.g. IA32_PL*_SSP, can be saved/restored via XSAVES/XRSTORS.  The
behavior is restricted by IA32_XSS, but again it's all or nothing, e.g.
if any CET feature is supported then XSS_CET_{S,U} can be set.

For example, if a guest supported IBT and !SHSTK, and the guest enabled
XSS_CET_{S,I}, KVM would need to trap XSAVES/XRSTORS to enforce that the
SHSTK bits in XSS_CET_U aren't set.  And that doesn't even address the
fact that the architecture defines the effects on the size of the XSAVE
state area as being a bundled deal, e.g. IA32_XSS.CET_U=1 increases the
size by 16 bytes (for IA32_U_CET and IA32_PL3_SSP) regardless of whether
or not SHSTK is supported.  One would assume that IA32_PL3_SSP doesn't
exist if shadow stacks are not supported by the CPU.

TL;DR: the architecture enumerates SHSTK and IBT independently, but
the architecture effectively assumes they are bundled together.

> +}
> +
>  static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
>  {
>  	switch (msr->index) {
> @@ -5409,6 +5424,23 @@ static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>  			return 1;
>  	}
>  
> +	/*
> +	 * To enable Guest CET, check whether CPU CET feature is
> +	 * available, if it's there, set Guest CET state loading bit
> +	 * per CR4.CET status, otherwise, return a fault to Guest.
> +	 */
> +	if (vmx_guest_cet_cap(vcpu)) {

This is wrong, it's checking the host capabilities.  Use guest_cpuid_has()
to query the guest capabilities.  E.g. CET can be supported in the host
but not exposed to guest, in which case the CPUID bits will not be "set"
for the guest.

> +		if (cr4 & X86_CR4_CET) {

No need for curly braces here, both the 'if' and 'else' contain a single
statement.

> +			vmcs_set_bits(VM_ENTRY_CONTROLS,
> +				      VM_ENTRY_LOAD_GUEST_CET_STATE);
> +		} else {
> +			vmcs_clear_bits(VM_ENTRY_CONTROLS,
> +					VM_ENTRY_LOAD_GUEST_CET_STATE);
> +		}
> +	} else if (cr4 & X86_CR4_CET) {
> +		return 1;
> +	}
> +
>  	if (to_vmx(vcpu)->nested.vmxon && !nested_cr4_valid(vcpu, cr4))
>  		return 1;
>  
> -- 
> 2.17.1
>
Sean Christopherson March 1, 2019, 2:58 p.m. UTC | #3
On Thu, Feb 28, 2019 at 04:38:44PM +0800, Yang Weijiang wrote:
> On Thu, Feb 28, 2019 at 08:17:15AM -0800, Sean Christopherson wrote:
> > On Mon, Feb 25, 2019 at 09:27:14PM +0800, Yang Weijiang wrote:
> > > "Load Guest CET state" bit controls whether guest CET states
> > > will be loaded at Guest entry. Before doing that, KVM needs
> > > to check if CPU CET feature is available.
> > > 
> > > Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> > > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > > ---
> > >  arch/x86/kvm/vmx.c | 32 ++++++++++++++++++++++++++++++++
> > >  1 file changed, 32 insertions(+)
> > > 
> > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > index 89ee086e1729..d32cee9ee079 100644
> > > --- a/arch/x86/kvm/vmx.c
> > > +++ b/arch/x86/kvm/vmx.c
> > > @@ -55,6 +55,7 @@
> > >  #include <asm/mmu_context.h>
> > >  #include <asm/spec-ctrl.h>
> > >  #include <asm/mshyperv.h>
> > > +#include <asm/cet.h>
> > >  
> > >  #include "trace.h"
> > >  #include "pmu.h"
> > > @@ -4065,6 +4066,20 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
> > >  	return !(val & ~valid_bits);
> > >  }
> > >  
> > > +static int vmx_guest_cet_cap(struct kvm_vcpu *vcpu)
> > > +{
> > > +	u32 eax, ebx, ecx, edx;
> > > +
> > > +	/*
> > > +	 * Guest CET can work as long as HW supports the feature, independent
> > > +	 * to Host SW enabling status.
> > > +	 */
> > > +	cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> > > +
> > > +	return ((ecx & bit(X86_FEATURE_SHSTK)) |
> > > +		(edx & bit(X86_FEATURE_IBT))) ? 1 : 0;
> > 
> > Given the holes in the (current) architecture/spec, I think KVM has to
> > require both features to be supported in the guest to allow CR4.CET to
> > be enabled.
> The reason why I use a "OR" here is to keep CET enabling control the
> same as that on host, right now on host, users can select to enable  SHSTK or IBT
> feature by disabling the unexpected one. It's free to select SHSTK & IBT
> or SHSTK | IBT.

Which is not the same as SHSTK != IBT in *hardware*, which is effectively
what this is allowing for the guest.  The problem is that the architecture
doesn't cleanly separate the two features, i.e. we'd have a virtualization
hole where the guest could touch state for a disabled feature.

Regardless, the guest would still be able to selectively enable each CET
feature, it would just never see a model where SHSTK != IBT.
Yang, Weijiang March 3, 2019, 12:26 p.m. UTC | #4
On Fri, Mar 01, 2019 at 06:58:19AM -0800, Sean Christopherson wrote:
> On Thu, Feb 28, 2019 at 04:38:44PM +0800, Yang Weijiang wrote:
> > On Thu, Feb 28, 2019 at 08:17:15AM -0800, Sean Christopherson wrote:
> > > On Mon, Feb 25, 2019 at 09:27:14PM +0800, Yang Weijiang wrote:
> > > > "Load Guest CET state" bit controls whether guest CET states
> > > > will be loaded at Guest entry. Before doing that, KVM needs
> > > > to check if CPU CET feature is available.
> > > > 
> > > > Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> > > > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > > > ---
> > > >  arch/x86/kvm/vmx.c | 32 ++++++++++++++++++++++++++++++++
> > > >  1 file changed, 32 insertions(+)
> > > > 
> > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > > index 89ee086e1729..d32cee9ee079 100644
> > > > --- a/arch/x86/kvm/vmx.c
> > > > +++ b/arch/x86/kvm/vmx.c
> > > > @@ -55,6 +55,7 @@
> > > >  #include <asm/mmu_context.h>
> > > >  #include <asm/spec-ctrl.h>
> > > >  #include <asm/mshyperv.h>
> > > > +#include <asm/cet.h>
> > > >  
> > > >  #include "trace.h"
> > > >  #include "pmu.h"
> > > > @@ -4065,6 +4066,20 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
> > > >  	return !(val & ~valid_bits);
> > > >  }
> > > >  
> > > > +static int vmx_guest_cet_cap(struct kvm_vcpu *vcpu)
> > > > +{
> > > > +	u32 eax, ebx, ecx, edx;
> > > > +
> > > > +	/*
> > > > +	 * Guest CET can work as long as HW supports the feature, independent
> > > > +	 * to Host SW enabling status.
> > > > +	 */
> > > > +	cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> > > > +
> > > > +	return ((ecx & bit(X86_FEATURE_SHSTK)) |
> > > > +		(edx & bit(X86_FEATURE_IBT))) ? 1 : 0;
> > > 
> > > Given the holes in the (current) architecture/spec, I think KVM has to
> > > require both features to be supported in the guest to allow CR4.CET to
> > > be enabled.
> > The reason why I use a "OR" here is to keep CET enabling control the
> > same as that on host, right now on host, users can select to enable  SHSTK or IBT
> > feature by disabling the unexpected one. It's free to select SHSTK & IBT
> > or SHSTK | IBT.
> 
> Which is not the same as SHSTK != IBT in *hardware*, which is effectively
> what this is allowing for the guest.  The problem is that the architecture
> doesn't cleanly separate the two features, i.e. we'd have a virtualization
> hole where the guest could touch state for a disabled feature.
> 
> Regardless, the guest would still be able to selectively enable each CET
> feature, it would just never see a model where SHSTK != IBT.
Hi, Sean,
I'd like to understand your concerns. From my point of view, e.g., 
when only IBT is enabled, PL3_SSP MSR would be unnecessrily exposed,
this is the design "limitation", but PL3_SSP keeps 0 if SHSTK is not
configured. Could you detail your concerns?
Yang, Weijiang March 4, 2019, 9:56 a.m. UTC | #5
On Mon, Mar 04, 2019 at 10:43:07AM -0800, Sean Christopherson wrote:
> On Sun, Mar 03, 2019 at 08:26:08PM +0800, Yang Weijiang wrote:
> > On Fri, Mar 01, 2019 at 06:58:19AM -0800, Sean Christopherson wrote:
> > > On Thu, Feb 28, 2019 at 04:38:44PM +0800, Yang Weijiang wrote:
> > > > On Thu, Feb 28, 2019 at 08:17:15AM -0800, Sean Christopherson wrote:
> > > > > On Mon, Feb 25, 2019 at 09:27:14PM +0800, Yang Weijiang wrote:
> > > > > > "Load Guest CET state" bit controls whether guest CET states
> > > > > > will be loaded at Guest entry. Before doing that, KVM needs
> > > > > > to check if CPU CET feature is available.
> > > > > > 
> > > > > > Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> > > > > > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > > > > > ---
> > > > > >  arch/x86/kvm/vmx.c | 32 ++++++++++++++++++++++++++++++++
> > > > > >  1 file changed, 32 insertions(+)
> > > > > > 
> > > > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > > > > index 89ee086e1729..d32cee9ee079 100644
> > > > > > --- a/arch/x86/kvm/vmx.c
> > > > > > +++ b/arch/x86/kvm/vmx.c
> > > > > > @@ -55,6 +55,7 @@
> > > > > >  #include <asm/mmu_context.h>
> > > > > >  #include <asm/spec-ctrl.h>
> > > > > >  #include <asm/mshyperv.h>
> > > > > > +#include <asm/cet.h>
> > > > > >  
> > > > > >  #include "trace.h"
> > > > > >  #include "pmu.h"
> > > > > > @@ -4065,6 +4066,20 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
> > > > > >  	return !(val & ~valid_bits);
> > > > > >  }
> > > > > >  
> > > > > > +static int vmx_guest_cet_cap(struct kvm_vcpu *vcpu)
> > > > > > +{
> > > > > > +	u32 eax, ebx, ecx, edx;
> > > > > > +
> > > > > > +	/*
> > > > > > +	 * Guest CET can work as long as HW supports the feature, independent
> > > > > > +	 * to Host SW enabling status.
> > > > > > +	 */
> > > > > > +	cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> > > > > > +
> > > > > > +	return ((ecx & bit(X86_FEATURE_SHSTK)) |
> > > > > > +		(edx & bit(X86_FEATURE_IBT))) ? 1 : 0;
> > > > > 
> > > > > Given the holes in the (current) architecture/spec, I think KVM has to
> > > > > require both features to be supported in the guest to allow CR4.CET to
> > > > > be enabled.
> > > > The reason why I use a "OR" here is to keep CET enabling control the
> > > > same as that on host, right now on host, users can select to enable  SHSTK or IBT
> > > > feature by disabling the unexpected one. It's free to select SHSTK & IBT
> > > > or SHSTK | IBT.
> > > 
> > > Which is not the same as SHSTK != IBT in *hardware*, which is effectively
> > > what this is allowing for the guest.  The problem is that the architecture
> > > doesn't cleanly separate the two features, i.e. we'd have a virtualization
> > > hole where the guest could touch state for a disabled feature.
> > > 
> > > Regardless, the guest would still be able to selectively enable each CET
> > > feature, it would just never see a model where SHSTK != IBT.
> > Hi, Sean,
> > I'd like to understand your concerns. From my point of view, e.g., 
> > when only IBT is enabled, PL3_SSP MSR would be unnecessrily exposed,
> > this is the design "limitation", but PL3_SSP keeps 0 if SHSTK is not
> > configured. Could you detail your concerns?
> 
> In your approach, IA32_{S,U}_CET can be written if SHSTK or IBT is exposed
> to the guest.  If only SHSTK is exposed, a devious guest can still use IBT
> because it can set CR4.CET as well as the enable bits in IA32_{S,U}_CET.
> Preventing the guest from using IBT in this scenario is infeasible as it
> would require trapping and emulating the XSAVE as well as the relevent CET
> MSRs.
Cannot agree with you more!
This is some design limitation, but from my point of view, once vmm
exposes CET capability to guest via CPUID, it grants the guest kernel freedom to choose
which features to be enabled, we don't need to add extra constraints on
the usage.
Yang, Weijiang March 4, 2019, 12:36 p.m. UTC | #6
On Mon, Mar 04, 2019 at 07:12:02PM -0800, Sean Christopherson wrote:
> On Mon, Mar 04, 2019 at 05:56:40PM +0800, Yang Weijiang wrote:
> > On Mon, Mar 04, 2019 at 10:43:07AM -0800, Sean Christopherson wrote:
> > > On Sun, Mar 03, 2019 at 08:26:08PM +0800, Yang Weijiang wrote:
> > > > On Fri, Mar 01, 2019 at 06:58:19AM -0800, Sean Christopherson wrote:
> > > > > On Thu, Feb 28, 2019 at 04:38:44PM +0800, Yang Weijiang wrote:
> > > > > > On Thu, Feb 28, 2019 at 08:17:15AM -0800, Sean Christopherson wrote:
> > > > > > > On Mon, Feb 25, 2019 at 09:27:14PM +0800, Yang Weijiang wrote:
> > > > > > > > "Load Guest CET state" bit controls whether guest CET states
> > > > > > > > will be loaded at Guest entry. Before doing that, KVM needs
> > > > > > > > to check if CPU CET feature is available.
> > > > > > > > 
> > > > > > > > Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> > > > > > > > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > > > > > > > ---
> > > > > > > >  arch/x86/kvm/vmx.c | 32 ++++++++++++++++++++++++++++++++
> > > > > > > >  1 file changed, 32 insertions(+)
> > > > > > > > 
> > > > > > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > > > > > > index 89ee086e1729..d32cee9ee079 100644
> > > > > > > > --- a/arch/x86/kvm/vmx.c
> > > > > > > > +++ b/arch/x86/kvm/vmx.c
> > > > > > > > @@ -55,6 +55,7 @@
> > > > > > > >  #include <asm/mmu_context.h>
> > > > > > > >  #include <asm/spec-ctrl.h>
> > > > > > > >  #include <asm/mshyperv.h>
> > > > > > > > +#include <asm/cet.h>
> > > > > > > >  
> > > > > > > >  #include "trace.h"
> > > > > > > >  #include "pmu.h"
> > > > > > > > @@ -4065,6 +4066,20 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
> > > > > > > >  	return !(val & ~valid_bits);
> > > > > > > >  }
> > > > > > > >  
> > > > > > > > +static int vmx_guest_cet_cap(struct kvm_vcpu *vcpu)
> > > > > > > > +{
> > > > > > > > +	u32 eax, ebx, ecx, edx;
> > > > > > > > +
> > > > > > > > +	/*
> > > > > > > > +	 * Guest CET can work as long as HW supports the feature, independent
> > > > > > > > +	 * to Host SW enabling status.
> > > > > > > > +	 */
> > > > > > > > +	cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> > > > > > > > +
> > > > > > > > +	return ((ecx & bit(X86_FEATURE_SHSTK)) |
> > > > > > > > +		(edx & bit(X86_FEATURE_IBT))) ? 1 : 0;
> > > > > > > 
> > > > > > > Given the holes in the (current) architecture/spec, I think KVM has to
> > > > > > > require both features to be supported in the guest to allow CR4.CET to
> > > > > > > be enabled.
> > > > > > The reason why I use a "OR" here is to keep CET enabling control the
> > > > > > same as that on host, right now on host, users can select to enable  SHSTK or IBT
> > > > > > feature by disabling the unexpected one. It's free to select SHSTK & IBT
> > > > > > or SHSTK | IBT.
> > > > > 
> > > > > Which is not the same as SHSTK != IBT in *hardware*, which is effectively
> > > > > what this is allowing for the guest.  The problem is that the architecture
> > > > > doesn't cleanly separate the two features, i.e. we'd have a virtualization
> > > > > hole where the guest could touch state for a disabled feature.
> > > > > 
> > > > > Regardless, the guest would still be able to selectively enable each CET
> > > > > feature, it would just never see a model where SHSTK != IBT.
> > > > Hi, Sean,
> > > > I'd like to understand your concerns. From my point of view, e.g., 
> > > > when only IBT is enabled, PL3_SSP MSR would be unnecessrily exposed,
> > > > this is the design "limitation", but PL3_SSP keeps 0 if SHSTK is not
> > > > configured. Could you detail your concerns?
> > > 
> > > In your approach, IA32_{S,U}_CET can be written if SHSTK or IBT is exposed
> > > to the guest.  If only SHSTK is exposed, a devious guest can still use IBT
> > > because it can set CR4.CET as well as the enable bits in IA32_{S,U}_CET.
> > > Preventing the guest from using IBT in this scenario is infeasible as it
> > > would require trapping and emulating the XSAVE as well as the relevent CET
> > > MSRs.
> > Cannot agree with you more!
> > This is some design limitation, but from my point of view, once vmm
> > exposes CET capability to guest via CPUID, it grants the guest kernel freedom to choose
> > which features to be enabled, we don't need to add extra constraints on
> > the usage.
> 
> But if KVM allows SHSTK and IBT to be toggled independently then the VMM
> has only exposed SHSTK or IBT, not CET as whole.
> 
> Even if SHSTK and IBT are bundled together the guest still has to opt-in
> to enabling each feature.  I don't see what we gain by pretending that
> SHSTK/IBT can be individually exposed to the guest, and on the flip side
> doing so creates a virtualization hole.
you almost convinced me ;-), maybe I'll make the feature as a bundle in
next release after check with kernel team. BTW, what do you mean by
saying "create a virtualization hole"? Is it what you stated in above
reply?
Sean Christopherson March 4, 2019, 6:43 p.m. UTC | #7
On Sun, Mar 03, 2019 at 08:26:08PM +0800, Yang Weijiang wrote:
> On Fri, Mar 01, 2019 at 06:58:19AM -0800, Sean Christopherson wrote:
> > On Thu, Feb 28, 2019 at 04:38:44PM +0800, Yang Weijiang wrote:
> > > On Thu, Feb 28, 2019 at 08:17:15AM -0800, Sean Christopherson wrote:
> > > > On Mon, Feb 25, 2019 at 09:27:14PM +0800, Yang Weijiang wrote:
> > > > > "Load Guest CET state" bit controls whether guest CET states
> > > > > will be loaded at Guest entry. Before doing that, KVM needs
> > > > > to check if CPU CET feature is available.
> > > > > 
> > > > > Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> > > > > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > > > > ---
> > > > >  arch/x86/kvm/vmx.c | 32 ++++++++++++++++++++++++++++++++
> > > > >  1 file changed, 32 insertions(+)
> > > > > 
> > > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > > > index 89ee086e1729..d32cee9ee079 100644
> > > > > --- a/arch/x86/kvm/vmx.c
> > > > > +++ b/arch/x86/kvm/vmx.c
> > > > > @@ -55,6 +55,7 @@
> > > > >  #include <asm/mmu_context.h>
> > > > >  #include <asm/spec-ctrl.h>
> > > > >  #include <asm/mshyperv.h>
> > > > > +#include <asm/cet.h>
> > > > >  
> > > > >  #include "trace.h"
> > > > >  #include "pmu.h"
> > > > > @@ -4065,6 +4066,20 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
> > > > >  	return !(val & ~valid_bits);
> > > > >  }
> > > > >  
> > > > > +static int vmx_guest_cet_cap(struct kvm_vcpu *vcpu)
> > > > > +{
> > > > > +	u32 eax, ebx, ecx, edx;
> > > > > +
> > > > > +	/*
> > > > > +	 * Guest CET can work as long as HW supports the feature, independent
> > > > > +	 * to Host SW enabling status.
> > > > > +	 */
> > > > > +	cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> > > > > +
> > > > > +	return ((ecx & bit(X86_FEATURE_SHSTK)) |
> > > > > +		(edx & bit(X86_FEATURE_IBT))) ? 1 : 0;
> > > > 
> > > > Given the holes in the (current) architecture/spec, I think KVM has to
> > > > require both features to be supported in the guest to allow CR4.CET to
> > > > be enabled.
> > > The reason why I use a "OR" here is to keep CET enabling control the
> > > same as that on host, right now on host, users can select to enable  SHSTK or IBT
> > > feature by disabling the unexpected one. It's free to select SHSTK & IBT
> > > or SHSTK | IBT.
> > 
> > Which is not the same as SHSTK != IBT in *hardware*, which is effectively
> > what this is allowing for the guest.  The problem is that the architecture
> > doesn't cleanly separate the two features, i.e. we'd have a virtualization
> > hole where the guest could touch state for a disabled feature.
> > 
> > Regardless, the guest would still be able to selectively enable each CET
> > feature, it would just never see a model where SHSTK != IBT.
> Hi, Sean,
> I'd like to understand your concerns. From my point of view, e.g., 
> when only IBT is enabled, PL3_SSP MSR would be unnecessrily exposed,
> this is the design "limitation", but PL3_SSP keeps 0 if SHSTK is not
> configured. Could you detail your concerns?

In your approach, IA32_{S,U}_CET can be written if SHSTK or IBT is exposed
to the guest.  If only SHSTK is exposed, a devious guest can still use IBT
because it can set CR4.CET as well as the enable bits in IA32_{S,U}_CET.
Preventing the guest from using IBT in this scenario is infeasible as it
would require trapping and emulating the XSAVE as well as the relevent CET
MSRs.
Sean Christopherson March 5, 2019, 3:12 a.m. UTC | #8
On Mon, Mar 04, 2019 at 05:56:40PM +0800, Yang Weijiang wrote:
> On Mon, Mar 04, 2019 at 10:43:07AM -0800, Sean Christopherson wrote:
> > On Sun, Mar 03, 2019 at 08:26:08PM +0800, Yang Weijiang wrote:
> > > On Fri, Mar 01, 2019 at 06:58:19AM -0800, Sean Christopherson wrote:
> > > > On Thu, Feb 28, 2019 at 04:38:44PM +0800, Yang Weijiang wrote:
> > > > > On Thu, Feb 28, 2019 at 08:17:15AM -0800, Sean Christopherson wrote:
> > > > > > On Mon, Feb 25, 2019 at 09:27:14PM +0800, Yang Weijiang wrote:
> > > > > > > "Load Guest CET state" bit controls whether guest CET states
> > > > > > > will be loaded at Guest entry. Before doing that, KVM needs
> > > > > > > to check if CPU CET feature is available.
> > > > > > > 
> > > > > > > Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> > > > > > > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > > > > > > ---
> > > > > > >  arch/x86/kvm/vmx.c | 32 ++++++++++++++++++++++++++++++++
> > > > > > >  1 file changed, 32 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > > > > > index 89ee086e1729..d32cee9ee079 100644
> > > > > > > --- a/arch/x86/kvm/vmx.c
> > > > > > > +++ b/arch/x86/kvm/vmx.c
> > > > > > > @@ -55,6 +55,7 @@
> > > > > > >  #include <asm/mmu_context.h>
> > > > > > >  #include <asm/spec-ctrl.h>
> > > > > > >  #include <asm/mshyperv.h>
> > > > > > > +#include <asm/cet.h>
> > > > > > >  
> > > > > > >  #include "trace.h"
> > > > > > >  #include "pmu.h"
> > > > > > > @@ -4065,6 +4066,20 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
> > > > > > >  	return !(val & ~valid_bits);
> > > > > > >  }
> > > > > > >  
> > > > > > > +static int vmx_guest_cet_cap(struct kvm_vcpu *vcpu)
> > > > > > > +{
> > > > > > > +	u32 eax, ebx, ecx, edx;
> > > > > > > +
> > > > > > > +	/*
> > > > > > > +	 * Guest CET can work as long as HW supports the feature, independent
> > > > > > > +	 * to Host SW enabling status.
> > > > > > > +	 */
> > > > > > > +	cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> > > > > > > +
> > > > > > > +	return ((ecx & bit(X86_FEATURE_SHSTK)) |
> > > > > > > +		(edx & bit(X86_FEATURE_IBT))) ? 1 : 0;
> > > > > > 
> > > > > > Given the holes in the (current) architecture/spec, I think KVM has to
> > > > > > require both features to be supported in the guest to allow CR4.CET to
> > > > > > be enabled.
> > > > > The reason why I use a "OR" here is to keep CET enabling control the
> > > > > same as that on host, right now on host, users can select to enable  SHSTK or IBT
> > > > > feature by disabling the unexpected one. It's free to select SHSTK & IBT
> > > > > or SHSTK | IBT.
> > > > 
> > > > Which is not the same as SHSTK != IBT in *hardware*, which is effectively
> > > > what this is allowing for the guest.  The problem is that the architecture
> > > > doesn't cleanly separate the two features, i.e. we'd have a virtualization
> > > > hole where the guest could touch state for a disabled feature.
> > > > 
> > > > Regardless, the guest would still be able to selectively enable each CET
> > > > feature, it would just never see a model where SHSTK != IBT.
> > > Hi, Sean,
> > > I'd like to understand your concerns. From my point of view, e.g., 
> > > when only IBT is enabled, PL3_SSP MSR would be unnecessrily exposed,
> > > this is the design "limitation", but PL3_SSP keeps 0 if SHSTK is not
> > > configured. Could you detail your concerns?
> > 
> > In your approach, IA32_{S,U}_CET can be written if SHSTK or IBT is exposed
> > to the guest.  If only SHSTK is exposed, a devious guest can still use IBT
> > because it can set CR4.CET as well as the enable bits in IA32_{S,U}_CET.
> > Preventing the guest from using IBT in this scenario is infeasible as it
> > would require trapping and emulating the XSAVE as well as the relevent CET
> > MSRs.
> Cannot agree with you more!
> This is some design limitation, but from my point of view, once vmm
> exposes CET capability to guest via CPUID, it grants the guest kernel freedom to choose
> which features to be enabled, we don't need to add extra constraints on
> the usage.

But if KVM allows SHSTK and IBT to be toggled independently then the VMM
has only exposed SHSTK or IBT, not CET as whole.

Even if SHSTK and IBT are bundled together the guest still has to opt-in
to enabling each feature.  I don't see what we gain by pretending that
SHSTK/IBT can be individually exposed to the guest, and on the flip side
doing so creates a virtualization hole.
Sean Christopherson March 8, 2019, 4:28 p.m. UTC | #9
On Mon, Mar 04, 2019 at 08:36:55PM +0800, Yang Weijiang wrote:
> On Mon, Mar 04, 2019 at 07:12:02PM -0800, Sean Christopherson wrote:
> > On Mon, Mar 04, 2019 at 05:56:40PM +0800, Yang Weijiang wrote:
> > > Cannot agree with you more!
> > > This is some design limitation, but from my point of view, once vmm
> > > exposes CET capability to guest via CPUID, it grants the guest kernel freedom to choose
> > > which features to be enabled, we don't need to add extra constraints on
> > > the usage.
> > 
> > But if KVM allows SHSTK and IBT to be toggled independently then the VMM
> > has only exposed SHSTK or IBT, not CET as whole.
> > 
> > Even if SHSTK and IBT are bundled together the guest still has to opt-in
> > to enabling each feature.  I don't see what we gain by pretending that
> > SHSTK/IBT can be individually exposed to the guest, and on the flip side
> > doing so creates a virtualization hole.
> you almost convinced me ;-), maybe I'll make the feature as a bundle in
> next release after check with kernel team. BTW, what do you mean by
> saying "create a virtualization hole"? Is it what you stated in above
> reply?

By "virtualization hole" I mean the guest would be able to use a feature
that the virtual CPU model says isn't supported.

After rereading the XSS architecture, there's a marginally less crappy
option for handling XRSTOR as we could use the XSS_EXIT_BITMAP to
intercept XRSTOR if SHSTK != IBT and the guest is restoring CET state,
e.g. to ensure the guest isn't setting IA32_PL*_SSP if !SHSTK and isn't
setting bits that are effectively reserved in IA32_U_CET.

But practically speaking that'd be the same as intercepting XRSTORS
unconditionally when the guest is using CET, i.e. it's still going to
tank the performance of a guest that uses CET+XSAVES/XRSTORS.
Paolo Bonzini March 8, 2019, 4:36 p.m. UTC | #10
On 08/03/19 17:28, Sean Christopherson wrote:
> On Mon, Mar 04, 2019 at 08:36:55PM +0800, Yang Weijiang wrote:
>> On Mon, Mar 04, 2019 at 07:12:02PM -0800, Sean Christopherson wrote:
>>> On Mon, Mar 04, 2019 at 05:56:40PM +0800, Yang Weijiang wrote:
>>>> Cannot agree with you more!
>>>> This is some design limitation, but from my point of view, once vmm
>>>> exposes CET capability to guest via CPUID, it grants the guest kernel freedom to choose
>>>> which features to be enabled, we don't need to add extra constraints on
>>>> the usage.
>>>
>>> But if KVM allows SHSTK and IBT to be toggled independently then the VMM
>>> has only exposed SHSTK or IBT, not CET as whole.
>>>
>>> Even if SHSTK and IBT are bundled together the guest still has to opt-in
>>> to enabling each feature.  I don't see what we gain by pretending that
>>> SHSTK/IBT can be individually exposed to the guest, and on the flip side
>>> doing so creates a virtualization hole.
>> you almost convinced me ;-), maybe I'll make the feature as a bundle in
>> next release after check with kernel team. BTW, what do you mean by
>> saying "create a virtualization hole"? Is it what you stated in above
>> reply?
> 
> By "virtualization hole" I mean the guest would be able to use a feature
> that the virtual CPU model says isn't supported.

I think it's okay to leave the hole and leave it to userspace to forbid
enabling only one of the bits.

Paolo

> After rereading the XSS architecture, there's a marginally less crappy
> option for handling XRSTOR as we could use the XSS_EXIT_BITMAP to
> intercept XRSTOR if SHSTK != IBT and the guest is restoring CET state,
> e.g. to ensure the guest isn't setting IA32_PL*_SSP if !SHSTK and isn't
> setting bits that are effectively reserved in IA32_U_CET.
> 
> But practically speaking that'd be the same as intercepting XRSTORS
> unconditionally when the guest is using CET, i.e. it's still going to
> tank the performance of a guest that uses CET+XSAVES/XRSTORS.
>
diff mbox series

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 89ee086e1729..d32cee9ee079 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -55,6 +55,7 @@ 
 #include <asm/mmu_context.h>
 #include <asm/spec-ctrl.h>
 #include <asm/mshyperv.h>
+#include <asm/cet.h>
 
 #include "trace.h"
 #include "pmu.h"
@@ -4065,6 +4066,20 @@  static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
 	return !(val & ~valid_bits);
 }
 
+static int vmx_guest_cet_cap(struct kvm_vcpu *vcpu)
+{
+	u32 eax, ebx, ecx, edx;
+
+	/*
+	 * Guest CET can work as long as HW supports the feature, independent
+	 * to Host SW enabling status.
+	 */
+	cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
+
+	return ((ecx & bit(X86_FEATURE_SHSTK)) |
+		(edx & bit(X86_FEATURE_IBT))) ? 1 : 0;
+}
+
 static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
 {
 	switch (msr->index) {
@@ -5409,6 +5424,23 @@  static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 			return 1;
 	}
 
+	/*
+	 * To enable Guest CET, check whether CPU CET feature is
+	 * available, if it's there, set Guest CET state loading bit
+	 * per CR4.CET status, otherwise, return a fault to Guest.
+	 */
+	if (vmx_guest_cet_cap(vcpu)) {
+		if (cr4 & X86_CR4_CET) {
+			vmcs_set_bits(VM_ENTRY_CONTROLS,
+				      VM_ENTRY_LOAD_GUEST_CET_STATE);
+		} else {
+			vmcs_clear_bits(VM_ENTRY_CONTROLS,
+					VM_ENTRY_LOAD_GUEST_CET_STATE);
+		}
+	} else if (cr4 & X86_CR4_CET) {
+		return 1;
+	}
+
 	if (to_vmx(vcpu)->nested.vmxon && !nested_cr4_valid(vcpu, cr4))
 		return 1;