
KVM: VMX: expose the host's ARCH_CAPABILITIES MSR to userspace

Message ID 20180302214212.GB13606@flask (mailing list archive)
State New, archived

Commit Message

Radim Krčmář March 2, 2018, 9:42 p.m. UTC
2018-03-02 10:36+0100, Paolo Bonzini:
> On 01/03/2018 22:39, Radim Krčmář wrote:
> > [Resent after removing g@char.us.oracle.com.]
> > 
> > 2018-02-26 17:13-0500, Konrad Rzeszutek Wilk:
> >> On Sat, Feb 24, 2018 at 01:52:26AM +0100, Paolo Bonzini wrote:
> >>> Use the new MSR feature framework to expose the ARCH_CAPABILITIES MSR to
> >>> userspace.  This way, userspace can access the capabilities even if it
> >>> does not have the permissions to read MSRs.
> >>
> >> ... That is good, but could you expand a bit on why it would want this?
> >>
> >> I am 99% sure it is due to the lovely spectre_v2 mitigation but
> >> could you include that in the commit message so that in say a year
> >> folks would know what this is?
> > 
> > Userspace can currently get the MSR by creating a VCPU and reading its
> > MSR_IA32_ARCH_CAPABILITIES, because it is set from the hardware MSR.
> > 
> > I thought that "permissions to read MSRs" talked about hardware MSRs, so
> > the purpose of this patch would be a better interface, but I don't see
> > how if we keep the auto-setting on VCPU creation.
> 
> Yeah, it's mostly about a better interface and being able to do checks
> before creating the VCPU.  The commit message was written before I
> noticed the auto-setting on VCPU creation, and I failed to update it.
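
For illustration, here is a minimal sketch of the "checks before creating the VCPU" use case. It assumes kernel headers that define KVM_CAP_GET_MSR_FEATURES, i.e. the MSR feature framework this series builds on. With that framework, KVM_GET_MSRS issued on the /dev/kvm system fd returns the value KVM reports for a feature MSR without any VM or VCPU existing. The older route Radim describes would instead need KVM_CREATE_VM and KVM_CREATE_VCPU first. The helper name read_feature_msr is hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Architectural index of IA32_ARCH_CAPABILITIES. */
#define MSR_IA32_ARCH_CAPABILITIES 0x10a

/*
 * Hypothetical helper: read one feature MSR through the KVM system fd.
 * Returns 0 on success or a negative errno value on failure.
 */
int read_feature_msr(int kvm_fd, uint32_t index, uint64_t *value)
{
    /* Feature MSRs on the system fd require KVM_CAP_GET_MSR_FEATURES. */
    int cap = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_GET_MSR_FEATURES);
    if (cap < 0)
        return -errno;
    if (cap == 0)
        return -ENOTSUP;

    /* struct kvm_msrs ends in a variable-length array of entries. */
    struct kvm_msrs *msrs =
        calloc(1, sizeof(*msrs) + sizeof(struct kvm_msr_entry));
    if (!msrs)
        return -ENOMEM;
    msrs->nmsrs = 1;
    msrs->entries[0].index = index;

    /* KVM_GET_MSRS returns the number of entries it actually read. */
    int n = ioctl(kvm_fd, KVM_GET_MSRS, msrs);
    int ret;
    if (n < 0) {
        ret = -errno;
    } else if (n != 1) {
        ret = -ENOENT; /* not reported as a feature MSR on this host */
    } else {
        *value = msrs->entries[0].data;
        ret = 0;
    }
    free(msrs);
    return ret;
}
```

Usage: pass a descriptor obtained from open("/dev/kvm", O_RDWR), e.g. read_feature_msr(kvm_fd, MSR_IA32_ARCH_CAPABILITIES, &caps), and decide what to expose before any VCPU is created.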

Ok, sounds good.  I've deferred it to rc5 as I think we'll want to use
this to replace the auto setting:  I would not bet that it is going to
be safe to expose future bits, so having the userspace always sanitize
the capabilities would be safer (and more in line with what we do with
other MSRs).  i.e. this patch would also

Comments

Paolo Bonzini March 7, 2018, 11:53 a.m. UTC | #1
On 02/03/2018 22:42, Radim Krčmář wrote:
> Ok, sounds good.  I've deferred it to rc5 as I think we'll want to use
> this to replace the auto setting:  I would not bet that it is going to
> be safe to expose future bits, so having the userspace always sanitize
> the capabilities would be safer (and more in line with what we do with
> other MSRs).  i.e. this patch would also
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 051dab74e4e9..86ea4a83af1f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5740,9 +5740,6 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
>  		++vmx->nmsrs;
>  	}
>  
> -	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
> -		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);
> -
>  	vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
>  
>  	/* 22.2.1, 20.8.1 */

I don't know. There are good reasons for both behaviors, and especially
the following two for _not_ removing the rdmsr:

1) so far you could just pass the result of KVM_GET_SUPPORTED_CPUID to
KVM_SET_CPUID2, and expect the result to be "as close as possible to the
host";

2) having different behavior for VMX and ARCH_CAPABILITIES MSRs would be
confusing.
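
Point 1 can be sketched as follows, assuming a VM and a VCPU already exist. The CPUID table from KVM_GET_SUPPORTED_CPUID goes unmodified to KVM_SET_CPUID2. A small helper checks CPUID.(EAX=7,ECX=0):EDX[29], the bit that enumerates IA32_ARCH_CAPABILITIES to the guest. The entry count of 100 and both helper names are assumptions for illustration. KVM fails the query with E2BIG if the buffer is too small.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* CPUID.(EAX=7,ECX=0):EDX bit 29 enumerates IA32_ARCH_CAPABILITIES. */
#define CPUID7_EDX_ARCH_CAPABILITIES (1u << 29)

/* Hypothetical helper: does this CPUID table advertise ARCH_CAPABILITIES? */
bool cpuid_enumerates_arch_caps(const struct kvm_cpuid2 *cpuid)
{
    for (uint32_t i = 0; i < cpuid->nent; i++) {
        const struct kvm_cpuid_entry2 *e = &cpuid->entries[i];
        if (e->function == 7 && e->index == 0)
            return e->edx & CPUID7_EDX_ARCH_CAPABILITIES;
    }
    return false;
}

/*
 * Hypothetical helper: hand the host-derived CPUID to the VCPU as-is,
 * which is the "as close as possible to the host" behavior.
 */
int copy_supported_cpuid(int kvm_fd, int vcpu_fd)
{
    const int nent = 100; /* assumed large enough for this host */
    struct kvm_cpuid2 *cpuid =
        calloc(1, sizeof(*cpuid) + nent * sizeof(struct kvm_cpuid_entry2));
    if (!cpuid)
        return -ENOMEM;
    cpuid->nent = nent;

    int ret = 0;
    if (ioctl(kvm_fd, KVM_GET_SUPPORTED_CPUID, cpuid) < 0)
        ret = -errno;
    else if (ioctl(vcpu_fd, KVM_SET_CPUID2, cpuid) < 0)
        ret = -errno;
    free(cpuid);
    return ret;
}
```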

I think I'm leaning more towards the following direction: whitelist
ARCH_CAPABILITIES, like we do for the AMD LFENCE MSR already, and
default the AMD LFENCE MSR to the host value.
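
A hypothetical sketch of what "whitelisting" means in this context: a fixed table of MSR indices that KVM is willing to report as feature MSRs to userspace. The indices are the architectural ones: IA32_ARCH_CAPABILITIES is 0x10a and the AMD DE_CFG MSR (MSR_F10H_DECFG) is 0xc0011029. The table and function names are illustrative, not KVM's internal code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_IA32_ARCH_CAPABILITIES 0x0000010a
#define MSR_F10H_DECFG             0xc0011029

/* Hypothetical whitelist of MSRs whose value userspace may query. */
static const uint32_t feature_msr_whitelist[] = {
    MSR_F10H_DECFG,             /* AMD LFENCE MSR, already handled */
    MSR_IA32_ARCH_CAPABILITIES, /* the proposed addition */
};

/* Linear lookup; the table is tiny, so nothing smarter is needed. */
bool is_whitelisted_feature_msr(uint32_t index)
{
    size_t n = sizeof(feature_msr_whitelist) / sizeof(feature_msr_whitelist[0]);
    for (size_t i = 0; i < n; i++)
        if (feature_msr_whitelist[i] == index)
            return true;
    return false;
}
```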

Thanks,

Paolo
Radim Krčmář March 7, 2018, 2:56 p.m. UTC | #2
2018-03-07 12:53+0100, Paolo Bonzini:
> On 02/03/2018 22:42, Radim Krčmář wrote:
> > Ok, sounds good.  I've deferred it to rc5 as I think we'll want to use
> > this to replace the auto setting:  I would not bet that it is going to
> > be safe to expose future bits, so having the userspace always sanitize
> > the capabilities would be safer (and more in line with what we do with
> > other MSRs).  i.e. this patch would also
> > 
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 051dab74e4e9..86ea4a83af1f 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -5740,9 +5740,6 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
> >  		++vmx->nmsrs;
> >  	}
> >  
> > -	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
> > -		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);
> > -
> >  	vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
> >  
> >  	/* 22.2.1, 20.8.1 */
> 
> I don't know. There are good reasons for both behaviors, and especially
> the following two for _not_ removing the rdmsr:
> 
> 1) so far you could just pass the result of KVM_GET_SUPPORTED_CPUID to
> KVM_SET_CPUID2, and expect the result to be "as close as possible to the
> host";
>
> 2) having different behavior for VMX and ARCH_CAPABILITIES MSRs would be
> confusing.

Right, we can't just stop setting them by default ...

(I am in favor of forcing the userspace to configure everything and I'd
 accept this exception as a mistake of the past.)
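
To make "forcing the userspace to configure everything" concrete, here is a sketch of the userspace side under stated assumptions. The VMM keeps only the ARCH_CAPABILITIES bits it knows about (RDCL_NO is bit 0, IBRS_ALL is bit 1) and that are migratable in its setup. It then writes the result into the VCPU with KVM_SET_MSRS. Unknown future bits are dropped rather than passed through. The known-bit mask, the migratable_mask parameter and both helper names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define MSR_IA32_ARCH_CAPABILITIES 0x10a
#define ARCH_CAP_RDCL_NO  (1ULL << 0) /* not susceptible to rogue data cache load */
#define ARCH_CAP_IBRS_ALL (1ULL << 1) /* enhanced IBRS */

/* Hypothetical: the bits this VMM understands. */
#define VMM_KNOWN_ARCH_CAPS (ARCH_CAP_RDCL_NO | ARCH_CAP_IBRS_ALL)

/* Keep only known bits that the migration policy allows. */
uint64_t sanitize_arch_caps(uint64_t host_caps, uint64_t migratable_mask)
{
    return host_caps & VMM_KNOWN_ARCH_CAPS & migratable_mask;
}

/* Hypothetical helper: write one MSR into a VCPU via KVM_SET_MSRS. */
int set_vcpu_msr(int vcpu_fd, uint32_t index, uint64_t value)
{
    struct kvm_msrs *msrs =
        calloc(1, sizeof(*msrs) + sizeof(struct kvm_msr_entry));
    if (!msrs)
        return -ENOMEM;
    msrs->nmsrs = 1;
    msrs->entries[0].index = index;
    msrs->entries[0].data = value;

    /* KVM_SET_MSRS returns the number of entries it actually set. */
    int n = ioctl(vcpu_fd, KVM_SET_MSRS, msrs);
    int ret = n < 0 ? -errno : (n == 1 ? 0 : -EINVAL);
    free(msrs);
    return ret;
}
```

A VMM would call set_vcpu_msr(vcpu_fd, MSR_IA32_ARCH_CAPABILITIES, sanitize_arch_caps(host, mask)) for each VCPU, with the host value obtained from KVM_GET_MSRS on /dev/kvm.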

> I think I'm leaning more towards the following direction: whitelist
> ARCH_CAPABILITIES, like we do for the AMD LFENCE MSR already, and
> default the AMD LFENCE MSR to the host value.

The whitelisting is a good idea and I'm ok with just that, thanks.

The MSR_F10H_DECFG default is questionable -- MSR_F10H_DECFG is an
architectural MSR, so we'd be changing the guest behind the back of
existing userspaces.
That is a potential security risk if they migrate the guest to a CPU that
doesn't serialize LFENCE.  ARCH_CAPABILITIES are at least hidden behind a
new CPUID bit.
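
The migration risk can be sketched as a check that only a DECFG-aware userspace would perform. Existing userspaces never look at this MSR, which is exactly the concern. Bit 1 of MSR_F10H_DECFG is the LFENCE-serialize bit, matching Linux's MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT. The function name and the policy are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_F10H_DECFG                      0xc0011029
#define MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT 1
#define DECFG_LFENCE_SERIALIZE (1ULL << MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT)

/*
 * Hypothetical pre-migration check: a guest that saw LFENCE as
 * serializing may rely on it as a speculation barrier, so the
 * destination host must provide the same guarantee.
 */
bool lfence_migration_safe(uint64_t guest_decfg, uint64_t dest_host_decfg)
{
    bool guest_relies = guest_decfg & DECFG_LFENCE_SERIALIZE;
    bool dest_provides = dest_host_decfg & DECFG_LFENCE_SERIALIZE;
    return !guest_relies || dest_provides;
}
```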

The feature MSR defaults are going to be a mess anyway: we have
MSR_IA32_UCODE_REV, which is tightly coupled with CPUID.  It is not a good
candidate for passing through by default, and it currently also has a default value.
Paolo Bonzini March 7, 2018, 3:10 p.m. UTC | #3
On 07/03/2018 15:56, Radim Krčmář wrote:
> The MSR_F10H_DECFG default is questionable -- MSR_F10H_DECFG is an
> architectural MSR, so we'd be changing the guest behind the back of
> existing userspaces.
> That is a potential security risk if they migrate the guest to a CPU that
> doesn't serialize LFENCE.  ARCH_CAPABILITIES are at least hidden behind a
> new CPUID bit.

Good point.  Perhaps we should add a KVM-specific CPUID bit for
serializing LFENCE.

> The feature MSR defaults are going to be a mess anyway: we have
> MSR_IA32_UCODE_REV, which is tightly coupled with CPUID.  It is not a good
> candidate for passing through by default, and it currently also has a default value.

Yeah, ucode revision is out of question.

Paolo
Radim Krčmář March 7, 2018, 3:39 p.m. UTC | #4
2018-03-07 16:10+0100, Paolo Bonzini:
> On 07/03/2018 15:56, Radim Krčmář wrote:
> > The MSR_F10H_DECFG default is questionable -- MSR_F10H_DECFG is an
> > architectural MSR, so we'd be changing the guest behind the back of
> > existing userspaces.
> > That is a potential security risk if they migrate the guest to a CPU that
> > doesn't serialize LFENCE.  ARCH_CAPABILITIES are at least hidden behind a
> > new CPUID bit.
> 
> Good point.  Perhaps we should add a KVM-specific CPUID bit for
> serializing LFENCE.

I reckon it wouldn't help much in the wild: we'd need userspace changes
(at least for QEMU) and at that point, userspace might as well just
implement MSR_F10H_DECFG and use an unmodified guest.

We can't easily intercept LFENCE to emulate the feature either, so it
seems like a waste of effort to me.

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 051dab74e4e9..86ea4a83af1f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5740,9 +5740,6 @@  static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
 		++vmx->nmsrs;
 	}
 
-	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
-		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);
-
 	vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
 
 	/* 22.2.1, 20.8.1 */