diff mbox series

[1/2] KVM: nVMX: Always indicate HLT activity support in VMX_MISC MSR

Message ID 20190819214650.41991-2-nikita.leshchenko@oracle.com (mailing list archive)
State New, archived
Headers show
Series KVM: nVMX: Improve HLT activity support | expand

Commit Message

Nikita Leshenko Aug. 19, 2019, 9:46 p.m. UTC
Before this commit, userspace could disable the GUEST_ACTIVITY_HLT bit in
VMX_MISC yet KVM would happily accept GUEST_ACTIVITY_HLT activity state in
VMCS12. We can fix it by either failing VM entries with HLT activity state when
it's not supported or by disallowing clearing this bit.

The latter is preferable. If we go with the former, to disable
GUEST_ACTIVITY_HLT userspace also has to make CPU_BASED_HLT_EXITING a "must be
1" control, otherwise KVM will be presenting a bogus model to L1.

Don't fail writes that disable GUEST_ACTIVITY_HLT to maintain backwards
compatibility.

Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
---
 arch/x86/kvm/vmx/nested.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Sean Christopherson Aug. 19, 2019, 10:11 p.m. UTC | #1
On Tue, Aug 20, 2019 at 12:46:49AM +0300, Nikita Leshenko wrote:
> Before this commit, userspace could disable the GUEST_ACTIVITY_HLT bit in
> VMX_MISC yet KVM would happily accept GUEST_ACTIVITY_HLT activity state in
> VMCS12. We can fix it by either failing VM entries with HLT activity state when
> it's not supported or by disallowing clearing this bit.
> 
> The latter is preferable. If we go with the former, to disable
> GUEST_ACTIVITY_HLT userspace also has to make CPU_BASED_HLT_EXITING a "must be
> 1" control, otherwise KVM will be presenting a bogus model to L1.
> 
> Don't fail writes that disable GUEST_ACTIVITY_HLT to maintain backwards
> compatibility.

Paolo, do we actually need to maintain backwards compatibility in this
case?  This seems like a good candidate for "fix the bug and see who yells".

> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 46af3a5e9209..24734946ec75 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -1102,6 +1102,14 @@ static int vmx_restore_vmx_misc(struct vcpu_vmx *vmx, u64 data)
>  	if (vmx_misc_mseg_revid(data) != vmx_misc_mseg_revid(vmx_misc))
>  		return -EINVAL;
>  
> +	/*
> +	 * We always support HLT activity state. In the past it was possible to
> +	 * turn HLT bit off (without actually turning off HLT activity state
> +	 * support) so we don't fail vmx_restore_vmx_misc if this bit is turned
> +	 * off.
> +	 */
> +	data |= VMX_MISC_ACTIVITY_HLT;
> +
>  	vmx->nested.msrs.misc_low = data;
>  	vmx->nested.msrs.misc_high = data >> 32;
>  
> -- 
> 2.20.1
>
Jim Mattson Aug. 21, 2019, 8:59 p.m. UTC | #2
On Mon, Aug 19, 2019 at 3:11 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Tue, Aug 20, 2019 at 12:46:49AM +0300, Nikita Leshenko wrote:
> > Before this commit, userspace could disable the GUEST_ACTIVITY_HLT bit in
> > VMX_MISC yet KVM would happily accept GUEST_ACTIVITY_HLT activity state in
> > VMCS12. We can fix it by either failing VM entries with HLT activity state when
> > it's not supported or by disallowing clearing this bit.
> >
> > The latter is preferable. If we go with the former, to disable
> > GUEST_ACTIVITY_HLT userspace also has to make CPU_BASED_HLT_EXITING a "must be
> > 1" control, otherwise KVM will be presenting a bogus model to L1.
> >
> > Don't fail writes that disable GUEST_ACTIVITY_HLT to maintain backwards
> > compatibility.
>
> Paolo, do we actually need to maintain backwards compatibility in this
> case?  This seems like a good candidate for "fix the bug and see who yells".

Google's userspace clears bit 6. Please don't fail that write!
Sean Christopherson Aug. 21, 2019, 10:22 p.m. UTC | #3
On Wed, Aug 21, 2019 at 01:59:20PM -0700, Jim Mattson wrote:
> On Mon, Aug 19, 2019 at 3:11 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Tue, Aug 20, 2019 at 12:46:49AM +0300, Nikita Leshenko wrote:
> > > Before this commit, userspace could disable the GUEST_ACTIVITY_HLT bit in
> > > VMX_MISC yet KVM would happily accept GUEST_ACTIVITY_HLT activity state in
> > > VMCS12. We can fix it by either failing VM entries with HLT activity state when
> > > it's not supported or by disallowing clearing this bit.
> > >
> > > The latter is preferable. If we go with the former, to disable
> > > GUEST_ACTIVITY_HLT userspace also has to make CPU_BASED_HLT_EXITING a "must be
> > > 1" control, otherwise KVM will be presenting a bogus model to L1.
> > >
> > > Don't fail writes that disable GUEST_ACTIVITY_HLT to maintain backwards
> > > compatibility.
> >
> > Paolo, do we actually need to maintain backwards compatibility in this
> > case?  This seems like a good candidate for "fix the bug and see who yells".
> 
> Google's userspace clears bit 6. Please don't fail that write!

Booooo.

Requiring CPU_BASED_HLT_EXITING to be forced to 1 is probably off the
table since the bits are in two separate MSRs, i.e. we run into a
chicken-and-egg scenario.

Silently forcing GUEST_ACTIVITY_HLT seems wrong, especially if userspace
has also forced CPU_BASED_HLT_EXITING, e.g. KVM would be letting L1 do
something userspace explicitly asked KVM to prevent.

Generally speaking, KVM lets userspace do dumb things so long as it
doesn't break the kernel, so, what if we change sync_vmcs02_to_vmcs12()
to propagate GUEST_ACTIVITY_HLT to vmcs12 if and only if the activity
state exists?  That might be better than causing a nested VM-Enter to
fail?  I'm also a-ok doing nothing and fulling putting the onus on
userspace to get the config correct.
Jim Mattson Aug. 21, 2019, 11:01 p.m. UTC | #4
On Wed, Aug 21, 2019 at 3:22 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Wed, Aug 21, 2019 at 01:59:20PM -0700, Jim Mattson wrote:
> > On Mon, Aug 19, 2019 at 3:11 PM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > On Tue, Aug 20, 2019 at 12:46:49AM +0300, Nikita Leshenko wrote:
> > > > Before this commit, userspace could disable the GUEST_ACTIVITY_HLT bit in
> > > > VMX_MISC yet KVM would happily accept GUEST_ACTIVITY_HLT activity state in
> > > > VMCS12. We can fix it by either failing VM entries with HLT activity state when
> > > > it's not supported or by disallowing clearing this bit.
> > > >
> > > > The latter is preferable. If we go with the former, to disable
> > > > GUEST_ACTIVITY_HLT userspace also has to make CPU_BASED_HLT_EXITING a "must be
> > > > 1" control, otherwise KVM will be presenting a bogus model to L1.
> > > >
> > > > Don't fail writes that disable GUEST_ACTIVITY_HLT to maintain backwards
> > > > compatibility.
> > >
> > > Paolo, do we actually need to maintain backwards compatibility in this
> > > case?  This seems like a good candidate for "fix the bug and see who yells".
> >
> > Google's userspace clears bit 6. Please don't fail that write!
>
> Booooo.
>

Supporting activity state HLT is on our list of things to do, but I'm
not convinced that kvm actually handles it properly yet. For
instance...

What happens if L1 launches L2 into activity state HLT with a
zero-valued VMX preemption timer? (Maybe this is fixed now?)
What happens if "monitor trap flag" is set and "HLT exiting" is clear
in the vmcs12, and immediately on VM-entry, L2 executes HLT? (Yes,
this is a special case of MTF being broken when L0 emulates an L2
instruction.)

I'm sure there are other interesting scenarios that haven't been validated.
Sean Christopherson Aug. 21, 2019, 11:20 p.m. UTC | #5
On Wed, Aug 21, 2019 at 04:01:49PM -0700, Jim Mattson wrote:
> On Wed, Aug 21, 2019 at 3:22 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Wed, Aug 21, 2019 at 01:59:20PM -0700, Jim Mattson wrote:
> > > On Mon, Aug 19, 2019 at 3:11 PM Sean Christopherson
> > > <sean.j.christopherson@intel.com> wrote:
> > > >
> > > > On Tue, Aug 20, 2019 at 12:46:49AM +0300, Nikita Leshenko wrote:
> > > > > Before this commit, userspace could disable the GUEST_ACTIVITY_HLT bit in
> > > > > VMX_MISC yet KVM would happily accept GUEST_ACTIVITY_HLT activity state in
> > > > > VMCS12. We can fix it by either failing VM entries with HLT activity state when
> > > > > it's not supported or by disallowing clearing this bit.
> > > > >
> > > > > The latter is preferable. If we go with the former, to disable
> > > > > GUEST_ACTIVITY_HLT userspace also has to make CPU_BASED_HLT_EXITING a "must be
> > > > > 1" control, otherwise KVM will be presenting a bogus model to L1.
> > > > >
> > > > > Don't fail writes that disable GUEST_ACTIVITY_HLT to maintain backwards
> > > > > compatibility.
> > > >
> > > > Paolo, do we actually need to maintain backwards compatibility in this
> > > > case?  This seems like a good candidate for "fix the bug and see who yells".
> > >
> > > Google's userspace clears bit 6. Please don't fail that write!
> >
> > Booooo.
> >
> 
> Supporting activity state HLT is on our list of things to do, but I'm
> not convinced that kvm actually handles it properly yet. For
> instance...

I fully understand why you'd want to hide it from L1, I was just bummed
that we couldn't go with a quick and dirty fix :-)

> What happens if L1 launches L2 into activity state HLT with a
> zero-valued VMX preemption timer? (Maybe this is fixed now?)

I think that one got fixed in vmx_start_preemption_timer().

> What happens if "monitor trap flag" is set and "HLT exiting" is clear
> in the vmcs12, and immediately on VM-entry, L2 executes HLT? (Yes,
> this is a special case of MTF being broken when L0 emulates an L2
> instruction.)
> 
> I'm sure there are other interesting scenarios that haven't been validated.
Jim Mattson Aug. 22, 2019, 5:58 p.m. UTC | #6
On Mon, Aug 19, 2019 at 2:47 PM Nikita Leshenko
<nikita.leshchenko@oracle.com> wrote:
>
> Before this commit, userspace could disable the GUEST_ACTIVITY_HLT bit in
> VMX_MISC yet KVM would happily accept GUEST_ACTIVITY_HLT activity state in
> VMCS12. We can fix it by either failing VM entries with HLT activity state when
> it's not supported or by disallowing clearing this bit.
>
> The latter is preferable. If we go with the former, to disable
> GUEST_ACTIVITY_HLT userspace also has to make CPU_BASED_HLT_EXITING a "must be
> 1" control, otherwise KVM will be presenting a bogus model to L1.
>
> Don't fail writes that disable GUEST_ACTIVITY_HLT to maintain backwards
> compatibility.
>
> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 46af3a5e9209..24734946ec75 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -1102,6 +1102,14 @@ static int vmx_restore_vmx_misc(struct vcpu_vmx *vmx, u64 data)
>         if (vmx_misc_mseg_revid(data) != vmx_misc_mseg_revid(vmx_misc))
>                 return -EINVAL;
>
> +       /*
> +        * We always support HLT activity state. In the past it was possible to
> +        * turn HLT bit off (without actually turning off HLT activity state
> +        * support) so we don't fail vmx_restore_vmx_misc if this bit is turned
> +        * off.
> +        */
> +       data |= VMX_MISC_ACTIVITY_HLT;
> +
>         vmx->nested.msrs.misc_low = data;
>         vmx->nested.msrs.misc_high = data >> 32;
>

This change breaks live migration to an upgraded kernel, since it
doesn't allow the IA32_VMX_MISC MSR to be restored to its original
value. I think this warrants a quirk.
Nikita Leshenko Aug. 26, 2019, 11:30 a.m. UTC | #7
> On 21 Aug 2019, at 23:59, Jim Mattson <jmattson@google.com> wrote:
> 
> On Mon, Aug 19, 2019 at 3:11 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
>> 
>> On Tue, Aug 20, 2019 at 12:46:49AM +0300, Nikita Leshenko wrote:
>>> Before this commit, userspace could disable the GUEST_ACTIVITY_HLT bit in
>>> VMX_MISC yet KVM would happily accept GUEST_ACTIVITY_HLT activity state in
>>> VMCS12. We can fix it by either failing VM entries with HLT activity state when
>>> it's not supported or by disallowing clearing this bit.
>>> 
>>> The latter is preferable. If we go with the former, to disable
>>> GUEST_ACTIVITY_HLT userspace also has to make CPU_BASED_HLT_EXITING a "must be
>>> 1" control, otherwise KVM will be presenting a bogus model to L1.
>>> 
>>> Don't fail writes that disable GUEST_ACTIVITY_HLT to maintain backwards
>>> compatibility.
>> 
>> Paolo, do we actually need to maintain backwards compatibility in this
>> case?  This seems like a good candidate for "fix the bug and see who yells".
> 
> Google's userspace clears bit 6. Please don't fail that write!
What happens if the guest tries to use HLT activity state regardless of the bit?
Currently in KVM the guest will succeed, since there are no checks on VM entry for
that. I previously submitted a patch[1] that adds a check for that, what do you think
about it?

[1] https://patchwork.kernel.org/patch/11092209/

Nikita
diff mbox series

Patch

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 46af3a5e9209..24734946ec75 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1102,6 +1102,14 @@  static int vmx_restore_vmx_misc(struct vcpu_vmx *vmx, u64 data)
 	if (vmx_misc_mseg_revid(data) != vmx_misc_mseg_revid(vmx_misc))
 		return -EINVAL;
 
+	/*
+	 * We always support HLT activity state. In the past it was possible to
+	 * turn HLT bit off (without actually turning off HLT activity state
+	 * support) so we don't fail vmx_restore_vmx_misc if this bit is turned
+	 * off.
+	 */
+	data |= VMX_MISC_ACTIVITY_HLT;
+
 	vmx->nested.msrs.misc_low = data;
 	vmx->nested.msrs.misc_high = data >> 32;