diff mbox series

[v4,1/8] KVM: nVMX: Keep KVM updates to BNDCFGS ctrl bits across MSR write

Message ID 20220301060351.442881-2-oupton@google.com (mailing list archive)
State New, archived
Headers show
Series KVM: x86: VMX ctrl MSR + KVM quirk fixes | expand

Commit Message

Oliver Upton March 1, 2022, 6:03 a.m. UTC
Since commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls
when guest MPX disabled"), KVM has taken ownership of the "load
IA32_BNDCFGS" and "clear IA32_BNDCFGS" VMX entry/exit controls. The ABI
is that these bits must be set in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS
MSRs if the guest's CPUID supports MPX, and clear otherwise.

However, commit aedbaf4f6afd ("KVM: x86: Extract
kvm_update_cpuid_runtime() from kvm_update_cpuid()") partially broke KVM
ownership of the aforementioned bits. Before, kvm_update_cpuid() was
exercised frequently when running a guest and constantly applied its own
changes to the BNDCFGS bits. Now, the BNDCFGS bits are only ever
updated after a KVM_SET_CPUID/KVM_SET_CPUID2 ioctl, meaning that a
subsequent MSR write from userspace will clobber these values.

Uphold the old ABI by reapplying KVM's tweaks to the BNDCFGS bits after
an MSR write from userspace.

Fixes: aedbaf4f6afd ("KVM: x86: Extract kvm_update_cpuid_runtime() from kvm_update_cpuid()")
Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/x86/kvm/vmx/nested.c | 9 +++++++++
 arch/x86/kvm/vmx/vmx.c    | 2 +-
 arch/x86/kvm/vmx/vmx.h    | 2 ++
 3 files changed, 12 insertions(+), 1 deletion(-)

Comments

Paolo Bonzini March 1, 2022, 6 p.m. UTC | #1
On 3/1/22 07:03, Oliver Upton wrote:
> +
> +	/*
> +	 * Ensure KVM fiddling with these MSRs is preserved after userspace
> +	 * write.
> +	 */
> +	if (msr_index == MSR_IA32_VMX_TRUE_ENTRY_CTLS ||
> +	    msr_index == MSR_IA32_VMX_TRUE_EXIT_CTLS)
> +		nested_vmx_entry_exit_ctls_update(&vmx->vcpu);
> +

I still don't understand this patch.  You say:

> Now, the BNDCFGS bits are only ever
> updated after a KVM_SET_CPUID/KVM_SET_CPUID2 ioctl, meaning that a
> subsequent MSR write from userspace will clobber these values.

but I don't understand what's wrong with that.  If you can (if so 
inclined) define a VM without LOAD_BNDCFGS or CLEAR_BNDCFGS even if MPX 
enabled, commit aedbaf4f6afd counts as a bugfix.

Paolo
Oliver Upton March 1, 2022, 6:43 p.m. UTC | #2
Hi Paolo,

On Tue, Mar 01, 2022 at 07:00:57PM +0100, Paolo Bonzini wrote:
> On 3/1/22 07:03, Oliver Upton wrote:
> > +
> > +	/*
> > +	 * Ensure KVM fiddling with these MSRs is preserved after userspace
> > +	 * write.
> > +	 */
> > +	if (msr_index == MSR_IA32_VMX_TRUE_ENTRY_CTLS ||
> > +	    msr_index == MSR_IA32_VMX_TRUE_EXIT_CTLS)
> > +		nested_vmx_entry_exit_ctls_update(&vmx->vcpu);
> > +
> 
> I still don't understand this patch.  You say:
> 
> > Now, the BNDCFGS bits are only ever
> > updated after a KVM_SET_CPUID/KVM_SET_CPUID2 ioctl, meaning that a
> > subsequent MSR write from userspace will clobber these values.
> 
> but I don't understand what's wrong with that.  If you can (if so inclined)
> define a VM without LOAD_BNDCFGS or CLEAR_BNDCFGS even if MPX enabled,
> commit aedbaf4f6afd counts as a bugfix.

Right, a 1-setting of '{load,clear} IA32_BNDCFGS' should really be the
responsibility of userspace. My issue is that the commit message in
commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when
guest MPX disabled") suggests that userspace can expect these bits to be
configured based on guest CPUID. Furthermore, before commit aedbaf4f6afd
("KVM: x86: Extract kvm_update_cpuid_runtime() from
kvm_update_cpuid()"), if userspace clears these bits, KVM will continue
to set them based on CPUID.

What is the userspace expectation here? If we are saying that changes to
IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS after userspace writes these MSRs is a
bug, then I agree aedbaf4f6afd is in fact a bugfix. But, the commit
message in 5f76f6f5ff96 seems to indicate that userspace wants KVM to
configure these bits based on guest CPUID.

Given that there were previous userspace expectations, I attempted to
restore the old behavior of KVM (ignore userspace writes) and add a
quirk to fully back out of the mess. All this logic also applies to
Patch 2 as well.

--
Oliver
Paolo Bonzini March 2, 2022, 12:21 p.m. UTC | #3
On 3/1/22 19:43, Oliver Upton wrote:
> Right, a 1-setting of '{load,clear} IA32_BNDCFGS' should really be the
> responsibility of userspace. My issue is that the commit message in
> commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when
> guest MPX disabled") suggests that userspace can expect these bits to be
> configured based on guest CPUID. Furthermore, before commit aedbaf4f6afd
> ("KVM: x86: Extract kvm_update_cpuid_runtime() from
> kvm_update_cpuid()"), if userspace clears these bits, KVM will continue
> to set them based on CPUID.
> 
> What is the userspace expectation here? If we are saying that changes to
> IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS after userspace writes these MSRs is a
> bug, then I agree aedbaf4f6afd is in fact a bugfix. But, the commit
> message in 5f76f6f5ff96 seems to indicate that userspace wants KVM to
> configure these bits based on guest CPUID.

Yes, but I think it's reasonable that userspace wants to override them. 
  It has to do that after KVM_SET_CPUID2, but that's okay too.

Paolo
Oliver Upton March 2, 2022, 8:51 p.m. UTC | #4
On Wed, Mar 02, 2022 at 01:21:23PM +0100, Paolo Bonzini wrote:
> On 3/1/22 19:43, Oliver Upton wrote:
> > Right, a 1-setting of '{load,clear} IA32_BNDCFGS' should really be the
> > responsibility of userspace. My issue is that the commit message in
> > commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when
> > guest MPX disabled") suggests that userspace can expect these bits to be
> > configured based on guest CPUID. Furthermore, before commit aedbaf4f6afd
> > ("KVM: x86: Extract kvm_update_cpuid_runtime() from
> > kvm_update_cpuid()"), if userspace clears these bits, KVM will continue
> > to set them based on CPUID.
> > 
> > What is the userspace expectation here? If we are saying that changes to
> > IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS after userspace writes these MSRs is a
> > bug, then I agree aedbaf4f6afd is in fact a bugfix. But, the commit
> > message in 5f76f6f5ff96 seems to indicate that userspace wants KVM to
> > configure these bits based on guest CPUID.
> 
> Yes, but I think it's reasonable that userspace wants to override them.  It
> has to do that after KVM_SET_CPUID2, but that's okay too.
> 

In that case, I can rework the tests at the end of this series to ensure
userspace's ability to override w/o a quirk. Sorry for the toil,
aedbaf4f6afd caused some breakage for us internally, but really is just
a userspace bug.

Is it possible to pick up patch 4/8 "KVM: x86: Introduce
KVM_CAP_DISABLE_QUIRKS2" independent of the rest of this series?

--
Thanks,
Oliver
Paolo Bonzini March 2, 2022, 9:22 p.m. UTC | #5
On 3/2/22 21:51, Oliver Upton wrote:
> On Wed, Mar 02, 2022 at 01:21:23PM +0100, Paolo Bonzini wrote:
>> On 3/1/22 19:43, Oliver Upton wrote:
>>> Right, a 1-setting of '{load,clear} IA32_BNDCFGS' should really be the
>>> responsibility of userspace. My issue is that the commit message in
>>> commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when
>>> guest MPX disabled") suggests that userspace can expect these bits to be
>>> configured based on guest CPUID. Furthermore, before commit aedbaf4f6afd
>>> ("KVM: x86: Extract kvm_update_cpuid_runtime() from
>>> kvm_update_cpuid()"), if userspace clears these bits, KVM will continue
>>> to set them based on CPUID.
>>>
>>> What is the userspace expectation here? If we are saying that changes to
>>> IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS after userspace writes these MSRs is a
>>> bug, then I agree aedbaf4f6afd is in fact a bugfix. But, the commit
>>> message in 5f76f6f5ff96 seems to indicate that userspace wants KVM to
>>> configure these bits based on guest CPUID.
>>
>> Yes, but I think it's reasonable that userspace wants to override them.  It
>> has to do that after KVM_SET_CPUID2, but that's okay too.
>>
> 
> In that case, I can rework the tests at the end of this series to ensure
> userspace's ability to override w/o a quirk. Sorry for the toil,
> aedbaf4f6afd caused some breakage for us internally, but really is just
> a userspace bug.

How did vanadium break?

Paolo

> Is it possible to pick up patch 4/8 "KVM: x86: Introduce
> KVM_CAP_DISABLE_QUIRKS2" independent of the rest of this series?
Oliver Upton March 2, 2022, 9:54 p.m. UTC | #6
On Wed, Mar 02, 2022 at 10:22:43PM +0100, Paolo Bonzini wrote:
> On 3/2/22 21:51, Oliver Upton wrote:
> > On Wed, Mar 02, 2022 at 01:21:23PM +0100, Paolo Bonzini wrote:
> > > On 3/1/22 19:43, Oliver Upton wrote:
> > > > Right, a 1-setting of '{load,clear} IA32_BNDCFGS' should really be the
> > > > responsibility of userspace. My issue is that the commit message in
> > > > commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when
> > > > guest MPX disabled") suggests that userspace can expect these bits to be
> > > > configured based on guest CPUID. Furthermore, before commit aedbaf4f6afd
> > > > ("KVM: x86: Extract kvm_update_cpuid_runtime() from
> > > > kvm_update_cpuid()"), if userspace clears these bits, KVM will continue
> > > > to set them based on CPUID.
> > > > 
> > > > What is the userspace expectation here? If we are saying that changes to
> > > > IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS after userspace writes these MSRs is a
> > > > bug, then I agree aedbaf4f6afd is in fact a bugfix. But, the commit
> > > > message in 5f76f6f5ff96 seems to indicate that userspace wants KVM to
> > > > configure these bits based on guest CPUID.
> > > 
> > > Yes, but I think it's reasonable that userspace wants to override them.  It
> > > has to do that after KVM_SET_CPUID2, but that's okay too.
> > > 
> > 
> > In that case, I can rework the tests at the end of this series to ensure
> > userspace's ability to override w/o a quirk. Sorry for the toil,
> > aedbaf4f6afd caused some breakage for us internally, but really is just
> > a userspace bug.
> 
> How did vanadium break?

Maybe I can redirect you to a test case to highlight a possible
regression in KVM, as seen by userspace ;-)

from Patch 7/8:

> +	/*
> +	 * Test that KVM will set these bits regardless of userspace if the
> +	 * guest CPUID exposes a supporting vPMU.
> +	 */
> +	test_vmx_control_msr(vm, MSR_IA32_VMX_TRUE_ENTRY_CTLS,
> +			     0,						/* set */
> +			     VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL,	/* clear */
> +			     VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL,	/* exp_set */
> +			     0);					/* exp_clear */
> +	test_vmx_control_msr(vm, MSR_IA32_VMX_TRUE_EXIT_CTLS,
> +			     0,						/* set */
> +			     VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL,	/* clear */
> +			     VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL,	/* exp_set */
> +			     0);					/* exp_clear */

Same goes for the "{load,clear} IA32_BNDCFGS" bits too.

--
Thanks,
Oliver
Sean Christopherson March 3, 2022, 1:43 a.m. UTC | #7
On Wed, Mar 02, 2022, Oliver Upton wrote:
> On Wed, Mar 02, 2022 at 10:22:43PM +0100, Paolo Bonzini wrote:
> > On 3/2/22 21:51, Oliver Upton wrote:
> > > On Wed, Mar 02, 2022 at 01:21:23PM +0100, Paolo Bonzini wrote:
> > > > On 3/1/22 19:43, Oliver Upton wrote:
> > > > > Right, a 1-setting of '{load,clear} IA32_BNDCFGS' should really be the
> > > > > responsibility of userspace. My issue is that the commit message in
> > > > > commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when
> > > > > guest MPX disabled") suggests that userspace can expect these bits to be
> > > > > configured based on guest CPUID. Furthermore, before commit aedbaf4f6afd
> > > > > ("KVM: x86: Extract kvm_update_cpuid_runtime() from
> > > > > kvm_update_cpuid()"), if userspace clears these bits, KVM will continue
> > > > > to set them based on CPUID.
> > > > > 
> > > > > What is the userspace expectation here? If we are saying that changes to
> > > > > IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS after userspace writes these MSRs is a
> > > > > bug, then I agree aedbaf4f6afd is in fact a bugfix. But, the commit
> > > > > message in 5f76f6f5ff96 seems to indicate that userspace wants KVM to
> > > > > configure these bits based on guest CPUID.
> > > > 
> > > > Yes, but I think it's reasonable that userspace wants to override them.  It
> > > > has to do that after KVM_SET_CPUID2, but that's okay too.
> > > > 
> > > 
> > > In that case, I can rework the tests at the end of this series to ensure
> > > userspace's ability to override w/o a quirk. Sorry for the toil,
> > > aedbaf4f6afd caused some breakage for us internally, but really is just
> > > a userspace bug.
> > 
> > How did vanadium break?
> 
> Maybe I can redirect you to a test case to highlight a possible
> regression in KVM, as seen by userspace ;-)

Regressions aside, VMCS controls are not tied to CPUID, KVM should not be mucking
with unrelated things.  The original hack was to fix a userspace bug and should
never have been mreged.
Paolo Bonzini March 3, 2022, 6:29 a.m. UTC | #8
On 3/3/22 02:43, Sean Christopherson wrote:
>> Maybe I can redirect you to a test case to highlight a possible
>> regression in KVM, as seen by userspace;-)
> Regressions aside, VMCS controls are not tied to CPUID, KVM should not be mucking
> with unrelated things.  The original hack was to fix a userspace bug and should
> never have been mreged.

Note that it dates back to:

     commit 5f76f6f5ff96587af5acd5930f7d9fea81e0d1a8
     Author: Liran Alon <liran.alon@oracle.com>
     Date:   Fri Sep 14 03:25:52 2018 +0300

     KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled
     
     Before this commit, KVM exposes MPX VMX controls to L1 guest only based
     on if KVM and host processor supports MPX virtualization.
     However, these controls should be exposed to guest only in case guest
     vCPU supports MPX.

It's not to fix a userspace bug, it's to support userspace that doesn't
know about using KVM_SET_MSR for VMX features---which is okay since unlike
KVM_SET_CPUID2 it's not a mandatory call.

Paolo
Sean Christopherson March 3, 2022, 4:15 p.m. UTC | #9
On Thu, Mar 03, 2022, Paolo Bonzini wrote:
> On 3/3/22 02:43, Sean Christopherson wrote:
> > > Maybe I can redirect you to a test case to highlight a possible
> > > regression in KVM, as seen by userspace;-)
> > Regressions aside, VMCS controls are not tied to CPUID, KVM should not be mucking
> > with unrelated things.  The original hack was to fix a userspace bug and should
> > never have been mreged.
> 
> Note that it dates back to:
> 
>     commit 5f76f6f5ff96587af5acd5930f7d9fea81e0d1a8
>     Author: Liran Alon <liran.alon@oracle.com>
>     Date:   Fri Sep 14 03:25:52 2018 +0300
> 
>     KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled
>     Before this commit, KVM exposes MPX VMX controls to L1 guest only based
>     on if KVM and host processor supports MPX virtualization.
>     However, these controls should be exposed to guest only in case guest
>     vCPU supports MPX.
> 
> It's not to fix a userspace bug, it's to support userspace that doesn't
> know about using KVM_SET_MSR for VMX features---which is okay since unlike
> KVM_SET_CPUID2 it's not a mandatory call.

I disagree, IMO failure to properly configure the vCPU model is a userspace bug.
Maybe it was a userspace bug induced by a haphazard and/or poorly documented KVM
ABI, but it's still a userspace bug.  One could argue that KVM should disable/clear
VMX features if userspace clears a related CPUID feature, but _setting_ a VMX
feature based on CPUID is architecturally wrong.  Even if we consider one or both
cases to be desirable behavior in terms of creating a consistent vCPU model, forcing
a consistent vCPU model for this one case goes against every other ioctl in KVM's
ABI.

If we consider it KVM's responsibility to propagate CPUID state to VMX MSRs, then
KVM has a bunch of "bugs".

  X86_FEATURE_LM => VM_EXIT_HOST_ADDR_SPACE_SIZE, VM_ENTRY_IA32E_MODE, VMX_MISC_SAVE_EFER_LMA

  X86_FEATURE_TSC => CPU_BASED_RDTSC_EXITING, CPU_BASED_USE_TSC_OFFSETTING,
                     SECONDARY_EXEC_TSC_SCALING

  X86_FEATURE_INVPCID_SINGLE => SECONDARY_EXEC_ENABLE_INVPCID

  X86_FEATURE_MWAIT => CPU_BASED_MONITOR_EXITING, CPU_BASED_MWAIT_EXITING

  X86_FEATURE_INTEL_PT => SECONDARY_EXEC_PT_CONCEAL_VMX, SECONDARY_EXEC_PT_USE_GPA,
                          VM_EXIT_CLEAR_IA32_RTIT_CTL, VM_ENTRY_LOAD_IA32_RTIT_CTL

  X86_FEATURE_XSAVES => SECONDARY_EXEC_XSAVES
Jim Mattson March 3, 2022, 9:44 p.m. UTC | #10
On Thu, Mar 3, 2022 at 8:15 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Mar 03, 2022, Paolo Bonzini wrote:
> > On 3/3/22 02:43, Sean Christopherson wrote:
> > > > Maybe I can redirect you to a test case to highlight a possible
> > > > regression in KVM, as seen by userspace;-)
> > > Regressions aside, VMCS controls are not tied to CPUID, KVM should not be mucking
> > > with unrelated things.  The original hack was to fix a userspace bug and should
> > > never have been mreged.
> >
> > Note that it dates back to:
> >
> >     commit 5f76f6f5ff96587af5acd5930f7d9fea81e0d1a8
> >     Author: Liran Alon <liran.alon@oracle.com>
> >     Date:   Fri Sep 14 03:25:52 2018 +0300
> >
> >     KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled
> >     Before this commit, KVM exposes MPX VMX controls to L1 guest only based
> >     on if KVM and host processor supports MPX virtualization.
> >     However, these controls should be exposed to guest only in case guest
> >     vCPU supports MPX.
> >
> > It's not to fix a userspace bug, it's to support userspace that doesn't
> > know about using KVM_SET_MSR for VMX features---which is okay since unlike
> > KVM_SET_CPUID2 it's not a mandatory call.
>
> I disagree, IMO failure to properly configure the vCPU model is a userspace bug.
> Maybe it was a userspace bug induced by a haphazard and/or poorly documented KVM
> ABI, but it's still a userspace bug.  One could argue that KVM should disable/clear
> VMX features if userspace clears a related CPUID feature, but _setting_ a VMX
> feature based on CPUID is architecturally wrong.  Even if we consider one or both
> cases to be desirable behavior in terms of creating a consistent vCPU model, forcing
> a consistent vCPU model for this one case goes against every other ioctl in KVM's
> ABI.
>
> If we consider it KVM's responsibility to propagate CPUID state to VMX MSRs, then
> KVM has a bunch of "bugs".
>
>   X86_FEATURE_LM => VM_EXIT_HOST_ADDR_SPACE_SIZE, VM_ENTRY_IA32E_MODE, VMX_MISC_SAVE_EFER_LMA
>
>   X86_FEATURE_TSC => CPU_BASED_RDTSC_EXITING, CPU_BASED_USE_TSC_OFFSETTING,
>                      SECONDARY_EXEC_TSC_SCALING
>
>   X86_FEATURE_INVPCID_SINGLE => SECONDARY_EXEC_ENABLE_INVPCID
>
>   X86_FEATURE_MWAIT => CPU_BASED_MONITOR_EXITING, CPU_BASED_MWAIT_EXITING
>
>   X86_FEATURE_INTEL_PT => SECONDARY_EXEC_PT_CONCEAL_VMX, SECONDARY_EXEC_PT_USE_GPA,
>                           VM_EXIT_CLEAR_IA32_RTIT_CTL, VM_ENTRY_LOAD_IA32_RTIT_CTL
>
>   X86_FEATURE_XSAVES => SECONDARY_EXEC_XSAVES

I don't disagree with you, but this does beg the question, "What's
going on with all of the invocations of cr4_fixed1_update()?"
Sean Christopherson March 3, 2022, 11:44 p.m. UTC | #11
On Thu, Mar 03, 2022, Jim Mattson wrote:
> On Thu, Mar 3, 2022 at 8:15 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Mar 03, 2022, Paolo Bonzini wrote:
> > > On 3/3/22 02:43, Sean Christopherson wrote:
> > > > > Maybe I can redirect you to a test case to highlight a possible
> > > > > regression in KVM, as seen by userspace;-)
> > > > Regressions aside, VMCS controls are not tied to CPUID, KVM should not be mucking
> > > > with unrelated things.  The original hack was to fix a userspace bug and should
> > > > never have been mreged.
> > >
> > > Note that it dates back to:
> > >
> > >     commit 5f76f6f5ff96587af5acd5930f7d9fea81e0d1a8
> > >     Author: Liran Alon <liran.alon@oracle.com>
> > >     Date:   Fri Sep 14 03:25:52 2018 +0300
> > >
> > >     KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled
> > >     Before this commit, KVM exposes MPX VMX controls to L1 guest only based
> > >     on if KVM and host processor supports MPX virtualization.
> > >     However, these controls should be exposed to guest only in case guest
> > >     vCPU supports MPX.
> > >
> > > It's not to fix a userspace bug, it's to support userspace that doesn't
> > > know about using KVM_SET_MSR for VMX features---which is okay since unlike
> > > KVM_SET_CPUID2 it's not a mandatory call.
> >
> > I disagree, IMO failure to properly configure the vCPU model is a userspace bug.
> > Maybe it was a userspace bug induced by a haphazard and/or poorly documented KVM
> > ABI, but it's still a userspace bug.  One could argue that KVM should disable/clear
> > VMX features if userspace clears a related CPUID feature, but _setting_ a VMX
> > feature based on CPUID is architecturally wrong.  Even if we consider one or both
> > cases to be desirable behavior in terms of creating a consistent vCPU model, forcing
> > a consistent vCPU model for this one case goes against every other ioctl in KVM's
> > ABI.
> >
> > If we consider it KVM's responsibility to propagate CPUID state to VMX MSRs, then
> > KVM has a bunch of "bugs".
> >
> >   X86_FEATURE_LM => VM_EXIT_HOST_ADDR_SPACE_SIZE, VM_ENTRY_IA32E_MODE, VMX_MISC_SAVE_EFER_LMA
> >
> >   X86_FEATURE_TSC => CPU_BASED_RDTSC_EXITING, CPU_BASED_USE_TSC_OFFSETTING,
> >                      SECONDARY_EXEC_TSC_SCALING
> >
> >   X86_FEATURE_INVPCID_SINGLE => SECONDARY_EXEC_ENABLE_INVPCID
> >
> >   X86_FEATURE_MWAIT => CPU_BASED_MONITOR_EXITING, CPU_BASED_MWAIT_EXITING
> >
> >   X86_FEATURE_INTEL_PT => SECONDARY_EXEC_PT_CONCEAL_VMX, SECONDARY_EXEC_PT_USE_GPA,
> >                           VM_EXIT_CLEAR_IA32_RTIT_CTL, VM_ENTRY_LOAD_IA32_RTIT_CTL
> >
> >   X86_FEATURE_XSAVES => SECONDARY_EXEC_XSAVES
> 
> I don't disagree with you, but this does beg the question, "What's
> going on with all of the invocations of cr4_fixed1_update()?"

Boo, I forgot legal CR4 is controlled via MSRs too.  Ha!  That's a bug in nVMX.
nVMX only checks msrs.cr4_fixed0/1, it doesn't check "cr4_reserved_bits", which
is KVM's set of host reserved bits.  That means userspace can bypass those reserved
bits by setting guest CPUID and/or VMX MSRs and loading CR4 via VM-Enter/VM-Exit.

The immediate nVMX bug can be fixed by calling kvm_is_valid_cr4(), which calls
back into nVMX to do the VMX MSR checks.

My vote would be to include nested_vmx_cr_fixed1_bits_update() in the quirk, but
keep the guest CPUID enforcement that's in kvm_is_valid_cr4().  I.e. let userspace
further restrict CR4, but don't let it allow nested VM-Enter/VM-Exit to load bits
that L1 can't set via MOV CR4.

I'll send this as a proper patch:

diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index c92cea0b8ccc..46dd1967ec08 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -285,8 +285,8 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
 }

 /* No difference in the restrictions on guest and host CR4 in VMX operation. */
-#define nested_guest_cr4_valid nested_cr4_valid
-#define nested_host_cr4_valid  nested_cr4_valid
+#define nested_guest_cr4_valid kvm_is_valid_cr4
+#define nested_host_cr4_valid  kvm_is_valid_cr4

 extern struct kvm_x86_nested_ops vmx_nested_ops;
Paolo Bonzini March 4, 2022, 3:50 p.m. UTC | #12
On 3/4/22 00:44, Sean Christopherson wrote:
> 
> diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
> index c92cea0b8ccc..46dd1967ec08 100644
> --- a/arch/x86/kvm/vmx/nested.h
> +++ b/arch/x86/kvm/vmx/nested.h
> @@ -285,8 +285,8 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
>  }
> 
>  /* No difference in the restrictions on guest and host CR4 in VMX operation. */
> -#define nested_guest_cr4_valid nested_cr4_valid
> -#define nested_host_cr4_valid  nested_cr4_valid
> +#define nested_guest_cr4_valid kvm_is_valid_cr4
> +#define nested_host_cr4_valid  kvm_is_valid_cr4

This doesn't allow the theoretically possible case of L0 setting some 
CR4-fixed-0 bits for L1.  I'll send another one.

Paolo
Sean Christopherson April 7, 2022, 12:26 a.m. UTC | #13
On Fri, Mar 04, 2022, Paolo Bonzini wrote:
> On 3/4/22 00:44, Sean Christopherson wrote:
> > 
> > diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
> > index c92cea0b8ccc..46dd1967ec08 100644
> > --- a/arch/x86/kvm/vmx/nested.h
> > +++ b/arch/x86/kvm/vmx/nested.h
> > @@ -285,8 +285,8 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
> >  }
> > 
> >  /* No difference in the restrictions on guest and host CR4 in VMX operation. */
> > -#define nested_guest_cr4_valid nested_cr4_valid
> > -#define nested_host_cr4_valid  nested_cr4_valid
> > +#define nested_guest_cr4_valid kvm_is_valid_cr4
> > +#define nested_host_cr4_valid  kvm_is_valid_cr4
> 
> This doesn't allow the theoretically possible case of L0 setting some
> CR4-fixed-0 bits for L1.  I'll send another one.

Are you still planning on sending a proper patch for this?

And more importantly, have we shifted your view on this patch/series?
Oliver Upton April 7, 2022, 12:29 a.m. UTC | #14
Hey Sean,

On Wed, Apr 6, 2022 at 5:26 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Mar 04, 2022, Paolo Bonzini wrote:
> > On 3/4/22 00:44, Sean Christopherson wrote:
> > >
> > > diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
> > > index c92cea0b8ccc..46dd1967ec08 100644
> > > --- a/arch/x86/kvm/vmx/nested.h
> > > +++ b/arch/x86/kvm/vmx/nested.h
> > > @@ -285,8 +285,8 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
> > >  }
> > >
> > >  /* No difference in the restrictions on guest and host CR4 in VMX operation. */
> > > -#define nested_guest_cr4_valid nested_cr4_valid
> > > -#define nested_host_cr4_valid  nested_cr4_valid
> > > +#define nested_guest_cr4_valid kvm_is_valid_cr4
> > > +#define nested_host_cr4_valid  kvm_is_valid_cr4
> >
> > This doesn't allow the theoretically possible case of L0 setting some
> > CR4-fixed-0 bits for L1.  I'll send another one.
>
> Are you still planning on sending a proper patch for this?
>
> And more importantly, have we shifted your view on this patch/series?

Sorry, I should've followed up. If nobody else complains, let's just
leave everything as-is and avoid repeating the mistakes of the patches
to blame (hey, I authored one of those!)

--
Best,
Oliver
Oliver Upton April 7, 2022, 12:32 a.m. UTC | #15
On Wed, Apr 6, 2022 at 5:29 PM Oliver Upton <oupton@google.com> wrote:
>
> Hey Sean,
>
> On Wed, Apr 6, 2022 at 5:26 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Fri, Mar 04, 2022, Paolo Bonzini wrote:
> > > On 3/4/22 00:44, Sean Christopherson wrote:
> > > >
> > > > diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
> > > > index c92cea0b8ccc..46dd1967ec08 100644
> > > > --- a/arch/x86/kvm/vmx/nested.h
> > > > +++ b/arch/x86/kvm/vmx/nested.h
> > > > @@ -285,8 +285,8 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
> > > >  }
> > > >
> > > >  /* No difference in the restrictions on guest and host CR4 in VMX operation. */
> > > > -#define nested_guest_cr4_valid nested_cr4_valid
> > > > -#define nested_host_cr4_valid  nested_cr4_valid
> > > > +#define nested_guest_cr4_valid kvm_is_valid_cr4
> > > > +#define nested_host_cr4_valid  kvm_is_valid_cr4
> > >
> > > This doesn't allow the theoretically possible case of L0 setting some
> > > CR4-fixed-0 bits for L1.  I'll send another one.
> >
> > Are you still planning on sending a proper patch for this?
> >
> > And more importantly, have we shifted your view on this patch/series?
>
> Sorry, I should've followed up. If nobody else complains, let's just
> leave everything as-is and avoid repeating the mistakes of the patches
> to blame (hey, I authored one of those!)

Well, that is to say we do nothing to plug the half-baked behavior of
changing MSRs based on CPUID. Quirk the rest of it away if we don't
want to fudge with these bits any more :)

--
Thanks,
Oliver
Sean Christopherson April 7, 2022, 12:34 a.m. UTC | #16
On Wed, Apr 06, 2022, Oliver Upton wrote:
> Hey Sean,
> 
> On Wed, Apr 6, 2022 at 5:26 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Fri, Mar 04, 2022, Paolo Bonzini wrote:
> > > On 3/4/22 00:44, Sean Christopherson wrote:
> > > >
> > > > diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
> > > > index c92cea0b8ccc..46dd1967ec08 100644
> > > > --- a/arch/x86/kvm/vmx/nested.h
> > > > +++ b/arch/x86/kvm/vmx/nested.h
> > > > @@ -285,8 +285,8 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
> > > >  }
> > > >
> > > >  /* No difference in the restrictions on guest and host CR4 in VMX operation. */
> > > > -#define nested_guest_cr4_valid nested_cr4_valid
> > > > -#define nested_host_cr4_valid  nested_cr4_valid
> > > > +#define nested_guest_cr4_valid kvm_is_valid_cr4
> > > > +#define nested_host_cr4_valid  kvm_is_valid_cr4
> > >
> > > This doesn't allow the theoretically possible case of L0 setting some
> > > CR4-fixed-0 bits for L1.  I'll send another one.
> >
> > Are you still planning on sending a proper patch for this?
> >
> > And more importantly, have we shifted your view on this patch/series?
> 
> Sorry, I should've followed up. If nobody else complains, let's just
> leave everything as-is and avoid repeating the mistakes of the patches
> to blame (hey, I authored one of those!)

The problem is that if we leave things as is, someone will inevitably think it's
the right thing to do and will repeat those mistakes.  I don't see why we wouldn't
add the quirk, broken userspace gets to keep its broken behavior unless it opts
into to disabling the quirk.
Sean Christopherson May 27, 2022, 4:55 p.m. UTC | #17
/cast resurrect

On Fri, Mar 04, 2022, Paolo Bonzini wrote:
> On 3/4/22 00:44, Sean Christopherson wrote:
> > 
> > diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
> > index c92cea0b8ccc..46dd1967ec08 100644
> > --- a/arch/x86/kvm/vmx/nested.h
> > +++ b/arch/x86/kvm/vmx/nested.h
> > @@ -285,8 +285,8 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
> >  }
> > 
> >  /* No difference in the restrictions on guest and host CR4 in VMX operation. */
> > -#define nested_guest_cr4_valid nested_cr4_valid
> > -#define nested_host_cr4_valid  nested_cr4_valid
> > +#define nested_guest_cr4_valid kvm_is_valid_cr4
> > +#define nested_host_cr4_valid  kvm_is_valid_cr4
> 
> This doesn't allow the theoretically possible case of L0 setting some
> CR4-fixed-0 bits for L1.  I'll send another one.

Ha!  My "patch" is correct.  kvm_is_valid_cr4() calls vmx_is_valid_cr4(), which
calls nested_cr4_valid() when the vCPU is post-VMXON, so it _does_ cover the fixed0
case.  I'll send a proper patch with a comment to call out that subtlety.
diff mbox series

Patch

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 1dfe23963a9e..a13f8f4e3d82 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1291,6 +1291,15 @@  vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
 
 	*lowp = data;
 	*highp = data >> 32;
+
+	/*
+	 * Ensure KVM fiddling with these MSRs is preserved after userspace
+	 * write.
+	 */
+	if (msr_index == MSR_IA32_VMX_TRUE_ENTRY_CTLS ||
+	    msr_index == MSR_IA32_VMX_TRUE_EXIT_CTLS)
+		nested_vmx_entry_exit_ctls_update(&vmx->vcpu);
+
 	return 0;
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b325f99b2177..3a97220c5f78 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7246,7 +7246,7 @@  static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
 #undef cr4_fixed1_update
 }
 
-static void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu)
+void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 7f2c82e7f38f..e134e2763502 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -423,6 +423,8 @@  static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
 
 void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);
 
+void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu);
+
 /*
  * Note, early Intel manuals have the write-low and read-high bitmap offsets
  * the wrong way round.  The bitmaps control MSRs 0x00000000-0x00001fff and