diff mbox series

[v2,1/7] KVM: nVMX: Keep KVM updates to BNDCFGS ctrl bits across MSR write

Message ID 20220204204705.3538240-2-oupton@google.com (mailing list archive)
State New, archived
Headers show
Series VMX: nVMX: VMX control MSR fixes | expand

Commit Message

Oliver Upton Feb. 4, 2022, 8:46 p.m. UTC
Since commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls
when guest MPX disabled"), KVM has taken ownership of the "load
IA32_BNDCFGS" and "clear IA32_BNDCFGS" VMX entry/exit controls. The ABI
is that these bits must be set in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS
MSRs if the guest's CPUID supports MPX, and clear otherwise.

However, KVM will only do so if userspace sets the CPUID before writing
to the corresponding MSRs. Of course, there are no ordering requirements
between these ioctls. Uphold the ABI regardless of ordering by
reapplying KVMs tweaks to the VMX control MSRs after userspace has
written to them.

Fixes: 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled")
Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/x86/kvm/vmx/nested.c | 9 +++++++++
 arch/x86/kvm/vmx/vmx.c    | 2 +-
 arch/x86/kvm/vmx/vmx.h    | 2 ++
 3 files changed, 12 insertions(+), 1 deletion(-)

Comments

Paolo Bonzini Feb. 7, 2022, 5:21 p.m. UTC | #1
On 2/4/22 21:46, Oliver Upton wrote:
> Since commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls
> when guest MPX disabled"), KVM has taken ownership of the "load
> IA32_BNDCFGS" and "clear IA32_BNDCFGS" VMX entry/exit controls. The ABI
> is that these bits must be set in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS
> MSRs if the guest's CPUID supports MPX, and clear otherwise.
> 
> However, KVM will only do so if userspace sets the CPUID before writing
> to the corresponding MSRs. Of course, there are no ordering requirements
> between these ioctls. Uphold the ABI regardless of ordering by
> reapplying KVMs tweaks to the VMX control MSRs after userspace has
> written to them.

I don't understand this patch.  If you first write the CPUID and then 
the MSR, the consistency is upheld by these checks:

         if (!is_bitwise_subset(data, supported, GENMASK_ULL(31, 0)))
                 return -EINVAL;

         if (!is_bitwise_subset(supported, data, GENMASK_ULL(63, 32)))
                 return -EINVAL;

If you're fixing a case where userspace first writes the MSR and then 
sets CPUID, then I would expect that KVM_SET_CPUID2 redoes those checks 
and, if they fail, it adjusts the MSRs.

The selftests confirm that I'm a bit confused, but in general 
vmx_restore_control_msr is not the place where I was expecting the change.

Paolo

> Fixes: 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled")
> Signed-off-by: Oliver Upton <oupton@google.com>
> ---
>   arch/x86/kvm/vmx/nested.c | 9 +++++++++
>   arch/x86/kvm/vmx/vmx.c    | 2 +-
>   arch/x86/kvm/vmx/vmx.h    | 2 ++
>   3 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index ba34e94049c7..59164394569f 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -1291,6 +1291,15 @@ vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
>   
>   	*lowp = data;
>   	*highp = data >> 32;
> +
> +	/*
> +	 * Ensure KVM fiddling with these MSRs is preserved after userspace
> +	 * write.
> +	 */
> +	if (msr_index == MSR_IA32_VMX_TRUE_ENTRY_CTLS ||
> +	    msr_index == MSR_IA32_VMX_TRUE_EXIT_CTLS)
> +		nested_vmx_entry_exit_ctls_update(&vmx->vcpu);
> +
>   	return 0;
>   }
>   
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index aca3ae2a02f3..d63d6dfbadbf 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7227,7 +7227,7 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
>   #undef cr4_fixed1_update
>   }
>   
> -static void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu)
> +void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu)
>   {
>   	struct vcpu_vmx *vmx = to_vmx(vcpu);
>   
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 7f2c82e7f38f..e134e2763502 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -423,6 +423,8 @@ static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
>   
>   void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);
>   
> +void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu);
> +
>   /*
>    * Note, early Intel manuals have the write-low and read-high bitmap offsets
>    * the wrong way round.  The bitmaps control MSRs 0x00000000-0x00001fff and
Sean Christopherson Feb. 7, 2022, 6:13 p.m. UTC | #2
On Mon, Feb 07, 2022, Paolo Bonzini wrote:
> On 2/4/22 21:46, Oliver Upton wrote:
> > Since commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls
> > when guest MPX disabled"), KVM has taken ownership of the "load
> > IA32_BNDCFGS" and "clear IA32_BNDCFGS" VMX entry/exit controls. The ABI
> > is that these bits must be set in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS
> > MSRs if the guest's CPUID supports MPX, and clear otherwise.
> > 
> > However, KVM will only do so if userspace sets the CPUID before writing
> > to the corresponding MSRs. Of course, there are no ordering requirements
> > between these ioctls. Uphold the ABI regardless of ordering by
> > reapplying KVMs tweaks to the VMX control MSRs after userspace has
> > written to them.
> 
> I don't understand this patch.  If you first write the CPUID and then the
> MSR, the consistency is upheld by these checks:
> 
>         if (!is_bitwise_subset(data, supported, GENMASK_ULL(31, 0)))
>                 return -EINVAL;
> 
>         if (!is_bitwise_subset(supported, data, GENMASK_ULL(63, 32)))
>                 return -EINVAL;
> 
> If you're fixing a case where userspace first writes the MSR and then sets
> CPUID, then I would expect that KVM_SET_CPUID2 redoes those checks and, if
> they fail, it adjusts the MSRs.
> 
> The selftests confirm that I'm a bit confused, but in general
> vmx_restore_control_msr is not the place where I was expecting the change.

Do we even need this change?  The ABI is whatever it is, not what may or may not
have been intended by a flawed, 3+ year old commit.   E.g. if there's a userspace
that relies on being able to override KVM's tweaks by writing the MSRs after
setting CPUID, then this commit will break the ABI for that userspace.  The quirk
should be sufficient warning that KVM's behavior is funky.
Oliver Upton Feb. 7, 2022, 6:22 p.m. UTC | #3
Hi Paolo,

On Mon, Feb 07, 2022 at 06:21:30PM +0100, Paolo Bonzini wrote:
> On 2/4/22 21:46, Oliver Upton wrote:
> > Since commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls
> > when guest MPX disabled"), KVM has taken ownership of the "load
> > IA32_BNDCFGS" and "clear IA32_BNDCFGS" VMX entry/exit controls. The ABI
> > is that these bits must be set in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS
> > MSRs if the guest's CPUID supports MPX, and clear otherwise.
> > 
> > However, KVM will only do so if userspace sets the CPUID before writing
> > to the corresponding MSRs. Of course, there are no ordering requirements
> > between these ioctls. Uphold the ABI regardless of ordering by
> > reapplying KVMs tweaks to the VMX control MSRs after userspace has
> > written to them.
> 
> I don't understand this patch.  If you first write the CPUID and then the
> MSR, the consistency is upheld by these checks:
> 
>         if (!is_bitwise_subset(data, supported, GENMASK_ULL(31, 0)))
>                 return -EINVAL;
> 
>         if (!is_bitwise_subset(supported, data, GENMASK_ULL(63, 32)))
>                 return -EINVAL;

Right, this works if KVM chose to clear the bit, but userspace is trying
to set it. If KVM chose to set the bit, and userspace attempts to clear
it, these checks would pass.

> If you're fixing a case where userspace first writes the MSR and then sets
> CPUID, then I would expect that KVM_SET_CPUID2 redoes those checks and, if
> they fail, it adjusts the MSRs.

I am fixing the case where userspace issues KVM_SET_CPUID2 then writes
to the corresponding MSRs.

Until recently, this all sort of 'worked'. Since we called
kvm_update_cpuid() all the time it was possible for KVM to overwrite the
bits after the MSR write, just not immediately so. After the whole CPUID
rework, we only update the VMX control MSRs immediately after a
KVM_SET_CPUID2, meaning we've missed the case of MSR write after CPUID.

The entire spirit of this series is to back KVM out of touching these
bits, but it seemingly requires a quirk to do so given the userspace
expectations around these bits. Given that, these first two patches fix
the ordering issue between KVM_SET_CPUID2 and an MSR write and enforce
KVM ownership unconditionally.

--
Thanks,
Oliver
Paolo Bonzini Feb. 7, 2022, 6:27 p.m. UTC | #4
On 2/7/22 19:22, Oliver Upton wrote:
> Hi Paolo,
> 
> On Mon, Feb 07, 2022 at 06:21:30PM +0100, Paolo Bonzini wrote:
>> On 2/4/22 21:46, Oliver Upton wrote:
>>> Since commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls
>>> when guest MPX disabled"), KVM has taken ownership of the "load
>>> IA32_BNDCFGS" and "clear IA32_BNDCFGS" VMX entry/exit controls. The ABI
>>> is that these bits must be set in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS
>>> MSRs if the guest's CPUID supports MPX, and clear otherwise.
>>>
>>> However, KVM will only do so if userspace sets the CPUID before writing
>>> to the corresponding MSRs. Of course, there are no ordering requirements
>>> between these ioctls. Uphold the ABI regardless of ordering by
>>> reapplying KVMs tweaks to the VMX control MSRs after userspace has
>>> written to them.
>>
>> I don't understand this patch.  If you first write the CPUID and then the
>> MSR, the consistency is upheld by these checks:
>>
>>          if (!is_bitwise_subset(data, supported, GENMASK_ULL(31, 0)))
>>                  return -EINVAL;
>>
>>          if (!is_bitwise_subset(supported, data, GENMASK_ULL(63, 32)))
>>                  return -EINVAL;
> 
> Right, this works if KVM chose to clear the bit, but userspace is trying
> to set it. If KVM chose to set the bit, and userspace attempts to clear
> it, these checks would pass.

Okay, that's what I expected too but I thought it would be okay that the 
checks pass.  Are you trying to undo an involuntary API change, and if 
so why was the change a bug and not a fix?

Paolo
Sean Christopherson Feb. 7, 2022, 6:34 p.m. UTC | #5
On Mon, Feb 07, 2022, Oliver Upton wrote:
> Until recently, this all sort of 'worked'. Since we called
> kvm_update_cpuid() all the time it was possible for KVM to overwrite the
> bits after the MSR write, just not immediately so. After the whole CPUID
> rework, we only update the VMX control MSRs immediately after a
> KVM_SET_CPUID2, meaning we've missed the case of MSR write after CPUID.

That needs to be explained in the changelog (ditto for patch 02), and arguably
the Fixes tag is wrong too, or at least incomplete.  The commit that truly broke
things was

  aedbaf4f6afd ("KVM: x86: Extract kvm_update_cpuid_runtime() from kvm_update_cpuid()")

I'm guessing this is why Paolo is also confused.  Without understanding that KVM
used too (eventually) enforce its overrides, it looks like you're proposing an
arbitrary, unnecessary ABI change.
Oliver Upton Feb. 7, 2022, 6:52 p.m. UTC | #6
On Mon, Feb 07, 2022 at 06:34:19PM +0000, Sean Christopherson wrote:
> On Mon, Feb 07, 2022, Oliver Upton wrote:
> > Until recently, this all sort of 'worked'. Since we called
> > kvm_update_cpuid() all the time it was possible for KVM to overwrite the
> > bits after the MSR write, just not immediately so. After the whole CPUID
> > rework, we only update the VMX control MSRs immediately after a
> > KVM_SET_CPUID2, meaning we've missed the case of MSR write after CPUID.
> 
> That needs to be explained in the changelog (ditto for patch 02), and arguably
> the Fixes tag is wrong too, or at least incomplete.  The commit that truly broke
> things was
> 
>   aedbaf4f6afd ("KVM: x86: Extract kvm_update_cpuid_runtime() from kvm_update_cpuid()")
> 
> I'm guessing this is why Paolo is also confused.  Without understanding that KVM
> used too (eventually) enforce its overrides, it looks like you're proposing an
> arbitrary, unnecessary ABI change.

Gah, sorry, I really didn't provide the full context on this. I chose to
blame the original commits for these since it was still possible to
write the MSR and avoid a KVM update (just looking for paths where
kvm_update_cpuid() is not called), but agree that full breakage came
from the above commit.

I'll add some language discussing how commit aedbaf4f6afd ("KVM: x86: Extract
kvm_update_cpuid_runtime() from kvm_update_cpuid()") fully broke this.

--
Thanks,
Oliver
diff mbox series

Patch

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index ba34e94049c7..59164394569f 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1291,6 +1291,15 @@  vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
 
 	*lowp = data;
 	*highp = data >> 32;
+
+	/*
+	 * Ensure KVM fiddling with these MSRs is preserved after userspace
+	 * write.
+	 */
+	if (msr_index == MSR_IA32_VMX_TRUE_ENTRY_CTLS ||
+	    msr_index == MSR_IA32_VMX_TRUE_EXIT_CTLS)
+		nested_vmx_entry_exit_ctls_update(&vmx->vcpu);
+
 	return 0;
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index aca3ae2a02f3..d63d6dfbadbf 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7227,7 +7227,7 @@  static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
 #undef cr4_fixed1_update
 }
 
-static void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu)
+void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 7f2c82e7f38f..e134e2763502 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -423,6 +423,8 @@  static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
 
 void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);
 
+void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu);
+
 /*
  * Note, early Intel manuals have the write-low and read-high bitmap offsets
  * the wrong way round.  The bitmaps control MSRs 0x00000000-0x00001fff and