Message ID | 20220204204705.3538240-2-oupton@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | VMX: nVMX: VMX control MSR fixes | expand |
On 2/4/22 21:46, Oliver Upton wrote: > Since commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls > when guest MPX disabled"), KVM has taken ownership of the "load > IA32_BNDCFGS" and "clear IA32_BNDCFGS" VMX entry/exit controls. The ABI > is that these bits must be set in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS > MSRs if the guest's CPUID supports MPX, and clear otherwise. > > However, KVM will only do so if userspace sets the CPUID before writing > to the corresponding MSRs. Of course, there are no ordering requirements > between these ioctls. Uphold the ABI regardless of ordering by > reapplying KVMs tweaks to the VMX control MSRs after userspace has > written to them. I don't understand this patch. If you first write the CPUID and then the MSR, the consistency is upheld by these checks: if (!is_bitwise_subset(data, supported, GENMASK_ULL(31, 0))) return -EINVAL; if (!is_bitwise_subset(supported, data, GENMASK_ULL(63, 32))) return -EINVAL; If you're fixing a case where userspace first writes the MSR and then sets CPUID, then I would expect that KVM_SET_CPUID2 redoes those checks and, if they fail, it adjusts the MSRs. The selftests confirm that I'm a bit confused, but in general vmx_restore_control_msr is not the place where I was expecting the change. Paolo > Fixes: 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled") > Signed-off-by: Oliver Upton <oupton@google.com> > --- > arch/x86/kvm/vmx/nested.c | 9 +++++++++ > arch/x86/kvm/vmx/vmx.c | 2 +- > arch/x86/kvm/vmx/vmx.h | 2 ++ > 3 files changed, 12 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c > index ba34e94049c7..59164394569f 100644 > --- a/arch/x86/kvm/vmx/nested.c > +++ b/arch/x86/kvm/vmx/nested.c > @@ -1291,6 +1291,15 @@ vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data) > > *lowp = data; > *highp = data >> 32; > + > + /* > + * Ensure KVM fiddling with these MSRs is preserved after userspace > + * write. > + */ > + if (msr_index == MSR_IA32_VMX_TRUE_ENTRY_CTLS || > + msr_index == MSR_IA32_VMX_TRUE_EXIT_CTLS) > + nested_vmx_entry_exit_ctls_update(&vmx->vcpu); > + > return 0; > } > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index aca3ae2a02f3..d63d6dfbadbf 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -7227,7 +7227,7 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu) > #undef cr4_fixed1_update > } > > -static void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu) > +void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu) > { > struct vcpu_vmx *vmx = to_vmx(vcpu); > > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h > index 7f2c82e7f38f..e134e2763502 100644 > --- a/arch/x86/kvm/vmx/vmx.h > +++ b/arch/x86/kvm/vmx/vmx.h > @@ -423,6 +423,8 @@ static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, > > void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu); > > +void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu); > + > /* > * Note, early Intel manuals have the write-low and read-high bitmap offsets > * the wrong way round. The bitmaps control MSRs 0x00000000-0x00001fff and
On Mon, Feb 07, 2022, Paolo Bonzini wrote: > On 2/4/22 21:46, Oliver Upton wrote: > > Since commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls > > when guest MPX disabled"), KVM has taken ownership of the "load > > IA32_BNDCFGS" and "clear IA32_BNDCFGS" VMX entry/exit controls. The ABI > > is that these bits must be set in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS > > MSRs if the guest's CPUID supports MPX, and clear otherwise. > > > > However, KVM will only do so if userspace sets the CPUID before writing > > to the corresponding MSRs. Of course, there are no ordering requirements > > between these ioctls. Uphold the ABI regardless of ordering by > > reapplying KVMs tweaks to the VMX control MSRs after userspace has > > written to them. > > I don't understand this patch. If you first write the CPUID and then the > MSR, the consistency is upheld by these checks: > > if (!is_bitwise_subset(data, supported, GENMASK_ULL(31, 0))) > return -EINVAL; > > if (!is_bitwise_subset(supported, data, GENMASK_ULL(63, 32))) > return -EINVAL; > > If you're fixing a case where userspace first writes the MSR and then sets > CPUID, then I would expect that KVM_SET_CPUID2 redoes those checks and, if > they fail, it adjusts the MSRs. > > The selftests confirm that I'm a bit confused, but in general > vmx_restore_control_msr is not the place where I was expecting the change. Do we even need this change? The ABI is whatever it is, not what may or may not have been intended by a flawed, 3+ year old commit. E.g. if there's a userspace that relies on being able to override KVM's tweaks by writing the MSRs after setting CPUID, then this commit will break the ABI for that userspace. The quirk should be sufficient warning that KVM's behavior is funky.
Hi Paolo, On Mon, Feb 07, 2022 at 06:21:30PM +0100, Paolo Bonzini wrote: > On 2/4/22 21:46, Oliver Upton wrote: > > Since commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls > > when guest MPX disabled"), KVM has taken ownership of the "load > > IA32_BNDCFGS" and "clear IA32_BNDCFGS" VMX entry/exit controls. The ABI > > is that these bits must be set in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS > > MSRs if the guest's CPUID supports MPX, and clear otherwise. > > > > However, KVM will only do so if userspace sets the CPUID before writing > > to the corresponding MSRs. Of course, there are no ordering requirements > > between these ioctls. Uphold the ABI regardless of ordering by > > reapplying KVMs tweaks to the VMX control MSRs after userspace has > > written to them. > > I don't understand this patch. If you first write the CPUID and then the > MSR, the consistency is upheld by these checks: > > if (!is_bitwise_subset(data, supported, GENMASK_ULL(31, 0))) > return -EINVAL; > > if (!is_bitwise_subset(supported, data, GENMASK_ULL(63, 32))) > return -EINVAL; Right, this works if KVM chose to clear the bit, but userspace is trying to set it. If KVM chose to set the bit, and userspace attempts to clear it, these checks would pass. > If you're fixing a case where userspace first writes the MSR and then sets > CPUID, then I would expect that KVM_SET_CPUID2 redoes those checks and, if > they fail, it adjusts the MSRs. I am fixing the case where userspace issues KVM_SET_CPUID2 then writes to the corresponding MSRs. Until recently, this all sort of 'worked'. Since we called kvm_update_cpuid() all the time it was possible for KVM to overwrite the bits after the MSR write, just not immediately so. After the whole CPUID rework, we only update the VMX control MSRs immediately after a KVM_SET_CPUID2, meaning we've missed the case of MSR write after CPUID. The entire spirit of this series is to back KVM out of touching these bits, but it seemingly requires a quirk to do so given the userspace expectations around these bits. Given that, these first two patches fix the ordering issue between KVM_SET_CPUID2 and an MSR write and enforce KVM ownership unconditionally. -- Thanks, Oliver
On 2/7/22 19:22, Oliver Upton wrote: > Hi Paolo, > > On Mon, Feb 07, 2022 at 06:21:30PM +0100, Paolo Bonzini wrote: >> On 2/4/22 21:46, Oliver Upton wrote: >>> Since commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls >>> when guest MPX disabled"), KVM has taken ownership of the "load >>> IA32_BNDCFGS" and "clear IA32_BNDCFGS" VMX entry/exit controls. The ABI >>> is that these bits must be set in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS >>> MSRs if the guest's CPUID supports MPX, and clear otherwise. >>> >>> However, KVM will only do so if userspace sets the CPUID before writing >>> to the corresponding MSRs. Of course, there are no ordering requirements >>> between these ioctls. Uphold the ABI regardless of ordering by >>> reapplying KVMs tweaks to the VMX control MSRs after userspace has >>> written to them. >> >> I don't understand this patch. If you first write the CPUID and then the >> MSR, the consistency is upheld by these checks: >> >> if (!is_bitwise_subset(data, supported, GENMASK_ULL(31, 0))) >> return -EINVAL; >> >> if (!is_bitwise_subset(supported, data, GENMASK_ULL(63, 32))) >> return -EINVAL; > > Right, this works if KVM chose to clear the bit, but userspace is trying > to set it. If KVM chose to set the bit, and userspace attempts to clear > it, these checks would pass. Okay, that's what I expected too but I thought it would be okay that the checks pass. Are you trying to undo an involuntary API change, and if so why was the change a bug and not a fix? Paolo
On Mon, Feb 07, 2022, Oliver Upton wrote: > Until recently, this all sort of 'worked'. Since we called > kvm_update_cpuid() all the time it was possible for KVM to overwrite the > bits after the MSR write, just not immediately so. After the whole CPUID > rework, we only update the VMX control MSRs immediately after a > KVM_SET_CPUID2, meaning we've missed the case of MSR write after CPUID. That needs to be explained in the changelog (ditto for patch 02), and arguably the Fixes tag is wrong too, or at least incomplete. The commit that truly broke things was aedbaf4f6afd ("KVM: x86: Extract kvm_update_cpuid_runtime() from kvm_update_cpuid()") I'm guessing this is why Paolo is also confused. Without understanding that KVM used too (eventually) enforce its overrides, it looks like you're proposing an arbitrary, unnecessary ABI change.
On Mon, Feb 07, 2022 at 06:34:19PM +0000, Sean Christopherson wrote: > On Mon, Feb 07, 2022, Oliver Upton wrote: > > Until recently, this all sort of 'worked'. Since we called > > kvm_update_cpuid() all the time it was possible for KVM to overwrite the > > bits after the MSR write, just not immediately so. After the whole CPUID > > rework, we only update the VMX control MSRs immediately after a > > KVM_SET_CPUID2, meaning we've missed the case of MSR write after CPUID. > > That needs to be explained in the changelog (ditto for patch 02), and arguably > the Fixes tag is wrong too, or at least incomplete. The commit that truly broke > things was > > aedbaf4f6afd ("KVM: x86: Extract kvm_update_cpuid_runtime() from kvm_update_cpuid()") > > I'm guessing this is why Paolo is also confused. Without understanding that KVM > used too (eventually) enforce its overrides, it looks like you're proposing an > arbitrary, unnecessary ABI change. Gah, sorry, I really didn't provide the full context on this. I chose to blame the original commits for these since it was still possible to write the MSR and avoid a KVM update (just looking for paths where kvm_update_cpuid() is not called), but agree that full breakage came from the above commit. I'll add some language discussing how commit aedbaf4f6afd ("KVM: x86: Extract kvm_update_cpuid_runtime() from kvm_update_cpuid()") fully broke this. -- Thanks, Oliver
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index ba34e94049c7..59164394569f 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -1291,6 +1291,15 @@ vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data) *lowp = data; *highp = data >> 32; + + /* + * Ensure KVM fiddling with these MSRs is preserved after userspace + * write. + */ + if (msr_index == MSR_IA32_VMX_TRUE_ENTRY_CTLS || + msr_index == MSR_IA32_VMX_TRUE_EXIT_CTLS) + nested_vmx_entry_exit_ctls_update(&vmx->vcpu); + return 0; } diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index aca3ae2a02f3..d63d6dfbadbf 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7227,7 +7227,7 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu) #undef cr4_fixed1_update } -static void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu) +void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 7f2c82e7f38f..e134e2763502 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -423,6 +423,8 @@ static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu); +void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu); + /* * Note, early Intel manuals have the write-low and read-high bitmap offsets * the wrong way round. The bitmaps control MSRs 0x00000000-0x00001fff and
Since commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled"), KVM has taken ownership of the "load IA32_BNDCFGS" and "clear IA32_BNDCFGS" VMX entry/exit controls. The ABI is that these bits must be set in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS MSRs if the guest's CPUID supports MPX, and clear otherwise. However, KVM will only do so if userspace sets the CPUID before writing to the corresponding MSRs. Of course, there are no ordering requirements between these ioctls. Uphold the ABI regardless of ordering by reapplying KVMs tweaks to the VMX control MSRs after userspace has written to them. Fixes: 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled") Signed-off-by: Oliver Upton <oupton@google.com> --- arch/x86/kvm/vmx/nested.c | 9 +++++++++ arch/x86/kvm/vmx/vmx.c | 2 +- arch/x86/kvm/vmx/vmx.h | 2 ++ 3 files changed, 12 insertions(+), 1 deletion(-)