Message ID | 20220301060351.442881-2-oupton@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: x86: VMX ctrl MSR + KVM quirk fixes | expand |
On 3/1/22 07:03, Oliver Upton wrote: > + > + /* > + * Ensure KVM fiddling with these MSRs is preserved after userspace > + * write. > + */ > + if (msr_index == MSR_IA32_VMX_TRUE_ENTRY_CTLS || > + msr_index == MSR_IA32_VMX_TRUE_EXIT_CTLS) > + nested_vmx_entry_exit_ctls_update(&vmx->vcpu); > + I still don't understand this patch. You say: > Now, the BNDCFGS bits are only ever > updated after a KVM_SET_CPUID/KVM_SET_CPUID2 ioctl, meaning that a > subsequent MSR write from userspace will clobber these values. but I don't understand what's wrong with that. If you can (if so inclined) define a VM without LOAD_BNDCFGS or CLEAR_BNDCFGS even if MPX enabled, commit aedbaf4f6afd counts as a bugfix. Paolo
Hi Paolo, On Tue, Mar 01, 2022 at 07:00:57PM +0100, Paolo Bonzini wrote: > On 3/1/22 07:03, Oliver Upton wrote: > > + > > + /* > > + * Ensure KVM fiddling with these MSRs is preserved after userspace > > + * write. > > + */ > > + if (msr_index == MSR_IA32_VMX_TRUE_ENTRY_CTLS || > > + msr_index == MSR_IA32_VMX_TRUE_EXIT_CTLS) > > + nested_vmx_entry_exit_ctls_update(&vmx->vcpu); > > + > > I still don't understand this patch. You say: > > > Now, the BNDCFGS bits are only ever > > updated after a KVM_SET_CPUID/KVM_SET_CPUID2 ioctl, meaning that a > > subsequent MSR write from userspace will clobber these values. > > but I don't understand what's wrong with that. If you can (if so inclined) > define a VM without LOAD_BNDCFGS or CLEAR_BNDCFGS even if MPX enabled, > commit aedbaf4f6afd counts as a bugfix. Right, a 1-setting of '{load,clear} IA32_BNDCFGS' should really be the responsibility of userspace. My issue is that the commit message in commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled") suggests that userspace can expect these bits to be configured based on guest CPUID. Furthermore, before commit aedbaf4f6afd ("KVM: x86: Extract kvm_update_cpuid_runtime() from kvm_update_cpuid()"), if userspace clears these bits, KVM will continue to set them based on CPUID. What is the userspace expectation here? If we are saying that changes to IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS after userspace writes these MSRs is a bug, then I agree aedbaf4f6afd is in fact a bugfix. But, the commit message in 5f76f6f5ff96 seems to indicate that userspace wants KVM to configure these bits based on guest CPUID. Given that there were previous userspace expectations, I attempted to restore the old behavior of KVM (ignore userspace writes) and add a quirk to fully back out of the mess. All this logic also applies to Patch 2 as well. -- Oliver
On 3/1/22 19:43, Oliver Upton wrote: > Right, a 1-setting of '{load,clear} IA32_BNDCFGS' should really be the > responsibility of userspace. My issue is that the commit message in > commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when > guest MPX disabled") suggests that userspace can expect these bits to be > configured based on guest CPUID. Furthermore, before commit aedbaf4f6afd > ("KVM: x86: Extract kvm_update_cpuid_runtime() from > kvm_update_cpuid()"), if userspace clears these bits, KVM will continue > to set them based on CPUID. > > What is the userspace expectation here? If we are saying that changes to > IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS after userspace writes these MSRs is a > bug, then I agree aedbaf4f6afd is in fact a bugfix. But, the commit > message in 5f76f6f5ff96 seems to indicate that userspace wants KVM to > configure these bits based on guest CPUID. Yes, but I think it's reasonable that userspace wants to override them. It has to do that after KVM_SET_CPUID2, but that's okay too. Paolo
On Wed, Mar 02, 2022 at 01:21:23PM +0100, Paolo Bonzini wrote: > On 3/1/22 19:43, Oliver Upton wrote: > > Right, a 1-setting of '{load,clear} IA32_BNDCFGS' should really be the > > responsibility of userspace. My issue is that the commit message in > > commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when > > guest MPX disabled") suggests that userspace can expect these bits to be > > configured based on guest CPUID. Furthermore, before commit aedbaf4f6afd > > ("KVM: x86: Extract kvm_update_cpuid_runtime() from > > kvm_update_cpuid()"), if userspace clears these bits, KVM will continue > > to set them based on CPUID. > > > > What is the userspace expectation here? If we are saying that changes to > > IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS after userspace writes these MSRs is a > > bug, then I agree aedbaf4f6afd is in fact a bugfix. But, the commit > > message in 5f76f6f5ff96 seems to indicate that userspace wants KVM to > > configure these bits based on guest CPUID. > > Yes, but I think it's reasonable that userspace wants to override them. It > has to do that after KVM_SET_CPUID2, but that's okay too. > In that case, I can rework the tests at the end of this series to ensure userspace's ability to override w/o a quirk. Sorry for the toil, aedbaf4f6afd caused some breakage for us internally, but really is just a userspace bug. Is it possible to pick up patch 4/8 "KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2" independent of the rest of this series? -- Thanks, Oliver
On 3/2/22 21:51, Oliver Upton wrote: > On Wed, Mar 02, 2022 at 01:21:23PM +0100, Paolo Bonzini wrote: >> On 3/1/22 19:43, Oliver Upton wrote: >>> Right, a 1-setting of '{load,clear} IA32_BNDCFGS' should really be the >>> responsibility of userspace. My issue is that the commit message in >>> commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when >>> guest MPX disabled") suggests that userspace can expect these bits to be >>> configured based on guest CPUID. Furthermore, before commit aedbaf4f6afd >>> ("KVM: x86: Extract kvm_update_cpuid_runtime() from >>> kvm_update_cpuid()"), if userspace clears these bits, KVM will continue >>> to set them based on CPUID. >>> >>> What is the userspace expectation here? If we are saying that changes to >>> IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS after userspace writes these MSRs is a >>> bug, then I agree aedbaf4f6afd is in fact a bugfix. But, the commit >>> message in 5f76f6f5ff96 seems to indicate that userspace wants KVM to >>> configure these bits based on guest CPUID. >> >> Yes, but I think it's reasonable that userspace wants to override them. It >> has to do that after KVM_SET_CPUID2, but that's okay too. >> > > In that case, I can rework the tests at the end of this series to ensure > userspace's ability to override w/o a quirk. Sorry for the toil, > aedbaf4f6afd caused some breakage for us internally, but really is just > a userspace bug. How did vanadium break? Paolo > Is it possible to pick up patch 4/8 "KVM: x86: Introduce > KVM_CAP_DISABLE_QUIRKS2" independent of the rest of this series?
On Wed, Mar 02, 2022 at 10:22:43PM +0100, Paolo Bonzini wrote: > On 3/2/22 21:51, Oliver Upton wrote: > > On Wed, Mar 02, 2022 at 01:21:23PM +0100, Paolo Bonzini wrote: > > > On 3/1/22 19:43, Oliver Upton wrote: > > > > Right, a 1-setting of '{load,clear} IA32_BNDCFGS' should really be the > > > > responsibility of userspace. My issue is that the commit message in > > > > commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when > > > > guest MPX disabled") suggests that userspace can expect these bits to be > > > > configured based on guest CPUID. Furthermore, before commit aedbaf4f6afd > > > > ("KVM: x86: Extract kvm_update_cpuid_runtime() from > > > > kvm_update_cpuid()"), if userspace clears these bits, KVM will continue > > > > to set them based on CPUID. > > > > > > > > What is the userspace expectation here? If we are saying that changes to > > > > IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS after userspace writes these MSRs is a > > > > bug, then I agree aedbaf4f6afd is in fact a bugfix. But, the commit > > > > message in 5f76f6f5ff96 seems to indicate that userspace wants KVM to > > > > configure these bits based on guest CPUID. > > > > > > Yes, but I think it's reasonable that userspace wants to override them. It > > > has to do that after KVM_SET_CPUID2, but that's okay too. > > > > > > > In that case, I can rework the tests at the end of this series to ensure > > userspace's ability to override w/o a quirk. Sorry for the toil, > > aedbaf4f6afd caused some breakage for us internally, but really is just > > a userspace bug. > > How did vanadium break? Maybe I can redirect you to a test case to highlight a possible regression in KVM, as seen by userspace ;-) from Patch 7/8: > + /* > + * Test that KVM will set these bits regardless of userspace if the > + * guest CPUID exposes a supporting vPMU. > + */ > + test_vmx_control_msr(vm, MSR_IA32_VMX_TRUE_ENTRY_CTLS, > + 0, /* set */ > + VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL, /* clear */ > + VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL, /* exp_set */ > + 0); /* exp_clear */ > + test_vmx_control_msr(vm, MSR_IA32_VMX_TRUE_EXIT_CTLS, > + 0, /* set */ > + VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL, /* clear */ > + VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL, /* exp_set */ > + 0); /* exp_clear */ Same goes for the "{load,clear} IA32_BNDCFGS" bits too. -- Thanks, Oliver
On Wed, Mar 02, 2022, Oliver Upton wrote: > On Wed, Mar 02, 2022 at 10:22:43PM +0100, Paolo Bonzini wrote: > > On 3/2/22 21:51, Oliver Upton wrote: > > > On Wed, Mar 02, 2022 at 01:21:23PM +0100, Paolo Bonzini wrote: > > > > On 3/1/22 19:43, Oliver Upton wrote: > > > > > Right, a 1-setting of '{load,clear} IA32_BNDCFGS' should really be the > > > > > responsibility of userspace. My issue is that the commit message in > > > > > commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when > > > > > guest MPX disabled") suggests that userspace can expect these bits to be > > > > > configured based on guest CPUID. Furthermore, before commit aedbaf4f6afd > > > > > ("KVM: x86: Extract kvm_update_cpuid_runtime() from > > > > > kvm_update_cpuid()"), if userspace clears these bits, KVM will continue > > > > > to set them based on CPUID. > > > > > > > > > > What is the userspace expectation here? If we are saying that changes to > > > > > IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS after userspace writes these MSRs is a > > > > > bug, then I agree aedbaf4f6afd is in fact a bugfix. But, the commit > > > > > message in 5f76f6f5ff96 seems to indicate that userspace wants KVM to > > > > > configure these bits based on guest CPUID. > > > > > > > > Yes, but I think it's reasonable that userspace wants to override them. It > > > > has to do that after KVM_SET_CPUID2, but that's okay too. > > > > > > > > > > In that case, I can rework the tests at the end of this series to ensure > > > userspace's ability to override w/o a quirk. Sorry for the toil, > > > aedbaf4f6afd caused some breakage for us internally, but really is just > > > a userspace bug. > > > > How did vanadium break? > > Maybe I can redirect you to a test case to highlight a possible > regression in KVM, as seen by userspace ;-) Regressions aside, VMCS controls are not tied to CPUID, KVM should not be mucking with unrelated things. The original hack was to fix a userspace bug and should never have been mreged.
On 3/3/22 02:43, Sean Christopherson wrote: >> Maybe I can redirect you to a test case to highlight a possible >> regression in KVM, as seen by userspace;-) > Regressions aside, VMCS controls are not tied to CPUID, KVM should not be mucking > with unrelated things. The original hack was to fix a userspace bug and should > never have been mreged. Note that it dates back to: commit 5f76f6f5ff96587af5acd5930f7d9fea81e0d1a8 Author: Liran Alon <liran.alon@oracle.com> Date: Fri Sep 14 03:25:52 2018 +0300 KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled Before this commit, KVM exposes MPX VMX controls to L1 guest only based on if KVM and host processor supports MPX virtualization. However, these controls should be exposed to guest only in case guest vCPU supports MPX. It's not to fix a userspace bug, it's to support userspace that doesn't know about using KVM_SET_MSR for VMX features---which is okay since unlike KVM_SET_CPUID2 it's not a mandatory call. Paolo
On Thu, Mar 03, 2022, Paolo Bonzini wrote: > On 3/3/22 02:43, Sean Christopherson wrote: > > > Maybe I can redirect you to a test case to highlight a possible > > > regression in KVM, as seen by userspace;-) > > Regressions aside, VMCS controls are not tied to CPUID, KVM should not be mucking > > with unrelated things. The original hack was to fix a userspace bug and should > > never have been mreged. > > Note that it dates back to: > > commit 5f76f6f5ff96587af5acd5930f7d9fea81e0d1a8 > Author: Liran Alon <liran.alon@oracle.com> > Date: Fri Sep 14 03:25:52 2018 +0300 > > KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled > Before this commit, KVM exposes MPX VMX controls to L1 guest only based > on if KVM and host processor supports MPX virtualization. > However, these controls should be exposed to guest only in case guest > vCPU supports MPX. > > It's not to fix a userspace bug, it's to support userspace that doesn't > know about using KVM_SET_MSR for VMX features---which is okay since unlike > KVM_SET_CPUID2 it's not a mandatory call. I disagree, IMO failure to properly configure the vCPU model is a userspace bug. Maybe it was a userspace bug induced by a haphazard and/or poorly documented KVM ABI, but it's still a userspace bug. One could argue that KVM should disable/clear VMX features if userspace clears a related CPUID feature, but _setting_ a VMX feature based on CPUID is architecturally wrong. Even if we consider one or both cases to be desirable behavior in terms of creating a consistent vCPU model, forcing a consistent vCPU model for this one case goes against every other ioctl in KVM's ABI. If we consider it KVM's responsibility to propagate CPUID state to VMX MSRs, then KVM has a bunch of "bugs". X86_FEATURE_LM => VM_EXIT_HOST_ADDR_SPACE_SIZE, VM_ENTRY_IA32E_MODE, VMX_MISC_SAVE_EFER_LMA X86_FEATURE_TSC => CPU_BASED_RDTSC_EXITING, CPU_BASED_USE_TSC_OFFSETTING, SECONDARY_EXEC_TSC_SCALING X86_FEATURE_INVPCID_SINGLE => SECONDARY_EXEC_ENABLE_INVPCID X86_FEATURE_MWAIT => CPU_BASED_MONITOR_EXITING, CPU_BASED_MWAIT_EXITING X86_FEATURE_INTEL_PT => SECONDARY_EXEC_PT_CONCEAL_VMX, SECONDARY_EXEC_PT_USE_GPA, VM_EXIT_CLEAR_IA32_RTIT_CTL, VM_ENTRY_LOAD_IA32_RTIT_CTL X86_FEATURE_XSAVES => SECONDARY_EXEC_XSAVES
On Thu, Mar 3, 2022 at 8:15 AM Sean Christopherson <seanjc@google.com> wrote: > > On Thu, Mar 03, 2022, Paolo Bonzini wrote: > > On 3/3/22 02:43, Sean Christopherson wrote: > > > > Maybe I can redirect you to a test case to highlight a possible > > > > regression in KVM, as seen by userspace;-) > > > Regressions aside, VMCS controls are not tied to CPUID, KVM should not be mucking > > > with unrelated things. The original hack was to fix a userspace bug and should > > > never have been mreged. > > > > Note that it dates back to: > > > > commit 5f76f6f5ff96587af5acd5930f7d9fea81e0d1a8 > > Author: Liran Alon <liran.alon@oracle.com> > > Date: Fri Sep 14 03:25:52 2018 +0300 > > > > KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled > > Before this commit, KVM exposes MPX VMX controls to L1 guest only based > > on if KVM and host processor supports MPX virtualization. > > However, these controls should be exposed to guest only in case guest > > vCPU supports MPX. > > > > It's not to fix a userspace bug, it's to support userspace that doesn't > > know about using KVM_SET_MSR for VMX features---which is okay since unlike > > KVM_SET_CPUID2 it's not a mandatory call. > > I disagree, IMO failure to properly configure the vCPU model is a userspace bug. > Maybe it was a userspace bug induced by a haphazard and/or poorly documented KVM > ABI, but it's still a userspace bug. One could argue that KVM should disable/clear > VMX features if userspace clears a related CPUID feature, but _setting_ a VMX > feature based on CPUID is architecturally wrong. Even if we consider one or both > cases to be desirable behavior in terms of creating a consistent vCPU model, forcing > a consistent vCPU model for this one case goes against every other ioctl in KVM's > ABI. > > If we consider it KVM's responsibility to propagate CPUID state to VMX MSRs, then > KVM has a bunch of "bugs". > > X86_FEATURE_LM => VM_EXIT_HOST_ADDR_SPACE_SIZE, VM_ENTRY_IA32E_MODE, VMX_MISC_SAVE_EFER_LMA > > X86_FEATURE_TSC => CPU_BASED_RDTSC_EXITING, CPU_BASED_USE_TSC_OFFSETTING, > SECONDARY_EXEC_TSC_SCALING > > X86_FEATURE_INVPCID_SINGLE => SECONDARY_EXEC_ENABLE_INVPCID > > X86_FEATURE_MWAIT => CPU_BASED_MONITOR_EXITING, CPU_BASED_MWAIT_EXITING > > X86_FEATURE_INTEL_PT => SECONDARY_EXEC_PT_CONCEAL_VMX, SECONDARY_EXEC_PT_USE_GPA, > VM_EXIT_CLEAR_IA32_RTIT_CTL, VM_ENTRY_LOAD_IA32_RTIT_CTL > > X86_FEATURE_XSAVES => SECONDARY_EXEC_XSAVES I don't disagree with you, but this does beg the question, "What's going on with all of the invocations of cr4_fixed1_update()?"
On Thu, Mar 03, 2022, Jim Mattson wrote: > On Thu, Mar 3, 2022 at 8:15 AM Sean Christopherson <seanjc@google.com> wrote: > > > > On Thu, Mar 03, 2022, Paolo Bonzini wrote: > > > On 3/3/22 02:43, Sean Christopherson wrote: > > > > > Maybe I can redirect you to a test case to highlight a possible > > > > > regression in KVM, as seen by userspace;-) > > > > Regressions aside, VMCS controls are not tied to CPUID, KVM should not be mucking > > > > with unrelated things. The original hack was to fix a userspace bug and should > > > > never have been mreged. > > > > > > Note that it dates back to: > > > > > > commit 5f76f6f5ff96587af5acd5930f7d9fea81e0d1a8 > > > Author: Liran Alon <liran.alon@oracle.com> > > > Date: Fri Sep 14 03:25:52 2018 +0300 > > > > > > KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled > > > Before this commit, KVM exposes MPX VMX controls to L1 guest only based > > > on if KVM and host processor supports MPX virtualization. > > > However, these controls should be exposed to guest only in case guest > > > vCPU supports MPX. > > > > > > It's not to fix a userspace bug, it's to support userspace that doesn't > > > know about using KVM_SET_MSR for VMX features---which is okay since unlike > > > KVM_SET_CPUID2 it's not a mandatory call. > > > > I disagree, IMO failure to properly configure the vCPU model is a userspace bug. > > Maybe it was a userspace bug induced by a haphazard and/or poorly documented KVM > > ABI, but it's still a userspace bug. One could argue that KVM should disable/clear > > VMX features if userspace clears a related CPUID feature, but _setting_ a VMX > > feature based on CPUID is architecturally wrong. Even if we consider one or both > > cases to be desirable behavior in terms of creating a consistent vCPU model, forcing > > a consistent vCPU model for this one case goes against every other ioctl in KVM's > > ABI. > > > > If we consider it KVM's responsibility to propagate CPUID state to VMX MSRs, then > > KVM has a bunch of "bugs". > > > > X86_FEATURE_LM => VM_EXIT_HOST_ADDR_SPACE_SIZE, VM_ENTRY_IA32E_MODE, VMX_MISC_SAVE_EFER_LMA > > > > X86_FEATURE_TSC => CPU_BASED_RDTSC_EXITING, CPU_BASED_USE_TSC_OFFSETTING, > > SECONDARY_EXEC_TSC_SCALING > > > > X86_FEATURE_INVPCID_SINGLE => SECONDARY_EXEC_ENABLE_INVPCID > > > > X86_FEATURE_MWAIT => CPU_BASED_MONITOR_EXITING, CPU_BASED_MWAIT_EXITING > > > > X86_FEATURE_INTEL_PT => SECONDARY_EXEC_PT_CONCEAL_VMX, SECONDARY_EXEC_PT_USE_GPA, > > VM_EXIT_CLEAR_IA32_RTIT_CTL, VM_ENTRY_LOAD_IA32_RTIT_CTL > > > > X86_FEATURE_XSAVES => SECONDARY_EXEC_XSAVES > > I don't disagree with you, but this does beg the question, "What's > going on with all of the invocations of cr4_fixed1_update()?" Boo, I forgot legal CR4 is controlled via MSRs too. Ha! That's a bug in nVMX. nVMX only checks msrs.cr4_fixed0/1, it doesn't check "cr4_reserved_bits", which is KVM's set of host reserved bits. That means userspace can bypass those reserved bits by setting guest CPUID and/or VMX MSRs and loading CR4 via VM-Enter/VM-Exit. The immediate nVMX bug can be fixed by calling kvm_is_valid_cr4(), which calls back into nVMX to do the VMX MSR checks. My vote would be to include nested_vmx_cr_fixed1_bits_update() in the quirk, but keep the guest CPUID enforcement that's in kvm_is_valid_cr4(). I.e. let userspace further restrict CR4, but don't let it allow nested VM-Enter/VM-Exit to load bits that L1 can't set via MOV CR4. I'll send this as a proper patch: diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h index c92cea0b8ccc..46dd1967ec08 100644 --- a/arch/x86/kvm/vmx/nested.h +++ b/arch/x86/kvm/vmx/nested.h @@ -285,8 +285,8 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val) } /* No difference in the restrictions on guest and host CR4 in VMX operation. */ -#define nested_guest_cr4_valid nested_cr4_valid -#define nested_host_cr4_valid nested_cr4_valid +#define nested_guest_cr4_valid kvm_is_valid_cr4 +#define nested_host_cr4_valid kvm_is_valid_cr4 extern struct kvm_x86_nested_ops vmx_nested_ops;
On 3/4/22 00:44, Sean Christopherson wrote: > > diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h > index c92cea0b8ccc..46dd1967ec08 100644 > --- a/arch/x86/kvm/vmx/nested.h > +++ b/arch/x86/kvm/vmx/nested.h > @@ -285,8 +285,8 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val) > } > > /* No difference in the restrictions on guest and host CR4 in VMX operation. */ > -#define nested_guest_cr4_valid nested_cr4_valid > -#define nested_host_cr4_valid nested_cr4_valid > +#define nested_guest_cr4_valid kvm_is_valid_cr4 > +#define nested_host_cr4_valid kvm_is_valid_cr4 This doesn't allow the theoretically possible case of L0 setting some CR4-fixed-0 bits for L1. I'll send another one. Paolo
On Fri, Mar 04, 2022, Paolo Bonzini wrote: > On 3/4/22 00:44, Sean Christopherson wrote: > > > > diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h > > index c92cea0b8ccc..46dd1967ec08 100644 > > --- a/arch/x86/kvm/vmx/nested.h > > +++ b/arch/x86/kvm/vmx/nested.h > > @@ -285,8 +285,8 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val) > > } > > > > /* No difference in the restrictions on guest and host CR4 in VMX operation. */ > > -#define nested_guest_cr4_valid nested_cr4_valid > > -#define nested_host_cr4_valid nested_cr4_valid > > +#define nested_guest_cr4_valid kvm_is_valid_cr4 > > +#define nested_host_cr4_valid kvm_is_valid_cr4 > > This doesn't allow the theoretically possible case of L0 setting some > CR4-fixed-0 bits for L1. I'll send another one. Are you still planning on sending a proper patch for this? And more importantly, have we shifted your view on this patch/series?
Hey Sean, On Wed, Apr 6, 2022 at 5:26 PM Sean Christopherson <seanjc@google.com> wrote: > > On Fri, Mar 04, 2022, Paolo Bonzini wrote: > > On 3/4/22 00:44, Sean Christopherson wrote: > > > > > > diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h > > > index c92cea0b8ccc..46dd1967ec08 100644 > > > --- a/arch/x86/kvm/vmx/nested.h > > > +++ b/arch/x86/kvm/vmx/nested.h > > > @@ -285,8 +285,8 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val) > > > } > > > > > > /* No difference in the restrictions on guest and host CR4 in VMX operation. */ > > > -#define nested_guest_cr4_valid nested_cr4_valid > > > -#define nested_host_cr4_valid nested_cr4_valid > > > +#define nested_guest_cr4_valid kvm_is_valid_cr4 > > > +#define nested_host_cr4_valid kvm_is_valid_cr4 > > > > This doesn't allow the theoretically possible case of L0 setting some > > CR4-fixed-0 bits for L1. I'll send another one. > > Are you still planning on sending a proper patch for this? > > And more importantly, have we shifted your view on this patch/series? Sorry, I should've followed up. If nobody else complains, let's just leave everything as-is and avoid repeating the mistakes of the patches to blame (hey, I authored one of those!) -- Best, Oliver
On Wed, Apr 6, 2022 at 5:29 PM Oliver Upton <oupton@google.com> wrote: > > Hey Sean, > > On Wed, Apr 6, 2022 at 5:26 PM Sean Christopherson <seanjc@google.com> wrote: > > > > On Fri, Mar 04, 2022, Paolo Bonzini wrote: > > > On 3/4/22 00:44, Sean Christopherson wrote: > > > > > > > > diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h > > > > index c92cea0b8ccc..46dd1967ec08 100644 > > > > --- a/arch/x86/kvm/vmx/nested.h > > > > +++ b/arch/x86/kvm/vmx/nested.h > > > > @@ -285,8 +285,8 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val) > > > > } > > > > > > > > /* No difference in the restrictions on guest and host CR4 in VMX operation. */ > > > > -#define nested_guest_cr4_valid nested_cr4_valid > > > > -#define nested_host_cr4_valid nested_cr4_valid > > > > +#define nested_guest_cr4_valid kvm_is_valid_cr4 > > > > +#define nested_host_cr4_valid kvm_is_valid_cr4 > > > > > > This doesn't allow the theoretically possible case of L0 setting some > > > CR4-fixed-0 bits for L1. I'll send another one. > > > > Are you still planning on sending a proper patch for this? > > > > And more importantly, have we shifted your view on this patch/series? > > Sorry, I should've followed up. If nobody else complains, let's just > leave everything as-is and avoid repeating the mistakes of the patches > to blame (hey, I authored one of those!) Well, that is to say we do nothing to plug the half-baked behavior of changing MSRs based on CPUID. Quirk the rest of it away if we don't want to fudge with these bits any more :) -- Thanks, Oliver
On Wed, Apr 06, 2022, Oliver Upton wrote: > Hey Sean, > > On Wed, Apr 6, 2022 at 5:26 PM Sean Christopherson <seanjc@google.com> wrote: > > > > On Fri, Mar 04, 2022, Paolo Bonzini wrote: > > > On 3/4/22 00:44, Sean Christopherson wrote: > > > > > > > > diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h > > > > index c92cea0b8ccc..46dd1967ec08 100644 > > > > --- a/arch/x86/kvm/vmx/nested.h > > > > +++ b/arch/x86/kvm/vmx/nested.h > > > > @@ -285,8 +285,8 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val) > > > > } > > > > > > > > /* No difference in the restrictions on guest and host CR4 in VMX operation. */ > > > > -#define nested_guest_cr4_valid nested_cr4_valid > > > > -#define nested_host_cr4_valid nested_cr4_valid > > > > +#define nested_guest_cr4_valid kvm_is_valid_cr4 > > > > +#define nested_host_cr4_valid kvm_is_valid_cr4 > > > > > > This doesn't allow the theoretically possible case of L0 setting some > > > CR4-fixed-0 bits for L1. I'll send another one. > > > > Are you still planning on sending a proper patch for this? > > > > And more importantly, have we shifted your view on this patch/series? > > Sorry, I should've followed up. If nobody else complains, let's just > leave everything as-is and avoid repeating the mistakes of the patches > to blame (hey, I authored one of those!) The problem is that if we leave things as is, someone will inevitably think it's the right thing to do and will repeat those mistakes. I don't see why we wouldn't add the quirk, broken userspace gets to keep its broken behavior unless it opts into to disabling the quirk.
/cast resurrect On Fri, Mar 04, 2022, Paolo Bonzini wrote: > On 3/4/22 00:44, Sean Christopherson wrote: > > > > diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h > > index c92cea0b8ccc..46dd1967ec08 100644 > > --- a/arch/x86/kvm/vmx/nested.h > > +++ b/arch/x86/kvm/vmx/nested.h > > @@ -285,8 +285,8 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val) > > } > > > > /* No difference in the restrictions on guest and host CR4 in VMX operation. */ > > -#define nested_guest_cr4_valid nested_cr4_valid > > -#define nested_host_cr4_valid nested_cr4_valid > > +#define nested_guest_cr4_valid kvm_is_valid_cr4 > > +#define nested_host_cr4_valid kvm_is_valid_cr4 > > This doesn't allow the theoretically possible case of L0 setting some > CR4-fixed-0 bits for L1. I'll send another one. Ha! My "patch" is correct. kvm_is_valid_cr4() calls vmx_is_valid_cr4(), which calls nested_cr4_valid() when the vCPU is post-VMXON, so it _does_ cover the fixed0 case. I'll send a proper patch with a comment to call out that subtlety.
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 1dfe23963a9e..a13f8f4e3d82 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -1291,6 +1291,15 @@ vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data) *lowp = data; *highp = data >> 32; + + /* + * Ensure KVM fiddling with these MSRs is preserved after userspace + * write. + */ + if (msr_index == MSR_IA32_VMX_TRUE_ENTRY_CTLS || + msr_index == MSR_IA32_VMX_TRUE_EXIT_CTLS) + nested_vmx_entry_exit_ctls_update(&vmx->vcpu); + return 0; } diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index b325f99b2177..3a97220c5f78 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7246,7 +7246,7 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu) #undef cr4_fixed1_update } -static void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu) +void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 7f2c82e7f38f..e134e2763502 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -423,6 +423,8 @@ static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu); +void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu); + /* * Note, early Intel manuals have the write-low and read-high bitmap offsets * the wrong way round. The bitmaps control MSRs 0x00000000-0x00001fff and
Since commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled"), KVM has taken ownership of the "load IA32_BNDCFGS" and "clear IA32_BNDCFGS" VMX entry/exit controls. The ABI is that these bits must be set in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS MSRs if the guest's CPUID supports MPX, and clear otherwise. However, commit aedbaf4f6afd ("KVM: x86: Extract kvm_update_cpuid_runtime() from kvm_update_cpuid()") partially broke KVM ownership of the aforementioned bits. Before, kvm_update_cpuid() was exercised frequently when running a guest and constantly applied its own changes to the BNDCFGS bits. Now, the BNDCFGS bits are only ever updated after a KVM_SET_CPUID/KVM_SET_CPUID2 ioctl, meaning that a subsequent MSR write from userspace will clobber these values. Uphold the old ABI by reapplying KVM's tweaks to the BNDCFGS bits after an MSR write from userspace. Fixes: aedbaf4f6afd ("KVM: x86: Extract kvm_update_cpuid_runtime() from kvm_update_cpuid()") Signed-off-by: Oliver Upton <oupton@google.com> --- arch/x86/kvm/vmx/nested.c | 9 +++++++++ arch/x86/kvm/vmx/vmx.c | 2 +- arch/x86/kvm/vmx/vmx.h | 2 ++ 3 files changed, 12 insertions(+), 1 deletion(-)