Message ID: 20210107093854.882483-2-mlevitsk@redhat.com (mailing list archive)
State:      New, archived
Series:     KVM: nSVM: few random fixes
On Thu, Jan 07, 2021, Maxim Levitsky wrote:
> It is possible to exit the nested guest mode, entered by
> svm_set_nested_state prior to first vm entry to it (e.g due to pending event)
> if the nested run was not pending during the migration.

Ugh, I assume this is due to one of the "premature" nested_ops->check_events()
calls that are necessitated by the event mess?  I'm guessing kvm_vcpu_running()
is the culprit?

If my assumption is correct, this bug affects nVMX as well.

Rather than clear the request blindly on any nested VM-Exit, what about
something like the following?  IMO, KVM really shouldn't be synthesizing a
nested VM-Exit before it processes pending requests, but unfortunately the
nested_ops->check_events() mess isn't easily fixed.  This will at least limit
the mess, e.g. with this we'd get a WARN if KVM_REQ_GET_NESTED_STATE_PAGES is
set after some other VM-Exit.

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 3136e05831cf..f44e6f7a0c9b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2857,17 +2857,15 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
 	if (!pe)
 		return;
 
-	if (is_guest_mode(vcpu)) {
-		r = kvm_x86_ops.nested_ops->check_events(vcpu);
-		if (r < 0)
-			return;
-		/*
-		 * If an event has happened and caused a vmexit,
-		 * we know INITs are latched and therefore
-		 * we will not incorrectly deliver an APIC
-		 * event instead of a vmexit.
-		 */
-	}
+	r = kvm_nested_check_events(vcpu);
+	if (r < 0)
+		return;
+
+	/*
+	 * If an event has happened and caused a vmexit, we know INITs are
+	 * latched and therefore we will not incorrectly deliver an APIC
+	 * event instead of a vmexit.
+	 */
 
 	/*
 	 * INITs are latched while CPU is in specific states
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3f7c1fc7a3ce..b0f172d13cab 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8223,6 +8223,25 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu)
 	kvm_x86_ops.update_cr8_intercept(vcpu, tpr, max_irr);
 }
 
+int kvm_nested_check_events(struct kvm_vcpu *vcpu)
+{
+	int r;
+
+	if (!is_guest_mode(vcpu))
+		return 0;
+
+	r = kvm_x86_ops.nested_ops->check_events(vcpu);
+
+	/*
+	 * Clear nested-specific requests if checking nested events triggered a
+	 * VM-Exit, they'll be re-requested on nested VM-Enter (if necessary).
+	 */
+	if (!is_guest_mode(vcpu))
+		kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
+
+	return r;
+}
+
 static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 {
 	int r;
@@ -8267,11 +8286,9 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 	 * from L2 to L1 due to pending L1 events which require exit
 	 * from L2 to L1.
 	 */
-	if (is_guest_mode(vcpu)) {
-		r = kvm_x86_ops.nested_ops->check_events(vcpu);
-		if (r < 0)
-			goto busy;
-	}
+	r = kvm_nested_check_events(vcpu);
+	if (r < 0)
+		goto busy;
 
 	/* try to inject new event if pending */
 	if (vcpu->arch.exception.pending) {
@@ -8789,7 +8806,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	if (kvm_request_pending(vcpu)) {
 		if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
-			if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) {
+			if (!WARN_ON(!is_guest_mode(vcpu)) &&
+			    unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) {
 				r = 0;
 				goto out;
 			}
@@ -9111,8 +9129,7 @@ static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
 
 static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
 {
-	if (is_guest_mode(vcpu))
-		kvm_x86_ops.nested_ops->check_events(vcpu);
+	(void)kvm_nested_check_events(vcpu);
 
 	return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
 		!vcpu->arch.apf.halted);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index c5ee0f5ce0f1..dce61fda9c5e 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -247,6 +247,8 @@ static inline bool kvm_vcpu_latch_init(struct kvm_vcpu *vcpu)
 	return is_smm(vcpu) || kvm_x86_ops.apic_init_signal_blocked(vcpu);
 }
 
+int kvm_nested_check_events(struct kvm_vcpu *vcpu);
+
 void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
 void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr);

> In this case we must not switch to the nested msr permission bitmap.
> Also add a warning to catch similar cases in the future.
>
> Fixes: a7d5c7ce41ac1 ("KVM: nSVM: delay MSR permission processing to first nested VM run")
>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/nested.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index b0b667456b2e7..ee4f2082ad1bd 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -199,6 +199,10 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
>  static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	if (WARN_ON_ONCE(!is_guest_mode(&svm->vcpu)))
> +		return false;
> +
>  	if (!nested_svm_vmrun_msrpm(svm)) {
>  		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
>  		vcpu->run->internal.suberror =
> @@ -595,6 +599,8 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
>  	svm->nested.vmcb12_gpa = 0;
>  	WARN_ON_ONCE(svm->nested.nested_run_pending);
>
> +	kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, &svm->vcpu);
> +
>  	/* in case we halted in L2 */
>  	svm->vcpu.arch.mp_state = KVM_MP_STATE_RUNNABLE;
>
> --
> 2.26.2
On 07/01/21 18:00, Sean Christopherson wrote:
> Ugh, I assume this is due to one of the "premature" nested_ops->check_events()
> calls that are necessitated by the event mess?  I'm guessing kvm_vcpu_running()
> is the culprit?
>
> If my assumption is correct, this bug affects nVMX as well.

Yes, though it may be latent.  For SVM it was until we started allocating
svm->nested on demand.

> Rather than clear the request blindly on any nested VM-Exit, what
> about something like the following?

I think your patch is overkill, KVM_REQ_GET_NESTED_STATE_PAGES is only set
from KVM_SET_NESTED_STATE so it cannot happen while the VM runs.

Something like this is small enough and works well.

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index a622e63739b4..cb4c6ee10029 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -595,6 +596,8 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	svm->nested.vmcb12_gpa = 0;
 	WARN_ON_ONCE(svm->nested.nested_run_pending);
 
+	kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, &svm->vcpu);
+
 	/* in case we halted in L2 */
 	svm->vcpu.arch.mp_state = KVM_MP_STATE_RUNNABLE;
 
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index e2f26564a12d..0fbb46990dfc 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4442,6 +4442,8 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 	/* trying to cancel vmlaunch/vmresume is a bug */
 	WARN_ON_ONCE(vmx->nested.nested_run_pending);
 
+	kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
+
 	/* Service the TLB flush request for L2 before switching to L1. */
 	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
 		kvm_vcpu_flush_tlb_current(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3f7c1fc7a3ce..b7e784b5489c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8789,7 +8789,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	if (kvm_request_pending(vcpu)) {
 		if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
-			if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) {
+			if (WARN_ON_ONCE(!is_guest_mode(vcpu)))
+				;
+			else if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) {
 				r = 0;
 				goto out;
 			}
On 07/01/21 18:51, Paolo Bonzini wrote:
> On 07/01/21 18:00, Sean Christopherson wrote:
>> Ugh, I assume this is due to one of the "premature" nested_ops->check_events()
>> calls that are necessitated by the event mess?  I'm guessing kvm_vcpu_running()
>> is the culprit?
>>
>> If my assumption is correct, this bug affects nVMX as well.
>
> Yes, though it may be latent.  For SVM it was until we started
> allocating svm->nested on demand.
>
>> Rather than clear the request blindly on any nested VM-Exit, what
>> about something like the following?
>
> I think your patch is overkill, KVM_REQ_GET_NESTED_STATE_PAGES is only
> set from KVM_SET_NESTED_STATE so it cannot happen while the VM runs.

... and when leaving SMM.  But in either case, there cannot be something
else causing a nested vmexit before the request is set, because SMM does
not support VMX operation.  So I still don't think that it justifies the
extra code and indirection.

Paolo
On Thu, 2021-01-07 at 18:51 +0100, Paolo Bonzini wrote:
> On 07/01/21 18:00, Sean Christopherson wrote:
> > Ugh, I assume this is due to one of the "premature" nested_ops->check_events()
> > calls that are necessitated by the event mess?  I'm guessing kvm_vcpu_running()
> > is the culprit?
> >
> > If my assumption is correct, this bug affects nVMX as well.
>
> Yes, though it may be latent.  For SVM it was until we started
> allocating svm->nested on demand.
>
> > Rather than clear the request blindly on any nested VM-Exit, what
> > about something like the following?
>
> I think your patch is overkill, KVM_REQ_GET_NESTED_STATE_PAGES is only
> set from KVM_SET_NESTED_STATE so it cannot happen while the VM runs.

Note that I didn't include the same fix for VMX because it uses a separate
vmcs for the guest, which has its own msr bitmap, so in theory this
shouldn't be needed, but it won't hurt.

I'll test whether canceling KVM_REQ_GET_NESTED_STATE_PAGES on VMX makes any
difference in regard to the nested migration crashes I am seeing there.

Best regards,
	Maxim Levitsky

>
> Something like this is small enough and works well.
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index a622e63739b4..cb4c6ee10029 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -595,6 +596,8 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
>  	svm->nested.vmcb12_gpa = 0;
>  	WARN_ON_ONCE(svm->nested.nested_run_pending);
>
> +	kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, &svm->vcpu);
> +
>  	/* in case we halted in L2 */
>  	svm->vcpu.arch.mp_state = KVM_MP_STATE_RUNNABLE;
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index e2f26564a12d..0fbb46990dfc 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -4442,6 +4442,8 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
>  	/* trying to cancel vmlaunch/vmresume is a bug */
>  	WARN_ON_ONCE(vmx->nested.nested_run_pending);
>
> +	kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
> +
>  	/* Service the TLB flush request for L2 before switching to L1. */
>  	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
>  		kvm_vcpu_flush_tlb_current(vcpu);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3f7c1fc7a3ce..b7e784b5489c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8789,7 +8789,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>
>  	if (kvm_request_pending(vcpu)) {
>  		if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
> -			if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) {
> +			if (WARN_ON_ONCE(!is_guest_mode(vcpu)))
> +				;
> +			else if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) {
>  				r = 0;
>  				goto out;
>  			}
On Thu, Jan 07, 2021, Paolo Bonzini wrote:
> On 07/01/21 18:00, Sean Christopherson wrote:
> > Ugh, I assume this is due to one of the "premature" nested_ops->check_events()
> > calls that are necessitated by the event mess?  I'm guessing kvm_vcpu_running()
> > is the culprit?
> >
> > If my assumption is correct, this bug affects nVMX as well.
>
> Yes, though it may be latent.  For SVM it was until we started allocating
> svm->nested on demand.
>
> > Rather than clear the request blindly on any nested VM-Exit, what
> > about something like the following?
>
> I think your patch is overkill, KVM_REQ_GET_NESTED_STATE_PAGES is only set
> from KVM_SET_NESTED_STATE so it cannot happen while the VM runs.

Yeah, which is why I was hoping we could avoid clearing the request on every
nested exit.

> Something like this is small enough and works well.

I've no argument against it working, rather that I dislike clearing the
request on every exit.  Except for the ->check_events() case, hitting the
scenario where there's a pending request at the time of nested VM-Exit would
ideally be treated as a KVM bug.  On the other hand, clearing a
nested-specific request on nested VM-Exit is logically sound, so I guess I'm
ok with the minimal patch.
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index b0b667456b2e7..ee4f2082ad1bd 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -199,6 +199,10 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
 static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (WARN_ON_ONCE(!is_guest_mode(&svm->vcpu)))
+		return false;
+
 	if (!nested_svm_vmrun_msrpm(svm)) {
 		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
 		vcpu->run->internal.suberror =
@@ -595,6 +599,8 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	svm->nested.vmcb12_gpa = 0;
 	WARN_ON_ONCE(svm->nested.nested_run_pending);
 
+	kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, &svm->vcpu);
+
 	/* in case we halted in L2 */
 	svm->vcpu.arch.mp_state = KVM_MP_STATE_RUNNABLE;
It is possible to exit the nested guest mode, entered by
svm_set_nested_state prior to the first vm entry to it (e.g. due to a
pending event), if the nested run was not pending during the migration.
In this case we must not switch to the nested msr permission bitmap.
Also add a warning to catch similar cases in the future.

Fixes: a7d5c7ce41ac1 ("KVM: nSVM: delay MSR permission processing to first nested VM run")

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 6 ++++++
 1 file changed, 6 insertions(+)