Message ID | 2d56b269-0d21-10a3-80ce-19e989d6903b@redhat.com (mailing list archive)
---|---
State | New, archived |
On 20/11/17 23:47, Paolo Bonzini wrote:
> On 19/11/2017 17:25, Liran Alon wrote:
>> L2 RDMSR could be handled as described below:
>> 1) L2 exits to L0 on RDMSR and calls handle_rdmsr().
>> 2) handle_rdmsr() calls kvm_inject_gp() which sets KVM_REQ_EVENT,
>> exception.pending=true and exception.injected=false.
>> 3) vcpu_enter_guest() consumes KVM_REQ_EVENT and calls
>> inject_pending_event() which calls vmx_check_nested_events() which
>> sees that exception.pending=true but nested_vmx_check_exception()
>> returns 0 and therefore does nothing at this point. However, let's
>> assume it later sees that the vmx-preemption-timer expired and
>> therefore exits from L2 to L1 by calling nested_vmx_vmexit().
>> 4) nested_vmx_vmexit() calls prepare_vmcs12() which calls
>> vmcs12_save_pending_event(), but it does nothing as
>> exception.injected is false. prepare_vmcs12() also calls
>> kvm_clear_exception_queue(), which does nothing as
>> exception.injected is already false.
>> 5) We now return from vmx_check_nested_events() with 0 while still
>> having exception.pending=true!
>> 6) Therefore, inject_pending_event() continues and we inject the L2
>> exception to L1!...
>
> But this is buggy as well, because the #GP is lost, isn't it?

I don't think so.

Since commit 664f8e26b00c ("KVM: X86: Fix loss of exception which has
not yet been injected"), there is a fundamental difference between a
pending exception and an injected exception. A pending exception means
that no side-effects of the exception have been applied yet, including
incrementing the RIP past the instruction which caused the exception.
In our case, for example, handle_wrmsr() calls kvm_inject_gp() and
returns without calling kvm_skip_emulated_instruction(), which would
have incremented the RIP.

Therefore, when we exit from L2 to L1 on the vmx-preemption-timer, we
can safely clear exception.pending, because when L1 resumes L2 the
exception will be raised again (the same WRMSR instruction will run
again and raise #GP again).
This is also why vmcs12_save_pending_event() only makes sure to save in
the VMCS12 idt-vectoring-info the "injected" events and not the
"pending" events (interrupt.pending is a misleading name and I would
rename it in an upcoming patch to interrupt.injected; see the
explanation below). And this is also why exception.pending is
intercepted but exception.injected is not.

I can confirm this patch works because I have written a kvm-unit-test
which reproduces this issue. After the fix, the #GP is not lost and is
raised directly to L2 correctly. (I haven't posted the unit-test yet
because it is very dependent on a correct vmx-preemption-timer config
that varies between environments.)

> Is the bug that the preemption timer vmexit should only be injected if
> there are no pending exceptions? In fact, the same bug could also
> happen for NMIs or interrupts, I think.

As explained above, I don't think so.

In general there is a bit of a mess in the KVM code regarding clean
separation between a "pending" event and an "injected" event. NMIs &
exceptions are now separated nicely. However, interrupt.pending is
actually interrupt.injected, as it is signaled after the side-effects
have occurred (the bit moved from IRR to ISR, for example). I am going
to post another series tomorrow (a v2 of a previous series I posted
here) that will attempt to clean this up and on the way fix a couple of
bugs in this area.
> So, something like (101% untested):
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 5b436f4e6e3e..64eecb8b126d 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -11137,8 +11137,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>  	bool block_nested_events =
>  		vmx->nested.nested_run_pending || kvm_event_needs_reinjection(vcpu);
>
> -	if (vcpu->arch.exception.pending &&
> -	    nested_vmx_check_exception(vcpu, &exit_qual)) {
> +	if (vcpu->arch.exception.pending) {
> +		if (!nested_vmx_check_exception(vcpu, &exit_qual))
> +			return 0;
>  		if (block_nested_events)
>  			return -EBUSY;
>  		nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
> @@ -11146,15 +11147,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>  		return 0;
>  	}
>
> -	if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
> -	    vmx->nested.preemption_timer_expired) {
> -		if (block_nested_events)
> -			return -EBUSY;
> -		nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
> -		return 0;
> -	}
> -
> -	if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
> +	if (vcpu->arch.nmi_pending) {
> +		if (!nested_exit_on_nmi(vcpu))
> +			return 0;
>  		if (block_nested_events)
>  			return -EBUSY;
>  		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
> @@ -11169,14 +11164,23 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>  		return 0;
>  	}
>
> -	if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
> -	    nested_exit_on_intr(vcpu)) {
> +	if (kvm_cpu_has_interrupt(vcpu) || external_intr) {
> +		if (!nested_exit_on_intr(vcpu))
> +			return 0;
>  		if (block_nested_events)
>  			return -EBUSY;
>  		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
>  		return 0;
>  	}
>
> +	if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
> +	    vmx->nested.preemption_timer_expired) {
> +		if (block_nested_events)
> +			return -EBUSY;
> +		nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
> +		return 0;
> +	}
> +
>  	vmx_complete_nested_posted_interrupt(vcpu);
>  	return 0;
>  }
>
> Paolo
>
>> This commit will fix the above issue by changing step (4) to clear
>> exception.pending in kvm_clear_exception_queue().
>>
>> Fixes: 664f8e26b00c ("KVM: X86: Fix loss of exception which has not
>> yet been injected")
>>
>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
>> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
>> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>> ---
>>  arch/x86/kvm/vmx.c | 1 -
>>  arch/x86/kvm/x86.h | 1 +
>>  2 files changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 7c3522a989d0..bee08782c781 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -11035,7 +11035,6 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>>  		if (vmx->nested.nested_run_pending)
>>  			return -EBUSY;
>>  		nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
>> -		vcpu->arch.exception.pending = false;
>>  		return 0;
>>  	}
>>
>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>> index d0b95b7a90b4..6d112d8f799c 100644
>> --- a/arch/x86/kvm/x86.h
>> +++ b/arch/x86/kvm/x86.h
>> @@ -12,6 +12,7 @@
>>
>>  static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu)
>>  {
>> +	vcpu->arch.exception.pending = false;
>>  	vcpu->arch.exception.injected = false;
>>  }
>>
>
> Should kvm_clear_interrupt_queue do the same?

Interrupts currently only have interrupt.pending (which, as I said
above, is actually interrupt.injected). Therefore
kvm_clear_interrupt_queue() already clears all there is...

> Thanks,
>
> Paolo

Regards,
-Liran
On 20/11/2017 23:46, Liran Alon wrote:
>> But this is buggy as well, because the #GP is lost, isn't it?
>
> I don't think so.
>
> Since commit 664f8e26b00c ("KVM: X86: Fix loss of exception which has
> not yet been injected"), there is a fundamental difference between a
> pending exception and an injected exception.
> A pending exception means that no side-effects of the exception have
> been applied yet, including incrementing the RIP past the instruction
> which caused the exception. In our case, for example, handle_wrmsr()
> calls kvm_inject_gp() and returns without calling
> kvm_skip_emulated_instruction(), which would have incremented the RIP.

OK, I was almost sure this was going to be the case if the #GP wasn't
lost (but couldn't convince myself 100%).

> Therefore, when we exit from L2 to L1 on the vmx-preemption-timer, we
> can safely clear exception.pending, because when L1 resumes L2 the
> exception will be raised again (the same WRMSR instruction will run
> again and raise #GP again).
>
> This is also why vmcs12_save_pending_event() only makes sure to save
> in the VMCS12 idt-vectoring-info the "injected" events and not the
> "pending" events (interrupt.pending is a misleading name and I would
> rename it in an upcoming patch to interrupt.injected; see the
> explanation below).

Indeed. And then kvm_event_needs_reinjection starts making sense.

> I can confirm this patch works because I have written a kvm-unit-test
> which reproduces this issue. After the fix, the #GP is not lost and is
> raised directly to L2 correctly.
> (I haven't posted the unit-test yet because it is very dependent on a
> correct vmx-preemption-timer config that varies between environments.)

Can you post it anyway? Tests always help understanding the code.

Paolo