Message ID | 1500979246-104432-1-git-send-email-wanpeng.li@hotmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 25/07/2017 12:40, Wanpeng Li wrote: > Commit 4c4a6f790ee862 (KVM: nVMX: track NMI blocking state separately for each VMCS) > tracks NMI blocking state separately for vmcs01 and vmcs02. However it is not enough: > > - The L2 (kvm-unit-tests/eventinj.flat) generates NMI that will fault on IRET, so the > L2 can generate #PF which can be intercepted by L0. > - L0 walks L1's guest page table and sees the mapping is invalid, it resumes the L1 > guest and injects the #PF into L1. > - L1 awares it should set bit 3 (blocking by NMI) in the interruptibility-state field > and fix the shadow page table before resuming L2 guest. > - L1 executes VMRESUME to resume L2 which generates vmexit and causes L1 exit to L0 > - L0 emulates VMRESUME which is called from L1, however, it lost the interruptibility > state field which is updated in vmcs12 when prepare vmcs02 > - ......... The "..." part is not very enlightening. My understanding is: - The L2 (kvm-unit-tests/eventinj.flat) generates NMI that will fault on IRET, so the L2 can generate #PF which can be intercepted by L0. - L0 walks L1's guest page table and sees the mapping is invalid, it resumes the L1 guest and injects the #PF into L1. At this point the vmcs02 has nmi_known_unmasked=true. - L1 sets set bit 3 (blocking by NMI) in the interruptibility-state field of vmcs12 (and fixes the shadow page table) before resuming L2 guest. - L1 executes VMRESUME to resume L2, causing a vmexit to L0 - during VMRESUME emulation, prepare_vmcs02 sets bit 3 in the interruptibility-state field of vmcs02, but nmi_known_unmasked is still true. - on the next L2 exit to L0, nmi_known_unmasked is true so vmx_recover_nmi_blocking does not do anything. Can you explain instead what happens if your v1 patch is applied (on top of mine), and why it fixes the bug. The patch is correct and almost obvious, but I'd like the commit message to be precise. (Also, does your machine have shadow VMCS support?) Thanks, Paolo
2017-07-25 18:55 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>: > On 25/07/2017 12:40, Wanpeng Li wrote: >> Commit 4c4a6f790ee862 (KVM: nVMX: track NMI blocking state separately for each VMCS) >> tracks NMI blocking state separately for vmcs01 and vmcs02. However it is not enough: >> >> - The L2 (kvm-unit-tests/eventinj.flat) generates NMI that will fault on IRET, so the >> L2 can generate #PF which can be intercepted by L0. >> - L0 walks L1's guest page table and sees the mapping is invalid, it resumes the L1 >> guest and injects the #PF into L1. >> - L1 awares it should set bit 3 (blocking by NMI) in the interruptibility-state field >> and fix the shadow page table before resuming L2 guest. >> - L1 executes VMRESUME to resume L2 which generates vmexit and causes L1 exit to L0 >> - L0 emulates VMRESUME which is called from L1, however, it lost the interruptibility >> state field which is updated in vmcs12 when prepare vmcs02 >> - ......... > > The "..." part is not very enlightening. My understanding is: > > - The L2 (kvm-unit-tests/eventinj.flat) generates NMI that will fault > on IRET, so the L2 can generate #PF which can be intercepted by L0. > - L0 walks L1's guest page table and sees the mapping is invalid, it > resumes the L1 guest and injects the #PF into L1. At this point the > vmcs02 has nmi_known_unmasked=true. > - L1 sets set bit 3 (blocking by NMI) in the interruptibility-state field > of vmcs12 (and fixes the shadow page table) before resuming L2 guest. > - L1 executes VMRESUME to resume L2, causing a vmexit to L0 > - during VMRESUME emulation, prepare_vmcs02 sets bit 3 in the > interruptibility-state field of vmcs02, but nmi_known_unmasked is > still true. > - on the next L2 exit to L0, nmi_known_unmasked is true so > vmx_recover_nmi_blocking does not do anything. Thanks for that. :) > > Can you explain instead what happens if your v1 patch is applied (on top of mine), > and why it fixes the bug. We will set the expected guest interruptibility-state field before the final step: L0 fixes the shadow page table (NGVA -> HPA), then L0 resumes the guest w/ the expected guest interruptibility-state. > > The patch is correct and almost obvious, but I'd like the commit message to be precise. > > (Also, does your machine have shadow VMCS support?) A Haswell desktop w/ shadow vmcs enabled. Regards, Wanpeng Li
On 25/07/2017 13:36, Wanpeng Li wrote: >> - during VMRESUME emulation, prepare_vmcs02 sets bit 3 in the >> interruptibility-state field of vmcs02, but nmi_known_unmasked is >> still true. >> - on the next L2 exit to L0, nmi_known_unmasked is true so >> vmx_recover_nmi_blocking does not do anything. >> >> Can you explain instead what happens if your v1 patch is applied (on top of mine), >> and why it fixes the bug. > > We will set the expected guest interruptibility-state field before the > final step: L0 fixes the shadow page table (NGVA -> HPA), then L0 > resumes the guest w/ the expected guest interruptibility-state. Aha, that makes sense indeed! I was confused between the first page fault (that is reflected to L1) and the second (that is fixed by L0). Thanks! Paolo
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 29fd8af..bc999a1 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -10041,6 +10041,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, vmcs12->vm_entry_instruction_len); vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, vmcs12->guest_interruptibility_info); + vmx->loaded_vmcs->nmi_known_unmasked = + !(vmcs12->guest_interruptibility_info & GUEST_INTR_STATE_NMI); } else { vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0); }
Run kvm-unit-tests/eventinj.flat in L1 w/ ept=0 on both L0 and L1: Before NMI IRET test Sending NMI to self NMI isr running stack 0x461000 Sending nested NMI to self After nested NMI to self Nested NMI isr running rip=40038e After iret After NMI to self FAIL: NMI Reference SDM 31.7.1.2: If the “virtual NMIs” VM-execution control is 1, bit 12 of the VM-exit interruption-information field indicates that the VM exit was due to a fault encountered during an execution of the IRET instruction that removed virtual-NMI blocking. In particular, it provides this indication if the following are both true: - Bit 31 (valid) in the IDT-vectoring information field is 0. - The value of bits 7:0 (vector) of the VM-exit interruption-information field is not 8 (the VM exit is not due to a double-fault exception). If both are true and bit 12 of the VM-exit interruption-information field is 1, there was virtual-NMI blocking before guest software executed the IRET instruction that caused the fault that caused the VM exit. The VMM should set bit 3 (blocking by NMI) in the interruptibility-state field (using VMREAD and VMWRITE) before resuming guest software. Commit 4c4a6f790ee862 (KVM: nVMX: track NMI blocking state separately for each VMCS) tracks NMI blocking state separately for vmcs01 and vmcs02. However it is not enough: - The L2 (kvm-unit-tests/eventinj.flat) generates NMI that will fault on IRET, so the L2 can generate #PF which can be intercepted by L0. - L0 walks L1's guest page table and sees the mapping is invalid, it resumes the L1 guest and injects the #PF into L1. - L1 awares it should set bit 3 (blocking by NMI) in the interruptibility-state field and fix the shadow page table before resuming L2 guest. - L1 executes VMRESUME to resume L2 which generates vmexit and causes L1 exit to L0 - L0 emulates VMRESUME which is called from L1, however, it lost the interruptibility state field which is updated in vmcs12 when prepare vmcs02 - ......... This patch fixes it by updating nmi_known_unmasked when preparing vmcs02 from vmcs12. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> --- arch/x86/kvm/vmx.c | 2 ++ 1 file changed, 2 insertions(+)