diff mbox

[v2] KVM: nVMX: Fix losing NMI blocking state

Message ID 1500979246-104432-1-git-send-email-wanpeng.li@hotmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Wanpeng Li July 25, 2017, 10:40 a.m. UTC
Run kvm-unit-tests/eventinj.flat in L1 w/ ept=0 on both L0 and L1:

Before NMI IRET test
Sending NMI to self
NMI isr running stack 0x461000
Sending nested NMI to self
After nested NMI to self
Nested NMI isr running rip=40038e
After iret
After NMI to self
FAIL: NMI

Reference SDM 31.7.1.2:

 If the “virtual NMIs” VM-execution control is 1, bit 12 of the VM-exit
 interruption-information field indicates that the VM exit was due to a fault
 encountered during an execution of the IRET instruction that removed virtual-NMI
 blocking. In particular, it provides this indication if the following are both
 true:

  - Bit 31 (valid) in the IDT-vectoring information field is 0.
  - The value of bits 7:0 (vector) of the VM-exit interruption-information
    field is not 8 (the VM exit is not due to a double-fault exception).

 If both are true and bit 12 of the VM-exit interruption-information field is 1,
 there was virtual-NMI blocking before guest software executed the IRET instruction
 that caused the fault that caused the VM exit. The VMM should set bit 3 (blocking
 by NMI) in the interruptibility-state field (using VMREAD and VMWRITE) before
 resuming guest software.

Commit 4c4a6f790ee862 (KVM: nVMX: track NMI blocking state separately for each VMCS)
tracks NMI blocking state separately for vmcs01 and vmcs02. However it is not enough:

 - The L2 (kvm-unit-tests/eventinj.flat) generates NMI that will fault on IRET, so the 
   L2 can generate #PF which can be intercepted by L0. 
 - L0 walks L1's guest page table and sees the mapping is invalid, it resumes the L1 
   guest and injects the #PF into L1.
 - L1 awares it should set bit 3 (blocking by NMI) in the interruptibility-state field 
   and fix the shadow page table before resuming L2 guest.
 - L1 executes VMRESUME to resume L2 which generates vmexit and causes L1 exit to L0 
 - L0 emulates VMRESUME which is called from L1, however, it lost the interruptibility 
   state field which is updated in vmcs12 when prepare vmcs02
 - .........

This patch fixes it by updating nmi_known_unmasked when preparing vmcs02 from vmcs12.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/kvm/vmx.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Paolo Bonzini July 25, 2017, 10:55 a.m. UTC | #1
On 25/07/2017 12:40, Wanpeng Li wrote:
> Commit 4c4a6f790ee862 (KVM: nVMX: track NMI blocking state separately for each VMCS)
> tracks NMI blocking state separately for vmcs01 and vmcs02. However it is not enough:
> 
>  - The L2 (kvm-unit-tests/eventinj.flat) generates NMI that will fault on IRET, so the 
>    L2 can generate #PF which can be intercepted by L0. 
>  - L0 walks L1's guest page table and sees the mapping is invalid, it resumes the L1 
>    guest and injects the #PF into L1.
>  - L1 awares it should set bit 3 (blocking by NMI) in the interruptibility-state field 
>    and fix the shadow page table before resuming L2 guest.
>  - L1 executes VMRESUME to resume L2 which generates vmexit and causes L1 exit to L0 
>  - L0 emulates VMRESUME which is called from L1, however, it lost the interruptibility 
>    state field which is updated in vmcs12 when prepare vmcs02
>  - .........

The "..." part is not very enlightening.  My understanding is:

 - The L2 (kvm-unit-tests/eventinj.flat) generates NMI that will fault 
   on IRET, so the L2 can generate #PF which can be intercepted by L0.
 - L0 walks L1's guest page table and sees the mapping is invalid, it 
   resumes the L1 guest and injects the #PF into L1.  At this point the
   vmcs02 has nmi_known_unmasked=true.
 - L1 sets set bit 3 (blocking by NMI) in the interruptibility-state field
   of vmcs12 (and fixes the shadow page table) before resuming L2 guest.
 - L1 executes VMRESUME to resume L2, causing a vmexit to L0
 - during VMRESUME emulation, prepare_vmcs02 sets bit 3 in the
   interruptibility-state field of vmcs02, but nmi_known_unmasked is
   still true.
 - on the next L2 exit to L0, nmi_known_unmasked is true so
   vmx_recover_nmi_blocking does not do anything.

Can you explain instead what happens if your v1 patch is applied (on top of mine),
and why it fixes the bug.

The patch is correct and almost obvious, but I'd like the commit message to be precise.

(Also, does your machine have shadow VMCS support?)

Thanks,

Paolo
Wanpeng Li July 25, 2017, 11:36 a.m. UTC | #2
2017-07-25 18:55 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> On 25/07/2017 12:40, Wanpeng Li wrote:
>> Commit 4c4a6f790ee862 (KVM: nVMX: track NMI blocking state separately for each VMCS)
>> tracks NMI blocking state separately for vmcs01 and vmcs02. However it is not enough:
>>
>>  - The L2 (kvm-unit-tests/eventinj.flat) generates NMI that will fault on IRET, so the
>>    L2 can generate #PF which can be intercepted by L0.
>>  - L0 walks L1's guest page table and sees the mapping is invalid, it resumes the L1
>>    guest and injects the #PF into L1.
>>  - L1 awares it should set bit 3 (blocking by NMI) in the interruptibility-state field
>>    and fix the shadow page table before resuming L2 guest.
>>  - L1 executes VMRESUME to resume L2 which generates vmexit and causes L1 exit to L0
>>  - L0 emulates VMRESUME which is called from L1, however, it lost the interruptibility
>>    state field which is updated in vmcs12 when prepare vmcs02
>>  - .........
>
> The "..." part is not very enlightening.  My understanding is:
>
>  - The L2 (kvm-unit-tests/eventinj.flat) generates NMI that will fault
>    on IRET, so the L2 can generate #PF which can be intercepted by L0.
>  - L0 walks L1's guest page table and sees the mapping is invalid, it
>    resumes the L1 guest and injects the #PF into L1.  At this point the
>    vmcs02 has nmi_known_unmasked=true.
>  - L1 sets set bit 3 (blocking by NMI) in the interruptibility-state field
>    of vmcs12 (and fixes the shadow page table) before resuming L2 guest.
>  - L1 executes VMRESUME to resume L2, causing a vmexit to L0
>  - during VMRESUME emulation, prepare_vmcs02 sets bit 3 in the
>    interruptibility-state field of vmcs02, but nmi_known_unmasked is
>    still true.
>  - on the next L2 exit to L0, nmi_known_unmasked is true so
>    vmx_recover_nmi_blocking does not do anything.

Thanks for that. :)

>
> Can you explain instead what happens if your v1 patch is applied (on top of mine),
> and why it fixes the bug.

We will set the expected guest interruptibility-state field before the
final step: L0 fixes the shadow page table (NGVA -> HPA), then L0
resumes the guest w/ the expected guest interruptibility-state.

>
> The patch is correct and almost obvious, but I'd like the commit message to be precise.
>
> (Also, does your machine have shadow VMCS support?)

A Haswell desktop w/ shadow vmcs enabled.

Regards,
Wanpeng Li
Paolo Bonzini July 25, 2017, 1 p.m. UTC | #3
On 25/07/2017 13:36, Wanpeng Li wrote:
>>  - during VMRESUME emulation, prepare_vmcs02 sets bit 3 in the
>>    interruptibility-state field of vmcs02, but nmi_known_unmasked is
>>    still true.
>>  - on the next L2 exit to L0, nmi_known_unmasked is true so
>>    vmx_recover_nmi_blocking does not do anything.
>> 
>> Can you explain instead what happens if your v1 patch is applied (on top of mine),
>> and why it fixes the bug.
>
> We will set the expected guest interruptibility-state field before the
> final step: L0 fixes the shadow page table (NGVA -> HPA), then L0
> resumes the guest w/ the expected guest interruptibility-state.

Aha, that makes sense indeed!  I was confused between the first page
fault (that is reflected to L1) and the second (that is fixed by L0).

Thanks!

Paolo
diff mbox

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 29fd8af..bc999a1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10041,6 +10041,8 @@  static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 			     vmcs12->vm_entry_instruction_len);
 		vmcs_write32(GUEST_INTERRUPTIBILITY_INFO,
 			     vmcs12->guest_interruptibility_info);
+		vmx->loaded_vmcs->nmi_known_unmasked =
+			!(vmcs12->guest_interruptibility_info & GUEST_INTR_STATE_NMI);
 	} else {
 		vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
 	}