
[Bug,217304] KVM does not handle NMI blocking correctly in nested virtualization

Message ID bug-217304-28872-obRYeQMhMS@https.bugzilla.kernel.org/ (mailing list archive)
State New, archived

Commit Message

bugzilla-daemon@kernel.org April 6, 2023, 7:14 p.m. UTC
https://bugzilla.kernel.org/show_bug.cgi?id=217304

--- Comment #1 from Sean Christopherson (seanjc@google.com) ---
On Thu, Apr 06, 2023, bugzilla-daemon@kernel.org wrote:
> Assume KVM runs in L0, LHV runs in L1, the nested guest runs in L2.
> 
> The code in LHV performs an experiment (called "Experiment 13" in serial
> output) on CPU 0 to test the behavior of NMI blocking. The experiment steps
> are:
> 1. Prepare state such that the CPU is currently in L1 (LHV), and NMIs are
>    blocked
> 2. Modify VMCS12 to make sure that L2 has virtual NMIs enabled (NMI exiting =
>    1, Virtual NMIs = 1) and that L2 does not block NMIs (Blocking by NMI = 0)
> 3. VM entry to L2
> 4. L2 performs VMCALL, getting a VM exit to L1
> 5. L1 checks whether NMIs are blocked.
> 
> The expected behavior is that NMIs should be unblocked at step 5, which is
> what happens on real hardware. According to the Intel SDM, NMIs are
> unblocked after VM entry to L2 (step 3), and VM exit to L1 (step 4) does
> not change NMI blocking, so NMIs are still unblocked.
> 
> However, when running on KVM, the experiment shows that at step 5, NMIs are
> blocked in L1. Thus, I think NMI blocking is not implemented correctly in
> KVM's nested virtualization.

Ya, KVM blocks NMIs on nested NMI VM-Exits, but doesn't unblock NMIs for all
other exit types.  I believe this is the fix (untested):

---
 arch/x86/kvm/vmx/nested.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 96ede74a6067..4240a052628a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4164,12 +4164,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
 				  NMI_VECTOR | INTR_TYPE_NMI_INTR |
 				  INTR_INFO_VALID_MASK, 0);
-		/*
-		 * The NMI-triggered VM exit counts as injection:
-		 * clear this one and block further NMIs.
-		 */
 		vcpu->arch.nmi_pending = 0;
-		vmx_set_nmi_mask(vcpu, true);
 		return 0;
 	}

@@ -4865,6 +4860,13 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 				INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;
 		}

+		/*
+		 * NMIs are blocked on VM-Exit due to NMI, and unblocked by all
+		 * other VM-Exit types.
+		 */
+		vmx_set_nmi_mask(vcpu, (u16)vm_exit_reason == EXIT_REASON_EXCEPTION_NMI &&
+				       !is_nmi(vmcs12->vm_exit_intr_info));
+
 		if (vm_exit_reason != -1)
 			trace_kvm_nested_vmexit_inject(vmcs12->vm_exit_reason,
 						       vmcs12->exit_qualification,
base-commit: 0b87a6bfd1bdb47b766aa0641b7cf93f3d3227e9
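
To make the failure concrete, here is a rough C sketch of the experiment's
L1 side.  The helper functions are invented stand-ins for illustration, not
LHV or KVM APIs; only the VMCS field and control-bit names are the kernel's:

	/*
	 * Hypothetical sketch of "Experiment 13"; block_nmi(),
	 * vmcs12_write(), vmlaunch_to_l2(), and nmi_is_blocked() are
	 * invented stand-ins, not real LHV or KVM code.
	 */
	static void experiment_13(void)
	{
		/* Step 1: run in L1 with NMIs blocked (e.g. from an NMI handler). */
		block_nmi();

		/* Step 2: vNMI enabled in vmcs12, no virtual-NMI blocking for L2. */
		vmcs12_write(PIN_BASED_VM_EXEC_CONTROL,
			     PIN_BASED_NMI_EXITING | PIN_BASED_VIRTUAL_NMIS);
		vmcs12_write(GUEST_INTERRUPTIBILITY_INFO, 0);

		/* Steps 3-4: enter L2, which immediately VMCALLs back to L1. */
		vmlaunch_to_l2();

		/*
		 * Step 5: per the SDM, NMIs should now be unblocked in L1;
		 * on the buggy KVM they are still blocked.
		 */
		assert(!nmi_is_blocked());
	}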

Comments

Sean Christopherson April 12, 2023, 5 p.m. UTC | #1
On Thu, Apr 06, 2023, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217304
> 
> --- Comment #1 from Sean Christopherson (seanjc@google.com) ---
> On Thu, Apr 06, 2023, bugzilla-daemon@kernel.org wrote:
> > Assume KVM runs in L0, LHV runs in L1, the nested guest runs in L2.
> > 
> > The code in LHV performs an experiment (called "Experiment 13" in serial
> > output) on CPU 0 to test the behavior of NMI blocking. The experiment steps
> > are:
> > 1. Prepare state such that the CPU is currently in L1 (LHV), and NMIs are
> >    blocked
> > 2. Modify VMCS12 to make sure that L2 has virtual NMIs enabled (NMI exiting =
> >    1, Virtual NMIs = 1) and that L2 does not block NMIs (Blocking by NMI = 0)
> > 3. VM entry to L2
> > 4. L2 performs VMCALL, getting a VM exit to L1
> > 5. L1 checks whether NMIs are blocked.
> > 
> > The expected behavior is that NMIs should be unblocked at step 5, which is
> > what happens on real hardware. According to the Intel SDM, NMIs are
> > unblocked after VM entry to L2 (step 3), and VM exit to L1 (step 4) does
> > not change NMI blocking, so NMIs are still unblocked.
> > 
> > However, when running on KVM, the experiment shows that at step 5, NMIs are
> > blocked in L1. Thus, I think NMI blocking is not implemented correctly in
> > KVM's nested virtualization.
> 
> Ya, KVM blocks NMIs on nested NMI VM-Exits, but doesn't unblock NMIs for all
> other exit types.  I believe this is the fix (untested):
> 
> ---
>  arch/x86/kvm/vmx/nested.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 96ede74a6067..4240a052628a 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -4164,12 +4164,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
>                 nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
>                                   NMI_VECTOR | INTR_TYPE_NMI_INTR |
>                                   INTR_INFO_VALID_MASK, 0);
> -               /*
> -                * The NMI-triggered VM exit counts as injection:
> -                * clear this one and block further NMIs.
> -                */
>                 vcpu->arch.nmi_pending = 0;
> -               vmx_set_nmi_mask(vcpu, true);
>                 return 0;
>         }
> 
> @@ -4865,6 +4860,13 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
>                                 INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;
>                 }
> 
> +               /*
> +                * NMIs are blocked on VM-Exit due to NMI, and unblocked by all
> +                * other VM-Exit types.
> +                */
> +		vmx_set_nmi_mask(vcpu, (u16)vm_exit_reason == EXIT_REASON_EXCEPTION_NMI &&
> +				       !is_nmi(vmcs12->vm_exit_intr_info));

Ugh, this is wrong.  As Eric stated in the bug report, and per section "27.5.5
Updating Non-Register State", VM-Exit does *not* affect NMI blocking except
when the VM-Exit is directly due to an NMI:

  Event blocking is affected as follows:
    * There is no blocking by STI or by MOV SS after a VM exit.
    * VM exits caused directly by non-maskable interrupts (NMIs) cause blocking by
      NMI (see Table 24-3). Other VM exits do not affect blocking by NMI. (See
      Section 27.1 for the case in which an NMI causes a VM exit indirectly.)
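
Restated as code, the VM-exit side of the rule is roughly the following.
This is a sketch of the SDM rule under invented names, not KVM code:

	/* Sketch of the SDM 27.5.5 rule; not actual KVM code. */
	static bool nmi_blocked_after_vmexit(bool was_blocked,
					     bool exit_directly_due_to_nmi)
	{
		/* An exit caused directly by an NMI sets blocking... */
		if (exit_directly_due_to_nmi)
			return true;

		/*
		 * ...any other exit leaves blocking unchanged; it is NOT
		 * cleared, which is what the patch above gets wrong.
		 */
		return was_blocked;
	}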

The scenario here is that virtual NMIs are enabled, in which case VM-Enter,
not VM-Exit, effectively clears NMI blocking.  From "26.7.1 Interruptibility State":

  The blocking of non-maskable interrupts (NMIs) is determined as follows:
    * If the "virtual NMIs" VM-execution control is 0, NMIs are blocked if and
      only if bit 3 (blocking by NMI) in the interruptibility-state field is 1.
      If the "NMI exiting" VM-execution control is 0, execution of the IRET
      instruction removes this blocking (even if the instruction generates a fault).
      If the "NMI exiting" control is 1, IRET does not affect this blocking.
    * The following items describe the use of bit 3 (blocking by NMI) in the
      interruptibility-state field if the "virtual NMIs" VM-execution control is 1:
        * The bit’s value does not affect the blocking of NMIs after VM entry. NMIs
          are not blocked in VMX non-root operation (except for ordinary blocking
          for other reasons, such as by the MOV SS instruction, the wait-for-SIPI
          state, etc.)
        * The bit’s value determines whether there is virtual-NMI blocking after VM
          entry. If the bit is 1, virtual-NMI blocking is in effect after VM entry.
          If the bit is 0, there is no virtual-NMI blocking after VM entry unless
          the VM entry is injecting an NMI (see Section 26.6.1.1). Execution of IRET
          removes virtual-NMI blocking (even if the instruction generates a fault).

I.e. forcing NMIs to be unblocked is wrong when virtual NMIs are disabled.
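
Collapsing those bullets into code, the VM-entry side looks roughly like
this.  Again, a sketch of the architectural behavior with invented names,
not a proposed KVM change:

	/*
	 * Sketch of the SDM 26.7.1 VM-entry rule for *physical* NMI
	 * blocking; not actual KVM code.  With vNMI enabled, bit 3 of the
	 * guest interruptibility state governs virtual-NMI blocking
	 * instead and does not gate physical NMIs.
	 */
	static bool nmi_blocked_after_vmentry(bool vnmi_enabled,
					      bool intr_state_blocking_by_nmi)
	{
		/* !vNMI: blocking is exactly bit 3 (blocking by NMI). */
		if (!vnmi_enabled)
			return intr_state_blocking_by_nmi;

		/* vNMI: NMIs are never blocked in VMX non-root operation. */
		return false;
	}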

Unfortunately, that means fixing this will require a much more involved patch
(series?), e.g. KVM can't modify NMI blocking until the VM-Enter is successful,
at which point vmcs02, not vmcs01, is loaded, and so KVM will likely need to
track NMI blocking in a software variable.  That in turn gets complicated by
the !vNMI case, because then KVM needs to propagate NMI blocking between vmcs01,
vmcs12, and vmcs02.  Blech.
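
One possible shape for that software tracking, purely as a hypothetical
sketch (the struct and helpers below are invented for illustration and are
not KVM code):

	/* Hypothetical sketch only; not actual KVM code. */
	struct nested_nmi_state {
		bool l1_nmi_blocked;	/* software copy of L1's blocking-by-NMI */
	};

	static void on_successful_nested_vmentry(struct nested_nmi_state *s,
						 bool vmcs12_vnmi_enabled)
	{
		/* vNMI: a successful VM-entry unblocks physical NMIs (26.7.1). */
		if (vmcs12_vnmi_enabled)
			s->l1_nmi_blocked = false;
		/*
		 * !vNMI: blocking must instead be propagated through vmcs01,
		 * vmcs12, and vmcs02, which is the complicated part.
		 */
	}

	static void on_nested_vmexit(struct nested_nmi_state *s,
				     bool exit_directly_due_to_nmi)
	{
		/* VM-exit due to NMI blocks NMIs; other exits leave it alone. */
		if (exit_directly_due_to_nmi)
			s->l1_nmi_blocked = true;
	}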

I'm going to punt fixing this due to lack of bandwidth, and AFAIK lack of a use
case beyond testing.  Hopefully I'll be able to revisit this in a few weeks, but
that might be wishful thinking.

Patch

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 96ede74a6067..4240a052628a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4164,12 +4164,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
 				  NMI_VECTOR | INTR_TYPE_NMI_INTR |
 				  INTR_INFO_VALID_MASK, 0);
-		/*
-		 * The NMI-triggered VM exit counts as injection:
-		 * clear this one and block further NMIs.
-		 */
 		vcpu->arch.nmi_pending = 0;
-		vmx_set_nmi_mask(vcpu, true);
 		return 0;
 	}

@@ -4865,6 +4860,13 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 				INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;
 		}

+		/*
+		 * NMIs are blocked on VM-Exit due to NMI, and unblocked by all
+		 * other VM-Exit types.
+		 */
+		vmx_set_nmi_mask(vcpu, (u16)vm_exit_reason == EXIT_REASON_EXCEPTION_NMI &&
+				       !is_nmi(vmcs12->vm_exit_intr_info));
+
 		if (vm_exit_reason != -1)
 			trace_kvm_nested_vmexit_inject(vmcs12->vm_exit_reason,
 						       vmcs12->exit_qualification,