[09/13] KVM: nVMX: Prioritize SMI over nested IRQ/NMI

Message ID 20200423022550.15113-10-sean.j.christopherson@intel.com
State New
Series
  • KVM: x86: Event fixes and cleanup

Commit Message

Sean Christopherson April 23, 2020, 2:25 a.m. UTC
Check for an unblocked SMI in vmx_check_nested_events() so that pending
SMIs are correctly prioritized over IRQs and NMIs when the latter events
will trigger VM-Exit.  This also fixes an issue where an SMI that was
marked pending while processing a nested VM-Enter wouldn't trigger an
immediate exit, i.e. would be incorrectly delayed until L2 happened to
take a VM-Exit.

Fixes: 64d6067057d96 ("KVM: x86: stubs for SMM support")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Jim Mattson April 28, 2020, 10:04 p.m. UTC | #1
On Wed, Apr 22, 2020 at 7:26 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> [...]
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 1fdaca5fd93d..8c16b190816b 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3750,6 +3750,12 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
>                 return 0;
>         }
>
> +       if (vcpu->arch.smi_pending && !is_smm(vcpu)) {
> +               if (block_nested_events)
> +                       return -EBUSY;
> +               goto no_vmexit;
> +       }
> +

From the SDM, volume 3:

• System-management interrupts (SMIs), INIT signals, and higher
priority events take priority over MTF VM exits.

I think this block needs to be moved up.
Sean Christopherson April 28, 2020, 10:59 p.m. UTC | #2
On Tue, Apr 28, 2020 at 03:04:02PM -0700, Jim Mattson wrote:
> On Wed, Apr 22, 2020 at 7:26 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > [...]
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index 1fdaca5fd93d..8c16b190816b 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -3750,6 +3750,12 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
> >                 return 0;
> >         }
> >
> > +       if (vcpu->arch.smi_pending && !is_smm(vcpu)) {
> > +               if (block_nested_events)
> > +                       return -EBUSY;
> > +               goto no_vmexit;
> > +       }
> > +
> 
> From the SDM, volume 3:
> 
> • System-management interrupts (SMIs), INIT signals, and higher
> priority events take priority over MTF VM exits.
> 
> I think this block needs to be moved up.

Hrm.  It definitely needs to be moved above the preemption timer, though I
can't find any public documentation about the preemption timer's priority.
Preemption timer is lower priority than MTF, ergo it's not in the same
class as SMI.

Regarding SMI vs. MTF and #DB trap, to actually prioritize SMIs above MTF
and #DBs, we'd need to save/restore MTF and pending #DBs via SMRAM.  I
think it makes sense to take the easy road and keep SMI after the traps,
with a comment to say it's technically wrong but not worth fixing.
Jim Mattson April 28, 2020, 11:16 p.m. UTC | #3
On Tue, Apr 28, 2020 at 3:59 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Tue, Apr 28, 2020 at 03:04:02PM -0700, Jim Mattson wrote:
> > On Wed, Apr 22, 2020 at 7:26 PM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > > [...]
> >
> > From the SDM, volume 3:
> >
> > • System-management interrupts (SMIs), INIT signals, and higher
> > priority events take priority over MTF VM exits.
> >
> > I think this block needs to be moved up.
>
> Hrm.  It definitely needs to be moved above the preemption timer, though I
> can't find any public documentation about the preemption timer's priority.
> Preemption timer is lower priority than MTF, ergo it's not in the same
> class as SMI.
>
> Regarding SMI vs. MTF and #DB trap, to actually prioritize SMIs above MTF
> and #DBs, we'd need to save/restore MTF and pending #DBs via SMRAM.  I
> think it makes sense to take the easy road and keep SMI after the traps,
> with a comment to say it's technically wrong but not worth fixing.

Pending debug exceptions should just go in the pending debug
exceptions field. End of story and end of complications. I don't
understand why kvm is so averse to using this field the way it was
intended.

As for the MTF, section 34.14.1 of the SDM, volume 3, clearly states:

The pseudocode above makes reference to the saving of VMX-critical state. This
state consists of the following: (1) SS.DPL (the current privilege level);
(2) RFLAGS.VM; (3) the state of blocking by STI and by MOV SS (see Table 24-3
in Section 24.4.2); (4) the state of virtual-NMI blocking (only if the
processor is in VMX non-root operation and the “virtual NMIs” VM-execution
control is 1); and (5) an indication of whether an MTF VM exit is pending (see
Section 25.5.2). These data may be saved internal to the processor or in the
VMCS region of the current VMCS. Processors that do not support SMI recognition
while there is blocking by STI or by MOV SS need not save the state of such
blocking.

I haven't really looked at kvm's implementation of SMM (because Google
doesn't support it), but it seems that the "MTF VM exit is pending"
bit should be trivial to deal with. I assume we save the other
VMX-critical state somewhere!
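[Editor's note: as a self-contained illustration of the SDM 34.14.1 passage quoted above, the VMX-critical state, including the "MTF VM exit is pending" bit, can be modeled as a save/restore pair around SMM entry and exit. All names below are hypothetical; this is a toy model, not KVM's implementation.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the VMX-critical state that SDM Vol. 3,
 * Section 34.14.1 says must be preserved across SMM. */
struct vmx_critical_state {
	uint8_t ss_dpl;        /* (1) current privilege level */
	bool rflags_vm;        /* (2) RFLAGS.VM */
	bool sti_blocking;     /* (3) blocking by STI */
	bool movss_blocking;   /* (3) blocking by MOV SS */
	bool vnmi_blocking;    /* (4) virtual-NMI blocking */
	bool mtf_pending;      /* (5) an MTF VM exit is pending */
};

struct vcpu_model {
	struct vmx_critical_state live;
	struct vmx_critical_state smram_save; /* stand-in for SMRAM */
	bool in_smm;
};

/* On SMI: stash the VMX-critical state "internal to the processor or
 * in the VMCS region"; here a plain struct stands in for either. */
static void model_enter_smm(struct vcpu_model *v)
{
	v->smram_save = v->live;
	v->in_smm = true;
}

/* On RSM: restore the saved state, so a pending MTF VM exit survives
 * the round trip through SMM instead of being silently dropped. */
static void model_leave_smm(struct vcpu_model *v)
{
	v->live = v->smram_save;
	v->in_smm = false;
}
```

The point of the model is only that the MTF-pending bit is one more field in an already-saved bundle, which is why dealing with it "should be trivial" as argued above.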
Jim Mattson April 28, 2020, 11:23 p.m. UTC | #4
On Tue, Apr 28, 2020 at 3:59 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Tue, Apr 28, 2020 at 03:04:02PM -0700, Jim Mattson wrote:
> > On Wed, Apr 22, 2020 at 7:26 PM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > [...]
>
> Hrm.  It definitely needs to be moved above the preemption timer, though I
> can't find any public documentation about the preemption timer's priority.

Section 25.2 of the SDM, volume 3:

Debug-trap exceptions and higher priority events take priority over VM exits
caused by the VMX-preemption timer. VM exits caused by the VMX-preemption timer
take priority over VM exits caused by the “NMI-window exiting” VM-execution
control and lower priority events.
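[Editor's note: pulling together the orderings quoted in this thread — SMI above MTF, debug traps above the VMX-preemption timer, the preemption timer above NMI-window exiting, and Sean's point that servicing SMI before the traps is the architecturally correct behavior — the relative priorities can be sketched as a toy resolver. The ordering encodes this thread's reading of the SDM, not KVM's actual code.]

```c
#include <assert.h>
#include <stdbool.h>

/* Event classes discussed in the thread, highest priority first:
 * SMI > #DB trap > MTF > preemption timer > NMI > IRQ. */
enum pending_event {
	EV_NONE,
	EV_IRQ,
	EV_NMI,
	EV_PREEMPT_TIMER,
	EV_MTF,
	EV_DB_TRAP,
	EV_SMI,
};

struct pending_set {
	bool smi, db_trap, mtf, preempt_timer, nmi, irq;
};

/* Return the highest-priority pending event, checking classes in
 * descending priority order. */
static enum pending_event next_event(const struct pending_set *p)
{
	if (p->smi)
		return EV_SMI;
	if (p->db_trap)
		return EV_DB_TRAP;
	if (p->mtf)
		return EV_MTF;
	if (p->preempt_timer)
		return EV_PREEMPT_TIMER;
	if (p->nmi)
		return EV_NMI;
	if (p->irq)
		return EV_IRQ;
	return EV_NONE;
}
```

The patch under discussion effectively moves the SMI check ahead of the NMI and IRQ checks; the follow-up question in this thread is whether it also belongs ahead of the trap, MTF, and preemption-timer checks, as this ordering suggests.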
Sean Christopherson April 29, 2020, 2:50 p.m. UTC | #5
On Tue, Apr 28, 2020 at 04:16:16PM -0700, Jim Mattson wrote:
> On Tue, Apr 28, 2020 at 3:59 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Tue, Apr 28, 2020 at 03:04:02PM -0700, Jim Mattson wrote:
> > > From the SDM, volume 3:
> > >
> > > • System-management interrupts (SMIs), INIT signals, and higher
> > > priority events take priority over MTF VM exits.
> > >
> > > I think this block needs to be moved up.
> >
> > Hrm.  It definitely needs to be moved above the preemption timer, though I
> > can't find any public documentation about the preemption timer's priority.
> > Preemption timer is lower priority than MTF, ergo it's not in the same
> > class as SMI.
> >
> > Regarding SMI vs. MTF and #DB trap, to actually prioritize SMIs above MTF
> > and #DBs, we'd need to save/restore MTF and pending #DBs via SMRAM.  I
> > think it makes sense to take the easy road and keep SMI after the traps,
> > with a comment to say it's technically wrong but not worth fixing.
> 
> Pending debug exceptions should just go in the pending debug
> exceptions field. End of story and end of complications. I don't
> understand why kvm is so averse to using this field the way it was
> intended.

Ah, it took my brain a bit to catch on.  I assume you're suggesting calling
nested_vmx_update_pending_dbg() so that the pending #DB naturally gets
propagated to/from vmcs12 on SMI/RSM?  I think that should work.

> As for the MTF, section 34.14.1 of the SDM, volume 3, clearly states:
> 
> [...]
> 
> I haven't really looked at kvm's implementation of SMM (because Google
> doesn't support it), but it seems that the "MTF VM exit is pending"
> bit should be trivial to deal with. I assume we save the other
> VMX-critical state somewhere!

True, I spaced on the existence of vmx_pre_{enter,leave}_smm().

I'll send a patch, the delta to what's in kvm/queue should actually be
quite small.
Sean Christopherson April 29, 2020, 8:06 p.m. UTC | #6
On Wed, Apr 29, 2020 at 07:50:40AM -0700, Sean Christopherson wrote:
> On Tue, Apr 28, 2020 at 04:16:16PM -0700, Jim Mattson wrote:
> > [...]
> 
> Ah, it took my brain a bit to catch on.  I assume you're suggesting calling
> nested_vmx_update_pending_dbg() so that the pending #DB naturally gets
> propagated to/from vmcs12 on SMI/RSM?  I think that should work.

This works for L2, but not L1 :-(  And L2 can't be fixed without first
fixing L1 because inject_pending_event() also incorrectly prioritizes #DB
over SMI.  For L1, utilizing SMRAM to save/restore the pending #DB is
likely the easiest solution as it avoids having to add new state for
migration.

I have everything coded up but it'll probably take a few weeks to test and
get it sent out, need to focus on other stuff for a while.
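[Editor's note: the "pending debug exceptions field" idea above can be illustrated with a self-contained toy model: on SMI, a pending #DB is parked in a vmcs12-style pending-debug-exceptions field; on RSM, it is re-armed. The DR6-style payload value and all names are hypothetical, and per the caveat above this propagation approach covers L2 only — for L1 the thread concludes that SMRAM save/restore is needed.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: instead of dropping a pending #DB when an SMI wins the
 * priority race, park its DR6-style payload in a field modeled on
 * GUEST_PENDING_DBG_EXCEPTIONS and re-arm it on RSM.  Illustrative
 * only; this is not KVM's implementation. */
struct dbg_model {
	bool     db_pending;       /* #DB trap awaiting delivery */
	uint64_t db_payload;       /* DR6-style bits for that #DB */
	uint64_t vmcs_pending_dbg; /* pending debug exceptions field */
};

/* On SMI: save the pending #DB into the VMCS field before entering SMM. */
static void model_smi(struct dbg_model *m)
{
	if (m->db_pending) {
		m->vmcs_pending_dbg = m->db_payload;
		m->db_pending = false;
	}
}

/* On RSM: restore and re-arm the parked #DB, clearing the field. */
static void model_rsm(struct dbg_model *m)
{
	if (m->vmcs_pending_dbg) {
		m->db_payload = m->vmcs_pending_dbg;
		m->db_pending = true;
		m->vmcs_pending_dbg = 0;
	}
}
```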

Patch

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 1fdaca5fd93d..8c16b190816b 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3750,6 +3750,12 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 		return 0;
 	}
 
+	if (vcpu->arch.smi_pending && !is_smm(vcpu)) {
+		if (block_nested_events)
+			return -EBUSY;
+		goto no_vmexit;
+	}
+
 	if (vcpu->arch.nmi_pending && !vmx_nmi_blocked(vcpu)) {
 		if (block_nested_events)
 			return -EBUSY;