diff mbox series

[2/2] KVM: x86: Mask LVTPC when handling a PMI

Message ID 20230901185646.2823254-2-jmattson@google.com (mailing list archive)
State New, archived
Headers show
Series [1/2] KVM: x86: Synthesize at most one PMI per VM-exit | expand

Commit Message

Jim Mattson Sept. 1, 2023, 6:56 p.m. UTC
Per the SDM, "When the local APIC handles a performance-monitoring
counters interrupt, it automatically sets the mask flag in the LVT
performance counter register."

Add this behavior to KVM's local APIC emulation, to reduce the
incidence of "dazed and confused" spurious NMI warnings in Linux
guests (at least, those that use a PMI handler with "late_ack").

Fixes: 23930f9521c9 ("KVM: x86: Enable NMI Watchdog via in-kernel PIT source")
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/lapic.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Mingwei Zhang Sept. 2, 2023, 7:06 p.m. UTC | #1
On Fri, Sep 01, 2023, Jim Mattson wrote:
> Per the SDM, "When the local APIC handles a performance-monitoring
> counters interrupt, it automatically sets the mask flag in the LVT
> performance counter register."
> 
> Add this behavior to KVM's local APIC emulation, to reduce the
> incidence of "dazed and confused" spurious NMI warnings in Linux
> guests (at least, those that use a PMI handler with "late_ack").
> 
> Fixes: 23930f9521c9 ("KVM: x86: Enable NMI Watchdog via in-kernel PIT source")
> Signed-off-by: Jim Mattson <jmattson@google.com>
Tested-by: Mingwei Zhang <mizhang@google.com>

I see consistent number of PMIs and NMIs when running perf on an idle
VM.
> ---
>  arch/x86/kvm/lapic.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index a983a16163b1..1a79ec54ae1e 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2743,6 +2743,8 @@ int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
>  		vector = reg & APIC_VECTOR_MASK;
>  		mode = reg & APIC_MODE_MASK;
>  		trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
> +		if (lvt_type == APIC_LVTPC)
> +			kvm_lapic_set_reg(apic, lvt_type, reg | APIC_LVT_MASKED);
>  		return __apic_accept_irq(apic, mode, vector, 1, trig_mode,
>  					NULL);
>  	}
> -- 
> 2.42.0.283.g2d96d420d3-goog
>
Mi, Dapeng1 Sept. 6, 2023, 8:59 a.m. UTC | #2
> -----Original Message-----
> From: Jim Mattson <jmattson@google.com>
> Sent: Saturday, September 2, 2023 2:57 AM
> To: kvm@vger.kernel.org; Christopherson,, Sean <seanjc@google.com>; Paolo
> Bonzini <pbonzini@redhat.com>
> Cc: Xu, Like <likexu@tencent.com>; Mingwei Zhang <mizhang@google.com>;
> Roman Kagan <rkagan@amazon.de>; Liang, Kan <kan.liang@intel.com>; Mi,
> Dapeng1 <dapeng1.mi@intel.com>; Jim Mattson <jmattson@google.com>
> Subject: [PATCH 2/2] KVM: x86: Mask LVTPC when handling a PMI
> 
> Per the SDM, "When the local APIC handles a performance-monitoring counters
> interrupt, it automatically sets the mask flag in the LVT performance counter
> register."
> 
> Add this behavior to KVM's local APIC emulation, to reduce the incidence of
> "dazed and confused" spurious NMI warnings in Linux guests (at least, those that
> use a PMI handler with "late_ack").
> 
> Fixes: 23930f9521c9 ("KVM: x86: Enable NMI Watchdog via in-kernel PIT source")
> Signed-off-by: Jim Mattson <jmattson@google.com>
> ---
>  arch/x86/kvm/lapic.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index
> a983a16163b1..1a79ec54ae1e 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2743,6 +2743,8 @@ int kvm_apic_local_deliver(struct kvm_lapic *apic, int
> lvt_type)
>  		vector = reg & APIC_VECTOR_MASK;
>  		mode = reg & APIC_MODE_MASK;
>  		trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
> +		if (lvt_type == APIC_LVTPC)
> +			kvm_lapic_set_reg(apic, lvt_type, reg |
> APIC_LVT_MASKED);

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>

>  		return __apic_accept_irq(apic, mode, vector, 1, trig_mode,
>  					NULL);
>  	}
> --
> 2.42.0.283.g2d96d420d3-goog
Sean Christopherson Sept. 22, 2023, 6:22 p.m. UTC | #3
On Fri, Sep 01, 2023, Jim Mattson wrote:
> Per the SDM, "When the local APIC handles a performance-monitoring
> counters interrupt, it automatically sets the mask flag in the LVT
> performance counter register."
> 
> Add this behavior to KVM's local APIC emulation, to reduce the
> incidence of "dazed and confused" spurious NMI warnings in Linux
> guests (at least, those that use a PMI handler with "late_ack").

Hmm, I don't like the "to reduce the incidence" language as that suggests that
this isn't a hard requirement.  That makes it sound like KVM is doing the guest
a favor.  This?

    Per the SDM, "When the local APIC handles a performance-monitoring
    counters interrupt, it automatically sets the mask flag in the LVT
    performance counter register."  Add this behavior to KVM's local APIC
    emulation.
    
    Failure to mask the LVTPC entry results in spurious PMIs, e.g. when
    running Linux as a guest, PMI handlers that do a "late_ack" spew a large
    number of "dazed and confused" spurious NMI warnings.

> Fixes: 23930f9521c9 ("KVM: x86: Enable NMI Watchdog via in-kernel PIT source")
> Signed-off-by: Jim Mattson <jmattson@google.com>
> ---
>  arch/x86/kvm/lapic.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index a983a16163b1..1a79ec54ae1e 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2743,6 +2743,8 @@ int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
>  		vector = reg & APIC_VECTOR_MASK;
>  		mode = reg & APIC_MODE_MASK;
>  		trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
> +		if (lvt_type == APIC_LVTPC)
> +			kvm_lapic_set_reg(apic, lvt_type, reg | APIC_LVT_MASKED);

IMO, unconditionally setting the masked bit is wrong.  I'm 99% certain KVM should
only set the mask bit if an interrupt is actually delivered somehwere.

__apic_accept_irq() "fails" as follows:

  APIC_DM_LOWEST and APIC_DM_FIXED
	1. Trigger Mode is "level" triggered and level is deasserted.  This can't
           happen because this code hardcodes the level to '1'.

        2. APIC is disabled (H/W or S/W).  This is a non-issue because the H/W
           disabled case ignores it entirely, and all masks are defined to be set
           if the APIC is S/W disabled (per the SDM):

           The mask bits for all the LVT entries are set. Attempts to reset these
           bits will be ignored.


  APIC_DM_SMI
        1. If SMI injection fails because CONFIG_KVM_SMM=n.

  APIC_DM_INIT
        1. Trigger Mode is "level" triggered and level is deasserted.  As above,
           this can't happen.

  APIC_DM_EXTINT
        1. Unconditionally ignored by KVM.  This is architecturally correct for
           the LVTPC as the SDM says:

           Not supported for the LVT CMCI register, the LVT thermal monitor
           register, or the LVT performance counter register.

So basically, failure happens if and only if the guest attempts to send an SMI or
ExtINT that KVM ignores.  The SDM doesn't explicitly state that mask bit is left
unset in these cases, but my reading of

  When the local APIC handles a performance-monitoring counters interrupt, it
  automatically sets the mask flag in the LVT performance counter register.

is that there has to be an actual interrupt.

I highly doubt any software will ever care, e.g. the guest is going to be unhappy
in CONFIG_KVM_SMM=n case no matter what, but setting the mask bit without actually
triggering an interrupt seems odd.

This as fixup?

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 71d87b0db0d9..ebfc3d92a266 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2759,15 +2759,17 @@ int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
 {
        u32 reg = kvm_lapic_get_reg(apic, lvt_type);
        int vector, mode, trig_mode;
+       int r;
 
        if (kvm_apic_hw_enabled(apic) && !(reg & APIC_LVT_MASKED)) {
                vector = reg & APIC_VECTOR_MASK;
                mode = reg & APIC_MODE_MASK;
                trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
-               if (lvt_type == APIC_LVTPC)
+
+               r = __apic_accept_irq(apic, mode, vector, 1, trig_mode, NULL);
+               if (r && lvt_type == APIC_LVTPC)
                        kvm_lapic_set_reg(apic, lvt_type, reg | APIC_LVT_MASKED);
-               return __apic_accept_irq(apic, mode, vector, 1, trig_mode,
-                                       NULL);
+               return r;
        }
        return 0;
 }
Jim Mattson Sept. 25, 2023, 5:52 p.m. UTC | #4
On Fri, Sep 22, 2023 at 11:22 AM Sean Christopherson <seanjc@google.com> wrote:
>The SDM doesn't explicitly state that mask bit is left
> unset in these cases, but my reading of
>
>   When the local APIC handles a performance-monitoring counters interrupt, it
>   automatically sets the mask flag in the LVT performance counter register.
>
> is that there has to be an actual interrupt.

I assume you mean an actual interrupt from the APIC, not an actual PMI
to the APIC.

I would argue that one way of "handling" a performance-monitoring
counters interrupt is to do nothing with it, but apparently I'm wrong.
At least, if I set the delivery mode in LVTPC to the illegal value of
6, I never see the LVTPC.MASK bit get set.
Sean Christopherson Sept. 25, 2023, 6 p.m. UTC | #5
On Mon, Sep 25, 2023, Jim Mattson wrote:
> On Fri, Sep 22, 2023 at 11:22 AM Sean Christopherson <seanjc@google.com> wrote:
> >The SDM doesn't explicitly state that mask bit is left
> > unset in these cases, but my reading of
> >
> >   When the local APIC handles a performance-monitoring counters interrupt, it
> >   automatically sets the mask flag in the LVT performance counter register.
> >
> > is that there has to be an actual interrupt.
> 
> I assume you mean an actual interrupt from the APIC, not an actual PMI
> to the APIC.

Yeah.

> I would argue that one way of "handling" a performance-monitoring
> counters interrupt is to do nothing with it, but apparently I'm wrong.
> At least, if I set the delivery mode in LVTPC to the illegal value of
> 6, I never see the LVTPC.MASK bit get set.

Heh, you could complain to Intel and see if they'll change "handles" to "delivers" :-)
diff mbox series

Patch

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index a983a16163b1..1a79ec54ae1e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2743,6 +2743,8 @@  int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
 		vector = reg & APIC_VECTOR_MASK;
 		mode = reg & APIC_MODE_MASK;
 		trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
+		if (lvt_type == APIC_LVTPC)
+			kvm_lapic_set_reg(apic, lvt_type, reg | APIC_LVT_MASKED);
 		return __apic_accept_irq(apic, mode, vector, 1, trig_mode,
 					NULL);
 	}