[12/19] x86/mce: handle LMCE locally

Message ID 20170217063936.13208-13-haozhong.zhang@intel.com (mailing list archive)
State New, archived

Commit Message

Haozhong Zhang Feb. 17, 2017, 6:39 a.m. UTC
LMCE is delivered to only one CPU thread, so the MCE handler, barriers and
softirq handler should proceed without waiting for other CPUs when handling
an LMCE. Note that an LMCE is still broadcast to all vCPUs as a regular MCE
on Intel CPUs right now.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Christoph Egger <chegger@amazon.de>
Cc: Liu Jinsong <jinsong.liu@alibaba-inc.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/cpu/mcheck/barrier.c  |  4 ++--
 xen/arch/x86/cpu/mcheck/mcaction.c |  4 +++-
 xen/arch/x86/cpu/mcheck/mce.c      | 25 ++++++++++++++++++++++---
 xen/arch/x86/cpu/mcheck/mce.h      |  3 +++
 xen/arch/x86/cpu/mcheck/x86_mca.h  |  4 +++-
 5 files changed, 33 insertions(+), 7 deletions(-)

Comments

Jan Beulich Feb. 22, 2017, 1:53 p.m. UTC | #1
>>> On 17.02.17 at 07:39, <haozhong.zhang@intel.com> wrote:
> --- a/xen/arch/x86/cpu/mcheck/barrier.c
> +++ b/xen/arch/x86/cpu/mcheck/barrier.c
> @@ -20,7 +20,7 @@ void mce_barrier_enter(struct mce_softirq_barrier *bar)
>  {
>      int gen;
>  
> -    if (!mce_broadcast)
> +    if ( !mce_broadcast || __get_cpu_var(lmce_in_process) )

this_cpu() please instead of __get_cpu_var() (which we should get
rid of rather sooner than later).

> @@ -462,6 +474,7 @@ void mcheck_cmn_handler(const struct cpu_user_regs *regs)
>      uint64_t gstatus;
>      mctelem_cookie_t mctc = NULL;
>      struct mca_summary bs;
> +    bool *lmce_in_process = &__get_cpu_var(lmce_in_process);
>  
>      mce_spin_lock(&mce_logout_lock);
>  
> @@ -505,6 +518,8 @@ void mcheck_cmn_handler(const struct cpu_user_regs *regs)
>      }
>      mce_spin_unlock(&mce_logout_lock);
>  
> +    *lmce_in_process = bs.lmce;

You don't need a new local variable for this.
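
Presumably the assignment further down can then just be written in
place, e.g. (sketch only):

    this_cpu(lmce_in_process) = bs.lmce;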

> @@ -1709,6 +1724,7 @@ static void mce_softirq(void)
>  {
>      int cpu = smp_processor_id();
>      unsigned int workcpu;
> +    bool lmce = per_cpu(lmce_in_process, cpu);

Is this flag valid to be looked at anymore at this point in time? MCIP
was cleared a lot earlier, so there may well have been a 2nd #MC
in between. In any event you again don't need the local variable
here afaict.

Jan
Haozhong Zhang Feb. 23, 2017, 3:06 a.m. UTC | #2
On 02/22/17 06:53 -0700, Jan Beulich wrote:
> >>> On 17.02.17 at 07:39, <haozhong.zhang@intel.com> wrote:
> > --- a/xen/arch/x86/cpu/mcheck/barrier.c
> > +++ b/xen/arch/x86/cpu/mcheck/barrier.c
> > @@ -20,7 +20,7 @@ void mce_barrier_enter(struct mce_softirq_barrier *bar)
> >  {
> >      int gen;
> >  
> > -    if (!mce_broadcast)
> > +    if ( !mce_broadcast || __get_cpu_var(lmce_in_process) )
> 
> this_cpu() please instead of __get_cpu_var() (which we should get
> rid of rather sooner than later).

will use this_cpu()
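
e.g. the check would then read (just a sketch of the intended change):

    if ( !mce_broadcast || this_cpu(lmce_in_process) )
        return;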

>
> > @@ -462,6 +474,7 @@ void mcheck_cmn_handler(const struct cpu_user_regs *regs)
> >      uint64_t gstatus;
> >      mctelem_cookie_t mctc = NULL;
> >      struct mca_summary bs;
> > +    bool *lmce_in_process = &__get_cpu_var(lmce_in_process);
> >  
> >      mce_spin_lock(&mce_logout_lock);
> >  
> > @@ -505,6 +518,8 @@ void mcheck_cmn_handler(const struct cpu_user_regs *regs)
> >      }
> >      mce_spin_unlock(&mce_logout_lock);
> >  
> > +    *lmce_in_process = bs.lmce;
> 
> You don't need a new local variable for this.
> 
> > @@ -1709,6 +1724,7 @@ static void mce_softirq(void)
> >  {
> >      int cpu = smp_processor_id();
> >      unsigned int workcpu;
> > +    bool lmce = per_cpu(lmce_in_process, cpu);
> 
> Is this flag valid to be looked at anymore at this point in time? MCIP
> was cleared a lot earlier, so there may well have been a 2nd #MC
> in between. In any event you again don't need the local variable
> here afaict.

A non-LMCE MC# coming in between does not cause a problem. As
lmce_in_process is cleared on all CPUs, mce_softirq() on all CPUs
will sync with each other as before and finally one of them will
handle the pending LMCE.

I think the problem is that one flag is not enough, rather than not
needed. A single lmce_in_process flag misses the following case:
1) mcheck_cmn_handler() first handles a non-LMCE MC on CPU#n and raises
   MACHINE_CHECK_SOFTIRQ.
2) Before mce_softirq() gets a chance to run on CPU#n, another LMCE
   comes to CPU#n. Then mcheck_cmn_handler() sets lmce_in_process on
   CPU#n and raises MACHINE_CHECK_SOFTIRQ again.
3) mce_softirq() finally gets a chance to run on CPU#n. It sees
   lmce_in_process is set and consequently handles the pending MCEs on
   CPU#n w/o waiting for other CPUs. However, one of the pending MCEs
   is not an LMCE.

So I'm considering introducing another per-CPU flag "mce_in_process" to
indicate whether a non-LMCE MC is pending for the softirq handler:
1) When a non-LMCE MC# comes to CPU#n, mcheck_cmn_handler() sets the
   mce_in_process flag on CPU#n.
2) When an LMCE comes to CPU#n, mcheck_cmn_handler() sets the
   lmce_in_process flag on CPU#n.
3) When mce_softirq() starts, it clears the lmce_in_process flag if
   mce_in_process is set, so it will not handle a non-LMCE MC w/o
   waiting for other CPUs.
4) mce_softirq() clears both flags after exiting all MC barriers (a
   rough sketch follows below).
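
A rough sketch of the above idea (names and exact placement are
tentative, not the final patch):

    DEFINE_PER_CPU(bool, mce_in_process);
    DEFINE_PER_CPU(bool, lmce_in_process);

    /* 1)/2): in mcheck_cmn_handler(), after mcheck_mca_logout() fills bs */
    if ( bs.lmce )
        this_cpu(lmce_in_process) = true;
    else
        this_cpu(mce_in_process) = true;

    /* 3): at the beginning of mce_softirq() */
    if ( this_cpu(mce_in_process) )
        /* A non-LMCE MC is pending as well, so sync with other CPUs. */
        this_cpu(lmce_in_process) = false;

    /* 4): in mce_softirq(), after exiting all MC barriers */
    this_cpu(mce_in_process) = false;
    this_cpu(lmce_in_process) = false;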

Haozhong
Jan Beulich Feb. 23, 2017, 7:42 a.m. UTC | #3
>>> On 23.02.17 at 04:06, <haozhong.zhang@intel.com> wrote:
> On 02/22/17 06:53 -0700, Jan Beulich wrote:
>> >>> On 17.02.17 at 07:39, <haozhong.zhang@intel.com> wrote:
>> > @@ -1709,6 +1724,7 @@ static void mce_softirq(void)
>> >  {
>> >      int cpu = smp_processor_id();
>> >      unsigned int workcpu;
>> > +    bool lmce = per_cpu(lmce_in_process, cpu);
>> 
>> Is this flag valid to be looked at anymore at this point in time? MCIP
>> was cleared a lot earlier, so there may well have been a 2nd #MC
>> in between. In any event you again don't need the local variable
>> here afaict.
> 
> A non-LMCE MC# coming in between does not cause a problem. As
> lmce_in_process is cleared on all CPUs, mce_softirq() on all CPUs
> will sync with each other as before and finally one of them will
> handle the pending LMCE.
> 
> I think the problem is that one flag is not enough, rather than not
> needed. A single lmce_in_process flag misses the following case:
> 1) mcheck_cmn_handler() first handles a non-LMCE MC on CPU#n and raises
>    MACHINE_CHECK_SOFTIRQ.
> 2) Before mce_softirq() gets a chance to run on CPU#n, another LMCE
>    comes to CPU#n. Then mcheck_cmn_handler() sets lmce_in_process on
>    CPU#n and raises MACHINE_CHECK_SOFTIRQ again.
> 3) mce_softirq() finally gets a chance to run on CPU#n. It sees
>    lmce_in_process is set and consequently handles the pending MCEs on
>    CPU#n w/o waiting for other CPUs. However, one of the pending MCEs
>    is not an LMCE.
> 
> So I'm considering introducing another per-CPU flag "mce_in_process" to
>    indicate whether a non-LMCE MC is pending for the softirq handler:
> 1) When a non-LMCE MC# comes to CPU#n, mcheck_cmn_handler() sets the
>    mce_in_process flag on CPU#n.
> 2) When an LMCE comes to CPU#n, mcheck_cmn_handler() sets the
>    lmce_in_process flag on CPU#n.
> 3) When mce_softirq() starts, it clears the lmce_in_process flag if
>    mce_in_process is set, so it will not handle a non-LMCE MC w/o
>    waiting for other CPUs.
> 4) mce_softirq() clears both flags after exiting all MC barriers.

I'm afraid I still don't see how a static flag can deal with an
arbitrary number of #MC-s arriving prior to the softirq handler
getting a chance to run after the very first one came in.

Jan
Haozhong Zhang Feb. 23, 2017, 8:38 a.m. UTC | #4
On 02/23/17 00:42 -0700, Jan Beulich wrote:
> >>> On 23.02.17 at 04:06, <haozhong.zhang@intel.com> wrote:
> > On 02/22/17 06:53 -0700, Jan Beulich wrote:
> >> >>> On 17.02.17 at 07:39, <haozhong.zhang@intel.com> wrote:
> >> > @@ -1709,6 +1724,7 @@ static void mce_softirq(void)
> >> >  {
> >> >      int cpu = smp_processor_id();
> >> >      unsigned int workcpu;
> >> > +    bool lmce = per_cpu(lmce_in_process, cpu);
> >> 
> >> Is this flag valid to be looked at anymore at this point in time? MCIP
> >> was cleared a lot earlier, so there may well have been a 2nd #MC
> >> in between. In any event you again don't need the local variable
> >> here afaict.
> > 
> > A non-LMCE MC# coming in between does not cause a problem. As
> > lmce_in_process is cleared on all CPUs, mce_softirq() on all CPUs
> > will sync with each other as before and finally one of them will
> > handle the pending LMCE.
> > 
> > I think the problem is that one flag is not enough, rather than not
> > needed. A single lmce_in_process flag misses the following case:
> > 1) mcheck_cmn_handler() first handles a non-LMCE MC on CPU#n and raises
> >    MACHINE_CHECK_SOFTIRQ.
> > 2) Before mce_softirq() gets a chance to run on CPU#n, another LMCE
> >    comes to CPU#n. Then mcheck_cmn_handler() sets lmce_in_process on
> >    CPU#n and raises MACHINE_CHECK_SOFTIRQ again.
> > 3) mce_softirq() finally gets a chance to run on CPU#n. It sees
> >    lmce_in_process is set and consequently handles the pending MCEs on
> >    CPU#n w/o waiting for other CPUs. However, one of the pending MCEs
> >    is not an LMCE.
> > 
> > So I'm considering introducing another per-CPU flag "mce_in_process" to
> >    indicate whether a non-LMCE MC is pending for the softirq handler:
> > 1) When a non-LMCE MC# comes to CPU#n, mcheck_cmn_handler() sets the
> >    mce_in_process flag on CPU#n.
> > 2) When an LMCE comes to CPU#n, mcheck_cmn_handler() sets the
> >    lmce_in_process flag on CPU#n.
> > 3) When mce_softirq() starts, it clears the lmce_in_process flag if
> >    mce_in_process is set, so it will not handle a non-LMCE MC w/o
> >    waiting for other CPUs.
> > 4) mce_softirq() clears both flags after exiting all MC barriers.
> 
> I'm afraid I still don't see how a static flag can deal with an
> arbitrary number of #MC-s arriving prior to the softirq handler
> getting a chance to run after the very first one came in.
>

mce_barrier_enter/exit() need to be adapted to check both flags to
decide whether the current CPU needs to wait for the others:
    void mce_barrier_enter/exit(struct mce_softirq_barrier *bar)
    {
        int gen;

    -   if (!mce_broadcast)
    +   if ( !mce_broadcast ||
    +        (!this_cpu(mce_in_process) && this_cpu(lmce_in_process)) )
            return;


When the MCE softirq handler finally starts to run, regardless of how
many MCs have happened before:

1) if it sees that only the lmce_in_process flag is set, all pending MCs
   in this_cpu(mctctl.pending) are LMCEs, so the handler does not need
   to wait for other CPUs, and mce_barrier_enter/exit allows it to
   proceed alone.

2) if it sees that the mce_in_process flag is set, all or some pending
   MCs in this_cpu(mctctl.pending) are non-LMCE, i.e. other CPUs have
   received them as well, so the MCE softirq handler should wait for
   the others, and mce_barrier_enter/exit does wait.
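
Correspondingly, the deferred-telemetry step in mce_softirq() would
roughly become (sketch only, the final form may differ):

    bool lmce_only = !this_cpu(mce_in_process) && this_cpu(lmce_in_process);

    if ( lmce_only )
        /* Only LMCEs are pending on this CPU: handle them locally. */
        mctelem_process_deferred(cpu, mce_delayed_action);
    else
        for_each_online_cpu(workcpu)
            mctelem_process_deferred(workcpu, mce_delayed_action);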


Haozhong

Patch

diff --git a/xen/arch/x86/cpu/mcheck/barrier.c b/xen/arch/x86/cpu/mcheck/barrier.c
index 5dce1fb..869fd20 100644
--- a/xen/arch/x86/cpu/mcheck/barrier.c
+++ b/xen/arch/x86/cpu/mcheck/barrier.c
@@ -20,7 +20,7 @@  void mce_barrier_enter(struct mce_softirq_barrier *bar)
 {
     int gen;
 
-    if (!mce_broadcast)
+    if ( !mce_broadcast || __get_cpu_var(lmce_in_process) )
         return;
     atomic_inc(&bar->ingen);
     gen = atomic_read(&bar->outgen);
@@ -38,7 +38,7 @@  void mce_barrier_exit(struct mce_softirq_barrier *bar)
 {
     int gen;
 
-    if ( !mce_broadcast )
+    if ( !mce_broadcast || __get_cpu_var(lmce_in_process) )
         return;
     atomic_inc(&bar->outgen);
     gen = atomic_read(&bar->ingen);
diff --git a/xen/arch/x86/cpu/mcheck/mcaction.c b/xen/arch/x86/cpu/mcheck/mcaction.c
index 8b2b834..90c68ff 100644
--- a/xen/arch/x86/cpu/mcheck/mcaction.c
+++ b/xen/arch/x86/cpu/mcheck/mcaction.c
@@ -95,7 +95,9 @@  mc_memerr_dhandler(struct mca_binfo *binfo,
 
                 bank->mc_addr = gfn << PAGE_SHIFT |
                   (bank->mc_addr & (PAGE_SIZE -1 ));
-                if ( fill_vmsr_data(bank, d, global->mc_gstatus,
+                /* TODO: support injecting LMCE */
+                if ( fill_vmsr_data(bank, d,
+                                    global->mc_gstatus & ~MCG_STATUS_LMCE,
                                     vmce_vcpuid == VMCE_INJECT_BROADCAST) == -1 )
                 {
                     mce_printk(MCE_QUIET, "Fill vMCE# data for DOM%d "
diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index 95a9da3..2d69222 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -42,6 +42,17 @@  DEFINE_PER_CPU_READ_MOSTLY(struct mca_banks *, poll_bankmask);
 DEFINE_PER_CPU_READ_MOSTLY(struct mca_banks *, no_cmci_banks);
 DEFINE_PER_CPU_READ_MOSTLY(struct mca_banks *, mce_clear_banks);
 
+/*
+ * Flag to indicate whether the current MCE on this CPU is a LMCE.
+ *
+ * The MCE handler should set/clear this flag before entering any MCE
+ * barriers and raising MCE softirq. MCE barriers rely on this flag to
+ * decide whether they need to wait for other CPUs. MCE softirq handler
+ * relies on this flag to decide whether it needs to handle pending
+ * MCEs on other CPUs.
+ */
+DEFINE_PER_CPU(bool, lmce_in_process);
+
 static void intpose_init(void);
 static void mcinfo_clear(struct mc_info *);
 struct mca_banks *mca_allbanks;
@@ -399,6 +410,7 @@  mcheck_mca_logout(enum mca_source who, struct mca_banks *bankmask,
         sp->errcnt = errcnt;
         sp->ripv = (gstatus & MCG_STATUS_RIPV) != 0;
         sp->eipv = (gstatus & MCG_STATUS_EIPV) != 0;
+        sp->lmce = (gstatus & MCG_STATUS_LMCE) != 0;
         sp->uc = uc;
         sp->pcc = pcc;
         sp->recoverable = recover;
@@ -462,6 +474,7 @@  void mcheck_cmn_handler(const struct cpu_user_regs *regs)
     uint64_t gstatus;
     mctelem_cookie_t mctc = NULL;
     struct mca_summary bs;
+    bool *lmce_in_process = &__get_cpu_var(lmce_in_process);
 
     mce_spin_lock(&mce_logout_lock);
 
@@ -505,6 +518,8 @@  void mcheck_cmn_handler(const struct cpu_user_regs *regs)
     }
     mce_spin_unlock(&mce_logout_lock);
 
+    *lmce_in_process = bs.lmce;
+
     mce_barrier_enter(&mce_trap_bar);
     if ( mctc != NULL && mce_urgent_action(regs, mctc))
         cpumask_set_cpu(smp_processor_id(), &mce_fatal_cpus);
@@ -1709,6 +1724,7 @@  static void mce_softirq(void)
 {
     int cpu = smp_processor_id();
     unsigned int workcpu;
+    bool lmce = per_cpu(lmce_in_process, cpu);
 
     mce_printk(MCE_VERBOSE, "CPU%d enter softirq\n", cpu);
 
@@ -1738,9 +1754,12 @@  static void mce_softirq(void)
         /* Step1: Fill DOM0 LOG buffer, vMCE injection buffer and
          * vMCE MSRs virtualization buffer
          */
-        for_each_online_cpu(workcpu) {
-            mctelem_process_deferred(workcpu, mce_delayed_action);
-        }
+        if ( lmce )
+            mctelem_process_deferred(cpu, mce_delayed_action);
+        else
+            for_each_online_cpu(workcpu) {
+                mctelem_process_deferred(workcpu, mce_delayed_action);
+            }
 
         /* Step2: Send Log to DOM0 through vIRQ */
         if (dom0_vmce_enabled()) {
diff --git a/xen/arch/x86/cpu/mcheck/mce.h b/xen/arch/x86/cpu/mcheck/mce.h
index 2f4e7a4..2c033af 100644
--- a/xen/arch/x86/cpu/mcheck/mce.h
+++ b/xen/arch/x86/cpu/mcheck/mce.h
@@ -110,12 +110,15 @@  struct mca_summary {
     bool_t      uc;     /* UC flag */
     bool_t      pcc;    /* PCC flag */
     bool_t      recoverable; /* software error recoverable flag */
+    bool_t      lmce;   /* LMCE flag (Intel specific) */
 };
 
 DECLARE_PER_CPU(struct mca_banks *, poll_bankmask);
 DECLARE_PER_CPU(struct mca_banks *, no_cmci_banks);
 DECLARE_PER_CPU(struct mca_banks *, mce_clear_banks);
 
+DECLARE_PER_CPU(bool, lmce_in_process);
+
 extern bool_t cmci_support;
 extern bool_t is_mc_panic;
 extern bool_t mce_broadcast;
diff --git a/xen/arch/x86/cpu/mcheck/x86_mca.h b/xen/arch/x86/cpu/mcheck/x86_mca.h
index e25d619..322b7d4 100644
--- a/xen/arch/x86/cpu/mcheck/x86_mca.h
+++ b/xen/arch/x86/cpu/mcheck/x86_mca.h
@@ -42,7 +42,9 @@ 
 #define MCG_STATUS_RIPV         0x0000000000000001ULL
 #define MCG_STATUS_EIPV         0x0000000000000002ULL
 #define MCG_STATUS_MCIP         0x0000000000000004ULL
-/* Bits 3-63 are reserved */
+#define MCG_STATUS_LMCE         0x0000000000000008ULL  /* Intel specific */
+/* Bits 3-63 are reserved on CPU not supporting LMCE */
+/* Bits 4-63 are reserved on CPU supporting LMCE */
 
 /* Bitfield of MSR_K8_MCi_STATUS registers */
 /* MCA error code */