From patchwork Wed Nov 18 15:15:49 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Paoloni, Gabriele"
X-Patchwork-Id: 11915313
From: Gabriele Paoloni
To: tony.luck@intel.com, bp@alien8.de, tglx@linutronix.de, mingo@redhat.com,
 x86@kernel.org, hpa@zytor.com, linux-edac@vger.kernel.org,
 linux-kernel@vger.kernel.org
Cc: gabriele.paoloni@intel.com, linux-safety@lists.elisa.tech
Subject: [PATCH 1/4] x86/mce: do not overwrite no_way_out if mce_end() fails
Date: Wed, 18 Nov 2020 15:15:49 +0000
Message-Id: <20201118151552.1412-2-gabriele.paoloni@intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20201118151552.1412-1-gabriele.paoloni@intel.com>
References: <20201118151552.1412-1-gabriele.paoloni@intel.com>
Precedence: bulk
X-Mailing-List: linux-edac@vger.kernel.org

Currently, if mce_end() fails, no_way_out is overwritten with
worst >= MCE_PANIC_SEVERITY. worst is the worst severity that was found
in the MCA banks associated with the current CPU; however, at this point
no_way_out could already have been set by mce_start(), which looks at
the severities of all CPUs that entered the MCE handler.

If mce_end() fails, first check whether no_way_out is already set and,
if so, stick to it; otherwise use the local worst value.

Signed-off-by: Gabriele Paoloni
Reviewed-by: Tony Luck
---
 arch/x86/kernel/cpu/mce/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 4102b866e7c0..b990892c6766 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1385,7 +1385,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 */
 	if (!lmce) {
 		if (mce_end(order) < 0)
-			no_way_out = worst >= MCE_PANIC_SEVERITY;
+			no_way_out = no_way_out ? no_way_out : worst >= MCE_PANIC_SEVERITY;
 	} else {
 		/*
 		 * If there was a fatal machine check we should have

From patchwork Wed Nov 18 15:15:50 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Paoloni, Gabriele"
X-Patchwork-Id: 11915321
From: Gabriele Paoloni
To: tony.luck@intel.com, bp@alien8.de, tglx@linutronix.de, mingo@redhat.com,
 x86@kernel.org, hpa@zytor.com, linux-edac@vger.kernel.org,
 linux-kernel@vger.kernel.org
Cc: gabriele.paoloni@intel.com, linux-safety@lists.elisa.tech
Subject: [PATCH 2/4] x86/mce: move the mce_panic() call and kill_it
 assignments at the right places
Date: Wed, 18 Nov 2020 15:15:50 +0000
Message-Id: <20201118151552.1412-3-gabriele.paoloni@intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20201118151552.1412-1-gabriele.paoloni@intel.com>
References: <20201118151552.1412-1-gabriele.paoloni@intel.com>
Precedence: bulk
X-Mailing-List: linux-edac@vger.kernel.org

Right now for local MCEs we panic(), if needed, right after lmce is
set. For global MCEs mce_reign() takes care of calling mce_panic().
Hence this patch:
- improves readability by moving the conditional evaluation of
  tolerant up to where kill_it is first set
- moves the mce_panic() call up into the branch where mce_end() fails

Signed-off-by: Gabriele Paoloni
Reviewed-by: Tony Luck
---
 arch/x86/kernel/cpu/mce/core.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index b990892c6766..e025ff04438f 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1350,8 +1350,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 * severity is MCE_AR_SEVERITY we have other options.
 	 */
 	if (!(m.mcgstatus & MCG_STATUS_RIPV))
-		kill_it = 1;
-
+		kill_it = (cfg->tolerant == 3) ? 0 : 1;
 	/*
 	 * Check if this MCE is signaled to only this logical processor,
 	 * on Intel, Zhaoxin only.
@@ -1384,8 +1383,15 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 * When there's any problem use only local no_way_out state.
 	 */
 	if (!lmce) {
-		if (mce_end(order) < 0)
+		if (mce_end(order) < 0) {
 			no_way_out = no_way_out ? no_way_out : worst >= MCE_PANIC_SEVERITY;
+			/*
+			 * mce_reign() has probably failed hence evaluate if we need
+			 * to panic
+			 */
+			if (no_way_out && mca_cfg.tolerant < 3)
+				mce_panic("Fatal machine check on current CPU", &m, msg);
+		}
 	} else {
 		/*
 		 * If there was a fatal machine check we should have
@@ -1401,15 +1407,6 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		}
 	}
 
-	/*
-	 * If tolerant is at an insane level we drop requests to kill
-	 * processes and continue even when there is no way out.
-	 */
-	if (cfg->tolerant == 3)
-		kill_it = 0;
-	else if (no_way_out)
-		mce_panic("Fatal machine check on current CPU", &m, msg);
-
 	if (worst > 0)
 		irq_work_queue(&mce_irq_work);
 

From patchwork Wed Nov 18 15:15:51 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Paoloni, Gabriele"
X-Patchwork-Id: 11915317
From: Gabriele Paoloni
To: tony.luck@intel.com, bp@alien8.de, tglx@linutronix.de, mingo@redhat.com,
 x86@kernel.org, hpa@zytor.com, linux-edac@vger.kernel.org,
 linux-kernel@vger.kernel.org
Cc: gabriele.paoloni@intel.com, linux-safety@lists.elisa.tech
Subject: [PATCH 3/4] x86/mce: for LMCE panic only if mca_cfg.tolerant < 3
Date: Wed, 18 Nov 2020 15:15:51 +0000
Message-Id: <20201118151552.1412-4-gabriele.paoloni@intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20201118151552.1412-1-gabriele.paoloni@intel.com>
References: <20201118151552.1412-1-gabriele.paoloni@intel.com>
Precedence: bulk
X-Mailing-List: linux-edac@vger.kernel.org

Right now for LMCE, if no_way_out is set, mce_panic() is called
regardless of mca_cfg.tolerant. This is not correct: if
mca_cfg.tolerant = 3 we should never panic.

Signed-off-by: Gabriele Paoloni
Reviewed-by: Tony Luck
---
 arch/x86/kernel/cpu/mce/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index e025ff04438f..d16cbb05b09c 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1367,7 +1367,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 * to see it will clear it.
 	 */
 	if (lmce) {
-		if (no_way_out)
+		if (no_way_out && mca_cfg.tolerant < 3)
 			mce_panic("Fatal local machine check", &m, msg);
 	} else {
 		order = mce_start(&no_way_out);

From patchwork Wed Nov 18 15:15:52 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Paoloni, Gabriele"
X-Patchwork-Id: 11915315
From: Gabriele Paoloni
To: tony.luck@intel.com, bp@alien8.de, tglx@linutronix.de, mingo@redhat.com,
 x86@kernel.org, hpa@zytor.com, linux-edac@vger.kernel.org,
 linux-kernel@vger.kernel.org
Cc: gabriele.paoloni@intel.com, linux-safety@lists.elisa.tech
Subject: [PATCH 4/4] x86/mce: remove redundant call to irq_work_queue()
Date: Wed, 18 Nov 2020 15:15:52 +0000
Message-Id: <20201118151552.1412-5-gabriele.paoloni@intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20201118151552.1412-1-gabriele.paoloni@intel.com>
References: <20201118151552.1412-1-gabriele.paoloni@intel.com>
Precedence: bulk
X-Mailing-List: linux-edac@vger.kernel.org

Right now in do_machine_check() we have:

  __mc_scan_banks()->mce_log()->irq_work_queue(&mce_irq_work)

hence the call to irq_work_queue() that follows __mc_scan_banks()
seems redundant. Just remove it.

Signed-off-by: Gabriele Paoloni
Reviewed-by: Tony Luck
---
 arch/x86/kernel/cpu/mce/core.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index d16cbb05b09c..f2f7bfc60c67 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1407,9 +1407,6 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		}
 	}
 
-	if (worst > 0)
-		irq_work_queue(&mce_irq_work);
-
 	if (worst != MCE_AR_SEVERITY && !kill_it)
 		goto out;
 
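
Taken together, the hunks in this series change three decisions inside do_machine_check(): how kill_it is derived from RIPV and tolerant, how no_way_out is updated when mce_end() fails, and when mce_panic() is reached on the local (LMCE) and global paths. The following is a minimal user-space sketch of only those branches, written to make the resulting truth table easy to check. It is not kernel code: every name prefixed with sketch_ is hypothetical, the severity values are illustrative stand-ins where only the ordering matters, and a panic is recorded as a flag instead of being executed. The panic that mce_reign() may raise on the global path and the rest of the LMCE handling are outside the diffs above and are not modeled.

```c
#include <stdbool.h>

/*
 * Illustrative severity levels. These are stand-ins, not the kernel's
 * enum severity_level; only the ordering is relevant to the sketch.
 */
enum sketch_severity {
	SKETCH_NO_SEVERITY = 0,
	SKETCH_AR_SEVERITY = 1,
	SKETCH_PANIC_SEVERITY = 2,
};

/* Inputs that the touched branches of do_machine_check() depend on. */
struct sketch_input {
	bool lmce;            /* machine check signaled to this CPU only */
	bool ripv;            /* MCG_STATUS_RIPV is set */
	bool no_way_out;      /* value before mce_end(), e.g. from mce_start() */
	bool mce_end_failed;  /* models mce_end(order) < 0 */
	int worst;            /* worst severity found in the local banks */
	int tolerant;         /* models mca_cfg.tolerant, 0..3 */
};

/* Outcome of the modeled branches. */
struct sketch_result {
	bool no_way_out;
	bool kill_it;
	bool panics;          /* true where the kernel would call mce_panic() */
};

struct sketch_result sketch_do_machine_check(const struct sketch_input *in)
{
	struct sketch_result res = {
		.no_way_out = in->no_way_out,
		.kill_it = false,
		.panics = false,
	};

	/* Patch 2: tolerant == 3 suppresses the kill request up front. */
	if (!in->ripv)
		res.kill_it = (in->tolerant == 3) ? false : true;

	if (in->lmce) {
		/* Patch 3: a local MCE panics only if tolerant < 3. */
		if (res.no_way_out && in->tolerant < 3)
			res.panics = true;
	} else if (in->mce_end_failed) {
		/* Patch 1: keep an already set no_way_out, else use worst. */
		res.no_way_out = res.no_way_out ?
				 res.no_way_out :
				 in->worst >= SKETCH_PANIC_SEVERITY;
		/* Patch 2: the panic decision now lives in this branch. */
		if (res.no_way_out && in->tolerant < 3)
			res.panics = true;
	}

	return res;
}
```

Calling sketch_do_machine_check() with lmce = false, mce_end_failed = true, no_way_out = true and worst = SKETCH_NO_SEVERITY shows the case patch 1 is about: no_way_out stays set even though the local banks report nothing fatal, whereas the pre-series assignment would have cleared it. With tolerant = 3 the same input yields no panic and no kill request, which is the behavior patches 2 and 3 aim for.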