From patchwork Sat Jan 30 00:00:19 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Tony Luck X-Patchwork-Id: 8196451 Return-Path: X-Original-To: patchwork-linux-nvdimm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 04BA49F38B for ; Tue, 2 Feb 2016 23:27:25 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 1859D202F0 for ; Tue, 2 Feb 2016 23:27:24 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 00D5B202E9 for ; Tue, 2 Feb 2016 23:27:23 +0000 (UTC) Received: from ml01.vlan14.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id E3EDE1A1DDC; Tue, 2 Feb 2016 15:27:22 -0800 (PST) X-Original-To: linux-nvdimm@ml01.01.org Delivered-To: linux-nvdimm@ml01.01.org Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by ml01.01.org (Postfix) with ESMTP id 326341A1DDB for ; Tue, 2 Feb 2016 15:27:21 -0800 (PST) Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga102.jf.intel.com with ESMTP; 02 Feb 2016 15:27:22 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.22,387,1449561600"; d="scan'208";a="895084259" Received: from agluck-desk.sc.intel.com ([10.3.79.148]) by fmsmga001.fm.intel.com with SMTP; 02 Feb 2016 15:27:20 -0800 Message-Id: In-Reply-To: References: From: Tony Luck Date: Fri, 29 Jan 2016 16:00:19 -0800 Subject: [PATCH v9 4/4] x86: Create a new synthetic cpu capability for machine check recovery MIME-Version: 1.0 To: Ingo Molnar Cc: Brian Gerst , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Borislav Petkov , Andy Lutomirski , linux-nvdimm@ml01.01.org, Andrew Morton X-BeenThere: linux-nvdimm@lists.01.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Linux-nvdimm developer list." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: linux-nvdimm-bounces@lists.01.org Sender: "Linux-nvdimm" X-Spam-Status: No, score=-2.3 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_NONE, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The Intel Software Developer Manual describes bit 24 in the MCG_CAP MSR: MCG_SER_P (software error recovery support present) flag, bit 24 — Indicates (when set) that the processor supports software error recovery But only some models with this capability bit set will actually generate recoverable machine checks. Check the model name and set a synthetic capability bit. Provide a command line option to set this bit anyway in case the kernel doesn't recognise the model name. Signed-off-by: Tony Luck --- Documentation/x86/x86_64/boot-options.txt | 4 ++++ arch/x86/include/asm/cpufeature.h | 1 + arch/x86/include/asm/mce.h | 1 + arch/x86/kernel/cpu/mcheck/mce.c | 11 +++++++++++ 4 files changed, 17 insertions(+) diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt index 68ed3114c363..8423c04ae7b3 100644 --- a/Documentation/x86/x86_64/boot-options.txt +++ b/Documentation/x86/x86_64/boot-options.txt @@ -60,6 +60,10 @@ Machine check threshold to 1. Enabling this may make memory predictive failure analysis less effective if the bios sets thresholds for memory errors since we will not see details for all errors. + mce=recovery + Tell the kernel that this system can generate recoverable + machine checks (useful when the kernel doesn't recognize + the cpuid x86_model_id[]) nomce (for compatibility with i386): same as mce=off diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index 7ad8c9464297..06c6c2d2fea0 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -106,6 +106,7 @@ #define X86_FEATURE_APERFMPERF ( 3*32+28) /* APERFMPERF */ #define X86_FEATURE_EAGER_FPU ( 3*32+29) /* "eagerfpu" Non lazy FPU restore */ #define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */ +#define X86_FEATURE_MCE_RECOVERY ( 3*32+31) /* cpu has recoverable machine checks */ /* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */ #define X86_FEATURE_XMM3 ( 4*32+ 0) /* "pni" SSE-3 */ diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h index 2ea4527e462f..18d2ba9c8e44 100644 --- a/arch/x86/include/asm/mce.h +++ b/arch/x86/include/asm/mce.h @@ -113,6 +113,7 @@ struct mca_config { bool ignore_ce; bool disabled; bool ser; + bool recovery; bool bios_cmci_threshold; u8 banks; s8 bootlog; diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 028409a376e3..da2866e1c52a 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -1697,6 +1697,15 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c) return; } + /* + * MCG_CAP.MCG_SER_P is necessary but not sufficient to know + * whether this processor will actually generate recoverable + * machine checks. Check to see if this is an E7 model Xeon. + */ + if (mca_cfg.recovery || (mca_cfg.ser && + !strncmp(c->x86_model_id, "Intel(R) Xeon(R) CPU E7-", 24))) + set_cpu_cap(c, X86_FEATURE_MCE_RECOVERY); + if (mce_gen_pool_init()) { mca_cfg.disabled = true; pr_emerg("Couldn't allocate MCE records pool!\n"); @@ -2031,6 +2040,8 @@ static int __init mcheck_enable(char *str) cfg->bootlog = (str[0] == 'b'); else if (!strcmp(str, "bios_cmci_threshold")) cfg->bios_cmci_threshold = true; + else if (!strcmp(str, "recovery")) + cfg->recovery = true; else if (isdigit(str[0])) { if (get_option(&str, &cfg->tolerant) == 2) get_option(&str, &(cfg->monarch_timeout));