diff mbox

[v10,4/4] x86: Create a new synthetic cpu capability for machine check recovery

Message ID 97426a50c5667bb81a28340b820b371d7fadb6fa.1454618190.git.tony.luck@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Luck, Tony Jan. 30, 2016, midnight UTC
The Intel Software Developer Manual describes bit 24 in the MCG_CAP
MSR:
   MCG_SER_P (software error recovery support present) flag,
   bit 24 — Indicates (when set) that the processor supports
   software error recovery
But only some models with this capability bit set will actually
generate recoverable machine checks.

Check the model name and set a synthetic capability bit. Provide
a command line option to set this bit anyway in case the kernel
doesn't recognise the model name.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 Documentation/x86/x86_64/boot-options.txt |  4 ++++
 arch/x86/include/asm/cpufeature.h         |  1 +
 arch/x86/include/asm/mce.h                |  1 +
 arch/x86/kernel/cpu/mcheck/mce.c          | 11 +++++++++++
 4 files changed, 17 insertions(+)

Comments

Borislav Petkov Feb. 7, 2016, 5:10 p.m. UTC | #1
On Fri, Jan 29, 2016 at 04:00:19PM -0800, Tony Luck wrote:
> The Intel Software Developer Manual describes bit 24 in the MCG_CAP
> MSR:
>    MCG_SER_P (software error recovery support present) flag,
>    bit 24 — Indicates (when set) that the processor supports
>    software error recovery
> But only some models with this capability bit set will actually
> generate recoverable machine checks.
> 
> Check the model name and set a synthetic capability bit. Provide
> a command line option to set this bit anyway in case the kernel
> doesn't recognise the model name.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  Documentation/x86/x86_64/boot-options.txt |  4 ++++
>  arch/x86/include/asm/cpufeature.h         |  1 +
>  arch/x86/include/asm/mce.h                |  1 +
>  arch/x86/kernel/cpu/mcheck/mce.c          | 11 +++++++++++
>  4 files changed, 17 insertions(+)
> 
> diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
> index 68ed3114c363..8423c04ae7b3 100644
> --- a/Documentation/x86/x86_64/boot-options.txt
> +++ b/Documentation/x86/x86_64/boot-options.txt
> @@ -60,6 +60,10 @@ Machine check
>  		threshold to 1. Enabling this may make memory predictive failure
>  		analysis less effective if the bios sets thresholds for memory
>  		errors since we will not see details for all errors.
> +   mce=recovery
> +		Tell the kernel that this system can generate recoverable
> +		machine checks (useful when the kernel doesn't recognize
> +		the cpuid x86_model_id[])

I'd say		"Force-enable generation of recoverable MCEs."

and not mention implementation details in the description text.

>     nomce (for compatibility with i386): same as mce=off
>  
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index 7ad8c9464297..06c6c2d2fea0 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -106,6 +106,7 @@
>  #define X86_FEATURE_APERFMPERF	( 3*32+28) /* APERFMPERF */
>  #define X86_FEATURE_EAGER_FPU	( 3*32+29) /* "eagerfpu" Non lazy FPU restore */
>  #define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */
> +#define X86_FEATURE_MCE_RECOVERY ( 3*32+31) /* cpu has recoverable machine checks */
>  
>  /* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
>  #define X86_FEATURE_XMM3	( 4*32+ 0) /* "pni" SSE-3 */
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 2ea4527e462f..18d2ba9c8e44 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -113,6 +113,7 @@ struct mca_config {
>  	bool ignore_ce;
>  	bool disabled;
>  	bool ser;
> +	bool recovery;
>  	bool bios_cmci_threshold;
>  	u8 banks;
>  	s8 bootlog;
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 905f3070f412..16a3d0e29f84 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1696,6 +1696,15 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
>  		return;
>  	}
>  
> +	/*
> +	 * MCG_CAP.MCG_SER_P is necessary but not sufficient to know
> +	 * whether this processor will actually generate recoverable
> +	 * machine checks. Check to see if this is an E7 model Xeon.
> +	 */
> +	if (mca_cfg.recovery || (mca_cfg.ser &&
> +		!strncmp(c->x86_model_id, "Intel(R) Xeon(R) CPU E7-", 24)))

Eeww, a model string check :-(

Lemme guess: those E7s can't be represented by a range of
model/steppings, can they?

Similar to AMD_MODEL_RANGE() thing in cpu/amd.c, for example.

In any case, that chunk belongs in the Intel part of
__mcheck_cpu_apply_quirks().
Luck, Tony Feb. 9, 2016, 11:38 p.m. UTC | #2
> > +	if (mca_cfg.recovery || (mca_cfg.ser &&
> > +		!strncmp(c->x86_model_id, "Intel(R) Xeon(R) CPU E7-", 24)))
> 
> Eeww, a model string check :-(
> 
> Lemme guess: those E7s can't be represented by a range of
> model/steppings, can they?

We use the same model number for E5 and E7 series. E.g. 63 for Haswell.
The model_id string seems to be the only way to tell ahead of time
whether you will get a recoverable machine check or die when you
touch uncorrected memory.

-Tony
Borislav Petkov Feb. 10, 2016, 11:06 a.m. UTC | #3
On Tue, Feb 09, 2016 at 03:38:57PM -0800, Luck, Tony wrote:
> We use the same model number for E5 and E7 series. E.g. 63 for Haswell.
> The model_id string seems to be the only way to tell ahead of time
> whether you will get a recoverable machine check or die when you
> touch uncorrected memory.

What about MSR_IA32_PLATFORM_ID or some other MSR or register, for
example?

I.e., isn't there some other, more reliable distinction between E5 and
E7 besides the model ID?
Luck, Tony Feb. 10, 2016, 7:27 p.m. UTC | #4
On Wed, Feb 10, 2016 at 12:06:03PM +0100, Borislav Petkov wrote:
> What about MSR_IA32_PLATFORM_ID or some other MSR or register, for
> example?

Bits 52:50 give us "information concerning the intended platform
for the processor" ... but we don't seem to decode that vague
statement into anything that I can make use of.

> I.e., isn't there some other, more reliable distinction between E5 and
> E7 besides the model ID?

Digging in the data sheet I found the CAPID0 register which does
indicate in bit 4 whether this is an "EX" (a.k.a. "E7" part). But
we invent a new PCI device ID for this every generation (0x0EC3 in
Ivy Bridge, 0x2fc0 in Haswell, 0x6fc0 in Broadwell). The offset
has stayed at 0x84 through all this.

I don't think that hunting the ever-changing PCI-id is a
good choice ... the "E5/E7" naming convention has stuck for
four generations[1] (Sandy Bridge, Ivy Bridge, Haswell, Broadwell).

-Tony

[1] Although this probably means that marketing are about to
think of something new ... they generally do when people start
understanding the model names :-(

-Tony
Borislav Petkov Feb. 11, 2016, 11:55 a.m. UTC | #5
On Wed, Feb 10, 2016 at 11:27:50AM -0800, Luck, Tony wrote:
> Digging in the data sheet I found the CAPID0 register which does
> indicate in bit 4 whether this is an "EX" (a.k.a. "E7" part). But
> we invent a new PCI device ID for this every generation (0x0EC3 in
> Ivy Bridge, 0x2fc0 in Haswell, 0x6fc0 in Broadwell). The offset
> has stayed at 0x84 through all this.
> 
> I don't think that hunting the ever-changing PCI-id is a
> good choice ...

Right :-\

> the "E5/E7" naming convention has stuck for
> four generations[1] (Sandy Bridge, Ivy Bridge, Haswell, Broadwell).
> 
> -Tony
> 
> [1] Although this probably means that marketing are about to
> think of something new ... they generally do when people start
> understanding the model names :-(

Yeah, customers shouldn't slack and relax into even thinking they know
the model names. Fortunately there's wikipedia...

Thanks.
diff mbox

Patch

diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index 68ed3114c363..8423c04ae7b3 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -60,6 +60,10 @@  Machine check
 		threshold to 1. Enabling this may make memory predictive failure
 		analysis less effective if the bios sets thresholds for memory
 		errors since we will not see details for all errors.
+   mce=recovery
+		Tell the kernel that this system can generate recoverable
+		machine checks (useful when the kernel doesn't recognize
+		the cpuid x86_model_id[])
 
    nomce (for compatibility with i386): same as mce=off
 
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 7ad8c9464297..06c6c2d2fea0 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -106,6 +106,7 @@ 
 #define X86_FEATURE_APERFMPERF	( 3*32+28) /* APERFMPERF */
 #define X86_FEATURE_EAGER_FPU	( 3*32+29) /* "eagerfpu" Non lazy FPU restore */
 #define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */
+#define X86_FEATURE_MCE_RECOVERY ( 3*32+31) /* cpu has recoverable machine checks */
 
 /* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
 #define X86_FEATURE_XMM3	( 4*32+ 0) /* "pni" SSE-3 */
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 2ea4527e462f..18d2ba9c8e44 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -113,6 +113,7 @@  struct mca_config {
 	bool ignore_ce;
 	bool disabled;
 	bool ser;
+	bool recovery;
 	bool bios_cmci_threshold;
 	u8 banks;
 	s8 bootlog;
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 905f3070f412..16a3d0e29f84 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1696,6 +1696,15 @@  void mcheck_cpu_init(struct cpuinfo_x86 *c)
 		return;
 	}
 
+	/*
+	 * MCG_CAP.MCG_SER_P is necessary but not sufficient to know
+	 * whether this processor will actually generate recoverable
+	 * machine checks. Check to see if this is an E7 model Xeon.
+	 */
+	if (mca_cfg.recovery || (mca_cfg.ser &&
+		!strncmp(c->x86_model_id, "Intel(R) Xeon(R) CPU E7-", 24)))
+		set_cpu_cap(c, X86_FEATURE_MCE_RECOVERY);
+
 	if (mce_gen_pool_init()) {
 		mca_cfg.disabled = true;
 		pr_emerg("Couldn't allocate MCE records pool!\n");
@@ -2030,6 +2039,8 @@  static int __init mcheck_enable(char *str)
 		cfg->bootlog = (str[0] == 'b');
 	else if (!strcmp(str, "bios_cmci_threshold"))
 		cfg->bios_cmci_threshold = true;
+	else if (!strcmp(str, "recovery"))
+		cfg->recovery = true;
 	else if (isdigit(str[0])) {
 		if (get_option(&str, &cfg->tolerant) == 2)
 			get_option(&str, &(cfg->monarch_timeout));