[v3,1/4] x86/mce: Add Zhaoxin MCE support

Message ID 9d6769dca6394638a013ccad2c8f964c@zhaoxin.com (mailing list archive)
State New, archived
Series [v3,1/4] x86/mce: Add Zhaoxin MCE support

Commit Message

Tony W Wang-oc Sept. 11, 2019, 12:01 p.m. UTC
All newer Zhaoxin CPUs support MCE that is compatible with Intel's
"Machine-Check Architecture", so add support for Zhaoxin MCE in
mce/core.c.

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
v2->v3:
 - Convert if-else to switch-case
 - Simplify Zhaoxin CPU FMS checking

 arch/x86/kernel/cpu/mce/core.c | 38 ++++++++++++++++++++++++++++----------
 1 file changed, 28 insertions(+), 10 deletions(-)

Comments

Borislav Petkov Sept. 11, 2019, 2:35 p.m. UTC | #1
On Wed, Sep 11, 2019 at 12:01:42PM +0000, Tony W Wang-oc wrote:
> All newer Zhaoxin CPUs support MCE that is compatible with Intel's
> "Machine-Check Architecture", so add support for Zhaoxin MCE in
> mce/core.c.
> 
> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
> ---
> v2->v3:
>  - Convert if-else to switch-case
>  - Simplify Zhaoxin CPU FMS checking

Btw, for future submissions - you don't have to do it now - search the
web for threaded mails, look at git-send-email's manpage and especially
the --thread and --chain-reply-to options. Also, look at lkml for
examples.

IOW, patchsets should have a 0/N message and all the others should
be sent as a reply to that message, i.e., shallow threading, as the
git-send-email manpage calls it.
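
For example, something like this produces the 0/N cover letter and sends
the remaining patches as replies to it (the version number and output
directory below are only illustrative):

  git format-patch -v4 -4 --cover-letter -o outgoing/
  # edit outgoing/v4-0000-cover-letter.patch, then:
  git send-email --thread --no-chain-reply-to outgoing/*.patch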

Thx.
Tony Luck Sept. 13, 2019, 6:10 p.m. UTC | #2
On Wed, Sep 11, 2019 at 12:01:42PM +0000, Tony W Wang-oc wrote:
> +	/* Checks after this one are Intel/Zhaoxin-specific: */
> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL &&
> +	    boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN)


Is it time to have a big cleanup on how we handle similarities
and oddities in the MCE subsystem?  We've been adding ad-hoc
tests like this in random places ... and it all looks very
messy.  Lines that mention x86_vendor|x86|x86_model below
arch/x86/kernel/cpu/mce/ currently look like this:

arch/x86/kernel/cpu/mce/amd.c:		   (c->x86_model >= 0x10 && c->x86_model <= 0x2F)) {
arch/x86/kernel/cpu/mce/amd.c:	    c->x86_model >= 0x10 && c->x86_model <= 0x2F &&
arch/x86/kernel/cpu/mce/amd.c:	} else if (c->x86 == 0x17 &&
arch/x86/kernel/cpu/mce/amd.c:	if (c->x86 == 0x15 && bank == 4) {
arch/x86/kernel/cpu/mce/amd.c:	if (c->x86 == 0x17 &&
arch/x86/kernel/cpu/mce/core.c:	    boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
arch/x86/kernel/cpu/mce/core.c:	    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON ||
arch/x86/kernel/cpu/mce/core.c:	     c->x86 > 6) {
arch/x86/kernel/cpu/mce/core.c:	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
arch/x86/kernel/cpu/mce/core.c:	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
arch/x86/kernel/cpu/mce/core.c:	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
arch/x86/kernel/cpu/mce/core.c:		if (c->x86 < 0x11 && cfg->bootlog < 0) {
arch/x86/kernel/cpu/mce/core.c:		if (c->x86 == 0x15 && c->x86_model <= 0xf)
arch/x86/kernel/cpu/mce/core.c:		if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) {
arch/x86/kernel/cpu/mce/core.c:	if (c->x86 != 5)
arch/x86/kernel/cpu/mce/core.c:		if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) &&
arch/x86/kernel/cpu/mce/core.c:		if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0)
arch/x86/kernel/cpu/mce/core.c:	if ((c->x86 == 6 && c->x86_model == 0xf && c->x86_stepping >= 0xe) ||
arch/x86/kernel/cpu/mce/core.c:		if (c->x86 == 6 && c->x86_model <= 13 && cfg->bootlog < 0)
arch/x86/kernel/cpu/mce/core.c:		if (c->x86 == 6 && c->x86_model == 45)
arch/x86/kernel/cpu/mce/core.c:		if (c->x86 == 6 && this_cpu_read(mce_num_banks) > 0)
arch/x86/kernel/cpu/mce/core.c:	if (c->x86_vendor == X86_VENDOR_AMD) {
arch/x86/kernel/cpu/mce/core.c:	if (c->x86_vendor == X86_VENDOR_AMD || c->x86_vendor == X86_VENDOR_HYGON) {
arch/x86/kernel/cpu/mce/core.c:	if (c->x86_vendor == X86_VENDOR_INTEL) {
arch/x86/kernel/cpu/mce/core.c:	if (c->x86_vendor == X86_VENDOR_UNKNOWN) {
arch/x86/kernel/cpu/mce/core.c:	m->cpuvendor = boot_cpu_data.x86_vendor;
arch/x86/kernel/cpu/mce/core.c:	switch (c->x86_vendor) {
arch/x86/kernel/cpu/mce/inject.c:	    boot_cpu_data.x86 < 0x17) {
arch/x86/kernel/cpu/mce/inject.c:	m->cpuvendor = boot_cpu_data.x86_vendor;
arch/x86/kernel/cpu/mce/intel.c:	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
arch/x86/kernel/cpu/mce/intel.c:	switch (c->x86_model) {
arch/x86/kernel/cpu/mce/severity.c:	    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
arch/x86/kernel/cpu/mce/severity.c:	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
arch/x86/kernel/cpu/mce/therm_throt.c:		if (c->x86 == 6 && (c->x86_model == 9 || c->x86_model == 13)) {

Maybe we can add X86_VENDOR_ZHAOXIN to this jumble with the excuse that
it is already so ugly that this patch series only makes things 5% worse?

Or should we make a big table of CPU vendors/families/models and use
x86_match_cpu() to pick out what we are running on and set some bits/flags
(like X86_FEATURE/X86_BUG) which we can use in the code to do the
right thing in each place?

E.g. default for Intel and Zhaoxin vendors would be to set MCE_INTEL_LIKE.
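
A very rough sketch of what that could look like (the flag name, table
and helper below are made up for illustration only, not proposals for
final names):

#include <asm/cpu_device_id.h>	/* x86_match_cpu(), struct x86_cpu_id */
#include <linux/bits.h>

#define MCE_INTEL_LIKE	BIT(0)	/* Intel-compatible MCA semantics */

static const struct x86_cpu_id mce_vendor_caps_table[] = {
	{ .vendor = X86_VENDOR_INTEL,	.family = 6,	.driver_data = MCE_INTEL_LIKE },
	{ .vendor = X86_VENDOR_ZHAOXIN,			.driver_data = MCE_INTEL_LIKE },
	{ }
};

static u64 mce_vendor_caps;

static void mce_detect_vendor_caps(void)
{
	const struct x86_cpu_id *id = x86_match_cpu(mce_vendor_caps_table);

	if (id)
		mce_vendor_caps = id->driver_data;
}

Then mce_usable_address() and friends would test
"mce_vendor_caps & MCE_INTEL_LIKE" instead of open-coding the vendor
comparisons.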

Thoughts?

-Tony
Borislav Petkov Sept. 13, 2019, 9:16 p.m. UTC | #3
On Fri, Sep 13, 2019 at 11:10:31AM -0700, Luck, Tony wrote:
> Is it time to have a big cleanup on how we handle similarities
> and oddities in the MCE subsystem?  We've been adding ad-hoc
> tests like this in random places ... and it all looks very
> messy.

Hohum, it has been bothering me for a while now too. ;-\

> Or should we make a big table of CPU vendors/families/models and use
> x86_match_cpu() to pick out what we are running on and set some bits/flags
> (like X86_FEATURE/X86_BUG) which we can use in the code to do the
> right thing in each place?

Yes, that. And I have started doing something along those lines, see
struct mce_vendor_flags.

If we did the X86_FEATURE/BUG thing, we would still end up using those
new definitions only in the MCA code, so I think having our own bits in
a bitfield would be cleaner/nicer.

Anyway, detection can all be done in __mcheck_cpu_init_early() or
somewhere similar, with all the matching flags/bits set there, and then
the rest of the code would query only them.

We can also merge mce_vendor_flags into mca_cfg as that thing is used
everywhere.

Another advantage of having our own flags is that we can define them as
we like and stick them all in internal.h so no exposure to the outside.
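
I.e., roughly something like this (the bit and function names here are
invented, and the existing mca_cfg members are elided):

/* internal.h */
struct mca_config {
	/* ... existing members ... */
	__u64	intel_like		: 1,	/* Intel-compatible MCA: Intel, Zhaoxin */
		amd_like		: 1,	/* AMD-compatible MCA: AMD, Hygon */
		__vendor_reserved	: 62;
};

/* core.c, called from __mcheck_cpu_init_early() or thereabouts */
static void __mcheck_cpu_init_vendor_flags(struct cpuinfo_x86 *c)
{
	switch (c->x86_vendor) {
	case X86_VENDOR_INTEL:
	case X86_VENDOR_ZHAOXIN:
		mca_cfg.intel_like = 1;
		break;
	case X86_VENDOR_AMD:
	case X86_VENDOR_HYGON:
		mca_cfg.amd_like = 1;
		break;
	default:
		break;
	}
}

so that, e.g., mce_usable_address() does

	if (!mca_cfg.intel_like)
		return 1;

instead of comparing boot_cpu_data.x86_vendor in place.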

And so on.

> E.g. default for Intel and Zhaoxin vendors would be to set MCE_INTEL_LIKE.
> 
> Thoughts?

Yah, I think that's a good idea and I think we should do it. Not
immediately but work towards it.

Thx.

Patch

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 743370e..7bcd8c1 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -488,8 +488,9 @@  int mce_usable_address(struct mce *m)
 	if (!(m->status & MCI_STATUS_ADDRV))
 		return 0;
 
-	/* Checks after this one are Intel-specific: */
-	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+	/* Checks after this one are Intel/Zhaoxin-specific: */
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL &&
+	    boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN)
 		return 1;
 
 	if (!(m->status & MCI_STATUS_MISCV))
@@ -507,10 +508,13 @@  EXPORT_SYMBOL_GPL(mce_usable_address);
 
 bool mce_is_memory_error(struct mce *m)
 {
-	if (m->cpuvendor == X86_VENDOR_AMD ||
-	    m->cpuvendor == X86_VENDOR_HYGON) {
+	switch (m->cpuvendor) {
+	case X86_VENDOR_AMD:
+	case X86_VENDOR_HYGON:
 		return amd_mce_is_memory_error(m);
-	} else if (m->cpuvendor == X86_VENDOR_INTEL) {
+
+	case X86_VENDOR_INTEL:
+	case X86_VENDOR_ZHAOXIN:
 		/*
 		 * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
 		 *
@@ -527,9 +531,10 @@  bool mce_is_memory_error(struct mce *m)
 		return (m->status & 0xef80) == BIT(7) ||
 		       (m->status & 0xef00) == BIT(8) ||
 		       (m->status & 0xeffc) == 0xc;
-	}
 
-	return false;
+	default:
+		return false;
+	}
 }
 EXPORT_SYMBOL_GPL(mce_is_memory_error);
 
@@ -1697,6 +1702,18 @@  static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 		if (c->x86 == 6 && c->x86_model == 45)
 			quirk_no_way_out = quirk_sandybridge_ifu;
 	}
+
+	if (c->x86_vendor == X86_VENDOR_ZHAOXIN) {
+		/*
+		 * All newer Zhaoxin CPUs support MCE broadcasting. Enable
+		 * synchronization with a one second timeout.
+		 */
+		if (c->x86 > 6 || (c->x86_model == 0x19 || c->x86_model == 0x1f)) {
+			if (cfg->monarch_timeout < 0)
+				cfg->monarch_timeout = USEC_PER_SEC;
+		}
+	}
+
 	if (cfg->monarch_timeout < 0)
 		cfg->monarch_timeout = 0;
 	if (cfg->bootlog != 0)
@@ -2014,15 +2031,16 @@  static void mce_disable_error_reporting(void)
 static void vendor_disable_error_reporting(void)
 {
 	/*
-	 * Don't clear on Intel or AMD or Hygon CPUs. Some of these MSRs
-	 * are socket-wide.
+	 * Don't clear on Intel or AMD or Hygon or Zhaoxin CPUs. Some of these
+	 * MSRs are socket-wide.
 	 * Disabling them for just a single offlined CPU is bad, since it will
 	 * inhibit reporting for all shared resources on the socket like the
 	 * last level cache (LLC), the integrated memory controller (iMC), etc.
 	 */
 	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
 	    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON ||
-	    boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+	    boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
+	    boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN)
 		return;
 
 	mce_disable_error_reporting();