Message ID | 9d6769dca6394638a013ccad2c8f964c@zhaoxin.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v3,1/4] x86/mce: Add Zhaoxin MCE support |
On Wed, Sep 11, 2019 at 12:01:42PM +0000, Tony W Wang-oc wrote:
> All newer Zhaoxin CPUs support MCE that is compatible with Intel's
> "Machine-Check Architecture", so add support for Zhaoxin MCE in
> mce/core.c.
>
> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
> ---
> v2->v3:
> - Make ifelse-case to switch-case
> - Simplify Zhaoxin CPU FMS checking

Btw, for future submissions - you don't have to do it now - search the
web for threaded mails, look at git-send-email's manpage and especially
the --thread and --chain-reply-to options. Also, look at lkml for
examples.

IOW, patchsets should have a 0/N message and all the others should be
sent as a reply to that message, i.e., shallow threading, as the
git-send-email manpage calls it.

Thx.

On Wed, Sep 11, 2019 at 12:01:42PM +0000, Tony W Wang-oc wrote:
> +	/* Checks after this one are Intel/Zhaoxin-specific: */
> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL &&
> +	    boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN)

Is it time to have a big cleanup on how we handle similarities
and oddities in the MCE subsystem? We've been adding ad-hoc
tests like this in random places ... and it all looks very
messy.

Lines that mention x86_vendor|x86|x86_model below arch/x86/kernel/cpu/mce/
currently look like this:

arch/x86/kernel/cpu/mce/amd.c: (c->x86_model >= 0x10 && c->x86_model <= 0x2F)) {
arch/x86/kernel/cpu/mce/amd.c: c->x86_model >= 0x10 && c->x86_model <= 0x2F &&
arch/x86/kernel/cpu/mce/amd.c: } else if (c->x86 == 0x17 &&
arch/x86/kernel/cpu/mce/amd.c: if (c->x86 == 0x15 && bank == 4) {
arch/x86/kernel/cpu/mce/amd.c: if (c->x86 == 0x17 &&
arch/x86/kernel/cpu/mce/core.c: boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
arch/x86/kernel/cpu/mce/core.c: boot_cpu_data.x86_vendor == X86_VENDOR_HYGON ||
arch/x86/kernel/cpu/mce/core.c: c->x86 > 6) {
arch/x86/kernel/cpu/mce/core.c: if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
arch/x86/kernel/cpu/mce/core.c: if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
arch/x86/kernel/cpu/mce/core.c: if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
arch/x86/kernel/cpu/mce/core.c: if (c->x86 < 0x11 && cfg->bootlog < 0) {
arch/x86/kernel/cpu/mce/core.c: if (c->x86 == 0x15 && c->x86_model <= 0xf)
arch/x86/kernel/cpu/mce/core.c: if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) {
arch/x86/kernel/cpu/mce/core.c: if (c->x86 != 5)
arch/x86/kernel/cpu/mce/core.c: if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) &&
arch/x86/kernel/cpu/mce/core.c: if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0)
arch/x86/kernel/cpu/mce/core.c: if ((c->x86 == 6 && c->x86_model == 0xf && c->x86_stepping >= 0xe) ||
arch/x86/kernel/cpu/mce/core.c: if (c->x86 == 6 && c->x86_model <= 13 && cfg->bootlog < 0)
arch/x86/kernel/cpu/mce/core.c: if (c->x86 == 6 && c->x86_model == 45)
arch/x86/kernel/cpu/mce/core.c: if (c->x86 == 6 && this_cpu_read(mce_num_banks) > 0)
arch/x86/kernel/cpu/mce/core.c: if (c->x86_vendor == X86_VENDOR_AMD) {
arch/x86/kernel/cpu/mce/core.c: if (c->x86_vendor == X86_VENDOR_AMD || c->x86_vendor == X86_VENDOR_HYGON) {
arch/x86/kernel/cpu/mce/core.c: if (c->x86_vendor == X86_VENDOR_INTEL) {
arch/x86/kernel/cpu/mce/core.c: if (c->x86_vendor == X86_VENDOR_UNKNOWN) {
arch/x86/kernel/cpu/mce/core.c: m->cpuvendor = boot_cpu_data.x86_vendor;
arch/x86/kernel/cpu/mce/core.c: switch (c->x86_vendor) {
arch/x86/kernel/cpu/mce/inject.c: boot_cpu_data.x86 < 0x17) {
arch/x86/kernel/cpu/mce/inject.c: m->cpuvendor = boot_cpu_data.x86_vendor;
arch/x86/kernel/cpu/mce/intel.c: if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
arch/x86/kernel/cpu/mce/intel.c: switch (c->x86_model) {
arch/x86/kernel/cpu/mce/severity.c: boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
arch/x86/kernel/cpu/mce/severity.c: if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
arch/x86/kernel/cpu/mce/therm_throt.c: if (c->x86 == 6 && (c->x86_model == 9 || c->x86_model == 13)) {

Maybe we can add X86_VENDOR_ZHAOXIN to this jumble with the excuse that
it is already so ugly that this patch series only makes things 5% worse?

Or should we make a big table of CPU vendors/families/models and use
x86_match_cpu() to pick out what we are running on and set some bits/flags
(like X86_FEATURE/X86_BUG) which we can use in the code to do the
right thing in each place?

E.g. default for Intel and Zhaoxin vendors would be to set MCE_INTEL_LIKE.

Thoughts?

-Tony

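The table-driven idea above can be sketched in plain userspace C. This is only an illustration, not kernel code: `enum cpu_vendor`, `struct mce_cpu_match`, `mce_lookup_flags()` and every flag except the name `MCE_INTEL_LIKE` are hypothetical, and the loop ORs together all matching rows, whereas the kernel's `x86_match_cpu()` returns a single matching entry.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's X86_VENDOR_* constants. */
enum cpu_vendor {
	VENDOR_INTEL,
	VENDOR_AMD,
	VENDOR_HYGON,
	VENDOR_ZHAOXIN,
	VENDOR_UNKNOWN,
};

#define ANY_ID (-1)	/* wildcard: matches any family or model */

/* Only MCE_INTEL_LIKE is named in the thread; the others are made up. */
#define MCE_INTEL_LIKE		(1u << 0)
#define MCE_AMD_LIKE		(1u << 1)
#define MCE_QUIRK_SNB_IFU	(1u << 2)

struct mce_cpu_match {
	enum cpu_vendor vendor;
	int family;
	int model;
	unsigned int flags;
};

/* One row per vendor default, plus rows for model-specific oddities. */
static const struct mce_cpu_match mce_cpu_table[] = {
	{ VENDOR_INTEL,   ANY_ID, ANY_ID, MCE_INTEL_LIKE },
	{ VENDOR_ZHAOXIN, ANY_ID, ANY_ID, MCE_INTEL_LIKE },
	{ VENDOR_AMD,     ANY_ID, ANY_ID, MCE_AMD_LIKE },
	{ VENDOR_HYGON,   ANY_ID, ANY_ID, MCE_AMD_LIKE },
	/* Mirrors the existing "c->x86 == 6 && c->x86_model == 45" test. */
	{ VENDOR_INTEL,   6,      45,     MCE_QUIRK_SNB_IFU },
};

/* Collect the flags of every row that matches the running CPU. */
static unsigned int mce_lookup_flags(enum cpu_vendor vendor, int family, int model)
{
	unsigned int flags = 0;
	size_t i;

	for (i = 0; i < sizeof(mce_cpu_table) / sizeof(mce_cpu_table[0]); i++) {
		const struct mce_cpu_match *m = &mce_cpu_table[i];

		if (m->vendor != vendor)
			continue;
		if (m->family != ANY_ID && m->family != family)
			continue;
		if (m->model != ANY_ID && m->model != model)
			continue;
		flags |= m->flags;
	}
	return flags;
}
```

With such a table, adding a vendor that behaves like Intel is one new row rather than another `x86_vendor` comparison at each call site.
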
On Fri, Sep 13, 2019 at 11:10:31AM -0700, Luck, Tony wrote:
> Is it time to have a big cleanup on how we handle similarities
> and oddities in the MCE subsystem? We've been adding ad-hoc
> tests like this in random places ... and it all looks very
> messy.

Hohum, it has been bothering me for a while now too. ;-\

> Or should we make a big table of CPU vendors/families/models and use
> x86_match_cpu() to pick out what we are running on and set some bits/flags
> (like X86_FEATURE/X86_BUG) which we can use in the code to do the
> right thing in each place?

Yes, that.

And I have started doing something along those lines, see

struct mce_vendor_flags.

If we did the X86_FEATURE/BUG things, we would still end up using those
new definitions in the MCA code only so I think having our own bits in a
bitfield would be cleaner/nicer.

Anyway, detection can be all done in __mcheck_cpu_init_early() or
somewhere similar, all matching flags/bits set and then the rest of the
code would query only them. We can also merge mce_vendor_flags into
mca_cfg as that thing is used everywhere.

Another advantage of having our own flags is that we can define them as
we like and stick them all in internal.h so no exposure to the outside.
And so on.

> E.g. default for Intel and Zhaoxin vendors would be to set MCE_INTEL_LIKE.
>
> Thoughts?

Yah, I think that's a good idea and I think we should do it. Not
immediately but work towards it.

Thx.

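A minimal sketch of the "detect once, query flags everywhere" approach, written as standalone C rather than kernel code. The struct layout, flag names and helper functions here are all hypothetical (the real `struct mce_vendor_flags` has its own members); the two queries mirror the vendor checks that `mce_usable_address()` and `vendor_disable_error_reporting()` make in the patch under review.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's X86_VENDOR_* constants. */
enum cpu_vendor {
	VENDOR_INTEL,
	VENDOR_AMD,
	VENDOR_HYGON,
	VENDOR_ZHAOXIN,
	VENDOR_UNKNOWN,
};

/* Hypothetical private flag set, in the spirit of mce_vendor_flags. */
struct mce_flags_sketch {
	unsigned int intel_like : 1;		/* Intel-style MCA semantics */
	unsigned int socket_wide_msrs : 1;	/* some MCA MSRs are shared per socket */
};

static struct mce_flags_sketch mce_flags;

/* Done once at init time; the only place that looks at the vendor. */
static void mce_detect_flags(enum cpu_vendor vendor)
{
	mce_flags = (struct mce_flags_sketch){ 0 };

	switch (vendor) {
	case VENDOR_INTEL:
	case VENDOR_ZHAOXIN:
		mce_flags.intel_like = 1;
		mce_flags.socket_wide_msrs = 1;
		break;
	case VENDOR_AMD:
	case VENDOR_HYGON:
		mce_flags.socket_wide_msrs = 1;
		break;
	default:
		break;
	}
}

/* Call sites query flags instead of comparing vendors. */
static bool usable_address_needs_misc_checks(void)
{
	return mce_flags.intel_like;
}

static bool may_disable_reporting_on_offline(void)
{
	return !mce_flags.socket_wide_msrs;
}
```

Keeping the flags in a private struct, as suggested above, means none of these names need to be visible outside the MCA code.
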
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 743370e..7bcd8c1 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -488,8 +488,9 @@ int mce_usable_address(struct mce *m)
 	if (!(m->status & MCI_STATUS_ADDRV))
 		return 0;
 
-	/* Checks after this one are Intel-specific: */
-	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+	/* Checks after this one are Intel/Zhaoxin-specific: */
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL &&
+	    boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN)
 		return 1;
 
 	if (!(m->status & MCI_STATUS_MISCV))
@@ -507,10 +508,13 @@ EXPORT_SYMBOL_GPL(mce_usable_address);
 
 bool mce_is_memory_error(struct mce *m)
 {
-	if (m->cpuvendor == X86_VENDOR_AMD ||
-	    m->cpuvendor == X86_VENDOR_HYGON) {
+	switch (m->cpuvendor) {
+	case X86_VENDOR_AMD:
+	case X86_VENDOR_HYGON:
 		return amd_mce_is_memory_error(m);
-	} else if (m->cpuvendor == X86_VENDOR_INTEL) {
+
+	case X86_VENDOR_INTEL:
+	case X86_VENDOR_ZHAOXIN:
 		/*
 		 * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
 		 *
@@ -527,9 +531,10 @@ bool mce_is_memory_error(struct mce *m)
 		return (m->status & 0xef80) == BIT(7) ||
 		       (m->status & 0xef00) == BIT(8) ||
 		       (m->status & 0xeffc) == 0xc;
-	}
 
-	return false;
+	default:
+		return false;
+	}
 }
 EXPORT_SYMBOL_GPL(mce_is_memory_error);
 
@@ -1697,6 +1702,18 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 		if (c->x86 == 6 && c->x86_model == 45)
 			quirk_no_way_out = quirk_sandybridge_ifu;
 	}
+
+	if (c->x86_vendor == X86_VENDOR_ZHAOXIN) {
+		/*
+		 * All newer Zhaoxin CPUs support MCE broadcasting. Enable
+		 * synchronization with a one second timeout.
+		 */
+		if (c->x86 > 6 || (c->x86_model == 0x19 || c->x86_model == 0x1f)) {
+			if (cfg->monarch_timeout < 0)
+				cfg->monarch_timeout = USEC_PER_SEC;
+		}
+	}
+
 	if (cfg->monarch_timeout < 0)
 		cfg->monarch_timeout = 0;
 	if (cfg->bootlog != 0)
@@ -2014,15 +2031,16 @@ static void mce_disable_error_reporting(void)
 static void vendor_disable_error_reporting(void)
 {
 	/*
-	 * Don't clear on Intel or AMD or Hygon CPUs. Some of these MSRs
-	 * are socket-wide.
+	 * Don't clear on Intel or AMD or Hygon or Zhaoxin CPUs. Some of these
+	 * MSRs are socket-wide.
 	 * Disabling them for just a single offlined CPU is bad, since it will
 	 * inhibit reporting for all shared resources on the socket like the
 	 * last level cache (LLC), the integrated memory controller (iMC), etc.
 	 */
 	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
 	    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON ||
-	    boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+	    boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
+	    boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN)
 		return;
 
 	mce_disable_error_reporting();

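To see what the Intel/Zhaoxin branch of mce_is_memory_error() accepts, the three mask tests from the diff can be pulled into a standalone helper. This is a sketch only: `intel_like_is_memory_error()` is a hypothetical name, `BIT()` is redefined locally, and the bit meanings in the comments are one reading of the Intel SDM section cited in the diff and should be checked against it.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1ULL << (n))	/* local stand-in for the kernel macro */

/*
 * Same three tests as the Intel/Zhaoxin case in the patch, applied to the
 * low 16 bits (MCACOD) of an MCi_STATUS value.
 *
 * Each mask includes bits 13-15 and bit 11, so any of those being set makes
 * the comparison fail; bit 12 is left out of every mask and is ignored.
 */
static bool intel_like_is_memory_error(uint64_t status)
{
	return (status & 0xef80) == BIT(7) ||	/* bit 7 set, bits 8-11 clear */
	       (status & 0xef00) == BIT(8) ||	/* bit 8 set, bits 9-11 clear */
	       (status & 0xeffc) == 0xc;	/* bits 2-3 set, bits 4-11 clear */
}
```

For example, a status whose low bits are 0x0090 passes the first test, while 0x0880 fails all three because bit 11 is set.
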
All newer Zhaoxin CPUs support MCE that is compatible with Intel's
"Machine-Check Architecture", so add support for Zhaoxin MCE in
mce/core.c.

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
v2->v3:
 - Make ifelse-case to switch-case
 - Simplify Zhaoxin CPU FMS checking

 arch/x86/kernel/cpu/mce/core.c | 38 ++++++++++++++++++++++++++++----------
 1 file changed, 28 insertions(+), 10 deletions(-)