Message ID | 20200217130659.15895-1-prarit@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] x86/mce: Do not log spurious corrected mce errors | expand |
On Mon, Feb 17, 2020 at 08:06:59AM -0500, Prarit Bhargava wrote: > A user has reported that they are seeing spurious corrected errors on > their hardware. > > Intel Errata HSD131, HSM142, HSW131, and BDM48 report that > "spurious corrected errors may be logged in the IA32_MC0_STATUS register > with the valid field (bit 63) set, the uncorrected error field (bit 61) > not set, a Model Specific Error Code (bits [31:16]) of 0x000F, and > an MCA Error Code (bits [15:0]) of 0x0005." > > Block these spurious errors from the console and logs. > > Links to Intel Specification updates: > HSD131: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-desktop-specification-update.html > HSM142: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-mobile-specification-update.html > HSW131: https://www.intel.com/content/www/us/en/processors/xeon/xeon-e3-1200v3-spec-update.html > BDM48: https://www.intel.com/content/www/us/en/products/docs/processors/core/5th-gen-core-family-spec-update.html My previous review comment still holds: Those links tend to get stale with time. If you really want to refer to the PDFs, add a new bugzilla entry on https://bugzilla.kernel.org/, add them there as an attachment and add the link to the entry to the commit message. > Signed-off-by: Prarit Bhargava <prarit@redhat.com> > Co-developed-by: Alexander Krupp <centos@akr.yagii.de> WARNING: Co-developed-by: must be immediately followed by Signed-off-by: #36: See Documentation/process/submitting-patches.rst for more detail. > Cc: Tony Luck <tony.luck@intel.com> > Cc: Borislav Petkov <bp@alien8.de> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: "H. Peter Anvin" <hpa@zytor.com> > Cc: x86@kernel.org > Cc: linux-edac@vger.kernel.org > --- > arch/x86/kernel/cpu/mce/core.c | 2 ++ > arch/x86/kernel/cpu/mce/intel.c | 17 +++++++++++++++++ > arch/x86/kernel/cpu/mce/internal.h | 1 + > 3 files changed, 20 insertions(+) > > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c > index 2c4f949611e4..fe3983d551cc 100644 > --- a/arch/x86/kernel/cpu/mce/core.c > +++ b/arch/x86/kernel/cpu/mce/core.c > @@ -1877,6 +1877,8 @@ bool filter_mce(struct mce *m) > { > if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) > return amd_filter_mce(m); > + if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) > + return intel_filter_mce(m); > > return false; > } > diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c > index 5627b1091b85..989148e6746c 100644 > --- a/arch/x86/kernel/cpu/mce/intel.c > +++ b/arch/x86/kernel/cpu/mce/intel.c > @@ -520,3 +520,20 @@ void mce_intel_feature_clear(struct cpuinfo_x86 *c) > { > intel_clear_lmce(); > } > + > +bool intel_filter_mce(struct mce *m) > +{ > + struct cpuinfo_x86 *c = &boot_cpu_data; > + > + /* MCE errata HSD131, HSM142, HSW131, BDM48, and HSM142 */ > + if ((c->x86 == 6) && > + ((c->x86_model == INTEL_FAM6_HASWELL) || > + (c->x86_model == INTEL_FAM6_HASWELL_L) || > + (c->x86_model == INTEL_FAM6_BROADWELL) || > + (c->x86_model == INTEL_FAM6_HASWELL_G)) && > + (m->bank == 0) && > + ((m->status & 0xa0000000ffffffff) == 0x80000000000f0005)) > + return true; > + > + return false; > +} > diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h > index b785c0d0b590..821faba5b05d 100644 > --- a/arch/x86/kernel/cpu/mce/internal.h > +++ b/arch/x86/kernel/cpu/mce/internal.h > @@ -175,5 +175,6 @@ extern bool amd_filter_mce(struct mce *m); > #else > static inline bool amd_filter_mce(struct mce *m) { return false; }; > #endif > +extern bool intel_filter_mce(struct mce *m); It doesn't even build: ld: arch/x86/kernel/cpu/mce/core.o: in function `filter_mce': /home/boris/kernel/linux/arch/x86/kernel/cpu/mce/core.c:1881: undefined reference to `intel_filter_mce' make: *** [Makefile:1077: vmlinux] Error 1 Hint: do it like it is done for amd_filter_mce() but in the respective #ifdef CONFIG_X86_MCE_INTEL place.
On 2/18/20 11:13 AM, Borislav Petkov wrote: > On Mon, Feb 17, 2020 at 08:06:59AM -0500, Prarit Bhargava wrote: >> A user has reported that they are seeing spurious corrected errors on >> their hardware. >> >> Intel Errata HSD131, HSM142, HSW131, and BDM48 report that >> "spurious corrected errors may be logged in the IA32_MC0_STATUS register >> with the valid field (bit 63) set, the uncorrected error field (bit 61) >> not set, a Model Specific Error Code (bits [31:16]) of 0x000F, and >> an MCA Error Code (bits [15:0]) of 0x0005." >> >> Block these spurious errors from the console and logs. >> >> Links to Intel Specification updates: >> HSD131: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-desktop-specification-update.html >> HSM142: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-mobile-specification-update.html >> HSW131: https://www.intel.com/content/www/us/en/processors/xeon/xeon-e3-1200v3-spec-update.html >> BDM48: https://www.intel.com/content/www/us/en/products/docs/processors/core/5th-gen-core-family-spec-update.html > > My previous review comment still holds: > > Those links tend to get stale with time. If you really want to refer to > the PDFs, add a new bugzilla entry on https://bugzilla.kernel.org/, add > them there as an attachment and add the link to the entry to the commit > message. > >> Signed-off-by: Prarit Bhargava <prarit@redhat.com> >> Co-developed-by: Alexander Krupp <centos@akr.yagii.de> > > WARNING: Co-developed-by: must be immediately followed by Signed-off-by: > #36: > > See Documentation/process/submitting-patches.rst for more detail. > >> Cc: Tony Luck <tony.luck@intel.com> >> Cc: Borislav Petkov <bp@alien8.de> >> Cc: Thomas Gleixner <tglx@linutronix.de> >> Cc: Ingo Molnar <mingo@redhat.com> >> Cc: "H. Peter Anvin" <hpa@zytor.com> >> Cc: x86@kernel.org >> Cc: linux-edac@vger.kernel.org >> --- >> arch/x86/kernel/cpu/mce/core.c | 2 ++ >> arch/x86/kernel/cpu/mce/intel.c | 17 +++++++++++++++++ >> arch/x86/kernel/cpu/mce/internal.h | 1 + >> 3 files changed, 20 insertions(+) >> >> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c >> index 2c4f949611e4..fe3983d551cc 100644 >> --- a/arch/x86/kernel/cpu/mce/core.c >> +++ b/arch/x86/kernel/cpu/mce/core.c >> @@ -1877,6 +1877,8 @@ bool filter_mce(struct mce *m) >> { >> if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) >> return amd_filter_mce(m); >> + if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) >> + return intel_filter_mce(m); >> >> return false; >> } >> diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c >> index 5627b1091b85..989148e6746c 100644 >> --- a/arch/x86/kernel/cpu/mce/intel.c >> +++ b/arch/x86/kernel/cpu/mce/intel.c >> @@ -520,3 +520,20 @@ void mce_intel_feature_clear(struct cpuinfo_x86 *c) >> { >> intel_clear_lmce(); >> } >> + >> +bool intel_filter_mce(struct mce *m) >> +{ >> + struct cpuinfo_x86 *c = &boot_cpu_data; >> + >> + /* MCE errata HSD131, HSM142, HSW131, BDM48, and HSM142 */ >> + if ((c->x86 == 6) && >> + ((c->x86_model == INTEL_FAM6_HASWELL) || >> + (c->x86_model == INTEL_FAM6_HASWELL_L) || >> + (c->x86_model == INTEL_FAM6_BROADWELL) || >> + (c->x86_model == INTEL_FAM6_HASWELL_G)) && >> + (m->bank == 0) && >> + ((m->status & 0xa0000000ffffffff) == 0x80000000000f0005)) >> + return true; >> + >> + return false; >> +} >> diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h >> index b785c0d0b590..821faba5b05d 100644 >> --- a/arch/x86/kernel/cpu/mce/internal.h >> +++ b/arch/x86/kernel/cpu/mce/internal.h >> @@ -175,5 +175,6 @@ extern bool amd_filter_mce(struct mce *m); >> #else >> static inline bool amd_filter_mce(struct mce *m) { return false; }; >> #endif >> +extern bool intel_filter_mce(struct mce *m); > > It doesn't even build: > > ld: arch/x86/kernel/cpu/mce/core.o: in function `filter_mce': > /home/boris/kernel/linux/arch/x86/kernel/cpu/mce/core.c:1881: undefined reference to `intel_filter_mce' > make: *** [Makefile:1077: vmlinux] Error 1 > > Hint: do it like it is done for amd_filter_mce() but in the respective > #ifdef CONFIG_X86_MCE_INTEL place.' That's weird. I just re-compiled it on my system without any issue. Can I have your .config to double check my compile? I'm using the standard fedora config FWIW. P. >
On Tue, Feb 18, 2020 at 11:17:34AM -0500, Prarit Bhargava wrote: > That's weird. I just re-compiled it on my system without any issue. Can I have > your .config to double check my compile? I'm using the standard fedora config FWIW. Just disable CONFIG_X86_MCE_INTEL
Hi Prarit,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on tip/x86/core]
[also build test ERROR on tip/auto-latest linux/master linus/master v5.6-rc2 next-20200219]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
url: https://github.com/0day-ci/linux/commits/Prarit-Bhargava/x86-mce-Do-not-log-spurious-corrected-mce-errors/20200219-154357
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 248ed51048c40d36728e70914e38bffd7821da57
config: i386-alldefconfig (attached as .config)
compiler: gcc-7 (Debian 7.5.0-5) 7.5.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
ld: arch/x86/kernel/cpu/mce/core.o: in function `filter_mce':
>> core.c:(.text+0x21d4): undefined reference to `intel_filter_mce'
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
On 2/18/20 11:13 AM, Borislav Petkov wrote: > On Mon, Feb 17, 2020 at 08:06:59AM -0500, Prarit Bhargava wrote: >> A user has reported that they are seeing spurious corrected errors on >> their hardware. >> >> Intel Errata HSD131, HSM142, HSW131, and BDM48 report that >> "spurious corrected errors may be logged in the IA32_MC0_STATUS register >> with the valid field (bit 63) set, the uncorrected error field (bit 61) >> not set, a Model Specific Error Code (bits [31:16]) of 0x000F, and >> an MCA Error Code (bits [15:0]) of 0x0005." >> >> Block these spurious errors from the console and logs. >> >> Links to Intel Specification updates: >> HSD131: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-desktop-specification-update.html >> HSM142: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-mobile-specification-update.html >> HSW131: https://www.intel.com/content/www/us/en/processors/xeon/xeon-e3-1200v3-spec-update.html >> BDM48: https://www.intel.com/content/www/us/en/products/docs/processors/core/5th-gen-core-family-spec-update.html > > My previous review comment still holds: > > Those links tend to get stale with time. If you really want to refer to > the PDFs, add a new bugzilla entry on https://bugzilla.kernel.org/, add > them there as an attachment and add the link to the entry to the commit > message. > >> Signed-off-by: Prarit Bhargava <prarit@redhat.com> >> Co-developed-by: Alexander Krupp <centos@akr.yagii.de> > > WARNING: Co-developed-by: must be immediately followed by Signed-off-by: > #36: Borislav, when I've been using your Co-developed-by & not using a Signed-off-by process but when I run checkpatch.py I get the following warning: WARNING: Co-developed-by and Signed-off-by: name/email do not match #21: Co-developed-by: Alexander Krupp <centos@akr.yagii.de> Signed-off-by: Prarit Bhargava <prarit@redhat.com> When I submitted this patch I looked at other commits in the kernel near top-of-tree and they have Signed-off-by followed by Co-developed-by, and also took your suggestion of not using a Signed-off-by for Alexander. That's why I chose to display them this way in v2. Examples: 126196100063 ("lib/zlib: add s390 hardware support for kernel zlib_inflate") 40ca1bf580ef ("PCI: brcmstb: Add MSI support") I'm now thoroughly confused as to what the correct format is. It seems like checkpatch.py is telling me to include a Signed-off-by in addition to the Co-developed-by for Alexander but you explicitly told me not to. P. > > See Documentation/process/submitting-patches.rst for more detail. > >> Cc: Tony Luck <tony.luck@intel.com> >> Cc: Borislav Petkov <bp@alien8.de> >> Cc: Thomas Gleixner <tglx@linutronix.de> >> Cc: Ingo Molnar <mingo@redhat.com> >> Cc: "H. Peter Anvin" <hpa@zytor.com> >> Cc: x86@kernel.org >> Cc: linux-edac@vger.kernel.org >> --- >> arch/x86/kernel/cpu/mce/core.c | 2 ++ >> arch/x86/kernel/cpu/mce/intel.c | 17 +++++++++++++++++ >> arch/x86/kernel/cpu/mce/internal.h | 1 + >> 3 files changed, 20 insertions(+) >> >> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c >> index 2c4f949611e4..fe3983d551cc 100644 >> --- a/arch/x86/kernel/cpu/mce/core.c >> +++ b/arch/x86/kernel/cpu/mce/core.c >> @@ -1877,6 +1877,8 @@ bool filter_mce(struct mce *m) >> { >> if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) >> return amd_filter_mce(m); >> + if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) >> + return intel_filter_mce(m); >> >> return false; >> } >> diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c >> index 5627b1091b85..989148e6746c 100644 >> --- a/arch/x86/kernel/cpu/mce/intel.c >> +++ b/arch/x86/kernel/cpu/mce/intel.c >> @@ -520,3 +520,20 @@ void mce_intel_feature_clear(struct cpuinfo_x86 *c) >> { >> intel_clear_lmce(); >> } >> + >> +bool intel_filter_mce(struct mce *m) >> +{ >> + struct cpuinfo_x86 *c = &boot_cpu_data; >> + >> + /* MCE errata HSD131, HSM142, HSW131, BDM48, and HSM142 */ >> + if ((c->x86 == 6) && >> + ((c->x86_model == INTEL_FAM6_HASWELL) || >> + (c->x86_model == INTEL_FAM6_HASWELL_L) || >> + (c->x86_model == INTEL_FAM6_BROADWELL) || >> + (c->x86_model == INTEL_FAM6_HASWELL_G)) && >> + (m->bank == 0) && >> + ((m->status & 0xa0000000ffffffff) == 0x80000000000f0005)) >> + return true; >> + >> + return false; >> +} >> diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h >> index b785c0d0b590..821faba5b05d 100644 >> --- a/arch/x86/kernel/cpu/mce/internal.h >> +++ b/arch/x86/kernel/cpu/mce/internal.h >> @@ -175,5 +175,6 @@ extern bool amd_filter_mce(struct mce *m); >> #else >> static inline bool amd_filter_mce(struct mce *m) { return false; }; >> #endif >> +extern bool intel_filter_mce(struct mce *m); > > It doesn't even build: > > ld: arch/x86/kernel/cpu/mce/core.o: in function `filter_mce': > /home/boris/kernel/linux/arch/x86/kernel/cpu/mce/core.c:1881: undefined reference to `intel_filter_mce' > make: *** [Makefile:1077: vmlinux] Error 1 > > Hint: do it like it is done for amd_filter_mce() but in the respective > #ifdef CONFIG_X86_MCE_INTEL place. >
On Wed, Feb 19, 2020 at 07:25:59AM -0500, Prarit Bhargava wrote: > When I submitted this patch I looked at other commits in the kernel near > top-of-tree and they have Signed-off-by followed by Co-developed-by, and also > took your suggestion of not using a Signed-off-by for Alexander. That's why I I said: "This is not how this is expressed. Either you write that in free text in the commit message or you use Co-developed-by. More details in Documentation/process/submitting-patches.rst" > I'm now thoroughly confused as to what the correct format is. It seems like > checkpatch.py is telling me to include a Signed-off-by in addition to the > Co-developed-by for Alexander but you explicitly told me not to. > > > See Documentation/process/submitting-patches.rst for more detail. You need to start reading my replies in their entirety and finally read that document I've pointed to twice: "Co-developed-by: states that the patch was co-created by multiple developers; it is a used to give attribution to co-authors (in addition to the author attributed by the From: tag) when several people work on a single patch. Since ^^^^^ Co-developed-by: denotes authorship, every Co-developed-by: must be immediately ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ followed by a Signed-off-by: of the associated co-author." ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ I.e., basically what checkpatch is saying. So you either a) write in free text in the commit message something like "This is based on a patch submitted to RH bugzilla by Alexander Krupp <centos@akr.yagii.de>" OR b) use Co-developed-by: Alexander Krupp <centos@akr.yagii.de> Signed-off-by: Alexander Krupp <centos@akr.yagii.de> Either a) XOR b).
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 2c4f949611e4..fe3983d551cc 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1877,6 +1877,8 @@ bool filter_mce(struct mce *m) { if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) return amd_filter_mce(m); + if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) + return intel_filter_mce(m); return false; } diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c index 5627b1091b85..989148e6746c 100644 --- a/arch/x86/kernel/cpu/mce/intel.c +++ b/arch/x86/kernel/cpu/mce/intel.c @@ -520,3 +520,20 @@ void mce_intel_feature_clear(struct cpuinfo_x86 *c) { intel_clear_lmce(); } + +bool intel_filter_mce(struct mce *m) +{ + struct cpuinfo_x86 *c = &boot_cpu_data; + + /* MCE errata HSD131, HSM142, HSW131, BDM48, and HSM142 */ + if ((c->x86 == 6) && + ((c->x86_model == INTEL_FAM6_HASWELL) || + (c->x86_model == INTEL_FAM6_HASWELL_L) || + (c->x86_model == INTEL_FAM6_BROADWELL) || + (c->x86_model == INTEL_FAM6_HASWELL_G)) && + (m->bank == 0) && + ((m->status & 0xa0000000ffffffff) == 0x80000000000f0005)) + return true; + + return false; +} diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h index b785c0d0b590..821faba5b05d 100644 --- a/arch/x86/kernel/cpu/mce/internal.h +++ b/arch/x86/kernel/cpu/mce/internal.h @@ -175,5 +175,6 @@ extern bool amd_filter_mce(struct mce *m); #else static inline bool amd_filter_mce(struct mce *m) { return false; }; #endif +extern bool intel_filter_mce(struct mce *m); #endif /* __X86_MCE_INTERNAL_H__ */
A user has reported that they are seeing spurious corrected errors on their hardware. Intel Errata HSD131, HSM142, HSW131, and BDM48 report that "spurious corrected errors may be logged in the IA32_MC0_STATUS register with the valid field (bit 63) set, the uncorrected error field (bit 61) not set, a Model Specific Error Code (bits [31:16]) of 0x000F, and an MCA Error Code (bits [15:0]) of 0x0005." Block these spurious errors from the console and logs. Links to Intel Specification updates: HSD131: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-desktop-specification-update.html HSM142: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-mobile-specification-update.html HSW131: https://www.intel.com/content/www/us/en/processors/xeon/xeon-e3-1200v3-spec-update.html BDM48: https://www.intel.com/content/www/us/en/products/docs/processors/core/5th-gen-core-family-spec-update.html Signed-off-by: Prarit Bhargava <prarit@redhat.com> Co-developed-by: Alexander Krupp <centos@akr.yagii.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: linux-edac@vger.kernel.org --- arch/x86/kernel/cpu/mce/core.c | 2 ++ arch/x86/kernel/cpu/mce/intel.c | 17 +++++++++++++++++ arch/x86/kernel/cpu/mce/internal.h | 1 + 3 files changed, 20 insertions(+)