Message ID | 20250213-wip-mca-updates-v2-5-3636547fe05f@amd.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | AMD MCA interrupts rework | expand |
On Thu, Feb 13, 2025 at 04:45:54PM +0000, Yazen Ghannam wrote: > From: Borislav Petkov <bp@suse.de> > > Unify the bank preparation into __mcheck_cpu_init_clear_banks(), rename > that function to what it does now - prepares banks. Do this so that > generic and vendor banks init goes first so that settings done during > that init can take effect before the first bank polling takes place. > > Move __mcheck_cpu_check_banks() into __mcheck_cpu_init_prepare_banks() > as it already loops over the banks. > > Signed-off-by: Borislav Petkov <bp@suse.de> > Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> > --- > > Notes: > Link: > https://lore.kernel.org/r/20221206173607.1185907-2-yazen.ghannam@amd.com > > v1->v2: > * New in v2, but based on old patch (see link). > * Kept old tags for reference. > > arch/x86/include/asm/mce.h | 3 +- > arch/x86/kernel/cpu/mce/core.c | 63 ++++++++++++------------------------------ > 2 files changed, 19 insertions(+), 47 deletions(-) > > diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h > index 2701aca04aec..36ff81c1b3b1 100644 > --- a/arch/x86/include/asm/mce.h > +++ b/arch/x86/include/asm/mce.h > @@ -290,8 +290,7 @@ DECLARE_PER_CPU(mce_banks_t, mce_poll_banks); > enum mcp_flags { > MCP_TIMESTAMP = BIT(0), /* log time stamp */ > MCP_UC = BIT(1), /* log uncorrected errors */ > - MCP_DONTLOG = BIT(2), /* only clear, don't log */ MCP_DONTLOG is removed in this patch. But no mention of this change in the commit description. -Tony
On Thu, Feb 13, 2025 at 02:32:27PM -0800, Luck, Tony wrote: > On Thu, Feb 13, 2025 at 04:45:54PM +0000, Yazen Ghannam wrote: > > From: Borislav Petkov <bp@suse.de> > > > > Unify the bank preparation into __mcheck_cpu_init_clear_banks(), rename > > that function to what it does now - prepares banks. Do this so that > > generic and vendor banks init goes first so that settings done during > > that init can take effect before the first bank polling takes place. > > > > Move __mcheck_cpu_check_banks() into __mcheck_cpu_init_prepare_banks() > > as it already loops over the banks. > > > > Signed-off-by: Borislav Petkov <bp@suse.de> > > Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> > > --- > > > > Notes: > > Link: > > https://lore.kernel.org/r/20221206173607.1185907-2-yazen.ghannam@amd.com > > > > v1->v2: > > * New in v2, but based on old patch (see link). > > * Kept old tags for reference. > > > > arch/x86/include/asm/mce.h | 3 +- > > arch/x86/kernel/cpu/mce/core.c | 63 ++++++++++++------------------------------ > > 2 files changed, 19 insertions(+), 47 deletions(-) > > > > diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h > > index 2701aca04aec..36ff81c1b3b1 100644 > > --- a/arch/x86/include/asm/mce.h > > +++ b/arch/x86/include/asm/mce.h > > @@ -290,8 +290,7 @@ DECLARE_PER_CPU(mce_banks_t, mce_poll_banks); > > enum mcp_flags { > > MCP_TIMESTAMP = BIT(0), /* log time stamp */ > > MCP_UC = BIT(1), /* log uncorrected errors */ > > - MCP_DONTLOG = BIT(2), /* only clear, don't log */ > > MCP_DONTLOG is removed in this patch. But no mention of this change in > the commit description. > Yes, good point. How about this? "The MCP_DONTLOG flag is no longer needed, since the MCA polling function is now called only if boot-time logging should be done." Thanks, Yazen
> From: Yazen Ghannam <yazen.ghannam@amd.com> > Sent: Friday, February 14, 2025 12:46 AM > To: x86@kernel.org; Luck, Tony <tony.luck@intel.com> > Cc: linux-kernel@vger.kernel.org; linux-edac@vger.kernel.org; > Smita.KoralahalliChannabasappa@amd.com; Yazen Ghannam > <yazen.ghannam@amd.com> > Subject: [PATCH v2 05/16] x86/mce: Cleanup bank processing on init > > From: Borislav Petkov <bp@suse.de> > > Unify the bank preparation into __mcheck_cpu_init_clear_banks(), rename > that function to what it does now - prepares banks. Do this so that generic > and vendor banks init goes first so that settings done during that init can take > effect before the first bank polling takes place. > > Move __mcheck_cpu_check_banks() into > __mcheck_cpu_init_prepare_banks() as it already loops over the banks. > > Signed-off-by: Borislav Petkov <bp@suse.de> > Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Other than Tony's comments about the missing comment for 'MCP_DONTLOG', Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> > MCP_DONTLOG is removed in this patch. But no mention of this change in > > the commit description. > > > > Yes, good point. How about this? > > "The MCP_DONTLOG flag is no longer needed, since the MCA polling > function is now called only if boot-time logging should be done." Looks good. -Tony
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h index 2701aca04aec..36ff81c1b3b1 100644 --- a/arch/x86/include/asm/mce.h +++ b/arch/x86/include/asm/mce.h @@ -290,8 +290,7 @@ DECLARE_PER_CPU(mce_banks_t, mce_poll_banks); enum mcp_flags { MCP_TIMESTAMP = BIT(0), /* log time stamp */ MCP_UC = BIT(1), /* log uncorrected errors */ - MCP_DONTLOG = BIT(2), /* only clear, don't log */ - MCP_QUEUE_LOG = BIT(3), /* only queue to genpool */ + MCP_QUEUE_LOG = BIT(2), /* only queue to genpool */ }; void machine_check_poll(enum mcp_flags flags, mce_banks_t *b); diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index d39af20154c7..d85bd861ecca 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -785,9 +785,6 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b) continue; log_it: - if (flags & MCP_DONTLOG) - goto clear_it; - mce_read_aux(&err, i); m->severity = mce_severity(m, NULL, NULL, false); /* @@ -1807,7 +1804,7 @@ static void __mcheck_cpu_mce_banks_init(void) /* * Init them all, __mcheck_cpu_apply_quirks() is going to apply * the required vendor quirks before - * __mcheck_cpu_init_clear_banks() does the final bank setup. + * __mcheck_cpu_init_prepare_banks() does the final bank setup. */ b->ctl = -1ULL; b->init = true; @@ -1846,21 +1843,8 @@ static void __mcheck_cpu_cap_init(void) static void __mcheck_cpu_init_generic(void) { - enum mcp_flags m_fl = 0; - mce_banks_t all_banks; u64 cap; - if (!mca_cfg.bootlog) - m_fl = MCP_DONTLOG; - - /* - * Log the machine checks left over from the previous reset. Log them - * only, do not start processing them. That will happen in mcheck_late_init() - * when all consumers have been registered on the notifier chain. - */ - bitmap_fill(all_banks, MAX_NR_BANKS); - machine_check_poll(MCP_UC | MCP_QUEUE_LOG | m_fl, &all_banks); - cr4_set_bits(X86_CR4_MCE); rdmsrl(MSR_IA32_MCG_CAP, cap); @@ -1868,36 +1852,23 @@ static void __mcheck_cpu_init_generic(void) wrmsr(MSR_IA32_MCG_CTL, 0xffffffff, 0xffffffff); } -static void __mcheck_cpu_init_clear_banks(void) +static void __mcheck_cpu_init_prepare_banks(void) { struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array); + u64 msrval; int i; - for (i = 0; i < this_cpu_read(mce_num_banks); i++) { - struct mce_bank *b = &mce_banks[i]; + /* + * Log the machine checks left over from the previous reset. Log them + * only, do not start processing them. That will happen in mcheck_late_init() + * when all consumers have been registered on the notifier chain. + */ + if (mca_cfg.bootlog) { + mce_banks_t all_banks; - if (!b->init) - continue; - wrmsrl(mca_msr_reg(i, MCA_CTL), b->ctl); - wrmsrl(mca_msr_reg(i, MCA_STATUS), 0); + bitmap_fill(all_banks, MAX_NR_BANKS); + machine_check_poll(MCP_UC | MCP_QUEUE_LOG, &all_banks); } -} - -/* - * Do a final check to see if there are any unused/RAZ banks. - * - * This must be done after the banks have been initialized and any quirks have - * been applied. - * - * Do not call this from any user-initiated flows, e.g. CPU hotplug or sysfs. - * Otherwise, a user who disables a bank will not be able to re-enable it - * without a system reboot. - */ -static void __mcheck_cpu_check_banks(void) -{ - struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array); - u64 msrval; - int i; for (i = 0; i < this_cpu_read(mce_num_banks); i++) { struct mce_bank *b = &mce_banks[i]; @@ -1905,6 +1876,9 @@ static void __mcheck_cpu_check_banks(void) if (!b->init) continue; + wrmsrl(mca_msr_reg(i, MCA_CTL), b->ctl); + wrmsrl(mca_msr_reg(i, MCA_STATUS), 0); + rdmsrl(mca_msr_reg(i, MCA_CTL), msrval); b->init = !!msrval; } @@ -2310,8 +2284,7 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c) __mcheck_cpu_init_early(c); __mcheck_cpu_init_generic(); __mcheck_cpu_init_vendor(c); - __mcheck_cpu_init_clear_banks(); - __mcheck_cpu_check_banks(); + __mcheck_cpu_init_prepare_banks(); __mcheck_cpu_setup_timer(); } @@ -2479,7 +2452,7 @@ static void mce_syscore_resume(void) { __mcheck_cpu_init_generic(); __mcheck_cpu_init_vendor(raw_cpu_ptr(&cpu_info)); - __mcheck_cpu_init_clear_banks(); + __mcheck_cpu_init_prepare_banks(); } static struct syscore_ops mce_syscore_ops = { @@ -2497,7 +2470,7 @@ static void mce_cpu_restart(void *data) if (!mce_available(raw_cpu_ptr(&cpu_info))) return; __mcheck_cpu_init_generic(); - __mcheck_cpu_init_clear_banks(); + __mcheck_cpu_init_prepare_banks(); __mcheck_cpu_init_timer(); }