diff mbox series

[v2] mce: prevent concurrent polling of MCE events

Message ID 20230626203712.GW4090740@cathedrallabs.org (mailing list archive)
State New, archived
Headers show
Series [v2] mce: prevent concurrent polling of MCE events | expand

Commit Message

Aristeu Rozanski June 26, 2023, 8:37 p.m. UTC
On Intel microarchitectures that support CMCI but have it disabled (BIOS,
kernel option or CMCI storm code) the kernel will resort to polling for MCEs.
In these microarchitectures the IMC registers are shared by all CPUs in the
same package and despite the fact that the polling is set up in the kernel to
prevent all CPUs to poll at the same time, it's still possible they'll overlap
and report the same MCE multiple times.

This patch fixes this by introducing synchronization during polling only for
the affected microarchitectures. This can be improved further in the future if
Intel implements a way to determine which package a certain CPU belongs to.

Signed-off-by: Aristeu Rozanski <aris@ruivo.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: linux-edac@vger.kernel.org

---
 arch/x86/kernel/cpu/mce/core.c     |    2 +
 arch/x86/kernel/cpu/mce/intel.c    |   47 +++++++++++++++++++++++++++++++------
 arch/x86/kernel/cpu/mce/internal.h |    4 +++
 3 files changed, 46 insertions(+), 7 deletions(-)

Comments

Tony Luck June 26, 2023, 9:01 p.m. UTC | #1
On Mon, Jun 26, 2023 at 04:37:12PM -0400, Aristeu Rozanski wrote:

> This patch fixes this by introducing synchronization during polling only for
> the affected microarchitectures. This can be improved further in the future if
> Intel implements a way to determine which package a certain CPU belongs to.

It is already possible to associate a CPU with a package. The problem
here is knowing which CPUs are sharing which banks.

> --- linus-2.6.orig/arch/x86/kernel/cpu/mce/core.c	2023-06-26 12:37:12.972386600 -0400
> +++ linus-2.6/arch/x86/kernel/cpu/mce/core.c	2023-06-26 12:47:21.895072048 -0400
> @@ -1618,7 +1618,9 @@ static void mce_timer_fn(struct timer_li
>  	iv = __this_cpu_read(mce_next_interval);
>  
>  	if (mce_available(this_cpu_ptr(&cpu_info))) {
> +		bool locked = intel_cmci_poll_lock();

Is mce_timer_fn() used on AMD? I think it is (mcheck_cpu_init() calls
__mcheck_cpu_setup_timer() apparently unconditionally). So this code is
OK if CONFIG_X86_MCE_INTEL=n because that's an empty stub. But with
both CONFIG_X86_MCE_INTEL=y and CONFIG_X86_MCE_AMD=y you'll call this
function on AMD systems.

Or is this OK because this will always return false on AMD?

-Tony
Aristeu Rozanski June 26, 2023, 10:32 p.m. UTC | #2
On Mon, Jun 26, 2023 at 02:01:09PM -0700, Tony Luck wrote:
> It is already possible to associate a CPU with a package. The problem
> here is knowing which CPUs are sharing which banks.

Then this patch can be improved to have one lock per package instead one
for the entire system. How do I know which package the current CPU
belongs?

> Is mce_timer_fn() used on AMD? I think it is (mcheck_cpu_init() calls
> __mcheck_cpu_setup_timer() apparently unconditionally). So this code is
> OK if CONFIG_X86_MCE_INTEL=n because that's an empty stub. But with
> both CONFIG_X86_MCE_INTEL=y and CONFIG_X86_MCE_AMD=y you'll call this
> function on AMD systems.
> 
> Or is this OK because this will always return false on AMD?

That was my intention. I could use a static to only call
cmci_supported_hw() once (that in fact could be useful for users of
cmci_supported() that actually just want cmci_supported_hw() but I didn't
want to mix it with this change) if you prefer:

+bool intel_cmci_poll_lock(void)
{
	static bool supported = cmci_supported_hw();

	if (!supported)
		return false;

	spin_lock(&cmci_poll_lock);

	return true;
}

Still a call and comparison for everyone (although it's likely
intel_cmci_poll_{,un}lock() will be inlined). You prefer this or have
a different idea?

Thanks
Tony Luck June 26, 2023, 10:52 p.m. UTC | #3
> Then this patch can be improved to have one lock per package instead one
> for the entire system. How do I know which package the current CPU
> belongs?

topology_physical_package_id(cpu);

But it seems like that adds a lot of work to find out how many packages there
are and initialize a spin lock for each one. Just to reduce what is likely to be
quite low contention on a lock in a rare code path.

> Or is this OK because this will always return false on AMD?
>
> That was my intention. I could use a static to only call
> cmci_supported_hw() once (that in fact could be useful for users of
> cmci_supported() that actually just want cmci_supported_hw() but I didn't
> want to mix it with this change) if you prefer:

Maybe the function just needs a better name. Calling something
with an "intel_cmci_" prefix from generic code is what's causing me
to step back.

What if the code read:

	if (mce_available(this_cpu_ptr(&cpu_info))) {
+		bool locked = serialize_mc_bank_access();
 		machine_check_poll(0, this_cpu_ptr(&mce_poll_banks));
+		unserialize_mc_bank_access(locked);

As an intermediate functions. With core.c declaring:

static bool serialize_mc_bank_access(void)
{
	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
		return false;

	return intel_cmci_poll_lock();
}

static void unserialize_mc_bank_access(bool locked)
{
	if (!locked || boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
		return false;

	return intel_cmci_poll_unlock();
}

That would be in-lined and almost all disappear for CONFIG_X86_MCE_INTEL=n

-Tony
Aristeu Rozanski June 27, 2023, 3:05 a.m. UTC | #4
On Mon, Jun 26, 2023 at 10:52:01PM +0000, Luck, Tony wrote:
> topology_physical_package_id(cpu);
> 
> But it seems like that adds a lot of work to find out how many packages there
> are and initialize a spin lock for each one. Just to reduce what is likely to be
> quite low contention on a lock in a rare code path.

OK, I'll update the description then.

> > Or is this OK because this will always return false on AMD?
> >
> > That was my intention. I could use a static to only call
> > cmci_supported_hw() once (that in fact could be useful for users of
> > cmci_supported() that actually just want cmci_supported_hw() but I didn't
> > want to mix it with this change) if you prefer:
> 
> Maybe the function just needs a better name. Calling something
> with an "intel_cmci_" prefix from generic code is what's causing me
> to step back.
> 
> What if the code read:
> 
> 	if (mce_available(this_cpu_ptr(&cpu_info))) {
> +		bool locked = serialize_mc_bank_access();
>  		machine_check_poll(0, this_cpu_ptr(&mce_poll_banks));
> +		unserialize_mc_bank_access(locked);
> 
> As an intermediate functions. With core.c declaring:
> 
> static bool serialize_mc_bank_access(void)
> {
> 	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> 		return false;
> 
> 	return intel_cmci_poll_lock();
> }
> 
> static void unserialize_mc_bank_access(bool locked)
> {
> 	if (!locked || boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> 		return false;
> 
> 	return intel_cmci_poll_unlock();
> }
> 
> That would be in-lined and almost all disappear for CONFIG_X86_MCE_INTEL=n

Ah, I see. Makes sense. I'll update and send it tomorrow.
diff mbox series

Patch

--- linus-2.6.orig/arch/x86/kernel/cpu/mce/core.c	2023-06-26 12:37:12.972386600 -0400
+++ linus-2.6/arch/x86/kernel/cpu/mce/core.c	2023-06-26 12:47:21.895072048 -0400
@@ -1618,7 +1618,9 @@  static void mce_timer_fn(struct timer_li
 	iv = __this_cpu_read(mce_next_interval);
 
 	if (mce_available(this_cpu_ptr(&cpu_info))) {
+		bool locked = intel_cmci_poll_lock();
 		machine_check_poll(0, this_cpu_ptr(&mce_poll_banks));
+		intel_cmci_poll_unlock(locked);
 
 		if (mce_intel_cmci_poll()) {
 			iv = mce_adjust_timer(iv);
--- linus-2.6.orig/arch/x86/kernel/cpu/mce/intel.c	2023-06-26 12:37:12.972386600 -0400
+++ linus-2.6/arch/x86/kernel/cpu/mce/intel.c	2023-06-26 13:10:41.436942020 -0400
@@ -73,13 +73,8 @@  enum {
 
 static atomic_t cmci_storm_on_cpus;
 
-static int cmci_supported(int *banks)
+static bool cmci_supported_hw(void)
 {
-	u64 cap;
-
-	if (mca_cfg.cmci_disabled || mca_cfg.ignore_ce)
-		return 0;
-
 	/*
 	 * Vendor check is not strictly needed, but the initial
 	 * initialization is vendor keyed and this
@@ -87,10 +82,24 @@  return 0;
 	 */
 	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL &&
 	    boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN)
-		return 0;
+		return false;
 
 	if (!boot_cpu_has(X86_FEATURE_APIC) || lapic_get_maxlvt() < 6)
+		return false;
+
+	return true;
+}
+
+static int cmci_supported(int *banks)
+{
+	u64 cap;
+
+	if (mca_cfg.cmci_disabled || mca_cfg.ignore_ce)
 		return 0;
+
+	if (!cmci_supported_hw())
+		return 0;
+
 	rdmsrl(MSR_IA32_MCG_CAP, cap);
 	*banks = min_t(unsigned, MAX_NR_BANKS, cap & 0xff);
 	return !!(cap & MCG_CMCI_P);
@@ -519,3 +528,27 @@  ((m->status & 0xa0000000ffffffff) == 0x8
 
 	return false;
 }
+
+/*
+ * On systems that do support CMCI but it's disabled, polling for MCEs can
+ * cause the same event to be reported multiple times because IA32_MCi_STATUS
+ * is shared by the same package.
+ */
+static DEFINE_SPINLOCK(cmci_poll_lock);
+bool intel_cmci_poll_lock(void)
+{
+	if (!cmci_supported_hw())
+		return false;
+
+	spin_lock(&cmci_poll_lock);
+
+	return true;
+}
+
+void intel_cmci_poll_unlock(bool locked)
+{
+	if (!locked)
+		return;
+
+	spin_unlock(&cmci_poll_lock);
+}
--- linus-2.6.orig/arch/x86/kernel/cpu/mce/internal.h	2023-06-26 12:37:12.972386600 -0400
+++ linus-2.6/arch/x86/kernel/cpu/mce/internal.h	2023-06-26 12:47:21.896072039 -0400
@@ -49,6 +49,8 @@  void intel_init_cmci(void);
 void intel_init_lmce(void);
 void intel_clear_lmce(void);
 bool intel_filter_mce(struct mce *m);
+bool intel_cmci_poll_lock(void);
+void intel_cmci_poll_unlock(bool locked);
 #else
 # define cmci_intel_adjust_timer mce_adjust_timer_default
 static inline bool mce_intel_cmci_poll(void) { return false; }
@@ -58,6 +60,8 @@  static inline void intel_init_cmci(void)
 static inline void intel_init_lmce(void) { }
 static inline void intel_clear_lmce(void) { }
 static inline bool intel_filter_mce(struct mce *m) { return false; }
+static inline bool intel_cmci_poll_lock(void) { return false; }
+static inline void intel_cmci_poll_unlock(bool locked) { }
 #endif
 
 void mce_timer_kick(unsigned long interval);