diff mbox series

[v2,1/3] x86/mce/inject: Remove call to mce_notify_irq()

Message ID 20250210154707.114219-2-nik.borisov@suse.com (mailing list archive)
State New
Headers show
Series Simpify mce code somewhat | expand

Commit Message

Nikolay Borisov Feb. 10, 2025, 3:47 p.m. UTC
The call is actually a noop because when the MCE is raised the early
notifier is the only call site that correctly calls mce_notify_irq()
because it also sets mce_need_notify. Remove this call and as a result
make mce_notify_irq() static

Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
---
 arch/x86/include/asm/mce.h       |  2 --
 arch/x86/kernel/cpu/mce/core.c   | 44 ++++++++++++++++----------------
 arch/x86/kernel/cpu/mce/inject.c |  1 -
 3 files changed, 22 insertions(+), 25 deletions(-)

Comments

Qiuxu Zhuo Feb. 11, 2025, 7:03 a.m. UTC | #1
> From: Nikolay Borisov <nik.borisov@suse.com>
> [...]
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -584,6 +584,28 @@ bool mce_is_correctable(struct mce *m)  }
> EXPORT_SYMBOL_GPL(mce_is_correctable);
> 
> +/*
> + * Notify the user(s) about new machine check events.
> + * Can be called from interrupt context, but not from machine check/NMI
> + * context.
> + */
> +static int mce_notify_irq(void)

Why make mce_notify_irq() return an int?

> [...]
> -/*
> - * Notify the user(s) about new machine check events.
> - * Can be called from interrupt context, but not from machine check/NMI
> - * context.
> - */
> -bool mce_notify_irq(void)
> [...]
Nikolay Borisov Feb. 11, 2025, 8:49 a.m. UTC | #2
On 11.02.25 г. 9:03 ч., Zhuo, Qiuxu wrote:
>> From: Nikolay Borisov <nik.borisov@suse.com>
>> [...]
>> --- a/arch/x86/kernel/cpu/mce/core.c
>> +++ b/arch/x86/kernel/cpu/mce/core.c
>> @@ -584,6 +584,28 @@ bool mce_is_correctable(struct mce *m)  }
>> EXPORT_SYMBOL_GPL(mce_is_correctable);
>>
>> +/*
>> + * Notify the user(s) about new machine check events.
>> + * Can be called from interrupt context, but not from machine check/NMI
>> + * context.
>> + */
>> +static int mce_notify_irq(void)
> 
> Why make mce_notify_irq() return an int?


Pff... because I based my patches off master and not tip, which has it 
as bool...
> 
>> [...]
>> -/*
>> - * Notify the user(s) about new machine check events.
>> - * Can be called from interrupt context, but not from machine check/NMI
>> - * context.
>> - */
>> -bool mce_notify_irq(void)
>> [...]
Qiuxu Zhuo Feb. 11, 2025, 12:37 p.m. UTC | #3
> From: Nikolay Borisov <nik.borisov@suse.com>
> [...]
> >> +static int mce_notify_irq(void)
> >
> > Why make mce_notify_irq() return an int?
> 
> 
> Pff... because I based my patches off master and not tip, which has it as
> bool...

The return type of this function has been "bool" since mainline v6.14-rc1.
It's better to rebase the patch on top of the new version.

-Qiuxu
Borislav Petkov Feb. 16, 2025, 4:10 p.m. UTC | #4
On Mon, Feb 10, 2025 at 05:47:04PM +0200, Nikolay Borisov wrote:
> The call is actually a noop because when the MCE is raised the early
> notifier is the only call site that correctly calls mce_notify_irq()
> because it also sets mce_need_notify. Remove this call and as a result
> make mce_notify_irq() static
> 
> Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
> ---
>  arch/x86/include/asm/mce.h       |  2 --
>  arch/x86/kernel/cpu/mce/core.c   | 44 ++++++++++++++++----------------
>  arch/x86/kernel/cpu/mce/inject.c |  1 -
>  3 files changed, 22 insertions(+), 25 deletions(-)

So what you're looking at are the remnants of the old soft-inject of MCE
errors. And it seems that we lost some of that functionality along the way and
no one has noticed because, well, it seems no one uses it anymore.

In order to understand how this thing was supposed to work, checkout

ea149b36c7f5 ("x86, mce: add basic error injection infrastructure")

and follow what raise_mce() does and pay attention to notify_user which is
what mce_need_notify was called back then.

Remember to have fun :-P
Nikolay Borisov Feb. 17, 2025, 10:01 a.m. UTC | #5
On 16.02.25 г. 18:10 ч., Borislav Petkov wrote:
> On Mon, Feb 10, 2025 at 05:47:04PM +0200, Nikolay Borisov wrote:
>> The call is actually a noop because when the MCE is raised the early
>> notifier is the only call site that correctly calls mce_notify_irq()
>> because it also sets mce_need_notify. Remove this call and as a result
>> make mce_notify_irq() static
>>
>> Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
>> ---
>>   arch/x86/include/asm/mce.h       |  2 --
>>   arch/x86/kernel/cpu/mce/core.c   | 44 ++++++++++++++++----------------
>>   arch/x86/kernel/cpu/mce/inject.c |  1 -
>>   3 files changed, 22 insertions(+), 25 deletions(-)
> 
> So what you're looking at are the remnants of the old soft-inject of MCE
> errors. And it seems that we lost some of that functionality along the way and
> no one has noticed because, well, it seems no one uses it anymore.
> 
> In order to understand how this thing was supposed to work, checkout
> 
> ea149b36c7f5 ("x86, mce: add basic error injection infrastructure")


The original code in  ea149b36c7f5 was setting the notify_user bit via

raise_mce()->machine_check_poll()->mce_log(),
  mce_notify_user() - consumes notify_user set in mce_log above.


  subsequently in 011d82611172 ("RAS: Add a Corrected Errors Collector") 
you factored out the code from mce_log() mce_first_notifier, where the 
bit setting remains to this day, but since it's been removed from 
mce_log() it made the call in raise_local() defunct.


Considering that no one complained about this after all these years and 
that the dev-mcelog is deprecated I think it still makes sense to make 
mce_notify_irq() private

> 
> and follow what raise_mce() does and pay attention to notify_user which is
> what mce_need_notify was called back then.
> 
> Remember to have fun :-P
>
Nikolay Borisov Feb. 17, 2025, 11:20 a.m. UTC | #6
On 17.02.25 г. 12:01 ч., Nikolay Borisov wrote:
> 
> 
> On 16.02.25 г. 18:10 ч., Borislav Petkov wrote:
>> On Mon, Feb 10, 2025 at 05:47:04PM +0200, Nikolay Borisov wrote:
>>> The call is actually a noop because when the MCE is raised the early
>>> notifier is the only call site that correctly calls mce_notify_irq()
>>> because it also sets mce_need_notify. Remove this call and as a result
>>> make mce_notify_irq() static
>>>
>>> Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
>>> ---
>>>   arch/x86/include/asm/mce.h       |  2 --
>>>   arch/x86/kernel/cpu/mce/core.c   | 44 ++++++++++++++++----------------
>>>   arch/x86/kernel/cpu/mce/inject.c |  1 -
>>>   3 files changed, 22 insertions(+), 25 deletions(-)
>>
>> So what you're looking at are the remnants of the old soft-inject of MCE
>> errors. And it seems that we lost some of that functionality along the 
>> way and
>> no one has noticed because, well, it seems no one uses it anymore.
>>
>> In order to understand how this thing was supposed to work, checkout
>>
>> ea149b36c7f5 ("x86, mce: add basic error injection infrastructure")
> 
> 
> The original code in  ea149b36c7f5 was setting the notify_user bit via
> 
> raise_mce()->machine_check_poll()->mce_log(),
>   mce_notify_user() - consumes notify_user set in mce_log above.
> 
> 
>   subsequently in 011d82611172 ("RAS: Add a Corrected Errors Collector") 
> you factored out the code from mce_log() mce_first_notifier, where the 
> bit setting remains to this day, but since it's been removed from 
> mce_log() it made the call in raise_local() defunct.
> 
> 
> Considering that no one complained about this after all these years and 
> that the dev-mcelog is deprecated I think it still makes sense to make 
> mce_notify_irq() private


Actually there is no loss of functionality since after an MCE is injected
the early notifier will correctly call the usermode helper. So I
think the following change log better reflects the situation:


     x86/mce/inject: Remove call to mce_notify_irq()
     
     The call to mce_notify_irq() has been there since the initial version of
     the soft inject mce machinery, introduced in ea149b36c7f5 ("x86,
     mce: add basic error injection infrastructure"). At that time it was
     functional since injecting an MCE resulted in the following call chain:
     
     raise_mce()
       ->machine_check_poll()
           ->mce_log() - sets notfiy_user_bit
     
     ->mce_notify_user() (current mce_notify_irq) consumed the bit and called the
     usermode helper.
     
     However, with the introduction of 011d82611172 ("RAS: Add a Corrected Errors Collector")
     the code got moved around and the user mode helper began to be called
     via the early notifier (mce_first_notifier()) rendering the remaining call
     in raise_local() defunct as the mce_need_notify bit (ex notify_user) is
     only being set from the early notifier.
     
     Remove the noop call and make mce_notify_irq static. No functional
     changes.

> 
>>
>> and follow what raise_mce() does and pay attention to notify_user 
>> which is
>> what mce_need_notify was called back then.
>>
>> Remember to have fun :-P
>>
>
Borislav Petkov Feb. 17, 2025, 11:47 a.m. UTC | #7
On Mon, Feb 17, 2025 at 01:20:55PM +0200, Nikolay Borisov wrote:
> Actually there is no loss of functionality since after an MCE is injected
> the early notifier will correctly call the usermode helper. So I
> think the following change log better reflects the situation:
> 
> 
>     x86/mce/inject: Remove call to mce_notify_irq()
>     The call to mce_notify_irq() has been there since the initial version of
>     the soft inject mce machinery, introduced in ea149b36c7f5 ("x86,
>     mce: add basic error injection infrastructure"). At that time it was
>     functional since injecting an MCE resulted in the following call chain:
>     raise_mce()
>       ->machine_check_poll()
>           ->mce_log() - sets notfiy_user_bit
>     ->mce_notify_user() (current mce_notify_irq) consumed the bit and called the
>     usermode helper.
>     However, with the introduction of 011d82611172 ("RAS: Add a Corrected Errors Collector")
>     the code got moved around and the user mode helper began to be called
>     via the early notifier (mce_first_notifier()) rendering the remaining call
>     in raise_local() defunct as the mce_need_notify bit (ex notify_user) is
>     only being set from the early notifier.
>     Remove the noop call and make mce_notify_irq static. No functional
>     changes.

Ah, there was that. We're doing a serious dance just to log an error, that's
for sure.

Yeah, that makes sense.

Thx.
diff mbox series

Patch

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index eb2db07ef39c..6c77c03139f7 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -296,8 +296,6 @@  enum mcp_flags {
 
 void machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
 
-bool mce_notify_irq(void);
-
 DECLARE_PER_CPU(struct mce, injectm);
 
 /* Disable CMCI/polling for MCA bank claimed by firmware */
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 0dc00c9894c7..89625ff79c3b 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -584,6 +584,28 @@  bool mce_is_correctable(struct mce *m)
 }
 EXPORT_SYMBOL_GPL(mce_is_correctable);
 
+/*
+ * Notify the user(s) about new machine check events.
+ * Can be called from interrupt context, but not from machine check/NMI
+ * context.
+ */
+static int mce_notify_irq(void)
+{
+	/* Not more than two messages every minute */
+	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
+
+	if (test_and_clear_bit(0, &mce_need_notify)) {
+		mce_work_trigger();
+
+		if (__ratelimit(&ratelimit))
+			pr_info(HW_ERR "Machine check events logged\n");
+
+		return 1;
+	}
+
+	return 0;
+}
+
 static int mce_early_notifier(struct notifier_block *nb, unsigned long val,
 			      void *data)
 {
@@ -1773,28 +1795,6 @@  static void mce_timer_delete_all(void)
 		del_timer_sync(&per_cpu(mce_timer, cpu));
 }
 
-/*
- * Notify the user(s) about new machine check events.
- * Can be called from interrupt context, but not from machine check/NMI
- * context.
- */
-bool mce_notify_irq(void)
-{
-	/* Not more than two messages every minute */
-	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
-
-	if (test_and_clear_bit(0, &mce_need_notify)) {
-		mce_work_trigger();
-
-		if (__ratelimit(&ratelimit))
-			pr_info(HW_ERR "Machine check events logged\n");
-
-		return true;
-	}
-	return false;
-}
-EXPORT_SYMBOL_GPL(mce_notify_irq);
-
 static void __mcheck_cpu_mce_banks_init(void)
 {
 	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 313fe682db33..06e3cf7229ce 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -229,7 +229,6 @@  static int raise_local(void)
 	} else if (m->status) {
 		pr_info("Starting machine check poll CPU %d\n", cpu);
 		raise_poll(m);
-		mce_notify_irq();
 		pr_info("Machine check poll done on CPU %d\n", cpu);
 	} else
 		m->finished = 0;