
[v1,2/4] x86/mce: dump error msg from severities

Message ID 20250211060200.33845-3-xueshuai@linux.alibaba.com (mailing list archive)
State New
Series mm/hwpoison: Fix regressions in memory failure handling

Commit Message

Shuai Xue Feb. 11, 2025, 6:01 a.m. UTC
The message in severities is useful for identifying the type of MCE that
has occurred; dump it if it is valid.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 arch/x86/kernel/cpu/mce/core.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Luck, Tony Feb. 11, 2025, 4:44 p.m. UTC | #1
> The message in severities is useful for identifying the type of MCE that
> has occurred; dump it if it is valid.
>
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> ---
>  arch/x86/kernel/cpu/mce/core.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 2919a077cd66..c1319db45b0a 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1456,6 +1456,8 @@ static void queue_task_work(struct mce_hw_err *err, char *msg, void (*func)(stru
>       if (count > 1)
>               return;
>
> +     if (msg)
> +             pr_err("%s\n", msg);
>       task_work_add(current, &current->mce_kill_me, TWA_RESUME);
>  }

This is called from the #MC handler. Is that a safe context to print a console
message? It wasn't in the past, but maybe changes to how console messages
are handled have changed this.

-Tony
Shuai Xue Feb. 14, 2025, 9:29 a.m. UTC | #2
On 2025/2/12 00:44, Luck, Tony wrote:
>> The message in severities is useful for identifying the type of MCE that
>> has occurred; dump it if it is valid.
>>
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>> ---
>>   arch/x86/kernel/cpu/mce/core.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
>> index 2919a077cd66..c1319db45b0a 100644
>> --- a/arch/x86/kernel/cpu/mce/core.c
>> +++ b/arch/x86/kernel/cpu/mce/core.c
>> @@ -1456,6 +1456,8 @@ static void queue_task_work(struct mce_hw_err *err, char *msg, void (*func)(stru
>>        if (count > 1)
>>                return;
>>
>> +     if (msg)
>> +             pr_err("%s\n", msg);
>>        task_work_add(current, &current->mce_kill_me, TWA_RESUME);
>>   }
> 
> This is called from the #MC handler. Is that a safe context to print a console
> message? It wasn't in the past, but maybe changes to how console messages
> are handled have changed this.
> 
> -Tony

#MC is a kind of NMI context. As far as I know, since

commit 42a0bb3f71383b457a7db362f1c69e7afb96732b
printk/nmi: generic solution for safe printk in NMI

printing a console message from that context is safe.

Please correct me if I missed anything.

Thanks.
Shuai
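
The general idea of making output safe in an NMI-like context is to store the message in a buffer there and print it later from a safe context. The block below is a minimal userspace sketch of that idea only, not the kernel's printk code. All names (`nmi_buf_model`, `nmi_buf_write`, `nmi_buf_flush`, `NMI_BUF_SIZE`) are hypothetical, and the flush side assumes no writer runs concurrently with it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define NMI_BUF_SIZE 128        /* hypothetical capacity */

/* Hypothetical model of a buffer written from an NMI-like context. */
struct nmi_buf_model {
	atomic_size_t len;          /* bytes currently reserved */
	char data[NMI_BUF_SIZE];
};

/*
 * "Unsafe context" side: reserve space with a compare-and-swap loop and
 * copy the bytes in. Takes no locks and does no console I/O.
 * Returns the number of bytes stored, or 0 if the message did not fit.
 */
static size_t nmi_buf_write(struct nmi_buf_model *b, const char *msg)
{
	size_t n = strlen(msg);
	size_t old = atomic_load(&b->len);

	do {
		if (old + n > NMI_BUF_SIZE)
			return 0;           /* buffer full: drop the message */
	} while (!atomic_compare_exchange_weak(&b->len, &old, old + n));

	memcpy(b->data + old, msg, n);
	return n;
}

/*
 * "Safe context" side: copy the buffered text out as a NUL-terminated
 * string and reset the buffer. Assumes no concurrent writer (model only).
 * Returns the number of bytes copied.
 */
static size_t nmi_buf_flush(struct nmi_buf_model *b, char *out, size_t out_size)
{
	size_t n;

	assert(out != NULL && out_size > 0);

	n = atomic_load(&b->len);
	if (n >= out_size)
		n = out_size - 1;
	memcpy(out, b->data, n);
	out[n] = '\0';
	atomic_store(&b->len, 0);
	return n;
}
```

In this model the writer never touches the console; whether the real `pr_err()` path in the #MC handler behaves comparably on current kernels is exactly the question raised in this thread.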
Luck, Tony Feb. 14, 2025, 4:57 p.m. UTC | #3
> > This is called from the #MC handler. Is that a safe context to print a console
> > message? It wasn't in the past, but maybe changes to how console messages
> > are handled have changed this.
> >
> > -Tony
>
> #MC is a kind of NMI context. As far as I know, since
>
> commit 42a0bb3f71383b457a7db362f1c69e7afb96732b
> printk/nmi: generic solution for safe printk in NMI
>
> printing a console message from that context is safe.
>
> Please correct me if I missed anything.

Wow, that's v4.7 (ancient history). I thought I'd had issues with debug
messages in the machine check handler more recently than that, but
perhaps I'm misremembering.

-Tony

Patch

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2919a077cd66..c1319db45b0a 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1456,6 +1456,8 @@  static void queue_task_work(struct mce_hw_err *err, char *msg, void (*func)(stru
 	if (count > 1)
 		return;
 
+	if (msg)
+		pr_err("%s\n", msg);
 	task_work_add(current, &current->mce_kill_me, TWA_RESUME);
 }
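
For readers following the control flow, here is a minimal userspace sketch of what the hunk above changes. It is a model under stated assumptions, not the kernel function: `struct task_model`, `queue_task_work_model`, and `log_buf` are hypothetical, `snprintf()` into `log_buf` stands in for `pr_err()`, a counter stands in for `task_work_add()`, and the way `count` is derived from a per-task counter is assumed, since the hunk only shows the `count > 1` check.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the per-task state; not the kernel's task_struct. */
struct task_model {
	int mce_count;      /* assumed: #MCs seen before the queued work runs */
	int work_queued;    /* how many times the kill work was queued */
};

/* Captures what pr_err() would have printed in the real code. */
static char log_buf[256];

static void queue_task_work_model(struct task_model *t, const char *msg)
{
	int count;

	assert(t != NULL);

	/* Assumption: each call bumps a per-task counter. */
	count = ++t->mce_count;

	/* Work is already queued for an earlier #MC: nothing more to do. */
	if (count > 1)
		return;

	/* The two added lines: print the severity message only if there is one. */
	if (msg)
		snprintf(log_buf, sizeof(log_buf), "%s\n", msg);

	/* Stand-in for task_work_add(current, &current->mce_kill_me, TWA_RESUME). */
	t->work_queued++;
}
```

Because the print sits after the `count > 1` early return, the message is emitted at most once per queued work item in this model, and a NULL `msg` prints nothing.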