x86/mce: Schedule mce_setup() on correct CPU for CPER decoding

Message ID 20230417162006.3292715-1-yazen.ghannam@amd.com (mailing list archive)
State New, archived

Commit Message

Yazen Ghannam April 17, 2023, 4:20 p.m. UTC
Scalable MCA systems may report errors found during boot-time polling
through the ACPI Boot Error Record Table (BERT). The errors are logged
in an "x86 Processor" Common Platform Error Record (CPER). The format of
the x86 CPER does not include a logical CPU number, but it does provide
the logical APIC ID for the logical CPU. Also, it does not explicitly
provide MCA error information, but it can share this information using
an "MSR Context" defined in the CPER format.

The MCA error information is parsed by
1) Checking that the context matches the Scalable MCA register space.
2) Finding the logical CPU that matches the logical APIC ID from the
   CPER.
3) Filling in struct mce with the relevant data and logging it.

All the above is done when the BERT is processed during late init. This
can be scheduled on any CPU and may run in a preemptible context.

This results in two issues.
1) mce_setup() includes a call to smp_processor_id(). This will throw a
   warning if called from a preemptible context.
2) mce_setup() pulls info from the executing CPU, so some info in
   struct mce may be incorrect for the CPU with the error. For example,
   in a dual-socket system, an error logged by a CPU in socket 1 but
   processed by a CPU in socket 0 will save the PPIN of the socket 0 CPU.

Fix both issues by scheduling mce_setup() to run on the logical CPU
indicated in the error record. Preemption is disabled while
smp_call_function_*() runs the callback, resolving issue #1, and the
error info is gathered from the proper logical CPU, resolving issue #2.

Furthermore, smp_call_function_*() already rejects invalid CPU numbers,
so extra checking by the caller is not necessary.
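
Condensed from the patch at the bottom of this thread, the fix boils down
to running mce_setup() through a small wrapper on the matched CPU (a
sketch only; the surrounding error-record parsing is unchanged):

static void __mce_setup(void *info)
{
	mce_setup((struct mce *)info);
}

	/* ... then, in apei_smca_report_x86_error(): */

	/* Find the logical CPU whose initial APIC ID matches the record. */
	for_each_possible_cpu(cpu)
		if (cpu_data(cpu).initial_apicid == lapic_id)
			break;

	/*
	 * Fill in struct mce on that CPU; preemption is disabled there,
	 * and an invalid CPU number makes the call fail cleanly.
	 */
	if (smp_call_function_single(cpu, __mce_setup, &m, 1))
		return -EINVAL;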

Fixes: 4a24d80b8c3e ("x86/mce, cper: Pass x86 CPER through the MCA handling chain")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: stable@vger.kernel.org
---
 arch/x86/kernel/cpu/mce/apei.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

Comments

Tony Luck April 17, 2023, 5:17 p.m. UTC | #1
> int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)

I'm a bit puzzled why this function has "smca" in the name. I think it is called
unconditionally on Intel as well as AMD systems ... and looks like it will do useful
things if Intel someday has a hybrid server that reports APEI errors (maybe it is
already useful on hybrid clients?).

Otherwise this looks fine.

Acked-by: Tony Luck <tony.luck@intel.com>

-Tony
Yazen Ghannam April 17, 2023, 5:28 p.m. UTC | #2
On 4/17/23 13:17, Luck, Tony wrote:
>> int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
> 
> I'm a bit puzzled why this function has "smca" in the name. I think it is called
> unconditionally on Intel as well as AMD systems ... and looks like it will do useful
> things if Intel someday has a hybrid server that reports APEI errors (maybe it is
> already useful on hybrid clients?).
> 
> Otherwise this looks fine.
> 
> Acked-by: Tony Luck <tony.luck@intel.com>
> 

Thanks Tony.

The function expects the data to match the "MCAX" register layout used on
Scalable MCA systems. CTL, STATUS, ADDR, and MISC will be at the same offsets
for legacy MCA and MCAX. But the rest will be different.

This could be extended though, if other systems will use it.

Thanks,
Yazen
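
For reference, a sketch of the layout difference described above, based on
the MSR definitions in arch/x86/include/asm/msr-index.h (the helper below
is illustrative only, not existing kernel code):

/*
 * Legacy MCA: 4 MSRs per bank, starting at IA32_MC0_CTL (0x400):
 *   +0 CTL, +1 STATUS, +2 ADDR, +3 MISC
 *
 * SMCA/MCAX: 16 MSRs per bank, starting at MSR_AMD64_SMCA_MC0_CTL (0xC0002000):
 *   +0 CTL, +1 STATUS, +2 ADDR, +3 MISC0     (same order as legacy MCA)
 *   +4 CONFIG, +5 IPID, +6 SYND, ...         (SMCA-only registers)
 *
 * The 16-register stride is also why the patch derives the bank number
 * with "(ctx_info->msr_addr >> 4) & 0xFF".
 */
static inline unsigned int smca_msr_to_bank(u32 msr_addr)
{
	return (msr_addr - 0xc0002000) >> 4;	/* MSR_AMD64_SMCA_MC0_CTL */
}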
Tony Luck April 17, 2023, 5:39 p.m. UTC | #3
> The function expects the data to match the "MCAX" register layout used on
> Scalable MCA systems. CTL, STATUS, ADDR, and MISC will be at the same offsets
> for legacy MCA and MCAX. But the rest will be different.

Ah yes. Looking at the code, rather than the patch I see that it first checks for SMCA
and returns if it isn't on an SMCA system.

        if (!boot_cpu_has(X86_FEATURE_SMCA))
                return -EINVAL;

> This could be extended though, if other systems will use it.

I'll keep it in mind if Intel makes a hybrid server.

-Tony
Yazen Ghannam June 9, 2023, 2:26 p.m. UTC | #4
On 4/17/23 1:39 PM, Luck, Tony wrote:
>> The function expects the data to match the "MCAX" register layout used on
>> Scalable MCA systems. CTL, STATUS, ADDR, and MISC will be at the same offsets
>> for legacy MCA and MCAX. But the rest will be different.
> 
> Ah yes. Looking at the code, rather than the patch I see that it first checks for SMCA
> and returns if it isn't on an SMCA system.
> 
>         if (!boot_cpu_has(X86_FEATURE_SMCA))
>                 return -EINVAL;
> 
>> This could be extended though, if other systems will use it.
> 
> I'll keep it in mind if Intel makes a hybrid server.
>

Hi all,

Any further comments on this?

Thanks,
Yazen
Borislav Petkov June 15, 2023, 3:20 p.m. UTC | #5
On Mon, Apr 17, 2023 at 04:20:06PM +0000, Yazen Ghannam wrote:
> @@ -97,20 +102,13 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
>  	if (ctx_info->reg_arr_size < 48)
>  		return -EINVAL;
>  
> -	mce_setup(&m);
> -
> -	m.extcpu = -1;
> -	m.socketid = -1;
> -
> -	for_each_possible_cpu(cpu) {
> -		if (cpu_data(cpu).initial_apicid == lapic_id) {
> -			m.extcpu = cpu;
> -			m.socketid = cpu_data(m.extcpu).phys_proc_id;
> +	for_each_possible_cpu(cpu)
> +		if (cpu_data(cpu).initial_apicid == lapic_id)
>  			break;
> -		}
> -	}
>  
> -	m.apicid = lapic_id;
> +	if (smp_call_function_single(cpu, __mce_setup, &m, 1))

I can see the following call-chain from NMI context which is a no-no:

ghes_notify_nmi
|-> ghes_in_nmi_spool_from_list
    |-> ghes_in_nmi_queue_one_entry
        |-> __ghes_panic
	    |-> __ghes_print_estatus
	        |-> cper_estatus_print
		    |-> cper_estatus_print_section
		        |-> cper_print_proc_ia
			    |-> arch_apei_report_x86_error
			        |-> apei_smca_report_x86_error
				    |-> smp_call_function_single
Yazen Ghannam June 15, 2023, 3:34 p.m. UTC | #6
On 6/15/2023 11:20 AM, Borislav Petkov wrote:
> On Mon, Apr 17, 2023 at 04:20:06PM +0000, Yazen Ghannam wrote:
>> @@ -97,20 +102,13 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
>>   	if (ctx_info->reg_arr_size < 48)
>>   		return -EINVAL;
>>   
>> -	mce_setup(&m);
>> -
>> -	m.extcpu = -1;
>> -	m.socketid = -1;
>> -
>> -	for_each_possible_cpu(cpu) {
>> -		if (cpu_data(cpu).initial_apicid == lapic_id) {
>> -			m.extcpu = cpu;
>> -			m.socketid = cpu_data(m.extcpu).phys_proc_id;
>> +	for_each_possible_cpu(cpu)
>> +		if (cpu_data(cpu).initial_apicid == lapic_id)
>>   			break;
>> -		}
>> -	}
>>   
>> -	m.apicid = lapic_id;
>> +	if (smp_call_function_single(cpu, __mce_setup, &m, 1))
> 
> I can see the following call-chain from NMI context which is a no-no:
> 
> ghes_notify_nmi
> |-> ghes_in_nmi_spool_from_list
>      |-> ghes_in_nmi_queue_one_entry
>          |-> __ghes_panic
> 	    |-> __ghes_print_estatus
> 	        |-> cper_estatus_print
> 		    |-> cper_estatus_print_section
> 		        |-> cper_print_proc_ia
> 			    |-> arch_apei_report_x86_error
> 			        |-> apei_smca_report_x86_error
> 				    |-> smp_call_function_single
> 
> 

Right, but in practice SMCA errors are not reported through GHES at 
runtime. They will only come in through BERT at boot time. There aren't 
any plans to change this, so the NMI issue won't be encountered.

I can include this info in the commit message and/or code comments. Is 
this okay?

We can solve the NMI issue if it ever comes up in the future. Unless 
there's an obvious change to avoid this now. Any suggestions?

Thanks,
Yazen
Borislav Petkov June 15, 2023, 4:20 p.m. UTC | #7
On Thu, Jun 15, 2023 at 11:34:21AM -0400, Yazen Ghannam wrote:
> We can solve the NMI issue if it ever comes up in the future. Unless there's
> an obvious change to avoid this now. Any suggestions?

Yes, solve it right from the get-go. "It cannot happen now" is not good
enough. It should not be even technically possible.

Just report what's logged into BERT - nothing more. Whoever needs the
remaining info, can dump it from the machine.

Thx.
Yazen Ghannam June 15, 2023, 5:02 p.m. UTC | #8
On 6/15/2023 12:20 PM, Borislav Petkov wrote:
> On Thu, Jun 15, 2023 at 11:34:21AM -0400, Yazen Ghannam wrote:
>> We can solve the NMI issue if it ever comes up in the future. Unless there's
>> an obvious change to avoid this now. Any suggestions?
> 
> Yes, solve it right from the get-go. "It cannot happen now" is not good
> enough. It should not be even technically possible.
>

Okay, understood.

> Just report what's logged into BERT - nothing more. Whoever needs the
> remaining info, can dump it from the machine.
> 

Will do.

Thanks,
Yazen
Yazen Ghannam June 15, 2023, 5:39 p.m. UTC | #9
On 6/15/2023 1:02 PM, Yazen Ghannam wrote:
> On 6/15/2023 12:20 PM, Borislav Petkov wrote:
>> On Thu, Jun 15, 2023 at 11:34:21AM -0400, Yazen Ghannam wrote:
>>> We can solve the NMI issue if it ever comes up in the future. Unless 
>>> there's
>>> an obvious change to avoid this now. Any suggestions?
>>
>> Yes, solve it right from the get-go. "It cannot happen now" is not good
>> enough. It should not be even technically possible.
>>
> 
> Okay, understood.
> 
>> Just report what's logged into BERT - nothing more. Whoever needs the
>> remaining info, can dump it from the machine.
>>
> 
> Will do.
> 

How about these changes? I can split this into separate preemption and 
PPIN patches.

The PPIN isn't coming directly from BERT. But it seems we can use the 
per_cpu value which is set up during CPU init.

Thanks,
Yazen

diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 8ed341714686..db16dc3c7b03 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -97,15 +97,19 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
         if (ctx_info->reg_arr_size < 48)
                 return -EINVAL;

+       get_cpu();
         mce_setup(&m);
+       put_cpu();

         m.extcpu = -1;
         m.socketid = -1;
+       m.ppin = 0;

         for_each_possible_cpu(cpu) {
                 if (cpu_data(cpu).initial_apicid == lapic_id) {
                         m.extcpu = cpu;
                         m.socketid = cpu_data(m.extcpu).phys_proc_id;
+                       m.ppin = cpu_data(m.extcpu).ppin;
                         break;
                 }
         }
Tony Luck June 15, 2023, 5:54 p.m. UTC | #10
>         for_each_possible_cpu(cpu) {
>                 if (cpu_data(cpu).initial_apicid == lapic_id) {
>                         m.extcpu = cpu;

Are there any other places in the kernel where we need to find the
Linux CPU number starting from the apic id?

I briefly looked a while back when I needed this for some
debug use case and didn't find it then. But Linux changes
and maybe there are some now.

Which is a long way to say I'd like to see boot code initialize an array so the above
could just be:

	m.extcpu = lapic_to_cpu[lapic_id];

But we'd need at least a couple of use cases to justify.

-Tony
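
A minimal sketch of such a reverse map, assuming it is filled once at boot
after CPU enumeration (lapic_to_cpu and build_lapic_to_cpu_map() are
hypothetical names, not existing kernel code):

static int lapic_to_cpu[MAX_LOCAL_APIC] = { [0 ... MAX_LOCAL_APIC - 1] = -1 };

static int __init build_lapic_to_cpu_map(void)
{
	unsigned int cpu;

	for_each_possible_cpu(cpu) {
		u32 apicid = cpu_data(cpu).initial_apicid;

		if (apicid < MAX_LOCAL_APIC)
			lapic_to_cpu[apicid] = cpu;
	}

	return 0;
}
arch_initcall(build_lapic_to_cpu_map);

apei_smca_report_x86_error() could then replace its for_each_possible_cpu()
scan with a bounds check plus "m.extcpu = lapic_to_cpu[lapic_id];".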
Yazen Ghannam June 16, 2023, 2:16 p.m. UTC | #11
On 6/15/2023 1:54 PM, Luck, Tony wrote:
>>          for_each_possible_cpu(cpu) {
>>                  if (cpu_data(cpu).initial_apicid == lapic_id) {
>>                          m.extcpu = cpu;
> 
> Are there any other places in the kernel where we need to find the
> Linux CPU number starting from the apic id?
> 
> I briefly looked a while back when I needed this for some
> debug use case and didn't find it then. But Linux changes
> and maybe there are some now.
> 
> Which is a long way to say I'd like to see boot code initialize an array so the above
> could just be:
> 
> 	m.extcpu = lapic_to_cpu[lapic_id];
>

So functionally the inverse of "x86_cpu_to_apicid", right? Though not 
per_cpu, of course.

I'll try and write this up if I don't find anything like it.

> But we'd need at least a couple of use cases to justify.
> 

Can you describe your original use case?

Thanks,
Yazen
Tony Luck June 16, 2023, 4:05 p.m. UTC | #12
> Can you describe your original use case?

<vague_mode>
I was playing with a potential future feature where I was given the x2apicid and
had to find the Linux CPU number.
</vague_mode>

-Tony

Patch

diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 8ed341714686..5c0381a4a66f 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -63,6 +63,11 @@  void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
 }
 EXPORT_SYMBOL_GPL(apei_mce_report_mem_error);
 
+static void __mce_setup(void *info)
+{
+	mce_setup((struct mce *)info);
+}
+
 int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 {
 	const u64 *i_mce = ((const u64 *) (ctx_info + 1));
@@ -97,20 +102,13 @@  int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 	if (ctx_info->reg_arr_size < 48)
 		return -EINVAL;
 
-	mce_setup(&m);
-
-	m.extcpu = -1;
-	m.socketid = -1;
-
-	for_each_possible_cpu(cpu) {
-		if (cpu_data(cpu).initial_apicid == lapic_id) {
-			m.extcpu = cpu;
-			m.socketid = cpu_data(m.extcpu).phys_proc_id;
+	for_each_possible_cpu(cpu)
+		if (cpu_data(cpu).initial_apicid == lapic_id)
 			break;
-		}
-	}
 
-	m.apicid = lapic_id;
+	if (smp_call_function_single(cpu, __mce_setup, &m, 1))
+		return -EINVAL;
+
 	m.bank = (ctx_info->msr_addr >> 4) & 0xFF;
 	m.status = *i_mce;
 	m.addr = *(i_mce + 1);