Message ID | 20230417162006.3292715-1-yazen.ghannam@amd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86/mce: Schedule mce_setup() on correct CPU for CPER decoding | expand |
> int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id) I'm a bit puzzled why this function has "smca" in the name. I think it is called unconditionally on Intel as well as AMD systems ... and looks like it will do useful things if Intel someday has a hybrid server that reports APEI errors (maybe it is already useful on hybrid clients?). Otherwise this look looks fine. Acked-by: Tony Luck <tony.luck@intel.com> -Tony
On 4/17/23 13:17, Luck, Tony wrote: >> int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id) > > I'm a bit puzzled why this function has "smca" in the name. I think it is called > unconditionally on Intel as well as AMD systems ... and looks like it will do useful > things if Intel someday has a hybrid server that reports APEI errors (maybe it is > already useful on hybrid clients?). > > Otherwise this look looks fine. > > Acked-by: Tony Luck <tony.luck@intel.com> > Thanks Tony. The function expects the data to match the "MCAX" register layout used on Scalable MCA systems. CTL, STATUS, ADDR, and MISC will be at the same offsets for legacy MCA and MCAX. But the rest will be different. This could be extended though, if other systems will use it. Thanks, Yazen
> The function expects the data to match the "MCAX" register layout used on > Scalable MCA systems. CTL, STATUS, ADDR, and MISC will be at the same offsets > for legacy MCA and MCAX. But the rest will be different. Ah yes. Looking at the code, rather than the patch I see that it first checks for SMCA and returns if it isn't on an SMCA system. if (!boot_cpu_has(X86_FEATURE_SMCA)) return -EINVAL; > This could be extended though, if other systems will use it. I'll keep it in mind if Intel makes a hybrid server. -Tony
On 4/17/23 1:39 PM, Luck, Tony wrote: >> The function expects the data to match the "MCAX" register layout used on >> Scalable MCA systems. CTL, STATUS, ADDR, and MISC will be at the same offsets >> for legacy MCA and MCAX. But the rest will be different. > > Ah yes. Looking at the code, rather than the patch I see that it first checks for SMCA > and returns if it isn't on an SMCA system. > > if (!boot_cpu_has(X86_FEATURE_SMCA)) > return -EINVAL; > >> This could be extended though, if other systems will use it. > > I'll keep it in mind if Intel makes a hybrid server. > Hi all, Any further comments on this? Thanks, Yazen
On Mon, Apr 17, 2023 at 04:20:06PM +0000, Yazen Ghannam wrote: > @@ -97,20 +102,13 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id) > if (ctx_info->reg_arr_size < 48) > return -EINVAL; > > - mce_setup(&m); > - > - m.extcpu = -1; > - m.socketid = -1; > - > - for_each_possible_cpu(cpu) { > - if (cpu_data(cpu).initial_apicid == lapic_id) { > - m.extcpu = cpu; > - m.socketid = cpu_data(m.extcpu).phys_proc_id; > + for_each_possible_cpu(cpu) > + if (cpu_data(cpu).initial_apicid == lapic_id) > break; > - } > - } > > - m.apicid = lapic_id; > + if (smp_call_function_single(cpu, __mce_setup, &m, 1)) I can see the following call-chain from NMI context which is a no-no: ghes_notify_nmi |-> ghes_in_nmi_spool_from_list |-> ghes_in_nmi_queue_one_entry |-> __ghes_panic |-> __ghes_print_estatus |-> cper_estatus_print |-> cper_estatus_print_section |-> cper_print_proc_ia |-> arch_apei_report_x86_error |-> apei_smca_report_x86_error |-> smp_call_function_single
On 6/15/2023 11:20 AM, Borislav Petkov wrote: > On Mon, Apr 17, 2023 at 04:20:06PM +0000, Yazen Ghannam wrote: >> @@ -97,20 +102,13 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id) >> if (ctx_info->reg_arr_size < 48) >> return -EINVAL; >> >> - mce_setup(&m); >> - >> - m.extcpu = -1; >> - m.socketid = -1; >> - >> - for_each_possible_cpu(cpu) { >> - if (cpu_data(cpu).initial_apicid == lapic_id) { >> - m.extcpu = cpu; >> - m.socketid = cpu_data(m.extcpu).phys_proc_id; >> + for_each_possible_cpu(cpu) >> + if (cpu_data(cpu).initial_apicid == lapic_id) >> break; >> - } >> - } >> >> - m.apicid = lapic_id; >> + if (smp_call_function_single(cpu, __mce_setup, &m, 1)) > > I can see the following call-chain from NMI context which is a no-no: > > ghes_notify_nmi > |-> ghes_in_nmi_spool_from_list > |-> ghes_in_nmi_queue_one_entry > |-> __ghes_panic > |-> __ghes_print_estatus > |-> cper_estatus_print > |-> cper_estatus_print_section > |-> cper_print_proc_ia > |-> arch_apei_report_x86_error > |-> apei_smca_report_x86_error > |-> smp_call_function_single > > Right, but in practice SMCA errors are not reported through GHES at runtime. They will only come in through BERT at boot time. There aren't any plans to change this, so the NMI issue won't be encountered. I can include this info in the commit message and/or code comments. Is this okay? We can solve the NMI issue if it ever comes up in the future. Unless there's an obvious change to avoid this now. Any suggestions? Thanks, Yazen
On Thu, Jun 15, 2023 at 11:34:21AM -0400, Yazen Ghannam wrote: > We can solve the NMI issue if it ever comes up in the future. Unless there's > an obvious change to avoid this now. Any suggestions? Yes, solve it right from the get-go. "It cannot happen now" is not good enough. It should not be even technically possible. Just report what's logged into BERT - nothing more. Whoever needs the remaining info, can dump it from the machine. Thx.
On 6/15/2023 12:20 PM, Borislav Petkov wrote: > On Thu, Jun 15, 2023 at 11:34:21AM -0400, Yazen Ghannam wrote: >> We can solve the NMI issue if it ever comes up in the future. Unless there's >> an obvious change to avoid this now. Any suggestions? > > Yes, solve it right from the get-go. "It cannot happen now" is not good > enough. It should not be even technically possible. > Okay, understood. > Just report what's logged into BERT - nothing more. Whoever needs the > remaining info, can dump it from the machine. > Will do. Thanks, Yazen
On 6/15/2023 1:02 PM, Yazen Ghannam wrote: > On 6/15/2023 12:20 PM, Borislav Petkov wrote: >> On Thu, Jun 15, 2023 at 11:34:21AM -0400, Yazen Ghannam wrote: >>> We can solve the NMI issue if it ever comes up in the future. Unless >>> there's >>> an obvious change to avoid this now. Any suggestions? >> >> Yes, solve it right from the get-go. "It cannot happen now" is not good >> enough. It should not be even technically possible. >> > > Okay, understood. > >> Just report what's logged into BERT - nothing more. Whoever needs the >> remaining info, can dump it from the machine. >> > > Will do. > How about these changes? I can split this into separate preemption and PPIN patches. The PPIN isn't coming directly from BERT. But it seems we can use the per_cpu value which is set up during CPU init. Thanks, Yazen diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c index 8ed341714686..db16dc3c7b03 100644 --- a/arch/x86/kernel/cpu/mce/apei.c +++ b/arch/x86/kernel/cpu/mce/apei.c @@ -97,15 +97,19 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id) if (ctx_info->reg_arr_size < 48) return -EINVAL; + get_cpu(); mce_setup(&m); + put_cpu(); m.extcpu = -1; m.socketid = -1; + m.ppin = 0; for_each_possible_cpu(cpu) { if (cpu_data(cpu).initial_apicid == lapic_id) { m.extcpu = cpu; m.socketid = cpu_data(m.extcpu).phys_proc_id; + m.ppin = cpu_data(m.extcpu).ppin; break; } }
> for_each_possible_cpu(cpu) { > if (cpu_data(cpu).initial_apicid == lapic_id) { > m.extcpu = cpu; Are there any other places in the kernel where we need to find the Linux CPU number starting from the apic id? I briefly looked a while back when I needed this for some debug use case and didn't find it then. But Linux changes and maybe there are some now. Which is a long way to say I'd like to see boot code initialize an array so the above could just be: m.extcpu = lapic_to_cpu[lapic_id]; But we'd need at least a couple of use cases to justify. -Tony
On 6/15/2023 1:54 PM, Luck, Tony wrote: >> for_each_possible_cpu(cpu) { >> if (cpu_data(cpu).initial_apicid == lapic_id) { >> m.extcpu = cpu; > > Are there any other places in the kernel where we need to find the > Linux CPU number starting from the apic id? > > I briefly looked a while back when I needed this for some > debug use case and didn't find it then. But Linux changes > and maybe there are some now. > > Which is a long way to say I'd like to see boot code initialize an array so the above > could just be: > > m.extcpu = lapic_to_cpu[lapic_id]; > So functionally the inverse of "x86_cpu_to_apicid", right? Though not per_cpu, of course. I'll try and write this up if I don't find anything like it. > But we'd need at least a couple of use cases to justify. > Can you describe your original use case? Thanks, Yazen
> Can you describe your original use case?
<vague_mode>
I was playing with a potential future feature where I was given the x2apicid and
had to find the Linux CPU number.
</vaguemode>
-Tony
diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c index 8ed341714686..5c0381a4a66f 100644 --- a/arch/x86/kernel/cpu/mce/apei.c +++ b/arch/x86/kernel/cpu/mce/apei.c @@ -63,6 +63,11 @@ void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err) } EXPORT_SYMBOL_GPL(apei_mce_report_mem_error); +static void __mce_setup(void *info) +{ + mce_setup((struct mce *)info); +} + int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id) { const u64 *i_mce = ((const u64 *) (ctx_info + 1)); @@ -97,20 +102,13 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id) if (ctx_info->reg_arr_size < 48) return -EINVAL; - mce_setup(&m); - - m.extcpu = -1; - m.socketid = -1; - - for_each_possible_cpu(cpu) { - if (cpu_data(cpu).initial_apicid == lapic_id) { - m.extcpu = cpu; - m.socketid = cpu_data(m.extcpu).phys_proc_id; + for_each_possible_cpu(cpu) + if (cpu_data(cpu).initial_apicid == lapic_id) break; - } - } - m.apicid = lapic_id; + if (smp_call_function_single(cpu, __mce_setup, &m, 1)) + return -EINVAL; + m.bank = (ctx_info->msr_addr >> 4) & 0xFF; m.status = *i_mce; m.addr = *(i_mce + 1);
Scalable MCA systems may report errors found during boot-time polling through the ACPI Boot Error Record Table (BERT). The errors are logged in an "x86 Processor" Common Platform Error Record (CPER). The format of the x86 CPER does not include a logical CPU number, but it does provide the logical APIC ID for the logical CPU. Also, it does not explicitly provide MCA error information, but it can share this information using an "MSR Context" defined in the CPER format. The MCA error information is parsed by 1) Checking that the context matches the Scalable MCA register space. 2) Finding the logical CPU that matches the logical APIC ID from the CPER. 3) Filling in struct mce with the relevant data and logging it. All the above is done when the BERT is processed during late init. This can be scheduled on any CPU, and it may be preemptible. This results in two issues. 1) mce_setup() includes a call to smp_processor_id(). This will throw a warning if preemption is enabled. 2) mce_setup() will pull info from the executing CPU, so some info in struct mce may be incorrect for the CPU with the error. For example, in a dual-socket system, an error logged in socket 1 CPU but processed by a socket 0 CPU will save the PPIN of the socket 0 CPU. Fix both issues by scheduling mce_setup() to run on the logical CPU indicated in the error record. Preemption is disabled when calling smp_call_function_*() resolving issue #1. And the error info is gathered from the proper logical CPU resolving issue #2. Furthermore, smp_call_function_*() handles calls with invalid CPU numbers, etc. So extra checking by the caller is not necessary. Fixes: 4a24d80b8c3e ("x86/mce, cper: Pass x86 CPER through the MCA handling chain") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: stable@vger.kernel.org --- arch/x86/kernel/cpu/mce/apei.c | 22 ++++++++++------------ 1 file changed, 10 insertions(+), 12 deletions(-)