Message ID | 20240625195624.2565741-4-avadhut.naik@amd.com (mailing list archive) |
---|---|
State | Handled Elsewhere, archived |
Series | MCE wrapper and support for new SMCA syndrome MSRs |
On Tue, Jun 25, 2024 at 02:56:23PM -0500, Avadhut Naik wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
>
> ACPI Boot Error Record Table (BERT) is being used by the kernel to
> report errors that occurred in a previous boot. On some modern AMD
> systems, these very errors within the BERT are reported through the
> x86 Common Platform Error Record (CPER) format which consists of one
> or more Processor Context Information Structures. These context
> structures provide a starting address and represent an x86 MSR range
> in which the data constitutes a contiguous set of MSRs starting from,
> and including the starting address.
>
> It's common, for AMD systems that implement this behavior, that the
> MSR range represents the MCAX register space used for the Scalable MCA
> feature. The apei_smca_report_x86_error() function decodes and passes
> this information through the MCE notifier chain. However, this function
> assumes a fixed register size based on the original HW/FW implementation.
>
> This assumption breaks with the addition of two new MCAX registers viz.
> MCA_SYND1 and MCA_SYND2. These registers are added at the end of the
> MCAX register space, so they won't be included when decoding the CPER
> data.
>
> Rework apei_smca_report_x86_error() to support a variable register array
> size. This covers any case where the MSR context information starts at
> the MCAX address for MCA_STATUS and ends at any other register within
> the MCAX register space.
>
> Add code comments indicating the MCAX register at each offset.
>
> [Yazen: Add Avadhut as co-developer for wrapper changes.]
>
> Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>

This needs Avadhut's SOB after Yazen's.

Touchups ontop:

diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 7a15f0ca1bd1..6bbeb29125a9 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -69,7 +69,7 @@ EXPORT_SYMBOL_GPL(apei_mce_report_mem_error);
 int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 {
 	const u64 *i_mce = ((const u64 *) (ctx_info + 1));
-	unsigned int cpu, num_registers;
+	unsigned int cpu, num_regs;
 	struct mce_hw_err err;
 	struct mce *m = &err.m;

@@ -93,10 +93,10 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 	/*
 	 * The number of registers in the register array is determined by
 	 * Register Array Size/8 as defined in UEFI spec v2.8, sec N.2.4.2.2.
-	 * Ensure that the array size includes at least 1 register.
+	 * Sanity-check registers array size.
 	 */
-	num_registers = ctx_info->reg_arr_size >> 3;
-	if (!num_registers)
+	num_regs = ctx_info->reg_arr_size >> 3;
+	if (!num_regs)
 		return -EINVAL;

 	mce_setup(m);
@@ -118,13 +118,12 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 	/*
 	 * The SMCA register layout is fixed and includes 16 registers.
 	 * The end of the array may be variable, but the beginning is known.
-	 * Switch on the number of registers. Cap the number of registers to
-	 * expected max (15).
+	 * Cap the number of registers to expected max (15).
 	 */
-	if (num_regs > 15)
-		num_regs = 15;
+	if (num_regs > 15)
+		num_regs = 15;

-	switch (num_registers) {
+	switch (num_regs) {
 	/* MCA_SYND2 */
 	case 15:
 		err.vi.amd.synd2 = *(i_mce + 14);

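For readers following the size handling in the hunks above, here is a minimal user-space sketch of the two pieces of arithmetic involved: deriving the register count from the Register Array Size (bytes / 8, capped at 15) and extracting the bank number from the starting MSR address. The helper names smca_num_regs() and smca_bank(), the SMCA_MAX_REGS macro and the fixed-width parameter types are assumptions made for illustration; they are not part of the patch.

```c
#include <stdint.h>

/* Cap taken from the patch: at most 15 registers are decoded. */
#define SMCA_MAX_REGS 15u

/*
 * Hypothetical helper: number of 8-byte registers to decode.
 * Returns 0 when the array holds less than one full register,
 * which the patch treats as -EINVAL.
 */
static unsigned int smca_num_regs(uint16_t reg_arr_size)
{
	unsigned int num_regs = reg_arr_size >> 3;	/* size in bytes / 8 */

	if (num_regs > SMCA_MAX_REGS)
		num_regs = SMCA_MAX_REGS;

	return num_regs;
}

/*
 * Hypothetical helper: same expression the function uses for m->bank,
 * i.e. bits [11:4] of the starting MSR address.
 */
static unsigned int smca_bank(uint32_t msr_addr)
{
	return (msr_addr >> 4) & 0xFF;
}
```

With these, a 48-byte array (the old fixed minimum) yields 6 registers, and anything of 120 bytes or more is capped at 15. The MSR addresses used to exercise smca_bank() are illustrative values only.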
On 6/26/2024 06:57, Borislav Petkov wrote:
> On Tue, Jun 25, 2024 at 02:56:23PM -0500, Avadhut Naik wrote:
>> From: Yazen Ghannam <yazen.ghannam@amd.com>
>>
>> ACPI Boot Error Record Table (BERT) is being used by the kernel to
>> report errors that occurred in a previous boot. On some modern AMD
>> systems, these very errors within the BERT are reported through the
>> x86 Common Platform Error Record (CPER) format which consists of one
>> or more Processor Context Information Structures. These context
>> structures provide a starting address and represent an x86 MSR range
>> in which the data constitutes a contiguous set of MSRs starting from,
>> and including the starting address.
>>
>> It's common, for AMD systems that implement this behavior, that the
>> MSR range represents the MCAX register space used for the Scalable MCA
>> feature. The apei_smca_report_x86_error() function decodes and passes
>> this information through the MCE notifier chain. However, this function
>> assumes a fixed register size based on the original HW/FW implementation.
>>
>> This assumption breaks with the addition of two new MCAX registers viz.
>> MCA_SYND1 and MCA_SYND2. These registers are added at the end of the
>> MCAX register space, so they won't be included when decoding the CPER
>> data.
>>
>> Rework apei_smca_report_x86_error() to support a variable register array
>> size. This covers any case where the MSR context information starts at
>> the MCAX address for MCA_STATUS and ends at any other register within
>> the MCAX register space.
>>
>> Add code comments indicating the MCAX register at each offset.
>>
>> [Yazen: Add Avadhut as co-developer for wrapper changes.]
>>
>> Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
>
> This needs Avadhut's SOB after Yazen's.
>
Will do.
Will change to the below format:

	Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
	Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
	Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>

> Touchups ontop:
>
> diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
> index 7a15f0ca1bd1..6bbeb29125a9 100644
> --- a/arch/x86/kernel/cpu/mce/apei.c
> +++ b/arch/x86/kernel/cpu/mce/apei.c
> @@ -69,7 +69,7 @@ EXPORT_SYMBOL_GPL(apei_mce_report_mem_error);
>  int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
>  {
>  	const u64 *i_mce = ((const u64 *) (ctx_info + 1));
> -	unsigned int cpu, num_registers;
> +	unsigned int cpu, num_regs;
>  	struct mce_hw_err err;
>  	struct mce *m = &err.m;
>
> @@ -93,10 +93,10 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
>  	/*
>  	 * The number of registers in the register array is determined by
>  	 * Register Array Size/8 as defined in UEFI spec v2.8, sec N.2.4.2.2.
> -	 * Ensure that the array size includes at least 1 register.
> +	 * Sanity-check registers array size.
>  	 */
> -	num_registers = ctx_info->reg_arr_size >> 3;
> -	if (!num_registers)
> +	num_regs = ctx_info->reg_arr_size >> 3;
> +	if (!num_regs)
>  		return -EINVAL;
>
>  	mce_setup(m);
> @@ -118,13 +118,12 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
>  	/*
>  	 * The SMCA register layout is fixed and includes 16 registers.
>  	 * The end of the array may be variable, but the beginning is known.
> -	 * Switch on the number of registers. Cap the number of registers to
> -	 * expected max (15).
> +	 * Cap the number of registers to expected max (15).
>  	 */
> -	if (num_registers > 15)
> -		num_registers = 15;
> +	if (num_regs > 15)
> +		num_regs = 15;
>
> -	switch (num_registers) {
> +	switch (num_regs) {
>  	/* MCA_SYND2 */
>  	case 15:
>  		err.vi.amd.synd2 = *(i_mce + 14);
>

Will incorporate these.

diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index b8f4e75fb8a7..7a15f0ca1bd1 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -69,9 +69,9 @@ EXPORT_SYMBOL_GPL(apei_mce_report_mem_error);
 int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 {
 	const u64 *i_mce = ((const u64 *) (ctx_info + 1));
+	unsigned int cpu, num_registers;
 	struct mce_hw_err err;
 	struct mce *m = &err.m;
-	unsigned int cpu;

 	memset(&err, 0, sizeof(struct mce_hw_err));

@@ -91,16 +91,12 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 		return -EINVAL;

 	/*
-	 * The register array size must be large enough to include all the
-	 * SMCA registers which need to be extracted.
-	 *
 	 * The number of registers in the register array is determined by
 	 * Register Array Size/8 as defined in UEFI spec v2.8, sec N.2.4.2.2.
-	 * The register layout is fixed and currently the raw data in the
-	 * register array includes 6 SMCA registers which the kernel can
-	 * extract.
+	 * Ensure that the array size includes at least 1 register.
 	 */
-	if (ctx_info->reg_arr_size < 48)
+	num_registers = ctx_info->reg_arr_size >> 3;
+	if (!num_registers)
 		return -EINVAL;

 	mce_setup(m);
@@ -118,12 +114,61 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 	m->apicid = lapic_id;

 	m->bank = (ctx_info->msr_addr >> 4) & 0xFF;
-	m->status = *i_mce;
-	m->addr = *(i_mce + 1);
-	m->misc = *(i_mce + 2);
-	/* Skipping MCA_CONFIG */
-	m->ipid = *(i_mce + 4);
-	m->synd = *(i_mce + 5);
+
+	/*
+	 * The SMCA register layout is fixed and includes 16 registers.
+	 * The end of the array may be variable, but the beginning is known.
+	 * Switch on the number of registers. Cap the number of registers to
+	 * expected max (15).
+	 */
+	if (num_registers > 15)
+		num_registers = 15;
+
+	switch (num_registers) {
+	/* MCA_SYND2 */
+	case 15:
+		err.vi.amd.synd2 = *(i_mce + 14);
+		fallthrough;
+	/* MCA_SYND1 */
+	case 14:
+		err.vi.amd.synd1 = *(i_mce + 13);
+		fallthrough;
+	/* MCA_MISC4 */
+	case 13:
+	/* MCA_MISC3 */
+	case 12:
+	/* MCA_MISC2 */
+	case 11:
+	/* MCA_MISC1 */
+	case 10:
+	/* MCA_DEADDR */
+	case 9:
+	/* MCA_DESTAT */
+	case 8:
+	/* reserved */
+	case 7:
+	/* MCA_SYND */
+	case 6:
+		m->synd = *(i_mce + 5);
+		fallthrough;
+	/* MCA_IPID */
+	case 5:
+		m->ipid = *(i_mce + 4);
+		fallthrough;
+	/* MCA_CONFIG */
+	case 4:
+	/* MCA_MISC0 */
+	case 3:
+		m->misc = *(i_mce + 2);
+		fallthrough;
+	/* MCA_ADDR */
+	case 2:
+		m->addr = *(i_mce + 1);
+		fallthrough;
+	/* MCA_STATUS */
+	case 1:
+		m->status = *i_mce;
+	}

 	mce_log(&err);
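
The fall-through decode in the patch above can be tried outside the kernel. The following is a rough user-space sketch, not the kernel code: struct smca_regs and smca_decode() are hypothetical names invented for this example, the kernel's fallthrough macro is replaced by plain comments, and -1 stands in for -EINVAL. It mirrors the same offsets (MCA_STATUS at 0, MCA_ADDR at 1, MCA_MISC0 at 2, MCA_IPID at 4, MCA_SYND at 5, MCA_SYND1 at 13, MCA_SYND2 at 14).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the registers the patch extracts. */
struct smca_regs {
	uint64_t status, addr, misc, ipid, synd, synd1, synd2;
};

/*
 * Hypothetical stand-in for the decode step: 'regs' points at the raw
 * register array, 'size_bytes' is its Register Array Size.
 * Returns 0 on success, -1 if the array holds no complete register.
 */
static int smca_decode(const uint64_t *regs, size_t size_bytes,
		       struct smca_regs *out)
{
	size_t num_regs = size_bytes >> 3;

	if (!num_regs)
		return -1;

	/* Cap to the expected maximum, as in the patch. */
	if (num_regs > 15)
		num_regs = 15;

	memset(out, 0, sizeof(*out));

	/* Start at the last register present and fall through to the first. */
	switch (num_regs) {
	case 15: out->synd2 = regs[14];	/* MCA_SYND2 */
		/* fall through */
	case 14: out->synd1 = regs[13];	/* MCA_SYND1 */
		/* fall through */
	case 13: case 12: case 11: case 10:	/* MCA_MISC4..MCA_MISC1 */
	case 9: case 8: case 7:		/* MCA_DEADDR, MCA_DESTAT, reserved */
	case 6: out->synd = regs[5];	/* MCA_SYND */
		/* fall through */
	case 5: out->ipid = regs[4];	/* MCA_IPID */
		/* fall through */
	case 4:				/* MCA_CONFIG, skipped */
	case 3: out->misc = regs[2];	/* MCA_MISC0 */
		/* fall through */
	case 2: out->addr = regs[1];	/* MCA_ADDR */
		/* fall through */
	case 1: out->status = regs[0];	/* MCA_STATUS */
	}

	return 0;
}
```

Entering the switch at the highest register present means a short array never reads past its end, while a 120-byte array also picks up the two new syndrome registers.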