Message ID | 20231118193248.1296798-3-yazen.ghannam@amd.com (mailing list archive) |
---|---|
State | Handled Elsewhere |
Series | MCA Updates |
On Sat, Nov 18, 2023 at 01:32:30PM -0600, Yazen Ghannam wrote:
> +void mce_setup_global(struct mce *m)

We usually call those things "common":

	mce_setup_common().

> +{
> +	memset(m, 0, sizeof(struct mce));
> +
> +	m->cpuid = cpuid_eax(1);
> +	m->cpuvendor = boot_cpu_data.x86_vendor;
> +	m->mcgcap = __rdmsr(MSR_IA32_MCG_CAP);
> +	/* need the internal __ version to avoid deadlocks */
> +	m->time = __ktime_get_real_seconds();
> +}
> +
> +void mce_setup_per_cpu(struct mce *m)

And call this

	mce_setup_for_cpu(unsigned int cpu, struct mce *m);

so that it doesn't look like some per_cpu helper.

And yes, you should supply the CPU number as an argument. Because
otherwise, when you look at your next change:

+	mce_setup_global(&m);
+	m.cpu = m.extcpu = cpu;
+	mce_setup_per_cpu(&m);

This contains the "hidden" requirement that m.extcpu happens *always*
*before* the mce_setup_per_cpu() call and that is flaky and error prone.

So make that:

	mce_setup_common(&m);
	mce_setup_for_cpu(m.extcpu, &m);

and do m.cpu = m.extcpu = cpu inside the second function.

And then it JustWorks(tm) and you can't "forget" assigning m.extcpu and
there's no subtlety.

Ok?
On 11/22/2023 1:24 PM, Borislav Petkov wrote:
> On Sat, Nov 18, 2023 at 01:32:30PM -0600, Yazen Ghannam wrote:
>> +void mce_setup_global(struct mce *m)
>
> We usually call those things "common":
>
> 	mce_setup_common().
>
>> +{
>> +	memset(m, 0, sizeof(struct mce));
>> +
>> +	m->cpuid = cpuid_eax(1);
>> +	m->cpuvendor = boot_cpu_data.x86_vendor;
>> +	m->mcgcap = __rdmsr(MSR_IA32_MCG_CAP);
>> +	/* need the internal __ version to avoid deadlocks */
>> +	m->time = __ktime_get_real_seconds();
>> +}
>> +
>> +void mce_setup_per_cpu(struct mce *m)
>
> And call this
>
> 	mce_setup_for_cpu(unsigned int cpu, struct mce *m);
>
> so that it doesn't look like some per_cpu helper.
>
> And yes, you should supply the CPU number as an argument. Because
> otherwise, when you look at your next change:
>
> +	mce_setup_global(&m);
> +	m.cpu = m.extcpu = cpu;
> +	mce_setup_per_cpu(&m);
>
> This contains the "hidden" requirement that m.extcpu happens *always*
> *before* the mce_setup_per_cpu() call and that is flaky and error prone.
>
> So make that:
>
> 	mce_setup_common(&m);
> 	mce_setup_for_cpu(m.extcpu, &m);
>
> and do m.cpu = m.extcpu = cpu inside the second function.
>
> And then it JustWorks(tm) and you can't "forget" assigning m.extcpu and
> there's no subtlety.
>
> Ok?
>

Yep, understood. Thanks!

-Yazen
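The interface suggested in the review can be sketched in user space as follows. This is only an illustration under stated assumptions, not kernel code: struct sketch_mce, struct sketch_cpuinfo, sketch_cpu_data[] and the sketch_* functions are hypothetical stand-ins for struct mce, cpu_data() and the proposed mce_setup_common()/mce_setup_for_cpu(), and the table values are made up.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_NR_CPUS 4

/* Hypothetical stand-in for the per-CPU data that cpu_data(cpu) returns. */
struct sketch_cpuinfo {
	unsigned int initial_apicid;
	unsigned int microcode;
	unsigned int pkg_id;
};

/* Hypothetical stand-in for struct mce, reduced to a few fields. */
struct sketch_mce {
	unsigned int cpu;
	unsigned int extcpu;
	unsigned int cpuvendor;
	unsigned int apicid;
	unsigned int microcode;
	unsigned int socketid;
};

/* Made-up per-CPU values, only so the sketch has something to copy. */
const struct sketch_cpuinfo sketch_cpu_data[SKETCH_NR_CPUS] = {
	{ .initial_apicid = 0, .microcode = 0x100, .pkg_id = 0 },
	{ .initial_apicid = 2, .microcode = 0x101, .pkg_id = 0 },
	{ .initial_apicid = 4, .microcode = 0x102, .pkg_id = 1 },
	{ .initial_apicid = 6, .microcode = 0x103, .pkg_id = 1 },
};

/* Zero the record and fill the fields that do not depend on a CPU. */
void sketch_setup_common(struct sketch_mce *m)
{
	memset(m, 0, sizeof(*m));
	m->cpuvendor = 2; /* placeholder for boot_cpu_data.x86_vendor */
}

/*
 * Take the CPU number as an argument and assign cpu/extcpu here, so a
 * caller cannot forget to set extcpu before the per-CPU lookups.
 */
void sketch_setup_for_cpu(unsigned int cpu, struct sketch_mce *m)
{
	assert(cpu < SKETCH_NR_CPUS);

	m->cpu = m->extcpu = cpu;
	m->apicid = sketch_cpu_data[cpu].initial_apicid;
	m->microcode = sketch_cpu_data[cpu].microcode;
	m->socketid = sketch_cpu_data[cpu].pkg_id;
}
```

A caller would run sketch_setup_common(&m) followed by sketch_setup_for_cpu(cpu, &m), passing the number of the CPU that reported the error. The ordering requirement between assigning extcpu and reading per-CPU data then lives inside one function instead of at every call site.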
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 1642018dd6c9..7e86086aa19c 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -115,20 +115,31 @@ static struct irq_work mce_irq_work;
  */
 BLOCKING_NOTIFIER_HEAD(x86_mce_decoder_chain);
 
+void mce_setup_global(struct mce *m)
+{
+	memset(m, 0, sizeof(struct mce));
+
+	m->cpuid = cpuid_eax(1);
+	m->cpuvendor = boot_cpu_data.x86_vendor;
+	m->mcgcap = __rdmsr(MSR_IA32_MCG_CAP);
+	/* need the internal __ version to avoid deadlocks */
+	m->time = __ktime_get_real_seconds();
+}
+
+void mce_setup_per_cpu(struct mce *m)
+{
+	m->apicid = cpu_data(m->extcpu).topo.initial_apicid;
+	m->microcode = cpu_data(m->extcpu).microcode;
+	m->ppin = cpu_data(m->extcpu).ppin;
+	m->socketid = cpu_data(m->extcpu).topo.pkg_id;
+}
+
 /* Do initial initialization of a struct mce */
 void mce_setup(struct mce *m)
 {
-	memset(m, 0, sizeof(struct mce));
+	mce_setup_global(m);
 	m->cpu = m->extcpu = smp_processor_id();
-	/* need the internal __ version to avoid deadlocks */
-	m->time = __ktime_get_real_seconds();
-	m->cpuvendor = boot_cpu_data.x86_vendor;
-	m->cpuid = cpuid_eax(1);
-	m->socketid = cpu_data(m->extcpu).topo.pkg_id;
-	m->apicid = cpu_data(m->extcpu).topo.initial_apicid;
-	m->mcgcap = __rdmsr(MSR_IA32_MCG_CAP);
-	m->ppin = cpu_data(m->extcpu).ppin;
-	m->microcode = boot_cpu_data.microcode;
+	mce_setup_per_cpu(m);
 }
 
 DEFINE_PER_CPU(struct mce, injectm);
Generally, MCA information for an error is gathered on the CPU that
reported the error. In this case, CPU-specific information from the
running CPU will be correct. However, this will be incorrect if the MCA
information is gathered while running on a CPU that didn't report the
error. One example is creating an MCA record using mce_setup() for
errors reported from ACPI.

Split mce_setup() so that there is a helper function to gather global,
i.e. not CPU-specific, information and another helper for CPU-specific
information. Don't set the CPU number in either helper function. This
will be set appropriately for each call site of the helpers.

Leave mce_setup() defined as-is for the common case when running on the
reporting CPU.

Get MCG_CAP in the global helper even though the register is per-CPU.
This value is not already cached per-CPU like other values. And it does
not assist with any per-CPU decoding or handling.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mce/core.c | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)
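The problem described in the commit message can be illustrated with a small user-space sketch. Everything here is a hypothetical stand-in: sketch_running_cpu plays the role of smp_processor_id(), sketch_apicid[] the role of the cached per-CPU data, and the sketch_* functions mirror the shape of mce_setup() and the two helpers as posted, with the call site setting the CPU number between them. The APIC ID values are made up.

```c
#include <string.h>

#define SKETCH_NR_CPUS 4

/* Hypothetical, reduced stand-in for struct mce. */
struct sketch_mce {
	unsigned int cpu;
	unsigned int extcpu;
	unsigned int apicid;
};

/* Made-up cached per-CPU APIC IDs, indexed by CPU number. */
const unsigned int sketch_apicid[SKETCH_NR_CPUS] = { 0, 2, 4, 6 };

/* Stands in for smp_processor_id(): the CPU the code is running on. */
unsigned int sketch_running_cpu;

/* Global part: nothing CPU-specific, just a clean record. */
void sketch_setup_global(struct sketch_mce *m)
{
	memset(m, 0, sizeof(*m));
}

/* Per-CPU part: relies on m->extcpu having been set by the caller. */
void sketch_setup_per_cpu(struct sketch_mce *m)
{
	if (m->extcpu < SKETCH_NR_CPUS)
		m->apicid = sketch_apicid[m->extcpu];
}

/* Common case: the record is built on the CPU that reported the error. */
void sketch_setup(struct sketch_mce *m)
{
	sketch_setup_global(m);
	m->cpu = m->extcpu = sketch_running_cpu;
	sketch_setup_per_cpu(m);
}

/* Other case: the error was reported by a CPU we are not running on. */
void sketch_setup_reported(struct sketch_mce *m, unsigned int reporting_cpu)
{
	sketch_setup_global(m);
	m->cpu = m->extcpu = reporting_cpu;
	sketch_setup_per_cpu(m);
}
```

With sketch_running_cpu set to 0 and an error reported by CPU 3, sketch_setup() records CPU 0's APIC ID, while sketch_setup_reported(&m, 3) records CPU 3's. That mismatch is what the split is meant to avoid for records created on behalf of another CPU.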