Message ID | 157252708836.3876.4604398213417262402.stgit@buzz (mailing list archive) |
---|---|
State | New, archived |
Series | x86/MCE/AMD: fix warning about sleep-in-atomic at early boot |
On Thu, Oct 31, 2019 at 04:04:48PM +0300, Konstantin Khlebnikov wrote:
> Function smca_configure() is called only for current cpu thus
> rdmsr_safe_on_cpu() could be replaced with atomic rdmsr_safe().
>
> BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
> in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.79-16 #1
                                            ^^^^^^^^^^

I'm assuming you hit this on latest upstream too?

> Hardware name: GIGABYTE R181-Z90-00/MZ91-FS0-00, BIOS R11 10/25/2019
> Call Trace:
>  dump_stack+0x5c/0x7b
>  ___might_sleep+0xec/0x110
>  wait_for_completion+0x39/0x160
>  ? __rdmsr_safe_on_cpu+0x45/0x60
>  rdmsr_safe_on_cpu+0xae/0xf0
>  ? wrmsr_on_cpus+0x20/0x20
>  ? machine_check_poll+0xfd/0x1f0
>  ? mce_amd_feature_init+0x190/0x2d0
>  mce_amd_feature_init+0x190/0x2d0
>  mcheck_cpu_init+0x11a/0x460
>  identify_cpu+0x3e2/0x560
>  identify_secondary_cpu+0x13/0x80
>  smp_store_cpu_info+0x45/0x50
>  start_secondary+0xaa/0x200
>  secondary_startup_64+0xa4/0xb0
>
> Except warning in kernel log everything works fine.
>
> Fixes: 5896820e0aa3 ("x86/mce/AMD, EDAC/mce_amd: Define and use tables for known SMCA IP types")
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> ---
>  arch/x86/kernel/cpu/mce/amd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 6ea7fdc82f3c..c7ab0d38af79 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -269,7 +269,7 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
>  	if (smca_banks[bank].hwid)
>  		return;
>
> -	if (rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {
> +	if (rdmsr_safe(MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {

Yazen, any objections?
> -----Original Message-----
> From: Borislav Petkov <bp@alien8.de>
> Sent: Thursday, October 31, 2019 10:30 AM
> To: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>; Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: Tony Luck <tony.luck@intel.com>; linux-kernel@vger.kernel.org; linux-edac@vger.kernel.org; x86@kernel.org
> Subject: Re: [PATCH] x86/MCE/AMD: fix warning about sleep-in-atomic at early boot
>
> On Thu, Oct 31, 2019 at 04:04:48PM +0300, Konstantin Khlebnikov wrote:
> > Function smca_configure() is called only for current cpu thus
> > rdmsr_safe_on_cpu() could be replaced with atomic rdmsr_safe().
> >
> > BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
> > in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
> > CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.79-16 #1
>                                             ^^^^^^^^^^
>
> I'm assuming you hit this on latest upstream too?
>
> > Hardware name: GIGABYTE R181-Z90-00/MZ91-FS0-00, BIOS R11 10/25/2019
> > Call Trace:
> >  dump_stack+0x5c/0x7b
> >  ___might_sleep+0xec/0x110
> >  wait_for_completion+0x39/0x160
> >  ? __rdmsr_safe_on_cpu+0x45/0x60
> >  rdmsr_safe_on_cpu+0xae/0xf0
> >  ? wrmsr_on_cpus+0x20/0x20
> >  ? machine_check_poll+0xfd/0x1f0
> >  ? mce_amd_feature_init+0x190/0x2d0
> >  mce_amd_feature_init+0x190/0x2d0
> >  mcheck_cpu_init+0x11a/0x460
> >  identify_cpu+0x3e2/0x560
> >  identify_secondary_cpu+0x13/0x80
> >  smp_store_cpu_info+0x45/0x50
> >  start_secondary+0xaa/0x200
> >  secondary_startup_64+0xa4/0xb0
> >
> > Except warning in kernel log everything works fine.
> >
> > Fixes: 5896820e0aa3 ("x86/mce/AMD, EDAC/mce_amd: Define and use tables for known SMCA IP types")
> > Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> > ---
> >  arch/x86/kernel/cpu/mce/amd.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> > index 6ea7fdc82f3c..c7ab0d38af79 100644
> > --- a/arch/x86/kernel/cpu/mce/amd.c
> > +++ b/arch/x86/kernel/cpu/mce/amd.c
> > @@ -269,7 +269,7 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
> >  	if (smca_banks[bank].hwid)
> >  		return;
> >
> > -	if (rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {
> > +	if (rdmsr_safe(MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {
>
> Yazen, any objections?
>

This looks good to me. We can go further and remove the "cpu" parameter from
this entire function. But that can be another patch.

Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>

Thanks,
Yazen
On 31/10/2019 17.29, Borislav Petkov wrote:
> On Thu, Oct 31, 2019 at 04:04:48PM +0300, Konstantin Khlebnikov wrote:
>> Function smca_configure() is called only for current cpu thus
>> rdmsr_safe_on_cpu() could be replaced with atomic rdmsr_safe().
>>
>> BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
>> in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
>> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.79-16 #1
>                                            ^^^^^^^^^^
>
> I'm assuming you hit this on latest upstream too?

I tried 5.4 once but there was no warning.
Code in 4.19 and in mainline almost the same.
Probably hardware needs full power cycle to reset state or something else.

>
>> Hardware name: GIGABYTE R181-Z90-00/MZ91-FS0-00, BIOS R11 10/25/2019
>> Call Trace:
>>  dump_stack+0x5c/0x7b
>>  ___might_sleep+0xec/0x110
>>  wait_for_completion+0x39/0x160
>>  ? __rdmsr_safe_on_cpu+0x45/0x60
>>  rdmsr_safe_on_cpu+0xae/0xf0
>>  ? wrmsr_on_cpus+0x20/0x20
>>  ? machine_check_poll+0xfd/0x1f0
>>  ? mce_amd_feature_init+0x190/0x2d0
>>  mce_amd_feature_init+0x190/0x2d0
>>  mcheck_cpu_init+0x11a/0x460
>>  identify_cpu+0x3e2/0x560
>>  identify_secondary_cpu+0x13/0x80
>>  smp_store_cpu_info+0x45/0x50
>>  start_secondary+0xaa/0x200
>>  secondary_startup_64+0xa4/0xb0
>>
>> Except warning in kernel log everything works fine.
>>
>> Fixes: 5896820e0aa3 ("x86/mce/AMD, EDAC/mce_amd: Define and use tables for known SMCA IP types")
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> ---
>>  arch/x86/kernel/cpu/mce/amd.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
>> index 6ea7fdc82f3c..c7ab0d38af79 100644
>> --- a/arch/x86/kernel/cpu/mce/amd.c
>> +++ b/arch/x86/kernel/cpu/mce/amd.c
>> @@ -269,7 +269,7 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
>>  	if (smca_banks[bank].hwid)
>>  		return;
>>
>> -	if (rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {
>> +	if (rdmsr_safe(MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {
>
> Yazen, any objections?
>
On Fri, Nov 01, 2019 at 04:39:17PM +0300, Konstantin Khlebnikov wrote:
> I tried 5.4 once but there was no warning.
> Code in 4.19 and in mainline almost the same.

Yes, but early boot code has changed a lot since 4.19. If you can't
trigger it on 5.4, then I'll drop the BUG splat from your commit message
and change it to talk about replacing the IPI-sending function, which is
a good cleanup in itself.

Thx.
On Thu, Nov 07, 2019 at 11:53:10AM +0100, Borislav Petkov wrote:
> On Fri, Nov 01, 2019 at 04:39:17PM +0300, Konstantin Khlebnikov wrote:
> > I tried 5.4 once but there was no warning.
> > Code in 4.19 and in mainline almost the same.
>
> Yes, but early boot code has changed a lot since 4.19. If you can't
> trigger it on 5.4, then I'll drop the BUG splat from your commit message
> and change it to talk about replacing the IPI-sending function, which is
> a good cleanup in itself.

Ok, I was able to trigger it myself:

[    0.822602] BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
[    0.822602] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1
[    0.822602] no locks held by swapper/1/0.
[    0.822602] irq event stamp: 0
[    0.822602] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[    0.822602] hardirqs last disabled at (0): [<ffffffff8106dda9>] copy_process+0x8b9/0x1ca0
[    0.822602] softirqs last  enabled at (0): [<ffffffff8106dda9>] copy_process+0x8b9/0x1ca0
[    0.822602] softirqs last disabled at (0): [<0000000000000000>] 0x0
[    0.822602] Preemption disabled at:
[    0.822602] [<ffffffff8104703b>] start_secondary+0x3b/0x190
[    0.822602] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.5.0-rc2+ #1
[    0.822602] Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018
[    0.822602] Call Trace:
[    0.822602]  dump_stack+0x71/0xa0
[    0.822602]  ___might_sleep.cold.92+0xf7/0x11f
[    0.822602]  wait_for_completion+0x3c/0x180
[    0.822602]  ? generic_exec_single+0xca/0x100
[    0.822602]  rdmsr_safe_on_cpu+0xe8/0x100
[    0.822602]  ? wrmsr_on_cpus+0x20/0x20
[    0.822602]  mce_amd_feature_init+0x2ab/0x590
[    0.822602]  mcheck_cpu_init+0x17a/0x4d0
[    0.822602]  identify_cpu+0x3f0/0x750
[    0.822602]  identify_secondary_cpu+0x13/0x80
[    0.822602]  smp_store_cpu_info+0x45/0x50
[    0.822602]  start_secondary+0x50/0x190
[    0.822602]  secondary_startup_64+0xa4/0xb0

Rerouting patch...

Thx.
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 6ea7fdc82f3c..c7ab0d38af79 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -269,7 +269,7 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
 	if (smca_banks[bank].hwid)
 		return;
 
-	if (rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {
+	if (rdmsr_safe(MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {
 		pr_warn("Failed to read MCA_IPID for bank %d\n", bank);
 		return;
 	}
Function smca_configure() is called only for current cpu thus
rdmsr_safe_on_cpu() could be replaced with atomic rdmsr_safe().

 BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.79-16 #1
 Hardware name: GIGABYTE R181-Z90-00/MZ91-FS0-00, BIOS R11 10/25/2019
 Call Trace:
  dump_stack+0x5c/0x7b
  ___might_sleep+0xec/0x110
  wait_for_completion+0x39/0x160
  ? __rdmsr_safe_on_cpu+0x45/0x60
  rdmsr_safe_on_cpu+0xae/0xf0
  ? wrmsr_on_cpus+0x20/0x20
  ? machine_check_poll+0xfd/0x1f0
  ? mce_amd_feature_init+0x190/0x2d0
  mce_amd_feature_init+0x190/0x2d0
  mcheck_cpu_init+0x11a/0x460
  identify_cpu+0x3e2/0x560
  identify_secondary_cpu+0x13/0x80
  smp_store_cpu_info+0x45/0x50
  start_secondary+0xaa/0x200
  secondary_startup_64+0xa4/0xb0

Except warning in kernel log everything works fine.

Fixes: 5896820e0aa3 ("x86/mce/AMD, EDAC/mce_amd: Define and use tables for known SMCA IP types")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 arch/x86/kernel/cpu/mce/amd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
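
For readers who want the reasoning in the commit message spelled out, here is a
minimal userspace sketch of why the local read avoids the warning. It is not
kernel code: struct cpu_ctx, fake_reg[], model_might_sleep(),
model_read_on_cpu() and model_read_local() are all hypothetical names invented
for this illustration. The only thing taken from the thread is the shape of the
problem: the call trace shows rdmsr_safe_on_cpu() reaching
wait_for_completion(), which is reported as a sleeping function when called
with in_atomic() and irqs_disabled() both set, while a read on the current CPU
needs no such wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NR_FAKE_CPUS 4

/* Hypothetical model of the executing context; not a kernel structure. */
struct cpu_ctx {
	unsigned int id;             /* CPU the caller is running on */
	bool atomic;                 /* true while preemption/IRQs are off */
	unsigned int sleep_warnings; /* how many "sleep in atomic" reports fired */
};

/* Fake per-CPU 64-bit register values, made up for the example. */
static const uint64_t fake_reg[NR_FAKE_CPUS] = {
	0x0000000100000010ULL, 0x0000000200000020ULL,
	0x0000000300000030ULL, 0x0000000400000040ULL,
};

/* Models a might-sleep check: complain if the caller must not sleep. */
void model_might_sleep(struct cpu_ctx *ctx)
{
	if (ctx->atomic) {
		ctx->sleep_warnings++;
		fprintf(stderr, "model: sleeping function called in atomic context, cpu %u\n",
			ctx->id);
	}
}

/*
 * Models the cross-CPU variant: it always waits for the target CPU to
 * finish, even when the target happens to be the current CPU.
 */
int model_read_on_cpu(struct cpu_ctx *ctx, unsigned int target,
		      uint32_t *low, uint32_t *high)
{
	if (target >= NR_FAKE_CPUS)
		return -1;
	model_might_sleep(ctx); /* the wait is what trips the check */
	*low = (uint32_t)fake_reg[target];
	*high = (uint32_t)(fake_reg[target] >> 32);
	return 0;
}

/* Models the local variant: reads the current CPU's value, never waits. */
int model_read_local(struct cpu_ctx *ctx, uint32_t *low, uint32_t *high)
{
	assert(ctx->id < NR_FAKE_CPUS);
	*low = (uint32_t)fake_reg[ctx->id];
	*high = (uint32_t)(fake_reg[ctx->id] >> 32);
	return 0;
}
```

With a context set up as { .id = 1, .atomic = true }, calling
model_read_on_cpu(&ctx, ctx.id, ...) bumps sleep_warnings, while
model_read_local(&ctx, ...) returns the same low/high halves and leaves the
counter alone. That is the whole argument of the patch in miniature: when the
target is always the current CPU, the waiting variant adds nothing except the
warning.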