x86/MCE/AMD: fix warning about sleep-in-atomic at early boot

Message ID 157252708836.3876.4604398213417262402.stgit@buzz (mailing list archive)
State New, archived
Series x86/MCE/AMD: fix warning about sleep-in-atomic at early boot

Commit Message

Konstantin Khlebnikov Oct. 31, 2019, 1:04 p.m. UTC
smca_configure() is only ever called for the current CPU, so
rdmsr_safe_on_cpu() can be replaced with rdmsr_safe(), which is safe
to call in atomic context.

 BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.79-16 #1
 Hardware name: GIGABYTE R181-Z90-00/MZ91-FS0-00, BIOS R11 10/25/2019
 Call Trace:
  dump_stack+0x5c/0x7b
  ___might_sleep+0xec/0x110
  wait_for_completion+0x39/0x160
  ? __rdmsr_safe_on_cpu+0x45/0x60
  rdmsr_safe_on_cpu+0xae/0xf0
  ? wrmsr_on_cpus+0x20/0x20
  ? machine_check_poll+0xfd/0x1f0
  ? mce_amd_feature_init+0x190/0x2d0
  mce_amd_feature_init+0x190/0x2d0
  mcheck_cpu_init+0x11a/0x460
  identify_cpu+0x3e2/0x560
  identify_secondary_cpu+0x13/0x80
  smp_store_cpu_info+0x45/0x50
  start_secondary+0xaa/0x200
  secondary_startup_64+0xa4/0xb0

Apart from the warning in the kernel log, everything works fine.

Fixes: 5896820e0aa3 ("x86/mce/AMD, EDAC/mce_amd: Define and use tables for known SMCA IP types")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 arch/x86/kernel/cpu/mce/amd.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Borislav Petkov Oct. 31, 2019, 2:29 p.m. UTC | #1
On Thu, Oct 31, 2019 at 04:04:48PM +0300, Konstantin Khlebnikov wrote:
> Function smca_configure() is called only for current cpu thus
> rdmsr_safe_on_cpu() could be replaced with atomic rdmsr_safe().
> 
>  BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
>  in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
>  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.79-16 #1
					     ^^^^^^^^^^

I'm assuming you hit this on latest upstream too?

>  Hardware name: GIGABYTE R181-Z90-00/MZ91-FS0-00, BIOS R11 10/25/2019
>  Call Trace:
>   dump_stack+0x5c/0x7b
>   ___might_sleep+0xec/0x110
>   wait_for_completion+0x39/0x160
>   ? __rdmsr_safe_on_cpu+0x45/0x60
>   rdmsr_safe_on_cpu+0xae/0xf0
>   ? wrmsr_on_cpus+0x20/0x20
>   ? machine_check_poll+0xfd/0x1f0
>   ? mce_amd_feature_init+0x190/0x2d0
>   mce_amd_feature_init+0x190/0x2d0
>   mcheck_cpu_init+0x11a/0x460
>   identify_cpu+0x3e2/0x560
>   identify_secondary_cpu+0x13/0x80
>   smp_store_cpu_info+0x45/0x50
>   start_secondary+0xaa/0x200
>   secondary_startup_64+0xa4/0xb0
> 
> Except warning in kernel log everything works fine.
> 
> Fixes: 5896820e0aa3 ("x86/mce/AMD, EDAC/mce_amd: Define and use tables for known SMCA IP types")
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> ---
>  arch/x86/kernel/cpu/mce/amd.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 6ea7fdc82f3c..c7ab0d38af79 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -269,7 +269,7 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
>  	if (smca_banks[bank].hwid)
>  		return;
>  
> -	if (rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {
> +	if (rdmsr_safe(MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {

Yazen, any objections?

Yazen Ghannam Oct. 31, 2019, 2:58 p.m. UTC | #2
> -----Original Message-----
> From: Borislav Petkov <bp@alien8.de>
> Sent: Thursday, October 31, 2019 10:30 AM
> To: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>; Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: Tony Luck <tony.luck@intel.com>; linux-kernel@vger.kernel.org; linux-edac@vger.kernel.org; x86@kernel.org
> Subject: Re: [PATCH] x86/MCE/AMD: fix warning about sleep-in-atomic at early boot
> 
> On Thu, Oct 31, 2019 at 04:04:48PM +0300, Konstantin Khlebnikov wrote:
> > Function smca_configure() is called only for current cpu thus
> > rdmsr_safe_on_cpu() could be replaced with atomic rdmsr_safe().
> >
> >  BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
> >  in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
> >  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.79-16 #1
> 					     ^^^^^^^^^^
> 
> I'm assuming you hit this on latest upstream too?
> 
> >  Hardware name: GIGABYTE R181-Z90-00/MZ91-FS0-00, BIOS R11 10/25/2019
> >  Call Trace:
> >   dump_stack+0x5c/0x7b
> >   ___might_sleep+0xec/0x110
> >   wait_for_completion+0x39/0x160
> >   ? __rdmsr_safe_on_cpu+0x45/0x60
> >   rdmsr_safe_on_cpu+0xae/0xf0
> >   ? wrmsr_on_cpus+0x20/0x20
> >   ? machine_check_poll+0xfd/0x1f0
> >   ? mce_amd_feature_init+0x190/0x2d0
> >   mce_amd_feature_init+0x190/0x2d0
> >   mcheck_cpu_init+0x11a/0x460
> >   identify_cpu+0x3e2/0x560
> >   identify_secondary_cpu+0x13/0x80
> >   smp_store_cpu_info+0x45/0x50
> >   start_secondary+0xaa/0x200
> >   secondary_startup_64+0xa4/0xb0
> >
> > Except warning in kernel log everything works fine.
> >
> > Fixes: 5896820e0aa3 ("x86/mce/AMD, EDAC/mce_amd: Define and use tables for known SMCA IP types")
> > Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> > ---
> >  arch/x86/kernel/cpu/mce/amd.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> > index 6ea7fdc82f3c..c7ab0d38af79 100644
> > --- a/arch/x86/kernel/cpu/mce/amd.c
> > +++ b/arch/x86/kernel/cpu/mce/amd.c
> > @@ -269,7 +269,7 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
> >  	if (smca_banks[bank].hwid)
> >  		return;
> >
> > -	if (rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {
> > +	if (rdmsr_safe(MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {
> 
> Yazen, any objections?
> 

This looks good to me.

We can go further and remove the "cpu" parameter from this entire function.
But that can be another patch.

Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>

Thanks,
Yazen

Konstantin Khlebnikov Nov. 1, 2019, 1:39 p.m. UTC | #3
On 31/10/2019 17.29, Borislav Petkov wrote:
> On Thu, Oct 31, 2019 at 04:04:48PM +0300, Konstantin Khlebnikov wrote:
>> Function smca_configure() is called only for current cpu thus
>> rdmsr_safe_on_cpu() could be replaced with atomic rdmsr_safe().
>>
>>   BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
>>   in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
>>   CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.79-16 #1
> 					     ^^^^^^^^^^
> 
> I'm assuming you hit this on latest upstream too?

I tried 5.4 once but there was no warning.
The code in 4.19 and in mainline is almost the same.

Probably the hardware needs a full power cycle to reset its state, or
something else is going on.

> 
>>   Hardware name: GIGABYTE R181-Z90-00/MZ91-FS0-00, BIOS R11 10/25/2019
>>   Call Trace:
>>    dump_stack+0x5c/0x7b
>>    ___might_sleep+0xec/0x110
>>    wait_for_completion+0x39/0x160
>>    ? __rdmsr_safe_on_cpu+0x45/0x60
>>    rdmsr_safe_on_cpu+0xae/0xf0
>>    ? wrmsr_on_cpus+0x20/0x20
>>    ? machine_check_poll+0xfd/0x1f0
>>    ? mce_amd_feature_init+0x190/0x2d0
>>    mce_amd_feature_init+0x190/0x2d0
>>    mcheck_cpu_init+0x11a/0x460
>>    identify_cpu+0x3e2/0x560
>>    identify_secondary_cpu+0x13/0x80
>>    smp_store_cpu_info+0x45/0x50
>>    start_secondary+0xaa/0x200
>>    secondary_startup_64+0xa4/0xb0
>>
>> Except warning in kernel log everything works fine.
>>
>> Fixes: 5896820e0aa3 ("x86/mce/AMD, EDAC/mce_amd: Define and use tables for known SMCA IP types")
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> ---
>>   arch/x86/kernel/cpu/mce/amd.c |    2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
>> index 6ea7fdc82f3c..c7ab0d38af79 100644
>> --- a/arch/x86/kernel/cpu/mce/amd.c
>> +++ b/arch/x86/kernel/cpu/mce/amd.c
>> @@ -269,7 +269,7 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
>>   	if (smca_banks[bank].hwid)
>>   		return;
>>   
>> -	if (rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {
>> +	if (rdmsr_safe(MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {
> 
> Yazen, any objections?
>

Borislav Petkov Nov. 7, 2019, 10:53 a.m. UTC | #4
On Fri, Nov 01, 2019 at 04:39:17PM +0300, Konstantin Khlebnikov wrote:
> I tried 5.4 once but there was no warning.
> Code in 4.19 and in mainline almost the same.

Yes, but early boot code has changed a lot since 4.19. If you can't
trigger it on 5.4, then I'll drop the BUG splat from your commit message
and change it to talk about replacing the IPI-sending function, which is
a good cleanup in itself.

Thx.

Borislav Petkov Dec. 17, 2019, 7:53 a.m. UTC | #5
On Thu, Nov 07, 2019 at 11:53:10AM +0100, Borislav Petkov wrote:
> On Fri, Nov 01, 2019 at 04:39:17PM +0300, Konstantin Khlebnikov wrote:
> > I tried 5.4 once but there was no warning.
> > Code in 4.19 and in mainline almost the same.
> 
> Yes, but early boot code has changed a lot since 4.19. If you can't
> trigger it on 5.4, then I'll drop the BUG splat from your commit message
> and change it to talk about replacing the IPI-sending function, which is
> a good cleanup in itself.

Ok, I was able to trigger it myself:

[    0.822602] BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
[    0.822602] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1
[    0.822602] no locks held by swapper/1/0.
[    0.822602] irq event stamp: 0
[    0.822602] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[    0.822602] hardirqs last disabled at (0): [<ffffffff8106dda9>] copy_process+0x8b9/0x1ca0
[    0.822602] softirqs last  enabled at (0): [<ffffffff8106dda9>] copy_process+0x8b9/0x1ca0
[    0.822602] softirqs last disabled at (0): [<0000000000000000>] 0x0
[    0.822602] Preemption disabled at:
[    0.822602] [<ffffffff8104703b>] start_secondary+0x3b/0x190
[    0.822602] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.5.0-rc2+ #1
[    0.822602] Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018
[    0.822602] Call Trace:
[    0.822602]  dump_stack+0x71/0xa0
[    0.822602]  ___might_sleep.cold.92+0xf7/0x11f
[    0.822602]  wait_for_completion+0x3c/0x180
[    0.822602]  ? generic_exec_single+0xca/0x100
[    0.822602]  rdmsr_safe_on_cpu+0xe8/0x100
[    0.822602]  ? wrmsr_on_cpus+0x20/0x20
[    0.822602]  mce_amd_feature_init+0x2ab/0x590
[    0.822602]  mcheck_cpu_init+0x17a/0x4d0
[    0.822602]  identify_cpu+0x3f0/0x750
[    0.822602]  identify_secondary_cpu+0x13/0x80
[    0.822602]  smp_store_cpu_info+0x45/0x50
[    0.822602]  start_secondary+0x50/0x190
[    0.822602]  secondary_startup_64+0xa4/0xb0

Rerouting patch...

Thx.

Patch

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 6ea7fdc82f3c..c7ab0d38af79 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -269,7 +269,7 @@  static void smca_configure(unsigned int bank, unsigned int cpu)
 	if (smca_banks[bank].hwid)
 		return;
 
-	if (rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {
+	if (rdmsr_safe(MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {
 		pr_warn("Failed to read MCA_IPID for bank %d\n", bank);
 		return;
 	}