ARM: avoid Cortex-A9 livelock on tight dmb loops

Message ID E1f5qij-00033R-Sl@rmk-PC.armlinux.org.uk (mailing list archive)
State New, archived

Commit Message

Russell King (Oracle) April 10, 2018, 10:41 a.m. UTC
Executing loops such as:

	while (1)
		cpu_relax();

with interrupts disabled results in a livelock of the entire system,
as other CPUs are prevented from making progress.  This is most noticeable
as a failure of crashdump kexec, which stops just after issuing:

	Loading crashdump kernel...

to the system console.  Two other locations of these loops within the
ARM code have been identified and fixed up.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
---
 arch/arm/kernel/machine_kexec.c  | 3 ++-
 arch/arm/kernel/smp.c            | 2 +-
 arch/arm/mach-omap2/prm_common.c | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

Comments

Tony Lindgren April 10, 2018, 1:41 p.m. UTC | #1
* Russell King <rmk+kernel@armlinux.org.uk> [180410 10:43]:
> diff --git a/arch/arm/mach-omap2/prm_common.c b/arch/arm/mach-omap2/prm_common.c
> index 021b5a8b9c0a..d4ddc78b2a0b 100644
> --- a/arch/arm/mach-omap2/prm_common.c
> +++ b/arch/arm/mach-omap2/prm_common.c
> @@ -523,7 +523,7 @@ void omap_prm_reset_system(void)
>  	prm_ll_data->reset_system();
>  
>  	while (1)
> -		cpu_relax();
> +		cpu_do_idle();
>  }
>  

Hmm we need to check so the added WFI here does not cause an
undesired change to a low power state. Adding Tero to Cc also.

Regards,

Tony
Tero Kristo April 10, 2018, 2:12 p.m. UTC | #2
On 10/04/18 16:41, Tony Lindgren wrote:
> * Russell King <rmk+kernel@armlinux.org.uk> [180410 10:43]:
>> diff --git a/arch/arm/mach-omap2/prm_common.c b/arch/arm/mach-omap2/prm_common.c
>> index 021b5a8b9c0a..d4ddc78b2a0b 100644
>> --- a/arch/arm/mach-omap2/prm_common.c
>> +++ b/arch/arm/mach-omap2/prm_common.c
>> @@ -523,7 +523,7 @@ void omap_prm_reset_system(void)
>>   	prm_ll_data->reset_system();
>>   
>>   	while (1)
>> -		cpu_relax();
>> +		cpu_do_idle();
>>   }
>>   
> 
> Hmm we need to check so the added WFI here does not cause an
> undesired change to a low power state. Adding Tero to Cc also.

Generally it is a bad idea to call arbitrary WFI within OMAP 
architecture, as this triggers a PRCM power transition and will most 
likely cause a hang if not controlled properly.

Has this patch been tested on any platform that supports proper power 
management?

-Tero
--
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
Will Deacon April 10, 2018, 3:28 p.m. UTC | #3
On Tue, Apr 10, 2018 at 05:12:37PM +0300, Tero Kristo wrote:
> On 10/04/18 16:41, Tony Lindgren wrote:
> >* Russell King <rmk+kernel@armlinux.org.uk> [180410 10:43]:
> >>diff --git a/arch/arm/mach-omap2/prm_common.c b/arch/arm/mach-omap2/prm_common.c
> >>index 021b5a8b9c0a..d4ddc78b2a0b 100644
> >>--- a/arch/arm/mach-omap2/prm_common.c
> >>+++ b/arch/arm/mach-omap2/prm_common.c
> >>@@ -523,7 +523,7 @@ void omap_prm_reset_system(void)
> >>  	prm_ll_data->reset_system();
> >>  	while (1)
> >>-		cpu_relax();
> >>+		cpu_do_idle();
> >>  }
> >
> >Hmm we need to check so the added WFI here does not cause an
> >undesired change to a low power state. Adding Tero to Cc also.
> 
> Generally it is a bad idea to call arbitrary WFI within OMAP architecture,
> as this triggers a PRCM power transition and will most likely cause a hang
> if not controlled properly.
> 
> Has this patch been tested on any platform that supports proper power
> management?

An alternative to WFI would be a DSB, which should avoid unexpected
interactions with power management.

Will
Russell King (Oracle) April 11, 2018, 12:52 p.m. UTC | #4
On Tue, Apr 10, 2018 at 05:12:37PM +0300, Tero Kristo wrote:
> On 10/04/18 16:41, Tony Lindgren wrote:
> >* Russell King <rmk+kernel@armlinux.org.uk> [180410 10:43]:
> >>diff --git a/arch/arm/mach-omap2/prm_common.c b/arch/arm/mach-omap2/prm_common.c
> >>index 021b5a8b9c0a..d4ddc78b2a0b 100644
> >>--- a/arch/arm/mach-omap2/prm_common.c
> >>+++ b/arch/arm/mach-omap2/prm_common.c
> >>@@ -523,7 +523,7 @@ void omap_prm_reset_system(void)
> >>  	prm_ll_data->reset_system();
> >>  	while (1)
> >>-		cpu_relax();
> >>+		cpu_do_idle();
> >>  }
> >
> >Hmm we need to check so the added WFI here does not cause an
> >undesired change to a low power state. Adding Tero to Cc also.
> 
> Generally it is a bad idea to call arbitrary WFI within OMAP architecture,
> as this triggers a PRCM power transition and will most likely cause a hang
> if not controlled properly.
> 
> Has this patch been tested on any platform that supports proper power
> management?

That will also go for the other locations in this patch too, as they
are all callable on _any_ platform.

It sounds like we need to abstract this so that platforms where "wfi"
is complex can handle the "spin on this CPU forever" appropriately.

While we could use dsb, we're asking a CPU to indefinitely spin in a
tight loop, which isn't going to be good for power consumption - what
if we have three CPUs doing that, could it push a SoC over the thermal
limits?  I don't think that's a question we can confidently answer
except for specific SoCs.
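
A minimal userspace sketch of the kind of abstraction meant above, in
which a platform can override the "wait" step used when a CPU is parked
forever.  All names here (cpu_park_step_fn, register_cpu_park_step,
cpu_park_bounded and the two step functions) are hypothetical and are
not existing kernel interfaces; the step functions only count calls so
that the dispatch can be exercised outside the kernel.

```c
#include <stddef.h>

/* Hypothetical hook type: one "wait" step executed by a parked CPU. */
typedef void (*cpu_park_step_fn)(void);

/* Call counters, present only so the sketch can be observed. */
int generic_steps;
int platform_steps;

/* Generic fallback; stands in for a barrier-based wait such as dsb. */
void generic_park_step(void)
{
	generic_steps++;
}

/* Stand-in for a platform where wfi needs platform-specific handling. */
void platform_park_step(void)
{
	platform_steps++;
}

/* Currently selected step; defaults to the generic fallback. */
static cpu_park_step_fn park_step = generic_park_step;

/* Let platform code install its own step; NULL restores the default. */
void register_cpu_park_step(cpu_park_step_fn fn)
{
	park_step = fn ? fn : generic_park_step;
}

/*
 * In the kernel this would be "while (1)".  The loop is bounded here
 * only so that the sketch terminates when run in userspace.
 */
void cpu_park_bounded(int iterations)
{
	for (int i = 0; i < iterations; i++)
		park_step();
}
```

With something of this shape, the three call sites in the patch would
call one common helper, and only platforms such as OMAP would need to
supply their own step.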
Tero Kristo April 11, 2018, 12:57 p.m. UTC | #5
On 11/04/18 15:52, Russell King - ARM Linux wrote:
> On Tue, Apr 10, 2018 at 05:12:37PM +0300, Tero Kristo wrote:
>> On 10/04/18 16:41, Tony Lindgren wrote:
>>> * Russell King <rmk+kernel@armlinux.org.uk> [180410 10:43]:
>>>> diff --git a/arch/arm/mach-omap2/prm_common.c b/arch/arm/mach-omap2/prm_common.c
>>>> index 021b5a8b9c0a..d4ddc78b2a0b 100644
>>>> --- a/arch/arm/mach-omap2/prm_common.c
>>>> +++ b/arch/arm/mach-omap2/prm_common.c
>>>> @@ -523,7 +523,7 @@ void omap_prm_reset_system(void)
>>>>   	prm_ll_data->reset_system();
>>>>   	while (1)
>>>> -		cpu_relax();
>>>> +		cpu_do_idle();
>>>>   }
>>>
>>> Hmm we need to check so the added WFI here does not cause an
>>> undesired change to a low power state. Adding Tero to Cc also.
>>
>> Generally it is a bad idea to call arbitrary WFI within OMAP architecture,
>> as this triggers a PRCM power transition and will most likely cause a hang
>> if not controlled properly.
>>
>> Has this patch been tested on any platform that supports proper power
>> management?
> 
> That will also go for the other locations in this patch too, as they
> are all callable on _any_ platform.
> 
> It sounds like we need to abstract this so that platforms where "wfi"
> is complex can handle the "spin on this CPU forever" appropriately.

Yea, I would definitely prefer this over adding arbitrary WFIs in the 
kernel.

-Tero

> 
> While we could use dsb, we're asking a CPU to indefinitely spin in a
> tight loop, which isn't going to be good for power consumption - what
> if we have three CPUs doing that, could it push a SoC over the thermal
> limits?  I don't think that's a question we can confidently answer
> except for specific SoCs.
> 

J, KEERTHY April 11, 2018, 12:59 p.m. UTC | #6
On Wednesday 11 April 2018 06:22 PM, Russell King - ARM Linux wrote:
> On Tue, Apr 10, 2018 at 05:12:37PM +0300, Tero Kristo wrote:
>> On 10/04/18 16:41, Tony Lindgren wrote:
>>> * Russell King <rmk+kernel@armlinux.org.uk> [180410 10:43]:
>>>> diff --git a/arch/arm/mach-omap2/prm_common.c b/arch/arm/mach-omap2/prm_common.c
>>>> index 021b5a8b9c0a..d4ddc78b2a0b 100644
>>>> --- a/arch/arm/mach-omap2/prm_common.c
>>>> +++ b/arch/arm/mach-omap2/prm_common.c
>>>> @@ -523,7 +523,7 @@ void omap_prm_reset_system(void)
>>>>  	prm_ll_data->reset_system();
>>>>  	while (1)
>>>> -		cpu_relax();
>>>> +		cpu_do_idle();
>>>>  }
>>>
>>> Hmm we need to check so the added WFI here does not cause an
>>> undesired change to a low power state. Adding Tero to Cc also.
>>
>> Generally it is a bad idea to call arbitrary WFI within OMAP architecture,
>> as this triggers a PRCM power transition and will most likely cause a hang
>> if not controlled properly.
>>
>> Has this patch been tested on any platform that supports proper power
>> management?
> 
> That will also go for the other locations in this patch too, as they
> are all callable on _any_ platform.
> 
> It sounds like we need to abstract this so that platforms where "wfi"
> is complex can handle the "spin on this CPU forever" appropriately.
> 
> While we could use dsb, we're asking a CPU to indefinitely spin in a
> tight loop, which isn't going to be good for power consumption - what
> if we have three CPUs doing that, could it push a SoC over the thermal
> limits?  I don't think that's a question we can confidently answer
> except for specific SoCs.

Yes. If the ondemand governor detects that CPU was busy greater than
80% of the time it bumps to the highest OPP and can lead to higher
temperatures though CPU might not be doing anything useful.

>
Russell King (Oracle) April 11, 2018, 1:10 p.m. UTC | #7
On Wed, Apr 11, 2018 at 06:29:21PM +0530, Keerthy wrote:
> On Wednesday 11 April 2018 06:22 PM, Russell King - ARM Linux wrote:
> > That will also go for the other locations in this patch too, as they
> > are all callable on _any_ platform.
> > 
> > It sounds like we need to abstract this so that platforms where "wfi"
> > is complex can handle the "spin on this CPU forever" appropriately.
> > 
> > While we could use dsb, we're asking a CPU to indefinitely spin in a
> > tight loop, which isn't going to be good for power consumption - what
> > if we have three CPUs doing that, could it push a SoC over the thermal
> > limits?  I don't think that's a question we can confidently answer
> > except for specific SoCs.
> 
> Yes. If the ondemand governor detects that CPU was busy greater than
> 80% of the time it bumps to the highest OPP and can lead to higher
> temperatures though CPU might not be doing anything useful.

That probably wouldn't happen - all these paths are concerned with
stopping CPUs doing something as a result of either a panic, a crash
or a failed attempt to reset the system.

We'd enter them in whatever operating state the system was in at the
time, which is indeterminate.  What we can be relatively sure about
is that no further operating state transitions will occur.

For example, in the case of a crash with kexec and a crashdump kernel
loaded, the non-crashing CPUs end up in machine_crash_nonpanic_core().
Should kexec fail, then the system stops, leaving all but one CPU
spinning in that function in whatever operating state they were in,
which could be the highest OPP.

This means that, for example, in the case of a four CPU system, three
CPUs will be spinning hard on whatever instructions we have there,
while one CPU is trying to perform cache operations to prepare to boot
the crashdump kernel.

For a panic, it's very similar - the CPUs which didn't call panic()
are directed to ipi_cpu_stop() where they spin.  By default, a panic()
halts the panicking CPU and nothing further happens, so the other CPUs
will endlessly spin in the same way as above.  The panicking CPU may
be waiting for the panic timeout to expire before trying to reboot the
system.

The OMAP reset case is slightly different, because that's a case of
failure-to-reboot - combine that with a panic timeout, and you can end
up with _all_ CPUs in the system indefinitely spinning hard in a tight
loop.
Tony Lindgren April 11, 2018, 2:11 p.m. UTC | #8
* Russell King - ARM Linux <linux@armlinux.org.uk> [180411 12:53]:
> On Tue, Apr 10, 2018 at 05:12:37PM +0300, Tero Kristo wrote:
> > On 10/04/18 16:41, Tony Lindgren wrote:
> > >* Russell King <rmk+kernel@armlinux.org.uk> [180410 10:43]:
> > >>diff --git a/arch/arm/mach-omap2/prm_common.c b/arch/arm/mach-omap2/prm_common.c
> > >>index 021b5a8b9c0a..d4ddc78b2a0b 100644
> > >>--- a/arch/arm/mach-omap2/prm_common.c
> > >>+++ b/arch/arm/mach-omap2/prm_common.c
> > >>@@ -523,7 +523,7 @@ void omap_prm_reset_system(void)
> > >>  	prm_ll_data->reset_system();
> > >>  	while (1)
> > >>-		cpu_relax();
> > >>+		cpu_do_idle();
> > >>  }
> > >
> > >Hmm we need to check so the added WFI here does not cause an
> > >undesired change to a low power state. Adding Tero to Cc also.
> > 
> > Generally it is a bad idea to call arbitrary WFI within OMAP architecture,
> > as this triggers a PRCM power transition and will most likely cause a hang
> > if not controlled properly.
> > 
> > Has this patch been tested on any platform that supports proper power
> > management?
> 
> That will also go for the other locations in this patch too, as they
> are all callable on _any_ platform.
> 
> It sounds like we need to abstract this so that platforms where "wfi"
> is complex can handle the "spin on this CPU forever" appropriately.
> 
> While we could use dsb, we're asking a CPU to indefinitely spin in a
> tight loop, which isn't going to be good for power consumption - what
> if we have three CPUs doing that, could it push a SoC over the thermal
> limits?  I don't think that's a question we can confidently answer
> except for specific SoCs.

We already have code in the kernel (and in the bootrom) to "park" a
cpu after starting. But using it without resetting the cpu would require
1-1 memory mapping or modifying the code. That is if we wanted to use
the same code also for parking the cpus for kexec without resetting
them.

Regards,

Tony
Russell King (Oracle) April 15, 2018, 2:08 p.m. UTC | #9
On Wed, Apr 11, 2018 at 07:11:39AM -0700, Tony Lindgren wrote:
> * Russell King - ARM Linux <linux@armlinux.org.uk> [180411 12:53]:
> > On Tue, Apr 10, 2018 at 05:12:37PM +0300, Tero Kristo wrote:
> > > On 10/04/18 16:41, Tony Lindgren wrote:
> > > >* Russell King <rmk+kernel@armlinux.org.uk> [180410 10:43]:
> > > >>diff --git a/arch/arm/mach-omap2/prm_common.c b/arch/arm/mach-omap2/prm_common.c
> > > >>index 021b5a8b9c0a..d4ddc78b2a0b 100644
> > > >>--- a/arch/arm/mach-omap2/prm_common.c
> > > >>+++ b/arch/arm/mach-omap2/prm_common.c
> > > >>@@ -523,7 +523,7 @@ void omap_prm_reset_system(void)
> > > >>  	prm_ll_data->reset_system();
> > > >>  	while (1)
> > > >>-		cpu_relax();
> > > >>+		cpu_do_idle();
> > > >>  }
> > > >
> > > >Hmm we need to check so the added WFI here does not cause an
> > > >undesired change to a low power state. Adding Tero to Cc also.
> > > 
> > > Generally it is a bad idea to call arbitrary WFI within OMAP architecture,
> > > as this triggers a PRCM power transition and will most likely cause a hang
> > > if not controlled properly.
> > > 
> > > Has this patch been tested on any platform that supports proper power
> > > management?
> > 
> > That will also go for the other locations in this patch too, as they
> > are all callable on _any_ platform.
> > 
> > It sounds like we need to abstract this so that platforms where "wfi"
> > is complex can handle the "spin on this CPU forever" appropriately.
> > 
> > While we could use dsb, we're asking a CPU to indefinitely spin in a
> > tight loop, which isn't going to be good for power consumption - what
> > if we have three CPUs doing that, could it push a SoC over the thermal
> > limits?  I don't think that's a question we can confidently answer
> > except for specific SoCs.
> 
> We already have code in the kernel (and in the bootrom) to "park" a
> cpu after starting. But using it without resetting the cpu would require
> 1-1 memory mapping or modifying the code. That is if we wanted to use
> the same code also for parking the cpus for kexec without resetting
> them.

In which case, how about using:

	while (1) {
		cpu_relax();
		wfe();
	}

instead - that appears to also have the desired effect, allowing kdump
to work on the SDP4430.
Russell King (Oracle) April 15, 2018, 3:50 p.m. UTC | #10
On Sun, Apr 15, 2018 at 03:08:34PM +0100, Russell King - ARM Linux wrote:
> On Wed, Apr 11, 2018 at 07:11:39AM -0700, Tony Lindgren wrote:
> > We already have code in the kernel (and in the bootrom) to "park" a
> > cpu after starting. But using it without resetting the cpu would require
> > 1-1 memory mapping or modifying the code. That is if we wanted to use
> > the same code also for parking the cpus for kexec without resetting
> > them.
> 
> In which case, how about using:
> 
> 	while (1) {
> 		cpu_relax();
> 		wfe();
> 	}
> 
> instead - that appears to also have the desired effect, allowing kdump
> to work on the SDP4430.

... but results in compile failures on non-ARMv7 targets.
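
One way such a failure could be avoided is to guard the instruction at
compile time and fall back to a plain compiler barrier elsewhere.  The
following is only a userspace sketch under stated assumptions: it tests
the compiler-predefined ACLE macro __ARM_ARCH, whereas kernel code would
test __LINUX_ARM_ARCH__, and the names PARK_HAVE_WFE, park_wait and
park_bounded are hypothetical.

```c
/*
 * Assumption: __ARM_ARCH is predefined by the compiler (ACLE).  On any
 * target that is not 32-bit ARMv7 or later, wfe is never emitted, so
 * the file still builds there.
 */
#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
#define PARK_HAVE_WFE 1
static inline void park_wait(void)
{
	/* Wait for an event; the "memory" clobber is a compiler barrier. */
	__asm__ __volatile__("wfe" : : : "memory");
}
#else
#define PARK_HAVE_WFE 0
static inline void park_wait(void)
{
	/* Fallback: compiler barrier only, no instruction is emitted. */
	__asm__ __volatile__("" : : : "memory");
}
#endif

/*
 * Bounded stand-in for the "while (1)" loop so the sketch terminates;
 * returns the number of wait steps performed.
 */
unsigned int park_bounded(unsigned int iterations)
{
	unsigned int done = 0;

	while (done < iterations) {
		park_wait();
		done++;
	}
	return done;
}
```

The fallback branch keeps the tight loop on older targets, so it only
addresses the build failure, not the power consumption concern raised
earlier in the thread.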

Patch

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index 6b38d7a634c1..75d4f5ce6cfd 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -91,8 +91,9 @@  void machine_crash_nonpanic_core(void *unused)
 
 	set_cpu_online(smp_processor_id(), false);
 	atomic_dec(&waiting_for_crash_ipi);
+
 	while (1)
-		cpu_relax();
+		cpu_do_idle();
 }
 
 static void machine_kexec_mask_interrupts(void)
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 5c7ec00a500e..cbaba4a15a3a 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -568,7 +568,7 @@  static void ipi_cpu_stop(unsigned int cpu)
 	local_irq_disable();
 
 	while (1)
-		cpu_relax();
+		cpu_do_idle();
 }
 
 static DEFINE_PER_CPU(struct completion *, cpu_completion);
diff --git a/arch/arm/mach-omap2/prm_common.c b/arch/arm/mach-omap2/prm_common.c
index 021b5a8b9c0a..d4ddc78b2a0b 100644
--- a/arch/arm/mach-omap2/prm_common.c
+++ b/arch/arm/mach-omap2/prm_common.c
@@ -523,7 +523,7 @@  void omap_prm_reset_system(void)
 	prm_ll_data->reset_system();
 
 	while (1)
-		cpu_relax();
+		cpu_do_idle();
 }
 
 /**