[v3,2/3] ARM: rockchip: ensure CPU to enter WFI/WFE state
diff mbox

Message ID 1433517104-7595-3-git-send-email-wxt@rock-chips.com
State New
Headers show

Commit Message

Caesar Wang June 5, 2015, 3:11 p.m. UTC
In idle mode, core1/2/3 of Cortex-A17 should be either power off or in
WFI/WFE state.
we can delay 1ms to ensure the CPU enter WFI/WFE state.

Signed-off-by: Caesar Wang <wxt@rock-chips.com>
---

 arch/arm/mach-rockchip/platsmp.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Doug Anderson June 5, 2015, 5:49 p.m. UTC | #1
Caesar,

On Fri, Jun 5, 2015 at 8:11 AM, Caesar Wang <wxt@rock-chips.com> wrote:
> In idle mode, core1/2/3 of Cortex-A17 should be either power off or in
> WFI/WFE state.
> we can delay 1ms to ensure the CPU enter WFI/WFE state.
>
> Signed-off-by: Caesar Wang <wxt@rock-chips.com>
> ---
>
>  arch/arm/mach-rockchip/platsmp.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/arm/mach-rockchip/platsmp.c b/arch/arm/mach-rockchip/platsmp.c
> index 25da16f..6672fdd 100644
> --- a/arch/arm/mach-rockchip/platsmp.c
> +++ b/arch/arm/mach-rockchip/platsmp.c
> @@ -325,6 +325,9 @@ static void __init rockchip_smp_prepare_cpus(unsigned int max_cpus)
>  #ifdef CONFIG_HOTPLUG_CPU
>  static int rockchip_cpu_kill(unsigned int cpu)
>  {
> +       /* ensure CPU can enter the WFI/WFE state */
> +       mdelay(1);

This is a pretty weak assurance.  Is there any stronger assurance you
can give that we're in WFI/WFE state and won't come out of it?

Do you actually see problems if you power off a CPU when it's not in
WFI/WFE state?

...so I _think_ I see the path that is happening here and what you're
trying to handle.  Specifically, I see:

On dying CPU:
1. cpu_die() calls 'complete(&cpu_died)'
2. cpu_die() calls 'smp_ops.cpu_die(cpu)' AKA rockchip_cpu_die()
3. rockchip_cpu_die() does a bit more cache flushing before looping in
cpu_do_idle()

The problem is that the moment the completion happens in step #1 above
the dying CPU can be killed.  ...so you're trying to make sure the
dying CPU makes it to cpu_do_idle().  In that case a fixed mdelay(1)
might be OK since the time that the CPU takes to run through a few
instructions (with no interrupts) is pretty predictable.  It would be
really nice if the commit message went through all this, though.

...but is there any chance that cpu_do_idle() could somehow return?
We shouldn't send any events since we've marked the core offline, but
perhaps some per-core interrupt (arch timer?) that didn't get
migrated?


-Doug
Russell King - ARM Linux June 5, 2015, 6:29 p.m. UTC | #2
On Fri, Jun 05, 2015 at 10:49:14AM -0700, Doug Anderson wrote:
> On Fri, Jun 5, 2015 at 8:11 AM, Caesar Wang <wxt@rock-chips.com> wrote:
> > In idle mode, core1/2/3 of Cortex-A17 should be either power off or in
> > WFI/WFE state.
> > we can delay 1ms to ensure the CPU enter WFI/WFE state.
> >
> > Signed-off-by: Caesar Wang <wxt@rock-chips.com>
> > ---
> >
> >  arch/arm/mach-rockchip/platsmp.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/arch/arm/mach-rockchip/platsmp.c b/arch/arm/mach-rockchip/platsmp.c
> > index 25da16f..6672fdd 100644
> > --- a/arch/arm/mach-rockchip/platsmp.c
> > +++ b/arch/arm/mach-rockchip/platsmp.c
> > @@ -325,6 +325,9 @@ static void __init rockchip_smp_prepare_cpus(unsigned int max_cpus)
> >  #ifdef CONFIG_HOTPLUG_CPU
> >  static int rockchip_cpu_kill(unsigned int cpu)
> >  {
> > +       /* ensure CPU can enter the WFI/WFE state */
> > +       mdelay(1);
> 
> This is a pretty weak assurance.  Is there any stronger assurance you
> can give that we're in WFI/WFE state and won't come out of it?
> 
> Do you actually see problems if you power off a CPU when it's not in
> WFI/WFE state?

I really don't like to see platforms pulling crap tricks in their
CPU hotunplug code like this.  If there's something wrong with the
generic code, then the generic code needs to be fixed, not worked
around in platform code.

> ...so I _think_ I see the path that is happening here and what you're
> trying to handle.  Specifically, I see:
> 
> On dying CPU:
> 1. cpu_die() calls 'complete(&cpu_died)'
> 2. cpu_die() calls 'smp_ops.cpu_die(cpu)' AKA rockchip_cpu_die()
> 3. rockchip_cpu_die() does a bit more cache flushing before looping in
> cpu_do_idle()
> 
> The problem is that the moment the completion happens in step #1 above
> the dying CPU can be killed.  ...so you're trying to make sure the
> dying CPU makes it to cpu_do_idle().  In that case a fixed mdelay(1)
> might be OK since the time that the CPU takes to run through a few
> instructions (with no interrupts) is pretty predictable.  It would be
> really nice if the commit message went through all this, though.
> 
> ...but is there any chance that cpu_do_idle() could somehow return?
> We shouldn't send any events since we've marked the core offline, but
> perhaps some per-core interrupt (arch timer?) that didn't get
> migrated?

How this is supposed to work is:

	CPU requesting death		CPU dying
	stop_machine(take_down_cpu)
					take_down_cpu()
					-__cpu_disable()
					- platform_cpu_disable()
					- marks CPU offline
					- migrates IRQs away
					- caches flushed
					- tlbs flushed
	__cpu_die()			- processes migrated away
	-wait_for_completion_timeout()	-timekeeping migrated away
					returns to idle loop
					arch_cpu_idle_dead()
					-cpu_die()
					- idle_task_exit()
					- flush_cache_louis()

At this point, dirty cache lines which matter are flushed from the
dying CPUs cache.  However, we still need the dying CPU to be
coherent for the next step, which is to issue the completion.

					- complete()
					- flush_cache_louis()

At some point during the above, the completion becomes visible to
the other CPU, and it continues its execution.

	-pr_notice()
	-platform_cpu_kill()		- smp_ops.cpu_die()

The precise ordering of smp_ops.cpu_die() vs platform_cpu_kill() is
pretty much indeterminant, and is subject to scheduling effects which
could delay the requesting CPU.

Now, the problem is, from what ARM has been saying, that cache lines
can get migrated to the dying CPU which may contain the "master" copy
of the data, which means that when the dying CPU actually exits
coherency, that data is lost to the rest of the system.

Ideally, what we would like to do is to exit coherency earlier, and
then have some notification mechanism between the dying CPU and the
rest of the system to say that it's finished exiting coherency.
However, we face several issues.

1) v7_coherency_exit() is specific to v7 CPUs and can't be used by
   generic code.

2) we have no way to perform that notification reliably once
   coherency has been exited.

There's other reasons we need to change the notification mechanism,
one of them is that completions use RCU, and RCU isn't actually valid
in this context - and we can't make it valid because the dying CPU
might loose power at any moment after it's called complete().

Paul McKenny created what was a generic infrastructure for this
notification, but I object to it on principle using atomic_t... and
if we've exited coherency (as we would want to), it won't work
because it makes use of atomic_cmpxchg() on the dying CPU.

I've been working on a solution to use an IPI to notify from the
dying CPU, but that's fraught because of the separation of the IRQ
controller code from the architecture code, and we now have spinlocks
in the IPI-raising path due to the big.LITTLE stuff - and spinlocks
need coherency to be working.

So, we're actually in a very sticky position over taking CPUs offline.
It seems to be something that the ARM architecture and kernel
architecture doesn't actually allow to be done safely.  So much so,
that in a similar way to the original Keystone 2 physical address
switch, I'm tempted to make taking a CPU offline taint the kernel!

We're going to end up with tainted kernels through this path when Paul
pushes his RCU correctness patches anyway anyway, so it's just a
matter of the inevitable taint happening sooner or later.
Doug Anderson June 5, 2015, 8:24 p.m. UTC | #3
Russell,

On Fri, Jun 5, 2015 at 11:29 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> 1) v7_coherency_exit() is specific to v7 CPUs and can't be used by
>    generic code.

Oh, I see.  So (I think) you're saying that perhaps the reason that
Caesar needed his patch was that he needed the dying processor to
execute v7_exit_coherency_flush(), NOT that he needed the dying
processor to be in WFI/WFE.  That actually makes a lot more sense to
me!  :)  Thanks a lot for pointing that out, it's very helpful.


> So, we're actually in a very sticky position over taking CPUs offline.
> It seems to be something that the ARM architecture and kernel
> architecture doesn't actually allow to be done safely.  So much so,
> that in a similar way to the original Keystone 2 physical address
> switch, I'm tempted to make taking a CPU offline taint the kernel!

Wow, that's going to suck.  So if you want to suspend / resume you
need to taint your kernel.  So much for saving the planet by going
into suspend...  ...or are you thinking that it won't taint the kernel
when the kernel takes CPUs offline for suspend/resume purposes?  ...or
are you thinking you've some solution that works for suspend/resume
that doesn't work for the general cpu offlining problem?  I'd be very
interested to hear...


I know I'm not a maintainer, but if I were and I knew that lots of
smart people had thought about the problem of CPU offlining and they
didn't have a solution and I could make my platform 99.99999999%
reliable by allowing a very safe mdelay(1) where I had a pretty strong
guarantee that the 1ms was enough time, I would probably accept that
code...


So since I'm not a maintainer and I certainly couldn't ack such code,
I would certainly be happy to add my Reviewed-by to Caesar's patch if
he changed it mention that he needed to make sure that
v7_exit_coherency_flush() in rockchip_cpu_die() executed in time.


-Doug
Caesar Wang June 8, 2015, 4:56 a.m. UTC | #4
? 2015?06?06? 04:24, Doug Anderson ??:
> Russell,
>
> On Fri, Jun 5, 2015 at 11:29 AM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
>> 1) v7_coherency_exit() is specific to v7 CPUs and can't be used by
>>     generic code.
> Oh, I see.  So (I think) you're saying that perhaps the reason that
> Caesar needed his patch was that he needed the dying processor to
> execute v7_exit_coherency_flush(), NOT that he needed the dying
> processor to be in WFI/WFE.  That actually makes a lot more sense to
> me!  :)  Thanks a lot for pointing that out, it's very helpful.
>
>
>> So, we're actually in a very sticky position over taking CPUs offline.
>> It seems to be something that the ARM architecture and kernel
>> architecture doesn't actually allow to be done safely.  So much so,
>> that in a similar way to the original Keystone 2 physical address
>> switch, I'm tempted to make taking a CPU offline taint the kernel!
> Wow, that's going to suck.  So if you want to suspend / resume you
> need to taint your kernel.  So much for saving the planet by going
> into suspend...  ...or are you thinking that it won't taint the kernel
> when the kernel takes CPUs offline for suspend/resume purposes?  ...or
> are you thinking you've some solution that works for suspend/resume
> that doesn't work for the general cpu offlining problem?  I'd be very
> interested to hear...
>
>
> I know I'm not a maintainer, but if I were and I knew that lots of
> smart people had thought about the problem of CPU offlining and they
> didn't have a solution and I could make my platform 99.99999999%
> reliable by allowing a very safe mdelay(1) where I had a pretty strong
> guarantee that the 1ms was enough time, I would probably accept that
> code...
>
>
> So since I'm not a maintainer and I certainly couldn't ack such code,
> I would certainly be happy to add my Reviewed-by to Caesar's patch if
> he changed it mention that he needed to make sure that
> v7_exit_coherency_flush() in rockchip_cpu_die() executed in time.

OK.
The dying processor to execute v7_exit_coherency_flush(),not that the 
dying processor to be in WFI/WFE.

It's needed to enter WFI/WFE state from the ARM refer document when CPU 
down.

But......

Here is my test: (won't to enter the WFI state)
@@ -331,8 +331,8 @@ static int rockchip_cpu_kill(unsigned int cpu)
  static void rockchip_cpu_die(unsigned int cpu)
  {
         v7_exit_coherency_flush(louis);
-       while (1)
-               cpu_do_idle();
+       while (1);
+               //cpu_do_idle();
  }

echo 0 > /sys/module/printk/parameters/console_suspend
echo 1 > /sys/power/pm_print_times
echo mem > sys/power/state

You can play anything
or do some test for CPU up/down:

cd /sys/devices/system/cpu/
for i in $(seq 10000); do
   echo "================= $i ============"
   for j in $(seq 100); do
     while [[ "$(cat cpu1/online)$(cat cpu2/online)$(cat cpu3/online)" 
!= "000" ]]; do

       echo 0 > cpu1/online
       echo 0 > cpu2/online
       echo 0 > cpu3/online
     done
     while [[ "$(cat cpu1/online)$(cat cpu2/online)$(cat cpu3/online)" 
!= "111" ]]; do
       echo 1 > cpu1/online
       echo 1 > cpu2/online
       echo 1 > cpu3/online
     done
   done
done
Sometimes,the system will be restart when do the about test.
I'm no sure what's happen, That maybe abnormal won't to enter the WFI 
state.

>
> -Doug
>
>
>
Doug Anderson June 8, 2015, 8:52 p.m. UTC | #5
Caesar,

On Sun, Jun 7, 2015 at 9:56 PM, Caesar Wang <wxt@rock-chips.com> wrote:
> OK.
> The dying processor to execute v7_exit_coherency_flush(),not that the dying
> processor to be in WFI/WFE.
>
> It's needed to enter WFI/WFE state from the ARM refer document when CPU
> down.
>
> But......
>
> Here is my test: (won't to enter the WFI state)
> @@ -331,8 +331,8 @@ static int rockchip_cpu_kill(unsigned int cpu)
>  static void rockchip_cpu_die(unsigned int cpu)
>  {
>         v7_exit_coherency_flush(louis);
> -       while (1)
> -               cpu_do_idle();
> +       while (1);
> +               //cpu_do_idle();
>  }
>
> echo 0 > /sys/module/printk/parameters/console_suspend
> echo 1 > /sys/power/pm_print_times
> echo mem > sys/power/state
>
> You can play anything
> or do some test for CPU up/down:
>
> cd /sys/devices/system/cpu/
> for i in $(seq 10000); do
>   echo "================= $i ============"
>   for j in $(seq 100); do
>     while [[ "$(cat cpu1/online)$(cat cpu2/online)$(cat cpu3/online)" !=
> "000" ]]; do
>
>       echo 0 > cpu1/online
>       echo 0 > cpu2/online
>       echo 0 > cpu3/online
>     done
>     while [[ "$(cat cpu1/online)$(cat cpu2/online)$(cat cpu3/online)" !=
> "111" ]]; do
>       echo 1 > cpu1/online
>       echo 1 > cpu2/online
>       echo 1 > cpu3/online
>     done
>   done
> done
> Sometimes,the system will be restart when do the about test.

I assume you meant the _above_ test.  So it sometimes works and
sometimes doesn't?  Strange...

> I'm no sure what's happen, That maybe abnormal won't to enter the WFI state.

Good test.  I think I understood what you said above: basically if you
don't get to the WFI state then the system will sometimes restart.

I guess that seems a little strange to me since I would have thought
that since you assert "reset" to the CPU before you power it off it
wouldn't matter a whole lot what state the system was in.  The
processor has exited concurrency and isn't touching any memory, so
nothing it does should be hurting anyone.  ...but I also am only
guessing about how all this works / trying to use common sense.  If we
really need to be in WFI/WFE then I guess that's what we need to do.

It still makes me nervous to say both that we "need to be in WFI" and
that we have a loop around cpu_do_idle().  The loop implies that
cpu_do_idle() might return sometimes.  That would be OK, except now
we've learned that if we happen to kill the CPU while we're executing
the loop that it might crash the system.  That's pretty non-ideal.  I
know the chances are incredibly unlikely, but still something that
worries me...


-Doug

Patch
diff mbox

diff --git a/arch/arm/mach-rockchip/platsmp.c b/arch/arm/mach-rockchip/platsmp.c
index 25da16f..6672fdd 100644
--- a/arch/arm/mach-rockchip/platsmp.c
+++ b/arch/arm/mach-rockchip/platsmp.c
@@ -325,6 +325,9 @@  static void __init rockchip_smp_prepare_cpus(unsigned int max_cpus)
 #ifdef CONFIG_HOTPLUG_CPU
 static int rockchip_cpu_kill(unsigned int cpu)
 {
+	/* ensure CPU can enter the WFI/WFE state */
+	mdelay(1);
+
 	pmu_set_power_domain(0 + cpu, false);
 	return 1;
 }