Message ID | 1433517104-7595-3-git-send-email-wxt@rock-chips.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Caesar, On Fri, Jun 5, 2015 at 8:11 AM, Caesar Wang <wxt@rock-chips.com> wrote: > In idle mode, core1/2/3 of Cortex-A17 should be either power off or in > WFI/WFE state. > we can delay 1ms to ensure the CPU enter WFI/WFE state. > > Signed-off-by: Caesar Wang <wxt@rock-chips.com> > --- > > arch/arm/mach-rockchip/platsmp.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/arch/arm/mach-rockchip/platsmp.c b/arch/arm/mach-rockchip/platsmp.c > index 25da16f..6672fdd 100644 > --- a/arch/arm/mach-rockchip/platsmp.c > +++ b/arch/arm/mach-rockchip/platsmp.c > @@ -325,6 +325,9 @@ static void __init rockchip_smp_prepare_cpus(unsigned int max_cpus) > #ifdef CONFIG_HOTPLUG_CPU > static int rockchip_cpu_kill(unsigned int cpu) > { > + /* ensure CPU can enter the WFI/WFE state */ > + mdelay(1); This is a pretty weak assurance. Is there any stronger assurance you can give that we're in WFI/WFE state and won't come out of it? Do you actually see problems if you power off a CPU when it's not in WFI/WFE state? ...so I _think_ I see the path that is happening here and what you're trying to handle. Specifically, I see: On dying CPU: 1. cpu_die() calls 'complete(&cpu_died)' 2. cpu_die() calls 'smp_ops.cpu_die(cpu)' AKA rockchip_cpu_die() 3. rockchip_cpu_die() does a bit more cache flushing before looping in cpu_do_idle() The problem is that the moment the completion happens in step #1 above the dying CPU can be killed. ...so you're trying to make sure the dying CPU makes it to cpu_do_idle(). In that case a fixed mdelay(1) might be OK since the time that the CPU takes to run through a few instructions (with no interrupts) is pretty predictable. It would be really nice if the commit message went through all this, though. ...but is there any chance that cpu_do_idle() could somehow return? We shouldn't send any events since we've marked the core offline, but perhaps some per-core interrupt (arch timer?) that didn't get migrated? -Doug
On Fri, Jun 05, 2015 at 10:49:14AM -0700, Doug Anderson wrote: > On Fri, Jun 5, 2015 at 8:11 AM, Caesar Wang <wxt@rock-chips.com> wrote: > > In idle mode, core1/2/3 of Cortex-A17 should be either power off or in > > WFI/WFE state. > > we can delay 1ms to ensure the CPU enter WFI/WFE state. > > > > Signed-off-by: Caesar Wang <wxt@rock-chips.com> > > --- > > > > arch/arm/mach-rockchip/platsmp.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/arch/arm/mach-rockchip/platsmp.c b/arch/arm/mach-rockchip/platsmp.c > > index 25da16f..6672fdd 100644 > > --- a/arch/arm/mach-rockchip/platsmp.c > > +++ b/arch/arm/mach-rockchip/platsmp.c > > @@ -325,6 +325,9 @@ static void __init rockchip_smp_prepare_cpus(unsigned int max_cpus) > > #ifdef CONFIG_HOTPLUG_CPU > > static int rockchip_cpu_kill(unsigned int cpu) > > { > > + /* ensure CPU can enter the WFI/WFE state */ > > + mdelay(1); > > This is a pretty weak assurance. Is there any stronger assurance you > can give that we're in WFI/WFE state and won't come out of it? > > Do you actually see problems if you power off a CPU when it's not in > WFI/WFE state? I really don't like to see platforms pulling crap tricks in their CPU hotunplug code like this. If there's something wrong with the generic code, then the generic code needs to be fixed, not worked around in platform code. > ...so I _think_ I see the path that is happening here and what you're > trying to handle. Specifically, I see: > > On dying CPU: > 1. cpu_die() calls 'complete(&cpu_died)' > 2. cpu_die() calls 'smp_ops.cpu_die(cpu)' AKA rockchip_cpu_die() > 3. rockchip_cpu_die() does a bit more cache flushing before looping in > cpu_do_idle() > > The problem is that the moment the completion happens in step #1 above > the dying CPU can be killed. ...so you're trying to make sure the > dying CPU makes it to cpu_do_idle(). In that case a fixed mdelay(1) > might be OK since the time that the CPU takes to run through a few > instructions (with no interrupts) is pretty predictable. It would be > really nice if the commit message went through all this, though. > > ...but is there any chance that cpu_do_idle() could somehow return? > We shouldn't send any events since we've marked the core offline, but > perhaps some per-core interrupt (arch timer?) that didn't get > migrated? How this is supposed to work is: CPU requesting death CPU dying stop_machine(take_down_cpu) take_down_cpu() -__cpu_disable() - platform_cpu_disable() - marks CPU offline - migrates IRQs away - caches flushed - tlbs flushed __cpu_die() - processes migrated away -wait_for_completion_timeout() -timekeeping migrated away returns to idle loop arch_cpu_idle_dead() -cpu_die() - idle_task_exit() - flush_cache_louis() At this point, dirty cache lines which matter are flushed from the dying CPUs cache. However, we still need the dying CPU to be coherent for the next step, which is to issue the completion. - complete() - flush_cache_louis() At some point during the above, the completion becomes visible to the other CPU, and it continues its execution. -pr_notice() -platform_cpu_kill() - smp_ops.cpu_die() The precise ordering of smp_ops.cpu_die() vs platform_cpu_kill() is pretty much indeterminant, and is subject to scheduling effects which could delay the requesting CPU. Now, the problem is, from what ARM has been saying, that cache lines can get migrated to the dying CPU which may contain the "master" copy of the data, which means that when the dying CPU actually exits coherency, that data is lost to the rest of the system. Ideally, what we would like to do is to exit coherency earlier, and then have some notification mechanism between the dying CPU and the rest of the system to say that it's finished exiting coherency. However, we face several issues. 1) v7_coherency_exit() is specific to v7 CPUs and can't be used by generic code. 2) we have no way to perform that notification reliably once coherency has been exited. There's other reasons we need to change the notification mechanism, one of them is that completions use RCU, and RCU isn't actually valid in this context - and we can't make it valid because the dying CPU might loose power at any moment after it's called complete(). Paul McKenny created what was a generic infrastructure for this notification, but I object to it on principle using atomic_t... and if we've exited coherency (as we would want to), it won't work because it makes use of atomic_cmpxchg() on the dying CPU. I've been working on a solution to use an IPI to notify from the dying CPU, but that's fraught because of the separation of the IRQ controller code from the architecture code, and we now have spinlocks in the IPI-raising path due to the big.LITTLE stuff - and spinlocks need coherency to be working. So, we're actually in a very sticky position over taking CPUs offline. It seems to be something that the ARM architecture and kernel architecture doesn't actually allow to be done safely. So much so, that in a similar way to the original Keystone 2 physical address switch, I'm tempted to make taking a CPU offline taint the kernel! We're going to end up with tainted kernels through this path when Paul pushes his RCU correctness patches anyway anyway, so it's just a matter of the inevitable taint happening sooner or later.
Russell, On Fri, Jun 5, 2015 at 11:29 AM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > 1) v7_coherency_exit() is specific to v7 CPUs and can't be used by > generic code. Oh, I see. So (I think) you're saying that perhaps the reason that Caesar needed his patch was that he needed the dying processor to execute v7_exit_coherency_flush(), NOT that he needed the dying processor to be in WFI/WFE. That actually makes a lot more sense to me! :) Thanks a lot for pointing that out, it's very helpful. > So, we're actually in a very sticky position over taking CPUs offline. > It seems to be something that the ARM architecture and kernel > architecture doesn't actually allow to be done safely. So much so, > that in a similar way to the original Keystone 2 physical address > switch, I'm tempted to make taking a CPU offline taint the kernel! Wow, that's going to suck. So if you want to suspend / resume you need to taint your kernel. So much for saving the planet by going into suspend... ...or are you thinking that it won't taint the kernel when the kernel takes CPUs offline for suspend/resume purposes? ...or are you thinking you've some solution that works for suspend/resume that doesn't work for the general cpu offlining problem? I'd be very interested to hear... I know I'm not a maintainer, but if I were and I knew that lots of smart people had thought about the problem of CPU offlining and they didn't have a solution and I could make my platform 99.99999999% reliable by allowing a very safe mdelay(1) where I had a pretty strong guarantee that the 1ms was enough time, I would probably accept that code... So since I'm not a maintainer and I certainly couldn't ack such code, I would certainly be happy to add my Reviewed-by to Caesar's patch if he changed it mention that he needed to make sure that v7_exit_coherency_flush() in rockchip_cpu_die() executed in time. -Doug
? 2015?06?06? 04:24, Doug Anderson ??: > Russell, > > On Fri, Jun 5, 2015 at 11:29 AM, Russell King - ARM Linux > <linux@arm.linux.org.uk> wrote: >> 1) v7_coherency_exit() is specific to v7 CPUs and can't be used by >> generic code. > Oh, I see. So (I think) you're saying that perhaps the reason that > Caesar needed his patch was that he needed the dying processor to > execute v7_exit_coherency_flush(), NOT that he needed the dying > processor to be in WFI/WFE. That actually makes a lot more sense to > me! :) Thanks a lot for pointing that out, it's very helpful. > > >> So, we're actually in a very sticky position over taking CPUs offline. >> It seems to be something that the ARM architecture and kernel >> architecture doesn't actually allow to be done safely. So much so, >> that in a similar way to the original Keystone 2 physical address >> switch, I'm tempted to make taking a CPU offline taint the kernel! > Wow, that's going to suck. So if you want to suspend / resume you > need to taint your kernel. So much for saving the planet by going > into suspend... ...or are you thinking that it won't taint the kernel > when the kernel takes CPUs offline for suspend/resume purposes? ...or > are you thinking you've some solution that works for suspend/resume > that doesn't work for the general cpu offlining problem? I'd be very > interested to hear... > > > I know I'm not a maintainer, but if I were and I knew that lots of > smart people had thought about the problem of CPU offlining and they > didn't have a solution and I could make my platform 99.99999999% > reliable by allowing a very safe mdelay(1) where I had a pretty strong > guarantee that the 1ms was enough time, I would probably accept that > code... > > > So since I'm not a maintainer and I certainly couldn't ack such code, > I would certainly be happy to add my Reviewed-by to Caesar's patch if > he changed it mention that he needed to make sure that > v7_exit_coherency_flush() in rockchip_cpu_die() executed in time. OK. The dying processor to execute v7_exit_coherency_flush(),not that the dying processor to be in WFI/WFE. It's needed to enter WFI/WFE state from the ARM refer document when CPU down. But...... Here is my test: (won't to enter the WFI state) @@ -331,8 +331,8 @@ static int rockchip_cpu_kill(unsigned int cpu) static void rockchip_cpu_die(unsigned int cpu) { v7_exit_coherency_flush(louis); - while (1) - cpu_do_idle(); + while (1); + //cpu_do_idle(); } echo 0 > /sys/module/printk/parameters/console_suspend echo 1 > /sys/power/pm_print_times echo mem > sys/power/state You can play anything or do some test for CPU up/down: cd /sys/devices/system/cpu/ for i in $(seq 10000); do echo "================= $i ============" for j in $(seq 100); do while [[ "$(cat cpu1/online)$(cat cpu2/online)$(cat cpu3/online)" != "000" ]]; do echo 0 > cpu1/online echo 0 > cpu2/online echo 0 > cpu3/online done while [[ "$(cat cpu1/online)$(cat cpu2/online)$(cat cpu3/online)" != "111" ]]; do echo 1 > cpu1/online echo 1 > cpu2/online echo 1 > cpu3/online done done done Sometimes,the system will be restart when do the about test. I'm no sure what's happen, That maybe abnormal won't to enter the WFI state. > > -Doug > > >
Caesar, On Sun, Jun 7, 2015 at 9:56 PM, Caesar Wang <wxt@rock-chips.com> wrote: > OK. > The dying processor to execute v7_exit_coherency_flush(),not that the dying > processor to be in WFI/WFE. > > It's needed to enter WFI/WFE state from the ARM refer document when CPU > down. > > But...... > > Here is my test: (won't to enter the WFI state) > @@ -331,8 +331,8 @@ static int rockchip_cpu_kill(unsigned int cpu) > static void rockchip_cpu_die(unsigned int cpu) > { > v7_exit_coherency_flush(louis); > - while (1) > - cpu_do_idle(); > + while (1); > + //cpu_do_idle(); > } > > echo 0 > /sys/module/printk/parameters/console_suspend > echo 1 > /sys/power/pm_print_times > echo mem > sys/power/state > > You can play anything > or do some test for CPU up/down: > > cd /sys/devices/system/cpu/ > for i in $(seq 10000); do > echo "================= $i ============" > for j in $(seq 100); do > while [[ "$(cat cpu1/online)$(cat cpu2/online)$(cat cpu3/online)" != > "000" ]]; do > > echo 0 > cpu1/online > echo 0 > cpu2/online > echo 0 > cpu3/online > done > while [[ "$(cat cpu1/online)$(cat cpu2/online)$(cat cpu3/online)" != > "111" ]]; do > echo 1 > cpu1/online > echo 1 > cpu2/online > echo 1 > cpu3/online > done > done > done > Sometimes,the system will be restart when do the about test. I assume you meant the _above_ test. So it sometimes works and sometimes doesn't? Strange... > I'm no sure what's happen, That maybe abnormal won't to enter the WFI state. Good test. I think I understood what you said above: basically if you don't get to the WFI state then the system will sometimes restart. I guess that seems a little strange to me since I would have thought that since you assert "reset" to the CPU before you power it off it wouldn't matter a whole lot what state the system was in. The processor has exited concurrency and isn't touching any memory, so nothing it does should be hurting anyone. ...but I also am only guessing about how all this works / trying to use common sense. If we really need to be in WFI/WFE then I guess that's what we need to do. It still makes me nervous to say both that we "need to be in WFI" and that we have a loop around cpu_do_idle(). The loop implies that cpu_do_idle() might return sometimes. That would be OK, except now we've learned that if we happen to kill the CPU while we're executing the loop that it might crash the system. That's pretty non-ideal. I know the chances are incredibly unlikely, but still something that worries me... -Doug
diff --git a/arch/arm/mach-rockchip/platsmp.c b/arch/arm/mach-rockchip/platsmp.c index 25da16f..6672fdd 100644 --- a/arch/arm/mach-rockchip/platsmp.c +++ b/arch/arm/mach-rockchip/platsmp.c @@ -325,6 +325,9 @@ static void __init rockchip_smp_prepare_cpus(unsigned int max_cpus) #ifdef CONFIG_HOTPLUG_CPU static int rockchip_cpu_kill(unsigned int cpu) { + /* ensure CPU can enter the WFI/WFE state */ + mdelay(1); + pmu_set_power_domain(0 + cpu, false); return 1; }
In idle mode, core1/2/3 of Cortex-A17 should be either power off or in WFI/WFE state. we can delay 1ms to ensure the CPU enter WFI/WFE state. Signed-off-by: Caesar Wang <wxt@rock-chips.com> --- arch/arm/mach-rockchip/platsmp.c | 3 +++ 1 file changed, 3 insertions(+)