Message ID | 1353003478-22291-1-git-send-email-srinivas.kandagatla@st.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hello, On Thu, Nov 15, 2012 at 06:17:58PM +0000, Srinivas KANDAGATLA wrote: > From: Srinivas Kandagatla <srinivas.kandagatla@st.com> > > Use-case is simple kexec on smp system. > > Lets say there are two cores core-0 and core-1. > > sequence here is: > As part of kexec call, the secondary cores are stopped by calling > machine_shutdown(), which in turn issues IPIs to off-line the other CPUs > and then call smp_ops.cpu_kill. > > machine_shutdown invokes smp_send_stop from core-0 which then sends > IPI_CPU_STOP ipi message for core-1 and then smp_kill_cpus call > smp_ops.cpu_kill from core-0 with cpu=1. > > The expectation here is that cpu_kill will relocate secondary-core > to safe location from primary-core(core-0). Not quite. The expectation is actually that you wiggle a line to a power controller and the secondary core either disappears or resets. > This is a catch 22 situation for SOCs, which can't change PC's of other > cores. At the point when cpu_kill is invoked, it cant execute any > function calls for core-1, as it is off-line and is spinning in > cpu_relax. I thought these SoCs were fairly rare in production -- usually there is some way to power down secondary cores. > So the only way is to put the core into safe place by executing some > SOC specific relocation code. > > Having a callback from ipi handler when cpu is stopped would be very > usefull for SOC's which want to relocate the secondary-core to safe > location. I thought about this for Versatile Express but even if you do that, your problems are just beginning. The bigger issue is co-ordinating with the secondaries so you know when they're safely out of the way and you can proceed with the kexec. Remember that you won't have working locks and you can't cross-call a function because it won't actually return. If you can solve the synchronisation problem then we can think about adding these hooks. Will
On 15/11/12 21:18, Will Deacon wrote: > Hello, > > On Thu, Nov 15, 2012 at 06:17:58PM +0000, Srinivas KANDAGATLA wrote: >> From: Srinivas Kandagatla <srinivas.kandagatla@st.com> >> >> Use-case is simple kexec on smp system. >> >> Lets say there are two cores core-0 and core-1. >> >> sequence here is: >> As part of kexec call, the secondary cores are stopped by calling >> machine_shutdown(), which in turn issues IPIs to off-line the other CPUs >> and then call smp_ops.cpu_kill. >> >> machine_shutdown invokes smp_send_stop from core-0 which then sends >> IPI_CPU_STOP ipi message for core-1 and then smp_kill_cpus call >> smp_ops.cpu_kill from core-0 with cpu=1. >> >> The expectation here is that cpu_kill will relocate secondary-core >> to safe location from primary-core(core-0). > Not quite. The expectation is actually that you wiggle a line to a power > controller and the secondary core either disappears or resets. Ok, I got this. But In our chips we can't control power/reset to individual cores, reset is tied up to both the cores. > >> This is a catch 22 situation for SOCs, which can't change PC's of other >> cores. At the point when cpu_kill is invoked, it cant execute any >> function calls for core-1, as it is off-line and is spinning in >> cpu_relax. > I thought these SoCs were fairly rare in production -- usually there is > some way to power down secondary cores. We have at-least 3 chips of this category, and we are looking forward to mainline the linux-support soon. > >> So the only way is to put the core into safe place by executing some >> SOC specific relocation code. >> >> Having a callback from ipi handler when cpu is stopped would be very >> usefull for SOC's which want to relocate the secondary-core to safe >> location. > I thought about this for Versatile Express but even if you do that, your > problems are just beginning. Ok, atleast some SOC has similar requirement. > The bigger issue is co-ordinating with the > secondaries so you know when they're safely out of the way and you can > proceed with the kexec. Remember that you won't have working locks and you > can't cross-call a function because it won't actually return. Yes I agree and understand the limitation. The callback should simple and just relocate the secondary core to a safe place which in our case is a holding pen, which is safe and not in the system ram. After the callback, the secondary core will be spinning in the holding pen which will be released once they get a matching cpu-id. And this approach works perfectly ok on our chips and I guess should work for other chips aswell. > If you can solve the synchronisation problem then we can think about adding > these hooks. I think the call back code should not do anything complex other than few lines of assembly or jump to a holding pen, this way it does not need any synchronization calls or system infrastructure. Most of the SOCs in the current tree return 1 for cpu_kill callback, Am not sure if they fit in this category. I think adding the callback hook would also benefit other chips which fit in this category. ?? Thanks, srini > > Will >
On Fri, Nov 16, 2012 at 08:39:19AM +0000, Srinivas KANDAGATLA wrote: > On 15/11/12 21:18, Will Deacon wrote: > > > > The bigger issue is co-ordinating with the > > secondaries so you know when they're safely out of the way and you can > > proceed with the kexec. Remember that you won't have working locks and you > > can't cross-call a function because it won't actually return. > > Yes I agree and understand the limitation. > The callback should simple and just relocate the secondary core to a > safe place which in our case is a holding pen, which is safe and not in > the system ram. > After the callback, the secondary core will be spinning in the holding > pen which will be released once they get a matching cpu-id. > And this approach works perfectly ok on our chips and I guess should > work for other chips aswell. But how do you know when the callback is complete? That's the part that is tricky as you need to avoid clobbering the kernel image before you know for sure that all the secondaries are out of the way. I think you either need some horrible homebrew locking primitives or something in hardware to signal back to the primary CPU. > > If you can solve the synchronisation problem then we can think about adding > > these hooks. > > I think the call back code should not do anything complex other than few > lines of assembly or jump to a holding pen, this way it does not need > any synchronization calls or system infrastructure. See above. Will
On 19/11/12 10:58, Will Deacon wrote: > On Fri, Nov 16, 2012 at 08:39:19AM +0000, Srinivas KANDAGATLA wrote: >> On 15/11/12 21:18, Will Deacon wrote: >>> The bigger issue is co-ordinating with the >>> secondaries so you know when they're safely out of the way and you can >>> proceed with the kexec. Remember that you won't have working locks and you >>> can't cross-call a function because it won't actually return. >> Yes I agree and understand the limitation. >> The callback should simple and just relocate the secondary core to a >> safe place which in our case is a holding pen, which is safe and not in >> the system ram. >> After the callback, the secondary core will be spinning in the holding >> pen which will be released once they get a matching cpu-id. >> And this approach works perfectly ok on our chips and I guess should >> work for other chips aswell. > But how do you know when the callback is complete? That's the part that is > tricky as you need to avoid clobbering the kernel image before you know for > sure that all the secondaries are out of the way. Yes, there is a very small window of opportunity for the primary core to continue with new kernel image. We have two options here: 1> I think the existing infrastructure of cpu_kill can be re-used to know if the callback is complete and is very much specific to platform implementation. 2> Add a is_cpu_stopped() in smp_operations and call it from primary core with cpu set to secondary cores, similar to smp_kill_cpus call. This is also specific to platform implementation. Both of them will address the concern. ??? > I think you either need > some horrible homebrew locking primitives or something in hardware to signal > back to the primary CPU. > >>> If you can solve the synchronisation problem then we can think about adding >>> these hooks. >> I think the call back code should not do anything complex other than few >> lines of assembly or jump to a holding pen, this way it does not need >> any synchronization calls or system infrastructure. > See above. > > Will
On Mon, Nov 19, 2012 at 12:07:57PM +0000, Srinivas KANDAGATLA wrote: > On 19/11/12 10:58, Will Deacon wrote: > > On Fri, Nov 16, 2012 at 08:39:19AM +0000, Srinivas KANDAGATLA wrote: > >> On 15/11/12 21:18, Will Deacon wrote: > >>> The bigger issue is co-ordinating with the > >>> secondaries so you know when they're safely out of the way and you can > >>> proceed with the kexec. Remember that you won't have working locks and you > >>> can't cross-call a function because it won't actually return. > >> Yes I agree and understand the limitation. > >> The callback should simple and just relocate the secondary core to a > >> safe place which in our case is a holding pen, which is safe and not in > >> the system ram. > >> After the callback, the secondary core will be spinning in the holding > >> pen which will be released once they get a matching cpu-id. > >> And this approach works perfectly ok on our chips and I guess should > >> work for other chips aswell. > > But how do you know when the callback is complete? That's the part that is > > tricky as you need to avoid clobbering the kernel image before you know for > > sure that all the secondaries are out of the way. > Yes, there is a very small window of opportunity for the primary core to > continue with new kernel image. It might not be that small, particulary if the secondaries are doing something with interrupts disabled and the signal from the primary is delayed. > We have two options here: > > 1> I think the existing infrastructure of cpu_kill can be re-used to > know if the callback is complete and is very much specific to platform > implementation. How? I would just like a description of one concrete implementation so that I know we're adding something which is actually usable. > 2> Add a is_cpu_stopped() in smp_operations and call it from primary > core with cpu set to secondary cores, similar to smp_kill_cpus call. > This is also specific to platform implementation. Again, how does is_cpu_stopped() work? I'm not even sure we can guarantee coherency on this path. Perhaps it's best if you include these patches with the platform support for a board that uses it, then we can see the bigger picture. Will
diff --git a/arch/arm/include/asm/smp.h b/arch/arm/include/asm/smp.h index 2e3be16..0071a54 100644 --- a/arch/arm/include/asm/smp.h +++ b/arch/arm/include/asm/smp.h @@ -100,6 +100,13 @@ struct smp_operations { * This also gives us the initial stack to use for this CPU. */ int (*smp_boot_secondary)(unsigned int cpu, struct task_struct *idle); + + /* Notify interested parties before parking core in cpu_relax. + * SOC's can park cores in safe place in this callback, is very useful + * for cases like kexec. Stopping cpu is the victim cpu in this case. + */ + int (*smp_notify_cpu_stop)(unsigned int cpu); + #ifdef CONFIG_HOTPLUG_CPU int (*cpu_kill)(unsigned int cpu); void (*cpu_die)(unsigned int cpu); diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index fbc8b26..06fcef9 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -564,6 +564,9 @@ static void ipi_cpu_stop(unsigned int cpu) local_fiq_disable(); local_irq_disable(); + if (smp_ops.smp_notify_cpu_stop) + smp_ops.smp_notify_cpu_stop(cpu); + while (1) cpu_relax(); }