
[v2] ARM: Don't use complete() during __cpu_die

Message ID alpine.LFD.2.11.1502250941210.25484@knanqh.ubzr (mailing list archive)
State New, archived

Commit Message

Nicolas Pitre Feb. 25, 2015, 4:47 p.m. UTC
On Wed, 25 Feb 2015, Russell King - ARM Linux wrote:

> On Thu, Feb 05, 2015 at 04:11:00PM +0000, Russell King - ARM Linux wrote:
> > On Thu, Feb 05, 2015 at 06:29:18AM -0800, Paul E. McKenney wrote:
> > > Works for me, assuming no hidden uses of RCU in the IPI code.  ;-)
> > 
> > Sigh... I kinda knew it wouldn't be this simple.  The GIC code which
> > actually raises the IPI takes a raw spinlock, so it's not going to be
> > this simple - there's a small theoretical window where we have taken
> > this lock, written the register to send the IPI, and then dropped the
> > lock - the update to the lock to release it could get lost if the
> > CPU power is quickly cut at that point.
> > 
> > Also, we _do_ need the second cache flush in place to ensure that the
> > unlock is seen by other CPUs.
> 
> It's time to start discussing this problem again now that we're the
> other side of the merge window.
> 
> I've been thinking about the lock in the GIC code.  Do we actually need
> this lock in gic_raise_softirq(), or could we move this lock into the
> higher level code?

It could be a rw lock as you say.

> Let's consider the bL switcher.
> 
> I think the bL switcher is racy wrt CPU hotplug at the moment.  What
> happens if we're sleeping in wait_for_completion(&inbound_alive) and
> CPU hotplug unplugs the outgoing CPU?  What protects us against
> this scenario?  I can't see anything in bL_switch_to() which ensures
> that CPU hotplug can't run.

True.  The actual switch would then be suspended in mid air until that 
CPU is plugged back in.  The inbound CPU would wait at mcpm_entry_gated 
until the outbound CPU comes back to open the gate.  There wouldn't be 
much harm besides the minor fact that the inbound CPU would be wasting 
more power while looping on a WFE compared to its previously disabled 
state.  I'm still debating if this is worth fixing.

> Let's assume that this rather glaring bug has been fixed, and that CPU
> hotplug can't run in parallel with the bL switcher (and hence
> gic_migrate_target() can't run concurrently with a CPU being taken
> offline.)

I'm still trying to figure out how this might happen.  At the point 
where gic_migrate_target() is called, IRQs are disabled and nothing can 
prevent the switch from happening anymore.  Any IPI attempting to stop 
that CPU for hotplug would be pending until the inbound CPU 
eventually honors it.

> If we have that guarantee, then we don't need to take a lock when sending
> the completion IPI - we would know that while a CPU is being taken down,
> the bL switcher could not run.  What we now need is a lock-less way to
> raise an IPI.
>
> Now, is the locking between the normal IPI paths and the bL switcher
> something that is specific to the interrupt controller, or should generic
> code care about it?  I think it's something generic code should care about
> - and I believe that would make life a little easier.

Well... The only reason for having a lock there is to ensure that no 
IPIs are sent to the outbound CPU after gic_cpu_map[] has been modified 
and pending IPIs on the outbound CPU have been migrated to the inbound 
CPU.  I think this is pretty specific to the GIC driver code.

If there was a way for gic_migrate_target() to be sure no other CPUs are 
using the old gic_cpu_map value any longer then no lock would be needed 
in gic_raise_softirq().  The code in gic_migrate_target() would only 
have to wait until it is safe to migrate pending IPIs on the outbound 
CPU without missing any.
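
To make that idea concrete, here is a minimal sketch of what such a wait
could look like, using a per-CPU "SGI in flight" flag instead of a lock.
The flag and helper names are invented for illustration; this is not code
from any posted patch.

static DEFINE_PER_CPU(int, gic_sgi_in_flight);

static void gic_raise_softirq_lockless(const struct cpumask *mask,
				       unsigned int irq)
{
	unsigned long flags, map = 0;
	int cpu;

	local_irq_save(flags);
	this_cpu_write(gic_sgi_in_flight, 1);
	smp_mb();	/* flag visible before gic_cpu_map[] is read */

	/* Convert our logical CPU mask into a physical one. */
	for_each_cpu(cpu, mask)
		map |= gic_cpu_map[cpu];

	/* this always happens on GIC0 */
	writel_relaxed(map << 16 | irq,
		       gic_data_dist_base(&gic_data[0]) + GIC_DIST_SOFTINT);

	smp_mb();	/* SGI register written before the flag is cleared */
	this_cpu_write(gic_sgi_in_flight, 0);
	local_irq_restore(flags);
}

/*
 * Called by gic_migrate_target() after gic_cpu_map[] has been updated:
 * once every CPU's flag has been seen clear, any SGI raised with the old
 * map value has already reached the distributor, so pending IPIs on the
 * outbound CPU can be migrated without missing any.
 */
static void gic_quiesce_sgi_senders(void)
{
	int cpu;

	smp_mb();	/* new gic_cpu_map[] visible before flags are read */
	for_each_online_cpu(cpu)
		while (READ_ONCE(per_cpu(gic_sgi_in_flight, cpu)))
			cpu_relax();
}

With full barriers on both sides this is the classic store-buffering
pattern: a sender that still read the old map value is guaranteed to be
observed as in flight, so gic_migrate_target() waits for it before
migrating pending SGIs.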

> The current bL switcher idea is to bring the new CPU up, disable IRQs
> and FIQs on the outgoing CPU, change the IRQ/IPI targets, then read
> any pending SGIs and raise them on the new CPU.  But what about any
> pending SPIs?  These look like they could be lost.

SPIs are raised and cleared independently of their distribution config.  
So the only thing that gic_migrate_target() has to do is to disable the 
distribution target for the outbound CPU and enable the target for the 
inbound CPU.  This way unserviced IRQs become pending on the inbound 
CPU automatically. The only other part that plays with targets is 
gic_set_affinity() and irq_controller_lock protects against concurrency 
here.

> How about this for an idea instead - the bL switcher code:
> 
> - brings the new CPU online.
> - disables IRQs and FIQs.
> - takes the IPI lock, which prevents new IPIs being raised.
> - re-targets IRQs and IPIs onto the new CPU.
> - releases the IPI lock.

But aren't we trying to get rid of that IPI lock to start with?  I'd 
personally love to remove it -- it's been nagging me since I initially 
added it.

> - re-enables IRQs and FIQs.
> - polls the IRQ controller to wait for any remaining IRQs and IPIs
>   to be delivered.

Poll for how long? How can you be sure no other CPU is in the process of 
targeting an IPI to the outbound CPU?  With things like the FIQ 
debugger coming to mainline or even JTAG-based debuggers, this could 
represent an indeterminate amount of time if the sending CPU is stopped 
at the right moment.

That notwithstanding, I'm afraid this would open a big can of worms.  
The CPU would no longer have functional interrupts since they're all 
directed to the inbound CPU at that point.  Any IRQ controls are now 
directed to the new CPU and things like self-IPIs (first scenario that 
comes to my mind) would no longer produce the expected result.  I'd much 
prefer to get the switch over with ASAP at that point rather than 
letting the outbound CPU run much longer in a degraded state.

> - re-disables IRQs and FIQs (which shouldn't be received anyway since
>   they're now targeting the new CPU.)
> - shuts down the tick device.
> - completes the switch
> 
> What this means is that we're not needing to have complex code in the
> interrupt controllers to re-raise interrupts on other CPUs, and we
> don't need a lock in the interrupt controller code synchronising IPI
> raising with the bL switcher.
> 
> I'd also suggest that this IPI lock should _not_ be a spinlock - it
> should be a read/write spin lock - it's perfectly acceptable to have
> multiple CPUs raising IPIs to each other, but it is not acceptable to
> have any CPU raising an IPI when the bL switcher is modifying the IRQ
> targets.  That fits the rwlock semantics.
> 
> What this means is that gic_raise_softirq() should again become a lock-
> less function, which opens the door to using an IPI to complete the
> CPU hot-unplug operation cleanly.
> 
> Thoughts (especially from Nico)?

I completely agree with the r/w spinlock. Something like this ought to 
be sufficient to make gic_raise_softirq() reentrant which is the issue 
here, right?  I've been stress-testing it for a while with no problems 
so far.

Comments

Russell King - ARM Linux Feb. 25, 2015, 5 p.m. UTC | #1
On Wed, Feb 25, 2015 at 11:47:48AM -0500, Nicolas Pitre wrote:
> I completely agree with the r/w spinlock. Something like this ought to 
> be sufficient to make gic_raise_softirq() reentrant which is the issue 
> here, right?  I've been stress-testing it for a while with no problems 
> so far.

No.  The issue is that we need a totally lockless way to raise an IPI
during CPU hot-unplug, so we can raise an IPI in __cpu_die() to tell
the __cpu_kill() code that it's safe to proceed to platform code.

As soon as that IPI has been received, the receiving CPU can decide
to cut power to the dying CPU.  So, it's entirely possible that power
could be lost on the dying CPU before the unlock has become visible.

It's a catch-22 - the reason we're sending the IPI is for synchronisation,
but right now we need another form of synchronisation because we're
using a form of synchronisation...

We could just use the spin-and-poll solution instead of an IPI, but
I really don't like that - when you see the complexity needed to
re-initialise it each time, it quickly becomes very yucky because
there is no well defined order between __cpu_die() and __cpu_kill()
being called by the two respective CPUs.

The last patch I saw doing that had multiple bits to indicate success
and timeout, and rather a lot of complexity to recover from failures,
and reinitialise state for a second CPU going down.
Nicolas Pitre Feb. 25, 2015, 6:13 p.m. UTC | #2
On Wed, 25 Feb 2015, Russell King - ARM Linux wrote:

> On Wed, Feb 25, 2015 at 11:47:48AM -0500, Nicolas Pitre wrote:
> > I completely agree with the r/w spinlock. Something like this ought to 
> > be sufficient to make gic_raise_softirq() reentrant which is the issue 
> > here, right?  I've been stress-testing it for a while with no problems 
> > so far.
> 
> No.  The issue is that we need a totally lockless way to raise an IPI
> during CPU hot-unplug, so we can raise an IPI in __cpu_die() to tell
> the __cpu_kill() code that it's safe to proceed to platform code.
> 
> As soon as that IPI has been received, the receiving CPU can decide
> to cut power to the dying CPU.  So, it's entirely possible that power
> could be lost on the dying CPU before the unlock has become visible.

However... wouldn't it be fragile to rely on every interrupt 
controller driver never modifying RAM in its IPI sending path?  That 
would constitute a strange requirement on IRQ controller drivers that 
was never spelled out before.

> It's a catch-22 - the reason we're sending the IPI is for synchronisation,
> but right now we need another form of synchronisation because we're
> using a form of synchronisation...

Can't the dying CPU pull the plug by itself in most cases?

> We could just use the spin-and-poll solution instead of an IPI, but
> I really don't like that - when you see the complexity needed to
> re-initialise it each time, it quickly becomes very yucky because
> there is no well defined order between __cpu_die() and __cpu_kill()
> being called by the two respective CPUs.
> 
> The last patch I saw doing that had multiple bits to indicate success
> and timeout, and rather a lot of complexity to recover from failures,
> and reinitialise state for a second CPU going down.

What about a per CPU state?  That would at least avoid the need to 
serialize things across CPUs.  If only one CPU may write its state, that 
should eliminate the need for any kind of locking.
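
To make the suggestion concrete, a minimal sketch of such per-CPU state
(the names are invented for illustration; this is not the patch under
discussion):

static DEFINE_PER_CPU(int, cpu_died_flag);

/*
 * Runs on the dying CPU with IRQs off, as late as possible: a plain
 * store plus a cache flush, no locks and no RCU.
 */
static void report_this_cpu_dead(void)
{
	this_cpu_write(cpu_died_flag, 1);
	flush_cache_louis();	/* push the store out before power is cut */
}

/*
 * Runs on the CPU doing the killing: it only reads the dying CPU's
 * state, so nothing needs to be serialised across CPUs and the only
 * "re-initialisation" is clearing the flag for that CPU's next unplug.
 */
static int wait_for_cpu_died(unsigned int cpu)
{
	unsigned long timeout = jiffies + msecs_to_jiffies(5000);

	while (!READ_ONCE(per_cpu(cpu_died_flag, cpu))) {
		if (time_after(jiffies, timeout))
			return -ETIMEDOUT;
		cpu_relax();
	}
	per_cpu(cpu_died_flag, cpu) = 0;
	return 0;
}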


Nicolas
Daniel Thompson Feb. 26, 2015, 7:14 p.m. UTC | #3
On Wed, 2015-02-25 at 11:47 -0500, Nicolas Pitre wrote:
> I completely agree with the r/w spinlock. Something like this ought to 
> be sufficient to make gic_raise_softirq() reentrant which is the issue 
> here, right?  I've been stress-testing it for a while with no problems 
> so far.

Do you fancy trying patches 1 and 2 from this series?
http://thread.gmane.org/gmane.linux.kernel/1881415

The recent FIQ work required gic_raise_softirq() to be reentrant, so I
came up with patches similar to yours.  As soon as we tease this code out
under a separate lock, people observe that the lock can melt away entirely
if the b.L switcher is not compiled in, and making that happen becomes the
next move...
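
For illustration, that follow-on idea would look roughly like the sketch
below; the helper names are invented here rather than taken from the
series linked above.  gic_raise_softirq() would call these helpers in
place of the raw irq_controller_lock it takes today.

#ifdef CONFIG_BL_SWITCHER
static DEFINE_RWLOCK(gic_sgi_map_lock);

static inline void gic_sgi_map_lock_irqsave(unsigned long *flags)
{
	read_lock_irqsave(&gic_sgi_map_lock, *flags);
}

static inline void gic_sgi_map_unlock_irqrestore(unsigned long flags)
{
	read_unlock_irqrestore(&gic_sgi_map_lock, flags);
}
#else
/*
 * Without the b.L switcher, gic_cpu_map[] entries are only written when
 * a CPU is brought up and are never re-targeted afterwards, so reading
 * the map needs nothing more than local IRQ protection.
 */
static inline void gic_sgi_map_lock_irqsave(unsigned long *flags)
{
	local_irq_save(*flags);
}

static inline void gic_sgi_map_unlock_irqrestore(unsigned long flags)
{
	local_irq_restore(flags);
}
#endif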



Nicolas Pitre Feb. 26, 2015, 7:47 p.m. UTC | #4
On Thu, 26 Feb 2015, Daniel Thompson wrote:

> Do you fancy trying patches 1 and 2 from this series?
> http://thread.gmane.org/gmane.linux.kernel/1881415
> 
> The recent FIQ work required gic_raise_softirq() to be reentrant, so I
> came up with patches similar to yours.  As soon as we tease this code out
> under a separate lock, people observe that the lock can melt away entirely
> if the b.L switcher is not compiled in, and making that happen becomes the
> next move...

Patch #1 is wrong.  It provably opens a race.  I'll reply on that thread 
to explain why.  Patch #2 is fine.


Nicolas

Patch

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 4634cf7d0e..3404c6bc12 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -80,6 +80,9 @@  static DEFINE_RAW_SPINLOCK(irq_controller_lock);
 #define NR_GIC_CPU_IF 8
 static u8 gic_cpu_map[NR_GIC_CPU_IF] __read_mostly;
 
+/* This allows for multiple concurrent users of gic_cpu_map[] */
+static DEFINE_RWLOCK(gic_cpu_map_lock);
+
 /*
  * Supported arch specific GIC irq extension.
  * Default make them NULL.
@@ -627,7 +630,7 @@  static void gic_raise_softirq(const struct cpumask *mask, unsigned int irq)
 	int cpu;
 	unsigned long flags, map = 0;
 
-	raw_spin_lock_irqsave(&irq_controller_lock, flags);
+	read_lock_irqsave(&gic_cpu_map_lock, flags);
 
 	/* Convert our logical CPU mask into a physical one. */
 	for_each_cpu(cpu, mask)
@@ -642,7 +645,7 @@  static void gic_raise_softirq(const struct cpumask *mask, unsigned int irq)
 	/* this always happens on GIC0 */
 	writel_relaxed(map << 16 | irq, gic_data_dist_base(&gic_data[0]) + GIC_DIST_SOFTINT);
 
-	raw_spin_unlock_irqrestore(&irq_controller_lock, flags);
+	read_unlock_irqrestore(&gic_cpu_map_lock, flags);
 }
 #endif
 
@@ -711,6 +714,7 @@  void gic_migrate_target(unsigned int new_cpu_id)
 	cur_target_mask = 0x01010101 << cur_cpu_id;
 	ror_val = (cur_cpu_id - new_cpu_id) & 31;
 
+	write_lock(&gic_cpu_map_lock);
 	raw_spin_lock(&irq_controller_lock);
 
 	/* Update the target interface for this logical CPU */
@@ -731,6 +735,7 @@  void gic_migrate_target(unsigned int new_cpu_id)
 		}
 	}
 
+	write_unlock(&gic_cpu_map_lock);
 	raw_spin_unlock(&irq_controller_lock);
 
 	/*