[3.18-rc4,v11,3/6] irqchip: gic: Make gic_raise_softirq FIQ-safe

Message ID 1417119024-22844-4-git-send-email-daniel.thompson@linaro.org (mailing list archive)
State New, archived

Commit Message

Daniel Thompson Nov. 27, 2014, 8:10 p.m. UTC
It is currently possible for FIQ handlers to re-enter gic_raise_softirq()
and lock up.

    	gic_raise_softirq()
	   lock(x);
-~-> FIQ
        handle_fiq()
	   gic_raise_softirq()
	      lock(x);		<-- Lockup

Calling printk() from a FIQ handler can trigger this problem because
printk() raises an IPI when it needs to wake_up_klogd(). More generally,
IPIs are the only means for FIQ handlers to safely defer work to a less
restrictive calling context, so the function that raises them really
needs to be FIQ-safe.
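
For reference, the call chain in question looks roughly like this (a
sketch of the v3.18-era code paths, quoted from memory):

    printk()
       wake_up_klogd()
          irq_work_queue()
             arch_irq_work_raise()
                smp_cross_call()	<-- raises IPI_IRQ_WORK
                   gic_raise_softirq()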

This patch fixes the problem by converting cpu_map_migration_lock into
an rwlock, making it safe to re-enter the function.

Having made it safe to re-enter gic_raise_softirq(), we no longer need
to mask interrupts during the call because the b.L migration is always
performed from task context.
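
After the conversion the same scenario becomes harmless, because the
read side of an rwlock can be acquired recursively (a sketch only; the
write side must still be taken with IRQs and FIQs masked, as
gic_migrate_target() now requires):

    	gic_raise_softirq()
	   read_lock(x);
-~-> FIQ
        handle_fiq()
	   gic_raise_softirq()
	      read_lock(x);	<-- OK: read locks nest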

Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
---
 drivers/irqchip/irq-gic.c | 38 +++++++++++++++++++++++++-------------
 1 file changed, 25 insertions(+), 13 deletions(-)

Comments

Thomas Gleixner Nov. 27, 2014, 9:45 p.m. UTC | #1
On Thu, 27 Nov 2014, Daniel Thompson wrote:
> It is currently possible for FIQ handlers to re-enter gic_raise_softirq()
> and lock up.
> 
>     	gic_raise_softirq()
> 	   lock(x);
> -~-> FIQ
>         handle_fiq()
> 	   gic_raise_softirq()
> 	      lock(x);		<-- Lockup
> 
> Calling printk() from a FIQ handler can trigger this problem because
> printk() raises an IPI when it needs to wake_up_klogd(). More generally,
> IPIs are the only means for FIQ handlers to safely defer work to less
> restrictive calling context so the function to raise them really needs
> to be FIQ-safe.

That's not really true. irq_work can be used from FIQ/NMI context and
it was specifically designed for that purpose.
 
Now printk is a different issue, but there is work in progress to make
printk safe from FIQ/NMI context as well. This is not an ARM-specific
issue. Any architecture which has NMI-like facilities has the problem
of doing printk from that context. Steven is working on a mitigation
for that: https://lkml.org/lkml/2014/11/18/1146

Thanks,

	tglx
Daniel Thompson Nov. 28, 2014, 9:21 a.m. UTC | #2
On 27/11/14 21:45, Thomas Gleixner wrote:
> On Thu, 27 Nov 2014, Daniel Thompson wrote:
>> It is currently possible for FIQ handlers to re-enter gic_raise_softirq()
>> and lock up.
>>
>>     	gic_raise_softirq()
>> 	   lock(x);
>> -~-> FIQ
>>         handle_fiq()
>> 	   gic_raise_softirq()
>> 	      lock(x);		<-- Lockup
>>
>> Calling printk() from a FIQ handler can trigger this problem because
>> printk() raises an IPI when it needs to wake_up_klogd(). More generally,
>> IPIs are the only means for FIQ handlers to safely defer work to less
>> restrictive calling context so the function to raise them really needs
>> to be FIQ-safe.
> 
> That's not really true. irq_work can be used from FIQ/NMI context and
> it was specifically designed for that purpose.

Actually we cannot currently issue irq_work from FIQ context; that's
exactly what this patch fixes and is why wake_up_klogd() currently
locks up. ARM implements arch_irq_work_raise() using IPIs (at least
on SMP).
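
For reference, the ARM implementation in arch/arm/kernel/smp.c is
roughly the following (paraphrased from memory; the exact guard has
varied between kernel versions):

    #ifdef CONFIG_IRQ_WORK
    void arch_irq_work_raise(void)
    {
    	/* Only SMP has a cross-call mechanism to kick ourselves. */
    	if (is_smp())
    		smp_cross_call(cpumask_of(smp_processor_id()),
    			       IPI_IRQ_WORK);
    }
    #endif

smp_cross_call() is the hook that the GIC driver registers
gic_raise_softirq() against, which is how we end up back in this
function from FIQ context.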

I'll fix the wording to make this more explicit.


> Now printk is a different issue, but there is work in progress to make
> printk safe from FIQ/NMI context as well. This is not an ARM specific
> issue. Any architecture which has NMI like facilities has the problem
> of doing printk from that context. Steven is working on a mitigation
> for that. https://lkml.org/lkml/2014/11/18/1146

Thanks. I'll watch that with interest.

In that case I'll drop the printk() rationale entirely. The rationale
above should be enough motivation because the only other thing likely
to printk() from interrupt is the hard lockup detector and, because
that uses perf events, it requires irq_work to be free from lockups
well before it ever thinks about calling printk().

Patch

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index e875da93f24a..5d72823bc5e9 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -75,22 +75,25 @@  static DEFINE_RAW_SPINLOCK(irq_controller_lock);
 /*
  * This lock is used by the big.LITTLE migration code to ensure no IPIs
  * can be pended on the old core after the map has been updated.
+ *
+ * This lock may be locked for reading from both IRQ and FIQ handlers
+ * and therefore must not be locked for writing when these are enabled.
  */
 #ifdef CONFIG_BL_SWITCHER
-static DEFINE_RAW_SPINLOCK(cpu_map_migration_lock);
+static DEFINE_RWLOCK(cpu_map_migration_lock);
 
-static inline void bl_migration_lock(unsigned long *flags)
+static inline void bl_migration_lock(void)
 {
-	raw_spin_lock_irqsave(&cpu_map_migration_lock, *flags);
+	read_lock(&cpu_map_migration_lock);
 }
 
-static inline void bl_migration_unlock(unsigned long flags)
+static inline void bl_migration_unlock(void)
 {
-	raw_spin_unlock_irqrestore(&cpu_map_migration_lock, flags);
+	read_unlock(&cpu_map_migration_lock);
 }
 #else
-static inline void bl_migration_lock(unsigned long *flags) {}
-static inline void bl_migration_unlock(unsigned long flags) {}
+static inline void bl_migration_lock(void) {}
+static inline void bl_migration_unlock(void) {}
 #endif
 
 /*
@@ -640,12 +643,20 @@  static void __init gic_pm_init(struct gic_chip_data *gic)
 #endif
 
 #ifdef CONFIG_SMP
+/*
+ * Raise the specified IPI on all cpus set in mask.
+ *
+ * This function is safe to call from all calling contexts, including
+ * FIQ handlers. It relies on bl_migration_lock() being multiply acquirable
+ * to avoid deadlocks when the function is re-entered at different
+ * exception levels.
+ */
 static void gic_raise_softirq(const struct cpumask *mask, unsigned int irq)
 {
 	int cpu;
-	unsigned long flags, map = 0;
+	unsigned long map = 0;
 
-	bl_migration_lock(&flags);
+	bl_migration_lock();
 
 	/* Convert our logical CPU mask into a physical one. */
 	for_each_cpu(cpu, mask)
@@ -660,7 +671,7 @@  static void gic_raise_softirq(const struct cpumask *mask, unsigned int irq)
 	/* this always happens on GIC0 */
 	writel_relaxed(map << 16 | irq, gic_data_dist_base(&gic_data[0]) + GIC_DIST_SOFTINT);
 
-	bl_migration_unlock(flags);
+	bl_migration_unlock();
 }
 #endif
 
@@ -708,7 +719,8 @@  int gic_get_cpu_id(unsigned int cpu)
  * Migrate all peripheral interrupts with a target matching the current CPU
  * to the interface corresponding to @new_cpu_id.  The CPU interface mapping
  * is also updated.  Targets to other CPU interfaces are unchanged.
- * This must be called with IRQs locally disabled.
+ * This must be called from a task context and with IRQ and FIQ locally
+ * disabled.
  */
 void gic_migrate_target(unsigned int new_cpu_id)
 {
@@ -739,9 +751,9 @@  void gic_migrate_target(unsigned int new_cpu_id)
 	 * pending on the old cpu static. That means we can defer the
 	 * migration until after we have released the irq_controller_lock.
 	 */
-	raw_spin_lock(&cpu_map_migration_lock);
+	write_lock(&cpu_map_migration_lock);
 	gic_cpu_map[cpu] = 1 << new_cpu_id;
-	raw_spin_unlock(&cpu_map_migration_lock);
+	write_unlock(&cpu_map_migration_lock);
 
 	/*
 	 * Find all the peripheral interrupts targetting the current