Message ID | 1415968543-29469-2-git-send-email-daniel.thompson@linaro.org (mailing list archive) |
---|---|
State | New, archived |
On Fri, 14 Nov 2014, Daniel Thompson wrote:
> diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
> index 38493ff28fa5..0db62a6f1ee3 100644
> --- a/drivers/irqchip/irq-gic.c
> +++ b/drivers/irqchip/irq-gic.c
> @@ -73,6 +73,13 @@ struct gic_chip_data {
>  static DEFINE_RAW_SPINLOCK(irq_controller_lock);
>
>  /*
> + * This lock may be locked for reading by FIQ handlers. Thus although
> + * read locking may be used liberally, write locking must take place
> + * only when local FIQ handling is disabled.
> + */
> +static DEFINE_RWLOCK(fiq_safe_cpu_map_lock);
> +
> +/*
>   * The GIC mapping of CPU interfaces does not necessarily match
>   * the logical CPU numbering. Let's use a mapping as returned
>   * by the GIC itself.
> @@ -624,7 +631,7 @@ static void gic_raise_softirq(const struct cpumask *mask, unsigned int irq)
>  	int cpu;
>  	unsigned long flags, map = 0;
>
> -	raw_spin_lock_irqsave(&irq_controller_lock, flags);
> +	read_lock_irqsave(&fiq_safe_cpu_map_lock, flags);

Just for the record:

You might have noticed that you replace a raw lock with a non-raw one.
That's not an issue on mainline, but it pretty much renders that code
broken for RT.

Surely nothing I worry too much about given the current state of RT.

Thanks,

	tglx
On 24/11/14 18:20, Thomas Gleixner wrote:
> On Fri, 14 Nov 2014, Daniel Thompson wrote:
>> diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
>> index 38493ff28fa5..0db62a6f1ee3 100644
>> --- a/drivers/irqchip/irq-gic.c
>> +++ b/drivers/irqchip/irq-gic.c
>> @@ -73,6 +73,13 @@ struct gic_chip_data {
>>  static DEFINE_RAW_SPINLOCK(irq_controller_lock);
>>
>>  /*
>> + * This lock may be locked for reading by FIQ handlers. Thus although
>> + * read locking may be used liberally, write locking must take place
>> + * only when local FIQ handling is disabled.
>> + */
>> +static DEFINE_RWLOCK(fiq_safe_cpu_map_lock);
>> +
>> +/*
>>   * The GIC mapping of CPU interfaces does not necessarily match
>>   * the logical CPU numbering. Let's use a mapping as returned
>>   * by the GIC itself.
>> @@ -624,7 +631,7 @@ static void gic_raise_softirq(const struct cpumask *mask, unsigned int irq)
>>  	int cpu;
>>  	unsigned long flags, map = 0;
>>
>> -	raw_spin_lock_irqsave(&irq_controller_lock, flags);
>> +	read_lock_irqsave(&fiq_safe_cpu_map_lock, flags);
>
> Just for the record:
>
> You might have noticed that you replace a raw lock with a non-raw one.
> That's not an issue on mainline, but that pretty much renders that
> code broken for RT.

Indeed. For that reason I've been pretty anxious to hear your views on
this one.

Older versions of this patch did retain the raw lock, but the code ends
up looking a bit weird and resulted in negative comments during review:

	if (in_nmi())
		raw_spin_lock(&fiq_exclusive_cpu_map_lock);
	else
		raw_spin_lock_irqsave(&irq_controller_lock, flags);

The above form relies for correctness on the fact that the b.L switcher
code can take both locks and already runs with FIQ disabled.

> Surely nothing I worry too much about given the current state of RT.

Hobby or not, I don't want to make your work here any harder. I could go
back to the old form. Alternatively I could provide a patch to go in -rt
that converts the rw locks to spin locks, but that just sounds like a
maintenance hassle for you.


Daniel.
On Mon, 24 Nov 2014, Thomas Gleixner wrote:
> On Fri, 14 Nov 2014, Daniel Thompson wrote:
> > diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
> > index 38493ff28fa5..0db62a6f1ee3 100644
> > --- a/drivers/irqchip/irq-gic.c
> > +++ b/drivers/irqchip/irq-gic.c
> > @@ -73,6 +73,13 @@ struct gic_chip_data {
> >  static DEFINE_RAW_SPINLOCK(irq_controller_lock);
> >
> >  /*
> > + * This lock may be locked for reading by FIQ handlers. Thus although
> > + * read locking may be used liberally, write locking must take place
> > + * only when local FIQ handling is disabled.
> > + */
> > +static DEFINE_RWLOCK(fiq_safe_cpu_map_lock);
> > +
> > +/*
> >   * The GIC mapping of CPU interfaces does not necessarily match
> >   * the logical CPU numbering. Let's use a mapping as returned
> >   * by the GIC itself.
> > @@ -624,7 +631,7 @@ static void gic_raise_softirq(const struct cpumask *mask, unsigned int irq)
> >  	int cpu;
> >  	unsigned long flags, map = 0;
> >
> > -	raw_spin_lock_irqsave(&irq_controller_lock, flags);
> > +	read_lock_irqsave(&fiq_safe_cpu_map_lock, flags);
>
> Just for the record:
>
> You might have noticed that you replace a raw lock with a non-raw one.
> That's not an issue on mainline, but that pretty much renders that
> code broken for RT.
>
> Surely nothing I worry too much about given the current state of RT.

And having a second thought here. Looking at the protection scope,
independent of the spin vs. rw lock:

gic_raise_softirq()

	lock();

	/* Does not need any protection */
	for_each_cpu(cpu, mask)
		map |= gic_cpu_map[cpu];

	/*
	 * Can be outside the lock region as well, as it makes sure
	 * that previous writes (usually the IPI data) are visible
	 * before the write to the SOFTINT register.
	 */
	dmb(ishst);

	/* Why does this need protection? */
	write(map, gic_data_dist_base(&gic_data[0]) + GIC_DIST_SOFTINT);

	unlock();

gic_migrate_target()

	....
	lock();

	/* Migrate all peripheral interrupts */

	unlock();

So what's the point of that protection?

gic_raise_softirq() is used to send IPIs, which are PPIs on the target
CPUs, so they are not affected by the migration of the peripheral
interrupts at all.

The write to the SOFTINT register in gic_migrate_target() is not inside
the lock region. So what's serialized by the lock in gic_raise_softirq()
at all?

Either I'm missing something really important here or this locking
exercise in gic_raise_softirq(), and therefore the rwlock conversion, is
completely pointless.

Thanks,

	tglx
On 24/11/14 18:48, Thomas Gleixner wrote:
> On Mon, 24 Nov 2014, Thomas Gleixner wrote:
>> On Fri, 14 Nov 2014, Daniel Thompson wrote:
>>> diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
>>> index 38493ff28fa5..0db62a6f1ee3 100644
>>> --- a/drivers/irqchip/irq-gic.c
>>> +++ b/drivers/irqchip/irq-gic.c
>>> @@ -73,6 +73,13 @@ struct gic_chip_data {
>>>  static DEFINE_RAW_SPINLOCK(irq_controller_lock);
>>>
>>>  /*
>>> + * This lock may be locked for reading by FIQ handlers. Thus although
>>> + * read locking may be used liberally, write locking must take place
>>> + * only when local FIQ handling is disabled.
>>> + */
>>> +static DEFINE_RWLOCK(fiq_safe_cpu_map_lock);
>>> +
>>> +/*
>>>   * The GIC mapping of CPU interfaces does not necessarily match
>>>   * the logical CPU numbering. Let's use a mapping as returned
>>>   * by the GIC itself.
>>> @@ -624,7 +631,7 @@ static void gic_raise_softirq(const struct cpumask *mask, unsigned int irq)
>>>  	int cpu;
>>>  	unsigned long flags, map = 0;
>>>
>>> -	raw_spin_lock_irqsave(&irq_controller_lock, flags);
>>> +	read_lock_irqsave(&fiq_safe_cpu_map_lock, flags);
>>
>> Just for the record:
>>
>> You might have noticed that you replace a raw lock with a non-raw one.
>> That's not an issue on mainline, but that pretty much renders that
>> code broken for RT.
>>
>> Surely nothing I worry too much about given the current state of RT.
>
> And having a second thought here. Looking at the protection scope,
> independent of the spin vs. rw lock:
>
> gic_raise_softirq()
>
> 	lock();
>
> 	/* Does not need any protection */
> 	for_each_cpu(cpu, mask)
> 		map |= gic_cpu_map[cpu];
>
> 	/*
> 	 * Can be outside the lock region as well, as it makes sure
> 	 * that previous writes (usually the IPI data) are visible
> 	 * before the write to the SOFTINT register.
> 	 */
> 	dmb(ishst);
>
> 	/* Why does this need protection? */
> 	write(map, gic_data_dist_base(&gic_data[0]) + GIC_DIST_SOFTINT);

If gic_cpu_map changed value during the execution of this function then
this could raise an IPI on the wrong CPU. Most of the rest of this mail
is explaining how this could happen.

> 	unlock();
>
> gic_migrate_target()
>
> 	....
> 	lock();

Also: Value of gic_cpu_map is updated.

> 	/* Migrate all peripheral interrupts */
>
> 	unlock();

Also: /* Migrate all IPIs pending on the old core */

> So what's the point of that protection?
>
> gic_raise_softirq() is used to send IPIs, which are PPIs on the target
> CPUs, so they are not affected by the migration of the peripheral
> interrupts at all.
>
> The write to the SOFTINT register in gic_migrate_target() is not
> inside the lock region. So what's serialized by the lock in
> gic_raise_softirq() at all?

At the point that gic_migrate_target() takes the lock it knows that no
further IPIs can be submitted. Once gic_cpu_map is updated we can permit
new IPIs to be submitted, because these will be routed to the correct
core. As a result we don't actually need to hold the lock to migrate the
pending IPIs, since we know that no new IPIs can possibly be sent to the
wrong core.

> Either I'm missing something really important here or this locking
> exercise in gic_raise_softirq(), and therefore the rwlock conversion,
> is completely pointless.

I did want to remove the lock too. However when I reviewed this code I
concluded the lock was still required. Without it I think it is possible
for gic_raise_softirq() to raise an IPI on the old core *after* the code
to migrate pending IPIs has been run.


Daniel.
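[Editorial note: the race being debated can be illustrated with a small user-space model of the map translation. The names below (cpu_map, softint_target_map, migrate_target) are hypothetical stand-ins for gic_cpu_map[], the translation loop in gic_raise_softirq() and the map update in gic_migrate_target(); this is a sketch of the window the lock closes, not the kernel code itself.]

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the GIC logical-to-physical CPU interface map.
 * Initially each logical CPU maps to the same-numbered interface. */
static uint8_t cpu_map[4] = { 1 << 0, 1 << 1, 1 << 2, 1 << 3 };

/* Mirrors the translation loop in gic_raise_softirq(): fold a logical
 * CPU mask into the physical interface mask that would be written to
 * GIC_DIST_SOFTINT. */
static unsigned int softint_target_map(unsigned int logical_mask)
{
	unsigned int map = 0;
	int cpu;

	for (cpu = 0; cpu < 4; cpu++)
		if (logical_mask & (1u << cpu))
			map |= cpu_map[cpu];
	return map;
}

/* Mirrors the map update in gic_migrate_target(): repoint one logical
 * CPU at a new physical interface during a b.L switch. */
static void migrate_target(int cpu, int new_cpu_id)
{
	cpu_map[cpu] = 1u << new_cpu_id;
}
```

If a translation reads cpu_map[] before migrate_target() runs but performs its SOFTINT write after the pending IPIs have been migrated, the IPI lands on the outgoing core and is lost — which is exactly why gic_raise_softirq() must hold the lock across both the loop and the register write.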
On Mon, 24 Nov 2014, Thomas Gleixner wrote:
> On Mon, 24 Nov 2014, Thomas Gleixner wrote:
> > On Fri, 14 Nov 2014, Daniel Thompson wrote:
> > > diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
> > > index 38493ff28fa5..0db62a6f1ee3 100644
> > > --- a/drivers/irqchip/irq-gic.c
> > > +++ b/drivers/irqchip/irq-gic.c
> > > @@ -73,6 +73,13 @@ struct gic_chip_data {
> > >  static DEFINE_RAW_SPINLOCK(irq_controller_lock);
> > >
> > >  /*
> > > + * This lock may be locked for reading by FIQ handlers. Thus although
> > > + * read locking may be used liberally, write locking must take place
> > > + * only when local FIQ handling is disabled.
> > > + */
> > > +static DEFINE_RWLOCK(fiq_safe_cpu_map_lock);
> > > +
> > > +/*
> > >   * The GIC mapping of CPU interfaces does not necessarily match
> > >   * the logical CPU numbering. Let's use a mapping as returned
> > >   * by the GIC itself.
> > > @@ -624,7 +631,7 @@ static void gic_raise_softirq(const struct cpumask *mask, unsigned int irq)
> > >  	int cpu;
> > >  	unsigned long flags, map = 0;
> > >
> > > -	raw_spin_lock_irqsave(&irq_controller_lock, flags);
> > > +	read_lock_irqsave(&fiq_safe_cpu_map_lock, flags);
> >
> > Just for the record:
> >
> > You might have noticed that you replace a raw lock with a non-raw
> > one. That's not an issue on mainline, but that pretty much renders
> > that code broken for RT.
> >
> > Surely nothing I worry too much about given the current state of RT.
>
> And having a second thought here. Looking at the protection scope,
> independent of the spin vs. rw lock:
>
> gic_raise_softirq()
>
> 	lock();
>
> 	/* Does not need any protection */
> 	for_each_cpu(cpu, mask)
> 		map |= gic_cpu_map[cpu];
>
> 	/*
> 	 * Can be outside the lock region as well, as it makes sure
> 	 * that previous writes (usually the IPI data) are visible
> 	 * before the write to the SOFTINT register.
> 	 */
> 	dmb(ishst);
>
> 	/* Why does this need protection? */
> 	write(map, gic_data_dist_base(&gic_data[0]) + GIC_DIST_SOFTINT);
>
> 	unlock();
>
> gic_migrate_target()
>
> 	....
> 	lock();
>
> 	/* Migrate all peripheral interrupts */
>
> 	unlock();
>
> So what's the point of that protection?
>
> gic_raise_softirq() is used to send IPIs, which are PPIs on the target
> CPUs, so they are not affected by the migration of the peripheral
> interrupts at all.
>
> The write to the SOFTINT register in gic_migrate_target() is not
> inside the lock region. So what's serialized by the lock in
> gic_raise_softirq() at all?
>
> Either I'm missing something really important here or this locking
> exercise in gic_raise_softirq(), and therefore the rwlock conversion,
> is completely pointless.

Thanks to Marc I've now figured out what I was missing. That stuff is
part of the bl switcher horror. Well documented, as all of that ...

So the lock protects against an IPI being sent to the current cpu while
the target map is redirected and the pending state of the current cpu is
migrated to another cpu.

It's not your fault that the initial authors of that just abused
irq_controller_lock for that purpose instead of introducing a separate
lock with a clear description of the protection scope in the first
place.

Now you came up with the rw lock to handle the following FIQ related
case:

  gic_raise_softirq()
	lock(x);
  ---> FIQ
	handle_fiq()
	  gic_raise_softirq()
		lock(x);	<-- Live lock

Now the rwlock lets you avoid that, and it only lets you avoid that
because rwlocks are not fair.

So while I cannot come up with a brilliant replacement, it would be
really helpful documentation-wise if you could do the following:

1) Create a patch which introduces irq_migration_lock as a raw
   spinlock and replaces the usage of irq_controller_lock in
   gic_raise_softirq() and gic_migrate_target(), along with a proper
   explanation in the code and the changelog of course.

2) Make the rwlock conversion on top of that, with proper
   documentation in the code of the only relevant reason (see above).

The protection scope which prevents IPIs being sent while switching over
is still the same and not affected.

That's not the first time that I stumble over this bl switcher mess
which got bolted into the kernel mindlessly.

If the scope of the issue had been clear up front, I wouldn't have
complained about the RT relevance of this, as it is simple to either
disable FIQs for RT or just handle the above case differently.

Thanks,

	tglx
On Mon, 24 Nov 2014, Daniel Thompson wrote:
> I did want to remove the lock too. However when I reviewed this code I
> concluded the lock was still required. Without it I think it is possible
> for gic_raise_softirq() to raise an IPI on the old core *after* the code
> to migrate pending IPIs has been run.

And I bet it took you quite some time to figure that out from that
overly documented abuse of irq_controller_lock. See my other reply.

Thanks,

	tglx
On 24/11/14 20:38, Thomas Gleixner wrote:
> On Mon, 24 Nov 2014, Thomas Gleixner wrote:
>> On Mon, 24 Nov 2014, Thomas Gleixner wrote:
>>> On Fri, 14 Nov 2014, Daniel Thompson wrote:
>>>> diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
>>>> index 38493ff28fa5..0db62a6f1ee3 100644
>>>> --- a/drivers/irqchip/irq-gic.c
>>>> +++ b/drivers/irqchip/irq-gic.c
>>>> @@ -73,6 +73,13 @@ struct gic_chip_data {
>>>>  static DEFINE_RAW_SPINLOCK(irq_controller_lock);
>>>>
>>>>  /*
>>>> + * This lock may be locked for reading by FIQ handlers. Thus although
>>>> + * read locking may be used liberally, write locking must take place
>>>> + * only when local FIQ handling is disabled.
>>>> + */
>>>> +static DEFINE_RWLOCK(fiq_safe_cpu_map_lock);
>>>> +
>>>> +/*
>>>>   * The GIC mapping of CPU interfaces does not necessarily match
>>>>   * the logical CPU numbering. Let's use a mapping as returned
>>>>   * by the GIC itself.
>>>> @@ -624,7 +631,7 @@ static void gic_raise_softirq(const struct cpumask *mask, unsigned int irq)
>>>>  	int cpu;
>>>>  	unsigned long flags, map = 0;
>>>>
>>>> -	raw_spin_lock_irqsave(&irq_controller_lock, flags);
>>>> +	read_lock_irqsave(&fiq_safe_cpu_map_lock, flags);
>>>
>>> Just for the record:
>>>
>>> You might have noticed that you replace a raw lock with a non-raw
>>> one. That's not an issue on mainline, but that pretty much renders
>>> that code broken for RT.
>>>
>>> Surely nothing I worry too much about given the current state of RT.
>>
>> And having a second thought here. Looking at the protection scope,
>> independent of the spin vs. rw lock:
>>
>> gic_raise_softirq()
>>
>> 	lock();
>>
>> 	/* Does not need any protection */
>> 	for_each_cpu(cpu, mask)
>> 		map |= gic_cpu_map[cpu];
>>
>> 	/*
>> 	 * Can be outside the lock region as well, as it makes sure
>> 	 * that previous writes (usually the IPI data) are visible
>> 	 * before the write to the SOFTINT register.
>> 	 */
>> 	dmb(ishst);
>>
>> 	/* Why does this need protection? */
>> 	write(map, gic_data_dist_base(&gic_data[0]) + GIC_DIST_SOFTINT);
>>
>> 	unlock();
>>
>> gic_migrate_target()
>>
>> 	....
>> 	lock();
>>
>> 	/* Migrate all peripheral interrupts */
>>
>> 	unlock();
>>
>> So what's the point of that protection?
>>
>> gic_raise_softirq() is used to send IPIs, which are PPIs on the target
>> CPUs, so they are not affected by the migration of the peripheral
>> interrupts at all.
>>
>> The write to the SOFTINT register in gic_migrate_target() is not
>> inside the lock region. So what's serialized by the lock in
>> gic_raise_softirq() at all?
>>
>> Either I'm missing something really important here or this locking
>> exercise in gic_raise_softirq(), and therefore the rwlock conversion,
>> is completely pointless.
>
> Thanks to Marc I've now figured out what I was missing. That stuff is
> part of the bl switcher horror. Well documented, as all of that ...
>
> So the lock protects against an IPI being sent to the current cpu
> while the target map is redirected and the pending state of the
> current cpu is migrated to another cpu.
>
> It's not your fault that the initial authors of that just abused
> irq_controller_lock for that purpose instead of introducing a separate
> lock with a clear description of the protection scope in the first
> place.
>
> Now you came up with the rw lock to handle the following FIQ related
> case:
>
>   gic_raise_softirq()
> 	lock(x);
>   ---> FIQ
> 	handle_fiq()
> 	  gic_raise_softirq()
> 		lock(x);	<-- Live lock
>
> Now the rwlock lets you avoid that, and it only lets you avoid that
> because rwlocks are not fair.
>
> So while I cannot come up with a brilliant replacement, it would be
> really helpful documentation-wise if you could do the following:
>
> 1) Create a patch which introduces irq_migration_lock as a raw
>    spinlock and replaces the usage of irq_controller_lock in
>    gic_raise_softirq() and gic_migrate_target(), along with a proper
>    explanation in the code and the changelog of course.

Replace irq_controller_lock or augment it with a new one?

gic_raise_softirq() cannot share a single r/w lock with
gic_set_affinity(), because gic_set_affinity() would have to take it for
writing and that would bring the deadlock back for a badly timed FIQ.
Thus, if we want calls to gic_raise_softirq() to be FIQ-safe, there must
be two locks taken in gic_migrate_target(). We can eliminate
irq_controller_lock but we cannot replace it with one r/w lock.

> 2) Make the rwlock conversion on top of that, with proper
>    documentation in the code of the only relevant reason (see above).
>
> The protection scope which prevents IPIs being sent while switching
> over is still the same and not affected.
>
> That's not the first time that I stumble over this bl switcher mess
> which got bolted into the kernel mindlessly.
>
> If the scope of the issue had been clear up front, I wouldn't have
> complained about the RT relevance of this, as it is simple to either
> disable FIQs for RT or just handle the above case differently.
>
> Thanks,
>
> 	tglx
On 24/11/14 20:41, Thomas Gleixner wrote:
> On Mon, 24 Nov 2014, Daniel Thompson wrote:
>> I did want to remove the lock too. However when I reviewed this code I
>> concluded the lock was still required. Without it I think it is possible
>> for gic_raise_softirq() to raise an IPI on the old core *after* the code
>> to migrate pending IPIs has been run.
>
> And I bet it took you quite some time to figure that out from that
> overly documented abuse of irq_controller_lock. See my other reply.

Yes. It did take quite some time, although compared to some of the other
FIQ/NMI-safety reviews I've been doing recently it could be worse. ;-)
On Mon, 24 Nov 2014, Daniel Thompson wrote:
> On 24/11/14 20:38, Thomas Gleixner wrote:
> > So while I cannot come up with a brilliant replacement, it would be
> > really helpful documentation-wise if you could do the following:
> >
> > 1) Create a patch which introduces irq_migration_lock as a raw
> >    spinlock and replaces the usage of irq_controller_lock in
> >    gic_raise_softirq() and gic_migrate_target(), along with a proper
> >    explanation in the code and the changelog of course.
>
> Replace irq_controller_lock or augment it with a new one?

Replace it in gic_raise_softirq() and add it to gic_migrate_target(), as
you did with the RW lock.

Thanks,

	tglx
diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 38493ff28fa5..0db62a6f1ee3 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -73,6 +73,13 @@ struct gic_chip_data {
 static DEFINE_RAW_SPINLOCK(irq_controller_lock);
 
 /*
+ * This lock may be locked for reading by FIQ handlers. Thus although
+ * read locking may be used liberally, write locking must take place
+ * only when local FIQ handling is disabled.
+ */
+static DEFINE_RWLOCK(fiq_safe_cpu_map_lock);
+
+/*
  * The GIC mapping of CPU interfaces does not necessarily match
  * the logical CPU numbering. Let's use a mapping as returned
  * by the GIC itself.
@@ -624,7 +631,7 @@ static void gic_raise_softirq(const struct cpumask *mask, unsigned int irq)
 	int cpu;
 	unsigned long flags, map = 0;
 
-	raw_spin_lock_irqsave(&irq_controller_lock, flags);
+	read_lock_irqsave(&fiq_safe_cpu_map_lock, flags);
 
 	/* Convert our logical CPU mask into a physical one. */
 	for_each_cpu(cpu, mask)
@@ -639,7 +646,7 @@ static void gic_raise_softirq(const struct cpumask *mask, unsigned int irq)
 	/* this always happens on GIC0 */
 	writel_relaxed(map << 16 | irq, gic_data_dist_base(&gic_data[0]) + GIC_DIST_SOFTINT);
 
-	raw_spin_unlock_irqrestore(&irq_controller_lock, flags);
+	read_unlock_irqrestore(&fiq_safe_cpu_map_lock, flags);
 }
 #endif
 
@@ -687,7 +694,7 @@ int gic_get_cpu_id(unsigned int cpu)
  * Migrate all peripheral interrupts with a target matching the current CPU
  * to the interface corresponding to @new_cpu_id. The CPU interface mapping
  * is also updated. Targets to other CPU interfaces are unchanged.
- * This must be called with IRQs locally disabled.
+ * This must be called with IRQ and FIQ locally disabled.
  */
 void gic_migrate_target(unsigned int new_cpu_id)
 {
@@ -709,6 +716,7 @@ void gic_migrate_target(unsigned int new_cpu_id)
 	ror_val = (cur_cpu_id - new_cpu_id) & 31;
 
 	raw_spin_lock(&irq_controller_lock);
+	write_lock(&fiq_safe_cpu_map_lock);
 
 	/* Update the target interface for this logical CPU */
 	gic_cpu_map[cpu] = 1 << new_cpu_id;
@@ -728,6 +736,7 @@ void gic_migrate_target(unsigned int new_cpu_id)
 		}
 	}
 
+	write_unlock(&fiq_safe_cpu_map_lock);
 	raw_spin_unlock(&irq_controller_lock);
 
 	/*
Currently calling printk() from a FIQ can result in deadlock on
irq_controller_lock within gic_raise_softirq(). This occurs because
printk(), which is otherwise structured to survive calls from FIQ/NMI,
calls this function to raise an IPI when it needs to wake_up_klogd().

This patch fixes the problem by introducing an additional rwlock and
using that to prevent softirqs being raised whilst the b.L switcher is
updating the cpu map.

Other parts of the code are not updated to use the new
fiq_safe_cpu_map_lock because other users of gic_cpu_map either rely on
external locking or upon irq_controller_lock. Both locks are held by the
b.L switcher code.

Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
---
 drivers/irqchip/irq-gic.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)