Message ID | 20201124141449.572446-3-maz@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm/arm64: Allow the rescheduling IPI to bypass irq_enter/exit | expand |
On Tue, Nov 24, 2020 at 02:14:45PM +0000, Marc Zyngier wrote: > Some interrupts (such as the rescheduling IPI) rely on not going through > the irq_enter()/irq_exit() calls. To distinguish such interrupts, add > a new IRQ flag that allows the low-level handling code to sidestep the > enter()/exit() calls. Well, not quite. The scheduler_ipi() function is perfectly fine being called with irq_enter/irq_exit. As per this very series, that's your current reality. The function just doesn't need it.
On 2020-11-24 16:26, Peter Zijlstra wrote: > On Tue, Nov 24, 2020 at 02:14:45PM +0000, Marc Zyngier wrote: >> Some interrupts (such as the rescheduling IPI) rely on not going >> through >> the irq_enter()/irq_exit() calls. To distinguish such interrupts, add >> a new IRQ flag that allows the low-level handling code to sidestep the >> enter()/exit() calls. > > Well, not quite. The scheduler_ipi() function is perfectly fine being > called with irq_enter/irq_exit. As per this very series, that's your > current reality. > > The function just doesn't need it. Yup, definitely a very bad choice of words here. /me goes and repaint the commit message. Thanks, M.
Hi Marc, On 24/11/20 14:14, Marc Zyngier wrote: > @@ -680,14 +679,22 @@ int __handle_domain_irq(struct irq_domain *domain, unsigned int hwirq, > * Some hardware gives randomly wrong interrupts. Rather > * than crashing, do something sensible. > */ > - if (unlikely(!irq || irq >= nr_irqs)) { > + if (unlikely(!irq || irq >= nr_irqs || !(desc = irq_to_desc(irq)))) { > ack_bad_irq(irq); > ret = -EINVAL; > + goto out; > + } > + > + if (IS_ENABLED(CONFIG_ARCH_WANTS_IRQ_RAW) && > + unlikely(irq_settings_is_raw(desc))) { > + generic_handle_irq_desc(desc); If I got the RCU bits right from what Thomas mentioned in https://lore.kernel.org/r/87ft5q18qs.fsf@nanos.tec.linutronix.de https://lore.kernel.org/r/87lfewnmdz.fsf@nanos.tec.linutronix.de then we're still missing something to notify RCU in the case the IRQ hits the idle task. All I see on our entry path is trace_hardirqs_off(); ... irq_handler() handle_domain_irq(); ... trace_hardirqs_on(); so we do currently rely on handle_domain_irq()'s irq_enter() + irq_exit() for that. rcu_irq_enter() says CONFIG_RCU_EQS_DEBUG=y can detect missing bits, but I don't get any warnings with your series on my Juno. Now, irq_enter() gives us: rcu_irq_enter(); irq_enter_rcu() raise_softirq faffery; __irq_enter() irqtime accounting; preempt count + lockdep; } __irq_enter_raw() Looking at irqentry_enter() + DEFINE_IDTENTRY_SYSVEC_SIMPLE(), I *think* we would be fine with just: rcu_irq_enter(); __irq_enter_raw(); generic_handle_irq_desc() __irq_exit_raw(); rcu_irq_exit(); I tested that and it didn't explode (though I haven't managed to make CONFIG_RCU_EQS_DEBUG squeal). Also please note RCU isn't my forte, so the above may contain traces of bullcrap. > } else { > - generic_handle_irq(irq); > + irq_enter(); > + generic_handle_irq_desc(desc); > + irq_exit(); > } > > - irq_exit(); > +out: > set_irq_regs(old_regs); > return ret; > } [...] > @@ -180,3 +182,16 @@ static inline bool irq_settings_is_hidden(struct irq_desc *desc) > { > return desc->status_use_accessors & _IRQ_HIDDEN; > } > + > +static inline bool irq_settings_is_raw(struct irq_desc *desc) > +{ > + if (IS_ENABLED(CONFIG_ARCH_WANTS_IRQ_RAW)) > + return desc->status_use_accessors & _IRQ_RAW; > + > + /* > + * Using IRQ_RAW on architectures that don't expect it is > + * likely to be wrong. > + */ > + WARN_ON_ONCE(1); Per __handle_domain_irq()'s short-circuit evaluation, this is only entered when the above config is enabled. Perhaps a better place to check for this would be in __irq_settings_clr_and_set(). > + return false; > +}
On Thu, Nov 26, 2020 at 06:18:33PM +0000, Valentin Schneider wrote: > If I got the RCU bits right from what Thomas mentioned in > > https://lore.kernel.org/r/87ft5q18qs.fsf@nanos.tec.linutronix.de > https://lore.kernel.org/r/87lfewnmdz.fsf@nanos.tec.linutronix.de > > then we're still missing something to notify RCU in the case the IRQ hits > the idle task. All I see on our entry path is > > trace_hardirqs_off(); > ... > irq_handler() > handle_domain_irq(); > ... > trace_hardirqs_on(); > > so we do currently rely on handle_domain_irq()'s irq_enter() + irq_exit() > for that. rcu_irq_enter() says CONFIG_RCU_EQS_DEBUG=y can detect missing > bits, but I don't get any warnings with your series on my Juno. The scheduler IPI really doesn't need RCU either ;-)
On 03/12/20 13:03, Peter Zijlstra wrote: > On Thu, Nov 26, 2020 at 06:18:33PM +0000, Valentin Schneider wrote: >> If I got the RCU bits right from what Thomas mentioned in >> >> https://lore.kernel.org/r/87ft5q18qs.fsf@nanos.tec.linutronix.de >> https://lore.kernel.org/r/87lfewnmdz.fsf@nanos.tec.linutronix.de >> >> then we're still missing something to notify RCU in the case the IRQ hits >> the idle task. All I see on our entry path is >> >> trace_hardirqs_off(); >> ... >> irq_handler() >> handle_domain_irq(); >> ... >> trace_hardirqs_on(); >> >> so we do currently rely on handle_domain_irq()'s irq_enter() + irq_exit() >> for that. rcu_irq_enter() says CONFIG_RCU_EQS_DEBUG=y can detect missing >> bits, but I don't get any warnings with your series on my Juno. > > The scheduler IPI really doesn't need RCU either ;-) Because it doesn't enter any new read-side section, right? But as with any other interrupt, we could then go through: preempt_schedule_irq() ~> pick_next_task_fair() -> newidle_balance() which does enter a read-side section, so RCU would need to be watching. Looking at kernel/entry/common.c:irqentry_exit_cond_resched(), it seems we do check for this via rcu_irq_exit_check_preempt(). I however cannot grok why irqentry_exit() *doesn't* call into preempt_schedule_irq() if RCU wasn't watching on IRQ entry, so I'm starting to question everything (again).
On 03/12/20 15:52, Valentin Schneider wrote: > On 03/12/20 13:03, Peter Zijlstra wrote: [...] >> The scheduler IPI really doesn't need RCU either ;-) [...] > But as with any other interrupt, we could then go through: > > preempt_schedule_irq() ~> pick_next_task_fair() -> newidle_balance() > > which does enter a read-side section, so RCU would need to be > watching. Looking at kernel/entry/common.c:irqentry_exit_cond_resched(), it > seems we do check for this via rcu_irq_exit_check_preempt(). > > I however cannot grok why irqentry_exit() *doesn't* call into > preempt_schedule_irq() if RCU wasn't watching on IRQ entry RCU wasn't watching on IRQ entry: -> we should be on the idle task -> no unvoluntary preemption for the idle task, scheduling always happens at the tail of the idle loop -> ignore what I've been saying, current patch is fine
Hi Marc, On Tue, Nov 24, 2020 at 02:14:45PM +0000, Marc Zyngier wrote: > Some interrupts (such as the rescheduling IPI) rely on not going through > the irq_enter()/irq_exit() calls. To distinguish such interrupts, add > a new IRQ flag that allows the low-level handling code to sidestep the > enter()/exit() calls. > > Only the architecture code is expected to use this. It will do the wrong > thing on normal interrupts. Note that this is a band-aid until we can > move to some more correct infrastructure (such as kernel/entry/common.c). > > Signed-off-by: Marc Zyngier <maz@kernel.org> > --- > include/linux/irq.h | 2 ++ > kernel/irq/Kconfig | 3 +++ > kernel/irq/debugfs.c | 1 + > kernel/irq/irqdesc.c | 17 ++++++++++++----- > kernel/irq/settings.h | 15 +++++++++++++++ > 5 files changed, 33 insertions(+), 5 deletions(-) [...] > diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c > index 1a7723604399..f5beee546a6f 100644 > --- a/kernel/irq/irqdesc.c > +++ b/kernel/irq/irqdesc.c > @@ -667,10 +667,9 @@ int __handle_domain_irq(struct irq_domain *domain, unsigned int hwirq, > { > struct pt_regs *old_regs = set_irq_regs(regs); > unsigned int irq = hwirq; > + struct irq_desc *desc; > int ret = 0; > > - irq_enter(); > - > #ifdef CONFIG_IRQ_DOMAIN > if (lookup) > irq = irq_find_mapping(domain, hwirq); > @@ -680,14 +679,22 @@ int __handle_domain_irq(struct irq_domain *domain, unsigned int hwirq, > * Some hardware gives randomly wrong interrupts. Rather > * than crashing, do something sensible. > */ > - if (unlikely(!irq || irq >= nr_irqs)) { > + if (unlikely(!irq || irq >= nr_irqs || !(desc = irq_to_desc(irq)))) { > ack_bad_irq(irq); > ret = -EINVAL; > + goto out; > + } > + > + if (IS_ENABLED(CONFIG_ARCH_WANTS_IRQ_RAW) && > + unlikely(irq_settings_is_raw(desc))) { > + generic_handle_irq_desc(desc); Based on tglx's previous comments, I was expecting to see calls to __irq_{enter,exit}_raw() around this. Are they hiding somewhere else or are they not needed? Will
On Tue, Nov 24, 2020 at 6:15 AM Marc Zyngier <maz@kernel.org> wrote: > > Some interrupts (such as the rescheduling IPI) rely on not going through > the irq_enter()/irq_exit() calls. To distinguish such interrupts, add > a new IRQ flag that allows the low-level handling code to sidestep the > enter()/exit() calls. > > Only the architecture code is expected to use this. It will do the wrong > thing on normal interrupts. Note that this is a band-aid until we can > move to some more correct infrastructure (such as kernel/entry/common.c). > > Signed-off-by: Marc Zyngier <maz@kernel.org> > --- > include/linux/irq.h | 2 ++ > kernel/irq/Kconfig | 3 +++ > kernel/irq/debugfs.c | 1 + > kernel/irq/irqdesc.c | 17 ++++++++++++----- > kernel/irq/settings.h | 15 +++++++++++++++ > 5 files changed, 33 insertions(+), 5 deletions(-) > > diff --git a/include/linux/irq.h b/include/linux/irq.h > index c55f218d5b61..605ba5949255 100644 > --- a/include/linux/irq.h > +++ b/include/linux/irq.h > @@ -72,6 +72,7 @@ enum irqchip_irq_state; > * mechanism and from core side polling. > * IRQ_DISABLE_UNLAZY - Disable lazy irq disable > * IRQ_HIDDEN - Don't show up in /proc/interrupts > + * IRQ_RAW - Skip tick management and irqtime accounting > */ > enum { > IRQ_TYPE_NONE = 0x00000000, > @@ -99,6 +100,7 @@ enum { > IRQ_IS_POLLED = (1 << 18), > IRQ_DISABLE_UNLAZY = (1 << 19), > IRQ_HIDDEN = (1 << 20), > + IRQ_RAW = (1 << 21), > }; > > #define IRQF_MODIFY_MASK \ > diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig > index 164a031cfdb6..ae9b13d5ee91 100644 > --- a/kernel/irq/Kconfig > +++ b/kernel/irq/Kconfig > @@ -109,6 +109,9 @@ config GENERIC_IRQ_MATRIX_ALLOCATOR > config GENERIC_IRQ_RESERVATION_MODE > bool > > +config ARCH_WANTS_IRQ_RAW > + bool > + > # Support forced irq threading > config IRQ_FORCED_THREADING > bool > diff --git a/kernel/irq/debugfs.c b/kernel/irq/debugfs.c > index e4cff358b437..f53475d88072 100644 > --- a/kernel/irq/debugfs.c > +++ b/kernel/irq/debugfs.c > @@ -140,6 +140,7 @@ static const struct irq_bit_descr irqdesc_states[] = { > BIT_MASK_DESCR(_IRQ_IS_POLLED), > BIT_MASK_DESCR(_IRQ_DISABLE_UNLAZY), > BIT_MASK_DESCR(_IRQ_HIDDEN), > + BIT_MASK_DESCR(_IRQ_RAW), > }; > > static const struct irq_bit_descr irqdesc_istates[] = { > diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c > index 1a7723604399..f5beee546a6f 100644 > --- a/kernel/irq/irqdesc.c > +++ b/kernel/irq/irqdesc.c > @@ -667,10 +667,9 @@ int __handle_domain_irq(struct irq_domain *domain, unsigned int hwirq, > { > struct pt_regs *old_regs = set_irq_regs(regs); > unsigned int irq = hwirq; > + struct irq_desc *desc; > int ret = 0; > > - irq_enter(); > - > #ifdef CONFIG_IRQ_DOMAIN > if (lookup) > irq = irq_find_mapping(domain, hwirq); > @@ -680,14 +679,22 @@ int __handle_domain_irq(struct irq_domain *domain, unsigned int hwirq, > * Some hardware gives randomly wrong interrupts. Rather > * than crashing, do something sensible. > */ > - if (unlikely(!irq || irq >= nr_irqs)) { > + if (unlikely(!irq || irq >= nr_irqs || !(desc = irq_to_desc(irq)))) { I see a checkpatch error here: ERROR:ASSIGN_IN_IF: do not use assignment in if condition #96: FILE: kernel/irq/irqdesc.c:682: > ack_bad_irq(irq); > ret = -EINVAL; > + goto out; > + } > + > + if (IS_ENABLED(CONFIG_ARCH_WANTS_IRQ_RAW) && > + unlikely(irq_settings_is_raw(desc))) { > + generic_handle_irq_desc(desc); > } else { > - generic_handle_irq(irq); > + irq_enter(); > + generic_handle_irq_desc(desc); > + irq_exit(); > } > > - irq_exit(); > +out: > set_irq_regs(old_regs); > return ret; > } > diff --git a/kernel/irq/settings.h b/kernel/irq/settings.h > index 51acdf43eadc..0033d459fdac 100644 > --- a/kernel/irq/settings.h > +++ b/kernel/irq/settings.h > @@ -18,6 +18,7 @@ enum { > _IRQ_IS_POLLED = IRQ_IS_POLLED, > _IRQ_DISABLE_UNLAZY = IRQ_DISABLE_UNLAZY, > _IRQ_HIDDEN = IRQ_HIDDEN, > + _IRQ_RAW = IRQ_RAW, > _IRQF_MODIFY_MASK = IRQF_MODIFY_MASK, > }; > > @@ -33,6 +34,7 @@ enum { > #define IRQ_IS_POLLED GOT_YOU_MORON > #define IRQ_DISABLE_UNLAZY GOT_YOU_MORON > #define IRQ_HIDDEN GOT_YOU_MORON > +#define IRQ_RAW GOT_YOU_MORON > #undef IRQF_MODIFY_MASK > #define IRQF_MODIFY_MASK GOT_YOU_MORON > > @@ -180,3 +182,16 @@ static inline bool irq_settings_is_hidden(struct irq_desc *desc) > { > return desc->status_use_accessors & _IRQ_HIDDEN; > } > + > +static inline bool irq_settings_is_raw(struct irq_desc *desc) > +{ > + if (IS_ENABLED(CONFIG_ARCH_WANTS_IRQ_RAW)) > + return desc->status_use_accessors & _IRQ_RAW; > + > + /* > + * Using IRQ_RAW on architectures that don't expect it is > + * likely to be wrong. > + */ > + WARN_ON_ONCE(1); > + return false; > +} > -- > 2.28.0 > > -- > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com. >
diff --git a/include/linux/irq.h b/include/linux/irq.h index c55f218d5b61..605ba5949255 100644 --- a/include/linux/irq.h +++ b/include/linux/irq.h @@ -72,6 +72,7 @@ enum irqchip_irq_state; * mechanism and from core side polling. * IRQ_DISABLE_UNLAZY - Disable lazy irq disable * IRQ_HIDDEN - Don't show up in /proc/interrupts + * IRQ_RAW - Skip tick management and irqtime accounting */ enum { IRQ_TYPE_NONE = 0x00000000, @@ -99,6 +100,7 @@ enum { IRQ_IS_POLLED = (1 << 18), IRQ_DISABLE_UNLAZY = (1 << 19), IRQ_HIDDEN = (1 << 20), + IRQ_RAW = (1 << 21), }; #define IRQF_MODIFY_MASK \ diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig index 164a031cfdb6..ae9b13d5ee91 100644 --- a/kernel/irq/Kconfig +++ b/kernel/irq/Kconfig @@ -109,6 +109,9 @@ config GENERIC_IRQ_MATRIX_ALLOCATOR config GENERIC_IRQ_RESERVATION_MODE bool +config ARCH_WANTS_IRQ_RAW + bool + # Support forced irq threading config IRQ_FORCED_THREADING bool diff --git a/kernel/irq/debugfs.c b/kernel/irq/debugfs.c index e4cff358b437..f53475d88072 100644 --- a/kernel/irq/debugfs.c +++ b/kernel/irq/debugfs.c @@ -140,6 +140,7 @@ static const struct irq_bit_descr irqdesc_states[] = { BIT_MASK_DESCR(_IRQ_IS_POLLED), BIT_MASK_DESCR(_IRQ_DISABLE_UNLAZY), BIT_MASK_DESCR(_IRQ_HIDDEN), + BIT_MASK_DESCR(_IRQ_RAW), }; static const struct irq_bit_descr irqdesc_istates[] = { diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c index 1a7723604399..f5beee546a6f 100644 --- a/kernel/irq/irqdesc.c +++ b/kernel/irq/irqdesc.c @@ -667,10 +667,9 @@ int __handle_domain_irq(struct irq_domain *domain, unsigned int hwirq, { struct pt_regs *old_regs = set_irq_regs(regs); unsigned int irq = hwirq; + struct irq_desc *desc; int ret = 0; - irq_enter(); - #ifdef CONFIG_IRQ_DOMAIN if (lookup) irq = irq_find_mapping(domain, hwirq); @@ -680,14 +679,22 @@ int __handle_domain_irq(struct irq_domain *domain, unsigned int hwirq, * Some hardware gives randomly wrong interrupts. Rather * than crashing, do something sensible. */ - if (unlikely(!irq || irq >= nr_irqs)) { + if (unlikely(!irq || irq >= nr_irqs || !(desc = irq_to_desc(irq)))) { ack_bad_irq(irq); ret = -EINVAL; + goto out; + } + + if (IS_ENABLED(CONFIG_ARCH_WANTS_IRQ_RAW) && + unlikely(irq_settings_is_raw(desc))) { + generic_handle_irq_desc(desc); } else { - generic_handle_irq(irq); + irq_enter(); + generic_handle_irq_desc(desc); + irq_exit(); } - irq_exit(); +out: set_irq_regs(old_regs); return ret; } diff --git a/kernel/irq/settings.h b/kernel/irq/settings.h index 51acdf43eadc..0033d459fdac 100644 --- a/kernel/irq/settings.h +++ b/kernel/irq/settings.h @@ -18,6 +18,7 @@ enum { _IRQ_IS_POLLED = IRQ_IS_POLLED, _IRQ_DISABLE_UNLAZY = IRQ_DISABLE_UNLAZY, _IRQ_HIDDEN = IRQ_HIDDEN, + _IRQ_RAW = IRQ_RAW, _IRQF_MODIFY_MASK = IRQF_MODIFY_MASK, }; @@ -33,6 +34,7 @@ enum { #define IRQ_IS_POLLED GOT_YOU_MORON #define IRQ_DISABLE_UNLAZY GOT_YOU_MORON #define IRQ_HIDDEN GOT_YOU_MORON +#define IRQ_RAW GOT_YOU_MORON #undef IRQF_MODIFY_MASK #define IRQF_MODIFY_MASK GOT_YOU_MORON @@ -180,3 +182,16 @@ static inline bool irq_settings_is_hidden(struct irq_desc *desc) { return desc->status_use_accessors & _IRQ_HIDDEN; } + +static inline bool irq_settings_is_raw(struct irq_desc *desc) +{ + if (IS_ENABLED(CONFIG_ARCH_WANTS_IRQ_RAW)) + return desc->status_use_accessors & _IRQ_RAW; + + /* + * Using IRQ_RAW on architectures that don't expect it is + * likely to be wrong. + */ + WARN_ON_ONCE(1); + return false; +}
Some interrupts (such as the rescheduling IPI) rely on not going through the irq_enter()/irq_exit() calls. To distinguish such interrupts, add a new IRQ flag that allows the low-level handling code to sidestep the enter()/exit() calls. Only the architecture code is expected to use this. It will do the wrong thing on normal interrupts. Note that this is a band-aid until we can move to some more correct infrastructure (such as kernel/entry/common.c). Signed-off-by: Marc Zyngier <maz@kernel.org> --- include/linux/irq.h | 2 ++ kernel/irq/Kconfig | 3 +++ kernel/irq/debugfs.c | 1 + kernel/irq/irqdesc.c | 17 ++++++++++++----- kernel/irq/settings.h | 15 +++++++++++++++ 5 files changed, 33 insertions(+), 5 deletions(-)