Message ID | 20200710145642.28978-1-valentin.schneider@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] irqchip/gic: Implement irq_chip->irq_retrigger() | expand |
On Fri, 10 Jul 2020 15:56:42 +0100, Valentin Schneider <valentin.schneider@arm.com> wrote: > > While digging around IRQCHIP_EOI_IF_HANDLED and irq/resend.c, it has come > to my attention that the IRQ resend situation seems a bit precarious for > the GIC(s). Below is a (somewhat structured) dump of my notes/thoughts > about that. > > IRQCHIP_EOI_IF_HANDLED > ====================== > > If the irqchip doesn't have this flag set, we may issue an irq_eoi() > despite not having handled the IRQ in the following cases: > > o !irq_may_run() > - IRQ may be in progress (handle_irq_event() ongoing) > - IRQ is an armed wakeup source (also sets it pending) > > o !action or IRQ disabled; in this case it is set as pending. > > irq_resend() > ============ > > For the above cases where the IRQ is marked as pending, it means that when > we'll go through check_irq_resend() (e.g. when we re-enable the IRQ) we > will end up in resend_irqs() which goes through the flow handler again, IOW > will issue *another* EOI on the same IRQ. > > Now, this is all done via a tasklet, so AFAICT cannot happen in irq > context (as defined by in_interrupt() - it can happen at the tail of > handling an IRQ if it wasn't nesting). > > GIC woes > ======== > > The TL;DR for IRQ handling on the GIC is that we have 3 steps: > o Acknowledgement (Ack) > o priority drop (PD) > o deactivation (D) > > The GIC also has an "eoimode" knob that says whether PD and D are split, IOW: > o eoimode=0: irq_eoi() does PD + D > o eoimode=1: irq_eoi() does D > > Regardless of the mode, it is paramount that any PD is > o issued on the same CPU that Ack'd the IRQ > o issued in reverse order of the Acks > > IHI0069D, 4.1.1 Physical CPU interface, Priority drop > """ > A priority drop must be performed by the same PE that activated the > interrupt. > > [...] > > For each CPU interface, the GIC architecture requires the order of the > valid writes to ICC_EOIR0_EL1 and ICC_EOIR1_EL1 to be the exact reverse of > the order of the reads from ICC_IAR0_EL1 and ICC_IAR1_EL1 > """ > > IHI0069D, 8.2.9 ICC_EOIR0_EL1, Interrupt Controller End Of Interrupt Register 0 > """ > A write to this register must correspond to the most recent valid read by > this PE from an Interrupt Acknowledge Register, and must correspond to the > INTID that was read from ICC_IAR0_EL1, otherwise the system behavior is > UNPREDICTABLE. > """ > > For eoimode=1, the PD is hidden from genirq and is contained within the GIC > driver. This means that issuing another irq_eoi() will only be re-issuing a > D, which I think the GIC can live with (even if issued from a different CPU). > > For eoimode=0, it is more troubling, as we break the aforementioned > restrictions. That said, IIUC this cannot cause e.g. a bogus running > priority reduction due to the tasklet context mentioned above (running > priority ought to be idle priority). > > Linux hosts will almost always use eoimode=1, so that leaves us with > guests running with eoimode=0, and I don't know how common it is (if at all > possible) for those to use pm / wakeup IRQs. I suppose that is a reason > why this hasn't cropped up before, that or I miserably misunderstood the > whole thing. > > In any case, the virtual interface follows the same restrictions wrt > PD ordering: > > IHI0069D 8.3.7 ICV_EOIR0_EL1, Interrupt Controller Virtual End Of Interrupt Register 0 > """ > A write to this register must correspond to the most recent valid read by > this vPE from a Virtual Interrupt Acknowledge Register, and must correspond > to the INTID that was read from ICV_IAR0_EL1, otherwise the system behavior > is UNPREDICTABLE. > """ > > Change > ====== > > One way to ensure we only get one PD per interrupt activation and maintain > the expected ordering is to add the IRQCHIP_EOI_IF_HANDLED flag to all > irq-gic chips, but that only really works for eoimode=1; for eoimode=0 that > would mean leaving the flow handler without having issued a PD at all. > > At the same time, this whole IRQS_PENDING & resend thing feels like > something we can handle in hardware: we can leverage > irq_chip.irq_retrigger() and use this to mark the interrupt as pending in > the GIC itself, in which case we *don't* want to have > IRQCHIP_EOI_IF_HANDLED (as the retrigger will lead to another ack+eoi > pair). > > Implement irq_chip.irq_retrigger() for both GICs. Although I am very grateful for the whole documentation, I'd rather have a slightly more condensed changelog that documents the implementation of the retrigger callback! ;-) > > Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> > --- > drivers/irqchip/irq-gic-v3.c | 7 +++++++ > drivers/irqchip/irq-gic.c | 6 ++++++ > 2 files changed, 13 insertions(+) > > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c > index cc46bc2d634b..c025e8b51464 100644 > --- a/drivers/irqchip/irq-gic-v3.c > +++ b/drivers/irqchip/irq-gic-v3.c > @@ -1207,6 +1207,11 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val, > #define gic_smp_init() do { } while(0) > #endif > > +static int gic_retrigger(struct irq_data *data) > +{ > + return gic_irq_set_irqchip_state(data, IRQCHIP_STATE_PENDING, true); If I'm not mistaken, check_irq_resend() requires a non-zero return value if the retrigger has succeeded. So something like return !gic_irq_set_irqchip_state(data, IRQCHIP_STATE_PENDING, true); would be more appropriate. Otherwise, looks good. Thanks, M.
On 10/07/20 17:29, Marc Zyngier wrote: > On Fri, 10 Jul 2020 15:56:42 +0100, > Valentin Schneider <valentin.schneider@arm.com> wrote: [...] >> Implement irq_chip.irq_retrigger() for both GICs. > > Although I am very grateful for the whole documentation, I'd rather > have a slightly more condensed changelog that documents the > implementation of the retrigger callback! ;-) > Hah, indeed! I was relatively unsure about that whole thing, hence why I sent it as RFC with a wall of text attached. I'll probably strip out the GIC doc snippets for the "actual" changelog, and talk about the *contents* of the patch some more. >> >> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> >> --- >> drivers/irqchip/irq-gic-v3.c | 7 +++++++ >> drivers/irqchip/irq-gic.c | 6 ++++++ >> 2 files changed, 13 insertions(+) >> >> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c >> index cc46bc2d634b..c025e8b51464 100644 >> --- a/drivers/irqchip/irq-gic-v3.c >> +++ b/drivers/irqchip/irq-gic-v3.c >> @@ -1207,6 +1207,11 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val, >> #define gic_smp_init() do { } while(0) >> #endif >> >> +static int gic_retrigger(struct irq_data *data) >> +{ >> + return gic_irq_set_irqchip_state(data, IRQCHIP_STATE_PENDING, true); > > If I'm not mistaken, check_irq_resend() requires a non-zero return > value if the retrigger has succeeded. So something like > > return !gic_irq_set_irqchip_state(data, IRQCHIP_STATE_PENDING, true); > > would be more appropriate. > Aye, you're right. And while we're at it, we probably still don't want to fallback to irq_sw_resend() if the retrigger fails, so we should add some irqd_set_handle_enforce_irqctx() somewhere in the GICs or plainly deselect CONFIG_HARDIRQS_SW_RESEND. I'm not very familiar with LPIs just yet, but seeing as they too use handle_fasteoi_irq() and can't get retriggered, I'd rather play it safe. This brings me to another point: while this boots just fine, I didn't get to test out IRQs marked with IRQS_PENDING. IIUC enable_irq_wake() should give me a decent trail - I see serial_core making use of it. I'll go give suspend a try. > Otherwise, looks good. > Thanks for having a look! > Thanks, > > M.
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index cc46bc2d634b..c025e8b51464 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -1207,6 +1207,11 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val, #define gic_smp_init() do { } while(0) #endif +static int gic_retrigger(struct irq_data *data) +{ + return gic_irq_set_irqchip_state(data, IRQCHIP_STATE_PENDING, true); +} + #ifdef CONFIG_CPU_PM static int gic_cpu_pm_notifier(struct notifier_block *self, unsigned long cmd, void *v) @@ -1242,6 +1247,7 @@ static struct irq_chip gic_chip = { .irq_eoi = gic_eoi_irq, .irq_set_type = gic_set_type, .irq_set_affinity = gic_set_affinity, + .irq_retrigger = gic_retrigger, .irq_get_irqchip_state = gic_irq_get_irqchip_state, .irq_set_irqchip_state = gic_irq_set_irqchip_state, .irq_nmi_setup = gic_irq_nmi_setup, @@ -1258,6 +1264,7 @@ static struct irq_chip gic_eoimode1_chip = { .irq_eoi = gic_eoimode1_eoi_irq, .irq_set_type = gic_set_type, .irq_set_affinity = gic_set_affinity, + .irq_retrigger = gic_retrigger, .irq_get_irqchip_state = gic_irq_get_irqchip_state, .irq_set_irqchip_state = gic_irq_set_irqchip_state, .irq_set_vcpu_affinity = gic_irq_set_vcpu_affinity, diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c index 00de05abd3c3..33ce025868d0 100644 --- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -355,6 +355,11 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val, } #endif +static int gic_retrigger(struct irq_data *data) +{ + return gic_irq_set_irqchip_state(data, IRQCHIP_STATE_PENDING, true); +} + static void __exception_irq_entry gic_handle_irq(struct pt_regs *regs) { u32 irqstat, irqnr; @@ -425,6 +430,7 @@ static const struct irq_chip gic_chip = { .irq_unmask = gic_unmask_irq, .irq_eoi = gic_eoi_irq, .irq_set_type = gic_set_type, + .irq_retrigger = gic_retrigger, .irq_get_irqchip_state = gic_irq_get_irqchip_state, .irq_set_irqchip_state = gic_irq_set_irqchip_state, .flags = IRQCHIP_SET_TYPE_MASKED |
While digging around IRQCHIP_EOI_IF_HANDLED and irq/resend.c, it has come to my attention that the IRQ resend situation seems a bit precarious for the GIC(s). Below is a (somewhat structured) dump of my notes/thoughts about that. IRQCHIP_EOI_IF_HANDLED ====================== If the irqchip doesn't have this flag set, we may issue an irq_eoi() despite not having handled the IRQ in the following cases: o !irq_may_run() - IRQ may be in progress (handle_irq_event() ongoing) - IRQ is an armed wakeup source (also sets it pending) o !action or IRQ disabled; in this case it is set as pending. irq_resend() ============ For the above cases where the IRQ is marked as pending, it means that when we'll go through check_irq_resend() (e.g. when we re-enable the IRQ) we will end up in resend_irqs() which goes through the flow handler again, IOW will issue *another* EOI on the same IRQ. Now, this is all done via a tasklet, so AFAICT cannot happen in irq context (as defined by in_interrupt() - it can happen at the tail of handling an IRQ if it wasn't nesting). GIC woes ======== The TL;DR for IRQ handling on the GIC is that we have 3 steps: o Acknowledgement (Ack) o priority drop (PD) o deactivation (D) The GIC also has an "eoimode" knob that says whether PD and D are split, IOW: o eoimode=0: irq_eoi() does PD + D o eoimode=1: irq_eoi() does D Regardless of the mode, it is paramount that any PD is o issued on the same CPU that Ack'd the IRQ o issued in reverse order of the Acks IHI0069D, 4.1.1 Physical CPU interface, Priority drop """ A priority drop must be performed by the same PE that activated the interrupt. [...] For each CPU interface, the GIC architecture requires the order of the valid writes to ICC_EOIR0_EL1 and ICC_EOIR1_EL1 to be the exact reverse of the order of the reads from ICC_IAR0_EL1 and ICC_IAR1_EL1 """ IHI0069D, 8.2.9 ICC_EOIR0_EL1, Interrupt Controller End Of Interrupt Register 0 """ A write to this register must correspond to the most recent valid read by this PE from an Interrupt Acknowledge Register, and must correspond to the INTID that was read from ICC_IAR0_EL1, otherwise the system behavior is UNPREDICTABLE. """ For eoimode=1, the PD is hidden from genirq and is contained within the GIC driver. This means that issuing another irq_eoi() will only be re-issuing a D, which I think the GIC can live with (even if issued from a different CPU). For eoimode=0, it is more troubling, as we break the aforementioned restrictions. That said, IIUC this cannot cause e.g. a bogus running priority reduction due to the tasklet context mentioned above (running priority ought to be idle priority). Linux hosts will almost always use eoimode=1, so that leaves us with guests running with eoimode=0, and I don't know how common it is (if at all possible) for those to use pm / wakeup IRQs. I suppose that is a reason why this hasn't cropped up before, that or I miserably misunderstood the whole thing. In any case, the virtual interface follows the same restrictions wrt PD ordering: IHI0069D 8.3.7 ICV_EOIR0_EL1, Interrupt Controller Virtual End Of Interrupt Register 0 """ A write to this register must correspond to the most recent valid read by this vPE from a Virtual Interrupt Acknowledge Register, and must correspond to the INTID that was read from ICV_IAR0_EL1, otherwise the system behavior is UNPREDICTABLE. """ Change ====== One way to ensure we only get one PD per interrupt activation and maintain the expected ordering is to add the IRQCHIP_EOI_IF_HANDLED flag to all irq-gic chips, but that only really works for eoimode=1; for eoimode=0 that would mean leaving the flow handler without having issued a PD at all. At the same time, this whole IRQS_PENDING & resend thing feels like something we can handle in hardware: we can leverage irq_chip.irq_retrigger() and use this to mark the interrupt as pending in the GIC itself, in which case we *don't* want to have IRQCHIP_EOI_IF_HANDLED (as the retrigger will lead to another ack+eoi pair). Implement irq_chip.irq_retrigger() for both GICs. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> --- drivers/irqchip/irq-gic-v3.c | 7 +++++++ drivers/irqchip/irq-gic.c | 6 ++++++ 2 files changed, 13 insertions(+)