diff mbox series

[RFC] irqchip/gic: Implement irq_chip->irq_retrigger()

Message ID 20200710145642.28978-1-valentin.schneider@arm.com (mailing list archive)
State New, archived
Headers show
Series [RFC] irqchip/gic: Implement irq_chip->irq_retrigger() | expand

Commit Message

Valentin Schneider July 10, 2020, 2:56 p.m. UTC
While digging around IRQCHIP_EOI_IF_HANDLED and irq/resend.c, it has come
to my attention that the IRQ resend situation seems a bit precarious for
the GIC(s). Below is a (somewhat structured) dump of my notes/thoughts
about that.

IRQCHIP_EOI_IF_HANDLED
======================

If the irqchip doesn't have this flag set, we may issue an irq_eoi()
despite not having handled the IRQ in the following cases:

o !irq_may_run()
  - IRQ may be in progress (handle_irq_event() ongoing)
  - IRQ is an armed wakeup source (also sets it pending)

o !action or IRQ disabled; in this case it is set as pending.

irq_resend()
============

For the above cases where the IRQ is marked as pending, it means that when
we'll go through check_irq_resend() (e.g. when we re-enable the IRQ) we
will end up in resend_irqs() which goes through the flow handler again, IOW
will issue *another* EOI on the same IRQ.

Now, this is all done via a tasklet, so AFAICT cannot happen in irq
context (as defined by in_interrupt() - it can happen at the tail of
handling an IRQ if it wasn't nesting).

GIC woes
========

The TL;DR for IRQ handling on the GIC is that we have 3 steps:
o Acknowledgement (Ack)
o priority drop (PD)
o deactivation (D)

The GIC also has an "eoimode" knob that says whether PD and D are split, IOW:
o eoimode=0: irq_eoi() does PD + D
o eoimode=1: irq_eoi() does D

Regardless of the mode, it is paramount that any PD is
o issued on the same CPU that Ack'd the IRQ
o issued in reverse order of the Acks

IHI0069D, 4.1.1 Physical CPU interface, Priority drop
"""
A priority drop must be performed by the same PE that activated the
interrupt.

[...]

For each CPU interface, the GIC architecture requires the order of the
valid writes to ICC_EOIR0_EL1 and ICC_EOIR1_EL1 to be the exact reverse of
the order of the reads from ICC_IAR0_EL1 and ICC_IAR1_EL1
"""

IHI0069D, 8.2.9 ICC_EOIR0_EL1, Interrupt Controller End Of Interrupt Register 0
"""
A write to this register must correspond to the most recent valid read by
this PE from an Interrupt Acknowledge Register, and must correspond to the
INTID that was read from ICC_IAR0_EL1, otherwise the system behavior is
UNPREDICTABLE.
"""

For eoimode=1, the PD is hidden from genirq and is contained within the GIC
driver. This means that issuing another irq_eoi() will only be re-issuing a
D, which I think the GIC can live with (even if issued from a different CPU).

For eoimode=0, it is more troubling, as we break the aforementioned
restrictions. That said, IIUC this cannot cause e.g. a bogus running
priority reduction due to the tasklet context mentioned above (running
priority ought to be idle priority).

Linux hosts will almost always use eoimode=1, so that leaves us with
guests running with eoimode=0, and I don't know how common it is (if at all
possible) for those to use pm / wakeup IRQs. I suppose that is a reason
why this hasn't cropped up before, that or I miserably misunderstood the
whole thing.

In any case, the virtual interface follows the same restrictions wrt
PD ordering:

IHI0069D 8.3.7 ICV_EOIR0_EL1, Interrupt Controller Virtual End Of Interrupt Register 0
"""
A write to this register must correspond to the most recent valid read by
this vPE from a Virtual Interrupt Acknowledge Register, and must correspond
to the INTID that was read from ICV_IAR0_EL1, otherwise the system behavior
is UNPREDICTABLE.
"""

Change
======

One way to ensure we only get one PD per interrupt activation and maintain
the expected ordering is to add the IRQCHIP_EOI_IF_HANDLED flag to all
irq-gic chips, but that only really works for eoimode=1; for eoimode=0 that
would mean leaving the flow handler without having issued a PD at all.

At the same time, this whole IRQS_PENDING & resend thing feels like
something we can handle in hardware: we can leverage
irq_chip.irq_retrigger() and use this to mark the interrupt as pending in
the GIC itself, in which case we *don't* want to have
IRQCHIP_EOI_IF_HANDLED (as the retrigger will lead to another ack+eoi
pair).

Implement irq_chip.irq_retrigger() for both GICs.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 drivers/irqchip/irq-gic-v3.c | 7 +++++++
 drivers/irqchip/irq-gic.c    | 6 ++++++
 2 files changed, 13 insertions(+)

Comments

Marc Zyngier July 10, 2020, 4:29 p.m. UTC | #1
On Fri, 10 Jul 2020 15:56:42 +0100,
Valentin Schneider <valentin.schneider@arm.com> wrote:
> 
> While digging around IRQCHIP_EOI_IF_HANDLED and irq/resend.c, it has come
> to my attention that the IRQ resend situation seems a bit precarious for
> the GIC(s). Below is a (somewhat structured) dump of my notes/thoughts
> about that.
> 
> IRQCHIP_EOI_IF_HANDLED
> ======================
> 
> If the irqchip doesn't have this flag set, we may issue an irq_eoi()
> despite not having handled the IRQ in the following cases:
> 
> o !irq_may_run()
>   - IRQ may be in progress (handle_irq_event() ongoing)
>   - IRQ is an armed wakeup source (also sets it pending)
> 
> o !action or IRQ disabled; in this case it is set as pending.
> 
> irq_resend()
> ============
> 
> For the above cases where the IRQ is marked as pending, it means that when
> we'll go through check_irq_resend() (e.g. when we re-enable the IRQ) we
> will end up in resend_irqs() which goes through the flow handler again, IOW
> will issue *another* EOI on the same IRQ.
> 
> Now, this is all done via a tasklet, so AFAICT cannot happen in irq
> context (as defined by in_interrupt() - it can happen at the tail of
> handling an IRQ if it wasn't nesting).
> 
> GIC woes
> ========
> 
> The TL;DR for IRQ handling on the GIC is that we have 3 steps:
> o Acknowledgement (Ack)
> o priority drop (PD)
> o deactivation (D)
> 
> The GIC also has an "eoimode" knob that says whether PD and D are split, IOW:
> o eoimode=0: irq_eoi() does PD + D
> o eoimode=1: irq_eoi() does D
> 
> Regardless of the mode, it is paramount that any PD is
> o issued on the same CPU that Ack'd the IRQ
> o issued in reverse order of the Acks
> 
> IHI0069D, 4.1.1 Physical CPU interface, Priority drop
> """
> A priority drop must be performed by the same PE that activated the
> interrupt.
> 
> [...]
> 
> For each CPU interface, the GIC architecture requires the order of the
> valid writes to ICC_EOIR0_EL1 and ICC_EOIR1_EL1 to be the exact reverse of
> the order of the reads from ICC_IAR0_EL1 and ICC_IAR1_EL1
> """
> 
> IHI0069D, 8.2.9 ICC_EOIR0_EL1, Interrupt Controller End Of Interrupt Register 0
> """
> A write to this register must correspond to the most recent valid read by
> this PE from an Interrupt Acknowledge Register, and must correspond to the
> INTID that was read from ICC_IAR0_EL1, otherwise the system behavior is
> UNPREDICTABLE.
> """
> 
> For eoimode=1, the PD is hidden from genirq and is contained within the GIC
> driver. This means that issuing another irq_eoi() will only be re-issuing a
> D, which I think the GIC can live with (even if issued from a different CPU).
> 
> For eoimode=0, it is more troubling, as we break the aforementioned
> restrictions. That said, IIUC this cannot cause e.g. a bogus running
> priority reduction due to the tasklet context mentioned above (running
> priority ought to be idle priority).
> 
> Linux hosts will almost always use eoimode=1, so that leaves us with
> guests running with eoimode=0, and I don't know how common it is (if at all
> possible) for those to use pm / wakeup IRQs. I suppose that is a reason
> why this hasn't cropped up before, that or I miserably misunderstood the
> whole thing.
> 
> In any case, the virtual interface follows the same restrictions wrt
> PD ordering:
> 
> IHI0069D 8.3.7 ICV_EOIR0_EL1, Interrupt Controller Virtual End Of Interrupt Register 0
> """
> A write to this register must correspond to the most recent valid read by
> this vPE from a Virtual Interrupt Acknowledge Register, and must correspond
> to the INTID that was read from ICV_IAR0_EL1, otherwise the system behavior
> is UNPREDICTABLE.
> """
> 
> Change
> ======
> 
> One way to ensure we only get one PD per interrupt activation and maintain
> the expected ordering is to add the IRQCHIP_EOI_IF_HANDLED flag to all
> irq-gic chips, but that only really works for eoimode=1; for eoimode=0 that
> would mean leaving the flow handler without having issued a PD at all.
> 
> At the same time, this whole IRQS_PENDING & resend thing feels like
> something we can handle in hardware: we can leverage
> irq_chip.irq_retrigger() and use this to mark the interrupt as pending in
> the GIC itself, in which case we *don't* want to have
> IRQCHIP_EOI_IF_HANDLED (as the retrigger will lead to another ack+eoi
> pair).
> 
> Implement irq_chip.irq_retrigger() for both GICs.

Although I am very grateful for the whole documentation, I'd rather
have a slightly more condensed changelog that documents the
implementation of the retrigger callback! ;-)

> 
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> ---
>  drivers/irqchip/irq-gic-v3.c | 7 +++++++
>  drivers/irqchip/irq-gic.c    | 6 ++++++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index cc46bc2d634b..c025e8b51464 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -1207,6 +1207,11 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
>  #define gic_smp_init()		do { } while(0)
>  #endif
>  
> +static int gic_retrigger(struct irq_data *data)
> +{
> +	return gic_irq_set_irqchip_state(data, IRQCHIP_STATE_PENDING, true);

If I'm not mistaken, check_irq_resend() requires a non-zero return
value if the retrigger has succeeded. So something like

      return !gic_irq_set_irqchip_state(data, IRQCHIP_STATE_PENDING, true);

would be more appropriate.

Otherwise, looks good.

Thanks,

	M.
Valentin Schneider July 10, 2020, 5:08 p.m. UTC | #2
On 10/07/20 17:29, Marc Zyngier wrote:
> On Fri, 10 Jul 2020 15:56:42 +0100,
> Valentin Schneider <valentin.schneider@arm.com> wrote:

[...]

>> Implement irq_chip.irq_retrigger() for both GICs.
>
> Although I am very grateful for the whole documentation, I'd rather
> have a slightly more condensed changelog that documents the
> implementation of the retrigger callback! ;-)
>

Hah, indeed! I was relatively unsure about that whole thing, hence why I
sent it as RFC with a wall of text attached. I'll probably strip out the
GIC doc snippets for the "actual" changelog, and talk about the *contents*
of the patch some more.

>>
>> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
>> ---
>>  drivers/irqchip/irq-gic-v3.c | 7 +++++++
>>  drivers/irqchip/irq-gic.c    | 6 ++++++
>>  2 files changed, 13 insertions(+)
>>
>> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
>> index cc46bc2d634b..c025e8b51464 100644
>> --- a/drivers/irqchip/irq-gic-v3.c
>> +++ b/drivers/irqchip/irq-gic-v3.c
>> @@ -1207,6 +1207,11 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
>>  #define gic_smp_init()		do { } while(0)
>>  #endif
>>
>> +static int gic_retrigger(struct irq_data *data)
>> +{
>> +	return gic_irq_set_irqchip_state(data, IRQCHIP_STATE_PENDING, true);
>
> If I'm not mistaken, check_irq_resend() requires a non-zero return
> value if the retrigger has succeeded. So something like
>
>       return !gic_irq_set_irqchip_state(data, IRQCHIP_STATE_PENDING, true);
>
> would be more appropriate.
>

Aye, you're right. And while we're at it, we probably still don't want to
fallback to irq_sw_resend() if the retrigger fails, so we should add some
irqd_set_handle_enforce_irqctx() somewhere in the GICs or plainly deselect
CONFIG_HARDIRQS_SW_RESEND.

I'm not very familiar with LPIs just yet, but seeing as they too use
handle_fasteoi_irq() and can't get retriggered, I'd rather play it safe.


This brings me to another point: while this boots just
fine, I didn't get to test out IRQs marked with IRQS_PENDING. IIUC
enable_irq_wake() should give me a decent trail - I see serial_core making
use of it. I'll go give suspend a try.

> Otherwise, looks good.
>

Thanks for having a look!

> Thanks,
>
>       M.
diff mbox series

Patch

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index cc46bc2d634b..c025e8b51464 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1207,6 +1207,11 @@  static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
 #define gic_smp_init()		do { } while(0)
 #endif
 
+static int gic_retrigger(struct irq_data *data)
+{
+	return gic_irq_set_irqchip_state(data, IRQCHIP_STATE_PENDING, true);
+}
+
 #ifdef CONFIG_CPU_PM
 static int gic_cpu_pm_notifier(struct notifier_block *self,
 			       unsigned long cmd, void *v)
@@ -1242,6 +1247,7 @@  static struct irq_chip gic_chip = {
 	.irq_eoi		= gic_eoi_irq,
 	.irq_set_type		= gic_set_type,
 	.irq_set_affinity	= gic_set_affinity,
+	.irq_retrigger          = gic_retrigger,
 	.irq_get_irqchip_state	= gic_irq_get_irqchip_state,
 	.irq_set_irqchip_state	= gic_irq_set_irqchip_state,
 	.irq_nmi_setup		= gic_irq_nmi_setup,
@@ -1258,6 +1264,7 @@  static struct irq_chip gic_eoimode1_chip = {
 	.irq_eoi		= gic_eoimode1_eoi_irq,
 	.irq_set_type		= gic_set_type,
 	.irq_set_affinity	= gic_set_affinity,
+	.irq_retrigger          = gic_retrigger,
 	.irq_get_irqchip_state	= gic_irq_get_irqchip_state,
 	.irq_set_irqchip_state	= gic_irq_set_irqchip_state,
 	.irq_set_vcpu_affinity	= gic_irq_set_vcpu_affinity,
diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 00de05abd3c3..33ce025868d0 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -355,6 +355,11 @@  static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
 }
 #endif
 
+static int gic_retrigger(struct irq_data *data)
+{
+	return gic_irq_set_irqchip_state(data, IRQCHIP_STATE_PENDING, true);
+}
+
 static void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
 {
 	u32 irqstat, irqnr;
@@ -425,6 +430,7 @@  static const struct irq_chip gic_chip = {
 	.irq_unmask		= gic_unmask_irq,
 	.irq_eoi		= gic_eoi_irq,
 	.irq_set_type		= gic_set_type,
+	.irq_retrigger          = gic_retrigger,
 	.irq_get_irqchip_state	= gic_irq_get_irqchip_state,
 	.irq_set_irqchip_state	= gic_irq_set_irqchip_state,
 	.flags			= IRQCHIP_SET_TYPE_MASKED |