diff mbox series

irqchip/gicv3-its: Workaround for GIC-700 erratum 2195890

Message ID 20240624165541.1286227-1-rkagan@amazon.de (mailing list archive)
State New, archived
Headers show
Series irqchip/gicv3-its: Workaround for GIC-700 erratum 2195890 | expand

Commit Message

Kagan, Roman June 24, 2024, 4:55 p.m. UTC
According to Arm CoreLink GIC-700 erratum 2195890, on GIC revisions
r0p0, r0p1, r1p0 under certain conditions LPIs may remain in the Pending
Table until one of a number of external events occurs.

No LPIs are lost but they may not be delivered in a finite time.

The workaround is to issue an INV using GICR_INVLPIR to an unused, in
range LPI ID to retrigger the search.

Add this workaround to the quirk table.  When the quirk is applicable,
carve out one LPI ID from the available range and run periodic work to
do INV to it, in order to prevent GIC from stalling.

TT: https://t.corp.amazon.com/D82032616
Signed-off-by: Elad Rosner <eladros@amazon.com>
Signed-off-by: Mohamed Mediouni <mediou@amazon.com>
Signed-off-by: Roman Kagan <rkagan@amazon.de>
---
 drivers/irqchip/irq-gic-v3-its.c            | 70 ++++++++++++++++++++-
 Documentation/arch/arm64/silicon-errata.rst |  2 +
 arch/arm64/Kconfig                          | 18 ++++++
 3 files changed, 89 insertions(+), 1 deletion(-)

Comments

Thomas Gleixner June 24, 2024, 9:44 p.m. UTC | #1
On Mon, Jun 24 2024 at 18:55, Roman Kagan wrote:
> According to Arm CoreLink GIC-700 erratum 2195890, on GIC revisions
> r0p0, r0p1, r1p0 under certain conditions LPIs may remain in the Pending
> Table until one of a number of external events occurs.
>
> No LPIs are lost but they may not be delivered in a finite time.
>
> The workaround is to issue an INV using GICR_INVLPIR to an unused, in
> range LPI ID to retrigger the search.
>
> Add this workaround to the quirk table.  When the quirk is applicable,
> carve out one LPI ID from the available range and run periodic work to
> do INV to it, in order to prevent GIC from stalling.
>
> TT: https://t.corp.amazon.com/D82032616

Can you please refrain from providing internal links? 

> Signed-off-by: Elad Rosner <eladros@amazon.com>
> Signed-off-by: Mohamed Mediouni <mediou@amazon.com>
> Signed-off-by: Roman Kagan <rkagan@amazon.de>

That Signed-off-by chain is invalid. See Documentation/process/

Thanks,

        tglx
Marc Zyngier June 25, 2024, 8:45 a.m. UTC | #2
On Mon, 24 Jun 2024 17:55:41 +0100,
Roman Kagan <rkagan@amazon.de> wrote:
> 
> According to Arm CoreLink GIC-700 erratum 2195890, on GIC revisions
> r0p0, r0p1, r1p0 under certain conditions LPIs may remain in the Pending
> Table until one of a number of external events occurs.

Please add a link to the errata document.

> 
> No LPIs are lost but they may not be delivered in a finite time.
> 
> The workaround is to issue an INV using GICR_INVLPIR to an unused, in
> range LPI ID to retrigger the search.
> 
> Add this workaround to the quirk table.  When the quirk is applicable,
> carve out one LPI ID from the available range and run periodic work to
> do INV to it, in order to prevent GIC from stalling.

The errata document says a lot more:

<quote>
For physical LPIs the workaround is to issue an INV using GICR_INVLPIR
to an unused, in range LPI ID to retrigger the search. This could be
done periodically, for example, in line with a residency change, or as
part of servicing LPIs.  If using LPIs as the event, then the
GICR_INVLPIR write could be issued after servicing every LPI.

However, it only needs to be issued if:

* At least 4 interrupts in the block of 32 are enabled and mapped to
  the current PE or, if easier,

* At least 4 interrupts in the block of 32 are enabled and mapped to
  any PE
</quote>

> 
> TT: https://t.corp.amazon.com/D82032616

Gniii????

> Signed-off-by: Elad Rosner <eladros@amazon.com>
> Signed-off-by: Mohamed Mediouni <mediou@amazon.com>
> Signed-off-by: Roman Kagan <rkagan@amazon.de>

Who is the author?

> ---
>  drivers/irqchip/irq-gic-v3-its.c            | 70 ++++++++++++++++++++-
>  Documentation/arch/arm64/silicon-errata.rst |  2 +
>  arch/arm64/Kconfig                          | 18 ++++++
>  3 files changed, 89 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 3c755d5dad6e..53cf50dd8e13 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -29,6 +29,7 @@
>  #include <linux/percpu.h>
>  #include <linux/slab.h>
>  #include <linux/syscore_ops.h>
> +#include <linux/workqueue.h>
>  
>  #include <linux/irqchip.h>
>  #include <linux/irqchip/arm-gic-v3.h>
> @@ -49,6 +50,7 @@
>  #define RD_LOCAL_MEMRESERVE_DONE                BIT(2)
>  
>  static u32 lpi_id_bits;
> +static u32 lpi_id_base __initdata = 8192;
>  
>  /*
>   * We allocate memory for PROPBASE to cover 2 ^ lpi_id_bits LPIs to
> @@ -2136,7 +2138,7 @@ static int __init its_lpi_init(u32 id_bits)
>  	 * Initializing the allocator is just the same as freeing the
>  	 * full range of LPIs.
>  	 */
> -	err = free_lpi_range(8192, lpis);
> +	err = free_lpi_range(lpi_id_base, lpis - lpi_id_base + 8192);
>  	pr_debug("ITS: Allocator initialized for %u LPIs\n", lpis);
>  	return err;
>  }
> @@ -4763,6 +4765,61 @@ static bool its_set_non_coherent(void *data)
>  	return true;
>  }
>  
> +#define ITS_QUIRK_GIC700_2195890_PERIOD_MSEC 1000

Use MSEC_PER_SEC.

> +static struct {
> +	u32 lpi;
> +	struct delayed_work work;
> +} its_quirk_gic700_2195890_data __maybe_unused;
> +
> +static void __maybe_unused its_quirk_gic700_2195890_work_handler(struct work_struct *work)
> +{
> +	int cpu;
> +	void __iomem *rdbase;
> +	u64 gicr_invlpir_val;
> +
> +	for_each_online_cpu(cpu) {

The errata document doesn't say that this need to happen for *every*
RD. Can you please clarify this?

> +		rdbase = gic_data_rdist_cpu(cpu)->rd_base;
> +		if (!rdbase) {
> +			continue;
> +		}
> +
> +		/*
> +		 * Prod the respective GIC with an INV for an otherwise unused
> +		 * LPI.  This is only to resume the stalled processing, so
> +		 * there's no need to wait for invalidation to complete.
> +		 */
> +		gicr_invlpir_val =
> +			FIELD_PREP(GICR_INVLPIR_INTID,
> +				   its_quirk_gic700_2195890_data.lpi);

Don't split assignments.

> +		raw_spin_lock(&gic_data_rdist_cpu(cpu)->rd_lock);
> +		gic_write_lpir(gicr_invlpir_val, rdbase + GICR_INVLPIR);
> +		raw_spin_unlock(&gic_data_rdist_cpu(cpu)->rd_lock);

No synchronisation? How is that supposed to work?

Also, if you need to dig into the internals of the driver, extract a
helper from __direct_lpi_inv().


> +	}
> +
> +	schedule_delayed_work(&its_quirk_gic700_2195890_data.work,
> +		msecs_to_jiffies(ITS_QUIRK_GIC700_2195890_PERIOD_MSEC));

It would be pretty easy to detect whether an LPI was ack'ed since the
last pass, and not issue the invalidate.

> +}
> +
> +static bool __maybe_unused its_enable_quirk_gic700_2195890(void *data)
> +{
> +	struct its_node *its = data;
> +
> +	if (its_quirk_gic700_2195890_data.lpi)
> +		return true;
> +
> +	/*
> +	 * Use one LPI INTID from the start of the LPI range for GIC prodding,
> +	 * and make it unavailable for regular LPI use later.
> +	 */
> +	its_quirk_gic700_2195890_data.lpi = lpi_id_base++;
> +
> +	INIT_DELAYED_WORK(&its_quirk_gic700_2195890_data.work,
> +			  its_quirk_gic700_2195890_work_handler);
> +	schedule_delayed_work(&its_quirk_gic700_2195890_data.work, 0);
> +
> +	return true;
> +}

It is a bit odd to hook this on an ITS being probed when the ITS isn't
really involved. Not a big deal, but a bit clumsy.

> +
>  static const struct gic_quirk its_quirks[] = {
>  #ifdef CONFIG_CAVIUM_ERRATUM_22375
>  	{
> @@ -4822,6 +4879,17 @@ static const struct gic_quirk its_quirks[] = {
>  		.property = "dma-noncoherent",
>  		.init   = its_set_non_coherent,
>  	},
> +#ifdef CONFIG_ARM64_ERRATUM_2195890
> +	{
> +		.desc	= "ITS: GIC-700 erratum 2195890",
> +		/*
> +		 * Applies to r0p0, r0p1, r1p0: iidr_var(bits 16..19) == 0 or 1
> +		 */
> +		.iidr	= 0x0400043b,
> +		.mask	= 0xfffeffff,
> +		.init	= its_enable_quirk_gic700_2195890,

This catches r0p0 and r1p0, but not r0p1 (you require that bits 15:12
are 0).

Overall, this requires a bit of rework. Notably, this could be
significantly relaxed to match the requirements of the published
workaround.

Thanks,

	M.
Kagan, Roman June 25, 2024, 11:59 a.m. UTC | #3
On Mon, Jun 24, 2024 at 11:44:11PM +0200, Thomas Gleixner wrote:
> On Mon, Jun 24 2024 at 18:55, Roman Kagan wrote:
> > According to Arm CoreLink GIC-700 erratum 2195890, on GIC revisions
> > r0p0, r0p1, r1p0 under certain conditions LPIs may remain in the Pending
> > Table until one of a number of external events occurs.
> >
> > No LPIs are lost but they may not be delivered in a finite time.
> >
> > The workaround is to issue an INV using GICR_INVLPIR to an unused, in
> > range LPI ID to retrigger the search.
> >
> > Add this workaround to the quirk table.  When the quirk is applicable,
> > carve out one LPI ID from the available range and run periodic work to
> > do INV to it, in order to prevent GIC from stalling.
> >
> > TT: https://t.corp.amazon.com/D82032616
> 
> Can you please refrain from providing internal links?
> 
> > Signed-off-by: Elad Rosner <eladros@amazon.com>
> > Signed-off-by: Mohamed Mediouni <mediou@amazon.com>
> > Signed-off-by: Roman Kagan <rkagan@amazon.de>
> 
> That Signed-off-by chain is invalid. See Documentation/process/

ACK to all, will fix (sorry haven't posted patches for a while, muscle
memory got rusty).

Thanks,
Roman.



Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
Kagan, Roman June 25, 2024, 1:54 p.m. UTC | #4
On Tue, Jun 25, 2024 at 09:45:22AM +0100, Marc Zyngier wrote:
> On Mon, 24 Jun 2024 17:55:41 +0100,
> Roman Kagan <rkagan@amazon.de> wrote:
> >
> > According to Arm CoreLink GIC-700 erratum 2195890, on GIC revisions
> > r0p0, r0p1, r1p0 under certain conditions LPIs may remain in the Pending
> > Table until one of a number of external events occurs.
> 
> Please add a link to the errata document.

https://developer.arm.com/documentation/SDEN-1769194/
Will include when respinning.

> >
> > No LPIs are lost but they may not be delivered in a finite time.
> >
> > The workaround is to issue an INV using GICR_INVLPIR to an unused, in
> > range LPI ID to retrigger the search.
> >
> > Add this workaround to the quirk table.  When the quirk is applicable,
> > carve out one LPI ID from the available range and run periodic work to
> > do INV to it, in order to prevent GIC from stalling.
> 
> The errata document says a lot more:
> 
> <quote>
> For physical LPIs the workaround is to issue an INV using GICR_INVLPIR
> to an unused, in range LPI ID to retrigger the search. This could be
> done periodically, for example, in line with a residency change, or as
> part of servicing LPIs.  If using LPIs as the event, then the
> GICR_INVLPIR write could be issued after servicing every LPI.
> 
> However, it only needs to be issued if:
> 
> * At least 4 interrupts in the block of 32 are enabled and mapped to
>   the current PE or, if easier,
> 
> * At least 4 interrupts in the block of 32 are enabled and mapped to
>   any PE
> </quote>

It didn't feel like worth optimizing for.  I'll reconsider.

> > TT: https://t.corp.amazon.com/D82032616
> 
> Gniii????

Indeed Q-/

> > Signed-off-by: Elad Rosner <eladros@amazon.com>
> > Signed-off-by: Mohamed Mediouni <mediou@amazon.com>
> > Signed-off-by: Roman Kagan <rkagan@amazon.de>
> 
> Who is the author?

Joint effort aka inherited ownership.  Will fix according to the
process doc.

> > +static void __maybe_unused its_quirk_gic700_2195890_work_handler(struct work_struct *work)
> > +{
> > +     int cpu;
> > +     void __iomem *rdbase;
> > +     u64 gicr_invlpir_val;
> > +
> > +     for_each_online_cpu(cpu) {
> 
> The errata document doesn't say that this need to happen for *every*
> RD. Can you please clarify this?

(Digging out a year-old comms with ARM)
> > In multi-chip GIC system, does this write have to happen in each
> > chip or would a write to a single GICR trigger the search in all
> > GICDs?
> The write needs to occur for each physical PE - in other words, to
> each individual GICR that the search needs to be re-triggered for.

> > +             raw_spin_lock(&gic_data_rdist_cpu(cpu)->rd_lock);
> > +             gic_write_lpir(gicr_invlpir_val, rdbase + GICR_INVLPIR);
> > +             raw_spin_unlock(&gic_data_rdist_cpu(cpu)->rd_lock);
> 
> No synchronisation? How is that supposed to work?
> 
> Also, if you need to dig into the internals of the driver, extract a
> helper from __direct_lpi_inv().

ACK

> > +     }
> > +
> > +     schedule_delayed_work(&its_quirk_gic700_2195890_data.work,
> > +             msecs_to_jiffies(ITS_QUIRK_GIC700_2195890_PERIOD_MSEC));
> 
> It would be pretty easy to detect whether an LPI was ack'ed since the
> last pass, and not issue the invalidate.

Makes sense, will look into it.

Overall, do you think this approach with a global work looping over cpus
is the right one, or we should better try and implement something
per-cpu?

> > +}
> > +
> > +static bool __maybe_unused its_enable_quirk_gic700_2195890(void *data)
> > +{
> > +     struct its_node *its = data;
> > +
> > +     if (its_quirk_gic700_2195890_data.lpi)
> > +             return true;
> > +
> > +     /*
> > +      * Use one LPI INTID from the start of the LPI range for GIC prodding,
> > +      * and make it unavailable for regular LPI use later.
> > +      */
> > +     its_quirk_gic700_2195890_data.lpi = lpi_id_base++;
> > +
> > +     INIT_DELAYED_WORK(&its_quirk_gic700_2195890_data.work,
> > +                       its_quirk_gic700_2195890_work_handler);
> > +     schedule_delayed_work(&its_quirk_gic700_2195890_data.work, 0);
> > +
> > +     return true;
> > +}
> 
> It is a bit odd to hook this on an ITS being probed when the ITS isn't
> really involved. Not a big deal, but a bit clumsy.

True, but the LPI allocation lives in this file so it looked easier to
wire it all up here.  Where do you think it's more appropriate?

> >  static const struct gic_quirk its_quirks[] = {
> >  #ifdef CONFIG_CAVIUM_ERRATUM_22375
> >       {
> > @@ -4822,6 +4879,17 @@ static const struct gic_quirk its_quirks[] = {
> >               .property = "dma-noncoherent",
> >               .init   = its_set_non_coherent,
> >       },
> > +#ifdef CONFIG_ARM64_ERRATUM_2195890
> > +     {
> > +             .desc   = "ITS: GIC-700 erratum 2195890",
> > +             /*
> > +              * Applies to r0p0, r0p1, r1p0: iidr_var(bits 16..19) == 0 or 1
> > +              */
> > +             .iidr   = 0x0400043b,
> > +             .mask   = 0xfffeffff,
> > +             .init   = its_enable_quirk_gic700_2195890,
> 
> This catches r0p0 and r1p0, but not r0p1 (you require that bits 15:12
> are 0).

Ouch, right.  Given the erratum exact wording

> Fault Status: Present in: r0p0, r0p1, r1p0 Fixed in: r2p0

I guess I should match everything below r2p0 and allow arbitrary bits
15:12 (i.e. set the third nibble in the mask to 0).

> Overall, this requires a bit of rework. Notably, this could be
> significantly relaxed to match the requirements of the published
> workaround.

Thanks for the propmpt review!  Will rework and respin.
Roman.



Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
Marc Zyngier June 25, 2024, 2:30 p.m. UTC | #5
On Tue, 25 Jun 2024 14:54:28 +0100,
Roman Kagan <rkagan@amazon.de> wrote:
> 
> On Tue, Jun 25, 2024 at 09:45:22AM +0100, Marc Zyngier wrote:
> > On Mon, 24 Jun 2024 17:55:41 +0100,
> > Roman Kagan <rkagan@amazon.de> wrote:
> > >
> > > According to Arm CoreLink GIC-700 erratum 2195890, on GIC revisions
> > > r0p0, r0p1, r1p0 under certain conditions LPIs may remain in the Pending
> > > Table until one of a number of external events occurs.
> > 
> > Please add a link to the errata document.
> 
> https://developer.arm.com/documentation/SDEN-1769194/
> Will include when respinning.
> 
> > >
> > > No LPIs are lost but they may not be delivered in a finite time.
> > >
> > > The workaround is to issue an INV using GICR_INVLPIR to an unused, in
> > > range LPI ID to retrigger the search.
> > >
> > > Add this workaround to the quirk table.  When the quirk is applicable,
> > > carve out one LPI ID from the available range and run periodic work to
> > > do INV to it, in order to prevent GIC from stalling.
> > 
> > The errata document says a lot more:
> > 
> > <quote>
> > For physical LPIs the workaround is to issue an INV using GICR_INVLPIR
> > to an unused, in range LPI ID to retrigger the search. This could be
> > done periodically, for example, in line with a residency change, or as
> > part of servicing LPIs.  If using LPIs as the event, then the
> > GICR_INVLPIR write could be issued after servicing every LPI.
> > 
> > However, it only needs to be issued if:
> > 
> > * At least 4 interrupts in the block of 32 are enabled and mapped to
> >   the current PE or, if easier,
> > 
> > * At least 4 interrupts in the block of 32 are enabled and mapped to
> >   any PE
> > </quote>
> 
> It didn't feel like worth optimizing for.  I'll reconsider.

I'm not sure we want to optimise for it, but I'd certainly want to
hear the *rationale* behind not considering the optimisation.

> 
> > > TT: https://t.corp.amazon.com/D82032616
> > 
> > Gniii????
> 
> Indeed Q-/
> 
> > > Signed-off-by: Elad Rosner <eladros@amazon.com>
> > > Signed-off-by: Mohamed Mediouni <mediou@amazon.com>
> > > Signed-off-by: Roman Kagan <rkagan@amazon.de>
> > 
> > Who is the author?
> 
> Joint effort aka inherited ownership.  Will fix according to the
> process doc.
> 
> > > +static void __maybe_unused its_quirk_gic700_2195890_work_handler(struct work_struct *work)
> > > +{
> > > +     int cpu;
> > > +     void __iomem *rdbase;
> > > +     u64 gicr_invlpir_val;
> > > +
> > > +     for_each_online_cpu(cpu) {
> > 
> > The errata document doesn't say that this need to happen for *every*
> > RD. Can you please clarify this?
> 
> (Digging out a year-old comms with ARM)
> > > In multi-chip GIC system, does this write have to happen in each
> > > chip or would a write to a single GICR trigger the search in all
> > > GICDs?
> > The write needs to occur for each physical PE - in other words, to
> > each individual GICR that the search needs to be re-triggered for.

OK, that pretty much rules out doing anything clever (note to self,
check the GIC revision before buying the HW...).

> 
> > > +             raw_spin_lock(&gic_data_rdist_cpu(cpu)->rd_lock);
> > > +             gic_write_lpir(gicr_invlpir_val, rdbase + GICR_INVLPIR);
> > > +             raw_spin_unlock(&gic_data_rdist_cpu(cpu)->rd_lock);
> > 
> > No synchronisation? How is that supposed to work?
> > 
> > Also, if you need to dig into the internals of the driver, extract a
> > helper from __direct_lpi_inv().
> 
> ACK
> 
> > > +     }
> > > +
> > > +     schedule_delayed_work(&its_quirk_gic700_2195890_data.work,
> > > +             msecs_to_jiffies(ITS_QUIRK_GIC700_2195890_PERIOD_MSEC));
> > 
> > It would be pretty easy to detect whether an LPI was ack'ed since the
> > last pass, and not issue the invalidate.
> 
> Makes sense, will look into it.
> 
> Overall, do you think this approach with a global work looping over cpus
> is the right one, or we should better try and implement something
> per-cpu?

One of my worries is that you're crossing all node boundaries by doing
this, which is going to suck on really large systems. If anything,
you'd be better off with a per-node worker.

It doesn't need to be implemented right now, but I have the feeling
that someone is going to ask.

> 
> > > +}
> > > +
> > > +static bool __maybe_unused its_enable_quirk_gic700_2195890(void *data)
> > > +{
> > > +     struct its_node *its = data;
> > > +
> > > +     if (its_quirk_gic700_2195890_data.lpi)
> > > +             return true;
> > > +
> > > +     /*
> > > +      * Use one LPI INTID from the start of the LPI range for GIC prodding,
> > > +      * and make it unavailable for regular LPI use later.
> > > +      */
> > > +     its_quirk_gic700_2195890_data.lpi = lpi_id_base++;
> > > +
> > > +     INIT_DELAYED_WORK(&its_quirk_gic700_2195890_data.work,
> > > +                       its_quirk_gic700_2195890_work_handler);
> > > +     schedule_delayed_work(&its_quirk_gic700_2195890_data.work, 0);
> > > +
> > > +     return true;
> > > +}
> > 
> > It is a bit odd to hook this on an ITS being probed when the ITS isn't
> > really involved. Not a big deal, but a bit clumsy.
> 
> True, but the LPI allocation lives in this file so it looked easier to
> wire it all up here.  Where do you think it's more appropriate?

But the allocation doesn't really take place, does it? You just nick
one LPI. Which by the way I'd rather you pick the last one instead of
the first, as this messes with devices that require ye oldie MultiMSI
and its stupid alignment requirements.

Also, you have its_cpu_init() which takes care of RDs. You could
absolutely add the quirk to the main GIC driver (based on the
distributor IIDR), add a flag to rdists.flags, and let it roll.

Either way, I don't really care. Maybe keeping it centralised is good
enough.

> 
> > >  static const struct gic_quirk its_quirks[] = {
> > >  #ifdef CONFIG_CAVIUM_ERRATUM_22375
> > >       {
> > > @@ -4822,6 +4879,17 @@ static const struct gic_quirk its_quirks[] = {
> > >               .property = "dma-noncoherent",
> > >               .init   = its_set_non_coherent,
> > >       },
> > > +#ifdef CONFIG_ARM64_ERRATUM_2195890
> > > +     {
> > > +             .desc   = "ITS: GIC-700 erratum 2195890",
> > > +             /*
> > > +              * Applies to r0p0, r0p1, r1p0: iidr_var(bits 16..19) == 0 or 1
> > > +              */
> > > +             .iidr   = 0x0400043b,
> > > +             .mask   = 0xfffeffff,
> > > +             .init   = its_enable_quirk_gic700_2195890,
> > 
> > This catches r0p0 and r1p0, but not r0p1 (you require that bits 15:12
> > are 0).
> 
> Ouch, right.  Given the erratum exact wording
> 
> > Fault Status: Present in: r0p0, r0p1, r1p0 Fixed in: r2p0
> 
> I guess I should match everything below r2p0 and allow arbitrary bits
> 15:12 (i.e. set the third nibble in the mask to 0).

Either that or you could have two entries. Or use the fact that there
is no released r1p1 to your advantage...

Thanks,

	M.
kernel test robot June 28, 2024, 4:07 p.m. UTC | #6
Hi Roman,

kernel test robot noticed the following build warnings:

[auto build test WARNING on arm64/for-next/core]
[also build test WARNING on soc/for-next tip/irq/core linus/master v6.10-rc5 next-20240627]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Roman-Kagan/irqchip-gicv3-its-Workaround-for-GIC-700-erratum-2195890/20240625-212945
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
patch link:    https://lore.kernel.org/r/20240624165541.1286227-1-rkagan%40amazon.de
patch subject: [PATCH] irqchip/gicv3-its: Workaround for GIC-700 erratum 2195890
config: arm64-randconfig-004-20240628 (https://download.01.org/0day-ci/archive/20240628/202406282359.98P8jJ4c-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240628/202406282359.98P8jJ4c-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202406282359.98P8jJ4c-lkp@intel.com/

All warnings (new ones prefixed by >>, old ones prefixed by <<):

>> WARNING: modpost: vmlinux: section mismatch in reference: its_enable_quirk_gic700_2195890+0x24 (section: .text) -> lpi_id_base (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: its_enable_quirk_gic700_2195890+0x34 (section: .text) -> lpi_id_base (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: its_enable_quirk_gic700_2195890+0x44 (section: .text) -> lpi_id_base (section: .init.data)
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/arm64/crypto/crct10dif-ce.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/arm64/crypto/aes-neon-bs.o
WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/locking/test-ww_mutex.o
WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/time/test_udelay.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nls/nls_cp850.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nls/nls_cp860.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nls/nls_cp862.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nls/nls_cp863.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nls/nls_cp949.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nls/nls_cp950.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nls/nls_cp1251.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nls/nls_iso8859-7.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nls/mac-centeuro.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nls/mac-gaelic.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/binfmt_misc.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/cramfs/cramfs.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/fat/fat.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/ufs/ufs.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/efs/efs.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/adfs/adfs.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/btrfs/btrfs.o
WARNING: modpost: missing MODULE_DESCRIPTION() in security/keys/trusted-keys/trusted.o
WARNING: modpost: missing MODULE_DESCRIPTION() in crypto/algif_hash.o
WARNING: modpost: missing MODULE_DESCRIPTION() in crypto/algif_skcipher.o
WARNING: modpost: missing MODULE_DESCRIPTION() in crypto/xor.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/crypto/libdes.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_hexdump.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_bpf.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_firmware.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test-kstrtox.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_module.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_user_copy.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_blackhole_dev.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/atomic64_test.o
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/asn1_encoder.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/pinctrl/pinctrl-mcp23s08_i2c.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/pinctrl/pinctrl-mcp23s08.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/video/backlight/platform_lcd.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/dma/qcom/hdma.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/tty/serial/8250/8250_base.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/tty/serial/serial_mctrl_gpio.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/tty/n_hdlc.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/tty/n_gsm.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/tty/ttynull.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/char/hw_random/omap-rng.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/char/hw_random/omap3-rom-rng.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/char/hw_random/arm_smccc_trng.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/char/ppdev.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/misc/fastrpc.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/auxdisplay/line-display.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/tuners/tda9887.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/dvb-frontends/au8522_decoder.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/media/dvb-frontends/mb86a16.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/mmc/core/pwrseq_emmc.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/mmc/core/sdio_uart.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/perf/arm-ccn.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/uio/uio.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/greybus/greybus.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/rpmsg/rpmsg_char.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/iio/buffer/kfifo_buf.o
diff mbox series

Patch

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 3c755d5dad6e..53cf50dd8e13 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -29,6 +29,7 @@ 
 #include <linux/percpu.h>
 #include <linux/slab.h>
 #include <linux/syscore_ops.h>
+#include <linux/workqueue.h>
 
 #include <linux/irqchip.h>
 #include <linux/irqchip/arm-gic-v3.h>
@@ -49,6 +50,7 @@ 
 #define RD_LOCAL_MEMRESERVE_DONE                BIT(2)
 
 static u32 lpi_id_bits;
+static u32 lpi_id_base __initdata = 8192;
 
 /*
  * We allocate memory for PROPBASE to cover 2 ^ lpi_id_bits LPIs to
@@ -2136,7 +2138,7 @@  static int __init its_lpi_init(u32 id_bits)
 	 * Initializing the allocator is just the same as freeing the
 	 * full range of LPIs.
 	 */
-	err = free_lpi_range(8192, lpis);
+	err = free_lpi_range(lpi_id_base, lpis - lpi_id_base + 8192);
 	pr_debug("ITS: Allocator initialized for %u LPIs\n", lpis);
 	return err;
 }
@@ -4763,6 +4765,61 @@  static bool its_set_non_coherent(void *data)
 	return true;
 }
 
+#define ITS_QUIRK_GIC700_2195890_PERIOD_MSEC 1000
+static struct {
+	u32 lpi;
+	struct delayed_work work;
+} its_quirk_gic700_2195890_data __maybe_unused;
+
+static void __maybe_unused its_quirk_gic700_2195890_work_handler(struct work_struct *work)
+{
+	int cpu;
+	void __iomem *rdbase;
+	u64 gicr_invlpir_val;
+
+	for_each_online_cpu(cpu) {
+		rdbase = gic_data_rdist_cpu(cpu)->rd_base;
+		if (!rdbase) {
+			continue;
+		}
+
+		/*
+		 * Prod the respective GIC with an INV for an otherwise unused
+		 * LPI.  This is only to resume the stalled processing, so
+		 * there's no need to wait for invalidation to complete.
+		 */
+		gicr_invlpir_val =
+			FIELD_PREP(GICR_INVLPIR_INTID,
+				   its_quirk_gic700_2195890_data.lpi);
+		raw_spin_lock(&gic_data_rdist_cpu(cpu)->rd_lock);
+		gic_write_lpir(gicr_invlpir_val, rdbase + GICR_INVLPIR);
+		raw_spin_unlock(&gic_data_rdist_cpu(cpu)->rd_lock);
+	}
+
+	schedule_delayed_work(&its_quirk_gic700_2195890_data.work,
+		msecs_to_jiffies(ITS_QUIRK_GIC700_2195890_PERIOD_MSEC));
+}
+
+static bool __maybe_unused its_enable_quirk_gic700_2195890(void *data)
+{
+	struct its_node *its = data;
+
+	if (its_quirk_gic700_2195890_data.lpi)
+		return true;
+
+	/*
+	 * Use one LPI INTID from the start of the LPI range for GIC prodding,
+	 * and make it unavailable for regular LPI use later.
+	 */
+	its_quirk_gic700_2195890_data.lpi = lpi_id_base++;
+
+	INIT_DELAYED_WORK(&its_quirk_gic700_2195890_data.work,
+			  its_quirk_gic700_2195890_work_handler);
+	schedule_delayed_work(&its_quirk_gic700_2195890_data.work, 0);
+
+	return true;
+}
+
 static const struct gic_quirk its_quirks[] = {
 #ifdef CONFIG_CAVIUM_ERRATUM_22375
 	{
@@ -4822,6 +4879,17 @@  static const struct gic_quirk its_quirks[] = {
 		.property = "dma-noncoherent",
 		.init   = its_set_non_coherent,
 	},
+#ifdef CONFIG_ARM64_ERRATUM_2195890
+	{
+		.desc	= "ITS: GIC-700 erratum 2195890",
+		/*
+		 * Applies to r0p0, r0p1, r1p0: iidr_var(bits 16..19) == 0 or 1
+		 */
+		.iidr	= 0x0400043b,
+		.mask	= 0xfffeffff,
+		.init	= its_enable_quirk_gic700_2195890,
+	},
+#endif
 	{
 	}
 };
diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index eb8af8032c31..67445075ae88 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -169,6 +169,8 @@  stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | GIC-700         | #2941627        | ARM64_ERRATUM_2941627       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | GIC-700         | #2195890        | ARM64_ERRATUM_2195890       |
++----------------+-----------------+-----------------+-----------------------------+
 +----------------+-----------------+-----------------+-----------------------------+
 | Broadcom       | Brahma-B53      | N/A             | ARM64_ERRATUM_845719        |
 +----------------+-----------------+-----------------+-----------------------------+
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 5d91259ee7b5..9c330029131c 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1279,6 +1279,24 @@  config ROCKCHIP_ERRATUM_3588001
 
 	  If unsure, say Y.
 
+config ARM64_ERRATUM_2195890
+	bool "GIC-700: 2195890: LPIs may be held in Pending Table until specific external events happen"
+	default y
+	help
+	  This option adds a workaround for Arm GIC-700 erratum 2195890.
+
+	  In affected GIC-700 versions (r0p0, r0p1, r1p0) under certain
+	  conditions LPIs may remain in the Pending Table until one of a number
+	  of external events occurs.
+
+	  No LPIs are lost and this can happen on physical or virtual PEs but
+	  this erratum means they may not be delivered in a finite time.
+
+	  Work around the issue by inserting an INV command for an unused but
+	  valid LPI INTID every so often to retrigger the search.
+
+	  If unsure, say Y.
+
 config SOCIONEXT_SYNQUACER_PREITS
 	bool "Socionext Synquacer: Workaround for GICv3 pre-ITS"
 	default y