irqchip: omap-intc: fix spurious irq handling

Message ID 3d433cfeeb93366cadbb1668ebeac2e8006b0fd5.1445247844.git.nsekhar@ti.com (mailing list archive)
State New, archived

Commit Message

Sekhar Nori Oct. 19, 2015, 9:46 a.m. UTC
Under some conditions, irq sorting procedure used by INTC can go wrong
resulting in a spurious irq getting reported.

This condition is flagged by INTC by setting "Spurious IRQ Flag" in SIR
register to 0x1ffffff. Section 6.2.5 of AM335x TRM revised Jun 2014
describes this.

Using IRQ number 0 for checking this condition is wrong. 0 is a valid
INTC IRQ. For example, on AM335x, it is the emulation interrupt.

Fix handling of the spurious interrupt condition in the omap-intc driver
by detecting it correctly via the Spurious IRQ Flag in the SIR register.

Since spurious IRQ condition can happen under genuine conditions (see
the section of AM335x TRM for details) and is recoverable, we do not
need a warning splat for users to report. It can however result in
reduced performance so we add a ratelimited debug print to aid
developers.

Signed-off-by: Sekhar Nori <nsekhar@ti.com>
---
 drivers/irqchip/irq-omap-intc.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)
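
The detection described in the commit message can be sketched in plain C
outside the kernel. This is only an illustration under stated assumptions:
the macro names mirror the patch, `struct sir_decode` and `decode_sir()` are
hypothetical helpers that do not exist in the driver, and the bit layout
(bits 6:0 for the active IRQ, bits 31:7 for the spurious flag) is read off
the two masks in the patch. The `u` suffixes are an addition here so that
the shift is well-defined in standard C.

```c
#include <stdbool.h>
#include <stdint.h>

#define ACTIVEIRQ_MASK   0x7fu              /* bits 6:0: active IRQ number */
#define SPURIOUSIRQ_MASK (0x1ffffffu << 7)  /* bits 31:7: spurious IRQ flag */

/* Hypothetical helper type, not part of the omap-intc driver. */
struct sir_decode {
	bool spurious;     /* true when every spurious flag bit is set */
	unsigned int irq;  /* active IRQ number, meaningful if !spurious */
};

/* Split a raw SIR register value into the spurious flag and IRQ number. */
static struct sir_decode decode_sir(uint32_t sir)
{
	struct sir_decode d;

	d.spurious = (sir & SPURIOUSIRQ_MASK) == SPURIOUSIRQ_MASK;
	d.irq = sir & ACTIVEIRQ_MASK;
	return d;
}
```

With this split, a raw value of 0 decodes as a valid IRQ 0 rather than as a
spurious interrupt, which is the point the commit message makes about the
old `!irqnr` check.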

Comments

Thomas Gleixner Oct. 19, 2015, 10:13 a.m. UTC | #1
On Mon, 19 Oct 2015, Sekhar Nori wrote:
> +	/*
> +	 * A spurious IRQ can result if interrupt that triggered the
> +	 * sorting is no longer active during the sorting (10 INTC
> +	 * functional clock cycles after interrupt assertion). Or a
> +	 * change in interrupt mask affected the result during sorting
> +	 * time. There is no special handling required except ignoring
> +	 * the SIR register value just read and retrying.
> +	 * See section 6.2.5 of AM335x TRM Literature Number: SPRUH73K
> +	 */
> +	if ((irqnr & SPURIOUSIRQ_MASK) == SPURIOUSIRQ_MASK) {
> +		pr_debug_ratelimited("%s: spurious irq!\n", __func__);

I'd prefer that this is a pr_once() and the spurious interrupt counter
is incremented. That's far more useful as it gives you real
information about the frequency of the issue.

Thanks,

	tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
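
The "print once, count always" idea from the review comment above can be
sketched as a userspace model. Assumptions: "pr_once()" is read here as
shorthand for the kernel's once-only print helpers such as pr_warn_once(),
the thread does not say which kernel counter is meant, so `spurious_count`
is a plain stand-in variable, and `account_spurious()` is a hypothetical
name.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SPURIOUSIRQ_MASK (0x1ffffffu << 7)  /* bits 31:7: spurious IRQ flag */

/* Stand-in for a kernel-side spurious interrupt counter (assumption). */
static unsigned long spurious_count;

/*
 * Returns true if the SIR value was spurious. The message is printed only
 * the first time, but the counter is incremented on every occurrence, so
 * the frequency of the issue stays visible without flooding the log.
 */
static bool account_spurious(uint32_t sir)
{
	static bool warned;

	if ((sir & SPURIOUSIRQ_MASK) != SPURIOUSIRQ_MASK)
		return false;

	if (!warned) {
		warned = true;
		fprintf(stderr, "spurious irq!\n");
	}
	spurious_count++;
	return true;
}
```

The design choice is that the log carries one notice while the counter
carries the rate, which is the information the reviewer asks for.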
Tony Lindgren Oct. 19, 2015, 2:50 p.m. UTC | #2
Hi,

* Sekhar Nori <nsekhar@ti.com> [151019 02:51]:
> Under some conditions, irq sorting procedure used by INTC can go wrong
> resulting in a spurious irq getting reported.
> 
> This condition is flagged by INTC by setting "Spurious IRQ Flag" in SIR
> register to 0x1ffffff. Section 6.2.5 of AM335x TRM revised Jun 2014
> describes this.

OK so we have this finally documented, that's great. It's been bugging
me for years now :) What we used to have for omap3 was 6ccc4c0dedf8
("ARM: OMAP3: Warn about spurious interrupts"). I always thought it was
some undocumented omap3 weirdness but obviously not if you're seeing it
on am335x too.

> Using IRQ number 0 for checking this condition is wrong. 0 is a valid
> INTC IRQ. For example, on AM335x, it is the emulation interrupt.
> 
> Fix handing of spurious interrupt condition in omap-intc driver by
> correct detection of spurious interrupt condition.
> 
> Since spurious IRQ condition can happen under genuine conditions (see
> the section of AM335x TRM for details) and is recoverable, we do not
> need a warning splat for users to report. It can however result in
> reduced performance so we add a ratelimited debug print to aid
> developers.

Do you know what really is causing the spurious interrupts in your
case?

In all the cases I've seen, the spurious interrupts were caused by
a missing flush of the posted write acking the IRQ in the device driver
for the _previously triggered_ INTC interrupt.

If you have a reproducible case, I suggest you test that by printing
out the previous interrupt to check if that makes sense. And then see
if adding the missing read back to that interrupt handler fixes the
issue.

And if my assumption is correct, you can then update your patch and
actually warn about the real culprit irq number :)

Regards,

Tony
--
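
The posted-write theory in the message above can be illustrated with a toy
model. Everything here is hypothetical (`struct toy_dev` and the `toy_*`
functions are not kernel or driver APIs), and the model assumes that a read
from a device cannot complete before earlier posted writes to that same
target have landed, which is the premise behind the suggested read back.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a device behind an interconnect that posts writes. */
struct toy_dev {
	bool irq_asserted;   /* level of the device's interrupt line */
	bool ack_in_flight;  /* ack write posted but not yet landed */
};

/* Posted write: returns at once, the ack has not reached the device yet. */
static void toy_write_ack(struct toy_dev *dev)
{
	dev->ack_in_flight = true;
}

/* Read back: in this model it forces the pending ack write to land first. */
static uint32_t toy_read_revision(struct toy_dev *dev)
{
	if (dev->ack_in_flight) {
		dev->ack_in_flight = false;
		dev->irq_asserted = false;
	}
	return 0; /* the value is irrelevant, the read is only for ordering */
}

/* Handler without read back: returns the line state seen on exit. */
static bool toy_handler_no_flush(struct toy_dev *dev)
{
	toy_write_ack(dev);
	return dev->irq_asserted;
}

/* Handler with read back: the ack has landed before the handler exits. */
static bool toy_handler_with_flush(struct toy_dev *dev)
{
	toy_write_ack(dev);
	(void)toy_read_revision(dev);
	return dev->irq_asserted;
}
```

In the model, the handler without the read back returns while the line is
still asserted, so the interrupt controller could start sorting an interrupt
that is withdrawn moments later.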
Sekhar Nori Oct. 20, 2015, 6:22 a.m. UTC | #3
On Monday 19 October 2015 08:20 PM, Tony Lindgren wrote:
> Hi,
> 
> * Sekhar Nori <nsekhar@ti.com> [151019 02:51]:
>> Under some conditions, irq sorting procedure used by INTC can go wrong
>> resulting in a spurious irq getting reported.
>>
>> This condition is flagged by INTC by setting "Spurious IRQ Flag" in SIR
>> register to 0x1ffffff. Section 6.2.5 of AM335x TRM revised Jun 2014
>> describes this.
> 
> OK so we have this finally documented, that's great. It's been bugging
> me for years now :) What we used to have for omap3 was 6ccc4c0dedf8
> ("ARM: OMAP3: Warn about spurious interrupts"). I alsways thought it's
> some undocumented omap3 weirdness but obviously not if you're seeing it
> on am335x too.

BTW, I noticed the AM335x documentation itself is copied from OMAP35x
public TRM: http://www.ti.com/lit/ug/spruf98y/spruf98y.pdf. Surprising
that OMAP34x never had this documented though.

> 
>> Using IRQ number 0 for checking this condition is wrong. 0 is a valid
>> INTC IRQ. For example, on AM335x, it is the emulation interrupt.
>>
>> Fix handing of spurious interrupt condition in omap-intc driver by
>> correct detection of spurious interrupt condition.
>>
>> Since spurious IRQ condition can happen under genuine conditions (see
>> the section of AM335x TRM for details) and is recoverable, we do not
>> need a warning splat for users to report. It can however result in
>> reduced performance so we add a ratelimited debug print to aid
>> developers.
> 
> Do you know what really is causing the spurious interrupts in your
> case?

No, not yet.

> 
> In all the cases I've seen, the spurious interrupts were caused by
> a missing flush of posted write acking the IRQ at the device driver.
> for the _previously triggered_ INTC interrupt.
> 
> If you have a reproducable case, I suggest you test that by printing
> out the previous interrupt to check if that makes sense. And then see
> if adding the missing read back to that interrupt handler fixes the
> issue.

Okay, that's good to know. Thanks for the hints and history of your debug
on OMAP3. The issue is not easily reproducible in my case, but if I try
hard enough, I can hit it. So I can surely try your hints.

> 
> And if my assumption is correct, you can then update your patch and
> actually warn about the real culprit irq number :)

I am not sure about introducing the prediction of bad IRQ in the same
patch as this. While it's certainly useful to have hints about the
culprit, it's not guaranteed to be true all the time. And if we later
discover the prediction scheme is throwing people off course more often
than not, it will be easy to revert just that prediction part without
affecting basic detection of spurious IRQ itself as documented by TRM.

So I propose this patch goes in with Thomas's comments fixed and I work
on adding some prediction based on your work in 6ccc4c0dedf8 ("ARM:
OMAP3: Warn about spurious interrupts").

Thanks,
Sekhar
--
John Ogness Oct. 20, 2015, 7:32 a.m. UTC | #4
On 2015-10-20, Sekhar Nori <nsekhar@ti.com> wrote:
>> Do you know what really is causing the spurious interrupts in your
>> case?
>
> No, not yet.

According to the TRM this is normal behavior if conditions that might
affect priority are changed during priority sorting.

    6.2.5 ARM A8 INTC Spurious Interrupt Handling

    The spurious flag indicates whether the result of the sorting (a
    window of 10 INTC functional clock cycles after the interrupt
    assertion) is invalid. The sorting is invalid if:

    - The interrupt that triggered the sorting is no longer active
      during the sorting.

    - A change in the mask has affected the result during the sorting
      time.

>> In all the cases I've seen, the spurious interrupts were caused by a
>> missing flush of posted write acking the IRQ at the device driver.
>> for the _previously triggered_ INTC interrupt.
>> 
>> If you have a reproducable case, I suggest you test that by printing
>> out the previous interrupt to check if that makes sense. And then see
>> if adding the missing read back to that interrupt handler fixes the
>> issue.
>
> Okay, thats good to know. Thanks for the hints and history of your debug
> on OMAP3. The issue is not easily reproducible in my case. But if I try
> hard enough, I can get hit it though. So I can surely try your hints.

I can reproduce the situation very easily. After running a test for a
few minutes and printing out the previous interrupt, I have the
following list. These are the irq numbers seen by the handler before the
spurious interrupt triggered.

    INT12 - EDMACOMPINT - TPCC (EDMA)
    INT41 - 3PGSWRXINT0 - CPSW (Ethernet)
    INT42 - 3PGSWTXINT0 - CPSW (Ethernet)
    INT68 - TINT2       - DMTIMER2
    INT72 - UART0INT    - UART0

From this I do not think we can put the blame on any single driver. I
trigger this situation very easily by putting a load of 7,000+
interrupts per second on the system. This means we have 70,000 INTC
clock cycles per second where a change in the interrupt priority
conditions would cause the priority sorting to become invalid and thus
cause the spurious interrupt.

I'm not sure if we can/should do anything more than Sekhar's patch of
acknowledging the spurious interrupt so the priority sorting algorithm
can run again.

John Ogness
--
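
The arithmetic behind the 70,000 figure above is the interrupt rate
multiplied by the 10-cycle sorting window from the quoted TRM text. A
minimal sketch, where the function and macro names are hypothetical:

```c
#include <stdint.h>

/* Sorting window length in INTC functional clock cycles, per the TRM quote. */
#define INTC_SORT_WINDOW_CYCLES 10u

/*
 * Number of INTC clock cycles per second during which a change in the
 * interrupt conditions can invalidate the priority sorting.
 */
static uint64_t sorting_cycles_per_second(uint64_t irqs_per_second)
{
	return irqs_per_second * INTC_SORT_WINDOW_CYCLES;
}
```

For the reported load, sorting_cycles_per_second(7000) gives 70000, which
matches the number quoted in the message.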
Tony Lindgren Oct. 20, 2015, 2:52 p.m. UTC | #5
* John Ogness <john.ogness@linutronix.de> [151020 00:33]:
> On 2015-10-20, Sekhar Nori <nsekhar@ti.com> wrote:
> >> Do you know what really is causing the spurious interrupts in your
> >> case?
> >
> > No, not yet.
> 
> According to the TRM this is normal behavior if conditions that might
> affect priority are changed during priority sorting.
> 
>     6.2.5 ARM A8 INTC Spurious Interrupt Handling
> 
>     The spurious flag indicates whether the result of the sorting (a
>     window of 10 INTC functional clock cycles after the interrupt
>     assertion) is invalid. The sorting is invalid if:
> 
>     - The interrupt that triggered the sorting is no longer active
>       during the sorting.
> 
>     - A change in the mask has affected the result during the sorting
>       time.
> 
> >> In all the cases I've seen, the spurious interrupts were caused by a
> >> missing flush of posted write acking the IRQ at the device driver.
> >> for the _previously triggered_ INTC interrupt.
> >> 
> >> If you have a reproducable case, I suggest you test that by printing
> >> out the previous interrupt to check if that makes sense. And then see
> >> if adding the missing read back to that interrupt handler fixes the
> >> issue.
> >
> > Okay, thats good to know. Thanks for the hints and history of your debug
> > on OMAP3. The issue is not easily reproducible in my case. But if I try
> > hard enough, I can get hit it though. So I can surely try your hints.
> 
> I can reproduce the situation very easily. After running a test for a
> few minutes and printing out the previous interrupt, I have the
> following list. These are the irq numbers seen by the handler before the
> spurious interrupt triggered.
> 
>     INT12 - EDMACOMPINT - TPCC (EDMA)
>     INT41 - 3PGSWRXINT0 - CPSW (Ethernet)
>     INT42 - 3PGSWTXINT0 - CPSW (Ethernet)
>     INT68 - TINT2       - DMTIMER2
>     INT72 - UART0INT    - UART0
> 
> From this I do not think we can put the blame on any single driver. I
> trigger this situation very easily by putting a load of 7,000+
> interrupts per second on the system. This means we have 70,000 INTC
> clock cycles per second where a change in the interrupt priority
> conditions would cause the priority sorting to become invalid and thus
> cause the spurious interrupt.
> 
> I'm not sure if we can/should do anything more than Sekhar's patch of
> acknowledging the spurious interrupt so the priority sorting algorithm
> can run again.

OK thanks for testing. My guess from the above list would be EDMA
or CPSW missing a flush of posted write. Maybe try adding a readback
of the related device revision register after acking the interrupt into
TPCC interrupt handler and CPSW interrupt handler(s)?

The timer2 and uart0 seem to be false positives here naturally.

I would not yet rule out the "previous interrupt" theory until you have
tried that. We really want to know the root cause of the issue, just
printing out spurious interrupt does not fix the problem :)

Regards,

Tony
--
Sekhar Nori Dec. 3, 2015, 11:28 a.m. UTC | #6
Hi Tony,

On Tuesday 20 October 2015 08:22 PM, Tony Lindgren wrote:
> * John Ogness <john.ogness@linutronix.de> [151020 00:33]:
>> On 2015-10-20, Sekhar Nori <nsekhar@ti.com> wrote:
>>>> Do you know what really is causing the spurious interrupts in your
>>>> case?
>>>
>>> No, not yet.
>>
>> According to the TRM this is normal behavior if conditions that might
>> affect priority are changed during priority sorting.
>>
>>     6.2.5 ARM A8 INTC Spurious Interrupt Handling
>>
>>     The spurious flag indicates whether the result of the sorting (a
>>     window of 10 INTC functional clock cycles after the interrupt
>>     assertion) is invalid. The sorting is invalid if:
>>
>>     - The interrupt that triggered the sorting is no longer active
>>       during the sorting.
>>
>>     - A change in the mask has affected the result during the sorting
>>       time.
>>
>>>> In all the cases I've seen, the spurious interrupts were caused by a
>>>> missing flush of posted write acking the IRQ at the device driver.
>>>> for the _previously triggered_ INTC interrupt.
>>>>
>>>> If you have a reproducable case, I suggest you test that by printing
>>>> out the previous interrupt to check if that makes sense. And then see
>>>> if adding the missing read back to that interrupt handler fixes the
>>>> issue.
>>>
>>> Okay, thats good to know. Thanks for the hints and history of your debug
>>> on OMAP3. The issue is not easily reproducible in my case. But if I try
>>> hard enough, I can get hit it though. So I can surely try your hints.
>>
>> I can reproduce the situation very easily. After running a test for a
>> few minutes and printing out the previous interrupt, I have the
>> following list. These are the irq numbers seen by the handler before the
>> spurious interrupt triggered.
>>
>>     INT12 - EDMACOMPINT - TPCC (EDMA)
>>     INT41 - 3PGSWRXINT0 - CPSW (Ethernet)
>>     INT42 - 3PGSWTXINT0 - CPSW (Ethernet)
>>     INT68 - TINT2       - DMTIMER2
>>     INT72 - UART0INT    - UART0
>>
>> From this I do not think we can put the blame on any single driver. I
>> trigger this situation very easily by putting a load of 7,000+
>> interrupts per second on the system. This means we have 70,000 INTC
>> clock cycles per second where a change in the interrupt priority
>> conditions would cause the priority sorting to become invalid and thus
>> cause the spurious interrupt.
>>
>> I'm not sure if we can/should do anything more than Sekhar's patch of
>> acknowledging the spurious interrupt so the priority sorting algorithm
>> can run again.
> 
> OK thanks for testing. My guess from the above list would be EDMA
> or CPSW missing a flush of posted write. Maybe try adding a readback
> of the related device revision register after acking the interrupt into
> TPCC interrupt handler and CPSW interrupt handler(s)?

I could get back to debugging this only now. I have converted
__raw_writel to writel() and also added readback from the same register
in both EDMA and CPSW drivers. But I am still able to reproduce the
spurious irq reports.

> The timer2 and uart0 seem to be false positives here naturally.

I also added readback in 8250 driver. I haven't touched the timer
driver, but I guess if that driver had an issue, it should have come out
much earlier.

I also saw that sometimes previous irq was the TI LCDC interrupt. Added
readback there too. Did not help.

> I would not yet rule out the "previous interrupt" theory until you have
> tried that. We really want to know the root cause of the issue, just
> printing out spurious interrupt does not fix the problem :)

While we cannot rule out a software issue completely, the description in
TRM around spurious interrupts suggests it can happen even with no role
of software.

May I suggest we go ahead and add this patch to the kernel after
addressing Thomas's comment? At least it will prevent kernel from
locking up with flood of prints when a spurious irq happens and allows
easier debug by others too.

Thanks,
Sekhar

--
Tony Lindgren Dec. 3, 2015, 3:02 p.m. UTC | #7
* Sekhar Nori <nsekhar@ti.com> [151203 03:29]:
> On Tuesday 20 October 2015 08:22 PM, Tony Lindgren wrote:
> > 
> > OK thanks for testing. My guess from the above list would be EDMA
> > or CPSW missing a flush of posted write. Maybe try adding a readback
> > of the related device revision register after acking the interrupt into
> > TPCC interrupt handler and CPSW interrupt handler(s)?
> 
> I could get back to debugging this only now. I have converted
> __raw_writel to writel() and also added readback from the same register
> in both EDMA and CPSW drivers. But I am still able to reproduce the
> spurious irq reports.
> 
> > The timer2 and uart0 seem to be false positives here naturally.
> 
> I also added readback in 8250 driver. I haven't touched the timer
> driver, but I guess if that driver had an issue, it should have come out
> much earlier.
> 
> I also saw that sometimes previous irq was the TI LCDC interrupt. Added
> readback there too. Did not help.

OK strange, so far all the ones we've seen have been fixable that way.

> > I would not yet rule out the "previous interrupt" theory until you have
> > tried that. We really want to know the root cause of the issue, just
> > printing out spurious interrupt does not fix the problem :)
> 
> While we cannot rule out a software issue completely, the description in
> TRM around spurious interrupts suggests it can happen even with no role
> of software.

Yes, maybe we have more than one reason for them.

> May I suggest we go ahead and add this patch to the kernel after
> addressing Thomas's comment? At least it will prevent kernel from
> locking up with flood of prints when a spurious irq happens and allows
> easier debug by others too.

Yes we should naturally fix up the kernel locking.

Please also add something like "enable debug for more information"
to the warning. And then print out the current and previous interrupt
if DEBUG is enabled. And in the comments mention that the spurious
interrupts have often been fixed by adding a flush of the posted write to
the previous interrupt handler in the device driver.

Also, do you have a reproducible test case with mainline kernel I
could add to my collection of shell scripts?

Regards,

Tony
--
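
The request above to report the current and previous interrupt when debug
is enabled can be sketched as a userspace model. All names here
(`toy_handle_sir()`, `prev_irq`, `NO_PREV_IRQ`) are hypothetical, and the
`debug` argument stands in for a DEBUG build; this is not the driver code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define ACTIVEIRQ_MASK   0x7fu              /* bits 6:0: active IRQ number */
#define SPURIOUSIRQ_MASK (0x1ffffffu << 7)  /* bits 31:7: spurious IRQ flag */
#define NO_PREV_IRQ      (-1)

/* Last IRQ number that was handled normally, or NO_PREV_IRQ. */
static int prev_irq = NO_PREV_IRQ;

/*
 * Returns the handled IRQ number, or -1 for a spurious SIR value. On a
 * spurious value the previously handled IRQ is left unchanged, so it can
 * be reported as the candidate culprit.
 */
static int toy_handle_sir(uint32_t sir, bool debug)
{
	if ((sir & SPURIOUSIRQ_MASK) == SPURIOUSIRQ_MASK) {
		if (debug)
			fprintf(stderr, "spurious irq: current %u, previous %d\n",
				(unsigned int)(sir & ACTIVEIRQ_MASK), prev_irq);
		return -1;
	}

	prev_irq = (int)(sir & ACTIVEIRQ_MASK);
	return prev_irq;
}
```

As the later replies note, the previous IRQ is only a hint: under load it
can be any interrupt that is active during the test.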
Sekhar Nori Dec. 3, 2015, 3:24 p.m. UTC | #8
On Thursday 03 December 2015 08:32 PM, Tony Lindgren wrote:
> * Sekhar Nori <nsekhar@ti.com> [151203 03:29]:
>> On Tuesday 20 October 2015 08:22 PM, Tony Lindgren wrote:
>>>
>>> OK thanks for testing. My guess from the above list would be EDMA
>>> or CPSW missing a flush of posted write. Maybe try adding a readback
>>> of the related device revision register after acking the interrupt into
>>> TPCC interrupt handler and CPSW interrupt handler(s)?
>>
>> I could get back to debugging this only now. I have converted
>> __raw_writel to writel() and also added readback from the same register
>> in both EDMA and CPSW drivers. But I am still able to reproduce the
>> spurious irq reports.
>>
>>> The timer2 and uart0 seem to be false positives here naturally.
>>
>> I also added readback in 8250 driver. I haven't touched the timer
>> driver, but I guess if that driver had an issue, it should have come out
>> much earlier.
>>
>> I also saw that sometimes previous irq was the TI LCDC interrupt. Added
>> readback there too. Did not help.
> 
> OK strange, so far all the ones we've seen have been fixable that way.
> 
>>> I would not yet rule out the "previous interrupt" theory until you have
>>> tried that. We really want to know the root cause of the issue, just
>>> printing out spurious interrupt does not fix the problem :)
>>
>> While we cannot rule out a software issue completely, the description in
>> TRM around spurious interrupts suggests it can happen even with no role
>> of software.
> 
> Yes maybe we more than one reason for them.
> 
>> May I suggest we go ahead and add this patch to the kernel after
>> addressing Thomas's comment? At least it will prevent kernel from
>> locking up with flood of prints when a spurious irq happens and allows
>> easier debug by others too.
> 
> Yes we should naturally fix up the kernel locking.

Alright. Thanks!

> 
> Please also add something like "enable debug for more information"
> to the warning. And then print out the current and previous interrupt

So I am unconvinced (based on the debug above) that the previous
interrupt information is actually giving any more useful information
than what can be gleaned from observing /proc/interrupts. It seems
previous interrupt noted can be any interrupt you would expect to occur
during the test case anyway.

> if DEBUG is enabled. And in the comments mention that often the spurious
> interrupts has been fixed by adding a flush of the posted write to the
> previous interrupt handler in the device driver.

I can add the comment, no problem.

> Also, do you have a reproducable test case with mainline kernel I
> could add to my collection of shell scripts?

The way I reproduce this is to run the serial port at 3Mbaud in internal
loopback mode with DMA enabled. The test program I use[1] compares the
data sent and received byte-for-byte. With current mainline, that can
mismatch pretty soon. The test will likely end before you see any
spurious irq. There are some patches John Ogness is working on
(currently included in TI's v4.1 kernel) which help sustain the test
for longer and then actually expose the spurious irq issue.

Thanks,
Sekhar

[1] https://git.breakpoint.cc/cgit/bigeasy/serialcheck.git
--
Tony Lindgren Dec. 3, 2015, 3:39 p.m. UTC | #9
* Sekhar Nori <nsekhar@ti.com> [151203 07:25]:
> On Thursday 03 December 2015 08:32 PM, Tony Lindgren wrote:
> > 
> > Yes we should naturally fix up the kernel locking.
> 
> Alright. Thanks!
> 
> > 
> > Please also add something like "enable debug for more information"
> > to the warning. And then print out the current and previous interrupt
> 
> So I am unconvinced (based on the debug above) that the previous
> interrupt information is actually giving any more useful information
> than what can be gleaned from observing /proc/interrupts. It seems
> previous interrupt noted can be any interrupt you would expect to occur
> during the test case anyway.

OK and the fact that I've fixed up 4-5 of these and all of them were
really caused by missing flush of posted write makes me still suspicious :)

> > if DEBUG is enabled. And in the comments mention that often the spurious
> > interrupts has been fixed by adding a flush of the posted write to the
> > previous interrupt handler in the device driver.
> 
> I can add the comment, no problem.

OK thanks. We can add more debug once you figure out what is the root
cause.

> > Also, do you have a reproducable test case with mainline kernel I
> > could add to my collection of shell scripts?
> 
> The way I reproduce this is to run the serial port at 3Mbaud in internal
> loopback mode with DMA enabled. The test program I use[1] compares the
> data sent and received byte-for-byte. With current mainline, that can
> mismatch pretty soon. The test will likely end before you see any
> spurious irq. There are some patches John Ogness is working on
> (currently included in TI's v4.1 kernel) which helps sustain the test
> for long and then actually expose the spurious irq issue.

OK. One thing you have to consider here though is that the EDMA driver
may still wrongly consider several interconnect targets as a single entity.
This can lead to issues where flushing a posted write really only flushes
one of the interconnect targets and that may not be the right one.

Peter has been patching the EDMA driver to solve this problem, but I don't
know if all of them are merged yet, I've added him to Cc.

My bets are on a lack of flush of posted write in the EDMA driver somewhere
and I suggest you investigate that a bit more considering the multiple
interconnect targets :)

Regards,

Tony

> [1] https://git.breakpoint.cc/cgit/bigeasy/serialcheck.git
--

Patch

diff --git a/drivers/irqchip/irq-omap-intc.c b/drivers/irqchip/irq-omap-intc.c
index 8587d0f8d8c0..739725515fab 100644
--- a/drivers/irqchip/irq-omap-intc.c
+++ b/drivers/irqchip/irq-omap-intc.c
@@ -22,6 +22,8 @@ 
 #include <linux/of.h>
 #include <linux/of_address.h>
 #include <linux/of_irq.h>
+#include <linux/ratelimit.h>
+#include <linux/printk.h>
 
 /* Define these here for now until we drop all board-files */
 #define OMAP24XX_IC_BASE	0x480fe000
@@ -47,6 +49,7 @@ 
 #define INTC_ILR0		0x0100
 
 #define ACTIVEIRQ_MASK		0x7f	/* omap2/3 active interrupt bits */
+#define SPURIOUSIRQ_MASK	(0x1ffffff << 7)
 #define INTCPS_NR_ILR_REGS	128
 #define INTCPS_NR_MIR_REGS	4
 
@@ -333,8 +336,23 @@  omap_intc_handle_irq(struct pt_regs *regs)
 	u32 irqnr;
 
 	irqnr = intc_readl(INTC_SIR);
+
+	/*
+	 * A spurious IRQ can result if interrupt that triggered the
+	 * sorting is no longer active during the sorting (10 INTC
+	 * functional clock cycles after interrupt assertion). Or a
+	 * change in interrupt mask affected the result during sorting
+	 * time. There is no special handling required except ignoring
+	 * the SIR register value just read and retrying.
+	 * See section 6.2.5 of AM335x TRM Literature Number: SPRUH73K
+	 */
+	if ((irqnr & SPURIOUSIRQ_MASK) == SPURIOUSIRQ_MASK) {
+		pr_debug_ratelimited("%s: spurious irq!\n", __func__);
+		omap_ack_irq(NULL);
+		return;
+	}
+
 	irqnr &= ACTIVEIRQ_MASK;
-	WARN_ONCE(!irqnr, "Spurious IRQ ?\n");
 	handle_domain_irq(domain, irqnr, regs);
 }