diff mbox series

serial: 8250: 8250_omap: Fix possible interrupt storm

Message ID 20210511151955.28071-1-vigneshr@ti.com (mailing list archive)
State New, archived
Headers show
Series serial: 8250: 8250_omap: Fix possible interrupt storm | expand

Commit Message

Vignesh Raghavendra May 11, 2021, 3:19 p.m. UTC
It is possible that RX TIMEOUT is signalled after RX FIFO has been
drained, in which case a dummy read of RX FIFO is required to clear RX
TIMEOUT condition. Otherwise, RX TIMEOUT condition is not cleared
leading to an interrupt storm

Cc: stable@vger.kernel.org
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
---
 drivers/tty/serial/8250/8250_omap.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Greg KH May 13, 2021, 2:17 p.m. UTC | #1
On Tue, May 11, 2021 at 08:49:55PM +0530, Vignesh Raghavendra wrote:
> It is possible that RX TIMEOUT is signalled after RX FIFO has been
> drained, in which case a dummy read of RX FIFO is required to clear RX
> TIMEOUT condition. Otherwise, RX TIMEOUT condition is not cleared
> leading to an interrupt storm
> 
> Cc: stable@vger.kernel.org

How far back does this need to go?  What commit id does this fix?  What
caused this to just show up now vs. previously?

thanks,

greg k-h
Tony Lindgren May 28, 2021, 5:39 a.m. UTC | #2
Hi Greg, Vignesh & Jan,

* Greg Kroah-Hartman <gregkh@linuxfoundation.org> [210513 14:17]:
> On Tue, May 11, 2021 at 08:49:55PM +0530, Vignesh Raghavendra wrote:
> > It is possible that RX TIMEOUT is signalled after RX FIFO has been
> > drained, in which case a dummy read of RX FIFO is required to clear RX
> > TIMEOUT condition. Otherwise, RX TIMEOUT condition is not cleared
> > leading to an interrupt storm
> > 
> > Cc: stable@vger.kernel.org
> 
> How far back does this need to go?  What commit id does this fix?  What
> caused this to just show up now vs. previously?

I just noticed this causes the following regression in Linux next when
pressing a key on uart console after boot at least on omap3. This seems
to happen on serial_port_in(port, UART_RX) in the quirk handling.

Vignesh, it seems this quirk needs some soc specific flag added to
it maybe? Or maybe UART_OMAP_RX_LVL register is not available for
all the SoCs?

I think it's best to drop this patch until the issues are resolved,
also there are some open comments above that might be answered by
limiting this quirk to a specific range of SoCs :)

Regards,

Tony

8< ------------------------
#regzb introduced: 31fae7c8b18c ("serial: 8250: 8250_omap: Fix possible interrupt storm")

Internal error: : 1028 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 870 Comm: syslog Not tainted 5.13.0-rc3-next-20210527-00001-g9b545469a50f #34
Hardware name: Generic OMAP36xx (Flattened Device Tree)
PC is at mem_serial_in+0xc/0x20
LR is at omap8250_irq+0x258/0x2d4
pc : [<c06762f0>]    lr : [<c0681720>]    psr: 60000193
sp : c2975c90  ip : 00000000  fp : c1836000
r10: c0ff7f20  r9 : c0ff7f40  r8 : 00000058
r7 : c2975ce0  r6 : 00000000  r5 : 00000001  r4 : c1031a24
r3 : fa06a000  r2 : 00000002  r1 : 00000000  r0 : c1031a24
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 829cc019  DAC: 00000051
Register r0 information: non-slab/vmalloc memory
Register r1 information: NULL pointer
Register r2 information: non-paged memory
Register r3 information: 0-page vmalloc region starting at 0xfa000000 allocated at iotable_init+0x0/0xf4
Register r4 information: non-slab/vmalloc memory
Register r5 information: non-paged memory
Register r6 information: NULL pointer
Register r7 information: non-slab/vmalloc memory
Register r8 information: non-paged memory
Register r9 information: non-slab/vmalloc memory
Register r10 information: non-slab/vmalloc memory
Register r11 information: slab kmalloc-256 start c1836000 pointer offset 0 size 256
Register r12 information: NULL pointer
Process syslog (pid: 870, stack limit = 0x64988e4e)
...
[<c06762f0>] (mem_serial_in) from [<c0681720>] (omap8250_irq+0x258/0x2d4)
[<c0681720>] (omap8250_irq) from [<c01a00b8>] (__handle_irq_event_percpu+0x58/0x1f0)
[<c01a00b8>] (__handle_irq_event_percpu) from [<c01a0334>] (handle_irq_event+0x68/0xcc)
[<c01a0334>] (handle_irq_event) from [<c01a4b6c>] (handle_level_irq+0xc4/0x1c8)
[<c01a4b6c>] (handle_level_irq) from [<c019f968>] (__handle_domain_irq+0x84/0xfc)
[<c019f968>] (__handle_domain_irq) from [<c0100b6c>] (__irq_svc+0x6c/0x90)
Exception stack(0xc2975d28 to 0xc2975d70)
5d20:                   c2532990 c25328a8 00000006 c289a015 c2532888 c2532990
5d40: 00000002 00000000 00000006 64407df7 c2532880 64407df7 00000006 c2975d78
5d60: c02ef31c c030097c 60000013 ffffffff
[<c0100b6c>] (__irq_svc) from [<c030097c>] (__d_lookup_rcu+0xbc/0x1b8)
[<c030097c>] (__d_lookup_rcu) from [<c02ef31c>] (lookup_fast+0x48/0x180)
[<c02ef31c>] (lookup_fast) from [<c02f23b8>] (walk_component+0x3c/0x1c8)
[<c02f23b8>] (walk_component) from [<c02f2780>] (link_path_walk.part.0+0x23c/0x364)
[<c02f2780>] (link_path_walk.part.0) from [<c02f28dc>] (path_parentat+0x34/0x74)
[<c02f28dc>] (path_parentat) from [<c02f48c8>] (filename_parentat+0x88/0x19c)
[<c02f48c8>] (filename_parentat) from [<c02f4a20>] (filename_create+0x44/0x150)
[<c02f4a20>] (filename_create) from [<c02f4c2c>] (do_mkdirat+0x58/0x11c)
[<c02f4c2c>] (do_mkdirat) from [<c0100080>] (ret_fast_syscall+0x0/0x58)
Vignesh Raghavendra May 28, 2021, 6:11 a.m. UTC | #3
Hi,

On 5/28/21 11:09 AM, Tony Lindgren wrote:
> Hi Greg, Vignesh & Jan,
> 
> * Greg Kroah-Hartman <gregkh@linuxfoundation.org> [210513 14:17]:
>> On Tue, May 11, 2021 at 08:49:55PM +0530, Vignesh Raghavendra wrote:
>>> It is possible that RX TIMEOUT is signalled after RX FIFO has been
>>> drained, in which case a dummy read of RX FIFO is required to clear RX
>>> TIMEOUT condition. Otherwise, RX TIMEOUT condition is not cleared
>>> leading to an interrupt storm
>>>
>>> Cc: stable@vger.kernel.org
>>
>> How far back does this need to go?  What commit id does this fix?  What
>> caused this to just show up now vs. previously?

Sorry, I missed this reply. Issue was reported on AM65x SoC with custom
test case from Jan Kiszka that stressed UART with rapid baudrate changes
from 9600 to 4M along with data transfer.

Based on the condition that led to interrupt storm, I inferred it to
affect all SoCs with 8250 OMAP UARTs. But that seems thats not the best
idea as seen from OMAP3 regression.

Greg,

Could you please drop the patch? Very sorry for the inconvenience..

> 
> I just noticed this causes the following regression in Linux next when
> pressing a key on uart console after boot at least on omap3. This seems
> to happen on serial_port_in(port, UART_RX) in the quirk handling.
> 
> Vignesh, it seems this quirk needs some soc specific flag added to
> it maybe? Or maybe UART_OMAP_RX_LVL register is not available for
> all the SoCs?
> 

Yes indeed :(

> I think it's best to drop this patch until the issues are resolved,
> also there are some open comments above that might be answered by
> limiting this quirk to a specific range of SoCs :)
> 

Oops, I did test patch AM33xx assuming its equivalent to OMAP3, but UART
IP is quite different. I will respin the patch making sure, workaround
applies only to AM65x and K3 SoCs.

Regards
Vignesh
Greg KH May 28, 2021, 9 a.m. UTC | #4
On Fri, May 28, 2021 at 11:41:36AM +0530, Vignesh Raghavendra wrote:
> Hi,
> 
> On 5/28/21 11:09 AM, Tony Lindgren wrote:
> > Hi Greg, Vignesh & Jan,
> > 
> > * Greg Kroah-Hartman <gregkh@linuxfoundation.org> [210513 14:17]:
> >> On Tue, May 11, 2021 at 08:49:55PM +0530, Vignesh Raghavendra wrote:
> >>> It is possible that RX TIMEOUT is signalled after RX FIFO has been
> >>> drained, in which case a dummy read of RX FIFO is required to clear RX
> >>> TIMEOUT condition. Otherwise, RX TIMEOUT condition is not cleared
> >>> leading to an interrupt storm
> >>>
> >>> Cc: stable@vger.kernel.org
> >>
> >> How far back does this need to go?  What commit id does this fix?  What
> >> caused this to just show up now vs. previously?
> 
> Sorry, I missed this reply. Issue was reported on AM65x SoC with custom
> test case from Jan Kiszka that stressed UART with rapid baudrate changes
> from 9600 to 4M along with data transfer.
> 
> Based on the condition that led to interrupt storm, I inferred it to
> affect all SoCs with 8250 OMAP UARTs. But that seems thats not the best
> idea as seen from OMAP3 regression.
> 
> Greg,
> 
> Could you please drop the patch? Very sorry for the inconvenience..

Now reverted, thanks.

greg k-h
diff mbox series

Patch

diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
index 8ac11eaeca51..c71bd766fa56 100644
--- a/drivers/tty/serial/8250/8250_omap.c
+++ b/drivers/tty/serial/8250/8250_omap.c
@@ -104,6 +104,9 @@ 
 #define UART_OMAP_EFR2			0x23
 #define UART_OMAP_EFR2_TIMEOUT_BEHAVE	BIT(6)
 
+/* RX FIFO occupancy indicator */
+#define UART_OMAP_RX_LVL		0x64
+
 struct omap8250_priv {
 	int line;
 	u8 habit;
@@ -625,6 +628,15 @@  static irqreturn_t omap8250_irq(int irq, void *dev_id)
 	serial8250_rpm_get(up);
 	iir = serial_port_in(port, UART_IIR);
 	ret = serial8250_handle_irq(port, iir);
+	/*
+	 * It is possible that RX TIMEOUT is signalled after FIFO
+	 * has been drained, in which case a dummy read of RX FIFO is
+	 * required to clear RX TIMEOUT condition.
+	 */
+	if ((iir & UART_IIR_RX_TIMEOUT) == UART_IIR_RX_TIMEOUT) {
+		if (serial_port_in(port, UART_OMAP_RX_LVL) == 0)
+			serial_port_in(port, UART_RX);
+	}
 	serial8250_rpm_put(up);
 
 	return IRQ_RETVAL(ret);