diff mbox series

gianfar: fix jumbo packets+napi+rx overrun crash

Message ID 20210304031238.28880-1-michael-dev@fami-braun.de (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series gianfar: fix jumbo packets+napi+rx overrun crash | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Guessed tree name to be net-next
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cc_maintainers warning 2 maintainers not CCed: kuba@kernel.org davem@davemloft.net
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 20 this patch: 20
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 26 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 20 this patch: 20
netdev/header_inline success Link
netdev/stable success Stable not CCed

Commit Message

michael-dev March 4, 2021, 3:12 a.m. UTC
From: Michael Braun <michael-dev@fami-braun.de>

When using jumbo packets and overrunning rx queue with napi enabled,
the following sequence is observed in gfar_add_rx_frag:

   | lstatus                              |       | skb                   |
t  | lstatus,  size, flags                | first | len, data_len, *ptr   |
---+--------------------------------------+-------+-----------------------+
13 | 18002348, 9032, INTERRUPT LAST       | 0     | 9600, 8000,  f554c12e |
12 | 10000640, 1600, INTERRUPT            | 0     | 8000, 6400,  f554c12e |
11 | 10000640, 1600, INTERRUPT            | 0     | 6400, 4800,  f554c12e |
10 | 10000640, 1600, INTERRUPT            | 0     | 4800, 3200,  f554c12e |
09 | 10000640, 1600, INTERRUPT            | 0     | 3200, 1600,  f554c12e |
08 | 14000640, 1600, INTERRUPT FIRST      | 0     | 1600, 0,     f554c12e |
07 | 14000640, 1600, INTERRUPT FIRST      | 1     | 0,    0,     f554c12e |
06 | 1c000080, 128,  INTERRUPT LAST FIRST | 1     | 0,    0,     abf3bd6e |
05 | 18002348, 9032, INTERRUPT LAST       | 0     | 8000, 6400,  c5a57780 |
04 | 10000640, 1600, INTERRUPT            | 0     | 6400, 4800,  c5a57780 |
03 | 10000640, 1600, INTERRUPT            | 0     | 4800, 3200,  c5a57780 |
02 | 10000640, 1600, INTERRUPT            | 0     | 3200, 1600,  c5a57780 |
01 | 10000640, 1600, INTERRUPT            | 0     | 1600, 0,     c5a57780 |
00 | 14000640, 1600, INTERRUPT FIRST      | 1     | 0,    0,     c5a57780 |

So at t=7 a new packets is started but not finished, probably due to rx
overrun - but rx overrun is not indicated in the flags. Instead a new
packets starts at t=8. This results in skb->len to exceed size for the LAST
fragment at t=13 and thus a negative fragment size added to the skb.

This then crashes:

kernel BUG at include/linux/skbuff.h:2277!
Oops: Exception in kernel mode, sig: 5 [#1]
...
NIP [c04689f4] skb_pull+0x2c/0x48
LR [c03f62ac] gfar_clean_rx_ring+0x2e4/0x844
Call Trace:
[ec4bfd38] [c06a84c4] _raw_spin_unlock_irqrestore+0x60/0x7c (unreliable)
[ec4bfda8] [c03f6a44] gfar_poll_rx_sq+0x48/0xe4
[ec4bfdc8] [c048d504] __napi_poll+0x54/0x26c
[ec4bfdf8] [c048d908] net_rx_action+0x138/0x2c0
[ec4bfe68] [c06a8f34] __do_softirq+0x3a4/0x4fc
[ec4bfed8] [c0040150] run_ksoftirqd+0x58/0x70
[ec4bfee8] [c0066ecc] smpboot_thread_fn+0x184/0x1cc
[ec4bff08] [c0062718] kthread+0x140/0x144
[ec4bff38] [c0012350] ret_from_kernel_thread+0x14/0x1c

This patch fixes this by checking for computed LAST fragment size, so a
negative sized fragment is never added.
In order to prevent the newer rx frame from getting corrupted, the FIRST
flag is checked to discard the incomplete older frame.

Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
---
 drivers/net/ethernet/freescale/gianfar.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Claudiu Manoil March 4, 2021, 10:05 a.m. UTC | #1
>-----Original Message-----
>From: michael-dev@fami-braun.de <michael-dev@fami-braun.de>
>Sent: Thursday, March 4, 2021 5:13 AM
>To: Claudiu Manoil <claudiu.manoil@nxp.com>
>Cc: netdev@vger.kernel.org; Michael Braun <michael-dev@fami-braun.de>
>Subject: [PATCH] gianfar: fix jumbo packets+napi+rx overrun crash
>
>From: Michael Braun <michael-dev@fami-braun.de>
>
>When using jumbo packets and overrunning rx queue with napi enabled,
>the following sequence is observed in gfar_add_rx_frag:
>

Hi,

Could you help provide some context, e.g.:
On what board/soc were you able to trigger this issue?
How often does the overrun occur? What's the use case? Is the issue triggered with smaller packets than 9600B?
Increasing the Rx ring size does significantly reduce ring overruns?
Thanks.

>   | lstatus                              |       | skb                   |
>t  | lstatus,  size, flags                | first | len, data_len, *ptr   |
>---+--------------------------------------+-------+-----------------------+
>13 | 18002348, 9032, INTERRUPT LAST       | 0     | 9600, 8000,  f554c12e |
>12 | 10000640, 1600, INTERRUPT            | 0     | 8000, 6400,  f554c12e |
>11 | 10000640, 1600, INTERRUPT            | 0     | 6400, 4800,  f554c12e |
>10 | 10000640, 1600, INTERRUPT            | 0     | 4800, 3200,  f554c12e |
>09 | 10000640, 1600, INTERRUPT            | 0     | 3200, 1600,  f554c12e |
>08 | 14000640, 1600, INTERRUPT FIRST      | 0     | 1600, 0,     f554c12e |
>07 | 14000640, 1600, INTERRUPT FIRST      | 1     | 0,    0,     f554c12e |
>06 | 1c000080, 128,  INTERRUPT LAST FIRST | 1     | 0,    0,     abf3bd6e |
>05 | 18002348, 9032, INTERRUPT LAST       | 0     | 8000, 6400,  c5a57780 |
>04 | 10000640, 1600, INTERRUPT            | 0     | 6400, 4800,  c5a57780 |
>03 | 10000640, 1600, INTERRUPT            | 0     | 4800, 3200,  c5a57780 |
>02 | 10000640, 1600, INTERRUPT            | 0     | 3200, 1600,  c5a57780 |
>01 | 10000640, 1600, INTERRUPT            | 0     | 1600, 0,     c5a57780 |
>00 | 14000640, 1600, INTERRUPT FIRST      | 1     | 0,    0,     c5a57780 |
>
>So at t=7 a new packets is started but not finished, probably due to rx
>overrun - but rx overrun is not indicated in the flags. Instead a new
>packets starts at t=8. This results in skb->len to exceed size for the LAST
>fragment at t=13 and thus a negative fragment size added to the skb.
>
>This then crashes:
>
>kernel BUG at include/linux/skbuff.h:2277!
>Oops: Exception in kernel mode, sig: 5 [#1]
>...
>NIP [c04689f4] skb_pull+0x2c/0x48
>LR [c03f62ac] gfar_clean_rx_ring+0x2e4/0x844
>Call Trace:
>[ec4bfd38] [c06a84c4] _raw_spin_unlock_irqrestore+0x60/0x7c (unreliable)
>[ec4bfda8] [c03f6a44] gfar_poll_rx_sq+0x48/0xe4
>[ec4bfdc8] [c048d504] __napi_poll+0x54/0x26c
>[ec4bfdf8] [c048d908] net_rx_action+0x138/0x2c0
>[ec4bfe68] [c06a8f34] __do_softirq+0x3a4/0x4fc
>[ec4bfed8] [c0040150] run_ksoftirqd+0x58/0x70
>[ec4bfee8] [c0066ecc] smpboot_thread_fn+0x184/0x1cc
>[ec4bff08] [c0062718] kthread+0x140/0x144
>[ec4bff38] [c0012350] ret_from_kernel_thread+0x14/0x1c
>
>This patch fixes this by checking for computed LAST fragment size, so a
>negative sized fragment is never added.
>In order to prevent the newer rx frame from getting corrupted, the FIRST
>flag is checked to discard the incomplete older frame.
>
>Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
>---
> drivers/net/ethernet/freescale/gianfar.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
>diff --git a/drivers/net/ethernet/freescale/gianfar.c
>b/drivers/net/ethernet/freescale/gianfar.c
>index 541de32ea662..2aecae23bfd0 100644
>--- a/drivers/net/ethernet/freescale/gianfar.c
>+++ b/drivers/net/ethernet/freescale/gianfar.c
>@@ -2390,6 +2390,10 @@ static bool gfar_add_rx_frag(struct gfar_rx_buff
>*rxb, u32 lstatus,
> 		if (lstatus & BD_LFLAG(RXBD_LAST))
> 			size -= skb->len;
>
>+		WARN(size < 0, "gianfar: rx fragment size underflow");
>+		if (size < 0)
>+			return false;
>+
> 		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
> 				rxb->page_offset + RXBUF_ALIGNMENT,
> 				size, GFAR_RXB_TRUESIZE);
>@@ -2552,6 +2556,16 @@ static int gfar_clean_rx_ring(struct gfar_priv_rx_q
>*rx_queue,
> 		if (lstatus & BD_LFLAG(RXBD_EMPTY))
> 			break;
>
>+		/* lost RXBD_LAST descriptor due to overrun */
>+		if (skb &&
>+		    (lstatus & BD_LFLAG(RXBD_FIRST))) {
>+			/* discard faulty buffer */
>+			dev_kfree_skb(skb);
>+			skb = NULL;
>+
>+			/* can continue normally */
>+		}
>+

This is indeed an invalid state. If you hit this condition, discarding the skb is the right thing to do.

Acked-by: Claudiu Manoil <claudiu.manoil@nxp.com>

> 		/* order rx buffer descriptor reads */
> 		rmb();
>
>--
>2.20.1
michael-dev March 7, 2021, 8:51 p.m. UTC | #2
Hi,

sorry I missed the mail.

Am 04.03.2021 11:05, schrieb Claudiu Manoil:
> Could you help provide some context, e.g.:
> On what board/soc were you able to trigger this issue?

I have an OpenWRT running on P1020WLAN boards and have IPsec for some 
gretap tunnel configured.
I run iperf3 -s on the AP and iperf3 -c --udp -b 1000M on some server 
and then the AP reliably crashes.
It also crashed sometimes during normal operations when doing some fast 
download over the IPsec tunnel, but that was hard to reproduce.

> How often does the overrun occur?
Reliably within seconds with the above test.

> What's the use case? Is the issue triggered with smaller packets than 
> 9600B?
It cannot be triggered with 1500 Byte  MTU packets as these have FIRST 
and LAST set in one.
I use jumbo frames to speed up ipsec.

> Increasing the Rx ring size does significantly reduce ring overruns?

I did not test this.

Regards,
Michael Braun
diff mbox series

Patch

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 541de32ea662..2aecae23bfd0 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -2390,6 +2390,10 @@  static bool gfar_add_rx_frag(struct gfar_rx_buff *rxb, u32 lstatus,
 		if (lstatus & BD_LFLAG(RXBD_LAST))
 			size -= skb->len;
 
+		WARN(size < 0, "gianfar: rx fragment size underflow");
+		if (size < 0)
+			return false;
+
 		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
 				rxb->page_offset + RXBUF_ALIGNMENT,
 				size, GFAR_RXB_TRUESIZE);
@@ -2552,6 +2556,16 @@  static int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue,
 		if (lstatus & BD_LFLAG(RXBD_EMPTY))
 			break;
 
+		/* lost RXBD_LAST descriptor due to overrun */
+		if (skb &&
+		    (lstatus & BD_LFLAG(RXBD_FIRST))) {
+			/* discard faulty buffer */
+			dev_kfree_skb(skb);
+			skb = NULL;
+
+			/* can continue normally */
+		}
+
 		/* order rx buffer descriptor reads */
 		rmb();