Message ID | 20210304031238.28880-1-michael-dev@fami-braun.de (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | gianfar: fix jumbo packets+napi+rx overrun crash | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Guessed tree name to be net-next |
netdev/subject_prefix | warning | Target tree name not specified in the subject |
netdev/cc_maintainers | warning | 2 maintainers not CCed: kuba@kernel.org davem@davemloft.net |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 20 this patch: 20 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 26 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 20 this patch: 20 |
netdev/header_inline | success | Link |
netdev/stable | success | Stable not CCed |
>-----Original Message----- >From: michael-dev@fami-braun.de <michael-dev@fami-braun.de> >Sent: Thursday, March 4, 2021 5:13 AM >To: Claudiu Manoil <claudiu.manoil@nxp.com> >Cc: netdev@vger.kernel.org; Michael Braun <michael-dev@fami-braun.de> >Subject: [PATCH] gianfar: fix jumbo packets+napi+rx overrun crash > >From: Michael Braun <michael-dev@fami-braun.de> > >When using jumbo packets and overrunning rx queue with napi enabled, >the following sequence is observed in gfar_add_rx_frag: > Hi, Could you help provide some context, e.g.: On what board/soc were you able to trigger this issue? How often does the overrun occur? What's the use case? Is the issue triggered with smaller packets than 9600B? Increasing the Rx ring size does significantly reduce ring overruns? Thanks. > | lstatus | | skb | >t | lstatus, size, flags | first | len, data_len, *ptr | >---+--------------------------------------+-------+-----------------------+ >13 | 18002348, 9032, INTERRUPT LAST | 0 | 9600, 8000, f554c12e | >12 | 10000640, 1600, INTERRUPT | 0 | 8000, 6400, f554c12e | >11 | 10000640, 1600, INTERRUPT | 0 | 6400, 4800, f554c12e | >10 | 10000640, 1600, INTERRUPT | 0 | 4800, 3200, f554c12e | >09 | 10000640, 1600, INTERRUPT | 0 | 3200, 1600, f554c12e | >08 | 14000640, 1600, INTERRUPT FIRST | 0 | 1600, 0, f554c12e | >07 | 14000640, 1600, INTERRUPT FIRST | 1 | 0, 0, f554c12e | >06 | 1c000080, 128, INTERRUPT LAST FIRST | 1 | 0, 0, abf3bd6e | >05 | 18002348, 9032, INTERRUPT LAST | 0 | 8000, 6400, c5a57780 | >04 | 10000640, 1600, INTERRUPT | 0 | 6400, 4800, c5a57780 | >03 | 10000640, 1600, INTERRUPT | 0 | 4800, 3200, c5a57780 | >02 | 10000640, 1600, INTERRUPT | 0 | 3200, 1600, c5a57780 | >01 | 10000640, 1600, INTERRUPT | 0 | 1600, 0, c5a57780 | >00 | 14000640, 1600, INTERRUPT FIRST | 1 | 0, 0, c5a57780 | > >So at t=7 a new packets is started but not finished, probably due to rx >overrun - but rx overrun is not indicated in the flags. Instead a new >packets starts at t=8. This results in skb->len to exceed size for the LAST >fragment at t=13 and thus a negative fragment size added to the skb. > >This then crashes: > >kernel BUG at include/linux/skbuff.h:2277! >Oops: Exception in kernel mode, sig: 5 [#1] >... >NIP [c04689f4] skb_pull+0x2c/0x48 >LR [c03f62ac] gfar_clean_rx_ring+0x2e4/0x844 >Call Trace: >[ec4bfd38] [c06a84c4] _raw_spin_unlock_irqrestore+0x60/0x7c (unreliable) >[ec4bfda8] [c03f6a44] gfar_poll_rx_sq+0x48/0xe4 >[ec4bfdc8] [c048d504] __napi_poll+0x54/0x26c >[ec4bfdf8] [c048d908] net_rx_action+0x138/0x2c0 >[ec4bfe68] [c06a8f34] __do_softirq+0x3a4/0x4fc >[ec4bfed8] [c0040150] run_ksoftirqd+0x58/0x70 >[ec4bfee8] [c0066ecc] smpboot_thread_fn+0x184/0x1cc >[ec4bff08] [c0062718] kthread+0x140/0x144 >[ec4bff38] [c0012350] ret_from_kernel_thread+0x14/0x1c > >This patch fixes this by checking for computed LAST fragment size, so a >negative sized fragment is never added. >In order to prevent the newer rx frame from getting corrupted, the FIRST >flag is checked to discard the incomplete older frame. > >Signed-off-by: Michael Braun <michael-dev@fami-braun.de> >--- > drivers/net/ethernet/freescale/gianfar.c | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > >diff --git a/drivers/net/ethernet/freescale/gianfar.c >b/drivers/net/ethernet/freescale/gianfar.c >index 541de32ea662..2aecae23bfd0 100644 >--- a/drivers/net/ethernet/freescale/gianfar.c >+++ b/drivers/net/ethernet/freescale/gianfar.c >@@ -2390,6 +2390,10 @@ static bool gfar_add_rx_frag(struct gfar_rx_buff >*rxb, u32 lstatus, > if (lstatus & BD_LFLAG(RXBD_LAST)) > size -= skb->len; > >+ WARN(size < 0, "gianfar: rx fragment size underflow"); >+ if (size < 0) >+ return false; >+ > skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page, > rxb->page_offset + RXBUF_ALIGNMENT, > size, GFAR_RXB_TRUESIZE); >@@ -2552,6 +2556,16 @@ static int gfar_clean_rx_ring(struct gfar_priv_rx_q >*rx_queue, > if (lstatus & BD_LFLAG(RXBD_EMPTY)) > break; > >+ /* lost RXBD_LAST descriptor due to overrun */ >+ if (skb && >+ (lstatus & BD_LFLAG(RXBD_FIRST))) { >+ /* discard faulty buffer */ >+ dev_kfree_skb(skb); >+ skb = NULL; >+ >+ /* can continue normally */ >+ } >+ This is indeed an invalid state. If you hit this condition, discarding the skb is the right thing to do. Acked-by: Claudiu Manoil <claudiu.manoil@nxp.com> > /* order rx buffer descriptor reads */ > rmb(); > >-- >2.20.1
Hi, sorry I missed the mail. Am 04.03.2021 11:05, schrieb Claudiu Manoil: > Could you help provide some context, e.g.: > On what board/soc were you able to trigger this issue? I have an OpenWRT running on P1020WLAN boards and have IPsec for some gretap tunnel configured. I run iperf3 -s on the AP and iperf3 -c --udp -b 1000M on some server and then the AP reliably crashes. It also crashed sometimes during normal operations when doing some fast download over the IPsec tunnel, but that was hard to reproduce. > How often does the overrun occur? Reliably within seconds with the above test. > What's the use case? Is the issue triggered with smaller packets than > 9600B? It cannot be triggered with 1500 Byte MTU packets as these have FIRST and LAST set in one. I use jumbo frames to speed up ipsec. > Increasing the Rx ring size does significantly reduce ring overruns? I did not test this. Regards, Michael Braun
diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c index 541de32ea662..2aecae23bfd0 100644 --- a/drivers/net/ethernet/freescale/gianfar.c +++ b/drivers/net/ethernet/freescale/gianfar.c @@ -2390,6 +2390,10 @@ static bool gfar_add_rx_frag(struct gfar_rx_buff *rxb, u32 lstatus, if (lstatus & BD_LFLAG(RXBD_LAST)) size -= skb->len; + WARN(size < 0, "gianfar: rx fragment size underflow"); + if (size < 0) + return false; + skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page, rxb->page_offset + RXBUF_ALIGNMENT, size, GFAR_RXB_TRUESIZE); @@ -2552,6 +2556,16 @@ static int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, if (lstatus & BD_LFLAG(RXBD_EMPTY)) break; + /* lost RXBD_LAST descriptor due to overrun */ + if (skb && + (lstatus & BD_LFLAG(RXBD_FIRST))) { + /* discard faulty buffer */ + dev_kfree_skb(skb); + skb = NULL; + + /* can continue normally */ + } + /* order rx buffer descriptor reads */ rmb();