Message ID | 20211012163613.30030-1-biju.das.jz@bp.renesas.com (mailing list archive) |
---|---|
Headers | show |
Series | Add functional support for Gigabit Ethernet driver | expand |
On Tue, 12 Oct 2021 17:35:59 +0100 Biju Das wrote: > set_feature patch will send as separate RFC patch along with rx_checksum > patch, as it needs further discussion related to HW checksum. Is this part relating to the crash you observed because of TCP csum offload? I'm trying to understand the situation before and after this series. What makes the crash possible to trigger?
Hi Jakub Kicinski, > -----Original Message----- > Subject: Re: [PATCH net-next v3 00/14] Add functional support for Gigabit > Ethernet driver > > On Tue, 12 Oct 2021 17:35:59 +0100 Biju Das wrote: > > set_feature patch will send as separate RFC patch along with > > rx_checksum patch, as it needs further discussion related to HW > checksum. > > Is this part relating to the crash you observed because of TCP csum > offload? Yes, you are correct. Sergey, suggested use R-Car RX-HW checksum with RCSC/RCPT and But the TOE gives either 0x0 or 0xffff as csum output and feeding this value to skb->csum lead to kernel crash. Regards, Biju > > I'm trying to understand the situation before and after this series. > What makes the crash possible to trigger?
Hi Jakub Kicinski, Thanks for the feedback. > Subject: Re: [PATCH net-next v3 00/14] Add functional support for Gigabit > Ethernet driver > > On Tue, 12 Oct 2021 18:28:07 +0000 Biju Das wrote: > > > On Tue, 12 Oct 2021 17:35:59 +0100 Biju Das wrote: > > > > set_feature patch will send as separate RFC patch along with > > > > rx_checksum patch, as it needs further discussion related to HW > > > checksum. > > > > > > Is this part relating to the crash you observed because of TCP csum > > > offload? > > > > Yes, you are correct. Sergey, suggested use R-Car RX-HW checksum with > > RCSC/RCPT and But the TOE gives either 0x0 or 0xffff as csum output > > and feeding this value to skb->csum lead to kernel crash. > > That's quite concerning. Do you have any of the > > /proc/sys/kernel/panic_on_io_nmi > /proc/sys/kernel/panic_on_oops > /proc/sys/kernel/panic_on_rcu_stall > /proc/sys/kernel/panic_on_unrecovered_nmi > /proc/sys/kernel/panic_on_warn > > knobs set? I'm guessing we hit do_netdev_rx_csum_fault() when the checksum > is incorrect, but I'm surprised that causes a panic. > I tested this last week, if I remember correctly It was not panic, rather do_netdev_rx_csum_fault. I will recheck and will send you the stack trace next time. > > BTW are you seeing 0 / ffff with good or bad checksums? If the device does > a checksum over the IP pseudo-header + TCP only - 0 and ffff would be > correct for a packet with has a valid checksum. But you can't set it to > skb->csum and use CSUM_COMPLETE, you'd need to do something like: > > if (dev_csum == 0 || dev_csum == 0xffff) > skb->ip_summed = CHECKSUM_UNNECESSARY; Yes, it computes IPv4 header checksum and TCP/UDP/ICMP checksum. It will set to 0x0 for both csums, if there is no error. For IPV6, ipv4 header checksum is always set to 0xffff and 0 for TCP/UDP/ICMP checksum. Ok I will use this logic on next RFC. Regards, Biju
On Tue, 12 Oct 2021 18:53:50 +0000 Biju Das wrote: > > > Yes, you are correct. Sergey, suggested use R-Car RX-HW checksum with > > > RCSC/RCPT and But the TOE gives either 0x0 or 0xffff as csum output > > > and feeding this value to skb->csum lead to kernel crash. > > > > That's quite concerning. Do you have any of the > > > > /proc/sys/kernel/panic_on_io_nmi > > /proc/sys/kernel/panic_on_oops > > /proc/sys/kernel/panic_on_rcu_stall > > /proc/sys/kernel/panic_on_unrecovered_nmi > > /proc/sys/kernel/panic_on_warn > > > > knobs set? I'm guessing we hit do_netdev_rx_csum_fault() when the checksum > > is incorrect, but I'm surprised that causes a panic. > > I tested this last week, if I remember correctly It was not panic, > rather do_netdev_rx_csum_fault. I will recheck and will send you the > stack trace next time. Ah, when you say crash you mean a stack trace appears. The machine does not crash? That's fine, we don't need to see the trace.
Hi Jakub Kicinski, Thanks for the feedback. > Subject: Re: [PATCH net-next v3 00/14] Add functional support for Gigabit > Ethernet driver > > On Tue, 12 Oct 2021 18:53:50 +0000 Biju Das wrote: > > > > Yes, you are correct. Sergey, suggested use R-Car RX-HW checksum > > > > with RCSC/RCPT and But the TOE gives either 0x0 or 0xffff as csum > > > > output and feeding this value to skb->csum lead to kernel crash. > > > > > > That's quite concerning. Do you have any of the > > > > > > /proc/sys/kernel/panic_on_io_nmi > > > /proc/sys/kernel/panic_on_oops > > > /proc/sys/kernel/panic_on_rcu_stall > > > /proc/sys/kernel/panic_on_unrecovered_nmi > > > /proc/sys/kernel/panic_on_warn > > > > > > knobs set? I'm guessing we hit do_netdev_rx_csum_fault() when the > > > checksum is incorrect, but I'm surprised that causes a panic. > > > > I tested this last week, if I remember correctly It was not panic, > > rather do_netdev_rx_csum_fault. I will recheck and will send you the > > stack trace next time. > > Ah, when you say crash you mean a stack trace appears. The machine does > not crash? That's fine, we don't need to see the trace. I have rechecked today. It does crash, if you pass bad checksum to skb->csum. Please find the logs. root@smarc-rzg2l:~# ethtool -K eth0 rx on Actual changes: rx-checksum: on [ 34.391956] eth0: hw csum failure [ 34.396044] skb len=168 headroom=142 headlen=168 tailroom=15754 [ 34.396044] mac=(128,14) net=(142,20) trans=162 [ 34.396044] shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) [ 34.396044] csum(0x0 ip_summed=2 complete_sw=0 valid=0 level=0) [ 34.396044] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=0 [ 34.425196] dev name=eth0 feat=0x0x0000010000004000 [ 34.430231] skb headroom: 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 34.438064] skb headroom: 00000010: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 34.445863] skb headroom: 00000020: ff ff ff ff ff ff ff ff ff ff ff fe ff ff ff ff [ 34.453660] skb headroom: 00000030: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 34.461453] skb headroom: 00000040: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 34.469246] skb headroom: 00000050: ff ff ff ff ff fe ff ff ff ff ef ff ff ff ff ff [ 34.477038] skb headroom: 00000060: ff ff ff ff ff ff ff ff fd fe ff ff ff ff ff ff [ 34.484831] skb headroom: 00000070: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff f7 [ 34.492629] skb headroom: 00000080: 00 11 22 33 44 55 08 00 27 a0 90 eb 08 00 [ 34.499895] skb linear: 00000000: 45 00 00 a8 54 06 40 00 40 06 50 f6 c0 a8 0a 01 [ 34.507692] skb linear: 00000010: c0 a8 0a 02 08 01 03 0c 1d aa 76 e1 a3 4b 1c d8 [ 34.515488] skb linear: 00000020: 80 18 21 55 72 75 00 00 01 01 08 0a da da 94 96 [ 34.523283] skb linear: 00000030: b5 fb f1 40 80 00 00 70 d9 4a 86 c0 00 00 00 01 [ 34.531078] skb linear: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 34.538873] skb linear: 00000050: 00 00 00 00 00 00 00 05 00 00 01 ff 00 00 00 01 [ 34.546668] skb linear: 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 11 [ 34.554462] skb linear: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 34.562260] skb linear: 00000080: 58 9d 6c 79 80 e3 86 f2 00 00 00 00 00 9a 8d 38 [ 34.570055] skb linear: 00000090: 61 65 87 ee 22 8c 2f b9 61 4a 17 fb 35 09 00 98 [ 34.577850] skb linear: 000000a0: 61 4a 17 fb 35 09 00 98 [ 34.583534] skb tailroom: 00000000: 00 00 00 00 21 07 00 b0 63 70 47 f9 42 5f 47 f9 [ 34.591329] skb tailroom: 00000010: 21 f8 47 f9 80 74 47 f9 de 60 fc 97 a0 06 00 f9 [ 42.262329] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-rc4-arm64-renesas-01223-g75a50e114cb6-dirty #416 [ 42.272370] Hardware name: Renesas SMARC EVK based on r9a07g044l2 (DT) [ 42.278979] Call trace: [ 42.281460] dump_backtrace+0x0/0x188 [ 42.285186] show_stack+0x14/0x20 [ 42.288550] dump_stack_lvl+0x88/0xb0 [ 42.292267] dump_stack+0x14/0x2c [ 42.295629] do_netdev_rx_csum_fault+0x44/0x50 [ 42.300134] __skb_gro_checksum_complete+0xb0/0xb8 [ 42.304994] tcp4_gro_receive+0xbc/0x198 [ 42.308974] inet_gro_receive+0x300/0x410 [ 42.313039] dev_gro_receive+0x308/0x898 [ 42.317018] napi_gro_receive+0x88/0x370 [ 42.320996] ravb_rx_gbeth+0x36c/0x568 [ 42.324801] ravb_poll+0xd0/0x278 [ 42.328163] __napi_poll+0x38/0x2e0 [ 42.331702] net_rx_action+0xf4/0x248 [ 42.335416] _stext+0x150/0x5d8 [ 42.338604] irq_exit+0x198/0x1b8 [ 42.341970] handle_domain_irq+0x60/0x88 [ 42.345950] gic_handle_irq+0x50/0x110 [ 42.349755] call_on_irq_stack+0x28/0x3c [ 42.353733] do_interrupt_handler+0x4c/0x58 [ 42.357974] el1_interrupt+0x2c/0x108 [ 42.361688] el1h_64_irq_handler+0x14/0x20 [ 42.365841] el1h_64_irq+0x74/0x78 [ 42.369291] arch_cpu_idle+0x14/0x20 [ 42.372918] default_idle_call+0x78/0x330 [ 42.376984] do_idle+0x22c/0x278 [ 42.380261] cpu_startup_entry+0x20/0x68 [ 42.384239] rest_init+0x180/0x280 [ 42.387689] arch_call_rest_init+0xc/0x14 [ 42.391759] start_kernel+0x62c/0x664 [ 42.395472] __primary_switched+0xa0/0xa8 Regards, Biju
Sorry for the spam, it is not crash, it is just printing out the stack trace. I have tested with rx hw checksum turned on by default as well. It boots with NFS, but with that noisy messages, if we pass bad checksum to skb->csum. Regards, Biju > Subject: RE: [PATCH net-next v3 00/14] Add functional support for Gigabit > Ethernet driver > > Hi Jakub Kicinski, > > Thanks for the feedback. > > > Subject: Re: [PATCH net-next v3 00/14] Add functional support for > > Gigabit Ethernet driver > > > > On Tue, 12 Oct 2021 18:53:50 +0000 Biju Das wrote: > > > > > Yes, you are correct. Sergey, suggested use R-Car RX-HW checksum > > > > > with RCSC/RCPT and But the TOE gives either 0x0 or 0xffff as > > > > > csum output and feeding this value to skb->csum lead to kernel > crash. > > > > > > > > That's quite concerning. Do you have any of the > > > > > > > > /proc/sys/kernel/panic_on_io_nmi > > > > /proc/sys/kernel/panic_on_oops > > > > /proc/sys/kernel/panic_on_rcu_stall > > > > /proc/sys/kernel/panic_on_unrecovered_nmi > > > > /proc/sys/kernel/panic_on_warn > > > > > > > > knobs set? I'm guessing we hit do_netdev_rx_csum_fault() when the > > > > checksum is incorrect, but I'm surprised that causes a panic. > > > > > > I tested this last week, if I remember correctly It was not panic, > > > rather do_netdev_rx_csum_fault. I will recheck and will send you the > > > stack trace next time. > > > > Ah, when you say crash you mean a stack trace appears. The machine > > does not crash? That's fine, we don't need to see the trace. > > I have rechecked today. It does crash, if you pass bad checksum to skb- > >csum. Please find the logs. > > root@smarc-rzg2l:~# ethtool -K eth0 rx on Actual changes: > rx-checksum: on > [ 34.391956] eth0: hw csum failure > [ 34.396044] skb len=168 headroom=142 headlen=168 tailroom=15754 > [ 34.396044] mac=(128,14) net=(142,20) trans=162 > [ 34.396044] shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) > [ 34.396044] csum(0x0 ip_summed=2 complete_sw=0 valid=0 level=0) > [ 34.396044] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=0 > [ 34.425196] dev name=eth0 feat=0x0x0000010000004000 > [ 34.430231] skb headroom: 00000000: ff ff ff ff ff ff ff ff ff ff ff ff > ff ff ff ff > [ 34.438064] skb headroom: 00000010: ff ff ff ff ff ff ff ff ff ff ff ff > ff ff ff ff > [ 34.445863] skb headroom: 00000020: ff ff ff ff ff ff ff ff ff ff ff fe > ff ff ff ff > [ 34.453660] skb headroom: 00000030: ff ff ff ff ff ff ff ff ff ff ff ff > ff ff ff ff > [ 34.461453] skb headroom: 00000040: ff ff ff ff ff ff ff ff ff ff ff ff > ff ff ff ff > [ 34.469246] skb headroom: 00000050: ff ff ff ff ff fe ff ff ff ff ef ff > ff ff ff ff > [ 34.477038] skb headroom: 00000060: ff ff ff ff ff ff ff ff fd fe ff ff > ff ff ff ff > [ 34.484831] skb headroom: 00000070: ff ff ff ff ff ff ff ff ff ff ff ff > ff ff ff f7 > [ 34.492629] skb headroom: 00000080: 00 11 22 33 44 55 08 00 27 a0 90 eb > 08 00 > [ 34.499895] skb linear: 00000000: 45 00 00 a8 54 06 40 00 40 06 50 f6 > c0 a8 0a 01 > [ 34.507692] skb linear: 00000010: c0 a8 0a 02 08 01 03 0c 1d aa 76 e1 > a3 4b 1c d8 > [ 34.515488] skb linear: 00000020: 80 18 21 55 72 75 00 00 01 01 08 0a > da da 94 96 > [ 34.523283] skb linear: 00000030: b5 fb f1 40 80 00 00 70 d9 4a 86 c0 > 00 00 00 01 > [ 34.531078] skb linear: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00 > [ 34.538873] skb linear: 00000050: 00 00 00 00 00 00 00 05 00 00 01 ff > 00 00 00 01 > [ 34.546668] skb linear: 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 11 > [ 34.554462] skb linear: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00 > [ 34.562260] skb linear: 00000080: 58 9d 6c 79 80 e3 86 f2 00 00 00 00 > 00 9a 8d 38 > [ 34.570055] skb linear: 00000090: 61 65 87 ee 22 8c 2f b9 61 4a 17 fb > 35 09 00 98 > [ 34.577850] skb linear: 000000a0: 61 4a 17 fb 35 09 00 98 > [ 34.583534] skb tailroom: 00000000: 00 00 00 00 21 07 00 b0 63 70 47 f9 > 42 5f 47 f9 > [ 34.591329] skb tailroom: 00000010: 21 f8 47 f9 80 74 47 f9 de 60 fc 97 > a0 06 00 f9 > > [ 42.262329] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-rc4-arm64- > renesas-01223-g75a50e114cb6-dirty #416 > [ 42.272370] Hardware name: Renesas SMARC EVK based on r9a07g044l2 (DT) > [ 42.278979] Call trace: > [ 42.281460] dump_backtrace+0x0/0x188 > [ 42.285186] show_stack+0x14/0x20 > [ 42.288550] dump_stack_lvl+0x88/0xb0 > [ 42.292267] dump_stack+0x14/0x2c > [ 42.295629] do_netdev_rx_csum_fault+0x44/0x50 > [ 42.300134] __skb_gro_checksum_complete+0xb0/0xb8 > [ 42.304994] tcp4_gro_receive+0xbc/0x198 > [ 42.308974] inet_gro_receive+0x300/0x410 > [ 42.313039] dev_gro_receive+0x308/0x898 > [ 42.317018] napi_gro_receive+0x88/0x370 > [ 42.320996] ravb_rx_gbeth+0x36c/0x568 > [ 42.324801] ravb_poll+0xd0/0x278 > [ 42.328163] __napi_poll+0x38/0x2e0 > [ 42.331702] net_rx_action+0xf4/0x248 > [ 42.335416] _stext+0x150/0x5d8 > [ 42.338604] irq_exit+0x198/0x1b8 > [ 42.341970] handle_domain_irq+0x60/0x88 > [ 42.345950] gic_handle_irq+0x50/0x110 > [ 42.349755] call_on_irq_stack+0x28/0x3c > [ 42.353733] do_interrupt_handler+0x4c/0x58 > [ 42.357974] el1_interrupt+0x2c/0x108 > [ 42.361688] el1h_64_irq_handler+0x14/0x20 > [ 42.365841] el1h_64_irq+0x74/0x78 > [ 42.369291] arch_cpu_idle+0x14/0x20 > [ 42.372918] default_idle_call+0x78/0x330 > [ 42.376984] do_idle+0x22c/0x278 > [ 42.380261] cpu_startup_entry+0x20/0x68 > [ 42.384239] rest_init+0x180/0x280 > [ 42.387689] arch_call_rest_init+0xc/0x14 > [ 42.391759] start_kernel+0x62c/0x664 > [ 42.395472] __primary_switched+0xa0/0xa8 > > > Regards, > Biju
Hello: This series was applied to netdev/net-next.git (master) by Jakub Kicinski <kuba@kernel.org>: On Tue, 12 Oct 2021 17:35:59 +0100 you wrote: > The DMAC and EMAC blocks of Gigabit Ethernet IP found on RZ/G2L SoC are > similar to the R-Car Ethernet AVB IP. > > The Gigabit Ethernet IP consists of Ethernet controller (E-MAC), Internal > TCP/IP Offload Engine (TOE) and Dedicated Direct memory access controller > (DMAC). > > [...] Here is the summary with links: - [net-next,v3,01/14] ravb: Use ALIGN macro for max_rx_len https://git.kernel.org/netdev/net-next/c/23144a915684 - [net-next,v3,02/14] ravb: Add rx_max_buf_size to struct ravb_hw_info https://git.kernel.org/netdev/net-next/c/2e95e08ac009 - [net-next,v3,03/14] ravb: Fillup ravb_alloc_rx_desc_gbeth() stub https://git.kernel.org/netdev/net-next/c/3d4e37df882b - [net-next,v3,04/14] ravb: Fillup ravb_rx_ring_free_gbeth() stub https://git.kernel.org/netdev/net-next/c/2458b8edb887 - [net-next,v3,05/14] ravb: Fillup ravb_rx_ring_format_gbeth() stub https://git.kernel.org/netdev/net-next/c/16a6e245a9f3 - [net-next,v3,06/14] ravb: Fillup ravb_rx_gbeth() stub https://git.kernel.org/netdev/net-next/c/1c59eb678cbd - [net-next,v3,07/14] ravb: Add carrier_counters to struct ravb_hw_info https://git.kernel.org/netdev/net-next/c/b6a4ee6e74de - [net-next,v3,08/14] ravb: Add support to retrieve stats for GbEthernet https://git.kernel.org/netdev/net-next/c/0ee65bc14ff2 - [net-next,v3,09/14] ravb: Rename "tsrq" variable https://git.kernel.org/netdev/net-next/c/4ea3167bad27 - [net-next,v3,10/14] ravb: Optimize ravb_emac_init_gbeth function https://git.kernel.org/netdev/net-next/c/030634f37db9 - [net-next,v3,11/14] ravb: Rename "nc_queue" feature bit https://git.kernel.org/netdev/net-next/c/1091da579d7c - [net-next,v3,12/14] ravb: Document PFRI register bit https://git.kernel.org/netdev/net-next/c/95e99b10482d - [net-next,v3,13/14] ravb: Update ravb_emac_init_gbeth() https://git.kernel.org/netdev/net-next/c/3d6b24a2ada3 - [net-next,v3,14/14] ravb: Fix typo AVB->DMAC https://git.kernel.org/netdev/net-next/c/940409264647 You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html