Message ID | 20240912071702.221128-1-en-wei.wu@canonical.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | [ipsec,v2] xfrm: check MAC header is shown with both skb->mac_len and skb_mac_header_was_set() |
On Thu, Sep 12, 2024 at 9:17 AM En-Wei Wu <en-wei.wu@canonical.com> wrote:
>
> When we use Intel WWAN with xfrm, our system always hangs after
> browsing websites for a few seconds. The error message shows that
> it is a slab-out-of-bounds error:
>
> [ 67.162014] BUG: KASAN: slab-out-of-bounds in xfrm_input+0x426e/0x6740
> [ 67.162030] Write of size 2 at addr ffff888156cb814b by task ksoftirqd/2/26
>
> The reason is that the eth_hdr(skb) inside if statement evaluated
> to an unexpected address with skb->mac_header = ~0U (indicating there
> is no MAC header). The unreliability of skb->mac_len causes the if
> statement to become true even if there is no MAC header inside the
> skb data buffer.
>
> Check both the skb->mac_len and skb_mac_header_was_set(skb) fixes this issue.
>
> Fixes: 87cdf3148b11 ("xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto")
> Signed-off-by: En-Wei Wu <en-wei.wu@canonical.com>
> ---
> Changes in v2:
> * Change the title from "xfrm: avoid using skb->mac_len to decide if mac header is shown"
> * Remain skb->mac_len check
> * Apply fix on ipv6 path too
> ---
> net/xfrm/xfrm_input.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
> index 749e7eea99e4..eef0145c73a7 100644
> --- a/net/xfrm/xfrm_input.c
> +++ b/net/xfrm/xfrm_input.c
> @@ -251,7 +251,7 @@ static int xfrm4_remove_tunnel_encap(struct xfrm_state *x, struct sk_buff *skb)
>
> 	skb_reset_network_header(skb);
> 	skb_mac_header_rebuild(skb);
> -	if (skb->mac_len)
> +	if (skb->mac_len && skb_mac_header_was_set(skb))
> 		eth_hdr(skb)->h_proto = skb->protocol;

I would swap the two conditions: in the future, debug kernels might
leave mac_len uninitialized if mac_header was never set, and it would
be nice to catch the issue sooner.

Is something calling skb_reset_mac_len() while the mac_header was not set?

Considering the stack trace, I cannot see why mac_header is not set.

Could you try the following patch, and compile your test kernel with
CONFIG_DEBUG_NET=y ?

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 39f1d16f362887821caa022464695c4045461493..fb06dc81039253bafeb49f0b7228748e898f480f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2909,9 +2909,19 @@ static inline void skb_reset_inner_headers(struct sk_buff *skb)
 	skb->inner_transport_header = skb->transport_header;
 }

+static inline int skb_mac_header_was_set(const struct sk_buff *skb)
+{
+	return skb->mac_header != (typeof(skb->mac_header))~0U;
+}
+
 static inline void skb_reset_mac_len(struct sk_buff *skb)
 {
-	skb->mac_len = skb->network_header - skb->mac_header;
+	if (!skb_mac_header_was_set(skb)) {
+		DEBUG_NET_WARN_ON_ONCE(1);
+		skb->mac_len = 0;
+	} else {
+		skb->mac_len = skb->network_header - skb->mac_header;
+	}
 }

 static inline unsigned char *skb_inner_transport_header(const struct sk_buff
@@ -3014,11 +3024,6 @@ static inline void skb_set_network_header(struct sk_buff *skb, const int offset)
 	skb->network_header += offset;
 }

-static inline int skb_mac_header_was_set(const struct sk_buff *skb)
-{
-	return skb->mac_header != (typeof(skb->mac_header))~0U;
-}
-
 static inline unsigned char *skb_mac_header(const struct sk_buff *skb)
 {
 	DEBUG_NET_WARN_ON_ONCE(!skb_mac_header_was_set(skb));
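For readers following the review comment about the order of the two conditions, here is a minimal userspace sketch of the guard. It is not kernel code: `struct mock_skb`, `MOCK_MAC_HEADER_UNSET`, and the `mock_*` helpers are hypothetical stand-ins, and the 16-bit field widths are an assumption made for illustration. It only shows that a stale non-zero `mac_len` passes the old check while the combined check, with the header test first, rejects it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the sk_buff fields discussed in this thread. */
struct mock_skb {
	uint16_t mac_header;     /* offset from head; all-ones means "not set" */
	uint16_t network_header; /* offset from head */
	uint16_t mac_len;        /* cached length, possibly stale */
};

/* Assumed sentinel, mirroring (typeof(skb->mac_header))~0U from the patch. */
#define MOCK_MAC_HEADER_UNSET ((uint16_t)~0U)

static bool mock_mac_header_was_set(const struct mock_skb *skb)
{
	assert(skb != NULL);
	return skb->mac_header != MOCK_MAC_HEADER_UNSET;
}

/* Guard as it was before the v2 patch: mac_len alone decides. */
static bool mock_may_write_h_proto_old(const struct mock_skb *skb)
{
	return skb->mac_len != 0;
}

/* Guard with the order suggested in the review: header validity first. */
static bool mock_may_write_h_proto_new(const struct mock_skb *skb)
{
	return mock_mac_header_was_set(skb) && skb->mac_len != 0;
}
```

With a mock skb whose `mac_header` is the sentinel and whose `mac_len` happens to be non-zero, the old guard returns true and the new guard returns false; for an skb with a real header both return true.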
Hello *,

On Thu, 12 Sep 2024 15:17:02 +0800, En-Wei Wu <en-wei.wu@canonical.com> wrote:

> When we use Intel WWAN with xfrm, our system always hangs after
> browsing websites for a few seconds. The error message shows that
> it is a slab-out-of-bounds error:
>
> [ 67.162014] BUG: KASAN: slab-out-of-bounds in xfrm_input+0x426e/0x6740
> [ 67.162030] Write of size 2 at addr ffff888156cb814b by task ksoftirqd/2/26
>
> [ 67.162043] CPU: 2 UID: 0 PID: 26 Comm: ksoftirqd/2 Not tainted 6.11.0-rc6-c763c4339688+ #2
> [ 67.162053] Hardware name: Dell Inc. Latitude 5340/0SG010, BIOS 1.15.0 07/15/2024
> [ 67.162058] Call Trace:
> [ 67.162062] <TASK>
> [ 67.162068] dump_stack_lvl+0x76/0xa0
> [ 67.162079] print_report+0xce/0x5f0
> [ 67.162088] ? xfrm_input+0x426e/0x6740
> [ 67.162096] ? kasan_complete_mode_report_info+0x26/0x200
> [ 67.162105] ? xfrm_input+0x426e/0x6740
> [ 67.162112] kasan_report+0xbe/0x110
> [ 67.162119] ? xfrm_input+0x426e/0x6740
> [ 67.162129] __asan_report_store_n_noabort+0x12/0x30
> [ 67.162138] xfrm_input+0x426e/0x6740
> [ 67.162149] ? __pfx_xfrm_input+0x10/0x10
> [ 67.162160] ? __kasan_check_read+0x11/0x20
> [ 67.162168] ? __call_rcu_common+0x3e7/0x15b0
> [ 67.162178] xfrm4_rcv_encap+0x214/0x470
> [ 67.162186] ? __xfrm4_udp_encap_rcv.part.0+0x3cd/0x560
> [ 67.162195] xfrm4_udp_encap_rcv+0xdd/0xf0
> [ 67.162203] udp_queue_rcv_one_skb+0x880/0x12f0
> [ 67.162212] udp_queue_rcv_skb+0x139/0xa90
> [ 67.162221] udp_unicast_rcv_skb+0x116/0x350
> [ 67.162229] __udp4_lib_rcv+0x213b/0x3410
> [ 67.162237] ? ldsem_down_write+0x211/0x4ed
> [ 67.162246] ? __pfx___udp4_lib_rcv+0x10/0x10
> [ 67.162254] ? __pfx_raw_local_deliver+0x10/0x10
> [ 67.162262] ? __pfx_cache_tag_flush_range_np+0x10/0x10
> [ 67.162273] udp_rcv+0x86/0xb0
> [ 67.162280] ip_protocol_deliver_rcu+0x152/0x380
> [ 67.162289] ip_local_deliver_finish+0x282/0x370
> [ 67.162296] ip_local_deliver+0x1a8/0x380
> [ 67.162303] ? __pfx_ip_local_deliver+0x10/0x10
> [ 67.162310] ? ip_rcv_finish_core.constprop.0+0x481/0x1ce0
> [ 67.162317] ? ip_rcv_core+0x5df/0xd60
> [ 67.162325] ip_rcv+0x2fc/0x380
> [ 67.162332] ? __pfx_ip_rcv+0x10/0x10
> [ 67.162338] ? __pfx_dma_map_page_attrs+0x10/0x10
> [ 67.162346] ? __kasan_check_write+0x14/0x30
> [ 67.162354] ? __build_skb_around+0x23a/0x350
> [ 67.162363] ? __pfx_ip_rcv+0x10/0x10
> [ 67.162369] __netif_receive_skb_one_core+0x173/0x1d0
> [ 67.162377] ? __pfx___netif_receive_skb_one_core+0x10/0x10
> [ 67.162386] ? __kasan_check_write+0x14/0x30
> [ 67.162394] ? _raw_spin_lock_irq+0x8b/0x100
> [ 67.162402] __netif_receive_skb+0x21/0x160
> [ 67.162409] process_backlog+0x1c0/0x590
> [ 67.162417] __napi_poll+0xab/0x550
> [ 67.162425] net_rx_action+0x53e/0xd10
> [ 67.162434] ? __pfx_net_rx_action+0x10/0x10
> [ 67.162443] ? __pfx_wake_up_var+0x10/0x10
> [ 67.162453] ? tasklet_action_common.constprop.0+0x22c/0x670
> [ 67.162463] handle_softirqs+0x18f/0x5d0
> [ 67.162472] ? __pfx_run_ksoftirqd+0x10/0x10
> [ 67.162480] run_ksoftirqd+0x3c/0x60
> [ 67.162487] smpboot_thread_fn+0x2f3/0x700
> [ 67.162497] kthread+0x2b5/0x390
> [ 67.162505] ? __pfx_smpboot_thread_fn+0x10/0x10
> [ 67.162512] ? __pfx_kthread+0x10/0x10
> [ 67.162519] ret_from_fork+0x43/0x90
> [ 67.162527] ? __pfx_kthread+0x10/0x10
> [ 67.162534] ret_from_fork_asm+0x1a/0x30
> [ 67.162544] </TASK>
>
> [ 67.162551] The buggy address belongs to the object at ffff888156cb8000
> which belongs to the cache kmalloc-rnd-09-8k of size 8192
> [ 67.162557] The buggy address is located 331 bytes inside of
> allocated 8192-byte region [ffff888156cb8000, ffff888156cba000)
>
> [ 67.162566] The buggy address belongs to the physical page:
> [ 67.162570] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x156cb8
> [ 67.162578] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> [ 67.162583] flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
> [ 67.162591] page_type: 0xfdffffff(slab)
> [ 67.162599] raw: 0017ffffc0000040 ffff888100056780 dead000000000122 0000000000000000
> [ 67.162605] raw: 0000000000000000 0000000080020002 00000001fdffffff 0000000000000000
> [ 67.162611] head: 0017ffffc0000040 ffff888100056780 dead000000000122 0000000000000000
> [ 67.162616] head: 0000000000000000 0000000080020002 00000001fdffffff 0000000000000000
> [ 67.162621] head: 0017ffffc0000003 ffffea00055b2e01 ffffffffffffffff 0000000000000000
> [ 67.162626] head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
> [ 67.162630] page dumped because: kasan: bad access detected
>
> [ 67.162636] Memory state around the buggy address:
> [ 67.162640] ffff888156cb8000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [ 67.162645] ffff888156cb8080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [ 67.162650] >ffff888156cb8100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [ 67.162653] ^
> [ 67.162658] ffff888156cb8180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [ 67.162663] ffff888156cb8200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>
> The reason is that the eth_hdr(skb) inside if statement evaluated
> to an unexpected address with skb->mac_header = ~0U (indicating there
> is no MAC header). The unreliability of skb->mac_len causes the if
> statement to become true even if there is no MAC header inside the
> skb data buffer.
>
> Check both the skb->mac_len and skb_mac_header_was_set(skb) fixes this issue.
>
> Fixes: 87cdf3148b11 ("xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto")
> Signed-off-by: En-Wei Wu <en-wei.wu@canonical.com>
> ---
> Changes in v2:
> * Change the title from "xfrm: avoid using skb->mac_len to decide if mac header is shown"
> * Remain skb->mac_len check
> * Apply fix on ipv6 path too
> ---
> net/xfrm/xfrm_input.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
> index 749e7eea99e4..eef0145c73a7 100644
> --- a/net/xfrm/xfrm_input.c
> +++ b/net/xfrm/xfrm_input.c
> @@ -251,7 +251,7 @@ static int xfrm4_remove_tunnel_encap(struct xfrm_state *x, struct sk_buff *skb)
>
> 	skb_reset_network_header(skb);
> 	skb_mac_header_rebuild(skb);
> -	if (skb->mac_len)
> +	if (skb->mac_len && skb_mac_header_was_set(skb))
> 		eth_hdr(skb)->h_proto = skb->protocol;
>
> 	err = 0;
> @@ -288,7 +288,7 @@ static int xfrm6_remove_tunnel_encap(struct xfrm_state *x, struct sk_buff *skb)
>
> 	skb_reset_network_header(skb);
> 	skb_mac_header_rebuild(skb);
> -	if (skb->mac_len)
> +	if (skb->mac_len && skb_mac_header_was_set(skb))
> 		eth_hdr(skb)->h_proto = skb->protocol;
>
> 	err = 0;

Same change (and request for more debugging) already suggested in 2023, see [1]...

Regards,
Peter

[1] https://lore.kernel.org/netdev/d1cf5a66-03e1-44b8-929d-ac123b1bbd7b@sylv.io/T/
On Thu, Sep 12, 2024 at 11:35 AM Peter Seiderer <ps.report@gmx.net> wrote:
>
> Same change (and request for more debugging) already suggested in 2023, see [1]...
>
> Regards,
> Peter
>
> [1] https://lore.kernel.org/netdev/d1cf5a66-03e1-44b8-929d-ac123b1bbd7b@sylv.io/T/

Indeed !
Nice to see some consistency among us :)
> Could you try the following patch, and compile your test kernel with
> CONFIG_DEBUG_NET=y ?

[ 323.870221] ------------[ cut here ]------------
[ 323.870226] WARNING: CPU: 2 PID: 26 at include/linux/skbuff.h:2904 __netif_receive_skb_core.constprop.0+0x201/0x39d0
[ 323.870369] CPU: 2 UID: 0 PID: 26 Comm: ksoftirqd/2 Not tainted 6.11.0-rc6-c763c4339688+ #12
[ 323.870372] Hardware name: Dell Inc. Latitude 5340/0SG010, BIOS 1.15.0 07/15/2024
[ 323.870373] RIP: 0010:__netif_receive_skb_core.constprop.0+0x201/0x39d0
[ 323.870376] Code: 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 b4 24 00 00 41 0f b7 87 ba 00 00 00 29 c3 66 83 f8 ff 75 04 <0f> 0b 31 db 48 b8 00 00 00 00 00 fc ff df 49 8d 7f 78 48 89 fa 48
[ 323.870378] RSP: 0018:ffffc90000377838 EFLAGS: 00010246
[ 323.870380] RAX: 000000000000ffff RBX: 00000000ffff0061 RCX: ffff88876cf48090
[ 323.870381] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8881756b2e7a
[ 323.870382] RBP: ffffc90000377a88 R08: ffff88876cf48184 R09: 0000000000000000
[ 323.870383] R10: 0000000000000000 R11: 1ffff1102ead65b9 R12: ffff8881756b2dc0
[ 323.870384] R13: ffffc90000377b20 R14: ffff8881635ca000 R15: ffff8881756b2dc0
[ 323.870385] FS: 0000000000000000(0000) GS:ffff88876cf00000(0000) knlGS:0000000000000000
[ 323.870387] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 323.870388] CR2: 0000769acfa9d080 CR3: 0000000712498000 CR4: 0000000000f50ef0
[ 323.870389] PKRU: 55555554
[ 323.870390] Call Trace:
[ 323.870391] <TASK>
[ 323.870393] ? show_regs+0x71/0x90
[ 323.870397] ? __warn+0xce/0x270
[ 323.870399] ? __netif_receive_skb_core.constprop.0+0x201/0x39d0
[ 323.870401] ? report_bug+0x2ad/0x300
[ 323.870404] ? handle_bug+0x46/0x90
[ 323.870407] ? exc_invalid_op+0x19/0x50
[ 323.870409] ? asm_exc_invalid_op+0x1b/0x20
[ 323.870413] ? __netif_receive_skb_core.constprop.0+0x201/0x39d0
[ 323.870415] ? intel_iommu_iotlb_sync_map+0x1a/0x30
[ 323.870418] ? iommu_map+0xab/0x140
[ 323.870421] ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[ 323.870423] ? iommu_dma_map_page+0x159/0x720
[ 323.870425] ? dma_map_page_attrs+0x568/0xdc0
[ 323.870427] ? __kasan_slab_alloc+0x9d/0xa0
[ 323.870430] ? __pfx_dma_map_page_attrs+0x10/0x10
[ 323.870431] ? __kasan_check_write+0x14/0x30
[ 323.870434] ? __build_skb_around+0x23a/0x350
[ 323.870437] __netif_receive_skb_one_core+0xb4/0x1d0
[ 323.870439] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 323.870441] ? __kasan_check_write+0x14/0x30
[ 323.870443] ? _raw_spin_lock_irq+0x8b/0x100
[ 323.870445] __netif_receive_skb+0x21/0x160
[ 323.870447] process_backlog+0x1c0/0x590
[ 323.870449] __napi_poll+0xab/0x560
[ 323.870451] net_rx_action+0x53e/0xd10
[ 323.870453] ? __pfx_net_rx_action+0x10/0x10
[ 323.870455] ? __pfx_wake_up_var+0x10/0x10
[ 323.870457] ? tasklet_action_common.constprop.0+0x22c/0x670
[ 323.870461] handle_softirqs+0x18f/0x5d0
[ 323.870463] ? __pfx_run_ksoftirqd+0x10/0x10
[ 323.870465] run_ksoftirqd+0x3c/0x60
[ 323.870467] smpboot_thread_fn+0x2f3/0x700
[ 323.870470] kthread+0x2b5/0x390
[ 323.870472] ? __pfx_smpboot_thread_fn+0x10/0x10
[ 323.870474] ? __pfx_kthread+0x10/0x10
[ 323.870476] ret_from_fork+0x43/0x90
[ 323.870478] ? __pfx_kthread+0x10/0x10
[ 323.870480] ret_from_fork_asm+0x1a/0x30
[ 323.870483] </TASK>
[ 323.870484] ---[ end trace 0000000000000000 ]---
[ 350.300485] Initializing XFRM netlink socket
[ 351.586993] ------------[ cut here ]------------
[ 351.586999] WARNING: CPU: 2 PID: 26 at include/linux/skbuff.h:2904 dev_gro_receive+0x172c/0x2860
[ 351.587141] CPU: 2 UID: 0 PID: 26 Comm: ksoftirqd/2 Tainted: G W 6.11.0-rc6-c763c4339688+ #12
[ 351.587144] Tainted: [W]=WARN
[ 351.587145] Hardware name: Dell Inc. Latitude 5340/0SG010, BIOS 1.15.0 07/15/2024
[ 351.587147] RIP: 0010:dev_gro_receive+0x172c/0x2860
[ 351.587149] Code: 07 83 c2 01 38 ca 7c 08 84 c9 0f 85 d2 09 00 00 8d 14 c5 00 00 00 00 41 0f b6 45 46 83 e0 c7 09 d0 41 88 45 46 e9 ee f9 ff ff <0f> 0b 45 31 f6 e9 64 f7 ff ff 45 31 e4 81 e3 c0 00 00 00 41 0f 95
[ 351.587151] RSP: 0018:ffffc90000377aa8 EFLAGS: 00010246
[ 351.587153] RAX: ffff888128d72840 RBX: ffffffff95a0d9c0 RCX: 0000000000000000
[ 351.587154] RDX: 000000000000ffff RSI: ffff88876cf52418 RDI: ffff88815880ad3a
[ 351.587155] RBP: ffffc90000377b48 R08: 0000000000000000 R09: 0000000000000000
[ 351.587156] R10: 1ffff110ed9ea481 R11: 0000000000000000 R12: ffffffff95a0d9d0
[ 351.587157] R13: ffff88815880ac80 R14: 00000000ffff008d R15: ffff88815880acb8
[ 351.587159] FS: 0000000000000000(0000) GS:ffff88876cf00000(0000) knlGS:0000000000000000
[ 351.587160] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 351.587161] CR2: 000078e9ea9e25b0 CR3: 0000000712498000 CR4: 0000000000f50ef0
[ 351.587163] PKRU: 55555554
[ 351.587163] Call Trace:
[ 351.587164] <TASK>
[ 351.587167] ? show_regs+0x71/0x90
[ 351.587171] ? __warn+0xce/0x270
[ 351.587173] ? dev_gro_receive+0x172c/0x2860
[ 351.587175] ? report_bug+0x2ad/0x300
[ 351.587178] ? handle_bug+0x46/0x90
[ 351.587181] ? exc_invalid_op+0x19/0x50
[ 351.587182] ? asm_exc_invalid_op+0x1b/0x20
[ 351.587187] ? dev_gro_receive+0x172c/0x2860
[ 351.587188] ? dev_gro_receive+0xcdd/0x2860
[ 351.587190] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 351.587192] ? __mutex_lock.constprop.0+0x150/0x1180
[ 351.587195] napi_gro_receive+0x3a2/0x900
[ 351.587197] gro_cell_poll+0xe5/0x1d0
[ 351.587200] __napi_poll+0xab/0x560
[ 351.587202] net_rx_action+0x53e/0xd10
[ 351.587204] ? __pfx_net_rx_action+0x10/0x10
[ 351.587206] ? __pfx_wake_up_var+0x10/0x10
[ 351.587209] ? tasklet_action_common.constprop.0+0x22c/0x670
[ 351.587212] handle_softirqs+0x18f/0x5d0
[ 351.587214] ? __pfx_run_ksoftirqd+0x10/0x10
[ 351.587216] run_ksoftirqd+0x3c/0x60
[ 351.587218] smpboot_thread_fn+0x2f3/0x700
[ 351.587220] kthread+0x2b5/0x390
[ 351.587223] ? __pfx_smpboot_thread_fn+0x10/0x10
[ 351.587224] ? __pfx_kthread+0x10/0x10
[ 351.587226] ret_from_fork+0x43/0x90
[ 351.587229] ? __pfx_kthread+0x10/0x10
[ 351.587231] ret_from_fork_asm+0x1a/0x30
[ 351.587234] </TASK>
[ 351.587235] ---[ end trace 0000000000000000 ]---

Seems like the __netif_receive_skb_core() and dev_gro_receive() are the
places where it calls skb_reset_mac_len() with skb->mac_header = ~0U.

On Thu, 12 Sept 2024 at 18:54, Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Sep 12, 2024 at 11:35 AM Peter Seiderer <ps.report@gmx.net> wrote:
>
> >
> > Same change (and request for more debugging) already suggested in 2023, see [1]...
> >
> > Regards,
> > Peter
> >
> > [1] https://lore.kernel.org/netdev/d1cf5a66-03e1-44b8-929d-ac123b1bbd7b@sylv.io/T/
>
> Indeed !
> Nice to see some consistency among us :)
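To make the reported finding concrete, the following userspace sketch models what happens when a length is computed from a header offset that was never set. It is only an illustration under stated assumptions: `struct mock_skb` and the `mock_*` functions are hypothetical, the fields are assumed to be 16 bits wide, and the offsets `0x60` and `0x52` are arbitrary example values rather than numbers confirmed by the thread. The patched variant mirrors the logic of the debug patch, without the kernel's warning macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the sk_buff fields involved. */
struct mock_skb {
	uint16_t mac_header;     /* offset from head; all-ones means "not set" */
	uint16_t network_header; /* offset from head */
	uint16_t mac_len;
};

/* Assumed sentinel, mirroring (typeof(skb->mac_header))~0U. */
#define MOCK_MAC_HEADER_UNSET ((uint16_t)~0U)

/* Unconditional subtraction, as in the code before the debug patch. */
static void mock_reset_mac_len_unpatched(struct mock_skb *skb)
{
	assert(skb != NULL);
	/* With mac_header == 0xFFFF the result wraps to a small non-zero value. */
	skb->mac_len = (uint16_t)(skb->network_header - skb->mac_header);
}

/* Guarded variant following the debug patch: unset header gives length 0. */
static void mock_reset_mac_len_patched(struct mock_skb *skb)
{
	assert(skb != NULL);
	if (skb->mac_header == MOCK_MAC_HEADER_UNSET)
		skb->mac_len = 0; /* the kernel patch also warns here */
	else
		skb->mac_len = (uint16_t)(skb->network_header - skb->mac_header);
}
```

For example, with `network_header = 0x60` and an unset `mac_header`, the unpatched helper stores `0x61` in `mac_len`, which a later `if (skb->mac_len)` test would treat as a present MAC header; the patched helper stores `0`.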
On Fri, Sep 13, 2024 at 7:29 AM En-Wei WU <en-wei.wu@canonical.com> wrote:
>
> > Could you try the following patch, and compile your test kernel with
> > CONFIG_DEBUG_NET=y ?
> [...]
> Seems like the __netif_receive_skb_core() and dev_gro_receive() are
> the places where it calls skb_reset_mac_len() with skb->mac_header =
> ~0U.

Ouch, let me take a look.
Hi,

I would kindly ask if there is any progress :)

Thanks.
En-Wei.

On Fri, 13 Sept 2024 at 09:04, Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, Sep 13, 2024 at 7:29 AM En-Wei WU <en-wei.wu@canonical.com> wrote:
> > [...]
> > Seems like the __netif_receive_skb_core() and dev_gro_receive() are
> > the places where it calls skb_reset_mac_len() with skb->mac_header =
> > ~0U.
>
> Ouch, let me take a look.
On Wed, Oct 2, 2024 at 12:40 PM En-Wei WU <en-wei.wu@canonical.com> wrote:
>
> Hi,
>
> I would kindly ask if there is any progress :)

Can you now try this debug patch (with CONFIG_DEBUG_NET=y ) :

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 39f1d16f362887821caa022464695c4045461493..e0e4154cbeb90474d92634d505869526c566f132 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2909,9 +2909,19 @@ static inline void skb_reset_inner_headers(struct sk_buff *skb)
 	skb->inner_transport_header = skb->transport_header;
 }

+static inline int skb_mac_header_was_set(const struct sk_buff *skb)
+{
+	return skb->mac_header != (typeof(skb->mac_header))~0U;
+}
+
 static inline void skb_reset_mac_len(struct sk_buff *skb)
 {
-	skb->mac_len = skb->network_header - skb->mac_header;
+	if (!skb_mac_header_was_set(skb)) {
+		DEBUG_NET_WARN_ON_ONCE(1);
+		skb->mac_len = 0;
+	} else {
+		skb->mac_len = skb->network_header - skb->mac_header;
+	}
 }

 static inline unsigned char *skb_inner_transport_header(const struct sk_buff
@@ -3014,11 +3024,6 @@ static inline void skb_set_network_header(struct sk_buff *skb, const int offset)
 	skb->network_header += offset;
 }

-static inline int skb_mac_header_was_set(const struct sk_buff *skb)
-{
-	return skb->mac_header != (typeof(skb->mac_header))~0U;
-}
-
 static inline unsigned char *skb_mac_header(const struct sk_buff *skb)
 {
 	DEBUG_NET_WARN_ON_ONCE(!skb_mac_header_was_set(skb));
@@ -3043,6 +3048,7 @@ static inline void skb_unset_mac_header(struct sk_buff *skb)

 static inline void skb_reset_mac_header(struct sk_buff *skb)
 {
+	DEBUG_NET_WARN_ON_ONCE(skb->data < skb->head);
 	skb->mac_header = skb->data - skb->head;
 }

@@ -3050,6 +3056,7 @@ static inline void skb_set_mac_header(struct sk_buff *skb, const int offset)
 {
 	skb_reset_mac_header(skb);
 	skb->mac_header += offset;
+	DEBUG_NET_WARN_ON_ONCE(skb_mac_header(skb) < skb->head);
 }

 static inline void skb_pop_mac_header(struct sk_buff *skb)
Hi, sorry for the late reply.

I've tested this debug patch (with CONFIG_DEBUG_NET=y) on my machine,
and the DEBUG_NET_WARN_ON_ONCE never got triggered.

Thanks.
> Seems like the __netif_receive_skb_core() and dev_gro_receive() are
> the places where it calls skb_reset_mac_len() with skb->mac_header =
> ~0U.
I believe it's the root cause.

My concern is that if we put something like:
+	if (!skb_mac_header_was_set(skb)) {
+		DEBUG_NET_WARN_ON_ONCE(1);
+		skb->mac_len = 0;
in skb_reset_mac_len(), it may degrade the RX path a bit.

Catching the bug in xfrm4_remove_tunnel_encap() and
xfrm6_remove_tunnel_encap() (the original patch) is nice because it
won't affect the systems which are not using the xfrm.

Kind Regards,
En-Wei.
Hi,

Can I kindly ask if there is any progress?

Thanks,
Regards.
On Fri, Oct 18, 2024 at 3:22 PM En-Wei WU <en-wei.wu@canonical.com> wrote:
>
> > Seems like the __netif_receive_skb_core() and dev_gro_receive() are
> > the places where it calls skb_reset_mac_len() with skb->mac_header =
> > ~0U.
> I believe it's the root cause.
>
> My concern is that if we put something like:
> +	if (!skb_mac_header_was_set(skb)) {
> +		DEBUG_NET_WARN_ON_ONCE(1);
> +		skb->mac_len = 0;
> in skb_reset_mac_len(), it may degrade the RX path a bit.

I do not have such concerns. Note this is temporary until we fix the root cause.

>
> Catching the bug in xfrm4_remove_tunnel_encap() and
> xfrm6_remove_tunnel_encap() (the original patch) is nice because it
> won't affect the systems which are not using the xfrm.
>

Somehow xfrm is feeding to gro_cells_receive() packets without the mac
header being set, this is the bug that needs to be fixed.

GRO needs skb_mac_header() to return the correct pointer.

For normal GRO, it is set either in :

1) napi_gro_frags : napi_frags_skb() calls skb_reset_mac_header(skb);

2) napi_gro_receive() : callers are supposed to call eth_type_trans()
before calling napi_gro_receive(). eth_type_trans() calls
skb_reset_mac_header() as expected.

xfrm calls skb_mac_header_rebuild(), but it might be a NOP if MAC
header was never set.
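For readers following along, here is a rough userspace sketch of why skb->mac_len cannot be trusted once skb_reset_mac_len() has run on an skb whose MAC header was never set. It is not kernel code: struct skb_model and the model_* helpers are hypothetical names, and the 16-bit field widths are an assumption chosen to match the (typeof(skb->mac_header))~0U sentinel in the debug patch. The unchecked variant mirrors the subtraction removed by that patch, the checked variant mirrors what it adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model, NOT the kernel's struct sk_buff.
 * Assumption: header offsets and mac_len are 16-bit, relative to head. */
struct skb_model {
	uint16_t mac_header;     /* 0xFFFF stands for "never set" */
	uint16_t network_header;
	uint16_t mac_len;
};

static bool model_mac_header_was_set(const struct skb_model *skb)
{
	return skb->mac_header != (uint16_t)~0U;
}

/* Mirrors the old skb_reset_mac_len(): blind subtraction. */
static void model_reset_mac_len_unchecked(struct skb_model *skb)
{
	skb->mac_len = (uint16_t)(skb->network_header - skb->mac_header);
}

/* Mirrors the debug patch: force mac_len to 0 when no MAC header exists. */
static void model_reset_mac_len_checked(struct skb_model *skb)
{
	if (!model_mac_header_was_set(skb))
		skb->mac_len = 0;
	else
		skb->mac_len = (uint16_t)(skb->network_header - skb->mac_header);
}

/* Small self-check: an unset MAC header still yields a non-zero mac_len
 * with the unchecked helper, because the subtraction wraps modulo 2^16. */
static void model_demo(void)
{
	struct skb_model skb = {
		.mac_header = (uint16_t)~0U,
		.network_header = 64,
		.mac_len = 0,
	};

	model_reset_mac_len_unchecked(&skb);
	assert(skb.mac_len != 0);

	model_reset_mac_len_checked(&skb);
	assert(skb.mac_len == 0);
}
```

Calling model_demo() from a test harness exercises both paths. In this model the bogus value is always network_header + 1, which is why a bare "if (skb->mac_len)" test passes even though there is no MAC header.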
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 749e7eea99e4..eef0145c73a7 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -251,7 +251,7 @@ static int xfrm4_remove_tunnel_encap(struct xfrm_state *x, struct sk_buff *skb)
 
 	skb_reset_network_header(skb);
 	skb_mac_header_rebuild(skb);
-	if (skb->mac_len)
+	if (skb->mac_len && skb_mac_header_was_set(skb))
 		eth_hdr(skb)->h_proto = skb->protocol;
 
 	err = 0;
@@ -288,7 +288,7 @@ static int xfrm6_remove_tunnel_encap(struct xfrm_state *x, struct sk_buff *skb)
 
 	skb_reset_network_header(skb);
 	skb_mac_header_rebuild(skb);
-	if (skb->mac_len)
+	if (skb->mac_len && skb_mac_header_was_set(skb))
 		eth_hdr(skb)->h_proto = skb->protocol;
 
 	err = 0;
When we use Intel WWAN with xfrm, our system always hangs after browsing websites for a few seconds. The error message shows that it is a slab-out-of-bounds error: [ 67.162014] BUG: KASAN: slab-out-of-bounds in xfrm_input+0x426e/0x6740 [ 67.162030] Write of size 2 at addr ffff888156cb814b by task ksoftirqd/2/26 [ 67.162043] CPU: 2 UID: 0 PID: 26 Comm: ksoftirqd/2 Not tainted 6.11.0-rc6-c763c4339688+ #2 [ 67.162053] Hardware name: Dell Inc. Latitude 5340/0SG010, BIOS 1.15.0 07/15/2024 [ 67.162058] Call Trace: [ 67.162062] <TASK> [ 67.162068] dump_stack_lvl+0x76/0xa0 [ 67.162079] print_report+0xce/0x5f0 [ 67.162088] ? xfrm_input+0x426e/0x6740 [ 67.162096] ? kasan_complete_mode_report_info+0x26/0x200 [ 67.162105] ? xfrm_input+0x426e/0x6740 [ 67.162112] kasan_report+0xbe/0x110 [ 67.162119] ? xfrm_input+0x426e/0x6740 [ 67.162129] __asan_report_store_n_noabort+0x12/0x30 [ 67.162138] xfrm_input+0x426e/0x6740 [ 67.162149] ? __pfx_xfrm_input+0x10/0x10 [ 67.162160] ? __kasan_check_read+0x11/0x20 [ 67.162168] ? __call_rcu_common+0x3e7/0x15b0 [ 67.162178] xfrm4_rcv_encap+0x214/0x470 [ 67.162186] ? __xfrm4_udp_encap_rcv.part.0+0x3cd/0x560 [ 67.162195] xfrm4_udp_encap_rcv+0xdd/0xf0 [ 67.162203] udp_queue_rcv_one_skb+0x880/0x12f0 [ 67.162212] udp_queue_rcv_skb+0x139/0xa90 [ 67.162221] udp_unicast_rcv_skb+0x116/0x350 [ 67.162229] __udp4_lib_rcv+0x213b/0x3410 [ 67.162237] ? ldsem_down_write+0x211/0x4ed [ 67.162246] ? __pfx___udp4_lib_rcv+0x10/0x10 [ 67.162254] ? __pfx_raw_local_deliver+0x10/0x10 [ 67.162262] ? __pfx_cache_tag_flush_range_np+0x10/0x10 [ 67.162273] udp_rcv+0x86/0xb0 [ 67.162280] ip_protocol_deliver_rcu+0x152/0x380 [ 67.162289] ip_local_deliver_finish+0x282/0x370 [ 67.162296] ip_local_deliver+0x1a8/0x380 [ 67.162303] ? __pfx_ip_local_deliver+0x10/0x10 [ 67.162310] ? ip_rcv_finish_core.constprop.0+0x481/0x1ce0 [ 67.162317] ? ip_rcv_core+0x5df/0xd60 [ 67.162325] ip_rcv+0x2fc/0x380 [ 67.162332] ? __pfx_ip_rcv+0x10/0x10 [ 67.162338] ? 
__pfx_dma_map_page_attrs+0x10/0x10 [ 67.162346] ? __kasan_check_write+0x14/0x30 [ 67.162354] ? __build_skb_around+0x23a/0x350 [ 67.162363] ? __pfx_ip_rcv+0x10/0x10 [ 67.162369] __netif_receive_skb_one_core+0x173/0x1d0 [ 67.162377] ? __pfx___netif_receive_skb_one_core+0x10/0x10 [ 67.162386] ? __kasan_check_write+0x14/0x30 [ 67.162394] ? _raw_spin_lock_irq+0x8b/0x100 [ 67.162402] __netif_receive_skb+0x21/0x160 [ 67.162409] process_backlog+0x1c0/0x590 [ 67.162417] __napi_poll+0xab/0x550 [ 67.162425] net_rx_action+0x53e/0xd10 [ 67.162434] ? __pfx_net_rx_action+0x10/0x10 [ 67.162443] ? __pfx_wake_up_var+0x10/0x10 [ 67.162453] ? tasklet_action_common.constprop.0+0x22c/0x670 [ 67.162463] handle_softirqs+0x18f/0x5d0 [ 67.162472] ? __pfx_run_ksoftirqd+0x10/0x10 [ 67.162480] run_ksoftirqd+0x3c/0x60 [ 67.162487] smpboot_thread_fn+0x2f3/0x700 [ 67.162497] kthread+0x2b5/0x390 [ 67.162505] ? __pfx_smpboot_thread_fn+0x10/0x10 [ 67.162512] ? __pfx_kthread+0x10/0x10 [ 67.162519] ret_from_fork+0x43/0x90 [ 67.162527] ? 
__pfx_kthread+0x10/0x10 [ 67.162534] ret_from_fork_asm+0x1a/0x30 [ 67.162544] </TASK> [ 67.162551] The buggy address belongs to the object at ffff888156cb8000 which belongs to the cache kmalloc-rnd-09-8k of size 8192 [ 67.162557] The buggy address is located 331 bytes inside of allocated 8192-byte region [ffff888156cb8000, ffff888156cba000) [ 67.162566] The buggy address belongs to the physical page: [ 67.162570] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x156cb8 [ 67.162578] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 67.162583] flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff) [ 67.162591] page_type: 0xfdffffff(slab) [ 67.162599] raw: 0017ffffc0000040 ffff888100056780 dead000000000122 0000000000000000 [ 67.162605] raw: 0000000000000000 0000000080020002 00000001fdffffff 0000000000000000 [ 67.162611] head: 0017ffffc0000040 ffff888100056780 dead000000000122 0000000000000000 [ 67.162616] head: 0000000000000000 0000000080020002 00000001fdffffff 0000000000000000 [ 67.162621] head: 0017ffffc0000003 ffffea00055b2e01 ffffffffffffffff 0000000000000000 [ 67.162626] head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000 [ 67.162630] page dumped because: kasan: bad access detected [ 67.162636] Memory state around the buggy address: [ 67.162640] ffff888156cb8000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 67.162645] ffff888156cb8080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 67.162650] >ffff888156cb8100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 67.162653] ^ [ 67.162658] ffff888156cb8180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 67.162663] ffff888156cb8200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc The reason is that the eth_hdr(skb) inside if statement evaluated to an unexpected address with skb->mac_header = ~0U (indicating there is no MAC header). 
The unreliability of skb->mac_len causes the if statement to become true even if there is no MAC header inside the skb data buffer.

Checking both skb->mac_len and skb_mac_header_was_set(skb) fixes this issue.

Fixes: 87cdf3148b11 ("xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto")
Signed-off-by: En-Wei Wu <en-wei.wu@canonical.com>
---
Changes in v2:
* Change the title from "xfrm: avoid using skb->mac_len to decide if mac header is shown"
* Keep the skb->mac_len check
* Apply fix on ipv6 path too
---
 net/xfrm/xfrm_input.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
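As an illustration of the commit message above, the sketch below models where the write to eth_hdr(skb)->h_proto would land, as an offset from skb->head, and how the v2 condition differs from the old one. This is a hedged userspace model only: struct model_ethhdr and the model_* functions are hypothetical names, the 16-bit mac_header width is an assumption, and the real code is the two-line change to xfrm4_remove_tunnel_encap() and xfrm6_remove_tunnel_encap().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ETH_ALEN 6

/* Hypothetical stand-in for the Ethernet header layout:
 * destination address, source address, then the 2-byte protocol field. */
struct model_ethhdr {
	uint8_t  h_dest[MODEL_ETH_ALEN];
	uint8_t  h_source[MODEL_ETH_ALEN];
	uint16_t h_proto;
};

/* Offset from head at which the h_proto store would happen, assuming the
 * MAC header pointer is computed as head + mac_header. */
static size_t model_h_proto_offset(uint16_t mac_header)
{
	return (size_t)mac_header + offsetof(struct model_ethhdr, h_proto);
}

/* Old condition: only mac_len is consulted. */
static bool model_guard_old(uint16_t mac_len)
{
	return mac_len != 0;
}

/* v2 condition: mac_len must be non-zero AND the header must be set. */
static bool model_guard_v2(uint16_t mac_len, uint16_t mac_header)
{
	return mac_len != 0 && mac_header != (uint16_t)~0U;
}

/* Self-check: a stale non-zero mac_len with an unset header passes the
 * old guard but is rejected by the v2 guard. */
static void model_guard_demo(void)
{
	const uint16_t unset = (uint16_t)~0U;
	const uint16_t stale_mac_len = 65;  /* arbitrary non-zero value */

	assert(model_guard_old(stale_mac_len));
	assert(!model_guard_v2(stale_mac_len, unset));
	assert(model_h_proto_offset(unset) > UINT16_MAX);
}
```

With the sentinel value the computed offset is larger than any offset a 16-bit header field can legitimately hold, which is consistent with the 2-byte out-of-bounds write reported by KASAN, although the model says nothing about which slab object the address ends up in.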