Message ID | 20240313133402.9027-1-shiming.cheng@mediatek.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] udp: fix segmentation crash for untrusted source packet | expand |
On Wed, 2024-03-13 at 21:34 +0800, Shiming Cheng wrote: > Kernel exception is reported when making udp frag list segmentation. > Backtrace is as below: > at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:229 > at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:262 > features=features@entry=19, is_ipv6=false) > at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:289 > features=19) > at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:399 > features=19) > at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/af_inet.c:1418 > skb@entry=0x0, features=19, features@entry=0) > at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:53 > tx_path=<optimized out>) > at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:124 A full backtrace would help better understanding the issue. > This packet's frag list is null while gso_type is not 0. Then it is treated > as a GRO-ed packet and sent to segment frag list. Function call path is > udp_rcv_segment => config features value > __udpv4_gso_segment => skb_gso_ok returns false. Here it should be > true. Why? If I read correctly the above, this is GSO packet landing in an UDP socket with no UDP_GRO sockopt. The packet is expected to be segmented again. > Failed reason is features doesn't match > gso_type. > __udp_gso_segment_list > skb_segment_list => packet is linear with skb->next = NULL > __udpv4_gso_segment_list_csum => use skb->next directly and > crash happens > > In rx-gro-list GRO-ed packet is set gso type as > NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST in napi_gro_complete. In gso > flow the features should also set them to match with gso_type. Or else it > will always return false in skb_gso_ok. Then it can't discover the > untrusted source packet and result crash in following function. What is the 'untrusted source' here? I read the above as the packet aggregation happened in the GRO engine??? Could you please give a complete description of the relevant scenario? > Fixes: f2696099c6c6 ("udp: Avoid post-GRO UDP checksum recalculation") > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > --- > include/net/udp.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/include/net/udp.h b/include/net/udp.h > index 488a6d2babcc..c87baa23b9da 100644 > --- a/include/net/udp.h > +++ b/include/net/udp.h > @@ -464,7 +464,7 @@ void udpv6_encap_enable(void); > static inline struct sk_buff *udp_rcv_segment(struct sock *sk, > struct sk_buff *skb, bool ipv4) > { > - netdev_features_t features = NETIF_F_SG; > + netdev_features_t features = NETIF_F_SG | NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST; This looks wrong: real UDP_L4 GSO packets will not segmented anymore and should be dropped (?!?)
On Wed, 2024-03-13 at 16:41 +0100, Paolo Abeni wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > On Wed, 2024-03-13 at 21:34 +0800, Shiming Cheng wrote: > > Kernel exception is reported when making udp frag list > segmentation. > > Backtrace is as below: > > at out/android15-6.6/kernel-6.6/kernel- > 6.6/net/ipv4/udp_offload.c:229 > > at out/android15-6.6/kernel-6.6/kernel- > 6.6/net/ipv4/udp_offload.c:262 > > features=features@entry=19, is_ipv6=false) > > at out/android15-6.6/kernel-6.6/kernel- > 6.6/net/ipv4/udp_offload.c:289 > > features=19) > > at out/android15-6.6/kernel-6.6/kernel- > 6.6/net/ipv4/udp_offload.c:399 > > features=19) > > at out/android15-6.6/kernel-6.6/kernel- > 6.6/net/ipv4/af_inet.c:1418 > > skb@entry=0x0, features=19, features@entry=0) > > at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:53 > > tx_path=<optimized out>) > > at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:124 > > A full backtrace would help better understanding the issue. Below is full backtrace: [ 1100.812205][ C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G W OE 6.6.17-android15-0-g380371ea9bf1 #1 [ 1100.812211][ C3] Hardware name: MT6991(ENG) (DT) [ 1100.812215][ C3] Call trace: [ 1100.812218][ C3] dump_backtrace+0xec/0x138 [ 1100.812222][ C3] show_stack+0x18/0x24 [ 1100.812226][ C3] dump_stack_lvl+0x50/0x6c [ 1100.812232][ C3] dump_stack+0x18/0x24 [ 1100.812237][ C3] mrdump_common_die+0x24c/0x388 [mrdump] [ 1100.812259][ C3] ipanic_die+0x20/0x34 [mrdump] [ 1100.812269][ C3] notifier_call_chain+0x90/0x174 [ 1100.812275][ C3] notify_die+0x50/0x8c [ 1100.812279][ C3] die+0x94/0x308 [ 1100.812283][ C3] __do_kernel_fault+0x240/0x26c [ 1100.812288][ C3] do_page_fault+0xa0/0x48c [ 1100.812293][ C3] do_translation_fault+0x38/0x54 [ 1100.812297][ C3] do_mem_abort+0x58/0x104 [ 1100.812302][ C3] el1_abort+0x3c/0x5c [ 1100.812307][ C3] el1h_64_sync_handler+0x54/0x90 [ 1100.812313][ C3] el1h_64_sync+0x68/0x6c [ 1100.812317][ C3] __udp_gso_segment+0x298/0x4d4 [ 1100.812322][ C3] udp4_ufo_fragment+0x130/0x174 [ 1100.812326][ C3] inet_gso_segment+0x164/0x330 [ 1100.812330][ C3] skb_mac_gso_segment+0xc4/0x13c [ 1100.812335][ C3] __skb_gso_segment+0xc4/0x120 [ 1100.812339][ C3] udp_rcv_segment+0x50/0x134 [ 1100.812344][ C3] udp_queue_rcv_skb+0x74/0x114 [ 1100.812348][ C3] udp_unicast_rcv_skb+0x94/0xac [ 1100.812353][ C3] __udp4_lib_rcv+0x3e0/0x818 [ 1100.812358][ C3] udp_rcv+0x20/0x30 [ 1100.812362][ C3] ip_protocol_deliver_rcu+0x194/0x368 [ 1100.812368][ C3] ip_local_deliver+0xe4/0x184 [ 1100.812373][ C3] ip_rcv+0x90/0x118 [ 1100.812378][ C3] __netif_receive_skb+0x74/0x124 [ 1100.812383][ C3] process_backlog+0xd8/0x18c [ 1100.812388][ C3] __napi_poll+0x5c/0x1fc [ 1100.812392][ C3] net_rx_action+0x150/0x334 [ 1100.812397][ C3] __do_softirq+0x120/0x3f4 [ 1100.812401][ C3] ____do_softirq+0x10/0x20 [ 1100.812405][ C3] call_on_irq_stack+0x3c/0x74 [ 1100.812410][ C3] do_softirq_own_stack+0x1c/0x2c [ 1100.812414][ C3] __irq_exit_rcu+0x5c/0xd4 [ 1100.812418][ C3] irq_exit_rcu+0x10/0x1c [ 1100.812422][ C3] el1_interrupt+0x38/0x58 [ 1100.812428][ C3] el1h_64_irq_handler+0x18/0x24 [ 1100.812434][ C3] el1h_64_irq+0x68/0x6c [ 1100.812437][ C3] arch_local_irq_enable+0x4/0x8 [ 1100.812443][ C3] cpuidle_enter+0x38/0x54 [ 1100.812449][ C3] do_idle+0x198/0x294 [ 1100.812454][ C3] cpu_startup_entry+0x34/0x3c [ 1100.812459][ C3] secondary_start_kernel+0x138/0x158 [ 1100.812465][ C3] __secondary_switched+0xc0/0xc4 > > This packet's frag list is null while gso_type is not 0. Then it is > treated > > as a GRO-ed packet and sent to segment frag list. Function call > path is > > udp_rcv_segment => config features value > > __udpv4_gso_segment => skb_gso_ok returns false. Here it > should be > > true. > > Why? If I read correctly the above, this is GSO packet landing in an > UDP socket with no UDP_GRO sockopt. The packet is expected to be > segmented again. > Yes, it is GSO packet, however the fragment list of this GSO packet becomes NULL. As the occurrence rate is very low, we really don’t know why and when it becomes to be NULL. It happens both in cellular and wlan network and seems an unknown kernel issue. To avoid crash the packet should skip to be segmented when fraglist is null. > >Failed reason is features doesn't > match > > gso_type. > > __udp_gso_segment_list > > skb_segment_list => packet is linear with skb->next = > NULL > > __udpv4_gso_segment_list_csum => use skb->next directly > and > > crash happens > > > > In rx-gro-list GRO-ed packet is set gso type as > > NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST in napi_gro_complete. In > gso > > flow the features should also set them to match with gso_type. Or > else it > > will always return false in skb_gso_ok. Then it can't discover the > > untrusted source packet and result crash in following function. > > What is the 'untrusted source' here? I read the above as the packet > aggregation happened in the GRO engine??? > > Could you please give a complete description of the relevant > scenario? > According to the backtrace info, we infer it is a rx-frag_list GRO packet. Before sending into the UDP socket with no UDP_GRO sockopt, it seems enter "skb_condense" to trim it and loose his frag list. However it still keeps gso_type and gso_size. Then it continues to do skb_segment_list. First crash happens in skb_segment_list. This patch resolves the crash and lets the packet becomes a skb without skb->next: https://lore.kernel.org/all/Y9gt5EUizK1UImEP@debian/ Then crash moves to __udp_gso_sement_list -> skb_segment_list(finish) -> __udpv4_gso_segment_list_csum, it uses skb->next without check then crash. What we want to do is to drop this abnormal packet. So we set features NETIF_F_GSO_UDP_L4 |NETIF_F_GSO_FRAGLIST to match fixes: f2696099c6c6 condation then drop it. Another solution is we can check the frag_list directly in skb_segment_lsit: diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 011d690..5b054f0 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4336,6 +4336,8 @@ int len_diff, err; skb_push(skb, -skb_network_offset(skb) + offset); + if (!list_skb) + goto err_linearize; /* Ensure the head is writeable before touching the shared info */ err = skb_unclone(skb, GFP_ATOMIC); > > Fixes: f2696099c6c6 ("udp: Avoid post-GRO UDP checksum > recalculation") > > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > > --- > > include/net/udp.h | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/include/net/udp.h b/include/net/udp.h > > index 488a6d2babcc..c87baa23b9da 100644 > > --- a/include/net/udp.h > > +++ b/include/net/udp.h > > @@ -464,7 +464,7 @@ void udpv6_encap_enable(void); > > static inline struct sk_buff *udp_rcv_segment(struct sock *sk, > > struct sk_buff *skb, bool ipv4) > > { > > -netdev_features_t features = NETIF_F_SG; > > +netdev_features_t features = NETIF_F_SG | NETIF_F_GSO_UDP_L4 | > NETIF_F_GSO_FRAGLIST; > > This looks wrong: real UDP_L4 GSO packets will not segmented anymore > and should be dropped (?!?) > >
Lena Wang (王娜) wrote: > On Wed, 2024-03-13 at 16:41 +0100, Paolo Abeni wrote: > > > > External email : Please do not click links or open attachments until > > you have verified the sender or the content. > > On Wed, 2024-03-13 at 21:34 +0800, Shiming Cheng wrote: > > > Kernel exception is reported when making udp frag list > > segmentation. > > > Backtrace is as below: > > > at out/android15-6.6/kernel-6.6/kernel- > > 6.6/net/ipv4/udp_offload.c:229 > > > at out/android15-6.6/kernel-6.6/kernel- > > 6.6/net/ipv4/udp_offload.c:262 > > > features=features@entry=19, is_ipv6=false) > > > at out/android15-6.6/kernel-6.6/kernel- > > 6.6/net/ipv4/udp_offload.c:289 > > > features=19) > > > at out/android15-6.6/kernel-6.6/kernel- > > 6.6/net/ipv4/udp_offload.c:399 > > > features=19) > > > at out/android15-6.6/kernel-6.6/kernel- > > 6.6/net/ipv4/af_inet.c:1418 > > > skb@entry=0x0, features=19, features@entry=0) > > > at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:53 > > > tx_path=<optimized out>) > > > at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:124 > > > > A full backtrace would help better understanding the issue. > > Below is full backtrace: > [ 1100.812205][ C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted: > G W OE 6.6.17-android15-0-g380371ea9bf1 #1 > [ 1100.812211][ C3] Hardware name: MT6991(ENG) (DT) > [ 1100.812215][ C3] Call trace: > [ 1100.812218][ C3] dump_backtrace+0xec/0x138 > [ 1100.812222][ C3] show_stack+0x18/0x24 > [ 1100.812226][ C3] dump_stack_lvl+0x50/0x6c > [ 1100.812232][ C3] dump_stack+0x18/0x24 > [ 1100.812237][ C3] mrdump_common_die+0x24c/0x388 [mrdump] > [ 1100.812259][ C3] ipanic_die+0x20/0x34 [mrdump] > [ 1100.812269][ C3] notifier_call_chain+0x90/0x174 > [ 1100.812275][ C3] notify_die+0x50/0x8c > [ 1100.812279][ C3] die+0x94/0x308 > [ 1100.812283][ C3] __do_kernel_fault+0x240/0x26c > [ 1100.812288][ C3] do_page_fault+0xa0/0x48c > [ 1100.812293][ C3] do_translation_fault+0x38/0x54 > [ 1100.812297][ C3] do_mem_abort+0x58/0x104 > [ 1100.812302][ C3] el1_abort+0x3c/0x5c > [ 1100.812307][ C3] el1h_64_sync_handler+0x54/0x90 > [ 1100.812313][ C3] el1h_64_sync+0x68/0x6c > [ 1100.812317][ C3] __udp_gso_segment+0x298/0x4d4 > [ 1100.812322][ C3] udp4_ufo_fragment+0x130/0x174 > [ 1100.812326][ C3] inet_gso_segment+0x164/0x330 > [ 1100.812330][ C3] skb_mac_gso_segment+0xc4/0x13c > [ 1100.812335][ C3] __skb_gso_segment+0xc4/0x120 > [ 1100.812339][ C3] udp_rcv_segment+0x50/0x134 > [ 1100.812344][ C3] udp_queue_rcv_skb+0x74/0x114 > [ 1100.812348][ C3] udp_unicast_rcv_skb+0x94/0xac > [ 1100.812353][ C3] __udp4_lib_rcv+0x3e0/0x818 > [ 1100.812358][ C3] udp_rcv+0x20/0x30 > [ 1100.812362][ C3] ip_protocol_deliver_rcu+0x194/0x368 > [ 1100.812368][ C3] ip_local_deliver+0xe4/0x184 > [ 1100.812373][ C3] ip_rcv+0x90/0x118 > [ 1100.812378][ C3] __netif_receive_skb+0x74/0x124 > [ 1100.812383][ C3] process_backlog+0xd8/0x18c > [ 1100.812388][ C3] __napi_poll+0x5c/0x1fc > [ 1100.812392][ C3] net_rx_action+0x150/0x334 > [ 1100.812397][ C3] __do_softirq+0x120/0x3f4 > [ 1100.812401][ C3] ____do_softirq+0x10/0x20 > [ 1100.812405][ C3] call_on_irq_stack+0x3c/0x74 > [ 1100.812410][ C3] do_softirq_own_stack+0x1c/0x2c > [ 1100.812414][ C3] __irq_exit_rcu+0x5c/0xd4 > [ 1100.812418][ C3] irq_exit_rcu+0x10/0x1c > [ 1100.812422][ C3] el1_interrupt+0x38/0x58 > [ 1100.812428][ C3] el1h_64_irq_handler+0x18/0x24 > [ 1100.812434][ C3] el1h_64_irq+0x68/0x6c > [ 1100.812437][ C3] arch_local_irq_enable+0x4/0x8 > [ 1100.812443][ C3] cpuidle_enter+0x38/0x54 > [ 1100.812449][ C3] do_idle+0x198/0x294 > [ 1100.812454][ C3] cpu_startup_entry+0x34/0x3c > [ 1100.812459][ C3] secondary_start_kernel+0x138/0x158 > [ 1100.812465][ C3] __secondary_switched+0xc0/0xc4 > > > > This packet's frag list is null while gso_type is not 0. Then it is > > treated > > > as a GRO-ed packet and sent to segment frag list. Function call > > path is > > > udp_rcv_segment => config features value > > > __udpv4_gso_segment => skb_gso_ok returns false. Here it > > should be > > > true. > > > > Why? If I read correctly the above, this is GSO packet landing in an > > UDP socket with no UDP_GRO sockopt. The packet is expected to be > > segmented again. > > > Yes, it is GSO packet, however the fragment list of this GSO packet > becomes NULL. As the occurrence rate is very low, we really don’t know > why and when it becomes to be NULL. It happens both in cellular and > wlan network and seems an unknown kernel issue. > > To avoid crash the packet should skip to be segmented when fraglist is > null. > > > >Failed reason is features doesn't > > match > > > gso_type. > > > __udp_gso_segment_list > > > skb_segment_list => packet is linear with skb->next = > > NULL > > > __udpv4_gso_segment_list_csum => use skb->next directly > > and > > > crash happens > > > > > > In rx-gro-list GRO-ed packet is set gso type as > > > NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST in napi_gro_complete. In > > gso > > > flow the features should also set them to match with gso_type. Or > > else it > > > will always return false in skb_gso_ok. Then it can't discover the > > > untrusted source packet and result crash in following function. > > > > What is the 'untrusted source' here? I read the above as the packet > > aggregation happened in the GRO engine??? > > > > Could you please give a complete description of the relevant > > scenario? > > > > According to the backtrace info, we infer it is a rx-frag_list GRO It would be helpful to see an skb_dump. But if this happens rarely in production, understood if that is not feasible. The packet arrives on process_backlog, so still not sure how it is produced. > packet. Before sending into the UDP socket with no UDP_GRO sockopt, it > seems enter "skb_condense" to trim it and loose his frag list. However > it still keeps gso_type and gso_size. Then it continues to do > skb_segment_list. > > First crash happens in skb_segment_list. > This patch resolves the crash and lets the packet becomes a skb without > skb->next: > https://lore.kernel.org/all/Y9gt5EUizK1UImEP@debian/ > Then crash moves to __udp_gso_sement_list -> skb_segment_list(finish) > -> __udpv4_gso_segment_list_csum, it uses skb->next without check then > crash. > > > What we want to do is to drop this abnormal packet. I think we want to deliver this packet if possible. Thanks for the added context. So this is assumed to be a GSO skb with SKB_GSO_FRAGLIST that somewhere lots its fraglist? That is the bug if true. You are suggesting that this happens in the skb_condense in __udp_enqueue_schedule_skb? If generated by GRO then on a device that has NETIF_F_GRO_FRAGLIST set. So one workaround (not fix) is to disable that. > So we set features > NETIF_F_GSO_UDP_L4 |NETIF_F_GSO_FRAGLIST to match fixes: f2696099c6c6 > condation then drop it.
On Sat, 2024-03-16 at 09:47 -0400, Willem de Bruijn wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > Lena Wang (王娜) wrote: > > On Wed, 2024-03-13 at 16:41 +0100, Paolo Abeni wrote: > > > > > > External email : Please do not click links or open attachments > until > > > you have verified the sender or the content. > > > On Wed, 2024-03-13 at 21:34 +0800, Shiming Cheng wrote: > > > > Kernel exception is reported when making udp frag list > > > segmentation. > > > > Backtrace is as below: > > > > at out/android15-6.6/kernel-6.6/kernel- > > > 6.6/net/ipv4/udp_offload.c:229 > > > > at out/android15-6.6/kernel-6.6/kernel- > > > 6.6/net/ipv4/udp_offload.c:262 > > > > features=features@entry=19, is_ipv6=false) > > > > at out/android15-6.6/kernel-6.6/kernel- > > > 6.6/net/ipv4/udp_offload.c:289 > > > > features=19) > > > > at out/android15-6.6/kernel-6.6/kernel- > > > 6.6/net/ipv4/udp_offload.c:399 > > > > features=19) > > > > at out/android15-6.6/kernel-6.6/kernel- > > > 6.6/net/ipv4/af_inet.c:1418 > > > > skb@entry=0x0, features=19, features@entry=0) > > > > at out/android15-6.6/kernel-6.6/kernel- > 6.6/net/core/gso.c:53 > > > > tx_path=<optimized out>) > > > > at out/android15-6.6/kernel-6.6/kernel- > 6.6/net/core/gso.c:124 > > > > > > A full backtrace would help better understanding the issue. > > > > Below is full backtrace: > > [ 1100.812205][ C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted: > > G W OE 6.6.17-android15-0-g380371ea9bf1 #1 > > [ 1100.812211][ C3] Hardware name: MT6991(ENG) (DT) > > [ 1100.812215][ C3] Call trace: > > [ 1100.812218][ C3] dump_backtrace+0xec/0x138 > > [ 1100.812222][ C3] show_stack+0x18/0x24 > > [ 1100.812226][ C3] dump_stack_lvl+0x50/0x6c > > [ 1100.812232][ C3] dump_stack+0x18/0x24 > > [ 1100.812237][ C3] mrdump_common_die+0x24c/0x388 [mrdump] > > [ 1100.812259][ C3] ipanic_die+0x20/0x34 [mrdump] > > [ 1100.812269][ C3] notifier_call_chain+0x90/0x174 > > [ 1100.812275][ C3] notify_die+0x50/0x8c > > [ 1100.812279][ C3] die+0x94/0x308 > > [ 1100.812283][ C3] __do_kernel_fault+0x240/0x26c > > [ 1100.812288][ C3] do_page_fault+0xa0/0x48c > > [ 1100.812293][ C3] do_translation_fault+0x38/0x54 > > [ 1100.812297][ C3] do_mem_abort+0x58/0x104 > > [ 1100.812302][ C3] el1_abort+0x3c/0x5c > > [ 1100.812307][ C3] el1h_64_sync_handler+0x54/0x90 > > [ 1100.812313][ C3] el1h_64_sync+0x68/0x6c > > [ 1100.812317][ C3] __udp_gso_segment+0x298/0x4d4 > > [ 1100.812322][ C3] udp4_ufo_fragment+0x130/0x174 > > [ 1100.812326][ C3] inet_gso_segment+0x164/0x330 > > [ 1100.812330][ C3] skb_mac_gso_segment+0xc4/0x13c > > [ 1100.812335][ C3] __skb_gso_segment+0xc4/0x120 > > [ 1100.812339][ C3] udp_rcv_segment+0x50/0x134 > > [ 1100.812344][ C3] udp_queue_rcv_skb+0x74/0x114 > > [ 1100.812348][ C3] udp_unicast_rcv_skb+0x94/0xac > > [ 1100.812353][ C3] __udp4_lib_rcv+0x3e0/0x818 > > [ 1100.812358][ C3] udp_rcv+0x20/0x30 > > [ 1100.812362][ C3] ip_protocol_deliver_rcu+0x194/0x368 > > [ 1100.812368][ C3] ip_local_deliver+0xe4/0x184 > > [ 1100.812373][ C3] ip_rcv+0x90/0x118 > > [ 1100.812378][ C3] __netif_receive_skb+0x74/0x124 > > [ 1100.812383][ C3] process_backlog+0xd8/0x18c > > [ 1100.812388][ C3] __napi_poll+0x5c/0x1fc > > [ 1100.812392][ C3] net_rx_action+0x150/0x334 > > [ 1100.812397][ C3] __do_softirq+0x120/0x3f4 > > [ 1100.812401][ C3] ____do_softirq+0x10/0x20 > > [ 1100.812405][ C3] call_on_irq_stack+0x3c/0x74 > > [ 1100.812410][ C3] do_softirq_own_stack+0x1c/0x2c > > [ 1100.812414][ C3] __irq_exit_rcu+0x5c/0xd4 > > [ 1100.812418][ C3] irq_exit_rcu+0x10/0x1c > > [ 1100.812422][ C3] el1_interrupt+0x38/0x58 > > [ 1100.812428][ C3] el1h_64_irq_handler+0x18/0x24 > > [ 1100.812434][ C3] el1h_64_irq+0x68/0x6c > > [ 1100.812437][ C3] arch_local_irq_enable+0x4/0x8 > > [ 1100.812443][ C3] cpuidle_enter+0x38/0x54 > > [ 1100.812449][ C3] do_idle+0x198/0x294 > > [ 1100.812454][ C3] cpu_startup_entry+0x34/0x3c > > [ 1100.812459][ C3] secondary_start_kernel+0x138/0x158 > > [ 1100.812465][ C3] __secondary_switched+0xc0/0xc4 > > > > > > This packet's frag list is null while gso_type is not 0. Then > it is > > > treated > > > > as a GRO-ed packet and sent to segment frag list. Function call > > > path is > > > > udp_rcv_segment => config features value > > > > __udpv4_gso_segment => skb_gso_ok returns false. Here it > > > should be > > > > true. > > > > > > Why? If I read correctly the above, this is GSO packet landing in > an > > > UDP socket with no UDP_GRO sockopt. The packet is expected to be > > > segmented again. > > > > > Yes, it is GSO packet, however the fragment list of this GSO packet > > becomes NULL. As the occurrence rate is very low, we really don’t > know > > why and when it becomes to be NULL. It happens both in cellular and > > wlan network and seems an unknown kernel issue. > > > > To avoid crash the packet should skip to be segmented when fraglist > is > > null. > > > > > >Failed reason is features doesn't > > > match > > > > gso_type. > > > > __udp_gso_segment_list > > > > skb_segment_list => packet is linear with skb->next > = > > > NULL > > > > __udpv4_gso_segment_list_csum => use skb->next > directly > > > and > > > > crash happens > > > > > > > > In rx-gro-list GRO-ed packet is set gso type as > > > > NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST in napi_gro_complete. > In > > > gso > > > > flow the features should also set them to match with gso_type. > Or > > > else it > > > > will always return false in skb_gso_ok. Then it can't discover > the > > > > untrusted source packet and result crash in following function. > > > > > > What is the 'untrusted source' here? I read the above as the > packet > > > aggregation happened in the GRO engine??? > > > > > > Could you please give a complete description of the relevant > > > scenario? > > > > > > > According to the backtrace info, we infer it is a rx-frag_list GRO > > It would be helpful to see an skb_dump. But if this happens rarely in > production, understood if that is not feasible. > > The packet arrives on process_backlog, so still not sure how it is > produced. > Yes, it rarely happens. It is very hard to debug it and not sure its produced path. > > packet. Before sending into the UDP socket with no UDP_GRO sockopt, > it > > seems enter "skb_condense" to trim it and loose his frag list. > However > > it still keeps gso_type and gso_size. Then it continues to do > > skb_segment_list. > > > > First crash happens in skb_segment_list. > > This patch resolves the crash and lets the packet becomes a skb > without > > skb->next: > > https://lore.kernel.org/all/Y9gt5EUizK1UImEP@debian/ > > Then crash moves to __udp_gso_sement_list -> > skb_segment_list(finish) > > -> __udpv4_gso_segment_list_csum, it uses skb->next without check > then > > crash. > > > > > > What we want to do is to drop this abnormal packet. > > I think we want to deliver this packet if possible. > > Thanks for the added context. So this is assumed to be a GSO skb with > SKB_GSO_FRAGLIST that somewhere lots its fraglist? That is the bug > if true. > > You are suggesting that this happens in the skb_condense in > __udp_enqueue_schedule_skb? > We try to add a skb_condense directly before skb_segment_list then get a similar KE backtrace and skb dump value same with this issue skb dump. However we don't know which condition trigger the flow runinto skb_condense. > If generated by GRO then on a device that has NETIF_F_GRO_FRAGLIST > set. > So one workaround (not fix) is to disable that. > As we met other issue previously with GRO in skb_segment(disalbe NETIF_F_GRO_FRAGLIST flow), it is still not safe to disable GRO fraglist. We hope current patch could be applied to drop the invalid packet. > > So we set features > > NETIF_F_GSO_UDP_L4 |NETIF_F_GSO_FRAGLIST to match fixes: > f2696099c6c6 > > condation then drop it.
diff --git a/include/net/udp.h b/include/net/udp.h index 488a6d2babcc..c87baa23b9da 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -464,7 +464,7 @@ void udpv6_encap_enable(void); static inline struct sk_buff *udp_rcv_segment(struct sock *sk, struct sk_buff *skb, bool ipv4) { - netdev_features_t features = NETIF_F_SG; + netdev_features_t features = NETIF_F_SG | NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST; struct sk_buff *segs; /* Avoid csum recalculation by skb_segment unless userspace explicitly