diff mbox series

[net] udp: fix segmentation crash for untrusted source packet

Message ID 20240313133402.9027-1-shiming.cheng@mediatek.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net] udp: fix segmentation crash for untrusted source packet | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1054 this patch: 1054
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 4 maintainers not CCed: linux-arm-kernel@lists.infradead.org angelogioacchino.delregno@collabora.com linux-mediatek@lists.infradead.org matthias.bgg@gmail.com
netdev/build_clang success Errors and warnings before: 957 this patch: 957
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1139 this patch: 1139
netdev/checkpatch warning WARNING: line length of 92 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest fail net-next-2024-03-13--18-00 (tests: 906)

Commit Message

Shiming Cheng March 13, 2024, 1:34 p.m. UTC
Kernel exception is reported when making udp frag list segmentation.
Backtrace is as below:
    at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:229
    at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:262
features=features@entry=19, is_ipv6=false)
    at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:289
features=19)
    at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:399
features=19)
    at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/af_inet.c:1418
skb@entry=0x0, features=19, features@entry=0)
    at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:53
tx_path=<optimized out>)
    at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:124

This packet's frag list is null while gso_type is not 0. Then it is treated
as a GRO-ed packet and sent to segment frag list. Function call path is
udp_rcv_segment => config features value
    __udpv4_gso_segment  => skb_gso_ok returns false. Here it should be
                            true. Failed reason is features doesn't match
                            gso_type.
        __udp_gso_segment_list
            skb_segment_list => packet is linear with skb->next = NULL
            __udpv4_gso_segment_list_csum => use skb->next directly and
                                             crash happens

In rx-gro-list GRO-ed packet is set gso type as
NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST in napi_gro_complete. In gso
flow the features should also set them to match with gso_type. Or else it
will always return false in skb_gso_ok. Then it can't discover the
untrusted source packet and result crash in following function.

Fixes: f2696099c6c6 ("udp: Avoid post-GRO UDP checksum recalculation")
Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
Signed-off-by: Lena Wang <lena.wang@mediatek.com>
---
 include/net/udp.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Paolo Abeni March 13, 2024, 3:41 p.m. UTC | #1
On Wed, 2024-03-13 at 21:34 +0800, Shiming Cheng wrote:
> Kernel exception is reported when making udp frag list segmentation.
> Backtrace is as below:
>     at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:229
>     at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:262
> features=features@entry=19, is_ipv6=false)
>     at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:289
> features=19)
>     at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:399
> features=19)
>     at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/af_inet.c:1418
> skb@entry=0x0, features=19, features@entry=0)
>     at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:53
> tx_path=<optimized out>)
>     at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:124

A full backtrace would help better understanding the issue.

> This packet's frag list is null while gso_type is not 0. Then it is treated
> as a GRO-ed packet and sent to segment frag list. Function call path is
> udp_rcv_segment => config features value
>     __udpv4_gso_segment  => skb_gso_ok returns false. Here it should be
>                             true. 

Why? If I read correctly the above, this is GSO packet landing in an
UDP socket with no UDP_GRO sockopt. The packet is expected to be
segmented again.

>				Failed reason is features doesn't
match
>                             gso_type.
>         __udp_gso_segment_list
>             skb_segment_list => packet is linear with skb->next = NULL
>             __udpv4_gso_segment_list_csum => use skb->next directly and
>                                              crash happens
> 
> In rx-gro-list GRO-ed packet is set gso type as
> NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST in napi_gro_complete. In gso
> flow the features should also set them to match with gso_type. Or else it
> will always return false in skb_gso_ok. Then it can't discover the
> untrusted source packet and result crash in following function.

What is the 'untrusted source' here? I read the above as the packet
aggregation happened in the GRO engine???

Could you please give a complete description of the relevant scenario?

> Fixes: f2696099c6c6 ("udp: Avoid post-GRO UDP checksum recalculation")
> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> ---
>  include/net/udp.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/net/udp.h b/include/net/udp.h
> index 488a6d2babcc..c87baa23b9da 100644
> --- a/include/net/udp.h
> +++ b/include/net/udp.h
> @@ -464,7 +464,7 @@ void udpv6_encap_enable(void);
>  static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
>  					      struct sk_buff *skb, bool ipv4)
>  {
> -	netdev_features_t features = NETIF_F_SG;
> +	netdev_features_t features = NETIF_F_SG | NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST;

This looks wrong: real UDP_L4 GSO packets will not segmented anymore
and should be dropped (?!?)
Lena Wang (王娜) March 15, 2024, 8:12 a.m. UTC | #2
On Wed, 2024-03-13 at 16:41 +0100, Paolo Abeni wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On Wed, 2024-03-13 at 21:34 +0800, Shiming Cheng wrote:
> > Kernel exception is reported when making udp frag list
> segmentation.
> > Backtrace is as below:
> >     at out/android15-6.6/kernel-6.6/kernel-
> 6.6/net/ipv4/udp_offload.c:229
> >     at out/android15-6.6/kernel-6.6/kernel-
> 6.6/net/ipv4/udp_offload.c:262
> > features=features@entry=19, is_ipv6=false)
> >     at out/android15-6.6/kernel-6.6/kernel-
> 6.6/net/ipv4/udp_offload.c:289
> > features=19)
> >     at out/android15-6.6/kernel-6.6/kernel-
> 6.6/net/ipv4/udp_offload.c:399
> > features=19)
> >     at out/android15-6.6/kernel-6.6/kernel-
> 6.6/net/ipv4/af_inet.c:1418
> > skb@entry=0x0, features=19, features@entry=0)
> >     at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:53
> > tx_path=<optimized out>)
> >     at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:124
> 
> A full backtrace would help better understanding the issue.

Below is full backtrace:
 [ 1100.812205][    C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted:
G        W  OE      6.6.17-android15-0-g380371ea9bf1 #1
 [ 1100.812211][    C3] Hardware name: MT6991(ENG) (DT)
 [ 1100.812215][    C3] Call trace:
 [ 1100.812218][    C3]  dump_backtrace+0xec/0x138
 [ 1100.812222][    C3]  show_stack+0x18/0x24
 [ 1100.812226][    C3]  dump_stack_lvl+0x50/0x6c
 [ 1100.812232][    C3]  dump_stack+0x18/0x24
 [ 1100.812237][    C3]  mrdump_common_die+0x24c/0x388 [mrdump]
 [ 1100.812259][    C3]  ipanic_die+0x20/0x34 [mrdump]
 [ 1100.812269][    C3]  notifier_call_chain+0x90/0x174
 [ 1100.812275][    C3]  notify_die+0x50/0x8c
 [ 1100.812279][    C3]  die+0x94/0x308
 [ 1100.812283][    C3]  __do_kernel_fault+0x240/0x26c
 [ 1100.812288][    C3]  do_page_fault+0xa0/0x48c
 [ 1100.812293][    C3]  do_translation_fault+0x38/0x54
 [ 1100.812297][    C3]  do_mem_abort+0x58/0x104
 [ 1100.812302][    C3]  el1_abort+0x3c/0x5c
 [ 1100.812307][    C3]  el1h_64_sync_handler+0x54/0x90
 [ 1100.812313][    C3]  el1h_64_sync+0x68/0x6c
 [ 1100.812317][    C3]  __udp_gso_segment+0x298/0x4d4
 [ 1100.812322][    C3]  udp4_ufo_fragment+0x130/0x174
 [ 1100.812326][    C3]  inet_gso_segment+0x164/0x330
 [ 1100.812330][    C3]  skb_mac_gso_segment+0xc4/0x13c
 [ 1100.812335][    C3]  __skb_gso_segment+0xc4/0x120
 [ 1100.812339][    C3]  udp_rcv_segment+0x50/0x134
 [ 1100.812344][    C3]  udp_queue_rcv_skb+0x74/0x114
 [ 1100.812348][    C3]  udp_unicast_rcv_skb+0x94/0xac
 [ 1100.812353][    C3]  __udp4_lib_rcv+0x3e0/0x818
 [ 1100.812358][    C3]  udp_rcv+0x20/0x30
 [ 1100.812362][    C3]  ip_protocol_deliver_rcu+0x194/0x368
 [ 1100.812368][    C3]  ip_local_deliver+0xe4/0x184
 [ 1100.812373][    C3]  ip_rcv+0x90/0x118
 [ 1100.812378][    C3]  __netif_receive_skb+0x74/0x124
 [ 1100.812383][    C3]  process_backlog+0xd8/0x18c
 [ 1100.812388][    C3]  __napi_poll+0x5c/0x1fc
 [ 1100.812392][    C3]  net_rx_action+0x150/0x334
 [ 1100.812397][    C3]  __do_softirq+0x120/0x3f4
 [ 1100.812401][    C3]  ____do_softirq+0x10/0x20
 [ 1100.812405][    C3]  call_on_irq_stack+0x3c/0x74
 [ 1100.812410][    C3]  do_softirq_own_stack+0x1c/0x2c
 [ 1100.812414][    C3]  __irq_exit_rcu+0x5c/0xd4
 [ 1100.812418][    C3]  irq_exit_rcu+0x10/0x1c
 [ 1100.812422][    C3]  el1_interrupt+0x38/0x58
 [ 1100.812428][    C3]  el1h_64_irq_handler+0x18/0x24
 [ 1100.812434][    C3]  el1h_64_irq+0x68/0x6c
 [ 1100.812437][    C3]  arch_local_irq_enable+0x4/0x8
 [ 1100.812443][    C3]  cpuidle_enter+0x38/0x54
 [ 1100.812449][    C3]  do_idle+0x198/0x294
 [ 1100.812454][    C3]  cpu_startup_entry+0x34/0x3c
 [ 1100.812459][    C3]  secondary_start_kernel+0x138/0x158
 [ 1100.812465][    C3]  __secondary_switched+0xc0/0xc4

> > This packet's frag list is null while gso_type is not 0. Then it is
> treated
> > as a GRO-ed packet and sent to segment frag list. Function call
> path is
> > udp_rcv_segment => config features value
> >     __udpv4_gso_segment  => skb_gso_ok returns false. Here it
> should be
> >                             true. 
> 
> Why? If I read correctly the above, this is GSO packet landing in an
> UDP socket with no UDP_GRO sockopt. The packet is expected to be
> segmented again.
> 
Yes, it is GSO packet, however the fragment list of this GSO packet
becomes NULL. As the occurrence rate is very low, we really don’t know
why and when it becomes to be NULL. It happens both in cellular and
wlan network and seems an unknown kernel issue.

To avoid crash the packet should skip to be segmented when fraglist is
null.

> >Failed reason is features doesn't
> match
> >                             gso_type.
> >         __udp_gso_segment_list
> >             skb_segment_list => packet is linear with skb->next =
> NULL
> >             __udpv4_gso_segment_list_csum => use skb->next directly
> and
> >                                              crash happens
> > 
> > In rx-gro-list GRO-ed packet is set gso type as
> > NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST in napi_gro_complete. In
> gso
> > flow the features should also set them to match with gso_type. Or
> else it
> > will always return false in skb_gso_ok. Then it can't discover the
> > untrusted source packet and result crash in following function.
> 
> What is the 'untrusted source' here? I read the above as the packet
> aggregation happened in the GRO engine???
> 
> Could you please give a complete description of the relevant
> scenario?
> 

According to the backtrace info, we infer it is a rx-frag_list GRO
packet. Before sending into the UDP socket with no UDP_GRO sockopt, it
seems enter "skb_condense" to trim it and loose his frag list. However
it still keeps gso_type and gso_size. Then it continues to do
skb_segment_list.

First crash happens in skb_segment_list. 
This patch resolves the crash and lets the packet becomes a skb without
skb->next:
https://lore.kernel.org/all/Y9gt5EUizK1UImEP@debian/
Then crash moves to __udp_gso_sement_list -> skb_segment_list(finish)
-> __udpv4_gso_segment_list_csum, it uses skb->next without check then
crash.


What we want to do is to drop this abnormal packet. So we set features
NETIF_F_GSO_UDP_L4 |NETIF_F_GSO_FRAGLIST to match fixes: f2696099c6c6
condation then drop it. 

Another solution is we can check the frag_list directly in
skb_segment_lsit:
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 011d690..5b054f0 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4336,6 +4336,8 @@
  int len_diff, err;

  skb_push(skb, -skb_network_offset(skb) + offset);
+      if (!list_skb)
+             goto err_linearize;

  /* Ensure the head is writeable before touching the shared info */
  err = skb_unclone(skb, GFP_ATOMIC);

> > Fixes: f2696099c6c6 ("udp: Avoid post-GRO UDP checksum
> recalculation")
> > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> > Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> > ---
> >  include/net/udp.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/include/net/udp.h b/include/net/udp.h
> > index 488a6d2babcc..c87baa23b9da 100644
> > --- a/include/net/udp.h
> > +++ b/include/net/udp.h
> > @@ -464,7 +464,7 @@ void udpv6_encap_enable(void);
> >  static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
> >        struct sk_buff *skb, bool ipv4)
> >  {
> > -netdev_features_t features = NETIF_F_SG;
> > +netdev_features_t features = NETIF_F_SG | NETIF_F_GSO_UDP_L4 |
> NETIF_F_GSO_FRAGLIST;
> 
> This looks wrong: real UDP_L4 GSO packets will not segmented anymore
> and should be dropped (?!?)
> 
>
Willem de Bruijn March 16, 2024, 1:47 p.m. UTC | #3
Lena Wang (王娜) wrote:
> On Wed, 2024-03-13 at 16:41 +0100, Paolo Abeni wrote:
> >  	 
> > External email : Please do not click links or open attachments until
> > you have verified the sender or the content.
> >  On Wed, 2024-03-13 at 21:34 +0800, Shiming Cheng wrote:
> > > Kernel exception is reported when making udp frag list
> > segmentation.
> > > Backtrace is as below:
> > >     at out/android15-6.6/kernel-6.6/kernel-
> > 6.6/net/ipv4/udp_offload.c:229
> > >     at out/android15-6.6/kernel-6.6/kernel-
> > 6.6/net/ipv4/udp_offload.c:262
> > > features=features@entry=19, is_ipv6=false)
> > >     at out/android15-6.6/kernel-6.6/kernel-
> > 6.6/net/ipv4/udp_offload.c:289
> > > features=19)
> > >     at out/android15-6.6/kernel-6.6/kernel-
> > 6.6/net/ipv4/udp_offload.c:399
> > > features=19)
> > >     at out/android15-6.6/kernel-6.6/kernel-
> > 6.6/net/ipv4/af_inet.c:1418
> > > skb@entry=0x0, features=19, features@entry=0)
> > >     at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:53
> > > tx_path=<optimized out>)
> > >     at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:124
> > 
> > A full backtrace would help better understanding the issue.
> 
> Below is full backtrace:
>  [ 1100.812205][    C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted:
> G        W  OE      6.6.17-android15-0-g380371ea9bf1 #1
>  [ 1100.812211][    C3] Hardware name: MT6991(ENG) (DT)
>  [ 1100.812215][    C3] Call trace:
>  [ 1100.812218][    C3]  dump_backtrace+0xec/0x138
>  [ 1100.812222][    C3]  show_stack+0x18/0x24
>  [ 1100.812226][    C3]  dump_stack_lvl+0x50/0x6c
>  [ 1100.812232][    C3]  dump_stack+0x18/0x24
>  [ 1100.812237][    C3]  mrdump_common_die+0x24c/0x388 [mrdump]
>  [ 1100.812259][    C3]  ipanic_die+0x20/0x34 [mrdump]
>  [ 1100.812269][    C3]  notifier_call_chain+0x90/0x174
>  [ 1100.812275][    C3]  notify_die+0x50/0x8c
>  [ 1100.812279][    C3]  die+0x94/0x308
>  [ 1100.812283][    C3]  __do_kernel_fault+0x240/0x26c
>  [ 1100.812288][    C3]  do_page_fault+0xa0/0x48c
>  [ 1100.812293][    C3]  do_translation_fault+0x38/0x54
>  [ 1100.812297][    C3]  do_mem_abort+0x58/0x104
>  [ 1100.812302][    C3]  el1_abort+0x3c/0x5c
>  [ 1100.812307][    C3]  el1h_64_sync_handler+0x54/0x90
>  [ 1100.812313][    C3]  el1h_64_sync+0x68/0x6c
>  [ 1100.812317][    C3]  __udp_gso_segment+0x298/0x4d4
>  [ 1100.812322][    C3]  udp4_ufo_fragment+0x130/0x174
>  [ 1100.812326][    C3]  inet_gso_segment+0x164/0x330
>  [ 1100.812330][    C3]  skb_mac_gso_segment+0xc4/0x13c
>  [ 1100.812335][    C3]  __skb_gso_segment+0xc4/0x120
>  [ 1100.812339][    C3]  udp_rcv_segment+0x50/0x134
>  [ 1100.812344][    C3]  udp_queue_rcv_skb+0x74/0x114
>  [ 1100.812348][    C3]  udp_unicast_rcv_skb+0x94/0xac
>  [ 1100.812353][    C3]  __udp4_lib_rcv+0x3e0/0x818
>  [ 1100.812358][    C3]  udp_rcv+0x20/0x30
>  [ 1100.812362][    C3]  ip_protocol_deliver_rcu+0x194/0x368
>  [ 1100.812368][    C3]  ip_local_deliver+0xe4/0x184
>  [ 1100.812373][    C3]  ip_rcv+0x90/0x118
>  [ 1100.812378][    C3]  __netif_receive_skb+0x74/0x124
>  [ 1100.812383][    C3]  process_backlog+0xd8/0x18c
>  [ 1100.812388][    C3]  __napi_poll+0x5c/0x1fc
>  [ 1100.812392][    C3]  net_rx_action+0x150/0x334
>  [ 1100.812397][    C3]  __do_softirq+0x120/0x3f4
>  [ 1100.812401][    C3]  ____do_softirq+0x10/0x20
>  [ 1100.812405][    C3]  call_on_irq_stack+0x3c/0x74
>  [ 1100.812410][    C3]  do_softirq_own_stack+0x1c/0x2c
>  [ 1100.812414][    C3]  __irq_exit_rcu+0x5c/0xd4
>  [ 1100.812418][    C3]  irq_exit_rcu+0x10/0x1c
>  [ 1100.812422][    C3]  el1_interrupt+0x38/0x58
>  [ 1100.812428][    C3]  el1h_64_irq_handler+0x18/0x24
>  [ 1100.812434][    C3]  el1h_64_irq+0x68/0x6c
>  [ 1100.812437][    C3]  arch_local_irq_enable+0x4/0x8
>  [ 1100.812443][    C3]  cpuidle_enter+0x38/0x54
>  [ 1100.812449][    C3]  do_idle+0x198/0x294
>  [ 1100.812454][    C3]  cpu_startup_entry+0x34/0x3c
>  [ 1100.812459][    C3]  secondary_start_kernel+0x138/0x158
>  [ 1100.812465][    C3]  __secondary_switched+0xc0/0xc4
> 
> > > This packet's frag list is null while gso_type is not 0. Then it is
> > treated
> > > as a GRO-ed packet and sent to segment frag list. Function call
> > path is
> > > udp_rcv_segment => config features value
> > >     __udpv4_gso_segment  => skb_gso_ok returns false. Here it
> > should be
> > >                             true. 
> > 
> > Why? If I read correctly the above, this is GSO packet landing in an
> > UDP socket with no UDP_GRO sockopt. The packet is expected to be
> > segmented again.
> > 
> Yes, it is GSO packet, however the fragment list of this GSO packet
> becomes NULL. As the occurrence rate is very low, we really don’t know
> why and when it becomes to be NULL. It happens both in cellular and
> wlan network and seems an unknown kernel issue.
>
> To avoid crash the packet should skip to be segmented when fraglist is
> null.
> 
> > >Failed reason is features doesn't
> > match
> > >                             gso_type.
> > >         __udp_gso_segment_list
> > >             skb_segment_list => packet is linear with skb->next =
> > NULL
> > >             __udpv4_gso_segment_list_csum => use skb->next directly
> > and
> > >                                              crash happens
> > > 
> > > In rx-gro-list GRO-ed packet is set gso type as
> > > NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST in napi_gro_complete. In
> > gso
> > > flow the features should also set them to match with gso_type. Or
> > else it
> > > will always return false in skb_gso_ok. Then it can't discover the
> > > untrusted source packet and result crash in following function.
> > 
> > What is the 'untrusted source' here? I read the above as the packet
> > aggregation happened in the GRO engine???
> > 
> > Could you please give a complete description of the relevant
> > scenario?
> > 
> 
> According to the backtrace info, we infer it is a rx-frag_list GRO

It would be helpful to see an skb_dump. But if this happens rarely in
production, understood if that is not feasible.

The packet arrives on process_backlog, so still not sure how it is
produced.

> packet. Before sending into the UDP socket with no UDP_GRO sockopt, it
> seems enter "skb_condense" to trim it and loose his frag list. However
> it still keeps gso_type and gso_size. Then it continues to do
> skb_segment_list.
> 
> First crash happens in skb_segment_list. 
> This patch resolves the crash and lets the packet becomes a skb without
> skb->next:
> https://lore.kernel.org/all/Y9gt5EUizK1UImEP@debian/
> Then crash moves to __udp_gso_sement_list -> skb_segment_list(finish)
> -> __udpv4_gso_segment_list_csum, it uses skb->next without check then
> crash.
> 
> 
> What we want to do is to drop this abnormal packet.

I think we want to deliver this packet if possible.

Thanks for the added context. So this is assumed to be a GSO skb with
SKB_GSO_FRAGLIST that somewhere lots its fraglist? That is the bug
if true.

You are suggesting that this happens in the skb_condense in
__udp_enqueue_schedule_skb?

If generated by GRO then on a device that has NETIF_F_GRO_FRAGLIST set.
So one workaround (not fix) is to disable that.

> So we set features
> NETIF_F_GSO_UDP_L4 |NETIF_F_GSO_FRAGLIST to match fixes: f2696099c6c6
> condation then drop it.
Lena Wang (王娜) March 26, 2024, 1:10 p.m. UTC | #4
On Sat, 2024-03-16 at 09:47 -0400, Willem de Bruijn wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  Lena Wang (王娜) wrote:
> > On Wed, 2024-03-13 at 16:41 +0100, Paolo Abeni wrote:
> > >   
> > > External email : Please do not click links or open attachments
> until
> > > you have verified the sender or the content.
> > >  On Wed, 2024-03-13 at 21:34 +0800, Shiming Cheng wrote:
> > > > Kernel exception is reported when making udp frag list
> > > segmentation.
> > > > Backtrace is as below:
> > > >     at out/android15-6.6/kernel-6.6/kernel-
> > > 6.6/net/ipv4/udp_offload.c:229
> > > >     at out/android15-6.6/kernel-6.6/kernel-
> > > 6.6/net/ipv4/udp_offload.c:262
> > > > features=features@entry=19, is_ipv6=false)
> > > >     at out/android15-6.6/kernel-6.6/kernel-
> > > 6.6/net/ipv4/udp_offload.c:289
> > > > features=19)
> > > >     at out/android15-6.6/kernel-6.6/kernel-
> > > 6.6/net/ipv4/udp_offload.c:399
> > > > features=19)
> > > >     at out/android15-6.6/kernel-6.6/kernel-
> > > 6.6/net/ipv4/af_inet.c:1418
> > > > skb@entry=0x0, features=19, features@entry=0)
> > > >     at out/android15-6.6/kernel-6.6/kernel-
> 6.6/net/core/gso.c:53
> > > > tx_path=<optimized out>)
> > > >     at out/android15-6.6/kernel-6.6/kernel-
> 6.6/net/core/gso.c:124
> > > 
> > > A full backtrace would help better understanding the issue.
> > 
> > Below is full backtrace:
> >  [ 1100.812205][    C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted:
> > G        W  OE      6.6.17-android15-0-g380371ea9bf1 #1
> >  [ 1100.812211][    C3] Hardware name: MT6991(ENG) (DT)
> >  [ 1100.812215][    C3] Call trace:
> >  [ 1100.812218][    C3]  dump_backtrace+0xec/0x138
> >  [ 1100.812222][    C3]  show_stack+0x18/0x24
> >  [ 1100.812226][    C3]  dump_stack_lvl+0x50/0x6c
> >  [ 1100.812232][    C3]  dump_stack+0x18/0x24
> >  [ 1100.812237][    C3]  mrdump_common_die+0x24c/0x388 [mrdump]
> >  [ 1100.812259][    C3]  ipanic_die+0x20/0x34 [mrdump]
> >  [ 1100.812269][    C3]  notifier_call_chain+0x90/0x174
> >  [ 1100.812275][    C3]  notify_die+0x50/0x8c
> >  [ 1100.812279][    C3]  die+0x94/0x308
> >  [ 1100.812283][    C3]  __do_kernel_fault+0x240/0x26c
> >  [ 1100.812288][    C3]  do_page_fault+0xa0/0x48c
> >  [ 1100.812293][    C3]  do_translation_fault+0x38/0x54
> >  [ 1100.812297][    C3]  do_mem_abort+0x58/0x104
> >  [ 1100.812302][    C3]  el1_abort+0x3c/0x5c
> >  [ 1100.812307][    C3]  el1h_64_sync_handler+0x54/0x90
> >  [ 1100.812313][    C3]  el1h_64_sync+0x68/0x6c
> >  [ 1100.812317][    C3]  __udp_gso_segment+0x298/0x4d4
> >  [ 1100.812322][    C3]  udp4_ufo_fragment+0x130/0x174
> >  [ 1100.812326][    C3]  inet_gso_segment+0x164/0x330
> >  [ 1100.812330][    C3]  skb_mac_gso_segment+0xc4/0x13c
> >  [ 1100.812335][    C3]  __skb_gso_segment+0xc4/0x120
> >  [ 1100.812339][    C3]  udp_rcv_segment+0x50/0x134
> >  [ 1100.812344][    C3]  udp_queue_rcv_skb+0x74/0x114
> >  [ 1100.812348][    C3]  udp_unicast_rcv_skb+0x94/0xac
> >  [ 1100.812353][    C3]  __udp4_lib_rcv+0x3e0/0x818
> >  [ 1100.812358][    C3]  udp_rcv+0x20/0x30
> >  [ 1100.812362][    C3]  ip_protocol_deliver_rcu+0x194/0x368
> >  [ 1100.812368][    C3]  ip_local_deliver+0xe4/0x184
> >  [ 1100.812373][    C3]  ip_rcv+0x90/0x118
> >  [ 1100.812378][    C3]  __netif_receive_skb+0x74/0x124
> >  [ 1100.812383][    C3]  process_backlog+0xd8/0x18c
> >  [ 1100.812388][    C3]  __napi_poll+0x5c/0x1fc
> >  [ 1100.812392][    C3]  net_rx_action+0x150/0x334
> >  [ 1100.812397][    C3]  __do_softirq+0x120/0x3f4
> >  [ 1100.812401][    C3]  ____do_softirq+0x10/0x20
> >  [ 1100.812405][    C3]  call_on_irq_stack+0x3c/0x74
> >  [ 1100.812410][    C3]  do_softirq_own_stack+0x1c/0x2c
> >  [ 1100.812414][    C3]  __irq_exit_rcu+0x5c/0xd4
> >  [ 1100.812418][    C3]  irq_exit_rcu+0x10/0x1c
> >  [ 1100.812422][    C3]  el1_interrupt+0x38/0x58
> >  [ 1100.812428][    C3]  el1h_64_irq_handler+0x18/0x24
> >  [ 1100.812434][    C3]  el1h_64_irq+0x68/0x6c
> >  [ 1100.812437][    C3]  arch_local_irq_enable+0x4/0x8
> >  [ 1100.812443][    C3]  cpuidle_enter+0x38/0x54
> >  [ 1100.812449][    C3]  do_idle+0x198/0x294
> >  [ 1100.812454][    C3]  cpu_startup_entry+0x34/0x3c
> >  [ 1100.812459][    C3]  secondary_start_kernel+0x138/0x158
> >  [ 1100.812465][    C3]  __secondary_switched+0xc0/0xc4
> > 
> > > > This packet's frag list is null while gso_type is not 0. Then
> it is
> > > treated
> > > > as a GRO-ed packet and sent to segment frag list. Function call
> > > path is
> > > > udp_rcv_segment => config features value
> > > >     __udpv4_gso_segment  => skb_gso_ok returns false. Here it
> > > should be
> > > >                             true. 
> > > 
> > > Why? If I read correctly the above, this is GSO packet landing in
> an
> > > UDP socket with no UDP_GRO sockopt. The packet is expected to be
> > > segmented again.
> > > 
> > Yes, it is GSO packet, however the fragment list of this GSO packet
> > becomes NULL. As the occurrence rate is very low, we really don’t
> know
> > why and when it becomes to be NULL. It happens both in cellular and
> > wlan network and seems an unknown kernel issue.
> >
> > To avoid crash the packet should skip to be segmented when fraglist
> is
> > null.
> > 
> > > >Failed reason is features doesn't
> > > match
> > > >                             gso_type.
> > > >         __udp_gso_segment_list
> > > >             skb_segment_list => packet is linear with skb->next 
> =
> > > NULL
> > > >             __udpv4_gso_segment_list_csum => use skb->next
> directly
> > > and
> > > >                                              crash happens
> > > > 
> > > > In rx-gro-list GRO-ed packet is set gso type as
> > > > NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST in napi_gro_complete.
> In
> > > gso
> > > > flow the features should also set them to match with gso_type.
> Or
> > > else it
> > > > will always return false in skb_gso_ok. Then it can't discover
> the
> > > > untrusted source packet and result crash in following function.
> > > 
> > > What is the 'untrusted source' here? I read the above as the
> packet
> > > aggregation happened in the GRO engine???
> > > 
> > > Could you please give a complete description of the relevant
> > > scenario?
> > > 
> > 
> > According to the backtrace info, we infer it is a rx-frag_list GRO
> 
> It would be helpful to see an skb_dump. But if this happens rarely in
> production, understood if that is not feasible.
> 
> The packet arrives on process_backlog, so still not sure how it is
> produced.
> 
Yes, it rarely happens. It is very hard to debug it and not sure its
produced path.

> > packet. Before sending into the UDP socket with no UDP_GRO sockopt,
> it
> > seems enter "skb_condense" to trim it and loose his frag list.
> However
> > it still keeps gso_type and gso_size. Then it continues to do
> > skb_segment_list.
> > 
> > First crash happens in skb_segment_list. 
> > This patch resolves the crash and lets the packet becomes a skb
> without
> > skb->next:
> > https://lore.kernel.org/all/Y9gt5EUizK1UImEP@debian/
> > Then crash moves to __udp_gso_sement_list ->
> skb_segment_list(finish)
> > -> __udpv4_gso_segment_list_csum, it uses skb->next without check
> then
> > crash.
> > 
> > 
> > What we want to do is to drop this abnormal packet.
> 
> I think we want to deliver this packet if possible.
> 
> Thanks for the added context. So this is assumed to be a GSO skb with
> SKB_GSO_FRAGLIST that somewhere lots its fraglist? That is the bug
> if true.
> 
> You are suggesting that this happens in the skb_condense in
> __udp_enqueue_schedule_skb?
> 
We try to add a skb_condense directly before skb_segment_list then get
a similar KE backtrace and skb dump value same with this issue skb
dump. 
However we don't know which condition trigger the flow runinto
skb_condense.

> If generated by GRO then on a device that has NETIF_F_GRO_FRAGLIST
> set.
> So one workaround (not fix) is to disable that.
> 
As we met other issue previously with GRO in
skb_segment(disalbe NETIF_F_GRO_FRAGLIST flow), it is still not safe to
disable GRO fraglist.

We hope current patch could be applied to drop the invalid packet.

> > So we set features
> > NETIF_F_GSO_UDP_L4 |NETIF_F_GSO_FRAGLIST to match fixes:
> f2696099c6c6
> > condation then drop it.
diff mbox series

Patch

diff --git a/include/net/udp.h b/include/net/udp.h
index 488a6d2babcc..c87baa23b9da 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -464,7 +464,7 @@  void udpv6_encap_enable(void);
 static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
 					      struct sk_buff *skb, bool ipv4)
 {
-	netdev_features_t features = NETIF_F_SG;
+	netdev_features_t features = NETIF_F_SG | NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST;
 	struct sk_buff *segs;
 
 	/* Avoid csum recalculation by skb_segment unless userspace explicitly