
[net] net/tunnel: wait until all sk_user_data reader finish before releasing the sock

Message ID 20221208120452.556997-1-liuhangbin@gmail.com (mailing list archive)
State Accepted
Commit 3cf7203ca620682165706f70a1b12b5194607dce
Delegated to: Netdev Maintainers
Headers show
Series [net] net/tunnel: wait until all sk_user_data reader finish before releasing the sock | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 1 maintainers not CCed: yoshfuji@linux-ipv6.org
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 7 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Hangbin Liu Dec. 8, 2022, 12:04 p.m. UTC
There is a race condition in vxlan: when a vxlan device is deleted
while packets are being received, the sock may be released after the
vxlan_sock vs has already been fetched from sk_user_data. The
subsequent vxlan_ecn_decapsulate() and vxlan_get_sk_family() calls
then hit a NULL pointer dereference, e.g.

   #0 [ffffa25ec6978a38] machine_kexec at ffffffff8c669757
   #1 [ffffa25ec6978a90] __crash_kexec at ffffffff8c7c0a4d
   #2 [ffffa25ec6978b58] crash_kexec at ffffffff8c7c1c48
   #3 [ffffa25ec6978b60] oops_end at ffffffff8c627f2b
   #4 [ffffa25ec6978b80] page_fault_oops at ffffffff8c678fcb
   #5 [ffffa25ec6978bd8] exc_page_fault at ffffffff8d109542
   #6 [ffffa25ec6978c00] asm_exc_page_fault at ffffffff8d200b62
      [exception RIP: vxlan_ecn_decapsulate+0x3b]
      RIP: ffffffffc1014e7b  RSP: ffffa25ec6978cb0  RFLAGS: 00010246
      RAX: 0000000000000008  RBX: ffff8aa000888000  RCX: 0000000000000000
      RDX: 000000000000000e  RSI: ffff8a9fc7ab803e  RDI: ffff8a9fd1168700
      RBP: ffff8a9fc7ab803e   R8: 0000000000700000   R9: 00000000000010ae
      R10: ffff8a9fcb748980  R11: 0000000000000000  R12: ffff8a9fd1168700
      R13: ffff8aa000888000  R14: 00000000002a0000  R15: 00000000000010ae
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
   #7 [ffffa25ec6978ce8] vxlan_rcv at ffffffffc10189cd [vxlan]
   #8 [ffffa25ec6978d90] udp_queue_rcv_one_skb at ffffffff8cfb6507
   #9 [ffffa25ec6978dc0] udp_unicast_rcv_skb at ffffffff8cfb6e45
  #10 [ffffa25ec6978dc8] __udp4_lib_rcv at ffffffff8cfb8807
  #11 [ffffa25ec6978e20] ip_protocol_deliver_rcu at ffffffff8cf76951
  #12 [ffffa25ec6978e48] ip_local_deliver at ffffffff8cf76bde
  #13 [ffffa25ec6978ea0] __netif_receive_skb_one_core at ffffffff8cecde9b
  #14 [ffffa25ec6978ec8] process_backlog at ffffffff8cece139
  #15 [ffffa25ec6978f00] __napi_poll at ffffffff8ceced1a
  #16 [ffffa25ec6978f28] net_rx_action at ffffffff8cecf1f3
  #17 [ffffa25ec6978fa0] __softirqentry_text_start at ffffffff8d4000ca
  #18 [ffffa25ec6978ff0] do_softirq at ffffffff8c6fbdc3

Reproducer: https://github.com/Mellanox/ovs-tests/blob/master/test-ovs-vxlan-remove-tunnel-during-traffic.sh

Fix this by waiting for all sk_user_data readers to finish before
releasing the sock.

Reported-by: Jianlin Shi <jishi@redhat.com>
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Fixes: 6a93cc905274 ("udp-tunnel: Add a few more UDP tunnel APIs")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 net/ipv4/udp_tunnel_core.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Jiri Pirko Dec. 8, 2022, 12:38 p.m. UTC | #1
Thu, Dec 08, 2022 at 01:04:52PM CET, liuhangbin@gmail.com wrote:
>[...]

Reviewed-by: Jiri Pirko <jiri@nvidia.com>

patchwork-bot+netdevbpf@kernel.org Dec. 12, 2022, 10 a.m. UTC | #2
Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Thu,  8 Dec 2022 20:04:52 +0800 you wrote:
> There is a race condition in vxlan that when deleting a vxlan device
> during receiving packets, there is a possibility that the sock is
> released after getting vxlan_sock vs from sk_user_data. Then in
> later vxlan_ecn_decapsulate(), vxlan_get_sk_family() we will got
> NULL pointer dereference. e.g.
> [...]

Here is the summary with links:
  - [net] net/tunnel: wait until all sk_user_data reader finish before releasing the sock
    https://git.kernel.org/netdev/net/c/3cf7203ca620

You are awesome, thank you!

Patch

diff --git a/net/ipv4/udp_tunnel_core.c b/net/ipv4/udp_tunnel_core.c
index 8242c8947340..5f8104cf082d 100644
--- a/net/ipv4/udp_tunnel_core.c
+++ b/net/ipv4/udp_tunnel_core.c
@@ -176,6 +176,7 @@  EXPORT_SYMBOL_GPL(udp_tunnel_xmit_skb);
 void udp_tunnel_sock_release(struct socket *sock)
 {
 	rcu_assign_sk_user_data(sock->sk, NULL);
+	synchronize_rcu();
 	kernel_sock_shutdown(sock, SHUT_RDWR);
 	sock_release(sock);
 }