Message ID: 20250211-b4-ovpn-v19-0-86d5daf2a47a@openvpn.net (mailing list archive)
Series: Introducing OpenVPN Data Channel Offload

Hello,

2025-02-11, 01:39:53 +0100, Antonio Quartulli wrote:
> All minor and major reported problems have been finally addressed.
> Big thanks to Sabrina, who took the time to guide me through
> converting the peer socket to an RCU pointer.

Something is off (not sure if it's new to this version): if I use
test-tcp.sh to setup a set of interfaces and peers (I stop the test
just after setup to keep the environment alive), then remove all netns
with "ip -all netns delete", I expect all devices to go away, but they
don't. With debug messages enabled I'm seeing some activity from the
module ("tun0: sending keepalive to peer 3" and so on), and
ovpn_net_uninit/ovpn_priv_free never got called.

[...]

> So there is NO risk of deadlock (and indeed nothing hangs), but I
> couldn't find a way to make the warning go away.

I've spotted another splat on strparser cleanup that looked like an
actual deadlock, but it's not very reproducible. Still looking into
it, but I'm not convinced it's ok to call strp_done (as is done from
ovpn_tcp_socket_detach) while under lock_sock, because AFAIU
cancel_work_sync(&strp->work) may be waiting for a work that needs to
lock the socket (cb.lock in do_strp_work). I guess tcp_tx_work would
have the same problem.

On 13/02/2025 00:34, Sabrina Dubroca wrote:
> Hello,
>
> 2025-02-11, 01:39:53 +0100, Antonio Quartulli wrote:
>> All minor and major reported problems have been finally addressed.
>> Big thanks to Sabrina, who took the time to guide me through
>> converting the peer socket to an RCU pointer.
>
> Something is off (not sure if it's new to this version): if I use
> test-tcp.sh to setup a set of interfaces and peers (I stop the test
> just after setup to keep the environment alive), then remove all netns
> with "ip -all netns delete", I expect all devices to go away, but they
> don't. With debug messages enabled I'm seeing some activity from the
> module ("tun0: sending keepalive to peer 3" and so on), and
> ovpn_net_uninit/ovpn_priv_free never got called.

I can reproduce it. If later I rmmod ovpn I then get all the "Deleting peer"
messages.
So instances are not being purged on netns exit.

Will dive into it.

> [...]
>> So there is NO risk of deadlock (and indeed nothing hangs), but I
>> couldn't find a way to make the warning go away.
>
> I've spotted another splat on strparser cleanup that looked like an
> actual deadlock, but it's not very reproducible. Still looking into
> it, but I'm not convinced it's ok to call strp_done (as is done from
> ovpn_tcp_socket_detach) while under lock_sock, because AFAIU
> cancel_work_sync(&strp->work) may be waiting for a work that needs to
> lock the socket (cb.lock in do_strp_work). I guess tcp_tx_work would
> have the same problem.

Will have a look here too.

Thanks!

Hi Antonio,

On 2/10/25 19:39, Antonio Quartulli wrote:
> NOTE: TCP tests are still showing the following warning while running
> iperf.
> I have analysed the report several times, but it definitely looks like
> a false positive to me, so nothing to worry about.
>
> Basically the lockdep engine gets confused thinking that we are
> acquiring the lock twice on the same sock, but actually:
> 1) the kernel is first locking the 'iperf' (user) TCP socket;
> 2) ovpn is later locking the underlying TCP transport socket.
>
> So there is NO risk of deadlock (and indeed nothing hangs), but I
> couldn't find a way to make the warning go away.

I think you can use a "nested" lock to avoid this. See e.g. commit
86a41ea9fd79 ("l2tp: fix lockdep splat") for an example.

--Sean

[1] https://www.kernel.org/doc/html/latest/locking/lockdep-design.html#exception-nested-data-dependencies-leading-to-nested-locking

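For reference, a minimal sketch of the nested-lock annotation being suggested here, in the spirit of the l2tp commit above. The function below and its call site are hypothetical, not the actual ovpn code; only lock_sock_nested(), release_sock() and SINGLE_DEPTH_NESTING are real kernel symbols:

#include <net/sock.h>

/* Hypothetical sketch: lock the underlying transport socket with a
 * distinct lockdep subclass, so lockdep does not conflate it with the
 * user-visible TCP socket (e.g. iperf's) that is already locked
 * further up the call chain.
 */
static void ovpn_tcp_lock_transport_sketch(struct sock *transport_sk)
{
	lock_sock_nested(transport_sk, SINGLE_DEPTH_NESTING);

	/* ... move ovpn payload onto the transport socket ... */

	release_sock(transport_sk);
}

The l2tp fix defines its own nesting constant for this purpose; whether SINGLE_DEPTH_NESTING or a dedicated subclass is the right choice for ovpn is a detail for the actual patch.
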
2025-02-13, 12:46:34 +0100, Antonio Quartulli wrote:
> On 13/02/2025 00:34, Sabrina Dubroca wrote:
> > Hello,
> >
> > 2025-02-11, 01:39:53 +0100, Antonio Quartulli wrote:
> > > All minor and major reported problems have been finally addressed.
> > > Big thanks to Sabrina, who took the time to guide me through
> > > converting the peer socket to an RCU pointer.
> >
> > Something is off (not sure if it's new to this version): if I use
> > test-tcp.sh to setup a set of interfaces and peers (I stop the test
> > just after setup to keep the environment alive), then remove all netns
> > with "ip -all netns delete", I expect all devices to go away, but they
> > don't. With debug messages enabled I'm seeing some activity from the
> > module ("tun0: sending keepalive to peer 3" and so on), and
> > ovpn_net_uninit/ovpn_priv_free never got called.
>
> I can reproduce it. If later I rmmod ovpn I then get all the "Deleting peer"
> messages.
> So instances are not being purged on netns exit.
>
> Will dive into it.

I think the socket holds a ref on the netns, so it's not getting
destroyed, simply "removed" from iproute's point of view. And the
socket isn't going away as long as it's used by a peer.

If I delete the peer(s) for the ovpn device and then the netns it was
in, the netns is fully removed, and the ovpn device is gone. Also no
issue if I delete the ovpn device before its netns, then everything is
destroyed as expected.

I'm not sure that can be solved, at least under the current refcount
scheme.

But I don't think there's a way to re-attach to that netns afterwards
if we wanted to clean up manually (something similar to "ip netns
attach <name> <pid>", but that won't work if whatever created the
socket is not running anymore -- as is the case with ovpn-cli).

> >
> > [...]
> > > So there is NO risk of deadlock (and indeed nothing hangs), but I
> > > couldn't find a way to make the warning go away.
> >
> > I've spotted another splat on strparser cleanup that looked like an
> > actual deadlock, but it's not very reproducible. Still looking into
> > it, but I'm not convinced it's ok to call strp_done (as is done from
> > ovpn_tcp_socket_detach) while under lock_sock, because AFAIU
> > cancel_work_sync(&strp->work) may be waiting for a work that needs to
> > lock the socket (cb.lock in do_strp_work). I guess tcp_tx_work would
> > have the same problem.
>
> Will have a look here too.

The only ways I've managed to reproduce it are by using some ugly
kernel-side hacks to try to force that path being hit. Either forcing
the strp work to be queued just as we detach the socket (in
strp_stop), or this:

- hack some code path to do a big sleep() under lock_sock(), to give
  me the time to do the next steps
- ping over ovpn, or any other way to add packets on the receive
  socket
- delete the peer for the sleeping socket (while it's still sleeping)

When that big sleep is over, strp_data_ready will kick off and queue
its worker (because at this point the socket lock is still owned),
then I think del_peer proceeds toward tcp_detach and I got the splat
below. So that's maybe a bit hard to trigger in real life (I don't
remember what I was doing the first time I got it, I think something
messing with the RCU stuff we discussed earlier).

[ 922.681435][ T300] ======================================================
[ 922.686247][ T300] WARNING: possible circular locking dependency detected
[ 922.690971][ T300] 6.14.0-rc1-net-00276-g95f1f7ea224e-dirty #279 Tainted: G N
[ 922.696584][ T300] ------------------------------------------------------
[ 922.699697][ T300] kworker/1:2/300 is trying to acquire lock:
[ 922.702105][ T300] ffff88800a662160 ((work_completion)(&strp->work)){+.+.}-{0:0}, at: start_flush_work+0x407/0xa50
[ 922.705716][ T300]
[ 922.705716][ T300] but task is already holding lock:
[ 922.707779][ T300] ffff8880113edd98 (sk_lock-AF_INET){+.+.}-{0:0}, at: ovpn_socket_release+0x8a/0x1a0
[ 922.710238][ T300]
[ 922.710238][ T300] which lock already depends on the new lock.
[ 922.710238][ T300]
[ 922.712628][ T300]
[ 922.712628][ T300] the existing dependency chain (in reverse order) is:
[ 922.714443][ T300]
[ 922.714443][ T300] -> #1 (sk_lock-AF_INET){+.+.}-{0:0}:
[ 922.716127][ T300] __lock_acquire+0xc4d/0x1ee0
[ 922.717250][ T300] lock_acquire+0x1a9/0x500
[ 922.718266][ T300] lock_sock_nested+0x40/0xf0
[ 922.719325][ T300] strp_work+0x95/0x1e0
[ 922.720240][ T300] process_one_work+0xe28/0x1460
[ 922.721307][ T300] worker_thread+0x674/0xee0
[ 922.722283][ T300] kthread+0x3c3/0x760
[ 922.723101][ T300] ret_from_fork+0x46/0x80
[ 922.723792][ T300] ret_from_fork_asm+0x1a/0x30
[ 922.724532][ T300]
[ 922.724532][ T300] -> #0 ((work_completion)(&strp->work)){+.+.}-{0:0}:
[ 922.726038][ T300] check_prev_add+0x1af/0x2400
[ 922.726927][ T300] validate_chain+0xdcf/0x1a10
[ 922.727847][ T300] __lock_acquire+0xc4d/0x1ee0
[ 922.728721][ T300] lock_acquire+0x1a9/0x500
[ 922.729590][ T300] start_flush_work+0x41a/0xa50
[ 922.730434][ T300] __flush_work+0xee/0x210
[ 922.731213][ T300] cancel_work_sync+0xb8/0xd0
[ 922.732022][ T300] strp_done.cold+0x51/0xcf
[ 922.732830][ T300] ovpn_tcp_socket_detach+0x28e/0x2de
[ 922.733752][ T300] ovpn_socket_release_kref+0x1ef/0x350
[ 922.734713][ T300] ovpn_socket_release+0xe7/0x1a0
[ 922.735577][ T300] ovpn_peer_remove_work+0x2b/0x90
[ 922.736468][ T300] process_one_work+0xe28/0x1460
[ 922.737357][ T300] worker_thread+0x674/0xee0
[ 922.737985][ T300] kthread+0x3c3/0x760
[ 922.738545][ T300] ret_from_fork+0x46/0x80
[ 922.739177][ T300] ret_from_fork_asm+0x1a/0x30
[ 922.739846][ T300]
[ 922.739846][ T300] other info that might help us debug this:
[ 922.739846][ T300]
[ 922.741159][ T300] Possible unsafe locking scenario:
[ 922.741159][ T300]
[ 922.742165][ T300]        CPU0                    CPU1
[ 922.743055][ T300]        ----                    ----
[ 922.743943][ T300]   lock(sk_lock-AF_INET);
[ 922.744638][ T300]                                lock((work_completion)(&strp->work));
[ 922.745786][ T300]                                lock(sk_lock-AF_INET);
[ 922.746837][ T300]   lock((work_completion)(&strp->work));
[ 922.747788][ T300]
[ 922.747788][ T300]  *** DEADLOCK ***
[ 922.747788][ T300]
[ 922.748856][ T300] 4 locks held by kworker/1:2/300:
[ 922.749474][ T300]  #0: ffff888008499b48 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0xd80/0x1460
[ 922.750588][ T300]  #1: ffffc90002527d30 ((work_completion)(&peer->remove_work)){+.+.}-{0:0}, at: process_one_work+0xddf/0x1460
[ 922.751894][ T300]  #2: ffff8880113edd98 (sk_lock-AF_INET){+.+.}-{0:0}, at: ovpn_socket_release+0x8a/0x1a0
[ 922.752983][ T300]  #3: ffffffff854de980 (rcu_read_lock){....}-{1:3}, at: start_flush_work+0x39/0xa50
[ 922.754018][ T300]
[ 922.754018][ T300] stack backtrace:
[ 922.754734][ T300] CPU: 1 UID: 0 PID: 300 Comm: kworker/1:2 Tainted: G N 6.14.0-rc1-net-00276-g95f1f7ea224e-dirty #279
[ 922.754748][ T300] Tainted: [N]=TEST
[ 922.754752][ T300] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 922.754761][ T300] Workqueue: events ovpn_peer_remove_work
[ 922.754779][ T300] Call Trace:
[ 922.754785][ T300]  <TASK>
[ 922.754791][ T300]  dump_stack_lvl+0xa5/0x100
[ 922.754803][ T300]  print_circular_bug.cold+0x38/0x48
[ 922.754820][ T300]  check_noncircular+0x2f2/0x3d0
[ 922.754922][ T300]  check_prev_add+0x1af/0x2400
[ 922.754942][ T300]  validate_chain+0xdcf/0x1a10
[ 922.754991][ T300]  __lock_acquire+0xc4d/0x1ee0
[ 922.755023][ T300]  lock_acquire+0x1a9/0x500
[ 922.755104][ T300]  start_flush_work+0x41a/0xa50
[ 922.755128][ T300]  __flush_work+0xee/0x210
[ 922.755198][ T300]  cancel_work_sync+0xb8/0xd0
[ 922.755211][ T300]  strp_done.cold+0x51/0xcf
[ 922.755222][ T300]  ovpn_tcp_socket_detach+0x28e/0x2de
[ 922.755237][ T300]  ovpn_socket_release_kref+0x1ef/0x350
[ 922.755253][ T300]  ovpn_socket_release+0xe7/0x1a0
[ 922.755268][ T300]  ovpn_peer_remove_work+0x2b/0x90
[ 922.755282][ T300]  process_one_work+0xe28/0x1460
[ 922.755330][ T300]  worker_thread+0x674/0xee0
[ 922.755402][ T300]  kthread+0x3c3/0x760
[ 922.755472][ T300]  ret_from_fork+0x46/0x80
[ 922.755497][ T300]  ret_from_fork_asm+0x1a/0x30
[ 922.755521][ T300]  </TASK>

And then come the hung task warnings:
- del_peer waiting on the completion
- strp waiting on lock_sock
- peer cleanup/socket detach waiting on cancel_work_sync/strp

[ 1106.347400][ T39] INFO: task kworker/u16:0:11 blocked for more than 122 seconds.
[ 1106.348547][ T39] Tainted: G N 6.14.0-rc1-net-00276-g95f1f7ea224e-dirty #279
[ 1106.349671][ T39] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1106.350748][ T39] task:kworker/u16:0 state:D stack:0 pid:11 tgid:11 ppid:2 task_flags:0x4208160 flags:0x00004000
[ 1106.352454][ T39] Workqueue: kstrp strp_work
[ 1106.353133][ T39] Call Trace:
[ 1106.353659][ T39]  <TASK>
[ 1106.354143][ T39]  __schedule+0xace/0x2620
[ 1106.360659][ T39]  schedule+0xd0/0x210
[ 1106.361266][ T39]  __lock_sock+0x137/0x230
[ 1106.365430][ T39]  lock_sock_nested+0xcb/0xf0
[ 1106.366173][ T39]  strp_work+0x95/0x1e0
[ 1106.366845][ T39]  process_one_work+0xe28/0x1460
[ 1106.369231][ T39]  worker_thread+0x674/0xee0
[ 1106.371594][ T39]  kthread+0x3c3/0x760
[ 1106.375692][ T39]  ret_from_fork+0x46/0x80
[ 1106.377040][ T39]  ret_from_fork_asm+0x1a/0x30
[ 1106.377851][ T39]  </TASK>
[ 1106.378355][ T39] INFO: task kworker/1:2:300 blocked for more than 122 seconds.
[ 1106.379590][ T39] Tainted: G N 6.14.0-rc1-net-00276-g95f1f7ea224e-dirty #279
[ 1106.381079][ T39] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1106.382372][ T39] task:kworker/1:2 state:D stack:0 pid:300 tgid:300 ppid:2 task_flags:0x4208060 flags:0x00004000
[ 1106.384211][ T39] Workqueue: events ovpn_peer_remove_work
[ 1106.385072][ T39] Call Trace:
[ 1106.385648][ T39]  <TASK>
[ 1106.386139][ T39]  __schedule+0xace/0x2620
[ 1106.393518][ T39]  schedule+0xd0/0x210
[ 1106.394195][ T39]  schedule_timeout+0x18c/0x240
[ 1106.398840][ T39]  __wait_for_common+0x3e3/0x610
[ 1106.403079][ T39]  __flush_work+0x14f/0x210
[ 1106.406874][ T39]  cancel_work_sync+0xb8/0xd0
[ 1106.407519][ T39]  strp_done.cold+0x51/0xcf
[ 1106.408109][ T39]  ovpn_tcp_socket_detach+0x28e/0x2de
[ 1106.408851][ T39]  ovpn_socket_release_kref+0x1ef/0x350
[ 1106.409588][ T39]  ovpn_socket_release+0xe7/0x1a0
[ 1106.410271][ T39]  ovpn_peer_remove_work+0x2b/0x90
[ 1106.410973][ T39]  process_one_work+0xe28/0x1460
[ 1106.413075][ T39]  worker_thread+0x674/0xee0
[ 1106.416968][ T39]  kthread+0x3c3/0x760
[ 1106.419983][ T39]  ret_from_fork+0x46/0x80
[ 1106.421172][ T39]  ret_from_fork_asm+0x1a/0x30
[ 1106.421829][ T39]  </TASK>
[ 1106.422260][ T39] INFO: task ovpn-cli:1213 blocked for more than 122 seconds.
[ 1106.423215][ T39] Tainted: G N 6.14.0-rc1-net-00276-g95f1f7ea224e-dirty #279
[ 1106.424451][ T39] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1106.425643][ T39] task:ovpn-cli state:D stack:0 pid:1213 tgid:1213 ppid:514 task_flags:0x400100 flags:0x00000002
[ 1106.427145][ T39] Call Trace:
[ 1106.427610][ T39]  <TASK>
[ 1106.428005][ T39]  __schedule+0xace/0x2620
[ 1106.431820][ T39]  schedule+0xd0/0x210
[ 1106.432351][ T39]  schedule_timeout+0x18c/0x240
[ 1106.435050][ T39]  __wait_for_common+0x3e3/0x610
[ 1106.439809][ T39]  ovpn_nl_peer_del_doit+0x270/0x6e0
[ 1106.442825][ T39]  genl_family_rcv_msg_doit+0x1ea/0x2e0
[ 1106.445602][ T39]  genl_family_rcv_msg+0x3a7/0x5b0
[ 1106.450559][ T39]  genl_rcv_msg+0xb1/0x160
[ 1106.451154][ T39]  netlink_rcv_skb+0x13e/0x3d0
[ 1106.455220][ T39]  genl_rcv+0x29/0x40
[ 1106.455737][ T39]  netlink_unicast+0x491/0x730
[ 1106.457107][ T39]  netlink_sendmsg+0x77d/0xc00
[ 1106.458517][ T39]  ____sys_sendmsg+0x7c5/0xac0
[ 1106.461329][ T39]  ___sys_sendmsg+0x163/0x1b0
[ 1106.468146][ T39]  __sys_sendmsg+0x135/0x1d0
[ 1106.471564][ T39]  do_syscall_64+0x64/0x140
[ 1106.472173][ T39]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 1106.472947][ T39] RIP: 0033:0x7faa76628e56
[ 1106.473558][ T39] RSP: 002b:00007ffe757311e0 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
[ 1106.474674][ T39] RAX: ffffffffffffffda RBX: 00007faa7654f740 RCX: 00007faa76628e56
[ 1106.475715][ T39] RDX: 0000000000000000 RSI: 00007ffe75731270 RDI: 0000000000000003
[ 1106.476847][ T39] RBP: 00007ffe757311f0 R08: 0000000000000000 R09: 0000000000000000
[ 1106.477959][ T39] R10: 0000000000000000 R11: 0000000000000202 R12: 00005617490cf490
[ 1106.479064][ T39] R13: 00005617490d04f0 R14: 00007ffe75731270 R15: 0000561733d62d78
[ 1106.480175][ T39]  </TASK>
[ 1106.480646][ T39] INFO: lockdep is turned off.

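In short, the cycle in the report above boils down to the following shape. This is a reduced sketch: the two functions below only illustrate the code paths named in the traces (strp_work and the ovpn detach path), they are not the real implementations:

#include <net/sock.h>
#include <net/strparser.h>

/* Leg 1 (kstrp worker): the strparser work needs the socket lock
 * before it can run its parser callbacks.
 */
static void strp_worker_leg_sketch(struct sock *sk)
{
	lock_sock(sk);		/* blocks while the detach path holds the lock */
	/* ... process queued records ... */
	release_sock(sk);
}

/* Leg 2 (peer removal / socket detach): the socket lock is held while
 * strp_done() is called, and strp_done() does cancel_work_sync() on the
 * very work item that is stuck in leg 1 -> circular wait.
 */
static void detach_leg_sketch(struct sock *sk, struct strparser *strp)
{
	lock_sock(sk);
	strp_done(strp);	/* waits for the strparser work to finish */
	release_sock(sk);
}
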
On 13/02/2025 16:46, Sabrina Dubroca wrote:
> 2025-02-13, 12:46:34 +0100, Antonio Quartulli wrote:
>> On 13/02/2025 00:34, Sabrina Dubroca wrote:
>>> Hello,
>>>
>>> 2025-02-11, 01:39:53 +0100, Antonio Quartulli wrote:
>>>> All minor and major reported problems have been finally addressed.
>>>> Big thanks to Sabrina, who took the time to guide me through
>>>> converting the peer socket to an RCU pointer.
>>>
>>> Something is off (not sure if it's new to this version): if I use
>>> test-tcp.sh to setup a set of interfaces and peers (I stop the test
>>> just after setup to keep the environment alive), then remove all netns
>>> with "ip -all netns delete", I expect all devices to go away, but they
>>> don't. With debug messages enabled I'm seeing some activity from the
>>> module ("tun0: sending keepalive to peer 3" and so on), and
>>> ovpn_net_uninit/ovpn_priv_free never got called.
>>
>> I can reproduce it. If later I rmmod ovpn I then get all the "Deleting peer"
>> messages.
>> So instances are not being purged on netns exit.
>>
>> Will dive into it.
>
> I think the socket holds a ref on the netns, so it's not getting
> destroyed, simply "removed" from iproute's point of view. And the
> socket isn't going away as long as it's used by a peer.

After a deep dive I was getting to the same conclusion:
cleanup_net() is never invoked because we lack an invocation of put_net().
sk_alloc() invokes get_net(), so this is the ref that is not being
released.

>
> If I delete the peer(s) for the ovpn device and then the netns it was
> in, the netns is fully removed, and the ovpn device is gone. Also no
> issue if I delete the ovpn device before its netns, then everything is
> destroyed as expected.
>
> I'm not sure that can be solved, at least under the current refcount
> scheme.

I went back to v12 of the patchset (which is pre-refcount-restructuring)
and indeed the problem there doesn't exist.

However, it's unclear to me how in v12 the socket release was happening
upon netns delete. Who was triggering that? ovpn still needed to call
sockfd_put() in order to let it go.

Will investigate some more and think about a solution.

>
> But I don't think there's a way to re-attach to that netns afterwards
> if we wanted to clean up manually (something similar to "ip netns
> attach <name> <pid>", but that won't work if whatever created the
> socket is not running anymore -- as is the case with ovpn-cli).

Yeah, it just hangs in background until reboot.

>>>
>>> [...]
>>>> So there is NO risk of deadlock (and indeed nothing hangs), but I
>>>> couldn't find a way to make the warning go away.
>>>
>>> I've spotted another splat on strparser cleanup that looked like an
>>> actual deadlock, but it's not very reproducible. Still looking into
>>> it, but I'm not convinced it's ok to call strp_done (as is done from
>>> ovpn_tcp_socket_detach) while under lock_sock, because AFAIU
>>> cancel_work_sync(&strp->work) may be waiting for a work that needs to
>>> lock the socket (cb.lock in do_strp_work). I guess tcp_tx_work would
>>> have the same problem.
>>
>> Will have a look here too.

Thanks for all the info below.
I will have a deeper look at this tomorrow.

Regards,

> [...]

Hi Sean,

On 13/02/2025 15:26, Sean Anderson wrote:
> Hi Antonio,
>
> On 2/10/25 19:39, Antonio Quartulli wrote:
>> NOTE: TCP tests are still showing the following warning while running
>> iperf.
>> I have analysed the report several times, but it definitely looks like
>> a false positive to me, so nothing to worry about.
>>
>> Basically the lockdep engine gets confused thinking that we are
>> acquiring the lock twice on the same sock, but actually:
>> 1) the kernel is first locking the 'iperf' (user) TCP socket;
>> 2) ovpn is later locking the underlying TCP transport socket.
>>
>> So there is NO risk of deadlock (and indeed nothing hangs), but I
>> couldn't find a way to make the warning go away.
>
> I think you can use a "nested" lock to avoid this. See e.g. commit
> 86a41ea9fd79 ("l2tp: fix lockdep splat") for an example.

Thanks a lot for this pointer.
By reading the commit message it seems I am hitting the very same
"issue". I will try to port the same solution to ovpn then :)

Thanks again!

Regards,

> --Sean
>
> [1] https://www.kernel.org/doc/html/latest/locking/lockdep-design.html#exception-nested-data-dependencies-leading-to-nested-locking

On 13/02/2025 20:40, Antonio Quartulli wrote:
> On 13/02/2025 16:46, Sabrina Dubroca wrote:
>> 2025-02-13, 12:46:34 +0100, Antonio Quartulli wrote:
>>> On 13/02/2025 00:34, Sabrina Dubroca wrote:
>>>> Hello,
>>>>
>>>> 2025-02-11, 01:39:53 +0100, Antonio Quartulli wrote:
>>>>> All minor and major reported problems have been finally addressed.
>>>>> Big thanks to Sabrina, who took the time to guide me through
>>>>> converting the peer socket to an RCU pointer.
>>>>
>>>> Something is off (not sure if it's new to this version): if I use
>>>> test-tcp.sh to setup a set of interfaces and peers (I stop the test
>>>> just after setup to keep the environment alive), then remove all netns
>>>> with "ip -all netns delete", I expect all devices to go away, but they
>>>> don't. With debug messages enabled I'm seeing some activity from the
>>>> module ("tun0: sending keepalive to peer 3" and so on), and
>>>> ovpn_net_uninit/ovpn_priv_free never got called.
>>>
>>> I can reproduce it. If later I rmmod ovpn I then get all the
>>> "Deleting peer" messages.
>>> So instances are not being purged on netns exit.
>>>
>>> Will dive into it.
>>
>> I think the socket holds a ref on the netns, so it's not getting
>> destroyed, simply "removed" from iproute's point of view. And the
>> socket isn't going away as long as it's used by a peer.
>
> After a deep dive I was getting to the same conclusion:
> cleanup_net() is never invoked because we lack an invocation of put_net().
> sk_alloc() invokes get_net(), so this is the ref that is not being
> released.
>
>> If I delete the peer(s) for the ovpn device and then the netns it was
>> in, the netns is fully removed, and the ovpn device is gone. Also no
>> issue if I delete the ovpn device before its netns, then everything is
>> destroyed as expected.
>>
>> I'm not sure that can be solved, at least under the current refcount
>> scheme.
>
> I went back to v12 of the patchset (which is pre-refcount-restructuring)
> and indeed the problem there doesn't exist.
>
> However, it's unclear to me how in v12 the socket release was happening
> upon netns delete. Who was triggering that? ovpn still needed to call
> sockfd_put() in order to let it go.
>
> Will investigate some more and think about a solution.

I may have done something wrong, but today I tried again and I am
reproducing this issue also on v8 + 6.11.

I am indeed wondering how it could have ever worked: if we don't delete
the peer, we don't detach the socket, hence the following chain:

  sockfd_put() -> sk_destructor() -> put_net()

does not happen.

Shouldn't we be notified somehow that the netns is going away?

For example in wireguard/device.c the socket is released in
pernet_operations.pre_exit().

But pre_exit() is invoked in cleanup_net(), which is invoked ONLY if the
net refcount has reached 0... but how can it be zero before the sockets
have been released?

I must be missing something, because this seems to be a reference loop.

I'll keep digging..

Regards,

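For context, the wireguard-style pattern referred to above looks roughly like the sketch below. The ovpn-side names are hypothetical (ovpn_release_sockets_in_net() does not exist, it just stands for "drop whatever pins sockets in this netns"); pernet_operations, .pre_exit and register_pernet_device() are the real kernel API:

#include <linux/module.h>
#include <net/net_namespace.h>

static void ovpn_pernet_pre_exit_sketch(struct net *net)
{
	/* Would close/release the transport sockets living in this netns,
	 * so their put_net() happens and cleanup can proceed.
	 */
	/* ovpn_release_sockets_in_net(net); */
}

static struct pernet_operations ovpn_pernet_ops_sketch = {
	.pre_exit = ovpn_pernet_pre_exit_sketch,
};

static int __init ovpn_pernet_sketch_init(void)
{
	return register_pernet_device(&ovpn_pernet_ops_sketch);
}
module_init(ovpn_pernet_sketch_init);

static void __exit ovpn_pernet_sketch_exit(void)
{
	unregister_pernet_device(&ovpn_pernet_ops_sketch);
}
module_exit(ovpn_pernet_sketch_exit);

MODULE_LICENSE("GPL");

As the message above already notes (and the follow-up below confirms), this hook alone does not help here: cleanup_net() only runs once the netns refcount drops to zero, and a userspace-created socket still holds one of those references.
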
On 14/02/2025 14:54, Antonio Quartulli wrote:
> For example in wireguard/device.c the socket is released in
> pernet_operations.pre_exit().
>
> But pre_exit() is invoked in cleanup_net(), which is invoked ONLY if the
> net refcount has reached 0... but how can it be zero before the sockets
> have been released?
>
> I must be missing something, because this seems to be a reference loop.

FTR, the answer is that sockets created in-kernel (like for wireguard)
have sk->sk_net_refcnt set to 0, which implies that no reference to the
netns is taken.

So ovpn has this issue because our sockets are coming from userspace.

Regards,
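To make the distinction concrete, this is roughly the logic inside sk_alloc() that the message refers to (simplified sketch, not the exact upstream code; recent kernels take the reference through a tracked helper, but the effect is the same):

#include <net/net_namespace.h>
#include <net/sock.h>

/* Simplified sketch of the netns-reference part of sk_alloc(). */
static void sk_netns_ref_sketch(struct sock *sk, struct net *net, int kern)
{
	/* kernel sockets (kern == 1), e.g. wireguard's, do not pin the netns */
	sk->sk_net_refcnt = kern ? 0 : 1;
	if (sk->sk_net_refcnt)
		get_net(net);	/* userspace-created sockets take the netns
				 * reference that is only dropped by put_net()
				 * in the socket destructor */
}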