diff mbox series

[net] tcp: properly terminate timers for kernel sockets

Message ID 20240322135732.1535772-1-edumazet@google.com (mailing list archive)
State Accepted
Commit 151c9c724d05d5b0dd8acd3e11cb69ef1f2dbada
Delegated to: Netdev Maintainers
Headers show
Series [net] tcp: properly terminate timers for kernel sockets | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3414 this patch: 3414
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers fail 1 blamed authors not CCed: ebiederm@xmission.com; 3 maintainers not CCed: ebiederm@xmission.com dsahern@kernel.org bpf@vger.kernel.org
netdev/build_clang success Errors and warnings before: 999 this patch: 999
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 3604 this patch: 3604
netdev/checkpatch warning CHECK: multiple assignments should be avoided
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-03-23--00-00 (tests: 943)

Commit Message

Eric Dumazet March 22, 2024, 1:57 p.m. UTC
We had various syzbot reports about tcp timers firing after
the corresponding netns has been dismantled.

Fortunately Josef Bacik could trigger the issue more often,
and could test a patch I wrote two years ago.

When TCP sockets are closed, we call inet_csk_clear_xmit_timers()
to 'stop' the timers.

inet_csk_clear_xmit_timers() can be called from any context,
including when socket lock is held.
This is the reason it uses sk_stop_timer(), aka del_timer().
This means that ongoing timers might finish much later.

For user sockets, this is fine because each running timer
holds a reference on the socket, and the user socket holds
a reference on the netns.

For kernel sockets, we risk that the netns is freed before
timer can complete, because kernel sockets do not hold
reference on the netns.

This patch adds inet_csk_clear_xmit_timers_sync() function
that using sk_stop_timer_sync() to make sure all timers
are terminated before the kernel socket is released.
Modules using kernel sockets close them in their netns exit()
handler.

Also add sock_not_owned_by_me() helper to get LOCKDEP
support : inet_csk_clear_xmit_timers_sync() must not be called
while socket lock is held.

It is very possible we can revert in the future commit
3a58f13a881e ("net: rds: acquire refcount on TCP sockets")
which attempted to solve the issue in rds only.
(net/smc/af_smc.c and net/mptcp/subflow.c have similar code)

We probably can remove the check_net() tests from
tcp_out_of_resources() and __tcp_close() in the future.

Reported-by: Josef Bacik <josef@toxicpanda.com>
Closes: https://lore.kernel.org/netdev/20240314210740.GA2823176@perftesting/
Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
Fixes: 8a68173691f0 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket")
Link: https://lore.kernel.org/bpf/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsatjw@mail.gmail.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Josef Bacik <josef@toxicpanda.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 include/net/inet_connection_sock.h |  1 +
 include/net/sock.h                 |  7 +++++++
 net/ipv4/inet_connection_sock.c    | 14 ++++++++++++++
 net/ipv4/tcp.c                     |  2 ++
 4 files changed, 24 insertions(+)

Comments

Josef Bacik March 22, 2024, 2:58 p.m. UTC | #1
On Fri, Mar 22, 2024 at 01:57:32PM +0000, Eric Dumazet wrote:
> We had various syzbot reports about tcp timers firing after
> the corresponding netns has been dismantled.
> 
> Fortunately Josef Bacik could trigger the issue more often,
> and could test a patch I wrote two years ago.
> 
> When TCP sockets are closed, we call inet_csk_clear_xmit_timers()
> to 'stop' the timers.
> 
> inet_csk_clear_xmit_timers() can be called from any context,
> including when socket lock is held.
> This is the reason it uses sk_stop_timer(), aka del_timer().
> This means that ongoing timers might finish much later.
> 
> For user sockets, this is fine because each running timer
> holds a reference on the socket, and the user socket holds
> a reference on the netns.
> 
> For kernel sockets, we risk that the netns is freed before
> timer can complete, because kernel sockets do not hold
> reference on the netns.
> 
> This patch adds inet_csk_clear_xmit_timers_sync() function
> that using sk_stop_timer_sync() to make sure all timers
> are terminated before the kernel socket is released.
> Modules using kernel sockets close them in their netns exit()
> handler.
> 
> Also add sock_not_owned_by_me() helper to get LOCKDEP
> support : inet_csk_clear_xmit_timers_sync() must not be called
> while socket lock is held.
> 
> It is very possible we can revert in the future commit
> 3a58f13a881e ("net: rds: acquire refcount on TCP sockets")
> which attempted to solve the issue in rds only.
> (net/smc/af_smc.c and net/mptcp/subflow.c have similar code)
> 
> We probably can remove the check_net() tests from
> tcp_out_of_resources() and __tcp_close() in the future.
> 

Thanks so much for your help with this Eric!

Josef
Jakub Kicinski March 22, 2024, 10:47 p.m. UTC | #2
On Fri, 22 Mar 2024 13:57:32 +0000 Eric Dumazet wrote:
> +	if (!sk->sk_net_refcnt)
> +		inet_csk_clear_xmit_timers_sync(sk);

The thought that we should clear or poison sk_net at this point
(whether sk->sk_net_refcnt or not) keeps coming back to me.
If we don't guarantee the pointer is valid - to make it easier
for syzbot to catch invalid accesses?
Eric Dumazet March 23, 2024, 4:45 a.m. UTC | #3
On Fri, Mar 22, 2024 at 11:47 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 22 Mar 2024 13:57:32 +0000 Eric Dumazet wrote:
> > +     if (!sk->sk_net_refcnt)
> > +             inet_csk_clear_xmit_timers_sync(sk);
>
> The thought that we should clear or poison sk_net at this point
> (whether sk->sk_net_refcnt or not) keeps coming back to me.
> If we don't guarantee the pointer is valid - to make it easier
> for syzbot to catch invalid accesses?

I do not think we should do this here.

Note that KASAN has quarantine, and can catch invalid UAF accesses anyway.

We could clear the base socket in sk_prot_free() but this will not
make KASAN better.

diff --git a/net/core/sock.c b/net/core/sock.c
index 43bf3818c19e829b47d3989d36e2e1b3bf985438..7a3ed6262a7a3c603e3964e7c1b40c82ad9c8bff
100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2110,6 +2110,7 @@ static void sk_prot_free(struct proto *prot,
struct sock *sk)
        cgroup_sk_free(&sk->sk_cgrp_data);
        mem_cgroup_sk_free(sk);
        security_sk_free(sk);
+       memset(sk, 0, sizeof(*sk));
        if (slab != NULL)
                kmem_cache_free(slab, sk);
        else
Jakub Kicinski March 25, 2024, 1:34 p.m. UTC | #4
On Sat, 23 Mar 2024 05:45:26 +0100 Eric Dumazet wrote:
> > On Fri, 22 Mar 2024 13:57:32 +0000 Eric Dumazet wrote:  
> > > +     if (!sk->sk_net_refcnt)
> > > +             inet_csk_clear_xmit_timers_sync(sk);  
> >
> > The thought that we should clear or poison sk_net at this point
> > (whether sk->sk_net_refcnt or not) keeps coming back to me.
> > If we don't guarantee the pointer is valid - to make it easier
> > for syzbot to catch invalid accesses?  
> 
> I do not think we should do this here.
> 
> Note that KASAN has quarantine, and can catch invalid UAF accesses anyway.
> 
> We could clear the base socket in sk_prot_free() but this will not
> make KASAN better.

I was thinking mostly about the kernel sockets being "special",
and therefore less well exercised by syzbot. But sk_net will
remain valid until all user space for that netns exists, IIUC,
so I take it back, the clearing has a real chance of adding bugs.
patchwork-bot+netdevbpf@kernel.org March 26, 2024, 3:10 a.m. UTC | #5
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 22 Mar 2024 13:57:32 +0000 you wrote:
> We had various syzbot reports about tcp timers firing after
> the corresponding netns has been dismantled.
> 
> Fortunately Josef Bacik could trigger the issue more often,
> and could test a patch I wrote two years ago.
> 
> When TCP sockets are closed, we call inet_csk_clear_xmit_timers()
> to 'stop' the timers.
> 
> [...]

Here is the summary with links:
  - [net] tcp: properly terminate timers for kernel sockets
    https://git.kernel.org/netdev/net/c/151c9c724d05

You are awesome, thank you!
diff mbox series

Patch

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 9ab4bf704e864358215d2370d33d3d9668681923..ccf171f7eb60d462e0ebf49c9e876016e963ffa5 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -175,6 +175,7 @@  void inet_csk_init_xmit_timers(struct sock *sk,
 			       void (*delack_handler)(struct timer_list *),
 			       void (*keepalive_handler)(struct timer_list *));
 void inet_csk_clear_xmit_timers(struct sock *sk);
+void inet_csk_clear_xmit_timers_sync(struct sock *sk);
 
 static inline void inet_csk_schedule_ack(struct sock *sk)
 {
diff --git a/include/net/sock.h b/include/net/sock.h
index b5e00702acc1f037df7eb8ad085d00e0b18079a8..f57bfd8a2ad2deaedf3f351325ab9336ae040504 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1759,6 +1759,13 @@  static inline void sock_owned_by_me(const struct sock *sk)
 #endif
 }
 
+static inline void sock_not_owned_by_me(const struct sock *sk)
+{
+#ifdef CONFIG_LOCKDEP
+	WARN_ON_ONCE(lockdep_sock_is_held(sk) && debug_locks);
+#endif
+}
+
 static inline bool sock_owned_by_user(const struct sock *sk)
 {
 	sock_owned_by_me(sk);
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 7d8090f109ef4e794a13fb6ab5d180b16bafb59d..c038e28e2f1e66bf10c7f67ffe073e6790b2d6ce 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -771,6 +771,20 @@  void inet_csk_clear_xmit_timers(struct sock *sk)
 }
 EXPORT_SYMBOL(inet_csk_clear_xmit_timers);
 
+void inet_csk_clear_xmit_timers_sync(struct sock *sk)
+{
+	struct inet_connection_sock *icsk = inet_csk(sk);
+
+	/* ongoing timer handlers need to acquire socket lock. */
+	sock_not_owned_by_me(sk);
+
+	icsk->icsk_pending = icsk->icsk_ack.pending = 0;
+
+	sk_stop_timer_sync(sk, &icsk->icsk_retransmit_timer);
+	sk_stop_timer_sync(sk, &icsk->icsk_delack_timer);
+	sk_stop_timer_sync(sk, &sk->sk_timer);
+}
+
 void inet_csk_delete_keepalive_timer(struct sock *sk)
 {
 	sk_stop_timer(sk, &sk->sk_timer);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index d20b62d521712ae7982b1e73fddf7d4be0df696d..e767721b3a588b5d56567ae7badf5dffcd35a76a 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2931,6 +2931,8 @@  void tcp_close(struct sock *sk, long timeout)
 	lock_sock(sk);
 	__tcp_close(sk, timeout);
 	release_sock(sk);
+	if (!sk->sk_net_refcnt)
+		inet_csk_clear_xmit_timers_sync(sk);
 	sock_put(sk);
 }
 EXPORT_SYMBOL(tcp_close);