Message ID | 20240322135732.1535772-1-edumazet@google.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 151c9c724d05d5b0dd8acd3e11cb69ef1f2dbada |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] tcp: properly terminate timers for kernel sockets | expand |
On Fri, Mar 22, 2024 at 01:57:32PM +0000, Eric Dumazet wrote: > We had various syzbot reports about tcp timers firing after > the corresponding netns has been dismantled. > > Fortunately Josef Bacik could trigger the issue more often, > and could test a patch I wrote two years ago. > > When TCP sockets are closed, we call inet_csk_clear_xmit_timers() > to 'stop' the timers. > > inet_csk_clear_xmit_timers() can be called from any context, > including when socket lock is held. > This is the reason it uses sk_stop_timer(), aka del_timer(). > This means that ongoing timers might finish much later. > > For user sockets, this is fine because each running timer > holds a reference on the socket, and the user socket holds > a reference on the netns. > > For kernel sockets, we risk that the netns is freed before > timer can complete, because kernel sockets do not hold > reference on the netns. > > This patch adds inet_csk_clear_xmit_timers_sync() function > that using sk_stop_timer_sync() to make sure all timers > are terminated before the kernel socket is released. > Modules using kernel sockets close them in their netns exit() > handler. > > Also add sock_not_owned_by_me() helper to get LOCKDEP > support : inet_csk_clear_xmit_timers_sync() must not be called > while socket lock is held. > > It is very possible we can revert in the future commit > 3a58f13a881e ("net: rds: acquire refcount on TCP sockets") > which attempted to solve the issue in rds only. > (net/smc/af_smc.c and net/mptcp/subflow.c have similar code) > > We probably can remove the check_net() tests from > tcp_out_of_resources() and __tcp_close() in the future. > Thanks so much for your help with this Eric! Josef
On Fri, 22 Mar 2024 13:57:32 +0000 Eric Dumazet wrote: > + if (!sk->sk_net_refcnt) > + inet_csk_clear_xmit_timers_sync(sk); The thought that we should clear or poison sk_net at this point (whether sk->sk_net_refcnt or not) keeps coming back to me. If we don't guarantee the pointer is valid - to make it easier for syzbot to catch invalid accesses?
On Fri, Mar 22, 2024 at 11:47 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Fri, 22 Mar 2024 13:57:32 +0000 Eric Dumazet wrote: > > + if (!sk->sk_net_refcnt) > > + inet_csk_clear_xmit_timers_sync(sk); > > The thought that we should clear or poison sk_net at this point > (whether sk->sk_net_refcnt or not) keeps coming back to me. > If we don't guarantee the pointer is valid - to make it easier > for syzbot to catch invalid accesses? I do not think we should do this here. Note that KASAN has quarantine, and can catch invalid UAF accesses anyway. We could clear the base socket in sk_prot_free() but this will not make KASAN better. diff --git a/net/core/sock.c b/net/core/sock.c index 43bf3818c19e829b47d3989d36e2e1b3bf985438..7a3ed6262a7a3c603e3964e7c1b40c82ad9c8bff 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2110,6 +2110,7 @@ static void sk_prot_free(struct proto *prot, struct sock *sk) cgroup_sk_free(&sk->sk_cgrp_data); mem_cgroup_sk_free(sk); security_sk_free(sk); + memset(sk, 0, sizeof(*sk)); if (slab != NULL) kmem_cache_free(slab, sk); else
On Sat, 23 Mar 2024 05:45:26 +0100 Eric Dumazet wrote: > > On Fri, 22 Mar 2024 13:57:32 +0000 Eric Dumazet wrote: > > > + if (!sk->sk_net_refcnt) > > > + inet_csk_clear_xmit_timers_sync(sk); > > > > The thought that we should clear or poison sk_net at this point > > (whether sk->sk_net_refcnt or not) keeps coming back to me. > > If we don't guarantee the pointer is valid - to make it easier > > for syzbot to catch invalid accesses? > > I do not think we should do this here. > > Note that KASAN has quarantine, and can catch invalid UAF accesses anyway. > > We could clear the base socket in sk_prot_free() but this will not > make KASAN better. I was thinking mostly about the kernel sockets being "special", and therefore less well exercised by syzbot. But sk_net will remain valid until all user space for that netns exists, IIUC, so I take it back, the clearing has a real chance of adding bugs.
Hello: This patch was applied to netdev/net.git (main) by Jakub Kicinski <kuba@kernel.org>: On Fri, 22 Mar 2024 13:57:32 +0000 you wrote: > We had various syzbot reports about tcp timers firing after > the corresponding netns has been dismantled. > > Fortunately Josef Bacik could trigger the issue more often, > and could test a patch I wrote two years ago. > > When TCP sockets are closed, we call inet_csk_clear_xmit_timers() > to 'stop' the timers. > > [...] Here is the summary with links: - [net] tcp: properly terminate timers for kernel sockets https://git.kernel.org/netdev/net/c/151c9c724d05 You are awesome, thank you!
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h index 9ab4bf704e864358215d2370d33d3d9668681923..ccf171f7eb60d462e0ebf49c9e876016e963ffa5 100644 --- a/include/net/inet_connection_sock.h +++ b/include/net/inet_connection_sock.h @@ -175,6 +175,7 @@ void inet_csk_init_xmit_timers(struct sock *sk, void (*delack_handler)(struct timer_list *), void (*keepalive_handler)(struct timer_list *)); void inet_csk_clear_xmit_timers(struct sock *sk); +void inet_csk_clear_xmit_timers_sync(struct sock *sk); static inline void inet_csk_schedule_ack(struct sock *sk) { diff --git a/include/net/sock.h b/include/net/sock.h index b5e00702acc1f037df7eb8ad085d00e0b18079a8..f57bfd8a2ad2deaedf3f351325ab9336ae040504 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1759,6 +1759,13 @@ static inline void sock_owned_by_me(const struct sock *sk) #endif } +static inline void sock_not_owned_by_me(const struct sock *sk) +{ +#ifdef CONFIG_LOCKDEP + WARN_ON_ONCE(lockdep_sock_is_held(sk) && debug_locks); +#endif +} + static inline bool sock_owned_by_user(const struct sock *sk) { sock_owned_by_me(sk); diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 7d8090f109ef4e794a13fb6ab5d180b16bafb59d..c038e28e2f1e66bf10c7f67ffe073e6790b2d6ce 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -771,6 +771,20 @@ void inet_csk_clear_xmit_timers(struct sock *sk) } EXPORT_SYMBOL(inet_csk_clear_xmit_timers); +void inet_csk_clear_xmit_timers_sync(struct sock *sk) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + /* ongoing timer handlers need to acquire socket lock. */ + sock_not_owned_by_me(sk); + + icsk->icsk_pending = icsk->icsk_ack.pending = 0; + + sk_stop_timer_sync(sk, &icsk->icsk_retransmit_timer); + sk_stop_timer_sync(sk, &icsk->icsk_delack_timer); + sk_stop_timer_sync(sk, &sk->sk_timer); +} + void inet_csk_delete_keepalive_timer(struct sock *sk) { sk_stop_timer(sk, &sk->sk_timer); diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index d20b62d521712ae7982b1e73fddf7d4be0df696d..e767721b3a588b5d56567ae7badf5dffcd35a76a 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2931,6 +2931,8 @@ void tcp_close(struct sock *sk, long timeout) lock_sock(sk); __tcp_close(sk, timeout); release_sock(sk); + if (!sk->sk_net_refcnt) + inet_csk_clear_xmit_timers_sync(sk); sock_put(sk); } EXPORT_SYMBOL(tcp_close);