Message ID | 20250228055106.58071-2-jiayuan.chen@linux.dev (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | bpf: Fix use-after-free of sockmap | expand |
On 2/28/25 06:51, Jiayuan Chen wrote: > ... > static void sk_psock_verdict_data_ready(struct sock *sk) > { > - struct socket *sock = sk->sk_socket; > + struct socket *sock; > const struct proto_ops *ops; > int copied; > > trace_sk_data_ready(sk); > > + /* We need RCU to prevent the sk_socket from being released. > + * Especially for Unix sockets, we are currently in the process > + * context and do not have RCU protection. > + */ > + rcu_read_lock(); > + sock = sk->sk_socket; > if (unlikely(!sock)) > - return; > + goto unlock; > + > ops = READ_ONCE(sock->ops); > if (!ops || !ops->read_skb) > - return; > + goto unlock; > + > copied = ops->read_skb(sk, sk_psock_verdict_recv); > if (copied >= 0) { > struct sk_psock *psock; > > - rcu_read_lock(); > psock = sk_psock(sk); > if (psock) > sk_psock_data_ready(sk, psock); > - rcu_read_unlock(); > } > +unlock: > + rcu_read_unlock(); > } Hi, Doesn't sk_psock_handle_skb() (!ingress path) have the same `struct socket` release race issue? Any plans on fixing that one, too? BTW, lockdep (CONFIG_LOCKDEP=y) complains about calling AF_UNIX's read_skb() under RCU read lock. Thanks, Michal
March 7, 2025 at 5:45 PM, "Michal Luczaj" <mhal@rbox.co> wrote: > > On 2/28/25 06:51, Jiayuan Chen wrote: > > > > > ... > > > > static void sk_psock_verdict_data_ready(struct sock *sk) > > > > { > > > > - struct socket *sock = sk->sk_socket; > > > > + struct socket *sock; > > > > const struct proto_ops *ops; > > > > int copied; > > > > > > > > trace_sk_data_ready(sk); > > > > > > > > + /* We need RCU to prevent the sk_socket from being released. > > > > + * Especially for Unix sockets, we are currently in the process > > > > + * context and do not have RCU protection. > > > > + */ > > > > + rcu_read_lock(); > > > > + sock = sk->sk_socket; > > > > if (unlikely(!sock)) > > > > - return; > > > > + goto unlock; > > > > + > > > > ops = READ_ONCE(sock->ops); > > > > if (!ops || !ops->read_skb) > > > > - return; > > > > + goto unlock; > > > > + > > > > copied = ops->read_skb(sk, sk_psock_verdict_recv); > > > > if (copied >= 0) { > > > > struct sk_psock *psock; > > > > > > > > - rcu_read_lock(); > > > > psock = sk_psock(sk); > > > > if (psock) > > > > sk_psock_data_ready(sk, psock); > > > > - rcu_read_unlock(); > > > > } > > > > +unlock: > > > > + rcu_read_unlock(); > > > > } > > > > Hi, > > Doesn't sk_psock_handle_skb() (!ingress path) have the same `struct socket` > > release race issue? Any plans on fixing that one, too? Yes, the send path logic also has similar issues, and after some hacking, I was able to reproduce it. Thanks for providing this information. I can fix these together in the next revision of this patchset, anyway, this patchset still needs further confirmation from the maintainer. > > BTW, lockdep (CONFIG_LOCKDEP=y) complains about calling AF_UNIX's > > read_skb() under RCU read lock. > > Thanks, > > Michal > My environment also has LOCKDEP enabled, but I didn't see similar warnings. Moreover, RCU assertions are typically written as: WARN_ON_ONCE(!rcu_read_lock_held()) And when LOCKDEP is not enabled, rcu_read_lock_held() defaults to returning 1. So, it's unlikely to trigger a warning due to an RCU lock being held. Could you provide more of the call stack? Thanks.
On 3/10/25 12:36, Jiayuan Chen wrote: > March 7, 2025 at 5:45 PM, "Michal Luczaj" <mhal@rbox.co> wrote: > ... >> BTW, lockdep (CONFIG_LOCKDEP=y) complains about calling AF_UNIX's >> read_skb() under RCU read lock. >> > My environment also has LOCKDEP enabled, but I didn't see similar > warnings. > Moreover, RCU assertions are typically written as: > > WARN_ON_ONCE(!rcu_read_lock_held()) > > And when LOCKDEP is not enabled, rcu_read_lock_held() defaults to > returning 1. So, it's unlikely to trigger a warning due to an RCU lock > being held. > > Could you provide more of the call stack? Sure, bpf-next with this series applied, test_progs -t sockmap_basic: ============================= [ BUG: Invalid wait context ] 6.14.0-rc3+ #111 Tainted: G OE ----------------------------- test_progs/37755 is trying to lock: ffff88810d9bc3c0 (&u->iolock){+.+.}-{4:4}, at: unix_stream_read_skb+0x30/0x120 other info that might help us debug this: context-{5:5} 1 lock held by test_progs/37755: #0: ffffffff833700e0 (rcu_read_lock){....}-{1:3}, at: sk_psock_verdict_data_ready+0x3e/0x2a0 stack backtrace: CPU: 13 UID: 0 PID: 37755 Comm: test_progs Tainted: G OE 6.14.0-rc3+ #111 Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 Call Trace: dump_stack_lvl+0x68/0x90 lock_acquire+0xcf/0x2e0 __mutex_lock+0x9c/0xcc0 unix_stream_read_skb+0x30/0x120 sk_psock_verdict_data_ready+0x8d/0x2a0 unix_stream_sendmsg+0x232/0x640 __sys_sendto+0x1cd/0x1e0 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x93/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e
March 10, 2025 at 9:08 PM, "Michal Luczaj" <mhal@rbox.co> wrote: > > On 3/10/25 12:36, Jiayuan Chen wrote: > > > > > March 7, 2025 at 5:45 PM, "Michal Luczaj" <mhal@rbox.co> wrote: > > > > ... > > > > > > > > BTW, lockdep (CONFIG_LOCKDEP=y) complains about calling AF_UNIX's > > > > > > read_skb() under RCU read lock. > > > > > > > My environment also has LOCKDEP enabled, but I didn't see similar > > > > warnings. > > > > Moreover, RCU assertions are typically written as: > > > > > > > > WARN_ON_ONCE(!rcu_read_lock_held()) > > > > > > > > And when LOCKDEP is not enabled, rcu_read_lock_held() defaults to > > > > returning 1. So, it's unlikely to trigger a warning due to an RCU lock > > > > being held. > > > > > > > > Could you provide more of the call stack? > > > > Sure, bpf-next with this series applied, test_progs -t sockmap_basic: > > ============================= > > [ BUG: Invalid wait context ] > > 6.14.0-rc3+ #111 Tainted: G OE > > ----------------------------- > > test_progs/37755 is trying to lock: > > ffff88810d9bc3c0 (&u->iolock){+.+.}-{4:4}, at: unix_stream_read_skb+0x30/0x120 > > other info that might help us debug this: > > context-{5:5} > > 1 lock held by test_progs/37755: > > #0: ffffffff833700e0 (rcu_read_lock){....}-{1:3}, at: sk_psock_verdict_data_ready+0x3e/0x2a0 > > stack backtrace: > > CPU: 13 UID: 0 PID: 37755 Comm: test_progs Tainted: G OE 6.14.0-rc3+ #111 > > Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 > > Call Trace: > > dump_stack_lvl+0x68/0x90 > > lock_acquire+0xcf/0x2e0 > > __mutex_lock+0x9c/0xcc0 > > unix_stream_read_skb+0x30/0x120 > > sk_psock_verdict_data_ready+0x8d/0x2a0 > > unix_stream_sendmsg+0x232/0x640 > > __sys_sendto+0x1cd/0x1e0 > > __x64_sys_sendto+0x20/0x30 > > do_syscall_64+0x93/0x180 > > entry_SYSCALL_64_after_hwframe+0x76/0x7e > Thanks, I got this stack too after enabling CONFIG_PROVE_LOCKING. It seems that we can't call sleepable lock such as mutex_lock under rcu-locked context. I'm working on it.
diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 0ddc4c718833..1b71ae1d1bf5 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -1222,27 +1222,35 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb) static void sk_psock_verdict_data_ready(struct sock *sk) { - struct socket *sock = sk->sk_socket; + struct socket *sock; const struct proto_ops *ops; int copied; trace_sk_data_ready(sk); + /* We need RCU to prevent the sk_socket from being released. + * Especially for Unix sockets, we are currently in the process + * context and do not have RCU protection. + */ + rcu_read_lock(); + sock = sk->sk_socket; if (unlikely(!sock)) - return; + goto unlock; + ops = READ_ONCE(sock->ops); if (!ops || !ops->read_skb) - return; + goto unlock; + copied = ops->read_skb(sk, sk_psock_verdict_recv); if (copied >= 0) { struct sk_psock *psock; - rcu_read_lock(); psock = sk_psock(sk); if (psock) sk_psock_data_ready(sk, psock); - rcu_read_unlock(); } +unlock: + rcu_read_unlock(); } void sk_psock_start_verdict(struct sock *sk, struct sk_psock *psock)