Message ID | 20220610123319.32118-1-duoming@zju.edu.cn (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v4] net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg | expand |
On Fri, 2022-06-10 at 20:33 +0800, Duoming Zhou wrote: > The skb_recv_datagram() in ax25_recvmsg() will hold lock_sock > and block until it receives a packet from the remote. If the client > doesn`t connect to server and calls read() directly, it will not > receive any packets forever. As a result, the deadlock will happen. > > The fail log caused by deadlock is shown below: > > [ 369.606973] INFO: task ax25_deadlock:157 blocked for more than 245 seconds. > [ 369.608919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 369.613058] Call Trace: > [ 369.613315] <TASK> > [ 369.614072] __schedule+0x2f9/0xb20 > [ 369.615029] schedule+0x49/0xb0 > [ 369.615734] __lock_sock+0x92/0x100 > [ 369.616763] ? destroy_sched_domains_rcu+0x20/0x20 > [ 369.617941] lock_sock_nested+0x6e/0x70 > [ 369.618809] ax25_bind+0xaa/0x210 > [ 369.619736] __sys_bind+0xca/0xf0 > [ 369.620039] ? do_futex+0xae/0x1b0 > [ 369.620387] ? __x64_sys_futex+0x7c/0x1c0 > [ 369.620601] ? fpregs_assert_state_consistent+0x19/0x40 > [ 369.620613] __x64_sys_bind+0x11/0x20 > [ 369.621791] do_syscall_64+0x3b/0x90 > [ 369.622423] entry_SYSCALL_64_after_hwframe+0x46/0xb0 > [ 369.623319] RIP: 0033:0x7f43c8aa8af7 > [ 369.624301] RSP: 002b:00007f43c8197ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000031 > [ 369.625756] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f43c8aa8af7 > [ 369.626724] RDX: 0000000000000010 RSI: 000055768e2021d0 RDI: 0000000000000005 > [ 369.628569] RBP: 00007f43c8197f00 R08: 0000000000000011 R09: 00007f43c8198700 > [ 369.630208] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff845e6afe > [ 369.632240] R13: 00007fff845e6aff R14: 00007f43c8197fc0 R15: 00007f43c8198700 > > This patch replaces skb_recv_datagram() with an open-coded variant of it > releasing the socket lock before the __skb_wait_for_more_packets() call > and re-acquiring it after such call in order that other functions that > need socket lock could be executed. > > what's more, the socket lock will be released only when recvmsg() will > block and that should produce nicer overall behavior. > > Fixes: 40d0a923f55a ("Implement locking of internal data for NET/ROM and ROSE.") The correct tag is actually: Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") We don't pick hashes from historicaly repository, as they would confuse most/all users. Please re-post with the correct tag. Note that you can check patchwork yourself for this kind of formal errors. The patch LGTM, when you repost you can append: Acked-by: Paolo Abeni <pabeni@redhat.com> Thanks, Paolo
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c index 95393bb2760..4c7030ed8d3 100644 --- a/net/ax25/af_ax25.c +++ b/net/ax25/af_ax25.c @@ -1661,9 +1661,12 @@ static int ax25_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, int flags) { struct sock *sk = sock->sk; - struct sk_buff *skb; + struct sk_buff *skb, *last; + struct sk_buff_head *sk_queue; int copied; int err = 0; + int off = 0; + long timeo; lock_sock(sk); /* @@ -1675,10 +1678,29 @@ static int ax25_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, goto out; } - /* Now we can treat all alike */ - skb = skb_recv_datagram(sk, flags, &err); - if (skb == NULL) - goto out; + /* We need support for non-blocking reads. */ + sk_queue = &sk->sk_receive_queue; + skb = __skb_try_recv_datagram(sk, sk_queue, flags, &off, &err, &last); + /* If no packet is available, release_sock(sk) and try again. */ + if (!skb) { + if (err != -EAGAIN) + goto out; + release_sock(sk); + timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT); + while (timeo && !__skb_wait_for_more_packets(sk, sk_queue, &err, + &timeo, last)) { + skb = __skb_try_recv_datagram(sk, sk_queue, flags, &off, + &err, &last); + if (skb) + break; + + if (err != -EAGAIN) + goto done; + } + if (!skb) + goto done; + lock_sock(sk); + } if (!sk_to_ax25(sk)->pidincl) skb_pull(skb, 1); /* Remove PID */ @@ -1725,6 +1747,7 @@ static int ax25_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, out: release_sock(sk); +done: return err; }
The skb_recv_datagram() in ax25_recvmsg() will hold lock_sock and block until it receives a packet from the remote. If the client doesn`t connect to server and calls read() directly, it will not receive any packets forever. As a result, the deadlock will happen. The fail log caused by deadlock is shown below: [ 369.606973] INFO: task ax25_deadlock:157 blocked for more than 245 seconds. [ 369.608919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 369.613058] Call Trace: [ 369.613315] <TASK> [ 369.614072] __schedule+0x2f9/0xb20 [ 369.615029] schedule+0x49/0xb0 [ 369.615734] __lock_sock+0x92/0x100 [ 369.616763] ? destroy_sched_domains_rcu+0x20/0x20 [ 369.617941] lock_sock_nested+0x6e/0x70 [ 369.618809] ax25_bind+0xaa/0x210 [ 369.619736] __sys_bind+0xca/0xf0 [ 369.620039] ? do_futex+0xae/0x1b0 [ 369.620387] ? __x64_sys_futex+0x7c/0x1c0 [ 369.620601] ? fpregs_assert_state_consistent+0x19/0x40 [ 369.620613] __x64_sys_bind+0x11/0x20 [ 369.621791] do_syscall_64+0x3b/0x90 [ 369.622423] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 369.623319] RIP: 0033:0x7f43c8aa8af7 [ 369.624301] RSP: 002b:00007f43c8197ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000031 [ 369.625756] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f43c8aa8af7 [ 369.626724] RDX: 0000000000000010 RSI: 000055768e2021d0 RDI: 0000000000000005 [ 369.628569] RBP: 00007f43c8197f00 R08: 0000000000000011 R09: 00007f43c8198700 [ 369.630208] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff845e6afe [ 369.632240] R13: 00007fff845e6aff R14: 00007f43c8197fc0 R15: 00007f43c8198700 This patch replaces skb_recv_datagram() with an open-coded variant of it releasing the socket lock before the __skb_wait_for_more_packets() call and re-acquiring it after such call in order that other functions that need socket lock could be executed. what's more, the socket lock will be released only when recvmsg() will block and that should produce nicer overall behavior. Fixes: 40d0a923f55a ("Implement locking of internal data for NET/ROM and ROSE.") Suggested-by: Thomas Osterried <thomas@osterried.de> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reported-by: Thomas Habets <thomas@@habets.se> --- Changes in v4: - Replaces skb_recv_datagram() with an open-coded variant of it. net/ax25/af_ax25.c | 33 ++++++++++++++++++++++++++++----- 1 file changed, 28 insertions(+), 5 deletions(-)