Message ID | 20220821125751.4185-1-yin31149@gmail.com
---|---
State | Awaiting Upstream
Delegated to: | Netdev Maintainers
Series | rxrpc: fix bad unlock balance in rxrpc_do_sendmsg
On Sun, Aug 21, 2022 at 6:58 PM Hawkins Jiawei <yin31149@gmail.com> wrote:
>
> Syzkaller reports a bad unlock balance bug as follows:
> ------------[ cut here ]------------
> WARNING: bad unlock balance detected!
> syz-executor.0/4094 is trying to release lock (&call->user_mutex) at:
> [<ffffffff87c1d8d1>] rxrpc_do_sendmsg+0x851/0x1110 net/rxrpc/sendmsg.c:754
> but there are no more locks to release!
>
> other info that might help us debug this:
> no locks held by syz-executor.0/4094.
>
> stack backtrace:
> [...]
> Call Trace:
>  <TASK>
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0x57/0x7d lib/dump_stack.c:106
>  print_unlock_imbalance_bug include/trace/events/lock.h:69 [inline]
>  __lock_release kernel/locking/lockdep.c:5333 [inline]
>  lock_release.cold+0x49/0x4e kernel/locking/lockdep.c:5686
>  __mutex_unlock_slowpath+0x99/0x5e0 kernel/locking/mutex.c:907
>  rxrpc_do_sendmsg+0x851/0x1110 net/rxrpc/sendmsg.c:754
>  sock_sendmsg_nosec net/socket.c:714 [inline]
>  sock_sendmsg+0xab/0xe0 net/socket.c:734
>  ____sys_sendmsg+0x5c2/0x7a0 net/socket.c:2485
>  ___sys_sendmsg+0xdb/0x160 net/socket.c:2539
>  __sys_sendmsg+0xc3/0x160 net/socket.c:2568
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [...]
>  </TASK>
> ------------------------------------
>
> When the kernel sends a message through an RxRPC socket in
> rxrpc_do_sendmsg(), it must hold the call->user_mutex lock; otherwise
> a bug is triggered when that lock is released before returning from
> rxrpc_do_sendmsg().
>
> The problem is that during rxrpc_do_sendmsg(), the kernel may call
> rxrpc_wait_for_tx_window_intr() to wait for space to appear in the
> tx queue or for a signal to occur. If mutex_lock_interruptible()
> fails there, the kernel returns from rxrpc_wait_for_tx_window_intr()
> without holding the mutex, and then triggers the bug when releasing
> the mutex in rxrpc_do_sendmsg().
>
> This patch solves it by reacquiring the call->user_mutex lock when
> mutex_lock_interruptible() fails, before returning from
> rxrpc_wait_for_tx_window_intr().
>
> Reported-and-tested-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com
> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
> ---
>  net/rxrpc/sendmsg.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
> index 1d38e279e2ef..e13043d357d5 100644
> --- a/net/rxrpc/sendmsg.c
> +++ b/net/rxrpc/sendmsg.c
> @@ -53,8 +53,10 @@ static int rxrpc_wait_for_tx_window_intr(struct rxrpc_sock *rx,
>                 trace_rxrpc_transmit(call, rxrpc_transmit_wait);
>                 mutex_unlock(&call->user_mutex);
>                 *timeo = schedule_timeout(*timeo);
> -               if (mutex_lock_interruptible(&call->user_mutex) < 0)
> +               if (mutex_lock_interruptible(&call->user_mutex) < 0) {
> +                       mutex_lock(&call->user_mutex);

The interruptible version fails to acquire the lock. So why is it okay to
force it to acquire the mutex_lock since we are in the interrupt context?

>                         return sock_intr_errno(*timeo);
> +               }
>         }
>  }

thanks,
 -- Khalid Masum
On Sun, Aug 21, 2022 at 9:58 PM Khalid Masum <khalid.masum.92@gmail.com> wrote:
>
> On Sun, Aug 21, 2022 at 6:58 PM Hawkins Jiawei <yin31149@gmail.com> wrote:
> >
> The interruptible version fails to acquire the lock. So why is it okay to
> force it to acquire the mutex_lock since we are in the interrupt context?

Sorry, I mean, won't the function lose its ability of being interruptible?
Since we are forcing it to acquire the lock.

> >                         return sock_intr_errno(*timeo);
> > +               }
> >         }
> >  }
>
> thanks,
>  -- Khalid Masum
On Mon, 22 Aug 2022 at 00:42, Khalid Masum <khalid.masum.92@gmail.com> wrote:
>
> On Sun, Aug 21, 2022 at 9:58 PM Khalid Masum <khalid.masum.92@gmail.com> wrote:
> >
> > On Sun, Aug 21, 2022 at 6:58 PM Hawkins Jiawei <yin31149@gmail.com> wrote:
> > >
> > The interruptible version fails to acquire the lock. So why is it okay to
> > force it to acquire the mutex_lock since we are in the interrupt context?
>
> Sorry, I mean, won't the function lose its ability of being interruptible?
> Since we are forcing it to acquire the lock.
>
> > >                         return sock_intr_errno(*timeo);
> > > +               }
> > >         }
> > >  }
> >
> > thanks,
> >  -- Khalid Masum

Hi Khalid,

In my opinion, the _intr in rxrpc_wait_for_tx_window_intr() seems to mean
that the loop in the function should be interrupted when a signal
arrives (please correct me if I am wrong):

> /*
>  * Wait for space to appear in the Tx queue or a signal to occur.
>  */
> static int rxrpc_wait_for_tx_window_intr(struct rxrpc_sock *rx,
>                                          struct rxrpc_call *call,
>                                          long *timeo)
> {
>         for (;;) {
>                 set_current_state(TASK_INTERRUPTIBLE);
>                 if (rxrpc_check_tx_space(call, NULL))
>                         return 0;
>
>                 if (call->state >= RXRPC_CALL_COMPLETE)
>                         return call->error;
>
>                 if (signal_pending(current))
>                         return sock_intr_errno(*timeo);
>
>                 trace_rxrpc_transmit(call, rxrpc_transmit_wait);
>                 mutex_unlock(&call->user_mutex);
>                 *timeo = schedule_timeout(*timeo);
>                 if (mutex_lock_interruptible(&call->user_mutex) < 0)
>                         return sock_intr_errno(*timeo);
>         }
> }

To be more specific, when a signal arrives,
rxrpc_wait_for_tx_window_intr() notices it when mutex_lock_interruptible()
returns a non-zero value, at which point the function is interrupted,
meaning it returns. So I think acquiring the mutex with an extra
mutex_lock() won't affect its ability to be interrupted (please correct
me if I am wrong).

What's more, when the kernel returns from
rxrpc_wait_for_tx_window_intr(), it only handles the error case before
unlocking the call->user_mutex, which should not take long. So it seems
OK to me to acquire the call->user_mutex when
rxrpc_wait_for_tx_window_intr() is interrupted by a signal.

On Mon, 22 Aug 2022 at 03:18, Khalid Masum <khalid.masum.92@gmail.com> wrote:
>
> Maybe we do not need to lock since no other timer_schedule needs
> it.
>
> Test if this fixes the issue.
> ---
> diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
> index 1d38e279e2ef..640e2ab2cc35 100644
> --- a/net/rxrpc/sendmsg.c
> +++ b/net/rxrpc/sendmsg.c
> @@ -51,10 +51,8 @@ static int rxrpc_wait_for_tx_window_intr(struct rxrpc_sock *rx,
>                         return sock_intr_errno(*timeo);
>
>                 trace_rxrpc_transmit(call, rxrpc_transmit_wait);
> -               mutex_unlock(&call->user_mutex);
>                 *timeo = schedule_timeout(*timeo);
> -               if (mutex_lock_interruptible(&call->user_mutex) < 0)
> -                       return sock_intr_errno(*timeo);
> +               return sock_intr_errno(*timeo);
>         }
>  }
>
> --
> 2.37.1
>

If it is still improper to patch this bug by acquiring the
call->user_mutex, I wonder if it would be better to check before
unlocking the lock in rxrpc_do_sendmsg(), because the kernel always
unlocks the call->user_mutex at the end of rxrpc_do_sendmsg():

> int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len)
>         __releases(&rx->sk.sk_lock.slock)
>         __releases(&call->user_mutex)
> {
>         ...
> out_put_unlock:
>         mutex_unlock(&call->user_mutex);
> error_put:
>         rxrpc_put_call(call, rxrpc_call_put);
>         _leave(" = %d", ret);
>         return ret;
>
> error_release_sock:
>         release_sock(&rx->sk);
>         return ret;
> }
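To make the pattern under discussion concrete, here is a minimal kernel-style
sketch with illustrative names rather than the real rxrpc symbols: the wait is
interruptible, but the function must return to its caller with the mutex held,
so the signal error is propagated either way and only the final re-lock is
uninterruptible.

        #include <linux/mutex.h>
        #include <linux/sched.h>
        #include <linux/sched/signal.h>
        #include <net/sock.h>

        /*
         * Minimal sketch, assuming a caller that unconditionally unlocks
         * "lock" after we return (as rxrpc_do_sendmsg() does with
         * call->user_mutex).  Illustrative only; not the real rxrpc code.
         */
        static int example_wait_interruptible_locked(struct mutex *lock,
                                                     bool (*cond)(void *),
                                                     void *data, long *timeo)
        {
                for (;;) {
                        set_current_state(TASK_INTERRUPTIBLE);
                        if (cond(data)) {
                                __set_current_state(TASK_RUNNING);
                                return 0;                       /* lock still held */
                        }
                        if (signal_pending(current)) {
                                __set_current_state(TASK_RUNNING);
                                return sock_intr_errno(*timeo); /* lock still held */
                        }

                        mutex_unlock(lock);
                        *timeo = schedule_timeout(*timeo);
                        if (mutex_lock_interruptible(lock) < 0) {
                                /* A signal arrived while retaking the lock.
                                 * The caller will still unlock, so reacquire
                                 * the mutex (briefly uninterruptible) and
                                 * report the signal. */
                                mutex_lock(lock);
                                return sock_intr_errno(*timeo);
                        }
                }
        }

The signal still terminates the wait; what stops being interruptible is only
the final, normally short, reacquisition of the mutex, which is the trade-off
Khalid is asking about.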
Hawkins Jiawei <yin31149@gmail.com> wrote:

> -               if (mutex_lock_interruptible(&call->user_mutex) < 0)
> +               if (mutex_lock_interruptible(&call->user_mutex) < 0) {
> +                       mutex_lock(&call->user_mutex);

Yeah, as Khalid points out that kind of makes the interruptible lock
pointless. Either rxrpc_send_data() needs to return a separate indication
that we returned without the lock held or it needs to always drop the lock on
error (at least for ERESTARTSYS/EINTR) which can be checked for higher up.

David
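For illustration, a rough sketch of the second option David mentions, under
the assumption that the signal error propagates unchanged up to
rxrpc_do_sendmsg(); the messages below explain why that assumption is shaky,
since rxrpc_send_data() can return a byte count instead of the error.

        /* Hypothetical sketch of "always drop the lock on signal error".
         * In rxrpc_wait_for_tx_window_intr(), simply return unlocked: */
                *timeo = schedule_timeout(*timeo);
                if (mutex_lock_interruptible(&call->user_mutex) < 0)
                        return sock_intr_errno(*timeo); /* returns unlocked */

        /* ...and higher up, in rxrpc_do_sendmsg(), treat the signal
         * errors as "the lock is already gone": */
                ret = rxrpc_send_data(rx, call, msg, len, NULL);
                if (ret == -ERESTARTSYS || ret == -EINTR)
                        goto error_put;         /* skip mutex_unlock() */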
Actually, there's another bug here too: if rxrpc_wait_for_tx_window() drops
the call mutex then it needs to reload the pending packet state in
rxrpc_send_data() as it may have raced with another sendmsg().

David
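As a rough sketch of that second point (assuming call->tx_pending is the
pending-packet state being referred to; not a tested patch), the reload would
sit right after the wait in rxrpc_send_data():

        /* Sketch only: the wait may have dropped and reacquired
         * call->user_mutex, so another sendmsg() may have consumed or
         * replaced the pending packet in the meantime. */
                ret = rxrpc_wait_for_tx_window(rx, call, &timeo,
                                               msg->msg_flags & MSG_WAITALL);
                if (ret < 0)
                        goto maybe_error;

                /* Re-read the pending packet under the (re)held mutex
                 * instead of trusting a stale local copy. */
                skb = call->tx_pending;
                call->tx_pending = NULL;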
On Mon, 22 Aug 2022 at 16:48, David Howells <dhowells@redhat.com> wrote:
>
> Hawkins Jiawei <yin31149@gmail.com> wrote:
>
> > -               if (mutex_lock_interruptible(&call->user_mutex) < 0)
> > +               if (mutex_lock_interruptible(&call->user_mutex) < 0) {
> > +                       mutex_lock(&call->user_mutex);
>
> Yeah, as Khalid points out that kind of makes the interruptible lock
> pointless. Either rxrpc_send_data() needs to return a separate indication
> that we returned without the lock held or it needs to always drop the lock on
> error (at least for ERESTARTSYS/EINTR) which can be checked for higher up.

Hi David,

For the second option, I think it may run into some difficulty, because
the caller cannot tell whether rxrpc_send_data() hit a locking error.
To be more specific, rxrpc_send_data() may still return the number of
bytes it has copied even when it hits a locking error, if it has already
successfully handled some data (please correct me if I am wrong).

So I think the first option seems better. I wonder if we can add an
argument to rxrpc_send_data() as the indication you pointed out? Maybe:

diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 1d38e279e2ef..0801325a7c7f 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -284,13 +284,18 @@ static int rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call,
 
 /*
  * send data through a socket
+ * @holding_mutex: rxrpc_send_data() will assign this pointer with True
+ * if functions still holds the call user access mutex when returned to caller.
+ * This argument can be NULL, which will effect nothing.
+ *
  * - must be called in process context
  * - The caller holds the call user access mutex, but not the socket lock.
  */
 static int rxrpc_send_data(struct rxrpc_sock *rx,
                            struct rxrpc_call *call,
                            struct msghdr *msg, size_t len,
-                           rxrpc_notify_end_tx_t notify_end_tx)
+                           rxrpc_notify_end_tx_t notify_end_tx,
+                           bool *holding_mutex)
 {
        struct rxrpc_skb_priv *sp;
        struct sk_buff *skb;
@@ -299,6 +304,9 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
        bool more;
        int ret, copied;
 
+       if (holding_mutex)
+               *holding_mutex = true;
+
        timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
 
        /* this should be in poll */
@@ -338,8 +346,11 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
                        ret = rxrpc_wait_for_tx_window(rx, call,
                                                       &timeo,
                                                       msg->msg_flags & MSG_WAITALL);
-                       if (ret < 0)
+                       if (ret < 0) {
+                               if (holding_mutex)
+                                       *holding_mutex = false;
                                goto maybe_error;
+                       }
                }
 
        /* Work out the maximum size of a packet.  Assume that
@@ -630,6 +641,7 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len)
        struct rxrpc_call *call;
        unsigned long now, j;
        int ret;
+       bool holding_user_mutex;
 
        struct rxrpc_send_params p = {
                .call.tx_total_len      = -1,
@@ -747,7 +759,9 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len)
                /* Reply phase not begun or not complete for service call. */
                ret = -EPROTO;
        } else {
-               ret = rxrpc_send_data(rx, call, msg, len, NULL);
+               ret = rxrpc_send_data(rx, call, msg, len, NULL, &holding_user_mutex);
+               if (!holding_user_mutex)
+                       goto error_put;
        }
 
 out_put_unlock:
@@ -796,7 +810,7 @@ int rxrpc_kernel_send_data(struct socket *sock, struct rxrpc_call *call,
        case RXRPC_CALL_SERVER_ACK_REQUEST:
        case RXRPC_CALL_SERVER_SEND_REPLY:
                ret = rxrpc_send_data(rxrpc_sk(sock->sk), call, msg, len,
-                                     notify_end_tx);
+                                     notify_end_tx, NULL);
                break;
        case RXRPC_CALL_COMPLETE:
                read_lock_bh(&call->state_lock);

On Mon, 22 Aug 2022 at 17:21, David Howells <dhowells@redhat.com> wrote:
>
> Actually, there's another bug here too: if rxrpc_wait_for_tx_window() drops
> the call mutex then it needs to reload the pending packet state in
> rxrpc_send_data() as it may have raced with another sendmsg().
>
> David
>

After applying the above patch, the kernel still panics, though it seems
not to be the earlier bad unlock balance bug; I am not sure whether it is
the same bug you mentioned. Kernel output as below:

[   39.115966][ T6508] ====================================
[   39.116940][ T6508] WARNING: syz/6508 still has locks held!
[   39.117978][ T6508] 6.0.0-rc1-00066-g3b06a2755758-dirty #186 Not tainted
[   39.119353][ T6508] ------------------------------------
[   39.120321][ T6508] 1 lock held by syz/6508:
[   39.121122][ T6508]  #0: ffff88801f472b20 (&call->user_mutex){....}-{3:3}0
[   39.123014][ T6508]
[   39.123014][ T6508] stack backtrace:
[   39.123925][ T6508] CPU: 0 PID: 6508 Comm: syz Not tainted 6.0.0-rc1-00066
[   39.125304][ T6508] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)4
[   39.125304][ T6508] Call Trace:
[   39.125304][ T6508]  <TASK>
[   39.125304][ T6508]  dump_stack_lvl+0x8e/0xdd
[   39.125304][ T6508]  get_signal+0x1866/0x24d0
[   39.125304][ T6508]  ? lock_acquire+0x172/0x310
[   39.125304][ T6508]  ? exit_signals+0x7b0/0x7b0
[   39.125304][ T6508]  arch_do_signal_or_restart+0x82/0x23f0
[   39.125304][ T6508]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[   39.125304][ T6508]  ? __fget_light+0x20d/0x270
[   39.125304][ T6508]  ? get_sigframe_size+0x10/0x10
[   39.125304][ T6508]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[   39.125304][ T6508]  ? __sys_sendmsg+0x11a/0x1c0
[   39.125304][ T6508]  ? __sys_sendmsg_sock+0x30/0x30
[   39.125304][ T6508]  exit_to_user_mode_prepare+0x146/0x1b0
[   39.125304][ T6508]  syscall_exit_to_user_mode+0x12/0x30
[   39.125304][ T6508]  do_syscall_64+0x42/0xb0
[   39.125304][ T6508]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[   39.125304][ T6508] RIP: 0033:0x44fbad
[   39.125304][ T6508] Code: c3 e8 97 29 00 00 0f 1f 80 00 00 00 00 f3 0f 1e8
[   39.125304][ T6508] RSP: 002b:00007f4b8ae22d48 EFLAGS: 00000246 ORIG_RAX:e
[   39.125304][ T6508] RAX: fffffffffffffffc RBX: 0000000000000000 RCX: 0000d
[   39.125304][ T6508] RDX: 0000000000000186 RSI: 0000000020000000 RDI: 00003
[   39.125304][ T6508] RBP: 00007f4b8ae22d80 R08: 00007f4b8ae23700 R09: 00000
[   39.125304][ T6508] R10: 00007f4b8ae23700 R11: 0000000000000246 R12: 0000e
[   39.125304][ T6508] R13: 00007ffe483304af R14: 00007ffe48330550 R15: 00000
[   39.125304][ T6508]  </TASK>
====================================

I will take a deeper look and try to patch it.
On Mon, 22 Aug 2022 at 19:30, Hawkins Jiawei <yin31149@gmail.com> wrote:
>
> On Mon, 22 Aug 2022 at 16:48, David Howells <dhowells@redhat.com> wrote:
> >
> > Hawkins Jiawei <yin31149@gmail.com> wrote:
> >
> > > -               if (mutex_lock_interruptible(&call->user_mutex) < 0)
> > > +               if (mutex_lock_interruptible(&call->user_mutex) < 0) {
> > > +                       mutex_lock(&call->user_mutex);
> >
> > Yeah, as Khalid points out that kind of makes the interruptible lock
> > pointless. Either rxrpc_send_data() needs to return a separate indication
> > that we returned without the lock held or it needs to always drop the lock on
> > error (at least for ERESTARTSYS/EINTR) which can be checked for higher up.
>
> Hi David,
>
> For the second option, I think it may run into some difficulty, because
> the caller cannot tell whether rxrpc_send_data() hit a locking error.
> To be more specific, rxrpc_send_data() may still return the number of
> bytes it has copied even when it hits a locking error, if it has already
> successfully handled some data (please correct me if I am wrong).
>
> So I think the first option seems better. I wonder if we can add an
> argument to rxrpc_send_data() as the indication you pointed out? Maybe:
>
> diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
> index 1d38e279e2ef..0801325a7c7f 100644
> --- a/net/rxrpc/sendmsg.c
> +++ b/net/rxrpc/sendmsg.c
> @@ -284,13 +284,18 @@ static int rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call,
>
>  /*
>   * send data through a socket
> + * @holding_mutex: rxrpc_send_data() will assign this pointer with True
> + * if functions still holds the call user access mutex when returned to caller.
> + * This argument can be NULL, which will effect nothing.
> + *
>   * - must be called in process context
>   * - The caller holds the call user access mutex, but not the socket lock.
>   */
>  static int rxrpc_send_data(struct rxrpc_sock *rx,
>                             struct rxrpc_call *call,
>                             struct msghdr *msg, size_t len,
> -                           rxrpc_notify_end_tx_t notify_end_tx)
> +                           rxrpc_notify_end_tx_t notify_end_tx,
> +                           bool *holding_mutex)
>  {
>         struct rxrpc_skb_priv *sp;
>         struct sk_buff *skb;
> @@ -299,6 +304,9 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
>         bool more;
>         int ret, copied;
>
> +       if (holding_mutex)
> +               *holding_mutex = true;
> +
>         timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
>
>         /* this should be in poll */
> @@ -338,8 +346,11 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
>                         ret = rxrpc_wait_for_tx_window(rx, call,
>                                                        &timeo,
>                                                        msg->msg_flags & MSG_WAITALL);
> -                       if (ret < 0)
> +                       if (ret < 0) {
> +                               if (holding_mutex)
> +                                       *holding_mutex = false;
>                                 goto maybe_error;
> +                       }
>                 }
>
>         /* Work out the maximum size of a packet.  Assume that
> @@ -630,6 +641,7 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len)
>         struct rxrpc_call *call;
>         unsigned long now, j;
>         int ret;
> +       bool holding_user_mutex;
>
>         struct rxrpc_send_params p = {
>                 .call.tx_total_len      = -1,
> @@ -747,7 +759,9 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len)
>                 /* Reply phase not begun or not complete for service call. */
>                 ret = -EPROTO;
>         } else {
> -               ret = rxrpc_send_data(rx, call, msg, len, NULL);
> +               ret = rxrpc_send_data(rx, call, msg, len, NULL, &holding_user_mutex);
> +               if (!holding_user_mutex)
> +                       goto error_put;
>         }
>
>  out_put_unlock:
> @@ -796,7 +810,7 @@ int rxrpc_kernel_send_data(struct socket *sock, struct rxrpc_call *call,
>         case RXRPC_CALL_SERVER_ACK_REQUEST:
>         case RXRPC_CALL_SERVER_SEND_REPLY:
>                 ret = rxrpc_send_data(rxrpc_sk(sock->sk), call, msg, len,
> -                                     notify_end_tx);
> +                                     notify_end_tx, NULL);
>                 break;
>         case RXRPC_CALL_COMPLETE:
>                 read_lock_bh(&call->state_lock);
>
> After applying the above patch, the kernel still panics, though it seems
> not to be the earlier bad unlock balance bug; I am not sure whether it is
> the same bug you mentioned. Kernel output as below:
>
> [   39.115966][ T6508] ====================================
> [   39.116940][ T6508] WARNING: syz/6508 still has locks held!
> [   39.117978][ T6508] 6.0.0-rc1-00066-g3b06a2755758-dirty #186 Not tainted
> [   39.119353][ T6508] ------------------------------------
> [   39.120321][ T6508] 1 lock held by syz/6508:
> [   39.121122][ T6508]  #0: ffff88801f472b20 (&call->user_mutex){....}-{3:3}0
> [   39.123014][ T6508]
> [   39.123014][ T6508] stack backtrace:
> [   39.123925][ T6508] CPU: 0 PID: 6508 Comm: syz Not tainted 6.0.0-rc1-00066
> [   39.125304][ T6508] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)4
> [   39.125304][ T6508] Call Trace:
> [   39.125304][ T6508]  <TASK>
> [   39.125304][ T6508]  dump_stack_lvl+0x8e/0xdd
> [   39.125304][ T6508]  get_signal+0x1866/0x24d0
> [   39.125304][ T6508]  ? lock_acquire+0x172/0x310
> [   39.125304][ T6508]  ? exit_signals+0x7b0/0x7b0
> [   39.125304][ T6508]  arch_do_signal_or_restart+0x82/0x23f0
> [   39.125304][ T6508]  ? __sanitizer_cov_trace_pc+0x1a/0x40
> [   39.125304][ T6508]  ? __fget_light+0x20d/0x270
> [   39.125304][ T6508]  ? get_sigframe_size+0x10/0x10
> [   39.125304][ T6508]  ? __sanitizer_cov_trace_pc+0x1a/0x40
> [   39.125304][ T6508]  ? __sys_sendmsg+0x11a/0x1c0
> [   39.125304][ T6508]  ? __sys_sendmsg_sock+0x30/0x30
> [   39.125304][ T6508]  exit_to_user_mode_prepare+0x146/0x1b0
> [   39.125304][ T6508]  syscall_exit_to_user_mode+0x12/0x30
> [   39.125304][ T6508]  do_syscall_64+0x42/0xb0
> [   39.125304][ T6508]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [   39.125304][ T6508] RIP: 0033:0x44fbad
> [   39.125304][ T6508] Code: c3 e8 97 29 00 00 0f 1f 80 00 00 00 00 f3 0f 1e8
> [   39.125304][ T6508] RSP: 002b:00007f4b8ae22d48 EFLAGS: 00000246 ORIG_RAX:e
> [   39.125304][ T6508] RAX: fffffffffffffffc RBX: 0000000000000000 RCX: 0000d
> [   39.125304][ T6508] RDX: 0000000000000186 RSI: 0000000020000000 RDI: 00003
> [   39.125304][ T6508] RBP: 00007f4b8ae22d80 R08: 00007f4b8ae23700 R09: 00000
> [   39.125304][ T6508] R10: 00007f4b8ae23700 R11: 0000000000000246 R12: 0000e
> [   39.125304][ T6508] R13: 00007ffe483304af R14: 00007ffe48330550 R15: 00000
> [   39.125304][ T6508]  </TASK>
> ====================================
>
> I will take a deeper look and try to patch it.

The cause is that the patch sets *holding_mutex to false in
rxrpc_send_data() merely by judging whether the return value from
rxrpc_wait_for_tx_window() is less than 0, which is not correct.
Instead, we should only set *holding_mutex to false where the
call->user_mutex is actually left unlocked, in
rxrpc_wait_for_tx_window_intr(). Maybe:

diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 1d38e279e2ef..1050cc2f5c89 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -33,11 +33,16 @@ static bool rxrpc_check_tx_space(struct rxrpc_call *call, rxrpc_seq_t *_tx_win)
 }
 
 /*
+ * @holding_mutex: An indication whether caller can still holds
+ * the call->user_mutex when returned to caller.
+ * This argument can be NULL, which will effect nothing.
+ *
  * Wait for space to appear in the Tx queue or a signal to occur.
  */
 static int rxrpc_wait_for_tx_window_intr(struct rxrpc_sock *rx,
                                          struct rxrpc_call *call,
-                                         long *timeo)
+                                         long *timeo,
+                                         bool *holding_mutex)
 {
        for (;;) {
                set_current_state(TASK_INTERRUPTIBLE);
@@ -53,8 +58,11 @@ static int rxrpc_wait_for_tx_window_intr(struct rxrpc_sock *rx,
                trace_rxrpc_transmit(call, rxrpc_transmit_wait);
                mutex_unlock(&call->user_mutex);
                *timeo = schedule_timeout(*timeo);
-               if (mutex_lock_interruptible(&call->user_mutex) < 0)
+               if (mutex_lock_interruptible(&call->user_mutex) < 0) {
+                       if (holding_mutex)
+                               *holding_mutex = false;
                        return sock_intr_errno(*timeo);
+               }
        }
 }
 
@@ -121,13 +129,18 @@ static int rxrpc_wait_for_tx_window_nonintr(struct rxrpc_sock *rx,
 }
 
 /*
+ * @holding_mutex: An indication whether caller can still holds
+ * the call->user_mutex when returned to caller.
+ * This argument can be NULL, which will effect nothing.
+ *
  * wait for space to appear in the transmit/ACK window
  * - caller holds the socket locked
  */
 static int rxrpc_wait_for_tx_window(struct rxrpc_sock *rx,
                                     struct rxrpc_call *call,
                                     long *timeo,
-                                    bool waitall)
+                                    bool waitall,
+                                    bool *holding_mutex)
 {
        DECLARE_WAITQUEUE(myself, current);
        int ret;
@@ -142,7 +155,7 @@ static int rxrpc_wait_for_tx_window(struct rxrpc_sock *rx,
                if (waitall)
                        ret = rxrpc_wait_for_tx_window_waitall(rx, call);
                else
-                       ret = rxrpc_wait_for_tx_window_intr(rx, call, timeo);
+                       ret = rxrpc_wait_for_tx_window_intr(rx, call, timeo, holding_mutex);
                break;
        case RXRPC_PREINTERRUPTIBLE:
        case RXRPC_UNINTERRUPTIBLE:
@@ -284,13 +297,19 @@ static int rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call,
 
 /*
  * send data through a socket
+ *
+ * @holding_mutex: An indication whether caller can still holds
+ * the call->user_mutex when returned to caller.
+ * This argument can be NULL, which will effect nothing.
+ *
  * - must be called in process context
  * - The caller holds the call user access mutex, but not the socket lock.
  */
 static int rxrpc_send_data(struct rxrpc_sock *rx,
                            struct rxrpc_call *call,
                            struct msghdr *msg, size_t len,
-                           rxrpc_notify_end_tx_t notify_end_tx)
+                           rxrpc_notify_end_tx_t notify_end_tx,
+                           bool *holding_mutex)
 {
        struct rxrpc_skb_priv *sp;
        struct sk_buff *skb;
@@ -299,6 +318,13 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
        bool more;
        int ret, copied;
 
+       /*
+        * The caller holds the call->user_mutex when calls
+        * rxrpc_send_data(), so initial it with True
+        */
+       if (holding_mutex)
+               *holding_mutex = true;
+
        timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
 
        /* this should be in poll */
@@ -337,7 +363,8 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
                                goto maybe_error;
 
                        ret = rxrpc_wait_for_tx_window(rx, call, &timeo,
-                                                      msg->msg_flags & MSG_WAITALL);
+                                                      msg->msg_flags & MSG_WAITALL,
+                                                      holding_mutex);
                        if (ret < 0)
                                goto maybe_error;
                }
@@ -630,6 +657,7 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len)
        struct rxrpc_call *call;
        unsigned long now, j;
        int ret;
+       bool holding_user_mutex;
 
        struct rxrpc_send_params p = {
                .call.tx_total_len      = -1,
@@ -747,7 +775,9 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len)
                /* Reply phase not begun or not complete for service call. */
                ret = -EPROTO;
        } else {
-               ret = rxrpc_send_data(rx, call, msg, len, NULL);
+               ret = rxrpc_send_data(rx, call, msg, len, NULL, &holding_user_mutex);
+               if (!holding_user_mutex)
+                       goto error_put;
        }
 
 out_put_unlock:
@@ -796,7 +826,7 @@ int rxrpc_kernel_send_data(struct socket *sock, struct rxrpc_call *call,
        case RXRPC_CALL_SERVER_ACK_REQUEST:
        case RXRPC_CALL_SERVER_SEND_REPLY:
                ret = rxrpc_send_data(rxrpc_sk(sock->sk), call, msg, len,
-                                     notify_end_tx);
+                                     notify_end_tx, NULL);
                break;
        case RXRPC_CALL_COMPLETE:
                read_lock_bh(&call->state_lock);

The reproducer did not trigger any issue locally. I will propose a test
by syzbot, and will send a v2 patch if it passes the syzbot testing.
On Mon, Aug 22, 2022 at 09:04:14PM +0800, Hawkins Jiawei wrote:
>  static int rxrpc_send_data(struct rxrpc_sock *rx,
>                             struct rxrpc_call *call,
>                             struct msghdr *msg, size_t len,
> -                           rxrpc_notify_end_tx_t notify_end_tx)
> +                           rxrpc_notify_end_tx_t notify_end_tx,
> +                           bool *holding_mutex)
>  {
>         struct rxrpc_skb_priv *sp;
>         struct sk_buff *skb;
> @@ -299,6 +318,13 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
>         bool more;
>         int ret, copied;
>
> +       /*
> +        * The caller holds the call->user_mutex when calls
> +        * rxrpc_send_data(), so initial it with True
> +        */
> +       if (holding_mutex)
> +               *holding_mutex = true;

Don't make this optional.  All callers have to care if we dropped the lock.

regards,
dan carpenter
>
>  /*
> + * @holding_mutex: An indication whether caller can still holds
> + * the call->user_mutex when returned to caller.

Maybe we can use mutex_is_locked instead of using the holding_mutex
parameter to get whether call->user_mutex is still held.

https://www.kernel.org/doc/htmldocs/kernel-locking/API-mutex-is-locked.html

> + * This argument can be NULL, which will effect nothing.
> + *
>  * Wait for space to appear in the Tx queue or a signal to occur.
>  */
>

thanks,
 -- Khalid Masum
On Mon, Aug 22, 2022 at 07:55:27PM +0600, Khalid Masum wrote:
> >
> >  /*
> > + * @holding_mutex: An indication whether caller can still holds
> > + * the call->user_mutex when returned to caller.
>
> Maybe we can use mutex_is_locked instead of using the holding_mutex
> parameter to get whether call->user_mutex is still held.

That doesn't work.  What if there is contention for the lock and someone
else took it?  Locks under contention are sort of the whole point of
locking so it's highly likely.

I do kind of feel the code has a layering violation.  I'd prefer instead
of "I am setting this variable to true to reflect the caller blah blah",
the variable could just be set in the caller.  I'd probably flip it
around and call it "dropped_lock" instead of "holding_lock".

        bool dropped_lock = false;

        ...

        if (!dropped_lock)
                mutex_unlock();

That way only the callers and the function which drops the lock need to
be involved with setting the variable.

regards,
dan carpenter
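Spelling Dan's suggestion out as a sketch (illustrative and untested), only
the one path that can actually return without the mutex sets the flag, and
only the top-level caller consults it:

        /* Sketch of the "dropped_lock" shape. */
        static int rxrpc_wait_for_tx_window_intr(struct rxrpc_sock *rx,
                                                 struct rxrpc_call *call,
                                                 long *timeo,
                                                 bool *dropped_lock)
        {
                for (;;) {
                        set_current_state(TASK_INTERRUPTIBLE);
                        if (rxrpc_check_tx_space(call, NULL))
                                return 0;
                        if (call->state >= RXRPC_CALL_COMPLETE)
                                return call->error;
                        if (signal_pending(current))
                                return sock_intr_errno(*timeo);

                        trace_rxrpc_transmit(call, rxrpc_transmit_wait);
                        mutex_unlock(&call->user_mutex);
                        *timeo = schedule_timeout(*timeo);
                        if (mutex_lock_interruptible(&call->user_mutex) < 0) {
                                *dropped_lock = true;   /* returning unlocked */
                                return sock_intr_errno(*timeo);
                        }
                }
        }

        /* ...and in rxrpc_do_sendmsg(), only unlock if the lock survived: */
                bool dropped_lock = false;
                ...
                ret = rxrpc_send_data(rx, call, msg, len, NULL, &dropped_lock);
                ...
        out_put_unlock:
                if (!dropped_lock)
                        mutex_unlock(&call->user_mutex);

This keeps the flag's meaning local: intermediate helpers such as
rxrpc_wait_for_tx_window() merely pass the pointer through.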
On Mon, 22 Aug 2022 at 22:06, Dan Carpenter <dan.carpenter@oracle.com> wrote:
>
> On Mon, Aug 22, 2022 at 07:55:27PM +0600, Khalid Masum wrote:
> > >
> > >  /*
> > > + * @holding_mutex: An indication whether caller can still holds
> > > + * the call->user_mutex when returned to caller.
> >
> > Maybe we can use mutex_is_locked instead of using the holding_mutex
> > parameter to get whether call->user_mutex is still held.
>
> That doesn't work.  What if there is contention for the lock and someone
> else took it?  Locks under contention are sort of the whole point of
> locking so it's highly likely.

I tried this before, and it doesn't work, as Dan points out. It seems
that mutex_is_locked() only checks whether some task holds the mutex,
but does not check whether the current task is the holder. I also tried
lockdep_is_held(), which seems able to detect accurately whether
call->user_mutex is still held, although it cannot be used as the fix
itself.
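For reference, a sketch of the debug-only check described above;
lockdep_assert_held() is the usual spelling, and because it compiles to
nothing without CONFIG_LOCKDEP it can only serve as a sanity check, not as
the fix itself:

        /* Debug-only: with CONFIG_LOCKDEP enabled this warns if the
         * current context does not hold call->user_mutex; in a
         * non-lockdep build it expands to nothing. */
        lockdep_assert_held(&call->user_mutex);
        mutex_unlock(&call->user_mutex);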
On August 22, 2022 9:39:43 PM GMT+06:00, Hawkins Jiawei <yin31149@gmail.com> wrote:
> On Mon, 22 Aug 2022 at 22:06, Dan Carpenter <dan.carpenter@oracle.com> wrote:
> I tried this before, and it doesn't work, as Dan points out. It seems
> that mutex_is_locked() only checks whether some task holds the mutex,

I see.

> but does not check whether the current task is the holder.

Ok.

> I also tried lockdep_is_held(), which seems able to detect accurately
> whether call->user_mutex is still held, although it cannot be used as
> the fix itself.

Useful to know! Thanks for the information.
 -- Khalid Masum
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 1d38e279e2ef..e13043d357d5 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -53,8 +53,10 @@ static int rxrpc_wait_for_tx_window_intr(struct rxrpc_sock *rx,
                trace_rxrpc_transmit(call, rxrpc_transmit_wait);
                mutex_unlock(&call->user_mutex);
                *timeo = schedule_timeout(*timeo);
-               if (mutex_lock_interruptible(&call->user_mutex) < 0)
+               if (mutex_lock_interruptible(&call->user_mutex) < 0) {
+                       mutex_lock(&call->user_mutex);
                        return sock_intr_errno(*timeo);
+               }
        }
 }
Syzkaller reports a bad unlock balance bug as follows:
------------[ cut here ]------------
WARNING: bad unlock balance detected!
syz-executor.0/4094 is trying to release lock (&call->user_mutex) at:
[<ffffffff87c1d8d1>] rxrpc_do_sendmsg+0x851/0x1110 net/rxrpc/sendmsg.c:754
but there are no more locks to release!

other info that might help us debug this:
no locks held by syz-executor.0/4094.

stack backtrace:
[...]
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x57/0x7d lib/dump_stack.c:106
 print_unlock_imbalance_bug include/trace/events/lock.h:69 [inline]
 __lock_release kernel/locking/lockdep.c:5333 [inline]
 lock_release.cold+0x49/0x4e kernel/locking/lockdep.c:5686
 __mutex_unlock_slowpath+0x99/0x5e0 kernel/locking/mutex.c:907
 rxrpc_do_sendmsg+0x851/0x1110 net/rxrpc/sendmsg.c:754
 sock_sendmsg_nosec net/socket.c:714 [inline]
 sock_sendmsg+0xab/0xe0 net/socket.c:734
 ____sys_sendmsg+0x5c2/0x7a0 net/socket.c:2485
 ___sys_sendmsg+0xdb/0x160 net/socket.c:2539
 __sys_sendmsg+0xc3/0x160 net/socket.c:2568
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
[...]
 </TASK>
------------------------------------

When the kernel sends a message through an RxRPC socket in
rxrpc_do_sendmsg(), it must hold the call->user_mutex lock; otherwise
a bug is triggered when that lock is released before returning from
rxrpc_do_sendmsg().

The problem is that during rxrpc_do_sendmsg(), the kernel may call
rxrpc_wait_for_tx_window_intr() to wait for space to appear in the
tx queue or for a signal to occur. If mutex_lock_interruptible()
fails there, the kernel returns from rxrpc_wait_for_tx_window_intr()
without holding the mutex, and then triggers the bug when releasing
the mutex in rxrpc_do_sendmsg().

This patch solves it by reacquiring the call->user_mutex lock when
mutex_lock_interruptible() fails, before returning from
rxrpc_wait_for_tx_window_intr().

Reported-and-tested-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
---
 net/rxrpc/sendmsg.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)