
[RFC] sunrpc: Ensure signalled RPC tasks exit

Message ID 20200401193559.6487.55107.stgit@manet.1015granger.net (mailing list archive)
State New, archived
Series [RFC] sunrpc: Ensure signalled RPC tasks exit

Commit Message

Chuck Lever April 1, 2020, 7:37 p.m. UTC
If an RPC task is signalled while it is running and the transport is
not connected, it will never sleep and never be terminated. This can
happen when an RPC transport is shut down: the remaining tasks are
signalled, but the transport is disconnected.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/sched.c |   14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

Interested in comments and suggestions.

Nearly every time my NFS/RDMA client unmounts when using krb5, the
umount hangs (killably). I tracked it down to an NFSv3 NULL request
that is signalled but loops and does not exit.

Comments

Trond Myklebust April 1, 2020, 9:55 p.m. UTC | #1
On Wed, 2020-04-01 at 15:37 -0400, Chuck Lever wrote:
> If an RPC task is signalled while it is running and the transport is
> not connected, it will never sleep and never be terminated. This can
> happen when an RPC transport is shut down: the remaining tasks are
> signalled, but the transport is disconnected.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  net/sunrpc/sched.c |   14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> Interested in comments and suggestions.
> 
> Nearly every time my NFS/RDMA client unmounts when using krb5, the
> umount hangs (killably). I tracked it down to an NFSv3 NULL request
> that is signalled but loops and does not exit.
> 
> 
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index 55e900255b0c..905c31f75593 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -905,6 +905,12 @@ static void __rpc_execute(struct rpc_task *task)
>  		trace_rpc_task_run_action(task, do_action);
>  		do_action(task);
>  
> +		if (RPC_SIGNALLED(task)) {
> +			task->tk_rpc_status = -ERESTARTSYS;
> +			rpc_exit(task, -ERESTARTSYS);
> +			break;
> +		}
> +

Hmm... I'd really prefer to avoid this kind of check in the tight loop.
Why is this NULL request looping?

>  		/*
>  		 * Lockless check for whether task is sleeping or not.
>  		 */
> @@ -912,14 +918,6 @@ static void __rpc_execute(struct rpc_task *task)
>  			continue;
>  
>  		/*
> -		 * Signalled tasks should exit rather than sleep.
> -		 */
> -		if (RPC_SIGNALLED(task)) {
> -			task->tk_rpc_status = -ERESTARTSYS;
> -			rpc_exit(task, -ERESTARTSYS);
> -		}
> -
> -		/*
>  		 * The queue->lock protects against races with
>  		 * rpc_make_runnable().
>  		 *
>
Chuck Lever April 2, 2020, 3:05 p.m. UTC | #2
> On Apr 1, 2020, at 5:55 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> On Wed, 2020-04-01 at 15:37 -0400, Chuck Lever wrote:
>> If an RPC task is signalled while it is running and the transport is
>> not connected, it will never sleep and never be terminated. This can
>> happen when an RPC transport is shut down: the remaining tasks are
>> signalled, but the transport is disconnected.
>> 
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>> net/sunrpc/sched.c |   14 ++++++--------
>> 1 file changed, 6 insertions(+), 8 deletions(-)
>> 
>> Interested in comments and suggestions.
>> 
>> Nearly every time my NFS/RDMA client unmounts when using krb5, the
>> umount hangs (killably). I tracked it down to an NFSv3 NULL request
>> that is signalled but loops and does not exit.
>> 
>> 
>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>> index 55e900255b0c..905c31f75593 100644
>> --- a/net/sunrpc/sched.c
>> +++ b/net/sunrpc/sched.c
>> @@ -905,6 +905,12 @@ static void __rpc_execute(struct rpc_task *task)
>> 		trace_rpc_task_run_action(task, do_action);
>> 		do_action(task);
>> 
>> +		if (RPC_SIGNALLED(task)) {
>> +			task->tk_rpc_status = -ERESTARTSYS;
>> +			rpc_exit(task, -ERESTARTSYS);
>> +			break;
>> +		}
>> +
> 
> Hmm... I'd really prefer to avoid this kind of check in the tight loop.

Certainly performance is a concern, but compared to the indirect
function call immediately preceding it, this check is small potatoes.
Adding an unlikely() to the RPC_SIGNALLED() macro might help mitigate
the pipeline bubble the extra branch would cause.
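
For illustration only, here is roughly what I have in mind. This is an
untested sketch, not part of this patch, and it assumes RPC_SIGNALLED()
is still the plain test_bit() wrapper in include/linux/sunrpc/sched.h:

-#define RPC_SIGNALLED(t)	test_bit(RPC_TASK_SIGNALLED, &(t)->tk_runstate)
+#define RPC_SIGNALLED(t)	unlikely(test_bit(RPC_TASK_SIGNALLED, &(t)->tk_runstate))

That keeps the branch hint in one place rather than sprinkling
unlikely() over every call site.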


> Why is this NULL request looping?

Here's an ftrace log I captured a couple of days ago.

      umount.nfs-7540  [004]   938.823064: rpc_task_begin:       task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=ACTIVE status=0 action=(nil)s
   kworker/u25:0-6092  [000]   938.823073: rpc_task_run_action:  task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE status=0 action=call_start
   kworker/u25:0-6092  [000]   938.823075: rpc_request:          task:18647@1 nfsv3 NULL (async)
   kworker/u25:0-6092  [000]   938.823076: rpc_task_run_action:  task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE status=0 action=call_reserve
   kworker/u25:0-6092  [000]   938.823078: rpc_task_run_action:  task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE status=0 action=call_reserveresult
   kworker/u25:0-6092  [000]   938.823079: rpc_task_run_action:  task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE status=0 action=call_refresh
   kworker/u25:0-6092  [000]   938.823080: rpc_task_run_action:  task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE status=0 action=call_refreshresult
   kworker/u25:0-6092  [000]   938.823081: rpc_task_run_action:  task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE status=0 action=call_allocate
   kworker/u25:0-6092  [000]   938.823084: xprtrdma_op_allocate: task:18647@1 req=0xffff8884691ca000 (1624, 60)
   kworker/u25:0-6092  [000]   938.823085: rpc_task_run_action:  task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE status=0 action=call_encode
   kworker/u25:0-6092  [000]   938.823088: rpcgss_seqno:         task:18647@1 xid=0xfd7e5fb8 seqno=21981
   kworker/u25:0-6092  [000]   938.823117: xprt_enq_xmit:        task:18647@1 xid=0xfd7e5fb8 seqno=21981 stage=4
   kworker/u25:0-6092  [000]   938.823118: rpc_task_run_action:  task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 action=call_transmit
      umount.nfs-7540  [004]   938.823119: rpc_task_signalled:   task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 action=call_transmit_status
   kworker/u25:0-6092  [000]   938.823119: xprt_reserve_cong:    task:18647@1 snd_task:18647 cong=0 cwnd=16384
   kworker/u25:0-6092  [000]   938.823121: xprt_transmit:        task:18647@1 xid=0xfd7e5fb8 seqno=21981 status=-512
   kworker/u25:0-6092  [000]   938.823127: xprt_release_cong:    task:18647@1 snd_task:4294967295 cong=0 cwnd=16384
   kworker/u25:0-6092  [000]   938.823127: rpc_task_run_action:  task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=0 action=call_transmit_status
   kworker/u25:0-6092  [000]   938.823129: rpc_task_sleep:       task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=0 timeout=0 queue=xprt_pending
   kworker/u25:0-6092  [000]   938.823132: rpc_task_wakeup:      task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|QUEUED|ACTIVE|NEED_RECV|SIGNALLED status=-107 timeout=60000 queue=xprt_pending
   kworker/u25:0-6092  [000]   938.823133: rpc_task_run_action:  task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=-107 action=xprt_timer
   kworker/u25:0-6092  [000]   938.823133: rpc_task_run_action:  task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=-107 action=call_status
   kworker/u25:0-6092  [000]   938.823136: rpc_call_status:      task:18647@1 status=-107
   kworker/u25:0-6092  [000]   938.823137: rpc_task_run_action:  task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=0 action=call_encode
   kworker/u25:0-6092  [000]   938.823138: rpcgss_seqno:         task:18647@1 xid=0xfd7e5fb8 seqno=21982
   kworker/u25:0-6092  [000]   938.823146: xprt_enq_xmit:        task:18647@1 xid=0xfd7e5fb8 seqno=21982 stage=4
   kworker/u25:0-6092  [000]   938.823147: rpc_task_run_action:  task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV|SIGNALLED status=0 action=call_transmit
   kworker/u25:0-6092  [000]   938.823147: xprt_reserve_cong:    task:18647@1 snd_task:18647 cong=0 cwnd=16384
   kworker/u25:0-6092  [000]   938.823148: xprt_transmit:        task:18647@1 xid=0xfd7e5fb8 seqno=21982 status=-512
   kworker/u25:0-6092  [000]   938.823149: xprt_release_cong:    task:18647@1 snd_task:4294967295 cong=0 cwnd=16384
   kworker/u25:0-6092  [000]   938.823149: rpc_task_run_action:  task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=0 action=call_transmit_status
   kworker/u25:0-6092  [000]   938.823150: rpc_task_sleep:       task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=0 timeout=0 queue=xprt_pending
   kworker/u25:0-6092  [000]   938.823151: rpc_task_wakeup:      task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|QUEUED|ACTIVE|NEED_RECV|SIGNALLED status=-107 timeout=120000 queue=xprt_pending
   kworker/u25:0-6092  [000]   938.823152: rpc_task_run_action:  task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=-107 action=xprt_timer
   kworker/u25:0-6092  [000]   938.823152: rpc_task_run_action:  task:18647@1 flags=ASYNC|DYNAMIC|SOFT runstate=RUNNING|ACTIVE|NEED_RECV|SIGNALLED status=-107 action=call_status
   kworker/u25:0-6092  [000]   938.823153: rpc_call_status:      task:18647@1 status=-107

And here, the looping continues.

My analysis is that the signal arrives while the task is running/active,
in a section of code that keeps iterating on reconnection without ever
checking for a signal. Note that the tail of xprt_transmit does not set
tk_status in this case, so it is still zero on entry to
call_transmit_status.
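
To make the shape of that loop concrete, here is a paraphrase of the
pre-patch __rpc_execute() loop, reduced to just the lines touched by
the hunks in this patch (elided and abbreviated, not a verbatim copy
of the function):

	for (;;) {
		...
		trace_rpc_task_run_action(task, do_action);
		do_action(task);

		/* Lockless check for whether task is sleeping or not. */
		if (!RPC_IS_QUEUED(task))
			continue;

		/* Signalled tasks should exit rather than sleep. */
		if (RPC_SIGNALLED(task)) {
			task->tk_rpc_status = -ERESTARTSYS;
			rpc_exit(task, -ERESTARTSYS);
		}

		/* ... take queue->lock, re-check, and go to sleep ... */
	}

Because the existing signal check sits after the !RPC_IS_QUEUED() test,
a task that has already been woken again by the time __rpc_execute()
gets there (note the back-to-back rpc_task_sleep/rpc_task_wakeup pair
in the trace) just takes the "continue" and goes around the loop once
more without ever seeing the signal. The hunk above moves the check to
immediately after do_action() and adds a "break", so a signalled task
leaves the loop regardless of what the state machine does next.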

I like the scheduler fix because I don't believe we can easily tell
whether there are other such critical sections in the client's state
machine where a loop can occur with no signal-based exit condition.

This fix closes the door on all such bugs.


--
Chuck Lever

Patch

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 55e900255b0c..905c31f75593 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -905,6 +905,12 @@ static void __rpc_execute(struct rpc_task *task)
 		trace_rpc_task_run_action(task, do_action);
 		do_action(task);
 
+		if (RPC_SIGNALLED(task)) {
+			task->tk_rpc_status = -ERESTARTSYS;
+			rpc_exit(task, -ERESTARTSYS);
+			break;
+		}
+
 		/*
 		 * Lockless check for whether task is sleeping or not.
 		 */
@@ -912,14 +918,6 @@ static void __rpc_execute(struct rpc_task *task)
 			continue;
 
 		/*
-		 * Signalled tasks should exit rather than sleep.
-		 */
-		if (RPC_SIGNALLED(task)) {
-			task->tk_rpc_status = -ERESTARTSYS;
-			rpc_exit(task, -ERESTARTSYS);
-		}
-
-		/*
 		 * The queue->lock protects against races with
 		 * rpc_make_runnable().
 		 *