Message ID | 20250114144101.2511043-1-lilingfeng3@huawei.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | SUNRPC: Set tk_rpc_status when RPC_TASK_SIGNALLED is detected | expand |
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index 0090162ee8c3..0acdff19a37c 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -2509,8 +2509,10 @@ rpc_check_timeout(struct rpc_task *task) { struct rpc_clnt *clnt = task->tk_client; - if (RPC_SIGNALLED(task)) + if (RPC_SIGNALLED(task)) { + rpc_call_rpcerror(task, -ERESTARTSYS); return; + } if (xprt_adjust_timeout(task->tk_rqstp) == 0) return;
Commit 39494194f93b("SUNRPC: Fix races with rpc_killall_tasks()") adds rpc_task_set_rpc_status before setting RPC_TASK_SIGNALLED in rpc_signal_task, ensuring that rpc_status is definitely set when RPC_TASK_SIGNALLED is set. Therefore, it seems unnecessary to set rpc_status again after detecting RPC_TASK_SIGNALLED in rpc_check_timeout. However, in some exceptional cases, invalid and endlessly looping rpc_tasks may be generated. The rpc_call_rpcerror in rpc_check_timeout can timely terminate such rpc_tasks. Removing rpc_call_rpcerror may cause the rpc_task to fall into an infinite loop. For example, in the following situation: nfs4_close_done // get err from server nfs4_async_handle_exception // goto out_restart // umount -f nfs_umount_begin rpc_killall_tasks rpc_signal_task rpc_task_set_rpc_status task->tk_rpc_status = -ERESTARTSYS set_bit // set RPC_TASK_SIGNALLED to tk_runstate rpc_restart_call_prepare __rpc_restart_call task->tk_rpc_status = 0 // clear tk_rpc_status ... rpc_prepare_task nfs4_close_prepare nfs4_setup_sequence rpc_call_start task->tk_action = call_start At this point, an rpc_task with RPC_TASK_SIGNALLED set but tk_rpc_status as 0 will be generated. This rpc_task will fall into the following loop: call_encode --> call_transmit --> call_transmit_status --> call_status --> call_encode. Since RPC_TASK_SIGNALLED is set, no request will be sent in call_transmit. Similarly, since RPC_TASK_SIGNALLED is set, rq_majortimeo will not be updated in call_status --> rpc_check_timeout, which will cause -ETIMEDOUT to be directly set to tk_status in call_transmit_status --> xprt_request_wait_receive --> xprt_wait_for_reply_request_def --> rpc_sleep_on_timeout --> __rpc_sleep_on_priority_timeout. Here is the state and loop process of the rpc_task: tk_runstate: RPC_TASK_RUNNING RPC_TASK_ACTIVE RPC_TASK_NEED_RECV RPC_TASK_SIGNALLED tk_xprt->state: XPRT_CONNECTED XPRT_BOUND tk_flags RPC_TASK_ASYNC RPC_TASK_MOVEABLE RPC_TASK_DYNAMIC RPC_TASK_SOFT RPC_TASK_NO_RETRANS_TIMEOUT RPC_TASK_CRED_NOREF call_encode xprt_request_enqueue_transmit set_bit // RPC_TASK_NEED_XMIT task->tk_action = call_transmit call_transmit task->tk_action = call_transmit_status xprt_transmit xprt_request_transmit // check RPC_TASK_SIGNALLED and goto out_dequeue xprt_request_dequeue_transmit xprt_request_dequeue_transmit_locked test_and_clear_bit // RPC_TASK_NEED_XMIT call_transmit_status task->tk_action = call_status xprt_request_wait_receive xprt_wait_for_reply_request_def xprt_request_timeout // get timeout req->rq_majortimeo // rq_majortimeo will not be updated rpc_sleep_on_timeout __rpc_sleep_on_priority_timeout task->tk_status = -ETIMEDOUT // set ETIMEDOUT call_status task->tk_action = call_encode rpc_check_timeout // check RPC_TASK_SIGNALLED and skip xprt_reset_majortimeo Fix it by adding rpc_call_rpcerror back. Fixes: 39494194f93b ("SUNRPC: Fix races with rpc_killall_tasks()") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> --- net/sunrpc/clnt.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)