[RFC,2/4] NFS: Unset RPC_TASK_NO_RETRANS_TIMEOUT for session/clientid destruction

Message ID 162627782362.1294.9395366920293772038.stgit@manet.1015granger.net (mailing list archive)
State New, archived
Series: Ensure RPC_TASK_NORTO is disabled for select operations

Commit Message

Chuck Lever July 14, 2021, 3:50 p.m. UTC
In some rare failure modes, the server is actually reading the
transport, but then just dropping the requests on the floor.
TCP_USER_TIMEOUT cannot detect that case.

Prevent such a stuck server from pinning client resources
indefinitely by ensuring that session and client ID clean-up can
time out even if the connection is still operational.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfs/nfs4client.c |    1 +
 1 file changed, 1 insertion(+)

Comments

Trond Myklebust July 14, 2021, 3:59 p.m. UTC | #1
On Wed, 2021-07-14 at 11:50 -0400, Chuck Lever wrote:
> In some rare failure modes, the server is actually reading the
> transport, but then just dropping the requests on the floor.
> TCP_USER_TIMEOUT cannot detect that case.
> 
> Prevent such a stuck server from pinning client resources
> indefinitely by ensuring that session and client ID clean-up can
> time out even if the connection is still operational.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfs/nfs4client.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> index 28431acd1230..c5032f784ac0 100644
> --- a/fs/nfs/nfs4client.c
> +++ b/fs/nfs/nfs4client.c
> @@ -281,6 +281,7 @@ static void nfs4_destroy_callback(struct nfs_client *clp)
>  
>  static void nfs4_shutdown_client(struct nfs_client *clp)
>  {
> +       clp->cl_rpcclient->cl_noretranstimeo = 0;
>         if (__test_and_clear_bit(NFS_CS_RENEWD, &clp->cl_res_state))
>                 nfs4_kill_renewd(clp);
>         clp->cl_mvops->shutdown_client(clp);
> 
> 

I can't see how this will help. Again, I suggest we rather turn off the
retransmission default for the RPC calls where the server can drop
stuff on the floor. That's really only the RPCSEC_GSS control
messages. 

Anything else is covered by the NFSv4 blanket ban on dropping RPC
calls.

Patch

diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 28431acd1230..c5032f784ac0 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -281,6 +281,7 @@  static void nfs4_destroy_callback(struct nfs_client *clp)
 
 static void nfs4_shutdown_client(struct nfs_client *clp)
 {
+	clp->cl_rpcclient->cl_noretranstimeo = 0;
 	if (__test_and_clear_bit(NFS_CS_RENEWD, &clp->cl_res_state))
 		nfs4_kill_renewd(clp);
 	clp->cl_mvops->shutdown_client(clp);
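
The effect the patch is aiming for can be illustrated with a small user-space model. This is only a sketch, not kernel code: every model_* name below is hypothetical, and it assumes that an RPC task picks up the client's no-retransmit-timeout setting when the task is created, and that a task with that setting keeps waiting after a major timeout as long as the transport still looks connected. Under those assumptions, clearing the client-level flag before clean-up lets later tasks time out even when the connection is up.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
enum model_result { MODEL_KEEP_WAITING, MODEL_TIMED_OUT };

struct model_client {
	bool noretranstimeo;     /* stands in for cl_noretranstimeo */
};

struct model_task {
	bool no_retrans_timeout; /* stands in for RPC_TASK_NO_RETRANS_TIMEOUT */
};

/* Assumption: a task inherits the client's setting when it is created. */
static struct model_task model_task_create(const struct model_client *clnt)
{
	struct model_task task = { .no_retrans_timeout = clnt->noretranstimeo };
	return task;
}

/*
 * Assumption: on a major timeout, a task with the flag set keeps waiting
 * while the connection is up; any other task fails with a timeout.
 */
static enum model_result model_on_major_timeout(const struct model_task *task,
						bool connected)
{
	if (task->no_retrans_timeout && connected)
		return MODEL_KEEP_WAITING;
	return MODEL_TIMED_OUT;
}
```

In this model, a task created while noretranstimeo is set never times out against a server that stays connected but drops requests, while a task created after the flag is cleared does. Whether the real clean-up calls are affected in this way is exactly what the review comment above questions.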