Message ID | 162627782362.1294.9395366920293772038.stgit@manet.1015granger.net (mailing list archive) |
---|---|
State | New, archived |
Series | Ensure RPC_TASK_NORTO is disabled for select operations |
On Wed, 2021-07-14 at 11:50 -0400, Chuck Lever wrote:
> In some rare failure modes, the server is actually reading the
> transport, but then just dropping the requests on the floor.
> TCP_USER_TIMEOUT cannot detect that case.
>
> Prevent such a stuck server from pinning client resources
> indefinitely by ensuring that session and client ID clean-up can
> time out even if the connection is still operational.
>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfs/nfs4client.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> index 28431acd1230..c5032f784ac0 100644
> --- a/fs/nfs/nfs4client.c
> +++ b/fs/nfs/nfs4client.c
> @@ -281,6 +281,7 @@ static void nfs4_destroy_callback(struct nfs_client *clp)
>
>  static void nfs4_shutdown_client(struct nfs_client *clp)
>  {
> +	clp->cl_rpcclient->cl_noretranstimeo = 0;
>  	if (__test_and_clear_bit(NFS_CS_RENEWD, &clp->cl_res_state))
>  		nfs4_kill_renewd(clp);
>  	clp->cl_mvops->shutdown_client(clp);
>

I can't see how this will help. Again, I suggest we rather turn off the
retransmission default for the RPC calls where the server can drop stuff
on the floor. That's really only the RPCSEC_GSS control messages.
Anything else is covered by the NFSv4 blanket ban on dropping RPC calls.
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 28431acd1230..c5032f784ac0 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -281,6 +281,7 @@ static void nfs4_destroy_callback(struct nfs_client *clp)

 static void nfs4_shutdown_client(struct nfs_client *clp)
 {
+	clp->cl_rpcclient->cl_noretranstimeo = 0;
 	if (__test_and_clear_bit(NFS_CS_RENEWD, &clp->cl_res_state))
 		nfs4_kill_renewd(clp);
 	clp->cl_mvops->shutdown_client(clp);
In some rare failure modes, the server is actually reading the
transport, but then just dropping the requests on the floor.
TCP_USER_TIMEOUT cannot detect that case.

Prevent such a stuck server from pinning client resources
indefinitely by ensuring that session and client ID clean-up can
time out even if the connection is still operational.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfs/nfs4client.c | 1 +
 1 file changed, 1 insertion(+)
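
The behaviour the commit message relies on can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel's implementation: every `toy_` name is hypothetical, and the model assumes that a task inherits the client's `cl_noretranstimeo` setting when it is created and that a soft task with that setting does not give up on timeout while the connection is still up.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the RPC client (not kernel code). */
struct toy_rpc_clnt {
	bool cl_noretranstimeo;		/* mirrors the field the patch clears */
};

/* Hypothetical, simplified stand-in for an RPC task. */
struct toy_rpc_task {
	bool soft;			/* soft tasks may fail with a timeout */
	bool no_retrans_timeout;	/* copied from the client at task setup */
};

/* Assumption: a new task copies the client's setting into a per-task flag. */
static struct toy_rpc_task toy_task_init(const struct toy_rpc_clnt *clnt,
					 bool soft)
{
	struct toy_rpc_task task = {
		.soft = soft,
		.no_retrans_timeout = clnt->cl_noretranstimeo,
	};
	return task;
}

/*
 * Decide whether a task whose timeout has expired gives up.
 * 'connected' says whether the transport connection is still up.
 */
static bool toy_task_times_out(const struct toy_rpc_task *task, bool connected)
{
	if (!task->soft)
		return false;	/* hard tasks keep retrying */
	if (task->no_retrans_timeout && connected)
		return false;	/* keeps waiting while the connection lives */
	return true;		/* soft task fails with a timeout */
}
```

In this model, a soft task created while `cl_noretranstimeo` is set waits indefinitely on a server that keeps the connection open but drops requests, whereas a task created after the field is cleared can time out. Whether the clean-up calls issued during client shutdown are actually soft tasks is not stated in the thread, and the reply above questions whether the change helps at all.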