diff mbox series

NFS: Retry the CLOSE if the embedded GETATTR is rejected with ERR_STALE

Message ID 20201118002431.GA6941@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com (mailing list archive)
State New
Headers show
Series NFS: Retry the CLOSE if the embedded GETATTR is rejected with ERR_STALE | expand

Commit Message

Anchal Agarwal Nov. 18, 2020, 12:24 a.m. UTC
If our CLOSE RPC call is rejected with an ERR_STALE error, then we
should remove the GETATTR call from the compound RPC and retry.
This could happen in a scenario where two clients tries to access
the same file. One client opens the file and the other client removes
the file while it's opened by first client. When the first client
attempts to close the file the server returns ESTALE and the file ends
up being leaked on the server. This depends on how nfs server is
configured and is not reproducible if running against nfsd.

Signed-off-by: Anchal Agarwal <anchalag@amazon.com>
---
 fs/nfs/nfs4proc.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Trond Myklebust Nov. 18, 2020, 3:17 a.m. UTC | #1
On Wed, 2020-11-18 at 00:24 +0000, Anchal Agarwal wrote:
> If our CLOSE RPC call is rejected with an ERR_STALE error, then we
> should remove the GETATTR call from the compound RPC and retry.
> This could happen in a scenario where two clients tries to access
> the same file. One client opens the file and the other client removes
> the file while it's opened by first client. When the first client
> attempts to close the file the server returns ESTALE and the file
> ends
> up being leaked on the server. This depends on how nfs server is
> configured and is not reproducible if running against nfsd.

That would be a seriously broken server. If you return NFS4ERR_STALE to
the client, you cannot expect any further interaction with that file
from the client. It won't try to send CLOSE or DELEGRETURN or any other
stateful operation.
Anchal Agarwal Nov. 18, 2020, 9:29 p.m. UTC | #2
On Wed, Nov 18, 2020 at 03:17:20AM +0000, Trond Myklebust wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> On Wed, 2020-11-18 at 00:24 +0000, Anchal Agarwal wrote:
> > If our CLOSE RPC call is rejected with an ERR_STALE error, then we
> > should remove the GETATTR call from the compound RPC and retry.
> > This could happen in a scenario where two clients tries to access
> > the same file. One client opens the file and the other client removes
> > the file while it's opened by first client. When the first client
> > attempts to close the file the server returns ESTALE and the file
> > ends
> > up being leaked on the server. This depends on how nfs server is
> > configured and is not reproducible if running against nfsd.
> 
> That would be a seriously broken server. If you return NFS4ERR_STALE to
> the client, you cannot expect any further interaction with that file
> from the client. It won't try to send CLOSE or DELEGRETURN or any other
> stateful operation.
>
In this scenario, the setup we have at EFS is more of a distributed fashion. Multiple
clients are connected to multiple servers with a common filesystem. So the above
scenario leads to leaked open file handles on the client that tries to close deleted
file. So I was of the view, in that case client could retry close without getattr
in the close sequence without anything to do on server side.

Thanks,
Anchal Agarwal
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 
>
Trond Myklebust Nov. 18, 2020, 10:13 p.m. UTC | #3
On Wed, 2020-11-18 at 21:29 +0000, Anchal Agarwal wrote:
> On Wed, Nov 18, 2020 at 03:17:20AM +0000, Trond Myklebust wrote:
> > CAUTION: This email originated from outside of the organization. Do
> > not click links or open attachments unless you can confirm the
> > sender and know the content is safe.
> > 
> > 
> > 
> > On Wed, 2020-11-18 at 00:24 +0000, Anchal Agarwal wrote:
> > > If our CLOSE RPC call is rejected with an ERR_STALE error, then
> > > we
> > > should remove the GETATTR call from the compound RPC and retry.
> > > This could happen in a scenario where two clients tries to access
> > > the same file. One client opens the file and the other client
> > > removes
> > > the file while it's opened by first client. When the first client
> > > attempts to close the file the server returns ESTALE and the file
> > > ends
> > > up being leaked on the server. This depends on how nfs server is
> > > configured and is not reproducible if running against nfsd.
> > 
> > That would be a seriously broken server. If you return
> > NFS4ERR_STALE to
> > the client, you cannot expect any further interaction with that
> > file
> > from the client. It won't try to send CLOSE or DELEGRETURN or any
> > other
> > stateful operation.
> > 
> In this scenario, the setup we have at EFS is more of a distributed
> fashion. Multiple
> clients are connected to multiple servers with a common filesystem.
> So the above
> scenario leads to leaked open file handles on the client that tries
> to close deleted
> file. So I was of the view, in that case client could retry close
> without getattr
> in the close sequence without anything to do on server side.


If you send the client an NFS4ERR_STALE, you are telling it that its
access to the file has been revoked. That is not a temporary error, it
is a fatal one. The client is not responsible for cleaning up any
state.
Anchal Agarwal Nov. 19, 2020, 7:24 p.m. UTC | #4
On Wed, Nov 18, 2020 at 10:13:16PM +0000, Trond Myklebust wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> On Wed, 2020-11-18 at 21:29 +0000, Anchal Agarwal wrote:
> > On Wed, Nov 18, 2020 at 03:17:20AM +0000, Trond Myklebust wrote:
> > > CAUTION: This email originated from outside of the organization. Do
> > > not click links or open attachments unless you can confirm the
> > > sender and know the content is safe.
> > >
> > >
> > >
> > > On Wed, 2020-11-18 at 00:24 +0000, Anchal Agarwal wrote:
> > > > If our CLOSE RPC call is rejected with an ERR_STALE error, then
> > > > we
> > > > should remove the GETATTR call from the compound RPC and retry.
> > > > This could happen in a scenario where two clients tries to access
> > > > the same file. One client opens the file and the other client
> > > > removes
> > > > the file while it's opened by first client. When the first client
> > > > attempts to close the file the server returns ESTALE and the file
> > > > ends
> > > > up being leaked on the server. This depends on how nfs server is
> > > > configured and is not reproducible if running against nfsd.
> > >
> > > That would be a seriously broken server. If you return
> > > NFS4ERR_STALE to
> > > the client, you cannot expect any further interaction with that
> > > file
> > > from the client. It won't try to send CLOSE or DELEGRETURN or any
> > > other
> > > stateful operation.
> > >
> > In this scenario, the setup we have at EFS is more of a distributed
> > fashion. Multiple
> > clients are connected to multiple servers with a common filesystem.
> > So the above
> > scenario leads to leaked open file handles on the client that tries
> > to close deleted
> > file. So I was of the view, in that case client could retry close
> > without getattr
> > in the close sequence without anything to do on server side.
> 
> 
> If you send the client an NFS4ERR_STALE, you are telling it that its
> access to the file has been revoked. That is not a temporary error, it
> is a fatal one. The client is not responsible for cleaning up any
> state.
>
Ok, I get what you are saying. So from what I am understanding this is not
a valid error to be sent to client on close call and its the server who is doing
something fatally wrong and should be cleaning up its own state or basically not
be allowing to let this scenario happen.
Thanks for bearing with me.

--
Anchal
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 
>
diff mbox series

Patch

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 9e0ca9b2b210..40e4259bc83e 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3548,6 +3548,7 @@  static void nfs4_close_done(struct rpc_task *task, void *data)
 			res_stateid = &calldata->res.stateid;
 			renew_lease(server, calldata->timestamp);
 			break;
+		case -ESTALE:
 		case -NFS4ERR_ACCESS:
 			if (calldata->arg.bitmask != NULL) {
 				calldata->arg.bitmask = NULL;