NFS: Handle missing attributes in OPEN reply

Message ID: 167279203612.13974.15063003557908413815@noble.neil.brown.name
State: New, archived
Series: NFS: Handle missing attributes in OPEN reply

Commit Message

NeilBrown Jan. 4, 2023, 12:27 a.m. UTC
If an NFSv4 OPEN reply reports that the file was successfully opened but
the subsequent GETATTR fails, Linux-NFS will attempt a stand-alone
GETATTR request.  If that also fails, handling of the reply is aborted
with error -EAGAIN and the open is attempted again from the start.

This leaves the server with an active state (because the OPEN operation
succeeded) which the client doesn't know about.  If the open-owner
(local user) did not have the file already open, this has minimal
consequences for the client and only causes the server to spend
resources on an open state that will never be used or explicitly closed.

If the open-owner DID already have the file open, then it will hold a
reference to the open-state for that file, but the seq-id in the
state-id will now be out-of-sync with the server.  The server will have
incremented the seq-id, but the client will not have noticed.  So when
the client next attempts to access the file using that state (READ,
WRITE, SETATTR), the attempt will fail with NFS4ERR_OLD_STATEID.

The Linux-client assumes this error is due to a race and simply retries
on the basis that the local state-id information should have been
updated by another thread.  This basis is invalid in this case and the
result is an infinite loop attempting IO and getting OLD_STATEID.
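
For reference, the state-id here is the on-the-wire stateid4: a 32-bit
seq-id plus an opaque identifier (sketched in C from the RFC 7530 XDR;
the comments are explanatory, not from the spec):

    struct stateid4 {
            uint32_t seqid;     /* bumped by the server on each OPEN or
                                   upgrade/downgrade of the state */
            char     other[12]; /* opaque; fixed for the lifetime of
                                   the state */
    };

Only 'seqid' differs between the state-id the client holds and the one
the server now expects.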

This has been observed with a NetApp Filer as the server (ONTAP 9.8 p5,
using NFSv4.0).  The workload creates, writes, and unlinks a particular
file (.bash_history) from multiple clients.  If a new OPEN
from one client races with a REMOVE from another client while the first
client already has the file open, the Filer can report success for the
OPEN op, but NFS4ERR_STALE for the ACCESS or GETATTR ops in the OPEN
request.  This gets the seq-id out-of-sync and a subsequent write to the
other open on the first client causes the infinite loop to occur.
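
Schematically (op order abridged; a COMPOUND stops processing at the
first failing op):

    PUTFH(dir)   -> NFS4_OK
    OPEN(afile)  -> NFS4_OK          (server bumps the state's seq-id)
    GETFH        -> NFS4_OK
    ACCESS       -> NFS4ERR_STALE    (file already REMOVEd)
    GETATTR      -> not processed    (so no file-id reaches the client)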

The reason that the client returns -EAGAIN is that it needs to find the
inode so it can find the associated state to update the seq-id, but the
inode lookup requires the file-id which is provided in the GETATTR
reply.  Without the file-id, normal inode lookup cannot be used.

This patch changes the lookup so that when the file-id is not available
the list of states owned by the open-owner is examined to find the state
with the correct state-id (ignoring the seq-id part of the state-id).
If this is found it is used just as when a normal inode lookup finds an
inode.  If it isn't found, -EAGAIN is returned as before.
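
As a rough sketch only (the real helper, named nfs4_get_inode_by_stateid()
later in this thread, is in the patch body that follows the diffstat and
is not reproduced in this archive), the lookup plausibly has this shape,
built from existing client primitives:

    static struct inode *nfs4_get_inode_by_stateid(const nfs4_stateid *stateid,
                                                   struct nfs4_state_owner *owner)
    {
            struct nfs4_state *state;
            struct inode *inode = NULL;

            spin_lock(&owner->so_lock);
            list_for_each_entry(state, &owner->so_states, open_states) {
                    /* Compare only the "other" field, ignoring the seq-id */
                    if (!nfs4_stateid_match_other(&state->stateid, stateid))
                            continue;
                    /* Take a reference, as a normal inode lookup would */
                    inode = igrab(state->inode);
                    break;
            }
            spin_unlock(&owner->so_lock);
            return inode;
    }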

This bug can be demonstrated by modifying the Linux NFS server as
follows:

1/ The second time a file is opened, unlink it.  This simulates
   a race with another client, without needing to have a race:

    --- a/fs/nfsd/nfs4proc.c
    +++ b/fs/nfsd/nfs4proc.c
    @@ -594,6 +594,12 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
     	if (reclaim && !status)
     		nn->somebody_reclaimed = true;
     out:
    +	if (!status && open->op_stateid.si_generation > 1) {
    +		printk("Opening gen %d\n", (int)open->op_stateid.si_generation);
    +		vfs_unlink(mnt_user_ns(resfh->fh_export->ex_path.mnt),
    +			   resfh->fh_dentry->d_parent->d_inode,
    +			   resfh->fh_dentry, NULL);
    +	}
     	if (open->op_filp) {
     		fput(open->op_filp);
     		open->op_filp = NULL;

2/ When a GETATTR op is attempted on an unlinked file, return ESTALE

    @@ -852,6 +858,11 @@ nfsd4_getattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
     	if (status)
     		return status;

    +	if (cstate->current_fh.fh_dentry->d_inode->i_nlink == 0) {
    +		printk("Return Estale for unlinked file\n");
    +		return nfserr_stale;
    +	}
    +
     	if (getattr->ga_bmval[1] & NFSD_WRITEONLY_ATTRS_WORD1)
     		return nfserr_inval;

Then mount the filesystem and

  Thread 1            Thread 2
  open a file
                      open the same file (will fail)
  write to that file

I use this shell fragment, using 'sleep' for synchronisation.
The use of /bin/echo ensures the write is flushed when /bin/echo closes
the fd on exit.

    (
        /bin/echo hello
        sleep 3
        /bin/echo there
    ) > /import/A/afile &
    sleep 3
    cat /import/A/afile

Probably, when the OPEN succeeds, the GETATTR fails, and we don't
already have the file open, we should explicitly close the state.  Leaving it
open could cause problems if, for example, the server revoked it and
signalled the client that there was a revoked state.  The client would
not be able to find the state that needed to be relinquished.  I haven't
attempted to implement this.

For stable backports prior to 4.14 the nfs_fhget() that needs an
alternate is in _nfs4_opendata_to_nfs4_state().

Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfs/nfs4_fs.h   |  1 +
 fs/nfs/nfs4proc.c  | 18 ++++++++++++++----
 fs/nfs/nfs4state.c | 17 +++++++++++++++++
 3 files changed, 32 insertions(+), 4 deletions(-)

Comments

Trond Myklebust Jan. 4, 2023, 12:46 a.m. UTC | #1
On Wed, 2023-01-04 at 11:27 +1100, NeilBrown wrote:
> [... patch description trimmed ...]
> 
> Probably, when the OPEN succeeds, the GETATTR fails, and we don't
> already have the file open, we should explicitly close the state.
> Leaving it open could cause problems if, for example, the server
> revoked it and signalled the client that there was a revoked state.
> The client would not be able to find the state that needed to be
> relinquished.  I haven't attempted to implement this.


If the server starts to reply NFS4ERR_STALE to GETATTR requests, why do
we care about stateid values? Just mark the inode as stale and drop it
on the floor.

If the server tries to declare the state as revoked, then it is clearly
borken, so let NetApp fix their own bug.
NeilBrown Jan. 4, 2023, 1:01 a.m. UTC | #2
On Wed, 04 Jan 2023, Trond Myklebust wrote:
> 
> 
> If the server starts to reply NFS4ERR_STALE to GETATTR requests, why do
> we care about stateid values? Just mark the inode as stale and drop it
> on the floor.

We have a valid state from the server - we really shouldn't just ignore
it.

Maybe it would be OK to mark the inode stale.  I don't know if various
retry loops abort properly when the inode is stale.

> 
> If the server tries to declare the state as revoked, then it is clearly
> borken, so let NetApp fix their own bug.

That's probably fair.

Thanks,
NeilBrown

Trond Myklebust Jan. 4, 2023, 1:15 a.m. UTC | #3
On Wed, 2023-01-04 at 12:01 +1100, NeilBrown wrote:
> On Wed, 04 Jan 2023, Trond Myklebust wrote:
> >
> > If the server starts to reply NFS4ERR_STALE to GETATTR requests, why do
> > we care about stateid values? Just mark the inode as stale and drop it
> > on the floor.
> 
> We have a valid state from the server - we really shouldn't just ignore
> it.
> 
> Maybe it would be OK to mark the inode stale.  I don't know if various
> retry loops abort properly when the inode is stale.

Yes, they are all supposed to do that. Otherwise we would end up
looping forever in close(), for instance, since the PUTFH, GETATTR and
CLOSE can all return NFS4ERR_STALE as well.
Olga Kornievskaia Jan. 4, 2023, 2:02 a.m. UTC | #4
On Tue, Jan 3, 2023 at 7:46 PM Trond Myklebust <trondmy@kernel.org> wrote:
> On Wed, 2023-01-04 at 11:27 +1100, NeilBrown wrote:
> > [... patch description trimmed ...]
>
> If the server starts to reply NFS4ERR_STALE to GETATTR requests, why do
> we care about stateid values?

It is acceptable for the server to return ESTALE to the GETATTR after
processing the open (due to a REMOVE that comes in), and that open still
generates a valid stateid which the client should care about when there
are pre-existing opens.  The server will keep the state of any existing
opens valid even if the file is removed.  That is what's happening here:
the previous open is being used for IO, but the stateid was updated on
the server and not on the client.

> Just mark the inode as stale and drop it
> on the floor.

Why would that be correct? Any pre-existing opens should continue
operating, thus the inode can't be marked stale. We don't do it now
(silly rename allows pre-existing IO to continue).

> If the server tries to declare the state as revoked, then it is clearly
> borken, so let NetApp fix their own bug.

The server does not declare the state revoked. The open succeeded and
its state's seqid was updated, but the client is using an old stateid.
The server isn't at fault here.

NeilBrown Jan. 4, 2023, 2:12 a.m. UTC | #5
On Wed, 04 Jan 2023, Trond Myklebust wrote:
> On Wed, 2023-01-04 at 12:01 +1100, NeilBrown wrote:
> > On Wed, 04 Jan 2023, Trond Myklebust wrote:
> > >
> > > If the server starts to reply NFS4ERR_STALE to GETATTR requests, why do
> > > we care about stateid values? Just mark the inode as stale and drop it
> > > on the floor.
> > 
> > We have a valid state from the server - we really shouldn't just ignore
> > it.
> > 
> > Maybe it would be OK to mark the inode stale.  I don't know if various
> > retry loops abort properly when the inode is stale.
> 
> Yes, they are all supposed to do that. Otherwise we would end up
> looping forever in close(), for instance, since the PUTFH, GETATTR and
> CLOSE can all return NFS4ERR_STALE as well.

To mark the inode as STALE we still need to find the inode, and that is
the key bit missing in the current code.  Once we find the inode, we
could mark it stale, but maybe some other error resulted in the missing
GETATTR...

It might make sense to put the new code in _nfs4_proc_open() after the
explicit nfs4_proc_getattr() fails.  We would need to find the inode
given only the filehandle.  Is there any easy way to do that?

Thanks,
NeilBrown
Trond Myklebust Jan. 4, 2023, 2:17 a.m. UTC | #6
On Tue, 2023-01-03 at 21:02 -0500, Olga Kornievskaia wrote:
> On Tue, Jan 3, 2023 at 7:46 PM Trond Myklebust <trondmy@kernel.org> wrote:
> > On Wed, 2023-01-04 at 11:27 +1100, NeilBrown wrote:
> > > [... patch description trimmed ...]
> >
> > If the server starts to reply NFS4ERR_STALE to GETATTR requests, why do
> > we care about stateid values?
> 
> It is acceptable for the server to return ESTALE to the GETATTR after
> processing the open (due to a REMOVE that comes in), and that open still
> generates a valid stateid which the client should care about when there
> are pre-existing opens.  The server will keep the state of any existing
> opens valid even if the file is removed.  That is what's happening here:
> the previous open is being used for IO, but the stateid was updated on
> the server and not on the client.
> 
> > Just mark the inode as stale and drop it
> > on the floor.
> 
> Why would that be correct? Any pre-existing opens should continue
> operating, thus the inode can't be marked stale. We don't do it now
> (silly rename allows pre-existing IO to continue).
> 
> > If the server tries to declare the state as revoked, then it is clearly
> > borken, so let NetApp fix their own bug.
> 
> The server does not declare the state revoked. The open succeeded and
> its state's seqid was updated, but the client is using an old stateid.
> The server isn't at fault here.

If the client can't send a GETATTR, then it can't revalidate
attributes, do I/O, and it can't even close the file. So we're not
going to waste time and effort trying to support this, whether or not
NetApp thinks it is a good idea.
NeilBrown Jan. 4, 2023, 2:19 a.m. UTC | #7
On Wed, 04 Jan 2023, Olga Kornievskaia wrote:
> On Tue, Jan 3, 2023 at 7:46 PM Trond Myklebust <trondmy@kernel.org> wrote:
> > If the server starts to reply NFS4ERR_STALE to GETATTR requests, why do
> > we care about stateid values?
> 
> It is acceptable for the server to return ESTALE to the GETATTR after
> processing the open (due to a REMOVE that comes in), and that open still
> generates a valid stateid which the client should care about when there
> are pre-existing opens.  The server will keep the state of any existing
> opens valid even if the file is removed.  That is what's happening here:
> the previous open is being used for IO, but the stateid was updated on
> the server and not on the client.

I agree that it is acceptable to return ESTALE to the GETATTR, but
having done that I don't think it is acceptable for a PUTFH of the same
filehandle to succeed.  Certainly any attempt to again use the
filehandle after the PUTFH should fail with NFS4ERR_STALE.

RFC7530 says

13.1.2.7.  NFS4ERR_STALE (Error Code 70)

   The current or saved filehandle value designating an argument to the
   current operation is invalid.  The file system object referred to by
   that filehandle no longer exists, or access to it has been revoked.

So the file doesn't exist or access has been revoked.  So any writes
should fail.  Failing with OLD_STATEID is weird - and having writes
succeed if we use the correct stateid is also odd.  Failing with STALE
would be perfectly sensible and I suspect the Linux client would handle
that just fine.

Thanks,
NeilBrown
NeilBrown Jan. 4, 2023, 2:34 a.m. UTC | #8
On Wed, 04 Jan 2023, NeilBrown wrote:
> [... earlier discussion trimmed ...]
>
> So the file doesn't exist or access has been revoked.  So any writes
> should fail.  Failing with OLD_STATEID is weird - and having writes
> succeed if we use the correct stateid is also odd.  Failing with STALE
> would be perfectly sensible and I suspect the Linux client would handle
> that just fine.

I checked a recent tcpdump (with patched SLE kernel talking to Netapp)
and I see that the writes don't succeed after the first NFS4ERR_STALE.

If the "correct" stateid is given to WRITE, it returns NFS4ERR_STALE.
If the older stateid is given to WRITE, it returns NFS4ERR_OLD_STATEID.

So it seems that it just has these two checks in the wrong order.
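
To illustrate the apparent ordering (hypothetical pseudo-code, not
actual ONTAP source):

    /* What the observed replies suggest the server does: */
    if (stateid->seqid != state->current_seqid)
            return NFS4ERR_OLD_STATEID;  /* checked first ... */
    if (fh_is_stale(fh))
            return NFS4ERR_STALE;        /* ... never reached with an
                                            old stateid */

Checking the filehandle first would let the client see NFS4ERR_STALE and
abandon its retries.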

NeilBrown
Olga Kornievskaia Jan. 4, 2023, 3:03 p.m. UTC | #9
On Tue, Jan 3, 2023 at 9:34 PM NeilBrown <neilb@suse.de> wrote:
> [... earlier discussion trimmed ...]
>
> I checked a recent tcpdump (with patched SLE kernel talking to Netapp)
> and I see that the writes don't succeed after the first NFS4ERR_STALE.
>
> If the "correct" stateid is given to WRITE, it returns NFS4ERR_STALE.
> If the older stateid is given to WRITE, it returns NFS4ERR_OLD_STATEID.
>
> So it seems that it just has these two checks in the wrong order.

Does Netapp return ERR_STALE on the PUTFH if the stateid is correct?
If it's the WRITE operation that returns an error, and the server sees
multiple errors (i.e. a bad stateid and a bad file handle), then I don't
think the spec says which error is more important.  In this case, I
think the server should fail the PUTFH with ERR_STALE.

Olga Kornievskaia Jan. 4, 2023, 3:09 p.m. UTC | #10
On Tue, Jan 3, 2023 at 9:17 PM Trond Myklebust <trondmy@kernel.org> wrote:
> [... earlier discussion trimmed ...]
>
> If the client can't send a GETATTR, then it can't revalidate
> attributes, do I/O, and it can't even close the file. So we're not
> going to waste time and effort trying to support this, whether or not
> NetApp thinks it is a good idea.

That makes sense, but I think the server should fail the PUTFHs in
compounds processed after the REMOVE, not the other (GETATTR, WRITE)
operations, right?

Trond Myklebust Jan. 4, 2023, 4:07 p.m. UTC | #11
On Wed, 2023-01-04 at 10:09 -0500, Olga Kornievskaia wrote:
> [... earlier discussion trimmed ...]
> 
> That makes sense, but I think the server should fail the PUTFHs in
> compounds processed after the REMOVE, not the other (GETATTR, WRITE)
> operations, right?

In theory, the server is free to reply NFS4_OK to PUTFH, and then
NFS4ERR_STALE to the GETATTR, because a COMPOUND is not atomic w.r.t.
execution of the individual operations: if the file is deleted by
some other RPC call between the execution of the PUTFH and the GETATTR
in the COMPOUND, then this situation can happen.

However, my point is that once the server starts handing out
NFS4ERR_STALE errors, then the client can assume that the filehandle
being used is bad, because under RFC5661, Section 4.2.2, the server
MUST return NFS4ERR_STALE if it is presented with that filehandle
again.
IOW: the client should just toss the file associated with that
filehandle out of its inode caches, and forget any state associated
with that filehandle, because all future attempts to use it MUST result
in NFS4ERR_STALE.
NeilBrown Jan. 5, 2023, 10:02 p.m. UTC | #12
On Thu, 05 Jan 2023, Olga Kornievskaia wrote:
> [... earlier discussion trimmed ...]
>
> Does Netapp return ERR_STALE on the PUTFH if the stateid is correct?

In the traces I have, Netapp never returns ERR_STALE on PUTFH.  Of
course the PUTFH operation doesn't have a stateid, so the "if the
stateid is correct" is not meaningful.

ACCESS, READ, WRITE, SETATTR, and GETATTR are the only ops that I have
seen to result in ERR_STALE.

ACCESS and GETATTR don't have a stateid.
READ, WRITE, and SETATTR do.  These get OLD_STATEID if the stateid is
old, and only get STALE if the stateid is current.


> If it's the WRITE operation that returns an error, and the server sees
> multiple errors (i.e. a bad stateid and a bad file handle), then I don't
> think the spec says which error is more important.  In this case, I
> think the server should fail the PUTFH with ERR_STALE.

I agree.  If the PUTFH returned STALE there would not be a problem.
Even if a race resulted in the PUTFH succeeding and the WRITE getting
OLD_STATEID, Linux-NFS would retry and the new PUTFH could then fail
correctly.

NeilBrown

NeilBrown Jan. 5, 2023, 10:56 p.m. UTC | #13
On Wed, 04 Jan 2023, NeilBrown wrote:
> [... earlier discussion trimmed ...]
>
> To mark the inode as STALE we still need to find the inode, and that is
> the key bit missing in the current code.  Once we find the inode, we
> could mark it stale, but maybe some other error resulted in the missing
> GETATTR...
> 
> It might make sense to put the new code in _nfs4_proc_open() after the
> explicit nfs4_proc_getattr() fails.  We would need to find the inode
> given only the filehandle.  Is there any easy way to do that?

I couldn't see a consistent pattern to follow for when to mark an inode
as stale.  Does this, on top of the previous patch, seem reasonable?

Thanks,
NeilBrown



diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index b441b1d14a50..04497cb42154 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -489,6 +489,8 @@ static int nfs4_do_handle_exception(struct nfs_server *server,
 		case -ESTALE:
 			if (inode != NULL && S_ISREG(inode->i_mode))
 				pnfs_destroy_layout(NFS_I(inode));
+			if (inode)
+				nfs_set_inode_stale(inode);
 			break;
 		case -NFS4ERR_DELEG_REVOKED:
 		case -NFS4ERR_ADMIN_REVOKED:
@@ -2713,8 +2715,12 @@ static int _nfs4_proc_open(struct nfs4_opendata *data,
 			return status;
 	}
 	if (!(o_res->f_attr->valid & NFS_ATTR_FATTR)) {
+		struct inode *inode = nfs4_get_inode_by_stateid(
+			&data->o_res.stateid,
+			data->owner);
 		nfs4_sequence_free_slot(&o_res->seq_res);
-		nfs4_proc_getattr(server, &o_res->fh, o_res->f_attr, NULL);
+		nfs4_proc_getattr(server, &o_res->fh, o_res->f_attr, inode);
+		iput(inode);
 	}
 	return 0;
 }
@@ -4335,6 +4341,7 @@ int nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle,
 {
 	struct nfs4_exception exception = {
 		.interruptible = true,
+		.inode = inode,
 	};
 	int err;
 	do {
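
(For context, the reason passing the inode helps: nfs4_do_handle_exception()
takes its inode from the exception struct, roughly

	static int nfs4_do_handle_exception(struct nfs_server *server,
					    int errorcode,
					    struct nfs4_exception *exception)
	{
		struct inode *inode = exception->inode;
		/* ... error-specific handling, including -ESTALE ... */
	}

so once nfs4_proc_getattr() fills in .inode, the -ESTALE case above has
an inode to mark stale.)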
Trond Myklebust Jan. 6, 2023, 3:29 a.m. UTC | #14
On Fri, 2023-01-06 at 09:56 +1100, NeilBrown wrote:
> On Wed, 04 Jan 2023, NeilBrown wrote:
> > On Wed, 04 Jan 2023, Trond Myklebust wrote:
> > > On Wed, 2023-01-04 at 12:01 +1100, NeilBrown wrote:
> > > > On Wed, 04 Jan 2023, Trond Myklebust wrote:
> > > > > 
> > > > > 
> > > > > If the server starts to reply NFS4ERR_STALE to GETATTR
> > > > > requests,
> > > > > why do
> > > > > we care about stateid values? Just mark the inode as stale
> > > > > and drop
> > > > > it
> > > > > on the floor.
> > > > 
> > > > We have a valid state from the server - we really shouldn't
> > > > just
> > > > ignore
> > > > it.
> > > > 
> > > > Maybe it would be OK to mark the inode stale.  I don't know if
> > > > various
> > > > retry loops abort properly when the inode is stale.
> > > 
> > > Yes, they are all supposed to do that. Otherwise we would end up
> > > looping forever in close(), for instance, since the PUTFH,
> > > GETATTR and
> > > CLOSE can all return NFS4ERR_STALE as well.
> > 
> > To mark the inode as STALE we still need to find the inode, and
> > that is
> > the key bit missing in the current code.  Once we find the inode,
> > we
> > could mark it stale, but maybe some other error resulted in the
> > missing
> > GETATTR...
> > 
> > It might make sense to put the new code in _nfs4_proc_open() after
> > the
> > explicit nfs4_proc_getattr() fails.  We would need to find the
> > inode
> > given only the filehandle.  Is there any easy way to do that?
> > 
> > Thanks,
> > NeilBrown
> > 
> 
> I couldn't see a consistent pattern to follow for when to mark an
> inode
> as stale.  Does this, on top of the previous patch, seem reasonable?
> 
> Thanks,
> NeilBrown
> 
> 
> 
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index b441b1d14a50..04497cb42154 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -489,6 +489,8 @@ static int nfs4_do_handle_exception(struct
> nfs_server *server,
>                 case -ESTALE:
>                         if (inode != NULL && S_ISREG(inode->i_mode))
>                                 pnfs_destroy_layout(NFS_I(inode));
> +                       if (inode)
> +                               nfs_set_inode_stale(inode);

This is normally dealt with in the generic code inside
nfs_revalidate_inode(). There should be no need to duplicate it here.

>                         break;
>                 case -NFS4ERR_DELEG_REVOKED:
>                 case -NFS4ERR_ADMIN_REVOKED:
> @@ -2713,8 +2715,12 @@ static int _nfs4_proc_open(struct
> nfs4_opendata *data,
>                         return status;
>         }
>         if (!(o_res->f_attr->valid & NFS_ATTR_FATTR)) {
> +               struct inode *inode = nfs4_get_inode_by_stateid(
> +                       &data->o_res.stateid,
> +                       data->owner);

There shouldn't be a need to go looking for open descriptors here,
because they will hit ESTALE at some point later anyway.

If we're going to change anything, I'd rather see us return -EACCES and
-ESTALE from the decode_access() and decode_getfattr() calls in
nfs4_xdr_dec_open() (and only those errors from those two calls!) so
that we can skip the unnecessary getattr call here.

In fact, the only case that this extra getattr should be trying to
address is the one where the server returns NFS4ERR_DELAY to either the
decode_access() or the decode_getfattr() calls specifically, and where
we therefore don't want to replay the entire open call.
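
A sketch of that idea, with a made-up helper name (the real decode path
is nfs4_xdr_dec_open() in fs/nfs/nfs4xdr.c, which currently discards
the status of those trailing ops - hence the missing attributes):

	/* Let only "fatal" errors from the trailing ACCESS/GETATTR
	 * ops fail the OPEN decode; anything else is ignored and the
	 * attributes simply end up missing, as they do today.
	 */
	static int nfs4_open_trailing_status(int status)
	{
		switch (status) {
		case -EACCES:
		case -ESTALE:
			return status;
		default:
			return 0;
		}
	}

with the results of decode_access() and decode_getfattr() passed
through it instead of being dropped.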

>                 nfs4_sequence_free_slot(&o_res->seq_res);
> -               nfs4_proc_getattr(server, &o_res->fh, o_res->f_attr,
> NULL);
> +               nfs4_proc_getattr(server, &o_res->fh, o_res->f_attr,
> inode);
> +               iput(inode);
>         }
>         return 0;
>  }
> @@ -4335,6 +4341,7 @@ int nfs4_proc_getattr(struct nfs_server
> *server, struct nfs_fh *fhandle,
>  {
>         struct nfs4_exception exception = {
>                 .interruptible = true,
> +               .inode = inode,
>         };
>         int err;
>         do {
NeilBrown Jan. 6, 2023, 3:48 a.m. UTC | #15
On Fri, 06 Jan 2023, Trond Myklebust wrote:
> On Fri, 2023-01-06 at 09:56 +1100, NeilBrown wrote:
> > On Wed, 04 Jan 2023, NeilBrown wrote:
> > > On Wed, 04 Jan 2023, Trond Myklebust wrote:
> > > > On Wed, 2023-01-04 at 12:01 +1100, NeilBrown wrote:
> > > > > On Wed, 04 Jan 2023, Trond Myklebust wrote:
> > > > > > 
> > > > > > 
> > > > > > If the server starts to reply NFS4ERR_STALE to GETATTR
> > > > > > requests,
> > > > > > why do
> > > > > > we care about stateid values? Just mark the inode as stale
> > > > > > and drop
> > > > > > it
> > > > > > on the floor.
> > > > > 
> > > > > We have a valid state from the server - we really shouldn't
> > > > > just
> > > > > ignore
> > > > > it.
> > > > > 
> > > > > Maybe it would be OK to mark the inode stale.  I don't know if
> > > > > various
> > > > > retry loops abort properly when the inode is stale.
> > > > 
> > > > Yes, they are all supposed to do that. Otherwise we would end up
> > > > looping forever in close(), for instance, since the PUTFH,
> > > > GETATTR and
> > > > CLOSE can all return NFS4ERR_STALE as well.
> > > 
> > > To mark the inode as STALE we still need to find the inode, and
> > > that is
> > > the key bit missing in the current code.  Once we find the inode,
> > > we
> > > could mark it stale, but maybe some other error resulted in the
> > > missing
> > > GETATTR...
> > > 
> > > It might make sense to put the new code in _nfs4_proc_open() after
> > > the
> > > explicit nfs4_proc_getattr() fails.  We would need to find the
> > > inode
> > > given only the filehandle.  Is there any easy way to do that?
> > > 
> > > Thanks,
> > > NeilBrown
> > > 
> > 
> > I couldn't see a consistent pattern to follow for when to mark an
> > inode
> > as stale.  Does this, on top of the previous patch, seem reasonable?
> > 
> > Thanks,
> > NeilBrown
> > 
> > 
> > 
> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > index b441b1d14a50..04497cb42154 100644
> > --- a/fs/nfs/nfs4proc.c
> > +++ b/fs/nfs/nfs4proc.c
> > @@ -489,6 +489,8 @@ static int nfs4_do_handle_exception(struct
> > nfs_server *server,
> >                 case -ESTALE:
> >                         if (inode != NULL && S_ISREG(inode->i_mode))
> >                                 pnfs_destroy_layout(NFS_I(inode));
> > +                       if (inode)
> > +                               nfs_set_inode_stale(inode);
> 
> This is normally dealt with in the generic code inside
> nfs_revalidate_inode(). There should be no need to duplicate it here.
> 
> >                         break;
> >                 case -NFS4ERR_DELEG_REVOKED:
> >                 case -NFS4ERR_ADMIN_REVOKED:
> > @@ -2713,8 +2715,12 @@ static int _nfs4_proc_open(struct
> > nfs4_opendata *data,
> >                         return status;
> >         }
> >         if (!(o_res->f_attr->valid & NFS_ATTR_FATTR)) {
> > +               struct inode *inode = nfs4_get_inode_by_stateid(
> > +                       &data->o_res.stateid,
> > +                       data->owner);
> 
> There shouldn't be a need to go looking for open descriptors here,
> because they will hit ESTALE at some point later anyway.

The problem is that they don't hit ESTALE later.  Unless we update our
stored stateid with the result of the OPEN, we can use the old stateid,
and get the corresponding error.

The only way to avoid the infinite loop is to either mark the inode as
stale, or update the state-id.  For either of those we need to find the
inode.  We don't have fileid so we cannot use iget.  We do have file
handle and state-id.
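
("Find" here means matching only the 12-byte "other" field of the
state-id; nfs4_stateid_match_other() in fs/nfs/nfs4_fs.h is essentially

	static inline bool nfs4_stateid_match_other(const nfs4_stateid *dst,
						    const nfs4_stateid *src)
	{
		/* Two stateids name the same state iff "other" matches;
		 * the seqid merely versions that state, so it is
		 * deliberately excluded from the comparison.
		 */
		return memcmp(dst->other, src->other,
			      NFS4_STATEID_OTHER_SIZE) == 0;
	}

which is why an out-of-sync seq-id doesn't defeat the lookup.)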

Maybe you are saying this is a server bug that the client cannot be
expected to cope with at all, and that an infinite loop is a valid client
response to this particular server behaviour.  In that case, I guess no
patch is needed.

NeilBrown


> 
> If we're going to change anything, I'd rather see us return -EACCES and
> -ESTALE from the decode_access() and decode_getfattr() calls in
> nfs4_xdr_dec_open() (and only those errors from those two calls!) so
> that we can skip the unnecessary getattr call here.
> 
> In fact, the only case that this extra getattr should be trying to
> address is the one where the server returns NFS4ERR_DELAY to either the
> decode_access() or the decode_getfattr() calls specifically, and where
> we therefore don't want to replay the entire open call.
> 
> >                 nfs4_sequence_free_slot(&o_res->seq_res);
> > -               nfs4_proc_getattr(server, &o_res->fh, o_res->f_attr,
> > NULL);
> > +               nfs4_proc_getattr(server, &o_res->fh, o_res->f_attr,
> > inode);
> > +               iput(inode);
> >         }
> >         return 0;
> >  }
> > @@ -4335,6 +4341,7 @@ int nfs4_proc_getattr(struct nfs_server
> > *server, struct nfs_fh *fhandle,
> >  {
> >         struct nfs4_exception exception = {
> >                 .interruptible = true,
> > +               .inode = inode,
> >         };
> >         int err;
> >         do {
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 
> 
>
Olga Kornievskaia Jan. 9, 2023, 3:33 p.m. UTC | #16
On Thu, Jan 5, 2023 at 10:48 PM NeilBrown <neilb@suse.de> wrote:
>
> On Fri, 06 Jan 2023, Trond Myklebust wrote:
> > On Fri, 2023-01-06 at 09:56 +1100, NeilBrown wrote:
> > > On Wed, 04 Jan 2023, NeilBrown wrote:
> > > > On Wed, 04 Jan 2023, Trond Myklebust wrote:
> > > > > On Wed, 2023-01-04 at 12:01 +1100, NeilBrown wrote:
> > > > > > On Wed, 04 Jan 2023, Trond Myklebust wrote:
> > > > > > >
> > > > > > >
> > > > > > > If the server starts to reply NFS4ERR_STALE to GETATTR
> > > > > > > requests,
> > > > > > > why do
> > > > > > > we care about stateid values? Just mark the inode as stale
> > > > > > > and drop
> > > > > > > it
> > > > > > > on the floor.
> > > > > >
> > > > > > We have a valid state from the server - we really shouldn't
> > > > > > just
> > > > > > ignore
> > > > > > it.
> > > > > >
> > > > > > Maybe it would be OK to mark the inode stale.  I don't know if
> > > > > > various
> > > > > > retry loops abort properly when the inode is stale.
> > > > >
> > > > > Yes, they are all supposed to do that. Otherwise we would end up
> > > > > looping forever in close(), for instance, since the PUTFH,
> > > > > GETATTR and
> > > > > CLOSE can all return NFS4ERR_STALE as well.
> > > >
> > > > To mark the inode as STALE we still need to find the inode, and
> > > > that is
> > > > the key bit missing in the current code.  Once we find the inode,
> > > > we
> > > > could mark it stale, but maybe some other error resulted in the
> > > > missing
> > > > GETATTR...
> > > >
> > > > It might make sense to put the new code in _nfs4_proc_open() after
> > > > the
> > > > explicit nfs4_proc_getattr() fails.  We would need to find the
> > > > inode
> > > > given only the filehandle.  Is there any easy way to do that?
> > > >
> > > > Thanks,
> > > > NeilBrown
> > > >
> > >
> > > I couldn't see a consistent pattern to follow for when to mark an
> > > inode
> > > as stale.  Does this, on top of the previous patch, seem reasonable?
> > >
> > > Thanks,
> > > NeilBrown
> > >
> > >
> > >
> > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > > index b441b1d14a50..04497cb42154 100644
> > > --- a/fs/nfs/nfs4proc.c
> > > +++ b/fs/nfs/nfs4proc.c
> > > @@ -489,6 +489,8 @@ static int nfs4_do_handle_exception(struct
> > > nfs_server *server,
> > >                 case -ESTALE:
> > >                         if (inode != NULL && S_ISREG(inode->i_mode))
> > >                                 pnfs_destroy_layout(NFS_I(inode));
> > > +                       if (inode)
> > > +                               nfs_set_inode_stale(inode);
> >
> > This is normally dealt with in the generic code inside
> > nfs_revalidate_inode(). There should be no need to duplicate it here.
> >
> > >                         break;
> > >                 case -NFS4ERR_DELEG_REVOKED:
> > >                 case -NFS4ERR_ADMIN_REVOKED:
> > > @@ -2713,8 +2715,12 @@ static int _nfs4_proc_open(struct
> > > nfs4_opendata *data,
> > >                         return status;
> > >         }
> > >         if (!(o_res->f_attr->valid & NFS_ATTR_FATTR)) {
> > > +               struct inode *inode = nfs4_get_inode_by_stateid(
> > > +                       &data->o_res.stateid,
> > > +                       data->owner);
> >
> > There shouldn't be a need to go looking for open descriptors here,
> > because they will hit ESTALE at some point later anyway.
>
> The problem is that they don't hit ESTALE later.  Unless we update our
> stored stateid with the result of the OPEN, we can use the old stateid,
> and get the corresponding error.
>
> The only way to avoid the infinite loop is to either mark the inode as
> stale, or update the state-id.  For either of those we need to find the
> inode.  We don't have fileid so we cannot use iget.  We do have file
> handle and state-id.
>
> Maybe you are saying this is a server bug that the client cannot be
> expected to cope with at all, and that an infinite loop is a valid client
> response to this particular server behaviour.  In that case, I guess no
> patch is needed.

I'm not arguing that the server should do something else. But I would
like to talk about it from the spec perspective. When PUTFH+WRITE is
sent to the server (with an incorrect stateid) and it is processing
the WRITE compound (the spec doesn't require the server to validate a
filehandle on PUTFH; it says PUTFH is to "set" the correct
filehandle), the server is dealing with two errors that it can
possibly return to the client: ERR_STALE and ERR_OLD_STATEID. There
is nothing in the spec that speaks to the order of errors to be
returned. Of course I'd like to say that the server should prioritize
ERR_STALE over ERR_OLD_STATEID (simply because ERR_STALE is a MUST in
the spec while ERR_OLD_STATEID is only written as "rules").
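
Spelled out as hypothetical server-side pseudocode (neither helper
exists anywhere; this is just the shape of the choice):

	nfsstat4 check_write(fh, stateid)
	{
		if (fh_is_stale(fh))		/* file was removed */
			return NFS4ERR_STALE;	/* the MUST, my preference */
		if (stateid_seqid_old(stateid))
			return NFS4ERR_OLD_STATEID; /* what the filer does */
		return NFS4_OK;
	}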

>
> NeilBrown
>
>
> >
> > If we're going to change anything, I'd rather see us return -EACCES and
> > -ESTALE from the decode_access() and decode_getfattr() calls in
> > nfs4_xdr_dec_open() (and only those errors from those two calls!) so
> > that we can skip the unnecessary getattr call here.
> >
> > In fact, the only case that this extra getattr should be trying to
> > address is the one where the server returns NFS4ERR_DELAY to either the
> > decode_access() or the decode_getfattr() calls specifically, and where
> > we therefore don't want to replay the entire open call.
> >
> > >                 nfs4_sequence_free_slot(&o_res->seq_res);
> > > -               nfs4_proc_getattr(server, &o_res->fh, o_res->f_attr,
> > > NULL);
> > > +               nfs4_proc_getattr(server, &o_res->fh, o_res->f_attr,
> > > inode);
> > > +               iput(inode);
> > >         }
> > >         return 0;
> > >  }
> > > @@ -4335,6 +4341,7 @@ int nfs4_proc_getattr(struct nfs_server
> > > *server, struct nfs_fh *fhandle,
> > >  {
> > >         struct nfs4_exception exception = {
> > >                 .interruptible = true,
> > > +               .inode = inode,
> > >         };
> > >         int err;
> > >         do {
> >
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trond.myklebust@hammerspace.com
> >
> >
> >
Trond Myklebust Jan. 9, 2023, 3:47 p.m. UTC | #17
> On Jan 9, 2023, at 10:33, Olga Kornievskaia <aglo@umich.edu> wrote:
> 
> On Thu, Jan 5, 2023 at 10:48 PM NeilBrown <neilb@suse.de> wrote:
>> 
>> On Fri, 06 Jan 2023, Trond Myklebust wrote:
>>> On Fri, 2023-01-06 at 09:56 +1100, NeilBrown wrote:
>>>> On Wed, 04 Jan 2023, NeilBrown wrote:
>>>>> On Wed, 04 Jan 2023, Trond Myklebust wrote:
>>>>>> On Wed, 2023-01-04 at 12:01 +1100, NeilBrown wrote:
>>>>>>> On Wed, 04 Jan 2023, Trond Myklebust wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> If the server starts to reply NFS4ERR_STALE to GETATTR
>>>>>>>> requests,
>>>>>>>> why do
>>>>>>>> we care about stateid values? Just mark the inode as stale
>>>>>>>> and drop
>>>>>>>> it
>>>>>>>> on the floor.
>>>>>>> 
>>>>>>> We have a valid state from the server - we really shouldn't
>>>>>>> just
>>>>>>> ignore
>>>>>>> it.
>>>>>>> 
>>>>>>> Maybe it would be OK to mark the inode stale.  I don't know if
>>>>>>> various
>>>>>>> retry loops abort properly when the inode is stale.
>>>>>> 
>>>>>> Yes, they are all supposed to do that. Otherwise we would end up
>>>>>> looping forever in close(), for instance, since the PUTFH,
>>>>>> GETATTR and
>>>>>> CLOSE can all return NFS4ERR_STALE as well.
>>>>> 
>>>>> To mark the inode as STALE we still need to find the inode, and
>>>>> that is
>>>>> the key bit missing in the current code.  Once we find the inode,
>>>>> we
>>>>> could mark it stale, but maybe some other error resulted in the
>>>>> missing
>>>>> GETATTR...
>>>>> 
>>>>> It might make sense to put the new code in _nfs4_proc_open() after
>>>>> the
>>>>> explicit nfs4_proc_getattr() fails.  We would need to find the
>>>>> inode
>>>>> given only the filehandle.  Is there any easy way to do that?
>>>>> 
>>>>> Thanks,
>>>>> NeilBrown
>>>>> 
>>>> 
>>>> I couldn't see a consistent pattern to follow for when to mark an
>>>> inode
>>>> as stale.  Does this, on top of the previous patch, seem reasonable?
>>>> 
>>>> Thanks,
>>>> NeilBrown
>>>> 
>>>> 
>>>> 
>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>> index b441b1d14a50..04497cb42154 100644
>>>> --- a/fs/nfs/nfs4proc.c
>>>> +++ b/fs/nfs/nfs4proc.c
>>>> @@ -489,6 +489,8 @@ static int nfs4_do_handle_exception(struct
>>>> nfs_server *server,
>>>>                case -ESTALE:
>>>>                        if (inode != NULL && S_ISREG(inode->i_mode))
>>>>                                pnfs_destroy_layout(NFS_I(inode));
>>>> +                       if (inode)
>>>> +                               nfs_set_inode_stale(inode);
>>> 
>>> This is normally dealt with in the generic code inside
>>> nfs_revalidate_inode(). There should be no need to duplicate it here.
>>> 
>>>>                        break;
>>>>                case -NFS4ERR_DELEG_REVOKED:
>>>>                case -NFS4ERR_ADMIN_REVOKED:
>>>> @@ -2713,8 +2715,12 @@ static int _nfs4_proc_open(struct
>>>> nfs4_opendata *data,
>>>>                        return status;
>>>>        }
>>>>        if (!(o_res->f_attr->valid & NFS_ATTR_FATTR)) {
>>>> +               struct inode *inode = nfs4_get_inode_by_stateid(
>>>> +                       &data->o_res.stateid,
>>>> +                       data->owner);
>>> 
>>> There shouldn't be a need to go looking for open descriptors here,
>>> because they will hit ESTALE at some point later anyway.
>> 
>> The problem is that they don't hit ESTALE later.  Unless we update our
>> stored stateid with the result of the OPEN, we can use the old stateid,
>> and get the corresponding error.
>> 
>> The only way to avoid the infinite loop is to either mark the inode as
>> stale, or update the state-id.  For either of those we need to find the
>> inode.  We don't have fileid so we cannot use iget.  We do have file
>> handle and state-id.
>> 
>> Maybe you are saying this is a server bug that the client cannot be
>> expected to cope with at all, and that an infinite loop is a valid client
>> response to this particular server behaviour.  In that case, I guess no
>> patch is needed.
> 
> I'm not arguing that the server should do something else. But I would
> like to talk about it from the spec perspective. When PUTFH+WRITE is
> sent to the server (with an incorrect stateid) and it's processing the
> WRITE compound (as the spec doesn't require the server to validate a
> filehandle on PUTFH. The spec says PUTFH is to "set" the correct
> filehandle), the server is dealing with 2 errors that it can possibly
> return to the client ERR_STALE and ERR_OLD_STATEID. There is nothing
> in the spec that speaks to the orders of errors to be returned. Of
> course I'd like to say that the server should prioritize ERR_STALE
> over ERR_OLD_STATEID (simply because it's a MUST in the spec and
> ERR_OLD_STATEIDs are written as "rules").
> 

I disagree for the reason already pointed to in the spec. There is nothing in the spec that appears to allow the PUTFH to return anything other than NFS4ERR_STALE after the file has been deleted (and yes, RFC5661, Section 15.2 does list NFS4ERR_STALE as an error for PUTFH). PUTFH is definitely required to validate the file handle, since it is the ONLY operation that can return NFS4ERR_BADHANDLE.
Olga Kornievskaia Jan. 9, 2023, 4:07 p.m. UTC | #18
On Mon, Jan 9, 2023 at 10:47 AM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
>
>
> > On Jan 9, 2023, at 10:33, Olga Kornievskaia <aglo@umich.edu> wrote:
> >
> > On Thu, Jan 5, 2023 at 10:48 PM NeilBrown <neilb@suse.de> wrote:
> >>
> >> On Fri, 06 Jan 2023, Trond Myklebust wrote:
> >>> On Fri, 2023-01-06 at 09:56 +1100, NeilBrown wrote:
> >>>> On Wed, 04 Jan 2023, NeilBrown wrote:
> >>>>> On Wed, 04 Jan 2023, Trond Myklebust wrote:
> >>>>>> On Wed, 2023-01-04 at 12:01 +1100, NeilBrown wrote:
> >>>>>>> On Wed, 04 Jan 2023, Trond Myklebust wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> If the server starts to reply NFS4ERR_STALE to GETATTR
> >>>>>>>> requests,
> >>>>>>>> why do
> >>>>>>>> we care about stateid values? Just mark the inode as stale
> >>>>>>>> and drop
> >>>>>>>> it
> >>>>>>>> on the floor.
> >>>>>>>
> >>>>>>> We have a valid state from the server - we really shouldn't
> >>>>>>> just
> >>>>>>> ignore
> >>>>>>> it.
> >>>>>>>
> >>>>>>> Maybe it would be OK to mark the inode stale.  I don't know if
> >>>>>>> various
> >>>>>>> retry loops abort properly when the inode is stale.
> >>>>>>
> >>>>>> Yes, they are all supposed to do that. Otherwise we would end up
> >>>>>> looping forever in close(), for instance, since the PUTFH,
> >>>>>> GETATTR and
> >>>>>> CLOSE can all return NFS4ERR_STALE as well.
> >>>>>
> >>>>> To mark the inode as STALE we still need to find the inode, and
> >>>>> that is
> >>>>> the key bit missing in the current code.  Once we find the inode,
> >>>>> we
> >>>>> could mark it stale, but maybe some other error resulted in the
> >>>>> missing
> >>>>> GETATTR...
> >>>>>
> >>>>> It might make sense to put the new code in _nfs4_proc_open() after
> >>>>> the
> >>>>> explicit nfs4_proc_getattr() fails.  We would need to find the
> >>>>> inode
> >>>>> given only the filehandle.  Is there any easy way to do that?
> >>>>>
> >>>>> Thanks,
> >>>>> NeilBrown
> >>>>>
> >>>>
> >>>> I couldn't see a consistent pattern to follow for when to mark an
> >>>> inode
> >>>> as stale.  Does this, on top of the previous patch, seem reasonable?
> >>>>
> >>>> Thanks,
> >>>> NeilBrown
> >>>>
> >>>>
> >>>>
> >>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> >>>> index b441b1d14a50..04497cb42154 100644
> >>>> --- a/fs/nfs/nfs4proc.c
> >>>> +++ b/fs/nfs/nfs4proc.c
> >>>> @@ -489,6 +489,8 @@ static int nfs4_do_handle_exception(struct
> >>>> nfs_server *server,
> >>>>                case -ESTALE:
> >>>>                        if (inode != NULL && S_ISREG(inode->i_mode))
> >>>>                                pnfs_destroy_layout(NFS_I(inode));
> >>>> +                       if (inode)
> >>>> +                               nfs_set_inode_stale(inode);
> >>>
> >>> This is normally dealt with in the generic code inside
> >>> nfs_revalidate_inode(). There should be no need to duplicate it here.
> >>>
> >>>>                        break;
> >>>>                case -NFS4ERR_DELEG_REVOKED:
> >>>>                case -NFS4ERR_ADMIN_REVOKED:
> >>>> @@ -2713,8 +2715,12 @@ static int _nfs4_proc_open(struct
> >>>> nfs4_opendata *data,
> >>>>                        return status;
> >>>>        }
> >>>>        if (!(o_res->f_attr->valid & NFS_ATTR_FATTR)) {
> >>>> +               struct inode *inode = nfs4_get_inode_by_stateid(
> >>>> +                       &data->o_res.stateid,
> >>>> +                       data->owner);
> >>>
> >>> There shouldn't be a need to go looking for open descriptors here,
> >>> because they will hit ESTALE at some point later anyway.
> >>
> >> The problem is that they don't hit ESTALE later.  Unless we update our
> >> stored stateid with the result of the OPEN, we can use the old stateid,
> >> and get the corresponding error.
> >>
> >> The only way to avoid the infinite loop is to either mark the inode as
> >> stale, or update the state-id.  For either of those we need to find the
> >> inode.  We don't have fileid so we cannot use iget.  We do have file
> >> handle and state-id.
> >>
> >> Maybe you are saying this is a server bug that the client cannot be
> >> expected to cope with at all, and that an infinite loop is a valid client
> >> response to this particular server behaviour.  In that case, I guess no
> >> patch is needed.
> >
> > I'm not arguing that the server should do something else. But I would
> > like to talk about it from the spec perspective. When PUTFH+WRITE is
> > sent to the server (with an incorrect stateid) and it is processing
> > the WRITE compound (the spec doesn't require the server to validate a
> > filehandle on PUTFH; it says PUTFH is to "set" the correct
> > filehandle), the server is dealing with two errors that it can
> > possibly return to the client: ERR_STALE and ERR_OLD_STATEID. There
> > is nothing in the spec that speaks to the order of errors to be
> > returned. Of course I'd like to say that the server should prioritize
> > ERR_STALE over ERR_OLD_STATEID (simply because ERR_STALE is a MUST in
> > the spec while ERR_OLD_STATEID is only written as "rules").
> >
>
> I disagree for the reason already pointed to in the spec. There is nothing in the spec that appears to allow the PUTFH to return anything other than NFS4ERR_STALE after the file has been deleted (and yes, RFC5661, Section 15.2 does list NFS4ERR_STALE as an error for PUTFH). PUTFH is definitely required to validate the file handle, since it is the ONLY operation that can return NFS4ERR_BADHANDLE.

We are talking about 4.0 and not 4.1. In 4.0 all operations that use
PUTFH can return ERR_BADHANDLE.

>
> _________________________________
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
Trond Myklebust Jan. 9, 2023, 5:11 p.m. UTC | #19
> On Jan 9, 2023, at 11:07, Olga Kornievskaia <aglo@umich.edu> wrote:
> 
> On Mon, Jan 9, 2023 at 10:47 AM Trond Myklebust <trondmy@hammerspace.com> wrote:
>> 
>> 
>> 
>>> On Jan 9, 2023, at 10:33, Olga Kornievskaia <aglo@umich.edu> wrote:
>>> 
>>> On Thu, Jan 5, 2023 at 10:48 PM NeilBrown <neilb@suse.de> wrote:
>>>> 
>>>> On Fri, 06 Jan 2023, Trond Myklebust wrote:
>>>>> On Fri, 2023-01-06 at 09:56 +1100, NeilBrown wrote:
>>>>>> On Wed, 04 Jan 2023, NeilBrown wrote:
>>>>>>> On Wed, 04 Jan 2023, Trond Myklebust wrote:
>>>>>>>> On Wed, 2023-01-04 at 12:01 +1100, NeilBrown wrote:
>>>>>>>>> On Wed, 04 Jan 2023, Trond Myklebust wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> If the server starts to reply NFS4ERR_STALE to GETATTR
>>>>>>>>>> requests,
>>>>>>>>>> why do
>>>>>>>>>> we care about stateid values? Just mark the inode as stale
>>>>>>>>>> and drop
>>>>>>>>>> it
>>>>>>>>>> on the floor.
>>>>>>>>> 
>>>>>>>>> We have a valid state from the server - we really shouldn't
>>>>>>>>> just
>>>>>>>>> ignore
>>>>>>>>> it.
>>>>>>>>> 
>>>>>>>>> Maybe it would be OK to mark the inode stale.  I don't know if
>>>>>>>>> various
>>>>>>>>> retry loops abort properly when the inode is stale.
>>>>>>>> 
>>>>>>>> Yes, they are all supposed to do that. Otherwise we would end up
>>>>>>>> looping forever in close(), for instance, since the PUTFH,
>>>>>>>> GETATTR and
>>>>>>>> CLOSE can all return NFS4ERR_STALE as well.
>>>>>>> 
>>>>>>> To mark the inode as STALE we still need to find the inode, and
>>>>>>> that is
>>>>>>> the key bit missing in the current code.  Once we find the inode,
>>>>>>> we
>>>>>>> could mark it stale, but maybe some other error resulted in the
>>>>>>> missing
>>>>>>> GETATTR...
>>>>>>> 
>>>>>>> It might make sense to put the new code in _nfs4_proc_open() after
>>>>>>> the
>>>>>>> explicit nfs4_proc_getattr() fails.  We would need to find the
>>>>>>> inode
>>>>>>> given only the filehandle.  Is there any easy way to do that?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> NeilBrown
>>>>>>> 
>>>>>> 
>>>>>> I couldn't see a consistent pattern to follow for when to mark an
>>>>>> inode
>>>>>> as stale.  Does this, on top of the previous patch, seem reasonable?
>>>>>> 
>>>>>> Thanks,
>>>>>> NeilBrown
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>>>> index b441b1d14a50..04497cb42154 100644
>>>>>> --- a/fs/nfs/nfs4proc.c
>>>>>> +++ b/fs/nfs/nfs4proc.c
>>>>>> @@ -489,6 +489,8 @@ static int nfs4_do_handle_exception(struct
>>>>>> nfs_server *server,
>>>>>>               case -ESTALE:
>>>>>>                       if (inode != NULL && S_ISREG(inode->i_mode))
>>>>>>                               pnfs_destroy_layout(NFS_I(inode));
>>>>>> +                       if (inode)
>>>>>> +                               nfs_set_inode_stale(inode);
>>>>> 
>>>>> This is normally dealt with in the generic code inside
>>>>> nfs_revalidate_inode(). There should be no need to duplicate it here.
>>>>> 
>>>>>>                       break;
>>>>>>               case -NFS4ERR_DELEG_REVOKED:
>>>>>>               case -NFS4ERR_ADMIN_REVOKED:
>>>>>> @@ -2713,8 +2715,12 @@ static int _nfs4_proc_open(struct
>>>>>> nfs4_opendata *data,
>>>>>>                       return status;
>>>>>>       }
>>>>>>       if (!(o_res->f_attr->valid & NFS_ATTR_FATTR)) {
>>>>>> +               struct inode *inode = nfs4_get_inode_by_stateid(
>>>>>> +                       &data->o_res.stateid,
>>>>>> +                       data->owner);
>>>>> 
>>>>> There shouldn't be a need to go looking for open descriptors here,
>>>>> because they will hit ESTALE at some point later anyway.
>>>> 
>>>> The problem is that they don't hit ESTALE later.  Unless we update our
>>>> stored stateid with the result of the OPEN, we can use the old stateid,
>>>> and get the corresponding error.
>>>> 
>>>> The only way to avoid the infinite loop is to either mark the inode as
>>>> stale, or update the state-id.  For either of those we need to find the
>>>> inode.  We don't have fileid so we cannot use iget.  We do have file
>>>> handle and state-id.
>>>> 
>>>> Maybe you are saying this is a server bug that the client cannot be
>>>> expected to cope with at all, and that an infinite loop is a valid client
>>>> response to this particular server behaviour.  In that case, I guess no
>>>> patch is needed.
>>> 
>>> I'm not arguing that the server should do something else. But I would
>>> like to talk about it from the spec perspective. When PUTFH+WRITE is
>>> sent to the server (with an incorrect stateid) and it is processing
>>> the WRITE compound (the spec doesn't require the server to validate a
>>> filehandle on PUTFH; it says PUTFH is to "set" the correct
>>> filehandle), the server is dealing with two errors that it can
>>> possibly return to the client: ERR_STALE and ERR_OLD_STATEID. There
>>> is nothing in the spec that speaks to the order of errors to be
>>> returned. Of course I'd like to say that the server should prioritize
>>> ERR_STALE over ERR_OLD_STATEID (simply because ERR_STALE is a MUST in
>>> the spec while ERR_OLD_STATEID is only written as "rules").
>>> 
>> 
>> I disagree for the reason already pointed to in the spec. There is nothing in the spec that appears to allow the PUTFH to return anything other than NFS4ERR_STALE after the file has been deleted (and yes, RFC5661, Section 15.2 does list NFS4ERR_STALE as an error for PUTFH). PUTFH is definitely required to validate the file handle, since it is the ONLY operation that can return NFS4ERR_BADHANDLE.
> 
> We are talking about 4.0 and not 4.1. In 4.0 all operations that use
> PUTFH can return ERR_BADHANDLE.

Same difference: RFC7530 Section 4.2.2 uses the exact same language as in RFC5661 to describe how file handles work for deleted files, and RFC7530 Section 13.2 list NFS4ERR_STALE as being a valid error for PUTFH.
diff mbox series

Patch

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 5edd1704f735..5f2f5371a620 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -493,6 +493,7 @@  extern void nfs4_put_state_owner(struct nfs4_state_owner *);
 extern void nfs4_purge_state_owners(struct nfs_server *, struct list_head *);
 extern void nfs4_free_state_owners(struct list_head *head);
 extern struct nfs4_state * nfs4_get_open_state(struct inode *, struct nfs4_state_owner *);
+extern struct inode *nfs4_get_inode_by_stateid(nfs4_stateid *stateid, struct nfs4_state_owner *owner);
 extern void nfs4_put_open_state(struct nfs4_state *);
 extern void nfs4_close_state(struct nfs4_state *, fmode_t);
 extern void nfs4_close_sync(struct nfs4_state *, fmode_t);
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 40d749f29ed3..b441b1d14a50 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2008,10 +2008,20 @@  nfs4_opendata_get_inode(struct nfs4_opendata *data)
 	case NFS4_OPEN_CLAIM_NULL:
 	case NFS4_OPEN_CLAIM_DELEGATE_CUR:
 	case NFS4_OPEN_CLAIM_DELEGATE_PREV:
-		if (!(data->f_attr.valid & NFS_ATTR_FATTR))
-			return ERR_PTR(-EAGAIN);
-		inode = nfs_fhget(data->dir->d_sb, &data->o_res.fh,
-				&data->f_attr);
+		if (data->f_attr.valid & NFS_ATTR_FATTR) {
+			inode = nfs_fhget(data->dir->d_sb, &data->o_res.fh,
+					  &data->f_attr);
+		} else {
+			/* We don't have the fileid and so cannot do inode
+			 * lookup.  If we already have this state open we MUST
+			 * update the seqid to match the server, so we need to
+			 * find it if possible.
+			 */
+			inode = nfs4_get_inode_by_stateid(&data->o_res.stateid,
+							  data->owner);
+			if (!inode)
+				inode = ERR_PTR(-EAGAIN);
+		}
 		break;
 	default:
 		inode = d_inode(data->dentry);
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 2a0ca5c7f082..80c36c61ae20 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -752,6 +752,23 @@  nfs4_get_open_state(struct inode *inode, struct nfs4_state_owner *owner)
 	return state;
 }
 
+struct inode *
+nfs4_get_inode_by_stateid(nfs4_stateid *stateid, struct nfs4_state_owner *owner)
+{
+	struct nfs4_state *state;
+	struct inode *inode = NULL;
+
+	spin_lock(&owner->so_lock);
+	list_for_each_entry(state, &owner->so_states, open_states)
+		if (nfs4_stateid_match_other(stateid, &state->open_stateid)) {
+			inode = state->inode;
+			ihold(inode);
+			break;
+		}
+	spin_unlock(&owner->so_lock);
+	return inode;
+}
+
 void nfs4_put_open_state(struct nfs4_state *state)
 {
 	struct inode *inode = state->inode;