NFSv4: fix stateid refreshing when CLOSE racing with OPEN

Message ID: 20191010074020.o2uwtuyegtmfdlze@XZHOUW.usersys.redhat.com
State: New, archived
Series: NFSv4: fix stateid refreshing when CLOSE racing with OPEN

Commit Message

Murphy Zhou Oct. 10, 2019, 7:40 a.m. UTC
Since commit:
  [0e0cb35] NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE

xfstests generic/168 on v4.2 starts to fail because the reflink call gets:
  +XFS_IOC_CLONE_RANGE: Resource temporarily unavailable

In tshark output, NFS4ERR_OLD_STATEID stands out when comparing with
good ones:

 5210   NFS 406 V4 Reply (Call In 5209) OPEN StateID: 0xadb5
 5211   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
 5212   NFS 250 V4 Reply (Call In 5211) GETATTR
 5213   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
 5214   NFS 250 V4 Reply (Call In 5213) GETATTR
 5216   NFS 422 V4 Call WRITE StateID: 0xa818 Offset: 851968 Len: 65536
 5218   NFS 266 V4 Reply (Call In 5216) WRITE
 5219   NFS 382 V4 Call OPEN DH: 0x8d44a6b1/
 5220   NFS 338 V4 Call CLOSE StateID: 0xadb5
 5222   NFS 406 V4 Reply (Call In 5219) OPEN StateID: 0xa342
 5223   NFS 250 V4 Reply (Call In 5220) CLOSE Status: NFS4ERR_OLD_STATEID
 5225   NFS 338 V4 Call CLOSE StateID: 0xa342
 5226   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
 5227   NFS 266 V4 Reply (Call In 5225) CLOSE
 5228   NFS 250 V4 Reply (Call In 5226) GETATTR

It's easy to reproduce. By printing some logs, I found that we are
making the CLOSE seqid larger than the OPEN seqid when racing.

Fix this by not bumping the seqid when it's equal to the OPEN seqid.
Also put the whole update sequence under seqlock read protection in
case of really bad luck; this matches the locking behavior of the old,
now-deleted function.

Fixes: 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE")
Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com>
---
 fs/nfs/nfs4proc.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Comments

Trond Myklebust Oct. 10, 2019, 2:46 p.m. UTC | #1
On Thu, 2019-10-10 at 15:40 +0800, Murphy Zhou wrote:
> Since commit:
>   [0e0cb35] NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
> 
> xfstests generic/168 on v4.2 starts to fail because reflink call
> gets:
>   +XFS_IOC_CLONE_RANGE: Resource temporarily unavailable
> 
> In tshark output, NFS4ERR_OLD_STATEID stands out when comparing with
> good ones:
> 
>  5210   NFS 406 V4 Reply (Call In 5209) OPEN StateID: 0xadb5
>  5211   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
>  5212   NFS 250 V4 Reply (Call In 5211) GETATTR
>  5213   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
>  5214   NFS 250 V4 Reply (Call In 5213) GETATTR
>  5216   NFS 422 V4 Call WRITE StateID: 0xa818 Offset: 851968 Len:
> 65536
>  5218   NFS 266 V4 Reply (Call In 5216) WRITE
>  5219   NFS 382 V4 Call OPEN DH: 0x8d44a6b1/
>  5220   NFS 338 V4 Call CLOSE StateID: 0xadb5
>  5222   NFS 406 V4 Reply (Call In 5219) OPEN StateID: 0xa342
>  5223   NFS 250 V4 Reply (Call In 5220) CLOSE Status:
> NFS4ERR_OLD_STATEID
>  5225   NFS 338 V4 Call CLOSE StateID: 0xa342
>  5226   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
>  5227   NFS 266 V4 Reply (Call In 5225) CLOSE
>  5228   NFS 250 V4 Reply (Call In 5226) GETATTR
> 
> It's easy to reproduce. By printing some logs, I found that we are
> making the CLOSE seqid larger than the OPEN seqid when racing.
> 
> Fix this by not bumping the seqid when it's equal to the OPEN seqid.
> Also put the whole update sequence under seqlock read protection in
> case of really bad luck; this matches the locking behavior of the
> old, now-deleted function.
> 
> Fixes: 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in
> CLOSE/OPEN_DOWNGRADE")
> Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com>
> ---
>  fs/nfs/nfs4proc.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 11eafcf..6db5a09 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -3334,12 +3334,13 @@ static void
> nfs4_sync_open_stateid(nfs4_stateid *dst,
>  			break;
>  		}
>  		seqid_open = state->open_stateid.seqid;
> -		if (read_seqretry(&state->seqlock, seq))
> -			continue;
>  
>  		dst_seqid = be32_to_cpu(dst->seqid);
>  		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) < 0)
>  			dst->seqid = seqid_open;
> +
> +		if (read_seqretry(&state->seqlock, seq))
> +			continue;

What's the intention of this change? Neither dst_seqid nor dst->seqid
are protected by the state->seqlock, so why move this code under the
lock?
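
(For context, the seqlock read side in these helpers follows the usual
retry pattern; a minimal sketch of that pattern, not the exact
nfs4proc.c code:)

	unsigned int seq;

	do {
		seq = read_seqbegin(&state->seqlock);
		/* reads of fields guarded by state->seqlock go here */
		seqid_open = state->open_stateid.seqid;
	} while (read_seqretry(&state->seqlock, seq));
	/* anything placed after the loop runs outside the lock's
	 * protection; moving the write to dst->seqid inside the loop
	 * does not protect it, because dst is never written under the
	 * seqlock's write side */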

>  		break;
>  	}
>  }
> @@ -3367,14 +3368,16 @@ static bool
> nfs4_refresh_open_old_stateid(nfs4_stateid *dst,
>  			break;
>  		}
>  		seqid_open = state->open_stateid.seqid;
> -		if (read_seqretry(&state->seqlock, seq))
> -			continue;
>  
>  		dst_seqid = be32_to_cpu(dst->seqid);
> -		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) >= 0)
> +		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) > 0)
>  			dst->seqid = cpu_to_be32(dst_seqid + 1);

This negates the whole intention of the patch you reference in the
'Fixes:', which was to allow us to CLOSE files even if seqid bumps have
been lost due to interrupted RPC calls e.g. when using 'soft' or
'softerr' mounts.
With the above change, the check could just be tossed out altogether,
because dst_seqid will never become larger than seqid_open.
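
(The check relies on serial-number arithmetic; an illustrative helper
with a hypothetical name, showing what the original ">= 0" test does:)

	/* True when a is logically at or ahead of b, tolerating u32
	 * wraparound: the signed difference is non-negative iff a does
	 * not trail b by more than 2^31. Changing ">= 0" to "> 0" drops
	 * the "equal" case, so a dst_seqid equal to seqid_open is never
	 * bumped past it. */
	static inline bool seqid_at_or_after(u32 a, u32 b)
	{
		return (s32)(a - b) >= 0;
	}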

>  		else
>  			dst->seqid = seqid_open;
> +
> +		if (read_seqretry(&state->seqlock, seq))
> +			continue;
> +

Again, why this change to code that doesn't appear to need protection?
If the bump does succeed above, then looping back will actually cause
unpredictable behaviour.
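
(To spell out the hazard: with the update moved before the retry check,
a seqlock race detected after a successful bump re-runs the bump; an
illustrative comment, not upstream code:)

	/* iteration 1: dst->seqid = dst_seqid + 1    (bump succeeds)
	 *              read_seqretry() reports a race -> continue
	 * iteration 2: dst_seqid is re-read from the already-bumped
	 *              dst->seqid and may be bumped again, so the final
	 *              value depends on how many retries happened */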

>  		ret = true;
>  		break;
>  	}
Olga Kornievskaia Oct. 10, 2019, 5:32 p.m. UTC | #2
On Thu, Oct 10, 2019 at 3:42 AM Murphy Zhou <jencce.kernel@gmail.com> wrote:
>
> Since commit:
>   [0e0cb35] NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
>
> xfstests generic/168 on v4.2 starts to fail because reflink call gets:
>   +XFS_IOC_CLONE_RANGE: Resource temporarily unavailable

I don't believe this failure has to do with getting ERR_OLD_STATEID on
the CLOSE. What you see on the network trace is expected: the client
sends OPEN and CLOSE in parallel, so the server will fail the CLOSE
with ERR_OLD_STATEID since it has already updated its stateid for the
OPEN.

> In tshark output, NFS4ERR_OLD_STATEID stands out when comparing with
> good ones:
>
>  5210   NFS 406 V4 Reply (Call In 5209) OPEN StateID: 0xadb5
>  5211   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
>  5212   NFS 250 V4 Reply (Call In 5211) GETATTR
>  5213   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
>  5214   NFS 250 V4 Reply (Call In 5213) GETATTR
>  5216   NFS 422 V4 Call WRITE StateID: 0xa818 Offset: 851968 Len: 65536
>  5218   NFS 266 V4 Reply (Call In 5216) WRITE
>  5219   NFS 382 V4 Call OPEN DH: 0x8d44a6b1/
>  5220   NFS 338 V4 Call CLOSE StateID: 0xadb5
>  5222   NFS 406 V4 Reply (Call In 5219) OPEN StateID: 0xa342
>  5223   NFS 250 V4 Reply (Call In 5220) CLOSE Status: NFS4ERR_OLD_STATEID
>  5225   NFS 338 V4 Call CLOSE StateID: 0xa342
>  5226   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
>  5227   NFS 266 V4 Reply (Call In 5225) CLOSE
>  5228   NFS 250 V4 Reply (Call In 5226) GETATTR

"resource temporarily unavailable" is more likely to do with ulimit limits.

I also saw the same error. After I increased the ulimit for the stack
size, the problem went away. There might still be a problem somewhere
in the kernel.

Trond, is it possible that we have too many CLOSE recoveries on the
stack that are eating up stack space?

> It's easy to reproduce. By printing some logs, I found that we are
> making the CLOSE seqid larger than the OPEN seqid when racing.
>
> Fix this by not bumping the seqid when it's equal to the OPEN seqid.
> Also put the whole update sequence under seqlock read protection in
> case of really bad luck; this matches the locking behavior of the
> old, now-deleted function.
>
> Fixes: 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE")
> Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com>
> ---
>  fs/nfs/nfs4proc.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 11eafcf..6db5a09 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -3334,12 +3334,13 @@ static void nfs4_sync_open_stateid(nfs4_stateid *dst,
>                         break;
>                 }
>                 seqid_open = state->open_stateid.seqid;
> -               if (read_seqretry(&state->seqlock, seq))
> -                       continue;
>
>                 dst_seqid = be32_to_cpu(dst->seqid);
>                 if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) < 0)
>                         dst->seqid = seqid_open;
> +
> +               if (read_seqretry(&state->seqlock, seq))
> +                       continue;
>                 break;
>         }
>  }
> @@ -3367,14 +3368,16 @@ static bool nfs4_refresh_open_old_stateid(nfs4_stateid *dst,
>                         break;
>                 }
>                 seqid_open = state->open_stateid.seqid;
> -               if (read_seqretry(&state->seqlock, seq))
> -                       continue;
>
>                 dst_seqid = be32_to_cpu(dst->seqid);
> -               if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) >= 0)
> +               if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) > 0)
>                         dst->seqid = cpu_to_be32(dst_seqid + 1);
>                 else
>                         dst->seqid = seqid_open;
> +
> +               if (read_seqretry(&state->seqlock, seq))
> +                       continue;
> +
>                 ret = true;
>                 break;
>         }
> --
> 1.8.3.1
>
Murphy Zhou Oct. 11, 2019, 8:49 a.m. UTC | #3
On Thu, Oct 10, 2019 at 02:46:40PM +0000, Trond Myklebust wrote:
> On Thu, 2019-10-10 at 15:40 +0800, Murphy Zhou wrote:
> > Since commit:
> >   [0e0cb35] NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
> > 
> > xfstests generic/168 on v4.2 starts to fail because reflink call
> > gets:
> >   +XFS_IOC_CLONE_RANGE: Resource temporarily unavailable
> > 
> > In tshark output, NFS4ERR_OLD_STATEID stands out when comparing with
> > good ones:
> > 
> >  5210   NFS 406 V4 Reply (Call In 5209) OPEN StateID: 0xadb5
> >  5211   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
> >  5212   NFS 250 V4 Reply (Call In 5211) GETATTR
> >  5213   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
> >  5214   NFS 250 V4 Reply (Call In 5213) GETATTR
> >  5216   NFS 422 V4 Call WRITE StateID: 0xa818 Offset: 851968 Len:
> > 65536
> >  5218   NFS 266 V4 Reply (Call In 5216) WRITE
> >  5219   NFS 382 V4 Call OPEN DH: 0x8d44a6b1/
> >  5220   NFS 338 V4 Call CLOSE StateID: 0xadb5
> >  5222   NFS 406 V4 Reply (Call In 5219) OPEN StateID: 0xa342
> >  5223   NFS 250 V4 Reply (Call In 5220) CLOSE Status:
> > NFS4ERR_OLD_STATEID
> >  5225   NFS 338 V4 Call CLOSE StateID: 0xa342
> >  5226   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
> >  5227   NFS 266 V4 Reply (Call In 5225) CLOSE
> >  5228   NFS 250 V4 Reply (Call In 5226) GETATTR
> > 
> > It's easy to reproduce. By printing some logs, I found that we are
> > making the CLOSE seqid larger than the OPEN seqid when racing.
> > 
> > Fix this by not bumping the seqid when it's equal to the OPEN seqid.
> > Also put the whole update sequence under seqlock read protection in
> > case of really bad luck; this matches the locking behavior of the
> > old, now-deleted function.
> > 
> > Fixes: 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in
> > CLOSE/OPEN_DOWNGRADE")
> > Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com>
> > ---
> >  fs/nfs/nfs4proc.c | 13 ++++++++-----
> >  1 file changed, 8 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > index 11eafcf..6db5a09 100644
> > --- a/fs/nfs/nfs4proc.c
> > +++ b/fs/nfs/nfs4proc.c
> > @@ -3334,12 +3334,13 @@ static void
> > nfs4_sync_open_stateid(nfs4_stateid *dst,
> >  			break;
> >  		}
> >  		seqid_open = state->open_stateid.seqid;
> > -		if (read_seqretry(&state->seqlock, seq))
> > -			continue;
> >  
> >  		dst_seqid = be32_to_cpu(dst->seqid);
> >  		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) < 0)
> >  			dst->seqid = seqid_open;
> > +
> > +		if (read_seqretry(&state->seqlock, seq))
> > +			continue;
> 
> What's the intention of this change? Neither dst_seqid nor dst->seqid
> are protected by the state->seqlock, so why move this code under the
> lock?

Because seqid_open could have changed while we are writing to dst->seqid?

The old nfs4_refresh_open_stateid function did the write under the lock.

> 
> >  		break;
> >  	}
> >  }
> > @@ -3367,14 +3368,16 @@ static bool
> > nfs4_refresh_open_old_stateid(nfs4_stateid *dst,
> >  			break;
> >  		}
> >  		seqid_open = state->open_stateid.seqid;
> > -		if (read_seqretry(&state->seqlock, seq))
> > -			continue;
> >  
> >  		dst_seqid = be32_to_cpu(dst->seqid);
> > -		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) >= 0)
> > +		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) > 0)
> >  			dst->seqid = cpu_to_be32(dst_seqid + 1);
> 
> This negates the whole intention of the patch you reference in the
> 'Fixes:', which was to allow us to CLOSE files even if seqid bumps have
> been lost due to interrupted RPC calls e.g. when using 'soft' or
> 'softerr' mounts.
> With the above change, the check could just be tossed out altogether,
> because dst_seqid will never become larger than seqid_open.

Hmm.. I got it wrong. Thanks for the explanation.

> 
> >  		else
> >  			dst->seqid = seqid_open;
> > +
> > +		if (read_seqretry(&state->seqlock, seq))
> > +			continue;
> > +
> 
> Again, why this change to code that doesn't appear to need protection?
> If the bump does succeed above, then looping back will actually cause
> unpredictable behaviour.

Same as above. This change has nothing to do with the stateid race; I
just noticed it when reading the patch.

Thanks,
M
> 
> >  		ret = true;
> >  		break;
> >  	}
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 
>
Murphy Zhou Oct. 11, 2019, 9:42 a.m. UTC | #4
On Thu, Oct 10, 2019 at 01:32:50PM -0400, Olga Kornievskaia wrote:
> On Thu, Oct 10, 2019 at 3:42 AM Murphy Zhou <jencce.kernel@gmail.com> wrote:
> >
> > Since commit:
> >   [0e0cb35] NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
> >
> > xfstests generic/168 on v4.2 starts to fail because reflink call gets:
> >   +XFS_IOC_CLONE_RANGE: Resource temporarily unavailable
> 
> I don't believe this failure has to do with getting ERR_OLD_STATEID on
> the CLOSE. What you see on the network trace is expected as the client
> in parallel sends OPEN/CLOSE thus server will fail the CLOSE with the
> ERR_OLD_STATEID since it already updated its stateid for the OPEN.
> 
> > In tshark output, NFS4ERR_OLD_STATEID stands out when comparing with
> > good ones:
> >
> >  5210   NFS 406 V4 Reply (Call In 5209) OPEN StateID: 0xadb5
> >  5211   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
> >  5212   NFS 250 V4 Reply (Call In 5211) GETATTR
> >  5213   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
> >  5214   NFS 250 V4 Reply (Call In 5213) GETATTR
> >  5216   NFS 422 V4 Call WRITE StateID: 0xa818 Offset: 851968 Len: 65536
> >  5218   NFS 266 V4 Reply (Call In 5216) WRITE
> >  5219   NFS 382 V4 Call OPEN DH: 0x8d44a6b1/
> >  5220   NFS 338 V4 Call CLOSE StateID: 0xadb5
> >  5222   NFS 406 V4 Reply (Call In 5219) OPEN StateID: 0xa342
> >  5223   NFS 250 V4 Reply (Call In 5220) CLOSE Status: NFS4ERR_OLD_STATEID
> >  5225   NFS 338 V4 Call CLOSE StateID: 0xa342
> >  5226   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
> >  5227   NFS 266 V4 Reply (Call In 5225) CLOSE
> >  5228   NFS 250 V4 Reply (Call In 5226) GETATTR
> 
> "resource temporarily unavailable" is more likely to do with ulimit limits.
> 
> I also saw the same error. After I increased the ulimit for the stack
> size, the problem went away. There might still be a problem somewhere
> in the kernel.

Do you mean ulimit -s? I set it to 'unlimited' and can still reproduce
this.

Thanks,
M
> 
> Trond, is it possible that we have too many CLOSE recoveries on the
> stack that are eating up stack space?
> 
> > It's easy to reproduce. By printing some logs, I found that we are
> > making the CLOSE seqid larger than the OPEN seqid when racing.
> >
> > Fix this by not bumping the seqid when it's equal to the OPEN seqid.
> > Also put the whole update sequence under seqlock read protection in
> > case of really bad luck; this matches the locking behavior of the
> > old, now-deleted function.
> >
> > Fixes: 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE")
> > Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com>
> > ---
> >  fs/nfs/nfs4proc.c | 13 ++++++++-----
> >  1 file changed, 8 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > index 11eafcf..6db5a09 100644
> > --- a/fs/nfs/nfs4proc.c
> > +++ b/fs/nfs/nfs4proc.c
> > @@ -3334,12 +3334,13 @@ static void nfs4_sync_open_stateid(nfs4_stateid *dst,
> >                         break;
> >                 }
> >                 seqid_open = state->open_stateid.seqid;
> > -               if (read_seqretry(&state->seqlock, seq))
> > -                       continue;
> >
> >                 dst_seqid = be32_to_cpu(dst->seqid);
> >                 if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) < 0)
> >                         dst->seqid = seqid_open;
> > +
> > +               if (read_seqretry(&state->seqlock, seq))
> > +                       continue;
> >                 break;
> >         }
> >  }
> > @@ -3367,14 +3368,16 @@ static bool nfs4_refresh_open_old_stateid(nfs4_stateid *dst,
> >                         break;
> >                 }
> >                 seqid_open = state->open_stateid.seqid;
> > -               if (read_seqretry(&state->seqlock, seq))
> > -                       continue;
> >
> >                 dst_seqid = be32_to_cpu(dst->seqid);
> > -               if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) >= 0)
> > +               if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) > 0)
> >                         dst->seqid = cpu_to_be32(dst_seqid + 1);
> >                 else
> >                         dst->seqid = seqid_open;
> > +
> > +               if (read_seqretry(&state->seqlock, seq))
> > +                       continue;
> > +
> >                 ret = true;
> >                 break;
> >         }
> > --
> > 1.8.3.1
> >
Trond Myklebust Oct. 11, 2019, 2:14 p.m. UTC | #5
On Fri, 2019-10-11 at 16:49 +0800, Murphy Zhou wrote:
> On Thu, Oct 10, 2019 at 02:46:40PM +0000, Trond Myklebust wrote:
> > On Thu, 2019-10-10 at 15:40 +0800, Murphy Zhou wrote:
> > > Since commit:
> > >   [0e0cb35] NFSv4: Handle NFS4ERR_OLD_STATEID in
> > > CLOSE/OPEN_DOWNGRADE
> > > 
> > > xfstests generic/168 on v4.2 starts to fail because reflink call
> > > gets:
> > >   +XFS_IOC_CLONE_RANGE: Resource temporarily unavailable
> > > 
> > > In tshark output, NFS4ERR_OLD_STATEID stands out when comparing
> > > with
> > > good ones:
> > > 
> > >  5210   NFS 406 V4 Reply (Call In 5209) OPEN StateID: 0xadb5
> > >  5211   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
> > >  5212   NFS 250 V4 Reply (Call In 5211) GETATTR
> > >  5213   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
> > >  5214   NFS 250 V4 Reply (Call In 5213) GETATTR
> > >  5216   NFS 422 V4 Call WRITE StateID: 0xa818 Offset: 851968 Len:
> > > 65536
> > >  5218   NFS 266 V4 Reply (Call In 5216) WRITE
> > >  5219   NFS 382 V4 Call OPEN DH: 0x8d44a6b1/
> > >  5220   NFS 338 V4 Call CLOSE StateID: 0xadb5
> > >  5222   NFS 406 V4 Reply (Call In 5219) OPEN StateID: 0xa342
> > >  5223   NFS 250 V4 Reply (Call In 5220) CLOSE Status:
> > > NFS4ERR_OLD_STATEID
> > >  5225   NFS 338 V4 Call CLOSE StateID: 0xa342
> > >  5226   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
> > >  5227   NFS 266 V4 Reply (Call In 5225) CLOSE
> > >  5228   NFS 250 V4 Reply (Call In 5226) GETATTR
> > > 
> > > It's easy to reproduce. By printing some logs, I found that we
> > > are making the CLOSE seqid larger than the OPEN seqid when racing.
> > > 
> > > Fix this by not bumping the seqid when it's equal to the OPEN
> > > seqid. Also put the whole update sequence under seqlock read
> > > protection in case of really bad luck; this matches the locking
> > > behavior of the old, now-deleted function.
> > > 
> > > Fixes: 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in
> > > CLOSE/OPEN_DOWNGRADE")
> > > Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com>
> > > ---
> > >  fs/nfs/nfs4proc.c | 13 ++++++++-----
> > >  1 file changed, 8 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > > index 11eafcf..6db5a09 100644
> > > --- a/fs/nfs/nfs4proc.c
> > > +++ b/fs/nfs/nfs4proc.c
> > > @@ -3334,12 +3334,13 @@ static void
> > > nfs4_sync_open_stateid(nfs4_stateid *dst,
> > >  			break;
> > >  		}
> > >  		seqid_open = state->open_stateid.seqid;
> > > -		if (read_seqretry(&state->seqlock, seq))
> > > -			continue;
> > >  
> > >  		dst_seqid = be32_to_cpu(dst->seqid);
> > >  		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) < 0)
> > >  			dst->seqid = seqid_open;
> > > +
> > > +		if (read_seqretry(&state->seqlock, seq))
> > > +			continue;
> > 
> > What's the intention of this change? Neither dst_seqid nor
> > dst->seqid are protected by the state->seqlock, so why move this
> > code under the lock?
> 
> Because seqid_open could have changed while we are writing to
> dst->seqid?
> 
> The old nfs4_refresh_open_stateid function did the write under the
> lock.

The value of state->open_stateid could change both before and after we
write to dst->seqid. If we hold a lock, then it could change the moment
we release that lock. That's not something we're able to control.

The only thing we should care about here is whether or not the call to
nfs4_state_match_open_stateid_other(state, dst) and the storage of
state->open_stateid.seqid in seqid_open are being performed with the
same instance of the full state->open_stateid.
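
(A sketch of that invariant; the names follow fs/nfs/nfs4proc.c, but
this is an illustration, not the upstream code:)

	for (;;) {
		seq = read_seqbegin(&state->seqlock);
		if (!nfs4_state_match_open_stateid_other(state, dst)) {
			/* the "other" part changed: dst no longer
			 * refers to this open_stateid at all */
			break;
		}
		seqid_open = state->open_stateid.seqid;
		if (read_seqretry(&state->seqlock, seq))
			continue;	/* raced; re-read both together */
		/* the match check and the seqid snapshot came from the
		 * same open_stateid instance; updating dst->seqid can
		 * safely happen outside the lock */
		break;
	}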

> 
> > >  		break;
> > >  	}
> > >  }
> > > @@ -3367,14 +3368,16 @@ static bool
> > > nfs4_refresh_open_old_stateid(nfs4_stateid *dst,
> > >  			break;
> > >  		}
> > >  		seqid_open = state->open_stateid.seqid;
> > > -		if (read_seqretry(&state->seqlock, seq))
> > > -			continue;
> > >  
> > >  		dst_seqid = be32_to_cpu(dst->seqid);
> > > -		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) >= 0)
> > > +		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) > 0)
> > >  			dst->seqid = cpu_to_be32(dst_seqid + 1);
> > 
> > This negates the whole intention of the patch you reference in the
> > 'Fixes:', which was to allow us to CLOSE files even if seqid bumps
> > have
> > been lost due to interrupted RPC calls e.g. when using 'soft' or
> > 'softerr' mounts.
> > With the above change, the check could just be tossed out
> > altogether,
> > because dst_seqid will never become larger than seqid_open.
> 
> Hmm.. I got it wrong. Thanks for the explanation.

So to be clear: I'm not saying that what you describe is not a problem.
I'm just saying that the fix you propose is really no better than
reverting the entire patch. I'd prefer not to do that, and would rather
see us look for ways to fix both problems, but if we can't find such a
fix then that would be the better solution.

> 
> > >  		else
> > >  			dst->seqid = seqid_open;
> > > +
> > > +		if (read_seqretry(&state->seqlock, seq))
> > > +			continue;
> > > +
> > 
> > Again, why this change to code that doesn't appear to need
> > protection?
> > If the bump does succeed above, then looping back will actually
> > cause
> > unpredictable behaviour.
> 
> Same as above. This change has nothing to do with the stateid race;
> I just noticed it when reading the patch.
> 
> Thanks,
> M
> > >  		ret = true;
> > >  		break;
> > >  	}
Trond Myklebust Oct. 11, 2019, 2:18 p.m. UTC | #6
On Thu, 2019-10-10 at 13:32 -0400, Olga Kornievskaia wrote:
> On Thu, Oct 10, 2019 at 3:42 AM Murphy Zhou <jencce.kernel@gmail.com>
> wrote:
> > Since commit:
> >   [0e0cb35] NFSv4: Handle NFS4ERR_OLD_STATEID in
> > CLOSE/OPEN_DOWNGRADE
> > 
> > xfstests generic/168 on v4.2 starts to fail because reflink call
> > gets:
> >   +XFS_IOC_CLONE_RANGE: Resource temporarily unavailable
> 
> I don't believe this failure has to do with getting ERR_OLD_STATEID
> on
> the CLOSE. What you see on the network trace is expected as the
> client
> in parallel sends OPEN/CLOSE thus server will fail the CLOSE with the
> ERR_OLD_STATEID since it already updated its stateid for the OPEN.
> 
> > In tshark output, NFS4ERR_OLD_STATEID stands out when comparing
> > with
> > good ones:
> > 
> >  5210   NFS 406 V4 Reply (Call In 5209) OPEN StateID: 0xadb5
> >  5211   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
> >  5212   NFS 250 V4 Reply (Call In 5211) GETATTR
> >  5213   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
> >  5214   NFS 250 V4 Reply (Call In 5213) GETATTR
> >  5216   NFS 422 V4 Call WRITE StateID: 0xa818 Offset: 851968 Len:
> > 65536
> >  5218   NFS 266 V4 Reply (Call In 5216) WRITE
> >  5219   NFS 382 V4 Call OPEN DH: 0x8d44a6b1/
> >  5220   NFS 338 V4 Call CLOSE StateID: 0xadb5
> >  5222   NFS 406 V4 Reply (Call In 5219) OPEN StateID: 0xa342
> >  5223   NFS 250 V4 Reply (Call In 5220) CLOSE Status:
> > NFS4ERR_OLD_STATEID
> >  5225   NFS 338 V4 Call CLOSE StateID: 0xa342
> >  5226   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
> >  5227   NFS 266 V4 Reply (Call In 5225) CLOSE
> >  5228   NFS 250 V4 Reply (Call In 5226) GETATTR
> 
> "resource temporarily unavailable" is more likely to do with ulimit
> limits.
> 
> I also saw the same error. After I increased the ulimit for the stack
> size, the problem went away. There might still be a problem somewhere
> in the kernel.
> 
> Trond, is it possible that we have too many CLOSE recoveries on the
> stack that are eating up stack space?

That shouldn't normally happen. CLOSE runs as an asynchronous RPC call,
so its stack usage should be pretty minimal (limited to whatever each
callback function uses).
Olga Kornievskaia Oct. 11, 2019, 6:50 p.m. UTC | #7
On Fri, Oct 11, 2019 at 10:18 AM Trond Myklebust
<trondmy@hammerspace.com> wrote:
>
> On Thu, 2019-10-10 at 13:32 -0400, Olga Kornievskaia wrote:
> > On Thu, Oct 10, 2019 at 3:42 AM Murphy Zhou <jencce.kernel@gmail.com>
> > wrote:
> > > Since commit:
> > >   [0e0cb35] NFSv4: Handle NFS4ERR_OLD_STATEID in
> > > CLOSE/OPEN_DOWNGRADE
> > >
> > > xfstests generic/168 on v4.2 starts to fail because reflink call
> > > gets:
> > >   +XFS_IOC_CLONE_RANGE: Resource temporarily unavailable
> >
> > I don't believe this failure has to do with getting ERR_OLD_STATEID
> > on
> > the CLOSE. What you see on the network trace is expected as the
> > client
> > in parallel sends OPEN/CLOSE thus server will fail the CLOSE with the
> > ERR_OLD_STATEID since it already updated its stateid for the OPEN.
> >
> > > In tshark output, NFS4ERR_OLD_STATEID stands out when comparing
> > > with
> > > good ones:
> > >
> > >  5210   NFS 406 V4 Reply (Call In 5209) OPEN StateID: 0xadb5
> > >  5211   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
> > >  5212   NFS 250 V4 Reply (Call In 5211) GETATTR
> > >  5213   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
> > >  5214   NFS 250 V4 Reply (Call In 5213) GETATTR
> > >  5216   NFS 422 V4 Call WRITE StateID: 0xa818 Offset: 851968 Len:
> > > 65536
> > >  5218   NFS 266 V4 Reply (Call In 5216) WRITE
> > >  5219   NFS 382 V4 Call OPEN DH: 0x8d44a6b1/
> > >  5220   NFS 338 V4 Call CLOSE StateID: 0xadb5
> > >  5222   NFS 406 V4 Reply (Call In 5219) OPEN StateID: 0xa342
> > >  5223   NFS 250 V4 Reply (Call In 5220) CLOSE Status:
> > > NFS4ERR_OLD_STATEID
> > >  5225   NFS 338 V4 Call CLOSE StateID: 0xa342
> > >  5226   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
> > >  5227   NFS 266 V4 Reply (Call In 5225) CLOSE
> > >  5228   NFS 250 V4 Reply (Call In 5226) GETATTR
> >
> > "resource temporarily unavailable" is more likely to do with ulimit
> > limits.
> >
> > I also saw the same error. After I increased the ulimit for the stack
> > size, the problem went away. There might still be a problem somewhere
> > in the kernel.
> >
> > Trond, is it possible that we have too many CLOSE recoveries on the
> > stack that are eating up stack space?
>
> That shouldn't normally happen. CLOSE runs as an asynchronous RPC call,
> so its stack usage should be pretty minimal (limited to whatever each
> callback function uses).

Yeah, that wasn't it. I've straced generic/168 to catch
ioctl(clone_file_range) returning EAGAIN.

I've instrumented the kernel to see where we are returning an EAGAIN
in nfs42_proc_clone(). nfs42_proc_clone is failing on
nfs42_select_rw_context() because nfs4_copy_open_stateid() is failing
to get the open state. Basically it looks like we are trying to do a
clone on a file that's not opened. Still trying to understand
things...
Olga Kornievskaia Oct. 19, 2019, 12:34 a.m. UTC | #8
On Fri, Oct 11, 2019 at 2:50 PM Olga Kornievskaia <aglo@umich.edu> wrote:
>
> On Fri, Oct 11, 2019 at 10:18 AM Trond Myklebust
> <trondmy@hammerspace.com> wrote:
> >
> > On Thu, 2019-10-10 at 13:32 -0400, Olga Kornievskaia wrote:
> > > On Thu, Oct 10, 2019 at 3:42 AM Murphy Zhou <jencce.kernel@gmail.com>
> > > wrote:
> > > > Since commit:
> > > >   [0e0cb35] NFSv4: Handle NFS4ERR_OLD_STATEID in
> > > > CLOSE/OPEN_DOWNGRADE
> > > >
> > > > xfstests generic/168 on v4.2 starts to fail because reflink call
> > > > gets:
> > > >   +XFS_IOC_CLONE_RANGE: Resource temporarily unavailable
> > >
> > > I don't believe this failure has to do with getting ERR_OLD_STATEID
> > > on
> > > the CLOSE. What you see on the network trace is expected as the
> > > client
> > > in parallel sends OPEN/CLOSE thus server will fail the CLOSE with the
> > > ERR_OLD_STATEID since it already updated its stateid for the OPEN.
> > >
> > > > In tshark output, NFS4ERR_OLD_STATEID stands out when comparing
> > > > with
> > > > good ones:
> > > >
> > > >  5210   NFS 406 V4 Reply (Call In 5209) OPEN StateID: 0xadb5
> > > >  5211   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
> > > >  5212   NFS 250 V4 Reply (Call In 5211) GETATTR
> > > >  5213   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
> > > >  5214   NFS 250 V4 Reply (Call In 5213) GETATTR
> > > >  5216   NFS 422 V4 Call WRITE StateID: 0xa818 Offset: 851968 Len:
> > > > 65536
> > > >  5218   NFS 266 V4 Reply (Call In 5216) WRITE
> > > >  5219   NFS 382 V4 Call OPEN DH: 0x8d44a6b1/
> > > >  5220   NFS 338 V4 Call CLOSE StateID: 0xadb5
> > > >  5222   NFS 406 V4 Reply (Call In 5219) OPEN StateID: 0xa342
> > > >  5223   NFS 250 V4 Reply (Call In 5220) CLOSE Status:
> > > > NFS4ERR_OLD_STATEID
> > > >  5225   NFS 338 V4 Call CLOSE StateID: 0xa342
> > > >  5226   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
> > > >  5227   NFS 266 V4 Reply (Call In 5225) CLOSE
> > > >  5228   NFS 250 V4 Reply (Call In 5226) GETATTR
> > >
> > > "resource temporarily unavailable" is more likely to do with ulimit
> > > limits.
> > >
> > > I also saw the same error. After I increased the ulimit for the stack
> > > size, the problem went away. There might still be a problem somewhere
> > > in the kernel.
> > >
> > > Trond, is it possible that we have too many CLOSE recoveries on the
> > > stack that are eating up stack space?
> >
> > That shouldn't normally happen. CLOSE runs as an asynchronous RPC call,
> > so its stack usage should be pretty minimal (limited to whatever each
> > callback function uses).
>
> Yeah, that wasn't it. I've straced generic/168 to catch
> ioctl(clone_file_range) returning EAGAIN.
>
> I've instrumented the kernel to see where we are returning an EAGAIN
> in nfs42_proc_clone(). nfs42_proc_clone is failing on
> nfs42_select_rw_context() because nfs4_copy_open_stateid() is failing
> to get the open state. Basically it looks like we are trying to do a
> clone on a file that's not opened. Still trying to understand
> things...

Trond,

Generic/168 fails in 2 ways (though only 1 leads to the failure in
xfs_io). Another way is having a file closed then client using the
stateid for the write and getting a bad_stateid which the client
recovers from (but the fact that client shouldn't have done that is a
problem). Another is the clone where again happens that file is
"closed" and clone is trying to use a stateid.

The problem comes from the following fact: we have a racing CLOSE and
OPEN, where the client sent the CLOSE followed by the OPEN but the
server processed the OPEN and then the CLOSE. The server returns
OLD_STATEID to the CLOSE. The code then bumps the sequence id and
resends the CLOSE, which inadvertently closes a file that was opened
before. While IO errors from this are recoverable, the clone error is
visible to the application (I think another case would be a copy).

I don't have a code solution yet, but it'll have to be something where
we ignore a CLOSE with OLD_STATEID when another OPEN has happened.
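
A timeline of the interleaving being described (stateid seqid values
are illustrative):

	client                             server
	------                             ------
	CLOSE (open_stateid, seqid=1) -->
	OPEN                          -->  processes OPEN first:
	                                   open_stateid seqid 1 -> 2
	                              <--  OPEN reply (seqid=2)
	                              <--  CLOSE reply: NFS4ERR_OLD_STATEID
	CLOSE (seqid bumped to 2)     -->  matches the current stateid and
	                                   closes the state the racing
	                                   OPEN just created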
Olga Kornievskaia Oct. 21, 2019, 5:15 p.m. UTC | #9
On Fri, Oct 18, 2019 at 8:34 PM Olga Kornievskaia <aglo@umich.edu> wrote:
>
> On Fri, Oct 11, 2019 at 2:50 PM Olga Kornievskaia <aglo@umich.edu> wrote:
> >
> > On Fri, Oct 11, 2019 at 10:18 AM Trond Myklebust
> > <trondmy@hammerspace.com> wrote:
> > >
> > > On Thu, 2019-10-10 at 13:32 -0400, Olga Kornievskaia wrote:
> > > > On Thu, Oct 10, 2019 at 3:42 AM Murphy Zhou <jencce.kernel@gmail.com>
> > > > wrote:
> > > > > Since commit:
> > > > >   [0e0cb35] NFSv4: Handle NFS4ERR_OLD_STATEID in
> > > > > CLOSE/OPEN_DOWNGRADE
> > > > >
> > > > > xfstests generic/168 on v4.2 starts to fail because reflink call
> > > > > gets:
> > > > >   +XFS_IOC_CLONE_RANGE: Resource temporarily unavailable
> > > >
> > > > I don't believe this failure has to do with getting ERR_OLD_STATEID
> > > > on
> > > > the CLOSE. What you see on the network trace is expected as the
> > > > client
> > > > in parallel sends OPEN/CLOSE thus server will fail the CLOSE with the
> > > > ERR_OLD_STATEID since it already updated its stateid for the OPEN.
> > > >
> > > > > In tshark output, NFS4ERR_OLD_STATEID stands out when comparing
> > > > > with
> > > > > good ones:
> > > > >
> > > > >  5210   NFS 406 V4 Reply (Call In 5209) OPEN StateID: 0xadb5
> > > > >  5211   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
> > > > >  5212   NFS 250 V4 Reply (Call In 5211) GETATTR
> > > > >  5213   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
> > > > >  5214   NFS 250 V4 Reply (Call In 5213) GETATTR
> > > > >  5216   NFS 422 V4 Call WRITE StateID: 0xa818 Offset: 851968 Len:
> > > > > 65536
> > > > >  5218   NFS 266 V4 Reply (Call In 5216) WRITE
> > > > >  5219   NFS 382 V4 Call OPEN DH: 0x8d44a6b1/
> > > > >  5220   NFS 338 V4 Call CLOSE StateID: 0xadb5
> > > > >  5222   NFS 406 V4 Reply (Call In 5219) OPEN StateID: 0xa342
> > > > >  5223   NFS 250 V4 Reply (Call In 5220) CLOSE Status:
> > > > > NFS4ERR_OLD_STATEID
> > > > >  5225   NFS 338 V4 Call CLOSE StateID: 0xa342
> > > > >  5226   NFS 314 V4 Call GETATTR FH: 0x8d44a6b1
> > > > >  5227   NFS 266 V4 Reply (Call In 5225) CLOSE
> > > > >  5228   NFS 250 V4 Reply (Call In 5226) GETATTR
> > > >
> > > > "resource temporarily unavailable" is more likely to do with ulimit
> > > > limits.
> > > >
> > > > I also saw the same error. After I increased the ulimit for the stack
> > > > size, the problem went away. There might still be a problem somewhere
> > > > in the kernel.
> > > >
> > > > Trond, is it possible that we have too many CLOSE recoveries on the
> > > > stack that are eating up stack space?
> > >
> > > That shouldn't normally happen. CLOSE runs as an asynchronous RPC call,
> > > so its stack usage should be pretty minimal (limited to whatever each
> > > callback function uses).
> >
> > Yeah, that wasn't it. I've straced generic/168 to catch
> > ioctl(clone_file_range) returning EAGAIN.
> >
> > I've instrumented the kernel to see where we are returning an EAGAIN
> > in nfs42_proc_clone(). nfs42_proc_clone is failing on
> > nfs42_select_rw_context() because nfs4_copy_open_stateid() is failing
> > to get the open state. Basically it looks like we are trying to do a
> > clone on a file that's not opened. Still trying to understand
> > things...
>
> Trond,
>
> Generic/168 fails in 2 ways (though only 1 leads to the failure in
> xfs_io). Another way is having a file closed then client using the
> stateid for the write and getting a bad_stateid which the client
> recovers from (but the fact that client shouldn't have done that is a
> problem). Another is the clone where again happens that file is
> "closed" and clone is trying to use a stateid.
>
> The problem comes from the following fact: we have a racing CLOSE and
> OPEN, where the client sent the CLOSE followed by the OPEN but the
> server processed the OPEN and then the CLOSE. The server returns
> OLD_STATEID to the CLOSE. The code then bumps the sequence id and
> resends the CLOSE, which inadvertently closes a file that was opened
> before. While IO errors from this are recoverable, the clone error is
> visible to the application (I think another case would be a copy).
>
> I don't have a code solution yet, but it'll have to be something where
> we ignore a CLOSE with OLD_STATEID when another OPEN has happened.

Also, besides the clone and copy cases, if the racing open got a
delegation and that was used during the BAD_STATEID, that's an
automatic EIO to the application.

I guess my proposal is to revert 12f275cdd1638. The same problem
exists for a DELEGRETURN racing with the open: if we retry, we will
return a delegation that was just given to us, but the client would
think that we still have it.
Benjamin Coddington Sept. 3, 2020, 5:54 p.m. UTC | #10
On 11 Oct 2019, at 10:14, Trond Myklebust wrote:
> On Fri, 2019-10-11 at 16:49 +0800, Murphy Zhou wrote:
>> On Thu, Oct 10, 2019 at 02:46:40PM +0000, Trond Myklebust wrote:
>>> On Thu, 2019-10-10 at 15:40 +0800, Murphy Zhou wrote:
...
>>>> @@ -3367,14 +3368,16 @@ static bool
>>>> nfs4_refresh_open_old_stateid(nfs4_stateid *dst,
>>>>  			break;
>>>>  		}
>>>>  		seqid_open = state->open_stateid.seqid;
>>>> -		if (read_seqretry(&state->seqlock, seq))
>>>> -			continue;
>>>>
>>>>  		dst_seqid = be32_to_cpu(dst->seqid);
>>>> -		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) >= 0)
>>>> +		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) > 0)
>>>>  			dst->seqid = cpu_to_be32(dst_seqid + 1);
>>>
>>> This negates the whole intention of the patch you reference in the
>>> 'Fixes:', which was to allow us to CLOSE files even if seqid bumps
>>> have
>>> been lost due to interrupted RPC calls e.g. when using 'soft' or
>>> 'softerr' mounts.
>>> With the above change, the check could just be tossed out
>>> altogether,
>>> because dst_seqid will never become larger than seqid_open.
>>
>> Hmm.. I got it wrong. Thanks for the explanation.
>
> So to be clear: I'm not saying that what you describe is not a problem.
> I'm just saying that the fix you propose is really no better than
> reverting the entire patch. I'd prefer not to do that, and would rather
> see us look for ways to fix both problems, but if we can't find such
> a fix then that would be the better solution.

Hi Trond and Murphy Zhou,

Sorry to resurrect this old thread, but I'm wondering if any progress was
made on this front.

I'm seeing this race manifest when a process is never able to escape from
the loop in nfs_set_open_stateid_locked() if CLOSE comes through first and
clears out the state.  We can play bit-fiddling games to fix that, but I
feel like we'll just end up breaking it again later with another fix.

Either we should revert 0e0cb35b417f, or talk about how to fix it.  Seems
like we should be able to put the CLOSE on the nfs4_state->waitq as well,
and see if we can't just take that approach anytime our operations get out
of sequence.  Do you see any problems with this approach?

Ben
Murphy Zhou Sept. 4, 2020, 3:04 a.m. UTC | #11
Hi Benjamin,

On Thu, Sep 03, 2020 at 01:54:26PM -0400, Benjamin Coddington wrote:
> 
> On 11 Oct 2019, at 10:14, Trond Myklebust wrote:
> > On Fri, 2019-10-11 at 16:49 +0800, Murphy Zhou wrote:
> >> On Thu, Oct 10, 2019 at 02:46:40PM +0000, Trond Myklebust wrote:
> >>> On Thu, 2019-10-10 at 15:40 +0800, Murphy Zhou wrote:
> ...
> >>>> @@ -3367,14 +3368,16 @@ static bool
> >>>> nfs4_refresh_open_old_stateid(nfs4_stateid *dst,
> >>>>  			break;
> >>>>  		}
> >>>>  		seqid_open = state->open_stateid.seqid;
> >>>> -		if (read_seqretry(&state->seqlock, seq))
> >>>> -			continue;
> >>>>
> >>>>  		dst_seqid = be32_to_cpu(dst->seqid);
> >>>> -		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) >= 0)
> >>>> +		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) > 0)
> >>>>  			dst->seqid = cpu_to_be32(dst_seqid + 1);
> >>>
> >>> This negates the whole intention of the patch you reference in the
> >>> 'Fixes:', which was to allow us to CLOSE files even if seqid bumps
> >>> have
> >>> been lost due to interrupted RPC calls e.g. when using 'soft' or
> >>> 'softerr' mounts.
> >>> With the above change, the check could just be tossed out
> >>> altogether,
> >>> because dst_seqid will never become larger than seqid_open.
> >>
> >> Hmm.. I got it wrong. Thanks for the explanation.
> >
> > So to be clear: I'm not saying that what you describe is not a problem.
> > I'm just saying that the fix you propose is really no better than
> > reverting the entire patch. I'd prefer not to do that, and would rather
> > see us look for ways to fix both problems, but if we can't find
> > such a fix then that would be the better solution.
> 
> Hi Trond and Murphy Zhou,
> 
> Sorry to resurrect this old thread, but I'm wondering if any progress was
> made on this front.

This failure stopped showing up as of the v5.6-rc1 release cycle
in my records. Can you reproduce this on the latest upstream kernel?

Thanks!

> 
> I'm seeing this race manifest when a process is never able to escape from the
> loop in nfs_set_open_stateid_locked() if CLOSE comes through first and
> clears out the state.  We can play bit-fiddling games to fix that, but I
> feel like we'll just end up breaking it again later with another fix.
> 
> Either we should revert 0e0cb35b417f, or talk about how to fix it.  Seems
> like we should be able to put the CLOSE on the nfs4_state->waitq as well,
> and see if we can't just take that approach anytime our operations get out
> of sequence.  Do you see any problems with this approach?
> 
> Ben
>
Benjamin Coddington Sept. 4, 2020, 10:55 a.m. UTC | #12
On 3 Sep 2020, at 23:04, Murphy Zhou wrote:

> Hi Benjamin,
>
> On Thu, Sep 03, 2020 at 01:54:26PM -0400, Benjamin Coddington wrote:
>>
>> On 11 Oct 2019, at 10:14, Trond Myklebust wrote:
>>> On Fri, 2019-10-11 at 16:49 +0800, Murphy Zhou wrote:
>>>> On Thu, Oct 10, 2019 at 02:46:40PM +0000, Trond Myklebust wrote:
>>>>> On Thu, 2019-10-10 at 15:40 +0800, Murphy Zhou wrote:
>> ...
>>>>>> @@ -3367,14 +3368,16 @@ static bool
>>>>>> nfs4_refresh_open_old_stateid(nfs4_stateid *dst,
>>>>>>  			break;
>>>>>>  		}
>>>>>>  		seqid_open = state->open_stateid.seqid;
>>>>>> -		if (read_seqretry(&state->seqlock, seq))
>>>>>> -			continue;
>>>>>>
>>>>>>  		dst_seqid = be32_to_cpu(dst->seqid);
>>>>>> -		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) >= 0)
>>>>>> +		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) > 0)
>>>>>>  			dst->seqid = cpu_to_be32(dst_seqid + 1);
>>>>>
>>>>> This negates the whole intention of the patch you reference in the
>>>>> 'Fixes:', which was to allow us to CLOSE files even if seqid bumps
>>>>> have
>>>>> been lost due to interrupted RPC calls e.g. when using 'soft' or
>>>>> 'softerr' mounts.
>>>>> With the above change, the check could just be tossed out
>>>>> altogether,
>>>>> because dst_seqid will never become larger than seqid_open.
>>>>
>>>> Hmm.. I got it wrong. Thanks for the explanation.
>>>
>>> So to be clear: I'm not saying that what you describe is not a problem.
>>> I'm just saying that the fix you propose is really no better than
>>> reverting the entire patch. I'd prefer not to do that, and would rather
>>> see us look for ways to fix both problems, but if we can't find
>>> such a fix then that would be the better solution.
>>
>> Hi Trond and Murphy Zhou,
>>
>> Sorry to resurrect this old thread, but I'm wondering if any progress was
>> made on this front.
>
> This failure stopped showing up as of the v5.6-rc1 release cycle
> in my records. Can you reproduce this on the latest upstream kernel?

I'm seeing it on generic/168 on a v5.8 client against a v5.3 knfsd server.
When I test against a v5.8 server, the test takes longer to complete and I
have yet to reproduce the livelock.

- on a v5.3 server it takes ~50 iterations to reproduce; each test
completes in ~40 seconds
- on a v5.8 server my test has run ~750 iterations without hitting the
livelock; each test takes ~60 seconds.

I suspect recent changes to the server have changed the timing of open
replies such that the problem isn't reproduced on the client.

Ben
Chuck Lever Sept. 4, 2020, 2:14 p.m. UTC | #13
> On Sep 4, 2020, at 6:55 AM, Benjamin Coddington <bcodding@redhat.com> wrote:
> 
> On 3 Sep 2020, at 23:04, Murphy Zhou wrote:
> 
>> Hi Benjamin,
>> 
>> On Thu, Sep 03, 2020 at 01:54:26PM -0400, Benjamin Coddington wrote:
>>> 
>>> On 11 Oct 2019, at 10:14, Trond Myklebust wrote:
>>>> On Fri, 2019-10-11 at 16:49 +0800, Murphy Zhou wrote:
>>>>> On Thu, Oct 10, 2019 at 02:46:40PM +0000, Trond Myklebust wrote:
>>>>>> On Thu, 2019-10-10 at 15:40 +0800, Murphy Zhou wrote:
>>> ...
>>>>>>> @@ -3367,14 +3368,16 @@ static bool
>>>>>>> nfs4_refresh_open_old_stateid(nfs4_stateid *dst,
>>>>>>> 			break;
>>>>>>> 		}
>>>>>>> 		seqid_open = state->open_stateid.seqid;
>>>>>>> -		if (read_seqretry(&state->seqlock, seq))
>>>>>>> -			continue;
>>>>>>> 
>>>>>>> 		dst_seqid = be32_to_cpu(dst->seqid);
>>>>>>> -		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) >= 0)
>>>>>>> +		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) > 0)
>>>>>>> 			dst->seqid = cpu_to_be32(dst_seqid + 1);
>>>>>> 
>>>>>> This negates the whole intention of the patch you reference in the
>>>>>> 'Fixes:', which was to allow us to CLOSE files even if seqid bumps
>>>>>> have
>>>>>> been lost due to interrupted RPC calls e.g. when using 'soft' or
>>>>>> 'softerr' mounts.
>>>>>> With the above change, the check could just be tossed out
>>>>>> altogether,
>>>>>> because dst_seqid will never become larger than seqid_open.
>>>>> 
>>>>> Hmm.. I got it wrong. Thanks for the explanation.
>>>> 
>>>> So to be clear: I'm not saying that what you describe is not a problem.
>>>> I'm just saying that the fix you propose is really no better than
>>>> reverting the entire patch. I'd prefer not to do that, and would rather
>>>> see us look for ways to fix both problems, but if we can't find
>>>> such a fix then that would be the better solution.
>>> 
>>> Hi Trond and Murphy Zhou,
>>> 
>>> Sorry to resurrect this old thread, but I'm wondering if any progress was
>>> made on this front.
>> 
>> This failure stopped showing up as of the v5.6-rc1 release cycle
>> in my records. Can you reproduce this on the latest upstream kernel?
> 
> I'm seeing it on generic/168 on a v5.8 client against a v5.3 knfsd server.
> When I test against v5.8 server, the test takes longer to complete and I
> have yet to reproduce the livelock.
> 
> - on v5.3 server takes ~50 iterations to produce, each test completes in ~40
> seconds
> - on v5.8 server my test has run ~750 iterations without getting into
> the lock, each test takes ~60 seconds.
> 
> I suspect recent changes to the server have changed the timing of open
> replies such that the problem isn't reproduced on the client.

The Linux NFS server in v5.4 does behave differently than earlier
kernels with NFSv4.0, and it is performance-related. The filecache
went into v5.4, and that seems to change the frequency at which
the server offers delegations.

I'm looking into it, and learning a bunch.


--
Chuck Lever
Olga Kornievskaia Sept. 4, 2020, 4:13 p.m. UTC | #14
On Thu, Sep 3, 2020 at 1:55 PM Benjamin Coddington <bcodding@redhat.com> wrote:
>
>
> On 11 Oct 2019, at 10:14, Trond Myklebust wrote:
> > On Fri, 2019-10-11 at 16:49 +0800, Murphy Zhou wrote:
> >> On Thu, Oct 10, 2019 at 02:46:40PM +0000, Trond Myklebust wrote:
> >>> On Thu, 2019-10-10 at 15:40 +0800, Murphy Zhou wrote:
> ...
> >>>> @@ -3367,14 +3368,16 @@ static bool
> >>>> nfs4_refresh_open_old_stateid(nfs4_stateid *dst,
> >>>>                    break;
> >>>>            }
> >>>>            seqid_open = state->open_stateid.seqid;
> >>>> -          if (read_seqretry(&state->seqlock, seq))
> >>>> -                  continue;
> >>>>
> >>>>            dst_seqid = be32_to_cpu(dst->seqid);
> >>>> -          if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) >= 0)
> >>>> +          if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) > 0)
> >>>>                    dst->seqid = cpu_to_be32(dst_seqid + 1);
> >>>
> >>> This negates the whole intention of the patch you reference in the
> >>> 'Fixes:', which was to allow us to CLOSE files even if seqid bumps
> >>> have
> >>> been lost due to interrupted RPC calls e.g. when using 'soft' or
> >>> 'softerr' mounts.
> >>> With the above change, the check could just be tossed out
> >>> altogether,
> >>> because dst_seqid will never become larger than seqid_open.
> >>
> >> Hmm.. I got it wrong. Thanks for the explanation.
> >
> > So to be clear: I'm not saying that what you describe is not a problem.
> > I'm just saying that the fix you propose is really no better than
> > reverting the entire patch. I'd prefer not to do that, and would rather
> > see us look for ways to fix both problems, but if we can't find
> > such a fix then that would be the better solution.
>
> Hi Trond and Murphy Zhou,
>
> Sorry to resurrect this old thread, but I'm wondering if any progress was
> made on this front.
>
> I'm seeing this race manifest when a process is never able to escape from the
> loop in nfs_set_open_stateid_locked() if CLOSE comes through first and
> clears out the state.  We can play bit-fiddling games to fix that, but I
> feel like we'll just end up breaking it again later with another fix.
>
> Either we should revert 0e0cb35b417f, or talk about how to fix it.  Seems
> like we should be able to put the CLOSE on the nfs4_state->waitq as well,
> and see if we can't just take that approach anytime our operations get out
> of sequence.  Do you see any problems with this approach?
>

I'm not sure reverting the patch is the solution, because I'm running
into the infinite ERR_OLD_STATEID loop on CLOSE on SLE15SP2 machines
which don't have this patch at all.

> Ben
>
Benjamin Coddington Sept. 8, 2020, 12:43 p.m. UTC | #15
On 4 Sep 2020, at 10:14, Chuck Lever wrote:

>> On Sep 4, 2020, at 6:55 AM, Benjamin Coddington <bcodding@redhat.com> 
>> wrote:
>>
>> On 3 Sep 2020, at 23:04, Murphy Zhou wrote:
>>
>>> Hi Benjamin,
>>>
>>> On Thu, Sep 03, 2020 at 01:54:26PM -0400, Benjamin Coddington wrote:
>>>>
>>>> On 11 Oct 2019, at 10:14, Trond Myklebust wrote:
>>>>> On Fri, 2019-10-11 at 16:49 +0800, Murphy Zhou wrote:
>>>>>> On Thu, Oct 10, 2019 at 02:46:40PM +0000, Trond Myklebust wrote:
>>>>>>> On Thu, 2019-10-10 at 15:40 +0800, Murphy Zhou wrote:
>>>> ...
>>>>>>>> @@ -3367,14 +3368,16 @@ static bool
>>>>>>>> nfs4_refresh_open_old_stateid(nfs4_stateid *dst,
>>>>>>>> 			break;
>>>>>>>> 		}
>>>>>>>> 		seqid_open = state->open_stateid.seqid;
>>>>>>>> -		if (read_seqretry(&state->seqlock, seq))
>>>>>>>> -			continue;
>>>>>>>>
>>>>>>>> 		dst_seqid = be32_to_cpu(dst->seqid);
>>>>>>>> -		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) >= 0)
>>>>>>>> +		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) > 0)
>>>>>>>> 			dst->seqid = cpu_to_be32(dst_seqid + 1);
>>>>>>>
>>>>>>> This negates the whole intention of the patch you reference in 
>>>>>>> the
>>>>>>> 'Fixes:', which was to allow us to CLOSE files even if seqid 
>>>>>>> bumps
>>>>>>> have
>>>>>>> been lost due to interrupted RPC calls e.g. when using 'soft' or
>>>>>>> 'softerr' mounts.
>>>>>>> With the above change, the check could just be tossed out
>>>>>>> altogether,
>>>>>>> because dst_seqid will never become larger than seqid_open.
>>>>>>
>>>>>> Hmm.. I got it wrong. Thanks for the explanation.
>>>>>
>>>>> So to be clear: I'm not saying that what you describe is not a 
>>>>> problem.
>>>>> I'm just saying that the fix you propose is really no better than
>>>>> reverting the entire patch. I'd prefer not to do that, and would 
>>>>> rather
>>>>> see us look for ways to fix both problems, but if we can't find 
>>>>> such as
>>>>> fix then that would be the better solution.
>>>>
>>>> Hi Trond and Murphy Zhou,
>>>>
>>>> Sorry to resurrect this old thread, but I'm wondering if any 
>>>> progress was
>>>> made on this front.
>>>
>>> This failure stopped showing up as of the v5.6-rc1 release cycle
>>> in my records. Can you reproduce this on the latest upstream kernel?
>>
>> I'm seeing it on generic/168 on a v5.8 client against a v5.3 knfsd 
>> server.
>> When I test against v5.8 server, the test takes longer to complete 
>> and I
>> have yet to reproduce the livelock.
>>
>> - on v5.3 server takes ~50 iterations to produce, each test completes 
>> in ~40
>> seconds
>> - on v5.8 server my test has run ~750 iterations without getting into
>> the lock, each test takes ~60 seconds.
>>
>> I suspect recent changes to the server have changed the timing of 
>> open
>> replies such that the problem isn't reproduced on the client.
>
> The Linux NFS server in v5.4 does behave differently than earlier
> kernels with NFSv4.0, and it is performance-related. The filecache
> went into v5.4, and that seems to change the frequency at which
> the server offers delegations.

Just a point of reference - finally reproduced it on a v5.8 server after
4900 runs.  This took several days, and helped to heat the basement.

Ben

Patch

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 11eafcf..6db5a09 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3334,12 +3334,13 @@  static void nfs4_sync_open_stateid(nfs4_stateid *dst,
 			break;
 		}
 		seqid_open = state->open_stateid.seqid;
-		if (read_seqretry(&state->seqlock, seq))
-			continue;
 
 		dst_seqid = be32_to_cpu(dst->seqid);
 		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) < 0)
 			dst->seqid = seqid_open;
+
+		if (read_seqretry(&state->seqlock, seq))
+			continue;
 		break;
 	}
 }
@@ -3367,14 +3368,16 @@  static bool nfs4_refresh_open_old_stateid(nfs4_stateid *dst,
 			break;
 		}
 		seqid_open = state->open_stateid.seqid;
-		if (read_seqretry(&state->seqlock, seq))
-			continue;
 
 		dst_seqid = be32_to_cpu(dst->seqid);
-		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) >= 0)
+		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) > 0)
 			dst->seqid = cpu_to_be32(dst_seqid + 1);
 		else
 			dst->seqid = seqid_open;
+
+		if (read_seqretry(&state->seqlock, seq))
+			continue;
+
 		ret = true;
 		break;
 	}