Message ID | 169963371324.5404.3057239228897633466.stgit@bazille.1015granger.net (mailing list archive)
---|---
State | New, archived
Series | NFSD DRC fixes for v6.7-rc
On Fri, 2023-11-10 at 11:28 -0500, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> The "statp + 1" pointer that is passed to nfsd_cache_update() is
> supposed to point to the start of the egress NFS Reply header. In
> fact, it does point there for AUTH_SYS and RPCSEC_GSS_KRB5 requests.
> 
> But both krb5i and krb5p add fields between the RPC header's
> accept_stat field and the start of the NFS Reply header. In those
> cases, "statp + 1" points at the extra fields instead of the Reply.
> The result is that nfsd_cache_update() caches what looks to the
> client like garbage.
> 
> A connection break can occur for a number of reasons, but the most
> common reason when using krb5i/p is a GSS sequence number window
> underrun. When an underrun is detected, the server is obliged to
> drop the RPC and the connection to force a retransmit with a fresh
> GSS sequence number. The client presents the same XID, it hits in
> the server's DRC, and the server returns the garbage cache entry.
> 
> The "statp + 1" argument has been used since the oldest changeset
> in the kernel history repo, so it has been in nfsd_dispatch()
> literally since before history began. The problem arose only when
> the server-side GSS implementation was added twenty years ago.
> 
> This particular patch applies cleanly to v6.5 and later, but needs
> some context adjustment to apply to earlier kernels. Before v5.16,
> nfsd_dispatch() does not use xdr_stream, so saving the NFS header
> pointer before calling ->pc_encode is still an appropriate fix
> but it needs to be implemented differently.
> 
> Cc: <stable@vger.kernel.org> # v5.16+
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/nfssvc.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index d6122bb2d167..60aacca2bca6 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -981,6 +981,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
>  	const struct svc_procedure *proc = rqstp->rq_procinfo;
>  	__be32 *statp = rqstp->rq_accept_statp;
>  	struct nfsd_cacherep *rp;
> +	__be32 *nfs_reply;
>  
>  	/*
>  	 * Give the xdr decoder a chance to change this if it wants
> @@ -1014,6 +1015,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
>  	if (test_bit(RQ_DROPME, &rqstp->rq_flags))
>  		goto out_update_drop;
>  
> +	nfs_reply = xdr_inline_decode(&rqstp->rq_res_stream, 0);
>  	if (!proc->pc_encode(rqstp, &rqstp->rq_res_stream))
>  		goto out_encode_err;
>  
> @@ -1023,7 +1025,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
>  	 */
>  	smp_store_release(&rqstp->rq_status_counter, rqstp->rq_status_counter + 1);
>  
> -	nfsd_cache_update(rqstp, rp, rqstp->rq_cachetype, statp + 1);
> +	nfsd_cache_update(rqstp, rp, rqstp->rq_cachetype, nfs_reply);
>  out_cached_reply:
>  	return 1;

With this patch, I'm seeing a regression in pynfs RPLY14. In the
attached capture the client sends a replay of an earlier call, and the
server responds (frame #97) with a reply that is truncated just after
the RPC accept state.
On Fri, Nov 17, 2023 at 09:57:49AM -0500, Jeff Layton wrote:
> On Fri, 2023-11-10 at 11:28 -0500, Chuck Lever wrote:
> > From: Chuck Lever <chuck.lever@oracle.com>
> > 
> > [...]
> 
> With this patch, I'm seeing a regression in pynfs RPLY14. In the
> attached capture the client sends a replay of an earlier call, and the
> server responds (frame #97) with a reply that is truncated just after
> the RPC accept state.

I've reproduced it. Looking now.
> On Nov 17, 2023, at 10:08 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
> On Fri, Nov 17, 2023 at 09:57:49AM -0500, Jeff Layton wrote:
>> [...]
>> 
>> With this patch, I'm seeing a regression in pynfs RPLY14. In the
>> attached capture the client sends a replay of an earlier call, and the
>> server responds (frame #97) with a reply that is truncated just after
>> the RPC accept state.
> 
> I've reproduced it. Looking now.

One line fix was squashed into "NFSD: Fix "start of NFS reply"
pointer passed to nfsd_cache_update()". The new series is in
the nfsd-fixes branch of my repo on kernel.org.

-- 
Chuck Lever
On Fri, 2023-11-17 at 18:58 +0000, Chuck Lever III wrote:
> > [...]
> > 
> > I've reproduced it. Looking now.
> 
> One line fix was squashed into "NFSD: Fix "start of NFS reply"
> pointer passed to nfsd_cache_update()". The new series is in
> the nfsd-fixes branch of my repo on kernel.org.

LGTM.
You can add this to the pile:

Tested-by: Jeff Layton <jlayton@kernel.org>
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index d6122bb2d167..60aacca2bca6 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -981,6 +981,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
 	const struct svc_procedure *proc = rqstp->rq_procinfo;
 	__be32 *statp = rqstp->rq_accept_statp;
 	struct nfsd_cacherep *rp;
+	__be32 *nfs_reply;
 
 	/*
 	 * Give the xdr decoder a chance to change this if it wants
@@ -1014,6 +1015,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
 	if (test_bit(RQ_DROPME, &rqstp->rq_flags))
 		goto out_update_drop;
 
+	nfs_reply = xdr_inline_decode(&rqstp->rq_res_stream, 0);
 	if (!proc->pc_encode(rqstp, &rqstp->rq_res_stream))
 		goto out_encode_err;
 
@@ -1023,7 +1025,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
 	 */
 	smp_store_release(&rqstp->rq_status_counter, rqstp->rq_status_counter + 1);
 
-	nfsd_cache_update(rqstp, rp, rqstp->rq_cachetype, statp + 1);
+	nfsd_cache_update(rqstp, rp, rqstp->rq_cachetype, nfs_reply);
 out_cached_reply:
 	return 1;