
[v2,1/3] NFSD: Fix "start of NFS reply" pointer passed to nfsd_cache_update()

Message ID 169963371324.5404.3057239228897633466.stgit@bazille.1015granger.net (mailing list archive)
State New, archived
Series NFSD DRC fixes for v6.7-rc

Commit Message

Chuck Lever Nov. 10, 2023, 4:28 p.m. UTC
From: Chuck Lever <chuck.lever@oracle.com>

The "statp + 1" pointer that is passed to nfsd_cache_update() is
supposed to point to the start of the egress NFS Reply header. In
fact, it does point there for AUTH_SYS and RPCSEC_GSS_KRB5 requests.

But both krb5i and krb5p add fields between the RPC header's
accept_stat field and the start of the NFS Reply header. In those
cases, "statp + 1" points at the extra fields instead of the Reply.
The result is that nfsd_cache_update() caches what looks to the
client like garbage.
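
Roughly, per RFC 2203, the reply layouts compare like this (an
illustrative sketch only; field names and widths are approximate):

    AUTH_SYS / krb5 (authentication only):
        ... | verf | accept_stat | NFS Reply ...
                                 ^ statp + 1 (correct)

    krb5i (rpc_gss_integ_data):
        ... | verf | accept_stat | integ len | GSS seq num | NFS Reply | MIC
                                 ^ statp + 1 (two XDR words early)

    krb5p (rpc_gss_priv_data):
        ... | verf | accept_stat | priv len | encrypted { seq num, NFS Reply }
                                 ^ statp + 1 (points at the wrap, not the Reply)

Since nfsd_cache_update() copies the cached reply starting at the
pointer it is given, a pointer into the GSS wrap fields yields a cache
entry that replays as garbage.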

A connection break can occur for a number of reasons, but the most
common reason when using krb5i/p is a GSS sequence number window
underrun. When an underrun is detected, the server is obliged to
drop the RPC and the connection to force a retransmit with a fresh
GSS sequence number. The client presents the same XID, it hits in
the server's DRC, and the server returns the garbage cache entry.

The "statp + 1" argument has been used since the oldest changeset
in the kernel history repo, so it has been in nfsd_dispatch()
literally since before history began. The problem arose only when
the server-side GSS implementation was added twenty years ago.

This particular patch applies cleanly to v6.5 and later, but needs
some context adjustment to apply to earlier kernels. Before v5.16,
nfsd_dispatch() does not use xdr_stream, so saving the NFS header
pointer before calling ->pc_encode is still an appropriate fix, but
it needs to be implemented differently.
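
On v5.16 and later, xdr_inline_decode(&rqstp->rq_res_stream, 0)
effectively returns the stream's current position without consuming
anything; that position is where ->pc_encode will place the NFS Reply
header. For older kernels, a minimal (untested) sketch of an
equivalent approach, with approximate names, would be to capture the
reply start from the head kvec of rqstp->rq_res before the procedure
handler and encoder append to it:

	/* Untested sketch for pre-v5.16 kernels only */
	struct kvec *resv = &rqstp->rq_res.head[0];
	__be32 *nfs_reply = resv->iov_base + resv->iov_len;

	/* ... ->pc_func() and ->pc_encode() fill in the Reply ... */

	nfsd_cache_update(rqstp, rqstp->rq_cachetype, nfs_reply);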

Cc: <stable@vger.kernel.org> # v5.16+
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfssvc.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Jeff Layton Nov. 17, 2023, 2:57 p.m. UTC | #1
On Fri, 2023-11-10 at 11:28 -0500, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> The "statp + 1" pointer that is passed to nfsd_cache_update() is
> supposed to point to the start of the egress NFS Reply header. In
> fact, it does point there for AUTH_SYS and RPCSEC_GSS_KRB5 requests.
> 
> But both krb5i and krb5p add fields between the RPC header's
> accept_stat field and the start of the NFS Reply header. In those
> cases, "statp + 1" points at the extra fields instead of the Reply.
> The result is that nfsd_cache_update() caches what looks to the
> client like garbage.
> 
> A connection break can occur for a number of reasons, but the most
> common reason when using krb5i/p is a GSS sequence number window
> underrun. When an underrun is detected, the server is obliged to
> drop the RPC and the connection to force a retransmit with a fresh
> GSS sequence number. The client presents the same XID, it hits in
> the server's DRC, and the server returns the garbage cache entry.
> 
> The "statp + 1" argument has been used since the oldest changeset
> in the kernel history repo, so it has been in nfsd_dispatch()
> literally since before history began. The problem arose only when
> the server-side GSS implementation was added twenty years ago.
> 
> This particular patch applies cleanly to v6.5 and later, but needs
> some context adjustment to apply to earlier kernels. Before v5.16,
> nfsd_dispatch() does not use xdr_stream, so saving the NFS header
> pointer before calling ->pc_encode is still an appropriate fix
> but it needs to be implemented differently.
> 
> Cc: <stable@vger.kernel.org> # v5.16+
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/nfssvc.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index d6122bb2d167..60aacca2bca6 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -981,6 +981,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
>  	const struct svc_procedure *proc = rqstp->rq_procinfo;
>  	__be32 *statp = rqstp->rq_accept_statp;
>  	struct nfsd_cacherep *rp;
> +	__be32 *nfs_reply;
>  
>  	/*
>  	 * Give the xdr decoder a chance to change this if it wants
> @@ -1014,6 +1015,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
>  	if (test_bit(RQ_DROPME, &rqstp->rq_flags))
>  		goto out_update_drop;
>  
> +	nfs_reply = xdr_inline_decode(&rqstp->rq_res_stream, 0);
>  	if (!proc->pc_encode(rqstp, &rqstp->rq_res_stream))
>  		goto out_encode_err;
>  
> @@ -1023,7 +1025,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
>  	 */
>  	smp_store_release(&rqstp->rq_status_counter, rqstp->rq_status_counter + 1);
>  
> -	nfsd_cache_update(rqstp, rp, rqstp->rq_cachetype, statp + 1);
> +	nfsd_cache_update(rqstp, rp, rqstp->rq_cachetype, nfs_reply);
>  out_cached_reply:
>  	return 1;
>  
> 
> 

With this patch, I'm seeing a regression in pynfs RPLY14. In the
attached capture the client sends a replay of an earlier call, and the
server responds (frame #97) with a reply that is truncated just after
the RPC accept state.
Chuck Lever Nov. 17, 2023, 3:08 p.m. UTC | #2
On Fri, Nov 17, 2023 at 09:57:49AM -0500, Jeff Layton wrote:
> On Fri, 2023-11-10 at 11:28 -0500, Chuck Lever wrote:
> > From: Chuck Lever <chuck.lever@oracle.com>
> > 
> > The "statp + 1" pointer that is passed to nfsd_cache_update() is
> > supposed to point to the start of the egress NFS Reply header. In
> > fact, it does point there for AUTH_SYS and RPCSEC_GSS_KRB5 requests.
> > 
> > But both krb5i and krb5p add fields between the RPC header's
> > accept_stat field and the start of the NFS Reply header. In those
> > cases, "statp + 1" points at the extra fields instead of the Reply.
> > The result is that nfsd_cache_update() caches what looks to the
> > client like garbage.
> > 
> > A connection break can occur for a number of reasons, but the most
> > common reason when using krb5i/p is a GSS sequence number window
> > underrun. When an underrun is detected, the server is obliged to
> > drop the RPC and the connection to force a retransmit with a fresh
> > GSS sequence number. The client presents the same XID, it hits in
> > the server's DRC, and the server returns the garbage cache entry.
> > 
> > The "statp + 1" argument has been used since the oldest changeset
> > in the kernel history repo, so it has been in nfsd_dispatch()
> > literally since before history began. The problem arose only when
> > the server-side GSS implementation was added twenty years ago.
> > 
> > This particular patch applies cleanly to v6.5 and later, but needs
> > some context adjustment to apply to earlier kernels. Before v5.16,
> > nfsd_dispatch() does not use xdr_stream, so saving the NFS header
> > pointer before calling ->pc_encode is still an appropriate fix
> > but it needs to be implemented differently.
> > 
> > Cc: <stable@vger.kernel.org> # v5.16+
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > ---
> >  fs/nfsd/nfssvc.c |    4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > index d6122bb2d167..60aacca2bca6 100644
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -981,6 +981,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
> >  	const struct svc_procedure *proc = rqstp->rq_procinfo;
> >  	__be32 *statp = rqstp->rq_accept_statp;
> >  	struct nfsd_cacherep *rp;
> > +	__be32 *nfs_reply;
> >  
> >  	/*
> >  	 * Give the xdr decoder a chance to change this if it wants
> > @@ -1014,6 +1015,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
> >  	if (test_bit(RQ_DROPME, &rqstp->rq_flags))
> >  		goto out_update_drop;
> >  
> > +	nfs_reply = xdr_inline_decode(&rqstp->rq_res_stream, 0);
> >  	if (!proc->pc_encode(rqstp, &rqstp->rq_res_stream))
> >  		goto out_encode_err;
> >  
> > @@ -1023,7 +1025,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
> >  	 */
> >  	smp_store_release(&rqstp->rq_status_counter, rqstp->rq_status_counter + 1);
> >  
> > -	nfsd_cache_update(rqstp, rp, rqstp->rq_cachetype, statp + 1);
> > +	nfsd_cache_update(rqstp, rp, rqstp->rq_cachetype, nfs_reply);
> >  out_cached_reply:
> >  	return 1;
> >  
> > 
> > 
> 
> With this patch, I'm seeing a regression in pynfs RPLY14. In the
> attached capture the client sends a replay of an earlier call, and the
> server responds (frame #97) with a reply that is truncated just after
> the RPC accept state.

I've reproduced it. Looking now.
Chuck Lever Nov. 17, 2023, 6:58 p.m. UTC | #3
> On Nov 17, 2023, at 10:08 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
> On Fri, Nov 17, 2023 at 09:57:49AM -0500, Jeff Layton wrote:
>> On Fri, 2023-11-10 at 11:28 -0500, Chuck Lever wrote:
>>> From: Chuck Lever <chuck.lever@oracle.com>
>>> 
>>> The "statp + 1" pointer that is passed to nfsd_cache_update() is
>>> supposed to point to the start of the egress NFS Reply header. In
>>> fact, it does point there for AUTH_SYS and RPCSEC_GSS_KRB5 requests.
>>> 
>>> But both krb5i and krb5p add fields between the RPC header's
>>> accept_stat field and the start of the NFS Reply header. In those
>>> cases, "statp + 1" points at the extra fields instead of the Reply.
>>> The result is that nfsd_cache_update() caches what looks to the
>>> client like garbage.
>>> 
>>> A connection break can occur for a number of reasons, but the most
>>> common reason when using krb5i/p is a GSS sequence number window
>>> underrun. When an underrun is detected, the server is obliged to
>>> drop the RPC and the connection to force a retransmit with a fresh
>>> GSS sequence number. The client presents the same XID, it hits in
>>> the server's DRC, and the server returns the garbage cache entry.
>>> 
>>> The "statp + 1" argument has been used since the oldest changeset
>>> in the kernel history repo, so it has been in nfsd_dispatch()
>>> literally since before history began. The problem arose only when
>>> the server-side GSS implementation was added twenty years ago.
>>> 
>>> This particular patch applies cleanly to v6.5 and later, but needs
>>> some context adjustment to apply to earlier kernels. Before v5.16,
>>> nfsd_dispatch() does not use xdr_stream, so saving the NFS header
>>> pointer before calling ->pc_encode is still an appropriate fix
>>> but it needs to be implemented differently.
>>> 
>>> Cc: <stable@vger.kernel.org> # v5.16+
>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>>> fs/nfsd/nfssvc.c |    4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
>>> index d6122bb2d167..60aacca2bca6 100644
>>> --- a/fs/nfsd/nfssvc.c
>>> +++ b/fs/nfsd/nfssvc.c
>>> @@ -981,6 +981,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
>>> const struct svc_procedure *proc = rqstp->rq_procinfo;
>>> __be32 *statp = rqstp->rq_accept_statp;
>>> struct nfsd_cacherep *rp;
>>> + __be32 *nfs_reply;
>>> 
>>> /*
>>>  * Give the xdr decoder a chance to change this if it wants
>>> @@ -1014,6 +1015,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
>>> if (test_bit(RQ_DROPME, &rqstp->rq_flags))
>>> goto out_update_drop;
>>> 
>>> + nfs_reply = xdr_inline_decode(&rqstp->rq_res_stream, 0);
>>> if (!proc->pc_encode(rqstp, &rqstp->rq_res_stream))
>>> goto out_encode_err;
>>> 
>>> @@ -1023,7 +1025,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
>>>  */
>>> smp_store_release(&rqstp->rq_status_counter, rqstp->rq_status_counter + 1);
>>> 
>>> - nfsd_cache_update(rqstp, rp, rqstp->rq_cachetype, statp + 1);
>>> + nfsd_cache_update(rqstp, rp, rqstp->rq_cachetype, nfs_reply);
>>> out_cached_reply:
>>> return 1;
>>> 
>>> 
>>> 
>> 
>> With this patch, I'm seeing a regression in pynfs RPLY14. In the
>> attached capture the client sends a replay of an earlier call, and the
>> server responds (frame #97) with a reply that is truncated just after
>> the RPC accept state.
> 
> I've reproduced it. Looking now.

One line fix was squashed into "NFSD: Fix "start of NFS reply"
pointer passed to nfsd_cache_update()". The new series is in
the nfsd-fixes branch of my repo on kernel.org.


--
Chuck Lever
Jeff Layton Nov. 17, 2023, 8:11 p.m. UTC | #4
On Fri, 2023-11-17 at 18:58 +0000, Chuck Lever III wrote:
> 
> > On Nov 17, 2023, at 10:08 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> > 
> > On Fri, Nov 17, 2023 at 09:57:49AM -0500, Jeff Layton wrote:
> > > On Fri, 2023-11-10 at 11:28 -0500, Chuck Lever wrote:
> > > > From: Chuck Lever <chuck.lever@oracle.com>
> > > > 
> > > > The "statp + 1" pointer that is passed to nfsd_cache_update() is
> > > > supposed to point to the start of the egress NFS Reply header. In
> > > > fact, it does point there for AUTH_SYS and RPCSEC_GSS_KRB5 requests.
> > > > 
> > > > But both krb5i and krb5p add fields between the RPC header's
> > > > accept_stat field and the start of the NFS Reply header. In those
> > > > cases, "statp + 1" points at the extra fields instead of the Reply.
> > > > The result is that nfsd_cache_update() caches what looks to the
> > > > client like garbage.
> > > > 
> > > > A connection break can occur for a number of reasons, but the most
> > > > common reason when using krb5i/p is a GSS sequence number window
> > > > underrun. When an underrun is detected, the server is obliged to
> > > > drop the RPC and the connection to force a retransmit with a fresh
> > > > GSS sequence number. The client presents the same XID, it hits in
> > > > the server's DRC, and the server returns the garbage cache entry.
> > > > 
> > > > The "statp + 1" argument has been used since the oldest changeset
> > > > in the kernel history repo, so it has been in nfsd_dispatch()
> > > > literally since before history began. The problem arose only when
> > > > the server-side GSS implementation was added twenty years ago.
> > > > 
> > > > This particular patch applies cleanly to v6.5 and later, but needs
> > > > some context adjustment to apply to earlier kernels. Before v5.16,
> > > > nfsd_dispatch() does not use xdr_stream, so saving the NFS header
> > > > pointer before calling ->pc_encode is still an appropriate fix
> > > > but it needs to be implemented differently.
> > > > 
> > > > Cc: <stable@vger.kernel.org> # v5.16+
> > > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > > > ---
> > > > fs/nfsd/nfssvc.c |    4 +++-
> > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > > > index d6122bb2d167..60aacca2bca6 100644
> > > > --- a/fs/nfsd/nfssvc.c
> > > > +++ b/fs/nfsd/nfssvc.c
> > > > @@ -981,6 +981,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
> > > > const struct svc_procedure *proc = rqstp->rq_procinfo;
> > > > __be32 *statp = rqstp->rq_accept_statp;
> > > > struct nfsd_cacherep *rp;
> > > > + __be32 *nfs_reply;
> > > > 
> > > > /*
> > > >  * Give the xdr decoder a chance to change this if it wants
> > > > @@ -1014,6 +1015,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
> > > > if (test_bit(RQ_DROPME, &rqstp->rq_flags))
> > > > goto out_update_drop;
> > > > 
> > > > + nfs_reply = xdr_inline_decode(&rqstp->rq_res_stream, 0);
> > > > if (!proc->pc_encode(rqstp, &rqstp->rq_res_stream))
> > > > goto out_encode_err;
> > > > 
> > > > @@ -1023,7 +1025,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
> > > >  */
> > > > smp_store_release(&rqstp->rq_status_counter, rqstp->rq_status_counter + 1);
> > > > 
> > > > - nfsd_cache_update(rqstp, rp, rqstp->rq_cachetype, statp + 1);
> > > > + nfsd_cache_update(rqstp, rp, rqstp->rq_cachetype, nfs_reply);
> > > > out_cached_reply:
> > > > return 1;
> > > > 
> > > > 
> > > > 
> > > 
> > > With this patch, I'm seeing a regression in pynfs RPLY14. In the
> > > attached capture the client sends a replay of an earlier call, and the
> > > server responds (frame #97) with a reply that is truncated just after
> > > the RPC accept state.
> > 
> > I've reproduced it. Looking now.
> 
> One line fix was squashed into "NFSD: Fix "start of NFS reply"
> pointer passed to nfsd_cache_update()". The new series is in
> the nfsd-fixes branch of my repo on kernel.org.
> 

LGTM. You can add this to the pile:

Tested-by: Jeff Layton <jlayton@kernel.org>

Patch

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index d6122bb2d167..60aacca2bca6 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -981,6 +981,7 @@  int nfsd_dispatch(struct svc_rqst *rqstp)
 	const struct svc_procedure *proc = rqstp->rq_procinfo;
 	__be32 *statp = rqstp->rq_accept_statp;
 	struct nfsd_cacherep *rp;
+	__be32 *nfs_reply;
 
 	/*
 	 * Give the xdr decoder a chance to change this if it wants
@@ -1014,6 +1015,7 @@  int nfsd_dispatch(struct svc_rqst *rqstp)
 	if (test_bit(RQ_DROPME, &rqstp->rq_flags))
 		goto out_update_drop;
 
+	nfs_reply = xdr_inline_decode(&rqstp->rq_res_stream, 0);
 	if (!proc->pc_encode(rqstp, &rqstp->rq_res_stream))
 		goto out_encode_err;
 
@@ -1023,7 +1025,7 @@  int nfsd_dispatch(struct svc_rqst *rqstp)
 	 */
 	smp_store_release(&rqstp->rq_status_counter, rqstp->rq_status_counter + 1);
 
-	nfsd_cache_update(rqstp, rp, rqstp->rq_cachetype, statp + 1);
+	nfsd_cache_update(rqstp, rp, rqstp->rq_cachetype, nfs_reply);
 out_cached_reply:
 	return 1;