diff mbox series

[7/8] nfsd: clean up and amend comments around nfsd4_cb_sequence_done()

Message ID 20250123-nfsd-6-14-v1-7-c1137a4fa2ae@kernel.org (mailing list archive)
State Not Applicable
Headers show
Series nfsd: CB_SEQUENCE error handling fixes and cleanups | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Jeff Layton Jan. 23, 2025, 8:25 p.m. UTC
Add a new kerneldoc header, and clean up the comments a bit.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfs4callback.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

Comments

Chuck Lever Jan. 24, 2025, 2:43 p.m. UTC | #1
On 1/23/25 3:25 PM, Jeff Layton wrote:
> Add a new kerneldoc header, and clean up the comments a bit.

Usually I'm in favor of kdoc headers, but here, it's a static function
whose address is not shared outside of this source file. The only
documentation need is the meaning of the return code, IMO.


> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>   fs/nfsd/nfs4callback.c | 26 ++++++++++++++++++++------
>   1 file changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 6e0561f3b21bd850b0387b5af7084eb05e818231..415fc8aae0f47c36f00b2384805c7a996fb1feb0 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -1325,6 +1325,17 @@ static void nfsd4_cb_prepare(struct rpc_task *task, void *calldata)
>   	rpc_call_start(task);
>   }
>   
> +/**
> + * nfsd4_cb_sequence_done - process the result of a CB_SEQUENCE
> + * @task: rpc_task
> + * @cb: nfsd4_callback for this call
> + *
> + * For minorversion 0, there is no CB_SEQUENCE. Only restart the call
> + * if the callback RPC client was killed. For v4.1+ the error handling
> + * is more sophisticated.

It would be much clearer to pull the 4.0 error handling out of this
function, which is named "cb_/sequence/_done".

Perhaps the need_restart label can be hoisted into nfsd4_cb_done() ?


> + *
> + * Returns true if reply processing should continue.
> + */
>   static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback *cb)
>   {
>   	struct nfs4_client *clp = cb->cb_clp;
> @@ -1334,11 +1345,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>   	if (!clp->cl_minorversion) {
>   		/*
>   		 * If the backchannel connection was shut down while this
> -		 * task was queued, we need to resubmit it after setting up
> -		 * a new backchannel connection.
> +		 * task was queued, resubmit it after setting up a new
> +		 * backchannel connection.
>   		 *
> -		 * Note that if we lost our callback connection permanently
> -		 * the submission code will error out, so we don't need to
> +		 * Note that if the callback connection is permanently lost,
> +		 * the submission code will error out. There is no need to
>   		 * handle that case here.
>   		 */
>   		if (RPC_SIGNALLED(task))
> @@ -1355,8 +1366,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>   	switch (cb->cb_seq_status) {
>   	case 0:
>   		/*
> -		 * No need for lock, access serialized in nfsd4_cb_prepare
> -		 *
>   		 * RFC5661 20.9.3
>   		 * If CB_SEQUENCE returns an error, then the state of the slot
>   		 * (sequence ID, cached reply) MUST NOT change.
> @@ -1365,6 +1374,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>   		ret = true;
>   		break;
>   	case -ESERVERFAULT:
> +		/*
> +		 * Client returned NFS4_OK, but decoding failed. Mark the
> +		 * backchannel as faulty, but don't retransmit since the
> +		 * call was successful.
> +		 */
>   		++session->se_cb_seq_nr[cb->cb_held_slot];
>   		nfsd4_mark_cb_fault(cb->cb_clp);
>   		break;

This old code abuses the meaning of ESERVERFAULT IMO. NFS4ERR_BADXDR is
a better choice. But why call mark_cb_fault in this case?

Maybe split this clean-up into a separate patch.
Jeff Layton Jan. 24, 2025, 2:50 p.m. UTC | #2
On Fri, 2025-01-24 at 09:43 -0500, Chuck Lever wrote:
> On 1/23/25 3:25 PM, Jeff Layton wrote:
> > Add a new kerneldoc header, and clean up the comments a bit.
> 
> Usually I'm in favor of kdoc headers, but here, it's a static function
> whose address is not shared outside of this source file. The only
> documentation need is the meaning of the return code, IMO.
> 

If you like. I figured it wouldn't hurt to do a full kdoc comment.

> 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >   fs/nfsd/nfs4callback.c | 26 ++++++++++++++++++++------
> >   1 file changed, 20 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index 6e0561f3b21bd850b0387b5af7084eb05e818231..415fc8aae0f47c36f00b2384805c7a996fb1feb0 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
> > @@ -1325,6 +1325,17 @@ static void nfsd4_cb_prepare(struct rpc_task *task, void *calldata)
> >   	rpc_call_start(task);
> >   }
> >   
> > +/**
> > + * nfsd4_cb_sequence_done - process the result of a CB_SEQUENCE
> > + * @task: rpc_task
> > + * @cb: nfsd4_callback for this call
> > + *
> > + * For minorversion 0, there is no CB_SEQUENCE. Only restart the call
> > + * if the callback RPC client was killed. For v4.1+ the error handling
> > + * is more sophisticated.
> 
> It would be much clearer to pull the 4.0 error handling out of this
> function, which is named "cb_/sequence/_done".
> 
> Perhaps the need_restart label can be hoisted into nfsd4_cb_done() ?
> 

If we do that then we'll need to change this function to return
something other than a bool, and that's a larger change than I wanted
to make here. I really wanted to keep these as small, targeted patches
that can be backported easily.

I wouldn't object to further cleanup here on top of that though.


> 
> > + *
> > + * Returns true if reply processing should continue.
> > + */
> >   static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback *cb)
> >   {
> >   	struct nfs4_client *clp = cb->cb_clp;
> > @@ -1334,11 +1345,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> >   	if (!clp->cl_minorversion) {
> >   		/*
> >   		 * If the backchannel connection was shut down while this
> > -		 * task was queued, we need to resubmit it after setting up
> > -		 * a new backchannel connection.
> > +		 * task was queued, resubmit it after setting up a new
> > +		 * backchannel connection.
> >   		 *
> > -		 * Note that if we lost our callback connection permanently
> > -		 * the submission code will error out, so we don't need to
> > +		 * Note that if the callback connection is permanently lost,
> > +		 * the submission code will error out. There is no need to
> >   		 * handle that case here.
> >   		 */
> >   		if (RPC_SIGNALLED(task))
> > @@ -1355,8 +1366,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> >   	switch (cb->cb_seq_status) {
> >   	case 0:
> >   		/*
> > -		 * No need for lock, access serialized in nfsd4_cb_prepare
> > -		 *
> >   		 * RFC5661 20.9.3
> >   		 * If CB_SEQUENCE returns an error, then the state of the slot
> >   		 * (sequence ID, cached reply) MUST NOT change.
> > @@ -1365,6 +1374,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> >   		ret = true;
> >   		break;
> >   	case -ESERVERFAULT:
> > +		/*
> > +		 * Client returned NFS4_OK, but decoding failed. Mark the
> > +		 * backchannel as faulty, but don't retransmit since the
> > +		 * call was successful.
> > +		 */
> >   		++session->se_cb_seq_nr[cb->cb_held_slot];
> >   		nfsd4_mark_cb_fault(cb->cb_clp);
> >   		break;
> 
> This old code abuses the meaning of ESERVERFAULT IMO. NFS4ERR_BADXDR is
> a better choice. But why call mark_cb_fault in this case?
> 
> Maybe split this clean-up into a separate patch.
> 
> 

I'm only altering comments in this patch. Do you really want separate
patches for the different comments?
Chuck Lever Jan. 24, 2025, 3:05 p.m. UTC | #3
On 1/24/25 9:50 AM, Jeff Layton wrote:
> On Fri, 2025-01-24 at 09:43 -0500, Chuck Lever wrote:
>> On 1/23/25 3:25 PM, Jeff Layton wrote:
>>> Add a new kerneldoc header, and clean up the comments a bit.
>>
>> Usually I'm in favor of kdoc headers, but here, it's a static function
>> whose address is not shared outside of this source file. The only
>> documentation need is the meaning of the return code, IMO.
>>
> 
> If you like. I figured it wouldn't hurt to do a full kdoc comment.

Kdoc comments are pretty noisy. This one doesn't seem to me to add
much real value -- its callers are all right here in the same file.


>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>> ---
>>>    fs/nfsd/nfs4callback.c | 26 ++++++++++++++++++++------
>>>    1 file changed, 20 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>> index 6e0561f3b21bd850b0387b5af7084eb05e818231..415fc8aae0f47c36f00b2384805c7a996fb1feb0 100644
>>> --- a/fs/nfsd/nfs4callback.c
>>> +++ b/fs/nfsd/nfs4callback.c
>>> @@ -1325,6 +1325,17 @@ static void nfsd4_cb_prepare(struct rpc_task *task, void *calldata)
>>>    	rpc_call_start(task);
>>>    }
>>>    
>>> +/**
>>> + * nfsd4_cb_sequence_done - process the result of a CB_SEQUENCE
>>> + * @task: rpc_task
>>> + * @cb: nfsd4_callback for this call
>>> + *
>>> + * For minorversion 0, there is no CB_SEQUENCE. Only restart the call
>>> + * if the callback RPC client was killed. For v4.1+ the error handling
>>> + * is more sophisticated.
>>
>> It would be much clearer to pull the 4.0 error handling out of this
>> function, which is named "cb_/sequence/_done".
>>
>> Perhaps the need_restart label can be hoisted into nfsd4_cb_done() ?
>>
> 
> If we do that then we'll need to change this function to return
> something other than a bool, and that's a larger change than I wanted
> to make here. I really wanted to keep these as small, targeted patches
> that can be backported easily.
> 
> I wouldn't object to further cleanup here on top of that though.

There's no reason to document the 4.0 logic if it's about to be moved
out. I strongly prefer making the code more self-documenting. Adding
a comment here about 4.0 then adding a patch on top moving the code
somewhere else seems silly to me.


>>> + *
>>> + * Returns true if reply processing should continue.
>>> + */
>>>    static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback *cb)
>>>    {
>>>    	struct nfs4_client *clp = cb->cb_clp;
>>> @@ -1334,11 +1345,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>    	if (!clp->cl_minorversion) {
>>>    		/*
>>>    		 * If the backchannel connection was shut down while this
>>> -		 * task was queued, we need to resubmit it after setting up
>>> -		 * a new backchannel connection.
>>> +		 * task was queued, resubmit it after setting up a new
>>> +		 * backchannel connection.
>>>    		 *
>>> -		 * Note that if we lost our callback connection permanently
>>> -		 * the submission code will error out, so we don't need to
>>> +		 * Note that if the callback connection is permanently lost,
>>> +		 * the submission code will error out. There is no need to
>>>    		 * handle that case here.
>>>    		 */
>>>    		if (RPC_SIGNALLED(task))
>>> @@ -1355,8 +1366,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>    	switch (cb->cb_seq_status) {
>>>    	case 0:
>>>    		/*
>>> -		 * No need for lock, access serialized in nfsd4_cb_prepare
>>> -		 *
>>>    		 * RFC5661 20.9.3
>>>    		 * If CB_SEQUENCE returns an error, then the state of the slot
>>>    		 * (sequence ID, cached reply) MUST NOT change.
>>> @@ -1365,6 +1374,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>    		ret = true;
>>>    		break;
>>>    	case -ESERVERFAULT:
>>> +		/*
>>> +		 * Client returned NFS4_OK, but decoding failed. Mark the
>>> +		 * backchannel as faulty, but don't retransmit since the
>>> +		 * call was successful.
>>> +		 */
>>>    		++session->se_cb_seq_nr[cb->cb_held_slot];
>>>    		nfsd4_mark_cb_fault(cb->cb_clp);
>>>    		break;
>>
>> This old code abuses the meaning of ESERVERFAULT IMO. NFS4ERR_BADXDR is
>> a better choice. But why call mark_cb_fault in this case?
>>
>> Maybe split this clean-up into a separate patch.
>>
>>
> 
> I'm only altering comments in this patch. Do you really want separate
> patches for the different comments?

Why call mark_cb_fault here? If NFSD retransmits this operation on a
fresh session/transport it will just fail to decode the reply again.

Do we believe that the decoding failure means there was a transport
problem of some kind?

It's clear we do not understand this code well enough to update the
existing comment, so my review comment above suggests a broader code
change is necessary.
Jeff Layton Jan. 24, 2025, 3:31 p.m. UTC | #4
On Fri, 2025-01-24 at 10:05 -0500, Chuck Lever wrote:
> On 1/24/25 9:50 AM, Jeff Layton wrote:
> > On Fri, 2025-01-24 at 09:43 -0500, Chuck Lever wrote:
> > > On 1/23/25 3:25 PM, Jeff Layton wrote:
> > > > Add a new kerneldoc header, and clean up the comments a bit.
> > > 
> > > Usually I'm in favor of kdoc headers, but here, it's a static function
> > > whose address is not shared outside of this source file. The only
> > > documentation need is the meaning of the return code, IMO.
> > > 
> > 
> > If you like. I figured it wouldn't hurt to do a full kdoc comment.
> 
> Kdoc comments are pretty noisy. This one doesn't seem to me to add
> much real value -- its callers are all right here in the same file.
> 
> 
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > >    fs/nfsd/nfs4callback.c | 26 ++++++++++++++++++++------
> > > >    1 file changed, 20 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > index 6e0561f3b21bd850b0387b5af7084eb05e818231..415fc8aae0f47c36f00b2384805c7a996fb1feb0 100644
> > > > --- a/fs/nfsd/nfs4callback.c
> > > > +++ b/fs/nfsd/nfs4callback.c
> > > > @@ -1325,6 +1325,17 @@ static void nfsd4_cb_prepare(struct rpc_task *task, void *calldata)
> > > >    	rpc_call_start(task);
> > > >    }
> > > >    
> > > > +/**
> > > > + * nfsd4_cb_sequence_done - process the result of a CB_SEQUENCE
> > > > + * @task: rpc_task
> > > > + * @cb: nfsd4_callback for this call
> > > > + *
> > > > + * For minorversion 0, there is no CB_SEQUENCE. Only restart the call
> > > > + * if the callback RPC client was killed. For v4.1+ the error handling
> > > > + * is more sophisticated.
> > > 
> > > It would be much clearer to pull the 4.0 error handling out of this
> > > function, which is named "cb_/sequence/_done".
> > > 
> > > Perhaps the need_restart label can be hoisted into nfsd4_cb_done() ?
> > > 
> > 
> > If we do that then we'll need to change this function to return
> > something other than a bool, and that's a larger change than I wanted
> > to make here. I really wanted to keep these as small, targeted patches
> > that can be backported easily.
> > 
> > I wouldn't object to further cleanup here on top of that though.
> 
> There's no reason to document the 4.0 logic if it's about to be moved
> out. I strongly prefer making the code more self-documenting. Adding
> a comment here about 4.0 then adding a patch on top moving the code
> somewhere else seems silly to me.
> 

Ok.

> 
> > > > + *
> > > > + * Returns true if reply processing should continue.
> > > > + */
> > > >    static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback *cb)
> > > >    {
> > > >    	struct nfs4_client *clp = cb->cb_clp;
> > > > @@ -1334,11 +1345,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> > > >    	if (!clp->cl_minorversion) {
> > > >    		/*
> > > >    		 * If the backchannel connection was shut down while this
> > > > -		 * task was queued, we need to resubmit it after setting up
> > > > -		 * a new backchannel connection.
> > > > +		 * task was queued, resubmit it after setting up a new
> > > > +		 * backchannel connection.
> > > >    		 *
> > > > -		 * Note that if we lost our callback connection permanently
> > > > -		 * the submission code will error out, so we don't need to
> > > > +		 * Note that if the callback connection is permanently lost,
> > > > +		 * the submission code will error out. There is no need to
> > > >    		 * handle that case here.
> > > >    		 */
> > > >    		if (RPC_SIGNALLED(task))
> > > > @@ -1355,8 +1366,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> > > >    	switch (cb->cb_seq_status) {
> > > >    	case 0:
> > > >    		/*
> > > > -		 * No need for lock, access serialized in nfsd4_cb_prepare
> > > > -		 *
> > > >    		 * RFC5661 20.9.3
> > > >    		 * If CB_SEQUENCE returns an error, then the state of the slot
> > > >    		 * (sequence ID, cached reply) MUST NOT change.
> > > > @@ -1365,6 +1374,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> > > >    		ret = true;
> > > >    		break;
> > > >    	case -ESERVERFAULT:
> > > > +		/*
> > > > +		 * Client returned NFS4_OK, but decoding failed. Mark the
> > > > +		 * backchannel as faulty, but don't retransmit since the
> > > > +		 * call was successful.
> > > > +		 */
> > > >    		++session->se_cb_seq_nr[cb->cb_held_slot];
> > > >    		nfsd4_mark_cb_fault(cb->cb_clp);
> > > >    		break;
> > > 
> > > This old code abuses the meaning of ESERVERFAULT IMO. NFS4ERR_BADXDR is
> > > a better choice. But why call mark_cb_fault in this case?
> > > 

I can fix that up. BADXDR is more descriptive.

> > > Maybe split this clean-up into a separate patch.
> > > 
> > > 
> > 
> > I'm only altering comments in this patch. Do you really want separate
> > patches for the different comments?
> 
> Why call mark_cb_fault here? If NFSD retransmits this operation on a
> fresh session/transport it will just fail to decode the reply again.
> 
> Do we believe that the decoding failure means there was a transport
> problem of some kind?
> 
> It's clear we do not understand this code well enough to update the
> existing comment, so my review comment above suggests a broader code
> change is necessary.
> 
> 

It won't be retransmitted in the ESERVERFAULT case. It just fails.

I can't speak definitively, but my guess is that this is an
extraordinary situation and the author (Kinglong Mee?) figured "might
as well mark the cb faulty too". I don't think that's necessarily
unreasonable, but I agree that it's unlikely to help.

Unfortunately as the server in this situation, our options for alerting
about callback problems are limited. Marking the CB channel as faulty
isn't a great response, but it is at least something. The problem here
is that these are callbacks, and if they fail, there is zero indication
that there is a problem.

Stepping back, when we do find failing callbacks, should we be doing
more to alert the admin? What would be an appropriate response?
Ratelimited pr_notice()? Conditional tracepoints? Something else?
Chuck Lever Jan. 24, 2025, 3:42 p.m. UTC | #5
On 1/24/25 10:31 AM, Jeff Layton wrote:
> Stepping back, when we do find failing callbacks, should we be doing
> more to alert the admin? What would be an appropriate response?
> Ratelimited pr_notice()? Conditional tracepoints? Something else?

If we can identify a reasonable action that an admin can take, then
pr_ratelimited (or once per client instance) to report a decoding
failure makes sense. Report the IP address of the client that is sending
us weird crap.

For the more conventional session failures, I'm less enthusiastic
about logging these because frequently they are the result of
transport failures (or a suspended client, or ...). We'll have to
sort through the various cases to understand where a logged error
might be helpful/actionable.
Chuck Lever Jan. 26, 2025, 4:50 p.m. UTC | #6
On 1/24/25 9:50 AM, Jeff Layton wrote:
> On Fri, 2025-01-24 at 09:43 -0500, Chuck Lever wrote:
>> On 1/23/25 3:25 PM, Jeff Layton wrote:
>>> Add a new kerneldoc header, and clean up the comments a bit.
>>
>> Usually I'm in favor of kdoc headers, but here, it's a static function
>> whose address is not shared outside of this source file. The only
>> documentation need is the meaning of the return code, IMO.
>>
> 
> If you like. I figured it wouldn't hurt to do a full kdoc comment.
> 
>>
>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>> ---
>>>    fs/nfsd/nfs4callback.c | 26 ++++++++++++++++++++------
>>>    1 file changed, 20 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>> index 6e0561f3b21bd850b0387b5af7084eb05e818231..415fc8aae0f47c36f00b2384805c7a996fb1feb0 100644
>>> --- a/fs/nfsd/nfs4callback.c
>>> +++ b/fs/nfsd/nfs4callback.c
>>> @@ -1325,6 +1325,17 @@ static void nfsd4_cb_prepare(struct rpc_task *task, void *calldata)
>>>    	rpc_call_start(task);
>>>    }
>>>    
>>> +/**
>>> + * nfsd4_cb_sequence_done - process the result of a CB_SEQUENCE
>>> + * @task: rpc_task
>>> + * @cb: nfsd4_callback for this call
>>> + *
>>> + * For minorversion 0, there is no CB_SEQUENCE. Only restart the call
>>> + * if the callback RPC client was killed. For v4.1+ the error handling
>>> + * is more sophisticated.
>>
>> It would be much clearer to pull the 4.0 error handling out of this
>> function, which is named "cb_/sequence/_done".
>>
>> Perhaps the need_restart label can be hoisted into nfsd4_cb_done() ?
>>
> 
> If we do that then we'll need to change this function to return
> something other than a bool, and that's a larger change than I wanted
> to make here.

I don't think that's needed. If you create a helper like so:

static bool nfsd4_cb_requeue(struct rpc_task *task,
			     struct nfsd4_callback *cb)
{ 

         struct nfs4_client *clp = cb->cb_clp;

         if (!test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) { 

                 trace_nfsd_cb_restart(clp, cb); 

                 task->tk_status = 0; 

                 cb->cb_need_restart = true; 

         } 

         return false;
}

Then you can replace the "goto need_restart;" sites in both functions
with a tail call to this helper:

	return nfsd4_cb_requeue(task, cb);


> I really wanted to keep these as small, targeted patches
> that can be backported easily.

Clean-ups are generally not back-portable, so I don't mind if you go to
a little extra trouble.


> I wouldn't object to further cleanup here on top of that though.
> 
> 
>>
>>> + *
>>> + * Returns true if reply processing should continue.
>>> + */
>>>    static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback *cb)
>>>    {
>>>    	struct nfs4_client *clp = cb->cb_clp;
>>> @@ -1334,11 +1345,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>    	if (!clp->cl_minorversion) {
>>>    		/*
>>>    		 * If the backchannel connection was shut down while this
>>> -		 * task was queued, we need to resubmit it after setting up
>>> -		 * a new backchannel connection.
>>> +		 * task was queued, resubmit it after setting up a new
>>> +		 * backchannel connection.
>>>    		 *
>>> -		 * Note that if we lost our callback connection permanently
>>> -		 * the submission code will error out, so we don't need to
>>> +		 * Note that if the callback connection is permanently lost,
>>> +		 * the submission code will error out. There is no need to
>>>    		 * handle that case here.
>>>    		 */
>>>    		if (RPC_SIGNALLED(task))
>>> @@ -1355,8 +1366,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>    	switch (cb->cb_seq_status) {
>>>    	case 0:
>>>    		/*
>>> -		 * No need for lock, access serialized in nfsd4_cb_prepare
>>> -		 *
>>>    		 * RFC5661 20.9.3
>>>    		 * If CB_SEQUENCE returns an error, then the state of the slot
>>>    		 * (sequence ID, cached reply) MUST NOT change.
>>> @@ -1365,6 +1374,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>    		ret = true;
>>>    		break;
>>>    	case -ESERVERFAULT:
>>> +		/*
>>> +		 * Client returned NFS4_OK, but decoding failed. Mark the
>>> +		 * backchannel as faulty, but don't retransmit since the
>>> +		 * call was successful.
>>> +		 */
>>>    		++session->se_cb_seq_nr[cb->cb_held_slot];
>>>    		nfsd4_mark_cb_fault(cb->cb_clp);
>>>    		break;
>>
>> This old code abuses the meaning of ESERVERFAULT IMO. NFS4ERR_BADXDR is
>> a better choice. But why call mark_cb_fault in this case?
>>
>> Maybe split this clean-up into a separate patch.
>>
>>
> 
> I'm only altering comments in this patch. Do you really want separate
> patches for the different comments?
>
diff mbox series

Patch

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 6e0561f3b21bd850b0387b5af7084eb05e818231..415fc8aae0f47c36f00b2384805c7a996fb1feb0 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1325,6 +1325,17 @@  static void nfsd4_cb_prepare(struct rpc_task *task, void *calldata)
 	rpc_call_start(task);
 }
 
+/**
+ * nfsd4_cb_sequence_done - process the result of a CB_SEQUENCE
+ * @task: rpc_task
+ * @cb: nfsd4_callback for this call
+ *
+ * For minorversion 0, there is no CB_SEQUENCE. Only restart the call
+ * if the callback RPC client was killed. For v4.1+ the error handling
+ * is more sophisticated.
+ *
+ * Returns true if reply processing should continue.
+ */
 static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback *cb)
 {
 	struct nfs4_client *clp = cb->cb_clp;
@@ -1334,11 +1345,11 @@  static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 	if (!clp->cl_minorversion) {
 		/*
 		 * If the backchannel connection was shut down while this
-		 * task was queued, we need to resubmit it after setting up
-		 * a new backchannel connection.
+		 * task was queued, resubmit it after setting up a new
+		 * backchannel connection.
 		 *
-		 * Note that if we lost our callback connection permanently
-		 * the submission code will error out, so we don't need to
+		 * Note that if the callback connection is permanently lost,
+		 * the submission code will error out. There is no need to
 		 * handle that case here.
 		 */
 		if (RPC_SIGNALLED(task))
@@ -1355,8 +1366,6 @@  static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 	switch (cb->cb_seq_status) {
 	case 0:
 		/*
-		 * No need for lock, access serialized in nfsd4_cb_prepare
-		 *
 		 * RFC5661 20.9.3
 		 * If CB_SEQUENCE returns an error, then the state of the slot
 		 * (sequence ID, cached reply) MUST NOT change.
@@ -1365,6 +1374,11 @@  static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 		ret = true;
 		break;
 	case -ESERVERFAULT:
+		/*
+		 * Client returned NFS4_OK, but decoding failed. Mark the
+		 * backchannel as faulty, but don't retransmit since the
+		 * call was successful.
+		 */
 		++session->se_cb_seq_nr[cb->cb_held_slot];
 		nfsd4_mark_cb_fault(cb->cb_clp);
 		break;