Message ID | 20250123-nfsd-6-14-v1-7-c1137a4fa2ae@kernel.org (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Chuck Lever |
Headers | show |
Series | nfsd: CB_SEQUENCE error handling fixes and cleanups | expand |
On 1/23/25 3:25 PM, Jeff Layton wrote: > Add a new kerneldoc header, and clean up the comments a bit. Usually I'm in favor of kdoc headers, but here, it's a static function whose address is not shared outside of this source file. The only documentation need is the meaning of the return code, IMO. > Signed-off-by: Jeff Layton <jlayton@kernel.org> > --- > fs/nfsd/nfs4callback.c | 26 ++++++++++++++++++++------ > 1 file changed, 20 insertions(+), 6 deletions(-) > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c > index 6e0561f3b21bd850b0387b5af7084eb05e818231..415fc8aae0f47c36f00b2384805c7a996fb1feb0 100644 > --- a/fs/nfsd/nfs4callback.c > +++ b/fs/nfsd/nfs4callback.c > @@ -1325,6 +1325,17 @@ static void nfsd4_cb_prepare(struct rpc_task *task, void *calldata) > rpc_call_start(task); > } > > +/** > + * nfsd4_cb_sequence_done - process the result of a CB_SEQUENCE > + * @task: rpc_task > + * @cb: nfsd4_callback for this call > + * > + * For minorversion 0, there is no CB_SEQUENCE. Only restart the call > + * if the callback RPC client was killed. For v4.1+ the error handling > + * is more sophisticated. It would be much clearer to pull the 4.0 error handling out of this function, which is named "cb_/sequence/_done". Perhaps the need_restart label can be hoisted into nfsd4_cb_done() ? > + * > + * Returns true if reply processing should continue. > + */ > static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback *cb) > { > struct nfs4_client *clp = cb->cb_clp; > @@ -1334,11 +1345,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback > if (!clp->cl_minorversion) { > /* > * If the backchannel connection was shut down while this > - * task was queued, we need to resubmit it after setting up > - * a new backchannel connection. > + * task was queued, resubmit it after setting up a new > + * backchannel connection. > * > - * Note that if we lost our callback connection permanently > - * the submission code will error out, so we don't need to > + * Note that if the callback connection is permanently lost, > + * the submission code will error out. There is no need to > * handle that case here. > */ > if (RPC_SIGNALLED(task)) > @@ -1355,8 +1366,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback > switch (cb->cb_seq_status) { > case 0: > /* > - * No need for lock, access serialized in nfsd4_cb_prepare > - * > * RFC5661 20.9.3 > * If CB_SEQUENCE returns an error, then the state of the slot > * (sequence ID, cached reply) MUST NOT change. > @@ -1365,6 +1374,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback > ret = true; > break; > case -ESERVERFAULT: > + /* > + * Client returned NFS4_OK, but decoding failed. Mark the > + * backchannel as faulty, but don't retransmit since the > + * call was successful. > + */ > ++session->se_cb_seq_nr[cb->cb_held_slot]; > nfsd4_mark_cb_fault(cb->cb_clp); > break; This old code abuses the meaning of ESERVERFAULT IMO. NFS4ERR_BADXDR is a better choice. But why call mark_cb_fault in this case? Maybe split this clean-up into a separate patch.
On Fri, 2025-01-24 at 09:43 -0500, Chuck Lever wrote: > On 1/23/25 3:25 PM, Jeff Layton wrote: > > Add a new kerneldoc header, and clean up the comments a bit. > > Usually I'm in favor of kdoc headers, but here, it's a static function > whose address is not shared outside of this source file. The only > documentation need is the meaning of the return code, IMO. > If you like. I figured it wouldn't hurt to do a full kdoc comment. > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > --- > > fs/nfsd/nfs4callback.c | 26 ++++++++++++++++++++------ > > 1 file changed, 20 insertions(+), 6 deletions(-) > > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c > > index 6e0561f3b21bd850b0387b5af7084eb05e818231..415fc8aae0f47c36f00b2384805c7a996fb1feb0 100644 > > --- a/fs/nfsd/nfs4callback.c > > +++ b/fs/nfsd/nfs4callback.c > > @@ -1325,6 +1325,17 @@ static void nfsd4_cb_prepare(struct rpc_task *task, void *calldata) > > rpc_call_start(task); > > } > > > > +/** > > + * nfsd4_cb_sequence_done - process the result of a CB_SEQUENCE > > + * @task: rpc_task > > + * @cb: nfsd4_callback for this call > > + * > > + * For minorversion 0, there is no CB_SEQUENCE. Only restart the call > > + * if the callback RPC client was killed. For v4.1+ the error handling > > + * is more sophisticated. > > It would be much clearer to pull the 4.0 error handling out of this > function, which is named "cb_/sequence/_done". > > Perhaps the need_restart label can be hoisted into nfsd4_cb_done() ? > If we do that then we'll need to change this function to return something other than a bool, and that's a larger change than I wanted to make here. I really wanted to keep these as small, targeted patches that can be backported easily. I wouldn't object to further cleanup here on top of that though. > > > + * > > + * Returns true if reply processing should continue. > > + */ > > static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback *cb) > > { > > struct nfs4_client *clp = cb->cb_clp; > > @@ -1334,11 +1345,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback > > if (!clp->cl_minorversion) { > > /* > > * If the backchannel connection was shut down while this > > - * task was queued, we need to resubmit it after setting up > > - * a new backchannel connection. > > + * task was queued, resubmit it after setting up a new > > + * backchannel connection. > > * > > - * Note that if we lost our callback connection permanently > > - * the submission code will error out, so we don't need to > > + * Note that if the callback connection is permanently lost, > > + * the submission code will error out. There is no need to > > * handle that case here. > > */ > > if (RPC_SIGNALLED(task)) > > @@ -1355,8 +1366,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback > > switch (cb->cb_seq_status) { > > case 0: > > /* > > - * No need for lock, access serialized in nfsd4_cb_prepare > > - * > > * RFC5661 20.9.3 > > * If CB_SEQUENCE returns an error, then the state of the slot > > * (sequence ID, cached reply) MUST NOT change. > > @@ -1365,6 +1374,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback > > ret = true; > > break; > > case -ESERVERFAULT: > > + /* > > + * Client returned NFS4_OK, but decoding failed. Mark the > > + * backchannel as faulty, but don't retransmit since the > > + * call was successful. > > + */ > > ++session->se_cb_seq_nr[cb->cb_held_slot]; > > nfsd4_mark_cb_fault(cb->cb_clp); > > break; > > This old code abuses the meaning of ESERVERFAULT IMO. NFS4ERR_BADXDR is > a better choice. But why call mark_cb_fault in this case? > > Maybe split this clean-up into a separate patch. > > I'm only altering comments in this patch. Do you really want separate patches for the different comments?
On 1/24/25 9:50 AM, Jeff Layton wrote: > On Fri, 2025-01-24 at 09:43 -0500, Chuck Lever wrote: >> On 1/23/25 3:25 PM, Jeff Layton wrote: >>> Add a new kerneldoc header, and clean up the comments a bit. >> >> Usually I'm in favor of kdoc headers, but here, it's a static function >> whose address is not shared outside of this source file. The only >> documentation need is the meaning of the return code, IMO. >> > > If you like. I figured it wouldn't hurt to do a full kdoc comment. Kdoc comments are pretty noisy. This one doesn't seem to me to add much real value -- its callers are all right here in the same file. >>> Signed-off-by: Jeff Layton <jlayton@kernel.org> >>> --- >>> fs/nfsd/nfs4callback.c | 26 ++++++++++++++++++++------ >>> 1 file changed, 20 insertions(+), 6 deletions(-) >>> >>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c >>> index 6e0561f3b21bd850b0387b5af7084eb05e818231..415fc8aae0f47c36f00b2384805c7a996fb1feb0 100644 >>> --- a/fs/nfsd/nfs4callback.c >>> +++ b/fs/nfsd/nfs4callback.c >>> @@ -1325,6 +1325,17 @@ static void nfsd4_cb_prepare(struct rpc_task *task, void *calldata) >>> rpc_call_start(task); >>> } >>> >>> +/** >>> + * nfsd4_cb_sequence_done - process the result of a CB_SEQUENCE >>> + * @task: rpc_task >>> + * @cb: nfsd4_callback for this call >>> + * >>> + * For minorversion 0, there is no CB_SEQUENCE. Only restart the call >>> + * if the callback RPC client was killed. For v4.1+ the error handling >>> + * is more sophisticated. >> >> It would be much clearer to pull the 4.0 error handling out of this >> function, which is named "cb_/sequence/_done". >> >> Perhaps the need_restart label can be hoisted into nfsd4_cb_done() ? >> > > If we do that then we'll need to change this function to return > something other than a bool, and that's a larger change than I wanted > to make here. I really wanted to keep these as small, targeted patches > that can be backported easily. > > I wouldn't object to further cleanup here on top of that though. There's no reason to document the 4.0 logic if it's about to be moved out. I strongly prefer making the code more self-documenting. Adding a comment here about 4.0 then adding a patch on top moving the code somewhere else seems silly to me. >>> + * >>> + * Returns true if reply processing should continue. >>> + */ >>> static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback *cb) >>> { >>> struct nfs4_client *clp = cb->cb_clp; >>> @@ -1334,11 +1345,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback >>> if (!clp->cl_minorversion) { >>> /* >>> * If the backchannel connection was shut down while this >>> - * task was queued, we need to resubmit it after setting up >>> - * a new backchannel connection. >>> + * task was queued, resubmit it after setting up a new >>> + * backchannel connection. >>> * >>> - * Note that if we lost our callback connection permanently >>> - * the submission code will error out, so we don't need to >>> + * Note that if the callback connection is permanently lost, >>> + * the submission code will error out. There is no need to >>> * handle that case here. >>> */ >>> if (RPC_SIGNALLED(task)) >>> @@ -1355,8 +1366,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback >>> switch (cb->cb_seq_status) { >>> case 0: >>> /* >>> - * No need for lock, access serialized in nfsd4_cb_prepare >>> - * >>> * RFC5661 20.9.3 >>> * If CB_SEQUENCE returns an error, then the state of the slot >>> * (sequence ID, cached reply) MUST NOT change. >>> @@ -1365,6 +1374,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback >>> ret = true; >>> break; >>> case -ESERVERFAULT: >>> + /* >>> + * Client returned NFS4_OK, but decoding failed. Mark the >>> + * backchannel as faulty, but don't retransmit since the >>> + * call was successful. >>> + */ >>> ++session->se_cb_seq_nr[cb->cb_held_slot]; >>> nfsd4_mark_cb_fault(cb->cb_clp); >>> break; >> >> This old code abuses the meaning of ESERVERFAULT IMO. NFS4ERR_BADXDR is >> a better choice. But why call mark_cb_fault in this case? >> >> Maybe split this clean-up into a separate patch. >> >> > > I'm only altering comments in this patch. Do you really want separate > patches for the different comments? Why call mark_cb_fault here? If NFSD retransmits this operation on a fresh session/transport it will just fail to decode the reply again. Do we believe that the decoding failure means there was a transport problem of some kind? It's clear we do not understand this code well enough to update the existing comment, so my review comment above suggests a broader code change is necessary.
On Fri, 2025-01-24 at 10:05 -0500, Chuck Lever wrote: > On 1/24/25 9:50 AM, Jeff Layton wrote: > > On Fri, 2025-01-24 at 09:43 -0500, Chuck Lever wrote: > > > On 1/23/25 3:25 PM, Jeff Layton wrote: > > > > Add a new kerneldoc header, and clean up the comments a bit. > > > > > > Usually I'm in favor of kdoc headers, but here, it's a static function > > > whose address is not shared outside of this source file. The only > > > documentation need is the meaning of the return code, IMO. > > > > > > > If you like. I figured it wouldn't hurt to do a full kdoc comment. > > Kdoc comments are pretty noisy. This one doesn't seem to me to add > much real value -- its callers are all right here in the same file. > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > --- > > > > fs/nfsd/nfs4callback.c | 26 ++++++++++++++++++++------ > > > > 1 file changed, 20 insertions(+), 6 deletions(-) > > > > > > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c > > > > index 6e0561f3b21bd850b0387b5af7084eb05e818231..415fc8aae0f47c36f00b2384805c7a996fb1feb0 100644 > > > > --- a/fs/nfsd/nfs4callback.c > > > > +++ b/fs/nfsd/nfs4callback.c > > > > @@ -1325,6 +1325,17 @@ static void nfsd4_cb_prepare(struct rpc_task *task, void *calldata) > > > > rpc_call_start(task); > > > > } > > > > > > > > +/** > > > > + * nfsd4_cb_sequence_done - process the result of a CB_SEQUENCE > > > > + * @task: rpc_task > > > > + * @cb: nfsd4_callback for this call > > > > + * > > > > + * For minorversion 0, there is no CB_SEQUENCE. Only restart the call > > > > + * if the callback RPC client was killed. For v4.1+ the error handling > > > > + * is more sophisticated. > > > > > > It would be much clearer to pull the 4.0 error handling out of this > > > function, which is named "cb_/sequence/_done". > > > > > > Perhaps the need_restart label can be hoisted into nfsd4_cb_done() ? > > > > > > > If we do that then we'll need to change this function to return > > something other than a bool, and that's a larger change than I wanted > > to make here. I really wanted to keep these as small, targeted patches > > that can be backported easily. > > > > I wouldn't object to further cleanup here on top of that though. > > There's no reason to document the 4.0 logic if it's about to be moved > out. I strongly prefer making the code more self-documenting. Adding > a comment here about 4.0 then adding a patch on top moving the code > somewhere else seems silly to me. > Ok. > > > > > + * > > > > + * Returns true if reply processing should continue. > > > > + */ > > > > static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback *cb) > > > > { > > > > struct nfs4_client *clp = cb->cb_clp; > > > > @@ -1334,11 +1345,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback > > > > if (!clp->cl_minorversion) { > > > > /* > > > > * If the backchannel connection was shut down while this > > > > - * task was queued, we need to resubmit it after setting up > > > > - * a new backchannel connection. > > > > + * task was queued, resubmit it after setting up a new > > > > + * backchannel connection. > > > > * > > > > - * Note that if we lost our callback connection permanently > > > > - * the submission code will error out, so we don't need to > > > > + * Note that if the callback connection is permanently lost, > > > > + * the submission code will error out. There is no need to > > > > * handle that case here. > > > > */ > > > > if (RPC_SIGNALLED(task)) > > > > @@ -1355,8 +1366,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback > > > > switch (cb->cb_seq_status) { > > > > case 0: > > > > /* > > > > - * No need for lock, access serialized in nfsd4_cb_prepare > > > > - * > > > > * RFC5661 20.9.3 > > > > * If CB_SEQUENCE returns an error, then the state of the slot > > > > * (sequence ID, cached reply) MUST NOT change. > > > > @@ -1365,6 +1374,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback > > > > ret = true; > > > > break; > > > > case -ESERVERFAULT: > > > > + /* > > > > + * Client returned NFS4_OK, but decoding failed. Mark the > > > > + * backchannel as faulty, but don't retransmit since the > > > > + * call was successful. > > > > + */ > > > > ++session->se_cb_seq_nr[cb->cb_held_slot]; > > > > nfsd4_mark_cb_fault(cb->cb_clp); > > > > break; > > > > > > This old code abuses the meaning of ESERVERFAULT IMO. NFS4ERR_BADXDR is > > > a better choice. But why call mark_cb_fault in this case? > > > I can fix that up. BADXDR is more descriptive. > > > Maybe split this clean-up into a separate patch. > > > > > > > > > > I'm only altering comments in this patch. Do you really want separate > > patches for the different comments? > > Why call mark_cb_fault here? If NFSD retransmits this operation on a > fresh session/transport it will just fail to decode the reply again. > > Do we believe that the decoding failure means there was a transport > problem of some kind? > > It's clear we do not understand this code well enough to update the > existing comment, so my review comment above suggests a broader code > change is necessary. > > It won't be retransmitted in the ESERVERFAULT case. It just fails. I can't speak definitively, but my guess is that this is an extraordinary situation and the author (Kinglong Mee?) figured "might as well mark the cb faulty too". I don't think that's necessarily unreasonable, but I agree that it's unlikely to help. Unfortunately as the server in this situation, our options for alerting about callback problems are limited. Marking the CB channel as faulty isn't a great response, but it is at least something. The problem here is that these are callbacks, and if they fail, there is zero indication that there is a problem. Stepping back, when we do find failing callbacks, should we be doing more to alert the admin? What would be an appropriate response? Ratelimited pr_notice()? Conditional tracepoints? Something else?
On 1/24/25 10:31 AM, Jeff Layton wrote: > Stepping back, when we do find failing callbacks, should we be doing > more to alert the admin? What would be an appropriate response? > Ratelimited pr_notice()? Conditional tracepoints? Something else? If we can identify a reasonable action that an admin can take, then pr_ratelimited (or once per client instance) to report a decoding failure makes sense. Report the IP address of the client that is sending us weird crap. For the more conventional session failures, I'm less enthusiastic about logging these because frequently they are the result of transport failures (or a suspended client, or ...). We'll have to sort through the various cases to understand where a logged error might be helpful/actionable.
On 1/24/25 9:50 AM, Jeff Layton wrote: > On Fri, 2025-01-24 at 09:43 -0500, Chuck Lever wrote: >> On 1/23/25 3:25 PM, Jeff Layton wrote: >>> Add a new kerneldoc header, and clean up the comments a bit. >> >> Usually I'm in favor of kdoc headers, but here, it's a static function >> whose address is not shared outside of this source file. The only >> documentation need is the meaning of the return code, IMO. >> > > If you like. I figured it wouldn't hurt to do a full kdoc comment. > >> >>> Signed-off-by: Jeff Layton <jlayton@kernel.org> >>> --- >>> fs/nfsd/nfs4callback.c | 26 ++++++++++++++++++++------ >>> 1 file changed, 20 insertions(+), 6 deletions(-) >>> >>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c >>> index 6e0561f3b21bd850b0387b5af7084eb05e818231..415fc8aae0f47c36f00b2384805c7a996fb1feb0 100644 >>> --- a/fs/nfsd/nfs4callback.c >>> +++ b/fs/nfsd/nfs4callback.c >>> @@ -1325,6 +1325,17 @@ static void nfsd4_cb_prepare(struct rpc_task *task, void *calldata) >>> rpc_call_start(task); >>> } >>> >>> +/** >>> + * nfsd4_cb_sequence_done - process the result of a CB_SEQUENCE >>> + * @task: rpc_task >>> + * @cb: nfsd4_callback for this call >>> + * >>> + * For minorversion 0, there is no CB_SEQUENCE. Only restart the call >>> + * if the callback RPC client was killed. For v4.1+ the error handling >>> + * is more sophisticated. >> >> It would be much clearer to pull the 4.0 error handling out of this >> function, which is named "cb_/sequence/_done". >> >> Perhaps the need_restart label can be hoisted into nfsd4_cb_done() ? >> > > If we do that then we'll need to change this function to return > something other than a bool, and that's a larger change than I wanted > to make here. I don't think that's needed. If you create a helper like so: static bool nfsd4_cb_requeue(struct rpc_task *task, struct nfsd4_callback *cb) { struct nfs4_client *clp = cb->cb_clp; if (!test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) { trace_nfsd_cb_restart(clp, cb); task->tk_status = 0; cb->cb_need_restart = true; } return false; } Then you can replace the "goto need_restart;" sites in both functions with a tail call to this helper: return nfsd4_cb_requeue(task, cb); > I really wanted to keep these as small, targeted patches > that can be backported easily. Clean-ups are generally not back-portable, so I don't mind if you go to a little extra trouble. > I wouldn't object to further cleanup here on top of that though. > > >> >>> + * >>> + * Returns true if reply processing should continue. >>> + */ >>> static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback *cb) >>> { >>> struct nfs4_client *clp = cb->cb_clp; >>> @@ -1334,11 +1345,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback >>> if (!clp->cl_minorversion) { >>> /* >>> * If the backchannel connection was shut down while this >>> - * task was queued, we need to resubmit it after setting up >>> - * a new backchannel connection. >>> + * task was queued, resubmit it after setting up a new >>> + * backchannel connection. >>> * >>> - * Note that if we lost our callback connection permanently >>> - * the submission code will error out, so we don't need to >>> + * Note that if the callback connection is permanently lost, >>> + * the submission code will error out. There is no need to >>> * handle that case here. >>> */ >>> if (RPC_SIGNALLED(task)) >>> @@ -1355,8 +1366,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback >>> switch (cb->cb_seq_status) { >>> case 0: >>> /* >>> - * No need for lock, access serialized in nfsd4_cb_prepare >>> - * >>> * RFC5661 20.9.3 >>> * If CB_SEQUENCE returns an error, then the state of the slot >>> * (sequence ID, cached reply) MUST NOT change. >>> @@ -1365,6 +1374,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback >>> ret = true; >>> break; >>> case -ESERVERFAULT: >>> + /* >>> + * Client returned NFS4_OK, but decoding failed. Mark the >>> + * backchannel as faulty, but don't retransmit since the >>> + * call was successful. >>> + */ >>> ++session->se_cb_seq_nr[cb->cb_held_slot]; >>> nfsd4_mark_cb_fault(cb->cb_clp); >>> break; >> >> This old code abuses the meaning of ESERVERFAULT IMO. NFS4ERR_BADXDR is >> a better choice. But why call mark_cb_fault in this case? >> >> Maybe split this clean-up into a separate patch. >> >> > > I'm only altering comments in this patch. Do you really want separate > patches for the different comments? >
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index 6e0561f3b21bd850b0387b5af7084eb05e818231..415fc8aae0f47c36f00b2384805c7a996fb1feb0 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -1325,6 +1325,17 @@ static void nfsd4_cb_prepare(struct rpc_task *task, void *calldata) rpc_call_start(task); } +/** + * nfsd4_cb_sequence_done - process the result of a CB_SEQUENCE + * @task: rpc_task + * @cb: nfsd4_callback for this call + * + * For minorversion 0, there is no CB_SEQUENCE. Only restart the call + * if the callback RPC client was killed. For v4.1+ the error handling + * is more sophisticated. + * + * Returns true if reply processing should continue. + */ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback *cb) { struct nfs4_client *clp = cb->cb_clp; @@ -1334,11 +1345,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback if (!clp->cl_minorversion) { /* * If the backchannel connection was shut down while this - * task was queued, we need to resubmit it after setting up - * a new backchannel connection. + * task was queued, resubmit it after setting up a new + * backchannel connection. * - * Note that if we lost our callback connection permanently - * the submission code will error out, so we don't need to + * Note that if the callback connection is permanently lost, + * the submission code will error out. There is no need to * handle that case here. */ if (RPC_SIGNALLED(task)) @@ -1355,8 +1366,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback switch (cb->cb_seq_status) { case 0: /* - * No need for lock, access serialized in nfsd4_cb_prepare - * * RFC5661 20.9.3 * If CB_SEQUENCE returns an error, then the state of the slot * (sequence ID, cached reply) MUST NOT change. @@ -1365,6 +1374,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback ret = true; break; case -ESERVERFAULT: + /* + * Client returned NFS4_OK, but decoding failed. Mark the + * backchannel as faulty, but don't retransmit since the + * call was successful. + */ ++session->se_cb_seq_nr[cb->cb_held_slot]; nfsd4_mark_cb_fault(cb->cb_clp); break;
Add a new kerneldoc header, and clean up the comments a bit. Signed-off-by: Jeff Layton <jlayton@kernel.org> --- fs/nfsd/nfs4callback.c | 26 ++++++++++++++++++++------ 1 file changed, 20 insertions(+), 6 deletions(-)