Message ID | 1738185446-7488-1-git-send-email-dai.ngo@oracle.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [1/1] NFSD: fix hang in nfsd4_shutdown_callback | expand |
On 1/29/25 4:17 PM, Dai Ngo wrote: > If nfs4_client is in COURTESY state then there is no point to retry > the callback. This causes nfsd4_shutdown_callback to hang since > cl_cb_inflight is not 0. This hang lasts about 15 minutes until TCP > notifies NFSD that the connection was closed. > > This patch modifies nfsd4_cb_sequence_done to skip the restart the > RPC if nfs4_client is in COURTESY state. > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com> > --- > fs/nfsd/nfs4callback.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c > index 50e468bdb8d4..c90f94898cc5 100644 > --- a/fs/nfsd/nfs4callback.c > +++ b/fs/nfsd/nfs4callback.c > @@ -1372,6 +1372,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback > ret = false; > break; > case 1: > + if (clp->cl_state == NFSD4_COURTESY) { > + nfsd4_mark_cb_fault(cb->cb_clp); > + ret = false; > + break; > + } We could do toss this operation here or at the top of nfsd4_run_cb_work(). > /* > * cb_seq_status remains 1 if an RPC Reply was never > * received. NFSD can't know if the client processed
On Wed, Jan 29, 2025 at 4:17 PM Dai Ngo <dai.ngo@oracle.com> wrote: > > If nfs4_client is in COURTESY state then there is no point to retry > the callback. This causes nfsd4_shutdown_callback to hang since > cl_cb_inflight is not 0. This hang lasts about 15 minutes until TCP > notifies NFSD that the connection was closed. > > This patch modifies nfsd4_cb_sequence_done to skip the restart the > RPC if nfs4_client is in COURTESY state. > Curious, does this patch address the problem seen/discussed in the thread "NFSD threads hang when destroying a session or client ID" or that is something else? > Signed-off-by: Dai Ngo <dai.ngo@oracle.com> > --- > fs/nfsd/nfs4callback.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c > index 50e468bdb8d4..c90f94898cc5 100644 > --- a/fs/nfsd/nfs4callback.c > +++ b/fs/nfsd/nfs4callback.c > @@ -1372,6 +1372,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback > ret = false; > break; > case 1: > + if (clp->cl_state == NFSD4_COURTESY) { > + nfsd4_mark_cb_fault(cb->cb_clp); > + ret = false; > + break; > + } > /* > * cb_seq_status remains 1 if an RPC Reply was never > * received. NFSD can't know if the client processed > -- > 2.43.5 > >
On Wed, 2025-01-29 at 16:39 -0500, Chuck Lever wrote: > On 1/29/25 4:17 PM, Dai Ngo wrote: > > If nfs4_client is in COURTESY state then there is no point to retry > > the callback. This causes nfsd4_shutdown_callback to hang since > > cl_cb_inflight is not 0. This hang lasts about 15 minutes until TCP > > notifies NFSD that the connection was closed. > > > > This patch modifies nfsd4_cb_sequence_done to skip the restart the > > RPC if nfs4_client is in COURTESY state. > > > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com> > > --- > > fs/nfsd/nfs4callback.c | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c > > index 50e468bdb8d4..c90f94898cc5 100644 > > --- a/fs/nfsd/nfs4callback.c > > +++ b/fs/nfsd/nfs4callback.c > > @@ -1372,6 +1372,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback > > ret = false; > > break; > > case 1: > > + if (clp->cl_state == NFSD4_COURTESY) { > > + nfsd4_mark_cb_fault(cb->cb_clp); > > + ret = false; > > + break; > > + } > > We could do toss this operation here or at the top of > nfsd4_run_cb_work(). > Like Olga, I'm wondering if this is the culprit on the recently reported rpc_shutdown_client hangs. I'm assuming that with a courtesy client that we don't want to do any callbacks? If that's the case, then I think doing it early in nfsd4_run_cb_work() would be better. That would prevent new cb's from being queued until the cl_state changes too. > > > /* > > * cb_seq_status remains 1 if an RPC Reply was never > > * received. NFSD can't know if the client processed > > Nice catch though!
On 1/29/25 2:38 PM, Jeff Layton wrote: > On Wed, 2025-01-29 at 16:39 -0500, Chuck Lever wrote: >> On 1/29/25 4:17 PM, Dai Ngo wrote: >>> If nfs4_client is in COURTESY state then there is no point to retry >>> the callback. This causes nfsd4_shutdown_callback to hang since >>> cl_cb_inflight is not 0. This hang lasts about 15 minutes until TCP >>> notifies NFSD that the connection was closed. >>> >>> This patch modifies nfsd4_cb_sequence_done to skip the restart the >>> RPC if nfs4_client is in COURTESY state. >>> >>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> >>> --- >>> fs/nfsd/nfs4callback.c | 5 +++++ >>> 1 file changed, 5 insertions(+) >>> >>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c >>> index 50e468bdb8d4..c90f94898cc5 100644 >>> --- a/fs/nfsd/nfs4callback.c >>> +++ b/fs/nfsd/nfs4callback.c >>> @@ -1372,6 +1372,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback >>> ret = false; >>> break; >>> case 1: >>> + if (clp->cl_state == NFSD4_COURTESY) { >>> + nfsd4_mark_cb_fault(cb->cb_clp); >>> + ret = false; >>> + break; >>> + } >> We could do toss this operation here or at the top of >> nfsd4_run_cb_work(). >> > Like Olga, I'm wondering if this is the culprit on the recently > reported rpc_shutdown_client hangs. > > I'm assuming that with a courtesy client that we don't want to do any > callbacks? If that's the case, then I think doing it early in > nfsd4_run_cb_work() would be better. That would prevent new cb's from > being queued until the cl_state changes too. I'll move this check into nfsd4_run_cb_work. Thanks Chuck and Jeff. -Dai > > >>> /* >>> * cb_seq_status remains 1 if an RPC Reply was never >>> * received. NFSD can't know if the client processed >> > Nice catch though!
On 1/29/25 2:28 PM, Olga Kornievskaia wrote: > On Wed, Jan 29, 2025 at 4:17 PM Dai Ngo <dai.ngo@oracle.com> wrote: >> If nfs4_client is in COURTESY state then there is no point to retry >> the callback. This causes nfsd4_shutdown_callback to hang since >> cl_cb_inflight is not 0. This hang lasts about 15 minutes until TCP >> notifies NFSD that the connection was closed. >> >> This patch modifies nfsd4_cb_sequence_done to skip the restart the >> RPC if nfs4_client is in COURTESY state. >> > Curious, does this patch address the problem seen/discussed in the > thread "NFSD threads hang when destroying a session or client ID" or > that is something else? I'm not sure about the symptom in 6.1.y kernel. The problem that I reproduced here has the same symptom described in the thread for newer kernel; NFSv4 callback shutdown hang while waiting for cl_cb_inflight to drop to 0. -Dai > >> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> >> --- >> fs/nfsd/nfs4callback.c | 5 +++++ >> 1 file changed, 5 insertions(+) >> >> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c >> index 50e468bdb8d4..c90f94898cc5 100644 >> --- a/fs/nfsd/nfs4callback.c >> +++ b/fs/nfsd/nfs4callback.c >> @@ -1372,6 +1372,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback >> ret = false; >> break; >> case 1: >> + if (clp->cl_state == NFSD4_COURTESY) { >> + nfsd4_mark_cb_fault(cb->cb_clp); >> + ret = false; >> + break; >> + } >> /* >> * cb_seq_status remains 1 if an RPC Reply was never >> * received. NFSD can't know if the client processed >> -- >> 2.43.5 >> >>
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index 50e468bdb8d4..c90f94898cc5 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -1372,6 +1372,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback ret = false; break; case 1: + if (clp->cl_state == NFSD4_COURTESY) { + nfsd4_mark_cb_fault(cb->cb_clp); + ret = false; + break; + } /* * cb_seq_status remains 1 if an RPC Reply was never * received. NFSD can't know if the client processed
If nfs4_client is in COURTESY state then there is no point to retry the callback. This causes nfsd4_shutdown_callback to hang since cl_cb_inflight is not 0. This hang lasts about 15 minutes until TCP notifies NFSD that the connection was closed. This patch modifies nfsd4_cb_sequence_done to skip the restart the RPC if nfs4_client is in COURTESY state. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> --- fs/nfsd/nfs4callback.c | 5 +++++ 1 file changed, 5 insertions(+)