diff mbox series

[1/1] NFSD: fix hang in nfsd4_shutdown_callback

Message ID 1738185446-7488-1-git-send-email-dai.ngo@oracle.com (mailing list archive)
State New
Headers show
Series [1/1] NFSD: fix hang in nfsd4_shutdown_callback | expand

Commit Message

Dai Ngo Jan. 29, 2025, 9:17 p.m. UTC
If nfs4_client is in COURTESY state then there is no point to retry
the callback. This causes nfsd4_shutdown_callback to hang since
cl_cb_inflight is not 0. This hang lasts about 15 minutes until TCP
notifies NFSD that the connection was closed.

This patch modifies nfsd4_cb_sequence_done to skip the restart the
RPC if nfs4_client is in  COURTESY state.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
 fs/nfsd/nfs4callback.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Chuck Lever Jan. 29, 2025, 9:39 p.m. UTC | #1
On 1/29/25 4:17 PM, Dai Ngo wrote:
> If nfs4_client is in COURTESY state then there is no point to retry
> the callback. This causes nfsd4_shutdown_callback to hang since
> cl_cb_inflight is not 0. This hang lasts about 15 minutes until TCP
> notifies NFSD that the connection was closed.
> 
> This patch modifies nfsd4_cb_sequence_done to skip the restart the
> RPC if nfs4_client is in  COURTESY state.
> 
> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> ---
>  fs/nfsd/nfs4callback.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 50e468bdb8d4..c90f94898cc5 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -1372,6 +1372,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>  		ret = false;
>  		break;
>  	case 1:
> +		if (clp->cl_state == NFSD4_COURTESY) {
> +			nfsd4_mark_cb_fault(cb->cb_clp);
> +			ret = false;
> +			break;
> +		}

We could do toss this operation here or at the top of
nfsd4_run_cb_work().


>  		/*
>  		 * cb_seq_status remains 1 if an RPC Reply was never
>  		 * received. NFSD can't know if the client processed
Olga Kornievskaia Jan. 29, 2025, 10:28 p.m. UTC | #2
On Wed, Jan 29, 2025 at 4:17 PM Dai Ngo <dai.ngo@oracle.com> wrote:
>
> If nfs4_client is in COURTESY state then there is no point to retry
> the callback. This causes nfsd4_shutdown_callback to hang since
> cl_cb_inflight is not 0. This hang lasts about 15 minutes until TCP
> notifies NFSD that the connection was closed.
>
> This patch modifies nfsd4_cb_sequence_done to skip the restart the
> RPC if nfs4_client is in  COURTESY state.
>

Curious, does this patch address the problem seen/discussed in the
thread "NFSD threads hang when destroying a session or client ID" or
that is something else?

> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> ---
>  fs/nfsd/nfs4callback.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 50e468bdb8d4..c90f94898cc5 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -1372,6 +1372,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>                 ret = false;
>                 break;
>         case 1:
> +               if (clp->cl_state == NFSD4_COURTESY) {
> +                       nfsd4_mark_cb_fault(cb->cb_clp);
> +                       ret = false;
> +                       break;
> +               }
>                 /*
>                  * cb_seq_status remains 1 if an RPC Reply was never
>                  * received. NFSD can't know if the client processed
> --
> 2.43.5
>
>
Jeff Layton Jan. 29, 2025, 10:38 p.m. UTC | #3
On Wed, 2025-01-29 at 16:39 -0500, Chuck Lever wrote:
> On 1/29/25 4:17 PM, Dai Ngo wrote:
> > If nfs4_client is in COURTESY state then there is no point to retry
> > the callback. This causes nfsd4_shutdown_callback to hang since
> > cl_cb_inflight is not 0. This hang lasts about 15 minutes until TCP
> > notifies NFSD that the connection was closed.
> > 
> > This patch modifies nfsd4_cb_sequence_done to skip the restart the
> > RPC if nfs4_client is in  COURTESY state.
> > 
> > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > ---
> >  fs/nfsd/nfs4callback.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index 50e468bdb8d4..c90f94898cc5 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
> > @@ -1372,6 +1372,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> >  		ret = false;
> >  		break;
> >  	case 1:
> > +		if (clp->cl_state == NFSD4_COURTESY) {
> > +			nfsd4_mark_cb_fault(cb->cb_clp);
> > +			ret = false;
> > +			break;
> > +		}
> 
> We could do toss this operation here or at the top of
> nfsd4_run_cb_work().
> 

Like Olga, I'm wondering if this is the culprit on the recently
reported rpc_shutdown_client hangs.

I'm assuming that with a courtesy client that we don't want to do any
callbacks? If that's the case, then I think doing it early in
nfsd4_run_cb_work() would be better. That would prevent new cb's from
being queued until the cl_state changes too.


> 
> >  		/*
> >  		 * cb_seq_status remains 1 if an RPC Reply was never
> >  		 * received. NFSD can't know if the client processed
> 
> 

Nice catch though!
Dai Ngo Jan. 29, 2025, 11:47 p.m. UTC | #4
On 1/29/25 2:38 PM, Jeff Layton wrote:
> On Wed, 2025-01-29 at 16:39 -0500, Chuck Lever wrote:
>> On 1/29/25 4:17 PM, Dai Ngo wrote:
>>> If nfs4_client is in COURTESY state then there is no point to retry
>>> the callback. This causes nfsd4_shutdown_callback to hang since
>>> cl_cb_inflight is not 0. This hang lasts about 15 minutes until TCP
>>> notifies NFSD that the connection was closed.
>>>
>>> This patch modifies nfsd4_cb_sequence_done to skip the restart the
>>> RPC if nfs4_client is in  COURTESY state.
>>>
>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>> ---
>>>   fs/nfsd/nfs4callback.c | 5 +++++
>>>   1 file changed, 5 insertions(+)
>>>
>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>> index 50e468bdb8d4..c90f94898cc5 100644
>>> --- a/fs/nfsd/nfs4callback.c
>>> +++ b/fs/nfsd/nfs4callback.c
>>> @@ -1372,6 +1372,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>   		ret = false;
>>>   		break;
>>>   	case 1:
>>> +		if (clp->cl_state == NFSD4_COURTESY) {
>>> +			nfsd4_mark_cb_fault(cb->cb_clp);
>>> +			ret = false;
>>> +			break;
>>> +		}
>> We could do toss this operation here or at the top of
>> nfsd4_run_cb_work().
>>
> Like Olga, I'm wondering if this is the culprit on the recently
> reported rpc_shutdown_client hangs.
>
> I'm assuming that with a courtesy client that we don't want to do any
> callbacks? If that's the case, then I think doing it early in
> nfsd4_run_cb_work() would be better. That would prevent new cb's from
> being queued until the cl_state changes too.

I'll move this check into nfsd4_run_cb_work.

Thanks Chuck and Jeff.
-Dai

>
>
>>>   		/*
>>>   		 * cb_seq_status remains 1 if an RPC Reply was never
>>>   		 * received. NFSD can't know if the client processed
>>
> Nice catch though!
Dai Ngo Jan. 29, 2025, 11:49 p.m. UTC | #5
On 1/29/25 2:28 PM, Olga Kornievskaia wrote:
> On Wed, Jan 29, 2025 at 4:17 PM Dai Ngo <dai.ngo@oracle.com> wrote:
>> If nfs4_client is in COURTESY state then there is no point to retry
>> the callback. This causes nfsd4_shutdown_callback to hang since
>> cl_cb_inflight is not 0. This hang lasts about 15 minutes until TCP
>> notifies NFSD that the connection was closed.
>>
>> This patch modifies nfsd4_cb_sequence_done to skip the restart the
>> RPC if nfs4_client is in  COURTESY state.
>>
> Curious, does this patch address the problem seen/discussed in the
> thread "NFSD threads hang when destroying a session or client ID" or
> that is something else?

I'm not sure about the symptom in 6.1.y kernel.

The problem that I reproduced here has the same symptom described in
the thread for newer kernel; NFSv4 callback shutdown hang while waiting
for cl_cb_inflight to drop to 0.

-Dai

>
>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>> ---
>>   fs/nfsd/nfs4callback.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>> index 50e468bdb8d4..c90f94898cc5 100644
>> --- a/fs/nfsd/nfs4callback.c
>> +++ b/fs/nfsd/nfs4callback.c
>> @@ -1372,6 +1372,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>                  ret = false;
>>                  break;
>>          case 1:
>> +               if (clp->cl_state == NFSD4_COURTESY) {
>> +                       nfsd4_mark_cb_fault(cb->cb_clp);
>> +                       ret = false;
>> +                       break;
>> +               }
>>                  /*
>>                   * cb_seq_status remains 1 if an RPC Reply was never
>>                   * received. NFSD can't know if the client processed
>> --
>> 2.43.5
>>
>>
diff mbox series

Patch

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 50e468bdb8d4..c90f94898cc5 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1372,6 +1372,11 @@  static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 		ret = false;
 		break;
 	case 1:
+		if (clp->cl_state == NFSD4_COURTESY) {
+			nfsd4_mark_cb_fault(cb->cb_clp);
+			ret = false;
+			break;
+		}
 		/*
 		 * cb_seq_status remains 1 if an RPC Reply was never
 		 * received. NFSD can't know if the client processed