diff mbox series

[-next] SUNRPC: Clean XPRT_CONGESTED of xprt->state when rpc task is killed

Message ID 1639490018-128451-2-git-send-email-cuibixuan@linux.alibaba.com (mailing list archive)
State New, archived
Headers show
Series [-next] SUNRPC: Clean XPRT_CONGESTED of xprt->state when rpc task is killed | expand

Commit Message

Bixuan Cui Dec. 14, 2021, 1:53 p.m. UTC
When the values of tcp_max_slot_table_entries and
sunrpc.tcp_slot_table_entries are lower than the number of rpc tasks,
xprt_dynamic_alloc_slot() in xprt_alloc_slot() will return -EAGAIN, and
then set xprt->state to XPRT_CONGESTED:
  xprt_retry_reserve
    ->xprt_do_reserve
      ->xprt_alloc_slot
        ->xprt_dynamic_alloc_slot // return -EAGAIN and task->tk_rqstp is NULL
          ->xprt_add_backlog // set_bit(XPRT_CONGESTED, &xprt->state);

When rpc task is killed, XPRT_CONGESTED bit of xprt->state will not be
cleaned up and nfs hangs:
  rpc_exit_task
    ->xprt_release // if (req == NULL) is true, then XPRT_CONGESTED
		   // bit not clean

Add xprt_wake_up_backlog(xprt) to clean XPRT_CONGESTED bit in
xprt_release().

Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com>
Signed-off-by: Xiaohui Pei <xiaoh.peixh@alibaba-inc.com>
---
 net/sunrpc/xprt.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Bixuan Cui Dec. 20, 2021, 3:39 a.m. UTC | #1
ping~

在 2021/12/14 下午9:53, Bixuan Cui 写道:
> When the values of tcp_max_slot_table_entries and
> sunrpc.tcp_slot_table_entries are lower than the number of rpc tasks,
> xprt_dynamic_alloc_slot() in xprt_alloc_slot() will return -EAGAIN, and
> then set xprt->state to XPRT_CONGESTED:
>    xprt_retry_reserve
>      ->xprt_do_reserve
>        ->xprt_alloc_slot
>          ->xprt_dynamic_alloc_slot // return -EAGAIN and task->tk_rqstp is NULL
>            ->xprt_add_backlog // set_bit(XPRT_CONGESTED, &xprt->state);
>
> When rpc task is killed, XPRT_CONGESTED bit of xprt->state will not be
> cleaned up and nfs hangs:
>    rpc_exit_task
>      ->xprt_release // if (req == NULL) is true, then XPRT_CONGESTED
> 		   // bit not clean
>
> Add xprt_wake_up_backlog(xprt) to clean XPRT_CONGESTED bit in
> xprt_release().
>
> Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com>
> Signed-off-by: Xiaohui Pei <xiaoh.peixh@alibaba-inc.com>
> ---
>   net/sunrpc/xprt.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index a02de2b..70d11ae 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -1952,6 +1952,7 @@ void xprt_release(struct rpc_task *task)
>   	if (req == NULL) {
>   		if (task->tk_client) {
>   			xprt = task->tk_xprt;
> +			xprt_wake_up_backlog(xprt);
>   			xprt_release_write(xprt, task);
>   		}
>   		return;
Trond Myklebust Dec. 20, 2021, 6:22 p.m. UTC | #2
On Mon, 2021-12-20 at 11:39 +0800, Bixuan Cui wrote:
> ping~
> 
> 在 2021/12/14 下午9:53, Bixuan Cui 写道:
> > When the values of tcp_max_slot_table_entries and
> > sunrpc.tcp_slot_table_entries are lower than the number of rpc
> > tasks,
> > xprt_dynamic_alloc_slot() in xprt_alloc_slot() will return -EAGAIN,
> > and
> > then set xprt->state to XPRT_CONGESTED:
> >    xprt_retry_reserve
> >      ->xprt_do_reserve
> >        ->xprt_alloc_slot
> >          ->xprt_dynamic_alloc_slot // return -EAGAIN and task-
> > >tk_rqstp is NULL
> >            ->xprt_add_backlog // set_bit(XPRT_CONGESTED, &xprt-
> > >state);
> > 
> > When rpc task is killed, XPRT_CONGESTED bit of xprt->state will not
> > be
> > cleaned up and nfs hangs:
> >    rpc_exit_task
> >      ->xprt_release // if (req == NULL) is true, then
> > XPRT_CONGESTED
> >                    // bit not clean
> > 
> > Add xprt_wake_up_backlog(xprt) to clean XPRT_CONGESTED bit in
> > xprt_release().

I'm not seeing how this explanation makes sense. If the task doesn't
hold a slot, then freeing that task isn't going to clear the congestion
caused by all the slots being in use.
Trond Myklebust Dec. 22, 2021, 3:02 p.m. UTC | #3
On Wed, 2021-12-22 at 10:55 +0800, Bixuan Cui wrote:
> 
> 在 2021/12/21 上午2:22, Trond Myklebust 写道:
>  
> > On Mon, 2021-12-20 at 11:39 +0800, Bixuan Cui wrote:
> >  
> > > ping~
> > > 
> > > 在 2021/12/14 下午9:53, Bixuan Cui 写道:
> > >  
> > > > When the values of tcp_max_slot_table_entries and
> > > > sunrpc.tcp_slot_table_entries are lower than the number of rpc
> > > > tasks,
> > > > xprt_dynamic_alloc_slot() in xprt_alloc_slot() will return -
> > > > EAGAIN,
> > > > and
> > > > then set xprt->state to XPRT_CONGESTED:
> > > >    xprt_retry_reserve
> > > >      ->xprt_do_reserve
> > > >        ->xprt_alloc_slot
> > > >          ->xprt_dynamic_alloc_slot // return -EAGAIN and task-
> > > >  
> > > > > tk_rqstp is NULL
> > > >            ->xprt_add_backlog // set_bit(XPRT_CONGESTED, &xprt-
> > > >  
> > > > > state);
> > > > When rpc task is killed, XPRT_CONGESTED bit of xprt->state will
> > > > not
> > > > be
> > > > cleaned up and nfs hangs:
> > > >    rpc_exit_task
> > > >      ->xprt_release // if (req == NULL) is true, then
> > > > XPRT_CONGESTED
> > > >                    // bit not clean
> > > > 
> > > > Add xprt_wake_up_backlog(xprt) to clean XPRT_CONGESTED bit in
> > > > xprt_release().
> > I'm not seeing how this explanation makes sense. If the task
> > doesn't
> > hold a slot, then freeing that task isn't going to clear the
> > congestion
> > caused by all the slots being in use.
> Hi,
> If the rpc task is free, call xprt_release() :
> void xprt_release(struct rpc_task *task)
>  {
>       if (req == NULL) {
>                   if (task->tk_client) {
>                           xprt = task->tk_xprt;
>                           xprt_release_write(xprt, task); // 1.
> release xprt
>                   }
>                   return;
>           }
>       ....
>       if (likely(!bc_prealloc(req)))
>                   xprt->ops->free_slot(xprt, req); // 2. release slot
> and call xprt_wake_up_backlog(xprt, req) to wakeup next task(clear
> XPRT_CONGESTED bit if next is NULL) in xprt_free_slot()
>           else
>                   xprt_free_bc_request(req);
>  }
>  I mean that in step 1, xprt was only released, but
> xprt_wake_up_backlog was not called (I don’t know if it is necessary,
> but xprt->state may still be XPRT_CONGESTED), which causes xprt to
> hold up. I think it happens when the task that does not hold a slot
> is the last released task,xprt_wake_up_backlog(clear XPRT_CONGESTED)
> will not be executed. :-)
> Thanks,
> Bixuan Cui
> 
>  

My point is that in that case 1, there is no slot to free, so there is
no change to the congestion state.

IOW: your patch is incorrect because it is trying to assign a slot in a
case where there is no slot to assign.
Bixuan Cui Dec. 31, 2021, 2 a.m. UTC | #4
在 2021/12/22 下午11:02, Trond Myklebust 写道:
> On Wed, 2021-12-22 at 10:55 +0800, Bixuan Cui wrote:
>> 在 2021/12/21 上午2:22, Trond Myklebust 写道:
>>   
>>> On Mon, 2021-12-20 at 11:39 +0800, Bixuan Cui wrote:
>>>   
>>>> ping~
>>>>
>>>> 在 2021/12/14 下午9:53, Bixuan Cui 写道:
>>>>   
>>>>> When the values of tcp_max_slot_table_entries and
>>>>> sunrpc.tcp_slot_table_entries are lower than the number of rpc
>>>>> tasks,
>>>>> xprt_dynamic_alloc_slot() in xprt_alloc_slot() will return -
>>>>> EAGAIN,
>>>>> and
>>>>> then set xprt->state to XPRT_CONGESTED:
>>>>>     xprt_retry_reserve
>>>>>       ->xprt_do_reserve
>>>>>         ->xprt_alloc_slot
>>>>>           ->xprt_dynamic_alloc_slot // return -EAGAIN and task-
>>>>>   
>>>>>> tk_rqstp is NULL
>>>>>             ->xprt_add_backlog // set_bit(XPRT_CONGESTED, &xprt-
>>>>>   
>>>>>> state);
>>>>> When rpc task is killed, XPRT_CONGESTED bit of xprt->state will
>>>>> not
>>>>> be
>>>>> cleaned up and nfs hangs:
>>>>>     rpc_exit_task
>>>>>       ->xprt_release // if (req == NULL) is true, then
>>>>> XPRT_CONGESTED
>>>>>                     // bit not clean
>>>>>
>>>>> Add xprt_wake_up_backlog(xprt) to clean XPRT_CONGESTED bit in
>>>>> xprt_release().
>>> I'm not seeing how this explanation makes sense. If the task
>>> doesn't
>>> hold a slot, then freeing that task isn't going to clear the
>>> congestion
>>> caused by all the slots being in use.
>> Hi,
>> If the rpc task is free, call xprt_release() :
>> void xprt_release(struct rpc_task *task)
>>   {
>>        if (req == NULL) {
>>                    if (task->tk_client) {
>>                            xprt = task->tk_xprt;
>>                            xprt_release_write(xprt, task); // 1.
>> release xprt
>>                    }
>>                    return;
>>            }
>>        ....
>>        if (likely(!bc_prealloc(req)))
>>                    xprt->ops->free_slot(xprt, req); // 2. release slot
>> and call xprt_wake_up_backlog(xprt, req) to wakeup next task(clear
>> XPRT_CONGESTED bit if next is NULL) in xprt_free_slot()
>>            else
>>                    xprt_free_bc_request(req);
>>   }
>>   I mean that in step 1, xprt was only released, but
>> xprt_wake_up_backlog was not called (I don’t know if it is necessary,
>> but xprt->state may still be XPRT_CONGESTED), which causes xprt to
>> hold up. I think it happens when the task that does not hold a slot
>> is the last released task,xprt_wake_up_backlog(clear XPRT_CONGESTED)
>> will not be executed. :-)
>> Thanks,
>> Bixuan Cui
>>
>>   
> My point is that in that case 1, there is no slot to free, so there is
> no change to the congestion state.
>
> IOW: your patch is incorrect because it is trying to assign a slot in a
> case where there is no slot to assign.
Hi,
I found the correct way to fix it, that is, do not free the request when 
there are tasks in the xprt->backlog :-)
And it has been fixed by e877a88d1f06 (SUNRPC in case of backlog, hand 
free slots directly to waiting task)
     commit e877a88d1f069edced4160792f42c2a8e2dba942
     Author: NeilBrown <neilb@suse.de>
     Date:   Mon May 17 09:59:10 2021 +1000

     SUNRPC in case of backlog, hand free slots directly to waiting task

     If sunrpc.tcp_max_slot_table_entries is small and there are tasks
     on the backlog queue, then when a request completes it is freed and the
     first task on the queue is woken.  The expectation is that it will wake
     and claim that request.  However if it was a sync task and the waiting
     process was killed at just that moment, it will wake and NOT claim the
     request.
Thanks for your advice.

Thanks,
Bixuan Cui
diff mbox series

Patch

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index a02de2b..70d11ae 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1952,6 +1952,7 @@  void xprt_release(struct rpc_task *task)
 	if (req == NULL) {
 		if (task->tk_client) {
 			xprt = task->tk_xprt;
+			xprt_wake_up_backlog(xprt);
 			xprt_release_write(xprt, task);
 		}
 		return;