Message ID | 1639490018-128451-2-git-send-email-cuibixuan@linux.alibaba.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [-next] SUNRPC: Clean XPRT_CONGESTED of xprt->state when rpc task is killed | expand |
ping~ 在 2021/12/14 下午9:53, Bixuan Cui 写道: > When the values of tcp_max_slot_table_entries and > sunrpc.tcp_slot_table_entries are lower than the number of rpc tasks, > xprt_dynamic_alloc_slot() in xprt_alloc_slot() will return -EAGAIN, and > then set xprt->state to XPRT_CONGESTED: > xprt_retry_reserve > ->xprt_do_reserve > ->xprt_alloc_slot > ->xprt_dynamic_alloc_slot // return -EAGAIN and task->tk_rqstp is NULL > ->xprt_add_backlog // set_bit(XPRT_CONGESTED, &xprt->state); > > When rpc task is killed, XPRT_CONGESTED bit of xprt->state will not be > cleaned up and nfs hangs: > rpc_exit_task > ->xprt_release // if (req == NULL) is true, then XPRT_CONGESTED > // bit not clean > > Add xprt_wake_up_backlog(xprt) to clean XPRT_CONGESTED bit in > xprt_release(). > > Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com> > Signed-off-by: Xiaohui Pei <xiaoh.peixh@alibaba-inc.com> > --- > net/sunrpc/xprt.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c > index a02de2b..70d11ae 100644 > --- a/net/sunrpc/xprt.c > +++ b/net/sunrpc/xprt.c > @@ -1952,6 +1952,7 @@ void xprt_release(struct rpc_task *task) > if (req == NULL) { > if (task->tk_client) { > xprt = task->tk_xprt; > + xprt_wake_up_backlog(xprt); > xprt_release_write(xprt, task); > } > return;
On Mon, 2021-12-20 at 11:39 +0800, Bixuan Cui wrote: > ping~ > > 在 2021/12/14 下午9:53, Bixuan Cui 写道: > > When the values of tcp_max_slot_table_entries and > > sunrpc.tcp_slot_table_entries are lower than the number of rpc > > tasks, > > xprt_dynamic_alloc_slot() in xprt_alloc_slot() will return -EAGAIN, > > and > > then set xprt->state to XPRT_CONGESTED: > > xprt_retry_reserve > > ->xprt_do_reserve > > ->xprt_alloc_slot > > ->xprt_dynamic_alloc_slot // return -EAGAIN and task- > > >tk_rqstp is NULL > > ->xprt_add_backlog // set_bit(XPRT_CONGESTED, &xprt- > > >state); > > > > When rpc task is killed, XPRT_CONGESTED bit of xprt->state will not > > be > > cleaned up and nfs hangs: > > rpc_exit_task > > ->xprt_release // if (req == NULL) is true, then > > XPRT_CONGESTED > > // bit not clean > > > > Add xprt_wake_up_backlog(xprt) to clean XPRT_CONGESTED bit in > > xprt_release(). I'm not seeing how this explanation makes sense. If the task doesn't hold a slot, then freeing that task isn't going to clear the congestion caused by all the slots being in use.
On Wed, 2021-12-22 at 10:55 +0800, Bixuan Cui wrote: > > 在 2021/12/21 上午2:22, Trond Myklebust 写道: > > > On Mon, 2021-12-20 at 11:39 +0800, Bixuan Cui wrote: > > > > > ping~ > > > > > > 在 2021/12/14 下午9:53, Bixuan Cui 写道: > > > > > > > When the values of tcp_max_slot_table_entries and > > > > sunrpc.tcp_slot_table_entries are lower than the number of rpc > > > > tasks, > > > > xprt_dynamic_alloc_slot() in xprt_alloc_slot() will return - > > > > EAGAIN, > > > > and > > > > then set xprt->state to XPRT_CONGESTED: > > > > xprt_retry_reserve > > > > ->xprt_do_reserve > > > > ->xprt_alloc_slot > > > > ->xprt_dynamic_alloc_slot // return -EAGAIN and task- > > > > > > > > > tk_rqstp is NULL > > > > ->xprt_add_backlog // set_bit(XPRT_CONGESTED, &xprt- > > > > > > > > > state); > > > > When rpc task is killed, XPRT_CONGESTED bit of xprt->state will > > > > not > > > > be > > > > cleaned up and nfs hangs: > > > > rpc_exit_task > > > > ->xprt_release // if (req == NULL) is true, then > > > > XPRT_CONGESTED > > > > // bit not clean > > > > > > > > Add xprt_wake_up_backlog(xprt) to clean XPRT_CONGESTED bit in > > > > xprt_release(). > > I'm not seeing how this explanation makes sense. If the task > > doesn't > > hold a slot, then freeing that task isn't going to clear the > > congestion > > caused by all the slots being in use. > Hi, > If the rpc task is free, call xprt_release() : > void xprt_release(struct rpc_task *task) > { > if (req == NULL) { > if (task->tk_client) { > xprt = task->tk_xprt; > xprt_release_write(xprt, task); // 1. > release xprt > } > return; > } > .... > if (likely(!bc_prealloc(req))) > xprt->ops->free_slot(xprt, req); // 2. release slot > and call xprt_wake_up_backlog(xprt, req) to wakeup next task(clear > XPRT_CONGESTED bit if next is NULL) in xprt_free_slot() > else > xprt_free_bc_request(req); > } > I mean that in step 1, xprt was only released, but > xprt_wake_up_backlog was not called (I don’t know if it is necessary, > but xprt->state may still be XPRT_CONGESTED), which causes xprt to > hold up. I think it happens when the task that does not hold a slot > is the last released task,xprt_wake_up_backlog(clear XPRT_CONGESTED) > will not be executed. :-) > Thanks, > Bixuan Cui > > My point is that in that case 1, there is no slot to free, so there is no change to the congestion state. IOW: your patch is incorrect because it is trying to assign a slot in a case where there is no slot to assign.
在 2021/12/22 下午11:02, Trond Myklebust 写道: > On Wed, 2021-12-22 at 10:55 +0800, Bixuan Cui wrote: >> 在 2021/12/21 上午2:22, Trond Myklebust 写道: >> >>> On Mon, 2021-12-20 at 11:39 +0800, Bixuan Cui wrote: >>> >>>> ping~ >>>> >>>> 在 2021/12/14 下午9:53, Bixuan Cui 写道: >>>> >>>>> When the values of tcp_max_slot_table_entries and >>>>> sunrpc.tcp_slot_table_entries are lower than the number of rpc >>>>> tasks, >>>>> xprt_dynamic_alloc_slot() in xprt_alloc_slot() will return - >>>>> EAGAIN, >>>>> and >>>>> then set xprt->state to XPRT_CONGESTED: >>>>> xprt_retry_reserve >>>>> ->xprt_do_reserve >>>>> ->xprt_alloc_slot >>>>> ->xprt_dynamic_alloc_slot // return -EAGAIN and task- >>>>> >>>>>> tk_rqstp is NULL >>>>> ->xprt_add_backlog // set_bit(XPRT_CONGESTED, &xprt- >>>>> >>>>>> state); >>>>> When rpc task is killed, XPRT_CONGESTED bit of xprt->state will >>>>> not >>>>> be >>>>> cleaned up and nfs hangs: >>>>> rpc_exit_task >>>>> ->xprt_release // if (req == NULL) is true, then >>>>> XPRT_CONGESTED >>>>> // bit not clean >>>>> >>>>> Add xprt_wake_up_backlog(xprt) to clean XPRT_CONGESTED bit in >>>>> xprt_release(). >>> I'm not seeing how this explanation makes sense. If the task >>> doesn't >>> hold a slot, then freeing that task isn't going to clear the >>> congestion >>> caused by all the slots being in use. >> Hi, >> If the rpc task is free, call xprt_release() : >> void xprt_release(struct rpc_task *task) >> { >> if (req == NULL) { >> if (task->tk_client) { >> xprt = task->tk_xprt; >> xprt_release_write(xprt, task); // 1. >> release xprt >> } >> return; >> } >> .... >> if (likely(!bc_prealloc(req))) >> xprt->ops->free_slot(xprt, req); // 2. release slot >> and call xprt_wake_up_backlog(xprt, req) to wakeup next task(clear >> XPRT_CONGESTED bit if next is NULL) in xprt_free_slot() >> else >> xprt_free_bc_request(req); >> } >> I mean that in step 1, xprt was only released, but >> xprt_wake_up_backlog was not called (I don’t know if it is necessary, >> but xprt->state may still be XPRT_CONGESTED), which causes xprt to >> hold up. I think it happens when the task that does not hold a slot >> is the last released task,xprt_wake_up_backlog(clear XPRT_CONGESTED) >> will not be executed. :-) >> Thanks, >> Bixuan Cui >> >> > My point is that in that case 1, there is no slot to free, so there is > no change to the congestion state. > > IOW: your patch is incorrect because it is trying to assign a slot in a > case where there is no slot to assign. Hi, I found the correct way to fix it, that is, do not free the request when there are tasks in the xprt->backlog :-) And it has been fixed by e877a88d1f06 (SUNRPC in case of backlog, hand free slots directly to waiting task) commit e877a88d1f069edced4160792f42c2a8e2dba942 Author: NeilBrown <neilb@suse.de> Date: Mon May 17 09:59:10 2021 +1000 SUNRPC in case of backlog, hand free slots directly to waiting task If sunrpc.tcp_max_slot_table_entries is small and there are tasks on the backlog queue, then when a request completes it is freed and the first task on the queue is woken. The expectation is that it will wake and claim that request. However if it was a sync task and the waiting process was killed at just that moment, it will wake and NOT claim the request. Thanks for your advice. Thanks, Bixuan Cui
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index a02de2b..70d11ae 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1952,6 +1952,7 @@ void xprt_release(struct rpc_task *task) if (req == NULL) { if (task->tk_client) { xprt = task->tk_xprt; + xprt_wake_up_backlog(xprt); xprt_release_write(xprt, task); } return;