Message ID | 20230731064839.7729-7-neilb@suse.de (mailing list archive) |
---|---|
State | New, archived |
Series | SUNRPC: various thread management improvements |
On Mon, Jul 31, 2023 at 04:48:33PM +1000, NeilBrown wrote: > svc_get_next_xprt() does a lot more than just get an xprt. It also > decides if it needs to sleep, depending not only on the availability of > xprts, but also on the need to exit or handle external work > (SP_TASK_PENDING). > > So rename it to svc_rqst_wait_and_dequeue_work(), don't return the xprt > (which can easily be found in rqstp->rq_xprt), and restructure to make a > clear separation between waiting and dequeueing. For me, the most valuable part of this patch is the last part here: refactoring the dequeue and the wait, and deduplicating the dequeue. > All the scheduling-related code like try_to_freeze() and > kthread_should_stop() is moved into svc_rqst_wait_and_dequeue_work(). > > Rather than calling svc_xprt_dequeue() twice (before and after deciding > to wait), it now calls rqst_should_sleep() twice. If the first fails, > we skip all the waiting code completely. In the waiting code we call > again after setting the task state in case we missed a wake-up. > > We now only have one call to try_to_freeze() and one call to > svc_xprt_dequeue(). We still have two calls to kthread_should_stop() - > one in rqst_should_sleep() to avoid sleeping, and one afterwards to > avoid dequeueing any work (it previously came after dequeueing which > doesn't seem right). 
> > Signed-off-by: NeilBrown <neilb@suse.de> > --- > net/sunrpc/svc_xprt.c | 62 +++++++++++++++++++++---------------------- > 1 file changed, 31 insertions(+), 31 deletions(-) > > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c > index 380fb3caea4c..67f2b34cb8e4 100644 > --- a/net/sunrpc/svc_xprt.c > +++ b/net/sunrpc/svc_xprt.c > @@ -722,47 +722,51 @@ rqst_should_sleep(struct svc_rqst *rqstp) > return true; > } > > -static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp) > +static void svc_rqst_wait_and_dequeue_work(struct svc_rqst *rqstp) It would be simpler to follow if you renamed this function once (here), and changed directly from returning struct svc_xprt to returning bool. > { > struct svc_pool *pool = rqstp->rq_pool; > + bool slept = false; > > /* rq_xprt should be clear on entry */ > WARN_ON_ONCE(rqstp->rq_xprt); > > - rqstp->rq_xprt = svc_xprt_dequeue(pool); > - if (rqstp->rq_xprt) { > - trace_svc_pool_polled(rqstp); > - goto out_found; > + if (rqst_should_sleep(rqstp)) { > + set_current_state(TASK_IDLE); > + smp_mb__before_atomic(); > + clear_bit(SP_CONGESTED, &pool->sp_flags); > + clear_bit(RQ_BUSY, &rqstp->rq_flags); > + smp_mb__after_atomic(); > + > + /* Need to test again after setting task state */ This comment isn't illuminating. It needs to explain the "need to test again". > + if (likely(rqst_should_sleep(rqstp))) { Is likely() still needed here? > + schedule(); > + slept = true; > + } else { > + __set_current_state(TASK_RUNNING); > + cond_resched(); This makes me happy. Only call cond_resched() if we didn't sleep. 
> + } > + set_bit(RQ_BUSY, &rqstp->rq_flags); > + smp_mb__after_atomic(); > } > - > - set_current_state(TASK_IDLE); > - smp_mb__before_atomic(); > - clear_bit(SP_CONGESTED, &pool->sp_flags); > - clear_bit(RQ_BUSY, &rqstp->rq_flags); > - smp_mb__after_atomic(); > - > - if (likely(rqst_should_sleep(rqstp))) > - schedule(); > - else > - __set_current_state(TASK_RUNNING); > - > try_to_freeze(); > > - set_bit(RQ_BUSY, &rqstp->rq_flags); > - smp_mb__after_atomic(); > + if (kthread_should_stop()) > + return; > + > clear_bit(SP_TASK_PENDING, &pool->sp_flags); > rqstp->rq_xprt = svc_xprt_dequeue(pool); > if (rqstp->rq_xprt) { > - trace_svc_pool_awoken(rqstp); > + if (slept) > + trace_svc_pool_awoken(rqstp); > + else > + trace_svc_pool_polled(rqstp); Again, it would perhaps be better if we rearranged this code first, and then added tracepoints. This is ... well, ugly. > goto out_found; > } > > - if (kthread_should_stop()) > - return NULL; > - percpu_counter_inc(&pool->sp_threads_no_work); > - return NULL; > + if (slept) > + percpu_counter_inc(&pool->sp_threads_no_work); > + return; > out_found: > - clear_bit(SP_TASK_PENDING, &pool->sp_flags); > /* Normally we will wait up to 5 seconds for any required > * cache information to be provided. > */ > @@ -770,7 +774,6 @@ static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp) > rqstp->rq_chandle.thread_wait = 5*HZ; > else > rqstp->rq_chandle.thread_wait = 1*HZ; > - return rqstp->rq_xprt; > } > > static void svc_add_new_temp_xprt(struct svc_serv *serv, struct svc_xprt *newxpt) > @@ -854,12 +857,9 @@ void svc_recv(struct svc_rqst *rqstp) > if (!svc_alloc_arg(rqstp)) > goto out; > > - try_to_freeze(); > - cond_resched(); > - if (kthread_should_stop()) > - goto out; > + svc_rqst_wait_and_dequeue_work(rqstp); > > - xprt = svc_get_next_xprt(rqstp); > + xprt = rqstp->rq_xprt; > if (!xprt) > goto out; > > -- > 2.40.1 >
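For readers skimming the archive, the idiom Chuck is asking about ("Need to test again after setting task state") is the kernel's way of avoiding a lost wake-up: once the task has marked itself as sleeping, the condition must be re-checked before actually scheduling away, because a wake-up may have arrived in between. The userspace analogue is the standard condition-variable loop, sketched below with purely illustrative names (this is not SUNRPC code):

```c
#include <pthread.h>
#include <stdbool.h>

/* Userspace analogue of "set task state, re-check, then sleep".
 * All names here are illustrative stand-ins, not kernel APIs. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool work_available = false;

/* A producer publishes work and then wakes the waiter. */
static void *waker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	work_available = true;		/* publish work first... */
	pthread_cond_signal(&cond);	/* ...then wake the waiter */
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* The waiter re-checks the predicate with the lock held, so a wake-up
 * cannot slip in between the check and the sleep.  Returns whether we
 * actually slept, mirroring the patch's "slept" flag. */
static bool wait_for_work(void)
{
	bool slept = false;

	pthread_mutex_lock(&lock);
	while (!work_available) {	/* re-check after every wake-up */
		pthread_cond_wait(&cond, &lock);
		slept = true;
	}
	pthread_mutex_unlock(&lock);
	return slept;
}
```

Here pthread_cond_wait() atomically releases the lock and sleeps, which plays the role that set_current_state() plus the second rqst_should_sleep() test play in the patch.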
On Mon, Jul 31, 2023 at 07:16:01PM -0400, Chuck Lever wrote: > On Mon, Jul 31, 2023 at 04:48:33PM +1000, NeilBrown wrote: > > svc_get_next_xprt() does a lot more than just get an xprt. It also > > decides if it needs to sleep, depending not only on the availability of > > xprts, but also on the need to exit or handle external work > > (SP_TASK_PENDING). > > > > So rename it to svc_rqst_wait_and_dequeue_work(), don't return the xprt > > (which can easily be found in rqstp->rq_xprt), and restructure to make a > > clear separation between waiting and dequeueing. > > For me, the most valuable part of this patch is the last part here: > refactoring the dequeue and the wait, and deduplicating the dequeue. > > > > All the scheduling-related code like try_to_freeze() and > > kthread_should_stop() is moved into svc_rqst_wait_and_dequeue_work(). > > > > Rather than calling svc_xprt_dequeue() twice (before and after deciding > > to wait), it now calls rqst_should_sleep() twice. If the first fails, > > we skip all the waiting code completely. In the waiting code we call > > again after setting the task state in case we missed a wake-up. > > > > We now only have one call to try_to_freeze() and one call to > > svc_xprt_dequeue(). We still have two calls to kthread_should_stop() - > > one in rqst_should_sleep() to avoid sleeping, and one afterwards to > > avoid dequeueing any work (it previously came after dequeueing which > > doesn't seem right). 
> > > > Signed-off-by: NeilBrown <neilb@suse.de> > > --- > > net/sunrpc/svc_xprt.c | 62 +++++++++++++++++++++---------------------- > > 1 file changed, 31 insertions(+), 31 deletions(-) > > > > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c > > index 380fb3caea4c..67f2b34cb8e4 100644 > > --- a/net/sunrpc/svc_xprt.c > > +++ b/net/sunrpc/svc_xprt.c > > @@ -722,47 +722,51 @@ rqst_should_sleep(struct svc_rqst *rqstp) > > return true; > > } > > > > -static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp) > > +static void svc_rqst_wait_and_dequeue_work(struct svc_rqst *rqstp) > > It would be simpler to follow if you renamed this function once > (here), and changed directly from returning struct svc_xprt to > returning bool. > > > > { > > struct svc_pool *pool = rqstp->rq_pool; > > + bool slept = false; > > > > /* rq_xprt should be clear on entry */ > > WARN_ON_ONCE(rqstp->rq_xprt); > > > > - rqstp->rq_xprt = svc_xprt_dequeue(pool); > > - if (rqstp->rq_xprt) { > > - trace_svc_pool_polled(rqstp); > > - goto out_found; > > + if (rqst_should_sleep(rqstp)) { > > + set_current_state(TASK_IDLE); > > + smp_mb__before_atomic(); > > + clear_bit(SP_CONGESTED, &pool->sp_flags); > > + clear_bit(RQ_BUSY, &rqstp->rq_flags); > > + smp_mb__after_atomic(); > > + > > + /* Need to test again after setting task state */ > > This comment isn't illuminating. It needs to explain the "need to > test again". > > > > + if (likely(rqst_should_sleep(rqstp))) { > > Is likely() still needed here? > > > > + schedule(); > > + slept = true; > > + } else { > > + __set_current_state(TASK_RUNNING); > > + cond_resched(); > > This makes me happy. Only call cond_resched() if we didn't sleep. 
> > > > + } > > + set_bit(RQ_BUSY, &rqstp->rq_flags); > > + smp_mb__after_atomic(); > > } > > - > > - set_current_state(TASK_IDLE); > > - smp_mb__before_atomic(); > > - clear_bit(SP_CONGESTED, &pool->sp_flags); > > - clear_bit(RQ_BUSY, &rqstp->rq_flags); > > - smp_mb__after_atomic(); > > - > > - if (likely(rqst_should_sleep(rqstp))) > > - schedule(); > > - else > > - __set_current_state(TASK_RUNNING); > > - > > try_to_freeze(); > > > > - set_bit(RQ_BUSY, &rqstp->rq_flags); > > - smp_mb__after_atomic(); > > + if (kthread_should_stop()) > > + return; > > + > > clear_bit(SP_TASK_PENDING, &pool->sp_flags); > > rqstp->rq_xprt = svc_xprt_dequeue(pool); > > if (rqstp->rq_xprt) { > > - trace_svc_pool_awoken(rqstp); > > + if (slept) > > + trace_svc_pool_awoken(rqstp); > > + else > > + trace_svc_pool_polled(rqstp); > > Again, it would perhaps be better if we rearranged this code first, > and then added tracepoints. This is ... well, ugly. I've dropped the three tracepoint patches and pushed out the changes to topic-sunrpc-thread-scheduling . We can circle back to adding tracepoints once this code has settled. > > goto out_found; > > } > > > > - if (kthread_should_stop()) > > - return NULL; > > - percpu_counter_inc(&pool->sp_threads_no_work); > > - return NULL; > > + if (slept) > > + percpu_counter_inc(&pool->sp_threads_no_work); > > + return; > > out_found: > > - clear_bit(SP_TASK_PENDING, &pool->sp_flags); > > /* Normally we will wait up to 5 seconds for any required > > * cache information to be provided. 
> > */ > > @@ -770,7 +774,6 @@ static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp) > > rqstp->rq_chandle.thread_wait = 5*HZ; > > else > > rqstp->rq_chandle.thread_wait = 1*HZ; > > - return rqstp->rq_xprt; > > } > > > > static void svc_add_new_temp_xprt(struct svc_serv *serv, struct svc_xprt *newxpt) > > @@ -854,12 +857,9 @@ void svc_recv(struct svc_rqst *rqstp) > > if (!svc_alloc_arg(rqstp)) > > goto out; > > > > - try_to_freeze(); > > - cond_resched(); > > - if (kthread_should_stop()) > > - goto out; > > + svc_rqst_wait_and_dequeue_work(rqstp); > > > > - xprt = svc_get_next_xprt(rqstp); > > + xprt = rqstp->rq_xprt; > > if (!xprt) > > goto out; > > > > -- > > 2.40.1 > > > > -- > Chuck Lever
On Tue, 01 Aug 2023, Chuck Lever wrote: > On Mon, Jul 31, 2023 at 04:48:33PM +1000, NeilBrown wrote: > > svc_get_next_xprt() does a lot more than just get an xprt. It also > > decides if it needs to sleep, depending not only on the availability of > > xprts, but also on the need to exit or handle external work > > (SP_TASK_PENDING). > > > > So rename it to svc_rqst_wait_and_dequeue_work(), don't return the xprt > > (which can easily be found in rqstp->rq_xprt), and restructure to make a > > clear separation between waiting and dequeueing. > > For me, the most valuable part of this patch is the last part here: > refactoring the dequeue and the wait, and deduplicating the dequeue. > > > > All the scheduling-related code like try_to_freeze() and > > kthread_should_stop() is moved into svc_rqst_wait_and_dequeue_work(). > > > > Rather than calling svc_xprt_dequeue() twice (before and after deciding > > to wait), it now calls rqst_should_sleep() twice. If the first fails, > > we skip all the waiting code completely. In the waiting code we call > > again after setting the task state in case we missed a wake-up. > > > > We now only have one call to try_to_freeze() and one call to > > svc_xprt_dequeue(). We still have two calls to kthread_should_stop() - > > one in rqst_should_sleep() to avoid sleeping, and one afterwards to > > avoid dequeueing any work (it previously came after dequeueing which > > doesn't seem right). 
> > > > Signed-off-by: NeilBrown <neilb@suse.de> > > --- > > net/sunrpc/svc_xprt.c | 62 +++++++++++++++++++++---------------------- > > 1 file changed, 31 insertions(+), 31 deletions(-) > > > > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c > > index 380fb3caea4c..67f2b34cb8e4 100644 > > --- a/net/sunrpc/svc_xprt.c > > +++ b/net/sunrpc/svc_xprt.c > > @@ -722,47 +722,51 @@ rqst_should_sleep(struct svc_rqst *rqstp) > > return true; > > } > > > > -static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp) > > +static void svc_rqst_wait_and_dequeue_work(struct svc_rqst *rqstp) > > It would be simpler to follow if you renamed this function once > (here), and changed directly from returning struct svc_xprt to > returning bool. It isn't clear to me why it would be simpler, or exactly what you are suggesting. Should I just squash SUNRPC: rename and refactor svc_get_next_xprt(). and SUNRPC: move task-dequeueing code into svc_recv() together? I can see that it would make sense to move SUNRPC: move all of xprt handling into svc_xprt_handle() earlier. > > > > { > > struct svc_pool *pool = rqstp->rq_pool; > > + bool slept = false; > > > > /* rq_xprt should be clear on entry */ > > WARN_ON_ONCE(rqstp->rq_xprt); > > > > - rqstp->rq_xprt = svc_xprt_dequeue(pool); > > - if (rqstp->rq_xprt) { > > - trace_svc_pool_polled(rqstp); > > - goto out_found; > > + if (rqst_should_sleep(rqstp)) { > > + set_current_state(TASK_IDLE); > > + smp_mb__before_atomic(); > > + clear_bit(SP_CONGESTED, &pool->sp_flags); > > + clear_bit(RQ_BUSY, &rqstp->rq_flags); > > + smp_mb__after_atomic(); > > + > > + /* Need to test again after setting task state */ > > This comment isn't illuminating. It needs to explain the "need to > test again". "after setting task state" was meant to be the explanation, but I guess more words wouldn't hurt. > > > > + if (likely(rqst_should_sleep(rqstp))) { > > Is likely() still needed here? Is it ever needed? Let's drop it.
Thanks, NeilBrown > > > > + schedule(); > > + slept = true; > > + } else { > > + __set_current_state(TASK_RUNNING); > > + cond_resched(); > > This makes me happy. Only call cond_resched() if we didn't sleep. > > > > + } > > + set_bit(RQ_BUSY, &rqstp->rq_flags); > > + smp_mb__after_atomic(); > > } > > - > > - set_current_state(TASK_IDLE); > > - smp_mb__before_atomic(); > > - clear_bit(SP_CONGESTED, &pool->sp_flags); > > - clear_bit(RQ_BUSY, &rqstp->rq_flags); > > - smp_mb__after_atomic(); > > - > > - if (likely(rqst_should_sleep(rqstp))) > > - schedule(); > > - else > > - __set_current_state(TASK_RUNNING); > > - > > try_to_freeze(); > > > > - set_bit(RQ_BUSY, &rqstp->rq_flags); > > - smp_mb__after_atomic(); > > + if (kthread_should_stop()) > > + return; > > + > > clear_bit(SP_TASK_PENDING, &pool->sp_flags); > > rqstp->rq_xprt = svc_xprt_dequeue(pool); > > if (rqstp->rq_xprt) { > > - trace_svc_pool_awoken(rqstp); > > + if (slept) > > + trace_svc_pool_awoken(rqstp); > > + else > > + trace_svc_pool_polled(rqstp); > > Again, it would perhaps be better if we rearranged this code first, > and then added tracepoints. This is ... well, ugly. > > > > goto out_found; > > } > > > > - if (kthread_should_stop()) > > - return NULL; > > - percpu_counter_inc(&pool->sp_threads_no_work); > > - return NULL; > > + if (slept) > > + percpu_counter_inc(&pool->sp_threads_no_work); > > + return; > > out_found: > > - clear_bit(SP_TASK_PENDING, &pool->sp_flags); > > /* Normally we will wait up to 5 seconds for any required > > * cache information to be provided. 
> > */ > > @@ -770,7 +774,6 @@ static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp) > > rqstp->rq_chandle.thread_wait = 5*HZ; > > else > > rqstp->rq_chandle.thread_wait = 1*HZ; > > - return rqstp->rq_xprt; > > } > > > > static void svc_add_new_temp_xprt(struct svc_serv *serv, struct svc_xprt *newxpt) > > @@ -854,12 +857,9 @@ void svc_recv(struct svc_rqst *rqstp) > > if (!svc_alloc_arg(rqstp)) > > goto out; > > > > - try_to_freeze(); > > - cond_resched(); > > - if (kthread_should_stop()) > > - goto out; > > + svc_rqst_wait_and_dequeue_work(rqstp); > > > > - xprt = svc_get_next_xprt(rqstp); > > + xprt = rqstp->rq_xprt; > > if (!xprt) > > goto out; > > > > -- > > 2.40.1 > > > > -- > Chuck Lever >
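A side note on the likely() exchange above: in the kernel, likely() and unlikely() are thin wrappers around GCC's __builtin_expect() and only hint at branch layout; they never change the value of the condition, so dropping the annotation (as agreed here) is always safe for correctness. A stand-alone sketch of the macros:

```c
/* Userspace copies of the kernel's branch-prediction hints.  The !!
 * normalizes any truthy value to 1 so the expected value compares
 * cleanly; the macro's result is always just the (normalized)
 * condition itself. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Illustrative predicate standing in for rqst_should_sleep();
 * not SUNRPC code. */
static int should_sleep(int work_pending)
{
	return !work_pending;
}
```

Both gcc and clang honor __builtin_expect, but on modern branch predictors the measurable benefit is usually small, which is consistent with the decision to drop it here.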
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 380fb3caea4c..67f2b34cb8e4 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -722,47 +722,51 @@ rqst_should_sleep(struct svc_rqst *rqstp)
 	return true;
 }
 
-static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp)
+static void svc_rqst_wait_and_dequeue_work(struct svc_rqst *rqstp)
 {
 	struct svc_pool *pool = rqstp->rq_pool;
+	bool slept = false;
 
 	/* rq_xprt should be clear on entry */
 	WARN_ON_ONCE(rqstp->rq_xprt);
 
-	rqstp->rq_xprt = svc_xprt_dequeue(pool);
-	if (rqstp->rq_xprt) {
-		trace_svc_pool_polled(rqstp);
-		goto out_found;
+	if (rqst_should_sleep(rqstp)) {
+		set_current_state(TASK_IDLE);
+		smp_mb__before_atomic();
+		clear_bit(SP_CONGESTED, &pool->sp_flags);
+		clear_bit(RQ_BUSY, &rqstp->rq_flags);
+		smp_mb__after_atomic();
+
+		/* Need to test again after setting task state */
+		if (likely(rqst_should_sleep(rqstp))) {
+			schedule();
+			slept = true;
+		} else {
+			__set_current_state(TASK_RUNNING);
+			cond_resched();
+		}
+		set_bit(RQ_BUSY, &rqstp->rq_flags);
+		smp_mb__after_atomic();
 	}
-
-	set_current_state(TASK_IDLE);
-	smp_mb__before_atomic();
-	clear_bit(SP_CONGESTED, &pool->sp_flags);
-	clear_bit(RQ_BUSY, &rqstp->rq_flags);
-	smp_mb__after_atomic();
-
-	if (likely(rqst_should_sleep(rqstp)))
-		schedule();
-	else
-		__set_current_state(TASK_RUNNING);
-
 	try_to_freeze();
 
-	set_bit(RQ_BUSY, &rqstp->rq_flags);
-	smp_mb__after_atomic();
+	if (kthread_should_stop())
+		return;
+
 	clear_bit(SP_TASK_PENDING, &pool->sp_flags);
 	rqstp->rq_xprt = svc_xprt_dequeue(pool);
 	if (rqstp->rq_xprt) {
-		trace_svc_pool_awoken(rqstp);
+		if (slept)
+			trace_svc_pool_awoken(rqstp);
+		else
+			trace_svc_pool_polled(rqstp);
 		goto out_found;
 	}
 
-	if (kthread_should_stop())
-		return NULL;
-	percpu_counter_inc(&pool->sp_threads_no_work);
-	return NULL;
+	if (slept)
+		percpu_counter_inc(&pool->sp_threads_no_work);
+	return;
 out_found:
-	clear_bit(SP_TASK_PENDING, &pool->sp_flags);
 	/* Normally we will wait up to 5 seconds for any required
 	 * cache information to be provided.
 	 */
@@ -770,7 +774,6 @@ static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp)
 		rqstp->rq_chandle.thread_wait = 5*HZ;
 	else
 		rqstp->rq_chandle.thread_wait = 1*HZ;
-	return rqstp->rq_xprt;
 }
 
 static void svc_add_new_temp_xprt(struct svc_serv *serv, struct svc_xprt *newxpt)
@@ -854,12 +857,9 @@ void svc_recv(struct svc_rqst *rqstp)
 	if (!svc_alloc_arg(rqstp))
 		goto out;
 
-	try_to_freeze();
-	cond_resched();
-	if (kthread_should_stop())
-		goto out;
+	svc_rqst_wait_and_dequeue_work(rqstp);
 
-	xprt = svc_get_next_xprt(rqstp);
+	xprt = rqstp->rq_xprt;
 	if (!xprt)
 		goto out;
svc_get_next_xprt() does a lot more than just get an xprt. It also decides if it needs to sleep, depending not only on the availability of xprts, but also on the need to exit or handle external work (SP_TASK_PENDING).

So rename it to svc_rqst_wait_and_dequeue_work(), don't return the xprt (which can easily be found in rqstp->rq_xprt), and restructure to make a clear separation between waiting and dequeueing.

All the scheduling-related code like try_to_freeze() and kthread_should_stop() is moved into svc_rqst_wait_and_dequeue_work().

Rather than calling svc_xprt_dequeue() twice (before and after deciding to wait), it now calls rqst_should_sleep() twice. If the first fails, we skip all the waiting code completely. In the waiting code we call again after setting the task state in case we missed a wake-up.

We now only have one call to try_to_freeze() and one call to svc_xprt_dequeue(). We still have two calls to kthread_should_stop() - one in rqst_should_sleep() to avoid sleeping, and one afterwards to avoid dequeueing any work (it previously came after dequeueing which doesn't seem right).

Signed-off-by: NeilBrown <neilb@suse.de>
---
 net/sunrpc/svc_xprt.c | 62 +++++++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 31 deletions(-)
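To make the restructuring described above easier to follow outside the kernel tree, the new control flow can be sketched with the kernel APIs replaced by plain booleans. Every name below is an illustrative stand-in, not SUNRPC code: one wait section guarded by a doubled should-sleep test, the stop check before the single dequeue site, and the no-work counter bumped only after an actual sleep.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the state touched by
 * svc_rqst_wait_and_dequeue_work(); none of these are kernel APIs. */
struct fake_rqst {
	bool work_pending;    /* true => rqst_should_sleep() says "don't sleep" */
	bool stop_requested;  /* stands in for kthread_should_stop() */
	bool has_xprt;        /* stands in for svc_xprt_dequeue() succeeding */
	/* observed outcomes */
	bool slept;
	bool dequeued;
	bool counted_no_work;
};

/* Mirrors the patch's shape: skip the whole wait section when there is
 * work, re-check before sleeping (degenerate here without concurrency,
 * but essential in the kernel), check for stop *before* dequeueing, and
 * count a fruitless pass only if we actually slept. */
static void wait_and_dequeue(struct fake_rqst *r)
{
	bool slept = false;

	if (!r->work_pending) {          /* first rqst_should_sleep() test */
		/* kernel: set_current_state(), clear RQ_BUSY, barriers... */
		if (!r->work_pending) {  /* ...then test again before sleeping */
			/* kernel: schedule() until woken */
			slept = true;
		}
		/* kernel: set RQ_BUSY again on the way out */
	}
	r->slept = slept;

	if (r->stop_requested)           /* now *before* the dequeue */
		return;

	if (r->has_xprt) {
		r->dequeued = true;      /* the single dequeue call site */
		return;
	}
	if (slept)
		r->counted_no_work = true;  /* sp_threads_no_work analogue */
}
```

The doubled work_pending test is a no-op in this single-threaded sketch; in the kernel the two tests bracket the task-state change, which is exactly what makes the second one necessary.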