Message ID | 162731081843.13580.15415936872318036839.stgit@klimt.1015granger.net (mailing list archive)
State | Not Applicable
Series | Optimize NFSD Send completion processing
On 7/26/2021 10:46 AM, Chuck Lever wrote:
> /**
>  * svc_rdma_wc_send - Invoked by RDMA provider for each polled Send WC
>  * @cq: Completion Queue context
> @@ -275,11 +289,9 @@ static void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc)
>
>  	trace_svcrdma_wc_send(wc, &ctxt->sc_cid);
>
> +	svc_rdma_wake_send_waiters(rdma, 1);
>  	complete(&ctxt->sc_done);
>
> -	atomic_inc(&rdma->sc_sq_avail);
> -	wake_up(&rdma->sc_send_wait);

This appears to change the order of wake_up() vs complete().
Is that intentional? Is there any possibility of a false
scheduler activation, later leading to a second wakeup or poll?

Tom.
> On Jul 26, 2021, at 12:50 PM, Tom Talpey <tom@talpey.com> wrote:
>
> On 7/26/2021 10:46 AM, Chuck Lever wrote:
>> /**
>>  * svc_rdma_wc_send - Invoked by RDMA provider for each polled Send WC
>>  * @cq: Completion Queue context
>> @@ -275,11 +289,9 @@ static void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc)
>>
>>  	trace_svcrdma_wc_send(wc, &ctxt->sc_cid);
>>
>> +	svc_rdma_wake_send_waiters(rdma, 1);
>>  	complete(&ctxt->sc_done);
>>
>> -	atomic_inc(&rdma->sc_sq_avail);
>> -	wake_up(&rdma->sc_send_wait);
>
> This appears to change the order of wake_up() vs complete().
> Is that intentional?

IIRC I reversed the order here to be consistent with the other
Send completion handlers.

> Is there any possibility of a false
> scheduler activation, later leading to a second wakeup or poll?

The two "wake-ups" here are not related to each other, and RPC
Replies are transmitted already so this shouldn't have a direct
impact on server latency. But I might not understand your
question.

--
Chuck Lever
On 7/26/2021 1:53 PM, Chuck Lever III wrote:
>
>
>> On Jul 26, 2021, at 12:50 PM, Tom Talpey <tom@talpey.com> wrote:
>>
>> On 7/26/2021 10:46 AM, Chuck Lever wrote:
>>> /**
>>>  * svc_rdma_wc_send - Invoked by RDMA provider for each polled Send WC
>>>  * @cq: Completion Queue context
>>> @@ -275,11 +289,9 @@ static void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc)
>>>
>>>  	trace_svcrdma_wc_send(wc, &ctxt->sc_cid);
>>>
>>> +	svc_rdma_wake_send_waiters(rdma, 1);
>>>  	complete(&ctxt->sc_done);
>>>
>>> -	atomic_inc(&rdma->sc_sq_avail);
>>> -	wake_up(&rdma->sc_send_wait);
>>
>> This appears to change the order of wake_up() vs complete().
>> Is that intentional?
>
> IIRC I reversed the order here to be consistent with the other
> Send completion handlers.
>
>> Is there any possibility of a false
>> scheduler activation, later leading to a second wakeup or poll?
>
> The two "wake-ups" here are not related to each other, and RPC
> Replies are transmitted already so this shouldn't have a direct
> impact on server latency. But I might not understand your
> question.

IIUC, you're saying that the thread which is awaiting the
completion of ctxt->sc_done is not also waiting to send anything,
therefore no thread is going to experience a fire drill. Ok.

Feel free to add my

Reviewed-By: Tom Talpey <tom@talpey.com>

to the series.

Tom.
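To make the "two wake-ups" point concrete, here is a rough sketch of the two
sleepers involved. This is a paraphrase, not verbatim kernel code: the wait
loop in svc_rdma_send() is simplified, and the variable "needed" is
hypothetical.

/* Sleeper 1: a sender thread throttled on Send Queue space. It
 * sleeps on rdma->sc_send_wait and is woken by
 * svc_rdma_wake_send_waiters() when a completion returns SQEs.
 */
while (atomic_sub_return(needed, &rdma->sc_sq_avail) < 0) {
	atomic_add(needed, &rdma->sc_sq_avail);	/* back out the claim */
	wait_event(rdma->sc_send_wait,
		   atomic_read(&rdma->sc_sq_avail) > needed);
}

/* Sleeper 2: the thread that posted this particular Send WR. It
 * sleeps on the per-context completion and is woken only by
 * complete(&ctxt->sc_done).
 */
wait_for_completion(&ctxt->sc_done);

No thread sleeps on both objects at once, so swapping the order of the two
wake-up calls cannot double-wake anyone; it only changes which unrelated
sleeper becomes runnable first.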
diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 3184465de3a0..57c60ffe76dd 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -207,6 +207,7 @@ extern void svc_rdma_send_error_msg(struct svcxprt_rdma *rdma,
 				    struct svc_rdma_send_ctxt *sctxt,
 				    struct svc_rdma_recv_ctxt *rctxt,
 				    int status);
+extern void svc_rdma_wake_send_waiters(struct svcxprt_rdma *rdma, int avail);
 extern int svc_rdma_sendto(struct svc_rqst *);
 extern int svc_rdma_result_payload(struct svc_rqst *rqstp, unsigned int offset,
 				   unsigned int length);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
index 1e651447dc4e..3d1b119f6e3e 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
@@ -248,8 +248,7 @@ static void svc_rdma_write_done(struct ib_cq *cq, struct ib_wc *wc)
 
 	trace_svcrdma_wc_write(wc, &cc->cc_cid);
 
-	atomic_add(cc->cc_sqecount, &rdma->sc_sq_avail);
-	wake_up(&rdma->sc_send_wait);
+	svc_rdma_wake_send_waiters(rdma, cc->cc_sqecount);
 
 	if (unlikely(wc->status != IB_WC_SUCCESS))
 		svc_xprt_deferred_close(&rdma->sc_xprt);
@@ -304,9 +303,7 @@ static void svc_rdma_wc_read_done(struct ib_cq *cq, struct ib_wc *wc)
 
 	trace_svcrdma_wc_read(wc, &cc->cc_cid);
 
-	atomic_add(cc->cc_sqecount, &rdma->sc_sq_avail);
-	wake_up(&rdma->sc_send_wait);
-
+	svc_rdma_wake_send_waiters(rdma, cc->cc_sqecount);
 	cc->cc_status = wc->status;
 	complete(&cc->cc_done);
 	return;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index d6bbafb773e1..fba2ee4eb607 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -258,6 +258,20 @@ void svc_rdma_send_ctxt_put(struct svcxprt_rdma *rdma,
 	spin_unlock(&rdma->sc_send_lock);
 }
 
+/**
+ * svc_rdma_wake_send_waiters - manage Send Queue accounting
+ * @rdma: controlling transport
+ * @avail: Number of additional SQEs that are now available
+ *
+ */
+void svc_rdma_wake_send_waiters(struct svcxprt_rdma *rdma, int avail)
+{
+	atomic_add(avail, &rdma->sc_sq_avail);
+	smp_mb__after_atomic();
+	if (unlikely(waitqueue_active(&rdma->sc_send_wait)))
+		wake_up(&rdma->sc_send_wait);
+}
+
 /**
  * svc_rdma_wc_send - Invoked by RDMA provider for each polled Send WC
  * @cq: Completion Queue context
@@ -275,11 +289,9 @@ static void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc)
 
 	trace_svcrdma_wc_send(wc, &ctxt->sc_cid);
 
+	svc_rdma_wake_send_waiters(rdma, 1);
 	complete(&ctxt->sc_done);
 
-	atomic_inc(&rdma->sc_sq_avail);
-	wake_up(&rdma->sc_send_wait);
-
 	if (unlikely(wc->status != IB_WC_SUCCESS))
 		svc_xprt_deferred_close(&rdma->sc_xprt);
 }
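A note on the smp_mb__after_atomic() in svc_rdma_wake_send_waiters():
waitqueue_active() peeks at the wait list without taking the waitqueue lock,
so waker and waiter must order their memory accesses themselves or a wakeup
can be lost. The sketch below follows the pairing documented in
include/linux/wait.h; it is illustrative, not verbatim:

/*
 * waker (completion handler)            waiter (sender thread)
 * --------------------------            ----------------------
 * atomic_add(avail, &sc_sq_avail);      prepare_to_wait(&sc_send_wait, ...);
 * smp_mb__after_atomic();                 (full barrier implied by
 *                                          set_current_state())
 * if (waitqueue_active(&sc_send_wait))  if (enough SQEs are available)
 *         wake_up(&sc_send_wait);               break;
 *                                       schedule();
 *
 * Without the barrier, the waker could observe an empty wait list
 * before its sc_sq_avail update became visible to a waiter that had
 * just decided to sleep, and that waiter would never be woken.
 */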
Because wake_up() takes an IRQ-safe lock, it can be expensive,
especially to call inside of a single-threaded completion handler.
What's more, the Send wait queue almost never has waiters, so
most of the time, this is an expensive no-op.

As always, the goal is to reduce the average overhead of each
completion, because a transport's completion handlers are single-
threaded on one CPU core. This change reduces CPU utilization of
the Send completion thread by 2-3% on my server.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h       |    1 +
 net/sunrpc/xprtrdma/svc_rdma_rw.c     |    7 ++-----
 net/sunrpc/xprtrdma/svc_rdma_sendto.c |   18 +++++++++++++---
 3 files changed, 18 insertions(+), 8 deletions(-)
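For reference, the "IRQ-safe lock" mentioned above is the waitqueue's
internal spinlock. Paraphrased (not verbatim) from kernel/sched/wait.c, the
common wake-up path looks roughly like this, which is why each call costs a
locked, IRQ-disabling round trip even when nobody is waiting:

void __wake_up(struct wait_queue_head *wq_head, unsigned int mode,
	       int nr_exclusive, void *key)
{
	unsigned long flags;

	/* Taken on every call, even when the wait list is empty */
	spin_lock_irqsave(&wq_head->lock, flags);
	__wake_up_common(wq_head, mode, nr_exclusive, 0, key, NULL);
	spin_unlock_irqrestore(&wq_head->lock, flags);
}

Checking waitqueue_active() first skips all of that in the common case where
sc_send_wait is empty, at the price of the explicit memory barrier discussed
above.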