Message ID | 161126239239.8979.7995314438640511469.stgit@klimt.1015granger.net (mailing list archive)
---|---
State | Not Applicable |
Series | Two small NFSD/RDMA scalability enhancements
On Thu, Jan 21, 2021 at 03:53:12PM -0500, Chuck Lever wrote:
> The Receive completion handler doesn't look at the contents of the
> Receive buffer. The DMA sync isn't terribly expensive but it's one
> less thing that needs to be done by the Receive completion handler,
> which is single-threaded (per svc_xprt). This helps scalability.

On dma-noncoherent systems that have speculative execution (e.g. a lot
of ARM systems) it can be fairly expensive, so for those this is a very
good thing.

Reviewed-by: Christoph Hellwig <hch@lst.de>
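To make that cost concrete: on a dma-noncoherent machine a DMA_FROM_DEVICE
sync boils down to cache-maintenance work over every line of the Receive
buffer, so the cost scales with wc->byte_len. A minimal sketch, not the
kernel's actual implementation; dcache_inval_line() is a hypothetical
stand-in for the per-arch cache-invalidate instruction:

/*
 * Sketch of a DMA_FROM_DEVICE sync on a dma-noncoherent CPU: each
 * cache line covering the buffer is invalidated so that subsequent
 * CPU loads see what the device wrote to memory rather than stale
 * cached data.  The loop length scales with the Receive buffer size.
 */
static void sketch_sync_for_cpu(unsigned long paddr, size_t len)
{
	unsigned long line = paddr & ~(unsigned long)(cache_line_size() - 1);

	for (; line < paddr + len; line += cache_line_size())
		dcache_inval_line(line);	/* hypothetical helper */
}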
Is there an asynchronous version of ib_dma_sync? Because it flushes
DMA pipelines, I'm wondering if kicking it off early might improve
latency, getting it done before svc_rdma_recvfrom() needs to dig into
the contents.

Tom.

On 1/21/2021 3:53 PM, Chuck Lever wrote:
> The Receive completion handler doesn't look at the contents of the
> Receive buffer. The DMA sync isn't terribly expensive but it's one
> less thing that needs to be done by the Receive completion handler,
> which is single-threaded (per svc_xprt). This helps scalability.
>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  net/sunrpc/xprtrdma/svc_rdma_recvfrom.c |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> index ab0b7e9777bc..6d28f23ceb35 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -342,9 +342,6 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
>
>  	/* All wc fields are now known to be valid */
>  	ctxt->rc_byte_len = wc->byte_len;
> -	ib_dma_sync_single_for_cpu(rdma->sc_pd->device,
> -				   ctxt->rc_recv_sge.addr,
> -				   wc->byte_len, DMA_FROM_DEVICE);
>
>  	spin_lock(&rdma->sc_rq_dto_lock);
>  	list_add_tail(&ctxt->rc_list, &rdma->sc_rq_dto_q);
> @@ -851,6 +848,9 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
>  	spin_unlock(&rdma_xprt->sc_rq_dto_lock);
>  	percpu_counter_inc(&svcrdma_stat_recv);
>
> +	ib_dma_sync_single_for_cpu(rdma_xprt->sc_pd->device,
> +				   ctxt->rc_recv_sge.addr, ctxt->rc_byte_len,
> +				   DMA_FROM_DEVICE);
>  	svc_rdma_build_arg_xdr(rqstp, ctxt);
>
>  	/* Prevent svc_xprt_release from releasing pages in rq_pages
On Fri, Jan 22, 2021 at 09:37:02AM -0500, Tom Talpey wrote:
> Is there an asynchronous version of ib_dma_sync?
No. These routines basically compile down to cache writeback and/or
invalidate instructions without much logic around them.
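
For reference, the ib_dma_* sync helpers are thin wrappers over the
generic DMA API; ib_dma_sync_single_for_cpu() is essentially the
following (recent kernels also skip the call entirely for devices that
use virtual DMA, such as rxe and siw), and there is no deferred or
asynchronous variant:

static inline void ib_dma_sync_single_for_cpu(struct ib_device *dev,
					      u64 addr, size_t size,
					      enum dma_data_direction dir)
{
	/*
	 * Hands the address straight to the core DMA layer, which on a
	 * coherent system is (nearly) a no-op and on a noncoherent one
	 * performs the cache maintenance for the buffer.
	 */
	dma_sync_single_for_cpu(dev->dma_device, addr, size, dir);
}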
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index ab0b7e9777bc..6d28f23ceb35 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -342,9 +342,6 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
 
 	/* All wc fields are now known to be valid */
 	ctxt->rc_byte_len = wc->byte_len;
-	ib_dma_sync_single_for_cpu(rdma->sc_pd->device,
-				   ctxt->rc_recv_sge.addr,
-				   wc->byte_len, DMA_FROM_DEVICE);
 
 	spin_lock(&rdma->sc_rq_dto_lock);
 	list_add_tail(&ctxt->rc_list, &rdma->sc_rq_dto_q);
@@ -851,6 +848,9 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 	spin_unlock(&rdma_xprt->sc_rq_dto_lock);
 	percpu_counter_inc(&svcrdma_stat_recv);
 
+	ib_dma_sync_single_for_cpu(rdma_xprt->sc_pd->device,
+				   ctxt->rc_recv_sge.addr, ctxt->rc_byte_len,
+				   DMA_FROM_DEVICE);
 	svc_rdma_build_arg_xdr(rqstp, ctxt);
 
 	/* Prevent svc_xprt_release from releasing pages in rq_pages
The Receive completion handler doesn't look at the contents of the
Receive buffer. The DMA sync isn't terribly expensive but it's one
less thing that needs to be done by the Receive completion handler,
which is single-threaded (per svc_xprt). This helps scalability.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)