From patchwork Mon Jan 29 14:50:37 2024
Subject: [PATCH v1 01/11] svcrdma: Reserve an extra WQE for ib_drain_rq()
From: Chuck Lever
To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
Date: Mon, 29 Jan 2024 09:50:37 -0500
Message-ID: <170653983726.24162.15894483608296911955.stgit@manet.1015granger.net>
In-Reply-To: <170653967395.24162.4661804176845293777.stgit@manet.1015granger.net>
References: <170653967395.24162.4661804176845293777.stgit@manet.1015granger.net>

From: Chuck Lever

Do as other ULPs already do: ensure there is an extra Receive WQE
reserved for the tear-down drain WR. I haven't heard reports of
problems but it can't hurt.

Note that rq_depth is used to compute the Send Queue depth as well,
so this fix should affect both the SQ and RQ.

Signed-off-by: Chuck Lever
---
 net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 4f27325ace4a..4a038c7e86f9 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -415,7 +415,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
     if (newxprt->sc_max_send_sges > dev->attrs.max_send_sge)
         newxprt->sc_max_send_sges = dev->attrs.max_send_sge;
     rq_depth = newxprt->sc_max_requests + newxprt->sc_max_bc_requests +
-        newxprt->sc_recv_batch;
+        newxprt->sc_recv_batch + 1 /* drain */;
     if (rq_depth > dev->attrs.max_qp_wr) {
         rq_depth = dev->attrs.max_qp_wr;
         newxprt->sc_recv_batch = 1;
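For illustration, a minimal sketch of the Receive Queue accounting this
patch arrives at. It is not part of the patch; the helper name and its
parameters are invented. ib_drain_rq() posts one final Receive WR and
waits for its completion, so the RQ must be sized for normal traffic
plus that one drain WR.

    /* Illustrative sketch only -- not part of the patch series. */
    #include <linux/minmax.h>
    #include <rdma/ib_verbs.h>

    static unsigned int example_rq_depth(unsigned int max_requests,
                                         unsigned int max_bc_requests,
                                         unsigned int recv_batch,
                                         const struct ib_device_attr *attrs)
    {
            unsigned int rq_depth;

            /* Credits for forward and backchannel requests, a batch of
             * posted-ahead Receives, plus one WQE reserved for the
             * ib_drain_rq() drain WR.
             */
            rq_depth = max_requests + max_bc_requests + recv_batch + 1;

            /* Never ask for more WRs than the device supports per QP */
            return min_t(unsigned int, rq_depth, attrs->max_qp_wr);
    }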
From patchwork Mon Jan 29 14:50:43 2024
Subject: [PATCH v1 02/11] svcrdma: Use all allocated Send Queue entries
From: Chuck Lever
To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
Date: Mon, 29 Jan 2024 09:50:43 -0500
Message-ID: <170653984365.24162.652127313173673494.stgit@manet.1015granger.net>
In-Reply-To: <170653967395.24162.4661804176845293777.stgit@manet.1015granger.net>
References: <170653967395.24162.4661804176845293777.stgit@manet.1015granger.net>

From: Chuck Lever

For upper layer protocols that request rw_ctxs, ib_create_qp()
adjusts ib_qp_init_attr::max_send_wr to accommodate the WQEs those
rw_ctxs will consume. See rdma_rw_init_qp() for details.

To actually use those additional WQEs, svc_rdma_accept() needs to
retrieve the corrected SQ depth after calling rdma_create_qp() and
set newxprt->sc_sq_depth and newxprt->sc_sq_avail so that
svc_rdma_send() and svc_rdma_post_chunk_ctxt() can utilize those
WQEs.

The NVMe target driver, for example, already does this properly.

Fixes: 26fb2254dd33 ("svcrdma: Estimate Send Queue depth properly")
Signed-off-by: Chuck Lever
---
 net/sunrpc/xprtrdma/svc_rdma_transport.c | 36 ++++++++++++++++++------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 4a038c7e86f9..75f1481fbca0 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -370,12 +370,12 @@ static struct svc_xprt *svc_rdma_create(struct svc_serv *serv,
  */
 static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 {
+    unsigned int ctxts, rq_depth, sq_depth;
     struct svcxprt_rdma *listen_rdma;
     struct svcxprt_rdma *newxprt = NULL;
     struct rdma_conn_param conn_param;
     struct rpcrdma_connect_private pmsg;
     struct ib_qp_init_attr qp_attr;
-    unsigned int ctxts, rq_depth;
     struct ib_device *dev;
     int ret = 0;
     RPC_IFDEBUG(struct sockaddr *sap);
@@ -422,24 +422,29 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
         newxprt->sc_max_requests = rq_depth - 2;
         newxprt->sc_max_bc_requests = 2;
     }
+    sq_depth = rq_depth;
+
     ctxts = rdma_rw_mr_factor(dev, newxprt->sc_port_num, RPCSVC_MAXPAGES);
     ctxts *= newxprt->sc_max_requests;
-    newxprt->sc_sq_depth = rq_depth + ctxts;
-    if (newxprt->sc_sq_depth > dev->attrs.max_qp_wr)
-        newxprt->sc_sq_depth = dev->attrs.max_qp_wr;
-    atomic_set(&newxprt->sc_sq_avail, newxprt->sc_sq_depth);
     newxprt->sc_pd = ib_alloc_pd(dev, 0);
     if (IS_ERR(newxprt->sc_pd)) {
         trace_svcrdma_pd_err(newxprt, PTR_ERR(newxprt->sc_pd));
         goto errout;
     }
+
+    /* The Completion Queue depth is the maximum number of signaled
+     * WRs expected to be in flight. Every Send WR is signaled, and
+     * each rw_ctx has a chain of WRs, but only one WR in each chain
+     * is signaled.
+     */
+    newxprt->sc_sq_cq = ib_alloc_cq_any(dev, newxprt, sq_depth + ctxts,
                                         IB_POLL_WORKQUEUE);
     if (IS_ERR(newxprt->sc_sq_cq))
         goto errout;
-    newxprt->sc_rq_cq =
-        ib_alloc_cq_any(dev, newxprt, rq_depth, IB_POLL_WORKQUEUE);
+    /* Every Receive WR is signaled.
+     */
+    newxprt->sc_rq_cq = ib_alloc_cq_any(dev, newxprt, rq_depth,
+                                        IB_POLL_WORKQUEUE);
     if (IS_ERR(newxprt->sc_rq_cq))
         goto errout;
@@ -448,7 +453,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
     qp_attr.qp_context = &newxprt->sc_xprt;
     qp_attr.port_num = newxprt->sc_port_num;
     qp_attr.cap.max_rdma_ctxs = ctxts;
-    qp_attr.cap.max_send_wr = newxprt->sc_sq_depth - ctxts;
+    qp_attr.cap.max_send_wr = sq_depth;
     qp_attr.cap.max_recv_wr = rq_depth;
     qp_attr.cap.max_send_sge = newxprt->sc_max_send_sges;
     qp_attr.cap.max_recv_sge = 1;
@@ -456,17 +461,20 @@
     qp_attr.qp_type = IB_QPT_RC;
     qp_attr.send_cq = newxprt->sc_sq_cq;
     qp_attr.recv_cq = newxprt->sc_rq_cq;
-    dprintk(" cap.max_send_wr = %d, cap.max_recv_wr = %d\n",
-        qp_attr.cap.max_send_wr, qp_attr.cap.max_recv_wr);
-    dprintk(" cap.max_send_sge = %d, cap.max_recv_sge = %d\n",
-        qp_attr.cap.max_send_sge, qp_attr.cap.max_recv_sge);
-
     ret = rdma_create_qp(newxprt->sc_cm_id, newxprt->sc_pd, &qp_attr);
     if (ret) {
         trace_svcrdma_qp_err(newxprt, ret);
         goto errout;
     }
+    dprintk("svcrdma: cap.max_send_wr = %d, cap.max_recv_wr = %d\n",
+        qp_attr.cap.max_send_wr, qp_attr.cap.max_recv_wr);
+    dprintk(" cap.max_send_sge = %d, cap.max_recv_sge = %d\n",
+        qp_attr.cap.max_send_sge, qp_attr.cap.max_recv_sge);
+    dprintk(" send CQ depth = %d, recv CQ depth = %d\n",
+        sq_depth, rq_depth);
     newxprt->sc_qp = newxprt->sc_cm_id->qp;
+    newxprt->sc_sq_depth = qp_attr.cap.max_send_wr;
+    atomic_set(&newxprt->sc_sq_avail, newxprt->sc_sq_depth);
     if (!(dev->attrs.device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS))
         newxprt->sc_snd_w_inv = false;
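Reduced to a hedged sketch, the pattern this patch adopts (and which other
RDMA ULPs such as the NVMe target already follow) looks like the code
below. Names prefixed example_ are invented for the illustration: request
max_rdma_ctxs, let the RDMA core grow max_send_wr, then size the ULP's
Send Queue accounting from the value the core hands back.

    /* Illustrative sketch only -- not part of the patch series. */
    #include <rdma/rdma_cm.h>
    #include <rdma/ib_verbs.h>

    static int example_create_qp(struct rdma_cm_id *cm_id, struct ib_pd *pd,
                                 struct ib_cq *send_cq, struct ib_cq *recv_cq,
                                 u32 sq_depth, u32 rq_depth, u32 rdma_ctxs,
                                 atomic_t *sq_avail, u32 *actual_sq_depth)
    {
            struct ib_qp_init_attr qp_attr = {
                    .qp_type           = IB_QPT_RC,
                    .send_cq           = send_cq,
                    .recv_cq           = recv_cq,
                    .cap.max_send_wr   = sq_depth,  /* Send WRs only */
                    .cap.max_recv_wr   = rq_depth,
                    .cap.max_rdma_ctxs = rdma_ctxs, /* rw_ctxs for RDMA Read/Write */
                    .sq_sig_type       = IB_SIGNAL_REQ_WR,
            };
            int ret;

            /* rdma_create_qp() (via rdma_rw_init_qp()) bumps max_send_wr to
             * cover the WQEs that the requested rw_ctxs will consume.
             */
            ret = rdma_create_qp(cm_id, pd, &qp_attr);
            if (ret)
                    return ret;

            /* Use the adjusted value, not the one asked for, so every
             * allocated Send Queue entry is actually available to the ULP.
             */
            *actual_sq_depth = qp_attr.cap.max_send_wr;
            atomic_set(sq_avail, *actual_sq_depth);
            return 0;
    }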
From patchwork Mon Jan 29 14:50:50 2024
Subject: [PATCH v1 03/11] svcrdma: Increase the per-transport rw_ctx count
From: Chuck Lever
To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
Date: Mon, 29 Jan 2024 09:50:50 -0500
Message-ID: <170653985002.24162.17277374573743602302.stgit@manet.1015granger.net>
In-Reply-To: <170653967395.24162.4661804176845293777.stgit@manet.1015granger.net>
References: <170653967395.24162.4661804176845293777.stgit@manet.1015granger.net>

From: Chuck Lever

rdma_rw_mr_factor() returns the smallest number of MRs needed to move a
particular number of pages. svcrdma currently asks for the number of MRs
needed to move RPCSVC_MAXPAGES (a little over one megabyte), as that is
the number of pages in the largest r/wsize the server supports.

This call assumes that the client's NIC can bundle a full one megabyte
payload in a single rdma_segment. In fact, most NICs cannot handle a full
megabyte with a single rkey / rdma_segment. Clients will typically split
even a single Read chunk into many segments.

The server needs one MR to read each rdma_segment in a Read chunk, and
thus each one needs an rw_ctx. svcrdma has been vastly underestimating
the number of rw_ctxs needed to handle 64 RPC requests with large Read
chunks using small rdma_segments.

Unfortunately there doesn't seem to be a good way to estimate this number
without knowing the client NIC's capabilities. Even then, the client
RPC/RDMA implementation is still free to split a chunk into smaller
segments (for example, it might be using physical registration, which
needs an rdma_segment per page).

The best we can do for now is choose a number that will guarantee forward
progress in the worst case (one page per segment).

At some later point, we could add some mechanisms to make this much less
of a problem:
- Add a core API to add more rw_ctxs to an already-established QP
- svcrdma could treat rw_ctx exhaustion as a temporary error and try again
- Limit the number of Reads in flight

Signed-off-by: Chuck Lever
---
 net/sunrpc/xprtrdma/svc_rdma_transport.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 75f1481fbca0..57316afe62bf 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -424,8 +424,12 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
     }
     sq_depth = rq_depth;
-    ctxts = rdma_rw_mr_factor(dev, newxprt->sc_port_num, RPCSVC_MAXPAGES);
-    ctxts *= newxprt->sc_max_requests;
+    /* Arbitrarily estimate the number of rw_ctxs needed for
+     * this transport. This is enough rw_ctxs to make forward
+     * progress even if the client is using one rkey per page
+     * in each Read chunk.
+     */
+    ctxts = 3 * RPCSVC_MAXPAGES;
     newxprt->sc_pd = ib_alloc_pd(dev, 0);
     if (IS_ERR(newxprt->sc_pd)) {
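As a rough worked example of the worst case described above, assuming
4 KB pages: a 1 MB Read chunk registered one page per rdma_segment
arrives as roughly 256 segments, each needing its own MR and rw_ctx, so
even a few such requests in flight consume several hundred rw_ctxs. A
sketch of the estimate (the helper name is invented; the constant mirrors
the patch's choice):

    /* Illustrative sketch only -- not part of the patch series. */
    #include <linux/sunrpc/svc.h>   /* RPCSVC_MAXPAGES */

    static unsigned int example_rw_ctx_estimate(void)
    {
            /* Worst case: one rdma_segment per page, so a maximally
             * sized Read chunk needs about RPCSVC_MAXPAGES rw_ctxs.
             * A small multiple of that guarantees forward progress
             * for a few such requests at a time.
             */
            return 3 * RPCSVC_MAXPAGES;
    }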
From patchwork Mon Jan 29 14:50:56 2024
Subject: [PATCH v1 04/11] svcrdma: Fix SQ wake-ups
From: Chuck Lever
To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
Date: Mon, 29 Jan 2024 09:50:56 -0500
Message-ID: <170653985631.24162.990683035035649882.stgit@manet.1015granger.net>
In-Reply-To: <170653967395.24162.4661804176845293777.stgit@manet.1015granger.net>
References: <170653967395.24162.4661804176845293777.stgit@manet.1015granger.net>

From: Chuck Lever

Ensure there is a wake-up when increasing sc_sq_avail. Likewise, if a
wake-up is done, sc_sq_avail needs to be updated, otherwise the
wait_event() conditional is never going to be met.

Signed-off-by: Chuck Lever
---
 net/sunrpc/xprtrdma/svc_rdma_sendto.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 1a49b7f02041..f1f5c7b58fce 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -335,11 +335,11 @@ int svc_rdma_send(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt)
     /* If the SQ is full, wait until an SQ entry is available */
     while (1) {
         if ((atomic_dec_return(&rdma->sc_sq_avail) < 0)) {
+            svc_rdma_wake_send_waiters(rdma, 1);
             percpu_counter_inc(&svcrdma_stat_sq_starve);
             trace_svcrdma_sq_full(rdma, &ctxt->sc_cid);
-            atomic_inc(&rdma->sc_sq_avail);
             wait_event(rdma->sc_send_wait,
-                   atomic_read(&rdma->sc_sq_avail) > 1);
+                   atomic_read(&rdma->sc_sq_avail) > 0);
             if (test_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags))
                 return -ENOTCONN;
             trace_svcrdma_sq_retry(rdma, &ctxt->sc_cid);
@@ -355,7 +355,7 @@ int svc_rdma_send(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt)
     trace_svcrdma_sq_post_err(rdma, &ctxt->sc_cid, ret);
     svc_xprt_deferred_close(&rdma->sc_xprt);
-    wake_up(&rdma->sc_send_wait);
+    svc_rdma_wake_send_waiters(rdma, 1);
     return ret;
 }
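A generic sketch of the wake/wait pairing this patch repairs (the type
and function names are invented): the thread that finds the Send Queue
full must return its credit and issue the wake-up together, and the wait
condition must test the same counter it decremented.

    /* Illustrative sketch only -- not part of the patch series. */
    #include <linux/atomic.h>
    #include <linux/wait.h>

    struct example_sq {
            atomic_t                avail;  /* free Send Queue entries */
            wait_queue_head_t       wait;
    };

    /* Return @count credits and wake a waiter. Doing both together keeps
     * the counter and the wake-up in sync.
     */
    static void example_put_credits(struct example_sq *sq, int count)
    {
            atomic_add(count, &sq->avail);
            wake_up(&sq->wait);
    }

    static void example_get_credit(struct example_sq *sq)
    {
            while (atomic_dec_return(&sq->avail) < 0) {
                    /* Queue is full: give the credit back (with a
                     * wake-up) before sleeping, then wait until a
                     * credit appears.
                     */
                    example_put_credits(sq, 1);
                    wait_event(sq->wait, atomic_read(&sq->avail) > 0);
            }
    }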
From patchwork Mon Jan 29 14:51:02 2024
Subject: [PATCH v1 05/11] svcrdma: Prevent a UAF in svc_rdma_send()
From: Chuck Lever
To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
Date: Mon, 29 Jan 2024 09:51:02 -0500
Message-ID: <170653986273.24162.4447192396691167938.stgit@manet.1015granger.net>
In-Reply-To: <170653967395.24162.4661804176845293777.stgit@manet.1015granger.net>
References: <170653967395.24162.4661804176845293777.stgit@manet.1015granger.net>

From: Chuck Lever

In some error flow cases, svc_rdma_wc_send() releases @ctxt. Copy the
sc_cid field in @ctxt to a stack variable in order to guarantee that the
value is available after the ib_post_send() call.

In case the new comment looks a little strange, this will be done with
at least one more field in a subsequent patch.

Signed-off-by: Chuck Lever
---
 net/sunrpc/xprtrdma/svc_rdma_sendto.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index f1f5c7b58fce..b6fc9299b472 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -316,12 +316,17 @@ static void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc)
  * @rdma: transport on which to post the WR
  * @ctxt: send ctxt with a Send WR ready to post
  *
+ * Copy fields in @ctxt to stack variables in order to guarantee
+ * that these values remain available after the ib_post_send() call.
+ * In some error flow cases, svc_rdma_wc_send() releases @ctxt.
+ *
  * Returns zero if the Send WR was posted successfully. Otherwise, a
  * negative errno is returned.
  */
 int svc_rdma_send(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt)
 {
     struct ib_send_wr *wr = &ctxt->sc_send_wr;
+    struct rpc_rdma_cid cid = ctxt->sc_cid;
     int ret;
     might_sleep();
@@ -337,12 +342,12 @@ int svc_rdma_send(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt)
         if ((atomic_dec_return(&rdma->sc_sq_avail) < 0)) {
             svc_rdma_wake_send_waiters(rdma, 1);
             percpu_counter_inc(&svcrdma_stat_sq_starve);
-            trace_svcrdma_sq_full(rdma, &ctxt->sc_cid);
+            trace_svcrdma_sq_full(rdma, &cid);
             wait_event(rdma->sc_send_wait,
                    atomic_read(&rdma->sc_sq_avail) > 0);
             if (test_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags))
                 return -ENOTCONN;
-            trace_svcrdma_sq_retry(rdma, &ctxt->sc_cid);
+            trace_svcrdma_sq_retry(rdma, &cid);
             continue;
         }
@@ -353,7 +358,7 @@ int svc_rdma_send(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt)
         return 0;
     }
-    trace_svcrdma_sq_post_err(rdma, &ctxt->sc_cid, ret);
+    trace_svcrdma_sq_post_err(rdma, &cid, ret);
     svc_xprt_deferred_close(&rdma->sc_xprt);
     svc_rdma_wake_send_waiters(rdma, 1);
     return ret;
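A sketch of the general rule applied here (names invented): once a WR is
posted, its completion handler can run and free the containing context
before ib_post_send() even returns, so anything needed afterwards is
copied to the stack first.

    /* Illustrative sketch only -- not part of the patch series. */
    #include <linux/printk.h>
    #include <rdma/ib_verbs.h>

    struct example_send_ctxt {
            struct ib_send_wr       wr;
            u64                     debug_id;       /* stand-in for sc_cid */
    };

    static int example_post(struct ib_qp *qp, struct example_send_ctxt *ctxt)
    {
            u64 debug_id = ctxt->debug_id;  /* copy before posting */
            int ret;

            ret = ib_post_send(qp, &ctxt->wr, NULL);
            /* From here on, @ctxt may already have been released by the
             * Send completion handler: use only the stack copy.
             */
            if (ret)
                    pr_err("post_send failed (id %llu): %d\n", debug_id, ret);
            return ret;
    }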
From patchwork Mon Jan 29 14:51:09 2024
Subject: [PATCH v1 06/11] svcrdma: Fix retry loop in svc_rdma_send()
From: Chuck Lever
To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
Date: Mon, 29 Jan 2024 09:51:09 -0500
Message-ID: <170653986907.24162.2435133775108024319.stgit@manet.1015granger.net>
In-Reply-To: <170653967395.24162.4661804176845293777.stgit@manet.1015granger.net>
References: <170653967395.24162.4661804176845293777.stgit@manet.1015granger.net>

From: Chuck Lever

Don't call ib_post_send() at all if the transport is already shutting
down.

Signed-off-by: Chuck Lever
---
 net/sunrpc/xprtrdma/svc_rdma_sendto.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index b6fc9299b472..0ee9185f5f3f 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -320,8 +320,9 @@ static void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc)
  * that these values remain available after the ib_post_send() call.
  * In some error flow cases, svc_rdma_wc_send() releases @ctxt.
  *
- * Returns zero if the Send WR was posted successfully. Otherwise, a
- * negative errno is returned.
+ * Return values:
+ *   %0: @ctxt's WR chain was posted successfully
+ *   %-ENOTCONN: The connection was lost
  */
 int svc_rdma_send(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt)
 {
@@ -338,30 +339,35 @@ int svc_rdma_send(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt)
                       DMA_TO_DEVICE);
     /* If the SQ is full, wait until an SQ entry is available */
-    while (1) {
+    while (!test_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags)) {
         if ((atomic_dec_return(&rdma->sc_sq_avail) < 0)) {
             svc_rdma_wake_send_waiters(rdma, 1);
+
+            /* When the transport is torn down, assume
+             * ib_drain_sq() will trigger enough Send
+             * completions to wake us. The XPT_CLOSE test
+             * above should then cause the while loop to
+             * exit.
+             */
             percpu_counter_inc(&svcrdma_stat_sq_starve);
             trace_svcrdma_sq_full(rdma, &cid);
             wait_event(rdma->sc_send_wait,
                    atomic_read(&rdma->sc_sq_avail) > 0);
-            if (test_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags))
-                return -ENOTCONN;
             trace_svcrdma_sq_retry(rdma, &cid);
             continue;
         }
         trace_svcrdma_post_send(ctxt);
         ret = ib_post_send(rdma->sc_qp, wr, NULL);
-        if (ret)
+        if (ret) {
+            trace_svcrdma_sq_post_err(rdma, &cid, ret);
+            svc_xprt_deferred_close(&rdma->sc_xprt);
+            svc_rdma_wake_send_waiters(rdma, 1);
             break;
+        }
         return 0;
     }
-
-    trace_svcrdma_sq_post_err(rdma, &cid, ret);
-    svc_xprt_deferred_close(&rdma->sc_xprt);
-    svc_rdma_wake_send_waiters(rdma, 1);
-    return ret;
+    return -ENOTCONN;
 }

From patchwork Mon Jan 29 14:51:15 2024
Subject: [PATCH v1 07/11] svcrdma: Post Send WR chain
From: Chuck Lever
To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
Date: Mon, 29 Jan 2024 09:51:15 -0500
Message-ID: <170653987543.24162.18059205281166429623.stgit@manet.1015granger.net>
In-Reply-To: <170653967395.24162.4661804176845293777.stgit@manet.1015granger.net>
References: <170653967395.24162.4661804176845293777.stgit@manet.1015granger.net>

From: Chuck Lever

Eventually I'd like the server to post the reply's Send WR along with
any Write WRs using only a single call to ib_post_send(), in order to
reduce the NIC's doorbell rate.

To do this, add an anchor for a WR chain to svc_rdma_send_ctxt, and
refactor svc_rdma_send() to post this WR chain to the Send Queue. For
the moment, the posted chain will continue to contain a single Send WR.

Signed-off-by: Chuck Lever
---
 include/linux/sunrpc/svc_rdma.h | 6 ++-
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 2 +
 net/sunrpc/xprtrdma/svc_rdma_sendto.c | 49 +++++++++++++++++++---------
 3 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index e7595ae62fe2..ee05087d6499 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -210,6 +210,8 @@ struct svc_rdma_send_ctxt {
     struct svcxprt_rdma *sc_rdma;
     struct ib_send_wr sc_send_wr;
+    struct ib_send_wr *sc_wr_chain;
+    int sc_sqecount;
     struct ib_cqe sc_cqe;
     struct xdr_buf sc_hdrbuf;
     struct xdr_stream sc_stream;
@@ -258,8 +260,8 @@ extern struct svc_rdma_send_ctxt *
         svc_rdma_send_ctxt_get(struct svcxprt_rdma *rdma);
 extern void svc_rdma_send_ctxt_put(struct svcxprt_rdma *rdma,
                    struct svc_rdma_send_ctxt *ctxt);
-extern int svc_rdma_send(struct svcxprt_rdma *rdma,
-             struct svc_rdma_send_ctxt *ctxt);
+extern int svc_rdma_post_send(struct svcxprt_rdma *rdma,
+                  struct svc_rdma_send_ctxt *ctxt);
 extern int svc_rdma_map_reply_msg(struct svcxprt_rdma *rdma,
                   struct svc_rdma_send_ctxt *sctxt,
                   const struct svc_rdma_pcl *write_pcl,
diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
index c9be6778643b..e5a78b761012 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -90,7 +90,7 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
      */
     get_page(virt_to_page(rqst->rq_buffer));
     sctxt->sc_send_wr.opcode = IB_WR_SEND;
-    return svc_rdma_send(rdma, sctxt);
+    return svc_rdma_post_send(rdma, sctxt);
 }
 /* Server-side transport endpoint wants a whole page for its send
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 0ee9185f5f3f..0f02fb09d5b0 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -208,6 +208,9 @@ struct svc_rdma_send_ctxt *svc_rdma_send_ctxt_get(struct svcxprt_rdma *rdma)
     ctxt->sc_send_wr.num_sge = 0;
     ctxt->sc_cur_sge_no = 0;
     ctxt->sc_page_count = 0;
+    ctxt->sc_wr_chain = &ctxt->sc_send_wr;
+    ctxt->sc_sqecount = 1;
+
     return ctxt;
 out_empty:
@@ -293,7 +296,7 @@ static void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc)
     struct svc_rdma_send_ctxt *ctxt =
         container_of(cqe, struct svc_rdma_send_ctxt, sc_cqe);
-    svc_rdma_wake_send_waiters(rdma, 1);
+    svc_rdma_wake_send_waiters(rdma, ctxt->sc_sqecount);
     if (unlikely(wc->status != IB_WC_SUCCESS))
         goto flushed;
@@ -312,36 +315,44 @@
 }
 /**
- * svc_rdma_send - Post a single Send WR
- * @rdma: transport on which to post the WR
- * @ctxt: send ctxt with a Send WR ready to post
+ * svc_rdma_post_send - Post a WR chain to the Send Queue
+ * @rdma: transport context
+ * @ctxt: WR chain to post
  *
  * Copy fields in @ctxt to stack variables in order to guarantee
  * that these values remain available after the ib_post_send() call.
  * In some error flow cases, svc_rdma_wc_send() releases @ctxt.
  *
+ * Note there is potential for starvation when the Send Queue is
+ * full because there is no order to when waiting threads are
+ * awoken. The transport is typically provisioned with a deep
+ * enough Send Queue that SQ exhaustion should be a rare event.
+ *
  * Return values:
  *   %0: @ctxt's WR chain was posted successfully
  *   %-ENOTCONN: The connection was lost
  */
-int svc_rdma_send(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt)
+int svc_rdma_post_send(struct svcxprt_rdma *rdma,
+               struct svc_rdma_send_ctxt *ctxt)
 {
-    struct ib_send_wr *wr = &ctxt->sc_send_wr;
+    struct ib_send_wr *first_wr = ctxt->sc_wr_chain;
+    struct ib_send_wr *send_wr = &ctxt->sc_send_wr;
+    const struct ib_send_wr *bad_wr = first_wr;
     struct rpc_rdma_cid cid = ctxt->sc_cid;
-    int ret;
+    int ret, sqecount = ctxt->sc_sqecount;
     might_sleep();
     /* Sync the transport header buffer */
     ib_dma_sync_single_for_device(rdma->sc_pd->device,
-                      wr->sg_list[0].addr,
-                      wr->sg_list[0].length,
+                      send_wr->sg_list[0].addr,
+                      send_wr->sg_list[0].length,
                       DMA_TO_DEVICE);
     /* If the SQ is full, wait until an SQ entry is available */
     while (!test_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags)) {
-        if ((atomic_dec_return(&rdma->sc_sq_avail) < 0)) {
-            svc_rdma_wake_send_waiters(rdma, 1);
+        if (atomic_sub_return(sqecount, &rdma->sc_sq_avail) < 0) {
+            svc_rdma_wake_send_waiters(rdma, sqecount);
             /* When the transport is torn down, assume
              * ib_drain_sq() will trigger enough Send
@@ -358,12 +369,18 @@ int svc_rdma_send(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt)
         }
         trace_svcrdma_post_send(ctxt);
-        ret = ib_post_send(rdma->sc_qp, wr, NULL);
+        ret = ib_post_send(rdma->sc_qp, first_wr, &bad_wr);
         if (ret) {
             trace_svcrdma_sq_post_err(rdma, &cid, ret);
             svc_xprt_deferred_close(&rdma->sc_xprt);
-            svc_rdma_wake_send_waiters(rdma, 1);
-            break;
+
+            /* If even one WR was posted, there will be a
+             * Send completion that bumps sc_sq_avail.
+             */
+            if (bad_wr == first_wr) {
+                svc_rdma_wake_send_waiters(rdma, sqecount);
+                break;
+            }
         }
         return 0;
     }
@@ -884,7 +901,7 @@ static int svc_rdma_send_reply_msg(struct svcxprt_rdma *rdma,
         sctxt->sc_send_wr.opcode = IB_WR_SEND;
     }
-    return svc_rdma_send(rdma, sctxt);
+    return svc_rdma_post_send(rdma, sctxt);
 }
 /**
@@ -948,7 +965,7 @@ void svc_rdma_send_error_msg(struct svcxprt_rdma *rdma,
     sctxt->sc_send_wr.num_sge = 1;
     sctxt->sc_send_wr.opcode = IB_WR_SEND;
     sctxt->sc_sges[0].length = sctxt->sc_hdrbuf.len;
-    if (svc_rdma_send(rdma, sctxt))
+    if (svc_rdma_post_send(rdma, sctxt))
         goto put_ctxt;
     return;
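A minimal sketch of posting a WR chain with a single doorbell, the
mechanism this patch prepares for (names are invented; error handling
omitted):

    /* Illustrative sketch only -- not part of the patch series. */
    #include <rdma/ib_verbs.h>

    /* Link @extra in front of @send_wr and post the whole chain. The
     * ib_post_send() result is returned; @bad_wr tells the caller how
     * far posting got if it failed.
     */
    static int example_post_chain(struct ib_qp *qp, struct ib_send_wr *extra,
                                  struct ib_send_wr *send_wr)
    {
            const struct ib_send_wr *bad_wr = NULL;
            struct ib_send_wr *first_wr = send_wr;

            if (extra) {
                    extra->next = send_wr;  /* Send WR goes last in the chain */
                    first_wr = extra;
            }
            return ib_post_send(qp, first_wr, &bad_wr);
    }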
From patchwork Mon Jan 29 14:51:21 2024
Subject: [PATCH v1 08/11] svcrdma: Move write_info for Reply chunks into struct svc_rdma_send_ctxt
From: Chuck Lever
To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
Date: Mon, 29 Jan 2024 09:51:21 -0500
Message-ID: <170653988175.24162.12812489159335969199.stgit@manet.1015granger.net>
In-Reply-To: <170653967395.24162.4661804176845293777.stgit@manet.1015granger.net>
References: <170653967395.24162.4661804176845293777.stgit@manet.1015granger.net>

From: Chuck Lever

Since the RPC transaction's svc_rdma_send_ctxt will stay around for the
duration of the RDMA Write operation, the write_info structure for the
Reply chunk can reside in the request's svc_rdma_send_ctxt instead of
being allocated separately.

Signed-off-by: Chuck Lever
---
 include/linux/sunrpc/svc_rdma.h | 25 +++++++++
 include/trace/events/rpcrdma.h | 4 +
 net/sunrpc/xprtrdma/svc_rdma_rw.c | 91 +++++++++++++++++++--------------
 net/sunrpc/xprtrdma/svc_rdma_sendto.c | 2 -
 4 files changed, 82 insertions(+), 40 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index ee05087d6499..918cf4fda728 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -203,6 +203,29 @@ struct svc_rdma_recv_ctxt {
     struct page *rc_pages[RPCSVC_MAXPAGES];
 };
+/*
+ * State for sending a Write chunk.
+ *  - Tracks progress of writing one chunk over all its segments
+ *  - Stores arguments for the SGL constructor functions
+ */
+struct svc_rdma_write_info {
+    struct svcxprt_rdma *wi_rdma;
+
+    const struct svc_rdma_chunk *wi_chunk;
+
+    /* write state of this chunk */
+    unsigned int wi_seg_off;
+    unsigned int wi_seg_no;
+
+    /* SGL constructor arguments */
+    const struct xdr_buf *wi_xdr;
+    unsigned char *wi_base;
+    unsigned int wi_next_off;
+
+    struct svc_rdma_chunk_ctxt wi_cc;
+    struct work_struct wi_work;
+};
+
 struct svc_rdma_send_ctxt {
     struct llist_node sc_node;
     struct rpc_rdma_cid sc_cid;
@@ -215,6 +238,7 @@ struct svc_rdma_send_ctxt {
     struct ib_cqe sc_cqe;
     struct xdr_buf sc_hdrbuf;
     struct xdr_stream sc_stream;
+    struct svc_rdma_write_info sc_reply_info;
     void *sc_xprt_buf;
     int sc_page_count;
     int sc_cur_sge_no;
@@ -249,6 +273,7 @@ extern int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma,
                      const struct xdr_buf *xdr);
 extern int svc_rdma_send_reply_chunk(struct svcxprt_rdma *rdma,
                      const struct svc_rdma_recv_ctxt *rctxt,
+                     struct svc_rdma_send_ctxt *sctxt,
                      const struct xdr_buf *xdr);
 extern int svc_rdma_process_read_list(struct svcxprt_rdma *rdma,
                       struct svc_rqst *rqstp,
diff --git a/include/trace/events/rpcrdma.h b/include/trace/events/rpcrdma.h
index 110c1475c527..027ac3ab457d 100644
--- a/include/trace/events/rpcrdma.h
+++ b/include/trace/events/rpcrdma.h
@@ -2118,6 +2118,10 @@ DEFINE_SIMPLE_CID_EVENT(svcrdma_wc_write);
 DEFINE_SEND_FLUSH_EVENT(svcrdma_wc_write_flush);
 DEFINE_SEND_FLUSH_EVENT(svcrdma_wc_write_err);
+DEFINE_SIMPLE_CID_EVENT(svcrdma_wc_reply);
+DEFINE_SEND_FLUSH_EVENT(svcrdma_wc_reply_flush);
+DEFINE_SEND_FLUSH_EVENT(svcrdma_wc_reply_err);
+
 TRACE_EVENT(svcrdma_qp_error,
     TP_PROTO(
         const struct ib_event *event,
diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
index c00fcce61d1e..2ca3c6311c5e 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
@@ -197,28 +197,6 @@ void svc_rdma_cc_release(struct svcxprt_rdma *rdma,
     llist_add_batch(first, last, &rdma->sc_rw_ctxts);
 }
-/* State for sending a Write or Reply chunk.
- *  - Tracks progress of writing one chunk over all its segments
- *  - Stores arguments for the SGL constructor functions
- */
-struct svc_rdma_write_info {
-    struct svcxprt_rdma *wi_rdma;
-
-    const struct svc_rdma_chunk *wi_chunk;
-
-    /* write state of this chunk */
-    unsigned int wi_seg_off;
-    unsigned int wi_seg_no;
-
-    /* SGL constructor arguments */
-    const struct xdr_buf *wi_xdr;
-    unsigned char *wi_base;
-    unsigned int wi_next_off;
-
-    struct svc_rdma_chunk_ctxt wi_cc;
-    struct work_struct wi_work;
-};
-
 static struct svc_rdma_write_info *
 svc_rdma_write_info_alloc(struct svcxprt_rdma *rdma,
               const struct svc_rdma_chunk *chunk)
@@ -252,6 +230,43 @@ static void svc_rdma_write_info_free(struct svc_rdma_write_info *info)
     queue_work(svcrdma_wq, &info->wi_work);
 }
+static void svc_rdma_reply_chunk_release(struct svcxprt_rdma *rdma,
+                     struct svc_rdma_chunk_ctxt *cc)
+{
+    svc_rdma_wake_send_waiters(rdma, cc->cc_sqecount);
+    svc_rdma_cc_release(rdma, cc, DMA_TO_DEVICE);
+}
+
+/**
+ * svc_rdma_reply_done - Reply chunk Write completion handler
+ * @cq: controlling Completion Queue
+ * @wc: Work Completion report
+ *
+ * Pages under I/O are released by a subsequent Send completion.
+ */
+static void svc_rdma_reply_done(struct ib_cq *cq, struct ib_wc *wc)
+{
+    struct ib_cqe *cqe = wc->wr_cqe;
+    struct svc_rdma_chunk_ctxt *cc =
+        container_of(cqe, struct svc_rdma_chunk_ctxt, cc_cqe);
+    struct svcxprt_rdma *rdma = cq->cq_context;
+
+    switch (wc->status) {
+    case IB_WC_SUCCESS:
+        trace_svcrdma_wc_reply(&cc->cc_cid);
+        svc_rdma_reply_chunk_release(rdma, cc);
+        return;
+    case IB_WC_WR_FLUSH_ERR:
+        trace_svcrdma_wc_reply_flush(wc, &cc->cc_cid);
+        break;
+    default:
+        trace_svcrdma_wc_reply_err(wc, &cc->cc_cid);
+    }
+
+    svc_rdma_reply_chunk_release(rdma, cc);
+    svc_xprt_deferred_close(&rdma->sc_xprt);
+}
+
 /**
  * svc_rdma_write_done - Write chunk completion
  * @cq: controlling Completion Queue
@@ -624,7 +639,8 @@ int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma,
 /**
  * svc_rdma_send_reply_chunk - Write all segments in the Reply chunk
  * @rdma: controlling RDMA transport
- * @rctxt: Write and Reply chunks from client
+ * @rctxt: Write and Reply chunks provisioned by the client
+ * @sctxt: Send WR resources
  * @xdr: xdr_buf containing an RPC Reply
  *
  * Returns a non-negative number of bytes the chunk consumed, or
@@ -636,37 +652,34 @@ int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma,
  */
 int svc_rdma_send_reply_chunk(struct svcxprt_rdma *rdma,
                   const struct svc_rdma_recv_ctxt *rctxt,
+                  struct svc_rdma_send_ctxt *sctxt,
                   const struct xdr_buf *xdr)
 {
-    struct svc_rdma_write_info *info;
-    struct svc_rdma_chunk_ctxt *cc;
-    struct svc_rdma_chunk *chunk;
+    struct svc_rdma_write_info *info = &sctxt->sc_reply_info;
+    struct svc_rdma_chunk_ctxt *cc = &info->wi_cc;
     int ret;
-    if (pcl_is_empty(&rctxt->rc_reply_pcl))
-        return 0;
+    if (likely(pcl_is_empty(&rctxt->rc_reply_pcl)))
+        return 0;   /* client provided no Reply chunk */
-    chunk = pcl_first_chunk(&rctxt->rc_reply_pcl);
-    info = svc_rdma_write_info_alloc(rdma, chunk);
-    if (!info)
-        return -ENOMEM;
-    cc = &info->wi_cc;
+    info->wi_rdma = rdma;
+    info->wi_chunk = pcl_first_chunk(&rctxt->rc_reply_pcl);
+    info->wi_seg_off = 0;
+    info->wi_seg_no = 0;
+    svc_rdma_cc_init(rdma, &info->wi_cc);
+    info->wi_cc.cc_cqe.done = svc_rdma_reply_done;
     ret = pcl_process_nonpayloads(&rctxt->rc_write_pcl, xdr,
                       svc_rdma_xb_write, info);
     if (ret < 0)
-        goto out_err;
+        return ret;
     trace_svcrdma_post_reply_chunk(&cc->cc_cid, cc->cc_sqecount);
     ret = svc_rdma_post_chunk_ctxt(rdma, cc);
     if (ret < 0)
-        goto out_err;
+        return ret;
     return xdr->len;
-
-out_err:
-    svc_rdma_write_info_free(info);
-    return ret;
 }
 /**
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 0f02fb09d5b0..d8e079be36e2 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -1012,7 +1012,7 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
     if (!p)
         goto put_ctxt;
-    ret = svc_rdma_send_reply_chunk(rdma, rctxt, &rqstp->rq_res);
+    ret = svc_rdma_send_reply_chunk(rdma, rctxt, sctxt, &rqstp->rq_res);
     if (ret < 0)
         goto reply_chunk;
     rc_size = ret;

From patchwork Mon Jan 29 14:51:28 2024
Subject: [PATCH v1 09/11] svcrdma: Post the Reply chunk and Send WR together
From: Chuck Lever
To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
Date: Mon, 29 Jan 2024 09:51:28 -0500
Message-ID: <170653988811.24162.11957929805329696177.stgit@manet.1015granger.net>
In-Reply-To: <170653967395.24162.4661804176845293777.stgit@manet.1015granger.net>
References: <170653967395.24162.4661804176845293777.stgit@manet.1015granger.net>

From: Chuck Lever

Reduce the doorbell and Send completion rates when sending RPC/RDMA
replies that have Reply chunks.
NFS READDIR procedures typically return their result in a Reply chunk,
for example.

Instead of calling ib_post_send() to post the Write WRs for the Reply
chunk, and then calling it again to post the Send WR that conveys the
transport header, chain the Write WRs to the Send WR and call
ib_post_send() only once.

Thanks to the Send Queue completion ordering rules, the Send WR's
completion guarantees that all Write WRs posted before it have also
completed successfully. Thus all Write WRs for the Reply chunk can
remain unsignaled. Instead of handling a Write completion and then a
Send completion, only the Send completion is seen, and it handles
clean-up for both the Writes and the Send.

Signed-off-by: Chuck Lever
---
 include/linux/sunrpc/svc_rdma.h | 13 +++++--
 net/sunrpc/xprtrdma/svc_rdma_rw.c | 58 +++++++++++++++++++++------------
 net/sunrpc/xprtrdma/svc_rdma_sendto.c | 34 +++++++++++--------
 3 files changed, 66 insertions(+), 39 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 918cf4fda728..ac882bd23ca2 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -262,19 +262,24 @@ extern void svc_rdma_release_ctxt(struct svc_xprt *xprt, void *ctxt);
 extern int svc_rdma_recvfrom(struct svc_rqst *);
 /* svc_rdma_rw.c */
+extern void svc_rdma_cc_init(struct svcxprt_rdma *rdma,
+                 struct svc_rdma_chunk_ctxt *cc);
 extern void svc_rdma_destroy_rw_ctxts(struct svcxprt_rdma *rdma);
 extern void svc_rdma_cc_init(struct svcxprt_rdma *rdma,
                  struct svc_rdma_chunk_ctxt *cc);
 extern void svc_rdma_cc_release(struct svcxprt_rdma *rdma,
                 struct svc_rdma_chunk_ctxt *cc,
                 enum dma_data_direction dir);
+extern void svc_rdma_reply_chunk_release(struct svcxprt_rdma *rdma,
+                     struct svc_rdma_send_ctxt *ctxt);
 extern int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma,
                      const struct svc_rdma_chunk *chunk,
                      const struct xdr_buf *xdr);
-extern int svc_rdma_send_reply_chunk(struct svcxprt_rdma *rdma,
-                     const struct svc_rdma_recv_ctxt *rctxt,
-                     struct svc_rdma_send_ctxt *sctxt,
-                     const struct xdr_buf *xdr);
+extern int svc_rdma_prepare_reply_chunk(struct svcxprt_rdma *rdma,
+                    const struct svc_rdma_pcl *write_pcl,
+                    const struct svc_rdma_pcl *reply_pcl,
+                    struct svc_rdma_send_ctxt *sctxt,
+                    const struct xdr_buf *xdr);
 extern int svc_rdma_process_read_list(struct svcxprt_rdma *rdma,
                       struct svc_rqst *rqstp,
                       struct svc_rdma_recv_ctxt *head);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
index 2ca3c6311c5e..2b25edc6c73c 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
@@ -230,10 +230,18 @@ static void svc_rdma_write_info_free(struct svc_rdma_write_info *info)
     queue_work(svcrdma_wq, &info->wi_work);
 }
-static void svc_rdma_reply_chunk_release(struct svcxprt_rdma *rdma,
-                     struct svc_rdma_chunk_ctxt *cc)
+/**
+ * svc_rdma_reply_chunk_release - Release Reply chunk I/O resources
+ * @rdma: controlling transport
+ * @ctxt: Send context that is being released
+ */
+void svc_rdma_reply_chunk_release(struct svcxprt_rdma *rdma,
+                  struct svc_rdma_send_ctxt *ctxt)
 {
-    svc_rdma_wake_send_waiters(rdma, cc->cc_sqecount);
+    struct svc_rdma_chunk_ctxt *cc = &ctxt->sc_reply_info.wi_cc;
+
+    if (!cc->cc_sqecount)
+        return;
     svc_rdma_cc_release(rdma, cc, DMA_TO_DEVICE);
 }
@@ -254,7 +262,6 @@ static void svc_rdma_reply_done(struct ib_cq *cq, struct ib_wc *wc)
     switch (wc->status) {
     case IB_WC_SUCCESS:
         trace_svcrdma_wc_reply(&cc->cc_cid);
-        svc_rdma_reply_chunk_release(rdma, cc);
         return;
     case IB_WC_WR_FLUSH_ERR:
         trace_svcrdma_wc_reply_flush(wc, &cc->cc_cid);
@@ -263,7 +270,6 @@ static void svc_rdma_reply_done(struct ib_cq *cq, struct ib_wc *wc)
         trace_svcrdma_wc_reply_err(wc, &cc->cc_cid);
     }
-    svc_rdma_reply_chunk_release(rdma, cc);
     svc_xprt_deferred_close(&rdma->sc_xprt);
 }
@@ -637,9 +643,10 @@ int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma,
 }
 /**
- * svc_rdma_send_reply_chunk - Write all segments in the Reply chunk
+ * svc_rdma_prepare_reply_chunk - Construct WR chain for writing the Reply chunk
  * @rdma: controlling RDMA transport
- * @rctxt: Write and Reply chunks provisioned by the client
+ * @write_pcl: Write chunk list provided by client
+ * @reply_pcl: Reply chunk provided by client
  * @sctxt: Send WR resources
  * @xdr: xdr_buf containing an RPC Reply
  *
@@ -650,35 +657,44 @@ int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma,
  * %-ENOTCONN if posting failed (connection is lost),
  * %-EIO if rdma_rw initialization failed (DMA mapping, etc).
  */
-int svc_rdma_send_reply_chunk(struct svcxprt_rdma *rdma,
-                  const struct svc_rdma_recv_ctxt *rctxt,
-                  struct svc_rdma_send_ctxt *sctxt,
-                  const struct xdr_buf *xdr)
+int svc_rdma_prepare_reply_chunk(struct svcxprt_rdma *rdma,
+                 const struct svc_rdma_pcl *write_pcl,
+                 const struct svc_rdma_pcl *reply_pcl,
+                 struct svc_rdma_send_ctxt *sctxt,
+                 const struct xdr_buf *xdr)
 {
     struct svc_rdma_write_info *info = &sctxt->sc_reply_info;
     struct svc_rdma_chunk_ctxt *cc = &info->wi_cc;
+    struct ib_send_wr *first_wr;
+    struct list_head *pos;
+    struct ib_cqe *cqe;
     int ret;
-    if (likely(pcl_is_empty(&rctxt->rc_reply_pcl)))
-        return 0;   /* client provided no Reply chunk */
-
     info->wi_rdma = rdma;
-    info->wi_chunk = pcl_first_chunk(&rctxt->rc_reply_pcl);
+    info->wi_chunk = pcl_first_chunk(reply_pcl);
     info->wi_seg_off = 0;
     info->wi_seg_no = 0;
-    svc_rdma_cc_init(rdma, &info->wi_cc);
     info->wi_cc.cc_cqe.done = svc_rdma_reply_done;
-    ret = pcl_process_nonpayloads(&rctxt->rc_write_pcl, xdr,
+    ret = pcl_process_nonpayloads(write_pcl, xdr,
                       svc_rdma_xb_write, info);
     if (ret < 0)
         return ret;
-    trace_svcrdma_post_reply_chunk(&cc->cc_cid, cc->cc_sqecount);
-    ret = svc_rdma_post_chunk_ctxt(rdma, cc);
-    if (ret < 0)
-        return ret;
+    first_wr = sctxt->sc_wr_chain;
+    cqe = &cc->cc_cqe;
+    list_for_each(pos, &cc->cc_rwctxts) {
+        struct svc_rdma_rw_ctxt *rwc;
+
+        rwc = list_entry(pos, struct svc_rdma_rw_ctxt, rw_list);
+        first_wr = rdma_rw_ctx_wrs(&rwc->rw_ctx, rdma->sc_qp,
+                       rdma->sc_port_num, cqe, first_wr);
+        cqe = NULL;
+    }
+    sctxt->sc_wr_chain = first_wr;
+    sctxt->sc_sqecount += cc->cc_sqecount;
+
+    trace_svcrdma_post_reply_chunk(&cc->cc_cid, cc->cc_sqecount);
     return xdr->len;
 }
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index d8e079be36e2..6dfd2232ce5b 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -205,6 +205,7 @@ struct svc_rdma_send_ctxt *svc_rdma_send_ctxt_get(struct svcxprt_rdma *rdma)
     xdr_init_encode(&ctxt->sc_stream, &ctxt->sc_hdrbuf,
             ctxt->sc_xprt_buf, NULL);
+    svc_rdma_cc_init(rdma, &ctxt->sc_reply_info.wi_cc);
     ctxt->sc_send_wr.num_sge = 0;
     ctxt->sc_cur_sge_no = 0;
     ctxt->sc_page_count = 0;
@@ -226,6 +227,8 @@ static void svc_rdma_send_ctxt_release(struct svcxprt_rdma *rdma,
     struct ib_device *device = rdma->sc_cm_id->device;
     unsigned int i;
+    svc_rdma_reply_chunk_release(rdma, ctxt);
+
     if (ctxt->sc_page_count)
         release_pages(ctxt->sc_pages, ctxt->sc_page_count);
@@ -867,16 +870,10 @@ static void svc_rdma_save_io_pages(struct svc_rqst *rqstp,
sc_sges[0], and the RPC xdr_buf is prepared in following sges. * * Depending on whether a Write list or Reply chunk is present, - * the server may send all, a portion of, or none of the xdr_buf. + * the server may Send all, a portion of, or none of the xdr_buf. * In the latter case, only the transport header (sc_sges[0]) is * transmitted. * - * RDMA Send is the last step of transmitting an RPC reply. Pages - * involved in the earlier RDMA Writes are here transferred out - * of the rqstp and into the sctxt's page array. These pages are - * DMA unmapped by each Write completion, but the subsequent Send - * completion finally releases these pages. - * * Assumptions: * - The Reply's transport header will never be larger than a page. */ @@ -885,6 +882,7 @@ static int svc_rdma_send_reply_msg(struct svcxprt_rdma *rdma, const struct svc_rdma_recv_ctxt *rctxt, struct svc_rqst *rqstp) { + struct ib_send_wr *send_wr = &sctxt->sc_send_wr; int ret; ret = svc_rdma_map_reply_msg(rdma, sctxt, &rctxt->rc_write_pcl, @@ -892,13 +890,16 @@ static int svc_rdma_send_reply_msg(struct svcxprt_rdma *rdma, if (ret < 0) return ret; + /* Transfer pages involved in RDMA Writes to the sctxt's + * page array. Completion handling releases these pages. + */ svc_rdma_save_io_pages(rqstp, sctxt); if (rctxt->rc_inv_rkey) { - sctxt->sc_send_wr.opcode = IB_WR_SEND_WITH_INV; - sctxt->sc_send_wr.ex.invalidate_rkey = rctxt->rc_inv_rkey; + send_wr->opcode = IB_WR_SEND_WITH_INV; + send_wr->ex.invalidate_rkey = rctxt->rc_inv_rkey; } else { - sctxt->sc_send_wr.opcode = IB_WR_SEND; + send_wr->opcode = IB_WR_SEND; } return svc_rdma_post_send(rdma, sctxt); @@ -1012,10 +1013,15 @@ int svc_rdma_sendto(struct svc_rqst *rqstp) if (!p) goto put_ctxt; - ret = svc_rdma_send_reply_chunk(rdma, rctxt, sctxt, &rqstp->rq_res); - if (ret < 0) - goto reply_chunk; - rc_size = ret; + rc_size = 0; + if (!pcl_is_empty(&rctxt->rc_reply_pcl)) { + ret = svc_rdma_prepare_reply_chunk(rdma, &rctxt->rc_write_pcl, + &rctxt->rc_reply_pcl, sctxt, + &rqstp->rq_res); + if (ret < 0) + goto reply_chunk; + rc_size = ret; + } *p++ = *rdma_argp; *p++ = *(rdma_argp + 1); From patchwork Mon Jan 29 14:51:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 13535827 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 11CD214460B; Mon, 29 Jan 2024 14:51:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706539896; cv=none; b=Y4gcVFS3mc+qX3t6M9bK6l8ROIZELWUjPHII+20wFFn2l4ouHaVnitrsFzUUKQrCKZk2kkoRYbEB6leWHM7WEb615MSDu5jgZdKkA/tXpcQO8zS74yuUTsFrEX2goczb7GBL3nm38nNZsww3UMlCTJpU2I9kzYsDv33YH+F3pdo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706539896; c=relaxed/simple; bh=L0X3dLyOPy67nt8GCLWy13o2002WEL1KVLp4Q4jt1ZU=; h=Subject:From:To:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=OErnceIZ/0Qd9CCc3AjELJoJbK/f4WSbU/3p+kRICMU5Rgium7ll9bkeQ12HnIUorPd2nXlCNqGhx0eLHwKkIK8vOLn+b4IXrq+2KpZigoDXd6gOxri3JZBO5VEBskuAYUpUzVSLmtAKpf1CfUQtjrUfvBSNr53iuwWwqUc+8MY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=LHlBcmXK; arc=none 
smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="LHlBcmXK" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 541DCC433F1; Mon, 29 Jan 2024 14:51:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1706539895; bh=L0X3dLyOPy67nt8GCLWy13o2002WEL1KVLp4Q4jt1ZU=; h=Subject:From:To:Date:In-Reply-To:References:From; b=LHlBcmXKHf0boe5XGZkjpuRzJr+v3DH99t7pucSkpV/ptjSx3yQ5tCvupjq588i8J cwV1c1ehTOKZsF6c9kF0vCymu1fTIxTOmxy9r06BDF4fEhp+BXT8IJU4gfuec1v5BX jge5nxXKxVPrRtx6SxC+2MQPWRpuExjhSaw5NsXUTcYzyxz82oV9kcpS/vDxTBfK4F 5+hYkBTBNU/8+TPuLzy4EyumldiJPKnxtMIcbTkVmC2tgU4B6MChN4wbum/oVH+0DR k53AgEEvzneRtPk14OhmVjI/kLEzeUrv8hNHVMSqMOr+xK1PJtMkl7j8UK7PDQVdQB ukUOcIeSeZ2Eg== Subject: [PATCH v1 10/11] svcrdma: Post WRs for Write chunks in svc_rdma_sendto() From: Chuck Lever To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org Date: Mon, 29 Jan 2024 09:51:34 -0500 Message-ID: <170653989440.24162.14938748106287967969.stgit@manet.1015granger.net> In-Reply-To: <170653967395.24162.4661804176845293777.stgit@manet.1015granger.net> References: <170653967395.24162.4661804176845293777.stgit@manet.1015granger.net> User-Agent: StGit/1.5 Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Chuck Lever Refactor to eventually enable svcrdma to post the Write WRs for each RPC response using the same ib_post_send() as the Send WR (ie, as a single WR chain). svc_rdma_result_payload (originally svc_rdma_read_payload) was added so that the upper layer XDR encoder could identify a range of bytes to be possibly conveyed by RDMA (if a Write chunk was provided by the client). The purpose of commit f6ad77590a5d ("svcrdma: Post RDMA Writes while XDR encoding replies") was to post as much of the result payload outside of svc_rdma_sendto() as possible because svc_rdma_sendto() used to be called with the xpt_mutex held. However, since commit ca4faf543a33 ("SUNRPC: Move xpt_mutex into socket xpo_sendto methods"), the xpt_mutex is no longer held when calling svc_rdma_sendto(). Thus, that benefit is no longer an issue. 
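For clarity, the division of labor this patch creates, condensed from the hunks below (an abbreviated sketch, not the literal code; error paths and tracing are omitted):

    /* During XDR encoding: svc_rdma_result_payload() now only records
     * where the payload sits in rq_res; it no longer posts anything.
     */
    chunk->ch_position = offset;
    chunk->ch_payload_length = length;

    /* Later, in svc_rdma_sendto(): post the recorded Write chunks from
     * the transport's send path instead.
     */
    ret = svc_rdma_send_write_list(rdma, rctxt, &rqstp->rq_res);
    if (ret < 0)
            goto put_ctxt;

    /* svc_rdma_send_write_list() walks rc_write_pcl and invokes the
     * now-static svc_rdma_send_write_chunk() for each chunk that has a
     * non-zero ch_payload_length; xdr_buf_subsegment() moves there too.
     */
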
Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc_rdma.h | 6 ++-- net/sunrpc/xprtrdma/svc_rdma_rw.c | 56 ++++++++++++++++++++++----------- net/sunrpc/xprtrdma/svc_rdma_sendto.c | 30 ++++++------------ 3 files changed, 51 insertions(+), 41 deletions(-) diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h index ac882bd23ca2..d33bab33099a 100644 --- a/include/linux/sunrpc/svc_rdma.h +++ b/include/linux/sunrpc/svc_rdma.h @@ -272,9 +272,9 @@ extern void svc_rdma_cc_release(struct svcxprt_rdma *rdma, enum dma_data_direction dir); extern void svc_rdma_reply_chunk_release(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt); -extern int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma, - const struct svc_rdma_chunk *chunk, - const struct xdr_buf *xdr); +extern int svc_rdma_send_write_list(struct svcxprt_rdma *rdma, + const struct svc_rdma_recv_ctxt *rctxt, + const struct xdr_buf *xdr); extern int svc_rdma_prepare_reply_chunk(struct svcxprt_rdma *rdma, const struct svc_rdma_pcl *write_pcl, const struct svc_rdma_pcl *reply_pcl, diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c index 2b25edc6c73c..40797114d50a 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_rw.c +++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c @@ -601,47 +601,65 @@ static int svc_rdma_xb_write(const struct xdr_buf *xdr, void *data) return xdr->len; } -/** - * svc_rdma_send_write_chunk - Write all segments in a Write chunk - * @rdma: controlling RDMA transport - * @chunk: Write chunk provided by the client - * @xdr: xdr_buf containing the data payload - * - * Returns a non-negative number of bytes the chunk consumed, or - * %-E2BIG if the payload was larger than the Write chunk, - * %-EINVAL if client provided too many segments, - * %-ENOMEM if rdma_rw context pool was exhausted, - * %-ENOTCONN if posting failed (connection is lost), - * %-EIO if rdma_rw initialization failed (DMA mapping, etc). - */ -int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma, - const struct svc_rdma_chunk *chunk, - const struct xdr_buf *xdr) +static int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma, + const struct svc_rdma_chunk *chunk, + const struct xdr_buf *xdr) { struct svc_rdma_write_info *info; struct svc_rdma_chunk_ctxt *cc; + struct xdr_buf payload; int ret; + if (xdr_buf_subsegment(xdr, &payload, chunk->ch_position, + chunk->ch_payload_length)) + return -EMSGSIZE; + info = svc_rdma_write_info_alloc(rdma, chunk); if (!info) return -ENOMEM; cc = &info->wi_cc; - ret = svc_rdma_xb_write(xdr, info); - if (ret != xdr->len) + ret = svc_rdma_xb_write(&payload, info); + if (ret != payload.len) goto out_err; trace_svcrdma_post_write_chunk(&cc->cc_cid, cc->cc_sqecount); ret = svc_rdma_post_chunk_ctxt(rdma, cc); if (ret < 0) goto out_err; - return xdr->len; + return 0; out_err: svc_rdma_write_info_free(info); return ret; } +/** + * svc_rdma_send_write_list - Send all chunks on the Write list + * @rdma: controlling RDMA transport + * @rctxt: Write list provisioned by the client + * @xdr: xdr_buf containing an RPC Reply message + * + * Returns zero on success, or a negative errno if one or more + * Write chunks could not be sent. 
+ */ +int svc_rdma_send_write_list(struct svcxprt_rdma *rdma, + const struct svc_rdma_recv_ctxt *rctxt, + const struct xdr_buf *xdr) +{ + struct svc_rdma_chunk *chunk; + int ret; + + pcl_for_each_chunk(chunk, &rctxt->rc_write_pcl) { + if (!chunk->ch_payload_length) + break; + ret = svc_rdma_send_write_chunk(rdma, chunk, xdr); + if (ret < 0) + return ret; + } + return 0; +} + /** * svc_rdma_prepare_reply_chunk - Construct WR chain for writing the Reply chunk * @rdma: controlling RDMA transport diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c index 6dfd2232ce5b..bb5436b719e0 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c @@ -1013,6 +1013,10 @@ int svc_rdma_sendto(struct svc_rqst *rqstp) if (!p) goto put_ctxt; + ret = svc_rdma_send_write_list(rdma, rctxt, &rqstp->rq_res); + if (ret < 0) + goto put_ctxt; + rc_size = 0; if (!pcl_is_empty(&rctxt->rc_reply_pcl)) { ret = svc_rdma_prepare_reply_chunk(rdma, &rctxt->rc_write_pcl, @@ -1064,45 +1068,33 @@ int svc_rdma_sendto(struct svc_rqst *rqstp) /** * svc_rdma_result_payload - special processing for a result payload - * @rqstp: svc_rqst to operate on - * @offset: payload's byte offset in @xdr + * @rqstp: RPC transaction context + * @offset: payload's byte offset in @rqstp->rq_res * @length: size of payload, in bytes * + * Assign the passed-in result payload to the current Write chunk, + * and advance to cur_result_payload to the next Write chunk, if + * there is one. + * * Return values: * %0 if successful or nothing needed to be done - * %-EMSGSIZE on XDR buffer overflow * %-E2BIG if the payload was larger than the Write chunk - * %-EINVAL if client provided too many segments - * %-ENOMEM if rdma_rw context pool was exhausted - * %-ENOTCONN if posting failed (connection is lost) - * %-EIO if rdma_rw initialization failed (DMA mapping, etc) */ int svc_rdma_result_payload(struct svc_rqst *rqstp, unsigned int offset, unsigned int length) { struct svc_rdma_recv_ctxt *rctxt = rqstp->rq_xprt_ctxt; struct svc_rdma_chunk *chunk; - struct svcxprt_rdma *rdma; - struct xdr_buf subbuf; - int ret; chunk = rctxt->rc_cur_result_payload; if (!length || !chunk) return 0; rctxt->rc_cur_result_payload = pcl_next_chunk(&rctxt->rc_write_pcl, chunk); + if (length > chunk->ch_length) return -E2BIG; - chunk->ch_position = offset; chunk->ch_payload_length = length; - - if (xdr_buf_subsegment(&rqstp->rq_res, &subbuf, offset, length)) - return -EMSGSIZE; - - rdma = container_of(rqstp->rq_xprt, struct svcxprt_rdma, sc_xprt); - ret = svc_rdma_send_write_chunk(rdma, chunk, &subbuf); - if (ret < 0) - return ret; return 0; } From patchwork Mon Jan 29 14:51:40 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 13535828 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5E463152DE7; Mon, 29 Jan 2024 14:51:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706539902; cv=none; b=VikAxYD34lEzACYuUFnA54h3oERGnG0mn3b5XUBj37Khju3SqWYd69IpTxvq6p/dXurIE4qvlGowQOnmOKusgs/28ABlblOahZtrdPlxn8hgubZ6xhOga1yO9DWAHr1D/m4cMnVSA7QUIrmF+Sv5KTSiB7NH7eZtXzJ08eFdOYc= ARC-Message-Signature: i=1; 
a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706539902; c=relaxed/simple; bh=fp9acj4ZEwK1Xq922shpXg6CpeNaFsC5ZBGJptxgrSI=; h=Subject:From:To:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=rhP0aa7K2VVCiCmz1AfTogY+uVkVaDOXMTlv49u0GC4y2ahI0GHc3hg2oFLPcWC73/Gof+LbU5tP+rFT3BRrW8AiuNZJRqx4XkrZDLBM6fYo3vsrwXzmOKj5lqGJYMk6XIssCQN9kMMa0RGyram5/2xo9/OSHSxLFm3JTLc+cXo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=J05Wwjlc; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="J05Wwjlc" Received: by smtp.kernel.org (Postfix) with ESMTPSA id ACFC7C433F1; Mon, 29 Jan 2024 14:51:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1706539901; bh=fp9acj4ZEwK1Xq922shpXg6CpeNaFsC5ZBGJptxgrSI=; h=Subject:From:To:Date:In-Reply-To:References:From; b=J05WwjlcNF2VcBNtwSxK5f+C5e4kkwb/T6LV+9ISpRXpN+fdokQ9/OM1xKsYTcIfa CoOsYNqyH+Lz5ciyXZYAsI4zqtfvTaZCHIVme3aUPm6wP0sCns8oae7MhqlKJLQcqA uj/Wcg/+xKbU586nRs5f0cDKHHmktNJ16OK9wtXIo+I78dmozChgxci3gQ0ePKL5Dy wuEu0LYYnGFGCNrQp+eouwpJ1R11yFUflOjv9xHQ6fpXHYMlzk022zCdcSI9bIcULO MU3jBaAz8t8CHhdEE2HX0BaYUpeZLBg2iGIEQaa2mrpEV9Vsn4E5nnfY7NiMiNmL+j l7TCsg08CNQGQ== Subject: [PATCH v1 11/11] svcrdma: Add Write chunk WRs to the RPC's Send WR chain From: Chuck Lever To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org Date: Mon, 29 Jan 2024 09:51:40 -0500 Message-ID: <170653990074.24162.7550506379641649738.stgit@manet.1015granger.net> In-Reply-To: <170653967395.24162.4661804176845293777.stgit@manet.1015granger.net> References: <170653967395.24162.4661804176845293777.stgit@manet.1015granger.net> User-Agent: StGit/1.5 Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Chuck Lever Chain RDMA Writes that convey Write chunks onto the local Send chain. This means all WRs for an RPC Reply are now posted with a single ib_post_send() call, and there is a single Send completion when all of these are done. That reduces both the per-transport doorbell rate and completion rate. 
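The heart of the change is in svc_rdma_prepare_write_chunk(): instead of posting a chunk's rdma_rw contexts immediately, their WRs are linked onto the Send context's WR chain. A condensed, annotated excerpt of that hunk (see the full diff below; error handling and tracing omitted):

    first_wr = sctxt->sc_wr_chain;
    cqe = &cc->cc_cqe;
    list_for_each(pos, &cc->cc_rwctxts) {
            struct svc_rdma_rw_ctxt *rwc;

            rwc = list_entry(pos, struct svc_rdma_rw_ctxt, rw_list);
            /* rdma_rw_ctx_wrs() links this context's Write WRs in
             * ahead of first_wr, building up the single chain that
             * ends with the Send WR.
             */
            first_wr = rdma_rw_ctx_wrs(&rwc->rw_ctx, rdma->sc_qp,
                                       rdma->sc_port_num, cqe, first_wr);
            cqe = NULL;     /* only the first context carries the CQE */
    }
    sctxt->sc_wr_chain = first_wr;
    sctxt->sc_sqecount += cc->cc_sqecount;
    /* Track the write_info so releasing the Send context can
     * release the chunk's resources as well.
     */
    list_add(&info->wi_list, &sctxt->sc_write_info_list);

This mirrors the chaining already used for the Reply chunk in svc_rdma_prepare_reply_chunk().
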
Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc_rdma.h | 13 ++++- net/sunrpc/xprtrdma/svc_rdma_rw.c | 86 +++++++++++++++++++++++++-------- net/sunrpc/xprtrdma/svc_rdma_sendto.c | 5 ++ 3 files changed, 78 insertions(+), 26 deletions(-) diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h index d33bab33099a..24cd199dd6f3 100644 --- a/include/linux/sunrpc/svc_rdma.h +++ b/include/linux/sunrpc/svc_rdma.h @@ -210,6 +210,7 @@ struct svc_rdma_recv_ctxt { */ struct svc_rdma_write_info { struct svcxprt_rdma *wi_rdma; + struct list_head wi_list; const struct svc_rdma_chunk *wi_chunk; @@ -238,7 +239,10 @@ struct svc_rdma_send_ctxt { struct ib_cqe sc_cqe; struct xdr_buf sc_hdrbuf; struct xdr_stream sc_stream; + + struct list_head sc_write_info_list; struct svc_rdma_write_info sc_reply_info; + void *sc_xprt_buf; int sc_page_count; int sc_cur_sge_no; @@ -270,11 +274,14 @@ extern void svc_rdma_cc_init(struct svcxprt_rdma *rdma, extern void svc_rdma_cc_release(struct svcxprt_rdma *rdma, struct svc_rdma_chunk_ctxt *cc, enum dma_data_direction dir); +extern void svc_rdma_write_chunk_release(struct svcxprt_rdma *rdma, + struct svc_rdma_send_ctxt *ctxt); extern void svc_rdma_reply_chunk_release(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt); -extern int svc_rdma_send_write_list(struct svcxprt_rdma *rdma, - const struct svc_rdma_recv_ctxt *rctxt, - const struct xdr_buf *xdr); +extern int svc_rdma_prepare_write_list(struct svcxprt_rdma *rdma, + const struct svc_rdma_pcl *write_pcl, + struct svc_rdma_send_ctxt *sctxt, + const struct xdr_buf *xdr); extern int svc_rdma_prepare_reply_chunk(struct svcxprt_rdma *rdma, const struct svc_rdma_pcl *write_pcl, const struct svc_rdma_pcl *reply_pcl, diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c index 40797114d50a..f2a100c4c81f 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_rw.c +++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c @@ -230,6 +230,28 @@ static void svc_rdma_write_info_free(struct svc_rdma_write_info *info) queue_work(svcrdma_wq, &info->wi_work); } +/** + * svc_rdma_write_chunk_release - Release Write chunk I/O resources + * @rdma: controlling transport + * @ctxt: Send context that is being released + */ +void svc_rdma_write_chunk_release(struct svcxprt_rdma *rdma, + struct svc_rdma_send_ctxt *ctxt) +{ + struct svc_rdma_write_info *info; + struct svc_rdma_chunk_ctxt *cc; + + while (!list_empty(&ctxt->sc_write_info_list)) { + info = list_first_entry(&ctxt->sc_write_info_list, + struct svc_rdma_write_info, wi_list); + list_del(&info->wi_list); + + cc = &info->wi_cc; + svc_rdma_wake_send_waiters(rdma, cc->cc_sqecount); + svc_rdma_write_info_free(info); + } +} + /** * svc_rdma_reply_chunk_release - Release Reply chunk I/O resources * @rdma: controlling transport @@ -286,13 +308,11 @@ static void svc_rdma_write_done(struct ib_cq *cq, struct ib_wc *wc) struct ib_cqe *cqe = wc->wr_cqe; struct svc_rdma_chunk_ctxt *cc = container_of(cqe, struct svc_rdma_chunk_ctxt, cc_cqe); - struct svc_rdma_write_info *info = - container_of(cc, struct svc_rdma_write_info, wi_cc); switch (wc->status) { case IB_WC_SUCCESS: trace_svcrdma_wc_write(&cc->cc_cid); - break; + return; case IB_WC_WR_FLUSH_ERR: trace_svcrdma_wc_write_flush(wc, &cc->cc_cid); break; @@ -300,12 +320,11 @@ static void svc_rdma_write_done(struct ib_cq *cq, struct ib_wc *wc) trace_svcrdma_wc_write_err(wc, &cc->cc_cid); } - svc_rdma_wake_send_waiters(rdma, cc->cc_sqecount); - - if (unlikely(wc->status != IB_WC_SUCCESS)) - 
svc_xprt_deferred_close(&rdma->sc_xprt); - - svc_rdma_write_info_free(info); + /* The RDMA Write has flushed, so the client won't get + * some of the outgoing RPC message. Signal the loss + * to the client by closing the connection. + */ + svc_xprt_deferred_close(&rdma->sc_xprt); } /** @@ -601,13 +620,19 @@ static int svc_rdma_xb_write(const struct xdr_buf *xdr, void *data) return xdr->len; } -static int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma, - const struct svc_rdma_chunk *chunk, - const struct xdr_buf *xdr) +/* Link Write WRs for @chunk onto @sctxt's WR chain. + */ +static int svc_rdma_prepare_write_chunk(struct svcxprt_rdma *rdma, + struct svc_rdma_send_ctxt *sctxt, + const struct svc_rdma_chunk *chunk, + const struct xdr_buf *xdr) { struct svc_rdma_write_info *info; struct svc_rdma_chunk_ctxt *cc; + struct ib_send_wr *first_wr; struct xdr_buf payload; + struct list_head *pos; + struct ib_cqe *cqe; int ret; if (xdr_buf_subsegment(xdr, &payload, chunk->ch_position, @@ -623,10 +648,25 @@ static int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma, if (ret != payload.len) goto out_err; - trace_svcrdma_post_write_chunk(&cc->cc_cid, cc->cc_sqecount); - ret = svc_rdma_post_chunk_ctxt(rdma, cc); - if (ret < 0) + ret = -EINVAL; + if (unlikely(cc->cc_sqecount > rdma->sc_sq_depth)) goto out_err; + + first_wr = sctxt->sc_wr_chain; + cqe = &cc->cc_cqe; + list_for_each(pos, &cc->cc_rwctxts) { + struct svc_rdma_rw_ctxt *rwc; + + rwc = list_entry(pos, struct svc_rdma_rw_ctxt, rw_list); + first_wr = rdma_rw_ctx_wrs(&rwc->rw_ctx, rdma->sc_qp, + rdma->sc_port_num, cqe, first_wr); + cqe = NULL; + } + sctxt->sc_wr_chain = first_wr; + sctxt->sc_sqecount += cc->cc_sqecount; + list_add(&info->wi_list, &sctxt->sc_write_info_list); + + trace_svcrdma_post_write_chunk(&cc->cc_cid, cc->cc_sqecount); return 0; out_err: @@ -635,25 +675,27 @@ static int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma, } /** - * svc_rdma_send_write_list - Send all chunks on the Write list + * svc_rdma_prepare_write_list - Construct WR chain for sending Write list * @rdma: controlling RDMA transport - * @rctxt: Write list provisioned by the client + * @write_pcl: Write list provisioned by the client + * @sctxt: Send WR resources * @xdr: xdr_buf containing an RPC Reply message * * Returns zero on success, or a negative errno if one or more * Write chunks could not be sent. 
*/ -int svc_rdma_send_write_list(struct svcxprt_rdma *rdma, - const struct svc_rdma_recv_ctxt *rctxt, - const struct xdr_buf *xdr) +int svc_rdma_prepare_write_list(struct svcxprt_rdma *rdma, + const struct svc_rdma_pcl *write_pcl, + struct svc_rdma_send_ctxt *sctxt, + const struct xdr_buf *xdr) { struct svc_rdma_chunk *chunk; int ret; - pcl_for_each_chunk(chunk, &rctxt->rc_write_pcl) { + pcl_for_each_chunk(chunk, write_pcl) { if (!chunk->ch_payload_length) break; - ret = svc_rdma_send_write_chunk(rdma, chunk, xdr); + ret = svc_rdma_prepare_write_chunk(rdma, sctxt, chunk, xdr); if (ret < 0) return ret; } diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c index bb5436b719e0..dfca39abd16c 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c @@ -142,6 +142,7 @@ svc_rdma_send_ctxt_alloc(struct svcxprt_rdma *rdma) ctxt->sc_send_wr.sg_list = ctxt->sc_sges; ctxt->sc_send_wr.send_flags = IB_SEND_SIGNALED; ctxt->sc_cqe.done = svc_rdma_wc_send; + INIT_LIST_HEAD(&ctxt->sc_write_info_list); ctxt->sc_xprt_buf = buffer; xdr_buf_init(&ctxt->sc_hdrbuf, ctxt->sc_xprt_buf, rdma->sc_max_req_size); @@ -227,6 +228,7 @@ static void svc_rdma_send_ctxt_release(struct svcxprt_rdma *rdma, struct ib_device *device = rdma->sc_cm_id->device; unsigned int i; + svc_rdma_write_chunk_release(rdma, ctxt); svc_rdma_reply_chunk_release(rdma, ctxt); if (ctxt->sc_page_count) @@ -1013,7 +1015,8 @@ int svc_rdma_sendto(struct svc_rqst *rqstp) if (!p) goto put_ctxt; - ret = svc_rdma_send_write_list(rdma, rctxt, &rqstp->rq_res); + ret = svc_rdma_prepare_write_list(rdma, &rctxt->rc_write_pcl, sctxt, + &rqstp->rq_res); if (ret < 0) goto put_ctxt;
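
With the full series applied, svc_rdma_sendto() queues every WR for an RPC Reply on sctxt->sc_wr_chain and the chain is posted once. Roughly, condensed from the hunks above (a sketch, not the literal function body):

    /* Queue Write chunk WRs onto the Send context's chain */
    ret = svc_rdma_prepare_write_list(rdma, &rctxt->rc_write_pcl, sctxt,
                                      &rqstp->rq_res);
    if (ret < 0)
            goto put_ctxt;

    /* Queue Reply chunk WRs, if the client provided a Reply chunk */
    rc_size = 0;
    if (!pcl_is_empty(&rctxt->rc_reply_pcl)) {
            ret = svc_rdma_prepare_reply_chunk(rdma, &rctxt->rc_write_pcl,
                                               &rctxt->rc_reply_pcl, sctxt,
                                               &rqstp->rq_res);
            if (ret < 0)
                    goto reply_chunk;
            rc_size = ret;
    }

    /* ... encode the transport header, then svc_rdma_send_reply_msg()
     * hands the whole chain to svc_rdma_post_send(). When the Send
     * context is released, svc_rdma_write_chunk_release() and
     * svc_rdma_reply_chunk_release() free the chunk I/O resources.
     */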