From patchwork Sun Feb 4 23:16:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 13544886 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8FA7ABA33; Sun, 4 Feb 2024 23:16:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088599; cv=none; b=lq+CJMMuae9OfvKjKOsNQA8WUFoTMGUWahwrBRqrSrioPIIBT4Zi3jdvj54XIAM60isZd76wkTQ3HgspuJ6AzyORNWzlG/mmaGsppsCWsmJxT8PhIpdbPvVITGrV9OsCouVSZ60n1A+PbhBMeKIii5PkXAtsFLqTKxkdJSHpn1Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088599; c=relaxed/simple; bh=05MA2D7vfCBiHZrUjlGcpBFOLywfR7Q9Kfb3fvxMbMQ=; h=Subject:From:To:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=XhRBgQxduGpl3gEptbWEgkNsGdxBhWnUPBbPZjdxP6jLZSVcsmNPK2P6OdmApxv1Da/RchzUKwitrsrqZWUzL/hnNVdrw+di0aZ3fDbcBKudH78/FAAj1RltikaHoO5i+5KfS7OGgd3SgeiSzm5cse519RdNn7IB2zRE5TlqFXs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=K6OtCpCs; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="K6OtCpCs" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C962DC433C7; Sun, 4 Feb 2024 23:16:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1707088599; bh=05MA2D7vfCBiHZrUjlGcpBFOLywfR7Q9Kfb3fvxMbMQ=; h=Subject:From:To:Date:In-Reply-To:References:From; b=K6OtCpCsFaPsqALCkMJSC7o8M++/WAEER1Q7LwVDrNaPDuj+W8zNHblUUYngekJGC S+hUg5yHl6NHims6fjYLMk5l5KCKD5qOSeiEJPqdPeIcLwUegQqOkqtGomHtE0T6wU NcKznmhU7i5FbF74Xhu0HsiuTWbbgBWSkcbnZWLvl7RnexkAk9VgoQP+7lJmqSrNh5 IAh/q8IMmuk6TT/XtD8Te+HRkp6POavfVQotJ+WW77sOOwFHKCw48hHf29vhaWOsPM vzxKMq6nrIWiN+Pfnw9TIRSCOlYK3sOOmuUUospTQrT8b6VkMry57nE1EqZTQWVmQc a+6WMlZ+XL1vw== Subject: [PATCH v2 01/12] svcrdma: Reserve an extra WQE for ib_drain_rq() From: Chuck Lever To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org Date: Sun, 04 Feb 2024 18:16:37 -0500 Message-ID: <170708859782.28128.17719864297677716598.stgit@bazille.1015granger.net> In-Reply-To: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> References: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> User-Agent: StGit/1.5 Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Chuck Lever Do as other ULPs already do: ensure there is an extra Receive WQE reserved for the tear-down drain WR. I haven't heard reports of problems but it can't hurt. Note that rq_depth is used to compute the Send Queue depth as well, so this fix should affect both the SQ and RQ. Signed-off-by: Chuck Lever --- net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c index 4f27325ace4a..4a038c7e86f9 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c @@ -415,7 +415,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt) if (newxprt->sc_max_send_sges > dev->attrs.max_send_sge) newxprt->sc_max_send_sges = dev->attrs.max_send_sge; rq_depth = newxprt->sc_max_requests + newxprt->sc_max_bc_requests + - newxprt->sc_recv_batch; + newxprt->sc_recv_batch + 1 /* drain */; if (rq_depth > dev->attrs.max_qp_wr) { rq_depth = dev->attrs.max_qp_wr; newxprt->sc_recv_batch = 1; From patchwork Sun Feb 4 23:16:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 13544887 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D4AE4BE58; Sun, 4 Feb 2024 23:16:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088606; cv=none; b=S9K+WCKfoFXsLjTgvB0sstxp1IX3LoaujRIgqECuxZ1SumQgjPWR98OJJPG12R94X4nC5f23/w4t8PaPmrAyFK5VO+p8Lv9Sr8WJ38mVOzGlFrKmhrlaRtkxn+DkdCd8OMkf97bkUUZuxkwbrw3LZ3NYv+DS3oQdVbSppO+mqpU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088606; c=relaxed/simple; bh=+gbBmId/6GSOy+HqZ3CYqeEHHRAAh5ME6lwcXFo6LdY=; h=Subject:From:To:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=iKVgCdEed/j/yhiZJtLwha9aJFvFNfvirg8EW1XslysDMDQ6SegIAiNw3VXH+neSqxaZLGY0b8U/dLj4DEvGWkkbQpKiOBj28gEYNkZelk+oMZW4eftnbQRjAoh1u8/ImxsadIOJSR8P7vp+q+H7xGBigJHTZwYSkPv4DfLMMPU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=GC8XHjpX; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="GC8XHjpX" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2F776C433C7; Sun, 4 Feb 2024 23:16:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1707088605; bh=+gbBmId/6GSOy+HqZ3CYqeEHHRAAh5ME6lwcXFo6LdY=; h=Subject:From:To:Date:In-Reply-To:References:From; b=GC8XHjpXosjQrLJIjGltbvC7QElyh8Xi0Agf2miE+g8MsZ0yLSf/5dPX3pD+ShyWA daeJb46g7odeVuwIHnecFrSSYNiKKL7M+bSlxAE2NhEUx1xsCbYxJL54t0Vd0WoRqw ruJWE4dgflNquTeHrpTBoTOUIlg3wJEXyhHsBlKuXWY4BVHu6Jqri/hy3gvk/eT++7 UahILIkeQIwZMnoBINLx2uzclwkI2Rctm1J7In+nwsGRa9958MG8Xpy7iUM6hdji+X wXIxcIWOrIYNGC576A30N9Jxaxe0MRbrpQDR+LMW9TUM4XfmCUgGUtxzoEIdJURenv mhxC+KKkkdXZw== Subject: [PATCH v2 02/12] svcrdma: Report CQ depths in debugging output From: Chuck Lever To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org Date: Sun, 04 Feb 2024 18:16:44 -0500 Message-ID: <170708860419.28128.11517073619616677423.stgit@bazille.1015granger.net> In-Reply-To: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> References: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> User-Agent: StGit/1.5 Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Chuck Lever Check that svc_rdma_accept() is allocating an appropriate number of CQEs. Signed-off-by: Chuck Lever --- net/sunrpc/xprtrdma/svc_rdma_transport.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c index 4a038c7e86f9..8be0493797cf 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c @@ -460,7 +460,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt) qp_attr.cap.max_send_wr, qp_attr.cap.max_recv_wr); dprintk(" cap.max_send_sge = %d, cap.max_recv_sge = %d\n", qp_attr.cap.max_send_sge, qp_attr.cap.max_recv_sge); - + dprintk(" send CQ depth = %u, recv CQ depth = %u\n", + newxprt->sc_sq_depth, rq_depth); ret = rdma_create_qp(newxprt->sc_cm_id, newxprt->sc_pd, &qp_attr); if (ret) { trace_svcrdma_qp_err(newxprt, ret); From patchwork Sun Feb 4 23:16:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 13544888 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1B31AD510; Sun, 4 Feb 2024 23:16:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088612; cv=none; b=MIwA+qs2fubfTUn96/DsIMbp7MDukhN6MyWXSZoVegch/oRdLSIOqJebxc2DWshMg7b+aOWLAvwJlOZjuxFuYR+3c6Syn9h11Y0WCw5iXAU8LcsQ5eTMyzV+kaCPUxzOzhn3F/QQFLSeVfXgyJW5EQ5+2oFqDG8Vp1J11jcPx9M= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088612; c=relaxed/simple; bh=6BFFnRKPQsFPc2U3vG/2FJmT0Fq08EU7vYWqhaAf2dw=; h=Subject:From:To:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=B7vVBQNXTlm90lo5KiOGKeofLwJ2saCujOdKxq2ka22HmKcn3L6jqWPIlLJ1gQ1CowxDqPqPuDYrt9ujMUCSP73Cex01E4n4ZYcLXzccTozXO6sDdpmgrWV94lXZEWSylWcrlcmlBTTvC2y9E6Hgc7RCVNabYfcouBVUuFJRN7k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=bKbEjRhM; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="bKbEjRhM" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7EB1FC43390; Sun, 4 Feb 2024 23:16:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1707088611; bh=6BFFnRKPQsFPc2U3vG/2FJmT0Fq08EU7vYWqhaAf2dw=; h=Subject:From:To:Date:In-Reply-To:References:From; b=bKbEjRhM0GN46T0/8wXeWlF5/NxhtkIRyBA0cADLf38/X0LmE5Z1XoSsmZ8ltOfmf Ab2AOeT11TPatLQ/AgqN85hYpBXdXyL7/gYRIKetWvXZslXsQpaTJT6OIbV2C1K87C uwXPbfcURSjA9AkFBIQLm+qm0mmK05B6HBKXEFBlDcyAvN5QkMNgcb4CZIBg0Zz//k DsAUZG6vSY5mphwpUBJAp5jywSCqy26saObBZ5WybHSB4bLw2x8whx8zRce9Nc42uo rkpJ6LVPXzdqozMohBwt8sygDDy593g4O5DoayY2knk0RmwQSLV3zNRFJ3rgmz2EAa UZVYckMjB+ohw== Subject: [PATCH v2 03/12] svcrdma: Update max_send_sges after QP is created From: Chuck Lever To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org Date: Sun, 04 Feb 2024 18:16:50 -0500 Message-ID: <170708861056.28128.2372400452643006250.stgit@bazille.1015granger.net> In-Reply-To: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> References: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> User-Agent: StGit/1.5 Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Chuck Lever rdma_create_qp() can modify cap.max_send_sges. Copy the new value to the svcrdma transport so it is bound by the new limit instead of the requested one. Signed-off-by: Chuck Lever --- net/sunrpc/xprtrdma/svc_rdma_transport.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c index 8be0493797cf..839c0e80e5cd 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c @@ -467,6 +467,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt) trace_svcrdma_qp_err(newxprt, ret); goto errout; } + newxprt->sc_max_send_sges = qp_attr.cap.max_send_sge; newxprt->sc_qp = newxprt->sc_cm_id->qp; if (!(dev->attrs.device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS)) From patchwork Sun Feb 4 23:16:56 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 13544889 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 85D7ADDB2; Sun, 4 Feb 2024 23:16:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088618; cv=none; b=B8POd+fUi92Q1F+T5yVcs+BOy5YX2XCESSpducwkzcPwwh20bIPzleS3fWv3JZHZsZTxd+oZ8aGe1XWcYOqmy2B7edWwdVGc2UUUvD2qt+Hsp+Aa1+Gt0iLcwl8ot6llNbInxLqafns5DQrroi0r/zQYDm/S8vtxI1IFnFXEU6Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088618; c=relaxed/simple; bh=+V9dSTnar5Zbg2d9qB3pTwMQy1aH5p5NMLsE+cUSjSw=; h=Subject:From:To:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Dix5Upjb69Tf3a03CfAeQnzf9qoyHV9PJAnJBY38pLHMK4+w34mIV5+vEU/y1+xYmfQYdZpHrq/HU7UK4kR/CLg/uMkvzhR84L8NyfDxODLTuozf2GHdJv0UaEgPJnuf2jsuz99ovVRty+6habm5Inrub8ZzLkumVJiL4DK9sho= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Cc0IUMvV; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Cc0IUMvV" Received: by smtp.kernel.org (Postfix) with ESMTPSA id D5C52C433C7; Sun, 4 Feb 2024 23:16:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1707088618; bh=+V9dSTnar5Zbg2d9qB3pTwMQy1aH5p5NMLsE+cUSjSw=; h=Subject:From:To:Date:In-Reply-To:References:From; b=Cc0IUMvVz4uDtfTNW7ITk0Im4g2H4o8cwHMHYwy+la43eePM5s4QNhgtdfzLQ66bx wH7sDhiR8yT0gFEUFkS2QrY+Gz5CvM2kTeY2GRR2qG7dWt2SA02zH04ZP+4khLbBAZ 7amgAbkAE8X7Y/733KINY4lFLmOWwfY8ITF9zHLtKNc6yQ+1bFUYd0kN2o97lh8FpX ueEb5xD5KmtZPqmBGyn23wWx/58NHh75OlOxAo5CUtJTv786JVdy9p2bMYnJMQODmN /qgiVIogXxV/WHB7BjPoWzkD21ZDp8tA7gX55UmhF0YaeiZcnBrpuyk6h6T4G0iVCG CUigivrIca2pw== Subject: [PATCH v2 04/12] svcrdma: Increase the per-transport rw_ctx count From: Chuck Lever To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org Date: Sun, 04 Feb 2024 18:16:56 -0500 Message-ID: <170708861688.28128.16380294131274226696.stgit@bazille.1015granger.net> In-Reply-To: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> References: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> User-Agent: StGit/1.5 Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Chuck Lever rdma_rw_mr_factor() returns the smallest number of MRs needed to move a particular number of pages. svcrdma currently asks for the number of MRs needed to move RPCSVC_MAXPAGES (a little over one megabyte), as that is the number of pages in the largest r/wsize the server supports. This call assumes that the client's NIC can bundle a full one megabyte payload in a single rdma_segment. In fact, most NICs cannot handle a full megabyte with a single rkey / rdma_segment. Clients will typically split even a single Read chunk into many segments. The server needs one MR to read each rdma_segment in a Read chunk, and thus each one needs an rw_ctx. svcrdma has been vastly underestimating the number of rw_ctxs needed to handle 64 RPC requests with large Read chunks using small rdma_segments. Unfortunately there doesn't seem to be a good way to estimate this number without knowing the client NIC's capabilities. Even then, the client RPC/RDMA implementation is still free to split a chunk into smaller segments (for example, it might be using physical registration, which needs an rdma_segment per page). The best we can do for now is choose a number that will guarantee forward progress in the worst case (one page per segment). At some later point, we could add some mechanisms to make this much less of a problem: - Add a core API to add more rw_ctxs to an already-established QP - svcrdma could treat rw_ctx exhaustion as a temporary error and try again - Limit the number of Reads in flight Signed-off-by: Chuck Lever --- net/sunrpc/xprtrdma/svc_rdma_transport.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c index 839c0e80e5cd..2b1c16b9547d 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c @@ -422,8 +422,13 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt) newxprt->sc_max_requests = rq_depth - 2; newxprt->sc_max_bc_requests = 2; } - ctxts = rdma_rw_mr_factor(dev, newxprt->sc_port_num, RPCSVC_MAXPAGES); - ctxts *= newxprt->sc_max_requests; + + /* Arbitrarily estimate the number of rw_ctxs needed for + * this transport. This is enough rw_ctxs to make forward + * progress even if the client is using one rkey per page + * in each Read chunk. + */ + ctxts = 3 * RPCSVC_MAXPAGES; newxprt->sc_sq_depth = rq_depth + ctxts; if (newxprt->sc_sq_depth > dev->attrs.max_qp_wr) newxprt->sc_sq_depth = dev->attrs.max_qp_wr; From patchwork Sun Feb 4 23:17:03 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 13544890 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F2B92BE49; Sun, 4 Feb 2024 23:17:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088625; cv=none; b=RrZW2QP+c0c4XCl8MzpDo2PFR4uphcvKv4KdFT4jVhB3JXc1QqIukBHSrEtv7Xuk8b9UoAJ/koF5l9AT1qj3NAKH2kuMTZzoeuGToozZfQ2J2qCR33ZyLP7FFAqiKvntquNbeJ3X/cSH9CxXpqIbsx1WOZfhEGcZCp/FQKtHBu0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088625; c=relaxed/simple; bh=bvcVwU0dkB9Z73d5B21G/1JHHanO0TgLYJApJSvI37M=; h=Subject:From:To:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=L1pOIkr7RiD6U+9hlcHAjK6Tq4UmRmaj6HPYlW0LvSGjLcl2SZlvkIQYYBCN8swf5mk3Qsu06EvXYbSz3aHnGWSiLohGERh2CUEOZoIIZ1Y2k7V3lmrtaYm33AT9mOQZ9om60N+fh6JrtFhLuwshBqms53yGMJuSIW3+5wEB04I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=i9Stj1ry; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="i9Stj1ry" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3A825C43394; Sun, 4 Feb 2024 23:17:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1707088624; bh=bvcVwU0dkB9Z73d5B21G/1JHHanO0TgLYJApJSvI37M=; h=Subject:From:To:Date:In-Reply-To:References:From; b=i9Stj1ryMTeI6JfgI/KCa++7zi49XFVdTCVmw6dKB0sUmQRjn5EmqoQV4MUyOdPrV Lng27UEDGr/UIhnFsO0xH+/GTwe54p3CoqQK8P60qepEzkEocFvkyirQajBt7bec5i 3zz9+g/qTHIFmaowRB725Z6gFW1bn4OhKWbILO1cCUpF7GZT/gkPSqQxOEjhSEiwD+ OqGa5T8FT5IC1sZzNzbBo7i6XP3wu9MzhiufKVHr224SYN+e8mUJllHA3dp0UGW9OQ Pcm+b5YqZ4dF2k4nrzew9oitE/fRvskNBeY96+/6AvPjVzJoeIq+EmPAKTzx2c5rKl mAFOOvx/pSeWA== Subject: [PATCH v2 05/12] svcrdma: Fix SQ wake-ups From: Chuck Lever To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org Date: Sun, 04 Feb 2024 18:17:03 -0500 Message-ID: <170708862325.28128.5793674364971023505.stgit@bazille.1015granger.net> In-Reply-To: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> References: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> User-Agent: StGit/1.5 Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Chuck Lever Ensure there is a wake-up when increasing sc_sq_avail. Likewise, if a wake-up is done, sc_sq_avail needs to be updated, otherwise the wait_event() conditional is never going to be met. Signed-off-by: Chuck Lever --- net/sunrpc/xprtrdma/svc_rdma_sendto.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c index 1a49b7f02041..f1f5c7b58fce 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c @@ -335,11 +335,11 @@ int svc_rdma_send(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt) /* If the SQ is full, wait until an SQ entry is available */ while (1) { if ((atomic_dec_return(&rdma->sc_sq_avail) < 0)) { + svc_rdma_wake_send_waiters(rdma, 1); percpu_counter_inc(&svcrdma_stat_sq_starve); trace_svcrdma_sq_full(rdma, &ctxt->sc_cid); - atomic_inc(&rdma->sc_sq_avail); wait_event(rdma->sc_send_wait, - atomic_read(&rdma->sc_sq_avail) > 1); + atomic_read(&rdma->sc_sq_avail) > 0); if (test_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags)) return -ENOTCONN; trace_svcrdma_sq_retry(rdma, &ctxt->sc_cid); @@ -355,7 +355,7 @@ int svc_rdma_send(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt) trace_svcrdma_sq_post_err(rdma, &ctxt->sc_cid, ret); svc_xprt_deferred_close(&rdma->sc_xprt); - wake_up(&rdma->sc_send_wait); + svc_rdma_wake_send_waiters(rdma, 1); return ret; } From patchwork Sun Feb 4 23:17:09 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 13544891 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 46C19D26B; Sun, 4 Feb 2024 23:17:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088631; cv=none; b=Pd5F0HKvZCPr4rjB33joimsTQ/IufFaNohuUV+qoxvkBALl4Jj9KViCtqW0fdCTIUi+bnO/5AwTS3DWQTx9xYrsch8Hp33jXrVMa+qdx0AlCitBgR0Il40K9uiij89x5OfupMwhLhyuY+e0bU1AH7/a1Gnt2L+93/8PGS2NOQhc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088631; c=relaxed/simple; bh=Q8M+ZV8AwjWJN106yxlomJOUg4m9WT4jLLZhalyAC+0=; h=Subject:From:To:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=qS3fo8VkAfzTmuVewQRxNM5ttm0vu2IL2XeXZVWh8n6s/ZaBUhrZTpHmRaf2gaczcIGLltj321TQssm+KxMqL1AWBif7D63CePKFvPY/dVccW3heFDWRmtgQc1q3VzEiIfe6nqD0+8p8KBg5Zq04TAPGj4bXsRHteBTSSgeT76U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=g6wD4lS3; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="g6wD4lS3" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 93E6FC433C7; Sun, 4 Feb 2024 23:17:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1707088630; bh=Q8M+ZV8AwjWJN106yxlomJOUg4m9WT4jLLZhalyAC+0=; h=Subject:From:To:Date:In-Reply-To:References:From; b=g6wD4lS3OawgAmTkiVVgqp3YK9EZ7F9uFjWslDLr+F0qWeAdSaCex55PTBEzIlhj6 pYUAWpsMYDHZDx2nATKIb0NwlQg5LBjZDFZBrGk5jAGH1krKaAiLFfTKTTGjzVXxiC zZcg48CxitlgnzpstxhAbCvKds+0TQ2bQY5VKuJbE30hjTpmcy476n10pH8xQCuyxK AxYzkyxF5Oa5PF6Gdp0PY31OQIksMKpjQ4X/PKcgi7Msg78O5VBepFwrTaCKBrHnOw aSXbyFeAMOaUrmUXtZgXukqgahRjLdxLJAzaChVAc4HSdSM2EjA/rofe+gHFTXCE5m SSTrLIxI8kmSQ== Subject: [PATCH v2 06/12] svcrdma: Prevent a UAF in svc_rdma_send() From: Chuck Lever To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org Date: Sun, 04 Feb 2024 18:17:09 -0500 Message-ID: <170708862961.28128.4635466330893173032.stgit@bazille.1015granger.net> In-Reply-To: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> References: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> User-Agent: StGit/1.5 Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Chuck Lever In some error flow cases, svc_rdma_wc_send() releases @ctxt. Copy the sc_cid field in @ctxt to a stack variable in order to guarantee that the value is available after the ib_post_send() call. In case the new comment looks a little strange, this will be done with at least one more field in a subsequent patch. Signed-off-by: Chuck Lever --- net/sunrpc/xprtrdma/svc_rdma_sendto.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c index f1f5c7b58fce..b6fc9299b472 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c @@ -316,12 +316,17 @@ static void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc) * @rdma: transport on which to post the WR * @ctxt: send ctxt with a Send WR ready to post * + * Copy fields in @ctxt to stack variables in order to guarantee + * that these values remain available after the ib_post_send() call. + * In some error flow cases, svc_rdma_wc_send() releases @ctxt. + * * Returns zero if the Send WR was posted successfully. Otherwise, a * negative errno is returned. */ int svc_rdma_send(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt) { struct ib_send_wr *wr = &ctxt->sc_send_wr; + struct rpc_rdma_cid cid = ctxt->sc_cid; int ret; might_sleep(); @@ -337,12 +342,12 @@ int svc_rdma_send(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt) if ((atomic_dec_return(&rdma->sc_sq_avail) < 0)) { svc_rdma_wake_send_waiters(rdma, 1); percpu_counter_inc(&svcrdma_stat_sq_starve); - trace_svcrdma_sq_full(rdma, &ctxt->sc_cid); + trace_svcrdma_sq_full(rdma, &cid); wait_event(rdma->sc_send_wait, atomic_read(&rdma->sc_sq_avail) > 0); if (test_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags)) return -ENOTCONN; - trace_svcrdma_sq_retry(rdma, &ctxt->sc_cid); + trace_svcrdma_sq_retry(rdma, &cid); continue; } @@ -353,7 +358,7 @@ int svc_rdma_send(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt) return 0; } - trace_svcrdma_sq_post_err(rdma, &ctxt->sc_cid, ret); + trace_svcrdma_sq_post_err(rdma, &cid, ret); svc_xprt_deferred_close(&rdma->sc_xprt); svc_rdma_wake_send_waiters(rdma, 1); return ret; From patchwork Sun Feb 4 23:17:15 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 13544892 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A413BC8CE; Sun, 4 Feb 2024 23:17:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088637; cv=none; b=GxkAKrn3B9EN9D3bB7+qBpmVD4CrPh6/ufOFws0waf8LivIshStWfrME/6v1P/J5q0hw9FExHxiGS25xSmzbFJHq19+Is1I8lZ2muWWQ/chqT+VxNexMG2LfJfRyQpSn/m3Bohz1MwoN1o81gKzaml6Pk+B3VC/kBNjfipURDII= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088637; c=relaxed/simple; bh=AxYa8OHD5oTe38D0Fdj6WOQrLqgmLObq7aN1YDqlrJ0=; h=Subject:From:To:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=p698ngxlxzyFt/Apht89NdMz9GMPH36EEf4ledyGNwBe5g0zEcMItmFt1ZtMHgsnihIBJjsu2tjQzUnqVBOgphjD8QR2tEeJ0wjSstYwzyMqOPkiQMkhsv6af+hVHUvu8NBI0eKaETZRQa+CdfiK0lE3CWEYR0pY/BWOlfJIU2E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=t3ZRPlOD; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="t3ZRPlOD" Received: by smtp.kernel.org (Postfix) with ESMTPSA id EA73DC433F1; Sun, 4 Feb 2024 23:17:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1707088637; bh=AxYa8OHD5oTe38D0Fdj6WOQrLqgmLObq7aN1YDqlrJ0=; h=Subject:From:To:Date:In-Reply-To:References:From; b=t3ZRPlODTXsGMSiWB+An5bh0qOU2tvsBnmzYhorAY7pFj8ZkeBVxPfIvRHdODg0NO aJyun45+cZ/LBjaWOw8jgYYOCa3QoX0Txd/qupOOSWUR7S3N1U5Ur06J7lqiU4t8g+ zqLXiMd7B4xWt4vZletyPnb4fI77wb+i1U5AUFLnXPR7ghSMgqbtQJnidI5lN36pNu ZBwurP2bvI3S/jLHm51lpgjEtbl4hkeaAkEzdhP5nPbZ7aWNDNOJyJUmU1qCI0slu2 Y2UvKN3b4S9LZW8ZQxqbXzVC0KUDL08GT8vQSqc9tcC1FAYnDza7/Tb/WTHeZoot43 MwrvPBd0UY/5g== Subject: [PATCH v2 07/12] svcrdma: Fix retry loop in svc_rdma_send() From: Chuck Lever To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org Date: Sun, 04 Feb 2024 18:17:15 -0500 Message-ID: <170708863598.28128.10828477777645028041.stgit@bazille.1015granger.net> In-Reply-To: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> References: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> User-Agent: StGit/1.5 Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Chuck Lever Don't call ib_post_send() at all if the transport is already shutting down. Signed-off-by: Chuck Lever --- net/sunrpc/xprtrdma/svc_rdma_sendto.c | 28 +++++++++++++++++----------- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c index b6fc9299b472..0ee9185f5f3f 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c @@ -320,8 +320,9 @@ static void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc) * that these values remain available after the ib_post_send() call. * In some error flow cases, svc_rdma_wc_send() releases @ctxt. * - * Returns zero if the Send WR was posted successfully. Otherwise, a - * negative errno is returned. + * Return values: + * %0: @ctxt's WR chain was posted successfully + * %-ENOTCONN: The connection was lost */ int svc_rdma_send(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt) { @@ -338,30 +339,35 @@ int svc_rdma_send(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt) DMA_TO_DEVICE); /* If the SQ is full, wait until an SQ entry is available */ - while (1) { + while (!test_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags)) { if ((atomic_dec_return(&rdma->sc_sq_avail) < 0)) { svc_rdma_wake_send_waiters(rdma, 1); + + /* When the transport is torn down, assume + * ib_drain_sq() will trigger enough Send + * completions to wake us. The XPT_CLOSE test + * above should then cause the while loop to + * exit. + */ percpu_counter_inc(&svcrdma_stat_sq_starve); trace_svcrdma_sq_full(rdma, &cid); wait_event(rdma->sc_send_wait, atomic_read(&rdma->sc_sq_avail) > 0); - if (test_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags)) - return -ENOTCONN; trace_svcrdma_sq_retry(rdma, &cid); continue; } trace_svcrdma_post_send(ctxt); ret = ib_post_send(rdma->sc_qp, wr, NULL); - if (ret) + if (ret) { + trace_svcrdma_sq_post_err(rdma, &cid, ret); + svc_xprt_deferred_close(&rdma->sc_xprt); + svc_rdma_wake_send_waiters(rdma, 1); break; + } return 0; } - - trace_svcrdma_sq_post_err(rdma, &cid, ret); - svc_xprt_deferred_close(&rdma->sc_xprt); - svc_rdma_wake_send_waiters(rdma, 1); - return ret; + return -ENOTCONN; } /** From patchwork Sun Feb 4 23:17:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 13544893 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9BFBCBE5A; Sun, 4 Feb 2024 23:17:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088643; cv=none; b=WwfrJVXoO9dby3VD1DcNnk1/61HWquh+ubSUm96ApC62N/ngXTFzcDuGkL31b2UqUkJwHPqfaL7+a1dvFjoi+TiM/EANxf1Pwoia2/sfIA2S9wHLJASz3k2Sss29YKHsZ/9d1X1mqchwWOTk44eVvFva/vCnQTE1qypIen6CA/Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088643; c=relaxed/simple; bh=rPH08YEEIQsOio03nQ7ZFxrHGUJoeYPgGbyedryQQy8=; h=Subject:From:To:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=eWdA0CiMZyppRkJEryy3RWeFJn5D10sYeOqBXve4eftSc/gEleKk+BeQ9/JieQQhqOlR1wGZW0nfay3nh1fpO93Kvyr2rGRTfellEtakoVvs9UEYZzeZ9l10ALO7C3oW1Ekxga2qB+aHha7paQZS/xmoxB/TNzPL/oslp6pI+QU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=a7L4pjFW; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="a7L4pjFW" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4780DC43390; Sun, 4 Feb 2024 23:17:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1707088643; bh=rPH08YEEIQsOio03nQ7ZFxrHGUJoeYPgGbyedryQQy8=; h=Subject:From:To:Date:In-Reply-To:References:From; b=a7L4pjFWARGDlbMwDPk0Isjj0xUzEKgWZp9hDiRlVG+nvp56TKOFhUGusv236jcQX 6z/xSQdP3v66czVARKDUkCdwzGZgigq5LxvDhdOH/eT9TCHttFyTPvodp9NoSwuGzv 9hDKZ8F/HdUlcoxhayykCDzQMSyW6F9bJ3H+lSJyMEycuUWoPg7kl7k3zxeO31l+kT kp1qjT2fBfHjkZm6Vxdv5FOuouClLAPzkRhiMbzoeayxwE5r1W9y7HD78lL7ynSYfk YlS1Mq6L2Y7vjwN5Ozzr2uvpQ5UV4rAHfQ6PT5cnkOcAeS3DBLSAMGYZldVwpWy8rS zAt/5Ugdi4AAg== Subject: [PATCH v2 08/12] svcrdma: Post Send WR chain From: Chuck Lever To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org Date: Sun, 04 Feb 2024 18:17:22 -0500 Message-ID: <170708864233.28128.8304695511753070556.stgit@bazille.1015granger.net> In-Reply-To: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> References: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> User-Agent: StGit/1.5 Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Chuck Lever Eventually I'd like the server to post the reply's Send WR along with any Write WRs using only a single call to ib_post_send(), in order to reduce the NIC's doorbell rate. To do this, add an anchor for a WR chain to svc_rdma_send_ctxt, and refactor svc_rdma_send() to post this WR chain to the Send Queue. For the moment, the posted chain will continue to contain a single Send WR. Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc_rdma.h | 6 ++- net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 2 + net/sunrpc/xprtrdma/svc_rdma_sendto.c | 49 +++++++++++++++++++--------- 3 files changed, 38 insertions(+), 19 deletions(-) diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h index e7595ae62fe2..ee05087d6499 100644 --- a/include/linux/sunrpc/svc_rdma.h +++ b/include/linux/sunrpc/svc_rdma.h @@ -210,6 +210,8 @@ struct svc_rdma_send_ctxt { struct svcxprt_rdma *sc_rdma; struct ib_send_wr sc_send_wr; + struct ib_send_wr *sc_wr_chain; + int sc_sqecount; struct ib_cqe sc_cqe; struct xdr_buf sc_hdrbuf; struct xdr_stream sc_stream; @@ -258,8 +260,8 @@ extern struct svc_rdma_send_ctxt * svc_rdma_send_ctxt_get(struct svcxprt_rdma *rdma); extern void svc_rdma_send_ctxt_put(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt); -extern int svc_rdma_send(struct svcxprt_rdma *rdma, - struct svc_rdma_send_ctxt *ctxt); +extern int svc_rdma_post_send(struct svcxprt_rdma *rdma, + struct svc_rdma_send_ctxt *ctxt); extern int svc_rdma_map_reply_msg(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *sctxt, const struct svc_rdma_pcl *write_pcl, diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c index c9be6778643b..e5a78b761012 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c @@ -90,7 +90,7 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma, */ get_page(virt_to_page(rqst->rq_buffer)); sctxt->sc_send_wr.opcode = IB_WR_SEND; - return svc_rdma_send(rdma, sctxt); + return svc_rdma_post_send(rdma, sctxt); } /* Server-side transport endpoint wants a whole page for its send diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c index 0ee9185f5f3f..0f02fb09d5b0 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c @@ -208,6 +208,9 @@ struct svc_rdma_send_ctxt *svc_rdma_send_ctxt_get(struct svcxprt_rdma *rdma) ctxt->sc_send_wr.num_sge = 0; ctxt->sc_cur_sge_no = 0; ctxt->sc_page_count = 0; + ctxt->sc_wr_chain = &ctxt->sc_send_wr; + ctxt->sc_sqecount = 1; + return ctxt; out_empty: @@ -293,7 +296,7 @@ static void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc) struct svc_rdma_send_ctxt *ctxt = container_of(cqe, struct svc_rdma_send_ctxt, sc_cqe); - svc_rdma_wake_send_waiters(rdma, 1); + svc_rdma_wake_send_waiters(rdma, ctxt->sc_sqecount); if (unlikely(wc->status != IB_WC_SUCCESS)) goto flushed; @@ -312,36 +315,44 @@ static void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc) } /** - * svc_rdma_send - Post a single Send WR - * @rdma: transport on which to post the WR - * @ctxt: send ctxt with a Send WR ready to post + * svc_rdma_post_send - Post a WR chain to the Send Queue + * @rdma: transport context + * @ctxt: WR chain to post * * Copy fields in @ctxt to stack variables in order to guarantee * that these values remain available after the ib_post_send() call. * In some error flow cases, svc_rdma_wc_send() releases @ctxt. * + * Note there is potential for starvation when the Send Queue is + * full because there is no order to when waiting threads are + * awoken. The transport is typically provisioned with a deep + * enough Send Queue that SQ exhaustion should be a rare event. + * * Return values: * %0: @ctxt's WR chain was posted successfully * %-ENOTCONN: The connection was lost */ -int svc_rdma_send(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt) +int svc_rdma_post_send(struct svcxprt_rdma *rdma, + struct svc_rdma_send_ctxt *ctxt) { - struct ib_send_wr *wr = &ctxt->sc_send_wr; + struct ib_send_wr *first_wr = ctxt->sc_wr_chain; + struct ib_send_wr *send_wr = &ctxt->sc_send_wr; + const struct ib_send_wr *bad_wr = first_wr; struct rpc_rdma_cid cid = ctxt->sc_cid; - int ret; + int ret, sqecount = ctxt->sc_sqecount; might_sleep(); /* Sync the transport header buffer */ ib_dma_sync_single_for_device(rdma->sc_pd->device, - wr->sg_list[0].addr, - wr->sg_list[0].length, + send_wr->sg_list[0].addr, + send_wr->sg_list[0].length, DMA_TO_DEVICE); /* If the SQ is full, wait until an SQ entry is available */ while (!test_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags)) { - if ((atomic_dec_return(&rdma->sc_sq_avail) < 0)) { - svc_rdma_wake_send_waiters(rdma, 1); + if (atomic_sub_return(sqecount, &rdma->sc_sq_avail) < 0) { + svc_rdma_wake_send_waiters(rdma, sqecount); /* When the transport is torn down, assume * ib_drain_sq() will trigger enough Send @@ -358,12 +369,18 @@ int svc_rdma_send(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt) } trace_svcrdma_post_send(ctxt); - ret = ib_post_send(rdma->sc_qp, wr, NULL); + ret = ib_post_send(rdma->sc_qp, first_wr, &bad_wr); if (ret) { trace_svcrdma_sq_post_err(rdma, &cid, ret); svc_xprt_deferred_close(&rdma->sc_xprt); - svc_rdma_wake_send_waiters(rdma, 1); - break; + + /* If even one WR was posted, there will be a + * Send completion that bumps sc_sq_avail. + */ + if (bad_wr == first_wr) { + svc_rdma_wake_send_waiters(rdma, sqecount); + break; + } } return 0; } @@ -884,7 +901,7 @@ static int svc_rdma_send_reply_msg(struct svcxprt_rdma *rdma, sctxt->sc_send_wr.opcode = IB_WR_SEND; } - return svc_rdma_send(rdma, sctxt); + return svc_rdma_post_send(rdma, sctxt); } /** @@ -948,7 +965,7 @@ void svc_rdma_send_error_msg(struct svcxprt_rdma *rdma, sctxt->sc_send_wr.num_sge = 1; sctxt->sc_send_wr.opcode = IB_WR_SEND; sctxt->sc_sges[0].length = sctxt->sc_hdrbuf.len; - if (svc_rdma_send(rdma, sctxt)) + if (svc_rdma_post_send(rdma, sctxt)) goto put_ctxt; return; From patchwork Sun Feb 4 23:17:28 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 13544894 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 53984BE5A; Sun, 4 Feb 2024 23:17:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088650; cv=none; b=cCFV4Vo4bFaJ5lv0IzEwbddLRF6E5X+GFEv+I5tnCkwX2a0mtKvaWKt0IsW8Zrj0on1kmMc00QJmRfgb7t6upoxQ7wRxiTqPtVwAiRx0BBnCPyMURt6Nj6ahduIUZAU5tV1QYz9SX5nLHOyBktK9eZM8MzrPuCkObcyyb7tLzzk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088650; c=relaxed/simple; bh=fACfSyyk+VkhtS87Z3tlQNfR2YjBL1g1e2LN2YXkIm0=; h=Subject:From:To:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=K8J7B2XTFfTmDvyIBEUVlGOqs8UtHZrMoLfjh+R9sR3Udlr1EX/Ab+w6vBjSG8bjpphKFk059voga/FCaK9JXcdIF/CCOkij7eg3/Zc24CUgD2LdCfO6SgL0pORn+ls/rVQlUN3S+iq7yNkV246JhC2QqZjwKjjefdbQlWE1oyE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=SaGyc8Et; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="SaGyc8Et" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9163EC433F1; Sun, 4 Feb 2024 23:17:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1707088649; bh=fACfSyyk+VkhtS87Z3tlQNfR2YjBL1g1e2LN2YXkIm0=; h=Subject:From:To:Date:In-Reply-To:References:From; b=SaGyc8EtK43fG/zkLUi5EKbu1IcxRI+PBNCw1BLLc9fCBHp+3mMb5AVvjzDncnAoj o9smOakPFALabWZkIZwEqa/5sJtnSKqhhSgjTmMCyyKCfX4M7LtelgnhPdRGAMkG5M p7e3Gk4KgRmVrKsyR8HYfXT1xtTG/wXXhWmMNiluEI+tG2tkrU3UxMD1DXgMybKuOS sdDp3zwtJr4elr97cMlv/1wTrOxVr8Sj6u4rCvECFl5egSpXr2uE+Min30WxUMS4fZ ujMczbKrQM3Y8WT2lDAxdkXpc2eEPOkxLYx4HcyrdOnnkUs42O2wA3GmNb2Cg1ds/F rAfEQjr81EYhw== Subject: [PATCH v2 09/12] svcrdma: Move write_info for Reply chunks into struct svc_rdma_send_ctxt From: Chuck Lever To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org Date: Sun, 04 Feb 2024 18:17:28 -0500 Message-ID: <170708864867.28128.3022494799132344584.stgit@bazille.1015granger.net> In-Reply-To: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> References: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> User-Agent: StGit/1.5 Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Chuck Lever Since the RPC transaction's svc_rdma_send_ctxt will stay around for the duration of the RDMA Write operation, the write_info structure for the Reply chunk can reside in the request's svc_rdma_send_ctxt instead of being allocated separately. Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc_rdma.h | 25 +++++++++ include/trace/events/rpcrdma.h | 4 + net/sunrpc/xprtrdma/svc_rdma_rw.c | 91 +++++++++++++++++++-------------- net/sunrpc/xprtrdma/svc_rdma_sendto.c | 2 - 4 files changed, 82 insertions(+), 40 deletions(-) diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h index ee05087d6499..918cf4fda728 100644 --- a/include/linux/sunrpc/svc_rdma.h +++ b/include/linux/sunrpc/svc_rdma.h @@ -203,6 +203,29 @@ struct svc_rdma_recv_ctxt { struct page *rc_pages[RPCSVC_MAXPAGES]; }; +/* + * State for sending a Write chunk. + * - Tracks progress of writing one chunk over all its segments + * - Stores arguments for the SGL constructor functions + */ +struct svc_rdma_write_info { + struct svcxprt_rdma *wi_rdma; + + const struct svc_rdma_chunk *wi_chunk; + + /* write state of this chunk */ + unsigned int wi_seg_off; + unsigned int wi_seg_no; + + /* SGL constructor arguments */ + const struct xdr_buf *wi_xdr; + unsigned char *wi_base; + unsigned int wi_next_off; + + struct svc_rdma_chunk_ctxt wi_cc; + struct work_struct wi_work; +}; + struct svc_rdma_send_ctxt { struct llist_node sc_node; struct rpc_rdma_cid sc_cid; @@ -215,6 +238,7 @@ struct svc_rdma_send_ctxt { struct ib_cqe sc_cqe; struct xdr_buf sc_hdrbuf; struct xdr_stream sc_stream; + struct svc_rdma_write_info sc_reply_info; void *sc_xprt_buf; int sc_page_count; int sc_cur_sge_no; @@ -249,6 +273,7 @@ extern int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma, const struct xdr_buf *xdr); extern int svc_rdma_send_reply_chunk(struct svcxprt_rdma *rdma, const struct svc_rdma_recv_ctxt *rctxt, + struct svc_rdma_send_ctxt *sctxt, const struct xdr_buf *xdr); extern int svc_rdma_process_read_list(struct svcxprt_rdma *rdma, struct svc_rqst *rqstp, diff --git a/include/trace/events/rpcrdma.h b/include/trace/events/rpcrdma.h index 110c1475c527..027ac3ab457d 100644 --- a/include/trace/events/rpcrdma.h +++ b/include/trace/events/rpcrdma.h @@ -2118,6 +2118,10 @@ DEFINE_SIMPLE_CID_EVENT(svcrdma_wc_write); DEFINE_SEND_FLUSH_EVENT(svcrdma_wc_write_flush); DEFINE_SEND_FLUSH_EVENT(svcrdma_wc_write_err); +DEFINE_SIMPLE_CID_EVENT(svcrdma_wc_reply); +DEFINE_SEND_FLUSH_EVENT(svcrdma_wc_reply_flush); +DEFINE_SEND_FLUSH_EVENT(svcrdma_wc_reply_err); + TRACE_EVENT(svcrdma_qp_error, TP_PROTO( const struct ib_event *event, diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c index c00fcce61d1e..2ca3c6311c5e 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_rw.c +++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c @@ -197,28 +197,6 @@ void svc_rdma_cc_release(struct svcxprt_rdma *rdma, llist_add_batch(first, last, &rdma->sc_rw_ctxts); } -/* State for sending a Write or Reply chunk. - * - Tracks progress of writing one chunk over all its segments - * - Stores arguments for the SGL constructor functions - */ -struct svc_rdma_write_info { - struct svcxprt_rdma *wi_rdma; - - const struct svc_rdma_chunk *wi_chunk; - - /* write state of this chunk */ - unsigned int wi_seg_off; - unsigned int wi_seg_no; - - /* SGL constructor arguments */ - const struct xdr_buf *wi_xdr; - unsigned char *wi_base; - unsigned int wi_next_off; - - struct svc_rdma_chunk_ctxt wi_cc; - struct work_struct wi_work; -}; - static struct svc_rdma_write_info * svc_rdma_write_info_alloc(struct svcxprt_rdma *rdma, const struct svc_rdma_chunk *chunk) @@ -252,6 +230,43 @@ static void svc_rdma_write_info_free(struct svc_rdma_write_info *info) queue_work(svcrdma_wq, &info->wi_work); } +static void svc_rdma_reply_chunk_release(struct svcxprt_rdma *rdma, + struct svc_rdma_chunk_ctxt *cc) +{ + svc_rdma_wake_send_waiters(rdma, cc->cc_sqecount); + svc_rdma_cc_release(rdma, cc, DMA_TO_DEVICE); +} + +/** + * svc_rdma_reply_done - Reply chunk Write completion handler + * @cq: controlling Completion Queue + * @wc: Work Completion report + * + * Pages under I/O are released by a subsequent Send completion. + */ +static void svc_rdma_reply_done(struct ib_cq *cq, struct ib_wc *wc) +{ + struct ib_cqe *cqe = wc->wr_cqe; + struct svc_rdma_chunk_ctxt *cc = + container_of(cqe, struct svc_rdma_chunk_ctxt, cc_cqe); + struct svcxprt_rdma *rdma = cq->cq_context; + + switch (wc->status) { + case IB_WC_SUCCESS: + trace_svcrdma_wc_reply(&cc->cc_cid); + svc_rdma_reply_chunk_release(rdma, cc); + return; + case IB_WC_WR_FLUSH_ERR: + trace_svcrdma_wc_reply_flush(wc, &cc->cc_cid); + break; + default: + trace_svcrdma_wc_reply_err(wc, &cc->cc_cid); + } + + svc_rdma_reply_chunk_release(rdma, cc); + svc_xprt_deferred_close(&rdma->sc_xprt); +} + /** * svc_rdma_write_done - Write chunk completion * @cq: controlling Completion Queue @@ -624,7 +639,8 @@ int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma, /** * svc_rdma_send_reply_chunk - Write all segments in the Reply chunk * @rdma: controlling RDMA transport - * @rctxt: Write and Reply chunks from client + * @rctxt: Write and Reply chunks provisioned by the client + * @sctxt: Send WR resources * @xdr: xdr_buf containing an RPC Reply * * Returns a non-negative number of bytes the chunk consumed, or @@ -636,37 +652,34 @@ int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma, */ int svc_rdma_send_reply_chunk(struct svcxprt_rdma *rdma, const struct svc_rdma_recv_ctxt *rctxt, + struct svc_rdma_send_ctxt *sctxt, const struct xdr_buf *xdr) { - struct svc_rdma_write_info *info; - struct svc_rdma_chunk_ctxt *cc; - struct svc_rdma_chunk *chunk; + struct svc_rdma_write_info *info = &sctxt->sc_reply_info; + struct svc_rdma_chunk_ctxt *cc = &info->wi_cc; int ret; - if (pcl_is_empty(&rctxt->rc_reply_pcl)) - return 0; + if (likely(pcl_is_empty(&rctxt->rc_reply_pcl))) + return 0; /* client provided no Reply chunk */ - chunk = pcl_first_chunk(&rctxt->rc_reply_pcl); - info = svc_rdma_write_info_alloc(rdma, chunk); - if (!info) - return -ENOMEM; - cc = &info->wi_cc; + info->wi_rdma = rdma; + info->wi_chunk = pcl_first_chunk(&rctxt->rc_reply_pcl); + info->wi_seg_off = 0; + info->wi_seg_no = 0; + svc_rdma_cc_init(rdma, &info->wi_cc); + info->wi_cc.cc_cqe.done = svc_rdma_reply_done; ret = pcl_process_nonpayloads(&rctxt->rc_write_pcl, xdr, svc_rdma_xb_write, info); if (ret < 0) - goto out_err; + return ret; trace_svcrdma_post_reply_chunk(&cc->cc_cid, cc->cc_sqecount); ret = svc_rdma_post_chunk_ctxt(rdma, cc); if (ret < 0) - goto out_err; + return ret; return xdr->len; - -out_err: - svc_rdma_write_info_free(info); - return ret; } /** diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c index 0f02fb09d5b0..d8e079be36e2 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c @@ -1012,7 +1012,7 @@ int svc_rdma_sendto(struct svc_rqst *rqstp) if (!p) goto put_ctxt; - ret = svc_rdma_send_reply_chunk(rdma, rctxt, &rqstp->rq_res); + ret = svc_rdma_send_reply_chunk(rdma, rctxt, sctxt, &rqstp->rq_res); if (ret < 0) goto reply_chunk; rc_size = ret; From patchwork Sun Feb 4 23:17:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 13544895 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A3700BE4E; Sun, 4 Feb 2024 23:17:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088656; cv=none; b=A/iT21G6tjRTgLmKSC603BNVK6QLD62YIDDspznbmQeBbm0eZzWac/JR0V6AL2DNDV5LOjO12xtG1awanNMsd+FMjliseCoyvUCzb3iwfFNYUdO1afx3nMCSCvv0gjEGuvlc6JIoUsKbdDR+p1huk5Jvc2dbDwgF7NNupobnM88= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088656; c=relaxed/simple; bh=4V5S/JIS1CnIKji7O7YFq/SoJn6dnb/EkM8BkRXCb1w=; h=Subject:From:To:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=QFXj/EcqkGtqP0b9j4HohJ5kGAo9L6jL4tsvlkfV3QOdyM0Ve9vcxnq8lh9+6u6aFoDlB2IB8tDph2ozXnD2RDawugHZqrsDijX70pjlZ7suzyCj2KtTbYEDD3j0hXo52LdIV8KNsg96bc3BDR1l4eQUMH998HsipFOHbm1O+8s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=mD32fxV9; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="mD32fxV9" Received: by smtp.kernel.org (Postfix) with ESMTPSA id F0BB8C433C7; Sun, 4 Feb 2024 23:17:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1707088656; bh=4V5S/JIS1CnIKji7O7YFq/SoJn6dnb/EkM8BkRXCb1w=; h=Subject:From:To:Date:In-Reply-To:References:From; b=mD32fxV9IWxVF1iTZBIPh70isLNERx4GJy0VuaEkTCi/Xc/SWPvpvfGWE6XiIZ5Hw Co2jC0uMHOdMZKq8DfeNeDAIKxlncEOZTnOna1Ds//yovVibgGePUf3ChtaKf2sl8I feQucCyIOLL6uRrJKYhthgqfEKBxTnf7KUp6GWAuI2DP4eh8Tdm0S90eyRxM+nlaGi i4rZp5QJSl6eGVT2aVHws/3ClCHGh9Mcgm6dxI4MrCBvgA1wjpvRCDC/jxg7K+Yocy x5rsHJ5jmHZI9lP6A2iYbWJ7M85ShxLHP6DyDZlZ93uDFkbOcldHF4n7CsgnBsDvzM xLiA7+OZCgRhw== Subject: [PATCH v2 10/12] svcrdma: Post the Reply chunk and Send WR together From: Chuck Lever To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org Date: Sun, 04 Feb 2024 18:17:34 -0500 Message-ID: <170708865497.28128.14505363034918718792.stgit@bazille.1015granger.net> In-Reply-To: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> References: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> User-Agent: StGit/1.5 Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Chuck Lever Reduce the doorbell and Send completion rates when sending RPC/RDMA replies that have Reply chunks. NFS READDIR procedures typically return their result in a Reply chunk, for example. Instead of calling ib_post_send() to post the Write WRs for the Reply chunk, and then calling it again to post the Send WR that conveys the transport header, chain the Write WRs to the Send WR and call ib_post_send() only once. Thanks to the Send Queue completion ordering rules, when the Send WR completes, that guarantees that Write WRs posted before it have also completed successfully. Thus all Write WRs for the Reply chunk can remain unsignaled. Instead of handling a Write completion and then a Send completion, only the Send completion is seen, and it handles clean up for both the Writes and the Send. Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc_rdma.h | 13 +++++-- net/sunrpc/xprtrdma/svc_rdma_rw.c | 58 +++++++++++++++++++++------------ net/sunrpc/xprtrdma/svc_rdma_sendto.c | 34 +++++++++++-------- 3 files changed, 66 insertions(+), 39 deletions(-) diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h index 918cf4fda728..ac882bd23ca2 100644 --- a/include/linux/sunrpc/svc_rdma.h +++ b/include/linux/sunrpc/svc_rdma.h @@ -262,19 +262,24 @@ extern void svc_rdma_release_ctxt(struct svc_xprt *xprt, void *ctxt); extern int svc_rdma_recvfrom(struct svc_rqst *); /* svc_rdma_rw.c */ +extern void svc_rdma_cc_init(struct svcxprt_rdma *rdma, + struct svc_rdma_chunk_ctxt *cc); extern void svc_rdma_destroy_rw_ctxts(struct svcxprt_rdma *rdma); extern void svc_rdma_cc_init(struct svcxprt_rdma *rdma, struct svc_rdma_chunk_ctxt *cc); extern void svc_rdma_cc_release(struct svcxprt_rdma *rdma, struct svc_rdma_chunk_ctxt *cc, enum dma_data_direction dir); +extern void svc_rdma_reply_chunk_release(struct svcxprt_rdma *rdma, + struct svc_rdma_send_ctxt *ctxt); extern int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma, const struct svc_rdma_chunk *chunk, const struct xdr_buf *xdr); -extern int svc_rdma_send_reply_chunk(struct svcxprt_rdma *rdma, - const struct svc_rdma_recv_ctxt *rctxt, - struct svc_rdma_send_ctxt *sctxt, - const struct xdr_buf *xdr); +extern int svc_rdma_prepare_reply_chunk(struct svcxprt_rdma *rdma, + const struct svc_rdma_pcl *write_pcl, + const struct svc_rdma_pcl *reply_pcl, + struct svc_rdma_send_ctxt *sctxt, + const struct xdr_buf *xdr); extern int svc_rdma_process_read_list(struct svcxprt_rdma *rdma, struct svc_rqst *rqstp, struct svc_rdma_recv_ctxt *head); diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c index 2ca3c6311c5e..2b25edc6c73c 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_rw.c +++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c @@ -230,10 +230,18 @@ static void svc_rdma_write_info_free(struct svc_rdma_write_info *info) queue_work(svcrdma_wq, &info->wi_work); } -static void svc_rdma_reply_chunk_release(struct svcxprt_rdma *rdma, - struct svc_rdma_chunk_ctxt *cc) +/** + * svc_rdma_reply_chunk_release - Release Reply chunk I/O resources + * @rdma: controlling transport + * @ctxt: Send context that is being released + */ +void svc_rdma_reply_chunk_release(struct svcxprt_rdma *rdma, + struct svc_rdma_send_ctxt *ctxt) { - svc_rdma_wake_send_waiters(rdma, cc->cc_sqecount); + struct svc_rdma_chunk_ctxt *cc = &ctxt->sc_reply_info.wi_cc; + + if (!cc->cc_sqecount) + return; svc_rdma_cc_release(rdma, cc, DMA_TO_DEVICE); } @@ -254,7 +262,6 @@ static void svc_rdma_reply_done(struct ib_cq *cq, struct ib_wc *wc) switch (wc->status) { case IB_WC_SUCCESS: trace_svcrdma_wc_reply(&cc->cc_cid); - svc_rdma_reply_chunk_release(rdma, cc); return; case IB_WC_WR_FLUSH_ERR: trace_svcrdma_wc_reply_flush(wc, &cc->cc_cid); @@ -263,7 +270,6 @@ static void svc_rdma_reply_done(struct ib_cq *cq, struct ib_wc *wc) trace_svcrdma_wc_reply_err(wc, &cc->cc_cid); } - svc_rdma_reply_chunk_release(rdma, cc); svc_xprt_deferred_close(&rdma->sc_xprt); } @@ -637,9 +643,10 @@ int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma, } /** - * svc_rdma_send_reply_chunk - Write all segments in the Reply chunk + * svc_rdma_prepare_reply_chunk - Construct WR chain for writing the Reply chunk * @rdma: controlling RDMA transport - * @rctxt: Write and Reply chunks provisioned by the client + * @write_pcl: Write chunk list provided by client + * @reply_pcl: Reply chunk provided by client * @sctxt: Send WR resources * @xdr: xdr_buf containing an RPC Reply * @@ -650,35 +657,44 @@ int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma, * %-ENOTCONN if posting failed (connection is lost), * %-EIO if rdma_rw initialization failed (DMA mapping, etc). */ -int svc_rdma_send_reply_chunk(struct svcxprt_rdma *rdma, - const struct svc_rdma_recv_ctxt *rctxt, - struct svc_rdma_send_ctxt *sctxt, - const struct xdr_buf *xdr) +int svc_rdma_prepare_reply_chunk(struct svcxprt_rdma *rdma, + const struct svc_rdma_pcl *write_pcl, + const struct svc_rdma_pcl *reply_pcl, + struct svc_rdma_send_ctxt *sctxt, + const struct xdr_buf *xdr) { struct svc_rdma_write_info *info = &sctxt->sc_reply_info; struct svc_rdma_chunk_ctxt *cc = &info->wi_cc; + struct ib_send_wr *first_wr; + struct list_head *pos; + struct ib_cqe *cqe; int ret; - if (likely(pcl_is_empty(&rctxt->rc_reply_pcl))) - return 0; /* client provided no Reply chunk */ - info->wi_rdma = rdma; - info->wi_chunk = pcl_first_chunk(&rctxt->rc_reply_pcl); + info->wi_chunk = pcl_first_chunk(reply_pcl); info->wi_seg_off = 0; info->wi_seg_no = 0; - svc_rdma_cc_init(rdma, &info->wi_cc); info->wi_cc.cc_cqe.done = svc_rdma_reply_done; - ret = pcl_process_nonpayloads(&rctxt->rc_write_pcl, xdr, + ret = pcl_process_nonpayloads(write_pcl, xdr, svc_rdma_xb_write, info); if (ret < 0) return ret; - trace_svcrdma_post_reply_chunk(&cc->cc_cid, cc->cc_sqecount); - ret = svc_rdma_post_chunk_ctxt(rdma, cc); - if (ret < 0) - return ret; + first_wr = sctxt->sc_wr_chain; + cqe = &cc->cc_cqe; + list_for_each(pos, &cc->cc_rwctxts) { + struct svc_rdma_rw_ctxt *rwc; + rwc = list_entry(pos, struct svc_rdma_rw_ctxt, rw_list); + first_wr = rdma_rw_ctx_wrs(&rwc->rw_ctx, rdma->sc_qp, + rdma->sc_port_num, cqe, first_wr); + cqe = NULL; + } + sctxt->sc_wr_chain = first_wr; + sctxt->sc_sqecount += cc->cc_sqecount; + + trace_svcrdma_post_reply_chunk(&cc->cc_cid, cc->cc_sqecount); return xdr->len; } diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c index d8e079be36e2..6dfd2232ce5b 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c @@ -205,6 +205,7 @@ struct svc_rdma_send_ctxt *svc_rdma_send_ctxt_get(struct svcxprt_rdma *rdma) xdr_init_encode(&ctxt->sc_stream, &ctxt->sc_hdrbuf, ctxt->sc_xprt_buf, NULL); + svc_rdma_cc_init(rdma, &ctxt->sc_reply_info.wi_cc); ctxt->sc_send_wr.num_sge = 0; ctxt->sc_cur_sge_no = 0; ctxt->sc_page_count = 0; @@ -226,6 +227,8 @@ static void svc_rdma_send_ctxt_release(struct svcxprt_rdma *rdma, struct ib_device *device = rdma->sc_cm_id->device; unsigned int i; + svc_rdma_reply_chunk_release(rdma, ctxt); + if (ctxt->sc_page_count) release_pages(ctxt->sc_pages, ctxt->sc_page_count); @@ -867,16 +870,10 @@ static void svc_rdma_save_io_pages(struct svc_rqst *rqstp, * in sc_sges[0], and the RPC xdr_buf is prepared in following sges. * * Depending on whether a Write list or Reply chunk is present, - * the server may send all, a portion of, or none of the xdr_buf. + * the server may Send all, a portion of, or none of the xdr_buf. * In the latter case, only the transport header (sc_sges[0]) is * transmitted. * - * RDMA Send is the last step of transmitting an RPC reply. Pages - * involved in the earlier RDMA Writes are here transferred out - * of the rqstp and into the sctxt's page array. These pages are - * DMA unmapped by each Write completion, but the subsequent Send - * completion finally releases these pages. - * * Assumptions: * - The Reply's transport header will never be larger than a page. */ @@ -885,6 +882,7 @@ static int svc_rdma_send_reply_msg(struct svcxprt_rdma *rdma, const struct svc_rdma_recv_ctxt *rctxt, struct svc_rqst *rqstp) { + struct ib_send_wr *send_wr = &sctxt->sc_send_wr; int ret; ret = svc_rdma_map_reply_msg(rdma, sctxt, &rctxt->rc_write_pcl, @@ -892,13 +890,16 @@ static int svc_rdma_send_reply_msg(struct svcxprt_rdma *rdma, if (ret < 0) return ret; + /* Transfer pages involved in RDMA Writes to the sctxt's + * page array. Completion handling releases these pages. + */ svc_rdma_save_io_pages(rqstp, sctxt); if (rctxt->rc_inv_rkey) { - sctxt->sc_send_wr.opcode = IB_WR_SEND_WITH_INV; - sctxt->sc_send_wr.ex.invalidate_rkey = rctxt->rc_inv_rkey; + send_wr->opcode = IB_WR_SEND_WITH_INV; + send_wr->ex.invalidate_rkey = rctxt->rc_inv_rkey; } else { - sctxt->sc_send_wr.opcode = IB_WR_SEND; + send_wr->opcode = IB_WR_SEND; } return svc_rdma_post_send(rdma, sctxt); @@ -1012,10 +1013,15 @@ int svc_rdma_sendto(struct svc_rqst *rqstp) if (!p) goto put_ctxt; - ret = svc_rdma_send_reply_chunk(rdma, rctxt, sctxt, &rqstp->rq_res); - if (ret < 0) - goto reply_chunk; - rc_size = ret; + rc_size = 0; + if (!pcl_is_empty(&rctxt->rc_reply_pcl)) { + ret = svc_rdma_prepare_reply_chunk(rdma, &rctxt->rc_write_pcl, + &rctxt->rc_reply_pcl, sctxt, + &rqstp->rq_res); + if (ret < 0) + goto reply_chunk; + rc_size = ret; + } *p++ = *rdma_argp; *p++ = *(rdma_argp + 1); From patchwork Sun Feb 4 23:17:41 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 13544896 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 06128BE58; Sun, 4 Feb 2024 23:17:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088663; cv=none; b=XKXik1BtBxOOXWNlFHgZPZurBtGHoROsGwFyofZXurytLpdD2ZIihMy12dO2yIaCv1SW5V0b0B6A1ujDEkogRPpQBzQvxw5Z5mvjrSHYYGRuhGGy6rWYt0l40dNt4kR1w34WWorES+3wCnyy8BtOofGCyUFyDa+TKvj8Xupe68c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088663; c=relaxed/simple; bh=L0X3dLyOPy67nt8GCLWy13o2002WEL1KVLp4Q4jt1ZU=; h=Subject:From:To:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=OL0bmSMf7yuEdIy6Nzkilyz9oqGwPr/ZK7P6yDbeogvmor69uXPVFWMZAKNAvoW3xArWZOFgVZZhC1drwCPw14AsZDfn3tjkzFSWLijUSYUtvPisKQDgTCDC/j9DbXqnAEsXKfL8wNugJWfI1YKx8a3Rnq8+cZbA/1ndSuUbmjw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=OiDQVBon; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="OiDQVBon" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 58B49C433F1; Sun, 4 Feb 2024 23:17:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1707088662; bh=L0X3dLyOPy67nt8GCLWy13o2002WEL1KVLp4Q4jt1ZU=; h=Subject:From:To:Date:In-Reply-To:References:From; b=OiDQVBonmoILjdvYTeiC/d2qZdKkSD2wYmiGUxFRV+7HXlN+XRPu13BPN87HBBOGb fn7KqzvdPaCfJBACDBm7bcza1FLFsNKvi/7Cd5fgUISyO5MTzo8r/6z6iKJxM0rqw/ zKXNHilS3ouFIVId2/GmV0X5Xw2v+DYHRvKQIXWlNUGfpUcStavsYH1DzPwOWrEsHS /xXzgflPZfHi790elyIRnwibSUylstzNx0E1+ZqNnxriA5Zuu2oFEIxGTOTbYeCYSh P1He3LHodUVjj8F8fv3z043+TnCaweF5oGPUNn57O51Qs+SXX7Akxlsae5FLkgHnhm 3nVik/x18yqTg== Subject: [PATCH v2 11/12] svcrdma: Post WRs for Write chunks in svc_rdma_sendto() From: Chuck Lever To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org Date: Sun, 04 Feb 2024 18:17:41 -0500 Message-ID: <170708866136.28128.13586205078240226182.stgit@bazille.1015granger.net> In-Reply-To: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> References: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> User-Agent: StGit/1.5 Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Chuck Lever Refactor to eventually enable svcrdma to post the Write WRs for each RPC response using the same ib_post_send() as the Send WR (ie, as a single WR chain). svc_rdma_result_payload (originally svc_rdma_read_payload) was added so that the upper layer XDR encoder could identify a range of bytes to be possibly conveyed by RDMA (if a Write chunk was provided by the client). The purpose of commit f6ad77590a5d ("svcrdma: Post RDMA Writes while XDR encoding replies") was to post as much of the result payload outside of svc_rdma_sendto() as possible because svc_rdma_sendto() used to be called with the xpt_mutex held. However, since commit ca4faf543a33 ("SUNRPC: Move xpt_mutex into socket xpo_sendto methods"), the xpt_mutex is no longer held when calling svc_rdma_sendto(). Thus, that benefit is no longer an issue. Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc_rdma.h | 6 ++-- net/sunrpc/xprtrdma/svc_rdma_rw.c | 56 ++++++++++++++++++++++----------- net/sunrpc/xprtrdma/svc_rdma_sendto.c | 30 ++++++------------ 3 files changed, 51 insertions(+), 41 deletions(-) diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h index ac882bd23ca2..d33bab33099a 100644 --- a/include/linux/sunrpc/svc_rdma.h +++ b/include/linux/sunrpc/svc_rdma.h @@ -272,9 +272,9 @@ extern void svc_rdma_cc_release(struct svcxprt_rdma *rdma, enum dma_data_direction dir); extern void svc_rdma_reply_chunk_release(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt); -extern int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma, - const struct svc_rdma_chunk *chunk, - const struct xdr_buf *xdr); +extern int svc_rdma_send_write_list(struct svcxprt_rdma *rdma, + const struct svc_rdma_recv_ctxt *rctxt, + const struct xdr_buf *xdr); extern int svc_rdma_prepare_reply_chunk(struct svcxprt_rdma *rdma, const struct svc_rdma_pcl *write_pcl, const struct svc_rdma_pcl *reply_pcl, diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c index 2b25edc6c73c..40797114d50a 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_rw.c +++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c @@ -601,47 +601,65 @@ static int svc_rdma_xb_write(const struct xdr_buf *xdr, void *data) return xdr->len; } -/** - * svc_rdma_send_write_chunk - Write all segments in a Write chunk - * @rdma: controlling RDMA transport - * @chunk: Write chunk provided by the client - * @xdr: xdr_buf containing the data payload - * - * Returns a non-negative number of bytes the chunk consumed, or - * %-E2BIG if the payload was larger than the Write chunk, - * %-EINVAL if client provided too many segments, - * %-ENOMEM if rdma_rw context pool was exhausted, - * %-ENOTCONN if posting failed (connection is lost), - * %-EIO if rdma_rw initialization failed (DMA mapping, etc). - */ -int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma, - const struct svc_rdma_chunk *chunk, - const struct xdr_buf *xdr) +static int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma, + const struct svc_rdma_chunk *chunk, + const struct xdr_buf *xdr) { struct svc_rdma_write_info *info; struct svc_rdma_chunk_ctxt *cc; + struct xdr_buf payload; int ret; + if (xdr_buf_subsegment(xdr, &payload, chunk->ch_position, + chunk->ch_payload_length)) + return -EMSGSIZE; + info = svc_rdma_write_info_alloc(rdma, chunk); if (!info) return -ENOMEM; cc = &info->wi_cc; - ret = svc_rdma_xb_write(xdr, info); - if (ret != xdr->len) + ret = svc_rdma_xb_write(&payload, info); + if (ret != payload.len) goto out_err; trace_svcrdma_post_write_chunk(&cc->cc_cid, cc->cc_sqecount); ret = svc_rdma_post_chunk_ctxt(rdma, cc); if (ret < 0) goto out_err; - return xdr->len; + return 0; out_err: svc_rdma_write_info_free(info); return ret; } +/** + * svc_rdma_send_write_list - Send all chunks on the Write list + * @rdma: controlling RDMA transport + * @rctxt: Write list provisioned by the client + * @xdr: xdr_buf containing an RPC Reply message + * + * Returns zero on success, or a negative errno if one or more + * Write chunks could not be sent. + */ +int svc_rdma_send_write_list(struct svcxprt_rdma *rdma, + const struct svc_rdma_recv_ctxt *rctxt, + const struct xdr_buf *xdr) +{ + struct svc_rdma_chunk *chunk; + int ret; + + pcl_for_each_chunk(chunk, &rctxt->rc_write_pcl) { + if (!chunk->ch_payload_length) + break; + ret = svc_rdma_send_write_chunk(rdma, chunk, xdr); + if (ret < 0) + return ret; + } + return 0; +} + /** * svc_rdma_prepare_reply_chunk - Construct WR chain for writing the Reply chunk * @rdma: controlling RDMA transport diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c index 6dfd2232ce5b..bb5436b719e0 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c @@ -1013,6 +1013,10 @@ int svc_rdma_sendto(struct svc_rqst *rqstp) if (!p) goto put_ctxt; + ret = svc_rdma_send_write_list(rdma, rctxt, &rqstp->rq_res); + if (ret < 0) + goto put_ctxt; + rc_size = 0; if (!pcl_is_empty(&rctxt->rc_reply_pcl)) { ret = svc_rdma_prepare_reply_chunk(rdma, &rctxt->rc_write_pcl, @@ -1064,45 +1068,33 @@ int svc_rdma_sendto(struct svc_rqst *rqstp) /** * svc_rdma_result_payload - special processing for a result payload - * @rqstp: svc_rqst to operate on - * @offset: payload's byte offset in @xdr + * @rqstp: RPC transaction context + * @offset: payload's byte offset in @rqstp->rq_res * @length: size of payload, in bytes * + * Assign the passed-in result payload to the current Write chunk, + * and advance to cur_result_payload to the next Write chunk, if + * there is one. + * * Return values: * %0 if successful or nothing needed to be done - * %-EMSGSIZE on XDR buffer overflow * %-E2BIG if the payload was larger than the Write chunk - * %-EINVAL if client provided too many segments - * %-ENOMEM if rdma_rw context pool was exhausted - * %-ENOTCONN if posting failed (connection is lost) - * %-EIO if rdma_rw initialization failed (DMA mapping, etc) */ int svc_rdma_result_payload(struct svc_rqst *rqstp, unsigned int offset, unsigned int length) { struct svc_rdma_recv_ctxt *rctxt = rqstp->rq_xprt_ctxt; struct svc_rdma_chunk *chunk; - struct svcxprt_rdma *rdma; - struct xdr_buf subbuf; - int ret; chunk = rctxt->rc_cur_result_payload; if (!length || !chunk) return 0; rctxt->rc_cur_result_payload = pcl_next_chunk(&rctxt->rc_write_pcl, chunk); + if (length > chunk->ch_length) return -E2BIG; - chunk->ch_position = offset; chunk->ch_payload_length = length; - - if (xdr_buf_subsegment(&rqstp->rq_res, &subbuf, offset, length)) - return -EMSGSIZE; - - rdma = container_of(rqstp->rq_xprt, struct svcxprt_rdma, sc_xprt); - ret = svc_rdma_send_write_chunk(rdma, chunk, &subbuf); - if (ret < 0) - return ret; return 0; } From patchwork Sun Feb 4 23:17:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 13544897 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5A66ABE58; Sun, 4 Feb 2024 23:17:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088669; cv=none; b=DmMrlRtTvCz4TNRJI4jS3U/1eLr4ETBBpJaXflImJPSRk9TvP74eUXIwrluhjbI6RjY/zuJDI34ndZwk7h4v5oRoG3C/GDTRs9egh/eev4D1wN1N3+fncRD7PZJWo/G6OViT870yl+/Uo6G5lk6vVWFpokEPzZ6808xAuen9YSo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707088669; c=relaxed/simple; bh=fp9acj4ZEwK1Xq922shpXg6CpeNaFsC5ZBGJptxgrSI=; h=Subject:From:To:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Ar+LmMblstz94SilbJxPXRiSWppdx/8+2Tlk6nvkdti/3Y+cm72zZpuaQe8h8Hsy1kXtKr9vqlQsSN3qhaeqmX9s4XuLhPLF/872Z0pLWuxtRScFg73NhejeBLB/OMebkLd6YpvdRj3mApFiOEBjFjzgtdSRuNIqMVWbqr/K+B4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=kVKgYlfO; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="kVKgYlfO" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9EA1FC433C7; Sun, 4 Feb 2024 23:17:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1707088668; bh=fp9acj4ZEwK1Xq922shpXg6CpeNaFsC5ZBGJptxgrSI=; h=Subject:From:To:Date:In-Reply-To:References:From; b=kVKgYlfOH3H8pGy/JciO7mEDiyvFgvBtogZFKPayJXY6S5JOQoG1XaGzTLcx5NbYt QStZ8RzKP/6Bb3pzDLABD1IKfJU7W2n/ScuX7Wa8l0CWKOjTBc0RRYEqrd+zHOHgPt m1c3y+ggNWw/cFujx+kAcsDTaMIWhFbadD09BAmZXAy6Z8aqOv5oJnY6UpkHpNzWro 7p9O7wBuoR2lAN3lOdVtS0pvPpPdT+yqpN81oLiavlC/uFudoBSXROW8nFkwO8zYOG YB0EaIyijK6eGxV8w6AL+ekzx5FLejLmIiBGRoG/4/2WnANgLji/DkCYEZ+5nnKV8B JIr0fBUXGC9cw== Subject: [PATCH v2 12/12] svcrdma: Add Write chunk WRs to the RPC's Send WR chain From: Chuck Lever To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org Date: Sun, 04 Feb 2024 18:17:47 -0500 Message-ID: <170708866774.28128.5360136470502302905.stgit@bazille.1015granger.net> In-Reply-To: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> References: <170708844422.28128.2979813721958631192.stgit@bazille.1015granger.net> User-Agent: StGit/1.5 Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Chuck Lever Chain RDMA Writes that convey Write chunks onto the local Send chain. This means all WRs for an RPC Reply are now posted with a single ib_post_send() call, and there is a single Send completion when all of these are done. That reduces both the per-transport doorbell rate and completion rate. Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc_rdma.h | 13 ++++- net/sunrpc/xprtrdma/svc_rdma_rw.c | 86 +++++++++++++++++++++++++-------- net/sunrpc/xprtrdma/svc_rdma_sendto.c | 5 ++ 3 files changed, 78 insertions(+), 26 deletions(-) diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h index d33bab33099a..24cd199dd6f3 100644 --- a/include/linux/sunrpc/svc_rdma.h +++ b/include/linux/sunrpc/svc_rdma.h @@ -210,6 +210,7 @@ struct svc_rdma_recv_ctxt { */ struct svc_rdma_write_info { struct svcxprt_rdma *wi_rdma; + struct list_head wi_list; const struct svc_rdma_chunk *wi_chunk; @@ -238,7 +239,10 @@ struct svc_rdma_send_ctxt { struct ib_cqe sc_cqe; struct xdr_buf sc_hdrbuf; struct xdr_stream sc_stream; + + struct list_head sc_write_info_list; struct svc_rdma_write_info sc_reply_info; + void *sc_xprt_buf; int sc_page_count; int sc_cur_sge_no; @@ -270,11 +274,14 @@ extern void svc_rdma_cc_init(struct svcxprt_rdma *rdma, extern void svc_rdma_cc_release(struct svcxprt_rdma *rdma, struct svc_rdma_chunk_ctxt *cc, enum dma_data_direction dir); +extern void svc_rdma_write_chunk_release(struct svcxprt_rdma *rdma, + struct svc_rdma_send_ctxt *ctxt); extern void svc_rdma_reply_chunk_release(struct svcxprt_rdma *rdma, struct svc_rdma_send_ctxt *ctxt); -extern int svc_rdma_send_write_list(struct svcxprt_rdma *rdma, - const struct svc_rdma_recv_ctxt *rctxt, - const struct xdr_buf *xdr); +extern int svc_rdma_prepare_write_list(struct svcxprt_rdma *rdma, + const struct svc_rdma_pcl *write_pcl, + struct svc_rdma_send_ctxt *sctxt, + const struct xdr_buf *xdr); extern int svc_rdma_prepare_reply_chunk(struct svcxprt_rdma *rdma, const struct svc_rdma_pcl *write_pcl, const struct svc_rdma_pcl *reply_pcl, diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c index 40797114d50a..f2a100c4c81f 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_rw.c +++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c @@ -230,6 +230,28 @@ static void svc_rdma_write_info_free(struct svc_rdma_write_info *info) queue_work(svcrdma_wq, &info->wi_work); } +/** + * svc_rdma_write_chunk_release - Release Write chunk I/O resources + * @rdma: controlling transport + * @ctxt: Send context that is being released + */ +void svc_rdma_write_chunk_release(struct svcxprt_rdma *rdma, + struct svc_rdma_send_ctxt *ctxt) +{ + struct svc_rdma_write_info *info; + struct svc_rdma_chunk_ctxt *cc; + + while (!list_empty(&ctxt->sc_write_info_list)) { + info = list_first_entry(&ctxt->sc_write_info_list, + struct svc_rdma_write_info, wi_list); + list_del(&info->wi_list); + + cc = &info->wi_cc; + svc_rdma_wake_send_waiters(rdma, cc->cc_sqecount); + svc_rdma_write_info_free(info); + } +} + /** * svc_rdma_reply_chunk_release - Release Reply chunk I/O resources * @rdma: controlling transport @@ -286,13 +308,11 @@ static void svc_rdma_write_done(struct ib_cq *cq, struct ib_wc *wc) struct ib_cqe *cqe = wc->wr_cqe; struct svc_rdma_chunk_ctxt *cc = container_of(cqe, struct svc_rdma_chunk_ctxt, cc_cqe); - struct svc_rdma_write_info *info = - container_of(cc, struct svc_rdma_write_info, wi_cc); switch (wc->status) { case IB_WC_SUCCESS: trace_svcrdma_wc_write(&cc->cc_cid); - break; + return; case IB_WC_WR_FLUSH_ERR: trace_svcrdma_wc_write_flush(wc, &cc->cc_cid); break; @@ -300,12 +320,11 @@ static void svc_rdma_write_done(struct ib_cq *cq, struct ib_wc *wc) trace_svcrdma_wc_write_err(wc, &cc->cc_cid); } - svc_rdma_wake_send_waiters(rdma, cc->cc_sqecount); - - if (unlikely(wc->status != IB_WC_SUCCESS)) - svc_xprt_deferred_close(&rdma->sc_xprt); - - svc_rdma_write_info_free(info); + /* The RDMA Write has flushed, so the client won't get + * some of the outgoing RPC message. Signal the loss + * to the client by closing the connection. + */ + svc_xprt_deferred_close(&rdma->sc_xprt); } /** @@ -601,13 +620,19 @@ static int svc_rdma_xb_write(const struct xdr_buf *xdr, void *data) return xdr->len; } -static int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma, - const struct svc_rdma_chunk *chunk, - const struct xdr_buf *xdr) +/* Link Write WRs for @chunk onto @sctxt's WR chain. + */ +static int svc_rdma_prepare_write_chunk(struct svcxprt_rdma *rdma, + struct svc_rdma_send_ctxt *sctxt, + const struct svc_rdma_chunk *chunk, + const struct xdr_buf *xdr) { struct svc_rdma_write_info *info; struct svc_rdma_chunk_ctxt *cc; + struct ib_send_wr *first_wr; struct xdr_buf payload; + struct list_head *pos; + struct ib_cqe *cqe; int ret; if (xdr_buf_subsegment(xdr, &payload, chunk->ch_position, @@ -623,10 +648,25 @@ static int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma, if (ret != payload.len) goto out_err; - trace_svcrdma_post_write_chunk(&cc->cc_cid, cc->cc_sqecount); - ret = svc_rdma_post_chunk_ctxt(rdma, cc); - if (ret < 0) + ret = -EINVAL; + if (unlikely(cc->cc_sqecount > rdma->sc_sq_depth)) goto out_err; + + first_wr = sctxt->sc_wr_chain; + cqe = &cc->cc_cqe; + list_for_each(pos, &cc->cc_rwctxts) { + struct svc_rdma_rw_ctxt *rwc; + + rwc = list_entry(pos, struct svc_rdma_rw_ctxt, rw_list); + first_wr = rdma_rw_ctx_wrs(&rwc->rw_ctx, rdma->sc_qp, + rdma->sc_port_num, cqe, first_wr); + cqe = NULL; + } + sctxt->sc_wr_chain = first_wr; + sctxt->sc_sqecount += cc->cc_sqecount; + list_add(&info->wi_list, &sctxt->sc_write_info_list); + + trace_svcrdma_post_write_chunk(&cc->cc_cid, cc->cc_sqecount); return 0; out_err: @@ -635,25 +675,27 @@ static int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma, } /** - * svc_rdma_send_write_list - Send all chunks on the Write list + * svc_rdma_prepare_write_list - Construct WR chain for sending Write list * @rdma: controlling RDMA transport - * @rctxt: Write list provisioned by the client + * @write_pcl: Write list provisioned by the client + * @sctxt: Send WR resources * @xdr: xdr_buf containing an RPC Reply message * * Returns zero on success, or a negative errno if one or more * Write chunks could not be sent. */ -int svc_rdma_send_write_list(struct svcxprt_rdma *rdma, - const struct svc_rdma_recv_ctxt *rctxt, - const struct xdr_buf *xdr) +int svc_rdma_prepare_write_list(struct svcxprt_rdma *rdma, + const struct svc_rdma_pcl *write_pcl, + struct svc_rdma_send_ctxt *sctxt, + const struct xdr_buf *xdr) { struct svc_rdma_chunk *chunk; int ret; - pcl_for_each_chunk(chunk, &rctxt->rc_write_pcl) { + pcl_for_each_chunk(chunk, write_pcl) { if (!chunk->ch_payload_length) break; - ret = svc_rdma_send_write_chunk(rdma, chunk, xdr); + ret = svc_rdma_prepare_write_chunk(rdma, sctxt, chunk, xdr); if (ret < 0) return ret; } diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c index bb5436b719e0..dfca39abd16c 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c @@ -142,6 +142,7 @@ svc_rdma_send_ctxt_alloc(struct svcxprt_rdma *rdma) ctxt->sc_send_wr.sg_list = ctxt->sc_sges; ctxt->sc_send_wr.send_flags = IB_SEND_SIGNALED; ctxt->sc_cqe.done = svc_rdma_wc_send; + INIT_LIST_HEAD(&ctxt->sc_write_info_list); ctxt->sc_xprt_buf = buffer; xdr_buf_init(&ctxt->sc_hdrbuf, ctxt->sc_xprt_buf, rdma->sc_max_req_size); @@ -227,6 +228,7 @@ static void svc_rdma_send_ctxt_release(struct svcxprt_rdma *rdma, struct ib_device *device = rdma->sc_cm_id->device; unsigned int i; + svc_rdma_write_chunk_release(rdma, ctxt); svc_rdma_reply_chunk_release(rdma, ctxt); if (ctxt->sc_page_count) @@ -1013,7 +1015,8 @@ int svc_rdma_sendto(struct svc_rqst *rqstp) if (!p) goto put_ctxt; - ret = svc_rdma_send_write_list(rdma, rctxt, &rqstp->rq_res); + ret = svc_rdma_prepare_write_list(rdma, &rctxt->rc_write_pcl, sctxt, + &rqstp->rq_res); if (ret < 0) goto put_ctxt;