From patchwork Mon Apr 25 19:21:12 2016
X-Patchwork-Submitter: Chuck Lever
X-Patchwork-Id: 8931431
Subject: [PATCH v2 03/18] xprtrdma: Limit number of RDMA segments in RPC-over-RDMA headers
From: Chuck Lever
To: linux-rdma@vger.kernel.org, linux-nfs@vger.kernel.org
Date: Mon, 25 Apr 2016 15:21:12 -0400
Message-ID: <20160425192112.3566.30703.stgit@manet.1015granger.net>
In-Reply-To: <20160425185956.3566.64142.stgit@manet.1015granger.net>
References: <20160425185956.3566.64142.stgit@manet.1015granger.net>
User-Agent: StGit/0.17.1-dirty
X-Mailing-List: linux-nfs@vger.kernel.org
Send buffer space is shared between the RPC-over-RDMA header and an RPC message. A large RPC-over-RDMA header means less space is available for the associated RPC message, which then has to be moved via an RDMA Read or Write.

As more segments are added to the chunk lists, the header increases in size. Typical modern hardware needs only a few segments to convey the maximum payload size, but some devices and registration modes may need many segments to convey a data payload. Sometimes so many are needed that the remaining space in the Send buffer is not enough for the RPC message. Sending such a message usually fails.

To ensure a transport can always make forward progress, cap the number of RDMA segments that are allowed in chunk lists. This prevents less-capable devices and memory registration modes from consuming a large portion of the Send buffer; the trade-off is a smaller maximum data payload when such devices are in use.

For now I choose an arbitrary maximum of 8 RDMA segments. This allows a maximum-size RPC-over-RDMA header to fit nicely in the current 1024-byte inline threshold, with over 700 bytes remaining for an inline RPC message.

The current maximum data payload of NFS READ or WRITE requests is one megabyte. To convey that payload on a client with 4KB pages, each chunk segment would need to handle 32 or more data pages. This is well within the capabilities of FMR. For physical registration, the maximum payload size on platforms with 4KB pages is reduced to 32KB.

For FRWR, a device's maximum page list depth would need to be at least 34 to support the maximum 1MB payload. With a device whose maximum page list depth is smaller, the maximum data payload is reduced accordingly.
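As a quick check of the arithmetic above (not part of the patch), here is a user-space sketch that mirrors the *_op_maxpages() changes below. Only RPCRDMA_MAX_HDR_SEGS is taken from the patch; the values assumed for RPCRDMA_MAX_DATA_SEGS, RPCRDMA_MAX_FMR_SGES, and the FRWR page list depth are illustrative, for a client with 4KB pages and a 1MB maximum payload.

/* Stand-alone illustration, compiled in user space, not kernel code.
 * Mirrors the *_op_maxpages() arithmetic in this patch.
 */
#include <stdio.h>

#define MIN(a, b)		((a) < (b) ? (a) : (b))

#define PAGE_SIZE		4096u
#define RPCRDMA_MAX_HDR_SEGS	8u	/* cap introduced by this patch */
#define RPCRDMA_MAX_DATA_SEGS	256u	/* 1MB / 4KB pages (assumed) */
#define RPCRDMA_MAX_FMR_SGES	64u	/* pages per FMR (assumed) */

int main(void)
{
	unsigned int frmr_depth = 34;	/* example device page list depth */
	unsigned int fmr, frwr, phys;

	/* FMR: 8 segments x 64 pages each easily covers 256 pages */
	fmr = MIN(RPCRDMA_MAX_DATA_SEGS,
		  RPCRDMA_MAX_HDR_SEGS * RPCRDMA_MAX_FMR_SGES);

	/* FRWR: a page list depth of 34 also covers 256 pages */
	frwr = MIN(RPCRDMA_MAX_DATA_SEGS,
		   RPCRDMA_MAX_HDR_SEGS * frmr_depth);

	/* Physical: one page per segment, so only 8 pages */
	phys = MIN(RPCRDMA_MAX_DATA_SEGS, RPCRDMA_MAX_HDR_SEGS);

	printf("FMR      max payload: %u KB\n", fmr * PAGE_SIZE / 1024);
	printf("FRWR     max payload: %u KB\n", frwr * PAGE_SIZE / 1024);
	printf("physical max payload: %u KB\n", phys * PAGE_SIZE / 1024);
	return 0;
}

This prints 1024 KB for FMR and FRWR and 32 KB for physical registration, matching the payload limits described above.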
Signed-off-by: Chuck Lever
Reviewed-by: Sagi Grimberg
---
 net/sunrpc/xprtrdma/fmr_ops.c      |  2 +-
 net/sunrpc/xprtrdma/frwr_ops.c     |  2 +-
 net/sunrpc/xprtrdma/physical_ops.c |  2 +-
 net/sunrpc/xprtrdma/verbs.c        | 22 ----------------------
 net/sunrpc/xprtrdma/xprt_rdma.h    | 21 ++++++++++++++++++++-
 5 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index b289e10..4aeb104 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -48,7 +48,7 @@ static size_t
 fmr_op_maxpages(struct rpcrdma_xprt *r_xprt)
 {
 	return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
-		     rpcrdma_max_segments(r_xprt) * RPCRDMA_MAX_FMR_SGES);
+		     RPCRDMA_MAX_HDR_SEGS * RPCRDMA_MAX_FMR_SGES);
 }
 
 static int
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index c250924..2f37598 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -243,7 +243,7 @@ frwr_op_maxpages(struct rpcrdma_xprt *r_xprt)
 	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
 
 	return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
-		     rpcrdma_max_segments(r_xprt) * ia->ri_max_frmr_depth);
+		     RPCRDMA_MAX_HDR_SEGS * ia->ri_max_frmr_depth);
 }
 
 static void
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
index 481b9b6..e16ed54 100644
--- a/net/sunrpc/xprtrdma/physical_ops.c
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -47,7 +47,7 @@ static size_t
 physical_op_maxpages(struct rpcrdma_xprt *r_xprt)
 {
 	return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
-		     rpcrdma_max_segments(r_xprt));
+		     RPCRDMA_MAX_HDR_SEGS);
 }
 
 static int
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index f5ed9f9..9f8d6c1 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1271,25 +1271,3 @@ out_rc:
 	rpcrdma_recv_buffer_put(rep);
 	return rc;
 }
-
-/* How many chunk list items fit within our inline buffers?
- */
-unsigned int
-rpcrdma_max_segments(struct rpcrdma_xprt *r_xprt)
-{
-	struct rpcrdma_create_data_internal *cdata = &r_xprt->rx_data;
-	int bytes, segments;
-
-	bytes = min_t(unsigned int, cdata->inline_wsize, cdata->inline_rsize);
-	bytes -= RPCRDMA_HDRLEN_MIN;
-	if (bytes < sizeof(struct rpcrdma_segment) * 2) {
-		pr_warn("RPC:       %s: inline threshold too small\n",
-			__func__);
-		return 0;
-	}
-
-	segments = 1 << (fls(bytes / sizeof(struct rpcrdma_segment)) - 1);
-	dprintk("RPC:       %s: max chunk list size = %d segments\n",
-		__func__, segments);
-	return segments;
-}
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 7723e5f..0028748 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -144,6 +144,26 @@ rdmab_to_msg(struct rpcrdma_regbuf *rb)
 
 #define RPCRDMA_DEF_GFP		(GFP_NOIO | __GFP_NOWARN)
 
+/* To ensure a transport can always make forward progress,
+ * the number of RDMA segments allowed in header chunk lists
+ * is capped at 8. This prevents less-capable devices and
+ * memory registrations from overrunning the Send buffer
+ * while building chunk lists.
+ *
+ * Elements of the Read list take up more room than the
+ * Write list or Reply chunk. 8 read segments means the Read
+ * list (or Write list or Reply chunk) cannot consume more
+ * than
+ *
+ * ((8 + 2) * read segment size) + 1 XDR words, or 244 bytes.
+ *
+ * And the fixed part of the header is another 24 bytes.
+ *
+ * The smallest inline threshold is 1024 bytes, ensuring that
+ * at least 750 bytes are available for RPC messages.
+ */
+#define RPCRDMA_MAX_HDR_SEGS (8)
+
 /*
  * struct rpcrdma_rep -- this structure encapsulates state required to recv
  * and complete a reply, asychronously. It needs several pieces of
@@ -456,7 +476,6 @@ struct rpcrdma_regbuf *rpcrdma_alloc_regbuf(struct rpcrdma_ia *,
 void rpcrdma_free_regbuf(struct rpcrdma_ia *,
 			 struct rpcrdma_regbuf *);
 
-unsigned int rpcrdma_max_segments(struct rpcrdma_xprt *);
 int rpcrdma_ep_post_extra_recv(struct rpcrdma_xprt *, unsigned int);
 
 int frwr_alloc_recovery_wq(void);
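For reference, the 244-byte figure in the new xprt_rdma.h comment can be reproduced with a similar sketch (again not part of the patch), assuming each Read list entry is 6 XDR words: a list discriminator, the position, the handle, the length, and a 64-bit offset, per the RPC-over-RDMA layout in RFC 5666.

/* Sanity check of the header-size math in the new comment in
 * xprt_rdma.h; assumes a 6-XDR-word Read list entry.
 */
#include <stdio.h>

int main(void)
{
	unsigned int xdr_word = 4;			/* bytes per XDR word */
	unsigned int read_seg = 6 * xdr_word;		/* 24 bytes per entry */
	unsigned int list = (8 + 2) * read_seg + xdr_word; /* 244 bytes */
	unsigned int fixed = 24;			/* fixed header part */
	unsigned int inline_min = 1024;			/* smallest threshold */

	printf("largest chunk list:   %u bytes\n", list);
	printf("room for RPC message: %u bytes\n",
	       inline_min - fixed - list);		/* 756 >= 750 */
	return 0;
}

This prints 244 bytes for the largest chunk list and 756 bytes remaining, consistent with the "at least 750 bytes" claim in the comment.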