From patchwork Mon Jul 13 16:31:26 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever III X-Patchwork-Id: 6780501 Return-Path: X-Original-To: patchwork-linux-rdma@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 2C1579F38C for ; Mon, 13 Jul 2015 16:31:33 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 37ADE2054A for ; Mon, 13 Jul 2015 16:31:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 41DE92052A for ; Mon, 13 Jul 2015 16:31:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751842AbbGMQba (ORCPT ); Mon, 13 Jul 2015 12:31:30 -0400 Received: from mail-qk0-f170.google.com ([209.85.220.170]:33050 "EHLO mail-qk0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751492AbbGMQba (ORCPT ); Mon, 13 Jul 2015 12:31:30 -0400 Received: by qkdl129 with SMTP id l129so15305115qkd.0; Mon, 13 Jul 2015 09:31:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:subject:from:to:date:message-id:in-reply-to:references :user-agent:mime-version:content-type:content-transfer-encoding; bh=9uIv6/CcgUDlpnqwp0dwYxMPmHKynRlUBdn+UdfuzRU=; b=Co1H2/N8CGgvl8Uq9+/g7rwJN6jEiGR79NVnY30ShImKdRl30r02ZLl5AYFyNk6Ejg pb5vYlNrtFmL+l1+45MW72sUoaGC0MS4whmRoqsI/ak+tdU1XSeYhjRH5IbJySOjJwVM 5FPORDUq3rvKxciuTZaJwqDTWB9qIlsbpm6fnhaF2Dy5hdt9NRrMmdhrNWc+EVHthCbs k4BlFaJtAB04y8Z/7CvWPpquUV2Ztv7S2gvRvjrmK3ZFVMJhYZT4yhV54QJaB6KFlGYj PiXtMBv9ezaCj5rv4o+QkgyshARlxe3eGGEVWcwCJkRnFBr2q4gmTAVOeIGUW6SNm72g W6kw== X-Received: by 10.55.23.150 with SMTP id 22mr38111395qkx.21.1436805089303; Mon, 13 Jul 2015 09:31:29 -0700 (PDT) Received: from manet.1015granger.net ([2604:8800:100:81fc:82ee:73ff:fe43:d64f]) by smtp.gmail.com with ESMTPSA id 16sm10496795qkt.27.2015.07.13.09.31.27 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 13 Jul 2015 09:31:28 -0700 (PDT) Subject: [PATCH v2 11/14] xprtrdma: Fix XDR tail buffer marshalling From: Chuck Lever To: linux-rdma@vger.kernel.org, linux-nfs@vger.kernel.org Date: Mon, 13 Jul 2015 12:31:26 -0400 Message-ID: <20150713163126.17630.4327.stgit@manet.1015granger.net> In-Reply-To: <20150713160617.17630.97475.stgit@manet.1015granger.net> References: <20150713160617.17630.97475.stgit@manet.1015granger.net> User-Agent: StGit/0.17.1-3-g7d0f MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Spam-Status: No, score=-8.2 required=5.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Currently xprtrdma appends an extra chunk element to the RPC/RDMA read chunk list of each NFSv4 WRITE compound. The extra element contains the final GETATTR operation in the compound. The result is an extra RDMA READ operation to transfer a very short piece of each NFS WRITE compound (typically 16 bytes). This is inefficient. It is also incorrect. Although RFC 5667 is not precise about when using a read list with NFSv4 COMPOUND is allowed, the intent is that only data arguments not touched by NFS (ie, read and write payloads) are to be sent using RDMA READ or WRITE. The NFS client constructs GETATTR arguments itself, and therefore is required to send the trailing GETATTR operation as additional inline content, not as a data payload. NB: This change is not backwards compatible. Some older servers do not accept inline content following the read list. The Linux NFS server should handle this content correctly as of commit a97c331f9aa9 ("svcrdma: Handle additional inline content"). Signed-off-by: Chuck Lever --- net/sunrpc/xprtrdma/rpc_rdma.c | 44 ++++++++++++++++++++++++++++++++++++++-- 1 file changed, 42 insertions(+), 2 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c index 62150ae..1dd48f2 100644 --- a/net/sunrpc/xprtrdma/rpc_rdma.c +++ b/net/sunrpc/xprtrdma/rpc_rdma.c @@ -96,6 +96,42 @@ static bool rpcrdma_results_inline(struct rpc_rqst *rqst) return repsize <= RPCRDMA_INLINE_READ_THRESHOLD(rqst); } +static int +rpcrdma_tail_pullup(struct xdr_buf *buf) +{ + size_t tlen = buf->tail[0].iov_len; + size_t skip = tlen & 3; + + /* Do not include the tail if it is only an XDR pad */ + if (tlen < 4) + return 0; + + /* xdr_write_pages() adds a pad at the beginning of the tail + * if the content in "buf->pages" is unaligned. Force the + * tail's actual content to land at the next XDR position + * after the head instead. + */ + if (skip) { + unsigned char *src, *dst; + unsigned int count; + + src = buf->tail[0].iov_base; + dst = buf->head[0].iov_base; + dst += buf->head[0].iov_len; + + src += skip; + tlen -= skip; + + dprintk("RPC: %s: skip=%zu, memmove(%p, %p, %zu)\n", + __func__, skip, dst, src, tlen); + + for (count = tlen; count; count--) + *dst++ = *src++; + } + + return tlen; +} + /* * Chunk assembly from upper layer xdr_buf. * @@ -147,6 +183,10 @@ rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos, if (len && n == nsegs) return -EIO; + /* When encoding the read list, the tail is always sent inline */ + if (type == rpcrdma_readch) + return n; + if (xdrbuf->tail[0].iov_len) { /* the rpcrdma protocol allows us to omit any trailing * xdr pad bytes, saving the server an RDMA operation. */ @@ -476,8 +516,8 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst) headerp->rm_body.rm_nochunks.rm_empty[2] = xdr_zero; /* new length after pullup */ rpclen = rqst->rq_svec[0].iov_len; - } - + } else if (rtype == rpcrdma_readch) + rpclen += rpcrdma_tail_pullup(&rqst->rq_snd_buf); if (rtype != rpcrdma_noch) { hdrlen = rpcrdma_create_chunks(rqst, &rqst->rq_snd_buf, headerp, rtype);