From patchwork Mon Jul 20 19:04:08 2015
X-Patchwork-Submitter: Chuck Lever III
X-Patchwork-Id: 6830011
Subject: [PATCH v3 11/15] xprtrdma: Fix XDR tail buffer marshalling
From: Chuck Lever
To: linux-rdma@vger.kernel.org, linux-nfs@vger.kernel.org
Date: Mon, 20 Jul 2015 15:04:08 -0400
Message-ID: <20150720190408.10997.72742.stgit@manet.1015granger.net>
In-Reply-To: <20150720185624.10997.51574.stgit@manet.1015granger.net>
References: <20150720185624.10997.51574.stgit@manet.1015granger.net>
User-Agent: StGit/0.17.1-3-g7d0f

Currently xprtrdma appends an extra chunk element to the RPC/RDMA read
chunk list of each NFSv4 WRITE compound. The extra element contains the
final GETATTR operation in the compound. The result is an extra RDMA
READ operation to transfer a very short piece of each NFS WRITE
compound (typically 16 bytes). This is inefficient.

It is also incorrect.
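[Editorial illustration, not part of the patch: the stand-in declarations
below sketch how an NFSv4 WRITE compound is assumed to sit in the client's
send buffer, based on the description above and on the buf->head, buf->pages,
and buf->tail fields used in the diff further down. The struct and field
names only mirror the kernel's; the sizes and comments are illustrative.]

/* Illustration only -- a simplified, user-space stand-in for the
 * kernel's struct xdr_buf, annotated with the assumed layout of an
 * NFSv4 WRITE compound before marshalling. */
#include <stddef.h>

struct sketch_kvec {
	void   *iov_base;
	size_t  iov_len;
};

struct sketch_xdr_buf {
	struct sketch_kvec head[1];  /* RPC header + WRITE arguments: sent inline */
	size_t             page_len; /* WRITE payload in buf->pages: the read chunk */
	struct sketch_kvec tail[1];  /* XDR pad + trailing GETATTR (~16 bytes):
				      * previously marshalled as an extra read
				      * chunk element; with this patch it is
				      * pulled up into the inline area instead */
};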
Although RFC 5667 is not precise about when using a read list with
NFSv4 COMPOUND is allowed, the intent is that only data arguments not
touched by NFS (ie, read and write payloads) are to be sent using RDMA
READ or WRITE. The NFS client constructs GETATTR arguments itself, and
therefore is required to send the trailing GETATTR operation as
additional inline content, not as a data payload.

NB: This change is not backwards compatible. Some older servers do not
accept inline content following the read list. The Linux NFS server
should handle this content correctly as of commit a97c331f9aa9
("svcrdma: Handle additional inline content").

Signed-off-by: Chuck Lever
Tested-by: Devesh Sharma
---
 net/sunrpc/xprtrdma/rpc_rdma.c |   44 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 62150ae..1dd48f2 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -96,6 +96,42 @@ static bool rpcrdma_results_inline(struct rpc_rqst *rqst)
 	return repsize <= RPCRDMA_INLINE_READ_THRESHOLD(rqst);
 }
 
+static int
+rpcrdma_tail_pullup(struct xdr_buf *buf)
+{
+	size_t tlen = buf->tail[0].iov_len;
+	size_t skip = tlen & 3;
+
+	/* Do not include the tail if it is only an XDR pad */
+	if (tlen < 4)
+		return 0;
+
+	/* xdr_write_pages() adds a pad at the beginning of the tail
+	 * if the content in "buf->pages" is unaligned. Force the
+	 * tail's actual content to land at the next XDR position
+	 * after the head instead.
+	 */
+	if (skip) {
+		unsigned char *src, *dst;
+		unsigned int count;
+
+		src = buf->tail[0].iov_base;
+		dst = buf->head[0].iov_base;
+		dst += buf->head[0].iov_len;
+
+		src += skip;
+		tlen -= skip;
+
+		dprintk("RPC: %s: skip=%zu, memmove(%p, %p, %zu)\n",
+			__func__, skip, dst, src, tlen);
+
+		for (count = tlen; count; count--)
+			*dst++ = *src++;
+	}
+
+	return tlen;
+}
+
 /*
  * Chunk assembly from upper layer xdr_buf.
  *
@@ -147,6 +183,10 @@ rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
 	if (len && n == nsegs)
 		return -EIO;
 
+	/* When encoding the read list, the tail is always sent inline */
+	if (type == rpcrdma_readch)
+		return n;
+
 	if (xdrbuf->tail[0].iov_len) {
 		/* the rpcrdma protocol allows us to omit any trailing
 		 * xdr pad bytes, saving the server an RDMA operation. */
@@ -476,8 +516,8 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
 		headerp->rm_body.rm_nochunks.rm_empty[2] = xdr_zero;
 		/* new length after pullup */
 		rpclen = rqst->rq_svec[0].iov_len;
-	}
-
+	} else if (rtype == rpcrdma_readch)
+		rpclen += rpcrdma_tail_pullup(&rqst->rq_snd_buf);
 	if (rtype != rpcrdma_noch) {
 		hdrlen = rpcrdma_create_chunks(rqst, &rqst->rq_snd_buf,
 					       headerp, rtype);
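[Editorial note, not part of the patch: a minimal user-space sketch of the
pullup arithmetic in rpcrdma_tail_pullup() above, using invented buffer
contents. It drops the tail's leading XDR pad (tlen & 3) and appends the
remaining tail bytes directly after the head, which is what lets the
trailing GETATTR travel as inline content rather than as a read chunk.]

/* Stand-alone illustration of the tail pullup; buffer names and
 * contents are made up for the example. */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char head[64] = "RPCHDR+WRITEARGS";	/* inline part of the request */
	size_t head_len = strlen(head);
	char tail[] = "\0\0\0GETATTR_ARGS";	/* 3-byte XDR pad + real content */
	size_t tlen = sizeof(tail) - 1;		/* 15 bytes */
	size_t skip = tlen & 3;			/* length of the leading pad: 3 */

	if (tlen < 4)				/* a tail that is only a pad is dropped */
		return 0;

	/* Move the tail's real content to the first position after the head */
	memmove(head + head_len, tail + skip, tlen - skip);
	head_len += tlen - skip;

	printf("inline request is now %zu bytes\n", head_len);
	return 0;
}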