From patchwork Thu Jul 9 20:42:56 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever III X-Patchwork-Id: 6759051 Return-Path: X-Original-To: patchwork-linux-rdma@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 3DF4D9F319 for ; Thu, 9 Jul 2015 20:43:15 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 41D1C2053A for ; Thu, 9 Jul 2015 20:43:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 40C152052D for ; Thu, 9 Jul 2015 20:43:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754014AbbGIUnM (ORCPT ); Thu, 9 Jul 2015 16:43:12 -0400 Received: from mail-qk0-f173.google.com ([209.85.220.173]:36309 "EHLO mail-qk0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753616AbbGIUm7 (ORCPT ); Thu, 9 Jul 2015 16:42:59 -0400 Received: by qkei195 with SMTP id i195so194098989qke.3; Thu, 09 Jul 2015 13:42:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:subject:from:to:date:message-id:in-reply-to:references :user-agent:mime-version:content-type:content-transfer-encoding; bh=qbs7HLnBxMmpqWcITzQHmRf/j6k4loGkcFUeReeqIaY=; b=YPJawADIuaGZ/GOSkFyEaugyrh/bZ7S4FtBHlsjVc+BrR+Zb+IT0O4oi2NkJ5drigO A8Od7qbtH1xeH1Mxnk/uvc3pEbYcGyg+A0y/VILGEJZ5G7xPGbZcHedXKBWLTXX5rPZ0 WAUQWz6QuSRHFXqQ+pGEu5wqiS9A49EZ3rfdbV66IYM9yPVCsgsc7XiWM2pXfse4Kc5/ DZCdmUD36rEHBEOvpTcZAANIFNv1JHqH2NleFG73HyU7ZEYpzt+lv+ON6Wlz/silWa6q /vkNVpPE3UD1RCjiGuFK7cxx6GiwUzVN48XF2YUXVhw531jMO2U4WylN1gfs9hPdl0na l/zg== X-Received: by 10.140.102.180 with SMTP id w49mr27309790qge.82.1436474578650; Thu, 09 Jul 2015 13:42:58 -0700 (PDT) Received: from manet.1015granger.net ([2604:8800:100:81fc:82ee:73ff:fe43:d64f]) by smtp.gmail.com with ESMTPSA id 188sm4201686qhh.48.2015.07.09.13.42.57 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 09 Jul 2015 13:42:58 -0700 (PDT) Subject: [PATCH v1 08/12] xprtrdma: Fix XDR tail buffer marshalling From: Chuck Lever To: linux-rdma@vger.kernel.org, linux-nfs@vger.kernel.org Date: Thu, 09 Jul 2015 16:42:56 -0400 Message-ID: <20150709204256.26247.14336.stgit@manet.1015granger.net> In-Reply-To: <20150709203242.26247.4848.stgit@manet.1015granger.net> References: <20150709203242.26247.4848.stgit@manet.1015granger.net> User-Agent: StGit/0.17.1-3-g7d0f MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Currently xprtrdma appends an extra chunk element to the RPC/RDMA read chunk list of each NFSv4 WRITE compound. The extra element contains the final GETATTR operation in the compound. The result is an extra RDMA READ operation to transfer a very short piece of each NFS WRITE compound (typically 16 bytes). This is inefficient. It is also incorrect. Although RFC 5667 is not precise about when using a read list with NFSv4 COMPOUND is allowed, the intent is that only data arguments not touched by the NFS client are to be sent using RDMA READ or WRITE. The NFS client constructs GETATTR arguments itself, and therefore is required to send the trailing GETATTR operation as additional inline content, not as a data payload. NB: This change is not backwards compatible. Some older servers do not accept inline content following the read list. The Linux NFS server should handle this content correctly as of commit a97c331f9aa9 ("svcrdma: Handle additional inline content"). Signed-off-by: Chuck Lever --- net/sunrpc/xprtrdma/rpc_rdma.c | 43 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 42 insertions(+), 1 deletion(-) -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c index 8ac1448c..cb05233 100644 --- a/net/sunrpc/xprtrdma/rpc_rdma.c +++ b/net/sunrpc/xprtrdma/rpc_rdma.c @@ -96,6 +96,42 @@ static bool rpcrdma_results_inline(struct rpc_rqst *rqst) return repsize <= RPCRDMA_INLINE_READ_THRESHOLD(rqst); } +static int +rpcrdma_tail_pullup(struct xdr_buf *buf) +{ + size_t tlen = buf->tail[0].iov_len; + size_t skip = tlen & 3; + + /* Do not include the tail if it is only an XDR pad */ + if (tlen < 4) + return 0; + + /* xdr_write_pages() adds a pad at the beginning of the tail + * if the content in "buf->pages" is unaligned. Force the + * tail's actual content to land at the next XDR position + * after the head instead. + */ + if (skip) { + unsigned char *src, *dst; + unsigned int count; + + src = buf->tail[0].iov_base; + dst = buf->head[0].iov_base; + dst += buf->head[0].iov_len; + + src += skip; + tlen -= skip; + + dprintk("RPC: %s: skip=%zu, memmove(%p, %p, %zu)\n", + __func__, skip, dst, src, tlen); + + for (count = tlen; count; count--) + *dst++ = *src++; + } + + return tlen; +} + /* * Chunk assembly from upper layer xdr_buf. * @@ -147,6 +183,10 @@ rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos, if (len && n == nsegs) return -EIO; + /* When encoding the read list, the tail is always sent inline */ + if (type == rpcrdma_readch) + return n; + if (xdrbuf->tail[0].iov_len) { /* the rpcrdma protocol allows us to omit any trailing * xdr pad bytes, saving the server an RDMA operation. */ @@ -504,7 +544,8 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst) /* new length after pullup */ rpclen = rqst->rq_svec[0].iov_len; } - } + } else if (rtype == rpcrdma_readch) + rpclen += rpcrdma_tail_pullup(&rqst->rq_snd_buf); if (rtype != rpcrdma_noch) { hdrlen = rpcrdma_create_chunks(rqst, &rqst->rq_snd_buf,