From patchwork Mon Aug 3 17:04:17 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chuck Lever III
X-Patchwork-Id: 6931511
Subject: [PATCH v4 11/16] xprtrdma: Fix XDR tail buffer marshalling
From: Chuck Lever
To: linux-rdma@vger.kernel.org, linux-nfs@vger.kernel.org
Date: Mon, 03 Aug 2015 13:04:17 -0400
Message-ID: <20150803170417.9115.23960.stgit@manet.1015granger.net>
In-Reply-To: <20150803165807.9115.23842.stgit@manet.1015granger.net>
References: <20150803165807.9115.23842.stgit@manet.1015granger.net>
User-Agent: StGit/0.17.1-3-g7d0f
X-Mailing-List: linux-nfs@vger.kernel.org

Currently xprtrdma appends an extra chunk element to the RPC/RDMA read
chunk list of each NFSv4 WRITE compound. The extra element contains
the final GETATTR operation in the compound.

The result is an extra RDMA READ operation to transfer a very short
piece of each NFS WRITE compound (typically 16 bytes). This is
inefficient.

It is also incorrect. The client is sending the trailing GETATTR at
the same Position as the preceding WRITE data payload. Whether or not
RFC 5667 allows the GETATTR to appear in a read chunk, RFC 5666
requires that these two separate RPC arguments appear at two distinct
Positions.

It can also be argued that the GETATTR operation is not bulk data, and
therefore RFC 5667 forbids its appearance in a read chunk at all.

Although RFC 5667 is not precise about when using a read list with
NFSv4 COMPOUND is allowed, the intent is that only data arguments not
touched by NFS (ie, read and write payloads) are to be sent using RDMA
READ or WRITE. The NFS client constructs GETATTR arguments itself, and
therefore is required to send the trailing GETATTR operation as
additional inline content, not as a data payload.

NB: This change is not backwards compatible. Some older servers do not
accept inline content following the read list. The Linux NFS server
should handle this content correctly as of commit a97c331f9aa9
("svcrdma: Handle additional inline content").

Signed-off-by: Chuck Lever
Tested-by: Devesh Sharma
---
 net/sunrpc/xprtrdma/rpc_rdma.c |   44 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 62150ae..1dd48f2 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -96,6 +96,42 @@ static bool rpcrdma_results_inline(struct rpc_rqst *rqst)
 	return repsize <= RPCRDMA_INLINE_READ_THRESHOLD(rqst);
 }
 
+static int
+rpcrdma_tail_pullup(struct xdr_buf *buf)
+{
+	size_t tlen = buf->tail[0].iov_len;
+	size_t skip = tlen & 3;
+
+	/* Do not include the tail if it is only an XDR pad */
+	if (tlen < 4)
+		return 0;
+
+	/* xdr_write_pages() adds a pad at the beginning of the tail
+	 * if the content in "buf->pages" is unaligned. Force the
+	 * tail's actual content to land at the next XDR position
+	 * after the head instead.
+	 */
+	if (skip) {
+		unsigned char *src, *dst;
+		unsigned int count;
+
+		src = buf->tail[0].iov_base;
+		dst = buf->head[0].iov_base;
+		dst += buf->head[0].iov_len;
+
+		src += skip;
+		tlen -= skip;
+
+		dprintk("RPC: %s: skip=%zu, memmove(%p, %p, %zu)\n",
+			__func__, skip, dst, src, tlen);
+
+		for (count = tlen; count; count--)
+			*dst++ = *src++;
+	}
+
+	return tlen;
+}
+
 /*
  * Chunk assembly from upper layer xdr_buf.
  *
@@ -147,6 +183,10 @@ rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
 	if (len && n == nsegs)
 		return -EIO;
 
+	/* When encoding the read list, the tail is always sent inline */
+	if (type == rpcrdma_readch)
+		return n;
+
 	if (xdrbuf->tail[0].iov_len) {
 		/* the rpcrdma protocol allows us to omit any trailing
 		 * xdr pad bytes, saving the server an RDMA operation. */
@@ -476,8 +516,8 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
 		headerp->rm_body.rm_nochunks.rm_empty[2] = xdr_zero;
 		/* new length after pullup */
 		rpclen = rqst->rq_svec[0].iov_len;
-	}
-
+	} else if (rtype == rpcrdma_readch)
+		rpclen += rpcrdma_tail_pullup(&rqst->rq_snd_buf);
 	if (rtype != rpcrdma_noch) {
 		hdrlen = rpcrdma_create_chunks(rqst,
 					&rqst->rq_snd_buf, headerp, rtype);
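
For readers who want to try the pad-skipping logic outside the kernel,
here is a minimal user-space sketch of what rpcrdma_tail_pullup() does.
It is an illustration only, not part of the patch. The sketch_kvec and
sketch_xdr_buf types, the sketch_* function names, and the buffer layout
in sketch_demo() are all hypothetical stand-ins chosen for the example.
It assumes, as the patch does, that the tail content after the pad is a
multiple of four bytes, so that "tlen & 3" equals the pad length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-ins for the kernel's kvec and xdr_buf. */
struct sketch_kvec {
	void *iov_base;
	size_t iov_len;
};

struct sketch_xdr_buf {
	struct sketch_kvec head[1];
	struct sketch_kvec tail[1];
};

/* Returns the number of tail bytes that can be sent inline after the head. */
static size_t sketch_tail_pullup(struct sketch_xdr_buf *buf)
{
	size_t tlen = buf->tail[0].iov_len;
	size_t skip = tlen & 3;	/* leading pad, assuming 4-byte-multiple content */

	/* A tail shorter than one XDR word holds only padding. */
	if (tlen < 4)
		return 0;

	if (skip) {
		unsigned char *src = (unsigned char *)buf->tail[0].iov_base + skip;
		unsigned char *dst = (unsigned char *)buf->head[0].iov_base +
				     buf->head[0].iov_len;
		size_t count;

		tlen -= skip;
		/* Forward byte copy: safe when dst precedes src, even on overlap. */
		for (count = tlen; count; count--)
			*dst++ = *src++;
	}
	return tlen;
}

/* Example: 8-byte head, then a tail of 3 pad bytes plus 16 content bytes. */
static int sketch_demo(void)
{
	unsigned char mem[64] = { 0 };
	struct sketch_xdr_buf buf;
	size_t n;

	buf.head[0].iov_base = mem;
	buf.head[0].iov_len = 8;
	buf.tail[0].iov_base = mem + 9;	/* arbitrary placement past the head */
	buf.tail[0].iov_len = 3 + 16;
	memcpy(mem + 12, "GETATTR-ARGS-16B", 16);

	n = sketch_tail_pullup(&buf);
	assert(n == 16);
	/* The content now starts immediately after the head. */
	return memcmp(mem + 8, "GETATTR-ARGS-16B", 16);
}
```

sketch_demo() returns 0 when the 16 content bytes end up directly after
the 8-byte head. Like the patch, the sketch leaves an unpadded tail
(skip == 0) where it is and only reports its length.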