From patchwork Tue Aug 31 19:05:15 2021
Subject: [PATCH RFC 1/6] SUNRPC: Capture value of xdr_buf::page_base
From: Chuck Lever
To: bfields@fieldses.org
Cc: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
Date: Tue, 31 Aug 2021 15:05:15 -0400
Message-ID: <163043671590.1415.2588181016845773403.stgit@klimt.1015granger.net>
In-Reply-To: <163043485613.1415.4979286233971984855.stgit@klimt.1015granger.net>

This value will be non-zero more often, after subsequent patches are
applied. Knowing its value can be important diagnostic information.
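
For context, xdr_buf::page_base is the byte offset into the first entry
of xdr_buf::pages at which payload data begins. Below is a minimal
sketch of what a non-zero value means; the helper is hypothetical and
is not part of this patch.

/* Illustration only: xdr_buf_payload_start() is a hypothetical helper,
 * not part of this series. A non-zero page_base means the page-array
 * portion of the buffer does not begin on a page boundary.
 */
#include <linux/mm.h>                  /* page_address() */
#include <linux/sunrpc/xdr.h>          /* struct xdr_buf */

static void *xdr_buf_payload_start(const struct xdr_buf *buf)
{
        return page_address(buf->pages[0]) + buf->page_base;
}

With this patch applied, the xdr_buf trace events report the value in
the form "page=<page_len>(<page_base>)".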
Signed-off-by: Chuck Lever --- include/trace/events/sunrpc.h | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h index d323f5a049c8..f13de06b7c1d 100644 --- a/include/trace/events/sunrpc.h +++ b/include/trace/events/sunrpc.h @@ -62,6 +62,7 @@ DECLARE_EVENT_CLASS(rpc_xdr_buf_class, __field(size_t, head_len) __field(const void *, tail_base) __field(size_t, tail_len) + __field(unsigned int, page_base) __field(unsigned int, page_len) __field(unsigned int, msg_len) ), @@ -74,14 +75,17 @@ DECLARE_EVENT_CLASS(rpc_xdr_buf_class, __entry->head_len = xdr->head[0].iov_len; __entry->tail_base = xdr->tail[0].iov_base; __entry->tail_len = xdr->tail[0].iov_len; + __entry->page_base = xdr->page_base; __entry->page_len = xdr->page_len; __entry->msg_len = xdr->len; ), - TP_printk("task:%u@%u head=[%p,%zu] page=%u tail=[%p,%zu] len=%u", + TP_printk("task:%u@%u head=[%p,%zu] page=%u(%u) tail=[%p,%zu] len=%u", __entry->task_id, __entry->client_id, - __entry->head_base, __entry->head_len, __entry->page_len, - __entry->tail_base, __entry->tail_len, __entry->msg_len + __entry->head_base, __entry->head_len, + __entry->page_len, __entry->page_base, + __entry->tail_base, __entry->tail_len, + __entry->msg_len ) ); @@ -1525,6 +1529,7 @@ DECLARE_EVENT_CLASS(svc_xdr_buf_class, __field(size_t, head_len) __field(const void *, tail_base) __field(size_t, tail_len) + __field(unsigned int, page_base) __field(unsigned int, page_len) __field(unsigned int, msg_len) ), @@ -1535,14 +1540,17 @@ DECLARE_EVENT_CLASS(svc_xdr_buf_class, __entry->head_len = xdr->head[0].iov_len; __entry->tail_base = xdr->tail[0].iov_base; __entry->tail_len = xdr->tail[0].iov_len; + __entry->page_base = xdr->page_base; __entry->page_len = xdr->page_len; __entry->msg_len = xdr->len; ), - TP_printk("xid=0x%08x head=[%p,%zu] page=%u tail=[%p,%zu] len=%u", + TP_printk("xid=0x%08x head=[%p,%zu] page=%u(%u) tail=[%p,%zu] len=%u", __entry->xid, - __entry->head_base, __entry->head_len, __entry->page_len, - __entry->tail_base, __entry->tail_len, __entry->msg_len + __entry->head_base, __entry->head_len, + __entry->page_len, __entry->page_base, + __entry->tail_base, __entry->tail_len, + __entry->msg_len ) ); From patchwork Tue Aug 31 19:05:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 12467683 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2B943C432BE for ; Tue, 31 Aug 2021 19:05:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 13C576108B for ; Tue, 31 Aug 2021 19:05:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240031AbhHaTGX (ORCPT ); Tue, 31 Aug 2021 15:06:23 -0400 Received: from mail.kernel.org ([198.145.29.99]:34666 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240163AbhHaTGS (ORCPT ); Tue, 31 Aug 2021 15:06:18 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id B157161041; Tue, 31 Aug 2021 
19:05:22 +0000 (UTC) Subject: [PATCH RFC 2/6] SUNRPC: xdr_stream_subsegment() must handle non-zero page_bases From: Chuck Lever To: bfields@fieldses.org Cc: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org Date: Tue, 31 Aug 2021 15:05:22 -0400 Message-ID: <163043672202.1415.7895822540426489041.stgit@klimt.1015granger.net> In-Reply-To: <163043485613.1415.4979286233971984855.stgit@klimt.1015granger.net> References: <163043485613.1415.4979286233971984855.stgit@klimt.1015granger.net> User-Agent: StGit/1.1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org xdr_stream_subsegment() was introduced in commit c1346a1216ab ("NFSD: Replace the internals of the READ_BUF() macro"). There are two call sites for xdr_stream_subsegment(). One is nfsd4_decode_write(), and the other is nfsd4_decode_setxattr(). Currently neither of these call sites calls this API when xdr_buf::page_base is a non-zero value. However, I'm about to add a case where page_base will sometimes not be zero when nfsd4_decode_write() invokes this API. Replace the logic in xdr_stream_subsegment() that advances to the next data item in the xdr_stream with something more generic in order to handle this new use case. Signed-off-by: Chuck Lever --- net/sunrpc/xdr.c | 32 +++++++++++++++++--------------- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c index ca10ba2626f2..df194cc07035 100644 --- a/net/sunrpc/xdr.c +++ b/net/sunrpc/xdr.c @@ -1633,7 +1633,7 @@ EXPORT_SYMBOL_GPL(xdr_buf_subsegment); * Sets up @subbuf to represent a portion of @xdr. The portion * starts at the current offset in @xdr, and extends for a length * of @nbytes. If this is successful, @xdr is advanced to the next - * position following that portion. + * XDR data item following that portion. * * Return values: * %true: @subbuf has been initialized, and @xdr has been advanced. 
@@ -1642,29 +1642,31 @@ EXPORT_SYMBOL_GPL(xdr_buf_subsegment); bool xdr_stream_subsegment(struct xdr_stream *xdr, struct xdr_buf *subbuf, unsigned int nbytes) { - unsigned int remaining, offset, len; + unsigned int start = xdr_stream_pos(xdr); + unsigned int remaining, len; - if (xdr_buf_subsegment(xdr->buf, subbuf, xdr_stream_pos(xdr), nbytes)) + /* Extract @subbuf and bounds-check the fn arguments */ + if (xdr_buf_subsegment(xdr->buf, subbuf, start, nbytes)) return false; - if (subbuf->head[0].iov_len) - if (!__xdr_inline_decode(xdr, subbuf->head[0].iov_len)) - return false; - - remaining = subbuf->page_len; - offset = subbuf->page_base; - while (remaining) { - len = min_t(unsigned int, remaining, PAGE_SIZE) - offset; - + /* Advance @xdr by @nbytes */ + for (remaining = nbytes; remaining;) { if (xdr->p == xdr->end && !xdr_set_next_buffer(xdr)) return false; - if (!__xdr_inline_decode(xdr, len)) - return false; + len = (char *)xdr->end - (char *)xdr->p; + if (remaining <= len) { + xdr->p = (__be32 *)((char *)xdr->p + + (remaining + xdr_pad_size(nbytes))); + break; + } + + xdr->p = (__be32 *)((char *)xdr->p + len); + xdr->end = xdr->p; remaining -= len; - offset = 0; } + xdr_stream_set_pos(xdr, start + nbytes); return true; } EXPORT_SYMBOL_GPL(xdr_stream_subsegment); From patchwork Tue Aug 31 19:05:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 12467685 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CC842C4320E for ; Tue, 31 Aug 2021 19:05:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B83D361074 for ; Tue, 31 Aug 2021 19:05:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239966AbhHaTGZ (ORCPT ); Tue, 31 Aug 2021 15:06:25 -0400 Received: from mail.kernel.org ([198.145.29.99]:34718 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240212AbhHaTGY (ORCPT ); Tue, 31 Aug 2021 15:06:24 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id CA99661041; Tue, 31 Aug 2021 19:05:28 +0000 (UTC) Subject: [PATCH RFC 3/6] NFSD: Have legacy NFSD WRITE decoders use xdr_stream_subsegment() From: Chuck Lever To: bfields@fieldses.org Cc: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org Date: Tue, 31 Aug 2021 15:05:28 -0400 Message-ID: <163043672812.1415.17532727182816586412.stgit@klimt.1015granger.net> In-Reply-To: <163043485613.1415.4979286233971984855.stgit@klimt.1015granger.net> References: <163043485613.1415.4979286233971984855.stgit@klimt.1015granger.net> User-Agent: StGit/1.1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org Refactor. Now that the NFSv2 and NFSv3 XDR decoders have been converted to use xdr_streams, the WRITE decoder functions can use xdr_stream_subsegment() to extract the WRITE payload into its own xdr_buf, just as the NFSv4 WRITE XDR decoder currently does. 
That makes it possible to pass the first kvec, pages array + length, page_base, and total payload length via a single function parameter. The payload's page_base is not yet assigned or used, but will be in subsequent patches. Signed-off-by: Chuck Lever --- fs/nfsd/nfs3proc.c | 3 +-- fs/nfsd/nfs3xdr.c | 12 ++---------- fs/nfsd/nfs4proc.c | 3 +-- fs/nfsd/nfsproc.c | 3 +-- fs/nfsd/nfsxdr.c | 9 +-------- fs/nfsd/xdr.h | 2 +- fs/nfsd/xdr3.h | 2 +- include/linux/sunrpc/svc.h | 3 +-- net/sunrpc/svc.c | 11 ++++++----- 9 files changed, 15 insertions(+), 33 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 17715a6c7a40..4418517f6f12 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -201,8 +201,7 @@ nfsd3_proc_write(struct svc_rqst *rqstp) fh_copy(&resp->fh, &argp->fh); resp->committed = argp->stable; - nvecs = svc_fill_write_vector(rqstp, rqstp->rq_arg.pages, - &argp->first, cnt); + nvecs = svc_fill_write_vector(rqstp, &argp->payload); if (!nvecs) { resp->status = nfserr_io; goto out; diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 0a5ebc52e6a9..91d4e2b8b854 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -621,9 +621,6 @@ nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_writeargs *args = rqstp->rq_argp; u32 max_blocksize = svc_max_payload(rqstp); - struct kvec *head = rqstp->rq_arg.head; - struct kvec *tail = rqstp->rq_arg.tail; - size_t remaining; if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) return 0; @@ -641,17 +638,12 @@ nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) /* request sanity */ if (args->count != args->len) return 0; - remaining = head->iov_len + rqstp->rq_arg.page_len + tail->iov_len; - remaining -= xdr_stream_pos(xdr); - if (remaining < xdr_align_size(args->len)) - return 0; if (args->count > max_blocksize) { args->count = max_blocksize; args->len = max_blocksize; } - - args->first.iov_base = xdr->p; - args->first.iov_len = head->iov_len - xdr_stream_pos(xdr); + if (!xdr_stream_subsegment(xdr, &args->payload, args->count)) + return 0; return 1; } diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 486c5dba4b65..55c9551fa74e 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1033,8 +1033,7 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, write->wr_how_written = write->wr_stable_how; - nvecs = svc_fill_write_vector(rqstp, write->wr_payload.pages, - write->wr_payload.head, write->wr_buflen); + nvecs = svc_fill_write_vector(rqstp, &write->wr_payload); WARN_ON_ONCE(nvecs > ARRAY_SIZE(rqstp->rq_vec)); status = nfsd_vfs_write(rqstp, &cstate->current_fh, nf, diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 90fcd6178823..eea5b59b6a6c 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -234,8 +234,7 @@ nfsd_proc_write(struct svc_rqst *rqstp) SVCFH_fmt(&argp->fh), argp->len, argp->offset); - nvecs = svc_fill_write_vector(rqstp, rqstp->rq_arg.pages, - &argp->first, cnt); + nvecs = svc_fill_write_vector(rqstp, &argp->payload); if (!nvecs) { resp->status = nfserr_io; goto out; diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index a06c05fe3b42..26a42f87c240 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -325,10 +325,7 @@ nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) { struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_writeargs *args = rqstp->rq_argp; - struct kvec *head = rqstp->rq_arg.head; - struct kvec *tail = rqstp->rq_arg.tail; u32 beginoffset, totalcount; - size_t 
remaining; if (!svcxdr_decode_fhandle(xdr, &args->fh)) return 0; @@ -346,12 +343,8 @@ nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) return 0; if (args->len > NFSSVC_MAXBLKSIZE_V2) return 0; - remaining = head->iov_len + rqstp->rq_arg.page_len + tail->iov_len; - remaining -= xdr_stream_pos(xdr); - if (remaining < xdr_align_size(args->len)) + if (!xdr_stream_subsegment(xdr, &args->payload, args->len)) return 0; - args->first.iov_base = xdr->p; - args->first.iov_len = head->iov_len - xdr_stream_pos(xdr); return 1; } diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index f45b4bc93f52..80fd6d7f3404 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -33,7 +33,7 @@ struct nfsd_writeargs { svc_fh fh; __u32 offset; int len; - struct kvec first; + struct xdr_buf payload; }; struct nfsd_createargs { diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 933008382bbe..712c117300cb 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -40,7 +40,7 @@ struct nfsd3_writeargs { __u32 count; int stable; __u32 len; - struct kvec first; + struct xdr_buf payload; }; struct nfsd3_createargs { diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index f0f846fa396e..5a277acf2667 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -532,8 +532,7 @@ int svc_encode_result_payload(struct svc_rqst *rqstp, unsigned int offset, unsigned int length); unsigned int svc_fill_write_vector(struct svc_rqst *rqstp, - struct page **pages, - struct kvec *first, size_t total); + struct xdr_buf *payload); char *svc_fill_symlink_pathname(struct svc_rqst *rqstp, struct kvec *first, void *p, size_t total); diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index bfcbaf7b3822..34ced7f538ee 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1693,16 +1693,17 @@ EXPORT_SYMBOL_GPL(svc_encode_result_payload); /** * svc_fill_write_vector - Construct data argument for VFS write call * @rqstp: svc_rqst to operate on - * @pages: list of pages containing data payload - * @first: buffer containing first section of write payload - * @total: total number of bytes of write payload + * @payload: xdr_buf containing only the write data payload * * Fills in rqstp::rq_vec, and returns the number of elements. 
  */
-unsigned int svc_fill_write_vector(struct svc_rqst *rqstp, struct page **pages,
-				   struct kvec *first, size_t total)
+unsigned int svc_fill_write_vector(struct svc_rqst *rqstp,
+				   struct xdr_buf *payload)
 {
+	struct page **pages = payload->pages;
+	struct kvec *first = payload->head;
 	struct kvec *vec = rqstp->rq_vec;
+	size_t total = payload->len;
 	unsigned int i;
 
 	/* Some types of transport can present the write payload

From patchwork Tue Aug 31 19:05:34 2021
Subject: [PATCH RFC 4/6] SUNRPC: svc_fill_write_vector() must handle non-zero page_bases
From: Chuck Lever
To: bfields@fieldses.org
Cc: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
Date: Tue, 31 Aug 2021 15:05:34 -0400
Message-ID: <163043673422.1415.7454603360299190766.stgit@klimt.1015granger.net>
In-Reply-To: <163043485613.1415.4979286233971984855.stgit@klimt.1015granger.net>

svc_fill_write_vector() was introduced in commit 8154ef2776aa ("NFSD:
Clean up legacy NFS WRITE argument XDR decoders"). I'm about to add a
case where page_base will sometimes not be zero when the NFS WRITE proc
functions invoke this API. Refactor the API and function internals to
handle this case properly.

Signed-off-by: Chuck Lever
---
 net/sunrpc/svc.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 34ced7f538ee..3d9f9da98aed 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1705,6 +1705,7 @@ unsigned int svc_fill_write_vector(struct svc_rqst *rqstp,
 	struct kvec *vec = rqstp->rq_vec;
 	size_t total = payload->len;
 	unsigned int i;
+	size_t offset;
 
 	/* Some types of transport can present the write payload
 	 * entirely in rq_arg.pages. In this case, @first is empty.
@@ -1717,12 +1718,14 @@ unsigned int svc_fill_write_vector(struct svc_rqst *rqstp,
 		++i;
 	}
 
+	offset = payload->page_base;
 	while (total) {
-		vec[i].iov_base = page_address(*pages);
-		vec[i].iov_len = min_t(size_t, total, PAGE_SIZE);
+		vec[i].iov_base = page_address(*pages) + offset;
+		vec[i].iov_len = min_t(size_t, total, PAGE_SIZE - offset);
 		total -= vec[i].iov_len;
 		++i;
 		++pages;
+		offset = 0;
 	}
 
 	WARN_ON_ONCE(i > ARRAY_SIZE(rqstp->rq_vec));

From patchwork Tue Aug 31 19:05:40 2021
Subject: [PATCH RFC 5/6] NFSD: Add a transport hook for pulling argument payloads
From: Chuck Lever
To: bfields@fieldses.org
Cc: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
Date: Tue, 31 Aug 2021 15:05:40 -0400
Message-ID: <163043674032.1415.4517139610988941857.stgit@klimt.1015granger.net>
In-Reply-To: <163043485613.1415.4979286233971984855.stgit@klimt.1015granger.net>

Refactor. The new hook is a no-op at the moment, but it will soon be
used to pull RDMA Read chunks such that the payload is aligned to the
pages in the target file's page cache.
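
To illustrate the calling convention this hook establishes, here is the
decoder-side pattern, condensed from the nfs3svc_decode_writeargs()
changes in this patch (the other WRITE decoders follow the same
sequence):

        /* Announce the payload's offset and length to the transport
         * before carving the payload out of the XDR stream. A socket
         * transport's ->xpo_argument_payload is a no-op; svcrdma uses
         * the callback to pull the Read chunk (see patch 6/6).
         */
        if (svc_decode_argument_payload(rqstp, args->offset, args->len) < 0)
                return 0;
        if (!xdr_stream_subsegment(xdr, &args->payload, args->count))
                return 0;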
Signed-off-by: Chuck Lever --- fs/nfsd/nfs3xdr.c | 4 ++++ fs/nfsd/nfs4xdr.c | 6 ++++++ fs/nfsd/nfsxdr.c | 4 ++++ include/linux/sunrpc/svc.h | 3 +++ include/linux/sunrpc/svc_rdma.h | 2 ++ include/linux/sunrpc/svc_xprt.h | 3 +++ net/sunrpc/svc.c | 20 ++++++++++++++++++++ net/sunrpc/svcsock.c | 8 ++++++++ net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 22 ++++++++++++++++++++++ net/sunrpc/xprtrdma/svc_rdma_transport.c | 1 + 10 files changed, 73 insertions(+) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 91d4e2b8b854..1bb61a7d125a 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -642,6 +642,8 @@ nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) args->count = max_blocksize; args->len = max_blocksize; } + if (svc_decode_argument_payload(rqstp, args->offset, args->len) < 0) + return 0; if (!xdr_stream_subsegment(xdr, &args->payload, args->count)) return 0; @@ -705,6 +707,8 @@ nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p) remaining -= xdr_stream_pos(xdr); if (remaining < xdr_align_size(args->tlen)) return 0; + if (svc_decode_argument_payload(rqstp, 0, args->tlen) < 0) + return 0; args->first.iov_base = xdr->p; args->first.iov_len = head->iov_len - xdr_stream_pos(xdr); diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 7abeccb975b2..e919f409cb9c 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -813,6 +813,9 @@ nfsd4_decode_create(struct nfsd4_compoundargs *argp, struct nfsd4_create *create p = xdr_inline_decode(argp->xdr, create->cr_datalen); if (!p) return nfserr_bad_xdr; + if (svc_decode_argument_payload(argp->rqstp, 0, + create->cr_datalen) < 0) + return nfserr_bad_xdr; create->cr_data = svcxdr_dupstr(argp, p, create->cr_datalen); if (!create->cr_data) return nfserr_jukebox; @@ -1410,6 +1413,9 @@ nfsd4_decode_write(struct nfsd4_compoundargs *argp, struct nfsd4_write *write) return nfserr_bad_xdr; if (xdr_stream_decode_u32(argp->xdr, &write->wr_buflen) < 0) return nfserr_bad_xdr; + if (svc_decode_argument_payload(argp->rqstp, write->wr_offset, + write->wr_buflen) < 0) + return nfserr_bad_xdr; if (!xdr_stream_subsegment(argp->xdr, &write->wr_payload, write->wr_buflen)) return nfserr_bad_xdr; diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 26a42f87c240..7c15f31f91da 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -343,6 +343,8 @@ nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) return 0; if (args->len > NFSSVC_MAXBLKSIZE_V2) return 0; + if (svc_decode_argument_payload(rqstp, args->offset, args->len) < 0) + return 0; if (!xdr_stream_subsegment(xdr, &args->payload, args->len)) return 0; @@ -397,6 +399,8 @@ nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p) if (args->tlen == 0) return 0; + if (svc_decode_argument_payload(rqstp, 0, args->tlen) < 0) + return 0; args->first.iov_len = head->iov_len - xdr_stream_pos(xdr); args->first.iov_base = xdr_inline_decode(xdr, args->tlen); if (!args->first.iov_base) diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 5a277acf2667..59d5016d9ec4 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -528,6 +528,9 @@ void svc_reserve(struct svc_rqst *rqstp, int space); struct svc_pool * svc_pool_for_cpu(struct svc_serv *serv, int cpu); char * svc_print_addr(struct svc_rqst *, char *, size_t); const char * svc_proc_name(const struct svc_rqst *rqstp); +int svc_decode_argument_payload(struct svc_rqst *rqstp, + unsigned int offset, + unsigned int length); int svc_encode_result_payload(struct svc_rqst *rqstp, unsigned int offset, unsigned int 
length); diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h index 24aa159d29a7..f660244cc8ba 100644 --- a/include/linux/sunrpc/svc_rdma.h +++ b/include/linux/sunrpc/svc_rdma.h @@ -178,6 +178,8 @@ extern void svc_rdma_recv_ctxt_put(struct svcxprt_rdma *rdma, extern void svc_rdma_flush_recv_queues(struct svcxprt_rdma *rdma); extern void svc_rdma_release_rqst(struct svc_rqst *rqstp); extern int svc_rdma_recvfrom(struct svc_rqst *); +extern int svc_rdma_argument_payload(struct svc_rqst *rqstp, + unsigned int offset, unsigned int length); /* svc_rdma_rw.c */ extern void svc_rdma_destroy_rw_ctxts(struct svcxprt_rdma *rdma); diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index 571f605bc91e..2d4c61f3307e 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -20,6 +20,9 @@ struct svc_xprt_ops { struct svc_xprt *(*xpo_accept)(struct svc_xprt *); int (*xpo_has_wspace)(struct svc_xprt *); int (*xpo_recvfrom)(struct svc_rqst *); + int (*xpo_argument_payload)(struct svc_rqst *rqstp, + unsigned int offset, + unsigned int length); int (*xpo_sendto)(struct svc_rqst *); int (*xpo_result_payload)(struct svc_rqst *, unsigned int, unsigned int); diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 3d9f9da98aed..6eca2f420371 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1673,6 +1673,26 @@ const char *svc_proc_name(const struct svc_rqst *rqstp) } +/** + * svc_decode_argument_payload - set up pages containing an argument + * @rqstp: svc_rqst to operate on + * @offset: byte offset of payload in file's page cache + * @length: size of payload, in bytes + * + * This function can modify rqstp->rq_arg.page_base and the content + * of rqstp->rq_arg.pages, but no other fields of rq_arg are changed. + * + * Returns zero on success, or a negative errno if a permanent error + * occurred. 
+ */ +int svc_decode_argument_payload(struct svc_rqst *rqstp, unsigned int offset, + unsigned int length) +{ + return rqstp->rq_xprt->xpt_ops->xpo_argument_payload(rqstp, offset, + length); +} +EXPORT_SYMBOL_GPL(svc_decode_argument_payload); + /** * svc_encode_result_payload - mark a range of bytes as a result payload * @rqstp: svc_rqst to operate on diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 478f857cdaed..05eb63c182fd 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -181,6 +181,12 @@ static void svc_set_cmsg_data(struct svc_rqst *rqstp, struct cmsghdr *cmh) } } +static int svc_sock_argument_payload(struct svc_rqst *rqstp, + unsigned int offset, unsigned int length) +{ + return 0; +} + static int svc_sock_result_payload(struct svc_rqst *rqstp, unsigned int offset, unsigned int length) { @@ -637,6 +643,7 @@ static struct svc_xprt *svc_udp_create(struct svc_serv *serv, static const struct svc_xprt_ops svc_udp_ops = { .xpo_create = svc_udp_create, .xpo_recvfrom = svc_udp_recvfrom, + .xpo_argument_payload = svc_sock_argument_payload, .xpo_sendto = svc_udp_sendto, .xpo_result_payload = svc_sock_result_payload, .xpo_release_rqst = svc_udp_release_rqst, @@ -1208,6 +1215,7 @@ static struct svc_xprt *svc_tcp_create(struct svc_serv *serv, static const struct svc_xprt_ops svc_tcp_ops = { .xpo_create = svc_tcp_create, .xpo_recvfrom = svc_tcp_recvfrom, + .xpo_argument_payload = svc_sock_argument_payload, .xpo_sendto = svc_tcp_sendto, .xpo_result_payload = svc_sock_result_payload, .xpo_release_rqst = svc_tcp_release_rqst, diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c index cf76a6ad127b..08a620b370ae 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c @@ -867,3 +867,25 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp) svc_rdma_recv_ctxt_put(rdma_xprt, ctxt); return 0; } + +/** + * svc_rdma_argument_payload - special processing for an argument payload + * @rqstp: svc_rqst to operate on + * @offset: offset of payload in file's page cache + * @length: size of payload, in bytes + * + * Pull an RDMA Read payload chunk into rqstp->rq_arg. 
+ *
+ * Return values:
+ *   %0 if successful or nothing needed to be done
+ *   %-EMSGSIZE on XDR buffer overflow
+ *   %-EINVAL if client provided too many segments
+ *   %-ENOMEM if rdma_rw context pool was exhausted
+ *   %-ENOTCONN if posting failed (connection is lost)
+ *   %-EIO if rdma_rw initialization failed (DMA mapping, etc)
+ */
+int svc_rdma_argument_payload(struct svc_rqst *rqstp, unsigned int offset,
+			      unsigned int length)
+{
+	return 0;
+}
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 94b20fb47135..ee7cc112504c 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -79,6 +79,7 @@ static void svc_rdma_kill_temp_xprt(struct svc_xprt *);
 static const struct svc_xprt_ops svc_rdma_ops = {
 	.xpo_create = svc_rdma_create,
 	.xpo_recvfrom = svc_rdma_recvfrom,
+	.xpo_argument_payload = svc_rdma_argument_payload,
 	.xpo_sendto = svc_rdma_sendto,
 	.xpo_result_payload = svc_rdma_result_payload,
 	.xpo_release_rqst = svc_rdma_release_rqst,

From patchwork Tue Aug 31 19:05:46 2021
Subject: [PATCH RFC 6/6] svcrdma: Pull Read chunks in ->xpo_argument_payload
From: Chuck Lever
To: bfields@fieldses.org
Cc: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
Date: Tue, 31 Aug 2021 15:05:46 -0400
Message-ID: <163043674641.1415.15896010002221696600.stgit@klimt.1015granger.net>
In-Reply-To: <163043485613.1415.4979286233971984855.stgit@klimt.1015granger.net>

This enables the XDR decoder to figure out how the payload sink buffer
needs to be aligned before setting up the RDMA Reads. Then re-alignment
of large RDMA Read payloads can be avoided.
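
The alignment rule itself is simple. A minimal sketch follows; the
helper is hypothetical and merely restates the value that
svc_rdma_pull_read_chunk() stores in rq_arg.page_base:

        #include <linux/mm.h>          /* offset_in_page() */

        /* Place the Read sink so that the payload's offset within its
         * first sink page matches its offset within the target file's
         * page cache. Hypothetical helper, for illustration only.
         */
        static unsigned int svc_rdma_arg_sink_page_base(loff_t wr_offset)
        {
                return offset_in_page(wr_offset);
        }

With the sink placed at that offset, the pulled pages can be handed to
the VFS without an intermediate copy to re-align the payload.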
Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc_rdma.h | 6 + include/trace/events/rpcrdma.h | 26 ++++++ net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 17 +++- net/sunrpc/xprtrdma/svc_rdma_rw.c | 139 ++++++++++++++++++++++++++++--- 4 files changed, 169 insertions(+), 19 deletions(-) diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h index f660244cc8ba..8d80a759a909 100644 --- a/include/linux/sunrpc/svc_rdma.h +++ b/include/linux/sunrpc/svc_rdma.h @@ -192,6 +192,12 @@ extern int svc_rdma_send_reply_chunk(struct svcxprt_rdma *rdma, extern int svc_rdma_process_read_list(struct svcxprt_rdma *rdma, struct svc_rqst *rqstp, struct svc_rdma_recv_ctxt *head); +extern void svc_rdma_prepare_read_chunk(struct svc_rqst *rqstp, + struct svc_rdma_recv_ctxt *head); +extern int svc_rdma_pull_read_chunk(struct svcxprt_rdma *rdma, + struct svc_rqst *rqstp, + struct svc_rdma_recv_ctxt *ctxt, + unsigned int offset, unsigned int length); /* svc_rdma_sendto.c */ extern void svc_rdma_send_ctxts_destroy(struct svcxprt_rdma *rdma); diff --git a/include/trace/events/rpcrdma.h b/include/trace/events/rpcrdma.h index 5954ce036173..30440cca321a 100644 --- a/include/trace/events/rpcrdma.h +++ b/include/trace/events/rpcrdma.h @@ -2136,6 +2136,32 @@ TRACE_EVENT(svcrdma_sq_post_err, ) ); +TRACE_EVENT(svcrdma_arg_payload, + TP_PROTO( + const struct svc_rqst *rqstp, + unsigned int offset, + unsigned int length + ), + + TP_ARGS(rqstp, offset, length), + + TP_STRUCT__entry( + __field(u32, xid) + __field(u32, offset) + __field(u32, length) + ), + + TP_fast_assign( + __entry->xid = __be32_to_cpu(rqstp->rq_xid); + __entry->offset = offset_in_page(offset); + __entry->length = length; + ), + + TP_printk("xid=0x%08x offset=%u length=%u", + __entry->xid, __entry->offset, __entry->length + ) +); + #endif /* _TRACE_RPCRDMA_H */ #include diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c index 08a620b370ae..cd9c0fb1a470 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c @@ -838,12 +838,13 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp) svc_rdma_get_inv_rkey(rdma_xprt, ctxt); - if (!pcl_is_empty(&ctxt->rc_read_pcl) || - !pcl_is_empty(&ctxt->rc_call_pcl)) { + if (!pcl_is_empty(&ctxt->rc_call_pcl) || + ctxt->rc_read_pcl.cl_count > 1) { ret = svc_rdma_process_read_list(rdma_xprt, rqstp, ctxt); if (ret < 0) goto out_readfail; - } + } else if (ctxt->rc_read_pcl.cl_count == 1) + svc_rdma_prepare_read_chunk(rqstp, ctxt); rqstp->rq_xprt_ctxt = ctxt; rqstp->rq_prot = IPPROTO_MAX; @@ -887,5 +888,13 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp) int svc_rdma_argument_payload(struct svc_rqst *rqstp, unsigned int offset, unsigned int length) { - return 0; + struct svc_rdma_recv_ctxt *ctxt = rqstp->rq_xprt_ctxt; + struct svc_xprt *xprt = rqstp->rq_xprt; + struct svcxprt_rdma *rdma = + container_of(xprt, struct svcxprt_rdma, sc_xprt); + + if (!pcl_is_empty(&ctxt->rc_call_pcl) || + ctxt->rc_read_pcl.cl_count != 1) + return 0; + return svc_rdma_pull_read_chunk(rdma, rqstp, ctxt, offset, length); } diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c index 29b7d477891c..5f03dfd2fa03 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_rw.c +++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c @@ -707,19 +707,24 @@ static int svc_rdma_build_read_segment(struct svc_rdma_read_info *info, len = segment->rs_length; sge_no = PAGE_ALIGN(info->ri_pageoff + len) >> PAGE_SHIFT; + + trace_printk("pageoff=%u len=%u sges=%u\n", + 
info->ri_pageoff, len, sge_no); + ctxt = svc_rdma_get_rw_ctxt(cc->cc_rdma, sge_no); if (!ctxt) return -ENOMEM; ctxt->rw_nents = sge_no; + head->rc_page_count += sge_no; sg = ctxt->rw_sg_table.sgl; for (sge_no = 0; sge_no < ctxt->rw_nents; sge_no++) { seg_len = min_t(unsigned int, len, PAGE_SIZE - info->ri_pageoff); - if (!info->ri_pageoff) - head->rc_page_count++; - + trace_printk(" page=%p seg_len=%u offset=%u\n", + rqstp->rq_pages[info->ri_pageno], seg_len, + info->ri_pageoff); sg_set_page(sg, rqstp->rq_pages[info->ri_pageno], seg_len, info->ri_pageoff); sg = sg_next(sg); @@ -804,15 +809,14 @@ static int svc_rdma_copy_inline_range(struct svc_rdma_read_info *info, unsigned int page_no, numpages; numpages = PAGE_ALIGN(info->ri_pageoff + remaining) >> PAGE_SHIFT; + head->rc_page_count += numpages; + for (page_no = 0; page_no < numpages; page_no++) { unsigned int page_len; page_len = min_t(unsigned int, remaining, PAGE_SIZE - info->ri_pageoff); - if (!info->ri_pageoff) - head->rc_page_count++; - dst = page_address(rqstp->rq_pages[info->ri_pageno]); memcpy(dst + info->ri_pageno, src + offset, page_len); @@ -1092,15 +1096,8 @@ static noinline int svc_rdma_read_special(struct svc_rdma_read_info *info) * @rqstp: set of pages to use as Read sink buffers * @head: pages under I/O collect here * - * The RPC/RDMA protocol assumes that the upper layer's XDR decoders - * pull each Read chunk as they decode an incoming RPC message. - * - * On Linux, however, the server needs to have a fully-constructed RPC - * message in rqstp->rq_arg when there is a positive return code from - * ->xpo_recvfrom. So the Read list is safety-checked immediately when - * it is received, then here the whole Read list is pulled all at once. - * The ingress RPC message is fully reconstructed once all associated - * RDMA Reads have completed. + * Handle complex Read chunk cases fully before svc_rdma_recvfrom() + * returns. * * Return values: * %1: all needed RDMA Reads were posted successfully, @@ -1159,3 +1156,115 @@ int svc_rdma_process_read_list(struct svcxprt_rdma *rdma, svc_rdma_read_info_free(info); return ret; } + +/** + * svc_rdma_prepare_read_chunk - Prepare rq_arg for Read chunk + * @rqstp: set of pages to use as Read sink buffers + * @head: pages under I/O collect here + * + * The Read chunk will be pulled when the upper layer's XDR + * decoder calls svc_decode_argument_payload(). In the meantime, + * fake up rq_arg.page_len and .len to reflect the size of the + * yet-to-be-pulled payload. + */ +void svc_rdma_prepare_read_chunk(struct svc_rqst *rqstp, + struct svc_rdma_recv_ctxt *head) +{ + struct svc_rdma_chunk *chunk = pcl_first_chunk(&head->rc_read_pcl); + unsigned int length = xdr_align_size(chunk->ch_length); + struct xdr_buf *buf = &rqstp->rq_arg; + + buf->tail[0].iov_base = buf->head[0].iov_base + chunk->ch_position; + buf->tail[0].iov_len = buf->head[0].iov_len - chunk->ch_position; + buf->head[0].iov_len = chunk->ch_position; + + buf->page_len = length; + buf->len += length; + buf->buflen += length; + + /* + * rq_respages starts after the last arg page. Note that we + * don't know the offset yet, so add an extra page as slack. 
+ */ + length += PAGE_SIZE * 2 - 1; + rqstp->rq_respages = &rqstp->rq_pages[length >> PAGE_SHIFT]; + rqstp->rq_next_page = rqstp->rq_respages + 1; +} + +/** + * svc_rdma_pull_read_chunk - Pull one Read chunk from the client + * @rdma: controlling RDMA transport + * @rqstp: set of pages to use as Read sink buffers + * @head: pages under I/O collect here + * @offset: offset of payload in file's page cache + * @length: size of payload, in bytes + * + * Once the upper layer's XDR decoder has decoded the length of + * the payload and it's offset, we can be clever about setting up + * the RDMA Read sink buffer so that the VFS does not have to + * re-align the payload once it is received. + * + * Caveat: To keep things simple, this is an optimization that is + * used only when there is a single Read chunk in the Read + * list. + * + * Return values: + * %0: all needed RDMA Reads were posted successfully, + * %-EINVAL: client provided the wrong chunk size, + * %-ENOMEM: rdma_rw context pool was exhausted, + * %-ENOTCONN: posting failed (connection is lost), + * %-EIO: rdma_rw initialization failed (DMA mapping, etc). + */ +int svc_rdma_pull_read_chunk(struct svcxprt_rdma *rdma, struct svc_rqst *rqstp, + struct svc_rdma_recv_ctxt *head, + unsigned int offset, unsigned int length) +{ + struct svc_rdma_read_info *info; + struct svc_rdma_chunk_ctxt *cc; + struct svc_rdma_chunk *chunk; + int ret; + + trace_svcrdma_arg_payload(rqstp, offset, length); + + /* Sanity: the Requester must have provided enough + * bytes to fill the XDR opaque. + */ + chunk = pcl_first_chunk(&head->rc_read_pcl); + if (length > chunk->ch_length) + return -EINVAL; + + info = svc_rdma_read_info_alloc(rdma); + if (!info) + return -ENOMEM; + cc = &info->ri_cc; + info->ri_rqst = rqstp; + info->ri_readctxt = head; + info->ri_pageno = 0; + info->ri_pageoff = offset_in_page(offset); + info->ri_totalbytes = 0; + + ret = svc_rdma_build_read_chunk(info, chunk); + if (ret < 0) + goto out_err; + rqstp->rq_arg.pages = &info->ri_rqst->rq_pages[0]; + rqstp->rq_arg.page_base = offset_in_page(offset); + rqstp->rq_arg.buflen += offset_in_page(offset); + + trace_svcrdma_post_read_chunk(&cc->cc_cid, cc->cc_sqecount); + init_completion(&cc->cc_done); + ret = svc_rdma_post_chunk_ctxt(cc); + if (ret < 0) + goto out_err; + + ret = 0; + wait_for_completion(&cc->cc_done); + if (cc->cc_status != IB_WC_SUCCESS) + ret = -EIO; + + /* Ensure svc_rdma_recv_ctxt_put() does not try to release pages */ + head->rc_page_count = 0; + +out_err: + svc_rdma_read_info_free(info); + return ret; +}
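
For reviewers, a worked example of the sink-page arithmetic in
svc_rdma_prepare_read_chunk() above, assuming PAGE_SIZE is 4096 (the
numbers are illustrative only):

/*
 * A 16 KB aligned chunk length (length = 16384):
 *
 *      length += PAGE_SIZE * 2 - 1;    16384 + 8191 = 24575
 *      length >> PAGE_SHIFT            24575 >> 12  = 5
 *
 * so rq_respages becomes &rq_pages[5]: four pages would hold a
 * page-aligned 16 KB payload, and the extra page is slack for the case
 * where the payload starts mid-page, since its offset is not known
 * until the XDR decoder calls svc_decode_argument_payload().
 */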