From patchwork Wed Jul 19 18:31:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 13319319 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 36EE1C001DE for ; Wed, 19 Jul 2023 18:31:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229732AbjGSSbH (ORCPT ); Wed, 19 Jul 2023 14:31:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53790 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229604AbjGSSbG (ORCPT ); Wed, 19 Jul 2023 14:31:06 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7D908B6 for ; Wed, 19 Jul 2023 11:31:05 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 1AAF8617CB for ; Wed, 19 Jul 2023 18:31:05 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 36F0BC433C7; Wed, 19 Jul 2023 18:31:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1689791464; bh=KrJj4+pL0evET4lNB4XmU8HTBeucyH/G6HXyAGz3KOc=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=mGZrVzGjOJCcatVdlbT5QfdJTYblP/mYuxItTfbjWQSDnii+nStm2mcT9ag62v0kD jZraYgOYxUbjY1S/ZHXPgBDinyi2RyHj83rktPbsL+hxQ0kaAj3z9MlaAgvVyw6b6n FD6dAxUguGkzhB1l+3ZCcrk3/G7XUKd2Ymx7D5YgsYNDxWvS79mG+4M4aem9z5fo04 mtisYdCOwYjuZCsMNuLuw1nmiw8pDz3hDExt9XyvEsUhjLfp8ewR9lPzNF30shvEGA BUB+0W1NxVfHxJIjDGgJjJ9ObyL7PkkLvzxAPviuyUG+2e3CfzSxKLRJINW8WB5PR4 jWdLU0WAfCn5w== Subject: [PATCH v3 1/5] SUNRPC: Convert svc_tcp_sendmsg to use bio_vecs directly From: Chuck Lever To: linux-nfs@vger.kernel.org, netdev@vger.kernel.org Cc: Chuck Lever , dhowells@redhat.com Date: Wed, 19 Jul 2023 14:31:03 -0400 Message-ID: <168979146324.1905271.11000616800905663660.stgit@morisot.1015granger.net> In-Reply-To: <168979108540.1905271.9720708849149797793.stgit@morisot.1015granger.net> References: <168979108540.1905271.9720708849149797793.stgit@morisot.1015granger.net> User-Agent: StGit/1.5 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org From: Chuck Lever Add a helper to convert a whole xdr_buf directly into an array of bio_vecs, then send this array instead of iterating piecemeal over the xdr_buf containing the outbound RPC message. Signed-off-by: Chuck Lever Reviewed-by: David Howells --- include/linux/sunrpc/xdr.h | 2 + net/sunrpc/svcsock.c | 59 +++++++++++++++----------------------------- net/sunrpc/xdr.c | 50 +++++++++++++++++++++++++++++++++++++ 3 files changed, 72 insertions(+), 39 deletions(-) diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index f89ec4b5ea16..42f9d7eb9a1a 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -139,6 +139,8 @@ void xdr_terminate_string(const struct xdr_buf *, const u32); size_t xdr_buf_pagecount(const struct xdr_buf *buf); int xdr_alloc_bvec(struct xdr_buf *buf, gfp_t gfp); void xdr_free_bvec(struct xdr_buf *buf); +unsigned int xdr_buf_to_bvec(struct bio_vec *bvec, unsigned int bvec_size, + const struct xdr_buf *xdr); static inline __be32 *xdr_encode_array(__be32 *p, const void *s, unsigned int len) { diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index e43f26382411..90b1ab95c223 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -36,6 +36,8 @@ #include #include #include +#include + #include #include #include @@ -1194,72 +1196,52 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp) return 0; /* record not complete */ } -static int svc_tcp_send_kvec(struct socket *sock, const struct kvec *vec, - int flags) -{ - struct msghdr msg = { .msg_flags = MSG_SPLICE_PAGES | flags, }; - - iov_iter_kvec(&msg.msg_iter, ITER_SOURCE, vec, 1, vec->iov_len); - return sock_sendmsg(sock, &msg); -} - /* * MSG_SPLICE_PAGES is used exclusively to reduce the number of * copy operations in this path. Therefore the caller must ensure * that the pages backing @xdr are unchanging. * - * In addition, the logic assumes that * .bv_len is never larger - * than PAGE_SIZE. + * Note that the send is non-blocking. The caller has incremented + * the reference count on each page backing the RPC message, and + * the network layer will "put" these pages when transmission is + * complete. + * + * This is safe for our RPC services because the memory backing + * the head and tail components is never kmalloc'd. These always + * come from pages in the svc_rqst::rq_pages array. */ -static int svc_tcp_sendmsg(struct socket *sock, struct xdr_buf *xdr, +static int svc_tcp_sendmsg(struct svc_sock *svsk, struct svc_rqst *rqstp, rpc_fraghdr marker, unsigned int *sentp) { - const struct kvec *head = xdr->head; - const struct kvec *tail = xdr->tail; struct kvec rm = { .iov_base = &marker, .iov_len = sizeof(marker), }; struct msghdr msg = { - .msg_flags = 0, + .msg_flags = MSG_MORE, }; + unsigned int count; int ret; *sentp = 0; - ret = xdr_alloc_bvec(xdr, GFP_KERNEL); - if (ret < 0) - return ret; - ret = kernel_sendmsg(sock, &msg, &rm, 1, rm.iov_len); + ret = kernel_sendmsg(svsk->sk_sock, &msg, &rm, 1, rm.iov_len); if (ret < 0) return ret; *sentp += ret; if (ret != rm.iov_len) return -EAGAIN; - ret = svc_tcp_send_kvec(sock, head, 0); - if (ret < 0) - return ret; - *sentp += ret; - if (ret != head->iov_len) - goto out; + count = xdr_buf_to_bvec(rqstp->rq_bvec, ARRAY_SIZE(rqstp->rq_bvec), + &rqstp->rq_res); msg.msg_flags = MSG_SPLICE_PAGES; - iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, xdr->bvec, - xdr_buf_pagecount(xdr), xdr->page_len); - ret = sock_sendmsg(sock, &msg); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec, + count, rqstp->rq_res.len); + ret = sock_sendmsg(svsk->sk_sock, &msg); if (ret < 0) return ret; *sentp += ret; - - if (tail->iov_len) { - ret = svc_tcp_send_kvec(sock, tail, 0); - if (ret < 0) - return ret; - *sentp += ret; - } - -out: return 0; } @@ -1290,8 +1272,7 @@ static int svc_tcp_sendto(struct svc_rqst *rqstp) if (svc_xprt_is_dead(xprt)) goto out_notconn; tcp_sock_set_cork(svsk->sk_sk, true); - err = svc_tcp_sendmsg(svsk->sk_sock, xdr, marker, &sent); - xdr_free_bvec(xdr); + err = svc_tcp_sendmsg(svsk, rqstp, marker, &sent); trace_svcsock_tcp_send(xprt, err < 0 ? (long)err : sent); if (err < 0 || sent != (xdr->len + sizeof(marker))) goto out_close; diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c index 2a22e78af116..358e6de91775 100644 --- a/net/sunrpc/xdr.c +++ b/net/sunrpc/xdr.c @@ -164,6 +164,56 @@ xdr_free_bvec(struct xdr_buf *buf) buf->bvec = NULL; } +/** + * xdr_buf_to_bvec - Copy components of an xdr_buf into a bio_vec array + * @bvec: bio_vec array to populate + * @bvec_size: element count of @bio_vec + * @xdr: xdr_buf to be copied + * + * Returns the number of entries consumed in @bvec. + */ +unsigned int xdr_buf_to_bvec(struct bio_vec *bvec, unsigned int bvec_size, + const struct xdr_buf *xdr) +{ + const struct kvec *head = xdr->head; + const struct kvec *tail = xdr->tail; + unsigned int count = 0; + + if (head->iov_len) { + bvec_set_virt(bvec++, head->iov_base, head->iov_len); + ++count; + } + + if (xdr->page_len) { + unsigned int offset, len, remaining; + struct page **pages = xdr->pages; + + offset = offset_in_page(xdr->page_base); + remaining = xdr->page_len; + while (remaining > 0) { + len = min_t(unsigned int, remaining, + PAGE_SIZE - offset); + bvec_set_page(bvec++, *pages++, len, offset); + remaining -= len; + offset = 0; + if (unlikely(++count > bvec_size)) + goto bvec_overflow; + } + } + + if (tail->iov_len) { + bvec_set_virt(bvec, tail->iov_base, tail->iov_len); + if (unlikely(++count > bvec_size)) + goto bvec_overflow; + } + + return count; + +bvec_overflow: + pr_warn_once("%s: bio_vec array overflow\n", __func__); + return count - 1; +} + /** * xdr_inline_pages - Prepare receive buffer for a large reply * @xdr: xdr_buf into which reply will be placed From patchwork Wed Jul 19 18:31:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 13319320 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 66D68C001DE for ; Wed, 19 Jul 2023 18:31:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229810AbjGSSbN (ORCPT ); Wed, 19 Jul 2023 14:31:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53834 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229604AbjGSSbN (ORCPT ); Wed, 19 Jul 2023 14:31:13 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3093010CB for ; Wed, 19 Jul 2023 11:31:12 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 797BC617D5 for ; Wed, 19 Jul 2023 18:31:11 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 998F2C433C7; Wed, 19 Jul 2023 18:31:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1689791470; bh=qPL1XmQINNAE+Qi03ydyBs8JMJCLspAjkFez1BvTBzk=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=ZKO7qQJ2T50a5CCfwHoxnC0/pT48MPI1oNf7nGTso5ju1rnfT8yYEvu5lHezU2+64 H7bF9d2EHD818tCqfUhgLELQDZVgR4VykY0fp/hzIOHUswm3IMe6S6nJ8rsAUbFnxf vRSKXs1knNXakS2hV9PeVQx1l/wg9AHpSwyUGuWneEH21j+kDYMyFZcX+eQeJ555zV iU/zY443jdiruqPJyM65oWHRlEacJTOZhlKRl0SzUO6v+P3gN69INLnQT6k5gTbMiy o9fkTeGSgbVgKNW0jird0sC+pX9aeXDrmQ2g/cK0mu0tHhOBnAZ1yL44czOKMrKWRH JXqR9AxR/vCSA== Subject: [PATCH v3 2/5] SUNRPC: Send RPC message on TCP with a single sock_sendmsg() call From: Chuck Lever To: linux-nfs@vger.kernel.org, netdev@vger.kernel.org Cc: David Howells , Chuck Lever , dhowells@redhat.com Date: Wed, 19 Jul 2023 14:31:09 -0400 Message-ID: <168979146971.1905271.4709699930756258041.stgit@morisot.1015granger.net> In-Reply-To: <168979108540.1905271.9720708849149797793.stgit@morisot.1015granger.net> References: <168979108540.1905271.9720708849149797793.stgit@morisot.1015granger.net> User-Agent: StGit/1.5 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org From: Chuck Lever There is now enough infrastructure in place to combine the stream record marker into the biovec array used to send each outgoing RPC message on TCP. The whole message can be more efficiently sent with a single call to sock_sendmsg() using a bio_vec iterator. Note that this also helps with RPC-with-TLS: the TLS implementation can now clearly see where the upper layer message boundaries are. Before, it would send each component of the xdr_buf (record marker, head, page payload, tail) in separate TLS records. Suggested-by: David Howells Signed-off-by: Chuck Lever Reviewed-by: David Howells --- include/linux/sunrpc/svcsock.h | 2 ++ net/sunrpc/svcsock.c | 33 ++++++++++++++++++--------------- 2 files changed, 20 insertions(+), 15 deletions(-) diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h index a7116048a4d4..caf3308f1f07 100644 --- a/include/linux/sunrpc/svcsock.h +++ b/include/linux/sunrpc/svcsock.h @@ -38,6 +38,8 @@ struct svc_sock { /* Number of queued send requests */ atomic_t sk_sendqlen; + struct page_frag_cache sk_frag_cache; + struct completion sk_handshake_done; struct page * sk_pages[RPCSVC_MAXPAGES]; /* received data */ diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 90b1ab95c223..d4d816036c04 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -1213,31 +1213,30 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp) static int svc_tcp_sendmsg(struct svc_sock *svsk, struct svc_rqst *rqstp, rpc_fraghdr marker, unsigned int *sentp) { - struct kvec rm = { - .iov_base = &marker, - .iov_len = sizeof(marker), - }; struct msghdr msg = { - .msg_flags = MSG_MORE, + .msg_flags = MSG_SPLICE_PAGES, }; unsigned int count; + void *buf; int ret; *sentp = 0; - ret = kernel_sendmsg(svsk->sk_sock, &msg, &rm, 1, rm.iov_len); - if (ret < 0) - return ret; - *sentp += ret; - if (ret != rm.iov_len) - return -EAGAIN; + /* The stream record marker is copied into a temporary page + * fragment buffer so that it can be included in rq_bvec. + */ + buf = page_frag_alloc(&svsk->sk_frag_cache, sizeof(marker), + GFP_KERNEL); + if (!buf) + return -ENOMEM; + memcpy(buf, &marker, sizeof(marker)); + bvec_set_virt(rqstp->rq_bvec, buf, sizeof(marker)); - count = xdr_buf_to_bvec(rqstp->rq_bvec, ARRAY_SIZE(rqstp->rq_bvec), - &rqstp->rq_res); + count = xdr_buf_to_bvec(rqstp->rq_bvec + 1, + ARRAY_SIZE(rqstp->rq_bvec) - 1, &rqstp->rq_res); - msg.msg_flags = MSG_SPLICE_PAGES; iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec, - count, rqstp->rq_res.len); + 1 + count, sizeof(marker) + rqstp->rq_res.len); ret = sock_sendmsg(svsk->sk_sock, &msg); if (ret < 0) return ret; @@ -1616,6 +1615,7 @@ static void svc_tcp_sock_detach(struct svc_xprt *xprt) static void svc_sock_free(struct svc_xprt *xprt) { struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt); + struct page_frag_cache *pfc = &svsk->sk_frag_cache; struct socket *sock = svsk->sk_sock; trace_svcsock_free(svsk, sock); @@ -1625,5 +1625,8 @@ static void svc_sock_free(struct svc_xprt *xprt) sockfd_put(sock); else sock_release(sock); + if (pfc->va) + __page_frag_cache_drain(virt_to_head_page(pfc->va), + pfc->pagecnt_bias); kfree(svsk); } From patchwork Wed Jul 19 18:31:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 13319321 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AA1D3C001DE for ; Wed, 19 Jul 2023 18:31:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229991AbjGSSbT (ORCPT ); Wed, 19 Jul 2023 14:31:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53858 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229604AbjGSSbT (ORCPT ); Wed, 19 Jul 2023 14:31:19 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 597A3B6 for ; Wed, 19 Jul 2023 11:31:18 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id EC8B6617C2 for ; Wed, 19 Jul 2023 18:31:17 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 19B8CC433C8; Wed, 19 Jul 2023 18:31:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1689791477; bh=D6DwcPwywvoVU7CXOxDCIrUiHzrDx4usY5dx9EsKCaI=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=eVonETjNis11TjJ1oRLkrr3NJ4iHzlF7nw72KDfLquSbo2B4zMpOWrTfYToeZuGDP sriFE2tNBDGZrTo4tCMhmITUnid9+1HjIUJZ3YLbPTj3vw21s8ZEJRC/lpMQsZnE+b 9WWuDQ3DEoSAR/ztkxluu8RCQeYyvWTzQ493WSRjUt8B+99EAJQZffLXon+XUvC239 5kU2/bT29ZXOwxH10FnCahbw7T3KNIH93COT+4cqOGJlFIShXnsVpSvfKcidjoJgCm 3O1FhQXDtIrEpkFcdWkNv+3G8Tw+hM3wXu5NwsYDT3NPgCcLsgdooE8sbCFMLvs2yR EgWJgrvrzJItg== Subject: [PATCH v3 3/5] SUNRPC: Convert svc_udp_sendto() to use the per-socket bio_vec array From: Chuck Lever To: linux-nfs@vger.kernel.org, netdev@vger.kernel.org Cc: Chuck Lever , dhowells@redhat.com Date: Wed, 19 Jul 2023 14:31:16 -0400 Message-ID: <168979147611.1905271.944494362977362823.stgit@morisot.1015granger.net> In-Reply-To: <168979108540.1905271.9720708849149797793.stgit@morisot.1015granger.net> References: <168979108540.1905271.9720708849149797793.stgit@morisot.1015granger.net> User-Agent: StGit/1.5 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org From: Chuck Lever Commit da1661b93bf4 ("SUNRPC: Teach server to use xprt_sock_sendmsg for socket sends") modified svc_udp_sendto() to use xprt_sock_sendmsg() because we originally believed xprt_sock_sendmsg() would be needed for TLS support. That does not actually appear to be the case. In addition, the linkage between the client and server send code has been a bit of a maintenance headache because of the distinct ways that the client and server handle memory allocation. Going forward, eventually the XDR layer will deal with its buffers in the form of bio_vec arrays, so convert this function accordingly. Signed-off-by: Chuck Lever --- net/sunrpc/svcsock.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index d4d816036c04..f28790f282c2 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -695,7 +695,7 @@ static int svc_udp_sendto(struct svc_rqst *rqstp) .msg_control = cmh, .msg_controllen = sizeof(buffer), }; - unsigned int sent; + unsigned int count; int err; svc_udp_release_ctxt(xprt, rqstp->rq_xprt_ctxt); @@ -708,22 +708,23 @@ static int svc_udp_sendto(struct svc_rqst *rqstp) if (svc_xprt_is_dead(xprt)) goto out_notconn; - err = xdr_alloc_bvec(xdr, GFP_KERNEL); - if (err < 0) - goto out_unlock; + count = xdr_buf_to_bvec(rqstp->rq_bvec, + ARRAY_SIZE(rqstp->rq_bvec), xdr); - err = xprt_sock_sendmsg(svsk->sk_sock, &msg, xdr, 0, 0, &sent); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec, + count, 0); + err = sock_sendmsg(svsk->sk_sock, &msg); if (err == -ECONNREFUSED) { /* ICMP error on earlier request. */ - err = xprt_sock_sendmsg(svsk->sk_sock, &msg, xdr, 0, 0, &sent); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec, + count, 0); + err = sock_sendmsg(svsk->sk_sock, &msg); } - xdr_free_bvec(xdr); + trace_svcsock_udp_send(xprt, err); -out_unlock: + mutex_unlock(&xprt->xpt_mutex); - if (err < 0) - return err; - return sent; + return err; out_notconn: mutex_unlock(&xprt->xpt_mutex); From patchwork Wed Jul 19 18:31:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 13319322 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8F7DEC001B0 for ; Wed, 19 Jul 2023 18:31:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230048AbjGSSb0 (ORCPT ); Wed, 19 Jul 2023 14:31:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53892 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229604AbjGSSbZ (ORCPT ); Wed, 19 Jul 2023 14:31:25 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DF34DB6 for ; Wed, 19 Jul 2023 11:31:24 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 6E1A8617C2 for ; Wed, 19 Jul 2023 18:31:24 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8D04BC433C8; Wed, 19 Jul 2023 18:31:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1689791483; bh=mAGQFDTC6IA3TAfBq3MGGlOTdVh4EYYF7vR8hzjuYVo=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=rAnIgY7HIZMT5qsJMT0sySPEVimREBNfXmVd9Tt8xDqjEPsDeBHUNrepYuXsTUROs E0D+Tbk6pGcu+QV1W9DjYzh/2fo/RnBgHEn4kGIknJI8pEBVMtEa5TD++dpdDQeqBO 6FGIOtxpICw4zXTsiI5kUJNcsz2PeP0td3r3Ysuu716GhSES5hD8a6GXSHRAfWGaPo KEtMFea2sMQLygEBZdHd6QJQcjO8nmaEBvpI17AwWjCK2jzH6ouuohetHM9vwQ6kae J4LnXRY2JELJYe+Q3aC97UwzwYlttmILIwyqHuCBAz0i/cUZDQxJRaY6m6ZQtfqXzc 3v7o0sQ491xyg== Subject: [PATCH v3 4/5] SUNRPC: Revert e0a912e8ddba From: Chuck Lever To: linux-nfs@vger.kernel.org, netdev@vger.kernel.org Cc: Chuck Lever , dhowells@redhat.com Date: Wed, 19 Jul 2023 14:31:22 -0400 Message-ID: <168979148257.1905271.8311839188162164611.stgit@morisot.1015granger.net> In-Reply-To: <168979108540.1905271.9720708849149797793.stgit@morisot.1015granger.net> References: <168979108540.1905271.9720708849149797793.stgit@morisot.1015granger.net> User-Agent: StGit/1.5 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org From: Chuck Lever Flamegraph analysis showed that the cork/uncork calls consume nearly a third of the CPU time spent in svc_tcp_sendto(). The other two consumers are mutex lock/unlock and svc_tcp_sendmsg(). Now that svc_tcp_sendto() coalesces RPC messages properly, there is no need to introduce artificial delays to prevent sending partial messages. After applying this change, I measured a 1.2K read IOPS increase for 8KB random I/O (several percent) on 56Gb IP over IB. Signed-off-by: Chuck Lever Reviewed-by: David Howells --- include/linux/sunrpc/svcsock.h | 2 -- net/sunrpc/svcsock.c | 6 ------ 2 files changed, 8 deletions(-) diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h index caf3308f1f07..a7ea54460b1a 100644 --- a/include/linux/sunrpc/svcsock.h +++ b/include/linux/sunrpc/svcsock.h @@ -35,8 +35,6 @@ struct svc_sock { /* Total length of the data (not including fragment headers) * received so far in the fragments making up this rpc: */ u32 sk_datalen; - /* Number of queued send requests */ - atomic_t sk_sendqlen; struct page_frag_cache sk_frag_cache; diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index f28790f282c2..7b7358908a21 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -1267,22 +1267,17 @@ static int svc_tcp_sendto(struct svc_rqst *rqstp) svc_tcp_release_ctxt(xprt, rqstp->rq_xprt_ctxt); rqstp->rq_xprt_ctxt = NULL; - atomic_inc(&svsk->sk_sendqlen); mutex_lock(&xprt->xpt_mutex); if (svc_xprt_is_dead(xprt)) goto out_notconn; - tcp_sock_set_cork(svsk->sk_sk, true); err = svc_tcp_sendmsg(svsk, rqstp, marker, &sent); trace_svcsock_tcp_send(xprt, err < 0 ? (long)err : sent); if (err < 0 || sent != (xdr->len + sizeof(marker))) goto out_close; - if (atomic_dec_and_test(&svsk->sk_sendqlen)) - tcp_sock_set_cork(svsk->sk_sk, false); mutex_unlock(&xprt->xpt_mutex); return sent; out_notconn: - atomic_dec(&svsk->sk_sendqlen); mutex_unlock(&xprt->xpt_mutex); return -ENOTCONN; out_close: @@ -1291,7 +1286,6 @@ static int svc_tcp_sendto(struct svc_rqst *rqstp) (err < 0) ? "got error" : "sent", (err < 0) ? err : sent, xdr->len); svc_xprt_deferred_close(xprt); - atomic_dec(&svsk->sk_sendqlen); mutex_unlock(&xprt->xpt_mutex); return -EAGAIN; } From patchwork Wed Jul 19 18:31:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 13319332 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B7F7C001B0 for ; Wed, 19 Jul 2023 18:31:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229604AbjGSSbc (ORCPT ); Wed, 19 Jul 2023 14:31:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53924 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230133AbjGSSbc (ORCPT ); Wed, 19 Jul 2023 14:31:32 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4B9F7B6 for ; Wed, 19 Jul 2023 11:31:31 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id CEA5F617CF for ; Wed, 19 Jul 2023 18:31:30 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id EFAB7C433C8; Wed, 19 Jul 2023 18:31:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1689791490; bh=1DccyJH/V5mxxc3FHyGNfZqC5N/PPxlYjCJJzbgu52I=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=ruExub29BrFO6myWNunbMqMxyX4SlXG5DR+7Rt2l9rusjMekSHuulCtVlWjQBFtrQ 5l5y7pLVfXaWpgLbC1hDfP4uQVGCmir09oQ2mp3nQXySlqwM2xbmPTn/ByLoSI/dzR BoHcBygdEsGJ2lzDebCPEyVDOAmofvINtQyDod9YKrb0KC3DFohGKY4hLwisP+f0Lr m9m8f74PYn+E3r01xDzEIVlzPjh4/kCexJuBj5wu2EAKNGn83QitqvWuxtQy3YRTB1 gHimkPj1UwsQblOqhTJ0N1WmPbM/Ii2FrxkNXRX6QwJsnfu9PcfDeYrK/LGulNXLzQ ++GcQBWjDNyPw== Subject: [PATCH v3 5/5] SUNRPC: Reduce thread wake-up rate when receiving large RPC messages From: Chuck Lever To: linux-nfs@vger.kernel.org, netdev@vger.kernel.org Cc: Chuck Lever , dhowells@redhat.com Date: Wed, 19 Jul 2023 14:31:29 -0400 Message-ID: <168979148906.1905271.2650584507923874010.stgit@morisot.1015granger.net> In-Reply-To: <168979108540.1905271.9720708849149797793.stgit@morisot.1015granger.net> References: <168979108540.1905271.9720708849149797793.stgit@morisot.1015granger.net> User-Agent: StGit/1.5 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org From: Chuck Lever With large NFS WRITE requests on TCP, I measured 5-10 thread wake- ups to receive each request. This is because the socket layer calls ->sk_data_ready() frequently, and each call triggers a thread wake-up. Each recvmsg() seems to pull in less than 100KB. Have the socket layer hold ->sk_data_ready() calls until the full incoming message has arrived to reduce the wake-up rate. Signed-off-by: Chuck Lever --- net/sunrpc/svcsock.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 7b7358908a21..36e5070132ea 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -1088,6 +1088,9 @@ static void svc_tcp_fragment_received(struct svc_sock *svsk) /* If we have more data, signal svc_xprt_enqueue() to try again */ svsk->sk_tcplen = 0; svsk->sk_marker = xdr_zero; + + smp_wmb(); + tcp_set_rcvlowat(svsk->sk_sk, 1); } /** @@ -1177,10 +1180,17 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp) goto err_delete; if (len == want) svc_tcp_fragment_received(svsk); - else + else { + /* Avoid more ->sk_data_ready() calls until the rest + * of the message has arrived. This reduces service + * thread wake-ups on large incoming messages. */ + tcp_set_rcvlowat(svsk->sk_sk, + svc_sock_reclen(svsk) - svsk->sk_tcplen); + trace_svcsock_tcp_recv_short(&svsk->sk_xprt, svc_sock_reclen(svsk), svsk->sk_tcplen - sizeof(rpc_fraghdr)); + } goto err_noclose; error: if (len != -EAGAIN)