From patchwork Wed Apr 5 16:53:19 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13202227
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, "David S. Miller", Eric Dumazet, Jakub Kicinski,
 Paolo Abeni, Willem de Bruijn, Matthew Wilcox, Al Viro,
 Christoph Hellwig, Jens Axboe, Jeff Layton, Christian Brauner,
 Chuck Lever III, Linus Torvalds, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH net-next v4 00/20] splice, net: Replace sendpage with
 sendmsg(MSG_SPLICE_PAGES), part 1
Date: Wed, 5 Apr 2023 17:53:19 +0100
Message-Id: <20230405165339.3468808-1-dhowells@redhat.com>

Here's the first tranche of patches towards providing a MSG_SPLICE_PAGES
internal sendmsg flag that is intended to replace the ->sendpage() op with
calls to sendmsg().  MSG_SPLICE_PAGES is a hint that tells the protocol
that it should splice the pages supplied if it can and copy them if not.

This will allow splice to pass multiple pages in a single call and allow
certain parts of higher protocols (e.g. sunrpc, iwarp) to pass an entire
message in one go rather than having to send it piecemeal.  This should
also make it easier to handle the splicing of multipage folios.

This set consists of the following parts:

 (1) Provide a set of sample programs in samples/net/ that can be used to
     drive splice() and sendfile() with TCP/TCP6, UDP/UDP6, TLS over
     TCP/TCP6, UNIX and ALG hash/skcipher sockets for testing.

 (2) Define the MSG_SPLICE_PAGES flag and prevent sys_sendmsg() from being
     able to set it.

 (3) Overhaul the page_frag_alloc_align() allocator:

     (a) Split it out from mm/page_alloc.c into its own file,
         mm/page_frag_alloc.c.

     (b) Make it use multipage folios rather than compound pages.

     (c) Give it per-cpu buckets to allocate from so no locking is
         required.

     (d) The netdev_alloc_cache and the napi fragment cache are then cast
         in terms of this and some private allocators are removed.

     I'm not sure that the existing allocator is 100% thread safe.

 (4) Implement MSG_SPLICE_PAGES support in TCP.

 (5) Make do_tcp_sendpages() just wrap sendmsg() and then fold it into its
     various callers.

 (6) Implement MSG_SPLICE_PAGES support in IP and make udp_sendpage() just
     a wrapper around sendmsg().

 (7) Implement MSG_SPLICE_PAGES support in IP6/UDP6.

 (8) Implement MSG_SPLICE_PAGES support in AF_UNIX.

 (9) Make AF_UNIX copy unspliceable pages.

I've pushed the patches here also:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=sendpage-1

The follow-on patches are on branch iov-sendpage on the same tree.

David

Changes
=======
ver #4)
 - Added some sample socket-I/O programs into samples/net/.
 - Fix a missing page-get in AF_KCM.
 - Init the sgtable and mark the end in AF_ALG when calling
   netfs_extract_iter_to_sg().
 - Add a destructor func for page frag caches prior to generalising it and
   making it per-cpu.

ver #3)
 - Dropped the iterator-of-iterators patch.
 - Only expunge MSG_SPLICE_PAGES in sys_send[m]msg, not sys_recv[m]msg.
 - Split MSG_SPLICE_PAGES code in __ip_append_data() out into helper
   functions.
 - Implement MSG_SPLICE_PAGES support in __ip6_append_data() using the
   above helper functions.
 - Rename 'xlength' to 'initial_length'.
 - Minimise the changes to sunrpc for the moment.
 - Don't give -EOPNOTSUPP if NETIF_F_SG not available, just copy instead.
 - Implemented MSG_SPLICE_PAGES support in the TLS, Chelsio-TLS and AF_KCM
   code.

ver #2)
 - Overhauled the page_frag_alloc() allocator: large folios and per-cpu.
 - Got rid of my own zerocopy allocator.
 - Use iov_iter_extract_pages() rather than poking in iter->bvec.
 - Made page splicing fall back to page copying on a page-by-page basis.
 - Made splice_to_socket() pass 16 pipe buffers at a time.
 - Made AF_ALG/hash use finup/digest where possible in sendmsg.
 - Added an iterator-of-iterators, ITER_ITERLIST.
 - Made sunrpc use the iterator-of-iterators.
 - Converted more drivers.
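
For readers who want to poke at the path described in part (1) from
userspace, here is a minimal sketch of driving splice() into a socket.  It
is not one of the programs in samples/net/; the function name
splice_to_unix_socket() and its parameters are made up for illustration, and
it assumes a Linux system where splice() from a pipe to an AF_UNIX stream
socket is available.  Whether the kernel splices or copies the pages is not
visible to the receiver, which is the point of the flag being a hint.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from the patch set): write len bytes of
 * buf into a pipe, splice() the pipe contents into one end of an AF_UNIX
 * stream socketpair and read them back from the other end into out.
 * Returns the number of bytes received, or -1 on error.
 */
ssize_t splice_to_unix_socket(const char *buf, size_t len,
			      char *out, size_t outlen)
{
	int pfd[2], sfd[2];
	ssize_t got = -1;

	if (pipe(pfd) < 0)
		return -1;
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sfd) < 0) {
		close(pfd[0]);
		close(pfd[1]);
		return -1;
	}

	/* Keep len well below the pipe capacity so that this cannot block. */
	if (write(pfd[1], buf, len) != (ssize_t)len)
		goto done;

	/* Ask the kernel to move the pipe contents into the socket. */
	if (splice(pfd[0], NULL, sfd[0], NULL, len, 0) < 0)
		goto done;

	/* The receiver sees the same bytes whether spliced or copied. */
	got = read(sfd[1], out, outlen);
done:
	close(pfd[0]);
	close(pfd[1]);
	close(sfd[0]);
	close(sfd[1]);
	return got;
}
```

Calling splice_to_unix_socket("hello", 5, out, sizeof(out)) with a
sufficiently large out buffer should hand back the same five bytes.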

Link: https://lore.kernel.org/r/20230316152618.711970-1-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20230329141354.516864-1-dhowells@redhat.com/ # v2
Link: https://lore.kernel.org/r/20230331160914.1608208-1-dhowells@redhat.com/ # v3

David Howells (20):
  net: Add samples for network I/O and splicing
  net: Declare MSG_SPLICE_PAGES internal sendmsg() flag
  mm: Move the page fragment allocator from page_alloc.c into its own
    file
  mm: Make the page_frag_cache allocator use multipage folios
  mm: Make the page_frag_cache allocator use per-cpu
  tcp: Support MSG_SPLICE_PAGES
  tcp: Make sendmsg(MSG_SPLICE_PAGES) copy unspliceable data
  tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES
  tcp_bpf: Inline do_tcp_sendpages as it's now a wrapper around
    tcp_sendmsg
  espintcp: Inline do_tcp_sendpages()
  tls: Inline do_tcp_sendpages()
  siw: Inline do_tcp_sendpages()
  tcp: Fold do_tcp_sendpages() into tcp_sendpage_locked()
  udp: Convert udp_sendpage() to use MSG_SPLICE_PAGES
  ip: Remove ip_append_page()
  ip, udp: Support MSG_SPLICE_PAGES
  ip, udp: Make sendmsg(MSG_SPLICE_PAGES) copy unspliceable data
  ip6, udp6: Support MSG_SPLICE_PAGES
  af_unix: Support MSG_SPLICE_PAGES
  af_unix: Make sendmsg(MSG_SPLICE_PAGES) copy unspliceable data

 drivers/infiniband/sw/siw/siw_qp_tx.c      |  17 +-
 drivers/net/ethernet/mediatek/mtk_wed_wo.c |  19 +-
 drivers/net/ethernet/mediatek/mtk_wed_wo.h |   2 -
 drivers/nvme/host/tcp.c                    |  19 +-
 drivers/nvme/target/tcp.c                  |  22 +-
 include/linux/gfp.h                        |  17 +-
 include/linux/mm_types.h                   |  13 +-
 include/linux/socket.h                     |   3 +
 include/net/ip.h                           |   3 +-
 include/net/tcp.h                          |   2 -
 include/net/tls.h                          |   2 +-
 mm/Makefile                                |   2 +-
 mm/page_alloc.c                            | 126 ----------
 mm/page_frag_alloc.c                       | 201 ++++++++++++++++
 net/core/skbuff.c                          |  32 +--
 net/ipv4/ip_output.c                       | 202 ++++++----------
 net/ipv4/tcp.c                             | 260 ++++++++-------------
 net/ipv4/tcp_bpf.c                         |  20 +-
 net/ipv4/udp.c                             |  50 +---
 net/ipv6/ip6_output.c                      |  12 +
 net/socket.c                               |   2 +
 net/tls/tls_main.c                         |  24 +-
 net/unix/af_unix.c                         | 115 +++++++--
 net/xfrm/espintcp.c                        |  10 +-
 samples/Kconfig                            |   6 +
 samples/Makefile                           |   1 +
 samples/net/Makefile                       |  13 ++
 samples/net/alg-encrypt.c                  | 201 ++++++++++++++++
 samples/net/alg-hash.c                     | 143 ++++++++++++
 samples/net/splice-out.c                   | 142 +++++++++++
 samples/net/tcp-send.c                     | 154 ++++++++++++
 samples/net/tcp-sink.c                     |  76 ++++++
 samples/net/tls-send.c                     | 176 ++++++++++++++
 samples/net/tls-sink.c                     |  98 ++++++++
 samples/net/udp-send.c                     | 151 ++++++++++++
 samples/net/udp-sink.c                     |  82 +++++++
 samples/net/unix-send.c                    | 147 ++++++++++++
 samples/net/unix-sink.c                    |  51 ++++
 38 files changed, 2017 insertions(+), 599 deletions(-)
 create mode 100644 mm/page_frag_alloc.c
 create mode 100644 samples/net/Makefile
 create mode 100644 samples/net/alg-encrypt.c
 create mode 100644 samples/net/alg-hash.c
 create mode 100644 samples/net/splice-out.c
 create mode 100644 samples/net/tcp-send.c
 create mode 100644 samples/net/tcp-sink.c
 create mode 100644 samples/net/tls-send.c
 create mode 100644 samples/net/tls-sink.c
 create mode 100644 samples/net/udp-send.c
 create mode 100644 samples/net/udp-sink.c
 create mode 100644 samples/net/unix-send.c
 create mode 100644 samples/net/unix-sink.c
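
The samples listed in the diffstat are described as driving sendfile() as
well as splice().  As a rough idea of what the sendfile() side of such a
test looks like from userspace, here is a small sketch.  It is not the code
in samples/net/; the function name sendfile_to_unix_socket(), its parameters
and the temporary file path are assumptions made for illustration, and it
uses an AF_UNIX socketpair so that it needs no network setup.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from the patch set): store len bytes of
 * buf in a temporary file, sendfile() the file into one end of an AF_UNIX
 * stream socketpair and read the data back from the other end into out.
 * Returns the number of bytes received, or -1 on error.
 */
ssize_t sendfile_to_unix_socket(const char *buf, size_t len,
				char *out, size_t outlen)
{
	char path[] = "/tmp/sendfile-sketch-XXXXXX";
	int sfd[2], fd;
	off_t off = 0;
	ssize_t got = -1;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);	/* The open descriptor keeps the file alive. */

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sfd) < 0) {
		close(fd);
		return -1;
	}

	if (write(fd, buf, len) != (ssize_t)len)
		goto done;

	/* Send from file offset 0; the kernel picks splice or copy. */
	if (sendfile(sfd[0], fd, &off, len) < 0)
		goto done;

	got = read(sfd[1], out, outlen);
done:
	close(fd);
	close(sfd[0]);
	close(sfd[1]);
	return got;
}
```

Passing an explicit offset means the file position left behind by write()
does not matter; sendfile() reads from the start of the file.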