From patchwork Wed Apr 5 16:53:25 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13202259
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, "David S. Miller", Eric Dumazet, Jakub Kicinski,
 Paolo Abeni, Willem de Bruijn, Matthew Wilcox, Al Viro,
 Christoph Hellwig, Jens Axboe, Jeff Layton, Christian Brauner,
 Chuck Lever III, Linus Torvalds, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH net-next v4 06/20] tcp: Support MSG_SPLICE_PAGES
Date: Wed, 5 Apr 2023 17:53:25 +0100
Message-Id: <20230405165339.3468808-7-dhowells@redhat.com>
In-Reply-To: <20230405165339.3468808-1-dhowells@redhat.com>
References: <20230405165339.3468808-1-dhowells@redhat.com>

Make TCP's sendmsg() support MSG_SPLICE_PAGES.  This causes pages to be
spliced from the source iterator.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells
cc: Eric Dumazet
cc: "David S. Miller"
cc: Jakub Kicinski
cc: Paolo Abeni
cc: Jens Axboe
cc: Matthew Wilcox
cc: netdev@vger.kernel.org
---
 net/ipv4/tcp.c | 67 ++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 60 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index fd68d49490f2..510bacc7ce7b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1221,7 +1221,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 	int flags, err, copied = 0;
 	int mss_now = 0, size_goal, copied_syn = 0;
 	int process_backlog = 0;
-	bool zc = false;
+	int zc = 0;
 	long timeo;

 	flags = msg->msg_flags;
@@ -1232,17 +1232,22 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 		if (msg->msg_ubuf) {
 			uarg = msg->msg_ubuf;
 			net_zcopy_get(uarg);
-			zc = sk->sk_route_caps & NETIF_F_SG;
+			if (sk->sk_route_caps & NETIF_F_SG)
+				zc = 1;
 		} else if (sock_flag(sk, SOCK_ZEROCOPY)) {
 			uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb));
 			if (!uarg) {
 				err = -ENOBUFS;
 				goto out_err;
 			}
-			zc = sk->sk_route_caps & NETIF_F_SG;
-			if (!zc)
+			if (sk->sk_route_caps & NETIF_F_SG)
+				zc = 1;
+			else
 				uarg_to_msgzc(uarg)->zerocopy = 0;
 		}
+	} else if (unlikely(msg->msg_flags & MSG_SPLICE_PAGES) && size) {
+		if (sk->sk_route_caps & NETIF_F_SG)
+			zc = 2;
 	}

 	if (unlikely(flags & MSG_FASTOPEN || inet_sk(sk)->defer_connect) &&
@@ -1305,7 +1310,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 		goto do_error;

 	while (msg_data_left(msg)) {
-		int copy = 0;
+		ssize_t copy = 0;

 		skb = tcp_write_queue_tail(sk);
 		if (skb)
@@ -1346,7 +1351,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 		if (copy > msg_data_left(msg))
 			copy = msg_data_left(msg);

-		if (!zc) {
+		if (zc == 0) {
 			bool merge = true;
 			int i = skb_shinfo(skb)->nr_frags;
 			struct page_frag *pfrag = sk_page_frag(sk);
@@ -1391,7 +1396,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 				page_ref_inc(pfrag->page);
 			}
 			pfrag->offset += copy;
-		} else {
+		} else if (zc == 1) {
 			/* First append to a fragless skb builds initial
 			 * pure zerocopy skb
 			 */
@@ -1412,6 +1417,54 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 			if (err < 0)
 				goto do_error;
 			copy = err;
+		} else if (zc == 2) {
+			/* Splice in data. */
+			struct page *page = NULL, **pages = &page;
+			size_t off = 0, part;
+			bool can_coalesce;
+			int i = skb_shinfo(skb)->nr_frags;
+
+			copy = iov_iter_extract_pages(&msg->msg_iter, &pages,
+						      copy, 1, 0, &off);
+			if (copy <= 0) {
+				err = copy ?: -EIO;
+				goto do_error;
+			}
+
+			can_coalesce = skb_can_coalesce(skb, i, page, off);
+			if (!can_coalesce && i >= READ_ONCE(sysctl_max_skb_frags)) {
+				tcp_mark_push(tp, skb);
+				iov_iter_revert(&msg->msg_iter, copy);
+				goto new_segment;
+			}
+			if (tcp_downgrade_zcopy_pure(sk, skb)) {
+				iov_iter_revert(&msg->msg_iter, copy);
+				goto wait_for_space;
+			}
+
+			part = tcp_wmem_schedule(sk, copy);
+			iov_iter_revert(&msg->msg_iter, copy - part);
+			if (!part)
+				goto wait_for_space;
+			copy = part;
+
+			if (can_coalesce) {
+				skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
+			} else {
+				get_page(page);
+				skb_fill_page_desc_noacc(skb, i, page, off, copy);
+			}
+			page = NULL;
+
+			if (!(flags & MSG_NO_SHARED_FRAGS))
+				skb_shinfo(skb)->flags |= SKBFL_SHARED_FRAG;
+
+			skb->len += copy;
+			skb->data_len += copy;
+			skb->truesize += copy;
+			sk_wmem_queued_add(sk, copy);
+			sk_mem_charge(sk, copy);
+
 		}

 		if (!copied)