From patchwork Tue Nov 30 15:18:49 2021
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn,
    Eric Dumazet, Hideaki YOSHIFUJI, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC 01/12] skbuff: add SKBFL_DONT_ORPHAN flag
Date: Tue, 30 Nov 2021 15:18:49 +0000
Message-Id: <079685f334f479f70dfed5ab98e06fbfaf81ee3b.1638282789.git.asml.silence@gmail.com>

We don't want to list every single ubuf_info callback in
skb_orphan_frags(), so add a flag controlling the behaviour.

Signed-off-by: Pavel Begunkov
---
 include/linux/skbuff.h | 5 +++--
 net/core/skbuff.c      | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c8cb7e697d47..750b7518d6e2 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -459,6 +459,8 @@ enum {
	 * charged to the kernel memory.
	 */
	SKBFL_PURE_ZEROCOPY = BIT(2),
+
+	SKBFL_DONT_ORPHAN = BIT(3),
 };

 #define SKBFL_ZEROCOPY_FRAG	(SKBFL_ZEROCOPY_ENABLE | SKBFL_SHARED_FRAG)
@@ -2839,8 +2841,7 @@ static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
 {
	if (likely(!skb_zcopy(skb)))
		return 0;
-	if (!skb_zcopy_is_nouarg(skb) &&
-	    skb_uarg(skb)->callback == msg_zerocopy_callback)
+	if (skb_shinfo(skb)->flags & SKBFL_DONT_ORPHAN)
		return 0;
	return skb_copy_ubufs(skb, gfp_mask);
 }
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index ba2f38246f07..b23db60ea6f9 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1191,7 +1191,7 @@ struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size)
	uarg->len = 1;
	uarg->bytelen = size;
	uarg->zerocopy = 1;
-	uarg->flags = SKBFL_ZEROCOPY_FRAG;
+	uarg->flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
	refcount_set(&uarg->refcnt, 1);
	sock_hold(sk);
From patchwork Tue Nov 30 15:18:50 2021
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn,
    Eric Dumazet, Hideaki YOSHIFUJI, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC 02/12] skbuff: pass a struct ubuf_info in msghdr
Date: Tue, 30 Nov 2021 15:18:50 +0000
Message-Id: <07b21df727d9a3987d836e271946e73a629f4601.1638282789.git.asml.silence@gmail.com>

Instead of the net stack managing ubuf_info, allow it to be passed in
from the outside in a struct msghdr (an in-kernel structure), so
io_uring can make use of it.

Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c          | 2 ++
 include/linux/socket.h | 1 +
 net/compat.c           | 1 +
 net/socket.c           | 3 +++
 4 files changed, 7 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 72da3a75521a..59380e3454ad 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4911,6 +4911,7 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
	msg.msg_control = NULL;
	msg.msg_controllen = 0;
	msg.msg_namelen = 0;
+	msg.msg_ubuf = NULL;

	flags = req->sr_msg.msg_flags;
	if (issue_flags & IO_URING_F_NONBLOCK)
@@ -5157,6 +5158,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
	msg.msg_namelen = 0;
	msg.msg_iocb = NULL;
	msg.msg_flags = 0;
+	msg.msg_ubuf = NULL;

	flags = req->sr_msg.msg_flags;
	if (force_nonblock)
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 8ef26d89ef49..6bd2c6b0c6f2 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -65,6 +65,7 @@ struct msghdr {
	__kernel_size_t	msg_controllen;	/* ancillary data buffer length */
	unsigned int	msg_flags;	/* flags on received message */
	struct kiocb	*msg_iocb;	/* ptr to iocb for async requests */
+	struct ubuf_info *msg_ubuf;
 };

 struct user_msghdr {
diff --git a/net/compat.c b/net/compat.c
index 210fc3b4d0d8..6cd2e7683dd0 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -80,6 +80,7 @@ int __get_compat_msghdr(struct msghdr *kmsg,
		return -EMSGSIZE;

	kmsg->msg_iocb = NULL;
+	kmsg->msg_ubuf = NULL;
	*ptr = msg.msg_iov;
	*len = msg.msg_iovlen;
	return 0;
diff --git a/net/socket.c b/net/socket.c
index 7f64a6eccf63..0a29b616a38c 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2023,6 +2023,7 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
	msg.msg_control = NULL;
	msg.msg_controllen = 0;
	msg.msg_namelen = 0;
+	msg.msg_ubuf = NULL;
	if (addr) {
		err = move_addr_to_kernel(addr, addr_len, &address);
		if (err < 0)
@@ -2088,6 +2089,7 @@ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags,
	msg.msg_namelen = 0;
	msg.msg_iocb = NULL;
	msg.msg_flags = 0;
+	msg.msg_ubuf = NULL;
	if (sock->file->f_flags & O_NONBLOCK)
		flags |= MSG_DONTWAIT;
	err = sock_recvmsg(sock, &msg, flags);
@@ -2326,6 +2328,7 @@ int __copy_msghdr_from_user(struct msghdr *kmsg,
		return -EMSGSIZE;

	kmsg->msg_iocb = NULL;
+	kmsg->msg_ubuf = NULL;
	*uiov = msg.msg_iov;
	*nsegs = msg.msg_iovlen;
	return 0;
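For illustration (editorial sketch, not part of the patch): an in-kernel
caller that owns a ubuf_info, as io_uring will in the later patches, can now
attach it to a send instead of having the stack allocate one.

#include <linux/net.h>
#include <linux/socket.h>

/* sketch: zerocopy send of a caller-built iterator with a caller-owned
 * ubuf_info attached via the new msghdr field */
static int my_send_zc(struct socket *sock, struct iov_iter *iter,
		      struct ubuf_info *uarg)
{
	struct msghdr msg = {};

	msg.msg_iter = *iter;
	msg.msg_ubuf = uarg;
	return sock_sendmsg(sock, &msg);
}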
From patchwork Tue Nov 30 15:18:51 2021
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn,
    Eric Dumazet, Hideaki YOSHIFUJI, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC 03/12] net/udp: add support for msghdr::msg_ubuf
Date: Tue, 30 Nov 2021 15:18:51 +0000
Message-Id: <26e2222a6f3316d218a3df0ca668dcd65536c1ba.1638282789.git.asml.silence@gmail.com>

Make ipv4/udp use the ubuf_info passed in struct msghdr, if it was
specified.

Signed-off-by: Pavel Begunkov
---
 include/net/ip.h     |  3 ++-
 net/ipv4/ip_output.c | 31 ++++++++++++++++++++++++-------
 net/ipv4/udp.c       |  2 +-
 3 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index b71e88507c4a..e9c61b83a770 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -232,7 +232,8 @@ struct sk_buff *ip_make_skb(struct sock *sk, struct flowi4 *fl4,
			    int len, int odd, struct sk_buff *skb),
			    void *from, int length, int transhdrlen,
			    struct ipcm_cookie *ipc, struct rtable **rtp,
-			    struct inet_cork *cork, unsigned int flags);
+			    struct inet_cork *cork, unsigned int flags,
+			    struct ubuf_info *uarg);

 int ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 9bca57ef8b83..f9aab355d283 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -948,10 +948,10 @@ static int __ip_append_data(struct sock *sk,
			    int getfrag(void *from, char *to, int offset,
					int len, int odd, struct sk_buff *skb),
			    void *from, int length, int transhdrlen,
-			    unsigned int flags)
+			    unsigned int flags,
+			    struct ubuf_info *uarg)
 {
	struct inet_sock *inet = inet_sk(sk);
-	struct ubuf_info *uarg = NULL;
	struct sk_buff *skb;
	struct ip_options *opt = cork->opt;
@@ -967,6 +967,7 @@ static int __ip_append_data(struct sock *sk,
	unsigned int wmem_alloc_delta = 0;
	bool paged, extra_uref = false;
	u32 tskey = 0;
+	bool zc = false;

	skb = skb_peek_tail(queue);
@@ -1001,7 +1002,21 @@ static int __ip_append_data(struct sock *sk,
	    (!exthdrlen || (rt->dst.dev->features & NETIF_F_HW_ESP_TX_CSUM)))
		csummode = CHECKSUM_PARTIAL;

-	if (flags & MSG_ZEROCOPY && length && sock_flag(sk, SOCK_ZEROCOPY)) {
+	if (uarg) {
+		if (skb_zcopy(skb) && uarg != skb_zcopy(skb))
+			return -EINVAL;
+
+		/* If it's not zerocopy, just drop uarg, the caller should
+		 * be able to handle it.
+		 */
+		if (rt->dst.dev->features & NETIF_F_SG &&
+		    csummode == CHECKSUM_PARTIAL) {
+			paged = true;
+			zc = true;
+		} else {
+			uarg = NULL;
+		}
+	} else if (flags & MSG_ZEROCOPY && length && sock_flag(sk, SOCK_ZEROCOPY)) {
		uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
		if (!uarg)
			return -ENOBUFS;
@@ -1009,6 +1024,7 @@ static int __ip_append_data(struct sock *sk,
		if (rt->dst.dev->features & NETIF_F_SG &&
		    csummode == CHECKSUM_PARTIAL) {
			paged = true;
+			zc = true;
		} else {
			uarg->zerocopy = 0;
			skb_zcopy_set(skb, uarg, &extra_uref);
@@ -1172,7 +1188,7 @@ static int __ip_append_data(struct sock *sk,
				err = -EFAULT;
				goto error;
			}
-		} else if (!uarg || !uarg->zerocopy) {
+		} else if (!zc) {
			int i = skb_shinfo(skb)->nr_frags;

			err = -ENOMEM;
@@ -1309,7 +1325,7 @@ int ip_append_data(struct sock *sk, struct flowi4 *fl4,

	return __ip_append_data(sk, fl4, &sk->sk_write_queue, &inet->cork.base,
				sk_page_frag(sk), getfrag,
-				from, length, transhdrlen, flags);
+				from, length, transhdrlen, flags, NULL);
 }

 ssize_t	ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page,
@@ -1601,7 +1617,8 @@ struct sk_buff *ip_make_skb(struct sock *sk,
			    int len, int odd, struct sk_buff *skb),
			    void *from, int length, int transhdrlen,
			    struct ipcm_cookie *ipc, struct rtable **rtp,
-			    struct inet_cork *cork, unsigned int flags)
+			    struct inet_cork *cork, unsigned int flags,
+			    struct ubuf_info *uarg)
 {
	struct sk_buff_head queue;
	int err;
@@ -1620,7 +1637,7 @@ struct sk_buff *ip_make_skb(struct sock *sk,

	err = __ip_append_data(sk, fl4, &queue, cork,
			       &current->task_frag, getfrag,
-			       from, length, transhdrlen, flags);
+			       from, length, transhdrlen, flags, uarg);
	if (err) {
		__ip_flush_pending_frames(sk, &queue, cork);
		return ERR_PTR(err);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 8bcecdd6aeda..8c514bff48d4 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1247,7 +1247,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)

		skb = ip_make_skb(sk, fl4, getfrag, msg, ulen,
				  sizeof(struct udphdr), &ipc, &rt,
-				  &cork, msg->msg_flags);
+				  &cork, msg->msg_flags, msg->msg_ubuf);
		err = PTR_ERR(skb);
		if (!IS_ERR_OR_NULL(skb))
			err = udp_send_skb(skb, fl4, &cork);

From patchwork Tue Nov 30 15:18:52 2021
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn,
    Eric Dumazet, Hideaki YOSHIFUJI, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC 04/12] net: add zerocopy_sg_from_iter for bvec
Date: Tue, 30 Nov 2021 15:18:52 +0000
Message-Id: <0ee5fc538d3badecb15d7e33fd8e204328d54776.1638282789.git.asml.silence@gmail.com>

Add a separate path for bvec iterators in __zerocopy_sg_from_iter().
First, it's noticeably faster; it will also be needed to optimise out
the get/put_page() calls.

Signed-off-by: Pavel Begunkov
---
 net/core/datagram.c | 54 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index ee290776c661..e00f7e0a7a0a 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -616,11 +616,65 @@ int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset,
 }
 EXPORT_SYMBOL(skb_copy_datagram_from_iter);

+static int __zerocopy_sg_from_bvec(struct sock *sk, struct sk_buff *skb,
+				   struct iov_iter *from, size_t length)
+{
+	int ret, frag = skb_shinfo(skb)->nr_frags;
+	struct bvec_iter bi;
+	struct bio_vec v;
+	ssize_t copied = 0;
+	unsigned long truesize = 0;
+
+	bi.bi_size = min(from->count, length);
+	bi.bi_bvec_done = from->iov_offset;
+	bi.bi_idx = 0;
+
+	while (bi.bi_size) {
+		if (frag == MAX_SKB_FRAGS) {
+			ret = -EMSGSIZE;
+			goto out;
+		}
+
+		/*
+		 * TODO: ignore compound pages for now, all bvec from io_uring
+		 * are within boundaries of a single page.
+		 */
+		v = mp_bvec_iter_bvec(from->bvec, bi);
+		copied += v.bv_len;
+		truesize += PAGE_ALIGN(v.bv_len + v.bv_offset);
+		get_page(v.bv_page);
+		skb_fill_page_desc(skb, frag++, v.bv_page, v.bv_offset, v.bv_len);
+		bvec_iter_advance_single(from->bvec, &bi, v.bv_len);
+	}
+	ret = 0;
+out:
+	skb->data_len += copied;
+	skb->len += copied;
+	skb->truesize += truesize;
+
+	if (sk && sk->sk_type == SOCK_STREAM) {
+		sk_wmem_queued_add(sk, truesize);
+		if (!skb_zcopy_pure(skb))
+			sk_mem_charge(sk, truesize);
+	} else {
+		refcount_add(truesize, &skb->sk->sk_wmem_alloc);
+	}
+
+	from->bvec += bi.bi_idx;
+	from->nr_segs -= bi.bi_idx;
+	from->count = bi.bi_size;
+	from->iov_offset = bi.bi_bvec_done;
+	return ret;
+}
+
 int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb,
			    struct iov_iter *from, size_t length)
 {
	int frag = skb_shinfo(skb)->nr_frags;

+	if (iov_iter_is_bvec(from))
+		return __zerocopy_sg_from_bvec(sk, skb, from, length);
+
	while (length && iov_iter_count(from)) {
		struct page *pages[MAX_SKB_FRAGS];
		struct page *last_head = NULL;
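For illustration (editorial sketch, not part of the patch): the new path is
taken only for ITER_BVEC iterators. A caller would build one with the
existing iov_iter_bvec() helper, e.g. for a single page; the names here are
illustrative.

#include <linux/bvec.h>
#include <linux/uio.h>

/* sketch: a one-segment bvec iterator that would go through
 * __zerocopy_sg_from_bvec() instead of the iovec slow path */
static void my_make_bvec_iter(struct iov_iter *iter, struct bio_vec *bv,
			      struct page *page, unsigned int len)
{
	bv[0].bv_page = page;
	bv[0].bv_offset = 0;
	bv[0].bv_len = len;
	iov_iter_bvec(iter, WRITE, bv, 1, len);
}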
From patchwork Tue Nov 30 15:18:53 2021
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn,
    Eric Dumazet, Hideaki YOSHIFUJI, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC 05/12] net: optimise page get/free for bvec zc
Date: Tue, 30 Nov 2021 15:18:53 +0000
Message-Id: <72608c13553a1372e7f6f7a32eb53d5d4b23a1fc.1638282789.git.asml.silence@gmail.com>

get_page() in __zerocopy_sg_from_bvec() and the matching put_page()s are
expensive. However, we can avoid them if the caller can guarantee that
the pages stay alive until the corresponding ubuf_info is released. In
particular, this targets io_uring with fixed buffers, which follows the
described contract.

Assuming that nobody else yet uses bvec together with zerocopy, make all
calls with bvec iterators follow this model.

Signed-off-by: Pavel Begunkov
---
 include/linux/skbuff.h | 10 +++++++++-
 net/core/datagram.c    |  9 +++++++--
 net/core/skbuff.c      | 16 +++++++++++++---
 net/ipv4/ip_output.c   |  4 ++++
 4 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 750b7518d6e2..ebb12a7d386d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -461,6 +461,11 @@ enum {
	SKBFL_PURE_ZEROCOPY = BIT(2),

	SKBFL_DONT_ORPHAN = BIT(3),
+
+	/* page references are managed by the ubuf_info, so it's safe to
+	 * use frags only up until ubuf_info is released
+	 */
+	SKBFL_MANAGED_FRAGS = BIT(4),
 };

 #define SKBFL_ZEROCOPY_FRAG	(SKBFL_ZEROCOPY_ENABLE | SKBFL_SHARED_FRAG)
@@ -3154,7 +3159,10 @@ static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)
  */
 static inline void skb_frag_unref(struct sk_buff *skb, int f)
 {
-	__skb_frag_unref(&skb_shinfo(skb)->frags[f], skb->pp_recycle);
+	struct skb_shared_info *shinfo = skb_shinfo(skb);
+
+	if (!(shinfo->flags & SKBFL_MANAGED_FRAGS))
+		__skb_frag_unref(&shinfo->frags[f], skb->pp_recycle);
 }

 /**
diff --git a/net/core/datagram.c b/net/core/datagram.c
index e00f7e0a7a0a..5cf0672039d6 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -642,7 +642,6 @@ static int __zerocopy_sg_from_bvec(struct sock *sk, struct sk_buff *skb,
		v = mp_bvec_iter_bvec(from->bvec, bi);
		copied += v.bv_len;
		truesize += PAGE_ALIGN(v.bv_len + v.bv_offset);
-		get_page(v.bv_page);
		skb_fill_page_desc(skb, frag++, v.bv_page, v.bv_offset, v.bv_len);
		bvec_iter_advance_single(from->bvec, &bi, v.bv_len);
	}
@@ -671,9 +670,15 @@ int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb,
			    struct iov_iter *from, size_t length)
 {
	int frag = skb_shinfo(skb)->nr_frags;
+	bool managed = skb_shinfo(skb)->flags & SKBFL_MANAGED_FRAGS;

-	if (iov_iter_is_bvec(from))
+	if (iov_iter_is_bvec(from) && (managed || frag == 0)) {
+		skb_shinfo(skb)->flags |= SKBFL_MANAGED_FRAGS;
		return __zerocopy_sg_from_bvec(sk, skb, from, length);
+	}
+
+	if (managed)
+		return -EFAULT;

	while (length && iov_iter_count(from)) {
		struct page *pages[MAX_SKB_FRAGS];
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b23db60ea6f9..b7b087815539 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -666,10 +666,14 @@ static void skb_release_data(struct sk_buff *skb)
			      &shinfo->dataref))
		goto exit;

-	skb_zcopy_clear(skb, true);
+	if (!(shinfo->flags & SKBFL_MANAGED_FRAGS)) {
+		for (i = 0; i < shinfo->nr_frags; i++)
+			__skb_frag_unref(&shinfo->frags[i], skb->pp_recycle);
+	} else {
+		shinfo->flags &= ~SKBFL_MANAGED_FRAGS;
+	}

-	for (i = 0; i < shinfo->nr_frags; i++)
-		__skb_frag_unref(&shinfo->frags[i], skb->pp_recycle);
+	skb_zcopy_clear(skb, true);

	if (shinfo->frag_list)
		kfree_skb_list(shinfo->frag_list);
@@ -1471,6 +1475,7 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
	/* skb frags release userspace buffers */
	for (i = 0; i < num_frags; i++)
		skb_frag_unref(skb, i);
+	skb_shinfo(skb)->flags &= ~SKBFL_MANAGED_FRAGS;

	/* skb frags point to kernel buffers */
	for (i = 0; i < new_frags - 1; i++) {
@@ -1597,6 +1602,7 @@ struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t gfp_mask)
	BUG_ON(skb_copy_bits(skb, -headerlen, n->head, headerlen + skb->len));

	skb_copy_header(n, skb);
+	skb_shinfo(n)->flags &= ~SKBFL_MANAGED_FRAGS;
	return n;
 }
 EXPORT_SYMBOL(skb_copy);
@@ -1653,6 +1659,7 @@ struct sk_buff *__pskb_copy_fclone(struct sk_buff *skb, int headroom,
			skb_frag_ref(skb, i);
		}
		skb_shinfo(n)->nr_frags = i;
+		skb_shinfo(n)->flags &= ~SKBFL_MANAGED_FRAGS;
	}

	if (skb_has_frag_list(skb)) {
@@ -1725,6 +1732,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
		refcount_inc(&skb_uarg(skb)->refcnt);
	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
		skb_frag_ref(skb, i);
+	skb_shinfo(skb)->flags &= ~SKBFL_MANAGED_FRAGS;

	if (skb_has_frag_list(skb))
		skb_clone_fraglist(skb);
@@ -3788,6 +3796,8 @@ int skb_append_pagefrags(struct sk_buff *skb, struct page *page,
	if (skb_can_coalesce(skb, i, page, offset)) {
		skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], size);
	} else if (i < MAX_SKB_FRAGS) {
+		if (skb_shinfo(skb)->flags & SKBFL_MANAGED_FRAGS)
+			return -EMSGSIZE;
		get_page(page);
		skb_fill_page_desc(skb, i, page, offset, size);
	} else {
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index f9aab355d283..e6adf96e5530 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1194,6 +1194,10 @@ static int __ip_append_data(struct sock *sk,
			err = -ENOMEM;
			if (!sk_page_frag_refill(sk, pfrag))
				goto error;
+			if (skb_shinfo(skb)->flags & SKBFL_MANAGED_FRAGS) {
+				err = -EMSGSIZE;
+				goto error;
+			}

			if (!skb_can_coalesce(skb, i, pfrag->page,
					      pfrag->offset)) {
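For illustration (editorial sketch, not part of the patch): the contract this
relies on is that, with SKBFL_MANAGED_FRAGS set, the stack takes no page
references, so the ubuf_info owner must keep the pages alive until its
callback has run for the last reference. struct my_tx and its fields are
assumed names.

#include <linux/mm.h>
#include <linux/skbuff.h>

struct my_tx {
	struct ubuf_info uarg;
	struct page **pages;		/* pinned by the submitter */
	unsigned long nr_pages;
};

/* sketch: the completion hook is the first point where the pages backing
 * the bvec may safely be unpinned or reused */
static void my_zc_callback(struct sk_buff *skb, struct ubuf_info *uarg,
			   bool success)
{
	struct my_tx *tx = container_of(uarg, struct my_tx, uarg);

	if (!refcount_dec_and_test(&uarg->refcnt))
		return;
	unpin_user_pages(tx->pages, tx->nr_pages);
}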
From patchwork Tue Nov 30 15:18:54 2021
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn,
    Eric Dumazet, Hideaki YOSHIFUJI, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC 06/12] io_uring: add send notifiers registration
Date: Tue, 30 Nov 2021 15:18:54 +0000
Message-Id: <1585ab7d7c7c987450f733b5773bc1ca1f673fce.1638282789.git.asml.silence@gmail.com>

Add IORING_REGISTER_TX_CTX and IORING_UNREGISTER_TX_CTX. A transmission
(i.e. send) context will be used to notify the userspace when fixed
buffers used for zerocopy sends are released by the kernel.

Notifications of a single tx context live in generations, where each
generation posts one CQE with ->user_data equal to the specified tag and
->res set to the generation number, starting from 0. All requests issued
against a ctx get attached to the current generation of notifications.
The userspace will then be able to request a flush of the notification,
posting a CQE once all buffers of all requests attached to that
generation are released by the kernel. Flushing also switches the
generation to a new one, with the sequence number incremented by one.

Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c                 | 72 +++++++++++++++++++++++++++++++++++
 include/uapi/linux/io_uring.h |  7 ++++
 2 files changed, 79 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 59380e3454ad..a01f91e70fa5 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -94,6 +94,8 @@
 #define IORING_MAX_CQ_ENTRIES	(2 * IORING_MAX_ENTRIES)
 #define IORING_SQPOLL_CAP_ENTRIES_VALUE 8

+#define IORING_MAX_TX_NOTIFIERS	(1U << 10)
+
 /* only define max */
 #define IORING_MAX_FIXED_FILES	(1U << 15)
 #define IORING_MAX_RESTRICTIONS	(IORING_RESTRICTION_LAST + \
@@ -326,6 +328,15 @@ struct io_submit_state {
	struct blk_plug		plug;
 };

+struct io_tx_notifier {
+};
+
+struct io_tx_ctx {
+	struct io_tx_notifier	*notifier;
+	u64			tag;
+	u32			seq;
+};
+
 struct io_ring_ctx {
	/* const or read-mostly hot data */
	struct {
@@ -373,6 +384,8 @@ struct io_ring_ctx {
		unsigned		nr_user_files;
		unsigned		nr_user_bufs;
		struct io_mapped_ubuf	**user_bufs;
+		struct io_tx_ctx	*tx_ctxs;
+		unsigned		nr_tx_ctxs;

		struct io_submit_state	submit_state;
		struct list_head	timeout_list;
@@ -9199,6 +9212,55 @@ static int io_buffer_validate(struct iovec *iov)
	return 0;
 }

+static int io_sqe_tx_ctx_unregister(struct io_ring_ctx *ctx)
+{
+	if (!ctx->nr_tx_ctxs)
+		return -ENXIO;
+
+	kvfree(ctx->tx_ctxs);
+	ctx->tx_ctxs = NULL;
+	ctx->nr_tx_ctxs = 0;
+	return 0;
+}
+
+static int io_sqe_tx_ctx_register(struct io_ring_ctx *ctx,
+				  void __user *arg, unsigned int nr_args)
+{
+	struct io_uring_tx_ctx_register __user *tx_args = arg;
+	struct io_uring_tx_ctx_register tx_arg;
+	unsigned i;
+	int ret;
+
+	if (ctx->nr_tx_ctxs)
+		return -EBUSY;
+	if (!nr_args)
+		return -EINVAL;
+	if (nr_args > IORING_MAX_TX_NOTIFIERS)
+		return -EMFILE;
+
+	ctx->tx_ctxs = kvcalloc(nr_args, sizeof(ctx->tx_ctxs[0]),
+				GFP_KERNEL_ACCOUNT);
+	if (!ctx->tx_ctxs)
+		return -ENOMEM;
+
+	for (i = 0; i < nr_args; i++, ctx->nr_tx_ctxs++) {
+		struct io_tx_ctx *tx_ctx = &ctx->tx_ctxs[i];
+
+		if (copy_from_user(&tx_arg, &tx_args[i], sizeof(tx_arg))) {
+			ret = -EFAULT;
+			goto out_fput;
+		}
+		tx_ctx->tag = tx_arg.tag;
+	}
+	return 0;
+
+out_fput:
+	kvfree(ctx->tx_ctxs);
+	ctx->tx_ctxs = NULL;
+	ctx->nr_tx_ctxs = 0;
+	return ret;
+}
+
 static int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
				   unsigned int nr_args, u64 __user *tags)
 {
@@ -9429,6 +9491,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 #endif
	WARN_ON_ONCE(!list_empty(&ctx->ltimeout_list));

+	io_sqe_tx_ctx_unregister(ctx);
	io_mem_free(ctx->rings);
	io_mem_free(ctx->sq_sqes);

@@ -11104,6 +11167,15 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
			break;
		ret = io_register_iowq_max_workers(ctx, arg);
		break;
+	case IORING_REGISTER_TX_CTX:
+		ret = io_sqe_tx_ctx_register(ctx, arg, nr_args);
+		break;
+	case IORING_UNREGISTER_TX_CTX:
+		ret = -EINVAL;
+		if (arg || nr_args)
+			break;
+		ret = io_sqe_tx_ctx_unregister(ctx);
+		break;
	default:
		ret = -EINVAL;
		break;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 787f491f0d2a..f2e8d18e40e0 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -325,6 +325,9 @@ enum {
	/* set/get max number of io-wq workers */
	IORING_REGISTER_IOWQ_MAX_WORKERS	= 19,

+	IORING_REGISTER_TX_CTX			= 20,
+	IORING_UNREGISTER_TX_CTX		= 21,
+
	/* this goes last */
	IORING_REGISTER_LAST
 };
@@ -365,6 +368,10 @@ struct io_uring_rsrc_update2 {
	__u32 resv2;
 };

+struct io_uring_tx_ctx_register {
+	__u64 tag;
+};
+
 /* Skip updating fd indexes set to this value in the fd table */
 #define IORING_REGISTER_FILES_SKIP	(-2)
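For illustration (editorial sketch, not part of the patch): registering tx
contexts from userspace would go through the raw io_uring_register()
syscall, since liburing has no helper for this RFC interface; the tags are
arbitrary user-chosen values that come back in notification CQEs.

#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/io_uring.h>	/* patched uapi header from this series */

/* sketch: register two tx contexts; each notification generation is later
 * posted as a CQE with ->user_data == tag and ->res == generation number */
static int register_tx_ctxs(int ring_fd)
{
	struct io_uring_tx_ctx_register regs[2];

	memset(regs, 0, sizeof(regs));
	regs[0].tag = 0xcafe;
	regs[1].tag = 0xbeef;
	return syscall(__NR_io_uring_register, ring_fd,
		       IORING_REGISTER_TX_CTX, regs, 2);
}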
Miller" , Willem de Bruijn , Eric Dumazet , Hideaki YOSHIFUJI , David Ahern , Jens Axboe , Pavel Begunkov Subject: [RFC 07/12] io_uring: infrastructure for send zc notifications Date: Tue, 30 Nov 2021 15:18:55 +0000 Message-Id: <5c2b751d6c29c02f1d0a3b0e0b220de321bc3e2d.1638282789.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.34.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-State: RFC Add a new ubuf_info callback io_uring_tx_zerocopy_callback(), which should post an CQE when it completes. Also, implement some infrastructuire for allocating and managing struct ubuf_info. Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 108 insertions(+), 6 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index a01f91e70fa5..6ca02e60fa48 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -329,6 +329,11 @@ struct io_submit_state { }; struct io_tx_notifier { + struct ubuf_info uarg; + struct work_struct commit_work; + struct percpu_ref *fixed_rsrc_refs; + u64 tag; + u32 seq; }; struct io_tx_ctx { @@ -1275,15 +1280,20 @@ static void io_rsrc_refs_refill(struct io_ring_ctx *ctx) percpu_ref_get_many(&ctx->rsrc_node->refs, IO_RSRC_REF_BATCH); } +static inline void io_set_rsrc_node(struct percpu_ref **rsrc_refs, + struct io_ring_ctx *ctx) +{ + *rsrc_refs = &ctx->rsrc_node->refs; + ctx->rsrc_cached_refs--; + if (unlikely(ctx->rsrc_cached_refs < 0)) + io_rsrc_refs_refill(ctx); +} + static inline void io_req_set_rsrc_node(struct io_kiocb *req, struct io_ring_ctx *ctx) { - if (!req->fixed_rsrc_refs) { - req->fixed_rsrc_refs = &ctx->rsrc_node->refs; - ctx->rsrc_cached_refs--; - if (unlikely(ctx->rsrc_cached_refs < 0)) - io_rsrc_refs_refill(ctx); - } + if (!req->fixed_rsrc_refs) + io_set_rsrc_node(&req->fixed_rsrc_refs, ctx); } static void io_refs_resurrect(struct percpu_ref *ref, struct completion *compl) @@ -1930,6 +1940,76 @@ static noinline bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, return __io_fill_cqe(ctx, user_data, res, cflags); } +static void io_zc_tx_work_callback(struct work_struct *work) +{ + struct io_tx_notifier *notifier = container_of(work, struct io_tx_notifier, + commit_work); + struct io_ring_ctx *ctx = notifier->uarg.ctx; + + spin_lock(&ctx->completion_lock); + io_fill_cqe_aux(ctx, notifier->tag, notifier->seq, 0); + io_commit_cqring(ctx); + spin_unlock(&ctx->completion_lock); + io_cqring_ev_posted(ctx); + + percpu_ref_put(notifier->fixed_rsrc_refs); + percpu_ref_put(&ctx->refs); + kfree(notifier); +} + +static void io_uring_tx_zerocopy_callback(struct sk_buff *skb, + struct ubuf_info *uarg, + bool success) +{ + struct io_tx_notifier *notifier; + + notifier = container_of(uarg, struct io_tx_notifier, uarg); + if (!refcount_dec_and_test(&uarg->refcnt)) + return; + + if (in_interrupt()) { + INIT_WORK(¬ifier->commit_work, io_zc_tx_work_callback); + queue_work(system_unbound_wq, ¬ifier->commit_work); + } else { + io_zc_tx_work_callback(¬ifier->commit_work); + } +} + +static struct io_tx_notifier *io_alloc_tx_notifier(struct io_ring_ctx *ctx, + struct io_tx_ctx *tx_ctx) +{ + struct io_tx_notifier *notifier; + struct ubuf_info *uarg; + + notifier = kmalloc(sizeof(*notifier), GFP_ATOMIC); + if (!notifier) + return NULL; + + WARN_ON_ONCE(!current->io_uring); + notifier->seq = tx_ctx->seq++; + notifier->tag = tx_ctx->tag; + io_set_rsrc_node(¬ifier->fixed_rsrc_refs, ctx); + + uarg = ¬ifier->uarg; + uarg->ctx = ctx; + uarg->flags = 
SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN; + uarg->callback = io_uring_tx_zerocopy_callback; + refcount_set(&uarg->refcnt, 1); + percpu_ref_get(&ctx->refs); + return notifier; +} + +__attribute__((unused)) +static inline struct io_tx_notifier *io_get_tx_notifier(struct io_ring_ctx *ctx, + struct io_tx_ctx *tx_ctx) +{ + if (tx_ctx->notifier) + return tx_ctx->notifier; + + tx_ctx->notifier = io_alloc_tx_notifier(ctx, tx_ctx); + return tx_ctx->notifier; +} + static void io_req_complete_post(struct io_kiocb *req, s32 res, u32 cflags) { @@ -9212,11 +9292,27 @@ static int io_buffer_validate(struct iovec *iov) return 0; } +static void io_sqe_tx_ctx_kill_ubufs(struct io_ring_ctx *ctx) +{ + struct io_tx_ctx *tx_ctx; + int i; + + for (i = 0; i < ctx->nr_tx_ctxs; i++) { + tx_ctx = &ctx->tx_ctxs[i]; + if (!tx_ctx->notifier) + continue; + io_uring_tx_zerocopy_callback(NULL, &tx_ctx->notifier->uarg, + true); + tx_ctx->notifier = NULL; + } +} + static int io_sqe_tx_ctx_unregister(struct io_ring_ctx *ctx) { if (!ctx->nr_tx_ctxs) return -ENXIO; + io_sqe_tx_ctx_kill_ubufs(ctx); kvfree(ctx->tx_ctxs); ctx->tx_ctxs = NULL; ctx->nr_tx_ctxs = 0; @@ -9608,6 +9704,12 @@ static __cold void io_ring_exit_work(struct work_struct *work) io_sq_thread_unpark(sqd); } + if (READ_ONCE(ctx->nr_tx_ctxs)) { + mutex_lock(&ctx->uring_lock); + io_sqe_tx_ctx_kill_ubufs(ctx); + mutex_unlock(&ctx->uring_lock); + } + io_req_caches_free(ctx); if (WARN_ON_ONCE(time_after(jiffies, timeout))) { From patchwork Tue Nov 30 15:18:56 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12647579 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 18A1CC4321E for ; Tue, 30 Nov 2021 15:20:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245320AbhK3PYA (ORCPT ); Tue, 30 Nov 2021 10:24:00 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38468 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245087AbhK3PX2 (ORCPT ); Tue, 30 Nov 2021 10:23:28 -0500 Received: from mail-wm1-x32f.google.com (mail-wm1-x32f.google.com [IPv6:2a00:1450:4864:20::32f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 461B4C061D66; Tue, 30 Nov 2021 07:19:29 -0800 (PST) Received: by mail-wm1-x32f.google.com with SMTP id p3-20020a05600c1d8300b003334fab53afso19808690wms.3; Tue, 30 Nov 2021 07:19:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=AHZdLicu5JTswxvmP1cJP+tnU4fdIGy/YGmaAwYSNng=; b=JxJxAuiEXFOoi58byq6joGDTGf2JhnIfmxBFzRjTAYUXuWYv9rPisOm8L5vXXghWas egruRQt39gsr8Nns+Y0X1ie2tdAEhxNrCPcB961bAjhCSmVdMZg1jzGXllKjT4aMQusL dKz5WAVhequnihKy5xvcDRRcrP2A8ZX+QVJzBqNIHHzjiV0aVybbwe0ZutCk/3QawTHv pRM7E7GCLNl6NUW+eDcH2YzkXpH1S5HzK8nLx4ip9LfRKf2LYsxTykbn8Ok+5s7xv975 I1veStJs2NObgfJyJfaIbzeVVk84rnZOAaUrQhpWHPdu1X5APXAsPiUyinn/+pznFSr5 HVXA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=AHZdLicu5JTswxvmP1cJP+tnU4fdIGy/YGmaAwYSNng=; b=BrYA4cYvdwPog2hfxJRGxh6kx6aC00fPjre4eYDobxwR8jU1qOdkjSlMjfRkKQHjGp 
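For illustration (editorial sketch, not part of the patch): given the
callback above, userspace would consume notification CQEs roughly as
follows. TX_CTX_TAG matches a tag registered earlier, and reclaim_buffers()
is an assumed application function.

#include <liburing.h>

#define TX_CTX_TAG 0xcafe		/* tag registered earlier */

extern void reclaim_buffers(unsigned generation);	/* assumed */

/* sketch: a notification CQE carries the registered tag in user_data and
 * the generation sequence number in res (see io_zc_tx_work_callback()) */
static void drain_notifications(struct io_uring *ring)
{
	struct io_uring_cqe *cqe;

	while (io_uring_peek_cqe(ring, &cqe) == 0) {
		if (cqe->user_data == TX_CTX_TAG)
			reclaim_buffers((unsigned)cqe->res);
		io_uring_cqe_seen(ring, cqe);
	}
}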
From patchwork Tue Nov 30 15:18:56 2021
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn,
    Eric Dumazet, Hideaki YOSHIFUJI, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC 08/12] io_uring: wire send zc request type
Date: Tue, 30 Nov 2021 15:18:56 +0000

Add a new io_uring opcode, IORING_OP_SENDZC. The main distinction from
other send requests is that the user should specify a tx context index,
which will be used to notify the userspace when the kernel doesn't need
the buffers anymore and it's safe to reuse them. Hence, overwriting data
buffers is racy until a separate notification arrives, even when the
request itself has already completed.

Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c                 | 120 +++++++++++++++++++++++++++++++++-
 include/uapi/linux/io_uring.h |   2 +
 2 files changed, 121 insertions(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 6ca02e60fa48..337eb91f0198 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -600,6 +600,16 @@ struct io_sr_msg {
	size_t				len;
 };

+struct io_sendzc {
+	struct file			*file;
+	void __user			*buf;
+	size_t				len;
+	struct io_tx_ctx		*tx_ctx;
+	int				msg_flags;
+	int				addr_len;
+	void __user			*addr;
+};
+
 struct io_open {
	struct file			*file;
	int				dfd;
@@ -874,6 +884,7 @@ struct io_kiocb {
		struct io_mkdir		mkdir;
		struct io_symlink	symlink;
		struct io_hardlink	hardlink;
+		struct io_sendzc	msgzc;
	};

	u8				opcode;
@@ -1123,6 +1134,12 @@ static const struct io_op_def io_op_defs[] = {
	[IORING_OP_MKDIRAT] = {},
	[IORING_OP_SYMLINKAT] = {},
	[IORING_OP_LINKAT] = {},
+	[IORING_OP_SENDZC] = {
+		.needs_file		= 1,
+		.unbound_nonreg_file	= 1,
+		.pollout		= 1,
+		.audit_skip		= 1,
+	},
 };

 /* requests with any of those set should undergo io_disarm_next() */
@@ -1999,7 +2016,6 @@ static struct io_tx_notifier *io_alloc_tx_notifier(struct io_ring_ctx *ctx,
	return notifier;
 }

-__attribute__((unused))
 static inline struct io_tx_notifier *io_get_tx_notifier(struct io_ring_ctx *ctx,
							struct io_tx_ctx *tx_ctx)
 {
@@ -5025,6 +5041,102 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
	return 0;
 }

+static int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+	struct io_sendzc *sr = &req->msgzc;
+	unsigned int idx;
+
+	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+		return -EINVAL;
+	if (READ_ONCE(sqe->ioprio))
+		return -EINVAL;
+
+	sr->buf = u64_to_user_ptr(READ_ONCE(sqe->addr));
+	sr->len = READ_ONCE(sqe->len);
+	sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
+	if (sr->msg_flags & MSG_DONTWAIT)
+		req->flags |= REQ_F_NOWAIT;
+
+	idx = READ_ONCE(sqe->tx_ctx_idx);
+	if (idx > ctx->nr_tx_ctxs)
+		return -EINVAL;
+	idx = array_index_nospec(idx, ctx->nr_tx_ctxs);
+	req->msgzc.tx_ctx = &ctx->tx_ctxs[idx];
+
+	sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+	sr->addr_len = READ_ONCE(sqe->__pad2[0]);
+
+#ifdef CONFIG_COMPAT
+	if (req->ctx->compat)
+		sr->msg_flags |= MSG_CMSG_COMPAT;
+#endif
+	return 0;
+}
+
+static int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct sockaddr_storage address;
+	struct io_ring_ctx *ctx = req->ctx;
+	struct io_tx_notifier *notifier;
+	struct io_sendzc *sr = &req->msgzc;
+	struct msghdr msg;
+	struct iovec iov;
+	struct socket *sock;
+	unsigned flags;
+	int ret, min_ret = 0;
+
+	sock = sock_from_file(req->file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
+	ret = import_single_range(WRITE, sr->buf, sr->len, &iov, &msg.msg_iter);
+	if (unlikely(ret))
+		return ret;
+
+	msg.msg_name = NULL;
+	msg.msg_control = NULL;
+	msg.msg_controllen = 0;
+	msg.msg_namelen = 0;
+	if (sr->addr) {
+		ret = move_addr_to_kernel(sr->addr, sr->addr_len, &address);
+		if (ret < 0)
+			return ret;
+		msg.msg_name = (struct sockaddr *)&address;
+		msg.msg_namelen = sr->addr_len;
+	}
+
+	io_ring_submit_lock(ctx, issue_flags & IO_URING_F_UNLOCKED);
+	notifier = io_get_tx_notifier(ctx, req->msgzc.tx_ctx);
+	if (!notifier) {
+		req_set_fail(req);
+		ret = -ENOMEM;
+		goto out;
+	}
+	msg.msg_ubuf = &notifier->uarg;
+
+	flags = sr->msg_flags;
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		flags |= MSG_DONTWAIT;
+	if (flags & MSG_WAITALL)
+		min_ret = iov_iter_count(&msg.msg_iter);
+	msg.msg_flags = flags;
+	ret = sock_sendmsg(sock, &msg);
+
+	if (ret < min_ret) {
+		if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
+			goto out;
+		if (ret == -ERESTARTSYS)
+			ret = -EINTR;
+		req_set_fail(req);
+	}
+	io_ring_submit_unlock(ctx, issue_flags & IO_URING_F_UNLOCKED);
+	__io_req_complete(req, issue_flags, ret, 0);
+	return 0;
+out:
+	io_ring_submit_unlock(ctx, issue_flags & IO_URING_F_UNLOCKED);
+	return ret;
+}
+
 static int __io_recvmsg_copy_hdr(struct io_kiocb *req,
				 struct io_async_msghdr *iomsg)
 {
@@ -5428,6 +5540,7 @@ IO_NETOP_PREP_ASYNC(sendmsg);
 IO_NETOP_PREP_ASYNC(recvmsg);
 IO_NETOP_PREP_ASYNC(connect);
 IO_NETOP_PREP(accept);
+IO_NETOP_PREP(sendzc);
 IO_NETOP_FN(send);
 IO_NETOP_FN(recv);
 #endif /* CONFIG_NET */
@@ -6575,6 +6688,8 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
	case IORING_OP_SENDMSG:
	case IORING_OP_SEND:
		return io_sendmsg_prep(req, sqe);
+	case IORING_OP_SENDZC:
+		return io_sendzc_prep(req, sqe);
	case IORING_OP_RECVMSG:
	case IORING_OP_RECV:
		return io_recvmsg_prep(req, sqe);
@@ -6832,6 +6947,9 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
	case IORING_OP_SEND:
		ret = io_send(req, issue_flags);
		break;
+	case IORING_OP_SENDZC:
+		ret = io_sendzc(req, issue_flags);
+		break;
	case IORING_OP_RECVMSG:
		ret = io_recvmsg(req, issue_flags);
		break;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index f2e8d18e40e0..bbc78fe8ca77 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -59,6 +59,7 @@ struct io_uring_sqe {
		union {
			__s32	splice_fd_in;
			__u32	file_index;
+			__u32	tx_ctx_idx;
		};
		__u64	__pad2[2];
	};
@@ -143,6 +144,7 @@ enum {
	IORING_OP_MKDIRAT,
	IORING_OP_SYMLINKAT,
	IORING_OP_LINKAT,
+	IORING_OP_SENDZC,

	/* this goes last, obviously */
	IORING_OP_LAST,
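For illustration (editorial sketch, not part of the patch): issuing the new
opcode requires filling the SQE by hand, as liburing has no
IORING_OP_SENDZC helper for this RFC; sockfd and buf are assumed to exist.

#include <liburing.h>
#include <string.h>

/* sketch: queue a zerocopy send against tx context 0; tx_ctx_idx shares
 * the SQE union with splice_fd_in/file_index (see the uapi hunk above) */
static void prep_sendzc(struct io_uring *ring, int sockfd,
			const void *buf, unsigned len)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = IORING_OP_SENDZC;
	sqe->fd = sockfd;
	sqe->addr = (unsigned long)buf;
	sqe->len = len;
	sqe->user_data = 42;	/* arbitrary request tag */
	sqe->file_index = 0;	/* aliases tx_ctx_idx: registered context 0 */
}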
From patchwork Tue Nov 30 15:18:57 2021
X-Patchwork-Id: 12647591
X-Patchwork-State: RFC
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn, Eric Dumazet, Hideaki YOSHIFUJI, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC 09/12] io_uring: add an option to flush zc notifications
Date: Tue, 30 Nov 2021 15:18:57 +0000

Add an IORING_SENDZC_FLUSH flag. If set, a zerocopy send that completes successfully also flushes the corresponding ubuf_info notification.
Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c                 | 26 +++++++++++++++++++-------
 include/uapi/linux/io_uring.h |  4 ++++
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 337eb91f0198..e1360fde95d3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -608,6 +608,7 @@ struct io_sendzc {
	int				msg_flags;
	int				addr_len;
	void __user			*addr;
+	unsigned int			zc_flags;
 };

 struct io_open {
@@ -1992,6 +1993,12 @@ static void io_uring_tx_zerocopy_callback(struct sk_buff *skb,
	}
 }

+static void io_tx_kill_notification(struct io_tx_ctx *tx_ctx)
+{
+	io_uring_tx_zerocopy_callback(NULL, &tx_ctx->notifier->uarg, true);
+	tx_ctx->notifier = NULL;
+}
+
 static struct io_tx_notifier *io_alloc_tx_notifier(struct io_ring_ctx *ctx,
						   struct io_tx_ctx *tx_ctx)
 {
@@ -5041,6 +5048,8 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
	return 0;
 }

+#define IO_SENDZC_VALID_FLAGS IORING_SENDZC_FLUSH
+
 static int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
	struct io_ring_ctx *ctx = req->ctx;
@@ -5049,8 +5058,6 @@ static int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)

	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
		return -EINVAL;
-	if (READ_ONCE(sqe->ioprio))
-		return -EINVAL;

	sr->buf = u64_to_user_ptr(READ_ONCE(sqe->addr));
	sr->len = READ_ONCE(sqe->len);
@@ -5067,6 +5074,10 @@ static int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
	sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
	sr->addr_len = READ_ONCE(sqe->__pad2[0]);

+	req->msgzc.zc_flags = READ_ONCE(sqe->ioprio);
+	if (req->msgzc.zc_flags & ~IO_SENDZC_VALID_FLAGS)
+		return -EINVAL;
+
 #ifdef CONFIG_COMPAT
	if (req->ctx->compat)
		sr->msg_flags |= MSG_CMSG_COMPAT;
@@ -5089,6 +5100,7 @@ static int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
	sock = sock_from_file(req->file);
	if (unlikely(!sock))
		return -ENOTSOCK;
+
	ret = import_single_range(WRITE, sr->buf, sr->len, &iov, &msg.msg_iter);
	if (unlikely(ret))
		return ret;
@@ -5128,6 +5140,8 @@ static int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
		if (ret == -ERESTARTSYS)
			ret = -EINTR;
		req_set_fail(req);
+	} else if (req->msgzc.zc_flags & IORING_SENDZC_FLUSH) {
+		io_tx_kill_notification(req->msgzc.tx_ctx);
	}
	io_ring_submit_unlock(ctx, issue_flags & IO_URING_F_UNLOCKED);
	__io_req_complete(req, issue_flags, ret, 0);
@@ -9417,11 +9431,9 @@ static void io_sqe_tx_ctx_kill_ubufs(struct io_ring_ctx *ctx)

	for (i = 0; i < ctx->nr_tx_ctxs; i++) {
		tx_ctx = &ctx->tx_ctxs[i];
-		if (!tx_ctx->notifier)
-			continue;
-		io_uring_tx_zerocopy_callback(NULL, &tx_ctx->notifier->uarg,
-					      true);
-		tx_ctx->notifier = NULL;
+
+		if (tx_ctx->notifier)
+			io_tx_kill_notification(tx_ctx);
	}
 }

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index bbc78fe8ca77..ac18e8e6f86f 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -187,6 +187,10 @@ enum {
 #define IORING_POLL_UPDATE_EVENTS	(1U << 1)
 #define IORING_POLL_UPDATE_USER_DATA	(1U << 2)

+enum {
+	IORING_SENDZC_FLUSH		= (1U << 0),
+};
+
 /*
 * IO completion data structure (Completion Queue Entry)
 */
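A userspace sketch of the new flag, under the same hand-rolled SQE assumptions as the note after the SENDZC patch: for this opcode the zc_flags travel in sqe->ioprio, which the old prep rejected when non-zero and which this patch now interprets:

	/* sketch: flush the notification as soon as this send succeeds,
	 * instead of leaving it attached to the tx context for batching */
	sqe->opcode = IORING_OP_SENDZC;
	sqe->ioprio = IORING_SENDZC_FLUSH;	/* zc_flags field for SENDZC */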
From patchwork Tue Nov 30 15:18:58 2021
X-Patchwork-Id: 12647589
Miller" , Willem de Bruijn , Eric Dumazet , Hideaki YOSHIFUJI , David Ahern , Jens Axboe , Pavel Begunkov Subject: [RFC 10/12] io_uring: opcode independent fixed buf import Date: Tue, 30 Nov 2021 15:18:58 +0000 Message-Id: <560cd5b8469874d16405bf4621d4336fad991fbf.1638282789.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.34.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-State: RFC Extract an opcode independent helper from io_import_fixed for initialising an iov_iter with a fixed buffer with Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index e1360fde95d3..bb991f4cee7b 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -3152,11 +3152,11 @@ static void kiocb_done(struct io_kiocb *req, ssize_t ret, } } -static int __io_import_fixed(struct io_kiocb *req, int rw, struct iov_iter *iter, - struct io_mapped_ubuf *imu) +static int __io_import_fixed(int rw, struct iov_iter *iter, + struct io_mapped_ubuf *imu, + u64 buf_addr, size_t len) { - size_t len = req->rw.len; - u64 buf_end, buf_addr = req->rw.addr; + u64 buf_end; size_t offset; if (unlikely(check_add_overflow(buf_addr, (u64)len, &buf_end))) @@ -3225,7 +3225,7 @@ static int io_import_fixed(struct io_kiocb *req, int rw, struct iov_iter *iter) imu = READ_ONCE(ctx->user_bufs[index]); req->imu = imu; } - return __io_import_fixed(req, rw, iter, imu); + return __io_import_fixed(rw, iter, imu, req->rw.addr, req->rw.len); } static void io_ring_submit_unlock(struct io_ring_ctx *ctx, bool needs_lock) From patchwork Tue Nov 30 15:18:59 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12647601 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5EB96C433EF for ; Tue, 30 Nov 2021 15:21:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242600AbhK3PYe (ORCPT ); Tue, 30 Nov 2021 10:24:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38892 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245096AbhK3PX2 (ORCPT ); Tue, 30 Nov 2021 10:23:28 -0500 Received: from mail-wm1-x32c.google.com (mail-wm1-x32c.google.com [IPv6:2a00:1450:4864:20::32c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AC1FCC061748; Tue, 30 Nov 2021 07:19:32 -0800 (PST) Received: by mail-wm1-x32c.google.com with SMTP id m25-20020a7bcb99000000b0033aa12cdd33so7898700wmi.1; Tue, 30 Nov 2021 07:19:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Sz0U0TnMDqzg3fATGdUww/ooCVGg4JrbsKMAkl6AooM=; b=YGM0vLPzRDvFDCbOBrI/Qn9dINi5sqHIH4bX2tjI4VsNz/J2v94ogyGPCpnnon2FGQ yoKtwcx2Q2GqGroFbt7hb7Xhg4ghwj45wd0H802VN6e0gqtw+xcUQfX90g+LWQ1Ab86K 4OMcU9X9KKKfo7uEYGaNgsf19VWd8L1+dOlzRYlBwOzXVeLob6WrjnIsypHABPLfV8L5 JkQlbE4jhMyBY/uVp2BeJu7+JWedO7HejdprgbRvlM5ZI94addpcvbUWVNi4c+cUtPsm iR60lrg/SLAfyzjyG5dDfsf2SGRRPfm3zpUiJKam0qthBWvys7PWzuLY1AvkuZ1sd1p3 qjLQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; 
From patchwork Tue Nov 30 15:18:59 2021
X-Patchwork-Id: 12647601
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn, Eric Dumazet, Hideaki YOSHIFUJI, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC 11/12] io_uring: sendzc with fixed buffers
Date: Tue, 30 Nov 2021 15:18:59 +0000
Message-Id: <962f2f1c524d25356cdda188070d8653ee28f012.1638282789.git.asml.silence@gmail.com>
X-Patchwork-State: RFC

Allow zerocopy sends to use fixed buffers.

Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c                 | 19 +++++++++++++++++--
 include/uapi/linux/io_uring.h |  1 +
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index bb991f4cee7b..5a0adfadf759 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5048,7 +5048,7 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
	return 0;
 }

-#define IO_SENDZC_VALID_FLAGS IORING_SENDZC_FLUSH
+#define IO_SENDZC_VALID_FLAGS (IORING_SENDZC_FLUSH | IORING_SENDZC_FIXED_BUF)

 static int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
@@ -5078,6 +5078,15 @@ static int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
	if (req->msgzc.zc_flags & ~IO_SENDZC_VALID_FLAGS)
		return -EINVAL;

+	if (req->msgzc.zc_flags & IORING_SENDZC_FIXED_BUF) {
+		idx = READ_ONCE(sqe->buf_index);
+		if (unlikely(idx >= ctx->nr_user_bufs))
+			return -EFAULT;
+		idx = array_index_nospec(idx, ctx->nr_user_bufs);
+		req->imu = READ_ONCE(ctx->user_bufs[idx]);
+		io_req_set_rsrc_node(req, ctx);
+	}
+
 #ifdef CONFIG_COMPAT
	if (req->ctx->compat)
		sr->msg_flags |= MSG_CMSG_COMPAT;
@@ -5101,7 +5110,13 @@ static int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
	if (unlikely(!sock))
		return -ENOTSOCK;

-	ret = import_single_range(WRITE, sr->buf, sr->len, &iov, &msg.msg_iter);
+	if (req->msgzc.zc_flags & IORING_SENDZC_FIXED_BUF) {
+		ret = __io_import_fixed(WRITE, &msg.msg_iter, req->imu,
+					(u64)sr->buf, sr->len);
+	} else {
+		ret = import_single_range(WRITE, sr->buf, sr->len, &iov,
+					  &msg.msg_iter);
+	}
	if (unlikely(ret))
		return ret;

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index ac18e8e6f86f..740af1d0409f 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -189,6 +189,7 @@ enum {

 enum {
	IORING_SENDZC_FLUSH		= (1U << 0),
+	IORING_SENDZC_FIXED_BUF		= (1U << 1),
 };
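From userspace the flags compose; a sketch of a zerocopy send out of a registered buffer, assuming buffers were registered beforehand with io_uring_register_buffers() and continuing the hand-rolled SQE from the earlier notes (`reg_buf` and `reg_buf_len` are illustrative):

	/* sketch: zerocopy send from registered buffer slot 3 */
	sqe->opcode = IORING_OP_SENDZC;
	sqe->ioprio = IORING_SENDZC_FIXED_BUF;	/* zc_flags */
	sqe->addr = (unsigned long)reg_buf;	/* must lie inside buffer 3 */
	sqe->len = reg_buf_len;
	sqe->buf_index = 3;			/* registered buffer index */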
Miller" , Willem de Bruijn , Eric Dumazet , Hideaki YOSHIFUJI , David Ahern , Jens Axboe , Pavel Begunkov Subject: [RFC 12/12] io_uring: cache struct ubuf_info Date: Tue, 30 Nov 2021 15:19:00 +0000 Message-Id: X-Mailer: git-send-email 2.34.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-State: RFC Allocation/deallocation of ubuf_info takes some time, add an optimisation caching them. The implementation is alike to how we cache requests in io_req_complete_post(). ->ubuf_list is protected by ->uring_lock and requests try grab directly from it, and there is also ->ubuf_list_locked list protected by ->completion_lock, which is eventually batch spliced to ->ubuf_list. Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 74 ++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 64 insertions(+), 10 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 5a0adfadf759..8c81177395c3 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -334,6 +334,7 @@ struct io_tx_notifier { struct percpu_ref *fixed_rsrc_refs; u64 tag; u32 seq; + struct list_head cache_node; }; struct io_tx_ctx { @@ -393,6 +394,9 @@ struct io_ring_ctx { unsigned nr_tx_ctxs; struct io_submit_state submit_state; + struct list_head ubuf_list; + struct list_head ubuf_list_locked; + int ubuf_locked_nr; struct list_head timeout_list; struct list_head ltimeout_list; struct list_head cq_overflow_list; @@ -1491,6 +1495,8 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p) INIT_WQ_LIST(&ctx->locked_free_list); INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func); INIT_WQ_LIST(&ctx->submit_state.compl_reqs); + INIT_LIST_HEAD(&ctx->ubuf_list); + INIT_LIST_HEAD(&ctx->ubuf_list_locked); return ctx; err: kfree(ctx->dummy_ubuf); @@ -1963,16 +1969,20 @@ static void io_zc_tx_work_callback(struct work_struct *work) struct io_tx_notifier *notifier = container_of(work, struct io_tx_notifier, commit_work); struct io_ring_ctx *ctx = notifier->uarg.ctx; + struct percpu_ref *rsrc_refs = notifier->fixed_rsrc_refs; spin_lock(&ctx->completion_lock); io_fill_cqe_aux(ctx, notifier->tag, notifier->seq, 0); + + list_add(¬ifier->cache_node, &ctx->ubuf_list_locked); + ctx->ubuf_locked_nr++; + io_commit_cqring(ctx); spin_unlock(&ctx->completion_lock); io_cqring_ev_posted(ctx); - percpu_ref_put(notifier->fixed_rsrc_refs); + percpu_ref_put(rsrc_refs); percpu_ref_put(&ctx->refs); - kfree(notifier); } static void io_uring_tx_zerocopy_callback(struct sk_buff *skb, @@ -1999,26 +2009,69 @@ static void io_tx_kill_notification(struct io_tx_ctx *tx_ctx) tx_ctx->notifier = NULL; } +static void io_notifier_splice(struct io_ring_ctx *ctx) +{ + spin_lock(&ctx->completion_lock); + list_splice_init(&ctx->ubuf_list_locked, &ctx->ubuf_list); + ctx->ubuf_locked_nr = 0; + spin_unlock(&ctx->completion_lock); +} + +static void io_notifier_free_cached(struct io_ring_ctx *ctx) +{ + struct io_tx_notifier *notifier; + + io_notifier_splice(ctx); + + while (!list_empty(&ctx->ubuf_list)) { + notifier = list_first_entry(&ctx->ubuf_list, + struct io_tx_notifier, cache_node); + list_del(¬ifier->cache_node); + kfree(notifier); + } +} + +static inline bool io_notifier_has_cached(struct io_ring_ctx *ctx) +{ + if (likely(!list_empty(&ctx->ubuf_list))) + return true; + if (READ_ONCE(ctx->ubuf_locked_nr) <= IO_REQ_ALLOC_BATCH) + return false; + io_notifier_splice(ctx); + return !list_empty(&ctx->ubuf_list); +} + static struct io_tx_notifier *io_alloc_tx_notifier(struct io_ring_ctx 
						   struct io_tx_ctx *tx_ctx)
 {
	struct io_tx_notifier *notifier;
	struct ubuf_info *uarg;

-	notifier = kmalloc(sizeof(*notifier), GFP_ATOMIC);
-	if (!notifier)
-		return NULL;
+	if (likely(io_notifier_has_cached(ctx))) {
+		if (WARN_ON_ONCE(list_empty(&ctx->ubuf_list)))
+			return NULL;
+
+		notifier = list_first_entry(&ctx->ubuf_list,
+					    struct io_tx_notifier, cache_node);
+		list_del(&notifier->cache_node);
+	} else {
+		gfp_t gfp_flags = GFP_ATOMIC|GFP_KERNEL_ACCOUNT;
+
+		notifier = kmalloc(sizeof(*notifier), gfp_flags);
+		if (!notifier)
+			return NULL;
+		uarg = &notifier->uarg;
+		uarg->ctx = ctx;
+		uarg->flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
+		uarg->callback = io_uring_tx_zerocopy_callback;
+	}

	WARN_ON_ONCE(!current->io_uring);
	notifier->seq = tx_ctx->seq++;
	notifier->tag = tx_ctx->tag;
	io_set_rsrc_node(&notifier->fixed_rsrc_refs, ctx);
-	uarg = &notifier->uarg;
-	uarg->ctx = ctx;
-	uarg->flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
-	uarg->callback = io_uring_tx_zerocopy_callback;
-	refcount_set(&uarg->refcnt, 1);
+	refcount_set(&notifier->uarg.refcnt, 1);
	percpu_ref_get(&ctx->refs);
	return notifier;
 }
@@ -9732,6 +9785,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 #endif
	WARN_ON_ONCE(!list_empty(&ctx->ltimeout_list));

+	io_notifier_free_cached(ctx);
	io_sqe_tx_ctx_unregister(ctx);
	io_mem_free(ctx->rings);
	io_mem_free(ctx->sq_sqes);
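For readers unfamiliar with the two-list caching idiom borrowed from io_req_complete_post(), it distils to roughly the following shape (illustrative sketch only, field and type names invented, not part of the patch):

	/* Two-tier free-object cache: a fast list touched only under
	 * ->uring_lock, fed in batches from a second list that the
	 * completion side fills under ->completion_lock. */
	struct obj_cache {
		struct list_head free;		/* ->uring_lock */
		struct list_head free_locked;	/* ->completion_lock */
		int locked_nr;			/* entries on free_locked */
	};

Allocation first tries the fast list, and only once enough notifiers pile up on the locked list does it take ->completion_lock to splice them over, so the contended lock is hit at most once per batch on the allocation path.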