From patchwork Tue Dec 21 15:35:23 2021
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12689963
X-Patchwork-Delegate: kuba@kernel.org
X-Patchwork-State: RFC
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn,
 Eric Dumazet, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC v2 01/19] skbuff: add SKBFL_DONT_ORPHAN flag
Date: Tue, 21 Dec 2021 15:35:23 +0000

We don't want to list every single ubuf_info callback in
skb_orphan_frags(), so add a flag controlling the behaviour.

Signed-off-by: Pavel Begunkov
---
 include/linux/skbuff.h | 8 +++++---
 net/core/skbuff.c      | 2 +-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c8cb7e697d47..b80944a9ce8f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -459,10 +459,13 @@ enum {
	 * charged to the kernel memory.
	 */
	SKBFL_PURE_ZEROCOPY = BIT(2),
+
+	SKBFL_DONT_ORPHAN = BIT(3),
 };
 
 #define SKBFL_ZEROCOPY_FRAG	(SKBFL_ZEROCOPY_ENABLE | SKBFL_SHARED_FRAG)
-#define SKBFL_ALL_ZEROCOPY	(SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY)
+#define SKBFL_ALL_ZEROCOPY	(SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY | \
+				 SKBFL_DONT_ORPHAN)
 
 /*
  * The callback notifies userspace to release buffers when skb DMA is done in
@@ -2839,8 +2842,7 @@ static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
 {
	if (likely(!skb_zcopy(skb)))
		return 0;
-	if (!skb_zcopy_is_nouarg(skb) &&
-	    skb_uarg(skb)->callback == msg_zerocopy_callback)
+	if (skb_shinfo(skb)->flags & SKBFL_DONT_ORPHAN)
		return 0;
	return skb_copy_ubufs(skb, gfp_mask);
 }

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index ba2f38246f07..b23db60ea6f9 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1191,7 +1191,7 @@ struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size)
	uarg->len = 1;
	uarg->bytelen = size;
	uarg->zerocopy = 1;
-	uarg->flags = SKBFL_ZEROCOPY_FRAG;
+	uarg->flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
	refcount_set(&uarg->refcnt, 1);
	sock_hold(sk);
From patchwork Tue Dec 21 15:35:24 2021
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12689967
X-Patchwork-Delegate: kuba@kernel.org
X-Patchwork-State: RFC
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn,
 Eric Dumazet, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC v2 02/19] skbuff: pass a struct ubuf_info in msghdr
Date: Tue, 21 Dec 2021 15:35:24 +0000
Message-Id: <7dae2f61ee9a1ad38822870764fcafad43a3fe4e.1640029579.git.asml.silence@gmail.com>

Instead of the net stack managing ubuf_info, allow it to be passed in
from outside via struct msghdr (an in-kernel structure), so io_uring
can make use of it.
Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c          | 2 ++
 include/linux/socket.h | 1 +
 net/compat.c           | 1 +
 net/socket.c           | 3 +++
 4 files changed, 7 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 72da3a75521a..59380e3454ad 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4911,6 +4911,7 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
	msg.msg_control = NULL;
	msg.msg_controllen = 0;
	msg.msg_namelen = 0;
+	msg.msg_ubuf = NULL;
 
	flags = req->sr_msg.msg_flags;
	if (issue_flags & IO_URING_F_NONBLOCK)
@@ -5157,6 +5158,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
	msg.msg_namelen = 0;
	msg.msg_iocb = NULL;
	msg.msg_flags = 0;
+	msg.msg_ubuf = NULL;
 
	flags = req->sr_msg.msg_flags;
	if (force_nonblock)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 8ef26d89ef49..6bd2c6b0c6f2 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -65,6 +65,7 @@ struct msghdr {
	__kernel_size_t	msg_controllen;	/* ancillary data buffer length */
	unsigned int	msg_flags;	/* flags on received message */
	struct kiocb	*msg_iocb;	/* ptr to iocb for async requests */
+	struct ubuf_info *msg_ubuf;
 };
 
 struct user_msghdr {

diff --git a/net/compat.c b/net/compat.c
index 210fc3b4d0d8..6cd2e7683dd0 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -80,6 +80,7 @@ int __get_compat_msghdr(struct msghdr *kmsg,
		return -EMSGSIZE;
 
	kmsg->msg_iocb = NULL;
+	kmsg->msg_ubuf = NULL;
	*ptr = msg.msg_iov;
	*len = msg.msg_iovlen;
	return 0;

diff --git a/net/socket.c b/net/socket.c
index 7f64a6eccf63..0a29b616a38c 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2023,6 +2023,7 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
	msg.msg_control = NULL;
	msg.msg_controllen = 0;
	msg.msg_namelen = 0;
+	msg.msg_ubuf = NULL;
	if (addr) {
		err = move_addr_to_kernel(addr, addr_len, &address);
		if (err < 0)
@@ -2088,6 +2089,7 @@ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags,
	msg.msg_namelen = 0;
	msg.msg_iocb = NULL;
	msg.msg_flags = 0;
+	msg.msg_ubuf = NULL;
	if (sock->file->f_flags & O_NONBLOCK)
		flags |= MSG_DONTWAIT;
	err = sock_recvmsg(sock, &msg, flags);
@@ -2326,6 +2328,7 @@ int __copy_msghdr_from_user(struct msghdr *kmsg,
		return -EMSGSIZE;
 
	kmsg->msg_iocb = NULL;
+	kmsg->msg_ubuf = NULL;
	*uiov = msg.msg_iov;
	*nsegs = msg.msg_iovlen;
	return 0;
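As a rough sketch of how an in-kernel user such as io_uring could now hand
its own ubuf_info to the stack (an illustrative assumption, not code from
this series):

static int my_send_zc(struct socket *sock, struct bio_vec *bv,
		      unsigned int nr_segs, size_t len,
		      struct ubuf_info *uarg)
{
	struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_ZEROCOPY };

	iov_iter_bvec(&msg.msg_iter, WRITE, bv, nr_segs, len);
	msg.msg_ubuf = uarg;	/* the new field added by this patch */
	return sock_sendmsg(sock, &msg);
}

The protocol-side handling of msg_ubuf (and its MSG_ZEROCOPY gating) is
wired up for UDP later in the series.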
From patchwork Tue Dec 21 15:35:25 2021
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12689965
X-Patchwork-Delegate: kuba@kernel.org
X-Patchwork-State: RFC
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn,
 Eric Dumazet, David Ahern, Jens Axboe, Pavel Begunkov
Miller" , Willem de Bruijn , Eric Dumazet , David Ahern , Jens Axboe , Pavel Begunkov Subject: [RFC v2 03/19] net: add zerocopy_sg_from_iter for bvec Date: Tue, 21 Dec 2021 15:35:25 +0000 Message-Id: <162b7096c1a8e31743b692a229bac0c06a64c75c.1640029579.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org X-Patchwork-State: RFC Add a separate path for bvec iterators in __zerocopy_sg_from_iter, first it's quite faster but also will be needed to optimise out get/put_page() Signed-off-by: Pavel Begunkov --- net/core/datagram.c | 50 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+) diff --git a/net/core/datagram.c b/net/core/datagram.c index ee290776c661..cb1e34fbcd44 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -616,11 +616,61 @@ int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset, } EXPORT_SYMBOL(skb_copy_datagram_from_iter); +static int __zerocopy_sg_from_bvec(struct sock *sk, struct sk_buff *skb, + struct iov_iter *from, size_t length) +{ + int ret, frag = skb_shinfo(skb)->nr_frags; + struct bvec_iter bi; + struct bio_vec v; + ssize_t copied = 0; + unsigned long truesize = 0; + + bi.bi_size = min(from->count, length); + bi.bi_bvec_done = from->iov_offset; + bi.bi_idx = 0; + + while (bi.bi_size) { + if (frag == MAX_SKB_FRAGS) { + ret = -EMSGSIZE; + goto out; + } + + v = mp_bvec_iter_bvec(from->bvec, bi); + copied += v.bv_len; + truesize += PAGE_ALIGN(v.bv_len + v.bv_offset); + get_page(v.bv_page); + skb_fill_page_desc(skb, frag++, v.bv_page, v.bv_offset, v.bv_len); + bvec_iter_advance_single(from->bvec, &bi, v.bv_len); + } + ret = 0; +out: + skb->data_len += copied; + skb->len += copied; + skb->truesize += truesize; + + if (sk && sk->sk_type == SOCK_STREAM) { + sk_wmem_queued_add(sk, truesize); + if (!skb_zcopy_pure(skb)) + sk_mem_charge(sk, truesize); + } else { + refcount_add(truesize, &skb->sk->sk_wmem_alloc); + } + + from->bvec += bi.bi_idx; + from->nr_segs -= bi.bi_idx; + from->count = bi.bi_size; + from->iov_offset = bi.bi_bvec_done; + return ret; +} + int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb, struct iov_iter *from, size_t length) { int frag = skb_shinfo(skb)->nr_frags; + if (iov_iter_is_bvec(from)) + return __zerocopy_sg_from_bvec(sk, skb, from, length); + while (length && iov_iter_count(from)) { struct page *pages[MAX_SKB_FRAGS]; struct page *last_head = NULL; From patchwork Tue Dec 21 15:35:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12689969 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E8FEDC433EF for ; Tue, 21 Dec 2021 15:36:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239414AbhLUPf7 (ORCPT ); Tue, 21 Dec 2021 10:35:59 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33652 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239420AbhLUPf6 (ORCPT ); Tue, 21 Dec 2021 10:35:58 -0500 Received: from mail-wr1-x42d.google.com (mail-wr1-x42d.google.com [IPv6:2a00:1450:4864:20::42d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS 
From patchwork Tue Dec 21 15:35:26 2021
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12689969
X-Patchwork-Delegate: kuba@kernel.org
X-Patchwork-State: RFC
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn,
 Eric Dumazet, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC v2 04/19] net: optimise page get/free for bvec zc
Date: Tue, 21 Dec 2021 15:35:26 +0000
Message-Id: <6bea8a32471c7a4e849d64cf5b6122236b6a38dd.1640029579.git.asml.silence@gmail.com>

get_page() in __zerocopy_sg_from_bvec() and the matching put_page()s are
expensive. However, we can avoid both if the caller can guarantee that
the pages stay alive until the corresponding ubuf_info is released. In
particular, it targets io_uring with fixed buffers, which follow the
described contract. Assuming that nobody yet uses bvec together with
zerocopy, make all calls with bvec iterators follow this model.
Signed-off-by: Pavel Begunkov
---
 include/linux/skbuff.h | 12 ++++++++++--
 net/core/datagram.c    |  9 +++++++--
 net/core/skbuff.c      | 14 +++++++++++++-
 3 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b80944a9ce8f..f6a6fd67e1ea 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -461,11 +461,16 @@ enum {
	SKBFL_PURE_ZEROCOPY = BIT(2),
 
	SKBFL_DONT_ORPHAN = BIT(3),
+
+	/* page references are managed by the ubuf_info, so it's safe to
+	 * use frags only up until ubuf_info is released
+	 */
+	SKBFL_MANAGED_FRAGS = BIT(4),
 };
 
 #define SKBFL_ZEROCOPY_FRAG	(SKBFL_ZEROCOPY_ENABLE | SKBFL_SHARED_FRAG)
 #define SKBFL_ALL_ZEROCOPY	(SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY | \
-				 SKBFL_DONT_ORPHAN)
+				 SKBFL_DONT_ORPHAN | SKBFL_MANAGED_FRAGS)
 
 /*
  * The callback notifies userspace to release buffers when skb DMA is done in
@@ -3155,7 +3160,10 @@ static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)
  */
 static inline void skb_frag_unref(struct sk_buff *skb, int f)
 {
-	__skb_frag_unref(&skb_shinfo(skb)->frags[f], skb->pp_recycle);
+	struct skb_shared_info *shinfo = skb_shinfo(skb);
+
+	if (!(shinfo->flags & SKBFL_MANAGED_FRAGS))
+		__skb_frag_unref(&shinfo->frags[f], skb->pp_recycle);
 }
 
 /**

diff --git a/net/core/datagram.c b/net/core/datagram.c
index cb1e34fbcd44..46526af40552 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -638,7 +638,6 @@ static int __zerocopy_sg_from_bvec(struct sock *sk, struct sk_buff *skb,
		v = mp_bvec_iter_bvec(from->bvec, bi);
		copied += v.bv_len;
		truesize += PAGE_ALIGN(v.bv_len + v.bv_offset);
-		get_page(v.bv_page);
		skb_fill_page_desc(skb, frag++, v.bv_page, v.bv_offset, v.bv_len);
		bvec_iter_advance_single(from->bvec, &bi, v.bv_len);
	}
@@ -667,9 +666,15 @@ int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb,
			    struct iov_iter *from, size_t length)
 {
	int frag = skb_shinfo(skb)->nr_frags;
+	bool managed = skb_shinfo(skb)->flags & SKBFL_MANAGED_FRAGS;
 
-	if (iov_iter_is_bvec(from))
+	if (iov_iter_is_bvec(from) && (managed || frag == 0)) {
+		skb_shinfo(skb)->flags |= SKBFL_MANAGED_FRAGS;
		return __zerocopy_sg_from_bvec(sk, skb, from, length);
+	}
+
+	if (managed)
+		return -EFAULT;
 
	while (length && iov_iter_count(from)) {
		struct page *pages[MAX_SKB_FRAGS];

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b23db60ea6f9..10cdcb99d34b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -666,11 +666,18 @@ static void skb_release_data(struct sk_buff *skb)
			      &shinfo->dataref))
		goto exit;
 
-	skb_zcopy_clear(skb, true);
+	if (skb_zcopy(skb)) {
+		bool skip_unref = shinfo->flags & SKBFL_MANAGED_FRAGS;
+
+		skb_zcopy_clear(skb, true);
+		if (skip_unref)
+			goto free_head;
+	}
 
	for (i = 0; i < shinfo->nr_frags; i++)
		__skb_frag_unref(&shinfo->frags[i], skb->pp_recycle);
 
+free_head:
	if (shinfo->frag_list)
		kfree_skb_list(shinfo->frag_list);
@@ -1597,6 +1604,7 @@ struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t gfp_mask)
	BUG_ON(skb_copy_bits(skb, -headerlen, n->head, headerlen + skb->len));
 
	skb_copy_header(n, skb);
+	skb_shinfo(n)->flags &= ~SKBFL_MANAGED_FRAGS;
	return n;
 }
 EXPORT_SYMBOL(skb_copy);
@@ -1653,6 +1661,7 @@ struct sk_buff *__pskb_copy_fclone(struct sk_buff *skb, int headroom,
			skb_frag_ref(skb, i);
		}
		skb_shinfo(n)->nr_frags = i;
+		skb_shinfo(n)->flags &= ~SKBFL_MANAGED_FRAGS;
	}
 
	if (skb_has_frag_list(skb)) {
@@ -1725,6 +1734,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
		refcount_inc(&skb_uarg(skb)->refcnt);
		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
			skb_frag_ref(skb, i);
+		skb_shinfo(skb)->flags &= ~SKBFL_MANAGED_FRAGS;
 
		if (skb_has_frag_list(skb))
			skb_clone_fraglist(skb);
@@ -3788,6 +3798,8 @@ int skb_append_pagefrags(struct sk_buff *skb, struct page *page,
	if (skb_can_coalesce(skb, i, page, offset)) {
		skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], size);
	} else if (i < MAX_SKB_FRAGS) {
+		if (skb_shinfo(skb)->flags & SKBFL_MANAGED_FRAGS)
+			return -EMSGSIZE;
		get_page(page);
		skb_fill_page_desc(skb, i, page, offset, size);
	} else {
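Restating the contract from the caller's side, a sketch of a hypothetical
provider (names are made up) that pins user pages up front and releases
them only from its ubuf_info callback:

struct my_zc_ctx {
	struct ubuf_info uarg;
	struct page **pages;
	unsigned long nr_pages;
};

static void my_zc_done(struct sk_buff *skb, struct ubuf_info *uarg,
		       bool success)
{
	struct my_zc_ctx *zc = container_of(uarg, struct my_zc_ctx, uarg);

	/* no skb with SKBFL_MANAGED_FRAGS references the frags any more,
	 * so only now may the pinned pages be released
	 */
	unpin_user_pages(zc->pages, zc->nr_pages);
}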
, "David S . Miller" , Willem de Bruijn , Eric Dumazet , David Ahern , Jens Axboe , Pavel Begunkov Subject: [RFC v2 05/19] net: don't track pfmemalloc for zc registered mem Date: Tue, 21 Dec 2021 15:35:27 +0000 Message-Id: <598860fe8307c120f07b4383b98cc51bde9cd531.1640029579.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org X-Patchwork-State: RFC In case of zerocopy frags are filled with userspace allocated memory, we shouldn't care about setting skb->pfmemalloc for them, and especially when the buffers were somehow pre-registered (i.e. getting bvec from io_uring). Remove the tracking from __zerocopy_sg_from_bvec(). Signed-off-by: Pavel Begunkov --- include/linux/skbuff.h | 28 +++++++++++++++++----------- net/core/datagram.c | 7 +++++-- 2 files changed, 22 insertions(+), 13 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index f6a6fd67e1ea..eef064fbf715 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -2203,6 +2203,22 @@ static inline unsigned int skb_pagelen(const struct sk_buff *skb) return skb_headlen(skb) + __skb_pagelen(skb); } +static inline void __skb_fill_page_desc_noacc(struct skb_shared_info *shinfo, + int i, struct page *page, + int off, int size) +{ + skb_frag_t *frag = &shinfo->frags[i]; + + /* + * Propagate page pfmemalloc to the skb if we can. The problem is + * that not all callers have unique ownership of the page but rely + * on page_is_pfmemalloc doing the right thing(tm). + */ + frag->bv_page = page; + frag->bv_offset = off; + skb_frag_size_set(frag, size); +} + /** * __skb_fill_page_desc - initialise a paged fragment in an skb * @skb: buffer containing fragment to be initialised @@ -2219,17 +2235,7 @@ static inline unsigned int skb_pagelen(const struct sk_buff *skb) static inline void __skb_fill_page_desc(struct sk_buff *skb, int i, struct page *page, int off, int size) { - skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; - - /* - * Propagate page pfmemalloc to the skb if we can. The problem is - * that not all callers have unique ownership of the page but rely - * on page_is_pfmemalloc doing the right thing(tm). 
- */ - frag->bv_page = page; - frag->bv_offset = off; - skb_frag_size_set(frag, size); - + __skb_fill_page_desc_noacc(skb_shinfo(skb), i, page, off, size); page = compound_head(page); if (page_is_pfmemalloc(page)) skb->pfmemalloc = true; diff --git a/net/core/datagram.c b/net/core/datagram.c index 46526af40552..f8f147e14d1c 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -619,7 +619,8 @@ EXPORT_SYMBOL(skb_copy_datagram_from_iter); static int __zerocopy_sg_from_bvec(struct sock *sk, struct sk_buff *skb, struct iov_iter *from, size_t length) { - int ret, frag = skb_shinfo(skb)->nr_frags; + struct skb_shared_info *shinfo = skb_shinfo(skb); + int ret, frag = shinfo->nr_frags; struct bvec_iter bi; struct bio_vec v; ssize_t copied = 0; @@ -638,11 +639,13 @@ static int __zerocopy_sg_from_bvec(struct sock *sk, struct sk_buff *skb, v = mp_bvec_iter_bvec(from->bvec, bi); copied += v.bv_len; truesize += PAGE_ALIGN(v.bv_len + v.bv_offset); - skb_fill_page_desc(skb, frag++, v.bv_page, v.bv_offset, v.bv_len); + __skb_fill_page_desc_noacc(shinfo, frag++, v.bv_page, + v.bv_offset, v.bv_len); bvec_iter_advance_single(from->bvec, &bi, v.bv_len); } ret = 0; out: + shinfo->nr_frags = frag; skb->data_len += copied; skb->len += copied; skb->truesize += truesize; From patchwork Tue Dec 21 15:35:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12689973 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8E23EC4332F for ; Tue, 21 Dec 2021 15:36:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239528AbhLUPgL (ORCPT ); Tue, 21 Dec 2021 10:36:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33672 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239432AbhLUPgA (ORCPT ); Tue, 21 Dec 2021 10:36:00 -0500 Received: from mail-wm1-x334.google.com (mail-wm1-x334.google.com [IPv6:2a00:1450:4864:20::334]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D6ECFC061746; Tue, 21 Dec 2021 07:35:59 -0800 (PST) Received: by mail-wm1-x334.google.com with SMTP id o19-20020a1c7513000000b0033a93202467so2344933wmc.2; Tue, 21 Dec 2021 07:35:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=RTN4JC/Cs0aKfPl2h/v0kiFtkMj5n0j4ZWeMe20qCjU=; b=gYWRgOBHLYoMRt+5a3W6g1jnt2VexcpDkrtmU24DVf2hXA1ufGcFJ3fKO28Gn3/SgA 2p3QevyFs7R7B42DBg7DCqrKYq5tN+nU6qUVvTTcZ/8+QUwsl29KjcGS2z203zfGcdd+ e13kbFPgwCDhvrSBqOCJ8/uQHWHOUbwDJ/YgfnuMta4XtFkveQuGrwTOTLWts5omgy/W bbJLfehBXSris8OyXEyvnahVIBxS4Ugtk+suq5dJ7lcwsK55NT325fhmGSwVexjV5VaK ox/lTFN59bAuRJDSWVdUpWrHMsrTW+6is6lWiVWjUrEGRkIcvfDbpqJTWnLMjTeI2iqD H4ng== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=RTN4JC/Cs0aKfPl2h/v0kiFtkMj5n0j4ZWeMe20qCjU=; b=G4pGmIKqoepZmdBMPR6SCOhrEyQ7Y+nWnfCm/+vfud0NfnCXpRyltRLeh6/Qw/Dq9q AWzshc4e2rlcPaV5MwG914SNHMHDgBSrR42FNgr1Cu4sAA7/wAAkL9xVetp4NdQcGynh cQ+o4QJCDhnuzMRr51Ng/cgZH70oSm4OsW5cogqYPpsJ5ssJ/o/qb9pOw6ZSFfeB/rat 
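The split leaves two fillers; a short sketch of when each applies:

	/* pages of unknown origin: keep propagating pfmemalloc */
	__skb_fill_page_desc(skb, i, page, off, size);

	/* pre-registered user pages (bvec zc path): skip the accounting */
	__skb_fill_page_desc_noacc(skb_shinfo(skb), i, page, off, size);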
From patchwork Tue Dec 21 15:35:28 2021
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12689973
X-Patchwork-Delegate: kuba@kernel.org
X-Patchwork-State: RFC
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn,
 Eric Dumazet, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC v2 06/19] ipv4/udp: add support for msghdr::msg_ubuf
Date: Tue, 21 Dec 2021 15:35:28 +0000
Message-Id: <92234ed7fe28f63c475b22c25cdc271adadd640d.1640029579.git.asml.silence@gmail.com>

Make ipv4/udp use the ubuf_info passed in struct msghdr, if one was
specified.

Signed-off-by: Pavel Begunkov
---
 net/ipv4/ip_output.c | 50 ++++++++++++++++++++++++++++++++------------
 1 file changed, 37 insertions(+), 13 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 9bca57ef8b83..f820288092ab 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -953,7 +953,6 @@ static int __ip_append_data(struct sock *sk,
	struct inet_sock *inet = inet_sk(sk);
	struct ubuf_info *uarg = NULL;
	struct sk_buff *skb;
-
	struct ip_options *opt = cork->opt;
	int hh_len;
	int exthdrlen;
@@ -967,6 +966,7 @@ static int __ip_append_data(struct sock *sk,
	unsigned int wmem_alloc_delta = 0;
	bool paged, extra_uref = false;
	u32 tskey = 0;
+	bool zc = false;
 
	skb = skb_peek_tail(queue);
@@ -1001,17 +1001,37 @@ static int __ip_append_data(struct sock *sk,
	    (!exthdrlen || (rt->dst.dev->features & NETIF_F_HW_ESP_TX_CSUM)))
		csummode = CHECKSUM_PARTIAL;
 
-	if (flags & MSG_ZEROCOPY && length && sock_flag(sk, SOCK_ZEROCOPY)) {
-		uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
-		if (!uarg)
-			return -ENOBUFS;
-		extra_uref = !skb_zcopy(skb);	/* only ref on new uarg */
-		if (rt->dst.dev->features & NETIF_F_SG &&
-		    csummode == CHECKSUM_PARTIAL) {
-			paged = true;
-		} else {
-			uarg->zerocopy = 0;
-			skb_zcopy_set(skb, uarg, &extra_uref);
+	if ((flags & MSG_ZEROCOPY) && length) {
+		struct msghdr *msg = from;
+
+		if (getfrag == ip_generic_getfrag && msg->msg_ubuf) {
+			uarg = msg->msg_ubuf;
+			if (skb_zcopy(skb) && uarg != skb_zcopy(skb))
+				return -EINVAL;
+
+			if (rt->dst.dev->features & NETIF_F_SG &&
+			    csummode == CHECKSUM_PARTIAL) {
+				paged = true;
+				zc = true;
+			} else {
+				/* Drop uarg if can't zerocopy, callers should
+				 * be able to handle it.
+				 */
+				uarg = NULL;
+			}
+		} else if (sock_flag(sk, SOCK_ZEROCOPY)) {
+			uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
+			if (!uarg)
+				return -ENOBUFS;
+			extra_uref = !skb_zcopy(skb);	/* only ref on new uarg */
+			if (rt->dst.dev->features & NETIF_F_SG &&
+			    csummode == CHECKSUM_PARTIAL) {
+				paged = true;
+				zc = true;
+			} else {
+				uarg->zerocopy = 0;
+				skb_zcopy_set(skb, uarg, &extra_uref);
+			}
		}
	}
@@ -1172,9 +1192,13 @@ static int __ip_append_data(struct sock *sk,
				err = -EFAULT;
				goto error;
			}
-		} else if (!uarg || !uarg->zerocopy) {
+		} else if (!zc) {
			int i = skb_shinfo(skb)->nr_frags;
 
+			if (skb_shinfo(skb)->flags & SKBFL_MANAGED_FRAGS) {
+				err = -EFAULT;
+				goto error;
+			}
			err = -ENOMEM;
			if (!sk_page_frag_refill(sk, pfrag))
				goto error;
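Both the msg_ubuf branch and the classic SOCK_ZEROCOPY branch gate true
zerocopy on the same device capability; factored out as a sketch for
clarity (not a helper introduced by the patch):

static bool dev_can_zc(const struct rtable *rt, int csummode)
{
	/* zerocopy frags need scatter-gather and a checksum that is
	 * finalised later (by the device or the stack)
	 */
	return (rt->dst.dev->features & NETIF_F_SG) &&
	       csummode == CHECKSUM_PARTIAL;
}

When the test fails on the msg_ubuf path, uarg is dropped and the send
silently degrades to copying; callers must be able to handle that.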
From patchwork Tue Dec 21 15:35:29 2021
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12689975
X-Patchwork-Delegate: kuba@kernel.org
X-Patchwork-State: RFC
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn,
 Eric Dumazet, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC v2 07/19] ipv6/udp: add support for msghdr::msg_ubuf
Date: Tue, 21 Dec 2021 15:35:29 +0000
Message-Id: <70428063e99a4418d2e519a496ebd1096d45ac59.1640029579.git.asml.silence@gmail.com>

Make ipv6/udp use the ubuf_info passed in struct msghdr, if one was
specified.

Signed-off-by: Pavel Begunkov
---
 net/ipv6/ip6_output.c | 49 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 37 insertions(+), 12 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 2f044a49afa8..822e3894dd3b 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1452,6 +1452,7 @@ static int __ip6_append_data(struct sock *sk,
	unsigned int maxnonfragsize, headersize;
	unsigned int wmem_alloc_delta = 0;
	bool paged, extra_uref = false;
+	bool zc = false;
 
	skb = skb_peek_tail(queue);
	if (!skb) {
@@ -1516,17 +1517,37 @@ static int __ip6_append_data(struct sock *sk,
	    rt->dst.dev->features & (NETIF_F_IPV6_CSUM | NETIF_F_HW_CSUM))
		csummode = CHECKSUM_PARTIAL;
 
-	if (flags & MSG_ZEROCOPY && length && sock_flag(sk, SOCK_ZEROCOPY)) {
-		uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
-		if (!uarg)
-			return -ENOBUFS;
-		extra_uref = !skb_zcopy(skb);	/* only ref on new uarg */
-		if (rt->dst.dev->features & NETIF_F_SG &&
-		    csummode == CHECKSUM_PARTIAL) {
-			paged = true;
-		} else {
-			uarg->zerocopy = 0;
-			skb_zcopy_set(skb, uarg, &extra_uref);
+	if ((flags & MSG_ZEROCOPY) && length) {
+		struct msghdr *msg = from;
+
+		if (getfrag == ip_generic_getfrag && msg->msg_ubuf) {
+			uarg = msg->msg_ubuf;
+			if (skb_zcopy(skb) && uarg != skb_zcopy(skb))
+				return -EINVAL;
+
+			if (rt->dst.dev->features & NETIF_F_SG &&
+			    csummode == CHECKSUM_PARTIAL) {
+				paged = true;
+				zc = true;
+			} else {
+				/* Drop uarg if can't zerocopy, callers should
+				 * be able to handle it.
+				 */
+				uarg = NULL;
+			}
+		} else if (sock_flag(sk, SOCK_ZEROCOPY)) {
+			uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
+			if (!uarg)
+				return -ENOBUFS;
+			extra_uref = !skb_zcopy(skb);	/* only ref on new uarg */
+			if (rt->dst.dev->features & NETIF_F_SG &&
+			    csummode == CHECKSUM_PARTIAL) {
+				paged = true;
+				zc = true;
+			} else {
+				uarg->zerocopy = 0;
+				skb_zcopy_set(skb, uarg, &extra_uref);
+			}
		}
	}
@@ -1717,9 +1738,13 @@ static int __ip6_append_data(struct sock *sk,
				err = -EFAULT;
				goto error;
			}
-		} else if (!uarg || !uarg->zerocopy) {
+		} else if (!zc) {
			int i = skb_shinfo(skb)->nr_frags;
 
+			if (skb_shinfo(skb)->flags & SKBFL_MANAGED_FRAGS) {
+				err = -EFAULT;
+				goto error;
+			}
			err = -ENOMEM;
			if (!sk_page_frag_refill(sk, pfrag))
				goto error;
From patchwork Tue Dec 21 15:35:30 2021
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12689977
X-Patchwork-Delegate: kuba@kernel.org
X-Patchwork-State: RFC
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn,
 Eric Dumazet, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC v2 08/19] ipv4: avoid partial copy for zc
Date: Tue, 21 Dec 2021 15:35:30 +0000
Message-Id: <4c2bf8d68ffa06b212c9a4a4a095787fbdf05eb7.1640029579.git.asml.silence@gmail.com>

Even when zerocopy transmission is requested and possible,
__ip_append_data() will still copy a small chunk of data just because
it allocated some extra linear space (e.g. 148 bytes). It wastes CPU
cycles on the copy and iter manipulations, and also misaligns
potentially aligned data. Avoid such copies. As a bonus, we can also
allocate a smaller skb.

Signed-off-by: Pavel Begunkov
---
 net/ipv4/ip_output.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index f820288092ab..5ec9e540a660 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1095,9 +1095,12 @@ static int __ip_append_data(struct sock *sk,
			    (fraglen + alloc_extra < SKB_MAX_ALLOC ||
			     !(rt->dst.dev->features & NETIF_F_SG)))
				alloclen = fraglen;
-			else {
+			else if (!zc) {
				alloclen = min_t(int, fraglen, MAX_HEADER);
				pagedlen = fraglen - alloclen;
+			} else {
+				alloclen = fragheaderlen + transhdrlen;
+				pagedlen = datalen - transhdrlen;
			}
 
			alloclen += alloc_extra;
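A worked example of the new branch (illustrative numbers: a 1000-byte
zerocopy UDP payload within a single IPv4 fragment, and MAX_HEADER assumed
to be 128 on this config): fragheaderlen = 20 (IP header) and
transhdrlen = 8 (UDP header), so datalen = 1008 and fraglen = 1028. The
old path set alloclen = min(1028, 128) = 128, copying 128 - 28 = 100
payload bytes into the linear part; with zc set, alloclen = 20 + 8 = 28
and pagedlen = 1008 - 8 = 1000, so only headers stay linear and no payload
bytes are copied.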
From patchwork Tue Dec 21 15:35:31 2021
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12689979
X-Patchwork-Delegate: kuba@kernel.org
X-Patchwork-State: RFC
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn,
 Eric Dumazet, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC v2 09/19] ipv6: avoid partial copy for zc
Date: Tue, 21 Dec 2021 15:35:31 +0000

Even when zerocopy transmission is requested and possible,
__ip6_append_data() will still copy a small chunk of data just because
it allocated some extra linear space (e.g. 128 bytes). It wastes CPU
cycles on the copy and iter manipulations, and also misaligns
potentially aligned data. Avoid such copies. As a bonus, we can also
allocate a smaller skb.
Signed-off-by: Pavel Begunkov
---
 net/ipv6/ip6_output.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 822e3894dd3b..3ca07d2ea9ca 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1626,9 +1626,12 @@ static int __ip6_append_data(struct sock *sk,
			    (fraglen + alloc_extra < SKB_MAX_ALLOC ||
			     !(rt->dst.dev->features & NETIF_F_SG)))
				alloclen = fraglen;
-			else {
+			else if (!zc) {
				alloclen = min_t(int, fraglen, MAX_HEADER);
				pagedlen = fraglen - alloclen;
+			} else {
+				alloclen = fragheaderlen + transhdrlen;
+				pagedlen = datalen - transhdrlen;
			}
 
			alloclen += alloc_extra;
07:36:03 -0800 (PST) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Jakub Kicinski , Jonathan Lemon , "David S . Miller" , Willem de Bruijn , Eric Dumazet , David Ahern , Jens Axboe , Pavel Begunkov Subject: [RFC v2 10/19] io_uring: add send notifiers registration Date: Tue, 21 Dec 2021 15:35:32 +0000 Message-Id: <4cfeb88dc07fcdff2c0c864c031bb32b61439674.1640029579.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-State: RFC Add IORING_REGISTER_TX_CTX and IORING_UNREGISTER_TX_CTX. Transmission (i.e. send) context will serve be used to notify the userspace when fixed buffers used for zerocopy sends are released by the kernel. Notification of a single tx context lives in generations, where each generation posts one CQE with ->user_data equal to the specified tag and ->res is a generation number starting from 0. All requests issued against a ctx will get attached to the current generation of notifications. Then, the userspace will be able to request to flush the notification allowing it to post a CQE when all buffers of all requests attached to it are released by the kernel. It'll also switch the generation to a new one with a sequence number incremented by one. Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 72 +++++++++++++++++++++++++++++++++++ include/uapi/linux/io_uring.h | 7 ++++ 2 files changed, 79 insertions(+) diff --git a/fs/io_uring.c b/fs/io_uring.c index 59380e3454ad..a01f91e70fa5 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -94,6 +94,8 @@ #define IORING_MAX_CQ_ENTRIES (2 * IORING_MAX_ENTRIES) #define IORING_SQPOLL_CAP_ENTRIES_VALUE 8 +#define IORING_MAX_TX_NOTIFIERS (1U << 10) + /* only define max */ #define IORING_MAX_FIXED_FILES (1U << 15) #define IORING_MAX_RESTRICTIONS (IORING_RESTRICTION_LAST + \ @@ -326,6 +328,15 @@ struct io_submit_state { struct blk_plug plug; }; +struct io_tx_notifier { +}; + +struct io_tx_ctx { + struct io_tx_notifier *notifier; + u64 tag; + u32 seq; +}; + struct io_ring_ctx { /* const or read-mostly hot data */ struct { @@ -373,6 +384,8 @@ struct io_ring_ctx { unsigned nr_user_files; unsigned nr_user_bufs; struct io_mapped_ubuf **user_bufs; + struct io_tx_ctx *tx_ctxs; + unsigned nr_tx_ctxs; struct io_submit_state submit_state; struct list_head timeout_list; @@ -9199,6 +9212,55 @@ static int io_buffer_validate(struct iovec *iov) return 0; } +static int io_sqe_tx_ctx_unregister(struct io_ring_ctx *ctx) +{ + if (!ctx->nr_tx_ctxs) + return -ENXIO; + + kvfree(ctx->tx_ctxs); + ctx->tx_ctxs = NULL; + ctx->nr_tx_ctxs = 0; + return 0; +} + +static int io_sqe_tx_ctx_register(struct io_ring_ctx *ctx, + void __user *arg, unsigned int nr_args) +{ + struct io_uring_tx_ctx_register __user *tx_args = arg; + struct io_uring_tx_ctx_register tx_arg; + unsigned i; + int ret; + + if (ctx->nr_tx_ctxs) + return -EBUSY; + if (!nr_args) + return -EINVAL; + if (nr_args > IORING_MAX_TX_NOTIFIERS) + return -EMFILE; + + ctx->tx_ctxs = kvcalloc(nr_args, sizeof(ctx->tx_ctxs[0]), + GFP_KERNEL_ACCOUNT); + if (!ctx->tx_ctxs) + return -ENOMEM; + + for (i = 0; i < nr_args; i++, ctx->nr_tx_ctxs++) { + struct io_tx_ctx *tx_ctx = &ctx->tx_ctxs[i]; + + if (copy_from_user(&tx_arg, &tx_args[i], sizeof(tx_arg))) { + ret = -EFAULT; + goto out_fput; + } + tx_ctx->tag = tx_arg.tag; + } + return 0; + +out_fput: + kvfree(ctx->tx_ctxs); + ctx->tx_ctxs = NULL; + ctx->nr_tx_ctxs = 0; + return ret; +} + 
Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c                 | 72 +++++++++++++++++++++++++++++++++++
 include/uapi/linux/io_uring.h |  7 ++++
 2 files changed, 79 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 59380e3454ad..a01f91e70fa5 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -94,6 +94,8 @@
 #define IORING_MAX_CQ_ENTRIES  (2 * IORING_MAX_ENTRIES)
 #define IORING_SQPOLL_CAP_ENTRIES_VALUE 8

+#define IORING_MAX_TX_NOTIFIERS (1U << 10)
+
 /* only define max */
 #define IORING_MAX_FIXED_FILES (1U << 15)
 #define IORING_MAX_RESTRICTIONS (IORING_RESTRICTION_LAST + \
@@ -326,6 +328,15 @@ struct io_submit_state {
        struct blk_plug         plug;
 };

+struct io_tx_notifier {
+};
+
+struct io_tx_ctx {
+       struct io_tx_notifier   *notifier;
+       u64                     tag;
+       u32                     seq;
+};
+
 struct io_ring_ctx {
        /* const or read-mostly hot data */
        struct {
@@ -373,6 +384,8 @@ struct io_ring_ctx {
                unsigned                nr_user_files;
                unsigned                nr_user_bufs;
                struct io_mapped_ubuf   **user_bufs;
+               struct io_tx_ctx        *tx_ctxs;
+               unsigned                nr_tx_ctxs;

                struct io_submit_state  submit_state;
                struct list_head        timeout_list;
@@ -9199,6 +9212,55 @@ static int io_buffer_validate(struct iovec *iov)
        return 0;
 }

+static int io_sqe_tx_ctx_unregister(struct io_ring_ctx *ctx)
+{
+       if (!ctx->nr_tx_ctxs)
+               return -ENXIO;
+
+       kvfree(ctx->tx_ctxs);
+       ctx->tx_ctxs = NULL;
+       ctx->nr_tx_ctxs = 0;
+       return 0;
+}
+
+static int io_sqe_tx_ctx_register(struct io_ring_ctx *ctx,
+                                 void __user *arg, unsigned int nr_args)
+{
+       struct io_uring_tx_ctx_register __user *tx_args = arg;
+       struct io_uring_tx_ctx_register tx_arg;
+       unsigned i;
+       int ret;
+
+       if (ctx->nr_tx_ctxs)
+               return -EBUSY;
+       if (!nr_args)
+               return -EINVAL;
+       if (nr_args > IORING_MAX_TX_NOTIFIERS)
+               return -EMFILE;
+
+       ctx->tx_ctxs = kvcalloc(nr_args, sizeof(ctx->tx_ctxs[0]),
+                               GFP_KERNEL_ACCOUNT);
+       if (!ctx->tx_ctxs)
+               return -ENOMEM;
+
+       for (i = 0; i < nr_args; i++, ctx->nr_tx_ctxs++) {
+               struct io_tx_ctx *tx_ctx = &ctx->tx_ctxs[i];
+
+               if (copy_from_user(&tx_arg, &tx_args[i], sizeof(tx_arg))) {
+                       ret = -EFAULT;
+                       goto out_fput;
+               }
+               tx_ctx->tag = tx_arg.tag;
+       }
+       return 0;
+
+out_fput:
+       kvfree(ctx->tx_ctxs);
+       ctx->tx_ctxs = NULL;
+       ctx->nr_tx_ctxs = 0;
+       return ret;
+}
+
 static int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
                                   unsigned int nr_args, u64 __user *tags)
 {
@@ -9429,6 +9491,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 #endif
        WARN_ON_ONCE(!list_empty(&ctx->ltimeout_list));

+       io_sqe_tx_ctx_unregister(ctx);
        io_mem_free(ctx->rings);
        io_mem_free(ctx->sq_sqes);

@@ -11104,6 +11167,15 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
                        break;
                ret = io_register_iowq_max_workers(ctx, arg);
                break;
+       case IORING_REGISTER_TX_CTX:
+               ret = io_sqe_tx_ctx_register(ctx, arg, nr_args);
+               break;
+       case IORING_UNREGISTER_TX_CTX:
+               ret = -EINVAL;
+               if (arg || nr_args)
+                       break;
+               ret = io_sqe_tx_ctx_unregister(ctx);
+               break;
        default:
                ret = -EINVAL;
                break;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 787f491f0d2a..f2e8d18e40e0 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -325,6 +325,9 @@ enum {
        /* set/get max number of io-wq workers */
        IORING_REGISTER_IOWQ_MAX_WORKERS        = 19,

+       IORING_REGISTER_TX_CTX                  = 20,
+       IORING_UNREGISTER_TX_CTX                = 21,
+
        /* this goes last */
        IORING_REGISTER_LAST
 };
@@ -365,6 +368,10 @@ struct io_uring_rsrc_update2 {
        __u32 resv2;
 };

+struct io_uring_tx_ctx_register {
+       __u64 tag;
+};
+
 /* Skip updating fd indexes set to this value in the fd table */
 #define IORING_REGISTER_FILES_SKIP     (-2)

From patchwork Tue Dec 21 15:35:33 2021
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn, Eric Dumazet, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC v2 11/19] io_uring: infrastructure for send zc notifications
Date: Tue, 21 Dec 2021 15:35:33 +0000
Message-Id: <8fb455d8df2e8635a4424e7fdc34c7e06c7e1138.1640029579.git.asml.silence@gmail.com>

Add a new ubuf_info callback io_uring_tx_zerocopy_callback(), which
posts a CQE when it completes. Also, implement some infrastructure for
allocating and managing struct ubuf_info.
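For illustration, a hedged sketch of how userspace might consume the
CQEs this callback posts, following the tag/generation layout described
in patch 10/19. io_uring_peek_cqe() and io_uring_cqe_seen() are real
liburing helpers; the tag value and ring setup are assumptions.

    #include <liburing.h>
    #include <stdio.h>

    static void drain_notifications(struct io_uring *ring)
    {
            struct io_uring_cqe *cqe;

            while (io_uring_peek_cqe(ring, &cqe) == 0) {
                    if (cqe->user_data == 0xcafe)   /* tag registered earlier */
                            printf("generation %d buffers are reusable\n",
                                   cqe->res);       /* ->res carries the seq */
                    io_uring_cqe_seen(ring, cqe);
            }
    }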
Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 108 insertions(+), 6 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index a01f91e70fa5..92190679f3f6 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -329,6 +329,11 @@ struct io_submit_state {
 };

 struct io_tx_notifier {
+       struct ubuf_info        uarg;
+       struct work_struct      commit_work;
+       struct percpu_ref       *fixed_rsrc_refs;
+       u64                     tag;
+       u32                     seq;
 };

 struct io_tx_ctx {
@@ -1275,15 +1280,20 @@ static void io_rsrc_refs_refill(struct io_ring_ctx *ctx)
        percpu_ref_get_many(&ctx->rsrc_node->refs, IO_RSRC_REF_BATCH);
 }

+static inline void io_set_rsrc_node(struct percpu_ref **rsrc_refs,
+                                   struct io_ring_ctx *ctx)
+{
+       *rsrc_refs = &ctx->rsrc_node->refs;
+       ctx->rsrc_cached_refs--;
+       if (unlikely(ctx->rsrc_cached_refs < 0))
+               io_rsrc_refs_refill(ctx);
+}
+
 static inline void io_req_set_rsrc_node(struct io_kiocb *req,
                                        struct io_ring_ctx *ctx)
 {
-       if (!req->fixed_rsrc_refs) {
-               req->fixed_rsrc_refs = &ctx->rsrc_node->refs;
-               ctx->rsrc_cached_refs--;
-               if (unlikely(ctx->rsrc_cached_refs < 0))
-                       io_rsrc_refs_refill(ctx);
-       }
+       if (!req->fixed_rsrc_refs)
+               io_set_rsrc_node(&req->fixed_rsrc_refs, ctx);
 }

 static void io_refs_resurrect(struct percpu_ref *ref, struct completion *compl)
@@ -1930,6 +1940,76 @@ static noinline bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data,
        return __io_fill_cqe(ctx, user_data, res, cflags);
 }

+static void io_zc_tx_work_callback(struct work_struct *work)
+{
+       struct io_tx_notifier *notifier = container_of(work, struct io_tx_notifier,
+                                                      commit_work);
+       struct io_ring_ctx *ctx = notifier->uarg.ctx;
+
+       spin_lock(&ctx->completion_lock);
+       io_fill_cqe_aux(ctx, notifier->tag, notifier->seq, 0);
+       io_commit_cqring(ctx);
+       spin_unlock(&ctx->completion_lock);
+       io_cqring_ev_posted(ctx);
+
+       percpu_ref_put(notifier->fixed_rsrc_refs);
+       percpu_ref_put(&ctx->refs);
+       kfree(notifier);
+}
+
+static void io_uring_tx_zerocopy_callback(struct sk_buff *skb,
+                                         struct ubuf_info *uarg,
+                                         bool success)
+{
+       struct io_tx_notifier *notifier = container_of(uarg,
+                                       struct io_tx_notifier, uarg);
+
+       if (!refcount_dec_and_test(&uarg->refcnt))
+               return;
+
+       if (in_interrupt()) {
+               INIT_WORK(&notifier->commit_work, io_zc_tx_work_callback);
+               queue_work(system_unbound_wq, &notifier->commit_work);
+       } else {
+               io_zc_tx_work_callback(&notifier->commit_work);
+       }
+}
+
+static struct io_tx_notifier *io_alloc_tx_notifier(struct io_ring_ctx *ctx,
+                                                  struct io_tx_ctx *tx_ctx)
+{
+       struct io_tx_notifier *notifier;
+       struct ubuf_info *uarg;
+
+       notifier = kmalloc(sizeof(*notifier), GFP_ATOMIC);
+       if (!notifier)
+               return NULL;
+
+       WARN_ON_ONCE(!current->io_uring);
+       notifier->seq = tx_ctx->seq++;
+       notifier->tag = tx_ctx->tag;
+       io_set_rsrc_node(&notifier->fixed_rsrc_refs, ctx);
+
+       uarg = &notifier->uarg;
+       uarg->ctx = ctx;
+       uarg->flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
+       uarg->callback = io_uring_tx_zerocopy_callback;
+       refcount_set(&uarg->refcnt, 1);
+       percpu_ref_get(&ctx->refs);
+       return notifier;
+}
+
+__attribute__((unused))
+static inline struct io_tx_notifier *io_get_tx_notifier(struct io_ring_ctx *ctx,
+                                                       struct io_tx_ctx *tx_ctx)
+{
+       if (tx_ctx->notifier)
+               return tx_ctx->notifier;
+
+       tx_ctx->notifier = io_alloc_tx_notifier(ctx, tx_ctx);
+       return tx_ctx->notifier;
+}
+
 static void io_req_complete_post(struct io_kiocb *req, s32 res,
                                 u32 cflags)
 {
@@ -9212,11 +9292,27 @@ static int io_buffer_validate(struct iovec *iov)
        return 0;
 }

+static void io_sqe_tx_ctx_kill_ubufs(struct io_ring_ctx *ctx)
+{
+       struct io_tx_ctx *tx_ctx;
+       int i;
+
+       for (i = 0; i < ctx->nr_tx_ctxs; i++) {
+               tx_ctx = &ctx->tx_ctxs[i];
+               if (!tx_ctx->notifier)
+                       continue;
+               io_uring_tx_zerocopy_callback(NULL, &tx_ctx->notifier->uarg,
+                                             true);
+               tx_ctx->notifier = NULL;
+       }
+}
+
 static int io_sqe_tx_ctx_unregister(struct io_ring_ctx *ctx)
 {
        if (!ctx->nr_tx_ctxs)
                return -ENXIO;

+       io_sqe_tx_ctx_kill_ubufs(ctx);
        kvfree(ctx->tx_ctxs);
        ctx->tx_ctxs = NULL;
        ctx->nr_tx_ctxs = 0;
@@ -9608,6 +9704,12 @@ static __cold void io_ring_exit_work(struct work_struct *work)
                io_sq_thread_unpark(sqd);
        }

+       if (READ_ONCE(ctx->nr_tx_ctxs)) {
+               mutex_lock(&ctx->uring_lock);
+               io_sqe_tx_ctx_kill_ubufs(ctx);
+               mutex_unlock(&ctx->uring_lock);
+       }
+
        io_req_caches_free(ctx);

        if (WARN_ON_ONCE(time_after(jiffies, timeout))) {

From patchwork Tue Dec 21 15:35:34 2021
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn, Eric Dumazet, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC v2 12/19] io_uring: wire send zc request type
Date: Tue, 21 Dec 2021 15:35:34 +0000
Message-Id: <41c71e1dc27c3ba17acb5a8f43e1e140fca71f19.1640029579.git.asml.silence@gmail.com>

Add a new io_uring opcode IORING_OP_SENDZC. The main distinction from
other send requests is that the user specifies a tx context index,
which is used to notify userspace when the kernel no longer needs the
buffers and it is safe to reuse them. So, overwriting a data buffer
before the separate notification arrives is racy even when the request
itself has already completed.
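For illustration, a sketch of filling such an SQE by hand, matching the
fields io_sendzc_prep() reads below. It assumes headers patched with
this series' uapi changes (IORING_OP_SENDZC, the tx_ctx_idx field), a
connected socket (so no addr2/addr_len), and a tx context registered at
index 0; prep_sendzc() is a hypothetical helper name.

    #include <liburing.h>
    #include <string.h>

    static void prep_sendzc(struct io_uring *ring, int sockfd,
                            const void *buf, unsigned len)
    {
            struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

            memset(sqe, 0, sizeof(*sqe));
            sqe->opcode = IORING_OP_SENDZC;       /* added by this patch */
            sqe->fd = sockfd;
            sqe->addr = (unsigned long)buf;       /* io_sendzc_prep(): sqe->addr */
            sqe->len = len;                       /* io_sendzc_prep(): sqe->len */
            sqe->tx_ctx_idx = 0;                  /* new uapi field in this patch */
            sqe->user_data = 1;
    }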
Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c                 | 120 +++++++++++++++++++++++++++++++++-
 include/uapi/linux/io_uring.h |   2 +
 2 files changed, 121 insertions(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 92190679f3f6..9452b4ec32b6 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -600,6 +600,16 @@ struct io_sr_msg {
        size_t                          len;
 };

+struct io_sendzc {
+       struct file                     *file;
+       void __user                     *buf;
+       size_t                          len;
+       struct io_tx_ctx                *tx_ctx;
+       int                             msg_flags;
+       int                             addr_len;
+       void __user                     *addr;
+};
+
 struct io_open {
        struct file                     *file;
        int                             dfd;
@@ -874,6 +884,7 @@ struct io_kiocb {
                struct io_mkdir         mkdir;
                struct io_symlink       symlink;
                struct io_hardlink      hardlink;
+               struct io_sendzc        msgzc;
        };

        u8                              opcode;
@@ -1123,6 +1134,12 @@ static const struct io_op_def io_op_defs[] = {
        [IORING_OP_MKDIRAT] = {},
        [IORING_OP_SYMLINKAT] = {},
        [IORING_OP_LINKAT] = {},
+       [IORING_OP_SENDZC] = {
+               .needs_file             = 1,
+               .unbound_nonreg_file    = 1,
+               .pollout                = 1,
+               .audit_skip             = 1,
+       },
 };

 /* requests with any of those set should undergo io_disarm_next() */
@@ -1999,7 +2016,6 @@ static struct io_tx_notifier *io_alloc_tx_notifier(struct io_ring_ctx *ctx,
        return notifier;
 }

-__attribute__((unused))
 static inline struct io_tx_notifier *io_get_tx_notifier(struct io_ring_ctx *ctx,
                                                        struct io_tx_ctx *tx_ctx)
 {
@@ -5025,6 +5041,102 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
        return 0;
 }

+static int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+       struct io_ring_ctx *ctx = req->ctx;
+       struct io_sendzc *sr = &req->msgzc;
+       unsigned int idx;
+
+       if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+               return -EINVAL;
+       if (READ_ONCE(sqe->ioprio))
+               return -EINVAL;
+
+       sr->buf = u64_to_user_ptr(READ_ONCE(sqe->addr));
+       sr->len = READ_ONCE(sqe->len);
+       sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
+       if (sr->msg_flags & MSG_DONTWAIT)
+               req->flags |= REQ_F_NOWAIT;
+
+       idx = READ_ONCE(sqe->tx_ctx_idx);
+       if (idx > ctx->nr_tx_ctxs)
+               return -EINVAL;
+       idx = array_index_nospec(idx, ctx->nr_tx_ctxs);
+       req->msgzc.tx_ctx = &ctx->tx_ctxs[idx];
+
+       sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+       sr->addr_len = READ_ONCE(sqe->__pad2[0]);
+
+#ifdef CONFIG_COMPAT
+       if (req->ctx->compat)
+               sr->msg_flags |= MSG_CMSG_COMPAT;
+#endif
+       return 0;
+}
+
+static int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
+{
+       struct sockaddr_storage address;
+       struct io_ring_ctx *ctx = req->ctx;
+       struct io_tx_notifier *notifier;
+       struct io_sendzc *sr = &req->msgzc;
+       struct msghdr msg;
+       struct iovec iov;
+       struct socket *sock;
+       unsigned flags;
+       int ret, min_ret = 0;
+
+       sock = sock_from_file(req->file);
+       if (unlikely(!sock))
+               return -ENOTSOCK;
+       ret = import_single_range(WRITE, sr->buf, sr->len, &iov, &msg.msg_iter);
+       if (unlikely(ret))
+               return ret;
+
+       msg.msg_name = NULL;
+       msg.msg_control = NULL;
+       msg.msg_controllen = 0;
+       msg.msg_namelen = 0;
+       if (sr->addr) {
+               ret = move_addr_to_kernel(sr->addr, sr->addr_len, &address);
+               if (ret < 0)
+                       return ret;
+               msg.msg_name = (struct sockaddr *)&address;
+               msg.msg_namelen = sr->addr_len;
+       }
+
+       io_ring_submit_lock(ctx, issue_flags & IO_URING_F_UNLOCKED);
+       notifier = io_get_tx_notifier(ctx, req->msgzc.tx_ctx);
+       if (!notifier) {
+               req_set_fail(req);
+               ret = -ENOMEM;
+               goto out;
+       }
+       msg.msg_ubuf = &notifier->uarg;
+
+       flags = sr->msg_flags | MSG_ZEROCOPY;
+       if (issue_flags & IO_URING_F_NONBLOCK)
+               flags |= MSG_DONTWAIT;
+       if (flags & MSG_WAITALL)
+               min_ret = iov_iter_count(&msg.msg_iter);
+       msg.msg_flags = flags;
+       ret = sock_sendmsg(sock, &msg);
+
+       if (ret < min_ret) {
+               if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
+                       goto out;
+               if (ret == -ERESTARTSYS)
+                       ret = -EINTR;
+               req_set_fail(req);
+       }
+       io_ring_submit_unlock(ctx, issue_flags & IO_URING_F_UNLOCKED);
+       __io_req_complete(req, issue_flags, ret, 0);
+       return 0;
+out:
+       io_ring_submit_unlock(ctx, issue_flags & IO_URING_F_UNLOCKED);
+       return ret;
+}
+
 static int __io_recvmsg_copy_hdr(struct io_kiocb *req,
                                 struct io_async_msghdr *iomsg)
 {
@@ -5428,6 +5540,7 @@ IO_NETOP_PREP_ASYNC(sendmsg);
 IO_NETOP_PREP_ASYNC(recvmsg);
 IO_NETOP_PREP_ASYNC(connect);
 IO_NETOP_PREP(accept);
+IO_NETOP_PREP(sendzc);
 IO_NETOP_FN(send);
 IO_NETOP_FN(recv);
 #endif /* CONFIG_NET */
@@ -6575,6 +6688,8 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
        case IORING_OP_SENDMSG:
        case IORING_OP_SEND:
                return io_sendmsg_prep(req, sqe);
+       case IORING_OP_SENDZC:
+               return io_sendzc_prep(req, sqe);
        case IORING_OP_RECVMSG:
        case IORING_OP_RECV:
                return io_recvmsg_prep(req, sqe);
@@ -6832,6 +6947,9 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
        case IORING_OP_SEND:
                ret = io_send(req, issue_flags);
                break;
+       case IORING_OP_SENDZC:
+               ret = io_sendzc(req, issue_flags);
+               break;
        case IORING_OP_RECVMSG:
                ret = io_recvmsg(req, issue_flags);
                break;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index f2e8d18e40e0..bbc78fe8ca77 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -59,6 +59,7 @@ struct io_uring_sqe {
                union {
                        __s32   splice_fd_in;
                        __u32   file_index;
+                       __u32   tx_ctx_idx;
                };
                __u64   __pad2[2];
        };
@@ -143,6 +144,7 @@ enum {
        IORING_OP_MKDIRAT,
        IORING_OP_SYMLINKAT,
        IORING_OP_LINKAT,
+       IORING_OP_SENDZC,

        /* this goes last, obviously */
        IORING_OP_LAST,

From patchwork Tue Dec 21 15:35:35 2021
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn, Eric Dumazet, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC v2 13/19] io_uring: add an option to flush zc notifications
Date: Tue, 21 Dec 2021 15:35:35 +0000

Add an IORING_SENDZC_FLUSH flag. If specified, a successful send zc
operation also flushes the corresponding ubuf_info.
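For illustration, a tiny sketch of requesting such a flush, continuing
the hypothetical prep_sendzc() helper from the previous patch;
io_sendzc_prep() now takes the zc flags from sqe->ioprio.

    #include <liburing.h>

    /* continuing the hypothetical prep_sendzc() sketch from patch 12/19 */
    static void prep_sendzc_flush(struct io_uring_sqe *sqe)
    {
            /* zc flags travel in sqe->ioprio per io_sendzc_prep() */
            sqe->ioprio = IORING_SENDZC_FLUSH;
    }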
Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c                 | 26 +++++++++++++++++++-------
 include/uapi/linux/io_uring.h |  4 ++++
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 9452b4ec32b6..ec1f6c60a14c 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -608,6 +608,7 @@ struct io_sendzc {
        int                             msg_flags;
        int                             addr_len;
        void __user                     *addr;
+       unsigned int                    zc_flags;
 };

 struct io_open {
@@ -1992,6 +1993,12 @@ static void io_uring_tx_zerocopy_callback(struct sk_buff *skb,
        }
 }

+static void io_tx_kill_notification(struct io_tx_ctx *tx_ctx)
+{
+       io_uring_tx_zerocopy_callback(NULL, &tx_ctx->notifier->uarg, true);
+       tx_ctx->notifier = NULL;
+}
+
 static struct io_tx_notifier *io_alloc_tx_notifier(struct io_ring_ctx *ctx,
                                                   struct io_tx_ctx *tx_ctx)
 {
@@ -5041,6 +5048,8 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
        return 0;
 }

+#define IO_SENDZC_VALID_FLAGS IORING_SENDZC_FLUSH
+
 static int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
        struct io_ring_ctx *ctx = req->ctx;
@@ -5049,8 +5058,6 @@ static int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)

        if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
                return -EINVAL;
-       if (READ_ONCE(sqe->ioprio))
-               return -EINVAL;

        sr->buf = u64_to_user_ptr(READ_ONCE(sqe->addr));
        sr->len = READ_ONCE(sqe->len);
@@ -5067,6 +5074,10 @@ static int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
        sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
        sr->addr_len = READ_ONCE(sqe->__pad2[0]);

+       req->msgzc.zc_flags = READ_ONCE(sqe->ioprio);
+       if (req->msgzc.zc_flags & ~IO_SENDZC_VALID_FLAGS)
+               return -EINVAL;
+
 #ifdef CONFIG_COMPAT
        if (req->ctx->compat)
                sr->msg_flags |= MSG_CMSG_COMPAT;
@@ -5089,6 +5100,7 @@ static int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
        sock = sock_from_file(req->file);
        if (unlikely(!sock))
                return -ENOTSOCK;
+
        ret = import_single_range(WRITE, sr->buf, sr->len, &iov, &msg.msg_iter);
        if (unlikely(ret))
                return ret;
@@ -5128,6 +5140,8 @@ static int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
                if (ret == -ERESTARTSYS)
                        ret = -EINTR;
                req_set_fail(req);
+       } else if (req->msgzc.zc_flags & IORING_SENDZC_FLUSH) {
+               io_tx_kill_notification(req->msgzc.tx_ctx);
        }
        io_ring_submit_unlock(ctx, issue_flags & IO_URING_F_UNLOCKED);
        __io_req_complete(req, issue_flags, ret, 0);
@@ -9417,11 +9431,9 @@ static void io_sqe_tx_ctx_kill_ubufs(struct io_ring_ctx *ctx)

        for (i = 0; i < ctx->nr_tx_ctxs; i++) {
                tx_ctx = &ctx->tx_ctxs[i];
-               if (!tx_ctx->notifier)
-                       continue;
-               io_uring_tx_zerocopy_callback(NULL, &tx_ctx->notifier->uarg,
-                                             true);
-               tx_ctx->notifier = NULL;
+
+               if (tx_ctx->notifier)
+                       io_tx_kill_notification(tx_ctx);
        }
 }

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index bbc78fe8ca77..ac18e8e6f86f 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -187,6 +187,10 @@ enum {
 #define IORING_POLL_UPDATE_EVENTS      (1U << 1)
 #define IORING_POLL_UPDATE_USER_DATA   (1U << 2)

+enum {
+       IORING_SENDZC_FLUSH = (1U << 0),
+};
+
 /*
  * IO completion data structure (Completion Queue Entry)
  */

From patchwork Tue Dec 21 15:35:36 2021
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn, Eric Dumazet, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC v2 14/19] io_uring: opcode independent fixed buf import
Date: Tue, 21 Dec 2021 15:35:36 +0000
Message-Id: <014cf9d888bb9531742ba53ecabbf8e586ac6f0b.1640029579.git.asml.silence@gmail.com>

Extract an opcode-independent helper from io_import_fixed() for
initialising an iov_iter with a fixed buffer.

Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index ec1f6c60a14c..40a8d7799be3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3152,11 +3152,11 @@ static void kiocb_done(struct io_kiocb *req, ssize_t ret,
        }
 }

-static int __io_import_fixed(struct io_kiocb *req, int rw, struct iov_iter *iter,
-                            struct io_mapped_ubuf *imu)
+static int __io_import_fixed(int rw, struct iov_iter *iter,
+                            struct io_mapped_ubuf *imu,
+                            u64 buf_addr, size_t len)
 {
-       size_t len = req->rw.len;
-       u64 buf_end, buf_addr = req->rw.addr;
+       u64 buf_end;
        size_t offset;

        if (unlikely(check_add_overflow(buf_addr, (u64)len, &buf_end)))
@@ -3225,7 +3225,7 @@ static int io_import_fixed(struct io_kiocb *req, int rw, struct iov_iter *iter)
                imu = READ_ONCE(ctx->user_bufs[index]);
                req->imu = imu;
        }
-       return __io_import_fixed(req, rw, iter, imu);
+       return __io_import_fixed(rw, iter, imu, req->rw.addr, req->rw.len);
 }

 static void io_ring_submit_unlock(struct io_ring_ctx *ctx, bool needs_lock)

From patchwork Tue Dec 21 15:35:37 2021
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn, Eric Dumazet, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC v2 15/19] io_uring: sendzc with fixed buffers
Date: Tue, 21 Dec 2021 15:35:37 +0000

Allow zerocopy sends to use fixed buffers.
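For illustration, a sketch of the fixed-buffer variant under the same
assumptions as the earlier hypothetical prep_sendzc() helper: the
buffer must have been registered beforehand (e.g. with liburing's real
io_uring_register_buffers()), and the flag and buf_index fields match
what io_sendzc_prep() reads in this patch.

    #include <liburing.h>

    static void prep_sendzc_fixed(struct io_uring_sqe *sqe,
                                  const struct iovec *reg_iov, int buf_idx)
    {
            sqe->addr = (unsigned long)reg_iov->iov_base; /* within buffer buf_idx */
            sqe->len = reg_iov->iov_len;
            sqe->ioprio = IORING_SENDZC_FIXED_BUF;  /* zc flags live in ->ioprio */
            sqe->buf_index = buf_idx;               /* checked against nr_user_bufs */
    }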
Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c                 | 19 +++++++++++++++++--
 include/uapi/linux/io_uring.h |  1 +
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 40a8d7799be3..654023ba0b91 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5048,7 +5048,7 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
        return 0;
 }

-#define IO_SENDZC_VALID_FLAGS IORING_SENDZC_FLUSH
+#define IO_SENDZC_VALID_FLAGS (IORING_SENDZC_FLUSH | IORING_SENDZC_FIXED_BUF)

 static int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
@@ -5078,6 +5078,15 @@ static int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
        if (req->msgzc.zc_flags & ~IO_SENDZC_VALID_FLAGS)
                return -EINVAL;

+       if (req->msgzc.zc_flags & IORING_SENDZC_FIXED_BUF) {
+               idx = READ_ONCE(sqe->buf_index);
+               if (unlikely(idx >= ctx->nr_user_bufs))
+                       return -EFAULT;
+               idx = array_index_nospec(idx, ctx->nr_user_bufs);
+               req->imu = READ_ONCE(ctx->user_bufs[idx]);
+               io_req_set_rsrc_node(req, ctx);
+       }
+
 #ifdef CONFIG_COMPAT
        if (req->ctx->compat)
                sr->msg_flags |= MSG_CMSG_COMPAT;
@@ -5101,7 +5110,13 @@ static int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
        if (unlikely(!sock))
                return -ENOTSOCK;

-       ret = import_single_range(WRITE, sr->buf, sr->len, &iov, &msg.msg_iter);
+       if (req->msgzc.zc_flags & IORING_SENDZC_FIXED_BUF) {
+               ret = __io_import_fixed(WRITE, &msg.msg_iter, req->imu,
+                                       (u64)sr->buf, sr->len);
+       } else {
+               ret = import_single_range(WRITE, sr->buf, sr->len, &iov,
+                                         &msg.msg_iter);
+       }
        if (unlikely(ret))
                return ret;

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index ac18e8e6f86f..740af1d0409f 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -189,6 +189,7 @@ enum {

 enum {
        IORING_SENDZC_FLUSH = (1U << 0),
+       IORING_SENDZC_FIXED_BUF = (1U << 1),
 };

 /*

From patchwork Tue Dec 21 15:35:38 2021
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn, Eric Dumazet, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC v2 16/19] io_uring: cache struct ubuf_info
Date: Tue, 21 Dec 2021 15:35:38 +0000
Message-Id: <18227f5469478564b3b5f525478acb0980996f2b.1640029579.git.asml.silence@gmail.com>

Allocating and freeing ubuf_info takes some time, so add an
optimisation caching them. The scheme is similar to how requests are
cached in io_req_complete_post(): ->ubuf_list is protected by
->uring_lock and requests grab entries directly from it, while there is
also a ->ubuf_list_locked list protected by ->completion_lock, which is
eventually batch-spliced into ->ubuf_list.
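For illustration, a standalone userspace model of this two-list caching
pattern; all names here are invented for the example, and the splice
trigger is simplified (the patch splices once ->ubuf_locked_nr exceeds
a batch size).

    #include <pthread.h>
    #include <stdlib.h>

    struct node { struct node *next; };

    static struct node *cache;          /* "uring_lock" side: submitter-only */
    static struct node *cache_locked;   /* filled from completion context */
    static int cache_locked_nr;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static struct node *cache_get(void)
    {
            if (!cache && cache_locked_nr) {        /* splice in a batch */
                    pthread_mutex_lock(&lock);
                    cache = cache_locked;
                    cache_locked = NULL;
                    cache_locked_nr = 0;
                    pthread_mutex_unlock(&lock);
            }
            if (!cache)
                    return malloc(sizeof(struct node));  /* slow path */
            struct node *n = cache;
            cache = n->next;                        /* lock-free for submitter */
            return n;
    }

    static void cache_put_locked(struct node *n)
    {
            pthread_mutex_lock(&lock);              /* completion side */
            n->next = cache_locked;
            cache_locked = n;
            cache_locked_nr++;
            pthread_mutex_unlock(&lock);
    }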
Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c | 74 ++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 64 insertions(+), 10 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 654023ba0b91..5f79178a3f38 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -334,6 +334,7 @@ struct io_tx_notifier {
        struct percpu_ref       *fixed_rsrc_refs;
        u64                     tag;
        u32                     seq;
+       struct list_head        cache_node;
 };

 struct io_tx_ctx {
@@ -393,6 +394,9 @@ struct io_ring_ctx {
                unsigned                nr_tx_ctxs;

                struct io_submit_state  submit_state;
+               struct list_head        ubuf_list;
+               struct list_head        ubuf_list_locked;
+               int                     ubuf_locked_nr;
                struct list_head        timeout_list;
                struct list_head        ltimeout_list;
                struct list_head        cq_overflow_list;
@@ -1491,6 +1495,8 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
        INIT_WQ_LIST(&ctx->locked_free_list);
        INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func);
        INIT_WQ_LIST(&ctx->submit_state.compl_reqs);
+       INIT_LIST_HEAD(&ctx->ubuf_list);
+       INIT_LIST_HEAD(&ctx->ubuf_list_locked);
        return ctx;
 err:
        kfree(ctx->dummy_ubuf);
@@ -1963,16 +1969,20 @@ static void io_zc_tx_work_callback(struct work_struct *work)
        struct io_tx_notifier *notifier = container_of(work, struct io_tx_notifier,
                                                       commit_work);
        struct io_ring_ctx *ctx = notifier->uarg.ctx;
+       struct percpu_ref *rsrc_refs = notifier->fixed_rsrc_refs;

        spin_lock(&ctx->completion_lock);
        io_fill_cqe_aux(ctx, notifier->tag, notifier->seq, 0);
+
+       list_add(&notifier->cache_node, &ctx->ubuf_list_locked);
+       ctx->ubuf_locked_nr++;
+
        io_commit_cqring(ctx);
        spin_unlock(&ctx->completion_lock);
        io_cqring_ev_posted(ctx);

-       percpu_ref_put(notifier->fixed_rsrc_refs);
+       percpu_ref_put(rsrc_refs);
        percpu_ref_put(&ctx->refs);
-       kfree(notifier);
 }

 static void io_uring_tx_zerocopy_callback(struct sk_buff *skb,
@@ -1999,26 +2009,69 @@ static void io_tx_kill_notification(struct io_tx_ctx *tx_ctx)
        tx_ctx->notifier = NULL;
 }

+static void io_notifier_splice(struct io_ring_ctx *ctx)
+{
+       spin_lock(&ctx->completion_lock);
+       list_splice_init(&ctx->ubuf_list_locked, &ctx->ubuf_list);
+       ctx->ubuf_locked_nr = 0;
+       spin_unlock(&ctx->completion_lock);
+}
+
+static void io_notifier_free_cached(struct io_ring_ctx *ctx)
+{
+       struct io_tx_notifier *notifier;
+
+       io_notifier_splice(ctx);
+
+       while (!list_empty(&ctx->ubuf_list)) {
+               notifier = list_first_entry(&ctx->ubuf_list,
+                                           struct io_tx_notifier, cache_node);
+               list_del(&notifier->cache_node);
+               kfree(notifier);
+       }
+}
+
+static inline bool io_notifier_has_cached(struct io_ring_ctx *ctx)
+{
+       if (likely(!list_empty(&ctx->ubuf_list)))
+               return true;
+       if (READ_ONCE(ctx->ubuf_locked_nr) <= IO_REQ_ALLOC_BATCH)
+               return false;
+       io_notifier_splice(ctx);
+       return !list_empty(&ctx->ubuf_list);
+}
+
 static struct io_tx_notifier *io_alloc_tx_notifier(struct io_ring_ctx *ctx,
                                                   struct io_tx_ctx *tx_ctx)
 {
        struct io_tx_notifier *notifier;
        struct ubuf_info *uarg;

-       notifier = kmalloc(sizeof(*notifier), GFP_ATOMIC);
-       if (!notifier)
-               return NULL;
+       if (likely(io_notifier_has_cached(ctx))) {
+               if (WARN_ON_ONCE(list_empty(&ctx->ubuf_list)))
+                       return NULL;
+
+               notifier = list_first_entry(&ctx->ubuf_list,
+                                           struct io_tx_notifier, cache_node);
+               list_del(&notifier->cache_node);
+       } else {
+               gfp_t gfp_flags = GFP_ATOMIC|GFP_KERNEL_ACCOUNT;
+
+               notifier = kmalloc(sizeof(*notifier), gfp_flags);
+               if (!notifier)
+                       return NULL;
+               uarg = &notifier->uarg;
+               uarg->ctx = ctx;
+               uarg->flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
+               uarg->callback = io_uring_tx_zerocopy_callback;
+       }

        WARN_ON_ONCE(!current->io_uring);
        notifier->seq = tx_ctx->seq++;
        notifier->tag = tx_ctx->tag;
        io_set_rsrc_node(&notifier->fixed_rsrc_refs, ctx);
-       uarg = &notifier->uarg;
-       uarg->ctx = ctx;
-       uarg->flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
-       uarg->callback = io_uring_tx_zerocopy_callback;
-       refcount_set(&uarg->refcnt, 1);
+       refcount_set(&notifier->uarg.refcnt, 1);
        percpu_ref_get(&ctx->refs);
        return notifier;
 }
@@ -9732,6 +9785,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 #endif
        WARN_ON_ONCE(!list_empty(&ctx->ltimeout_list));

+       io_notifier_free_cached(ctx);
        io_sqe_tx_ctx_unregister(ctx);
        io_mem_free(ctx->rings);
        io_mem_free(ctx->sq_sqes);

From patchwork Tue Dec 21 15:35:39 2021
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn, Eric Dumazet, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC v2 17/19] io_uring: unclog ctx refs waiting with zc notifiers
Date: Tue, 21 Dec 2021 15:35:39 +0000
Message-Id: <2c07d8e5cb5dfbd678d5a0bc6fb398aee82b67e4.1640029579.git.asml.silence@gmail.com>

Currently every instance of struct io_tx_notifier holds a ctx
reference, including ones sitting in caches. So, when we try to quiesce
the ring (e.g. for register) we'd be waiting for refs that nobody can
release; for cancellation this is currently only worked around. Instead
of taking ctx references, wait for all notifiers to return to the
caches when needed. An even better solution would be to wait for all
rsrc refs. It's also nice to remove an extra pair of
percpu_ref_get/put().

Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 5f79178a3f38..8cfa8ea161e4 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -453,6 +453,7 @@ struct io_ring_ctx {
        struct io_mapped_ubuf   *dummy_ubuf;
        struct io_rsrc_data     *file_data;
        struct io_rsrc_data     *buf_data;
+       int                     nr_tx_ctx;

        struct delayed_work     rsrc_put_work;
        struct llist_head       rsrc_put_llist;
@@ -1982,7 +1983,6 @@ static void io_zc_tx_work_callback(struct work_struct *work)
        io_cqring_ev_posted(ctx);

        percpu_ref_put(rsrc_refs);
-       percpu_ref_put(&ctx->refs);
 }

 static void io_uring_tx_zerocopy_callback(struct sk_buff *skb,
@@ -2028,6 +2028,7 @@ static void io_notifier_free_cached(struct io_ring_ctx *ctx)
                                            struct io_tx_notifier, cache_node);
                list_del(&notifier->cache_node);
                kfree(notifier);
+               ctx->nr_tx_ctx--;
        }
 }

@@ -2060,6 +2061,7 @@ static struct io_tx_notifier *io_alloc_tx_notifier(struct io_ring_ctx *ctx,
                notifier = kmalloc(sizeof(*notifier), gfp_flags);
                if (!notifier)
                        return NULL;
+               ctx->nr_tx_ctx++;
                uarg = &notifier->uarg;
                uarg->ctx = ctx;
                uarg->flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
@@ -2072,7 +2074,6 @@ static struct io_tx_notifier *io_alloc_tx_notifier(struct io_ring_ctx *ctx,
        io_set_rsrc_node(&notifier->fixed_rsrc_refs, ctx);

        refcount_set(&notifier->uarg.refcnt, 1);
-       percpu_ref_get(&ctx->refs);
        return notifier;
 }

@@ -9785,7 +9786,6 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 #endif
        WARN_ON_ONCE(!list_empty(&ctx->ltimeout_list));

-       io_notifier_free_cached(ctx);
        io_sqe_tx_ctx_unregister(ctx);
        io_mem_free(ctx->rings);
        io_mem_free(ctx->sq_sqes);
@@ -9946,6 +9946,19 @@ static __cold void io_ring_exit_work(struct work_struct *work)
        spin_lock(&ctx->completion_lock);
        spin_unlock(&ctx->completion_lock);

+       while (1) {
+               int nr;
+
+               mutex_lock(&ctx->uring_lock);
+               io_notifier_free_cached(ctx);
+               nr = ctx->nr_tx_ctx;
+               mutex_unlock(&ctx->uring_lock);
+
+               if (!nr)
+                       break;
+               schedule_timeout(interval);
+       }
+
        io_ring_ctx_free(ctx);
 }

From patchwork Tue Dec 21 15:35:40 2021
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn, Eric Dumazet, David Ahern, Jens Axboe, Pavel Begunkov
Subject: [RFC v2 18/19] io_uring: task_work for notification delivery
Date: Tue, 21 Dec 2021 15:35:40 +0000
Message-Id: <33b943a2409dc1c4ad845ea0bebb76ecad723ef6.1640029579.git.asml.silence@gmail.com>

Workqueues are way too heavy for tx notification delivery. We still
need some non-irq context because ->completion_lock is not irq-safe, so
use task_work instead. As expected, for test cases with real hardware
that juggle lots of notifications, performance is drastically better:
in profiles, the percentage taken by the relevant parts drops from 30%
to less than 3%.
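For illustration, a simplified model of the delivery policy using the
real kernel primitives involved (task_work_add(), init_task_work(),
in_interrupt(), queue_work()); the struct and function names here are
invented, and the actual code additionally manages notifier->task
references as the diff below shows.

    #include <linux/task_work.h>
    #include <linux/workqueue.h>
    #include <linux/preempt.h>

    struct delivery {
            struct task_struct      *task;
            struct callback_head    task_work;
            struct work_struct      work;
    };

    static void deliver_cb(struct callback_head *cb) { /* post the CQE here */ }
    static void deliver_wq(struct work_struct *w) { /* workqueue fallback */ }

    static void deliver(struct delivery *d)
    {
            if (!in_interrupt()) {          /* already in task context */
                    deliver_cb(&d->task_work);
                    return;
            }
            init_task_work(&d->task_work, deliver_cb);
            if (d->task && !task_work_add(d->task, &d->task_work, TWA_SIGNAL))
                    return;                 /* runs on the task's way to userspace */
            INIT_WORK(&d->work, deliver_wq);
            queue_work(system_unbound_wq, &d->work);
    }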
Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c | 57 ++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 43 insertions(+), 14 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8cfa8ea161e4..ee496b463462 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -330,11 +330,16 @@ struct io_submit_state {

 struct io_tx_notifier {
        struct ubuf_info        uarg;
-       struct work_struct      commit_work;
        struct percpu_ref       *fixed_rsrc_refs;
        u64                     tag;
        u32                     seq;
        struct list_head        cache_node;
+       struct task_struct      *task;
+
+       union {
+               struct callback_head    task_work;
+               struct work_struct      commit_work;
+       };
 };

 struct io_tx_ctx {
@@ -1965,19 +1970,17 @@ static noinline bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data,
        return __io_fill_cqe(ctx, user_data, res, cflags);
 }

-static void io_zc_tx_work_callback(struct work_struct *work)
+static void io_zc_tx_notifier_finish(struct callback_head *cb)
 {
-       struct io_tx_notifier *notifier = container_of(work, struct io_tx_notifier,
-                                                      commit_work);
+       struct io_tx_notifier *notifier = container_of(cb, struct io_tx_notifier,
+                                                      task_work);
        struct io_ring_ctx *ctx = notifier->uarg.ctx;
        struct percpu_ref *rsrc_refs = notifier->fixed_rsrc_refs;

        spin_lock(&ctx->completion_lock);
        io_fill_cqe_aux(ctx, notifier->tag, notifier->seq, 0);
-
        list_add(&notifier->cache_node, &ctx->ubuf_list_locked);
        ctx->ubuf_locked_nr++;
-
        io_commit_cqring(ctx);
        spin_unlock(&ctx->completion_lock);
        io_cqring_ev_posted(ctx);
@@ -1985,6 +1988,14 @@ static void io_zc_tx_work_callback(struct work_struct *work)
        percpu_ref_put(rsrc_refs);
 }

+static void io_zc_tx_work_callback(struct work_struct *work)
+{
+       struct io_tx_notifier *notifier = container_of(work, struct io_tx_notifier,
+                                                      commit_work);
+
+       io_zc_tx_notifier_finish(&notifier->task_work);
+}
+
 static void io_uring_tx_zerocopy_callback(struct sk_buff *skb,
                                          struct ubuf_info *uarg,
                                          bool success)
@@ -1994,21 +2005,39 @@ static void io_uring_tx_zerocopy_callback(struct sk_buff *skb,

        if (!refcount_dec_and_test(&uarg->refcnt))
                return;
+       if (unlikely(!notifier->task))
+               goto fallback;

-       if (in_interrupt()) {
-               INIT_WORK(&notifier->commit_work, io_zc_tx_work_callback);
-               queue_work(system_unbound_wq, &notifier->commit_work);
-       } else {
-               io_zc_tx_work_callback(&notifier->commit_work);
+       put_task_struct(notifier->task);
+       notifier->task = NULL;
+
+       if (!in_interrupt()) {
+               io_zc_tx_notifier_finish(&notifier->task_work);
+               return;
        }
+
+       init_task_work(&notifier->task_work, io_zc_tx_notifier_finish);
+       if (likely(!task_work_add(notifier->task, &notifier->task_work,
+                                 TWA_SIGNAL)))
+               return;
+
+fallback:
+       INIT_WORK(&notifier->commit_work, io_zc_tx_work_callback);
+       queue_work(system_unbound_wq, &notifier->commit_work);
 }

-static void io_tx_kill_notification(struct io_tx_ctx *tx_ctx)
+static inline void __io_tx_kill_notification(struct io_tx_ctx *tx_ctx)
 {
        io_uring_tx_zerocopy_callback(NULL, &tx_ctx->notifier->uarg, true);
        tx_ctx->notifier = NULL;
 }

+static inline void io_tx_kill_notification(struct io_tx_ctx *tx_ctx)
+{
+       tx_ctx->notifier->task = get_task_struct(current);
+       __io_tx_kill_notification(tx_ctx);
+}
+
 static void io_notifier_splice(struct io_ring_ctx *ctx)
 {
        spin_lock(&ctx->completion_lock);
@@ -2058,7 +2087,7 @@ static struct io_tx_notifier *io_alloc_tx_notifier(struct io_ring_ctx *ctx,
        } else {
                gfp_t gfp_flags = GFP_ATOMIC|GFP_KERNEL_ACCOUNT;

-               notifier = kmalloc(sizeof(*notifier), gfp_flags);
+               notifier = kzalloc(sizeof(*notifier), gfp_flags);
                if (!notifier)
                        return NULL;
                ctx->nr_tx_ctx++;
@@ -9502,7 +9531,7 @@ static void io_sqe_tx_ctx_kill_ubufs(struct io_ring_ctx *ctx)
                tx_ctx = &ctx->tx_ctxs[i];

                if (tx_ctx->notifier)
-                       io_tx_kill_notification(tx_ctx);
+                       __io_tx_kill_notification(tx_ctx);
        }
 }

From patchwork Tue Dec 21 15:35:41 2021
Miller" , Willem de Bruijn , Eric Dumazet , David Ahern , Jens Axboe , Pavel Begunkov Subject: [RFC v2 19/19] io_uring: optimise task referencing by notifiers Date: Tue, 21 Dec 2021 15:35:41 +0000 Message-Id: <465e422a249a5eaad413a6488568a03a2160c066.1640029579.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-State: RFC Use io_put_task()/etc. infra holding task references for notifiers, so we can get rid of atomics there. Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index ee496b463462..0eadf4ee5402 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -2008,7 +2008,7 @@ static void io_uring_tx_zerocopy_callback(struct sk_buff *skb, if (unlikely(!notifier->task)) goto fallback; - put_task_struct(notifier->task); + io_put_task(notifier->task, 1); notifier->task = NULL; if (!in_interrupt()) { @@ -2034,7 +2034,8 @@ static inline void __io_tx_kill_notification(struct io_tx_ctx *tx_ctx) static inline void io_tx_kill_notification(struct io_tx_ctx *tx_ctx) { - tx_ctx->notifier->task = get_task_struct(current); + tx_ctx->notifier->task = current; + io_get_task_refs(1); __io_tx_kill_notification(tx_ctx); }