From patchwork Tue Jul 5 15:01:01 2022
X-Patchwork-Id: 12906667
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org,
    linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn,
    Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , David Ahern , kernel-team@fb.com, Pavel Begunkov Subject: [PATCH net-next v3 01/25] ipv4: avoid partial copy for zc Date: Tue, 5 Jul 2022 16:01:01 +0100 Message-Id: X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Even when zerocopy transmission is requested and possible, __ip_append_data() will still copy a small chunk of data just because it allocated some extra linear space (e.g. 148 bytes). It wastes CPU cycles on copy and iter manipulations and also misalignes potentially aligned data. Avoid such coies. And as a bonus we can allocate smaller skb. Signed-off-by: Pavel Begunkov --- net/ipv4/ip_output.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 00b4bf26fd93..581d1e233260 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -969,7 +969,6 @@ static int __ip_append_data(struct sock *sk, struct inet_sock *inet = inet_sk(sk); struct ubuf_info *uarg = NULL; struct sk_buff *skb; - struct ip_options *opt = cork->opt; int hh_len; int exthdrlen; @@ -977,6 +976,7 @@ static int __ip_append_data(struct sock *sk, int copy; int err; int offset = 0; + bool zc = false; unsigned int maxfraglen, fragheaderlen, maxnonfragsize; int csummode = CHECKSUM_NONE; struct rtable *rt = (struct rtable *)cork->dst; @@ -1025,6 +1025,7 @@ static int __ip_append_data(struct sock *sk, if (rt->dst.dev->features & NETIF_F_SG && csummode == CHECKSUM_PARTIAL) { paged = true; + zc = true; } else { uarg->zerocopy = 0; skb_zcopy_set(skb, uarg, &extra_uref); @@ -1091,9 +1092,12 @@ static int __ip_append_data(struct sock *sk, (fraglen + alloc_extra < SKB_MAX_ALLOC || !(rt->dst.dev->features & NETIF_F_SG))) alloclen = fraglen; - else { + else if (!zc) { alloclen = min_t(int, fraglen, MAX_HEADER); pagedlen = fraglen - alloclen; + } else { + alloclen = fragheaderlen + transhdrlen; + pagedlen = datalen - transhdrlen; } alloclen += alloc_extra; From patchwork Tue Jul 5 15:01:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12906669 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 21412C43334 for ; Tue, 5 Jul 2022 15:02:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231948AbiGEPB4 (ORCPT ); Tue, 5 Jul 2022 11:01:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52764 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229872AbiGEPBw (ORCPT ); Tue, 5 Jul 2022 11:01:52 -0400 Received: from mail-wr1-x435.google.com (mail-wr1-x435.google.com [IPv6:2a00:1450:4864:20::435]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3CF2E14D09; Tue, 5 Jul 2022 08:01:51 -0700 (PDT) Received: by mail-wr1-x435.google.com with SMTP id d16so11643847wrv.10; Tue, 05 Jul 2022 08:01:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=dYMbDXosaDJGTx1SMZg80SJyMd9PTOfSaObug9b0vzk=; 

From patchwork Tue Jul 5 15:01:02 2022
X-Patchwork-Id: 12906669
From: Pavel Begunkov
Subject: [PATCH net-next v3 02/25] ipv6: avoid partial copy for zc
Date: Tue, 5 Jul 2022 16:01:02 +0100
Message-Id: <23994117821e178dd0835d19016bea14b4296f40.1656318994.git.asml.silence@gmail.com>

Even when zerocopy transmission is requested and possible,
__ip6_append_data() will still copy a small chunk of data just because
it allocated some extra linear space (e.g. 128 bytes). It wastes CPU
cycles on the copy and on iter manipulations, and also misaligns
potentially aligned data. Avoid such copies. And as a bonus we can
allocate a smaller skb.
Signed-off-by: Pavel Begunkov
---
 net/ipv6/ip6_output.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 77e3f5970ce4..fc74ce3ed8cc 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1464,6 +1464,7 @@ static int __ip6_append_data(struct sock *sk,
 	int copy;
 	int err;
 	int offset = 0;
+	bool zc = false;
 	u32 tskey = 0;
 	struct rt6_info *rt = (struct rt6_info *)cork->dst;
 	struct ipv6_txoptions *opt = v6_cork->opt;
@@ -1549,6 +1550,7 @@ static int __ip6_append_data(struct sock *sk,
 		if (rt->dst.dev->features & NETIF_F_SG &&
 		    csummode == CHECKSUM_PARTIAL) {
 			paged = true;
+			zc = true;
 		} else {
 			uarg->zerocopy = 0;
 			skb_zcopy_set(skb, uarg, &extra_uref);
@@ -1630,9 +1632,12 @@ static int __ip6_append_data(struct sock *sk,
 			    (fraglen + alloc_extra < SKB_MAX_ALLOC ||
 			     !(rt->dst.dev->features & NETIF_F_SG)))
 				alloclen = fraglen;
-			else {
+			else if (!zc) {
 				alloclen = min_t(int, fraglen, MAX_HEADER);
 				pagedlen = fraglen - alloclen;
+			} else {
+				alloclen = fragheaderlen + transhdrlen;
+				pagedlen = datalen - transhdrlen;
 			}
 
 			alloclen += alloc_extra;

From patchwork Tue Jul 5 15:01:03 2022
X-Patchwork-Id: 12906671
From: Pavel Begunkov
Subject: [PATCH net-next v3 03/25] skbuff: add SKBFL_DONT_ORPHAN flag
Date: Tue, 5 Jul 2022 16:01:03 +0100
Message-Id: <0504267c0e7d8a4300949aa571d3459bf0d526aa.1656318994.git.asml.silence@gmail.com>

We don't want to list every single ubuf_info callback in
skb_orphan_frags(), so add a flag controlling the behaviour.

Signed-off-by: Pavel Begunkov
---
 include/linux/skbuff.h | 8 +++++---
 net/core/skbuff.c      | 2 +-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index d3d10556f0fa..8e12b3b9ad6c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -686,10 +686,13 @@ enum {
 	 * charged to the kernel memory.
 	 */
 	SKBFL_PURE_ZEROCOPY = BIT(2),
+
+	SKBFL_DONT_ORPHAN = BIT(3),
 };
 
 #define SKBFL_ZEROCOPY_FRAG	(SKBFL_ZEROCOPY_ENABLE | SKBFL_SHARED_FRAG)
-#define SKBFL_ALL_ZEROCOPY	(SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY)
+#define SKBFL_ALL_ZEROCOPY	(SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY | \
+				 SKBFL_DONT_ORPHAN)
 
 /*
  * The callback notifies userspace to release buffers when skb DMA is done in
@@ -3182,8 +3185,7 @@ static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
 {
 	if (likely(!skb_zcopy(skb)))
 		return 0;
-	if (!skb_zcopy_is_nouarg(skb) &&
-	    skb_uarg(skb)->callback == msg_zerocopy_callback)
+	if (skb_shinfo(skb)->flags & SKBFL_DONT_ORPHAN)
 		return 0;
 	return skb_copy_ubufs(skb, gfp_mask);
 }
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5b3559cb1d82..5b35791064d1 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1193,7 +1193,7 @@ static struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size)
 	uarg->len = 1;
 	uarg->bytelen = size;
 	uarg->zerocopy = 1;
-	uarg->flags = SKBFL_ZEROCOPY_FRAG;
+	uarg->flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
 	refcount_set(&uarg->refcnt, 1);
 	sock_hold(sk);
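To show how a provider is meant to use the new flag, here is a sketch
of a hypothetical in-kernel zerocopy user; my_zc_callback and
my_zc_init are made-up names, not part of this series:

static void my_zc_callback(struct sk_buff *skb, struct ubuf_info *uarg,
			   bool success)
{
	/* notify the submitter that the pages are no longer in use */
}

static void my_zc_init(struct ubuf_info *uarg)
{
	uarg->callback = my_zc_callback;
	/* opt out of skb_orphan_frags() copies without having the
	 * callback special-cased by the core stack */
	uarg->flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
	refcount_set(&uarg->refcnt, 1);
}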

From patchwork Tue Jul 5 15:01:04 2022
X-Patchwork-Id: 12906670
From: Pavel Begunkov
Subject: [PATCH net-next v3 04/25] skbuff: carry external ubuf_info in msghdr
Date: Tue, 5 Jul 2022 16:01:04 +0100
Message-Id: <6ca7e21d7a0c1abafc51579a8395c8a9d4963efb.1656318994.git.asml.silence@gmail.com>

Make it possible for in-kernel network callers like io_uring to pass in
a custom ubuf_info by setting it in a new field of struct msghdr.
Signed-off-by: Pavel Begunkov
---
 include/linux/socket.h | 7 +++++++
 io_uring/net.c         | 4 ++++
 net/compat.c           | 2 ++
 net/socket.c           | 6 ++++++
 4 files changed, 19 insertions(+)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 17311ad9f9af..ba84ee614d5a 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -66,9 +66,16 @@ struct msghdr {
 	};
 	bool		msg_control_is_user : 1;
 	bool		msg_get_inq : 1;/* return INQ after receive */
+	/*
+	 * The data pages are pinned and won't be released before ->msg_ubuf
+	 * is released. ->msg_iter should point to a bvec and ->msg_ubuf has
+	 * to be non-NULL.
+	 */
+	bool		msg_managed_data : 1;
 	unsigned int	msg_flags;	/* flags on received message */
 	__kernel_size_t	msg_controllen;	/* ancillary data buffer length */
 	struct kiocb	*msg_iocb;	/* ptr to iocb for async requests */
+	struct ubuf_info *msg_ubuf;
 };
 
 struct user_msghdr {
diff --git a/io_uring/net.c b/io_uring/net.c
index 19a805c3814c..d95c88d83f9f 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -255,6 +255,8 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	msg.msg_control = NULL;
 	msg.msg_controllen = 0;
 	msg.msg_namelen = 0;
+	msg.msg_ubuf = NULL;
+	msg.msg_managed_data = false;
 
 	flags = sr->msg_flags;
 	if (issue_flags & IO_URING_F_NONBLOCK)
@@ -525,6 +527,8 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 	msg.msg_flags = 0;
 	msg.msg_controllen = 0;
 	msg.msg_iocb = NULL;
+	msg.msg_ubuf = NULL;
+	msg.msg_managed_data = false;
 
 	flags = sr->msg_flags;
 	if (force_nonblock)
diff --git a/net/compat.c b/net/compat.c
index 210fc3b4d0d8..435846fa85e0 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -80,6 +80,8 @@ int __get_compat_msghdr(struct msghdr *kmsg,
 		return -EMSGSIZE;
 
 	kmsg->msg_iocb = NULL;
+	kmsg->msg_ubuf = NULL;
+	kmsg->msg_managed_data = false;
 	*ptr = msg.msg_iov;
 	*len = msg.msg_iovlen;
 	return 0;
diff --git a/net/socket.c b/net/socket.c
index 2bc8773d9dc5..0963a02b1472 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2106,6 +2106,8 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
 	msg.msg_control = NULL;
 	msg.msg_controllen = 0;
 	msg.msg_namelen = 0;
+	msg.msg_ubuf = NULL;
+	msg.msg_managed_data = false;
 	if (addr) {
 		err = move_addr_to_kernel(addr, addr_len, &address);
 		if (err < 0)
@@ -2171,6 +2173,8 @@ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags,
 	msg.msg_namelen = 0;
 	msg.msg_iocb = NULL;
 	msg.msg_flags = 0;
+	msg.msg_ubuf = NULL;
+	msg.msg_managed_data = false;
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
 	err = sock_recvmsg(sock, &msg, flags);
@@ -2409,6 +2413,8 @@ int __copy_msghdr_from_user(struct msghdr *kmsg,
 		return -EMSGSIZE;
 
 	kmsg->msg_iocb = NULL;
+	kmsg->msg_ubuf = NULL;
+	kmsg->msg_managed_data = false;
 	*uiov = msg.msg_iov;
 	*nsegs = msg.msg_iovlen;
 	return 0;
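A sketch (not from this series) of how an in-kernel caller could hand
its own ubuf_info to sendmsg through the new field; my_sendzc is a
made-up helper and error handling is elided:

static int my_sendzc(struct socket *sock, struct ubuf_info *uarg,
		     struct bio_vec *bvec, unsigned int nr_segs, size_t len)
{
	struct msghdr msg = { .msg_flags = MSG_ZEROCOPY | MSG_DONTWAIT };

	/* completions arrive via uarg->callback, so uarg and the pages
	 * it covers must stay alive until then */
	msg.msg_ubuf = uarg;
	iov_iter_bvec(&msg.msg_iter, WRITE, bvec, nr_segs, len);
	return sock_sendmsg(sock, &msg);
}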
lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231917AbiGEPB4 (ORCPT ); Tue, 5 Jul 2022 11:01:56 -0400 Received: from mail-wm1-x32a.google.com (mail-wm1-x32a.google.com [IPv6:2a00:1450:4864:20::32a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CE71315A06; Tue, 5 Jul 2022 08:01:54 -0700 (PDT) Received: by mail-wm1-x32a.google.com with SMTP id c131-20020a1c3589000000b003a19b2bce36so4304528wma.4; Tue, 05 Jul 2022 08:01:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=mNAl7aHtaBa/r+OrmETRXcPT/IbWKen65cbhqfyTMuk=; b=QIyJpbFm4K9zyHmtH9uiaoaO0dVKXXutkbT93MI7Aq3iCVG2ljN3+ePEIY2svBMUE2 yftHamAuWuQvysHJMim9F1bKKeoKDDe0jo4y5Um+20UmdHpsaRqqVUeGVx9MAQ084omx zXho48b4/txDxxSnNc7zUN3w8dfqE2st7rxRaoeKXdKNYItOV9Vr2XFatEezzsQCVn8J 5/V9ImdqnEtCpbjHyHsZ4oOieMH+NxwB9KPBDhfQ067uQ2eMTAVidQ558eObfcwVvEsH HKGCCPG0pOIlUPRfmc1UiiQC8Eov9ADvp9aII5n1QIuBwj+0YdnJ7OpeBE4O2ZTOJMQd MVVw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=mNAl7aHtaBa/r+OrmETRXcPT/IbWKen65cbhqfyTMuk=; b=WJa0CXR8fDYKHbD1AT4/8qCvprYkqA1DapYzW00gCpcPphxBYBcO86LcCU3wul81Jn I44Ivbt9jHIfm9qRoDF7LTznUWKt49PtoPJDyYGfa4Za0M8KirbWX2h+mh+tx8sWXj5E p2q+N+Phm7QzxopjDVguymOpjXp8Y3bR02jzhYa85pLZBBT0U5pGq5OMWJwhwVnLTwYq SDFkp8wtbLS8fjGwj+9GX0y+0pm8JPbMLGn/wezM/fhnwqmCKKUHuFWOBh8gr8h5seWl jNpkLwCr3R8N7Yrktm10oZ6LEhwEj/3jSG7fzImSjVXU/6u1jFg5oN2PF/4snAJHRboU gC6A== X-Gm-Message-State: AJIora89X9O4QCFwF258ofwcEP5M3a1zK9rRPZFrz8lhz+veNo4l9MSE FAV6kGCTTTfydRUrH/TFETIS9yFsmp+pGQ== X-Google-Smtp-Source: AGRyM1t+FRcbfBXNkFqqNHfWVbAfizNxyQZmmszAH48Uv7YGgDS0ifHmhMNNBh/9yrg+A+eG21iXrQ== X-Received: by 2002:a7b:ce04:0:b0:3a1:92e0:d889 with SMTP id m4-20020a7bce04000000b003a192e0d889mr20351544wmc.131.1657033313017; Tue, 05 Jul 2022 08:01:53 -0700 (PDT) Received: from 127.0.0.1localhost (188.28.125.106.threembb.co.uk. [188.28.125.106]) by smtp.gmail.com with ESMTPSA id k27-20020adfd23b000000b0021d728d687asm2518200wrh.36.2022.07.05.08.01.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Jul 2022 08:01:52 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: "David S . Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , David Ahern , kernel-team@fb.com, Pavel Begunkov Subject: [PATCH net-next v3 05/25] net: bvec specific path in zerocopy_sg_from_iter Date: Tue, 5 Jul 2022 16:01:05 +0100 Message-Id: <4d0050583906d5fc4db710019995fb76805c9b05.1656318994.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Add an bvec specialised and optimised path in zerocopy_sg_from_iter. It'll be used later for {get,put}_page() optimisations. 
Signed-off-by: Pavel Begunkov
---
 net/core/datagram.c | 47 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index 50f4faeea76c..5237cb533bb4 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -613,11 +613,58 @@ int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset,
 }
 EXPORT_SYMBOL(skb_copy_datagram_from_iter);
 
+static int __zerocopy_sg_from_bvec(struct sock *sk, struct sk_buff *skb,
+				   struct iov_iter *from, size_t length)
+{
+	int frag = skb_shinfo(skb)->nr_frags;
+	int ret = 0;
+	struct bvec_iter bi;
+	ssize_t copied = 0;
+	unsigned long truesize = 0;
+
+	bi.bi_size = min(from->count, length);
+	bi.bi_bvec_done = from->iov_offset;
+	bi.bi_idx = 0;
+
+	while (bi.bi_size && frag < MAX_SKB_FRAGS) {
+		struct bio_vec v = mp_bvec_iter_bvec(from->bvec, bi);
+
+		copied += v.bv_len;
+		truesize += PAGE_ALIGN(v.bv_len + v.bv_offset);
+		get_page(v.bv_page);
+		skb_fill_page_desc(skb, frag++, v.bv_page, v.bv_offset, v.bv_len);
+		bvec_iter_advance_single(from->bvec, &bi, v.bv_len);
+	}
+	if (bi.bi_size)
+		ret = -EMSGSIZE;
+
+	from->bvec += bi.bi_idx;
+	from->nr_segs -= bi.bi_idx;
+	from->count = bi.bi_size;
+	from->iov_offset = bi.bi_bvec_done;
+
+	skb->data_len += copied;
+	skb->len += copied;
+	skb->truesize += truesize;
+
+	if (sk && sk->sk_type == SOCK_STREAM) {
+		sk_wmem_queued_add(sk, truesize);
+		if (!skb_zcopy_pure(skb))
+			sk_mem_charge(sk, truesize);
+	} else {
+		refcount_add(truesize, &skb->sk->sk_wmem_alloc);
+	}
+	return ret;
+}
+
 int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb,
 			    struct iov_iter *from, size_t length)
 {
 	int frag = skb_shinfo(skb)->nr_frags;
 
+	if (iov_iter_is_bvec(from))
+		return __zerocopy_sg_from_bvec(sk, skb, from, length);
+
 	while (length && iov_iter_count(from)) {
 		struct page *pages[MAX_SKB_FRAGS];
 		struct page *last_head = NULL;
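The new path triggers on any bvec-backed iterator; a sketch of a caller
constructing one (my_fill_skb, the pages and the 512-byte length are
all illustrative):

static int my_fill_skb(struct sock *sk, struct sk_buff *skb,
		       struct page *p0, struct page *p1)
{
	struct bio_vec bv[2] = {
		{ .bv_page = p0, .bv_len = PAGE_SIZE, .bv_offset = 0 },
		{ .bv_page = p1, .bv_len = 512, .bv_offset = 0 },
	};
	struct iov_iter from;

	iov_iter_bvec(&from, WRITE, bv, 2, PAGE_SIZE + 512);
	/* iov_iter_is_bvec(&from) is true, so this takes
	 * __zerocopy_sg_from_bvec() instead of the GUP-based loop */
	return __zerocopy_sg_from_iter(sk, skb, &from, PAGE_SIZE + 512);
}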

From patchwork Tue Jul 5 15:01:06 2022
X-Patchwork-Id: 12906673
From: Pavel Begunkov
Subject: [PATCH net-next v3 06/25] net: optimise bvec-based zc page referencing
Date: Tue, 5 Jul 2022 16:01:06 +0100
Message-Id: <255398d582d4956871d0c35c929da158ef72b781.1656318994.git.asml.silence@gmail.com>

Some users, like io_uring, can pass a bvec iterator to send and can
also implement page pinning more efficiently. Add a ->msg_managed_data
toggle in msghdr. When set, data pages are "managed" by upper layers,
i.e. refcounted and pinned by the caller, and they will live at least
until ->msg_ubuf is released. The msghdr has to have a non-NULL
->msg_ubuf, and ->msg_iter should point to a bvec.

Protocols supporting the feature will propagate it by setting
SKBFL_MANAGED_FRAG_REFS, which means that the skb doesn't hold refs to
its frag pages and relies only on ubuf_info lifetime guarantees. It
should only be used with zerocopy skbs with ubuf_info set.

It's allowed to convert skbs from managed to normal by calling
skb_zcopy_downgrade_managed(). The function will take all needed page
references and clear the flag.
Signed-off-by: Pavel Begunkov
---
 include/linux/skbuff.h | 25 +++++++++++++++++++++++--
 net/core/datagram.c    |  7 ++++---
 net/core/skbuff.c      | 29 +++++++++++++++++++++++++++--
 3 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 8e12b3b9ad6c..712168c21736 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -688,11 +688,16 @@ enum {
 	SKBFL_PURE_ZEROCOPY = BIT(2),
 
 	SKBFL_DONT_ORPHAN = BIT(3),
+
+	/* page references are managed by the ubuf_info, so it's safe to
+	 * use frags only up until ubuf_info is released
+	 */
+	SKBFL_MANAGED_FRAG_REFS = BIT(4),
 };
 
 #define SKBFL_ZEROCOPY_FRAG	(SKBFL_ZEROCOPY_ENABLE | SKBFL_SHARED_FRAG)
 #define SKBFL_ALL_ZEROCOPY	(SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY | \
-				 SKBFL_DONT_ORPHAN)
+				 SKBFL_DONT_ORPHAN | SKBFL_MANAGED_FRAG_REFS)
 
 /*
  * The callback notifies userspace to release buffers when skb DMA is done in
@@ -1809,6 +1814,11 @@ static inline bool skb_zcopy_pure(const struct sk_buff *skb)
 	return skb_shinfo(skb)->flags & SKBFL_PURE_ZEROCOPY;
 }
 
+static inline bool skb_zcopy_managed(const struct sk_buff *skb)
+{
+	return skb_shinfo(skb)->flags & SKBFL_MANAGED_FRAG_REFS;
+}
+
 static inline bool skb_pure_zcopy_same(const struct sk_buff *skb1,
 				       const struct sk_buff *skb2)
 {
@@ -1883,6 +1893,14 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy_success)
 	}
 }
 
+void __skb_zcopy_downgrade_managed(struct sk_buff *skb);
+
+static inline void skb_zcopy_downgrade_managed(struct sk_buff *skb)
+{
+	if (unlikely(skb_zcopy_managed(skb)))
+		__skb_zcopy_downgrade_managed(skb);
+}
+
 static inline void skb_mark_not_on_list(struct sk_buff *skb)
 {
 	skb->next = NULL;
@@ -3498,7 +3516,10 @@ static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)
  */
 static inline void skb_frag_unref(struct sk_buff *skb, int f)
 {
-	__skb_frag_unref(&skb_shinfo(skb)->frags[f], skb->pp_recycle);
+	struct skb_shared_info *shinfo = skb_shinfo(skb);
+
+	if (!skb_zcopy_managed(skb))
+		__skb_frag_unref(&shinfo->frags[f], skb->pp_recycle);
 }
 
 /**
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 5237cb533bb4..a93c05156f56 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -631,7 +631,6 @@ static int __zerocopy_sg_from_bvec(struct sock *sk, struct sk_buff *skb,
 
 		copied += v.bv_len;
 		truesize += PAGE_ALIGN(v.bv_len + v.bv_offset);
-		get_page(v.bv_page);
 		skb_fill_page_desc(skb, frag++, v.bv_page, v.bv_offset, v.bv_len);
 		bvec_iter_advance_single(from->bvec, &bi, v.bv_len);
 	}
@@ -660,11 +659,13 @@ static int __zerocopy_sg_from_bvec(struct sock *sk, struct sk_buff *skb,
 int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb,
 			    struct iov_iter *from, size_t length)
 {
-	int frag = skb_shinfo(skb)->nr_frags;
+	int frag;
 
-	if (iov_iter_is_bvec(from))
+	if (skb_zcopy_managed(skb))
 		return __zerocopy_sg_from_bvec(sk, skb, from, length);
 
+	frag = skb_shinfo(skb)->nr_frags;
+
 	while (length && iov_iter_count(from)) {
 		struct page *pages[MAX_SKB_FRAGS];
 		struct page *last_head = NULL;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5b35791064d1..71870def129c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -666,11 +666,18 @@ static void skb_release_data(struct sk_buff *skb)
 			      &shinfo->dataref))
 		goto exit;
 
-	skb_zcopy_clear(skb, true);
+	if (skb_zcopy(skb)) {
+		bool skip_unref = shinfo->flags & SKBFL_MANAGED_FRAG_REFS;
+
+		skb_zcopy_clear(skb, true);
+		if (skip_unref)
+			goto free_head;
+	}
 
 	for (i = 0; i < shinfo->nr_frags; i++)
 		__skb_frag_unref(&shinfo->frags[i], skb->pp_recycle);
 
+free_head:
 	if (shinfo->frag_list)
 		kfree_skb_list(shinfo->frag_list);
 
@@ -895,7 +902,10 @@ EXPORT_SYMBOL(skb_dump);
  */
 void skb_tx_error(struct sk_buff *skb)
 {
-	skb_zcopy_clear(skb, true);
+	if (skb) {
+		skb_zcopy_downgrade_managed(skb);
+		skb_zcopy_clear(skb, true);
+	}
 }
 EXPORT_SYMBOL(skb_tx_error);
 
@@ -1371,6 +1381,16 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
 }
 EXPORT_SYMBOL_GPL(skb_zerocopy_iter_stream);
 
+void __skb_zcopy_downgrade_managed(struct sk_buff *skb)
+{
+	int i;
+
+	skb_shinfo(skb)->flags &= ~SKBFL_MANAGED_FRAG_REFS;
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
+		skb_frag_ref(skb, i);
+}
+EXPORT_SYMBOL_GPL(__skb_zcopy_downgrade_managed);
+
 static int skb_zerocopy_clone(struct sk_buff *nskb, struct sk_buff *orig,
 			      gfp_t gfp_mask)
 {
@@ -1688,6 +1708,8 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 
 	BUG_ON(skb_shared(skb));
 
+	skb_zcopy_downgrade_managed(skb);
+
 	size = SKB_DATA_ALIGN(size);
 
 	if (skb_pfmemalloc(skb))
@@ -3484,6 +3506,8 @@ void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len)
 	int pos = skb_headlen(skb);
 	const int zc_flags = SKBFL_SHARED_FRAG | SKBFL_PURE_ZEROCOPY;
 
+	skb_zcopy_downgrade_managed(skb);
+
 	skb_shinfo(skb1)->flags |= skb_shinfo(skb)->flags & zc_flags;
 	skb_zerocopy_clone(skb1, skb, 0);
 	if (len < pos)	/* Split line is inside header. */
@@ -3837,6 +3861,7 @@ int skb_append_pagefrags(struct sk_buff *skb, struct page *page,
 	if (skb_can_coalesce(skb, i, page, offset)) {
 		skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], size);
 	} else if (i < MAX_SKB_FRAGS) {
+		skb_zcopy_downgrade_managed(skb);
 		get_page(page);
 		skb_fill_page_desc(skb, i, page, offset, size);
 	} else {
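The rule the hunks above encode, as a sketch: before any operation that
lets a frag page outlive ->msg_ubuf, convert the skb back to ordinary
per-page references. my_steal_frag_page is a made-up call site:

static struct page *my_steal_frag_page(struct sk_buff *skb, int i)
{
	struct page *page;

	/* takes get_page() on every frag and clears
	 * SKBFL_MANAGED_FRAG_REFS; a no-op for unmanaged skbs */
	skb_zcopy_downgrade_managed(skb);

	page = skb_frag_page(&skb_shinfo(skb)->frags[i]);
	get_page(page);
	return page;
}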

From patchwork Tue Jul 5 15:01:07 2022
X-Patchwork-Id: 12906676
From: Pavel Begunkov
Subject: [PATCH net-next v3 07/25] net: don't track pfmemalloc for managed frags
Date: Tue, 5 Jul 2022 16:01:07 +0100
Message-Id: <2f699cf7f534df23ed1fe51f88bf832706f215f2.1656318994.git.asml.silence@gmail.com>

Managed frags are pinned userspace pages controlled by upper layers,
so there is no need to track skb->pfmemalloc for them.

Signed-off-by: Pavel Begunkov
---
 include/linux/skbuff.h | 28 +++++++++++++++++-----------
 net/core/datagram.c    |  7 +++++--
 2 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 712168c21736..2d5badd4b9ff 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2549,6 +2549,22 @@ static inline unsigned int skb_pagelen(const struct sk_buff *skb)
 	return skb_headlen(skb) + __skb_pagelen(skb);
 }
 
+static inline void __skb_fill_page_desc_noacc(struct skb_shared_info *shinfo,
+					      int i, struct page *page,
+					      int off, int size)
+{
+	skb_frag_t *frag = &shinfo->frags[i];
+
+	/*
+	 * Propagate page pfmemalloc to the skb if we can. The problem is
+	 * that not all callers have unique ownership of the page but rely
+	 * on page_is_pfmemalloc doing the right thing(tm).
+	 */
+	frag->bv_page = page;
+	frag->bv_offset = off;
+	skb_frag_size_set(frag, size);
+}
+
 /**
  * __skb_fill_page_desc - initialise a paged fragment in an skb
  * @skb: buffer containing fragment to be initialised
@@ -2565,17 +2581,7 @@ static inline unsigned int skb_pagelen(const struct sk_buff *skb)
 static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
 					struct page *page, int off, int size)
 {
-	skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
-
-	/*
-	 * Propagate page pfmemalloc to the skb if we can. The problem is
-	 * that not all callers have unique ownership of the page but rely
-	 * on page_is_pfmemalloc doing the right thing(tm).
-	 */
-	frag->bv_page = page;
-	frag->bv_offset = off;
-	skb_frag_size_set(frag, size);
-
+	__skb_fill_page_desc_noacc(skb_shinfo(skb), i, page, off, size);
 	page = compound_head(page);
 	if (page_is_pfmemalloc(page))
 		skb->pfmemalloc = true;
diff --git a/net/core/datagram.c b/net/core/datagram.c
index a93c05156f56..3c913a6342ad 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -616,7 +616,8 @@ EXPORT_SYMBOL(skb_copy_datagram_from_iter);
 static int __zerocopy_sg_from_bvec(struct sock *sk, struct sk_buff *skb,
 				   struct iov_iter *from, size_t length)
 {
-	int frag = skb_shinfo(skb)->nr_frags;
+	struct skb_shared_info *shinfo = skb_shinfo(skb);
+	int frag = shinfo->nr_frags;
 	int ret = 0;
 	struct bvec_iter bi;
 	ssize_t copied = 0;
@@ -631,12 +632,14 @@ static int __zerocopy_sg_from_bvec(struct sock *sk, struct sk_buff *skb,
 
 		copied += v.bv_len;
 		truesize += PAGE_ALIGN(v.bv_len + v.bv_offset);
-		skb_fill_page_desc(skb, frag++, v.bv_page, v.bv_offset, v.bv_len);
+		__skb_fill_page_desc_noacc(shinfo, frag++, v.bv_page,
+					   v.bv_offset, v.bv_len);
 		bvec_iter_advance_single(from->bvec, &bi, v.bv_len);
 	}
 	if (bi.bi_size)
 		ret = -EMSGSIZE;
 
+	shinfo->nr_frags = frag;
 	from->bvec += bi.bi_idx;
 	from->nr_segs -= bi.bi_idx;
 	from->count = bi.bi_size;
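A made-up call site contrasting the two helpers after this patch: the
plain variant still propagates pfmemalloc to the skb, while the _noacc
variant, used for pinned userspace pages, skips that accounting:

static void my_fill_two_frags(struct sk_buff *skb, struct page *kpage,
			      struct page *upage, int len)
{
	/* ordinary kernel page: keep the pfmemalloc propagation */
	__skb_fill_page_desc(skb, 0, kpage, 0, len);

	/* pinned userspace page from a managed iterator: no
	 * pfmemalloc bookkeeping */
	__skb_fill_page_desc_noacc(skb_shinfo(skb), 1, upage, 0, len);

	skb_shinfo(skb)->nr_frags = 2;
}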

From patchwork Tue Jul 5 15:01:08 2022
X-Patchwork-Id: 12906674
From: Pavel Begunkov
Subject: [PATCH net-next v3 08/25] skbuff: don't mix ubuf_info of different types
Date: Tue, 5 Jul 2022 16:01:08 +0100
Message-Id: <8499c042b59474f9969a5a3d3417a0abc07350ae.1656318994.git.asml.silence@gmail.com>

We should not append MSG_ZEROCOPY requests to an skbuff with a non
MSG_ZEROCOPY ubuf_info; the two are not compatible.

Signed-off-by: Pavel Begunkov
---
 net/core/skbuff.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 71870def129c..7e6fcb3cd817 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1222,6 +1222,10 @@ struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size,
 	const u32 byte_limit = 1 << 19;		/* limit to a few TSO */
 	u32 bytelen, next;
 
+	/* there might be non MSG_ZEROCOPY users */
+	if (uarg->callback != msg_zerocopy_callback)
+		return NULL;
+
 	/* realloc only when socket is locked (TCP, UDP cork),
 	 * so uarg->len and sk_zckey access is serialized
 	 */
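At a call site this keeps the existing "NULL means allocate anew or
fail" convention, so a foreign uarg attached via msg_ubuf can never be
extended by the MSG_ZEROCOPY path (sketch with a made-up helper name):

static struct ubuf_info *my_get_zc_uarg(struct sock *sk,
					struct sk_buff *skb, size_t len)
{
	/* returns NULL instead of extending a non-MSG_ZEROCOPY uarg */
	return msg_zerocopy_realloc(sk, len, skb_zcopy(skb));
}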

From patchwork Tue Jul 5 15:01:09 2022
X-Patchwork-Id: 12906675
From: Pavel Begunkov
Subject: [PATCH net-next v3 09/25] ipv4/udp: support zc with managed data
Date: Tue, 5 Jul 2022 16:01:09 +0100

Teach ipv4/udp about managed data. Make it recognise and use
msg->msg_ubuf, and also set/propagate SKBFL_MANAGED_FRAG_REFS down to
skb_zerocopy_iter_dgram().

Signed-off-by: Pavel Begunkov
---
 net/ipv4/ip_output.c | 57 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 43 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 581d1e233260..3fd1bf675598 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1017,18 +1017,35 @@ static int __ip_append_data(struct sock *sk,
 	    (!exthdrlen || (rt->dst.dev->features & NETIF_F_HW_ESP_TX_CSUM)))
 		csummode = CHECKSUM_PARTIAL;
 
-	if (flags & MSG_ZEROCOPY && length && sock_flag(sk, SOCK_ZEROCOPY)) {
-		uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
-		if (!uarg)
-			return -ENOBUFS;
-		extra_uref = !skb_zcopy(skb);	/* only ref on new uarg */
-		if (rt->dst.dev->features & NETIF_F_SG &&
-		    csummode == CHECKSUM_PARTIAL) {
-			paged = true;
-			zc = true;
-		} else {
-			uarg->zerocopy = 0;
-			skb_zcopy_set(skb, uarg, &extra_uref);
+	if ((flags & MSG_ZEROCOPY) && length) {
+		struct msghdr *msg = from;
+
+		if (getfrag == ip_generic_getfrag && msg->msg_ubuf) {
+			if (skb_zcopy(skb) && msg->msg_ubuf != skb_zcopy(skb))
+				return -EINVAL;
+
+			/* Leave uarg NULL if can't zerocopy, callers should
+			 * be able to handle it.
+			 */
+			if ((rt->dst.dev->features & NETIF_F_SG) &&
+			    csummode == CHECKSUM_PARTIAL) {
+				paged = true;
+				zc = true;
+				uarg = msg->msg_ubuf;
+			}
+		} else if (sock_flag(sk, SOCK_ZEROCOPY)) {
+			uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
+			if (!uarg)
+				return -ENOBUFS;
+			extra_uref = !skb_zcopy(skb);	/* only ref on new uarg */
+			if (rt->dst.dev->features & NETIF_F_SG &&
+			    csummode == CHECKSUM_PARTIAL) {
+				paged = true;
+				zc = true;
+			} else {
+				uarg->zerocopy = 0;
+				skb_zcopy_set(skb, uarg, &extra_uref);
+			}
 		}
 	}
 
@@ -1192,13 +1209,14 @@ static int __ip_append_data(struct sock *sk,
 				err = -EFAULT;
 				goto error;
 			}
-		} else if (!uarg || !uarg->zerocopy) {
+		} else if (!zc) {
 			int i = skb_shinfo(skb)->nr_frags;
 
 			err = -ENOMEM;
 			if (!sk_page_frag_refill(sk, pfrag))
 				goto error;
 
+			skb_zcopy_downgrade_managed(skb);
 			if (!skb_can_coalesce(skb, i, pfrag->page,
 					      pfrag->offset)) {
 				err = -EMSGSIZE;
@@ -1223,7 +1241,18 @@ static int __ip_append_data(struct sock *sk,
 			skb->truesize += copy;
 			wmem_alloc_delta += copy;
 		} else {
-			err = skb_zerocopy_iter_dgram(skb, from, copy);
+			struct msghdr *msg = from;
+
+			if (!skb_shinfo(skb)->nr_frags) {
+				if (msg->msg_managed_data)
+					skb_shinfo(skb)->flags |= SKBFL_MANAGED_FRAG_REFS;
+			} else {
+				/* appending, don't mix managed and unmanaged */
+				if (!msg->msg_managed_data)
+					skb_zcopy_downgrade_managed(skb);
+			}
+
+			err = skb_zerocopy_iter_dgram(skb, msg, copy);
 			if (err < 0)
 				goto error;
 		}
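Putting the pieces together, a sketch of an in-kernel UDP zerocopy send
exercising this path; the helper and all names are illustrative, not
from the series:

static int my_udp_sendzc(struct socket *sock, struct ubuf_info *uarg,
			 struct bio_vec *bvec, unsigned int nr, size_t len)
{
	struct msghdr msg = { .msg_flags = MSG_ZEROCOPY | MSG_DONTWAIT };

	msg.msg_ubuf = uarg;		/* used instead of a realloc'ed uarg */
	msg.msg_managed_data = true;	/* pages already pinned by caller */
	iov_iter_bvec(&msg.msg_iter, WRITE, bvec, nr, len);

	/* __ip_append_data() will mark the skb SKBFL_MANAGED_FRAG_REFS,
	 * so no per-page get_page() happens while filling frags */
	return sock_sendmsg(sock, &msg);
}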
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v3 10/25] ipv6/udp: support zc with managed data
Date: Tue, 5 Jul 2022 16:01:10 +0100
Message-Id: <74c0f3cf7ff2464b0025a590ce9e716adb350be7.1656318994.git.asml.silence@gmail.com>

Just as with ipv4/udp, make ipv6/udp take advantage of managed data and propagate SKBFL_MANAGED_FRAG_REFS to skb_zerocopy_iter_dgram().

Signed-off-by: Pavel Begunkov --- net/ipv6/ip6_output.c | 57 ++++++++++++++++++++++++++++++++----------- 1 file changed, 43 insertions(+), 14 deletions(-) diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index fc74ce3ed8cc..34eb3b5da5e2 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1542,18 +1542,35 @@ static int __ip6_append_data(struct sock *sk, rt->dst.dev->features & (NETIF_F_IPV6_CSUM | NETIF_F_HW_CSUM)) csummode = CHECKSUM_PARTIAL; - if (flags & MSG_ZEROCOPY && length && sock_flag(sk, SOCK_ZEROCOPY)) { - uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb)); - if (!uarg) - return -ENOBUFS; - extra_uref = !skb_zcopy(skb); /* only ref on new uarg */ - if (rt->dst.dev->features & NETIF_F_SG && - csummode == CHECKSUM_PARTIAL) { - paged = true; - zc = true; - } else { - uarg->zerocopy = 0; - skb_zcopy_set(skb, uarg, &extra_uref); + if ((flags & MSG_ZEROCOPY) && length) { + struct msghdr *msg = from; + + if (getfrag == ip_generic_getfrag && msg->msg_ubuf) { + if (skb_zcopy(skb) && msg->msg_ubuf != skb_zcopy(skb)) + return -EINVAL; + + /* Leave uarg NULL if can't zerocopy, callers should + * be able to handle it.
+ */ + if ((rt->dst.dev->features & NETIF_F_SG) && + csummode == CHECKSUM_PARTIAL) { + paged = true; + zc = true; + uarg = msg->msg_ubuf; + } + } else if (sock_flag(sk, SOCK_ZEROCOPY)) { + uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb)); + if (!uarg) + return -ENOBUFS; + extra_uref = !skb_zcopy(skb); /* only ref on new uarg */ + if (rt->dst.dev->features & NETIF_F_SG && + csummode == CHECKSUM_PARTIAL) { + paged = true; + zc = true; + } else { + uarg->zerocopy = 0; + skb_zcopy_set(skb, uarg, &extra_uref); + } } } @@ -1747,13 +1764,14 @@ static int __ip6_append_data(struct sock *sk, err = -EFAULT; goto error; } - } else if (!uarg || !uarg->zerocopy) { + } else if (!zc) { int i = skb_shinfo(skb)->nr_frags; err = -ENOMEM; if (!sk_page_frag_refill(sk, pfrag)) goto error; + skb_zcopy_downgrade_managed(skb); if (!skb_can_coalesce(skb, i, pfrag->page, pfrag->offset)) { err = -EMSGSIZE; @@ -1778,7 +1796,18 @@ static int __ip6_append_data(struct sock *sk, skb->truesize += copy; wmem_alloc_delta += copy; } else { - err = skb_zerocopy_iter_dgram(skb, from, copy); + struct msghdr *msg = from; + + if (!skb_shinfo(skb)->nr_frags) { + if (msg->msg_managed_data) + skb_shinfo(skb)->flags |= SKBFL_MANAGED_FRAG_REFS; + } else { + /* appending, don't mix managed and unmanaged */ + if (!msg->msg_managed_data) + skb_zcopy_downgrade_managed(skb); + } + + err = skb_zerocopy_iter_dgram(skb, msg, copy); if (err < 0) goto error; } From patchwork Tue Jul 5 15:01:11 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12906677 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C8A4C43334 for ; Tue, 5 Jul 2022 15:02:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229469AbiGEPCk (ORCPT ); Tue, 5 Jul 2022 11:02:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53076 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232107AbiGEPCO (ORCPT ); Tue, 5 Jul 2022 11:02:14 -0400 Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 01DDA1658F; Tue, 5 Jul 2022 08:02:02 -0700 (PDT) Received: by mail-wr1-x436.google.com with SMTP id f2so12595841wrr.6; Tue, 05 Jul 2022 08:02:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=KyZlpO7Pd5tjNLef1myTjtIJ1vG1utvjduqSD0G6BDE=; b=qDCXHyX0/GU1ecBvFYBgl07k5ftG2Ecskq8qGKtifzxRmMahW/Q5VNvkwghgmidkkR gC6Vl8ulu63jd2OSnkblREktX+Cjn4Tdb7OatTrJPpg+OZQuxnoOQ2dZBta5aNMYddgx QIBy6G0dLw4K/2+lurCNdrQRp33JeeStMwQDlq67K4JfjoM6/k2WTWCHvfRCuIM/qu0B JXmKLka8WxD4rYQlggNsZm3Fdfjny4NpVn2msZmyX40dHr8KkuSiFNhCIZ6IFf0DoRt8 MYJKhPmyZbW9naxi4P4SkOVbsPZhIBTfxKQNfi5nc3bB9JHuo5D8Z/JKfoWb0/440EfY 1QEA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=KyZlpO7Pd5tjNLef1myTjtIJ1vG1utvjduqSD0G6BDE=; b=4Si7/0ZwGNDirTAlvwljiBzlsNHnr9dNu1DU3l39tH8l8gboFufkdpI6tRnQ4g5dm8 
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v3 11/25] tcp: support zc with managed data
Date: Tue, 5 Jul 2022 16:01:11 +0100

Also make tcp use managed data and propagate SKBFL_MANAGED_FRAG_REFS to optimise frag page referencing.

Signed-off-by: Pavel Begunkov --- net/ipv4/tcp.c | 52 ++++++++++++++++++++++++++++++++------------------ 1 file changed, 33 insertions(+), 19 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 390eb3dc53bd..05e2f6271f65 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1223,17 +1223,23 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) flags = msg->msg_flags; - if (flags & MSG_ZEROCOPY && size && sock_flag(sk, SOCK_ZEROCOPY)) { + if ((flags & MSG_ZEROCOPY) && size) { skb = tcp_write_queue_tail(sk); - uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb)); - if (!uarg) { - err = -ENOBUFS; - goto out_err; - } - zc = sk->sk_route_caps & NETIF_F_SG; - if (!zc) - uarg->zerocopy = 0; + if (msg->msg_ubuf) { + uarg = msg->msg_ubuf; + net_zcopy_get(uarg); + zc = sk->sk_route_caps & NETIF_F_SG; + } else if (sock_flag(sk, SOCK_ZEROCOPY)) { + uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb)); + if (!uarg) { + err = -ENOBUFS; + goto out_err; + } + zc = sk->sk_route_caps & NETIF_F_SG; + if (!zc) + uarg->zerocopy = 0; + } } if (unlikely(flags & MSG_FASTOPEN || inet_sk(sk)->defer_connect) && @@ -1356,9 +1362,11 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) copy = min_t(int, copy, pfrag->size - pfrag->offset); - if (tcp_downgrade_zcopy_pure(sk, skb)) - goto wait_for_space; - + if (unlikely(skb_zcopy_pure(skb) || skb_zcopy_managed(skb))) { + if (tcp_downgrade_zcopy_pure(sk, skb)) + goto wait_for_space; + skb_zcopy_downgrade_managed(skb); + } copy = tcp_wmem_schedule(sk, copy); if (!copy) goto wait_for_space; @@ -1381,15 +1389,21 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) pfrag->offset += copy; } else { /* First append to a fragless skb builds initial - * pure zerocopy skb + * zerocopy skb */ - if (!skb->len) + if (!skb->len) { + if (msg->msg_managed_data) + skb_shinfo(skb)->flags |= SKBFL_MANAGED_FRAG_REFS;
skb_shinfo(skb)->flags |= SKBFL_PURE_ZEROCOPY; - - if (!skb_zcopy_pure(skb)) { - copy = tcp_wmem_schedule(sk, copy); - if (!copy) - goto wait_for_space; + } else { + /* appending, don't mix managed and unmanaged */ + if (!msg->msg_managed_data) + skb_zcopy_downgrade_managed(skb); + if (!skb_zcopy_pure(skb)) { + copy = tcp_wmem_schedule(sk, copy); + if (!copy) + goto wait_for_space; + } } err = skb_zerocopy_iter_stream(sk, skb, msg, copy, uarg); From patchwork Tue Jul 5 15:01:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12906678 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF6FFCCA47F for ; Tue, 5 Jul 2022 15:02:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232145AbiGEPCl (ORCPT ); Tue, 5 Jul 2022 11:02:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53494 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232477AbiGEPCO (ORCPT ); Tue, 5 Jul 2022 11:02:14 -0400 Received: from mail-wr1-x429.google.com (mail-wr1-x429.google.com [IPv6:2a00:1450:4864:20::429]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6AFE51581A; Tue, 5 Jul 2022 08:02:04 -0700 (PDT) Received: by mail-wr1-x429.google.com with SMTP id r14so12179042wrg.1; Tue, 05 Jul 2022 08:02:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=UdmnXRPkIgS1Aj9RTdLvwJqgGoaQmg+RALiuFe75Kxo=; b=P8x9eE8Kf2sP2wA5TSj0klqACVZUk8qZzZx2ELoORjXT0Otmals+qe4Q9RDO/gOGs/ 6X6OM5XjDVWtVTMszc24LN4F8aG3aHfMbs1Okgf+mR12+YUVuQCIwFHtGlVDxvGOooJl AGY8+Vw1C/II0v3EqVua1rU3gk7bJ1cbXUPhhAFXgRz12dqPSIFRQ7xrgswFf4yw9Qha L3XukVYFMXwI6yhZHIKk+Jtppsm6KDjICYar9FxVagx3SIkVSqZkATfhCdDjbsBSaPkG eDZTAYk2vQkX8GfTikIN3muQo8F+QZwHRMnjNpboeTSW5xzPqrkjqpX2k43hr8GXWJXy 3uQQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=UdmnXRPkIgS1Aj9RTdLvwJqgGoaQmg+RALiuFe75Kxo=; b=gR54SNSjm2x3V6P/sGaGg6+PS8P2loW+MAlAe3iiJfxTTO0JrMzV4tlMuTYMxprt1K dQQSqW/bBry1CzPn7NngJM8aUgZ2XH9k/DXMG/etR+8pTX+q7rSOU4ql4b97JauVZpm0 iwgP/rYDWr7Ht2hM1DtLuAjy/Urt2lG4TUKCCF6J62amUJfoTT87EEtHZLZUNfR1ndkN +8tBaJTGScV++BQ8VTBW4hzErYgmphEntaGAqslWrxj8HATXCqt+R7w4IoO299na8W4x Jcyh7lzLjwYSAhioS/1cCh0F50loHh77qx3Eyp0MuM6fi0Go/nzfwqz9YfWD5qkIEXUk n2uw== X-Gm-Message-State: AJIora/NDS1t/kqkL7G35CMVtz9qDRL9eyPBi2zi0hG3mupF9/OAVPOP ZdTFn2X8GEbovVK+S6ISNOJynGjZkflvYA== X-Google-Smtp-Source: AGRyM1sf43VaMSwXgOrMplH6oKkfFhaPxg/mbQ5wLE8EEcpyWB2IeLWRH7h1LG3QfeOlSS7TsSUR3w== X-Received: by 2002:adf:f211:0:b0:21d:6f1a:b857 with SMTP id p17-20020adff211000000b0021d6f1ab857mr5867983wro.614.1657033322426; Tue, 05 Jul 2022 08:02:02 -0700 (PDT) Received: from 127.0.0.1localhost (188.28.125.106.threembb.co.uk. 
[188.28.125.106])
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v3 12/25] io_uring: add zc notification infrastructure
Date: Tue, 5 Jul 2022 16:01:12 +0100
Message-Id: <2239fd796a3a3150884fadfcba3813a02a26891f.1656318994.git.asml.silence@gmail.com>

Add the internal part of send zerocopy notifications. There are two main structures. The first is struct io_notif, which embeds a struct ubuf_info and maps 1:1 to it. io_uring binds a number of zerocopy send requests to a notifier and asks to complete (aka flush) it. When it is flushed and all attached requests and skbs complete, it generates one and only one CQE. Notifiers are intended to be passed into the network layer as struct msghdr::msg_ubuf.

The second concept is notification slots. Userspace will be able to register an array of slots and subsequently address them by their index in the array. Slots are independent of each other. Each slot can have only one notifier at a time (the active notifier) but many notifiers over its lifetime. While active, a notifier is not going to post any completions, but userspace can attach requests to it by specifying the corresponding slot while issuing send zc requests. Eventually, userspace will want to "flush" the notifier, losing any way to attach new requests to it; it can then use the next automatically added notifier of this slot, or of any other slot. When the network layer is done with all enqueued skbs attached to a notifier and no longer needs the user data carried in them, the flushed notifier will post a CQE.
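To make the lifecycle above concrete, here is a small illustrative sketch (not part of the patch) of how a userspace completion loop might tell notification CQEs apart from request CQEs. The kernel only reflects the slot's tag into cqe->user_data and the flushed notifier's sequence number into cqe->flags; distinguishing the two kinds of CQE is left entirely to the application, so NOTIF_TAG_BASE below is a hypothetical application convention:

#include <stdio.h>
#include <linux/io_uring.h>

/* Hypothetical convention: all slot tags were registered at or above this. */
#define NOTIF_TAG_BASE 0x1000ULL

static void handle_cqe(const struct io_uring_cqe *cqe)
{
	if (cqe->user_data >= NOTIF_TAG_BASE) {
		/* Notification CQE: user_data carries io_notif_slot::tag,
		 * flags carries the flushed notifier's sequence number. */
		printf("notif: slot tag 0x%llx, seq %u; attached buffers are reusable\n",
		       (unsigned long long)cqe->user_data, cqe->flags);
	} else {
		/* Ordinary request completion. */
		printf("req: user_data 0x%llx, res %d\n",
		       (unsigned long long)cqe->user_data, cqe->res);
	}
}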
Signed-off-by: Pavel Begunkov --- include/linux/io_uring_types.h | 5 ++ io_uring/Makefile | 2 +- io_uring/io_uring.c | 6 +- io_uring/io_uring.h | 1 + io_uring/notif.c | 102 +++++++++++++++++++++++++++++++++ io_uring/notif.h | 64 +++++++++++++++++++++ 6 files changed, 177 insertions(+), 3 deletions(-) create mode 100644 io_uring/notif.c create mode 100644 io_uring/notif.h diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 3ca8f363f504..a64eb2558e04 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -33,6 +33,9 @@ struct io_file_table { unsigned int alloc_hint; }; +struct io_notif; +struct io_notif_slot; + struct io_hash_bucket { spinlock_t lock; struct hlist_head list; @@ -207,6 +210,8 @@ struct io_ring_ctx { unsigned nr_user_files; unsigned nr_user_bufs; struct io_mapped_ubuf **user_bufs; + struct io_notif_slot *notif_slots; + unsigned nr_notif_slots; struct io_submit_state submit_state; diff --git a/io_uring/Makefile b/io_uring/Makefile index 466639c289be..8cc8e5387a75 100644 --- a/io_uring/Makefile +++ b/io_uring/Makefile @@ -7,5 +7,5 @@ obj-$(CONFIG_IO_URING) += io_uring.o xattr.o nop.o fs.o splice.o \ openclose.o uring_cmd.o epoll.o \ statx.o net.o msg_ring.o timeout.o \ sqpoll.o fdinfo.o tctx.o poll.o \ - cancel.o kbuf.o rsrc.o rw.o opdef.o + cancel.o kbuf.o rsrc.o rw.o opdef.o notif.o obj-$(CONFIG_IO_WQ) += io-wq.o diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 070ee9ec9ee7..eff4adca1813 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -89,6 +89,7 @@ #include "kbuf.h" #include "rsrc.h" #include "cancel.h" +#include "notif.h" #include "timeout.h" #include "poll.h" @@ -735,8 +736,7 @@ struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx) return &rings->cqes[off]; } -static bool io_fill_cqe_aux(struct io_ring_ctx *ctx, - u64 user_data, s32 res, u32 cflags) +bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags) { struct io_uring_cqe *cqe; @@ -2498,6 +2498,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx) } #endif WARN_ON_ONCE(!list_empty(&ctx->ltimeout_list)); + WARN_ON_ONCE(ctx->notif_slots || ctx->nr_notif_slots); io_mem_free(ctx->rings); io_mem_free(ctx->sq_sqes); @@ -2674,6 +2675,7 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx) io_unregister_personality(ctx, index); if (ctx->rings) io_poll_remove_all(ctx, NULL, true); + io_notif_unregister(ctx); mutex_unlock(&ctx->uring_lock); /* failed during ring init, it couldn't have issued any requests */ diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index f77e4a5403e4..7b7b63503c02 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -24,6 +24,7 @@ void io_req_complete_failed(struct io_kiocb *req, s32 res); void __io_req_complete(struct io_kiocb *req, unsigned issue_flags); void io_req_complete_post(struct io_kiocb *req); void __io_req_complete_post(struct io_kiocb *req); +bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags); bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags); void __io_commit_cqring_flush(struct io_ring_ctx *ctx); diff --git a/io_uring/notif.c b/io_uring/notif.c new file mode 100644 index 000000000000..e9e0c5566c4a --- /dev/null +++ b/io_uring/notif.c @@ -0,0 +1,102 @@ +#include +#include +#include +#include +#include +#include + +#include "io_uring.h" +#include "notif.h" + +static void __io_notif_complete_tw(struct callback_head *cb) +{ + struct io_notif *notif = container_of(cb, 
struct io_notif, task_work); + struct io_ring_ctx *ctx = notif->ctx; + + io_cq_lock(ctx); + io_fill_cqe_aux(ctx, notif->tag, 0, notif->seq); + io_cq_unlock_post(ctx); + + percpu_ref_put(&ctx->refs); + kfree(notif); +} + +static inline void io_notif_complete(struct io_notif *notif) +{ + __io_notif_complete_tw(¬if->task_work); +} + +static void io_notif_complete_wq(struct work_struct *work) +{ + struct io_notif *notif = container_of(work, struct io_notif, commit_work); + + io_notif_complete(notif); +} + +static void io_uring_tx_zerocopy_callback(struct sk_buff *skb, + struct ubuf_info *uarg, + bool success) +{ + struct io_notif *notif = container_of(uarg, struct io_notif, uarg); + + if (!refcount_dec_and_test(&uarg->refcnt)) + return; + INIT_WORK(¬if->commit_work, io_notif_complete_wq); + queue_work(system_unbound_wq, ¬if->commit_work); +} + +struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx, + struct io_notif_slot *slot) + __must_hold(&ctx->uring_lock) +{ + struct io_notif *notif; + + notif = kzalloc(sizeof(*notif), GFP_ATOMIC | __GFP_ACCOUNT); + if (!notif) + return NULL; + + notif->seq = slot->seq++; + notif->tag = slot->tag; + notif->ctx = ctx; + notif->uarg.flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN; + notif->uarg.callback = io_uring_tx_zerocopy_callback; + /* master ref owned by io_notif_slot, will be dropped on flush */ + refcount_set(¬if->uarg.refcnt, 1); + percpu_ref_get(&ctx->refs); + return notif; +} + +static void io_notif_slot_flush(struct io_notif_slot *slot) + __must_hold(&ctx->uring_lock) +{ + struct io_notif *notif = slot->notif; + + slot->notif = NULL; + + if (WARN_ON_ONCE(in_interrupt())) + return; + /* drop slot's master ref */ + if (refcount_dec_and_test(¬if->uarg.refcnt)) + io_notif_complete(notif); +} + +__cold int io_notif_unregister(struct io_ring_ctx *ctx) + __must_hold(&ctx->uring_lock) +{ + int i; + + if (!ctx->notif_slots) + return -ENXIO; + + for (i = 0; i < ctx->nr_notif_slots; i++) { + struct io_notif_slot *slot = &ctx->notif_slots[i]; + + if (slot->notif) + io_notif_slot_flush(slot); + } + + kvfree(ctx->notif_slots); + ctx->notif_slots = NULL; + ctx->nr_notif_slots = 0; + return 0; +} \ No newline at end of file diff --git a/io_uring/notif.h b/io_uring/notif.h new file mode 100644 index 000000000000..3d7a1d242e17 --- /dev/null +++ b/io_uring/notif.h @@ -0,0 +1,64 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include +#include +#include + +struct io_notif { + struct ubuf_info uarg; + struct io_ring_ctx *ctx; + + /* cqe->user_data, io_notif_slot::tag if not overridden */ + u64 tag; + /* see struct io_notif_slot::seq */ + u32 seq; + + union { + struct callback_head task_work; + struct work_struct commit_work; + }; +}; + +struct io_notif_slot { + /* + * Current/active notifier. A slot holds only one active notifier at a + * time and keeps one reference to it. Flush releases the reference and + * lazily replaces it with a new notifier. + */ + struct io_notif *notif; + + /* + * Default ->user_data for this slot notifiers CQEs + */ + u64 tag; + /* + * Notifiers of a slot live in generations, we create a new notifier + * only after flushing the previous one. 
Track the sequential number + * for all notifiers and copy it into notifiers's cqe->cflags + */ + u32 seq; +}; + +int io_notif_unregister(struct io_ring_ctx *ctx); + +struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx, + struct io_notif_slot *slot); + +static inline struct io_notif *io_get_notif(struct io_ring_ctx *ctx, + struct io_notif_slot *slot) +{ + if (!slot->notif) + slot->notif = io_alloc_notif(ctx, slot); + return slot->notif; +} + +static inline struct io_notif_slot *io_get_notif_slot(struct io_ring_ctx *ctx, + int idx) + __must_hold(&ctx->uring_lock) +{ + if (idx >= ctx->nr_notif_slots) + return NULL; + idx = array_index_nospec(idx, ctx->nr_notif_slots); + return &ctx->notif_slots[idx]; +} From patchwork Tue Jul 5 15:01:13 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12906680 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BE32CC433EF for ; Tue, 5 Jul 2022 15:02:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232576AbiGEPCz (ORCPT ); Tue, 5 Jul 2022 11:02:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54014 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232562AbiGEPC1 (ORCPT ); Tue, 5 Jul 2022 11:02:27 -0400 Received: from mail-wr1-x42d.google.com (mail-wr1-x42d.google.com [IPv6:2a00:1450:4864:20::42d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 02C9A1706F; Tue, 5 Jul 2022 08:02:05 -0700 (PDT) Received: by mail-wr1-x42d.google.com with SMTP id a5so3274630wrx.12; Tue, 05 Jul 2022 08:02:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=JlFuNzwa17IK4sUefiCWM0htTtZXIp8c8sCuHSjyyPo=; b=nJkXXMS5ZaSo0vfpAlUv17iJ/KfZJX/ijq56/Cx2W/5FVbn34fTomC9YX3h/K4Uenm +DqFO4+iFB+U53XLttzvIuLZRLXUT9az0kuY+T1lW51xVdjdZp+PJEluXVoryUaQj8ps vK1RPW6cNFzAs6CZVfgGOBE5rJq1FwfafZ8ekS4Z1rKne9kKYm0gHPGi2xkGOdMKwtuN OHVUTBs3sigNC+uN6eSGspe6eq6GAisW2I0THpJyIsmUi5rs4BOuf4ez9WNnsJxZvRAj EUWNCGEQ34jYmmuXIJ4ZkkGzBOqoa31wNjqJrKDo5wgWvDHTqNPuPefdRG0sYlmtcjx7 VIbg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=JlFuNzwa17IK4sUefiCWM0htTtZXIp8c8sCuHSjyyPo=; b=ORYOmWLZZTz2p0GcnGRxBJAWnhhXbC8dhd9pqfYyO4DRwJO/ixNAuC0r9fBSkpdIrn UfV5zaqlgf83+SAVniWLt4A0xYl1MKDAshUuymHiKUC96fgpr3xpk8wnaFZEK+E/JKiX VcPu5duoSolegKLSHjmrkhO0DyKfCtLrb9NOF5tMvLKT8Vn7nGLmuJ4GCjluqBrx/F8A 2z0HaapPpwSHB9d/7Y3+Jru5W8JrdnRmilZMdGUUOJsB7aazMvKm3nfNBPEl7uzIWwJW 7tZyVKYRajzKQSp9MV2UFeT3ZJPsy/tgtpCujQ4wRYeSxbo0xV8gAgA1qGr1FAIfbNCA 3liw== X-Gm-Message-State: AJIora+RlEBlo4E1v8EmHkdSiRsFjnC35ud7JgXcTF15OFxDMiTSAmrY BDqc0YfCGpFtUsgQmT0mJi1fhLeGiNG+Iw== X-Google-Smtp-Source: AGRyM1sv8UAk+lPdmiZ8A1yUvTEPMMUjXMgJxEMoENk/Q1F3I0i64C75EjxdbD3X6K5v9BwxwxBuww== X-Received: by 2002:a5d:4346:0:b0:21d:5dfe:b29b with SMTP id u6-20020a5d4346000000b0021d5dfeb29bmr16806951wrr.672.1657033323740; Tue, 05 Jul 2022 08:02:03 -0700 (PDT) Received: from 127.0.0.1localhost (188.28.125.106.threembb.co.uk. 
[188.28.125.106]) by smtp.gmail.com with ESMTPSA id k27-20020adfd23b000000b0021d728d687asm2518200wrh.36.2022.07.05.08.02.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Jul 2022 08:02:03 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: "David S . Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , David Ahern , kernel-team@fb.com, Pavel Begunkov Subject: [PATCH net-next v3 13/25] io_uring: export task put Date: Tue, 5 Jul 2022 16:01:13 +0100 Message-Id: <6a15bddc42ec7cc83f34e2b00be97ceea413d786.1656318994.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Signed-off-by: Pavel Begunkov --- include/linux/io_uring_types.h | 25 +++++++++++++++++++++++++ io_uring/io_uring.c | 11 +---------- io_uring/io_uring.h | 10 ++++++++++ io_uring/tctx.h | 26 -------------------------- 4 files changed, 36 insertions(+), 36 deletions(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index a64eb2558e04..26a1504ad24c 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -4,6 +4,7 @@ #include #include #include +#include #include struct io_wq_work_node { @@ -46,6 +47,30 @@ struct io_hash_table { unsigned hash_bits; }; +/* + * Arbitrary limit, can be raised if need be + */ +#define IO_RINGFD_REG_MAX 16 + +struct io_uring_task { + /* submission side */ + int cached_refs; + const struct io_ring_ctx *last; + struct io_wq *io_wq; + struct file *registered_rings[IO_RINGFD_REG_MAX]; + + struct xarray xa; + struct wait_queue_head wait; + atomic_t in_idle; + atomic_t inflight_tracked; + struct percpu_counter inflight; + + struct { /* task_work */ + struct llist_head task_list; + struct callback_head task_work; + } ____cacheline_aligned_in_smp; +}; + struct io_uring { u32 head ____cacheline_aligned_in_smp; u32 tail ____cacheline_aligned_in_smp; diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index eff4adca1813..5fbbdcad14fa 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -603,7 +603,7 @@ static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx) return ret; } -static void __io_put_task(struct task_struct *task, int nr) +void __io_put_task(struct task_struct *task, int nr) { struct io_uring_task *tctx = task->io_uring; @@ -613,15 +613,6 @@ static void __io_put_task(struct task_struct *task, int nr) put_task_struct_many(task, nr); } -/* must to be called somewhat shortly after putting a request */ -static inline void io_put_task(struct task_struct *task, int nr) -{ - if (likely(task == current)) - task->io_uring->cached_refs += nr; - else - __io_put_task(task, nr); -} - static void io_task_refs_refill(struct io_uring_task *tctx) { unsigned int refill = -tctx->cached_refs + IO_TCTX_REFS_CACHE_NR; diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index 7b7b63503c02..e978654d1b14 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -59,6 +59,7 @@ void io_wq_submit_work(struct io_wq_work *work); void io_free_req(struct io_kiocb *req); void io_queue_next(struct io_kiocb *req); +void __io_put_task(struct task_struct *task, int nr); bool io_match_task_safe(struct io_kiocb *head, struct task_struct *task, bool cancel_all); @@ -244,4 +245,13 @@ static inline void io_commit_cqring_flush(struct io_ring_ctx *ctx) __io_commit_cqring_flush(ctx); } +/* must to be called 
somewhat shortly after putting a request */ +static inline void io_put_task(struct task_struct *task, int nr) +{ + if (likely(task == current)) + task->io_uring->cached_refs += nr; + else + __io_put_task(task, nr); +} + #endif diff --git a/io_uring/tctx.h b/io_uring/tctx.h index 8a33ff6e5d91..25974beed4d6 100644 --- a/io_uring/tctx.h +++ b/io_uring/tctx.h @@ -1,31 +1,5 @@ // SPDX-License-Identifier: GPL-2.0 -#include - -/* - * Arbitrary limit, can be raised if need be - */ -#define IO_RINGFD_REG_MAX 16 - -struct io_uring_task { - /* submission side */ - int cached_refs; - const struct io_ring_ctx *last; - struct io_wq *io_wq; - struct file *registered_rings[IO_RINGFD_REG_MAX]; - - struct xarray xa; - struct wait_queue_head wait; - atomic_t in_idle; - atomic_t inflight_tracked; - struct percpu_counter inflight; - - struct { /* task_work */ - struct llist_head task_list; - struct callback_head task_work; - } ____cacheline_aligned_in_smp; -}; - struct io_tctx_node { struct list_head ctx_node; struct task_struct *task; From patchwork Tue Jul 5 15:01:14 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12906681 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5B66ACCA47B for ; Tue, 5 Jul 2022 15:03:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232681AbiGEPDB (ORCPT ); Tue, 5 Jul 2022 11:03:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53446 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230248AbiGEPC1 (ORCPT ); Tue, 5 Jul 2022 11:02:27 -0400 Received: from mail-wr1-x42e.google.com (mail-wr1-x42e.google.com [IPv6:2a00:1450:4864:20::42e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0291717040; Tue, 5 Jul 2022 08:02:05 -0700 (PDT) Received: by mail-wr1-x42e.google.com with SMTP id q9so17961048wrd.8; Tue, 05 Jul 2022 08:02:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=lC2y6heFTIszSDWCfIFDneSWEpHJYiSq2WiQM7wzmnY=; b=MOKOANxSOtxlirWG/N/isvmy1VIyEsi6v+0KsJ485FSPbos+8nR+AMs6DF4MCANC9m dL0jTsXbvKsmSmGYO8hzKRQB0nuNaJtigagAKnk4pzyNdAhkjC+U8rhf7J020AfI9lEf F2WgSCtoADxMkgIxq1+Ru+gVxk31XQv71+Fde3TMTnY7XnWEvj8zZBAPWlWZTYBQG+3V oHDXGwjQwZwZZByb15gKsF3pCRst7DpCM3dZDhsWX1cz8KFI/3UtcRikvI5khV6N8HJM 96D09dDvqCDx9du+PRp5Y6PXdf//OPGFa3tBmACDAnUMXAIWdPPHNB+dTEpqZah3cgn+ ng5g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=lC2y6heFTIszSDWCfIFDneSWEpHJYiSq2WiQM7wzmnY=; b=XkcZfT6IFnnZj6l/J5eMJZew8wUkRT/rh5r+5AiQsiQAWrMr2Gm/BB13Fbm4Pp5md7 w9Swwp6Y0KyvS3d1EQuuKDAfSOCp1NzdAUtWcduk/aQ2AzJFU38tEzah0hSbZBNEnlZs 0MoexHDMyf2oIMYc6gS+XVXIygvzKuNh/nzmzB6AYJuYDI9kivEnbXH7dvld99d5PQgU dbR9QcHmZVNiH0HyXW1my1xtFfN0tX/j4PxxveDWtWCPqbmNYW536/tyuVB3aBHgXKB8 VI4tOOzC784NDhe9NUdGfnLROkjJJLofWd9/Og8sMu2Loj5/AwRN+mH8vp4Fzecwoc0Z SCbQ== X-Gm-Message-State: AJIora90r0Weh82/dtDzr1TLmTG5k/0MFsjClCyUgnzCuHvIjy2gVSxX UTFzmdypzacSFNIyipGdVCVXO259CiC20A== X-Google-Smtp-Source: 
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v3 14/25] io_uring: cache struct io_notif
Date: Tue, 5 Jul 2022 16:01:14 +0100

kmalloc'ing struct io_notif is too expensive when done frequently, so cache notifiers as we do with many other io_uring resources. Keep two lists: the first one, from which we take notifiers, is protected by ->uring_lock; the second one, to which released notifiers are queued, is protected by ->completion_lock. One list is spliced into the other when needed.

Signed-off-by: Pavel Begunkov --- include/linux/io_uring_types.h | 7 +++++ io_uring/io_uring.c | 3 ++ io_uring/notif.c | 57 +++++++++++++++++++++++++++++----------- io_uring/notif.h | 5 +++ 4 files changed, 65 insertions(+), 7 deletions(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 26a1504ad24c..0a3fdaa368a4 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -244,6 +244,9 @@ struct io_ring_ctx { struct xarray io_bl_xa; struct list_head io_buffers_cache; + /* struct io_notif cache, protected by uring_lock */ + struct list_head notif_list; + struct io_hash_table cancel_table_locked; struct list_head cq_overflow_list; struct list_head apoll_cache; @@ -255,6 +258,10 @@ struct io_ring_ctx { struct io_wq_work_list locked_free_list; unsigned int locked_free_nr; + /* struct io_notif cache protected by completion_lock */ + struct list_head notif_list_locked; + unsigned int notif_locked_nr; + const struct cred *sq_creds; /* cred used for __io_sq_thread() */ struct io_sq_data *sq_data; /* if using sq thread polling */ diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 5fbbdcad14fa..6054e71e6ade 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -318,6 +318,8 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p) INIT_WQ_LIST(&ctx->locked_free_list); INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func); INIT_WQ_LIST(&ctx->submit_state.compl_reqs); + INIT_LIST_HEAD(&ctx->notif_list); + INIT_LIST_HEAD(&ctx->notif_list_locked); return ctx; err: kfree(ctx->dummy_ubuf); @@ -2491,6 +2493,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx) WARN_ON_ONCE(!list_empty(&ctx->ltimeout_list)); WARN_ON_ONCE(ctx->notif_slots || ctx->nr_notif_slots); + io_notif_cache_purge(ctx); io_mem_free(ctx->rings); io_mem_free(ctx->sq_sqes); diff --git a/io_uring/notif.c b/io_uring/notif.c index e9e0c5566c4a..ffbd5ce03c36 100644 --- a/io_uring/notif.c +++ b/io_uring/notif.c @@ -15,10 +15,12 @@ static void __io_notif_complete_tw(struct callback_head *cb) io_cq_lock(ctx); io_fill_cqe_aux(ctx,
notif->tag, 0, notif->seq); + + list_add(¬if->cache_node, &ctx->notif_list_locked); + ctx->notif_locked_nr++; io_cq_unlock_post(ctx); percpu_ref_put(&ctx->refs); - kfree(notif); } static inline void io_notif_complete(struct io_notif *notif) @@ -45,21 +47,62 @@ static void io_uring_tx_zerocopy_callback(struct sk_buff *skb, queue_work(system_unbound_wq, ¬if->commit_work); } +static void io_notif_splice_cached(struct io_ring_ctx *ctx) + __must_hold(&ctx->uring_lock) +{ + spin_lock(&ctx->completion_lock); + list_splice_init(&ctx->notif_list_locked, &ctx->notif_list); + ctx->notif_locked_nr = 0; + spin_unlock(&ctx->completion_lock); +} + +void io_notif_cache_purge(struct io_ring_ctx *ctx) + __must_hold(&ctx->uring_lock) +{ + io_notif_splice_cached(ctx); + + while (!list_empty(&ctx->notif_list)) { + struct io_notif *notif = list_first_entry(&ctx->notif_list, + struct io_notif, cache_node); + + list_del(¬if->cache_node); + kfree(notif); + } +} + +static inline bool io_notif_has_cached(struct io_ring_ctx *ctx) + __must_hold(&ctx->uring_lock) +{ + if (likely(!list_empty(&ctx->notif_list))) + return true; + if (data_race(READ_ONCE(ctx->notif_locked_nr) <= IO_NOTIF_SPLICE_BATCH)) + return false; + io_notif_splice_cached(ctx); + return !list_empty(&ctx->notif_list); +} + struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx, struct io_notif_slot *slot) __must_hold(&ctx->uring_lock) { struct io_notif *notif; - notif = kzalloc(sizeof(*notif), GFP_ATOMIC | __GFP_ACCOUNT); - if (!notif) - return NULL; + if (likely(io_notif_has_cached(ctx))) { + notif = list_first_entry(&ctx->notif_list, + struct io_notif, cache_node); + list_del(¬if->cache_node); + } else { + notif = kzalloc(sizeof(*notif), GFP_ATOMIC | __GFP_ACCOUNT); + if (!notif) + return NULL; + /* pre-initialise some fields */ + notif->ctx = ctx; + notif->uarg.flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN; + notif->uarg.callback = io_uring_tx_zerocopy_callback; + } notif->seq = slot->seq++; notif->tag = slot->tag; - notif->ctx = ctx; - notif->uarg.flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN; - notif->uarg.callback = io_uring_tx_zerocopy_callback; /* master ref owned by io_notif_slot, will be dropped on flush */ refcount_set(¬if->uarg.refcnt, 1); percpu_ref_get(&ctx->refs); diff --git a/io_uring/notif.h b/io_uring/notif.h index 3d7a1d242e17..b23c9c0515bb 100644 --- a/io_uring/notif.h +++ b/io_uring/notif.h @@ -5,6 +5,8 @@ #include #include +#define IO_NOTIF_SPLICE_BATCH 32 + struct io_notif { struct ubuf_info uarg; struct io_ring_ctx *ctx; @@ -13,6 +15,8 @@ struct io_notif { u64 tag; /* see struct io_notif_slot::seq */ u32 seq; + /* hook into ctx->notif_list and ctx->notif_list_locked */ + struct list_head cache_node; union { struct callback_head task_work; @@ -41,6 +45,7 @@ struct io_notif_slot { }; int io_notif_unregister(struct io_ring_ctx *ctx); +void io_notif_cache_purge(struct io_ring_ctx *ctx); struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx, struct io_notif_slot *slot); From patchwork Tue Jul 5 15:01:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12906706 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7897AC433EF for ; Tue, 5 Jul 2022 15:03:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by 
vger.kernel.org
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v3 15/25] io_uring: complete notifiers in tw
Date: Tue, 5 Jul 2022 16:01:15 +0100
Message-Id: <591b24351034d95bc4f39a3d1cbbb7132109218d.1656318994.git.asml.silence@gmail.com>

We need a task context to post CQEs, but using a wq for that is too expensive. Try to complete notifiers via task_work and fall back to wq if that fails.
Signed-off-by: Pavel Begunkov --- io_uring/notif.c | 22 +++++++++++++++++++--- io_uring/notif.h | 3 +++ 2 files changed, 22 insertions(+), 3 deletions(-) diff --git a/io_uring/notif.c b/io_uring/notif.c index ffbd5ce03c36..f795e820de56 100644 --- a/io_uring/notif.c +++ b/io_uring/notif.c @@ -13,6 +13,11 @@ static void __io_notif_complete_tw(struct callback_head *cb) struct io_notif *notif = container_of(cb, struct io_notif, task_work); struct io_ring_ctx *ctx = notif->ctx; + if (likely(notif->task)) { + io_put_task(notif->task, 1); + notif->task = NULL; + } + io_cq_lock(ctx); io_fill_cqe_aux(ctx, notif->tag, 0, notif->seq); @@ -43,6 +48,14 @@ static void io_uring_tx_zerocopy_callback(struct sk_buff *skb, if (!refcount_dec_and_test(&uarg->refcnt)) return; + + if (likely(notif->task)) { + init_task_work(¬if->task_work, __io_notif_complete_tw); + if (likely(!task_work_add(notif->task, ¬if->task_work, + TWA_SIGNAL))) + return; + } + INIT_WORK(¬if->commit_work, io_notif_complete_wq); queue_work(system_unbound_wq, ¬if->commit_work); } @@ -134,12 +147,15 @@ __cold int io_notif_unregister(struct io_ring_ctx *ctx) for (i = 0; i < ctx->nr_notif_slots; i++) { struct io_notif_slot *slot = &ctx->notif_slots[i]; - if (slot->notif) - io_notif_slot_flush(slot); + if (!slot->notif) + continue; + if (WARN_ON_ONCE(slot->notif->task)) + slot->notif->task = NULL; + io_notif_slot_flush(slot); } kvfree(ctx->notif_slots); ctx->notif_slots = NULL; ctx->nr_notif_slots = 0; return 0; -} \ No newline at end of file +} diff --git a/io_uring/notif.h b/io_uring/notif.h index b23c9c0515bb..23ca7620fff9 100644 --- a/io_uring/notif.h +++ b/io_uring/notif.h @@ -11,6 +11,9 @@ struct io_notif { struct ubuf_info uarg; struct io_ring_ctx *ctx; + /* complete via tw if ->task is non-NULL, fallback to wq otherwise */ + struct task_struct *task; + /* cqe->user_data, io_notif_slot::tag if not overridden */ u64 tag; /* see struct io_notif_slot::seq */ From patchwork Tue Jul 5 15:01:16 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12906708 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 169DECCA47B for ; Tue, 5 Jul 2022 15:04:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231690AbiGEPED (ORCPT ); Tue, 5 Jul 2022 11:04:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54142 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232707AbiGEPC3 (ORCPT ); Tue, 5 Jul 2022 11:02:29 -0400 Received: from mail-wr1-x429.google.com (mail-wr1-x429.google.com [IPv6:2a00:1450:4864:20::429]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C97A217ABE; Tue, 5 Jul 2022 08:02:09 -0700 (PDT) Received: by mail-wr1-x429.google.com with SMTP id a5so3274966wrx.12; Tue, 05 Jul 2022 08:02:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=D4NntIuJV34ug7A0X+amA/c6OF0AjApfUu1QUoGOdxk=; b=jGy8WpkxOtJxnq6C8b4FOZNH5T5316NoemiAURQDzXiFOaBdSo7638YunIA0+m9MtU x0aX7HnPdngWryxbz87mGzzi0M3FDDjZ6X7ZyxXpaxDjqljPwHxxuyeESLgmaDlrP9TQ VbtlfesDi9rNlwDjbdOj20Y0a6nHFcbGzmzmuV9QhdFSTkm296gD+mLJno6wMA4iowu7 
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v3 16/25] io_uring: add notification slot registration
Date: Tue, 5 Jul 2022 16:01:16 +0100
Message-Id: <21830bd164c444b21f6ba8b672311fb245efe752.1656318994.git.asml.silence@gmail.com>

Let userspace register and unregister notification slots.
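For illustration only (this snippet is not part of the patch), registering two slots with the API added below could look roughly like the following from userspace. The raw io_uring_register(2) syscall is used since no liburing helper exists for this at the time of the series, and the tag values are arbitrary application choices:

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/io_uring.h>

static int register_notif_slots(int ring_fd)
{
	struct io_uring_notification_slot slots[2];
	struct io_uring_notification_register reg;

	memset(slots, 0, sizeof(slots));	/* resv fields must be zero */
	slots[0].tag = 0x1000;	/* reflected as cqe->user_data for slot 0 */
	slots[1].tag = 0x1001;	/* ... and for slot 1 */

	memset(&reg, 0, sizeof(reg));
	reg.nr_slots = 2;
	reg.data = (unsigned long)slots;	/* user pointer to the slot array */

	/* io_notif_register() expects nr_args to carry sizeof(reg) */
	return syscall(__NR_io_uring_register, ring_fd,
		       IORING_REGISTER_NOTIFIERS, &reg, sizeof(reg));
}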
Signed-off-by: Pavel Begunkov --- include/uapi/linux/io_uring.h | 17 ++++++++++++++ io_uring/io_uring.c | 9 ++++++++ io_uring/notif.c | 43 +++++++++++++++++++++++++++++++++++ io_uring/notif.h | 3 +++ 4 files changed, 72 insertions(+) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 09e7c3b13d2d..9b7ea3e1018f 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -429,6 +429,10 @@ enum { /* sync cancelation API */ IORING_REGISTER_SYNC_CANCEL = 24, + /* zerocopy notification API */ + IORING_REGISTER_NOTIFIERS = 25, + IORING_UNREGISTER_NOTIFIERS = 26, + /* this goes last */ IORING_REGISTER_LAST }; @@ -475,6 +479,19 @@ struct io_uring_rsrc_update2 { __u32 resv2; }; +struct io_uring_notification_slot { + __u64 tag; + __u64 resv[3]; +}; + +struct io_uring_notification_register { + __u32 nr_slots; + __u32 resv; + __u64 resv2; + __u64 data; + __u64 resv3; +}; + /* Skip updating fd indexes set to this value in the fd table */ #define IORING_REGISTER_FILES_SKIP (-2) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 6054e71e6ade..3b885d65e569 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -3862,6 +3862,15 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode, break; ret = io_sync_cancel(ctx, arg); break; + case IORING_REGISTER_NOTIFIERS: + ret = io_notif_register(ctx, arg, nr_args); + break; + case IORING_UNREGISTER_NOTIFIERS: + ret = -EINVAL; + if (arg || nr_args) + break; + ret = io_notif_unregister(ctx); + break; default: ret = -EINVAL; break; diff --git a/io_uring/notif.c b/io_uring/notif.c index f795e820de56..2e9329f97d2c 100644 --- a/io_uring/notif.c +++ b/io_uring/notif.c @@ -157,5 +157,48 @@ __cold int io_notif_unregister(struct io_ring_ctx *ctx) kvfree(ctx->notif_slots); ctx->notif_slots = NULL; ctx->nr_notif_slots = 0; + io_notif_cache_purge(ctx); + return 0; +} + +__cold int io_notif_register(struct io_ring_ctx *ctx, + void __user *arg, unsigned int size) + __must_hold(&ctx->uring_lock) +{ + struct io_uring_notification_slot __user *slots; + struct io_uring_notification_slot slot; + struct io_uring_notification_register reg; + unsigned i; + + if (ctx->nr_notif_slots) + return -EBUSY; + if (size != sizeof(reg)) + return -EINVAL; + if (copy_from_user(®, arg, sizeof(reg))) + return -EFAULT; + if (!reg.nr_slots || reg.nr_slots > IORING_MAX_NOTIF_SLOTS) + return -EINVAL; + if (reg.resv || reg.resv2 || reg.resv3) + return -EINVAL; + + slots = u64_to_user_ptr(reg.data); + ctx->notif_slots = kvcalloc(reg.nr_slots, sizeof(ctx->notif_slots[0]), + GFP_KERNEL_ACCOUNT); + if (!ctx->notif_slots) + return -ENOMEM; + + for (i = 0; i < reg.nr_slots; i++, ctx->nr_notif_slots++) { + struct io_notif_slot *notif_slot = &ctx->notif_slots[i]; + + if (copy_from_user(&slot, &slots[i], sizeof(slot))) { + io_notif_unregister(ctx); + return -EFAULT; + } + if (slot.resv[0] | slot.resv[1] | slot.resv[2]) { + io_notif_unregister(ctx); + return -EINVAL; + } + notif_slot->tag = slot.tag; + } return 0; } diff --git a/io_uring/notif.h b/io_uring/notif.h index 23ca7620fff9..6dde39c6afbe 100644 --- a/io_uring/notif.h +++ b/io_uring/notif.h @@ -6,6 +6,7 @@ #include #define IO_NOTIF_SPLICE_BATCH 32 +#define IORING_MAX_NOTIF_SLOTS (1U << 10) struct io_notif { struct ubuf_info uarg; @@ -47,6 +48,8 @@ struct io_notif_slot { u32 seq; }; +int io_notif_register(struct io_ring_ctx *ctx, + void __user *arg, unsigned int size); int io_notif_unregister(struct io_ring_ctx *ctx); void io_notif_cache_purge(struct io_ring_ctx *ctx); From 
patchwork Tue Jul 5 15:01:17 2022
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S .
Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , David Ahern , kernel-team@fb.com, Pavel Begunkov Subject: [PATCH net-next v3 17/25] io_uring: wire send zc request type Date: Tue, 5 Jul 2022 16:01:17 +0100 Message-Id: X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Add a new io_uring opcode IORING_OP_SENDZC. The main distinction from IORING_OP_SEND is that the user should specify a notification slot index in sqe::notification_idx and the buffers are safe to reuse only when the used notification is flushed and completes. Signed-off-by: Pavel Begunkov --- include/uapi/linux/io_uring.h | 5 +++ io_uring/net.c | 84 +++++++++++++++++++++++++++++++++++ io_uring/net.h | 4 ++ io_uring/opdef.c | 15 +++++++ 4 files changed, 108 insertions(+) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 9b7ea3e1018f..0e1e179cec1d 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -62,6 +62,10 @@ struct io_uring_sqe { union { __s32 splice_fd_in; __u32 file_index; + struct { + __u16 notification_idx; + __u16 __pad; + } __attribute__((packed)); }; union { struct { @@ -193,6 +197,7 @@ enum io_uring_op { IORING_OP_GETXATTR, IORING_OP_SOCKET, IORING_OP_URING_CMD, + IORING_OP_SENDZC, /* this goes last, obviously */ IORING_OP_LAST, diff --git a/io_uring/net.c b/io_uring/net.c index d95c88d83f9f..ef492f1360c8 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -13,6 +13,7 @@ #include "io_uring.h" #include "kbuf.h" #include "net.h" +#include "notif.h" #if defined(CONFIG_NET) struct io_shutdown { @@ -58,6 +59,14 @@ struct io_sr_msg { unsigned int flags; }; +struct io_sendzc { + struct file *file; + void __user *buf; + size_t len; + u16 slot_idx; + int msg_flags; +}; + #define IO_APOLL_MULTI_POLLED (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED) int io_shutdown_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) @@ -770,4 +779,79 @@ int io_connect(struct io_kiocb *req, unsigned int issue_flags) io_req_set_res(req, ret, 0); return IOU_OK; } + +int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) +{ + struct io_sendzc *zc = io_kiocb_to_cmd(req); + + if (READ_ONCE(sqe->ioprio) || READ_ONCE(sqe->addr2) || READ_ONCE(sqe->__pad2[0])) + return -EINVAL; + + zc->buf = u64_to_user_ptr(READ_ONCE(sqe->addr)); + zc->len = READ_ONCE(sqe->len); + zc->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL; + zc->slot_idx = READ_ONCE(sqe->notification_idx); + if (zc->msg_flags & MSG_DONTWAIT) + req->flags |= REQ_F_NOWAIT; +#ifdef CONFIG_COMPAT + if (req->ctx->compat) + zc->msg_flags |= MSG_CMSG_COMPAT; +#endif + return 0; +} + +int io_sendzc(struct io_kiocb *req, unsigned int issue_flags) +{ + struct io_ring_ctx *ctx = req->ctx; + struct io_sendzc *zc = io_kiocb_to_cmd(req); + struct io_notif_slot *notif_slot; + struct io_notif *notif; + struct msghdr msg; + struct iovec iov; + struct socket *sock; + unsigned msg_flags; + int ret, min_ret = 0; + + if (issue_flags & IO_URING_F_UNLOCKED) + return -EAGAIN; + sock = sock_from_file(req->file); + if (unlikely(!sock)) + return -ENOTSOCK; + + notif_slot = io_get_notif_slot(ctx, zc->slot_idx); + if (!notif_slot) + return -EINVAL; + notif = io_get_notif(ctx, notif_slot); + if (!notif) + return -ENOMEM; + + msg.msg_name = NULL; + msg.msg_control = NULL; + msg.msg_controllen = 0; + msg.msg_namelen = 0; + msg.msg_managed_data = 0; + + ret = import_single_range(WRITE, 
zc->buf, zc->len, &iov, &msg.msg_iter); + if (unlikely(ret)) + return ret; + + msg_flags = zc->msg_flags | MSG_ZEROCOPY; + if (issue_flags & IO_URING_F_NONBLOCK) + msg_flags |= MSG_DONTWAIT; + if (msg_flags & MSG_WAITALL) + min_ret = iov_iter_count(&msg.msg_iter); + + msg.msg_flags = msg_flags; + msg.msg_ubuf = ¬if->uarg; + ret = sock_sendmsg(sock, &msg); + + if (unlikely(ret < min_ret)) { + if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK)) + return -EAGAIN; + return ret == -ERESTARTSYS ? -EINTR : ret; + } + + io_req_set_res(req, ret, 0); + return IOU_OK; +} #endif diff --git a/io_uring/net.h b/io_uring/net.h index 81d71d164770..1dba8befebb3 100644 --- a/io_uring/net.h +++ b/io_uring/net.h @@ -40,4 +40,8 @@ int io_socket(struct io_kiocb *req, unsigned int issue_flags); int io_connect_prep_async(struct io_kiocb *req); int io_connect_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe); int io_connect(struct io_kiocb *req, unsigned int issue_flags); + +int io_sendzc(struct io_kiocb *req, unsigned int issue_flags); +int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe); + #endif diff --git a/io_uring/opdef.c b/io_uring/opdef.c index 0be00db9e31c..91d425b43174 100644 --- a/io_uring/opdef.c +++ b/io_uring/opdef.c @@ -466,6 +466,21 @@ const struct io_op_def io_op_defs[] = { .issue = io_uring_cmd, .prep_async = io_uring_cmd_prep_async, }, + [IORING_OP_SENDZC] = { + .name = "SENDZC", + .needs_file = 1, + .unbound_nonreg_file = 1, + .pollout = 1, + .audit_skip = 1, + .ioprio = 1, +#if defined(CONFIG_NET) + .prep = io_sendzc_prep, + .issue = io_sendzc, +#else + .prep = io_eopnotsupp_prep, +#endif + + }, }; const char *io_uring_get_opcode(u8 opcode) From patchwork Tue Jul 5 15:01:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12906709 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D244DC433EF for ; Tue, 5 Jul 2022 15:04:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232345AbiGEPEE (ORCPT ); Tue, 5 Jul 2022 11:04:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54148 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232711AbiGEPC3 (ORCPT ); Tue, 5 Jul 2022 11:02:29 -0400 Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E0F7817E1A; Tue, 5 Jul 2022 08:02:10 -0700 (PDT) Received: by mail-wr1-x436.google.com with SMTP id f2so12596507wrr.6; Tue, 05 Jul 2022 08:02:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=OyzXlHlPpUF+PH3cvr0acLEzu0fqzPDm7G8p3eUY1iQ=; b=oY665IzMEMc069+iaFhE806eybX+Vi7JUUgzz76HgKm/BEi8RmiLw8O6vaotDBWZh4 sr7Jb9AaZLI5BkhrQO7WmZBhYcwGiX3CVB+Ikj/pQ1o3bWaMJZ2CmsCyaVxhx6YDSr4W e8uSk10DtnKW5qW7MXWlMgAryd6QG0MBYPdtpP7azjixrYMQ4bV4ip2DEr/u9O7VllAO fKHZVIR46xsXb/deGjNUeximHFKEbm/ftuNyOnndXgs3owWc4QYhAoG0cSRX/kNkeLVT LlwNyaYkeSw2SdI93JnMa6W7np8fjV2Q7t8s6HXMMGYlVPu+Fr0+rghlsiZXGH06tmY8 E8Vw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; 
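[For illustration only, not part of the patch: a minimal userspace sketch of
filling a raw SQE for this opcode, based on the uapi above. It assumes a
<linux/io_uring.h> from a kernel with this series applied; prep_sendzc() is
our own helper name, not an existing liburing API. Ring setup and submission
work exactly as for any other opcode.]

#include <string.h>
#include <linux/io_uring.h>

static void prep_sendzc(struct io_uring_sqe *sqe, int sockfd,
			const void *buf, size_t len,
			unsigned short slot_idx, int msg_flags)
{
	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = IORING_OP_SENDZC;
	sqe->fd = sockfd;
	sqe->addr = (unsigned long)buf;
	sqe->len = len;
	sqe->msg_flags = (__u32)msg_flags;
	/* which notification slot later signals "buf may be reused" */
	sqe->notification_idx = slot_idx;
}

[Unlike IORING_OP_SEND, the request CQE here does not mean the buffer is
free: it must stay untouched until the notification tied to slot_idx is
flushed and its CQE arrives.]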
From patchwork Tue Jul 5 15:01:18 2022
X-Patchwork-Id: 12906709
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn,
 Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v3 18/25] io_uring: account locked pages for non-fixed zc
Date: Tue, 5 Jul 2022 16:01:18 +0100
Message-Id: <8b9d08b0ef818070564864036419550d5d767911.1656318994.git.asml.silence@gmail.com>

Fixed buffers are RLIMIT_MEMLOCK accounted; that accounting, however,
doesn't cover iovec-based zerocopy sends. Do the accounting on the
io_uring side.
Signed-off-by: Pavel Begunkov
---
 io_uring/net.c   | 1 +
 io_uring/notif.c | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/io_uring/net.c b/io_uring/net.c
index ef492f1360c8..d5b00e07e72b 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -834,6 +834,7 @@ int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 	ret = import_single_range(WRITE, zc->buf, zc->len, &iov, &msg.msg_iter);
 	if (unlikely(ret))
 		return ret;
+	mm_account_pinned_pages(&notif->uarg.mmp, zc->len);

 	msg_flags = zc->msg_flags | MSG_ZEROCOPY;
 	if (issue_flags & IO_URING_F_NONBLOCK)
diff --git a/io_uring/notif.c b/io_uring/notif.c
index 2e9329f97d2c..0a03d04c010b 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -12,7 +12,13 @@ static void __io_notif_complete_tw(struct callback_head *cb)
 {
 	struct io_notif *notif = container_of(cb, struct io_notif, task_work);
 	struct io_ring_ctx *ctx = notif->ctx;
+	struct mmpin *mmp = &notif->uarg.mmp;

+	if (mmp->user) {
+		atomic_long_sub(mmp->num_pg, &mmp->user->locked_vm);
+		free_uid(mmp->user);
+		mmp->user = NULL;
+	}
 	if (likely(notif->task)) {
 		io_put_task(notif->task, 1);
 		notif->task = NULL;
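[For illustration only, not part of the patch: since non-fixed zc sends are
now charged against RLIMIT_MEMLOCK (and uncharged when the notification
completes), a userspace sender may want to check, and where permitted raise,
the limit before queueing long runs of zc sends. A sketch using standard
rlimit calls; the required size depends on how many sends are in flight.]

#include <sys/resource.h>

/* make sure at least `need` bytes of RLIMIT_MEMLOCK are available */
static int ensure_memlock(rlim_t need)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl))
		return -1;
	if (rl.rlim_cur >= need)
		return 0;
	/* raising the soft limit only works up to the hard limit */
	rl.rlim_cur = need < rl.rlim_max ? need : rl.rlim_max;
	return setrlimit(RLIMIT_MEMLOCK, &rl);
}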
From patchwork Tue Jul 5 15:01:19 2022
X-Patchwork-Id: 12906710
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn,
 Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v3 19/25] io_uring: allow to pass addr into sendzc
Date: Tue, 5 Jul 2022 16:01:19 +0100

Allow specifying a destination address for zerocopy sends, making them
more like sendto(2).

Signed-off-by: Pavel Begunkov
---
 include/uapi/linux/io_uring.h |  2 +-
 io_uring/net.c                | 17 ++++++++++++++++-
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 0e1e179cec1d..abb8a9502f6e 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -64,7 +64,7 @@ struct io_uring_sqe {
 		__u32	file_index;
 		struct {
 			__u16	notification_idx;
-			__u16	__pad;
+			__u16	addr_len;
 		} __attribute__((packed));
 	};
 	union {
diff --git a/io_uring/net.c b/io_uring/net.c
index d5b00e07e72b..e63dda89c222 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -65,6 +65,8 @@ struct io_sendzc {
 	size_t			len;
 	u16			slot_idx;
 	int			msg_flags;
+	int			addr_len;
+	void __user		*addr;
 };

 #define IO_APOLL_MULTI_POLLED (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED)
@@ -784,7 +786,7 @@ int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_sendzc *zc = io_kiocb_to_cmd(req);

-	if (READ_ONCE(sqe->ioprio) || READ_ONCE(sqe->addr2) || READ_ONCE(sqe->__pad2[0]))
+	if (READ_ONCE(sqe->ioprio) || READ_ONCE(sqe->__pad2[0]))
 		return -EINVAL;

 	zc->buf = u64_to_user_ptr(READ_ONCE(sqe->addr));
@@ -793,6 +795,10 @@ int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	zc->slot_idx = READ_ONCE(sqe->notification_idx);
 	if (zc->msg_flags & MSG_DONTWAIT)
 		req->flags |= REQ_F_NOWAIT;
+
+	zc->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+	zc->addr_len = READ_ONCE(sqe->addr_len);
+
 #ifdef CONFIG_COMPAT
 	if (req->ctx->compat)
 		zc->msg_flags |= MSG_CMSG_COMPAT;
@@ -802,6 +808,7 @@ int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)

 int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 {
+	struct sockaddr_storage address;
 	struct io_ring_ctx *ctx = req->ctx;
 	struct io_sendzc *zc = io_kiocb_to_cmd(req);
 	struct io_notif_slot *notif_slot;
@@ -836,6 +843,14 @@ int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 		return ret;
 	mm_account_pinned_pages(&notif->uarg.mmp, zc->len);

+	if (zc->addr) {
+		ret = move_addr_to_kernel(zc->addr, zc->addr_len, &address);
+		if (unlikely(ret < 0))
+			return ret;
+		msg.msg_name = (struct sockaddr *)&address;
+		msg.msg_namelen = zc->addr_len;
+	}
+
 	msg_flags = zc->msg_flags | MSG_ZEROCOPY;
 	if (issue_flags & IO_URING_F_NONBLOCK)
 		msg_flags |= MSG_DONTWAIT;
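[For illustration only, not part of the patch: a sketch of how userspace
could point a zc send at an explicit destination, using the addr2/addr_len
fields added above; sendzc_set_addr() is our own helper name.]

#include <netinet/in.h>

static void sendzc_set_addr(struct io_uring_sqe *sqe,
			    const struct sockaddr_in *dst)
{
	sqe->addr2 = (unsigned long)dst;	/* pointer to the sockaddr */
	sqe->addr_len = sizeof(*dst);		/* its length */
}

[The sockaddr only needs to stay valid until the request is issued: the
kernel copies it with move_addr_to_kernel() at issue time, as the diff
shows.]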
From patchwork Tue Jul 5 15:01:20 2022
X-Patchwork-Id: 12906711
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn,
 Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v3 20/25] io_uring: add rsrc referencing for notifiers
Date: Tue, 5 Jul 2022 16:01:20 +0100
Message-Id: <77556e7ac0d76a760c3a3f739fb2d177853e76c0.1656318994.git.asml.silence@gmail.com>

In preparation for zerocopy sends with fixed buffers, make notifiers
reference the rsrc node to protect the fixed buffers they use. We can't
just grab the node for the send request, as notifiers are likely to
outlive the requests that used them.

Signed-off-by: Pavel Begunkov
---
 io_uring/notif.c |  5 +++++
 io_uring/notif.h |  1 +
 io_uring/rsrc.h  | 12 +++++++++---
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/io_uring/notif.c b/io_uring/notif.c
index 0a03d04c010b..a53acdda9ec0 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -7,10 +7,12 @@

 #include "io_uring.h"
 #include "notif.h"
+#include "rsrc.h"

 static void __io_notif_complete_tw(struct callback_head *cb)
 {
 	struct io_notif *notif = container_of(cb, struct io_notif, task_work);
+	struct io_rsrc_node *rsrc_node = notif->rsrc_node;
 	struct io_ring_ctx *ctx = notif->ctx;
 	struct mmpin *mmp = &notif->uarg.mmp;

@@ -31,6 +33,7 @@ static void __io_notif_complete_tw(struct callback_head *cb)
 		ctx->notif_locked_nr++;
 	io_cq_unlock_post(ctx);

+	io_rsrc_put_node(rsrc_node, 1);
 	percpu_ref_put(&ctx->refs);
 }

@@ -125,6 +128,8 @@ struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx,
 	/* master ref owned by io_notif_slot, will be dropped on flush */
 	refcount_set(&notif->uarg.refcnt, 1);
 	percpu_ref_get(&ctx->refs);
+	notif->rsrc_node = ctx->rsrc_node;
+	io_charge_rsrc_node(ctx);
 	return notif;
 }

diff --git a/io_uring/notif.h b/io_uring/notif.h
index 6dde39c6afbe..00efe164bdc4 100644
--- a/io_uring/notif.h
+++ b/io_uring/notif.h
@@ -11,6 +11,7 @@
 struct io_notif {
 	struct ubuf_info	uarg;
 	struct io_ring_ctx	*ctx;
+	struct io_rsrc_node	*rsrc_node;

 	/* complete via tw if ->task is non-NULL, fallback to wq otherwise */
 	struct task_struct	*task;
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 87f58315b247..af342fd239d0 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -135,6 +135,13 @@ static inline void io_req_put_rsrc_locked(struct io_kiocb *req,
 	}
 }

+static inline void io_charge_rsrc_node(struct io_ring_ctx *ctx)
+{
+	ctx->rsrc_cached_refs--;
+	if (unlikely(ctx->rsrc_cached_refs < 0))
+		io_rsrc_refs_refill(ctx);
+}
+
 static inline void io_req_set_rsrc_node(struct io_kiocb *req,
 					struct io_ring_ctx *ctx,
 					unsigned int issue_flags)
@@ -144,9 +151,8 @@ static inline void io_req_set_rsrc_node(struct io_kiocb *req,

 	if (!(issue_flags & IO_URING_F_UNLOCKED)) {
 		lockdep_assert_held(&ctx->uring_lock);
-		ctx->rsrc_cached_refs--;
-		if (unlikely(ctx->rsrc_cached_refs < 0))
-			io_rsrc_refs_refill(ctx);
+
+		io_charge_rsrc_node(ctx);
 	} else {
 		percpu_ref_get(&req->rsrc_node->refs);
 	}
From patchwork Tue Jul 5 15:01:21 2022
X-Patchwork-Id: 12906712
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn,
 Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v3 21/25] io_uring: sendzc with fixed buffers
Date: Tue, 5 Jul 2022 16:01:21 +0100

Allow zerocopy sends to use fixed buffers. There is an optimisation for
this case: the network layer doesn't need to reference the pages (see
SKBFL_MANAGED_FRAG_REFS), so io_uring has to ensure the fixed buffers
stay valid until the notifier is released.
Signed-off-by: Pavel Begunkov
---
 include/uapi/linux/io_uring.h |  7 ++++++
 io_uring/net.c                | 40 +++++++++++++++++++++++++++++------
 2 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index abb8a9502f6e..2509e6184bc7 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -272,6 +272,13 @@ enum io_uring_op {
  */
 #define IORING_ACCEPT_MULTISHOT	(1U << 0)

+/*
+ * IORING_OP_SENDZC flags
+ */
+enum {
+	IORING_SENDZC_FIXED_BUF		= (1U << 0),
+};
+
 /*
  * IO completion data structure (Completion Queue Entry)
  */
diff --git a/io_uring/net.c b/io_uring/net.c
index e63dda89c222..3dfe07749b04 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -14,6 +14,7 @@
 #include "kbuf.h"
 #include "net.h"
 #include "notif.h"
+#include "rsrc.h"

 #if defined(CONFIG_NET)
 struct io_shutdown {
@@ -65,6 +66,7 @@ struct io_sendzc {
 	size_t			len;
 	u16			slot_idx;
 	int			msg_flags;
+	unsigned		zc_flags;
 	int			addr_len;
 	void __user		*addr;
 };
@@ -782,11 +784,14 @@ int io_connect(struct io_kiocb *req, unsigned int issue_flags)
 	return IOU_OK;
 }

+#define IO_SENDZC_VALID_FLAGS IORING_SENDZC_FIXED_BUF
+
 int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_sendzc *zc = io_kiocb_to_cmd(req);
+	struct io_ring_ctx *ctx = req->ctx;

-	if (READ_ONCE(sqe->ioprio) || READ_ONCE(sqe->__pad2[0]))
+	if (READ_ONCE(sqe->__pad2[0]))
 		return -EINVAL;

 	zc->buf = u64_to_user_ptr(READ_ONCE(sqe->addr));
@@ -799,6 +804,20 @@ int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	zc->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
 	zc->addr_len = READ_ONCE(sqe->addr_len);

+	zc->zc_flags = READ_ONCE(sqe->ioprio);
+	if (zc->zc_flags & ~IO_SENDZC_VALID_FLAGS)
+		return -EINVAL;
+
+	if (zc->zc_flags & IORING_SENDZC_FIXED_BUF) {
+		unsigned idx = READ_ONCE(sqe->buf_index);
+
+		if (unlikely(idx >= ctx->nr_user_bufs))
+			return -EFAULT;
+		idx = array_index_nospec(idx, ctx->nr_user_bufs);
+		req->imu = READ_ONCE(ctx->user_bufs[idx]);
+		io_req_set_rsrc_node(req, ctx, 0);
+	}
+
 #ifdef CONFIG_COMPAT
 	if (req->ctx->compat)
 		zc->msg_flags |= MSG_CMSG_COMPAT;
@@ -836,12 +855,21 @@ int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 	msg.msg_control = NULL;
 	msg.msg_controllen = 0;
 	msg.msg_namelen = 0;
-	msg.msg_managed_data = 0;
+	msg.msg_managed_data = 1;

-	ret = import_single_range(WRITE, zc->buf, zc->len, &iov, &msg.msg_iter);
-	if (unlikely(ret))
-		return ret;
-	mm_account_pinned_pages(&notif->uarg.mmp, zc->len);
+	if (zc->zc_flags & IORING_SENDZC_FIXED_BUF) {
+		ret = io_import_fixed(WRITE, &msg.msg_iter, req->imu,
+				      (u64)zc->buf, zc->len);
+		if (unlikely(ret))
+			return ret;
+	} else {
+		msg.msg_managed_data = 0;
+		ret = import_single_range(WRITE, zc->buf, zc->len, &iov,
+					  &msg.msg_iter);
+		if (unlikely(ret))
+			return ret;
+		mm_account_pinned_pages(&notif->uarg.mmp, zc->len);
+	}

 	if (zc->addr) {
 		ret = move_addr_to_kernel(zc->addr, zc->addr_len, &address);
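[For illustration only, not part of the patch: a sketch of issuing a zc send
from a fixed buffer, reusing the hypothetical prep_sendzc() helper from the
note under patch 17. buf must point into the iovec registered at buf_index
via IORING_REGISTER_BUFFERS.]

static void prep_sendzc_fixed(struct io_uring_sqe *sqe, int sockfd,
			      const void *buf, size_t len,
			      unsigned short slot_idx, unsigned buf_index)
{
	prep_sendzc(sqe, sockfd, buf, len, slot_idx, 0);
	/* zc flags travel in sqe->ioprio for this opcode */
	sqe->ioprio |= IORING_SENDZC_FIXED_BUF;
	sqe->buf_index = buf_index;
}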
From patchwork Tue Jul 5 15:01:22 2022
X-Patchwork-Id: 12906713
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn,
 Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v3 22/25] io_uring: flush notifiers after sendzc
Date: Tue, 5 Jul 2022 16:01:22 +0100
Message-Id: <4b9bec36993104ac2a1183e81eaca8cce15ffe32.1656318994.git.asml.silence@gmail.com>

Allow flushing notifiers as part of a sendzc request by setting the
IORING_SENDZC_FLUSH flag. When the sendzc request succeeds, it flushes
the used [active] notifier.
Signed-off-by: Pavel Begunkov
---
 include/uapi/linux/io_uring.h |  1 +
 io_uring/io_uring.c           | 11 +----------
 io_uring/io_uring.h           | 10 ++++++++++
 io_uring/net.c                |  4 +++-
 io_uring/notif.c              |  2 +-
 io_uring/notif.h              | 11 +++++++++++
 6 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 2509e6184bc7..2fd4e39a14d3 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -277,6 +277,7 @@ enum io_uring_op {
  */
 enum {
 	IORING_SENDZC_FIXED_BUF		= (1U << 0),
+	IORING_SENDZC_FLUSH		= (1U << 1),
 };

 /*
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 3b885d65e569..8f4152f01989 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -615,7 +615,7 @@ void __io_put_task(struct task_struct *task, int nr)
 	put_task_struct_many(task, nr);
 }

-static void io_task_refs_refill(struct io_uring_task *tctx)
+void io_task_refs_refill(struct io_uring_task *tctx)
 {
 	unsigned int refill = -tctx->cached_refs + IO_TCTX_REFS_CACHE_NR;

@@ -624,15 +624,6 @@ static void io_task_refs_refill(struct io_uring_task *tctx)
 	tctx->cached_refs += refill;
 }

-static inline void io_get_task_refs(int nr)
-{
-	struct io_uring_task *tctx = current->io_uring;
-
-	tctx->cached_refs -= nr;
-	if (unlikely(tctx->cached_refs < 0))
-		io_task_refs_refill(tctx);
-}
-
 static __cold void io_uring_drop_tctx_refs(struct task_struct *task)
 {
 	struct io_uring_task *tctx = task->io_uring;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index e978654d1b14..cf154e9c8e28 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -60,6 +60,7 @@ void io_wq_submit_work(struct io_wq_work *work);
 void io_free_req(struct io_kiocb *req);
 void io_queue_next(struct io_kiocb *req);
 void __io_put_task(struct task_struct *task, int nr);
+void io_task_refs_refill(struct io_uring_task *tctx);

 bool io_match_task_safe(struct io_kiocb *head, struct task_struct *task,
 			bool cancel_all);
@@ -254,4 +255,13 @@ static inline void io_put_task(struct task_struct *task, int nr)
 		__io_put_task(task, nr);
 }

+static inline void io_get_task_refs(int nr)
+{
+	struct io_uring_task *tctx = current->io_uring;
+
+	tctx->cached_refs -= nr;
+	if (unlikely(tctx->cached_refs < 0))
+		io_task_refs_refill(tctx);
+}
+
 #endif
diff --git a/io_uring/net.c b/io_uring/net.c
index 3dfe07749b04..3cd75d69fe70 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -784,7 +784,7 @@ int io_connect(struct io_kiocb *req, unsigned int issue_flags)
 	return IOU_OK;
 }

-#define IO_SENDZC_VALID_FLAGS IORING_SENDZC_FIXED_BUF
+#define IO_SENDZC_VALID_FLAGS (IORING_SENDZC_FIXED_BUF|IORING_SENDZC_FLUSH)

 int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
@@ -895,6 +895,8 @@ int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 		return ret == -ERESTARTSYS ? -EINTR : ret;
 	}

+	if (zc->zc_flags & IORING_SENDZC_FLUSH)
+		io_notif_slot_flush_submit(notif_slot, 0);
 	io_req_set_res(req, ret, 0);
 	return IOU_OK;
 }
diff --git a/io_uring/notif.c b/io_uring/notif.c
index a53acdda9ec0..847535d34c65 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -133,7 +133,7 @@ struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx,
 	return notif;
 }

-static void io_notif_slot_flush(struct io_notif_slot *slot)
+void io_notif_slot_flush(struct io_notif_slot *slot)
 	__must_hold(&ctx->uring_lock)
 {
 	struct io_notif *notif = slot->notif;
diff --git a/io_uring/notif.h b/io_uring/notif.h
index 00efe164bdc4..6cd73d7b965b 100644
--- a/io_uring/notif.h
+++ b/io_uring/notif.h
@@ -54,6 +54,7 @@ int io_notif_register(struct io_ring_ctx *ctx,
 int io_notif_unregister(struct io_ring_ctx *ctx);
 void io_notif_cache_purge(struct io_ring_ctx *ctx);

+void io_notif_slot_flush(struct io_notif_slot *slot);
 struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx,
 				struct io_notif_slot *slot);

@@ -74,3 +75,13 @@ static inline struct io_notif_slot *io_get_notif_slot(struct io_ring_ctx *ctx,
 	idx = array_index_nospec(idx, ctx->nr_notif_slots);
 	return &ctx->notif_slots[idx];
 }
+
+static inline void io_notif_slot_flush_submit(struct io_notif_slot *slot,
+					      unsigned int issue_flags)
+{
+	if (!(issue_flags & IO_URING_F_UNLOCKED)) {
+		slot->notif->task = current;
+		io_get_task_refs(1);
+	}
+	io_notif_slot_flush(slot);
+}
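[For illustration only, not part of the patch: a sketch of batching zc sends
and flushing the notification with the last one, again reusing the
hypothetical prep_sendzc() helper from the note under patch 17.]

static void prep_sendzc_batch(struct io_uring_sqe *sqes[], int n, int sockfd,
			      const void *buf, size_t len,
			      unsigned short slot_idx)
{
	for (int i = 0; i < n; i++) {
		prep_sendzc(sqes[i], sockfd, buf, len, slot_idx, 0);
		/* flush only on the last send of the batch */
		if (i == n - 1)
			sqes[i]->ioprio |= IORING_SENDZC_FLUSH;
	}
}

[Each flushed slot eventually posts one extra CQE, the notification itself,
carrying the slot's tag; the selftest in patch 25 tracks these with its
compl_cqes counter.]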
From patchwork Tue Jul 5 15:01:23 2022
X-Patchwork-Id: 12906714
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn,
 Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v3 23/25] io_uring: rename IORING_OP_FILES_UPDATE
Date: Tue, 5 Jul 2022 16:01:23 +0100

IORING_OP_FILES_UPDATE is about to become a more generic opcode serving
different resource types, so rename it to IORING_OP_RSRC_UPDATE and add
subtype handling.

Signed-off-by: Pavel Begunkov
---
 include/uapi/linux/io_uring.h | 12 +++++++++++-
 io_uring/opdef.c              |  9 +++++----
 io_uring/rsrc.c               | 17 +++++++++++++++--
 io_uring/rsrc.h               |  4 ++--
 4 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 2fd4e39a14d3..e62e61ceb494 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -170,7 +170,8 @@ enum io_uring_op {
 	IORING_OP_FALLOCATE,
 	IORING_OP_OPENAT,
 	IORING_OP_CLOSE,
-	IORING_OP_FILES_UPDATE,
+	IORING_OP_RSRC_UPDATE,
+	IORING_OP_FILES_UPDATE = IORING_OP_RSRC_UPDATE,
 	IORING_OP_STATX,
 	IORING_OP_READ,
 	IORING_OP_WRITE,
@@ -219,6 +220,7 @@ enum io_uring_op {
 #define IORING_TIMEOUT_ETIME_SUCCESS	(1U << 5)
 #define IORING_TIMEOUT_CLOCK_MASK	(IORING_TIMEOUT_BOOTTIME | IORING_TIMEOUT_REALTIME)
 #define IORING_TIMEOUT_UPDATE_MASK	(IORING_TIMEOUT_UPDATE | IORING_LINK_TIMEOUT_UPDATE)
+
 /*
  * sqe->splice_flags
  * extends splice(2) flags
@@ -272,6 +274,14 @@ enum io_uring_op {
  */
 #define IORING_ACCEPT_MULTISHOT	(1U << 0)

+
+/*
+ * IORING_OP_RSRC_UPDATE flags
+ */
+enum {
+	IORING_RSRC_UPDATE_FILES,
+};
+
 /*
  * IORING_OP_SENDZC flags
  */
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 91d425b43174..431b73e9b378 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -244,12 +244,13 @@ const struct io_op_def io_op_defs[] = {
 		.prep			= io_close_prep,
 		.issue			= io_close,
 	},
-	[IORING_OP_FILES_UPDATE] = {
+	[IORING_OP_RSRC_UPDATE] = {
 		.audit_skip		= 1,
 		.iopoll			= 1,
-		.name			= "FILES_UPDATE",
-		.prep			= io_files_update_prep,
-		.issue			= io_files_update,
+		.name			= "RSRC_UPDATE",
+		.prep			= io_rsrc_update_prep,
+		.issue			= io_rsrc_update,
+		.ioprio			= 1,
 	},
 	[IORING_OP_STATX] = {
 		.audit_skip		= 1,
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 3a2a5ef263f0..0c3f95f24cef 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -21,6 +21,7 @@ struct io_rsrc_update {
 	u64				arg;
 	u32				nr_args;
 	u32				offset;
+	int				type;
 };

 static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
@@ -658,7 +659,7 @@ __cold int io_register_rsrc(struct io_ring_ctx *ctx, void __user *arg,
 	return -EINVAL;
 }

-int io_files_update_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+int io_rsrc_update_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_rsrc_update *up = io_kiocb_to_cmd(req);

@@ -672,6 +673,7 @@ int io_files_update_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if (!up->nr_args)
 		return -EINVAL;
 	up->arg = READ_ONCE(sqe->addr);
+	up->type = READ_ONCE(sqe->ioprio);
 	return 0;
 }

@@ -711,7 +713,7 @@ static int io_files_update_with_index_alloc(struct io_kiocb *req,
 	return ret;
 }

-int io_files_update(struct io_kiocb *req, unsigned int issue_flags)
+static int io_files_update(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_rsrc_update *up = io_kiocb_to_cmd(req);
 	struct io_ring_ctx *ctx = req->ctx;
@@ -740,6 +742,17 @@ int io_files_update(struct io_kiocb *req, unsigned int issue_flags)
 	return IOU_OK;
 }

+int io_rsrc_update(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_rsrc_update *up = io_kiocb_to_cmd(req);
+
+	switch (up->type) {
+	case IORING_RSRC_UPDATE_FILES:
+		return io_files_update(req, issue_flags);
+	}
+	return -EINVAL;
+}
+
 int io_queue_rsrc_removal(struct io_rsrc_data *data, unsigned idx,
 			  struct io_rsrc_node *node, void *rsrc)
 {
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index af342fd239d0..21813a23215f 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -167,6 +167,6 @@ static inline u64 *io_get_tag_slot(struct io_rsrc_data *data, unsigned int idx)
 	return &data->tags[table_idx][off];
 }

-int io_files_update(struct io_kiocb *req, unsigned int issue_flags);
-int io_files_update_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+int io_rsrc_update(struct io_kiocb *req, unsigned int issue_flags);
+int io_rsrc_update_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 #endif
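[For illustration only, not part of the patch: the existing files-update
behaviour expressed through the renamed opcode. The SQE layout (addr = fds
array, len = count, off = offset) is the long-standing FILES_UPDATE ABI;
the update subtype now travels in sqe->ioprio. A sketch:]

#include <string.h>
#include <linux/io_uring.h>

static void prep_files_update(struct io_uring_sqe *sqe, int *fds,
			      unsigned nr_fds, unsigned offset)
{
	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = IORING_OP_RSRC_UPDATE;	/* == IORING_OP_FILES_UPDATE */
	sqe->addr = (unsigned long)fds;
	sqe->len = nr_fds;
	sqe->off = offset;
	sqe->ioprio = IORING_RSRC_UPDATE_FILES;	/* select the subtype */
}

[Since IORING_OP_FILES_UPDATE aliases the new opcode value and
IORING_RSRC_UPDATE_FILES is 0, old binaries keep working unchanged.]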
From patchwork Tue Jul 5 15:01:24 2022
X-Patchwork-Id: 12906715
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn,
 Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v3 24/25] io_uring: add zc notification flush requests
Date: Tue, 5 Jul 2022 16:01:24 +0100
Message-Id: <36e4a90c33718fa1ca634d5fb0352ef7177462d4.1656318994.git.asml.silence@gmail.com>

Overlay notification control onto IORING_OP_RSRC_UPDATE (formerly
IORING_OP_FILES_UPDATE). It allows flushing a range of zc notifications
from slots with indexes [sqe->off, sqe->off + sqe->len). If sqe->arg is
non-zero, it is also copied as a new tag for all flushed notifications.
Note that it doesn't flush a slot's notification if no requests have been
attached to it (since the last flush or registration).
Signed-off-by: Pavel Begunkov
---
 include/uapi/linux/io_uring.h |  1 +
 io_uring/rsrc.c               | 38 +++++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index e62e61ceb494..eeb0fbee19cb 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -280,6 +280,7 @@ enum io_uring_op {
  */
 enum {
 	IORING_RSRC_UPDATE_FILES,
+	IORING_RSRC_UPDATE_NOTIF,
 };

 /*
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 0c3f95f24cef..af58d58dd21b 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -15,6 +15,7 @@
 #include "io_uring.h"
 #include "openclose.h"
 #include "rsrc.h"
+#include "notif.h"

 struct io_rsrc_update {
 	struct file			*file;
@@ -742,6 +743,41 @@ static int io_files_update(struct io_kiocb *req, unsigned int issue_flags)
 	return IOU_OK;
 }

+static int io_notif_update(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_rsrc_update *up = io_kiocb_to_cmd(req);
+	struct io_ring_ctx *ctx = req->ctx;
+	unsigned len = up->nr_args;
+	unsigned idx_end, idx = up->offset;
+	int ret = 0;
+
+	io_ring_submit_lock(ctx, issue_flags);
+	if (unlikely(check_add_overflow(idx, len, &idx_end))) {
+		ret = -EOVERFLOW;
+		goto out;
+	}
+	if (unlikely(idx_end > ctx->nr_notif_slots)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	for (; idx < idx_end; idx++) {
+		struct io_notif_slot *slot = &ctx->notif_slots[idx];
+
+		if (!slot->notif)
+			continue;
+		if (up->arg)
+			slot->tag = up->arg;
+		io_notif_slot_flush_submit(slot, issue_flags);
+	}
+out:
+	io_ring_submit_unlock(ctx, issue_flags);
+	if (ret < 0)
+		req_set_fail(req);
+	io_req_set_res(req, ret, 0);
+	return IOU_OK;
+}
+
 int io_rsrc_update(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_rsrc_update *up = io_kiocb_to_cmd(req);
@@ -749,6 +785,8 @@ int io_rsrc_update(struct io_kiocb *req, unsigned int issue_flags)
 	switch (up->type) {
 	case IORING_RSRC_UPDATE_FILES:
 		return io_files_update(req, issue_flags);
+	case IORING_RSRC_UPDATE_NOTIF:
+		return io_notif_update(req, issue_flags);
 	}
 	return -EINVAL;
 }
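[For illustration only, not part of the patch: a sketch of flushing a range
of notification slots and retagging them, mirroring the sqe->off/len/arg
layout described in the commit message; the helper name is ours.]

#include <string.h>
#include <linux/io_uring.h>

static void prep_notif_flush(struct io_uring_sqe *sqe, unsigned first_slot,
			     unsigned nr_slots, __u64 new_tag)
{
	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = IORING_OP_RSRC_UPDATE;
	sqe->ioprio = IORING_RSRC_UPDATE_NOTIF;	/* update subtype */
	sqe->off = first_slot;			/* first slot to flush */
	sqe->len = nr_slots;			/* flush [off, off + len) */
	sqe->addr = new_tag;			/* 0: keep each slot's tag */
}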
From patchwork Tue Jul 5 15:01:25 2022
X-Patchwork-Id: 12906716
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn,
 Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v3 25/25] selftests/io_uring: test zerocopy send
Date: Tue, 5 Jul 2022 16:01:25 +0100

Add selftests for io_uring zerocopy sends and io_uring's notification
infrastructure. The test is largely influenced by msg_zerocopy and uses
it on the receive side.
Signed-off-by: Pavel Begunkov --- tools/testing/selftests/net/Makefile | 1 + .../selftests/net/io_uring_zerocopy_tx.c | 605 ++++++++++++++++++ .../selftests/net/io_uring_zerocopy_tx.sh | 131 ++++ 3 files changed, 737 insertions(+) create mode 100644 tools/testing/selftests/net/io_uring_zerocopy_tx.c create mode 100755 tools/testing/selftests/net/io_uring_zerocopy_tx.sh diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index 7ea54af55490..51261483744e 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -59,6 +59,7 @@ TEST_GEN_FILES += toeplitz TEST_GEN_FILES += cmsg_sender TEST_GEN_FILES += stress_reuseport_listen TEST_PROGS += test_vxlan_vnifiltering.sh +TEST_GEN_FILES += io_uring_zerocopy_tx TEST_FILES := settings diff --git a/tools/testing/selftests/net/io_uring_zerocopy_tx.c b/tools/testing/selftests/net/io_uring_zerocopy_tx.c new file mode 100644 index 000000000000..899ddc84f8a9 --- /dev/null +++ b/tools/testing/selftests/net/io_uring_zerocopy_tx.c @@ -0,0 +1,605 @@ +/* SPDX-License-Identifier: MIT */ +/* based on linux-kernel/tools/testing/selftests/net/msg_zerocopy.c */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define NOTIF_TAG 0xfffffffULL +#define NONZC_TAG 0 +#define ZC_TAG 1 + +enum { + MODE_NONZC = 0, + MODE_ZC = 1, + MODE_ZC_FIXED = 2, + MODE_MIXED = 3, +}; + +static bool cfg_flush = false; +static bool cfg_cork = false; +static int cfg_mode = MODE_ZC_FIXED; +static int cfg_nr_reqs = 8; +static int cfg_family = PF_UNSPEC; +static int cfg_payload_len; +static int cfg_port = 8000; +static int cfg_runtime_ms = 4200; + +static socklen_t cfg_alen; +static struct sockaddr_storage cfg_dst_addr; + +static char payload[IP_MAXPACKET] __attribute__((aligned(4096))); + +struct io_sq_ring { + unsigned *head; + unsigned *tail; + unsigned *ring_mask; + unsigned *ring_entries; + unsigned *flags; + unsigned *array; +}; + +struct io_cq_ring { + unsigned *head; + unsigned *tail; + unsigned *ring_mask; + unsigned *ring_entries; + struct io_uring_cqe *cqes; +}; + +struct io_uring_sq { + unsigned *khead; + unsigned *ktail; + unsigned *kring_mask; + unsigned *kring_entries; + unsigned *kflags; + unsigned *kdropped; + unsigned *array; + struct io_uring_sqe *sqes; + + unsigned sqe_head; + unsigned sqe_tail; + + size_t ring_sz; +}; + +struct io_uring_cq { + unsigned *khead; + unsigned *ktail; + unsigned *kring_mask; + unsigned *kring_entries; + unsigned *koverflow; + struct io_uring_cqe *cqes; + + size_t ring_sz; +}; + +struct io_uring { + struct io_uring_sq sq; + struct io_uring_cq cq; + int ring_fd; +}; + +#ifdef __alpha__ +# ifndef __NR_io_uring_setup +# define __NR_io_uring_setup 535 +# endif +# ifndef __NR_io_uring_enter +# define __NR_io_uring_enter 536 +# endif +# ifndef __NR_io_uring_register +# define __NR_io_uring_register 537 +# endif +#else /* !__alpha__ */ +# ifndef __NR_io_uring_setup +# define __NR_io_uring_setup 425 +# endif +# ifndef __NR_io_uring_enter +# define __NR_io_uring_enter 426 +# endif +# ifndef __NR_io_uring_register +# define __NR_io_uring_register 427 +# endif +#endif + +#if defined(__x86_64) || defined(__i386__) +#define read_barrier() __asm__ __volatile__("":::"memory") +#define 
write_barrier() __asm__ __volatile__("":::"memory") +#else + +#define read_barrier() __sync_synchronize() +#define write_barrier() __sync_synchronize() +#endif + +static int io_uring_setup(unsigned int entries, struct io_uring_params *p) +{ + return syscall(__NR_io_uring_setup, entries, p); +} + +static int io_uring_enter(int fd, unsigned int to_submit, + unsigned int min_complete, + unsigned int flags, sigset_t *sig) +{ + return syscall(__NR_io_uring_enter, fd, to_submit, min_complete, + flags, sig, _NSIG / 8); +} + +static int io_uring_register_buffers(struct io_uring *ring, + const struct iovec *iovecs, + unsigned nr_iovecs) +{ + int ret; + + ret = syscall(__NR_io_uring_register, ring->ring_fd, + IORING_REGISTER_BUFFERS, iovecs, nr_iovecs); + return (ret < 0) ? -errno : ret; +} + +static int io_uring_register_notifications(struct io_uring *ring, + unsigned nr, + struct io_uring_notification_slot *slots) +{ + int ret; + struct io_uring_notification_register r = { + .nr_slots = nr, + .data = (unsigned long)slots, + }; + + ret = syscall(__NR_io_uring_register, ring->ring_fd, + IORING_REGISTER_NOTIFIERS, &r, sizeof(r)); + return (ret < 0) ? -errno : ret; +} + +static int io_uring_mmap(int fd, struct io_uring_params *p, + struct io_uring_sq *sq, struct io_uring_cq *cq) +{ + size_t size; + void *ptr; + int ret; + + sq->ring_sz = p->sq_off.array + p->sq_entries * sizeof(unsigned); + ptr = mmap(0, sq->ring_sz, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQ_RING); + if (ptr == MAP_FAILED) + return -errno; + sq->khead = ptr + p->sq_off.head; + sq->ktail = ptr + p->sq_off.tail; + sq->kring_mask = ptr + p->sq_off.ring_mask; + sq->kring_entries = ptr + p->sq_off.ring_entries; + sq->kflags = ptr + p->sq_off.flags; + sq->kdropped = ptr + p->sq_off.dropped; + sq->array = ptr + p->sq_off.array; + + size = p->sq_entries * sizeof(struct io_uring_sqe); + sq->sqes = mmap(0, size, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQES); + if (sq->sqes == MAP_FAILED) { + ret = -errno; +err: + munmap(sq->khead, sq->ring_sz); + return ret; + } + + cq->ring_sz = p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe); + ptr = mmap(0, cq->ring_sz, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_CQ_RING); + if (ptr == MAP_FAILED) { + ret = -errno; + munmap(sq->sqes, p->sq_entries * sizeof(struct io_uring_sqe)); + goto err; + } + cq->khead = ptr + p->cq_off.head; + cq->ktail = ptr + p->cq_off.tail; + cq->kring_mask = ptr + p->cq_off.ring_mask; + cq->kring_entries = ptr + p->cq_off.ring_entries; + cq->koverflow = ptr + p->cq_off.overflow; + cq->cqes = ptr + p->cq_off.cqes; + return 0; +} + +static int io_uring_queue_init(unsigned entries, struct io_uring *ring, + unsigned flags) +{ + struct io_uring_params p; + int fd, ret; + + memset(ring, 0, sizeof(*ring)); + memset(&p, 0, sizeof(p)); + p.flags = flags; + + fd = io_uring_setup(entries, &p); + if (fd < 0) + return fd; + ret = io_uring_mmap(fd, &p, &ring->sq, &ring->cq); + if (!ret) + ring->ring_fd = fd; + else + close(fd); + return ret; +} + +static int io_uring_submit(struct io_uring *ring) +{ + struct io_uring_sq *sq = &ring->sq; + const unsigned mask = *sq->kring_mask; + unsigned ktail, submitted, to_submit; + int ret; + + read_barrier(); + if (*sq->khead != *sq->ktail) { + submitted = *sq->kring_entries; + goto submit; + } + if (sq->sqe_head == sq->sqe_tail) + return 0; + + ktail = *sq->ktail; + to_submit = sq->sqe_tail - sq->sqe_head; + for (submitted = 0; submitted < to_submit; submitted++) { 
+static int io_uring_submit(struct io_uring *ring)
+{
+	struct io_uring_sq *sq = &ring->sq;
+	const unsigned mask = *sq->kring_mask;
+	unsigned ktail, submitted, to_submit;
+	int ret;
+
+	read_barrier();
+	if (*sq->khead != *sq->ktail) {
+		submitted = *sq->kring_entries;
+		goto submit;
+	}
+	if (sq->sqe_head == sq->sqe_tail)
+		return 0;
+
+	ktail = *sq->ktail;
+	to_submit = sq->sqe_tail - sq->sqe_head;
+	for (submitted = 0; submitted < to_submit; submitted++) {
+		read_barrier();
+		sq->array[ktail++ & mask] = sq->sqe_head++ & mask;
+	}
+	if (!submitted)
+		return 0;
+
+	if (*sq->ktail != ktail) {
+		write_barrier();
+		*sq->ktail = ktail;
+		write_barrier();
+	}
+submit:
+	ret = io_uring_enter(ring->ring_fd, submitted, 0,
+			     IORING_ENTER_GETEVENTS, NULL);
+	return ret < 0 ? -errno : ret;
+}
+
+static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd,
+				      const void *buf, size_t len, int flags)
+{
+	memset(sqe, 0, sizeof(*sqe));
+	sqe->opcode = (__u8) IORING_OP_SEND;
+	sqe->fd = sockfd;
+	sqe->addr = (unsigned long) buf;
+	sqe->len = len;
+	sqe->msg_flags = (__u32) flags;
+}
+
+static inline void io_uring_prep_sendzc(struct io_uring_sqe *sqe, int sockfd,
+					const void *buf, size_t len, int flags,
+					unsigned slot_idx, unsigned zc_flags)
+{
+	io_uring_prep_send(sqe, sockfd, buf, len, flags);
+	sqe->opcode = (__u8) IORING_OP_SENDZC;
+	sqe->notification_idx = slot_idx;
+	sqe->ioprio = zc_flags;
+}
+
+static struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring)
+{
+	struct io_uring_sq *sq = &ring->sq;
+
+	if (sq->sqe_tail + 1 - sq->sqe_head > *sq->kring_entries)
+		return NULL;
+	return &sq->sqes[sq->sqe_tail++ & *sq->kring_mask];
+}
+
+static int io_uring_wait_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_ptr)
+{
+	struct io_uring_cq *cq = &ring->cq;
+	const unsigned mask = *cq->kring_mask;
+	unsigned head = *cq->khead;
+	int ret;
+
+	*cqe_ptr = NULL;
+	do {
+		read_barrier();
+		if (head != *cq->ktail) {
+			*cqe_ptr = &cq->cqes[head & mask];
+			break;
+		}
+		ret = io_uring_enter(ring->ring_fd, 0, 1,
+				     IORING_ENTER_GETEVENTS, NULL);
+		if (ret < 0)
+			return -errno;
+	} while (1);
+
+	return 0;
+}
+
+static inline void io_uring_cqe_seen(struct io_uring *ring)
+{
+	*(&ring->cq)->khead += 1;
+	write_barrier();
+}
+
+static unsigned long gettimeofday_ms(void)
+{
+	struct timeval tv;
+
+	gettimeofday(&tv, NULL);
+	return (tv.tv_sec * 1000) + (tv.tv_usec / 1000);
+}
+
+static void do_setsockopt(int fd, int level, int optname, int val)
+{
+	if (setsockopt(fd, level, optname, &val, sizeof(val)))
+		error(1, errno, "setsockopt %d.%d: %d", level, optname, val);
+}
+
+static int do_setup_tx(int domain, int type, int protocol)
+{
+	int fd;
+
+	fd = socket(domain, type, protocol);
+	if (fd == -1)
+		error(1, errno, "socket t");
+
+	do_setsockopt(fd, SOL_SOCKET, SO_SNDBUF, 1 << 21);
+
+	if (connect(fd, (void *) &cfg_dst_addr, cfg_alen))
+		error(1, errno, "connect");
+	return fd;
+}
+
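+/*
+ * Main transmit loop: one notification slot (tagged NOTIF_TAG) and the
+ * payload iovec are registered up front; each iteration queues
+ * cfg_nr_reqs requests as plain sends, zerocopy sends, or zerocopy
+ * sends from the fixed buffer, submits them, and reaps the CQEs.
+ * When cfg_flush is set, every zerocopy request additionally posts one
+ * notification CQE, tracked via compl_cqes and drained before exit.
+ */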
+static void do_tx(int domain, int type, int protocol)
+{
+	struct io_uring_notification_slot b[1] = {{.tag = NOTIF_TAG}};
+	struct io_uring_sqe *sqe;
+	struct io_uring_cqe *cqe;
+	unsigned long packets = 0, bytes = 0;
+	struct io_uring ring;
+	struct iovec iov;
+	uint64_t tstop;
+	int i, fd, ret;
+	int compl_cqes = 0;
+
+	fd = do_setup_tx(domain, type, protocol);
+
+	ret = io_uring_queue_init(512, &ring, 0);
+	if (ret)
+		error(1, ret, "io_uring: queue init");
+
+	ret = io_uring_register_notifications(&ring, 1, b);
+	if (ret)
+		error(1, ret, "io_uring: tx ctx registration");
+
+	iov.iov_base = payload;
+	iov.iov_len = cfg_payload_len;
+
+	ret = io_uring_register_buffers(&ring, &iov, 1);
+	if (ret)
+		error(1, ret, "io_uring: buffer registration");
+
+	tstop = gettimeofday_ms() + cfg_runtime_ms;
+	do {
+		if (cfg_cork)
+			do_setsockopt(fd, IPPROTO_UDP, UDP_CORK, 1);
+
+		for (i = 0; i < cfg_nr_reqs; i++) {
+			unsigned zc_flags = 0;
+			unsigned buf_idx = 0;
+			unsigned slot_idx = 0;
+			unsigned mode = cfg_mode;
+			unsigned msg_flags = 0;
+
+			if (cfg_mode == MODE_MIXED)
+				mode = rand() % 3;
+
+			sqe = io_uring_get_sqe(&ring);
+
+			if (mode == MODE_NONZC) {
+				io_uring_prep_send(sqe, fd, payload,
+						   cfg_payload_len, msg_flags);
+				sqe->user_data = NONZC_TAG;
+			} else {
+				if (cfg_flush) {
+					zc_flags |= IORING_SENDZC_FLUSH;
+					compl_cqes++;
+				}
+				io_uring_prep_sendzc(sqe, fd, payload,
+						     cfg_payload_len,
+						     msg_flags, slot_idx, zc_flags);
+				if (mode == MODE_ZC_FIXED) {
+					sqe->ioprio |= IORING_SENDZC_FIXED_BUF;
+					sqe->buf_index = buf_idx;
+				}
+				sqe->user_data = ZC_TAG;
+			}
+		}
+
+		ret = io_uring_submit(&ring);
+		if (ret != cfg_nr_reqs)
+			error(1, ret, "submit");
+
+		for (i = 0; i < cfg_nr_reqs; i++) {
+			ret = io_uring_wait_cqe(&ring, &cqe);
+			if (ret)
+				error(1, ret, "wait cqe");
+
+			if (cqe->user_data == NOTIF_TAG) {
+				compl_cqes--;
+				i--;
+			} else if (cqe->user_data != NONZC_TAG &&
+				   cqe->user_data != ZC_TAG) {
+				error(1, cqe->res, "invalid user_data");
+			} else if (cqe->res <= 0 && cqe->res != -EAGAIN) {
+				error(1, cqe->res, "send failed");
+			} else {
+				if (cqe->res > 0) {
+					packets++;
+					bytes += cqe->res;
+				}
+				/* failed requests don't flush */
+				if (cfg_flush &&
+				    cqe->res <= 0 &&
+				    cqe->user_data == ZC_TAG)
+					compl_cqes--;
+			}
+			io_uring_cqe_seen(&ring);
+		}
+		if (cfg_cork)
+			do_setsockopt(fd, IPPROTO_UDP, UDP_CORK, 0);
+	} while (gettimeofday_ms() < tstop);
+
+	if (close(fd))
+		error(1, errno, "close");
+
+	fprintf(stderr, "tx=%lu (MB=%lu), tx/s=%lu (MB/s=%lu)\n",
+		packets, bytes >> 20,
+		packets / (cfg_runtime_ms / 1000),
+		(bytes >> 20) / (cfg_runtime_ms / 1000));
+
+	while (compl_cqes) {
+		ret = io_uring_wait_cqe(&ring, &cqe);
+		if (ret)
+			error(1, ret, "wait cqe");
+		io_uring_cqe_seen(&ring);
+		compl_cqes--;
+	}
+}
+
+static void do_test(int domain, int type, int protocol)
+{
+	int i;
+
+	for (i = 0; i < IP_MAXPACKET; i++)
+		payload[i] = 'a' + (i % 26);
+	do_tx(domain, type, protocol);
+}
+
+static void usage(const char *filepath)
+{
+	error(1, 0, "Usage: %s [-f] [-n] [-z0] [-s] "
+		    "(-4|-6) [-t