From patchwork Thu Jul 7 11:49:32 2022
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 01/27] ipv4: avoid partial copy for zc
Date: Thu, 7 Jul 2022 12:49:32 +0100
Message-Id: <0eb1cb5746e9ac938a7ba7848b33ccf680d30030.1657194434.git.asml.silence@gmail.com>

Even when zerocopy transmission is requested and possible,
__ip_append_data() still copies a small chunk of data just because it
allocated some extra linear space (e.g. 148 bytes). This wastes CPU
cycles on the copy and on iov_iter manipulation, and it also misaligns
potentially aligned data. Avoid such copies. As a bonus, we can
allocate a smaller skb.
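As a rough illustration of where the 148 bytes come from, here is a hypothetical user-space model (not kernel code) of the two else-branches changed in the last hunk of this patch: the old min_t(int, fraglen, MAX_HEADER) split and the new zerocopy split. It assumes MAX_HEADER is 176, a config-dependent value picked only because it reproduces the 148-byte figure above for a 20-byte IPv4 header plus an 8-byte UDP header. It also assumes fraggap and alloc_extra are zero, and that datalen includes the transport header, as it does in __ip_append_data(). The names struct frag_split and split_frag() are made up for the example.

```c
#include <stdbool.h>

/* Assumption: stand-in for the kernel's config-dependent MAX_HEADER.
 * 176 reproduces the 148/128-byte figures quoted in the IPv4/IPv6
 * commit messages (176 - 20 - 8 and 176 - 40 - 8). */
#define ASSUMED_MAX_HEADER 176

struct frag_split {
	int alloclen;	/* linear bytes: IP header, transport header, copied data */
	int pagedlen;	/* bytes carried in page frags */
	int copied;	/* payload bytes copied into the linear area */
};

/*
 * Hypothetical model of the alloclen/pagedlen choice on the "paged"
 * path of __ip_append_data(). fraggap and alloc_extra are assumed to
 * be 0; datalen includes transhdrlen.
 */
static struct frag_split split_frag(int datalen, int fragheaderlen,
				    int transhdrlen, bool zc)
{
	struct frag_split s;
	int fraglen = datalen + fragheaderlen;

	if (!zc) {
		/* old behaviour: fill the linear area up to MAX_HEADER */
		s.alloclen = fraglen < ASSUMED_MAX_HEADER ?
			     fraglen : ASSUMED_MAX_HEADER;
		s.pagedlen = fraglen - s.alloclen;
	} else {
		/* zc: the linear area holds only the headers */
		s.alloclen = fragheaderlen + transhdrlen;
		s.pagedlen = datalen - transhdrlen;
	}
	/* same formula as "copy = datalen - transhdrlen - fraggap - pagedlen" */
	s.copied = datalen - transhdrlen - s.pagedlen;
	return s;
}
```

With a 1400-byte UDP payload (datalen = 1408), split_frag(1408, 20, 8, false) yields alloclen = 176 with 148 payload bytes copied, whereas split_frag(1408, 20, 8, true) yields alloclen = 28 and copies nothing. With a 40-byte IPv6 header the old split copies 128 bytes, the figure quoted in the IPv6 patch.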

Signed-off-by: Pavel Begunkov
---
 net/ipv4/ip_output.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 00b4bf26fd93..581d1e233260 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -969,7 +969,6 @@ static int __ip_append_data(struct sock *sk,
 	struct inet_sock *inet = inet_sk(sk);
 	struct ubuf_info *uarg = NULL;
 	struct sk_buff *skb;
-
 	struct ip_options *opt = cork->opt;
 	int hh_len;
 	int exthdrlen;
@@ -977,6 +976,7 @@ static int __ip_append_data(struct sock *sk,
 	int copy;
 	int err;
 	int offset = 0;
+	bool zc = false;
 	unsigned int maxfraglen, fragheaderlen, maxnonfragsize;
 	int csummode = CHECKSUM_NONE;
 	struct rtable *rt = (struct rtable *)cork->dst;
@@ -1025,6 +1025,7 @@ static int __ip_append_data(struct sock *sk,
 		if (rt->dst.dev->features & NETIF_F_SG &&
 		    csummode == CHECKSUM_PARTIAL) {
 			paged = true;
+			zc = true;
 		} else {
 			uarg->zerocopy = 0;
 			skb_zcopy_set(skb, uarg, &extra_uref);
@@ -1091,9 +1092,12 @@ static int __ip_append_data(struct sock *sk,
 				 (fraglen + alloc_extra < SKB_MAX_ALLOC ||
 				  !(rt->dst.dev->features & NETIF_F_SG)))
 				alloclen = fraglen;
-			else {
+			else if (!zc) {
 				alloclen = min_t(int, fraglen, MAX_HEADER);
 				pagedlen = fraglen - alloclen;
+			} else {
+				alloclen = fragheaderlen + transhdrlen;
+				pagedlen = datalen - transhdrlen;
 			}
 
 			alloclen += alloc_extra;

From patchwork Thu Jul 7 11:49:33 2022
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 02/27] ipv6: avoid partial copy for zc
Date: Thu, 7 Jul 2022 12:49:33 +0100
Message-Id: <899f19034c94ce4ce75464df132edf1b3a192ebd.1657194434.git.asml.silence@gmail.com>

Even when zerocopy transmission is requested and possible,
__ip6_append_data() still copies a small chunk of data just because it
allocated some extra linear space (e.g. 128 bytes). This wastes CPU
cycles on the copy and on iov_iter manipulation, and it also misaligns
potentially aligned data. Avoid such copies. As a bonus, we can
allocate a smaller skb.
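In both the IPv4 and IPv6 versions, zc is set only in the branch that handles a MSG_ZEROCOPY send, and only when the request can really be honoured: the output device must support scatter-gather (NETIF_F_SG) and the checksum must be offloaded (CHECKSUM_PARTIAL). Otherwise uarg->zerocopy is cleared and the data is copied, so the header-only allocation never applies. The following hypothetical model uses plain booleans in place of the kernel's feature and checksum tests; struct zc_decision and decide_zc() are invented names, and paged starts out false here for simplicity, although in the kernel it can already be set (for example by UDP GSO).

```c
#include <stdbool.h>

/*
 * Hypothetical model of the zerocopy eligibility branch touched by
 * these patches. The booleans stand in for
 * "rt->dst.dev->features & NETIF_F_SG" and
 * "csummode == CHECKSUM_PARTIAL".
 */
struct zc_decision {
	bool paged;	/* payload may be placed in page frags */
	bool zc;	/* new flag: frags reference user pages, nothing copied */
	bool zerocopy;	/* models uarg->zerocopy */
};

static struct zc_decision decide_zc(bool dev_has_sg, bool csum_partial)
{
	struct zc_decision d = { .paged = false, .zc = false, .zerocopy = true };

	if (dev_has_sg && csum_partial) {
		d.paged = true;
		d.zc = true;
	} else {
		/* zerocopy not possible: the kernel clears uarg->zerocopy
		 * and falls back to copying the data */
		d.zerocopy = false;
	}
	return d;
}
```

Keeping zc separate from paged matters because paged can be true without zerocopy, and only the zerocopy case may skip copying into the linear area.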

Signed-off-by: Pavel Begunkov
---
 net/ipv6/ip6_output.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 77e3f5970ce4..fc74ce3ed8cc 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1464,6 +1464,7 @@ static int __ip6_append_data(struct sock *sk,
 	int copy;
 	int err;
 	int offset = 0;
+	bool zc = false;
 	u32 tskey = 0;
 	struct rt6_info *rt = (struct rt6_info *)cork->dst;
 	struct ipv6_txoptions *opt = v6_cork->opt;
@@ -1549,6 +1550,7 @@ static int __ip6_append_data(struct sock *sk,
 		if (rt->dst.dev->features & NETIF_F_SG &&
 		    csummode == CHECKSUM_PARTIAL) {
 			paged = true;
+			zc = true;
 		} else {
 			uarg->zerocopy = 0;
 			skb_zcopy_set(skb, uarg, &extra_uref);
@@ -1630,9 +1632,12 @@ static int __ip6_append_data(struct sock *sk,
 				 (fraglen + alloc_extra < SKB_MAX_ALLOC ||
 				  !(rt->dst.dev->features & NETIF_F_SG)))
 				alloclen = fraglen;
-			else {
+			else if (!zc) {
 				alloclen = min_t(int, fraglen, MAX_HEADER);
 				pagedlen = fraglen - alloclen;
+			} else {
+				alloclen = fragheaderlen + transhdrlen;
+				pagedlen = datalen - transhdrlen;
 			}
 
 			alloclen += alloc_extra;

From patchwork Thu Jul 7 11:49:34 2022
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 03/27] skbuff: don't mix ubuf_info from different sources
Date: Thu, 7 Jul 2022 12:49:34 +0100
Message-Id: <8fc991e842a43fef95b09f2d387567d06999c11c.1657194434.git.asml.silence@gmail.com>

We should not append MSG_ZEROCOPY requests to an skbuff that carries a
non-MSG_ZEROCOPY ubuf_info, as the two might not be compatible.

Signed-off-by: Pavel Begunkov
---
 net/core/skbuff.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5b3559cb1d82..09f56bfa2771 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1212,6 +1212,10 @@ struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size,
 		const u32 byte_limit = 1 << 19;		/* limit to a few TSO */
 		u32 bytelen, next;
 
+		/* there might be non MSG_ZEROCOPY users */
+		if (uarg->callback != msg_zerocopy_callback)
+			return NULL;
+
 		/* realloc only when socket is locked (TCP, UDP cork),
 		 * so uarg->len and sk_zckey access is serialized
 		 */

From patchwork Thu Jul 7 11:49:35 2022
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 04/27] skbuff: add SKBFL_DONT_ORPHAN flag
Date: Thu, 7 Jul 2022 12:49:35 +0100

We don't want to list every single ubuf_info callback in
skb_orphan_frags(), so add a flag controlling the behaviour.

Signed-off-by: Pavel Begunkov
---
 include/linux/skbuff.h | 8 +++++---
 net/core/skbuff.c      | 2 +-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index d3d10556f0fa..8e12b3b9ad6c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -686,10 +686,13 @@ enum {
 	 * charged to the kernel memory.
 	 */
 	SKBFL_PURE_ZEROCOPY = BIT(2),
+
+	SKBFL_DONT_ORPHAN = BIT(3),
 };
 
 #define SKBFL_ZEROCOPY_FRAG	(SKBFL_ZEROCOPY_ENABLE | SKBFL_SHARED_FRAG)
-#define SKBFL_ALL_ZEROCOPY	(SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY)
+#define SKBFL_ALL_ZEROCOPY	(SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY | \
+				 SKBFL_DONT_ORPHAN)
 
 /*
  * The callback notifies userspace to release buffers when skb DMA is done in
@@ -3182,8 +3185,7 @@ static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
 {
 	if (likely(!skb_zcopy(skb)))
 		return 0;
-	if (!skb_zcopy_is_nouarg(skb) &&
-	    skb_uarg(skb)->callback == msg_zerocopy_callback)
+	if (skb_shinfo(skb)->flags & SKBFL_DONT_ORPHAN)
 		return 0;
 	return skb_copy_ubufs(skb, gfp_mask);
 }
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 09f56bfa2771..fc22b3d32052 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1193,7 +1193,7 @@ static struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size)
 	uarg->len = 1;
 	uarg->bytelen = size;
 	uarg->zerocopy = 1;
-	uarg->flags = SKBFL_ZEROCOPY_FRAG;
+	uarg->flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
 	refcount_set(&uarg->refcnt, 1);
 	sock_hold(sk);
 

From patchwork Thu Jul 7 11:49:36 2022
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 05/27] skbuff: carry external ubuf_info in msghdr
Date: Thu, 7 Jul 2022 12:49:36 +0100
Message-Id: <2c3ce22ec6939856cef4329d8c95e4c8dfb355d8.1657194434.git.asml.silence@gmail.com>

Make it possible for in-kernel network callers like io_uring to pass in
a custom ubuf_info by setting it in a new field of struct msghdr.

Signed-off-by: Pavel Begunkov
---
 include/linux/socket.h | 1 +
 net/compat.c           | 1 +
 net/socket.c           | 3 +++
 3 files changed, 5 insertions(+)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 17311ad9f9af..7bac9fc1cee0 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -69,6 +69,7 @@ struct msghdr {
 	unsigned int	msg_flags;	/* flags on received message */
 	__kernel_size_t	msg_controllen;	/* ancillary data buffer length */
 	struct kiocb	*msg_iocb;	/* ptr to iocb for async requests */
+	struct ubuf_info *msg_ubuf;
 };
 
 struct user_msghdr {
diff --git a/net/compat.c b/net/compat.c
index 210fc3b4d0d8..6cd2e7683dd0 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -80,6 +80,7 @@ int __get_compat_msghdr(struct msghdr *kmsg,
 		return -EMSGSIZE;
 
 	kmsg->msg_iocb = NULL;
+	kmsg->msg_ubuf = NULL;
 	*ptr = msg.msg_iov;
 	*len = msg.msg_iovlen;
 	return 0;
diff --git a/net/socket.c b/net/socket.c
index 2bc8773d9dc5..ed061609265e 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2106,6 +2106,7 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
 	msg.msg_control = NULL;
 	msg.msg_controllen = 0;
 	msg.msg_namelen = 0;
+	msg.msg_ubuf = NULL;
 	if (addr) {
 		err = move_addr_to_kernel(addr, addr_len, &address);
 		if (err < 0)
@@ -2171,6 +2172,7 @@ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags,
 	msg.msg_namelen = 0;
 	msg.msg_iocb = NULL;
 	msg.msg_flags = 0;
+	msg.msg_ubuf = NULL;
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
 	err = sock_recvmsg(sock, &msg, flags);
@@ -2409,6 +2411,7 @@ int __copy_msghdr_from_user(struct msghdr *kmsg,
 		return -EMSGSIZE;
 
 	kmsg->msg_iocb = NULL;
+	kmsg->msg_ubuf = NULL;
 	*uiov = msg.msg_iov;
 	*nsegs = msg.msg_iovlen;
 	return 0;

From patchwork Thu Jul 7 11:49:37 2022
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 06/27] net: Allow custom iter handler in msghdr
Date: Thu, 7 Jul 2022 12:49:37 +0100
Message-Id: <968c344a59315ec5d0095584a95bb7dd5a3ac617.1657194434.git.asml.silence@gmail.com>

From: David Ahern

Add support for custom iov_iter handling to msghdr. The idea is that
in-kernel subsystems want control over how an SG list is split.

Signed-off-by: David Ahern
[pavel: move callback into msghdr]
Signed-off-by: Pavel Begunkov
---
 include/linux/skbuff.h |  7 ++++---
 include/linux/socket.h |  4 ++++
 net/core/datagram.c    | 14 ++++++++++----
 net/core/skbuff.c      |  2 +-
 4 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 8e12b3b9ad6c..a8a2dd4cfdfd 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1776,13 +1776,14 @@ void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref);
 void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg,
 			   bool success);
 
-int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb,
-			    struct iov_iter *from, size_t length);
+int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
+			    struct sk_buff *skb, struct iov_iter *from,
+			    size_t length);
 
 static inline int skb_zerocopy_iter_dgram(struct sk_buff *skb,
 					  struct msghdr *msg, int len)
 {
-	return __zerocopy_sg_from_iter(skb->sk, skb, &msg->msg_iter, len);
+	return __zerocopy_sg_from_iter(msg, skb->sk, skb, &msg->msg_iter, len);
 }
 
 int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 7bac9fc1cee0..3c11ef18a9cf 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -14,6 +14,8 @@ struct file;
 struct pid;
 struct cred;
struct socket; +struct sock; +struct sk_buff; #define __sockaddr_check_size(size) \ BUILD_BUG_ON(((size) > sizeof(struct __kernel_sockaddr_storage))) @@ -70,6 +72,8 @@ struct msghdr { __kernel_size_t msg_controllen; /* ancillary data buffer length */ struct kiocb *msg_iocb; /* ptr to iocb for async requests */ struct ubuf_info *msg_ubuf; + int (*sg_from_iter)(struct sock *sk, struct sk_buff *skb, + struct iov_iter *from, size_t length); }; struct user_msghdr { diff --git a/net/core/datagram.c b/net/core/datagram.c index 50f4faeea76c..b3c05efd659f 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -613,10 +613,16 @@ int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset, } EXPORT_SYMBOL(skb_copy_datagram_from_iter); -int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb, - struct iov_iter *from, size_t length) +int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, + struct sk_buff *skb, struct iov_iter *from, + size_t length) { - int frag = skb_shinfo(skb)->nr_frags; + int frag; + + if (msg && msg->sg_from_iter && msg->msg_ubuf == skb_zcopy(skb)) + return msg->sg_from_iter(sk, skb, from, length); + + frag = skb_shinfo(skb)->nr_frags; while (length && iov_iter_count(from)) { struct page *pages[MAX_SKB_FRAGS]; @@ -702,7 +708,7 @@ int zerocopy_sg_from_iter(struct sk_buff *skb, struct iov_iter *from) if (skb_copy_datagram_from_iter(skb, 0, from, copy)) return -EFAULT; - return __zerocopy_sg_from_iter(NULL, skb, from, ~0U); + return __zerocopy_sg_from_iter(NULL, NULL, skb, from, ~0U); } EXPORT_SYMBOL(zerocopy_sg_from_iter); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index fc22b3d32052..f5a3ebbc1f7e 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1358,7 +1358,7 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb, if (orig_uarg && uarg != orig_uarg) return -EEXIST; - err = __zerocopy_sg_from_iter(sk, skb, &msg->msg_iter, len); + err = __zerocopy_sg_from_iter(msg, sk, skb, &msg->msg_iter, len); if 
(err == -EFAULT || (err == -EMSGSIZE && skb->len == orig_len)) { struct sock *save_sk = skb->sk;

From patchwork Thu Jul 7 11:49:38 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12909399
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 07/27] net: introduce managed frags infrastructure
Date: Thu, 7 Jul 2022 12:49:38 +0100
Message-Id: <088d3480b1ebc687fe7cbfc00aec2ff1c33a72c7.1657194434.git.asml.silence@gmail.com>

Some users like io_uring can do page pinning more efficiently, so we want a way to delegate referencing to other subsystems. For that add a new flag called SKBFL_MANAGED_FRAG_REFS.
When set, the skb doesn't hold page references and upper layers are responsible for managing page lifetime. An skb can be converted from managed to normal by calling skb_zcopy_downgrade_managed(), which takes all the needed page references and clears the flag. This is needed, for instance, to avoid mixing managed and non-managed frags in one skb.

Signed-off-by: Pavel Begunkov --- include/linux/skbuff.h | 25 +++++++++++++++++++++++-- net/core/skbuff.c | 29 +++++++++++++++++++++++++++-- 2 files changed, 50 insertions(+), 4 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index a8a2dd4cfdfd..07004593d7ca 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -688,11 +688,16 @@ enum { SKBFL_PURE_ZEROCOPY = BIT(2), SKBFL_DONT_ORPHAN = BIT(3), + + /* page references are managed by the ubuf_info, so it's safe to + * use frags only up until ubuf_info is released + */ + SKBFL_MANAGED_FRAG_REFS = BIT(4), }; #define SKBFL_ZEROCOPY_FRAG (SKBFL_ZEROCOPY_ENABLE | SKBFL_SHARED_FRAG) #define SKBFL_ALL_ZEROCOPY (SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY | \ - SKBFL_DONT_ORPHAN) + SKBFL_DONT_ORPHAN | SKBFL_MANAGED_FRAG_REFS) /* * The callback notifies userspace to release buffers when skb DMA is done in @@ -1810,6 +1815,11 @@ static inline bool skb_zcopy_pure(const struct sk_buff *skb) return skb_shinfo(skb)->flags & SKBFL_PURE_ZEROCOPY; } +static inline bool skb_zcopy_managed(const struct sk_buff *skb) +{ + return skb_shinfo(skb)->flags & SKBFL_MANAGED_FRAG_REFS; +} + static inline bool skb_pure_zcopy_same(const struct sk_buff *skb1, const struct sk_buff *skb2) { @@ -1884,6 +1894,14 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy_success) } } +void __skb_zcopy_downgrade_managed(struct sk_buff *skb); + +static inline void skb_zcopy_downgrade_managed(struct sk_buff *skb) +{ + if (unlikely(skb_zcopy_managed(skb))) + __skb_zcopy_downgrade_managed(skb); +} + static inline void skb_mark_not_on_list(struct sk_buff *skb) { skb->next = NULL; @@ -3499,7
+3517,10 @@ static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle) */ static inline void skb_frag_unref(struct sk_buff *skb, int f) { - __skb_frag_unref(&skb_shinfo(skb)->frags[f], skb->pp_recycle); + struct skb_shared_info *shinfo = skb_shinfo(skb); + + if (!skb_zcopy_managed(skb)) + __skb_frag_unref(&shinfo->frags[f], skb->pp_recycle); } /** diff --git a/net/core/skbuff.c b/net/core/skbuff.c index f5a3ebbc1f7e..cf4107d80bc4 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -666,11 +666,18 @@ static void skb_release_data(struct sk_buff *skb) &shinfo->dataref)) goto exit; - skb_zcopy_clear(skb, true); + if (skb_zcopy(skb)) { + bool skip_unref = shinfo->flags & SKBFL_MANAGED_FRAG_REFS; + + skb_zcopy_clear(skb, true); + if (skip_unref) + goto free_head; + } for (i = 0; i < shinfo->nr_frags; i++) __skb_frag_unref(&shinfo->frags[i], skb->pp_recycle); +free_head: if (shinfo->frag_list) kfree_skb_list(shinfo->frag_list); @@ -895,7 +902,10 @@ EXPORT_SYMBOL(skb_dump); */ void skb_tx_error(struct sk_buff *skb) { - skb_zcopy_clear(skb, true); + if (skb) { + skb_zcopy_downgrade_managed(skb); + skb_zcopy_clear(skb, true); + } } EXPORT_SYMBOL(skb_tx_error); @@ -1375,6 +1385,16 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb, } EXPORT_SYMBOL_GPL(skb_zerocopy_iter_stream); +void __skb_zcopy_downgrade_managed(struct sk_buff *skb) +{ + int i; + + skb_shinfo(skb)->flags &= ~SKBFL_MANAGED_FRAG_REFS; + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) + skb_frag_ref(skb, i); +} +EXPORT_SYMBOL_GPL(__skb_zcopy_downgrade_managed); + static int skb_zerocopy_clone(struct sk_buff *nskb, struct sk_buff *orig, gfp_t gfp_mask) { @@ -1692,6 +1712,8 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, BUG_ON(skb_shared(skb)); + skb_zcopy_downgrade_managed(skb); + size = SKB_DATA_ALIGN(size); if (skb_pfmemalloc(skb)) @@ -3488,6 +3510,8 @@ void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len) int pos = skb_headlen(skb); const 
int zc_flags = SKBFL_SHARED_FRAG | SKBFL_PURE_ZEROCOPY; + skb_zcopy_downgrade_managed(skb); + skb_shinfo(skb1)->flags |= skb_shinfo(skb)->flags & zc_flags; skb_zerocopy_clone(skb1, skb, 0); if (len < pos) /* Split line is inside header. */ @@ -3841,6 +3865,7 @@ int skb_append_pagefrags(struct sk_buff *skb, struct page *page, if (skb_can_coalesce(skb, i, page, offset)) { skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], size); } else if (i < MAX_SKB_FRAGS) { + skb_zcopy_downgrade_managed(skb); get_page(page); skb_fill_page_desc(skb, i, page, offset, size); } else {

From patchwork Thu Jul 7 11:49:39 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12909398
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 08/27] net: introduce __skb_fill_page_desc_noacc
Date: Thu, 7 Jul 2022 12:49:39 +0100
Message-Id: <48fb9ca4f207a31b69ef47a92f6c0c28d390af33.1657194434.git.asml.silence@gmail.com>

Managed pages contain pinned userspace pages and are controlled by upper layers, so there is no need to track skb->pfmemalloc for them. Introduce a helper for filling frags that skips the page tracking; it will be needed later.

Signed-off-by: Pavel Begunkov --- include/linux/skbuff.h | 28 +++++++++++++++++----------- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 07004593d7ca..1111adefd906 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -2550,6 +2550,22 @@ static inline unsigned int skb_pagelen(const struct sk_buff *skb) return skb_headlen(skb) + __skb_pagelen(skb); } +static inline void __skb_fill_page_desc_noacc(struct skb_shared_info *shinfo, + int i, struct page *page, + int off, int size) +{ + skb_frag_t *frag = &shinfo->frags[i]; + + /* + * Propagate page pfmemalloc to the skb if we can. The problem is + * that not all callers have unique ownership of the page but rely + * on page_is_pfmemalloc doing the right thing(tm).
+ */ + frag->bv_page = page; + frag->bv_offset = off; + skb_frag_size_set(frag, size); +} + /** * __skb_fill_page_desc - initialise a paged fragment in an skb * @skb: buffer containing fragment to be initialised @@ -2566,17 +2582,7 @@ static inline unsigned int skb_pagelen(const struct sk_buff *skb) static inline void __skb_fill_page_desc(struct sk_buff *skb, int i, struct page *page, int off, int size) { - skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; - - /* - * Propagate page pfmemalloc to the skb if we can. The problem is - * that not all callers have unique ownership of the page but rely - * on page_is_pfmemalloc doing the right thing(tm). - */ - frag->bv_page = page; - frag->bv_offset = off; - skb_frag_size_set(frag, size); - + __skb_fill_page_desc_noacc(skb_shinfo(skb), i, page, off, size); page = compound_head(page); if (page_is_pfmemalloc(page)) skb->pfmemalloc = true;

From patchwork Thu Jul 7 11:49:40 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12909396
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 09/27] ipv4/udp: support externally provided ubufs
Date: Thu, 7 Jul 2022 12:49:40 +0100
Message-Id: <4b97ab89f424a2e84f6a0a58d6e7baeacbcb6e6b.1657194434.git.asml.silence@gmail.com>

Teach ipv4/udp how to use an external ubuf_info provided in msghdr, and also prepare it for managed frags by sprinkling skb_zcopy_downgrade_managed() wherever it could otherwise mix managed and non-managed frags.
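Why the copy path needs skb_zcopy_downgrade_managed() can be seen in a small userspace model. This is a hypothetical sketch, not kernel code: every toy_* name is invented for illustration, and a plain int stands in for the page refcount. It assumes the semantics described in patch 07/27: while SKBFL_MANAGED_FRAG_REFS is set, the release path does not drop frag references. If the skb were not downgraded before a kernel-owned page is appended with a get_page()-style reference, that reference would never be dropped.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_FRAGS 4

/* Stand-in for struct page: only the refcount matters here. */
struct toy_page {
	int refcount;
};

/* Stand-in for an skb and its frags array. */
struct toy_skb {
	struct toy_page *frags[TOY_MAX_FRAGS];
	int nr_frags;
	bool managed;	/* models SKBFL_MANAGED_FRAG_REFS */
};

/* Models __skb_zcopy_downgrade_managed(): clear the flag, then take a
 * reference on every frag the skb already holds. */
static void toy_downgrade_managed(struct toy_skb *skb)
{
	if (!skb->managed)
		return;
	skb->managed = false;
	for (int i = 0; i < skb->nr_frags; i++)
		skb->frags[i]->refcount++;
}

/* Models the copy path appending a page-frag-cache page: downgrade first
 * so the skb never mixes managed and refcounted frags. */
static void toy_append_copied_frag(struct toy_skb *skb, struct toy_page *page)
{
	assert(skb->nr_frags < TOY_MAX_FRAGS);
	toy_downgrade_managed(skb);
	page->refcount++;	/* like get_page() */
	skb->frags[skb->nr_frags++] = page;
}

/* Models skb_release_data(): a managed skb skips the per-frag unref. */
static void toy_release(struct toy_skb *skb)
{
	if (!skb->managed)
		for (int i = 0; i < skb->nr_frags; i++)
			skb->frags[i]->refcount--;
	skb->nr_frags = 0;
}
```

The model starts with one page pinned by the upper layer in a managed skb and then appends a copied frag. After the downgrade, both pages carry an skb-owned reference, and releasing the skb returns each page to its original count.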
Signed-off-by: Pavel Begunkov --- net/ipv4/ip_output.c | 44 +++++++++++++++++++++++++++++++------------- 1 file changed, 31 insertions(+), 13 deletions(-) diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 581d1e233260..df7f9dfbe8be 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1017,18 +1017,35 @@ static int __ip_append_data(struct sock *sk, (!exthdrlen || (rt->dst.dev->features & NETIF_F_HW_ESP_TX_CSUM))) csummode = CHECKSUM_PARTIAL; - if (flags & MSG_ZEROCOPY && length && sock_flag(sk, SOCK_ZEROCOPY)) { - uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb)); - if (!uarg) - return -ENOBUFS; - extra_uref = !skb_zcopy(skb); /* only ref on new uarg */ - if (rt->dst.dev->features & NETIF_F_SG && - csummode == CHECKSUM_PARTIAL) { - paged = true; - zc = true; - } else { - uarg->zerocopy = 0; - skb_zcopy_set(skb, uarg, &extra_uref); + if ((flags & MSG_ZEROCOPY) && length) { + struct msghdr *msg = from; + + if (getfrag == ip_generic_getfrag && msg->msg_ubuf) { + if (skb_zcopy(skb) && msg->msg_ubuf != skb_zcopy(skb)) + return -EINVAL; + + /* Leave uarg NULL if can't zerocopy, callers should + * be able to handle it. 
+ */ + if ((rt->dst.dev->features & NETIF_F_SG) && + csummode == CHECKSUM_PARTIAL) { + paged = true; + zc = true; + uarg = msg->msg_ubuf; + } + } else if (sock_flag(sk, SOCK_ZEROCOPY)) { + uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb)); + if (!uarg) + return -ENOBUFS; + extra_uref = !skb_zcopy(skb); /* only ref on new uarg */ + if (rt->dst.dev->features & NETIF_F_SG && + csummode == CHECKSUM_PARTIAL) { + paged = true; + zc = true; + } else { + uarg->zerocopy = 0; + skb_zcopy_set(skb, uarg, &extra_uref); + } } } @@ -1192,13 +1209,14 @@ static int __ip_append_data(struct sock *sk, err = -EFAULT; goto error; } - } else if (!uarg || !uarg->zerocopy) { + } else if (!zc) { int i = skb_shinfo(skb)->nr_frags; err = -ENOMEM; if (!sk_page_frag_refill(sk, pfrag)) goto error; + skb_zcopy_downgrade_managed(skb); if (!skb_can_coalesce(skb, i, pfrag->page, pfrag->offset)) { err = -EMSGSIZE;

From patchwork Thu Jul 7 11:49:41 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12909397
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 10/27] ipv6/udp: support externally provided ubufs
Date: Thu, 7 Jul 2022 12:49:41 +0100
Message-Id: <6d56b7c5ddf8add5b2a887dbd734c060e29ca6b2.1657194434.git.asml.silence@gmail.com>

Teach ipv6/udp how to use an external ubuf_info provided in msghdr, and also prepare it for managed frags by sprinkling skb_zcopy_downgrade_managed() wherever it could otherwise mix managed and non-managed frags.
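The MSG_ZEROCOPY branch that this patch adds to __ip6_append_data(), and that the ipv4 patch adds in identical form to __ip_append_data(), reduces to a small decision table. The sketch below is a hypothetical summary, not the kernel code. The toy_* names and the boolean inputs are assumptions standing in for the real tests, which are listed in the comments. It leaves out the -ENOBUFS return taken when msg_zerocopy_realloc() fails.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_zc_mode {
	TOY_COPY,		/* no zerocopy, uarg stays NULL */
	TOY_ZC_EXTERNAL,	/* zerocopy with the caller's msg->msg_ubuf */
	TOY_ZC_SOCKET,		/* zerocopy with a msg_zerocopy_realloc() uarg */
	TOY_COPY_SOCKET_UARG,	/* uarg allocated, but uarg->zerocopy = 0 */
	TOY_EINVAL,		/* skb already carries a different ubuf */
};

struct toy_append_ctx {
	bool msg_zerocopy;	 /* flags & MSG_ZEROCOPY */
	bool has_length;	 /* length != 0 */
	bool generic_getfrag;	 /* getfrag == ip_generic_getfrag */
	bool has_msg_ubuf;	 /* msg->msg_ubuf != NULL */
	bool skb_has_other_ubuf; /* skb_zcopy(skb) && msg->msg_ubuf != skb_zcopy(skb) */
	bool sock_zerocopy;	 /* sock_flag(sk, SOCK_ZEROCOPY) */
	bool dev_sg;		 /* rt->dst.dev->features & NETIF_F_SG */
	bool csum_partial;	 /* csummode == CHECKSUM_PARTIAL */
};

static enum toy_zc_mode toy_pick_zc_mode(const struct toy_append_ctx *c)
{
	bool can_zc = c->dev_sg && c->csum_partial;

	if (!c->msg_zerocopy || !c->has_length)
		return TOY_COPY;
	if (c->generic_getfrag && c->has_msg_ubuf) {
		if (c->skb_has_other_ubuf)
			return TOY_EINVAL;
		/* External ubuf: uarg is left NULL when zerocopy is impossible */
		return can_zc ? TOY_ZC_EXTERNAL : TOY_COPY;
	}
	if (c->sock_zerocopy)
		return can_zc ? TOY_ZC_SOCKET : TOY_COPY_SOCKET_UARG;
	return TOY_COPY;
}
```

Note the asymmetry the table makes visible. When the device cannot do zerocopy, the externally provided ubuf is simply not used. The socket-owned uarg, by contrast, is still attached to the skb with its zerocopy field cleared. This matches the "callers should be able to handle it" comment in the hunk.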
Signed-off-by: Pavel Begunkov --- net/ipv6/ip6_output.c | 44 ++++++++++++++++++++++++++++++------------- 1 file changed, 31 insertions(+), 13 deletions(-) diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index fc74ce3ed8cc..897ca4f9b791 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1542,18 +1542,35 @@ static int __ip6_append_data(struct sock *sk, rt->dst.dev->features & (NETIF_F_IPV6_CSUM | NETIF_F_HW_CSUM)) csummode = CHECKSUM_PARTIAL; - if (flags & MSG_ZEROCOPY && length && sock_flag(sk, SOCK_ZEROCOPY)) { - uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb)); - if (!uarg) - return -ENOBUFS; - extra_uref = !skb_zcopy(skb); /* only ref on new uarg */ - if (rt->dst.dev->features & NETIF_F_SG && - csummode == CHECKSUM_PARTIAL) { - paged = true; - zc = true; - } else { - uarg->zerocopy = 0; - skb_zcopy_set(skb, uarg, &extra_uref); + if ((flags & MSG_ZEROCOPY) && length) { + struct msghdr *msg = from; + + if (getfrag == ip_generic_getfrag && msg->msg_ubuf) { + if (skb_zcopy(skb) && msg->msg_ubuf != skb_zcopy(skb)) + return -EINVAL; + + /* Leave uarg NULL if can't zerocopy, callers should + * be able to handle it. 
+ */ + if ((rt->dst.dev->features & NETIF_F_SG) && + csummode == CHECKSUM_PARTIAL) { + paged = true; + zc = true; + uarg = msg->msg_ubuf; + } + } else if (sock_flag(sk, SOCK_ZEROCOPY)) { + uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb)); + if (!uarg) + return -ENOBUFS; + extra_uref = !skb_zcopy(skb); /* only ref on new uarg */ + if (rt->dst.dev->features & NETIF_F_SG && + csummode == CHECKSUM_PARTIAL) { + paged = true; + zc = true; + } else { + uarg->zerocopy = 0; + skb_zcopy_set(skb, uarg, &extra_uref); + } } } @@ -1747,13 +1764,14 @@ static int __ip6_append_data(struct sock *sk, err = -EFAULT; goto error; } - } else if (!uarg || !uarg->zerocopy) { + } else if (!zc) { int i = skb_shinfo(skb)->nr_frags; err = -ENOMEM; if (!sk_page_frag_refill(sk, pfrag)) goto error; + skb_zcopy_downgrade_managed(skb); if (!skb_can_coalesce(skb, i, pfrag->page, pfrag->offset)) { err = -EMSGSIZE;

From patchwork Thu Jul 7 11:49:42 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12909400
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 11/27] tcp: support externally provided ubufs
Date: Thu, 7 Jul 2022 12:49:42 +0100
Message-Id: <7ee05f644e3b3626b693973738364bcb23cf905d.1657194434.git.asml.silence@gmail.com>

Teach tcp how to use an external ubuf_info provided in msghdr, and also prepare it for managed frags by sprinkling skb_zcopy_downgrade_managed() wherever it could otherwise mix managed and non-managed frags.
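The tcp hunk calls net_zcopy_get(uarg) on the caller-provided ubuf instead of allocating one. The reference counting behind that can be sketched as a minimal model under these assumptions:
- The caller that supplied msg->msg_ubuf keeps its own reference.
- tcp_sendmsg_locked() takes a temporary reference for the duration of the call. It is presumably dropped by the existing net_zcopy_put() on the exit path, which this hunk does not show.
- Each skb the ubuf is attached to holds a reference, roughly as skb_zcopy_set() arranges.

All toy_* names are hypothetical, and a flag stands in for the completion callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a ubuf_info refcount; not the kernel struct. */
struct toy_ubuf {
	int refcnt;
	bool completed;	/* set when the last reference is dropped */
};

static void toy_zcopy_get(struct toy_ubuf *u)	/* roughly net_zcopy_get() */
{
	u->refcnt++;
}

static void toy_zcopy_put(struct toy_ubuf *u)	/* roughly net_zcopy_put() */
{
	assert(u->refcnt > 0);
	if (--u->refcnt == 0)
		u->completed = true;	/* the kernel would run the ubuf callback */
}

/* Models tcp_sendmsg_locked() with msg->msg_ubuf set: pin the ubuf for
 * the call, attach it to nr_skbs skbs (each keeps a reference), then drop
 * the call's own reference on exit (assumed exit-path put). */
static void toy_tcp_sendmsg(struct toy_ubuf *u, int nr_skbs)
{
	toy_zcopy_get(u);		/* net_zcopy_get(uarg) in the hunk */
	for (int i = 0; i < nr_skbs; i++)
		toy_zcopy_get(u);	/* reference owned by a queued skb */
	toy_zcopy_put(u);		/* put on the exit path */
}
```

Under this model, the completion cannot fire while the call is still running. It also cannot fire while any queued skb still references the ubuf. It fires only once the owner drops its reference after every skb has been freed.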
Signed-off-by: Pavel Begunkov --- net/ipv4/tcp.c | 32 ++++++++++++++++++++------------ 1 file changed, 20 insertions(+), 12 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 390eb3dc53bd..a81f694af5e9 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1223,17 +1223,23 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) flags = msg->msg_flags; - if (flags & MSG_ZEROCOPY && size && sock_flag(sk, SOCK_ZEROCOPY)) { + if ((flags & MSG_ZEROCOPY) && size) { skb = tcp_write_queue_tail(sk); - uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb)); - if (!uarg) { - err = -ENOBUFS; - goto out_err; - } - zc = sk->sk_route_caps & NETIF_F_SG; - if (!zc) - uarg->zerocopy = 0; + if (msg->msg_ubuf) { + uarg = msg->msg_ubuf; + net_zcopy_get(uarg); + zc = sk->sk_route_caps & NETIF_F_SG; + } else if (sock_flag(sk, SOCK_ZEROCOPY)) { + uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb)); + if (!uarg) { + err = -ENOBUFS; + goto out_err; + } + zc = sk->sk_route_caps & NETIF_F_SG; + if (!zc) + uarg->zerocopy = 0; + } } if (unlikely(flags & MSG_FASTOPEN || inet_sk(sk)->defer_connect) && @@ -1356,9 +1362,11 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) copy = min_t(int, copy, pfrag->size - pfrag->offset); - if (tcp_downgrade_zcopy_pure(sk, skb)) - goto wait_for_space; - + if (unlikely(skb_zcopy_pure(skb) || skb_zcopy_managed(skb))) { + if (tcp_downgrade_zcopy_pure(sk, skb)) + goto wait_for_space; + skb_zcopy_downgrade_managed(skb); + } copy = tcp_wmem_schedule(sk, copy); if (!copy) goto wait_for_space; From patchwork Thu Jul 7 11:49:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12909402 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 12/27] io_uring: initialise msghdr::msg_ubuf
Date: Thu, 7 Jul 2022 12:49:43 +0100

Initialise newly added ->msg_ubuf in io_recv() and io_send().
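Why the field has to be initialised: after the tcp_sendmsg_locked() change in this series, a non-NULL msg->msg_ubuf is treated as a caller-provided ubuf_info before SOCK_ZEROCOPY is checked at all. An on-stack msghdr with garbage in that field would therefore be misinterpreted. Below is a minimal userspace sketch of that branch order, not kernel code. `model_msghdr`, `pick_zc_source()` and the `MODEL_MSG_ZEROCOPY` value are hypothetical stand-ins. The sketch leaves out the NETIF_F_SG check and reference counting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for this model; not the real uapi value. */
#define MODEL_MSG_ZEROCOPY 0x1u

/* Hypothetical stand-in for struct msghdr: only the fields the
 * zerocopy branch in tcp_sendmsg_locked() inspects. */
struct model_msghdr {
	void *msg_ubuf;          /* caller-provided ubuf_info, or NULL */
	unsigned int msg_flags;
};

enum zc_source {
	ZC_NONE,        /* plain copying send */
	ZC_CALLER_UBUF, /* reuse msg->msg_ubuf (e.g. an io_uring notifier) */
	ZC_SOCKET_UARG, /* allocate through the SO_ZEROCOPY socket path */
};

/* Same order of checks as the patched tcp_sendmsg_locked():
 * a set msg_ubuf wins over SOCK_ZEROCOPY. */
static enum zc_source pick_zc_source(const struct model_msghdr *msg,
				     size_t size, bool sock_zerocopy)
{
	if (!(msg->msg_flags & MODEL_MSG_ZEROCOPY) || !size)
		return ZC_NONE;
	if (msg->msg_ubuf)
		return ZC_CALLER_UBUF;
	if (sock_zerocopy)
		return ZC_SOCKET_UARG;
	return ZC_NONE;
}
```

With msg_ubuf left NULL, as io_send() and io_recv() now ensure, a MSG_ZEROCOPY send takes the zerocopy path only when the socket has SOCK_ZEROCOPY set. That matches the condition tcp_sendmsg_locked() used before the change.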
Signed-off-by: Pavel Begunkov --- io_uring/net.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/io_uring/net.c b/io_uring/net.c index cb08a4b62840..2dd61fcf91d8 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -255,6 +255,7 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags) msg.msg_control = NULL; msg.msg_controllen = 0; msg.msg_namelen = 0; + msg.msg_ubuf = NULL; flags = sr->msg_flags; if (issue_flags & IO_URING_F_NONBLOCK) @@ -601,6 +602,7 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags) msg.msg_flags = 0; msg.msg_controllen = 0; msg.msg_iocb = NULL; + msg.msg_ubuf = NULL; flags = sr->msg_flags; if (force_nonblock)
From patchwork Thu Jul 7 11:49:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12909401
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 13/27] io_uring: export io_put_task()
Date: Thu, 7 Jul 2022 12:49:44 +0100

Make io_put_task() available to non-core parts of io_uring; we will need it for the notification infrastructure.

Signed-off-by: Pavel Begunkov --- include/linux/io_uring_types.h | 25 +++++++++++++++++++++++++ io_uring/io_uring.c | 11 +---------- io_uring/io_uring.h | 10 ++++++++++ io_uring/tctx.h | 26 -------------------------- 4 files changed, 36 insertions(+), 36 deletions(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 26ef11e978d4..d876a0367081 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -4,6 +4,7 @@ #include #include #include +#include #include struct io_wq_work_node { @@ -43,6 +44,30 @@ struct io_hash_table { unsigned hash_bits; }; +/* + * Arbitrary limit, can be raised if need be + */ +#define IO_RINGFD_REG_MAX 16 + +struct io_uring_task { + /* submission side */ + int cached_refs; + const struct io_ring_ctx *last; + struct io_wq *io_wq; + struct file *registered_rings[IO_RINGFD_REG_MAX]; + + struct xarray xa; + struct wait_queue_head wait; + atomic_t in_idle; + atomic_t inflight_tracked; + struct percpu_counter inflight; + + struct { /* task_work */ + struct llist_head task_list; + struct callback_head task_work; + } ____cacheline_aligned_in_smp; +}; + struct io_uring { u32 head ____cacheline_aligned_in_smp; u32 tail ____cacheline_aligned_in_smp; diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index caf979cd4327..bb644b1b575a 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -602,7 +602,7 @@ static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx) return ret; } -static void
__io_put_task(struct task_struct *task, int nr) +void __io_put_task(struct task_struct *task, int nr) { struct io_uring_task *tctx = task->io_uring; @@ -612,15 +612,6 @@ static void __io_put_task(struct task_struct *task, int nr) put_task_struct_many(task, nr); } -/* must to be called somewhat shortly after putting a request */ -static inline void io_put_task(struct task_struct *task, int nr) -{ - if (likely(task == current)) - task->io_uring->cached_refs += nr; - else - __io_put_task(task, nr); -} - static void io_task_refs_refill(struct io_uring_task *tctx) { unsigned int refill = -tctx->cached_refs + IO_TCTX_REFS_CACHE_NR; diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index 868f45d55543..2379d9e70c10 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -66,6 +66,7 @@ void io_wq_submit_work(struct io_wq_work *work); void io_free_req(struct io_kiocb *req); void io_queue_next(struct io_kiocb *req); +void __io_put_task(struct task_struct *task, int nr); bool io_match_task_safe(struct io_kiocb *head, struct task_struct *task, bool cancel_all); @@ -253,4 +254,13 @@ static inline void io_commit_cqring_flush(struct io_ring_ctx *ctx) __io_commit_cqring_flush(ctx); } +/* must to be called somewhat shortly after putting a request */ +static inline void io_put_task(struct task_struct *task, int nr) +{ + if (likely(task == current)) + task->io_uring->cached_refs += nr; + else + __io_put_task(task, nr); +} + #endif diff --git a/io_uring/tctx.h b/io_uring/tctx.h index 8a33ff6e5d91..25974beed4d6 100644 --- a/io_uring/tctx.h +++ b/io_uring/tctx.h @@ -1,31 +1,5 @@ // SPDX-License-Identifier: GPL-2.0 -#include - -/* - * Arbitrary limit, can be raised if need be - */ -#define IO_RINGFD_REG_MAX 16 - -struct io_uring_task { - /* submission side */ - int cached_refs; - const struct io_ring_ctx *last; - struct io_wq *io_wq; - struct file *registered_rings[IO_RINGFD_REG_MAX]; - - struct xarray xa; - struct wait_queue_head wait; - atomic_t in_idle; - atomic_t 
inflight_tracked; - struct percpu_counter inflight; - - struct { /* task_work */ - struct llist_head task_list; - struct callback_head task_work; - } ____cacheline_aligned_in_smp; -}; - struct io_tctx_node { struct list_head ctx_node; struct task_struct *task;
From patchwork Thu Jul 7 11:49:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12909403
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 14/27] io_uring: add zc notification infrastructure
Date: Thu, 7 Jul 2022 12:49:45 +0100
Message-Id: <6b938b918696b8a3d0d7bd89045390556ac286fd.1657194434.git.asml.silence@gmail.com>

Add the internal part of send zerocopy notifications.

There are two main structures. The first one is struct io_notif, which embeds a struct ubuf_info and maps 1:1 to it. io_uring will bind a number of zerocopy send requests to a notifier and then ask to complete (aka flush) it. Once it has been flushed and all attached requests and skbs have completed, it generates one and only one CQE. Notifiers are intended to be passed into the network layer as struct msghdr::msg_ubuf.

The second concept is notification slots. Userspace will be able to register an array of slots and subsequently address them by their index in the array. Slots are independent of each other. Each slot can have only one notifier at a time (called the active notifier) but many notifiers over its lifetime. While active, a notifier does not post any completion, but userspace can attach requests to it by specifying the corresponding slot when issuing send zc requests. Eventually, userspace will want to "flush" the notifier, losing any way to attach new requests to it; however, it can use the next automatically added notifier of this slot or of any other slot.

When the network layer is done with all enqueued skbs attached to a notifier and no longer needs the user data they reference, the flushed notifier posts a CQE.
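The lifetime rules above can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the kernel implementation. Every `model_*` name is hypothetical, a plain counter stands in for the ubuf_info refcount, and a small array stands in for the CQ ring. The model follows the order of operations in this patch's io_get_notif(), io_alloc_notif() and io_notif_slot_flush(). The slot owns a master reference, each attached send takes another, and only the final put posts a CQE.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace model; names loosely follow io_uring/notif.h
 * but none of this is kernel API. */
#define MODEL_CQ_SIZE 16

struct model_cqe { uint64_t user_data; uint32_t flags; };

struct model_cq {
	struct model_cqe cqes[MODEL_CQ_SIZE];
	unsigned int nr;
};

struct model_notif {
	unsigned int refs;   /* stands in for uarg.refcnt */
	uint64_t tag;        /* becomes cqe->user_data */
	uint32_t seq;        /* generation number within the slot */
	struct model_cq *cq;
};

struct model_slot {
	struct model_notif *notif; /* active notifier, or NULL */
	uint64_t tag;
	uint32_t seq;
};

/* Drop one reference; the last one posts exactly one CQE. */
static void model_notif_put(struct model_notif *n)
{
	if (--n->refs)
		return;
	assert(n->cq->nr < MODEL_CQ_SIZE); /* no overflow handling here */
	n->cq->cqes[n->cq->nr++] = (struct model_cqe){ n->tag, n->seq };
	free(n);
}

/* Like io_get_notif()/io_alloc_notif(): lazily create the active
 * notifier, with the master reference owned by the slot. */
static struct model_notif *model_get_notif(struct model_slot *slot,
					   struct model_cq *cq)
{
	if (!slot->notif) {
		struct model_notif *n = calloc(1, sizeof(*n));

		if (!n)
			return NULL;
		n->refs = 1;
		n->tag = slot->tag;
		n->seq = slot->seq++;
		n->cq = cq;
		slot->notif = n;
	}
	return slot->notif;
}

/* A send request attaching to the slot's active notifier. */
static struct model_notif *model_attach(struct model_slot *slot,
					struct model_cq *cq)
{
	struct model_notif *n = model_get_notif(slot, cq);

	if (n)
		n->refs++;
	return n;
}

/* Like io_notif_slot_flush(): detach and drop the slot's master ref. */
static void model_flush(struct model_slot *slot)
{
	struct model_notif *n = slot->notif;

	if (!n)
		return;
	slot->notif = NULL;
	model_notif_put(n);
}
```

In the patch, the per-slot sequence number is carried in cqe->cflags (`io_fill_cqe_aux(ctx, notif->tag, 0, notif->seq, true)`). The model's `flags` field plays that role, which lets userspace distinguish consecutive generations of the same slot.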
Signed-off-by: Pavel Begunkov --- include/linux/io_uring_types.h | 5 ++ io_uring/Makefile | 2 +- io_uring/io_uring.c | 8 ++- io_uring/io_uring.h | 2 + io_uring/notif.c | 102 +++++++++++++++++++++++++++++++++ io_uring/notif.h | 64 +++++++++++++++++++++ 6 files changed, 179 insertions(+), 4 deletions(-) create mode 100644 io_uring/notif.c create mode 100644 io_uring/notif.h diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index d876a0367081..95334e678586 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -34,6 +34,9 @@ struct io_file_table { unsigned int alloc_hint; }; +struct io_notif; +struct io_notif_slot; + struct io_hash_bucket { spinlock_t lock; struct hlist_head list; @@ -232,6 +235,8 @@ struct io_ring_ctx { unsigned nr_user_files; unsigned nr_user_bufs; struct io_mapped_ubuf **user_bufs; + struct io_notif_slot *notif_slots; + unsigned nr_notif_slots; struct io_submit_state submit_state; diff --git a/io_uring/Makefile b/io_uring/Makefile index 466639c289be..8cc8e5387a75 100644 --- a/io_uring/Makefile +++ b/io_uring/Makefile @@ -7,5 +7,5 @@ obj-$(CONFIG_IO_URING) += io_uring.o xattr.o nop.o fs.o splice.o \ openclose.o uring_cmd.o epoll.o \ statx.o net.o msg_ring.o timeout.o \ sqpoll.o fdinfo.o tctx.o poll.o \ - cancel.o kbuf.o rsrc.o rw.o opdef.o + cancel.o kbuf.o rsrc.o rw.o opdef.o notif.o obj-$(CONFIG_IO_WQ) += io-wq.o diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index bb644b1b575a..ad816afe2345 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -89,6 +89,7 @@ #include "kbuf.h" #include "rsrc.h" #include "cancel.h" +#include "notif.h" #include "timeout.h" #include "poll.h" @@ -726,9 +727,8 @@ struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx) return &rings->cqes[off]; } -static bool io_fill_cqe_aux(struct io_ring_ctx *ctx, - u64 user_data, s32 res, u32 cflags, - bool allow_overflow) +bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags, + 
bool allow_overflow) { struct io_uring_cqe *cqe; @@ -2496,6 +2496,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx) } #endif WARN_ON_ONCE(!list_empty(&ctx->ltimeout_list)); + WARN_ON_ONCE(ctx->notif_slots || ctx->nr_notif_slots); io_mem_free(ctx->rings); io_mem_free(ctx->sq_sqes); @@ -2672,6 +2673,7 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx) io_unregister_personality(ctx, index); if (ctx->rings) io_poll_remove_all(ctx, NULL, true); + io_notif_unregister(ctx); mutex_unlock(&ctx->uring_lock); /* failed during ring init, it couldn't have issued any requests */ diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index 2379d9e70c10..b8c858727dc8 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -33,6 +33,8 @@ void io_req_complete_post(struct io_kiocb *req); void __io_req_complete_post(struct io_kiocb *req); bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags, bool allow_overflow); +bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags, + bool allow_overflow); void __io_commit_cqring_flush(struct io_ring_ctx *ctx); struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages); diff --git a/io_uring/notif.c b/io_uring/notif.c new file mode 100644 index 000000000000..6ee948af6a49 --- /dev/null +++ b/io_uring/notif.c @@ -0,0 +1,102 @@ +#include +#include +#include +#include +#include +#include + +#include "io_uring.h" +#include "notif.h" + +static void __io_notif_complete_tw(struct callback_head *cb) +{ + struct io_notif *notif = container_of(cb, struct io_notif, task_work); + struct io_ring_ctx *ctx = notif->ctx; + + io_cq_lock(ctx); + io_fill_cqe_aux(ctx, notif->tag, 0, notif->seq, true); + io_cq_unlock_post(ctx); + + percpu_ref_put(&ctx->refs); + kfree(notif); +} + +static inline void io_notif_complete(struct io_notif *notif) +{ + __io_notif_complete_tw(&notif->task_work); +} + +static void io_notif_complete_wq(struct work_struct 
*work) +{ +
struct io_notif *notif = container_of(work, struct io_notif, commit_work); + + io_notif_complete(notif); +} + +static void io_uring_tx_zerocopy_callback(struct sk_buff *skb, + struct ubuf_info *uarg, + bool success) +{ + struct io_notif *notif = container_of(uarg, struct io_notif, uarg); + + if (!refcount_dec_and_test(&uarg->refcnt)) + return; + INIT_WORK(&notif->commit_work, io_notif_complete_wq); + queue_work(system_unbound_wq, &notif->commit_work); +} + +struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx, + struct io_notif_slot *slot) + __must_hold(&ctx->uring_lock) +{ + struct io_notif *notif; + + notif = kzalloc(sizeof(*notif), GFP_ATOMIC | __GFP_ACCOUNT); + if (!notif) + return NULL; + + notif->seq = slot->seq++; + notif->tag = slot->tag; + notif->ctx = ctx; + notif->uarg.flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN; + notif->uarg.callback = io_uring_tx_zerocopy_callback; + /* master ref owned by io_notif_slot, will be dropped on flush */ + refcount_set(&notif->uarg.refcnt, 1); + percpu_ref_get(&ctx->refs); + return notif; +} + +static void io_notif_slot_flush(struct io_notif_slot *slot) + __must_hold(&ctx->uring_lock) +{ + struct io_notif *notif = slot->notif; + + slot->notif = NULL; + + if (WARN_ON_ONCE(in_interrupt())) + return; + /* drop slot's master ref */ + if (refcount_dec_and_test(&notif->uarg.refcnt)) + io_notif_complete(notif); +} + +__cold int io_notif_unregister(struct io_ring_ctx *ctx) + __must_hold(&ctx->uring_lock) +{ + int i; + + if (!ctx->notif_slots) + return -ENXIO; + + for (i = 0; i < ctx->nr_notif_slots; i++) { + struct io_notif_slot *slot = &ctx->notif_slots[i]; + + if (slot->notif) + io_notif_slot_flush(slot); + } + + kvfree(ctx->notif_slots); + ctx->notif_slots = NULL; + ctx->nr_notif_slots = 0; + return 0; +} \ No newline at end of file diff --git a/io_uring/notif.h b/io_uring/notif.h new file mode 100644 index 000000000000..3d7a1d242e17 --- /dev/null +++ b/io_uring/notif.h @@ -0,0 +1,64 @@ +// 
SPDX-License-Identifier: GPL-2.0 +
+#include +#include +#include +#include + +struct io_notif { + struct ubuf_info uarg; + struct io_ring_ctx *ctx; + + /* cqe->user_data, io_notif_slot::tag if not overridden */ + u64 tag; + /* see struct io_notif_slot::seq */ + u32 seq; + + union { + struct callback_head task_work; + struct work_struct commit_work; + }; +}; + +struct io_notif_slot { + /* + * Current/active notifier. A slot holds only one active notifier at a + * time and keeps one reference to it. Flush releases the reference and + * lazily replaces it with a new notifier. + */ + struct io_notif *notif; + + /* + * Default ->user_data for this slot notifiers CQEs + */ + u64 tag; + /* + * Notifiers of a slot live in generations, we create a new notifier + * only after flushing the previous one. Track the sequential number + * for all notifiers and copy it into notifiers's cqe->cflags + */ + u32 seq; +}; + +int io_notif_unregister(struct io_ring_ctx *ctx); + +struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx, + struct io_notif_slot *slot); + +static inline struct io_notif *io_get_notif(struct io_ring_ctx *ctx, + struct io_notif_slot *slot) +{ + if (!slot->notif) + slot->notif = io_alloc_notif(ctx, slot); + return slot->notif; +} + +static inline struct io_notif_slot *io_get_notif_slot(struct io_ring_ctx *ctx, + int idx) + __must_hold(&ctx->uring_lock) +{ + if (idx >= ctx->nr_notif_slots) + return NULL; + idx = array_index_nospec(idx, ctx->nr_notif_slots); + return &ctx->notif_slots[idx]; +}
From patchwork Thu Jul 7 11:49:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12909404
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 15/27] io_uring: cache struct io_notif
Date: Thu, 7 Jul 2022 12:49:46 +0100

kmalloc'ing struct io_notif is too expensive when done frequently, so cache notifiers as io_uring already does for many other resources. Keep two lists. The first one, from which notifiers are taken, is protected by ->uring_lock. The second one, to which released notifiers are queued, is protected by ->completion_lock. When needed, splice the second list into the first. On allocation this happens only once the first list is empty and more than IO_NOTIF_SPLICE_BATCH notifiers are waiting on the second.
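Here is a minimal sketch of the two-list scheme. It uses pthread mutexes as stand-ins for ->uring_lock and ->completion_lock and a singly linked list in place of struct list_head. All `model_*` names are hypothetical. Releases take only the completion lock. The allocation side drains the release list only when its own list is empty and more than a batch of notifiers has accumulated, mirroring io_notif_has_cached() with IO_NOTIF_SPLICE_BATCH of 32. The kernel peeks at the counter with `data_race(READ_ONCE())`, and the sketch uses a relaxed C11 atomic for the same unlocked read.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_SPLICE_BATCH 32 /* same value as IO_NOTIF_SPLICE_BATCH */

struct model_notif {
	struct model_notif *next; /* cache hook, plays the role of cache_node */
};

struct model_cache {
	/* Allocation side: stands in for ctx->notif_list under ->uring_lock. */
	pthread_mutex_t submit_lock;
	struct model_notif *free_list;

	/* Release side: stands in for ctx->notif_list_locked under
	 * ->completion_lock. */
	pthread_mutex_t completion_lock;
	struct model_notif *locked_list;
	atomic_uint locked_nr;
};

/* Completion side: queue a finished notifier for reuse. */
static void model_release(struct model_cache *c, struct model_notif *n)
{
	pthread_mutex_lock(&c->completion_lock);
	n->next = c->locked_list;
	c->locked_list = n;
	atomic_fetch_add_explicit(&c->locked_nr, 1, memory_order_relaxed);
	pthread_mutex_unlock(&c->completion_lock);
}

/* Move every released notifier onto the allocation list. Caller holds
 * submit_lock. Node by node here; list_splice_init() in the kernel is O(1). */
static void model_splice(struct model_cache *c)
{
	pthread_mutex_lock(&c->completion_lock);
	while (c->locked_list) {
		struct model_notif *n = c->locked_list;

		c->locked_list = n->next;
		n->next = c->free_list;
		c->free_list = n;
	}
	atomic_store_explicit(&c->locked_nr, 0, memory_order_relaxed);
	pthread_mutex_unlock(&c->completion_lock);
}

/* Allocation side. Caller holds submit_lock. */
static struct model_notif *model_alloc(struct model_cache *c)
{
	struct model_notif *n;

	/* Unlocked peek: only take the completion lock once a batch
	 * worth of notifiers is waiting on the release list. */
	if (!c->free_list &&
	    atomic_load_explicit(&c->locked_nr, memory_order_relaxed) >
		    MODEL_SPLICE_BATCH)
		model_splice(c);

	n = c->free_list;
	if (n) {
		c->free_list = n->next;
		return n;
	}
	return calloc(1, sizeof(*n)); /* cache miss: fall back to the allocator */
}
```

The split appears intended to keep the completion path from ever contending on the submission lock. The submission path then takes the completion lock roughly once per batch instead of once per notifier.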
Signed-off-by: Pavel Begunkov --- include/linux/io_uring_types.h | 7 +++++ io_uring/io_uring.c | 3 ++ io_uring/notif.c | 57 +++++++++++++++++++++++++++++----- io_uring/notif.h | 5 +++ 4 files changed, 65 insertions(+), 7 deletions(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 95334e678586..66ab009e7a6b 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -244,6 +244,9 @@ struct io_ring_ctx { struct xarray io_bl_xa; struct list_head io_buffers_cache; + /* struct io_notif cache, protected by uring_lock */ + struct list_head notif_list; + struct io_hash_table cancel_table_locked; struct list_head cq_overflow_list; struct list_head apoll_cache; @@ -255,6 +258,10 @@ struct io_ring_ctx { struct io_wq_work_list locked_free_list; unsigned int locked_free_nr; + /* struct io_notif cache protected by completion_lock */ + struct list_head notif_list_locked; + unsigned int notif_locked_nr; + const struct cred *sq_creds; /* cred used for __io_sq_thread() */ struct io_sq_data *sq_data; /* if using sq thread polling */ diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index ad816afe2345..bdc5a2839d94 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -318,6 +318,8 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p) INIT_WQ_LIST(&ctx->locked_free_list); INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func); INIT_WQ_LIST(&ctx->submit_state.compl_reqs); + INIT_LIST_HEAD(&ctx->notif_list); + INIT_LIST_HEAD(&ctx->notif_list_locked); return ctx; err: kfree(ctx->dummy_ubuf); @@ -2498,6 +2500,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx) WARN_ON_ONCE(!list_empty(&ctx->ltimeout_list)); WARN_ON_ONCE(ctx->notif_slots || ctx->nr_notif_slots); + io_notif_cache_purge(ctx); io_mem_free(ctx->rings); io_mem_free(ctx->sq_sqes); diff --git a/io_uring/notif.c b/io_uring/notif.c index 6ee948af6a49..b257db2120b4 100644 --- a/io_uring/notif.c +++ 
b/io_uring/notif.c @@ -15,10 +15,12 @@ static void __io_notif_complete_tw(struct callback_head *cb) io_cq_lock(ctx); io_fill_cqe_aux(ctx, notif->tag, 0, notif->seq, true); + + list_add(&notif->cache_node, &ctx->notif_list_locked); + ctx->notif_locked_nr++; io_cq_unlock_post(ctx); percpu_ref_put(&ctx->refs); - kfree(notif); } static inline void io_notif_complete(struct io_notif *notif) @@ -45,21 +47,62 @@ static void io_uring_tx_zerocopy_callback(struct sk_buff *skb, queue_work(system_unbound_wq, &notif->commit_work); } +static void io_notif_splice_cached(struct io_ring_ctx *ctx) + __must_hold(&ctx->uring_lock) +{ + spin_lock(&ctx->completion_lock); + list_splice_init(&ctx->notif_list_locked, &ctx->notif_list); + ctx->notif_locked_nr = 0; + spin_unlock(&ctx->completion_lock); +} + +void io_notif_cache_purge(struct io_ring_ctx *ctx) + __must_hold(&ctx->uring_lock) +{ + io_notif_splice_cached(ctx); + + while (!list_empty(&ctx->notif_list)) { + struct io_notif *notif = list_first_entry(&ctx->notif_list, + struct io_notif, cache_node); + + list_del(&notif->cache_node); + kfree(notif); + } +} + +static inline bool io_notif_has_cached(struct io_ring_ctx *ctx) + __must_hold(&ctx->uring_lock) +{ + if (likely(!list_empty(&ctx->notif_list))) + return true; + if (data_race(READ_ONCE(ctx->notif_locked_nr) <= IO_NOTIF_SPLICE_BATCH)) + return false; + io_notif_splice_cached(ctx); + return !list_empty(&ctx->notif_list); +} + struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx, struct io_notif_slot *slot) __must_hold(&ctx->uring_lock) { struct io_notif *notif; - notif = kzalloc(sizeof(*notif), GFP_ATOMIC | __GFP_ACCOUNT); - if (!notif) - return NULL; + if (likely(io_notif_has_cached(ctx))) { + notif = list_first_entry(&ctx->notif_list, + struct io_notif, cache_node); + list_del(&notif->cache_node); + } else { + notif = kzalloc(sizeof(*notif), GFP_ATOMIC | __GFP_ACCOUNT); + if (!notif) + return NULL; + /* pre-initialise some fields */ + notif->ctx = ctx; + notif->uarg.flags =
SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN; + notif->uarg.callback = io_uring_tx_zerocopy_callback; + } notif->seq = slot->seq++; notif->tag = slot->tag; - notif->ctx = ctx; - notif->uarg.flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN; - notif->uarg.callback = io_uring_tx_zerocopy_callback; /* master ref owned by io_notif_slot, will be dropped on flush */ refcount_set(&notif->uarg.refcnt, 1); percpu_ref_get(&ctx->refs); diff --git a/io_uring/notif.h b/io_uring/notif.h index 3d7a1d242e17..b23c9c0515bb 100644 --- a/io_uring/notif.h +++ b/io_uring/notif.h @@ -5,6 +5,8 @@ #include #include +#define IO_NOTIF_SPLICE_BATCH 32 + struct io_notif { struct ubuf_info uarg; struct io_ring_ctx *ctx; @@ -13,6 +15,8 @@ struct io_notif { u64 tag; /* see struct io_notif_slot::seq */ u32 seq; + /* hook into ctx->notif_list and ctx->notif_list_locked */ + struct list_head cache_node; union { struct callback_head task_work; @@ -41,6 +45,7 @@ struct io_notif_slot { }; int io_notif_unregister(struct io_ring_ctx *ctx); +void io_notif_cache_purge(struct io_ring_ctx *ctx); struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx, struct io_notif_slot *slot);
From patchwork Thu Jul 7 11:49:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12909405
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 16/27] io_uring: complete notifiers in tw
Date: Thu, 7 Jul 2022 12:49:47 +0100
Message-Id: <3f3cb9e30bd344d1fb21e9505785b653ca38f069.1657194434.git.asml.silence@gmail.com>

We need a task context to post CQEs, but using wq is too expensive. Try to complete notifiers using task_work, and fall back to wq if task_work_add() fails.

Signed-off-by: Pavel Begunkov --- io_uring/notif.c | 22 +++++++++++++++++++--- io_uring/notif.h | 3 +++ 2 files changed, 22 insertions(+), 3 deletions(-) diff --git a/io_uring/notif.c b/io_uring/notif.c index b257db2120b4..aec74f88fc33 100644 --- a/io_uring/notif.c +++ b/io_uring/notif.c @@ -13,6 +13,11 @@ static void __io_notif_complete_tw(struct callback_head *cb) struct io_notif *notif = container_of(cb, struct io_notif, task_work); struct io_ring_ctx *ctx = notif->ctx; + if (likely(notif->task)) { + io_put_task(notif->task, 1); + notif->task = NULL; + } + io_cq_lock(ctx); io_fill_cqe_aux(ctx, notif->tag, 0, notif->seq, true); @@ -43,6 +48,14 @@ static void io_uring_tx_zerocopy_callback(struct sk_buff *skb, if (!refcount_dec_and_test(&uarg->refcnt)) return; + + if (likely(notif->task)) { + init_task_work(&notif->task_work, __io_notif_complete_tw); + if (likely(!task_work_add(notif->task, &notif->task_work, + TWA_SIGNAL))) + return; + } + INIT_WORK(&notif->commit_work, io_notif_complete_wq); queue_work(system_unbound_wq, &notif->commit_work); }
@@ -134,12 +147,15 @@ __cold int io_notif_unregister(struct io_ring_ctx *ctx)
 	for (i = 0; i < ctx->nr_notif_slots; i++) {
 		struct io_notif_slot *slot = &ctx->notif_slots[i];
 
-		if (slot->notif)
-			io_notif_slot_flush(slot);
+		if (!slot->notif)
+			continue;
+		if (WARN_ON_ONCE(slot->notif->task))
+			slot->notif->task = NULL;
+		io_notif_slot_flush(slot);
 	}
 
 	kvfree(ctx->notif_slots);
 	ctx->notif_slots = NULL;
 	ctx->nr_notif_slots = 0;
 	return 0;
-}
\ No newline at end of file
+}
diff --git a/io_uring/notif.h b/io_uring/notif.h
index b23c9c0515bb..23ca7620fff9 100644
--- a/io_uring/notif.h
+++ b/io_uring/notif.h
@@ -11,6 +11,9 @@ struct io_notif {
 	struct ubuf_info uarg;
 	struct io_ring_ctx *ctx;
 
+	/* complete via tw if ->task is non-NULL, fallback to wq otherwise */
+	struct task_struct *task;
+
 	/* cqe->user_data, io_notif_slot::tag if not overridden */
 	u64 tag;
 	/* see struct io_notif_slot::seq */

From patchwork Thu Jul 7 11:49:48 2022
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 17/27] io_uring: add rsrc referencing for notifiers
Date: Thu, 7 Jul 2022 12:49:48 +0100

In preparation for zerocopy sends with fixed buffers, make notifiers
reference the rsrc node to protect the fixed buffers in use. We can't just
grab it for a send request, as notifiers are likely to outlive the requests
that used them.

Signed-off-by: Pavel Begunkov
---
 io_uring/notif.c |  5 +++++
 io_uring/notif.h |  1 +
 io_uring/rsrc.h  | 12 +++++++++---
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/io_uring/notif.c b/io_uring/notif.c
index aec74f88fc33..0a2e98bd74f6 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -7,10 +7,12 @@
 
 #include "io_uring.h"
 #include "notif.h"
+#include "rsrc.h"
 
 static void __io_notif_complete_tw(struct callback_head *cb)
 {
 	struct io_notif *notif = container_of(cb, struct io_notif, task_work);
+	struct io_rsrc_node *rsrc_node = notif->rsrc_node;
 	struct io_ring_ctx *ctx = notif->ctx;
 
 	if (likely(notif->task)) {
@@ -25,6 +27,7 @@ static void __io_notif_complete_tw(struct callback_head *cb)
 	ctx->notif_locked_nr++;
 	io_cq_unlock_post(ctx);
 
+	io_rsrc_put_node(rsrc_node, 1);
 	percpu_ref_put(&ctx->refs);
 }
 
@@ -119,6 +122,8 @@ struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx,
 	/* master ref owned by io_notif_slot, will be dropped on flush */
 	refcount_set(&notif->uarg.refcnt, 1);
 	percpu_ref_get(&ctx->refs);
+	notif->rsrc_node = ctx->rsrc_node;
+	io_charge_rsrc_node(ctx);
 	return notif;
 }
 
diff --git a/io_uring/notif.h b/io_uring/notif.h
index 23ca7620fff9..1dd48efb7744 100644
--- a/io_uring/notif.h
+++ b/io_uring/notif.h
@@ -10,6 +10,7 @@
 struct io_notif {
 	struct ubuf_info uarg;
 	struct io_ring_ctx *ctx;
+	struct io_rsrc_node *rsrc_node;
 
 	/* complete via tw if ->task is non-NULL, fallback to wq otherwise */
 	struct task_struct *task;
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 87f58315b247..af342fd239d0 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -135,6 +135,13 @@ static inline void io_req_put_rsrc_locked(struct io_kiocb *req,
 	}
 }
 
+static inline void io_charge_rsrc_node(struct io_ring_ctx *ctx)
+{
+	ctx->rsrc_cached_refs--;
+	if (unlikely(ctx->rsrc_cached_refs < 0))
+		io_rsrc_refs_refill(ctx);
+}
+
 static inline void io_req_set_rsrc_node(struct io_kiocb *req,
 					struct io_ring_ctx *ctx,
 					unsigned int issue_flags)
@@ -144,9 +151,8 @@ static inline void io_req_set_rsrc_node(struct io_kiocb *req,
 
 		if (!(issue_flags & IO_URING_F_UNLOCKED)) {
 			lockdep_assert_held(&ctx->uring_lock);
-			ctx->rsrc_cached_refs--;
-			if (unlikely(ctx->rsrc_cached_refs < 0))
-				io_rsrc_refs_refill(ctx);
+
+			io_charge_rsrc_node(ctx);
 		} else {
 			percpu_ref_get(&req->rsrc_node->refs);
 		}

From patchwork Thu Jul 7 11:49:49 2022
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 18/27] io_uring: add notification slot registration
Date: Thu, 7 Jul 2022 12:49:49 +0100

Let userspace register and unregister notification slots.

Signed-off-by: Pavel Begunkov
---
 include/uapi/linux/io_uring.h | 17 ++++++++++++++
 io_uring/io_uring.c           |  9 ++++++++
 io_uring/notif.c              | 43 +++++++++++++++++++++++++++++++++++
 io_uring/notif.h              |  3 +++
 4 files changed, 72 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index e858dba2e6c9..f1ba8e934168 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -454,6 +454,10 @@ enum {
 	/* register a range of fixed file slots for automatic slot allocation */
 	IORING_REGISTER_FILE_ALLOC_RANGE	= 25,
 
+	/* zerocopy notification API */
+	IORING_REGISTER_NOTIFIERS		= 26,
+	IORING_UNREGISTER_NOTIFIERS		= 27,
+
 	/* this goes last */
 	IORING_REGISTER_LAST
 };
@@ -500,6 +504,19 @@ struct io_uring_rsrc_update2 {
 	__u32 resv2;
 };
 
+struct io_uring_notification_slot {
+	__u64 tag;
+	__u64 resv[3];
+};
+
+struct io_uring_notification_register {
+	__u32 nr_slots;
+	__u32 resv;
+	__u64 resv2;
+	__u64 data;
+	__u64 resv3;
+};
+
 /* Skip updating fd indexes set to this value in the fd table */
 #define IORING_REGISTER_FILES_SKIP	(-2)
 
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index bdc5a2839d94..41ef98a43d32 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3875,6 +3875,15 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 			break;
 		ret = io_register_file_alloc_range(ctx, arg);
 		break;
+	case IORING_REGISTER_NOTIFIERS:
+		ret = io_notif_register(ctx, arg, nr_args);
+		break;
+	case IORING_UNREGISTER_NOTIFIERS:
+		ret = -EINVAL;
+		if (arg || nr_args)
+			break;
+		ret = io_notif_unregister(ctx);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/io_uring/notif.c b/io_uring/notif.c
index 0a2e98bd74f6..e6d98dc208c7 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -162,5 +162,48 @@ __cold int io_notif_unregister(struct io_ring_ctx *ctx)
 	kvfree(ctx->notif_slots);
 	ctx->notif_slots = NULL;
 	ctx->nr_notif_slots = 0;
+	io_notif_cache_purge(ctx);
+	return 0;
+}
+
+__cold int io_notif_register(struct io_ring_ctx *ctx,
+			     void __user *arg, unsigned int size)
+	__must_hold(&ctx->uring_lock)
+{
+	struct io_uring_notification_slot __user *slots;
+	struct io_uring_notification_slot slot;
+	struct io_uring_notification_register reg;
+	unsigned i;
+
+	if (ctx->nr_notif_slots)
+		return -EBUSY;
+	if (size != sizeof(reg))
+		return -EINVAL;
+	if (copy_from_user(&reg, arg, sizeof(reg)))
+		return -EFAULT;
+	if (!reg.nr_slots || reg.nr_slots > IORING_MAX_NOTIF_SLOTS)
+		return -EINVAL;
+	if (reg.resv || reg.resv2 || reg.resv3)
+		return -EINVAL;
+
+	slots = u64_to_user_ptr(reg.data);
+	ctx->notif_slots = kvcalloc(reg.nr_slots, sizeof(ctx->notif_slots[0]),
+				GFP_KERNEL_ACCOUNT);
+	if (!ctx->notif_slots)
+		return -ENOMEM;
+
+	for (i = 0; i < reg.nr_slots; i++, ctx->nr_notif_slots++) {
+		struct io_notif_slot *notif_slot = &ctx->notif_slots[i];
+
+		if (copy_from_user(&slot, &slots[i], sizeof(slot))) {
+			io_notif_unregister(ctx);
+			return -EFAULT;
+		}
+		if (slot.resv[0] | slot.resv[1] | slot.resv[2]) {
+			io_notif_unregister(ctx);
+			return -EINVAL;
+		}
+		notif_slot->tag = slot.tag;
+	}
 	return 0;
 }
diff --git a/io_uring/notif.h b/io_uring/notif.h
index 1dd48efb7744..00efe164bdc4 100644
--- a/io_uring/notif.h
+++ b/io_uring/notif.h
@@ -6,6 +6,7 @@
 #include
 
 #define IO_NOTIF_SPLICE_BATCH 32
+#define IORING_MAX_NOTIF_SLOTS (1U << 10)
 
 struct io_notif {
 	struct ubuf_info uarg;
@@ -48,6 +49,8 @@ struct io_notif_slot {
 	u32 seq;
 };
 
+int io_notif_register(struct io_ring_ctx *ctx,
+		      void __user *arg, unsigned int size);
 int io_notif_unregister(struct io_ring_ctx *ctx);
 void io_notif_cache_purge(struct io_ring_ctx *ctx);
 

From patchwork Thu Jul 7 11:49:50 2022
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 19/27] io_uring: wire send zc request type
Date: Thu, 7 Jul 2022 12:49:50 +0100
Message-Id: <073ee4f43806aa79b1715d52417944c99e9c5675.1657194434.git.asml.silence@gmail.com>

Add a new io_uring opcode, IORING_OP_SENDZC. The main difference from
IORING_OP_SEND is that the user must specify a notification slot index in
sqe::notification_idx. The buffers are safe to reuse only after the
notification used has been flushed and has completed.

Signed-off-by: Pavel Begunkov
---
 include/uapi/linux/io_uring.h |  5 ++
 io_uring/net.c                | 94 +++++++++++++++++++++++++++++++++++
 io_uring/net.h                |  4 ++
 io_uring/opdef.c              | 15 ++++++
 4 files changed, 118 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index f1ba8e934168..a6844908772a 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -63,6 +63,10 @@ struct io_uring_sqe {
 	union {
 		__s32	splice_fd_in;
 		__u32	file_index;
+		struct {
+			__u16	notification_idx;
+			__u16	__pad;
+		};
 	};
 	union {
 		struct {
@@ -194,6 +198,7 @@ enum io_uring_op {
 	IORING_OP_GETXATTR,
 	IORING_OP_SOCKET,
 	IORING_OP_URING_CMD,
+	IORING_OP_SENDZC,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
diff --git a/io_uring/net.c b/io_uring/net.c
index 2dd61fcf91d8..399267e8f1ef 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -13,6 +13,7 @@
 #include "io_uring.h"
 #include "kbuf.h"
 #include "net.h"
+#include "notif.h"
 
 #if defined(CONFIG_NET)
 struct io_shutdown {
@@ -58,6 +59,15 @@ struct io_sr_msg {
 	unsigned int			flags;
 };
 
+struct io_sendzc {
+	struct file			*file;
+	void __user			*buf;
+	size_t				len;
+	u16				slot_idx;
+	unsigned			msg_flags;
+	unsigned			flags;
+};
+
 #define IO_APOLL_MULTI_POLLED (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED)
 
 int io_shutdown_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
@@ -652,6 +662,90 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 	return ret;
 }
 
+int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+	struct io_sendzc *zc = io_kiocb_to_cmd(req);
+
+	if (READ_ONCE(sqe->addr2) || READ_ONCE(sqe->__pad2[0]) ||
+	    READ_ONCE(sqe->addr3))
+		return -EINVAL;
+
+	zc->flags = READ_ONCE(sqe->ioprio);
+	if (zc->flags & ~IORING_RECVSEND_POLL_FIRST)
+		return -EINVAL;
+
+	zc->buf = u64_to_user_ptr(READ_ONCE(sqe->addr));
+	zc->len = READ_ONCE(sqe->len);
+	zc->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
+	zc->slot_idx = READ_ONCE(sqe->notification_idx);
+	if (zc->msg_flags & MSG_DONTWAIT)
+		req->flags |= REQ_F_NOWAIT;
+#ifdef CONFIG_COMPAT
+	if (req->ctx->compat)
+		zc->msg_flags |= MSG_CMSG_COMPAT;
+#endif
+	return 0;
+}
+
+int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+	struct io_sendzc *zc = io_kiocb_to_cmd(req);
+	struct io_notif_slot *notif_slot;
+	struct io_notif *notif;
+	struct msghdr msg;
+	struct iovec iov;
+	struct socket *sock;
+	unsigned msg_flags;
+	int ret, min_ret = 0;
+
+	if (!(req->flags & REQ_F_POLLED) &&
+	    (zc->flags & IORING_RECVSEND_POLL_FIRST))
+		return -EAGAIN;
+
+	if (issue_flags & IO_URING_F_UNLOCKED)
+		return -EAGAIN;
+	sock = sock_from_file(req->file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
+
+	notif_slot = io_get_notif_slot(ctx, zc->slot_idx);
+	if (!notif_slot)
+		return -EINVAL;
+	notif = io_get_notif(ctx, notif_slot);
+	if (!notif)
+		return -ENOMEM;
+
+	msg.msg_name = NULL;
+	msg.msg_control = NULL;
+	msg.msg_controllen = 0;
+	msg.msg_namelen = 0;
+
+	ret = import_single_range(WRITE, zc->buf, zc->len, &iov, &msg.msg_iter);
+	if (unlikely(ret))
+		return ret;
+
+	msg_flags = zc->msg_flags | MSG_ZEROCOPY;
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		msg_flags |= MSG_DONTWAIT;
+	if (msg_flags & MSG_WAITALL)
+		min_ret = iov_iter_count(&msg.msg_iter);
+
+	msg.msg_flags = msg_flags;
+	msg.msg_ubuf = &notif->uarg;
+	msg.sg_from_iter = NULL;
+	ret = sock_sendmsg(sock, &msg);
+
+	if (unlikely(ret < min_ret)) {
+		if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
+			return -EAGAIN;
+		return ret == -ERESTARTSYS ? -EINTR : ret;
+	}
+
+	io_req_set_res(req, ret, 0);
+	return IOU_OK;
+}
+
 int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_accept *accept = io_kiocb_to_cmd(req);
diff --git a/io_uring/net.h b/io_uring/net.h
index 81d71d164770..1dba8befebb3 100644
--- a/io_uring/net.h
+++ b/io_uring/net.h
@@ -40,4 +40,8 @@ int io_socket(struct io_kiocb *req, unsigned int issue_flags);
 int io_connect_prep_async(struct io_kiocb *req);
 int io_connect_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_connect(struct io_kiocb *req, unsigned int issue_flags);
+
+int io_sendzc(struct io_kiocb *req, unsigned int issue_flags);
+int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+
 #endif
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index a7b84b43e6c2..8419b50c1d3b 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -470,6 +470,21 @@ const struct io_op_def io_op_defs[] = {
 		.issue			= io_uring_cmd,
 		.prep_async		= io_uring_cmd_prep_async,
 	},
+	[IORING_OP_SENDZC] = {
+		.name			= "SENDZC",
+		.needs_file		= 1,
+		.unbound_nonreg_file	= 1,
+		.pollout		= 1,
+		.audit_skip		= 1,
+		.ioprio			= 1,
+#if defined(CONFIG_NET)
+		.prep			= io_sendzc_prep,
+		.issue			= io_sendzc,
+#else
+		.prep			= io_eopnotsupp_prep,
+#endif
+
+	},
 };
 
 const char *io_uring_get_opcode(u8 opcode)

From patchwork Thu Jul 7 11:49:51 2022
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 20/27] io_uring: account locked pages for non-fixed zc
Date: Thu, 7 Jul 2022 12:49:51 +0100

Fixed buffers are accounted against RLIMIT_MEMLOCK, but that doesn't cover
iovec-based zerocopy sends. Do the accounting on the io_uring side.
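To make the charge/uncharge pairing concrete, here is a minimal user-space model of it. The model assumes mm_account_pinned_pages() behaves as follows, which is my reading rather than something stated in this patch. It charges a worst-case estimate of (size >> PAGE_SHIFT) + 2 pages to the user's locked_vm using a compare-and-swap loop. If that charge would exceed the RLIMIT_MEMLOCK budget, it fails with -ENOBUFS.

The model_* names, the 4 KiB page size and the page-denominated limit are hypothetical stand-ins. The kernel's CAP_IPC_LOCK and unlimited-rlimit shortcuts are left out. The uncharge step mirrors the atomic_long_sub()/free_uid() sequence this patch adds to __io_notif_complete_tw().

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's user_struct and struct mmpin. */
struct model_user {
	atomic_long locked_vm;		/* pages currently charged to this user */
};

struct model_mmpin {
	struct model_user *user;	/* NULL until the first charge */
	long num_pg;			/* pages charged through this mmpin */
};

#define MODEL_PAGE_SHIFT 12		/* assumption: 4 KiB pages */

/*
 * Charge a size-byte buffer against a budget of limit_pg pages
 * (standing in for RLIMIT_MEMLOCK converted to pages).
 */
static int model_account_pinned(struct model_mmpin *mmp,
				struct model_user *user,
				size_t size, long limit_pg)
{
	long num_pg, old_pg, new_pg;

	if (!size)
		return 0;
	/* worst case: partial pages at either end add at most two pages */
	num_pg = (long)(size >> MODEL_PAGE_SHIFT) + 2;

	old_pg = atomic_load(&user->locked_vm);
	do {
		new_pg = old_pg + num_pg;
		if (new_pg > limit_pg)
			return -ENOBUFS;
		/* on CAS failure, old_pg is reloaded with the current value */
	} while (!atomic_compare_exchange_weak(&user->locked_vm,
					       &old_pg, new_pg));

	if (!mmp->user)
		mmp->user = user;	/* kernel: get_uid() on first charge */
	mmp->num_pg += num_pg;
	return 0;
}

/* Drop everything charged through mmp in one step, as on completion. */
static void model_unaccount(struct model_mmpin *mmp)
{
	if (mmp->user) {
		atomic_fetch_sub(&mmp->user->locked_vm, mmp->num_pg);
		mmp->user = NULL;	/* kernel: free_uid() */
		mmp->num_pg = 0;
	}
}
```

Several sends can attach to the same notifier before it is flushed. For that reason, the model accumulates num_pg across charges and releases the total once, matching the single uncharge on the completion path.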
Signed-off-by: Pavel Begunkov --- io_uring/net.c | 1 + io_uring/notif.c | 6 ++++++ 2 files changed, 7 insertions(+) diff --git a/io_uring/net.c b/io_uring/net.c index 399267e8f1ef..69273d4f4ef0 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -724,6 +724,7 @@ int io_sendzc(struct io_kiocb *req, unsigned int issue_flags) ret = import_single_range(WRITE, zc->buf, zc->len, &iov, &msg.msg_iter); if (unlikely(ret)) return ret; + mm_account_pinned_pages(¬if->uarg.mmp, zc->len); msg_flags = zc->msg_flags | MSG_ZEROCOPY; if (issue_flags & IO_URING_F_NONBLOCK) diff --git a/io_uring/notif.c b/io_uring/notif.c index e6d98dc208c7..c5179e5c1cd6 100644 --- a/io_uring/notif.c +++ b/io_uring/notif.c @@ -14,7 +14,13 @@ static void __io_notif_complete_tw(struct callback_head *cb) struct io_notif *notif = container_of(cb, struct io_notif, task_work); struct io_rsrc_node *rsrc_node = notif->rsrc_node; struct io_ring_ctx *ctx = notif->ctx; + struct mmpin *mmp = ¬if->uarg.mmp; + if (mmp->user) { + atomic_long_sub(mmp->num_pg, &mmp->user->locked_vm); + free_uid(mmp->user); + mmp->user = NULL; + } if (likely(notif->task)) { io_put_task(notif->task, 1); notif->task = NULL; From patchwork Thu Jul 7 11:49:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12909409 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EDFDFC433EF for ; Thu, 7 Jul 2022 11:53:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235223AbiGGLxE (ORCPT ); Thu, 7 Jul 2022 07:53:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56130 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235333AbiGGLwV (ORCPT ); Thu, 7 Jul 2022 07:52:21 -0400 
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 21/27] io_uring: allow to pass addr into sendzc
Date: Thu, 7 Jul 2022 12:49:52 +0100
Message-Id: <75fbbc1dff3835ddd996346b86eab3b8e83435ff.1657194434.git.asml.silence@gmail.com>
X-Mailing-List: io-uring@vger.kernel.org

Allow specifying a destination address for zerocopy sends, making them
more like sendto(2). The address is passed in sqe->addr2 and its length
in the new sqe->addr_len field.

Signed-off-by: Pavel Begunkov
---
 include/uapi/linux/io_uring.h |  2 +-
 io_uring/net.c                | 18 ++++++++++++++++--
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index a6844908772a..25278c9ac6d2 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -65,7 +65,7 @@ struct io_uring_sqe {
 		__u32	file_index;
 		struct {
 			__u16	notification_idx;
-			__u16	__pad;
+			__u16	addr_len;
 		};
 	};
 	union {
diff --git a/io_uring/net.c b/io_uring/net.c
index 69273d4f4ef0..2172cf3facd8 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -66,6 +66,8 @@ struct io_sendzc {
 	u16 slot_idx;
 	unsigned msg_flags;
 	unsigned flags;
+	unsigned addr_len;
+	void __user *addr;
 };
 
 #define IO_APOLL_MULTI_POLLED (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED)
@@ -666,8 +668,7 @@ int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_sendzc *zc = io_kiocb_to_cmd(req);
 
-	if (READ_ONCE(sqe->addr2) || READ_ONCE(sqe->__pad2[0]) ||
-	    READ_ONCE(sqe->addr3))
+	if (READ_ONCE(sqe->__pad2[0]) || READ_ONCE(sqe->addr3))
 		return -EINVAL;
 
 	zc->flags = READ_ONCE(sqe->ioprio);
@@ -680,6 +681,10 @@ int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	zc->slot_idx = READ_ONCE(sqe->notification_idx);
 	if (zc->msg_flags & MSG_DONTWAIT)
 		req->flags |= REQ_F_NOWAIT;
+
+	zc->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+	zc->addr_len = READ_ONCE(sqe->addr_len);
+
 #ifdef CONFIG_COMPAT
 	if (req->ctx->compat)
 		zc->msg_flags |= MSG_CMSG_COMPAT;
@@ -689,6 +694,7 @@ int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 
 int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 {
+	struct sockaddr_storage address;
 	struct io_ring_ctx *ctx = req->ctx;
 	struct io_sendzc *zc = io_kiocb_to_cmd(req);
 	struct io_notif_slot *notif_slot;
@@ -726,6 +732,14 @@ int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 		return ret;
 	mm_account_pinned_pages(&notif->uarg.mmp, zc->len);
 
+	if (zc->addr) {
+		ret = move_addr_to_kernel(zc->addr, zc->addr_len, &address);
+		if (unlikely(ret < 0))
+			return ret;
+		msg.msg_name = (struct sockaddr *)&address;
+		msg.msg_namelen = zc->addr_len;
+	}
+
 	msg_flags = zc->msg_flags | MSG_ZEROCOPY;
 	if (issue_flags & IO_URING_F_NONBLOCK)
 		msg_flags |= MSG_DONTWAIT;

From patchwork Thu Jul 7 11:49:53 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12909412
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 22/27] io_uring: sendzc with fixed buffers
Date: Thu, 7 Jul 2022 12:49:53 +0100
X-Mailing-List: io-uring@vger.kernel.org

Allow zerocopy sends to use fixed (registered) buffers. There is an
optimisation for this case: the network layer doesn't need to take
references on the pages (see SKBFL_MANAGED_FRAG_REFS), so io_uring has
to ensure the fixed buffers stay valid until the notifier is released.

Signed-off-by: Pavel Begunkov
---
 include/uapi/linux/io_uring.h |  6 +++++-
 io_uring/net.c                | 29 ++++++++++++++++++++++++-----
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 25278c9ac6d2..8d050c247d6b 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -269,9 +269,13 @@ enum io_uring_op {
  * IORING_RECV_MULTISHOT	Multishot recv. Sets IORING_CQE_F_MORE if
  *				the handler will continue to report
  *				CQEs on behalf of the same SQE.
+ *
+ * IORING_RECVSEND_FIXED_BUF	Use registered buffers, the index is stored in
+ *				the buf_index field.
  */
 #define IORING_RECVSEND_POLL_FIRST	(1U << 0)
-#define IORING_RECV_MULTISHOT	(1U << 1)
+#define IORING_RECV_MULTISHOT		(1U << 1)
+#define IORING_RECVSEND_FIXED_BUF	(1U << 2)
 
 /*
  * accept flags stored in sqe->ioprio
diff --git a/io_uring/net.c b/io_uring/net.c
index 2172cf3facd8..0259fbbad591 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -14,6 +14,7 @@
 #include "kbuf.h"
 #include "net.h"
 #include "notif.h"
+#include "rsrc.h"
 
 #if defined(CONFIG_NET)
 struct io_shutdown {
@@ -667,13 +668,23 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_sendzc *zc = io_kiocb_to_cmd(req);
+	struct io_ring_ctx *ctx = req->ctx;
 
 	if (READ_ONCE(sqe->__pad2[0]) || READ_ONCE(sqe->addr3))
 		return -EINVAL;
 
 	zc->flags = READ_ONCE(sqe->ioprio);
-	if (zc->flags & ~IORING_RECVSEND_POLL_FIRST)
+	if (zc->flags & ~(IORING_RECVSEND_POLL_FIRST | IORING_RECVSEND_FIXED_BUF))
 		return -EINVAL;
+	if (zc->flags & IORING_RECVSEND_FIXED_BUF) {
+		unsigned idx = READ_ONCE(sqe->buf_index);
+
+		if (unlikely(idx >= ctx->nr_user_bufs))
+			return -EFAULT;
+		idx = array_index_nospec(idx, ctx->nr_user_bufs);
+		req->imu = READ_ONCE(ctx->user_bufs[idx]);
+		io_req_set_rsrc_node(req, ctx, 0);
+	}
 
 	zc->buf = u64_to_user_ptr(READ_ONCE(sqe->addr));
 	zc->len = READ_ONCE(sqe->len);
@@ -727,10 +738,18 @@ int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 	msg.msg_controllen = 0;
 	msg.msg_namelen = 0;
 
-	ret = import_single_range(WRITE, zc->buf, zc->len, &iov, &msg.msg_iter);
-	if (unlikely(ret))
-		return ret;
-	mm_account_pinned_pages(&notif->uarg.mmp, zc->len);
+	if (zc->flags & IORING_RECVSEND_FIXED_BUF) {
+		ret = io_import_fixed(WRITE, &msg.msg_iter, req->imu,
+					(u64)zc->buf, zc->len);
+		if (unlikely(ret))
+			return ret;
+	} else {
+		ret = import_single_range(WRITE, zc->buf, zc->len, &iov,
+					  &msg.msg_iter);
+		if (unlikely(ret))
+			return ret;
+		mm_account_pinned_pages(&notif->uarg.mmp, zc->len);
+	}
 
 	if (zc->addr) {
 		ret = move_addr_to_kernel(zc->addr, zc->addr_len, &address);

From patchwork Thu Jul 7 11:49:54 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12909411
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 23/27] io_uring: flush notifiers after sendzc
Date: Thu, 7 Jul 2022 12:49:54 +0100
Message-Id: <983172d39865aa7c6d313694f3bc2ef4f31e83a9.1657194434.git.asml.silence@gmail.com>
X-Mailing-List: io-uring@vger.kernel.org

Allow flushing notifiers as part of a sendzc request by setting the
IORING_RECVSEND_NOTIF_FLUSH flag. When the sendzc request succeeds, it
flushes the notifier it used (the slot's currently active one).
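A minimal userspace sketch of the flag handling this patch adds. It assumes the v4 uapi values from the diff (all four flags live in sqe->ioprio). The helpers sendzc_check_flags() and sendzc_should_flush() are hypothetical; they mirror the prep-time validation and the issue-time "flush only after a successful send" rule and are not kernel or liburing functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Flag values copied from the v4 uapi diff; userspace sets them in sqe->ioprio. */
#define IORING_RECVSEND_POLL_FIRST  (1U << 0)
#define IORING_RECV_MULTISHOT       (1U << 1)
#define IORING_RECVSEND_FIXED_BUF   (1U << 2)
#define IORING_RECVSEND_NOTIF_FLUSH (1U << 3)

/* Hypothetical mirror of the io_sendzc_prep() check: any bit outside the set
 * sendzc understands is rejected (IORING_RECV_MULTISHOT is recv-only). */
static int sendzc_check_flags(unsigned int flags)
{
	const unsigned int allowed = IORING_RECVSEND_POLL_FIRST |
				     IORING_RECVSEND_FIXED_BUF |
				     IORING_RECVSEND_NOTIF_FLUSH;

	return (flags & ~allowed) ? -EINVAL : 0;
}

/* Hypothetical mirror of the end of io_sendzc(): a failed send returns before
 * the flush, so the slot's notifier is flushed only after a successful send
 * that asked for it. "Failed" is simplified here to a negative result. */
static bool sendzc_should_flush(unsigned int flags, int send_ret)
{
	assert(sendzc_check_flags(flags) == 0); /* prep has already validated flags */
	if (send_ret < 0)
		return false;
	return (flags & IORING_RECVSEND_NOTIF_FLUSH) != 0;
}
```

With these helpers, a request carrying IORING_RECVSEND_FIXED_BUF | IORING_RECVSEND_NOTIF_FLUSH passes validation and triggers a flush only when the send returns a non-negative byte count.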
Signed-off-by: Pavel Begunkov
---
 include/uapi/linux/io_uring.h |  4 ++++
 io_uring/io_uring.c           | 11 +----------
 io_uring/io_uring.h           | 10 ++++++++++
 io_uring/net.c                |  5 ++++-
 io_uring/notif.c              |  2 +-
 io_uring/notif.h              | 11 +++++++++++
 6 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 8d050c247d6b..37e0730733f9 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -272,10 +272,14 @@ enum io_uring_op {
  *
  * IORING_RECVSEND_FIXED_BUF	Use registered buffers, the index is stored in
  *				the buf_index field.
+ *
+ * IORING_RECVSEND_NOTIF_FLUSH	Flush a notification after a successful
+ *				send. Only for zerocopy sends.
  */
 #define IORING_RECVSEND_POLL_FIRST	(1U << 0)
 #define IORING_RECV_MULTISHOT		(1U << 1)
 #define IORING_RECVSEND_FIXED_BUF	(1U << 2)
+#define IORING_RECVSEND_NOTIF_FLUSH	(1U << 3)
 
 /*
  * accept flags stored in sqe->ioprio
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 41ef98a43d32..e4f3a1ede2f4 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -615,7 +615,7 @@ void __io_put_task(struct task_struct *task, int nr)
 	put_task_struct_many(task, nr);
 }
 
-static void io_task_refs_refill(struct io_uring_task *tctx)
+void io_task_refs_refill(struct io_uring_task *tctx)
 {
 	unsigned int refill = -tctx->cached_refs + IO_TCTX_REFS_CACHE_NR;
 
@@ -624,15 +624,6 @@ static void io_task_refs_refill(struct io_uring_task *tctx)
 	tctx->cached_refs += refill;
 }
 
-static inline void io_get_task_refs(int nr)
-{
-	struct io_uring_task *tctx = current->io_uring;
-
-	tctx->cached_refs -= nr;
-	if (unlikely(tctx->cached_refs < 0))
-		io_task_refs_refill(tctx);
-}
-
 static __cold void io_uring_drop_tctx_refs(struct task_struct *task)
 {
 	struct io_uring_task *tctx = task->io_uring;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index b8c858727dc8..d9f2f5c71481 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -69,6 +69,7 @@ void io_wq_submit_work(struct io_wq_work *work);
 void io_free_req(struct io_kiocb *req);
 void io_queue_next(struct io_kiocb *req);
 void __io_put_task(struct task_struct *task, int nr);
+void io_task_refs_refill(struct io_uring_task *tctx);
 
 bool io_match_task_safe(struct io_kiocb *head, struct task_struct *task,
 			bool cancel_all);
@@ -265,4 +266,13 @@ static inline void io_put_task(struct task_struct *task, int nr)
 		__io_put_task(task, nr);
 }
 
+static inline void io_get_task_refs(int nr)
+{
+	struct io_uring_task *tctx = current->io_uring;
+
+	tctx->cached_refs -= nr;
+	if (unlikely(tctx->cached_refs < 0))
+		io_task_refs_refill(tctx);
+}
+
 #endif
diff --git a/io_uring/net.c b/io_uring/net.c
index 0259fbbad591..bf9916d5e50c 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -674,7 +674,8 @@ int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		return -EINVAL;
 
 	zc->flags = READ_ONCE(sqe->ioprio);
-	if (zc->flags & ~(IORING_RECVSEND_POLL_FIRST | IORING_RECVSEND_FIXED_BUF))
+	if (zc->flags & ~(IORING_RECVSEND_POLL_FIRST |
+			  IORING_RECVSEND_FIXED_BUF | IORING_RECVSEND_NOTIF_FLUSH))
 		return -EINVAL;
 	if (zc->flags & IORING_RECVSEND_FIXED_BUF) {
 		unsigned idx = READ_ONCE(sqe->buf_index);
@@ -776,6 +777,8 @@ int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 		return ret == -ERESTARTSYS ? -EINTR : ret;
 	}
 
+	if (zc->flags & IORING_RECVSEND_NOTIF_FLUSH)
+		io_notif_slot_flush_submit(notif_slot, 0);
 	io_req_set_res(req, ret, 0);
 	return IOU_OK;
 }
diff --git a/io_uring/notif.c b/io_uring/notif.c
index c5179e5c1cd6..a93887451bbb 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -133,7 +133,7 @@ struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx,
 	return notif;
 }
 
-static void io_notif_slot_flush(struct io_notif_slot *slot)
+void io_notif_slot_flush(struct io_notif_slot *slot)
 	__must_hold(&ctx->uring_lock)
 {
 	struct io_notif *notif = slot->notif;
diff --git a/io_uring/notif.h b/io_uring/notif.h
index 00efe164bdc4..6cd73d7b965b 100644
--- a/io_uring/notif.h
+++ b/io_uring/notif.h
@@ -54,6 +54,7 @@ int io_notif_register(struct io_ring_ctx *ctx,
 
 int io_notif_unregister(struct io_ring_ctx *ctx);
 void io_notif_cache_purge(struct io_ring_ctx *ctx);
+void io_notif_slot_flush(struct io_notif_slot *slot);
 
 struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx,
 				struct io_notif_slot *slot);
@@ -74,3 +75,13 @@ static inline struct io_notif_slot *io_get_notif_slot(struct io_ring_ctx *ctx,
 	idx = array_index_nospec(idx, ctx->nr_notif_slots);
 	return &ctx->notif_slots[idx];
 }
+
+static inline void io_notif_slot_flush_submit(struct io_notif_slot *slot,
+					      unsigned int issue_flags)
+{
+	if (!(issue_flags & IO_URING_F_UNLOCKED)) {
+		slot->notif->task = current;
+		io_get_task_refs(1);
+	}
+	io_notif_slot_flush(slot);
+}

From patchwork Thu Jul 7 11:49:55 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12909413
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 24/27] io_uring: rename IORING_OP_FILES_UPDATE
Date: Thu, 7 Jul 2022 12:49:55 +0100
X-Mailing-List: io-uring@vger.kernel.org

IORING_OP_FILES_UPDATE is going to become a more generic opcode serving
different resource types, so rename it to IORING_OP_RSRC_UPDATE and add
subtype handling, with the subtype passed in sqe->ioprio. The old name
is kept as an alias with the same value.
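Keeping IORING_OP_FILES_UPDATE as an alias relies on a C enum rule: an enumerator with an explicit value resets the counter, so the next enumerator continues from that value plus one. The sketch below uses a cut-down, hypothetical enum (three opcodes, illustrative base value) to check that the alias leaves later opcode numbers unchanged, and mirrors the new subtype dispatch in io_rsrc_update() with a stand-in handler; `rsrc_update_dispatch()` and `files_update_stub()` are illustrative names only.

```c
#include <assert.h>
#include <errno.h>

/* Old layout, cut down to three opcodes; the base value is illustrative only. */
enum old_op {
	OLD_OP_CLOSE = 19,
	OLD_OP_FILES_UPDATE,
	OLD_OP_STATX,
};

/* New layout as in the patch: the alias repeats the previous enumerator's value
 * and the following enumerator continues from it, so STATX is not renumbered. */
enum new_op {
	NEW_OP_CLOSE = 19,
	NEW_OP_RSRC_UPDATE,
	NEW_OP_FILES_UPDATE = NEW_OP_RSRC_UPDATE,
	NEW_OP_STATX,
};

static_assert((int)NEW_OP_RSRC_UPDATE == (int)OLD_OP_FILES_UPDATE,
	      "new name takes the old value");
static_assert((int)NEW_OP_STATX == (int)OLD_OP_STATX,
	      "later opcodes keep their numbers");

/* Subtypes carried in sqe->ioprio; only files exist at this point in the series. */
enum { RSRC_UPDATE_FILES };

/* Stand-in for io_files_update(); returns a marker so callers can tell it ran. */
static int files_update_stub(void)
{
	return 1;
}

/* Mirror of io_rsrc_update(): dispatch on the subtype, reject unknown ones. */
static int rsrc_update_dispatch(int type)
{
	switch (type) {
	case RSRC_UPDATE_FILES:
		return files_update_stub();
	}
	return -EINVAL;
}
```

Because the checks are static assertions, a renumbering mistake in the enum would fail at compile time rather than silently changing the ABI.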
Signed-off-by: Pavel Begunkov
---
 include/uapi/linux/io_uring.h | 12 +++++++++++-
 io_uring/opdef.c              |  9 +++++----
 io_uring/rsrc.c               | 17 +++++++++++++++--
 io_uring/rsrc.h               |  4 ++--
 4 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 37e0730733f9..9e325179a4f8 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -171,7 +171,8 @@ enum io_uring_op {
 	IORING_OP_FALLOCATE,
 	IORING_OP_OPENAT,
 	IORING_OP_CLOSE,
-	IORING_OP_FILES_UPDATE,
+	IORING_OP_RSRC_UPDATE,
+	IORING_OP_FILES_UPDATE = IORING_OP_RSRC_UPDATE,
 	IORING_OP_STATX,
 	IORING_OP_READ,
 	IORING_OP_WRITE,
@@ -220,6 +221,7 @@ enum io_uring_op {
 #define IORING_TIMEOUT_ETIME_SUCCESS	(1U << 5)
 #define IORING_TIMEOUT_CLOCK_MASK	(IORING_TIMEOUT_BOOTTIME | IORING_TIMEOUT_REALTIME)
 #define IORING_TIMEOUT_UPDATE_MASK	(IORING_TIMEOUT_UPDATE | IORING_LINK_TIMEOUT_UPDATE)
+
 /*
  * sqe->splice_flags
  * extends splice(2) flags
@@ -286,6 +288,14 @@ enum io_uring_op {
  */
 #define IORING_ACCEPT_MULTISHOT	(1U << 0)
 
+
+/*
+ * IORING_OP_RSRC_UPDATE flags
+ */
+enum {
+	IORING_RSRC_UPDATE_FILES,
+};
+
 /*
  * IORING_OP_MSG_RING command types, stored in sqe->addr
  */
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 8419b50c1d3b..0fb347d1ec16 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -246,12 +246,13 @@ const struct io_op_def io_op_defs[] = {
 		.prep = io_close_prep,
 		.issue = io_close,
 	},
-	[IORING_OP_FILES_UPDATE] = {
+	[IORING_OP_RSRC_UPDATE] = {
 		.audit_skip = 1,
 		.iopoll = 1,
-		.name = "FILES_UPDATE",
-		.prep = io_files_update_prep,
-		.issue = io_files_update,
+		.name = "RSRC_UPDATE",
+		.prep = io_rsrc_update_prep,
+		.issue = io_rsrc_update,
+		.ioprio = 1,
 	},
 	[IORING_OP_STATX] = {
 		.audit_skip = 1,
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 1182cf0ea1fc..98ce8a93a816 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -21,6 +21,7 @@ struct io_rsrc_update {
 	u64 arg;
 	u32 nr_args;
 	u32 offset;
+	int type;
 };
 
 static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
@@ -658,7 +659,7 @@ __cold int io_register_rsrc(struct io_ring_ctx *ctx, void __user *arg,
 	return -EINVAL;
 }
 
-int io_files_update_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+int io_rsrc_update_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_rsrc_update *up = io_kiocb_to_cmd(req);
 
@@ -672,6 +673,7 @@ int io_files_update_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if (!up->nr_args)
 		return -EINVAL;
 	up->arg = READ_ONCE(sqe->addr);
+	up->type = READ_ONCE(sqe->ioprio);
 	return 0;
 }
 
@@ -711,7 +713,7 @@ static int io_files_update_with_index_alloc(struct io_kiocb *req,
 	return ret;
 }
 
-int io_files_update(struct io_kiocb *req, unsigned int issue_flags)
+static int io_files_update(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_rsrc_update *up = io_kiocb_to_cmd(req);
 	struct io_ring_ctx *ctx = req->ctx;
@@ -740,6 +742,17 @@ int io_files_update(struct io_kiocb *req, unsigned int issue_flags)
 	return IOU_OK;
 }
 
+int io_rsrc_update(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_rsrc_update *up = io_kiocb_to_cmd(req);
+
+	switch (up->type) {
+	case IORING_RSRC_UPDATE_FILES:
+		return io_files_update(req, issue_flags);
+	}
+	return -EINVAL;
+}
+
 int io_queue_rsrc_removal(struct io_rsrc_data *data, unsigned idx,
 			  struct io_rsrc_node *node, void *rsrc)
 {
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index af342fd239d0..21813a23215f 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -167,6 +167,6 @@ static inline u64 *io_get_tag_slot(struct io_rsrc_data *data, unsigned int idx)
 	return &data->tags[table_idx][off];
 }
 
-int io_files_update(struct io_kiocb *req, unsigned int issue_flags);
-int io_files_update_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+int io_rsrc_update(struct io_kiocb *req, unsigned int issue_flags);
+int io_rsrc_update_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 #endif

From patchwork Thu Jul 7 11:49:56 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12909414
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 25/27] io_uring: add zc notification flush requests
Date: Thu, 7 Jul 2022 12:49:56 +0100
X-Mailing-List: io-uring@vger.kernel.org

Overlay notification control onto IORING_OP_RSRC_UPDATE (formerly
IORING_OP_FILES_UPDATE). It allows flushing a range of zc notifications
from slots with indexes [sqe->off, sqe->off + sqe->len). If sqe->addr is
not zero, it is also copied as the new tag for all flushed notifications.

Note that it doesn't flush a slot's notification if no requests have
been attached to that slot since the last flush or registration.
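The range handling in io_notif_update() can be modelled in userspace as below. This is a sketch under assumptions: `struct slot` and `notif_flush_range()` are hypothetical stand-ins for io_notif_slot and the request handler, the `flushed` counter replaces io_notif_slot_flush_submit(), and detaching the notifier on flush is modelled by clearing `has_notif`. The overflow test uses the GCC/Clang builtin `__builtin_add_overflow`, the builtin that the kernel's check_add_overflow() is built on.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct io_notif_slot. */
struct slot {
	bool has_notif;   /* a notifier is attached, i.e. requests used the slot */
	uint64_t tag;     /* tag reported with the slot's notification */
	unsigned flushed; /* number of flushes, for illustration */
};

/* Flush slots [off, off + len). A non-zero tag replaces each flushed slot's tag.
 * Returns 0, -EOVERFLOW or -EINVAL, following io_notif_update(). */
static int notif_flush_range(struct slot *slots, unsigned nr_slots,
			     unsigned off, unsigned len, uint64_t tag)
{
	unsigned end;

	/* same test as check_add_overflow(idx, len, &idx_end) */
	if (__builtin_add_overflow(off, len, &end))
		return -EOVERFLOW;
	if (end > nr_slots)
		return -EINVAL;

	for (unsigned i = off; i < end; i++) {
		assert(i < nr_slots);
		if (!slots[i].has_notif)
			continue; /* nothing attached since the last flush */
		if (tag)
			slots[i].tag = tag;
		slots[i].flushed++;
		/* modelled: flushing detaches the slot's current notifier */
		slots[i].has_notif = false;
	}
	return 0;
}
```

Validating the whole range before touching any slot means a bad request fails without partially flushing, which matches the `goto out` structure in the patch.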
Signed-off-by: Pavel Begunkov
---
 include/uapi/linux/io_uring.h |  1 +
 io_uring/rsrc.c               | 38 +++++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 9e325179a4f8..cbf9cfbe5fe7 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -294,6 +294,7 @@ enum io_uring_op {
  */
 enum {
 	IORING_RSRC_UPDATE_FILES,
+	IORING_RSRC_UPDATE_NOTIF,
 };
 
 /*
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 98ce8a93a816..088a2dc32e2c 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -15,6 +15,7 @@
 #include "io_uring.h"
 #include "openclose.h"
 #include "rsrc.h"
+#include "notif.h"
 
 struct io_rsrc_update {
 	struct file *file;
@@ -742,6 +743,41 @@ static int io_files_update(struct io_kiocb *req, unsigned int issue_flags)
 	return IOU_OK;
 }
 
+static int io_notif_update(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_rsrc_update *up = io_kiocb_to_cmd(req);
+	struct io_ring_ctx *ctx = req->ctx;
+	unsigned len = up->nr_args;
+	unsigned idx_end, idx = up->offset;
+	int ret = 0;
+
+	io_ring_submit_lock(ctx, issue_flags);
+	if (unlikely(check_add_overflow(idx, len, &idx_end))) {
+		ret = -EOVERFLOW;
+		goto out;
+	}
+	if (unlikely(idx_end > ctx->nr_notif_slots)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	for (; idx < idx_end; idx++) {
+		struct io_notif_slot *slot = &ctx->notif_slots[idx];
+
+		if (!slot->notif)
+			continue;
+		if (up->arg)
+			slot->tag = up->arg;
+		io_notif_slot_flush_submit(slot, issue_flags);
+	}
+out:
+	io_ring_submit_unlock(ctx, issue_flags);
+	if (ret < 0)
+		req_set_fail(req);
+	io_req_set_res(req, ret, 0);
+	return IOU_OK;
+}
+
 int io_rsrc_update(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_rsrc_update *up = io_kiocb_to_cmd(req);
@@ -749,6 +785,8 @@ int io_rsrc_update(struct io_kiocb *req, unsigned int issue_flags)
 	switch (up->type) {
 	case IORING_RSRC_UPDATE_FILES:
 		return io_files_update(req, issue_flags);
+	case IORING_RSRC_UPDATE_NOTIF:
+		return io_notif_update(req, issue_flags);
 	}
 	return -EINVAL;
 }

From patchwork Thu Jul 7 11:49:57 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12909415
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 26/27] io_uring: enable managed frags with register buffers
Date: Thu, 7 Jul 2022 12:49:57 +0100

io_uring's registered buffers infrastructure already provides an efficient way of pinning pages, so use SKBFL_MANAGED_FRAG_REFS when a request is backed purely by registered buffers.
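As a reading aid only, the fragment-filling loop that io_sg_from_iter() adds in this patch can be modelled in userspace as follows. This is a simplified sketch, not kernel code: struct model_seg, struct model_result, model_fill_frags(), MODEL_PAGE_SIZE (assumed 4096) and MODEL_MAX_FRAGS (assumed 17, the usual MAX_SKB_FRAGS on 4 KiB-page builds) are hypothetical. It shows the accounting the patch performs: each bvec segment becomes one fragment, truesize is charged as PAGE_ALIGN(offset + len), and -EMSGSIZE is returned if the fragment slots run out before the requested length is consumed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed values for illustration only: the kernel's PAGE_SIZE and
 * MAX_SKB_FRAGS depend on the architecture and build configuration. */
#define MODEL_PAGE_SIZE 4096UL
#define MODEL_MAX_FRAGS 17
#define MODEL_PAGE_ALIGN(x) (((x) + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1))

/* Hypothetical stand-in for a struct bio_vec: len bytes starting at
 * offset into a (possibly multi-page) run of pages. */
struct model_seg {
	size_t offset;
	size_t len;
};

struct model_result {
	int nr_frags;    /* fragments attached */
	size_t copied;   /* bytes attached (added to skb->len and data_len) */
	size_t truesize; /* memory charged for the attached pages */
	size_t left;     /* requested bytes that did not fit */
};

/* Attach up to `length` bytes, one fragment per segment, charging each
 * fragment for every page it touches. Returns 0, or -EMSGSIZE if bytes
 * remain once the fragment slots (or segments) run out. Unlike the
 * kernel loop, this ignores a starting offset into the first segment. */
static int model_fill_frags(const struct model_seg *segs, size_t nr_segs,
			    size_t length, struct model_result *res)
{
	size_t i = 0;

	res->nr_frags = 0;
	res->copied = 0;
	res->truesize = 0;
	res->left = length;

	while (res->left && i < nr_segs && res->nr_frags < MODEL_MAX_FRAGS) {
		size_t len = segs[i].len < res->left ? segs[i].len : res->left;

		res->copied += len;
		res->truesize += MODEL_PAGE_ALIGN(segs[i].offset + len);
		res->nr_frags++;
		res->left -= len;
		i++;
	}
	/* Every requested byte is either attached or reported as left over. */
	assert(res->copied + res->left == length);
	return res->left ? -EMSGSIZE : 0;
}
```

For example, segments {0, 4096}, {100, 200} and {4000, 200} with length 4496 fill three fragments and charge 16384 bytes of truesize: the last segment straddles a page boundary, so it is charged for two pages.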
Signed-off-by: Pavel Begunkov
---
 io_uring/net.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index bf9916d5e50c..a4e863dce7ec 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -704,6 +704,60 @@ int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }
 
+static int io_sg_from_iter(struct sock *sk, struct sk_buff *skb,
+			   struct iov_iter *from, size_t length)
+{
+	struct skb_shared_info *shinfo = skb_shinfo(skb);
+	int frag = shinfo->nr_frags;
+	int ret = 0;
+	struct bvec_iter bi;
+	ssize_t copied = 0;
+	unsigned long truesize = 0;
+
+	if (!shinfo->nr_frags)
+		shinfo->flags |= SKBFL_MANAGED_FRAG_REFS;
+
+	if (!skb_zcopy_managed(skb) || !iov_iter_is_bvec(from)) {
+		skb_zcopy_downgrade_managed(skb);
+		return __zerocopy_sg_from_iter(NULL, sk, skb, from, length);
+	}
+
+	bi.bi_size = min(from->count, length);
+	bi.bi_bvec_done = from->iov_offset;
+	bi.bi_idx = 0;
+
+	while (bi.bi_size && frag < MAX_SKB_FRAGS) {
+		struct bio_vec v = mp_bvec_iter_bvec(from->bvec, bi);
+
+		copied += v.bv_len;
+		truesize += PAGE_ALIGN(v.bv_len + v.bv_offset);
+		__skb_fill_page_desc_noacc(shinfo, frag++, v.bv_page,
+					   v.bv_offset, v.bv_len);
+		bvec_iter_advance_single(from->bvec, &bi, v.bv_len);
+	}
+	if (bi.bi_size)
+		ret = -EMSGSIZE;
+
+	shinfo->nr_frags = frag;
+	from->bvec += bi.bi_idx;
+	from->nr_segs -= bi.bi_idx;
+	from->count = bi.bi_size;
+	from->iov_offset = bi.bi_bvec_done;
+
+	skb->data_len += copied;
+	skb->len += copied;
+	skb->truesize += truesize;
+
+	if (sk && sk->sk_type == SOCK_STREAM) {
+		sk_wmem_queued_add(sk, truesize);
+		if (!skb_zcopy_pure(skb))
+			sk_mem_charge(sk, truesize);
+	} else {
+		refcount_add(truesize, &skb->sk->sk_wmem_alloc);
+	}
+	return ret;
+}
+
 int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct sockaddr_storage address;
@@ -768,7 +822,7 @@ int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 
 	msg.msg_flags = msg_flags;
 	msg.msg_ubuf = &notif->uarg;
-	msg.sg_from_iter = NULL;
+	msg.sg_from_iter = io_sg_from_iter;
 	ret = sock_sendmsg(sock, &msg);
 
 	if (unlikely(ret < min_ret)) {

From patchwork Thu Jul 7 11:49:58 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12909416
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v4 27/27] selftests/io_uring: test zerocopy send
Date: Thu, 7 Jul 2022 12:49:58 +0100
Message-Id: <1f3ebfcbc5628d7bf8d03ceb1589594f1001b774.1657194434.git.asml.silence@gmail.com>

Add selftests for io_uring zerocopy sends and io_uring's notification infrastructure. The test is largely modelled on msg_zerocopy and relies on it for the receive side.
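As a reading aid for the completion loop in do_tx(), here is a hedged sketch of its notification accounting. When cfg_flush is set, every zerocopy send carries IORING_RECVSEND_NOTIF_FLUSH and the test expects one extra CQE tagged NOTIF_TAG for it, except for sends that complete with res <= 0, which (per the test's own comment) do not flush. struct send_result and expected_notif_cqes() are hypothetical helpers that do not appear in the selftest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one completed send request. */
struct send_result {
	bool zerocopy; /* submitted as IORING_OP_SENDZC rather than IORING_OP_SEND */
	int res;       /* cqe->res reported for the request */
};

/*
 * Number of NOTIF_TAG CQEs the test waits for after a batch, mirroring
 * do_tx(): compl_cqes is incremented for every zerocopy send when
 * flushing, and decremented again for zerocopy sends with res <= 0
 * ("failed requests don't flush").
 */
static size_t expected_notif_cqes(const struct send_result *reqs, size_t nr,
				  bool flush)
{
	size_t expected = 0;

	assert(reqs != NULL || nr == 0);
	if (!flush)
		return 0;
	for (size_t i = 0; i < nr; i++) {
		if (reqs[i].zerocopy && reqs[i].res > 0)
			expected++;
	}
	return expected;
}
```

Because NOTIF_TAG completions are not counted against cfg_nr_reqs (the loop decrements i for them), notifications that have not arrived by the end of the run are drained by the final while (compl_cqes) loop.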
Signed-off-by: Pavel Begunkov
---
 tools/testing/selftests/net/Makefile          |   1 +
 .../selftests/net/io_uring_zerocopy_tx.c      | 605 ++++++++++++++++++
 .../selftests/net/io_uring_zerocopy_tx.sh     | 131 ++++
 3 files changed, 737 insertions(+)
 create mode 100644 tools/testing/selftests/net/io_uring_zerocopy_tx.c
 create mode 100755 tools/testing/selftests/net/io_uring_zerocopy_tx.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 7ea54af55490..51261483744e 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -59,6 +59,7 @@ TEST_GEN_FILES += toeplitz
 TEST_GEN_FILES += cmsg_sender
 TEST_GEN_FILES += stress_reuseport_listen
 TEST_PROGS += test_vxlan_vnifiltering.sh
+TEST_GEN_FILES += io_uring_zerocopy_tx
 
 TEST_FILES := settings
 
diff --git a/tools/testing/selftests/net/io_uring_zerocopy_tx.c b/tools/testing/selftests/net/io_uring_zerocopy_tx.c
new file mode 100644
index 000000000000..00127a271d97
--- /dev/null
+++ b/tools/testing/selftests/net/io_uring_zerocopy_tx.c
@@ -0,0 +1,605 @@
+/* SPDX-License-Identifier: MIT */
+/* based on linux-kernel/tools/testing/selftests/net/msg_zerocopy.c */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define NOTIF_TAG 0xfffffffULL
+#define NONZC_TAG 0
+#define ZC_TAG 1
+
+enum {
+	MODE_NONZC = 0,
+	MODE_ZC = 1,
+	MODE_ZC_FIXED = 2,
+	MODE_MIXED = 3,
+};
+
+static bool cfg_flush = false;
+static bool cfg_cork = false;
+static int cfg_mode = MODE_ZC_FIXED;
+static int cfg_nr_reqs = 8;
+static int cfg_family = PF_UNSPEC;
+static int cfg_payload_len;
+static int cfg_port = 8000;
+static int cfg_runtime_ms = 4200;
+
+static socklen_t cfg_alen;
+static struct sockaddr_storage cfg_dst_addr;
+
+static char payload[IP_MAXPACKET] __attribute__((aligned(4096)));
+
+struct io_sq_ring {
+	unsigned *head;
+	unsigned *tail;
+	unsigned *ring_mask;
+	unsigned *ring_entries;
+	unsigned *flags;
+	unsigned *array;
+};
+
+struct io_cq_ring {
+	unsigned *head;
+	unsigned *tail;
+	unsigned *ring_mask;
+	unsigned *ring_entries;
+	struct io_uring_cqe *cqes;
+};
+
+struct io_uring_sq {
+	unsigned *khead;
+	unsigned *ktail;
+	unsigned *kring_mask;
+	unsigned *kring_entries;
+	unsigned *kflags;
+	unsigned *kdropped;
+	unsigned *array;
+	struct io_uring_sqe *sqes;
+
+	unsigned sqe_head;
+	unsigned sqe_tail;
+
+	size_t ring_sz;
+};
+
+struct io_uring_cq {
+	unsigned *khead;
+	unsigned *ktail;
+	unsigned *kring_mask;
+	unsigned *kring_entries;
+	unsigned *koverflow;
+	struct io_uring_cqe *cqes;
+
+	size_t ring_sz;
+};
+
+struct io_uring {
+	struct io_uring_sq sq;
+	struct io_uring_cq cq;
+	int ring_fd;
+};
+
+#ifdef __alpha__
+# ifndef __NR_io_uring_setup
+#  define __NR_io_uring_setup		535
+# endif
+# ifndef __NR_io_uring_enter
+#  define __NR_io_uring_enter		536
+# endif
+# ifndef __NR_io_uring_register
+#  define __NR_io_uring_register	537
+# endif
+#else /* !__alpha__ */
+# ifndef __NR_io_uring_setup
+#  define __NR_io_uring_setup		425
+# endif
+# ifndef __NR_io_uring_enter
+#  define __NR_io_uring_enter		426
+# endif
+# ifndef __NR_io_uring_register
+#  define __NR_io_uring_register	427
+# endif
+#endif
+
+#if defined(__x86_64) || defined(__i386__)
+#define read_barrier()	__asm__ __volatile__("":::"memory")
+#define write_barrier()	__asm__ __volatile__("":::"memory")
+#else
+
+#define read_barrier()	__sync_synchronize()
+#define write_barrier()	__sync_synchronize()
+#endif
+
+static int io_uring_setup(unsigned int entries, struct io_uring_params *p)
+{
+	return syscall(__NR_io_uring_setup, entries, p);
+}
+
+static int io_uring_enter(int fd, unsigned int to_submit,
+			  unsigned int min_complete,
+			  unsigned int flags, sigset_t *sig)
+{
+	return syscall(__NR_io_uring_enter, fd, to_submit, min_complete,
+			flags, sig, _NSIG / 8);
+}
+
+static int io_uring_register_buffers(struct io_uring *ring,
+				     const struct iovec *iovecs,
+				     unsigned nr_iovecs)
+{
+	int ret;
+
+	ret = syscall(__NR_io_uring_register, ring->ring_fd,
+		      IORING_REGISTER_BUFFERS, iovecs, nr_iovecs);
+	return (ret < 0) ? -errno : ret;
+}
+
+static int io_uring_register_notifications(struct io_uring *ring,
+					   unsigned nr,
+					   struct io_uring_notification_slot *slots)
+{
+	int ret;
+	struct io_uring_notification_register r = {
+		.nr_slots = nr,
+		.data = (unsigned long)slots,
+	};
+
+	ret = syscall(__NR_io_uring_register, ring->ring_fd,
+		      IORING_REGISTER_NOTIFIERS, &r, sizeof(r));
+	return (ret < 0) ? -errno : ret;
+}
+
+static int io_uring_mmap(int fd, struct io_uring_params *p,
+			 struct io_uring_sq *sq, struct io_uring_cq *cq)
+{
+	size_t size;
+	void *ptr;
+	int ret;
+
+	sq->ring_sz = p->sq_off.array + p->sq_entries * sizeof(unsigned);
+	ptr = mmap(0, sq->ring_sz, PROT_READ | PROT_WRITE,
+		   MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQ_RING);
+	if (ptr == MAP_FAILED)
+		return -errno;
+	sq->khead = ptr + p->sq_off.head;
+	sq->ktail = ptr + p->sq_off.tail;
+	sq->kring_mask = ptr + p->sq_off.ring_mask;
+	sq->kring_entries = ptr + p->sq_off.ring_entries;
+	sq->kflags = ptr + p->sq_off.flags;
+	sq->kdropped = ptr + p->sq_off.dropped;
+	sq->array = ptr + p->sq_off.array;
+
+	size = p->sq_entries * sizeof(struct io_uring_sqe);
+	sq->sqes = mmap(0, size, PROT_READ | PROT_WRITE,
+			MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQES);
+	if (sq->sqes == MAP_FAILED) {
+		ret = -errno;
+err:
+		munmap(sq->khead, sq->ring_sz);
+		return ret;
+	}
+
+	cq->ring_sz = p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe);
+	ptr = mmap(0, cq->ring_sz, PROT_READ | PROT_WRITE,
+		   MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_CQ_RING);
+	if (ptr == MAP_FAILED) {
+		ret = -errno;
+		munmap(sq->sqes, p->sq_entries * sizeof(struct io_uring_sqe));
+		goto err;
+	}
+	cq->khead = ptr + p->cq_off.head;
+	cq->ktail = ptr + p->cq_off.tail;
+	cq->kring_mask = ptr + p->cq_off.ring_mask;
+	cq->kring_entries = ptr + p->cq_off.ring_entries;
+	cq->koverflow = ptr + p->cq_off.overflow;
+	cq->cqes = ptr + p->cq_off.cqes;
+	return 0;
+}
+
+static int io_uring_queue_init(unsigned entries, struct io_uring *ring,
+			       unsigned flags)
+{
+	struct io_uring_params p;
+	int fd, ret;
+
+	memset(ring, 0, sizeof(*ring));
+	memset(&p, 0, sizeof(p));
+	p.flags = flags;
+
+	fd = io_uring_setup(entries, &p);
+	if (fd < 0)
+		return fd;
+	ret = io_uring_mmap(fd, &p, &ring->sq, &ring->cq);
+	if (!ret)
+		ring->ring_fd = fd;
+	else
+		close(fd);
+	return ret;
+}
+
+static int io_uring_submit(struct io_uring *ring)
+{
+	struct io_uring_sq *sq = &ring->sq;
+	const unsigned mask = *sq->kring_mask;
+	unsigned ktail, submitted, to_submit;
+	int ret;
+
+	read_barrier();
+	if (*sq->khead != *sq->ktail) {
+		submitted = *sq->kring_entries;
+		goto submit;
+	}
+	if (sq->sqe_head == sq->sqe_tail)
+		return 0;
+
+	ktail = *sq->ktail;
+	to_submit = sq->sqe_tail - sq->sqe_head;
+	for (submitted = 0; submitted < to_submit; submitted++) {
+		read_barrier();
+		sq->array[ktail++ & mask] = sq->sqe_head++ & mask;
+	}
+	if (!submitted)
+		return 0;
+
+	if (*sq->ktail != ktail) {
+		write_barrier();
+		*sq->ktail = ktail;
+		write_barrier();
+	}
+submit:
+	ret = io_uring_enter(ring->ring_fd, submitted, 0,
+				IORING_ENTER_GETEVENTS, NULL);
+	return ret < 0 ? -errno : ret;
+}
+
+static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd,
+				      const void *buf, size_t len, int flags)
+{
+	memset(sqe, 0, sizeof(*sqe));
+	sqe->opcode = (__u8) IORING_OP_SEND;
+	sqe->fd = sockfd;
+	sqe->addr = (unsigned long) buf;
+	sqe->len = len;
+	sqe->msg_flags = (__u32) flags;
+}
+
+static inline void io_uring_prep_sendzc(struct io_uring_sqe *sqe, int sockfd,
+					const void *buf, size_t len, int flags,
+					unsigned slot_idx, unsigned zc_flags)
+{
+	io_uring_prep_send(sqe, sockfd, buf, len, flags);
+	sqe->opcode = (__u8) IORING_OP_SENDZC;
+	sqe->notification_idx = slot_idx;
+	sqe->ioprio = zc_flags;
+}
+
+static struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring)
+{
+	struct io_uring_sq *sq = &ring->sq;
+
+	if (sq->sqe_tail + 1 - sq->sqe_head > *sq->kring_entries)
+		return NULL;
+	return &sq->sqes[sq->sqe_tail++ & *sq->kring_mask];
+}
+
+static int io_uring_wait_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_ptr)
+{
+	struct io_uring_cq *cq = &ring->cq;
+	const unsigned mask = *cq->kring_mask;
+	unsigned head = *cq->khead;
+	int ret;
+
+	*cqe_ptr = NULL;
+	do {
+		read_barrier();
+		if (head != *cq->ktail) {
+			*cqe_ptr = &cq->cqes[head & mask];
+			break;
+		}
+		ret = io_uring_enter(ring->ring_fd, 0, 1,
+					IORING_ENTER_GETEVENTS, NULL);
+		if (ret < 0)
+			return -errno;
+	} while (1);
+
+	return 0;
+}
+
+static inline void io_uring_cqe_seen(struct io_uring *ring)
+{
+	*(&ring->cq)->khead += 1;
+	write_barrier();
+}
+
+static unsigned long gettimeofday_ms(void)
+{
+	struct timeval tv;
+
+	gettimeofday(&tv, NULL);
+	return (tv.tv_sec * 1000) + (tv.tv_usec / 1000);
+}
+
+static void do_setsockopt(int fd, int level, int optname, int val)
+{
+	if (setsockopt(fd, level, optname, &val, sizeof(val)))
+		error(1, errno, "setsockopt %d.%d: %d", level, optname, val);
+}
+
+static int do_setup_tx(int domain, int type, int protocol)
+{
+	int fd;
+
+	fd = socket(domain, type, protocol);
+	if (fd == -1)
+		error(1, errno, "socket t");
+
+	do_setsockopt(fd, SOL_SOCKET, SO_SNDBUF, 1 << 21);
+
+	if (connect(fd, (void *) &cfg_dst_addr, cfg_alen))
+		error(1, errno, "connect");
+	return fd;
+}
+
+static void do_tx(int domain, int type, int protocol)
+{
+	struct io_uring_notification_slot b[1] = {{.tag = NOTIF_TAG}};
+	struct io_uring_sqe *sqe;
+	struct io_uring_cqe *cqe;
+	unsigned long packets = 0, bytes = 0;
+	struct io_uring ring;
+	struct iovec iov;
+	uint64_t tstop;
+	int i, fd, ret;
+	int compl_cqes = 0;
+
+	fd = do_setup_tx(domain, type, protocol);
+
+	ret = io_uring_queue_init(512, &ring, 0);
+	if (ret)
+		error(1, ret, "io_uring: queue init");
+
+	ret = io_uring_register_notifications(&ring, 1, b);
+	if (ret)
+		error(1, ret, "io_uring: tx ctx registration");
+
+	iov.iov_base = payload;
+	iov.iov_len = cfg_payload_len;
+
+	ret = io_uring_register_buffers(&ring, &iov, 1);
+	if (ret)
+		error(1, ret, "io_uring: buffer registration");
+
+	tstop = gettimeofday_ms() + cfg_runtime_ms;
+	do {
+		if (cfg_cork)
+			do_setsockopt(fd, IPPROTO_UDP, UDP_CORK, 1);
+
+		for (i = 0; i < cfg_nr_reqs; i++) {
+			unsigned zc_flags = 0;
+			unsigned buf_idx = 0;
+			unsigned slot_idx = 0;
+			unsigned mode = cfg_mode;
+			unsigned msg_flags = 0;
+
+			if (cfg_mode == MODE_MIXED)
+				mode = rand() % 3;
+
+			sqe = io_uring_get_sqe(&ring);
+
+			if (mode == MODE_NONZC) {
+				io_uring_prep_send(sqe, fd, payload,
+						   cfg_payload_len, msg_flags);
+				sqe->user_data = NONZC_TAG;
+			} else {
+				if (cfg_flush) {
+					zc_flags |= IORING_RECVSEND_NOTIF_FLUSH;
+					compl_cqes++;
+				}
+				io_uring_prep_sendzc(sqe, fd, payload,
+						     cfg_payload_len,
+						     msg_flags, slot_idx, zc_flags);
+				if (mode == MODE_ZC_FIXED) {
+					sqe->ioprio |= IORING_RECVSEND_FIXED_BUF;
+					sqe->buf_index = buf_idx;
+				}
+				sqe->user_data = ZC_TAG;
+			}
+		}
+
+		ret = io_uring_submit(&ring);
+		if (ret != cfg_nr_reqs)
+			error(1, ret, "submit");
+
+		for (i = 0; i < cfg_nr_reqs; i++) {
+			ret = io_uring_wait_cqe(&ring, &cqe);
+			if (ret)
+				error(1, ret, "wait cqe");
+
+			if (cqe->user_data == NOTIF_TAG) {
+				compl_cqes--;
+				i--;
+			} else if (cqe->user_data != NONZC_TAG &&
+				   cqe->user_data != ZC_TAG) {
+				error(1, cqe->res, "invalid user_data");
+			} else if (cqe->res <= 0 && cqe->res != -EAGAIN) {
+				error(1, cqe->res, "send failed");
+			} else {
+				if (cqe->res > 0) {
+					packets++;
+					bytes += cqe->res;
+				}
+				/* failed requests don't flush */
+				if (cfg_flush &&
+				    cqe->res <= 0 &&
+				    cqe->user_data == ZC_TAG)
+					compl_cqes--;
+			}
+			io_uring_cqe_seen(&ring);
+		}
+		if (cfg_cork)
+			do_setsockopt(fd, IPPROTO_UDP, UDP_CORK, 0);
+	} while (gettimeofday_ms() < tstop);
+
+	if (close(fd))
+		error(1, errno, "close");
+
+	fprintf(stderr, "tx=%lu (MB=%lu), tx/s=%lu (MB/s=%lu)\n",
+			packets, bytes >> 20,
+			packets / (cfg_runtime_ms / 1000),
+			(bytes >> 20) / (cfg_runtime_ms / 1000));
+
+	while (compl_cqes) {
+		ret = io_uring_wait_cqe(&ring, &cqe);
+		if (ret)
+			error(1, ret, "wait cqe");
+		io_uring_cqe_seen(&ring);
+		compl_cqes--;
+	}
+}
+
+static void do_test(int domain, int type, int protocol)
+{
+	int i;
+
+	for (i = 0; i < IP_MAXPACKET; i++)
+		payload[i] = 'a' + (i % 26);
+	do_tx(domain, type, protocol);
+}
+
+static void usage(const char *filepath)
+{
+	error(1, 0, "Usage: %s [-f] [-n] [-z0] [-s] "
+		"(-4|-6) [-t