From patchwork Tue Jul 12 20:52:25 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12915627
X-Patchwork-Delegate: kuba@kernel.org
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v5 01/27] ipv4: avoid partial copy for zc
Date: Tue, 12 Jul 2022 21:52:25 +0100
Message-Id: <0eb1cb5746e9ac938a7ba7848b33ccf680d30030.1657643355.git.asml.silence@gmail.com>
X-Mailer: git-send-email 2.37.0
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

Even when zerocopy transmission is requested and possible,
__ip_append_data() will still copy a small chunk of data just because it
allocated some extra linear space (e.g. 148 bytes). It wastes CPU cycles
on copy and iter manipulations and also misaligns potentially aligned
data. Avoid such copies. And as a bonus we can allocate a smaller skb.

Signed-off-by: Pavel Begunkov
---
 net/ipv4/ip_output.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 00b4bf26fd93..581d1e233260 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -969,7 +969,6 @@ static int __ip_append_data(struct sock *sk,
 	struct inet_sock *inet = inet_sk(sk);
 	struct ubuf_info *uarg = NULL;
 	struct sk_buff *skb;
-
 	struct ip_options *opt = cork->opt;
 	int hh_len;
 	int exthdrlen;
@@ -977,6 +976,7 @@ static int __ip_append_data(struct sock *sk,
 	int copy;
 	int err;
 	int offset = 0;
+	bool zc = false;
 	unsigned int maxfraglen, fragheaderlen, maxnonfragsize;
 	int csummode = CHECKSUM_NONE;
 	struct rtable *rt = (struct rtable *)cork->dst;
@@ -1025,6 +1025,7 @@ static int __ip_append_data(struct sock *sk,
 		if (rt->dst.dev->features & NETIF_F_SG &&
 		    csummode == CHECKSUM_PARTIAL) {
 			paged = true;
+			zc = true;
 		} else {
 			uarg->zerocopy = 0;
 			skb_zcopy_set(skb, uarg, &extra_uref);
@@ -1091,9 +1092,12 @@ static int __ip_append_data(struct sock *sk,
 				 (fraglen + alloc_extra < SKB_MAX_ALLOC ||
 				  !(rt->dst.dev->features & NETIF_F_SG)))
 				alloclen = fraglen;
-			else {
+			else if (!zc) {
 				alloclen = min_t(int, fraglen, MAX_HEADER);
 				pagedlen = fraglen - alloclen;
+			} else {
+				alloclen = fragheaderlen + transhdrlen;
+				pagedlen = datalen - transhdrlen;
 			}
 
 			alloclen += alloc_extra;

From patchwork Tue Jul 12 20:52:26 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12915632
X-Patchwork-Delegate: kuba@kernel.org
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v5 02/27] ipv6: avoid partial copy for zc
Date: Tue, 12 Jul 2022 21:52:26 +0100
Message-Id: <899f19034c94ce4ce75464df132edf1b3a192ebd.1657643355.git.asml.silence@gmail.com>
X-Mailer: git-send-email 2.37.0
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

Even when zerocopy transmission is requested and possible,
__ip6_append_data() will still copy a small chunk of data just because it
allocated some extra linear space (e.g. 128 bytes). It wastes CPU cycles
on copy and iter manipulations and also misaligns potentially aligned
data. Avoid such copies. And as a bonus we can allocate a smaller skb.

Signed-off-by: Pavel Begunkov
---
 net/ipv6/ip6_output.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 77e3f5970ce4..fc74ce3ed8cc 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1464,6 +1464,7 @@ static int __ip6_append_data(struct sock *sk,
 	int copy;
 	int err;
 	int offset = 0;
+	bool zc = false;
 	u32 tskey = 0;
 	struct rt6_info *rt = (struct rt6_info *)cork->dst;
 	struct ipv6_txoptions *opt = v6_cork->opt;
@@ -1549,6 +1550,7 @@ static int __ip6_append_data(struct sock *sk,
 		if (rt->dst.dev->features & NETIF_F_SG &&
 		    csummode == CHECKSUM_PARTIAL) {
 			paged = true;
+			zc = true;
 		} else {
 			uarg->zerocopy = 0;
 			skb_zcopy_set(skb, uarg, &extra_uref);
@@ -1630,9 +1632,12 @@ static int __ip6_append_data(struct sock *sk,
 				 (fraglen + alloc_extra < SKB_MAX_ALLOC ||
 				  !(rt->dst.dev->features & NETIF_F_SG)))
 				alloclen = fraglen;
-			else {
+			else if (!zc) {
 				alloclen = min_t(int, fraglen, MAX_HEADER);
 				pagedlen = fraglen - alloclen;
+			} else {
+				alloclen = fragheaderlen + transhdrlen;
+				pagedlen = datalen - transhdrlen;
 			}
 
 			alloclen += alloc_extra;

From patchwork Tue Jul 12 20:52:27 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12915633
X-Patchwork-Delegate: kuba@kernel.org
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v5 03/27] skbuff: don't mix ubuf_info from different sources
Date: Tue, 12 Jul 2022 21:52:27 +0100
Message-Id: <8fc991e842a43fef95b09f2d387567d06999c11c.1657643355.git.asml.silence@gmail.com>
X-Mailer: git-send-email 2.37.0
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

We should not append MSG_ZEROCOPY requests to an skbuff with a
non-MSG_ZEROCOPY ubuf_info, as they might not be compatible.

Signed-off-by: Pavel Begunkov
---
 net/core/skbuff.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5b3559cb1d82..09f56bfa2771 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1212,6 +1212,10 @@ struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size,
 		const u32 byte_limit = 1 << 19;		/* limit to a few TSO */
 		u32 bytelen, next;
 
+		/* there might be non MSG_ZEROCOPY users */
+		if (uarg->callback != msg_zerocopy_callback)
+			return NULL;
+
 		/* realloc only when socket is locked (TCP, UDP cork),
 		 * so uarg->len and sk_zckey access is serialized
 		 */

From patchwork Tue Jul 12 20:52:28 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12915634
X-Patchwork-Delegate: kuba@kernel.org
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v5 04/27] skbuff: add SKBFL_DONT_ORPHAN flag
Date: Tue, 12 Jul 2022 21:52:28 +0100
X-Mailer: git-send-email 2.37.0
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

We don't want to list every single ubuf_info callback in
skb_orphan_frags(); add a flag controlling the behaviour instead.

Signed-off-by: Pavel Begunkov
---
 include/linux/skbuff.h | 8 +++++---
 net/core/skbuff.c | 2 +-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index d3d10556f0fa..8e12b3b9ad6c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -686,10 +686,13 @@ enum {
 	 * charged to the kernel memory.
 	 */
 	SKBFL_PURE_ZEROCOPY = BIT(2),
+
+	SKBFL_DONT_ORPHAN = BIT(3),
 };
 
 #define SKBFL_ZEROCOPY_FRAG	(SKBFL_ZEROCOPY_ENABLE | SKBFL_SHARED_FRAG)
-#define SKBFL_ALL_ZEROCOPY	(SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY)
+#define SKBFL_ALL_ZEROCOPY	(SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY | \
+				 SKBFL_DONT_ORPHAN)
 
 /*
  * The callback notifies userspace to release buffers when skb DMA is done in
@@ -3182,8 +3185,7 @@ static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
 {
 	if (likely(!skb_zcopy(skb)))
 		return 0;
-	if (!skb_zcopy_is_nouarg(skb) &&
-	    skb_uarg(skb)->callback == msg_zerocopy_callback)
+	if (skb_shinfo(skb)->flags & SKBFL_DONT_ORPHAN)
 		return 0;
 	return skb_copy_ubufs(skb, gfp_mask);
 }
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 09f56bfa2771..fc22b3d32052 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1193,7 +1193,7 @@ static struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size)
 	uarg->len = 1;
 	uarg->bytelen = size;
 	uarg->zerocopy = 1;
-	uarg->flags = SKBFL_ZEROCOPY_FRAG;
+	uarg->flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
 	refcount_set(&uarg->refcnt, 1);
 	sock_hold(sk);
 

From patchwork Tue Jul 12 20:52:29 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12915635
X-Patchwork-Delegate: kuba@kernel.org
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v5 05/27] skbuff: carry external ubuf_info in msghdr
Date: Tue, 12 Jul 2022 21:52:29 +0100
Message-Id: <2c3ce22ec6939856cef4329d8c95e4c8dfb355d8.1657643355.git.asml.silence@gmail.com>
X-Mailer: git-send-email 2.37.0
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

Make it possible for in-kernel network callers like io_uring to pass in
a custom ubuf_info by setting it in a new field of struct msghdr.

Signed-off-by: Pavel Begunkov
---
 include/linux/socket.h | 1 +
 net/compat.c | 1 +
 net/socket.c | 3 +++
 3 files changed, 5 insertions(+)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 17311ad9f9af..7bac9fc1cee0 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -69,6 +69,7 @@ struct msghdr {
 	unsigned int	msg_flags;	/* flags on received message */
 	__kernel_size_t	msg_controllen;	/* ancillary data buffer length */
 	struct kiocb	*msg_iocb;	/* ptr to iocb for async requests */
+	struct ubuf_info *msg_ubuf;
 };
 
 struct user_msghdr {
diff --git a/net/compat.c b/net/compat.c
index 210fc3b4d0d8..6cd2e7683dd0 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -80,6 +80,7 @@ int __get_compat_msghdr(struct msghdr *kmsg,
 		return -EMSGSIZE;
 
 	kmsg->msg_iocb = NULL;
+	kmsg->msg_ubuf = NULL;
 	*ptr = msg.msg_iov;
 	*len = msg.msg_iovlen;
 	return 0;
diff --git a/net/socket.c b/net/socket.c
index 2bc8773d9dc5..ed061609265e 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2106,6 +2106,7 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
 	msg.msg_control = NULL;
 	msg.msg_controllen = 0;
 	msg.msg_namelen = 0;
+	msg.msg_ubuf = NULL;
 	if (addr) {
 		err = move_addr_to_kernel(addr, addr_len, &address);
 		if (err < 0)
@@ -2171,6 +2172,7 @@ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags,
 	msg.msg_namelen = 0;
 	msg.msg_iocb = NULL;
 	msg.msg_flags = 0;
+	msg.msg_ubuf = NULL;
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
 	err = sock_recvmsg(sock, &msg, flags);
@@ -2409,6 +2411,7 @@ int __copy_msghdr_from_user(struct msghdr *kmsg,
 		return -EMSGSIZE;
 
 	kmsg->msg_iocb = NULL;
+	kmsg->msg_ubuf = NULL;
 	*uiov = msg.msg_iov;
 	*nsegs = msg.msg_iovlen;
 	return 0;

From patchwork Tue Jul 12 20:52:30 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12915636
X-Patchwork-Delegate: kuba@kernel.org
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v5 06/27] net: Allow custom iter handler in msghdr
Date: Tue, 12 Jul 2022 21:52:30 +0100
X-Mailer: git-send-email 2.37.0
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: David Ahern

Add support for custom iov_iter handling to msghdr. The idea is that
in-kernel subsystems want control over how an SG is split.

Signed-off-by: David Ahern
[pavel: move callback into msghdr]
Signed-off-by: Pavel Begunkov
---
 include/linux/skbuff.h | 7 ++++---
 include/linux/socket.h | 4 ++++
 net/core/datagram.c | 14 ++++++++++----
 net/core/skbuff.c | 2 +-
 4 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 8e12b3b9ad6c..a8a2dd4cfdfd 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1776,13 +1776,14 @@ void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref);
 void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg,
 			   bool success);
 
-int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb,
-			    struct iov_iter *from, size_t length);
+int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
+			    struct sk_buff *skb, struct iov_iter *from,
+			    size_t length);
 
 static inline int skb_zerocopy_iter_dgram(struct sk_buff *skb,
 					  struct msghdr *msg, int len)
 {
-	return __zerocopy_sg_from_iter(skb->sk, skb, &msg->msg_iter, len);
+	return __zerocopy_sg_from_iter(msg, skb->sk, skb, &msg->msg_iter, len);
 }
 
 int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 7bac9fc1cee0..3c11ef18a9cf 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -14,6 +14,8 @@ struct file;
 struct pid;
 struct cred;
 struct socket;
+struct sock;
+struct sk_buff;
 
 #define __sockaddr_check_size(size)	\
 	BUILD_BUG_ON(((size) > sizeof(struct __kernel_sockaddr_storage)))
@@ -70,6 +72,8 @@ struct msghdr {
 	__kernel_size_t	msg_controllen;	/* ancillary data buffer length */
 	struct kiocb	*msg_iocb;	/* ptr to iocb for async requests */
 	struct ubuf_info *msg_ubuf;
+	int (*sg_from_iter)(struct sock *sk, struct sk_buff *skb,
+			    struct iov_iter *from, size_t length);
 };
 
 struct user_msghdr {
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 50f4faeea76c..28cdb79df74d 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -613,10 +613,16 @@ int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset,
 }
 EXPORT_SYMBOL(skb_copy_datagram_from_iter);
 
-int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb,
-			    struct iov_iter *from, size_t length)
+int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
+			    struct sk_buff *skb, struct iov_iter *from,
+			    size_t length)
 {
-	int frag = skb_shinfo(skb)->nr_frags;
+	int frag;
+
+	if (msg && msg->sg_from_iter)
+		return msg->sg_from_iter(sk, skb, from, length);
+
+	frag = skb_shinfo(skb)->nr_frags;
 
 	while (length && iov_iter_count(from)) {
 		struct page *pages[MAX_SKB_FRAGS];
@@ -702,7 +708,7 @@ int zerocopy_sg_from_iter(struct sk_buff *skb, struct iov_iter *from)
 	if (skb_copy_datagram_from_iter(skb, 0, from, copy))
 		return -EFAULT;
 
-	return __zerocopy_sg_from_iter(NULL, skb, from, ~0U);
+	return __zerocopy_sg_from_iter(NULL, NULL, skb, from, ~0U);
 }
 EXPORT_SYMBOL(zerocopy_sg_from_iter);
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index fc22b3d32052..f5a3ebbc1f7e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1358,7 +1358,7 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
 	if (orig_uarg && uarg != orig_uarg)
 		return -EEXIST;
 
-	err = __zerocopy_sg_from_iter(sk, skb, &msg->msg_iter, len);
+	err = __zerocopy_sg_from_iter(msg, sk, skb, &msg->msg_iter, len);
 	if (err == -EFAULT || (err == -EMSGSIZE && skb->len == orig_len)) {
 		struct sock *save_sk = skb->sk;
 

From patchwork Tue Jul 12 20:52:31 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12915637
X-Patchwork-Delegate: kuba@kernel.org
:references:mime-version:content-transfer-encoding; bh=YjVAy/XzqYSepr6pCaF7TexnaeT3/z2JRP6ZUmYU4No=; b=FSOQHmhlg6ip0CV+37zpxqJH8AHsQsX/nuxc4shkCBhL0hhJTii4Ggbj9lNFR1xliS lN3vXW1N7KYuODNHPvHGHt0lUf2OjTsrJPsaUMaSRQnDVsGfaFX3WDN427I5jTLBv10G 1aCu+fQu6POj/AByx5mYMd3ZaJg/PEbRHsH2eZeNfZOpmVTJgeF7WAA+RiPTl4RnZMFN rEtiPnppg8rPw4NQCtaVWUZYbm6+aPLQ22tJYMe3U/qEphItRWikI1UtihBLDW6ZyUW5 J/7d35TSEe5IAakVQrc5ax4DIHYxCq53L/E1wmfJVrZPmLkGaZojxYNYbN02A6m3iWJV IXOQ== X-Gm-Message-State: AJIora+Xfukz4zV0eAoXOsuUwvRZbviH8v8Z7+WCDKKLTGl2LC+nFk5A nMTZE9r2iOr+YANt+zv9NTdcTQCit4E= X-Google-Smtp-Source: AGRyM1vtfqtujdE4oHDsalKfLY3I3h7DqcDOQRZ+6iyuI6QAlAEibyZe67mYwPrKFC82bzEcoqrJSg== X-Received: by 2002:a1c:f208:0:b0:3a2:dc06:f3fe with SMTP id s8-20020a1cf208000000b003a2dc06f3femr5921709wmc.119.1657659195698; Tue, 12 Jul 2022 13:53:15 -0700 (PDT) Received: from 127.0.0.1localhost (188.28.125.106.threembb.co.uk. [188.28.125.106]) by smtp.gmail.com with ESMTPSA id c14-20020a7bc00e000000b003a044fe7fe7sm89833wmb.9.2022.07.12.13.53.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Jul 2022 13:53:15 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: "David S . Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , David Ahern , kernel-team@fb.com, Pavel Begunkov Subject: [PATCH net-next v5 07/27] net: introduce managed frags infrastructure Date: Tue, 12 Jul 2022 21:52:31 +0100 Message-Id: <83c1d2b77aa4fa2a2b1666e57fae931e7ca8e933.1657643355.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Some users like io_uring can do page pinning more efficiently, so we want a way to delegate referencing to other subsystems. For that add a new flag called SKBFL_MANAGED_FRAG_REFS. 
When set, the skb doesn't hold page references and upper layers are responsible for managing page lifetime. It's allowed to convert skbs from managed to normal by calling skb_zcopy_downgrade_managed(). The function will take all needed page references and clear the flag. It's needed, for instance, to avoid mixing managed modes. Signed-off-by: Pavel Begunkov --- include/linux/skbuff.h | 25 +++++++++++++++++++++++-- net/core/skbuff.c | 29 +++++++++++++++++++++++++++-- 2 files changed, 50 insertions(+), 4 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index a8a2dd4cfdfd..07004593d7ca 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -688,11 +688,16 @@ enum { SKBFL_PURE_ZEROCOPY = BIT(2), SKBFL_DONT_ORPHAN = BIT(3), + + /* page references are managed by the ubuf_info, so it's safe to + * use frags only up until ubuf_info is released + */ + SKBFL_MANAGED_FRAG_REFS = BIT(4), }; #define SKBFL_ZEROCOPY_FRAG (SKBFL_ZEROCOPY_ENABLE | SKBFL_SHARED_FRAG) #define SKBFL_ALL_ZEROCOPY (SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY | \ - SKBFL_DONT_ORPHAN) + SKBFL_DONT_ORPHAN | SKBFL_MANAGED_FRAG_REFS) /* * The callback notifies userspace to release buffers when skb DMA is done in @@ -1810,6 +1815,11 @@ static inline bool skb_zcopy_pure(const struct sk_buff *skb) return skb_shinfo(skb)->flags & SKBFL_PURE_ZEROCOPY; } +static inline bool skb_zcopy_managed(const struct sk_buff *skb) +{ + return skb_shinfo(skb)->flags & SKBFL_MANAGED_FRAG_REFS; +} + static inline bool skb_pure_zcopy_same(const struct sk_buff *skb1, const struct sk_buff *skb2) { @@ -1884,6 +1894,14 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy_success) } } +void __skb_zcopy_downgrade_managed(struct sk_buff *skb); + +static inline void skb_zcopy_downgrade_managed(struct sk_buff *skb) +{ + if (unlikely(skb_zcopy_managed(skb))) + __skb_zcopy_downgrade_managed(skb); +} + static inline void skb_mark_not_on_list(struct sk_buff *skb) { skb->next = NULL; @@ -3499,7
+3517,10 @@ static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle) */ static inline void skb_frag_unref(struct sk_buff *skb, int f) { - __skb_frag_unref(&skb_shinfo(skb)->frags[f], skb->pp_recycle); + struct skb_shared_info *shinfo = skb_shinfo(skb); + + if (!skb_zcopy_managed(skb)) + __skb_frag_unref(&shinfo->frags[f], skb->pp_recycle); } /** diff --git a/net/core/skbuff.c b/net/core/skbuff.c index f5a3ebbc1f7e..cf4107d80bc4 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -666,11 +666,18 @@ static void skb_release_data(struct sk_buff *skb) &shinfo->dataref)) goto exit; - skb_zcopy_clear(skb, true); + if (skb_zcopy(skb)) { + bool skip_unref = shinfo->flags & SKBFL_MANAGED_FRAG_REFS; + + skb_zcopy_clear(skb, true); + if (skip_unref) + goto free_head; + } for (i = 0; i < shinfo->nr_frags; i++) __skb_frag_unref(&shinfo->frags[i], skb->pp_recycle); +free_head: if (shinfo->frag_list) kfree_skb_list(shinfo->frag_list); @@ -895,7 +902,10 @@ EXPORT_SYMBOL(skb_dump); */ void skb_tx_error(struct sk_buff *skb) { - skb_zcopy_clear(skb, true); + if (skb) { + skb_zcopy_downgrade_managed(skb); + skb_zcopy_clear(skb, true); + } } EXPORT_SYMBOL(skb_tx_error); @@ -1375,6 +1385,16 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb, } EXPORT_SYMBOL_GPL(skb_zerocopy_iter_stream); +void __skb_zcopy_downgrade_managed(struct sk_buff *skb) +{ + int i; + + skb_shinfo(skb)->flags &= ~SKBFL_MANAGED_FRAG_REFS; + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) + skb_frag_ref(skb, i); +} +EXPORT_SYMBOL_GPL(__skb_zcopy_downgrade_managed); + static int skb_zerocopy_clone(struct sk_buff *nskb, struct sk_buff *orig, gfp_t gfp_mask) { @@ -1692,6 +1712,8 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, BUG_ON(skb_shared(skb)); + skb_zcopy_downgrade_managed(skb); + size = SKB_DATA_ALIGN(size); if (skb_pfmemalloc(skb)) @@ -3488,6 +3510,8 @@ void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len) int pos = skb_headlen(skb); const 
int zc_flags = SKBFL_SHARED_FRAG | SKBFL_PURE_ZEROCOPY; + skb_zcopy_downgrade_managed(skb); + skb_shinfo(skb1)->flags |= skb_shinfo(skb)->flags & zc_flags; skb_zerocopy_clone(skb1, skb, 0); if (len < pos) /* Split line is inside header. */ @@ -3841,6 +3865,7 @@ int skb_append_pagefrags(struct sk_buff *skb, struct page *page, if (skb_can_coalesce(skb, i, page, offset)) { skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], size); } else if (i < MAX_SKB_FRAGS) { + skb_zcopy_downgrade_managed(skb); get_page(page); skb_fill_page_desc(skb, i, page, offset, size); } else { From patchwork Tue Jul 12 20:52:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12915640 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2DC54C433EF for ; Tue, 12 Jul 2022 20:53:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234026AbiGLUx4 (ORCPT ); Tue, 12 Jul 2022 16:53:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56764 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233719AbiGLUxo (ORCPT ); Tue, 12 Jul 2022 16:53:44 -0400 Received: from mail-wm1-x32f.google.com (mail-wm1-x32f.google.com [IPv6:2a00:1450:4864:20::32f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 64243CEB94; Tue, 12 Jul 2022 13:53:19 -0700 (PDT) Received: by mail-wm1-x32f.google.com with SMTP id v67-20020a1cac46000000b003a1888b9d36so112606wme.0; Tue, 12 Jul 2022 13:53:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; 
bh=GKMfDsz3ncRIFfRnqejiNLX2DyehkKru80XLdiaCkUE=; b=RgCPTWd4uK4sLGJmlktxUWB5eHkAaMwBuqAi5yAP0tRVSE0e49D9eFXJ7lXjvhUz0V OfgVvRaimWYIIzVXCPXDFmxm+9Xq+vxgVOxYT0a5BHmlTCCDZe+sj7V2VEGhLey6xF/I EK1pfcsmt2XaxDymzzu2DjGUlle6MQNB7jy2tw9N7fkJRqrjkxgCKizy73xdRd+d7Bdi 5gfb1huCMwnQ3Co3r2DQcQiKdNOQZn5adr+wSd/UEaFn/OLk4hvhHr/LOzmqk04hyIp7 dn2EuoKgcb0aM5HzQr5YeGdNMNMgcXqIkbSzZ2lNdwa4VkGCbCn73vLtuT3seuk1uqfn e+ag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=GKMfDsz3ncRIFfRnqejiNLX2DyehkKru80XLdiaCkUE=; b=GPpoijbSPgFbSin9mtd921f0so2nSwVr0kjPD5IrkuI7LssAXav9GiZQwkyf+rrTOD WcFbNWc9Q3ejqqpuVGLAsZe2q+t3Asp0plehI3wPz8K1g6LKdtkmwOKaocSkSyYr5D/Q 2YjJPaKLttObkptB25rtI0kn6yuEys76Sk7gcS0C/1h9Z15cuH0QVnc0NyiSgT22lDWA p4feXcDL7VhUr0A8FnIc44Aiy367a8T2FmctXJCnhKuG/0ua/sPJE7YS/vs3Pmc9l3JP DciSu4N0JmGes8QSXSJl0KddFsSDgWPVZHOqDEqPj1pCoWfSs7/3fnfh17V5c7rZK2rD niWg== X-Gm-Message-State: AJIora+uYJDOTUzhdg1pFuKjEmHYLkz1UivRhdTmMMRsKJJCfrJJYTTO EzdvPFH73lr9WAyIXCVeYmGrChkfk5Y= X-Google-Smtp-Source: AGRyM1sVlcAUo5UVbD3xERHem11AAK2I0GUI2uEzBIl6tcuBLMrFWIBt8sGwjutEfjhv1lH8oyBp5Q== X-Received: by 2002:a05:600c:4e16:b0:3a2:ef34:dbe3 with SMTP id b22-20020a05600c4e1600b003a2ef34dbe3mr4824648wmq.71.1657659196932; Tue, 12 Jul 2022 13:53:16 -0700 (PDT) Received: from 127.0.0.1localhost (188.28.125.106.threembb.co.uk. [188.28.125.106]) by smtp.gmail.com with ESMTPSA id c14-20020a7bc00e000000b003a044fe7fe7sm89833wmb.9.2022.07.12.13.53.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Jul 2022 13:53:16 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: "David S . 
Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , David Ahern , kernel-team@fb.com, Pavel Begunkov Subject: [PATCH net-next v5 08/27] net: introduce __skb_fill_page_desc_noacc Date: Tue, 12 Jul 2022 21:52:32 +0100 Message-Id: <99bdf93fa51edfbef709ad41f9985d855998fe38.1657643355.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Managed pages contain pinned userspace pages and are controlled by upper layers, so there is no need to track skb->pfmemalloc for them. Introduce a helper that fills frags but ignores page tracking; it'll be needed later. Signed-off-by: Pavel Begunkov --- include/linux/skbuff.h | 28 +++++++++++++++++----------- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 07004593d7ca..1111adefd906 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -2550,6 +2550,22 @@ static inline unsigned int skb_pagelen(const struct sk_buff *skb) return skb_headlen(skb) + __skb_pagelen(skb); } +static inline void __skb_fill_page_desc_noacc(struct skb_shared_info *shinfo, + int i, struct page *page, + int off, int size) +{ + skb_frag_t *frag = &shinfo->frags[i]; + + /* + * Propagate page pfmemalloc to the skb if we can. The problem is + * that not all callers have unique ownership of the page but rely + * on page_is_pfmemalloc doing the right thing(tm). 
+ */ + frag->bv_page = page; + frag->bv_offset = off; + skb_frag_size_set(frag, size); +} + /** * __skb_fill_page_desc - initialise a paged fragment in an skb * @skb: buffer containing fragment to be initialised @@ -2566,17 +2582,7 @@ static inline unsigned int skb_pagelen(const struct sk_buff *skb) static inline void __skb_fill_page_desc(struct sk_buff *skb, int i, struct page *page, int off, int size) { - skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; - - /* - * Propagate page pfmemalloc to the skb if we can. The problem is - * that not all callers have unique ownership of the page but rely - * on page_is_pfmemalloc doing the right thing(tm). - */ - frag->bv_page = page; - frag->bv_offset = off; - skb_frag_size_set(frag, size); - + __skb_fill_page_desc_noacc(skb_shinfo(skb), i, page, off, size); page = compound_head(page); if (page_is_pfmemalloc(page)) skb->pfmemalloc = true; From patchwork Tue Jul 12 20:52:33 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12915638 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BAD54CCA47F for ; Tue, 12 Jul 2022 20:53:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234161AbiGLUxv (ORCPT ); Tue, 12 Jul 2022 16:53:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57488 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231349AbiGLUxn (ORCPT ); Tue, 12 Jul 2022 16:53:43 -0400 Received: from mail-wm1-x32d.google.com (mail-wm1-x32d.google.com [IPv6:2a00:1450:4864:20::32d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7C937D0380; Tue, 12 Jul 2022 13:53:20 -0700 (PDT) Received: by mail-wm1-x32d.google.com with 
SMTP id c131-20020a1c3589000000b003a2cc290135so93713wma.2; Tue, 12 Jul 2022 13:53:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=93tCkgQSnWP7j8MKkQNUUkCctSIIcngjSEkqdPJ6fOw=; b=fc571E8SVwODapQL0U19Opr1UFzf0q5QnKd1RASOgTqD1+rq7tyYrd2eHiC2uwlrv2 EovC5vq+D1z7pKbUvECinQEEkM7S/iWcjjF0EAQ6UZWiqBQSJuqKr7w8pEIJuOU0Rt4I 3bgNyzwq9YAWUNvU8FZWQwILB/QCx1RZHjLCla1ibLyUZT3hICnZlq2txEtzfoFnN1Dd N5XdC4dMGlMR5kYfJzbrDkrflgM2w5kj/4pFoD38YxbfgYaDqUx5tnFDJu+fcPL2jyXg 6tvdPx9kKv9fVmV2b2rMsR0IpV7spxDBuGCGiPtNVEEJsp654oiUwaRw8j6GggKgZQAE MdOw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=93tCkgQSnWP7j8MKkQNUUkCctSIIcngjSEkqdPJ6fOw=; b=W1NU2Ub4bEBIstTVKb88F5BKtILt8dssulu7GK3mv7pchDaYwsEU4mIm0Bbgv+NupA BjBHyYJ1P7NEV4ro9opnU6N237YNhLTycg+TRUPBosZSM72iCPZi/Eh+DbbRbbEnzRW+ 253cz1db6gX3of1XHK4RFSqgTD2nWNvJEGjg9MzqTeIU+RxnguoTO6+Q7MKlz53WLEu8 OUVPCj6h2YqlvB0vfluyNpIYCl8xX14SxTdHoVpHzYbZumyAv1EcsAbjs6BSzBRyWiFk ncTgu+rtJnN032njG321ZS98qI5W0fu9x33fKxcTOqmLy+InFZzY7Y+VsoEUo8wKK837 7oGQ== X-Gm-Message-State: AJIora9+vmGVYvXkaoFjUmgEIv1A5JWhpHN4m+x5K7yMfA8dT5KLpVwh nUrxs8jfNU7zPm6g5ebhoI/hwfOcILg= X-Google-Smtp-Source: AGRyM1sbJ9UkYmXzur77FvKfiydeJpmt08QWMLfgXVHgYyEiKJ0cddUvd8qHIVStFA1S8oB8U9exZA== X-Received: by 2002:a7b:cd82:0:b0:3a1:7528:2d79 with SMTP id y2-20020a7bcd82000000b003a175282d79mr6043091wmj.79.1657659198241; Tue, 12 Jul 2022 13:53:18 -0700 (PDT) Received: from 127.0.0.1localhost (188.28.125.106.threembb.co.uk. 
[188.28.125.106]) by smtp.gmail.com with ESMTPSA id c14-20020a7bc00e000000b003a044fe7fe7sm89833wmb.9.2022.07.12.13.53.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Jul 2022 13:53:17 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: "David S . Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , David Ahern , kernel-team@fb.com, Pavel Begunkov Subject: [PATCH net-next v5 09/27] ipv4/udp: support externally provided ubufs Date: Tue, 12 Jul 2022 21:52:33 +0100 Message-Id: X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Teach ipv4/udp how to use external ubuf_info provided in msghdr and also prepare it for managed frags by sprinkling skb_zcopy_downgrade_managed() when it could mix managed and not managed frags. Signed-off-by: Pavel Begunkov --- net/ipv4/ip_output.c | 44 +++++++++++++++++++++++++++++++------------- 1 file changed, 31 insertions(+), 13 deletions(-) diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 581d1e233260..df7f9dfbe8be 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1017,18 +1017,35 @@ static int __ip_append_data(struct sock *sk, (!exthdrlen || (rt->dst.dev->features & NETIF_F_HW_ESP_TX_CSUM))) csummode = CHECKSUM_PARTIAL; - if (flags & MSG_ZEROCOPY && length && sock_flag(sk, SOCK_ZEROCOPY)) { - uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb)); - if (!uarg) - return -ENOBUFS; - extra_uref = !skb_zcopy(skb); /* only ref on new uarg */ - if (rt->dst.dev->features & NETIF_F_SG && - csummode == CHECKSUM_PARTIAL) { - paged = true; - zc = true; - } else { - uarg->zerocopy = 0; - skb_zcopy_set(skb, uarg, &extra_uref); + if ((flags & MSG_ZEROCOPY) && length) { + struct msghdr *msg = from; + + if (getfrag == ip_generic_getfrag && msg->msg_ubuf) { + if (skb_zcopy(skb) && 
msg->msg_ubuf != skb_zcopy(skb)) + return -EINVAL; + + /* Leave uarg NULL if can't zerocopy, callers should + * be able to handle it. + */ + if ((rt->dst.dev->features & NETIF_F_SG) && + csummode == CHECKSUM_PARTIAL) { + paged = true; + zc = true; + uarg = msg->msg_ubuf; + } + } else if (sock_flag(sk, SOCK_ZEROCOPY)) { + uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb)); + if (!uarg) + return -ENOBUFS; + extra_uref = !skb_zcopy(skb); /* only ref on new uarg */ + if (rt->dst.dev->features & NETIF_F_SG && + csummode == CHECKSUM_PARTIAL) { + paged = true; + zc = true; + } else { + uarg->zerocopy = 0; + skb_zcopy_set(skb, uarg, &extra_uref); + } } } @@ -1192,13 +1209,14 @@ static int __ip_append_data(struct sock *sk, err = -EFAULT; goto error; } - } else if (!uarg || !uarg->zerocopy) { + } else if (!zc) { int i = skb_shinfo(skb)->nr_frags; err = -ENOMEM; if (!sk_page_frag_refill(sk, pfrag)) goto error; + skb_zcopy_downgrade_managed(skb); if (!skb_can_coalesce(skb, i, pfrag->page, pfrag->offset)) { err = -EMSGSIZE; From patchwork Tue Jul 12 20:52:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12915641 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9856BC43334 for ; Tue, 12 Jul 2022 20:54:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234086AbiGLUyW (ORCPT ); Tue, 12 Jul 2022 16:54:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57784 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234077AbiGLUxq (ORCPT ); Tue, 12 Jul 2022 16:53:46 -0400 Received: from mail-wm1-x32d.google.com (mail-wm1-x32d.google.com [IPv6:2a00:1450:4864:20::32d]) by 
lindbergh.monkeyblade.net (Postfix) with ESMTPS id CAC2DD0398; Tue, 12 Jul 2022 13:53:21 -0700 (PDT) Received: by mail-wm1-x32d.google.com with SMTP id y22-20020a7bcd96000000b003a2e2725e89so108869wmj.0; Tue, 12 Jul 2022 13:53:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=s8Z4UCQCwXUXw8ymrA0AVRoggqHC6f/FDd/tj/tGO10=; b=YI+zuSHqDwgBSxMQDbNdfCXJYnhCEzG4jPJonm/OmJPXWYtolgG2mlqxIhZ+njdofI knhC01VD3yq1AxzXVlRRvfZDTpIXxbDi/ljvOl7fu+97hDXxyo71aDpQ/4sSrPq0TF9E XATyfL9AWsmPeUok7LUGrB07G9QDvlfELB/rFcvym7OGBrM8CkFiIBcfhuc065c5k8wR q6Ew2qcY8s8CYrsonVSA3c5pqRnuSqM81RKk37QqsOX7kIb5jiE0vZyd+WG0IjIQtpSK /rEQ/xNxv3xz34QhrecjpSmSBFeY93Z3IBGhsiWLjDOe3t0+weJ6+OknbAuXMCNh3Ijx /aTg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=s8Z4UCQCwXUXw8ymrA0AVRoggqHC6f/FDd/tj/tGO10=; b=klN8gSfmOwhEh0IZ9z5dSOLCHDhvq0MuigvPFbws094qw4Cr2zYM1TDqIqJ+8300Bv f+JUDsh3QJQr7e9dxvb551Ubd/vrELKrJP/h23IxYd0dRFLknGEsxDVNnSUne8P1BpU3 9UfIlJt1iDFhe+tubuo6uBbpuTMiC8ovsiX+CcObIWpRNWYS0kCn2IO3Wo+PUNIGQqrD ZPPSB3sYHNhv0QWakp6O6eXdL7S0iZX/P2JoA6L20pVC8srQSXlH3ivJlB4elnm63sFE ugNhxNx4w8iScVVbOcuKU8DcWGR49l8Cg/IAnZa+zE5kUJ4lKwoQjjwWqVeK2UIAT/KD tfsg== X-Gm-Message-State: AJIora9nZCWUKAakRIS/myNXSuhASu0RRLzYSapPmtnnpP7CCxPPRcrg B7jWMAszkn3MB3gttsGZSOjLTH/IjqA= X-Google-Smtp-Source: AGRyM1tHCITSPTZ0F1HAPZl2f3/b1zgi2kTBLFOzml+urwPzYJHZiZ4OEm8WKUydwn4dPFhWDsbuWw== X-Received: by 2002:a05:600c:3512:b0:3a0:5005:86b5 with SMTP id h18-20020a05600c351200b003a0500586b5mr6132266wmq.191.1657659199486; Tue, 12 Jul 2022 13:53:19 -0700 (PDT) Received: from 127.0.0.1localhost (188.28.125.106.threembb.co.uk. 
[188.28.125.106]) by smtp.gmail.com with ESMTPSA id c14-20020a7bc00e000000b003a044fe7fe7sm89833wmb.9.2022.07.12.13.53.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Jul 2022 13:53:19 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: "David S . Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , David Ahern , kernel-team@fb.com, Pavel Begunkov Subject: [PATCH net-next v5 10/27] ipv6/udp: support externally provided ubufs Date: Tue, 12 Jul 2022 21:52:34 +0100 Message-Id: X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Teach ipv6/udp how to use external ubuf_info provided in msghdr and also prepare it for managed frags by sprinkling skb_zcopy_downgrade_managed() when it could mix managed and not managed frags. Signed-off-by: Pavel Begunkov --- net/ipv6/ip6_output.c | 44 ++++++++++++++++++++++++++++++------------- 1 file changed, 31 insertions(+), 13 deletions(-) diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index fc74ce3ed8cc..897ca4f9b791 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1542,18 +1542,35 @@ static int __ip6_append_data(struct sock *sk, rt->dst.dev->features & (NETIF_F_IPV6_CSUM | NETIF_F_HW_CSUM)) csummode = CHECKSUM_PARTIAL; - if (flags & MSG_ZEROCOPY && length && sock_flag(sk, SOCK_ZEROCOPY)) { - uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb)); - if (!uarg) - return -ENOBUFS; - extra_uref = !skb_zcopy(skb); /* only ref on new uarg */ - if (rt->dst.dev->features & NETIF_F_SG && - csummode == CHECKSUM_PARTIAL) { - paged = true; - zc = true; - } else { - uarg->zerocopy = 0; - skb_zcopy_set(skb, uarg, &extra_uref); + if ((flags & MSG_ZEROCOPY) && length) { + struct msghdr *msg = from; + + if (getfrag == ip_generic_getfrag && msg->msg_ubuf) { + if (skb_zcopy(skb) && 
msg->msg_ubuf != skb_zcopy(skb)) + return -EINVAL; + + /* Leave uarg NULL if can't zerocopy, callers should + * be able to handle it. + */ + if ((rt->dst.dev->features & NETIF_F_SG) && + csummode == CHECKSUM_PARTIAL) { + paged = true; + zc = true; + uarg = msg->msg_ubuf; + } + } else if (sock_flag(sk, SOCK_ZEROCOPY)) { + uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb)); + if (!uarg) + return -ENOBUFS; + extra_uref = !skb_zcopy(skb); /* only ref on new uarg */ + if (rt->dst.dev->features & NETIF_F_SG && + csummode == CHECKSUM_PARTIAL) { + paged = true; + zc = true; + } else { + uarg->zerocopy = 0; + skb_zcopy_set(skb, uarg, &extra_uref); + } } } @@ -1747,13 +1764,14 @@ static int __ip6_append_data(struct sock *sk, err = -EFAULT; goto error; } - } else if (!uarg || !uarg->zerocopy) { + } else if (!zc) { int i = skb_shinfo(skb)->nr_frags; err = -ENOMEM; if (!sk_page_frag_refill(sk, pfrag)) goto error; + skb_zcopy_downgrade_managed(skb); if (!skb_can_coalesce(skb, i, pfrag->page, pfrag->offset)) { err = -EMSGSIZE; From patchwork Tue Jul 12 20:52:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12915639 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CF2CAC43334 for ; Tue, 12 Jul 2022 20:53:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231435AbiGLUxz (ORCPT ); Tue, 12 Jul 2022 16:53:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57540 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234049AbiGLUxo (ORCPT ); Tue, 12 Jul 2022 16:53:44 -0400 Received: from mail-wr1-x42d.google.com (mail-wr1-x42d.google.com [IPv6:2a00:1450:4864:20::42d]) by 
lindbergh.monkeyblade.net (Postfix) with ESMTPS id 94023D038E; Tue, 12 Jul 2022 13:53:21 -0700 (PDT) Received: by mail-wr1-x42d.google.com with SMTP id bu1so11642111wrb.9; Tue, 12 Jul 2022 13:53:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=OJ0hXZfV5Alrn5VQyEcS8Jj+bIgVFhTm5Bw+BZyTG0c=; b=gUEjS0b1uKrgeP8SV+ckYTtCLYwhpD7RrM77qL572lXDM/R9S/ant/W8YQXwWS8+zB hMd/GuIMiQbUiCN7T9Kz3mcSLcXXPvy9vWl1ZWFtznrlOFtDTcti6qS0TlFpvKxQjcoe sGsQI+eAGvIGcRv8p5WIswDXDQT6geiY3gmb/Iyu1uiVYWBblSdY/5QHUaYCZSJUNk+z bMz+59jAper6eKDw3Bjn9wB7nnW48inVy3bLJfehRAuErxPTdNfdbceGPdrHorQFpp1z S7P6PC/b6hYgj5N6zZc7mwIkmsnMkejqrK2p1ZhJBxUqR8cCFbSb2UbTdqJzEEbYqSmb 6IQQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=OJ0hXZfV5Alrn5VQyEcS8Jj+bIgVFhTm5Bw+BZyTG0c=; b=ycoHrmkTrgrqOQx7ptenRImEfrrV8AX/qi+KMMulpeatYDLkPt/MLBUX/GQP7QA1Ct kiSLF4DY6FOGVcHqao8OcPKw3q5uUGHWK0dCEdB5xQbjcpF31EiDpssYknjHpbZ9sXR7 umH322GpXEXPwwo4J78zBne1FJFDsjTPG6MOIduteQ6Ogzuya5ABc5EIUnTJwAWRTe/R gUXOZU6guav9DW05UKq7On8WhaFfDewA1f2mjbGpFamLXpBGB+JlKp2qoiOA10Ivymxp Bp2UKIRftR4tz84HdM/otT2RSrbKPGcW5csFKXOP/Wtlx42/Yyxof8kYI5YnuxzM4XUc eTPA== X-Gm-Message-State: AJIora9V/l+jUhFod1AoylzdwGvK04hTs813+PgBw7GCd0e6Xz+TTP6f REkyfUgPNHYc7SgSqCv/rBYtOkEC9f0= X-Google-Smtp-Source: AGRyM1vd7lHZ0K8ZfBC4o5wvmHxAWa3AvtgHBbjRQkWNlv/9DxoLb2w3ZwrpMYP+O6fyLAgn5NaSUg== X-Received: by 2002:a5d:4892:0:b0:20c:d4eb:1886 with SMTP id g18-20020a5d4892000000b0020cd4eb1886mr24080604wrq.96.1657659200697; Tue, 12 Jul 2022 13:53:20 -0700 (PDT) Received: from 127.0.0.1localhost (188.28.125.106.threembb.co.uk. 
[188.28.125.106]) by smtp.gmail.com with ESMTPSA id c14-20020a7bc00e000000b003a044fe7fe7sm89833wmb.9.2022.07.12.13.53.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Jul 2022 13:53:20 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: "David S . Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , David Ahern , kernel-team@fb.com, Pavel Begunkov Subject: [PATCH net-next v5 11/27] tcp: support externally provided ubufs Date: Tue, 12 Jul 2022 21:52:35 +0100 Message-Id: X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Teach tcp how to use external ubuf_info provided in msghdr and also prepare it for managed frags by sprinkling skb_zcopy_downgrade_managed() when it could mix managed and not managed frags. Signed-off-by: Pavel Begunkov --- net/ipv4/tcp.c | 32 ++++++++++++++++++++------------ 1 file changed, 20 insertions(+), 12 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 390eb3dc53bd..a81f694af5e9 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1223,17 +1223,23 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) flags = msg->msg_flags; - if (flags & MSG_ZEROCOPY && size && sock_flag(sk, SOCK_ZEROCOPY)) { + if ((flags & MSG_ZEROCOPY) && size) { skb = tcp_write_queue_tail(sk); - uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb)); - if (!uarg) { - err = -ENOBUFS; - goto out_err; - } - zc = sk->sk_route_caps & NETIF_F_SG; - if (!zc) - uarg->zerocopy = 0; + if (msg->msg_ubuf) { + uarg = msg->msg_ubuf; + net_zcopy_get(uarg); + zc = sk->sk_route_caps & NETIF_F_SG; + } else if (sock_flag(sk, SOCK_ZEROCOPY)) { + uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb)); + if (!uarg) { + err = -ENOBUFS; + goto out_err; + } + zc = sk->sk_route_caps & NETIF_F_SG; + if (!zc) + uarg->zerocopy = 
0; + } } if (unlikely(flags & MSG_FASTOPEN || inet_sk(sk)->defer_connect) && @@ -1356,9 +1362,11 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) copy = min_t(int, copy, pfrag->size - pfrag->offset); - if (tcp_downgrade_zcopy_pure(sk, skb)) - goto wait_for_space; - + if (unlikely(skb_zcopy_pure(skb) || skb_zcopy_managed(skb))) { + if (tcp_downgrade_zcopy_pure(sk, skb)) + goto wait_for_space; + skb_zcopy_downgrade_managed(skb); + } copy = tcp_wmem_schedule(sk, copy); if (!copy) goto wait_for_space; From patchwork Tue Jul 12 20:52:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12915643 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id ACB2BCCA481 for ; Tue, 12 Jul 2022 20:54:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233704AbiGLUyj (ORCPT ); Tue, 12 Jul 2022 16:54:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57628 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233990AbiGLUxs (ORCPT ); Tue, 12 Jul 2022 16:53:48 -0400 Received: from mail-wr1-x432.google.com (mail-wr1-x432.google.com [IPv6:2a00:1450:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 032B7D03B3; Tue, 12 Jul 2022 13:53:23 -0700 (PDT) Received: by mail-wr1-x432.google.com with SMTP id v16so12790828wrd.13; Tue, 12 Jul 2022 13:53:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=zU7SCTZ1JR9bpmBy/aLylVb+e+9ie+s0rEDh1Smgcsg=; 
b=F8O/jx5Je8eryKkph6AeX5I8ABHFSKvG/mNDO6VjK4TGox9Z+Z9g67tGWGVonwYaX8 ABTZZc1RpU56XCDZ8Oz2N5S3H3n8gU4PHzi8YCR7NumYYazatkqYJ5zn3rcQ9p5/bxfc fDhXHPw/ynwfAMT9uNhGUJb8WuEKLc9A1Qg02RGFolptTciLXGLh0qXZKytXPJ6jzfYk BYz7nUpalakGuhdfS90zylg1dT7gkgO/ihrdrKgPGI6o7NzCCRkzK1BfSHPl4yZLZ2IF QmPBsIc67tbl78FFrpuU5RP2FIJKBlXaGwfz5dupuJnhGrOnR8t4S7vllopPthaK+Pam bW/Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=zU7SCTZ1JR9bpmBy/aLylVb+e+9ie+s0rEDh1Smgcsg=; b=6dgZIsGbvcvff7NGxVk0n/xR9CVVjq5Q46Su6+2axU6S82H9u3HnnnbIBGxp72ZXU1 DUlmnaQ+jk65UcFNRDrZf/pN7HXs2yv5MYw5IBNdcv1Bsktw3VBrwylNL/3aT4X6VtC3 dT0DV8zcBzoUfbBDAOJ3udUUvNQcwqAwkr/Z1khlRCIkzLS+0KyOID1ZFz3adMsabSY8 w+OWegG4G4WQCVcl6GSXopxQXlvesUNQJHo19ZfRy09WOd8Cnd8c2ZE+FOK2NSd8f+ys DlRPjQwpQ835l5LocX0JMSxI4Pp3BItQa/1ko3bbwpqtUyBHt7z+N492EFndcmbfoErv Iazw== X-Gm-Message-State: AJIora+Oh/JEdv7AiYxCkQC7nNujCNQk0EHwY5Otg8Wz9MiKz+Nbnuo2 YyWh3am9xwME6jsvzAU6lMldp4TWFfk= X-Google-Smtp-Source: AGRyM1tgGvNkygL1SBdVVnlSafYsitm6VFrOzmb1ExdfB9SQlfNPmvMHAfOSIcCo+FyJPCGHGE3S+w== X-Received: by 2002:a05:6000:1152:b0:21d:7646:a976 with SMTP id d18-20020a056000115200b0021d7646a976mr24466876wrx.416.1657659201832; Tue, 12 Jul 2022 13:53:21 -0700 (PDT) Received: from 127.0.0.1localhost (188.28.125.106.threembb.co.uk. [188.28.125.106]) by smtp.gmail.com with ESMTPSA id c14-20020a7bc00e000000b003a044fe7fe7sm89833wmb.9.2022.07.12.13.53.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Jul 2022 13:53:21 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: "David S . 
Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , David Ahern , kernel-team@fb.com, Pavel Begunkov Subject: [PATCH net-next v5 12/27] io_uring: initialise msghdr::msg_ubuf Date: Tue, 12 Jul 2022 21:52:36 +0100 Message-Id: X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Initialise newly added ->msg_ubuf in io_recv() and io_send(). Signed-off-by: Pavel Begunkov --- io_uring/net.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/io_uring/net.c b/io_uring/net.c index cb08a4b62840..2dd61fcf91d8 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -255,6 +255,7 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags) msg.msg_control = NULL; msg.msg_controllen = 0; msg.msg_namelen = 0; + msg.msg_ubuf = NULL; flags = sr->msg_flags; if (issue_flags & IO_URING_F_NONBLOCK) @@ -601,6 +602,7 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags) msg.msg_flags = 0; msg.msg_controllen = 0; msg.msg_iocb = NULL; + msg.msg_ubuf = NULL; flags = sr->msg_flags; if (force_nonblock) From patchwork Tue Jul 12 20:52:37 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12915642 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 74BBBC43334 for ; Tue, 12 Jul 2022 20:54:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234263AbiGLUyi (ORCPT ); Tue, 12 Jul 2022 16:54:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57836 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234109AbiGLUxr (ORCPT ); Tue, 12 Jul 2022 
16:53:47 -0400 Received: from mail-wm1-x32c.google.com (mail-wm1-x32c.google.com [IPv6:2a00:1450:4864:20::32c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0B4A0D085C; Tue, 12 Jul 2022 13:53:24 -0700 (PDT) Received: by mail-wm1-x32c.google.com with SMTP id 9-20020a1c0209000000b003a2dfdebe47so85146wmc.3; Tue, 12 Jul 2022 13:53:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=t800C3UW24EYUJ7jIBViV/Es6005J2b7uIwCCrhjWNQ=; b=YRIMsVyKoyP/kU4ZdScISJ4L6DZjSnsTWvwKFZwgiVRZJnM+UFyIh2/FJDFaqfT8wF zKg/CJycE8Awf2lysm/pLhFOh6ZjdDBFCoZt1HQYjTk82M4LSpMVrYK+V1v9vS+neKxY W9p/iryUhFJ5LFsTl/MUzsFgi3iBsR7fDQicnDgrcqVAhqnCzMuLzDmlZjqtX4737Dgq UB083BCgopx63K63+0Ngt/UphCaFyAlNvtm+AQFaYDwzpP8G1EEHx5R4m3f+r5ZD2Mp2 A4ZouS83CH/rrI6+YpNvQCxCa5R1U5a5U0L72MfpaRlbV4NgKYu4fhfEeDOfRVRniCXD hGlw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=t800C3UW24EYUJ7jIBViV/Es6005J2b7uIwCCrhjWNQ=; b=2pKatZJqADUpIGOHzI3UkfeibT601k6fCKA1SxPbfwIcEu2nZfKD2Tcrt8Xw7M4ycy +r7c0Wc2Yo6H4dBjdqLozeRlmouunbLfp9vrdVNziS1aU0kE3BxXhfSm2ADja1Mm2jJA sBloMcTqp2wXCN01OhEd0LgmmOiN74mtNpRh6SGJOe/2s5WrOHCTRAx6g4ARwHLrm+WZ LjsRLHBo3SQm1rhnSKcjMo5GHtFqCWRZl1E/N8V74iOCmi5PDxLQkX3TTPilpB7aaqsh CepuLXaOSCBOxtt5s4mC2CQ8Ben7N3t/I196YdN6EvXm5YqiaYqCDfIC6DjVce14T6Ip kTHQ== X-Gm-Message-State: AJIora9z0RQX6DeJzdnhGMPdXhGBX/19tVB5noJS4v32k+0zvE8i8SZO aUF1bolF6yxFE/MmFnR1kgCXpHGqZ4s= X-Google-Smtp-Source: AGRyM1uNbx+CtqsvokSH4Szzl4TpleQVijINmFvzFFrVB0wRpW5GCpJaCWuKMG/6MQFXZH/EAlYlPA== X-Received: by 2002:a05:600c:1e22:b0:3a2:ec81:a415 with SMTP id ay34-20020a05600c1e2200b003a2ec81a415mr6042471wmb.139.1657659203055; Tue, 12 Jul 2022 13:53:23 -0700 (PDT) Received: from 127.0.0.1localhost (188.28.125.106.threembb.co.uk. 
[188.28.125.106]) by smtp.gmail.com with ESMTPSA id c14-20020a7bc00e000000b003a044fe7fe7sm89833wmb.9.2022.07.12.13.53.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Jul 2022 13:53:22 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: "David S . Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , David Ahern , kernel-team@fb.com, Pavel Begunkov Subject: [PATCH net-next v5 13/27] io_uring: export io_put_task() Date: Tue, 12 Jul 2022 21:52:37 +0100 Message-Id: <3686807d4c03b72e389947b0e8692d4d44334ef0.1657643355.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Make io_put_task() available to non-core parts of io_uring, we'll need it for notification infrastructure. Signed-off-by: Pavel Begunkov --- include/linux/io_uring_types.h | 25 +++++++++++++++++++++++++ io_uring/io_uring.c | 11 +---------- io_uring/io_uring.h | 10 ++++++++++ io_uring/tctx.h | 26 -------------------------- 4 files changed, 36 insertions(+), 36 deletions(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 26ef11e978d4..d876a0367081 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -4,6 +4,7 @@ #include #include #include +#include #include struct io_wq_work_node { @@ -43,6 +44,30 @@ struct io_hash_table { unsigned hash_bits; }; +/* + * Arbitrary limit, can be raised if need be + */ +#define IO_RINGFD_REG_MAX 16 + +struct io_uring_task { + /* submission side */ + int cached_refs; + const struct io_ring_ctx *last; + struct io_wq *io_wq; + struct file *registered_rings[IO_RINGFD_REG_MAX]; + + struct xarray xa; + struct wait_queue_head wait; + atomic_t in_idle; + atomic_t inflight_tracked; + struct percpu_counter inflight; + + struct { /* task_work */ + struct 
llist_head task_list; + struct callback_head task_work; + } ____cacheline_aligned_in_smp; +}; + struct io_uring { u32 head ____cacheline_aligned_in_smp; u32 tail ____cacheline_aligned_in_smp; diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index caf979cd4327..bb644b1b575a 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -602,7 +602,7 @@ static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx) return ret; } -static void __io_put_task(struct task_struct *task, int nr) +void __io_put_task(struct task_struct *task, int nr) { struct io_uring_task *tctx = task->io_uring; @@ -612,15 +612,6 @@ static void __io_put_task(struct task_struct *task, int nr) put_task_struct_many(task, nr); } -/* must to be called somewhat shortly after putting a request */ -static inline void io_put_task(struct task_struct *task, int nr) -{ - if (likely(task == current)) - task->io_uring->cached_refs += nr; - else - __io_put_task(task, nr); -} - static void io_task_refs_refill(struct io_uring_task *tctx) { unsigned int refill = -tctx->cached_refs + IO_TCTX_REFS_CACHE_NR; diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index 868f45d55543..2379d9e70c10 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -66,6 +66,7 @@ void io_wq_submit_work(struct io_wq_work *work); void io_free_req(struct io_kiocb *req); void io_queue_next(struct io_kiocb *req); +void __io_put_task(struct task_struct *task, int nr); bool io_match_task_safe(struct io_kiocb *head, struct task_struct *task, bool cancel_all); @@ -253,4 +254,13 @@ static inline void io_commit_cqring_flush(struct io_ring_ctx *ctx) __io_commit_cqring_flush(ctx); } +/* must to be called somewhat shortly after putting a request */ +static inline void io_put_task(struct task_struct *task, int nr) +{ + if (likely(task == current)) + task->io_uring->cached_refs += nr; + else + __io_put_task(task, nr); +} + #endif diff --git a/io_uring/tctx.h b/io_uring/tctx.h index 8a33ff6e5d91..25974beed4d6 100644 --- 
a/io_uring/tctx.h +++ b/io_uring/tctx.h @@ -1,31 +1,5 @@ // SPDX-License-Identifier: GPL-2.0 -#include - -/* - * Arbitrary limit, can be raised if need be - */ -#define IO_RINGFD_REG_MAX 16 - -struct io_uring_task { - /* submission side */ - int cached_refs; - const struct io_ring_ctx *last; - struct io_wq *io_wq; - struct file *registered_rings[IO_RINGFD_REG_MAX]; - - struct xarray xa; - struct wait_queue_head wait; - atomic_t in_idle; - atomic_t inflight_tracked; - struct percpu_counter inflight; - - struct { /* task_work */ - struct llist_head task_list; - struct callback_head task_work; - } ____cacheline_aligned_in_smp; -}; - struct io_tctx_node { struct list_head ctx_node; struct task_struct *task; From patchwork Tue Jul 12 20:52:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12915644 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 31292C43334 for ; Tue, 12 Jul 2022 20:54:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234029AbiGLUyw (ORCPT ); Tue, 12 Jul 2022 16:54:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57950 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233143AbiGLUxu (ORCPT ); Tue, 12 Jul 2022 16:53:50 -0400 Received: from mail-wr1-x433.google.com (mail-wr1-x433.google.com [IPv6:2a00:1450:4864:20::433]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B0866D0E15; Tue, 12 Jul 2022 13:53:26 -0700 (PDT) Received: by mail-wr1-x433.google.com with SMTP id f2so12810162wrr.6; Tue, 12 Jul 2022 13:53:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; 
h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=vE3qFQ8ddlENz32QVzcPsvY6xTeH/TIslKWkfAbKxUg=; b=Uro7tQbm/+uYzh7ZSgzm9Ae9KYWXN0i1e5qvUPuLNSpyLqman93RwXJ8tsx4LvAp3a ObsNOgUN4toJoeFFiqxnylasNI72+cLMXjZSFnzGjASIufcRVLENsB1UUzwHrQ4dDpwV u0mOY4RgdwH1X6VO18JoxLIeAPMJELNemuNq3Xpfq4rd+VSwasnvKlrClDWaZ0OBwpii rdfhyn5SUFFLslukhiaF0cLeuyzHqoFpvhp2WAXGcd1QjmmURHvHzKxKEhbmAvXnVEZ9 A3P8GtEB48LQH6H/n3Gi6oTQAzSSRV5Pqs7zUv83UnKMDsgIv6dN+MS6zNKYOy9Bz/BL Aq9g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=vE3qFQ8ddlENz32QVzcPsvY6xTeH/TIslKWkfAbKxUg=; b=m1oeHXOwoBLYLljAFuvXFAcbl5NPxfIPdLECuRfQtiG6LnD7gvcGH5LkH+XMywqRqe nyduTqY1qsnNsK+mi891Gr3VTxycWdAEfbUdYcoIxCkPAff/Ni9HwMFPjujtIa/5kfou kAICwOIapRi4X0BdlCG49MNk5xMz/NodJbNt0/km9LLCgkLWfXeMriFEes0LmOcwy35H zsBj1dOwAYb21EYrNfQ+0Ryv8YHQXKZJkpJkAKnRu7mY+v0w+nRYYiSSA1ga9/JK5RRF EhitYPoHxoISrRykR8+wYnOhe4wT6T98vObbQohQb2zv2/zgUPN0yMug5rW/PfaZnivz kryw== X-Gm-Message-State: AJIora8yQmEzBijhCkfPoR9KleXYARA8DQXndnu+0X1/aNKRQWf8u+Gx QrV8HG3BcM6vbWL4brOWviMJlQCcBHw= X-Google-Smtp-Source: AGRyM1vmcTiRUiaTt7Z8+3CF2mqaXewDfFH5/VL59cyKWbiOxMKKJDKFhtm2JOBe2FOgJMPUXH1mCQ== X-Received: by 2002:a5d:6d8a:0:b0:21d:a6f3:f458 with SMTP id l10-20020a5d6d8a000000b0021da6f3f458mr11509671wrs.574.1657659204209; Tue, 12 Jul 2022 13:53:24 -0700 (PDT) Received: from 127.0.0.1localhost (188.28.125.106.threembb.co.uk. [188.28.125.106]) by smtp.gmail.com with ESMTPSA id c14-20020a7bc00e000000b003a044fe7fe7sm89833wmb.9.2022.07.12.13.53.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Jul 2022 13:53:23 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: "David S . 
Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , David Ahern , kernel-team@fb.com, Pavel Begunkov Subject: [PATCH net-next v5 14/27] io_uring: add zc notification infrastructure Date: Tue, 12 Jul 2022 21:52:38 +0100 Message-Id: <3ecf54c31a85762bf679b0a432c9f43ecf7e61cc.1657643355.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Add the internal part of send zerocopy notifications. There are two main concepts. The first is struct io_notif, which embeds a struct ubuf_info and maps 1:1 to it. io_uring binds a number of zerocopy send requests to a notifier and then asks to complete (aka flush) it. Once it is flushed and all attached requests and skbs have completed, it generates one and only one CQE. Notifiers are intended to be passed into the network layer as struct msghdr::msg_ubuf. The second concept is notification slots. The userspace will be able to register an array of slots and subsequently address them by their index in the array. Slots are independent of each other. Each slot can have only one notifier at a time (called the active notifier) but many notifiers during its lifetime. While active, a notifier is not going to post any completion, but the userspace can attach requests to it by specifying the corresponding slot while issuing send zc requests. Eventually, the userspace will want to "flush" the notifier, losing any way to attach new requests to it; however, it can use the next automatically added notifier of this slot or of any other slot. When the network layer is done with all enqueued skbs attached to a notifier and no longer needs the user data specified in them, the flushed notifier will post a CQE. 
Signed-off-by: Pavel Begunkov --- include/linux/io_uring_types.h | 5 ++ io_uring/Makefile | 2 +- io_uring/io_uring.c | 8 ++- io_uring/io_uring.h | 2 + io_uring/notif.c | 102 +++++++++++++++++++++++++++++++++ io_uring/notif.h | 64 +++++++++++++++++++++ 6 files changed, 179 insertions(+), 4 deletions(-) create mode 100644 io_uring/notif.c create mode 100644 io_uring/notif.h diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index d876a0367081..95334e678586 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -34,6 +34,9 @@ struct io_file_table { unsigned int alloc_hint; }; +struct io_notif; +struct io_notif_slot; + struct io_hash_bucket { spinlock_t lock; struct hlist_head list; @@ -232,6 +235,8 @@ struct io_ring_ctx { unsigned nr_user_files; unsigned nr_user_bufs; struct io_mapped_ubuf **user_bufs; + struct io_notif_slot *notif_slots; + unsigned nr_notif_slots; struct io_submit_state submit_state; diff --git a/io_uring/Makefile b/io_uring/Makefile index 466639c289be..8cc8e5387a75 100644 --- a/io_uring/Makefile +++ b/io_uring/Makefile @@ -7,5 +7,5 @@ obj-$(CONFIG_IO_URING) += io_uring.o xattr.o nop.o fs.o splice.o \ openclose.o uring_cmd.o epoll.o \ statx.o net.o msg_ring.o timeout.o \ sqpoll.o fdinfo.o tctx.o poll.o \ - cancel.o kbuf.o rsrc.o rw.o opdef.o + cancel.o kbuf.o rsrc.o rw.o opdef.o notif.o obj-$(CONFIG_IO_WQ) += io-wq.o diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index bb644b1b575a..ad816afe2345 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -89,6 +89,7 @@ #include "kbuf.h" #include "rsrc.h" #include "cancel.h" +#include "notif.h" #include "timeout.h" #include "poll.h" @@ -726,9 +727,8 @@ struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx) return &rings->cqes[off]; } -static bool io_fill_cqe_aux(struct io_ring_ctx *ctx, - u64 user_data, s32 res, u32 cflags, - bool allow_overflow) +bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags, + 
bool allow_overflow) { struct io_uring_cqe *cqe; @@ -2496,6 +2496,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx) } #endif WARN_ON_ONCE(!list_empty(&ctx->ltimeout_list)); + WARN_ON_ONCE(ctx->notif_slots || ctx->nr_notif_slots); io_mem_free(ctx->rings); io_mem_free(ctx->sq_sqes); @@ -2672,6 +2673,7 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx) io_unregister_personality(ctx, index); if (ctx->rings) io_poll_remove_all(ctx, NULL, true); + io_notif_unregister(ctx); mutex_unlock(&ctx->uring_lock); /* failed during ring init, it couldn't have issued any requests */ diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index 2379d9e70c10..b8c858727dc8 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -33,6 +33,8 @@ void io_req_complete_post(struct io_kiocb *req); void __io_req_complete_post(struct io_kiocb *req); bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags, bool allow_overflow); +bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags, + bool allow_overflow); void __io_commit_cqring_flush(struct io_ring_ctx *ctx); struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages); diff --git a/io_uring/notif.c b/io_uring/notif.c new file mode 100644 index 000000000000..6ee948af6a49 --- /dev/null +++ b/io_uring/notif.c @@ -0,0 +1,102 @@ +#include +#include +#include +#include +#include +#include + +#include "io_uring.h" +#include "notif.h" + +static void __io_notif_complete_tw(struct callback_head *cb) +{ + struct io_notif *notif = container_of(cb, struct io_notif, task_work); + struct io_ring_ctx *ctx = notif->ctx; + + io_cq_lock(ctx); + io_fill_cqe_aux(ctx, notif->tag, 0, notif->seq, true); + io_cq_unlock_post(ctx); + + percpu_ref_put(&ctx->refs); + kfree(notif); +} + +static inline void io_notif_complete(struct io_notif *notif) +{ + __io_notif_complete_tw(&notif->task_work); +} + +static void io_notif_complete_wq(struct work_struct *work) +{ + 
struct io_notif *notif = container_of(work, struct io_notif, commit_work); + + io_notif_complete(notif); +} + +static void io_uring_tx_zerocopy_callback(struct sk_buff *skb, + struct ubuf_info *uarg, + bool success) +{ + struct io_notif *notif = container_of(uarg, struct io_notif, uarg); + + if (!refcount_dec_and_test(&uarg->refcnt)) + return; + INIT_WORK(&notif->commit_work, io_notif_complete_wq); + queue_work(system_unbound_wq, &notif->commit_work); +} + +struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx, + struct io_notif_slot *slot) + __must_hold(&ctx->uring_lock) +{ + struct io_notif *notif; + + notif = kzalloc(sizeof(*notif), GFP_ATOMIC | __GFP_ACCOUNT); + if (!notif) + return NULL; + + notif->seq = slot->seq++; + notif->tag = slot->tag; + notif->ctx = ctx; + notif->uarg.flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN; + notif->uarg.callback = io_uring_tx_zerocopy_callback; + /* master ref owned by io_notif_slot, will be dropped on flush */ + refcount_set(&notif->uarg.refcnt, 1); + percpu_ref_get(&ctx->refs); + return notif; +} + +static void io_notif_slot_flush(struct io_notif_slot *slot) + __must_hold(&ctx->uring_lock) +{ + struct io_notif *notif = slot->notif; + + slot->notif = NULL; + + if (WARN_ON_ONCE(in_interrupt())) + return; + /* drop slot's master ref */ + if (refcount_dec_and_test(&notif->uarg.refcnt)) + io_notif_complete(notif); +} + +__cold int io_notif_unregister(struct io_ring_ctx *ctx) + __must_hold(&ctx->uring_lock) +{ + int i; + + if (!ctx->notif_slots) + return -ENXIO; + + for (i = 0; i < ctx->nr_notif_slots; i++) { + struct io_notif_slot *slot = &ctx->notif_slots[i]; + + if (slot->notif) + io_notif_slot_flush(slot); + } + + kvfree(ctx->notif_slots); + ctx->notif_slots = NULL; + ctx->nr_notif_slots = 0; + return 0; +} \ No newline at end of file diff --git a/io_uring/notif.h b/io_uring/notif.h new file mode 100644 index 000000000000..3d7a1d242e17 --- /dev/null +++ b/io_uring/notif.h @@ -0,0 +1,64 @@ +// SPDX-License-Identifier: GPL-2.0 + 
+#include +#include +#include +#include + +struct io_notif { + struct ubuf_info uarg; + struct io_ring_ctx *ctx; + + /* cqe->user_data, io_notif_slot::tag if not overridden */ + u64 tag; + /* see struct io_notif_slot::seq */ + u32 seq; + + union { + struct callback_head task_work; + struct work_struct commit_work; + }; +}; + +struct io_notif_slot { + /* + * Current/active notifier. A slot holds only one active notifier at a + * time and keeps one reference to it. Flush releases the reference and + * lazily replaces it with a new notifier. + */ + struct io_notif *notif; + + /* + * Default ->user_data for this slot's notifier CQEs + */ + u64 tag; + /* + * Notifiers of a slot live in generations, we create a new notifier + * only after flushing the previous one. Track the sequential number + * for all notifiers and copy it into the notifier's cqe->cflags + */ + u32 seq; +}; + +int io_notif_unregister(struct io_ring_ctx *ctx); + +struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx, + struct io_notif_slot *slot); + +static inline struct io_notif *io_get_notif(struct io_ring_ctx *ctx, + struct io_notif_slot *slot) +{ + if (!slot->notif) + slot->notif = io_alloc_notif(ctx, slot); + return slot->notif; +} + +static inline struct io_notif_slot *io_get_notif_slot(struct io_ring_ctx *ctx, + int idx) + __must_hold(&ctx->uring_lock) +{ + if (idx >= ctx->nr_notif_slots) + return NULL; + idx = array_index_nospec(idx, ctx->nr_notif_slots); + return &ctx->notif_slots[idx]; +} From patchwork Tue Jul 12 20:52:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12915645 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DCE4EC433EF for ; Tue, 12 Jul 2022 20:54:58 +0000 
(UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234195AbiGLUy4 (ORCPT ); Tue, 12 Jul 2022 16:54:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58016 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234174AbiGLUxw (ORCPT ); Tue, 12 Jul 2022 16:53:52 -0400 Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6822DD0E21; Tue, 12 Jul 2022 13:53:27 -0700 (PDT) Received: by mail-wr1-x42a.google.com with SMTP id f2so12810216wrr.6; Tue, 12 Jul 2022 13:53:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=MeY1SJ5dADtlZJuMOY4DtypkfXVjyNqMgbsyEsXrBK8=; b=MnbCf9rDUl7U/n/4OTzP92LAfXNC0Ly4vBE0GO99GAtY2lMXnatwo4AWuD+LiMmLZj QUPZhBWZfHVy/0x3Ujhg8HAQ8dqs7oLUtOUYNgjnSbfXDOs71uMz3AMQc1U/RB/+0UbX LXfTyR/1odChXPLAQPFyvD831Zk5fPqEpZ7BnKRuoq+Ei/Q66Why45fAZkruNi5L2nCO Vs/qVqg4InLPbCvL8zOfJmOeeAA4jXylrQiNbxb8WDJaUnkT4FBSpObcfize0e1la0XX 8KCzHWcXgF7LVV3yvEUjmwjCqYGlBLD2KllxemNy2xgzKvuF2Z5A+Y8ndtdm05sj2TRT b5oQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=MeY1SJ5dADtlZJuMOY4DtypkfXVjyNqMgbsyEsXrBK8=; b=YqTrPs/LBMaOYoP2Mq4oX2YvrhidD1YOCqulueeSFx0fWv2XxQeZVOyejEmtGJv3ex G/f+88ti+w+TXN6zFKs1uPqqNgPI+QoUcssmpLPVEtzP4ua7NweJ4lOk4n3XjV2ccUCs MszvqsDObJ16vVBNQb/DYpEOhdlKEDgxD7ZMVNW2FH7FYGNIxROl4l8/bv3TxD9BZu1Y so8bCFd37trGEK8R4/38VDcUmgNjfNpKAkwh7yBqNHYTtIVKZXICJQxHuPO7/r0a50H8 BVS5+J0tKTYNw0WA/iOuTKUJF4oWnOpDfIi+lY4sWfqFcFlkroUJU+e3TEu50RRrD+xI L57w== X-Gm-Message-State: AJIora9Qquxq7KgSoFSSFS1KZce6khvuJbeQ9eEwqCDeVJvMz0KtkLHX O7nTBf8Klh8HfC5daufYEsz2csdqEtU= X-Google-Smtp-Source: 
AGRyM1tbnbVzN5jbSxUvYUYUPJCkX+TbsQkJx77D/udEqeZRF9xB9pzlUcupPuGR6yS9NloRnw7ygQ== X-Received: by 2002:a05:6000:18a1:b0:21d:b2bd:d6e2 with SMTP id b1-20020a05600018a100b0021db2bdd6e2mr5277615wri.53.1657659205403; Tue, 12 Jul 2022 13:53:25 -0700 (PDT) Received: from 127.0.0.1localhost (188.28.125.106.threembb.co.uk. [188.28.125.106]) by smtp.gmail.com with ESMTPSA id c14-20020a7bc00e000000b003a044fe7fe7sm89833wmb.9.2022.07.12.13.53.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Jul 2022 13:53:25 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: "David S . Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , David Ahern , kernel-team@fb.com, Pavel Begunkov Subject: [PATCH net-next v5 15/27] io_uring: cache struct io_notif Date: Tue, 12 Jul 2022 21:52:39 +0100 Message-Id: <9dec18f7fcbab9f4bd40b96e5ae158b119945230.1657643355.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org kmalloc'ing struct io_notif is too expensive when done frequently, so cache them like many other resources in io_uring. Keep two lists. The first is the one we take notifiers from, and it is protected by ->uring_lock. The second, to which we queue released notifiers, is protected by ->completion_lock. We splice the second list into the first when needed. 
Signed-off-by: Pavel Begunkov --- include/linux/io_uring_types.h | 7 +++++ io_uring/io_uring.c | 3 ++ io_uring/notif.c | 57 +++++++++++++++++++++++++++++----- io_uring/notif.h | 5 +++ 4 files changed, 65 insertions(+), 7 deletions(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 95334e678586..66ab009e7a6b 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -244,6 +244,9 @@ struct io_ring_ctx { struct xarray io_bl_xa; struct list_head io_buffers_cache; + /* struct io_notif cache, protected by uring_lock */ + struct list_head notif_list; + struct io_hash_table cancel_table_locked; struct list_head cq_overflow_list; struct list_head apoll_cache; @@ -255,6 +258,10 @@ struct io_ring_ctx { struct io_wq_work_list locked_free_list; unsigned int locked_free_nr; + /* struct io_notif cache protected by completion_lock */ + struct list_head notif_list_locked; + unsigned int notif_locked_nr; + const struct cred *sq_creds; /* cred used for __io_sq_thread() */ struct io_sq_data *sq_data; /* if using sq thread polling */ diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index ad816afe2345..bdc5a2839d94 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -318,6 +318,8 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p) INIT_WQ_LIST(&ctx->locked_free_list); INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func); INIT_WQ_LIST(&ctx->submit_state.compl_reqs); + INIT_LIST_HEAD(&ctx->notif_list); + INIT_LIST_HEAD(&ctx->notif_list_locked); return ctx; err: kfree(ctx->dummy_ubuf); @@ -2498,6 +2500,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx) WARN_ON_ONCE(!list_empty(&ctx->ltimeout_list)); WARN_ON_ONCE(ctx->notif_slots || ctx->nr_notif_slots); + io_notif_cache_purge(ctx); io_mem_free(ctx->rings); io_mem_free(ctx->sq_sqes); diff --git a/io_uring/notif.c b/io_uring/notif.c index 6ee948af6a49..b257db2120b4 100644 --- a/io_uring/notif.c +++ 
b/io_uring/notif.c @@ -15,10 +15,12 @@ static void __io_notif_complete_tw(struct callback_head *cb) io_cq_lock(ctx); io_fill_cqe_aux(ctx, notif->tag, 0, notif->seq, true); + + list_add(&notif->cache_node, &ctx->notif_list_locked); + ctx->notif_locked_nr++; io_cq_unlock_post(ctx); percpu_ref_put(&ctx->refs); - kfree(notif); } static inline void io_notif_complete(struct io_notif *notif) @@ -45,21 +47,62 @@ static void io_uring_tx_zerocopy_callback(struct sk_buff *skb, queue_work(system_unbound_wq, &notif->commit_work); } +static void io_notif_splice_cached(struct io_ring_ctx *ctx) + __must_hold(&ctx->uring_lock) +{ + spin_lock(&ctx->completion_lock); + list_splice_init(&ctx->notif_list_locked, &ctx->notif_list); + ctx->notif_locked_nr = 0; + spin_unlock(&ctx->completion_lock); +} + +void io_notif_cache_purge(struct io_ring_ctx *ctx) + __must_hold(&ctx->uring_lock) +{ + io_notif_splice_cached(ctx); + + while (!list_empty(&ctx->notif_list)) { + struct io_notif *notif = list_first_entry(&ctx->notif_list, + struct io_notif, cache_node); + + list_del(&notif->cache_node); + kfree(notif); + } +} + +static inline bool io_notif_has_cached(struct io_ring_ctx *ctx) + __must_hold(&ctx->uring_lock) +{ + if (likely(!list_empty(&ctx->notif_list))) + return true; + if (data_race(READ_ONCE(ctx->notif_locked_nr) <= IO_NOTIF_SPLICE_BATCH)) + return false; + io_notif_splice_cached(ctx); + return !list_empty(&ctx->notif_list); +} + struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx, struct io_notif_slot *slot) __must_hold(&ctx->uring_lock) { struct io_notif *notif; - notif = kzalloc(sizeof(*notif), GFP_ATOMIC | __GFP_ACCOUNT); - if (!notif) - return NULL; + if (likely(io_notif_has_cached(ctx))) { + notif = list_first_entry(&ctx->notif_list, + struct io_notif, cache_node); + list_del(&notif->cache_node); + } else { + notif = kzalloc(sizeof(*notif), GFP_ATOMIC | __GFP_ACCOUNT); + if (!notif) + return NULL; + /* pre-initialise some fields */ + notif->ctx = ctx; + notif->uarg.flags = 
SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN; + notif->uarg.callback = io_uring_tx_zerocopy_callback; + } notif->seq = slot->seq++; notif->tag = slot->tag; - notif->ctx = ctx; - notif->uarg.flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN; - notif->uarg.callback = io_uring_tx_zerocopy_callback; /* master ref owned by io_notif_slot, will be dropped on flush */ refcount_set(&notif->uarg.refcnt, 1); percpu_ref_get(&ctx->refs); diff --git a/io_uring/notif.h b/io_uring/notif.h index 3d7a1d242e17..b23c9c0515bb 100644 --- a/io_uring/notif.h +++ b/io_uring/notif.h @@ -5,6 +5,8 @@ #include #include +#define IO_NOTIF_SPLICE_BATCH 32 + struct io_notif { struct ubuf_info uarg; struct io_ring_ctx *ctx; @@ -13,6 +15,8 @@ struct io_notif { u64 tag; /* see struct io_notif_slot::seq */ u32 seq; + /* hook into ctx->notif_list and ctx->notif_list_locked */ + struct list_head cache_node; union { struct callback_head task_work; @@ -41,6 +45,7 @@ struct io_notif_slot { }; int io_notif_unregister(struct io_ring_ctx *ctx); +void io_notif_cache_purge(struct io_ring_ctx *ctx); struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx, struct io_notif_slot *slot); From patchwork Tue Jul 12 20:52:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12915646 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AC2D5C43334 for ; Tue, 12 Jul 2022 20:55:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234267AbiGLUy6 (ORCPT ); Tue, 12 Jul 2022 16:54:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58056 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231556AbiGLUxw (ORCPT ); Tue, 12 Jul 2022 
16:53:52 -0400 Received: from mail-wr1-x42d.google.com (mail-wr1-x42d.google.com [IPv6:2a00:1450:4864:20::42d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ED038D0E3B; Tue, 12 Jul 2022 13:53:28 -0700 (PDT) Received: by mail-wr1-x42d.google.com with SMTP id f2so12810283wrr.6; Tue, 12 Jul 2022 13:53:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=S7QIHL9rz0f1Xw6h7UlVdKmXRdlJe5R2rqah9rB6Z1U=; b=WmTK37BhXvSOMT0wNwK2oZ6SpVo7QQt/gbQnNtPtgHH92KjlYgZJv0tz3Y50aPG+HX IlbmGmgnzmB/T0vLDIQxdYNW3nmvO/OwpNi4oj4G3HSa0RH1yldAdQtFN6gZwvFqxbD4 EMPoPQjd8Fwnvx4SiBzBT6OAc3qufkddonjOkdlfbkTnmUMyI3etiQZQtkop/RNMt+5g fOgA4xHviW6UoqGBDfrIkHsZzle6rJ93MgB9I9KaX97fOfJlHsOhWW01xZqkTGQYVuPW +IvZ+4tTOdVk93ASrEiXeM2WWeeCx/rT5avkPPlW3DXhZsxQI41OjCZ5IFgM5ZPcxczr 8hTQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=S7QIHL9rz0f1Xw6h7UlVdKmXRdlJe5R2rqah9rB6Z1U=; b=cz/pYVxkCUZUs4e7O0EZC+32tr7CZlZsFOQt0LLqwd2bfVDwsem2r/Ujhd9Gvv/sAq rGx2vttuNYGzJrxRvmXtwQb8Nx9rf8jHUGLSISt+vmrwh0mjI+aeypQY0sbnY5/XoRbc 7QmgCI2aeyoK7QUAT3N1LEm77NhNlIP4bPnyErRFjyDssFTgLgDWzKE2O31hrp6aiDog BZyc/MsgV+3WnORrhRcCbC3v8O/YvBpqQL7hN688wjELqq18rm1gHfbXbCW86hHWKN5E ZRL3NsqzolW5b9Ly5wet+1Khp0ACG7k3RuxG8lwACWVnxcYdd2/5sUs8I8Qj5UnzNsz2 tqrA== X-Gm-Message-State: AJIora/0FPU0ALntquVX4aSlnVPTsZuNU2PNvCIeG57L/wA5WR4oJigb m/8PGiBvH/IRKadKGxpcJ3ytK5umsPg= X-Google-Smtp-Source: AGRyM1seF52c451hR69uLK5grok3sZWkfdwfn2W1+GceSC1Rz0Yx94N46AV5vP36NL/vysqrWr9Gbw== X-Received: by 2002:a5d:588b:0:b0:21d:a918:65a5 with SMTP id n11-20020a5d588b000000b0021da91865a5mr10607375wrf.210.1657659206728; Tue, 12 Jul 2022 13:53:26 -0700 (PDT) Received: from 127.0.0.1localhost (188.28.125.106.threembb.co.uk. 
[188.28.125.106]) by smtp.gmail.com with ESMTPSA id c14-20020a7bc00e000000b003a044fe7fe7sm89833wmb.9.2022.07.12.13.53.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Jul 2022 13:53:26 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: "David S . Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , David Ahern , kernel-team@fb.com, Pavel Begunkov Subject: [PATCH net-next v5 16/27] io_uring: complete notifiers in tw Date: Tue, 12 Jul 2022 21:52:40 +0100 Message-Id: <089799ab665b10b78fdc614ae6d59fa7ef0d5f91.1657643355.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org We need a task context to post CQEs but using wq is too expensive. Try to complete notifiers using task_work and fall back to wq if fails. Signed-off-by: Pavel Begunkov --- io_uring/notif.c | 22 +++++++++++++++++++--- io_uring/notif.h | 3 +++ 2 files changed, 22 insertions(+), 3 deletions(-) diff --git a/io_uring/notif.c b/io_uring/notif.c index b257db2120b4..aec74f88fc33 100644 --- a/io_uring/notif.c +++ b/io_uring/notif.c @@ -13,6 +13,11 @@ static void __io_notif_complete_tw(struct callback_head *cb) struct io_notif *notif = container_of(cb, struct io_notif, task_work); struct io_ring_ctx *ctx = notif->ctx; + if (likely(notif->task)) { + io_put_task(notif->task, 1); + notif->task = NULL; + } + io_cq_lock(ctx); io_fill_cqe_aux(ctx, notif->tag, 0, notif->seq, true); @@ -43,6 +48,14 @@ static void io_uring_tx_zerocopy_callback(struct sk_buff *skb, if (!refcount_dec_and_test(&uarg->refcnt)) return; + + if (likely(notif->task)) { + init_task_work(¬if->task_work, __io_notif_complete_tw); + if (likely(!task_work_add(notif->task, ¬if->task_work, + TWA_SIGNAL))) + return; + } + INIT_WORK(¬if->commit_work, io_notif_complete_wq); 
queue_work(system_unbound_wq, ¬if->commit_work); } @@ -134,12 +147,15 @@ __cold int io_notif_unregister(struct io_ring_ctx *ctx) for (i = 0; i < ctx->nr_notif_slots; i++) { struct io_notif_slot *slot = &ctx->notif_slots[i]; - if (slot->notif) - io_notif_slot_flush(slot); + if (!slot->notif) + continue; + if (WARN_ON_ONCE(slot->notif->task)) + slot->notif->task = NULL; + io_notif_slot_flush(slot); } kvfree(ctx->notif_slots); ctx->notif_slots = NULL; ctx->nr_notif_slots = 0; return 0; -} \ No newline at end of file +} diff --git a/io_uring/notif.h b/io_uring/notif.h index b23c9c0515bb..23ca7620fff9 100644 --- a/io_uring/notif.h +++ b/io_uring/notif.h @@ -11,6 +11,9 @@ struct io_notif { struct ubuf_info uarg; struct io_ring_ctx *ctx; + /* complete via tw if ->task is non-NULL, fallback to wq otherwise */ + struct task_struct *task; + /* cqe->user_data, io_notif_slot::tag if not overridden */ u64 tag; /* see struct io_notif_slot::seq */ From patchwork Tue Jul 12 20:52:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12915648 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6F19FCCA483 for ; Tue, 12 Jul 2022 20:55:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233877AbiGLUzE (ORCPT ); Tue, 12 Jul 2022 16:55:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58914 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233301AbiGLUyQ (ORCPT ); Tue, 12 Jul 2022 16:54:16 -0400 Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 290C3D1391; Tue, 12 Jul 2022 13:53:30 
-0700 (PDT) Received: by mail-wr1-x436.google.com with SMTP id v14so12825724wra.5; Tue, 12 Jul 2022 13:53:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Wp0kER+7OMasisCSsReAtX19cUc/NcYCt5wUZzImOuo=; b=GYWGSGPS6YNm2dtj3iq60Pdz/K6WV2DG30z4HfrV/UTGVMjSYlgplWvh/RqSnMbXOS U5Si8G2L9Mo2NtsZ2ANr3yK0HBYlIbIyC4y4bUxGXaXYK2eFZm2nfY0ejA+q6HDpsg5j TjrIwr0+TwbKQCpankfmi9tp5nfca28udAIQQKuHhrRyy4V0zzKw0P6epsOrFv3hFa56 UuAYTOtvDhms2TtE9ojNESZtub7hPY+ShOOKessexm6ts3wfxa3wXohf6S8JXAXFyGLq jGqXV18nDgCp1gGLDolwpG3+eNAIDKQ7GKnwf7sYnq5+JhBqzGm4zc80PijnDrAvnoJB WJSQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Wp0kER+7OMasisCSsReAtX19cUc/NcYCt5wUZzImOuo=; b=D7kwf7hAivz/7snWLTW07RGHbP0xz8uRD7rMUek5wQh7xBdO7I9KX2BgKrHG38Gr9G DRCET/fauj1AhKzba/OBT747i4Y1zqZLW6mSsJjT7Ytd4JR56LPPGpN5Wt9byxrNSEpH JS/dob2bo08CVFTJzbtu6XYfDwCswhFafoQCfPNM5yLWmUpmoeoVGJZjqw81DYPVIouW Vrv/8rYZiBPYOmSGmJg2TCxOECCNvQeFtnb3bGMOnFxQD0vguRmlD0E4PpVB9oIfCMTy 28Zo2ISYCQOmSUz4/2cd//qinWZU0ve5E1I/fkofel1kmikhHb+/Y63zrg71s/LjUwwZ F/RA== X-Gm-Message-State: AJIora8CXZvdDaiPj6XTh+Dfmn1p/7GFCSLO6AYmWrB0YxXhCs/x8jdt ZLENzQNJ5XExTSaab56h4BM2RekzecY= X-Google-Smtp-Source: AGRyM1ttbGVusCjYADvlkUtJn2zh+eLy5GFazEjAG/V1u9H3yHHa4fYjl+DtXfAqC8Nswe3Ec4q4Lw== X-Received: by 2002:a05:6000:16cb:b0:21d:7b9e:d0af with SMTP id h11-20020a05600016cb00b0021d7b9ed0afmr23944052wrf.139.1657659207969; Tue, 12 Jul 2022 13:53:27 -0700 (PDT) Received: from 127.0.0.1localhost (188.28.125.106.threembb.co.uk. 
[188.28.125.106]) by smtp.gmail.com with ESMTPSA id c14-20020a7bc00e000000b003a044fe7fe7sm89833wmb.9.2022.07.12.13.53.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Jul 2022 13:53:27 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: "David S . Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , David Ahern , kernel-team@fb.com, Pavel Begunkov Subject: [PATCH net-next v5 17/27] io_uring: add rsrc referencing for notifiers Date: Tue, 12 Jul 2022 21:52:41 +0100 Message-Id: <3cd7a01d26837945b6982fa9cf15a63230f2ed4f.1657643355.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org In preparation to zerocopy sends with fixed buffers make notifiers to reference the rsrc node to protect the used fixed buffers. We can't just grab it for a send request as notifiers can likely outlive requests that used it. 
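
For illustration only, the charging pattern that this patch factors out
into io_charge_rsrc_node() can be modelled in plain C as below. This is
a single-threaded toy, not kernel code: the struct names, REF_BATCH, and
the refill behaviour (taking a whole batch from the shared counter at
once) are assumptions made for the sketch. The point is that every
holder, whether a request or a notifier, owns exactly one reference
until it drops it, so a notifier can keep the node alive after the
request is gone.

```c
#include <assert.h>

#define REF_BATCH 100	/* assumed batch size, for illustration only */

/* Stands in for the shared reference count of the rsrc node. */
struct rsrc_node {
	long refs;
};

struct ring_ctx {
	struct rsrc_node *rsrc_node;
	int rsrc_cached_refs;	/* refs already taken but not yet handed out */
};

/* Assumed refill behaviour: take a whole batch from the shared counter. */
static void refs_refill(struct ring_ctx *ctx)
{
	ctx->rsrc_cached_refs += REF_BATCH;
	ctx->rsrc_node->refs += REF_BATCH;
}

/* Same shape as io_charge_rsrc_node(): hand out one cached reference. */
static void charge_rsrc_node(struct ring_ctx *ctx)
{
	assert(ctx && ctx->rsrc_node);
	ctx->rsrc_cached_refs--;
	if (ctx->rsrc_cached_refs < 0)
		refs_refill(ctx);
}

/* A holder (request or notifier) drops its reference when it is done. */
static void put_rsrc_node(struct rsrc_node *node, long nr)
{
	node->refs -= nr;
}
```

In this model, node->refs minus ctx->rsrc_cached_refs is the number of
holders that still pin the node.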

Signed-off-by: Pavel Begunkov
---
 io_uring/notif.c |  5 +++++
 io_uring/notif.h |  1 +
 io_uring/rsrc.h  | 12 +++++++++---
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/io_uring/notif.c b/io_uring/notif.c
index aec74f88fc33..0a2e98bd74f6 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -7,10 +7,12 @@
 
 #include "io_uring.h"
 #include "notif.h"
+#include "rsrc.h"
 
 static void __io_notif_complete_tw(struct callback_head *cb)
 {
 	struct io_notif *notif = container_of(cb, struct io_notif, task_work);
+	struct io_rsrc_node *rsrc_node = notif->rsrc_node;
 	struct io_ring_ctx *ctx = notif->ctx;
 
 	if (likely(notif->task)) {
@@ -25,6 +27,7 @@ static void __io_notif_complete_tw(struct callback_head *cb)
 	ctx->notif_locked_nr++;
 	io_cq_unlock_post(ctx);
 
+	io_rsrc_put_node(rsrc_node, 1);
 	percpu_ref_put(&ctx->refs);
 }
 
@@ -119,6 +122,8 @@ struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx,
 	/* master ref owned by io_notif_slot, will be dropped on flush */
 	refcount_set(&notif->uarg.refcnt, 1);
 	percpu_ref_get(&ctx->refs);
+	notif->rsrc_node = ctx->rsrc_node;
+	io_charge_rsrc_node(ctx);
 	return notif;
 }
 
diff --git a/io_uring/notif.h b/io_uring/notif.h
index 23ca7620fff9..1dd48efb7744 100644
--- a/io_uring/notif.h
+++ b/io_uring/notif.h
@@ -10,6 +10,7 @@
 struct io_notif {
 	struct ubuf_info	uarg;
 	struct io_ring_ctx	*ctx;
+	struct io_rsrc_node	*rsrc_node;
 
 	/* complete via tw if ->task is non-NULL, fallback to wq otherwise */
 	struct task_struct	*task;
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 87f58315b247..af342fd239d0 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -135,6 +135,13 @@ static inline void io_req_put_rsrc_locked(struct io_kiocb *req,
 	}
 }
 
+static inline void io_charge_rsrc_node(struct io_ring_ctx *ctx)
+{
+	ctx->rsrc_cached_refs--;
+	if (unlikely(ctx->rsrc_cached_refs < 0))
+		io_rsrc_refs_refill(ctx);
+}
+
 static inline void io_req_set_rsrc_node(struct io_kiocb *req,
 					struct io_ring_ctx *ctx,
 					unsigned int issue_flags)
@@ -144,9 +151,8 @@ static inline void io_req_set_rsrc_node(struct io_kiocb *req,
 
 		if (!(issue_flags & IO_URING_F_UNLOCKED)) {
 			lockdep_assert_held(&ctx->uring_lock);
-			ctx->rsrc_cached_refs--;
-			if (unlikely(ctx->rsrc_cached_refs < 0))
-				io_rsrc_refs_refill(ctx);
+
+			io_charge_rsrc_node(ctx);
 		} else {
 			percpu_ref_get(&req->rsrc_node->refs);
 		}

From patchwork Tue Jul 12 20:52:42 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12915647
X-Patchwork-Delegate: kuba@kernel.org
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org,
 linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn,
 Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v5 18/27] io_uring: add notification slot registration
Date: Tue, 12 Jul 2022 21:52:42 +0100
X-Mailer: git-send-email 2.37.0
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

Let userspace register and unregister notification slots.
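
As a rough illustration of the argument layout this registration
expects, the sketch below mirrors the two uapi structs added by this
patch and fills them the way a caller might before issuing
IORING_REGISTER_NOTIFIERS. The helper names (notif_reg_prepare,
notif_reg_is_valid), the local struct copies, and MAX_NOTIF_SLOTS are
illustrative assumptions and not part of the patch; the validity check
only restates the argument checks done in io_notif_register().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the uapi structs from this patch (illustrative copies). */
struct io_uring_notification_slot {
	uint64_t tag;
	uint64_t resv[3];
};

struct io_uring_notification_register {
	uint32_t nr_slots;
	uint32_t resv;
	uint64_t resv2;
	uint64_t data;
	uint64_t resv3;
};

/* Mirrors IORING_MAX_NOTIF_SLOTS from io_uring/notif.h in this patch. */
#define MAX_NOTIF_SLOTS (1U << 10)

/* Hypothetical helper: zero everything, then assign one tag per slot. */
static void notif_reg_prepare(struct io_uring_notification_register *reg,
			      struct io_uring_notification_slot *slots,
			      uint32_t nr_slots, uint64_t first_tag)
{
	assert(reg && slots);
	memset(reg, 0, sizeof(*reg));
	memset(slots, 0, nr_slots * sizeof(*slots));
	for (uint32_t i = 0; i < nr_slots; i++)
		slots[i].tag = first_tag + i;
	reg->nr_slots = nr_slots;
	reg->data = (uint64_t)(uintptr_t)slots;
}

/* Hypothetical helper: restates the kernel-side argument checks. */
static bool notif_reg_is_valid(const struct io_uring_notification_register *reg)
{
	const struct io_uring_notification_slot *slots;

	if (!reg->nr_slots || reg->nr_slots > MAX_NOTIF_SLOTS)
		return false;
	if (reg->resv || reg->resv2 || reg->resv3)
		return false;
	slots = (const void *)(uintptr_t)reg->data;
	for (uint32_t i = 0; i < reg->nr_slots; i++)
		if (slots[i].resv[0] | slots[i].resv[1] | slots[i].resv[2])
			return false;
	return true;
}
```

A caller would then pass the register struct with nr_args set to its
size, since io_notif_register() rejects any other size with -EINVAL.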

Signed-off-by: Pavel Begunkov
---
 include/uapi/linux/io_uring.h | 17 ++++++++++++++
 io_uring/io_uring.c           |  9 ++++++++
 io_uring/notif.c              | 43 +++++++++++++++++++++++++++++++++++
 io_uring/notif.h              |  3 +++
 4 files changed, 72 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index e858dba2e6c9..f1ba8e934168 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -454,6 +454,10 @@ enum {
 	/* register a range of fixed file slots for automatic slot allocation */
 	IORING_REGISTER_FILE_ALLOC_RANGE	= 25,
 
+	/* zerocopy notification API */
+	IORING_REGISTER_NOTIFIERS		= 26,
+	IORING_UNREGISTER_NOTIFIERS		= 27,
+
 	/* this goes last */
 	IORING_REGISTER_LAST
 };
@@ -500,6 +504,19 @@ struct io_uring_rsrc_update2 {
 	__u32 resv2;
 };
 
+struct io_uring_notification_slot {
+	__u64 tag;
+	__u64 resv[3];
+};
+
+struct io_uring_notification_register {
+	__u32 nr_slots;
+	__u32 resv;
+	__u64 resv2;
+	__u64 data;
+	__u64 resv3;
+};
+
 /* Skip updating fd indexes set to this value in the fd table */
 #define IORING_REGISTER_FILES_SKIP	(-2)
 
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index bdc5a2839d94..41ef98a43d32 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3875,6 +3875,15 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 			break;
 		ret = io_register_file_alloc_range(ctx, arg);
 		break;
+	case IORING_REGISTER_NOTIFIERS:
+		ret = io_notif_register(ctx, arg, nr_args);
+		break;
+	case IORING_UNREGISTER_NOTIFIERS:
+		ret = -EINVAL;
+		if (arg || nr_args)
+			break;
+		ret = io_notif_unregister(ctx);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/io_uring/notif.c b/io_uring/notif.c
index 0a2e98bd74f6..e6d98dc208c7 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -162,5 +162,48 @@ __cold int io_notif_unregister(struct io_ring_ctx *ctx)
 	kvfree(ctx->notif_slots);
 	ctx->notif_slots = NULL;
 	ctx->nr_notif_slots = 0;
+	io_notif_cache_purge(ctx);
+	return 0;
+}
+
+__cold int io_notif_register(struct io_ring_ctx *ctx,
+			     void __user *arg, unsigned int size)
+	__must_hold(&ctx->uring_lock)
+{
+	struct io_uring_notification_slot __user *slots;
+	struct io_uring_notification_slot slot;
+	struct io_uring_notification_register reg;
+	unsigned i;
+
+	if (ctx->nr_notif_slots)
+		return -EBUSY;
+	if (size != sizeof(reg))
+		return -EINVAL;
+	if (copy_from_user(&reg, arg, sizeof(reg)))
+		return -EFAULT;
+	if (!reg.nr_slots || reg.nr_slots > IORING_MAX_NOTIF_SLOTS)
+		return -EINVAL;
+	if (reg.resv || reg.resv2 || reg.resv3)
+		return -EINVAL;
+
+	slots = u64_to_user_ptr(reg.data);
+	ctx->notif_slots = kvcalloc(reg.nr_slots, sizeof(ctx->notif_slots[0]),
+				GFP_KERNEL_ACCOUNT);
+	if (!ctx->notif_slots)
+		return -ENOMEM;
+
+	for (i = 0; i < reg.nr_slots; i++, ctx->nr_notif_slots++) {
+		struct io_notif_slot *notif_slot = &ctx->notif_slots[i];
+
+		if (copy_from_user(&slot, &slots[i], sizeof(slot))) {
+			io_notif_unregister(ctx);
+			return -EFAULT;
+		}
+		if (slot.resv[0] | slot.resv[1] | slot.resv[2]) {
+			io_notif_unregister(ctx);
+			return -EINVAL;
+		}
+		notif_slot->tag = slot.tag;
+	}
 	return 0;
 }
diff --git a/io_uring/notif.h b/io_uring/notif.h
index 1dd48efb7744..00efe164bdc4 100644
--- a/io_uring/notif.h
+++ b/io_uring/notif.h
@@ -6,6 +6,7 @@
 #include
 
 #define IO_NOTIF_SPLICE_BATCH	32
+#define IORING_MAX_NOTIF_SLOTS	(1U << 10)
 
 struct io_notif {
 	struct ubuf_info	uarg;
@@ -48,6 +49,8 @@ struct io_notif_slot {
 	u32 seq;
 };
 
+int io_notif_register(struct io_ring_ctx *ctx,
+		      void __user *arg, unsigned int size);
 int io_notif_unregister(struct io_ring_ctx *ctx);
 void io_notif_cache_purge(struct io_ring_ctx *ctx);
 

From patchwork Tue Jul 12 20:52:43 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12915649
X-Patchwork-Delegate: kuba@kernel.org
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org,
 linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn,
 Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v5 19/27] io_uring: wire send zc request type
Date: Tue, 12 Jul 2022 21:52:43 +0100
X-Mailer: git-send-email 2.37.0
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

Add a new io_uring opcode IORING_OP_SENDZC_NOTIF. The main distinction
from IORING_OP_SEND is that the user should specify a notification slot
index in sqe::notification_idx, and the buffers are safe to reuse only
once the notification used for the send has been flushed and has
completed.
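
The buffer-reuse rule above can be sketched from the application's side
as a small bookkeeping structure. Everything below is a hypothetical
userspace model: the zc_* names and the fixed-size table are invented
for illustration. The sequence handling is an inference from earlier
patches in this series, where io_alloc_notif() sets notif->seq from
slot->seq++ and __io_notif_complete_tw() posts a CQE carrying the slot
tag and that sequence number. How a slot gets flushed is not part of
this patch, so the model only records that a flush happened.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ZC_MAX_BUFS 8	/* arbitrary capacity for the sketch */

struct zc_buf {
	void *ptr;
	uint32_t seq;	/* sequence of the notification guarding this buffer */
	bool busy;
};

struct zc_slot_tracker {
	uint64_t tag;		/* tag registered for the slot */
	uint32_t cur_seq;	/* sequence the next send on this slot attaches to */
	struct zc_buf bufs[ZC_MAX_BUFS];
};

/* Record that ptr was handed to a zerocopy send on this slot. */
static bool zc_track_send(struct zc_slot_tracker *t, void *ptr)
{
	assert(t && ptr);
	for (size_t i = 0; i < ZC_MAX_BUFS; i++) {
		if (!t->bufs[i].busy) {
			t->bufs[i] = (struct zc_buf){
				.ptr = ptr, .seq = t->cur_seq, .busy = true,
			};
			return true;
		}
	}
	return false;	/* tracker full */
}

/* Call after flushing the slot: later sends attach to a new notification. */
static void zc_note_flush(struct zc_slot_tracker *t)
{
	t->cur_seq++;
}

/* Handle a notification CQE; returns how many buffers became reusable. */
static size_t zc_on_notif_cqe(struct zc_slot_tracker *t,
			      uint64_t user_data, uint32_t seq)
{
	size_t released = 0;

	if (user_data != t->tag)
		return 0;
	for (size_t i = 0; i < ZC_MAX_BUFS; i++) {
		if (t->bufs[i].busy && t->bufs[i].seq == seq) {
			t->bufs[i].busy = false;
			released++;
		}
	}
	return released;
}
```

Under these assumptions, a buffer must not be modified or freed between
zc_track_send() and the zc_on_notif_cqe() call that releases it, even if
the send request itself has already completed.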

Signed-off-by: Pavel Begunkov
---
 include/uapi/linux/io_uring.h |  5 ++
 io_uring/net.c                | 94 +++++++++++++++++++++++++++++++++++
 io_uring/net.h                |  4 ++
 io_uring/opdef.c              | 15 ++++++
 4 files changed, 118 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index f1ba8e934168..dcef9d6e7f78 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -63,6 +63,10 @@ struct io_uring_sqe {
 	union {
 		__s32	splice_fd_in;
 		__u32	file_index;
+		struct {
+			__u16	notification_idx;
+			__u16	__pad;
+		};
 	};
 	union {
 		struct {
@@ -194,6 +198,7 @@ enum io_uring_op {
 	IORING_OP_GETXATTR,
 	IORING_OP_SOCKET,
 	IORING_OP_URING_CMD,
+	IORING_OP_SENDZC_NOTIF,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
diff --git a/io_uring/net.c b/io_uring/net.c
index 2dd61fcf91d8..399267e8f1ef 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -13,6 +13,7 @@
 #include "io_uring.h"
 #include "kbuf.h"
 #include "net.h"
+#include "notif.h"
 
 #if defined(CONFIG_NET)
 struct io_shutdown {
@@ -58,6 +59,15 @@ struct io_sr_msg {
 	unsigned int			flags;
 };
 
+struct io_sendzc {
+	struct file			*file;
+	void __user			*buf;
+	size_t				len;
+	u16				slot_idx;
+	unsigned			msg_flags;
+	unsigned			flags;
+};
+
 #define IO_APOLL_MULTI_POLLED (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED)
 
 int io_shutdown_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
@@ -652,6 +662,90 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 	return ret;
 }
 
+int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+	struct io_sendzc *zc = io_kiocb_to_cmd(req);
+
+	if (READ_ONCE(sqe->addr2) || READ_ONCE(sqe->__pad2[0]) ||
+	    READ_ONCE(sqe->addr3))
+		return -EINVAL;
+
+	zc->flags = READ_ONCE(sqe->ioprio);
+	if (zc->flags & ~IORING_RECVSEND_POLL_FIRST)
+		return -EINVAL;
+
+	zc->buf = u64_to_user_ptr(READ_ONCE(sqe->addr));
+	zc->len = READ_ONCE(sqe->len);
+	zc->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
+	zc->slot_idx = READ_ONCE(sqe->notification_idx);
+	if (zc->msg_flags & MSG_DONTWAIT)
+		req->flags |= REQ_F_NOWAIT;
+#ifdef CONFIG_COMPAT
+	if (req->ctx->compat)
+		zc->msg_flags |= MSG_CMSG_COMPAT;
+#endif
+	return 0;
+}
+
+int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+	struct io_sendzc *zc = io_kiocb_to_cmd(req);
+	struct io_notif_slot *notif_slot;
+	struct io_notif *notif;
+	struct msghdr msg;
+	struct iovec iov;
+	struct socket *sock;
+	unsigned msg_flags;
+	int ret, min_ret = 0;
+
+	if (!(req->flags & REQ_F_POLLED) &&
+	    (zc->flags & IORING_RECVSEND_POLL_FIRST))
+		return -EAGAIN;
+
+	if (issue_flags & IO_URING_F_UNLOCKED)
+		return -EAGAIN;
+	sock = sock_from_file(req->file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
+
+	notif_slot = io_get_notif_slot(ctx, zc->slot_idx);
+	if (!notif_slot)
+		return -EINVAL;
+	notif = io_get_notif(ctx, notif_slot);
+	if (!notif)
+		return -ENOMEM;
+
+	msg.msg_name = NULL;
+	msg.msg_control = NULL;
+	msg.msg_controllen = 0;
+	msg.msg_namelen = 0;
+
+	ret = import_single_range(WRITE, zc->buf, zc->len, &iov, &msg.msg_iter);
+	if (unlikely(ret))
+		return ret;
+
+	msg_flags = zc->msg_flags | MSG_ZEROCOPY;
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		msg_flags |= MSG_DONTWAIT;
+	if (msg_flags & MSG_WAITALL)
+		min_ret = iov_iter_count(&msg.msg_iter);
+
+	msg.msg_flags = msg_flags;
+	msg.msg_ubuf = &notif->uarg;
+	msg.sg_from_iter = NULL;
+	ret = sock_sendmsg(sock, &msg);
+
+	if (unlikely(ret < min_ret)) {
+		if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
+			return -EAGAIN;
+		return ret == -ERESTARTSYS ? -EINTR : ret;
+	}
+
+	io_req_set_res(req, ret, 0);
+	return IOU_OK;
+}
+
 int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_accept *accept = io_kiocb_to_cmd(req);
diff --git a/io_uring/net.h b/io_uring/net.h
index 81d71d164770..1dba8befebb3 100644
--- a/io_uring/net.h
+++ b/io_uring/net.h
@@ -40,4 +40,8 @@ int io_socket(struct io_kiocb *req, unsigned int issue_flags);
 int io_connect_prep_async(struct io_kiocb *req);
 int io_connect_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_connect(struct io_kiocb *req, unsigned int issue_flags);
+
+int io_sendzc(struct io_kiocb *req, unsigned int issue_flags);
+int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+
 #endif
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index a7b84b43e6c2..7ab19bbf3126 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -470,6 +470,21 @@ const struct io_op_def io_op_defs[] = {
 		.issue			= io_uring_cmd,
 		.prep_async		= io_uring_cmd_prep_async,
 	},
+	[IORING_OP_SENDZC_NOTIF] = {
+		.name			= "SENDZC_NOTIF",
+		.needs_file		= 1,
+		.unbound_nonreg_file	= 1,
+		.pollout		= 1,
+		.audit_skip		= 1,
+		.ioprio			= 1,
+#if defined(CONFIG_NET)
+		.prep			= io_sendzc_prep,
+		.issue			= io_sendzc,
+#else
+		.prep			= io_eopnotsupp_prep,
+#endif
+
+	},
 };
 
 const char *io_uring_get_opcode(u8 opcode)

From patchwork Tue Jul 12 20:52:44 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12915651
X-Patchwork-Delegate: kuba@kernel.org
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org,
 linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn,
 Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v5 20/27] io_uring: account locked pages for non-fixed zc
Date: Tue, 12 Jul 2022 21:52:44 +0100
Message-Id: <19b6e3975440f59f1f6199c7ee7acf977b4eecdc.1657643355.git.asml.silence@gmail.com>
X-Mailer: git-send-email 2.37.0
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

Fixed buffers are accounted against RLIMIT_MEMLOCK, but that accounting
doesn't cover iovec-based zerocopy sends. Do the accounting on the
io_uring side.
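
To make the accounting idea concrete, here is a small userspace model of
charging pinned pages against a limit and releasing them later. It is
only loosely modelled on what a helper such as
mm_account_pinned_pages() has to do. The function names, the parameters,
the "+ 2" worst-case page estimate, and the -ENOBUFS return are
assumptions for the sketch rather than something this patch shows. The
release side corresponds to the atomic_long_sub() on locked_vm that the
patch adds to __io_notif_complete_tw().

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical helper: charge the pages covering `size` bytes against a
 * shared counter, failing if the limit would be exceeded.
 */
static int account_pinned_pages(atomic_long *locked_vm, size_t size,
				long page_size, long limit_bytes,
				long *num_pg_out)
{
	long num_pg, max_pg, old_pg, new_pg;

	assert(locked_vm && num_pg_out && page_size > 0);

	/* Assumed worst case: the buffer may straddle two extra pages. */
	num_pg = (long)(size / (size_t)page_size) + 2;
	max_pg = limit_bytes / page_size;

	old_pg = atomic_load(locked_vm);
	do {
		new_pg = old_pg + num_pg;
		if (new_pg > max_pg)
			return -ENOBUFS;
		/* On failure old_pg is reloaded and the check is retried. */
	} while (!atomic_compare_exchange_weak(locked_vm, &old_pg, new_pg));

	*num_pg_out = num_pg;
	return 0;
}

/* Hypothetical helper: undo a previous successful charge. */
static void unaccount_pinned_pages(atomic_long *locked_vm, long num_pg)
{
	atomic_fetch_sub(locked_vm, num_pg);
}
```

The compare-and-swap loop keeps the limit check and the update atomic
without taking a lock, which matters when several sends charge the same
user concurrently.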
Signed-off-by: Pavel Begunkov --- io_uring/net.c | 1 + io_uring/notif.c | 6 ++++++ 2 files changed, 7 insertions(+) diff --git a/io_uring/net.c b/io_uring/net.c index 399267e8f1ef..69273d4f4ef0 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -724,6 +724,7 @@ int io_sendzc(struct io_kiocb *req, unsigned int issue_flags) ret = import_single_range(WRITE, zc->buf, zc->len, &iov, &msg.msg_iter); if (unlikely(ret)) return ret; + mm_account_pinned_pages(¬if->uarg.mmp, zc->len); msg_flags = zc->msg_flags | MSG_ZEROCOPY; if (issue_flags & IO_URING_F_NONBLOCK) diff --git a/io_uring/notif.c b/io_uring/notif.c index e6d98dc208c7..c5179e5c1cd6 100644 --- a/io_uring/notif.c +++ b/io_uring/notif.c @@ -14,7 +14,13 @@ static void __io_notif_complete_tw(struct callback_head *cb) struct io_notif *notif = container_of(cb, struct io_notif, task_work); struct io_rsrc_node *rsrc_node = notif->rsrc_node; struct io_ring_ctx *ctx = notif->ctx; + struct mmpin *mmp = ¬if->uarg.mmp; + if (mmp->user) { + atomic_long_sub(mmp->num_pg, &mmp->user->locked_vm); + free_uid(mmp->user); + mmp->user = NULL; + } if (likely(notif->task)) { io_put_task(notif->task, 1); notif->task = NULL; From patchwork Tue Jul 12 20:52:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12915650 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 66857C43334 for ; Tue, 12 Jul 2022 20:55:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234167AbiGLUzn (ORCPT ); Tue, 12 Jul 2022 16:55:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57474 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234096AbiGLUyh (ORCPT 
); Tue, 12 Jul 2022 16:54:37 -0400 Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8A75E6567; Tue, 12 Jul 2022 13:53:35 -0700 (PDT) Received: by mail-wr1-x42b.google.com with SMTP id v14so12825946wra.5; Tue, 12 Jul 2022 13:53:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=4OVOf9IDPubhHkoLu+U0v1fqovo6hHLbrezmSTxRVeo=; b=GMi9+1Jdab22yW+S6P6gEbiOQ6kCcVOpmm67nF4awMz862hR+G9cxEvb/lNgYhNmOh rK3aBmGd6NorGC2R2StFFp3GaPQ3WkjtxlkF3q4GutFUDPqj6BdhjfjSeKwBQClbbMtP /FJWw6ajKWB4YKo9DZDgdyZiPKpGi4+g9LCFwbQoBR7yuiOLo1AvIgbVP0P9c09SPFGh Gnv62nbGBEzG04ZBsb3iY/7Z+310AwcE9vBbFemBnSpJugIAdBS28G7xD0GfBGordeXy wO7RjlmBh9UY0DVFhGPOolS2MswaTXSUtgpFYJKbRcAL74GKZpcLTGlwYseJVa4VfPoV mBFA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=4OVOf9IDPubhHkoLu+U0v1fqovo6hHLbrezmSTxRVeo=; b=m3PKGWz0fCq37NzgSNB/Nn8BTe/zqCUrSRaHTLdsRP9tH/yVUcC4AjY3KQPq11KVYI HdNuB6+Lmhuqr5/gcFJfyUPwqMMZD7O2FcV3sogt5MY9/wWTdL2HVK/2sCp5//5NthpW 98StNT4g1Wxr2hRknOsvBwsNYKXqEHkgL262SygFaxmAh/MIxK06/Om/9j2yagIAc/th 7KviPRdCUxHNWshKa64uB0ODMM2eJS2dar7hs2MryD7r6ugI8SqMYVo9+JDVQJJcSRt4 vjdAaYdwWv/P48eRjGhrwdE4BgcXE/2W5ogXSSifaX91GpmnTyovVTix1RynxWa12bYI lI+w== X-Gm-Message-State: AJIora/wDUbSWoo6A5ofdupEZnmb7sZlAG1DDWcog76FTcAJBEK0F/9D OpqpgH4LHJfgwe5jlMtgckN7SyOcgVg= X-Google-Smtp-Source: AGRyM1tLEiIMPU+NYy92WNx2lvrBRXuNzrH49l6b4b67UuFqmQWKjtm9PIXZTy+Fq+MZbEE7DvpqwQ== X-Received: by 2002:a5d:6da5:0:b0:21d:9275:4de0 with SMTP id u5-20020a5d6da5000000b0021d92754de0mr22740683wrs.670.1657659213079; Tue, 12 Jul 2022 13:53:33 -0700 (PDT) Received: from 127.0.0.1localhost (188.28.125.106.threembb.co.uk. 
[188.28.125.106]) by smtp.gmail.com with ESMTPSA id c14-20020a7bc00e000000b003a044fe7fe7sm89833wmb.9.2022.07.12.13.53.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Jul 2022 13:53:32 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: "David S . Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , David Ahern , kernel-team@fb.com, Pavel Begunkov Subject: [PATCH net-next v5 21/27] io_uring: allow to pass addr into sendzc Date: Tue, 12 Jul 2022 21:52:45 +0100 Message-Id: <70417a8f7c5b51ab454690bae08adc0c187f89e8.1657643355.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Allow specifying a destination address for zerocopy sends, making them more like sendto(2). Signed-off-by: Pavel Begunkov --- include/uapi/linux/io_uring.h | 2 +- io_uring/net.c | 18 ++++++++++++++++-- 2 files changed, 17 insertions(+), 3 deletions(-) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index dcef9d6e7f78..9303bf5236f7 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -65,7 +65,7 @@ struct io_uring_sqe { __u32 file_index; struct { __u16 notification_idx; - __u16 __pad; + __u16 addr_len; }; }; union { diff --git a/io_uring/net.c b/io_uring/net.c index 69273d4f4ef0..2172cf3facd8 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -66,6 +66,8 @@ struct io_sendzc { u16 slot_idx; unsigned msg_flags; unsigned flags; + unsigned addr_len; + void __user *addr; }; #define IO_APOLL_MULTI_POLLED (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED) @@ -666,8 +668,7 @@ int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { struct io_sendzc *zc = io_kiocb_to_cmd(req); - if (READ_ONCE(sqe->addr2) || READ_ONCE(sqe->__pad2[0]) || - READ_ONCE(sqe->addr3)) + if (READ_ONCE(sqe->__pad2[0]) || 
READ_ONCE(sqe->addr3)) return -EINVAL; zc->flags = READ_ONCE(sqe->ioprio); @@ -680,6 +681,10 @@ int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) zc->slot_idx = READ_ONCE(sqe->notification_idx); if (zc->msg_flags & MSG_DONTWAIT) req->flags |= REQ_F_NOWAIT; + + zc->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2)); + zc->addr_len = READ_ONCE(sqe->addr_len); + #ifdef CONFIG_COMPAT if (req->ctx->compat) zc->msg_flags |= MSG_CMSG_COMPAT; @@ -689,6 +694,7 @@ int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) int io_sendzc(struct io_kiocb *req, unsigned int issue_flags) { + struct sockaddr_storage address; struct io_ring_ctx *ctx = req->ctx; struct io_sendzc *zc = io_kiocb_to_cmd(req); struct io_notif_slot *notif_slot; @@ -726,6 +732,14 @@ int io_sendzc(struct io_kiocb *req, unsigned int issue_flags) return ret; mm_account_pinned_pages(&notif->uarg.mmp, zc->len); + if (zc->addr) { + ret = move_addr_to_kernel(zc->addr, zc->addr_len, &address); + if (unlikely(ret < 0)) + return ret; + msg.msg_name = (struct sockaddr *)&address; + msg.msg_namelen = zc->addr_len; + } + msg_flags = zc->msg_flags | MSG_ZEROCOPY; if (issue_flags & IO_URING_F_NONBLOCK) msg_flags |= MSG_DONTWAIT; From patchwork Tue Jul 12 20:52:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12915652 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9E9E5CCA47F for ; Tue, 12 Jul 2022 20:55:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234351AbiGLUzq (ORCPT ); Tue, 12 Jul 2022 16:55:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57884 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by 
vger.kernel.org with ESMTP id S234117AbiGLUyh (ORCPT ); Tue, 12 Jul 2022 16:54:37 -0400 Received: from mail-wr1-x430.google.com (mail-wr1-x430.google.com [IPv6:2a00:1450:4864:20::430]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 78B4ABE04; Tue, 12 Jul 2022 13:53:36 -0700 (PDT) Received: by mail-wr1-x430.google.com with SMTP id h17so12873849wrx.0; Tue, 12 Jul 2022 13:53:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=f+FHjKfi2Y2wFGJfFsXW/4+JIVb8vNyaCizWsPOe68o=; b=MjxNJFqouObPHfpi8IsDm/+O6nzEcjAIdJ+Fha0uNZ13V0q9H1ytmdFC0fKg1WmIKp TEzghd9lE5wZllrdF3EA2agJw3M8v9Rbqu1gzKbAvtVbKSKCSFHwT2tyN5ClMx440+k9 72H184I4b0fRQQ9BCQIf2YjqhI4VKnMdhFkEivQxXHPoXWe6ysqjnSIGEAXAJloYPzu9 4WTO1fqNkAXD7rngWOTHgxlcJdjdQjkAIRvDw1pqfqUdbk41nllVECOcIpQDXbBxLh2X jYHd6PqmpKMCJb3utCcp7w50P57l5VOQ/EEx/+GpGbxY03+Y5/Fs/33O+yfuWs+BaJ2v Yl7A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=f+FHjKfi2Y2wFGJfFsXW/4+JIVb8vNyaCizWsPOe68o=; b=pB0RMx9kOBrJLeweUHHuMf7LNvrOIYyeKW5iOfI+1Uq77aHTe/ZOHTDTp0YHUFUhzK 4H+kqleTiDe5xZmrBu+2ZUhRstOdnyDrgWldJjDGCfcajUHvWwvtguhuWyRCXp2W4w3u xQhp/tZ4LaR4/46qL8m6NTXBkPZFvOqcuaiyh5hav+jy+j1lkOzCP5GLEc4DFf+wncB7 bEwQ11kZj3iTDeaB2TU7zbz6Qc5hbpNZ97DcbX/UDY7Z3UmIkCPtrfv0B3TNfH3DIqcG uk4quGrSR5opB781KR6St3Ghu4KAqrjUEYA/SX71QdTdPZCVjC2KNJshk0/ViWkuSJ/a FjXw== X-Gm-Message-State: AJIora+uH9jfWOHelhHuyTQkNACyKKyRyw2SL5hDpOAaUnGZFQf5VDBc a/x3mhZBMYf1A3y6sxR6OOPwI66Da4c= X-Google-Smtp-Source: AGRyM1v+7CNGTSY42yWM8UiG0FT052q4/NahMplXT7LsfGwqDMQUdIDVukiakVV0nxw0QMBMhxZQlA== X-Received: by 2002:adf:f1ca:0:b0:21d:5eec:1320 with SMTP id z10-20020adff1ca000000b0021d5eec1320mr24762908wro.196.1657659214324; Tue, 12 Jul 2022 13:53:34 -0700 (PDT) Received: from 
127.0.0.1localhost (188.28.125.106.threembb.co.uk. [188.28.125.106]) by smtp.gmail.com with ESMTPSA id c14-20020a7bc00e000000b003a044fe7fe7sm89833wmb.9.2022.07.12.13.53.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Jul 2022 13:53:33 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: "David S . Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , David Ahern , kernel-team@fb.com, Pavel Begunkov Subject: [PATCH net-next v5 22/27] io_uring: sendzc with fixed buffers Date: Tue, 12 Jul 2022 21:52:46 +0100 Message-Id: X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Allow zerocopy sends to use fixed buffers. There is an optimisation for this case: the network layer doesn't need to reference the pages (see SKBFL_MANAGED_FRAG_REFS), so io_uring has to ensure the fixed buffers stay valid until the notifier is released. Signed-off-by: Pavel Begunkov --- include/uapi/linux/io_uring.h | 6 +++++- io_uring/net.c | 29 ++++++++++++++++++++++++----- 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 9303bf5236f7..3f2305bc5c79 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -269,9 +269,13 @@ enum io_uring_op { * IORING_RECV_MULTISHOT Multishot recv. Sets IORING_CQE_F_MORE if * the handler will continue to report * CQEs on behalf of the same SQE. + * + * IORING_RECVSEND_FIXED_BUF Use registered buffers, the index is stored in + * the buf_index field. 
*/ #define IORING_RECVSEND_POLL_FIRST (1U << 0) -#define IORING_RECV_MULTISHOT (1U << 1) +#define IORING_RECV_MULTISHOT (1U << 1) +#define IORING_RECVSEND_FIXED_BUF (1U << 2) /* * accept flags stored in sqe->ioprio diff --git a/io_uring/net.c b/io_uring/net.c index 2172cf3facd8..0259fbbad591 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -14,6 +14,7 @@ #include "kbuf.h" #include "net.h" #include "notif.h" +#include "rsrc.h" #if defined(CONFIG_NET) struct io_shutdown { @@ -667,13 +668,23 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags) int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { struct io_sendzc *zc = io_kiocb_to_cmd(req); + struct io_ring_ctx *ctx = req->ctx; if (READ_ONCE(sqe->__pad2[0]) || READ_ONCE(sqe->addr3)) return -EINVAL; zc->flags = READ_ONCE(sqe->ioprio); - if (zc->flags & ~IORING_RECVSEND_POLL_FIRST) + if (zc->flags & ~(IORING_RECVSEND_POLL_FIRST | IORING_RECVSEND_FIXED_BUF)) return -EINVAL; + if (zc->flags & IORING_RECVSEND_FIXED_BUF) { + unsigned idx = READ_ONCE(sqe->buf_index); + + if (unlikely(idx >= ctx->nr_user_bufs)) + return -EFAULT; + idx = array_index_nospec(idx, ctx->nr_user_bufs); + req->imu = READ_ONCE(ctx->user_bufs[idx]); + io_req_set_rsrc_node(req, ctx, 0); + } zc->buf = u64_to_user_ptr(READ_ONCE(sqe->addr)); zc->len = READ_ONCE(sqe->len); @@ -727,10 +738,18 @@ int io_sendzc(struct io_kiocb *req, unsigned int issue_flags) msg.msg_controllen = 0; msg.msg_namelen = 0; - ret = import_single_range(WRITE, zc->buf, zc->len, &iov, &msg.msg_iter); - if (unlikely(ret)) - return ret; - mm_account_pinned_pages(&notif->uarg.mmp, zc->len); + if (zc->flags & IORING_RECVSEND_FIXED_BUF) { + ret = io_import_fixed(WRITE, &msg.msg_iter, req->imu, + (u64)zc->buf, zc->len); + if (unlikely(ret)) + return ret; + } else { + ret = import_single_range(WRITE, zc->buf, zc->len, &iov, + &msg.msg_iter); + if (unlikely(ret)) + return ret; + mm_account_pinned_pages(&notif->uarg.mmp, zc->len); + } if (zc->addr) { ret = 
move_addr_to_kernel(zc->addr, zc->addr_len, &address); From patchwork Tue Jul 12 20:52:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12915653 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C9CC6C43334 for ; Tue, 12 Jul 2022 20:55:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232476AbiGLUzs (ORCPT ); Tue, 12 Jul 2022 16:55:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57696 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233470AbiGLUyv (ORCPT ); Tue, 12 Jul 2022 16:54:51 -0400 Received: from mail-wm1-x32d.google.com (mail-wm1-x32d.google.com [IPv6:2a00:1450:4864:20::32d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D6DBE1A3BA; Tue, 12 Jul 2022 13:53:38 -0700 (PDT) Received: by mail-wm1-x32d.google.com with SMTP id bi22-20020a05600c3d9600b003a04de22ab6so93929wmb.1; Tue, 12 Jul 2022 13:53:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=w7fyAuQ2khz2WbM6wASTr+55E3PKzSpYhdOm4HZ0ElA=; b=Yh5bnQBKwBQchTgTJkG71t6gslEQC+4LA0Y0cuOMkaNQZZjHQNIdug++72K5BOXMRB 9rvcB4xK6Qyp4HuPnF46aKvxtcsYIo/kPrp6GOZtGvRXhBVKIy1qNfPiKsgLqlqTVop4 EZAvl5ye7f/FxWyXixAtSP6P0DMoHrmxUKQ0jKeaTquu7BdBaNCS4YQStKPdpQEMGeXO jglnrOkPdbEA5/XdEFvSCPQ9fBv3CDvvfzP5dpAy5bYIVz3B2p+IBJUyhoIVUkxABALt Gn4XulJT6LWess6N89owEmwZgH9XoIFv2LoWPhPi+0lQOKbXWtHmRPmFZn1K5k9pW1/R h4dw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to 
:references:mime-version:content-transfer-encoding; bh=w7fyAuQ2khz2WbM6wASTr+55E3PKzSpYhdOm4HZ0ElA=; b=w7BsNhT642zk5H/n1E+XIrwlKqw40qJYEFkGn9eOjaUZNpUiood6lSltDEtd6IW1Y3 HRJ9+yCuZd5Xs0dRqZ+Qqj6dRPAglDNWaPZ3KbEuwcocdCGwx9HdYPSucRoAMJ8mFMey rNzzhYzZfiGwczTcAKwfnroyKKP55gLnQ/zpvKTJbxtdlY2gnOMFDBd6+EcNHgHYihNf aB3BQzAchR9st4u0/XZfhGQTGIEk2bASy1V7DjKHDc9PkSBK60w3/TDHsZvxPGUfseJV ry19c+KM+03uqwBnB2HFPuQ7FuE2OVlIkdDBaQ8ECHqFZcjtxY79W7UcTtVASzTbRlk0 9NNg== X-Gm-Message-State: AJIora9fYynn3Al/RID2o37kjEMHuaob3UETzgw0tRCnBxxG/y1dfdHk V3GZf/2wqMxy4SBhWIqX5lvJ1EYJZqE= X-Google-Smtp-Source: AGRyM1tpstMUUxRe+wp5Mn3k1GrbgZ7KLrunoocS6Dp+s8Cgj8OHiCKKQSY2gM9Irr1oddMpvbQH7A== X-Received: by 2002:a05:600c:1ca9:b0:3a1:887d:1567 with SMTP id k41-20020a05600c1ca900b003a1887d1567mr5947565wms.175.1657659215533; Tue, 12 Jul 2022 13:53:35 -0700 (PDT) Received: from 127.0.0.1localhost (188.28.125.106.threembb.co.uk. [188.28.125.106]) by smtp.gmail.com with ESMTPSA id c14-20020a7bc00e000000b003a044fe7fe7sm89833wmb.9.2022.07.12.13.53.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Jul 2022 13:53:35 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: "David S . Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , David Ahern , kernel-team@fb.com, Pavel Begunkov Subject: [PATCH net-next v5 23/27] io_uring: flush notifiers after sendzc Date: Tue, 12 Jul 2022 21:52:47 +0100 Message-Id: X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Allow flushing notifiers as part of a sendzc request by setting the IORING_RECVSEND_NOTIF_FLUSH flag. When the sendzc request succeeds, it flushes the used [active] notifier. 
Signed-off-by: Pavel Begunkov --- include/uapi/linux/io_uring.h | 4 ++++ io_uring/io_uring.c | 11 +---------- io_uring/io_uring.h | 10 ++++++++++ io_uring/net.c | 5 ++++- io_uring/notif.c | 2 +- io_uring/notif.h | 11 +++++++++++ 6 files changed, 31 insertions(+), 12 deletions(-) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 3f2305bc5c79..7d21fba54b62 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -272,10 +272,14 @@ enum io_uring_op { * * IORING_RECVSEND_FIXED_BUF Use registered buffers, the index is stored in * the buf_index field. + * + * IORING_RECVSEND_NOTIF_FLUSH Flush a notification after a successful + * successful. Only for zerocopy sends. */ #define IORING_RECVSEND_POLL_FIRST (1U << 0) #define IORING_RECV_MULTISHOT (1U << 1) #define IORING_RECVSEND_FIXED_BUF (1U << 2) +#define IORING_RECVSEND_NOTIF_FLUSH (1U << 3) /* * accept flags stored in sqe->ioprio diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 41ef98a43d32..e4f3a1ede2f4 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -615,7 +615,7 @@ void __io_put_task(struct task_struct *task, int nr) put_task_struct_many(task, nr); } -static void io_task_refs_refill(struct io_uring_task *tctx) +void io_task_refs_refill(struct io_uring_task *tctx) { unsigned int refill = -tctx->cached_refs + IO_TCTX_REFS_CACHE_NR; @@ -624,15 +624,6 @@ static void io_task_refs_refill(struct io_uring_task *tctx) tctx->cached_refs += refill; } -static inline void io_get_task_refs(int nr) -{ - struct io_uring_task *tctx = current->io_uring; - - tctx->cached_refs -= nr; - if (unlikely(tctx->cached_refs < 0)) - io_task_refs_refill(tctx); -} - static __cold void io_uring_drop_tctx_refs(struct task_struct *task) { struct io_uring_task *tctx = task->io_uring; diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index b8c858727dc8..d9f2f5c71481 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -69,6 +69,7 @@ void 
io_wq_submit_work(struct io_wq_work *work); void io_free_req(struct io_kiocb *req); void io_queue_next(struct io_kiocb *req); void __io_put_task(struct task_struct *task, int nr); +void io_task_refs_refill(struct io_uring_task *tctx); bool io_match_task_safe(struct io_kiocb *head, struct task_struct *task, bool cancel_all); @@ -265,4 +266,13 @@ static inline void io_put_task(struct task_struct *task, int nr) __io_put_task(task, nr); } +static inline void io_get_task_refs(int nr) +{ + struct io_uring_task *tctx = current->io_uring; + + tctx->cached_refs -= nr; + if (unlikely(tctx->cached_refs < 0)) + io_task_refs_refill(tctx); +} + #endif diff --git a/io_uring/net.c b/io_uring/net.c index 0259fbbad591..bf9916d5e50c 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -674,7 +674,8 @@ int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) return -EINVAL; zc->flags = READ_ONCE(sqe->ioprio); - if (zc->flags & ~(IORING_RECVSEND_POLL_FIRST | IORING_RECVSEND_FIXED_BUF)) + if (zc->flags & ~(IORING_RECVSEND_POLL_FIRST | + IORING_RECVSEND_FIXED_BUF | IORING_RECVSEND_NOTIF_FLUSH)) return -EINVAL; if (zc->flags & IORING_RECVSEND_FIXED_BUF) { unsigned idx = READ_ONCE(sqe->buf_index); @@ -776,6 +777,8 @@ int io_sendzc(struct io_kiocb *req, unsigned int issue_flags) return ret == -ERESTARTSYS ? 
-EINTR : ret; } + if (zc->flags & IORING_RECVSEND_NOTIF_FLUSH) + io_notif_slot_flush_submit(notif_slot, 0); io_req_set_res(req, ret, 0); return IOU_OK; } diff --git a/io_uring/notif.c b/io_uring/notif.c index c5179e5c1cd6..a93887451bbb 100644 --- a/io_uring/notif.c +++ b/io_uring/notif.c @@ -133,7 +133,7 @@ struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx, return notif; } -static void io_notif_slot_flush(struct io_notif_slot *slot) +void io_notif_slot_flush(struct io_notif_slot *slot) __must_hold(&ctx->uring_lock) { struct io_notif *notif = slot->notif; diff --git a/io_uring/notif.h b/io_uring/notif.h index 00efe164bdc4..6cd73d7b965b 100644 --- a/io_uring/notif.h +++ b/io_uring/notif.h @@ -54,6 +54,7 @@ int io_notif_register(struct io_ring_ctx *ctx, int io_notif_unregister(struct io_ring_ctx *ctx); void io_notif_cache_purge(struct io_ring_ctx *ctx); +void io_notif_slot_flush(struct io_notif_slot *slot); struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx, struct io_notif_slot *slot); @@ -74,3 +75,13 @@ static inline struct io_notif_slot *io_get_notif_slot(struct io_ring_ctx *ctx, idx = array_index_nospec(idx, ctx->nr_notif_slots); return &ctx->notif_slots[idx]; } + +static inline void io_notif_slot_flush_submit(struct io_notif_slot *slot, + unsigned int issue_flags) +{ + if (!(issue_flags & IO_URING_F_UNLOCKED)) { + slot->notif->task = current; + io_get_task_refs(1); + } + io_notif_slot_flush(slot); +} From patchwork Tue Jul 12 20:52:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12915654 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A7A8ACCA47F for ; Tue, 12 Jul 2022 20:55:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by 
vger.kernel.org via listexpand id S231857AbiGLUzt (ORCPT ); Tue, 12 Jul 2022 16:55:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57704 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234136AbiGLUyx (ORCPT ); Tue, 12 Jul 2022 16:54:53 -0400 Received: from mail-wr1-x42f.google.com (mail-wr1-x42f.google.com [IPv6:2a00:1450:4864:20::42f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9C938275F0; Tue, 12 Jul 2022 13:53:40 -0700 (PDT) Received: by mail-wr1-x42f.google.com with SMTP id bk26so12796424wrb.11; Tue, 12 Jul 2022 13:53:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=ZyhUoseN7KcL9h32+uJElLshw0im2xlB0TZ9Ykk2TuM=; b=K/W12bdjjB492Vw/NxDsoG9XLKOA0bhbPjHXqLPwZwbgYiNIHsr2dMt29RR5G4IrK3 BZsnR+vs4toRkZ3scnvhn8pOVPz9V/fROfYiiaAs/dprOGshnmWYHbSRSc3GLL7Oki0F RVjL3hx5Ru7f/GK/1nxuezIbB5sLtvN2tg+ZQzYEzq2Ne1z0IXjLz36kkLUaOg9dIm1l bpePGHId/KZ2/QsgSCOzxxXhi1KQW3nfrPQhw/7DECfWqwOnomgjZEfseM6Z57/z4LYy gXzvy8SgT7HJuVLk/lOOh+EUJ4DYn/jcLA+SGRP54yRUAFVQ61eKs3VJ9qjK/kPD3IdG 0+yQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=ZyhUoseN7KcL9h32+uJElLshw0im2xlB0TZ9Ykk2TuM=; b=sQ4qXfVYLpLoAEpSkrZFvf+HZ+/S1ivuEd4wZFSQjiB+e1ik2ALY9dw+Mlo3ZKixK+ eE8HyA64d9i4EaUH8MwltJVYjdg8I7Ne7G8CCIA/JJ1XAMmFCcTl5OJ0z2r8DBYBJrcP 3PcJNR1kp771d5n49Ukgwl5/L2NdYxxTCS/lsRp9uq7luSsHVpQAoKIUXW72px6AK7f7 ONQNQQp/g4creRfnfCdPRujL/c39Y31F9UnTIHFitCOkH/HwnqohJCoC+1wCjF9Csyzr AxzXBrmkNEVlVTQFOsCWk+tQjd5eDHarW6PAOPE+4HeY0sGHhLiq3Qvzr8yrQNx1NhlS YUAQ== X-Gm-Message-State: AJIora/IGeW0KtF37WgxrLaGNk+mFALGiIwyYmKaVM11SXPF4kSnRjvI OV2nI9gLCkrLEch68eyWn4EQtd5J4yM= X-Google-Smtp-Source: 
AGRyM1tv8MryVUOtChYGi7b4XRYgX+c81MKmCTna0ZFQ5ergTSrx6kVXSMRpgh54YLpGYE86Zkjp8g== X-Received: by 2002:adf:eccb:0:b0:21d:7b41:22c7 with SMTP id s11-20020adfeccb000000b0021d7b4122c7mr22294366wro.543.1657659216773; Tue, 12 Jul 2022 13:53:36 -0700 (PDT) Received: from 127.0.0.1localhost (188.28.125.106.threembb.co.uk. [188.28.125.106]) by smtp.gmail.com with ESMTPSA id c14-20020a7bc00e000000b003a044fe7fe7sm89833wmb.9.2022.07.12.13.53.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Jul 2022 13:53:36 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: "David S . Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , David Ahern , kernel-team@fb.com, Pavel Begunkov Subject: [PATCH net-next v5 24/27] io_uring: rename IORING_OP_FILES_UPDATE Date: Tue, 12 Jul 2022 21:52:48 +0100 Message-Id: <0a907133907d9af3415a8a7aa1802c6aa97c03c6.1657643355.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org IORING_OP_FILES_UPDATE will be a more generic opcode serving different resource types, rename it into IORING_OP_RSRC_UPDATE and add subtype handling. 
Signed-off-by: Pavel Begunkov --- include/uapi/linux/io_uring.h | 12 +++++++++++- io_uring/opdef.c | 9 +++++---- io_uring/rsrc.c | 17 +++++++++++++++-- io_uring/rsrc.h | 4 ++-- 4 files changed, 33 insertions(+), 9 deletions(-) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 7d21fba54b62..37e8c104d31f 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -171,7 +171,8 @@ enum io_uring_op { IORING_OP_FALLOCATE, IORING_OP_OPENAT, IORING_OP_CLOSE, - IORING_OP_FILES_UPDATE, + IORING_OP_RSRC_UPDATE, + IORING_OP_FILES_UPDATE = IORING_OP_RSRC_UPDATE, IORING_OP_STATX, IORING_OP_READ, IORING_OP_WRITE, @@ -220,6 +221,7 @@ enum io_uring_op { #define IORING_TIMEOUT_ETIME_SUCCESS (1U << 5) #define IORING_TIMEOUT_CLOCK_MASK (IORING_TIMEOUT_BOOTTIME | IORING_TIMEOUT_REALTIME) #define IORING_TIMEOUT_UPDATE_MASK (IORING_TIMEOUT_UPDATE | IORING_LINK_TIMEOUT_UPDATE) + /* * sqe->splice_flags * extends splice(2) flags @@ -286,6 +288,14 @@ enum io_uring_op { */ #define IORING_ACCEPT_MULTISHOT (1U << 0) + +/* + * IORING_OP_RSRC_UPDATE flags + */ +enum { + IORING_RSRC_UPDATE_FILES, +}; + /* * IORING_OP_MSG_RING command types, stored in sqe->addr */ diff --git a/io_uring/opdef.c b/io_uring/opdef.c index 7ab19bbf3126..72dd2b2d8a9d 100644 --- a/io_uring/opdef.c +++ b/io_uring/opdef.c @@ -246,12 +246,13 @@ const struct io_op_def io_op_defs[] = { .prep = io_close_prep, .issue = io_close, }, - [IORING_OP_FILES_UPDATE] = { + [IORING_OP_RSRC_UPDATE] = { .audit_skip = 1, .iopoll = 1, - .name = "FILES_UPDATE", - .prep = io_files_update_prep, - .issue = io_files_update, + .name = "RSRC_UPDATE", + .prep = io_rsrc_update_prep, + .issue = io_rsrc_update, + .ioprio = 1, }, [IORING_OP_STATX] = { .audit_skip = 1, diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c index 1182cf0ea1fc..98ce8a93a816 100644 --- a/io_uring/rsrc.c +++ b/io_uring/rsrc.c @@ -21,6 +21,7 @@ struct io_rsrc_update { u64 arg; u32 nr_args; u32 offset; + int type; }; static int 
io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov, @@ -658,7 +659,7 @@ __cold int io_register_rsrc(struct io_ring_ctx *ctx, void __user *arg, return -EINVAL; } -int io_files_update_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) +int io_rsrc_update_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { struct io_rsrc_update *up = io_kiocb_to_cmd(req); @@ -672,6 +673,7 @@ int io_files_update_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) if (!up->nr_args) return -EINVAL; up->arg = READ_ONCE(sqe->addr); + up->type = READ_ONCE(sqe->ioprio); return 0; } @@ -711,7 +713,7 @@ static int io_files_update_with_index_alloc(struct io_kiocb *req, return ret; } -int io_files_update(struct io_kiocb *req, unsigned int issue_flags) +static int io_files_update(struct io_kiocb *req, unsigned int issue_flags) { struct io_rsrc_update *up = io_kiocb_to_cmd(req); struct io_ring_ctx *ctx = req->ctx; @@ -740,6 +742,17 @@ int io_files_update(struct io_kiocb *req, unsigned int issue_flags) return IOU_OK; } +int io_rsrc_update(struct io_kiocb *req, unsigned int issue_flags) +{ + struct io_rsrc_update *up = io_kiocb_to_cmd(req); + + switch (up->type) { + case IORING_RSRC_UPDATE_FILES: + return io_files_update(req, issue_flags); + } + return -EINVAL; +} + int io_queue_rsrc_removal(struct io_rsrc_data *data, unsigned idx, struct io_rsrc_node *node, void *rsrc) { diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h index af342fd239d0..21813a23215f 100644 --- a/io_uring/rsrc.h +++ b/io_uring/rsrc.h @@ -167,6 +167,6 @@ static inline u64 *io_get_tag_slot(struct io_rsrc_data *data, unsigned int idx) return &data->tags[table_idx][off]; } -int io_files_update(struct io_kiocb *req, unsigned int issue_flags); -int io_files_update_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe); +int io_rsrc_update(struct io_kiocb *req, unsigned int issue_flags); +int io_rsrc_update_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe); #endif From patchwork Tue Jul 
12 20:52:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12915656 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 346CAC433EF for ; Tue, 12 Jul 2022 20:56:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233862AbiGLU4E (ORCPT ); Tue, 12 Jul 2022 16:56:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57206 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234178AbiGLUyz (ORCPT ); Tue, 12 Jul 2022 16:54:55 -0400 Received: from mail-wr1-x430.google.com (mail-wr1-x430.google.com [IPv6:2a00:1450:4864:20::430]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C89B31172; Tue, 12 Jul 2022 13:53:43 -0700 (PDT) Received: by mail-wr1-x430.google.com with SMTP id bk26so12796473wrb.11; Tue, 12 Jul 2022 13:53:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=lTEKopEUAeQr5d74gqsIxK77ORJixv2HTggIm0W0E6c=; b=XmSKQIQuCqY/uVsXhKKb1J3A1zYatMru/4pa7BBb8xXRz90QUz5IbSZV1pCxmrtKJs 3dPfzpdPc5wMlbGojMzs7TPgv//zAsgnRmhhe6vwENqPw8UjPzVmKYt7iEjLvZ1M1vwa xp0NIwC9T+pStJ7k+xRugEWK83w5YkaR4BZ3VipYK8wY+i4p466HRKwPHAYvz8I5gM2Q Co9/PmV+z8HB0KMcQ6APltCLYzrkZZ6+KszBZ24q/4kyBVKoJKDew2xCkKHPIgdl/vT/ dQQXu1XclWFXVcWF/elfOybBkbK05Vsg7g3QQQiRCmiHInnLdBYtkOZYaIhWg9l1CZ/T +P4g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=lTEKopEUAeQr5d74gqsIxK77ORJixv2HTggIm0W0E6c=; 
b=iTog5NjvB12UFidFA2nmltdGKbzPYhhpZTNJ5n3VdnlQGSdVRf3gfP0peis39eRB6Z /LoOBi7LgGDmriQ1bY5sSsVgRLNM/G9V7ZraYR+EBLOcZ8Q3HIJIvaHw+/wfp4OPaBtf k4F83A88+m/El+9vjZr6JzGaQwv8+Bl90X02XsZyCuxfxhGqG0lUnUevXTeS/G/pb/mk 77AtQi3z4oMfsjs3DlzS+U7/JZ4d+vJQ/7SItzyq8OCXSSIJoh1TarjQ/Pqkmwnl+EcK 6wOm0sVFxmxDZTPhP7xQlPR1MOvcEcTaB9bD7M1Kmzs7bCQEoLUJp7y+6O57KaGUfMsZ DZtQ== X-Gm-Message-State: AJIora8HGU9WsQppGJJSX5j4JfXnR6x66I5JDNn5ARraTwKpA+HsukUj fYEcFXtVetAEqSZdwe20tgar4eM/6ZU= X-Google-Smtp-Source: AGRyM1vvX0Sob/YNDt7bpKNdszodUoKWSI0yDRqYSfbrj0q9XwxNkvJ9nhdwp5iFWM46ebnCNOobuQ== X-Received: by 2002:a05:6000:1ac8:b0:21d:b7d9:3c03 with SMTP id i8-20020a0560001ac800b0021db7d93c03mr2006827wry.149.1657659218010; Tue, 12 Jul 2022 13:53:38 -0700 (PDT) Received: from 127.0.0.1localhost (188.28.125.106.threembb.co.uk. [188.28.125.106]) by smtp.gmail.com with ESMTPSA id c14-20020a7bc00e000000b003a044fe7fe7sm89833wmb.9.2022.07.12.13.53.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Jul 2022 13:53:37 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: "David S . Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , David Ahern , kernel-team@fb.com, Pavel Begunkov Subject: [PATCH net-next v5 25/27] io_uring: add zc notification flush requests Date: Tue, 12 Jul 2022 21:52:49 +0100 Message-Id: X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Overlay notification control onto IORING_OP_RSRC_UPDATE (formerly IORING_OP_FILES_UPDATE). It allows flushing a range of zc notifications from slots with indexes [sqe->off, sqe->off+sqe->len). If sqe->addr is not zero, it is also copied as the new tag for all flushed notifications. Note that a slot's notification is not flushed if no requests were attached to it (since the last flush or registration). 
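The range semantics described above can be sketched as a small userspace model. This is only an illustration under stated assumptions: model_slot and model_flush_range() are hypothetical names, a flag stands in for "a notification is attached to the slot", and __builtin_add_overflow (a GCC/Clang builtin) is used here in place of the kernel's overflow helper. It shows the half-open range [off, off+len), the overflow and bounds checks, the optional retagging, and the skipping of slots with nothing attached.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a notification slot. */
struct model_slot {
	int has_notif;     /* non-zero if requests were attached since last flush */
	uint64_t tag;      /* user tag reported with the notification */
	unsigned flushed;  /* how many times this slot was flushed (model only) */
};

/*
 * Flush slots in [off, off + len). A non-zero new_tag replaces the tag of
 * every slot that is actually flushed. Returns 0 or a negative errno value.
 */
static int model_flush_range(struct model_slot *slots, unsigned nr_slots,
			     unsigned off, unsigned len, uint64_t new_tag)
{
	unsigned end;

	/* Reject ranges whose end index wraps around. */
	if (__builtin_add_overflow(off, len, &end))
		return -EOVERFLOW;
	/* Reject ranges that run past the registered slots. */
	if (end > nr_slots)
		return -EINVAL;

	for (unsigned i = off; i < end; i++) {
		/* Nothing attached since the last flush: leave the slot alone. */
		if (!slots[i].has_notif)
			continue;
		if (new_tag)
			slots[i].tag = new_tag;
		slots[i].has_notif = 0;
		slots[i].flushed++;
	}
	return 0;
}
```

Checking the whole range before touching any slot means a bad request leaves every slot untouched, which matches the order of the checks in the diff below.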
Signed-off-by: Pavel Begunkov
---
 include/uapi/linux/io_uring.h |  1 +
 io_uring/rsrc.c               | 38 +++++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 37e8c104d31f..9a7aa25d09a1 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -294,6 +294,7 @@ enum io_uring_op {
  */
 enum {
 	IORING_RSRC_UPDATE_FILES,
+	IORING_RSRC_UPDATE_NOTIF,
 };
 
 /*
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 98ce8a93a816..088a2dc32e2c 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -15,6 +15,7 @@
 #include "io_uring.h"
 #include "openclose.h"
 #include "rsrc.h"
+#include "notif.h"
 
 struct io_rsrc_update {
 	struct file *file;
@@ -742,6 +743,41 @@ static int io_files_update(struct io_kiocb *req, unsigned int issue_flags)
 	return IOU_OK;
 }
 
+static int io_notif_update(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_rsrc_update *up = io_kiocb_to_cmd(req);
+	struct io_ring_ctx *ctx = req->ctx;
+	unsigned len = up->nr_args;
+	unsigned idx_end, idx = up->offset;
+	int ret = 0;
+
+	io_ring_submit_lock(ctx, issue_flags);
+	if (unlikely(check_add_overflow(idx, len, &idx_end))) {
+		ret = -EOVERFLOW;
+		goto out;
+	}
+	if (unlikely(idx_end > ctx->nr_notif_slots)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	for (; idx < idx_end; idx++) {
+		struct io_notif_slot *slot = &ctx->notif_slots[idx];
+
+		if (!slot->notif)
+			continue;
+		if (up->arg)
+			slot->tag = up->arg;
+		io_notif_slot_flush_submit(slot, issue_flags);
+	}
+out:
+	io_ring_submit_unlock(ctx, issue_flags);
+	if (ret < 0)
+		req_set_fail(req);
+	io_req_set_res(req, ret, 0);
+	return IOU_OK;
+}
+
 int io_rsrc_update(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_rsrc_update *up = io_kiocb_to_cmd(req);
@@ -749,6 +785,8 @@ int io_rsrc_update(struct io_kiocb *req, unsigned int issue_flags)
 	switch (up->type) {
 	case IORING_RSRC_UPDATE_FILES:
 		return io_files_update(req, issue_flags);
+	case IORING_RSRC_UPDATE_NOTIF:
+		return io_notif_update(req, issue_flags);
 	}
 	return -EINVAL;
 }

From patchwork Tue Jul 12 20:52:50 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12915655
X-Patchwork-Delegate: kuba@kernel.org
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v5 26/27] io_uring: enable managed frags with register buffers
Date: Tue, 12 Jul 2022 21:52:50 +0100
Message-Id: <278731d3f20caf346cfc025fbee0b4c9ee4ed751.1657643355.git.asml.silence@gmail.com>
X-Mailer: git-send-email 2.37.0
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

io_uring's registered buffer infrastructure already pins pages
efficiently, so let's use SKBFL_MANAGED_FRAG_REFS when a request is
backed purely by registered buffers.
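As a rough illustration of the per-fragment accounting this implies, here is
a user-space model of the fill loop in io_sg_from_iter(). All names are
hypothetical; the 4 KiB page size and the fragment limit of 17 are
assumptions chosen for the example, not values taken from this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE	4096UL	/* assumption: 4 KiB pages */
#define MODEL_MAX_FRAGS	17	/* assumption: stand-in for MAX_SKB_FRAGS */
#define MODEL_PAGE_ALIGN(x) \
	(((x) + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1))

/* One bvec-like segment of a registered buffer. */
struct model_seg {
	size_t offset;	/* offset of the data within its first page */
	size_t len;	/* bytes of payload in this segment */
};

/* Running totals that the kernel keeps on the skb. */
struct model_acct {
	int nr_frags;
	size_t copied;
	size_t truesize;
};

/* Attach segments as frags; -EMSGSIZE if the skb runs out of frag slots. */
static int model_fill_frags(struct model_acct *acct,
			    const struct model_seg *segs, size_t nr_segs)
{
	size_t i = 0;

	while (i < nr_segs && acct->nr_frags < MODEL_MAX_FRAGS) {
		acct->copied += segs[i].len;
		/* truesize counts every page the segment touches */
		acct->truesize += MODEL_PAGE_ALIGN(segs[i].len + segs[i].offset);
		acct->nr_frags++;
		i++;
	}
	return i < nr_segs ? -EMSGSIZE : 0;
}
```

A 4000-byte segment that starts 100 bytes into a page spills onto a second
page, so it is charged 8192 bytes of truesize even though only 4000 bytes of
payload are attached.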
Signed-off-by: Pavel Begunkov
---
 io_uring/net.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index bf9916d5e50c..a4e863dce7ec 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -704,6 +704,60 @@ int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }
 
+static int io_sg_from_iter(struct sock *sk, struct sk_buff *skb,
+			   struct iov_iter *from, size_t length)
+{
+	struct skb_shared_info *shinfo = skb_shinfo(skb);
+	int frag = shinfo->nr_frags;
+	int ret = 0;
+	struct bvec_iter bi;
+	ssize_t copied = 0;
+	unsigned long truesize = 0;
+
+	if (!shinfo->nr_frags)
+		shinfo->flags |= SKBFL_MANAGED_FRAG_REFS;
+
+	if (!skb_zcopy_managed(skb) || !iov_iter_is_bvec(from)) {
+		skb_zcopy_downgrade_managed(skb);
+		return __zerocopy_sg_from_iter(NULL, sk, skb, from, length);
+	}
+
+	bi.bi_size = min(from->count, length);
+	bi.bi_bvec_done = from->iov_offset;
+	bi.bi_idx = 0;
+
+	while (bi.bi_size && frag < MAX_SKB_FRAGS) {
+		struct bio_vec v = mp_bvec_iter_bvec(from->bvec, bi);
+
+		copied += v.bv_len;
+		truesize += PAGE_ALIGN(v.bv_len + v.bv_offset);
+		__skb_fill_page_desc_noacc(shinfo, frag++, v.bv_page,
+					   v.bv_offset, v.bv_len);
+		bvec_iter_advance_single(from->bvec, &bi, v.bv_len);
+	}
+	if (bi.bi_size)
+		ret = -EMSGSIZE;
+
+	shinfo->nr_frags = frag;
+	from->bvec += bi.bi_idx;
+	from->nr_segs -= bi.bi_idx;
+	from->count = bi.bi_size;
+	from->iov_offset = bi.bi_bvec_done;
+
+	skb->data_len += copied;
+	skb->len += copied;
+	skb->truesize += truesize;
+
+	if (sk && sk->sk_type == SOCK_STREAM) {
+		sk_wmem_queued_add(sk, truesize);
+		if (!skb_zcopy_pure(skb))
+			sk_mem_charge(sk, truesize);
+	} else {
+		refcount_add(truesize, &skb->sk->sk_wmem_alloc);
+	}
+	return ret;
+}
+
 int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct sockaddr_storage address;
@@ -768,7 +822,7 @@ int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 
 	msg.msg_flags = msg_flags;
 	msg.msg_ubuf = &notif->uarg;
-	msg.sg_from_iter = NULL;
+	msg.sg_from_iter = io_sg_from_iter;
 	ret = sock_sendmsg(sock, &msg);
 
 	if (unlikely(ret < min_ret)) {

From patchwork Tue Jul 12 20:52:51 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12915657
X-Patchwork-Delegate: kuba@kernel.org
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, David Ahern, kernel-team@fb.com, Pavel Begunkov
Subject: [PATCH net-next v5 27/27] selftests/io_uring: test zerocopy send
Date: Tue, 12 Jul 2022 21:52:51 +0100
Message-Id: <03d5ec78061cf52db420f88ed0b48eb8f47ce9f7.1657643355.git.asml.silence@gmail.com>
X-Mailer: git-send-email 2.37.0
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

Add selftests for io_uring zerocopy sends and io_uring's notification
infrastructure. It's largely influenced by msg_zerocopy and uses it on
the receive side.
Signed-off-by: Pavel Begunkov
---
 tools/testing/selftests/net/Makefile          |   1 +
 .../selftests/net/io_uring_zerocopy_tx.c      | 605 ++++++++++++++++++
 .../selftests/net/io_uring_zerocopy_tx.sh     | 131 ++++
 3 files changed, 737 insertions(+)
 create mode 100644 tools/testing/selftests/net/io_uring_zerocopy_tx.c
 create mode 100755 tools/testing/selftests/net/io_uring_zerocopy_tx.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 7ea54af55490..51261483744e 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -59,6 +59,7 @@ TEST_GEN_FILES += toeplitz
 TEST_GEN_FILES += cmsg_sender
 TEST_GEN_FILES += stress_reuseport_listen
 TEST_PROGS += test_vxlan_vnifiltering.sh
+TEST_GEN_FILES += io_uring_zerocopy_tx
 
 TEST_FILES := settings
 
diff --git a/tools/testing/selftests/net/io_uring_zerocopy_tx.c b/tools/testing/selftests/net/io_uring_zerocopy_tx.c
new file mode 100644
index 000000000000..9d64c560a2d6
--- /dev/null
+++ b/tools/testing/selftests/net/io_uring_zerocopy_tx.c
@@ -0,0 +1,605 @@
+/* SPDX-License-Identifier: MIT */
+/* based on linux-kernel/tools/testing/selftests/net/msg_zerocopy.c */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define NOTIF_TAG 0xfffffffULL
+#define NONZC_TAG 0
+#define ZC_TAG 1
+
+enum {
+	MODE_NONZC = 0,
+	MODE_ZC = 1,
+	MODE_ZC_FIXED = 2,
+	MODE_MIXED = 3,
+};
+
+static bool cfg_flush = false;
+static bool cfg_cork = false;
+static int cfg_mode = MODE_ZC_FIXED;
+static int cfg_nr_reqs = 8;
+static int cfg_family = PF_UNSPEC;
+static int cfg_payload_len;
+static int cfg_port = 8000;
+static int cfg_runtime_ms = 4200;
+
+static socklen_t cfg_alen;
+static struct sockaddr_storage cfg_dst_addr;
+
+static char payload[IP_MAXPACKET] __attribute__((aligned(4096)));
+
+struct io_sq_ring {
+	unsigned *head;
+	unsigned *tail;
+	unsigned *ring_mask;
+	unsigned *ring_entries;
+	unsigned *flags;
+	unsigned *array;
+};
+
+struct io_cq_ring {
+	unsigned *head;
+	unsigned *tail;
+	unsigned *ring_mask;
+	unsigned *ring_entries;
+	struct io_uring_cqe *cqes;
+};
+
+struct io_uring_sq {
+	unsigned *khead;
+	unsigned *ktail;
+	unsigned *kring_mask;
+	unsigned *kring_entries;
+	unsigned *kflags;
+	unsigned *kdropped;
+	unsigned *array;
+	struct io_uring_sqe *sqes;
+
+	unsigned sqe_head;
+	unsigned sqe_tail;
+
+	size_t ring_sz;
+};
+
+struct io_uring_cq {
+	unsigned *khead;
+	unsigned *ktail;
+	unsigned *kring_mask;
+	unsigned *kring_entries;
+	unsigned *koverflow;
+	struct io_uring_cqe *cqes;
+
+	size_t ring_sz;
+};
+
+struct io_uring {
+	struct io_uring_sq sq;
+	struct io_uring_cq cq;
+	int ring_fd;
+};
+
+#ifdef __alpha__
+# ifndef __NR_io_uring_setup
+#  define __NR_io_uring_setup 535
+# endif
+# ifndef __NR_io_uring_enter
+#  define __NR_io_uring_enter 536
+# endif
+# ifndef __NR_io_uring_register
+#  define __NR_io_uring_register 537
+# endif
+#else /* !__alpha__ */
+# ifndef __NR_io_uring_setup
+#  define __NR_io_uring_setup 425
+# endif
+# ifndef __NR_io_uring_enter
+#  define __NR_io_uring_enter 426
+# endif
+# ifndef __NR_io_uring_register
+#  define __NR_io_uring_register 427
+# endif
+#endif
+
+#if defined(__x86_64) || defined(__i386__)
+#define read_barrier() __asm__ __volatile__("":::"memory")
+#define write_barrier() __asm__ __volatile__("":::"memory")
+#else
+
+#define read_barrier() __sync_synchronize()
+#define write_barrier() __sync_synchronize()
+#endif
+
+static int io_uring_setup(unsigned int entries, struct io_uring_params *p)
+{
+	return syscall(__NR_io_uring_setup, entries, p);
+}
+
+static int io_uring_enter(int fd, unsigned int to_submit,
+			  unsigned int min_complete,
+			  unsigned int flags, sigset_t *sig)
+{
+	return syscall(__NR_io_uring_enter, fd, to_submit, min_complete,
+			flags, sig, _NSIG / 8);
+}
+
+static int io_uring_register_buffers(struct io_uring *ring,
+				     const struct iovec *iovecs,
+				     unsigned nr_iovecs)
+{
+	int ret;
+
+	ret = syscall(__NR_io_uring_register, ring->ring_fd,
+		      IORING_REGISTER_BUFFERS, iovecs, nr_iovecs);
+	return (ret < 0) ? -errno : ret;
+}
+
+static int io_uring_register_notifications(struct io_uring *ring,
+					   unsigned nr,
+					   struct io_uring_notification_slot *slots)
+{
+	int ret;
+	struct io_uring_notification_register r = {
+		.nr_slots = nr,
+		.data = (unsigned long)slots,
+	};
+
+	ret = syscall(__NR_io_uring_register, ring->ring_fd,
+		      IORING_REGISTER_NOTIFIERS, &r, sizeof(r));
+	return (ret < 0) ? -errno : ret;
+}
+
+static int io_uring_mmap(int fd, struct io_uring_params *p,
+			 struct io_uring_sq *sq, struct io_uring_cq *cq)
+{
+	size_t size;
+	void *ptr;
+	int ret;
+
+	sq->ring_sz = p->sq_off.array + p->sq_entries * sizeof(unsigned);
+	ptr = mmap(0, sq->ring_sz, PROT_READ | PROT_WRITE,
+		   MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQ_RING);
+	if (ptr == MAP_FAILED)
+		return -errno;
+	sq->khead = ptr + p->sq_off.head;
+	sq->ktail = ptr + p->sq_off.tail;
+	sq->kring_mask = ptr + p->sq_off.ring_mask;
+	sq->kring_entries = ptr + p->sq_off.ring_entries;
+	sq->kflags = ptr + p->sq_off.flags;
+	sq->kdropped = ptr + p->sq_off.dropped;
+	sq->array = ptr + p->sq_off.array;
+
+	size = p->sq_entries * sizeof(struct io_uring_sqe);
+	sq->sqes = mmap(0, size, PROT_READ | PROT_WRITE,
+			MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQES);
+	if (sq->sqes == MAP_FAILED) {
+		ret = -errno;
+err:
+		munmap(sq->khead, sq->ring_sz);
+		return ret;
+	}
+
+	cq->ring_sz = p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe);
+	ptr = mmap(0, cq->ring_sz, PROT_READ | PROT_WRITE,
+			MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_CQ_RING);
+	if (ptr == MAP_FAILED) {
+		ret = -errno;
+		munmap(sq->sqes, p->sq_entries * sizeof(struct io_uring_sqe));
+		goto err;
+	}
+	cq->khead = ptr + p->cq_off.head;
+	cq->ktail = ptr + p->cq_off.tail;
+	cq->kring_mask = ptr + p->cq_off.ring_mask;
+	cq->kring_entries = ptr + p->cq_off.ring_entries;
+	cq->koverflow = ptr + p->cq_off.overflow;
+	cq->cqes = ptr + p->cq_off.cqes;
+	return 0;
+}
+
+static int io_uring_queue_init(unsigned entries, struct io_uring *ring,
+			       unsigned flags)
+{
+	struct io_uring_params p;
+	int fd, ret;
+
+	memset(ring, 0, sizeof(*ring));
+	memset(&p, 0, sizeof(p));
+	p.flags = flags;
+
+	fd = io_uring_setup(entries, &p);
+	if (fd < 0)
+		return fd;
+	ret = io_uring_mmap(fd, &p, &ring->sq, &ring->cq);
+	if (!ret)
+		ring->ring_fd = fd;
+	else
+		close(fd);
+	return ret;
+}
+
+static int io_uring_submit(struct io_uring *ring)
+{
+	struct io_uring_sq *sq = &ring->sq;
+	const unsigned mask = *sq->kring_mask;
+	unsigned ktail, submitted, to_submit;
+	int ret;
+
+	read_barrier();
+	if (*sq->khead != *sq->ktail) {
+		submitted = *sq->kring_entries;
+		goto submit;
+	}
+	if (sq->sqe_head == sq->sqe_tail)
+		return 0;
+
+	ktail = *sq->ktail;
+	to_submit = sq->sqe_tail - sq->sqe_head;
+	for (submitted = 0; submitted < to_submit; submitted++) {
+		read_barrier();
+		sq->array[ktail++ & mask] = sq->sqe_head++ & mask;
+	}
+	if (!submitted)
+		return 0;
+
+	if (*sq->ktail != ktail) {
+		write_barrier();
+		*sq->ktail = ktail;
+		write_barrier();
+	}
+submit:
+	ret = io_uring_enter(ring->ring_fd, submitted, 0,
+				IORING_ENTER_GETEVENTS, NULL);
+	return ret < 0 ? -errno : ret;
+}
+
+static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd,
+				      const void *buf, size_t len, int flags)
+{
+	memset(sqe, 0, sizeof(*sqe));
+	sqe->opcode = (__u8) IORING_OP_SEND;
+	sqe->fd = sockfd;
+	sqe->addr = (unsigned long) buf;
+	sqe->len = len;
+	sqe->msg_flags = (__u32) flags;
+}
+
+static inline void io_uring_prep_sendzc(struct io_uring_sqe *sqe, int sockfd,
+					const void *buf, size_t len, int flags,
+					unsigned slot_idx, unsigned zc_flags)
+{
+	io_uring_prep_send(sqe, sockfd, buf, len, flags);
+	sqe->opcode = (__u8) IORING_OP_SENDZC_NOTIF;
+	sqe->notification_idx = slot_idx;
+	sqe->ioprio = zc_flags;
+}
+
+static struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring)
+{
+	struct io_uring_sq *sq = &ring->sq;
+
+	if (sq->sqe_tail + 1 - sq->sqe_head > *sq->kring_entries)
+		return NULL;
+	return &sq->sqes[sq->sqe_tail++ & *sq->kring_mask];
+}
+
+static int io_uring_wait_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_ptr)
+{
+	struct io_uring_cq *cq = &ring->cq;
+	const unsigned mask = *cq->kring_mask;
+	unsigned head = *cq->khead;
+	int ret;
+
+	*cqe_ptr = NULL;
+	do {
+		read_barrier();
+		if (head != *cq->ktail) {
+			*cqe_ptr = &cq->cqes[head & mask];
+			break;
+		}
+		ret = io_uring_enter(ring->ring_fd, 0, 1,
+					IORING_ENTER_GETEVENTS, NULL);
+		if (ret < 0)
+			return -errno;
+	} while (1);
+
+	return 0;
+}
+
+static inline void io_uring_cqe_seen(struct io_uring *ring)
+{
+	*(&ring->cq)->khead += 1;
+	write_barrier();
+}
+
+static unsigned long gettimeofday_ms(void)
+{
+	struct timeval tv;
+
+	gettimeofday(&tv, NULL);
+	return (tv.tv_sec * 1000) + (tv.tv_usec / 1000);
+}
+
+static void do_setsockopt(int fd, int level, int optname, int val)
+{
+	if (setsockopt(fd, level, optname, &val, sizeof(val)))
+		error(1, errno, "setsockopt %d.%d: %d", level, optname, val);
+}
+
+static int do_setup_tx(int domain, int type, int protocol)
+{
+	int fd;
+
+	fd = socket(domain, type, protocol);
+	if (fd == -1)
+		error(1, errno, "socket t");
+
+	do_setsockopt(fd, SOL_SOCKET, SO_SNDBUF, 1 << 21);
+
+	if (connect(fd, (void *) &cfg_dst_addr, cfg_alen))
+		error(1, errno, "connect");
+	return fd;
+}
+
+static void do_tx(int domain, int type, int protocol)
+{
+	struct io_uring_notification_slot b[1] = {{.tag = NOTIF_TAG}};
+	struct io_uring_sqe *sqe;
+	struct io_uring_cqe *cqe;
+	unsigned long packets = 0, bytes = 0;
+	struct io_uring ring;
+	struct iovec iov;
+	uint64_t tstop;
+	int i, fd, ret;
+	int compl_cqes = 0;
+
+	fd = do_setup_tx(domain, type, protocol);
+
+	ret = io_uring_queue_init(512, &ring, 0);
+	if (ret)
+		error(1, ret, "io_uring: queue init");
+
+	ret = io_uring_register_notifications(&ring, 1, b);
+	if (ret)
+		error(1, ret, "io_uring: tx ctx registration");
+
+	iov.iov_base = payload;
+	iov.iov_len = cfg_payload_len;
+
+	ret = io_uring_register_buffers(&ring, &iov, 1);
+	if (ret)
+		error(1, ret, "io_uring: buffer registration");
+
+	tstop = gettimeofday_ms() + cfg_runtime_ms;
+	do {
+		if (cfg_cork)
+			do_setsockopt(fd, IPPROTO_UDP, UDP_CORK, 1);
+
+		for (i = 0; i < cfg_nr_reqs; i++) {
+			unsigned zc_flags = 0;
+			unsigned buf_idx = 0;
+			unsigned slot_idx = 0;
+			unsigned mode = cfg_mode;
+			unsigned msg_flags = 0;
+
+			if (cfg_mode == MODE_MIXED)
+				mode = rand() % 3;
+
+			sqe = io_uring_get_sqe(&ring);
+
+			if (mode == MODE_NONZC) {
+				io_uring_prep_send(sqe, fd, payload,
+						   cfg_payload_len, msg_flags);
+				sqe->user_data = NONZC_TAG;
+			} else {
+				if (cfg_flush) {
+					zc_flags |= IORING_RECVSEND_NOTIF_FLUSH;
+					compl_cqes++;
+				}
+				io_uring_prep_sendzc(sqe, fd, payload,
+						     cfg_payload_len,
+						     msg_flags, slot_idx, zc_flags);
+				if (mode == MODE_ZC_FIXED) {
+					sqe->ioprio |= IORING_RECVSEND_FIXED_BUF;
+					sqe->buf_index = buf_idx;
+				}
+				sqe->user_data = ZC_TAG;
+			}
+		}
+
+		ret = io_uring_submit(&ring);
+		if (ret != cfg_nr_reqs)
+			error(1, ret, "submit");
+
+		for (i = 0; i < cfg_nr_reqs; i++) {
+			ret = io_uring_wait_cqe(&ring, &cqe);
+			if (ret)
+				error(1, ret, "wait cqe");
+
+			if (cqe->user_data == NOTIF_TAG) {
+				compl_cqes--;
+				i--;
+			} else if (cqe->user_data != NONZC_TAG &&
+				   cqe->user_data != ZC_TAG) {
+				error(1, cqe->res, "invalid user_data");
+			} else if (cqe->res <= 0 && cqe->res != -EAGAIN) {
+				error(1, cqe->res, "send failed");
+			} else {
+				if (cqe->res > 0) {
+					packets++;
+					bytes += cqe->res;
+				}
+				/* failed requests don't flush */
+				if (cfg_flush &&
+				    cqe->res <= 0 &&
+				    cqe->user_data == ZC_TAG)
+					compl_cqes--;
+			}
+			io_uring_cqe_seen(&ring);
+		}
+		if (cfg_cork)
+			do_setsockopt(fd, IPPROTO_UDP, UDP_CORK, 0);
+	} while (gettimeofday_ms() < tstop);
+
+	if (close(fd))
+		error(1, errno, "close");
+
+	fprintf(stderr, "tx=%lu (MB=%lu), tx/s=%lu (MB/s=%lu)\n",
+			packets, bytes >> 20,
+			packets / (cfg_runtime_ms / 1000),
+			(bytes >> 20) / (cfg_runtime_ms / 1000));
+
+	while (compl_cqes) {
+		ret = io_uring_wait_cqe(&ring, &cqe);
+		if (ret)
+			error(1, ret, "wait cqe");
+		io_uring_cqe_seen(&ring);
+		compl_cqes--;
+	}
+}
+
+static void do_test(int domain, int type, int protocol)
+{
+	int i;
+
+	for (i = 0; i < IP_MAXPACKET; i++)
+		payload[i] = 'a' + (i % 26);
+	do_tx(domain, type, protocol);
+}
+
+static void usage(const char *filepath)
+{
+	error(1, 0, "Usage: %s [-f] [-n] [-z0] [-s] "
+		    "(-4|-6) [-t