From patchwork Fri Jul 14 17:08:44 2023
From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Alexander Lobakin, Maciej Fijalkowski, Larysa Zaremba, Yunsheng Lin,
    Alexander Duyck, Jesper Dangaard Brouer, Ilias Apalodimas,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH RFC net-next v2 1/7] net: skbuff: don't include <net/page_pool.h> to <linux/skbuff.h>
Date: Fri, 14 Jul 2023 19:08:44 +0200
Message-ID: <20230714170853.866018-2-aleksander.lobakin@intel.com>

Currently, touching <net/page_pool.h> triggers a rebuild of more than
half of the kernel. That's because it's included in <linux/skbuff.h>.
And each new include in page_pool.h adds more [useless] data for the
toolchain to process for each source file from that pile.
In commit 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB
recycling"), Matteo included it to be able to call a couple of
functions defined there. Then, in commit 57f05bc2ab24 ("page_pool:
keep pp info as long as page pool owns the page"), one of the calls
was removed, so only one was left.
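As the next paragraph details, only one user of the header remains, and
its prototype is self-contained. A simplified sketch (not the literal
upstream headers) of all that <linux/skbuff.h> really needs in order to
declare it:

#include <stdbool.h>

/* 'struct page' can stay an incomplete type here: a bare prototype
 * needs no page_pool definitions at all
 */
struct page;

bool page_pool_return_skb_page(struct page *page, bool napi_safe);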
It's the call to page_pool_return_skb_page() in napi_frag_unref(). The
function is external and doesn't have any dependencies, so having the
very niche page_pool.h included only for that looks like overkill. As
%PP_SIGNATURE is not local to page_pool.c (it was only in the early
submissions), nothing holds this function there. Teleport
page_pool_return_skb_page() to skbuff.c, right next to its main
consumer, skb_pp_recycle(). It's also used in napi_frag_unref() ->
{__,}skb_frag_unref(), so it can't be made static, unfortunately.
Maybe next time.
Now, after a few include fixes in the drivers, touching page_pool.h
only triggers rebuilding the drivers that use it and a couple of core
networking files.

Suggested-by: Jakub Kicinski # make skbuff.h less heavy
Suggested-by: Alexander Duyck # move to skbuff.c
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 drivers/net/ethernet/engleder/tsnep_main.c    |  1 +
 drivers/net/ethernet/freescale/fec_main.c     |  1 +
 .../marvell/octeontx2/nic/otx2_common.c       |  1 +
 .../ethernet/marvell/octeontx2/nic/otx2_pf.c  |  1 +
 .../ethernet/mellanox/mlx5/core/en/params.c   |  1 +
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  1 +
 drivers/net/wireless/mediatek/mt76/mt76.h     |  1 +
 include/linux/skbuff.h                        |  3 +-
 include/net/page_pool.h                       |  2 -
 net/core/page_pool.c                          | 39 -------------------
 net/core/skbuff.c                             | 39 +++++++++++++++++++
 11 files changed, 48 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/engleder/tsnep_main.c b/drivers/net/ethernet/engleder/tsnep_main.c
index 84751bb303a6..6222aaa5157f 100644
--- a/drivers/net/ethernet/engleder/tsnep_main.c
+++ b/drivers/net/ethernet/engleder/tsnep_main.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include <net/page_pool.h>
 #include

 #define TSNEP_RX_OFFSET (max(NET_SKB_PAD, XDP_PACKET_HEADROOM) + NET_IP_ALIGN)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 1b990a486059..cfc07f012254 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -38,6 +38,7 @@
 #include
 #include
 #include
+#include <net/page_pool.h>
 #include
 #include
 #include

diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
index a5d03583bf79..d17a0ebc9036 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
@@ -7,6 +7,7 @@
 #include
 #include
+#include <net/page_pool.h>
 #include
 #include

diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
index fe8ea4e531b7..7eca434a0550 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include <net/page_pool.h>

 #include "otx2_reg.h"
 #include "otx2_common.h"

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
index 5ce28ff7685f..0f152f14165b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
@@ -6,6 +6,7 @@
 #include "en/port.h"
 #include "en_accel/en_accel.h"
 #include "en_accel/ipsec.h"
+#include <net/page_pool.h>
 #include

 static u8 mlx5e_mpwrq_min_page_shift(struct mlx5_core_dev *mdev)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 40589cebb773..16038c23b7d8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -35,6 +35,7 @@
 #include "en/xdp.h"
 #include "en/params.h"
 #include
+#include <net/page_pool.h>

 int mlx5e_xdp_max_mtu(struct mlx5e_params *params, struct mlx5e_xsk_param *xsk)
 {

diff --git a/drivers/net/wireless/mediatek/mt76/mt76.h b/drivers/net/wireless/mediatek/mt76/mt76.h
index 6b07b8fafec2..95c16f11d156 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76.h
+++ b/drivers/net/wireless/mediatek/mt76/mt76.h
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include <net/page_pool.h>

 #include "util.h"
 #include "testmode.h"

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 91ed66952580..f76d172ed262 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -32,7 +32,6 @@
 #include
 #include
 #include
-#include <net/page_pool.h>
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
 #include <net/netfilter/nf_conntrack.h>
 #endif
@@ -3423,6 +3422,8 @@ static inline void skb_frag_ref(struct sk_buff *skb, int f)
        __skb_frag_ref(&skb_shinfo(skb)->frags[f]);
 }

+bool page_pool_return_skb_page(struct page *page, bool napi_safe);
+
 static inline void napi_frag_unref(skb_frag_t *frag, bool recycle,
                                    bool napi_safe)
 {

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 2b7db9992fc0..829dc1f8ba6b 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -307,8 +307,6 @@ inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool)
        return pool->p.dma_dir;
 }

-bool page_pool_return_skb_page(struct page *page, bool napi_safe);
-
 struct page_pool *page_pool_create(const struct page_pool_params *params);

 struct xdp_mem_info;

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 985ccaffc06a..09d76e37ac69 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -913,42 +913,3 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
        }
 }
 EXPORT_SYMBOL(page_pool_update_nid);
-
-bool page_pool_return_skb_page(struct page *page, bool napi_safe)
-{
-       struct napi_struct *napi;
-       struct page_pool *pp;
-       bool allow_direct;
-
-       page = compound_head(page);
-
-       /* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
-        * in order to preserve any existing bits, such as bit 0 for the
-        * head page of compound page and bit 1 for pfmemalloc page, so
-        * mask those bits for freeing side when doing below checking,
-        * and page_is_pfmemalloc() is checked in __page_pool_put_page()
-        * to avoid recycling the pfmemalloc page.
-        */
-       if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
-               return false;
-
-       pp = page->pp;
-
-       /* Allow direct recycle if we have reasons to believe that we are
-        * in the same context as the consumer would run, so there's
-        * no possible race.
-        */
-       napi = READ_ONCE(pp->p.napi);
-       allow_direct = napi_safe && napi &&
-               READ_ONCE(napi->list_owner) == smp_processor_id();
-
-       /* Driver set this to memory recycling info. Reset it on recycle.
-        * This will *not* work for NIC using a split-page memory model.
-        * The page will be returned to the pool here regardless of the
-        * 'flipped' fragment being in use or not.
-        */
-       page_pool_put_full_page(pp, page, allow_direct);
-
-       return true;
-}
-EXPORT_SYMBOL(page_pool_return_skb_page);

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index ea9c09b4b927..4de280c454e4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -879,6 +879,45 @@ static void skb_clone_fraglist(struct sk_buff *skb)
                skb_get(list);
 }

+bool page_pool_return_skb_page(struct page *page, bool napi_safe)
+{
+       struct napi_struct *napi;
+       struct page_pool *pp;
+       bool allow_direct;
+
+       page = compound_head(page);
+
+       /* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
+        * in order to preserve any existing bits, such as bit 0 for the
+        * head page of compound page and bit 1 for pfmemalloc page, so
+        * mask those bits for freeing side when doing below checking,
+        * and page_is_pfmemalloc() is checked in __page_pool_put_page()
+        * to avoid recycling the pfmemalloc page.
+        */
+       if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
+               return false;
+
+       pp = page->pp;
+
+       /* Allow direct recycle if we have reasons to believe that we are
+        * in the same context as the consumer would run, so there's
+        * no possible race.
+        */
+       napi = READ_ONCE(pp->p.napi);
+       allow_direct = napi_safe && napi &&
+               READ_ONCE(napi->list_owner) == smp_processor_id();
+
+       /* Driver set this to memory recycling info. Reset it on recycle.
+        * This will *not* work for NIC using a split-page memory model.
+        * The page will be returned to the pool here regardless of the
+        * 'flipped' fragment being in use or not.
+        */
+       page_pool_put_full_page(pp, page, allow_direct);
+
+       return true;
+}
+EXPORT_SYMBOL(page_pool_return_skb_page);
+
 static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool napi_safe)
 {
        if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)

From patchwork Fri Jul 14 17:08:45 2023
From: Alexander Lobakin <aleksander.lobakin@intel.com>
Subject: [PATCH RFC net-next v2 2/7] net: page_pool: place frag_* fields in one cacheline
Date: Fri, 14 Jul 2023 19:08:45 +0200
Message-ID: <20230714170853.866018-3-aleksander.lobakin@intel.com>

On x86_64, the frag_* fields of struct page_pool are scattered across
two cachelines despite a total size of only 24 bytes. The last field,
::frag_users, is pushed out to the next one, sharing it with
::alloc_stats. All three fields are used in pretty much the same
places. There are some holes and cold members to move around.
Move frag_* one block up, placing them right after &page_pool_params,
perfectly at the beginning of CL2. This doesn't do anything meaningful
to the second block, as those are some destroy-path cold structures,
and doesn't do anything to ::alloc_stats, which still starts at a
200-byte offset, 8 bytes after CL3 (still fitting into 1 cacheline).
On my setup, this yields 1-2% in Mpps when using PP frags actively.
When it comes to 32-bit architectures with a 32-byte CL:
&page_pool_params plus ::pad is 44 bytes, and the block taken care of
is 16 bytes within one CL, so there should be at least no regressions
from the actual change.
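The claimed layout is easy to check outside the kernel. A standalone
userspace sketch that prints which cacheline each field lands on; the
struct below is a stand-in with an assumed 64-byte
sizeof(struct page_pool_params), not the kernel's actual definition:

#include <stdio.h>
#include <stddef.h>

#define CL_SIZE 64

struct pp_layout {
        char params[64];        /* struct page_pool_params stand-in (assumed size) */
        long frag_users;
        void *frag_page;
        unsigned int frag_offset;
        unsigned int pages_state_hold_cnt;
};

/* print the byte offset and cacheline index of a field */
#define REPORT(f) \
        printf("%-22s offset %3zu -> CL%zu\n", #f, \
               offsetof(struct pp_layout, f), \
               offsetof(struct pp_layout, f) / CL_SIZE)

int main(void)
{
        REPORT(frag_users);
        REPORT(frag_page);
        REPORT(frag_offset);
        REPORT(pages_state_hold_cnt);
        return 0;
}

With these assumed sizes, all four fields report CL1, i.e. the whole
block shares the cacheline right after the driver-set params.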
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/net/page_pool.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 829dc1f8ba6b..212d72b5cfec 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -130,16 +130,16 @@ static inline u64 *page_pool_ethtool_stats_get(u64 *data, void *stats)
 struct page_pool {
        struct page_pool_params p;

+       long frag_users;
+       struct page *frag_page;
+       unsigned int frag_offset;
+       u32 pages_state_hold_cnt;
+
        struct delayed_work release_dw;
        void (*disconnect)(void *);
        unsigned long defer_start;
        unsigned long defer_warn;

-       u32 pages_state_hold_cnt;
-       unsigned int frag_offset;
-       struct page *frag_page;
-       long frag_users;
-
 #ifdef CONFIG_PAGE_POOL_STATS
        /* these stats are incremented while in softirq context */
        struct page_pool_alloc_stats alloc_stats;

From patchwork Fri Jul 14 17:08:47 2023
From: Alexander Lobakin <aleksander.lobakin@intel.com>
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , Maciej Fijalkowski , Larysa Zaremba , Yunsheng Lin , Alexander Duyck , Jesper Dangaard Brouer , Ilias Apalodimas , netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH RFC net-next v2 3/7] net: page_pool: place frag_* fields in one cacheline Date: Fri, 14 Jul 2023 19:08:47 +0200 Message-ID: <20230714170853.866018-5-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230714170853.866018-1-aleksander.lobakin@intel.com> References: <20230714170853.866018-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org X-Patchwork-State: RFC On x86_64, frag_* fields of struct page_pool are scattered across two cachelines despite the summary size of 24 bytes. The last field, ::frag_users, is pushed out to the next one, sharing it with ::alloc_stats. All three fields are used in pretty much same places. There are some holes and cold members to move around. Move frag_* one block up, placing them right after &page_pool_params perfectly at the beginning of CL2. This doesn't do any meaningful to the second block, as those are some destroy-path cold structures, and doesn't do anything to ::alloc_stats, which still starts at 200 byte offset, 8 bytes after CL3 (still fitting into 1 cacheline). On my setup, this yields 1-2% of Mpps when using PP frags actively. When it comes to 32-bit architectures with 32 byte CL: &page_pool_params plus ::pad is 44 bytes, the block taken care of is 16 bytes within one CL, so there should be at least no regressions from the actual change. 
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/net/page_pool.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 69e822021d95..68937deea4b1 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -131,16 +131,16 @@ struct page_pool {
        struct page_pool_params p;
        long pad;

+       long frag_users;
+       struct page *frag_page;
+       unsigned int frag_offset;
+       u32 pages_state_hold_cnt;
+
        struct delayed_work release_dw;
        void (*disconnect)(void *);
        unsigned long defer_start;
        unsigned long defer_warn;

-       u32 pages_state_hold_cnt;
-       unsigned int frag_offset;
-       struct page *frag_page;
-       long frag_users;
-
 #ifdef CONFIG_PAGE_POOL_STATS
        /* these stats are incremented while in softirq context */
        struct page_pool_alloc_stats alloc_stats;

From patchwork Fri Jul 14 17:08:49 2023
From: Alexander Lobakin <aleksander.lobakin@intel.com>
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , Maciej Fijalkowski , Larysa Zaremba , Yunsheng Lin , Alexander Duyck , Jesper Dangaard Brouer , Ilias Apalodimas , netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH RFC net-next v2 4/7] net: page_pool: don't use driver-set flags field directly Date: Fri, 14 Jul 2023 19:08:49 +0200 Message-ID: <20230714170853.866018-7-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230714170853.866018-1-aleksander.lobakin@intel.com> References: <20230714170853.866018-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org X-Patchwork-State: RFC page_pool::p is driver-defined params, copied directly from the structure passed via page_pool_create(). The structure isn't meant to be modified by the Page Pool core code and this even might look confusing[0][1]. In order to be able to alter some flags, let's define our own, internal fields. Use the slot freed earlier to stay within the same cacheline as before (or almost if it's shorter than 64 bytes). The flag indicating whether to perform DMA mapping can be bool; as for DMA sync, define it as an enum to be able to extend it later on. They are defined as bits in the driver-set params, leave them so here as well, to not waste byte-per-bit or so. Now there are 29 free bits left in those 4 bytes + 4 free bytes more before the cacheline boundary. We could've defined only new flags here or only the ones we may need to alter, but checking some flags in one place while others in another doesn't sound convenient or intuitive. 
Suggested-by: Jakub Kicinski
Link[0]: https://lore.kernel.org/netdev/20230703133207.4f0c54ce@kernel.org
Suggested-by: Alexander Duyck
Link[1]: https://lore.kernel.org/netdev/CAKgT0UfZCGnWgOH96E4GV3ZP6LLbROHM7SHE8NKwq+exX+Gk_Q@mail.gmail.com
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/net/page_pool.h |  7 ++++++-
 net/core/page_pool.c    | 26 ++++++++++++++------------
 2 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 68937deea4b1..aefcb76baf0e 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -129,7 +129,12 @@ static inline u64 *page_pool_ethtool_stats_get(u64 *data, void *stats)
 struct page_pool {
        struct page_pool_params p;
-       long pad;
+
+       bool dma_map:1;                         /* Perform DMA mapping */
+       enum {
+               PP_DMA_SYNC_ACT_DISABLED = 0,   /* Driver didn't ask to sync */
+               PP_DMA_SYNC_ACT_DO,             /* Perform DMA sync ops */
+       } dma_sync_act:2;

        long frag_users;
        struct page *frag_page;

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 09d76e37ac69..67957e74e1f5 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -182,6 +182,8 @@ static int page_pool_init(struct page_pool *pool,
                if ((pool->p.dma_dir != DMA_FROM_DEVICE) &&
                    (pool->p.dma_dir != DMA_BIDIRECTIONAL))
                        return -EINVAL;
+
+               pool->dma_map = true;
        }

        if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV) {
@@ -194,6 +196,8 @@ static int page_pool_init(struct page_pool *pool,
                if (!pool->p.max_len)
                        return -EINVAL;

+               pool->dma_sync_act = PP_DMA_SYNC_ACT_DO;
+
                /* pool->p.offset has to be set according to the address
                 * offset used by the DMA engine to start copying rx data
                 */
@@ -213,7 +217,7 @@ static int page_pool_init(struct page_pool *pool,
        /* Driver calling page_pool_create() also call page_pool_destroy() */
        refcount_set(&pool->user_cnt, 1);

-       if (pool->p.flags & PP_FLAG_DMA_MAP)
+       if (pool->dma_map)
                get_device(pool->p.dev);

        return 0;
@@ -341,7 +345,7 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)

        page_pool_set_dma_addr(page, dma);

-       if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
+       if (pool->dma_sync_act == PP_DMA_SYNC_ACT_DO)
                page_pool_dma_sync_for_device(pool, page, pool->p.max_len);

        return true;
@@ -380,8 +384,7 @@ static struct page *__page_pool_alloc_page_order(struct page_pool *pool,
        if (unlikely(!page))
                return NULL;

-       if ((pool->p.flags & PP_FLAG_DMA_MAP) &&
-           unlikely(!page_pool_dma_map(pool, page))) {
+       if (pool->dma_map && unlikely(!page_pool_dma_map(pool, page))) {
                put_page(page);
                return NULL;
        }
@@ -401,8 +404,8 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
                                                 gfp_t gfp)
 {
        const int bulk = PP_ALLOC_CACHE_REFILL;
-       unsigned int pp_flags = pool->p.flags;
        unsigned int pp_order = pool->p.order;
+       bool dma_map = pool->dma_map;
        struct page *page;
        int i, nr_pages;
@@ -427,8 +430,7 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
         */
        for (i = 0; i < nr_pages; i++) {
                page = pool->alloc.cache[i];
-               if ((pp_flags & PP_FLAG_DMA_MAP) &&
-                   unlikely(!page_pool_dma_map(pool, page))) {
+               if (dma_map && unlikely(!page_pool_dma_map(pool, page))) {
                        put_page(page);
                        continue;
                }
@@ -500,7 +502,7 @@ void page_pool_release_page(struct page_pool *pool, struct page *page)
        dma_addr_t dma;
        int count;

-       if (!(pool->p.flags & PP_FLAG_DMA_MAP))
+       if (!pool->dma_map)
                /* Always account for inflight pages, even if we didn't
                 * map them
                 */
@@ -573,7 +575,7 @@ static bool page_pool_recycle_in_cache(struct page *page,
 }

 /* If the page refcnt == 1, this will try to recycle the page.
- * if PP_FLAG_DMA_SYNC_DEV is set, we'll try to sync the DMA area for
+ * if pool->dma_sync_act is set, we'll try to sync the DMA area for
 * the configured size min(dma_sync_size, pool->max_len).
 * If the page refcnt != 1, then the page will be returned to memory
 * subsystem.
@@ -594,7 +596,7 @@ __page_pool_put_page(struct page_pool *pool, struct page *page,

        if (likely(page_ref_count(page) == 1 && !page_is_pfmemalloc(page))) {
                /* Read barrier done in page_ref_count / READ_ONCE */
-               if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
+               if (pool->dma_sync_act == PP_DMA_SYNC_ACT_DO)
                        page_pool_dma_sync_for_device(pool, page,
                                                      dma_sync_size);
@@ -695,7 +697,7 @@ static struct page *page_pool_drain_frag(struct page_pool *pool,
                return NULL;

        if (page_ref_count(page) == 1 && !page_is_pfmemalloc(page)) {
-               if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
+               if (pool->dma_sync_act == PP_DMA_SYNC_ACT_DO)
                        page_pool_dma_sync_for_device(pool, page, -1);

                return page;
@@ -781,7 +783,7 @@ static void __page_pool_destroy(struct page_pool *pool)

        ptr_ring_cleanup(&pool->ring, NULL);

-       if (pool->p.flags & PP_FLAG_DMA_MAP)
+       if (pool->dma_map)
                put_device(pool->p.dev);

 #ifdef CONFIG_PAGE_POOL_STATS

From patchwork Fri Jul 14 17:08:50 2023
From: Alexander Lobakin <aleksander.lobakin@intel.com>
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , Maciej Fijalkowski , Larysa Zaremba , Yunsheng Lin , Alexander Duyck , Jesper Dangaard Brouer , Ilias Apalodimas , netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH RFC net-next v2 5/7] net: page_pool: avoid calling no-op externals when possible Date: Fri, 14 Jul 2023 19:08:50 +0200 Message-ID: <20230714170853.866018-8-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230714170853.866018-1-aleksander.lobakin@intel.com> References: <20230714170853.866018-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org X-Patchwork-State: RFC Turned out page_pool_put{,_full}_page() can burn quite a bunch of cycles even when on DMA-coherent platforms (like x86) with no active IOMMU or swiotlb, just for the call ladder. Indeed, it's page_pool_put_page() page_pool_put_defragged_page() <- external __page_pool_put_page() page_pool_dma_sync_for_device() <- non-inline dma_sync_single_range_for_device() dma_sync_single_for_device() <- external dma_direct_sync_single_for_device() dev_is_dma_coherent() <- exit For the inline functions, no guarantees the compiler won't uninline them (they're clearly not one-liners and sometimes compilers uninline even 2 + 2). The first external call is necessary, but the rest 2+ are done for nothing each time, plus a bunch of checks here and there. Since Page Pool mappings are long-term and for one "device + addr" pair dma_need_sync() will always return the same value (basically, whether it belongs to an swiotlb pool), addresses can be tested once right after they're obtained and the result can be reused until the page is unmapped. Define the new PP DMA sync operation type, which will mean "do DMA syncs for the device, but only when needed" and turn it on by default when the driver asks to sync pages. When a page is mapped, check whether it needs syncs and if so, replace that "sync when needed" back to "always do syncs" globally for the whole pool (better safe than sorry). As long as the pool has no pages requiring DMA syncs, this cuts off a good piece of calls and checks. When at least one page required it, the pool conservatively falls back to "always call sync functions", no per-page verdicts. It's a fairly rare case anyway that only a few pages would require syncing. On my x86_64, this gives from 2% to 5% performance benefit with no negative impact for cases when IOMMU is on and the shortcut can't be used. 
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/net/page_pool.h | 1 +
 net/core/page_pool.c    | 7 ++++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index aefcb76baf0e..3a2e1b29ff3e 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -133,6 +133,7 @@ struct page_pool {
        bool dma_map:1;                         /* Perform DMA mapping */
        enum {
                PP_DMA_SYNC_ACT_DISABLED = 0,   /* Driver didn't ask to sync */
+               PP_DMA_SYNC_ACT_SKIP,           /* Syncs can be skipped */
                PP_DMA_SYNC_ACT_DO,             /* Perform DMA sync ops */
        } dma_sync_act:2;

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 67957e74e1f5..ee6ce6786e29 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -196,7 +196,8 @@ static int page_pool_init(struct page_pool *pool,
                if (!pool->p.max_len)
                        return -EINVAL;

-               pool->dma_sync_act = PP_DMA_SYNC_ACT_DO;
+               /* Try to avoid calling no-op syncs */
+               pool->dma_sync_act = PP_DMA_SYNC_ACT_SKIP;

                /* pool->p.offset has to be set according to the address
                 * offset used by the DMA engine to start copying rx data
@@ -345,6 +346,10 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)

        page_pool_set_dma_addr(page, dma);

+       if (pool->dma_sync_act == PP_DMA_SYNC_ACT_SKIP &&
+           dma_need_sync(pool->p.dev, dma))
+               pool->dma_sync_act = PP_DMA_SYNC_ACT_DO;
+
        if (pool->dma_sync_act == PP_DMA_SYNC_ACT_DO)
                page_pool_dma_sync_for_device(pool, page, pool->p.max_len);

From patchwork Fri Jul 14 17:08:51 2023
From: Alexander Lobakin <aleksander.lobakin@intel.com>
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , Maciej Fijalkowski , Larysa Zaremba , Yunsheng Lin , Alexander Duyck , Jesper Dangaard Brouer , Ilias Apalodimas , netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH RFC net-next v2 6/7] net: skbuff: avoid accessing page_pool if !napi_safe when returning page Date: Fri, 14 Jul 2023 19:08:51 +0200 Message-ID: <20230714170853.866018-9-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230714170853.866018-1-aleksander.lobakin@intel.com> References: <20230714170853.866018-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org X-Patchwork-State: RFC Currently, pp->p.napi is always read, but the actual variable it gets assigned to is read-only when @napi_safe is true. For the !napi_safe cases, which yet is still a pack, it's an unneeded operation. Moreover, it can lead to premature or even redundant page_pool cacheline access. For example, when page_pool_is_last_frag() returns false (with the recent frag improvements). Thus, read it only when @napi_safe is true. This also allows moving @napi inside the condition block itself. Constify it while we are here, because why not. Signed-off-by: Alexander Lobakin --- net/core/skbuff.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 4de280c454e4..fc1470aab5cf 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -881,9 +881,8 @@ static void skb_clone_fraglist(struct sk_buff *skb) bool page_pool_return_skb_page(struct page *page, bool napi_safe) { - struct napi_struct *napi; + bool allow_direct = false; struct page_pool *pp; - bool allow_direct; page = compound_head(page); @@ -903,9 +902,12 @@ bool page_pool_return_skb_page(struct page *page, bool napi_safe) * in the same context as the consumer would run, so there's * no possible race. */ - napi = READ_ONCE(pp->p.napi); - allow_direct = napi_safe && napi && - READ_ONCE(napi->list_owner) == smp_processor_id(); + if (napi_safe) { + const struct napi_struct *napi = READ_ONCE(pp->p.napi); + + allow_direct = napi && + READ_ONCE(napi->list_owner) == smp_processor_id(); + } /* Driver set this to memory recycling info. Reset it on recycle. * This will *not* work for NIC using a split-page memory model. 
From patchwork Fri Jul 14 17:08:52 2023
From: Alexander Lobakin <aleksander.lobakin@intel.com>
Subject: [PATCH RFC net-next v2 7/7] net: skbuff: always try to recycle PP pages directly when in softirq
Date: Fri, 14 Jul 2023 19:08:52 +0200
Message-ID: <20230714170853.866018-10-aleksander.lobakin@intel.com>

Commit 8c48eea3adf3 ("page_pool: allow caching from safely localized
NAPI") allowed direct recycling of skb pages to their PP for some
cases, but unfortunately missed a couple of other major ones. For
example, %XDP_DROP in skb mode. The netstack just calls kfree_skb(),
which unconditionally passes `false` as @napi_safe. Thus, all pages go
through ptr_ring and locks, although most of the time we're actually
inside the NAPI polling this PP is linked with, so that it would be
perfectly safe to recycle pages directly.
Let's address such cases. If @napi_safe is true, we're fine: don't
change anything for this path. But if it's false, check whether we are
in the softirq context. It most likely is, and then, if ->list_owner
is our current CPU, we're good to use direct recycling, even though
@napi_safe is false -- concurrent access is excluded. The in_softirq()
protection is needed mostly because we can hit this place from process
context (though not from hardirq).
For the mentioned xdp-drop-skb-mode case, the improvement I got is
3-4% in Mpps. As for page_pool stats, recycle_ring is now 0 and the
alloc_slow counter doesn't change most of the time, which means the
MM layer is not even called to allocate any new pages.

Suggested-by: Jakub Kicinski # in_softirq()
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 net/core/skbuff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index fc1470aab5cf..1c22fd33be6c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -902,7 +902,7 @@ bool page_pool_return_skb_page(struct page *page, bool napi_safe)
         * in the same context as the consumer would run, so there's
         * no possible race.
         */
-       if (napi_safe) {
+       if (napi_safe || in_softirq()) {
                const struct napi_struct *napi = READ_ONCE(pp->p.napi);

                allow_direct = napi &&