From patchwork Fri Feb 25 17:41:50 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Joe Damato
X-Patchwork-Id: 12760678
From: Joe Damato
To: netdev@vger.kernel.org, kuba@kernel.org, ilias.apalodimas@linaro.org,
    davem@davemloft.net, hawk@kernel.org, saeed@kernel.org,
    ttoukan.linux@gmail.com, brouer@redhat.com
Cc: Joe Damato
Subject: [net-next v7 0/4] page_pool: Add stats counters
Date: Fri, 25 Feb 2022 09:41:50 -0800
Message-Id: <1645810914-35485-1-git-send-email-jdamato@fastly.com>
X-Mailer: git-send-email 2.7.4
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-Delegate: kuba@kernel.org

Greetings:

Welcome to v7.

This revision splits stats into two structures and tries to pick the right
placement within the page_pool struct. The allocation path stats have been
changed to be per-pool (not per-cpu), and the recycle stats are per-pool
per-cpu.

pahole reports:

	/* --- cacheline 3 boundary (192 bytes) --- */
	struct page_pool_alloc_stats alloc_stats;            /*   192    48 */
	u32                        xdp_mem_id;               /*   240     4 */

	/* --- cacheline 24 boundary (1536 bytes) --- */
	struct page_pool_recycle_stats * recycle_stats;      /*  1536     8 */
	atomic_t                   pages_state_release_cnt;  /*  1544     4 */
	refcount_t                 user_cnt;                 /*  1548     4 */
	u64                        destroy_cnt;              /*  1552     8 */

	/* size: 1600, cachelines: 25, members: 17 */
	/* sum members: 1492, holes: 2, sum holes: 68 */
	/* padding: 40 */

The page_pool_get_stats API has been updated to fill a wrapper struct
(struct page_pool_stats) which encapsulates both the allocation and recycle
structs. It can be extended to include other stats in the future without API
breakage.

An mlx5 driver patch that uses this new API is included in this series.

Benchmarks have been re-run. As always, results between runs are highly
variable; in back-to-back benchmark runs you'll find results where stats
disabled is both faster and slower than stats enabled. Raw benchmark output
with stats off [1] and stats on [2] is available for examination.

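As a rough illustration of the layout described above (one allocation-stats
struct per pool, per-cpu recycle stats, and a wrapper struct filled by a
single getter), here is a minimal userspace C model. It is not the code from
this series: every model_* name, the counter field names, and MODEL_NR_CPUS
are hypothetical, and a plain array stands in for the kernel's per-cpu
pointer. The six 64-bit allocation counters are chosen only so that the
struct matches the 48 bytes pahole reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NR_CPUS 4 /* hypothetical CPU count for this userspace model */

/* Allocation-path counters: one copy per pool (field names are assumptions). */
struct model_alloc_stats {
	uint64_t fast;
	uint64_t slow;
	uint64_t slow_high_order;
	uint64_t empty;
	uint64_t refill;
	uint64_t waive;
};

/* Six u64 counters give the 48-byte size shown in the pahole output. */
static_assert(sizeof(struct model_alloc_stats) == 48,
	      "alloc stats model should be 48 bytes");

/* Recycle-path counters: one copy per pool per CPU (names are assumptions). */
struct model_recycle_stats {
	uint64_t cached;
	uint64_t cache_full;
	uint64_t ring;
	uint64_t ring_full;
	uint64_t released_refcnt;
};

/*
 * Wrapper filled by the getter. New members can be appended later without
 * changing the getter's signature.
 */
struct model_pool_stats {
	struct model_alloc_stats alloc_stats;
	struct model_recycle_stats recycle_stats;
};

/* Toy pool: the array is a stand-in for a per-cpu allocation. */
struct model_pool {
	struct model_alloc_stats alloc_stats;
	struct model_recycle_stats recycle_stats[MODEL_NR_CPUS];
};

/* Copy the per-pool alloc stats and sum the per-cpu recycle stats. */
static bool model_get_stats(const struct model_pool *pool,
			    struct model_pool_stats *stats)
{
	size_t cpu;

	if (!pool || !stats)
		return false;

	memset(stats, 0, sizeof(*stats));
	stats->alloc_stats = pool->alloc_stats;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		const struct model_recycle_stats *pcpu = &pool->recycle_stats[cpu];

		stats->recycle_stats.cached += pcpu->cached;
		stats->recycle_stats.cache_full += pcpu->cache_full;
		stats->recycle_stats.ring += pcpu->ring;
		stats->recycle_stats.ring_full += pcpu->ring_full;
		stats->recycle_stats.released_refcnt += pcpu->released_refcnt;
	}

	return true;
}
```

In this model a caller passes one struct model_pool_stats and checks the
boolean result, which mirrors the bool return type adopted in v5. Summing at
read time keeps the recycle-path increments local to each CPU.
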
Test system:
	- 2x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
	- 2 NUMA zones, with 18 cores per zone and 2 threads per core

bench_page_pool_simple results, loops=200000000

test name                       stats enabled        stats disabled
                                cycles  nanosec      cycles  nanosec

for_loop                        0       0.335        0       0.336
atomic_inc                      14      6.106        13      6.022
lock                            30      13.365       32      13.968

no-softirq-page_pool01          75      32.884       74      32.308
no-softirq-page_pool02          79      34.696       74      32.302
no-softirq-page_pool03          110     48.005       105     46.073

tasklet_page_pool01_fast_path   14      6.156        14      6.211
tasklet_page_pool02_ptr_ring    41      18.028       39      17.391
tasklet_page_pool03_slow        107     46.646       105     46.123

bench_page_pool_cross_cpu results, loops=20000000 returning_cpus=4:

test name                       stats enabled        stats disabled
                                cycles  nanosec      cycles  nanosec

page_pool_cross_cpu CPU(0)      3973    1731.596     4015    1750.015
page_pool_cross_cpu CPU(1)      3976    1733.217     4022    1752.864
page_pool_cross_cpu CPU(2)      3973    1731.615     4016    1750.433
page_pool_cross_cpu CPU(3)      3976    1733.218     4021    1752.806
page_pool_cross_cpu CPU(4)      994     433.305      1005    438.217

page_pool_cross_cpu average     3378    -            3415    -

bench_page_pool_cross_cpu results, loops=20000000 returning_cpus=8:

test name                       stats enabled        stats disabled
                                cycles  nanosec      cycles  nanosec

page_pool_cross_cpu CPU(0)      6969    3037.488     6909    3011.463
page_pool_cross_cpu CPU(1)      6974    3039.469     6913    3012.961
page_pool_cross_cpu CPU(2)      6969    3037.575     6910    3011.585
page_pool_cross_cpu CPU(3)      6974    3039.415     6913    3012.961
page_pool_cross_cpu CPU(4)      6969    3037.288     6909    3011.368
page_pool_cross_cpu CPU(5)      6972    3038.732     6913    3012.920
page_pool_cross_cpu CPU(6)      6969    3037.350     6909    3011.386
page_pool_cross_cpu CPU(7)      6973    3039.356     6913    3012.921
page_pool_cross_cpu CPU(8)      871     379.934      864     376.620

page_pool_cross_cpu average     6293    -            6239    -

Thanks.

[1]: https://gist.githubusercontent.com/jdamato-fsly/d7c34b9fa7be1ce132a266b0f2b92aea/raw/327dcd71d11ece10238fbf19e0472afbcbf22fd4/v7_stats_disabled
[2]: https://gist.githubusercontent.com/jdamato-fsly/d7c34b9fa7be1ce132a266b0f2b92aea/raw/327dcd71d11ece10238fbf19e0472afbcbf22fd4/v7_stats_enabled

v6 -> v7:
  - stats split out into two structs: one single struct per page pool for
    allocation path stats and one per-cpu pointer for recycle path stats.
  - page_pool_get_stats updated to use a wrapper struct to gather
    allocation and recycle stats with a single argument.
  - placement of structs adjusted
  - mlx5 driver modified to use page_pool_get_stats API

v5 -> v6:
  - Per cpu page_pool_stats struct pointer is now marked as
    ____cacheline_aligned_in_smp. Placement of the field in the struct is
    unchanged; it is the last field.

v4 -> v5:
  - Fixed the description of the kernel option in Kconfig.
  - Squashed commits 1-10 from v4 into a single commit for easier review.
  - Changed the style of the comment for the this_cpu_inc_alloc_stat macro.
  - Changed the return type of page_pool_get_stats from
    struct page_pool_stat * to bool.

v3 -> v4:
  - Restructured stats to be per-cpu per-pool.
  - Global stats and proc file were removed.
  - Exposed an API (page_pool_get_stats) for batching the pool stats.

v2 -> v3:
  - patch 8/10 ("Add stat tracking cache refill") fixed placement of
    counter increment.
  - patch 10/10 ("net-procfs: Show page pool stats in proc") updated:
    - fix unused label warning from kernel test robot,
    - fixed page_pool_seq_show to only display the refill stat once,
    - added a remove_proc_entry for page_pool_stat to
      dev_proc_net_exit.

v1 -> v2:
  - A new kernel config option has been added, which defaults to N,
    preventing this code from being compiled in by default
  - The stats structure has been converted to a per-cpu structure
  - The stats are now exported via proc (/proc/net/page_pool_stat)

Joe Damato (4):
  page_pool: Add allocation stats
  page_pool: Add recycle stats
  page_pool: Add function to batch and return stats
  mlx5: add support for page_pool_get_stats

 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c | 76 +++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 27 +++++++-
 include/net/page_pool.h                            | 51 ++++++++++++++
 net/Kconfig                                        | 13 ++++
 net/core/page_pool.c                               | 77 ++++++++++++++++++++--
 5 files changed, 238 insertions(+), 6 deletions(-)