mbox series

[net-next,v7,0/4] page_pool: Add stats counters

Message ID 1645810914-35485-1-git-send-email-jdamato@fastly.com (mailing list archive)
Headers show
Series page_pool: Add stats counters | expand

Message

Joe Damato Feb. 25, 2022, 5:41 p.m. UTC
Greetings:

Welcome to v7.

This revision splits stats into two structures and tries to pick the right
placement within the page_pool struct. The allocation path stats have been
changed to be per-pool (not per-cpu), and the recycle stats are per-pool
per-cpu.

pahole reports:

	/* --- cacheline 3 boundary (192 bytes) --- */
	struct page_pool_alloc_stats alloc_stats;        /*   192    48 */
	u32                        xdp_mem_id;           /*   240     4 */

	/* --- cacheline 24 boundary (1536 bytes) --- */
	struct page_pool_recycle_stats * recycle_stats;  /*  1536     8 */
	atomic_t                   pages_state_release_cnt; /*  1544     4 */
	refcount_t                 user_cnt;             /*  1548     4 */
	u64                        destroy_cnt;          /*  1552     8 */

	/* size: 1600, cachelines: 25, members: 17 */
	/* sum members: 1492, holes: 2, sum holes: 68 */
	/* padding: 40 */

The page_pool_get_stats API has been updated to fill a wrapper struct
(struct page_pool_stats) which encapsulates both allocation and recycle
structs. It can be extended to include other stats in the future without
API breakage.

mlx5 driver patch to use this new API is included in this series.

Benchmarks have been re-run. As always, results between runs are highly
variable; you'll find results showing that stats disabled are both faster
and slower than stats enabled in back to back benchmark runs.

Raw benchmark output with stats off [1] and stats on [2] are available for
examination.

Test system:
	- 2x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
	- 2 NUMA zones, with 18 cores per zone and 2 threads per core

bench_page_pool_simple results, loops=200000000
test name			stats enabled		stats disabled
				cycles	nanosec		cycles	nanosec

for_loop			0	0.335		0	0.336
atomic_inc 			14	6.106		13	6.022
lock				30	13.365		32	13.968

no-softirq-page_pool01		75	32.884		74	32.308
no-softirq-page_pool02		79	34.696		74	32.302
no-softirq-page_pool03		110	48.005		105	46.073

tasklet_page_pool01_fast_path	14	6.156		14	6.211
tasklet_page_pool02_ptr_ring	41	18.028		39	17.391
tasklet_page_pool03_slow	107	46.646		105	46.123

bench_page_pool_cross_cpu results, loops=20000000 returning_cpus=4:
test name			stats enabled		stats disabled
				cycles	nanosec		cycles	nanosec

page_pool_cross_cpu CPU(0)	3973	1731.596	4015	1750.015
page_pool_cross_cpu CPU(1)	3976	1733.217	4022	1752.864
page_pool_cross_cpu CPU(2)	3973	1731.615	4016	1750.433
page_pool_cross_cpu CPU(3)	3976	1733.218	4021	1752.806
page_pool_cross_cpu CPU(4)	994	433.305		1005	438.217

page_pool_cross_cpu average	3378	-		3415	-

bench_page_pool_cross_cpu results, loops=20000000 returning_cpus=8:
test name			stats enabled		stats disabled
				cycles	nanosec		cycles	nanosec

page_pool_cross_cpu CPU(0)	6969	3037.488	6909	3011.463
page_pool_cross_cpu CPU(1)	6974	3039.469	6913	3012.961
page_pool_cross_cpu CPU(2)	6969	3037.575	6910	3011.585
page_pool_cross_cpu CPU(3)	6974	3039.415	6913	3012.961
page_pool_cross_cpu CPU(4)	6969	3037.288	6909	3011.368
page_pool_cross_cpu CPU(5)	6972	3038.732	6913	3012.920
page_pool_cross_cpu CPU(6)	6969	3037.350	6909	3011.386
page_pool_cross_cpu CPU(7)	6973	3039.356	6913	3012.921
page_pool_cross_cpu CPU(8)	871	379.934		864	376.620

page_pool_cross_cpu average	6293	-		6239	-

Thanks.

[1]: https://gist.githubusercontent.com/jdamato-fsly/d7c34b9fa7be1ce132a266b0f2b92aea/raw/327dcd71d11ece10238fbf19e0472afbcbf22fd4/v7_stats_disabled
[2]: https://gist.githubusercontent.com/jdamato-fsly/d7c34b9fa7be1ce132a266b0f2b92aea/raw/327dcd71d11ece10238fbf19e0472afbcbf22fd4/v7_stats_enabled

v6 -> v7:
	- stats split out into two structs one single per-page pool struct
	  for allocation path stats and one per-cpu pointer for recycle
	  path stats.
	- page_pool_get_stats updated to use a wrapper struct to gather
	  stats for allocation and recycle stats with a single argument.
	- placement of structs adjusted
	- mlx5 driver modified to use page_pool_get_stats API

v5 -> v6:
	- Per cpu page_pool_stats struct pointer is now marked as
	  ____cacheline_aligned_in_smp. Placement of the field in the
	  struct is unchanged; it is the last field.

v4 -> v5:
	- Fixed the description of the kernel option in Kconfig.
	- Squashed commits 1-10 from v4 into a single commit for easier
	  review.
	- Changed the comment style of the comment for
	  the this_cpu_inc_alloc_stat macro.
	- Changed the return type of page_pool_get_stats from struct
	  page_pool_stat * to bool.

v3 -> v4:
	- Restructured stats to be per-cpu per-pool.
	- Global stats and proc file were removed.
	- Exposed an API (page_pool_get_stats) for batching the pool stats.

v2 -> v3:
	- patch 8/10 ("Add stat tracking cache refill") fixed placement of
	  counter increment.
	- patch 10/10 ("net-procfs: Show page pool stats in proc") updated:
		- fix unused label warning from kernel test robot,
		- fixed page_pool_seq_show to only display the refill stat
		  once,
		- added a remove_proc_entry for page_pool_stat to
		  dev_proc_net_exit.

v1 -> v2:
	- A new kernel config option has been added, which defaults to N,
	   preventing this code from being compiled in by default
	- The stats structure has been converted to a per-cpu structure
	- The stats are now exported via proc (/proc/net/page_pool_stat)

Joe Damato (4):
  page_pool: Add allocation stats
  page_pool: Add recycle stats
  page_pool: Add function to batch and return stats
  mlx5: add support for page_pool_get_stats

 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c | 76 +++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 27 +++++++-
 include/net/page_pool.h                            | 51 ++++++++++++++
 net/Kconfig                                        | 13 ++++
 net/core/page_pool.c                               | 77 ++++++++++++++++++++--
 5 files changed, 238 insertions(+), 6 deletions(-)