[net-next,v2,00/15] net: page_pool: add netlink-based introspection

Message ID 20231121000048.789613-1-kuba@kernel.org (mailing list archive)

Message

Jakub Kicinski Nov. 21, 2023, midnight UTC
We recently started to deploy newer kernels / drivers at Meta,
making significant use of page pools for the first time.
We immediately ran into page pool leaks, both real ones and false
positive warnings. As Eric pointed out/predicted, there's no guarantee
that applications will read / close their sockets, so a page pool page
may be stuck in a socket (but not leaked) forever. This happens
a lot in our fleet. Most of these are obviously due to application
bugs, but we should not be printing kernel warnings due to minor
application resource leaks.

Conversely, page pool memory may get leaked at runtime, and
we have no way to detect / track that unless someone reconfigures
the NIC and destroys the page pools which leaked the pages.

The solution presented here is to expose the memory use of page
pools via netlink. This allows for continuous monitoring of memory
used by page pools, regardless of whether they have been destroyed.
The sample in patch 15 can print the memory use and recycling
efficiency:

$ ./page-pool
    eth0[2]	page pools: 10 (zombies: 0)
		refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
		recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)
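
As a rough cross-check of the sample output above, the sketch below
recomputes the recycling percentage and the bytes-per-reference figure
from the printed counters. It assumes the percentage is recycled pages
divided by allocated pages, which matches the numbers shown but is not
confirmed by the cover letter. The field names, and the split of each
pair into fast/slow and ring/cache, are hypothetical labels chosen for
this sketch; only the sums matter here.

```python
from dataclasses import dataclass


@dataclass
class PoolStats:
    """Counters for one interface, taken from the sample's output line.

    Field names are assumptions made for this sketch; they are not
    taken from the netlink spec or from the sample's source.
    """
    alloc_fast: int
    alloc_slow: int
    recycle_ring: int
    recycle_cache: int


def recycling_pct(s: PoolStats) -> float:
    """Recycled pages as a percentage of allocated pages (assumed formula)."""
    allocs = s.alloc_fast + s.alloc_slow
    if allocs == 0:
        return 0.0  # nothing allocated yet; avoid dividing by zero
    return 100.0 * (s.recycle_ring + s.recycle_cache) / allocs


def bytes_per_ref(refs: int, nbytes: int) -> float:
    """Average memory held per outstanding reference."""
    return nbytes / refs if refs else 0.0


# Numbers copied from the eth0 entry in the sample output.
eth0 = PoolStats(alloc_fast=656, alloc_slow=397681,
                 recycle_ring=89652, recycle_cache=270201)
print(f"recycling: {recycling_pct(eth0):.1f}%")
print(f"bytes per ref: {bytes_per_ref(41984, 171966464):.0f}")
```

With the eth0 numbers this gives 90.3% and 4096 bytes per reference,
which is consistent with the sample line and with 4 KiB pages.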

v2:
 - hopefully fix build with PAGE_POOL=n
v1:  https://lore.kernel.org/all/20231024160220.3973311-1-kuba@kernel.org/
 - The main change compared to the RFC is that the API now exposes
   outstanding references and byte counts even for "live" page pools.
   The warning is no longer printed if page pool is accessible via netlink.
RFC: https://lore.kernel.org/all/20230816234303.3786178-1-kuba@kernel.org/

Jakub Kicinski (15):
  net: page_pool: split the page_pool_params into fast and slow
  net: page_pool: avoid touching slow on the fastpath
  net: page_pool: factor out uninit
  net: page_pool: id the page pools
  net: page_pool: record pools per netdev
  net: page_pool: stash the NAPI ID for easier access
  eth: link netdev to page_pools in drivers
  net: page_pool: add nlspec for basic access to page pools
  net: page_pool: implement GET in the netlink API
  net: page_pool: add netlink notifications for state changes
  net: page_pool: report amount of memory held by page pools
  net: page_pool: report when page pool was destroyed
  net: page_pool: expose page pool stats via netlink
  net: page_pool: mute the periodic warning for visible page pools
  tools: ynl: add sample for getting page-pool information

 Documentation/netlink/specs/netdev.yaml       | 166 +++++++
 Documentation/networking/page_pool.rst        |  10 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |   1 +
 .../net/ethernet/mellanox/mlx5/core/en_main.c |   1 +
 drivers/net/ethernet/microsoft/mana/mana_en.c |   1 +
 drivers/net/ethernet/socionext/netsec.c       |   2 +
 include/linux/list.h                          |  20 +
 include/linux/netdevice.h                     |   4 +
 include/linux/poison.h                        |   2 +
 include/net/page_pool/helpers.h               |   8 +-
 include/net/page_pool/types.h                 |  43 +-
 include/uapi/linux/netdev.h                   |  36 ++
 net/core/Makefile                             |   2 +-
 net/core/netdev-genl-gen.c                    |  60 +++
 net/core/netdev-genl-gen.h                    |  11 +
 net/core/page_pool.c                          |  78 ++--
 net/core/page_pool_priv.h                     |  12 +
 net/core/page_pool_user.c                     | 414 +++++++++++++++++
 tools/include/uapi/linux/netdev.h             |  36 ++
 tools/net/ynl/generated/netdev-user.c         | 419 ++++++++++++++++++
 tools/net/ynl/generated/netdev-user.h         | 171 +++++++
 tools/net/ynl/lib/ynl.h                       |   2 +-
 tools/net/ynl/samples/.gitignore              |   1 +
 tools/net/ynl/samples/Makefile                |   2 +-
 tools/net/ynl/samples/page-pool.c             | 147 ++++++
 25 files changed, 1601 insertions(+), 48 deletions(-)
 create mode 100644 net/core/page_pool_priv.h
 create mode 100644 net/core/page_pool_user.c
 create mode 100644 tools/net/ynl/samples/page-pool.c

Comments

Jakub Kicinski Nov. 22, 2023, 1:31 a.m. UTC | #1
On Mon, 20 Nov 2023 16:00:33 -0800 Jakub Kicinski wrote:
>   net: page_pool: split the page_pool_params into fast and slow
>   net: page_pool: avoid touching slow on the fastpath

To relieve some of the pain and suffering this series causes to our
build tester I'm going to apply the first 2 patches already. I hope
that's fine. They are pretty stand-alone and have broad acks/review
tags.

patchwork-bot+netdevbpf@kernel.org Nov. 22, 2023, 1:40 a.m. UTC | #2
Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 20 Nov 2023 16:00:33 -0800 you wrote:
> We recently started to deploy newer kernels / drivers at Meta,
> making significant use of page pools for the first time.
> We immediately run into page pool leaks both real and false positive
> warnings. As Eric pointed out/predicted there's no guarantee that
> applications will read / close their sockets so a page pool page
> may be stuck in a socket (but not leaked) forever. This happens
> a lot in our fleet. Most of these are obviously due to application
> bugs but we should not be printing kernel warnings due to minor
> application resource leaks.
> 
> [...]

Here is the summary with links:
  - [net-next,v2,01/15] net: page_pool: split the page_pool_params into fast and slow
    https://git.kernel.org/netdev/net-next/c/5027ec19f104
  - [net-next,v2,02/15] net: page_pool: avoid touching slow on the fastpath
    https://git.kernel.org/netdev/net-next/c/2da0cac1e949
  - [net-next,v2,03/15] net: page_pool: factor out uninit
    (no matching commit)
  - [net-next,v2,04/15] net: page_pool: id the page pools
    (no matching commit)
  - [net-next,v2,05/15] net: page_pool: record pools per netdev
    (no matching commit)
  - [net-next,v2,06/15] net: page_pool: stash the NAPI ID for easier access
    (no matching commit)
  - [net-next,v2,07/15] eth: link netdev to page_pools in drivers
    (no matching commit)
  - [net-next,v2,08/15] net: page_pool: add nlspec for basic access to page pools
    (no matching commit)
  - [net-next,v2,09/15] net: page_pool: implement GET in the netlink API
    (no matching commit)
  - [net-next,v2,10/15] net: page_pool: add netlink notifications for state changes
    (no matching commit)
  - [net-next,v2,11/15] net: page_pool: report amount of memory held by page pools
    (no matching commit)
  - [net-next,v2,12/15] net: page_pool: report when page pool was destroyed
    (no matching commit)
  - [net-next,v2,13/15] net: page_pool: expose page pool stats via netlink
    (no matching commit)
  - [net-next,v2,14/15] net: page_pool: mute the periodic warning for visible page pools
    (no matching commit)
  - [net-next,v2,15/15] tools: ynl: add sample for getting page-pool information
    (no matching commit)

You are awesome, thank you!

Jesper Dangaard Brouer Nov. 22, 2023, 8:45 a.m. UTC | #3
On 11/22/23 02:31, Jakub Kicinski wrote:
> On Mon, 20 Nov 2023 16:00:33 -0800 Jakub Kicinski wrote:
>>    net: page_pool: split the page_pool_params into fast and slow
>>    net: page_pool: avoid touching slow on the fastpath
> 
> To relieve some of the pain and suffering this series causes to our
> build tester I'm going to apply the first 2 patches already. I hope
> that's fine. They are pretty stand-alone and have broad acks/review
> tags.

Fine by me to apply the first 2 patches. Thanks for communicating this,
as I did get confused seeing patchwork-bot reporting this was applied
and seeing a V3 on the list.

Keep up the good work. Overall I like this netlink-based introspection.
I'll try to get some opinions from Cloudflare people on how this can be
integrated into existing monitoring systems.

--Jesper