
[net-next,v3,00/13] net: page_pool: add netlink-based introspection

Message ID 20231122034420.1158898-1-kuba@kernel.org

Message

Jakub Kicinski Nov. 22, 2023, 3:44 a.m. UTC
We recently started to deploy newer kernels / drivers at Meta,
making significant use of page pools for the first time.
We immediately ran into page pool leaks, both real ones and
false-positive warnings. As Eric pointed out / predicted, there is
no guarantee that applications will read / close their sockets, so
a page pool page may be stuck in a socket (but not leaked) forever.
This happens a lot in our fleet. Most of these are obviously due to
application bugs, but we should not be printing kernel warnings for
minor application resource leaks.

Conversely, the page pool memory may get leaked at runtime, and
we have no way to detect / track that unless someone reconfigures
the NIC and destroys the page pools which leaked the pages.

The solution presented here is to expose the memory use of page
pools via netlink. This allows for continuous monitoring of the
memory used by page pools, regardless of whether they have been
destroyed or not. The sample in patch 13 can print the memory use
and recycling efficiency:

$ ./page-pool
    eth0[2]	page pools: 10 (zombies: 0)
		refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
		recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)
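
For reference, the recycling figure in the output above is simply pages
recycled divided by pages allocated. Below is a minimal sketch of that
arithmetic, assuming the alloc and recycle pairs shown are the slow/fast
and cached/ring counters; the struct and function names are illustrative,
not taken from the series:

#include <stdio.h>

/* Illustrative counters matching the sample output above; the real
 * tool reads these from the page-pool netlink stats.
 */
struct pp_counters {
	unsigned long long alloc_slow;		/* 656 */
	unsigned long long alloc_fast;		/* 397681 */
	unsigned long long recycle_cached;	/* 89652 */
	unsigned long long recycle_ring;	/* 270201 */
};

/* Recycling efficiency: pages returned to the pool vs. pages handed out. */
static double recycling_pct(const struct pp_counters *c)
{
	unsigned long long alloc = c->alloc_slow + c->alloc_fast;
	unsigned long long recycle = c->recycle_cached + c->recycle_ring;

	return alloc ? 100.0 * recycle / alloc : 0.0;
}

int main(void)
{
	struct pp_counters c = {
		.alloc_slow = 656, .alloc_fast = 397681,
		.recycle_cached = 89652, .recycle_ring = 270201,
	};

	printf("recycling: %.1f%%\n", recycling_pct(&c));	/* ~90.3% */
	return 0;
}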

v3:
 - ID is still here, can't decide if it matters
 - rename destroyed -> detach-time, good enough?
 - fix build for netsec
v2: https://lore.kernel.org/r/20231121000048.789613-1-kuba@kernel.org
 - hopefully fix build with PAGE_POOL=n
v1: https://lore.kernel.org/all/20231024160220.3973311-1-kuba@kernel.org/
 - The main change compared to the RFC is that the API now exposes
   outstanding references and byte counts even for "live" page pools.
   The warning is no longer printed if the page pool is accessible
   via netlink.
RFC: https://lore.kernel.org/all/20230816234303.3786178-1-kuba@kernel.org/
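
To make the destroyed -> detach-time rename above more concrete, here is
a rough sketch of what the per-pool attribute set in the uapi header
might look like; the attribute names and numbering are assumptions based
on this cover letter and the netdev.yaml spec in the diffstat, not copied
from the patches:

/* Hypothetical rendering of the page-pool attributes; the real
 * definitions are generated into include/uapi/linux/netdev.h.
 */
enum {
	NETDEV_A_PAGE_POOL_ID = 1,		/* per-pool ID ("id the page pools") */
	NETDEV_A_PAGE_POOL_IFINDEX,		/* owning net device */
	NETDEV_A_PAGE_POOL_NAPI_ID,		/* stashed NAPI ID */
	NETDEV_A_PAGE_POOL_INFLIGHT,		/* outstanding page references */
	NETDEV_A_PAGE_POOL_INFLIGHT_MEM,	/* outstanding bytes */
	NETDEV_A_PAGE_POOL_DETACH_TIME,		/* was "destroyed" in v2 */
};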

Jakub Kicinski (13):
  net: page_pool: factor out uninit
  net: page_pool: id the page pools
  net: page_pool: record pools per netdev
  net: page_pool: stash the NAPI ID for easier access
  eth: link netdev to page_pools in drivers
  net: page_pool: add nlspec for basic access to page pools
  net: page_pool: implement GET in the netlink API
  net: page_pool: add netlink notifications for state changes
  net: page_pool: report amount of memory held by page pools
  net: page_pool: report when page pool was destroyed
  net: page_pool: expose page pool stats via netlink
  net: page_pool: mute the periodic warning for visible page pools
  tools: ynl: add sample for getting page-pool information
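
As a rough illustration of the driver-side change in "eth: link netdev
to page_pools in drivers": presumably the pool creation parameters gain
a netdev back-pointer so the core can record the pool against its
interface. The field name and helper below are assumptions for
illustration, not the actual patch:

#include <linux/netdevice.h>
#include <net/page_pool/types.h>

/* Sketch only: associate an RX page pool with its net device so it
 * shows up in the per-interface netlink dump. ".netdev" is an assumed
 * new field in struct page_pool_params.
 */
static struct page_pool *example_create_rx_pool(struct net_device *dev,
						struct napi_struct *napi,
						unsigned int pool_size)
{
	struct page_pool_params pp = {
		.pool_size	= pool_size,
		.nid		= NUMA_NO_NODE,
		.dev		= dev->dev.parent,	/* DMA mapping device */
		.napi		= napi,		/* lets the core stash the NAPI ID */
		.netdev		= dev,		/* assumed: link the pool to the netdev */
	};

	return page_pool_create(&pp);
}

The one-line diffs to bnxt, mlx5 and mana in the diffstat below are
consistent with a change of roughly this shape.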

 Documentation/netlink/specs/netdev.yaml       | 170 +++++++
 Documentation/networking/page_pool.rst        |  10 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |   1 +
 .../net/ethernet/mellanox/mlx5/core/en_main.c |   1 +
 drivers/net/ethernet/microsoft/mana/mana_en.c |   1 +
 drivers/net/ethernet/socionext/netsec.c       |   2 +
 include/linux/list.h                          |  20 +
 include/linux/netdevice.h                     |   4 +
 include/linux/poison.h                        |   2 +
 include/net/page_pool/helpers.h               |   8 +-
 include/net/page_pool/types.h                 |  10 +
 include/uapi/linux/netdev.h                   |  36 ++
 net/core/Makefile                             |   2 +-
 net/core/netdev-genl-gen.c                    |  60 +++
 net/core/netdev-genl-gen.h                    |  11 +
 net/core/page_pool.c                          |  69 ++-
 net/core/page_pool_priv.h                     |  12 +
 net/core/page_pool_user.c                     | 414 +++++++++++++++++
 tools/include/uapi/linux/netdev.h             |  36 ++
 tools/net/ynl/generated/netdev-user.c         | 419 ++++++++++++++++++
 tools/net/ynl/generated/netdev-user.h         | 171 +++++++
 tools/net/ynl/lib/ynl.h                       |   2 +-
 tools/net/ynl/samples/.gitignore              |   1 +
 tools/net/ynl/samples/Makefile                |   2 +-
 tools/net/ynl/samples/page-pool.c             | 147 ++++++
 25 files changed, 1578 insertions(+), 33 deletions(-)
 create mode 100644 net/core/page_pool_priv.h
 create mode 100644 net/core/page_pool_user.c
 create mode 100644 tools/net/ynl/samples/page-pool.c

Comments

Shakeel Butt Nov. 25, 2023, 8:57 p.m. UTC | #1
On Tue, Nov 21, 2023 at 07:44:07PM -0800, Jakub Kicinski wrote:
> We recently started to deploy newer kernels / drivers at Meta,
> making significant use of page pools for the first time.
> We immediately ran into page pool leaks, both real ones and
> false-positive warnings. As Eric pointed out / predicted, there is
> no guarantee that applications will read / close their sockets, so
> a page pool page may be stuck in a socket (but not leaked) forever.
> This happens a lot in our fleet. Most of these are obviously due to
> application bugs, but we should not be printing kernel warnings for
> minor application resource leaks.
> 
> Conversely, the page pool memory may get leaked at runtime, and
> we have no way to detect / track that unless someone reconfigures
> the NIC and destroys the page pools which leaked the pages.
> 
> The solution presented here is to expose the memory use of page
> pools via netlink. This allows for continuous monitoring of the
> memory used by page pools, regardless of whether they have been
> destroyed or not. The sample in patch 13 can print the memory use
> and recycling efficiency:
> 
> $ ./page-pool
>     eth0[2]	page pools: 10 (zombies: 0)
> 		refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
> 		recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)

Hi Jakub, I am wondering if you considered exposing these metrics
through meminfo/vmstat as well. Is that a bad idea, or is this
(netlink) more of a preference?

thanks,
Shakeel
Jakub Kicinski Nov. 26, 2023, 10:43 p.m. UTC | #2
On Sat, 25 Nov 2023 20:57:24 +0000 Shakeel Butt wrote:
> > $ ./page-pool
> >     eth0[2]	page pools: 10 (zombies: 0)
> > 		refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
> > 		recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)  
> 
> Hi Jakub, I am wondering if you considered exposing these metrics
> through meminfo/vmstat as well. Is that a bad idea, or is this
> (netlink) more of a preference?

If that's net-namespaced, we can add the basics there. We'll still
need the netlink interface, though; it's currently per-interface and
per-queue (simplifying a bit). But internally the recycling stats are
also per-CPU, which could be of interest at some stage.
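
To illustrate the per-CPU point: inside the kernel the recycle counters
are kept per CPU and have to be summed into a per-pool total before
being reported, along the lines of the sketch below. The names are
illustrative, not the in-tree ones:

#include <linux/cpumask.h>
#include <linux/percpu.h>
#include <linux/types.h>

/* Illustrative per-CPU recycle counters; the per-CPU breakdown itself
 * is what gets lost when only per-pool totals are exposed.
 */
struct pp_recycle_cpu {
	u64 cached;	/* pages returned to the lockless per-CPU cache */
	u64 ring;	/* pages returned to the pool's ptr_ring */
};

static u64 pp_total_recycled(struct pp_recycle_cpu __percpu *stats)
{
	u64 total = 0;
	int cpu;

	for_each_possible_cpu(cpu) {
		const struct pp_recycle_cpu *s = per_cpu_ptr(stats, cpu);

		total += s->cached + s->ring;
	}
	return total;
}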
Shakeel Butt Nov. 27, 2023, 6:54 a.m. UTC | #3
On Sun, Nov 26, 2023 at 02:43:00PM -0800, Jakub Kicinski wrote:
> On Sat, 25 Nov 2023 20:57:24 +0000 Shakeel Butt wrote:
> > > $ ./page-pool
> > >     eth0[2]	page pools: 10 (zombies: 0)
> > > 		refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
> > > 		recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)  
> > 
> > Hi Jakub, I am wondering if you considered exposing these metrics
> > through meminfo/vmstat as well. Is that a bad idea, or is this
> > (netlink) more of a preference?
> 
> If that's net-namespaced, we can add the basics there. We'll still
> need the netlink interface, though; it's currently per-interface and
> per-queue (simplifying a bit). But internally the recycling stats are
> also per-CPU, which could be of interest at some stage.

Not really net-namespaced, but rather system-level stats in those
interfaces. Anyway, if having system-level stats makes sense, they
can be added later.