Message ID: 20231122034420.1158898-1-kuba@kernel.org
Series: net: page_pool: add netlink-based introspection
On Tue, Nov 21, 2023 at 07:44:07PM -0800, Jakub Kicinski wrote:
> We recently started to deploy newer kernels / drivers at Meta,
> making significant use of page pools for the first time.
> We immediately ran into page pool leaks, both real and false-positive
> warnings. As Eric pointed out / predicted, there's no guarantee that
> applications will read / close their sockets, so a page pool page
> may be stuck in a socket (but not leaked) forever. This happens
> a lot in our fleet. Most of these are obviously due to application
> bugs, but we should not be printing kernel warnings due to minor
> application resource leaks.
>
> Conversely, page pool memory may get leaked at runtime, and
> we have no way to detect / track that, unless someone reconfigures
> the NIC and destroys the page pools which leaked the pages.
>
> The solution presented here is to expose the memory use of page
> pools via netlink. This allows for continuous monitoring of memory
> used by page pools, regardless of whether they were destroyed or not.
> The sample in patch 15 can print the memory use and recycling
> efficiency:
>
>   $ ./page-pool
>       eth0[2]  page pools: 10 (zombies: 0)
>                refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
>                recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)

Hi Jakub, I am wondering if you considered exposing these metrics
through meminfo/vmstat as well. Is that a bad idea, or is this/netlink
more of a preference?

thanks,
Shakeel
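[Editor's note: as a sanity check on the sample output quoted above, the 90.3% figure is consistent with treating each `a:b` pair as two counters and dividing total recycles by total allocations. A minimal Python sketch; the interpretation of the pairs as fast/slow-path counters is an assumption, not stated in the thread.]

```python
# Reproduce the recycling-efficiency figure from the sample output.
# Assumption: each "a:b" pair is a pair of counters (e.g. fast/slow
# path), and efficiency = total recycled / total allocated.

def recycling_pct(alloc_a, alloc_b, recycle_a, recycle_b):
    total_alloc = alloc_a + alloc_b
    total_recycle = recycle_a + recycle_b
    return 100.0 * total_recycle / total_alloc

# Counters from the sample output above:
print(round(recycling_pct(656, 397681, 89652, 270201), 1))  # 90.3
```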
On Sat, 25 Nov 2023 20:57:24 +0000 Shakeel Butt wrote:
> >   $ ./page-pool
> >       eth0[2]  page pools: 10 (zombies: 0)
> >                refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
> >                recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)
>
> Hi Jakub, I am wondering if you considered exposing these metrics
> through meminfo/vmstat as well. Is that a bad idea, or is this/netlink
> more of a preference?

If that's net-namespaced we can add the basics there. We'll still need
the netlink interface, though; it's currently per-interface and per-queue
(simplifying a bit). But internally the recycling stats are also
per-CPU, which could be of interest at some stage.
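[Editor's note: on the meminfo/vmstat question, a system-level counter could plausibly be derived by summing per-pool page refs and multiplying by the page size; in the sample above, 41984 refs * 4096 bytes = 171966464 bytes, matching the reported total. A hedged sketch; the `pools` data layout and the 4 KiB page size are assumptions for illustration.]

```python
# Sketch: aggregate per-pool page refs into a system-wide byte total,
# as a meminfo/vmstat-style counter might report it. PAGE_SIZE and the
# per-pool records are assumptions based on the sample output above.
PAGE_SIZE = 4096

pools = [
    {"ifindex": 2, "refs": 41984},  # eth0's pools from the sample, combined
]

total_bytes = sum(p["refs"] * PAGE_SIZE for p in pools)
print(total_bytes)  # 171966464
```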
On Sun, Nov 26, 2023 at 02:43:00PM -0800, Jakub Kicinski wrote:
> On Sat, 25 Nov 2023 20:57:24 +0000 Shakeel Butt wrote:
> > Hi Jakub, I am wondering if you considered exposing these metrics
> > through meminfo/vmstat as well. Is that a bad idea, or is this/netlink
> > more of a preference?
>
> If that's net-namespaced we can add the basics there. We'll still need
> the netlink interface, though; it's currently per-interface and per-queue
> (simplifying a bit). But internally the recycling stats are also
> per-CPU, which could be of interest at some stage.

Not really net-namespaced, but rather system-level stats in those
interfaces. Anyway, if having system-level stats makes sense, they can
be added later.

thanks,
Shakeel