diff mbox series

[net] net: page_pool: don't try to stash the napi id

Message ID 20250123231620.1086401-1-kuba@kernel.org (mailing list archive)
State New
Delegated to: Netdev Maintainers
Series [net] net: page_pool: don't try to stash the napi id

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 4 this patch: 4
netdev/build_tools success Errors and warnings before: 26 (+1) this patch: 26 (+1)
netdev/cc_maintainers success CCed 11 of 11 maintainers
netdev/build_clang success Errors and warnings before: 1426 this patch: 1426
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 240 this patch: 240
netdev/checkpatch warning CHECK: DEFINE_MUTEX definition without comment
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 82 this patch: 82
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2025-01-25--09-00 (tests: 885)

Commit Message

Jakub Kicinski Jan. 23, 2025, 11:16 p.m. UTC
Page pool tried to cache the NAPI ID in page pool info to avoid
having a dependency on the life cycle of the NAPI instance.
Since the commit under Fixes, the NAPI ID is not populated until
napi_enable(), and there's a good chance that the page pool is
created before NAPI gets enabled.

Protect the NAPI pointer with the existing page pool mutex;
the reading path already holds it. napi_id itself we need
to READ_ONCE(), since it's protected by netdev_lock(), which
we are not holding in page pool.
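
The two-lock scheme above can be sketched as a userspace analogue (hypothetical names; a pthread mutex stands in for the kernel's page_pools_lock, and volatile-cast macros stand in for the kernel's READ_ONCE/WRITE_ONCE):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's READ_ONCE/WRITE_ONCE macros. */
#define READ_ONCE(x)     (*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct napi { unsigned int napi_id; };
struct pool { struct napi *napi; };

static pthread_mutex_t page_pools_lock = PTHREAD_MUTEX_INITIALIZER;

/* Control-path writer: detach the NAPI under the mutex, the way
 * page_pool_disable_direct_recycling() does in the patch. */
static void disable_direct_recycling(struct pool *pool)
{
	pthread_mutex_lock(&page_pools_lock);
	WRITE_ONCE(pool->napi, NULL);
	pthread_mutex_unlock(&page_pools_lock);
}

/* Netlink-style reader: the mutex keeps pool->napi from going away,
 * but napi_id is written under a different lock, hence READ_ONCE(). */
static unsigned int nl_fill_napi_id(struct pool *pool)
{
	unsigned int napi_id;

	pthread_mutex_lock(&page_pools_lock);
	napi_id = pool->napi ? READ_ONCE(pool->napi->napi_id) : 0;
	pthread_mutex_unlock(&page_pools_lock);
	return napi_id;
}
```

With this sketch, nl_fill_napi_id() returns the live ID while the NAPI is attached and 0 after disable_direct_recycling(), which mirrors the before/after netlink dumps below.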

Before this patch napi IDs were missing for mlx5:

 # ./cli.py --spec netlink/specs/netdev.yaml --dump page-pool-get

 [{'id': 144, 'ifindex': 2, 'inflight': 3072, 'inflight-mem': 12582912},
  {'id': 143, 'ifindex': 2, 'inflight': 5568, 'inflight-mem': 22806528},
  {'id': 142, 'ifindex': 2, 'inflight': 5120, 'inflight-mem': 20971520},
  {'id': 141, 'ifindex': 2, 'inflight': 4992, 'inflight-mem': 20447232},
  ...

After:

 [{'id': 144, 'ifindex': 2, 'inflight': 3072, 'inflight-mem': 12582912,
   'napi-id': 565},
  {'id': 143, 'ifindex': 2, 'inflight': 4224, 'inflight-mem': 17301504,
   'napi-id': 525},
  {'id': 142, 'ifindex': 2, 'inflight': 4288, 'inflight-mem': 17563648,
   'napi-id': 524},
  ...

Fixes: 86e25f40aa1e ("net: napi: Add napi_config")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: hawk@kernel.org
CC: ilias.apalodimas@linaro.org
CC: asml.silence@gmail.com
CC: almasrymina@google.com
CC: kaiyuanz@google.com
CC: willemb@google.com
CC: mkarsten@uwaterloo.ca
CC: jdamato@fastly.com
---
 include/net/page_pool/types.h |  1 -
 net/core/page_pool_priv.h     |  2 ++
 net/core/dev.c                |  2 +-
 net/core/page_pool.c          |  2 ++
 net/core/page_pool_user.c     | 15 +++++++++------
 5 files changed, 14 insertions(+), 8 deletions(-)

Comments

Mina Almasry Jan. 24, 2025, 9 p.m. UTC | #1
On Thu, Jan 23, 2025 at 3:16 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Page ppol tried to cache the NAPI ID in page pool info to avoid

Page pool

> having a dependency on the life cycle of the NAPI instance.
> Since commit under Fixes the NAPI ID is not populated until
> napi_enable() and there's a good chance that page pool is
> created before NAPI gets enabled.
>
> Protect the NAPI pointer with the existing page pool mutex,
> the reading path already holds it. napi_id itself we need

The reading paths in page_pool.c don't hold the lock, no? Only the
reading paths in page_pool_user.c seem to do.

I could not immediately wrap my head around why pool->p.napi can be
accessed in page_pool_napi_local with no lock, but needs to be
protected in the code in page_pool_user.c. It seems
READ_ONCE/WRITE_ONCE protection is good enough to make sure
page_pool_napi_local doesn't race with
page_pool_disable_direct_recycling in a way that can crash (the
reading code either sees a valid pointer or NULL). Why is that not
good enough to also synchronize the accesses between
page_pool_disable_direct_recycling and page_pool_nl_fill? I.e., drop
the locking?

Is there some guarantee the napi won't change/get freed while
page_pool_napi_local is running, but can change while
page_pool_nl_fill is running?

> to READ_ONCE(), it's protected by netdev_lock() which are
> not holding in page pool.
>
> Before this patch napi IDs were missing for mlx5:
>
>  # ./cli.py --spec netlink/specs/netdev.yaml --dump page-pool-get
>
>  [{'id': 144, 'ifindex': 2, 'inflight': 3072, 'inflight-mem': 12582912},
>   {'id': 143, 'ifindex': 2, 'inflight': 5568, 'inflight-mem': 22806528},
>   {'id': 142, 'ifindex': 2, 'inflight': 5120, 'inflight-mem': 20971520},
>   {'id': 141, 'ifindex': 2, 'inflight': 4992, 'inflight-mem': 20447232},
>   ...
>
> After:
>
>  [{'id': 144, 'ifindex': 2, 'inflight': 3072, 'inflight-mem': 12582912,
>    'napi-id': 565},
>   {'id': 143, 'ifindex': 2, 'inflight': 4224, 'inflight-mem': 17301504,
>    'napi-id': 525},
>   {'id': 142, 'ifindex': 2, 'inflight': 4288, 'inflight-mem': 17563648,
>    'napi-id': 524},
>   ...
>
> Fixes: 86e25f40aa1e ("net: napi: Add napi_config")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: hawk@kernel.org
> CC: ilias.apalodimas@linaro.org
> CC: asml.silence@gmail.com
> CC: almasrymina@google.com
> CC: kaiyuanz@google.com
> CC: willemb@google.com
> CC: mkarsten@uwaterloo.ca
> CC: jdamato@fastly.com
> ---
>  include/net/page_pool/types.h |  1 -
>  net/core/page_pool_priv.h     |  2 ++
>  net/core/dev.c                |  2 +-
>  net/core/page_pool.c          |  2 ++
>  net/core/page_pool_user.c     | 15 +++++++++------
>  5 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index ed4cd114180a..7f405672b089 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -237,7 +237,6 @@ struct page_pool {
>         struct {
>                 struct hlist_node list;
>                 u64 detach_time;
> -               u32 napi_id;
>                 u32 id;
>         } user;
>  };
> diff --git a/net/core/page_pool_priv.h b/net/core/page_pool_priv.h
> index 57439787b9c2..2fb06d5f6d55 100644
> --- a/net/core/page_pool_priv.h
> +++ b/net/core/page_pool_priv.h
> @@ -7,6 +7,8 @@
>
>  #include "netmem_priv.h"
>
> +extern struct mutex page_pools_lock;
> +
>  s32 page_pool_inflight(const struct page_pool *pool, bool strict);
>
>  int page_pool_list(struct page_pool *pool);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index afa2282f2604..07b2bb1ce64f 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6708,7 +6708,7 @@ void napi_resume_irqs(unsigned int napi_id)
>  static void __napi_hash_add_with_id(struct napi_struct *napi,
>                                     unsigned int napi_id)
>  {
> -       napi->napi_id = napi_id;
> +       WRITE_ONCE(napi->napi_id, napi_id);
>         hlist_add_head_rcu(&napi->napi_hash_node,
>                            &napi_hash[napi->napi_id % HASH_SIZE(napi_hash)]);
>  }
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index a3de752c5178..ed0f89373259 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -1147,7 +1147,9 @@ void page_pool_disable_direct_recycling(struct page_pool *pool)
>         WARN_ON(!test_bit(NAPI_STATE_SCHED, &pool->p.napi->state));
>         WARN_ON(READ_ONCE(pool->p.napi->list_owner) != -1);
>
> +       mutex_lock(&page_pools_lock);
>         WRITE_ONCE(pool->p.napi, NULL);
> +       mutex_unlock(&page_pools_lock);
>  }
>  EXPORT_SYMBOL(page_pool_disable_direct_recycling);
>
> diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
> index 48335766c1bf..6677e0c2e256 100644
> --- a/net/core/page_pool_user.c
> +++ b/net/core/page_pool_user.c
> @@ -3,6 +3,7 @@
>  #include <linux/mutex.h>
>  #include <linux/netdevice.h>
>  #include <linux/xarray.h>
> +#include <net/busy_poll.h>
>  #include <net/net_debug.h>
>  #include <net/netdev_rx_queue.h>
>  #include <net/page_pool/helpers.h>
> @@ -14,10 +15,11 @@
>  #include "netdev-genl-gen.h"
>
>  static DEFINE_XARRAY_FLAGS(page_pools, XA_FLAGS_ALLOC1);
> -/* Protects: page_pools, netdevice->page_pools, pool->slow.netdev, pool->user.
> +/* Protects: page_pools, netdevice->page_pools, pool->p.napi, pool->slow.netdev,
> + *     pool->user.
>   * Ordering: inside rtnl_lock
>   */
> -static DEFINE_MUTEX(page_pools_lock);
> +DEFINE_MUTEX(page_pools_lock);
>
>  /* Page pools are only reachable from user space (via netlink) if they are
>   * linked to a netdev at creation time. Following page pool "visibility"
> @@ -216,6 +218,7 @@ page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
>  {
>         struct net_devmem_dmabuf_binding *binding = pool->mp_priv;
>         size_t inflight, refsz;
> +       unsigned int napi_id;
>         void *hdr;
>
>         hdr = genlmsg_iput(rsp, info);
> @@ -229,8 +232,10 @@ page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
>             nla_put_u32(rsp, NETDEV_A_PAGE_POOL_IFINDEX,
>                         pool->slow.netdev->ifindex))
>                 goto err_cancel;
> -       if (pool->user.napi_id &&
> -           nla_put_uint(rsp, NETDEV_A_PAGE_POOL_NAPI_ID, pool->user.napi_id))
> +
> +       napi_id = pool->p.napi ? READ_ONCE(pool->p.napi->napi_id) : 0;

Following up on the above, I wonder if this can be made similar to the
code in page_pool_napi_local and work without the mutex protection:

napi = READ_ONCE(pool->p.napi);
if (napi)
   napi_id = READ_ONCE(napi->napi_id);

> +       if (napi_id >= MIN_NAPI_ID &&

I think this check is added to filter out 0? Nit: I would check for 0
here, since any non-zero napi_id should come from napi->napi_id,
which should be valid, but not a necessary change.

> +           nla_put_uint(rsp, NETDEV_A_PAGE_POOL_NAPI_ID, napi_id))
>                 goto err_cancel;
>
>         inflight = page_pool_inflight(pool, false);
> @@ -319,8 +324,6 @@ int page_pool_list(struct page_pool *pool)
>         if (pool->slow.netdev) {
>                 hlist_add_head(&pool->user.list,
>                                &pool->slow.netdev->page_pools);
> -               pool->user.napi_id = pool->p.napi ? pool->p.napi->napi_id : 0;
> -
>                 netdev_nl_page_pool_event(pool, NETDEV_CMD_PAGE_POOL_ADD_NTF);
>         }
>
> --
> 2.48.1
>
Toke Høiland-Jørgensen Jan. 24, 2025, 10:18 p.m. UTC | #2
Mina Almasry <almasrymina@google.com> writes:

> On Thu, Jan 23, 2025 at 3:16 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> Page ppol tried to cache the NAPI ID in page pool info to avoid
>
> Page pool
>
>> having a dependency on the life cycle of the NAPI instance.
>> Since commit under Fixes the NAPI ID is not populated until
>> napi_enable() and there's a good chance that page pool is
>> created before NAPI gets enabled.
>>
>> Protect the NAPI pointer with the existing page pool mutex,
>> the reading path already holds it. napi_id itself we need
>
> The reading paths in page_pool.c don't hold the lock, no? Only the
> reading paths in page_pool_user.c seem to do.
>
> I could not immediately wrap my head around why pool->p.napi can be
> accessed in page_pool_napi_local with no lock, but needs to be
> protected in the code in page_pool_user.c. It seems
> READ_ONCE/WRITE_ONCE protection is good enough to make sure
> page_pool_napi_local doesn't race with
> page_pool_disable_direct_recycling in a way that can crash (the
> reading code either sees a valid pointer or NULL). Why is that not
> good enough to also synchronize the accesses between
> page_pool_disable_direct_recycling and page_pool_nl_fill? I.e., drop
> the locking?

It actually seems that this is *not* currently the case. See the
discussion here:

https://lore.kernel.org/all/8734h8qgmz.fsf@toke.dk/

IMO (as indicated in the message linked above), we should require users
to destroy the page pool before freeing the NAPI memory, rather than add
additional synchronisation.

-Toke
Mina Almasry Jan. 24, 2025, 11:49 p.m. UTC | #3
On Fri, Jan 24, 2025 at 2:18 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Mina Almasry <almasrymina@google.com> writes:
>
> > On Thu, Jan 23, 2025 at 3:16 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >>
> >> Page ppol tried to cache the NAPI ID in page pool info to avoid
> >
> > Page pool
> >
> >> having a dependency on the life cycle of the NAPI instance.
> >> Since commit under Fixes the NAPI ID is not populated until
> >> napi_enable() and there's a good chance that page pool is
> >> created before NAPI gets enabled.
> >>
> >> Protect the NAPI pointer with the existing page pool mutex,
> >> the reading path already holds it. napi_id itself we need
> >
> > The reading paths in page_pool.c don't hold the lock, no? Only the
> > reading paths in page_pool_user.c seem to do.
> >
> > I could not immediately wrap my head around why pool->p.napi can be
> > accessed in page_pool_napi_local with no lock, but needs to be
> > protected in the code in page_pool_user.c. It seems
> > READ_ONCE/WRITE_ONCE protection is good enough to make sure
> > page_pool_napi_local doesn't race with
> > page_pool_disable_direct_recycling in a way that can crash (the
> > reading code either sees a valid pointer or NULL). Why is that not
> > good enough to also synchronize the accesses between
> > page_pool_disable_direct_recycling and page_pool_nl_fill? I.e., drop
> > the locking?
>
> It actually seems that this is *not* currently the case. See the
> discussion here:
>
> https://lore.kernel.org/all/8734h8qgmz.fsf@toke.dk/
>
> IMO (as indicated in the message linked above), we should require users
> to destroy the page pool before freeing the NAPI memory, rather than add
> additional synchronisation.
>

Ah, I see. I wonder if we should make this part of the API via comment
and/or add DEBUG_NET_WARN_ON to catch misuse, something like:

diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index ed4cd114180a..3919ca302e95 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -257,6 +257,10 @@ struct xdp_mem_info;

 #ifdef CONFIG_PAGE_POOL
 void page_pool_disable_direct_recycling(struct page_pool *pool);
+
+/* page_pool_destroy or page_pool_disable_direct_recycling must be
called before
+ * netif_napi_del if pool->p.napi is set.
+ */
 void page_pool_destroy(struct page_pool *pool);
 void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
                           const struct xdp_mem_info *mem);

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 5c4b788b811b..dc82767b2516 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1161,6 +1161,8 @@ void page_pool_destroy(struct page_pool *pool)
        if (!page_pool_put(pool))
                return;

+       DEBUG_NET_WARN_ON(pool->p.napi && !napi_is_valid(pool->p.napi));
+
        page_pool_disable_direct_recycling(pool);
        page_pool_free_frag(pool);

I also did a quick spot check - which could be wrong - but it seems
to me both gve and bnxt free the napi before destroying the pool :(

But I think this entire discussion is unrelated to this patch, and
the mutex sync in this patch seems necessary for the page_pool_user.c
code, which runs outside of softirq context:

Reviewed-by: Mina Almasry <almasrymina@google.com>
Jakub Kicinski Jan. 25, 2025, 1:31 a.m. UTC | #4
On Fri, 24 Jan 2025 23:18:08 +0100 Toke Høiland-Jørgensen wrote:
> > The reading paths in page_pool.c don't hold the lock, no? Only the
> > reading paths in page_pool_user.c seem to do.
> >
> > I could not immediately wrap my head around why pool->p.napi can be
> > accessed in page_pool_napi_local with no lock, but needs to be
> > protected in the code in page_pool_user.c. It seems
> > READ_ONCE/WRITE_ONCE protection is good enough to make sure
> > page_pool_napi_local doesn't race with
> > page_pool_disable_direct_recycling in a way that can crash (the
> > reading code either sees a valid pointer or NULL). Why is that not
> > good enough to also synchronize the accesses between
> > page_pool_disable_direct_recycling and page_pool_nl_fill? I.e., drop
> > the locking?  
> 
> It actually seems that this is *not* currently the case. See the
> discussion here:
> 
> https://lore.kernel.org/all/8734h8qgmz.fsf@toke.dk/
> 
> IMO (as indicated in the message linked above), we should require users
> to destroy the page pool before freeing the NAPI memory, rather than add
> additional synchronisation.

Agreed in general but this is a slightly different case. 
This sequence should be legal IMHO:

  page_pool_disable_direct_recycling()
  napi_disable()
  netif_napi_del()
  # free NAPI
  page_pool_destroy()

I'm not saying it's a good idea! But since
page_pool_disable_direct_recycling() detaches the NAPI,
logically someone could assume the above works.

I agree with you on datapath accesses, as discussed in the thread you
linked. But here reader is not under RCU, so the RCU sync in NAPI
destruction does not protect us from reader stalling for a long time.
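
For contrast, the lockless datapath pattern being discussed can be sketched as a hypothetical userspace analogue of page_pool_napi_local(): the READ_ONCE() handoff means the reader sees either a valid pointer or NULL, which is sufficient only because the softirq reader sits inside the RCU grace period that delays freeing the NAPI:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's READ_ONCE macro. */
#define READ_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct napi { int list_owner; };
struct pool { struct napi *napi; };

/* Lockless check in the style of page_pool_napi_local(): no mutex,
 * just a READ_ONCE() of the pointer. Safe on the datapath because
 * RCU delays freeing the NAPI while the softirq reader runs; NOT
 * safe for the netlink reader, which is not under RCU and may
 * stall for an arbitrarily long time. */
static int napi_is_local(struct pool *pool, int cpu)
{
	struct napi *napi = READ_ONCE(pool->napi);

	return napi && READ_ONCE(napi->list_owner) == cpu;
}
```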

Patch

diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index ed4cd114180a..7f405672b089 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -237,7 +237,6 @@  struct page_pool {
 	struct {
 		struct hlist_node list;
 		u64 detach_time;
-		u32 napi_id;
 		u32 id;
 	} user;
 };
diff --git a/net/core/page_pool_priv.h b/net/core/page_pool_priv.h
index 57439787b9c2..2fb06d5f6d55 100644
--- a/net/core/page_pool_priv.h
+++ b/net/core/page_pool_priv.h
@@ -7,6 +7,8 @@ 
 
 #include "netmem_priv.h"
 
+extern struct mutex page_pools_lock;
+
 s32 page_pool_inflight(const struct page_pool *pool, bool strict);
 
 int page_pool_list(struct page_pool *pool);
diff --git a/net/core/dev.c b/net/core/dev.c
index afa2282f2604..07b2bb1ce64f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6708,7 +6708,7 @@  void napi_resume_irqs(unsigned int napi_id)
 static void __napi_hash_add_with_id(struct napi_struct *napi,
 				    unsigned int napi_id)
 {
-	napi->napi_id = napi_id;
+	WRITE_ONCE(napi->napi_id, napi_id);
 	hlist_add_head_rcu(&napi->napi_hash_node,
 			   &napi_hash[napi->napi_id % HASH_SIZE(napi_hash)]);
 }
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index a3de752c5178..ed0f89373259 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1147,7 +1147,9 @@  void page_pool_disable_direct_recycling(struct page_pool *pool)
 	WARN_ON(!test_bit(NAPI_STATE_SCHED, &pool->p.napi->state));
 	WARN_ON(READ_ONCE(pool->p.napi->list_owner) != -1);
 
+	mutex_lock(&page_pools_lock);
 	WRITE_ONCE(pool->p.napi, NULL);
+	mutex_unlock(&page_pools_lock);
 }
 EXPORT_SYMBOL(page_pool_disable_direct_recycling);
 
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index 48335766c1bf..6677e0c2e256 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -3,6 +3,7 @@ 
 #include <linux/mutex.h>
 #include <linux/netdevice.h>
 #include <linux/xarray.h>
+#include <net/busy_poll.h>
 #include <net/net_debug.h>
 #include <net/netdev_rx_queue.h>
 #include <net/page_pool/helpers.h>
@@ -14,10 +15,11 @@ 
 #include "netdev-genl-gen.h"
 
 static DEFINE_XARRAY_FLAGS(page_pools, XA_FLAGS_ALLOC1);
-/* Protects: page_pools, netdevice->page_pools, pool->slow.netdev, pool->user.
+/* Protects: page_pools, netdevice->page_pools, pool->p.napi, pool->slow.netdev,
+ *	pool->user.
  * Ordering: inside rtnl_lock
  */
-static DEFINE_MUTEX(page_pools_lock);
+DEFINE_MUTEX(page_pools_lock);
 
 /* Page pools are only reachable from user space (via netlink) if they are
  * linked to a netdev at creation time. Following page pool "visibility"
@@ -216,6 +218,7 @@  page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
 {
 	struct net_devmem_dmabuf_binding *binding = pool->mp_priv;
 	size_t inflight, refsz;
+	unsigned int napi_id;
 	void *hdr;
 
 	hdr = genlmsg_iput(rsp, info);
@@ -229,8 +232,10 @@  page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
 	    nla_put_u32(rsp, NETDEV_A_PAGE_POOL_IFINDEX,
 			pool->slow.netdev->ifindex))
 		goto err_cancel;
-	if (pool->user.napi_id &&
-	    nla_put_uint(rsp, NETDEV_A_PAGE_POOL_NAPI_ID, pool->user.napi_id))
+
+	napi_id = pool->p.napi ? READ_ONCE(pool->p.napi->napi_id) : 0;
+	if (napi_id >= MIN_NAPI_ID &&
+	    nla_put_uint(rsp, NETDEV_A_PAGE_POOL_NAPI_ID, napi_id))
 		goto err_cancel;
 
 	inflight = page_pool_inflight(pool, false);
@@ -319,8 +324,6 @@  int page_pool_list(struct page_pool *pool)
 	if (pool->slow.netdev) {
 		hlist_add_head(&pool->user.list,
 			       &pool->slow.netdev->page_pools);
-		pool->user.napi_id = pool->p.napi ? pool->p.napi->napi_id : 0;
-
 		netdev_nl_page_pool_event(pool, NETDEV_CMD_PAGE_POOL_ADD_NTF);
 	}