Message ID | 20250123231620.1086401-1-kuba@kernel.org (mailing list archive)
---|---
State | New
Delegated to: | Netdev Maintainers
Series | [net] net: page_pool: don't try to stash the napi id
On Thu, Jan 23, 2025 at 3:16 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Page ppol tried to cache the NAPI ID in page pool info to avoid

Page pool

> having a dependency on the life cycle of the NAPI instance.
> Since commit under Fixes the NAPI ID is not populated until
> napi_enable() and there's a good chance that page pool is
> created before NAPI gets enabled.
>
> Protect the NAPI pointer with the existing page pool mutex,
> the reading path already holds it. napi_id itself we need

The reading paths in page_pool.c don't hold the lock, no? Only the
reading paths in page_pool_user.c seem to do.

I could not immediately wrap my head around why pool->p.napi can be
accessed in page_pool_napi_local with no lock, but needs to be
protected in the code in page_pool_user.c. It seems READ_ONCE/WRITE_ONCE
protection is good enough to make sure page_pool_napi_local doesn't
race with page_pool_disable_direct_recycling in a way that can crash
(the reading code either sees a valid pointer or NULL). Why is that not
good enough to also synchronize the accesses between
page_pool_disable_direct_recycling and page_pool_nl_fill? I.e., drop
the locking? Is there some guarantee the napi won't change/get freed
while page_pool_napi_local is running, but can change while
page_pool_nl_fill is running?

> to READ_ONCE(), it's protected by netdev_lock() which are
> not holding in page pool.
>
> Before this patch napi IDs were missing for mlx5:
>
>   # ./cli.py --spec netlink/specs/netdev.yaml --dump page-pool-get
>
> [{'id': 144, 'ifindex': 2, 'inflight': 3072, 'inflight-mem': 12582912},
>  {'id': 143, 'ifindex': 2, 'inflight': 5568, 'inflight-mem': 22806528},
>  {'id': 142, 'ifindex': 2, 'inflight': 5120, 'inflight-mem': 20971520},
>  {'id': 141, 'ifindex': 2, 'inflight': 4992, 'inflight-mem': 20447232},
> ...
>
> After:
>
> [{'id': 144, 'ifindex': 2, 'inflight': 3072, 'inflight-mem': 12582912,
>   'napi-id': 565},
>  {'id': 143, 'ifindex': 2, 'inflight': 4224, 'inflight-mem': 17301504,
>   'napi-id': 525},
>  {'id': 142, 'ifindex': 2, 'inflight': 4288, 'inflight-mem': 17563648,
>   'napi-id': 524},
> ...
>
> Fixes: 86e25f40aa1e ("net: napi: Add napi_config")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: hawk@kernel.org
> CC: ilias.apalodimas@linaro.org
> CC: asml.silence@gmail.com
> CC: almasrymina@google.com
> CC: kaiyuanz@google.com
> CC: willemb@google.com
> CC: mkarsten@uwaterloo.ca
> CC: jdamato@fastly.com
> ---
>  include/net/page_pool/types.h |  1 -
>  net/core/page_pool_priv.h     |  2 ++
>  net/core/dev.c                |  2 +-
>  net/core/page_pool.c          |  2 ++
>  net/core/page_pool_user.c     | 15 +++++++++------
>  5 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index ed4cd114180a..7f405672b089 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -237,7 +237,6 @@ struct page_pool {
>  	struct {
>  		struct hlist_node list;
>  		u64 detach_time;
> -		u32 napi_id;
>  		u32 id;
>  	} user;
>  };
> diff --git a/net/core/page_pool_priv.h b/net/core/page_pool_priv.h
> index 57439787b9c2..2fb06d5f6d55 100644
> --- a/net/core/page_pool_priv.h
> +++ b/net/core/page_pool_priv.h
> @@ -7,6 +7,8 @@
>
>  #include "netmem_priv.h"
>
> +extern struct mutex page_pools_lock;
> +
>  s32 page_pool_inflight(const struct page_pool *pool, bool strict);
>
>  int page_pool_list(struct page_pool *pool);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index afa2282f2604..07b2bb1ce64f 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6708,7 +6708,7 @@ void napi_resume_irqs(unsigned int napi_id)
>  static void __napi_hash_add_with_id(struct napi_struct *napi,
>  				    unsigned int napi_id)
>  {
> -	napi->napi_id = napi_id;
> +	WRITE_ONCE(napi->napi_id, napi_id);
>  	hlist_add_head_rcu(&napi->napi_hash_node,
>  			   &napi_hash[napi->napi_id % HASH_SIZE(napi_hash)]);
>  }
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index a3de752c5178..ed0f89373259 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -1147,7 +1147,9 @@ void page_pool_disable_direct_recycling(struct page_pool *pool)
>  	WARN_ON(!test_bit(NAPI_STATE_SCHED, &pool->p.napi->state));
>  	WARN_ON(READ_ONCE(pool->p.napi->list_owner) != -1);
>
> +	mutex_lock(&page_pools_lock);
>  	WRITE_ONCE(pool->p.napi, NULL);
> +	mutex_unlock(&page_pools_lock);
>  }
>  EXPORT_SYMBOL(page_pool_disable_direct_recycling);
>
> diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
> index 48335766c1bf..6677e0c2e256 100644
> --- a/net/core/page_pool_user.c
> +++ b/net/core/page_pool_user.c
> @@ -3,6 +3,7 @@
>  #include <linux/mutex.h>
>  #include <linux/netdevice.h>
>  #include <linux/xarray.h>
> +#include <net/busy_poll.h>
>  #include <net/net_debug.h>
>  #include <net/netdev_rx_queue.h>
>  #include <net/page_pool/helpers.h>
> @@ -14,10 +15,11 @@
>  #include "netdev-genl-gen.h"
>
>  static DEFINE_XARRAY_FLAGS(page_pools, XA_FLAGS_ALLOC1);
> -/* Protects: page_pools, netdevice->page_pools, pool->slow.netdev, pool->user.
> +/* Protects: page_pools, netdevice->page_pools, pool->p.napi, pool->slow.netdev,
> + * pool->user.
>   * Ordering: inside rtnl_lock
>   */
> -static DEFINE_MUTEX(page_pools_lock);
> +DEFINE_MUTEX(page_pools_lock);
>
>  /* Page pools are only reachable from user space (via netlink) if they are
>   * linked to a netdev at creation time. Following page pool "visibility"
> @@ -216,6 +218,7 @@ page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
>  {
>  	struct net_devmem_dmabuf_binding *binding = pool->mp_priv;
>  	size_t inflight, refsz;
> +	unsigned int napi_id;
>  	void *hdr;
>
>  	hdr = genlmsg_iput(rsp, info);
> @@ -229,8 +232,10 @@ page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
>  	    nla_put_u32(rsp, NETDEV_A_PAGE_POOL_IFINDEX,
>  			pool->slow.netdev->ifindex))
>  		goto err_cancel;
> -	if (pool->user.napi_id &&
> -	    nla_put_uint(rsp, NETDEV_A_PAGE_POOL_NAPI_ID, pool->user.napi_id))
> +
> +	napi_id = pool->p.napi ? READ_ONCE(pool->p.napi->napi_id) : 0;

Following up on the above, I wonder if this can be made similar to the
code in page_pool_napi_local to work without the mutex protection:

	napi = READ_ONCE(pool->p.napi);
	if (napi)
		napi_id = READ_ONCE(napi->napi_id);

> +	if (napi_id >= MIN_NAPI_ID &&

I think this check is added to filter out 0? Nit: I would check for 0
here, since any non-zero napi_id should come from napi->napi_id, which
should be valid, but this is not a necessary change.

> +	    nla_put_uint(rsp, NETDEV_A_PAGE_POOL_NAPI_ID, napi_id))
>  		goto err_cancel;
>
>  	inflight = page_pool_inflight(pool, false);
> @@ -319,8 +324,6 @@ int page_pool_list(struct page_pool *pool)
>  	if (pool->slow.netdev) {
>  		hlist_add_head(&pool->user.list,
>  			       &pool->slow.netdev->page_pools);
> -		pool->user.napi_id = pool->p.napi ? pool->p.napi->napi_id : 0;
> -
>  		netdev_nl_page_pool_event(pool, NETDEV_CMD_PAGE_POOL_ADD_NTF);
>  	}
>
> --
> 2.48.1
>
Mina Almasry <almasrymina@google.com> writes:

> On Thu, Jan 23, 2025 at 3:16 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> Page ppol tried to cache the NAPI ID in page pool info to avoid
>
> Page pool
>
>> having a dependency on the life cycle of the NAPI instance.
>> Since commit under Fixes the NAPI ID is not populated until
>> napi_enable() and there's a good chance that page pool is
>> created before NAPI gets enabled.
>>
>> Protect the NAPI pointer with the existing page pool mutex,
>> the reading path already holds it. napi_id itself we need
>
> The reading paths in page_pool.c don't hold the lock, no? Only the
> reading paths in page_pool_user.c seem to do.
>
> I could not immediately wrap my head around why pool->p.napi can be
> accessed in page_pool_napi_local with no lock, but needs to be
> protected in the code in page_pool_user.c. It seems
> READ_ONCE/WRITE_ONCE protection is good enough to make sure
> page_pool_napi_local doesn't race with
> page_pool_disable_direct_recycling in a way that can crash (the
> reading code either sees a valid pointer or NULL). Why is that not
> good enough to also synchronize the accesses between
> page_pool_disable_direct_recycling and page_pool_nl_fill? I.e., drop
> the locking?

It actually seems that this is *not* currently the case. See the
discussion here:

https://lore.kernel.org/all/8734h8qgmz.fsf@toke.dk/

IMO (as indicated in the message linked above), we should require users
to destroy the page pool before freeing the NAPI memory, rather than
add additional synchronisation.

-Toke
On Fri, Jan 24, 2025 at 2:18 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Mina Almasry <almasrymina@google.com> writes:
>
> > On Thu, Jan 23, 2025 at 3:16 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >>
> >> Page ppol tried to cache the NAPI ID in page pool info to avoid
> >
> > Page pool
> >
> >> having a dependency on the life cycle of the NAPI instance.
> >> Since commit under Fixes the NAPI ID is not populated until
> >> napi_enable() and there's a good chance that page pool is
> >> created before NAPI gets enabled.
> >>
> >> Protect the NAPI pointer with the existing page pool mutex,
> >> the reading path already holds it. napi_id itself we need
> >
> > The reading paths in page_pool.c don't hold the lock, no? Only the
> > reading paths in page_pool_user.c seem to do.
> >
> > I could not immediately wrap my head around why pool->p.napi can be
> > accessed in page_pool_napi_local with no lock, but needs to be
> > protected in the code in page_pool_user.c. It seems
> > READ_ONCE/WRITE_ONCE protection is good enough to make sure
> > page_pool_napi_local doesn't race with
> > page_pool_disable_direct_recycling in a way that can crash (the
> > reading code either sees a valid pointer or NULL). Why is that not
> > good enough to also synchronize the accesses between
> > page_pool_disable_direct_recycling and page_pool_nl_fill? I.e., drop
> > the locking?
>
> It actually seems that this is *not* currently the case. See the
> discussion here:
>
> https://lore.kernel.org/all/8734h8qgmz.fsf@toke.dk/
>
> IMO (as indicated in the message linked above), we should require users
> to destroy the page pool before freeing the NAPI memory, rather than
> add additional synchronisation.

Ah, I see.
I wonder if we should make this part of the API via comment and/or add
DEBUG_NET_WARN_ON to catch misuse, something like:

diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index ed4cd114180a..3919ca302e95 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -257,6 +257,10 @@ struct xdp_mem_info;

 #ifdef CONFIG_PAGE_POOL
 void page_pool_disable_direct_recycling(struct page_pool *pool);
+
+/* page_pool_destroy or page_pool_disable_direct_recycling must be called
+ * before netif_napi_del if pool->p.napi is set.
+ */
 void page_pool_destroy(struct page_pool *pool);
 void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
			    const struct xdp_mem_info *mem);
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 5c4b788b811b..dc82767b2516 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1161,6 +1161,8 @@ void page_pool_destroy(struct page_pool *pool)
 	if (!page_pool_put(pool))
 		return;

+	DEBUG_NET_WARN_ON(pool->p.napi && !napi_is_valid(pool->p.napi));
+
 	page_pool_disable_direct_recycling(pool);
 	page_pool_free_frag(pool);

I also took a quick spot check - which could be wrong - but it seems to
me both gve and bnxt free the napi before destroying the pool :(

But I think this entire discussion is unrelated to this patch, and the
mutex sync in this patch seems necessary for the page_pool_user.c code
which runs outside of softirq context:

Reviewed-by: Mina Almasry <almasrymina@google.com>
On Fri, 24 Jan 2025 23:18:08 +0100 Toke Høiland-Jørgensen wrote:
> > The reading paths in page_pool.c don't hold the lock, no? Only the
> > reading paths in page_pool_user.c seem to do.
> >
> > I could not immediately wrap my head around why pool->p.napi can be
> > accessed in page_pool_napi_local with no lock, but needs to be
> > protected in the code in page_pool_user.c. It seems
> > READ_ONCE/WRITE_ONCE protection is good enough to make sure
> > page_pool_napi_local doesn't race with
> > page_pool_disable_direct_recycling in a way that can crash (the
> > reading code either sees a valid pointer or NULL). Why is that not
> > good enough to also synchronize the accesses between
> > page_pool_disable_direct_recycling and page_pool_nl_fill? I.e., drop
> > the locking?
>
> It actually seems that this is *not* currently the case. See the
> discussion here:
>
> https://lore.kernel.org/all/8734h8qgmz.fsf@toke.dk/
>
> IMO (as indicated in the message linked above), we should require users
> to destroy the page pool before freeing the NAPI memory, rather than
> add additional synchronisation.

Agreed in general, but this is a slightly different case.
This sequence should be legal IMHO:

	page_pool_disable_direct_recycling()
	napi_disable()
	netif_napi_del()	# free NAPI
	page_pool_destroy()

I'm not saying it's a good idea! But since
page_pool_disable_direct_recycling() detaches the NAPI, logically
someone could assume the above works.

I agree with you on datapath accesses, as discussed in the thread you
linked. But here the reader is not under RCU, so the RCU sync in NAPI
destruction does not protect us from the reader stalling for a long
time.