diff mbox series

[RFC,net-next,1/4] net: skbuff: don't include <net/page_pool.h> to <linux/skbuff.h>

Message ID 20230629152305.905962-2-aleksander.lobakin@intel.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series net: page_pool: a couple assorted optimizations | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next, async
netdev/apply fail Patch does not apply to net-next

Commit Message

Alexander Lobakin June 29, 2023, 3:23 p.m. UTC
Currently, touching <net/page_pool.h> triggers a rebuild of more than
a half of the kernel. That's because it's included in <linux/skbuff.h>.
And each new include to page_pool.h adds more [useless] data for the
toolchain to process per each source file from that pile.

In commit 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB
recycling"), Matteo included it to be able to call a couple functions
defined there. Then, in commit 57f05bc2ab24 ("page_pool: keep pp info
as long as page pool owns the page") one of the calls was removed, so
only one left. It's the call to page_pool_return_skb_page() in
napi_frag_unref(). The function is external and doesn't have any
dependencies. Having include of very niche page_pool.h only for that
looks like an overkill.
As Alex noticed, the only thing that holds this function in page_pool.c
is %PP_SIGNATURE. By moving the check for magic a couple functions up,
the whole page_pool_return_skb_page() can be moved to skbuff.c.
The arguments for moving the check are the following:

1) It checks for a corner case that shouldn't ever happen when the code
   is sane. And execution speed doesn't matter on exception path, thus
   doing more calls before bailing out doesn't make any weather.
2) There are 2 users of the internal __page_pool_put_page(), where this
   check is moved: page_pool_put_defragged_page() and
   page_pool_put_page_bulk(). Both are exported and can be called from
   the drivers, but previously only the skb recycling path of the former
   was protected with the magic check. So this also makes the code a bit
   more reliable.

After the check is moved, teleport page_pool_return_skb_page() to
skbuff.c, just next to the main consumer, skb_pp_recycle(). It's used
also in napi_frag_unref() -> {__,}skb_frag_unref(), so no `static`
unfortunately. Maybe next time.
Now, after a few include fixes in the drivers, touching page_pool.h
only triggers rebuilding of the drivers using it and a couple core
networking files.

Suggested-by: Jakub Kicinski <kuba@kernel.org> # make skbuff.h less heavy
Suggested-by: Alexander Duyck <alexanderduyck@fb.com> # move to skbuff.c
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 drivers/net/ethernet/engleder/tsnep_main.c    |  1 +
 drivers/net/ethernet/freescale/fec_main.c     |  1 +
 .../marvell/octeontx2/nic/otx2_common.c       |  1 +
 .../ethernet/marvell/octeontx2/nic/otx2_pf.c  |  1 +
 .../ethernet/mellanox/mlx5/core/en/params.c   |  1 +
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  1 +
 drivers/net/wireless/mediatek/mt76/mt76.h     |  1 +
 include/linux/skbuff.h                        |  3 +-
 include/net/page_pool.h                       |  2 -
 net/core/page_pool.c                          | 52 +++++--------------
 net/core/skbuff.c                             | 28 ++++++++++
 11 files changed, 50 insertions(+), 42 deletions(-)

Comments

Alexander Duyck June 29, 2023, 4:55 p.m. UTC | #1
On Thu, 2023-06-29 at 17:23 +0200, Alexander Lobakin wrote:
> Currently, touching <net/page_pool.h> triggers a rebuild of more than
> a half of the kernel. That's because it's included in <linux/skbuff.h>.
> And each new include to page_pool.h adds more [useless] data for the
> toolchain to process per each source file from that pile.
> 
> In commit 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB
> recycling"), Matteo included it to be able to call a couple functions
> defined there. Then, in commit 57f05bc2ab24 ("page_pool: keep pp info
> as long as page pool owns the page") one of the calls was removed, so
> only one left. It's the call to page_pool_return_skb_page() in
> napi_frag_unref(). The function is external and doesn't have any
> dependencies. Having include of very niche page_pool.h only for that
> looks like an overkill.
> As Alex noticed, the only thing that holds this function in page_pool.c
> is %PP_SIGNATURE. By moving the check for magic a couple functions up,
> the whole page_pool_return_skb_page() can be moved to skbuff.c.
> The arguments for moving the check are the following:
> 
> 1) It checks for a corner case that shouldn't ever happen when the code
>    is sane. And execution speed doesn't matter on exception path, thus
>    doing more calls before bailing out doesn't make any weather.
> 2) There are 2 users of the internal __page_pool_put_page(), where this
>    check is moved: page_pool_put_defragged_page() and
>    page_pool_put_page_bulk(). Both are exported and can be called from
>    the drivers, but previously only the skb recycling path of the former
>    was protected with the magic check. So this also makes the code a bit
>    more reliable.
> 
> After the check is moved, teleport page_pool_return_skb_page() to
> skbuff.c, just next to the main consumer, skb_pp_recycle(). It's used
> also in napi_frag_unref() -> {__,}skb_frag_unref(), so no `static`
> unfortunately. Maybe next time.
> Now, after a few include fixes in the drivers, touching page_pool.h
> only triggers rebuilding of the drivers using it and a couple core
> networking files.
> 
> Suggested-by: Jakub Kicinski <kuba@kernel.org> # make skbuff.h less heavy
> Suggested-by: Alexander Duyck <alexanderduyck@fb.com> # move to skbuff.c
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
>  drivers/net/ethernet/engleder/tsnep_main.c    |  1 +
>  drivers/net/ethernet/freescale/fec_main.c     |  1 +
>  .../marvell/octeontx2/nic/otx2_common.c       |  1 +
>  .../ethernet/marvell/octeontx2/nic/otx2_pf.c  |  1 +
>  .../ethernet/mellanox/mlx5/core/en/params.c   |  1 +
>  .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  1 +
>  drivers/net/wireless/mediatek/mt76/mt76.h     |  1 +
>  include/linux/skbuff.h                        |  3 +-
>  include/net/page_pool.h                       |  2 -
>  net/core/page_pool.c                          | 52 +++++--------------
>  net/core/skbuff.c                             | 28 ++++++++++
>  11 files changed, 50 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/net/ethernet/engleder/tsnep_main.c b/drivers/net/ethernet/engleder/tsnep_main.c
> index 84751bb303a6..6222aaa5157f 100644
> --- a/drivers/net/ethernet/engleder/tsnep_main.c
> +++ b/drivers/net/ethernet/engleder/tsnep_main.c
> @@ -28,6 +28,7 @@
>  #include <linux/iopoll.h>
>  #include <linux/bpf.h>
>  #include <linux/bpf_trace.h>
> +#include <net/page_pool.h>
>  #include <net/xdp_sock_drv.h>
>  
>  #define TSNEP_RX_OFFSET (max(NET_SKB_PAD, XDP_PACKET_HEADROOM) + NET_IP_ALIGN)
> diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> index 8fbe47703d47..99b3b4f79603 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -38,6 +38,7 @@
>  #include <linux/in.h>
>  #include <linux/ip.h>
>  #include <net/ip.h>
> +#include <net/page_pool.h>
>  #include <net/selftests.h>
>  #include <net/tso.h>
>  #include <linux/tcp.h>
> diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> index a5d03583bf79..d17a0ebc9036 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> @@ -7,6 +7,7 @@
>  
>  #include <linux/interrupt.h>
>  #include <linux/pci.h>
> +#include <net/page_pool.h>
>  #include <net/tso.h>
>  #include <linux/bitfield.h>
>  
> diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
> index fe8ea4e531b7..7eca434a0550 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
> @@ -16,6 +16,7 @@
>  #include <linux/bpf.h>
>  #include <linux/bpf_trace.h>
>  #include <linux/bitfield.h>
> +#include <net/page_pool.h>
>  
>  #include "otx2_reg.h"
>  #include "otx2_common.h"
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
> index 5ce28ff7685f..0f152f14165b 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
> @@ -6,6 +6,7 @@
>  #include "en/port.h"
>  #include "en_accel/en_accel.h"
>  #include "en_accel/ipsec.h"
> +#include <net/page_pool.h>
>  #include <net/xdp_sock_drv.h>
>  
>  static u8 mlx5e_mpwrq_min_page_shift(struct mlx5_core_dev *mdev)
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> index f0e6095809fa..1bd91bc09eb8 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> @@ -35,6 +35,7 @@
>  #include "en/xdp.h"
>  #include "en/params.h"
>  #include <linux/bitfield.h>
> +#include <net/page_pool.h>
>  
>  int mlx5e_xdp_max_mtu(struct mlx5e_params *params, struct mlx5e_xsk_param *xsk)
>  {
> diff --git a/drivers/net/wireless/mediatek/mt76/mt76.h b/drivers/net/wireless/mediatek/mt76/mt76.h
> index 6b07b8fafec2..95c16f11d156 100644
> --- a/drivers/net/wireless/mediatek/mt76/mt76.h
> +++ b/drivers/net/wireless/mediatek/mt76/mt76.h
> @@ -15,6 +15,7 @@
>  #include <linux/average.h>
>  #include <linux/soc/mediatek/mtk_wed.h>
>  #include <net/mac80211.h>
> +#include <net/page_pool.h>
>  #include "util.h"
>  #include "testmode.h"
>  
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 91ed66952580..f76d172ed262 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -32,7 +32,6 @@
>  #include <linux/if_packet.h>
>  #include <linux/llist.h>
>  #include <net/flow.h>
> -#include <net/page_pool.h>
>  #if IS_ENABLED(CONFIG_NF_CONNTRACK)
>  #include <linux/netfilter/nf_conntrack_common.h>
>  #endif
> @@ -3423,6 +3422,8 @@ static inline void skb_frag_ref(struct sk_buff *skb, int f)
>  	__skb_frag_ref(&skb_shinfo(skb)->frags[f]);
>  }
>  
> +bool page_pool_return_skb_page(struct page *page, bool napi_safe);
> +
>  static inline void
>  napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe)
>  {
> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> index 2b7db9992fc0..829dc1f8ba6b 100644
> --- a/include/net/page_pool.h
> +++ b/include/net/page_pool.h
> @@ -307,8 +307,6 @@ inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool)
>  	return pool->p.dma_dir;
>  }
>  
> -bool page_pool_return_skb_page(struct page *page, bool napi_safe);
> -
>  struct page_pool *page_pool_create(const struct page_pool_params *params);
>  
>  struct xdp_mem_info;
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 985ccaffc06a..dff0b4fa2316 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -582,6 +582,19 @@ static __always_inline struct page *
>  __page_pool_put_page(struct page_pool *pool, struct page *page,
>  		     unsigned int dma_sync_size, bool allow_direct)
>  {
> +	/* Avoid recycling non-PP pages, give them back to the page allocator.
> +	 * page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
> +	 * in order to preserve any existing bits, such as bit 0 for the
> +	 * head page of compound page and bit 1 for pfmemalloc page, so
> +	 * mask those bits for freeing side when doing below checking,
> +	 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
> +	 * to avoid recycling the pfmemalloc page.
> +	 */
> +	if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE)) {
> +		put_page(page);
> +		return NULL;
> +	}
> +

Rather than moving this block of code down into here. You may just want
to look at creating an inline function that would act as a accessor for
retrieving the page pool for pages with the signature, and for those
without just returning NULL.

>  	/* This allocator is optimized for the XDP mode that uses
>  	 * one-frame-per-page, but have fallbacks that act like the
>  	 * regular page allocator APIs.
> @@ -913,42 +926,3 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
>  	}
>  }
>  EXPORT_SYMBOL(page_pool_update_nid);
> -
> -bool page_pool_return_skb_page(struct page *page, bool napi_safe)
> -{
> -	struct napi_struct *napi;
> -	struct page_pool *pp;
> -	bool allow_direct;
> -
> -	page = compound_head(page);
> -
> -	/* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
> -	 * in order to preserve any existing bits, such as bit 0 for the
> -	 * head page of compound page and bit 1 for pfmemalloc page, so
> -	 * mask those bits for freeing side when doing below checking,
> -	 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
> -	 * to avoid recycling the pfmemalloc page.
> -	 */
> -	if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
> -		return false;
> -
> -	pp = page->pp;
> -
> -	/* Allow direct recycle if we have reasons to believe that we are
> -	 * in the same context as the consumer would run, so there's
> -	 * no possible race.
> -	 */
> -	napi = READ_ONCE(pp->p.napi);
> -	allow_direct = napi_safe && napi &&
> -		READ_ONCE(napi->list_owner) == smp_processor_id();
> -
> -	/* Driver set this to memory recycling info. Reset it on recycle.
> -	 * This will *not* work for NIC using a split-page memory model.
> -	 * The page will be returned to the pool here regardless of the
> -	 * 'flipped' fragment being in use or not.
> -	 */
> -	page_pool_put_full_page(pp, page, allow_direct);
> -
> -	return true;
> -}
> -EXPORT_SYMBOL(page_pool_return_skb_page);
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 7edabf17988a..4b7d00d5b5d7 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -879,6 +879,34 @@ static void skb_clone_fraglist(struct sk_buff *skb)
>  		skb_get(list);
>  }
>  
> +bool page_pool_return_skb_page(struct page *page, bool napi_safe)
> +{
> +	struct napi_struct *napi;
> +	struct page_pool *pp;
> +	bool allow_direct;
> +
> +	page = compound_head(page);
> +	pp = page->pp;

So this is just assuming that any page we pass thru is a page pool
page. The problem is there may be some other pointer stored here that
could cause issues.

I would suggest creating an accessor as mentioned above to verify it is
a page pool page before you access page->pp.

> +
> +	/* Allow direct recycle if we have reasons to believe that we are
> +	 * in the same context as the consumer would run, so there's
> +	 * no possible race.
> +	 */
> +	napi = READ_ONCE(pp->p.napi);
> +	allow_direct = napi_safe && napi &&
> +		READ_ONCE(napi->list_owner) == smp_processor_id();
> +
> +	/* Driver set this to memory recycling info. Reset it on recycle.
> +	 * This will *not* work for NIC using a split-page memory model.
> +	 * The page will be returned to the pool here regardless of the
> +	 * 'flipped' fragment being in use or not.
> +	 */
> +	page_pool_put_full_page(pp, page, allow_direct);
> +
> +	return true;
> +}
> +EXPORT_SYMBOL(page_pool_return_skb_page);
> +
>  static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool napi_safe)
>  {
>  	if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)
Alexander Lobakin June 30, 2023, 12:39 p.m. UTC | #2
From: Alexander H Duyck <alexander.duyck@gmail.com>
Date: Thu, 29 Jun 2023 09:55:15 -0700

> On Thu, 2023-06-29 at 17:23 +0200, Alexander Lobakin wrote:
>> Currently, touching <net/page_pool.h> triggers a rebuild of more than
>> a half of the kernel. That's because it's included in <linux/skbuff.h>.
>> And each new include to page_pool.h adds more [useless] data for the
>> toolchain to process per each source file from that pile.

[...]

>> +bool page_pool_return_skb_page(struct page *page, bool napi_safe)
>> +{
>> +	struct napi_struct *napi;
>> +	struct page_pool *pp;
>> +	bool allow_direct;
>> +
>> +	page = compound_head(page);
>> +	pp = page->pp;
> 
> So this is just assuming that any page we pass thru is a page pool
> page. The problem is there may be some other pointer stored here that
> could cause issues.

But that is exactly what you suggested in the previous revision's
thread... Hey! :D

"I suspect we could look at pulling parts of it out as well. The
pp_magic check should always be succeeding unless we have pages getting
routed the wrong way somewhere. So maybe we should look at pulling it
out and moving it to another part of the path such as
__page_pool_put_page() and making it a bit more visible to catch those
cases".

Anyway, since some drivers still mix PP pages with non-PP ones (mt76
IIRC, maybe more), I feel the check for magic is still relevant. I just
believed you and forgot about that T.T

> 
> I would suggest creating an accessor as mentioned above to verify it is
> a page pool page before you access page->pp.
> 
>> +
>> +	/* Allow direct recycle if we have reasons to believe that we are
[...]

Thanks,
Olek
Alexander Duyck June 30, 2023, 3:11 p.m. UTC | #3
On Fri, Jun 30, 2023 at 5:39 AM Alexander Lobakin
<aleksander.lobakin@intel.com> wrote:
>
> From: Alexander H Duyck <alexander.duyck@gmail.com>
> Date: Thu, 29 Jun 2023 09:55:15 -0700
>
> > On Thu, 2023-06-29 at 17:23 +0200, Alexander Lobakin wrote:
> >> Currently, touching <net/page_pool.h> triggers a rebuild of more than
> >> a half of the kernel. That's because it's included in <linux/skbuff.h>.
> >> And each new include to page_pool.h adds more [useless] data for the
> >> toolchain to process per each source file from that pile.
>
> [...]
>
> >> +bool page_pool_return_skb_page(struct page *page, bool napi_safe)
> >> +{
> >> +    struct napi_struct *napi;
> >> +    struct page_pool *pp;
> >> +    bool allow_direct;
> >> +
> >> +    page = compound_head(page);
> >> +    pp = page->pp;
> >
> > So this is just assuming that any page we pass thru is a page pool
> > page. The problem is there may be some other pointer stored here that
> > could cause issues.
>
> But that is exactly what you suggested in the previous revision's
> thread... Hey! :D
>
> "I suspect we could look at pulling parts of it out as well. The
> pp_magic check should always be succeeding unless we have pages getting
> routed the wrong way somewhere. So maybe we should look at pulling it
> out and moving it to another part of the path such as
> __page_pool_put_page() and making it a bit more visible to catch those
> cases".

Yeah, I have had a few off days recently, amazing what happens when
you are running on only a few hours of sleep.. :-/

Anyway, as I was saying it might make sense to wrap the whole thing up
as a page pool accessor that would return NULL if the MAGIC value
isn't present. Alternatively one other possibility would be to look at
creating an inline function here that could check to see if the
skb_frag is page pool and then just keep that in sk_buff.h since it
looks like it should only need to rely on the page struct and
PP_SIGNATURE which is a poison.h value. With that napi_frag_unref
could avoid an unnecessary trip into the page_pool_return_skb_page
function entirely if it isn't a page pool page and we could look at
dropping the return value from page_pool_return_skb_page entirely.

> Anyway, since some drivers still mix PP pages with non-PP ones (mt76
> IIRC, maybe more), I feel the check for magic is still relevant. I just
> believed you and forgot about that T.T

Yeah, sorry about that, my bad. I was a bit too focused on the main
drivers we use and not thinking outside the box enough.
Alexander Lobakin June 30, 2023, 4:05 p.m. UTC | #4
From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Fri, 30 Jun 2023 08:11:02 -0700

> On Fri, Jun 30, 2023 at 5:39 AM Alexander Lobakin
> <aleksander.lobakin@intel.com> wrote:
>>
>> From: Alexander H Duyck <alexander.duyck@gmail.com>
>> Date: Thu, 29 Jun 2023 09:55:15 -0700
>>
>>> On Thu, 2023-06-29 at 17:23 +0200, Alexander Lobakin wrote:
>>>> Currently, touching <net/page_pool.h> triggers a rebuild of more than
>>>> a half of the kernel. That's because it's included in <linux/skbuff.h>.
>>>> And each new include to page_pool.h adds more [useless] data for the
>>>> toolchain to process per each source file from that pile.
>>
>> [...]
>>
>>>> +bool page_pool_return_skb_page(struct page *page, bool napi_safe)
>>>> +{
>>>> +    struct napi_struct *napi;
>>>> +    struct page_pool *pp;
>>>> +    bool allow_direct;
>>>> +
>>>> +    page = compound_head(page);
>>>> +    pp = page->pp;
>>>
>>> So this is just assuming that any page we pass thru is a page pool
>>> page. The problem is there may be some other pointer stored here that
>>> could cause issues.
>>
>> But that is exactly what you suggested in the previous revision's
>> thread... Hey! :D
>>
>> "I suspect we could look at pulling parts of it out as well. The
>> pp_magic check should always be succeeding unless we have pages getting
>> routed the wrong way somewhere. So maybe we should look at pulling it
>> out and moving it to another part of the path such as
>> __page_pool_put_page() and making it a bit more visible to catch those
>> cases".
> 
> Yeah, I have had a few off days recently, amazing what happens when
> you are running on only a few hours of sleep.. :-/

Aaah I see :D

> 
> Anyway, as I was saying it might make sense to wrap the whole thing up
> as a page pool accessor that would return NULL if the MAGIC value
> isn't present. Alternatively one other possibility would be to look at
> creating an inline function here that could check to see if the
> skb_frag is page pool and then just keep that in sk_buff.h since it
> looks like it should only need to rely on the page struct and
> PP_SIGNATURE which is a poison.h value. With that napi_frag_unref
> could avoid an unnecessary trip into the page_pool_return_skb_page
> function entirely if it isn't a page pool page and we could look at
> dropping the return value from page_pool_return_skb_page entirely.

I like this one. This is rather simple check and shouldn't cause code
size inflating, but I'll recheck with bloat-o-meter just in case.

> 
>> Anyway, since some drivers still mix PP pages with non-PP ones (mt76
>> IIRC, maybe more), I feel the check for magic is still relevant. I just
>> believed you and forgot about that T.T
> 
> Yeah, sorry about that, my bad. I was a bit too focused on the main
> drivers we use and not thinking outside the box enough.

Not a prob!

Thanks,
Olek
diff mbox series

Patch

diff --git a/drivers/net/ethernet/engleder/tsnep_main.c b/drivers/net/ethernet/engleder/tsnep_main.c
index 84751bb303a6..6222aaa5157f 100644
--- a/drivers/net/ethernet/engleder/tsnep_main.c
+++ b/drivers/net/ethernet/engleder/tsnep_main.c
@@ -28,6 +28,7 @@ 
 #include <linux/iopoll.h>
 #include <linux/bpf.h>
 #include <linux/bpf_trace.h>
+#include <net/page_pool.h>
 #include <net/xdp_sock_drv.h>
 
 #define TSNEP_RX_OFFSET (max(NET_SKB_PAD, XDP_PACKET_HEADROOM) + NET_IP_ALIGN)
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 8fbe47703d47..99b3b4f79603 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -38,6 +38,7 @@ 
 #include <linux/in.h>
 #include <linux/ip.h>
 #include <net/ip.h>
+#include <net/page_pool.h>
 #include <net/selftests.h>
 #include <net/tso.h>
 #include <linux/tcp.h>
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
index a5d03583bf79..d17a0ebc9036 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
@@ -7,6 +7,7 @@ 
 
 #include <linux/interrupt.h>
 #include <linux/pci.h>
+#include <net/page_pool.h>
 #include <net/tso.h>
 #include <linux/bitfield.h>
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
index fe8ea4e531b7..7eca434a0550 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
@@ -16,6 +16,7 @@ 
 #include <linux/bpf.h>
 #include <linux/bpf_trace.h>
 #include <linux/bitfield.h>
+#include <net/page_pool.h>
 
 #include "otx2_reg.h"
 #include "otx2_common.h"
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
index 5ce28ff7685f..0f152f14165b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
@@ -6,6 +6,7 @@ 
 #include "en/port.h"
 #include "en_accel/en_accel.h"
 #include "en_accel/ipsec.h"
+#include <net/page_pool.h>
 #include <net/xdp_sock_drv.h>
 
 static u8 mlx5e_mpwrq_min_page_shift(struct mlx5_core_dev *mdev)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index f0e6095809fa..1bd91bc09eb8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -35,6 +35,7 @@ 
 #include "en/xdp.h"
 #include "en/params.h"
 #include <linux/bitfield.h>
+#include <net/page_pool.h>
 
 int mlx5e_xdp_max_mtu(struct mlx5e_params *params, struct mlx5e_xsk_param *xsk)
 {
diff --git a/drivers/net/wireless/mediatek/mt76/mt76.h b/drivers/net/wireless/mediatek/mt76/mt76.h
index 6b07b8fafec2..95c16f11d156 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76.h
+++ b/drivers/net/wireless/mediatek/mt76/mt76.h
@@ -15,6 +15,7 @@ 
 #include <linux/average.h>
 #include <linux/soc/mediatek/mtk_wed.h>
 #include <net/mac80211.h>
+#include <net/page_pool.h>
 #include "util.h"
 #include "testmode.h"
 
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 91ed66952580..f76d172ed262 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -32,7 +32,6 @@ 
 #include <linux/if_packet.h>
 #include <linux/llist.h>
 #include <net/flow.h>
-#include <net/page_pool.h>
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
 #include <linux/netfilter/nf_conntrack_common.h>
 #endif
@@ -3423,6 +3422,8 @@  static inline void skb_frag_ref(struct sk_buff *skb, int f)
 	__skb_frag_ref(&skb_shinfo(skb)->frags[f]);
 }
 
+bool page_pool_return_skb_page(struct page *page, bool napi_safe);
+
 static inline void
 napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe)
 {
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 2b7db9992fc0..829dc1f8ba6b 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -307,8 +307,6 @@  inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool)
 	return pool->p.dma_dir;
 }
 
-bool page_pool_return_skb_page(struct page *page, bool napi_safe);
-
 struct page_pool *page_pool_create(const struct page_pool_params *params);
 
 struct xdp_mem_info;
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 985ccaffc06a..dff0b4fa2316 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -582,6 +582,19 @@  static __always_inline struct page *
 __page_pool_put_page(struct page_pool *pool, struct page *page,
 		     unsigned int dma_sync_size, bool allow_direct)
 {
+	/* Avoid recycling non-PP pages, give them back to the page allocator.
+	 * page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
+	 * in order to preserve any existing bits, such as bit 0 for the
+	 * head page of compound page and bit 1 for pfmemalloc page, so
+	 * mask those bits for freeing side when doing below checking,
+	 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
+	 * to avoid recycling the pfmemalloc page.
+	 */
+	if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE)) {
+		put_page(page);
+		return NULL;
+	}
+
 	/* This allocator is optimized for the XDP mode that uses
 	 * one-frame-per-page, but have fallbacks that act like the
 	 * regular page allocator APIs.
@@ -913,42 +926,3 @@  void page_pool_update_nid(struct page_pool *pool, int new_nid)
 	}
 }
 EXPORT_SYMBOL(page_pool_update_nid);
-
-bool page_pool_return_skb_page(struct page *page, bool napi_safe)
-{
-	struct napi_struct *napi;
-	struct page_pool *pp;
-	bool allow_direct;
-
-	page = compound_head(page);
-
-	/* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
-	 * in order to preserve any existing bits, such as bit 0 for the
-	 * head page of compound page and bit 1 for pfmemalloc page, so
-	 * mask those bits for freeing side when doing below checking,
-	 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
-	 * to avoid recycling the pfmemalloc page.
-	 */
-	if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
-		return false;
-
-	pp = page->pp;
-
-	/* Allow direct recycle if we have reasons to believe that we are
-	 * in the same context as the consumer would run, so there's
-	 * no possible race.
-	 */
-	napi = READ_ONCE(pp->p.napi);
-	allow_direct = napi_safe && napi &&
-		READ_ONCE(napi->list_owner) == smp_processor_id();
-
-	/* Driver set this to memory recycling info. Reset it on recycle.
-	 * This will *not* work for NIC using a split-page memory model.
-	 * The page will be returned to the pool here regardless of the
-	 * 'flipped' fragment being in use or not.
-	 */
-	page_pool_put_full_page(pp, page, allow_direct);
-
-	return true;
-}
-EXPORT_SYMBOL(page_pool_return_skb_page);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7edabf17988a..4b7d00d5b5d7 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -879,6 +879,34 @@  static void skb_clone_fraglist(struct sk_buff *skb)
 		skb_get(list);
 }
 
+bool page_pool_return_skb_page(struct page *page, bool napi_safe)
+{
+	struct napi_struct *napi;
+	struct page_pool *pp;
+	bool allow_direct;
+
+	page = compound_head(page);
+	pp = page->pp;
+
+	/* Allow direct recycle if we have reasons to believe that we are
+	 * in the same context as the consumer would run, so there's
+	 * no possible race.
+	 */
+	napi = READ_ONCE(pp->p.napi);
+	allow_direct = napi_safe && napi &&
+		READ_ONCE(napi->list_owner) == smp_processor_id();
+
+	/* Driver set this to memory recycling info. Reset it on recycle.
+	 * This will *not* work for NIC using a split-page memory model.
+	 * The page will be returned to the pool here regardless of the
+	 * 'flipped' fragment being in use or not.
+	 */
+	page_pool_put_full_page(pp, page, allow_direct);
+
+	return true;
+}
+EXPORT_SYMBOL(page_pool_return_skb_page);
+
 static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool napi_safe)
 {
 	if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)