diff mbox series

[RFC,net-next] net: veth: reduce page_pool memory footprint using half page per-buffer

Message ID d3ae6bd3537fbce379382ac6a42f67e22f27ece2.1683896626.git.lorenzo@kernel.org (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series [RFC,net-next] net: veth: reduce page_pool memory footprint using half page per-buffer | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch warning WARNING: line length of 82 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Lorenzo Bianconi May 12, 2023, 1:08 p.m. UTC
In order to reduce page_pool memory footprint, rely on
page_pool_dev_alloc_frag routine and reduce buffer size
(VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
for two 1500B frames. Reduce VETH_XDP_PACKET_HEADROOM to 192 from 256
(XDP_PACKET_HEADROOM) to fit max_head_size in VETH_PAGE_POOL_FRAG_SIZE.
Please note, using default values (CONFIG_MAX_SKB_FRAGS=17), maximum
supported MTU is now reduced to 36350B.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/veth.c | 39 +++++++++++++++++++++++++--------------
 1 file changed, 25 insertions(+), 14 deletions(-)

Comments

Alexander Lobakin May 12, 2023, 1:43 p.m. UTC | #1
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Fri, 12 May 2023 15:08:13 +0200

> In order to reduce page_pool memory footprint, rely on
> page_pool_dev_alloc_frag routine and reduce buffer size
> (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
> for two 1500B frames. Reduce VETH_XDP_PACKET_HEADROOM to 192 from 256
> (XDP_PACKET_HEADROOM) to fit max_head_size in VETH_PAGE_POOL_FRAG_SIZE.
> Please note, using default values (CONFIG_MAX_SKB_FRAGS=17), maximum
> supported MTU is now reduced to 36350B.

I thought we're stepping away from page splitting bit by bit O_o
Primarily for the reasons you mentioned / worked around here: it creates
several significant limitations and at least on 64-bit systems it
doesn't scale anymore. 192 bytes of headroom is less than what XDP
expects (isn't it? Isn't 256 standard-standard, so that skb XDP path
reallocates heads only to have 256+ there?), 384 bytes of shinfo can
change anytime and even now page split simply blocks you from increasing
MAX_SKB_FRAGS even by one. Not speaking of MTU limitations etc.
BTW Intel drivers suffer from the very same things due solely to page
split (and I'm almost done with converting at least some of them to Page
Pool and 1 page per buffer model), I don't recommend deliberately
falling into that pit =\ :D

> 
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  drivers/net/veth.c | 39 +++++++++++++++++++++++++--------------
>  1 file changed, 25 insertions(+), 14 deletions(-)
[...]

Thanks,
Olek
Lorenzo Bianconi May 12, 2023, 2:14 p.m. UTC | #2
> From: Lorenzo Bianconi <lorenzo@kernel.org>
> Date: Fri, 12 May 2023 15:08:13 +0200
> 
> > In order to reduce page_pool memory footprint, rely on
> > page_pool_dev_alloc_frag routine and reduce buffer size
> > (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
> > for two 1500B frames. Reduce VETH_XDP_PACKET_HEADROOM to 192 from 256
> > (XDP_PACKET_HEADROOM) to fit max_head_size in VETH_PAGE_POOL_FRAG_SIZE.
> > Please note, using default values (CONFIG_MAX_SKB_FRAGS=17), maximum
> > supported MTU is now reduced to 36350B.
> 
> I thought we're stepping away from page splitting bit by bit O_o

do you mean to driver private page_split implementation? AFAIK we are not
stepping away from page_pool page split implementation (or maybe I missed it :))

> Primarily for the reasons you mentioned / worked around here: it creates
> several significant limitations and at least on 64-bit systems it
> doesn't scale anymore. 192 bytes of headroom is less than what XDP
> expects (isn't it? Isn't 256 standard-standard, so that skb XDP path
> reallocates heads only to have 256+ there?), 384 bytes of shinfo can
> change anytime and even now page split simply blocks you from increasing
> MAX_SKB_FRAGS even by one. Not speaking of MTU limitations etc.
> BTW Intel drivers suffer from the very same things due solely to page
> split (and I'm almost done with converting at least some of them to Page
> Pool and 1 page per buffer model), I don't recommend deliberately
> falling into that pit =\ :D

I am not sure about the 192 vs 256 bytes of headroom (this is why I sent this
patch as RFC, my main goal is to discuss about this requirement). In the
previous discussion [0] we deferred this implementation since if we do not
reduce requested xdp headroom, we will not be able to fit two 1500B frames
into a single page (for skb_shared_info size [1]) and we introduce a performance
penalty.

Regards,
Lorenzo

[0] https://lore.kernel.org/netdev/6298f73f7cc7391c7c4a52a6a89b1ae21488bda1.1682188837.git.lorenzo@kernel.org/
[1] $ pahole -C skb_shared_info vmlinux.o 
struct skb_shared_info {
        __u8                       flags;                /*     0     1 */
        __u8                       meta_len;             /*     1     1 */
        __u8                       nr_frags;             /*     2     1 */
        __u8                       tx_flags;             /*     3     1 */
        unsigned short             gso_size;             /*     4     2 */
        unsigned short             gso_segs;             /*     6     2 */
        struct sk_buff *           frag_list;            /*     8     8 */
        struct skb_shared_hwtstamps hwtstamps;           /*    16     8 */
        unsigned int               gso_type;             /*    24     4 */
        u32                        tskey;                /*    28     4 */
        atomic_t                   dataref;              /*    32     4 */
        unsigned int               xdp_frags_size;       /*    36     4 */
        void *                     destructor_arg;       /*    40     8 */
        skb_frag_t                 frags[17];            /*    48   272 */

        /* size: 320, cachelines: 5, members: 14 */
};

> 
> > 
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > ---
> >  drivers/net/veth.c | 39 +++++++++++++++++++++++++--------------
> >  1 file changed, 25 insertions(+), 14 deletions(-)
> [...]
> 
> Thanks,
> Olek
Yunsheng Lin May 15, 2023, 11:10 a.m. UTC | #3
On 2023/5/12 21:08, Lorenzo Bianconi wrote:
> In order to reduce page_pool memory footprint, rely on
> page_pool_dev_alloc_frag routine and reduce buffer size
> (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page

Is there any performance improvement beside the memory saving? As it
should reduce TLB miss, I wonder if the TLB miss reducing can even
out the cost of the extra frag reference count handling for the
frag support?

> for two 1500B frames. Reduce VETH_XDP_PACKET_HEADROOM to 192 from 256
> (XDP_PACKET_HEADROOM) to fit max_head_size in VETH_PAGE_POOL_FRAG_SIZE.
> Please note, using default values (CONFIG_MAX_SKB_FRAGS=17), maximum
> supported MTU is now reduced to 36350B.

Maybe we don't need to limit the frag size to VETH_PAGE_POOL_FRAG_SIZE,
and use different frag size depending on the mtu or packet size?

Perhaps the page_pool_dev_alloc_frag() can be improved to return non-frag
page if the requested frag size is larger than a specified size too.
I will try to implement it if the above idea makes sense.
Lorenzo Bianconi May 15, 2023, 11:24 a.m. UTC | #4
> On 2023/5/12 21:08, Lorenzo Bianconi wrote:
> > In order to reduce page_pool memory footprint, rely on
> > page_pool_dev_alloc_frag routine and reduce buffer size
> > (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
> 
> Is there any performance improvement beside the memory saving? As it
> should reduce TLB miss, I wonder if the TLB miss reducing can even
> out the cost of the extra frag reference count handling for the
> frag support?

reducing the requested headroom to 192 (from 256) we have a nice improvement in
the 1500B frame case while it is mostly the same in the case of paged skb
(e.g. MTU 8000B).

> 
> > for two 1500B frames. Reduce VETH_XDP_PACKET_HEADROOM to 192 from 256
> > (XDP_PACKET_HEADROOM) to fit max_head_size in VETH_PAGE_POOL_FRAG_SIZE.
> > Please note, using default values (CONFIG_MAX_SKB_FRAGS=17), maximum
> > supported MTU is now reduced to 36350B.
> 
> Maybe we don't need to limit the frag size to VETH_PAGE_POOL_FRAG_SIZE,
> and use different frag size depending on the mtu or packet size?
> 
> Perhaps the page_pool_dev_alloc_frag() can be improved to return non-frag
> page if the requested frag size is larger than a specified size too.
> I will try to implement it if the above idea makes sense.
> 

since there are no significant differences between full page and fragmented page
implementation if the MTU is over the page boundary, does it worth to do so?
(at least for the veth use-case).

Regards,
Lorenzo
Maciej Fijalkowski May 15, 2023, 1:11 p.m. UTC | #5
On Mon, May 15, 2023 at 01:24:20PM +0200, Lorenzo Bianconi wrote:
> > On 2023/5/12 21:08, Lorenzo Bianconi wrote:
> > > In order to reduce page_pool memory footprint, rely on
> > > page_pool_dev_alloc_frag routine and reduce buffer size
> > > (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
> > 
> > Is there any performance improvement beside the memory saving? As it
> > should reduce TLB miss, I wonder if the TLB miss reducing can even
> > out the cost of the extra frag reference count handling for the
> > frag support?
> 
> reducing the requested headroom to 192 (from 256) we have a nice improvement in
> the 1500B frame case while it is mostly the same in the case of paged skb
> (e.g. MTU 8000B).

Can you define 'nice improvement' ? ;)
Show us numbers or improvement in %.

> 
> > 
> > > for two 1500B frames. Reduce VETH_XDP_PACKET_HEADROOM to 192 from 256
> > > (XDP_PACKET_HEADROOM) to fit max_head_size in VETH_PAGE_POOL_FRAG_SIZE.
> > > Please note, using default values (CONFIG_MAX_SKB_FRAGS=17), maximum
> > > supported MTU is now reduced to 36350B.
> > 
> > Maybe we don't need to limit the frag size to VETH_PAGE_POOL_FRAG_SIZE,
> > and use different frag size depending on the mtu or packet size?
> > 
> > Perhaps the page_pool_dev_alloc_frag() can be improved to return non-frag
> > page if the requested frag size is larger than a specified size too.
> > I will try to implement it if the above idea makes sense.
> > 
> 
> since there are no significant differences between full page and fragmented page
> implementation if the MTU is over the page boundary, does it worth to do so?
> (at least for the veth use-case).
> 
> Regards,
> Lorenzo
>
Alexander Lobakin May 15, 2023, 4:36 p.m. UTC | #6
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Fri, 12 May 2023 16:14:48 +0200

>> From: Lorenzo Bianconi <lorenzo@kernel.org>
>> Date: Fri, 12 May 2023 15:08:13 +0200
>>
>>> In order to reduce page_pool memory footprint, rely on
>>> page_pool_dev_alloc_frag routine and reduce buffer size
>>> (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
>>> for two 1500B frames. Reduce VETH_XDP_PACKET_HEADROOM to 192 from 256
>>> (XDP_PACKET_HEADROOM) to fit max_head_size in VETH_PAGE_POOL_FRAG_SIZE.
>>> Please note, using default values (CONFIG_MAX_SKB_FRAGS=17), maximum
>>> supported MTU is now reduced to 36350B.
>>
>> I thought we're stepping away from page splitting bit by bit O_o
> 
> do you mean to driver private page_split implementation? AFAIK we are not
> stepping away from page_pool page split implementation (or maybe I missed it :))

Page split in general. Since early-mid 2021, Page Pool with 1 page per
frame shows results comparable to drivers doing page split, but it
doesn't have such MTU / headroom / ... limitations.

> 
>> Primarily for the reasons you mentioned / worked around here: it creates
>> several significant limitations and at least on 64-bit systems it
>> doesn't scale anymore. 192 bytes of headroom is less than what XDP
>> expects (isn't it? Isn't 256 standard-standard, so that skb XDP path
>> reallocates heads only to have 256+ there?), 384 bytes of shinfo can
>> change anytime and even now page split simply blocks you from increasing
>> MAX_SKB_FRAGS even by one. Not speaking of MTU limitations etc.
>> BTW Intel drivers suffer from the very same things due solely to page
>> split (and I'm almost done with converting at least some of them to Page
>> Pool and 1 page per buffer model), I don't recommend deliberately
>> falling into that pit =\ :D
> 
> I am not sure about the 192 vs 256 bytes of headroom (this is why I sent this
> patch as RFC, my main goal is to discuss about this requirement). In the
> previous discussion [0] we deferred this implementation since if we do not
> reduce requested xdp headroom, we will not be able to fit two 1500B frames
> into a single page (for skb_shared_info size [1]) and we introduce a performance
> penalty.

Yes, that's what I'm talking about. In order to fit two 1500-byte frames
onto one page on x86_64, you need to have at most 192 bytes of headroom
(192 + 320 of shinfo = 512 per frame / 1024 per page), but XDP requires
256. And then you have one more problem from the other side, I mean
shinfo size. It can change anytime because it's not UAPI or ABI or
whatever and nobody can say "hey, don't touch it, you break my page
split", at the same time with page splitting you're not able to increase
MAX_SKB_FRAGS.
And for MTU > 1536 this is all worthless, just a waste of cycles. With 1
page per frame you can have up to 3.5k per fragment.

You mentioned memory footprint. Do you have any exact numbers to show
this can help significantly?
Because I have PP on my home router with 16k-sized pages and 128 Mb of
RAM and there's no memory shortage there :D I realize it doesn't mean
anything and I'm mostly joking mentioning this, but still.

> 
> Regards,
> Lorenzo
> 
> [0] https://lore.kernel.org/netdev/6298f73f7cc7391c7c4a52a6a89b1ae21488bda1.1682188837.git.lorenzo@kernel.org/
> [1] $ pahole -C skb_shared_info vmlinux.o 
> struct skb_shared_info {
>         __u8                       flags;                /*     0     1 */
>         __u8                       meta_len;             /*     1     1 */
>         __u8                       nr_frags;             /*     2     1 */
>         __u8                       tx_flags;             /*     3     1 */
>         unsigned short             gso_size;             /*     4     2 */
>         unsigned short             gso_segs;             /*     6     2 */
>         struct sk_buff *           frag_list;            /*     8     8 */
>         struct skb_shared_hwtstamps hwtstamps;           /*    16     8 */
>         unsigned int               gso_type;             /*    24     4 */
>         u32                        tskey;                /*    28     4 */
>         atomic_t                   dataref;              /*    32     4 */
>         unsigned int               xdp_frags_size;       /*    36     4 */
>         void *                     destructor_arg;       /*    40     8 */
>         skb_frag_t                 frags[17];            /*    48   272 */
> 
>         /* size: 320, cachelines: 5, members: 14 */
> };
> 
>>
>>>
>>> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
>>> ---
>>>  drivers/net/veth.c | 39 +++++++++++++++++++++++++--------------
>>>  1 file changed, 25 insertions(+), 14 deletions(-)
>> [...]
>>
>> Thanks,
>> Olek

Thanks,
Olek
Yunsheng Lin May 16, 2023, 12:55 p.m. UTC | #7
On 2023/5/15 19:24, Lorenzo Bianconi wrote:
>> On 2023/5/12 21:08, Lorenzo Bianconi wrote:
>>> In order to reduce page_pool memory footprint, rely on
>>> page_pool_dev_alloc_frag routine and reduce buffer size
>>> (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
>>
>> Is there any performance improvement beside the memory saving? As it
>> should reduce TLB miss, I wonder if the TLB miss reducing can even
>> out the cost of the extra frag reference count handling for the
>> frag support?
> 
> reducing the requested headroom to 192 (from 256) we have a nice improvement in
> the 1500B frame case while it is mostly the same in the case of paged skb
> (e.g. MTU 8000B).
> 
>>
>>> for two 1500B frames. Reduce VETH_XDP_PACKET_HEADROOM to 192 from 256
>>> (XDP_PACKET_HEADROOM) to fit max_head_size in VETH_PAGE_POOL_FRAG_SIZE.
>>> Please note, using default values (CONFIG_MAX_SKB_FRAGS=17), maximum
>>> supported MTU is now reduced to 36350B.
>>
>> Maybe we don't need to limit the frag size to VETH_PAGE_POOL_FRAG_SIZE,
>> and use different frag size depending on the mtu or packet size?
>>
>> Perhaps the page_pool_dev_alloc_frag() can be improved to return non-frag
>> page if the requested frag size is larger than a specified size too.
>> I will try to implement it if the above idea makes sense.
>>
> 
> since there are no significant differences between full page and fragmented page
> implementation if the MTU is over the page boundary, does it worth to do so?
> (at least for the veth use-case).

Yes, as there is no significant differences between full page and fragmented page
implementation, unifying the interface is trivial, I have sent a RFC for that.
Using that interface may solve the supported mtu reducing problem at least.

> 
> Regards,
> Lorenzo
>
Jesper Dangaard Brouer May 16, 2023, 4:11 p.m. UTC | #8
Cc. Alex Duyck + Eric, please criticize my idea below.

On 12/05/2023 15.08, Lorenzo Bianconi wrote:
> In order to reduce page_pool memory footprint, rely on
> page_pool_dev_alloc_frag routine and reduce buffer size
> (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
> for two 1500B frames. Reduce VETH_XDP_PACKET_HEADROOM to 192 from 256
> (XDP_PACKET_HEADROOM) to fit max_head_size in VETH_PAGE_POOL_FRAG_SIZE.
> Please note, using default values (CONFIG_MAX_SKB_FRAGS=17), maximum
> supported MTU is now reduced to 36350B.
> 
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>   drivers/net/veth.c | 39 +++++++++++++++++++++++++--------------
>   1 file changed, 25 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 614f3e3efab0..0e648703cccf 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -31,9 +31,12 @@
>   #define DRV_NAME	"veth"
>   #define DRV_VERSION	"1.0"
>   
> -#define VETH_XDP_FLAG		BIT(0)
> -#define VETH_RING_SIZE		256
> -#define VETH_XDP_HEADROOM	(XDP_PACKET_HEADROOM + NET_IP_ALIGN)
> +#define VETH_XDP_FLAG			BIT(0)
> +#define VETH_RING_SIZE			256
> +#define VETH_XDP_PACKET_HEADROOM	192
> +#define VETH_XDP_HEADROOM		(VETH_XDP_PACKET_HEADROOM + \
> +					 NET_IP_ALIGN)
> +#define VETH_PAGE_POOL_FRAG_SIZE	2048
>   
>   #define VETH_XDP_TX_BULK_SIZE	16
>   #define VETH_XDP_BATCH		16
> @@ -736,7 +739,7 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>   	if (skb_shared(skb) || skb_head_is_locked(skb) ||
>   	    skb_shinfo(skb)->nr_frags ||
>   	    skb_headroom(skb) < XDP_PACKET_HEADROOM) {
> -		u32 size, len, max_head_size, off;
> +		u32 size, len, max_head_size, off, pp_off;
>   		struct sk_buff *nskb;
>   		struct page *page;
>   		int i, head_off;
> @@ -747,17 +750,20 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>   		 *
>   		 * Make sure we have enough space for linear and paged area
>   		 */
> -		max_head_size = SKB_WITH_OVERHEAD(PAGE_SIZE -
> +		max_head_size = SKB_WITH_OVERHEAD(VETH_PAGE_POOL_FRAG_SIZE -
>   						  VETH_XDP_HEADROOM);
> -		if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
> +		if (skb->len >
> +		    VETH_PAGE_POOL_FRAG_SIZE * MAX_SKB_FRAGS + max_head_size)
>   			goto drop;
>   
>   		/* Allocate skb head */
> -		page = page_pool_dev_alloc_pages(rq->page_pool);

It seems wasteful to allocate a full page PAGE_SIZE.

> +		page = page_pool_dev_alloc_frag(rq->page_pool, &pp_off,
> +						VETH_PAGE_POOL_FRAG_SIZE);

Allocating PAGE_SIZE/2 isn't much better.

At this point we already know the skb->len (and skb_headlen).

Why don't we allocated the size that we need?

See page_frag_alloc() system invented by Eric and Duyck.


>   		if (!page)
>   			goto drop;
>   
> -		nskb = napi_build_skb(page_address(page), PAGE_SIZE);
> +		nskb = napi_build_skb(page_address(page) + pp_off,
> +				      VETH_PAGE_POOL_FRAG_SIZE);
>   		if (!nskb) {
>   			page_pool_put_full_page(rq->page_pool, page, true);
>   			goto drop;
> @@ -782,15 +788,18 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>   		len = skb->len - off;
>   
>   		for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
> -			page = page_pool_dev_alloc_pages(rq->page_pool);
> +			page = page_pool_dev_alloc_frag(rq->page_pool, &pp_off,
> +							VETH_PAGE_POOL_FRAG_SIZE);
>   			if (!page) {
>   				consume_skb(nskb);
>   				goto drop;
>   			}
>   
> -			size = min_t(u32, len, PAGE_SIZE);
> -			skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
> -			if (skb_copy_bits(skb, off, page_address(page),
> +			size = min_t(u32, len, VETH_PAGE_POOL_FRAG_SIZE);
> +			skb_add_rx_frag(nskb, i, page, pp_off, size,
> +					VETH_PAGE_POOL_FRAG_SIZE);
> +			if (skb_copy_bits(skb, off,
> +					  page_address(page) + pp_off,
>   					  size)) {
>   				consume_skb(nskb);
>   				goto drop;
[...]
Lorenzo Bianconi May 16, 2023, 10:52 p.m. UTC | #9
> On Mon, May 15, 2023 at 01:24:20PM +0200, Lorenzo Bianconi wrote:
> > > On 2023/5/12 21:08, Lorenzo Bianconi wrote:
> > > > In order to reduce page_pool memory footprint, rely on
> > > > page_pool_dev_alloc_frag routine and reduce buffer size
> > > > (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
> > > 
> > > Is there any performance improvement beside the memory saving? As it
> > > should reduce TLB miss, I wonder if the TLB miss reducing can even
> > > out the cost of the extra frag reference count handling for the
> > > frag support?
> > 
> > reducing the requested headroom to 192 (from 256) we have a nice improvement in
> > the 1500B frame case while it is mostly the same in the case of paged skb
> > (e.g. MTU 8000B).
> 
> Can you define 'nice improvement' ? ;)
> Show us numbers or improvement in %.

I am testing this RFC patch in the scenario reported below:

iperf tcp tx --> veth0 --> veth1 (xdp_pass) --> iperf tcp rx

- 6.4.0-rc1 net-next:
  MTU 1500B: ~ 7.07 Gbps
  MTU 8000B: ~ 14.7 Gbps

- 6.4.0-rc1 net-next + page_pool frag support in veth:
  MTU 1500B: ~ 8.57 Gbps
  MTU 8000B: ~ 14.5 Gbps

side note: it seems there is a regression between 6.2.15 and 6.4.0-rc1 net-next
(even without latest veth page_pool patches) in the throughput I can get in the
scenario above, but I have not looked into it yet.

- 6.2.15:
  MTU 1500B: ~ 7.91 Gbps
  MTU 8000B: ~ 14.1 Gbps

- 6.4.0-rc1 net-next w/o commits [0],[1],[2]
  MTU 1500B: ~ 6.38 Gbps
  MTU 8000B: ~ 13.2 Gbps

Regards,
Lorenzo

[0] 0ebab78cbcbf  net: veth: add page_pool for page recycling
[1] 4fc418053ec7  net: veth: add page_pool stats
[2] 9d142ed484a3  net: veth: rely on napi_build_skb in veth_convert_skb_to_xdp_buff

> 
> > 
> > > 
> > > > for two 1500B frames. Reduce VETH_XDP_PACKET_HEADROOM to 192 from 256
> > > > (XDP_PACKET_HEADROOM) to fit max_head_size in VETH_PAGE_POOL_FRAG_SIZE.
> > > > Please note, using default values (CONFIG_MAX_SKB_FRAGS=17), maximum
> > > > supported MTU is now reduced to 36350B.
> > > 
> > > Maybe we don't need to limit the frag size to VETH_PAGE_POOL_FRAG_SIZE,
> > > and use different frag size depending on the mtu or packet size?
> > > 
> > > Perhaps the page_pool_dev_alloc_frag() can be improved to return non-frag
> > > page if the requested frag size is larger than a specified size too.
> > > I will try to implement it if the above idea makes sense.
> > > 
> > 
> > since there are no significant differences between full page and fragmented page
> > implementation if the MTU is over the page boundary, does it worth to do so?
> > (at least for the veth use-case).
> > 
> > Regards,
> > Lorenzo
> > 
> 
>
Yunsheng Lin May 17, 2023, 9:41 a.m. UTC | #10
On 2023/5/17 6:52, Lorenzo Bianconi wrote:
>> On Mon, May 15, 2023 at 01:24:20PM +0200, Lorenzo Bianconi wrote:
>>>> On 2023/5/12 21:08, Lorenzo Bianconi wrote:
>>>>> In order to reduce page_pool memory footprint, rely on
>>>>> page_pool_dev_alloc_frag routine and reduce buffer size
>>>>> (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
>>>>
>>>> Is there any performance improvement beside the memory saving? As it
>>>> should reduce TLB miss, I wonder if the TLB miss reducing can even
>>>> out the cost of the extra frag reference count handling for the
>>>> frag support?
>>>
>>> reducing the requested headroom to 192 (from 256) we have a nice improvement in
>>> the 1500B frame case while it is mostly the same in the case of paged skb
>>> (e.g. MTU 8000B).
>>
>> Can you define 'nice improvement' ? ;)
>> Show us numbers or improvement in %.
> 
> I am testing this RFC patch in the scenario reported below:
> 
> iperf tcp tx --> veth0 --> veth1 (xdp_pass) --> iperf tcp rx
> 
> - 6.4.0-rc1 net-next:
>   MTU 1500B: ~ 7.07 Gbps
>   MTU 8000B: ~ 14.7 Gbps
> 
> - 6.4.0-rc1 net-next + page_pool frag support in veth:
>   MTU 1500B: ~ 8.57 Gbps
>   MTU 8000B: ~ 14.5 Gbps
> 

Thanks for sharing the data.
Maybe using the new frag interface introduced in [1] bring
back the performance for the MTU 8000B case.

1. https://patchwork.kernel.org/project/netdevbpf/cover/20230516124801.2465-1-linyunsheng@huawei.com/


I drafted a patch for veth to use the new frag interface, maybe that
will show how veth can make use of it. Would you give it a try to see
if there is any performance improvment for MTU 8000B case? Thanks.

--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -737,8 +737,8 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
            skb_shinfo(skb)->nr_frags ||
            skb_headroom(skb) < XDP_PACKET_HEADROOM) {
                u32 size, len, max_head_size, off;
+               struct page_pool_frag *pp_frag;
                struct sk_buff *nskb;
-               struct page *page;
                int i, head_off;

                /* We need a private copy of the skb and data buffers since
@@ -752,14 +752,20 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
                if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
                        goto drop;

+               size = min_t(u32, skb->len, max_head_size);
+               size += VETH_XDP_HEADROOM;
+               size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+
                /* Allocate skb head */
-               page = page_pool_dev_alloc_pages(rq->page_pool);
-               if (!page)
+               pp_frag = page_pool_dev_alloc_frag(rq->page_pool, size);
+               if (!pp_frag)
                        goto drop;

-               nskb = napi_build_skb(page_address(page), PAGE_SIZE);
+               nskb = napi_build_skb(page_address(pp_frag->page) + pp_frag->offset,
+                                     pp_frag->truesize);
                if (!nskb) {
-                       page_pool_put_full_page(rq->page_pool, page, true);
+                       page_pool_put_full_page(rq->page_pool, pp_frag->page,
+                                               true);
                        goto drop;
                }

@@ -782,16 +788,18 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
                len = skb->len - off;

                for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
-                       page = page_pool_dev_alloc_pages(rq->page_pool);
-                       if (!page) {
+                       size = min_t(u32, len, PAGE_SIZE);
+
+                       pp_frag = page_pool_dev_alloc_frag(rq->page_pool, size);
+                       if (!pp_frag) {
                                consume_skb(nskb);
                                goto drop;
                        }

-                       size = min_t(u32, len, PAGE_SIZE);
-                       skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
-                       if (skb_copy_bits(skb, off, page_address(page),
-                                         size)) {
+                       skb_add_rx_frag(nskb, i, pp_frag->page, pp_frag->offset,
+                                       size, pp_frag->truesize);
+                       if (skb_copy_bits(skb, off, page_address(pp_frag->page) +
+                                         pp_frag->offset, size)) {
                                consume_skb(nskb);
                                goto drop;
                        }
@@ -1047,6 +1055,8 @@ static int veth_create_page_pool(struct veth_rq *rq)
                return err;
        }

+       page_pool_set_max_frag_size(rq->page_pool, PAGE_SIZE / 2);
+
        return 0;
 }
Lorenzo Bianconi May 17, 2023, 2:17 p.m. UTC | #11
> On 2023/5/17 6:52, Lorenzo Bianconi wrote:
> >> On Mon, May 15, 2023 at 01:24:20PM +0200, Lorenzo Bianconi wrote:
> >>>> On 2023/5/12 21:08, Lorenzo Bianconi wrote:
> >>>>> In order to reduce page_pool memory footprint, rely on
> >>>>> page_pool_dev_alloc_frag routine and reduce buffer size
> >>>>> (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
> >>>>
> >>>> Is there any performance improvement beside the memory saving? As it
> >>>> should reduce TLB miss, I wonder if the TLB miss reducing can even
> >>>> out the cost of the extra frag reference count handling for the
> >>>> frag support?
> >>>
> >>> reducing the requested headroom to 192 (from 256) we have a nice improvement in
> >>> the 1500B frame case while it is mostly the same in the case of paged skb
> >>> (e.g. MTU 8000B).
> >>
> >> Can you define 'nice improvement' ? ;)
> >> Show us numbers or improvement in %.
> > 
> > I am testing this RFC patch in the scenario reported below:
> > 
> > iperf tcp tx --> veth0 --> veth1 (xdp_pass) --> iperf tcp rx
> > 
> > - 6.4.0-rc1 net-next:
> >   MTU 1500B: ~ 7.07 Gbps
> >   MTU 8000B: ~ 14.7 Gbps
> > 
> > - 6.4.0-rc1 net-next + page_pool frag support in veth:
> >   MTU 1500B: ~ 8.57 Gbps
> >   MTU 8000B: ~ 14.5 Gbps
> > 
> 
> Thanks for sharing the data.
> Maybe using the new frag interface introduced in [1] bring
> back the performance for the MTU 8000B case.
> 
> 1. https://patchwork.kernel.org/project/netdevbpf/cover/20230516124801.2465-1-linyunsheng@huawei.com/
> 
> 
> I drafted a patch for veth to use the new frag interface, maybe that
> will show how veth can make use of it. Would you give it a try to see
> if there is any performance improvment for MTU 8000B case? Thanks.
> 
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -737,8 +737,8 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>             skb_shinfo(skb)->nr_frags ||
>             skb_headroom(skb) < XDP_PACKET_HEADROOM) {
>                 u32 size, len, max_head_size, off;
> +               struct page_pool_frag *pp_frag;
>                 struct sk_buff *nskb;
> -               struct page *page;
>                 int i, head_off;
> 
>                 /* We need a private copy of the skb and data buffers since
> @@ -752,14 +752,20 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>                 if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
>                         goto drop;
> 
> +               size = min_t(u32, skb->len, max_head_size);
> +               size += VETH_XDP_HEADROOM;
> +               size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +
>                 /* Allocate skb head */
> -               page = page_pool_dev_alloc_pages(rq->page_pool);
> -               if (!page)
> +               pp_frag = page_pool_dev_alloc_frag(rq->page_pool, size);
> +               if (!pp_frag)
>                         goto drop;
> 
> -               nskb = napi_build_skb(page_address(page), PAGE_SIZE);
> +               nskb = napi_build_skb(page_address(pp_frag->page) + pp_frag->offset,
> +                                     pp_frag->truesize);
>                 if (!nskb) {
> -                       page_pool_put_full_page(rq->page_pool, page, true);
> +                       page_pool_put_full_page(rq->page_pool, pp_frag->page,
> +                                               true);
>                         goto drop;
>                 }
> 
> @@ -782,16 +788,18 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>                 len = skb->len - off;
> 
>                 for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
> -                       page = page_pool_dev_alloc_pages(rq->page_pool);
> -                       if (!page) {
> +                       size = min_t(u32, len, PAGE_SIZE);
> +
> +                       pp_frag = page_pool_dev_alloc_frag(rq->page_pool, size);
> +                       if (!pp_frag) {
>                                 consume_skb(nskb);
>                                 goto drop;
>                         }
> 
> -                       size = min_t(u32, len, PAGE_SIZE);
> -                       skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
> -                       if (skb_copy_bits(skb, off, page_address(page),
> -                                         size)) {
> +                       skb_add_rx_frag(nskb, i, pp_frag->page, pp_frag->offset,
> +                                       size, pp_frag->truesize);
> +                       if (skb_copy_bits(skb, off, page_address(pp_frag->page) +
> +                                         pp_frag->offset, size)) {
>                                 consume_skb(nskb);
>                                 goto drop;
>                         }
> @@ -1047,6 +1055,8 @@ static int veth_create_page_pool(struct veth_rq *rq)
>                 return err;
>         }

IIUC the code here we are using a variable length for linear part (at most one page)
while we will always use a full page (exept for the last fragment) for the paged
area, correct? I have not tested it yet but I do not think we will get a significant
improvement since if we set MTU to 8000B in my tests we get mostly the same throughput
(14.5 Gbps vs 14.7 Gbps) if we use page_pool fragment or page_pool full page.
Am I missing something?
What we are discussing with Jesper is try to allocate a order 3 page from the pool and
rely page_pool fragment, similar to page_frag_cache is doing. I will look into it if
there are no strong 'red flags'.

Regards,
Lorenzo

> 
> +       page_pool_set_max_frag_size(rq->page_pool, PAGE_SIZE / 2);
> +
>         return 0;
>  }
>
Jakub Kicinski May 17, 2023, 2:58 p.m. UTC | #12
On Wed, 17 May 2023 00:52:25 +0200 Lorenzo Bianconi wrote:
> I am testing this RFC patch in the scenario reported below:
> 
> iperf tcp tx --> veth0 --> veth1 (xdp_pass) --> iperf tcp rx
> 
> - 6.4.0-rc1 net-next:
>   MTU 1500B: ~ 7.07 Gbps
>   MTU 8000B: ~ 14.7 Gbps
> 
> - 6.4.0-rc1 net-next + page_pool frag support in veth:
>   MTU 1500B: ~ 8.57 Gbps
>   MTU 8000B: ~ 14.5 Gbps
> 
> side note: it seems there is a regression between 6.2.15 and 6.4.0-rc1 net-next
> (even without latest veth page_pool patches) in the throughput I can get in the
> scenario above, but I have not looked into it yet.
> 
> - 6.2.15:
>   MTU 1500B: ~ 7.91 Gbps
>   MTU 8000B: ~ 14.1 Gbps
> 
> - 6.4.0-rc1 net-next w/o commits [0],[1],[2]
>   MTU 1500B: ~ 6.38 Gbps
>   MTU 8000B: ~ 13.2 Gbps

If the benchmark is iperf, wouldn't working towards preserving GSO
status across XDP (assuming prog is multi-buf-capable) be the most
beneficial optimization?
Yunsheng Lin May 18, 2023, 1:16 a.m. UTC | #13
On 2023/5/17 22:17, Lorenzo Bianconi wrote:
>> Maybe using the new frag interface introduced in [1] bring
>> back the performance for the MTU 8000B case.
>>
>> 1. https://patchwork.kernel.org/project/netdevbpf/cover/20230516124801.2465-1-linyunsheng@huawei.com/
>>
>>
>> I drafted a patch for veth to use the new frag interface, maybe that
>> will show how veth can make use of it. Would you give it a try to see
>> if there is any performance improvment for MTU 8000B case? Thanks.
>>
>> --- a/drivers/net/veth.c
>> +++ b/drivers/net/veth.c
>> @@ -737,8 +737,8 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>>             skb_shinfo(skb)->nr_frags ||
>>             skb_headroom(skb) < XDP_PACKET_HEADROOM) {
>>                 u32 size, len, max_head_size, off;
>> +               struct page_pool_frag *pp_frag;
>>                 struct sk_buff *nskb;
>> -               struct page *page;
>>                 int i, head_off;
>>
>>                 /* We need a private copy of the skb and data buffers since
>> @@ -752,14 +752,20 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>>                 if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
>>                         goto drop;
>>
>> +               size = min_t(u32, skb->len, max_head_size);
>> +               size += VETH_XDP_HEADROOM;
>> +               size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>> +
>>                 /* Allocate skb head */
>> -               page = page_pool_dev_alloc_pages(rq->page_pool);
>> -               if (!page)
>> +               pp_frag = page_pool_dev_alloc_frag(rq->page_pool, size);
>> +               if (!pp_frag)
>>                         goto drop;
>>
>> -               nskb = napi_build_skb(page_address(page), PAGE_SIZE);
>> +               nskb = napi_build_skb(page_address(pp_frag->page) + pp_frag->offset,
>> +                                     pp_frag->truesize);
>>                 if (!nskb) {
>> -                       page_pool_put_full_page(rq->page_pool, page, true);
>> +                       page_pool_put_full_page(rq->page_pool, pp_frag->page,
>> +                                               true);
>>                         goto drop;
>>                 }
>>
>> @@ -782,16 +788,18 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>>                 len = skb->len - off;
>>
>>                 for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
>> -                       page = page_pool_dev_alloc_pages(rq->page_pool);
>> -                       if (!page) {
>> +                       size = min_t(u32, len, PAGE_SIZE);
>> +
>> +                       pp_frag = page_pool_dev_alloc_frag(rq->page_pool, size);
>> +                       if (!pp_frag) {
>>                                 consume_skb(nskb);
>>                                 goto drop;
>>                         }
>>
>> -                       size = min_t(u32, len, PAGE_SIZE);
>> -                       skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
>> -                       if (skb_copy_bits(skb, off, page_address(page),
>> -                                         size)) {
>> +                       skb_add_rx_frag(nskb, i, pp_frag->page, pp_frag->offset,
>> +                                       size, pp_frag->truesize);
>> +                       if (skb_copy_bits(skb, off, page_address(pp_frag->page) +
>> +                                         pp_frag->offset, size)) {
>>                                 consume_skb(nskb);
>>                                 goto drop;
>>                         }
>> @@ -1047,6 +1055,8 @@ static int veth_create_page_pool(struct veth_rq *rq)
>>                 return err;
>>         }
> 
> IIUC the code here we are using a variable length for linear part (at most one page)
> while we will always use a full page (exept for the last fragment) for the paged

More correctly, it does not care if the data is in linear part or in paged area.
We copy the data to new skb using least possible fragment and most memory saving
depending on head/tail room size and the page size/order, as skb_copy_bits() hides
the date layout differenence for it's caller.

> area, correct? I have not tested it yet but I do not think we will get a significant
> improvement since if we set MTU to 8000B in my tests we get mostly the same throughput
> (14.5 Gbps vs 14.7 Gbps) if we use page_pool fragment or page_pool full page.
> Am I missing something?

I don't expect significant improvement too, but I do expect a 'nice improvement' for
performance and memory saving depending on how you view 'nice improvement':)

> What we are discussing with Jesper is try to allocate a order 3 page from the pool and
> rely page_pool fragment, similar to page_frag_cache is doing. I will look into it if
> there are no strong 'red flags'.

Thanks.
Yes, if we do not really care about memory usage, using order 3 page should give more
performance improvement.
As my understanding, improvement mentioned above is also applied to order 3 page.

> 
> Regards,
> Lorenzo
> 
>>
>> +       page_pool_set_max_frag_size(rq->page_pool, PAGE_SIZE / 2);
>> +
>>         return 0;
>>  }
>>
diff mbox series

Patch

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 614f3e3efab0..0e648703cccf 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -31,9 +31,12 @@ 
 #define DRV_NAME	"veth"
 #define DRV_VERSION	"1.0"
 
-#define VETH_XDP_FLAG		BIT(0)
-#define VETH_RING_SIZE		256
-#define VETH_XDP_HEADROOM	(XDP_PACKET_HEADROOM + NET_IP_ALIGN)
+#define VETH_XDP_FLAG			BIT(0)
+#define VETH_RING_SIZE			256
+#define VETH_XDP_PACKET_HEADROOM	192
+#define VETH_XDP_HEADROOM		(VETH_XDP_PACKET_HEADROOM + \
+					 NET_IP_ALIGN)
+#define VETH_PAGE_POOL_FRAG_SIZE	2048
 
 #define VETH_XDP_TX_BULK_SIZE	16
 #define VETH_XDP_BATCH		16
@@ -736,7 +739,7 @@  static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
 	if (skb_shared(skb) || skb_head_is_locked(skb) ||
 	    skb_shinfo(skb)->nr_frags ||
 	    skb_headroom(skb) < XDP_PACKET_HEADROOM) {
-		u32 size, len, max_head_size, off;
+		u32 size, len, max_head_size, off, pp_off;
 		struct sk_buff *nskb;
 		struct page *page;
 		int i, head_off;
@@ -747,17 +750,20 @@  static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
 		 *
 		 * Make sure we have enough space for linear and paged area
 		 */
-		max_head_size = SKB_WITH_OVERHEAD(PAGE_SIZE -
+		max_head_size = SKB_WITH_OVERHEAD(VETH_PAGE_POOL_FRAG_SIZE -
 						  VETH_XDP_HEADROOM);
-		if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
+		if (skb->len >
+		    VETH_PAGE_POOL_FRAG_SIZE * MAX_SKB_FRAGS + max_head_size)
 			goto drop;
 
 		/* Allocate skb head */
-		page = page_pool_dev_alloc_pages(rq->page_pool);
+		page = page_pool_dev_alloc_frag(rq->page_pool, &pp_off,
+						VETH_PAGE_POOL_FRAG_SIZE);
 		if (!page)
 			goto drop;
 
-		nskb = napi_build_skb(page_address(page), PAGE_SIZE);
+		nskb = napi_build_skb(page_address(page) + pp_off,
+				      VETH_PAGE_POOL_FRAG_SIZE);
 		if (!nskb) {
 			page_pool_put_full_page(rq->page_pool, page, true);
 			goto drop;
@@ -782,15 +788,18 @@  static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
 		len = skb->len - off;
 
 		for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
-			page = page_pool_dev_alloc_pages(rq->page_pool);
+			page = page_pool_dev_alloc_frag(rq->page_pool, &pp_off,
+							VETH_PAGE_POOL_FRAG_SIZE);
 			if (!page) {
 				consume_skb(nskb);
 				goto drop;
 			}
 
-			size = min_t(u32, len, PAGE_SIZE);
-			skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
-			if (skb_copy_bits(skb, off, page_address(page),
+			size = min_t(u32, len, VETH_PAGE_POOL_FRAG_SIZE);
+			skb_add_rx_frag(nskb, i, page, pp_off, size,
+					VETH_PAGE_POOL_FRAG_SIZE);
+			if (skb_copy_bits(skb, off,
+					  page_address(page) + pp_off,
 					  size)) {
 				consume_skb(nskb);
 				goto drop;
@@ -1035,6 +1044,7 @@  static int veth_create_page_pool(struct veth_rq *rq)
 	struct page_pool_params pp_params = {
 		.order = 0,
 		.pool_size = VETH_RING_SIZE,
+		.flags = PP_FLAG_PAGE_FRAG,
 		.nid = NUMA_NO_NODE,
 		.dev = &rq->dev->dev,
 	};
@@ -1634,13 +1644,14 @@  static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 			goto err;
 		}
 
-		max_mtu = SKB_WITH_OVERHEAD(PAGE_SIZE - VETH_XDP_HEADROOM) -
+		max_mtu = SKB_WITH_OVERHEAD(VETH_PAGE_POOL_FRAG_SIZE -
+					    VETH_XDP_HEADROOM) -
 			  peer->hard_header_len;
 		/* Allow increasing the max_mtu if the program supports
 		 * XDP fragments.
 		 */
 		if (prog->aux->xdp_has_frags)
-			max_mtu += PAGE_SIZE * MAX_SKB_FRAGS;
+			max_mtu += VETH_PAGE_POOL_FRAG_SIZE * MAX_SKB_FRAGS;
 
 		if (peer->mtu > max_mtu) {
 			NL_SET_ERR_MSG_MOD(extack, "Peer MTU is too large to set XDP");