[bpf-next,v1,1/2] xdp: recycle Page Pool backed skbs built from XDP frames

Message ID 20230301160315.1022488-2-aleksander.lobakin@intel.com (mailing list archive)
State Superseded
Delegated to: BPF
Series xdp: recycle Page Pool backed skbs built from XDP frames

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 4 this patch: 4
netdev/cc_maintainers warning 4 maintainers not CCed: pabeni@redhat.com edumazet@google.com davem@davemloft.net john.fastabend@gmail.com
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 10 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-7 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-8 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-17 fail Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 fail Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-32 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-33 success Logs for test_progs_parallel on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-37 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-38 success Logs for test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-18 fail Logs for test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-23 fail Logs for test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 fail Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-19 fail Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-34 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-15 fail Logs for test_progs on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-20 fail Logs for test_progs_no_alu32 on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_progs_no_alu32_parallel on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-29 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-35 success Logs for test_verifier on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-30 success Logs for test_progs_parallel on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-21 fail Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_progs_no_alu32_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-31 success Logs for test_progs_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-36 fail Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 fail Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_maps on s390x with gcc

Commit Message

Alexander Lobakin March 1, 2023, 4:03 p.m. UTC
__xdp_build_skb_from_frame() state(d):

/* Until page_pool get SKB return path, release DMA here */

Page Pool gained skb page recycling back in April 2021, but this
function was missed.

xdp_release_frame() is relevant only for Page Pool backed frames: it
detaches the page from the corresponding pool so that the page can be
freed via page_frag_free(). Instead, we can simply mark the resulting
skb as eligible for recycling when the frame is backed by a PP. Other
memory model types see no change (the condition check stays the same).
cpumap redirect and veth on Page Pool drivers now become (almost)
zero-alloc.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 net/core/xdp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
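
For reference, a condensed sketch of the two helpers involved,
reconstructed from the kernel tree of this era (simplified, not
verbatim; see include/net/xdp.h and include/linux/skbuff.h for the
real definitions):

	/* Old path, removed by this patch: detach the page from its
	 * pool so that page_frag_free() can free it later -- the page
	 * never returns to the pool.
	 */
	static inline void xdp_release_frame(struct xdp_frame *xdpf)
	{
		struct xdp_mem_info *mem = &xdpf->mem;

		/* Only Page Pool backed frames need the release step */
		if (mem->type == MEM_TYPE_PAGE_POOL)
			__xdp_release_frame(xdpf->data, mem);
	}

	/* New path: flip the per-skb bit instead; the skb free path
	 * then hands the pages back to their pool.
	 */
	static inline void skb_mark_for_recycle(struct sk_buff *skb)
	{
		skb->pp_recycle = 1;
	}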

Comments

kernel test robot March 1, 2023, 7:08 p.m. UTC | #1
Hi Alexander,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Alexander-Lobakin/xdp-recycle-Page-Pool-backed-skbs-built-from-XDP-frames/20230302-000635
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20230301160315.1022488-2-aleksander.lobakin%40intel.com
patch subject: [PATCH bpf-next v1 1/2] xdp: recycle Page Pool backed skbs built from XDP frames
config: i386-randconfig-a001 (https://download.01.org/0day-ci/archive/20230302/202303020331.PSFMFbXw-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/a5ca5578e9bd35220be091fd02df96d492120ee3
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Alexander-Lobakin/xdp-recycle-Page-Pool-backed-skbs-built-from-XDP-frames/20230302-000635
        git checkout a5ca5578e9bd35220be091fd02df96d492120ee3
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=i386 olddefconfig
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303020331.PSFMFbXw-lkp@intel.com/

All errors (new ones prefixed by >>):

   net/core/xdp.c: In function '__xdp_build_skb_from_frame':
>> net/core/xdp.c:662:17: error: implicit declaration of function 'skb_mark_for_recycle' [-Werror=implicit-function-declaration]
     662 |                 skb_mark_for_recycle(skb);
         |                 ^~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/skb_mark_for_recycle +662 net/core/xdp.c

   614	
   615	struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
   616						   struct sk_buff *skb,
   617						   struct net_device *dev)
   618	{
   619		struct skb_shared_info *sinfo = xdp_get_shared_info_from_frame(xdpf);
   620		unsigned int headroom, frame_size;
   621		void *hard_start;
   622		u8 nr_frags;
   623	
   624		/* xdp frags frame */
   625		if (unlikely(xdp_frame_has_frags(xdpf)))
   626			nr_frags = sinfo->nr_frags;
   627	
   628		/* Part of headroom was reserved to xdpf */
   629		headroom = sizeof(*xdpf) + xdpf->headroom;
   630	
   631		/* Memory size backing xdp_frame data already have reserved
   632		 * room for build_skb to place skb_shared_info in tailroom.
   633		 */
   634		frame_size = xdpf->frame_sz;
   635	
   636		hard_start = xdpf->data - headroom;
   637		skb = build_skb_around(skb, hard_start, frame_size);
   638		if (unlikely(!skb))
   639			return NULL;
   640	
   641		skb_reserve(skb, headroom);
   642		__skb_put(skb, xdpf->len);
   643		if (xdpf->metasize)
   644			skb_metadata_set(skb, xdpf->metasize);
   645	
   646		if (unlikely(xdp_frame_has_frags(xdpf)))
   647			xdp_update_skb_shared_info(skb, nr_frags,
   648						   sinfo->xdp_frags_size,
   649						   nr_frags * xdpf->frame_sz,
   650						   xdp_frame_is_frag_pfmemalloc(xdpf));
   651	
   652		/* Essential SKB info: protocol and skb->dev */
   653		skb->protocol = eth_type_trans(skb, dev);
   654	
   655		/* Optional SKB info, currently missing:
   656		 * - HW checksum info		(skb->ip_summed)
   657		 * - HW RX hash			(skb_set_hash)
   658		 * - RX ring dev queue index	(skb_record_rx_queue)
   659		 */
   660	
   661		if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
 > 662			skb_mark_for_recycle(skb);
   663	
   664		/* Allow SKB to reuse area used by xdp_frame */
   665		xdp_scrub_frame(xdpf);
   666	
   667		return skb;
   668	}
   669	EXPORT_SYMBOL_GPL(__xdp_build_skb_from_frame);
   670
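
This failure (also reported below for a clang randconfig) stems from
skb_mark_for_recycle() being declared in include/linux/skbuff.h only
under CONFIG_PAGE_POOL, while these randconfigs build without Page
Pool. A sketch of one possible fix -- hypothetical, not necessarily
what was merged -- is a no-op stub for the !CONFIG_PAGE_POOL case so
callers need no ifdef:

	#ifdef CONFIG_PAGE_POOL
	static inline void skb_mark_for_recycle(struct sk_buff *skb)
	{
		skb->pp_recycle = 1;
	}
	#else
	static inline void skb_mark_for_recycle(struct sk_buff *skb)
	{
	}
	#endif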
kernel test robot March 1, 2023, 7:18 p.m. UTC | #2
Hi Alexander,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Alexander-Lobakin/xdp-recycle-Page-Pool-backed-skbs-built-from-XDP-frames/20230302-000635
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20230301160315.1022488-2-aleksander.lobakin%40intel.com
patch subject: [PATCH bpf-next v1 1/2] xdp: recycle Page Pool backed skbs built from XDP frames
config: x86_64-randconfig-a014 (https://download.01.org/0day-ci/archive/20230302/202303020342.Wi2PRFFH-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/a5ca5578e9bd35220be091fd02df96d492120ee3
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Alexander-Lobakin/xdp-recycle-Page-Pool-backed-skbs-built-from-XDP-frames/20230302-000635
        git checkout a5ca5578e9bd35220be091fd02df96d492120ee3
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303020342.Wi2PRFFH-lkp@intel.com/

All errors (new ones prefixed by >>):

>> net/core/xdp.c:662:3: error: implicit declaration of function 'skb_mark_for_recycle' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
                   skb_mark_for_recycle(skb);
                   ^
   1 error generated.


vim +/skb_mark_for_recycle +662 net/core/xdp.c

   614	
   615	struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
   616						   struct sk_buff *skb,
   617						   struct net_device *dev)
   618	{
   619		struct skb_shared_info *sinfo = xdp_get_shared_info_from_frame(xdpf);
   620		unsigned int headroom, frame_size;
   621		void *hard_start;
   622		u8 nr_frags;
   623	
   624		/* xdp frags frame */
   625		if (unlikely(xdp_frame_has_frags(xdpf)))
   626			nr_frags = sinfo->nr_frags;
   627	
   628		/* Part of headroom was reserved to xdpf */
   629		headroom = sizeof(*xdpf) + xdpf->headroom;
   630	
   631		/* Memory size backing xdp_frame data already have reserved
   632		 * room for build_skb to place skb_shared_info in tailroom.
   633		 */
   634		frame_size = xdpf->frame_sz;
   635	
   636		hard_start = xdpf->data - headroom;
   637		skb = build_skb_around(skb, hard_start, frame_size);
   638		if (unlikely(!skb))
   639			return NULL;
   640	
   641		skb_reserve(skb, headroom);
   642		__skb_put(skb, xdpf->len);
   643		if (xdpf->metasize)
   644			skb_metadata_set(skb, xdpf->metasize);
   645	
   646		if (unlikely(xdp_frame_has_frags(xdpf)))
   647			xdp_update_skb_shared_info(skb, nr_frags,
   648						   sinfo->xdp_frags_size,
   649						   nr_frags * xdpf->frame_sz,
   650						   xdp_frame_is_frag_pfmemalloc(xdpf));
   651	
   652		/* Essential SKB info: protocol and skb->dev */
   653		skb->protocol = eth_type_trans(skb, dev);
   654	
   655		/* Optional SKB info, currently missing:
   656		 * - HW checksum info		(skb->ip_summed)
   657		 * - HW RX hash			(skb_set_hash)
   658		 * - RX ring dev queue index	(skb_record_rx_queue)
   659		 */
   660	
   661		if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
 > 662			skb_mark_for_recycle(skb);
   663	
   664		/* Allow SKB to reuse area used by xdp_frame */
   665		xdp_scrub_frame(xdpf);
   666	
   667		return skb;
   668	}
   669	EXPORT_SYMBOL_GPL(__xdp_build_skb_from_frame);
   670
Yunsheng Lin March 2, 2023, 2:30 a.m. UTC | #3
On 2023/3/2 0:03, Alexander Lobakin wrote:
> __xdp_build_skb_from_frame() state(d):
> 
> /* Until page_pool get SKB return path, release DMA here */
> 
> Page Pool gained skb page recycling back in April 2021, but this
> function was missed.
> 
> xdp_release_frame() is relevant only for Page Pool backed frames: it
> detaches the page from the corresponding pool so that the page can be
> freed via page_frag_free(). Instead, we can simply mark the resulting
> skb as eligible for recycling when the frame is backed by a PP. Other
> memory model types see no change (the condition check stays the same).
> cpumap redirect and veth on Page Pool drivers now become (almost)
> zero-alloc.
> 
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
>  net/core/xdp.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 8c92fc553317..a2237cfca8e9 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -658,8 +658,8 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
>  	 * - RX ring dev queue index	(skb_record_rx_queue)
>  	 */
>  
> -	/* Until page_pool get SKB return path, release DMA here */
> -	xdp_release_frame(xdpf);
> +	if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
> +		skb_mark_for_recycle(skb);


We both rely on both skb->pp_recycle and page->pp_magic to decide
the page is really from page pool. So there was a few corner case
problem when we are sharing a page for different skb in the driver
level or calling skb_clone() or skb_try_coalesce().
see:
https://github.com/torvalds/linux/commit/2cc3aeb5ecccec0d266813172fcd82b4b5fa5803
https://lore.kernel.org/netdev/MW5PR15MB51214C0513DB08A3607FBC1FBDE19@MW5PR15MB5121.namprd15.prod.outlook.com/t/
https://lore.kernel.org/netdev/167475990764.1934330.11960904198087757911.stgit@localhost.localdomain/

As the 'struct xdp_frame' also use 'struct skb_shared_info' which is
sharable, see xdp_get_shared_info_from_frame().

For now xdpf_clone() does not seems to handling frag page yet,
so it should be fine for now.

IMHO we should find a way to use per-page marker, instead of both
per-skb and per-page markers, in order to avoid the above problem
for xdp if xdp has a similar processing as skb, as suggested by Eric.

https://lore.kernel.org/netdev/CANn89iKgZU4Q+THXupzZi4hETuKuCOvOB=iHpp5JzQTNv_Fg_A@mail.gmail.com/

>  
>  	/* Allow SKB to reuse area used by xdp_frame */
>  	xdp_scrub_frame(xdpf);
>
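
For reference, the dual-marker check discussed above, condensed from
net/core/skbuff.c and net/core/page_pool.c of this era (simplified
sketch, not verbatim):

	static bool skb_pp_recycle(struct sk_buff *skb, void *data)
	{
		/* Per-skb marker: cheap early-out when recycling is off */
		if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)
			return false;
		return page_pool_return_skb_page(virt_to_page(data));
	}

	bool page_pool_return_skb_page(struct page *page)
	{
		struct page_pool *pp;

		page = compound_head(page);

		/* Per-page marker: only pages carrying the PP signature
		 * may be returned to a pool; anything else is freed via
		 * the regular page allocator path.
		 */
		if (unlikely(page->pp_magic != PP_SIGNATURE))
			return false;

		pp = page->pp;
		page_pool_put_full_page(pp, page, false);

		return true;
	}

Recycling happens only when both markers agree, which is exactly why a
stale or missing skb->pp_recycle (or a page shared between PP and
non-PP users) leads to the refcounting corner cases listed above.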
Jesper Dangaard Brouer March 3, 2023, 10:31 a.m. UTC | #4
On 02/03/2023 03.30, Yunsheng Lin wrote:
> On 2023/3/2 0:03, Alexander Lobakin wrote:
>> __xdp_build_skb_from_frame() state(d):
>>
>> /* Until page_pool get SKB return path, release DMA here */
>>
>> Page Pool gained skb page recycling back in April 2021, but this
>> function was missed.
>>
>> xdp_release_frame() is relevant only for Page Pool backed frames: it
>> detaches the page from the corresponding pool so that the page can be
>> freed via page_frag_free(). Instead, we can simply mark the resulting
>> skb as eligible for recycling when the frame is backed by a PP. Other
>> memory model types see no change (the condition check stays the same).
>> cpumap redirect and veth on Page Pool drivers now become (almost)
>> zero-alloc.
>>
>> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
>> ---
>>   net/core/xdp.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/core/xdp.c b/net/core/xdp.c
>> index 8c92fc553317..a2237cfca8e9 100644
>> --- a/net/core/xdp.c
>> +++ b/net/core/xdp.c
>> @@ -658,8 +658,8 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
>>   	 * - RX ring dev queue index	(skb_record_rx_queue)
>>   	 */
>>   
>> -	/* Until page_pool get SKB return path, release DMA here */
>> -	xdp_release_frame(xdpf);
>> +	if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
>> +		skb_mark_for_recycle(skb);
> 
> 
> We both rely on both skb->pp_recycle and page->pp_magic to decide
> the page is really from page pool. So there was a few corner case
> problem when we are sharing a page for different skb in the driver
> level or calling skb_clone() or skb_try_coalesce().
> see:
> https://github.com/torvalds/linux/commit/2cc3aeb5ecccec0d266813172fcd82b4b5fa5803
> https://lore.kernel.org/netdev/MW5PR15MB51214C0513DB08A3607FBC1FBDE19@MW5PR15MB5121.namprd15.prod.outlook.com/t/
> https://lore.kernel.org/netdev/167475990764.1934330.11960904198087757911.stgit@localhost.localdomain/
> 
> As the 'struct xdp_frame' also use 'struct skb_shared_info' which is
> sharable, see xdp_get_shared_info_from_frame().
> 
> For now xdpf_clone() does not seems to handling frag page yet,
> so it should be fine for now.
> 
> IMHO we should find a way to use per-page marker, instead of both
> per-skb and per-page markers, in order to avoid the above problem
> for xdp if xdp has a similar processing as skb, as suggested by Eric.
> 

Moving to a per-page marker can be *more* expensive if the struct-page
memory isn't cache-hot. So, if the struct page is accessed anyhow, then
sure, we can move to a per-page marker.

> https://lore.kernel.org/netdev/CANn89iKgZU4Q+THXupzZi4hETuKuCOvOB=iHpp5JzQTNv_Fg_A@mail.gmail.com/
> 
>>   
>>   	/* Allow SKB to reuse area used by xdp_frame */
>>   	xdp_scrub_frame(xdpf);
>>
>
Alexander Lobakin March 3, 2023, 11:22 a.m. UTC | #5
From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Thu, 2 Mar 2023 10:30:13 +0800

> On 2023/3/2 0:03, Alexander Lobakin wrote:
>> __xdp_build_skb_from_frame() state(d):
>>
>> /* Until page_pool get SKB return path, release DMA here */
>>
>> Page Pool gained skb page recycling back in April 2021, but this
>> function was missed.

[...]

> We both rely on both skb->pp_recycle and page->pp_magic to decide
> the page is really from page pool. So there was a few corner case
> problem when we are sharing a page for different skb in the driver
> level or calling skb_clone() or skb_try_coalesce().
> see:
> https://github.com/torvalds/linux/commit/2cc3aeb5ecccec0d266813172fcd82b4b5fa5803
> https://lore.kernel.org/netdev/MW5PR15MB51214C0513DB08A3607FBC1FBDE19@MW5PR15MB5121.namprd15.prod.outlook.com/t/
> https://lore.kernel.org/netdev/167475990764.1934330.11960904198087757911.stgit@localhost.localdomain/

And they are fixed :D
No drivers currently which use Page Pool mix PP pages with non-PP. And
it's impossible to trigger try_coalesce() or so at least on cpumap path
since we're only creating skbs at that moment, they don't come from
anywhere else.

> 
> As the 'struct xdp_frame' also use 'struct skb_shared_info' which is
> sharable, see xdp_get_shared_info_from_frame().
> 
> For now xdpf_clone() does not seems to handling frag page yet,
> so it should be fine for now.

xdpf_clone() clones a frame to a new full page and doesn't copy its
skb_shared_info.

> 
> IMHO we should find a way to use per-page marker, instead of both
> per-skb and per-page markers, in order to avoid the above problem
> for xdp if xdp has a similar processing as skb, as suggested by Eric.
> 
> https://lore.kernel.org/netdev/CANn89iKgZU4Q+THXupzZi4hETuKuCOvOB=iHpp5JzQTNv_Fg_A@mail.gmail.com/

As Jesper already pointed out, not having a quick way to check whether
we have to check ::pp_magic at all can decrease performance. So it's
rather a shortcut.

> 
>>  
>>  	/* Allow SKB to reuse area used by xdp_frame */
>>  	xdp_scrub_frame(xdpf);
>>

Thanks,
Olek
Yunsheng Lin March 3, 2023, 12:44 p.m. UTC | #6
On 2023/3/3 19:22, Alexander Lobakin wrote:
> From: Yunsheng Lin <linyunsheng@huawei.com>
> Date: Thu, 2 Mar 2023 10:30:13 +0800
> 
>> On 2023/3/2 0:03, Alexander Lobakin wrote:
>>> __xdp_build_skb_from_frame() state(d):
>>>
>>> /* Until page_pool get SKB return path, release DMA here */
>>>
>>> Page Pool gained skb page recycling back in April 2021, but this
>>> function was missed.
> 
> [...]
> 
>> We both rely on both skb->pp_recycle and page->pp_magic to decide
>> the page is really from page pool. So there was a few corner case
>> problem when we are sharing a page for different skb in the driver
>> level or calling skb_clone() or skb_try_coalesce().
>> see:
>> https://github.com/torvalds/linux/commit/2cc3aeb5ecccec0d266813172fcd82b4b5fa5803
>> https://lore.kernel.org/netdev/MW5PR15MB51214C0513DB08A3607FBC1FBDE19@MW5PR15MB5121.namprd15.prod.outlook.com/t/
>> https://lore.kernel.org/netdev/167475990764.1934330.11960904198087757911.stgit@localhost.localdomain/
> 
> And they are fixed :D
> No drivers currently which use Page Pool mix PP pages with non-PP. And

The wireless adapter which uses Page Pool *does* mix PP pages with
non-PP ones; see the discussion below:

https://lore.kernel.org/netdev/156f3e120bd0757133cb6bc11b76889637b5e0a6.camel@gmail.com/

> it's impossible to trigger try_coalesce() or so at least on cpumap path
> since we're only creating skbs at that moment, they don't come from
> anywhere else.
> 
>>
>> As the 'struct xdp_frame' also use 'struct skb_shared_info' which is
>> sharable, see xdp_get_shared_info_from_frame().
>>
>> For now xdpf_clone() does not seems to handling frag page yet,
>> so it should be fine for now.
> 
> xdpf_clone() clones a frame to a new full page and doesn't copy its
> skb_shared_info.
> 
>>
>> IMHO we should find a way to use per-page marker, instead of both
>> per-skb and per-page markers, in order to avoid the above problem
>> for xdp if xdp has a similar processing as skb, as suggested by Eric.
>>
>> https://lore.kernel.org/netdev/CANn89iKgZU4Q+THXupzZi4hETuKuCOvOB=iHpp5JzQTNv_Fg_A@mail.gmail.com/
> 
> As Jesper already pointed out, not having a quick way to check whether
> we have to check ::pp_magic at all can decrease performance. So it's
> rather a shortcut.

When we are freeing a page by updating the _refcount, I think
we are already touching the cache of ::pp_magic.

Anyway, I am not sure checking ::pp_magic is correct when a
page will be passing between different subsystem and back to
the network stack eventually, checking ::pp_magic may not be
correct if this happens.

Another way is to use the bottom two bits in bv_page, see:
https://www.spinics.net/lists/netdev/msg874099.html

> 
>>
>>>  
>>>  	/* Allow SKB to reuse area used by xdp_frame */
>>>  	xdp_scrub_frame(xdpf);
>>>
> 
> Thanks,
> Olek
> .
>
Alexander Lobakin March 3, 2023, 1:26 p.m. UTC | #7
From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Fri, 3 Mar 2023 20:44:24 +0800

> On 2023/3/3 19:22, Alexander Lobakin wrote:
>> From: Yunsheng Lin <linyunsheng@huawei.com>
>> Date: Thu, 2 Mar 2023 10:30:13 +0800

[...]

>> And they are fixed :D
>> No drivers currently which use Page Pool mix PP pages with non-PP. And
> 
> The wireless adapter which uses Page Pool *does* mix PP pages with
> non-PP ones; see the discussion below:
> 
> https://lore.kernel.org/netdev/156f3e120bd0757133cb6bc11b76889637b5e0a6.camel@gmail.com/

Ah right, I remember that (also was fixed).
Not that I think it is correct to mix them -- from my PoV, a driver
should either give *all* its Rx buffers as PP-backed or not use PP at all.

[...]

>> As Jesper already pointed out, not having a quick way to check whether
>> we have to check ::pp_magic at all can decrease performance. So it's
>> rather a shortcut.
> 
> When we are freeing a page by updating the _refcount, I think
> we are already touching the cache of ::pp_magic.

But no page freeing happens before checking for skb->pp_recycle, neither
in skb_pp_recycle() (skb_free_head() etc.)[0] nor in skb_frag_unref()[1].

> 
> Anyway, I am not sure checking ::pp_magic is correct when a
> page will be passing between different subsystem and back to
> the network stack eventually, checking ::pp_magic may not be
> correct if this happens.
> 
> Another way is to use the bottom two bits in bv_page, see:
> https://www.spinics.net/lists/netdev/msg874099.html
> 
>>
>>>
>>>>  
>>>>  	/* Allow SKB to reuse area used by xdp_frame */
>>>>  	xdp_scrub_frame(xdpf);
>>>>
>>
>> Thanks,
>> Olek
>> .
>>

[0] https://elixir.bootlin.com/linux/latest/source/net/core/skbuff.c#L808
[1]
https://elixir.bootlin.com/linux/latest/source/include/linux/skbuff.h#L3385

Thanks,
Olek
Yunsheng Lin March 6, 2023, 1:09 a.m. UTC | #8
On 2023/3/3 21:26, Alexander Lobakin wrote:
> From: Yunsheng Lin <linyunsheng@huawei.com>
> Date: Fri, 3 Mar 2023 20:44:24 +0800
> 
>> On 2023/3/3 19:22, Alexander Lobakin wrote:
>>> From: Yunsheng Lin <linyunsheng@huawei.com>
>>> Date: Thu, 2 Mar 2023 10:30:13 +0800
> 
> [...]
> 
>>> And they are fixed :D
>>> No drivers currently which use Page Pool mix PP pages with non-PP. And
>>
>> The wireless adapter which uses Page Pool *does* mix PP pages with
>> non-PP ones; see the discussion below:
>>
>> https://lore.kernel.org/netdev/156f3e120bd0757133cb6bc11b76889637b5e0a6.camel@gmail.com/
> 
> Ah right, I remember that (also was fixed).
> Not that I think it is correct to mix them -- from my PoV, a driver
> should either give *all* its Rx buffers as PP-backed or not use PP at all.
> 
> [...]
> 
>>> As Jesper already pointed out, not having a quick way to check whether
>>> we have to check ::pp_magic at all can decrease performance. So it's
>>> rather a shortcut.
>>
>> When we are freeing a page by updating the _refcount, I think
>> we are already touching the cache of ::pp_magic.
> 
> But no page freeing happens before checking for skb->pp_recycle, neither
> in skb_pp_recycle() (skb_free_head() etc.)[0] nor in skb_frag_unref()[1].

If we move to per page marker, we probably do not need checking
skb->pp_recycle.

Note both page_pool_return_skb_page() and skb_free_frag() can
reuse the cache line triggered by per page marker checking if
the per page marker is in the 'struct page'.

> 
>>
>> Anyway, I am not sure checking ::pp_magic is correct when a
>> page will be passing between different subsystem and back to
>> the network stack eventually, checking ::pp_magic may not be
>> correct if this happens.
>>
>> Another way is to use the bottom two bits in bv_page, see:
>> https://www.spinics.net/lists/netdev/msg874099.html
>>
>>>
>>>>
>>>>>  
>>>>>  	/* Allow SKB to reuse area used by xdp_frame */
>>>>>  	xdp_scrub_frame(xdpf);
>>>>>
>>>
>>> Thanks,
>>> Olek
>>> .
>>>
> 
> [0] https://elixir.bootlin.com/linux/latest/source/net/core/skbuff.c#L808
> [1]
> https://elixir.bootlin.com/linux/latest/source/include/linux/skbuff.h#L3385
> 
> Thanks,
> Olek
> .
>
Alexander Lobakin March 6, 2023, 11:58 a.m. UTC | #9
From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Mon, 6 Mar 2023 09:09:31 +0800

> On 2023/3/3 21:26, Alexander Lobakin wrote:
>> From: Yunsheng Lin <linyunsheng@huawei.com>
>> Date: Fri, 3 Mar 2023 20:44:24 +0800
>>
>>> On 2023/3/3 19:22, Alexander Lobakin wrote:
>>>> From: Yunsheng Lin <linyunsheng@huawei.com>
>>>> Date: Thu, 2 Mar 2023 10:30:13 +0800
>>
>> [...]
>>
>>>> And they are fixed :D
>>>> No drivers currently which use Page Pool mix PP pages with non-PP. And
>>>
>>> The wireless adapter which uses Page Pool *does* mix PP pages with
>>> non-PP ones; see the discussion below:
>>>
>>> https://lore.kernel.org/netdev/156f3e120bd0757133cb6bc11b76889637b5e0a6.camel@gmail.com/
>>
>> Ah right, I remember that (also was fixed).
>> Not that I think it is correct to mix them -- from my PoV, a driver
>> should either give *all* its Rx buffers as PP-backed or not use PP at all.
>>
>> [...]
>>
>>>> As Jesper already pointed out, not having a quick way to check whether
>>>> we have to check ::pp_magic at all can decrease performance. So it's
>>>> rather a shortcut.
>>>
>>> When we are freeing a page by updating the _refcount, I think
>>> we are already touching the cache of ::pp_magic.
>>
>> But no page freeing happens before checking for skb->pp_recycle, neither
>> in skb_pp_recycle() (skb_free_head() etc.)[0] nor in skb_frag_unref()[1].
> 
> If we move to per page marker, we probably do not need checking
> skb->pp_recycle.
> 
> Note both page_pool_return_skb_page() and skb_free_frag() can
> reuse the cache line triggered by per page marker checking if
> the per page marker is in the 'struct page'.

Ah, from that perspective. Yes, you're probably right, but would need to
be tested anyway. I don't see any open problems with the PP recycling
right now on the lists, but someone may try to change it one day.
Anyway, this flag is only to do a quick test. We do have
sk_buff::pfmemalloc, but this flag doesn't mean every page from this skb
was pfmemalloced.

> 
>>
>>>
>>> Anyway, I am not sure checking ::pp_magic is correct when a
>>> page will be passing between different subsystem and back to
>>> the network stack eventually, checking ::pp_magic may not be
>>> correct if this happens.
>>>
>>> Another way is to use the bottom two bits in bv_page, see:
>>> https://www.spinics.net/lists/netdev/msg874099.html
>>>
>>>>
>>>>>
>>>>>>  
>>>>>>  	/* Allow SKB to reuse area used by xdp_frame */
>>>>>>  	xdp_scrub_frame(xdpf);
>>>>>>
>>>>
>>>> Thanks,
>>>> Olek
>>>> .
>>>>
>>
>> [0] https://elixir.bootlin.com/linux/latest/source/net/core/skbuff.c#L808
>> [1]
>> https://elixir.bootlin.com/linux/latest/source/include/linux/skbuff.h#L3385
>>
>> Thanks,
>> Olek
>> .
>>

Thanks,
Olek
Yunsheng Lin March 7, 2023, 2:50 a.m. UTC | #10
On 2023/3/6 19:58, Alexander Lobakin wrote:
> From: Yunsheng Lin <linyunsheng@huawei.com>
> Date: Mon, 6 Mar 2023 09:09:31 +0800
> 
>> On 2023/3/3 21:26, Alexander Lobakin wrote:
>>> From: Yunsheng Lin <linyunsheng@huawei.com>
>>> Date: Fri, 3 Mar 2023 20:44:24 +0800
>>>
>>>> On 2023/3/3 19:22, Alexander Lobakin wrote:
>>>>> From: Yunsheng Lin <linyunsheng@huawei.com>
>>>>> Date: Thu, 2 Mar 2023 10:30:13 +0800
>>>
>>> [...]
>>>
>>>>> And they are fixed :D
>>>>> No drivers currently which use Page Pool mix PP pages with non-PP. And
>>>>
>>>> The wireless adapter which uses Page Pool *does* mix PP pages with
>>>> non-PP ones; see the discussion below:
>>>>
>>>> https://lore.kernel.org/netdev/156f3e120bd0757133cb6bc11b76889637b5e0a6.camel@gmail.com/
>>>
>>> Ah right, I remember that (also was fixed).
>>> Not that I think it is correct to mix them -- from my PoV, a driver
>>> should either give *all* its Rx buffers as PP-backed or not use PP at all.
>>>
>>> [...]
>>>
>>>>> As Jesper already pointed out, not having a quick way to check whether
>>>>> we have to check ::pp_magic at all can decrease performance. So it's
>>>>> rather a shortcut.
>>>>
>>>> When we are freeing a page by updating the _refcount, I think
>>>> we are already touching the cache of ::pp_magic.
>>>
>>> But no page freeing happens before checking for skb->pp_recycle, neither
>>> in skb_pp_recycle() (skb_free_head() etc.)[0] nor in skb_frag_unref()[1].
>>
>> If we move to per page marker, we probably do not need checking
>> skb->pp_recycle.
>>
>> Note both page_pool_return_skb_page() and skb_free_frag() can
>> reuse the cache line triggered by per page marker checking if
>> the per page marker is in the 'struct page'.
> 
> Ah, from that perspective. Yes, you're probably right, but would need to
> be tested anyway. I don't see any open problems with the PP recycling
> right now on the lists, but someone may try to change it one day.
> Anyway, this flag is only to do a quick test. We do have
> sk_buff::pfmemalloc, but this flag doesn't mean every page from this skb
> was pfmemalloced.

The point seems to be that sk_buff::pfmemalloc allows false positives,
which means skb->pfmemalloc can be set to true even though not every
page from this skb is pfmemalloced, as you mentioned.

skb->pp_recycle, on the other hand, can't allow false positives: if
that happens, reference counting of the page will not be handled
properly when a pp and a non-pp skb share the page, as the wireless
adapter does.

> 
>>
>>>
>>>>
>>>> Anyway, I am not sure checking ::pp_magic is correct when a
>>>> page will be passing between different subsystem and back to
>>>> the network stack eventually, checking ::pp_magic may not be
>>>> correct if this happens.
>>>>
>>>> Another way is to use the bottom two bits in bv_page, see:
>>>> https://www.spinics.net/lists/netdev/msg874099.html
>>>>
>>>>>
>>>>>>
>>>>>>>  
>>>>>>>  	/* Allow SKB to reuse area used by xdp_frame */
>>>>>>>  	xdp_scrub_frame(xdpf);
>>>>>>>
>>>>>
>>>>> Thanks,
>>>>> Olek
>>>>> .
>>>>>
>>>
>>> [0] https://elixir.bootlin.com/linux/latest/source/net/core/skbuff.c#L808
>>> [1]
>>> https://elixir.bootlin.com/linux/latest/source/include/linux/skbuff.h#L3385
>>>
>>> Thanks,
>>> Olek
>>> .
>>>
> 
> Thanks,
> Olek
> 
> .
>
Alexander Lobakin March 7, 2023, 6:14 p.m. UTC | #11
From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Tue, 7 Mar 2023 10:50:34 +0800

> On 2023/3/6 19:58, Alexander Lobakin wrote:
>> From: Yunsheng Lin <linyunsheng@huawei.com>
>> Date: Mon, 6 Mar 2023 09:09:31 +0800

[...]

>> Ah, from that perspective. Yes, you're probably right, but would need to
>> be tested anyway. I don't see any open problems with the PP recycling
>> right now on the lists, but someone may try to change it one day.
>> Anyway, this flag is only to do a quick test. We do have
>> sk_buff::pfmemalloc, but this flag doesn't mean every page from this skb
>> was pfmemalloced.
> 
> The point seems to be that sk_buff::pfmemalloc allows false positives,
> which means skb->pfmemalloc can be set to true even though not every
> page from this skb is pfmemalloced, as you mentioned.
> 
> skb->pp_recycle, on the other hand, can't allow false positives: if
> that happens, reference counting of the page will not be handled
> properly when a pp and a non-pp skb share the page, as the wireless
> adapter does.

You mean false-positives in both directions? Because if ->pp_recycle is
set, the stack can still free non-PP pages. In the opposite case, I mean
when ->pp_recycle is false and an skb page belongs to a page_pool, yes,
there'll be issues.
But I think the deal is to propagate the flag when you want to attach a
PP-backed page to the skb? I mean, if someone decides to mix pages with
different memory models, it's his responsibility to make sure everything
is fine, because it's not a common/intended way. Isn't it?

> 
>>
>>>
>>>>
>>>>>
>>>>> Anyway, I am not sure checking ::pp_magic is correct when a
>>>>> page will be passing between different subsystem and back to
>>>>> the network stack eventually, checking ::pp_magic may not be
>>>>> correct if this happens.
>>>>>
>>>>> Another way is to use the bottom two bits in bv_page, see:
>>>>> https://www.spinics.net/lists/netdev/msg874099.html

This one is interesting actually. We'd only need one bit -- which is
100% free and available in case of page pointers.
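
A minimal sketch of that idea, with hypothetical helper names
(assuming only that struct page pointers are at least word-aligned, so
bit 0 of the pointer is always zero and thus free to carry a flag):

	#define BVEC_PP_BIT	0x1UL

	/* Tag a bio_vec page pointer as Page Pool backed */
	static inline struct page *bvec_pp_tag(struct page *page)
	{
		return (struct page *)((unsigned long)page | BVEC_PP_BIT);
	}

	/* Test whether the pointer carries the PP tag */
	static inline bool bvec_pp_tagged(const struct page *page)
	{
		return (unsigned long)page & BVEC_PP_BIT;
	}

	/* Strip the tag before dereferencing the pointer */
	static inline struct page *bvec_pp_untag(struct page *page)
	{
		return (struct page *)((unsigned long)page & ~BVEC_PP_BIT);
	}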

>>>>>
>>>>>>
>>>>>>>
>>>>>>>>  
>>>>>>>>  	/* Allow SKB to reuse area used by xdp_frame */
>>>>>>>>  	xdp_scrub_frame(xdpf);

[...]

Thanks,
Olek
Yunsheng Lin March 8, 2023, 6:27 a.m. UTC | #12
On 2023/3/8 2:14, Alexander Lobakin wrote:
> From: Yunsheng Lin <linyunsheng@huawei.com>
> Date: Tue, 7 Mar 2023 10:50:34 +0800
> 
>> On 2023/3/6 19:58, Alexander Lobakin wrote:
>>> From: Yunsheng Lin <linyunsheng@huawei.com>
>>> Date: Mon, 6 Mar 2023 09:09:31 +0800
> 
> [...]
> 
>>> Ah, from that perspective. Yes, you're probably right, but would need to
>>> be tested anyway. I don't see any open problems with the PP recycling
>>> right now on the lists, but someone may try to change it one day.
>>> Anyway, this flag is only to do a quick test. We do have
>>> sk_buff::pfmemalloc, but this flag doesn't mean every page from this skb
>>> was pfmemalloced.
>>
>> The point seems to be that sk_buff::pfmemalloc allows false positives,
>> which means skb->pfmemalloc can be set to true even though not every
>> page from this skb is pfmemalloced, as you mentioned.
>>
>> skb->pp_recycle, on the other hand, can't allow false positives: if
>> that happens, reference counting of the page will not be handled
>> properly when a pp and a non-pp skb share the page, as the wireless
>> adapter does.
> 
> You mean false-positives in both directions? Because if ->pp_recycle is
> set, the stack can still free non-PP pages. In the opposite case, I mean
> when ->pp_recycle is false and an skb page belongs to a page_pool, yes,
> there'll be issues.

That may depend on what counts as a PP page and what counts as a
non-PP page; it seems hard to answer now.

For an skb with ->pp_recycle being true and a frag page with
page->pp_magic being PP_SIGNATURE, calling skb_clone(),
pskb_expand_head() or skb_try_coalesce() may end up calling
__skb_frag_ref() for the frag page, which means a page with
page->pp_magic being PP_SIGNATURE can be both a PP page and a non-PP
page at the same time. So it is important to set ->pp_recycle
correctly, and past experience shows that is hard to get right; that's
why a per-page marker is suggested.


> But I think the deal is to propagate the flag when you want to attach a
> PP-backed page to the skb? I mean, if someone decides to mix pages with
> different memory models, it's his responsibility to make sure everything
> is fine, because it's not a common/intended way. Isn't it?
> 
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>> Anyway, I am not sure checking ::pp_magic is correct when a
>>>>>> page will be passing between different subsystem and back to
>>>>>> the network stack eventually, checking ::pp_magic may not be
>>>>>> correct if this happens.
>>>>>>
>>>>>> Another way is to use the bottom two bits in bv_page, see:
>>>>>> https://www.spinics.net/lists/netdev/msg874099.html
> 
> This one is interesting actually. We'd only need one bit -- which is
> 100% free and available in case of page pointers.
> 
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>  
>>>>>>>>>  	/* Allow SKB to reuse area used by xdp_frame */
>>>>>>>>>  	xdp_scrub_frame(xdpf);
> 
> [...]
> 
> Thanks,
> Olek
> 
> .
>
Alexander Lobakin March 9, 2023, 4:27 p.m. UTC | #13
From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Wed, 8 Mar 2023 14:27:13 +0800

> On 2023/3/8 2:14, Alexander Lobakin wrote:
>> From: Yunsheng Lin <linyunsheng@huawei.com>
>> Date: Tue, 7 Mar 2023 10:50:34 +0800

[...]

>> You mean false-positives in both directions? Because if ->pp_recycle is
>> set, the stack can still free non-PP pages. In the opposite case, I mean
>> when ->pp_recycle is false and an skb page belongs to a page_pool, yes,
>> there'll be issues.
> 
> That may depend on what counts as a PP page and what counts as a
> non-PP page; it seems hard to answer now.
> 
> For an skb with ->pp_recycle being true and a frag page with
> page->pp_magic being PP_SIGNATURE, calling skb_clone(),
> pskb_expand_head() or skb_try_coalesce() may end up calling
> __skb_frag_ref() for the frag page, which means a page with
> page->pp_magic being PP_SIGNATURE can be both a PP page and a non-PP
> page at the same time. So it is important to set ->pp_recycle
> correctly, and past experience shows that is hard to get right; that's
> why a per-page marker is suggested.

Oh well, I didn't know that :s
Thanks for the expl.

[...]

Olek

Patch

diff --git a/net/core/xdp.c b/net/core/xdp.c
index 8c92fc553317..a2237cfca8e9 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -658,8 +658,8 @@  struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 	 * - RX ring dev queue index	(skb_record_rx_queue)
 	 */
 
-	/* Until page_pool get SKB return path, release DMA here */
-	xdp_release_frame(xdpf);
+	if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
+		skb_mark_for_recycle(skb);
 
 	/* Allow SKB to reuse area used by xdp_frame */
 	xdp_scrub_frame(xdpf);