| Message ID | 20230301160315.1022488-2-aleksander.lobakin@intel.com (mailing list archive) |
|---|---|
| State | Superseded |
| Delegated to | BPF |
| Series | xdp: recycle Page Pool backed skbs built from XDP frames |
Hi Alexander,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Alexander-Lobakin/xdp-recycle-Page-Pool-backed-skbs-built-from-XDP-frames/20230302-000635
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20230301160315.1022488-2-aleksander.lobakin%40intel.com
patch subject: [PATCH bpf-next v1 1/2] xdp: recycle Page Pool backed skbs built from XDP frames
config: i386-randconfig-a001 (https://download.01.org/0day-ci/archive/20230302/202303020331.PSFMFbXw-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/a5ca5578e9bd35220be091fd02df96d492120ee3
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Alexander-Lobakin/xdp-recycle-Page-Pool-backed-skbs-built-from-XDP-frames/20230302-000635
        git checkout a5ca5578e9bd35220be091fd02df96d492120ee3
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=i386 olddefconfig
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add the following tag where applicable:
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303020331.PSFMFbXw-lkp@intel.com/

All errors (new ones prefixed by >>):

   net/core/xdp.c: In function '__xdp_build_skb_from_frame':
>> net/core/xdp.c:662:17: error: implicit declaration of function 'skb_mark_for_recycle' [-Werror=implicit-function-declaration]
     662 |                 skb_mark_for_recycle(skb);
         |                 ^~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/skb_mark_for_recycle +662 net/core/xdp.c

   614
   615	struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
   616						   struct sk_buff *skb,
   617						   struct net_device *dev)
   618	{
   619		struct skb_shared_info *sinfo = xdp_get_shared_info_from_frame(xdpf);
   620		unsigned int headroom, frame_size;
   621		void *hard_start;
   622		u8 nr_frags;
   623
   624		/* xdp frags frame */
   625		if (unlikely(xdp_frame_has_frags(xdpf)))
   626			nr_frags = sinfo->nr_frags;
   627
   628		/* Part of headroom was reserved to xdpf */
   629		headroom = sizeof(*xdpf) + xdpf->headroom;
   630
   631		/* Memory size backing xdp_frame data already have reserved
   632		 * room for build_skb to place skb_shared_info in tailroom.
   633		 */
   634		frame_size = xdpf->frame_sz;
   635
   636		hard_start = xdpf->data - headroom;
   637		skb = build_skb_around(skb, hard_start, frame_size);
   638		if (unlikely(!skb))
   639			return NULL;
   640
   641		skb_reserve(skb, headroom);
   642		__skb_put(skb, xdpf->len);
   643		if (xdpf->metasize)
   644			skb_metadata_set(skb, xdpf->metasize);
   645
   646		if (unlikely(xdp_frame_has_frags(xdpf)))
   647			xdp_update_skb_shared_info(skb, nr_frags,
   648						   sinfo->xdp_frags_size,
   649						   nr_frags * xdpf->frame_sz,
   650						   xdp_frame_is_frag_pfmemalloc(xdpf));
   651
   652		/* Essential SKB info: protocol and skb->dev */
   653		skb->protocol = eth_type_trans(skb, dev);
   654
   655		/* Optional SKB info, currently missing:
   656		 * - HW checksum info (skb->ip_summed)
   657		 * - HW RX hash (skb_set_hash)
   658		 * - RX ring dev queue index (skb_record_rx_queue)
   659		 */
   660
   661		if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
 > 662			skb_mark_for_recycle(skb);
   663
   664		/* Allow SKB to reuse area used by xdp_frame */
   665		xdp_scrub_frame(xdpf);
   666
   667		return skb;
   668	}
   669	EXPORT_SYMBOL_GPL(__xdp_build_skb_from_frame);
   670
Hi Alexander,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Alexander-Lobakin/xdp-recycle-Page-Pool-backed-skbs-built-from-XDP-frames/20230302-000635
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20230301160315.1022488-2-aleksander.lobakin%40intel.com
patch subject: [PATCH bpf-next v1 1/2] xdp: recycle Page Pool backed skbs built from XDP frames
config: x86_64-randconfig-a014 (https://download.01.org/0day-ci/archive/20230302/202303020342.Wi2PRFFH-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/a5ca5578e9bd35220be091fd02df96d492120ee3
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Alexander-Lobakin/xdp-recycle-Page-Pool-backed-skbs-built-from-XDP-frames/20230302-000635
        git checkout a5ca5578e9bd35220be091fd02df96d492120ee3
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add the following tag where applicable:
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303020342.Wi2PRFFH-lkp@intel.com/

All errors (new ones prefixed by >>):

>> net/core/xdp.c:662:3: error: implicit declaration of function 'skb_mark_for_recycle' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
                   skb_mark_for_recycle(skb);
                   ^
   1 error generated.
vim +/skb_mark_for_recycle +662 net/core/xdp.c
On 2023/3/2 0:03, Alexander Lobakin wrote:
> __xdp_build_skb_from_frame() state(d):
>
>     /* Until page_pool get SKB return path, release DMA here */
>
> Page Pool got skb pages recycling in April 2021, but missed this
> function.
>
> xdp_release_frame() is relevant only for Page Pool backed frames and it
> detaches the page from the corresponding Pool in order to make it
> freeable via page_frag_free(). It can instead just mark the output skb
> as eligible for recycling if the frame is backed by a PP. No change for
> other memory model types (the same condition check as before).
> cpumap redirect and veth on Page Pool drivers now become zero-alloc (or
> almost).
>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
>  net/core/xdp.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 8c92fc553317..a2237cfca8e9 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -658,8 +658,8 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
>  	 * - RX ring dev queue index (skb_record_rx_queue)
>  	 */
>
> -	/* Until page_pool get SKB return path, release DMA here */
> -	xdp_release_frame(xdpf);
> +	if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
> +		skb_mark_for_recycle(skb);

We rely on both skb->pp_recycle and page->pp_magic to decide whether
a page is really from a page pool. So there were a few corner case
problems when we were sharing a page between different skbs at the
driver level or calling skb_clone() or skb_try_coalesce(), see:

https://github.com/torvalds/linux/commit/2cc3aeb5ecccec0d266813172fcd82b4b5fa5803
https://lore.kernel.org/netdev/MW5PR15MB51214C0513DB08A3607FBC1FBDE19@MW5PR15MB5121.namprd15.prod.outlook.com/t/
https://lore.kernel.org/netdev/167475990764.1934330.11960904198087757911.stgit@localhost.localdomain/

'struct xdp_frame' also uses the sharable 'struct skb_shared_info',
see xdp_get_shared_info_from_frame().

For now xdpf_clone() does not seem to handle frag pages yet,
so it should be fine.

IMHO we should find a way to use a per-page marker, instead of both
per-skb and per-page markers, in order to avoid the above problem
for xdp if xdp gets processing similar to an skb's, as suggested by Eric:

https://lore.kernel.org/netdev/CANn89iKgZU4Q+THXupzZi4hETuKuCOvOB=iHpp5JzQTNv_Fg_A@mail.gmail.com/

>
>  	/* Allow SKB to reuse area used by xdp_frame */
>  	xdp_scrub_frame(xdpf);
>
On 02/03/2023 03.30, Yunsheng Lin wrote:
> On 2023/3/2 0:03, Alexander Lobakin wrote:
>> __xdp_build_skb_from_frame() state(d):
>>
>>     /* Until page_pool get SKB return path, release DMA here */
>>
>> Page Pool got skb pages recycling in April 2021, but missed this
>> function.
>>
>> xdp_release_frame() is relevant only for Page Pool backed frames and it
>> detaches the page from the corresponding Pool in order to make it
>> freeable via page_frag_free(). It can instead just mark the output skb
>> as eligible for recycling if the frame is backed by a PP. No change for
>> other memory model types (the same condition check as before).
>> cpumap redirect and veth on Page Pool drivers now become zero-alloc (or
>> almost).
>>
>> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
>> ---
>>  net/core/xdp.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/core/xdp.c b/net/core/xdp.c
>> index 8c92fc553317..a2237cfca8e9 100644
>> --- a/net/core/xdp.c
>> +++ b/net/core/xdp.c
>> @@ -658,8 +658,8 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
>>  	 * - RX ring dev queue index (skb_record_rx_queue)
>>  	 */
>>
>> -	/* Until page_pool get SKB return path, release DMA here */
>> -	xdp_release_frame(xdpf);
>> +	if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
>> +		skb_mark_for_recycle(skb);
>
> We rely on both skb->pp_recycle and page->pp_magic to decide whether
> a page is really from a page pool. So there were a few corner case
> problems when we were sharing a page between different skbs at the
> driver level or calling skb_clone() or skb_try_coalesce(), see:
>
> https://github.com/torvalds/linux/commit/2cc3aeb5ecccec0d266813172fcd82b4b5fa5803
> https://lore.kernel.org/netdev/MW5PR15MB51214C0513DB08A3607FBC1FBDE19@MW5PR15MB5121.namprd15.prod.outlook.com/t/
> https://lore.kernel.org/netdev/167475990764.1934330.11960904198087757911.stgit@localhost.localdomain/
>
> 'struct xdp_frame' also uses the sharable 'struct skb_shared_info',
> see xdp_get_shared_info_from_frame().
>
> For now xdpf_clone() does not seem to handle frag pages yet,
> so it should be fine.
>
> IMHO we should find a way to use a per-page marker, instead of both
> per-skb and per-page markers, in order to avoid the above problem
> for xdp if xdp gets processing similar to an skb's, as suggested by Eric.

Moving to a per-page marker can be *more* expensive if the struct-page
memory isn't cache-hot. So, if struct-page is accessed anyhow, then sure,
we can move it to a per-page marker.

> https://lore.kernel.org/netdev/CANn89iKgZU4Q+THXupzZi4hETuKuCOvOB=iHpp5JzQTNv_Fg_A@mail.gmail.com/
>
>>
>>  	/* Allow SKB to reuse area used by xdp_frame */
>>  	xdp_scrub_frame(xdpf);
>>
From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Thu, 2 Mar 2023 10:30:13 +0800

> On 2023/3/2 0:03, Alexander Lobakin wrote:
>> __xdp_build_skb_from_frame() state(d):
>>
>>     /* Until page_pool get SKB return path, release DMA here */
>>
>> Page Pool got skb pages recycling in April 2021, but missed this
>> function.

[...]

> We rely on both skb->pp_recycle and page->pp_magic to decide whether
> a page is really from a page pool. So there were a few corner case
> problems when we were sharing a page between different skbs at the
> driver level or calling skb_clone() or skb_try_coalesce(), see:
>
> https://github.com/torvalds/linux/commit/2cc3aeb5ecccec0d266813172fcd82b4b5fa5803
> https://lore.kernel.org/netdev/MW5PR15MB51214C0513DB08A3607FBC1FBDE19@MW5PR15MB5121.namprd15.prod.outlook.com/t/
> https://lore.kernel.org/netdev/167475990764.1934330.11960904198087757911.stgit@localhost.localdomain/

And they are fixed :D
No drivers which currently use Page Pool mix PP pages with non-PP. And
it's impossible to trigger try_coalesce() or so at least on the cpumap
path, since we're only creating skbs at that moment; they don't come
from anywhere else.

> 'struct xdp_frame' also uses the sharable 'struct skb_shared_info',
> see xdp_get_shared_info_from_frame().
>
> For now xdpf_clone() does not seem to handle frag pages yet,
> so it should be fine.

xdpf_clone() clones a frame to a new full page and doesn't copy its
skb_shared_info.

> IMHO we should find a way to use a per-page marker, instead of both
> per-skb and per-page markers, in order to avoid the above problem
> for xdp if xdp gets processing similar to an skb's, as suggested by Eric.
>
> https://lore.kernel.org/netdev/CANn89iKgZU4Q+THXupzZi4hETuKuCOvOB=iHpp5JzQTNv_Fg_A@mail.gmail.com/

As Jesper already pointed out, not having a quick way to check whether
we have to check ::pp_magic at all can decrease performance. So it's
rather a shortcut.

>>
>>  	/* Allow SKB to reuse area used by xdp_frame */
>>  	xdp_scrub_frame(xdpf);
>>

Thanks,
Olek
On 2023/3/3 19:22, Alexander Lobakin wrote:
> From: Yunsheng Lin <linyunsheng@huawei.com>
> Date: Thu, 2 Mar 2023 10:30:13 +0800
>
>> On 2023/3/2 0:03, Alexander Lobakin wrote:
>>> __xdp_build_skb_from_frame() state(d):
>>>
>>>     /* Until page_pool get SKB return path, release DMA here */
>>>
>>> Page Pool got skb pages recycling in April 2021, but missed this
>>> function.
>
> [...]
>
>> We rely on both skb->pp_recycle and page->pp_magic to decide whether
>> a page is really from a page pool. So there were a few corner case
>> problems when we were sharing a page between different skbs at the
>> driver level or calling skb_clone() or skb_try_coalesce(), see:
>>
>> https://github.com/torvalds/linux/commit/2cc3aeb5ecccec0d266813172fcd82b4b5fa5803
>> https://lore.kernel.org/netdev/MW5PR15MB51214C0513DB08A3607FBC1FBDE19@MW5PR15MB5121.namprd15.prod.outlook.com/t/
>> https://lore.kernel.org/netdev/167475990764.1934330.11960904198087757911.stgit@localhost.localdomain/
>
> And they are fixed :D
> No drivers which currently use Page Pool mix PP pages with non-PP. And

The wireless adapter which uses Page Pool *does* mix PP pages with
non-PP, see the discussion below:

https://lore.kernel.org/netdev/156f3e120bd0757133cb6bc11b76889637b5e0a6.camel@gmail.com/

> it's impossible to trigger try_coalesce() or so at least on the cpumap
> path, since we're only creating skbs at that moment; they don't come
> from anywhere else.
>
>> 'struct xdp_frame' also uses the sharable 'struct skb_shared_info',
>> see xdp_get_shared_info_from_frame().
>>
>> For now xdpf_clone() does not seem to handle frag pages yet,
>> so it should be fine.
>
> xdpf_clone() clones a frame to a new full page and doesn't copy its
> skb_shared_info.
>
>> IMHO we should find a way to use a per-page marker, instead of both
>> per-skb and per-page markers, in order to avoid the above problem
>> for xdp if xdp gets processing similar to an skb's, as suggested by Eric.
>>
>> https://lore.kernel.org/netdev/CANn89iKgZU4Q+THXupzZi4hETuKuCOvOB=iHpp5JzQTNv_Fg_A@mail.gmail.com/
>
> As Jesper already pointed out, not having a quick way to check whether
> we have to check ::pp_magic at all can decrease performance. So it's
> rather a shortcut.

When we are freeing a page by updating the _refcount, I think we are
already touching the cache of ::pp_magic.

Anyway, I am not sure checking ::pp_magic is correct when a page is
passed between different subsystems and eventually back to the network
stack; checking ::pp_magic may not be correct if that happens.

Another way is to use the bottom two bits in bv_page, see:
https://www.spinics.net/lists/netdev/msg874099.html

>
>>
>>>
>>>  	/* Allow SKB to reuse area used by xdp_frame */
>>>  	xdp_scrub_frame(xdpf);
>>>
>
> Thanks,
> Olek
From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Fri, 3 Mar 2023 20:44:24 +0800

> On 2023/3/3 19:22, Alexander Lobakin wrote:
>> From: Yunsheng Lin <linyunsheng@huawei.com>
>> Date: Thu, 2 Mar 2023 10:30:13 +0800

[...]

>> And they are fixed :D
>> No drivers which currently use Page Pool mix PP pages with non-PP. And
>
> The wireless adapter which uses Page Pool *does* mix PP pages with
> non-PP, see the discussion below:
>
> https://lore.kernel.org/netdev/156f3e120bd0757133cb6bc11b76889637b5e0a6.camel@gmail.com/

Ah right, I remember that (also was fixed).
Not that I think it is correct to mix them -- from my PoV, a driver
should either give *all* its Rx buffers as PP-backed or not use PP at all.

[...]

>> As Jesper already pointed out, not having a quick way to check whether
>> we have to check ::pp_magic at all can decrease performance. So it's
>> rather a shortcut.
>
> When we are freeing a page by updating the _refcount, I think we are
> already touching the cache of ::pp_magic.

But no page freeing happens before checking for skb->pp_recycle, neither
in skb_pp_recycle() (skb_free_head() etc.)[0] nor in skb_frag_unref()[1].

> Anyway, I am not sure checking ::pp_magic is correct when a page is
> passed between different subsystems and eventually back to the network
> stack; checking ::pp_magic may not be correct if that happens.
>
> Another way is to use the bottom two bits in bv_page, see:
> https://www.spinics.net/lists/netdev/msg874099.html
>
>>
>>>
>>>>
>>>>  	/* Allow SKB to reuse area used by xdp_frame */
>>>>  	xdp_scrub_frame(xdpf);
>>>>
>>
>> Thanks,
>> Olek

[0] https://elixir.bootlin.com/linux/latest/source/net/core/skbuff.c#L808
[1] https://elixir.bootlin.com/linux/latest/source/include/linux/skbuff.h#L3385

Thanks,
Olek
On 2023/3/3 21:26, Alexander Lobakin wrote:
> From: Yunsheng Lin <linyunsheng@huawei.com>
> Date: Fri, 3 Mar 2023 20:44:24 +0800
>
>> On 2023/3/3 19:22, Alexander Lobakin wrote:
>>> From: Yunsheng Lin <linyunsheng@huawei.com>
>>> Date: Thu, 2 Mar 2023 10:30:13 +0800
>
> [...]
>
>>> And they are fixed :D
>>> No drivers which currently use Page Pool mix PP pages with non-PP. And
>>
>> The wireless adapter which uses Page Pool *does* mix PP pages with
>> non-PP, see the discussion below:
>>
>> https://lore.kernel.org/netdev/156f3e120bd0757133cb6bc11b76889637b5e0a6.camel@gmail.com/
>
> Ah right, I remember that (also was fixed).
> Not that I think it is correct to mix them -- from my PoV, a driver
> should either give *all* its Rx buffers as PP-backed or not use PP at all.
>
> [...]
>
>>> As Jesper already pointed out, not having a quick way to check whether
>>> we have to check ::pp_magic at all can decrease performance. So it's
>>> rather a shortcut.
>>
>> When we are freeing a page by updating the _refcount, I think we are
>> already touching the cache of ::pp_magic.
>
> But no page freeing happens before checking for skb->pp_recycle, neither
> in skb_pp_recycle() (skb_free_head() etc.)[0] nor in skb_frag_unref()[1].

If we move to a per-page marker, we probably do not need to check
skb->pp_recycle.

Note that both page_pool_return_skb_page() and skb_free_frag() can reuse
the cache line pulled in by the per-page marker check, if the per-page
marker is in 'struct page'.

>
>> Anyway, I am not sure checking ::pp_magic is correct when a page is
>> passed between different subsystems and eventually back to the network
>> stack; checking ::pp_magic may not be correct if that happens.
>>
>> Another way is to use the bottom two bits in bv_page, see:
>> https://www.spinics.net/lists/netdev/msg874099.html
>>
>>>
>>>>
>>>>>
>>>>>  	/* Allow SKB to reuse area used by xdp_frame */
>>>>>  	xdp_scrub_frame(xdpf);
>>>>>
>>>
>>> Thanks,
>>> Olek
>
> [0] https://elixir.bootlin.com/linux/latest/source/net/core/skbuff.c#L808
> [1] https://elixir.bootlin.com/linux/latest/source/include/linux/skbuff.h#L3385
>
> Thanks,
> Olek
From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Mon, 6 Mar 2023 09:09:31 +0800

> On 2023/3/3 21:26, Alexander Lobakin wrote:
>> From: Yunsheng Lin <linyunsheng@huawei.com>
>> Date: Fri, 3 Mar 2023 20:44:24 +0800
>>
>>> On 2023/3/3 19:22, Alexander Lobakin wrote:
>>>> From: Yunsheng Lin <linyunsheng@huawei.com>
>>>> Date: Thu, 2 Mar 2023 10:30:13 +0800
>>
>> [...]
>>
>>>> And they are fixed :D
>>>> No drivers which currently use Page Pool mix PP pages with non-PP. And
>>>
>>> The wireless adapter which uses Page Pool *does* mix PP pages with
>>> non-PP, see the discussion below:
>>>
>>> https://lore.kernel.org/netdev/156f3e120bd0757133cb6bc11b76889637b5e0a6.camel@gmail.com/
>>
>> Ah right, I remember that (also was fixed).
>> Not that I think it is correct to mix them -- from my PoV, a driver
>> should either give *all* its Rx buffers as PP-backed or not use PP at all.
>>
>> [...]
>>
>>>> As Jesper already pointed out, not having a quick way to check whether
>>>> we have to check ::pp_magic at all can decrease performance. So it's
>>>> rather a shortcut.
>>>
>>> When we are freeing a page by updating the _refcount, I think we are
>>> already touching the cache of ::pp_magic.
>>
>> But no page freeing happens before checking for skb->pp_recycle, neither
>> in skb_pp_recycle() (skb_free_head() etc.)[0] nor in skb_frag_unref()[1].
>
> If we move to a per-page marker, we probably do not need to check
> skb->pp_recycle.
>
> Note that both page_pool_return_skb_page() and skb_free_frag() can reuse
> the cache line pulled in by the per-page marker check, if the per-page
> marker is in 'struct page'.

Ah, from that perspective. Yes, you're probably right, but it would need
to be tested anyway. I don't see any open problems with the PP recycling
on the lists right now, but someone may try to change it one day.
Anyway, this flag is only to do a quick test. We do have
sk_buff::pfmemalloc, but this flag doesn't mean every page from this skb
was pfmemalloced.

>
>>
>>> Anyway, I am not sure checking ::pp_magic is correct when a page is
>>> passed between different subsystems and eventually back to the network
>>> stack; checking ::pp_magic may not be correct if that happens.
>>>
>>> Another way is to use the bottom two bits in bv_page, see:
>>> https://www.spinics.net/lists/netdev/msg874099.html
>>
>> [0] https://elixir.bootlin.com/linux/latest/source/net/core/skbuff.c#L808
>> [1] https://elixir.bootlin.com/linux/latest/source/include/linux/skbuff.h#L3385

Thanks,
Olek
On 2023/3/6 19:58, Alexander Lobakin wrote:
> From: Yunsheng Lin <linyunsheng@huawei.com>
> Date: Mon, 6 Mar 2023 09:09:31 +0800
>
>> On 2023/3/3 21:26, Alexander Lobakin wrote:
>>> From: Yunsheng Lin <linyunsheng@huawei.com>
>>> Date: Fri, 3 Mar 2023 20:44:24 +0800
>
> [...]
>
>>>> The wireless adapter which uses Page Pool *does* mix PP pages with
>>>> non-PP, see the discussion below:
>>>>
>>>> https://lore.kernel.org/netdev/156f3e120bd0757133cb6bc11b76889637b5e0a6.camel@gmail.com/
>>>
>>> Ah right, I remember that (also was fixed).
>>> Not that I think it is correct to mix them -- from my PoV, a driver
>>> should either give *all* its Rx buffers as PP-backed or not use PP at all.
>>
>> If we move to a per-page marker, we probably do not need to check
>> skb->pp_recycle.
>>
>> Note that both page_pool_return_skb_page() and skb_free_frag() can reuse
>> the cache line pulled in by the per-page marker check, if the per-page
>> marker is in 'struct page'.
>
> Ah, from that perspective. Yes, you're probably right, but it would need
> to be tested anyway. I don't see any open problems with the PP recycling
> on the lists right now, but someone may try to change it one day.
> Anyway, this flag is only to do a quick test. We do have
> sk_buff::pfmemalloc, but this flag doesn't mean every page from this skb
> was pfmemalloced.

The point seems to be that sk_buff::pfmemalloc allows false positives:
skb->pfmemalloc can be set to true while no page from this skb is
actually pfmemalloced, as you mentioned.

While skb->pp_recycle can't allow a false positive: if that happens,
reference counting of the page will not be handled properly when pp and
non-pp skbs share a page, as the wireless adapter does.

>
>>>> Anyway, I am not sure checking ::pp_magic is correct when a page is
>>>> passed between different subsystems and eventually back to the network
>>>> stack; checking ::pp_magic may not be correct if that happens.
>>>>
>>>> Another way is to use the bottom two bits in bv_page, see:
>>>> https://www.spinics.net/lists/netdev/msg874099.html
>>>
>>> [0] https://elixir.bootlin.com/linux/latest/source/net/core/skbuff.c#L808
>>> [1] https://elixir.bootlin.com/linux/latest/source/include/linux/skbuff.h#L3385
>
> Thanks,
> Olek
From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Tue, 7 Mar 2023 10:50:34 +0800

> On 2023/3/6 19:58, Alexander Lobakin wrote:
>> From: Yunsheng Lin <linyunsheng@huawei.com>
>> Date: Mon, 6 Mar 2023 09:09:31 +0800

[...]

>> Ah, from that perspective. Yes, you're probably right, but it would need
>> to be tested anyway. I don't see any open problems with the PP recycling
>> on the lists right now, but someone may try to change it one day.
>> Anyway, this flag is only to do a quick test. We do have
>> sk_buff::pfmemalloc, but this flag doesn't mean every page from this skb
>> was pfmemalloced.
>
> The point seems to be that sk_buff::pfmemalloc allows false positives:
> skb->pfmemalloc can be set to true while no page from this skb is
> actually pfmemalloced, as you mentioned.
>
> While skb->pp_recycle can't allow a false positive: if that happens,
> reference counting of the page will not be handled properly when pp and
> non-pp skbs share a page, as the wireless adapter does.

You mean false positives in both directions? Because if ->pp_recycle is
set, the stack can still free non-PP pages. In the opposite case, I mean
when ->pp_recycle is false and an skb page belongs to a page_pool, yes,
there'll be issues.

But I think the deal is to propagate the flag when you want to attach a
PP-backed page to the skb? I mean, if someone decides to mix pages with
different memory models, it's his responsibility to make sure everything
is fine, because it's not a common/intended way. Isn't it?

>
>>>> Anyway, I am not sure checking ::pp_magic is correct when a page is
>>>> passed between different subsystems and eventually back to the network
>>>> stack; checking ::pp_magic may not be correct if that happens.
>>>>
>>>> Another way is to use the bottom two bits in bv_page, see:
>>>> https://www.spinics.net/lists/netdev/msg874099.html

This one is interesting actually. We'd only need one bit -- which is
100% free and available in case of page pointers.

>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>  	/* Allow SKB to reuse area used by xdp_frame */
>>>>>>>  	xdp_scrub_frame(xdpf);

[...]

Thanks,
Olek
On 2023/3/8 2:14, Alexander Lobakin wrote:
> From: Yunsheng Lin <linyunsheng@huawei.com>
> Date: Tue, 7 Mar 2023 10:50:34 +0800
>
>> On 2023/3/6 19:58, Alexander Lobakin wrote:
>>> From: Yunsheng Lin <linyunsheng@huawei.com>
>>> Date: Mon, 6 Mar 2023 09:09:31 +0800
>
> [...]
>
>>> Ah, from that perspective. Yes, you're probably right, but it would need
>>> to be tested anyway. I don't see any open problems with the PP recycling
>>> on the lists right now, but someone may try to change it one day.
>>> Anyway, this flag is only to do a quick test. We do have
>>> sk_buff::pfmemalloc, but this flag doesn't mean every page from this skb
>>> was pfmemalloced.
>>
>> The point seems to be that sk_buff::pfmemalloc allows false positives:
>> skb->pfmemalloc can be set to true while no page from this skb is
>> actually pfmemalloced, as you mentioned.
>>
>> While skb->pp_recycle can't allow a false positive: if that happens,
>> reference counting of the page will not be handled properly when pp and
>> non-pp skbs share a page, as the wireless adapter does.
>
> You mean false positives in both directions? Because if ->pp_recycle is
> set, the stack can still free non-PP pages. In the opposite case, I mean
> when ->pp_recycle is false and an skb page belongs to a page_pool, yes,
> there'll be issues.

That may depend on what is a PP page and what is a non-PP page; it seems
hard to answer now.

For an skb with ->pp_recycle being true and its frag page with
page->pp_magic being PP_SIGNATURE, when calling
skb_clone()/pskb_expand_head() or skb_try_coalesce(), we may call
__skb_frag_ref() for the frag page, which means a page with
page->pp_magic being PP_SIGNATURE can be both a PP page and a non-PP
page at the same time. So it is important to set ->pp_recycle correctly,
and judging from past experience it seems hard to get that right; that's
why a per-page marker is suggested.

> But I think the deal is to propagate the flag when you want to attach a
> PP-backed page to the skb? I mean, if someone decides to mix pages with
> different memory models, it's his responsibility to make sure everything
> is fine, because it's not a common/intended way. Isn't it?
>
>>>>>> Anyway, I am not sure checking ::pp_magic is correct when a page is
>>>>>> passed between different subsystems and eventually back to the network
>>>>>> stack; checking ::pp_magic may not be correct if that happens.
>>>>>>
>>>>>> Another way is to use the bottom two bits in bv_page, see:
>>>>>> https://www.spinics.net/lists/netdev/msg874099.html
>
> This one is interesting actually. We'd only need one bit -- which is
> 100% free and available in case of page pointers.
>
>>>>>>>>>
>>>>>>>>>  	/* Allow SKB to reuse area used by xdp_frame */
>>>>>>>>>  	xdp_scrub_frame(xdpf);
>
> [...]
>
> Thanks,
> Olek
From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Wed, 8 Mar 2023 14:27:13 +0800

> On 2023/3/8 2:14, Alexander Lobakin wrote:
>> From: Yunsheng Lin <linyunsheng@huawei.com>
>> Date: Tue, 7 Mar 2023 10:50:34 +0800

[...]

>> You mean false positives in both directions? Because if ->pp_recycle is
>> set, the stack can still free non-PP pages. In the opposite case, I mean
>> when ->pp_recycle is false and an skb page belongs to a page_pool, yes,
>> there'll be issues.
>
> That may depend on what is a PP page and what is a non-PP page; it seems
> hard to answer now.
>
> For an skb with ->pp_recycle being true and its frag page with
> page->pp_magic being PP_SIGNATURE, when calling
> skb_clone()/pskb_expand_head() or skb_try_coalesce(), we may call
> __skb_frag_ref() for the frag page, which means a page with
> page->pp_magic being PP_SIGNATURE can be both a PP page and a non-PP
> page at the same time. So it is important to set ->pp_recycle correctly,
> and judging from past experience it seems hard to get that right; that's
> why a per-page marker is suggested.

Oh well, I didn't know that :s
Thanks for the expl.

[...]

Olek
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 8c92fc553317..a2237cfca8e9 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -658,8 +658,8 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 	 * - RX ring dev queue index (skb_record_rx_queue)
 	 */
 
-	/* Until page_pool get SKB return path, release DMA here */
-	xdp_release_frame(xdpf);
+	if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
+		skb_mark_for_recycle(skb);
 
 	/* Allow SKB to reuse area used by xdp_frame */
 	xdp_scrub_frame(xdpf);
__xdp_build_skb_from_frame() state(d):

    /* Until page_pool get SKB return path, release DMA here */

Page Pool got skb pages recycling in April 2021, but missed this
function.

xdp_release_frame() is relevant only for Page Pool backed frames and it
detaches the page from the corresponding Pool in order to make it
freeable via page_frag_free(). It can instead just mark the output skb
as eligible for recycling if the frame is backed by a PP. No change for
other memory model types (the same condition check as before).
cpumap redirect and veth on Page Pool drivers now become zero-alloc (or
almost).

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 net/core/xdp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)