
[v3,00/26] Split netmem from struct page

Message ID: 20230111042214.907030-1-willy@infradead.org

Message

Matthew Wilcox Jan. 11, 2023, 4:21 a.m. UTC
The MM subsystem is trying to reduce struct page to a single pointer.
The first step towards that is splitting struct page by its individual
users, as has already been done with folio and slab.  This patchset does
that for netmem which is used for page pools.

There are some relatively significant reductions in kernel text size
from these changes.  They don't appear to affect performance at all,
but it's nice to save a bit of memory.
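For anyone who has not followed the earlier versions: the new type is
essentially the page-pool arm of struct page's union given its own name,
with typed helpers on top.  A rough sketch of its shape is below; the
field names mirror the existing page-pool members of struct page, the
field order is not offset-exact, and the cast helper is illustrative
only (the authoritative definitions are in patches 1 and 2):

/* Sketch only: mirrors the page_pool members currently embedded in
 * struct page.  The real type also carries flags and _refcount at the
 * offsets struct page requires.
 */
struct netmem {
        unsigned long pp_magic;
        struct page_pool *pp;
        unsigned long _pp_mapping_pad;
        unsigned long dma_addr;
        atomic_long_t pp_frag_count;
};

/* Illustrative converter: today a netmem is still backed by a page. */
static inline struct netmem *page_netmem(struct page *page)
{
        return (struct netmem *)page;
}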

v3:
 - Rebase to next-20230110
 - Add received Acked-by and Reviewed-by tags (thanks!)
 - Mark compat functions in page_pool.h (Ilias)
 - Correct a patch title
 - Convert hns3 driver (and page_pool_dev_alloc_frag())
 - Make page_pool_recycle_direct() accept a netmem or page pointer
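That last change works by dispatching on the pointer type at the call
site.  One way to do it is C11 _Generic; this is a sketch of the idea
only, and both page_netmem() and __page_pool_recycle_direct() are
illustrative names here, not necessarily what the patch uses:

/* Accept either a struct page * or a struct netmem * and normalise to
 * netmem before calling the underlying helper. */
#define page_pool_recycle_direct(pool, mem)                             \
        __page_pool_recycle_direct(pool, _Generic((mem),                \
                struct page *:   page_netmem((struct page *)(mem)),     \
                struct netmem *: (struct netmem *)(mem)))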

Matthew Wilcox (Oracle) (26):
  netmem: Create new type
  netmem: Add utility functions
  page_pool: Add netmem_set_dma_addr() and netmem_get_dma_addr()
  page_pool: Convert page_pool_release_page() to
    page_pool_release_netmem()
  page_pool: Start using netmem in allocation path.
  page_pool: Convert page_pool_return_page() to
    page_pool_return_netmem()
  page_pool: Convert __page_pool_put_page() to __page_pool_put_netmem()
  page_pool: Convert pp_alloc_cache to contain netmem
  page_pool: Convert page_pool_defrag_page() to
    page_pool_defrag_netmem()
  page_pool: Convert page_pool_put_defragged_page() to netmem
  page_pool: Convert page_pool_empty_ring() to use netmem
  page_pool: Convert page_pool_alloc_pages() to page_pool_alloc_netmem()
  page_pool: Convert page_pool_dma_sync_for_device() to take a netmem
  page_pool: Convert page_pool_recycle_in_cache() to netmem
  page_pool: Remove __page_pool_put_page()
  page_pool: Use netmem in page_pool_drain_frag()
  page_pool: Convert page_pool_return_skb_page() to use netmem
  page_pool: Allow page_pool_recycle_direct() to take a netmem or a page
  page_pool: Convert frag_page to frag_nmem
  xdp: Convert to netmem
  mm: Remove page pool members from struct page
  page_pool: Pass a netmem to init_callback()
  net: Add support for netmem in skb_frag
  mvneta: Convert to netmem
  mlx5: Convert to netmem
  hns3: Convert to netmem

 Documentation/networking/page_pool.rst        |   5 +
 .../net/ethernet/hisilicon/hns3/hns3_enet.c   |  16 +-
 drivers/net/ethernet/marvell/mvneta.c         |  48 +--
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  10 +-
 .../net/ethernet/mellanox/mlx5/core/en/txrx.h |   4 +-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  24 +-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.h  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  12 +-
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 130 +++++----
 include/linux/mm_types.h                      |  22 --
 include/linux/skbuff.h                        |  11 +
 include/net/page_pool.h                       | 228 ++++++++++++---
 include/trace/events/page_pool.h              |  28 +-
 net/bpf/test_run.c                            |   4 +-
 net/core/page_pool.c                          | 274 +++++++++---------
 net/core/xdp.c                                |   7 +-
 16 files changed, 493 insertions(+), 332 deletions(-)

Comments

Yunsheng Lin Jan. 11, 2023, 8:25 a.m. UTC | #1
On 2023/1/11 12:21, Matthew Wilcox (Oracle) wrote:
> The MM subsystem is trying to reduce struct page to a single pointer.
> The first step towards that is splitting struct page by its individual
> users, as has already been done with folio and slab.  This patchset does
> that for netmem which is used for page pools.

As the page pool is only used for the rx side in the net stack, depending on
the driver, a lot more memory for the net stack comes from
page_frag_alloc_align(), kmem caches, etc.
Naming it netmem seems a little overkill; perhaps a more specific name for
the page pool, such as pp_cache?

@Jesper & Ilias
Any better idea?
And it seems some APIs may need changing too, as we are not pooling 'pages'
now.

> 
> There are some relatively significant reductions in kernel text size
> from these changes.  They don't appear to affect performance at all,
> but it's nice to save a bit of memory.
> 
> v3:
>  - Rebase to next-20230110
>  - Add received Acked-by and Reviewed-by tags (thanks!)
>  - Mark compat functions in page_pool.h (Ilias)
>  - Correct a patch title
>  - Convert hns3 driver (and page_pool_dev_alloc_frag())
>  - Make page_pool_recycle_direct() accept a netmem or page pointer
> 
> Matthew Wilcox (Oracle) (26):
>   netmem: Create new type
>   netmem: Add utility functions
>   page_pool: Add netmem_set_dma_addr() and netmem_get_dma_addr()
>   page_pool: Convert page_pool_release_page() to
>     page_pool_release_netmem()
>   page_pool: Start using netmem in allocation path.
                                                    ^
nit: there is a '.' at the end of the patch title.

>   page_pool: Convert page_pool_return_page() to
>     page_pool_return_netmem()
>   page_pool: Convert __page_pool_put_page() to __page_pool_put_netmem()
>   page_pool: Convert pp_alloc_cache to contain netmem
>   page_pool: Convert page_pool_defrag_page() to
>     page_pool_defrag_netmem()
>   page_pool: Convert page_pool_put_defragged_page() to netmem
>   page_pool: Convert page_pool_empty_ring() to use netmem
>   page_pool: Convert page_pool_alloc_pages() to page_pool_alloc_netmem()
>   page_pool: Convert page_pool_dma_sync_for_device() to take a netmem
>   page_pool: Convert page_pool_recycle_in_cache() to netmem
>   page_pool: Remove __page_pool_put_page()
>   page_pool: Use netmem in page_pool_drain_frag()
>   page_pool: Convert page_pool_return_skb_page() to use netmem
>   page_pool: Allow page_pool_recycle_direct() to take a netmem or a page
>   page_pool: Convert frag_page to frag_nmem
>   xdp: Convert to netmem
>   mm: Remove page pool members from struct page
>   page_pool: Pass a netmem to init_callback()
>   net: Add support for netmem in skb_frag
>   mvneta: Convert to netmem
>   mlx5: Convert to netmem
>   hns3: Convert to netmem
> 
>  Documentation/networking/page_pool.rst        |   5 +
>  .../net/ethernet/hisilicon/hns3/hns3_enet.c   |  16 +-
>  drivers/net/ethernet/marvell/mvneta.c         |  48 +--
>  drivers/net/ethernet/mellanox/mlx5/core/en.h  |  10 +-
>  .../net/ethernet/mellanox/mlx5/core/en/txrx.h |   4 +-
>  .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  24 +-
>  .../net/ethernet/mellanox/mlx5/core/en/xdp.h  |   2 +-
>  .../net/ethernet/mellanox/mlx5/core/en_main.c |  12 +-
>  .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 130 +++++----
>  include/linux/mm_types.h                      |  22 --
>  include/linux/skbuff.h                        |  11 +
>  include/net/page_pool.h                       | 228 ++++++++++++---
>  include/trace/events/page_pool.h              |  28 +-
>  net/bpf/test_run.c                            |   4 +-
>  net/core/page_pool.c                          | 274 +++++++++---------
>  net/core/xdp.c                                |   7 +-
>  16 files changed, 493 insertions(+), 332 deletions(-)
>
Matthew Wilcox Jan. 11, 2023, 1:21 p.m. UTC | #2
On Wed, Jan 11, 2023 at 04:25:46PM +0800, Yunsheng Lin wrote:
> On 2023/1/11 12:21, Matthew Wilcox (Oracle) wrote:
> > The MM subsystem is trying to reduce struct page to a single pointer.
> > The first step towards that is splitting struct page by its individual
> > users, as has already been done with folio and slab.  This patchset does
> > that for netmem which is used for page pools.
> 
> As the page pool is only used for the rx side in the net stack, depending on
> the driver, a lot more memory for the net stack comes from
> page_frag_alloc_align(), kmem caches, etc.
> Naming it netmem seems a little overkill; perhaps a more specific name for
> the page pool, such as pp_cache?
> 
> @Jesper & Ilias
> Any better idea?
> And it seems some APIs may need changing too, as we are not pooling 'pages'
> now.

I raised the question of naming in v1, six weeks ago, and nobody had
any better names.  Seems a little unfair to ignore the question at first
and then bring it up now.  I'd hate to miss the merge window because of
a late-breaking major request like this.

https://lore.kernel.org/netdev/20221130220803.3657490-1-willy@infradead.org/

I'd like to understand what we think we'll do in networking when we trim
struct page down to a single pointer.  All these usages that aren't from
page_pool -- what information does networking need to track per-allocation?
Would it make sense for the netmem to describe all memory used by the
networking stack, and have allocators other than page_pool also return
netmem, or does the normal usage of memory in the net stack not need to
track that information?

> > Matthew Wilcox (Oracle) (26):
> >   netmem: Create new type
> >   netmem: Add utility functions
> >   page_pool: Add netmem_set_dma_addr() and netmem_get_dma_addr()
> >   page_pool: Convert page_pool_release_page() to
> >     page_pool_release_netmem()
> >   page_pool: Start using netmem in allocation path.
>                                                     ^
> nit: there is a '.' at the end of the patch title.

Thanks, I'll fix that for v4.
Yunsheng Lin Jan. 12, 2023, 2:12 a.m. UTC | #3
On 2023/1/11 21:21, Matthew Wilcox wrote:
> On Wed, Jan 11, 2023 at 04:25:46PM +0800, Yunsheng Lin wrote:
>> On 2023/1/11 12:21, Matthew Wilcox (Oracle) wrote:
>>> The MM subsystem is trying to reduce struct page to a single pointer.
>>> The first step towards that is splitting struct page by its individual
>>> users, as has already been done with folio and slab.  This patchset does
>>> that for netmem which is used for page pools.
>>
>> As the page pool is only used for the rx side in the net stack, depending on
>> the driver, a lot more memory for the net stack comes from
>> page_frag_alloc_align(), kmem caches, etc.
>> Naming it netmem seems a little overkill; perhaps a more specific name for
>> the page pool, such as pp_cache?
>>
>> @Jesper & Ilias
>> Any better idea?
>> And it seems some APIs may need changing too, as we are not pooling 'pages'
>> now.
> 
> I raised the question of naming in v1, six weeks ago, and nobody had
> any better names.  Seems a little unfair to ignore the question at first
> and then bring it up now.  I'd hate to miss the merge window because of
> a late-breaking major request like this.

I was not keeping a very close eye on the mailing list, so forgive me for
missing it.
As this version contains changes to the hns3 driver, it caught my eye and I
took some time to go through it.

> 
> https://lore.kernel.org/netdev/20221130220803.3657490-1-willy@infradead.org/
> 
> I'd like to understand what we think we'll do in networking when we trim
> struct page down to a single pointer.  All these usages that aren't from
> page_pool -- what information does networking need to track per-allocation?

These are the kinds of memory used by the net stack, to my limited
understanding:
1. pages allocated directly from the page allocator, such as via
   dev_alloc_pages().
2. pages allocated from the page pool API.
3. pages allocated from the page frag API, such as page_frag_alloc_align().
4. buffers allocated from the kmem cache API, which may be converted to
   pages using virt_to_page().
5. pages allocated by another subsystem and used by the net stack (or
   allocated by the net stack and used by another subsystem) using copy
   avoidance optimizations such as sendfile and splice.

For case 1, I do not think the networking side needs to track per-allocation
information now, as its users already take care of that where it is needed.

For case 3, I view it as a pool far smaller than the page pool; we may
merge page frag into the page pool or find the common information needed
by both of them.
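For reference, cases 2 and 3 look roughly like this from a driver's
point of view (simplified sketch; 'rxq', its page_pool member and
'fragsz' are made-up names, not taken from any real driver):

/* Case 2: page pool -- a per-queue recycling pool, tuned for RX-NAPI. */
struct page *page = page_pool_dev_alloc_pages(rxq->page_pool);

/* Case 3: page frag -- a small slice carved out of a shared page;
 * netdev_alloc_frag() is the usual wrapper over the page_frag_cache
 * and page_frag_alloc_align() machinery. */
void *frag = netdev_alloc_frag(fragsz);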

Cases 4 & 5 seem similar. I am not sure how "trimming struct page down to a
single pointer" will work yet, but at least it may need a common field such
as page->pp_magic so that different subsystems can use the same page in a
coordinated way, as the page pool does now?

> Would it make sense for the netmem to describe all memory used by the
> networking stack, and have allocators other than page_pool also return
> netmem, or does the normal usage of memory in the net stack not need to
> track that information?

The above question seems hard to answer now. If "reduce struct page to a
single pointer" does not affect cases 4 & 5 or other cases I missed, it
seems fine to limit the change to the page pool as this patchset does for
now, though it would be better if we could come up with a better name than
'netmem'.
Jesper Dangaard Brouer Jan. 12, 2023, 10:15 a.m. UTC | #4
On 11/01/2023 14.21, Matthew Wilcox wrote:
> On Wed, Jan 11, 2023 at 04:25:46PM +0800, Yunsheng Lin wrote:
>> On 2023/1/11 12:21, Matthew Wilcox (Oracle) wrote:
>>> The MM subsystem is trying to reduce struct page to a single pointer.
>>> The first step towards that is splitting struct page by its individual
>>> users, as has already been done with folio and slab.  This patchset does
>>> that for netmem which is used for page pools.
>> As the page pool is only used for the rx side in the net stack, depending on
>> the driver, a lot more memory for the net stack comes from
>> page_frag_alloc_align(), kmem caches, etc.
>> Naming it netmem seems a little overkill; perhaps a more specific name for
>> the page pool, such as pp_cache?
>>
>> @Jesper & Ilias
>> Any better idea?

I like the 'netmem' name.

>> And it seems some APIs may need changing too, as we are not pooling 'pages'
>> now.

IMHO it would be overkill to rename the page_pool to e.g. netmem_pool,
as it would generate too much churn and be hard to follow in git,
since the code filename page_pool.c would also have to be renamed.
I guess we keep page_pool for historical reasons ;-)

> I raised the question of naming in v1, six weeks ago, and nobody had
> any better names.  Seems a little unfair to ignore the question at first
> and then bring it up now.  I'd hate to miss the merge window because of
> a late-breaking major request like this.
> 
> https://lore.kernel.org/netdev/20221130220803.3657490-1-willy@infradead.org/
> 
> I'd like to understand what we think we'll do in networking when we trim
> struct page down to a single pointer.  All these usages that aren't from
> page_pool -- what information does networking need to track per-allocation?
> Would it make sense for the netmem to describe all memory used by the
> networking stack, and have allocators other than page_pool also return
> netmem, 

This is also how I see the future: other netstack "allocators" can
return and work with 'netmem' objects.  IMHO we are already cramming
too many use-cases into page_pool (like the frag support Yunsheng
added).  IMHO there is room for other netstack "allocators" that can
utilize netmem.  The page_pool is optimized for RX-NAPI workloads; using
it for other purposes is a mistake IMHO.  People should create other
netstack "allocators" that solve their specific use-cases.  E.g. the TX
path likely needs another "allocator" optimized for this TX use-case.

> or does the normal usage of memory in the net stack not need to
> track that information?

The page refcnt is (obviously) information the netstack already tracks.
I have seen drivers that use the DMA mapping stored directly in the
page/'netmem', instead of having to store it separately in the driver.
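E.g. when the pool is created with PP_FLAG_DMA_MAP, the mapping lives
with the page, so the rx path can do something like this (sketch only;
'dev', 'page', 'offset' and 'len' are whatever the driver has at hand):

/* Read back the DMA address the pool stored at allocation time instead
 * of keeping a private copy in the driver's own ring metadata. */
dma_addr_t dma = page_pool_get_dma_addr(page);

dma_sync_single_range_for_cpu(dev, dma, offset, len, DMA_FROM_DEVICE);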

--Jesper
Yunsheng Lin Jan. 13, 2023, 2:19 a.m. UTC | #5
On 2023/1/12 18:15, Jesper Dangaard Brouer wrote:
> On 11/01/2023 14.21, Matthew Wilcox wrote:
>> On Wed, Jan 11, 2023 at 04:25:46PM +0800, Yunsheng Lin wrote:
>>> On 2023/1/11 12:21, Matthew Wilcox (Oracle) wrote:
>>>> The MM subsystem is trying to reduce struct page to a single pointer.
>>>> The first step towards that is splitting struct page by its individual
>>>> users, as has already been done with folio and slab.  This patchset does
>>>> that for netmem which is used for page pools.
>>> As the page pool is only used for the rx side in the net stack, depending on
>>> the driver, a lot more memory for the net stack comes from
>>> page_frag_alloc_align(), kmem caches, etc.
>>> Naming it netmem seems a little overkill; perhaps a more specific name for
>>> the page pool, such as pp_cache?
>>>
>>> @Jesper & Ilias
>>> Any better idea?
> 
> I like the 'netmem' name.

Fair enough.
I just pointed out why netmem might not be appropriate when we have not yet
figured out how netmem will work through the whole networking stack.
It is ultimately your and David/Jakub's call to decide the naming anyway.

> 
>>> And it seems some APIs may need changing too, as we are not pooling 'pages'
>>> now.
> 
> IMHO it would be overkill to rename the page_pool to e.g. netmem_pool,
> as it would generate too much churn and be hard to follow in git,
> since the code filename page_pool.c would also have to be renamed.
> I guess we keep page_pool for historical reasons ;-)

I think this is a matter of conflict between backward and forward maintainability.
IMHO we should prefer forward maintainability over backward maintainability.

And Greg offers a possible way to fix the backport problem:
https://www.spinics.net/lists/kernel/msg4648826.html

As for the git history, I suppose that is a pain we have to pay for future
maintainability.

> 
>> I raised the question of naming in v1, six weeks ago, and nobody had
>> any better names.  Seems a little unfair to ignore the question at first
>> and then bring it up now.  I'd hate to miss the merge window because of
>> a late-breaking major request like this.
>>
>> https://lore.kernel.org/netdev/20221130220803.3657490-1-willy@infradead.org/
>>
>> I'd like to understand what we think we'll do in networking when we trim
>> struct page down to a single pointer.  All these usages that aren't from
>> page_pool -- what information does networking need to track per-allocation?
>> Would it make sense for the netmem to describe all memory used by the
>> networking stack, and have allocators other than page_pool also return
>> netmem, 
> 
> This is also how I see the future: other netstack "allocators" can
> return and work with 'netmem' objects.  IMHO we are already cramming

I am not sure how "other netstack 'allocators' can return and work with
'netmem' objects" would work; I suppose it means putting different unions for
different allocators in struct netmem, like struct page does? Isn't that
bringing back the same problem Matthew is trying to fix in this patchset?


> too many use-cases into page_pool (like the frag support Yunsheng
> added).  IMHO there is room for other netstack "allocators" that can

I do not understand why frag support is viewed as "cramming use-cases into
the page pool".
In my defence, the frag support for rx fits within the page pool; it just
extends the page pool to return smaller buffers than before. If I created
another allocator for that, I might reinvent a lot of wheels the page pool
has already invented.
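For context, the frag path is only a finer-grained front end to the
same pool, roughly like this (sketch; 'rxq', 'rx_buf_size' and
'dma_addr' are illustrative driver-side names):

/* One pool page is sliced into several rx buffers; 'offset' tells the
 * driver where its slice of the page starts. */
unsigned int offset;
struct page *page = page_pool_dev_alloc_frag(rxq->page_pool, &offset,
                                              rx_buf_size);
if (page)
        dma_addr = page_pool_get_dma_addr(page) + offset;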

> utilize netmem.  The page_pool is optimized for RX-NAPI workloads; using
> it for other purposes is a mistake IMHO.  People should create other
> netstack "allocators" that solve their specific use-cases.  E.g. the TX
> path likely needs another "allocator" optimized for this TX use-case.
>