diff mbox series

[net-next,v4,1/5] page_pool: frag API support for 32-bit arch with 64-bit DMA

Message ID 20230612130256.4572-2-linyunsheng@huawei.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series introduce page_pool_alloc() API | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 5148 this patch: 5148
netdev/cc_maintainers success CCed 10 of 10 maintainers
netdev/build_clang success Errors and warnings before: 1977 this patch: 1977
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 5378 this patch: 5378
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 137 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Yunsheng Lin June 12, 2023, 1:02 p.m. UTC
Currently page_pool_alloc_frag() is not supported in 32-bit
arch with 64-bit DMA, which seems to be quite common, see
[1], which means driver may need to handle it when using
page_pool_alloc_frag() API.

In order to simplify the driver's work for supporting page
frag, this patch allows page_pool_alloc_frag() to call
page_pool_alloc_pages() to return a big page frag without
page splitting because of overlap issue between pp_frag_count
and dma_addr_upper in 'struct page' for those arches.

As page_pool_create() with PP_FLAG_PAGE_FRAG is supported in
32-bit arch with 64-bit DMA now, mlx5 calls page_pool_create()
with PP_FLAG_PAGE_FRAG and manipulate the page->pp_frag_count
directly using the page_pool_defrag_page(), so add a checking
for it to aoivd writing to page->pp_frag_count that may not
exist in some arch.

Note that it may aggravate truesize underestimate problem for
skb as there is no page splitting for those pages, if driver
need a accuate truesize, it may calculate that according to
frag size, page order and PAGE_POOL_DMA_USE_PP_FRAG_COUNT
being true or not. And we may provide a helper for that if it
turns out to be helpful.

1. https://lore.kernel.org/all/20211117075652.58299-1-linyunsheng@huawei.com/

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 10 +++++
 include/net/page_pool.h                       | 44 ++++++++++++++++---
 net/core/page_pool.c                          | 18 ++------
 3 files changed, 52 insertions(+), 20 deletions(-)

Comments

Alexander Lobakin June 13, 2023, 1:30 p.m. UTC | #1
From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Mon, 12 Jun 2023 21:02:52 +0800

> Currently page_pool_alloc_frag() is not supported in 32-bit
> arch with 64-bit DMA, which seems to be quite common, see
> [1], which means driver may need to handle it when using
> page_pool_alloc_frag() API.

[...]

> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> index 126f9e294389..5c7f7501f300 100644
> --- a/include/net/page_pool.h
> +++ b/include/net/page_pool.h
> @@ -33,6 +33,7 @@
>  #include <linux/mm.h> /* Needed by ptr_ring */
>  #include <linux/ptr_ring.h>
>  #include <linux/dma-direction.h>

This include is redundant now that you include dma-mapping.h.

> +#include <linux/dma-mapping.h>

As Jakub previously mentioned, this involves including dma-mapping.h,
which is relatively heavy, to each source file which includes skbuff.h,
i.e. almost the whole kernel :D
I addressed this in my series, which I hope will land soon after yours
(sending new revision in 24-48 hours), so you can leave it as it is. Or
otherwise you can pick my solution (or come up with your own :D).

>  
>  #define PP_FLAG_DMA_MAP		BIT(0) /* Should page_pool do the DMA
>  					* map/unmap

Thanks,
Olek
Yunsheng Lin June 14, 2023, 3:36 a.m. UTC | #2
On 2023/6/13 21:30, Alexander Lobakin wrote:
> From: Yunsheng Lin <linyunsheng@huawei.com>
> Date: Mon, 12 Jun 2023 21:02:52 +0800
> 
>> Currently page_pool_alloc_frag() is not supported in 32-bit
>> arch with 64-bit DMA, which seems to be quite common, see
>> [1], which means driver may need to handle it when using
>> page_pool_alloc_frag() API.
> 
> [...]
> 
>> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
>> index 126f9e294389..5c7f7501f300 100644
>> --- a/include/net/page_pool.h
>> +++ b/include/net/page_pool.h
>> @@ -33,6 +33,7 @@
>>  #include <linux/mm.h> /* Needed by ptr_ring */
>>  #include <linux/ptr_ring.h>
>>  #include <linux/dma-direction.h>
> 
> This include is redundant now that you include dma-mapping.h.
> 
>> +#include <linux/dma-mapping.h>
> 
> As Jakub previously mentioned, this involves including dma-mapping.h,
> which is relatively heavy, to each source file which includes skbuff.h,
> i.e. almost the whole kernel :D

I am not sure I understand the part about 'includes skbuff.h' yet, it seems
'skbuff.h' does not have anything related to this patch?

> I addressed this in my series, which I hope will land soon after yours
> (sending new revision in 24-48 hours), so you can leave it as it is. Or
> otherwise you can pick my solution (or come up with your own :D).

Do you mean by removing "#include <linux/dma-direction.h>" as dma-mapping.h
has included dma-direction.h?
But I am not sure if there is any hard rule about not explicitly including
a .h which is implicitly included. What if the dma-mapping.h is changed to not
include dma-direction.h anymore?

Anyway, it seems it is unlikely to not include dma-direction.h in dma-mapping.h,
Will remove the "#include <linux/dma-direction.h>" if there is another version
needed for this patchset:)

> 
>>  
>>  #define PP_FLAG_DMA_MAP		BIT(0) /* Should page_pool do the DMA
>>  					* map/unmap
> 
> Thanks,
> Olek
> 
> .
>
Jakub Kicinski June 14, 2023, 3:55 a.m. UTC | #3
On Wed, 14 Jun 2023 11:36:28 +0800 Yunsheng Lin wrote:
> > As Jakub previously mentioned, this involves including dma-mapping.h,
> > which is relatively heavy, to each source file which includes skbuff.h,
> > i.e. almost the whole kernel :D  
> 
> I am not sure I understand the part about 'includes skbuff.h' yet, it seems
> 'skbuff.h' does not have anything related to this patch?

$ git grep net/page_pool.h -- include/linux/skbuff.h
include/linux/skbuff.h:#include <net/page_pool.h>

> > I addressed this in my series, which I hope will land soon after yours
> > (sending new revision in 24-48 hours), so you can leave it as it is. Or
> > otherwise you can pick my solution (or come up with your own :D).  
> 
> Do you mean by removing "#include <linux/dma-direction.h>" as dma-mapping.h
> has included dma-direction.h?
> But I am not sure if there is any hard rule about not explicitly including
> a .h which is implicitly included. What if the dma-mapping.h is changed to not
> include dma-direction.h anymore?
> 
> Anyway, it seems it is unlikely to not include dma-direction.h in dma-mapping.h,
> Will remove the "#include <linux/dma-direction.h>" if there is another version
> needed for this patchset:)

The point is that we don't want commonly included headers to pull
in huge dependencies. Please run the preprocessor on
linux/dma-direction.h, you'll see how enormous it is.
Jakub Kicinski June 14, 2023, 4:09 a.m. UTC | #4
On Mon, 12 Jun 2023 21:02:52 +0800 Yunsheng Lin wrote:
> Currently page_pool_alloc_frag() is not supported in 32-bit
> arch with 64-bit DMA, which seems to be quite common, see
> [1], which means driver may need to handle it when using
> page_pool_alloc_frag() API.
> 
> In order to simplify the driver's work for supporting page
> frag, this patch allows page_pool_alloc_frag() to call
> page_pool_alloc_pages() to return a big page frag without

it returns an entire (potentially compound) page, not a frag.
AFAICT

> page splitting because of overlap issue between pp_frag_count
> and dma_addr_upper in 'struct page' for those arches.

These two lines seem to belong in the first paragraph,

> As page_pool_create() with PP_FLAG_PAGE_FRAG is supported in

"is" -> "will now be"

> 32-bit arch with 64-bit DMA now, mlx5 calls page_pool_create()
> with PP_FLAG_PAGE_FRAG and manipulate the page->pp_frag_count
> directly using the page_pool_defrag_page(), so add a checking
> for it to aoivd writing to page->pp_frag_count that may not
> exist in some arch.

This paragraph needs some proof reading :(

> Note that it may aggravate truesize underestimate problem for
> skb as there is no page splitting for those pages, if driver
> need a accuate truesize, it may calculate that according to

accurate

> frag size, page order and PAGE_POOL_DMA_USE_PP_FRAG_COUNT
> being true or not. And we may provide a helper for that if it
> turns out to be helpful.
> 
> 1. https://lore.kernel.org/all/20211117075652.58299-1-linyunsheng@huawei.com/

> +		/* Return error here to avoid writing to page->pp_frag_count in
> +		 * mlx5e_page_release_fragmented() for page->pp_frag_count is

I don't see any direct access to pp_frag_count anywhere outside of
page_pool.h in net-next. PAGE_POOL_DMA_USE_PP_FRAG_COUNT sounds like 
an internal flag, drivers shouldn't be looking at it, IMO.
Alexander Lobakin June 14, 2023, 10:52 a.m. UTC | #5
From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Wed, 14 Jun 2023 11:36:28 +0800

> On 2023/6/13 21:30, Alexander Lobakin wrote:
>> From: Yunsheng Lin <linyunsheng@huawei.com>
>> Date: Mon, 12 Jun 2023 21:02:52 +0800

[...]

>> I addressed this in my series, which I hope will land soon after yours
>> (sending new revision in 24-48 hours), so you can leave it as it is. Or
>> otherwise you can pick my solution (or come up with your own :D).
> 
> Do you mean by removing "#include <linux/dma-direction.h>" as dma-mapping.h
> has included dma-direction.h?

By "I addressed" I meant that I dropped including page_pool.h from
skbuff.h, as I also had to include dma-mapping.h to page_pool.h and this
implied that half of the kernel started including dma-mapping.h as well
for no profit.

> But I am not sure if there is any hard rule about not explicitly including
> a .h which is implicitly included. What if the dma-mapping.h is changed to not
> include dma-direction.h anymore?

No hard rule, but I don't see a reason for redundant includes. I usually
try to keep include lists as small as possible.

> 
> Anyway, it seems it is unlikely to not include dma-direction.h in dma-mapping.h,
> Will remove the "#include <linux/dma-direction.h>" if there is another version
> needed for this patchset:)
> 
>>
>>>  
>>>  #define PP_FLAG_DMA_MAP		BIT(0) /* Should page_pool do the DMA
Thanks,
Olek
Yunsheng Lin June 14, 2023, 11:42 a.m. UTC | #6
On 2023/6/14 12:09, Jakub Kicinski wrote:
> On Mon, 12 Jun 2023 21:02:52 +0800 Yunsheng Lin wrote:
>> Currently page_pool_alloc_frag() is not supported in 32-bit
>> arch with 64-bit DMA, which seems to be quite common, see
>> [1], which means driver may need to handle it when using
>> page_pool_alloc_frag() API.
>>
>> In order to simplify the driver's work for supporting page
>> frag, this patch allows page_pool_alloc_frag() to call
>> page_pool_alloc_pages() to return a big page frag without
> 
> it returns an entire (potentially compound) page, not a frag.
> AFAICT

As driver calls page_pool_alloc_frag(), and page_pool_alloc_frag()
calls page_pool_alloc_pages(), page_pool_alloc_pages() is hidden
inside page_pool_alloc_frag(), so it is a big page frag from driver's
point of view:)

> 
>> page splitting because of overlap issue between pp_frag_count
>> and dma_addr_upper in 'struct page' for those arches.
> 
> These two lines seem to belong in the first paragraph,
> 
>> As page_pool_create() with PP_FLAG_PAGE_FRAG is supported in
> 
> "is" -> "will now be"
> 
>> 32-bit arch with 64-bit DMA now, mlx5 calls page_pool_create()
>> with PP_FLAG_PAGE_FRAG and manipulate the page->pp_frag_count
>> directly using the page_pool_defrag_page(), so add a checking
>> for it to aoivd writing to page->pp_frag_count that may not
>> exist in some arch.
> 
> This paragraph needs some proof reading :(

Perhaps something like below?
mlx5 calls page_pool_create() with PP_FLAG_PAGE_FRAG and is
not using the frag API, as PP_FLAG_PAGE_FRAG checking for arch
with PAGE_POOL_DMA_USE_PP_FRAG_COUNT being true will now be
removed in this patch, so add back the checking of
PAGE_POOL_DMA_USE_PP_FRAG_COUNT for mlx5 driver to retain the
old behavior, which is to avoid mlx5e_page_release_fragmented()
calling page_pool_defrag_page() to write to page->pp_frag_count.

> 
>> Note that it may aggravate truesize underestimate problem for
>> skb as there is no page splitting for those pages, if driver
>> need a accuate truesize, it may calculate that according to
> 
> accurate
> 
>> frag size, page order and PAGE_POOL_DMA_USE_PP_FRAG_COUNT
>> being true or not. And we may provide a helper for that if it
>> turns out to be helpful.
>>
>> 1. https://lore.kernel.org/all/20211117075652.58299-1-linyunsheng@huawei.com/
> 
>> +		/* Return error here to avoid writing to page->pp_frag_count in
>> +		 * mlx5e_page_release_fragmented() for page->pp_frag_count is
> 
> I don't see any direct access to pp_frag_count anywhere outside of
> page_pool.h in net-next. PAGE_POOL_DMA_USE_PP_FRAG_COUNT sounds like 
> an internal flag, drivers shouldn't be looking at it, IMO.

mlx5e_page_release_fragmented() calls page_pool_defrag_page(), maybe
below is more correct:

/* Return error here to avoid mlx5e_page_release_fragmented() calling
 * page_pool_defrag_page() to write to page->pp_frag_count which is
 * not usable for arch with PAGE_POOL_DMA_USE_PP_FRAG_COUNT being true.
*/

I am agree with you about that drivers shouldn't be looking at it. But
adding PAGE_POOL_DMA_USE_PP_FRAG_COUNT checking back to mlx5 seems to be
the simplest way I can think of because of the reason mentioned above.

And it seems that it is hard to change mlx5 to use frag API according to
the below disscusion with Alexander:

https://lore.kernel.org/all/CAKgT0UeD=sboWNUsP33_UsKEKyqTBfeOqNO5NCdFaxh9KXEG3w@mail.gmail.com/

> 
> .
>
Yunsheng Lin June 14, 2023, 12:15 p.m. UTC | #7
On 2023/6/14 18:52, Alexander Lobakin wrote:
> From: Yunsheng Lin <linyunsheng@huawei.com>
> Date: Wed, 14 Jun 2023 11:36:28 +0800
> 
>> On 2023/6/13 21:30, Alexander Lobakin wrote:
>>> From: Yunsheng Lin <linyunsheng@huawei.com>
>>> Date: Mon, 12 Jun 2023 21:02:52 +0800
> 
> [...]
> 
>>> I addressed this in my series, which I hope will land soon after yours
>>> (sending new revision in 24-48 hours), so you can leave it as it is. Or
>>> otherwise you can pick my solution (or come up with your own :D).
>>
>> Do you mean by removing "#include <linux/dma-direction.h>" as dma-mapping.h
>> has included dma-direction.h?
> 
> By "I addressed" I meant that I dropped including page_pool.h from
> skbuff.h, as I also had to include dma-mapping.h to page_pool.h and this
> implied that half of the kernel started including dma-mapping.h as well
> for no profit.

I see, thank for the explanation.
I perfer that you can continue with that effort if that is ok.
Alexander Lobakin June 14, 2023, 12:42 p.m. UTC | #8
From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Wed, 14 Jun 2023 20:15:48 +0800

> On 2023/6/14 18:52, Alexander Lobakin wrote:
>> From: Yunsheng Lin <linyunsheng@huawei.com>
>> Date: Wed, 14 Jun 2023 11:36:28 +0800

[...]

>> By "I addressed" I meant that I dropped including page_pool.h from
>> skbuff.h, as I also had to include dma-mapping.h to page_pool.h and this
>> implied that half of the kernel started including dma-mapping.h as well
>> for no profit.
> 
> I see, thank for the explanation.
> I perfer that you can continue with that effort if that is ok.

Sure. Especially since I based my series on top of yours :)

Thanks,
Olek
Jakub Kicinski June 14, 2023, 5:07 p.m. UTC | #9
On Wed, 14 Jun 2023 19:42:29 +0800 Yunsheng Lin wrote:
> On 2023/6/14 12:09, Jakub Kicinski wrote:
> > On Mon, 12 Jun 2023 21:02:52 +0800 Yunsheng Lin wrote:  
> >> Currently page_pool_alloc_frag() is not supported in 32-bit
> >> arch with 64-bit DMA, which seems to be quite common, see
> >> [1], which means driver may need to handle it when using
> >> page_pool_alloc_frag() API.
> >>
> >> In order to simplify the driver's work for supporting page
> >> frag, this patch allows page_pool_alloc_frag() to call
> >> page_pool_alloc_pages() to return a big page frag without  
> > 
> > it returns an entire (potentially compound) page, not a frag.
> > AFAICT  
> 
> As driver calls page_pool_alloc_frag(), and page_pool_alloc_frag()
> calls page_pool_alloc_pages(), page_pool_alloc_pages() is hidden
> inside page_pool_alloc_frag(), so it is a big page frag from driver's
> point of view:)

frag​ment : a part broken off, detached, or incomplete
          a small part broken or separated off something.

"big fragment" is definitely not the whole thing.

> >> page splitting because of overlap issue between pp_frag_count
> >> and dma_addr_upper in 'struct page' for those arches.  
> > 
> > These two lines seem to belong in the first paragraph,
> >   
> >> As page_pool_create() with PP_FLAG_PAGE_FRAG is supported in  
> > 
> > "is" -> "will now be"
> >   
> >> 32-bit arch with 64-bit DMA now, mlx5 calls page_pool_create()
> >> with PP_FLAG_PAGE_FRAG and manipulate the page->pp_frag_count
> >> directly using the page_pool_defrag_page(), so add a checking
> >> for it to aoivd writing to page->pp_frag_count that may not
> >> exist in some arch.  
> > 
> > This paragraph needs some proof reading :(  
> 
> Perhaps something like below?
> mlx5 calls page_pool_create() with PP_FLAG_PAGE_FRAG and is
> not using the frag API, as PP_FLAG_PAGE_FRAG checking for arch
> with PAGE_POOL_DMA_USE_PP_FRAG_COUNT being true will now be
> removed in this patch, so add back the checking of
> PAGE_POOL_DMA_USE_PP_FRAG_COUNT for mlx5 driver to retain the
> old behavior, which is to avoid mlx5e_page_release_fragmented()
> calling page_pool_defrag_page() to write to page->pp_frag_count.

That's a 7-line long, single sentence. Not much better.

> >> Note that it may aggravate truesize underestimate problem for
> >> skb as there is no page splitting for those pages, if driver
> >> need a accuate truesize, it may calculate that according to  
> > 
> > accurate
> >   
> >> frag size, page order and PAGE_POOL_DMA_USE_PP_FRAG_COUNT
> >> being true or not. And we may provide a helper for that if it
> >> turns out to be helpful.
> >>
> >> 1. https://lore.kernel.org/all/20211117075652.58299-1-linyunsheng@huawei.com/  
> >   
> >> +		/* Return error here to avoid writing to page->pp_frag_count in
> >> +		 * mlx5e_page_release_fragmented() for page->pp_frag_count is  
> > 
> > I don't see any direct access to pp_frag_count anywhere outside of
> > page_pool.h in net-next. PAGE_POOL_DMA_USE_PP_FRAG_COUNT sounds like 
> > an internal flag, drivers shouldn't be looking at it, IMO.  
> 
> mlx5e_page_release_fragmented() calls page_pool_defrag_page(), maybe
> below is more correct:
> 
> /* Return error here to avoid mlx5e_page_release_fragmented() calling
>  * page_pool_defrag_page() to write to page->pp_frag_count which is
>  * not usable for arch with PAGE_POOL_DMA_USE_PP_FRAG_COUNT being true.
> */
> 
> I am agree with you about that drivers shouldn't be looking at it. But
> adding PAGE_POOL_DMA_USE_PP_FRAG_COUNT checking back to mlx5 seems to be
> the simplest way I can think of because of the reason mentioned above.
> 
> And it seems that it is hard to change mlx5 to use frag API according to
> the below disscusion with Alexander:
> 
> https://lore.kernel.org/all/CAKgT0UeD=sboWNUsP33_UsKEKyqTBfeOqNO5NCdFaxh9KXEG3w@mail.gmail.com/

It's better to add a flag like PP_FLAG_PAGE_FRAG for this use case and
have pool creation fail than poke at internals in the driver, IMO.
Yunsheng Lin June 15, 2023, 7:29 a.m. UTC | #10
On 2023/6/15 1:07, Jakub Kicinski wrote:
> On Wed, 14 Jun 2023 19:42:29 +0800 Yunsheng Lin wrote:
>> On 2023/6/14 12:09, Jakub Kicinski wrote:
>>> On Mon, 12 Jun 2023 21:02:52 +0800 Yunsheng Lin wrote:  
>>>> Currently page_pool_alloc_frag() is not supported in 32-bit
>>>> arch with 64-bit DMA, which seems to be quite common, see
>>>> [1], which means driver may need to handle it when using
>>>> page_pool_alloc_frag() API.
>>>>
>>>> In order to simplify the driver's work for supporting page
>>>> frag, this patch allows page_pool_alloc_frag() to call
>>>> page_pool_alloc_pages() to return a big page frag without  
>>>
>>> it returns an entire (potentially compound) page, not a frag.
>>> AFAICT  
>>
>> As driver calls page_pool_alloc_frag(), and page_pool_alloc_frag()
>> calls page_pool_alloc_pages(), page_pool_alloc_pages() is hidden
>> inside page_pool_alloc_frag(), so it is a big page frag from driver's
>> point of view:)
> 
> frag​ment : a part broken off, detached, or incomplete
>           a small part broken or separated off something.
> 
> "big fragment" is definitely not the whole thing.

Alexander susgested something as below:
non-fragmented - legacy page pool w/o page frags
mono-frag - after this page page pool w/o frags
fragmented - before/after this patch w/ frags

https://patchwork.kernel.org/project/netdevbpf/patch/20230529092840.40413-3-linyunsheng@huawei.com/#25366535

Does 'mono-frag' make sense to you? or any better name in
mind?

> 
>>>> page splitting because of overlap issue between pp_frag_count
>>>> and dma_addr_upper in 'struct page' for those arches.  
>>>
>>> These two lines seem to belong in the first paragraph,
>>>   
>>>> As page_pool_create() with PP_FLAG_PAGE_FRAG is supported in  
>>>
>>> "is" -> "will now be"
>>>   
>>>> 32-bit arch with 64-bit DMA now, mlx5 calls page_pool_create()
>>>> with PP_FLAG_PAGE_FRAG and manipulate the page->pp_frag_count
>>>> directly using the page_pool_defrag_page(), so add a checking
>>>> for it to aoivd writing to page->pp_frag_count that may not
>>>> exist in some arch.  
>>>
>>> This paragraph needs some proof reading :(  
>>
>> Perhaps something like below?
>> mlx5 calls page_pool_create() with PP_FLAG_PAGE_FRAG and is
>> not using the frag API, as PP_FLAG_PAGE_FRAG checking for arch
>> with PAGE_POOL_DMA_USE_PP_FRAG_COUNT being true will now be
>> removed in this patch, so add back the checking of
>> PAGE_POOL_DMA_USE_PP_FRAG_COUNT for mlx5 driver to retain the
>> old behavior, which is to avoid mlx5e_page_release_fragmented()
>> calling page_pool_defrag_page() to write to page->pp_frag_count.
> 
> That's a 7-line long, single sentence. Not much better.

I had the same feeling when I was typing:(
Any suggestion for the adjustment?

> 
>>>> Note that it may aggravate truesize underestimate problem for
>>>> skb as there is no page splitting for those pages, if driver
>>>> need a accuate truesize, it may calculate that according to  
>>>
>>> accurate
>>>   
>>>> frag size, page order and PAGE_POOL_DMA_USE_PP_FRAG_COUNT
>>>> being true or not. And we may provide a helper for that if it
>>>> turns out to be helpful.
>>>>
>>>> 1. https://lore.kernel.org/all/20211117075652.58299-1-linyunsheng@huawei.com/  
>>>   
>>>> +		/* Return error here to avoid writing to page->pp_frag_count in
>>>> +		 * mlx5e_page_release_fragmented() for page->pp_frag_count is  
>>>
>>> I don't see any direct access to pp_frag_count anywhere outside of
>>> page_pool.h in net-next. PAGE_POOL_DMA_USE_PP_FRAG_COUNT sounds like 
>>> an internal flag, drivers shouldn't be looking at it, IMO.  
>>
>> mlx5e_page_release_fragmented() calls page_pool_defrag_page(), maybe
>> below is more correct:
>>
>> /* Return error here to avoid mlx5e_page_release_fragmented() calling
>>  * page_pool_defrag_page() to write to page->pp_frag_count which is
>>  * not usable for arch with PAGE_POOL_DMA_USE_PP_FRAG_COUNT being true.
>> */
>>
>> I am agree with you about that drivers shouldn't be looking at it. But
>> adding PAGE_POOL_DMA_USE_PP_FRAG_COUNT checking back to mlx5 seems to be
>> the simplest way I can think of because of the reason mentioned above.
>>
>> And it seems that it is hard to change mlx5 to use frag API according to
>> the below disscusion with Alexander:
>>
>> https://lore.kernel.org/all/CAKgT0UeD=sboWNUsP33_UsKEKyqTBfeOqNO5NCdFaxh9KXEG3w@mail.gmail.com/
> 
> It's better to add a flag like PP_FLAG_PAGE_FRAG for this use case and
> have pool creation fail than poke at internals in the driver, IMO.

Jesper also had the similiar comment about that, but we decided to postpone
that because of the naming, any better name for that flag in mind?

maybe something as:
PP_FLAG_DRIVER_FRAG_HANDLING?

> 
> .
>
diff mbox series

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index a7c526ee5024..593cdfbfc035 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -832,6 +832,16 @@  static int mlx5e_alloc_rq(struct mlx5e_params *params,
 		/* Create a page_pool and register it with rxq */
 		struct page_pool_params pp_params = { 0 };
 
+		/* Return error here to avoid writing to page->pp_frag_count in
+		 * mlx5e_page_release_fragmented() for page->pp_frag_count is
+		 * not usable for arch with PAGE_POOL_DMA_USE_PP_FRAG_COUNT
+		 * being true.
+		 */
+		if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT) {
+			err = -EINVAL;
+			goto err_free_by_rq_type;
+		}
+
 		pp_params.order     = 0;
 		pp_params.flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV | PP_FLAG_PAGE_FRAG;
 		pp_params.pool_size = pool_size;
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 126f9e294389..5c7f7501f300 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -33,6 +33,7 @@ 
 #include <linux/mm.h> /* Needed by ptr_ring */
 #include <linux/ptr_ring.h>
 #include <linux/dma-direction.h>
+#include <linux/dma-mapping.h>
 
 #define PP_FLAG_DMA_MAP		BIT(0) /* Should page_pool do the DMA
 					* map/unmap
@@ -50,6 +51,9 @@ 
 				 PP_FLAG_DMA_SYNC_DEV |\
 				 PP_FLAG_PAGE_FRAG)
 
+#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT	\
+		(sizeof(dma_addr_t) > sizeof(unsigned long))
+
 /*
  * Fast allocation side cache array/stack
  *
@@ -219,8 +223,33 @@  static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool)
 	return page_pool_alloc_pages(pool, gfp);
 }
 
-struct page *page_pool_alloc_frag(struct page_pool *pool, unsigned int *offset,
-				  unsigned int size, gfp_t gfp);
+struct page *__page_pool_alloc_frag(struct page_pool *pool,
+				    unsigned int *offset, unsigned int size,
+				    gfp_t gfp);
+
+static inline struct page *page_pool_alloc_frag(struct page_pool *pool,
+						unsigned int *offset,
+						unsigned int size, gfp_t gfp)
+{
+	unsigned int max_size = PAGE_SIZE << pool->p.order;
+
+	size = ALIGN(size, dma_get_cache_alignment());
+
+	if (WARN_ON(!(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
+		    size > max_size))
+		return NULL;
+
+	/* Don't allow page splitting and allocate one big frag
+	 * for 32-bit arch with 64-bit DMA, corresponding to
+	 * the checking in page_pool_is_last_frag().
+	 */
+	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT) {
+		*offset = 0;
+		return page_pool_alloc_pages(pool, gfp);
+	}
+
+	return __page_pool_alloc_frag(pool, offset, size, gfp);
+}
 
 static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool,
 						    unsigned int *offset,
@@ -322,8 +351,14 @@  static inline long page_pool_defrag_page(struct page *page, long nr)
 static inline bool page_pool_is_last_frag(struct page_pool *pool,
 					  struct page *page)
 {
-	/* If fragments aren't enabled or count is 0 we were the last user */
+	/* We assume we are the last frag user that is still holding
+	 * on to the page if:
+	 * 1. Fragments aren't enabled.
+	 * 2. We are running in 32-bit arch with 64-bit DMA.
+	 * 3. page_pool_defrag_page() indicate we are the last user.
+	 */
 	return !(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
+	       PAGE_POOL_DMA_USE_PP_FRAG_COUNT ||
 	       (page_pool_defrag_page(page, 1) == 0);
 }
 
@@ -357,9 +392,6 @@  static inline void page_pool_recycle_direct(struct page_pool *pool,
 	page_pool_put_full_page(pool, page, true);
 }
 
-#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT	\
-		(sizeof(dma_addr_t) > sizeof(unsigned long))
-
 static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
 {
 	dma_addr_t ret = page->dma_addr;
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index a3e12a61d456..9c4118c62997 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -14,7 +14,6 @@ 
 #include <net/xdp.h>
 
 #include <linux/dma-direction.h>
-#include <linux/dma-mapping.h>
 #include <linux/page-flags.h>
 #include <linux/mm.h> /* for put_page() */
 #include <linux/poison.h>
@@ -200,10 +199,6 @@  static int page_pool_init(struct page_pool *pool,
 		 */
 	}
 
-	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT &&
-	    pool->p.flags & PP_FLAG_PAGE_FRAG)
-		return -EINVAL;
-
 #ifdef CONFIG_PAGE_POOL_STATS
 	pool->recycle_stats = alloc_percpu(struct page_pool_recycle_stats);
 	if (!pool->recycle_stats)
@@ -715,18 +710,13 @@  static void page_pool_free_frag(struct page_pool *pool)
 	page_pool_return_page(pool, page);
 }
 
-struct page *page_pool_alloc_frag(struct page_pool *pool,
-				  unsigned int *offset,
-				  unsigned int size, gfp_t gfp)
+struct page *__page_pool_alloc_frag(struct page_pool *pool,
+				    unsigned int *offset,
+				    unsigned int size, gfp_t gfp)
 {
 	unsigned int max_size = PAGE_SIZE << pool->p.order;
 	struct page *page = pool->frag_page;
 
-	if (WARN_ON(!(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
-		    size > max_size))
-		return NULL;
-
-	size = ALIGN(size, dma_get_cache_alignment());
 	*offset = pool->frag_offset;
 
 	if (page && *offset + size > max_size) {
@@ -759,7 +749,7 @@  struct page *page_pool_alloc_frag(struct page_pool *pool,
 	alloc_stat_inc(pool, fast);
 	return page;
 }
-EXPORT_SYMBOL(page_pool_alloc_frag);
+EXPORT_SYMBOL(__page_pool_alloc_frag);
 
 static void page_pool_empty_ring(struct page_pool *pool)
 {