
[net-next,v5,01/14] page_pool: make sure frag API fields don't span between cachelines

Message ID 20231124154732.1623518-2-aleksander.lobakin@intel.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series net: intel: start The Great Code Dedup + Page Pool for iavf

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/codegen success Generated files up to date
netdev/tree_selection success Clearly marked for net-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1254 this patch: 1254
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 1143 this patch: 1143
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1295 this patch: 1295
netdev/checkpatch warning WARNING: function definition argument 'long' should also have an identifier name
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 1 this patch: 1
netdev/source_inline success Was 0 now: 0

Commit Message

Alexander Lobakin Nov. 24, 2023, 3:47 p.m. UTC
After commit 5027ec19f104 ("net: page_pool: split the page_pool_params
into fast and slow") that made &page_pool contain only "hot" params at
the start, cacheline boundary chops frag API fields group in the middle
again.
To not bother with this each time fast params get expanded or shrunk,
let's just align them to `4 * sizeof(long)`, the closest upper pow-2 to
their actual size (2 longs + 2 ints). This ensures 16-byte alignment for
the 32-bit architectures and 32-byte alignment for the 64-bit ones,
excluding unnecessary false-sharing.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/net/page_pool/types.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
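
To illustrate the layout the commit message describes, here is a minimal
userspace model (a sketch, not the real struct page_pool; the preceding
fields and their names are placeholders) showing how the alignment
attribute keeps the four frag fields within one 4 * sizeof(long) region:

#include <stdio.h>
#include <stddef.h>

/* Simplified stand-in for struct page_pool: only the shape matters here. */
struct pool_model {
	unsigned long fast_params[3];	/* placeholder for the "hot" params */
	_Bool has_init_callback;

	/* Force the frag group to start on a 4 * sizeof(long) boundary,
	 * i.e. 16 bytes on 32-bit and 32 bytes on 64-bit.
	 */
	long frag_users __attribute__((aligned(4 * sizeof(long))));
	void *frag_page;
	unsigned int frag_offset;
	unsigned int pages_state_hold_cnt;
};

int main(void)
{
	/* On LP64 this prints 32 and 56: the 24-byte group starts at a
	 * 32-byte boundary, so it can never straddle a cacheline.
	 */
	printf("frag_users at %zu, group ends at %zu\n",
	       offsetof(struct pool_model, frag_users),
	       offsetof(struct pool_model, pages_state_hold_cnt) +
	       sizeof(unsigned int));
	return 0;
}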

Comments

Yunsheng Lin Nov. 25, 2023, 12:29 p.m. UTC | #1
On 2023/11/24 23:47, Alexander Lobakin wrote:
> After commit 5027ec19f104 ("net: page_pool: split the page_pool_params
> into fast and slow") that made &page_pool contain only "hot" params at
> the start, cacheline boundary chops frag API fields group in the middle
> again.
> To not bother with this each time fast params get expanded or shrunk,
> let's just align them to `4 * sizeof(long)`, the closest upper pow-2 to
> their actual size (2 longs + 2 ints). This ensures 16-byte alignment for
> the 32-bit architectures and 32-byte alignment for the 64-bit ones,
> excluding unnecessary false-sharing.
> 
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
>  include/net/page_pool/types.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index e1bb92c192de..989d07b831fc 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -127,7 +127,7 @@ struct page_pool {
>  
>  	bool has_init_callback;

It seems odd to have only a slow field between two fast
field groups; isn't it better to move it to the end of
page_pool or wherever is more appropriate?

>  
> -	long frag_users;
> +	long frag_users __aligned(4 * sizeof(long));

If we need that, why not just use '____cacheline_aligned_in_smp'?

>  	struct page *frag_page;
>  	unsigned int frag_offset;
>  	u32 pages_state_hold_cnt;
>
Jakub Kicinski Nov. 26, 2023, 10:54 p.m. UTC | #2
On Fri, 24 Nov 2023 16:47:19 +0100 Alexander Lobakin wrote:
> -	long frag_users;
> +	long frag_users __aligned(4 * sizeof(long));

A comment for the somewhat unusual alignment size would be good.
Alexander Lobakin Nov. 27, 2023, 2:08 p.m. UTC | #3
From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Sat, 25 Nov 2023 20:29:22 +0800

> On 2023/11/24 23:47, Alexander Lobakin wrote:
>> After commit 5027ec19f104 ("net: page_pool: split the page_pool_params
>> into fast and slow") that made &page_pool contain only "hot" params at
>> the start, cacheline boundary chops frag API fields group in the middle
>> again.
>> To not bother with this each time fast params get expanded or shrunk,
>> let's just align them to `4 * sizeof(long)`, the closest upper pow-2 to
>> their actual size (2 longs + 2 ints). This ensures 16-byte alignment for
>> the 32-bit architectures and 32-byte alignment for the 64-bit ones,
>> excluding unnecessary false-sharing.
>>
>> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
>> ---
>>  include/net/page_pool/types.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
>> index e1bb92c192de..989d07b831fc 100644
>> --- a/include/net/page_pool/types.h
>> +++ b/include/net/page_pool/types.h
>> @@ -127,7 +127,7 @@ struct page_pool {
>>  
>>  	bool has_init_callback;
> 
> It seems odd to have only a slow field between two fast
> field groups; isn't it better to move it to the end of
> page_pool or wherever is more appropriate?

1. There will be more in the subsequent patches.
2. ::has_init_callback is checked on each new page allocation, so it's not slow.
   Jakub put it here on purpose.

> 
>>  
>> -	long frag_users;
>> +	long frag_users __aligned(4 * sizeof(long));
> 
> If we need that, why not just use '____cacheline_aligned_in_smp'?

That would be overkill. We don't need a full cacheline, only for these
fields to stay within one, no matter whether they are at the beginning
of it or at the end.

> 
>>  	struct page *frag_page;
>>  	unsigned int frag_offset;
>>  	u32 pages_state_hold_cnt;
>>

Thanks,
Olek
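
To make the "overkill" point concrete, here is a rough userspace
comparison, assuming a 64-byte L1 cacheline and a 64-bit long (the
structs and field names are illustrative, not the kernel definitions):
full-cacheline alignment pads the containing struct out to 64-byte
multiples, while the 4 * sizeof(long) variant reserves only 32 bytes,
which is enough for the group to stay within a single line.

#include <stdio.h>

struct full_cacheline {		/* roughly what ____cacheline_aligned_in_smp would do */
	_Bool before;
	long frag_users __attribute__((aligned(64)));
	void *frag_page;
	unsigned int frag_offset;
	unsigned int hold_cnt;
};

struct group_aligned {		/* what the patch does */
	_Bool before;
	long frag_users __attribute__((aligned(4 * sizeof(long))));
	void *frag_page;
	unsigned int frag_offset;
	unsigned int hold_cnt;
};

int main(void)
{
	/* Typically 128 vs 64 bytes on a 64-bit target: the lighter
	 * alignment wastes far less space while still keeping the four
	 * frag fields inside a single cacheline.
	 */
	printf("full cacheline: %zu bytes\n", sizeof(struct full_cacheline));
	printf("group aligned:  %zu bytes\n", sizeof(struct group_aligned));
	return 0;
}
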
Alexander Lobakin Nov. 27, 2023, 2:12 p.m. UTC | #4
From: Jakub Kicinski <kuba@kernel.org>
Date: Sun, 26 Nov 2023 14:54:57 -0800

> On Fri, 24 Nov 2023 16:47:19 +0100 Alexander Lobakin wrote:
>> -	long frag_users;
>> +	long frag_users __aligned(4 * sizeof(long));
> 
> A comment for the somewhat unusual alignment size would be good.

Roger that. Will paste a couple words from the commit message.

FYI, I had an idea of doing something like

__aligned(roundup_pow_of_2(2 * sizeof(long) + 2 * sizeof(int)))

but that looks horrible, so I settled on the current one :D There is no
functional difference between them either way.

Thanks,
Olek
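
For what it's worth, a quick compile-time check (userspace C11 sketch;
ROUNDUP_POW2 is only a stand-in for the kernel's roundup_pow_of_2(),
valid for the small constants used here) confirms the two spellings
would agree on both data models:

/* Stand-in for roundup_pow_of_2(), valid only for the values seen here. */
#define ROUNDUP_POW2(x)	((x) <= 16 ? 16 : (x) <= 32 ? 32 : 64)

/* 32-bit: 2*4 + 2*4 = 16 -> 16 == 4 * sizeof(long)
 * 64-bit: 2*8 + 2*4 = 24 -> 32 == 4 * sizeof(long)
 */
_Static_assert(ROUNDUP_POW2(2 * sizeof(long) + 2 * sizeof(int)) ==
	       4 * sizeof(long),
	       "the two spellings of the alignment are equivalent");
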
Yunsheng Lin Nov. 29, 2023, 2:55 a.m. UTC | #5
On 2023/11/27 22:08, Alexander Lobakin wrote:
> From: Yunsheng Lin <linyunsheng@huawei.com>
> Date: Sat, 25 Nov 2023 20:29:22 +0800
> 
>> On 2023/11/24 23:47, Alexander Lobakin wrote:
>>> After commit 5027ec19f104 ("net: page_pool: split the page_pool_params
>>> into fast and slow") that made &page_pool contain only "hot" params at
>>> the start, cacheline boundary chops frag API fields group in the middle
>>> again.
>>> To not bother with this each time fast params get expanded or shrunk,
>>> let's just align them to `4 * sizeof(long)`, the closest upper pow-2 to
>>> their actual size (2 longs + 2 ints). This ensures 16-byte alignment for
>>> the 32-bit architectures and 32-byte alignment for the 64-bit ones,
>>> excluding unnecessary false-sharing.
>>>
>>> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
>>> ---
>>>  include/net/page_pool/types.h | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
>>> index e1bb92c192de..989d07b831fc 100644
>>> --- a/include/net/page_pool/types.h
>>> +++ b/include/net/page_pool/types.h
>>> @@ -127,7 +127,7 @@ struct page_pool {
>>>  
>>>  	bool has_init_callback;
>>
>> It seems odd to have only a slow field between two fast
>> field groups; isn't it better to move it to the end of
>> page_pool or wherever is more appropriate?
> 
> 1. There will be more in the subsequent patches.
> 2. ::has_init_callback is checked on each new page allocation, so it's not slow.
>    Jakub put it here on purpose.
> 
>>
>>>  
>>> -	long frag_users;
>>> +	long frag_users __aligned(4 * sizeof(long));
>>
>> If we need that, why not just use '____cacheline_aligned_in_smp'?
> 
> That would be overkill. We don't need a full cacheline, only for these
> fields to stay within one, no matter whether they are at the beginning
> of it or at the end.

I am still a little lost here. A comment explaining why '4' is used in the
above would be really helpful.
Alexander Lobakin Nov. 29, 2023, 1:12 p.m. UTC | #6
From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Wed, 29 Nov 2023 10:55:00 +0800

> On 2023/11/27 22:08, Alexander Lobakin wrote:
>> From: Yunsheng Lin <linyunsheng@huawei.com>
>> Date: Sat, 25 Nov 2023 20:29:22 +0800
>>
>>> On 2023/11/24 23:47, Alexander Lobakin wrote:
>>>> After commit 5027ec19f104 ("net: page_pool: split the page_pool_params
>>>> into fast and slow") that made &page_pool contain only "hot" params at
>>>> the start, cacheline boundary chops frag API fields group in the middle
>>>> again.
>>>> To not bother with this each time fast params get expanded or shrunk,
>>>> let's just align them to `4 * sizeof(long)`, the closest upper pow-2 to
>>>> their actual size (2 longs + 2 ints). This ensures 16-byte alignment for
>>>> the 32-bit architectures and 32-byte alignment for the 64-bit ones,
>>>> excluding unnecessary false-sharing.
>>>>
>>>> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
>>>> ---
>>>>  include/net/page_pool/types.h | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
>>>> index e1bb92c192de..989d07b831fc 100644
>>>> --- a/include/net/page_pool/types.h
>>>> +++ b/include/net/page_pool/types.h
>>>> @@ -127,7 +127,7 @@ struct page_pool {
>>>>  
>>>>  	bool has_init_callback;
>>>
>>> It seems odd to have only a slow field between two fast
>>> field groups; isn't it better to move it to the end of
>>> page_pool or wherever is more appropriate?
>>
>> 1. There will be more in the subsequent patches.
>> 2. ::has_init_callback is checked on each new page allocation, so it's not slow.
>>    Jakub put it here on purpose.
>>
>>>
>>>>  
>>>> -	long frag_users;
>>>> +	long frag_users __aligned(4 * sizeof(long));
>>>
>>> If we need that, why not just use '____cacheline_aligned_in_smp'?
>>
>> That would be overkill. We don't need a full cacheline, only for these
>> fields to stay within one, no matter whether they are at the beginning
>> of it or at the end.
> 
> I am still a little lost here, A comment explaining why using '4' in the
> above would be really helpful here.

The block is: 2 longs (users, frag pointer) and 2 ints (offset, cnt).
On 32-bit architectures, longs and ints are the same size, so this
effectively means 4 longs.
On 64-bit architectures, long is 8 bytes and int is 4, so the size
becomes 2 * 8 + 2 * 4 = 24, but the alignment must be a power of two.
The closest upper power of two to 24 is 32, which equals 4 * 8, i.e.
4 longs.
In the end, regardless of the architecture, the desired alignment ends
up as 4 longs. As I wrote earlier, we could do something like

__aligned(roundup_pow_of_2(2 * sizeof(long) + 2 * sizeof(int)))

but doesn't that seem ugly as hell?

As I replied to Jakub, I'll add a comment in the code (so that you
won't need to refer to the Git history / commit message) in the next
version.

Thanks,
Olek
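
The same arithmetic, written out as compile-time checks (a userspace
sketch assuming the usual ILP32/LP64 data models):

/* 32-bit (ILP32): 2 * 4 + 2 * 4 = 16, already a power of two.
 * 64-bit (LP64):  2 * 8 + 2 * 4 = 24, rounded up to the next power
 *                 of two, 32, which is again 4 * sizeof(long).
 */
_Static_assert(sizeof(long) != 4 || 2 * sizeof(long) + 2 * sizeof(int) == 16,
	       "32-bit: the frag group is 16 bytes");
_Static_assert(sizeof(long) != 8 || 2 * sizeof(long) + 2 * sizeof(int) == 24,
	       "64-bit: the frag group is 24 bytes");
_Static_assert(4 * sizeof(long) >= 2 * sizeof(long) + 2 * sizeof(int),
	       "4 * sizeof(long) always covers the whole group");
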

Patch

diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index e1bb92c192de..989d07b831fc 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -127,7 +127,7 @@ struct page_pool {
 
 	bool has_init_callback;
 
-	long frag_users;
+	long frag_users __aligned(4 * sizeof(long));
 	struct page *frag_page;
 	unsigned int frag_offset;
 	u32 pages_state_hold_cnt;
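
For reference, one possible shape of the comment promised for the next
revision (hypothetical wording based on the commit message, not taken
from an actual later version of the patch):

	/* The frag API block: 2 longs + 2 ints, which is why the group is
	 * aligned to 4 * sizeof(long) (the closest upper power of two to
	 * its size), so that it never spans a cacheline boundary.
	 */
	long frag_users __aligned(4 * sizeof(long));
	struct page *frag_page;
	unsigned int frag_offset;
	u32 pages_state_hold_cnt;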