
[net-next] net: page_pool: optimize page pool page allocation in NUMA scenario

Message ID 20220624093621.12505-1-huangguangbin2@huawei.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series [net-next] net: page_pool: optimize page pool page allocation in NUMA scenario

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 2 this patch: 2
netdev/cc_maintainers warning 8 maintainers not CCed: daniel@iogearbox.net songliubraving@fb.com ast@kernel.org yhs@fb.com john.fastabend@gmail.com kafai@fb.com andrii@kernel.org kpsingh@kernel.org
netdev/build_clang success Errors and warnings before: 6 this patch: 6
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 2 this patch: 2
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 26 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Guangbin Huang June 24, 2022, 9:36 a.m. UTC
From: Jie Wang <wangjie125@huawei.com>

Currently, NIC packet receiving performance based on page pool
occasionally deteriorates. To analyze the causes of this problem, page
allocation stats were collected. Here are the stats when NIC rx
performance deteriorates:

bandwidth(Gbits/s)		16.8		6.91
rx_pp_alloc_fast		13794308	21141869
rx_pp_alloc_slow		108625		166481
rx_pp_alloc_slow_ho		0		0
rx_pp_alloc_empty		8192		8192
rx_pp_alloc_refill		0		0
rx_pp_alloc_waive		100433		158289
rx_pp_recycle_cached		0		0
rx_pp_recycle_cache_full	0		0
rx_pp_recycle_ring		362400		420281
rx_pp_recycle_ring_full		6064893		9709724
rx_pp_recycle_released_ref	0		0

The rx_pp_alloc_waive count indicates that the NUMA node of a large
number of pages is inconsistent with the NIC device's NUMA node.
Therefore these pages can't be reused by the page pool, and many new
pages have to be allocated by __page_pool_alloc_pages_slow, which is
time consuming. This causes the NIC rx performance to fluctuate.
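
To see why mismatched pages are waived rather than reused: on the
recycle/refill path, pages pulled back from the ptr_ring are only kept
if they sit on the pool's preferred NUMA node. A simplified sketch of
that check in page_pool_refill_alloc_cache() (paraphrased from
net/core/page_pool.c, not the exact code):

	do {
		page = __ptr_ring_consume(r);
		if (!page)
			break;

		if (likely(page_to_nid(page) == pref_nid)) {
			/* NUMA match: keep the page in the alloc cache */
			pool->alloc.cache[pool->alloc.count++] = page;
		} else {
			/* NUMA mismatch: return the page to the buddy
			 * allocator, count it as waived and fall back to
			 * the slow allocation path.
			 */
			page_pool_return_page(pool, page);
			alloc_stat_inc(pool, waive);
			page = NULL;
			break;
		}
	} while (pool->alloc.count < PP_ALLOC_CACHE_REFILL);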

The main reason for the large number of NUMA-mismatched pages in the
page pool is that the page pool uses alloc_pages_bulk_array to allocate
the original pages. This function is not suitable for page allocation
in a NUMA scenario. So this patch uses alloc_pages_bulk_array_node,
which takes a NUMA node id as an input parameter, to ensure NUMA
consistency between the NIC device and the allocated pages.
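
For reference, alloc_pages_bulk_array_node() is a thin inline wrapper
around __alloc_pages_bulk() that takes the node id explicitly
(paraphrased from include/linux/gfp.h of this kernel version; the exact
definition may differ):

	static inline unsigned long
	alloc_pages_bulk_array_node(gfp_t gfp, int nid, unsigned long nr_pages,
				    struct page **page_array)
	{
		/* NUMA_NO_NODE falls back to the local memory node */
		if (nid == NUMA_NO_NODE)
			nid = numa_mem_id();

		return __alloc_pages_bulk(gfp, nid, NULL, nr_pages, NULL, page_array);
	}

alloc_pages_bulk_array(), by contrast, always allocates on
numa_mem_id(), i.e. the node of the CPU currently running, which is not
necessarily the NIC's node.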

The NIC rx performance test was repeated 40 times. NIC rx bandwidth is
higher and more stable compared to the data above. Here are three test
stats; the rx_pp_alloc_waive count is zero, and rx_pp_alloc_slow, which
counts pages allocated from the slow path, is relatively low.

bandwidth(Gbits/s)		93		93.9		93.8
rx_pp_alloc_fast		60066264	61266386	60938254
rx_pp_alloc_slow		16512		16517		16539
rx_pp_alloc_slow_ho		0		0		0
rx_pp_alloc_empty		16512		16517		16539
rx_pp_alloc_refill		473841		481910		481585
rx_pp_alloc_waive		0		0		0
rx_pp_recycle_cached		0		0		0
rx_pp_recycle_cache_full	0		0		0
rx_pp_recycle_ring		29754145	30358243	30194023
rx_pp_recycle_ring_full		0		0		0
rx_pp_recycle_released_ref	0		0		0

Signed-off-by: Jie Wang <wangjie125@huawei.com>
---
 net/core/page_pool.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Jesper Dangaard Brouer June 27, 2022, 9:50 a.m. UTC | #1
On 24/06/2022 11.36, Guangbin Huang wrote:
> From: Jie Wang <wangjie125@huawei.com>
> 
> Currently, NIC packet receiving performance based on page pool
> occasionally deteriorates. To analyze the causes of this problem, page
> allocation stats were collected. Here are the stats when NIC rx
> performance deteriorates:
> 
> bandwidth(Gbits/s)		16.8		6.91
> rx_pp_alloc_fast		13794308	21141869
> rx_pp_alloc_slow		108625		166481
> rx_pp_alloc_slow_ho		0		0
> rx_pp_alloc_empty		8192		8192
> rx_pp_alloc_refill		0		0
> rx_pp_alloc_waive		100433		158289
> rx_pp_recycle_cached		0		0
> rx_pp_recycle_cache_full	0		0
> rx_pp_recycle_ring		362400		420281
> rx_pp_recycle_ring_full		6064893		9709724
> rx_pp_recycle_released_ref	0		0
> 
> The rx_pp_alloc_waive count indicates that the NUMA node of a large
> number of pages is inconsistent with the NIC device's NUMA node.
> Therefore these pages can't be reused by the page pool, and many new
> pages have to be allocated by __page_pool_alloc_pages_slow, which is
> time consuming. This causes the NIC rx performance to fluctuate.
> 
> The main reason for the large number of NUMA-mismatched pages in the
> page pool is that the page pool uses alloc_pages_bulk_array to allocate
> the original pages. This function is not suitable for page allocation
> in a NUMA scenario. So this patch uses alloc_pages_bulk_array_node,
> which takes a NUMA node id as an input parameter, to ensure NUMA
> consistency between the NIC device and the allocated pages.
> 
> The NIC rx performance test was repeated 40 times. NIC rx bandwidth is
> higher and more stable compared to the data above. Here are three test
> stats; the rx_pp_alloc_waive count is zero, and rx_pp_alloc_slow, which
> counts pages allocated from the slow path, is relatively low.
> 
> bandwidth(Gbits/s)		93		93.9		93.8
> rx_pp_alloc_fast		60066264	61266386	60938254
> rx_pp_alloc_slow		16512		16517		16539
> rx_pp_alloc_slow_ho		0		0		0
> rx_pp_alloc_empty		16512		16517		16539
> rx_pp_alloc_refill		473841		481910		481585
> rx_pp_alloc_waive		0		0		0
> rx_pp_recycle_cached		0		0		0
> rx_pp_recycle_cache_full	0		0		0
> rx_pp_recycle_ring		29754145	30358243	30194023
> rx_pp_recycle_ring_full		0		0		0
> rx_pp_recycle_released_ref	0		0		0
> 
> Signed-off-by: Jie Wang <wangjie125@huawei.com>
> ---
>   net/core/page_pool.c | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)

Thanks for improving this, but we need some small adjustments below.
And then you need to send a V2 of the patch.

> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index f18e6e771993..15997fcd78f3 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -377,6 +377,7 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
>   	unsigned int pp_order = pool->p.order;
>   	struct page *page;
>   	int i, nr_pages;
> +	int pref_nid; /* preferred NUMA node */
>   
>   	/* Don't support bulk alloc for high-order pages */
>   	if (unlikely(pp_order))
> @@ -386,10 +387,18 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
>   	if (unlikely(pool->alloc.count > 0))
>   		return pool->alloc.cache[--pool->alloc.count];
>   
> +#ifdef CONFIG_NUMA
> +	pref_nid = (pool->p.nid == NUMA_NO_NODE) ? numa_mem_id() : pool->p.nid;
> +#else
> +	/* Ignore pool->p.nid setting if !CONFIG_NUMA, helps compiler */

Remove "helps compiler" from comments, it only make sense in the code
this was copy-pasted from.


> +	pref_nid = numa_mem_id(); /* will be zero like page_to_nid() */

The comment about "page_to_nid()" is only relevant in the code
this was copy-pasted from.

Change to:
	pref_nid = NUMA_NO_NODE;

As alloc_pages_bulk_array_node() will be inlined, the effect (generated 
asm code) will be the same, but it will be better for code maintenance.
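
I.e. the !CONFIG_NUMA branch would then read (sketch of the suggested
change):

#else
	/* Ignore pool->p.nid setting if !CONFIG_NUMA */
	pref_nid = NUMA_NO_NODE;
#endif

alloc_pages_bulk_array_node() maps NUMA_NO_NODE to numa_mem_id()
internally, so the allocation behavior is unchanged.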

> +#endif
> +
>   	/* Mark empty alloc.cache slots "empty" for alloc_pages_bulk_array */
>   	memset(&pool->alloc.cache, 0, sizeof(void *) * bulk);
>   
> -	nr_pages = alloc_pages_bulk_array(gfp, bulk, pool->alloc.cache);
> +	nr_pages = alloc_pages_bulk_array_node(gfp, pref_nid, bulk,
> +					       pool->alloc.cache);
>   	if (unlikely(!nr_pages))
>   		return NULL;
>
wangjie (L) June 27, 2022, 1:04 p.m. UTC | #2
On 2022/6/27 17:50, Jesper Dangaard Brouer wrote:
>
>
> On 24/06/2022 11.36, Guangbin Huang wrote:
>> From: Jie Wang <wangjie125@huawei.com>
>>
>> Currently, NIC packet receiving performance based on page pool
>> occasionally deteriorates. To analyze the causes of this problem, page
>> allocation stats were collected. Here are the stats when NIC rx
>> performance deteriorates:
>>
>> bandwidth(Gbits/s)        16.8        6.91
>> rx_pp_alloc_fast        13794308    21141869
>> rx_pp_alloc_slow        108625        166481
>> rx_pp_alloc_slow_ho        0        0
>> rx_pp_alloc_empty        8192        8192
>> rx_pp_alloc_refill        0        0
>> rx_pp_alloc_waive        100433        158289
>> rx_pp_recycle_cached        0        0
>> rx_pp_recycle_cache_full    0        0
>> rx_pp_recycle_ring        362400        420281
>> rx_pp_recycle_ring_full        6064893        9709724
>> rx_pp_recycle_released_ref    0        0
>>
>> The rx_pp_alloc_waive count indicates that the NUMA node of a large
>> number of pages is inconsistent with the NIC device's NUMA node.
>> Therefore these pages can't be reused by the page pool, and many new
>> pages have to be allocated by __page_pool_alloc_pages_slow, which is
>> time consuming. This causes the NIC rx performance to fluctuate.
>>
>> The main reason for the large number of NUMA-mismatched pages in the
>> page pool is that the page pool uses alloc_pages_bulk_array to allocate
>> the original pages. This function is not suitable for page allocation
>> in a NUMA scenario. So this patch uses alloc_pages_bulk_array_node,
>> which takes a NUMA node id as an input parameter, to ensure NUMA
>> consistency between the NIC device and the allocated pages.
>>
>> The NIC rx performance test was repeated 40 times. NIC rx bandwidth is
>> higher and more stable compared to the data above. Here are three test
>> stats; the rx_pp_alloc_waive count is zero, and rx_pp_alloc_slow, which
>> counts pages allocated from the slow path, is relatively low.
>>
>> bandwidth(Gbits/s)        93        93.9        93.8
>> rx_pp_alloc_fast        60066264    61266386    60938254
>> rx_pp_alloc_slow        16512        16517        16539
>> rx_pp_alloc_slow_ho        0        0        0
>> rx_pp_alloc_empty        16512        16517        16539
>> rx_pp_alloc_refill        473841        481910        481585
>> rx_pp_alloc_waive        0        0        0
>> rx_pp_recycle_cached        0        0        0
>> rx_pp_recycle_cache_full    0        0        0
>> rx_pp_recycle_ring        29754145    30358243    30194023
>> rx_pp_recycle_ring_full        0        0        0
>> rx_pp_recycle_released_ref    0        0        0
>>
>> Signed-off-by: Jie Wang <wangjie125@huawei.com>
>> ---
>>   net/core/page_pool.c | 11 ++++++++++-
>>   1 file changed, 10 insertions(+), 1 deletion(-)
>
> Thanks for improving this, but we need some small adjustments below.
> And then you need to send a V2 of the patch.
>
>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>> index f18e6e771993..15997fcd78f3 100644
>> --- a/net/core/page_pool.c
>> +++ b/net/core/page_pool.c
>> @@ -377,6 +377,7 @@ static struct page
>> *__page_pool_alloc_pages_slow(struct page_pool *pool,
>>       unsigned int pp_order = pool->p.order;
>>       struct page *page;
>>       int i, nr_pages;
>> +    int pref_nid; /* preferred NUMA node */
>>         /* Don't support bulk alloc for high-order pages */
>>       if (unlikely(pp_order))
>> @@ -386,10 +387,18 @@ static struct page
>> *__page_pool_alloc_pages_slow(struct page_pool *pool,
>>       if (unlikely(pool->alloc.count > 0))
>>           return pool->alloc.cache[--pool->alloc.count];
>>   +#ifdef CONFIG_NUMA
>> +    pref_nid = (pool->p.nid == NUMA_NO_NODE) ? numa_mem_id() :
>> pool->p.nid;
>> +#else
>> +    /* Ignore pool->p.nid setting if !CONFIG_NUMA, helps compiler */
>
> Remove "helps compiler" from comments, it only make sense in the code
> this was copy-pasted from.
>
>
>> +    pref_nid = numa_mem_id(); /* will be zero like page_to_nid() */
>
> The comment about "page_to_nid()" is only relevant in the code
> this was copy-pasted from.
>
> Change to:
>     pref_nid = NUMA_NO_NODE;
>
> As alloc_pages_bulk_array_node() will be inlined, the effect (generated
> asm code) will be the same, but it will be better for code maintenance.
>
OK, thanks for your review, I will fix it in the next version.
>> +#endif
>> +
>>       /* Mark empty alloc.cache slots "empty" for
>> alloc_pages_bulk_array */
>>       memset(&pool->alloc.cache, 0, sizeof(void *) * bulk);
>>   -    nr_pages = alloc_pages_bulk_array(gfp, bulk, pool->alloc.cache);
>> +    nr_pages = alloc_pages_bulk_array_node(gfp, pref_nid, bulk,
>> +                           pool->alloc.cache);
>>       if (unlikely(!nr_pages))
>>           return NULL;
>>
>
>
> .
>

Patch

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index f18e6e771993..15997fcd78f3 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -377,6 +377,7 @@  static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
 	unsigned int pp_order = pool->p.order;
 	struct page *page;
 	int i, nr_pages;
+	int pref_nid; /* preferred NUMA node */
 
 	/* Don't support bulk alloc for high-order pages */
 	if (unlikely(pp_order))
@@ -386,10 +387,18 @@  static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
 	if (unlikely(pool->alloc.count > 0))
 		return pool->alloc.cache[--pool->alloc.count];
 
+#ifdef CONFIG_NUMA
+	pref_nid = (pool->p.nid == NUMA_NO_NODE) ? numa_mem_id() : pool->p.nid;
+#else
+	/* Ignore pool->p.nid setting if !CONFIG_NUMA, helps compiler */
+	pref_nid = numa_mem_id(); /* will be zero like page_to_nid() */
+#endif
+
 	/* Mark empty alloc.cache slots "empty" for alloc_pages_bulk_array */
 	memset(&pool->alloc.cache, 0, sizeof(void *) * bulk);
 
-	nr_pages = alloc_pages_bulk_array(gfp, bulk, pool->alloc.cache);
+	nr_pages = alloc_pages_bulk_array_node(gfp, pref_nid, bulk,
+					       pool->alloc.cache);
 	if (unlikely(!nr_pages))
 		return NULL;
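
For context, the preferred node used above ultimately comes from the
driver at pool creation time via page_pool_params.nid. A minimal sketch
of the driver side (hypothetical driver code; "dev" and "ring_size" are
placeholders, not taken from this patch):

	/* hypothetical setup; values depend on the driver */
	struct page_pool_params pp_params = {
		.flags     = PP_FLAG_DMA_MAP,
		.order     = 0,
		.pool_size = ring_size,
		.nid       = dev_to_node(dev),	/* NIC's NUMA node */
		.dev       = dev,
		.dma_dir   = DMA_FROM_DEVICE,
	};
	struct page_pool *pool = page_pool_create(&pp_params);

Drivers that pass NUMA_NO_NODE keep the old behavior, since the pool
then falls back to numa_mem_id().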