Message ID | 20220624093621.12505-1-huangguangbin2@huawei.com |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | [net-next] net: page_pool: optimize page pool page allocation in NUMA scenario |
On 24/06/2022 11.36, Guangbin Huang wrote:
> From: Jie Wang <wangjie125@huawei.com>
>
> Currently NIC packet receiving performance based on page pool deteriorates
> occasionally. To analyze the causes of this problem, page allocation stats
> were collected. Here are the stats when NIC rx performance deteriorates:
>
> bandwidth(Gbits/s)                16.8       6.91
> rx_pp_alloc_fast              13794308   21141869
> rx_pp_alloc_slow                108625     166481
> rx_pp_alloc_slow_ho                  0          0
> rx_pp_alloc_empty                 8192       8192
> rx_pp_alloc_refill                   0          0
> rx_pp_alloc_waive               100433     158289
> rx_pp_recycle_cached                 0          0
> rx_pp_recycle_cache_full             0          0
> rx_pp_recycle_ring              362400     420281
> rx_pp_recycle_ring_full        6064893    9709724
> rx_pp_recycle_released_ref           0          0
>
> The rx_pp_alloc_waive count indicates that a large number of pages' NUMA
> node is inconsistent with the NIC device's NUMA node, so these pages
> can't be reused by the page pool. As a result, many new pages have to be
> allocated by __page_pool_alloc_pages_slow, which is time consuming. This
> causes the NIC rx performance fluctuations.
>
> The main reason for the huge number of NUMA-mismatched pages in the page
> pool is that the page pool uses alloc_pages_bulk_array to allocate the
> original pages. This function is not suitable for page allocation in a
> NUMA scenario. So this patch uses alloc_pages_bulk_array_node, which has
> a NUMA id input parameter, to ensure NUMA consistency between the NIC
> device and the allocated pages.
>
> Repeated NIC rx performance tests were performed 40 times. NIC rx
> bandwidth is higher and more stable compared to the data above. Here are
> three test stats; the rx_pp_alloc_waive count is zero, and
> rx_pp_alloc_slow, which counts pages allocated from the slow path, is
> relatively low.
>
> bandwidth(Gbits/s)                  93      93.9      93.8
> rx_pp_alloc_fast              60066264  61266386  60938254
> rx_pp_alloc_slow                 16512     16517     16539
> rx_pp_alloc_slow_ho                  0         0         0
> rx_pp_alloc_empty                16512     16517     16539
> rx_pp_alloc_refill              473841    481910    481585
> rx_pp_alloc_waive                    0         0         0
> rx_pp_recycle_cached                 0         0         0
> rx_pp_recycle_cache_full             0         0         0
> rx_pp_recycle_ring            29754145  30358243  30194023
> rx_pp_recycle_ring_full              0         0         0
> rx_pp_recycle_released_ref           0         0         0
>
> Signed-off-by: Jie Wang <wangjie125@huawei.com>
> ---
>  net/core/page_pool.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)

Thanks for improving this, but we need some small adjustments below.
And then you need to send a V2 of the patch.

> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index f18e6e771993..15997fcd78f3 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -377,6 +377,7 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
>  	unsigned int pp_order = pool->p.order;
>  	struct page *page;
>  	int i, nr_pages;
> +	int pref_nid; /* preferred NUMA node */
>
>  	/* Don't support bulk alloc for high-order pages */
>  	if (unlikely(pp_order))
> @@ -386,10 +387,18 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
>  	if (unlikely(pool->alloc.count > 0))
>  		return pool->alloc.cache[--pool->alloc.count];
>
> +#ifdef CONFIG_NUMA
> +	pref_nid = (pool->p.nid == NUMA_NO_NODE) ? numa_mem_id() : pool->p.nid;
> +#else
> +	/* Ignore pool->p.nid setting if !CONFIG_NUMA, helps compiler */

Remove "helps compiler" from the comments; it only makes sense in the
code this was copy-pasted from.

> +	pref_nid = numa_mem_id(); /* will be zero like page_to_nid() */

The comment about "page_to_nid()" is only relevant in the code
this was copy-pasted from.

Change to:
	pref_nid = NUMA_NO_NODE;

As alloc_pages_bulk_array_node() will be inlined, the effect (generated
asm code) will be the same, but it will be better for code maintenance.

> +#endif
> +
>  	/* Mark empty alloc.cache slots "empty" for alloc_pages_bulk_array */
>  	memset(&pool->alloc.cache, 0, sizeof(void *) * bulk);
>
> -	nr_pages = alloc_pages_bulk_array(gfp, bulk, pool->alloc.cache);
> +	nr_pages = alloc_pages_bulk_array_node(gfp, pref_nid, bulk,
> +					       pool->alloc.cache);
>  	if (unlikely(!nr_pages))
>  		return NULL;
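
For context on the inlining argument above: alloc_pages_bulk_array_node()
is a static inline helper that already maps NUMA_NO_NODE to the local
memory node. A minimal sketch, approximating include/linux/gfp.h as of
kernels around v5.19 (not copied verbatim; check your tree for the
authoritative definition):

	/* Sketch of the inline bulk-alloc helper; approximation only */
	static inline unsigned long
	alloc_pages_bulk_array_node(gfp_t gfp, int nid, unsigned long nr_pages,
				    struct page **page_array)
	{
		if (nid == NUMA_NO_NODE)
			nid = numa_mem_id();	/* resolve to the local memory node */

		return __alloc_pages_bulk(gfp, nid, NULL, nr_pages, NULL, page_array);
	}

So passing NUMA_NO_NODE from the !CONFIG_NUMA branch costs nothing after
inlining, which is why the review prefers it for maintainability.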
On 2022/6/27 17:50, Jesper Dangaard Brouer wrote:
> On 24/06/2022 11.36, Guangbin Huang wrote:
>> From: Jie Wang <wangjie125@huawei.com>
>>
>> Currently NIC packet receiving performance based on page pool
>> deteriorates occasionally. To analyze the causes of this problem,
>> page allocation stats were collected. Here are the stats when NIC rx
>> performance deteriorates:
>>
>> [...]
>>
>> Signed-off-by: Jie Wang <wangjie125@huawei.com>
>
> Thanks for improving this, but we need some small adjustments below.
> And then you need to send a V2 of the patch.
>
>> +#ifdef CONFIG_NUMA
>> +	pref_nid = (pool->p.nid == NUMA_NO_NODE) ? numa_mem_id() : pool->p.nid;
>> +#else
>> +	/* Ignore pool->p.nid setting if !CONFIG_NUMA, helps compiler */
>
> Remove "helps compiler" from the comments; it only makes sense in the
> code this was copy-pasted from.
>
>> +	pref_nid = numa_mem_id(); /* will be zero like page_to_nid() */
>
> The comment about "page_to_nid()" is only relevant in the code
> this was copy-pasted from.
>
> Change to:
> 	pref_nid = NUMA_NO_NODE;
>
> As alloc_pages_bulk_array_node() will be inlined, the effect (generated
> asm code) will be the same, but it will be better for code maintenance.

OK, thanks for your review, I will fix it in the next version.

>> +#endif
>> +
>>  	/* Mark empty alloc.cache slots "empty" for alloc_pages_bulk_array */
>>  	memset(&pool->alloc.cache, 0, sizeof(void *) * bulk);
>>
>> -	nr_pages = alloc_pages_bulk_array(gfp, bulk, pool->alloc.cache);
>> +	nr_pages = alloc_pages_bulk_array_node(gfp, pref_nid, bulk,
>> +					       pool->alloc.cache);
>>  	if (unlikely(!nr_pages))
>>  		return NULL;
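
The pref_nid logic only takes effect if the driver records its device's
NUMA node in pool->p.nid when creating the pool. A minimal, hypothetical
driver-side sketch (field names follow struct page_pool_params as of
roughly v5.19; my_create_rx_page_pool and the pool_size value are
placeholders, not from the patch):

	#include <net/page_pool.h>

	struct page_pool *my_create_rx_page_pool(struct device *dev)
	{
		struct page_pool_params pp_params = {
			.order		= 0,			/* 0-order (4K) pages */
			.pool_size	= 1024,			/* placeholder ring size */
			.nid		= dev_to_node(dev),	/* allocate on the NIC's node */
			.dev		= dev,
		};

		/* Returns a valid pool or an ERR_PTR() on failure */
		return page_pool_create(&pp_params);
	}

With .nid set this way, the CONFIG_NUMA branch in
__page_pool_alloc_pages_slow() picks the device-local node instead of
whatever node the allocating CPU happens to run on.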
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index f18e6e771993..15997fcd78f3 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -377,6 +377,7 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
 	unsigned int pp_order = pool->p.order;
 	struct page *page;
 	int i, nr_pages;
+	int pref_nid; /* preferred NUMA node */
 
 	/* Don't support bulk alloc for high-order pages */
 	if (unlikely(pp_order))
@@ -386,10 +387,18 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
 	if (unlikely(pool->alloc.count > 0))
 		return pool->alloc.cache[--pool->alloc.count];
 
+#ifdef CONFIG_NUMA
+	pref_nid = (pool->p.nid == NUMA_NO_NODE) ? numa_mem_id() : pool->p.nid;
+#else
+	/* Ignore pool->p.nid setting if !CONFIG_NUMA, helps compiler */
+	pref_nid = numa_mem_id(); /* will be zero like page_to_nid() */
+#endif
+
 	/* Mark empty alloc.cache slots "empty" for alloc_pages_bulk_array */
 	memset(&pool->alloc.cache, 0, sizeof(void *) * bulk);
 
-	nr_pages = alloc_pages_bulk_array(gfp, bulk, pool->alloc.cache);
+	nr_pages = alloc_pages_bulk_array_node(gfp, pref_nid, bulk,
+					       pool->alloc.cache);
 	if (unlikely(!nr_pages))
 		return NULL;
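
Applying the review feedback to the hunk above would plausibly yield
something like the following in V2 (a sketch of the requested change,
not the actual V2 patch, which was sent separately):

	#ifdef CONFIG_NUMA
		pref_nid = (pool->p.nid == NUMA_NO_NODE) ? numa_mem_id() : pool->p.nid;
	#else
		/* Ignore pool->p.nid setting if !CONFIG_NUMA */
		pref_nid = NUMA_NO_NODE;	/* resolved inside alloc_pages_bulk_array_node() */
	#endif

		/* Mark empty alloc.cache slots "empty" for alloc_pages_bulk_array */
		memset(&pool->alloc.cache, 0, sizeof(void *) * bulk);

		nr_pages = alloc_pages_bulk_array_node(gfp, pref_nid, bulk,
						       pool->alloc.cache);

The behavior is unchanged: with CONFIG_NUMA the pool's configured node
(or the local memory node) is used, and without it the inlined helper
resolves NUMA_NO_NODE itself, keeping the page_pool code free of the
copy-pasted comments the review objected to.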