diff mbox series

[FIX] mm: pcp: fix pcp->free_count reduction on page allocation

Message ID 20250107091724.35287-1-nikhil.dhama@amd.com (mailing list archive)
State New
Headers show
Series [FIX] mm: pcp: fix pcp->free_count reduction on page allocation | expand

Commit Message

Nikhil Dhama Jan. 7, 2025, 9:17 a.m. UTC
In the current PCP auto-tuning design, free_count was introduced to track
consecutive page freeing with a counter. This counter is incremented by
the exact number of pages that are freed, but is halved on allocation.
This causes a 2-node iperf3 client-to-server network bandwidth drop of
30% when the number of client-server pairs is scaled from 32 (where peak
network bandwidth is achieved) to 64.

To fix this issue, on allocation, reduce free_count by the exact number
of pages that are allocated instead of halving it.

On a 2-node AMD server, with one node running iperf3 clients and the
other the iperf3 server, this patch restores the lost performance.

Fixes: 6ccdcb6d3a74 ("mm, pcp: reduce detecting time of consecutive high order page freeing")
Signed-off-by: Nikhil Dhama <nikhil.dhama@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ying Huang <huang.ying.caritas@gmail.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: Bharata B Rao <bharata@amd.com>
Cc: Raghavendra <raghavendra.kodsarathimmappa@amd.com>
---
 mm/page_alloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Andrew Morton Jan. 8, 2025, 5:05 a.m. UTC | #1
On Tue, 7 Jan 2025 14:47:24 +0530 Nikhil Dhama <nikhil.dhama@amd.com> wrote:

> In the current PCP auto-tuning design, free_count was introduced to track
> consecutive page freeing with a counter. This counter is incremented by
> the exact number of pages that are freed, but is halved on allocation.
> This causes a 2-node iperf3 client-to-server network bandwidth drop of
> 30% when the number of client-server pairs is scaled from 32 (where peak
> network bandwidth is achieved) to 64.
> 
> To fix this issue, on allocation, reduce free_count by the exact number
> of pages that are allocated instead of halving it.

The present division by two appears to be somewhat arbitrarily chosen.
And as far as I can tell, this patch proposes replacing it with another
somewhat arbitrary adjustment.

What's the actual design here?  What are we attempting to do and why,
and why is the proposed design superior to the present one?

> On a 2-node AMD server, with one node running iperf3 clients and the
> other the iperf3 server, this patch restores the lost performance.

Nice, but might other workloads on other machines get slower?
Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cae7b93864c2..e2a8ec5584f8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3037,10 +3037,10 @@  static struct page *rmqueue_pcplist(struct zone *preferred_zone,
 
 	/*
 	 * On allocation, reduce the number of pages that are batch freed.
-	 * See nr_pcp_free() where free_factor is increased for subsequent
+	 * See free_unref_page_commit() where free_count is increased for subsequent
 	 * frees.
 	 */
-	pcp->free_count >>= 1;
+	pcp->free_count -= (1 << order);
 	list = &pcp->lists[order_to_pindex(migratetype, order)];
 	page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list);
 	pcp_spin_unlock(pcp);