From patchwork Sun Aug 4 08:01:05 2024
From: Yafang Shao <laoar.shao@gmail.com>
To: akpm@linux-foundation.org
Cc: ying.huang@intel.com, mgorman@techsingularity.net, linux-mm@kvack.org,
    Yafang Shao <laoar.shao@gmail.com>
Subject: [PATCH v3 1/3] mm/page_alloc: A minor fix to the calculation of
    pcp->free_count
Date: Sun, 4 Aug 2024 16:01:05 +0800
Message-Id: <20240804080107.21094-2-laoar.shao@gmail.com>
In-Reply-To: <20240804080107.21094-1-laoar.shao@gmail.com>
References: <20240804080107.21094-1-laoar.shao@gmail.com>
Currently, at worst, pcp->free_count can reach (batch - 1 + (1 << MAX_ORDER)),
which may exceed the expected maximum of (batch << CONFIG_PCP_BATCH_SCALE_MAX).
This issue was identified through code review; no problems have been observed
in practice.
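The overshoot and the effect of the min() clamp can be illustrated with a
small sketch (plain Python with assumed example values, not kernel code):

```python
# Illustrative sketch of the pcp->free_count guard; batch and MAX_ORDER
# values below are assumed examples, not taken from a real machine.
PCP_BATCH_SCALE_MAX = 5
MAX_ORDER = 10                           # a max-order free adds 1 << 10 pages
batch = 31
cap = batch << PCP_BATCH_SCALE_MAX       # 992, the intended upper bound

free_count = cap - 1                     # still passes the "< cap" check
old = free_count + (1 << MAX_ORDER)      # unpatched: 991 + 1024 = 2015
new = min(free_count + (1 << MAX_ORDER), cap)  # patched: clamped to the cap

assert old > cap                         # the overshoot the patch fixes
assert new == cap                        # clamped as intended
```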
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
---
 mm/page_alloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 47c0ce1d6fa1..d9371806f6b5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2614,7 +2614,8 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 		pcp->flags &= ~PCPF_PREV_FREE_HIGH_ORDER;
 	}
 	if (pcp->free_count < (batch << CONFIG_PCP_BATCH_SCALE_MAX))
-		pcp->free_count += (1 << order);
+		pcp->free_count = min(pcp->free_count + (1 << order),
+				      batch << CONFIG_PCP_BATCH_SCALE_MAX);

 	high = nr_pcp_high(pcp, zone, batch, free_high);
 	if (pcp->count >= high) {
 		free_pcppages_bulk(zone, nr_pcp_free(pcp, batch, high, free_high),

From patchwork Sun Aug 4 08:01:06 2024
From: Yafang Shao <laoar.shao@gmail.com>
To: akpm@linux-foundation.org
Cc: ying.huang@intel.com, mgorman@techsingularity.net, linux-mm@kvack.org,
    Yafang Shao <laoar.shao@gmail.com>
Subject: [PATCH v3 2/3] mm/page_alloc: Avoid changing pcp->high decaying
    when adjusting CONFIG_PCP_BATCH_SCALE_MAX
Date: Sun, 4 Aug 2024 16:01:06 +0800
Message-Id: <20240804080107.21094-3-laoar.shao@gmail.com>
In-Reply-To: <20240804080107.21094-1-laoar.shao@gmail.com>
References: <20240804080107.21094-1-laoar.shao@gmail.com>
When adjusting the CONFIG_PCP_BATCH_SCALE_MAX configuration from its default
value of 5 to a lower value, such as 0, it's important to ensure that the
pcp->high decaying is not inadvertently slowed down. Similarly, when
increasing CONFIG_PCP_BATCH_SCALE_MAX to a larger value, like 6, we must
avoid inadvertently increasing the number of pages freed in
free_pcppages_bulk() as a result of this change.

Therefore, the following improvements are made:

- hardcode the default value of 5 to avoid modifying pcp->high
- split the free_pcppages_bulk() call into multiple batched steps

Suggested-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 mm/page_alloc.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d9371806f6b5..5a842cc13314 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2323,7 +2323,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 int decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp)
 {
 	int high_min, to_drain, batch;
-	int todo = 0;
+	int todo = 0, count = 0;

 	high_min = READ_ONCE(pcp->high_min);
 	batch = READ_ONCE(pcp->batch);
@@ -2333,18 +2333,26 @@ int decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp)
 	 * control latency. This caps pcp->high decrement too.
 	 */
 	if (pcp->high > high_min) {
-		pcp->high = max3(pcp->count - (batch << CONFIG_PCP_BATCH_SCALE_MAX),
+		/*
+		 * We will decay 1/8 pcp->high each time in general, so that the
+		 * idle PCP pages can be returned to buddy system timely. To
+		 * control the max latency of decay, we also constrain the
+		 * number pages freed each time.
+		 */
+		pcp->high = max3(pcp->count - (batch << 5),
 				 pcp->high - (pcp->high >> 3), high_min);
 		if (pcp->high > high_min)
 			todo++;
 	}

 	to_drain = pcp->count - pcp->high;
-	if (to_drain > 0) {
+	while (count < to_drain) {
 		spin_lock(&pcp->lock);
-		free_pcppages_bulk(zone, to_drain, pcp, 0);
+		free_pcppages_bulk(zone, min(batch, to_drain - count), pcp, 0);
 		spin_unlock(&pcp->lock);
+		count += batch;
 		todo++;
+		cond_resched();
 	}

 	return todo;

From patchwork Sun Aug 4 08:01:07 2024
From: Yafang Shao <laoar.shao@gmail.com>
To: akpm@linux-foundation.org
Cc: ying.huang@intel.com, mgorman@techsingularity.net, linux-mm@kvack.org,
    Yafang Shao <laoar.shao@gmail.com>, Matthew Wilcox, David Rientjes
Subject: [PATCH v3 3/3] mm/page_alloc: Introduce a new sysctl knob
    vm.pcp_batch_scale_max
Date: Sun, 4 Aug 2024 16:01:07 +0800
Message-Id: <20240804080107.21094-4-laoar.shao@gmail.com>
In-Reply-To: <20240804080107.21094-1-laoar.shao@gmail.com>
References: <20240804080107.21094-1-laoar.shao@gmail.com>
Larger page allocation/freeing batch numbers may cause longer run times for
code holding zone->lock. If zone->lock is heavily contended at the same time,
latency spikes may occur even for casual page allocation/freeing. Although
reducing the batch number cannot make zone->lock contention lighter, it can
reduce the latency spikes effectively.

To demonstrate this, I wrote a Python script:

  import mmap

  size = 6 * 1024**3
  while True:
      mm = mmap.mmap(-1, size)
      mm[:] = b'\xff' * size
      mm.close()

Run this script 10 times in parallel and measure the allocation latency by
measuring the duration of rmqueue_bulk() with the BCC tool funclatency[0]:

  funclatency -T -i 600 rmqueue_bulk

Here are the results for both AMD and Intel CPUs.
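As an aside, the "run 10 times in parallel" step can be sketched as below (my
assumption of the harness; any mechanism that launches 10 concurrent
instances works equally well). Sizes and iteration counts are scaled down so
the sketch terminates; the original used a 6 GiB mapping in an endless loop.

```python
# Hypothetical parallel harness for the mmap stress script above.
import mmap
from multiprocessing import Process

def stress(size, rounds):
    for _ in range(rounds):
        mm = mmap.mmap(-1, size)    # anonymous mapping
        mm[:] = b'\xff' * size      # touch every page to force allocation
        mm.close()                  # unmap, freeing pages back to the kernel

if __name__ == "__main__":
    procs = [Process(target=stress, args=(8 * 1024**2, 2)) for _ in range(10)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```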
AMD EPYC 7W83 64-Core Processor, single NUMA node, KVM virtual server
=====================================================================

- Default value of 5

     nsecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 12       |                                        |
      1024 -> 2047       : 9116     |                                        |
      2048 -> 4095       : 2004     |                                        |
      4096 -> 8191       : 2497     |                                        |
      8192 -> 16383      : 2127     |                                        |
     16384 -> 32767      : 2483     |                                        |
     32768 -> 65535      : 10102    |                                        |
     65536 -> 131071     : 212730   |*******************                     |
    131072 -> 262143     : 314692   |*****************************           |
    262144 -> 524287     : 430058   |****************************************|
    524288 -> 1048575    : 224032   |********************                    |
   1048576 -> 2097151    : 73567    |******                                  |
   2097152 -> 4194303    : 17079    |*                                       |
   4194304 -> 8388607    : 3900     |                                        |
   8388608 -> 16777215   : 750      |                                        |
  16777216 -> 33554431   : 88       |                                        |
  33554432 -> 67108863   : 2        |                                        |

avg = 449775 nsecs, total: 587066511229 nsecs, count: 1305242

The average allocation latency is 449us, and the max latency can be higher
than 30ms.

- Value set to 0

     nsecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 92       |                                        |
      1024 -> 2047       : 8594     |                                        |
      2048 -> 4095       : 2042818  |******                                  |
      4096 -> 8191       : 8737624  |**************************              |
      8192 -> 16383      : 13147872 |****************************************|
     16384 -> 32767      : 8799951  |**************************              |
     32768 -> 65535      : 2879715  |********                                |
     65536 -> 131071     : 659600   |**                                      |
    131072 -> 262143     : 204004   |                                        |
    262144 -> 524287     : 78246    |                                        |
    524288 -> 1048575    : 30800    |                                        |
   1048576 -> 2097151    : 12251    |                                        |
   2097152 -> 4194303    : 2950     |                                        |
   4194304 -> 8388607    : 78       |                                        |

avg = 19359 nsecs, total: 708638369918 nsecs, count: 36604636

The average was reduced significantly, to 19us, and the max latency is
reduced to less than 8ms.
- Conclusion

On this AMD CPU, reducing vm.pcp_batch_scale_max significantly helps reduce
latency. Latency-sensitive applications will benefit from this tuning.
However, I don't have access to other types of AMD CPUs, so I was unable to
test it on different AMD models.

Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz, two NUMA nodes
============================================================

- Default value of 5

     nsecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 2419     |                                        |
      1024 -> 2047       : 34499    |*                                       |
      2048 -> 4095       : 4272     |                                        |
      4096 -> 8191       : 9035     |                                        |
      8192 -> 16383      : 4374     |                                        |
     16384 -> 32767      : 2963     |                                        |
     32768 -> 65535      : 6407     |                                        |
     65536 -> 131071     : 884806   |****************************************|
    131072 -> 262143     : 145931   |******                                  |
    262144 -> 524287     : 13406    |                                        |
    524288 -> 1048575    : 1874     |                                        |
   1048576 -> 2097151    : 249      |                                        |
   2097152 -> 4194303    : 28       |                                        |

avg = 96173 nsecs, total: 106778157925 nsecs, count: 1110263

- Conclusion

This Intel CPU works fine with the default setting.

Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz, single NUMA node
==============================================================

Using the cpuset cgroup, we can restrict the test script to run on NUMA
node 0 only.
- Default value of 5

     nsecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 46       |                                        |
       512 -> 1023       : 695      |                                        |
      1024 -> 2047       : 19950    |*                                       |
      2048 -> 4095       : 1788     |                                        |
      4096 -> 8191       : 3392     |                                        |
      8192 -> 16383      : 2569     |                                        |
     16384 -> 32767      : 2619     |                                        |
     32768 -> 65535      : 3809     |                                        |
     65536 -> 131071     : 616182   |****************************************|
    131072 -> 262143     : 295587   |*******************                     |
    262144 -> 524287     : 75357    |****                                    |
    524288 -> 1048575    : 15471    |*                                       |
   1048576 -> 2097151    : 2939     |                                        |
   2097152 -> 4194303    : 243      |                                        |
   4194304 -> 8388607    : 3        |                                        |

avg = 144410 nsecs, total: 150281196195 nsecs, count: 1040651

The zone->lock contention becomes severe when there is only a single NUMA
node. The average latency is approximately 144us, with the maximum latency
exceeding 4ms.

- Value set to 0

     nsecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 24       |                                        |
       512 -> 1023       : 2686     |                                        |
      1024 -> 2047       : 10246    |                                        |
      2048 -> 4095       : 4061529  |*********                               |
      4096 -> 8191       : 16894971 |****************************************|
      8192 -> 16383      : 6279310  |**************                          |
     16384 -> 32767      : 1658240  |***                                     |
     32768 -> 65535      : 445760   |*                                       |
     65536 -> 131071     : 110817   |                                        |
    131072 -> 262143     : 20279    |                                        |
    262144 -> 524287     : 4176     |                                        |
    524288 -> 1048575    : 436      |                                        |
   1048576 -> 2097151    : 8        |                                        |
   2097152 -> 4194303    : 2        |                                        |

avg = 8401 nsecs, total: 247739809022 nsecs, count: 29488508

After setting it to 0, the average latency is reduced to around 8us, and the
max latency is less than 4ms.

- Conclusion

On this Intel CPU, this tuning doesn't help much. Latency-sensitive
applications work well with the default setting.

It is worth noting that all the above data were collected using the upstream
kernel.

Why introduce a sysctl knob?
============================

From the above data, it's clear that different CPU types have varying
allocation latencies with respect to zone->lock contention. Typically,
people don't release individual kernel packages for each type of x86_64
CPU. Furthermore, for latency-insensitive applications, we can keep the
default setting for better throughput. In our production environment, we
set this value to 0 for applications running on Kubernetes servers, while
keeping it at the default value of 5 for other applications like big data.
It's not common to release individual kernel packages for each application.

Future work
===========

To ultimately mitigate the zone->lock contention issue, several suggestions
have been proposed. One approach involves dividing large zones into multiple
smaller zones, as suggested by Matthew[1], while another entails splitting
the zone->lock using a mechanism similar to memory arenas and shifting away
from relying solely on zone_id to identify the range of free lists a
particular page belongs to, as suggested by Mel[2]. However, implementing
these solutions is likely to require a more extended development effort.
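For reference, the per-operation page budget implied by each knob value can
be sketched numerically (plain Python; batch = 63 is just an assumed example
value — the kernel computes the actual batch size per zone):

```python
# Sketch: vm.pcp_batch_scale_max caps each PCP refill/drain at
# (batch << N) pages, where batch is the per-zone batch size.
batch = 63                               # assumed example value
for scale_max in range(0, 7):            # the sysctl accepts 0..6
    budget = batch << scale_max
    print(f"pcp_batch_scale_max={scale_max}: up to {budget} pages per drain")
# With batch = 63: N=0 -> 63 pages, default N=5 -> 2016, N=6 -> 4032.
```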
Link: https://github.com/iovisor/bcc/blob/master/tools/funclatency.py [0]
Link: https://lore.kernel.org/linux-mm/ZnTrZ9mcAIRodnjx@casper.infradead.org/ [1]
Link: https://lore.kernel.org/linux-mm/20240705130943.htsyhhhzbcptnkcu@techsingularity.net/ [2]
Signed-off-by: Yafang Shao
Cc: "Huang, Ying"
Cc: Mel Gorman
Cc: Matthew Wilcox
Cc: David Rientjes
---
 Documentation/admin-guide/sysctl/vm.rst | 17 +++++++++++++++++
 mm/Kconfig                              | 11 -----------
 mm/page_alloc.c                         | 23 +++++++++++++++++------
 3 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index f48eaa98d22d..4971289dfb79 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -66,6 +66,7 @@ Currently, these files are in /proc/sys/vm:
 - page-cluster
 - page_lock_unfairness
 - panic_on_oom
+- pcp_batch_scale_max
 - percpu_pagelist_high_fraction
 - stat_interval
 - stat_refresh
@@ -883,6 +884,22 @@ panic_on_oom=2+kdump gives you very strong tool to investigate
 why oom happens. You can get snapshot.
 
+pcp_batch_scale_max
+===================
+
+In page allocator, PCP (Per-CPU pageset) is refilled and drained in
+batches. The batch number is scaled automatically to improve page
+allocation/free throughput. But too large scale factor may hurt
+latency. This option sets the upper limit of scale factor to limit
+the maximum latency.
+
+The range for this parameter spans from 0 to 6, with a default value of 5.
+The value assigned to 'N' signifies that during each refilling or draining
+process, a maximum of (batch << N) pages will be involved, where "batch"
+represents the default batch size automatically computed by the kernel for
+each zone.
+
+
 percpu_pagelist_high_fraction
 =============================
 
diff --git a/mm/Kconfig b/mm/Kconfig
index 7b716ac80272..14f64b4f744a 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -690,17 +690,6 @@ config HUGETLB_PAGE_SIZE_VARIABLE
 config CONTIG_ALLOC
 	def_bool (MEMORY_ISOLATION && COMPACTION) || CMA
 
-config PCP_BATCH_SCALE_MAX
-	int "Maximum scale factor of PCP (Per-CPU pageset) batch allocate/free"
-	default 5
-	range 0 6
-	help
-	  In page allocator, PCP (Per-CPU pageset) is refilled and drained in
-	  batches. The batch number is scaled automatically to improve page
-	  allocation/free throughput. But too large scale factor may hurt
-	  latency. This option sets the upper limit of scale factor to limit
-	  the maximum latency.
-
 config PHYS_ADDR_T_64BIT
 	def_bool 64BIT
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5a842cc13314..bf0c94a0b659 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -273,6 +273,8 @@ int min_free_kbytes = 1024;
 int user_min_free_kbytes = -1;
 static int watermark_boost_factor __read_mostly = 15000;
 static int watermark_scale_factor = 10;
+static int pcp_batch_scale_max = 5;
+static int sysctl_6 = 6;
 
 /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
 int movable_zone;
@@ -2391,7 +2393,7 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone)
 		count = pcp->count;
 		if (count) {
 			int to_drain = min(count,
-				pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
+				pcp->batch << pcp_batch_scale_max);
 
 			free_pcppages_bulk(zone, to_drain, pcp, 0);
 			count -= to_drain;
@@ -2519,7 +2521,7 @@ static int nr_pcp_free(struct per_cpu_pages *pcp, int batch, int high, bool free
 	/* Free as much as possible if batch freeing high-order pages. */
 	if (unlikely(free_high))
-		return min(pcp->count, batch << CONFIG_PCP_BATCH_SCALE_MAX);
+		return min(pcp->count, batch << pcp_batch_scale_max);
 
 	/* Check for PCP disabled or boot pageset */
 	if (unlikely(high < batch))
@@ -2551,7 +2553,7 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
 		return 0;
 
 	if (unlikely(free_high)) {
-		pcp->high = max(high - (batch << CONFIG_PCP_BATCH_SCALE_MAX),
+		pcp->high = max(high - (batch << pcp_batch_scale_max),
 				high_min);
 		return 0;
 	}
@@ -2621,9 +2623,9 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 	} else if (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) {
 		pcp->flags &= ~PCPF_PREV_FREE_HIGH_ORDER;
 	}
-	if (pcp->free_count < (batch << CONFIG_PCP_BATCH_SCALE_MAX))
+	if (pcp->free_count < (batch << pcp_batch_scale_max))
 		pcp->free_count = min(pcp->free_count + (1 << order),
-				      batch << CONFIG_PCP_BATCH_SCALE_MAX);
+				      batch << pcp_batch_scale_max);
 	high = nr_pcp_high(pcp, zone, batch, free_high);
 	if (pcp->count >= high) {
 		free_pcppages_bulk(zone, nr_pcp_free(pcp, batch, high, free_high),
@@ -2964,7 +2966,7 @@ static int nr_pcp_alloc(struct per_cpu_pages *pcp, struct zone *zone, int order)
 	 * subsequent allocation of order-0 pages without any freeing.
 	 */
 	if (batch <= max_nr_alloc &&
-	    pcp->alloc_factor < CONFIG_PCP_BATCH_SCALE_MAX)
+	    pcp->alloc_factor < pcp_batch_scale_max)
 		pcp->alloc_factor++;
 	batch = min(batch, max_nr_alloc);
 	}
@@ -6341,6 +6343,15 @@ static struct ctl_table page_alloc_sysctl_table[] = {
 		.proc_handler	= percpu_pagelist_high_fraction_sysctl_handler,
 		.extra1		= SYSCTL_ZERO,
 	},
+	{
+		.procname	= "pcp_batch_scale_max",
+		.data		= &pcp_batch_scale_max,
+		.maxlen		= sizeof(pcp_batch_scale_max),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= &sysctl_6,
+	},
 	{
 		.procname	= "lowmem_reserve_ratio",
 		.data		= &sysctl_lowmem_reserve_ratio,