From patchwork Sun Jul 7 09:49:56 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 13725992
From: Yafang Shao
To: akpm@linux-foundation.org
Cc: ying.huang@intel.com, mgorman@techsingularity.net, linux-mm@kvack.org, Yafang Shao, Matthew Wilcox, David Rientjes
Subject: [PATCH 3/3] mm/page_alloc: Introduce a new sysctl knob vm.pcp_batch_scale_max
Date: Sun, 7 Jul 2024 17:49:56 +0800
Message-Id: <20240707094956.94654-4-laoar.shao@gmail.com>
In-Reply-To: <20240707094956.94654-1-laoar.shao@gmail.com>
References: <20240707094956.94654-1-laoar.shao@gmail.com>
X-Mailer: git-send-email 2.30.1 (Apple Git-130)

The configuration parameter PCP_BATCH_SCALE_MAX poses challenges for quickly experimenting with specific
workloads in a production environment, particularly when monitoring latency
spikes caused by contention on the zone->lock. To address this, a new sysctl
parameter vm.pcp_batch_scale_max is introduced as a more practical
alternative.

To ultimately mitigate the zone->lock contention issue, several suggestions
have been proposed. One approach involves dividing large zones into multiple
smaller zones, as suggested by Matthew[0], while another entails splitting
the zone->lock using a mechanism similar to memory arenas and shifting away
from relying solely on zone_id to identify the range of free lists a
particular page belongs to[1]. However, implementing these solutions is
likely to necessitate a more extended development effort.

Link: https://lore.kernel.org/linux-mm/ZnTrZ9mcAIRodnjx@casper.infradead.org/ [0]
Link: https://lore.kernel.org/linux-mm/20240705130943.htsyhhhzbcptnkcu@techsingularity.net/ [1]
Signed-off-by: Yafang Shao
Cc: "Huang, Ying"
Cc: Mel Gorman
Cc: Matthew Wilcox
Cc: David Rientjes
---
 Documentation/admin-guide/sysctl/vm.rst | 15 +++++++++++++++
 include/linux/sysctl.h                  |  1 +
 kernel/sysctl.c                         |  2 +-
 mm/Kconfig                              | 11 -----------
 mm/page_alloc.c                         | 22 ++++++++++++++++------
 5 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index e86c968a7a0e..eb9e5216eefe 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -66,6 +66,7 @@ Currently, these files are in /proc/sys/vm:
 - page_lock_unfairness
 - panic_on_oom
 - percpu_pagelist_high_fraction
+- pcp_batch_scale_max
 - stat_interval
 - stat_refresh
 - numa_stat
@@ -864,6 +865,20 @@ mark based on the low watermark for the zone and the number of local
 online CPUs. If the user writes '0' to this sysctl, it will revert to
 this default behavior.
 
+pcp_batch_scale_max
+===================
+
+In page allocator, PCP (Per-CPU pageset) is refilled and drained in
+batches. The batch number is scaled automatically to improve page
+allocation/free throughput. But too large scale factor may hurt
+latency. This option sets the upper limit of scale factor to limit
+the maximum latency.
+
+The range for this parameter spans from 0 to 6, with a default value of 5.
+The value assigned to 'N' signifies that during each refilling or draining
+process, a maximum of (batch << N) pages will be involved, where "batch"
+represents the default batch size automatically computed by the kernel for
+each zone.
 
 stat_interval
 =============
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 09db2f2e6488..fb797f1c0ef7 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -52,6 +52,7 @@ struct ctl_dir;
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
 #define SYSCTL_MAXOLDUID		((void *)&sysctl_vals[10])
 #define SYSCTL_NEG_ONE			((void *)&sysctl_vals[11])
+#define SYSCTL_SIX			((void *)&sysctl_vals[12])
 
 extern const int sysctl_vals[];
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index e0b917328cf9..430ac4f58eb7 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -82,7 +82,7 @@
 #endif
 
 /* shared constants to be used in various sysctls */
-const int sysctl_vals[] = { 0, 1, 2, 3, 4, 100, 200, 1000, 3000, INT_MAX, 65535, -1 };
+const int sysctl_vals[] = { 0, 1, 2, 3, 4, 100, 200, 1000, 3000, INT_MAX, 65535, -1, 6 };
 EXPORT_SYMBOL(sysctl_vals);
 
 const unsigned long sysctl_long_vals[] = { 0, 1, LONG_MAX };
diff --git a/mm/Kconfig b/mm/Kconfig
index b4cb45255a54..41fe4c13b7ac 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -663,17 +663,6 @@ config HUGETLB_PAGE_SIZE_VARIABLE
 config CONTIG_ALLOC
 	def_bool (MEMORY_ISOLATION && COMPACTION) || CMA
 
-config PCP_BATCH_SCALE_MAX
-	int "Maximum scale factor of PCP (Per-CPU pageset) batch allocate/free"
-	default 5
-	range 0 6
-	help
-	  In page allocator, PCP (Per-CPU pageset) is refilled and drained in
-	  batches. The batch number is scaled automatically to improve page
-	  allocation/free throughput. But too large scale factor may hurt
-	  latency. This option sets the upper limit of scale factor to limit
-	  the maximum latency.
-
 config PHYS_ADDR_T_64BIT
 	def_bool 64BIT
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2b76754a48e0..703eec22a997 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -273,6 +273,7 @@ int min_free_kbytes = 1024;
 int user_min_free_kbytes = -1;
 static int watermark_boost_factor __read_mostly = 15000;
 static int watermark_scale_factor = 10;
+static int pcp_batch_scale_max = 5;
 
 /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
 int movable_zone;
@@ -2310,7 +2311,7 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone)
 	int count = READ_ONCE(pcp->count);
 
 	while (count) {
-		int to_drain = min(count, pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
+		int to_drain = min(count, pcp->batch << pcp_batch_scale_max);
 		count -= to_drain;
 
 		spin_lock(&pcp->lock);
@@ -2438,7 +2439,7 @@ static int nr_pcp_free(struct per_cpu_pages *pcp, int batch, int high, bool free
 
 	/* Free as much as possible if batch freeing high-order pages. */
 	if (unlikely(free_high))
-		return min(pcp->count, batch << CONFIG_PCP_BATCH_SCALE_MAX);
+		return min(pcp->count, batch << pcp_batch_scale_max);
 
 	/* Check for PCP disabled or boot pageset */
 	if (unlikely(high < batch))
@@ -2470,7 +2471,7 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
 		return 0;
 
 	if (unlikely(free_high)) {
-		pcp->high = max(high - (batch << CONFIG_PCP_BATCH_SCALE_MAX),
+		pcp->high = max(high - (batch << pcp_batch_scale_max),
 				high_min);
 		return 0;
 	}
@@ -2540,9 +2541,9 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 	} else if (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) {
 		pcp->flags &= ~PCPF_PREV_FREE_HIGH_ORDER;
 	}
-	if (pcp->free_count < (batch << CONFIG_PCP_BATCH_SCALE_MAX))
+	if (pcp->free_count < (batch << pcp_batch_scale_max))
 		pcp->free_count = min(pcp->free_count + (1 << order),
-				      batch << CONFIG_PCP_BATCH_SCALE_MAX);
+				      batch << pcp_batch_scale_max);
 	high = nr_pcp_high(pcp, zone, batch, free_high);
 	if (pcp->count >= high) {
 		free_pcppages_bulk(zone, nr_pcp_free(pcp, batch, high, free_high),
@@ -2884,7 +2885,7 @@ static int nr_pcp_alloc(struct per_cpu_pages *pcp, struct zone *zone, int order)
 		 * subsequent allocation of order-0 pages without any freeing.
 		 */
 		if (batch <= max_nr_alloc &&
-		    pcp->alloc_factor < CONFIG_PCP_BATCH_SCALE_MAX)
+		    pcp->alloc_factor < pcp_batch_scale_max)
 			pcp->alloc_factor++;
 		batch = min(batch, max_nr_alloc);
 	}
@@ -6251,6 +6252,15 @@ static struct ctl_table page_alloc_sysctl_table[] = {
 		.proc_handler	= percpu_pagelist_high_fraction_sysctl_handler,
 		.extra1		= SYSCTL_ZERO,
 	},
+	{
+		.procname	= "pcp_batch_scale_max",
+		.data		= &pcp_batch_scale_max,
+		.maxlen		= sizeof(pcp_batch_scale_max),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_SIX,
+	},
 	{
 		.procname	= "lowmem_reserve_ratio",
 		.data		= &sysctl_lowmem_reserve_ratio,