From patchwork Mon Jul 1 14:20:46 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 13718196
From: Yafang Shao
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, Yafang Shao, Matthew Wilcox, David Rientjes,
 "Huang, Ying", Mel Gorman
Subject: [PATCH] mm: Enable setting -1 for vm.percpu_pagelist_high_fraction
 to set the minimum pagelist
Date: Mon, 1 Jul 2024 22:20:46 +0800
Message-Id: <20240701142046.6050-1-laoar.shao@gmail.com>
X-Mailer: git-send-email 2.30.1 (Apple Git-130)

Currently, we're encountering latency spikes in our container
environment when a specific container with multiple Python-based tasks
exits. These tasks may hold the zone->lock for an extended period,
significantly impacting latency for other containers attempting to
allocate memory.

As a workaround, we've found that minimizing the pagelist size, such as
setting it to 4 times the batch size, can help mitigate these spikes.
However, managing vm.percpu_pagelist_high_fraction across a large fleet of
servers poses challenges due to variations in CPU counts, NUMA nodes, and
physical memory capacities.

To enhance practicality, we propose allowing the setting of -1 for
vm.percpu_pagelist_high_fraction to designate a minimum pagelist size.

Furthermore, considering the challenges associated with utilizing
vm.percpu_pagelist_high_fraction, it would be beneficial to introduce a
more intuitive parameter, vm.percpu_pagelist_high_size, that would permit
direct specification of the pagelist size as a multiple of the batch size.
This methodology would mirror the functionality of vm.dirty_ratio and
vm.dirty_bytes, providing users with greater flexibility and control.

We have discussed the possibility of introducing multiple small zones to
mitigate the contention on the zone->lock[0], but this approach is likely
to require a longer-term implementation effort.

Link: https://lore.kernel.org/linux-mm/ZnTrZ9mcAIRodnjx@casper.infradead.org/ [0]
Signed-off-by: Yafang Shao
Cc: Matthew Wilcox
Cc: David Rientjes
Cc: "Huang, Ying"
Cc: Mel Gorman
---
 Documentation/admin-guide/sysctl/vm.rst | 4 ++++
 mm/page_alloc.c                         | 8 ++++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index e86c968a7a0e..1f535d022cda 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -856,6 +856,10 @@ on per-cpu page lists. This entry only changes the value of hot per-cpu
 page lists. A user can specify a number like 100 to allocate 1/100th of
 each zone between per-cpu lists.
 
+The minimum number of pages that can be stored in per-CPU page lists is
+four times the batch value. By writing '-1' to this sysctl, you can set
+this minimum value.
+
 The batch value of each per-cpu page list remains the same regardless of
 the value of the high fraction so allocation latencies are unaffected.
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2e22ce5675ca..e7313f9d704b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5486,6 +5486,10 @@ static int zone_highsize(struct zone *zone, int batch, int cpu_online,
 	int nr_split_cpus;
 	unsigned long total_pages;
 
+	/* Setting -1 to set the minimum pagelist size, four times the batch size */
+	if (high_fraction == -1)
+		return batch << 2;
+
 	if (!high_fraction) {
 		/*
 		 * By default, the high value of the pcp is based on the zone
@@ -6192,7 +6196,8 @@ static int percpu_pagelist_high_fraction_sysctl_handler(struct ctl_table *table,
 
 	/* Sanity checking to avoid pcp imbalance */
 	if (percpu_pagelist_high_fraction &&
-	    percpu_pagelist_high_fraction < MIN_PERCPU_PAGELIST_HIGH_FRACTION) {
+	    percpu_pagelist_high_fraction < MIN_PERCPU_PAGELIST_HIGH_FRACTION &&
+	    percpu_pagelist_high_fraction != -1) {
 		percpu_pagelist_high_fraction = old_percpu_pagelist_high_fraction;
 		ret = -EINVAL;
 		goto out;
@@ -6241,7 +6246,6 @@ static struct ctl_table page_alloc_sysctl_table[] = {
 		.maxlen		= sizeof(percpu_pagelist_high_fraction),
 		.mode		= 0644,
 		.proc_handler	= percpu_pagelist_high_fraction_sysctl_handler,
-		.extra1		= SYSCTL_ZERO,
 	},
 	{
 		.procname	= "lowmem_reserve_ratio",
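
Illustration only, not part of the patch: the combined effect of the two
mm/page_alloc.c hunks can be checked with a small user-space model in C.
This is a sketch under stated assumptions. model_zone_highsize(),
model_validate_high_fraction() and MODEL_MIN_HIGH_FRACTION are hypothetical
names; the zone and cpumask lookups are replaced by plain parameters; and
the threshold of 8 is assumed to match MIN_PERCPU_PAGELIST_HIGH_FRACTION in
mm/page_alloc.c. The default and fraction branches follow my reading of
zone_highsize() in mainline and may differ in detail from the tree this
patch applies to.

```c
#include <assert.h>
#include <errno.h>

/* Assumed to mirror MIN_PERCPU_PAGELIST_HIGH_FRACTION in mm/page_alloc.c. */
#define MODEL_MIN_HIGH_FRACTION 8

/*
 * Hypothetical user-space model of zone_highsize() with the patch applied.
 * managed_pages, low_wmark_pages and nr_split_cpus stand in for the zone
 * and cpumask lookups that the kernel performs itself.
 */
static long model_zone_highsize(long managed_pages, long low_wmark_pages,
                                int batch, int nr_split_cpus,
                                int high_fraction)
{
    long min_high = (long)batch << 2;   /* four times the batch size */
    long total_pages, high;

    assert(nr_split_cpus > 0);

    /* New in this patch: -1 selects the minimum pagelist size directly. */
    if (high_fraction == -1)
        return min_high;

    if (!high_fraction)
        total_pages = low_wmark_pages;               /* default: low watermark */
    else
        total_pages = managed_pages / high_fraction; /* 1/N of the zone */

    /* Split across the CPUs sharing the zone, never below the minimum. */
    high = total_pages / nr_split_cpus;
    return high > min_high ? high : min_high;
}

/* Model of the sysctl sanity check: 0, -1 and values >= 8 are accepted. */
static int model_validate_high_fraction(int value)
{
    if (value && value < MODEL_MIN_HIGH_FRACTION && value != -1)
        return -EINVAL;
    return 0;
}
```

In this model, with batch = 63 and 8 CPUs sharing the zone, a value of -1
yields a high of 252 pages regardless of zone size, while values such as -2
or 7 are still rejected with -EINVAL. That matters because the patch drops
the .extra1 = SYSCTL_ZERO lower bound, so the handler's own check is what
rejects other negative values.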