From patchwork Fri Aug 30 01:44:53 2024
X-Patchwork-Submitter: mawupeng
X-Patchwork-Id: 13784097
From: Wupeng Ma <mawupeng1@huawei.com>
Subject: [PATCH] mm, proc: collect percpu free pages into the free pages
Date: Fri, 30 Aug 2024 09:44:53 +0800
Message-ID: <20240830014453.3070909-1-mawupeng1@huawei.com>
X-Mailer: git-send-email 2.25.1
MIME-Version: 1.0
From: Ma Wupeng

The introduction of the per-CPU pageset (PCP) per zone aims to enhance
the performance of the page allocator by enabling page allocation
without taking the zone lock. These pages are free memory, however
they are not included in MemFree or MemAvailable.

With the support of high-order PCP and PCP auto-tuning, the amount of
memory sitting in these lists has become a matter of concern due to
the following patches:

1. Introduction of order 1~3 and PMD-level PCP in commit 44042b449872
   ("mm/page_alloc: allow high-order pages to be stored on the per-cpu
   lists").
2. Introduction of PCP auto-tuning in commit 90b41691b988 ("mm: add
   framework for PCP high auto-tuning").

As a result, the total amount of memory in the PCP lists can no longer
be ignored, even just after boot with no real workload running, as the
numbers below show:

                   w/o patch     with patch        diff  diff/total
MemTotal:       525424652 kB   525424652 kB        0 kB          0%
MemFree:        517030396 kB   520134136 kB  3103740 kB        0.6%
MemAvailable:   515837152 kB   518941080 kB  3103928 kB        0.6%

On a machine with 16 zones and 600+ CPUs, prior to these commits, the
PCP lists contained 274368 pages (1097M) immediately after booting.
In the mainline, this number has increased to 3003M, a 173% increase.
Since available memory is used by numerous services to estimate memory
pressure, a substantial volume of PCP memory leads to an inaccurate
estimation of the available memory size, significantly impacting
service logic.

Also remove the useless CONFIG_HIGHMEM ifdef in si_meminfo_node(),
since is_highmem() always returns false there when the config is not
enabled.

Signed-off-by: Ma Wupeng
Signed-off-by: Liu Shixin
---
 mm/show_mem.c | 46 ++++++++++++++++++++++++++++++++--------------
 1 file changed, 32 insertions(+), 14 deletions(-)

diff --git a/mm/show_mem.c b/mm/show_mem.c
index bdb439551eef..08f566c30b3d 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -29,6 +29,26 @@ static inline void show_node(struct zone *zone)
 	printk("Node %d ", zone_to_nid(zone));
 }
 
+static unsigned long nr_free_zone_pcplist_pages(struct zone *zone)
+{
+	unsigned long sum = 0;
+	int cpu;
+
+	for_each_online_cpu(cpu)
+		sum += per_cpu_ptr(zone->per_cpu_pageset, cpu)->count;
+	return sum;
+}
+
+static unsigned long nr_free_pcplist_pages(void)
+{
+	unsigned long sum = 0;
+	struct zone *zone;
+
+	for_each_populated_zone(zone)
+		sum += nr_free_zone_pcplist_pages(zone);
+	return sum;
+}
+
 long si_mem_available(void)
 {
 	long available;
@@ -44,7 +64,8 @@ long si_mem_available(void)
 	 * Estimate the amount of memory available for userspace allocations,
 	 * without causing swapping or OOM.
 	 */
-	available = global_zone_page_state(NR_FREE_PAGES) - totalreserve_pages;
+	available = global_zone_page_state(NR_FREE_PAGES) +
+		    nr_free_pcplist_pages() - totalreserve_pages;
 
 	/*
 	 * Not all the page cache can be freed, otherwise the system will
@@ -76,7 +97,8 @@ void si_meminfo(struct sysinfo *val)
 {
 	val->totalram = totalram_pages();
 	val->sharedram = global_node_page_state(NR_SHMEM);
-	val->freeram = global_zone_page_state(NR_FREE_PAGES);
+	val->freeram =
+		global_zone_page_state(NR_FREE_PAGES) + nr_free_pcplist_pages();
 	val->bufferram = nr_blockdev_pages();
 	val->totalhigh = totalhigh_pages();
 	val->freehigh = nr_free_highpages();
@@ -90,30 +112,27 @@ void si_meminfo_node(struct sysinfo *val, int nid)
 {
 	int zone_type;		/* needs to be signed */
 	unsigned long managed_pages = 0;
+	unsigned long free_pages = sum_zone_node_page_state(nid, NR_FREE_PAGES);
 	unsigned long managed_highpages = 0;
 	unsigned long free_highpages = 0;
 	pg_data_t *pgdat = NODE_DATA(nid);
 
-	for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++)
-		managed_pages += zone_managed_pages(&pgdat->node_zones[zone_type]);
-	val->totalram = managed_pages;
-	val->sharedram = node_page_state(pgdat, NR_SHMEM);
-	val->freeram = sum_zone_node_page_state(nid, NR_FREE_PAGES);
-#ifdef CONFIG_HIGHMEM
 	for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++) {
 		struct zone *zone = &pgdat->node_zones[zone_type];
 
+		managed_pages += zone_managed_pages(zone);
+		free_pages += nr_free_zone_pcplist_pages(zone);
 		if (is_highmem(zone)) {
 			managed_highpages += zone_managed_pages(zone);
 			free_highpages += zone_page_state(zone, NR_FREE_PAGES);
 		}
 	}
+
+	val->totalram = managed_pages;
+	val->sharedram = node_page_state(pgdat, NR_SHMEM);
+	val->freeram = free_pages;
 	val->totalhigh = managed_highpages;
 	val->freehigh = free_highpages;
-#else
-	val->totalhigh = managed_highpages;
-	val->freehigh = free_highpages;
-#endif
 	val->mem_unit = PAGE_SIZE;
 }
 #endif
@@ -196,8 +215,7 @@ static void show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_z
 		if (show_mem_node_skip(filter, zone_to_nid(zone), nodemask))
 			continue;
 
-		for_each_online_cpu(cpu)
-			free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->count;
+		free_pcp += nr_free_zone_pcplist_pages(zone);
 	}
 
 	printk("active_anon:%lu inactive_anon:%lu isolated_anon:%lu\n"