From patchwork Thu Nov 7 01:12:46 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13865683
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Barry Song, Kanchana P Sridhar,
 Nhat Pham, Chengming Zhou, Johannes Weiner, Usama Arif, Yosry Ahmed,
 Hailong Liu, David Hildenbrand, Hugh Dickins, Matthew Wilcox,
 Shakeel Butt, Andi Kleen, Baolin Wang, Chris Li, "Huang, Ying",
 Kairui Song, Ryan Roberts
Subject: [PATCH v4] mm: count zeromap read and set for swapout and swapin
Date: Thu, 7 Nov 2024 14:12:46 +1300
Message-Id: <20241107011246.59137-1-21cnbao@gmail.com>
X-Mailer: git-send-email 2.39.3 (Apple Git-146)
Sender: owner-linux-mm@kvack.org
Precedence: bulk

From: Barry Song

When the proportion of folios from the zeromap is small, missing their
accounting may not significantly impact profiling. However, it's easy to
construct a scenario where this becomes an issue: for example,
allocating 1 GB of memory, writing zeros from userspace, followed by
MADV_PAGEOUT, and then swapping it back in. In this case, the swap-out
and swap-in counts seem to vanish into a black hole, potentially causing
semantic ambiguity.

On the other hand, Usama reported that zero-filled pages can exceed 10%
in workloads utilizing zswap, while Hailong noted that some apps on
Android have more than 6% zero-filled pages.

Before commit 0ca0c24e3211 ("mm: store zero pages to be swapped out in a
bitmap"), both zswap and zRAM implemented similar optimizations, leading
to these optimized-out pages being counted in either zswap or zRAM
counters (with pswpin/pswpout also increasing for zRAM). With zeromap
functioning prior to both zswap and zRAM, userspace will no longer
detect these swap-out and swap-in actions.

We have three ways to address this:

1. Introduce a dedicated counter specifically for the zeromap.
2. Use pswpin/pswpout accounting, treating the zeromap as a standard
   backend. This approach aligns with zRAM's current handling of
   same-page fills at the device level. However, it would mean losing
   the optimized-out page counters previously available in zRAM and
   would not align with systems using zswap. Additionally, as noted by
   Nhat Pham, pswpin/pswpout counters apply only to I/O done directly
   to the backend device.
3. Count zeromap pages under zswap, aligning with system behavior when
   zswap is enabled. However, this would not be consistent with zRAM,
   nor would it align with systems lacking both zswap and zRAM.

Given the complications with options 2 and 3, this patch selects
option 1.

We can find these counters in /proc/vmstat (counters for the whole
system) and in memcg's memory.stat (counters for the memcg of interest).
For example:

$ grep -E 'swpin_zero|swpout_zero' /proc/vmstat
swpin_zero 1648
swpout_zero 33536

$ grep -E 'swpin_zero|swpout_zero' /sys/fs/cgroup/system.slice/memory.stat
swpin_zero 3905
swpout_zero 3985

This patch does not address any specific zeromap bug, but the missing
swpout and swpin counts for zero-filled pages can be highly confusing
and may mislead user-space agents that rely on changes in these counters
as indicators. Therefore, we add a Fixes tag to encourage the inclusion
of this counter in any kernel versions with zeromap.

Many thanks to Kanchana for the contribution of changing
count_objcg_event() to count_objcg_events() to support large folios[1],
which has now been incorporated into this patch.

[1] https://lkml.kernel.org/r/20241001053222.6944-5-kanchana.p.sridhar@intel.com

Fixes: 0ca0c24e3211 ("mm: store zero pages to be swapped out in a bitmap")
Co-developed-by: Kanchana P Sridhar
Signed-off-by: Barry Song
Reviewed-by: Nhat Pham
Reviewed-by: Chengming Zhou
Acked-by: Johannes Weiner
Cc: Usama Arif
Cc: Yosry Ahmed
Cc: Hailong Liu
Cc: David Hildenbrand
Cc: Hugh Dickins
Cc: Matthew Wilcox (Oracle)
Cc: Shakeel Butt
Cc: Andi Kleen
Cc: Baolin Wang
Cc: Chris Li
Cc: "Huang, Ying"
Cc: Kairui Song
Cc: Ryan Roberts
Signed-off-by: Andrew Morton
---
 -v4:
 * combine Kanchana's count_objcg_events() change to fix build errors on 6.12-rc
 * collect Chengming's and Johannes's reviewed/acked tags, thanks!

 Documentation/admin-guide/cgroup-v2.rst |  9 +++++++++
 include/linux/memcontrol.h              | 12 +++++++-----
 include/linux/vm_event_item.h           |  2 ++
 mm/memcontrol.c                         |  4 ++++
 mm/page_io.c                            | 16 ++++++++++++++++
 mm/vmstat.c                             |  2 ++
 mm/zswap.c                              |  6 +++---
 7 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 69af2173555f..6d02168d78be 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1599,6 +1599,15 @@ The following nested keys are defined.
 	  pglazyfreed (npn)
 		Amount of reclaimed lazyfree pages
 
+	  swpin_zero
+		Number of pages swapped into memory and filled with zero, where I/O
+		was optimized out because the page content was detected to be zero
+		during swapout.
+
+	  swpout_zero
+		Number of zero-filled pages swapped out with I/O skipped due to the
+		content being detected as zero.
+
 	  zswpin
 		Number of pages moved in to memory from zswap.
 
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 34d2da05f2f1..e1b41554a5fb 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1760,8 +1760,9 @@ static inline int memcg_kmem_id(struct mem_cgroup *memcg)
 
 struct mem_cgroup *mem_cgroup_from_slab_obj(void *p);
 
-static inline void count_objcg_event(struct obj_cgroup *objcg,
-				     enum vm_event_item idx)
+static inline void count_objcg_events(struct obj_cgroup *objcg,
+				      enum vm_event_item idx,
+				      unsigned long count)
 {
 	struct mem_cgroup *memcg;
 
@@ -1770,7 +1771,7 @@ static inline void count_objcg_event(struct obj_cgroup *objcg,
 
 	rcu_read_lock();
 	memcg = obj_cgroup_memcg(objcg);
-	count_memcg_events(memcg, idx, 1);
+	count_memcg_events(memcg, idx, count);
 	rcu_read_unlock();
 }
 
@@ -1825,8 +1826,9 @@ static inline struct mem_cgroup *mem_cgroup_from_slab_obj(void *p)
 	return NULL;
 }
 
-static inline void count_objcg_event(struct obj_cgroup *objcg,
-				     enum vm_event_item idx)
+static inline void count_objcg_events(struct obj_cgroup *objcg,
+				      enum vm_event_item idx,
+				      unsigned long count)
 {
 }
 
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index aed952d04132..f70d0958095c 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -134,6 +134,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_SWAP
 		SWAP_RA,
 		SWAP_RA_HIT,
+		SWPIN_ZERO,
+		SWPOUT_ZERO,
 #ifdef CONFIG_KSM
 		KSM_SWPIN_COPY,
 #endif
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 06df2af97415..53db98d2c4a1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -431,6 +431,10 @@ static const unsigned int memcg_vm_event_stat[] = {
 	PGDEACTIVATE,
 	PGLAZYFREE,
 	PGLAZYFREED,
+#ifdef CONFIG_SWAP
+	SWPIN_ZERO,
+	SWPOUT_ZERO,
+#endif
 #ifdef CONFIG_ZSWAP
 	ZSWPIN,
 	ZSWPOUT,
diff --git a/mm/page_io.c b/mm/page_io.c
index 69536a2b3c13..01749b99fb54 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -204,7 +204,9 @@ static bool is_folio_zero_filled(struct folio *folio)
 
 static void swap_zeromap_folio_set(struct folio *folio)
 {
+	struct obj_cgroup *objcg = get_obj_cgroup_from_folio(folio);
 	struct swap_info_struct *sis = swp_swap_info(folio->swap);
+	int nr_pages = folio_nr_pages(folio);
 	swp_entry_t entry;
 	unsigned int i;
 
@@ -212,6 +214,12 @@ static void swap_zeromap_folio_set(struct folio *folio)
 		entry = page_swap_entry(folio_page(folio, i));
 		set_bit(swp_offset(entry), sis->zeromap);
 	}
+
+	count_vm_events(SWPOUT_ZERO, nr_pages);
+	if (objcg) {
+		count_objcg_events(objcg, SWPOUT_ZERO, nr_pages);
+		obj_cgroup_put(objcg);
+	}
 }
 
 static void swap_zeromap_folio_clear(struct folio *folio)
@@ -503,6 +511,7 @@ static void sio_read_complete(struct kiocb *iocb, long ret)
 static bool swap_read_folio_zeromap(struct folio *folio)
 {
 	int nr_pages = folio_nr_pages(folio);
+	struct obj_cgroup *objcg;
 	bool is_zeromap;
 
 	/*
@@ -517,6 +526,13 @@ static bool swap_read_folio_zeromap(struct folio *folio)
 	if (!is_zeromap)
 		return false;
 
+	objcg = get_obj_cgroup_from_folio(folio);
+	count_vm_events(SWPIN_ZERO, nr_pages);
+	if (objcg) {
+		count_objcg_events(objcg, SWPIN_ZERO, nr_pages);
+		obj_cgroup_put(objcg);
+	}
+
 	folio_zero_range(folio, 0, folio_size(folio));
 	folio_mark_uptodate(folio);
 	return true;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index b5a4cea423e1..ac6a5aa34eab 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1415,6 +1415,8 @@ const char * const vmstat_text[] = {
 #ifdef CONFIG_SWAP
 	"swap_ra",
 	"swap_ra_hit",
+	"swpin_zero",
+	"swpout_zero",
 #ifdef CONFIG_KSM
 	"ksm_swpin_copy",
 #endif
diff --git a/mm/zswap.c b/mm/zswap.c
index 162013952074..0030ce8fecfc 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1053,7 +1053,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 
 	count_vm_event(ZSWPWB);
 	if (entry->objcg)
-		count_objcg_event(entry->objcg, ZSWPWB);
+		count_objcg_events(entry->objcg, ZSWPWB, 1);
 
 	zswap_entry_free(entry);
 
@@ -1483,7 +1483,7 @@ bool zswap_store(struct folio *folio)
 
 	if (objcg) {
 		obj_cgroup_charge_zswap(objcg, entry->length);
-		count_objcg_event(objcg, ZSWPOUT);
+		count_objcg_events(objcg, ZSWPOUT, 1);
 	}
 
 	/*
@@ -1577,7 +1577,7 @@ bool zswap_load(struct folio *folio)
 
 	count_vm_event(ZSWPIN);
 	if (entry->objcg)
-		count_objcg_event(entry->objcg, ZSWPIN);
+		count_objcg_events(entry->objcg, ZSWPIN, 1);
 
 	if (swapcache) {
 		zswap_entry_free(entry);
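
Note for testers: the scenario from the commit message (map memory,
write zeros from userspace, MADV_PAGEOUT, touch it again) might be
exercised with a small userspace program along the lines below. This is
only a sketch under assumptions and is not part of the patch: the helper
names parse_counter(), read_counter(), zero_pageout_pagein() and
zeromap_demo() are hypothetical, swap is assumed to be enabled, and the
kernel is assumed to carry this patch so that swpin_zero and swpout_zero
appear in /proc/vmstat. The counters are system-wide, so the deltas it
prints can include activity from other tasks.

```c
/* Userspace sketch; all helper names are hypothetical, not from the patch. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21	/* Linux uapi value, for older libc headers */
#endif

/*
 * Look up "name value" in vmstat-style text (one pair per line).
 * Returns 0 and stores the value on success, -1 if the key is absent.
 */
static int parse_counter(const char *text, const char *name,
			 unsigned long long *val)
{
	size_t len = strlen(name);
	const char *line = text;

	assert(text && name && val);
	while (line && *line) {
		if (!strncmp(line, name, len) && line[len] == ' ')
			return sscanf(line + len, "%llu", val) == 1 ? 0 : -1;
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Scan a file such as /proc/vmstat or memory.stat line by line. */
static int read_counter(const char *path, const char *name,
			unsigned long long *val)
{
	char buf[256];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	while (ret && fgets(buf, sizeof(buf), f))
		ret = parse_counter(buf, name, val);
	fclose(f);
	return ret;
}

/*
 * Zero-fill anonymous memory, ask the kernel to reclaim it, then touch
 * every page again. Returns 0 if all pages read back as zero.
 */
static int zero_pageout_pagein(size_t bytes)
{
	size_t i, step = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *p;
	int ret = 0;

	p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	memset(p, 0, bytes);			/* dirty the pages with zeros */
	if (madvise(p, bytes, MADV_PAGEOUT))	/* request reclaim */
		ret = -1;
	for (i = 0; !ret && i < bytes; i += step)
		if (p[i])			/* faults the page back in */
			ret = -1;
	munmap(p, bytes);
	return ret;
}

/* Print how far the two system-wide counters moved across one run. */
int zeromap_demo(size_t bytes)
{
	unsigned long long in0, out0, in1, out1;

	if (read_counter("/proc/vmstat", "swpin_zero", &in0) ||
	    read_counter("/proc/vmstat", "swpout_zero", &out0))
		return -1;	/* counters not exposed by this kernel */
	if (zero_pageout_pagein(bytes))
		return -1;
	if (read_counter("/proc/vmstat", "swpin_zero", &in1) ||
	    read_counter("/proc/vmstat", "swpout_zero", &out1))
		return -1;
	printf("swpout_zero +%llu, swpin_zero +%llu\n",
	       out1 - out0, in1 - in0);
	return 0;
}
```

Calling zeromap_demo(1UL << 30) from a main() of your own would
approximate the 1 GB case. Passing a cgroup's memory.stat path to
read_counter() instead of /proc/vmstat should give the per-memcg view,
since the patch exposes the same two keys there.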