From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: ying.huang@intel.com, baolin.wang@linux.alibaba.com,
 chrisl@kernel.org, david@redhat.com, hannes@cmpxchg.org,
 hughd@google.com, kaleshsingh@google.com, kasong@tencent.com,
 linux-kernel@vger.kernel.org, mhocko@suse.com, minchan@kernel.org,
 nphamcs@gmail.com, ryan.roberts@arm.com, senozhatsky@chromium.org,
 shakeel.butt@linux.dev, shy828301@gmail.com, surenb@google.com,
 v-songbaohua@oppo.com, willy@infradead.org, xiang@kernel.org,
 yosryahmed@google.com
Subject: [PATCH v5 0/4] mm: support mTHP swap-in for zRAM-like swapfile
Date: Fri, 26 Jul 2024 21:46:14 +1200
Message-Id: <20240726094618.401593-1-21cnbao@gmail.com>

From: Barry Song

In an embedded system like Android, more than half of anonymous memory
is actually stored in swap devices such as zRAM.
For instance, when an app is switched to the background, most of its
memory might be swapped out.

Currently, we have mTHP features, but unfortunately, without support
for large folio swap-ins, once those large folios are swapped out, we
lose them immediately because mTHP is a one-way ticket. This is
unacceptable and reduces mTHP to merely a toy on systems with
significant swap utilization.

This patchset introduces mTHP swap-in support. For now, we limit mTHP
swap-ins to contiguous swaps that were likely swapped out from mTHP as
a whole. Additionally, the current implementation only covers the
SWP_SYNCHRONOUS_IO case. This is the simplest and most common use case,
benefiting millions of Android phones and similar devices with minimal
implementation cost. In this straightforward scenario, large folios are
always exclusive, eliminating the need to handle complex rmap and
swapcache issues.

It offers several benefits:
1. Enables bidirectional mTHP swapping, allowing retrieval of mTHP
   after swap-out and swap-in.
2. Eliminates fragmentation in swap slots and supports successful
   THP_SWPOUT without fragmentation. Based on the observed data [1] on
   Chris's and Ryan's THP swap allocation optimization, aligned swap-in
   plays a crucial role in the success of THP_SWPOUT.
3. Enables zRAM/zsmalloc to compress and decompress mTHP, reducing CPU
   usage and enhancing compression ratios significantly. We have
   another patchset to enable mTHP compression and decompression in
   zsmalloc/zRAM [2].

Using the readahead mechanism to decide whether to swap in mTHP doesn't
seem to be an optimal approach. There's a critical distinction between
pagecache and anonymous pages: pagecache can be evicted and later
retrieved from disk, potentially becoming an mTHP upon retrieval,
whereas anonymous pages must always reside in memory or swapfile. If we
swap in small folios and only later identify adjacent memory suitable
for swapping in as mTHP, those pages that have already been converted
to small folios may never transition back to mTHP.
The process of converting mTHP into small folios remains irreversible.
This introduces the risk of losing all mTHP through several swap-out
and swap-in cycles, let alone losing the benefits of defragmentation,
improved compression ratios, and reduced CPU usage based on mTHP
compression/decompression.

Conversely, in deploying mTHP on millions of real-world products with
this feature in OPPO's out-of-tree code [3], we haven't observed any
significant increase in memory footprint for 64KiB mTHP based on
CONT-PTE on ARM64.

[1] https://lore.kernel.org/linux-mm/20240622071231.576056-1-21cnbao@gmail.com/
[2] https://lore.kernel.org/linux-mm/20240327214816.31191-1-21cnbao@gmail.com/
[3] OnePlusOSS / android_kernel_oneplus_sm8550
    https://github.com/OnePlusOSS/android_kernel_oneplus_sm8550/tree/oneplus/sm8550_u_14.0.0_oneplus11

-v5:
 * Add swap-in control policy according to Ying's proposal. Right now
   only "always" and "never" are supported; later we can extend to
   "auto";
 * Fix the comment regarding zswap_never_enabled() according to Yosry;
 * Filter out unaligned swp entries earlier;
 * Add mem_cgroup_swapin_uncharge_swap_nr() helper

-v4:
 https://lore.kernel.org/linux-mm/20240629111010.230484-1-21cnbao@gmail.com/
 Many parts of v3 have been merged into the mm tree with review help
 from Ryan, David, Ying, Chris and others. Thank you very much!
 This is the final part to allocate large folios and map them.

 * Use Yosry's zswap_never_enabled(). Note that it has a bug; the fix
   is included in this v4 RFC, though it should be fixed in Yosry's
   patch
 * lots of code improvement (drop large stack, hold ptl etc) according
   to Yosry's and Ryan's feedback
 * rebased on top of the latest mm-unstable and utilized some new
   helpers introduced recently.

-v3:
 https://lore.kernel.org/linux-mm/20240304081348.197341-1-21cnbao@gmail.com/
 * avoid over-writing err in __swap_duplicate_nr, pointed out by Yosry,
   thanks!
 * fix the issue that a folio is charged twice in do_swap_page by
   separating alloc_anon_folio and alloc_swap_folio, as they now differ
   in many ways:
   * memcg charging
   * clearing the allocated folio or not

-v2:
 https://lore.kernel.org/linux-mm/20240229003753.134193-1-21cnbao@gmail.com/
 * lots of code cleanup according to Chris's comments, thanks!
 * collect Chris's ack tags, thanks!
 * address David's comment on moving to use folio_add_new_anon_rmap for
   !folio_test_anon in do_swap_page, thanks!
 * remove the MADV_PAGEOUT patch from this series as Ryan will
   integrate it into the swap-out series
 * apply Kairui's work of "mm/swap: fix race when skipping swapcache"
   to large folio swap-in as well
 * fix corrupted data (zero-filled data) in two races, (1) zswap and
   (2) some entries being in the swapcache while others are not, by
   checking SWAP_HAS_CACHE while swapping in a large folio

-v1:
 https://lore.kernel.org/all/20240118111036.72641-1-21cnbao@gmail.com/#t

Barry Song (3):
  mm: swap: introduce swapcache_prepare_nr and swapcache_clear_nr for
    large folios swap-in
  mm: Introduce mem_cgroup_swapin_uncharge_swap_nr() helper for large
    folios swap-in
  mm: Introduce per-thpsize swapin control policy

Chuanhua Han (1):
  mm: support large folios swapin as a whole for zRAM-like swapfile

 Documentation/admin-guide/mm/transhuge.rst |   6 +
 include/linux/huge_mm.h                    |   1 +
 include/linux/memcontrol.h                 |  12 ++
 include/linux/swap.h                       |   9 +-
 mm/huge_memory.c                           |  44 +++++
 mm/memory.c                                | 212 ++++++++++++++++++---
 mm/swap.h                                  |  10 +-
 mm/swapfile.c                              | 102 ++++++----
 8 files changed, 329 insertions(+), 67 deletions(-)
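
Illustration (not part of the series): the restriction described above,
that only contiguous, aligned swap entries are swapped in as a large
folio, can be sketched in plain user-space C. This is a simplified model
under stated assumptions, not the kernel code. The function name
swap_range_is_batchable() and the flat array of swap offsets are
hypothetical; the real patches operate on PTEs and swap entries inside
mm/memory.c and also check further conditions (for example the
swapcache state) that are not modelled here.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: decide whether nr swap offsets look like they
 * were swapped out from one large folio as a whole.
 *
 * Assumptions made for this sketch only:
 *  - offsets[] holds the swap offsets of nr neighbouring pages, in
 *    virtual address order;
 *  - nr is the number of base pages in the candidate large folio.
 */
static bool swap_range_is_batchable(const unsigned long *offsets, size_t nr)
{
	size_t i;

	/* A large folio covers a power-of-two number of pages, at least 2. */
	if (nr < 2 || (nr & (nr - 1)) != 0)
		return false;

	/* Filter out unaligned ranges early: the first offset must be
	 * naturally aligned to the folio size. */
	if (offsets[0] % nr != 0)
		return false;

	/* Every following offset must be exactly one slot further on. */
	for (i = 1; i < nr; i++) {
		if (offsets[i] != offsets[0] + i)
			return false;
	}

	return true;
}
```

With offsets {16, 17, 18, 19} and nr = 4 the helper accepts the range;
{17, 18, 19, 20} is rejected as unaligned and {16, 17, 19, 18} as
non-contiguous, in which case a caller would fall back to swapping in
small folios.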