From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: chrisl@kernel.org, david@redhat.com, hannes@cmpxchg.org,
 kasong@tencent.com, linux-kernel@vger.kernel.org, mhocko@suse.com,
 nphamcs@gmail.com, ryan.roberts@arm.com, shy828301@gmail.com,
 surenb@google.com, kaleshsingh@google.com, hughd@google.com,
 v-songbaohua@oppo.com, willy@infradead.org, xiang@kernel.org,
 ying.huang@intel.com, yosryahmed@google.com,
 baolin.wang@linux.alibaba.com, shakeel.butt@linux.dev,
 senozhatsky@chromium.org, minchan@kernel.org
Subject: [PATCH RFC v4 0/2] mm: support mTHP swap-in for zRAM-like swapfile
Date: Sat, 29 Jun 2024 23:10:08 +1200
Message-Id: <20240629111010.230484-1-21cnbao@gmail.com>

From: Barry Song

In an embedded system like Android, more than half of anonymous memory
is actually stored in swap devices such as zRAM.
For instance, when an app is switched to the background, most of its
memory might be swapped out.

Currently, we have mTHP features, but unfortunately, without support
for large folio swap-ins, once those large folios are swapped out, we
lose them immediately because mTHP is a one-way ticket.

This is unacceptable and reduces mTHP to merely a toy on systems with
significant swap utilization.

This patch introduces mTHP swap-in support. For now, we limit mTHP
swap-ins to contiguous swaps that were likely swapped out from mTHP as
a whole.

Additionally, the current implementation only covers the
SWP_SYNCHRONOUS_IO case. This is the simplest and most common use case,
benefiting millions of Android phones and similar devices with minimal
implementation cost. In this straightforward scenario, large folios
are always exclusive, eliminating the need to handle complex rmap and
swapcache issues.

It offers several benefits:
1. Enables bidirectional mTHP swapping, allowing retrieval of mTHP
   after swap-out and swap-in.
2. Eliminates fragmentation in swap slots and supports successful
   THP_SWPOUT without fragmentation. Based on the observed data [1] on
   Chris's and Ryan's THP swap allocation optimization, aligned swap-in
   plays a crucial role in the success of THP_SWPOUT.
3. Enables zRAM/zsmalloc to compress and decompress mTHP, reducing CPU
   usage and enhancing compression ratios significantly. We have
   another patchset to enable mTHP compression and decompression in
   zsmalloc/zRAM [2].

Using the readahead mechanism to decide whether to swap in mTHP doesn't
seem to be an optimal approach. There's a critical distinction between
pagecache and anonymous pages: pagecache can be evicted and later
retrieved from disk, potentially becoming an mTHP upon retrieval,
whereas anonymous pages must always reside in memory or in the
swapfile. If we swap in small folios and only later identify adjacent
memory suitable for swapping in as mTHP, those pages that have already
been converted to small folios may never transition back to mTHP.
The process of converting mTHP into small folios remains irreversible.
This introduces the risk of losing all mTHP through several swap-out
and swap-in cycles, let alone losing the benefits of defragmentation,
improved compression ratios, and reduced CPU usage based on mTHP
compression/decompression.

Conversely, in deploying mTHP on millions of real-world products with
this feature in OPPO's out-of-tree code [3], we haven't observed any
significant increase in memory footprint for 64KiB mTHP based on
CONT-PTE on ARM64.

[1] https://lore.kernel.org/linux-mm/20240622071231.576056-1-21cnbao@gmail.com/
[2] https://lore.kernel.org/linux-mm/20240327214816.31191-1-21cnbao@gmail.com/
[3] OnePlusOSS / android_kernel_oneplus_sm8550
    https://github.com/OnePlusOSS/android_kernel_oneplus_sm8550/tree/oneplus/sm8550_u_14.0.0_oneplus11

-v4:
 Many parts of v3 have been merged into the mm tree thanks to reviews
 from Ryan, David, Ying, Chris and others. Thank you very much!
 This is the final part, which allocates large folios and maps them.

 * Use Yosry's zswap_never_enabled(). Note that it has a bug; the fix
   is included in this v4 RFC, though it should be fixed in Yosry's
   patch
 * lots of code improvements (drop large stack, hold ptl etc) according
   to Yosry's and Ryan's feedback
 * rebased on top of the latest mm-unstable and utilized some new
   helpers introduced recently.

-v3:
 https://lore.kernel.org/linux-mm/20240304081348.197341-1-21cnbao@gmail.com/
 * avoid overwriting err in __swap_duplicate_nr, pointed out by Yosry,
   thanks!
 * fix the issue where a folio is charged twice in do_swap_page,
   separating alloc_anon_folio and alloc_swap_folio as they now differ
   in many ways:
   * memcg charging
   * clearing allocated folio or not

-v2:
 https://lore.kernel.org/linux-mm/20240229003753.134193-1-21cnbao@gmail.com/
 * lots of code cleanup according to Chris's comments, thanks!
 * collect Chris's ack tags, thanks!
 * address David's comment on moving to use folio_add_new_anon_rmap
   for !folio_test_anon in do_swap_page, thanks!
 * remove the MADV_PAGEOUT patch from this series as Ryan will
   integrate it into the swap-out series
 * apply Kairui's work of "mm/swap: fix race when skipping swapcache"
   to large folio swap-in as well
 * fix data corruption (zero-filled data) in two races, one involving
   zswap and one where some entries are in the swapcache while others
   are not, by checking SWAP_HAS_CACHE while swapping in a large folio

-v1:
 https://lore.kernel.org/all/20240118111036.72641-1-21cnbao@gmail.com/#t

Barry Song (1):
  mm: swap: introduce swapcache_prepare_nr and swapcache_clear_nr for
    large folios swap-in

Chuanhua Han (1):
  mm: support large folios swapin as a whole for zRAM-like swapfile

 include/linux/swap.h  |   4 +-
 include/linux/zswap.h |   2 +-
 mm/memory.c           | 210 +++++++++++++++++++++++++++++++++++-------
 mm/swap.h             |   4 +-
 mm/swap_state.c       |   2 +-
 mm/swapfile.c         | 114 +++++++++++++----------
 6 files changed, 251 insertions(+), 85 deletions(-)
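
For readers unfamiliar with the "contiguous swaps" restriction described
in the cover letter, the idea can be sketched in userspace C as below.
This is only an illustration under stated assumptions, not the code in
this series: swap_offset_t and swap_run_is_aligned_contiguous() are
hypothetical names, the offsets are plain integers rather than the
kernel's swap entries, and the natural-alignment check is an assumption
drawn from the remark that aligned swap-in matters for THP_SWPOUT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a swap slot offset (not the kernel's type). */
typedef unsigned long swap_offset_t;

/*
 * Return true if the nr swap offsets look like one large folio that was
 * swapped out as a whole: the first offset is aligned to nr (assumed
 * requirement) and every following offset is exactly one slot further.
 * nr must be a non-zero power of two, e.g. 16 for a 64KiB mTHP when the
 * base page size is 4KiB.
 */
static bool swap_run_is_aligned_contiguous(const swap_offset_t *offs,
					   size_t nr)
{
	assert(nr != 0 && (nr & (nr - 1)) == 0);

	/* Assumed rule: the run starts on an nr-aligned slot. */
	if (offs[0] % nr != 0)
		return false;

	/* Every slot must directly follow the previous one. */
	for (size_t i = 1; i < nr; i++) {
		if (offs[i] != offs[0] + i)
			return false;
	}
	return true;
}
```

If the check fails, a caller in this sketch would simply fall back to
swapping in a single small folio, which matches the conservative scope
described above.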