From patchwork Fri Jul 26 09:46:15 2024
X-Patchwork-Id: 13742537
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: ying.huang@intel.com, baolin.wang@linux.alibaba.com, chrisl@kernel.org,
 david@redhat.com, hannes@cmpxchg.org, hughd@google.com,
 kaleshsingh@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org,
 mhocko@suse.com, minchan@kernel.org, nphamcs@gmail.com,
 ryan.roberts@arm.com, senozhatsky@chromium.org, shakeel.butt@linux.dev,
 shy828301@gmail.com, surenb@google.com, v-songbaohua@oppo.com,
 willy@infradead.org, xiang@kernel.org, yosryahmed@google.com
Subject: [PATCH v5 1/4] mm: swap: introduce swapcache_prepare_nr and
 swapcache_clear_nr for large folios swap-in
Date: Fri, 26 Jul 2024 21:46:15 +1200
Message-Id: <20240726094618.401593-2-21cnbao@gmail.com>
In-Reply-To: <20240726094618.401593-1-21cnbao@gmail.com>
References: <20240726094618.401593-1-21cnbao@gmail.com>

From: Barry Song

Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") supports
one swap entry only. To support large folio swap-in, we need to handle
multiple swap entries.

To optimize stack usage, we iterate twice in __swap_duplicate_nr(): the
first time to verify that all entries are valid, and the second time to
apply the modifications to the entries.
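The two-pass structure is what makes the update all-or-nothing: if any
entry in the range fails validation, no swap_map byte has been modified
yet. A minimal userspace sketch of the validate-then-commit pattern (the
flag value and array are toy stand-ins for SWAP_HAS_CACHE and
p->swap_map, not the kernel code):

	#include <errno.h>
	#include <stdio.h>

	#define SWAP_HAS_CACHE	0x40	/* toy stand-in for the kernel flag */

	static unsigned char swap_map[8] = { 1, 1, 1, 1, 0, 1, 1, 1 };

	/* two passes: validate everything first, then commit */
	static int toy_prepare_nr(int offset, int nr)
	{
		int i;

		for (i = 0; i < nr; i++) {	/* pass 1: verify only */
			unsigned char count = swap_map[offset + i];

			if (count & SWAP_HAS_CACHE)
				return -EEXIST;	/* someone else added cache */
			if (!count)
				return -ENOENT;	/* unused swap entry */
		}
		for (i = 0; i < nr; i++)	/* pass 2: cannot fail */
			swap_map[offset + i] |= SWAP_HAS_CACHE;
		return 0;
	}

	int main(void)
	{
		/* offset 4 is unused: the whole range fails, nothing is set */
		printf("prepare(2, 4) = %d\n", toy_prepare_nr(2, 4));
		printf("prepare(0, 4) = %d\n", toy_prepare_nr(0, 4));
		return 0;
	}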
Signed-off-by: Barry Song
Reviewed-by: Baolin Wang
---
 include/linux/swap.h |   9 +++-
 mm/swap.h            |  10 ++++-
 mm/swapfile.c        | 102 ++++++++++++++++++++++++++-----------------
 3 files changed, 77 insertions(+), 44 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index ba7ea95d1c57..f1b28fd04533 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -480,7 +480,7 @@ extern int get_swap_pages(int n, swp_entry_t swp_entries[], int order);
 extern int add_swap_count_continuation(swp_entry_t, gfp_t);
 extern void swap_shmem_alloc(swp_entry_t);
 extern int swap_duplicate(swp_entry_t);
-extern int swapcache_prepare(swp_entry_t);
+extern int swapcache_prepare_nr(swp_entry_t entry, int nr);
 extern void swap_free_nr(swp_entry_t entry, int nr_pages);
 extern void swapcache_free_entries(swp_entry_t *entries, int n);
 extern void free_swap_and_cache_nr(swp_entry_t entry, int nr);
@@ -554,7 +554,7 @@ static inline int swap_duplicate(swp_entry_t swp)
 	return 0;
 }
 
-static inline int swapcache_prepare(swp_entry_t swp)
+static inline int swapcache_prepare_nr(swp_entry_t swp, int nr)
 {
 	return 0;
 }
@@ -612,6 +612,11 @@ static inline void swap_free(swp_entry_t entry)
 	swap_free_nr(entry, 1);
 }
 
+static inline int swapcache_prepare(swp_entry_t entry)
+{
+	return swapcache_prepare_nr(entry, 1);
+}
+
 #ifdef CONFIG_MEMCG
 static inline int mem_cgroup_swappiness(struct mem_cgroup *memcg)
 {
diff --git a/mm/swap.h b/mm/swap.h
index baa1fa946b34..81ff7eb0be9c 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -59,7 +59,7 @@ void __delete_from_swap_cache(struct folio *folio,
 void delete_from_swap_cache(struct folio *folio);
 void clear_shadow_from_swap_cache(int type, unsigned long begin,
 				  unsigned long end);
-void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry);
+void swapcache_clear_nr(struct swap_info_struct *si, swp_entry_t entry, int nr);
 struct folio *swap_cache_get_folio(swp_entry_t entry,
 		struct vm_area_struct *vma, unsigned long addr);
 struct folio *filemap_get_incore_folio(struct address_space *mapping,
@@ -120,7 +120,7 @@ static inline int swap_writepage(struct page *p, struct writeback_control *wbc)
 	return 0;
 }
 
-static inline void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry)
+static inline void swapcache_clear_nr(struct swap_info_struct *si, swp_entry_t entry, int nr)
 {
 }
 
@@ -172,4 +172,10 @@ static inline unsigned int folio_swap_flags(struct folio *folio)
 	return 0;
 }
 #endif /* CONFIG_SWAP */
+
+static inline void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry)
+{
+	swapcache_clear_nr(si, entry, 1);
+}
+
 #endif /* _MM_SWAP_H */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 5f73a8553371..e688e46f1c62 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3363,7 +3363,7 @@ void si_swapinfo(struct sysinfo *val)
 }
 
 /*
- * Verify that a swap entry is valid and increment its swap map count.
+ * Verify that nr swap entries are valid and increment their swap map counts.
  *
  * Returns error code in following case.
  * - success -> 0
@@ -3373,66 +3373,88 @@ void si_swapinfo(struct sysinfo *val)
  * - swap-cache reference is requested but the entry is not used. -> ENOENT
  * - swap-mapped reference requested but needs continued swap count.
  *	-> ENOMEM
  */
-static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+static int __swap_duplicate_nr(swp_entry_t entry, unsigned char usage, int nr)
 {
 	struct swap_info_struct *p;
 	struct swap_cluster_info *ci;
 	unsigned long offset;
 	unsigned char count;
 	unsigned char has_cache;
-	int err;
+	int err, i;
 
 	p = swp_swap_info(entry);
 
 	offset = swp_offset(entry);
+	VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
 	ci = lock_cluster_or_swap_info(p, offset);
 
-	count = p->swap_map[offset];
+	err = 0;
+	for (i = 0; i < nr; i++) {
+		count = p->swap_map[offset + i];
 
-	/*
-	 * swapin_readahead() doesn't check if a swap entry is valid, so the
-	 * swap entry could be SWAP_MAP_BAD. Check here with lock held.
-	 */
-	if (unlikely(swap_count(count) == SWAP_MAP_BAD)) {
-		err = -ENOENT;
-		goto unlock_out;
-	}
+		/*
+		 * swapin_readahead() doesn't check if a swap entry is valid, so the
+		 * swap entry could be SWAP_MAP_BAD. Check here with lock held.
+		 */
+		if (unlikely(swap_count(count) == SWAP_MAP_BAD)) {
+			err = -ENOENT;
+			goto unlock_out;
+		}
 
-	has_cache = count & SWAP_HAS_CACHE;
-	count &= ~SWAP_HAS_CACHE;
-	err = 0;
+		has_cache = count & SWAP_HAS_CACHE;
+		count &= ~SWAP_HAS_CACHE;
 
-	if (usage == SWAP_HAS_CACHE) {
+		if (usage == SWAP_HAS_CACHE) {
+			/* set SWAP_HAS_CACHE if there is no cache and entry is used */
+			if (!has_cache && count)
+				continue;
+			else if (has_cache) /* someone else added cache */
+				err = -EEXIST;
+			else /* no users remaining */
+				err = -ENOENT;
 
-		/* set SWAP_HAS_CACHE if there is no cache and entry is used */
-		if (!has_cache && count)
-			has_cache = SWAP_HAS_CACHE;
-		else if (has_cache) /* someone else added cache */
-			err = -EEXIST;
-		else /* no users remaining */
-			err = -ENOENT;
+		} else if (count || has_cache) {
 
-	} else if (count || has_cache) {
+			if ((count & ~COUNT_CONTINUED) < SWAP_MAP_MAX)
+				continue;
+			else if ((count & ~COUNT_CONTINUED) > SWAP_MAP_MAX)
+				err = -EINVAL;
+			else if (swap_count_continued(p, offset + i, count))
+				continue;
+			else
+				err = -ENOMEM;
+		} else
+			err = -ENOENT; /* unused swap entry */
 
-		if ((count & ~COUNT_CONTINUED) < SWAP_MAP_MAX)
+		if (err)
+			goto unlock_out;
+	}
+
+	for (i = 0; i < nr; i++) {
+		count = p->swap_map[offset + i];
+		has_cache = count & SWAP_HAS_CACHE;
+		count &= ~SWAP_HAS_CACHE;
+
+		if (usage == SWAP_HAS_CACHE)
+			has_cache = SWAP_HAS_CACHE;
+		else if ((count & ~COUNT_CONTINUED) < SWAP_MAP_MAX)
 			count += usage;
-		else if ((count & ~COUNT_CONTINUED) > SWAP_MAP_MAX)
-			err = -EINVAL;
-		else if (swap_count_continued(p, offset, count))
-			count = COUNT_CONTINUED;
 		else
-			err = -ENOMEM;
-	} else
-		err = -ENOENT; /* unused swap entry */
+			count = COUNT_CONTINUED;
 
-	if (!err)
-		WRITE_ONCE(p->swap_map[offset], count | has_cache);
+		WRITE_ONCE(p->swap_map[offset + i], count | has_cache);
+	}
 
 unlock_out:
 	unlock_cluster_or_swap_info(p, ci);
 	return err;
 }
 
+static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+{
+	return __swap_duplicate_nr(entry, usage, 1);
+}
+
 /*
  * Help swapoff by noting that swap entry belongs to shmem/tmpfs
  * (in which case its reference count is never incremented).
@@ -3459,23 +3481,23 @@ int swap_duplicate(swp_entry_t entry)
 }
 
 /*
- * @entry: swap entry for which we allocate swap cache.
+ * @entry: first swap entry from which we allocate nr swap cache.
  *
- * Called when allocating swap cache for existing swap entry,
+ * Called when allocating swap cache for existing swap entries,
  * This can return error codes. Returns 0 at success.
  * -EEXIST means there is a swap cache.
  * Note: return code is different from swap_duplicate().
  */
-int swapcache_prepare(swp_entry_t entry)
+int swapcache_prepare_nr(swp_entry_t entry, int nr)
 {
-	return __swap_duplicate(entry, SWAP_HAS_CACHE);
+	return __swap_duplicate_nr(entry, SWAP_HAS_CACHE, nr);
 }
 
-void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry)
+void swapcache_clear_nr(struct swap_info_struct *si, swp_entry_t entry, int nr)
 {
 	unsigned long offset = swp_offset(entry);
 
-	cluster_swap_free_nr(si, offset, 1, SWAP_HAS_CACHE);
+	cluster_swap_free_nr(si, offset, nr, SWAP_HAS_CACHE);
 }
 
 struct swap_info_struct *swp_swap_info(swp_entry_t entry)
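The pairing contract of the new API is the same as the single-entry one,
just over a range: pin all nr entries with SWAP_HAS_CACHE up front, and
release every pin with swapcache_clear_nr() once the swap-in finishes or
fails. A condensed sketch of the caller pattern (abridged from the
do_swap_page() changes later in this series; not a complete function):

	/* entry is the first swap entry backing an nr_pages folio */
	entry.val = ALIGN_DOWN(entry.val, nr_pages);
	if (swapcache_prepare_nr(entry, nr_pages)) {
		/* another thread may be swapping in this range; back off */
		schedule_timeout_uninterruptible(1);
		goto out_page;
	}
	need_clear_cache = true;

	/* ... swap_read_folio(), map the PTEs ... */

	if (need_clear_cache)
		swapcache_clear_nr(si, entry, nr_pages);	/* drop the pins */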
From patchwork Fri Jul 26 09:46:16 2024
X-Patchwork-Id: 13742538
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: ying.huang@intel.com, baolin.wang@linux.alibaba.com, chrisl@kernel.org,
 david@redhat.com, hannes@cmpxchg.org, hughd@google.com,
 kaleshsingh@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org,
 mhocko@suse.com, minchan@kernel.org, nphamcs@gmail.com,
 ryan.roberts@arm.com, senozhatsky@chromium.org, shakeel.butt@linux.dev,
 shy828301@gmail.com, surenb@google.com, v-songbaohua@oppo.com,
 willy@infradead.org, xiang@kernel.org, yosryahmed@google.com
Subject: [PATCH v5 2/4] mm: Introduce mem_cgroup_swapin_uncharge_swap_nr()
 helper for large folios swap-in
Date: Fri, 26 Jul 2024 21:46:16 +1200
Message-Id: <20240726094618.401593-3-21cnbao@gmail.com>
In-Reply-To: <20240726094618.401593-1-21cnbao@gmail.com>
References: <20240726094618.401593-1-21cnbao@gmail.com>

From: Barry Song

With large folio swap-in, we might need to uncharge multiple swap
entries together, so it is better to introduce a helper for that.
Signed-off-by: Barry Song
---
 include/linux/memcontrol.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1b79760af685..55958cbce61b 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -684,6 +684,14 @@ int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
 		gfp_t gfp, swp_entry_t entry);
 void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry);
 
+static inline void mem_cgroup_swapin_uncharge_swap_nr(swp_entry_t entry, int nr)
+{
+	int i;
+
+	for (i = 0; i < nr; i++, entry.val++)
+		mem_cgroup_swapin_uncharge_swap(entry);
+}
+
 void __mem_cgroup_uncharge(struct folio *folio);
 
 /**
@@ -1185,6 +1193,10 @@ static inline void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry)
 {
 }
 
+static inline void mem_cgroup_swapin_uncharge_swap_nr(swp_entry_t entry, int nr)
+{
+}
+
 static inline void mem_cgroup_uncharge(struct folio *folio)
 {
 }
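Since a large folio's swap slots are allocated contiguously, the nr
consecutive entries differ only in their offset, so simply incrementing
entry.val walks the range. A userspace toy with the same loop shape (the
typedef below is a stand-in, not the kernel's):

	#include <stdio.h>

	typedef struct { unsigned long val; } swp_entry_t;	/* stand-in */

	static void uncharge_one(swp_entry_t entry)
	{
		printf("uncharge entry %lu\n", entry.val);
	}

	/* same shape as mem_cgroup_swapin_uncharge_swap_nr() */
	static void uncharge_nr(swp_entry_t entry, int nr)
	{
		int i;

		for (i = 0; i < nr; i++, entry.val++)
			uncharge_one(entry);
	}

	int main(void)
	{
		swp_entry_t first = { .val = 100 };

		uncharge_nr(first, 4);	/* walks entries 100..103 */
		return 0;
	}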
From patchwork Fri Jul 26 09:46:17 2024
X-Patchwork-Id: 13742539
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: ying.huang@intel.com, baolin.wang@linux.alibaba.com, chrisl@kernel.org,
 david@redhat.com, hannes@cmpxchg.org, hughd@google.com,
 kaleshsingh@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org,
 mhocko@suse.com, minchan@kernel.org, nphamcs@gmail.com,
 ryan.roberts@arm.com, senozhatsky@chromium.org, shakeel.butt@linux.dev,
 shy828301@gmail.com, surenb@google.com, v-songbaohua@oppo.com,
 willy@infradead.org, xiang@kernel.org, yosryahmed@google.com,
 Chuanhua Han
Subject: [PATCH v5 3/4] mm: support large folios swapin as a whole for
 zRAM-like swapfile
Date: Fri, 26 Jul 2024 21:46:17 +1200
Message-Id: <20240726094618.401593-4-21cnbao@gmail.com>
In-Reply-To: <20240726094618.401593-1-21cnbao@gmail.com>
References: <20240726094618.401593-1-21cnbao@gmail.com>

From: Chuanhua Han

In an embedded system like Android, more than half of anonymous memory is
actually stored in swap devices such as zRAM. For instance, when an app
is switched to the background, most of its memory might be swapped out.

Currently, we have mTHP features, but unfortunately, without support for
large folio swap-ins, once those large folios are swapped out, we lose
them immediately because mTHP is a one-way ticket.

This patch introduces mTHP swap-in support. For now, we limit mTHP
swap-ins to contiguous swap entries that were likely swapped out from an
mTHP as a whole.

Additionally, the current implementation only covers the SWAP_SYNCHRONOUS
case. This is the simplest and most common use case, benefiting millions
of Android phones and similar devices with minimal implementation cost.
In this straightforward scenario, large folios are always exclusive,
eliminating the need to handle complex rmap and swapcache issues.

It offers several benefits:
1. Enables bidirectional mTHP swapping, allowing retrieval of mTHP after
   swap-out and swap-in.
2. Eliminates fragmentation in swap slots and supports successful
   THP_SWPOUT without fragmentation.
3. Enables zRAM/zsmalloc to compress and decompress mTHP, reducing CPU
   usage and significantly enhancing compression ratios.

Deploying this on millions of actual products, we haven't observed any
noticeable increase in memory footprint for 64KiB mTHP based on CONT-PTE
on ARM64.
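A note on the eligibility filter used below: an order-N swap-in is only
attempted when the faulting virtual address and the first swap offset
are congruent modulo 2^N, i.e. the entries could plausibly have been
written by an aligned order-N swap-out. A standalone restatement of just
that arithmetic (a userspace toy, not the kernel's
thp_swap_suitable_orders()):

	#include <stdio.h>

	#define PAGE_SHIFT 12

	static int order_suitable(unsigned long swp_offset, unsigned long addr,
				  int order)
	{
		unsigned long nr = 1UL << order;

		/* page index and swap offset must be congruent mod 1 << order */
		return ((addr >> PAGE_SHIFT) % nr) == (swp_offset % nr);
	}

	int main(void)
	{
		/* offset 16 at a 64KiB-aligned address: order-4 is a candidate */
		printf("%d\n", order_suitable(16, 0x10000, 4));
		/* offset 18 cannot back an aligned 16-page folio */
		printf("%d\n", order_suitable(18, 0x10000, 4));
		return 0;
	}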
Signed-off-by: Chuanhua Han
Co-developed-by: Barry Song
Signed-off-by: Barry Song
---
 mm/memory.c | 211 ++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 188 insertions(+), 23 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 833d2cad6eb2..14048e9285d4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3986,6 +3986,152 @@ static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
 	return VM_FAULT_SIGBUS;
 }
 
+/*
+ * check a range of PTEs are completely swap entries with
+ * contiguous swap offsets and the same SWAP_HAS_CACHE.
+ * ptep must be first one in the range
+ */
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static bool can_swapin_thp(struct vm_fault *vmf, pte_t *ptep, int nr_pages)
+{
+	struct swap_info_struct *si;
+	unsigned long addr;
+	swp_entry_t entry;
+	pgoff_t offset;
+	char has_cache;
+	int idx, i;
+	pte_t pte;
+
+	addr = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
+	idx = (vmf->address - addr) / PAGE_SIZE;
+	pte = ptep_get(ptep);
+
+	if (!pte_same(pte, pte_move_swp_offset(vmf->orig_pte, -idx)))
+		return false;
+	entry = pte_to_swp_entry(pte);
+	offset = swp_offset(entry);
+	if (swap_pte_batch(ptep, nr_pages, pte) != nr_pages)
+		return false;
+
+	si = swp_swap_info(entry);
+	has_cache = si->swap_map[offset] & SWAP_HAS_CACHE;
+	for (i = 1; i < nr_pages; i++) {
+		/*
+		 * while allocating a large folio and doing swap_read_folio for the
+		 * SWP_SYNCHRONOUS_IO path, which is the case the being faulted pte
+		 * doesn't have swapcache. We need to ensure all PTEs have no cache
+		 * as well, otherwise, we might go to swap devices while the content
+		 * is in swapcache
+		 */
+		if ((si->swap_map[offset + i] & SWAP_HAS_CACHE) != has_cache)
+			return false;
+	}
+
+	return true;
+}
+
+static inline unsigned long thp_swap_suitable_orders(pgoff_t swp_offset,
+		unsigned long addr, unsigned long orders)
+{
+	int order, nr;
+
+	order = highest_order(orders);
+
+	/*
+	 * To swap-in a THP with nr pages, we require its first swap_offset
+	 * is aligned with nr. This can filter out most invalid entries.
+	 */
+	while (orders) {
+		nr = 1 << order;
+		if ((addr >> PAGE_SHIFT) % nr == swp_offset % nr)
+			break;
+		order = next_order(&orders, order);
+	}
+
+	return orders;
+}
+#else
+static inline bool can_swapin_thp(struct vm_fault *vmf, pte_t *ptep, int nr_pages)
+{
+	return false;
+}
+#endif
+
+static struct folio *alloc_swap_folio(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	unsigned long orders;
+	struct folio *folio;
+	unsigned long addr;
+	swp_entry_t entry;
+	spinlock_t *ptl;
+	pte_t *pte;
+	gfp_t gfp;
+	int order;
+
+	/*
+	 * If uffd is active for the vma we need per-page fault fidelity to
+	 * maintain the uffd semantics.
+	 */
+	if (unlikely(userfaultfd_armed(vma)))
+		goto fallback;
+
+	/*
+	 * A large swapped out folio could be partially or fully in zswap. We
+	 * lack handling for such cases, so fallback to swapping in order-0
+	 * folio.
+	 */
+	if (!zswap_never_enabled())
+		goto fallback;
+
+	entry = pte_to_swp_entry(vmf->orig_pte);
+	/*
+	 * Get a list of all the (large) orders below PMD_ORDER that are enabled
+	 * and suitable for swapping THP.
+	 */
+	orders = thp_vma_allowable_orders(vma, vma->vm_flags,
+			TVA_IN_PF | TVA_ENFORCE_SYSFS, BIT(PMD_ORDER) - 1);
+	orders = thp_vma_suitable_orders(vma, vmf->address, orders);
+	orders = thp_swap_suitable_orders(swp_offset(entry), vmf->address, orders);
+
+	if (!orders)
+		goto fallback;
+
+	pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd, vmf->address & PMD_MASK, &ptl);
+	if (unlikely(!pte))
+		goto fallback;
+
+	/*
+	 * For do_swap_page, find the highest order where the aligned range is
+	 * completely swap entries with contiguous swap offsets.
+	 */
+	order = highest_order(orders);
+	while (orders) {
+		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
+		if (can_swapin_thp(vmf, pte + pte_index(addr), 1 << order))
+			break;
+		order = next_order(&orders, order);
+	}
+
+	pte_unmap_unlock(pte, ptl);
+
+	/* Try allocating the highest of the remaining orders. */
+	gfp = vma_thp_gfp_mask(vma);
+	while (orders) {
+		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
+		folio = vma_alloc_folio(gfp, order, vma, addr, true);
+		if (folio)
+			return folio;
+		order = next_order(&orders, order);
+	}
+
+fallback:
+#endif
+	return vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vmf->address, false);
+}
+
+
 /*
  * We enter with non-exclusive mmap_lock (to exclude vma changes,
  * but allow concurrent faults), and pte mapped but not yet locked.
@@ -4074,35 +4220,37 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (!folio) {
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
 		    __swap_count(entry) == 1) {
-			/*
-			 * Prevent parallel swapin from proceeding with
-			 * the cache flag. Otherwise, another thread may
-			 * finish swapin first, free the entry, and swapout
-			 * reusing the same entry. It's undetectable as
-			 * pte_same() returns true due to entry reuse.
-			 */
-			if (swapcache_prepare(entry)) {
-				/* Relax a bit to prevent rapid repeated page faults */
-				schedule_timeout_uninterruptible(1);
-				goto out;
-			}
-			need_clear_cache = true;
-
 			/* skip swapcache */
-			folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
-						vma, vmf->address, false);
+			folio = alloc_swap_folio(vmf);
 			page = &folio->page;
 			if (folio) {
 				__folio_set_locked(folio);
 				__folio_set_swapbacked(folio);
 
+				nr_pages = folio_nr_pages(folio);
+				if (folio_test_large(folio))
+					entry.val = ALIGN_DOWN(entry.val, nr_pages);
+				/*
+				 * Prevent parallel swapin from proceeding with
+				 * the cache flag. Otherwise, another thread may
+				 * finish swapin first, free the entry, and swapout
+				 * reusing the same entry. It's undetectable as
+				 * pte_same() returns true due to entry reuse.
+				 */
+				if (swapcache_prepare_nr(entry, nr_pages)) {
+					/* Relax a bit to prevent rapid repeated page faults */
+					schedule_timeout_uninterruptible(1);
+					goto out_page;
+				}
+				need_clear_cache = true;
+
 				if (mem_cgroup_swapin_charge_folio(folio,
 							vma->vm_mm, GFP_KERNEL,
 							entry)) {
 					ret = VM_FAULT_OOM;
 					goto out_page;
 				}
-				mem_cgroup_swapin_uncharge_swap(entry);
+				mem_cgroup_swapin_uncharge_swap_nr(entry, nr_pages);
 
 				shadow = get_shadow_from_swap_cache(entry);
 				if (shadow)
@@ -4209,6 +4357,22 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		goto out_nomap;
 	}
 
+	/* allocated large folios for SWP_SYNCHRONOUS_IO */
+	if (folio_test_large(folio) && !folio_test_swapcache(folio)) {
+		unsigned long nr = folio_nr_pages(folio);
+		unsigned long folio_start = ALIGN_DOWN(vmf->address, nr * PAGE_SIZE);
+		unsigned long idx = (vmf->address - folio_start) / PAGE_SIZE;
+		pte_t *folio_ptep = vmf->pte - idx;
+
+		if (!can_swapin_thp(vmf, folio_ptep, nr))
+			goto out_nomap;
+
+		page_idx = idx;
+		address = folio_start;
+		ptep = folio_ptep;
+		goto check_folio;
+	}
+
 	nr_pages = 1;
 	page_idx = 0;
 	address = vmf->address;
@@ -4340,11 +4504,12 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		folio_add_lru_vma(folio, vma);
 	} else if (!folio_test_anon(folio)) {
 		/*
-		 * We currently only expect small !anon folios, which are either
-		 * fully exclusive or fully shared. If we ever get large folios
-		 * here, we have to be careful.
+		 * We currently only expect small !anon folios which are either
+		 * fully exclusive or fully shared, or new allocated large folios
+		 * which are fully exclusive. If we ever get large folios within
+		 * swapcache here, we have to be careful.
 		 */
-		VM_WARN_ON_ONCE(folio_test_large(folio));
+		VM_WARN_ON_ONCE(folio_test_large(folio) && folio_test_swapcache(folio));
 		VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
 		folio_add_new_anon_rmap(folio, vma, address, rmap_flags);
 	} else {
@@ -4387,7 +4552,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 out:
 	/* Clear the swap cache pin for direct swapin after PTL unlock */
 	if (need_clear_cache)
-		swapcache_clear(si, entry);
+		swapcache_clear_nr(si, entry, nr_pages);
 	if (si)
 		put_swap_device(si);
 	return ret;
@@ -4403,7 +4568,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		folio_put(swapcache);
 	}
 	if (need_clear_cache)
-		swapcache_clear(si, entry);
+		swapcache_clear_nr(si, entry, nr_pages);
 	if (si)
 		put_swap_device(si);
 	return ret;
From patchwork Fri Jul 26 09:46:18 2024
X-Patchwork-Id: 13742540
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: ying.huang@intel.com, baolin.wang@linux.alibaba.com, chrisl@kernel.org,
 david@redhat.com, hannes@cmpxchg.org, hughd@google.com,
 kaleshsingh@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org,
 mhocko@suse.com, minchan@kernel.org, nphamcs@gmail.com,
 ryan.roberts@arm.com, senozhatsky@chromium.org, shakeel.butt@linux.dev,
 shy828301@gmail.com, surenb@google.com, v-songbaohua@oppo.com,
 willy@infradead.org, xiang@kernel.org, yosryahmed@google.com
Subject: [PATCH v5 4/4] mm: Introduce per-thpsize swapin control policy
Date: Fri, 26 Jul 2024 21:46:18 +1200
Message-Id: <20240726094618.401593-5-21cnbao@gmail.com>
In-Reply-To: <20240726094618.401593-1-21cnbao@gmail.com>
References: <20240726094618.401593-1-21cnbao@gmail.com>

From: Barry Song

Quote Ying's comment:

  A user space interface can be implemented to select different swap-in
  order policies, similar to the mTHP allocation order policy. We need a
  distinct policy because the performance characteristics of memory
  allocation differ significantly from those of swap-in.
  For example, SSD read speeds can be much slower than memory allocation.
  With policy selection, I believe we can implement mTHP swap-in for
  non-SWAP_SYNCHRONOUS scenarios as well. However, users need to
  understand the implications of their choices. I think that it's better
  to start with at least "always" and "never". I believe that we will add
  "auto" in the future to tune automatically, which can be used as the
  default eventually.

Suggested-by: "Huang, Ying"
Signed-off-by: Barry Song
---
 Documentation/admin-guide/mm/transhuge.rst |  6 +++
 include/linux/huge_mm.h                    |  1 +
 mm/huge_memory.c                           | 44 ++++++++++++++++++++++
 mm/memory.c                                |  3 +-
 4 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 058485daf186..2e94e956ee12 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -144,6 +144,12 @@ hugepage sizes have enabled="never". If enabling multiple hugepage sizes,
 the kernel will select the most appropriate enabled size for a given
 allocation.
 
+Transparent Hugepage Swap-in for anonymous memory can be disabled or enabled
+per supported THP size with one of::
+
+	echo always >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/swapin_enabled
+	echo never >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/swapin_enabled
+
 It's also possible to limit defrag efforts in the VM to generate
 anonymous hugepages in case they're not immediately free to madvise
 regions or to never try to defrag memory and simply fallback to regular
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index e25d9ebfdf89..25174305b17f 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -92,6 +92,7 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
 #define TVA_SMAPS		(1 << 0)	/* Will be used for procfs */
 #define TVA_IN_PF		(1 << 1)	/* Page fault handler */
 #define TVA_ENFORCE_SYSFS	(1 << 2)	/* Obey sysfs configuration */
+#define TVA_IN_SWAPIN		(1 << 3)	/* Do swap-in */
 
 #define thp_vma_allowable_order(vma, vm_flags, tva_flags, order) \
 	(!!thp_vma_allowable_orders(vma, vm_flags, tva_flags, BIT(order)))
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0167dc27e365..41460847988c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -80,6 +80,7 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
 unsigned long huge_anon_orders_always __read_mostly;
 unsigned long huge_anon_orders_madvise __read_mostly;
 unsigned long huge_anon_orders_inherit __read_mostly;
+unsigned long huge_anon_orders_swapin_always __read_mostly;
 
 unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 					 unsigned long vm_flags,
@@ -88,6 +89,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 {
 	bool smaps = tva_flags & TVA_SMAPS;
 	bool in_pf = tva_flags & TVA_IN_PF;
+	bool in_swapin = tva_flags & TVA_IN_SWAPIN;
 	bool enforce_sysfs = tva_flags & TVA_ENFORCE_SYSFS;
 	unsigned long supported_orders;
 
@@ -100,6 +102,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 		supported_orders = THP_ORDERS_ALL_FILE_DEFAULT;
 
 	orders &= supported_orders;
+	if (in_swapin)
+		orders &= READ_ONCE(huge_anon_orders_swapin_always);
 	if (!orders)
 		return 0;
 
@@ -523,8 +527,48 @@ static ssize_t thpsize_enabled_store(struct kobject *kobj,
 static struct kobj_attribute thpsize_enabled_attr =
 	__ATTR(enabled, 0644, thpsize_enabled_show, thpsize_enabled_store);
 
+static DEFINE_SPINLOCK(huge_anon_orders_swapin_lock);
+
+static ssize_t thpsize_swapin_enabled_show(struct kobject *kobj,
+					   struct kobj_attribute *attr, char *buf)
+{
+	int order = to_thpsize(kobj)->order;
+	const char *output;
+
+	if (test_bit(order, &huge_anon_orders_swapin_always))
+		output = "[always] never";
+	else
+		output = "always [never]";
+
+	return sysfs_emit(buf, "%s\n", output);
+}
+
+static ssize_t thpsize_swapin_enabled_store(struct kobject *kobj,
+					    struct kobj_attribute *attr,
+					    const char *buf, size_t count)
+{
+	int order = to_thpsize(kobj)->order;
+	ssize_t ret = count;
+
+	if (sysfs_streq(buf, "always")) {
+		spin_lock(&huge_anon_orders_swapin_lock);
+		set_bit(order, &huge_anon_orders_swapin_always);
+		spin_unlock(&huge_anon_orders_swapin_lock);
+	} else if (sysfs_streq(buf, "never")) {
+		spin_lock(&huge_anon_orders_swapin_lock);
+		clear_bit(order, &huge_anon_orders_swapin_always);
+		spin_unlock(&huge_anon_orders_swapin_lock);
+	} else
+		ret = -EINVAL;
+
+	return ret;
+}
+static struct kobj_attribute thpsize_swapin_enabled_attr =
+	__ATTR(swapin_enabled, 0644, thpsize_swapin_enabled_show, thpsize_swapin_enabled_store);
+
 static struct attribute *thpsize_attrs[] = {
 	&thpsize_enabled_attr.attr,
+	&thpsize_swapin_enabled_attr.attr,
 #ifdef CONFIG_SHMEM
 	&thpsize_shmem_enabled_attr.attr,
 #endif
diff --git a/mm/memory.c b/mm/memory.c
index 14048e9285d4..27c77f739a2c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4091,7 +4091,8 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
 	 * and suitable for swapping THP.
 	 */
 	orders = thp_vma_allowable_orders(vma, vma->vm_flags,
-			TVA_IN_PF | TVA_ENFORCE_SYSFS, BIT(PMD_ORDER) - 1);
+			TVA_IN_PF | TVA_IN_SWAPIN | TVA_ENFORCE_SYSFS,
+			BIT(PMD_ORDER) - 1);
 	orders = thp_vma_suitable_orders(vma, vmf->address, orders);
 	orders = thp_swap_suitable_orders(swp_offset(entry), vmf->address, orders);
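Exercising the new knob follows the existing per-size sysfs layout. A
hedged usage example (assuming a hugepages-64kB directory exists on the
running kernel; the policy defaults to "never"):

	# enable 64KiB mTHP swap-in, leave all other sizes disabled
	echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/swapin_enabled
	cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/swapin_enabled
	[always] never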