From patchwork Tue Jul 30 12:46:03 2024
X-Patchwork-Submitter: Usama Arif
X-Patchwork-Id: 13747384
From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev,
    roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com,
    baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org,
    willy@infradead.org, cerasuolodomenico@gmail.com, corbet@lwn.net,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    kernel-team@meta.com, Usama Arif
Subject: [PATCH 6/6] mm: split underutilized THPs
Date: Tue, 30 Jul 2024 13:46:03 +0100
Message-ID: <20240730125346.1580150-7-usamaarif642@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240730125346.1580150-1-usamaarif642@gmail.com>
References: <20240730125346.1580150-1-usamaarif642@gmail.com>
This is an attempt to mitigate the issue of running out of memory when THP is always enabled. During runtime, whenever a THP is faulted in (__do_huge_pmd_anonymous_page) or collapsed by khugepaged (collapse_huge_page), the THP is added to _deferred_list. Whenever memory reclaim happens in Linux, the kernel runs the deferred_split shrinker, which goes through the _deferred_list.

If the folio was partially mapped, the shrinker attempts to split it. A new boolean is added to distinguish partially mapped folios from the others on the deferred_list at split time in deferred_split_scan. It is needed because __folio_remove_rmap decrements the folio mapcount elements, so without the boolean it would not be possible to tell partially mapped folios apart from the others in deferred_split_scan.

If folio->_partially_mapped is not set, the shrinker checks whether the THP is underutilized, i.e. how many of the base 4K pages of the entire THP are zero-filled. If this number is above a certain threshold (decided by /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none), the shrinker attempts to split that THP. At remap time, the pages that were zero-filled are then not remapped, saving memory.
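As a reader aid (not part of the patch), the decision the shrinker makes can be pictured with the minimal userspace sketch below: a 2 MiB buffer stands in for a PMD-sized THP and the threshold argument stands in for the max_ptes_none knob. All names and constants here are illustrative only; the in-kernel version is thp_underutilized() in the mm/huge_memory.c hunk below.

/*
 * Userspace illustration of the underutilization check described above.
 * This is a sketch, not kernel code: the buffer stands in for a THP and
 * max_ptes_none for the khugepaged sysfs knob.
 */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SUBPAGE_SIZE    4096UL
#define NR_SUBPAGES     512UL   /* 2 MiB THP / 4 KiB base pages */

/* A THP counts as underutilized once more than max_ptes_none subpages are zero-filled. */
static bool thp_is_underutilized(const unsigned char *thp, unsigned long max_ptes_none)
{
        unsigned long zero = 0, filled = 0, i;

        for (i = 0; i < NR_SUBPAGES; i++) {
                const unsigned char *sub = thp + i * SUBPAGE_SIZE;

                /* all-zero check: first byte is zero and every byte equals its neighbour */
                if (sub[0] == 0 && !memcmp(sub, sub + 1, SUBPAGE_SIZE - 1)) {
                        if (++zero > max_ptes_none)
                                return true;    /* enough zero subpages: worth splitting */
                } else {
                        if (++filled >= NR_SUBPAGES - max_ptes_none)
                                return false;   /* enough used subpages: keep the THP */
                }
        }
        return false;
}

int main(void)
{
        unsigned char *thp = calloc(NR_SUBPAGES, SUBPAGE_SIZE);

        if (!thp)
                return 1;
        /* Touch only the first subpage; the other 511 stay zero-filled. */
        thp[0] = 1;
        printf("underutilized (threshold 510): %d\n", thp_is_underutilized(thp, 510));
        free(thp);
        return 0;
}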
Suggested-by: Rik van Riel
Co-authored-by: Johannes Weiner
Signed-off-by: Usama Arif
---
 Documentation/admin-guide/mm/transhuge.rst |   6 ++
 include/linux/huge_mm.h                    |   4 +-
 include/linux/khugepaged.h                 |   1 +
 include/linux/mm_types.h                   |   2 +
 include/linux/vm_event_item.h              |   1 +
 mm/huge_memory.c                           | 118 ++++++++++++++++++---
 mm/hugetlb.c                               |   1 +
 mm/internal.h                              |   4 +-
 mm/khugepaged.c                            |   3 +-
 mm/memcontrol.c                            |   3 +-
 mm/migrate.c                               |   3 +-
 mm/rmap.c                                  |   2 +-
 mm/vmscan.c                                |   3 +-
 mm/vmstat.c                                |   1 +
 14 files changed, 130 insertions(+), 22 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 058485daf186..24eec1c03ad8 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -447,6 +447,12 @@ thp_deferred_split_page
         splitting it would free up some memory. Pages on split queue are
         going to be split under memory pressure.
 
+thp_underutilized_split_page
+        is incremented when a huge page on the split queue was split
+        because it was underutilized. A THP is underutilized if the
+        number of zero pages in the THP are above a certain threshold
+        (/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none).
+
 thp_split_pmd
         is incremented every time a PMD split into table of PTEs.
         This can happen, for instance, when application calls mprotect() or
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index e25d9ebfdf89..00af84aa88ea 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -321,7 +321,7 @@ static inline int split_huge_page(struct page *page)
 {
         return split_huge_page_to_list_to_order(page, NULL, 0);
 }
-void deferred_split_folio(struct folio *folio);
+void deferred_split_folio(struct folio *folio, bool partially_mapped);
 
 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
                 unsigned long address, bool freeze, struct folio *folio);
@@ -484,7 +484,7 @@ static inline int split_huge_page(struct page *page)
 {
         return 0;
 }
-static inline void deferred_split_folio(struct folio *folio) {}
+static inline void deferred_split_folio(struct folio *folio, bool partially_mapped) {}
 
 #define split_huge_pmd(__vma, __pmd, __address) \
         do { } while (0)
diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index f68865e19b0b..30baae91b225 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -4,6 +4,7 @@
 
 #include <linux/sched/coredump.h> /* MMF_VM_HUGEPAGE */
 
+extern unsigned int khugepaged_max_ptes_none __read_mostly;
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 extern struct attribute_group khugepaged_attr_group;
 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 485424979254..443026cf763e 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -311,6 +311,7 @@ typedef struct {
  * @_hugetlb_cgroup_rsvd: Do not use directly, use accessor in hugetlb_cgroup.h.
  * @_hugetlb_hwpoison: Do not use directly, call raw_hwp_list_head().
  * @_deferred_list: Folios to be split under memory pressure.
+ * @_partially_mapped: Folio was partially mapped.
  * @_unused_slab_obj_exts: Placeholder to match obj_exts in struct slab.
  *
  * A folio is a physically, virtually and logically contiguous set
@@ -393,6 +394,7 @@ struct folio {
                         unsigned long _head_2a;
         /* public: */
                         struct list_head _deferred_list;
+                        bool _partially_mapped;
         /* private: the union with struct page is transitional */
                 };
                 struct page __page_2;
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index aae5c7c5cfb4..bf1470a7a737 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -105,6 +105,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
                 THP_SPLIT_PAGE,
                 THP_SPLIT_PAGE_FAILED,
                 THP_DEFERRED_SPLIT_PAGE,
+                THP_UNDERUTILIZED_SPLIT_PAGE,
                 THP_SPLIT_PMD,
                 THP_SCAN_EXCEED_NONE_PTE,
                 THP_SCAN_EXCEED_SWAP_PTE,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 892467d85f3a..3305e6d0b90e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -73,6 +73,7 @@ static unsigned long deferred_split_count(struct shrinker *shrink,
                                          struct shrink_control *sc);
 static unsigned long deferred_split_scan(struct shrinker *shrink,
                                          struct shrink_control *sc);
+static bool split_underutilized_thp = true;
 
 static atomic_t huge_zero_refcount;
 struct folio *huge_zero_folio __read_mostly;
@@ -438,6 +439,27 @@ static ssize_t hpage_pmd_size_show(struct kobject *kobj,
 static struct kobj_attribute hpage_pmd_size_attr =
         __ATTR_RO(hpage_pmd_size);
 
+static ssize_t split_underutilized_thp_show(struct kobject *kobj,
+                                            struct kobj_attribute *attr, char *buf)
+{
+        return sysfs_emit(buf, "%d\n", split_underutilized_thp);
+}
+
+static ssize_t split_underutilized_thp_store(struct kobject *kobj,
+                                             struct kobj_attribute *attr,
+                                             const char *buf, size_t count)
+{
+        int err = kstrtobool(buf, &split_underutilized_thp);
+
+        if (err < 0)
+                return err;
+
+        return count;
+}
+
+static struct kobj_attribute split_underutilized_thp_attr = __ATTR(
+        thp_low_util_shrinker, 0644, split_underutilized_thp_show, split_underutilized_thp_store);
+
 static struct attribute *hugepage_attr[] = {
         &enabled_attr.attr,
         &defrag_attr.attr,
@@ -446,6 +468,7 @@ static struct attribute *hugepage_attr[] = {
 #ifdef CONFIG_SHMEM
         &shmem_enabled_attr.attr,
 #endif
+        &split_underutilized_thp_attr.attr,
         NULL,
 };
 
@@ -1002,6 +1025,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
                 update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
                 add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
                 mm_inc_nr_ptes(vma->vm_mm);
+                deferred_split_folio(folio, false);
                 spin_unlock(vmf->ptl);
                 count_vm_event(THP_FAULT_ALLOC);
                 count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
@@ -3259,6 +3283,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
                          * page_deferred_list.
                          */
                         list_del_init(&folio->_deferred_list);
+                        folio->_partially_mapped = false;
                 }
                 spin_unlock(&ds_queue->split_queue_lock);
                 if (mapping) {
@@ -3315,11 +3340,12 @@ void __folio_undo_large_rmappable(struct folio *folio)
         if (!list_empty(&folio->_deferred_list)) {
                 ds_queue->split_queue_len--;
                 list_del_init(&folio->_deferred_list);
+                folio->_partially_mapped = false;
         }
         spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 }
 
-void deferred_split_folio(struct folio *folio)
+void deferred_split_folio(struct folio *folio, bool partially_mapped)
 {
         struct deferred_split *ds_queue = get_deferred_split_queue(folio);
 #ifdef CONFIG_MEMCG
@@ -3334,6 +3360,9 @@ void deferred_split_folio(struct folio *folio)
         if (folio_order(folio) <= 1)
                 return;
 
+        if (!partially_mapped && !split_underutilized_thp)
+                return;
+
         /*
          * The try_to_unmap() in page reclaim path might reach here too,
          * this may cause a race condition to corrupt deferred split queue.
@@ -3347,14 +3376,14 @@ void deferred_split_folio(struct folio *folio)
         if (folio_test_swapcache(folio))
                 return;
 
-        if (!list_empty(&folio->_deferred_list))
-                return;
-
         spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
+        folio->_partially_mapped = partially_mapped;
         if (list_empty(&folio->_deferred_list)) {
-                if (folio_test_pmd_mappable(folio))
-                        count_vm_event(THP_DEFERRED_SPLIT_PAGE);
-                count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED);
+                if (partially_mapped) {
+                        if (folio_test_pmd_mappable(folio))
+                                count_vm_event(THP_DEFERRED_SPLIT_PAGE);
+                        count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED);
+                }
                 list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
                 ds_queue->split_queue_len++;
 #ifdef CONFIG_MEMCG
@@ -3379,6 +3408,39 @@ static unsigned long deferred_split_count(struct shrinker *shrink,
         return READ_ONCE(ds_queue->split_queue_len);
 }
 
+static bool thp_underutilized(struct folio *folio)
+{
+        int num_zero_pages = 0, num_filled_pages = 0;
+        void *kaddr;
+        int i;
+
+        if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1)
+                return false;
+
+        for (i = 0; i < folio_nr_pages(folio); i++) {
+                kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
+                if (memchr_inv(kaddr, 0, PAGE_SIZE) == NULL) {
+                        num_zero_pages++;
+                        if (num_zero_pages > khugepaged_max_ptes_none) {
+                                kunmap_local(kaddr);
+                                return true;
+                        }
+                } else {
+                        /*
+                         * Another path for early exit once the number
+                         * of non-zero filled pages exceeds threshold.
+                         */
+                        num_filled_pages++;
+                        if (num_filled_pages >= HPAGE_PMD_NR - khugepaged_max_ptes_none) {
+                                kunmap_local(kaddr);
+                                return false;
+                        }
+                }
+                kunmap_local(kaddr);
+        }
+        return false;
+}
+
 static unsigned long deferred_split_scan(struct shrinker *shrink,
                                          struct shrink_control *sc)
 {
@@ -3403,6 +3465,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
                 } else {
                         /* We lost race with folio_put() */
                         list_del_init(&folio->_deferred_list);
+                        folio->_partially_mapped = false;
                         ds_queue->split_queue_len--;
                 }
                 if (!--sc->nr_to_scan)
@@ -3411,18 +3474,45 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
         spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 
         list_for_each_entry_safe(folio, next, &list, _deferred_list) {
+                bool did_split = false;
+                bool underutilized = false;
+
+                if (folio->_partially_mapped)
+                        goto split;
+                underutilized = thp_underutilized(folio);
+                if (underutilized)
+                        goto split;
+                continue;
+split:
                 if (!folio_trylock(folio))
-                        goto next;
-                /* split_huge_page() removes page from list on success */
-                if (!split_folio(folio))
-                        split++;
+                        continue;
+                did_split = !split_folio(folio);
                 folio_unlock(folio);
-next:
-                folio_put(folio);
+                if (did_split) {
+                        /* Splitting removed folio from the list, drop reference here */
+                        folio_put(folio);
+                        if (underutilized)
+                                count_vm_event(THP_UNDERUTILIZED_SPLIT_PAGE);
+                        split++;
+                }
         }
 
         spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
-        list_splice_tail(&list, &ds_queue->split_queue);
+        /*
+         * Only add back to the queue if folio->_partially_mapped is set.
+         * If thp_underutilized returns false, or if split_folio fails in
+         * the case it was underutilized, then consider it used and don't
+         * add it back to split_queue.
+         */
+        list_for_each_entry_safe(folio, next, &list, _deferred_list) {
+                if (folio->_partially_mapped)
+                        list_move(&folio->_deferred_list, &ds_queue->split_queue);
+                else {
+                        list_del_init(&folio->_deferred_list);
+                        ds_queue->split_queue_len--;
+                }
+                folio_put(folio);
+        }
         spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 
         /*
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5a32157ca309..df2da47d0637 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1758,6 +1758,7 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
                 free_gigantic_folio(folio, huge_page_order(h));
         } else {
                 INIT_LIST_HEAD(&folio->_deferred_list);
+                folio->_partially_mapped = false;
                 folio_put(folio);
         }
 }
diff --git a/mm/internal.h b/mm/internal.h
index 259afe44dc88..8fc072cc3023 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -657,8 +657,10 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
         atomic_set(&folio->_entire_mapcount, -1);
         atomic_set(&folio->_nr_pages_mapped, 0);
         atomic_set(&folio->_pincount, 0);
-        if (order > 1)
+        if (order > 1) {
                 INIT_LIST_HEAD(&folio->_deferred_list);
+                folio->_partially_mapped = false;
+        }
 }
 
 static inline void prep_compound_tail(struct page *head, int tail_idx)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index f3b3db104615..5a434fdbc1ef 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -85,7 +85,7 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
  *
  * Note that these are only respected if collapse was initiated by khugepaged.
  */
-static unsigned int khugepaged_max_ptes_none __read_mostly;
+unsigned int khugepaged_max_ptes_none __read_mostly;
 static unsigned int khugepaged_max_ptes_swap __read_mostly;
 static unsigned int khugepaged_max_ptes_shared __read_mostly;
 
@@ -1235,6 +1235,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
         pgtable_trans_huge_deposit(mm, pmd, pgtable);
         set_pmd_at(mm, address, pmd, _pmd);
         update_mmu_cache_pmd(vma, address, pmd);
+        deferred_split_folio(folio, false);
         spin_unlock(pmd_ptl);
 
         folio = NULL;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f568b9594c2b..2ee61d619d86 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4651,7 +4651,8 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
         VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
         VM_BUG_ON_FOLIO(folio_order(folio) > 1 &&
                         !folio_test_hugetlb(folio) &&
-                        !list_empty(&folio->_deferred_list), folio);
+                        !list_empty(&folio->_deferred_list) &&
+                        folio->_partially_mapped, folio);
 
         /*
          * Nobody should be changing or seriously looking at
diff --git a/mm/migrate.c b/mm/migrate.c
index f4f06bdded70..2731ac20ff33 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1734,7 +1734,8 @@ static int migrate_pages_batch(struct list_head *from,
                          * use _deferred_list.
                          */
                         if (nr_pages > 2 &&
-                            !list_empty(&folio->_deferred_list)) {
+                            !list_empty(&folio->_deferred_list) &&
+                            folio->_partially_mapped) {
                                 if (try_split_folio(folio, split_folios) == 0) {
                                         nr_failed++;
                                         stats->nr_thp_failed += is_thp;
diff --git a/mm/rmap.c b/mm/rmap.c
index 2630bde38640..1b5418121965 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1582,7 +1582,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
                  */
                 if (folio_test_anon(folio) && partially_mapped &&
                     list_empty(&folio->_deferred_list))
-                        deferred_split_folio(folio);
+                        deferred_split_folio(folio, true);
         }
 
         __folio_mod_stat(folio, -nr, -nr_pmdmapped);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c89d0551655e..1bee9b1262f6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1233,7 +1233,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
                          * Split partially mapped folios right away.
                          * We can free the unmapped pages without IO.
                          */
-                        if (data_race(!list_empty(&folio->_deferred_list)) &&
+                        if (data_race(!list_empty(&folio->_deferred_list) &&
+                            folio->_partially_mapped) &&
                             split_folio_to_list(folio, folio_list))
                                 goto activate_locked;
                 }
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 5082431dad28..525fad4a1d6d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1367,6 +1367,7 @@ const char * const vmstat_text[] = {
         "thp_split_page",
         "thp_split_page_failed",
         "thp_deferred_split_page",
+        "thp_underutilized_split_page",
         "thp_split_pmd",
         "thp_scan_exceed_none_pte",
         "thp_scan_exceed_swap_pte",
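As a testing aid (not shipped with the patch), the snippet below shows one way to inspect the new user-visible pieces from userspace on a kernel with this series applied: the thp_low_util_shrinker toggle, whose sysfs path is inferred from the hugepage_attr[] hunk above, and the thp_underutilized_split_page counter in /proc/vmstat. Treat the path as an assumption and adjust if it differs on your kernel.

/*
 * Read the (assumed) thp_low_util_shrinker sysfs toggle and the new
 * thp_underutilized_split_page counter from /proc/vmstat.
 */
#include <stdio.h>
#include <string.h>

#define KNOB "/sys/kernel/mm/transparent_hugepage/thp_low_util_shrinker"

int main(void)
{
        char line[256];
        FILE *f = fopen(KNOB, "r");

        if (f) {
                if (fgets(line, sizeof(line), f))
                        printf("thp_low_util_shrinker: %s", line);
                fclose(f);
        } else {
                perror(KNOB);
        }

        f = fopen("/proc/vmstat", "r");
        if (!f) {
                perror("/proc/vmstat");
                return 1;
        }
        while (fgets(line, sizeof(line), f)) {
                /* print the counter added by this patch, if present */
                if (!strncmp(line, "thp_underutilized_split_page", 28))
                        printf("%s", line);
        }
        fclose(f);
        return 0;
}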