From patchwork Tue Feb 11 00:30:24 2025
X-Patchwork-Submitter: Nico Pache
X-Patchwork-Id: 13968639
From: Nico Pache <npache@redhat.com>
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
    linux-mm@kvack.org
Cc: ryan.roberts@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com,
    cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com,
    dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org,
    jack@suse.cz, srivatsa@csail.mit.edu, haowenchao22@gmail.com,
    hughd@google.com, aneesh.kumar@kernel.org, yang@os.amperecomputing.com,
    peterx@redhat.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com,
    ziy@nvidia.com, jglisse@google.com, surenb@google.com,
    vishal.moola@gmail.com, zokeefe@google.com, zhengqi.arch@bytedance.com,
    jhubbard@nvidia.com, 21cnbao@gmail.com, willy@infradead.org,
    kirill.shutemov@linux.intel.com, david@redhat.com, aarcange@redhat.com,
    raquini@redhat.com, dev.jain@arm.com, sunnanyong@huawei.com,
    usamaarif642@gmail.com, audra@redhat.com, akpm@linux-foundation.org,
    rostedt@goodmis.org, mathieu.desnoyers@efficios.com, tiwai@suse.de
Subject: [RFC v2 5/9] khugepaged: generalize __collapse_huge_page_* for mTHP support
Date: Mon, 10 Feb 2025 17:30:24 -0700
Message-ID: <20250211003028.213461-6-npache@redhat.com>
In-Reply-To: <20250211003028.213461-1-npache@redhat.com>
References: <20250211003028.213461-1-npache@redhat.com>
MIME-Version: 1.0
Generalize the order of the __collapse_huge_page_* functions to support
future mTHP collapse.

mTHP collapse can suffer from inconsistent behavior and memory waste
"creep"; disable swapin and shared support for mTHP collapse.

No functional changes in this patch.
Signed-off-by: Nico Pache <npache@redhat.com>
---
 mm/khugepaged.c | 48 ++++++++++++++++++++++++++++--------------------
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 0cfcdc11cabd..3776055bd477 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -565,15 +565,17 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 					unsigned long address,
 					pte_t *pte,
 					struct collapse_control *cc,
-					struct list_head *compound_pagelist)
+					struct list_head *compound_pagelist,
+					u8 order)
 {
 	struct page *page = NULL;
 	struct folio *folio = NULL;
 	pte_t *_pte;
 	int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
 	bool writable = false;
+	int scaled_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
 
-	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
+	for (_pte = pte; _pte < pte + (1 << order);
 	     _pte++, address += PAGE_SIZE) {
 		pte_t pteval = ptep_get(_pte);
 		if (pte_none(pteval) || (pte_present(pteval) &&
@@ -581,7 +583,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			++none_or_zero;
 			if (!userfaultfd_armed(vma) &&
 			    (!cc->is_khugepaged ||
-			     none_or_zero <= khugepaged_max_ptes_none)) {
+			     none_or_zero <= scaled_none)) {
 				continue;
 			} else {
 				result = SCAN_EXCEED_NONE_PTE;
@@ -609,8 +611,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		/* See khugepaged_scan_pmd(). */
 		if (folio_likely_mapped_shared(folio)) {
 			++shared;
-			if (cc->is_khugepaged &&
-			    shared > khugepaged_max_ptes_shared) {
+			if (order != HPAGE_PMD_ORDER || (cc->is_khugepaged &&
+			    shared > khugepaged_max_ptes_shared)) {
 				result = SCAN_EXCEED_SHARED_PTE;
 				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
 				goto out;
@@ -711,14 +713,15 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
 						struct vm_area_struct *vma,
 						unsigned long address,
 						spinlock_t *ptl,
-						struct list_head *compound_pagelist)
+						struct list_head *compound_pagelist,
+						u8 order)
 {
 	struct folio *src, *tmp;
 	pte_t *_pte;
 	pte_t pteval;
 
-	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
-	     _pte++, address += PAGE_SIZE) {
+	for (_pte = pte; _pte < pte + (1 << order);
+	     _pte++, address += PAGE_SIZE) {
 		pteval = ptep_get(_pte);
 		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
 			add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1);
@@ -764,7 +767,8 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
 					     pmd_t *pmd,
 					     pmd_t orig_pmd,
 					     struct vm_area_struct *vma,
-					     struct list_head *compound_pagelist)
+					     struct list_head *compound_pagelist,
+					     u8 order)
 {
 	spinlock_t *pmd_ptl;
 
@@ -781,7 +785,7 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
 	 * Release both raw and compound pages isolated
 	 * in __collapse_huge_page_isolate.
 	 */
-	release_pte_pages(pte, pte + HPAGE_PMD_NR, compound_pagelist);
+	release_pte_pages(pte, pte + (1 << order), compound_pagelist);
 }
 
 /*
@@ -802,7 +806,7 @@ static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
 		pmd_t *pmd, pmd_t orig_pmd, struct vm_area_struct *vma,
 		unsigned long address, spinlock_t *ptl,
-		struct list_head *compound_pagelist)
+		struct list_head *compound_pagelist, u8 order)
 {
 	unsigned int i;
 	int result = SCAN_SUCCEED;
 
@@ -810,7 +814,7 @@ static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
 	/*
 	 * Copying pages' contents is subject to memory poison at any iteration.
 	 */
-	for (i = 0; i < HPAGE_PMD_NR; i++) {
+	for (i = 0; i < (1 << order); i++) {
 		pte_t pteval = ptep_get(pte + i);
 		struct page *page = folio_page(folio, i);
 		unsigned long src_addr = address + i * PAGE_SIZE;
@@ -829,10 +833,10 @@ static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
 
 	if (likely(result == SCAN_SUCCEED))
 		__collapse_huge_page_copy_succeeded(pte, vma, address, ptl,
-						    compound_pagelist);
+						    compound_pagelist, order);
 	else
 		__collapse_huge_page_copy_failed(pte, pmd, orig_pmd, vma,
-						 compound_pagelist);
+						 compound_pagelist, order);
 
 	return result;
 }
@@ -1000,11 +1004,11 @@ static int check_pmd_still_valid(struct mm_struct *mm,
 static int __collapse_huge_page_swapin(struct mm_struct *mm,
 				       struct vm_area_struct *vma,
 				       unsigned long haddr, pmd_t *pmd,
-				       int referenced)
+				       int referenced, u8 order)
 {
 	int swapped_in = 0;
 	vm_fault_t ret = 0;
-	unsigned long address, end = haddr + (HPAGE_PMD_NR * PAGE_SIZE);
+	unsigned long address, end = haddr + (PAGE_SIZE << order);
 	int result;
 	pte_t *pte = NULL;
 	spinlock_t *ptl;
@@ -1035,6 +1039,11 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
 		if (!is_swap_pte(vmf.orig_pte))
 			continue;
 
+		if (order != HPAGE_PMD_ORDER) {
+			result = SCAN_EXCEED_SWAP_PTE;
+			goto out;
+		}
+
 		vmf.pte = pte;
 		vmf.ptl = ptl;
 		ret = do_swap_page(&vmf);
@@ -1114,7 +1123,6 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	int result = SCAN_FAIL;
 	struct vm_area_struct *vma;
 	struct mmu_notifier_range range;
-
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
 	/*
@@ -1149,7 +1157,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 		 * that case. Continuing to collapse causes inconsistency.
 		 */
 		result = __collapse_huge_page_swapin(mm, vma, address, pmd,
-						     referenced);
+						     referenced, HPAGE_PMD_ORDER);
 		if (result != SCAN_SUCCEED)
 			goto out_nolock;
 	}
@@ -1196,7 +1204,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	pte = pte_offset_map_lock(mm, &_pmd, address, &pte_ptl);
 	if (pte) {
 		result = __collapse_huge_page_isolate(vma, address, pte, cc,
-						      &compound_pagelist);
+						      &compound_pagelist, HPAGE_PMD_ORDER);
 		spin_unlock(pte_ptl);
 	} else {
 		result = SCAN_PMD_NULL;
@@ -1226,7 +1234,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	result = __collapse_huge_page_copy(pte, folio, pmd, _pmd,
 					   vma, address, pte_ptl,
-					   &compound_pagelist);
+					   &compound_pagelist, HPAGE_PMD_ORDER);
 	pte_unmap(pte);
 	if (unlikely(result != SCAN_SUCCEED))
 		goto out_up_write;