From patchwork Tue Feb 11 00:30:20 2025
X-Patchwork-Submitter: Nico Pache
X-Patchwork-Id: 13968995
From: Nico Pache
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linux-mm@kvack.org
Cc: ryan.roberts@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com,
	cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com,
	dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org,
	jack@suse.cz, srivatsa@csail.mit.edu, haowenchao22@gmail.com,
	hughd@google.com, aneesh.kumar@kernel.org, yang@os.amperecomputing.com,
	peterx@redhat.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com,
	ziy@nvidia.com, jglisse@google.com, surenb@google.com,
	vishal.moola@gmail.com, zokeefe@google.com, zhengqi.arch@bytedance.com,
	jhubbard@nvidia.com, 21cnbao@gmail.com, willy@infradead.org,
	kirill.shutemov@linux.intel.com, david@redhat.com, aarcange@redhat.com,
	raquini@redhat.com, dev.jain@arm.com, sunnanyong@huawei.com,
	usamaarif642@gmail.com, audra@redhat.com, akpm@linux-foundation.org,
	rostedt@goodmis.org, mathieu.desnoyers@efficios.com, tiwai@suse.de
Subject: [RFC v2 1/9] introduce khugepaged_collapse_single_pmd to unify
 khugepaged and madvise_collapse
Date: Mon, 10 Feb 2025 17:30:20 -0700
Message-ID: <20250211003028.213461-2-npache@redhat.com>
In-Reply-To: <20250211003028.213461-1-npache@redhat.com>
References: <20250211003028.213461-1-npache@redhat.com>

The khugepaged daemon and madvise_collapse have two different
implementations that do almost the same thing.

Create khugepaged_collapse_single_pmd to increase code reuse and create
an entry point for future khugepaged changes.

Refactor madvise_collapse and khugepaged_scan_mm_slot to use the new
khugepaged_collapse_single_pmd function.

Signed-off-by: Nico Pache
---
 mm/khugepaged.c | 96 +++++++++++++++++++++++++------------------------
 1 file changed, 50 insertions(+), 46 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5f0be134141e..46faee67378b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2365,6 +2365,52 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 }
 #endif
 
+/*
+ * Try to collapse a single PMD starting at a PMD aligned addr, and return
+ * the results.
+ */
+static int khugepaged_collapse_single_pmd(unsigned long addr, struct mm_struct *mm,
+				   struct vm_area_struct *vma, bool *mmap_locked,
+				   struct collapse_control *cc)
+{
+	int result = SCAN_FAIL;
+	unsigned long tva_flags = cc->is_khugepaged ? TVA_ENFORCE_SYSFS : 0;
+
+	if (!*mmap_locked) {
+		mmap_read_lock(mm);
+		*mmap_locked = true;
+	}
+
+	if (thp_vma_allowable_order(vma, vma->vm_flags,
+				    tva_flags, PMD_ORDER)) {
+		if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
+			struct file *file = get_file(vma->vm_file);
+			pgoff_t pgoff = linear_page_index(vma, addr);
+
+			mmap_read_unlock(mm);
+			*mmap_locked = false;
+			result = hpage_collapse_scan_file(mm, addr, file, pgoff,
+							  cc);
+			fput(file);
+			if (result == SCAN_PTE_MAPPED_HUGEPAGE) {
+				mmap_read_lock(mm);
+				if (hpage_collapse_test_exit_or_disable(mm))
+					goto end;
+				result = collapse_pte_mapped_thp(mm, addr,
+							!cc->is_khugepaged);
+				mmap_read_unlock(mm);
+			}
+		} else {
+			result = hpage_collapse_scan_pmd(mm, vma, addr,
+							 mmap_locked, cc);
+		}
+		if (result == SCAN_SUCCEED || result == SCAN_PMD_MAPPED)
+			++khugepaged_pages_collapsed;
+	}
+end:
+	return result;
+}
+
 static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 					    struct collapse_control *cc)
 	__releases(&khugepaged_mm_lock)
@@ -2439,33 +2485,9 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 		VM_BUG_ON(khugepaged_scan.address < hstart ||
 			  khugepaged_scan.address + HPAGE_PMD_SIZE > hend);
 
-		if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
-			struct file *file = get_file(vma->vm_file);
-			pgoff_t pgoff = linear_page_index(vma,
-					khugepaged_scan.address);
-
-			mmap_read_unlock(mm);
-			mmap_locked = false;
-			*result = hpage_collapse_scan_file(mm,
-					khugepaged_scan.address, file, pgoff, cc);
-			fput(file);
-			if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
-				mmap_read_lock(mm);
-				if (hpage_collapse_test_exit_or_disable(mm))
-					goto breakouterloop;
-				*result = collapse_pte_mapped_thp(mm,
-					khugepaged_scan.address, false);
-				if (*result == SCAN_PMD_MAPPED)
-					*result = SCAN_SUCCEED;
-				mmap_read_unlock(mm);
-			}
-		} else {
-			*result = hpage_collapse_scan_pmd(mm, vma,
-					khugepaged_scan.address, &mmap_locked, cc);
-		}
-
-		if (*result == SCAN_SUCCEED)
-			++khugepaged_pages_collapsed;
+		*result = khugepaged_collapse_single_pmd(khugepaged_scan.address,
+					mm, vma, &mmap_locked, cc);
 
 		/* move to next address */
 		khugepaged_scan.address += HPAGE_PMD_SIZE;
@@ -2785,36 +2807,18 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		mmap_assert_locked(mm);
 		memset(cc->node_load, 0, sizeof(cc->node_load));
 		nodes_clear(cc->alloc_nmask);
-		if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
-			struct file *file = get_file(vma->vm_file);
-			pgoff_t pgoff = linear_page_index(vma, addr);
-
-			mmap_read_unlock(mm);
-			mmap_locked = false;
-			result = hpage_collapse_scan_file(mm, addr, file, pgoff,
-							  cc);
-			fput(file);
-		} else {
-			result = hpage_collapse_scan_pmd(mm, vma, addr,
-							 &mmap_locked, cc);
-		}
+		result = khugepaged_collapse_single_pmd(addr, mm, vma, &mmap_locked, cc);
+
 		if (!mmap_locked)
 			*prev = NULL;  /* Tell caller we dropped mmap_lock */
 
-handle_result:
 		switch (result) {
 		case SCAN_SUCCEED:
 		case SCAN_PMD_MAPPED:
 			++thps;
 			break;
 		case SCAN_PTE_MAPPED_HUGEPAGE:
-			BUG_ON(mmap_locked);
-			BUG_ON(*prev);
-			mmap_read_lock(mm);
-			result = collapse_pte_mapped_thp(mm, addr, true);
-			mmap_read_unlock(mm);
-			goto handle_result;
-		/* Whitelisted set of results where continuing OK */
 		case SCAN_PMD_NULL:
 		case SCAN_PTE_NON_PRESENT:
 		case SCAN_PTE_UFFD_WP:
From patchwork Tue Feb 11 00:30:21 2025
X-Patchwork-Submitter: Nico Pache
X-Patchwork-Id: 13968996
From: Nico Pache
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: [RFC v2 2/9] khugepaged: rename hpage_collapse_* to khugepaged_*
Date: Mon, 10 Feb 2025 17:30:21 -0700
Message-ID: <20250211003028.213461-3-npache@redhat.com>
In-Reply-To: <20250211003028.213461-1-npache@redhat.com>
References: <20250211003028.213461-1-npache@redhat.com>

Functions in khugepaged.c use a mix of hpage_collapse and khugepaged as
the function prefix. Rename all of them to khugepaged to keep things
consistent and slightly shorten the function names.

Signed-off-by: Nico Pache
Reviewed-by: Ryan Roberts
---
 mm/khugepaged.c | 52 ++++++++++++++++++++++++-------------------------
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 46faee67378b..4c88d17250f4 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -402,14 +402,14 @@ void __init khugepaged_destroy(void)
 	kmem_cache_destroy(mm_slot_cache);
 }
 
-static inline int hpage_collapse_test_exit(struct mm_struct *mm)
+static inline int khugepaged_test_exit(struct mm_struct *mm)
 {
 	return atomic_read(&mm->mm_users) == 0;
 }
 
-static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm)
+static inline int khugepaged_test_exit_or_disable(struct mm_struct *mm)
 {
-	return hpage_collapse_test_exit(mm) ||
+	return khugepaged_test_exit(mm) ||
 	       test_bit(MMF_DISABLE_THP, &mm->flags);
 }
 
@@ -444,7 +444,7 @@ void __khugepaged_enter(struct mm_struct *mm)
 	int wakeup;
 
 	/* __khugepaged_exit() must not run from under us */
-	VM_BUG_ON_MM(hpage_collapse_test_exit(mm), mm);
+	VM_BUG_ON_MM(khugepaged_test_exit(mm), mm);
 	if (unlikely(test_and_set_bit(MMF_VM_HUGEPAGE, &mm->flags)))
 		return;
 
@@ -503,7 +503,7 @@ void __khugepaged_exit(struct mm_struct *mm)
 	} else if (mm_slot) {
 		/*
 		 * This is required to serialize against
-		 * hpage_collapse_test_exit() (which is guaranteed to run
+		 * khugepaged_test_exit() (which is guaranteed to run
 		 * under mmap sem read mode). Stop here (after we return all
 		 * pagetables will be destroyed) until khugepaged has finished
 		 * working on the pagetables under the mmap_lock.
@@ -606,7 +606,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		folio = page_folio(page);
 		VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
 
-		/* See hpage_collapse_scan_pmd(). */
+		/* See khugepaged_scan_pmd(). */
 		if (folio_likely_mapped_shared(folio)) {
 			++shared;
 			if (cc->is_khugepaged &&
@@ -851,7 +851,7 @@ struct collapse_control khugepaged_collapse_control = {
 	.is_khugepaged = true,
 };
 
-static bool hpage_collapse_scan_abort(int nid, struct collapse_control *cc)
+static bool khugepaged_scan_abort(int nid, struct collapse_control *cc)
 {
 	int i;
 
@@ -886,7 +886,7 @@ static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
 }
 
 #ifdef CONFIG_NUMA
-static int hpage_collapse_find_target_node(struct collapse_control *cc)
+static int khugepaged_find_target_node(struct collapse_control *cc)
 {
 	int nid, target_node = 0, max_value = 0;
 
@@ -905,7 +905,7 @@ static int khugepaged_find_target_node(struct collapse_control *cc)
 	return target_node;
 }
 #else
-static int hpage_collapse_find_target_node(struct collapse_control *cc)
+static int khugepaged_find_target_node(struct collapse_control *cc)
 {
 	return 0;
 }
@@ -925,7 +925,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 	struct vm_area_struct *vma;
 	unsigned long tva_flags = cc->is_khugepaged ? TVA_ENFORCE_SYSFS : 0;
 
-	if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
+	if (unlikely(khugepaged_test_exit_or_disable(mm)))
 		return SCAN_ANY_PROCESS;
 
 	*vmap = vma = find_vma(mm, address);
@@ -992,7 +992,7 @@ static int check_pmd_still_valid(struct mm_struct *mm,
 
 /*
  * Bring missing pages in from swap, to complete THP collapse.
- * Only done if hpage_collapse_scan_pmd believes it is worthwhile.
+ * Only done if khugepaged_scan_pmd believes it is worthwhile.
 *
 * Called and returns without pte mapped or spinlocks held.
 * Returns result: if not SCAN_SUCCEED, mmap_lock has been released.
@@ -1078,7 +1078,7 @@ static int alloc_charge_folio(struct folio **foliop, struct mm_struct *mm,
 {
 	gfp_t gfp = (cc->is_khugepaged ? alloc_hugepage_khugepaged_gfpmask() :
 		     GFP_TRANSHUGE);
-	int node = hpage_collapse_find_target_node(cc);
+	int node = khugepaged_find_target_node(cc);
 	struct folio *folio;
 
 	folio = __folio_alloc(gfp, HPAGE_PMD_ORDER, node, &cc->alloc_nmask);
@@ -1264,7 +1264,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	return result;
 }
 
-static int hpage_collapse_scan_pmd(struct mm_struct *mm,
+static int khugepaged_scan_pmd(struct mm_struct *mm,
 				   struct vm_area_struct *vma,
 				   unsigned long address, bool *mmap_locked,
 				   struct collapse_control *cc)
@@ -1380,7 +1380,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		 * hit record.
 		 */
 		node = folio_nid(folio);
-		if (hpage_collapse_scan_abort(node, cc)) {
+		if (khugepaged_scan_abort(node, cc)) {
 			result = SCAN_SCAN_ABORT;
 			goto out_unmap;
 		}
@@ -1449,7 +1449,7 @@ static void collect_mm_slot(struct khugepaged_mm_slot *mm_slot)
 	lockdep_assert_held(&khugepaged_mm_lock);
 
-	if (hpage_collapse_test_exit(mm)) {
+	if (khugepaged_test_exit(mm)) {
 		/* free mm_slot */
 		hash_del(&slot->hash);
 		list_del(&slot->mm_node);
@@ -1744,7 +1744,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 		if (find_pmd_or_thp_or_none(mm, addr, &pmd) != SCAN_SUCCEED)
 			continue;
 
-		if (hpage_collapse_test_exit(mm))
+		if (khugepaged_test_exit(mm))
 			continue;
 		/*
 		 * When a vma is registered with uffd-wp, we cannot recycle
@@ -2266,7 +2266,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	return result;
 }
 
-static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
+static int khugepaged_scan_file(struct mm_struct *mm, unsigned long addr,
 				struct file *file, pgoff_t start,
 				struct collapse_control *cc)
 {
@@ -2311,7 +2311,7 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 		}
 
 		node = folio_nid(folio);
-		if (hpage_collapse_scan_abort(node, cc)) {
+		if (khugepaged_scan_abort(node, cc)) {
 			result = SCAN_SCAN_ABORT;
 			break;
 		}
@@ -2357,7 +2357,7 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 	return result;
 }
 #else
-static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
+static int khugepaged_scan_file(struct mm_struct *mm, unsigned long addr,
 				struct file *file, pgoff_t start,
 				struct collapse_control *cc)
 {
@@ -2389,19 +2389,19 @@ static int khugepaged_collapse_single_pmd(unsigned long addr, struct mm_struct *
 			mmap_read_unlock(mm);
 			*mmap_locked = false;
-			result = hpage_collapse_scan_file(mm, addr, file, pgoff,
+			result = khugepaged_scan_file(mm, addr, file, pgoff,
 							  cc);
 			fput(file);
 			if (result == SCAN_PTE_MAPPED_HUGEPAGE) {
 				mmap_read_lock(mm);
-				if (hpage_collapse_test_exit_or_disable(mm))
+				if (khugepaged_test_exit_or_disable(mm))
 					goto end;
 				result = collapse_pte_mapped_thp(mm, addr,
 							!cc->is_khugepaged);
 				mmap_read_unlock(mm);
 			}
 		} else {
-			result = hpage_collapse_scan_pmd(mm, vma, addr,
+			result = khugepaged_scan_pmd(mm, vma, addr,
 						     mmap_locked, cc);
 		}
 		if (result == SCAN_SUCCEED || result == SCAN_PMD_MAPPED)
@@ -2449,7 +2449,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 		goto breakouterloop_mmap_lock;
 
 	progress++;
-	if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
+	if (unlikely(khugepaged_test_exit_or_disable(mm)))
 		goto breakouterloop;
 
 	vma_iter_init(&vmi, mm, khugepaged_scan.address);
@@ -2457,7 +2457,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 		unsigned long hstart, hend;
 
 		cond_resched();
-		if (unlikely(hpage_collapse_test_exit_or_disable(mm))) {
+		if (unlikely(khugepaged_test_exit_or_disable(mm))) {
 			progress++;
 			break;
 		}
@@ -2479,7 +2479,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 		bool mmap_locked = true;
 
 		cond_resched();
-		if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
+		if (unlikely(khugepaged_test_exit_or_disable(mm)))
 			goto breakouterloop;
 
 		VM_BUG_ON(khugepaged_scan.address < hstart ||
@@ -2515,7 +2515,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 		 * Release the current mm_slot if this mm is about to die, or
 		 * if we scanned all vmas of this mm.
 		 */
-		if (hpage_collapse_test_exit(mm) || !vma) {
+		if (khugepaged_test_exit(mm) || !vma) {
 			/*
 			 * Make sure that if mm_users is reaching zero while
 			 * khugepaged runs here, khugepaged_exit will find
From patchwork Tue Feb 11 00:30:22 2025
X-Patchwork-Submitter: Nico Pache
X-Patchwork-Id: 13968997
From: Nico Pache
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: [RFC v2 3/9] khugepaged: generalize hugepage_vma_revalidate for
 mTHP support
Date: Mon, 10 Feb 2025 17:30:22 -0700
Message-ID: <20250211003028.213461-4-npache@redhat.com>
In-Reply-To: <20250211003028.213461-1-npache@redhat.com>
References: <20250211003028.213461-1-npache@redhat.com>

For khugepaged to support different mTHP orders, we must generalize this
function for arbitrary orders.

No functional change in this patch.

Signed-off-by: Nico Pache
---
 mm/khugepaged.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 4c88d17250f4..c834ea842847 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -920,7 +920,7 @@ static int khugepaged_find_target_node(struct collapse_control *cc)
 
 static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 				   bool expect_anon,
 				   struct vm_area_struct **vmap,
-				   struct collapse_control *cc)
+				   struct collapse_control *cc, int order)
 {
 	struct vm_area_struct *vma;
 	unsigned long tva_flags = cc->is_khugepaged ? TVA_ENFORCE_SYSFS : 0;
@@ -932,9 +932,9 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 	if (!vma)
 		return SCAN_VMA_NULL;
 
-	if (!thp_vma_suitable_order(vma, address, PMD_ORDER))
+	if (!thp_vma_suitable_order(vma, address, order))
 		return SCAN_ADDRESS_RANGE;
-	if (!thp_vma_allowable_order(vma, vma->vm_flags, tva_flags, PMD_ORDER))
+	if (!thp_vma_allowable_order(vma, vma->vm_flags, tva_flags, order))
 		return SCAN_VMA_CHECK;
 	/*
 	 * Anon VMA expected, the address may be unmapped then
@@ -1130,7 +1130,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 		goto out_nolock;
 
 	mmap_read_lock(mm);
-	result = hugepage_vma_revalidate(mm, address, true, &vma, cc);
+	result = hugepage_vma_revalidate(mm, address, true, &vma, cc, HPAGE_PMD_ORDER);
 	if (result != SCAN_SUCCEED) {
 		mmap_read_unlock(mm);
 		goto out_nolock;
@@ -1164,7 +1164,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	 * mmap_lock.
 	 */
 	mmap_write_lock(mm);
-	result = hugepage_vma_revalidate(mm, address, true, &vma, cc);
+	result = hugepage_vma_revalidate(mm, address, true, &vma, cc, HPAGE_PMD_ORDER);
 	if (result != SCAN_SUCCEED)
 		goto out_up_write;
 	/* check if the pmd is still valid */
@@ -2796,7 +2796,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		mmap_read_lock(mm);
 		mmap_locked = true;
 		result = hugepage_vma_revalidate(mm, addr, false, &vma,
-						 cc);
+						 cc, HPAGE_PMD_ORDER);
 		if (result != SCAN_SUCCEED) {
 			last_fail = result;
 			goto out_nolock;
From: Nico Pache
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC v2 4/9] khugepaged: generalize alloc_charge_folio for mTHP support
Date: Mon, 10 Feb 2025 17:30:23 -0700
Message-ID: <20250211003028.213461-5-npache@redhat.com>
In-Reply-To: <20250211003028.213461-1-npache@redhat.com>
References: <20250211003028.213461-1-npache@redhat.com>

alloc_charge_folio() allocates the new folio for the khugepaged collapse.
Generalize the order of the folio allocation to support future mTHP
collapsing.

No functional changes in this patch.
Signed-off-by: Nico Pache
---
 mm/khugepaged.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index c834ea842847..0cfcdc11cabd 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1074,14 +1074,14 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
 }
 
 static int alloc_charge_folio(struct folio **foliop, struct mm_struct *mm,
-			      struct collapse_control *cc)
+			      struct collapse_control *cc, int order)
 {
 	gfp_t gfp = (cc->is_khugepaged ? alloc_hugepage_khugepaged_gfpmask() :
 		     GFP_TRANSHUGE);
 	int node = khugepaged_find_target_node(cc);
 	struct folio *folio;
 
-	folio = __folio_alloc(gfp, HPAGE_PMD_ORDER, node, &cc->alloc_nmask);
+	folio = __folio_alloc(gfp, order, node, &cc->alloc_nmask);
 	if (!folio) {
 		*foliop = NULL;
 		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
@@ -1125,7 +1125,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	 */
 	mmap_read_unlock(mm);
 
-	result = alloc_charge_folio(&folio, mm, cc);
+	result = alloc_charge_folio(&folio, mm, cc, HPAGE_PMD_ORDER);
 	if (result != SCAN_SUCCEED)
 		goto out_nolock;
 
@@ -1851,7 +1851,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
 	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
 
-	result = alloc_charge_folio(&new_folio, mm, cc);
+	result = alloc_charge_folio(&new_folio, mm, cc, HPAGE_PMD_ORDER);
 	if (result != SCAN_SUCCEED)
 		goto out;
From patchwork Tue Feb 11 00:30:24 2025
From: Nico Pache
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC v2 5/9] khugepaged: generalize __collapse_huge_page_* for mTHP support
Date: Mon, 10 Feb 2025 17:30:24 -0700
Message-ID: <20250211003028.213461-6-npache@redhat.com>
In-Reply-To: <20250211003028.213461-1-npache@redhat.com>
References: <20250211003028.213461-1-npache@redhat.com>

Generalize the order of the __collapse_huge_page_* functions to support
future mTHP collapse. mTHP collapse can suffer from inconsistent behavior
and memory-waste "creep", so disable swapin and shared support for mTHP
collapse.

No functional changes in this patch.

Signed-off-by: Nico Pache
---
 mm/khugepaged.c | 48 ++++++++++++++++++++++++++++--------------------
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 0cfcdc11cabd..3776055bd477 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -565,15 +565,17 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 					unsigned long address,
 					pte_t *pte,
 					struct collapse_control *cc,
-					struct list_head *compound_pagelist)
+					struct list_head *compound_pagelist,
+					u8 order)
 {
 	struct page *page = NULL;
 	struct folio *folio = NULL;
 	pte_t *_pte;
 	int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
 	bool writable = false;
+	int scaled_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
 
-	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
+	for (_pte = pte; _pte < pte + (1 << order);
 	     _pte++, address += PAGE_SIZE) {
 		pte_t pteval = ptep_get(_pte);
 		if (pte_none(pteval) || (pte_present(pteval) &&
@@ -581,7 +583,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			++none_or_zero;
 			if (!userfaultfd_armed(vma) &&
 			    (!cc->is_khugepaged ||
-			     none_or_zero <= khugepaged_max_ptes_none)) {
+			     none_or_zero <= scaled_none)) {
 				continue;
 			} else {
 				result = SCAN_EXCEED_NONE_PTE;
@@ -609,8 +611,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		/* See khugepaged_scan_pmd().
 		 */
 		if (folio_likely_mapped_shared(folio)) {
 			++shared;
-			if (cc->is_khugepaged &&
-			    shared > khugepaged_max_ptes_shared) {
+			if (order != HPAGE_PMD_ORDER || (cc->is_khugepaged &&
+			    shared > khugepaged_max_ptes_shared)) {
 				result = SCAN_EXCEED_SHARED_PTE;
 				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
 				goto out;
@@ -711,14 +713,15 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
 						struct vm_area_struct *vma,
 						unsigned long address,
 						spinlock_t *ptl,
-						struct list_head *compound_pagelist)
+						struct list_head *compound_pagelist,
+						u8 order)
 {
 	struct folio *src, *tmp;
 	pte_t *_pte;
 	pte_t pteval;
 
-	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
-	     _pte++, address += PAGE_SIZE) {
+	for (_pte = pte; _pte < pte + (1 << order);
+	     _pte++, address += PAGE_SIZE) {
 		pteval = ptep_get(_pte);
 		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
 			add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1);
@@ -764,7 +767,8 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
 					     pmd_t *pmd,
 					     pmd_t orig_pmd,
 					     struct vm_area_struct *vma,
-					     struct list_head *compound_pagelist)
+					     struct list_head *compound_pagelist,
+					     u8 order)
 {
 	spinlock_t *pmd_ptl;
 
@@ -781,7 +785,7 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
 	 * Release both raw and compound pages isolated
 	 * in __collapse_huge_page_isolate.
 	 */
-	release_pte_pages(pte, pte + HPAGE_PMD_NR, compound_pagelist);
+	release_pte_pages(pte, pte + (1 << order), compound_pagelist);
 }
 
 /*
@@ -802,7 +806,7 @@ static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
 		pmd_t *pmd, pmd_t orig_pmd, struct vm_area_struct *vma,
 		unsigned long address, spinlock_t *ptl,
-		struct list_head *compound_pagelist)
+		struct list_head *compound_pagelist, u8 order)
 {
 	unsigned int i;
 	int result = SCAN_SUCCEED;
 
 	/*
 	 * Copying pages' contents is subject to memory poison at any iteration.
 	 */
-	for (i = 0; i < HPAGE_PMD_NR; i++) {
+	for (i = 0; i < (1 << order); i++) {
 		pte_t pteval = ptep_get(pte + i);
 		struct page *page = folio_page(folio, i);
 		unsigned long src_addr = address + i * PAGE_SIZE;
@@ -829,10 +833,10 @@ static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
 	if (likely(result == SCAN_SUCCEED))
 		__collapse_huge_page_copy_succeeded(pte, vma, address, ptl,
-						    compound_pagelist);
+						    compound_pagelist, order);
 	else
 		__collapse_huge_page_copy_failed(pte, pmd, orig_pmd, vma,
-						 compound_pagelist);
+						 compound_pagelist, order);
 
 	return result;
 }
@@ -1000,11 +1004,11 @@ static int check_pmd_still_valid(struct mm_struct *mm,
 static int __collapse_huge_page_swapin(struct mm_struct *mm,
 				       struct vm_area_struct *vma,
 				       unsigned long haddr, pmd_t *pmd,
-				       int referenced)
+				       int referenced, u8 order)
 {
 	int swapped_in = 0;
 	vm_fault_t ret = 0;
-	unsigned long address, end = haddr + (HPAGE_PMD_NR * PAGE_SIZE);
+	unsigned long address, end = haddr + (PAGE_SIZE << order);
 	int result;
 	pte_t *pte = NULL;
 	spinlock_t *ptl;
@@ -1035,6 +1039,11 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
 		if (!is_swap_pte(vmf.orig_pte))
 			continue;
 
+		if (order != HPAGE_PMD_ORDER) {
+			result = SCAN_EXCEED_SWAP_PTE;
+			goto out;
+		}
+
 		vmf.pte = pte;
 		vmf.ptl = ptl;
 		ret = do_swap_page(&vmf);
@@ -1114,7 +1123,6 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	int result = SCAN_FAIL;
 	struct vm_area_struct *vma;
 	struct mmu_notifier_range range;
-
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
 	/*
@@ -1149,7 +1157,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 		 * that case. Continuing to collapse causes inconsistency.
 		 */
 		result = __collapse_huge_page_swapin(mm, vma, address, pmd,
-						     referenced);
+						     referenced, HPAGE_PMD_ORDER);
 		if (result != SCAN_SUCCEED)
 			goto out_nolock;
 	}
@@ -1196,7 +1204,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	pte = pte_offset_map_lock(mm, &_pmd, address, &pte_ptl);
 	if (pte) {
 		result = __collapse_huge_page_isolate(vma, address, pte, cc,
-						      &compound_pagelist);
+						      &compound_pagelist, HPAGE_PMD_ORDER);
 		spin_unlock(pte_ptl);
 	} else {
 		result = SCAN_PMD_NULL;
@@ -1226,7 +1234,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	result = __collapse_huge_page_copy(pte, folio, pmd, _pmd,
 					   vma, address, pte_ptl,
-					   &compound_pagelist);
+					   &compound_pagelist, HPAGE_PMD_ORDER);
 	pte_unmap(pte);
 	if (unlikely(result != SCAN_SUCCEED))
 		goto out_up_write;
From patchwork Tue Feb 11 00:30:25 2025
From: Nico Pache
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC v2 6/9] khugepaged: introduce khugepaged_scan_bitmap for mTHP support
Date: Mon, 10 Feb 2025 17:30:25 -0700
Message-ID: <20250211003028.213461-7-npache@redhat.com>
In-Reply-To: <20250211003028.213461-1-npache@redhat.com>
References: <20250211003028.213461-1-npache@redhat.com>

khugepaged scans PMD ranges for potential collapse to a hugepage. To add
mTHP support, we use this scan to instead record chunks of fully utilized
sections of the PMD.

Create a bitmap to represent a PMD in MIN_MTHP_ORDER chunks; by default
this is order 3. The reasoning is that for a 4K page size and a 512-entry
PMD this results in a 64-bit bitmap, which has some optimizations. For
other arches, like ARM64 with 64K pages, we can set a larger order if
needed.

khugepaged_scan_bitmap uses a stack struct to recursively scan a bitmap
that represents chunks of utilized regions. We can then determine what
mTHP size fits best; in the following patch, we set this bitmap while
scanning the PMD.

max_ptes_none is used as a scale to determine how "full" an order must be
before being considered for collapse. If an order is set to "always", we
always collapse to that order in a greedy manner.

Signed-off-by: Nico Pache
---
 include/linux/khugepaged.h |  4 ++
 mm/khugepaged.c            | 89 +++++++++++++++++++++++++++++++++++---
 2 files changed, 86 insertions(+), 7 deletions(-)

diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index 1f46046080f5..1fe0c4fc9d37 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -1,6 +1,10 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef _LINUX_KHUGEPAGED_H
 #define _LINUX_KHUGEPAGED_H
+#define MIN_MTHP_ORDER	3
+#define MIN_MTHP_NR	(1<<MIN_MTHP_ORDER)

[...]

+	cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
+		{ HPAGE_PMD_ORDER - MIN_MTHP_ORDER, 0 };
+
+	while (top >= 0) {
+		state = cc->mthp_bitmap_stack[top--];
+		order = state.order + MIN_MTHP_ORDER;
+		offset = state.offset;
+		num_chunks = 1 << (state.order);
+		// Skip mTHP orders that are not enabled
+		if (!test_bit(order, &enabled_orders))
+			goto next;
+
+		// copy the relevant section to a new bitmap
+		bitmap_shift_right(cc->mthp_bitmap_temp, cc->mthp_bitmap, offset,
+				   MTHP_BITMAP_SIZE);
+
+		bits_set = bitmap_weight(cc->mthp_bitmap_temp, num_chunks);
+		threshold_bits = (HPAGE_PMD_NR - khugepaged_max_ptes_none - 1)
+				 >> (HPAGE_PMD_ORDER - state.order);
+
+		// Check if the region is "almost full" based on the threshold
+		if (bits_set > threshold_bits
+		    || test_bit(order, &huge_anon_orders_always)) {
+			ret = collapse_huge_page(mm, address, referenced, unmapped, cc,
+						 mmap_locked, order, offset * MIN_MTHP_NR);
+			if (ret == SCAN_SUCCEED) {
+				collapsed += (1 << order);
+				continue;
+			}
+		}
+
+next:
+		if (state.order > 0) {
+			next_order = state.order - 1;
+			mid_offset = offset + (num_chunks / 2);
+			cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
+				{ next_order, mid_offset };
+			cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
+				{ next_order, offset };
+		}
+	}
+	return collapsed;
+}
+
 static int khugepaged_scan_pmd(struct mm_struct *mm,
 			       struct vm_area_struct *vma,
 			       unsigned long address, bool *mmap_locked,
@@ -1440,7 +1514,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 	pte_unmap_unlock(pte, ptl);
 	if (result == SCAN_SUCCEED) {
 		result = collapse_huge_page(mm, address, referenced,
-					    unmapped, cc);
+					    unmapped, cc, mmap_locked, HPAGE_PMD_ORDER, 0);
 		/* collapse_huge_page will return with the mmap_lock released */
 		*mmap_locked = false;
 	}
@@ -2856,6 +2930,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
 	mmdrop(mm);
 	kfree(cc);
+
 	return thps == ((hend - hstart) >> HPAGE_PMD_SHIFT) ? 0
 			: madvise_collapse_errno(last_fail);
 }
From patchwork Tue Feb 11 00:30:26 2025
From: Nico Pache
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC v2 7/9] khugepaged: add mTHP support
Date: Mon, 10 Feb 2025 17:30:26 -0700
Message-ID: <20250211003028.213461-8-npache@redhat.com>
In-Reply-To: <20250211003028.213461-1-npache@redhat.com>
References: <20250211003028.213461-1-npache@redhat.com>

Introduce the ability for khugepaged to collapse to different mTHP sizes.
While scanning a PMD range for potential collapse candidates, keep track
of pages in MIN_MTHP_ORDER chunks via a bitmap. Each bit represents a
utilized region of order MIN_MTHP_ORDER ptes.

We remove the restriction of max_ptes_none during the scan phase so we
don't bail out early and miss potential mTHP candidates.

After the scan is complete, we perform binary recursion on the bitmap to
determine which mTHP size would be most efficient to collapse to.
max_ptes_none is scaled by the attempted collapse order to determine how
full a THP must be to be eligible.

If an mTHP collapse is attempted but contains swapped-out or shared
pages, we don't perform the collapse.
Signed-off-by: Nico Pache --- mm/khugepaged.c | 122 ++++++++++++++++++++++++++++++++---------------- 1 file changed, 83 insertions(+), 39 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index c8048d9ec7fb..cd310989725b 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1127,13 +1127,14 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address, { LIST_HEAD(compound_pagelist); pmd_t *pmd, _pmd; - pte_t *pte; + pte_t *pte, mthp_pte; pgtable_t pgtable; struct folio *folio; spinlock_t *pmd_ptl, *pte_ptl; int result = SCAN_FAIL; struct vm_area_struct *vma; struct mmu_notifier_range range; + unsigned long _address = address + offset * PAGE_SIZE; VM_BUG_ON(address & ~HPAGE_PMD_MASK); /* @@ -1148,12 +1149,13 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address, *mmap_locked = false; } - result = alloc_charge_folio(&folio, mm, cc, HPAGE_PMD_ORDER); + result = alloc_charge_folio(&folio, mm, cc, order); if (result != SCAN_SUCCEED) goto out_nolock; mmap_read_lock(mm); - result = hugepage_vma_revalidate(mm, address, true, &vma, cc, HPAGE_PMD_ORDER); + *mmap_locked = true; + result = hugepage_vma_revalidate(mm, address, true, &vma, cc, order); if (result != SCAN_SUCCEED) { mmap_read_unlock(mm); goto out_nolock; @@ -1171,13 +1173,14 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address, * released when it fails. So we jump out_nolock directly in * that case. Continuing to collapse causes inconsistency. */ - result = __collapse_huge_page_swapin(mm, vma, address, pmd, - referenced, HPAGE_PMD_ORDER); + result = __collapse_huge_page_swapin(mm, vma, _address, pmd, + referenced, order); if (result != SCAN_SUCCEED) goto out_nolock; } mmap_read_unlock(mm); + *mmap_locked = false; /* * Prevent all access to pagetables with the exception of * gup_fast later handled by the ptep_clear_flush and the VM @@ -1187,7 +1190,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address, * mmap_lock. 
*/ mmap_write_lock(mm); - result = hugepage_vma_revalidate(mm, address, true, &vma, cc, HPAGE_PMD_ORDER); + result = hugepage_vma_revalidate(mm, address, true, &vma, cc, order); if (result != SCAN_SUCCEED) goto out_up_write; /* check if the pmd is still valid */ @@ -1198,11 +1201,12 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address, vma_start_write(vma); anon_vma_lock_write(vma->anon_vma); - mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address, - address + HPAGE_PMD_SIZE); + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, _address, + _address + (PAGE_SIZE << order)); mmu_notifier_invalidate_range_start(&range); pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */ + /* * This removes any huge TLB entry from the CPU so we won't allow * huge and small TLB entries for the same virtual address to @@ -1216,10 +1220,10 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address, mmu_notifier_invalidate_range_end(&range); tlb_remove_table_sync_one(); - pte = pte_offset_map_lock(mm, &_pmd, address, &pte_ptl); + pte = pte_offset_map_lock(mm, &_pmd, _address, &pte_ptl); if (pte) { - result = __collapse_huge_page_isolate(vma, address, pte, cc, - &compound_pagelist, HPAGE_PMD_ORDER); + result = __collapse_huge_page_isolate(vma, _address, pte, cc, + &compound_pagelist, order); spin_unlock(pte_ptl); } else { result = SCAN_PMD_NULL; @@ -1248,8 +1252,8 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address, anon_vma_unlock_write(vma->anon_vma); result = __collapse_huge_page_copy(pte, folio, pmd, _pmd, - vma, address, pte_ptl, - &compound_pagelist, HPAGE_PMD_ORDER); + vma, _address, pte_ptl, + &compound_pagelist, order); pte_unmap(pte); if (unlikely(result != SCAN_SUCCEED)) goto out_up_write; @@ -1260,20 +1264,37 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address, * write. 
 	 */
 	__folio_mark_uptodate(folio);
-	pgtable = pmd_pgtable(_pmd);
-
-	_pmd = mk_huge_pmd(&folio->page, vma->vm_page_prot);
-	_pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma);
-
-	spin_lock(pmd_ptl);
-	BUG_ON(!pmd_none(*pmd));
-	folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
-	folio_add_lru_vma(folio, vma);
-	pgtable_trans_huge_deposit(mm, pmd, pgtable);
-	set_pmd_at(mm, address, pmd, _pmd);
-	update_mmu_cache_pmd(vma, address, pmd);
-	deferred_split_folio(folio, false);
-	spin_unlock(pmd_ptl);
+	if (order == HPAGE_PMD_ORDER) {
+		pgtable = pmd_pgtable(_pmd);
+		_pmd = mk_huge_pmd(&folio->page, vma->vm_page_prot);
+		_pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma);
+
+		spin_lock(pmd_ptl);
+		BUG_ON(!pmd_none(*pmd));
+		folio_add_new_anon_rmap(folio, vma, _address, RMAP_EXCLUSIVE);
+		folio_add_lru_vma(folio, vma);
+		pgtable_trans_huge_deposit(mm, pmd, pgtable);
+		set_pmd_at(mm, address, pmd, _pmd);
+		update_mmu_cache_pmd(vma, address, pmd);
+		deferred_split_folio(folio, false);
+		spin_unlock(pmd_ptl);
+	} else { /* mTHP */
+		mthp_pte = mk_pte(&folio->page, vma->vm_page_prot);
+		mthp_pte = maybe_mkwrite(pte_mkdirty(mthp_pte), vma);
+
+		spin_lock(pmd_ptl);
+		folio_ref_add(folio, (1 << order) - 1);
+		folio_add_new_anon_rmap(folio, vma, _address, RMAP_EXCLUSIVE);
+		folio_add_lru_vma(folio, vma);
+		spin_lock(pte_ptl);
+		set_ptes(vma->vm_mm, _address, pte, mthp_pte, (1 << order));
+		update_mmu_cache_range(NULL, vma, _address, pte, (1 << order));
+		spin_unlock(pte_ptl);
+		smp_wmb(); /* make pte visible before pmd */
+		pmd_populate(mm, pmd, pmd_pgtable(_pmd));
+		deferred_split_folio(folio, false);
+		spin_unlock(pmd_ptl);
+	}
 
 	folio = NULL;
 
@@ -1353,21 +1374,27 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
+	int i;
 	int result = SCAN_FAIL, referenced = 0;
 	int none_or_zero = 0, shared = 0;
 	struct page *page = NULL;
 	struct folio *folio = NULL;
 	unsigned long _address;
+	unsigned long enabled_orders;
 	spinlock_t *ptl;
 	int node = NUMA_NO_NODE, unmapped = 0;
 	bool writable = false;
-
+	int chunk_none_count = 0;
+	int scaled_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - MIN_MTHP_ORDER);
+	unsigned long tva_flags = cc->is_khugepaged ? TVA_ENFORCE_SYSFS : 0;
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
 	result = find_pmd_or_thp_or_none(mm, address, &pmd);
 	if (result != SCAN_SUCCEED)
 		goto out;
 
+	bitmap_zero(cc->mthp_bitmap, MAX_MTHP_BITMAP_SIZE);
+	bitmap_zero(cc->mthp_bitmap_temp, MAX_MTHP_BITMAP_SIZE);
 	memset(cc->node_load, 0, sizeof(cc->node_load));
 	nodes_clear(cc->alloc_nmask);
 	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
@@ -1376,8 +1403,12 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		goto out;
 	}
 
-	for (_address = address, _pte = pte; _pte < pte + HPAGE_PMD_NR;
-	     _pte++, _address += PAGE_SIZE) {
+	for (i = 0; i < HPAGE_PMD_NR; i++) {
+		if (i % MIN_MTHP_NR == 0)
+			chunk_none_count = 0;
+
+		_pte = pte + i;
+		_address = address + i * PAGE_SIZE;
 		pte_t pteval = ptep_get(_pte);
 		if (is_swap_pte(pteval)) {
 			++unmapped;
@@ -1400,16 +1431,14 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 			}
 		}
 		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
+			++chunk_none_count;
 			++none_or_zero;
-			if (!userfaultfd_armed(vma) &&
-			    (!cc->is_khugepaged ||
-			     none_or_zero <= khugepaged_max_ptes_none)) {
-				continue;
-			} else {
+			if (userfaultfd_armed(vma)) {
 				result = SCAN_EXCEED_NONE_PTE;
 				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 				goto out_unmap;
 			}
+			continue;
 		}
 		if (pte_uffd_wp(pteval)) {
 			/*
@@ -1500,7 +1529,16 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		     folio_test_referenced(folio) ||
 		     mmu_notifier_test_young(vma->vm_mm, address)))
 			referenced++;
+
+		/*
+		 * we are reading in MIN_MTHP_NR page chunks. if there are no empty
+		 * pages keep track of it in the bitmap for mTHP collapsing.
+		 */
+		if (chunk_none_count < scaled_none &&
+		    (i + 1) % MIN_MTHP_NR == 0)
+			bitmap_set(cc->mthp_bitmap, i / MIN_MTHP_NR, 1);
 	}
+
 	if (!writable) {
 		result = SCAN_PAGE_RO;
 	} else if (cc->is_khugepaged &&
@@ -1513,10 +1551,14 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
 	if (result == SCAN_SUCCEED) {
-		result = collapse_huge_page(mm, address, referenced,
-					    unmapped, cc, mmap_locked, HPAGE_PMD_ORDER, 0);
-		/* collapse_huge_page will return with the mmap_lock released */
-		*mmap_locked = false;
+		enabled_orders = thp_vma_allowable_orders(vma, vma->vm_flags,
+							  tva_flags, THP_ORDERS_ALL_ANON);
+		result = khugepaged_scan_bitmap(mm, address, referenced, unmapped, cc,
+						mmap_locked, enabled_orders);
+		if (result > 0)
+			result = SCAN_SUCCEED;
+		else
+			result = SCAN_FAIL;
 	}
 out:
 	trace_mm_khugepaged_scan_pmd(mm, &folio->page, writable, referenced,
@@ -2476,11 +2518,13 @@ static int khugepaged_collapse_single_pmd(unsigned long addr, struct mm_struct *
 		fput(file);
 		if (result == SCAN_PTE_MAPPED_HUGEPAGE) {
 			mmap_read_lock(mm);
+			*mmap_locked = true;
 			if (khugepaged_test_exit_or_disable(mm))
 				goto end;
 			result = collapse_pte_mapped_thp(mm, addr,
 							 !cc->is_khugepaged);
 			mmap_read_unlock(mm);
+			*mmap_locked = false;
 		}
 	} else {
 		result = khugepaged_scan_pmd(mm, vma, addr,

From patchwork Tue Feb 11 00:30:27 2025
X-Patchwork-Submitter: Nico Pache
X-Patchwork-Id: 13969002
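The scan loop in the patch above fills a per-PMD bitmap one MIN_MTHP_NR-sized chunk at a time: a chunk is eligible for mTHP collapse only if its count of none/zero PTEs stays below a budget scaled down from `khugepaged_max_ptes_none`. The following is a minimal userspace model of that bookkeeping, not kernel code; the constants mirror an assumed 4K-page configuration (HPAGE_PMD_ORDER = 9, MIN_MTHP_ORDER = 2), and `scan_chunks()`/`pte_is_none[]` are hypothetical stand-ins for the real `ptep_get()`/`pte_none()` page-table walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define HPAGE_PMD_ORDER	9	/* assumed: 2MiB PMD on 4K pages */
#define MIN_MTHP_ORDER	2	/* assumed: smallest mTHP chunk order */
#define HPAGE_PMD_NR	(1 << HPAGE_PMD_ORDER)		/* 512 PTEs per PMD */
#define MIN_MTHP_NR	(1 << MIN_MTHP_ORDER)		/* 4 PTEs per chunk */
#define NUM_CHUNKS	(HPAGE_PMD_NR / MIN_MTHP_NR)	/* 128 chunks */

/*
 * Walk all PTEs under one PMD in MIN_MTHP_NR chunks; mark chunk i in the
 * bitmap when its none/zero-PTE count stayed under the scaled budget,
 * mirroring the chunk_none_count/scaled_none logic in the patch.
 */
static void scan_chunks(const bool pte_is_none[HPAGE_PMD_NR],
			int max_ptes_none, bool bitmap[NUM_CHUNKS])
{
	int scaled_none = max_ptes_none >> (HPAGE_PMD_ORDER - MIN_MTHP_ORDER);
	int chunk_none_count = 0;

	memset(bitmap, 0, NUM_CHUNKS * sizeof(bool));
	for (int i = 0; i < HPAGE_PMD_NR; i++) {
		if (i % MIN_MTHP_NR == 0)	/* new chunk: reset counter */
			chunk_none_count = 0;
		if (pte_is_none[i])
			++chunk_none_count;
		/* at a chunk boundary, record whether it stayed under budget */
		if (chunk_none_count < scaled_none && (i + 1) % MIN_MTHP_NR == 0)
			bitmap[i / MIN_MTHP_NR] = true;
	}
}
```

With the default `max_ptes_none` of 511, the scaled budget is 511 >> 7 = 3 empty PTEs per 4-PTE chunk, so a fully empty chunk is left unmarked while a fully populated one is marked.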
From: Nico Pache
Subject: [RFC v2 8/9] khugepaged: improve tracepoints for mTHP orders
Date: Mon, 10 Feb 2025 17:30:27 -0700
Message-ID: <20250211003028.213461-9-npache@redhat.com>
In-Reply-To: <20250211003028.213461-1-npache@redhat.com>
References: <20250211003028.213461-1-npache@redhat.com>
Add the order to the tracepoints to give better insight into the order
khugepaged is operating on.

Signed-off-by: Nico Pache
---
 include/trace/events/huge_memory.h | 34 +++++++++++++++++++-----------
 mm/khugepaged.c                    | 10 +++++----
 2 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index 9d5c00b0285c..ea2fe20a39f5 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -92,34 +92,37 @@ TRACE_EVENT(mm_khugepaged_scan_pmd,
 
 TRACE_EVENT(mm_collapse_huge_page,
 
-	TP_PROTO(struct mm_struct *mm, int isolated, int status),
+	TP_PROTO(struct mm_struct *mm, int isolated, int status, int order),
 
-	TP_ARGS(mm, isolated, status),
+	TP_ARGS(mm, isolated, status, order),
 
 	TP_STRUCT__entry(
 		__field(struct mm_struct *, mm)
 		__field(int, isolated)
 		__field(int, status)
+		__field(int, order)
 	),
 
 	TP_fast_assign(
 		__entry->mm = mm;
 		__entry->isolated = isolated;
 		__entry->status = status;
+		__entry->order = order;
 	),
 
-	TP_printk("mm=%p, isolated=%d, status=%s",
+	TP_printk("mm=%p, isolated=%d, status=%s order=%d",
 		__entry->mm,
 		__entry->isolated,
-		__print_symbolic(__entry->status, SCAN_STATUS))
+		__print_symbolic(__entry->status, SCAN_STATUS),
+		__entry->order)
 );
 
 TRACE_EVENT(mm_collapse_huge_page_isolate,
 
 	TP_PROTO(struct page *page, int none_or_zero,
-		 int referenced, bool writable, int status),
+		 int referenced, bool writable, int status, int order),
 
-	TP_ARGS(page, none_or_zero, referenced, writable, status),
+	TP_ARGS(page, none_or_zero, referenced, writable, status, order),
 
 	TP_STRUCT__entry(
 		__field(unsigned long, pfn)
@@ -127,6 +130,7 @@ TRACE_EVENT(mm_collapse_huge_page_isolate,
 		__field(int, referenced)
 		__field(bool, writable)
 		__field(int, status)
+		__field(int, order)
 	),
 	TP_fast_assign(
@@ -135,27 +139,31 @@ TRACE_EVENT(mm_collapse_huge_page_isolate,
 		__entry->referenced = referenced;
 		__entry->writable = writable;
 		__entry->status = status;
+		__entry->order = order;
 	),
 
-	TP_printk("scan_pfn=0x%lx, none_or_zero=%d, referenced=%d, writable=%d, status=%s",
+	TP_printk("scan_pfn=0x%lx, none_or_zero=%d, referenced=%d, writable=%d, status=%s order=%d",
 		__entry->pfn,
 		__entry->none_or_zero,
 		__entry->referenced,
 		__entry->writable,
-		__print_symbolic(__entry->status, SCAN_STATUS))
+		__print_symbolic(__entry->status, SCAN_STATUS),
+		__entry->order)
 );
 
 TRACE_EVENT(mm_collapse_huge_page_swapin,
 
-	TP_PROTO(struct mm_struct *mm, int swapped_in, int referenced, int ret),
+	TP_PROTO(struct mm_struct *mm, int swapped_in, int referenced, int ret,
+		 int order),
 
-	TP_ARGS(mm, swapped_in, referenced, ret),
+	TP_ARGS(mm, swapped_in, referenced, ret, order),
 
 	TP_STRUCT__entry(
 		__field(struct mm_struct *, mm)
 		__field(int, swapped_in)
 		__field(int, referenced)
 		__field(int, ret)
+		__field(int, order)
 	),
 
 	TP_fast_assign(
@@ -163,13 +171,15 @@ TRACE_EVENT(mm_collapse_huge_page_swapin,
 		__entry->swapped_in = swapped_in;
 		__entry->referenced = referenced;
 		__entry->ret = ret;
+		__entry->order = order;
 	),
 
-	TP_printk("mm=%p, swapped_in=%d, referenced=%d, ret=%d",
+	TP_printk("mm=%p, swapped_in=%d, referenced=%d, ret=%d, order=%d",
 		__entry->mm,
 		__entry->swapped_in,
 		__entry->referenced,
-		__entry->ret)
+		__entry->ret,
+		__entry->order)
 );
 
 TRACE_EVENT(mm_khugepaged_scan_file,

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index cd310989725b..e2ba18e57064 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -713,13 +713,14 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 	} else {
 		result = SCAN_SUCCEED;
 		trace_mm_collapse_huge_page_isolate(&folio->page, none_or_zero,
-						    referenced, writable, result);
+						    referenced, writable, result,
+						    order);
 		return result;
 	}
 out:
 	release_pte_pages(pte, _pte, compound_pagelist);
 	trace_mm_collapse_huge_page_isolate(&folio->page, none_or_zero,
-					    referenced, writable, result);
+					    referenced, writable, result, order);
 	return result;
 }
 
@@ -1088,7 +1089,8 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
 	result = SCAN_SUCCEED;
 out:
-	trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, result);
+	trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, result,
+					   order);
 	return result;
 }
 
@@ -1305,7 +1307,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	*mmap_locked = false;
 	if (folio)
 		folio_put(folio);
-	trace_mm_collapse_huge_page(mm, result == SCAN_SUCCEED, result);
+	trace_mm_collapse_huge_page(mm, result == SCAN_SUCCEED, result, order);
 	return result;
 }

From patchwork Tue Feb 11 00:30:28 2025
X-Patchwork-Submitter: Nico Pache
X-Patchwork-Id: 13969003
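The tracepoint patch above reports the raw collapse order alongside the status. A consumer of these events can translate that order back into the size being collapsed; a minimal userspace sketch, assuming 4KiB base pages (`PAGE_SHIFT` = 12 is an assumption of this sketch, and `order_to_bytes()` is a hypothetical helper, not a kernel API):

```c
#include <assert.h>

#define PAGE_SHIFT 12UL	/* assumed: 4KiB base pages */

/* Bytes covered by a collapse of the given order: PAGE_SIZE << order. */
static unsigned long order_to_bytes(int order)
{
	return 1UL << (PAGE_SHIFT + order);
}
```

For example, `order=9` in a trace line corresponds to a full 2MiB PMD collapse on such a configuration, while `order=4` would indicate a 64KiB mTHP collapse.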
From: Nico Pache
Subject: [RFC v2 9/9] khugepaged: skip collapsing mTHP to smaller orders
Date: Mon, 10 Feb 2025 17:30:28 -0700
Message-ID: <20250211003028.213461-10-npache@redhat.com>
In-Reply-To: <20250211003028.213461-1-npache@redhat.com>
References: <20250211003028.213461-1-npache@redhat.com>

khugepaged may try to collapse an mTHP to a smaller mTHP, resulting in
some pages being unmapped.
Skip these cases until we have a way to check whether it's OK to collapse
to a smaller mTHP size (as in the case of a partially mapped folio).

This patch is inspired by Dev Jain's work on khugepaged mTHP support [1].

[1] https://lore.kernel.org/lkml/20241216165105.56185-11-dev.jain@arm.com/

Signed-off-by: Nico Pache
---
 mm/khugepaged.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index e2ba18e57064..fc30698b8e6e 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -622,6 +622,11 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		folio = page_folio(page);
 		VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
 
+		if (order != HPAGE_PMD_ORDER && folio_order(folio) >= order) {
+			result = SCAN_PTE_MAPPED_HUGEPAGE;
+			goto out;
+		}
+
 		/* See khugepaged_scan_pmd(). */
 		if (folio_likely_mapped_shared(folio)) {
 			++shared;
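The guard added above skips a collapse when the target is an mTHP order and the page already belongs to a folio of at least that order, since collapsing would unmap part of the larger folio. A minimal userspace model of that predicate (`should_skip_collapse()` is a hypothetical name for this sketch; the real check lives inline in `__collapse_huge_page_isolate()`, and HPAGE_PMD_ORDER = 9 assumes a 4K-page configuration):

```c
#include <assert.h>
#include <stdbool.h>

#define HPAGE_PMD_ORDER 9	/* assumed: 2MiB PMD on 4K pages */

/*
 * Mirror of the patch's check: PMD-order collapses are never skipped here,
 * but an mTHP-order collapse is skipped when the existing folio already
 * spans at least the target order.
 */
static bool should_skip_collapse(int target_order, int folio_order)
{
	return target_order != HPAGE_PMD_ORDER && folio_order >= target_order;
}
```

So a 64KiB (order 4) collapse over an existing order-4 or larger folio is skipped, while a full PMD-order collapse proceeds regardless of the folio order.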