From: Dev Jain
To: akpm@linux-foundation.org, david@redhat.com, willy@infradead.org,
 kirill.shutemov@linux.intel.com
Cc: npache@redhat.com, ryan.roberts@arm.com, anshuman.khandual@arm.com,
 catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com,
 apopple@nvidia.com, dave.hansen@linux.intel.com, will@kernel.org,
 baohua@kernel.org, jack@suse.cz, srivatsa@csail.mit.edu,
 haowenchao22@gmail.com, hughd@google.com, aneesh.kumar@kernel.org,
 yang@os.amperecomputing.com, peterx@redhat.com, ioworker0@gmail.com,
 wangkefeng.wang@huawei.com, ziy@nvidia.com, jglisse@google.com,
 surenb@google.com, vishal.moola@gmail.com, zokeefe@google.com,
 zhengqi.arch@bytedance.com, jhubbard@nvidia.com, 21cnbao@gmail.com,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Dev Jain
Subject: [PATCH v2 06/17] khugepaged: Abstract PMD-THP collapse
Date: Tue, 11 Feb 2025 16:43:15 +0530
Message-Id: <20250211111326.14295-7-dev.jain@arm.com>
In-Reply-To: <20250211111326.14295-1-dev.jain@arm.com>
References: <20250211111326.14295-1-dev.jain@arm.com>
X-Patchwork-Submitter: Dev Jain
X-Patchwork-Id: 13969520

Abstract away copying page contents, and setting the PMD, into
vma_collapse_anon_folio_pmd().

Signed-off-by: Dev Jain
---
 mm/khugepaged.c | 140 +++++++++++++++++++++++++++---------------------
 1 file changed, 78 insertions(+), 62 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 99eb1f72a508..498cb5ad9ff1 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1109,76 +1109,27 @@ static int alloc_charge_folio(struct folio **foliop, struct mm_struct *mm,
 	return SCAN_SUCCEED;
 }
 
-static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
-			      int referenced, int unmapped,
-			      struct collapse_control *cc)
+static int vma_collapse_anon_folio_pmd(struct mm_struct *mm, unsigned long address,
+		struct vm_area_struct *vma, struct collapse_control *cc, pmd_t *pmd,
+		struct folio *folio)
 {
 	LIST_HEAD(compound_pagelist);
-	pmd_t *pmd, _pmd;
-	pte_t *pte;
 	pgtable_t pgtable;
-	struct folio *folio;
 	spinlock_t *pmd_ptl, *pte_ptl;
 	int result = SCAN_FAIL;
-	struct vm_area_struct *vma;
 	struct mmu_notifier_range range;
+	pmd_t _pmd;
+	pte_t *pte;
 
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
-	/*
-	 * Before allocating the hugepage, release the mmap_lock read lock.
-	 * The allocation can take potentially a long time if it involves
-	 * sync compaction, and we do not need to hold the mmap_lock during
-	 * that. We will recheck the vma after taking it again in write mode.
-	 */
-	mmap_read_unlock(mm);
-
-	result = alloc_charge_folio(&folio, mm, HPAGE_PMD_ORDER, cc);
-	if (result != SCAN_SUCCEED)
-		goto out_nolock;
-
-	mmap_read_lock(mm);
-	result = hugepage_vma_revalidate(mm, address, true, &vma, HPAGE_PMD_ORDER, cc);
-	if (result != SCAN_SUCCEED) {
-		mmap_read_unlock(mm);
-		goto out_nolock;
-	}
-
-	result = find_pmd_or_thp_or_none(mm, address, &pmd);
-	if (result != SCAN_SUCCEED) {
-		mmap_read_unlock(mm);
-		goto out_nolock;
-	}
-
-	if (unmapped) {
-		/*
-		 * __collapse_huge_page_swapin will return with mmap_lock
-		 * released when it fails. So we jump out_nolock directly in
-		 * that case. Continuing to collapse causes inconsistency.
-		 */
-		result = __collapse_huge_page_swapin(mm, vma, address, pmd,
-						     referenced, HPAGE_PMD_ORDER);
-		if (result != SCAN_SUCCEED)
-			goto out_nolock;
-	}
-
-	mmap_read_unlock(mm);
-	/*
-	 * Prevent all access to pagetables with the exception of
-	 * gup_fast later handled by the ptep_clear_flush and the VM
-	 * handled by the anon_vma lock + PG_lock.
-	 *
-	 * UFFDIO_MOVE is prevented to race as well thanks to the
-	 * mmap_lock.
-	 */
-	mmap_write_lock(mm);
 	result = hugepage_vma_revalidate(mm, address, true, &vma, HPAGE_PMD_ORDER, cc);
 	if (result != SCAN_SUCCEED)
-		goto out_up_write;
+		goto out;
 	/* check if the pmd is still valid */
 	result = check_pmd_still_valid(mm, address, pmd);
 	if (result != SCAN_SUCCEED)
-		goto out_up_write;
+		goto out;
 
 	vma_start_write(vma);
 	anon_vma_lock_write(vma->anon_vma);
@@ -1223,7 +1174,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 		pmd_populate(mm, pmd, pmd_pgtable(_pmd));
 		spin_unlock(pmd_ptl);
 		anon_vma_unlock_write(vma->anon_vma);
-		goto out_up_write;
+		goto out;
 	}
 
 	/*
@@ -1237,7 +1188,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 					   &compound_pagelist, HPAGE_PMD_ORDER);
 	pte_unmap(pte);
 	if (unlikely(result != SCAN_SUCCEED))
-		goto out_up_write;
+		goto out;
 
 	/*
 	 * The smp_wmb() inside __folio_mark_uptodate() ensures the
@@ -1260,11 +1211,76 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	deferred_split_folio(folio, false);
 	spin_unlock(pmd_ptl);
 
-	folio = NULL;
-
 	result = SCAN_SUCCEED;
-out_up_write:
+out:
+	return result;
+}
+
+static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
+			      int referenced, int unmapped, int order,
+			      struct collapse_control *cc)
+{
+	struct vm_area_struct *vma;
+	int result = SCAN_FAIL;
+	struct folio *folio;
+	pmd_t *pmd;
+
+	/*
+	 * Before allocating the hugepage, release the mmap_lock read lock.
+	 * The allocation can take potentially a long time if it involves
+	 * sync compaction, and we do not need to hold the mmap_lock during
+	 * that. We will recheck the vma after taking it again in write mode.
+	 */
+	mmap_read_unlock(mm);
+
+	result = alloc_charge_folio(&folio, mm, order, cc);
+	if (result != SCAN_SUCCEED)
+		goto out_nolock;
+
+	mmap_read_lock(mm);
+	result = hugepage_vma_revalidate(mm, address, true, &vma, order, cc);
+	if (result != SCAN_SUCCEED) {
+		mmap_read_unlock(mm);
+		goto out_nolock;
+	}
+
+	result = find_pmd_or_thp_or_none(mm, address, &pmd);
+	if (result != SCAN_SUCCEED) {
+		mmap_read_unlock(mm);
+		goto out_nolock;
+	}
+
+	if (unmapped) {
+		/*
+		 * __collapse_huge_page_swapin will return with mmap_lock
+		 * released when it fails. So we jump out_nolock directly in
+		 * that case. Continuing to collapse causes inconsistency.
+		 */
+		result = __collapse_huge_page_swapin(mm, vma, address, pmd,
+						     referenced, order);
+		if (result != SCAN_SUCCEED)
+			goto out_nolock;
+	}
+
+	mmap_read_unlock(mm);
+	/*
+	 * Prevent all access to pagetables with the exception of
+	 * gup_fast later handled by the ptep_clear_flush and the VM
+	 * handled by the anon_vma lock + PG_lock.
+	 *
+	 * UFFDIO_MOVE is prevented to race as well thanks to the
+	 * mmap_lock.
+	 */
+	mmap_write_lock(mm);
+
+	if (order == HPAGE_PMD_ORDER)
+		result = vma_collapse_anon_folio_pmd(mm, address, vma, cc, pmd, folio);
+
 	mmap_write_unlock(mm);
+
+	if (result == SCAN_SUCCEED)
+		folio = NULL;
+
 out_nolock:
 	if (folio)
 		folio_put(folio);
@@ -1440,7 +1456,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 	pte_unmap_unlock(pte, ptl);
 	if (result == SCAN_SUCCEED) {
 		result = collapse_huge_page(mm, address, referenced,
-					    unmapped, cc);
+					    unmapped, HPAGE_PMD_ORDER, cc);
 		/* collapse_huge_page will return with the mmap_lock released */
 		*mmap_locked = false;
 	}