From patchwork Tue Mar 8 21:34:06 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12774392
Date: Tue, 8 Mar 2022 13:34:06 -0800
In-Reply-To: <20220308213417.1407042-1-zokeefe@google.com>
Message-Id: <20220308213417.1407042-4-zokeefe@google.com>
References: <20220308213417.1407042-1-zokeefe@google.com>
Subject: [RFC PATCH 03/14] mm/khugepaged: add __do_collapse_huge_page() helper
From: "Zach O'Keefe"
To: Alex Shi, David Hildenbrand, David Rientjes, Michal Hocko,
    Pasha Tatashin, SeongJae Park, Song Liu, Vlastimil Babka, Zi Yan,
    linux-mm@kvack.org
Cc: Andrea Arcangeli, Andrew Morton, Arnd Bergmann, Axel Rasmussen,
    Chris Kennelly, Chris Zankel, Helge Deller, Hugh Dickins,
    Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe,
    "Kirill A. Shutemov", Matthew Wilcox, Matt Turner, Max Filippov,
    Miaohe Lin, Minchan Kim, Patrick Xia, Pavel Begunkov, Peter Xu,
    Richard Henderson, Thomas Bogendoerfer, Yang Shi, "Zach O'Keefe"

collapse_huge_page() currently: (1) possibly allocates a hugepage,
(2) charges the owning memcg, (3) swaps in swapped-out pages,
(4) performs the actual collapse (copying of pages, installation of the
huge pmd), and (5) does some final memcg accounting on the error path.

Separate out step (4) so that it can be reused by itself later in the
series.

Signed-off-by: Zach O'Keefe
---
 mm/khugepaged.c | 178 +++++++++++++++++++++++++++---------------------
 1 file changed, 100 insertions(+), 78 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 36fc0099c445..e3399a451662 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1058,85 +1058,23 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
         return true;
 }
 
-static void collapse_huge_page(struct mm_struct *mm,
-                               unsigned long address,
-                               struct page **hpage,
-                               int node, int referenced, int unmapped,
-                               int enforce_pte_scan_limits)
-{
-        LIST_HEAD(compound_pagelist);
-        pmd_t *pmd, _pmd;
+static int __do_collapse_huge_page(struct mm_struct *mm,
+                                   struct vm_area_struct *vma,
+                                   unsigned long address, pmd_t *pmd,
+                                   struct page *new_page,
+                                   int enforce_pte_scan_limits,
+                                   int *isolated_out)
+{
+        pmd_t _pmd;
         pte_t *pte;
         pgtable_t pgtable;
-        struct page *new_page;
         spinlock_t *pmd_ptl, *pte_ptl;
-        int isolated = 0, result = 0;
-        struct vm_area_struct *vma;
+        int isolated = 0, result = SCAN_SUCCEED;
         struct mmu_notifier_range range;
-        gfp_t gfp;
-
-        VM_BUG_ON(address & ~HPAGE_PMD_MASK);
-
-        /* Only allocate from the target node */
-        gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
-
-        /*
-         * Before allocating the hugepage, release the mmap_lock read lock.
-         * The allocation can take potentially a long time if it involves
-         * sync compaction, and we do not need to hold the mmap_lock during
-         * that. We will recheck the vma after taking it again in write mode.
-         */
-        mmap_read_unlock(mm);
-        new_page = khugepaged_alloc_page(hpage, gfp, node);
-        if (!new_page) {
-                result = SCAN_ALLOC_HUGE_PAGE_FAIL;
-                goto out_nolock;
-        }
-
-        if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) {
-                result = SCAN_CGROUP_CHARGE_FAIL;
-                goto out_nolock;
-        }
-        count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
-
-        mmap_read_lock(mm);
-        result = hugepage_vma_revalidate(mm, address, &vma);
-        if (result) {
-                mmap_read_unlock(mm);
-                goto out_nolock;
-        }
-
-        pmd = mm_find_pmd(mm, address);
-        if (!pmd) {
-                result = SCAN_PMD_NULL;
-                mmap_read_unlock(mm);
-                goto out_nolock;
-        }
-
-        /*
-         * __collapse_huge_page_swapin always returns with mmap_lock locked.
-         * If it fails, we release mmap_lock and jump out_nolock.
-         * Continuing to collapse causes inconsistency.
-         */
-        if (unmapped && !__collapse_huge_page_swapin(mm, vma, address,
-                                                     pmd, referenced)) {
-                mmap_read_unlock(mm);
-                goto out_nolock;
-        }
+        LIST_HEAD(compound_pagelist);
 
-        mmap_read_unlock(mm);
-        /*
-         * Prevent all access to pagetables with the exception of
-         * gup_fast later handled by the ptep_clear_flush and the VM
-         * handled by the anon_vma lock + PG_lock.
-         */
-        mmap_write_lock(mm);
-        result = hugepage_vma_revalidate(mm, address, &vma);
-        if (result)
-                goto out_up_write;
-        /* check if the pmd is still valid */
-        if (mm_find_pmd(mm, address) != pmd)
-                goto out_up_write;
+        VM_BUG_ON(!new_page);
+        mmap_assert_write_locked(mm);
 
         anon_vma_lock_write(vma->anon_vma);
 
@@ -1176,7 +1114,7 @@ static void collapse_huge_page(struct mm_struct *mm,
                 spin_unlock(pmd_ptl);
                 anon_vma_unlock_write(vma->anon_vma);
                 result = SCAN_FAIL;
-                goto out_up_write;
+                goto out;
         }
 
         /*
@@ -1208,11 +1146,95 @@ static void collapse_huge_page(struct mm_struct *mm,
         set_pmd_at(mm, address, pmd, _pmd);
         update_mmu_cache_pmd(vma, address, pmd);
         spin_unlock(pmd_ptl);
+out:
+        if (isolated_out)
+                *isolated_out = isolated;
+        return result;
+}
 
-        *hpage = NULL;
-        khugepaged_pages_collapsed++;
-        result = SCAN_SUCCEED;
+static void collapse_huge_page(struct mm_struct *mm,
+                               unsigned long address,
+                               struct page **hpage,
+                               int node, int referenced, int unmapped,
+                               int enforce_pte_scan_limits)
+{
+        pmd_t *pmd;
+        struct page *new_page;
+        int isolated = 0, result = 0;
+        struct vm_area_struct *vma;
+        gfp_t gfp;
+
+        VM_BUG_ON(address & ~HPAGE_PMD_MASK);
+
+        /* Only allocate from the target node */
+        gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
+
+        /*
+         * Before allocating the hugepage, release the mmap_lock read lock.
+         * The allocation can take potentially a long time if it involves
+         * sync compaction, and we do not need to hold the mmap_lock during
+         * that. We will recheck the vma after taking it again in write mode.
+         */
+        mmap_read_unlock(mm);
+        new_page = khugepaged_alloc_page(hpage, gfp, node);
+        if (!new_page) {
+                result = SCAN_ALLOC_HUGE_PAGE_FAIL;
+                goto out_nolock;
+        }
+
+        if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) {
+                result = SCAN_CGROUP_CHARGE_FAIL;
+                goto out_nolock;
+        }
+        count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
+
+        mmap_read_lock(mm);
+        result = hugepage_vma_revalidate(mm, address, &vma);
+        if (result) {
+                mmap_read_unlock(mm);
+                goto out_nolock;
+        }
+
+        pmd = mm_find_pmd(mm, address);
+        if (!pmd) {
+                result = SCAN_PMD_NULL;
+                mmap_read_unlock(mm);
+                goto out_nolock;
+        }
+
+        /*
+         * __collapse_huge_page_swapin always returns with mmap_lock locked.
+         * If it fails, we release mmap_lock and jump out_nolock.
+         * Continuing to collapse causes inconsistency.
+         */
+        if (unmapped && !__collapse_huge_page_swapin(mm, vma, address,
+                                                     pmd, referenced)) {
+                mmap_read_unlock(mm);
+                goto out_nolock;
+        }
+
+        mmap_read_unlock(mm);
+        /*
+         * Prevent all access to pagetables with the exception of
+         * gup_fast later handled by the ptep_clear_flush and the VM
+         * handled by the anon_vma lock + PG_lock.
+         */
+        mmap_write_lock(mm);
+
+        result = hugepage_vma_revalidate(mm, address, &vma);
+        if (result)
+                goto out_up_write;
+        /* check if the pmd is still valid */
+        if (mm_find_pmd(mm, address) != pmd)
+                goto out_up_write;
+
+        result = __do_collapse_huge_page(mm, vma, address, pmd, new_page,
+                                         enforce_pte_scan_limits, &isolated);
+        if (result == SCAN_SUCCEED) {
+                *hpage = NULL;
+                khugepaged_pages_collapsed++;
+        }
 out_up_write:
         mmap_write_unlock(mm);
 out_nolock:
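
For readers skimming the refactor, here is a minimal, self-contained userspace
sketch (plain C, not kernel code) of the calling convention the patch
introduces: the caller keeps allocation, revalidation and locking, while a
helper performs the core step and reports a status code plus an "isolated"
count through an optional out-parameter. All names below (do_core_step(),
do_collapse(), SCAN_OK, SCAN_ALLOC_FAIL) are hypothetical stand-ins, not
symbols from mm/khugepaged.c.

/*
 * Illustrative only: a userspace analogue of the split above. do_collapse()
 * plays the role of collapse_huge_page() (owns allocation and "locking"),
 * do_core_step() plays the role of __do_collapse_huge_page() (does the core
 * work, returns a status, reports a count via an optional out-pointer).
 */
#include <stdio.h>
#include <stdlib.h>

enum scan_result { SCAN_OK, SCAN_FAIL, SCAN_ALLOC_FAIL };

/* Assumes the caller already did setup and passes a pre-allocated buffer. */
static int do_core_step(void *new_page, int *isolated_out)
{
        int isolated = 1;               /* pretend one page was isolated */
        int result = new_page ? SCAN_OK : SCAN_FAIL;

        if (isolated_out)
                *isolated_out = isolated;
        return result;
}

/* Allocation and final accounting stay with the caller. */
static int do_collapse(void)
{
        int isolated = 0;
        int result;
        void *new_page = malloc(4096);  /* stand-in for the hugepage allocation */

        if (!new_page)
                return SCAN_ALLOC_FAIL;

        /* ...take the "write lock", revalidate, then hand off the core step... */
        result = do_core_step(new_page, &isolated);
        if (result == SCAN_OK)
                printf("collapsed, isolated=%d\n", isolated);

        free(new_page);
        return result;
}

int main(void)
{
        return do_collapse() == SCAN_OK ? 0 : 1;
}

Passing "isolated" back through an optional pointer keeps the helper's return
value free for the scan status while still letting the caller perform its own
final accounting, which is what allows the core step to be reused on its own
later in the series.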