From patchwork Tue Feb 28 12:23:05 2023
From: Yin Fengwei <fengwei.yin@intel.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org, willy@infradead.org,
    sidhartha.kumar@oracle.com, mike.kravetz@oracle.com,
    jane.chu@oracle.com, naoya.horiguchi@nec.com
Cc: fengwei.yin@intel.com
Subject: [PATCH v2 2/5] rmap: move page unmap operation to dedicated function
Date: Tue, 28 Feb 2023 20:23:05 +0800
Message-Id: <20230228122308.2972219-3-fengwei.yin@intel.com>
In-Reply-To: <20230228122308.2972219-1-fengwei.yin@intel.com>
References: <20230228122308.2972219-1-fengwei.yin@intel.com>

Move the per-PTE unmap work done inside try_to_unmap_one()'s page walk loop
into a new helper, try_to_unmap_one_page(). No functional change; the code is
only reorganized.
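A condensed view of the resulting structure, for orientation only (bodies
elided; the full diff below is authoritative):

    static bool try_to_unmap_one_page(struct folio *folio,
                    struct vm_area_struct *vma, struct mmu_notifier_range range,
                    struct page_vma_mapped_walk pvmw, unsigned long address,
                    enum ttu_flags flags)
    {
            /*
             * Clear the PTE, then handle the hwpoison, pte_unused,
             * anonymous (swap / MADV_FREE) and file-backed cases for
             * this one page. Returns false when the caller's rmap walk
             * should stop.
             */
    }

    static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
                    unsigned long address, void *arg)
    {
            DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
            ...
            while (page_vma_mapped_walk(&pvmw)) {
                    ...
                    ret = try_to_unmap_one_page(folio, vma,
                                    range, pvmw, address, flags);
                    if (!ret)
                            break;
                    ...
            }
            ...
    }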
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
 mm/rmap.c | 369 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 194 insertions(+), 175 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 0f09518d6f30..987ab402392f 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1540,17 +1540,204 @@ static bool try_to_unmap_one_hugetlb(struct folio *folio,
 	return ret;
 }
 
+static bool try_to_unmap_one_page(struct folio *folio,
+		struct vm_area_struct *vma, struct mmu_notifier_range range,
+		struct page_vma_mapped_walk pvmw, unsigned long address,
+		enum ttu_flags flags)
+{
+	bool anon_exclusive, ret = true;
+	struct page *subpage;
+	struct mm_struct *mm = vma->vm_mm;
+	pte_t pteval;
+
+	subpage = folio_page(folio,
+			pte_pfn(*pvmw.pte) - folio_pfn(folio));
+	anon_exclusive = folio_test_anon(folio) &&
+			 PageAnonExclusive(subpage);
+
+	flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+	/* Nuke the page table entry. */
+	if (should_defer_flush(mm, flags)) {
+		/*
+		 * We clear the PTE but do not flush so potentially
+		 * a remote CPU could still be writing to the folio.
+		 * If the entry was previously clean then the
+		 * architecture must guarantee that a clear->dirty
+		 * transition on a cached TLB entry is written through
+		 * and traps if the PTE is unmapped.
+		 */
+		pteval = ptep_get_and_clear(mm, address, pvmw.pte);
+
+		set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
+	} else {
+		pteval = ptep_clear_flush(vma, address, pvmw.pte);
+	}
+
+	/*
+	 * Now the pte is cleared. If this pte was uffd-wp armed,
+	 * we may want to replace a none pte with a marker pte if
+	 * it's file-backed, so we don't lose the tracking info.
+	 */
+	pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
+
+	/* Set the dirty flag on the folio now the pte is gone. */
+	if (pte_dirty(pteval))
+		folio_mark_dirty(folio);
+
+	/* Update high watermark before we lower rss */
+	update_hiwater_rss(mm);
+
+	if (PageHWPoison(subpage) && !(flags & TTU_HWPOISON)) {
+		pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
+		dec_mm_counter(mm, mm_counter(&folio->page));
+		set_pte_at(mm, address, pvmw.pte, pteval);
+	} else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
+		/*
+		 * The guest indicated that the page content is of no
+		 * interest anymore. Simply discard the pte, vmscan
+		 * will take care of the rest.
+		 * A future reference will then fault in a new zero
+		 * page. When userfaultfd is active, we must not drop
+		 * this page though, as its main user (postcopy
+		 * migration) will not expect userfaults on already
+		 * copied pages.
+		 */
+		dec_mm_counter(mm, mm_counter(&folio->page));
+		/* We have to invalidate as we cleared the pte */
+		mmu_notifier_invalidate_range(mm, address,
+					      address + PAGE_SIZE);
+	} else if (folio_test_anon(folio)) {
+		swp_entry_t entry = { .val = page_private(subpage) };
+		pte_t swp_pte;
+		/*
+		 * Store the swap location in the pte.
+		 * See handle_pte_fault() ...
+		 */
+		if (unlikely(folio_test_swapbacked(folio) !=
+				folio_test_swapcache(folio))) {
+			WARN_ON_ONCE(1);
+			ret = false;
+			/* We have to invalidate as we cleared the pte */
+			mmu_notifier_invalidate_range(mm, address,
+					address + PAGE_SIZE);
+			page_vma_mapped_walk_done(&pvmw);
+			goto discard;
+		}
+
+		/* MADV_FREE page check */
+		if (!folio_test_swapbacked(folio)) {
+			int ref_count, map_count;
+
+			/*
+			 * Synchronize with gup_pte_range():
+			 * - clear PTE; barrier; read refcount
+			 * - inc refcount; barrier; read PTE
+			 */
+			smp_mb();
+
+			ref_count = folio_ref_count(folio);
+			map_count = folio_mapcount(folio);
+
+			/*
+			 * Order reads for page refcount and dirty flag
+			 * (see comments in __remove_mapping()).
+			 */
+			smp_rmb();
+
+			/*
+			 * The only page refs must be one from isolation
+			 * plus the rmap(s) (dropped by discard:).
+			 */
+			if (ref_count == 1 + map_count &&
+			    !folio_test_dirty(folio)) {
+				/* Invalidate as we cleared the pte */
+				mmu_notifier_invalidate_range(mm,
+					address, address + PAGE_SIZE);
+				dec_mm_counter(mm, MM_ANONPAGES);
+				goto discard;
+			}
+
+			/*
+			 * If the folio was redirtied, it cannot be
+			 * discarded. Remap the page to page table.
+			 */
+			set_pte_at(mm, address, pvmw.pte, pteval);
+			folio_set_swapbacked(folio);
+			ret = false;
+			page_vma_mapped_walk_done(&pvmw);
+			goto discard;
+		}
+
+		if (swap_duplicate(entry) < 0) {
+			set_pte_at(mm, address, pvmw.pte, pteval);
+			ret = false;
+			page_vma_mapped_walk_done(&pvmw);
+			goto discard;
+		}
+		if (arch_unmap_one(mm, vma, address, pteval) < 0) {
+			swap_free(entry);
+			set_pte_at(mm, address, pvmw.pte, pteval);
+			ret = false;
+			page_vma_mapped_walk_done(&pvmw);
+			goto discard;
+		}
+
+		/* See page_try_share_anon_rmap(): clear PTE first. */
+		if (anon_exclusive &&
+		    page_try_share_anon_rmap(subpage)) {
+			swap_free(entry);
+			set_pte_at(mm, address, pvmw.pte, pteval);
+			ret = false;
+			page_vma_mapped_walk_done(&pvmw);
+			goto discard;
+		}
+		if (list_empty(&mm->mmlist)) {
+			spin_lock(&mmlist_lock);
+			if (list_empty(&mm->mmlist))
+				list_add(&mm->mmlist, &init_mm.mmlist);
+			spin_unlock(&mmlist_lock);
+		}
+		dec_mm_counter(mm, MM_ANONPAGES);
+		inc_mm_counter(mm, MM_SWAPENTS);
+		swp_pte = swp_entry_to_pte(entry);
+		if (anon_exclusive)
+			swp_pte = pte_swp_mkexclusive(swp_pte);
+		if (pte_soft_dirty(pteval))
+			swp_pte = pte_swp_mksoft_dirty(swp_pte);
+		if (pte_uffd_wp(pteval))
+			swp_pte = pte_swp_mkuffd_wp(swp_pte);
+		set_pte_at(mm, address, pvmw.pte, swp_pte);
+		/* Invalidate as we cleared the pte */
+		mmu_notifier_invalidate_range(mm, address,
+					      address + PAGE_SIZE);
+	} else {
+		/*
+		 * This is a locked file-backed folio,
+		 * so it cannot be removed from the page
+		 * cache and replaced by a new folio before
+		 * mmu_notifier_invalidate_range_end, so no
+		 * concurrent thread might update its page table
+		 * to point at a new folio while a device is
+		 * still using this folio.
+		 *
+		 * See Documentation/mm/mmu_notifier.rst
+		 */
+		dec_mm_counter(mm, mm_counter_file(&folio->page));
+	}
+
+discard:
+	return ret;
+}
+
 /*
  * @arg: enum ttu_flags will be passed to this argument
  */
 static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
		     unsigned long address, void *arg)
 {
-	struct mm_struct *mm = vma->vm_mm;
 	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
-	pte_t pteval;
 	struct page *subpage;
-	bool anon_exclusive, ret = true;
+	bool ret = true;
 	struct mmu_notifier_range range;
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;
 
@@ -1615,179 +1802,11 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 
 		subpage = folio_page(folio,
					pte_pfn(*pvmw.pte) - folio_pfn(folio));
-		anon_exclusive = folio_test_anon(folio) &&
-				 PageAnonExclusive(subpage);
-
-		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
-		/* Nuke the page table entry. */
-		if (should_defer_flush(mm, flags)) {
-			/*
-			 * We clear the PTE but do not flush so potentially
-			 * a remote CPU could still be writing to the folio.
-			 * If the entry was previously clean then the
-			 * architecture must guarantee that a clear->dirty
-			 * transition on a cached TLB entry is written through
-			 * and traps if the PTE is unmapped.
-			 */
-			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
-
-			set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
-		} else {
-			pteval = ptep_clear_flush(vma, address, pvmw.pte);
-		}
-
-		/*
-		 * Now the pte is cleared. If this pte was uffd-wp armed,
-		 * we may want to replace a none pte with a marker pte if
-		 * it's file-backed, so we don't lose the tracking info.
-		 */
-		pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
-
-		/* Set the dirty flag on the folio now the pte is gone. */
-		if (pte_dirty(pteval))
-			folio_mark_dirty(folio);
-
-		/* Update high watermark before we lower rss */
-		update_hiwater_rss(mm);
-
-		if (PageHWPoison(subpage) && (flags & TTU_HWPOISON)) {
-			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
-			dec_mm_counter(mm, mm_counter(&folio->page));
-			set_pte_at(mm, address, pvmw.pte, pteval);
-		} else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
-			/*
-			 * The guest indicated that the page content is of no
-			 * interest anymore. Simply discard the pte, vmscan
-			 * will take care of the rest.
-			 * A future reference will then fault in a new zero
-			 * page. When userfaultfd is active, we must not drop
-			 * this page though, as its main user (postcopy
-			 * migration) will not expect userfaults on already
-			 * copied pages.
-			 */
-			dec_mm_counter(mm, mm_counter(&folio->page));
-			/* We have to invalidate as we cleared the pte */
-			mmu_notifier_invalidate_range(mm, address,
-						      address + PAGE_SIZE);
-		} else if (folio_test_anon(folio)) {
-			swp_entry_t entry = { .val = page_private(subpage) };
-			pte_t swp_pte;
-			/*
-			 * Store the swap location in the pte.
-			 * See handle_pte_fault() ...
-			 */
-			if (unlikely(folio_test_swapbacked(folio) !=
-					folio_test_swapcache(folio))) {
-				WARN_ON_ONCE(1);
-				ret = false;
-				/* We have to invalidate as we cleared the pte */
-				mmu_notifier_invalidate_range(mm, address,
-						address + PAGE_SIZE);
-				page_vma_mapped_walk_done(&pvmw);
-				break;
-			}
-
-			/* MADV_FREE page check */
-			if (!folio_test_swapbacked(folio)) {
-				int ref_count, map_count;
-
-				/*
-				 * Synchronize with gup_pte_range():
-				 * - clear PTE; barrier; read refcount
-				 * - inc refcount; barrier; read PTE
-				 */
-				smp_mb();
-
-				ref_count = folio_ref_count(folio);
-				map_count = folio_mapcount(folio);
-
-				/*
-				 * Order reads for page refcount and dirty flag
-				 * (see comments in __remove_mapping()).
-				 */
-				smp_rmb();
-
-				/*
-				 * The only page refs must be one from isolation
-				 * plus the rmap(s) (dropped by discard:).
-				 */
-				if (ref_count == 1 + map_count &&
-				    !folio_test_dirty(folio)) {
-					/* Invalidate as we cleared the pte */
-					mmu_notifier_invalidate_range(mm,
-						address, address + PAGE_SIZE);
-					dec_mm_counter(mm, MM_ANONPAGES);
-					goto discard;
-				}
-
-				/*
-				 * If the folio was redirtied, it cannot be
-				 * discarded. Remap the page to page table.
-				 */
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				folio_set_swapbacked(folio);
-				ret = false;
-				page_vma_mapped_walk_done(&pvmw);
-				break;
-			}
-
-			if (swap_duplicate(entry) < 0) {
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				ret = false;
-				page_vma_mapped_walk_done(&pvmw);
-				break;
-			}
-			if (arch_unmap_one(mm, vma, address, pteval) < 0) {
-				swap_free(entry);
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				ret = false;
-				page_vma_mapped_walk_done(&pvmw);
-				break;
-			}
+		ret = try_to_unmap_one_page(folio, vma,
+						range, pvmw, address, flags);
+		if (!ret)
+			break;
 
-			/* See page_try_share_anon_rmap(): clear PTE first. */
-			if (anon_exclusive &&
-			    page_try_share_anon_rmap(subpage)) {
-				swap_free(entry);
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				ret = false;
-				page_vma_mapped_walk_done(&pvmw);
-				break;
-			}
-			if (list_empty(&mm->mmlist)) {
-				spin_lock(&mmlist_lock);
-				if (list_empty(&mm->mmlist))
-					list_add(&mm->mmlist, &init_mm.mmlist);
-				spin_unlock(&mmlist_lock);
-			}
-			dec_mm_counter(mm, MM_ANONPAGES);
-			inc_mm_counter(mm, MM_SWAPENTS);
-			swp_pte = swp_entry_to_pte(entry);
-			if (anon_exclusive)
-				swp_pte = pte_swp_mkexclusive(swp_pte);
-			if (pte_soft_dirty(pteval))
-				swp_pte = pte_swp_mksoft_dirty(swp_pte);
-			if (pte_uffd_wp(pteval))
-				swp_pte = pte_swp_mkuffd_wp(swp_pte);
-			set_pte_at(mm, address, pvmw.pte, swp_pte);
-			/* Invalidate as we cleared the pte */
-			mmu_notifier_invalidate_range(mm, address,
-						      address + PAGE_SIZE);
-		} else {
-			/*
-			 * This is a locked file-backed folio,
-			 * so it cannot be removed from the page
-			 * cache and replaced by a new folio before
-			 * mmu_notifier_invalidate_range_end, so no
-			 * concurrent thread might update its page table
-			 * to point at a new folio while a device is
-			 * still using this folio.
-			 *
-			 * See Documentation/mm/mmu_notifier.rst
-			 */
-			dec_mm_counter(mm, mm_counter_file(&folio->page));
-		}
-discard:
 		/*
 		 * No need to call mmu_notifier_invalidate_range() it has be
 		 * done above for all cases requiring it to happen under page