From patchwork Thu Feb 23 08:31:56 2023
X-Patchwork-Submitter: "Yin, Fengwei"
X-Patchwork-Id: 13149971
From: Yin Fengwei
To: linux-mm@kvack.org, akpm@linux-foundation.org, willy@infradead.org
Cc: fengwei.yin@intel.com
Subject: [PATCH 1/5] rmap: move hugetlb try_to_unmap to dedicated function
Date: Thu, 23 Feb 2023 16:31:56 +0800
Message-Id: <20230223083200.3149015-2-fengwei.yin@intel.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20230223083200.3149015-1-fengwei.yin@intel.com>
References: <20230223083200.3149015-1-fengwei.yin@intel.com>

This prepares for the batched rmap update for large folios. There is no
need to handle hugetlb inside the page table walk loop: handle hugetlb
up front in a dedicated function and bail out early. Almost no
functional change; the only change is to the mm counter update.
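To make the shape of the change easier to follow, here is a condensed,
illustrative sketch of the resulting control flow in try_to_unmap_one().
It is not the literal hunk: the preceding checks in the loop and the
regular PTE path are elided, and the helper body lives in
try_to_unmap_one_hugetlb() in the diff below.

	while (page_vma_mapped_walk(&pvmw)) {
		/* ... preceding checks in the loop elided ... */
		address = pvmw.address;

		if (folio_test_hugetlb(folio)) {
			/* hugetlb is handled in one shot; no need to keep looping */
			ret = try_to_unmap_one_hugetlb(folio, vma, range,
						       pvmw, address, flags);
			page_vma_mapped_walk_done(&pvmw);
			break;
		}

		/* ... regular (non-hugetlb) PTE unmap path continues here ... */
	}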
Signed-off-by: Yin Fengwei
---
 mm/rmap.c | 205 +++++++++++++++++++++++++++++++++---------------------
 1 file changed, 126 insertions(+), 79 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 15ae24585fc4..e7aa63b800f7 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1443,6 +1443,108 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
 	munlock_vma_folio(folio, vma, compound);
 }
 
+static bool try_to_unmap_one_hugetlb(struct folio *folio,
+		struct vm_area_struct *vma, struct mmu_notifier_range range,
+		struct page_vma_mapped_walk pvmw, unsigned long address,
+		enum ttu_flags flags)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pte_t pteval;
+	bool ret = true, anon = folio_test_anon(folio);
+
+	/*
+	 * The try_to_unmap() is only passed a hugetlb page
+	 * in the case where the hugetlb page is poisoned.
+	 */
+	VM_BUG_ON_FOLIO(!folio_test_hwpoison(folio), folio);
+	/*
+	 * huge_pmd_unshare may unmap an entire PMD page.
+	 * There is no way of knowing exactly which PMDs may
+	 * be cached for this mm, so we must flush them all.
+	 * start/end were already adjusted above to cover this
+	 * range.
+	 */
+	flush_cache_range(vma, range.start, range.end);
+
+	/*
+	 * To call huge_pmd_unshare, i_mmap_rwsem must be
+	 * held in write mode. Caller needs to explicitly
+	 * do this outside rmap routines.
+	 *
+	 * We also must hold hugetlb vma_lock in write mode.
+	 * Lock order dictates acquiring vma_lock BEFORE
+	 * i_mmap_rwsem. We can only try lock here and fail
+	 * if unsuccessful.
+	 */
+	if (!anon) {
+		VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+		if (!hugetlb_vma_trylock_write(vma)) {
+			ret = false;
+			goto out;
+		}
+		if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
+			hugetlb_vma_unlock_write(vma);
+			flush_tlb_range(vma,
+				range.start, range.end);
+			mmu_notifier_invalidate_range(mm,
+				range.start, range.end);
+			/*
+			 * The ref count of the PMD page was
+			 * dropped which is part of the way map
+			 * counting is done for shared PMDs.
+			 * Return 'true' here. When there is
+			 * no other sharing, huge_pmd_unshare
+			 * returns false and we will unmap the
+			 * actual page and drop map count
+			 * to zero.
+			 */
+			goto out;
+		}
+		hugetlb_vma_unlock_write(vma);
+	}
+	pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+
+	/*
+	 * Now the pte is cleared. If this pte was uffd-wp armed,
+	 * we may want to replace a none pte with a marker pte if
+	 * it's file-backed, so we don't lose the tracking info.
+	 */
+	pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
+
+	/* Set the dirty flag on the folio now the pte is gone. */
+	if (pte_dirty(pteval))
+		folio_mark_dirty(folio);
+
+	/* Update high watermark before we lower rss */
+	update_hiwater_rss(mm);
+
+	if (folio_test_hwpoison(folio) && !(flags & TTU_IGNORE_HWPOISON)) {
+		pteval = swp_entry_to_pte(make_hwpoison_entry(&folio->page));
+		set_huge_pte_at(mm, address, pvmw.pte, pteval);
+	}
+
+	/* try_to_unmap_one() called dec_mm_counter() even when
+	 * (folio_test_hwpoison(folio) && !(flags & TTU_IGNORE_HWPOISON)) was
+	 * not true, which looks incorrect. Change it to hugetlb_count_sub() here.
+	 */
+	hugetlb_count_sub(folio_nr_pages(folio), mm);
+
+	/*
+	 * No need to call mmu_notifier_invalidate_range(); it has been
+	 * done above for all cases requiring it to happen under page
+	 * table lock before mmu_notifier_invalidate_range_end()
+	 *
+	 * See Documentation/mm/mmu_notifier.rst
+	 */
+	page_remove_rmap(&folio->page, vma, folio_test_hugetlb(folio));
+	if (vma->vm_flags & VM_LOCKED)
+		mlock_drain_local();
+	folio_put(folio);
+
+out:
+	return ret;
+}
+
 /*
  * @arg: enum ttu_flags will be passed to this argument
  */
@@ -1506,86 +1608,37 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			break;
 		}
 
+		address = pvmw.address;
+		if (folio_test_hugetlb(folio)) {
+			ret = try_to_unmap_one_hugetlb(folio, vma, range,
+							pvmw, address, flags);
+
+			/* no need to loop for hugetlb */
+			page_vma_mapped_walk_done(&pvmw);
+			break;
+		}
+
 		subpage = folio_page(folio,
 					pte_pfn(*pvmw.pte) - folio_pfn(folio));
-		address = pvmw.address;
 		anon_exclusive = folio_test_anon(folio) &&
 				 PageAnonExclusive(subpage);
 
-		if (folio_test_hugetlb(folio)) {
-			bool anon = folio_test_anon(folio);
-
+		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+		/* Nuke the page table entry. */
+		if (should_defer_flush(mm, flags)) {
 			/*
-			 * The try_to_unmap() is only passed a hugetlb page
-			 * in the case where the hugetlb page is poisoned.
+			 * We clear the PTE but do not flush so potentially
+			 * a remote CPU could still be writing to the folio.
+			 * If the entry was previously clean then the
+			 * architecture must guarantee that a clear->dirty
+			 * transition on a cached TLB entry is written through
+			 * and traps if the PTE is unmapped.
 			 */
-			VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
-			/*
-			 * huge_pmd_unshare may unmap an entire PMD page.
-			 * There is no way of knowing exactly which PMDs may
-			 * be cached for this mm, so we must flush them all.
-			 * start/end were already adjusted above to cover this
-			 * range.
-			 */
-			flush_cache_range(vma, range.start, range.end);
+			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-			/*
-			 * To call huge_pmd_unshare, i_mmap_rwsem must be
-			 * held in write mode. Caller needs to explicitly
-			 * do this outside rmap routines.
-			 *
-			 * We also must hold hugetlb vma_lock in write mode.
-			 * Lock order dictates acquiring vma_lock BEFORE
-			 * i_mmap_rwsem. We can only try lock here and fail
-			 * if unsuccessful.
-			 */
-			if (!anon) {
-				VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
-				if (!hugetlb_vma_trylock_write(vma)) {
-					page_vma_mapped_walk_done(&pvmw);
-					ret = false;
-					break;
-				}
-				if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
-					hugetlb_vma_unlock_write(vma);
-					flush_tlb_range(vma,
-						range.start, range.end);
-					mmu_notifier_invalidate_range(mm,
-						range.start, range.end);
-					/*
-					 * The ref count of the PMD page was
-					 * dropped which is part of the way map
-					 * counting is done for shared PMDs.
-					 * Return 'true' here. When there is
-					 * no other sharing, huge_pmd_unshare
-					 * returns false and we will unmap the
-					 * actual page and drop map count
-					 * to zero.
-					 */
-					page_vma_mapped_walk_done(&pvmw);
-					break;
-				}
-				hugetlb_vma_unlock_write(vma);
-			}
-			pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+			set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
 		} else {
-			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
-			/* Nuke the page table entry. */
-			if (should_defer_flush(mm, flags)) {
-				/*
-				 * We clear the PTE but do not flush so potentially
-				 * a remote CPU could still be writing to the folio.
-				 * If the entry was previously clean then the
-				 * architecture must guarantee that a clear->dirty
-				 * transition on a cached TLB entry is written through
-				 * and traps if the PTE is unmapped.
-				 */
-				pteval = ptep_get_and_clear(mm, address, pvmw.pte);
-
-				set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
-			} else {
-				pteval = ptep_clear_flush(vma, address, pvmw.pte);
-			}
+			pteval = ptep_clear_flush(vma, address, pvmw.pte);
 		}
 
 		/*
@@ -1604,14 +1657,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 
 		if (PageHWPoison(subpage) && !(flags & TTU_IGNORE_HWPOISON)) {
 			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
-			if (folio_test_hugetlb(folio)) {
-				hugetlb_count_sub(folio_nr_pages(folio), mm);
-				set_huge_pte_at(mm, address, pvmw.pte, pteval);
-			} else {
-				dec_mm_counter(mm, mm_counter(&folio->page));
-				set_pte_at(mm, address, pvmw.pte, pteval);
-			}
-
+			dec_mm_counter(mm, mm_counter(&folio->page));
+			set_pte_at(mm, address, pvmw.pte, pteval);
 		} else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
 			/*
 			 * The guest indicated that the page content is of no