[v4,00/17] mm: COW fixes part 2: reliable GUP pins of anonymous pages

Message ID: 20220428083441.37290-1-david@redhat.com

Message

David Hildenbrand April 28, 2022, 8:34 a.m. UTC
This is roughly what we have in -mm and -next; however, it includes one
additional patch and some minor differences, most notably small fixes in
the patch descriptions.

v4 is located at:
	https://github.com/davidhildenbrand/linux/tree/cow_fixes_part_2_v4

Please refer to the v3 cover letter:
	https://lkml.kernel.org/r/20220329160440.193848-1-david@redhat.com


v3 -> v4:
* Minor changes/fixes in patch descriptions
* "mm/rmap: drop "compound" parameter from page_add_new_anon_rmap()"
 -> Remove VM_BUG_ON_PAGE(PageTransCompound(page), page);
* "mm/rmap: fail try_to_migrate() early when setting a PMD migration entry
   fails"
 -> Added
* "mm: support GUP-triggered unsharing of anonymous pages"
 -> unlikely(unshare) -> likely(!unshare) in wp_huge_pmd()
 -> "page_copied && !unshare" -> "(page_copied && !unshare)"
* "mm/gup: sanity-check with CONFIG_DEBUG_VM that anonymous pages are
   exclusive when (un)pinning"
 -> Use VM_BUG_ON_PAGE() instead of VM_BUG_ON() (see the sketch below)
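
For reference, the VM_BUG_ON_PAGE() conversion matters because, unlike
plain VM_BUG_ON(), it dumps the offending page's state (flags, refcount,
mapping) via dump_page() when the assertion trips. The shape of the
change (see the essential diff in the follow-up for the real hunks):

	/* before: only the failed condition is reported */
	VM_BUG_ON((flags & FOLL_PIN) && PageAnon(page) &&
		  !PageAnonExclusive(page));

	/* after: the page state is dumped as well */
	VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
		       !PageAnonExclusive(page), page);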

v2 -> v3:
* Note 1: Left the terminology "unshare" in place for now instead of
  switching to "make anon exclusive".
* Note 2: We might have to tackle undoing effects of arch_unmap_one() on
  sparc, to free some tag memory immediately instead of when tearing down
  the vma/mm; looks like this needs more care either way, so I'll ignore it
  for now.
* Rebased on top of the core MM changes for v5.18-rc1 (most conflicts were
  due to the folio and ZONE_DEVICE migration rework). No substantial
  changes were necessary -- mostly folio conversion and code movement.
* Retested on aarch64, ppc64, s390x and x86_64
* "mm/rmap: convert RMAP flags to a proper distinct rmap_t type"
  -> Converted one previously missed instance in restore_exclusive_pte()
* "mm/rmap: pass rmap flags to hugepage_add_anon_rmap()"
  -> Use "!!(flags & RMAP_EXCLUSIVE)" to avoid sparse warnings
* "mm/huge_memory: remove outdated VM_WARN_ON_ONCE_PAGE from unmap_page()"
  -> Added, as we can now trigger it more frequently
* "mm: remember exclusively mapped anonymous pages with PG_anon_exclusive"
  -> Use subpage in VM_BUG_ON_PAGE() in try_to_migrate_one()
  -> Move comment from folio_migrate_mapping() to folio_migrate_flags()
     regarding PG_anon_exclusive/PG_mappedtodisk
  -> s/int rmap_flags/rmap_t rmap_flags/ in remove_migration_pmd()
* "mm/gup: sanity-check with CONFIG_DEBUG_VM that anonymous pages are
   exclusive when (un)pinning"
  -> Use IS_ENABLED(CONFIG_DEBUG_VM) instead of an #ifdef (sketched below)
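
Two idioms from the list above, sketched out of context (call sites
simplified, not verbatim): rmap_t is a sparse "__bitwise" type, so "!!"
collapses the masked flag into a plain 0/1 int; IS_ENABLED() keeps the
CONFIG_DEBUG_VM-only code visible to the compiler in all configurations
while still letting it be optimized away when the option is off:

	/* "!!" turns the __bitwise mask into a plain int for sparse */
	__page_set_anon_rmap(page, vma, address,
			     !!(flags & RMAP_EXCLUSIVE));

	/* built in all configs, dead-code-eliminated with DEBUG_VM=n */
	static void sanity_check_pinned_pages(struct page **pages,
					      unsigned long npages)
	{
		if (!IS_ENABLED(CONFIG_DEBUG_VM))
			return;
		/* ... exclusivity checks on the pinned pages ... */
	}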

v1 -> v2:
* Tested on aarch64, ppc64, s390x and x86_64
* "mm/page-flags: reuse PG_mappedtodisk as PG_anon_exclusive for PageAnon()
   pages"
  -> Use PG_mappedtodisk instead of PG_slab (thanks Willy!), this simplifies
     the patch and the necessary handling a lot. Added safety BUG_ONs (the
     aliasing is sketched after this list)
  -> Move most documentation to the patch description, to be placed in a
     proper documentation doc in the future, once everything's in place
* ""mm: remember exclusively mapped anonymous pages with PG_anon_exclusive
  -> Skip check+clearing in page_try_dup_anon_rmap(), otherwise we might
     trigger a wrong VM_BUG_ON() for KSM pages in ClearPageAnonExclusive()
  -> In __split_huge_pmd_locked(), call page_try_share_anon_rmap() only
     for "anon_exclusive", otherwise we might trigger a wrong VM_BUG_ON()
  -> In __split_huge_page_tail(), drop any remaining PG_anon_exclusive on
     tail pages, and document why that is fine
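
For context on the PG_mappedtodisk reuse: that flag has no meaning for
PageAnon() pages, so the bit can double as PG_anon_exclusive there
without consuming a scarce page flag. A minimal sketch of the aliasing
and of one assumed accessor shape (the actual patch carries additional
assertions and documentation):

	/* in enum pageflags: no new bit, alias the unused one */
	PG_anon_exclusive = PG_mappedtodisk,

	static __always_inline void SetPageAnonExclusive(struct page *page)
	{
		/* safety: the alias is only valid for anon, non-KSM pages */
		VM_BUG_ON_PGFLAGS(!PageAnon(page) || PageKsm(page), page);
		set_bit(PG_anon_exclusive, &page->flags);
	}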

RFC -> v1:
* Rephrased/extended some patch descriptions+comments
* Tested on aarch64, ppc64 and x86_64
* "mm/rmap: convert RMAP flags to a proper distinct rmap_t type"
 -> Added
* "mm/rmap: drop "compound" parameter from page_add_new_anon_rmap()"
 -> Added
* "mm: remember exclusively mapped anonymous pages with PG_anon_exclusive"
 -> Fixed __do_huge_pmd_anonymous_page() to recheck after temporarily
    dropping the PT lock.
 -> Use "reuse" label in __do_huge_pmd_anonymous_page()
 -> Slightly simplify logic in hugetlb_cow()
 -> In remove_migration_pte(), remove unrelated changes around
    page_remove_rmap()
* "mm: support GUP-triggered unsharing of anonymous pages"
 -> In handle_pte_fault(), trigger pte_mkdirty() only with
    FAULT_FLAG_WRITE (sketched after this list)
 -> In __handle_mm_fault(), extend comment regarding anonymous PUDs
* "mm/gup: trigger FAULT_FLAG_UNSHARE when R/O-pinning a possibly shared
   anonymous page"
 -> Added unsharing logic to gup_hugepte() and gup_huge_pud()
 -> Changed return logic in __follow_hugetlb_must_fault(), making sure
    that "unshare" is always set
* "mm/gup: sanity-check with CONFIG_DEBUG_VM that anonymous pages are
   exclusive when (un)pinning"
 -> Slightly simplified sanity_check_pinned_pages()
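
To illustrate the handle_pte_fault() change: an unshare fault has to
break COW sharing while leaving the PTE read-only, so only genuine write
faults may mark the PTE dirty. A simplified sketch of the resulting
fault logic (not the exact hunk):

	if (vmf->flags & (FAULT_FLAG_WRITE | FAULT_FLAG_UNSHARE)) {
		if (!pte_write(entry))
			return do_wp_page(vmf);
		else if (likely(vmf->flags & FAULT_FLAG_WRITE))
			entry = pte_mkdirty(entry);
	}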


David Hildenbrand (17):
  mm/rmap: fix missing swap_free() in try_to_unmap() after
    arch_unmap_one() failed
  mm/hugetlb: take src_mm->write_protect_seq in
    copy_hugetlb_page_range()
  mm/memory: slightly simplify copy_present_pte()
  mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and
    page_try_dup_anon_rmap()
  mm/rmap: convert RMAP flags to a proper distinct rmap_t type
  mm/rmap: remove do_page_add_anon_rmap()
  mm/rmap: pass rmap flags to hugepage_add_anon_rmap()
  mm/rmap: drop "compound" parameter from page_add_new_anon_rmap()
  mm/rmap: use page_move_anon_rmap() when reusing a mapped PageAnon()
    page exclusively
  mm/huge_memory: remove outdated VM_WARN_ON_ONCE_PAGE from unmap_page()
  mm/page-flags: reuse PG_mappedtodisk as PG_anon_exclusive for
    PageAnon() pages
  mm: remember exclusively mapped anonymous pages with PG_anon_exclusive
  mm/rmap: fail try_to_migrate() early when setting a PMD migration
    entry fails
  mm/gup: disallow follow_page(FOLL_PIN)
  mm: support GUP-triggered unsharing of anonymous pages
  mm/gup: trigger FAULT_FLAG_UNSHARE when R/O-pinning a possibly shared
    anonymous page
  mm/gup: sanity-check with CONFIG_DEBUG_VM that anonymous pages are
    exclusive when (un)pinning

 include/linux/mm.h         |  46 +++++++-
 include/linux/mm_types.h   |   8 ++
 include/linux/page-flags.h |  39 ++++++-
 include/linux/rmap.h       | 118 +++++++++++++++++--
 include/linux/swap.h       |  15 ++-
 include/linux/swapops.h    |  29 ++++-
 kernel/events/uprobes.c    |   2 +-
 mm/gup.c                   | 106 ++++++++++++++++-
 mm/huge_memory.c           | 133 ++++++++++++++++-----
 mm/hugetlb.c               | 135 ++++++++++++++-------
 mm/khugepaged.c            |   2 +-
 mm/ksm.c                   |  15 ++-
 mm/memory.c                | 234 +++++++++++++++++++++++--------------
 mm/memremap.c              |   9 ++
 mm/migrate.c               |  18 ++-
 mm/migrate_device.c        |  23 +++-
 mm/mprotect.c              |   8 +-
 mm/rmap.c                  | 105 ++++++++++++-----
 mm/swapfile.c              |   8 +-
 mm/userfaultfd.c           |   2 +-
 tools/vm/page-types.c      |   8 +-
 21 files changed, 836 insertions(+), 227 deletions(-)


base-commit: af2d861d4cd2a4da5137f795ee3509e6f944a25b
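
For readers jumping in at v4: one central building block is the
page_dup_rmap() split in patch #4, which lets fork() refuse to duplicate
the mapping of an anonymous page that might be pinned, forcing an
immediate copy instead. A rough sketch of the resulting API shapes
(trimmed; e.g., the device-private special casing is omitted):

	/* file pages can always be shared: just duplicate the mapping */
	static inline void page_dup_file_rmap(struct page *page, bool compound)
	{
		__page_dup_rmap(page, compound);
	}

	/* anon pages may refuse: the caller must then copy immediately */
	static inline int page_try_dup_anon_rmap(struct page *page,
			bool compound, struct vm_area_struct *vma)
	{
		VM_BUG_ON_PAGE(!PageAnon(page), page);

		if (!PageAnonExclusive(page))
			goto dup;

		/* maybe pinned: don't share, force an early copy instead */
		if (unlikely(page_needs_cow_for_dma(vma, page)))
			return -EBUSY;

		ClearPageAnonExclusive(page);
	dup:
		__page_dup_rmap(page, compound);
		return 0;
	}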

Comments

David Hildenbrand April 28, 2022, 8:37 a.m. UTC | #1
On 28.04.22 10:34, David Hildenbrand wrote:
> This is roughly what we have in -mm and -next; however, it includes one
> additional patch and some minor differences, most notably small fixes in
> the patch descriptions.
> 
> v4 is located at:
> 	https://github.com/davidhildenbrand/linux/tree/cow_fixes_part_2_v4
> 
> Please refer to the v3 cover letter:
> 	https://lkml.kernel.org/r/20220329160440.193848-1-david@redhat.com
> 
> 

Essential diff to v3:


diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 06280fc1c99b..8b6e4cd1fab8 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -299,7 +299,7 @@ static inline bool is_pfn_swap_entry(swp_entry_t entry)
 struct page_vma_mapped_walk;
 
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
-extern void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
+extern int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
 		struct page *page);
 
 extern void remove_migration_pmd(struct page_vma_mapped_walk *pvmw,
@@ -332,7 +332,7 @@ static inline int is_pmd_migration_entry(pmd_t pmd)
 	return !pmd_present(pmd) && is_migration_entry(pmd_to_swp_entry(pmd));
 }
 #else
-static inline void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
+static inline int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
 		struct page *page)
 {
 	BUILD_BUG();
diff --git a/mm/gup.c b/mm/gup.c
index 5c17d4816441..46ffd8c51c6e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -564,8 +564,8 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		goto out;
 	}
 
-	VM_BUG_ON((flags & FOLL_PIN) && PageAnon(page) &&
-		  !PageAnonExclusive(page));
+	VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
+		       !PageAnonExclusive(page), page);
 
 	/* try_grab_page() does nothing unless FOLL_GET or FOLL_PIN is set. */
 	if (unlikely(!try_grab_page(page, flags))) {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c7ac1b462543..a2f44d8d3d47 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1392,8 +1392,8 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 	if (!pmd_write(*pmd) && gup_must_unshare(flags, page))
 		return ERR_PTR(-EMLINK);
 
-	VM_BUG_ON((flags & FOLL_PIN) && PageAnon(page) &&
-		  !PageAnonExclusive(page));
+	VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
+			!PageAnonExclusive(page), page);
 
 	if (!try_grab_page(page, flags))
 		return ERR_PTR(-ENOMEM);
@@ -3080,7 +3080,7 @@ late_initcall(split_huge_pages_debugfs);
 #endif
 
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
-void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
+int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
 		struct page *page)
 {
 	struct vm_area_struct *vma = pvmw->vma;
@@ -3092,7 +3092,7 @@ void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
 	pmd_t pmdswp;
 
 	if (!(pvmw->pmd && !pvmw->pte))
-		return;
+		return 0;
 
 	flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
 	pmdval = pmdp_invalidate(vma, address, pvmw->pmd);
@@ -3100,7 +3100,7 @@ void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
 	anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
 	if (anon_exclusive && page_try_share_anon_rmap(page)) {
 		set_pmd_at(mm, address, pvmw->pmd, pmdval);
-		return;
+		return -EBUSY;
 	}
 
 	if (pmd_dirty(pmdval))
@@ -3118,6 +3118,8 @@ void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
 	page_remove_rmap(page, vma, true);
 	put_page(page);
 	trace_set_migration_pmd(address, pmd_val(pmdswp));
+
+	return 0;
 }
 
 void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ee0542f77130..534747d661dd 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6100,8 +6100,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
 		page = pte_page(huge_ptep_get(pte));
 
-		VM_BUG_ON((flags & FOLL_PIN) && PageAnon(page) &&
-			  !PageAnonExclusive(page));
+		VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
+			       !PageAnonExclusive(page), page);
 
 		/*
 		 * If subpage information not requested, update counters
diff --git a/mm/memory.c b/mm/memory.c
index 2046de391da2..1a25d28ee5d9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3132,7 +3132,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 			free_swap_cache(old_page);
 		put_page(old_page);
 	}
-	return page_copied && !unshare ? VM_FAULT_WRITE : 0;
+	return (page_copied && !unshare) ? VM_FAULT_WRITE : 0;
 oom_free_new:
 	put_page(new_page);
 oom:
@@ -4557,7 +4557,7 @@ static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf)
 	const bool unshare = vmf->flags & FAULT_FLAG_UNSHARE;
 
 	if (vma_is_anonymous(vmf->vma)) {
-		if (unlikely(unshare) &&
+		if (likely(!unshare) &&
 		    userfaultfd_huge_pmd_wp(vmf->vma, vmf->orig_pmd))
 			return handle_userfault(vmf, VM_UFFD_WP);
 		return do_huge_pmd_wp_page(vmf);
diff --git a/mm/rmap.c b/mm/rmap.c
index 00418faaf4ce..12f54fbdb920 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1217,8 +1217,6 @@ void page_add_new_anon_rmap(struct page *page,
 
 		__mod_lruvec_page_state(page, NR_ANON_THPS, nr);
 	} else {
-		/* Anon THP always mapped first with PMD */
-		VM_BUG_ON_PAGE(PageTransCompound(page), page);
 		/* increment count (starts at -1) */
 		atomic_set(&page->_mapcount, 0);
 	}
@@ -1814,7 +1812,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			VM_BUG_ON_FOLIO(folio_test_hugetlb(folio) ||
 					!folio_test_pmd_mappable(folio), folio);
 
-			set_pmd_migration_entry(&pvmw, subpage);
+			if (set_pmd_migration_entry(&pvmw, subpage)) {
+				ret = false;
+				page_vma_mapped_walk_done(&pvmw);
+				break;
+			}
 			continue;
 		}
 #endif