diff mbox series

[3/3] hugetlbfs: remove unnecessary code after i_mmap_rwsem synchronization

Message ID 20181203200850.6460-4-mike.kravetz@oracle.com (mailing list archive)
State New, archived
Headers show
Series hugetlbfs: use i_mmap_rwsem for better synchronization | expand

Commit Message

Mike Kravetz Dec. 3, 2018, 8:08 p.m. UTC
After expanding i_mmap_rwsem use for better shared pmd and page fault/
truncation synchronization, remove code that is no longer necessary.

Cc: <stable@vger.kernel.org>
Fixes: ebed4bfc8da8 ("hugetlb: fix absurd HugePages_Rsvd")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 fs/hugetlbfs/inode.c | 46 +++++++++++++++-----------------------------
 mm/hugetlb.c         | 21 ++++++++++----------
 2 files changed, 25 insertions(+), 42 deletions(-)

Comments

Sasha Levin Dec. 4, 2018, 1:54 p.m. UTC | #1
Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: ebed4bfc8da8 [PATCH] hugetlb: fix absurd HugePages_Rsvd.

The bot has tested the following trees: v4.19.6, v4.14.85, v4.9.142, v4.4.166, v3.18.128.

v4.19.6: Build OK!
v4.14.85: Failed to apply! Possible dependencies:
    285b8dcaacfc ("mm, hugetlbfs: pass fault address to no page handler")
    786e86da629d ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")

v4.9.142: Failed to apply! Possible dependencies:
    1a1aad8a9b7b ("userfaultfd: hugetlbfs: add userfaultfd hugetlb hook")
    29f3ad7d8380 ("fs: Provide function to unmap metadata for a range of blocks")
    334fd34d76f2 ("vfs: Add page_cache_seek_hole_data helper")
    369cd2121be4 ("userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges")
    7868a2087ec1 ("mm/hugetlb: add size parameter to huge_pte_offset()")
    786e86da629d ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
    7fc9e4722435 ("fs: Introduce filemap_range_has_page()")
    82b0f8c39a38 ("mm: join struct fault_env and vm_fault")
    8bea80520750 ("mm/hugetlb.c: use huge_pte_lock instead of opencoding the lock")
    953c66c2b22a ("mm: THP page cache support for ppc64")
    d72dc8a25afc ("mm: make pagevec_lookup() update index")
    fd60775aea80 ("mm, thp: avoid unlikely branches for split_huge_pmd")

v4.4.166: Failed to apply! Possible dependencies:
    0070e28d97e7 ("radix_tree: loop based on shift count, not height")
    00f47b581105 ("radix-tree: rewrite radix_tree_tag_clear")
    0e749e54244e ("dax: increase granularity of dax_clear_blocks() operations")
    1366c37ed84b ("radix tree test harness")
    29f3ad7d8380 ("fs: Provide function to unmap metadata for a range of blocks")
    334fd34d76f2 ("vfs: Add page_cache_seek_hole_data helper")
    339e6353046d ("radix_tree: tag all internal tree nodes as indirect pointers")
    4aae8d1c051e ("mm/hugetlbfs: unmap pages if page fault raced with hole punch")
    52db400fcd50 ("pmem, dax: clean up clear_pmem()")
    72e2936c04f7 ("mm: remove unnecessary condition in remove_inode_hugepages")
    7fc9e4722435 ("fs: Introduce filemap_range_has_page()")
    83929372f629 ("filemap: prepare find and delete operations for huge pages")
    ac401cc78242 ("dax: New fault locking")
    b2e0d1625e19 ("dax: fix lifetime of in-kernel dax mappings with dax_map_atomic()")
    d604c324524b ("radix-tree: introduce radix_tree_replace_clear_tags()")
    d72dc8a25afc ("mm: make pagevec_lookup() update index")
    e4b274915863 ("DAX: move RADIX_DAX_ definitions to dax.c")
    e61452365372 ("radix_tree: add support for multi-order entries")
    f9fe48bece3a ("dax: support dirty DAX entries in radix tree")

v3.18.128: Failed to apply! Possible dependencies:
    1817889e3b2c ("mm/hugetlbfs: fix bugs in fallocate hole punch of areas with holes")
    1c5ecae3a93f ("hugetlbfs: add minimum size accounting to subpools")
    1dd308a7b49d ("mm/hugetlb: document the reserve map/region tracking routines")
    5e9113731a3c ("mm/hugetlb: add cache of descriptors to resv_map for region_add")
    83cde9e8ba95 ("mm: use new helper functions around the i_mmap_mutex")
    b5cec28d36f5 ("hugetlbfs: truncate_hugepages() takes a range of pages")
    c672c7f29f2f ("mm/hugetlb: expose hugetlb fault mutex for use by fallocate")
    cf3ad20bfead ("mm/hugetlb: compute/return the number of regions added by region_add()")
    feba16e25a57 ("mm/hugetlb: add region_del() to delete a specific range of entries")


How should we proceed with this patch?

--
Thanks,
Sasha
Aneesh Kumar K.V Dec. 17, 2018, 10:34 a.m. UTC | #2
On 12/4/18 1:38 AM, Mike Kravetz wrote:
> After expanding i_mmap_rwsem use for better shared pmd and page fault/
> truncation synchronization, remove code that is no longer necessary.
> 
> Cc: <stable@vger.kernel.org>
> Fixes: ebed4bfc8da8 ("hugetlb: fix absurd HugePages_Rsvd")
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>   fs/hugetlbfs/inode.c | 46 +++++++++++++++-----------------------------
>   mm/hugetlb.c         | 21 ++++++++++----------
>   2 files changed, 25 insertions(+), 42 deletions(-)
> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 3244147fc42b..a9c00c6ef80d 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -383,17 +383,16 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
>    * truncation is indicated by end of range being LLONG_MAX
>    *	In this case, we first scan the range and release found pages.
>    *	After releasing pages, hugetlb_unreserve_pages cleans up region/reserv
> - *	maps and global counts.  Page faults can not race with truncation
> - *	in this routine.  hugetlb_no_page() prevents page faults in the
> - *	truncated range.  It checks i_size before allocation, and again after
> - *	with the page table lock for the page held.  The same lock must be
> - *	acquired to unmap a page.
> + *	maps and global counts.
>    * hole punch is indicated if end is not LLONG_MAX
>    *	In the hole punch case we scan the range and release found pages.
>    *	Only when releasing a page is the associated region/reserv map
>    *	deleted.  The region/reserv map for ranges without associated
> - *	pages are not modified.  Page faults can race with hole punch.
> - *	This is indicated if we find a mapped page.
> + *	pages are not modified.
> + *
> + * Callers of this routine must hold the i_mmap_rwsem in write mode to prevent
> + * races with page faults.

Should this patch be merged to the previous one? Because the changes to 
callers are done in the previous patch.


> + *
>    * Note: If the passed end of range value is beyond the end of file, but
>    * not LLONG_MAX this routine still performs a hole punch operation.
>    */
> @@ -423,32 +422,14 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
> 
>   		for (i = 0; i < pagevec_count(&pvec); ++i) {
> 
-aneesh
diff mbox series

Patch

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 3244147fc42b..a9c00c6ef80d 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -383,17 +383,16 @@  hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
  * truncation is indicated by end of range being LLONG_MAX
  *	In this case, we first scan the range and release found pages.
  *	After releasing pages, hugetlb_unreserve_pages cleans up region/reserv
- *	maps and global counts.  Page faults can not race with truncation
- *	in this routine.  hugetlb_no_page() prevents page faults in the
- *	truncated range.  It checks i_size before allocation, and again after
- *	with the page table lock for the page held.  The same lock must be
- *	acquired to unmap a page.
+ *	maps and global counts.
  * hole punch is indicated if end is not LLONG_MAX
  *	In the hole punch case we scan the range and release found pages.
  *	Only when releasing a page is the associated region/reserv map
  *	deleted.  The region/reserv map for ranges without associated
- *	pages are not modified.  Page faults can race with hole punch.
- *	This is indicated if we find a mapped page.
+ *	pages are not modified.
+ *
+ * Callers of this routine must hold the i_mmap_rwsem in write mode to prevent
+ * races with page faults.
+ *
  * Note: If the passed end of range value is beyond the end of file, but
  * not LLONG_MAX this routine still performs a hole punch operation.
  */
@@ -423,32 +422,14 @@  static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 
 		for (i = 0; i < pagevec_count(&pvec); ++i) {
 			struct page *page = pvec.pages[i];
-			u32 hash;
 
 			index = page->index;
-			hash = hugetlb_fault_mutex_hash(h, current->mm,
-							&pseudo_vma,
-							mapping, index, 0);
-			mutex_lock(&hugetlb_fault_mutex_table[hash]);
-
 			/*
-			 * If page is mapped, it was faulted in after being
-			 * unmapped in caller.  Unmap (again) now after taking
-			 * the fault mutex.  The mutex will prevent faults
-			 * until we finish removing the page.
-			 *
-			 * This race can only happen in the hole punch case.
-			 * Getting here in a truncate operation is a bug.
+			 * A mapped page is impossible as callers should unmap
+			 * all references before calling.  And, i_mmap_rwsem
+			 * prevents the creation of additional mappings.
 			 */
-			if (unlikely(page_mapped(page))) {
-				BUG_ON(truncate_op);
-
-				i_mmap_lock_write(mapping);
-				hugetlb_vmdelete_list(&mapping->i_mmap,
-					index * pages_per_huge_page(h),
-					(index + 1) * pages_per_huge_page(h));
-				i_mmap_unlock_write(mapping);
-			}
+			VM_BUG_ON(page_mapped(page));
 
 			lock_page(page);
 			/*
@@ -470,7 +451,6 @@  static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			}
 
 			unlock_page(page);
-			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		}
 		huge_pagevec_release(&pvec);
 		cond_resched();
@@ -624,7 +604,11 @@  static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		/* addr is the offset within the file (zero based) */
 		addr = index * hpage_size;
 
-		/* mutex taken here, fault path and hole punch */
+		/*
+		 * fault mutex taken here, protects against fault path
+		 * and hole punch.  inode_lock previously taken protects
+		 * against truncation.
+		 */
 		hash = hugetlb_fault_mutex_hash(h, mm, &pseudo_vma, mapping,
 						index, addr);
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 362601b69c56..89e1a253a40b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3760,16 +3760,16 @@  static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	}
 
 	/*
-	 * Use page lock to guard against racing truncation
-	 * before we get page_table_lock.
+	 * We can not race with truncation due to holding i_mmap_rwsem.
+	 * Check once here for faults beyond end of file.
 	 */
+	size = i_size_read(mapping->host) >> huge_page_shift(h);
+	if (idx >= size)
+		goto out;
+
 retry:
 	page = find_lock_page(mapping, idx);
 	if (!page) {
-		size = i_size_read(mapping->host) >> huge_page_shift(h);
-		if (idx >= size)
-			goto out;
-
 		/*
 		 * Check for page in userfault range
 		 */
@@ -3859,9 +3859,6 @@  static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	}
 
 	ptl = huge_pte_lock(h, mm, ptep);
-	size = i_size_read(mapping->host) >> huge_page_shift(h);
-	if (idx >= size)
-		goto backout;
 
 	ret = 0;
 	if (!huge_pte_none(huge_ptep_get(ptep)))
@@ -3964,8 +3961,10 @@  vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	/*
 	 * Acquire i_mmap_rwsem before calling huge_pte_alloc and hold
-	 * until finished with ptep.  This prevents huge_pmd_unshare from
-	 * being called elsewhere and making the ptep no longer valid.
+	 * until finished with ptep.  This serves two purposes:
+	 * 1) It prevents huge_pmd_unshare from being called elsewhere
+	 *    and making the ptep no longer valid.
+	 * 2) It synchronizes us with file truncation.
 	 *
 	 * ptep could have already be assigned via huge_pte_offset.  That
 	 * is OK, as huge_pte_alloc will return the same value unless