
[v2,3/3] mm: Hold the RCU read lock over calls to ->map_pages

Message ID 20230327174515.1811532-4-willy@infradead.org (mailing list archive)
State New
Series Prevent ->map_pages from sleeping

Commit Message

Matthew Wilcox March 27, 2023, 5:45 p.m. UTC
Prevent filesystems from doing things which sleep in their map_pages
method.  This is in preparation for a pagefault path protected only
by RCU.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 Documentation/filesystems/locking.rst |  4 ++--
 mm/memory.c                           | 11 ++++++++---
 2 files changed, 10 insertions(+), 5 deletions(-)
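
As context for the rule this patch enforces: once do_fault_around()
calls ->map_pages under rcu_read_lock(), an implementation may only
take trylocks and do RCU-safe lookups; anything that can sleep must
skip the page instead.  A minimal sketch of a conforming method
(hypothetical "myfs"; in practice filesystems simply point
->map_pages at the generic helper):

	/* Hypothetical ->map_pages honouring the no-sleep rule: every
	 * step must either succeed without blocking or skip the page.
	 * filemap_map_pages() only takes trylocks and does RCU-safe
	 * XArray walks, so it is safe to call under rcu_read_lock(). */
	static vm_fault_t myfs_map_pages(struct vm_fault *vmf,
			pgoff_t start_pgoff, pgoff_t end_pgoff)
	{
		return filemap_map_pages(vmf, start_pgoff, end_pgoff);
	}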

Comments

Dave Chinner March 27, 2023, 11:02 p.m. UTC | #1
On Mon, Mar 27, 2023 at 06:45:15PM +0100, Matthew Wilcox (Oracle) wrote:
> Prevent filesystems from doing things which sleep in their map_pages
> method.  This is in preparation for a pagefault path protected only
> by RCU.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  Documentation/filesystems/locking.rst |  4 ++--
>  mm/memory.c                           | 11 ++++++++---
>  2 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
> index 922886fefb7f..8a80390446ba 100644
> --- a/Documentation/filesystems/locking.rst
> +++ b/Documentation/filesystems/locking.rst
> @@ -645,7 +645,7 @@ ops		mmap_lock	PageLocked(page)
>  open:		yes
>  close:		yes
>  fault:		yes		can return with page locked
> -map_pages:	yes
> +map_pages:	read
>  page_mkwrite:	yes		can return with page locked
>  pfn_mkwrite:	yes
>  access:		yes
> @@ -661,7 +661,7 @@ locked. The VM will unlock the page.
>  
>  ->map_pages() is called when VM asks to map easy accessible pages.
>  Filesystem should find and map pages associated with offsets from "start_pgoff"
> -till "end_pgoff". ->map_pages() is called with page table locked and must
> +till "end_pgoff". ->map_pages() is called with the RCU lock held and must
>  not block.  If it's not possible to reach a page without blocking,
>  filesystem should skip it. Filesystem should use set_pte_range() to setup
>  page table entry. Pointer to entry associated with the page is passed in
> diff --git a/mm/memory.c b/mm/memory.c
> index 8071bb17abf2..a7edf6d714db 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4461,6 +4461,7 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
>  	/* The page offset of vmf->address within the VMA. */
>  	pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
>  	pgoff_t from_pte, to_pte;
> +	vm_fault_t ret;
>  
>  	/* The PTE offset of the start address, clamped to the VMA. */
>  	from_pte = max(ALIGN_DOWN(pte_off, nr_pages),
> @@ -4476,9 +4477,13 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
>  			return VM_FAULT_OOM;
>  	}
>  
> -	return vmf->vma->vm_ops->map_pages(vmf,
> -		vmf->pgoff + from_pte - pte_off,
> -		vmf->pgoff + to_pte - pte_off);
> +	rcu_read_lock();
> +	ret = vmf->vma->vm_ops->map_pages(vmf,
> +			vmf->pgoff + from_pte - pte_off,
> +			vmf->pgoff + to_pte - pte_off);
> +	rcu_read_unlock();
> +
> +	return ret;

Doesn't this mean that the rcu_read_lock/unlock can be removed from
filemap_map_pages()? i.e. all callers are now already under
rcu_read_lock(). Maybe WARN_ON_ONCE(!rcu_read_lock_held()) could
be put in filemap_map_pages() if you are worried about callers not
holding it...

Otherwise it looks fine.

-Dave.
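
(For reference, a sketch of the assertion Dave suggests -- not part of
the posted patch.  Note that rcu_read_lock_held() only gives a
meaningful answer under CONFIG_PROVE_RCU; without lockdep it simply
returns 1 and the WARN can never fire:)

	vm_fault_t filemap_map_pages(struct vm_fault *vmf,
			pgoff_t start_pgoff, pgoff_t end_pgoff)
	{
		/* All callers of ->map_pages now hold the RCU read
		 * lock, so assert it here instead of taking it. */
		WARN_ON_ONCE(!rcu_read_lock_held());
		/* ... existing page-cache walk, minus the
		 * rcu_read_lock()/rcu_read_unlock() pair ... */
		return 0;
	}
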
Matthew Wilcox March 28, 2023, 3:45 p.m. UTC | #2
On Tue, Mar 28, 2023 at 10:02:06AM +1100, Dave Chinner wrote:
> On Mon, Mar 27, 2023 at 06:45:15PM +0100, Matthew Wilcox (Oracle) wrote:
> > Prevent filesystems from doing things which sleep in their map_pages
> > method.  This is in preparation for a pagefault path protected only
> > by RCU.
> > +	rcu_read_lock();
> > +	ret = vmf->vma->vm_ops->map_pages(vmf,
> > +			vmf->pgoff + from_pte - pte_off,
> > +			vmf->pgoff + to_pte - pte_off);
> > +	rcu_read_unlock();
> > +
> > +	return ret;
> 
> Doesn't this mean that the rcu_read_lock/unlock can be removed from
> filemap_map_pages()? i.e. all callers are now already under
> rcu_read_lock(). Maybe WARN_ON_ONCE(!rcu_read_lock_held()) could
> be put in filemap_map_pages() if you are worried about callers not
> holding it...

Yes, it could now be removed.  I wasn't too bothered because it's so
cheap (either a noop, or an inc/dec of a CPU-local variable).  I don't
think we need the WARN because there's one embedded in the XArray
code (must be holding the spinlock or the RCU read lock to iterate
the XArray).
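
(The embedded check Matthew refers to sits in the XArray's slot
dereference: every lookup goes through rcu_dereference_check(), which
splats under CONFIG_PROVE_RCU unless the RCU read lock or the xa_lock
is held.  Lightly trimmed from the XArray headers:)

	static inline void *xa_entry(const struct xarray *xa,
			const struct xa_node *node, unsigned int offset)
	{
		/* Complains under lockdep if neither the RCU read
		 * lock nor the array's spinlock is held. */
		return rcu_dereference_check(node->slots[offset],
				lockdep_is_held(&xa->xa_lock));
	}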

Patch

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 922886fefb7f..8a80390446ba 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -645,7 +645,7 @@ ops		mmap_lock	PageLocked(page)
 open:		yes
 close:		yes
 fault:		yes		can return with page locked
-map_pages:	yes
+map_pages:	read
 page_mkwrite:	yes		can return with page locked
 pfn_mkwrite:	yes
 access:		yes
@@ -661,7 +661,7 @@ locked. The VM will unlock the page.
 
 ->map_pages() is called when VM asks to map easy accessible pages.
 Filesystem should find and map pages associated with offsets from "start_pgoff"
-till "end_pgoff". ->map_pages() is called with page table locked and must
+till "end_pgoff". ->map_pages() is called with the RCU lock held and must
 not block.  If it's not possible to reach a page without blocking,
 filesystem should skip it. Filesystem should use set_pte_range() to setup
 page table entry. Pointer to entry associated with the page is passed in
diff --git a/mm/memory.c b/mm/memory.c
index 8071bb17abf2..a7edf6d714db 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4461,6 +4461,7 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
 	/* The page offset of vmf->address within the VMA. */
 	pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
 	pgoff_t from_pte, to_pte;
+	vm_fault_t ret;
 
 	/* The PTE offset of the start address, clamped to the VMA. */
 	from_pte = max(ALIGN_DOWN(pte_off, nr_pages),
@@ -4476,9 +4477,13 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
 			return VM_FAULT_OOM;
 	}
 
-	return vmf->vma->vm_ops->map_pages(vmf,
-		vmf->pgoff + from_pte - pte_off,
-		vmf->pgoff + to_pte - pte_off);
+	rcu_read_lock();
+	ret = vmf->vma->vm_ops->map_pages(vmf,
+			vmf->pgoff + from_pte - pte_off,
+			vmf->pgoff + to_pte - pte_off);
+	rcu_read_unlock();
+
+	return ret;
 }
 
 /* Return true if we should do read fault-around, false otherwise */