Message ID | 20230327174515.1811532-4-willy@infradead.org (mailing list archive)
---|---
State | New
Series | Prevent ->map_pages from sleeping
On Mon, Mar 27, 2023 at 06:45:15PM +0100, Matthew Wilcox (Oracle) wrote:
> Prevent filesystems from doing things which sleep in their map_pages
> method. This is in preparation for a pagefault path protected only
> by RCU.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  Documentation/filesystems/locking.rst |  4 ++--
>  mm/memory.c                           | 11 ++++++++---
>  2 files changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
> index 922886fefb7f..8a80390446ba 100644
> --- a/Documentation/filesystems/locking.rst
> +++ b/Documentation/filesystems/locking.rst
> @@ -645,7 +645,7 @@ ops		mmap_lock	PageLocked(page)
>  open:		yes
>  close:		yes
>  fault:		yes		can return with page locked
> -map_pages:	yes
> +map_pages:	read
>  page_mkwrite:	yes		can return with page locked
>  pfn_mkwrite:	yes
>  access:		yes
> @@ -661,7 +661,7 @@ locked. The VM will unlock the page.
>
>  ->map_pages() is called when VM asks to map easy accessible pages.
>  Filesystem should find and map pages associated with offsets from "start_pgoff"
> -till "end_pgoff". ->map_pages() is called with page table locked and must
> +till "end_pgoff". ->map_pages() is called with the RCU lock held and must
>  not block. If it's not possible to reach a page without blocking,
>  filesystem should skip it. Filesystem should use set_pte_range() to setup
>  page table entry. Pointer to entry associated with the page is passed in
> diff --git a/mm/memory.c b/mm/memory.c
> index 8071bb17abf2..a7edf6d714db 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4461,6 +4461,7 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
>  	/* The page offset of vmf->address within the VMA. */
>  	pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
>  	pgoff_t from_pte, to_pte;
> +	vm_fault_t ret;
>
>  	/* The PTE offset of the start address, clamped to the VMA. */
>  	from_pte = max(ALIGN_DOWN(pte_off, nr_pages),
> @@ -4476,9 +4477,13 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
>  		return VM_FAULT_OOM;
>  	}
>
> -	return vmf->vma->vm_ops->map_pages(vmf,
> -			vmf->pgoff + from_pte - pte_off,
> -			vmf->pgoff + to_pte - pte_off);
> +	rcu_read_lock();
> +	ret = vmf->vma->vm_ops->map_pages(vmf,
> +			vmf->pgoff + from_pte - pte_off,
> +			vmf->pgoff + to_pte - pte_off);
> +	rcu_read_unlock();
> +
> +	return ret;

Doesn't this mean that the rcu_read_lock/unlock can be removed from
filemap_map_pages()? i.e. all callers are now already under
rcu_read_lock(). Maybe WARN_ON_ONCE(!rcu_read_lock_held()) could
be put in filemap_map_pages() if you are worried about callers not
holding it...

Otherwise it looks fine.

-Dave.
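[Editor's note] A minimal sketch of the assertion Dave is suggesting, written against a hypothetical ->map_pages() implementation; example_map_pages() is an illustrative name, not the real filemap code:

```c
#include <linux/mm.h>
#include <linux/rcupdate.h>

/*
 * Hypothetical ->map_pages() method used only to illustrate the check
 * Dave proposes.  With the patch above, do_fault_around() wraps this
 * call in rcu_read_lock()/rcu_read_unlock(), so the warning should
 * never fire for in-tree callers.
 */
static vm_fault_t example_map_pages(struct vm_fault *vmf,
				    pgoff_t start_pgoff, pgoff_t end_pgoff)
{
	/* Catch any caller that did not enter an RCU read-side section. */
	WARN_ON_ONCE(!rcu_read_lock_held());

	/* ... walk the page cache, mapping only what is reachable without blocking ... */
	return 0;
}
```

One caveat: rcu_read_lock_held() only gives a meaningful answer when lockdep is enabled; otherwise it simply returns 1, so such a WARN is a debug-build aid rather than a runtime guarantee.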
On Tue, Mar 28, 2023 at 10:02:06AM +1100, Dave Chinner wrote:
> On Mon, Mar 27, 2023 at 06:45:15PM +0100, Matthew Wilcox (Oracle) wrote:
> > Prevent filesystems from doing things which sleep in their map_pages
> > method. This is in preparation for a pagefault path protected only
> > by RCU.
> > +	rcu_read_lock();
> > +	ret = vmf->vma->vm_ops->map_pages(vmf,
> > +			vmf->pgoff + from_pte - pte_off,
> > +			vmf->pgoff + to_pte - pte_off);
> > +	rcu_read_unlock();
> > +
> > +	return ret;
>
> Doesn't this mean that the rcu_read_lock/unlock can be removed from
> filemap_map_pages()? i.e. all callers are now already under
> rcu_read_lock(). Maybe WARN_ON_ONCE(!rcu_read_lock_held()) could
> be put in filemap_map_pages() if you are worried about callers not
> holding it...

Yes, it could now be removed. I wasn't too bothered because it's so
cheap (either a noop, or an inc/dec of a CPU-local variable). I don't
think we need the WARN because there's one embedded in the XArray code
(must be holding the spinlock or the RCU read lock to iterate the
XArray).
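[Editor's note] The embedded check Willy refers to comes from the XArray's lockdep-checked RCU dereferences during iteration. A rough, self-contained sketch of the lookup pattern; the helper name and the bare xas_load() call are illustrative, not the actual filemap_map_pages() body:

```c
#include <linux/xarray.h>
#include <linux/rcupdate.h>

/*
 * Illustrative page-cache-style lookup: walking an XArray requires
 * either the xa_lock or the RCU read lock, and the XArray internals
 * assert this on each dereference when lockdep is enabled.
 */
static void *lookup_under_rcu(struct xarray *pages, unsigned long index)
{
	XA_STATE(xas, pages, index);
	void *entry;

	rcu_read_lock();
	entry = xas_load(&xas);	/* lockdep would flag this if unlocked */
	rcu_read_unlock();

	return entry;
}
```

So a ->map_pages() implementation that iterates the page cache already gets a lockdep splat if its caller forgot the RCU read lock, which is the redundancy Willy is pointing at.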
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 922886fefb7f..8a80390446ba 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -645,7 +645,7 @@ ops		mmap_lock	PageLocked(page)
 open:		yes
 close:		yes
 fault:		yes		can return with page locked
-map_pages:	yes
+map_pages:	read
 page_mkwrite:	yes		can return with page locked
 pfn_mkwrite:	yes
 access:		yes
@@ -661,7 +661,7 @@ locked. The VM will unlock the page.

 ->map_pages() is called when VM asks to map easy accessible pages.
 Filesystem should find and map pages associated with offsets from "start_pgoff"
-till "end_pgoff". ->map_pages() is called with page table locked and must
+till "end_pgoff". ->map_pages() is called with the RCU lock held and must
 not block. If it's not possible to reach a page without blocking,
 filesystem should skip it. Filesystem should use set_pte_range() to setup
 page table entry. Pointer to entry associated with the page is passed in
diff --git a/mm/memory.c b/mm/memory.c
index 8071bb17abf2..a7edf6d714db 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4461,6 +4461,7 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
 	/* The page offset of vmf->address within the VMA. */
 	pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
 	pgoff_t from_pte, to_pte;
+	vm_fault_t ret;

 	/* The PTE offset of the start address, clamped to the VMA. */
 	from_pte = max(ALIGN_DOWN(pte_off, nr_pages),
@@ -4476,9 +4477,13 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
 		return VM_FAULT_OOM;
 	}

-	return vmf->vma->vm_ops->map_pages(vmf,
-			vmf->pgoff + from_pte - pte_off,
-			vmf->pgoff + to_pte - pte_off);
+	rcu_read_lock();
+	ret = vmf->vma->vm_ops->map_pages(vmf,
+			vmf->pgoff + from_pte - pte_off,
+			vmf->pgoff + to_pte - pte_off);
+	rcu_read_unlock();
+
+	return ret;
 }

 /* Return true if we should do read fault-around, false otherwise */
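[Editor's note] For context on what the documentation change asks of filesystems: most satisfy the "must not block" rule by pointing ->map_pages at the generic helper. A sketch of a typical set of file vm_ops, modeled on the generic file ops; individual filesystems may add their own hooks:

```c
#include <linux/mm.h>

/*
 * Typical file-backed mapping ops.  ->fault runs with mmap_lock held and
 * may sleep; ->map_pages now also runs under rcu_read_lock() and must not
 * block, which the generic filemap_map_pages() already honours.
 */
static const struct vm_operations_struct example_file_vm_ops = {
	.fault		= filemap_fault,
	.map_pages	= filemap_map_pages,
	.page_mkwrite	= filemap_page_mkwrite,
};
```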
Prevent filesystems from doing things which sleep in their map_pages
method. This is in preparation for a pagefault path protected only
by RCU.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 Documentation/filesystems/locking.rst |  4 ++--
 mm/memory.c                           | 11 ++++++++---
 2 files changed, 10 insertions(+), 5 deletions(-)