| Message ID | 20230609005158.2421285-3-surenb@google.com (mailing list archive) |
|---|---|
| State | New |
| Series | Per-vma lock support for swap and userfaults |
On Thu, Jun 08, 2023 at 05:51:54PM -0700, Suren Baghdasaryan wrote:
> When page fault is handled under VMA lock protection, all swap page
> faults are retried with mmap_lock because folio_lock_or_retry
> implementation has to drop and reacquire mmap_lock if folio could
> not be immediately locked.
> Instead of retrying all swapped page faults, retry only when folio
> locking fails.
> Note that the only time do_swap_page calls synchronous swap_readpage
> is when SWP_SYNCHRONOUS_IO is set, which is only set for
> QUEUE_FLAG_SYNCHRONOUS devices: brd, zram and nvdimms (both btt and
> pmem). Therefore we don't sleep in this path, and there's no need to
> drop the mmap or per-vma lock.
> Drivers implementing ops->migrate_to_ram might still rely on mmap_lock,
> therefore fall back to mmap_lock in this case.
>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
>  mm/filemap.c |  6 ++++++
>  mm/memory.c  | 14 +++++++++-----
>  2 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index b4c9bd368b7e..7cb0a3776a07 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1706,6 +1706,8 @@ static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
>   * mmap_lock has been released (mmap_read_unlock(), unless flags had both
>   * FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT set, in
>   * which case mmap_lock is still held.
> + * If flags had FAULT_FLAG_VMA_LOCK set, meaning the operation is performed
> + * with VMA lock only, the VMA lock is still held.
>   *
>   * If neither ALLOW_RETRY nor KILLABLE are set, will always return true
>   * with the folio locked and the mmap_lock unperturbed.
> @@ -1713,6 +1715,10 @@ static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
>  bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
>  			   unsigned int flags)
>  {
> +	/* Can't do this if not holding mmap_lock */
> +	if (flags & FAULT_FLAG_VMA_LOCK)
> +		return false;

If here what we need is the page lock, can we just conditionally release
either mmap lock or vma lock depending on FAULT_FLAG_VMA_LOCK?
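Peter's suggestion here is essentially to make the unlock path lock-type-aware. A minimal sketch of what that could look like, assuming a hypothetical helper name (release_fault_lock() is illustrative and not part of this series; vma_end_read() and mmap_read_unlock() are the existing kernel primitives for the two locking modes):

```c
/*
 * Hypothetical helper sketching the suggestion: before sleeping on the
 * folio lock, drop whichever lock the fault path actually holds. The
 * name release_fault_lock() is illustrative, not part of this series.
 */
static inline void release_fault_lock(struct vm_fault *vmf)
{
	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
		vma_end_read(vmf->vma);			/* per-VMA read lock */
	else
		mmap_read_unlock(vmf->vma->vm_mm);	/* classic mmap_lock */
}
```

__folio_lock_or_retry() could then call such a helper instead of mmap_read_unlock() and sleep on the folio lock in either mode, rather than returning false and forcing a retry under mmap_lock.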
On Fri, Jun 09, 2023 at 04:25:42PM -0400, Peter Xu wrote:
> >  bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
> >  			   unsigned int flags)
> >  {
> > +	/* Can't do this if not holding mmap_lock */
> > +	if (flags & FAULT_FLAG_VMA_LOCK)
> > +		return false;
>
> If here what we need is the page lock, can we just conditionally release
> either mmap lock or vma lock depending on FAULT_FLAG_VMA_LOCK?

See patch 5 ...
On Fri, Jun 09, 2023 at 09:35:49PM +0100, Matthew Wilcox wrote:
> On Fri, Jun 09, 2023 at 04:25:42PM -0400, Peter Xu wrote:
> > >  bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
> > >  			   unsigned int flags)
> > >  {
> > > +	/* Can't do this if not holding mmap_lock */
> > > +	if (flags & FAULT_FLAG_VMA_LOCK)
> > > +		return false;
> >
> > If here what we need is the page lock, can we just conditionally release
> > either mmap lock or vma lock depending on FAULT_FLAG_VMA_LOCK?
>
> See patch 5 ...

Just reaching.. :)

Why not in one shot, then?
On Fri, Jun 9, 2023 at 1:45 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Jun 09, 2023 at 09:35:49PM +0100, Matthew Wilcox wrote:
> > On Fri, Jun 09, 2023 at 04:25:42PM -0400, Peter Xu wrote:
> > > >  bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
> > > >  			   unsigned int flags)
> > > >  {
> > > > +	/* Can't do this if not holding mmap_lock */
> > > > +	if (flags & FAULT_FLAG_VMA_LOCK)
> > > > +		return false;
> > >
> > > If here what we need is the page lock, can we just conditionally release
> > > either mmap lock or vma lock depending on FAULT_FLAG_VMA_LOCK?
> >
> > See patch 5 ...
>
> Just reaching.. :)
>
> Why not in one shot, then?

I like small incremental changes, but I can squash them if that helps
in having a complete picture.

> --
> Peter Xu
On Fri, Jun 09, 2023 at 03:34:34PM -0700, Suren Baghdasaryan wrote:
> On Fri, Jun 9, 2023 at 1:45 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Fri, Jun 09, 2023 at 09:35:49PM +0100, Matthew Wilcox wrote:
> > > On Fri, Jun 09, 2023 at 04:25:42PM -0400, Peter Xu wrote:
> > > > >  bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
> > > > >  			   unsigned int flags)
> > > > >  {
> > > > > +	/* Can't do this if not holding mmap_lock */
> > > > > +	if (flags & FAULT_FLAG_VMA_LOCK)
> > > > > +		return false;
> > > >
> > > > If here what we need is the page lock, can we just conditionally release
> > > > either mmap lock or vma lock depending on FAULT_FLAG_VMA_LOCK?
> > >
> > > See patch 5 ...
> >
> > Just reaching.. :)
> >
> > Why not in one shot, then?
>
> I like small incremental changes, but I can squash them if that helps
> in having a complete picture.

Yes, that'll be appreciated. IMHO, repeatedly changing the semantics of
FAULT_FLAG_VMA_LOCK for the folio lock function within the same small
series is confusing.
On Mon, Jun 12, 2023 at 6:59 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Jun 09, 2023 at 03:34:34PM -0700, Suren Baghdasaryan wrote:
> > On Fri, Jun 9, 2023 at 1:45 PM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Fri, Jun 09, 2023 at 09:35:49PM +0100, Matthew Wilcox wrote:
> > > > On Fri, Jun 09, 2023 at 04:25:42PM -0400, Peter Xu wrote:
> > > > > >  bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
> > > > > >  			   unsigned int flags)
> > > > > >  {
> > > > > > +	/* Can't do this if not holding mmap_lock */
> > > > > > +	if (flags & FAULT_FLAG_VMA_LOCK)
> > > > > > +		return false;
> > > > >
> > > > > If here what we need is the page lock, can we just conditionally release
> > > > > either mmap lock or vma lock depending on FAULT_FLAG_VMA_LOCK?
> > > >
> > > > See patch 5 ...
> > >
> > > Just reaching.. :)
> > >
> > > Why not in one shot, then?
> >
> > I like small incremental changes, but I can squash them if that helps
> > in having a complete picture.
>
> Yes, that'll be appreciated. IMHO, repeatedly changing the semantics of
> FAULT_FLAG_VMA_LOCK for the folio lock function within the same small
> series is confusing.

Ack. Thanks for the feedback!

> --
> Peter Xu
diff --git a/mm/filemap.c b/mm/filemap.c
index b4c9bd368b7e..7cb0a3776a07 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1706,6 +1706,8 @@ static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
  * mmap_lock has been released (mmap_read_unlock(), unless flags had both
  * FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT set, in
  * which case mmap_lock is still held.
+ * If flags had FAULT_FLAG_VMA_LOCK set, meaning the operation is performed
+ * with VMA lock only, the VMA lock is still held.
  *
  * If neither ALLOW_RETRY nor KILLABLE are set, will always return true
  * with the folio locked and the mmap_lock unperturbed.
@@ -1713,6 +1715,10 @@ static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
 bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
 			   unsigned int flags)
 {
+	/* Can't do this if not holding mmap_lock */
+	if (flags & FAULT_FLAG_VMA_LOCK)
+		return false;
+
 	if (fault_flag_allow_retry_first(flags)) {
 		/*
 		 * CAUTION! In this case, mmap_lock is not released
diff --git a/mm/memory.c b/mm/memory.c
index f69fbc251198..41f45819a923 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3711,11 +3711,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (!pte_unmap_same(vmf))
 		goto out;
 
-	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
-		ret = VM_FAULT_RETRY;
-		goto out;
-	}
-
 	entry = pte_to_swp_entry(vmf->orig_pte);
 	if (unlikely(non_swap_entry(entry))) {
 		if (is_migration_entry(entry)) {
@@ -3725,6 +3720,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 			vmf->page = pfn_swap_entry_to_page(entry);
 			ret = remove_device_exclusive_entry(vmf);
 		} else if (is_device_private_entry(entry)) {
+			if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
+				/*
+				 * migrate_to_ram is not yet ready to operate
+				 * under VMA lock.
+				 */
+				ret |= VM_FAULT_RETRY;
+				goto out;
+			}
+
 			vmf->page = pfn_swap_entry_to_page(entry);
 			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 					vmf->address, &vmf->ptl);
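For context, the VM_FAULT_RETRY returned from the device-private branch above is consumed by the arch fault handler, which redoes the fault under mmap_lock. A loose sketch of that caller-side shape (modeled on the x86 handler of this era; abbreviated and illustrative, not part of this patch):

```c
/* Attempt the fault under the per-VMA lock first. */
fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
vma_end_read(vma);

if (!(fault & VM_FAULT_RETRY))
	goto done;	/* handled entirely under the VMA lock */

/* Fall back: retry the whole fault the classic way, under mmap_lock. */
mmap_read_lock(mm);
/* ... regular mmap_lock fault path ... */
```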
When page fault is handled under VMA lock protection, all swap page
faults are retried with mmap_lock because folio_lock_or_retry
implementation has to drop and reacquire mmap_lock if folio could
not be immediately locked.
Instead of retrying all swapped page faults, retry only when folio
locking fails.
Note that the only time do_swap_page calls synchronous swap_readpage
is when SWP_SYNCHRONOUS_IO is set, which is only set for
QUEUE_FLAG_SYNCHRONOUS devices: brd, zram and nvdimms (both btt and
pmem). Therefore we don't sleep in this path, and there's no need to
drop the mmap or per-vma lock.
Drivers implementing ops->migrate_to_ram might still rely on mmap_lock,
therefore fall back to mmap_lock in this case.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 mm/filemap.c |  6 ++++++
 mm/memory.c  | 14 +++++++++-----
 2 files changed, 15 insertions(+), 5 deletions(-)
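The SWP_SYNCHRONOUS_IO reasoning in the message refers to the branch of do_swap_page() that bypasses the swap cache and reads the page synchronously. A condensed, heavily elided sketch of that branch as it looked in this era, for orientation only:

```c
/*
 * SWP_SYNCHRONOUS_IO is only set for QUEUE_FLAG_SYNCHRONOUS devices
 * (brd, zram, nvdimm btt/pmem), so the read completes without sleeping
 * on I/O and no lock needs to be dropped around it.
 */
if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
    __swap_count(entry) == 1) {
	/* ... allocate a folio and set its swap entry ... */
	swap_readpage(page, true, NULL);	/* synchronous read */
}
```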