Message ID | 20230627042321.1763765-8-surenb@google.com (mailing list archive)
---|---
State | New |
Series | Per-VMA lock support for swap and userfaults
Suren Baghdasaryan <surenb@google.com> writes:

> migration_entry_wait does not need VMA lock, therefore it can be
> dropped before waiting.
>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
>  mm/memory.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 5caaa4c66ea2..bdf46fdc58d6 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3715,8 +3715,18 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  	entry = pte_to_swp_entry(vmf->orig_pte);
>  	if (unlikely(non_swap_entry(entry))) {
>  		if (is_migration_entry(entry)) {
> -			migration_entry_wait(vma->vm_mm, vmf->pmd,
> -					     vmf->address);
> +			/* Save mm in case VMA lock is dropped */
> +			struct mm_struct *mm = vma->vm_mm;
> +
> +			if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> +				/*
> +				 * No need to hold VMA lock for migration.
> +				 * WARNING: vma can't be used after this!
> +				 */
> +				vma_end_read(vma);
> +				ret |= VM_FAULT_COMPLETED;

Doesn't this need to also set FAULT_FLAG_LOCK_DROPPED to ensure we don't
call vma_end_read() again in __handle_mm_fault()?

> +			}
> +			migration_entry_wait(mm, vmf->pmd, vmf->address);
>  		} else if (is_device_exclusive_entry(entry)) {
>  			vmf->page = pfn_swap_entry_to_page(entry);
>  			ret = remove_device_exclusive_entry(vmf);
On Tue, Jun 27, 2023 at 1:06 AM Alistair Popple <apopple@nvidia.com> wrote:
>
> Suren Baghdasaryan <surenb@google.com> writes:
>
> > migration_entry_wait does not need VMA lock, therefore it can be
> > dropped before waiting.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > ---
> >  mm/memory.c | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 5caaa4c66ea2..bdf46fdc58d6 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3715,8 +3715,18 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >  	entry = pte_to_swp_entry(vmf->orig_pte);
> >  	if (unlikely(non_swap_entry(entry))) {
> >  		if (is_migration_entry(entry)) {
> > -			migration_entry_wait(vma->vm_mm, vmf->pmd,
> > -					     vmf->address);
> > +			/* Save mm in case VMA lock is dropped */
> > +			struct mm_struct *mm = vma->vm_mm;
> > +
> > +			if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> > +				/*
> > +				 * No need to hold VMA lock for migration.
> > +				 * WARNING: vma can't be used after this!
> > +				 */
> > +				vma_end_read(vma);
> > +				ret |= VM_FAULT_COMPLETED;
>
> Doesn't this need to also set FAULT_FLAG_LOCK_DROPPED to ensure we don't
> call vma_end_read() again in __handle_mm_fault()?

Uh, right. Got lost during the last refactoring. Thanks for flagging!

> > +			}
> > +			migration_entry_wait(mm, vmf->pmd, vmf->address);
> >  		} else if (is_device_exclusive_entry(entry)) {
> >  			vmf->page = pfn_swap_entry_to_page(entry);
> >  			ret = remove_device_exclusive_entry(vmf);
>
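[Editor's note: the fix acknowledged above amounts to marking the fault as having dropped its lock before waiting. A minimal sketch, assuming the FAULT_FLAG_LOCK_DROPPED flag referenced in Alistair's question (introduced elsewhere in this series; the exact name may differ in the final code):]

	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
		/*
		 * No need to hold VMA lock for migration.
		 * WARNING: vma can't be used after this!
		 */
		vma_end_read(vma);
		/*
		 * Record that the lock is already gone so that
		 * __handle_mm_fault() does not call vma_end_read()
		 * a second time on the way out.
		 */
		vmf->flags |= FAULT_FLAG_LOCK_DROPPED;
		ret |= VM_FAULT_COMPLETED;
	}
	migration_entry_wait(mm, vmf->pmd, vmf->address);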
On Mon, Jun 26, 2023 at 09:23:20PM -0700, Suren Baghdasaryan wrote:
> migration_entry_wait does not need VMA lock, therefore it can be
> dropped before waiting.

Hmm, I'm not sure..

Note that we're still dereferencing *vmf->pmd when waiting, while *pmd is
on the page table and IIUC only be guaranteed if the vma is still there.
If without both mmap / vma lock I don't see what makes sure the pgtable is
always there.  E.g. IIUC a race can happen where unmap() runs right after
vma_end_read() below but before pmdp_get_lockless() (inside
migration_entry_wait()), then pmdp_get_lockless() can read some random
things if the pgtable is freed.

>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
>  mm/memory.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 5caaa4c66ea2..bdf46fdc58d6 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3715,8 +3715,18 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  	entry = pte_to_swp_entry(vmf->orig_pte);
>  	if (unlikely(non_swap_entry(entry))) {
>  		if (is_migration_entry(entry)) {
> -			migration_entry_wait(vma->vm_mm, vmf->pmd,
> -					     vmf->address);
> +			/* Save mm in case VMA lock is dropped */
> +			struct mm_struct *mm = vma->vm_mm;
> +
> +			if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> +				/*
> +				 * No need to hold VMA lock for migration.
> +				 * WARNING: vma can't be used after this!
> +				 */
> +				vma_end_read(vma);
> +				ret |= VM_FAULT_COMPLETED;
> +			}
> +			migration_entry_wait(mm, vmf->pmd, vmf->address);
>  		} else if (is_device_exclusive_entry(entry)) {
>  			vmf->page = pfn_swap_entry_to_page(entry);
>  			ret = remove_device_exclusive_entry(vmf);
> --
> 2.41.0.178.g377b9f9a00-goog
>
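[Editor's note: to make the race Peter describes concrete, an illustrative interleaving (not part of the original mail) would be:]

	CPU 0 (fault path, VMA read lock only)     CPU 1
	--------------------------------------     -----
	do_swap_page()
	  vma_end_read(vma)   /* no locks held */
	                                           munmap() unlinks the VMA,
	                                           free_pgtables() frees the
	                                           page-table pages
	  migration_entry_wait(mm, vmf->pmd, addr)
	    pmdp_get_lockless(vmf->pmd)
	                      /* dereferences a freed page table */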
On Tue, Jun 27, 2023 at 8:49 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, Jun 26, 2023 at 09:23:20PM -0700, Suren Baghdasaryan wrote:
> > migration_entry_wait does not need VMA lock, therefore it can be
> > dropped before waiting.
>
> Hmm, I'm not sure..
>
> Note that we're still dereferencing *vmf->pmd when waiting, while *pmd is
> on the page table and IIUC only be guaranteed if the vma is still there.
> If without both mmap / vma lock I don't see what makes sure the pgtable is
> always there. E.g. IIUC a race can happen where unmap() runs right after
> vma_end_read() below but before pmdp_get_lockless() (inside
> migration_entry_wait()), then pmdp_get_lockless() can read some random
> things if the pgtable is freed.

That sounds correct. I thought ptl would keep pmd stable but there is
time between vma_end_read() and spin_lock(ptl) when it can be freed
from under us. I think it would work if we do vma_end_read() after
spin_lock(ptl) but that requires code refactoring. I'll probably drop
this optimization from the patchset for now to keep things simple and
will get back to it later.

> >
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > ---
> >  mm/memory.c | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 5caaa4c66ea2..bdf46fdc58d6 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3715,8 +3715,18 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >  	entry = pte_to_swp_entry(vmf->orig_pte);
> >  	if (unlikely(non_swap_entry(entry))) {
> >  		if (is_migration_entry(entry)) {
> > -			migration_entry_wait(vma->vm_mm, vmf->pmd,
> > -					     vmf->address);
> > +			/* Save mm in case VMA lock is dropped */
> > +			struct mm_struct *mm = vma->vm_mm;
> > +
> > +			if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> > +				/*
> > +				 * No need to hold VMA lock for migration.
> > +				 * WARNING: vma can't be used after this!
> > +				 */
> > +				vma_end_read(vma);
> > +				ret |= VM_FAULT_COMPLETED;
> > +			}
> > +			migration_entry_wait(mm, vmf->pmd, vmf->address);
> >  		} else if (is_device_exclusive_entry(entry)) {
> >  			vmf->page = pfn_swap_entry_to_page(entry);
> >  			ret = remove_device_exclusive_entry(vmf);
> > --
> > 2.41.0.178.g377b9f9a00-goog
> >
>
> --
> Peter Xu
>
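[Editor's note: the refactoring Suren alludes to would take the PTE lock while the VMA lock still pins the page tables, and only then drop the VMA lock before sleeping. A hypothetical sketch follows; the helper name and exact structure are assumptions, not code from this series, and it leans on the thread's own claim that holding ptl is enough to keep the page table alive:]

static void migration_entry_wait_vma_locked(struct vm_fault *vmf)
{
	struct mm_struct *mm = vmf->vma->vm_mm;
	spinlock_t *ptl = pte_lockptr(mm, vmf->pmd);
	/* Still holding the VMA read lock, so the pgtable can't go away yet */
	pte_t *ptep = pte_offset_map(vmf->pmd, vmf->address);
	pte_t pte;

	spin_lock(ptl);
	/*
	 * A racing zap/unmap must serialize on ptl before this page
	 * table can be torn down, so the VMA lock can be dropped now.
	 */
	vma_end_read(vmf->vma);

	pte = *ptep;
	if (!is_swap_pte(pte) ||
	    !is_migration_entry(pte_to_swp_entry(pte))) {
		/* Migration already completed or the entry changed */
		pte_unmap_unlock(ptep, ptl);
		return;
	}
	/* Drops ptl and sleeps until the migration entry is removed */
	migration_entry_wait_on_locked(pte_to_swp_entry(pte), ptep, ptl);
}

[Whether ptl alone guards against every page-table freeing path is exactly the kind of subtlety that made dropping the optimization the simpler choice here.]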
Suren Baghdasaryan <surenb@google.com> writes:

> On Tue, Jun 27, 2023 at 8:49 AM Peter Xu <peterx@redhat.com> wrote:
>>
>> On Mon, Jun 26, 2023 at 09:23:20PM -0700, Suren Baghdasaryan wrote:
>> > migration_entry_wait does not need VMA lock, therefore it can be
>> > dropped before waiting.
>>
>> Hmm, I'm not sure..
>>
>> Note that we're still dereferencing *vmf->pmd when waiting, while *pmd is
>> on the page table and IIUC only be guaranteed if the vma is still there.
>> If without both mmap / vma lock I don't see what makes sure the pgtable is
>> always there. E.g. IIUC a race can happen where unmap() runs right after
>> vma_end_read() below but before pmdp_get_lockless() (inside
>> migration_entry_wait()), then pmdp_get_lockless() can read some random
>> things if the pgtable is freed.
>
> That sounds correct. I thought ptl would keep pmd stable but there is
> time between vma_end_read() and spin_lock(ptl) when it can be freed
> from under us. I think it would work if we do vma_end_read() after
> spin_lock(ptl) but that requires code refactoring. I'll probably drop
> this optimization from the patchset for now to keep things simple and
> will get back to it later.

Oh thanks Peter that's a good point. It could be made to work, but agree
it's probably not worth the code refactoring at this point so I'm ok if
the optimisation is dropped for now.

>>
>> >
>> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>> > ---
>> >  mm/memory.c | 14 ++++++++++++--
>> >  1 file changed, 12 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/mm/memory.c b/mm/memory.c
>> > index 5caaa4c66ea2..bdf46fdc58d6 100644
>> > --- a/mm/memory.c
>> > +++ b/mm/memory.c
>> > @@ -3715,8 +3715,18 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>> >  	entry = pte_to_swp_entry(vmf->orig_pte);
>> >  	if (unlikely(non_swap_entry(entry))) {
>> >  		if (is_migration_entry(entry)) {
>> > -			migration_entry_wait(vma->vm_mm, vmf->pmd,
>> > -					     vmf->address);
>> > +			/* Save mm in case VMA lock is dropped */
>> > +			struct mm_struct *mm = vma->vm_mm;
>> > +
>> > +			if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
>> > +				/*
>> > +				 * No need to hold VMA lock for migration.
>> > +				 * WARNING: vma can't be used after this!
>> > +				 */
>> > +				vma_end_read(vma);
>> > +				ret |= VM_FAULT_COMPLETED;
>> > +			}
>> > +			migration_entry_wait(mm, vmf->pmd, vmf->address);
>> >  		} else if (is_device_exclusive_entry(entry)) {
>> >  			vmf->page = pfn_swap_entry_to_page(entry);
>> >  			ret = remove_device_exclusive_entry(vmf);
>> > --
>> > 2.41.0.178.g377b9f9a00-goog
>> >
>>
>> --
>> Peter Xu
>>
diff --git a/mm/memory.c b/mm/memory.c
index 5caaa4c66ea2..bdf46fdc58d6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3715,8 +3715,18 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	entry = pte_to_swp_entry(vmf->orig_pte);
 	if (unlikely(non_swap_entry(entry))) {
 		if (is_migration_entry(entry)) {
-			migration_entry_wait(vma->vm_mm, vmf->pmd,
-					     vmf->address);
+			/* Save mm in case VMA lock is dropped */
+			struct mm_struct *mm = vma->vm_mm;
+
+			if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
+				/*
+				 * No need to hold VMA lock for migration.
+				 * WARNING: vma can't be used after this!
+				 */
+				vma_end_read(vma);
+				ret |= VM_FAULT_COMPLETED;
+			}
+			migration_entry_wait(mm, vmf->pmd, vmf->address);
 		} else if (is_device_exclusive_entry(entry)) {
 			vmf->page = pfn_swap_entry_to_page(entry);
 			ret = remove_device_exclusive_entry(vmf);
migration_entry_wait does not need VMA lock, therefore it can be
dropped before waiting.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 mm/memory.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)