Message ID | 1412153797-6667-3-git-send-email-aarcange@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Oct 01, 2014 at 10:56:35AM +0200, Andrea Arcangeli wrote: > +static inline long __get_user_pages_locked(struct task_struct *tsk, > + struct mm_struct *mm, > + unsigned long start, > + unsigned long nr_pages, > + int write, int force, > + struct page **pages, > + struct vm_area_struct **vmas, > + int *locked, > + bool notify_drop) > +{ > + int flags = FOLL_TOUCH; > + long ret, pages_done; > + bool lock_dropped; > + > + if (locked) { > + /* if VM_FAULT_RETRY can be returned, vmas become invalid */ > + BUG_ON(vmas); > + /* check caller initialized locked */ > + BUG_ON(*locked != 1); > + } > + > + if (pages) > + flags |= FOLL_GET; > + if (write) > + flags |= FOLL_WRITE; > + if (force) > + flags |= FOLL_FORCE; > + > + pages_done = 0; > + lock_dropped = false; > + for (;;) { > + ret = __get_user_pages(tsk, mm, start, nr_pages, flags, pages, > + vmas, locked); > + if (!locked) > + /* VM_FAULT_RETRY couldn't trigger, bypass */ > + return ret; > + > + /* VM_FAULT_RETRY cannot return errors */ > + if (!*locked) { > + BUG_ON(ret < 0); > + BUG_ON(nr_pages == 1 && ret); If I understand correctly, this second BUG_ON is asserting that when __get_user_pages is asked for a single page and it is successfully gets the page, then it shouldn't have dropped the mmap_sem. If that's the case, then you could generalize this assertion to BUG_ON(nr_pages == ret); Otherwise, looks good! Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 1, 2014 at 8:51 AM, Peter Feiner <pfeiner@google.com> wrote: > On Wed, Oct 01, 2014 at 10:56:35AM +0200, Andrea Arcangeli wrote: >> +static inline long __get_user_pages_locked(struct task_struct *tsk, >> + struct mm_struct *mm, >> + unsigned long start, >> + unsigned long nr_pages, >> + int write, int force, >> + struct page **pages, >> + struct vm_area_struct **vmas, >> + int *locked, >> + bool notify_drop) >> +{ >> + int flags = FOLL_TOUCH; >> + long ret, pages_done; >> + bool lock_dropped; >> + >> + if (locked) { >> + /* if VM_FAULT_RETRY can be returned, vmas become invalid */ >> + BUG_ON(vmas); >> + /* check caller initialized locked */ >> + BUG_ON(*locked != 1); >> + } >> + >> + if (pages) >> + flags |= FOLL_GET; >> + if (write) >> + flags |= FOLL_WRITE; >> + if (force) >> + flags |= FOLL_FORCE; >> + >> + pages_done = 0; >> + lock_dropped = false; >> + for (;;) { >> + ret = __get_user_pages(tsk, mm, start, nr_pages, flags, pages, >> + vmas, locked); >> + if (!locked) >> + /* VM_FAULT_RETRY couldn't trigger, bypass */ >> + return ret; >> + >> + /* VM_FAULT_RETRY cannot return errors */ >> + if (!*locked) { >> + BUG_ON(ret < 0); >> + BUG_ON(nr_pages == 1 && ret); > > If I understand correctly, this second BUG_ON is asserting that when > __get_user_pages is asked for a single page and it is successfully gets the > page, then it shouldn't have dropped the mmap_sem. If that's the case, then > you could generalize this assertion to > > BUG_ON(nr_pages == ret); Even more strict: BUG_ON(ret >= nr_pages); Reviewed-by: Andres Lagar-Cavilla <andreslc@google.com> > > Otherwise, looks good! > > Peter
On Wed, Oct 01, 2014 at 10:06:27AM -0700, Andres Lagar-Cavilla wrote: > On Wed, Oct 1, 2014 at 8:51 AM, Peter Feiner <pfeiner@google.com> wrote: > > On Wed, Oct 01, 2014 at 10:56:35AM +0200, Andrea Arcangeli wrote: > >> + /* VM_FAULT_RETRY cannot return errors */ > >> + if (!*locked) { > >> + BUG_ON(ret < 0); > >> + BUG_ON(nr_pages == 1 && ret); > > > > If I understand correctly, this second BUG_ON is asserting that when > > __get_user_pages is asked for a single page and it is successfully gets the > > page, then it shouldn't have dropped the mmap_sem. If that's the case, then > > you could generalize this assertion to > > > > BUG_ON(nr_pages == ret); Agreed. > > Even more strict: > BUG_ON(ret >= nr_pages); Agreed too, plus this should be quicker than my weaker check. Maybe some BUG_ON can be deleted later or converted to VM_BUG_ON, but initially I feel safer with the BUG_ON considering that is a slow path. > Reviewed-by: Andres Lagar-Cavilla <andreslc@google.com> Thanks! -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 01, 2014 at 10:56:35AM +0200, Andrea Arcangeli wrote: > +static inline long __get_user_pages_locked(struct task_struct *tsk, > + struct mm_struct *mm, > + unsigned long start, > + unsigned long nr_pages, > + int write, int force, > + struct page **pages, > + struct vm_area_struct **vmas, > + int *locked, > + bool notify_drop) You might want to consider __always_inline to make sure it does indeed get inlined and constant propagation works for @locked and @notify_drop. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 01, 2014 at 10:56:35AM +0200, Andrea Arcangeli wrote: > +static inline long __get_user_pages_locked(struct task_struct *tsk, > + struct mm_struct *mm, > + unsigned long start, > + unsigned long nr_pages, > + int write, int force, > + struct page **pages, > + struct vm_area_struct **vmas, > + int *locked, > + bool notify_drop) > +{ > + if (notify_drop && lock_dropped && *locked) { > + /* > + * We must let the caller know we temporarily dropped the lock > + * and so the critical section protected by it was lost. > + */ > + up_read(&mm->mmap_sem); > + *locked = 0; > + } > + return pages_done; > +} > +long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm, > + unsigned long start, unsigned long nr_pages, > + int write, int force, struct page **pages, > + int *locked) > +{ > + return __get_user_pages_locked(tsk, mm, start, nr_pages, write, force, > + pages, NULL, locked, true); > +} > +long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm, > + unsigned long start, unsigned long nr_pages, > + int write, int force, struct page **pages) > +{ > + long ret; > + int locked = 1; > + down_read(&mm->mmap_sem); > + ret = __get_user_pages_locked(tsk, mm, start, nr_pages, write, force, > + pages, NULL, &locked, false); > + if (locked) > + up_read(&mm->mmap_sem); > + return ret; > +} > long get_user_pages(struct task_struct *tsk, struct mm_struct *mm, > unsigned long start, unsigned long nr_pages, int write, > int force, struct page **pages, struct vm_area_struct **vmas) > { > + return __get_user_pages_locked(tsk, mm, start, nr_pages, write, force, > + pages, vmas, NULL, false); > } I'm wondering about that notify_drop parameter, what's the added benefit? If you look at these 3 callers we can do away with it, since in the second called where we have locked but !notify_drop we seem to do the exact same thing afterwards anyway. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Oct 09, 2014 at 12:50:37PM +0200, Peter Zijlstra wrote: > On Wed, Oct 01, 2014 at 10:56:35AM +0200, Andrea Arcangeli wrote: > > > +static inline long __get_user_pages_locked(struct task_struct *tsk, > > + struct mm_struct *mm, > > + unsigned long start, > > + unsigned long nr_pages, > > + int write, int force, > > + struct page **pages, > > + struct vm_area_struct **vmas, > > + int *locked, > > + bool notify_drop) > > +{ > > > + if (notify_drop && lock_dropped && *locked) { > > + /* > > + * We must let the caller know we temporarily dropped the lock > > + * and so the critical section protected by it was lost. > > + */ > > + up_read(&mm->mmap_sem); > > + *locked = 0; > > + } > > + return pages_done; > > +} > > > +long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm, > > + unsigned long start, unsigned long nr_pages, > > + int write, int force, struct page **pages, > > + int *locked) > > +{ > > + return __get_user_pages_locked(tsk, mm, start, nr_pages, write, force, > > + pages, NULL, locked, true); > > +} > > > +long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm, > > + unsigned long start, unsigned long nr_pages, > > + int write, int force, struct page **pages) > > +{ > > + long ret; > > + int locked = 1; > > + down_read(&mm->mmap_sem); > > + ret = __get_user_pages_locked(tsk, mm, start, nr_pages, write, force, > > + pages, NULL, &locked, false); > > + if (locked) > > + up_read(&mm->mmap_sem); > > + return ret; > > +} > > > long get_user_pages(struct task_struct *tsk, struct mm_struct *mm, > > unsigned long start, unsigned long nr_pages, int write, > > int force, struct page **pages, struct vm_area_struct **vmas) > > { > > + return __get_user_pages_locked(tsk, mm, start, nr_pages, write, force, > > + pages, vmas, NULL, false); > > } > > I'm wondering about that notify_drop parameter, what's the added > benefit? If you look at these 3 callers we can do away with it, since in > the second called where we have locked but !notify_drop we seem to do The second (and third) caller pass notify_drop=false, so the notify_drop parameter is always a noop for them. They certainly could get away without it. > the exact same thing afterwards anyway. It makes a difference only to the first caller, if it wasn't for the first caller notify_drop could be dropped. The first caller does this: return __get_user_pages_locked(tsk, mm, start, nr_pages, write, force, pages, NULL, locked, true, FOLL_TOUCH); ^ notify_drop = true Without "notify_drop=true" the first caller could make its own respective caller think the lock has never been dropped, just because it is locked by the time get_user_pages_locked returned. But the caller must be made aware that the lock has been dropped during the call and in turn any "vma" it got before inside the mmap_sem critical section is now stale. That's all notify_drop achieves. Thanks, Andrea -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Oct 09, 2014 at 12:47:23PM +0200, Peter Zijlstra wrote: > On Wed, Oct 01, 2014 at 10:56:35AM +0200, Andrea Arcangeli wrote: > > +static inline long __get_user_pages_locked(struct task_struct *tsk, > > + struct mm_struct *mm, > > + unsigned long start, > > + unsigned long nr_pages, > > + int write, int force, > > + struct page **pages, > > + struct vm_area_struct **vmas, > > + int *locked, > > + bool notify_drop) > > You might want to consider __always_inline to make sure it does indeed > get inlined and constant propagation works for @locked and @notify_drop. Ok, that's included in the last patchset submit. Thanks, Andrea -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/mm.h b/include/linux/mm.h index 0f4196a..8900ba9 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1196,6 +1196,13 @@ long get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned long start, unsigned long nr_pages, int write, int force, struct page **pages, struct vm_area_struct **vmas); +long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm, + unsigned long start, unsigned long nr_pages, + int write, int force, struct page **pages, + int *locked); +long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm, + unsigned long start, unsigned long nr_pages, + int write, int force, struct page **pages); int get_user_pages_fast(unsigned long start, int nr_pages, int write, struct page **pages); struct kvec; diff --git a/mm/gup.c b/mm/gup.c index af7ea3e..b6d0076 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -580,6 +580,166 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm, return 0; } +static inline long __get_user_pages_locked(struct task_struct *tsk, + struct mm_struct *mm, + unsigned long start, + unsigned long nr_pages, + int write, int force, + struct page **pages, + struct vm_area_struct **vmas, + int *locked, + bool notify_drop) +{ + int flags = FOLL_TOUCH; + long ret, pages_done; + bool lock_dropped; + + if (locked) { + /* if VM_FAULT_RETRY can be returned, vmas become invalid */ + BUG_ON(vmas); + /* check caller initialized locked */ + BUG_ON(*locked != 1); + } + + if (pages) + flags |= FOLL_GET; + if (write) + flags |= FOLL_WRITE; + if (force) + flags |= FOLL_FORCE; + + pages_done = 0; + lock_dropped = false; + for (;;) { + ret = __get_user_pages(tsk, mm, start, nr_pages, flags, pages, + vmas, locked); + if (!locked) + /* VM_FAULT_RETRY couldn't trigger, bypass */ + return ret; + + /* VM_FAULT_RETRY cannot return errors */ + if (!*locked) { + BUG_ON(ret < 0); + BUG_ON(nr_pages == 1 && ret); + } + + if (!pages) + /* If it's a prefault don't insist harder */ + return ret; + + if (ret > 0) { + nr_pages -= ret; + pages_done += ret; + if (!nr_pages) + break; + } + if (*locked) { + /* VM_FAULT_RETRY didn't trigger */ + if (!pages_done) + pages_done = ret; + break; + } + /* VM_FAULT_RETRY triggered, so seek to the faulting offset */ + pages += ret; + start += ret << PAGE_SHIFT; + + /* + * Repeat on the address that fired VM_FAULT_RETRY + * without FAULT_FLAG_ALLOW_RETRY but with + * FAULT_FLAG_TRIED. + */ + *locked = 1; + lock_dropped = true; + down_read(&mm->mmap_sem); + ret = __get_user_pages(tsk, mm, start, 1, flags | FOLL_TRIED, + pages, NULL, NULL); + if (ret != 1) { + BUG_ON(ret > 1); + if (!pages_done) + pages_done = ret; + break; + } + nr_pages--; + pages_done++; + if (!nr_pages) + break; + pages++; + start += PAGE_SIZE; + } + if (notify_drop && lock_dropped && *locked) { + /* + * We must let the caller know we temporarily dropped the lock + * and so the critical section protected by it was lost. + */ + up_read(&mm->mmap_sem); + *locked = 0; + } + return pages_done; +} + +/* + * We can leverage the VM_FAULT_RETRY functionality in the page fault + * paths better by using either get_user_pages_locked() or + * get_user_pages_unlocked(). + * + * get_user_pages_locked() is suitable to replace the form: + * + * down_read(&mm->mmap_sem); + * do_something() + * get_user_pages(tsk, mm, ..., pages, NULL); + * up_read(&mm->mmap_sem); + * + * to: + * + * int locked = 1; + * down_read(&mm->mmap_sem); + * do_something() + * get_user_pages_locked(tsk, mm, ..., pages, &locked); + * if (locked) + * up_read(&mm->mmap_sem); + */ +long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm, + unsigned long start, unsigned long nr_pages, + int write, int force, struct page **pages, + int *locked) +{ + return __get_user_pages_locked(tsk, mm, start, nr_pages, write, force, + pages, NULL, locked, true); +} +EXPORT_SYMBOL(get_user_pages_locked); + +/* + * get_user_pages_unlocked() is suitable to replace the form: + * + * down_read(&mm->mmap_sem); + * get_user_pages(tsk, mm, ..., pages, NULL); + * up_read(&mm->mmap_sem); + * + * with: + * + * get_user_pages_unlocked(tsk, mm, ..., pages); + * + * It is functionally equivalent to get_user_pages_fast so + * get_user_pages_fast should be used instead, if the two parameters + * "tsk" and "mm" are respectively equal to current and current->mm, + * or if "force" shall be set to 1 (get_user_pages_fast misses the + * "force" parameter). + */ +long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm, + unsigned long start, unsigned long nr_pages, + int write, int force, struct page **pages) +{ + long ret; + int locked = 1; + down_read(&mm->mmap_sem); + ret = __get_user_pages_locked(tsk, mm, start, nr_pages, write, force, + pages, NULL, &locked, false); + if (locked) + up_read(&mm->mmap_sem); + return ret; +} +EXPORT_SYMBOL(get_user_pages_unlocked); + /* * get_user_pages() - pin user pages in memory * @tsk: the task_struct to use for page fault accounting, or @@ -629,22 +789,18 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm, * use the correct cache flushing APIs. * * See also get_user_pages_fast, for performance critical applications. + * + * get_user_pages should be phased out in favor of + * get_user_pages_locked|unlocked or get_user_pages_fast. Nothing + * should use get_user_pages because it cannot pass + * FAULT_FLAG_ALLOW_RETRY to handle_mm_fault. */ long get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned long start, unsigned long nr_pages, int write, int force, struct page **pages, struct vm_area_struct **vmas) { - int flags = FOLL_TOUCH; - - if (pages) - flags |= FOLL_GET; - if (write) - flags |= FOLL_WRITE; - if (force) - flags |= FOLL_FORCE; - - return __get_user_pages(tsk, mm, start, nr_pages, flags, pages, vmas, - NULL); + return __get_user_pages_locked(tsk, mm, start, nr_pages, write, force, + pages, vmas, NULL, false); } EXPORT_SYMBOL(get_user_pages); diff --git a/mm/nommu.c b/mm/nommu.c index a881d96..3918b0f 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -213,6 +213,29 @@ long get_user_pages(struct task_struct *tsk, struct mm_struct *mm, } EXPORT_SYMBOL(get_user_pages); +long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm, + unsigned long start, unsigned long nr_pages, + int write, int force, struct page **pages, + int *locked) +{ + return get_user_pages(tsk, mm, start, nr_pages, write, force, + pages, NULL); +} +EXPORT_SYMBOL(get_user_pages_locked); + +long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm, + unsigned long start, unsigned long nr_pages, + int write, int force, struct page **pages) +{ + long ret; + down_read(&mm->mmap_sem); + ret = get_user_pages(tsk, mm, start, nr_pages, write, force, + pages, NULL); + up_read(&mm->mmap_sem); + return ret; +} +EXPORT_SYMBOL(get_user_pages_unlocked); + /** * follow_pfn - look up PFN at a user virtual address * @vma: memory mapping
We can leverage the VM_FAULT_RETRY functionality in the page fault paths better by using either get_user_pages_locked or get_user_pages_unlocked. The former allow conversion of get_user_pages invocations that will have to pass a "&locked" parameter to know if the mmap_sem was dropped during the call. Example from: down_read(&mm->mmap_sem); do_something() get_user_pages(tsk, mm, ..., pages, NULL); up_read(&mm->mmap_sem); to: int locked = 1; down_read(&mm->mmap_sem); do_something() get_user_pages_locked(tsk, mm, ..., pages, &locked); if (locked) up_read(&mm->mmap_sem); The latter is suitable only as a drop in replacement of the form: down_read(&mm->mmap_sem); get_user_pages(tsk, mm, ..., pages, NULL); up_read(&mm->mmap_sem); into: get_user_pages_unlocked(tsk, mm, ..., pages); Where tsk, mm, the intermediate "..." paramters and "pages" can be any value as before. Just the last parameter of get_user_pages (vmas) must be NULL for get_user_pages_locked|unlocked to be usable (the latter original form wouldn't have been safe anyway if vmas wasn't null, for the former we just make it explicit by dropping the parameter). If vmas is not NULL these two methods cannot be used. This patch then applies the new forms in various places, in some case also replacing it with get_user_pages_fast whenever tsk and mm are current and current->mm. get_user_pages_unlocked varies from get_user_pages_fast only if mm is not current->mm (like when get_user_pages works on some other process mm). Whenever tsk and mm matches current and current->mm get_user_pages_fast must always be used to increase performance and get the page lockless (only with irq disabled). Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- include/linux/mm.h | 7 +++ mm/gup.c | 178 +++++++++++++++++++++++++++++++++++++++++++++++++---- mm/nommu.c | 23 +++++++ 3 files changed, 197 insertions(+), 11 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html