[RESEND,v6,02/16] mm/gup: Fix __get_user_pages() on fault retry of hugetlb

Message ID 20200220155353.8676-3-peterx@redhat.com (mailing list archive)
State New, archived
Series mm: Page fault enhancements

Commit Message

Peter Xu Feb. 20, 2020, 3:53 p.m. UTC
When follow_hugetlb_page() returns with *locked==0, it means we've got
a VM_FAULT_RETRY within the faulting process and we've released the
mmap_sem.  When that happens, we should stop and bail out.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/gup.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

David Hildenbrand March 2, 2020, 7:02 p.m. UTC | #1
On 20.02.20 16:53, Peter Xu wrote:
> When follow_hugetlb_page() returns with *locked==0, it means we've got
> a VM_FAULT_RETRY within the faulting process and we've released the
> mmap_sem.  When that happens, we should stop and bail out.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  mm/gup.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 1b4411bd0042..76cb420c0fb7 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -849,6 +849,16 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>  				i = follow_hugetlb_page(mm, vma, pages, vmas,
>  						&start, &nr_pages, i,
>  						gup_flags, locked);
> +				if (locked && *locked == 0) {
> +					/*
> +					 * We've got a VM_FAULT_RETRY
> +					 * and we've lost mmap_sem.
> +					 * We must stop here.
> +					 */
> +					BUG_ON(gup_flags & FOLL_NOWAIT);
> +					BUG_ON(ret != 0);

Can we be sure ret is really set to != 0 at this point? At least,
reading the code this is not clear to me.

Shouldn't we set "ret = i" and assert that i is an error (e.g., EBUSY?).
Or set -EBUSY explicitly?
Peter Xu March 2, 2020, 8:07 p.m. UTC | #2
On Mon, Mar 02, 2020 at 08:02:34PM +0100, David Hildenbrand wrote:
> On 20.02.20 16:53, Peter Xu wrote:
> > When follow_hugetlb_page() returns with *locked==0, it means we've got
> > a VM_FAULT_RETRY within the faulting process and we've released the
> > mmap_sem.  When that happens, we should stop and bail out.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  mm/gup.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 1b4411bd0042..76cb420c0fb7 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -849,6 +849,16 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
> >  				i = follow_hugetlb_page(mm, vma, pages, vmas,
> >  						&start, &nr_pages, i,
> >  						gup_flags, locked);
> > +				if (locked && *locked == 0) {
> > +					/*
> > +					 * We've got a VM_FAULT_RETRY
> > +					 * and we've lost mmap_sem.
> > +					 * We must stop here.
> > +					 */
> > +					BUG_ON(gup_flags & FOLL_NOWAIT);
> > +					BUG_ON(ret != 0);
> 
> Can we be sure ret is really set to != 0 at this point? At least,
> reading the code this is not clear to me.

Here I wanted to make sure ret is zero (it's BUG_ON, not assert).

"ret" is the fallback return value, used only if an error happens when
i==0.  Here we want to make sure that, even if no page is pinned, we
return zero gracefully if VM_FAULT_RETRY happens while following the
hugetlb pages.

> 
> Shouldn't we set "ret = i" and assert that i is an error (e.g., EBUSY?).
> Or set -EBUSY explicitly?

No.  Here "i" could only be either positive (when we've got some pages
pinned no matter where), or zero (when follow_hugetlb_page released
the mmap_sem on the first page that it wants to pin).  So IMO "i"
should never be negative here.

Thanks,
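
The return convention described above can be sketched in plain C. This is
only a userspace illustration, assuming the function ends with the usual
"return i ? i : ret" pattern; gup_return_value() is a hypothetical helper
name, not something in mm/gup.c.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper modelling the assumed final return of
 * __get_user_pages(): the pinned-page count "i" wins whenever it is
 * non-zero, and "ret" is only the fallback when nothing was pinned.
 */
static long gup_return_value(long i, long ret)
{
	return i ? i : ret;
}
```

With i == 0 and ret == 0 (the case the BUG_ON guards), the caller sees a
plain zero rather than an error code, which is what lets it notice the
dropped lock and retry.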
David Hildenbrand March 2, 2020, 8:22 p.m. UTC | #3
> Am 02.03.2020 um 21:07 schrieb Peter Xu <peterx@redhat.com>:
> 
> On Mon, Mar 02, 2020 at 08:02:34PM +0100, David Hildenbrand wrote:
>>> On 20.02.20 16:53, Peter Xu wrote:
>>> When follow_hugetlb_page() returns with *locked==0, it means we've got
>>> a VM_FAULT_RETRY within the faulting process and we've released the
>>> mmap_sem.  When that happens, we should stop and bail out.
>>> 
>>> Signed-off-by: Peter Xu <peterx@redhat.com>
>>> ---
>>> mm/gup.c | 10 ++++++++++
>>> 1 file changed, 10 insertions(+)
>>> 
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index 1b4411bd0042..76cb420c0fb7 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -849,6 +849,16 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>>>                i = follow_hugetlb_page(mm, vma, pages, vmas,
>>>                        &start, &nr_pages, i,
>>>                        gup_flags, locked);
>>> +                if (locked && *locked == 0) {
>>> +                    /*
>>> +                     * We've got a VM_FAULT_RETRY
>>> +                     * and we've lost mmap_sem.
>>> +                     * We must stop here.
>>> +                     */
>>> +                    BUG_ON(gup_flags & FOLL_NOWAIT);
>>> +                    BUG_ON(ret != 0);
>> 
>> Can we be sure ret is really set to != 0 at this point? At least,
>> reading the code this is not clear to me.
> 
> Here I wanted to make sure ret is zero (it's BUG_ON, not assert).

Sorry, I completely misread that BUG_ON for whatever reason, maybe I was staring for too long into my computer screen :)

> 
> "ret" is the fallback return value, used only if an error happens when
> i==0.  Here we want to make sure that, even if no page is pinned, we
> return zero gracefully if VM_FAULT_RETRY happens while following the
> hugetlb pages.

Makes sense!

> 
>> 
>> Shouldn't we set "ret = i" and assert that i is an error (e.g., EBUSY?).
>> Or set -EBUSY explicitly?
> 
> No.  Here "i" could only be either positive (when we've got some pages
> pinned no matter where), or zero (when follow_hugetlb_page released
> the mmap_sem on the first page that it wants to pin).  So IMO "i"
> should never be negative here.

I briefly scanned the function and spotted some errors being returned, that's why I was wondering.

Thanks!
Patch

diff --git a/mm/gup.c b/mm/gup.c
index 1b4411bd0042..76cb420c0fb7 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -849,6 +849,16 @@  static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 				i = follow_hugetlb_page(mm, vma, pages, vmas,
 						&start, &nr_pages, i,
 						gup_flags, locked);
+				if (locked && *locked == 0) {
+					/*
+					 * We've got a VM_FAULT_RETRY
+					 * and we've lost mmap_sem.
+					 * We must stop here.
+					 */
+					BUG_ON(gup_flags & FOLL_NOWAIT);
+					BUG_ON(ret != 0);
+					goto out;
+				}
 				continue;
 			}
 		}
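
The bail-out added by this hunk can be modelled in userspace as a rough
sketch.  Everything here is an assumption made for illustration:
stub_follow_hugetlb() and model_gup() are hypothetical names, BUG_ON is
mapped onto assert(), and the FOLL_NOWAIT value is a placeholder.  The
stub only mimics the behaviour discussed in the thread (clear *locked
and stop at the page that would need a fault retry).

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel macro: abort if cond is true. */
#define BUG_ON(cond) assert(!(cond))

/* Placeholder flag value, for illustration only. */
#define FOLL_NOWAIT 0x20

/*
 * Hypothetical stub for follow_hugetlb_page(): "pins" pages one by one.
 * If the caller passed a locked pointer and we reach index fault_at,
 * simulate VM_FAULT_RETRY by clearing *locked and stopping early.
 * Returns the updated count of pinned pages.
 */
static long stub_follow_hugetlb(long i, long nr_pages, long fault_at,
				int *locked)
{
	while (i < nr_pages) {
		if (locked && i == fault_at) {
			*locked = 0;	/* lock was dropped for the retry */
			break;
		}
		i++;
	}
	return i;
}

/* Simplified model of the loop in __get_user_pages() with the new check. */
static long model_gup(long nr_pages, long fault_at, unsigned int gup_flags,
		      int *locked)
{
	long i = 0, ret = 0;

	while (i < nr_pages) {
		i = stub_follow_hugetlb(i, nr_pages, fault_at, locked);
		if (locked && *locked == 0) {
			/* Lock is gone: stop here instead of continuing. */
			BUG_ON(gup_flags & FOLL_NOWAIT);
			BUG_ON(ret != 0);
			goto out;
		}
	}
out:
	/* Pages pinned so far, or the fallback "ret" if none were. */
	return i ? i : ret;
}
```

In this model a retry on the very first page yields 0 with *locked
cleared, and a retry on a later page yields the partial count, matching
the "positive or zero, never negative" reasoning in the thread.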