diff mbox series

[1/3] mm/mlock.c: convert put_page() to put_user_page*()

Message ID 20190805222019.28592-2-jhubbard@nvidia.com (mailing list archive)
State New, archived
Headers show
Series mm/: 3 more put_user_page() conversions | expand

Commit Message

john.hubbard@gmail.com Aug. 5, 2019, 10:20 p.m. UTC
From: John Hubbard <jhubbard@nvidia.com>

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Daniel Black <daniel@linux.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 mm/mlock.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Michal Hocko Aug. 7, 2019, 11:01 a.m. UTC | #1
On Mon 05-08-19 15:20:17, john.hubbard@gmail.com wrote:
> From: John Hubbard <jhubbard@nvidia.com>
> 
> For pages that were retained via get_user_pages*(), release those pages
> via the new put_user_page*() routines, instead of via put_page() or
> release_pages().

Hmm, this is an interesting code path. There seems to be a mix of pages
in the game. We get one page via follow_page_mask but then other pages
in the range are filled by __munlock_pagevec_fill and that does a direct
pte walk. Is using put_user_page correct in this case? Could you explain
why in the changelog?

> This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
> ("mm: introduce put_user_page*(), placeholder versions").
> 
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Daniel Black <daniel@linux.ibm.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  mm/mlock.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/mlock.c b/mm/mlock.c
> index a90099da4fb4..b980e6270e8a 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -345,7 +345,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
>  				get_page(page); /* for putback_lru_page() */
>  				__munlock_isolated_page(page);
>  				unlock_page(page);
> -				put_page(page); /* from follow_page_mask() */
> +				put_user_page(page); /* from follow_page_mask() */
>  			}
>  		}
>  	}
> @@ -467,7 +467,7 @@ void munlock_vma_pages_range(struct vm_area_struct *vma,
>  		if (page && !IS_ERR(page)) {
>  			if (PageTransTail(page)) {
>  				VM_BUG_ON_PAGE(PageMlocked(page), page);
> -				put_page(page); /* follow_page_mask() */
> +				put_user_page(page); /* follow_page_mask() */
>  			} else if (PageTransHuge(page)) {
>  				lock_page(page);
>  				/*
> @@ -478,7 +478,7 @@ void munlock_vma_pages_range(struct vm_area_struct *vma,
>  				 */
>  				page_mask = munlock_vma_page(page);
>  				unlock_page(page);
> -				put_page(page); /* follow_page_mask() */
> +				put_user_page(page); /* follow_page_mask() */
>  			} else {
>  				/*
>  				 * Non-huge pages are handled in batches via
> -- 
> 2.22.0
John Hubbard Aug. 7, 2019, 11:32 p.m. UTC | #2
On 8/7/19 4:01 AM, Michal Hocko wrote:
> On Mon 05-08-19 15:20:17, john.hubbard@gmail.com wrote:
>> From: John Hubbard <jhubbard@nvidia.com>
>>
>> For pages that were retained via get_user_pages*(), release those pages
>> via the new put_user_page*() routines, instead of via put_page() or
>> release_pages().
> 
> Hmm, this is an interesting code path. There seems to be a mix of pages
> in the game. We get one page via follow_page_mask but then other pages
> in the range are filled by __munlock_pagevec_fill and that does a direct
> pte walk. Is using put_user_page correct in this case? Could you explain
> why in the changelog?
> 

Actually, I think follow_page_mask() gets all the pages, right? And the
get_page() in __munlock_pagevec_fill() is there to allow a pagevec_release() 
later.

But I still think I mighthave missed an error case, because the pvec_putback
in __munlock_pagevec() is never doing put_user_page() on the put-backed pages.

Let me sort through this one more time and maybe I'll need to actually
change the code. And either way, comments and changelog will need some notes, 
agreed.

thanks,
Michal Hocko Aug. 8, 2019, 6:21 a.m. UTC | #3
On Wed 07-08-19 16:32:08, John Hubbard wrote:
> On 8/7/19 4:01 AM, Michal Hocko wrote:
> > On Mon 05-08-19 15:20:17, john.hubbard@gmail.com wrote:
> >> From: John Hubbard <jhubbard@nvidia.com>
> >>
> >> For pages that were retained via get_user_pages*(), release those pages
> >> via the new put_user_page*() routines, instead of via put_page() or
> >> release_pages().
> > 
> > Hmm, this is an interesting code path. There seems to be a mix of pages
> > in the game. We get one page via follow_page_mask but then other pages
> > in the range are filled by __munlock_pagevec_fill and that does a direct
> > pte walk. Is using put_user_page correct in this case? Could you explain
> > why in the changelog?
> > 
> 
> Actually, I think follow_page_mask() gets all the pages, right? And the
> get_page() in __munlock_pagevec_fill() is there to allow a pagevec_release() 
> later.

Maybe I am misreading the code (looking at Linus tree) but munlock_vma_pages_range
calls follow_page for the start address and then if not THP tries to
fill up the pagevec with few more pages (up to end), do the shortcut
via manual pte walk as an optimization and use generic get_page there.
Vlastimil Babka Aug. 8, 2019, 11:09 a.m. UTC | #4
On 8/8/19 8:21 AM, Michal Hocko wrote:
> On Wed 07-08-19 16:32:08, John Hubbard wrote:
>> On 8/7/19 4:01 AM, Michal Hocko wrote:
>>> On Mon 05-08-19 15:20:17, john.hubbard@gmail.com wrote:
>>>> From: John Hubbard <jhubbard@nvidia.com>
>>>>
>>>> For pages that were retained via get_user_pages*(), release those pages
>>>> via the new put_user_page*() routines, instead of via put_page() or
>>>> release_pages().
>>>
>>> Hmm, this is an interesting code path. There seems to be a mix of pages
>>> in the game. We get one page via follow_page_mask but then other pages
>>> in the range are filled by __munlock_pagevec_fill and that does a direct
>>> pte walk. Is using put_user_page correct in this case? Could you explain
>>> why in the changelog?
>>>
>>
>> Actually, I think follow_page_mask() gets all the pages, right? And the
>> get_page() in __munlock_pagevec_fill() is there to allow a pagevec_release() 
>> later.
> 
> Maybe I am misreading the code (looking at Linus tree) but munlock_vma_pages_range
> calls follow_page for the start address and then if not THP tries to
> fill up the pagevec with few more pages (up to end), do the shortcut
> via manual pte walk as an optimization and use generic get_page there.

That's true. However, I'm not sure munlocking is where the
put_user_page() machinery is intended to be used anyway? These are
short-term pins for struct page manipulation, not e.g. dirtying of page
contents. Reading commit fc1d8e7cca2d I don't think this case falls
within the reasoning there. Perhaps not all GUP users should be
converted to the planned separate GUP tracking, and instead we should
have a GUP/follow_page_mask() variant that keeps using get_page/put_page?
John Hubbard Aug. 8, 2019, 7:20 p.m. UTC | #5
On 8/8/19 4:09 AM, Vlastimil Babka wrote:
> On 8/8/19 8:21 AM, Michal Hocko wrote:
>> On Wed 07-08-19 16:32:08, John Hubbard wrote:
>>> On 8/7/19 4:01 AM, Michal Hocko wrote:
>>>> On Mon 05-08-19 15:20:17, john.hubbard@gmail.com wrote:
>>>>> From: John Hubbard <jhubbard@nvidia.com>
>>> Actually, I think follow_page_mask() gets all the pages, right? And the
>>> get_page() in __munlock_pagevec_fill() is there to allow a pagevec_release() 
>>> later.
>>
>> Maybe I am misreading the code (looking at Linus tree) but munlock_vma_pages_range
>> calls follow_page for the start address and then if not THP tries to
>> fill up the pagevec with few more pages (up to end), do the shortcut
>> via manual pte walk as an optimization and use generic get_page there.
> 

Yes, I see it finally, thanks. :)  

> That's true. However, I'm not sure munlocking is where the
> put_user_page() machinery is intended to be used anyway? These are
> short-term pins for struct page manipulation, not e.g. dirtying of page
> contents. Reading commit fc1d8e7cca2d I don't think this case falls
> within the reasoning there. Perhaps not all GUP users should be
> converted to the planned separate GUP tracking, and instead we should
> have a GUP/follow_page_mask() variant that keeps using get_page/put_page?
>  

Interesting. So far, the approach has been to get all the gup callers to
release via put_user_page(), but if we add in Jan's and Ira's vaddr_pin_pages()
wrapper, then maybe we could leave some sites unconverted.

However, in order to do so, we would have to change things so that we have
one set of APIs (gup) that do *not* increment a pin count, and another set
(vaddr_pin_pages) that do. 

Is that where we want to go...?

I have a tracking patch that only deals with gup/pup. I could post as an RFC,
but I think it might just muddy the waters at this point, anyway it's this one:

    
https://github.com/johnhubbard/linux/commit/a0fb73ce0a39c74f0d1fb6bd9d866f660f762eae


thanks,
John Hubbard Aug. 8, 2019, 10:59 p.m. UTC | #6
On 8/8/19 12:20 PM, John Hubbard wrote:
> On 8/8/19 4:09 AM, Vlastimil Babka wrote:
>> On 8/8/19 8:21 AM, Michal Hocko wrote:
>>> On Wed 07-08-19 16:32:08, John Hubbard wrote:
>>>> On 8/7/19 4:01 AM, Michal Hocko wrote:
>>>>> On Mon 05-08-19 15:20:17, john.hubbard@gmail.com wrote:
>>>>>> From: John Hubbard <jhubbard@nvidia.com>
>>>> Actually, I think follow_page_mask() gets all the pages, right? And the
>>>> get_page() in __munlock_pagevec_fill() is there to allow a pagevec_release() 
>>>> later.
>>>
>>> Maybe I am misreading the code (looking at Linus tree) but munlock_vma_pages_range
>>> calls follow_page for the start address and then if not THP tries to
>>> fill up the pagevec with few more pages (up to end), do the shortcut
>>> via manual pte walk as an optimization and use generic get_page there.
>>
> 
> Yes, I see it finally, thanks. :)  
> 
>> That's true. However, I'm not sure munlocking is where the
>> put_user_page() machinery is intended to be used anyway? These are
>> short-term pins for struct page manipulation, not e.g. dirtying of page
>> contents. Reading commit fc1d8e7cca2d I don't think this case falls
>> within the reasoning there. Perhaps not all GUP users should be
>> converted to the planned separate GUP tracking, and instead we should
>> have a GUP/follow_page_mask() variant that keeps using get_page/put_page?
>>  
> 
> Interesting. So far, the approach has been to get all the gup callers to
> release via put_user_page(), but if we add in Jan's and Ira's vaddr_pin_pages()
> wrapper, then maybe we could leave some sites unconverted.
> 
> However, in order to do so, we would have to change things so that we have
> one set of APIs (gup) that do *not* increment a pin count, and another set
> (vaddr_pin_pages) that do. 
> 
> Is that where we want to go...?
> 

Oh, and meanwhile, I'm leaning toward a cheap fix: just use gup_fast() instead
of get_page(), and also fix the releasing code. So this incremental patch, on
top of the existing one, should do it:

diff --git a/mm/mlock.c b/mm/mlock.c
index b980e6270e8a..2ea272c6fee3 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -318,18 +318,14 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
                /*
                 * We won't be munlocking this page in the next phase
                 * but we still need to release the follow_page_mask()
-                * pin. We cannot do it under lru_lock however. If it's
-                * the last pin, __page_cache_release() would deadlock.
+                * pin.
                 */
-               pagevec_add(&pvec_putback, pvec->pages[i]);
+               put_user_page(pages[i]);
                pvec->pages[i] = NULL;
        }
        __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
        spin_unlock_irq(&zone->zone_pgdat->lru_lock);
 
-       /* Now we can release pins of pages that we are not munlocking */
-       pagevec_release(&pvec_putback);
-
        /* Phase 2: page munlock */
        for (i = 0; i < nr; i++) {
                struct page *page = pvec->pages[i];
@@ -394,6 +390,8 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
        start += PAGE_SIZE;
        while (start < end) {
                struct page *page = NULL;
+               int ret;
+
                pte++;
                if (pte_present(*pte))
                        page = vm_normal_page(vma, start, *pte);
@@ -411,7 +409,13 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
                if (PageTransCompound(page))
                        break;
 
-               get_page(page);
+               /*
+                * Use get_user_pages_fast(), instead of get_page() so that the
+                * releasing code can unconditionally call put_user_page().
+                */
+               ret = get_user_pages_fast(start, 1, 0, &page);
+               if (ret != 1)
+                       break;
                /*
                 * Increase the address that will be returned *before* the
                 * eventual break due to pvec becoming full by adding the page


thanks,
Ira Weiny Aug. 8, 2019, 11:41 p.m. UTC | #7
On Thu, Aug 08, 2019 at 03:59:15PM -0700, John Hubbard wrote:
> On 8/8/19 12:20 PM, John Hubbard wrote:
> > On 8/8/19 4:09 AM, Vlastimil Babka wrote:
> >> On 8/8/19 8:21 AM, Michal Hocko wrote:
> >>> On Wed 07-08-19 16:32:08, John Hubbard wrote:
> >>>> On 8/7/19 4:01 AM, Michal Hocko wrote:
> >>>>> On Mon 05-08-19 15:20:17, john.hubbard@gmail.com wrote:
> >>>>>> From: John Hubbard <jhubbard@nvidia.com>
> >>>> Actually, I think follow_page_mask() gets all the pages, right? And the
> >>>> get_page() in __munlock_pagevec_fill() is there to allow a pagevec_release() 
> >>>> later.
> >>>
> >>> Maybe I am misreading the code (looking at Linus tree) but munlock_vma_pages_range
> >>> calls follow_page for the start address and then if not THP tries to
> >>> fill up the pagevec with few more pages (up to end), do the shortcut
> >>> via manual pte walk as an optimization and use generic get_page there.
> >>
> > 
> > Yes, I see it finally, thanks. :)  
> > 
> >> That's true. However, I'm not sure munlocking is where the
> >> put_user_page() machinery is intended to be used anyway? These are
> >> short-term pins for struct page manipulation, not e.g. dirtying of page
> >> contents. Reading commit fc1d8e7cca2d I don't think this case falls
> >> within the reasoning there. Perhaps not all GUP users should be
> >> converted to the planned separate GUP tracking, and instead we should
> >> have a GUP/follow_page_mask() variant that keeps using get_page/put_page?
> >>  
> > 
> > Interesting. So far, the approach has been to get all the gup callers to
> > release via put_user_page(), but if we add in Jan's and Ira's vaddr_pin_pages()
> > wrapper, then maybe we could leave some sites unconverted.
> > 
> > However, in order to do so, we would have to change things so that we have
> > one set of APIs (gup) that do *not* increment a pin count, and another set
> > (vaddr_pin_pages) that do. 
> > 
> > Is that where we want to go...?
> > 
> 
> Oh, and meanwhile, I'm leaning toward a cheap fix: just use gup_fast() instead
> of get_page(), and also fix the releasing code. So this incremental patch, on
> top of the existing one, should do it:
> 
> diff --git a/mm/mlock.c b/mm/mlock.c
> index b980e6270e8a..2ea272c6fee3 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -318,18 +318,14 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
>                 /*
>                  * We won't be munlocking this page in the next phase
>                  * but we still need to release the follow_page_mask()
> -                * pin. We cannot do it under lru_lock however. If it's
> -                * the last pin, __page_cache_release() would deadlock.
> +                * pin.
>                  */
> -               pagevec_add(&pvec_putback, pvec->pages[i]);
> +               put_user_page(pages[i]);
>                 pvec->pages[i] = NULL;
>         }
>         __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
>         spin_unlock_irq(&zone->zone_pgdat->lru_lock);
>  
> -       /* Now we can release pins of pages that we are not munlocking */
> -       pagevec_release(&pvec_putback);
> -

I'm not an expert but this skips a call to lru_add_drain().  Is that ok?

>         /* Phase 2: page munlock */
>         for (i = 0; i < nr; i++) {
>                 struct page *page = pvec->pages[i];
> @@ -394,6 +390,8 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
>         start += PAGE_SIZE;
>         while (start < end) {
>                 struct page *page = NULL;
> +               int ret;
> +
>                 pte++;
>                 if (pte_present(*pte))
>                         page = vm_normal_page(vma, start, *pte);
> @@ -411,7 +409,13 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
>                 if (PageTransCompound(page))
>                         break;
>  
> -               get_page(page);
> +               /*
> +                * Use get_user_pages_fast(), instead of get_page() so that the
> +                * releasing code can unconditionally call put_user_page().
> +                */
> +               ret = get_user_pages_fast(start, 1, 0, &page);
> +               if (ret != 1)
> +                       break;

I like the idea of making this a get/put pair but I'm feeling uneasy about how
this is really supposed to work.

For sure the GUP/PUP was supposed to be separate from [get|put]_page.

Ira
>                 /*
>                  * Increase the address that will be returned *before* the
>                  * eventual break due to pvec becoming full by adding the page
> 
> 
> thanks,
> -- 
> John Hubbard
> NVIDIA
John Hubbard Aug. 8, 2019, 11:57 p.m. UTC | #8
On 8/8/19 4:41 PM, Ira Weiny wrote:
> On Thu, Aug 08, 2019 at 03:59:15PM -0700, John Hubbard wrote:
>> On 8/8/19 12:20 PM, John Hubbard wrote:
>>> On 8/8/19 4:09 AM, Vlastimil Babka wrote:
>>>> On 8/8/19 8:21 AM, Michal Hocko wrote:
>>>>> On Wed 07-08-19 16:32:08, John Hubbard wrote:
>>>>>> On 8/7/19 4:01 AM, Michal Hocko wrote:
>>>>>>> On Mon 05-08-19 15:20:17, john.hubbard@gmail.com wrote:
...
>> Oh, and meanwhile, I'm leaning toward a cheap fix: just use gup_fast() instead
>> of get_page(), and also fix the releasing code. So this incremental patch, on
>> top of the existing one, should do it:
>>
>> diff --git a/mm/mlock.c b/mm/mlock.c
>> index b980e6270e8a..2ea272c6fee3 100644
>> --- a/mm/mlock.c
>> +++ b/mm/mlock.c
>> @@ -318,18 +318,14 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
>>                 /*
>>                  * We won't be munlocking this page in the next phase
>>                  * but we still need to release the follow_page_mask()
>> -                * pin. We cannot do it under lru_lock however. If it's
>> -                * the last pin, __page_cache_release() would deadlock.
>> +                * pin.
>>                  */
>> -               pagevec_add(&pvec_putback, pvec->pages[i]);
>> +               put_user_page(pages[i]);

correction, make that:   
                   put_user_page(pvec->pages[i]);

(This is not fully tested yet.)

>>                 pvec->pages[i] = NULL;
>>         }
>>         __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
>>         spin_unlock_irq(&zone->zone_pgdat->lru_lock);
>>  
>> -       /* Now we can release pins of pages that we are not munlocking */
>> -       pagevec_release(&pvec_putback);
>> -
> 
> I'm not an expert but this skips a call to lru_add_drain().  Is that ok?

Yes: unless I'm missing something, there is no reason to go through lru_add_drain
in this case. These are gup'd pages that are not going to get any further
processing.

> 
>>         /* Phase 2: page munlock */
>>         for (i = 0; i < nr; i++) {
>>                 struct page *page = pvec->pages[i];
>> @@ -394,6 +390,8 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
>>         start += PAGE_SIZE;
>>         while (start < end) {
>>                 struct page *page = NULL;
>> +               int ret;
>> +
>>                 pte++;
>>                 if (pte_present(*pte))
>>                         page = vm_normal_page(vma, start, *pte);
>> @@ -411,7 +409,13 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
>>                 if (PageTransCompound(page))
>>                         break;
>>  
>> -               get_page(page);
>> +               /*
>> +                * Use get_user_pages_fast(), instead of get_page() so that the
>> +                * releasing code can unconditionally call put_user_page().
>> +                */
>> +               ret = get_user_pages_fast(start, 1, 0, &page);
>> +               if (ret != 1)
>> +                       break;
> 
> I like the idea of making this a get/put pair but I'm feeling uneasy about how
> this is really supposed to work.
> 
> For sure the GUP/PUP was supposed to be separate from [get|put]_page.
> 

Actually, they both take references on the page. And it is absolutely OK to call
them both on the same page.

But anyway, we're not mixing them up here. If you follow the code paths, either 
gup or follow_page_mask() is used, and then put_user_page() releases. 

So...you haven't actually pointed to a bug here, right? :)


thanks,
Vlastimil Babka Aug. 9, 2019, 8:12 a.m. UTC | #9
On 8/9/19 12:59 AM, John Hubbard wrote:
>>> That's true. However, I'm not sure munlocking is where the
>>> put_user_page() machinery is intended to be used anyway? These are
>>> short-term pins for struct page manipulation, not e.g. dirtying of page
>>> contents. Reading commit fc1d8e7cca2d I don't think this case falls
>>> within the reasoning there. Perhaps not all GUP users should be
>>> converted to the planned separate GUP tracking, and instead we should
>>> have a GUP/follow_page_mask() variant that keeps using get_page/put_page?
>>>  
>>
>> Interesting. So far, the approach has been to get all the gup callers to
>> release via put_user_page(), but if we add in Jan's and Ira's vaddr_pin_pages()
>> wrapper, then maybe we could leave some sites unconverted.
>>
>> However, in order to do so, we would have to change things so that we have
>> one set of APIs (gup) that do *not* increment a pin count, and another set
>> (vaddr_pin_pages) that do. 
>>
>> Is that where we want to go...?
>>

We already have a FOLL_LONGTERM flag, isn't that somehow related? And if
it's not exactly the same thing, perhaps a new gup flag to distinguish
which kind of pinning to use?

> Oh, and meanwhile, I'm leaning toward a cheap fix: just use gup_fast() instead
> of get_page(), and also fix the releasing code. So this incremental patch, on
> top of the existing one, should do it:
> 
...
> @@ -411,7 +409,13 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
>                 if (PageTransCompound(page))
>                         break;
>  
> -               get_page(page);
> +               /*
> +                * Use get_user_pages_fast(), instead of get_page() so that the
> +                * releasing code can unconditionally call put_user_page().
> +                */
> +               ret = get_user_pages_fast(start, 1, 0, &page);

Um the whole reason of __munlock_pagevec_fill() was to avoid the full
page walk cost, which made a 14% difference, see 7a8010cd3627 ("mm:
munlock: manual pte walk in fast path instead of follow_page_mask()")
Replacing simple get_page() with page walk to satisfy API requirements
seems rather suboptimal to me.

> +               if (ret != 1)
> +                       break;
>                 /*
>                  * Increase the address that will be returned *before* the
>                  * eventual break due to pvec becoming full by adding the page
> 
> 
> thanks,
>
Michal Hocko Aug. 9, 2019, 8:23 a.m. UTC | #10
On Fri 09-08-19 10:12:48, Vlastimil Babka wrote:
> On 8/9/19 12:59 AM, John Hubbard wrote:
> >>> That's true. However, I'm not sure munlocking is where the
> >>> put_user_page() machinery is intended to be used anyway? These are
> >>> short-term pins for struct page manipulation, not e.g. dirtying of page
> >>> contents. Reading commit fc1d8e7cca2d I don't think this case falls
> >>> within the reasoning there. Perhaps not all GUP users should be
> >>> converted to the planned separate GUP tracking, and instead we should
> >>> have a GUP/follow_page_mask() variant that keeps using get_page/put_page?
> >>>  
> >>
> >> Interesting. So far, the approach has been to get all the gup callers to
> >> release via put_user_page(), but if we add in Jan's and Ira's vaddr_pin_pages()
> >> wrapper, then maybe we could leave some sites unconverted.
> >>
> >> However, in order to do so, we would have to change things so that we have
> >> one set of APIs (gup) that do *not* increment a pin count, and another set
> >> (vaddr_pin_pages) that do. 
> >>
> >> Is that where we want to go...?
> >>
> 
> We already have a FOLL_LONGTERM flag, isn't that somehow related? And if
> it's not exactly the same thing, perhaps a new gup flag to distinguish
> which kind of pinning to use?

Agreed. This is a shiny example how forcing all existing gup users into
the new scheme is subotimal at best. Not the mention the overal
fragility mention elsewhere. I dislike the conversion even more now.

Sorry if this was already discussed already but why the new pinning is
not bound to FOLL_LONGTERM (ideally hidden by an interface so that users
do not have to care about the flag) only?
John Hubbard Aug. 9, 2019, 9:05 a.m. UTC | #11
On 8/9/19 1:23 AM, Michal Hocko wrote:
> On Fri 09-08-19 10:12:48, Vlastimil Babka wrote:
>> On 8/9/19 12:59 AM, John Hubbard wrote:
>>>>> That's true. However, I'm not sure munlocking is where the
>>>>> put_user_page() machinery is intended to be used anyway? These are
>>>>> short-term pins for struct page manipulation, not e.g. dirtying of page
>>>>> contents. Reading commit fc1d8e7cca2d I don't think this case falls
>>>>> within the reasoning there. Perhaps not all GUP users should be
>>>>> converted to the planned separate GUP tracking, and instead we should
>>>>> have a GUP/follow_page_mask() variant that keeps using get_page/put_page?
>>>>>   
>>>>
>>>> Interesting. So far, the approach has been to get all the gup callers to
>>>> release via put_user_page(), but if we add in Jan's and Ira's vaddr_pin_pages()
>>>> wrapper, then maybe we could leave some sites unconverted.
>>>>
>>>> However, in order to do so, we would have to change things so that we have
>>>> one set of APIs (gup) that do *not* increment a pin count, and another set
>>>> (vaddr_pin_pages) that do.
>>>>
>>>> Is that where we want to go...?
>>>>
>>
>> We already have a FOLL_LONGTERM flag, isn't that somehow related? And if
>> it's not exactly the same thing, perhaps a new gup flag to distinguish
>> which kind of pinning to use?
> 
> Agreed. This is a shiny example how forcing all existing gup users into
> the new scheme is subotimal at best. Not the mention the overal
> fragility mention elsewhere. I dislike the conversion even more now.
> 
> Sorry if this was already discussed already but why the new pinning is
> not bound to FOLL_LONGTERM (ideally hidden by an interface so that users
> do not have to care about the flag) only?
> 

Oh, it's been discussed alright, but given how some of the discussions have gone,
I certainly am not surprised that there are still questions and criticisms!
Especially since I may have misunderstood some of the points, along the way.
It's been quite a merry go round. :)

Anyway, what I'm hearing now is: for gup(FOLL_LONGTERM), apply the pinned tracking.
And therefore only do put_user_page() on pages that were pinned with
FOLL_LONGTERM. For short term pins, let the locking do what it will:
things can briefly block and all will be well.

Also, that may or may not come with a wrapper function, courtesy of Jan
and Ira.

Is that about right? It's late here, but I don't immediately recall any
problems with doing it that way...

thanks,
Michal Hocko Aug. 9, 2019, 9:16 a.m. UTC | #12
On Fri 09-08-19 02:05:15, John Hubbard wrote:
> On 8/9/19 1:23 AM, Michal Hocko wrote:
> > On Fri 09-08-19 10:12:48, Vlastimil Babka wrote:
> > > On 8/9/19 12:59 AM, John Hubbard wrote:
> > > > > > That's true. However, I'm not sure munlocking is where the
> > > > > > put_user_page() machinery is intended to be used anyway? These are
> > > > > > short-term pins for struct page manipulation, not e.g. dirtying of page
> > > > > > contents. Reading commit fc1d8e7cca2d I don't think this case falls
> > > > > > within the reasoning there. Perhaps not all GUP users should be
> > > > > > converted to the planned separate GUP tracking, and instead we should
> > > > > > have a GUP/follow_page_mask() variant that keeps using get_page/put_page?
> > > > > 
> > > > > Interesting. So far, the approach has been to get all the gup callers to
> > > > > release via put_user_page(), but if we add in Jan's and Ira's vaddr_pin_pages()
> > > > > wrapper, then maybe we could leave some sites unconverted.
> > > > > 
> > > > > However, in order to do so, we would have to change things so that we have
> > > > > one set of APIs (gup) that do *not* increment a pin count, and another set
> > > > > (vaddr_pin_pages) that do.
> > > > > 
> > > > > Is that where we want to go...?
> > > > > 
> > > 
> > > We already have a FOLL_LONGTERM flag, isn't that somehow related? And if
> > > it's not exactly the same thing, perhaps a new gup flag to distinguish
> > > which kind of pinning to use?
> > 
> > Agreed. This is a shiny example how forcing all existing gup users into
> > the new scheme is subotimal at best. Not the mention the overal
> > fragility mention elsewhere. I dislike the conversion even more now.
> > 
> > Sorry if this was already discussed already but why the new pinning is
> > not bound to FOLL_LONGTERM (ideally hidden by an interface so that users
> > do not have to care about the flag) only?
> > 
> 
> Oh, it's been discussed alright, but given how some of the discussions have gone,
> I certainly am not surprised that there are still questions and criticisms!
> Especially since I may have misunderstood some of the points, along the way.
> It's been quite a merry go round. :)

Yeah, I've tried to follow them but just gave up at some point.

> Anyway, what I'm hearing now is: for gup(FOLL_LONGTERM), apply the pinned tracking.
> And therefore only do put_user_page() on pages that were pinned with
> FOLL_LONGTERM. For short term pins, let the locking do what it will:
> things can briefly block and all will be well.
> 
> Also, that may or may not come with a wrapper function, courtesy of Jan
> and Ira.
> 
> Is that about right? It's late here, but I don't immediately recall any
> problems with doing it that way...

Yes that makes more sense to me. Whoever needs that tracking should
opt-in for it. Otherwise you just risk problems like the one discussed
in the mlock path (because we do a strange stuff in the name of
performance) and a never ending whack a mole where new users do not
follow the new API usage and that results in all sorts of weird issues.

Thanks!
Jan Kara Aug. 9, 2019, 1:58 p.m. UTC | #13
On Fri 09-08-19 10:23:07, Michal Hocko wrote:
> On Fri 09-08-19 10:12:48, Vlastimil Babka wrote:
> > On 8/9/19 12:59 AM, John Hubbard wrote:
> > >>> That's true. However, I'm not sure munlocking is where the
> > >>> put_user_page() machinery is intended to be used anyway? These are
> > >>> short-term pins for struct page manipulation, not e.g. dirtying of page
> > >>> contents. Reading commit fc1d8e7cca2d I don't think this case falls
> > >>> within the reasoning there. Perhaps not all GUP users should be
> > >>> converted to the planned separate GUP tracking, and instead we should
> > >>> have a GUP/follow_page_mask() variant that keeps using get_page/put_page?
> > >>>  
> > >>
> > >> Interesting. So far, the approach has been to get all the gup callers to
> > >> release via put_user_page(), but if we add in Jan's and Ira's vaddr_pin_pages()
> > >> wrapper, then maybe we could leave some sites unconverted.
> > >>
> > >> However, in order to do so, we would have to change things so that we have
> > >> one set of APIs (gup) that do *not* increment a pin count, and another set
> > >> (vaddr_pin_pages) that do. 
> > >>
> > >> Is that where we want to go...?
> > >>
> > 
> > We already have a FOLL_LONGTERM flag, isn't that somehow related? And if
> > it's not exactly the same thing, perhaps a new gup flag to distinguish
> > which kind of pinning to use?
> 
> Agreed. This is a shiny example how forcing all existing gup users into
> the new scheme is subotimal at best. Not the mention the overal
> fragility mention elsewhere. I dislike the conversion even more now.
> 
> Sorry if this was already discussed already but why the new pinning is
> not bound to FOLL_LONGTERM (ideally hidden by an interface so that users
> do not have to care about the flag) only?

The new tracking cannot be bound to FOLL_LONGTERM. Anything that gets page
reference and then touches page data (e.g. direct IO) needs the new kind of
tracking so that filesystem knows someone is messing with the page data.
So what John is trying to address is a different (although related) problem
to someone pinning a page for a long time.

In principle, I'm not strongly opposed to a new FOLL flag to determine
whether a pin or an ordinary page reference will be acquired at least as an
internal implementation detail inside mm/gup.c. But I would really like to
discourage new GUP users taking just page reference as the most clueless
users (drivers) usually need a pin in the sense John implements. So in
terms of API I'd strongly prefer to deprecate GUP as an API, provide
vaddr_pin_pages() for drivers to get their buffer pages pinned and then for
those few users who really know what they are doing (and who are not
interested in page contents) we can have APIs like follow_page() to get a
page reference from a virtual address.

								Honza
Michal Hocko Aug. 9, 2019, 5:52 p.m. UTC | #14
On Fri 09-08-19 15:58:13, Jan Kara wrote:
> On Fri 09-08-19 10:23:07, Michal Hocko wrote:
> > On Fri 09-08-19 10:12:48, Vlastimil Babka wrote:
> > > On 8/9/19 12:59 AM, John Hubbard wrote:
> > > >>> That's true. However, I'm not sure munlocking is where the
> > > >>> put_user_page() machinery is intended to be used anyway? These are
> > > >>> short-term pins for struct page manipulation, not e.g. dirtying of page
> > > >>> contents. Reading commit fc1d8e7cca2d I don't think this case falls
> > > >>> within the reasoning there. Perhaps not all GUP users should be
> > > >>> converted to the planned separate GUP tracking, and instead we should
> > > >>> have a GUP/follow_page_mask() variant that keeps using get_page/put_page?
> > > >>>  
> > > >>
> > > >> Interesting. So far, the approach has been to get all the gup callers to
> > > >> release via put_user_page(), but if we add in Jan's and Ira's vaddr_pin_pages()
> > > >> wrapper, then maybe we could leave some sites unconverted.
> > > >>
> > > >> However, in order to do so, we would have to change things so that we have
> > > >> one set of APIs (gup) that do *not* increment a pin count, and another set
> > > >> (vaddr_pin_pages) that do. 
> > > >>
> > > >> Is that where we want to go...?
> > > >>
> > > 
> > > We already have a FOLL_LONGTERM flag, isn't that somehow related? And if
> > > it's not exactly the same thing, perhaps a new gup flag to distinguish
> > > which kind of pinning to use?
> > 
> > Agreed. This is a shiny example how forcing all existing gup users into
> > the new scheme is subotimal at best. Not the mention the overal
> > fragility mention elsewhere. I dislike the conversion even more now.
> > 
> > Sorry if this was already discussed already but why the new pinning is
> > not bound to FOLL_LONGTERM (ideally hidden by an interface so that users
> > do not have to care about the flag) only?
> 
> The new tracking cannot be bound to FOLL_LONGTERM. Anything that gets page
> reference and then touches page data (e.g. direct IO) needs the new kind of
> tracking so that filesystem knows someone is messing with the page data.
> So what John is trying to address is a different (although related) problem
> to someone pinning a page for a long time.

OK, I see. Thanks for the clarification.

> In principle, I'm not strongly opposed to a new FOLL flag to determine
> whether a pin or an ordinary page reference will be acquired at least as an
> internal implementation detail inside mm/gup.c. But I would really like to
> discourage new GUP users taking just page reference as the most clueless
> users (drivers) usually need a pin in the sense John implements. So in
> terms of API I'd strongly prefer to deprecate GUP as an API, provide
> vaddr_pin_pages() for drivers to get their buffer pages pinned and then for
> those few users who really know what they are doing (and who are not
> interested in page contents) we can have APIs like follow_page() to get a
> page reference from a virtual address.

Yes, going with a dedicated API sounds much better to me. Whether a
dedicated FOLL flag is used internally is not that important. I am also
for making the underlying gup to be really internal to the core kernel.

Thanks!
Ira Weiny Aug. 9, 2019, 6:14 p.m. UTC | #15
> On Fri 09-08-19 15:58:13, Jan Kara wrote:
> > On Fri 09-08-19 10:23:07, Michal Hocko wrote:
> > > On Fri 09-08-19 10:12:48, Vlastimil Babka wrote:
> > > > On 8/9/19 12:59 AM, John Hubbard wrote:
> > > > >>> That's true. However, I'm not sure munlocking is where the
> > > > >>> put_user_page() machinery is intended to be used anyway? These
> > > > >>> are short-term pins for struct page manipulation, not e.g.
> > > > >>> dirtying of page contents. Reading commit fc1d8e7cca2d I don't
> > > > >>> think this case falls within the reasoning there. Perhaps not
> > > > >>> all GUP users should be converted to the planned separate GUP
> > > > >>> tracking, and instead we should have a GUP/follow_page_mask()
> variant that keeps using get_page/put_page?
> > > > >>>
> > > > >>
> > > > >> Interesting. So far, the approach has been to get all the gup
> > > > >> callers to release via put_user_page(), but if we add in Jan's
> > > > >> and Ira's vaddr_pin_pages() wrapper, then maybe we could leave
> some sites unconverted.
> > > > >>
> > > > >> However, in order to do so, we would have to change things so
> > > > >> that we have one set of APIs (gup) that do *not* increment a
> > > > >> pin count, and another set
> > > > >> (vaddr_pin_pages) that do.
> > > > >>
> > > > >> Is that where we want to go...?
> > > > >>
> > > >
> > > > We already have a FOLL_LONGTERM flag, isn't that somehow related?
> > > > And if it's not exactly the same thing, perhaps a new gup flag to
> > > > distinguish which kind of pinning to use?
> > >
> > > Agreed. This is a shiny example how forcing all existing gup users
> > > into the new scheme is subotimal at best. Not the mention the overal
> > > fragility mention elsewhere. I dislike the conversion even more now.
> > >
> > > Sorry if this was already discussed already but why the new pinning
> > > is not bound to FOLL_LONGTERM (ideally hidden by an interface so
> > > that users do not have to care about the flag) only?
> >
> > The new tracking cannot be bound to FOLL_LONGTERM. Anything that gets
> > page reference and then touches page data (e.g. direct IO) needs the
> > new kind of tracking so that filesystem knows someone is messing with the
> page data.
> > So what John is trying to address is a different (although related)
> > problem to someone pinning a page for a long time.
> 
> OK, I see. Thanks for the clarification.

Not to beat a dead horse but FOLL_LONGTERM also has implications now for CMA pages which may or may not (I'm not an expert on those pages) need special tracking. 

> 
> > In principle, I'm not strongly opposed to a new FOLL flag to determine
> > whether a pin or an ordinary page reference will be acquired at least
> > as an internal implementation detail inside mm/gup.c. But I would
> > really like to discourage new GUP users taking just page reference as
> > the most clueless users (drivers) usually need a pin in the sense John
> > implements. So in terms of API I'd strongly prefer to deprecate GUP as
> > an API, provide
> > vaddr_pin_pages() for drivers to get their buffer pages pinned and
> > then for those few users who really know what they are doing (and who
> > are not interested in page contents) we can have APIs like
> > follow_page() to get a page reference from a virtual address.
> 
> Yes, going with a dedicated API sounds much better to me. Whether a
> dedicated FOLL flag is used internally is not that important. I am also for
> making the underlying gup to be really internal to the core kernel.

+1

I think GUP is too confusing.  I've been working with the details for many months now and it continues to confuse me.  :-(

My patches should be posted soon (based on mmotm) and I'll have my flame suit on so we can debate the interface.

Ira
Ira Weiny Aug. 9, 2019, 6:22 p.m. UTC | #16
> 
> On 8/8/19 4:41 PM, Ira Weiny wrote:
> > On Thu, Aug 08, 2019 at 03:59:15PM -0700, John Hubbard wrote:
> >> On 8/8/19 12:20 PM, John Hubbard wrote:
> >>> On 8/8/19 4:09 AM, Vlastimil Babka wrote:
> >>>> On 8/8/19 8:21 AM, Michal Hocko wrote:
> >>>>> On Wed 07-08-19 16:32:08, John Hubbard wrote:
> >>>>>> On 8/7/19 4:01 AM, Michal Hocko wrote:
> >>>>>>> On Mon 05-08-19 15:20:17, john.hubbard@gmail.com wrote:
> ...
> >> Oh, and meanwhile, I'm leaning toward a cheap fix: just use
> >> gup_fast() instead of get_page(), and also fix the releasing code. So
> >> this incremental patch, on top of the existing one, should do it:
> >>
> >> diff --git a/mm/mlock.c b/mm/mlock.c
> >> index b980e6270e8a..2ea272c6fee3 100644
> >> --- a/mm/mlock.c
> >> +++ b/mm/mlock.c
> >> @@ -318,18 +318,14 @@ static void __munlock_pagevec(struct pagevec
> *pvec, struct zone *zone)
> >>                 /*
> >>                  * We won't be munlocking this page in the next phase
> >>                  * but we still need to release the follow_page_mask()
> >> -                * pin. We cannot do it under lru_lock however. If it's
> >> -                * the last pin, __page_cache_release() would deadlock.
> >> +                * pin.
> >>                  */
> >> -               pagevec_add(&pvec_putback, pvec->pages[i]);
> >> +               put_user_page(pages[i]);
> 
> correction, make that:
>                    put_user_page(pvec->pages[i]);
> 
> (This is not fully tested yet.)
> 
> >>                 pvec->pages[i] = NULL;
> >>         }
> >>         __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
> >>         spin_unlock_irq(&zone->zone_pgdat->lru_lock);
> >>
> >> -       /* Now we can release pins of pages that we are not munlocking */
> >> -       pagevec_release(&pvec_putback);
> >> -
> >
> > I'm not an expert but this skips a call to lru_add_drain().  Is that ok?
> 
> Yes: unless I'm missing something, there is no reason to go through
> lru_add_drain in this case. These are gup'd pages that are not going to get
> any further processing.
> 
> >
> >>         /* Phase 2: page munlock */
> >>         for (i = 0; i < nr; i++) {
> >>                 struct page *page = pvec->pages[i]; @@ -394,6 +390,8
> >> @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
> >>         start += PAGE_SIZE;
> >>         while (start < end) {
> >>                 struct page *page = NULL;
> >> +               int ret;
> >> +
> >>                 pte++;
> >>                 if (pte_present(*pte))
> >>                         page = vm_normal_page(vma, start, *pte); @@
> >> -411,7 +409,13 @@ static unsigned long __munlock_pagevec_fill(struct
> pagevec *pvec,
> >>                 if (PageTransCompound(page))
> >>                         break;
> >>
> >> -               get_page(page);
> >> +               /*
> >> +                * Use get_user_pages_fast(), instead of get_page() so that the
> >> +                * releasing code can unconditionally call put_user_page().
> >> +                */
> >> +               ret = get_user_pages_fast(start, 1, 0, &page);
> >> +               if (ret != 1)
> >> +                       break;
> >
> > I like the idea of making this a get/put pair but I'm feeling uneasy
> > about how this is really supposed to work.
> >
> > For sure the GUP/PUP was supposed to be separate from [get|put]_page.
> >
> 
> Actually, they both take references on the page. And it is absolutely OK to call
> them both on the same page.
> 
> But anyway, we're not mixing them up here. If you follow the code paths,
> either gup or follow_page_mask() is used, and then put_user_page()
> releases.
> 
> So...you haven't actually pointed to a bug here, right? :)

No...  no bug.

sorry this was just a general comment on semantics.  But in keeping with the semantics discussion it is further confusing that follow_page_mask() is also mixed in here...

Which is where my comment was driving toward.  If you call GUP there should be a PUP.  Get_page/put_page...  follow_page/unfollow_page...  ???  ;-)  Ok now I'm off the rails...  but that was the point...

I think Jan and Michal are onto something here WRT internal vs external interfaces.

Ira
John Hubbard Aug. 9, 2019, 6:36 p.m. UTC | #17
On 8/9/19 11:14 AM, Weiny, Ira wrote:
>> On Fri 09-08-19 15:58:13, Jan Kara wrote:
>>> On Fri 09-08-19 10:23:07, Michal Hocko wrote:
>>>> On Fri 09-08-19 10:12:48, Vlastimil Babka wrote:
>>>>> On 8/9/19 12:59 AM, John Hubbard wrote:
...
>>> In principle, I'm not strongly opposed to a new FOLL flag to determine
>>> whether a pin or an ordinary page reference will be acquired at least
>>> as an internal implementation detail inside mm/gup.c. But I would
>>> really like to discourage new GUP users taking just page reference as
>>> the most clueless users (drivers) usually need a pin in the sense John
>>> implements. So in terms of API I'd strongly prefer to deprecate GUP as
>>> an API, provide
>>> vaddr_pin_pages() for drivers to get their buffer pages pinned and
>>> then for those few users who really know what they are doing (and who
>>> are not interested in page contents) we can have APIs like
>>> follow_page() to get a page reference from a virtual address.
>>
>> Yes, going with a dedicated API sounds much better to me. Whether a
>> dedicated FOLL flag is used internally is not that important. I am also for
>> making the underlying gup to be really internal to the core kernel.
> 
> +1
> 
> I think GUP is too confusing.  I've been working with the details for many months now and it continues to confuse me.  :-(
> 
> My patches should be posted soon (based on mmotm) and I'll have my flame suit on so we can debate the interface.
> 

OK, so: use FOLL_PIN as an internal gup flag. FOLL_PIN will get set by the
new vaddr_pin_pages*() wrapper calls. Then, put_user_page*() shall only be
invoked from call sites that use FOLL_PIN.

With that approach in mind, I can sweep through my callsite conversion
patchset and drop a few patches. There are actually quite a few patches that
just want to find the page, not really operate on its data.

And the conversion of the actual gup() calls can be done almost independently
of the put_user_page*() conversions, if necessary (and it sounds like with your
patchset, it is).

btw, as part of the conversion, to make merging and call site conversion
smoother, maybe it's OK to pass in FOLL_PIN to existing gup() calls, with
the intent to convert them to use vaddr_pin_pages.)

thanks,
diff mbox series

Patch

diff --git a/mm/mlock.c b/mm/mlock.c
index a90099da4fb4..b980e6270e8a 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -345,7 +345,7 @@  static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
 				get_page(page); /* for putback_lru_page() */
 				__munlock_isolated_page(page);
 				unlock_page(page);
-				put_page(page); /* from follow_page_mask() */
+				put_user_page(page); /* from follow_page_mask() */
 			}
 		}
 	}
@@ -467,7 +467,7 @@  void munlock_vma_pages_range(struct vm_area_struct *vma,
 		if (page && !IS_ERR(page)) {
 			if (PageTransTail(page)) {
 				VM_BUG_ON_PAGE(PageMlocked(page), page);
-				put_page(page); /* follow_page_mask() */
+				put_user_page(page); /* follow_page_mask() */
 			} else if (PageTransHuge(page)) {
 				lock_page(page);
 				/*
@@ -478,7 +478,7 @@  void munlock_vma_pages_range(struct vm_area_struct *vma,
 				 */
 				page_mask = munlock_vma_page(page);
 				unlock_page(page);
-				put_page(page); /* follow_page_mask() */
+				put_user_page(page); /* follow_page_mask() */
 			} else {
 				/*
 				 * Non-huge pages are handled in batches via