diff mbox series

[1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal

Message ID 20200408014010.80428-2-peterx@redhat.com (mailing list archive)
State New, archived
Headers show
Series mm: Two small fixes for recent syzbot reports | expand

Commit Message

Peter Xu April 8, 2020, 1:40 a.m. UTC
lookup_node() uses gup to pin the page and get node information.  It
checks against ret>=0 assuming the page will be filled in.  However
it's also possible that gup will return zero, for example, when the
thread is quickly killed with a fatal signal.  Teach lookup_node() to
gracefully return an error -EFAULT if it happens.

Meanwhile, initialize "page" to NULL to avoid potential risk of
exploiting the pointer.

Reported-by: syzbot+693dc11fcb53120b5559@syzkaller.appspotmail.com
Fixes: 4426e945df58 ("mm/gup: allow VM_FAULT_RETRY for multiple times")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/mempolicy.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Michal Hocko April 8, 2020, 10:21 a.m. UTC | #1
On Tue 07-04-20 21:40:09, Peter Xu wrote:
> lookup_node() uses gup to pin the page and get node information.  It
> checks against ret>=0 assuming the page will be filled in.  However
> it's also possible that gup will return zero, for example, when the
> thread is quickly killed with a fatal signal.  Teach lookup_node() to
> gracefully return an error -EFAULT if it happens.
> 
> Meanwhile, initialize "page" to NULL to avoid potential risk of
> exploiting the pointer.
> 
> Reported-by: syzbot+693dc11fcb53120b5559@syzkaller.appspotmail.com
> Fixes: 4426e945df58 ("mm/gup: allow VM_FAULT_RETRY for multiple times")

I am not familiar with thic commit but shouldn't gup return ERESTARTSYS
on a fatal signal?
Peter Xu April 8, 2020, 2:20 p.m. UTC | #2
On Wed, Apr 08, 2020 at 12:21:28PM +0200, Michal Hocko wrote:
> On Tue 07-04-20 21:40:09, Peter Xu wrote:
> > lookup_node() uses gup to pin the page and get node information.  It
> > checks against ret>=0 assuming the page will be filled in.  However
> > it's also possible that gup will return zero, for example, when the
> > thread is quickly killed with a fatal signal.  Teach lookup_node() to
> > gracefully return an error -EFAULT if it happens.
> > 
> > Meanwhile, initialize "page" to NULL to avoid potential risk of
> > exploiting the pointer.
> > 
> > Reported-by: syzbot+693dc11fcb53120b5559@syzkaller.appspotmail.com
> > Fixes: 4426e945df58 ("mm/gup: allow VM_FAULT_RETRY for multiple times")
> 
> I am not familiar with thic commit but shouldn't gup return ERESTARTSYS
> on a fatal signal?

Hi, Michal,

I do see quite a few usages on -ERESTARTSYS, but also some others,
majorly -EINTR, or even -EFAULT.  I think it could be a more general
question rather than a specific question to this patch only.

I saw some other discussions about this return value issue, I'll CC
you in the other thread when I raise this as a general question.

Thanks,
Michal Hocko April 8, 2020, 2:30 p.m. UTC | #3
On Wed 08-04-20 10:20:39, Peter Xu wrote:
> On Wed, Apr 08, 2020 at 12:21:28PM +0200, Michal Hocko wrote:
> > On Tue 07-04-20 21:40:09, Peter Xu wrote:
> > > lookup_node() uses gup to pin the page and get node information.  It
> > > checks against ret>=0 assuming the page will be filled in.  However
> > > it's also possible that gup will return zero, for example, when the
> > > thread is quickly killed with a fatal signal.  Teach lookup_node() to
> > > gracefully return an error -EFAULT if it happens.
> > > 
> > > Meanwhile, initialize "page" to NULL to avoid potential risk of
> > > exploiting the pointer.
> > > 
> > > Reported-by: syzbot+693dc11fcb53120b5559@syzkaller.appspotmail.com
> > > Fixes: 4426e945df58 ("mm/gup: allow VM_FAULT_RETRY for multiple times")
> > 
> > I am not familiar with thic commit but shouldn't gup return ERESTARTSYS
> > on a fatal signal?
> 
> Hi, Michal,
> 
> I do see quite a few usages on -ERESTARTSYS, but also some others,
> majorly -EINTR, or even -EFAULT.  I think it could be a more general
> question rather than a specific question to this patch only.

I am sorry but I was probably not clear enough. I was mostly worried
that gup doesn't return ERESTARTSYS or EINTR when it backed off because
of fatal signal pending. Your patch is checking for 0 an indicating that
this is that condition.
Peter Xu April 8, 2020, 3:24 p.m. UTC | #4
On Wed, Apr 08, 2020 at 04:30:24PM +0200, Michal Hocko wrote:
> On Wed 08-04-20 10:20:39, Peter Xu wrote:
> > On Wed, Apr 08, 2020 at 12:21:28PM +0200, Michal Hocko wrote:
> > > On Tue 07-04-20 21:40:09, Peter Xu wrote:
> > > > lookup_node() uses gup to pin the page and get node information.  It
> > > > checks against ret>=0 assuming the page will be filled in.  However
> > > > it's also possible that gup will return zero, for example, when the
> > > > thread is quickly killed with a fatal signal.  Teach lookup_node() to
> > > > gracefully return an error -EFAULT if it happens.
> > > > 
> > > > Meanwhile, initialize "page" to NULL to avoid potential risk of
> > > > exploiting the pointer.
> > > > 
> > > > Reported-by: syzbot+693dc11fcb53120b5559@syzkaller.appspotmail.com
> > > > Fixes: 4426e945df58 ("mm/gup: allow VM_FAULT_RETRY for multiple times")
> > > 
> > > I am not familiar with thic commit but shouldn't gup return ERESTARTSYS
> > > on a fatal signal?
> > 
> > Hi, Michal,
> > 
> > I do see quite a few usages on -ERESTARTSYS, but also some others,
> > majorly -EINTR, or even -EFAULT.  I think it could be a more general
> > question rather than a specific question to this patch only.
> 
> I am sorry but I was probably not clear enough. I was mostly worried
> that gup doesn't return ERESTARTSYS or EINTR when it backed off because
> of fatal signal pending. Your patch is checking for 0 an indicating that
> this is that condition.

Yeah I just noticed the fact, sorry!

Hillf just posted a fix there for recovering the behavior:

https://lore.kernel.org/lkml/20200408151213.GE66033@xz-x1/

Thanks,
Michal Hocko April 8, 2020, 3:26 p.m. UTC | #5
On Wed 08-04-20 11:24:35, Peter Xu wrote:
> On Wed, Apr 08, 2020 at 04:30:24PM +0200, Michal Hocko wrote:
> > On Wed 08-04-20 10:20:39, Peter Xu wrote:
> > > On Wed, Apr 08, 2020 at 12:21:28PM +0200, Michal Hocko wrote:
> > > > On Tue 07-04-20 21:40:09, Peter Xu wrote:
> > > > > lookup_node() uses gup to pin the page and get node information.  It
> > > > > checks against ret>=0 assuming the page will be filled in.  However
> > > > > it's also possible that gup will return zero, for example, when the
> > > > > thread is quickly killed with a fatal signal.  Teach lookup_node() to
> > > > > gracefully return an error -EFAULT if it happens.
> > > > > 
> > > > > Meanwhile, initialize "page" to NULL to avoid potential risk of
> > > > > exploiting the pointer.
> > > > > 
> > > > > Reported-by: syzbot+693dc11fcb53120b5559@syzkaller.appspotmail.com
> > > > > Fixes: 4426e945df58 ("mm/gup: allow VM_FAULT_RETRY for multiple times")
> > > > 
> > > > I am not familiar with thic commit but shouldn't gup return ERESTARTSYS
> > > > on a fatal signal?
> > > 
> > > Hi, Michal,
> > > 
> > > I do see quite a few usages on -ERESTARTSYS, but also some others,
> > > majorly -EINTR, or even -EFAULT.  I think it could be a more general
> > > question rather than a specific question to this patch only.
> > 
> > I am sorry but I was probably not clear enough. I was mostly worried
> > that gup doesn't return ERESTARTSYS or EINTR when it backed off because
> > of fatal signal pending. Your patch is checking for 0 an indicating that
> > this is that condition.
> 
> Yeah I just noticed the fact, sorry!
> 
> Hillf just posted a fix there for recovering the behavior:
> 
> https://lore.kernel.org/lkml/20200408151213.GE66033@xz-x1/

yeah, that is the proper fix.
Michal Hocko April 9, 2020, 7:02 a.m. UTC | #6
This patch has been merged and it is actually wrong after ae46d2aa6a7f
has been merged. We can either revert or I suggest just handling >0,
like the patch below:

From 03fbe30ec61e65b0927d0d41bccc7dff5f7eafd8 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Thu, 9 Apr 2020 08:26:57 +0200
Subject: [PATCH] mm, mempolicy: fix up gup usage in lookup_node

ba841078cd05 ("mm/mempolicy: Allow lookup_node() to handle fatal signal") has
added a special casing for 0 return value because that was a possible
gup return value when interrupted by fatal signal. This has been fixed
by ae46d2aa6a7f ("mm/gup: Let __get_user_pages_locked() return -EINTR
for fatal signal") in the mean time so ba841078cd05 can be reverted.
This patch however doesn't go all the way to revert it because 0 return
value is impossible. We always get an error or 1 for a single page
request.

Fixes: ba841078cd05 ("mm/mempolicy: Allow lookup_node() to handle fatal signal")
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/mempolicy.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 48ba9729062e..1965e2681877 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -927,10 +927,7 @@ static int lookup_node(struct mm_struct *mm, unsigned long addr)
 
 	int locked = 1;
 	err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
-	if (err == 0) {
-		/* E.g. GUP interrupted by fatal signal */
-		err = -EFAULT;
-	} else if (err > 0) {
+	if (err > 0) {
 		err = page_to_nid(p);
 		put_page(p);
 	}
Peter Xu April 9, 2020, 12:52 p.m. UTC | #7
On Thu, Apr 09, 2020 at 09:02:53AM +0200, Michal Hocko wrote:
> This patch has been merged and it is actually wrong after ae46d2aa6a7f
> has been merged. We can either revert or I suggest just handling >0,
> like the patch below:
> 
> From 03fbe30ec61e65b0927d0d41bccc7dff5f7eafd8 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 9 Apr 2020 08:26:57 +0200
> Subject: [PATCH] mm, mempolicy: fix up gup usage in lookup_node
> 
> ba841078cd05 ("mm/mempolicy: Allow lookup_node() to handle fatal signal") has
> added a special casing for 0 return value because that was a possible
> gup return value when interrupted by fatal signal. This has been fixed
> by ae46d2aa6a7f ("mm/gup: Let __get_user_pages_locked() return -EINTR
> for fatal signal") in the mean time so ba841078cd05 can be reverted.
> This patch however doesn't go all the way to revert it because 0 return
> value is impossible. We always get an error or 1 for a single page
> request.
> 
> Fixes: ba841078cd05 ("mm/mempolicy: Allow lookup_node() to handle fatal signal")
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/mempolicy.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 48ba9729062e..1965e2681877 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -927,10 +927,7 @@ static int lookup_node(struct mm_struct *mm, unsigned long addr)
>  
>  	int locked = 1;
>  	err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
> -	if (err == 0) {
> -		/* E.g. GUP interrupted by fatal signal */
> -		err = -EFAULT;
> -	} else if (err > 0) {
> +	if (err > 0) {
>  		err = page_to_nid(p);
>  		put_page(p);
>  	}

Hi, Michal,

I'm totally not against this, but note that get_user_pages_locked()
could still return zero. Although I'm not 100% sure now on whether
npages==0 will be the only case, it won't hurt to keep this ret==0
check until we consolidate the whole gup code to never return zero.

Assuming there's another case (even possible for a future gup bug)
that could return a zero, your patch will let err be anything (which
you didn't initialize err with your patch), then the function will
return a random value.  So even if you really want this change, I
would suggest you initialize err to some error code.

I just don't see much gain we get from removing that check.

Thanks,
Peter Xu April 9, 2020, 1 p.m. UTC | #8
On Thu, Apr 09, 2020 at 08:52:58AM -0400, Peter Xu wrote:
> On Thu, Apr 09, 2020 at 09:02:53AM +0200, Michal Hocko wrote:
> > This patch has been merged and it is actually wrong after ae46d2aa6a7f
> > has been merged. We can either revert or I suggest just handling >0,
> > like the patch below:
> > 
> > From 03fbe30ec61e65b0927d0d41bccc7dff5f7eafd8 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Thu, 9 Apr 2020 08:26:57 +0200
> > Subject: [PATCH] mm, mempolicy: fix up gup usage in lookup_node
> > 
> > ba841078cd05 ("mm/mempolicy: Allow lookup_node() to handle fatal signal") has
> > added a special casing for 0 return value because that was a possible
> > gup return value when interrupted by fatal signal. This has been fixed
> > by ae46d2aa6a7f ("mm/gup: Let __get_user_pages_locked() return -EINTR
> > for fatal signal") in the mean time so ba841078cd05 can be reverted.
> > This patch however doesn't go all the way to revert it because 0 return
> > value is impossible. We always get an error or 1 for a single page
> > request.
> > 
> > Fixes: ba841078cd05 ("mm/mempolicy: Allow lookup_node() to handle fatal signal")
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  mm/mempolicy.c | 5 +----
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> > 
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 48ba9729062e..1965e2681877 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -927,10 +927,7 @@ static int lookup_node(struct mm_struct *mm, unsigned long addr)
> >  
> >  	int locked = 1;
> >  	err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
> > -	if (err == 0) {
> > -		/* E.g. GUP interrupted by fatal signal */
> > -		err = -EFAULT;
> > -	} else if (err > 0) {
> > +	if (err > 0) {
> >  		err = page_to_nid(p);
> >  		put_page(p);
> >  	}
> 
> Hi, Michal,
> 
> I'm totally not against this, but note that get_user_pages_locked()
> could still return zero. Although I'm not 100% sure now on whether
> npages==0 will be the only case, it won't hurt to keep this ret==0
> check until we consolidate the whole gup code to never return zero.
> 
> Assuming there's another case (even possible for a future gup bug)
> that could return a zero, your patch will let err be anything (which
> you didn't initialize err with your patch), then the function will
> return a random value.  So even if you really want this change, I
> would suggest you initialize err to some error code.

I'm sorry, not a random value, but err=0 will be returned as the mem
policy by lookup_node().

> 
> I just don't see much gain we get from removing that check.
> 
> Thanks,
> 
> -- 
> Peter Xu
Michal Hocko April 9, 2020, 1:53 p.m. UTC | #9
On Thu 09-04-20 08:52:58, Peter Xu wrote:
> On Thu, Apr 09, 2020 at 09:02:53AM +0200, Michal Hocko wrote:
> > This patch has been merged and it is actually wrong after ae46d2aa6a7f
> > has been merged. We can either revert or I suggest just handling >0,
> > like the patch below:
> > 
> > From 03fbe30ec61e65b0927d0d41bccc7dff5f7eafd8 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Thu, 9 Apr 2020 08:26:57 +0200
> > Subject: [PATCH] mm, mempolicy: fix up gup usage in lookup_node
> > 
> > ba841078cd05 ("mm/mempolicy: Allow lookup_node() to handle fatal signal") has
> > added a special casing for 0 return value because that was a possible
> > gup return value when interrupted by fatal signal. This has been fixed
> > by ae46d2aa6a7f ("mm/gup: Let __get_user_pages_locked() return -EINTR
> > for fatal signal") in the mean time so ba841078cd05 can be reverted.
> > This patch however doesn't go all the way to revert it because 0 return
> > value is impossible. We always get an error or 1 for a single page
> > request.
> > 
> > Fixes: ba841078cd05 ("mm/mempolicy: Allow lookup_node() to handle fatal signal")
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  mm/mempolicy.c | 5 +----
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> > 
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 48ba9729062e..1965e2681877 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -927,10 +927,7 @@ static int lookup_node(struct mm_struct *mm, unsigned long addr)
> >  
> >  	int locked = 1;
> >  	err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
> > -	if (err == 0) {
> > -		/* E.g. GUP interrupted by fatal signal */
> > -		err = -EFAULT;
> > -	} else if (err > 0) {
> > +	if (err > 0) {
> >  		err = page_to_nid(p);
> >  		put_page(p);
> >  	}
> 
> Hi, Michal,
> 
> I'm totally not against this, but note that get_user_pages_locked()
> could still return zero. Although I'm not 100% sure now on whether
> npages==0 will be the only case, it won't hurt to keep this ret==0
> check until we consolidate the whole gup code to never return zero.

As we have discussed in other email thread, returning 0 should be really
possible only for an nr_pages == 0. And even in that case we should
rather return EINVAL. I wanted to do that change as well but gup is a
heavily used interface and I do not have time to check all existing
callers.
 
> Assuming there's another case (even possible for a future gup bug)
> that could return a zero, your patch will let err be anything (which
> you didn't initialize err with your patch), then the function will
> return a random value.  So even if you really want this change, I
> would suggest you initialize err to some error code.

I wouldn't really overcomplicate it. If you are worried about future
bugs then we can warn into the log when !err && nr_pages somewher inside
gup code. But let's keep callers as simple as possible. We surely do not
want to check for !err in all users now.

> I just don't see much gain we get from removing that check.

The code clarity is the primary reason.
Linus Torvalds April 9, 2020, 4:42 p.m. UTC | #10
On Thu, Apr 9, 2020 at 12:03 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> This patch however doesn't go all the way to revert it because 0 return
> value is impossible.

I'm not convinced it's impossible. And if it is, then the current code
is harmless.

Now, I do agree that we probably should go through and clarify the
whole range of different get_user_pages() cases of returning zero (or
not doing so), but right now it's so confusing that I'd prefer to keep
that (possibly unnecessary) belt-and-suspenders check for zero in
there.

If/when somebody actually does a real audit and the result is "these
functions cannot return zero" and it's documented, then we can remove
those checks.

Ok?

              Linus
Michal Hocko April 14, 2020, 11:04 a.m. UTC | #11
On Thu 09-04-20 09:42:20, Linus Torvalds wrote:
> On Thu, Apr 9, 2020 at 12:03 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > This patch however doesn't go all the way to revert it because 0 return
> > value is impossible.
> 
> I'm not convinced it's impossible.

__get_user_pages is documented as
 * -- If nr_pages is 0, returns 0.
 * -- If nr_pages is >0, but no pages were pinned, returns -errno.
 * -- If nr_pages is >0, and some pages were pinned, returns the number of
 *    pages pinned. Again, this may be less than nr_pages.

but let me double check the actual code... There seem to be only one
exception the above rule AFAICS. faultin_page returning EBUSY will
be overriden to either 0 for the first page or return the number of
already pinned pages. So nr_pages > 0 && ret = 0 is indeed possible
from __get_user_pages :/ That will be the case only for VM_FAULT_RETRY,
thoug.

Now __get_user_pages_locked behaves differently. It keeps retrying the
fault until it succeeds unless FOLL_NOWAIT is specified. Then it would
return 0. Why we need to return 0 is not really clear to me but it
seem to be a long term behavior. I believe we need to document it.

> And if it is, then the current code is harmless.

Yes from the above it seems that the check is indeed harmless becasue
this path doesn't use FOLL_NOWAIT and so it will never see 0 return.
I find a reference to EINTR confusing so I would still love to change
that.

> Now, I do agree that we probably should go through and clarify the
> whole range of different get_user_pages() cases of returning zero (or
> not doing so), but right now it's so confusing that I'd prefer to keep
> that (possibly unnecessary) belt-and-suspenders check for zero in
> there.
>
> If/when somebody actually does a real audit and the result is "these
> functions cannot return zero" and it's documented, then we can remove
> those checks.

Would you mind this patch instead?

commit bc6c0fa7c7fb5eb54963dca65ae4a62ba04c9efa
Author: Michal Hocko <mhocko@suse.com>
Date:   Thu Apr 9 08:26:57 2020 +0200

    mm, mempolicy: fix up gup usage in lookup_node
    
    ba841078cd05 ("mm/mempolicy: Allow lookup_node() to handle fatal signal") has
    added a special casing for 0 return value because that was a possible
    gup return value when interrupted by fatal signal. This has been fixed
    by ae46d2aa6a7f ("mm/gup: Let __get_user_pages_locked() return -EINTR
    for fatal signal") in the mean time so ba841078cd05 can be reverted.
    
    This patch however doesn't go all the way to revert it because the check
    for 0 is wrong and confusing here. Firstly it is inherently unsafe to
    access the page when get_user_pages_locked returns 0 (aka no page
    returned).
    Fortunatelly this will not happen because get_user_pages_locked will not
    return 0 when nr_pages > 0 unless FOLL_NOWAIT is specified which is not
    the case here. Document this potential error code in gup code while we
    are at it.
    
    Signed-off-by: Michal Hocko <mhocko@suse.com>

diff --git a/mm/gup.c b/mm/gup.c
index 50681f0286de..a8575b880baf 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -980,6 +980,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
  * -- If nr_pages is >0, but no pages were pinned, returns -errno.
  * -- If nr_pages is >0, and some pages were pinned, returns the number of
  *    pages pinned. Again, this may be less than nr_pages.
+ * -- 0 return value is possible when the fault would need to be retried.
  *
  * The caller is responsible for releasing returned @pages, via put_page().
  *
@@ -1247,6 +1248,10 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 }
 EXPORT_SYMBOL_GPL(fixup_user_fault);
 
+/*
+ * Please note that this function, unlike __get_user_pages will not
+ * return 0 for nr_pages > 0 without FOLL_NOWAIT
+ */
 static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
 						struct mm_struct *mm,
 						unsigned long start,
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 48ba9729062e..1965e2681877 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -927,10 +927,7 @@ static int lookup_node(struct mm_struct *mm, unsigned long addr)
 
 	int locked = 1;
 	err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
-	if (err == 0) {
-		/* E.g. GUP interrupted by fatal signal */
-		err = -EFAULT;
-	} else if (err > 0) {
+	if (err > 0) {
 		err = page_to_nid(p);
 		put_page(p);
 	}
Peter Xu April 14, 2020, 1:49 p.m. UTC | #12
On Tue, Apr 14, 2020 at 01:04:29PM +0200, Michal Hocko wrote:

[...]

> @@ -1247,6 +1248,10 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
>  }
>  EXPORT_SYMBOL_GPL(fixup_user_fault);
>  
> +/*
> + * Please note that this function, unlike __get_user_pages will not
> + * return 0 for nr_pages > 0 without FOLL_NOWAIT
> + */
>  static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
>  						struct mm_struct *mm,
>  						unsigned long start,
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 48ba9729062e..1965e2681877 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -927,10 +927,7 @@ static int lookup_node(struct mm_struct *mm, unsigned long addr)
>  
>  	int locked = 1;
>  	err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
> -	if (err == 0) {
> -		/* E.g. GUP interrupted by fatal signal */
> -		err = -EFAULT;
> -	} else if (err > 0) {
> +	if (err > 0) {
>  		err = page_to_nid(p);
>  		put_page(p);
>  	}

Hi, Michal,

IIUC this is not the only place that we check against ret==0 for gup.
For example, the other direct caller of the same function,
get_vaddr_frames(), which will set -EFAULT too if ret==0.  So do we
want to change all the places and don't check against zero explicitly?

I'm now thinking whether this would be good even if we refactored gup
and only allow it to return either >0 as number of page pinned, or <0
for all the rest.  I'm not sure how others will see this, but the
answer is probably the same at least to me as before for this issue.

As a caller, I'll see gup as a black box.  Even if the gup function
guarantees that the retcode won't be zero and documented it, I (as a
caller) will be using that to index page array so I'd still better to
check that value before I do anything (because it's meaningless to
index an array with zero size), and a convertion of "ret==0" -->
"-EFAULT" (or some other failures) in this case still makes sense.
While removing that doesn't help a lot, imho, but instead make it
slightly unsafer.

Maybe that's also why ret==0 hasn't been reworked for years?  Maybe
there is just never a reason strong enough to do that explicitly,
because it's still good to check against ret==0 after all...

Thanks,
Michal Hocko April 14, 2020, 2:18 p.m. UTC | #13
On Tue 14-04-20 09:49:06, Peter Xu wrote:
> On Tue, Apr 14, 2020 at 01:04:29PM +0200, Michal Hocko wrote:
> 
> [...]
> 
> > @@ -1247,6 +1248,10 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
> >  }
> >  EXPORT_SYMBOL_GPL(fixup_user_fault);
> >  
> > +/*
> > + * Please note that this function, unlike __get_user_pages will not
> > + * return 0 for nr_pages > 0 without FOLL_NOWAIT
> > + */
> >  static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
> >  						struct mm_struct *mm,
> >  						unsigned long start,
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 48ba9729062e..1965e2681877 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -927,10 +927,7 @@ static int lookup_node(struct mm_struct *mm, unsigned long addr)
> >  
> >  	int locked = 1;
> >  	err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
> > -	if (err == 0) {
> > -		/* E.g. GUP interrupted by fatal signal */
> > -		err = -EFAULT;
> > -	} else if (err > 0) {
> > +	if (err > 0) {
> >  		err = page_to_nid(p);
> >  		put_page(p);
> >  	}
> 
> Hi, Michal,
> 
> IIUC this is not the only place that we check against ret==0 for gup.
> For example, the other direct caller of the same function,
> get_vaddr_frames(), which will set -EFAULT too if ret==0.  So do we
> want to change all the places and don't check against zero explicitly?

This would require to analyze each such a call. For example
get_vaddr_frames has to handle get_user_pages_locked returning 0 because
it allows callers to specify FOLL_NOWAIT. Whether EFAULT is a proper
return value for that case is a question I didn't really get to analyze.

> I'm now thinking whether this would be good even if we refactored gup
> and only allow it to return either >0 as number of page pinned, or <0
> for all the rest.  I'm not sure how others will see this, but the
> answer is probably the same at least to me as before for this issue.

I would consider a semantic without that special case for FOLL_NOWAIT
much more clear but I do not really understand the historical background
for it TBH so I do not dare to touch that.

> As a caller, I'll see gup as a black box.  Even if the gup function
> guarantees that the retcode won't be zero and documented it, I (as a
> caller) will be using that to index page array so I'd still better to
> check that value before I do anything (because it's meaningless to
> index an array with zero size), and a convertion of "ret==0" -->
> "-EFAULT" (or some other failures) in this case still makes sense.
> While removing that doesn't help a lot, imho, but instead make it
> slightly unsafer.

Well, my experience tells me that people really love to copy&paste code
and error handling and if the error handling is bogus it just spreads
all over the place until it really defines a new standard which is close
to impossible to get rid of. So if the error handling can be done
properly then I would really prefer it. In the above case it is clearly
misleading, because fatal signal should be never reflected by err==0.
Michal Hocko April 20, 2020, 12:47 p.m. UTC | #14
Any opinion on this Linus? Should I just repost the patch?

On Tue 14-04-20 13:04:32, Michal Hocko wrote:
> On Thu 09-04-20 09:42:20, Linus Torvalds wrote:
> > On Thu, Apr 9, 2020 at 12:03 AM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > This patch however doesn't go all the way to revert it because 0 return
> > > value is impossible.
> > 
> > I'm not convinced it's impossible.
> 
> __get_user_pages is documented as
>  * -- If nr_pages is 0, returns 0.
>  * -- If nr_pages is >0, but no pages were pinned, returns -errno.
>  * -- If nr_pages is >0, and some pages were pinned, returns the number of
>  *    pages pinned. Again, this may be less than nr_pages.
> 
> but let me double check the actual code... There seem to be only one
> exception the above rule AFAICS. faultin_page returning EBUSY will
> be overriden to either 0 for the first page or return the number of
> already pinned pages. So nr_pages > 0 && ret = 0 is indeed possible
> from __get_user_pages :/ That will be the case only for VM_FAULT_RETRY,
> thoug.
> 
> Now __get_user_pages_locked behaves differently. It keeps retrying the
> fault until it succeeds unless FOLL_NOWAIT is specified. Then it would
> return 0. Why we need to return 0 is not really clear to me but it
> seem to be a long term behavior. I believe we need to document it.
> 
> > And if it is, then the current code is harmless.
> 
> Yes from the above it seems that the check is indeed harmless becasue
> this path doesn't use FOLL_NOWAIT and so it will never see 0 return.
> I find a reference to EINTR confusing so I would still love to change
> that.
> 
> > Now, I do agree that we probably should go through and clarify the
> > whole range of different get_user_pages() cases of returning zero (or
> > not doing so), but right now it's so confusing that I'd prefer to keep
> > that (possibly unnecessary) belt-and-suspenders check for zero in
> > there.
> >
> > If/when somebody actually does a real audit and the result is "these
> > functions cannot return zero" and it's documented, then we can remove
> > those checks.
> 
> Would you mind this patch instead?
> 
> commit bc6c0fa7c7fb5eb54963dca65ae4a62ba04c9efa
> Author: Michal Hocko <mhocko@suse.com>
> Date:   Thu Apr 9 08:26:57 2020 +0200
> 
>     mm, mempolicy: fix up gup usage in lookup_node
>     
>     ba841078cd05 ("mm/mempolicy: Allow lookup_node() to handle fatal signal") has
>     added a special casing for 0 return value because that was a possible
>     gup return value when interrupted by fatal signal. This has been fixed
>     by ae46d2aa6a7f ("mm/gup: Let __get_user_pages_locked() return -EINTR
>     for fatal signal") in the mean time so ba841078cd05 can be reverted.
>     
>     This patch however doesn't go all the way to revert it because the check
>     for 0 is wrong and confusing here. Firstly it is inherently unsafe to
>     access the page when get_user_pages_locked returns 0 (aka no page
>     returned).
>     Fortunatelly this will not happen because get_user_pages_locked will not
>     return 0 when nr_pages > 0 unless FOLL_NOWAIT is specified which is not
>     the case here. Document this potential error code in gup code while we
>     are at it.
>     
>     Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 50681f0286de..a8575b880baf 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -980,6 +980,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
>   * -- If nr_pages is >0, but no pages were pinned, returns -errno.
>   * -- If nr_pages is >0, and some pages were pinned, returns the number of
>   *    pages pinned. Again, this may be less than nr_pages.
> + * -- 0 return value is possible when the fault would need to be retried.
>   *
>   * The caller is responsible for releasing returned @pages, via put_page().
>   *
> @@ -1247,6 +1248,10 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
>  }
>  EXPORT_SYMBOL_GPL(fixup_user_fault);
>  
> +/*
> + * Please note that this function, unlike __get_user_pages will not
> + * return 0 for nr_pages > 0 without FOLL_NOWAIT
> + */
>  static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
>  						struct mm_struct *mm,
>  						unsigned long start,
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 48ba9729062e..1965e2681877 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -927,10 +927,7 @@ static int lookup_node(struct mm_struct *mm, unsigned long addr)
>  
>  	int locked = 1;
>  	err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
> -	if (err == 0) {
> -		/* E.g. GUP interrupted by fatal signal */
> -		err = -EFAULT;
> -	} else if (err > 0) {
> +	if (err > 0) {
>  		err = page_to_nid(p);
>  		put_page(p);
>  	}
> -- 
> Michal Hocko
> SUSE Labs
Linus Torvalds April 20, 2020, 5:31 p.m. UTC | #15
On Mon, Apr 20, 2020 at 5:48 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> Any opinion on this Linus? Should I just repost the patch?

I'm ok with the patch, but it's not exactly urgent and I was assuming
it would go the usual path through the -mm tree.

                 Linus
Michal Hocko April 21, 2020, 7:09 a.m. UTC | #16
On Mon 20-04-20 10:31:17, Linus Torvalds wrote:
> On Mon, Apr 20, 2020 at 5:48 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > Any opinion on this Linus? Should I just repost the patch?
> 
> I'm ok with the patch, but it's not exactly urgent and I was assuming
> it would go the usual path through the -mm tree.

OK, I will repost then. Thanks!
diff mbox series

Patch

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 5fb427aed612..c7ca6a808fb1 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -897,12 +897,15 @@  static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes)
 
 static int lookup_node(struct mm_struct *mm, unsigned long addr)
 {
-	struct page *p;
+	struct page *p = NULL;
 	int err;
 
 	int locked = 1;
 	err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
-	if (err >= 0) {
+	if (err == 0) {
+		/* E.g. GUP interrupted by fatal signal */
+		err = -EFAULT;
+	} else if (err > 0) {
 		err = page_to_nid(p);
 		put_page(p);
 	}