diff mbox series

[1/4] mm/gup: Add FOLL_INTERRUPTIBLE

Message ID 20220622213656.81546-2-peterx@redhat.com (mailing list archive)
State New
Headers show
Series kvm/mm: Allow GUP to respond to non fatal signals | expand

Commit Message

Peter Xu June 22, 2022, 9:36 p.m. UTC
We have had FAULT_FLAG_INTERRUPTIBLE but it was never applied to GUPs.  One
issue with it is that not all GUP paths are able to handle signal delivers
besides SIGKILL.

That's not ideal for the GUP users who are actually able to handle these
cases, like KVM.

KVM uses GUP extensively on faulting guest pages, during which we've got
existing infrastructures to retry a page fault at a later time.  Allowing
the GUP to be interrupted by generic signals can make KVM related threads
to be more responsive.  For examples:

  (1) SIGUSR1: which QEMU/KVM uses to deliver an inter-process IPI,
      e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be
      generated to kick the vcpus out of kernel context immediately,

  (2) SIGINT: which can be used with interactive hypervisor users to stop a
      virtual machine with Ctrl-C without any delays/hangs,

  (3) SIGTRAP: which grants GDB capability even during page faults that are
      stuck for a long time.

Normally hypervisor will be able to receive these signals properly, but not
if we're stuck in a GUP for a long time for whatever reason.  It happens
easily with a stucked postcopy migration when e.g. a network temp failure
happens, then some vcpu threads can hang death waiting for the pages.  With
the new FOLL_INTERRUPTIBLE, we can allow GUP users like KVM to selectively
enable the ability to trap these signals.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/mm.h |  1 +
 mm/gup.c           | 33 +++++++++++++++++++++++++++++----
 2 files changed, 30 insertions(+), 4 deletions(-)

Comments

Jason Gunthorpe June 25, 2022, 12:35 a.m. UTC | #1
On Wed, Jun 22, 2022 at 05:36:53PM -0400, Peter Xu wrote:
> We have had FAULT_FLAG_INTERRUPTIBLE but it was never applied to GUPs.  One
> issue with it is that not all GUP paths are able to handle signal delivers
> besides SIGKILL.
> 
> That's not ideal for the GUP users who are actually able to handle these
> cases, like KVM.
> 
> KVM uses GUP extensively on faulting guest pages, during which we've got
> existing infrastructures to retry a page fault at a later time.  Allowing
> the GUP to be interrupted by generic signals can make KVM related threads
> to be more responsive.  For examples:
> 
>   (1) SIGUSR1: which QEMU/KVM uses to deliver an inter-process IPI,
>       e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be
>       generated to kick the vcpus out of kernel context immediately,
> 
>   (2) SIGINT: which can be used with interactive hypervisor users to stop a
>       virtual machine with Ctrl-C without any delays/hangs,
> 
>   (3) SIGTRAP: which grants GDB capability even during page faults that are
>       stuck for a long time.
> 
> Normally hypervisor will be able to receive these signals properly, but not
> if we're stuck in a GUP for a long time for whatever reason.  It happens
> easily with a stucked postcopy migration when e.g. a network temp failure
> happens, then some vcpu threads can hang death waiting for the pages.  With
> the new FOLL_INTERRUPTIBLE, we can allow GUP users like KVM to selectively
> enable the ability to trap these signals.

Can you talk abit about what is required to use this new interface
correctly?

Lots of GUP callers are in simple system call contexts (like ioctl),
can/should they set this flag and if so what else do they need to do?

Thanks,
Jason
Peter Xu June 25, 2022, 1:23 a.m. UTC | #2
Hi, Jason,

On Fri, Jun 24, 2022 at 09:35:54PM -0300, Jason Gunthorpe wrote:
> Can you talk abit about what is required to use this new interface
> correctly?
> 
> Lots of GUP callers are in simple system call contexts (like ioctl),
> can/should they set this flag and if so what else do they need to do?

Thanks for taking a look.

IMHO the major thing required is the caller can handle the case when GUP
returned (1) less page than expected, and (2) -EINTR returns.

For the -EINTR case, IIUC ideally in an ioctl context we should better
deliver it back to user app this -EINTR (while do cleanups gracefully),
rather than returning anything else (e.g. converting it to -EFAULT or
something else).

But note that FAULT_FLAG_INTERRUPTIBLE is only used in an userfaultfd
context (aka, userfaultfd_get_blocking_state()).  For example, if we hang
at lock_page() (if not go into whether hanging at lock_page makes sense or
not at all.. it really sounds like a bug) and we receive a non-fatal
signal, we won't be able to be scheduled for that since lock_page() uses
TASK_UNINTERRUPTIBLE always.

I think it's a separate problem on whether we should extend the usage of
FAULT_FLAG_INTERRUPTIBLE to things like lock_page() (and probably not..),
and currently it does solve a major issue regarding postcopy hanging on
pages for hypervisor use case.  Hopefully that still justifies this plumber
work to enable the interruptible cap to GUP layer.

If to go back to the original question with a shorter answer: if the ioctl
context that GUP upon a page that will never be with a uffd context, then
it's probably not gonna help at all.. at least not before we use
FAULT_FLAG_INTERRUPTIBLE outside uffd page fault handling.

Thanks,
Jason Gunthorpe June 25, 2022, 11:59 p.m. UTC | #3
On Fri, Jun 24, 2022 at 09:23:04PM -0400, Peter Xu wrote:
> If to go back to the original question with a shorter answer: if the ioctl
> context that GUP upon a page that will never be with a uffd context, then
> it's probably not gonna help at all.. at least not before we use
> FAULT_FLAG_INTERRUPTIBLE outside uffd page fault handling.

I think I would be more interested in this if it could abort a swap
in, for instance. Doesn't this happen if it flows the interruptible
flag into the VMA's fault handler?

Jason
Peter Xu June 27, 2022, 3:29 p.m. UTC | #4
On Sat, Jun 25, 2022 at 08:59:04PM -0300, Jason Gunthorpe wrote:
> On Fri, Jun 24, 2022 at 09:23:04PM -0400, Peter Xu wrote:
> > If to go back to the original question with a shorter answer: if the ioctl
> > context that GUP upon a page that will never be with a uffd context, then
> > it's probably not gonna help at all.. at least not before we use
> > FAULT_FLAG_INTERRUPTIBLE outside uffd page fault handling.
> 
> I think I would be more interested in this if it could abort a swap
> in, for instance. Doesn't this happen if it flows the interruptible
> flag into the VMA's fault handler?

The idea makes sense, but it doesn't work like that right now, afaict. We
need to teach lock_page_or_retry() to be able to consume the flag as
discussed.

I don't see a major blocker for it if lock_page_or_retry() is the only one
we'd like to touch.  Say, the only caller currently is do_swap_page()
(there's also remove_device_exclusive_entry() but just deeper in the
stack). Looks doable so far but I'll need to think about it..  I can keep
you updated if I get something, but it'll be separate from this patchset.

Thanks,
John Hubbard June 28, 2022, 2:07 a.m. UTC | #5
On 6/22/22 14:36, Peter Xu wrote:
> We have had FAULT_FLAG_INTERRUPTIBLE but it was never applied to GUPs.  One
> issue with it is that not all GUP paths are able to handle signal delivers
> besides SIGKILL.
> 
> That's not ideal for the GUP users who are actually able to handle these
> cases, like KVM.
> 
> KVM uses GUP extensively on faulting guest pages, during which we've got
> existing infrastructures to retry a page fault at a later time.  Allowing
> the GUP to be interrupted by generic signals can make KVM related threads
> to be more responsive.  For examples:
> 
>    (1) SIGUSR1: which QEMU/KVM uses to deliver an inter-process IPI,
>        e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be
>        generated to kick the vcpus out of kernel context immediately,
> 
>    (2) SIGINT: which can be used with interactive hypervisor users to stop a
>        virtual machine with Ctrl-C without any delays/hangs,
> 
>    (3) SIGTRAP: which grants GDB capability even during page faults that are
>        stuck for a long time.
> 
> Normally hypervisor will be able to receive these signals properly, but not
> if we're stuck in a GUP for a long time for whatever reason.  It happens
> easily with a stucked postcopy migration when e.g. a network temp failure
> happens, then some vcpu threads can hang death waiting for the pages.  With
> the new FOLL_INTERRUPTIBLE, we can allow GUP users like KVM to selectively
> enable the ability to trap these signals.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   include/linux/mm.h |  1 +
>   mm/gup.c           | 33 +++++++++++++++++++++++++++++----
>   2 files changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index bc8f326be0ce..ebdf8a6b86c1 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2941,6 +2941,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
>   #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
>   #define FOLL_PIN	0x40000	/* pages must be released via unpin_user_page */
>   #define FOLL_FAST_ONLY	0x80000	/* gup_fast: prevent fall-back to slow gup */
> +#define FOLL_INTERRUPTIBLE  0x100000 /* allow interrupts from generic signals */

Perhaps, s/generic/non-fatal/ ?
>   
>   /*
>    * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
> diff --git a/mm/gup.c b/mm/gup.c
> index 551264407624..ad74b137d363 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -933,8 +933,17 @@ static int faultin_page(struct vm_area_struct *vma,
>   		fault_flags |= FAULT_FLAG_WRITE;
>   	if (*flags & FOLL_REMOTE)
>   		fault_flags |= FAULT_FLAG_REMOTE;
> -	if (locked)
> +	if (locked) {
>   		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> +		/*
> +		 * We should only grant FAULT_FLAG_INTERRUPTIBLE when we're
> +		 * (at least) killable.  It also mostly means we're not
> +		 * with NOWAIT.  Otherwise ignore FOLL_INTERRUPTIBLE since
> +		 * it won't make a lot of sense to be used alone.
> +		 */

This comment seems a little confusing due to its location. We've just
checked "locked", but the comment is talking about other constraints.

Not sure what to suggest. Maybe move it somewhere else?

> +		if (*flags & FOLL_INTERRUPTIBLE)
> +			fault_flags |= FAULT_FLAG_INTERRUPTIBLE;
> +	}
>   	if (*flags & FOLL_NOWAIT)
>   		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;
>   	if (*flags & FOLL_TRIED) {
> @@ -1322,6 +1331,22 @@ int fixup_user_fault(struct mm_struct *mm,
>   }
>   EXPORT_SYMBOL_GPL(fixup_user_fault);
>   
> +/*
> + * GUP always responds to fatal signals.  When FOLL_INTERRUPTIBLE is
> + * specified, it'll also respond to generic signals.  The caller of GUP
> + * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption.
> + */
> +static bool gup_signal_pending(unsigned int flags)
> +{
> +	if (fatal_signal_pending(current))
> +		return true;
> +
> +	if (!(flags & FOLL_INTERRUPTIBLE))
> +		return false;
> +
> +	return signal_pending(current);
> +}
> +

OK.

>   /*
>    * Please note that this function, unlike __get_user_pages will not
>    * return 0 for nr_pages > 0 without FOLL_NOWAIT
> @@ -1403,11 +1428,11 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
>   		 * Repeat on the address that fired VM_FAULT_RETRY
>   		 * with both FAULT_FLAG_ALLOW_RETRY and
>   		 * FAULT_FLAG_TRIED.  Note that GUP can be interrupted
> -		 * by fatal signals, so we need to check it before we
> +		 * by fatal signals of even common signals, depending on
> +		 * the caller's request. So we need to check it before we
>   		 * start trying again otherwise it can loop forever.
>   		 */
> -
> -		if (fatal_signal_pending(current)) {
> +		if (gup_signal_pending(flags)) {

This is new and bold. :) Signals that an application was prepared to
handle can now cause gup to quit early. I wonder if that will break any
use cases out there (SIGPIPE...) ?

Generally, gup callers handle failures pretty well, so it's probably
not too bad. But I wanted to mention the idea that handled interrupts
might be a little surprising here.

thanks,
Peter Xu June 28, 2022, 7:31 p.m. UTC | #6
Hi, John,

Thanks for your comments!

On Mon, Jun 27, 2022 at 07:07:28PM -0700, John Hubbard wrote:

[...]

> > @@ -2941,6 +2941,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
> >   #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
> >   #define FOLL_PIN	0x40000	/* pages must be released via unpin_user_page */
> >   #define FOLL_FAST_ONLY	0x80000	/* gup_fast: prevent fall-back to slow gup */
> > +#define FOLL_INTERRUPTIBLE  0x100000 /* allow interrupts from generic signals */
> 
> Perhaps, s/generic/non-fatal/ ?

Sure.

> > diff --git a/mm/gup.c b/mm/gup.c
> > index 551264407624..ad74b137d363 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -933,8 +933,17 @@ static int faultin_page(struct vm_area_struct *vma,
> >   		fault_flags |= FAULT_FLAG_WRITE;
> >   	if (*flags & FOLL_REMOTE)
> >   		fault_flags |= FAULT_FLAG_REMOTE;
> > -	if (locked)
> > +	if (locked) {
> >   		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> > +		/*
> > +		 * We should only grant FAULT_FLAG_INTERRUPTIBLE when we're
> > +		 * (at least) killable.  It also mostly means we're not
> > +		 * with NOWAIT.  Otherwise ignore FOLL_INTERRUPTIBLE since
> > +		 * it won't make a lot of sense to be used alone.
> > +		 */
> 
> This comment seems a little confusing due to its location. We've just
> checked "locked", but the comment is talking about other constraints.
> 
> Not sure what to suggest. Maybe move it somewhere else?

I put it here to be after FAULT_FLAG_KILLABLE we just set.

Only if we have "locked" will we set FAULT_FLAG_KILLABLE.  That's also the
key we grant "killable" attribute to this GUP.  So I thought it'll be good
to put here because I want to have FOLL_INTERRUPTIBLE dependent on "locked"
being set.

> 
> > +		if (*flags & FOLL_INTERRUPTIBLE)
> > +			fault_flags |= FAULT_FLAG_INTERRUPTIBLE;
> > +	}
> >   	if (*flags & FOLL_NOWAIT)
> >   		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;
> >   	if (*flags & FOLL_TRIED) {
> > @@ -1322,6 +1331,22 @@ int fixup_user_fault(struct mm_struct *mm,
> >   }
> >   EXPORT_SYMBOL_GPL(fixup_user_fault);
> > +/*
> > + * GUP always responds to fatal signals.  When FOLL_INTERRUPTIBLE is
> > + * specified, it'll also respond to generic signals.  The caller of GUP
> > + * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption.
> > + */
> > +static bool gup_signal_pending(unsigned int flags)
> > +{
> > +	if (fatal_signal_pending(current))
> > +		return true;
> > +
> > +	if (!(flags & FOLL_INTERRUPTIBLE))
> > +		return false;
> > +
> > +	return signal_pending(current);
> > +}
> > +
> 
> OK.
> 
> >   /*
> >    * Please note that this function, unlike __get_user_pages will not
> >    * return 0 for nr_pages > 0 without FOLL_NOWAIT
> > @@ -1403,11 +1428,11 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
> >   		 * Repeat on the address that fired VM_FAULT_RETRY
> >   		 * with both FAULT_FLAG_ALLOW_RETRY and
> >   		 * FAULT_FLAG_TRIED.  Note that GUP can be interrupted
> > -		 * by fatal signals, so we need to check it before we
> > +		 * by fatal signals of even common signals, depending on
> > +		 * the caller's request. So we need to check it before we
> >   		 * start trying again otherwise it can loop forever.
> >   		 */
> > -
> > -		if (fatal_signal_pending(current)) {
> > +		if (gup_signal_pending(flags)) {
> 
> This is new and bold. :) Signals that an application was prepared to
> handle can now cause gup to quit early. I wonder if that will break any
> use cases out there (SIGPIPE...) ?

Note: I introduced the new FOLL_INTERRUPTIBLE flag, so only if the caller
explicitly passing in that flag could there be a functional change.

IOW, no functional change intended for this single patch, not before I
start to let KVM code passing over that flag.

> 
> Generally, gup callers handle failures pretty well, so it's probably
> not too bad. But I wanted to mention the idea that handled interrupts
> might be a little surprising here.

Yes as I mentioned anyway it'll be an opt-in flag, so by default we don't
need to worry at all, IMHO, because it should really work exactly like
before, otherwise I had a bug somewhere else.. :)

Thanks,
John Hubbard June 28, 2022, 9:40 p.m. UTC | #7
On 6/28/22 12:31, Peter Xu wrote:
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index 551264407624..ad74b137d363 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -933,8 +933,17 @@ static int faultin_page(struct vm_area_struct *vma,
>>>    		fault_flags |= FAULT_FLAG_WRITE;
>>>    	if (*flags & FOLL_REMOTE)
>>>    		fault_flags |= FAULT_FLAG_REMOTE;
>>> -	if (locked)
>>> +	if (locked) {
>>>    		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
>>> +		/*
>>> +		 * We should only grant FAULT_FLAG_INTERRUPTIBLE when we're
>>> +		 * (at least) killable.  It also mostly means we're not
>>> +		 * with NOWAIT.  Otherwise ignore FOLL_INTERRUPTIBLE since
>>> +		 * it won't make a lot of sense to be used alone.
>>> +		 */
>>
>> This comment seems a little confusing due to its location. We've just
>> checked "locked", but the comment is talking about other constraints.
>>
>> Not sure what to suggest. Maybe move it somewhere else?
> 
> I put it here to be after FAULT_FLAG_KILLABLE we just set.
> 
> Only if we have "locked" will we set FAULT_FLAG_KILLABLE.  That's also the
> key we grant "killable" attribute to this GUP.  So I thought it'll be good
> to put here because I want to have FOLL_INTERRUPTIBLE dependent on "locked"
> being set.
> 

The key point is the connection between "locked" and killable. If the comment
explained why "locked" means "killable", that would help clear this up. The
NOWAIT sentence is also confusing to me, and adding "mostly NOWAIT" does not
clear it up either... :)


>> Generally, gup callers handle failures pretty well, so it's probably
>> not too bad. But I wanted to mention the idea that handled interrupts
>> might be a little surprising here.
> 
> Yes as I mentioned anyway it'll be an opt-in flag, so by default we don't
> need to worry at all, IMHO, because it should really work exactly like
> before, otherwise I had a bug somewhere else.. :)
> 

Yes, that's true. OK then.

thanks,
Peter Xu June 28, 2022, 10:33 p.m. UTC | #8
On Tue, Jun 28, 2022 at 02:40:53PM -0700, John Hubbard wrote:
> On 6/28/22 12:31, Peter Xu wrote:
> > > > diff --git a/mm/gup.c b/mm/gup.c
> > > > index 551264407624..ad74b137d363 100644
> > > > --- a/mm/gup.c
> > > > +++ b/mm/gup.c
> > > > @@ -933,8 +933,17 @@ static int faultin_page(struct vm_area_struct *vma,
> > > >    		fault_flags |= FAULT_FLAG_WRITE;
> > > >    	if (*flags & FOLL_REMOTE)
> > > >    		fault_flags |= FAULT_FLAG_REMOTE;
> > > > -	if (locked)
> > > > +	if (locked) {
> > > >    		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> > > > +		/*
> > > > +		 * We should only grant FAULT_FLAG_INTERRUPTIBLE when we're
> > > > +		 * (at least) killable.  It also mostly means we're not
> > > > +		 * with NOWAIT.  Otherwise ignore FOLL_INTERRUPTIBLE since
> > > > +		 * it won't make a lot of sense to be used alone.
> > > > +		 */
> > > 
> > > This comment seems a little confusing due to its location. We've just
> > > checked "locked", but the comment is talking about other constraints.
> > > 
> > > Not sure what to suggest. Maybe move it somewhere else?
> > 
> > I put it here to be after FAULT_FLAG_KILLABLE we just set.
> > 
> > Only if we have "locked" will we set FAULT_FLAG_KILLABLE.  That's also the
> > key we grant "killable" attribute to this GUP.  So I thought it'll be good
> > to put here because I want to have FOLL_INTERRUPTIBLE dependent on "locked"
> > being set.
> > 
> 
> The key point is the connection between "locked" and killable. If the comment
> explained why "locked" means "killable", that would help clear this up. The
> NOWAIT sentence is also confusing to me, and adding "mostly NOWAIT" does not
> clear it up either... :)

Sorry to have a comment that makes it feels confusing.  I tried to
explicitly put the comment to be after setting FAULT_FLAG_KILLABLE but
obviously I didn't do my job well..

Maybe that NOWAIT thing adds more complexity but not even necessary.

Would below one more acceptable?

		/*
		 * We'll only be able to respond to signals when "locked !=
		 * NULL".  When with it, we'll always respond to SIGKILL
		 * (as implied by FAULT_FLAG_KILLABLE above), and we'll
		 * respond to non-fatal signals only if the GUP user has
		 * specified FOLL_INTERRUPTIBLE.
		 */

Thanks,
John Hubbard June 29, 2022, 12:31 a.m. UTC | #9
On 6/28/22 15:33, Peter Xu wrote:
>> The key point is the connection between "locked" and killable. If the comment
>> explained why "locked" means "killable", that would help clear this up. The
>> NOWAIT sentence is also confusing to me, and adding "mostly NOWAIT" does not
>> clear it up either... :)
> 
> Sorry to have a comment that makes it feels confusing.  I tried to
> explicitly put the comment to be after setting FAULT_FLAG_KILLABLE but
> obviously I didn't do my job well..
> 
> Maybe that NOWAIT thing adds more complexity but not even necessary.
> 
> Would below one more acceptable?
> 
> 		/*
> 		 * We'll only be able to respond to signals when "locked !=
> 		 * NULL".  When with it, we'll always respond to SIGKILL
> 		 * (as implied by FAULT_FLAG_KILLABLE above), and we'll
> 		 * respond to non-fatal signals only if the GUP user has
> 		 * specified FOLL_INTERRUPTIBLE.
> 		 */


It looks like part of this comment is trying to document a pre-existing
concept, which is that faultin_page() only ever sets FAULT_FLAG_KILLABLE
if locked != NULL. The problem I am (personally) having is that I don't
yet understand why or how those are connected: what is it about having
locked non-NULL that means the process is killable? (Can you explain why
that is?)

If that were clear, I think I could suggest a good comment wording.




thanks,
Peter Xu June 29, 2022, 3:47 p.m. UTC | #10
On Tue, Jun 28, 2022 at 05:31:43PM -0700, John Hubbard wrote:
> On 6/28/22 15:33, Peter Xu wrote:
> > > The key point is the connection between "locked" and killable. If the comment
> > > explained why "locked" means "killable", that would help clear this up. The
> > > NOWAIT sentence is also confusing to me, and adding "mostly NOWAIT" does not
> > > clear it up either... :)
> > 
> > Sorry to have a comment that makes it feels confusing.  I tried to
> > explicitly put the comment to be after setting FAULT_FLAG_KILLABLE but
> > obviously I didn't do my job well..
> > 
> > Maybe that NOWAIT thing adds more complexity but not even necessary.
> > 
> > Would below one more acceptable?
> > 
> > 		/*
> > 		 * We'll only be able to respond to signals when "locked !=
> > 		 * NULL".  When with it, we'll always respond to SIGKILL
> > 		 * (as implied by FAULT_FLAG_KILLABLE above), and we'll
> > 		 * respond to non-fatal signals only if the GUP user has
> > 		 * specified FOLL_INTERRUPTIBLE.
> > 		 */
> 
> 
> It looks like part of this comment is trying to document a pre-existing
> concept, which is that faultin_page() only ever sets FAULT_FLAG_KILLABLE
> if locked != NULL.

I'd say that's not what I wanted to comment.. I wanted to express that
INTERRUPTIBLE should rely on KILLABLE, that's also why I put the comment to
be after KILLABLE, not before.  IMHO it makes sense already to have
"interruptible" only if "killable", no matter what's the pre-requisite for
KILLABLE (in this case it's having "locked" being non-null).

> The problem I am (personally) having is that I don't yet understand why
> or how those are connected: what is it about having locked non-NULL that
> means the process is killable? (Can you explain why that is?)

Firstly RETRY_KILLABLE relies on ALLOW_RETRY, because if we don't allow
retry at all it means we'll never wait in handle_mm_fault() anyway, then no
need to worry on being interrupted by any kind of signal (fatal or not).

Then if we allow retry, we need some way to know "whether mmap_sem is
released or not" during the process for the caller (because the caller
cannot see VM_FAULT_RETRY).  That's why we added "locked" parameter, so
that we can set *locked=false to tell the caller we have released mmap_sem.

I think that's why we have "locked" defined as "we allow this page fault
request to retry and wait, during wait we can always allow fatal signals".
I think that's defined throughout the gup call interfaces too, and
faultin_page() is the last step to talk to handle_mm_fault().

To make this whole picture complete, NOWAIT is another thing that relies on
ALLOW_RETRY but just to tell "oh please never release the mmap_sem at all".
For example, when we want to make sure no vma will be released after
faultin_page() returned.

> 
> If that were clear, I think I could suggest a good comment wording.

IMHO it's a little bit weird to explain "locked" here, especially after
KILLABLE is set, that's why I didn't try to mention "locked" in my 2nd
attempt.  There are some comments for "locked" above the definition of
faultin_page(), I think that'll be a nicer place to enrich explanations for
"locked", and it seems even more suitable as a separate patch?

Thanks,
John Hubbard June 30, 2022, 1:53 a.m. UTC | #11
On 6/29/22 08:47, Peter Xu wrote:
>> It looks like part of this comment is trying to document a pre-existing
>> concept, which is that faultin_page() only ever sets FAULT_FLAG_KILLABLE
>> if locked != NULL.
> 
> I'd say that's not what I wanted to comment.. I wanted to express that
> INTERRUPTIBLE should rely on KILLABLE, that's also why I put the comment to
> be after KILLABLE, not before.  IMHO it makes sense already to have
> "interruptible" only if "killable", no matter what's the pre-requisite for
> KILLABLE (in this case it's having "locked" being non-null).
>

OK, I think I finally understand both the intention of the comment,
and (thanks to your notes, below) the interaction between *locked and
_RETRY, _KILLABLE, and _INTERRUPTIBLE. Really appreciate your leading
me by the nose through that. The pre-existing code is abusing *locked
a bit, by treating it as a flag when really it is a side effect of
flags, but at least now that's clear to me.

Anyway...this leads to finally getting into the comment, which I now
think is not quite what we want: there is no need for a hierarchy of
"_INTERRUPTIBLE should depend upon _KILLABLE". That is: even though an
application allows a fatal signal to get through, it's not clear to me
that that implies that non-fatal signal handling should be prevented.

The code is only vaguely enforcing such a thing, because it just so
happens that both cases require the same basic prerequisites. So the
code looks good, but I don't see a need to claim a hierarchy in the
comments.

So I'd either delete the comment entirely, or go with something that is
doesn't try to talk about hierarchy nor locked/retry either. Does this
look reasonable to you:


	/*
	 * FAULT_FLAG_INTERRUPTIBLE is opt-in: kernel callers must set
	 * FOLL_INTERRUPTIBLE. That's because some callers may not be
	 * prepared to handle early exits caused by non-fatal signals.
	 */

?

>> The problem I am (personally) having is that I don't yet understand why
>> or how those are connected: what is it about having locked non-NULL that
>> means the process is killable? (Can you explain why that is?)
> 
> Firstly RETRY_KILLABLE relies on ALLOW_RETRY, because if we don't allow
> retry at all it means we'll never wait in handle_mm_fault() anyway, then no
> need to worry on being interrupted by any kind of signal (fatal or not).
> 
> Then if we allow retry, we need some way to know "whether mmap_sem is
> released or not" during the process for the caller (because the caller
> cannot see VM_FAULT_RETRY).  That's why we added "locked" parameter, so
> that we can set *locked=false to tell the caller we have released mmap_sem.
> 
> I think that's why we have "locked" defined as "we allow this page fault
> request to retry and wait, during wait we can always allow fatal signals".
> I think that's defined throughout the gup call interfaces too, and
> faultin_page() is the last step to talk to handle_mm_fault().
> 
> To make this whole picture complete, NOWAIT is another thing that relies on
> ALLOW_RETRY but just to tell "oh please never release the mmap_sem at all".
> For example, when we want to make sure no vma will be released after
> faultin_page() returned.
> 

Again, thanks for taking the time to explain that for me. :)

>>
>> If that were clear, I think I could suggest a good comment wording.
> 
> IMHO it's a little bit weird to explain "locked" here, especially after
> KILLABLE is set, that's why I didn't try to mention "locked" in my 2nd
> attempt.  There are some comments for "locked" above the definition of
> faultin_page(), I think that'll be a nicer place to enrich explanations for
> "locked", and it seems even more suitable as a separate patch?
> 

Totally agreed. I didn't intend to ask for that kind of documentation
here.

For that, I'm thinking a combination of cleaning up *locked a little
bit, plus maybe some higher level notes like what you wrote above, added
to either pin_user_pages.rst or a new get_user_pages.rst or some .rst
anyway. Definitely a separately thing.


thanks,
Peter Xu June 30, 2022, 1:49 p.m. UTC | #12
On Wed, Jun 29, 2022 at 06:53:30PM -0700, John Hubbard wrote:
> On 6/29/22 08:47, Peter Xu wrote:
> > > It looks like part of this comment is trying to document a pre-existing
> > > concept, which is that faultin_page() only ever sets FAULT_FLAG_KILLABLE
> > > if locked != NULL.
> > 
> > I'd say that's not what I wanted to comment.. I wanted to express that
> > INTERRUPTIBLE should rely on KILLABLE, that's also why I put the comment to
> > be after KILLABLE, not before.  IMHO it makes sense already to have
> > "interruptible" only if "killable", no matter what's the pre-requisite for
> > KILLABLE (in this case it's having "locked" being non-null).
> > 
> 
> OK, I think I finally understand both the intention of the comment,
> and (thanks to your notes, below) the interaction between *locked and
> _RETRY, _KILLABLE, and _INTERRUPTIBLE. Really appreciate your leading
> me by the nose through that. The pre-existing code is abusing *locked
> a bit, by treating it as a flag when really it is a side effect of
> flags, but at least now that's clear to me.

I agree, alternatively we could have some other FOLL_ flags to represent
"locked != NULL" and do sanity check to make sure when the flag is there
locked is always set correctly.  Current code is a more "dense" way to do
this, even though it could be slightly harder to follow.

> 
> Anyway...this leads to finally getting into the comment, which I now
> think is not quite what we want: there is no need for a hierarchy of
> "_INTERRUPTIBLE should depend upon _KILLABLE". That is: even though an
> application allows a fatal signal to get through, it's not clear to me
> that that implies that non-fatal signal handling should be prevented.
> 
> The code is only vaguely enforcing such a thing, because it just so
> happens that both cases require the same basic prerequisites. So the
> code looks good, but I don't see a need to claim a hierarchy in the
> comments.
> 
> So I'd either delete the comment entirely, or go with something that is
> doesn't try to talk about hierarchy nor locked/retry either. Does this
> look reasonable to you:
> 
> 
> 	/*
> 	 * FAULT_FLAG_INTERRUPTIBLE is opt-in: kernel callers must set
> 	 * FOLL_INTERRUPTIBLE. That's because some callers may not be
> 	 * prepared to handle early exits caused by non-fatal signals.
> 	 */
> 
> ?

Looks good to me, I'd tune a bit to make it less ambiguous on a few places:

		/*
		 * FAULT_FLAG_INTERRUPTIBLE is opt-in. GUP callers must set
		 * FOLL_INTERRUPTIBLE to enable FAULT_FLAG_INTERRUPTIBLE.
		 * That's because some callers may not be prepared to
		 * handle early exits caused by non-fatal signals.
		 */

Would that be okay to you?

> 
> > > The problem I am (personally) having is that I don't yet understand why
> > > or how those are connected: what is it about having locked non-NULL that
> > > means the process is killable? (Can you explain why that is?)
> > 
> > Firstly RETRY_KILLABLE relies on ALLOW_RETRY, because if we don't allow
> > retry at all it means we'll never wait in handle_mm_fault() anyway, then no
> > need to worry on being interrupted by any kind of signal (fatal or not).
> > 
> > Then if we allow retry, we need some way to know "whether mmap_sem is
> > released or not" during the process for the caller (because the caller
> > cannot see VM_FAULT_RETRY).  That's why we added "locked" parameter, so
> > that we can set *locked=false to tell the caller we have released mmap_sem.
> > 
> > I think that's why we have "locked" defined as "we allow this page fault
> > request to retry and wait, during wait we can always allow fatal signals".
> > I think that's defined throughout the gup call interfaces too, and
> > faultin_page() is the last step to talk to handle_mm_fault().
> > 
> > To make this whole picture complete, NOWAIT is another thing that relies on
> > ALLOW_RETRY but just to tell "oh please never release the mmap_sem at all".
> > For example, when we want to make sure no vma will be released after
> > faultin_page() returned.
> > 
> 
> Again, thanks for taking the time to explain that for me. :)

My thanks for reviewing!

> 
> > > 
> > > If that were clear, I think I could suggest a good comment wording.
> > 
> > IMHO it's a little bit weird to explain "locked" here, especially after
> > KILLABLE is set, that's why I didn't try to mention "locked" in my 2nd
> > attempt.  There are some comments for "locked" above the definition of
> > faultin_page(), I think that'll be a nicer place to enrich explanations for
> > "locked", and it seems even more suitable as a separate patch?
> > 
> 
> Totally agreed. I didn't intend to ask for that kind of documentation
> here.
> 
> For that, I'm thinking a combination of cleaning up *locked a little
> bit, plus maybe some higher level notes like what you wrote above, added
> to either pin_user_pages.rst or a new get_user_pages.rst or some .rst
> anyway. Definitely a separately thing.

Sounds good.

Thanks,
John Hubbard June 30, 2022, 7:01 p.m. UTC | #13
On 6/30/22 06:49, Peter Xu wrote:
> Looks good to me, I'd tune a bit to make it less ambiguous on a few places:
> 
> 		/*
> 		 * FAULT_FLAG_INTERRUPTIBLE is opt-in. GUP callers must set
> 		 * FOLL_INTERRUPTIBLE to enable FAULT_FLAG_INTERRUPTIBLE.
> 		 * That's because some callers may not be prepared to
> 		 * handle early exits caused by non-fatal signals.
> 		 */
> 
> Would that be okay to you?
> 

Yes, looks good. With that change, please feel free to add:

Reviewed-by: John Hubbard <jhubbard@nvidia.com>



thanks,
Peter Xu June 30, 2022, 9:27 p.m. UTC | #14
On Thu, Jun 30, 2022 at 12:01:53PM -0700, John Hubbard wrote:
> On 6/30/22 06:49, Peter Xu wrote:
> > Looks good to me, I'd tune a bit to make it less ambiguous on a few places:
> > 
> > 		/*
> > 		 * FAULT_FLAG_INTERRUPTIBLE is opt-in. GUP callers must set
> > 		 * FOLL_INTERRUPTIBLE to enable FAULT_FLAG_INTERRUPTIBLE.
> > 		 * That's because some callers may not be prepared to
> > 		 * handle early exits caused by non-fatal signals.
> > 		 */
> > 
> > Would that be okay to you?
> > 
> 
> Yes, looks good. With that change, please feel free to add:
> 
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>

Will do, thanks!
Matthew Wilcox July 4, 2022, 10:48 p.m. UTC | #15
On Wed, Jun 22, 2022 at 05:36:53PM -0400, Peter Xu wrote:
> +/*
> + * GUP always responds to fatal signals.  When FOLL_INTERRUPTIBLE is
> + * specified, it'll also respond to generic signals.  The caller of GUP
> + * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption.
> + */
> +static bool gup_signal_pending(unsigned int flags)
> +{
> +	if (fatal_signal_pending(current))
> +		return true;
> +
> +	if (!(flags & FOLL_INTERRUPTIBLE))
> +		return false;
> +
> +	return signal_pending(current);
> +}

This should resemble signal_pending_state() more closely, if indeed not
be a wrapper of signal_pending_state().
Peter Xu July 7, 2022, 3:06 p.m. UTC | #16
On Mon, Jul 04, 2022 at 11:48:17PM +0100, Matthew Wilcox wrote:
> On Wed, Jun 22, 2022 at 05:36:53PM -0400, Peter Xu wrote:
> > +/*
> > + * GUP always responds to fatal signals.  When FOLL_INTERRUPTIBLE is
> > + * specified, it'll also respond to generic signals.  The caller of GUP
> > + * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption.
> > + */
> > +static bool gup_signal_pending(unsigned int flags)
> > +{
> > +	if (fatal_signal_pending(current))
> > +		return true;
> > +
> > +	if (!(flags & FOLL_INTERRUPTIBLE))
> > +		return false;
> > +
> > +	return signal_pending(current);
> > +}
> 
> This should resemble signal_pending_state() more closely, if indeed not
> be a wrapper of signal_pending_state().

Could you be more specific?  Note that the only thing that should affect
the signal handling here is FOLL_INTERRUPTIBLE, we don't allow anything
else being passed in, e.g. we don't take TASK_INTERRUPTIBLE or TASK_*.

Thanks,
diff mbox series

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bc8f326be0ce..ebdf8a6b86c1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2941,6 +2941,7 @@  struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
 #define FOLL_PIN	0x40000	/* pages must be released via unpin_user_page */
 #define FOLL_FAST_ONLY	0x80000	/* gup_fast: prevent fall-back to slow gup */
+#define FOLL_INTERRUPTIBLE  0x100000 /* allow interrupts from generic signals */
 
 /*
  * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
diff --git a/mm/gup.c b/mm/gup.c
index 551264407624..ad74b137d363 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -933,8 +933,17 @@  static int faultin_page(struct vm_area_struct *vma,
 		fault_flags |= FAULT_FLAG_WRITE;
 	if (*flags & FOLL_REMOTE)
 		fault_flags |= FAULT_FLAG_REMOTE;
-	if (locked)
+	if (locked) {
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+		/*
+		 * We should only grant FAULT_FLAG_INTERRUPTIBLE when we're
+		 * (at least) killable.  It also mostly means we're not
+		 * with NOWAIT.  Otherwise ignore FOLL_INTERRUPTIBLE since
+		 * it won't make a lot of sense to be used alone.
+		 */
+		if (*flags & FOLL_INTERRUPTIBLE)
+			fault_flags |= FAULT_FLAG_INTERRUPTIBLE;
+	}
 	if (*flags & FOLL_NOWAIT)
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;
 	if (*flags & FOLL_TRIED) {
@@ -1322,6 +1331,22 @@  int fixup_user_fault(struct mm_struct *mm,
 }
 EXPORT_SYMBOL_GPL(fixup_user_fault);
 
+/*
+ * GUP always responds to fatal signals.  When FOLL_INTERRUPTIBLE is
+ * specified, it'll also respond to generic signals.  The caller of GUP
+ * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption.
+ */
+static bool gup_signal_pending(unsigned int flags)
+{
+	if (fatal_signal_pending(current))
+		return true;
+
+	if (!(flags & FOLL_INTERRUPTIBLE))
+		return false;
+
+	return signal_pending(current);
+}
+
 /*
  * Please note that this function, unlike __get_user_pages will not
  * return 0 for nr_pages > 0 without FOLL_NOWAIT
@@ -1403,11 +1428,11 @@  static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
 		 * Repeat on the address that fired VM_FAULT_RETRY
 		 * with both FAULT_FLAG_ALLOW_RETRY and
 		 * FAULT_FLAG_TRIED.  Note that GUP can be interrupted
-		 * by fatal signals, so we need to check it before we
+		 * by fatal signals of even common signals, depending on
+		 * the caller's request. So we need to check it before we
 		 * start trying again otherwise it can loop forever.
 		 */
-
-		if (fatal_signal_pending(current)) {
+		if (gup_signal_pending(flags)) {
 			if (!pages_done)
 				pages_done = -EINTR;
 			break;