Message ID | 20220622213656.81546-2-peterx@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | kvm/mm: Allow GUP to respond to non fatal signals | expand |
On Wed, Jun 22, 2022 at 05:36:53PM -0400, Peter Xu wrote: > We have had FAULT_FLAG_INTERRUPTIBLE but it was never applied to GUPs. One > issue with it is that not all GUP paths are able to handle signal delivers > besides SIGKILL. > > That's not ideal for the GUP users who are actually able to handle these > cases, like KVM. > > KVM uses GUP extensively on faulting guest pages, during which we've got > existing infrastructures to retry a page fault at a later time. Allowing > the GUP to be interrupted by generic signals can make KVM related threads > to be more responsive. For examples: > > (1) SIGUSR1: which QEMU/KVM uses to deliver an inter-process IPI, > e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be > generated to kick the vcpus out of kernel context immediately, > > (2) SIGINT: which can be used with interactive hypervisor users to stop a > virtual machine with Ctrl-C without any delays/hangs, > > (3) SIGTRAP: which grants GDB capability even during page faults that are > stuck for a long time. > > Normally hypervisor will be able to receive these signals properly, but not > if we're stuck in a GUP for a long time for whatever reason. It happens > easily with a stucked postcopy migration when e.g. a network temp failure > happens, then some vcpu threads can hang death waiting for the pages. With > the new FOLL_INTERRUPTIBLE, we can allow GUP users like KVM to selectively > enable the ability to trap these signals. Can you talk abit about what is required to use this new interface correctly? Lots of GUP callers are in simple system call contexts (like ioctl), can/should they set this flag and if so what else do they need to do? Thanks, Jason
Hi, Jason, On Fri, Jun 24, 2022 at 09:35:54PM -0300, Jason Gunthorpe wrote: > Can you talk abit about what is required to use this new interface > correctly? > > Lots of GUP callers are in simple system call contexts (like ioctl), > can/should they set this flag and if so what else do they need to do? Thanks for taking a look. IMHO the major thing required is the caller can handle the case when GUP returned (1) less page than expected, and (2) -EINTR returns. For the -EINTR case, IIUC ideally in an ioctl context we should better deliver it back to user app this -EINTR (while do cleanups gracefully), rather than returning anything else (e.g. converting it to -EFAULT or something else). But note that FAULT_FLAG_INTERRUPTIBLE is only used in an userfaultfd context (aka, userfaultfd_get_blocking_state()). For example, if we hang at lock_page() (if not go into whether hanging at lock_page makes sense or not at all.. it really sounds like a bug) and we receive a non-fatal signal, we won't be able to be scheduled for that since lock_page() uses TASK_UNINTERRUPTIBLE always. I think it's a separate problem on whether we should extend the usage of FAULT_FLAG_INTERRUPTIBLE to things like lock_page() (and probably not..), and currently it does solve a major issue regarding postcopy hanging on pages for hypervisor use case. Hopefully that still justifies this plumber work to enable the interruptible cap to GUP layer. If to go back to the original question with a shorter answer: if the ioctl context that GUP upon a page that will never be with a uffd context, then it's probably not gonna help at all.. at least not before we use FAULT_FLAG_INTERRUPTIBLE outside uffd page fault handling. Thanks,
On Fri, Jun 24, 2022 at 09:23:04PM -0400, Peter Xu wrote: > If to go back to the original question with a shorter answer: if the ioctl > context that GUP upon a page that will never be with a uffd context, then > it's probably not gonna help at all.. at least not before we use > FAULT_FLAG_INTERRUPTIBLE outside uffd page fault handling. I think I would be more interested in this if it could abort a swap in, for instance. Doesn't this happen if it flows the interruptible flag into the VMA's fault handler? Jason
On Sat, Jun 25, 2022 at 08:59:04PM -0300, Jason Gunthorpe wrote: > On Fri, Jun 24, 2022 at 09:23:04PM -0400, Peter Xu wrote: > > If to go back to the original question with a shorter answer: if the ioctl > > context that GUP upon a page that will never be with a uffd context, then > > it's probably not gonna help at all.. at least not before we use > > FAULT_FLAG_INTERRUPTIBLE outside uffd page fault handling. > > I think I would be more interested in this if it could abort a swap > in, for instance. Doesn't this happen if it flows the interruptible > flag into the VMA's fault handler? The idea makes sense, but it doesn't work like that right now, afaict. We need to teach lock_page_or_retry() to be able to consume the flag as discussed. I don't see a major blocker for it if lock_page_or_retry() is the only one we'd like to touch. Say, the only caller currently is do_swap_page() (there's also remove_device_exclusive_entry() but just deeper in the stack). Looks doable so far but I'll need to think about it.. I can keep you updated if I get something, but it'll be separate from this patchset. Thanks,
On 6/22/22 14:36, Peter Xu wrote: > We have had FAULT_FLAG_INTERRUPTIBLE but it was never applied to GUPs. One > issue with it is that not all GUP paths are able to handle signal delivers > besides SIGKILL. > > That's not ideal for the GUP users who are actually able to handle these > cases, like KVM. > > KVM uses GUP extensively on faulting guest pages, during which we've got > existing infrastructures to retry a page fault at a later time. Allowing > the GUP to be interrupted by generic signals can make KVM related threads > to be more responsive. For examples: > > (1) SIGUSR1: which QEMU/KVM uses to deliver an inter-process IPI, > e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be > generated to kick the vcpus out of kernel context immediately, > > (2) SIGINT: which can be used with interactive hypervisor users to stop a > virtual machine with Ctrl-C without any delays/hangs, > > (3) SIGTRAP: which grants GDB capability even during page faults that are > stuck for a long time. > > Normally hypervisor will be able to receive these signals properly, but not > if we're stuck in a GUP for a long time for whatever reason. It happens > easily with a stucked postcopy migration when e.g. a network temp failure > happens, then some vcpu threads can hang death waiting for the pages. With > the new FOLL_INTERRUPTIBLE, we can allow GUP users like KVM to selectively > enable the ability to trap these signals. > > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > include/linux/mm.h | 1 + > mm/gup.c | 33 +++++++++++++++++++++++++++++---- > 2 files changed, 30 insertions(+), 4 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index bc8f326be0ce..ebdf8a6b86c1 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -2941,6 +2941,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, > #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ > #define FOLL_PIN 0x40000 /* pages must be released via unpin_user_page */ > #define FOLL_FAST_ONLY 0x80000 /* gup_fast: prevent fall-back to slow gup */ > +#define FOLL_INTERRUPTIBLE 0x100000 /* allow interrupts from generic signals */ Perhaps, s/generic/non-fatal/ ? > > /* > * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each > diff --git a/mm/gup.c b/mm/gup.c > index 551264407624..ad74b137d363 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -933,8 +933,17 @@ static int faultin_page(struct vm_area_struct *vma, > fault_flags |= FAULT_FLAG_WRITE; > if (*flags & FOLL_REMOTE) > fault_flags |= FAULT_FLAG_REMOTE; > - if (locked) > + if (locked) { > fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; > + /* > + * We should only grant FAULT_FLAG_INTERRUPTIBLE when we're > + * (at least) killable. It also mostly means we're not > + * with NOWAIT. Otherwise ignore FOLL_INTERRUPTIBLE since > + * it won't make a lot of sense to be used alone. > + */ This comment seems a little confusing due to its location. We've just checked "locked", but the comment is talking about other constraints. Not sure what to suggest. Maybe move it somewhere else? > + if (*flags & FOLL_INTERRUPTIBLE) > + fault_flags |= FAULT_FLAG_INTERRUPTIBLE; > + } > if (*flags & FOLL_NOWAIT) > fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT; > if (*flags & FOLL_TRIED) { > @@ -1322,6 +1331,22 @@ int fixup_user_fault(struct mm_struct *mm, > } > EXPORT_SYMBOL_GPL(fixup_user_fault); > > +/* > + * GUP always responds to fatal signals. When FOLL_INTERRUPTIBLE is > + * specified, it'll also respond to generic signals. The caller of GUP > + * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption. > + */ > +static bool gup_signal_pending(unsigned int flags) > +{ > + if (fatal_signal_pending(current)) > + return true; > + > + if (!(flags & FOLL_INTERRUPTIBLE)) > + return false; > + > + return signal_pending(current); > +} > + OK. > /* > * Please note that this function, unlike __get_user_pages will not > * return 0 for nr_pages > 0 without FOLL_NOWAIT > @@ -1403,11 +1428,11 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm, > * Repeat on the address that fired VM_FAULT_RETRY > * with both FAULT_FLAG_ALLOW_RETRY and > * FAULT_FLAG_TRIED. Note that GUP can be interrupted > - * by fatal signals, so we need to check it before we > + * by fatal signals of even common signals, depending on > + * the caller's request. So we need to check it before we > * start trying again otherwise it can loop forever. > */ > - > - if (fatal_signal_pending(current)) { > + if (gup_signal_pending(flags)) { This is new and bold. :) Signals that an application was prepared to handle can now cause gup to quit early. I wonder if that will break any use cases out there (SIGPIPE...) ? Generally, gup callers handle failures pretty well, so it's probably not too bad. But I wanted to mention the idea that handled interrupts might be a little surprising here. thanks,
Hi, John, Thanks for your comments! On Mon, Jun 27, 2022 at 07:07:28PM -0700, John Hubbard wrote: [...] > > @@ -2941,6 +2941,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, > > #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ > > #define FOLL_PIN 0x40000 /* pages must be released via unpin_user_page */ > > #define FOLL_FAST_ONLY 0x80000 /* gup_fast: prevent fall-back to slow gup */ > > +#define FOLL_INTERRUPTIBLE 0x100000 /* allow interrupts from generic signals */ > > Perhaps, s/generic/non-fatal/ ? Sure. > > diff --git a/mm/gup.c b/mm/gup.c > > index 551264407624..ad74b137d363 100644 > > --- a/mm/gup.c > > +++ b/mm/gup.c > > @@ -933,8 +933,17 @@ static int faultin_page(struct vm_area_struct *vma, > > fault_flags |= FAULT_FLAG_WRITE; > > if (*flags & FOLL_REMOTE) > > fault_flags |= FAULT_FLAG_REMOTE; > > - if (locked) > > + if (locked) { > > fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; > > + /* > > + * We should only grant FAULT_FLAG_INTERRUPTIBLE when we're > > + * (at least) killable. It also mostly means we're not > > + * with NOWAIT. Otherwise ignore FOLL_INTERRUPTIBLE since > > + * it won't make a lot of sense to be used alone. > > + */ > > This comment seems a little confusing due to its location. We've just > checked "locked", but the comment is talking about other constraints. > > Not sure what to suggest. Maybe move it somewhere else? I put it here to be after FAULT_FLAG_KILLABLE we just set. Only if we have "locked" will we set FAULT_FLAG_KILLABLE. That's also the key we grant "killable" attribute to this GUP. So I thought it'll be good to put here because I want to have FOLL_INTERRUPTIBLE dependent on "locked" being set. > > > + if (*flags & FOLL_INTERRUPTIBLE) > > + fault_flags |= FAULT_FLAG_INTERRUPTIBLE; > > + } > > if (*flags & FOLL_NOWAIT) > > fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT; > > if (*flags & FOLL_TRIED) { > > @@ -1322,6 +1331,22 @@ int fixup_user_fault(struct mm_struct *mm, > > } > > EXPORT_SYMBOL_GPL(fixup_user_fault); > > +/* > > + * GUP always responds to fatal signals. When FOLL_INTERRUPTIBLE is > > + * specified, it'll also respond to generic signals. The caller of GUP > > + * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption. > > + */ > > +static bool gup_signal_pending(unsigned int flags) > > +{ > > + if (fatal_signal_pending(current)) > > + return true; > > + > > + if (!(flags & FOLL_INTERRUPTIBLE)) > > + return false; > > + > > + return signal_pending(current); > > +} > > + > > OK. > > > /* > > * Please note that this function, unlike __get_user_pages will not > > * return 0 for nr_pages > 0 without FOLL_NOWAIT > > @@ -1403,11 +1428,11 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm, > > * Repeat on the address that fired VM_FAULT_RETRY > > * with both FAULT_FLAG_ALLOW_RETRY and > > * FAULT_FLAG_TRIED. Note that GUP can be interrupted > > - * by fatal signals, so we need to check it before we > > + * by fatal signals of even common signals, depending on > > + * the caller's request. So we need to check it before we > > * start trying again otherwise it can loop forever. > > */ > > - > > - if (fatal_signal_pending(current)) { > > + if (gup_signal_pending(flags)) { > > This is new and bold. :) Signals that an application was prepared to > handle can now cause gup to quit early. I wonder if that will break any > use cases out there (SIGPIPE...) ? Note: I introduced the new FOLL_INTERRUPTIBLE flag, so only if the caller explicitly passing in that flag could there be a functional change. IOW, no functional change intended for this single patch, not before I start to let KVM code passing over that flag. > > Generally, gup callers handle failures pretty well, so it's probably > not too bad. But I wanted to mention the idea that handled interrupts > might be a little surprising here. Yes as I mentioned anyway it'll be an opt-in flag, so by default we don't need to worry at all, IMHO, because it should really work exactly like before, otherwise I had a bug somewhere else.. :) Thanks,
On 6/28/22 12:31, Peter Xu wrote: >>> diff --git a/mm/gup.c b/mm/gup.c >>> index 551264407624..ad74b137d363 100644 >>> --- a/mm/gup.c >>> +++ b/mm/gup.c >>> @@ -933,8 +933,17 @@ static int faultin_page(struct vm_area_struct *vma, >>> fault_flags |= FAULT_FLAG_WRITE; >>> if (*flags & FOLL_REMOTE) >>> fault_flags |= FAULT_FLAG_REMOTE; >>> - if (locked) >>> + if (locked) { >>> fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; >>> + /* >>> + * We should only grant FAULT_FLAG_INTERRUPTIBLE when we're >>> + * (at least) killable. It also mostly means we're not >>> + * with NOWAIT. Otherwise ignore FOLL_INTERRUPTIBLE since >>> + * it won't make a lot of sense to be used alone. >>> + */ >> >> This comment seems a little confusing due to its location. We've just >> checked "locked", but the comment is talking about other constraints. >> >> Not sure what to suggest. Maybe move it somewhere else? > > I put it here to be after FAULT_FLAG_KILLABLE we just set. > > Only if we have "locked" will we set FAULT_FLAG_KILLABLE. That's also the > key we grant "killable" attribute to this GUP. So I thought it'll be good > to put here because I want to have FOLL_INTERRUPTIBLE dependent on "locked" > being set. > The key point is the connection between "locked" and killable. If the comment explained why "locked" means "killable", that would help clear this up. The NOWAIT sentence is also confusing to me, and adding "mostly NOWAIT" does not clear it up either... :) >> Generally, gup callers handle failures pretty well, so it's probably >> not too bad. But I wanted to mention the idea that handled interrupts >> might be a little surprising here. > > Yes as I mentioned anyway it'll be an opt-in flag, so by default we don't > need to worry at all, IMHO, because it should really work exactly like > before, otherwise I had a bug somewhere else.. :) > Yes, that's true. OK then. thanks,
On Tue, Jun 28, 2022 at 02:40:53PM -0700, John Hubbard wrote: > On 6/28/22 12:31, Peter Xu wrote: > > > > diff --git a/mm/gup.c b/mm/gup.c > > > > index 551264407624..ad74b137d363 100644 > > > > --- a/mm/gup.c > > > > +++ b/mm/gup.c > > > > @@ -933,8 +933,17 @@ static int faultin_page(struct vm_area_struct *vma, > > > > fault_flags |= FAULT_FLAG_WRITE; > > > > if (*flags & FOLL_REMOTE) > > > > fault_flags |= FAULT_FLAG_REMOTE; > > > > - if (locked) > > > > + if (locked) { > > > > fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; > > > > + /* > > > > + * We should only grant FAULT_FLAG_INTERRUPTIBLE when we're > > > > + * (at least) killable. It also mostly means we're not > > > > + * with NOWAIT. Otherwise ignore FOLL_INTERRUPTIBLE since > > > > + * it won't make a lot of sense to be used alone. > > > > + */ > > > > > > This comment seems a little confusing due to its location. We've just > > > checked "locked", but the comment is talking about other constraints. > > > > > > Not sure what to suggest. Maybe move it somewhere else? > > > > I put it here to be after FAULT_FLAG_KILLABLE we just set. > > > > Only if we have "locked" will we set FAULT_FLAG_KILLABLE. That's also the > > key we grant "killable" attribute to this GUP. So I thought it'll be good > > to put here because I want to have FOLL_INTERRUPTIBLE dependent on "locked" > > being set. > > > > The key point is the connection between "locked" and killable. If the comment > explained why "locked" means "killable", that would help clear this up. The > NOWAIT sentence is also confusing to me, and adding "mostly NOWAIT" does not > clear it up either... :) Sorry to have a comment that makes it feels confusing. I tried to explicitly put the comment to be after setting FAULT_FLAG_KILLABLE but obviously I didn't do my job well.. Maybe that NOWAIT thing adds more complexity but not even necessary. Would below one more acceptable? /* * We'll only be able to respond to signals when "locked != * NULL". When with it, we'll always respond to SIGKILL * (as implied by FAULT_FLAG_KILLABLE above), and we'll * respond to non-fatal signals only if the GUP user has * specified FOLL_INTERRUPTIBLE. */ Thanks,
On 6/28/22 15:33, Peter Xu wrote: >> The key point is the connection between "locked" and killable. If the comment >> explained why "locked" means "killable", that would help clear this up. The >> NOWAIT sentence is also confusing to me, and adding "mostly NOWAIT" does not >> clear it up either... :) > > Sorry to have a comment that makes it feels confusing. I tried to > explicitly put the comment to be after setting FAULT_FLAG_KILLABLE but > obviously I didn't do my job well.. > > Maybe that NOWAIT thing adds more complexity but not even necessary. > > Would below one more acceptable? > > /* > * We'll only be able to respond to signals when "locked != > * NULL". When with it, we'll always respond to SIGKILL > * (as implied by FAULT_FLAG_KILLABLE above), and we'll > * respond to non-fatal signals only if the GUP user has > * specified FOLL_INTERRUPTIBLE. > */ It looks like part of this comment is trying to document a pre-existing concept, which is that faultin_page() only ever sets FAULT_FLAG_KILLABLE if locked != NULL. The problem I am (personally) having is that I don't yet understand why or how those are connected: what is it about having locked non-NULL that means the process is killable? (Can you explain why that is?) If that were clear, I think I could suggest a good comment wording. thanks,
On Tue, Jun 28, 2022 at 05:31:43PM -0700, John Hubbard wrote: > On 6/28/22 15:33, Peter Xu wrote: > > > The key point is the connection between "locked" and killable. If the comment > > > explained why "locked" means "killable", that would help clear this up. The > > > NOWAIT sentence is also confusing to me, and adding "mostly NOWAIT" does not > > > clear it up either... :) > > > > Sorry to have a comment that makes it feels confusing. I tried to > > explicitly put the comment to be after setting FAULT_FLAG_KILLABLE but > > obviously I didn't do my job well.. > > > > Maybe that NOWAIT thing adds more complexity but not even necessary. > > > > Would below one more acceptable? > > > > /* > > * We'll only be able to respond to signals when "locked != > > * NULL". When with it, we'll always respond to SIGKILL > > * (as implied by FAULT_FLAG_KILLABLE above), and we'll > > * respond to non-fatal signals only if the GUP user has > > * specified FOLL_INTERRUPTIBLE. > > */ > > > It looks like part of this comment is trying to document a pre-existing > concept, which is that faultin_page() only ever sets FAULT_FLAG_KILLABLE > if locked != NULL. I'd say that's not what I wanted to comment.. I wanted to express that INTERRUPTIBLE should rely on KILLABLE, that's also why I put the comment to be after KILLABLE, not before. IMHO it makes sense already to have "interruptible" only if "killable", no matter what's the pre-requisite for KILLABLE (in this case it's having "locked" being non-null). > The problem I am (personally) having is that I don't yet understand why > or how those are connected: what is it about having locked non-NULL that > means the process is killable? (Can you explain why that is?) Firstly RETRY_KILLABLE relies on ALLOW_RETRY, because if we don't allow retry at all it means we'll never wait in handle_mm_fault() anyway, then no need to worry on being interrupted by any kind of signal (fatal or not). Then if we allow retry, we need some way to know "whether mmap_sem is released or not" during the process for the caller (because the caller cannot see VM_FAULT_RETRY). That's why we added "locked" parameter, so that we can set *locked=false to tell the caller we have released mmap_sem. I think that's why we have "locked" defined as "we allow this page fault request to retry and wait, during wait we can always allow fatal signals". I think that's defined throughout the gup call interfaces too, and faultin_page() is the last step to talk to handle_mm_fault(). To make this whole picture complete, NOWAIT is another thing that relies on ALLOW_RETRY but just to tell "oh please never release the mmap_sem at all". For example, when we want to make sure no vma will be released after faultin_page() returned. > > If that were clear, I think I could suggest a good comment wording. IMHO it's a little bit weird to explain "locked" here, especially after KILLABLE is set, that's why I didn't try to mention "locked" in my 2nd attempt. There are some comments for "locked" above the definition of faultin_page(), I think that'll be a nicer place to enrich explanations for "locked", and it seems even more suitable as a separate patch? Thanks,
On 6/29/22 08:47, Peter Xu wrote: >> It looks like part of this comment is trying to document a pre-existing >> concept, which is that faultin_page() only ever sets FAULT_FLAG_KILLABLE >> if locked != NULL. > > I'd say that's not what I wanted to comment.. I wanted to express that > INTERRUPTIBLE should rely on KILLABLE, that's also why I put the comment to > be after KILLABLE, not before. IMHO it makes sense already to have > "interruptible" only if "killable", no matter what's the pre-requisite for > KILLABLE (in this case it's having "locked" being non-null). > OK, I think I finally understand both the intention of the comment, and (thanks to your notes, below) the interaction between *locked and _RETRY, _KILLABLE, and _INTERRUPTIBLE. Really appreciate your leading me by the nose through that. The pre-existing code is abusing *locked a bit, by treating it as a flag when really it is a side effect of flags, but at least now that's clear to me. Anyway...this leads to finally getting into the comment, which I now think is not quite what we want: there is no need for a hierarchy of "_INTERRUPTIBLE should depend upon _KILLABLE". That is: even though an application allows a fatal signal to get through, it's not clear to me that that implies that non-fatal signal handling should be prevented. The code is only vaguely enforcing such a thing, because it just so happens that both cases require the same basic prerequisites. So the code looks good, but I don't see a need to claim a hierarchy in the comments. So I'd either delete the comment entirely, or go with something that is doesn't try to talk about hierarchy nor locked/retry either. Does this look reasonable to you: /* * FAULT_FLAG_INTERRUPTIBLE is opt-in: kernel callers must set * FOLL_INTERRUPTIBLE. That's because some callers may not be * prepared to handle early exits caused by non-fatal signals. */ ? >> The problem I am (personally) having is that I don't yet understand why >> or how those are connected: what is it about having locked non-NULL that >> means the process is killable? (Can you explain why that is?) > > Firstly RETRY_KILLABLE relies on ALLOW_RETRY, because if we don't allow > retry at all it means we'll never wait in handle_mm_fault() anyway, then no > need to worry on being interrupted by any kind of signal (fatal or not). > > Then if we allow retry, we need some way to know "whether mmap_sem is > released or not" during the process for the caller (because the caller > cannot see VM_FAULT_RETRY). That's why we added "locked" parameter, so > that we can set *locked=false to tell the caller we have released mmap_sem. > > I think that's why we have "locked" defined as "we allow this page fault > request to retry and wait, during wait we can always allow fatal signals". > I think that's defined throughout the gup call interfaces too, and > faultin_page() is the last step to talk to handle_mm_fault(). > > To make this whole picture complete, NOWAIT is another thing that relies on > ALLOW_RETRY but just to tell "oh please never release the mmap_sem at all". > For example, when we want to make sure no vma will be released after > faultin_page() returned. > Again, thanks for taking the time to explain that for me. :) >> >> If that were clear, I think I could suggest a good comment wording. > > IMHO it's a little bit weird to explain "locked" here, especially after > KILLABLE is set, that's why I didn't try to mention "locked" in my 2nd > attempt. There are some comments for "locked" above the definition of > faultin_page(), I think that'll be a nicer place to enrich explanations for > "locked", and it seems even more suitable as a separate patch? > Totally agreed. I didn't intend to ask for that kind of documentation here. For that, I'm thinking a combination of cleaning up *locked a little bit, plus maybe some higher level notes like what you wrote above, added to either pin_user_pages.rst or a new get_user_pages.rst or some .rst anyway. Definitely a separately thing. thanks,
On Wed, Jun 29, 2022 at 06:53:30PM -0700, John Hubbard wrote: > On 6/29/22 08:47, Peter Xu wrote: > > > It looks like part of this comment is trying to document a pre-existing > > > concept, which is that faultin_page() only ever sets FAULT_FLAG_KILLABLE > > > if locked != NULL. > > > > I'd say that's not what I wanted to comment.. I wanted to express that > > INTERRUPTIBLE should rely on KILLABLE, that's also why I put the comment to > > be after KILLABLE, not before. IMHO it makes sense already to have > > "interruptible" only if "killable", no matter what's the pre-requisite for > > KILLABLE (in this case it's having "locked" being non-null). > > > > OK, I think I finally understand both the intention of the comment, > and (thanks to your notes, below) the interaction between *locked and > _RETRY, _KILLABLE, and _INTERRUPTIBLE. Really appreciate your leading > me by the nose through that. The pre-existing code is abusing *locked > a bit, by treating it as a flag when really it is a side effect of > flags, but at least now that's clear to me. I agree, alternatively we could have some other FOLL_ flags to represent "locked != NULL" and do sanity check to make sure when the flag is there locked is always set correctly. Current code is a more "dense" way to do this, even though it could be slightly harder to follow. > > Anyway...this leads to finally getting into the comment, which I now > think is not quite what we want: there is no need for a hierarchy of > "_INTERRUPTIBLE should depend upon _KILLABLE". That is: even though an > application allows a fatal signal to get through, it's not clear to me > that that implies that non-fatal signal handling should be prevented. > > The code is only vaguely enforcing such a thing, because it just so > happens that both cases require the same basic prerequisites. So the > code looks good, but I don't see a need to claim a hierarchy in the > comments. > > So I'd either delete the comment entirely, or go with something that is > doesn't try to talk about hierarchy nor locked/retry either. Does this > look reasonable to you: > > > /* > * FAULT_FLAG_INTERRUPTIBLE is opt-in: kernel callers must set > * FOLL_INTERRUPTIBLE. That's because some callers may not be > * prepared to handle early exits caused by non-fatal signals. > */ > > ? Looks good to me, I'd tune a bit to make it less ambiguous on a few places: /* * FAULT_FLAG_INTERRUPTIBLE is opt-in. GUP callers must set * FOLL_INTERRUPTIBLE to enable FAULT_FLAG_INTERRUPTIBLE. * That's because some callers may not be prepared to * handle early exits caused by non-fatal signals. */ Would that be okay to you? > > > > The problem I am (personally) having is that I don't yet understand why > > > or how those are connected: what is it about having locked non-NULL that > > > means the process is killable? (Can you explain why that is?) > > > > Firstly RETRY_KILLABLE relies on ALLOW_RETRY, because if we don't allow > > retry at all it means we'll never wait in handle_mm_fault() anyway, then no > > need to worry on being interrupted by any kind of signal (fatal or not). > > > > Then if we allow retry, we need some way to know "whether mmap_sem is > > released or not" during the process for the caller (because the caller > > cannot see VM_FAULT_RETRY). That's why we added "locked" parameter, so > > that we can set *locked=false to tell the caller we have released mmap_sem. > > > > I think that's why we have "locked" defined as "we allow this page fault > > request to retry and wait, during wait we can always allow fatal signals". > > I think that's defined throughout the gup call interfaces too, and > > faultin_page() is the last step to talk to handle_mm_fault(). > > > > To make this whole picture complete, NOWAIT is another thing that relies on > > ALLOW_RETRY but just to tell "oh please never release the mmap_sem at all". > > For example, when we want to make sure no vma will be released after > > faultin_page() returned. > > > > Again, thanks for taking the time to explain that for me. :) My thanks for reviewing! > > > > > > > If that were clear, I think I could suggest a good comment wording. > > > > IMHO it's a little bit weird to explain "locked" here, especially after > > KILLABLE is set, that's why I didn't try to mention "locked" in my 2nd > > attempt. There are some comments for "locked" above the definition of > > faultin_page(), I think that'll be a nicer place to enrich explanations for > > "locked", and it seems even more suitable as a separate patch? > > > > Totally agreed. I didn't intend to ask for that kind of documentation > here. > > For that, I'm thinking a combination of cleaning up *locked a little > bit, plus maybe some higher level notes like what you wrote above, added > to either pin_user_pages.rst or a new get_user_pages.rst or some .rst > anyway. Definitely a separately thing. Sounds good. Thanks,
On 6/30/22 06:49, Peter Xu wrote: > Looks good to me, I'd tune a bit to make it less ambiguous on a few places: > > /* > * FAULT_FLAG_INTERRUPTIBLE is opt-in. GUP callers must set > * FOLL_INTERRUPTIBLE to enable FAULT_FLAG_INTERRUPTIBLE. > * That's because some callers may not be prepared to > * handle early exits caused by non-fatal signals. > */ > > Would that be okay to you? > Yes, looks good. With that change, please feel free to add: Reviewed-by: John Hubbard <jhubbard@nvidia.com> thanks,
On Thu, Jun 30, 2022 at 12:01:53PM -0700, John Hubbard wrote: > On 6/30/22 06:49, Peter Xu wrote: > > Looks good to me, I'd tune a bit to make it less ambiguous on a few places: > > > > /* > > * FAULT_FLAG_INTERRUPTIBLE is opt-in. GUP callers must set > > * FOLL_INTERRUPTIBLE to enable FAULT_FLAG_INTERRUPTIBLE. > > * That's because some callers may not be prepared to > > * handle early exits caused by non-fatal signals. > > */ > > > > Would that be okay to you? > > > > Yes, looks good. With that change, please feel free to add: > > Reviewed-by: John Hubbard <jhubbard@nvidia.com> Will do, thanks!
On Wed, Jun 22, 2022 at 05:36:53PM -0400, Peter Xu wrote: > +/* > + * GUP always responds to fatal signals. When FOLL_INTERRUPTIBLE is > + * specified, it'll also respond to generic signals. The caller of GUP > + * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption. > + */ > +static bool gup_signal_pending(unsigned int flags) > +{ > + if (fatal_signal_pending(current)) > + return true; > + > + if (!(flags & FOLL_INTERRUPTIBLE)) > + return false; > + > + return signal_pending(current); > +} This should resemble signal_pending_state() more closely, if indeed not be a wrapper of signal_pending_state().
On Mon, Jul 04, 2022 at 11:48:17PM +0100, Matthew Wilcox wrote: > On Wed, Jun 22, 2022 at 05:36:53PM -0400, Peter Xu wrote: > > +/* > > + * GUP always responds to fatal signals. When FOLL_INTERRUPTIBLE is > > + * specified, it'll also respond to generic signals. The caller of GUP > > + * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption. > > + */ > > +static bool gup_signal_pending(unsigned int flags) > > +{ > > + if (fatal_signal_pending(current)) > > + return true; > > + > > + if (!(flags & FOLL_INTERRUPTIBLE)) > > + return false; > > + > > + return signal_pending(current); > > +} > > This should resemble signal_pending_state() more closely, if indeed not > be a wrapper of signal_pending_state(). Could you be more specific? Note that the only thing that should affect the signal handling here is FOLL_INTERRUPTIBLE, we don't allow anything else being passed in, e.g. we don't take TASK_INTERRUPTIBLE or TASK_*. Thanks,
diff --git a/include/linux/mm.h b/include/linux/mm.h index bc8f326be0ce..ebdf8a6b86c1 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2941,6 +2941,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ #define FOLL_PIN 0x40000 /* pages must be released via unpin_user_page */ #define FOLL_FAST_ONLY 0x80000 /* gup_fast: prevent fall-back to slow gup */ +#define FOLL_INTERRUPTIBLE 0x100000 /* allow interrupts from generic signals */ /* * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each diff --git a/mm/gup.c b/mm/gup.c index 551264407624..ad74b137d363 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -933,8 +933,17 @@ static int faultin_page(struct vm_area_struct *vma, fault_flags |= FAULT_FLAG_WRITE; if (*flags & FOLL_REMOTE) fault_flags |= FAULT_FLAG_REMOTE; - if (locked) + if (locked) { fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; + /* + * We should only grant FAULT_FLAG_INTERRUPTIBLE when we're + * (at least) killable. It also mostly means we're not + * with NOWAIT. Otherwise ignore FOLL_INTERRUPTIBLE since + * it won't make a lot of sense to be used alone. + */ + if (*flags & FOLL_INTERRUPTIBLE) + fault_flags |= FAULT_FLAG_INTERRUPTIBLE; + } if (*flags & FOLL_NOWAIT) fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT; if (*flags & FOLL_TRIED) { @@ -1322,6 +1331,22 @@ int fixup_user_fault(struct mm_struct *mm, } EXPORT_SYMBOL_GPL(fixup_user_fault); +/* + * GUP always responds to fatal signals. When FOLL_INTERRUPTIBLE is + * specified, it'll also respond to generic signals. The caller of GUP + * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption. + */ +static bool gup_signal_pending(unsigned int flags) +{ + if (fatal_signal_pending(current)) + return true; + + if (!(flags & FOLL_INTERRUPTIBLE)) + return false; + + return signal_pending(current); +} + /* * Please note that this function, unlike __get_user_pages will not * return 0 for nr_pages > 0 without FOLL_NOWAIT @@ -1403,11 +1428,11 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm, * Repeat on the address that fired VM_FAULT_RETRY * with both FAULT_FLAG_ALLOW_RETRY and * FAULT_FLAG_TRIED. Note that GUP can be interrupted - * by fatal signals, so we need to check it before we + * by fatal signals of even common signals, depending on + * the caller's request. So we need to check it before we * start trying again otherwise it can loop forever. */ - - if (fatal_signal_pending(current)) { + if (gup_signal_pending(flags)) { if (!pages_done) pages_done = -EINTR; break;
We have had FAULT_FLAG_INTERRUPTIBLE but it was never applied to GUPs. One issue with it is that not all GUP paths are able to handle signal delivers besides SIGKILL. That's not ideal for the GUP users who are actually able to handle these cases, like KVM. KVM uses GUP extensively on faulting guest pages, during which we've got existing infrastructures to retry a page fault at a later time. Allowing the GUP to be interrupted by generic signals can make KVM related threads to be more responsive. For examples: (1) SIGUSR1: which QEMU/KVM uses to deliver an inter-process IPI, e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be generated to kick the vcpus out of kernel context immediately, (2) SIGINT: which can be used with interactive hypervisor users to stop a virtual machine with Ctrl-C without any delays/hangs, (3) SIGTRAP: which grants GDB capability even during page faults that are stuck for a long time. Normally hypervisor will be able to receive these signals properly, but not if we're stuck in a GUP for a long time for whatever reason. It happens easily with a stucked postcopy migration when e.g. a network temp failure happens, then some vcpu threads can hang death waiting for the pages. With the new FOLL_INTERRUPTIBLE, we can allow GUP users like KVM to selectively enable the ability to trap these signals. Signed-off-by: Peter Xu <peterx@redhat.com> --- include/linux/mm.h | 1 + mm/gup.c | 33 +++++++++++++++++++++++++++++---- 2 files changed, 30 insertions(+), 4 deletions(-)