diff mbox series

[1/1] RFC: add pidfd_send_signal flag to reclaim mm while killing a process

Message ID 20201113173448.1863419-1-surenb@google.com (mailing list archive)
State New, archived
Headers show
Series [1/1] RFC: add pidfd_send_signal flag to reclaim mm while killing a process | expand

Commit Message

Suren Baghdasaryan Nov. 13, 2020, 5:34 p.m. UTC
When a process is being killed it might be in an uninterruptible sleep
which leads to an unpredictable delay in its memory reclaim. In low memory
situations, when it's important to free up memory quickly, such delay is
problematic. Kernel solves this problem with oom-reaper thread which
performs memory reclaim even when the victim process is not runnable.
Userspace currently lacks such mechanisms and the need and potential
solutions were discussed before (see links below).
This patch provides a mechanism to perform memory reclaim in the context
of the process that sends SIGKILL signal. New SYNC_REAP_MM flag for
pidfd_send_signal syscall can be used only when sending SIGKILL signal
and will lead to the caller synchronously reclaiming the memory that
belongs to the victim and can be easily reclaimed.

1. https://patchwork.kernel.org/cover/10894999
2. https://lwn.net/Articles/787217
3. https://lore.kernel.org/linux-api/CAJuCfpGz1kPM3G1gZH+09Z7aoWKg05QSAMMisJ7H5MdmRrRhNQ@mail.gmail.com

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/oom.h    |  2 ++
 include/linux/signal.h |  7 ++++
 kernel/signal.c        | 73 ++++++++++++++++++++++++++++++++++++++++--
 mm/oom_kill.c          |  2 +-
 4 files changed, 81 insertions(+), 3 deletions(-)

Comments

Andrew Morton Nov. 13, 2020, 11:55 p.m. UTC | #1
On Fri, 13 Nov 2020 09:34:48 -0800 Suren Baghdasaryan <surenb@google.com> wrote:

> When a process is being killed it might be in an uninterruptible sleep
> which leads to an unpredictable delay in its memory reclaim. In low memory
> situations, when it's important to free up memory quickly, such delay is
> problematic. Kernel solves this problem with oom-reaper thread which
> performs memory reclaim even when the victim process is not runnable.
> Userspace currently lacks such mechanisms and the need and potential
> solutions were discussed before (see links below).
> This patch provides a mechanism to perform memory reclaim in the context
> of the process that sends SIGKILL signal. New SYNC_REAP_MM flag for
> pidfd_send_signal syscall can be used only when sending SIGKILL signal
> and will lead to the caller synchronously reclaiming the memory that
> belongs to the victim and can be easily reclaimed.

hm.

Seems to me that the ability to reap another process's memory is a
generally useful one, and that it should not be tied to delivering a
signal in this fashion.

And we do have the new process_madvise(MADV_PAGEOUT).  It may need a
few changes and tweaks, but can't that be used to solve this problem?
Suren Baghdasaryan Nov. 14, 2020, 12:06 a.m. UTC | #2
On Fri, Nov 13, 2020 at 3:55 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Fri, 13 Nov 2020 09:34:48 -0800 Suren Baghdasaryan <surenb@google.com> wrote:
>
> > When a process is being killed it might be in an uninterruptible sleep
> > which leads to an unpredictable delay in its memory reclaim. In low memory
> > situations, when it's important to free up memory quickly, such delay is
> > problematic. Kernel solves this problem with oom-reaper thread which
> > performs memory reclaim even when the victim process is not runnable.
> > Userspace currently lacks such mechanisms and the need and potential
> > solutions were discussed before (see links below).
> > This patch provides a mechanism to perform memory reclaim in the context
> > of the process that sends SIGKILL signal. New SYNC_REAP_MM flag for
> > pidfd_send_signal syscall can be used only when sending SIGKILL signal
> > and will lead to the caller synchronously reclaiming the memory that
> > belongs to the victim and can be easily reclaimed.
>
> hm.
>
> Seems to me that the ability to reap another process's memory is a
> generally useful one, and that it should not be tied to delivering a
> signal in this fashion.
>
> And we do have the new process_madvise(MADV_PAGEOUT).  It may need a
> few changes and tweaks, but can't that be used to solve this problem?

Thank you for the feedback, Andrew. process_madvise(MADV_DONTNEED) was
one of the options recently discussed in
https://lore.kernel.org/linux-api/CAJuCfpGz1kPM3G1gZH+09Z7aoWKg05QSAMMisJ7H5MdmRrRhNQ@mail.gmail.com
. The thread describes some of the issues with that approach but if we
limit it to processes with pending SIGKILL only then I think that
would be doable.
Andrew Morton Nov. 14, 2020, 1 a.m. UTC | #3
On Fri, 13 Nov 2020 16:06:25 -0800 Suren Baghdasaryan <surenb@google.com> wrote:

> On Fri, Nov 13, 2020 at 3:55 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Fri, 13 Nov 2020 09:34:48 -0800 Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > > When a process is being killed it might be in an uninterruptible sleep
> > > which leads to an unpredictable delay in its memory reclaim. In low memory
> > > situations, when it's important to free up memory quickly, such delay is
> > > problematic. Kernel solves this problem with oom-reaper thread which
> > > performs memory reclaim even when the victim process is not runnable.
> > > Userspace currently lacks such mechanisms and the need and potential
> > > solutions were discussed before (see links below).
> > > This patch provides a mechanism to perform memory reclaim in the context
> > > of the process that sends SIGKILL signal. New SYNC_REAP_MM flag for
> > > pidfd_send_signal syscall can be used only when sending SIGKILL signal
> > > and will lead to the caller synchronously reclaiming the memory that
> > > belongs to the victim and can be easily reclaimed.
> >
> > hm.
> >
> > Seems to me that the ability to reap another process's memory is a
> > generally useful one, and that it should not be tied to delivering a
> > signal in this fashion.
> >
> > And we do have the new process_madvise(MADV_PAGEOUT).  It may need a
> > few changes and tweaks, but can't that be used to solve this problem?
> 
> Thank you for the feedback, Andrew. process_madvise(MADV_DONTNEED) was
> one of the options recently discussed in
> https://lore.kernel.org/linux-api/CAJuCfpGz1kPM3G1gZH+09Z7aoWKg05QSAMMisJ7H5MdmRrRhNQ@mail.gmail.com
> . The thread describes some of the issues with that approach but if we
> limit it to processes with pending SIGKILL only then I think that
> would be doable.

Why would it be necessary to read /proc/pid/maps?  I'd have thought
that a starting effort would be

	madvise((void *)0, (void *)-1, MADV_PAGEOUT)

(after translation into process_madvise() speak).  Which is equivalent
to the proposed process_madvise(MADV_DONTNEED_MM)?

There may be things which trip this up, such as mlocked regions or
whatever, but we could add another madvise `advice' mode to handle
this?
Suren Baghdasaryan Nov. 14, 2020, 1:09 a.m. UTC | #4
On Fri, Nov 13, 2020 at 5:00 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Fri, 13 Nov 2020 16:06:25 -0800 Suren Baghdasaryan <surenb@google.com> wrote:
>
> > On Fri, Nov 13, 2020 at 3:55 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > On Fri, 13 Nov 2020 09:34:48 -0800 Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > > When a process is being killed it might be in an uninterruptible sleep
> > > > which leads to an unpredictable delay in its memory reclaim. In low memory
> > > > situations, when it's important to free up memory quickly, such delay is
> > > > problematic. Kernel solves this problem with oom-reaper thread which
> > > > performs memory reclaim even when the victim process is not runnable.
> > > > Userspace currently lacks such mechanisms and the need and potential
> > > > solutions were discussed before (see links below).
> > > > This patch provides a mechanism to perform memory reclaim in the context
> > > > of the process that sends SIGKILL signal. New SYNC_REAP_MM flag for
> > > > pidfd_send_signal syscall can be used only when sending SIGKILL signal
> > > > and will lead to the caller synchronously reclaiming the memory that
> > > > belongs to the victim and can be easily reclaimed.
> > >
> > > hm.
> > >
> > > Seems to me that the ability to reap another process's memory is a
> > > generally useful one, and that it should not be tied to delivering a
> > > signal in this fashion.
> > >
> > > And we do have the new process_madvise(MADV_PAGEOUT).  It may need a
> > > few changes and tweaks, but can't that be used to solve this problem?
> >
> > Thank you for the feedback, Andrew. process_madvise(MADV_DONTNEED) was
> > one of the options recently discussed in
> > https://lore.kernel.org/linux-api/CAJuCfpGz1kPM3G1gZH+09Z7aoWKg05QSAMMisJ7H5MdmRrRhNQ@mail.gmail.com
> > . The thread describes some of the issues with that approach but if we
> > limit it to processes with pending SIGKILL only then I think that
> > would be doable.
>
> Why would it be necessary to read /proc/pid/maps?  I'd have thought
> that a starting effort would be
>
>         madvise((void *)0, (void *)-1, MADV_PAGEOUT)
>
> (after translation into process_madvise() speak).  Which is equivalent
> to the proposed process_madvise(MADV_DONTNEED_MM)?

Yep, this is very similar to option #3 in
https://lore.kernel.org/linux-api/CAJuCfpGz1kPM3G1gZH+09Z7aoWKg05QSAMMisJ7H5MdmRrRhNQ@mail.gmail.com
and I actually have a tested prototype for that. If that's the
preferred method then I can post it quite quickly.

>
> There may be things which trip this up, such as mlocked regions or
> whatever, but we could add another madvise `advice' mode to handle
> this?
Andrew Morton Nov. 14, 2020, 1:18 a.m. UTC | #5
On Fri, 13 Nov 2020 17:09:37 -0800 Suren Baghdasaryan <surenb@google.com> wrote:

> > > > Seems to me that the ability to reap another process's memory is a
> > > > generally useful one, and that it should not be tied to delivering a
> > > > signal in this fashion.
> > > >
> > > > And we do have the new process_madvise(MADV_PAGEOUT).  It may need a
> > > > few changes and tweaks, but can't that be used to solve this problem?
> > >
> > > Thank you for the feedback, Andrew. process_madvise(MADV_DONTNEED) was
> > > one of the options recently discussed in
> > > https://lore.kernel.org/linux-api/CAJuCfpGz1kPM3G1gZH+09Z7aoWKg05QSAMMisJ7H5MdmRrRhNQ@mail.gmail.com
> > > . The thread describes some of the issues with that approach but if we
> > > limit it to processes with pending SIGKILL only then I think that
> > > would be doable.
> >
> > Why would it be necessary to read /proc/pid/maps?  I'd have thought
> > that a starting effort would be
> >
> >         madvise((void *)0, (void *)-1, MADV_PAGEOUT)
> >
> > (after translation into process_madvise() speak).  Which is equivalent
> > to the proposed process_madvise(MADV_DONTNEED_MM)?
> 
> Yep, this is very similar to option #3 in
> https://lore.kernel.org/linux-api/CAJuCfpGz1kPM3G1gZH+09Z7aoWKg05QSAMMisJ7H5MdmRrRhNQ@mail.gmail.com
> and I actually have a tested prototype for that.

Why is the `vector=NULL' needed?  Can't `vector' point at a single iovec
which spans the whole address range?

> If that's the
> preferred method then I can post it quite quickly.

I assume you've tested that prototype.  How did its usefulness compare
with this SIGKILL-based approach?
Suren Baghdasaryan Nov. 14, 2020, 1:57 a.m. UTC | #6
On Fri, Nov 13, 2020 at 5:18 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Fri, 13 Nov 2020 17:09:37 -0800 Suren Baghdasaryan <surenb@google.com> wrote:
>
> > > > > Seems to me that the ability to reap another process's memory is a
> > > > > generally useful one, and that it should not be tied to delivering a
> > > > > signal in this fashion.
> > > > >
> > > > > And we do have the new process_madvise(MADV_PAGEOUT).  It may need a
> > > > > few changes and tweaks, but can't that be used to solve this problem?
> > > >
> > > > Thank you for the feedback, Andrew. process_madvise(MADV_DONTNEED) was
> > > > one of the options recently discussed in
> > > > https://lore.kernel.org/linux-api/CAJuCfpGz1kPM3G1gZH+09Z7aoWKg05QSAMMisJ7H5MdmRrRhNQ@mail.gmail.com
> > > > . The thread describes some of the issues with that approach but if we
> > > > limit it to processes with pending SIGKILL only then I think that
> > > > would be doable.
> > >
> > > Why would it be necessary to read /proc/pid/maps?  I'd have thought
> > > that a starting effort would be
> > >
> > >         madvise((void *)0, (void *)-1, MADV_PAGEOUT)
> > >
> > > (after translation into process_madvise() speak).  Which is equivalent
> > > to the proposed process_madvise(MADV_DONTNEED_MM)?
> >
> > Yep, this is very similar to option #3 in
> > https://lore.kernel.org/linux-api/CAJuCfpGz1kPM3G1gZH+09Z7aoWKg05QSAMMisJ7H5MdmRrRhNQ@mail.gmail.com
> > and I actually have a tested prototype for that.
>
> Why is the `vector=NULL' needed?  Can't `vector' point at a single iovec
> which spans the whole address range?

That would be the option #4 from the same discussion and the issues
noted there are "process_madvise return value can't handle such a
large number of bytes and there is MAX_RW_COUNT limit on max number of
bytes one process_madvise call can handle". In my prototype I have a
special handling for such "bulk operation" to work around the
MAX_RW_COUNT limitation.

>
> > If that's the
> > preferred method then I can post it quite quickly.
>
> I assume you've tested that prototype.  How did its usefulness compare
> with this SIGKILL-based approach?

Just to make sure I understand correctly your question, you are asking
about performance comparison of:

// approach in this RFC
pidfd_send_signal(SIGKILL, SYNC_REAP_MM)

vs

// option #4 in the previous RFC
kill(SIGKILL); process_madvise(vector=NULL, MADV_DONTNEED);

If so, I have results for the current RFC approach but the previous
approach was testing on an older device, so don't have
apples-to-apples comparison results at the moment. I can collect the
data for fair comparison if desired, however I don't expect a
noticeable performance difference since they both do pretty much the
same thing (even on different devices my results are quite close). I
think it's more a question of which API would be more appropriate.

>
Andrew Morton Nov. 14, 2020, 2:16 a.m. UTC | #7
On Fri, 13 Nov 2020 17:57:02 -0800 Suren Baghdasaryan <surenb@google.com> wrote:

> On Fri, Nov 13, 2020 at 5:18 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Fri, 13 Nov 2020 17:09:37 -0800 Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > > > > > Seems to me that the ability to reap another process's memory is a
> > > > > > generally useful one, and that it should not be tied to delivering a
> > > > > > signal in this fashion.
> > > > > >
> > > > > > And we do have the new process_madvise(MADV_PAGEOUT).  It may need a
> > > > > > few changes and tweaks, but can't that be used to solve this problem?
> > > > >
> > > > > Thank you for the feedback, Andrew. process_madvise(MADV_DONTNEED) was
> > > > > one of the options recently discussed in
> > > > > https://lore.kernel.org/linux-api/CAJuCfpGz1kPM3G1gZH+09Z7aoWKg05QSAMMisJ7H5MdmRrRhNQ@mail.gmail.com
> > > > > . The thread describes some of the issues with that approach but if we
> > > > > limit it to processes with pending SIGKILL only then I think that
> > > > > would be doable.
> > > >
> > > > Why would it be necessary to read /proc/pid/maps?  I'd have thought
> > > > that a starting effort would be
> > > >
> > > >         madvise((void *)0, (void *)-1, MADV_PAGEOUT)
> > > >
> > > > (after translation into process_madvise() speak).  Which is equivalent
> > > > to the proposed process_madvise(MADV_DONTNEED_MM)?
> > >
> > > Yep, this is very similar to option #3 in
> > > https://lore.kernel.org/linux-api/CAJuCfpGz1kPM3G1gZH+09Z7aoWKg05QSAMMisJ7H5MdmRrRhNQ@mail.gmail.com
> > > and I actually have a tested prototype for that.
> >
> > Why is the `vector=NULL' needed?  Can't `vector' point at a single iovec
> > which spans the whole address range?
> 
> That would be the option #4 from the same discussion and the issues
> noted there are "process_madvise return value can't handle such a
> large number of bytes and there is MAX_RW_COUNT limit on max number of
> bytes one process_madvise call can handle". In my prototype I have a
> special handling for such "bulk operation" to work around the
> MAX_RW_COUNT limitation.

Ah, OK, return value.  Maybe process_madvise() shouldn't have done that
and should have simply returned 0 on success, like madvise().

I guess a special "nuke whole address space" command is OK.  But, again
in the search for generality, the ability to nuke very large amounts of
address space (but not the entire address space) would be better. 

The process_madvise() return value issue could be addressed by adding a
process_madvise() mode which return 0 on success.

And I guess the MAX_RW_COUNT issue is solvable by adding an
import_iovec() arg to say "don't check that".  Along those lines.

It's all sounding a bit painful (but not *too* painful).  But to
reiterate, I do think that adding the ability for a process to shoot
down a large amount of another process's memory is a lot more generally
useful than tying it to SIGKILL, agree?

> >
> > > If that's the
> > > preferred method then I can post it quite quickly.
> >
> > I assume you've tested that prototype.  How did its usefulness compare
> > with this SIGKILL-based approach?
> 
> Just to make sure I understand correctly your question, you are asking
> about performance comparison of:
> 
> // approach in this RFC
> pidfd_send_signal(SIGKILL, SYNC_REAP_MM)
> 
> vs
> 
> // option #4 in the previous RFC
> kill(SIGKILL); process_madvise(vector=NULL, MADV_DONTNEED);
> 
> If so, I have results for the current RFC approach but the previous
> approach was testing on an older device, so don't have
> apples-to-apples comparison results at the moment. I can collect the
> data for fair comparison if desired, however I don't expect a
> noticeable performance difference since they both do pretty much the
> same thing (even on different devices my results are quite close). I
> think it's more a question of which API would be more appropriate.

OK.  I wouldn't expect performance to be very different (and things can
be sped up if so), but the API usefulness might be an issue.  Using
process_madvise() (or similar) makes it a two-step operation, whereas
tying it to SIGKILL&&TASK_UNINTERRUPTIBLE provides a more precise tool.
Any thoughts on this?
Suren Baghdasaryan Nov. 14, 2020, 2:51 a.m. UTC | #8
On Fri, Nov 13, 2020 at 6:16 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Fri, 13 Nov 2020 17:57:02 -0800 Suren Baghdasaryan <surenb@google.com> wrote:
>
> > On Fri, Nov 13, 2020 at 5:18 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > On Fri, 13 Nov 2020 17:09:37 -0800 Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > > > > > Seems to me that the ability to reap another process's memory is a
> > > > > > > generally useful one, and that it should not be tied to delivering a
> > > > > > > signal in this fashion.
> > > > > > >
> > > > > > > And we do have the new process_madvise(MADV_PAGEOUT).  It may need a
> > > > > > > few changes and tweaks, but can't that be used to solve this problem?
> > > > > >
> > > > > > Thank you for the feedback, Andrew. process_madvise(MADV_DONTNEED) was
> > > > > > one of the options recently discussed in
> > > > > > https://lore.kernel.org/linux-api/CAJuCfpGz1kPM3G1gZH+09Z7aoWKg05QSAMMisJ7H5MdmRrRhNQ@mail.gmail.com
> > > > > > . The thread describes some of the issues with that approach but if we
> > > > > > limit it to processes with pending SIGKILL only then I think that
> > > > > > would be doable.
> > > > >
> > > > > Why would it be necessary to read /proc/pid/maps?  I'd have thought
> > > > > that a starting effort would be
> > > > >
> > > > >         madvise((void *)0, (void *)-1, MADV_PAGEOUT)
> > > > >
> > > > > (after translation into process_madvise() speak).  Which is equivalent
> > > > > to the proposed process_madvise(MADV_DONTNEED_MM)?
> > > >
> > > > Yep, this is very similar to option #3 in
> > > > https://lore.kernel.org/linux-api/CAJuCfpGz1kPM3G1gZH+09Z7aoWKg05QSAMMisJ7H5MdmRrRhNQ@mail.gmail.com
> > > > and I actually have a tested prototype for that.
> > >
> > > Why is the `vector=NULL' needed?  Can't `vector' point at a single iovec
> > > which spans the whole address range?
> >
> > That would be the option #4 from the same discussion and the issues
> > noted there are "process_madvise return value can't handle such a
> > large number of bytes and there is MAX_RW_COUNT limit on max number of
> > bytes one process_madvise call can handle". In my prototype I have a
> > special handling for such "bulk operation" to work around the
> > MAX_RW_COUNT limitation.
>
> Ah, OK, return value.  Maybe process_madvise() shouldn't have done that
> and should have simply returned 0 on success, like madvise().
>
> I guess a special "nuke whole address space" command is OK.  But, again
> in the search for generality, the ability to nuke very large amounts of
> address space (but not the entire address space) would be better.
>
> The process_madvise() return value issue could be addressed by adding a
> process_madvise() mode which return 0 on success.
>
> And I guess the MAX_RW_COUNT issue is solvable by adding an
> import_iovec() arg to say "don't check that".  Along those lines.
>
> It's all sounding a bit painful (but not *too* painful).  But to
> reiterate, I do think that adding the ability for a process to shoot
> down a large amount of another process's memory is a lot more generally
> useful than tying it to SIGKILL, agree?

I see. So you are suggesting a mode where process_madvise() can
operate on large areas spanning multiple VMAs. This slightly differs
from option 4 in the previous RFC which suggested a special mode that
operates on the *entire* mm of the process. I agree, your suggestion
is more generic.

>
> > >
> > > > If that's the
> > > > preferred method then I can post it quite quickly.
> > >
> > > I assume you've tested that prototype.  How did its usefulness compare
> > > with this SIGKILL-based approach?
> >
> > Just to make sure I understand correctly your question, you are asking
> > about performance comparison of:
> >
> > // approach in this RFC
> > pidfd_send_signal(SIGKILL, SYNC_REAP_MM)
> >
> > vs
> >
> > // option #4 in the previous RFC
> > kill(SIGKILL); process_madvise(vector=NULL, MADV_DONTNEED);
> >
> > If so, I have results for the current RFC approach but the previous
> > approach was testing on an older device, so don't have
> > apples-to-apples comparison results at the moment. I can collect the
> > data for fair comparison if desired, however I don't expect a
> > noticeable performance difference since they both do pretty much the
> > same thing (even on different devices my results are quite close). I
> > think it's more a question of which API would be more appropriate.
>
> OK.  I wouldn't expect performance to be very different (and things can
> be sped up if so), but the API usefulness might be an issue.  Using
> process_madvise() (or similar) makes it a two-step operation, whereas
> tying it to SIGKILL&&TASK_UNINTERRUPTIBLE provides a more precise tool.
> Any thoughts on this?
>
Minchan Kim Nov. 16, 2020, 11:24 p.m. UTC | #9
On Fri, Nov 13, 2020 at 06:16:32PM -0800, Andrew Morton wrote:
> On Fri, 13 Nov 2020 17:57:02 -0800 Suren Baghdasaryan <surenb@google.com> wrote:
> 
> > On Fri, Nov 13, 2020 at 5:18 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > On Fri, 13 Nov 2020 17:09:37 -0800 Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > > > > > Seems to me that the ability to reap another process's memory is a
> > > > > > > generally useful one, and that it should not be tied to delivering a
> > > > > > > signal in this fashion.
> > > > > > >
> > > > > > > And we do have the new process_madvise(MADV_PAGEOUT).  It may need a
> > > > > > > few changes and tweaks, but can't that be used to solve this problem?
> > > > > >
> > > > > > Thank you for the feedback, Andrew. process_madvise(MADV_DONTNEED) was
> > > > > > one of the options recently discussed in
> > > > > > https://lore.kernel.org/linux-api/CAJuCfpGz1kPM3G1gZH+09Z7aoWKg05QSAMMisJ7H5MdmRrRhNQ@mail.gmail.com
> > > > > > . The thread describes some of the issues with that approach but if we
> > > > > > limit it to processes with pending SIGKILL only then I think that
> > > > > > would be doable.
> > > > >
> > > > > Why would it be necessary to read /proc/pid/maps?  I'd have thought
> > > > > that a starting effort would be
> > > > >
> > > > >         madvise((void *)0, (void *)-1, MADV_PAGEOUT)
> > > > >
> > > > > (after translation into process_madvise() speak).  Which is equivalent
> > > > > to the proposed process_madvise(MADV_DONTNEED_MM)?
> > > >
> > > > Yep, this is very similar to option #3 in
> > > > https://lore.kernel.org/linux-api/CAJuCfpGz1kPM3G1gZH+09Z7aoWKg05QSAMMisJ7H5MdmRrRhNQ@mail.gmail.com
> > > > and I actually have a tested prototype for that.
> > >
> > > Why is the `vector=NULL' needed?  Can't `vector' point at a single iovec
> > > which spans the whole address range?
> > 
> > That would be the option #4 from the same discussion and the issues
> > noted there are "process_madvise return value can't handle such a
> > large number of bytes and there is MAX_RW_COUNT limit on max number of
> > bytes one process_madvise call can handle". In my prototype I have a
> > special handling for such "bulk operation" to work around the
> > MAX_RW_COUNT limitation.
> 
> Ah, OK, return value.  Maybe process_madvise() shouldn't have done that
> and should have simply returned 0 on success, like madvise().
> 
> I guess a special "nuke whole address space" command is OK.  But, again
> in the search for generality, the ability to nuke very large amounts of
> address space (but not the entire address space) would be better. 
> 
> The process_madvise() return value issue could be addressed by adding a
> process_madvise() mode which return 0 on success.
> 
> And I guess the MAX_RW_COUNT issue is solvable by adding an
> import_iovec() arg to say "don't check that".  Along those lines.
> 
> It's all sounding a bit painful (but not *too* painful).  But to
> reiterate, I do think that adding the ability for a process to shoot
> down a large amount of another process's memory is a lot more generally
> useful than tying it to SIGKILL, agree?

I agree the direction but I think it's the general problem for every
APIs have supported iovec and not sure process_madvise is special to
chage it.
IOW, it wouldn't be a problem to support *entire address space* special
mode but not sure to support *large amount of address space* at the cost
of breaking existing iovec scheme.
Christian Brauner Nov. 18, 2020, 10:32 a.m. UTC | #10
On Fri, Nov 13, 2020 at 03:55:39PM -0800, Andrew Morton wrote:
> On Fri, 13 Nov 2020 09:34:48 -0800 Suren Baghdasaryan <surenb@google.com> wrote:
> 
> > When a process is being killed it might be in an uninterruptible sleep
> > which leads to an unpredictable delay in its memory reclaim. In low memory
> > situations, when it's important to free up memory quickly, such delay is
> > problematic. Kernel solves this problem with oom-reaper thread which
> > performs memory reclaim even when the victim process is not runnable.
> > Userspace currently lacks such mechanisms and the need and potential
> > solutions were discussed before (see links below).
> > This patch provides a mechanism to perform memory reclaim in the context
> > of the process that sends SIGKILL signal. New SYNC_REAP_MM flag for
> > pidfd_send_signal syscall can be used only when sending SIGKILL signal
> > and will lead to the caller synchronously reclaiming the memory that
> > belongs to the victim and can be easily reclaimed.
> 
> hm.
> 
> Seems to me that the ability to reap another process's memory is a
> generally useful one, and that it should not be tied to delivering a
> signal in this fashion.

I agree and I see you've already had some good ideas how to tie this to
process_madvise(). If that's workable for your use-case then I'd prefer
that approach. Signals are almost always not a great choice.

Christian
Michal Hocko Nov. 18, 2020, 7:10 p.m. UTC | #11
On Fri 13-11-20 18:16:32, Andrew Morton wrote:
[...]
> It's all sounding a bit painful (but not *too* painful).  But to
> reiterate, I do think that adding the ability for a process to shoot
> down a large amount of another process's memory is a lot more generally
> useful than tying it to SIGKILL, agree?

I am not sure TBH. Is there any reasonable usecase where uncoordinated
memory tear down is OK and a target process which is able to see the
unmapped memory?
Suren Baghdasaryan Nov. 18, 2020, 7:22 p.m. UTC | #12
On Wed, Nov 18, 2020 at 11:10 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Fri 13-11-20 18:16:32, Andrew Morton wrote:
> [...]
> > It's all sounding a bit painful (but not *too* painful).  But to
> > reiterate, I do think that adding the ability for a process to shoot
> > down a large amount of another process's memory is a lot more generally
> > useful than tying it to SIGKILL, agree?
>
> I am not sure TBH. Is there any reasonable usecase where uncoordinated
> memory tear down is OK and a target process which is able to see the
> unmapped memory?

I think uncoordinated memory tear down is a special case which makes
sense only when the target process is being killed (and we can enforce
that by allowing MADV_DONTNEED to be used only if the target process
has pending SIGKILL). However, the ability to apply other flavors of
process_madvise() to large memory areas spanning multiple VMAs can be
useful in more cases. For example in Android we will use
process_madvise(MADV_PAGEOUT) to "shrink" an inactive background
process. Today we have to read /proc/maps and construct the vector of
VMAs even when applying this advice to the entire process. With such a
special mode we could achieve this more efficiently and with less
hussle.

> --
> Michal Hocko
> SUSE Labs
Michal Hocko Nov. 18, 2020, 7:32 p.m. UTC | #13
On Wed 18-11-20 11:22:21, Suren Baghdasaryan wrote:
> On Wed, Nov 18, 2020 at 11:10 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Fri 13-11-20 18:16:32, Andrew Morton wrote:
> > [...]
> > > It's all sounding a bit painful (but not *too* painful).  But to
> > > reiterate, I do think that adding the ability for a process to shoot
> > > down a large amount of another process's memory is a lot more generally
> > > useful than tying it to SIGKILL, agree?
> >
> > I am not sure TBH. Is there any reasonable usecase where uncoordinated
> > memory tear down is OK and a target process which is able to see the
> > unmapped memory?
> 
> I think uncoordinated memory tear down is a special case which makes
> sense only when the target process is being killed (and we can enforce
> that by allowing MADV_DONTNEED to be used only if the target process
> has pending SIGKILL).

That would be safe but then I am wondering whether it makes sense to
implement as a madvise call. It is quite strange to expect somebody call
a syscall on a killed process. But this is more a detail. I am not a
great fan of a more generic MADV_DONTNEED on a remote process. This is
just too dangerous IMHO.

> However, the ability to apply other flavors of
> process_madvise() to large memory areas spanning multiple VMAs can be
> useful in more cases.

Yes I do agree with that. The error reporting would be more tricky but
I am not really sure that the exact reporting is really necessary for
advice like interface.

> For example in Android we will use
> process_madvise(MADV_PAGEOUT) to "shrink" an inactive background
> process.

That makes sense to me.
Suren Baghdasaryan Nov. 18, 2020, 7:51 p.m. UTC | #14
On Wed, Nov 18, 2020 at 11:32 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 18-11-20 11:22:21, Suren Baghdasaryan wrote:
> > On Wed, Nov 18, 2020 at 11:10 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Fri 13-11-20 18:16:32, Andrew Morton wrote:
> > > [...]
> > > > It's all sounding a bit painful (but not *too* painful).  But to
> > > > reiterate, I do think that adding the ability for a process to shoot
> > > > down a large amount of another process's memory is a lot more generally
> > > > useful than tying it to SIGKILL, agree?
> > >
> > > I am not sure TBH. Is there any reasonable usecase where uncoordinated
> > > memory tear down is OK and a target process which is able to see the
> > > unmapped memory?
> >
> > I think uncoordinated memory tear down is a special case which makes
> > sense only when the target process is being killed (and we can enforce
> > that by allowing MADV_DONTNEED to be used only if the target process
> > has pending SIGKILL).
>
> That would be safe but then I am wondering whether it makes sense to
> implement as a madvise call. It is quite strange to expect somebody call
> a syscall on a killed process. But this is more a detail. I am not a
> great fan of a more generic MADV_DONTNEED on a remote process. This is
> just too dangerous IMHO.

Agree 100%

>
> > However, the ability to apply other flavors of
> > process_madvise() to large memory areas spanning multiple VMAs can be
> > useful in more cases.
>
> Yes I do agree with that. The error reporting would be more tricky but
> I am not really sure that the exact reporting is really necessary for
> advice like interface.

Andrew's suggestion for this special mode to change return semantics
to the usual "0 or error code" seems to me like the most reasonable
way to deal with the return value limitation.

>
> > For example in Android we will use
> > process_madvise(MADV_PAGEOUT) to "shrink" an inactive background
> > process.
>
> That makes sense to me.
> --
> Michal Hocko
> SUSE Labs
Suren Baghdasaryan Nov. 18, 2020, 7:55 p.m. UTC | #15
On Wed, Nov 18, 2020 at 11:51 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Nov 18, 2020 at 11:32 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Wed 18-11-20 11:22:21, Suren Baghdasaryan wrote:
> > > On Wed, Nov 18, 2020 at 11:10 AM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Fri 13-11-20 18:16:32, Andrew Morton wrote:
> > > > [...]
> > > > > It's all sounding a bit painful (but not *too* painful).  But to
> > > > > reiterate, I do think that adding the ability for a process to shoot
> > > > > down a large amount of another process's memory is a lot more generally
> > > > > useful than tying it to SIGKILL, agree?
> > > >
> > > > I am not sure TBH. Is there any reasonable usecase where uncoordinated
> > > > memory tear down is OK and a target process which is able to see the
> > > > unmapped memory?
> > >
> > > I think uncoordinated memory tear down is a special case which makes
> > > sense only when the target process is being killed (and we can enforce
> > > that by allowing MADV_DONTNEED to be used only if the target process
> > > has pending SIGKILL).
> >
> > That would be safe but then I am wondering whether it makes sense to
> > implement as a madvise call. It is quite strange to expect somebody call
> > a syscall on a killed process. But this is more a detail. I am not a
> > great fan of a more generic MADV_DONTNEED on a remote process. This is
> > just too dangerous IMHO.
>
> Agree 100%

I assumed here that by "a more generic MADV_DONTNEED on a remote
process" you meant "process_madvise(MADV_DONTNEED) applied to a
process that is not being killed". Re-reading your comment I realized
that you might have meant "process_madvice() with generic support to
large memory areas". I hope I understood you correctly.

>
> >
> > > However, the ability to apply other flavors of
> > > process_madvise() to large memory areas spanning multiple VMAs can be
> > > useful in more cases.
> >
> > Yes I do agree with that. The error reporting would be more tricky but
> > I am not really sure that the exact reporting is really necessary for
> > advice like interface.
>
> Andrew's suggestion for this special mode to change return semantics
> to the usual "0 or error code" seems to me like the most reasonable
> way to deal with the return value limitation.
>
> >
> > > For example in Android we will use
> > > process_madvise(MADV_PAGEOUT) to "shrink" an inactive background
> > > process.
> >
> > That makes sense to me.
> > --
> > Michal Hocko
> > SUSE Labs
Suren Baghdasaryan Nov. 19, 2020, 12:13 a.m. UTC | #16
On Wed, Nov 18, 2020 at 11:55 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Nov 18, 2020 at 11:51 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Wed, Nov 18, 2020 at 11:32 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Wed 18-11-20 11:22:21, Suren Baghdasaryan wrote:
> > > > On Wed, Nov 18, 2020 at 11:10 AM Michal Hocko <mhocko@suse.com> wrote:
> > > > >
> > > > > On Fri 13-11-20 18:16:32, Andrew Morton wrote:
> > > > > [...]
> > > > > > It's all sounding a bit painful (but not *too* painful).  But to
> > > > > > reiterate, I do think that adding the ability for a process to shoot
> > > > > > down a large amount of another process's memory is a lot more generally
> > > > > > useful than tying it to SIGKILL, agree?

I was looking into how to work around the limitation of MAX_RW_COUNT
and the conceptual issue there is the "struct iovec" which has its
iov_len as size_t that lacks capacity for expressing ranges like
"entire process memory". I would like to check your reaction to the
following idea which can be implemented without painful surgeries to
the import_iovec and its friends.

process_madvise(pidfd, iovec = [ { range_start_addr, 0 }, {
range_end_addr, 0 } ], vlen = 2, behavior=MADV_xxx, flags =
PMADV_FLAG_RANGE)

So, to represent a range we pass a new PMADV_FLAG_RANGE flag and
construct a 2-element vector to express range start and range end
using iovec.iov_base members. iov_len member of the iovec elements is
ignored in this mode. I know it sounds hacky but I think it's the
simplest way if we want the ability to express an arbitrarily large
range.
Another option is to do what Andrew described as "madvise((void *)0,
(void *)-1, MADV_PAGEOUT)" which means this mode works only with the
entire mm of the process.
WDYT?

> > > > >
> > > > > I am not sure TBH. Is there any reasonable usecase where uncoordinated
> > > > > memory tear down is OK and a target process which is able to see the
> > > > > unmapped memory?
> > > >
> > > > I think uncoordinated memory tear down is a special case which makes
> > > > sense only when the target process is being killed (and we can enforce
> > > > that by allowing MADV_DONTNEED to be used only if the target process
> > > > has pending SIGKILL).
> > >
> > > That would be safe but then I am wondering whether it makes sense to
> > > implement as a madvise call. It is quite strange to expect somebody call
> > > a syscall on a killed process. But this is more a detail. I am not a
> > > great fan of a more generic MADV_DONTNEED on a remote process. This is
> > > just too dangerous IMHO.
> >
> > Agree 100%
>
> I assumed here that by "a more generic MADV_DONTNEED on a remote
> process" you meant "process_madvise(MADV_DONTNEED) applied to a
> process that is not being killed". Re-reading your comment I realized
> that you might have meant "process_madvice() with generic support to
> large memory areas". I hope I understood you correctly.
>
> >
> > >
> > > > However, the ability to apply other flavors of
> > > > process_madvise() to large memory areas spanning multiple VMAs can be
> > > > useful in more cases.
> > >
> > > Yes I do agree with that. The error reporting would be more tricky but
> > > I am not really sure that the exact reporting is really necessary for
> > > advice like interface.
> >
> > Andrew's suggestion for this special mode to change return semantics
> > to the usual "0 or error code" seems to me like the most reasonable
> > way to deal with the return value limitation.
> >
> > >
> > > > For example in Android we will use
> > > > process_madvise(MADV_PAGEOUT) to "shrink" an inactive background
> > > > process.
> > >
> > > That makes sense to me.
> > > --
> > > Michal Hocko
> > > SUSE Labs
Suren Baghdasaryan Nov. 24, 2020, 5:45 a.m. UTC | #17
On Wed, Nov 18, 2020 at 4:13 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Nov 18, 2020 at 11:55 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Wed, Nov 18, 2020 at 11:51 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Wed, Nov 18, 2020 at 11:32 AM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Wed 18-11-20 11:22:21, Suren Baghdasaryan wrote:
> > > > > On Wed, Nov 18, 2020 at 11:10 AM Michal Hocko <mhocko@suse.com> wrote:
> > > > > >
> > > > > > On Fri 13-11-20 18:16:32, Andrew Morton wrote:
> > > > > > [...]
> > > > > > > It's all sounding a bit painful (but not *too* painful).  But to
> > > > > > > reiterate, I do think that adding the ability for a process to shoot
> > > > > > > down a large amount of another process's memory is a lot more generally
> > > > > > > useful than tying it to SIGKILL, agree?
>
> I was looking into how to work around the limitation of MAX_RW_COUNT
> and the conceptual issue there is the "struct iovec" which has its
> iov_len as size_t that lacks capacity for expressing ranges like
> "entire process memory". I would like to check your reaction to the
> following idea which can be implemented without painful surgeries to
> the import_iovec and its friends.
>
> process_madvise(pidfd, iovec = [ { range_start_addr, 0 }, {
> range_end_addr, 0 } ], vlen = 2, behavior=MADV_xxx, flags =
> PMADV_FLAG_RANGE)
>
> So, to represent a range we pass a new PMADV_FLAG_RANGE flag and
> construct a 2-element vector to express range start and range end
> using iovec.iov_base members. iov_len member of the iovec elements is
> ignored in this mode. I know it sounds hacky but I think it's the
> simplest way if we want the ability to express an arbitrarily large
> range.
> Another option is to do what Andrew described as "madvise((void *)0,
> (void *)-1, MADV_PAGEOUT)" which means this mode works only with the
> entire mm of the process.
> WDYT?
>

To follow up on this discussion, I posted a patchset to implement
process_madvise(MADV_DONTNEED) supporting the entire mm range at
https://lkml.org/lkml/2020/11/24/21.

> > > > > >
> > > > > > I am not sure TBH. Is there any reasonable usecase where uncoordinated
> > > > > > memory tear down is OK and a target process which is able to see the
> > > > > > unmapped memory?
> > > > >
> > > > > I think uncoordinated memory tear down is a special case which makes
> > > > > sense only when the target process is being killed (and we can enforce
> > > > > that by allowing MADV_DONTNEED to be used only if the target process
> > > > > has pending SIGKILL).
> > > >
> > > > That would be safe but then I am wondering whether it makes sense to
> > > > implement as a madvise call. It is quite strange to expect somebody call
> > > > a syscall on a killed process. But this is more a detail. I am not a
> > > > great fan of a more generic MADV_DONTNEED on a remote process. This is
> > > > just too dangerous IMHO.
> > >
> > > Agree 100%
> >
> > I assumed here that by "a more generic MADV_DONTNEED on a remote
> > process" you meant "process_madvise(MADV_DONTNEED) applied to a
> > process that is not being killed". Re-reading your comment I realized
> > that you might have meant "process_madvice() with generic support to
> > large memory areas". I hope I understood you correctly.
> >
> > >
> > > >
> > > > > However, the ability to apply other flavors of
> > > > > process_madvise() to large memory areas spanning multiple VMAs can be
> > > > > useful in more cases.
> > > >
> > > > Yes I do agree with that. The error reporting would be more tricky but
> > > > I am not really sure that the exact reporting is really necessary for
> > > > advice like interface.
> > >
> > > Andrew's suggestion for this special mode to change return semantics
> > > to the usual "0 or error code" seems to me like the most reasonable
> > > way to deal with the return value limitation.
> > >
> > > >
> > > > > For example in Android we will use
> > > > > process_madvise(MADV_PAGEOUT) to "shrink" an inactive background
> > > > > process.
> > > >
> > > > That makes sense to me.
> > > > --
> > > > Michal Hocko
> > > > SUSE Labs
diff mbox series

Patch

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 2db9a1432511..9a8dcabdfdf1 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -111,6 +111,8 @@  bool __oom_reap_task_mm(struct mm_struct *mm);
 long oom_badness(struct task_struct *p,
 		unsigned long totalpages);
 
+extern bool task_will_free_mem(struct task_struct *task);
+
 extern bool out_of_memory(struct oom_control *oc);
 
 extern void exit_oom_victim(void);
diff --git a/include/linux/signal.h b/include/linux/signal.h
index b256f9c65661..5deafc99035d 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -449,6 +449,13 @@  extern bool unhandled_signal(struct task_struct *tsk, int sig);
 	(!siginmask(signr, SIG_KERNEL_IGNORE_MASK|SIG_KERNEL_STOP_MASK) && \
 	 (t)->sighand->action[(signr)-1].sa.sa_handler == SIG_DFL)
 
+/*
+ * Flag values used in pidfd_send_signal:
+ *
+ * SYNC_REAP_MM indicates request to reclaim mm after SIGKILL.
+ */
+#define SYNC_REAP_MM	0x1
+
 void signals_init(void);
 
 int restore_altstack(const stack_t __user *);
diff --git a/kernel/signal.c b/kernel/signal.c
index ef8f2a28d37c..15d4be5600a3 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -46,6 +46,7 @@ 
 #include <linux/livepatch.h>
 #include <linux/cgroup.h>
 #include <linux/audit.h>
+#include <linux/oom.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/signal.h>
@@ -3711,6 +3712,63 @@  static struct pid *pidfd_to_pid(const struct file *file)
 	return tgid_pidfd_to_pid(file);
 }
 
+static int reap_mm(struct pid *pid)
+{
+	struct task_struct *task;
+	struct mm_struct *mm;
+	int ret = 0;
+
+	/* Get the task_struct */
+	task = get_pid_task(pid, PIDTYPE_PID);
+	if (!task) {
+		ret = -ESRCH;
+		goto out;
+	}
+
+	task_lock(task);
+
+	/* Check if memory can be easily reclaimed */
+	if (!task_will_free_mem(task)) {
+		task_unlock(task);
+		ret = -EBUSY;
+		goto release_task;
+	}
+
+	/* Get mm to prevent exit_mmap */
+	mm = task->mm;
+	mmget(mm);
+
+	/* Ensure no competition with OOM-killer to prevent contention */
+	if (unlikely(mm_is_oom_victim(mm)) ||
+	    unlikely(test_bit(MMF_OOM_SKIP, &mm->flags))) {
+		/* Already being reclaimed */
+		task_unlock(task);
+		goto drop_mm;
+	}
+	/*
+	 * Prevent OOM-killer or other pidfd_send_signal from considering
+	 * this task
+	 */
+	set_bit(MMF_OOM_SKIP, &mm->flags);
+
+	task_unlock(task);
+
+	mmap_read_lock(mm);
+	if (!__oom_reap_task_mm(mm)) {
+		/* Failed to reap part of the address space. User can retry */
+		ret = -EAGAIN;
+		clear_bit(MMF_OOM_SKIP, &mm->flags);
+	}
+	mmap_read_unlock(mm);
+
+drop_mm:
+	mmput(mm);
+release_task:
+	put_task_struct(task);
+out:
+	return ret;
+}
+
 /**
  * sys_pidfd_send_signal - Signal a process through a pidfd
  * @pidfd:  file descriptor of the process
@@ -3737,10 +3795,16 @@  SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
 	struct pid *pid;
 	kernel_siginfo_t kinfo;
 
-	/* Enforce flags be set to 0 until we add an extension. */
-	if (flags)
+	/* Enforce only valid flags. */
+	if (flags) {
+		/* Allow SYNC_REAP_MM only with SIGKILL. */
+		if (flags == SYNC_REAP_MM && sig == SIGKILL)
+			goto valid;
+
 		return -EINVAL;
+	}
 
+valid:
 	f = fdget(pidfd);
 	if (!f.file)
 		return -EBADF;
@@ -3775,6 +3839,11 @@  SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
 	}
 
 	ret = kill_pid_info(sig, &kinfo, pid);
+	if (unlikely(ret))
+		goto err;
+
+	if (flags & SYNC_REAP_MM)
+		ret = reap_mm(pid);
 
 err:
 	fdput(f);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8b84661a6410..66c90bca25bc 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -808,7 +808,7 @@  static inline bool __task_will_free_mem(struct task_struct *task)
  * Caller has to make sure that task->mm is stable (hold task_lock or
  * it operates on the current).
  */
-static bool task_will_free_mem(struct task_struct *task)
+bool task_will_free_mem(struct task_struct *task)
 {
 	struct mm_struct *mm = task->mm;
 	struct task_struct *p;