
[v3] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt

Message ID 20210519201743.3260890-1-atomlin@redhat.com (mailing list archive)
State New, archived
Series [v3] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt

Commit Message

Aaron Tomlin May 19, 2021, 8:17 p.m. UTC
It does not make sense to retry compaction when a fatal signal is
pending.

In the context of try_to_compact_pages(), COMPACT_SKIPPED can indeed be
returned; however, when a fatal signal is pending, not every zone on the
zone list is considered, because the zone-list walk is abandoned early.
Yet in should_compact_retry(), given the last known compaction result,
each zone on the zone list can still be checked
(see compaction_zonelist_suitable()). For example, if compaction is
deemed able to succeed in some zone, reclaim/compaction is tried again,
notwithstanding the pending fatal signal.

This patch ensures that compaction is not needlessly retried,
irrespective of the last known compaction result (e.g. COMPACT_SKIPPED),
in the unlikely case a fatal signal is found to be pending, so that the
OOM path is at least attempted.

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
---
 mm/page_alloc.c | 3 +++
 1 file changed, 3 insertions(+)
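
For context, the interaction described in the commit message looks roughly as
follows (a simplified, paraphrased sketch of the relevant kernel code paths at
the time of this patch; argument lists are trimmed and this is not verbatim
source):

	/* try_to_compact_pages(): the zone-list walk stops early */
	for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ...) {
		status = compact_zone_order(zone, order, gfp_mask, prio, ...);
		rc = max(status, rc);

		/* do not try further zones once a fatal signal is detected */
		if ((prio == COMPACT_PRIO_ASYNC && need_resched())
					|| fatal_signal_pending(current))
			break;
	}

	/* should_compact_retry(): a skipped result re-checks every zone */
	if (compact_result == COMPACT_SKIPPED) {
		/*
		 * compaction_zonelist_suitable() may find a suitable zone
		 * that the interrupted walk above never reached, so the
		 * caller retries reclaim/compaction anyway.
		 */
		return compaction_zonelist_suitable(ac, order, alloc_flags);
	}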

Comments

Andrew Morton May 20, 2021, 4:34 a.m. UTC | #1
On Wed, 19 May 2021 21:17:43 +0100 Aaron Tomlin <atomlin@redhat.com> wrote:

> It does not make sense to retry compaction when a fatal signal is
> pending.

Well, it might make sense.  Presumably it is beneficial to other tasks.

> In the context of try_to_compact_pages(), indeed COMPACT_SKIPPED can be
> returned; albeit, not every zone, on the zone list, would be considered
> in the case a fatal signal is found to be pending.
> Yet, in should_compact_retry(), given the last known compaction result,
> each zone, on the zone list, can be considered/or checked
> (see compaction_zonelist_suitable()). For example, if a zone was found
> to succeed, then reclaim/compaction would be tried again
> (notwithstanding the above).
> 
> This patch ensures that compaction is not needlessly retried
> irrespective of the last known compaction result e.g. if it was skipped,
> in the unlikely case a fatal signal is found pending.
> So, OOM is at least attempted.

What observed problems motivated this change?

What were the observed runtime effects of this change?
Vlastimil Babka May 20, 2021, 10:20 a.m. UTC | #2
On 5/20/21 6:34 AM, Andrew Morton wrote:
> On Wed, 19 May 2021 21:17:43 +0100 Aaron Tomlin <atomlin@redhat.com> wrote:
> 
>> It does not make sense to retry compaction when a fatal signal is
>> pending.
> 
> Well, it might make sense.  Presumably it is beneficial to other tasks.

Yeah but the compaction won't happen. compact_zone() will immediately detect it
via __compact_finished() and bail out. So in that sense it does not make sense
to retry :)
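
Roughly, the bail-out looks like this (an illustrative sketch only, not
verbatim kernel source; the exact location of the fatal-signal check and the
return value may differ):

	/* compact_zone(), heavily simplified */
	while ((ret = compact_finished(cc)) == COMPACT_CONTINUE) {
		if (fatal_signal_pending(current)) {
			/* give up on this zone without doing any real work */
			cc->contended = true;
			ret = COMPACT_CONTENDED;
			break;
		}
		/* ... otherwise isolate and migrate pages ... */
	}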

>> In the context of try_to_compact_pages(), indeed COMPACT_SKIPPED can be
>> returned; albeit, not every zone, on the zone list, would be considered
>> in the case a fatal signal is found to be pending.
>> Yet, in should_compact_retry(), given the last known compaction result,
>> each zone, on the zone list, can be considered/or checked
>> (see compaction_zonelist_suitable()). For example, if a zone was found
>> to succeed, then reclaim/compaction would be tried again
>> (notwithstanding the above).
>> 
>> This patch ensures that compaction is not needlessly retried
>> irrespective of the last known compaction result e.g. if it was skipped,
>> in the unlikely case a fatal signal is found pending.
>> So, OOM is at least attempted.
> 
> What observed problems motivated this change?
> 
> What were the observed runtime effects of this change?

Yep those details from the previous thread should be included here.
Matthew Wilcox May 20, 2021, 11:09 a.m. UTC | #3
On Wed, May 19, 2021 at 09:34:55PM -0700, Andrew Morton wrote:
> On Wed, 19 May 2021 21:17:43 +0100 Aaron Tomlin <atomlin@redhat.com> wrote:
> 
> > It does not make sense to retry compaction when a fatal signal is
> > pending.
> 
> Well, it might make sense.  Presumably it is beneficial to other tasks.

Apart from Vlastimil's point, if I hit ^C, I want the task to die,
as soon as possible.  I don't want it to do things which are beneficial
to other tasks, I want my shell prompt back, not spending seconds
trying to compact memory.  Some other task can do that if _it_ needs
large contiguous chunks.
Aaron Tomlin May 20, 2021, 11:42 a.m. UTC | #4
On Thu 2021-05-20 12:20 +0200, Vlastimil Babka wrote:
> On 5/20/21 6:34 AM, Andrew Morton wrote:
> > 
> > What observed problems motivated this change?
> > 
> > What were the observed runtime effects of this change?
> 
> Yep those details from the previous thread should be included here.

Fair enough.

During analysis of a kernel crash dump (vmcore), I discovered that, in the
context of __alloc_pages_slowpath(), the value stored in the no_progress_loops
variable was 31,611,688, i.e. well above MAX_RECLAIM_RETRIES (16), and a fatal
signal was pending against current.
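
That only makes sense if the slow path kept looping well past the reclaim
retry limit, roughly like this (a simplified, paraphrased sketch of the retry
logic in __alloc_pages_slowpath(); not verbatim source):

	retry:
		/* ... direct reclaim / direct compaction attempts ... */

		/*
		 * Stops requesting more reclaim retries once no_progress_loops
		 * exceeds MAX_RECLAIM_RETRIES (16).
		 */
		if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
					 did_some_progress > 0, &no_progress_loops))
			goto retry;

		/*
		 * But should_compact_retry() can still return true, e.g. after
		 * COMPACT_SKIPPED when compaction_zonelist_suitable() reports a
		 * suitable zone, even though compaction will not actually run
		 * while a fatal signal is pending; so the task keeps looping
		 * and no_progress_loops keeps growing instead of reaching OOM
		 * or dying.
		 */
		if (did_some_progress > 0 &&
				should_compact_retry(ac, order, alloc_flags,
					compact_result, &compact_priority,
					&compaction_retries))
			goto retry;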


     #6 [ffff00002e78f7c0] do_try_to_free_pages+0xe4 at ffff00001028bd24
     #7 [ffff00002e78f840] try_to_free_pages+0xe4 at ffff00001028c0f4
     #8 [ffff00002e78f900] __alloc_pages_nodemask+0x500 at ffff0000102cd130

                                                             //      w28 = *(sp + 148)      /* no_progress_loops */
     0xffff0000102cd1e0 <__alloc_pages_nodemask+0x5b0>:      ldr     w0, [sp,#148]
                                                             //      w0 = w0 + 0x1
     0xffff0000102cd1e4 <__alloc_pages_nodemask+0x5b4>:      add     w0, w0, #0x1
                                                             //      *(sp + 148) = w0
     0xffff0000102cd1e8 <__alloc_pages_nodemask+0x5b8>:      str     w0, [sp,#148]
                                                             //      if (w0 > 0x10)
                                                             //          goto __alloc_pages_nodemask+0x904
     0xffff0000102cd1ec <__alloc_pages_nodemask+0x5bc>:      cmp     w0, #0x10
     0xffff0000102cd1f0 <__alloc_pages_nodemask+0x5c0>:      b.gt    0xffff0000102cd534

 - The stack pointer was 0xffff00002e78f900

     crash> p *(int *)(0xffff00002e78f900+148)
     $1 = 31611688

     crash> ps 521171
        PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
     > 521171      1  36  ffff8080e2128800  RU   0.0 34789440  18624  special

     crash> p &((struct task_struct *)0xffff8080e2128800)->signal.shared_pending
     $2 = (struct sigpending *) 0xffff80809a416e40

     crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0]
     $3 = 0x804100

     crash> sig -s 0x804100
     SIGKILL SIGTERM SIGXCPU

     crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0] & 1U << (9 - 1)
     $4 = 0x100


Unfortunately, this incident has not been reproduced to date.





Kind regards,
Matthew Wilcox May 20, 2021, 11:56 a.m. UTC | #5
On Thu, May 20, 2021 at 12:42:57PM +0100, Aaron Tomlin wrote:
> On Thu 2021-05-20 12:20 +0200, Vlastimil Babka wrote:
> > On 5/20/21 6:34 AM, Andrew Morton wrote:
> > > 
> > > What observed problems motivated this change?
> > > 
> > > What were the observed runtime effects of this change?
> > 
> > Yep those details from the previous thread should be included here.
> 
> Fair enough.
> 
> During kernel crash dump/or vmcore analysis: I discovered in the context of
> __alloc_pages_slowpath() the value stored in the no_progress_loops variable
> was found to be 31,611,688 i.e. well above MAX_RECLAIM_RETRIES; and a fatal
> signal was pending against current.

While this is true, it's not really answering Andrew's question.
What we want as part of the commit message is something like:

"A customer experienced a low memory situation and sent their task a
fatal signal.  Instead of dying promptly, it looped in the page
allocator failing to make progress because ..."

> 
>      #6 [ffff00002e78f7c0] do_try_to_free_pages+0xe4 at ffff00001028bd24
>      #7 [ffff00002e78f840] try_to_free_pages+0xe4 at ffff00001028c0f4
>      #8 [ffff00002e78f900] __alloc_pages_nodemask+0x500 at ffff0000102cd130
> 
>                                                              //      w28 = *(sp + 148)      /* no_progress_loops */
>      0xffff0000102cd1e0 <__alloc_pages_nodemask+0x5b0>:      ldr     w0, [sp,#148]
>                                                              //      w0 = w0 + 0x1
>      0xffff0000102cd1e4 <__alloc_pages_nodemask+0x5b4>:      add     w0, w0, #0x1
>                                                              //      *(sp + 148) = w0
>      0xffff0000102cd1e8 <__alloc_pages_nodemask+0x5b8>:      str     w0, [sp,#148]
>                                                              //      if (w0 >= 0x10)
>                                                              //          goto __alloc_pages_nodemask+0x904
>      0xffff0000102cd1ec <__alloc_pages_nodemask+0x5bc>:      cmp     w0, #0x10
>      0xffff0000102cd1f0 <__alloc_pages_nodemask+0x5c0>:      b.gt    0xffff0000102cd534
> 
>  - The stack pointer was 0xffff00002e78f900
> 
>      crash> p *(int *)(0xffff00002e78f900+148)
>      $1 = 31611688
> 
>      crash> ps 521171
>         PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
>      > 521171      1  36  ffff8080e2128800  RU   0.0 34789440  18624  special
> 
>      crash> p &((struct task_struct *)0xffff8080e2128800)->signal.shared_pending
>      $2 = (struct sigpending *) 0xffff80809a416e40
> 
>      crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0]
>      $3 = 0x804100
> 
>      crash> sig -s 0x804100
>      SIGKILL SIGTERM SIGXCPU
> 
>      crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0] & 1U << (9 - 1)
>      $4 = 0x100
> 
> 
> Unfortunately, this incident was not reproduced, to date.
> 
> 
> 
> 
> 
> Kind regards,
> 
> -- 
> Aaron Tomlin
Aaron Tomlin May 20, 2021, 1:30 p.m. UTC | #6
Matthew,

On Thu 2021-05-20 12:56 +0100, Matthew Wilcox wrote:
> While this is true, it's not really answering Andrew's question.
> What we want as part of the commit message is something like:
> 
> "A customer experienced a low memory situation and sent their task a
> fatal signal.  Instead of dying promptly, it looped in the page
> allocator failing to make progress because ..."

Apologies, I did not understand - I will follow up with a v4.


Kind regards,

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index aaa1655cf682..b317057ac186 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4252,6 +4252,9 @@  should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 	if (!order)
 		return false;
 
+	if (fatal_signal_pending(current))
+		return false;
+
 	if (compaction_made_progress(compact_result))
 		(*compaction_retries)++;