mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt

Message ID 20210519145014.3220164-1-atomlin@redhat.com (mailing list archive)
State New, archived

Commit Message

Aaron Tomlin May 19, 2021, 2:50 p.m. UTC
It does not make sense to retry compaction when the last known compaction
result was skipped and a fatal signal is pending.

In try_to_compact_pages(), COMPACT_SKIPPED can indeed be returned; however,
when a fatal signal is pending, not every zone on the zone list is
considered.
Yet should_compact_retry(), given the last known compaction result, can
check each zone on the zone list (see compaction_zonelist_suitable()).
For example, if a zone is found to be suitable, reclaim/compaction is
tried again (notwithstanding the above).

This patch ensures that compaction is not needlessly retried when the
last known compaction result was skipped and, in the unlikely case, a
fatal signal is pending, so that OOM is at least attempted.

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
---
 mm/page_alloc.c | 2 ++
 1 file changed, 2 insertions(+)
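
The stall described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: `struct zone_model`, `model_try_to_compact()`, `model_should_retry()` and `model_retry_count()` are hypothetical names, a "suitable" zone is assumed to compact successfully, and direct reclaim between attempts is left out. The point it shows is the mismatch: the compaction pass stops after the first zone when a fatal signal is pending, while the retry check looks at every zone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's compaction results (assumption). */
enum model_compact_result { MODEL_COMPACT_SKIPPED, MODEL_COMPACT_SUCCESS };

/* Hypothetical zone model: can compaction run (and succeed) in this zone? */
struct zone_model { bool suitable; };

/*
 * Model of the for-each-zone loop in try_to_compact_pages(): with a fatal
 * signal pending, the loop gives up after trying a single zone.
 */
static enum model_compact_result
model_try_to_compact(const struct zone_model *zones, size_t nr, bool fatal_signal)
{
	enum model_compact_result rc = MODEL_COMPACT_SKIPPED;

	for (size_t i = 0; i < nr; i++) {
		if (zones[i].suitable) {
			rc = MODEL_COMPACT_SUCCESS;
			break;
		}
		if (fatal_signal)
			break;	/* bail out; later zones are never tried */
	}
	return rc;
}

/* Model of compaction_zonelist_suitable(): every zone is checked. */
static bool model_zonelist_suitable(const struct zone_model *zones, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (zones[i].suitable)
			return true;
	return false;
}

/* Model of the COMPACT_SKIPPED branch of should_compact_retry(). */
static bool model_should_retry(enum model_compact_result result,
			       const struct zone_model *zones, size_t nr,
			       bool fatal_signal, bool patched)
{
	if (result == MODEL_COMPACT_SKIPPED) {
		if (patched && fatal_signal)
			return false;	/* the check added by the patch */
		return model_zonelist_suitable(zones, nr);
	}
	return false;
}

/* Count retries of the compact/retry cycle, capped at max_iters. */
int model_retry_count(const struct zone_model *zones, size_t nr,
		      bool fatal_signal, bool patched, int max_iters)
{
	int retries = 0;

	assert(zones != NULL && nr > 0 && max_iters >= 0);
	while (retries < max_iters) {
		enum model_compact_result r =
			model_try_to_compact(zones, nr, fatal_signal);

		if (r == MODEL_COMPACT_SUCCESS)
			break;
		if (!model_should_retry(r, zones, nr, fatal_signal, patched))
			break;
		retries++;
	}
	return retries;
}
```

With a zone list of { unsuitable, suitable } and a pending fatal signal, the unpatched model keeps retrying until it hits the cap, whereas the patched model stops immediately so the caller can move on.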

Comments

Vlastimil Babka May 19, 2021, 3:22 p.m. UTC | #1
On 5/19/21 4:50 PM, Aaron Tomlin wrote:
> It does not make sense to retry compaction when the last known compaction
> result was skipped and a fatal signal is pending.
> 
> In try_to_compact_pages(), COMPACT_SKIPPED can indeed be returned; however,
> when a fatal signal is pending, not every zone on the zone list is
> considered.
> Yet should_compact_retry(), given the last known compaction result, can
> check each zone on the zone list (see compaction_zonelist_suitable()).
> For example, if a zone is found to be suitable, reclaim/compaction is
> tried again (notwithstanding the above).
> 
> This patch ensures that compaction is not needlessly retried when the
> last known compaction result was skipped and, in the unlikely case, a
> fatal signal is pending, so that OOM is at least attempted.
> 
> Signed-off-by: Aaron Tomlin <atomlin@redhat.com>

Hm, indeed, if fatal_signal_pending() is true then try_to_compact_pages() will
bail out in the for-each-zone loop after trying a single zone and if that zone
keeps returning COMPACT_SKIPPED, things can get stuck.
And direct reclaim might see compaction_ready() for another zone and return 1,
faking the progress.
So your patch seems to be solving the issue. But maybe we should just do the
test at the beginning of should_compact_retry() and not specific to
compaction_needs_reclaim() - if there's a fatal signal, there will be no
compaction happening, so we should just say not to retry.
I suppose if the patch fixes your situation where fatal_signal_pending() was
true, there's hopefully not a more general problem with the retry logic?
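
The difference between the two placements of the check can be sketched with a user-space model. This is an illustration only, under loud assumptions: the thread does not show a v2, so `CHECK_AT_ENTRY` is merely one reading of the suggestion above; `model_should_compact_retry()`, `enum model_result` and `enum check_placement` are hypothetical names; and every result other than "skipped" is collapsed into a plain retry budget, which is much simpler than the real should_compact_retry().

```c
#include <assert.h>
#include <stdbool.h>

/* Collapsed result space (assumption): "skipped" versus everything else. */
enum model_result { MODEL_SKIPPED, MODEL_OTHER };

/* Where the fatal-signal test sits in the model. */
enum check_placement {
	CHECK_IN_RECLAIM_BRANCH,	/* as in the posted patch */
	CHECK_AT_ENTRY			/* as suggested in the review (assumed shape) */
};

/*
 * Hypothetical, simplified retry decision. zonelist_suitable stands in for
 * the result of compaction_zonelist_suitable(); retries_left stands in for
 * the remaining retry budget on the non-skipped path.
 */
bool model_should_compact_retry(enum model_result result,
				bool zonelist_suitable,
				bool fatal_signal,
				int retries_left,
				enum check_placement placement)
{
	assert(retries_left >= 0);

	/* Entry placement: a fatal signal ends retries for every result. */
	if (placement == CHECK_AT_ENTRY && fatal_signal)
		return false;

	if (result == MODEL_SKIPPED) {
		/* Branch placement: only the skipped case is covered. */
		if (placement == CHECK_IN_RECLAIM_BRANCH && fatal_signal)
			return false;
		return zonelist_suitable;
	}

	/* Any other result: retry while budget remains (simplification). */
	return retries_left > 0;
}
```

In this model both placements agree when the last result was skipped. They differ only for other results: with the check inside the branch, a pending fatal signal still allows a retry while budget remains, whereas the entry check always answers "do not retry".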

> ---
>  mm/page_alloc.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index aaa1655cf682..5f9aac27a1b5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4268,6 +4268,8 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
>  	 * to work with, so we retry only if it looks like reclaim can help.
>  	 */
>  	if (compaction_needs_reclaim(compact_result)) {
> +		if (fatal_signal_pending(current))
> +			goto out;
>  		ret = compaction_zonelist_suitable(ac, order, alloc_flags);
>  		goto out;
>  	}
>
Aaron Tomlin May 19, 2021, 7:08 p.m. UTC | #2
On Wed 2021-05-19 17:22 +0200, Vlastimil Babka wrote:
> Hm, indeed, if fatal_signal_pending() is true then try_to_compact_pages() will
> bail out in the for-each-zone loop after trying a single zone and if that zone
> keeps returning COMPACT_SKIPPED, things can get stuck.
> And direct reclaim might see compaction_ready() for another zone and return 1,
> faking the progress.

Indeed.

> So your patch seems to be solving the issue. But maybe we should just do the
> test at the beginning of should_compact_retry() and not specific to
> compaction_needs_reclaim() - if there's a fatal signal, there will be no
> compaction happening, so we should just say not to retry.

Fair enough - I will post a v2.

> I suppose if the patch fixes your situation where fatal_signal_pending() was
> true, there's hopefully not a more general problem with the retry logic?

At the present time, not to my knowledge. That being said, I will continue
to review the relevant source code further.



Kind regards,
Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index aaa1655cf682..5f9aac27a1b5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4268,6 +4268,8 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 	 * to work with, so we retry only if it looks like reclaim can help.
 	 */
 	if (compaction_needs_reclaim(compact_result)) {
+		if (fatal_signal_pending(current))
+			goto out;
 		ret = compaction_zonelist_suitable(ac, order, alloc_flags);
 		goto out;
 	}