[v2] mm, oom: prevent soft lockup on memcg oom for UP systems

Message ID alpine.DEB.2.21.2003171752030.115787@chino.kir.corp.google.com (mailing list archive)
State New, archived

Commit Message

David Rientjes March 18, 2020, 12:55 a.m. UTC
When a process is oom killed as a result of memcg limits and the victim
is waiting to exit, nothing ends up actually yielding the processor back
to the victim on UP systems with preemption disabled.  Instead, the
charging process simply loops in memcg reclaim and eventually triggers a
soft lockup.
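
For illustration, a workload of roughly this shape can trigger the situation
described above (a hypothetical sketch; the actual "repro" binary from the
log below is not part of this thread).  Run it inside a memory cgroup whose
limit is smaller than ALLOC_SIZE, on a single-CPU kernel built without
CONFIG_PREEMPT:

/* repro_sketch.c -- hypothetical memcg oom reproducer, illustrative only */
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define ALLOC_SIZE	(64UL << 20)	/* larger than the memcg limit */

static void hog(void)
{
	char *buf = malloc(ALLOC_SIZE);

	if (!buf)
		_exit(1);
	for (;;)	/* keep faulting pages in until the memcg oom killer fires */
		memset(buf, 0xff, ALLOC_SIZE);
}

int main(void)
{
	int i;

	/* One child becomes the oom victim; the other keeps charging. */
	for (i = 0; i < 2; i++)
		if (fork() == 0)
			hog();
	while (wait(NULL) > 0)
		;
	return 0;
}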

Memory cgroup out of memory: Killed process 808 (repro) total-vm:41944kB, 
anon-rss:35344kB, file-rss:504kB, shmem-rss:0kB, UID:0 pgtables:108kB 
oom_score_adj:0
watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [repro:806]
CPU: 0 PID: 806 Comm: repro Not tainted 5.6.0-rc5+ #136
RIP: 0010:shrink_lruvec+0x4e9/0xa40
...
Call Trace:
 shrink_node+0x40d/0x7d0
 do_try_to_free_pages+0x13f/0x470
 try_to_free_mem_cgroup_pages+0x16d/0x230
 try_charge+0x247/0xac0
 mem_cgroup_try_charge+0x10a/0x220
 mem_cgroup_try_charge_delay+0x1e/0x40
 handle_mm_fault+0xdf2/0x15f0
 do_user_addr_fault+0x21f/0x420
 page_fault+0x2f/0x40
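
The looping happens inside the memcg charge path shown in the trace.  As a
heavily simplified sketch (assuming the approximate v5.6 shape of
try_charge() in mm/memcontrol.c; names and structure are illustrative, and
charge_would_succeed() is a hypothetical stand-in for the page_counter
bookkeeping, not upstream code):

static int try_charge_sketch(struct mem_cgroup *memcg, gfp_t gfp_mask,
			     unsigned int nr_pages)
{
	int nr_retries = MEM_CGROUP_RECLAIM_RETRIES;

retry:
	if (charge_would_succeed(memcg, nr_pages))	/* hypothetical helper */
		return 0;

	/* Reclaim, then retry the charge a bounded number of times. */
	try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, true);
	if (nr_retries--)
		goto retry;

	/*
	 * Out of retries: invoke the memcg oom killer.  On success the
	 * charge is simply retried, but on a UP kernel without
	 * CONFIG_PREEMPT nothing in this loop ever yields the CPU, so the
	 * chosen victim never gets to run, exit and free its memory.
	 */
	if (mem_cgroup_oom(memcg, gfp_mask, 0) == OOM_SUCCESS) {
		nr_retries = MEM_CGROUP_RECLAIM_RETRIES;
		goto retry;
	}
	return -ENOMEM;
}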

Make sure that once the oom killer has been called, we forcibly yield if
current is not the chosen victim, regardless of priority, to allow for
memory freeing.  The same situation can theoretically occur in the page
allocator, so do this after dropping oom_lock there as well.

Suggested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Tested-by: Robert Kolchmeyer <rkolchmeyer@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/memcontrol.c | 2 ++
 mm/page_alloc.c | 2 ++
 2 files changed, 4 insertions(+)

Comments

Michal Hocko March 18, 2020, 9:42 a.m. UTC | #1
On Tue 17-03-20 17:55:04, David Rientjes wrote:
> When a process is oom killed as a result of memcg limits and the victim
> is waiting to exit, nothing ends up actually yielding the processor back
> to the victim on UP systems with preemption disabled.  Instead, the
> charging process simply loops in memcg reclaim and eventually triggers a
> soft lockup.

It seems that my request to describe the setup got ignored. Sigh.

> Memory cgroup out of memory: Killed process 808 (repro) total-vm:41944kB, 
> anon-rss:35344kB, file-rss:504kB, shmem-rss:0kB, UID:0 pgtables:108kB 
> oom_score_adj:0
> watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [repro:806]
> CPU: 0 PID: 806 Comm: repro Not tainted 5.6.0-rc5+ #136
> RIP: 0010:shrink_lruvec+0x4e9/0xa40
> ...
> Call Trace:
>  shrink_node+0x40d/0x7d0
>  do_try_to_free_pages+0x13f/0x470
>  try_to_free_mem_cgroup_pages+0x16d/0x230
>  try_charge+0x247/0xac0
>  mem_cgroup_try_charge+0x10a/0x220
>  mem_cgroup_try_charge_delay+0x1e/0x40
>  handle_mm_fault+0xdf2/0x15f0
>  do_user_addr_fault+0x21f/0x420
>  page_fault+0x2f/0x40
> 
> Make sure that once the oom killer has been called, we forcibly yield if
> current is not the chosen victim, regardless of priority, to allow for
> memory freeing.  The same situation can theoretically occur in the page
> allocator, so do this after dropping oom_lock there as well.

I would have preferred the cond_resched solution proposed previously, but
I can live with this as well. I would just ask to add more information
to the changelog. E.g.
"
We used to have a short sleep after the oom handling but 9bfe5ded054b
("mm, oom: remove sleep from under oom_lock") has removed it because
sleep inside the oom_lock is dangerous. This patch restores the sleep
outside of the lock.
"
> Suggested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
> Tested-by: Robert Kolchmeyer <rkolchmeyer@google.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>  mm/memcontrol.c | 2 ++
>  mm/page_alloc.c | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1576,6 +1576,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  	 */
>  	ret = should_force_charge() || out_of_memory(&oc);
>  	mutex_unlock(&oom_lock);
> +	if (!fatal_signal_pending(current))
> +		schedule_timeout_killable(1);

Check for fatal_signal_pending is redundant.
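
Presumably this is because schedule_timeout_killable() sleeps in
TASK_KILLABLE and therefore returns without blocking when a fatal signal is
already pending; the hunk above could then be reduced to something like:

	ret = should_force_charge() || out_of_memory(&oc);
	mutex_unlock(&oom_lock);
	/* schedule_timeout_killable() returns immediately if a fatal
	 * signal is pending, so no explicit check is needed. */
	schedule_timeout_killable(1);
	return ret;
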
David Rientjes March 18, 2020, 9:40 p.m. UTC | #2
On Wed, 18 Mar 2020, Michal Hocko wrote:

> > When a process is oom killed as a result of memcg limits and the victim
> > is waiting to exit, nothing ends up actually yielding the processor back
> > to the victim on UP systems with preemption disabled.  Instead, the
> > charging process simply loops in memcg reclaim and eventually triggers a
> > soft lockup.
> 
> It seems that my request to describe the setup got ignored. Sigh.
> 
> > Memory cgroup out of memory: Killed process 808 (repro) total-vm:41944kB, 
> > anon-rss:35344kB, file-rss:504kB, shmem-rss:0kB, UID:0 pgtables:108kB 
> > oom_score_adj:0
> > watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [repro:806]
> > CPU: 0 PID: 806 Comm: repro Not tainted 5.6.0-rc5+ #136
> > RIP: 0010:shrink_lruvec+0x4e9/0xa40
> > ...
> > Call Trace:
> >  shrink_node+0x40d/0x7d0
> >  do_try_to_free_pages+0x13f/0x470
> >  try_to_free_mem_cgroup_pages+0x16d/0x230
> >  try_charge+0x247/0xac0
> >  mem_cgroup_try_charge+0x10a/0x220
> >  mem_cgroup_try_charge_delay+0x1e/0x40
> >  handle_mm_fault+0xdf2/0x15f0
> >  do_user_addr_fault+0x21f/0x420
> >  page_fault+0x2f/0x40
> > 
> > Make sure that once the oom killer has been called, we forcibly yield if
> > current is not the chosen victim, regardless of priority, to allow for
> > memory freeing.  The same situation can theoretically occur in the page
> > allocator, so do this after dropping oom_lock there as well.
> 
> I would have preferred the cond_resched solution proposed previously, but
> I can live with this as well. I would just ask to add more information
> to the changelog. E.g.

I'm still planning on sending the cond_resched() change as well, but not
advertised as a fix for this particular issue, per Tetsuo's feedback.  I
think the reported issue showed it's possible to loop excessively in
reclaim without a conditional yield, depending on various memcg configs.
A cond_resched() in shrink_node_memcgs() is still appropriate for
interactivity, and also because the iteration over memcgs can be
particularly long.
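
For reference, the kind of placement being described might look like this
(a sketch against the approximate v5.6 shape of shrink_node_memcgs() in
mm/vmscan.c, not the patch that was actually posted):

static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
{
	struct mem_cgroup *target_memcg = sc->target_mem_cgroup;
	struct mem_cgroup *memcg;

	memcg = mem_cgroup_iter(target_memcg, NULL, NULL);
	do {
		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);

		/*
		 * The walk over memcgs can be long and reclaim itself can
		 * take a while, so give other tasks (including an oom
		 * victim waiting to exit) a chance to run between
		 * iterations.
		 */
		cond_resched();

		shrink_lruvec(lruvec, sc);
		shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
			    sc->priority);
	} while ((memcg = mem_cgroup_iter(target_memcg, memcg, NULL)));
}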

> "
> We used to have a short sleep after the oom handling but 9bfe5ded054b
> ("mm, oom: remove sleep from under oom_lock") has removed it because
> sleep inside the oom_lock is dangerous. This patch restores the sleep
> outside of the lock.

Will do.

> "
> > Suggested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
> > Tested-by: Robert Kolchmeyer <rkolchmeyer@google.com>
> > Cc: stable@vger.kernel.org
> > Signed-off-by: David Rientjes <rientjes@google.com>
> > ---
> >  mm/memcontrol.c | 2 ++
> >  mm/page_alloc.c | 2 ++
> >  2 files changed, 4 insertions(+)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1576,6 +1576,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
> >  	 */
> >  	ret = should_force_charge() || out_of_memory(&oc);
> >  	mutex_unlock(&oom_lock);
> > +	if (!fatal_signal_pending(current))
> > +		schedule_timeout_killable(1);
> 
> Check for fatal_signal_pending is redundant.
> 
> -- 
> Michal Hocko
> SUSE Labs
>

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1576,6 +1576,8 @@  static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 */
 	ret = should_force_charge() || out_of_memory(&oc);
 	mutex_unlock(&oom_lock);
+	if (!fatal_signal_pending(current))
+		schedule_timeout_killable(1);
 	return ret;
 }
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3861,6 +3861,8 @@  __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 	}
 out:
 	mutex_unlock(&oom_lock);
+	if (!fatal_signal_pending(current))
+		schedule_timeout_killable(1);
 	return page;
 }