diff mbox series

memcg, oom: do not bypass oom killer for dying tasks

Message ID 20250402090117.130245-1-mhocko@kernel.org (mailing list archive)
State New
Headers show
Series memcg, oom: do not bypass oom killer for dying tasks | expand

Commit Message

Michal Hocko April 2, 2025, 9:01 a.m. UTC
From: Michal Hocko <mhocko@suse.com>

7775face2079 ("memcg: killed threads should not invoke memcg OOM killer") has added
a bypass of the oom killer path for dying threads because a very
specific workload (described in the changelog) could hit "no killable
tasks" path. This itself is not fatal condition but it could be annoying
if this was a common case.

On the other hand the bypass has some issues on its own. Without
triggering oom killer we won't be able to trigger async oom reclaim
(oom_reaper) which can operate on killed tasks as well as long as they
still have their mm available. This could be the case during futex
cleanup when the memory as pointed out by Johannes in [1]. The said case
is still not fully understood but let's drop this bypass that was mostly
driven by an artificial workload and allow dying tasks to go into oom
path. This will make the code easier to reason about and also help
corner cases where oom_reaper could help to release memory.

[1] https://lore.kernel.org/all/20241212183012.GB1026@cmpxchg.org/T/#u

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/memcontrol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Johannes Weiner April 2, 2025, 3:27 p.m. UTC | #1
On Wed, Apr 02, 2025 at 11:01:17AM +0200, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> 7775face2079 ("memcg: killed threads should not invoke memcg OOM killer") has added
> a bypass of the oom killer path for dying threads because a very
> specific workload (described in the changelog) could hit "no killable
> tasks" path. This itself is not fatal condition but it could be annoying
> if this was a common case.
> 
> On the other hand the bypass has some issues on its own. Without
> triggering oom killer we won't be able to trigger async oom reclaim
> (oom_reaper) which can operate on killed tasks as well as long as they
> still have their mm available. This could be the case during futex
> cleanup when the memory as pointed out by Johannes in [1]. The said case
> is still not fully understood but let's drop this bypass that was mostly
> driven by an artificial workload and allow dying tasks to go into oom
> path. This will make the code easier to reason about and also help
> corner cases where oom_reaper could help to release memory.
> 
> [1] https://lore.kernel.org/all/20241212183012.GB1026@cmpxchg.org/T/#u
> 
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Thanks, yeah, the investigation stalled out over the new years break
and then... distractions.

I think we'll eventually still need the second part of [2], to force
charge from dying OOM victims, but let's go with this for now.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

[2] https://lore.kernel.org/all/20241212183012.GB1026@cmpxchg.org/

> ---
>  mm/memcontrol.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 7b3503d12aaf..9c30c442e3b0 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1627,7 +1627,7 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  	 * A few threads which were not waiting at mutex_lock_killable() can
>  	 * fail to bail out. Therefore, check again after holding oom_lock.
>  	 */
> -	ret = task_is_dying() || out_of_memory(&oc);
> +	ret = out_of_memory(&oc);
>  
>  unlock:
>  	mutex_unlock(&oom_lock);
> -- 
> 2.49.0
Shakeel Butt April 2, 2025, 4:01 p.m. UTC | #2
On Wed, Apr 02, 2025 at 11:27:15AM -0400, Johannes Weiner wrote:
> On Wed, Apr 02, 2025 at 11:01:17AM +0200, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > 7775face2079 ("memcg: killed threads should not invoke memcg OOM killer") has added
> > a bypass of the oom killer path for dying threads because a very
> > specific workload (described in the changelog) could hit "no killable
> > tasks" path. This itself is not fatal condition but it could be annoying
> > if this was a common case.
> > 
> > On the other hand the bypass has some issues on its own. Without
> > triggering oom killer we won't be able to trigger async oom reclaim
> > (oom_reaper) which can operate on killed tasks as well as long as they
> > still have their mm available. This could be the case during futex
> > cleanup when the memory as pointed out by Johannes in [1]. The said case
> > is still not fully understood but let's drop this bypass that was mostly
> > driven by an artificial workload and allow dying tasks to go into oom
> > path. This will make the code easier to reason about and also help
> > corner cases where oom_reaper could help to release memory.
> > 
> > [1] https://lore.kernel.org/all/20241212183012.GB1026@cmpxchg.org/T/#u
> > 
> > Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> Thanks, yeah, the investigation stalled out over the new years break
> and then... distractions.
> 
> I think we'll eventually still need the second part of [2], to force
> charge from dying OOM victims, but let's go with this for now.

Agreed.

> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> [2] https://lore.kernel.org/all/20241212183012.GB1026@cmpxchg.org/
> 

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
diff mbox series

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7b3503d12aaf..9c30c442e3b0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1627,7 +1627,7 @@  static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 * A few threads which were not waiting at mutex_lock_killable() can
 	 * fail to bail out. Therefore, check again after holding oom_lock.
 	 */
-	ret = task_is_dying() || out_of_memory(&oc);
+	ret = out_of_memory(&oc);
 
 unlock:
 	mutex_unlock(&oom_lock);