diff mbox series

mm: memcg: yield cpu when we fail to charge pages

Message ID 20200908185051.62420-1-jpitti@cisco.com (mailing list archive)
State New, archived

Commit Message

Julius Hemanth Pitti Sept. 8, 2020, 6:50 p.m. UTC
For non-root cgroups, in try_charge(), we keep trying
to charge until we succeed. On a non-preemptive
kernel, when we are OOM, this results in holding
the CPU forever.

On SMP systems, this doesn't create a big problem
because oom_reaper gets a chance to kill the victim
and free some pages. However, on a single-core
CPU (or in cases where oom_reaper is pinned to the same
CPU where try_charge() is executing), oom_reaper will
never get scheduled and we stay in try_charge() forever.
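
The starvation described above can be modelled in userspace. What
follows is only a toy sketch under stated assumptions, not kernel code:
struct toy_cpu, toy_reaper_run(), toy_cond_resched() and
toy_try_charge() are hypothetical names invented for illustration, the
"reaper" is a plain function that runs only when the charging task
yields, and the retry loop is bounded by max_retries so the example
terminates (the loop described in this patch has no such bound).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one non-preemptive CPU. None of these are kernel APIs. */
struct toy_cpu {
	long used_pages;      /* pages currently charged to the toy cgroup */
	long limit_pages;     /* hard limit of the toy cgroup */
	bool reaper_runnable; /* the "reaper" was woken and waits for the CPU */
	long victim_pages;    /* pages the reaper frees each time it runs */
};

/* Stand-in for oom_reaper: makes progress only when it gets the CPU. */
void toy_reaper_run(struct toy_cpu *cpu)
{
	long freed;

	if (!cpu->reaper_runnable)
		return;
	freed = cpu->victim_pages < cpu->used_pages ?
		cpu->victim_pages : cpu->used_pages;
	cpu->used_pages -= freed;
	cpu->reaper_runnable = false;
}

/*
 * Stand-in for cond_resched(): in this model the only other task is
 * the reaper, so yielding the CPU simply lets the reaper run.
 */
void toy_cond_resched(struct toy_cpu *cpu)
{
	toy_reaper_run(cpu);
}

/*
 * Stand-in for the retry loop. Returns the number of failed attempts
 * before the charge succeeded, or -1 if max_retries ran out.
 */
int toy_try_charge(struct toy_cpu *cpu, long nr_pages, bool yield,
		   int max_retries)
{
	assert(nr_pages > 0 && max_retries > 0);

	for (int retries = 0; retries < max_retries; retries++) {
		if (cpu->used_pages + nr_pages <= cpu->limit_pages) {
			cpu->used_pages += nr_pages;
			return retries;
		}
		/* Over the limit: a victim is picked and the reaper is woken. */
		cpu->reaper_runnable = true;
		if (yield)
			toy_cond_resched(cpu);
	}
	return -1;
}
```

With yield set to false the reaper is woken but never runs, so every
retry sees the same full cgroup and the loop only stops because of the
artificial bound. With yield set to true the reaper frees pages after
the first failed attempt and the next attempt succeeds.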

Steps to reproduce this on non-SMP:
1. mount -t tmpfs none /sys/fs/cgroup
2. mkdir /sys/fs/cgroup/memory
3. mount -t cgroup none /sys/fs/cgroup/memory -o memory
4. mkdir /sys/fs/cgroup/memory/0
5. echo 40M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes
6. echo $$ > /sys/fs/cgroup/memory/0/tasks
7. stress -m 5 --vm-bytes 10M --vm-hang 0

Signed-off-by: Julius Hemanth Pitti <jpitti@cisco.com>
---
 mm/memcontrol.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Roman Gushchin Sept. 8, 2020, 7:21 p.m. UTC | #1
On Tue, Sep 08, 2020 at 11:50:51AM -0700, Julius Hemanth Pitti wrote:
> For non root CG, in try_charge(), we keep trying
> to charge until we succeed. On non-preemptive
> kernel, when we are OOM, this results in holding
> CPU forever.
> 
> On SMP systems, this doesn't create a big problem
> because oom_reaper gets a chance to kill victim
> and make some free pages. However on a single-core
> CPU (or cases where oom_reaper pinned to same CPU
> where try_charge is executing), oom_reaper shall
> never get scheduled and we stay in try_charge forever.
> 
> Steps to reproduce this on non-SMP:
> 1. mount -t tmpfs none /sys/fs/cgroup
> 2. mkdir /sys/fs/cgroup/memory
> 3. mount -t cgroup none /sys/fs/cgroup/memory -o memory
> 4. mkdir /sys/fs/cgroup/memory/0
> 5. echo 40M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes
> 6. echo $$ > /sys/fs/cgroup/memory/0/tasks
> 7. stress -m 5 --vm-bytes 10M --vm-hang 0
> 
> Signed-off-by: Julius Hemanth Pitti <jpitti@cisco.com>
> ---
>  mm/memcontrol.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 0d6f3ea86738..4620d70267cb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2652,6 +2652,8 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  	if (fatal_signal_pending(current))
>  		goto force;
>  
> +	cond_resched();
> +

Can you, please, add a short comment here?
Something like "give oom_reaper a chance on a non-SMP system"?

>  	/*
>  	 * keep retrying as long as the memcg oom killer is able to make
>  	 * a forward progress or bypass the charge if the oom killer
> -- 
> 2.17.1
>

The patch makes total sense to me. Please, feel free to add
Acked-by: Roman Gushchin <guro@fb.com> after adding a comment.

Thank you!
Julius Hemanth Pitti Sept. 8, 2020, 7:28 p.m. UTC | #2
On Tue, 2020-09-08 at 12:21 -0700, Roman Gushchin wrote:
> On Tue, Sep 08, 2020 at 11:50:51AM -0700, Julius Hemanth Pitti wrote:
> > For non root CG, in try_charge(), we keep trying
> > to charge until we succeed. On non-preemptive
> > kernel, when we are OOM, this results in holding
> > CPU forever.
> > 
> > On SMP systems, this doesn't create a big problem
> > because oom_reaper gets a chance to kill victim
> > and make some free pages. However on a single-core
> > CPU (or cases where oom_reaper pinned to same CPU
> > where try_charge is executing), oom_reaper shall
> > never get scheduled and we stay in try_charge forever.
> > 
> > Steps to reproduce this on non-SMP:
> > 1. mount -t tmpfs none /sys/fs/cgroup
> > 2. mkdir /sys/fs/cgroup/memory
> > 3. mount -t cgroup none /sys/fs/cgroup/memory -o memory
> > 4. mkdir /sys/fs/cgroup/memory/0
> > 5. echo 40M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes
> > 6. echo $$ > /sys/fs/cgroup/memory/0/tasks
> > 7. stress -m 5 --vm-bytes 10M --vm-hang 0
> > 
> > Signed-off-by: Julius Hemanth Pitti <jpitti@cisco.com>
> > ---
> >  mm/memcontrol.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 0d6f3ea86738..4620d70267cb 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2652,6 +2652,8 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> >  	if (fatal_signal_pending(current))
> >  		goto force;
> >  
> > +	cond_resched();
> > +
> 
> Can you, please, add a short comment here?
> Something like "give oom_reaper a chance on a non-SMP system"?
Sure.

> 
> >  	/*
> >  	 * keep retrying as long as the memcg oom killer is able to make
> >  	 * a forward progress or bypass the charge if the oom killer
> > -- 
> > 2.17.1
> > 
> 
> The patch makes total sense to me. Please, feel free to add
> Acked-by: Roman Gushchin <guro@fb.com> after adding a comment.
Thanks, I shall add.

> 
> Thank you!
Xunlei Pang Sept. 14, 2020, 4:15 a.m. UTC | #3
On 2020/9/9 AM2:50, Julius Hemanth Pitti wrote:
> For non root CG, in try_charge(), we keep trying
> to charge until we succeed. On non-preemptive
> kernel, when we are OOM, this results in holding
> CPU forever.
> 
> On SMP systems, this doesn't create a big problem
> because oom_reaper gets a chance to kill victim
> and make some free pages. However on a single-core
> CPU (or cases where oom_reaper pinned to same CPU
> where try_charge is executing), oom_reaper shall
> never get scheduled and we stay in try_charge forever.
> 
> Steps to reproduce this on non-SMP:
> 1. mount -t tmpfs none /sys/fs/cgroup
> 2. mkdir /sys/fs/cgroup/memory
> 3. mount -t cgroup none /sys/fs/cgroup/memory -o memory
> 4. mkdir /sys/fs/cgroup/memory/0
> 5. echo 40M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes
> 6. echo $$ > /sys/fs/cgroup/memory/0/tasks
> 7. stress -m 5 --vm-bytes 10M --vm-hang 0
> 
> Signed-off-by: Julius Hemanth Pitti <jpitti@cisco.com>
> ---
>  mm/memcontrol.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 0d6f3ea86738..4620d70267cb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2652,6 +2652,8 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  	if (fatal_signal_pending(current))
>  		goto force;
>  
> +	cond_resched();
> +
>  	/*
>  	 * keep retrying as long as the memcg oom killer is able to make
>  	 * a forward progress or bypass the charge if the oom killer
> 

This should be fixed by:
https://lkml.org/lkml/2020/8/26/1440

Thanks,
Xunlei

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0d6f3ea86738..4620d70267cb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2652,6 +2652,8 @@  static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	if (fatal_signal_pending(current))
 		goto force;
 
+	cond_resched();
+
 	/*
 	 * keep retrying as long as the memcg oom killer is able to make
 	 * a forward progress or bypass the charge if the oom killer