
mm, memcg: avoid oom if cgroup is not populated

Message ID 1574773369-1634-1-git-send-email-laoar.shao@gmail.com
State New, archived
Series mm, memcg: avoid oom if cgroup is not populated

Commit Message

Yafang Shao Nov. 26, 2019, 1:02 p.m. UTC
There is a case where all the processes in a memcg have exited (due to
an OOM group kill or some other reason), but its file page caches still
exist. These page caches may be protected by memory.min, so they cannot
be reclaimed. If we cannot restart the processes in this memcg, or do
not want to take the memcg offline, then we want to drop these page
caches.
The advantage of dropping them is that it spares the reclaimer (either
kswapd or direct reclaim) from scanning and reclaiming pages from all
the memcgs in the system, because currently the reclaimer reclaims
fairly from all memcgs when the system is under memory pressure.
One way to drop these page caches is to set the hard limit of this
memcg to 0. Unfortunately, this may invoke the OOM killer and generate
a lot of misleading output, which should not happen.
One misleading output is "Out of memory and no killable processes...",
when in fact there are no tasks at all rather than no killable tasks.
Furthermore, the admin does not expect OOM output if he or she only
wants to drop the caches and knows there are no processes running in
this memcg.

If the memcg is not populated, we should not invoke the OOM killer.

Fixes: b6e6edcf ("mm: memcontrol: reclaim and OOM kill when shrinking memory.max below usage")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
---
 mm/memcontrol.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
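The patch gates the OOM path on cgroup_is_populated(). From userspace, the same state is visible as the `populated` key in a cgroup v2 directory's `cgroup.events` file (1 if the cgroup or any descendant has live processes, 0 otherwise). The following is a minimal sketch, assuming a cgroup v2 hierarchy, of reading that flag; the helper name `cgroup_events_populated()` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if the cgroup v2 "cgroup.events" file at
 * `path` reports "populated 1", 0 if it reports "populated 0", and -1 if
 * the file cannot be read or contains no "populated" key.
 */
int cgroup_events_populated(const char *path)
{
	FILE *f = fopen(path, "r");
	char key[64];
	int val;
	int ret = -1;

	if (!f)
		return -1;

	/* Each line has the form "<key> <value>", e.g. "populated 0". */
	while (fscanf(f, "%63s %d", key, &val) == 2) {
		if (strcmp(key, "populated") == 0) {
			ret = val ? 1 : 0;
			break;
		}
	}
	fclose(f);
	return ret;
}
```

For example, an admin script could confirm that the result is 0 before writing 0 to `memory.max`. On a kernel without this patch, that check alone does not prevent the OOM path from being taken when the remaining memory cannot be reclaimed.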

Comments

Michal Hocko Nov. 26, 2019, 1:16 p.m. UTC | #1
On Tue 26-11-19 08:02:49, Yafang Shao wrote:
> There is a case where all the processes in a memcg have exited (due to
> an OOM group kill or some other reason), but its file page caches still
> exist. These page caches may be protected by memory.min, so they cannot
> be reclaimed. If we cannot restart the processes in this memcg, or do
> not want to take the memcg offline, then we want to drop these page
> caches.
> The advantage of dropping them is that it spares the reclaimer (either
> kswapd or direct reclaim) from scanning and reclaiming pages from all
> the memcgs in the system, because currently the reclaimer reclaims
> fairly from all memcgs when the system is under memory pressure.
> One way to drop these page caches is to set the hard limit of this
> memcg to 0. Unfortunately, this may invoke the OOM killer and generate
> a lot of misleading output, which should not happen.

I disagree that the output is misleading. Quite the contrary, it provides
a useful lead on the unreclaimable memory.

> One misleading output is "Out of memory and no killable processes...",
> when in fact there are no tasks at all rather than no killable tasks.

Again, there is nothing misleading here. No tasks is a trivial subset of
no killable tasks. I do not see why we should treat one differently from
the other.

> Furthermore, the admin does not expect OOM output if he or she only
> wants to drop the caches and knows there are no processes running in
> this memcg.

But this is not what reducing the hard limit to 0 really does, no matter
whether there are tasks or not. It simply reclaims _all_ the memory, as
explained in the other email.
 
> If the memcg is not populated, we should not invoke the OOM killer.

I have already explained in the other email why I believe this is not
correct, and this description doesn't provide any real justification. It
is merely your interpretation of what should happen, and I believe you
haven't really thought it through.
 
> Fixes: b6e6edcf ("mm: memcontrol: reclaim and OOM kill when shrinking memory.max below usage")
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@suse.com>
> ---
>  mm/memcontrol.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 1c4c08b..4e08905 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6139,9 +6139,13 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
>  			continue;
>  		}
>  
> -		memcg_memory_event(memcg, MEMCG_OOM);
> -		if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
> +		if (cgroup_is_populated(memcg->css.cgroup)) {
> +			memcg_memory_event(memcg, MEMCG_OOM);
> +			if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
> +				break;
> +		} else  {
>  			break;
> +		}
>  	}
>  
>  	memcg_wb_domain_size_changed(memcg);
> -- 
> 1.8.3.1
Yafang Shao Nov. 26, 2019, 2:25 p.m. UTC | #2
On Tue, Nov 26, 2019 at 9:16 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 26-11-19 08:02:49, Yafang Shao wrote:
> > There is a case where all the processes in a memcg have exited (due to
> > an OOM group kill or some other reason), but its file page caches still
> > exist. These page caches may be protected by memory.min, so they cannot
> > be reclaimed. If we cannot restart the processes in this memcg, or do
> > not want to take the memcg offline, then we want to drop these page
> > caches.
> > The advantage of dropping them is that it spares the reclaimer (either
> > kswapd or direct reclaim) from scanning and reclaiming pages from all
> > the memcgs in the system, because currently the reclaimer reclaims
> > fairly from all memcgs when the system is under memory pressure.
> > One way to drop these page caches is to set the hard limit of this
> > memcg to 0. Unfortunately, this may invoke the OOM killer and generate
> > a lot of misleading output, which should not happen.
>
> I disagree that the output is misleading. Quite the contrary, it provides
> a useful lead on the unreclaimable memory.
>

We could show the unreclaimable memory on its own rather than print the
full OOM output.
The OOM killer is meant to kill processes, so why do we invoke it when
there are no processes?
What is the advantage of doing so?

> > One misleading output is "Out of memory and no killable processes...",
> > when in fact there are no tasks at all rather than no killable tasks.
>
> Again, there is nothing misleading here. No tasks is a trivial subset of
> no killable tasks. I do not see why we should treat one differently from
> the other.
>

No killable tasks means there are tasks and the OOM killer may be invoked,
whereas no tasks means the OOM killer is useless.

> > Furthermore, the admin does not expect OOM output if he or she only
> > wants to drop the caches and knows there are no processes running in
> > this memcg.
>
> But this is not what reducing the hard limit to 0 really does, no matter
> whether there are tasks or not. It simply reclaims _all_ the memory, as
> explained in the other email.
>

Is there any way to reclaim only the page cache?
No.

I know it will reclaim all the memory.
Even if you think this wording is a problem, is it really important
to distinguish between caches (both page caches and kmem) and _all_
memory, especially when there are no processes?

> > If the memcg is not populated, we should not invoke the OOM killer.
>
> I have already explained in the other email why I believe this is not
> correct, and this description doesn't provide any real justification. It
> is merely your interpretation of what should happen, and I believe you
> haven't really thought it through.
>
> > Fixes: b6e6edcf ("mm: memcontrol: reclaim and OOM kill when shrinking memory.max below usage")
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Michal Hocko <mhocko@suse.com>
> > ---
> >  mm/memcontrol.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 1c4c08b..4e08905 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -6139,9 +6139,13 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
> >                       continue;
> >               }
> >
> > -             memcg_memory_event(memcg, MEMCG_OOM);
> > -             if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
> > +             if (cgroup_is_populated(memcg->css.cgroup)) {
> > +                     memcg_memory_event(memcg, MEMCG_OOM);
> > +                     if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
> > +                             break;
> > +             } else  {
> >                       break;
> > +             }
> >       }
> >
> >       memcg_wb_domain_size_changed(memcg);
> > --
> > 1.8.3.1
>
> --
> Michal Hocko
> SUSE Labs
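As Yafang notes above, there is no per-memcg interface for reclaiming only page cache. What an admin can do is check how much of a group's charge is page cache before shrinking its limit. The following is a minimal sketch, assuming cgroup v2, that reads one counter (such as `file`, which `memory.stat` reports in bytes) from a `memory.stat` file; the helper name `memcg_stat_read()` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the value of `key` (e.g. "file" or "anon")
 * from a cgroup v2 memory.stat file, or -1 if the file cannot be read or
 * the key is absent. Keys are matched exactly, so "file" does not match
 * "file_mapped" or "file_dirty".
 */
long long memcg_stat_read(const char *path, const char *key)
{
	FILE *f = fopen(path, "r");
	char name[64];
	long long val;
	long long ret = -1;

	if (!f)
		return -1;

	/* Each line has the form "<name> <value>". */
	while (fscanf(f, "%63s %lld", name, &val) == 2) {
		if (strcmp(name, key) == 0) {
			ret = val;
			break;
		}
	}
	fclose(f);
	return ret;
}
```

Comparing the `file` counter with `memory.current` shows how much of the group's usage a limit reduction would have to take from page cache rather than from other memory.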
Michal Hocko Nov. 26, 2019, 2:45 p.m. UTC | #3
On Tue 26-11-19 22:25:27, Yafang Shao wrote:
> On Tue, Nov 26, 2019 at 9:16 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Tue 26-11-19 08:02:49, Yafang Shao wrote:
> > > There is a case where all the processes in a memcg have exited (due to
> > > an OOM group kill or some other reason), but its file page caches still
> > > exist. These page caches may be protected by memory.min, so they cannot
> > > be reclaimed. If we cannot restart the processes in this memcg, or do
> > > not want to take the memcg offline, then we want to drop these page
> > > caches.
> > > The advantage of dropping them is that it spares the reclaimer (either
> > > kswapd or direct reclaim) from scanning and reclaiming pages from all
> > > the memcgs in the system, because currently the reclaimer reclaims
> > > fairly from all memcgs when the system is under memory pressure.
> > > One way to drop these page caches is to set the hard limit of this
> > > memcg to 0. Unfortunately, this may invoke the OOM killer and generate
> > > a lot of misleading output, which should not happen.
> >
> > I disagree that the output is misleading. Quite the contrary, it provides
> > a useful lead on the unreclaimable memory.
> >
> 
> We could show the unreclaimable memory on its own rather than print the
> full OOM output.
> The OOM killer is meant to kill processes, so why do we invoke it when
> there are no processes?
> What is the advantage of doing so?

Consistency.

> > > One misleading output is "Out of memory and no killable processes...",
> > > when in fact there are no tasks at all rather than no killable tasks.
> >
> > Again, there is nothing misleading here. No tasks is a trivial subset of
> > no killable tasks. I do not see why we should treat one differently from
> > the other.
> >
> 
> No killable tasks means there are tasks and the OOM killer may be invoked,
> whereas no tasks means the OOM killer is useless.

I disagree.

> > > Furthermore, the admin does not expect OOM output if he or she only
> > > wants to drop the caches and knows there are no processes running in
> > > this memcg.
> >
> > But this is not what reducing the hard limit to 0 really does, no matter
> > whether there are tasks or not. It simply reclaims _all_ the memory, as
> > explained in the other email.
> >
> 
> Is there any way to reclaim only the page cache?
> No.

Correct. And in the absence of a solid use case, I do not see a reason
to add this. We have a global knob to achieve this, and it has turned
out to be abused and simply used incorrectly most of the time.

> I know it will reclaim all the memory.
> Even if you think this wording is a problem, is it really important
> to distinguish between caches (both page caches and kmem) and _all_
> memory, especially when there are no processes?

I do not think we should distinguish different memory types and treat
them differently when applying the hard limit.
Yafang Shao Nov. 26, 2019, 2:51 p.m. UTC | #4
On Tue, Nov 26, 2019 at 10:45 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 26-11-19 22:25:27, Yafang Shao wrote:
> > On Tue, Nov 26, 2019 at 9:16 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Tue 26-11-19 08:02:49, Yafang Shao wrote:
> > > > There is a case where all the processes in a memcg have exited (due to
> > > > an OOM group kill or some other reason), but its file page caches still
> > > > exist. These page caches may be protected by memory.min, so they cannot
> > > > be reclaimed. If we cannot restart the processes in this memcg, or do
> > > > not want to take the memcg offline, then we want to drop these page
> > > > caches.
> > > > The advantage of dropping them is that it spares the reclaimer (either
> > > > kswapd or direct reclaim) from scanning and reclaiming pages from all
> > > > the memcgs in the system, because currently the reclaimer reclaims
> > > > fairly from all memcgs when the system is under memory pressure.
> > > > One way to drop these page caches is to set the hard limit of this
> > > > memcg to 0. Unfortunately, this may invoke the OOM killer and generate
> > > > a lot of misleading output, which should not happen.
> > >
> > > I disagree that the output is misleading. Quite the contrary, it provides
> > > a useful lead on the unreclaimable memory.
> > >
> >
> > We could show the unreclaimable memory on its own rather than print the
> > full OOM output.
> > The OOM killer is meant to kill processes, so why do we invoke it when
> > there are no processes?
> > What is the advantage of doing so?
>
> Consistency.
>

If there are tasks, we invoke the OOM killer to try to kill them.
If there are no tasks, we just try to free the reclaimable pages.

Why do you think this is NOT consistent?

Regarding the output, why should we distinguish between system OOM and
memcg OOM, and why do you think that is consistent?

> > > > One misleading output is "Out of memory and no killable processes...",
> > > > when in fact there are no tasks at all rather than no killable tasks.
> > >
> > > Again, there is nothing misleading here. No tasks is a trivial subset of
> > > no killable tasks. I do not see why we should treat one differently from
> > > the other.
> > >
> >
> > No killable tasks means there are tasks and the OOM killer may be invoked,
> > whereas no tasks means the OOM killer is useless.
>
> I disagree.
>
> > > > Furthermore, the admin does not expect OOM output if he or she only
> > > > wants to drop the caches and knows there are no processes running in
> > > > this memcg.
> > >
> > > But this is not what reducing the hard limit to 0 really does, no matter
> > > whether there are tasks or not. It simply reclaims _all_ the memory, as
> > > explained in the other email.
> > >
> >
> > Is there any way to reclaim only the page cache?
> > No.
>
> Correct. And in the absence of a solid use case, I do not see a reason
> to add this. We have a global knob to achieve this, and it has turned
> out to be abused and simply used incorrectly most of the time.
>
> > I know it will reclaim all the memory.
> > Even if you think this wording is a problem, is it really important
> > to distinguish between caches (both page caches and kmem) and _all_
> > memory, especially when there are no processes?
>
> I do not think we should distinguish different memory types and treat
> them differently when applying the hard limit.
> --

This has nothing to do with different memory types.
What really matters is that if there is no such memory, we don't need
to waste our time handling it.

Thanks
Yafang
Michal Hocko Nov. 26, 2019, 3:06 p.m. UTC | #5
On Tue 26-11-19 22:51:30, Yafang Shao wrote:
> On Tue, Nov 26, 2019 at 10:45 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Tue 26-11-19 22:25:27, Yafang Shao wrote:
> > > On Tue, Nov 26, 2019 at 9:16 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > >
> > > > On Tue 26-11-19 08:02:49, Yafang Shao wrote:
> > > > > There is a case where all the processes in a memcg have exited (due to
> > > > > an OOM group kill or some other reason), but its file page caches still
> > > > > exist. These page caches may be protected by memory.min, so they cannot
> > > > > be reclaimed. If we cannot restart the processes in this memcg, or do
> > > > > not want to take the memcg offline, then we want to drop these page
> > > > > caches.
> > > > > The advantage of dropping them is that it spares the reclaimer (either
> > > > > kswapd or direct reclaim) from scanning and reclaiming pages from all
> > > > > the memcgs in the system, because currently the reclaimer reclaims
> > > > > fairly from all memcgs when the system is under memory pressure.
> > > > > One way to drop these page caches is to set the hard limit of this
> > > > > memcg to 0. Unfortunately, this may invoke the OOM killer and generate
> > > > > a lot of misleading output, which should not happen.
> > > >
> > > > I disagree that the output is misleading. Quite the contrary, it provides
> > > > a useful lead on the unreclaimable memory.
> > > >
> > >
> > > We could show the unreclaimable memory on its own rather than print the
> > > full OOM output.
> > > The OOM killer is meant to kill processes, so why do we invoke it when
> > > there are no processes?
> > > What is the advantage of doing so?
> >
> > Consistency.
> >
> 
> If there are tasks, we invoke the OOM killer to try to kill them.
> If there are no tasks, we just try to free the reclaimable pages.

The fact that the oom killer has been invoked implies there is _no_
reclaimable memory. Full stop.

Anyway, I am getting a bit tired of repeating myself, so I will not
continue this discussion. I have provided my feedback on your patch;
please try to think harder about it. If you have real-life use cases
that cannot work properly with the existing functionality and APIs,
then bring them up.
Johannes Weiner Nov. 26, 2019, 4:30 p.m. UTC | #6
On Tue, Nov 26, 2019 at 08:02:49AM -0500, Yafang Shao wrote:
> There is a case where all the processes in a memcg have exited (due to
> an OOM group kill or some other reason), but its file page caches still
> exist. These page caches may be protected by memory.min, so they cannot
> be reclaimed. If we cannot restart the processes in this memcg, or do
> not want to take the memcg offline, then we want to drop these page
> caches.
> The advantage of dropping them is that it spares the reclaimer (either
> kswapd or direct reclaim) from scanning and reclaiming pages from all
> the memcgs in the system, because currently the reclaimer reclaims
> fairly from all memcgs when the system is under memory pressure.
> One way to drop these page caches is to set the hard limit of this
> memcg to 0. Unfortunately, this may invoke the OOM killer and generate
> a lot of misleading output, which should not happen.

You can set memory.high instead...?
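Johannes's suggestion relies on memory.high being a throttling limit rather than a hard one: the cgroup v2 documentation states that going over the high limit never invokes the OOM killer. The following is a minimal sketch, assuming a cgroup v2 mount, of writing a value to such a control file. The helper name `cgroup_write()` and the `<name>` path component used below are hypothetical, and whether one lowering of memory.high reliably drops all of the cache is exactly what Yafang says he will analyze.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write `value` (e.g. "0" or "max") to a cgroup
 * control file such as memory.high. Returns 0 on success, -1 on error.
 * stdio buffers the write, so errors from the kernel may only surface
 * when the stream is flushed in fclose(); both results are checked.
 */
int cgroup_write(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A possible sequence is `cgroup_write("/sys/fs/cgroup/<name>/memory.high", "0")` to push the group's usage down, followed later by writing `"max"` to lift the throttle again, since a low memory.high would also throttle any processes restarted in the group.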
Yafang Shao Nov. 27, 2019, 1:16 a.m. UTC | #7
On Wed, Nov 27, 2019 at 12:30 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Tue, Nov 26, 2019 at 08:02:49AM -0500, Yafang Shao wrote:
> > There is a case where all the processes in a memcg have exited (due to
> > an OOM group kill or some other reason), but its file page caches still
> > exist. These page caches may be protected by memory.min, so they cannot
> > be reclaimed. If we cannot restart the processes in this memcg, or do
> > not want to take the memcg offline, then we want to drop these page
> > caches.
> > The advantage of dropping them is that it spares the reclaimer (either
> > kswapd or direct reclaim) from scanning and reclaiming pages from all
> > the memcgs in the system, because currently the reclaimer reclaims
> > fairly from all memcgs when the system is under memory pressure.
> > One way to drop these page caches is to set the hard limit of this
> > memcg to 0. Unfortunately, this may invoke the OOM killer and generate
> > a lot of misleading output, which should not happen.
>
> You can set memory.high instead...?

Well, I will take a look at memory.high and analyze whether it is reliable.

Thanks
Yafang

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1c4c08b..4e08905 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6139,9 +6139,13 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
 			continue;
 		}
 
-		memcg_memory_event(memcg, MEMCG_OOM);
-		if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
+		if (cgroup_is_populated(memcg->css.cgroup)) {
+			memcg_memory_event(memcg, MEMCG_OOM);
+			if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
+				break;
+		} else  {
 			break;
+		}
 	}
 
 	memcg_wb_domain_size_changed(memcg);
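To make the behavior change concrete, here is a toy userspace model of the memory.max write loop with the patch applied. It is a hypothetical simplification, not kernel code: the struct, the helpers and the retry budget are invented for illustration, and steps of the real loop that are not visible in the hunk above are omitted. Reclaim is tried first; only when it stops making progress does the model reach the patched branch, which records an OOM event only if the group is still populated.

```c
#include <stdbool.h>

/* Toy model of a memcg; all fields are hypothetical stand-ins. */
struct toy_memcg {
	unsigned long usage;       /* pages charged to the group */
	unsigned long reclaimable; /* pages reclaim is able to free */
	bool populated;            /* stand-in for cgroup_is_populated() */
	int oom_events;            /* stand-in for the MEMCG_OOM event count */
};

#define TOY_RECLAIM_RETRIES 5  /* arbitrary retry budget for the model */

/* Free up to `want` reclaimable pages; return how many were freed. */
static unsigned long toy_reclaim(struct toy_memcg *m, unsigned long want)
{
	unsigned long n = want < m->reclaimable ? want : m->reclaimable;

	m->reclaimable -= n;
	m->usage -= n;
	return n;
}

/*
 * Stand-in for mem_cgroup_out_of_memory(). To keep the model small it
 * never finds a victim, so it always reports failure.
 */
static bool toy_out_of_memory(struct toy_memcg *m)
{
	(void)m;
	return false;
}

/* Try to shrink usage to `max`; return the usage left afterwards. */
unsigned long toy_set_max(struct toy_memcg *m, unsigned long max)
{
	int nr_reclaims = TOY_RECLAIM_RETRIES;

	while (m->usage > max) {
		if (nr_reclaims) {
			/* In this toy, only passes that free nothing use up a retry. */
			if (!toy_reclaim(m, m->usage - max))
				nr_reclaims--;
			continue;
		}

		/* Patched branch: go down the OOM path only if tasks remain. */
		if (!m->populated)
			break;
		m->oom_events++;
		if (!toy_out_of_memory(m))
			break;
	}
	return m->usage;
}
```

With unreclaimable pages left over and no tasks, the model gives up without recording an OOM event; with tasks present, it records one. Because the OOM branch is reached only after reclaim has repeatedly failed to make progress, the model also illustrates Michal's point that an OOM invocation here implies nothing reclaimable remains.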