diff mbox series

mm, memcg: reset memcg's memory.{min, low} for reclaiming itself

Message ID 1577450633-2098-2-git-send-email-laoar.shao@gmail.com (mailing list archive)
State New, archived
Series mm, memcg: reset memcg's memory.{min, low} for reclaiming itself

Commit Message

Yafang Shao Dec. 27, 2019, 12:43 p.m. UTC
memory.{emin, elow} are set in mem_cgroup_protected(), and their values
do not change until the next recalculation in that function. After
either or both of them are set, the next reclaimer to reclaim this memcg
may be a different one, e.g. one for which this memcg is also the root
memcg. mem_cgroup_protection(), called from get_scan_count(), will then
use the stale values to calculate the scan count, which is wrong. We
should reset them to zero in this case.

Here's an example of this issue.

    root_mem_cgroup
         /
        A   memory.max=1024M memory.min=512M memory.current=800M

Once kswapd is woken up, it tries to scan all memcgs, including A, and
sets memory.emin of A to 512M.
Later, A may reach its hard limit (memory.max) and start memcg reclaim.
Because A is the root of this reclaim, mem_cgroup_protected() returns
early without recalculating A's memory.emin, so memory.emin keeps the
old value of 512M. mem_cgroup_protection() in get_scan_count() then
uses this stale value to compute the scan count, which is wrong.

Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Chris Down <chris@chrisdown.name>
Cc: Roman Gushchin <guro@fb.com>
Cc: stable@vger.kernel.org
---
 mm/memcontrol.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
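
To make the effect of the stale value concrete, here is a small user-space sketch of the proportional scan reduction that get_scan_count() applies when a memcg has effective protection. It is a simplified model, not the kernel code: the function name scan_target(), the 4 KiB page size, and the assumption that all of A's pages sit on a single LRU list are illustrative, and the later shift by the reclaim priority is left out.

```c
#include <stdint.h>
#include <stdio.h>

#define SWAP_CLUSTER_MAX 32ULL   /* minimum scan batch, as in the kernel */
#define PAGES_PER_MIB    256ULL  /* assumption: 4 KiB pages */

/*
 * Simplified model of the protection-based scan reduction:
 * the larger the protected share of the cgroup, the fewer pages
 * of the LRU list are offered for scanning.
 */
uint64_t scan_target(uint64_t lruvec_size, uint64_t cgroup_size,
                     uint64_t protection)
{
    uint64_t scan;

    if (!protection)
        return lruvec_size;          /* no protection: scan everything */

    /* Avoid a result above 100% when usage is below the protection. */
    if (cgroup_size < protection)
        cgroup_size = protection;

    scan = lruvec_size - lruvec_size * protection / cgroup_size;
    return scan > SWAP_CLUSTER_MAX ? scan : SWAP_CLUSTER_MAX;
}

/* Prints the scan target for cgroup A once it has reached memory.max. */
void show_stale_emin_effect(void)
{
    uint64_t size  = 1024 * PAGES_PER_MIB;  /* memory.max = 1024M */
    uint64_t stale = 512 * PAGES_PER_MIB;   /* emin left over from kswapd */

    printf("stale emin: %llu pages\n",
           (unsigned long long)scan_target(size, size, stale));
    printf("emin reset: %llu pages\n",
           (unsigned long long)scan_target(size, size, 0));
}
```

With the numbers from the example, the stale 512M value halves the scan target (131072 of 262144 pages) even though A is reclaiming itself, while a reset value of 0 makes the whole list eligible.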

Comments

Roman Gushchin Dec. 27, 2019, 11:49 p.m. UTC | #1
On Fri, Dec 27, 2019 at 07:43:53AM -0500, Yafang Shao wrote:
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 601405b..bb3925d 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6287,8 +6287,17 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
>  
>  	if (!root)
>  		root = root_mem_cgroup;
> -	if (memcg == root)
> +	if (memcg == root) {
> +		/*
> +		 * Reset memory.(emin, elow) for reclaiming the memcg
> +		 * itself.
> +		 */
> +		if (memcg != root_mem_cgroup) {
> +			memcg->memory.emin = 0;
> +			memcg->memory.elow = 0;
> +		}

I'm sorry I didn't bring this up earlier, but I doubt that zeroing the
effective protection is correct. Imagine a simple config: a large cgroup
subtree with memory.max set on the top level. Reaching this limit doesn't
mean that all protection configuration inside the tree can be ignored.

Instead we should respect memory.low/min set by a user on this level
(look at the parent == root case), maybe clamped by memory.high/max.

Thanks!
Yafang Shao Dec. 28, 2019, 1:45 a.m. UTC | #2
On Sat, Dec 28, 2019 at 7:49 AM Roman Gushchin <guro@fb.com> wrote:
> I'm sorry I didn't bring this up earlier, but I doubt that zeroing the
> effective protection is correct. Imagine a simple config: a large cgroup
> subtree with memory.max set on the top level. Reaching this limit doesn't
> mean that all protection configuration inside the tree can be ignored.
>

No, they won't be ignored.
Please see the logic in mem_cgroup_protected(): it recalculates the
effective min and low of all of the memcg's children.

> Instead we should respect memory.low/min set by a user on this level
> (look at the parent == root case), maybe clamped by memory.high/max.
>

Let's look at the parent == root case.
What if the parent is root_mem_cgroup?
The memory.{emin, elow} of root_mem_cgroup are always 0, right?
So where is the problem?

Thanks
Yafang
Roman Gushchin Dec. 28, 2019, 2:59 a.m. UTC | #3
On Sat, Dec 28, 2019 at 09:45:11AM +0800, Yafang Shao wrote:
> No, they won't be ignored.
> Pls. see the logic in mem_cgroup_protected(), it will re-calculate all
> its children's effective min and low.

Ah, you're right. I forgot about this check:

	if (parent == root)
		goto exit;

which saves elow/emin from being truncated to 0. Sorry.

Please, feel free to add
Acked-by: Roman Gushchin <guro@fb.com>

Thanks!
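
The interaction discussed above, resetting the reclaim root's own effective values while its children keep theirs through the parent == root shortcut, can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel function: struct memcg, enum prot, and memcg_protected() are hypothetical stand-ins, the hierarchy root is passed in explicitly instead of being a global, and the proportional distribution of a parent's protection among siblings is reduced to a plain clamp.

```c
#include <stddef.h>
#include <stdint.h>

enum prot { PROT_NONE, PROT_LOW, PROT_MIN };

/* Hypothetical stand-in for the fields of struct mem_cgroup used here. */
struct memcg {
    struct memcg *parent;  /* NULL for the hierarchy root */
    uint64_t usage;        /* current usage */
    uint64_t min, low;     /* configured memory.min / memory.low */
    uint64_t emin, elow;   /* effective protection, cached between calls */
};

static uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/*
 * Model of the patched flow. hier_root plays the role of
 * root_mem_cgroup; root is the memcg being reclaimed (NULL means
 * global reclaim).
 */
enum prot memcg_protected(struct memcg *hier_root, struct memcg *root,
                          struct memcg *cg)
{
    uint64_t emin, elow;

    if (!root)
        root = hier_root;
    if (cg == root) {
        /* The patch: drop values cached by an earlier, outer reclaimer. */
        if (cg != hier_root) {
            cg->emin = 0;
            cg->elow = 0;
        }
        return PROT_NONE;
    }
    if (!cg->usage || !cg->parent)
        return PROT_NONE;

    emin = cg->min;
    elow = cg->low;

    /*
     * The parent == root shortcut: a direct child of the reclaim root
     * keeps its configured values and never consults the parent's
     * (now zeroed) emin/elow. Deeper descendants are clamped.
     */
    if (cg->parent != root) {
        emin = min_u64(emin, cg->parent->emin);
        elow = min_u64(elow, cg->parent->elow);
    }

    cg->emin = emin;
    cg->elow = elow;

    if (cg->usage <= emin)
        return PROT_MIN;
    if (cg->usage <= elow)
        return PROT_LOW;
    return PROT_NONE;
}
```

In this model, a global pass leaves A with emin equal to its memory.min, a later pass with A as the reclaim root zeroes it, and a child of A still ends up with its own memory.min as emin because the clamp against A is skipped.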
Yafang Shao Dec. 28, 2019, 4:24 a.m. UTC | #4
On Sat, Dec 28, 2019 at 11:00 AM Roman Gushchin <guro@fb.com> wrote:
> Ah, you're right. I forgot about this
>     if (parent == root)
>         goto exit;
>
> which saves elow/emin from being truncated to 0. Sorry.
>
> Please, feel free to add
> Acked-by: Roman Gushchin <guro@fb.com>
>

Thanks for your review.

Thanks
Yafang

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 601405b..bb3925d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6287,8 +6287,17 @@  enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
 
 	if (!root)
 		root = root_mem_cgroup;
-	if (memcg == root)
+	if (memcg == root) {
+		/*
+		 * Reset memory.(emin, elow) for reclaiming the memcg
+		 * itself.
+		 */
+		if (memcg != root_mem_cgroup) {
+			memcg->memory.emin = 0;
+			memcg->memory.elow = 0;
+		}
 		return MEMCG_PROT_NONE;
+	}
 
 	usage = page_counter_read(&memcg->memory);
 	if (!usage)