[resend] mm, memcg: reset memcg's memory.{min, low} for reclaiming itself

Message ID 20200216145249.6900-1-laoar.shao@gmail.com
State New
Series
  • [resend] mm, memcg: reset memcg's memory.{min, low} for reclaiming itself

Commit Message

Yafang Shao Feb. 16, 2020, 2:52 p.m. UTC
memory.{emin, elow} are set in mem_cgroup_protected(), and their values
won't change until the next recalculation in that function. After either
or both of them are set, the next reclaimer to reclaim this memcg may be
a different one, e.g. a reclaimer whose root memcg is this memcg. In that
case mem_cgroup_protection(), called from get_scan_count(), uses the old
values to calculate the scan count, which is not proper. We should reset
them to zero in this case.

Here's an example of this issue.

    root_mem_cgroup
         /
        A   memory.max=1024M memory.min=512M memory.current=800M

Once kswapd is woken up, it will try to scan all memcgs, including
A, and it will set memory.emin of A to 512M.
After that, A may reach its hard limit (memory.max) and then do memcg
reclaim. Because A is the root of this reclaimer, its memory.emin is not
recalculated, so memory.emin keeps the old value of 512M, and this old
value is then used by mem_cgroup_protection() in get_scan_count() to
calculate the scan count. That is not proper.

Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Chris Down <chris@chrisdown.name>
Cc: stable@vger.kernel.org
---
 mm/memcontrol.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Michal Hocko Feb. 17, 2020, 9:24 a.m. UTC | #1
On Sun 16-02-20 09:52:49, Yafang Shao wrote:

Please document user visible effects of this patch. What does it mean
that this is not proper behavior? What happens if we have concurrent
reclaimers at different levels of the hierarchy how that would affect
the resulting protection?

> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6f6dc8712e39..df7fedbfc211 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6250,8 +6250,17 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
>  
>  	if (!root)
>  		root = root_mem_cgroup;
> -	if (memcg == root)
> +	if (memcg == root) {
> +		/*
> +		 * Reset memory.(emin, elow) for reclaiming the memcg
> +		 * itself.
> +		 */
> +		if (memcg != root_mem_cgroup) {
> +			memcg->memory.emin = 0;
> +			memcg->memory.elow = 0;
> +		}
>  		return MEMCG_PROT_NONE;
> +	}
>  
>  	usage = page_counter_read(&memcg->memory);
>  	if (!usage)
> -- 
> 2.14.1
Yafang Shao Feb. 17, 2020, 1:08 p.m. UTC | #2
On Mon, Feb 17, 2020 at 5:25 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> Please document user visible effects of this patch. What does it mean
> that this is not proper behavior?

In memcg reclaim, if the target memcg is the root of the reclaimer,
the reclaimer should scan all of this memcg's page cache pages in the LRU.
But because the old memory.{emin, elow} values are still there, it will
get a wrong protection value, and the reclaimer can't reclaim the page
cache pages protected by this wrong protection.

> What happens if we have concurrent
> reclaimers at different levels of the hierarchy how that would affect
> the resulting protection?
>

Well, I thought the synchronization mechanisms already existed?
Otherwise there would already be a concurrency issue in the original
code that sets memory.{emin, elow}, because memcg->memory.{emin, elow}
are also set at the end of mem_cgroup_protected().



--
Yafang Shao
DiDi
Michal Hocko Feb. 17, 2020, 1:24 p.m. UTC | #3
On Mon 17-02-20 21:08:12, Yafang Shao wrote:
> On Mon, Feb 17, 2020 at 5:25 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > Please document user visible effects of this patch. What does it mean
> > that this is not proper behavior?
> 
> In memcg reclaim, if the target memcg is the root of the reclaimer,
> the reclaimer should scan all of this memcg's page cache pages in the LRU.
> But because the old memory.{emin, elow} values are still there, it will
> get a wrong protection value, and the reclaimer can't reclaim the page
> cache pages protected by this wrong protection.

Could you be more specific please. Your example above says that emin is
not going to be recalculated and stays at 512M even for a potential max
limit reclaim. The min limit is still 512M so why is this value wrong?

> > What happens if we have concurrent
> > reclaimers at different levels of the hierarchy how that would affect
> > the resulting protection?
> >
> 
> Well, I thought the synchronization mechanisms already existed?
> Otherwise there would already be a concurrency issue in the original
> code that sets memory.{emin, elow}, because memcg->memory.{emin, elow}
> are also set at the end of mem_cgroup_protected().

This function is documented to be racy and I believe this is OK because
it doesn't really have to be precise and concurrent updates are not
going to change values much. But does the same apply to resetting the
effective values? Maybe yes. Make sure to document this in the changelog
please.
Yafang Shao Feb. 17, 2020, 1:51 p.m. UTC | #4
On Mon, Feb 17, 2020 at 9:24 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> Could you be more specific please. Your example above says that emin is
> not going to be recalculated and stays at 512M even for a potential max
> limit reclaim. The min limit is still 512M so why is this value wrong?
>

Because the reclaimer has changed, or more precisely the root of the reclaimer has changed.

Kswapd begins to reclaim memcg-A.
kswapd
  |
calculate the {emin, elow} for memcg-A
 |
stores {emin, elow} in memory.{emin, elow} of memcg-A
|
This memory.{emin, elow} will protect the page cache pages in memcg-A
(See get_scan_count->mem_cgroup_protection)
|
exit
(And it won't reclaim memcg-A for a long time)


Then the memcg reclaimer is woken up (memcg-A has reached its hard limit),
and the root of this new reclaimer is memcg-A.

This memcg reclaimer begins to reclaim memcg-A.
memcg reclaimer
      |
As the root of the reclaimer is memcg-A, it won't calculate emin, elow
for memcg-A.
(See if (memcg == root) in mem_cgroup_protected())
     |
The old memory.{emin, elow} will protect the page cache pages in memcg-A
(SO WE SHOULD CLEAR THE OLD VALUE)
    |
exit

I've tried my best to illustrate it. I hope this clarifies things.



> This function is documented to be racy and I believe this is OK because
> it doesn't really have to be precise and concurrent updates are not
> going to change values much. But does the same apply to resetting the
> effective values? Maybe yes. Make sure to document this in the changelog
> please.

Sure. I will document it.


--
Yafang Shao
DiDi
Michal Hocko Feb. 17, 2020, 2:04 p.m. UTC | #5
On Mon 17-02-20 21:51:23, Yafang Shao wrote:
> Because the reclaimer has changed, or more precisely the root of the reclaimer has changed.
> 
> Kswapd begins to reclaim memcg-A.
> kswapd
>   |
> calculate the {emin, elow} for memcg-A
>  |
> stores {emin, elow} in memory.{emin, elow} of memcg-A
> |
> This memory.{emin, elow} will protect the page cache pages in memcg-A
> (See get_scan_count->mem_cgroup_protection)
> |
> exit
> (And it won't reclaim memcg-A for a long time)
> 
> 
> Then the memcg reclaimer is woken up (memcg-A has reached its hard limit),
> and the root of this new reclaimer is memcg-A.
> 
> This memcg reclaimer begins to reclaim memcg-A.
> memcg reclaimer
>       |
> As the root of the reclaimer is memcg-A, it won't calculate emin, elow
> for memcg-A.
> (See if (memcg == root) in mem_cgroup_protected())
>      |
> The old memory.{emin, elow} will protect the page cache pages in memcg-A
> (SO WE SHOULD CLEAR THE OLD VALUE)

I am sorry but I still do not follow. Could you focus on _why_ the old
value is no longer valid?

Btw. have you seen the latest patch from Johannes touching this area
[1]? Is it possible that the issue you are referring to is related to
the one he has fixed?

[1] http://lkml.kernel.org/r/20191219200718.15696-2-hannes@cmpxchg.org
Yafang Shao Feb. 17, 2020, 2:28 p.m. UTC | #6
On Mon, Feb 17, 2020 at 10:04 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> I am sorry but I still do not follow. Could you focus on _why_ the old
> value is no longer valid?

Because for the new reclaimer, memory.{emin, elow} should be 0.
The old value may not be 0, but the if (memcg == root) check assumes
it is 0.

>
> Btw. have you seen the latest patch from Johannes touching this area
> [1]? Is it possible that the issue you are referring to is related to
> the one he has fixed?
>
> [1] http://lkml.kernel.org/r/20191219200718.15696-2-hannes@cmpxchg.org
>

I haven't taken a look at it yet.
Michal Hocko Feb. 17, 2020, 2:35 p.m. UTC | #7
On Mon 17-02-20 22:28:38, Yafang Shao wrote:
> Because for the new reclaimer, memory.{emin, elow} should be 0.
> The old value may not be 0, but the if (memcg == root) check assumes
> it is 0.

Why should it be 0 when the A.min is still 512MB?
Yafang Shao Feb. 17, 2020, 2:40 p.m. UTC | #8
On Mon, Feb 17, 2020 at 10:35 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> Why should it be 0 when the A.min is still 512MB?

Because A's hard limit has been reached and A is the root of the memcg
reclaimer. If A is the root of the memcg reclaimer, memcg protection
should not prevent it from reclaiming A's own page cache pages.
That is why the if (memcg == root) statement exists.
Michal Hocko Feb. 17, 2020, 3:14 p.m. UTC | #9
On Mon 17-02-20 22:40:22, Yafang Shao wrote:
> On Mon, Feb 17, 2020 at 10:35 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Mon 17-02-20 22:28:38, Yafang Shao wrote:
> > > On Mon, Feb 17, 2020 at 10:04 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > >
> > > > On Mon 17-02-20 21:51:23, Yafang Shao wrote:
> > > > > On Mon, Feb 17, 2020 at 9:24 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > > > >
> > > > > > On Mon 17-02-20 21:08:12, Yafang Shao wrote:
> > > > > > > On Mon, Feb 17, 2020 at 5:25 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > > > > > >
> > > > > > > > On Sun 16-02-20 09:52:49, Yafang Shao wrote:
> > > > > > > > > memory.{emin, elow} are set in mem_cgroup_protected(), and their values
> > > > > > > > > won't change until the next recalculation in that function. After either
> > > > > > > > > or both of them are set, the next reclaimer to reclaim this memcg may be
> > > > > > > > > a different one, e.g. a reclaimer whose root memcg is this memcg. In that
> > > > > > > > > case mem_cgroup_protection(), called from get_scan_count(), uses the old
> > > > > > > > > values to calculate the scan count, which is not proper. We should reset
> > > > > > > > > them to zero in this case.
> > > > > > > > >
> > > > > > > > > Here's an example of this issue.
> > > > > > > > >
> > > > > > > > >     root_mem_cgroup
> > > > > > > > >          /
> > > > > > > > >         A   memory.max=1024M memory.min=512M memory.current=800M
> > > > > > > > >
> > > > > > > > > Once kswapd is woken up, it will try to scan all MEMCGs, including
> > > > > > > > > this A, and it will set memory.emin of A to 512M.
> > > > > > > > > After that, A may reach its hard limit (memory.max), and then it will
> > > > > > > > > do memcg reclaim. Because A is the root of this reclaimer, it will
> > > > > > > > > not calculate its memory.emin. So the memory.emin is the old value
> > > > > > > > > 512M, and then this old value will be used in
> > > > > > > > > mem_cgroup_protection() in get_scan_count() to get the scan count.
> > > > > > > > > That is not proper.
> > > > > > > >
> > > > > > > > Please document user visible effects of this patch. What does it mean
> > > > > > > > that this is not proper behavior?
> > > > > > >
> > > > > > > In memcg reclaim, if the target memcg is the root of the reclaimer,
> > > > > > > the reclaimer should scan all of this memcg's page cache pages in the LRU,
> > > > > > > but because the old memory.{emin, elow} values are still there, it gets
> > > > > > > a wrong protection value, and the reclaimer can't reclaim the page
> > > > > > > cache pages covered by this wrong protection.
> > > > > >
> > > > > > Could you be more specific please. Your example above says that emin is
> > > > > > not going to be recalculated and stays at 512M even for a potential max
> > > > > > limit reclaim. The min limit is still 512M so why is this value wrong?
> > > > > >
> > > > >
> > > > > Because the reclaimer has changed, or the root of the reclaimer has changed.
> > > > >
> > > > > Kswapd begins to reclaim memcg-A.
> > > > > kswapd
> > > > >   |
> > > > > calculate the {emin, elow} for memcg-A
> > > > >   |
> > > > > stores {emin, elow} in memory.{emin, elow} of memcg-A
> > > > >   |
> > > > > This memory.{emin, elow} will protect the page cache pages in memcg-A
> > > > > (See get_scan_count->mem_cgroup_protection)
> > > > >   |
> > > > > exit
> > > > > (And it won't reclaim memcg-A for a long time)
> > > > >
> > > > >
> > > > > Then the memcg reclaimer is woken up (the hard limit of memcg-A was reached),
> > > > > and the root of this new reclaimer is memcg-A.
> > > > >
> > > > > This memcg reclaimer begins to reclaim memcg-A.
> > > > > memcg reclaimer
> > > > >   |
> > > > > As the root of the reclaimer is memcg-A, it won't calculate emin, elow
> > > > > for memcg-A.
> > > > > (See if (memcg == root) in mem_cgroup_protected())
> > > > >   |
> > > > > The old memory.{emin, elow} will protect the page cache pages in memcg-A
> > > > > (SO WE SHOULD CLEAR THE OLD VALUE)
> > > >
> > > > I am sorry but I still do not follow. Could you focus on _why_ the old
> > > > value is no longer valid?
> > >
> > > Because for the new reclaimer the memory.{emin, elow} should be 0.
> > > The old value may not be 0, but it is treated as 0 by the if
> > > statement (if (memcg == root)).
> >
> > Why should it be 0 when the A.min is still 512MB?
> 
> Because A's hard limit is reached and A is the root of the memcg reclaimer.

Confused. But your example suggests that memory.max > memory.min so
having an effective emin 0 or not doesn't make any difference.

> If A is the root of the memcg reclaimer, then the memcg protection
> should not prevent it from reclaiming its own page cache pages.
> That is why the if (memcg == root) check exists.

I suspect you misinterpret the code or your example is incomplete.
Please have a look at the patch I have referred to earlier. Johannes
explicitly sets effective values to their native ones
	if (parent == root) {
		memcg->memory.emin = memcg->memory.min;
		memcg->memory.elow = memcg->memory.low;
		goto out;
	}

and this matches my understanding.
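To see why the reclaim root matters for these cached values, here is a rough C model of how the effective min is derived, loosely following mem_cgroup_protected() as it looked around this thread, including the parent == root case Michal quotes. It is a sketch under stated assumptions, not kernel code: struct model_memcg, model_effective_min() and children_min_usage as a plain field are hypothetical stand-ins, only emin is modelled (elow is handled analogously), and the sibling-proportional cap reflects our reading of the kernel logic rather than a verbatim copy.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified memcg (units arbitrary, think MB).
 * children_min_usage stands in for the parent's running sum of
 * min(usage, min) over all of its children.
 */
struct model_memcg {
	struct model_memcg *parent;
	unsigned long usage, min, emin;
	unsigned long children_min_usage;
};

static unsigned long umin(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Approximate emin calculation for reclaim rooted at @root.
 * Assumes @memcg sits somewhere below @root.
 */
static unsigned long model_effective_min(const struct model_memcg *root,
					 struct model_memcg *memcg)
{
	const struct model_memcg *parent = memcg->parent;
	unsigned long emin = memcg->min;

	assert(memcg != root && parent != NULL);

	if (parent == root) {
		/* Direct child of the reclaim root: effective == native. */
		memcg->emin = emin;
		return emin;
	}

	/* Deeper down: capped by the parent's effective value ... */
	emin = umin(emin, parent->emin);
	if (emin && parent->emin) {
		unsigned long min_usage = umin(memcg->usage, memcg->min);
		unsigned long siblings_min_usage = parent->children_min_usage;

		/* ... and by this child's share of the siblings' protection. */
		if (min_usage && siblings_min_usage)
			emin = umin(emin, parent->emin * min_usage /
					  siblings_min_usage);
	}

	memcg->emin = emin;
	return emin;
}
```

For example, with A (min 512, usage 800) directly under the global root, A's emin is 512. A child B of A with min and usage 300, whose siblings (B included) account for 600 of protected usage, gets 512 * 300 / 600 = 256 when reclaim starts at the global root, but its full 300 when A is the reclaim root, because B is then a direct child of the root. In this model the stored emin therefore depends on which reclaim root computed it last.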
Yafang Shao Feb. 18, 2020, 2:09 a.m. UTC | #10
On Mon, Feb 17, 2020 at 11:14 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Mon 17-02-20 22:40:22, Yafang Shao wrote:
> >
> > Because A's hard limit is reached and A is the root of the memcg reclaimer.
>
> Confused. But your example suggests that memory.max > memory.min so
> having an effective emin 0 or not doesn't make any difference.
>

Why would the effective emin be 0 if memory.max > memory.min?
Note that the effective emin is only set in
mem_cgroup_protected(), so unless we explicitly set it to 0 there, it
can't be 0.

Besides mem_cgroup_protected(), the effective emin is also used in
mem_cgroup_protection(), but that function only uses the existing
memory.emin rather than checking memory.max > memory.min.

So the real issue is in mem_cgroup_protection(), because the value it
uses may be a stale one.
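The point about mem_cgroup_protection() can be made concrete with a small model. It assumes, as a simplification of the code at the time of this thread, that mem_cgroup_protection() returns the cached emin during low reclaim and max(emin, elow) otherwise, and that get_scan_count() shrinks the scan target in proportion to the protected share of the memcg's usage, with a floor of SWAP_CLUSTER_MAX (32). All model_* names are hypothetical and units are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Floor on the scan target; the kernel's SWAP_CLUSTER_MAX is 32. */
#define MODEL_SWAP_CLUSTER_MAX 32UL

/* Hypothetical stand-in for the fields the scan-count logic reads. */
struct model_memcg {
	unsigned long usage;		/* total charged to the memcg */
	unsigned long emin, elow;	/* cached effective protection */
};

static unsigned long umax(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Reads the cached values only; nothing here checks the reclaim root. */
static unsigned long model_protection(const struct model_memcg *memcg,
				      bool in_low_reclaim)
{
	if (in_low_reclaim)
		return memcg->emin;
	return umax(memcg->emin, memcg->elow);
}

/*
 * Proportional scan target for one LRU list of @memcg: the protected
 * fraction of the memcg's usage is exempted from scanning.
 */
static unsigned long model_scan_target(const struct model_memcg *memcg,
				       unsigned long lruvec_size,
				       bool in_low_reclaim)
{
	unsigned long protection = model_protection(memcg, in_low_reclaim);
	unsigned long cgroup_size, scan;

	if (!protection)
		return lruvec_size;	/* unprotected: scan the whole list */

	/* protection > 0 here, so cgroup_size is never 0 */
	cgroup_size = umax(memcg->usage, protection);
	assert(cgroup_size > 0);
	scan = lruvec_size - lruvec_size * protection / cgroup_size;
	return umax(scan, MODEL_SWAP_CLUSTER_MAX);
}
```

With usage 800 and a leftover emin of 512, an 800-page list yields a target of 800 - 800 * 512 / 800 = 288 pages; with the cached values at 0, the whole list (800) is eligible. Neither path asks who the current reclaim root is, which is the gap Yafang's patch is aimed at.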

> > If A is the root of the memcg reclaimer, then the memcg protection
> > should not prevent it from reclaiming its own page cache pages.
> > That is why the if (memcg == root) check exists.
>
> I suspect you misinterpret the code or your example is incomplete.
> Please have a look at the patch I have referred to earlier. Johannes
> explicitly sets effective values to their native ones
>         if (parent == root) {
>                 memcg->memory.emin = memcg->memory.min;
>                 memcg->memory.elow = memcg->memory.low;
>                 goto out;
>         }
>
> and this matches my understanding.

I haven't read Johannes's patch carefully, but at first glance I
don't think it can fix this issue.
Michal Hocko Feb. 18, 2020, 8:59 a.m. UTC | #11
On Tue 18-02-20 10:09:06, Yafang Shao wrote:
> On Mon, Feb 17, 2020 at 11:14 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > Confused. But your example suggests that memory.max > memory.min so
> > having an effective emin 0 or not doesn't make any difference.
> >
> 
> Why would the effective emin be 0 if memory.max > memory.min?
> Note that the effective emin is only set in
> mem_cgroup_protected(), so unless we explicitly set it to 0 there, it
> can't be 0.
>
> Besides mem_cgroup_protected(), the effective emin is also used in
> mem_cgroup_protection(), but that function only uses the existing
> memory.emin rather than checking memory.max > memory.min.
>
> So the real issue is in mem_cgroup_protection(), because the value it
> uses may be a stale one.

I am sorry but I still do not follow. You keep focusing on talking about
the code while I am really interested in the user visible semantic that
you want to achieve. I am sorry to be dense here but believe me I am
trying.

Your example doesn't help much because the effective protection doesn't
play any role in the limit reclaim there AFAICS. I would even argue that
emin == min is the proper thing in your example.

So I can only recommend that you rethink your use case and try to describe
it at a higher level.

Thanks!
Yafang Shao Feb. 18, 2020, 11:03 a.m. UTC | #12
On Tue, Feb 18, 2020 at 4:59 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> I am sorry but I still do not follow. You keep focusing on talking about
> the code while I am really interested in the user visible semantic that
> you want to achieve. I am sorry to be dense here but believe me I am
> trying.
>

Sorry, my poor English hasn't let me describe it clearly.

> Your example doesn't help much because the effective protection doesn't
> play any role in the limit reclaim there AFAICS. I would even argue that
> emin == min is the proper thing in your example.
>
> So I can only recommend that you rethink your use case and try to describe
> it at a higher level.
>

Yes, I will try to improve the example and make it clearer.

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6f6dc8712e39..df7fedbfc211 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6250,8 +6250,17 @@  enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
 
 	if (!root)
 		root = root_mem_cgroup;
-	if (memcg == root)
+	if (memcg == root) {
+		/*
+		 * Reset memory.(emin, elow) for reclaiming the memcg
+		 * itself.
+		 */
+		if (memcg != root_mem_cgroup) {
+			memcg->memory.emin = 0;
+			memcg->memory.elow = 0;
+		}
 		return MEMCG_PROT_NONE;
+	}
 
 	usage = page_counter_read(&memcg->memory);
 	if (!usage)
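A userspace sketch of the patched branch may help check the intended behaviour against Yafang's kswapd-then-limit-reclaim sequence. It reproduces only the memcg == root block from the hunk; the rest of mem_cgroup_protected() is reduced to the case where the memcg is a direct child of the reclaim root, so its effective values are copied from memory.min/low. model_root_memcg stands in for root_mem_cgroup, and all model_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified memcg; units are arbitrary (think MB). */
struct model_memcg {
	unsigned long usage, min, low;	/* native settings */
	unsigned long emin, elow;	/* cached effective protection */
};

enum model_prot { MODEL_PROT_NONE, MODEL_PROT_LOW, MODEL_PROT_MIN };

/* Stand-in for root_mem_cgroup. */
static struct model_memcg model_root_memcg;

/*
 * mem_cgroup_protected() with the proposed hunk applied. Only the
 * memcg == root branch follows the patch; the rest assumes @memcg is a
 * direct child of @root, so effective values equal the native ones.
 */
static enum model_prot model_protected(struct model_memcg *root,
				       struct model_memcg *memcg)
{
	assert(memcg != NULL);
	if (!root)
		root = &model_root_memcg;
	if (memcg == root) {
		/* The fix: forget protection cached by an earlier reclaimer. */
		if (memcg != &model_root_memcg) {
			memcg->emin = 0;
			memcg->elow = 0;
		}
		return MODEL_PROT_NONE;
	}

	memcg->emin = memcg->min;
	memcg->elow = memcg->low;

	if (memcg->usage <= memcg->emin)
		return MODEL_PROT_MIN;
	if (memcg->usage <= memcg->elow)
		return MODEL_PROT_LOW;
	return MODEL_PROT_NONE;
}
```

In this model the return value for A as its own reclaim root is MODEL_PROT_NONE with or without the change; what differs is the emin/elow that a later mem_cgroup_protection()-style read would see (0 rather than the 512 left behind by the global pass).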