
[v4,3/4] mm: memcg: let non-unified root stats flushes help unified flushes

Message ID 20230831165611.2610118-4-yosryahmed@google.com (mailing list archive)
State New
Series memcg: non-unified flushing for userspace stats

Commit Message

Yosry Ahmed Aug. 31, 2023, 4:56 p.m. UTC
Unified flushing of memcg stats keeps track of the magnitude of pending
updates, and only allows a flush if that magnitude exceeds a threshold.
It also keeps track of the time at which ratelimited flushing should be
allowed as flush_next_time.

A non-unified flush on the root memcg has the same effect as a unified
flush, so let it help unified flushing by resetting pending updates and
kicking flush_next_time forward. Move the logic into the common
do_stats_flush() helper, and do it for all root flushes, unified or
not.

There is a subtle change here: we reset stats_flush_threshold before a
flush rather than after it. This is probably okay because:

(a) For flushers: only unified flushers check stats_flush_threshold, and
those flushers skip anyway if there is another unified flush ongoing.
Having them also skip if there is an ongoing non-unified root flush is
actually more consistent.

(b) For updaters: Resetting stats_flush_threshold early may lead to more
atomic updates of stats_flush_threshold, as we start updating it
earlier. This should not be significant in practice because we stop
updating stats_flush_threshold when it reaches the threshold anyway. If
we start early and stop early, the number of atomic updates remains the
same. The only difference is the scenario where we reset
stats_flush_threshold early, start doing atomic updates early, and then
the periodic flusher kicks in before we reach the threshold. In this
case, we will have done more atomic updates. However, since the
threshold wasn't reached, we did not do many updates anyway.

Suggested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 mm/memcontrol.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comments

Michal Hocko Sept. 4, 2023, 2:50 p.m. UTC | #1
On Thu 31-08-23 16:56:10, Yosry Ahmed wrote:
> Unified flushing of memcg stats keeps track of the magnitude of pending
> updates, and only allows a flush if that magnitude exceeds a threshold.
> It also keeps track of the time at which ratelimited flushing should be
> allowed as flush_next_time.
> 
> A non-unified flush on the root memcg has the same effect as a unified
> flush, so let it help unified flushing by resetting pending updates and
> kicking flush_next_time forward. Move the logic into the common
> do_stats_flush() helper, and do it for all root flushes, unified or
> not.

I have a hard time following why we really want/need this. Does this
cause any observable changes in behavior?

> 
> There is a subtle change here, we reset stats_flush_threshold before a
> flush rather than after a flush. This probably okay because:
> 
> (a) For flushers: only unified flushers check stats_flush_threshold, and
> those flushers skip anyway if there is another unified flush ongoing.
> Having them also skip if there is an ongoing non-unified root flush is
> actually more consistent.
> 
> (b) For updaters: Resetting stats_flush_threshold early may lead to more
> atomic updates of stats_flush_threshold, as we start updating it
> earlier. This should not be significant in practice because we stop
> updating stats_flush_threshold when it reaches the threshold anyway. If
> we start early and stop early, the number of atomic updates remain the
> same. The only difference is the scenario where we reset
> stats_flush_threshold early, start doing atomic updates early, and then
> the periodic flusher kicks in before we reach the threshold. In this
> case, we will have done more atomic updates. However, since the
> threshold wasn't reached, then we did not do a lot of updates anyway.
> 
> Suggested-by: Michal Koutný <mkoutny@suse.com>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>  mm/memcontrol.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 8c046feeaae7..94d5a6751a9e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -647,6 +647,11 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
>   */
>  static void do_stats_flush(struct mem_cgroup *memcg)
>  {
> +	/* for unified flushing, root non-unified flushing can help as well */
> +	if (mem_cgroup_is_root(memcg)) {
> +		WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
> +		atomic_set(&stats_flush_threshold, 0);
> +	}
>  	cgroup_rstat_flush(memcg->css.cgroup);
>  }
>  
> @@ -665,11 +670,8 @@ static void do_unified_stats_flush(void)
>  	    atomic_xchg(&stats_unified_flush_ongoing, 1))
>  		return;
>  
> -	WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
> -
>  	do_stats_flush(root_mem_cgroup);
>  
> -	atomic_set(&stats_flush_threshold, 0);
>  	atomic_set(&stats_unified_flush_ongoing, 0);
>  }
>  
> -- 
> 2.42.0.rc2.253.gd59a3bf2b4-goog
Michal Koutný Sept. 4, 2023, 3:29 p.m. UTC | #2
Hello.

On Mon, Sep 04, 2023 at 04:50:15PM +0200, Michal Hocko <mhocko@suse.com> wrote:
> I have hard time to follow why we really want/need this. Does this cause
> any observable changes to the behavior?

The behavior change depends on how much userspace triggers root memcg
flushes: anywhere from no change at all to effectively offloading
flushing to userspace tasks. (In theory.)

It keeps stats_flush_threshold up to date as a global error estimate,
so that error-tolerant readers may save time, and it keeps the
reasoning about the effect of stats_flush_threshold simple.

Michal
Michal Hocko Sept. 4, 2023, 3:41 p.m. UTC | #3
On Mon 04-09-23 17:29:14, Michal Koutný wrote:
> Hello.
> 
> On Mon, Sep 04, 2023 at 04:50:15PM +0200, Michal Hocko <mhocko@suse.com> wrote:
> > I have hard time to follow why we really want/need this. Does this cause
> > any observable changes to the behavior?
> 
> Behavior change depends on how much userspace triggers the root memcg
> flush, from nothing to effectively offloading flushing to userspace tasks.
> (Theory^^^)
> 
> It keeps stats_flush_threshold up to date representing global error
> estimate so that error-tolerant readers may save their time and it keeps
> the reasoning about the stats_flush_threshold effect simple.

So it also creates an undocumented but userspace-visible behavior.
Something that userspace might start depending on, right?
Michal Koutný Sept. 5, 2023, 2:10 p.m. UTC | #4
On Mon, Sep 04, 2023 at 05:41:10PM +0200, Michal Hocko <mhocko@suse.com> wrote:
> So it also creates an undocumented but userspace visible behavior.
> Something that userspace might start depending on, right?

Yes, but:
- depending on undocumented behavior is a mistake,
- breaking the dependency would manifest (in the case I imagine) as a
  performance regression (and if there are some users, a future change
  could let them configure a periodic kernel flush to compensate).

Or do you suggest these effects should be documented (that would require
deeper analysis of the actual effect)? 


Thanks,
Michal
Yosry Ahmed Sept. 5, 2023, 3:54 p.m. UTC | #5
On Tue, Sep 5, 2023 at 7:10 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> On Mon, Sep 04, 2023 at 05:41:10PM +0200, Michal Hocko <mhocko@suse.com> wrote:
> > So it also creates an undocumented but userspace visible behavior.
> > Something that userspace might start depending on, right?
>
> Yes but -
> - depending on undocumented behavior is a mistake,
> - breaking the dependency would manifest (in the case I imagine) as a
>   performance regression (and if there are some users, the future can
>   allow them configuring periodic kernel flush to compensate for that).

I think I am missing something. This change basically makes userspace
readers (for the root memcg) help out unified flushers, which are
in-kernel readers (e.g. reclaim) -- not the other way around.

How would that create a userspace visible behavior that a dependency
can be formed on? Users expecting reclaim to be faster right after
reading root stats? I would guess that would be too flaky to cause a
behavior that people can depend on tbh.
Michal Koutný Sept. 5, 2023, 4:07 p.m. UTC | #6
On Tue, Sep 05, 2023 at 08:54:46AM -0700, Yosry Ahmed <yosryahmed@google.com> wrote:
> How would that create a userspace visible behavior that a dependency
> can be formed on?

A userspace process reading out root memory.stat more frequently than
in-kernel periodic flusher.

> Users expecting reclaim to be faster right after reading root stats?

Yes, that is what I had in mind.

> I would guess that would be too flaky to cause a behavior that people
> can depend on tbh.

I agree it's a weird dependency. As I wrote, nothing that would be hard
to take away.


Michal
Michal Hocko Sept. 12, 2023, 11:03 a.m. UTC | #7
On Tue 05-09-23 08:54:46, Yosry Ahmed wrote:
> On Tue, Sep 5, 2023 at 7:10 AM Michal Koutný <mkoutny@suse.com> wrote:
> >
> > On Mon, Sep 04, 2023 at 05:41:10PM +0200, Michal Hocko <mhocko@suse.com> wrote:
> > > So it also creates an undocumented but userspace visible behavior.
> > > Something that userspace might start depending on, right?
> >
> > Yes but -
> > - depending on undocumented behavior is a mistake,
> > - breaking the dependency would manifest (in the case I imagine) as a
> >   performance regression (and if there are some users, the future can
> >   allow them configuring periodic kernel flush to compensate for that).
> 
> I think I am missing something. This change basically makes userspace
> readers (for the root memcg) help out unified flushers, which are
> in-kernel readers (e.g. reclaim) -- not the other way around.
> 
> How would that create a userspace visible behavior that a dependency
> can be formed on? Users expecting reclaim to be faster right after
> reading root stats? I would guess that would be too flaky to cause a
> behavior that people can depend on tbh.

Flaky or not, it might cause a behavior difference, and a subtle one. I
can imagine nohz and similar workloads wanting to (ab)use this to reduce
kernel footprint. If we really need this, then at least make it obvious
in the changelog.

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8c046feeaae7..94d5a6751a9e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -647,6 +647,11 @@  static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
  */
 static void do_stats_flush(struct mem_cgroup *memcg)
 {
+	/* for unified flushing, root non-unified flushing can help as well */
+	if (mem_cgroup_is_root(memcg)) {
+		WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
+		atomic_set(&stats_flush_threshold, 0);
+	}
 	cgroup_rstat_flush(memcg->css.cgroup);
 }
 
@@ -665,11 +670,8 @@  static void do_unified_stats_flush(void)
 	    atomic_xchg(&stats_unified_flush_ongoing, 1))
 		return;
 
-	WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
-
 	do_stats_flush(root_mem_cgroup);
 
-	atomic_set(&stats_flush_threshold, 0);
 	atomic_set(&stats_unified_flush_ongoing, 0);
 }