
[v2,2/2] memcg: unify memcg stat flushing

Message ID 20211001190040.48086-2-shakeelb@google.com (mailing list archive)
State New
Series [v2,1/2] memcg: flush stats only if updated

Commit Message

Shakeel Butt Oct. 1, 2021, 7 p.m. UTC
The memcg stats can be flushed in multiple contexts and potentially in
parallel too.  For example, multiple parallel userspace readers of memcg
stats will contend on the rstat locks with each other.  There is no need
for that.  We just need one flusher and everyone else can benefit.  In
addition, after aa48e47e3906 ("memcg: infrastructure to flush memcg stats")
the kernel periodically flushes the memcg stats from the root, so the other
flushers will potentially have much less work to do.

Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
Changelog since v1:
- N/A

 mm/memcontrol.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)
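
For readers skimming the series: after this patch (and patch 1/2, which adds the
threshold check), the flush path ends up looking roughly like the sketch below.
It is reconstructed from the diff further down and from patch 1/2, so treat it as
an illustration rather than the authoritative code; the comments are added here.

static void __mem_cgroup_flush_stats(void)
{
	unsigned long flag;

	/* Only one flusher at a time; any concurrent caller just returns
	 * and reads the stats produced by whoever is currently flushing. */
	if (!spin_trylock_irqsave(&stats_flush_lock, flag))
		return;

	cgroup_rstat_flush_irqsafe(root_mem_cgroup->css.cgroup);
	atomic_set(&stats_flush_threshold, 0);
	spin_unlock_irqrestore(&stats_flush_lock, flag);
}

void mem_cgroup_flush_stats(void)
{
	/* From patch 1/2: skip the flush entirely unless enough per-CPU
	 * updates have accumulated since the last flush. */
	if (atomic_read(&stats_flush_threshold) > num_online_cpus())
		__mem_cgroup_flush_stats();
}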

Comments

Michal Koutný Oct. 13, 2021, 6:01 p.m. UTC | #1
Hello Shakeel.

(Sorry for taking so long getting down to this.)

On Fri, Oct 01, 2021 at 12:00:40PM -0700, Shakeel Butt <shakeelb@google.com> wrote:
> There is no need for that.  We just need one flusher and everyone else
> can benefit.

I imagine a cgroup with an intricate deep hierarchy with many updates and
a separate (simpler) sibling/independent cgroup that would need to pay
the cost of the first hierarchy's updates [1] when it asks just for its
own stats (bounded by the amount that's left over from the periodic
updates).

The stats files (or wb stats) are likely not that time sensitive, and the
reclaim (which can be local only but is a slow path anyway) already uses
the global flushing.

I wonder whether the bigger benefit would be to retain the global
stats_flush_threshold counter but flush only the local subtree.

Thanks,
Michal

[1] At first I thought non-memcg updates would interfere too via the rstat
tree, but I see it's actually filtered by stats_flush_threshold, so
only foreign memcg updates are relevant.
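
(For illustration only: a hypothetical sketch of the alternative Michal describes,
i.e. keep the global stats_flush_threshold but flush just the subtree being read.
This is not part of the posted series, and the function name is invented here.)

/* Hypothetical, not the posted patch: global threshold, local flush. */
static void mem_cgroup_flush_stats_local(struct mem_cgroup *memcg)
{
	if (atomic_read(&stats_flush_threshold) <= num_online_cpus())
		return;		/* too few pending updates to bother */

	cgroup_rstat_flush(memcg->css.cgroup);
	/* Note: the global counter cannot simply be reset here, since other
	 * subtrees may still have unflushed updates. */
}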
Shakeel Butt Oct. 13, 2021, 7:24 p.m. UTC | #2
On Wed, Oct 13, 2021 at 11:01 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> Hello Shakeel.
>
> (Sorry for taking so long getting down to this.)
>
> On Fri, Oct 01, 2021 at 12:00:40PM -0700, Shakeel Butt <shakeelb@google.com> wrote:
> > There is no need for that.  We just need one flusher and everyone else
> > can benefit.
>
> I imagine a cgroup with an intricate deep hierarchy with many updates and
> a separate (simpler) sibling/independent cgroup that would need to pay
> the cost of the first hierarchy's updates [1] when it asks just for its
> own stats (bounded by the amount that's left over from the periodic
> updates).
>
> The stats files (or wb stats) are likely not that time sensitive, and the
> reclaim (which can be local only but is a slow path anyway) already uses
> the global flushing.
>
> I wonder whether the bigger benefit would be to retain the global
> stats_flush_threshold counter but flush only the local subtree.

I did contemplate this (i.e. a stat read paying the flushing price
for everyone else) but decided to keep it as is based on:

1) The periodic async flush will keep the update tree small and will
keep infrequent readers cheap.
2) Keep things simple for now and come back if someone complains about
very frequent stats readers.

Shakeel
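
(The periodic flush Shakeel refers to in point 1 was added by aa48e47e3906 and is,
roughly, a delayed work item that re-arms itself every couple of seconds. The sketch
below is recalled from that commit, so the exact names and period may differ.)

/* Sketch of the periodic flusher from aa48e47e3906 (recalled, not quoted):
 * flush from the root every ~2 seconds so on-demand flushers stay cheap. */
static void flush_memcg_stats_dwork(struct work_struct *w)
{
	__mem_cgroup_flush_stats();
	queue_delayed_work(system_unbound_wq, &stats_flush_dwork, 2UL * HZ);
}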
Michal Koutný Oct. 14, 2021, 9:04 a.m. UTC | #3
On Wed, Oct 13, 2021 at 12:24:31PM -0700, Shakeel Butt <shakeelb@google.com> wrote:
> 1) The periodic async flush will keep the update tree small and will
> keep infrequent readers cheap.
> 2) Keep things simple for now and come back if someone complains about
> very frequent stats readers.

OK, understood.

Michal

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 25f55636ca37..22d905f30a30 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -638,12 +638,14 @@  static inline void memcg_rstat_updated(struct mem_cgroup *memcg)
 
 static void __mem_cgroup_flush_stats(void)
 {
-	if (!spin_trylock(&stats_flush_lock))
+	unsigned long flag;
+
+	if (!spin_trylock_irqsave(&stats_flush_lock, flag))
 		return;
 
 	cgroup_rstat_flush_irqsafe(root_mem_cgroup->css.cgroup);
 	atomic_set(&stats_flush_threshold, 0);
-	spin_unlock(&stats_flush_lock);
+	spin_unlock_irqrestore(&stats_flush_lock, flag);
 }
 
 void mem_cgroup_flush_stats(void)
@@ -1462,7 +1464,7 @@  static char *memory_stat_format(struct mem_cgroup *memcg)
 	 *
 	 * Current memory state:
 	 */
-	cgroup_rstat_flush(memcg->css.cgroup);
+	mem_cgroup_flush_stats();
 
 	for (i = 0; i < ARRAY_SIZE(memory_stats); i++) {
 		u64 size;
@@ -3566,8 +3568,7 @@  static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
 	unsigned long val;
 
 	if (mem_cgroup_is_root(memcg)) {
-		/* mem_cgroup_threshold() calls here from irqsafe context */
-		cgroup_rstat_flush_irqsafe(memcg->css.cgroup);
+		mem_cgroup_flush_stats();
 		val = memcg_page_state(memcg, NR_FILE_PAGES) +
 			memcg_page_state(memcg, NR_ANON_MAPPED);
 		if (swap)
@@ -3948,7 +3949,7 @@  static int memcg_numa_stat_show(struct seq_file *m, void *v)
 	int nid;
 	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
 
-	cgroup_rstat_flush(memcg->css.cgroup);
+	mem_cgroup_flush_stats();
 
 	for (stat = stats; stat < stats + ARRAY_SIZE(stats); stat++) {
 		seq_printf(m, "%s=%lu", stat->name,
@@ -4020,7 +4021,7 @@  static int memcg_stat_show(struct seq_file *m, void *v)
 
 	BUILD_BUG_ON(ARRAY_SIZE(memcg1_stat_names) != ARRAY_SIZE(memcg1_stats));
 
-	cgroup_rstat_flush(memcg->css.cgroup);
+	mem_cgroup_flush_stats();
 
 	for (i = 0; i < ARRAY_SIZE(memcg1_stats); i++) {
 		unsigned long nr;
@@ -4523,7 +4524,7 @@  void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages,
 	struct mem_cgroup *memcg = mem_cgroup_from_css(wb->memcg_css);
 	struct mem_cgroup *parent;
 
-	cgroup_rstat_flush_irqsafe(memcg->css.cgroup);
+	mem_cgroup_flush_stats();
 
 	*pdirty = memcg_page_state(memcg, NR_FILE_DIRTY);
 	*pwriteback = memcg_page_state(memcg, NR_WRITEBACK);
@@ -6408,7 +6409,7 @@  static int memory_numa_stat_show(struct seq_file *m, void *v)
 	int i;
 	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
 
-	cgroup_rstat_flush(memcg->css.cgroup);
+	mem_cgroup_flush_stats();
 
 	for (i = 0; i < ARRAY_SIZE(memory_stats); i++) {
 		int nid;