Message ID | 20211001190040.48086-2-shakeelb@google.com
---|---
State | New
Series | [v2,1/2] memcg: flush stats only if updated
Hello Shakeel.

(Sorry for taking so long getting down to this.)

On Fri, Oct 01, 2021 at 12:00:40PM -0700, Shakeel Butt <shakeelb@google.com> wrote:
> There is no need for that. We just need one flusher and everyone else
> can benefit.

I imagine a cgroup with an intricate deep hierarchy with many updates and
a separate (simpler) sibling/independent cgroup that would need to pay
the costs of the first hierarchy's updates [1] when it asks just for its
own stats (bound by the amount that's left over from the periodic
updates).

The stats files (or wb stats) are likely not that time sensitive, and the
reclaim (which can be local only but is a slow path anyway) already uses
the global flushing.

I wonder whether the bigger benefit would be to retain the global
stats_flush_threshold counter but flush only the local subtree.

Thanks,
Michal

[1] At first I thought non-memcg updates would interfere too via the rstat
tree, but I see that is actually filtered by stats_flush_threshold, so only
foreign memcg updates are relevant.
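To make the suggestion above concrete, here is a rough sketch of what "keep the global stats_flush_threshold counter but flush only the local subtree" could look like. The helper name mem_cgroup_flush_stats_local() and the surrounding bookkeeping are hypothetical, not something posted in this series:

```c
/*
 * Hypothetical sketch of the suggestion, not part of the posted patches:
 * keep the global "enough updates happened" gate, but flush only the
 * subtree whose stats are actually being read.
 */
static void mem_cgroup_flush_stats_local(struct mem_cgroup *memcg)
{
	/* Same global gate as in the series: skip if few pending updates. */
	if (atomic_read(&stats_flush_threshold) <= num_online_cpus())
		return;

	/* Flush just this memcg's subtree instead of the root cgroup. */
	cgroup_rstat_flush(memcg->css.cgroup);

	/*
	 * The global counter cannot simply be reset here, because other
	 * subtrees may still have unflushed updates; how to account for
	 * that is exactly the open question in this thread.
	 */
}
```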
On Wed, Oct 13, 2021 at 11:01 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> Hello Shakeel.
>
> (Sorry for taking so long getting down to this.)
>
> On Fri, Oct 01, 2021 at 12:00:40PM -0700, Shakeel Butt <shakeelb@google.com> wrote:
> > There is no need for that. We just need one flusher and everyone else
> > can benefit.
>
> I imagine a cgroup with an intricate deep hierarchy with many updates and
> a separate (simpler) sibling/independent cgroup that would need to pay
> the costs of the first hierarchy's updates [1] when it asks just for its
> own stats (bound by the amount that's left over from the periodic
> updates).
>
> The stats files (or wb stats) are likely not that time sensitive, and the
> reclaim (which can be local only but is a slow path anyway) already uses
> the global flushing.
>
> I wonder whether the bigger benefit would be to retain the global
> stats_flush_threshold counter but flush only the local subtree.

I did contemplate this (i.e. a stat read paying the flushing price for
everyone else) but decided to keep it as is based on:

1) The periodic async flush will keep the update tree small and will
keep infrequent readers cheap.
2) Keep things simple for now and come back if someone complains about
very frequent stats readers.

Shakeel
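For reference, the periodic async flush mentioned in point (1) is, roughly, a deferrable delayed work that flushes every couple of seconds. The sketch below is reconstructed from memory of patch 1/2 of this series, so details such as the interval and which helper is called may differ from the actual code:

```c
/* Rough sketch of the periodic flusher introduced by patch 1/2 of the series. */
static void flush_memcg_stats_dwork(struct work_struct *w);
static DECLARE_DEFERRABLE_WORK(stats_flush_dwork, flush_memcg_stats_dwork);

static void flush_memcg_stats_dwork(struct work_struct *w)
{
	/* Flush unconditionally, then rearm roughly two seconds later. */
	__mem_cgroup_flush_stats();
	queue_delayed_work(system_unbound_wq, &stats_flush_dwork, 2UL * HZ);
}
```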
On Wed, Oct 13, 2021 at 12:24:31PM -0700, Shakeel Butt <shakeelb@google.com> wrote:
> 1) The periodic async flush will keep the update tree small and will
> keep infrequent readers cheap.
> 2) Keep things simple for now and come back if someone complains about
> very frequent stats readers.

OK, understood.

Michal
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 25f55636ca37..22d905f30a30 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -638,12 +638,14 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg)
 
 static void __mem_cgroup_flush_stats(void)
 {
-        if (!spin_trylock(&stats_flush_lock))
+        unsigned long flag;
+
+        if (!spin_trylock_irqsave(&stats_flush_lock, flag))
                 return;
 
         cgroup_rstat_flush_irqsafe(root_mem_cgroup->css.cgroup);
         atomic_set(&stats_flush_threshold, 0);
-        spin_unlock(&stats_flush_lock);
+        spin_unlock_irqrestore(&stats_flush_lock, flag);
 }
 
 void mem_cgroup_flush_stats(void)
@@ -1462,7 +1464,7 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
          *
          * Current memory state:
          */
-        cgroup_rstat_flush(memcg->css.cgroup);
+        mem_cgroup_flush_stats();
 
         for (i = 0; i < ARRAY_SIZE(memory_stats); i++) {
                 u64 size;
@@ -3566,8 +3568,7 @@ static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
         unsigned long val;
 
         if (mem_cgroup_is_root(memcg)) {
-                /* mem_cgroup_threshold() calls here from irqsafe context */
-                cgroup_rstat_flush_irqsafe(memcg->css.cgroup);
+                mem_cgroup_flush_stats();
                 val = memcg_page_state(memcg, NR_FILE_PAGES) +
                         memcg_page_state(memcg, NR_ANON_MAPPED);
                 if (swap)
@@ -3948,7 +3949,7 @@ static int memcg_numa_stat_show(struct seq_file *m, void *v)
         int nid;
         struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
 
-        cgroup_rstat_flush(memcg->css.cgroup);
+        mem_cgroup_flush_stats();
 
         for (stat = stats; stat < stats + ARRAY_SIZE(stats); stat++) {
                 seq_printf(m, "%s=%lu", stat->name,
@@ -4020,7 +4021,7 @@ static int memcg_stat_show(struct seq_file *m, void *v)
 
         BUILD_BUG_ON(ARRAY_SIZE(memcg1_stat_names) != ARRAY_SIZE(memcg1_stats));
 
-        cgroup_rstat_flush(memcg->css.cgroup);
+        mem_cgroup_flush_stats();
 
         for (i = 0; i < ARRAY_SIZE(memcg1_stats); i++) {
                 unsigned long nr;
@@ -4523,7 +4524,7 @@ void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages,
         struct mem_cgroup *memcg = mem_cgroup_from_css(wb->memcg_css);
         struct mem_cgroup *parent;
 
-        cgroup_rstat_flush_irqsafe(memcg->css.cgroup);
+        mem_cgroup_flush_stats();
 
         *pdirty = memcg_page_state(memcg, NR_FILE_DIRTY);
         *pwriteback = memcg_page_state(memcg, NR_WRITEBACK);
@@ -6408,7 +6409,7 @@ static int memory_numa_stat_show(struct seq_file *m, void *v)
         int i;
         struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
 
-        cgroup_rstat_flush(memcg->css.cgroup);
+        mem_cgroup_flush_stats();
 
         for (i = 0; i < ARRAY_SIZE(memory_stats); i++) {
                 int nid;
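Putting the two patches of the series together, the flush entry point used by all the readers converted above ends up looking roughly like the sketch below. This is a reconstruction, not a verbatim copy of the tree: the threshold gate comes from patch 1/2 and the IRQ-safe locking from the first hunk of this patch.

```c
/* Sketch only: approximates mm/memcontrol.c after both patches are applied. */
static DEFINE_SPINLOCK(stats_flush_lock);
static atomic_t stats_flush_threshold = ATOMIC_INIT(0);

static void __mem_cgroup_flush_stats(void)
{
	unsigned long flag;

	/*
	 * irqsave variant so callers in interrupt-disabled context
	 * (e.g. mem_cgroup_threshold()) can use the unified helper.
	 */
	if (!spin_trylock_irqsave(&stats_flush_lock, flag))
		return;

	cgroup_rstat_flush_irqsafe(root_mem_cgroup->css.cgroup);
	atomic_set(&stats_flush_threshold, 0);
	spin_unlock_irqrestore(&stats_flush_lock, flag);
}

void mem_cgroup_flush_stats(void)
{
	/* Flush only if enough updates accumulated (gate added in patch 1/2). */
	if (atomic_read(&stats_flush_threshold) > num_online_cpus())
		__mem_cgroup_flush_stats();
}
```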