diff mbox series

[v2,7/9] workingset: memcg: sleep when flushing stats in workingset_refault()

Message ID 20230328221644.803272-8-yosryahmed@google.com (mailing list archive)
State Not Applicable
Delegated to: Netdev Maintainers
Headers show
Series memcg: make rstat flushing irq and sleep | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Yosry Ahmed March 28, 2023, 10:16 p.m. UTC
In workingset_refault(), we call
mem_cgroup_flush_stats_atomic_ratelimited() to flush stats within an
RCU read section and with sleeping disallowed. Move the call above
the RCU read section and allow sleeping to avoid unnecessarily
performing a lot of work without sleeping.

Since workingset_refault() is the only caller of
mem_cgroup_flush_stats_atomic_ratelimited(), just make it non-atomic,
and rename it to mem_cgroup_flush_stats_ratelimited().

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h | 4 ++--
 mm/memcontrol.c            | 4 ++--
 mm/workingset.c            | 5 +++--
 3 files changed, 7 insertions(+), 6 deletions(-)

Comments

Michal Hocko March 30, 2023, 7:39 a.m. UTC | #1
On Tue 28-03-23 22:16:42, Yosry Ahmed wrote:
> In workingset_refault(), we call
> mem_cgroup_flush_stats_atomic_ratelimited() to flush stats within an
> RCU read section and with sleeping disallowed. Move the call above
> the RCU read section and allow sleeping to avoid unnecessarily
> performing a lot of work without sleeping.

Could you say few words why the flushing is done before counters are
updated rather than after (the RCU section)?
Yosry Ahmed March 30, 2023, 7:42 a.m. UTC | #2
On Thu, Mar 30, 2023 at 12:39 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 28-03-23 22:16:42, Yosry Ahmed wrote:
> > In workingset_refault(), we call
> > mem_cgroup_flush_stats_atomic_ratelimited() to flush stats within an
> > RCU read section and with sleeping disallowed. Move the call above
> > the RCU read section and allow sleeping to avoid unnecessarily
> > performing a lot of work without sleeping.
>
> Could you say few words why the flushing is done before counters are
> updated rather than after (the RCU section)?

It's not about the counters that are updated, it's about the counters
that we read. Stats readers do a flush first to read accurate stats.
We flush before a read, not after an update.

> --
> Michal Hocko
> SUSE Labs
Michal Hocko March 30, 2023, 7:50 a.m. UTC | #3
On Thu 30-03-23 00:42:36, Yosry Ahmed wrote:
> On Thu, Mar 30, 2023 at 12:39 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Tue 28-03-23 22:16:42, Yosry Ahmed wrote:
> > > In workingset_refault(), we call
> > > mem_cgroup_flush_stats_atomic_ratelimited() to flush stats within an
> > > RCU read section and with sleeping disallowed. Move the call above
> > > the RCU read section and allow sleeping to avoid unnecessarily
> > > performing a lot of work without sleeping.
> >
> > Could you say few words why the flushing is done before counters are
> > updated rather than after (the RCU section)?
> 
> It's not about the counters that are updated, it's about the counters
> that we read. Stats readers do a flush first to read accurate stats.
> We flush before a read, not after an update.

Right you are, my bad I have misread the intention here.

Acked-by: Michal Hocko <mhocko@suse.com>
Yosry Ahmed March 30, 2023, 7:55 a.m. UTC | #4
On Thu, Mar 30, 2023 at 12:50 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 30-03-23 00:42:36, Yosry Ahmed wrote:
> > On Thu, Mar 30, 2023 at 12:39 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Tue 28-03-23 22:16:42, Yosry Ahmed wrote:
> > > > In workingset_refault(), we call
> > > > mem_cgroup_flush_stats_atomic_ratelimited() to flush stats within an
> > > > RCU read section and with sleeping disallowed. Move the call above
> > > > the RCU read section and allow sleeping to avoid unnecessarily
> > > > performing a lot of work without sleeping.
> > >
> > > Could you say few words why the flushing is done before counters are
> > > updated rather than after (the RCU section)?
> >
> > It's not about the counters that are updated, it's about the counters
> > that we read. Stats readers do a flush first to read accurate stats.
> > We flush before a read, not after an update.
>
> Right you are, my bad I have misread the intention here.
>
> Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

>
> --
> Michal Hocko
> SUSE Labs
diff mbox series

Patch

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index b424ba3ebd09..a4bc3910a2eb 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1038,7 +1038,7 @@  static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
 
 void mem_cgroup_flush_stats(void);
 void mem_cgroup_flush_stats_atomic(void);
-void mem_cgroup_flush_stats_atomic_ratelimited(void);
+void mem_cgroup_flush_stats_ratelimited(void);
 
 void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 			      int val);
@@ -1540,7 +1540,7 @@  static inline void mem_cgroup_flush_stats_atomic(void)
 {
 }
 
-static inline void mem_cgroup_flush_stats_atomic_ratelimited(void)
+static inline void mem_cgroup_flush_stats_ratelimited(void)
 {
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a2ce3aa10d94..361c0bbf7283 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -673,10 +673,10 @@  void mem_cgroup_flush_stats_atomic(void)
 		do_flush_stats(true);
 }
 
-void mem_cgroup_flush_stats_atomic_ratelimited(void)
+void mem_cgroup_flush_stats_ratelimited(void)
 {
 	if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
-		mem_cgroup_flush_stats_atomic();
+		mem_cgroup_flush_stats();
 }
 
 static void flush_memcg_stats_dwork(struct work_struct *w)
diff --git a/mm/workingset.c b/mm/workingset.c
index dab0c362b9e3..3025beee9b34 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -406,6 +406,9 @@  void workingset_refault(struct folio *folio, void *shadow)
 	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &workingset);
 	eviction <<= bucket_order;
 
+	/* Flush stats (and potentially sleep) before holding RCU read lock */
+	mem_cgroup_flush_stats_ratelimited();
+
 	rcu_read_lock();
 	/*
 	 * Look up the memcg associated with the stored ID. It might
@@ -461,8 +464,6 @@  void workingset_refault(struct folio *folio, void *shadow)
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
-
-	mem_cgroup_flush_stats_atomic_ratelimited();
 	/*
 	 * Compare the distance to the existing workingset size. We
 	 * don't activate pages that couldn't stay resident even if