Message ID | 20230328221644.803272-8-yosryahmed@google.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Netdev Maintainers |
Series | memcg: make rstat flushing irq and sleep friendly |

Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
On Tue 28-03-23 22:16:42, Yosry Ahmed wrote:
> In workingset_refault(), we call
> mem_cgroup_flush_stats_atomic_ratelimited() to flush stats within an
> RCU read section and with sleeping disallowed. Move the call above
> the RCU read section and allow sleeping to avoid unnecessarily
> performing a lot of work without sleeping.

Could you say a few words why the flushing is done before counters are
updated rather than after (the RCU section)?
On Thu, Mar 30, 2023 at 12:39 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 28-03-23 22:16:42, Yosry Ahmed wrote:
> > In workingset_refault(), we call
> > mem_cgroup_flush_stats_atomic_ratelimited() to flush stats within an
> > RCU read section and with sleeping disallowed. Move the call above
> > the RCU read section and allow sleeping to avoid unnecessarily
> > performing a lot of work without sleeping.
>
> Could you say a few words why the flushing is done before counters are
> updated rather than after (the RCU section)?

It's not about the counters that are updated, it's about the counters
that we read. Stats readers do a flush first to read accurate stats.
We flush before a read, not after an update.

> --
> Michal Hocko
> SUSE Labs
On Thu 30-03-23 00:42:36, Yosry Ahmed wrote:
> On Thu, Mar 30, 2023 at 12:39 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Tue 28-03-23 22:16:42, Yosry Ahmed wrote:
> > > In workingset_refault(), we call
> > > mem_cgroup_flush_stats_atomic_ratelimited() to flush stats within an
> > > RCU read section and with sleeping disallowed. Move the call above
> > > the RCU read section and allow sleeping to avoid unnecessarily
> > > performing a lot of work without sleeping.
> >
> > Could you say a few words why the flushing is done before counters are
> > updated rather than after (the RCU section)?
>
> It's not about the counters that are updated, it's about the counters
> that we read. Stats readers do a flush first to read accurate stats.
> We flush before a read, not after an update.

Right you are, my bad, I misread the intention here.

Acked-by: Michal Hocko <mhocko@suse.com>
On Thu, Mar 30, 2023 at 12:50 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 30-03-23 00:42:36, Yosry Ahmed wrote:
> > On Thu, Mar 30, 2023 at 12:39 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Tue 28-03-23 22:16:42, Yosry Ahmed wrote:
> > > > In workingset_refault(), we call
> > > > mem_cgroup_flush_stats_atomic_ratelimited() to flush stats within an
> > > > RCU read section and with sleeping disallowed. Move the call above
> > > > the RCU read section and allow sleeping to avoid unnecessarily
> > > > performing a lot of work without sleeping.
> > >
> > > Could you say a few words why the flushing is done before counters are
> > > updated rather than after (the RCU section)?
> >
> > It's not about the counters that are updated, it's about the counters
> > that we read. Stats readers do a flush first to read accurate stats.
> > We flush before a read, not after an update.
>
> Right you are, my bad, I misread the intention here.
>
> Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

>
> --
> Michal Hocko
> SUSE Labs
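For readers skimming the thread, here is a minimal kernel-style sketch of the pattern being agreed on above. It is illustrative only, not the actual mm/workingset.c code; the function name refault_stats_sketch is made up, while the memcg/lruvec helpers are the ones named in the patch below.

```c
#include <linux/memcontrol.h>
#include <linux/rcupdate.h>

/*
 * Sketch only: updaters bump cheap batched per-CPU counters and never
 * flush; readers flush pending deltas first so they see accurate
 * totals. The flush may sleep, so it must run before rcu_read_lock()
 * makes sleeping illegal.
 */
static unsigned long refault_stats_sketch(struct lruvec *lruvec, int nr)
{
	unsigned long refaults;

	/* Reader-side flush: may sleep, so do it outside the RCU section. */
	mem_cgroup_flush_stats_ratelimited();

	rcu_read_lock();
	/* Updater side: a per-CPU delta that needs no flush of its own. */
	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE, nr);
	/* Read a counter the flush above has made (mostly) accurate. */
	refaults = lruvec_page_state(lruvec, WORKINGSET_REFAULT_BASE);
	rcu_read_unlock();

	return refaults;
}
```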
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index b424ba3ebd09..a4bc3910a2eb 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1038,7 +1038,7 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
 
 void mem_cgroup_flush_stats(void);
 void mem_cgroup_flush_stats_atomic(void);
-void mem_cgroup_flush_stats_atomic_ratelimited(void);
+void mem_cgroup_flush_stats_ratelimited(void);
 
 void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 			      int val);
@@ -1540,7 +1540,7 @@ static inline void mem_cgroup_flush_stats_atomic(void)
 {
 }
 
-static inline void mem_cgroup_flush_stats_atomic_ratelimited(void)
+static inline void mem_cgroup_flush_stats_ratelimited(void)
 {
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a2ce3aa10d94..361c0bbf7283 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -673,10 +673,10 @@ void mem_cgroup_flush_stats_atomic(void)
 	do_flush_stats(true);
 }
 
-void mem_cgroup_flush_stats_atomic_ratelimited(void)
+void mem_cgroup_flush_stats_ratelimited(void)
 {
 	if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
-		mem_cgroup_flush_stats_atomic();
+		mem_cgroup_flush_stats();
 }
 
 static void flush_memcg_stats_dwork(struct work_struct *w)
diff --git a/mm/workingset.c b/mm/workingset.c
index dab0c362b9e3..3025beee9b34 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -406,6 +406,9 @@ void workingset_refault(struct folio *folio, void *shadow)
 	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &workingset);
 	eviction <<= bucket_order;
 
+	/* Flush stats (and potentially sleep) before holding RCU read lock */
+	mem_cgroup_flush_stats_ratelimited();
+
 	rcu_read_lock();
 	/*
 	 * Look up the memcg associated with the stored ID. It might
@@ -461,8 +464,6 @@ void workingset_refault(struct folio *folio, void *shadow)
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
-
-	mem_cgroup_flush_stats_atomic_ratelimited();
 	/*
 	 * Compare the distance to the existing workingset size. We
	 * don't activate pages that couldn't stay resident even if
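As a closing note on the ratelimiting the memcontrol.c hunk relies on: flush_next_time is advanced whenever a full flush runs, so refault paths only flush when the periodic flusher is overdue. A sketch under stated assumptions: the FLUSH_TIME value and the 2x skip window follow the mm/memcontrol.c of this era and are not shown in the hunks above; do_flush_stats_sketch is a made-up name.

```c
/*
 * Assumed bookkeeping from mm/memcontrol.c, not part of the patch
 * above: the periodic flusher runs every FLUSH_TIME, and each full
 * flush pushes flush_next_time out, so ratelimited callers skip the
 * flush unless the periodic one is overdue.
 */
#define FLUSH_TIME	(2UL * HZ)

static u64 flush_next_time;

static void do_flush_stats_sketch(void)
{
	WRITE_ONCE(flush_next_time, jiffies_64 + 2 * FLUSH_TIME);
	/* ... propagate per-CPU deltas up the cgroup tree ... */
}
```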