[mm-unstable,v3,4/5] mm: restart if multiple traversals raced

Message ID: 20240827230753.2073580-5-kinseyho@google.com
State: New
Series: Improve mem_cgroup_iter()

Commit Message

Kinsey Ho Aug. 27, 2024, 11:07 p.m. UTC

Currently, if multiple reclaimers raced on the same position, the
reclaimers which detect the race will still reclaim from the same memcg.
Instead, the reclaimers which detect the race should move on to the next
memcg in the hierarchy.

So, in the case where multiple traversals race, jump back to the start
of the mem_cgroup_iter() function to find the next memcg in the
hierarchy to reclaim from.

Signed-off-by: Kinsey Ho <kinseyho@google.com>
---
 include/linux/memcontrol.h |  4 ++--
 mm/memcontrol.c            | 22 ++++++++++++++--------
 2 files changed, 16 insertions(+), 10 deletions(-)
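
The restart-on-race behaviour described in the commit message can be sketched in userspace with C11 atomics. This is only an illustration under stated assumptions, not the kernel code: `struct group`, `struct shared_iter`, `next_group()`, `iter_next()` and `competing_walker()` are hypothetical names, the hierarchy is flattened to an array, and `atomic_compare_exchange_strong()` stands in for the kernel's `cmpxchg()`. The `race` hook exists only to make the lost race reproducible without threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_GROUPS 4

/* Hypothetical stand-in for a memcg; only an id is needed here. */
struct group {
	int id;
};

/* Hypothetical shared iterator: the last group handed to any walker. */
struct shared_iter {
	_Atomic(struct group *) position;
};

static struct group groups[NR_GROUPS] = { {0}, {1}, {2}, {3} };

/* Successor in a flat walk; NULL after the last group ends the round. */
static struct group *next_group(struct group *pos)
{
	assert(!pos || (pos >= groups && pos < groups + NR_GROUPS));

	if (!pos)
		return &groups[0];
	if (pos == &groups[NR_GROUPS - 1])
		return NULL;
	return pos + 1;
}

/*
 * Hand out the next group. If another walker moved the shared position
 * between our read and our compare-and-swap, discard the candidate and
 * restart from the new position instead of returning a duplicate.
 *
 * @race: optional hook, run once between the read and the swap, that
 *        simulates a competing walker. Assumption for this sketch only.
 */
static struct group *iter_next(struct shared_iter *it,
			       void (*race)(struct shared_iter *))
{
	struct group *pos, *next;

restart:
	pos = atomic_load(&it->position);
	next = next_group(pos);

	if (race) {
		void (*fn)(struct shared_iter *) = race;

		race = NULL;	/* lose the race only once */
		fn(it);
	}

	/* Publish our candidate only if nobody advanced the position. */
	if (!atomic_compare_exchange_strong(&it->position, &pos, next))
		goto restart;

	return next;
}

/* Simulated competitor: takes one step of its own on the same iterator. */
static void competing_walker(struct shared_iter *it)
{
	(void)iter_next(it, NULL);
}
```

Starting from an empty position, a call to `iter_next(&it, competing_walker)` loses the race for group 0 to the simulated competitor, restarts, and returns group 1, so the two walkers end up on different groups.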

Comments

T.J. Mercier Aug. 28, 2024, 5:49 p.m. UTC | #1
On Tue, Aug 27, 2024 at 4:11 PM Kinsey Ho <kinseyho@google.com> wrote:
>
> Currently, if multiple reclaimers raced on the same position, the
> reclaimers which detect the race will still reclaim from the same memcg.
> Instead, the reclaimers which detect the race should move on to the next
> memcg in the hierarchy.
>
> So, in the case where multiple traversals race, jump back to the start
> of the mem_cgroup_iter() function to find the next memcg in the
> hierarchy to reclaim from.
>
> Signed-off-by: Kinsey Ho <kinseyho@google.com>

Reviewed-by: T.J. Mercier <tjmercier@google.com>

Hugh Dickins Aug. 30, 2024, 10:04 a.m. UTC | #2
On Tue, 27 Aug 2024, Kinsey Ho wrote:

> Currently, if multiple reclaimers raced on the same position, the
> reclaimers which detect the race will still reclaim from the same memcg.
> Instead, the reclaimers which detect the race should move on to the next
> memcg in the hierarchy.
> 
> So, in the case where multiple traversals race, jump back to the start
> of the mem_cgroup_iter() function to find the next memcg in the
> hierarchy to reclaim from.
> 
> Signed-off-by: Kinsey Ho <kinseyho@google.com>

mm-unstable commit 954dd0848c61 needs the fix below to be merged in;
but the commit after it (the 5/5) then renames "memcg" to "next",
so that one has to be adjusted too.

[PATCH] mm: restart if multiple traversals raced: fix

mem_cgroup_iter() reset memcg to NULL before the goto restart, so that
goto out_unlock does not then return an ungotten memcg, causing oopses
on stale memcg in many places (often in memcg_rstat_updated()).

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/memcontrol.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6f66ac0ad4f0..dd82dd1e1f0a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1049,6 +1049,7 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
 		if (cmpxchg(&iter->position, pos, memcg) != pos) {
 			if (css && css != &root->css)
 				css_put(css);
+			memcg = NULL;
 			goto restart;
 		}

Kinsey Ho Aug. 30, 2024, 5:45 p.m. UTC | #3
On Fri, Aug 30, 2024 at 3:04 AM Hugh Dickins <hughd@google.com> wrote:
>
> mm-unstable commit 954dd0848c61 needs the fix below to be merged in;
> but the commit after it (the 5/5) then renames "memcg" to "next",
> so that one has to be adjusted too.
>
> [PATCH] mm: restart if multiple traversals raced: fix
>
> mem_cgroup_iter() reset memcg to NULL before the goto restart, so that
> goto out_unlock does not then return an ungotten memcg, causing oopses
> on stale memcg in many places (often in memcg_rstat_updated()).
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>  mm/memcontrol.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6f66ac0ad4f0..dd82dd1e1f0a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1049,6 +1049,7 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
>                 if (cmpxchg(&iter->position, pos, memcg) != pos) {
>                         if (css && css != &root->css)
>                                 css_put(css);
> +                       memcg = NULL;
>                         goto restart;
>                 }
>
> --
> 2.35.3

Hi Andrew,

Would you prefer that I resend the series with Hugh's fix inserted?

Acked-by: Kinsey Ho <kinseyho@google.com>

Yu Zhao Aug. 30, 2024, 7:04 p.m. UTC | #4
On Fri, Aug 30, 2024 at 11:45 AM Kinsey Ho <kinseyho@google.com> wrote:
>
> On Fri, Aug 30, 2024 at 3:04 AM Hugh Dickins <hughd@google.com> wrote:
> >
> > mm-unstable commit 954dd0848c61 needs the fix below to be merged in;
> > but the commit after it (the 5/5) then renames "memcg" to "next",
> > so that one has to be adjusted too.
> >
> > [PATCH] mm: restart if multiple traversals raced: fix
> >
> > mem_cgroup_iter() reset memcg to NULL before the goto restart, so that
> > goto out_unlock does not then return an ungotten memcg, causing oopses
> > on stale memcg in many places (often in memcg_rstat_updated()).
> >
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > ---
> >  mm/memcontrol.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 6f66ac0ad4f0..dd82dd1e1f0a 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1049,6 +1049,7 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
> >                 if (cmpxchg(&iter->position, pos, memcg) != pos) {
> >                         if (css && css != &root->css)
> >                                 css_put(css);
> > +                       memcg = NULL;
> >                         goto restart;
> >                 }
> >
> > --
> > 2.35.3
>
> Hi Andrew,
>
> Would you prefer that I resend the series with Hugh's fix inserted?

Please send a new version to get this properly fixed, preferably move
the initialization of `memcg` from the declaration to right below
`restart`, and also add the following footers:

Reported-by: syzbot+e099d407346c45275ce9@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/000000000000817cf10620e20d33@google.com/
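
The suggestion above, initializing the result right below the restart label rather than at the declaration, can be illustrated with a small userspace sketch. It is a model under stated assumptions, not the kernel function: `struct item`, `pick()` and its boolean parameters are hypothetical, and a plain counter stands in for the css reference count. The point it shows is that an early exit taken after a retry returns NULL instead of a pointer whose reference was already dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a memcg. */
struct item {
	int refs;
};

/*
 * Hypothetical walker step.
 *
 * @lose_race:           the first publish attempt fails, forcing a retry
 * @round_over_on_retry: on the retry, the early-exit check fires
 *                       (e.g. another walker completed the round)
 */
static struct item *pick(struct item *candidate, bool lose_race,
			 bool round_over_on_retry)
{
	struct item *result;
	bool retried = false;

restart:
	/* Initialized under the label, so every retry starts clean. */
	result = NULL;

	/* Early exit: must not return a candidate from a previous attempt. */
	if (retried && round_over_on_retry)
		goto out;

	result = candidate;
	result->refs++;		/* take a reference on the candidate */

	if (lose_race && !retried) {
		result->refs--;	/* drop the reference before retrying */
		retried = true;
		goto restart;
	}
out:
	return result;
}
```

With the assignment moved back to the declaration, the `goto out` path after a retry would return `candidate` with no reference held, which is the failure mode the fix in this thread describes.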

Patch

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index fe05fdb92779..2ef94c74847d 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -57,7 +57,7 @@  enum memcg_memory_event {
 
 struct mem_cgroup_reclaim_cookie {
 	pg_data_t *pgdat;
-	unsigned int generation;
+	int generation;
 };
 
 #ifdef CONFIG_MEMCG
@@ -78,7 +78,7 @@  struct lruvec_stats;
 struct mem_cgroup_reclaim_iter {
 	struct mem_cgroup *position;
 	/* scan generation, increased every round-trip */
-	unsigned int generation;
+	atomic_t generation;
 };
 
 /*
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 51b194a4c375..33bd379c738b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -986,7 +986,7 @@  struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
 				   struct mem_cgroup_reclaim_cookie *reclaim)
 {
 	struct mem_cgroup_reclaim_iter *iter;
-	struct cgroup_subsys_state *css = NULL;
+	struct cgroup_subsys_state *css;
 	struct mem_cgroup *memcg = NULL;
 	struct mem_cgroup *pos = NULL;
 
@@ -999,18 +999,20 @@  struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
 	rcu_read_lock();
 restart:
 	if (reclaim) {
+		int gen;
 		struct mem_cgroup_per_node *mz;
 
 		mz = root->nodeinfo[reclaim->pgdat->node_id];
 		iter = &mz->iter;
+		gen = atomic_read(&iter->generation);
 
 		/*
 		 * On start, join the current reclaim iteration cycle.
 		 * Exit when a concurrent walker completes it.
 		 */
 		if (!prev)
-			reclaim->generation = iter->generation;
-		else if (reclaim->generation != iter->generation)
+			reclaim->generation = gen;
+		else if (reclaim->generation != gen)
 			goto out_unlock;
 
 		pos = READ_ONCE(iter->position);
@@ -1018,8 +1020,7 @@  struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
 		pos = prev;
 	}
 
-	if (pos)
-		css = &pos->css;
+	css = pos ? &pos->css : NULL;
 
 	for (;;) {
 		css = css_next_descendant_pre(css, &root->css);
@@ -1033,21 +1034,26 @@  struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
 		 * and kicking, and don't take an extra reference.
 		 */
 		if (css == &root->css || css_tryget(css)) {
-			memcg = mem_cgroup_from_css(css);
 			break;
 		}
 	}
 
+	memcg = mem_cgroup_from_css(css);
+
 	if (reclaim) {
 		/*
 		 * The position could have already been updated by a competing
 		 * thread, so check that the value hasn't changed since we read
 		 * it to avoid reclaiming from the same cgroup twice.
 		 */
-		(void)cmpxchg(&iter->position, pos, memcg);
+		if (cmpxchg(&iter->position, pos, memcg) != pos) {
+			if (css && css != &root->css)
+				css_put(css);
+			goto restart;
+		}
 
 		if (!memcg) {
-			iter->generation++;
+			atomic_inc(&iter->generation);
 
 			/*
 			 * Reclaimers share the hierarchy walk, and a