| Message ID | 20210726150019.251820-1-hannes@cmpxchg.org (mailing list archive) |
|---|---|
| State | New |
| Series | mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code |
Johannes Weiner writes:
>Dan Carpenter reports:
>
>    The patch 2d146aa3aa84: "mm: memcontrol: switch to rstat" from Apr
>    29, 2021, leads to the following static checker warning:
>
>        kernel/cgroup/rstat.c:200 cgroup_rstat_flush()
>        warn: sleeping in atomic context
>
>    mm/memcontrol.c
>      3572  static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
>      3573  {
>      3574          unsigned long val;
>      3575
>      3576          if (mem_cgroup_is_root(memcg)) {
>      3577                  cgroup_rstat_flush(memcg->css.cgroup);
>                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
>    This is from static analysis and potentially a false positive. The
>    problem is that mem_cgroup_usage() is called from __mem_cgroup_threshold()
>    which holds an rcu_read_lock(). And the cgroup_rstat_flush() function
>    can sleep.
>
>      3578                  val = memcg_page_state(memcg, NR_FILE_PAGES) +
>      3579                          memcg_page_state(memcg, NR_ANON_MAPPED);
>      3580                  if (swap)
>      3581                          val += memcg_page_state(memcg, MEMCG_SWAP);
>      3582          } else {
>      3583                  if (!swap)
>      3584                          val = page_counter_read(&memcg->memory);
>      3585                  else
>      3586                          val = page_counter_read(&memcg->memsw);
>      3587          }
>      3588          return val;
>      3589  }
>
>__mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
>thresholding code is invoked during stat changes, and those contexts
>have irqs disabled as well. If the lock breaking occurs inside the
>flush function, it will result in a sleep from an atomic context.
>
>Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.
>
>Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat")
>Cc: <stable@vger.kernel.org>
>Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
>Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Thanks, looks good.

Acked-by: Chris Down <chris@chrisdown.name>

>---
> mm/memcontrol.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>index ae1f5d0cb581..eb8e87c4833f 100644
>--- a/mm/memcontrol.c
>+++ b/mm/memcontrol.c
>@@ -3574,7 +3574,8 @@ static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
> 	unsigned long val;
> 
> 	if (mem_cgroup_is_root(memcg)) {
>-		cgroup_rstat_flush(memcg->css.cgroup);
>+		/* mem_cgroup_threshold() calls here from irqsafe context */
>+		cgroup_rstat_flush_irqsafe(memcg->css.cgroup);
> 		val = memcg_page_state(memcg, NR_FILE_PAGES) +
> 			memcg_page_state(memcg, NR_ANON_MAPPED);
> 		if (swap)
>-- 
>2.32.0
On Mon, 2021-07-26 at 11:00 -0400, Johannes Weiner wrote:
>
> __mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
> thresholding code is invoked during stat changes, and those contexts
> have irqs disabled as well. If the lock breaking occurs inside the
> flush function, it will result in a sleep from an atomic context.
>
> Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.

While this fix is necessary, in the long term I think we may
want some sort of redesign here, to make sure the irq-safe
version does not spin for long periods trying to get the
statistics off some other CPU.

I have seen a number of soft (IIRC) lockups deep inside the
bowels of cgroup_rstat_flush_irqsafe, with the function taking
multiple seconds to complete.

Reviewed-by: Rik van Riel <riel@surriel.com>
On Mon 26-07-21 11:00:19, Johannes Weiner wrote:
> Dan Carpenter reports:
>
>     The patch 2d146aa3aa84: "mm: memcontrol: switch to rstat" from Apr
>     29, 2021, leads to the following static checker warning:
>
>         kernel/cgroup/rstat.c:200 cgroup_rstat_flush()
>         warn: sleeping in atomic context
>
[...]
>
> __mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
> thresholding code is invoked during stat changes, and those contexts
> have irqs disabled as well. If the lock breaking occurs inside the
> flush function, it will result in a sleep from an atomic context.
>
> Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.
>
> Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat")
> Cc: <stable@vger.kernel.org>
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  mm/memcontrol.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
[...]
On Mon, Jul 26, 2021 at 8:19 AM Rik van Riel <riel@fb.com> wrote:
>
> On Mon, 2021-07-26 at 11:00 -0400, Johannes Weiner wrote:
> >
> > __mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
> > thresholding code is invoked during stat changes, and those contexts
> > have irqs disabled as well. If the lock breaking occurs inside the
> > flush function, it will result in a sleep from an atomic context.
> >
> > Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.
>
> While this fix is necessary, in the long term I think we may
> want some sort of redesign here, to make sure the irq-safe
> version does not spin for long periods trying to get the
> statistics off some other CPU.
>
> I have seen a number of soft (IIRC) lockups deep inside the
> bowels of cgroup_rstat_flush_irqsafe, with the function taking
> multiple seconds to complete.

Can you please share a bit more detail on this lockup? I am wondering
if this was due to the flush not happening more often and thus the
update tree being large, or if there are too many concurrent flushes
happening.

> Reviewed-by: Rik van Riel <riel@surriel.com>
On Mon, Jul 26, 2021 at 8:01 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> Dan Carpenter reports:
>
>     The patch 2d146aa3aa84: "mm: memcontrol: switch to rstat" from Apr
>     29, 2021, leads to the following static checker warning:
>
>         kernel/cgroup/rstat.c:200 cgroup_rstat_flush()
>         warn: sleeping in atomic context
>
[...]
>
> __mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
> thresholding code is invoked during stat changes, and those contexts
> have irqs disabled as well. If the lock breaking occurs inside the
> flush function, it will result in a sleep from an atomic context.
>
> Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.
>
> Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat")
> Cc: <stable@vger.kernel.org>
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Reviewed-by: Shakeel Butt <shakeelb@google.com>

BTW what do you think of removing stat flushes from the read side
(kernel and userspace) completely, and relying on periodic flushing
and async flushing from the update side instead? Basically as in
"memcg: infrastructure to flush memcg stats".
On Tue, 2021-07-27 at 09:51 -0700, Shakeel Butt wrote:
> On Mon, Jul 26, 2021 at 8:19 AM Rik van Riel <riel@fb.com> wrote:
> >
> > While this fix is necessary, in the long term I think we may
> > want some sort of redesign here, to make sure the irq-safe
> > version does not spin for long periods trying to get the
> > statistics off some other CPU.
> >
> > I have seen a number of soft (IIRC) lockups deep inside the
> > bowels of cgroup_rstat_flush_irqsafe, with the function taking
> > multiple seconds to complete.
>
> Can you please share a bit more detail on this lockup? I am wondering
> if this was due to the flush not happening more often and thus the
> update tree being large, or if there are too many concurrent flushes
> happening.

I was not logged into any system while it happened, but only found
it later in the logs. I suspect your explanation is the reason why
it happened, though.
Dan Carpenter reports:

    The patch 2d146aa3aa84: "mm: memcontrol: switch to rstat" from Apr
    29, 2021, leads to the following static checker warning:

        kernel/cgroup/rstat.c:200 cgroup_rstat_flush()
        warn: sleeping in atomic context

    mm/memcontrol.c
      3572  static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
      3573  {
      3574          unsigned long val;
      3575
      3576          if (mem_cgroup_is_root(memcg)) {
      3577                  cgroup_rstat_flush(memcg->css.cgroup);
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    This is from static analysis and potentially a false positive. The
    problem is that mem_cgroup_usage() is called from __mem_cgroup_threshold()
    which holds an rcu_read_lock(). And the cgroup_rstat_flush() function
    can sleep.

      3578                  val = memcg_page_state(memcg, NR_FILE_PAGES) +
      3579                          memcg_page_state(memcg, NR_ANON_MAPPED);
      3580                  if (swap)
      3581                          val += memcg_page_state(memcg, MEMCG_SWAP);
      3582          } else {
      3583                  if (!swap)
      3584                          val = page_counter_read(&memcg->memory);
      3585                  else
      3586                          val = page_counter_read(&memcg->memsw);
      3587          }
      3588          return val;
      3589  }

__mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
thresholding code is invoked during stat changes, and those contexts
have irqs disabled as well. If the lock breaking occurs inside the
flush function, it will result in a sleep from an atomic context.

Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.

Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat")
Cc: <stable@vger.kernel.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/memcontrol.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ae1f5d0cb581..eb8e87c4833f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3574,7 +3574,8 @@ static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
 	unsigned long val;
 
 	if (mem_cgroup_is_root(memcg)) {
-		cgroup_rstat_flush(memcg->css.cgroup);
+		/* mem_cgroup_threshold() calls here from irqsafe context */
+		cgroup_rstat_flush_irqsafe(memcg->css.cgroup);
 		val = memcg_page_state(memcg, NR_FILE_PAGES) +
 			memcg_page_state(memcg, NR_ANON_MAPPED);
 		if (swap)