mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code

Message ID 20210726150019.251820-1-hannes@cmpxchg.org (mailing list archive)

Commit Message

Johannes Weiner July 26, 2021, 3 p.m. UTC
Dan Carpenter reports:

    The patch 2d146aa3aa84: "mm: memcontrol: switch to rstat" from Apr
    29, 2021, leads to the following static checker warning:

	    kernel/cgroup/rstat.c:200 cgroup_rstat_flush()
	    warn: sleeping in atomic context

    mm/memcontrol.c
      3572  static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
      3573  {
      3574          unsigned long val;
      3575
      3576          if (mem_cgroup_is_root(memcg)) {
      3577                  cgroup_rstat_flush(memcg->css.cgroup);
			    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    This is from static analysis and potentially a false positive.  The
    problem is that mem_cgroup_usage() is called from __mem_cgroup_threshold()
    which holds an rcu_read_lock().  And the cgroup_rstat_flush() function
    can sleep.

      3578                  val = memcg_page_state(memcg, NR_FILE_PAGES) +
      3579                          memcg_page_state(memcg, NR_ANON_MAPPED);
      3580                  if (swap)
      3581                          val += memcg_page_state(memcg, MEMCG_SWAP);
      3582          } else {
      3583                  if (!swap)
      3584                          val = page_counter_read(&memcg->memory);
      3585                  else
      3586                          val = page_counter_read(&memcg->memsw);
      3587          }
      3588          return val;
      3589  }

__mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
thresholding code is invoked during stat changes, and those contexts
have irqs disabled as well. If the lock breaking occurs inside the
flush function, it will result in a sleep from an atomic context.

Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.

Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat")
Cc: <stable@vger.kernel.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/memcontrol.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
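
The failure mode described above can be sketched with a small user-space
toy model. This is only an illustration under stated assumptions: every
toy_* name below is hypothetical and none of them are kernel APIs. The
model flags each point where a blocking flush could yield, whereas the
real flush only breaks its lock under contention or when a reschedule
is pending.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy user-space model; all toy_* names are hypothetical, not kernel APIs. */
static int atomic_depth;     /* > 0 while "rcu read lock held" or "irqs off" */
static int sleep_violations; /* possible sleeps seen in atomic context */
static int flushed_cpus;     /* per-CPU flush steps performed so far */

static void toy_enter_atomic(void) { atomic_depth++; }
static void toy_exit_atomic(void)  { atomic_depth--; }

/* Stands in for a point where the flush may drop its lock and reschedule. */
static void toy_might_sleep(const char *who)
{
	if (atomic_depth > 0) {
		sleep_violations++;
		printf("BUG: %s may sleep in atomic context\n", who);
	}
}

/* Models the blocking flush: it may yield between CPUs. */
static void toy_flush(int nr_cpus)
{
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		flushed_cpus++;
		toy_might_sleep("toy_flush");
	}
}

/* Models the irqsafe flush: same per-CPU work, but it never yields. */
static void toy_flush_irqsafe(int nr_cpus)
{
	for (int cpu = 0; cpu < nr_cpus; cpu++)
		flushed_cpus++;
}

/*
 * Models the threshold path: the flush is reached while atomic.
 * Returns the number of possible sleeps that were flagged.
 */
static int toy_threshold(bool use_irqsafe)
{
	sleep_violations = 0;
	toy_enter_atomic();
	if (use_irqsafe)
		toy_flush_irqsafe(4);
	else
		toy_flush(4);
	toy_exit_atomic();
	return sleep_violations;
}
```

Called from a main(), toy_threshold(false) returns 4 (one flagged yield
point per modelled CPU) and toy_threshold(true) returns 0, which mirrors
why the patch switches the caller to the irqsafe variant.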

Comments

Chris Down July 26, 2021, 3:08 p.m. UTC | #1
Johannes Weiner writes:
>Dan Carpenter reports:
>
>    The patch 2d146aa3aa84: "mm: memcontrol: switch to rstat" from Apr
>    29, 2021, leads to the following static checker warning:
>
>	    kernel/cgroup/rstat.c:200 cgroup_rstat_flush()
>	    warn: sleeping in atomic context
>
>    mm/memcontrol.c
>      3572  static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
>      3573  {
>      3574          unsigned long val;
>      3575
>      3576          if (mem_cgroup_is_root(memcg)) {
>      3577                  cgroup_rstat_flush(memcg->css.cgroup);
>			    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
>    This is from static analysis and potentially a false positive.  The
>    problem is that mem_cgroup_usage() is called from __mem_cgroup_threshold()
>    which holds an rcu_read_lock().  And the cgroup_rstat_flush() function
>    can sleep.
>
>      3578                  val = memcg_page_state(memcg, NR_FILE_PAGES) +
>      3579                          memcg_page_state(memcg, NR_ANON_MAPPED);
>      3580                  if (swap)
>      3581                          val += memcg_page_state(memcg, MEMCG_SWAP);
>      3582          } else {
>      3583                  if (!swap)
>      3584                          val = page_counter_read(&memcg->memory);
>      3585                  else
>      3586                          val = page_counter_read(&memcg->memsw);
>      3587          }
>      3588          return val;
>      3589  }
>
>__mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
>thresholding code is invoked during stat changes, and those contexts
>have irqs disabled as well. If the lock breaking occurs inside the
>flush function, it will result in a sleep from an atomic context.
>
>Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.
>
>Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat")
>Cc: <stable@vger.kernel.org>
>Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
>Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Thanks, looks good.

Acked-by: Chris Down <chris@chrisdown.name>

>---
> mm/memcontrol.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>index ae1f5d0cb581..eb8e87c4833f 100644
>--- a/mm/memcontrol.c
>+++ b/mm/memcontrol.c
>@@ -3574,7 +3574,8 @@ static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
> 	unsigned long val;
>
> 	if (mem_cgroup_is_root(memcg)) {
>-		cgroup_rstat_flush(memcg->css.cgroup);
>+		/* mem_cgroup_threshold() calls here from irqsafe context */
>+		cgroup_rstat_flush_irqsafe(memcg->css.cgroup);
> 		val = memcg_page_state(memcg, NR_FILE_PAGES) +
> 			memcg_page_state(memcg, NR_ANON_MAPPED);
> 		if (swap)
>-- 
>2.32.0
>
>

Rik van Riel July 26, 2021, 3:16 p.m. UTC | #2
On Mon, 2021-07-26 at 11:00 -0400, Johannes Weiner wrote:
> 
> __mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
> thresholding code is invoked during stat changes, and those contexts
> have irqs disabled as well. If the lock breaking occurs inside the
> flush function, it will result in a sleep from an atomic context.
> 
> Use the irqsafe flushing variant in mem_cgroup_usage() to fix this

While this fix is necessary, in the long term I think we may
want some sort of redesign here, to make sure the irq safe
version does not spin long times trying to get the statistics
off some other CPU.

I have seen a number of soft (IIRC) lockups deep inside the
bowels of cgroup_rstat_flush_irqsafe, with the function taking
multiple seconds to complete.

Reviewed-by: Rik van Riel <riel@surriel.com>

Michal Hocko July 26, 2021, 8:32 p.m. UTC | #3
On Mon 26-07-21 11:00:19, Johannes Weiner wrote:
> Dan Carpenter reports:
> 
>     The patch 2d146aa3aa84: "mm: memcontrol: switch to rstat" from Apr
>     29, 2021, leads to the following static checker warning:
> 
> 	    kernel/cgroup/rstat.c:200 cgroup_rstat_flush()
> 	    warn: sleeping in atomic context
> 
>     mm/memcontrol.c
>       3572  static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
>       3573  {
>       3574          unsigned long val;
>       3575
>       3576          if (mem_cgroup_is_root(memcg)) {
>       3577                  cgroup_rstat_flush(memcg->css.cgroup);
> 			    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
>     This is from static analysis and potentially a false positive.  The
>     problem is that mem_cgroup_usage() is called from __mem_cgroup_threshold()
>     which holds an rcu_read_lock().  And the cgroup_rstat_flush() function
>     can sleep.
> 
>       3578                  val = memcg_page_state(memcg, NR_FILE_PAGES) +
>       3579                          memcg_page_state(memcg, NR_ANON_MAPPED);
>       3580                  if (swap)
>       3581                          val += memcg_page_state(memcg, MEMCG_SWAP);
>       3582          } else {
>       3583                  if (!swap)
>       3584                          val = page_counter_read(&memcg->memory);
>       3585                  else
>       3586                          val = page_counter_read(&memcg->memsw);
>       3587          }
>       3588          return val;
>       3589  }
> 
> __mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
> thresholding code is invoked during stat changes, and those contexts
> have irqs disabled as well. If the lock breaking occurs inside the
> flush function, it will result in a sleep from an atomic context.
> 
> Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.
> 
> Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat")
> Cc: <stable@vger.kernel.org>
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  mm/memcontrol.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index ae1f5d0cb581..eb8e87c4833f 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3574,7 +3574,8 @@ static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
>  	unsigned long val;
>  
>  	if (mem_cgroup_is_root(memcg)) {
> -		cgroup_rstat_flush(memcg->css.cgroup);
> +		/* mem_cgroup_threshold() calls here from irqsafe context */
> +		cgroup_rstat_flush_irqsafe(memcg->css.cgroup);
>  		val = memcg_page_state(memcg, NR_FILE_PAGES) +
>  			memcg_page_state(memcg, NR_ANON_MAPPED);
>  		if (swap)
> -- 
> 2.32.0

Shakeel Butt July 27, 2021, 4:51 p.m. UTC | #4
On Mon, Jul 26, 2021 at 8:19 AM Rik van Riel <riel@fb.com> wrote:
>
> On Mon, 2021-07-26 at 11:00 -0400, Johannes Weiner wrote:
> >
> > __mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
> > thresholding code is invoked during stat changes, and those contexts
> > have irqs disabled as well. If the lock breaking occurs inside the
> > flush function, it will result in a sleep from an atomic context.
> >
> > Use the irqsafe flushing variant in mem_cgroup_usage() to fix this
>
> While this fix is necessary, in the long term I think we may
> want some sort of redesign here, to make sure the irq safe
> version does not spin long times trying to get the statistics
> off some other CPU.
>
> I have seen a number of soft (IIRC) lockups deep inside the
> bowels of cgroup_rstat_flush_irqsafe, with the function taking
> multiple seconds to complete.

Can you please share a bit more detail on this lockup? I am wondering
if this was due to the flush not happening more often and thus the
update tree is large or if there are too many concurrent flushes
happening.

>
> Reviewed-by: Rik van Riel <riel@surriel.com>

Shakeel Butt July 27, 2021, 4:59 p.m. UTC | #5
On Mon, Jul 26, 2021 at 8:01 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> Dan Carpenter reports:
>
>     The patch 2d146aa3aa84: "mm: memcontrol: switch to rstat" from Apr
>     29, 2021, leads to the following static checker warning:
>
>             kernel/cgroup/rstat.c:200 cgroup_rstat_flush()
>             warn: sleeping in atomic context
>
>     mm/memcontrol.c
>       3572  static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
>       3573  {
>       3574          unsigned long val;
>       3575
>       3576          if (mem_cgroup_is_root(memcg)) {
>       3577                  cgroup_rstat_flush(memcg->css.cgroup);
>                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
>     This is from static analysis and potentially a false positive.  The
>     problem is that mem_cgroup_usage() is called from __mem_cgroup_threshold()
>     which holds an rcu_read_lock().  And the cgroup_rstat_flush() function
>     can sleep.
>
>       3578                  val = memcg_page_state(memcg, NR_FILE_PAGES) +
>       3579                          memcg_page_state(memcg, NR_ANON_MAPPED);
>       3580                  if (swap)
>       3581                          val += memcg_page_state(memcg, MEMCG_SWAP);
>       3582          } else {
>       3583                  if (!swap)
>       3584                          val = page_counter_read(&memcg->memory);
>       3585                  else
>       3586                          val = page_counter_read(&memcg->memsw);
>       3587          }
>       3588          return val;
>       3589  }
>
> __mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
> thresholding code is invoked during stat changes, and those contexts
> have irqs disabled as well. If the lock breaking occurs inside the
> flush function, it will result in a sleep from an atomic context.
>
> Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.
>
> Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat")
> Cc: <stable@vger.kernel.org>
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Reviewed-by: Shakeel Butt <shakeelb@google.com>

BTW what do you think of removing stat flushes from the read side
(kernel and userspace) completely, once periodic flushing and async
flushing from the update side are in place? Basically with "memcg:
infrastructure to flush memcg stats".
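
The read-side idea above can be illustrated with a minimal user-space
sketch. It is not the implementation of the series named above; the
toy_* names and the fixed CPU count are assumptions made only for
illustration. Updaters touch a per-CPU delta, a flusher (imagined here
as a periodic worker running outside atomic context) folds the deltas
into a total, and readers return the last flushed total without
flushing.

```c
#define TOY_NR_CPUS 4

/* Toy user-space model of deferred stat flushing; not kernel code. */
static long toy_percpu_delta[TOY_NR_CPUS]; /* unflushed per-CPU updates */
static long toy_total;                     /* last flushed aggregate */

/* Update side: cheap, touches only the given CPU's counter. */
static void toy_stat_add(int cpu, long val)
{
	toy_percpu_delta[cpu] += val;
}

/* Flusher: folds every CPU's pending delta into the aggregate. */
static void toy_periodic_flush(void)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		toy_total += toy_percpu_delta[cpu];
		toy_percpu_delta[cpu] = 0;
	}
}

/* Read side: never flushes, so it cannot sleep or spin; may be stale. */
static long toy_stat_read(void)
{
	return toy_total;
}
```

The trade-off in this sketch is staleness: a reader can miss updates
made since the last flush, in exchange for never doing flush work from
a context where sleeping or long spinning is a problem.
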

Rik van Riel Aug. 3, 2021, 2:34 p.m. UTC | #6
On Tue, 2021-07-27 at 09:51 -0700, Shakeel Butt wrote:
> On Mon, Jul 26, 2021 at 8:19 AM Rik van Riel <riel@fb.com> wrote:
> > 
> > On Mon, 2021-07-26 at 11:00 -0400, Johannes Weiner wrote:
> > > 
> > > __mem_cgroup_threshold() indeed holds the rcu lock. In addition,
> > > the
> > > thresholding code is invoked during stat changes, and those
> > > contexts
> > > have irqs disabled as well. If the lock breaking occurs inside
> > > the
> > > flush function, it will result in a sleep from an atomic context.
> > > 
> > > Use the irqsafe flushing variant in mem_cgroup_usage() to fix this
> > 
> > While this fix is necessary, in the long term I think we may
> > want some sort of redesign here, to make sure the irq safe
> > version does not spin long times trying to get the statistics
> > off some other CPU.
> > 
> > I have seen a number of soft (IIRC) lockups deep inside the
> > bowels of cgroup_rstat_flush_irqsafe, with the function taking
> > multiple seconds to complete.
> 
> Can you please share a bit more detail on this lockup? I am wondering
> if this was due to the flush not happening more often and thus the
> update tree is large or if there are too many concurrent flushes
> happening.

I was not logged into any system while it happened, but
only found it later in the logs.

I suspect your explanation is the reason why it happened,
though.

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ae1f5d0cb581..eb8e87c4833f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3574,7 +3574,8 @@  static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
 	unsigned long val;
 
 	if (mem_cgroup_is_root(memcg)) {
-		cgroup_rstat_flush(memcg->css.cgroup);
+		/* mem_cgroup_threshold() calls here from irqsafe context */
+		cgroup_rstat_flush_irqsafe(memcg->css.cgroup);
 		val = memcg_page_state(memcg, NR_FILE_PAGES) +
 			memcg_page_state(memcg, NR_ANON_MAPPED);
 		if (swap)