[v2] mm: memcg: Fix memcg reclaim soft lockup

Message ID	1598449622-108748-1-git-send-email-xlpang@linux.alibaba.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=dJX+=CE=kvack.org=owner-linux-mm@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2CE2622B49 From: Xunlei Pang <xlpang@linux.alibaba.com> To: Johannes Weiner <hannes@cmpxchg.org>, Michal Hocko <mhocko@kernel.org>, Andrew Morton <akpm@linux-foundation.org>, Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Xunlei Pang <xlpang@linux.alibaba.com>, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2] mm: memcg: Fix memcg reclaim soft lockup Date: Wed, 26 Aug 2020 21:47:02 +0800 Message-Id: <1598449622-108748-1-git-send-email-xlpang@linux.alibaba.com> Sender: owner-linux-mm@kvack.org Precedence: bulk
Series	[v2] mm: memcg: Fix memcg reclaim soft lockup \| expand [v2] mm: memcg: Fix memcg reclaim soft lockup

Xunlei Pang Aug. 26, 2020, 1:47 p.m. UTC

We've met softlockup with "CONFIG_PREEMPT_NONE=y", when
the target memcg doesn't have any reclaimable memory.

It can be easily reproduced as below:
 watchdog: BUG: soft lockup - CPU#0 stuck for 111s![memcg_test:2204]
 CPU: 0 PID: 2204 Comm: memcg_test Not tainted 5.9.0-rc2+ #12
 Call Trace:
  shrink_lruvec+0x49f/0x640
  shrink_node+0x2a6/0x6f0
  do_try_to_free_pages+0xe9/0x3e0
  try_to_free_mem_cgroup_pages+0xef/0x1f0
  try_charge+0x2c1/0x750
  mem_cgroup_charge+0xd7/0x240
  __add_to_page_cache_locked+0x2fd/0x370
  add_to_page_cache_lru+0x4a/0xc0
  pagecache_get_page+0x10b/0x2f0
  filemap_fault+0x661/0xad0
  ext4_filemap_fault+0x2c/0x40
  __do_fault+0x4d/0xf9
  handle_mm_fault+0x1080/0x1790

It only happens on our 1-vcpu instances, because there's no chance
for oom reaper to run to reclaim the to-be-killed process.

Add cond_resched() at the upper shrink_node_memcgs() to solve this
issue, and any other possible issue like meomry.min protection.

Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
---
 mm/vmscan.c | 2 ++
 1 file changed, 2 insertions(+)

Chris Down Aug. 26, 2020, 2:10 p.m. UTC | #1

Hi Xunlei,

Xunlei Pang writes:
>Add cond_resched() at the upper shrink_node_memcgs() to solve this
>issue, and any other possible issue like meomry.min protection.

I understand the general intent of the patch, but could you clarify your 
concern around memory protection?

Michal Hocko Aug. 26, 2020, 3:05 p.m. UTC | #2

On Wed 26-08-20 21:47:02, Xunlei Pang wrote:
> We've met softlockup with "CONFIG_PREEMPT_NONE=y", when
> the target memcg doesn't have any reclaimable memory.
> 
> It can be easily reproduced as below:
>  watchdog: BUG: soft lockup - CPU#0 stuck for 111s![memcg_test:2204]
>  CPU: 0 PID: 2204 Comm: memcg_test Not tainted 5.9.0-rc2+ #12
>  Call Trace:
>   shrink_lruvec+0x49f/0x640
>   shrink_node+0x2a6/0x6f0
>   do_try_to_free_pages+0xe9/0x3e0
>   try_to_free_mem_cgroup_pages+0xef/0x1f0
>   try_charge+0x2c1/0x750
>   mem_cgroup_charge+0xd7/0x240
>   __add_to_page_cache_locked+0x2fd/0x370
>   add_to_page_cache_lru+0x4a/0xc0
>   pagecache_get_page+0x10b/0x2f0
>   filemap_fault+0x661/0xad0
>   ext4_filemap_fault+0x2c/0x40
>   __do_fault+0x4d/0xf9
>   handle_mm_fault+0x1080/0x1790
> 
> It only happens on our 1-vcpu instances, because there's no chance
> for oom reaper to run to reclaim the to-be-killed process.
> 
> Add cond_resched() at the upper shrink_node_memcgs() to solve this
> issue, and any other possible issue like meomry.min protection.

I would just add
"
This will mean that we will get a scheduling point for each memcg in the
reclaimed hierarchy without any dependency on the reclaimable memory in
that memcg thus making it more predictable.
" 
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  mm/vmscan.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 99e1796..bbdc38b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2617,6 +2617,8 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
>  
>  		mem_cgroup_calculate_protection(target_memcg, memcg);
>  
> +		cond_resched();
> +
>  		if (mem_cgroup_below_min(memcg)) {
>  			/*
>  			 * Hard protection.
> -- 
> 1.8.3.1

Johannes Weiner Aug. 26, 2020, 4:43 p.m. UTC | #3

On Wed, Aug 26, 2020 at 09:47:02PM +0800, Xunlei Pang wrote:
> We've met softlockup with "CONFIG_PREEMPT_NONE=y", when
> the target memcg doesn't have any reclaimable memory.
> 
> It can be easily reproduced as below:
>  watchdog: BUG: soft lockup - CPU#0 stuck for 111s![memcg_test:2204]
>  CPU: 0 PID: 2204 Comm: memcg_test Not tainted 5.9.0-rc2+ #12
>  Call Trace:
>   shrink_lruvec+0x49f/0x640
>   shrink_node+0x2a6/0x6f0
>   do_try_to_free_pages+0xe9/0x3e0
>   try_to_free_mem_cgroup_pages+0xef/0x1f0
>   try_charge+0x2c1/0x750
>   mem_cgroup_charge+0xd7/0x240
>   __add_to_page_cache_locked+0x2fd/0x370
>   add_to_page_cache_lru+0x4a/0xc0
>   pagecache_get_page+0x10b/0x2f0
>   filemap_fault+0x661/0xad0
>   ext4_filemap_fault+0x2c/0x40
>   __do_fault+0x4d/0xf9
>   handle_mm_fault+0x1080/0x1790
> 
> It only happens on our 1-vcpu instances, because there's no chance
> for oom reaper to run to reclaim the to-be-killed process.
> 
> Add cond_resched() at the upper shrink_node_memcgs() to solve this
> issue, and any other possible issue like meomry.min protection.
> 
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>

This generally makes sense to me but really should have a comment:

	/*
	 * This loop can become CPU-bound when there are thousands
	 * of cgroups that aren't eligible for reclaim - either
	 * because they don't have any pages, or because their
	 * memory is explicitly protected. Avoid soft lockups.
	 */
	 cond_resched();

The placement in the middle of the multi-part protection checks is a
bit odd too. It would be better to have it either at the top of the
loop, or at the end, by replacing the continues with goto next.

Michal Hocko Aug. 26, 2020, 5:29 p.m. UTC | #4

On Wed 26-08-20 12:43:32, Johannes Weiner wrote:
> On Wed, Aug 26, 2020 at 09:47:02PM +0800, Xunlei Pang wrote:
> > We've met softlockup with "CONFIG_PREEMPT_NONE=y", when
> > the target memcg doesn't have any reclaimable memory.
> > 
> > It can be easily reproduced as below:
> >  watchdog: BUG: soft lockup - CPU#0 stuck for 111s![memcg_test:2204]
> >  CPU: 0 PID: 2204 Comm: memcg_test Not tainted 5.9.0-rc2+ #12
> >  Call Trace:
> >   shrink_lruvec+0x49f/0x640
> >   shrink_node+0x2a6/0x6f0
> >   do_try_to_free_pages+0xe9/0x3e0
> >   try_to_free_mem_cgroup_pages+0xef/0x1f0
> >   try_charge+0x2c1/0x750
> >   mem_cgroup_charge+0xd7/0x240
> >   __add_to_page_cache_locked+0x2fd/0x370
> >   add_to_page_cache_lru+0x4a/0xc0
> >   pagecache_get_page+0x10b/0x2f0
> >   filemap_fault+0x661/0xad0
> >   ext4_filemap_fault+0x2c/0x40
> >   __do_fault+0x4d/0xf9
> >   handle_mm_fault+0x1080/0x1790
> > 
> > It only happens on our 1-vcpu instances, because there's no chance
> > for oom reaper to run to reclaim the to-be-killed process.
> > 
> > Add cond_resched() at the upper shrink_node_memcgs() to solve this
> > issue, and any other possible issue like meomry.min protection.
> > 
> > Suggested-by: Michal Hocko <mhocko@suse.com>
> > Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
> 
> This generally makes sense to me but really should have a comment:
> 
> 	/*
> 	 * This loop can become CPU-bound when there are thousands
> 	 * of cgroups that aren't eligible for reclaim - either
> 	 * because they don't have any pages, or because their
> 	 * memory is explicitly protected. Avoid soft lockups.
> 	 */
> 	 cond_resched();
> 
> The placement in the middle of the multi-part protection checks is a
> bit odd too. It would be better to have it either at the top of the
> loop, or at the end, by replacing the continues with goto next.

Yes makes sense. I would stick it to the begining of the loop to make it
stand out and make it obvious wrt code flow.

Chris Down Aug. 26, 2020, 6:17 p.m. UTC | #5

Chris Down writes:
>Xunlei Pang writes:
>>Add cond_resched() at the upper shrink_node_memcgs() to solve this
>>issue, and any other possible issue like meomry.min protection.
>
>I understand the general intent of the patch, but could you clarify 
>your concern around memory protection?

Oh, I see, your concern is just preemption in general rather than a fixing 
anything for the memory protection side. In which case, go for it, but I agree 
with Michael that it would be nice to send v3 with a clarifying comment.

Acked-by: Chris Down <chris@chrisdown.name>

[v2] mm: memcg: Fix memcg reclaim soft lockup

Commit Message

Comments

Patch