
mm/list_lru.c: use cond_resched_lock() for nlru->lock

Message ID 1497228440-10349-1-git-send-email-stummala@codeaurora.org (mailing list archive)
State New, archived

Commit Message

Sahitya Tummala June 12, 2017, 12:47 a.m. UTC
__list_lru_walk_one() can hold the spin lock for a long duration
if there are a large number of entries to be isolated.

This results in "BUG: spinlock lockup suspected" in the below path -

[<ffffff8eca0fb0bc>] spin_bug+0x90
[<ffffff8eca0fb220>] do_raw_spin_lock+0xfc
[<ffffff8ecafb7798>] _raw_spin_lock+0x28
[<ffffff8eca1ae884>] list_lru_add+0x28
[<ffffff8eca1f5dac>] dput+0x1c8
[<ffffff8eca1eb46c>] path_put+0x20
[<ffffff8eca1eb73c>] terminate_walk+0x3c
[<ffffff8eca1eee58>] path_lookupat+0x100
[<ffffff8eca1f00fc>] filename_lookup+0x6c
[<ffffff8eca1f0264>] user_path_at_empty+0x54
[<ffffff8eca1e066c>] SyS_faccessat+0xd0
[<ffffff8eca084e30>] el0_svc_naked+0x24

This nlru->lock has been acquired by another CPU in this path -

[<ffffff8eca1f5fd0>] d_lru_shrink_move+0x34
[<ffffff8eca1f6180>] dentry_lru_isolate_shrink+0x48
[<ffffff8eca1aeafc>] __list_lru_walk_one.isra.10+0x94
[<ffffff8eca1aec34>] list_lru_walk_node+0x40
[<ffffff8eca1f6620>] shrink_dcache_sb+0x60
[<ffffff8eca1e56a8>] do_remount_sb+0xbc
[<ffffff8eca1e583c>] do_emergency_remount+0xb0
[<ffffff8eca0ba510>] process_one_work+0x228
[<ffffff8eca0bb158>] worker_thread+0x2e0
[<ffffff8eca0c040c>] kthread+0xf4
[<ffffff8eca084dd0>] ret_from_fork+0x10

Link: http://marc.info/?t=149511514800002&r=1&w=2
Fix-suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
---
 mm/list_lru.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Jan Kara June 12, 2017, 1:11 p.m. UTC | #1
On Mon 12-06-17 06:17:20, Sahitya Tummala wrote:
> __list_lru_walk_one() can hold the spin lock for a long duration
> if there are a large number of entries to be isolated.
> 
> This results in "BUG: spinlock lockup suspected" in the below path -
> 
> [<ffffff8eca0fb0bc>] spin_bug+0x90
> [<ffffff8eca0fb220>] do_raw_spin_lock+0xfc
> [<ffffff8ecafb7798>] _raw_spin_lock+0x28
> [<ffffff8eca1ae884>] list_lru_add+0x28
> [<ffffff8eca1f5dac>] dput+0x1c8
> [<ffffff8eca1eb46c>] path_put+0x20
> [<ffffff8eca1eb73c>] terminate_walk+0x3c
> [<ffffff8eca1eee58>] path_lookupat+0x100
> [<ffffff8eca1f00fc>] filename_lookup+0x6c
> [<ffffff8eca1f0264>] user_path_at_empty+0x54
> [<ffffff8eca1e066c>] SyS_faccessat+0xd0
> [<ffffff8eca084e30>] el0_svc_naked+0x24
> 
> This nlru->lock has been acquired by another CPU in this path -
> 
> [<ffffff8eca1f5fd0>] d_lru_shrink_move+0x34
> [<ffffff8eca1f6180>] dentry_lru_isolate_shrink+0x48
> [<ffffff8eca1aeafc>] __list_lru_walk_one.isra.10+0x94
> [<ffffff8eca1aec34>] list_lru_walk_node+0x40
> [<ffffff8eca1f6620>] shrink_dcache_sb+0x60
> [<ffffff8eca1e56a8>] do_remount_sb+0xbc
> [<ffffff8eca1e583c>] do_emergency_remount+0xb0
> [<ffffff8eca0ba510>] process_one_work+0x228
> [<ffffff8eca0bb158>] worker_thread+0x2e0
> [<ffffff8eca0c040c>] kthread+0xf4
> [<ffffff8eca084dd0>] ret_from_fork+0x10
> 
> Link: http://marc.info/?t=149511514800002&r=1&w=2
> Fix-suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>

Looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  mm/list_lru.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/list_lru.c b/mm/list_lru.c
> index 5d8dffd..1af0709 100644
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -249,6 +249,8 @@ restart:
>  		default:
>  			BUG();
>  		}
> +		if (cond_resched_lock(&nlru->lock))
> +			goto restart;
>  	}
>  
>  	spin_unlock(&nlru->lock);
> -- 
> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
>
Andrew Morton June 15, 2017, 9:05 p.m. UTC | #2
On Mon, 12 Jun 2017 06:17:20 +0530 Sahitya Tummala <stummala@codeaurora.org> wrote:

> __list_lru_walk_one() can hold the spin lock for a long duration
> if there are a large number of entries to be isolated.
> 
> This results in "BUG: spinlock lockup suspected" in the below path -
> 
> [<ffffff8eca0fb0bc>] spin_bug+0x90
> [<ffffff8eca0fb220>] do_raw_spin_lock+0xfc
> [<ffffff8ecafb7798>] _raw_spin_lock+0x28
> [<ffffff8eca1ae884>] list_lru_add+0x28
> [<ffffff8eca1f5dac>] dput+0x1c8
> [<ffffff8eca1eb46c>] path_put+0x20
> [<ffffff8eca1eb73c>] terminate_walk+0x3c
> [<ffffff8eca1eee58>] path_lookupat+0x100
> [<ffffff8eca1f00fc>] filename_lookup+0x6c
> [<ffffff8eca1f0264>] user_path_at_empty+0x54
> [<ffffff8eca1e066c>] SyS_faccessat+0xd0
> [<ffffff8eca084e30>] el0_svc_naked+0x24
> 
> This nlru->lock has been acquired by another CPU in this path -
> 
> [<ffffff8eca1f5fd0>] d_lru_shrink_move+0x34
> [<ffffff8eca1f6180>] dentry_lru_isolate_shrink+0x48
> [<ffffff8eca1aeafc>] __list_lru_walk_one.isra.10+0x94
> [<ffffff8eca1aec34>] list_lru_walk_node+0x40
> [<ffffff8eca1f6620>] shrink_dcache_sb+0x60
> [<ffffff8eca1e56a8>] do_remount_sb+0xbc
> [<ffffff8eca1e583c>] do_emergency_remount+0xb0
> [<ffffff8eca0ba510>] process_one_work+0x228
> [<ffffff8eca0bb158>] worker_thread+0x2e0
> [<ffffff8eca0c040c>] kthread+0xf4
> [<ffffff8eca084dd0>] ret_from_fork+0x10
> 
> Link: http://marc.info/?t=149511514800002&r=1&w=2
> Fix-suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
> ---
>  mm/list_lru.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/list_lru.c b/mm/list_lru.c
> index 5d8dffd..1af0709 100644
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -249,6 +249,8 @@ restart:
>  		default:
>  			BUG();
>  		}
> +		if (cond_resched_lock(&nlru->lock))
> +			goto restart;
>  	}
>  
>  	spin_unlock(&nlru->lock);

This is rather worrying.

a) Why are we spending so long holding that lock that this is occurring?

b) With this patch, we're restarting the entire scan.  Are there
   situations in which this loop will never terminate, or will take a
   very long time?  Suppose that this process is getting rescheds
   blasted at it for some reason?

IOW this looks like a bit of a band-aid and a deeper analysis and
understanding might be needed.
Sahitya Tummala June 16, 2017, 2:44 p.m. UTC | #3
On 6/16/2017 2:35 AM, Andrew Morton wrote:

>> diff --git a/mm/list_lru.c b/mm/list_lru.c
>> index 5d8dffd..1af0709 100644
>> --- a/mm/list_lru.c
>> +++ b/mm/list_lru.c
>> @@ -249,6 +249,8 @@ restart:
>>   		default:
>>   			BUG();
>>   		}
>> +		if (cond_resched_lock(&nlru->lock))
>> +			goto restart;
>>   	}
>>   
>>   	spin_unlock(&nlru->lock);
> This is rather worrying.
>
> a) Why are we spending so long holding that lock that this is occurring?

At the time of the crash, I see that __list_lru_walk_one() shows the
number of entries isolated as 1774475, with nr_items still pending at
130748. On my system, I see that it takes around 75ms for
__list_lru_walk_one() to complete for 100000 dentries. So for a total
of 1900000 dentries, as in the issue scenario, it will take up to
1425ms, which explains why the spin lockup condition was hit on the
other CPU.
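
Spelling out that arithmetic: 1774475 isolated plus 130748 pending is
roughly 1.9 million dentries, and at ~75ms per 100000 dentries that is
19 x 75ms = 1425ms with nlru->lock held continuously.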

It looks like __list_lru_walk_one() is expected to take more time when
more dentries are present, and I think having that many dentries is a
valid scenario.

> b) With this patch, we're restarting the entire scan.  Are there
>     situations in which this loop will never terminate, or will take a
>     very long time?  Suppose that this process is getting rescheds
>     blasted at it for some reason?

In the above scenario, I observed that dentry entries are removed from
the lru list all the time, i.e. LRU_REMOVED is returned from the
isolate (dentry_lru_isolate()) callback. I don't know if there is any
case where we skip several entries in the lru list and restart several
times due to this cond_resched_lock(). This can happen even with the
existing code if LRU_RETRY is returned often from the isolate callback.

> IOW this looks like a bit of a band-aid and a deeper analysis and
> understanding might be needed.
Vladimir Davydov June 17, 2017, 11:14 a.m. UTC | #4
Hello,

On Thu, Jun 15, 2017 at 02:05:23PM -0700, Andrew Morton wrote:
> On Mon, 12 Jun 2017 06:17:20 +0530 Sahitya Tummala <stummala@codeaurora.org> wrote:
> 
> > __list_lru_walk_one() can hold the spin lock for a long duration
> > if there are a large number of entries to be isolated.
> > 
> > This results in "BUG: spinlock lockup suspected" in the below path -
> > 
> > [<ffffff8eca0fb0bc>] spin_bug+0x90
> > [<ffffff8eca0fb220>] do_raw_spin_lock+0xfc
> > [<ffffff8ecafb7798>] _raw_spin_lock+0x28
> > [<ffffff8eca1ae884>] list_lru_add+0x28
> > [<ffffff8eca1f5dac>] dput+0x1c8
> > [<ffffff8eca1eb46c>] path_put+0x20
> > [<ffffff8eca1eb73c>] terminate_walk+0x3c
> > [<ffffff8eca1eee58>] path_lookupat+0x100
> > [<ffffff8eca1f00fc>] filename_lookup+0x6c
> > [<ffffff8eca1f0264>] user_path_at_empty+0x54
> > [<ffffff8eca1e066c>] SyS_faccessat+0xd0
> > [<ffffff8eca084e30>] el0_svc_naked+0x24
> > 
> > This nlru->lock has been acquired by another CPU in this path -
> > 
> > [<ffffff8eca1f5fd0>] d_lru_shrink_move+0x34
> > [<ffffff8eca1f6180>] dentry_lru_isolate_shrink+0x48
> > [<ffffff8eca1aeafc>] __list_lru_walk_one.isra.10+0x94
> > [<ffffff8eca1aec34>] list_lru_walk_node+0x40
> > [<ffffff8eca1f6620>] shrink_dcache_sb+0x60
> > [<ffffff8eca1e56a8>] do_remount_sb+0xbc
> > [<ffffff8eca1e583c>] do_emergency_remount+0xb0
> > [<ffffff8eca0ba510>] process_one_work+0x228
> > [<ffffff8eca0bb158>] worker_thread+0x2e0
> > [<ffffff8eca0c040c>] kthread+0xf4
> > [<ffffff8eca084dd0>] ret_from_fork+0x10
> > 
> > Link: http://marc.info/?t=149511514800002&r=1&w=2
> > Fix-suggested-by: Jan Kara <jack@suse.cz>
> > Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
> > ---
> >  mm/list_lru.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/mm/list_lru.c b/mm/list_lru.c
> > index 5d8dffd..1af0709 100644
> > --- a/mm/list_lru.c
> > +++ b/mm/list_lru.c
> > @@ -249,6 +249,8 @@ restart:
> >  		default:
> >  			BUG();
> >  		}
> > +		if (cond_resched_lock(&nlru->lock))
> > +			goto restart;
> >  	}
> >  
> >  	spin_unlock(&nlru->lock);
> 
> This is rather worrying.
> 
> a) Why are we spending so long holding that lock that this is occurring?
> 
> b) With this patch, we're restarting the entire scan.  Are there
>    situations in which this loop will never terminate, or will take a
>    very long time?  Suppose that this process is getting rescheds
>    blasted at it for some reason?
> 
> IOW this looks like a bit of a band-aid and a deeper analysis and
> understanding might be needed.

The goal of list_lru_walk is removing inactive entries from the lru list
(LRU_REMOVED). Memory shrinkers may also choose to move active entries
to the tail of the lru list (LRU_ROTATED). LRU_SKIP is supposed to be
returned only to avoid a possible deadlock. So I don't see how
restarting lru walk could have adverse effects.

However, I do find this patch kinda ugly, because:

 - list_lru_walk already gives you a way to avoid a lockup - just make
   the callback reschedule and return LRU_RETRY every now and then, see
   shadow_lru_isolate() for an example. Alternatively, you can limit the
   number of entries scanned in one go (nr_to_walk) and reschedule
   between calls. This is what shrink_slab() does: the number of
   dentries scanned without releasing the lock is limited to 1024, see
   how super_block::s_shrink is initialized. Both approaches are
   sketched below.

 - Someone might want to call list_lru_walk with a spin lock held, and I
   don't see anything wrong in doing that. With your patch it can't be
   done anymore.
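
As a sketch of the first approach described above, a hypothetical
isolate callback that periodically drops the lock, reschedules, and
asks the walker to restart. The name my_isolate and the 1024 batch
size are illustrative, not from any in-tree code; shadow_lru_isolate()
in mm/workingset.c is the in-tree example of this pattern:

static enum lru_status my_isolate(struct list_head *item,
				  struct list_lru_one *lru,
				  spinlock_t *lru_lock, void *arg)
{
	unsigned long *nr_scanned = arg;

	if ((++*nr_scanned % 1024) == 0 && need_resched()) {
		/* Drop the lock so contending CPUs can take it. */
		spin_unlock(lru_lock);
		cond_resched();
		spin_lock(lru_lock);
		/* The list may have changed; the walker restarts. */
		return LRU_RETRY;
	}

	/* Normal path: remove the entry from the lru list. */
	list_lru_isolate(lru, item);
	return LRU_REMOVED;
}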

That said, I think it would be better to patch shrink_dcache_sb() or
dentry_lru_isolate_shrink() instead of list_lru_walk() in order to fix
this lockup.
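
For instance, shrink_dcache_sb() could walk the lru in bounded batches,
dropping nlru->lock and rescheduling between them. A sketch of that
direction, reusing the existing dentry_lru_isolate_shrink() and
shrink_dentry_list() helpers (illustrative only, not the resubmitted
patch):

void shrink_dcache_sb(struct super_block *sb)
{
	long freed;

	do {
		LIST_HEAD(dispose);

		/* Hold nlru->lock for at most 1024 entries per walk. */
		freed = list_lru_walk(&sb->s_dentry_lru,
				      dentry_lru_isolate_shrink,
				      &dispose, 1024);
		this_cpu_sub(nr_dentry_unused, freed);
		shrink_dentry_list(&dispose);
		cond_resched();
	} while (list_lru_count(&sb->s_dentry_lru) > 0);
}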
Sahitya Tummala June 20, 2017, 2:52 a.m. UTC | #5
Hello,

On 6/17/2017 4:44 PM, Vladimir Davydov wrote:

>
> That said, I think it would be better to patch shrink_dcache_sb() or
> dentry_lru_isolate_shrink() instead of list_lru_walk() in order to fix
> this lockup.

Thanks for the review. I will enhance the patch as per your suggestion.

Patch

diff --git a/mm/list_lru.c b/mm/list_lru.c
index 5d8dffd..1af0709 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -249,6 +249,8 @@ restart:
 		default:
 			BUG();
 		}
+		if (cond_resched_lock(&nlru->lock))
+			goto restart;
 	}
 
 	spin_unlock(&nlru->lock);
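
For reference, cond_resched_lock() returns nonzero only when it
actually dropped and reacquired the lock, which is why the patch must
goto restart: the list may have been modified while the lock was
released. A simplified model of its behavior (the real implementation
lives in kernel/sched/core.c and differs in detail across preemption
models):

/* Simplified model of cond_resched_lock(); not the real code. */
static inline int cond_resched_lock_sketch(spinlock_t *lock)
{
	if (spin_needbreak(lock) || need_resched()) {
		spin_unlock(lock);	/* let a contending CPU in */
		cond_resched();		/* yield if a resched is due */
		spin_lock(lock);
		return 1;	/* lock was dropped; caller revalidates */
	}
	return 0;	/* lock was held throughout */
}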