vfs: Avoid softlockups in drop_pagecache_sb()
diff mbox series

Message ID 20190114085343.15011-1-jack@suse.cz
State New
Headers show
Series
  • vfs: Avoid softlockups in drop_pagecache_sb()
Related show

Commit Message

Jan Kara Jan. 14, 2019, 8:53 a.m. UTC
When superblock has lots of inodes without any pagecache (like is the
case for /proc), drop_pagecache_sb() will iterate through all of them
without dropping sb->s_inode_list_lock which can lead to softlockups
(one of our customers hit this).

Fix the problem by going to the slow path and doing cond_resched() in
case the process needs rescheduling.

Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/drop_caches.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Andrew, can you please merge this patch? Thanks!

Comments

Andrew Morton Jan. 30, 2019, 12:56 a.m. UTC | #1
On Mon, 14 Jan 2019 09:53:43 +0100 Jan Kara <jack@suse.cz> wrote:

> When superblock has lots of inodes without any pagecache (like is the
> case for /proc), drop_pagecache_sb() will iterate through all of them
> without dropping sb->s_inode_list_lock which can lead to softlockups
> (one of our customers hit this).
> 
> Fix the problem by going to the slow path and doing cond_resched() in
> case the process needs rescheduling.
> 
> ...
>
> --- a/fs/drop_caches.c
> +++ b/fs/drop_caches.c
> @@ -21,8 +21,13 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused)
>  	spin_lock(&sb->s_inode_list_lock);
>  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
>  		spin_lock(&inode->i_lock);
> +		/*
> +		 * We must skip inodes in unusual state. We may also skip
> +		 * inodes without pages but we deliberately won't in case
> +		 * we need to reschedule to avoid softlockups.
> +		 */
>  		if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
> -		    (inode->i_mapping->nrpages == 0)) {
> +		    (inode->i_mapping->nrpages == 0 && !need_resched())) {
>  			spin_unlock(&inode->i_lock);
>  			continue;
>  		}
> @@ -30,6 +35,7 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused)
>  		spin_unlock(&inode->i_lock);
>  		spin_unlock(&sb->s_inode_list_lock);
>  
> +		cond_resched();
>  		invalidate_mapping_pages(inode->i_mapping, 0, -1);
>  		iput(toput_inode);
>  		toput_inode = inode;

Are we sure there's no situation in which a large number of inodes can
be in the "unusual state", leading to the same issue?
Jan Kara Jan. 30, 2019, 10:07 a.m. UTC | #2
On Tue 29-01-19 16:56:36, Andrew Morton wrote:
> On Mon, 14 Jan 2019 09:53:43 +0100 Jan Kara <jack@suse.cz> wrote:
> 
> > When superblock has lots of inodes without any pagecache (like is the
> > case for /proc), drop_pagecache_sb() will iterate through all of them
> > without dropping sb->s_inode_list_lock which can lead to softlockups
> > (one of our customers hit this).
> > 
> > Fix the problem by going to the slow path and doing cond_resched() in
> > case the process needs rescheduling.
> > 
> > ...
> >
> > --- a/fs/drop_caches.c
> > +++ b/fs/drop_caches.c
> > @@ -21,8 +21,13 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused)
> >  	spin_lock(&sb->s_inode_list_lock);
> >  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> >  		spin_lock(&inode->i_lock);
> > +		/*
> > +		 * We must skip inodes in unusual state. We may also skip
> > +		 * inodes without pages but we deliberately won't in case
> > +		 * we need to reschedule to avoid softlockups.
> > +		 */
> >  		if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
> > -		    (inode->i_mapping->nrpages == 0)) {
> > +		    (inode->i_mapping->nrpages == 0 && !need_resched())) {
> >  			spin_unlock(&inode->i_lock);
> >  			continue;
> >  		}
> > @@ -30,6 +35,7 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused)
> >  		spin_unlock(&inode->i_lock);
> >  		spin_unlock(&sb->s_inode_list_lock);
> >  
> > +		cond_resched();
> >  		invalidate_mapping_pages(inode->i_mapping, 0, -1);
> >  		iput(toput_inode);
> >  		toput_inode = inode;
> 
> Are we sure there's no situation in which a large number of inodes can
> be in the "unusual state", leading to the same issue?

No, we cannot be really sure that there aren't many such inodes (although
I'd be surprised if there were). But the problem with "unusual state"
inodes is that you cannot just __iget() them (well, you could but it's
breaking the rules and would lead to use-after-free issues ;) and so
there's no easy way to drop the spinlock and then continue the iteration
after cond_resched(). So overall it's too complex to deal with this until
someone actually hits it.

								Honza

Patch
diff mbox series

diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index 82377017130f..d31b6c72b476 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -21,8 +21,13 @@  static void drop_pagecache_sb(struct super_block *sb, void *unused)
 	spin_lock(&sb->s_inode_list_lock);
 	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
 		spin_lock(&inode->i_lock);
+		/*
+		 * We must skip inodes in unusual state. We may also skip
+		 * inodes without pages but we deliberately won't in case
+		 * we need to reschedule to avoid softlockups.
+		 */
 		if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
-		    (inode->i_mapping->nrpages == 0)) {
+		    (inode->i_mapping->nrpages == 0 && !need_resched())) {
 			spin_unlock(&inode->i_lock);
 			continue;
 		}
@@ -30,6 +35,7 @@  static void drop_pagecache_sb(struct super_block *sb, void *unused)
 		spin_unlock(&inode->i_lock);
 		spin_unlock(&sb->s_inode_list_lock);
 
+		cond_resched();
 		invalidate_mapping_pages(inode->i_mapping, 0, -1);
 		iput(toput_inode);
 		toput_inode = inode;