fs: avoid softlockups in s_inodes iterators
diff mbox series

Message ID 841d0e0f-f04c-9611-2eea-0bcc40e5b084@redhat.com
State New
Headers show
Series
  • fs: avoid softlockups in s_inodes iterators
Related show

Commit Message

Eric Sandeen Oct. 11, 2019, 4:49 p.m. UTC
Anything that walks all inodes on sb->s_inodes list without rescheduling
risks softlockups.

Previous efforts were made in 2 functions, see:

c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
ac05fbb inode: don't softlockup when evicting inodes

but there hasn't been an audit of all walkers, so do that now.  This
also consistently moves the cond_resched() calls to the bottom of each
loop.

One remains: remove_dquot_ref(), because I'm not quite sure how to deal
with that one w/o taking the i_lock.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

Comments

Josef Bacik Oct. 11, 2019, 5:29 p.m. UTC | #1
On Fri, Oct 11, 2019 at 11:49:38AM -0500, Eric Sandeen wrote:
> Anything that walks all inodes on sb->s_inodes list without rescheduling
> risks softlockups.
> 
> Previous efforts were made in 2 functions, see:
> 
> c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
> ac05fbb inode: don't softlockup when evicting inodes
> 
> but there hasn't been an audit of all walkers, so do that now.  This
> also consistently moves the cond_resched() calls to the bottom of each
> loop.
> 
> One remains: remove_dquot_ref(), because I'm not quite sure how to deal
> with that one w/o taking the i_lock.
> 
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>

You've got iput cleanups in here and cond_resched()'s.  I feel like this is a
missed opportunity to pad your patch count.  Thanks,

Josef
Eric Sandeen Oct. 11, 2019, 5:34 p.m. UTC | #2
On 10/11/19 12:29 PM, Josef Bacik wrote:
> On Fri, Oct 11, 2019 at 11:49:38AM -0500, Eric Sandeen wrote:
>> Anything that walks all inodes on sb->s_inodes list without rescheduling
>> risks softlockups.
>>
>> Previous efforts were made in 2 functions, see:
>>
>> c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
>> ac05fbb inode: don't softlockup when evicting inodes
>>
>> but there hasn't been an audit of all walkers, so do that now.  This
>> also consistently moves the cond_resched() calls to the bottom of each
>> loop.
>>
>> One remains: remove_dquot_ref(), because I'm not quite sure how to deal
>> with that one w/o taking the i_lock.
>>
>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> 
> You've got iput cleanups in here and cond_resched()'s.  I feel like this is a
> missed opportunity to pad your patch count.  Thanks,

yeah, I was going to suggest that I could split it out into 3
(move cond_rescheds, clean up iputs, add new rescheds) if there was a
request.  But it seemed a bit ridiculously granular.  Find by me
if desired, tho.

So, was that a request?

-Eric
Josef Bacik Oct. 11, 2019, 5:35 p.m. UTC | #3
On Fri, Oct 11, 2019 at 12:34:45PM -0500, Eric Sandeen wrote:
> On 10/11/19 12:29 PM, Josef Bacik wrote:
> > On Fri, Oct 11, 2019 at 11:49:38AM -0500, Eric Sandeen wrote:
> >> Anything that walks all inodes on sb->s_inodes list without rescheduling
> >> risks softlockups.
> >>
> >> Previous efforts were made in 2 functions, see:
> >>
> >> c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
> >> ac05fbb inode: don't softlockup when evicting inodes
> >>
> >> but there hasn't been an audit of all walkers, so do that now.  This
> >> also consistently moves the cond_resched() calls to the bottom of each
> >> loop.
> >>
> >> One remains: remove_dquot_ref(), because I'm not quite sure how to deal
> >> with that one w/o taking the i_lock.
> >>
> >> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> > 
> > You've got iput cleanups in here and cond_resched()'s.  I feel like this is a
> > missed opportunity to pad your patch count.  Thanks,
> 
> yeah, I was going to suggest that I could split it out into 3
> (move cond_rescheds, clean up iputs, add new rescheds) if there was a
> request.  But it seemed a bit ridiculously granular.  Find by me
> if desired, tho.
> 
> So, was that a request?

I think just two patches, one for the iputs and one for the resched changes.
Thanks,

Josef
Matthew Wilcox Oct. 11, 2019, 6:32 p.m. UTC | #4
On Fri, Oct 11, 2019 at 11:49:38AM -0500, Eric Sandeen wrote:
> @@ -698,6 +699,13 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
>  		inode_lru_list_del(inode);
>  		spin_unlock(&inode->i_lock);
>  		list_add(&inode->i_lru, &dispose);
> +
> +		if (need_resched()) {
> +			spin_unlock(&sb->s_inode_list_lock);
> +			cond_resched();
> +			dispose_list(&dispose);
> +			goto again;
> +		}
>  	}
>  	spin_unlock(&sb->s_inode_list_lock);
>  

Is this equivalent to:

+		cond_resched_lock(&sb->s_inode_list_lock));

or is disposing of the list a crucial part here?
Eric Sandeen Oct. 11, 2019, 6:45 p.m. UTC | #5
On 10/11/19 1:32 PM, Matthew Wilcox wrote:
> On Fri, Oct 11, 2019 at 11:49:38AM -0500, Eric Sandeen wrote:
>> @@ -698,6 +699,13 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
>>  		inode_lru_list_del(inode);
>>  		spin_unlock(&inode->i_lock);
>>  		list_add(&inode->i_lru, &dispose);
>> +
>> +		if (need_resched()) {
>> +			spin_unlock(&sb->s_inode_list_lock);
>> +			cond_resched();
>> +			dispose_list(&dispose);
>> +			goto again;
>> +		}
>>  	}
>>  	spin_unlock(&sb->s_inode_list_lock);
>>  
> 
> Is this equivalent to:
> 
> +		cond_resched_lock(&sb->s_inode_list_lock));
> 
> or is disposing of the list a crucial part here?

I think we need to dispose, or we'll start with the entire ~unmodified list again after the goto:

-Eric
Eric Sandeen Oct. 11, 2019, 9:14 p.m. UTC | #6
On 10/11/19 1:45 PM, Eric Sandeen wrote:
> On 10/11/19 1:32 PM, Matthew Wilcox wrote:
>> On Fri, Oct 11, 2019 at 11:49:38AM -0500, Eric Sandeen wrote:
>>> @@ -698,6 +699,13 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
>>>  		inode_lru_list_del(inode);
>>>  		spin_unlock(&inode->i_lock);
>>>  		list_add(&inode->i_lru, &dispose);
>>> +
>>> +		if (need_resched()) {
>>> +			spin_unlock(&sb->s_inode_list_lock);
>>> +			cond_resched();
>>> +			dispose_list(&dispose);
>>> +			goto again;
>>> +		}
>>>  	}
>>>  	spin_unlock(&sb->s_inode_list_lock);
>>>  
>>
>> Is this equivalent to:
>>
>> +		cond_resched_lock(&sb->s_inode_list_lock));
>>
>> or is disposing of the list a crucial part here?
> 
> I think we need to dispose, or we'll start with the entire ~unmodified list again after the goto:

Oh, if you meant in lieu of the goto, we can't drop that lock and
expect to pick up our traversal where we left off, can we?

-Eric
Dave Chinner Oct. 12, 2019, 4:29 a.m. UTC | #7
On Fri, Oct 11, 2019 at 04:14:02PM -0500, Eric Sandeen wrote:
> 
> 
> On 10/11/19 1:45 PM, Eric Sandeen wrote:
> > On 10/11/19 1:32 PM, Matthew Wilcox wrote:
> >> On Fri, Oct 11, 2019 at 11:49:38AM -0500, Eric Sandeen wrote:
> >>> @@ -698,6 +699,13 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
> >>>  		inode_lru_list_del(inode);
> >>>  		spin_unlock(&inode->i_lock);
> >>>  		list_add(&inode->i_lru, &dispose);
> >>> +
> >>> +		if (need_resched()) {
> >>> +			spin_unlock(&sb->s_inode_list_lock);
> >>> +			cond_resched();
> >>> +			dispose_list(&dispose);
> >>> +			goto again;
> >>> +		}
> >>>  	}
> >>>  	spin_unlock(&sb->s_inode_list_lock);
> >>>  
> >>
> >> Is this equivalent to:
> >>
> >> +		cond_resched_lock(&sb->s_inode_list_lock));
> >>
> >> or is disposing of the list a crucial part here?
> > 
> > I think we need to dispose, or we'll start with the entire ~unmodified list again after the goto:
> 
> Oh, if you meant in lieu of the goto, we can't drop that lock and
> expect to pick up our traversal where we left off, can we?

No, we can't. Unless you're doing the iget/iput game (which we can't
here!) the moment the s_inode_list_lock is dropped we cannot rely on
the inode or it's next pointer to still be valid. Hence we have to
restart the traversal. And we dispose of the list before restarting
because there's nothing to gain by waiting until we've done the
entire sb inode list walk (could be hundreds of millions of inodes)
before we start actually freeing them....

Cheers,

Dave.
Jan Kara Oct. 14, 2019, 8:46 a.m. UTC | #8
On Fri 11-10-19 11:49:38, Eric Sandeen wrote:
> One remains: remove_dquot_ref(), because I'm not quite sure how to deal
> with that one w/o taking the i_lock.

Yeah, that will be somewhat tricky. But I think we can modify the standard
iget-iput dance like:

		if (need_resched()) {
			spin_lock(&inode->i_lock);
			if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
				/*
				 * Cannot break on this inode, need to do one
				 * more.
				 */
				spin_unlock(&inode->i_lock);
				continue;
			}
			__iget(inode);
			spin_unlock(&inode->i_lock);
			spin_unlock(&sb->s_inode_list_lock);
			iput(put_inode);
			put_inode = inode;
			cond_resched();
			spin_lock(&sb->s_inode_list_lock);
		}
	...
	iput(put_inode);

Will you transform this into a proper patch in your series or should I do
it?

								Honza

Patch
diff mbox series

diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index d31b6c72b476..dc1a1d5d825b 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -35,11 +35,11 @@  static void drop_pagecache_sb(struct super_block *sb, void *unused)
 		spin_unlock(&inode->i_lock);
 		spin_unlock(&sb->s_inode_list_lock);
 
-		cond_resched();
 		invalidate_mapping_pages(inode->i_mapping, 0, -1);
 		iput(toput_inode);
 		toput_inode = inode;
 
+		cond_resched();
 		spin_lock(&sb->s_inode_list_lock);
 	}
 	spin_unlock(&sb->s_inode_list_lock);
diff --git a/fs/inode.c b/fs/inode.c
index fef457a42882..b0c789bb3dba 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -676,6 +676,7 @@  int invalidate_inodes(struct super_block *sb, bool kill_dirty)
 	struct inode *inode, *next;
 	LIST_HEAD(dispose);
 
+again:
 	spin_lock(&sb->s_inode_list_lock);
 	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
 		spin_lock(&inode->i_lock);
@@ -698,6 +699,13 @@  int invalidate_inodes(struct super_block *sb, bool kill_dirty)
 		inode_lru_list_del(inode);
 		spin_unlock(&inode->i_lock);
 		list_add(&inode->i_lru, &dispose);
+
+		if (need_resched()) {
+			spin_unlock(&sb->s_inode_list_lock);
+			cond_resched();
+			dispose_list(&dispose);
+			goto again;
+		}
 	}
 	spin_unlock(&sb->s_inode_list_lock);
 
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 2ecef6155fc0..15f77cbbd188 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -67,22 +67,19 @@  static void fsnotify_unmount_inodes(struct super_block *sb)
 		spin_unlock(&inode->i_lock);
 		spin_unlock(&sb->s_inode_list_lock);
 
-		if (iput_inode)
-			iput(iput_inode);
-
+		iput(iput_inode);
 		/* for each watch, send FS_UNMOUNT and then remove it */
 		fsnotify(inode, FS_UNMOUNT, inode, FSNOTIFY_EVENT_INODE, NULL, 0);
-
 		fsnotify_inode_delete(inode);
 
 		iput_inode = inode;
 
+		cond_resched();
 		spin_lock(&sb->s_inode_list_lock);
 	}
 	spin_unlock(&sb->s_inode_list_lock);
+	iput(iput_inode);
 
-	if (iput_inode)
-		iput(iput_inode);
 	/* Wait for outstanding inode references from connectors */
 	wait_var_event(&sb->s_fsnotify_inode_refs,
 		       !atomic_long_read(&sb->s_fsnotify_inode_refs));
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 6e826b454082..4a085b3c7cac 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -985,6 +985,7 @@  static int add_dquot_ref(struct super_block *sb, int type)
 		 * later.
 		 */
 		old_inode = inode;
+		cond_resched();
 		spin_lock(&sb->s_inode_list_lock);
 	}
 	spin_unlock(&sb->s_inode_list_lock);