fs/dcache.c: re-add cond_resched() in shrink_dcache_parent()

Message ID 20180413202823.204377-1-khazhy@google.com
State New

Commit Message

Khazhismel Kumykov April 13, 2018, 8:28 p.m. UTC
shrink_dcache_parent() may spin waiting for a parallel shrink_dentry_list().
In this case we may have 0 dentries to dispose of, so we never
schedule out while waiting for the parallel shrink_dentry_list() to
complete.

Tested that this fixes the syzbot reports of stalls in shrink_dcache_parent().

Fixes: 32785c0539b7 ("fs/dcache.c: add cond_resched() in shrink_dentry_list()")
Reported-by: syzbot+ae80b790eb412884ca77@syzkaller.appspotmail.com

Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
---
 fs/dcache.c | 1 +
 1 file changed, 1 insertion(+)
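The loop described in the commit message can be modelled in userspace. This is a hypothetical, single-threaded sketch, not kernel code: `struct model`, `model_collect()`, `model_resched()` and `model_shrink_parent()` are invented names. It assumes, as on a single CPU without preemption, that the parallel shrink_dentry_list() only makes progress when the looping task yields. Dentries still owned by the other shrinker are counted as found but never reach our dispose list, so without a yield `found` never drops to zero.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-threaded model of the loop in shrink_dcache_parent().
 * Assumption: the parallel shrinker only runs when we yield (one CPU,
 * no preemption). None of these names exist in fs/dcache.c.
 */
struct model {
	int busy_elsewhere; /* dentries on another task's shrink list */
	int ours;           /* unused dentries we may move to our dispose list */
};

/* Stand-in for d_walk() + select_collect(): count what was found and
 * move the dentries we own onto our dispose list. */
static int model_collect(struct model *m, int *dispose)
{
	int found = m->busy_elsewhere + m->ours;

	*dispose += m->ours;
	m->ours = 0;
	return found;
}

/* Stand-in for cond_resched(): yielding lets the parallel
 * shrink_dentry_list() free one of its dentries. */
static void model_resched(struct model *m)
{
	if (m->busy_elsewhere > 0)
		m->busy_elsewhere--;
}

/* Returns the pass on which the loop exits, or -1 if it was still
 * spinning after max_iters passes. */
static int model_shrink_parent(struct model *m, bool resched, int max_iters)
{
	for (int i = 1; i <= max_iters; i++) {
		int dispose = 0;

		if (!model_collect(m, &dispose))
			return i;
		/* shrink_dentry_list() would free our `dispose` entries here;
		 * once ours are gone, dispose stays 0 on every later pass. */
		(void)dispose;
		if (resched)
			model_resched(m);
	}
	return -1;
}
```

For example, starting from `busy_elsewhere = 3` and `ours = 2`, the variant without `resched` is still spinning after 1000 passes, while the variant that yields exits on the fourth pass.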

Comments

Andrew Morton April 13, 2018, 9:14 p.m. UTC | #1
On Fri, 13 Apr 2018 13:28:23 -0700 Khazhismel Kumykov <khazhy@google.com> wrote:

> shrink_dcache_parent may spin waiting for a parallel shrink_dentry_list.
> In this case we may have 0 dentries to dispose, so we will never
> schedule out while waiting for the parallel shrink_dentry_list to
> complete.
> 
> Tested that this fixes syzbot reports of stalls in shrink_dcache_parent()

Well I guess the patch is OK as a stopgap, but things seem fairly
messed up in there.  shrink_dcache_parent() shouldn't be doing a
busywait, waiting for the concurrent shrink_dentry_list().

Either we should be waiting (sleeping) for the concurrent operation to
complete or we should just bail out of shrink_dcache_parent(), perhaps
with 

	if (list_empty(&data.dispose))
		break;

or similar.  Dunno.
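The bail-out suggestion can be compared against the current loop with a similar toy model. This is again hypothetical: the names are invented, and `dispose` is a plain counter standing in for `data.dispose`, so `dispose == 0` plays the role of `list_empty(&data.dispose)`. The sketch only shows the mechanical consequence: with the early exit, the function can return while dentries owned by a parallel shrinker are still counted, i.e. before the subtree under the parent has been fully shrunk.

```c
#include <assert.h>

/* Hypothetical toy model; not fs/dcache.c code. */
struct bail_model {
	int busy_elsewhere; /* dentries on another task's shrink list */
	int ours;           /* unused dentries we may dispose of ourselves */
};

/* Loop with the suggested early exit. Returns how many dentries were
 * still "found" (not yet freed) when the function returned. */
static int shrink_parent_bail(struct bail_model *m)
{
	for (;;) {
		int found = m->busy_elsewhere + m->ours;
		int dispose = m->ours;  /* select_collect() would move these */

		m->ours = 0;
		if (!found)
			return 0;
		/* the suggested check: nothing of our own to dispose of */
		if (dispose == 0)
			return found;
		/* shrink_dentry_list() would free our `dispose` entries here */
	}
}
```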


That block comment over `struct select_data' is not a good one.  "It
returns zero iff...".  *What* returns zero?  select_collect()?  No it
doesn't, it returns an `enum d_walk_ret'.  Perhaps the comment is
trying to refer to select_data.found.  And the real interpretation of
select_data.found is, umm, hard to describe.  "Counts the number of
dentries which are on a shrink list or which were moved to the dispose
list".  Why?  What's that all about?

This code needs a bit of thought, documentation and perhaps a redo,
I suspect.
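One reading of the counting rule quoted above, as a hypothetical userspace sketch: `struct toy_dentry` and `toy_select_collect()` are invented and greatly simplified (the real code walks the tree with d_walk() and select_collect(), checks `DCACHE_SHRINK_LIST` in `d_flags` and the `d_lockref` count, and may stop the walk early). It shows why `found` can be non-zero while nothing lands on the caller's dispose list: every counted dentry may already be on another task's shrink list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified dentry, not the kernel's struct dentry. */
struct toy_dentry {
	bool on_shrink_list; /* already queued by some shrinker */
	int refcount;        /* 0 means unused and evictable */
	bool on_dispose;     /* moved to our dispose list by this walk */
};

/* `found` counts dentries already on some shrink list (ours or another
 * task's) plus the unused ones we move to our dispose list. Dentries
 * that are still in use (refcount > 0) are not counted. */
static int toy_select_collect(struct toy_dentry *d, size_t n, int *disposed)
{
	int found = 0;

	*disposed = 0;
	for (size_t i = 0; i < n; i++) {
		if (d[i].on_shrink_list) {
			found++;            /* some shrinker will free it */
		} else if (d[i].refcount == 0) {
			d[i].on_shrink_list = true;
			d[i].on_dispose = true;
			(*disposed)++;
			found++;
		}
	}
	return found;
}
```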

David Rientjes April 13, 2018, 9:15 p.m. UTC | #2
On Fri, 13 Apr 2018, Khazhismel Kumykov wrote:

> shrink_dcache_parent may spin waiting for a parallel shrink_dentry_list.
> In this case we may have 0 dentries to dispose, so we will never
> schedule out while waiting for the parallel shrink_dentry_list to
> complete.
> 
> Tested that this fixes syzbot reports of stalls in shrink_dcache_parent()
> 
> Fixes: 32785c0539b7 ("fs/dcache.c: add cond_resched() in shrink_dentry_list()")
> Reported-by: syzbot+ae80b790eb412884ca77@syzkaller.appspotmail.com
> 
> Cc: Nikolay Borisov <nborisov@suse.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
> Cc: Jeff Mahoney <jeffm@suse.com>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Khazhismel Kumykov <khazhy@google.com>

Acked-by: David Rientjes <rientjes@google.com>

Nikolay Borisov April 14, 2018, 7 a.m. UTC | #3
On 14.04.2018 00:14, Andrew Morton wrote:
> On Fri, 13 Apr 2018 13:28:23 -0700 Khazhismel Kumykov <khazhy@google.com> wrote:
> 
>> shrink_dcache_parent may spin waiting for a parallel shrink_dentry_list.
>> In this case we may have 0 dentries to dispose, so we will never
>> schedule out while waiting for the parallel shrink_dentry_list to
>> complete.
>>
>> Tested that this fixes syzbot reports of stalls in shrink_dcache_parent()
> 
> Well I guess the patch is OK as a stopgap, but things seem fairly
> messed up in there.  shrink_dcache_parent() shouldn't be doing a
> busywait, waiting for the concurrent shrink_dentry_list().
> 
> Either we should be waiting (sleeping) for the concurrent operation to
> complete or we should just bail out of shrink_dcache_parent(), perhaps
> with 
> 
> 	if (list_empty(&data.dispose))
> 		break;
> 
> or similar.  Dunno.

I agree, however, not being a dcache expert I'd refrain from touching
it, since it seems to be rather fragile. Perhaps Al could take a look in
there?

> 
> 
> That block comment over `struct select_data' is not a good one.  "It
> returns zero iff...".  *What* returns zero?  select_collect()?  No it
> doesn't, it returns an `enum d_walk_ret'.  Perhaps the comment is
> trying to refer to select_data.found.  And the real interpretation of
> select_data.found is, umm, hard to describe.  "Counts the number of
> dentries which are on a shrink list or which were moved to the dispose
> list".  Why?  What's that all about?
> 
> This code needs a bit of thought, documentation and perhaps a redo,
> I suspect.
>

Al Viro April 14, 2018, 8:02 a.m. UTC | #4
On Sat, Apr 14, 2018 at 10:00:29AM +0300, Nikolay Borisov wrote:
> 
> 
> On 14.04.2018 00:14, Andrew Morton wrote:
> > On Fri, 13 Apr 2018 13:28:23 -0700 Khazhismel Kumykov <khazhy@google.com> wrote:
> > 
> >> shrink_dcache_parent may spin waiting for a parallel shrink_dentry_list.
> >> In this case we may have 0 dentries to dispose, so we will never
> >> schedule out while waiting for the parallel shrink_dentry_list to
> >> complete.
> >>
> >> Tested that this fixes syzbot reports of stalls in shrink_dcache_parent()
> > 
> > Well I guess the patch is OK as a stopgap, but things seem fairly
> > messed up in there.  shrink_dcache_parent() shouldn't be doing a
> > busywait, waiting for the concurrent shrink_dentry_list().
> > 
> > Either we should be waiting (sleeping) for the concurrent operation to
> > complete or we should just bail out of shrink_dcache_parent(), perhaps
> > with 
> > 
> > 	if (list_empty(&data.dispose))
> > 		break;
> > 
> > or similar.  Dunno.
> 
> I agree, however, not being a dcache expert I'd refrain from touching
> it, since it seems to be rather fragile. Perhaps Al could take a look in
> there?

"Bail out" is definitely a bad idea, "sleep"... what on?  Especially
since there might be several evictions we are overlapping with...

Patch

diff --git a/fs/dcache.c b/fs/dcache.c
index 591b34500e41..3507badeb60a 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1489,6 +1489,7 @@ void shrink_dcache_parent(struct dentry *parent)
 			break;
 
 		shrink_dentry_list(&data.dispose);
+		cond_resched();
 	}
 }
 EXPORT_SYMBOL(shrink_dcache_parent);