Message ID | 20180413202823.204377-1-khazhy@google.com (mailing list archive) |
---|---|
State | New, archived |
On Fri, 13 Apr 2018 13:28:23 -0700 Khazhismel Kumykov <khazhy@google.com> wrote:

> shrink_dcache_parent may spin waiting for a parallel shrink_dentry_list.
> In this case we may have 0 dentries to dispose, so we will never
> schedule out while waiting for the parallel shrink_dentry_list to
> complete.
>
> Tested that this fixes syzbot reports of stalls in shrink_dcache_parent()

Well I guess the patch is OK as a stopgap, but things seem fairly
messed up in there.  shrink_dcache_parent() shouldn't be doing a
busywait, waiting for the concurrent shrink_dentry_list().

Either we should be waiting (sleeping) for the concurrent operation to
complete or we should just bail out of shrink_dcache_parent(), perhaps
with

	if (list_empty(&data.dispose))
		break;

or similar.  Dunno.

That block comment over `struct select_data' is not a good one.  "It
returns zero iff...".  *What* returns zero?  select_collect()?  No it
doesn't, it returns an `enum d_walk_ret'.  Perhaps the comment is
trying to refer to select_data.found.  And the real interpretation of
select_data.found is, umm, hard to describe.  "Counts the number of
dentries which are on a shrink list or which were moved to the dispose
list".  Why?  What's that all about?

This code needs a bit of thought, documentation and perhaps a redo,
I suspect.
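To make the question about select_data.found concrete, here is a minimal userspace sketch of the counting rule as it is described above: "found" counts dentries that are already on a shrink list or that were moved to the dispose list, while the dispose list only receives the latter. Everything in it (toy_dentry_state, toy_select, toy_collect) is a hypothetical stand-in invented for illustration, not the kernel's select_collect(). It only shows how a walk can report found != 0 while having nothing to dispose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-dentry state for illustration; not the kernel's d_flags. */
enum toy_dentry_state {
	TOY_IN_USE,          /* still referenced: neither counted nor collected */
	TOY_ON_OTHER_SHRINK, /* already on another task's shrink list */
	TOY_UNUSED           /* unreferenced: can go on our own dispose list */
};

/* Hypothetical summary of one walk over the subtree. */
struct toy_select {
	size_t found;    /* on a shrink list OR moved to our dispose list */
	size_t disposed; /* moved to our dispose list only */
};

/* Walk n toy dentries and apply the counting rule described in the thread. */
struct toy_select toy_collect(const enum toy_dentry_state *d, size_t n)
{
	struct toy_select r = { 0, 0 };

	assert(d != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		switch (d[i]) {
		case TOY_ON_OTHER_SHRINK:
			/* Someone else will free it; we can only count it. */
			r.found++;
			break;
		case TOY_UNUSED:
			/* We collect it ourselves, and count it too. */
			r.found++;
			r.disposed++;
			break;
		case TOY_IN_USE:
			break;
		}
	}
	return r;
}
```

In this model, a subtree whose only reclaimable dentries sit on another task's shrink list yields a non-zero found with disposed == 0, which is the situation the patch description calls "0 dentries to dispose".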
On Fri, 13 Apr 2018, Khazhismel Kumykov wrote:

> shrink_dcache_parent may spin waiting for a parallel shrink_dentry_list.
> In this case we may have 0 dentries to dispose, so we will never
> schedule out while waiting for the parallel shrink_dentry_list to
> complete.
>
> Tested that this fixes syzbot reports of stalls in shrink_dcache_parent()
>
> Fixes: 32785c0539b7 ("fs/dcache.c: add cond_resched() in shrink_dentry_list()")
> Reported-by: syzbot+ae80b790eb412884ca77@syzkaller.appspotmail.com
>
> Cc: Nikolay Borisov <nborisov@suse.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
> Cc: Jeff Mahoney <jeffm@suse.com>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Khazhismel Kumykov <khazhy@google.com>

Acked-by: David Rientjes <rientjes@google.com>
On 14.04.2018 00:14, Andrew Morton wrote:
> On Fri, 13 Apr 2018 13:28:23 -0700 Khazhismel Kumykov <khazhy@google.com> wrote:
>
>> shrink_dcache_parent may spin waiting for a parallel shrink_dentry_list.
>> In this case we may have 0 dentries to dispose, so we will never
>> schedule out while waiting for the parallel shrink_dentry_list to
>> complete.
>>
>> Tested that this fixes syzbot reports of stalls in shrink_dcache_parent()
>
> Well I guess the patch is OK as a stopgap, but things seem fairly
> messed up in there.  shrink_dcache_parent() shouldn't be doing a
> busywait, waiting for the concurrent shrink_dentry_list().
>
> Either we should be waiting (sleeping) for the concurrent operation to
> complete or we should just bail out of shrink_dcache_parent(), perhaps
> with
>
> 	if (list_empty(&data.dispose))
> 		break;
>
> or similar.  Dunno.

I agree, however, not being a dcache expert I'd refrain from touching
it, since it seems to be rather fragile. Perhaps Al could take a look in
there?

> That block comment over `struct select_data' is not a good one.  "It
> returns zero iff...".  *What* returns zero?  select_collect()?  No it
> doesn't, it returns an `enum d_walk_ret'.  Perhaps the comment is
> trying to refer to select_data.found.  And the real interpretation of
> select_data.found is, umm, hard to describe.  "Counts the number of
> dentries which are on a shrink list or which were moved to the dispose
> list".  Why?  What's that all about?
>
> This code needs a bit of thought, documentation and perhaps a redo,
> I suspect.
On Sat, Apr 14, 2018 at 10:00:29AM +0300, Nikolay Borisov wrote:
> On 14.04.2018 00:14, Andrew Morton wrote:
> > On Fri, 13 Apr 2018 13:28:23 -0700 Khazhismel Kumykov <khazhy@google.com> wrote:
> >
> >> shrink_dcache_parent may spin waiting for a parallel shrink_dentry_list.
> >> In this case we may have 0 dentries to dispose, so we will never
> >> schedule out while waiting for the parallel shrink_dentry_list to
> >> complete.
> >>
> >> Tested that this fixes syzbot reports of stalls in shrink_dcache_parent()
> >
> > Well I guess the patch is OK as a stopgap, but things seem fairly
> > messed up in there.  shrink_dcache_parent() shouldn't be doing a
> > busywait, waiting for the concurrent shrink_dentry_list().
> >
> > Either we should be waiting (sleeping) for the concurrent operation to
> > complete or we should just bail out of shrink_dcache_parent(), perhaps
> > with
> >
> > 	if (list_empty(&data.dispose))
> > 		break;
> >
> > or similar.  Dunno.
>
> I agree, however, not being a dcache expert I'd refrain from touching
> it, since it seems to be rather fragile. Perhaps Al could take a look in
> there?

"Bail out" is definitely a bad idea, "sleep"... what on?  Especially
since there might be several evictions we are overlapping with...
diff --git a/fs/dcache.c b/fs/dcache.c
index 591b34500e41..3507badeb60a 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1489,6 +1489,7 @@ void shrink_dcache_parent(struct dentry *parent)
 			break;
 
 		shrink_dentry_list(&data.dispose);
+		cond_resched();
 	}
 }
 EXPORT_SYMBOL(shrink_dcache_parent);
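The effect of the one-line change can be sketched with a small userspace model. This is a toy under stated assumptions, not kernel code: toy_state, toy_walk(), toy_yield() and toy_shrink_parent() are hypothetical names, and the model assumes the parallel shrinker makes progress only when the looping task gives up the CPU (roughly a single CPU without preemption). A pass limit stands in for the stall so that the unpatched case terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy state; these are NOT kernel structures. */
struct toy_state {
	int busy_elsewhere; /* dentries sitting on another task's shrink list */
	int ours;           /* dentries this walk can put on its own dispose list */
	int passes;         /* loop iterations executed so far */
	int yields;         /* times the loop gave up the CPU */
};

/* Stand-in for the tree walk: collects our entries, returns the "found" count. */
static int toy_walk(struct toy_state *s, int *dispose)
{
	*dispose = s->ours;
	s->ours = 0;
	return *dispose + s->busy_elsewhere;
}

/*
 * Stand-in for cond_resched(). Assumption of this model: the other task
 * runs only when we yield, and frees one dentry per time slice.
 */
static void toy_yield(struct toy_state *s)
{
	s->yields++;
	if (s->busy_elsewhere > 0)
		s->busy_elsewhere--;
}

/*
 * Model of the loop in the diff. Returns true if the loop terminated on its
 * own, false if it was still spinning after max_passes (the "stall").
 */
bool toy_shrink_parent(struct toy_state *s, bool patched, int max_passes)
{
	assert(max_passes > 0);
	for (s->passes = 0; s->passes < max_passes; s->passes++) {
		int dispose = 0;
		int found = toy_walk(s, &dispose);

		if (!found)
			return true;
		/*
		 * Disposing our own entries would happen here. With
		 * dispose == 0 there is nothing to free, so without the
		 * extra yield below this pass never lets anyone else run.
		 */
		if (patched)
			toy_yield(s);
	}
	return false;
}
```

With three dentries held by the other task, the patched variant finishes after three yields, while the unpatched variant runs into the pass limit without ever yielding. The model says nothing about whether spinning is the right design, which is the point raised in the replies.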
shrink_dcache_parent may spin waiting for a parallel shrink_dentry_list.
In this case we may have 0 dentries to dispose, so we will never
schedule out while waiting for the parallel shrink_dentry_list to
complete.

Tested that this fixes syzbot reports of stalls in shrink_dcache_parent()

Fixes: 32785c0539b7 ("fs/dcache.c: add cond_resched() in shrink_dentry_list()")
Reported-by: syzbot+ae80b790eb412884ca77@syzkaller.appspotmail.com

Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
---
 fs/dcache.c | 1 +
 1 file changed, 1 insertion(+)