Hang/soft lockup in d_invalidate with simultaneous calls
diff mbox

Message ID CACGdZYLjD=g0Ye=L-iVNQ7y5+2_8uw1SLy2et2z70+wBQz4_2w@mail.gmail.com
State New
Headers show

Commit Message

Khazhismel Kumykov June 3, 2017, 5:22 a.m. UTC
On Fri, Jun 2, 2017 at 6:12 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> Part of that could be relieved if we turned check_and_drop() into
> static void check_and_drop(void *_data)
> {
>         struct detach_data *data = _data;
>
>         if (!data->mountpoint && list_empty(&data->select.dispose))
>                 __d_drop(data->select.start);
> }

So with this change, d_invalidate will drop the starting dentry before
all it's children are dropped? Would it make sense to just drop it
right off the bat, and let one task handle shrinking all it's
children?
>
> It doesn't solve the entire problem, though - we still could get too
> many threads into that thing before any of them gets to that __d_drop().

Yes, the change isn't sufficient for my repro, as many threads get to
the loop before the drop, although new tasks don't get stuck in the
same loop after the dentry is dropped.

> I wonder if we should try and collect more victims; that could lead
> to contention of its own, but...

From my understanding, the contention is the worst when one task is
shrinking everything, and several other tasks are busily looping
walking the dentry until everything is done. In this case, the tasks
busily looping d_walk hold the d_lock for a dentry while walking over
all it's children, then soon after it finishes the d_walk, it queues
again to walk again, while shrink_dentry_list releases and re-grabs
for each entry.

If we limit the number of items we pass to shrink_dentry_list at one
time things actually look quite a bit better. e.g., in testing
arbitrarily limiting select.dispose to 1000 elements does fix my
repro.

e.g.
        /*
@@ -1397,6 +1399,9 @@ static enum d_walk_ret select_collect(void
*_data, struct dentry *dentry)
         */
        if (!list_empty(&data->dispose))
                ret = need_resched() ? D_WALK_QUIT : D_WALK_NORETRY;
+
+       if (data->actually_found > 1000)
+               ret = D_WALK_QUIT;
 out:
        return ret;
 }
@@ -1415,6 +1420,7 @@ void shrink_dcache_parent(struct dentry *parent)
                INIT_LIST_HEAD(&data.dispose);
                data.start = parent;
                data.found = 0;
+               data.actually_found = 0;

                d_walk(parent, &data, select_collect, NULL);
                if (!data.found)
@@ -1536,6 +1542,7 @@ void d_invalidate(struct dentry *dentry)
                INIT_LIST_HEAD(&data.select.dispose);
                data.select.start = dentry;
                data.select.found = 0;
+               data.select.actually_found = 0;

                d_walk(dentry, &data, detach_and_collect, check_and_drop);

Patch
diff mbox

diff --git a/fs/dcache.c b/fs/dcache.c
index 22af360ceca3..3892e0eb7ec2 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1367,6 +1367,7 @@  struct select_data {
        struct dentry *start;
        struct list_head dispose;
        int found;
+       int actually_found;
 };

 static enum d_walk_ret select_collect(void *_data, struct dentry *dentry)
@@ -1388,6 +1389,7 @@  static enum d_walk_ret select_collect(void
*_data, struct dentry *dentry)
                if (!dentry->d_lockref.count) {
                        d_shrink_add(dentry, &data->dispose);
                        data->found++;
+                       data->actually_found++;
                }
        }