fs/dcache.c: re-add cond_resched() in shrink_dcache_parent()

Message ID CA+55aFz4fwzcUWWT78AxK+GeVNVveWPNS+=V+ppb1ksn56TjUA@mail.gmail.com
State New, archived

Commit Message

Linus Torvalds April 14, 2018, 4:36 p.m. UTC
On Sat, Apr 14, 2018 at 1:02 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> "Bail out" is definitely a bad idea, "sleep"... what on?  Especially
> since there might be several evictions we are overlapping with...

Well, one thing that should be looked at is the return condition from
select_collect() that shrink_dcache_parent() uses.

Because I think that return condition is somewhat insane.

The logic there seems to be:

 - if we have found something, stop walking. Either NOW (if somebody
is waiting) or after you've hit a rename (if nobody is)

Now, this actually makes perfect sense for the whole rename situation:
if there's nobody waiting for us, but we hit a rename, we probably
should stop anyway just to let whoever is doing that rename continue,
and we might as well try to get rid of the dentries we have found so
far.

But it does *not* make sense for the case where we've hit a dentry
that is already on the shrink list. Sure, we'll continue to gather all
the other dentries, but if there is concurrent shrinking, shouldn't we
give up the CPU more eagerly - *particularly* if somebody else is
waiting (it might be the other process that actually gets rid of the
shrinking dentries!)?

So my gut feel is that we should at least try doing something like
this in select_collect():

-       if (!list_empty(&data->dispose))
+       if (data->found)
                ret = need_resched() ? D_WALK_QUIT : D_WALK_NORETRY;

because even if we haven't actually been able to shrink something, if
we hit an already shrinking entry we should probably at least not do
the "retry for rename". And if we actually are going to reschedule, we
might as well start from the beginning.

I realize that *this* thread might not be making any actual progress
(because it didn't find any dentries to shrink), but since it did find
_a_ dentry that is being shrunk, we know the operation itself - on a
bigger scale - is making progress.

Hmm?
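The difference between the old and the proposed stop condition can be modelled in userspace. This is a sketch under stated assumptions, not fs/dcache.c: `struct toy_select_data` and `toy_select_collect()` are hypothetical, every visited dentry is assumed to be unused (so it either goes on our own dispose list or is already on someone else's shrink list), and only the final decision of select_collect() is modelled. The `use_found` flag selects the old rule (`!list_empty(&data->dispose)`) or the proposed one (`data->found`).

```c
#include <stdbool.h>

/* Walk-control values; names follow the D_WALK_* values quoted in
 * the thread (only the three needed here). */
enum d_walk_ret { D_WALK_CONTINUE, D_WALK_QUIT, D_WALK_NORETRY };

/* Hypothetical stand-in for struct select_data. */
struct toy_select_data {
	int dispose_len; /* dentries we moved to our own dispose list */
	int found;       /* those, plus ones already on another shrink list */
};

/*
 * Hypothetical model of select_collect() for one unused dentry.
 * use_found == false: old rule, stop once our dispose list is non-empty.
 * use_found == true:  proposed rule, stop once anything was found.
 */
static enum d_walk_ret toy_select_collect(struct toy_select_data *data,
					  bool on_other_shrink_list,
					  bool resched_pending,
					  bool use_found)
{
	bool stop;

	if (!on_other_shrink_list)
		data->dispose_len++;    /* ours to free */
	data->found++;                  /* counted either way */

	stop = use_found ? data->found > 0 : data->dispose_len > 0;
	if (stop)
		return resched_pending ? D_WALK_QUIT : D_WALK_NORETRY;
	return D_WALK_CONTINUE;
}
```

The two rules only disagree in the case discussed above: everything seen so far is already being shrunk by someone else, so our dispose list is empty while `found` is not. The old rule keeps walking (and may retry for a rename); the proposed one stops.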

Now, this is independent of the fact that we probably do need a
cond_resched() in shrink_dcache_parent(), to actually do the
reschedule if we're not preemptible. The "need_resched()" in
select_collect() is obviously done while holding

HOWEVER. Even in that case, I don't think shrink_dcache_parent() is
the right point. I'd rather just do it differently in
shrink_dentry_list(): do it even for the empty list case by just doing
it at the top of the loop:

 static void shrink_dentry_list(struct list_head *list)
 {
-       while (!list_empty(list)) {
+       while (cond_resched(), !list_empty(list)) {
                struct dentry *dentry, *parent;

-               cond_resched();

so my full patch that I would suggest might be TheRightThing(tm) is
attached (but it should be committed as two patches, since the two
issues are independent - I'm just attaching it as one for testing in
case somebody wants to run some nasty workloads on it)

Comments?

Side note: I think we might want to make that

    while (cond_resched(), <condition>) {
        ....
    }

thing a pattern for doing cond_resched() in loops, instead of having
the cond_resched() inside the loop itself.

It not only handles the "zero iterations" case, it also ends up being
neutral location-wise wrt 'continue' statements, and potentially
generates *better* code.

For example, in this case, doing the cond_resched() at the very top of
the loop means that the loop itself then does that

                dentry = list_entry(list->prev, struct dentry, d_lru);

right after the "list_empty()" test - which means that register
allocation etc. might be easier, because it doesn't have a function
call (with associated register clobbers) in between the two accesses
to "list".

And I think that might be a fairly common pattern - the loop
conditional uses the same values as the loop itself then uses.

I don't know. Maybe I'm just making excuses for the somewhat unusual syntax.
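The `while (cond_resched(), <condition>)` idiom relies on the C comma operator: the left operand is evaluated and discarded, then the condition decides whether to iterate. This is a userspace illustration only; `fake_cond_resched()` and `sum_odd()` are hypothetical stand-ins that count how often the reschedule point is reached.

```c
#include <stddef.h>

/* Hypothetical stand-in for cond_resched(): only counts calls. */
static int resched_calls;

static void fake_cond_resched(void)
{
	resched_calls++;
}

/*
 * Sum the odd entries, skipping even ones with 'continue'. The
 * reschedule point lives in the loop condition, so it runs before
 * every evaluation of the test, including the first (possibly only)
 * one, and a 'continue' goes through it like any other iteration.
 */
static int sum_odd(const int *v, size_t n)
{
	size_t i = 0;
	int sum = 0;

	while (fake_cond_resched(), i < n) {
		int x = v[i++];

		if (x % 2 == 0)
			continue;
		sum += x;
	}
	return sum;
}
```

For four elements the reschedule point is reached five times (four iterations plus the final failing test); for an empty input it is still reached once. Had the call been the first statement of the loop body instead, the empty case would never reach it, which is the "zero iterations" point made above.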

Anybody want to test this out?

                   Linus

Comments

Al Viro April 14, 2018, 8:58 p.m. UTC | #1
On Sat, Apr 14, 2018 at 09:36:23AM -0700, Linus Torvalds wrote:
> But it does *not* make sense for the case where we've hit a dentry
> that is already on the shrink list. Sure, we'll continue to gather all
> the other dentries, but if there is concurrent shrinking, shouldn't we
> give up the CPU more eagerly - *particularly* if somebody else is
> waiting (it might be the other process that actually gets rid of the
> shrinking dentries!)?
> 
> So my gut feel is that we should at least try doing something like
> this in select_collect():
> 
> -       if (!list_empty(&data->dispose))
> +       if (data->found)
>                 ret = need_resched() ? D_WALK_QUIT : D_WALK_NORETRY;
> 
> because even if we haven't actually been able to shrink something, if
> we hit an already shrinking entry we should probably at least not do
> the "retry for rename". And if we actually are going to reschedule, we
> might as well start from the beginning.
> 
> I realize that *this* thread might not be making any actual progress
> (because it didn't find any dentries to shrink), but since it did find
> _a_ dentry that is being shrunk, we know the operation itself - on a
> bigger scale - is making progress.
> 
> Hmm?

That breaks d_invalidate(), unfortunately.  Look at the termination
conditions in the loop there...

Linus Torvalds April 14, 2018, 9:47 p.m. UTC | #2
On Sat, Apr 14, 2018 at 1:58 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> That breaks d_invalidate(), unfortunately.  Look at the termination
> conditions in the loop there...

Ugh. I was going to say "but that doesn't even use select_collect()",
but yeah, detach_and_collect() calls it.

It would be easy enough to just change the

                if (!list_empty(&data.select.dispose))

there to

                if (!list_empty(&data.select.found))

too.

In fact, it probably *should* do that, exactly to get the whole
"cond_resched()" call in that whole call chain too. Because as-is, it
looks like it has the same issue as shrink_dcache_parent() does..

But yeah, the fact that I didn't notice that makes me a bit nervous.
But I've now triple-checked: there are no other indirect callers.

            Linus

Al Viro April 15, 2018, 12:51 a.m. UTC | #3
On Sat, Apr 14, 2018 at 02:47:21PM -0700, Linus Torvalds wrote:
> On Sat, Apr 14, 2018 at 1:58 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > That breaks d_invalidate(), unfortunately.  Look at the termination
> > conditions in the loop there...
> 
> Ugh. I was going to say "but that doesn't even use select_collect()",
> but yeah, detach_and_collect() calls it.
> 
> It would be easy enough to just change the
> 
>                 if (!list_empty(&data.select.dispose))
> 
> there to
> 
>                 if (!list_empty(&data.select.found))
> 
> too.

You would have to do the same in check_and_drop() as well,
and that brings back d_invalidate()/d_invalidate() livelock
we used to have.  See 81be24d263db...

I'm trying to put something together, but the damn thing is
full of potential livelocks, unfortunately ;-/  Will send
a followup once I have something resembling a sane solution...

Al Viro April 15, 2018, 2:39 a.m. UTC | #4
On Sun, Apr 15, 2018 at 01:51:07AM +0100, Al Viro wrote:
> On Sat, Apr 14, 2018 at 02:47:21PM -0700, Linus Torvalds wrote:
> > On Sat, Apr 14, 2018 at 1:58 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > That breaks d_invalidate(), unfortunately.  Look at the termination
> > > conditions in the loop there...
> > 
> > Ugh. I was going to say "but that doesn't even use select_collect()",
> > but yeah, detach_and_collect() calls it.
> > 
> > It would be easy enough to just change the
> > 
> >                 if (!list_empty(&data.select.dispose))
> > 
> > there to
> > 
> >                 if (!list_empty(&data.select.found))
> > 
> > too.
> 
> You would have to do the same in check_and_drop() as well,
> and that brings back d_invalidate()/d_invalidate() livelock
> we used to have.  See 81be24d263db...
> 
> I'm trying to put something together, but the damn thing is
> full of potential livelocks, unfortunately ;-/  Will send
> a followup once I have something resembling a sane solution...

I really wonder if we should just do the following in
d_invalidate():
	* grab ->d_lock on victim, check if it's unhashed,
unlock and bugger off if it is.  Otherwise, unhash and unlock.
From that point on any d_set_mounted() in the subtree will
fail.
	* shrink_dcache_parent() to reduce the subtree size.
	* go through the (hopefully shrunk) subtree, picking
mountpoints.  detach_mounts() for each of them.
	* shrink_dcache_parent() if any points had been
encountered, to kick the now-unpinned stuff.

As a side benefit, we could probably be gentler on rename_lock
in d_set_mounted() after that change...

Linus Torvalds April 15, 2018, 6:34 p.m. UTC | #5
On Sat, Apr 14, 2018 at 5:51 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>                 if (!list_empty(&data.select.found))

That was obviously meant to be just

                if (data.select.found)

I had just cut-and-pasted a bit too much.

> You would have to do the same in check_and_drop() as well,
> and that brings back d_invalidate()/d_invalidate() livelock
> we used to have.  See 81be24d263db...

Ugh. These are all really incestuous and very intertwined. Yes.

> I'm trying to put something together, but the damn thing is
> full of potential livelocks, unfortunately ;-/  Will send
> a followup once I have something resembling a sane solution...

Ok, that patch of yours looks like a nice cleanup, although *please*
don't do this:

-       struct detach_data *data = _data;
-
        if (d_mountpoint(dentry)) {
                __dget_dlock(dentry);
-               data->mountpoint = dentry;
+               *(struct dentry **)_data = dentry;

Please keep the temporary variable, and make it do

+       struct dentry **victim = _victim;
...
+               *victim = dentry;

to kind of match the caller, which does

                d_walk(dentry, &victim, find_submount);

because I abhor those casts inside code, and we have a pattern of
passing 'void *_xyz' to callback functions and then making the right
type by that kind of

        struct right_type *xyz = _xyz;

at the very top of the function.

No, it's obviously not type-safe, but at least it's _legible_, and is
a pattern, while that "let's randomly just do a cast in the middle of
the code" is just nasty.
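The callback convention described above (an untyped `void *_xyz` parameter converted once, at the top of the function, to a local of the right type) might look like this in a self-contained userspace sketch. `walk()`, `find_negative()` and `enum walk_ret` are hypothetical simplifications of d_walk() and find_submount(); a negative value plays the role of a mountpoint.

```c
#include <stddef.h>

/* Hypothetical walk-control values modelled on the D_WALK_* names. */
enum walk_ret { WALK_CONTINUE, WALK_QUIT };

/* Hypothetical walker: calls fn(data, item) on each item until the
 * callback returns WALK_QUIT. */
static void walk(const int *items, size_t n, void *data,
		 enum walk_ret (*fn)(void *, const int *))
{
	for (size_t i = 0; i < n; i++)
		if (fn(data, &items[i]) == WALK_QUIT)
			return;
}

/* Callback in the preferred style: the untyped argument carries a
 * leading underscore and is converted once, at the very top, to a
 * local with the real type. No casts in the body. */
static enum walk_ret find_negative(void *_victim, const int *item)
{
	const int **victim = _victim;

	if (*item < 0) {
		*victim = item;
		return WALK_QUIT;
	}
	return WALK_CONTINUE;
}
```

The caller mirrors the one in the thread: `const int *victim = NULL; walk(items, n, &victim, find_negative);` and then tests `victim`.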

Side note: I do feel like "d_walk()" should be returning whether it
terminated early or not. For example, this very same code in the
caller does

+               struct dentry *victim = NULL;
+               d_walk(dentry, &victim, find_submount);
+               if (!victim) {

but in many ways it would be more natural to just check the exit condition, and

+               struct dentry *victim;
+               if (!d_walk(dentry, &victim, find_submount)) {

don't you think? Because that matches the actual setting condition in
the find_submount() callback.

There are other situations where the same thing is true: that
path_check_mount() currently has that "info->mounted" flag, but again,
it could be replaced by just checking what the quit condition was, and
whether we terminated early or not. Because the two are 100%
equivalent, and the return value in many ways would be more logical, I
feel.

(I'm not sure if we should just return the actual exit condition -
defaulting to D_WALK_CONTINUE if there was nothing to walk at all - or
whether we should just return a boolean for "terminated early")
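A sketch of the boolean variant, assuming a hypothetical walker `walk_early()` that returns true when a callback stopped the walk and false when it ran to completion or had nothing to visit. The hypothetical `has_submounts()` shows how a `path_check_mount()`-style flag in the private data could become unnecessary. Returning the actual exit condition instead would work the same way, with the enum in place of `bool`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical walk-control values modelled on the D_WALK_* names. */
enum walk_ret { WALK_CONTINUE, WALK_QUIT };

/* Hypothetical walker that reports how it ended: true if a callback
 * asked to quit, false if it ran to completion or had nothing to walk. */
static bool walk_early(const int *items, size_t n, void *data,
		       enum walk_ret (*fn)(void *, const int *))
{
	for (size_t i = 0; i < n; i++)
		if (fn(data, &items[i]) == WALK_QUIT)
			return true;
	return false;
}

/* Hypothetical callback: quit at the first "mountpoint" (modelled as a
 * negative value). It needs no private data, so no flag to set. */
static enum walk_ret check_mount(void *_unused, const int *item)
{
	(void)_unused;
	return *item < 0 ? WALK_QUIT : WALK_CONTINUE;
}

/* "Terminated early" and "found a mountpoint" are the same thing here,
 * so the walker's return value is the answer. */
static bool has_submounts(const int *items, size_t n)
{
	return walk_early(items, n, NULL, check_mount);
}
```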

Hmm?

                    Linus

Al Viro April 15, 2018, 8:40 p.m. UTC | #6
On Sun, Apr 15, 2018 at 11:34:17AM -0700, Linus Torvalds wrote:

> No, it's obviously not type-safe, but at least it's _legible_, and is
> a pattern, while that "let's randomly just do a cast in the middle of
> the code" is just nasty.

Sure, no problem...  I really wish there was a way to say

void foo(int (*f)(α *), α *data) ∀ α

and have the compiler verify that foo(f, v) is done only
when f(v) is well-typed, but that's C, not Haskell...  The best
approximation is something along the lines of

void __foo(int (*f)(void *), void *data);
#define foo(f, v) (sizeof((f)((v)), 0), __foo((f),(v)))

and that relies upon the identical calling sequence for all pointer
arguments.  AFAIK, it's true for all ABIs we support, but...
Worse, there's no way to get #define in macro expansion, so the
above would be impossible to hide behind anything convenient ;-/
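Here is a compilable version of the approximation above, as a userspace sketch. `foo_untyped()` plays the role of `__foo`, and `struct counter` and `bump()` are hypothetical. The `sizeof` operand is never evaluated, so `f(v)` is only type-checked. The explicit cast to `int (*)(void *)` spells out the ABI assumption from the mail: calling a function through a pointer of a different type is undefined in ISO C and only works because pointer arguments are passed identically on the targets in question.

```c
/* Untyped worker, playing the role of __foo in the mail. */
static int foo_untyped(int (*f)(void *), void *data)
{
	return f(data);
}

/*
 * Type-checking front end. sizeof() makes the compiler check that
 * f(v) is well-typed without evaluating it; the call itself goes
 * through the untyped worker. The cast relies on all pointer
 * arguments being passed the same way (undefined in ISO C).
 */
#define foo(f, v) ((void)sizeof((f)(v)), foo_untyped((int (*)(void *))(f), (v)))

/* Hypothetical callback with a concrete argument type. */
struct counter { int n; };

static int bump(struct counter *c)
{
	return ++c->n;
}
```

Passing, say, an `int *` as `v` makes the compiler diagnose an incompatible pointer type inside `sizeof`, and `bump()` runs exactly once per `foo()`. Because a macro cannot emit `#define`, each such wrapper still has to be written by hand.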

> Side note: I do feel like "d_walk()" should be returning whether it
> terminated early or not. For example, this very same code in the
> caller does
> 
> +               struct dentry *victim = NULL;
> +               d_walk(dentry, &victim, find_submount);
> +               if (!victim) {
> 
> but in many ways it would be more natural to just check the exit condition, and
> 
> +               struct dentry *victim;
> +               if (!d_walk(dentry, &victim, find_submount)) {
> 
> don't you think? Because that matches the actual setting condition in
> the find_submount() callback.
> 
> There are other situations where the same thing is true: that
> path_check_mount() currently has that "info->mounted" flag, but again,
> it could be replaced by just checking what the quit condition was, and
> whether we terminated early or not. Because the two are 100%
> equivalent, and the return value in many ways would be more logical, I
> feel.
> 
> (I'm not sure if we should just return the actual exit condition -
> defaulting to D_WALK_CONTINUE if there was nothing to walk at all - or
> whether we should just return a boolean for "terminated early")
> 
> Hmm?

Not sure...   There are 5 callers:
	* do_one_tree(), d_genocide() - nothing to return
	* path_has_submounts(), d_invalidate() - could use your trick,
but d_invalidate() wants to look at victim if not buggering off, so
that one doesn't win much
	* shrink_dcache_parent() - no way to use that.  Here we normally
run the walk to completion and need to repeat it in all cases of early
termination *and* in some of the ran-to-completion cases.

BTW, the current placement of cond_resched() looks bogus; suppose we
have collected a lot of victims and ran into need_resched().  We leave
d_walk() and call shrink_dentry_list().  At that point there's a lot
of stuff on our shrink list and anybody else running into them will
have to keep scanning.  Giving up the timeslice before we take care
of any of those looks like a bad idea, to put it mildly, and that's
precisely what will happen.

What about doing that in the end of __dentry_kill() instead?  And to
hell with both existing call sites - dput() one (before going to
the parent) is obviously covered by that (dentry_kill() only returns
non-NULL after having called __dentry_kill()) and in shrink_dentry_list()
we'll get to it as soon as we go through all dentries that can be
immediately kicked off the shrink list.  Which, AFAICS, improves the
situation, now that shrink_lock_dentry() contains no trylock loops...

Comments?

Patch

 fs/dcache.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 86d2de63461e..76507109cbcd 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1049,11 +1049,9 @@ static bool shrink_lock_dentry(struct dentry *dentry)
 
 static void shrink_dentry_list(struct list_head *list)
 {
-	while (!list_empty(list)) {
+	while (cond_resched(), !list_empty(list)) {
 		struct dentry *dentry, *parent;
 
-		cond_resched();
-
 		dentry = list_entry(list->prev, struct dentry, d_lru);
 		spin_lock(&dentry->d_lock);
 		rcu_read_lock();
@@ -1462,7 +1460,7 @@ static enum d_walk_ret select_collect(void *_data, struct dentry *dentry)
 	 * ensures forward progress). We'll be coming back to find
 	 * the rest.
 	 */
-	if (!list_empty(&data->dispose))
+	if (data->found)
 		ret = need_resched() ? D_WALK_QUIT : D_WALK_NORETRY;
 out:
 	return ret;