[RFC] sunrpc: do not allow process to freeze within RPC state machine
diff mbox

Message ID 20160803165412.22407.47399.stgit@localhost.localdomain
State Under Review
Delegated to: Trond Myklebust
Headers show

Commit Message

Stanislav Kinsburskiy Aug. 3, 2016, 4:54 p.m. UTC
Otherwise freezer cgroup state might never become "FROZEN".

Here is a deadlock scheme for 2 processes in one freezer cgroup, which is
freezing:

CPU 0                                   CPU 1
--------                                --------
do_last
inode_lock(dir->d_inode)
vfs_create
nfs_create
...
__rpc_execute
rpc_wait_bit_killable
__refrigerator
                                        do_last
                                        inode_lock(dir->d_inode)

So, the problem is that one process takes directory inode mutex, executes
creation request and goes to refrigerator.
Another one waits till directory lock is released, remains "thawed" and thus
freezer cgroup state never becomes "FROZEN".

Notes:
1) Interesting, that this is not a pure deadlock: one can thaw cgroup and then
freeze it again.
2) The issue was introduced by commit d310310cbff18ec385c6ab4d58f33b100192a96a.
3) This patch is not aimed to fix the issue, but to show the problem root.
Look like this problem moght be applicable to other hunks from the commit,
mentioned above.


Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
---
 net/sunrpc/sched.c |    1 -
 1 file changed, 1 deletion(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Stanislav Kinsburskiy Aug. 4, 2016, 10:55 a.m. UTC | #1
03.08.2016 19:36, Jeff Layton пишет:
> On Wed, 2016-08-03 at 20:54 +0400, Stanislav Kinsburskiy wrote:
>> Otherwise freezer cgroup state might never become "FROZEN".
>>
>> Here is a deadlock scheme for 2 processes in one freezer cgroup,
>> which is
>> freezing:
>>
>> CPU 0                                   CPU 1
>> --------                                --------
>> do_last
>> inode_lock(dir->d_inode)
>> vfs_create
>> nfs_create
>> ...
>> __rpc_execute
>> rpc_wait_bit_killable
>> __refrigerator
>>                                          do_last
>>                                          inode_lock(dir->d_inode)
>>
>> So, the problem is that one process takes directory inode mutex,
>> executes
>> creation request and goes to refrigerator.
>> Another one waits till directory lock is released, remains "thawed"
>> and thus
>> freezer cgroup state never becomes "FROZEN".
>>
>> Notes:
>> 1) Interesting, that this is not a pure deadlock: one can thaw cgroup
>> and then
>> freeze it again.
>> 2) The issue was introduced by commit
>> d310310cbff18ec385c6ab4d58f33b100192a96a.
>> 3) This patch is not aimed to fix the issue, but to show the problem
>> root.
>> Look like this problem moght be applicable to other hunks from the
>> commit,
>> mentioned above.
>>
>>
>> Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
>> ---
>>   net/sunrpc/sched.c |    1 -
>>   1 file changed, 1 deletion(-)
>>
>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>> index 9ae5885..ec7ccc1 100644
>> --- a/net/sunrpc/sched.c
>> +++ b/net/sunrpc/sched.c
>> @@ -253,7 +253,6 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
>>   
>>   static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
>>   {
>> -	freezable_schedule_unsafe();
>>   	if (signal_pending_state(mode, current))
>>   		return -ERESTARTSYS;
>>   	return 0;
>>
> Ummm...so what actually does the schedule() with this patch?

Schedule() replaces freezable_schedule_unsafe() of course, sorry for this.

> There was a bit of discussion on this recently -- see the thread with
> this subject line in linux-nfs:
>
>      Re: Hang due to nfs letting tasks freeze with locked inodes

Thanks, had a look.

> Basically it comes down to this:
>
> All of the proposals so far to fix this problem just switch out the
> freezable_schedule_unsafe (and similar) calls for those that don't
> allow the process to freeze.
>
> The problem there is that we originally added that stuff in response to
> bug reports about machines failing to suspend. What often happens is
> that the network interfaces come down, and then the freezer runs over
> all of the processes, which never return because they're blocked
> waiting on the server to reply.

I probably don't understand something, but this sounds somewhat wrong to 
me: freezing processes _after_ network is down.


>
> ...shrug...
>
> Maybe we should just go ahead and do it (and to CIFS as well). Just be
> prepared for the inevitable complaints about laptops failing to suspend
> once you do.

The worst part in all of this, from my POW, is that current behavior
makes NFS non-freezable in a generic case, even in case of freezing a
container, which has it's own net ns and NFS mount.
So, I would say, that returning of previous logic would make the
world better.

> Part of the fix, I think is to add a return code (similar to
> ERESTARTSYS) that gets interpreted near the kernel-userland boundary
> as: "allow the process to be frozen, and then retry the call once it's
> resumed".
>
> With that, filesystems could return the error code when they want to
> redrive the entire syscall from that level. That won't work for non-
> idempotent requests though. We'd need to do something more elaborate
> there.
>

Might be, that breaking rpc request is something that should be avoided 
at all.
With all these locks being held, almost all (any?) of the requests to 
remote server
should be considered as an atomic operation from freezer point of view.
The process always can be frozen on signal handling.

IOW, I might worth considering a scenario, when NFS is not freezable at all,
and any problems with suspend on laptops/whatever have to solved in 
suspend code.


--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 9ae5885..ec7ccc1 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -253,7 +253,6 @@  EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
 
 static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
 {
-	freezable_schedule_unsafe();
 	if (signal_pending_state(mode, current))
 		return -ERESTARTSYS;
 	return 0;