Message ID: 4FA345DA4F4AE44899BD2B03EEEC2FA911993608@SACEXCMBX04-PRD.hq.netapp.com (mailing list archive)
State: New, archived
> My main interest is always the upstream (Linus) kernel, however the RPC
> client in the CentOS 6.3 kernel does actually contain a lot of code that
> was recently backported from upstream. As such, it is definitely of
> interest to figure out corner case bugs so that we can compare to
> upstream...

Ok, great.  I will try this version of the patch as well.  However, when
just thinking about this, I'm concerned that the race still exists, but is
just less likely to manifest.

I imagine something like this happening.  I assume there is some reason
this can't happen that I'm not seeing?  These are functions from Linus's
current git, not the CentOS 6.3 code:

thread 1                                thread 2
--------                                --------
__rpc_execute                           __rpc_execute
...                                     ...
call_reserve
  xprt_reserve
    xprt_lock_and_alloc_slot
      xprt_lock_write
        xprt_reserve_xprt
...
xprt_release_write
                                        call_reserve
                                          xprt_reserve
                                            xprt_lock_and_alloc_slot
                                              xprt_lock_write
                                                xprt_reserve_xprt
                                                  rpc_sleep_on_priority
                                                    __rpc_sleep_on_priority
                                                      __rpc_add_wait_queue
                                                        __rpc_add_wait_queue_priority
                                        (Now on the sending wait queue)
xs_tcp_release_xprt
  xprt_release_xprt
    xprt_clear_locked
      __xprt_lock_write_next
        rpc_wake_up_first
          __rpc_find_next_queued
            __rpc_find_next_queued_priority
...
(has now pulled thread 2 off the wait queue)
                                        out_of_line_wait_on_bit
                                        (receive SIGKILL)
                                        rpc_wait_bit_killable
                                        rpc_exit
                                          rpc_exit_task
                                        rpc_release_task
                                        (doesn't release xprt b/c it isn't
                                         listed in snd_task yet)
                                        (returns from __rpc_execute)
__xprt_lock_write_func
(thread 2 now has the transport locked)
rpc_wake_up_task_queue_locked
  __rpc_remove_wait_queue
    __rpc_remove_wait_queue_priority
(continues on, potentially exiting early,
 potentially blocking the next time it
 needs the transport)

With the patch we're discussing, it would fix the case where thread 2 is
breaking out of the FSM loop after having been given the transport lock.
But what about the above?  Is there something else synchronizing things?

In my testing so far with my (not quite right) amended v3 of the patch,
the timing has become such that I was having trouble reproducing the
problem while attempting to instrument things with systemtap.  However,
without systemtap running, I'm still able to reproduce the hang pretty
easily.
On Tue, 2013-01-08 at 16:01 -0500, Chris Perl wrote:
> Ok, great.  I will try this version of the patch as well.  However, when
> just thinking about this, I'm concerned that the race still exists, but
> is just less likely to manifest.
>
> I imagine something like this happening.  I assume there is some reason
> this can't happen that I'm not seeing?  These are functions from Linus's
> current git, not the CentOS 6.3 code:
>
> [... call graph snipped; see previous message ...]
>
> With the patch we're discussing, it would fix the case where thread 2 is
> breaking out of the FSM loop after having been given the transport lock.
> But what about the above?  Is there something else synchronizing things?

Hi Chris,

I'm not sure I understand how the above would work.  rpc_exit_task() will
remove the rpc_task from the xprt->sending rpc_wait_queue, at which point
__xprt_lock_write_func() can no longer find it.

> In my testing so far with my (not quite right) amended v3 of the patch,
> the timing has become such that I was having trouble reproducing the
> problem while attempting to instrument things with systemtap.  However,
> without systemtap running, I'm still able to reproduce the hang pretty
> easily.

Is it really the same hang?  Does 'echo 0 >/proc/sys/sunrpc/rpc_debug'
show you a similar list of rpc tasks waiting on xprt->sending?
On Tue, Jan 08, 2013 at 09:13:45PM +0000, Myklebust, Trond wrote:
> On Tue, 2013-01-08 at 16:01 -0500, Chris Perl wrote:
> > [... call graph snipped; see earlier in the thread ...]
>
> I'm not sure I understand how the above would work.  rpc_exit_task()
> will remove the rpc_task from the xprt->sending rpc_wait_queue, at which
> point __xprt_lock_write_func() can no longer find it.

I'm imagining that thread 1 pulls thread 2 off the sending wait queue
almost right after it was added, but before thread 2 has a chance to
execute rpc_exit_task.  Now that thread 1 has a reference to thread 2, it
proceeds to give it the lock on the transport, but before it does so,
thread 2 completely finishes executing and returns from __rpc_execute.

I'm probably missing something.

> Is it really the same hang?  Does 'echo 0 >/proc/sys/sunrpc/rpc_debug'
> show you a similar list of rpc tasks waiting on xprt->sending?

I'm not certain.  It's possible that it's not.  I'm recompiling with your
v4 of the patch right now and will test shortly.  I didn't know about
echoing 0 out to that file, good to know! :)
On Tue, 2013-01-08 at 16:23 -0500, Chris Perl wrote:
> On Tue, Jan 08, 2013 at 09:13:45PM +0000, Myklebust, Trond wrote:
> > [... call graph snipped; see earlier in the thread ...]
>
> I'm imagining that thread 1 pulls thread 2 off the sending wait queue
> almost right after it was added, but before thread 2 has a chance to
> execute rpc_exit_task.  Now that thread 1 has a reference to thread 2,
> it proceeds to give it the lock on the transport, but before it does so,
> thread 2 completely finishes executing and returns from __rpc_execute.
>
> I'm probably missing something.

The lock is associated with the rpc_task.  Threads can normally only
access an rpc_task when it is on a wait queue (while holding the wait
queue lock) unless they are given ownership of the rpc_task.

IOW: the scenario you describe should not be possible, since it relies on
thread 1 assigning the lock to the rpc_task after it has been removed from
the wait queue.

> > Is it really the same hang?  Does 'echo 0 >/proc/sys/sunrpc/rpc_debug'
> > show you a similar list of rpc tasks waiting on xprt->sending?
>
> I'm not certain.  It's possible that it's not.  I'm recompiling with
> your v4 of the patch right now and will test shortly.  I didn't know
> about echoing 0 out to that file, good to know! :)

That would be great.  If you are recompiling the kernel, perhaps you can
also add in a patch to rpc_show_tasks() to display the current value of
clnt->cl_xprt->snd_task?
> The lock is associated with the rpc_task.  Threads can normally only
> access an rpc_task when it is on a wait queue (while holding the wait
> queue lock) unless they are given ownership of the rpc_task.
>
> IOW: the scenario you describe should not be possible, since it relies
> on thread 1 assigning the lock to the rpc_task after it has been removed
> from the wait queue.

Hrm.  I guess I'm in over my head here.  Apologies if I'm just asking
silly bumbling questions.  You can start ignoring me at any time. :)

I was talking about setting (or leaving set) the XPRT_LOCKED bit in
rpc_xprt->state.  By "assigning the lock" I really just mean that thread 1
leaves XPRT_LOCKED set in rpc_xprt->state and sets rpc_xprt->snd_task to
thread 2.

> If you are recompiling the kernel, perhaps you can also add in a patch
> to rpc_show_tasks() to display the current value of
> clnt->cl_xprt->snd_task?

Sure.  This is what 'echo 0 > /proc/sys/sunrpc/rpc_debug' shows after the
hang (with my extra prints):

# cat /proc/kmsg
...
<6>client: ffff88082b6c9c00, xprt: ffff880824aef800, snd_task: ffff881029c63ec0
<6>client: ffff88082b6c9e00, xprt: ffff880824aef800, snd_task: ffff881029c63ec0
<6>-pid- flgs status -client- --rqstp- -timeout ---ops--
<6>18091 0080 -11 ffff88082b6c9e00 (null) ffff0770ns3 ACCESS a:call_reserveresult q:xprt_sending
<6>client: ffff88082a244600, xprt: ffff88082a343000, snd_task: (null)
<6>client: ffff880829181600, xprt: ffff88082a343000, snd_task: (null)
<6>client: ffff880828170200, xprt: ffff880824aef800, snd_task: ffff881029c63ec0
On Tue, Jan 08, 2013 at 05:16:51PM -0500, Chris Perl wrote:
> [... previous message snipped ...]

Sorry, that output was a little messed up.  Here it is again:

<6>client: ffff88082b6c9c00, xprt: ffff880824aef800, snd_task: ffff881029c63ec0
<6>client: ffff88082b6c9e00, xprt: ffff880824aef800, snd_task: ffff881029c63ec0
<6>-pid- flgs status -client- --rqstp- -timeout ---ops--
<6>18091 0080 -11 ffff88082b6c9e00 (null) 0 ffffffffa027b7e0 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
<6>client: ffff88082a244600, xprt: ffff88082a343000, snd_task: (null)
<6>client: ffff880829181600, xprt: ffff88082a343000, snd_task: (null)
<6>client: ffff880828170200, xprt: ffff880824aef800, snd_task: ffff881029c63ec0
From 87ed50036b866db2ec2ba16b2a7aec4a2b0b7c39 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Mon, 7 Jan 2013 14:30:46 -0500
Subject: [PATCH v4] SUNRPC: Ensure we release the socket write lock if the rpc_task exits early

If the rpc_task exits while holding the socket write lock before it has
allocated an rpc slot, then the usual mechanism for releasing the write
lock in xprt_release() is defeated.

The problem occurs if the call to xprt_lock_write() initially fails, so
that the rpc_task is put on the xprt->sending wait queue.  If the task
exits after being assigned the lock by __xprt_lock_write_func, but
before it has retried the call to xprt_lock_and_alloc_slot(), then it
calls xprt_release() while holding the write lock, but will immediately
exit due to the test for task->tk_rqstp != NULL.

Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.1]
---
 net/sunrpc/sched.c |  3 +--
 net/sunrpc/xprt.c  | 12 ++++++++++--
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index b4133bd..bfa3171 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -972,8 +972,7 @@ static void rpc_async_release(struct work_struct *work)
 
 static void rpc_release_resources_task(struct rpc_task *task)
 {
-	if (task->tk_rqstp)
-		xprt_release(task);
+	xprt_release(task);
 	if (task->tk_msg.rpc_cred) {
 		put_rpccred(task->tk_msg.rpc_cred);
 		task->tk_msg.rpc_cred = NULL;
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index bd462a5..33811db 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1136,10 +1136,18 @@ static void xprt_request_init(struct rpc_task *task, struct rpc_xprt *xprt)
 void xprt_release(struct rpc_task *task)
 {
 	struct rpc_xprt *xprt;
-	struct rpc_rqst *req;
+	struct rpc_rqst *req = task->tk_rqstp;
 
-	if (!(req = task->tk_rqstp))
+	if (req == NULL) {
+		if (task->tk_client) {
+			rcu_read_lock();
+			xprt = rcu_dereference(task->tk_client->cl_xprt);
+			if (xprt->snd_task == task)
+				xprt_release_write(xprt, task);
+			rcu_read_unlock();
+		}
 		return;
+	}
 
 	xprt = req->rq_xprt;
 	if (task->tk_ops->rpc_count_stats != NULL)
-- 
1.7.11.7