Possible Race Condition on SIGKILL

Message ID 4FA345DA4F4AE44899BD2B03EEEC2FA911993608@SACEXCMBX04-PRD.hq.netapp.com (mailing list archive)
State New, archived

Commit Message

Trond Myklebust Jan. 8, 2013, 7:44 p.m. UTC
On Tue, 2013-01-08 at 13:40 -0500, Chris Perl wrote:
> On Mon, Jan 07, 2013 at 05:00:47PM -0500, Chris Perl wrote:
> Anyway, it appears that on mount the rpc_tasks tk_client member is NULL
> and therefore the double dereference of task->tk_xprt is what blew
> things up.  I amended the patch for this [1] and am testing it
> now.
> 
> Thus far, I've still hit hangs, it just seems to take longer.  I'll have
> to dig in a bit further to see what's going on now.
> 
> Is this CentOS 6.3 kernel/system too old for you guys to care about?
> I.e. should I spend time digging into and reporting problems for this
> system as well, or do you only care about the Fedora system?

My main interest is always the upstream (Linus) kernel, however the RPC
client in the CentOS 6.3 kernel does actually contain a lot of code that
was recently backported from upstream. As such, it is definitely of
interest to figure out corner case bugs so that we can compare to
upstream...

> I'll report back again when I have further info and after testing the
> Fedora system.
> 
> [1] linux-kernel-test.patch

I've attached the latest copy of the patch (v4). In addition to the
check for tk_client!=NULL, it needed a couple of changes to deal with
the RCU code.

Cheers
   Trond

Comments

Chris Perl Jan. 8, 2013, 9:01 p.m. UTC | #1
> My main interest is always the upstream (Linus) kernel, however the RPC
> client in the CentOS 6.3 kernel does actually contain a lot of code that
> was recently backported from upstream. As such, it is definitely of
> interest to figure out corner case bugs so that we can compare to
> upstream...

Ok, great.  I will try this version of the patch as well.  However, when just
thinking about this, I'm concerned that the race still exists, but is just less
likely to manifest.

I imagine something like this happening.  I assume there is some reason this
can't happen that I'm not seeing?  These are functions from linus's current
git, not the CentOS 6.3 code: 

thread 1                                          thread 2
--------                                          --------
__rpc_execute                                     __rpc_execute
...                                               ...
call_reserve
xprt_reserve
xprt_lock_and_alloc_slot
xprt_lock_write
xprt_reserve_xprt
...
xprt_release_write
                                                  call_reserve
						  xprt_reserve
						  xprt_lock_and_alloc_slot
						  xprt_lock_write
						  xprt_reserve_xprt
						  rpc_sleep_on_priority
						  __rpc_sleep_on_priority
						  __rpc_add_wait_queue
						  __rpc_add_wait_queue_priority
						  (Now on the sending wait queue)
xs_tcp_release_xprt
xprt_release_xprt
xprt_clear_locked
__xprt_lock_write_next
rpc_wake_up_first
__rpc_find_next_queued
__rpc_find_next_queued_priority
...
(has now pulled thread 2 off the wait queue)
                                                 out_of_line_wait_on_bit
						 (receive SIGKILL)
						 rpc_wait_bit_killable
						 rpc_exit
						 rpc_exit_task
						 rpc_release_task
						 (doesn't release xprt b/c he isn't listed in snd_task yet)
						 (returns from __rpc_execute)
__xprt_lock_write_func
(thread 2 now has the transport locked)
rpc_wake_up_task_queue_locked
__rpc_remove_wait_queue
__rpc_remove_wait_queue_priority
(continues on, potentially exiting early,
 potentially blocking the next time it needs the transport)

With the patch we're discussing, it would fix the case where thread 2 is
breaking out of the FSM loop after having been given the transport lock.  But
what about the above.  Is there something else synchronizing things?
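
To make the window concrete, here is how I read the release path from
your v4 patch running at that exact point (paraphrased and trimmed by
me, so not the exact source):

	/*
	 * In the scenario above, thread 2 gets here with
	 * tk_rqstp == NULL and *before* thread 1 has pointed
	 * xprt->snd_task at it, so neither branch releases the
	 * transport lock that thread 1 is about to hand over.
	 */
	void xprt_release(struct rpc_task *task)
	{
		struct rpc_rqst *req = task->tk_rqstp;

		if (req == NULL) {
			if (task->tk_client) {
				struct rpc_xprt *xprt;

				rcu_read_lock();
				xprt = rcu_dereference(task->tk_client->cl_xprt);
				if (xprt->snd_task == task)	/* false in this window */
					xprt_release_write(xprt, task);
				rcu_read_unlock();
			}
			return;
		}
		/* ... normal release path ... */
	}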

In my testing so far with my (not quite right) amended v3 of the patch, the
timing has become such that I was having trouble reproducing the problem while
attempting to instrument things with systemtap.  However, without systemtap
running, I'm still able to reproduce the hang pretty easily.
Trond Myklebust Jan. 8, 2013, 9:13 p.m. UTC | #2
On Tue, 2013-01-08 at 16:01 -0500, Chris Perl wrote:
> > My main interest is always the upstream (Linus) kernel, however the RPC
> > client in the CentOS 6.3 kernel does actually contain a lot of code that
> > was recently backported from upstream. As such, it is definitely of
> > interest to figure out corner case bugs so that we can compare to
> > upstream...
> 
> Ok, great.  I will try this version of the patch as well.  However, when just
> thinking about this, I'm concerned that the race still exists, but is just less
> likely to manifest.
> 
> I imagine something like this happening.  I assume there is some reason this
> can't happen that I'm not seeing?  These are functions from linus's current
> git, not the CentOS 6.3 code: 
> 
> thread 1                                          thread 2
> --------                                          --------
> __rpc_execute                                     __rpc_execute
> ...                                               ...
> call_reserve
> xprt_reserve
> xprt_lock_and_alloc_slot
> xprt_lock_write
> xprt_reserve_xprt
> ...
> xprt_release_write
>                                                   call_reserve
> 						  xprt_reserve
> 						  xprt_lock_and_alloc_slot
> 						  xprt_lock_write
> 						  xprt_reserve_xprt
> 						  rpc_sleep_on_priority
> 						  __rpc_sleep_on_priority
> 						  __rpc_add_wait_queue
> 						  __rpc_add_wait_queue_priority
> 						  (Now on the sending wait queue)
> xs_tcp_release_xprt
> xprt_release_xprt
> xprt_clear_locked
> __xprt_lock_write_next
> rpc_wake_up_first
> __rpc_find_next_queued
> __rpc_find_next_queued_priority
> ...
> (has now pulled thread 2 off the wait queue)
>                                                  out_of_line_wait_on_bit
> 						 (receive SIGKILL)
> 						 rpc_wait_bit_killable
> 						 rpc_exit
> 						 rpc_exit_task
> 						 rpc_release_task
> 						 (doesn't release xprt b/c he isn't listed in snd_task yet)
> 						 (returns from __rpc_execute)
> __xprt_lock_write_func
> (thread 2 now has the transport locked)
> rpc_wake_up_task_queue_locked
> __rpc_remove_wait_queue
> __rpc_remove_wait_queue_priority
> (continues on, potentially exiting early,
>  potentially blocking the next time it needs the transport)
> 
> With the patch we're discussing, it would fix the case where thread 2 is
> breaking out of the FSM loop after having been given the transport lock.  But
> what about the above.  Is there something else synchronizing things?

Hi Chris,

I'm not sure I understand how the above would work. rpc_exit_task() will
remove the rpc_task from the xprt->sending rpc_wait_queue, at which
point __xprt_lock_write_func() can no longer find it.

> In my testing so far with my (not quite right) amended v3 of the patch, the
> timing has become such that I was having trouble reproducing the problem while
> attempting to instrument things with systemtap.  However, without systemtap
> running, I'm still able to reproduce the hang pretty easily.

Is it really the same hang? Does 'echo 0 >/proc/sys/sunrpc/rpc_debug'
show you a similar list of rpc tasks waiting on xprt->sending?
Chris Perl Jan. 8, 2013, 9:23 p.m. UTC | #3
On Tue, Jan 08, 2013 at 09:13:45PM +0000, Myklebust, Trond wrote:
> On Tue, 2013-01-08 at 16:01 -0500, Chris Perl wrote:
> > > My main interest is always the upstream (Linus) kernel, however the RPC
> > > client in the CentOS 6.3 kernel does actually contain a lot of code that
> > > was recently backported from upstream. As such, it is definitely of
> > > interest to figure out corner case bugs so that we can compare to
> > > upstream...
> > 
> > Ok, great.  I will try this version of the patch as well.  However, when just
> > thinking about this, I'm concerned that the race still exists, but is just less
> > likely to manifest.
> > 
> > I imagine something like this happening.  I assume there is some reason this
> > can't happen that I'm not seeing?  These are functions from linus's current
> > git, not the CentOS 6.3 code: 
> > 
> > thread 1                                          thread 2
> > --------                                          --------
> > __rpc_execute                                     __rpc_execute
> > ...                                               ...
> > call_reserve
> > xprt_reserve
> > xprt_lock_and_alloc_slot
> > xprt_lock_write
> > xprt_reserve_xprt
> > ...
> > xprt_release_write
> >                                                   call_reserve
> > 						  xprt_reserve
> > 						  xprt_lock_and_alloc_slot
> > 						  xprt_lock_write
> > 						  xprt_reserve_xprt
> > 						  rpc_sleep_on_priority
> > 						  __rpc_sleep_on_priority
> > 						  __rpc_add_wait_queue
> > 						  __rpc_add_wait_queue_priority
> > 						  (Now on the sending wait queue)
> > xs_tcp_release_xprt
> > xprt_release_xprt
> > xprt_clear_locked
> > __xprt_lock_write_next
> > rpc_wake_up_first
> > __rpc_find_next_queued
> > __rpc_find_next_queued_priority
> > ...
> > (has now pulled thread 2 off the wait queue)
> >                                                  out_of_line_wait_on_bit
> > 						 (receive SIGKILL)
> > 						 rpc_wait_bit_killable
> > 						 rpc_exit
> > 						 rpc_exit_task
> > 						 rpc_release_task
> > 						 (doesn't release xprt b/c he isn't listed in snd_task yet)
> > 						 (returns from __rpc_execute)
> > __xprt_lock_write_func
> > (thread 2 now has the transport locked)
> > rpc_wake_up_task_queue_locked
> > __rpc_remove_wait_queue
> > __rpc_remove_wait_queue_priority
> > (continues on, potentially exiting early,
> >  potentially blocking the next time it needs the transport)
> > 
> > With the patch we're discussing, it would fix the case where thread 2 is
> > breaking out of the FSM loop after having been given the transport lock.  But
> > what about the above.  Is there something else synchronizing things?
> 
> I'm not sure I understand how the above would work. rpc_exit_task() will
> remove the rpc_task from the xprt->sending rpc_wait_queue, at which
> point __xprt_lock_write_func() can no longer find it.

I'm imagining that thread 1 pulls thread 2 off the sending wait queue
almost right after it was added, but before thread 2 has a chance to
execute rpc_exit_task.  Now that thread 1 has a reference to
thread 2, it proceeds to give it the lock on the transport, but before
it does so, thread 2 completely finishes executing and returns from
__rpc_execute.

I'm probably missing something.

> Is it really the same hang? Does 'echo 0 >/proc/sys/sunrpc/rpc_debug'
> show you a similar list of rpc tasks waiting on xprt->sending?

I'm not certain.  It's possible that it's not.  I'm recompiling with your
v4 of the patch right now and will test shortly.  I didn't know about
echoing 0 out to that file, good to know! :)

Trond Myklebust Jan. 8, 2013, 9:31 p.m. UTC | #4
On Tue, 2013-01-08 at 16:23 -0500, Chris Perl wrote:
> On Tue, Jan 08, 2013 at 09:13:45PM +0000, Myklebust, Trond wrote:
> > On Tue, 2013-01-08 at 16:01 -0500, Chris Perl wrote:
> > > > My main interest is always the upstream (Linus) kernel, however the RPC
> > > > client in the CentOS 6.3 kernel does actually contain a lot of code that
> > > > was recently backported from upstream. As such, it is definitely of
> > > > interest to figure out corner case bugs so that we can compare to
> > > > upstream...
> > > 
> > > Ok, great.  I will try this version of the patch as well.  However, when just
> > > thinking about this, I'm concerned that the race still exists, but is just less
> > > likely to manifest.
> > > 
> > > I imagine something like this happening.  I assume there is some reason this
> > > can't happen that I'm not seeing?  These are functions from linus's current
> > > git, not the CentOS 6.3 code: 
> > > 
> > > thread 1                                          thread 2
> > > --------                                          --------
> > > __rpc_execute                                     __rpc_execute
> > > ...                                               ...
> > > call_reserve
> > > xprt_reserve
> > > xprt_lock_and_alloc_slot
> > > xprt_lock_write
> > > xprt_reserve_xprt
> > > ...
> > > xprt_release_write
> > >                                                   call_reserve
> > > 						  xprt_reserve
> > > 						  xprt_lock_and_alloc_slot
> > > 						  xprt_lock_write
> > > 						  xprt_reserve_xprt
> > > 						  rpc_sleep_on_priority
> > > 						  __rpc_sleep_on_priority
> > > 						  __rpc_add_wait_queue
> > > 						  __rpc_add_wait_queue_priority
> > > 						  (Now on the sending wait queue)
> > > xs_tcp_release_xprt
> > > xprt_release_xprt
> > > xprt_clear_locked
> > > __xprt_lock_write_next
> > > rpc_wake_up_first
> > > __rpc_find_next_queued
> > > __rpc_find_next_queued_priority
> > > ...
> > > (has now pulled thread 2 off the wait queue)
> > >                                                  out_of_line_wait_on_bit
> > > 						 (receive SIGKILL)
> > > 						 rpc_wait_bit_killable
> > > 						 rpc_exit
> > > 						 rpc_exit_task
> > > 						 rpc_release_task
> > > 						 (doesn't release xprt b/c he isn't listed in snd_task yet)
> > > 						 (returns from __rpc_execute)
> > > __xprt_lock_write_func
> > > (thread 2 now has the transport locked)
> > > rpc_wake_up_task_queue_locked
> > > __rpc_remove_wait_queue
> > > __rpc_remove_wait_queue_priority
> > > (continues on, potentially exiting early,
> > >  potentially blocking the next time it needs the transport)
> > > 
> > > With the patch we're discussing, it would fix the case where thread 2 is
> > > breaking out of the FSM loop after having been given the transport lock.  But
> > > what about the above.  Is there something else synchronizing things?
> > 
> > I'm not sure I understand how the above would work. rpc_exit_task() will
> > remove the rpc_task from the xprt->sending rpc_wait_queue, at which
> > point __xprt_lock_write_func() can no longer find it.
> 
> I'm imagining that thread 1 pulls thread 2 off the sending wait queue
> almost right after it was added, but before thread 2 has a chance to
> execute rpc_exit_task.  Now that thread 1 has a reference to
> thread 2, it proceeds to give it the lock on the transport, but before
> it does so, thread 2 completely finishes executing and returns from
> __rpc_execute.
> 
> I'm probably missing something.

The lock is associated with the rpc_task. Threads can normally only
access an rpc_task when it is on a wait queue (while holding the wait
queue lock) unless they are given ownership of the rpc_task.

IOW: the scenario you describe should not be possible, since it relies
on thread 1 assigning the lock to the rpc_task after it has been removed
from the wait queue.
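
Roughly, the wake-up path looks like this (paraphrasing
rpc_wake_up_first() from your trace, trimmed for clarity, so not the
exact source):

	/*
	 * The waker holds the wait queue's lock while it both assigns
	 * ownership (xprt->snd_task, inside the callback) and removes
	 * the task from xprt->sending.
	 */
	spin_lock_bh(&queue->lock);
	task = __rpc_find_next_queued(queue);
	if (task != NULL) {
		if (func(task, data))	/* e.g. __xprt_lock_write_func() */
			rpc_wake_up_task_queue_locked(queue, task);
		else
			task = NULL;
	}
	spin_unlock_bh(&queue->lock);

An exiting task has to take that same queue lock (rpc_exit() calls
rpc_wake_up_queued_task() to dequeue itself), so either it removes
itself first and is never picked, or it is assigned the lock before it
can even be woken and return from __rpc_execute().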

> > Is it really the same hang? Does 'echo 0 >/proc/sys/sunrpc/rpc_debug'
> > show you a similar list of rpc tasks waiting on xprt->sending?
> 
> I'm not certain.  It's possible that it's not.  I'm recompiling with your
> v4 of the patch right now and will test shortly.  I didn't know about
> echoing 0 out to that file, good to know! :)

That would be great.

If you are recompiling the kernel, perhaps you can also add in a patch
to rpc_show_tasks() to display the current value of
clnt->cl_xprt->snd_task?
Chris Perl Jan. 8, 2013, 10:16 p.m. UTC | #5
> The lock is associated with the rpc_task. Threads can normally only
> access an rpc_task when it is on a wait queue (while holding the wait
> queue lock) unless they are given ownership of the rpc_task.
> 
> IOW: the scenario you describe should not be possible, since it relies
> on thread 1 assigning the lock to the rpc_task after it has been removed
> from the wait queue.

Hrm.  I guess I'm in over my head here.  Apologies if I'm just asking
silly bumbling questions.  You can start ignoring me at any time. :)

I was talking about setting (or leaving set) the XPRT_LOCKED bit in
rpc_xprt->state.  By "assigning the lock" I really just mean that thread
1 leaves XPRT_LOCKED set in rpc_xprt->state and sets rpc_xprt->snd_task
to thread 2.
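
I.e. very roughly (my paraphrase of the xprt_reserve_xprt() logic from
the trace, not the exact code):

	/*
	 * "Holding the transport write lock" = XPRT_LOCKED set in
	 * xprt->state, with xprt->snd_task naming the owner.  On
	 * release the owner may leave XPRT_LOCKED set and point
	 * snd_task at the next queued task instead of clearing it.
	 */
	if (!test_and_set_bit(XPRT_LOCKED, &xprt->state))
		xprt->snd_task = task;	/* got the lock directly */
	else if (xprt->snd_task != task)
		rpc_sleep_on(&xprt->sending, task, NULL);	/* wait for a hand-off */

My worry is about that hand-off landing on a task that has already
returned from __rpc_execute().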

> If you are recompiling the kernel, perhaps you can also add in a patch
> to rpc_show_tasks() to display the current value of
> clnt->cl_xprt->snd_task?

Sure.  This is what 'echo 0 > /proc/sys/sunrpc/rpc_debug' shows after
the hang (with my extra prints):

# cat /proc/kmsg
...
<6>client: ffff88082b6c9c00, xprt: ffff880824aef800, snd_task: ffff881029c63ec0
<6>client: ffff88082b6c9e00, xprt: ffff880824aef800, snd_task: ffff881029c63ec0
<6>-pid- flgs status -client- --rqstp- -timeout ---ops--
<6>18091 0080    -11 ffff88082b6c9e00   (null)      ffff0770ns3 ACCESS a:call_reserveresult q:xprt_sending
<6>client: ffff88082a244600, xprt: ffff88082a343000, snd_task:   (null)
<6>client: ffff880829181600, xprt: ffff88082a343000, snd_task:   (null)
<6>client: ffff880828170200, xprt: ffff880824aef800, snd_task: ffff881029c63ec0
Chris Perl Jan. 8, 2013, 10:19 p.m. UTC | #6
On Tue, Jan 08, 2013 at 05:16:51PM -0500, Chris Perl wrote:
> > The lock is associated with the rpc_task. Threads can normally only
> > access an rpc_task when it is on a wait queue (while holding the wait
> > queue lock) unless they are given ownership of the rpc_task.
> > 
> > IOW: the scenario you describe should not be possible, since it relies
> > on thread 1 assigning the lock to the rpc_task after it has been removed
> > from the wait queue.
> 
> Hrm.  I guess I'm in over my head here.  Apologies if I'm just asking
> silly bumbling questions.  You can start ignoring me at any time. :)
> 
> I was talking about setting (or leaving set) the XPRT_LOCKED bit in
> rpc_xprt->state.  By "assigning the lock" I really just mean that thread
> 1 leaves XPRT_LOCKED set in rpc_xprt->state and sets rpc_xprt->snd_task
> to thread 2.
> 
> > If you are recompiling the kernel, perhaps you can also add in a patch
> > to rpc_show_tasks() to display the current value of
> > clnt->cl_xprt->snd_task?
> 
> Sure.  This is what 'echo 0 > /proc/sys/sunrpc/rpc_debug' shows after
> the hang (with my extra prints):
> 
> # cat /proc/kmsg
> ...
> <6>client: ffff88082b6c9c00, xprt: ffff880824aef800, snd_task: ffff881029c63ec0
> <6>client: ffff88082b6c9e00, xprt: ffff880824aef800, snd_task: ffff881029c63ec0
> <6>-pid- flgs status -client- --rqstp- -timeout ---ops--
> <6>18091 0080    -11 ffff88082b6c9e00   (null)      ffff0770ns3 ACCESS a:call_reserveresult q:xprt_sending
> <6>client: ffff88082a244600, xprt: ffff88082a343000, snd_task:   (null)
> <6>client: ffff880829181600, xprt: ffff88082a343000, snd_task:   (null)
> <6>client: ffff880828170200, xprt: ffff880824aef800, snd_task: ffff881029c63ec0

Sorry, that output was a little messed up.  Here it is again:

<6>client: ffff88082b6c9c00, xprt: ffff880824aef800, snd_task: ffff881029c63ec0
<6>client: ffff88082b6c9e00, xprt: ffff880824aef800, snd_task: ffff881029c63ec0
<6>-pid- flgs status -client- --rqstp- -timeout ---ops--
<6>18091 0080    -11 ffff88082b6c9e00   (null)        0 ffffffffa027b7e0 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
<6>client: ffff88082a244600, xprt: ffff88082a343000, snd_task:   (null)
<6>client: ffff880829181600, xprt: ffff88082a343000, snd_task:   (null)
<6>client: ffff880828170200, xprt: ffff880824aef800, snd_task: ffff881029c63ec0

Patch

From 87ed50036b866db2ec2ba16b2a7aec4a2b0b7c39 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Mon, 7 Jan 2013 14:30:46 -0500
Subject: [PATCH v4] SUNRPC: Ensure we release the socket write lock if the
 rpc_task exits early

If the rpc_task exits while holding the socket write lock before it has
allocated an rpc slot, then the usual mechanism for releasing the write
lock in xprt_release() is defeated.

The problem occurs if the call to xprt_lock_write() initially fails, so
that the rpc_task is put on the xprt->sending wait queue. If the task
exits after being assigned the lock by __xprt_lock_write_func, but
before it has retried the call to xprt_lock_and_alloc_slot(), then
it calls xprt_release() while holding the write lock, but will
immediately exit due to the test for task->tk_rqstp != NULL.

Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.1]
---
 net/sunrpc/sched.c |  3 +--
 net/sunrpc/xprt.c  | 12 ++++++++++--
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index b4133bd..bfa3171 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -972,8 +972,7 @@  static void rpc_async_release(struct work_struct *work)
 
 static void rpc_release_resources_task(struct rpc_task *task)
 {
-	if (task->tk_rqstp)
-		xprt_release(task);
+	xprt_release(task);
 	if (task->tk_msg.rpc_cred) {
 		put_rpccred(task->tk_msg.rpc_cred);
 		task->tk_msg.rpc_cred = NULL;
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index bd462a5..33811db 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1136,10 +1136,18 @@  static void xprt_request_init(struct rpc_task *task, struct rpc_xprt *xprt)
 void xprt_release(struct rpc_task *task)
 {
 	struct rpc_xprt	*xprt;
-	struct rpc_rqst	*req;
+	struct rpc_rqst	*req = task->tk_rqstp;
 
-	if (!(req = task->tk_rqstp))
+	if (req == NULL) {
+		if (task->tk_client) {
+			rcu_read_lock();
+			xprt = rcu_dereference(task->tk_client->cl_xprt);
+			if (xprt->snd_task == task)
+				xprt_release_write(xprt, task);
+			rcu_read_unlock();
+		}
 		return;
+	}
 
 	xprt = req->rq_xprt;
 	if (task->tk_ops->rpc_count_stats != NULL)
-- 
1.7.11.7