diff mbox series

[02/15] RDMA/iwcm: Fix a lock inversion issue

Message ID 20190930231707.48259-3-bvanassche@acm.org (mailing list archive)
State Accepted
Delegated to: Jason Gunthorpe
Headers show
Series RDMA patches for kernel v5.5 | expand

Commit Message

Bart Van Assche Sept. 30, 2019, 11:16 p.m. UTC
This patch fixes the lock inversion complaint:

============================================
WARNING: possible recursive locking detected
5.3.0-rc7-dbg+ #1 Not tainted
--------------------------------------------
kworker/u16:6/171 is trying to acquire lock:
00000000035c6e6c (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x78/0x4a0 [rdma_cm]

but task is already holding lock:
00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&id_priv->handler_mutex);
  lock(&id_priv->handler_mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/u16:6/171:
 #0: 00000000e2eaa773 ((wq_completion)iw_cm_wq){+.+.}, at: process_one_work+0x472/0xac0
 #1: 000000001efd357b ((work_completion)(&work->work)#3){+.+.}, at: process_one_work+0x476/0xac0
 #2: 00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]

stack backtrace:
CPU: 3 PID: 171 Comm: kworker/u16:6 Not tainted 5.3.0-rc7-dbg+ #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: iw_cm_wq cm_work_handler [iw_cm]
Call Trace:
 dump_stack+0x8a/0xd6
 __lock_acquire.cold+0xe1/0x24d
 lock_acquire+0x106/0x240
 __mutex_lock+0x12e/0xcb0
 mutex_lock_nested+0x1f/0x30
 rdma_destroy_id+0x78/0x4a0 [rdma_cm]
 iw_conn_req_handler+0x5c9/0x680 [rdma_cm]
 cm_work_handler+0xe62/0x1100 [iw_cm]
 process_one_work+0x56d/0xac0
 worker_thread+0x7a/0x5d0
 kthread+0x1bc/0x210
 ret_from_fork+0x24/0x30

Cc: Or Gerlitz <gerlitz.or@gmail.com>
Cc: Steve Wise <larrystevenwise@gmail.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Bernard Metzler <BMT@zurich.ibm.com>
Cc: Krishnamraju Eraparaju <krishna2@chelsio.com>
Cc: <stable@vger.kernel.org>
Fixes: de910bd92137 ("RDMA/cma: Simplify locking needed for serialization of callbacks"; v2.6.27).
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/infiniband/core/cma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Jason Gunthorpe Oct. 1, 2019, 3:17 p.m. UTC | #1
On Mon, Sep 30, 2019 at 04:16:54PM -0700, Bart Van Assche wrote:
> This patch fixes the lock inversion complaint:
> 
> ============================================
> WARNING: possible recursive locking detected
> 5.3.0-rc7-dbg+ #1 Not tainted
> kworker/u16:6/171 is trying to acquire lock:
> 00000000035c6e6c (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x78/0x4a0 [rdma_cm]
> 
> but task is already holding lock:
> 00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>        CPU0
>   lock(&id_priv->handler_mutex);
>   lock(&id_priv->handler_mutex);
> 
>  *** DEADLOCK ***
> 
>  May be due to missing lock nesting notation
> 
> 3 locks held by kworker/u16:6/171:
>  #0: 00000000e2eaa773 ((wq_completion)iw_cm_wq){+.+.}, at: process_one_work+0x472/0xac0
>  #1: 000000001efd357b ((work_completion)(&work->work)#3){+.+.}, at: process_one_work+0x476/0xac0
>  #2: 00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]
> 
> stack backtrace:
> CPU: 3 PID: 171 Comm: kworker/u16:6 Not tainted 5.3.0-rc7-dbg+ #1
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> Workqueue: iw_cm_wq cm_work_handler [iw_cm]
> Call Trace:
>  dump_stack+0x8a/0xd6
>  __lock_acquire.cold+0xe1/0x24d
>  lock_acquire+0x106/0x240
>  __mutex_lock+0x12e/0xcb0
>  mutex_lock_nested+0x1f/0x30
>  rdma_destroy_id+0x78/0x4a0 [rdma_cm]
>  iw_conn_req_handler+0x5c9/0x680 [rdma_cm]
>  cm_work_handler+0xe62/0x1100 [iw_cm]
>  process_one_work+0x56d/0xac0
>  worker_thread+0x7a/0x5d0
>  kthread+0x1bc/0x210
>  ret_from_fork+0x24/0x30
> 
> Cc: Or Gerlitz <gerlitz.or@gmail.com>
> Cc: Steve Wise <larrystevenwise@gmail.com>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Bernard Metzler <BMT@zurich.ibm.com>
> Cc: Krishnamraju Eraparaju <krishna2@chelsio.com>
> Cc: <stable@vger.kernel.org>
> Fixes: de910bd92137 ("RDMA/cma: Simplify locking needed for serialization of callbacks"; v2.6.27).
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>  drivers/infiniband/core/cma.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 0e3cf3461999..d78f67623f24 100644
> +++ b/drivers/infiniband/core/cma.c
> @@ -2396,9 +2396,10 @@ static int iw_conn_req_handler(struct iw_cm_id *cm_id,
>  		conn_id->cm_id.iw = NULL;
>  		cma_exch(conn_id, RDMA_CM_DESTROYING);
>  		mutex_unlock(&conn_id->handler_mutex);
> +		mutex_unlock(&listen_id->handler_mutex);
>  		cma_deref_id(conn_id);
>  		rdma_destroy_id(&conn_id->id);
> -		goto out;
> +		return ret;
>  	}

Hurm. Minimizing code under lock is always a good fix, but the lockdep
report is not a bug.

The issue is caused by the hacky use of SINGLE_DEPTH_NESTING when we
really have two lock classes, 'listening' and 'connecting' for CM ids.

connecting IDs can be nested under listening IDs but not the other way
around.

So two lock classes will also get rid of the lockdep warning, which is
why it isn't a bug..

Applied to for-rc

Thanks,
Jason
diff mbox series

Patch

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 0e3cf3461999..d78f67623f24 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -2396,9 +2396,10 @@  static int iw_conn_req_handler(struct iw_cm_id *cm_id,
 		conn_id->cm_id.iw = NULL;
 		cma_exch(conn_id, RDMA_CM_DESTROYING);
 		mutex_unlock(&conn_id->handler_mutex);
+		mutex_unlock(&listen_id->handler_mutex);
 		cma_deref_id(conn_id);
 		rdma_destroy_id(&conn_id->id);
-		goto out;
+		return ret;
 	}
 
 	mutex_unlock(&conn_id->handler_mutex);