From patchwork Mon Jun 26 22:59:20 2017
X-Patchwork-Submitter: "Saleem, Shiraz"
X-Patchwork-Id: 9810585
Date: Mon, 26 Jun 2017 17:59:20 -0500
From: Shiraz Saleem
To: hch@lst.de, sagi@grimberg.me
Cc: linux-rdma@vger.kernel.org
Subject: Deadlock on device removal event for NVMeF target
Message-ID: <20170626225920.GA11700@ssaleem-MOBL4.amr.corp.intel.com>

Hi Sagi/Christoph,

I am seeing a deadlock on a device removal event for an NVMeF target.
The sequence of events leading to the deadlock is as follows:

1. i40iw posts IW_CM_EVENT_CLOSE events for all QPs, causing the
   corresponding nvmet RDMA queues to disconnect and schedule their
   release work on the global workqueue.

2. i40iw triggers device removal:

   ib_unregister_device
     [..]
     cma_remove_id_dev
       (takes a handler lock before calling the event handler)
       nvmet_rdma_cm_handler
         nvmet_rdma_device_removal
           (queue->state == NVMET_RDMA_Q_DISCONNECTING due to 1.)
           flush_scheduled_work
             (blocks until all scheduled work is drained from the workqueue)
             nvmet_rdma_release_queue_work
               (queue->state == NVMET_RDMA_Q_DISCONNECTING)
               rdma_destroy_id
                 (waits on the same handler lock as cma_remove_id_dev,
                  causing the deadlock)

So this problem can occur when a device removal event arrives while the
queue is in the disconnecting state and some outstanding work has not
yet been drained from the workqueue at the time flush_scheduled_work()
is called.
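To make the lock pattern concrete, here is a minimal userspace sketch
that deadlocks the same way. Pthreads stand in for the CM handler lock
and the workqueue flush; all names are illustrative, not the real
kernel symbols:

/* deadlock_sketch.c - compile with: gcc -pthread deadlock_sketch.c */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* stands in for the rdma_cm handler lock taken by cma_remove_id_dev */
static pthread_mutex_t handler_mutex = PTHREAD_MUTEX_INITIALIZER;

/* stands in for the queued nvmet_rdma_release_queue_work */
static void *release_queue_work(void *arg)
{
	(void)arg;
	sleep(1);			/* let the removal path take the lock first */
	pthread_mutex_lock(&handler_mutex);	/* rdma_destroy_id: blocks forever */
	printf("release work: id destroyed\n");	/* never reached */
	pthread_mutex_unlock(&handler_mutex);
	return NULL;
}

int main(void)
{
	pthread_t worker;

	pthread_create(&worker, NULL, release_queue_work, NULL);

	/* device removal path: handler lock is held across the flush */
	pthread_mutex_lock(&handler_mutex);
	pthread_join(worker, NULL);	/* flush_scheduled_work: waits on work that needs the lock */
	pthread_mutex_unlock(&handler_mutex);	/* never reached */
	return 0;
}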
Here is a call trace with some debug logs. The problematic cm_id is
ffff8806e9924d18.

[371.577371] nvmet_rdma: nvmet_rdma_cm_handler: disconnected (10): cm_id ffff8806de8acd18
[371.577378] nvmet_rdma: __nvmet_rdma_queue_disconnect: cm_id= ffff8806de8acd18 queue->state= 1 queue->idx = 1
[371.577381] nvmet_rdma: __nvmet_rdma_queue_disconnect: rdma_disconnect cm_id= ffff8806de8acd18
[371.577501] nvmet_rdma: nvmet_rdma_cm_handler: disconnected (10): cm_id ffff8806e9924d18
[371.577502] nvmet_rdma: nvmet_rdma_cm_handler: device removal (11): cm_id ffff8806e6b6d5a8
[371.577504] nvmet_rdma: nvmet_rdma_device_removal: queue (null)
[371.577506] nvmet_rdma: __nvmet_rdma_queue_disconnect: cm_id= ffff8806e9924d18 queue->state= 1 queue->idx = 0
[371.577507] rdma_destroy_id id ffff8806e6b6d5a8
[371.577509] nvmet_rdma: __nvmet_rdma_queue_disconnect: rdma_disconnect cm_id= ffff8806e9924d18
[371.577870] nvmet_rdma: nvmet_rdma_cm_handler: device removal (11): cm_id ffff8806e9924d18
[371.577877] nvmet_rdma: nvmet_rdma_device_removal: calling flush_scheduled_work queue state 2 idx 0 cm_id ffff8806e9924d18
[371.580174] nvmet_rdma: nvmet_rdma_queue_response_post_send: 2 opcode 2 queue_state 2 queue_idx 0
[371.580198] nvmet_rdma: sending cmd response failed
[371.596356] nvmet_rdma: nvmet_rdma_release_queue_work: calling rdma_destroy_id cm_id ffff8806de8acd18 queue_idx 1 state 2
[371.596358] rdma_destroy_id id ffff8806de8acd18
[371.596512] nvmet_rdma: nvmet_rdma_release_queue_work: calling rdma_destroy_id cm_id ffff8806e9924d18 queue_idx 0 state 2
[371.596514] rdma_destroy_id id ffff8806e9924d18 ------------------------------------------------> Hang

[614.390162] INFO: task kworker/6:1:4429 blocked for more than 120 seconds.
[614.390165]       Tainted: G           O    4.11.0-rc8+ #11
[614.390166] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[614.390167] kworker/6:1     D30016  4429      2 0x00000000
[614.390176] Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma]
[614.390178] Call Trace:
[614.390182]  __schedule+0x378/0xbf0
[614.390185]  schedule+0x38/0x90
[614.390186]  schedule_preempt_disabled+0x10/0x20
[614.390188]  __mutex_lock+0x2ac/0x960
[614.390190]  ? __mutex_lock+0xb6/0x960
[614.390193]  ? rdma_destroy_id+0x6a/0x2e0 [rdma_cm]
[614.390196]  mutex_lock_nested+0x16/0x20
[614.390198]  rdma_destroy_id+0x6a/0x2e0 [rdma_cm]
[614.390201]  nvmet_rdma_release_queue_work+0x78/0x80 [nvmet_rdma]
[614.390203]  process_one_work+0x203/0x710
[614.390204]  ? process_one_work+0x16f/0x710
[614.390207]  worker_thread+0x126/0x4a0
[614.390210]  kthread+0x112/0x150
[614.390212]  ? process_one_work+0x710/0x710
[614.390213]  ? kthread_create_on_node+0x40/0x40
[614.390216]  ret_from_fork+0x2e/0x40
[614.390219] INFO: lockdep is turned off.
[614.390239] INFO: task rmmod:8443 blocked for more than 120 seconds.
[614.390241]       Tainted: G           O    4.11.0-rc8+ #11
[614.390242] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[614.390243] rmmod           D29880  8443   7036 0x00000080
[614.390248] Call Trace:
[614.390250]  __schedule+0x378/0xbf0
[614.390252]  schedule+0x38/0x90
[614.390254]  schedule_timeout+0x235/0x4f0
[614.390257]  ? mark_held_locks+0x71/0x90
[614.390259]  ? _raw_spin_unlock_irq+0x27/0x40
[614.390261]  wait_for_completion+0xb4/0x120
[614.390264]  ? wake_up_q+0x70/0x70
[614.390267]  flush_workqueue+0x211/0x6c0
[614.390269]  ? flush_workqueue+0x100/0x6c0
[614.390271]  ? vprintk_emit+0x211/0x2e0
[614.390275]  ? printk+0x43/0x45
[614.390277]  ? wait_for_completion+0x36/0x120
[614.390280]  nvmet_rdma_cm_handler+0x34f/0xf50 [nvmet_rdma]
[614.390282]  ? cma_comp+0x39/0x50 [rdma_cm]
[614.390284]  ? mark_held_locks+0x71/0x90
[614.390286]  ? _raw_spin_unlock_irqrestore+0x31/0x50
[614.390289]  ? trace_hardirqs_on+0xd/0x10
[614.390292]  cma_remove_one+0x1fc/0x220 [rdma_cm]
[614.390299]  ib_unregister_device+0xd8/0x170 [ib_core]
[614.390303]  i40iw_destroy_rdma_device+0x55/0x160 [i40iw]
[614.390307]  i40iw_deinit_device+0x96/0x460 [i40iw]
[614.390311]  i40iw_close+0x54/0xc0 [i40iw]
[614.390316]  i40e_unregister_client+0xaf/0x1b0 [i40e]
[614.390320]  i40iw_exit_module+0x10/0xaa4 [i40iw]
[614.390322]  SyS_delete_module+0x16a/0x230
[614.390325]  do_syscall_64+0x59/0x1a0
[614.390327]  entry_SYSCALL64_slow_path+0x25/0x25
[614.390329] RIP: 0033:0x7f955841d697
[614.390339] INFO: lockdep is turned off.

The patch below fixed the problem for me, but I am not sure it is the
correct fix.

Shiraz

---
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 9e45cde..d0fb307 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1349,6 +1349,12 @@ static int nvmet_rdma_device_removal(struct rdma_cm_id *cm_id,
 	spin_lock_irqsave(&queue->state_lock, flags);
 	if (queue->state != NVMET_RDMA_Q_DISCONNECTING)
 		queue->state = NVMET_RDMA_IN_DEVICE_REMOVAL;
+	else {
+		/* queue is disconnecting; so cm_id and queues will be destroyed */
+		spin_unlock_irqrestore(&queue->state_lock, flags);
+		return 0;
+	}
+
 	spin_unlock_irqrestore(&queue->state_lock, flags);
 	nvmet_rdma_queue_disconnect(queue);
 	flush_scheduled_work();
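For readability, here is roughly what the tail of
nvmet_rdma_device_removal() looks like with the patch applied. This is
a sketch reconstructed from the hunk above and its context, not the
verbatim upstream function; the earlier !queue (listener cm_id) path is
omitted:

static int nvmet_rdma_device_removal(struct rdma_cm_id *cm_id,
		struct nvmet_rdma_queue *queue)
{
	unsigned long flags;

	/* ... !queue (listener) handling omitted ... */

	spin_lock_irqsave(&queue->state_lock, flags);
	if (queue->state != NVMET_RDMA_Q_DISCONNECTING) {
		queue->state = NVMET_RDMA_IN_DEVICE_REMOVAL;
	} else {
		/*
		 * Queue is already disconnecting: the scheduled release
		 * work owns this cm_id and will call rdma_destroy_id(),
		 * so do not flush here (that is the deadlock) and return 0
		 * so the CM core does not destroy the cm_id either.
		 */
		spin_unlock_irqrestore(&queue->state_lock, flags);
		return 0;
	}
	spin_unlock_irqrestore(&queue->state_lock, flags);

	nvmet_rdma_queue_disconnect(queue);
	flush_scheduled_work();

	/* non-zero asks the RDMA CM core to destroy its own cm_id */
	return 1;
}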