diff mbox series

[for-rc,1/3] IB/isert: Fix dead lock in ib_isert

Message ID 20230601094220.64810-2-saravanan.vajravel@broadcom.com (mailing list archive)
State Superseded
Headers show
Series IB/isert Bug fixes in ib_isert | expand

Commit Message

Saravanan Vajravel June 1, 2023, 9:42 a.m. UTC
- When a iSER session is released, ib_isert module is taking a mutex
  lock and releasing all pending connections. As part of this, ib_isert
  is destroying rdma cm_id. To destroy cm_id, rdma_cm module is sending
  CM events to CMA handler of ib_isert. This handler is taking same
  mutex lock. Hence it leads to deadlock between ib_isert & rdma_cm
  modules.

- For fix, created local list of pending connections and release the
  connection outside of mutex lock.

Calltrace:
---------
[ 1229.791410] INFO: task kworker/10:1:642 blocked for more than 120 seconds.
[ 1229.791416]       Tainted: G           OE    --------- -  - 4.18.0-372.9.1.el8.x86_64 #1
[ 1229.791418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1229.791419] task:kworker/10:1    state:D stack:    0 pid:  642 ppid:     2 flags:0x80004000
[ 1229.791424] Workqueue: ib_cm cm_work_handler [ib_cm]
[ 1229.791436] Call Trace:
[ 1229.791438]  __schedule+0x2d1/0x830
[ 1229.791445]  ? select_idle_sibling+0x23/0x6f0
[ 1229.791449]  schedule+0x35/0xa0
[ 1229.791451]  schedule_preempt_disabled+0xa/0x10
[ 1229.791453]  __mutex_lock.isra.7+0x310/0x420
[ 1229.791456]  ? select_task_rq_fair+0x351/0x990
[ 1229.791459]  isert_cma_handler+0x224/0x330 [ib_isert]
[ 1229.791463]  ? ttwu_queue_wakelist+0x159/0x170
[ 1229.791466]  cma_cm_event_handler+0x25/0xd0 [rdma_cm]
[ 1229.791474]  cma_ib_handler+0xa7/0x2e0 [rdma_cm]
[ 1229.791478]  cm_process_work+0x22/0xf0 [ib_cm]
[ 1229.791483]  cm_work_handler+0xf4/0xf30 [ib_cm]
[ 1229.791487]  ? move_linked_works+0x6e/0xa0
[ 1229.791490]  process_one_work+0x1a7/0x360
[ 1229.791491]  ? create_worker+0x1a0/0x1a0
[ 1229.791493]  worker_thread+0x30/0x390
[ 1229.791494]  ? create_worker+0x1a0/0x1a0
[ 1229.791495]  kthread+0x10a/0x120
[ 1229.791497]  ? set_kthread_struct+0x40/0x40
[ 1229.791499]  ret_from_fork+0x1f/0x40

[ 1229.791739] INFO: task targetcli:28666 blocked for more than 120 seconds.
[ 1229.791740]       Tainted: G           OE    --------- -  - 4.18.0-372.9.1.el8.x86_64 #1
[ 1229.791741] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1229.791742] task:targetcli       state:D stack:    0 pid:28666 ppid:  5510 flags:0x00004080
[ 1229.791743] Call Trace:
[ 1229.791744]  __schedule+0x2d1/0x830
[ 1229.791746]  schedule+0x35/0xa0
[ 1229.791748]  schedule_preempt_disabled+0xa/0x10
[ 1229.791749]  __mutex_lock.isra.7+0x310/0x420
[ 1229.791751]  rdma_destroy_id+0x15/0x20 [rdma_cm]
[ 1229.791755]  isert_connect_release+0x115/0x130 [ib_isert]
[ 1229.791757]  isert_free_np+0x87/0x140 [ib_isert]
[ 1229.791761]  iscsit_del_np+0x74/0x120 [iscsi_target_mod]
[ 1229.791776]  lio_target_np_driver_store+0xe9/0x140 [iscsi_target_mod]
[ 1229.791784]  configfs_write_file+0xb2/0x110
[ 1229.791788]  vfs_write+0xa5/0x1a0
[ 1229.791792]  ksys_write+0x4f/0xb0
[ 1229.791794]  do_syscall_64+0x5b/0x1a0
[ 1229.791798]  entry_SYSCALL_64_after_hwframe+0x65/0xca

Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/ulp/isert/ib_isert.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

kernel test robot June 2, 2023, 2:38 a.m. UTC | #1
Hi Saravanan,

kernel test robot noticed the following build errors:

[auto build test ERROR on rdma/for-next]
[also build test ERROR on linus/master v6.4-rc4 next-20230601]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Saravanan-Vajravel/IB-isert-Fix-dead-lock-in-ib_isert/20230601-225628
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git for-next
patch link:    https://lore.kernel.org/r/20230601094220.64810-2-saravanan.vajravel%40broadcom.com
patch subject: [PATCH for-rc 1/3] IB/isert: Fix dead lock in ib_isert
config: x86_64-buildonly-randconfig-r005-20230602 (https://download.01.org/0day-ci/archive/20230602/202306021057.prO0j0bN-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/c7031144a2a9b6f14201977b35600ab80ee30e09
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Saravanan-Vajravel/IB-isert-Fix-dead-lock-in-ib_isert/20230601-225628
        git checkout c7031144a2a9b6f14201977b35600ab80ee30e09
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=x86_64 olddefconfig
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/infiniband/ulp/isert/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202306021057.prO0j0bN-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/infiniband/ulp/isert/ib_isert.c: In function 'isert_free_np':
>> drivers/infiniband/ulp/isert/ib_isert.c:2454:75: error: expected ';' before '}' token
    2454 |                         list_move_tail(&isert_conn->node, &drop_conn_list)
         |                                                                           ^
         |                                                                           ;
    2455 |                 }
         |                 ~                                                          


vim +2454 drivers/infiniband/ulp/isert/ib_isert.c

  2428	
  2429	static void
  2430	isert_free_np(struct iscsi_np *np)
  2431	{
  2432		struct isert_np *isert_np = np->np_context;
  2433		struct isert_conn *isert_conn, *n;
  2434		LIST_HEAD(drop_conn_list);
  2435	
  2436		if (isert_np->cm_id)
  2437			rdma_destroy_id(isert_np->cm_id);
  2438	
  2439		/*
  2440		 * FIXME: At this point we don't have a good way to insure
  2441		 * that at this point we don't have hanging connections that
  2442		 * completed RDMA establishment but didn't start iscsi login
  2443		 * process. So work-around this by cleaning up what ever piled
  2444		 * up in accepted and pending lists.
  2445		 */
  2446		mutex_lock(&isert_np->mutex);
  2447		if (!list_empty(&isert_np->pending)) {
  2448			isert_info("Still have isert pending connections\n");
  2449			list_for_each_entry_safe(isert_conn, n,
  2450						 &isert_np->pending,
  2451						 node) {
  2452				isert_info("cleaning isert_conn %p state (%d)\n",
  2453					   isert_conn, isert_conn->state);
> 2454				list_move_tail(&isert_conn->node, &drop_conn_list)
  2455			}
  2456		}
  2457	
  2458		if (!list_empty(&isert_np->accepted)) {
  2459			isert_info("Still have isert accepted connections\n");
  2460			list_for_each_entry_safe(isert_conn, n,
  2461						 &isert_np->accepted,
  2462						 node) {
  2463				isert_info("cleaning isert_conn %p state (%d)\n",
  2464					   isert_conn, isert_conn->state);
  2465				list_move_tail(&isert_conn->node, &drop_conn_list);
  2466			}
  2467		}
  2468		mutex_unlock(&isert_np->mutex);
  2469	
  2470		list_for_each_entry_safe(isert_conn, n, &drop_conn_list, node) {
  2471			list_del_init(&isert_conn->node);
  2472			isert_connect_release(isert_conn);
  2473		}
  2474	
  2475		np->np_context = NULL;
  2476		kfree(isert_np);
  2477	}
  2478
kernel test robot June 2, 2023, 3:09 a.m. UTC | #2
Hi Saravanan,

kernel test robot noticed the following build errors:

[auto build test ERROR on rdma/for-next]
[also build test ERROR on linus/master v6.4-rc4 next-20230601]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Saravanan-Vajravel/IB-isert-Fix-dead-lock-in-ib_isert/20230601-225628
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git for-next
patch link:    https://lore.kernel.org/r/20230601094220.64810-2-saravanan.vajravel%40broadcom.com
patch subject: [PATCH for-rc 1/3] IB/isert: Fix dead lock in ib_isert
config: arm64-randconfig-r025-20230531 (https://download.01.org/0day-ci/archive/20230602/202306021038.ms6aIjpB-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project 4faf3aaf28226a4e950c103a14f6fc1d1fdabb1b)
reproduce (this is a W=1 build):
        mkdir -p ~/bin
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm64 cross compiling tool for clang build
        # apt-get install binutils-aarch64-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/c7031144a2a9b6f14201977b35600ab80ee30e09
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Saravanan-Vajravel/IB-isert-Fix-dead-lock-in-ib_isert/20230601-225628
        git checkout c7031144a2a9b6f14201977b35600ab80ee30e09
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang ~/bin/make.cross W=1 O=build_dir ARCH=arm64 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang ~/bin/make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash drivers/infiniband/ulp/isert/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202306021038.ms6aIjpB-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/infiniband/ulp/isert/ib_isert.c:2454:54: error: expected ';' after expression
                           list_move_tail(&isert_conn->node, &drop_conn_list)
                                                                             ^
                                                                             ;
   1 error generated.


vim +2454 drivers/infiniband/ulp/isert/ib_isert.c

  2428	
  2429	static void
  2430	isert_free_np(struct iscsi_np *np)
  2431	{
  2432		struct isert_np *isert_np = np->np_context;
  2433		struct isert_conn *isert_conn, *n;
  2434		LIST_HEAD(drop_conn_list);
  2435	
  2436		if (isert_np->cm_id)
  2437			rdma_destroy_id(isert_np->cm_id);
  2438	
  2439		/*
  2440		 * FIXME: At this point we don't have a good way to insure
  2441		 * that at this point we don't have hanging connections that
  2442		 * completed RDMA establishment but didn't start iscsi login
  2443		 * process. So work-around this by cleaning up what ever piled
  2444		 * up in accepted and pending lists.
  2445		 */
  2446		mutex_lock(&isert_np->mutex);
  2447		if (!list_empty(&isert_np->pending)) {
  2448			isert_info("Still have isert pending connections\n");
  2449			list_for_each_entry_safe(isert_conn, n,
  2450						 &isert_np->pending,
  2451						 node) {
  2452				isert_info("cleaning isert_conn %p state (%d)\n",
  2453					   isert_conn, isert_conn->state);
> 2454				list_move_tail(&isert_conn->node, &drop_conn_list)
  2455			}
  2456		}
  2457	
  2458		if (!list_empty(&isert_np->accepted)) {
  2459			isert_info("Still have isert accepted connections\n");
  2460			list_for_each_entry_safe(isert_conn, n,
  2461						 &isert_np->accepted,
  2462						 node) {
  2463				isert_info("cleaning isert_conn %p state (%d)\n",
  2464					   isert_conn, isert_conn->state);
  2465				list_move_tail(&isert_conn->node, &drop_conn_list);
  2466			}
  2467		}
  2468		mutex_unlock(&isert_np->mutex);
  2469	
  2470		list_for_each_entry_safe(isert_conn, n, &drop_conn_list, node) {
  2471			list_del_init(&isert_conn->node);
  2472			isert_connect_release(isert_conn);
  2473		}
  2474	
  2475		np->np_context = NULL;
  2476		kfree(isert_np);
  2477	}
  2478
Sagi Grimberg June 5, 2023, 10:45 p.m. UTC | #3
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Sagi Grimberg June 5, 2023, 10:46 p.m. UTC | #4
> Hi Saravanan,
> 
> kernel test robot noticed the following build errors:
> 
> [auto build test ERROR on rdma/for-next]
> [also build test ERROR on linus/master v6.4-rc4 next-20230601]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Saravanan-Vajravel/IB-isert-Fix-dead-lock-in-ib_isert/20230601-225628
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git for-next
> patch link:    https://lore.kernel.org/r/20230601094220.64810-2-saravanan.vajravel%40broadcom.com
> patch subject: [PATCH for-rc 1/3] IB/isert: Fix dead lock in ib_isert
> config: x86_64-buildonly-randconfig-r005-20230602 (https://download.01.org/0day-ci/archive/20230602/202306021057.prO0j0bN-lkp@intel.com/config)
> compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
> reproduce (this is a W=1 build):
>          # https://github.com/intel-lab-lkp/linux/commit/c7031144a2a9b6f14201977b35600ab80ee30e09
>          git remote add linux-review https://github.com/intel-lab-lkp/linux
>          git fetch --no-tags linux-review Saravanan-Vajravel/IB-isert-Fix-dead-lock-in-ib_isert/20230601-225628
>          git checkout c7031144a2a9b6f14201977b35600ab80ee30e09
>          # save the config file
>          mkdir build_dir && cp config build_dir/.config
>          make W=1 O=build_dir ARCH=x86_64 olddefconfig
>          make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/infiniband/ulp/isert/
> 
> If you fix the issue, kindly add following tag where applicable
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202306021057.prO0j0bN-lkp@intel.com/
> 
> All errors (new ones prefixed by >>):
> 
>     drivers/infiniband/ulp/isert/ib_isert.c: In function 'isert_free_np':
>>> drivers/infiniband/ulp/isert/ib_isert.c:2454:75: error: expected ';' before '}' token
>      2454 |                         list_move_tail(&isert_conn->node, &drop_conn_list)
>           |                                                                           ^
>           |                                                                           ;
>      2455 |                 }
>           |                 ~
> 

Obviously this would have been even better properly compiling..
diff mbox series

Patch

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index f290cd49698e..b3471ac82c1a 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -2431,6 +2431,7 @@  isert_free_np(struct iscsi_np *np)
 {
 	struct isert_np *isert_np = np->np_context;
 	struct isert_conn *isert_conn, *n;
+	LIST_HEAD(drop_conn_list);
 
 	if (isert_np->cm_id)
 		rdma_destroy_id(isert_np->cm_id);
@@ -2450,7 +2451,7 @@  isert_free_np(struct iscsi_np *np)
 					 node) {
 			isert_info("cleaning isert_conn %p state (%d)\n",
 				   isert_conn, isert_conn->state);
-			isert_connect_release(isert_conn);
+			list_move_tail(&isert_conn->node, &drop_conn_list)
 		}
 	}
 
@@ -2461,11 +2462,16 @@  isert_free_np(struct iscsi_np *np)
 					 node) {
 			isert_info("cleaning isert_conn %p state (%d)\n",
 				   isert_conn, isert_conn->state);
-			isert_connect_release(isert_conn);
+			list_move_tail(&isert_conn->node, &drop_conn_list);
 		}
 	}
 	mutex_unlock(&isert_np->mutex);
 
+	list_for_each_entry_safe(isert_conn, n, &drop_conn_list, node) {
+		list_del_init(&isert_conn->node);
+		isert_connect_release(isert_conn);
+	}
+
 	np->np_context = NULL;
 	kfree(isert_np);
 }