diff mbox

Kernel crash with target-pending/for-next

Message ID 1496373556.27407.210.camel@haakon3.risingtidesystems.com (mailing list archive)
State New, archived
Headers show

Commit Message

Nicholas A. Bellinger June 2, 2017, 3:19 a.m. UTC
On Thu, 2017-06-01 at 18:17 +0000, Bart Van Assche wrote:
> Hello Nic,
> 
> This morning I started testing your for-next branch (commit b968ec8ff101
> with no changes). After a few seconds a kernel crash was triggered (see below).
> In the many tests I ran during the past two years of the ib_srpt driver I had
> not encountered any crash triggered by ib_srpt so this must be a recently
> introduced regression in the target core. This regression most likely has been
> introduced by one of these patches:
> 
> b968ec8ff101 target/configfs: Kill se_lun->lun_link_magic
> baacb554a73b target/configfs: Kill se_device->dev_link_magic
> ba438d7c8d39 target: Avoid target_shutdown_sessions loop during queue_depth change

Alas, the local list and unprotected list_del_init() ba438d7c8d won't
work for all fabric cases.

Here's the updated version to restore original behavior for se_node_acl
delete, but still avoid the endless loop with the iscsi-target specific
case where se_node_acl->queue_depth changes.

Care to verify on ib_srpt, or just a report and never confirm..?

From 9fdc33be4ef60ba87c8bf8a4dab6e756ec3b06e6 Mon Sep 17 00:00:00 2001
From: Nicholas Bellinger <nab@linux-iscsi.org>
Date: Thu, 1 Jun 2017 08:57:15 +0200
Subject: [PATCH-v2] target: Avoid target_shutdown_sessions loop during
 queue_depth change

When target_shutdown_sessions() is invoked to shutdown all active
sessions associated with a se_node_acl when se_node_acl->queue_depth
is changed via core_tpg_set_initiator_node_queue_depth(), it's
possible that new connections reconnect immediately after explicit
shutdown occurs via target_shutdown_sessions().

Which means it's possible for the newly reconnected session with
the proper queue_depth can be shutdown multiple times when
target_shutdown_sessions() loops to drain all active sessions
for all cases.

This was regression was introduced by:

  commit bc6e6bb470eda42f44bcac96c261cff1216577b3
  Author: Christoph Hellwig <hch@lst.de>
  Date:   Mon May 2 15:45:19 2016 +0200

      target: consolidate and fix session shutdown

To avoid this case, instead change target_shutdown_sessions() to
pass 'do_restart' and avoid the looping drain of sessions when
invoked via core_tpg_set_initiator_node_queue_depth(), but still
loop during normal se_node_acl delete until all associated
sessions have been shutdown.

(v2 - go back to the original version instead of a local list,
 in order to protect list_del_init(&sess->sess_acl_list) from
 transport_deregister_session_configfs.
 Also use safe list walking in target_shutdown_sessions - nab)

Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 drivers/target/target_core_tpg.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Bart Van Assche June 2, 2017, 4:30 p.m. UTC | #1
On Thu, 2017-06-01 at 20:19 -0700, Nicholas A. Bellinger wrote:
> Here's the updated version to restore original behavior for se_node_acl
> delete, but still avoid the endless loop with the iscsi-target specific
> case where se_node_acl->queue_depth changes.
> 
> Care to verify on ib_srpt, or just a report and never confirm..?

Hello Nic,

This is what I ran into with commit 4f61e1e687c4 ("target: Avoid
target_shutdown_sessions loop during queue_depth change") merged with kernel
v4.12-rc3. This is a crash I had never seen before. This crash disappears if
I revert commit 4f61e1e687c4 so I think this indicates a bug introduced by
that commit:

ib_srpt:srpt_close_ch: ib_srpt 0x0000000000000000e41d2d03000a6d51-1114: queued zerolength write
ib_srpt:srpt_release_channel_work: ib_srpt srpt_release_channel_work: 0x0000000000000000e41d2d03000a6d51-1114; release_done =           (null)
------------[ cut here ]------------
kernel BUG at drivers/infiniband/ulp/srpt/ib_srpt.c:2770!
invalid opcode: 0000 [#1] SMP
Modules linked in: dm_service_time target_core_user uio target_core_pscsi target_core_file ib_srpt target_core_iblock target_core_mod brd netconsole xt_CHECKSUM iptable_mangle ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc ebtable_filter ebtables
ip6table_filter ip6_tables iptable_filter ip_tables x_tables libcrc32c ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm configfs ib_cm iw_cm mlx4_ib af_packet ib_core sb_edac x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel tg3 ipmi_ssif iTCO_wdt ptp aes_x86_64 mlx4_core mei_me pps_core iTCO_vendor_support
ipmi_si crypto_simd
 libphy glue_helper ioatdma dcdbas mei pcspkr lpc_ich devlink ipmi_devintf cryptd ipmi_msghandler tpm_tis mfd_core tpm_tis_core wmi shpchp tpm dca acpi_pad button hid_generic usbhid mgag200
i2c_algo_bit crc32c_intel drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm sr_mod cdrom ehci_pci xhci_pci ehci_hcd xhci_hcd usbcore usb_common sg dm_multipath dm_mod scsi_dh_rdac
scsi_dh_emc scsi_dh_alua autofs4 [last unloaded: scsi_transport_srp]
CPU: 5 PID: 16622 Comm: rmdir Tainted: G        W I     4.12.0-rc3-dbg+ #1
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
task: ffff8803730eb1c0 task.stack: ffffc900072e4000
RIP: 0010:srpt_close_session+0x161/0x190 [ib_srpt]
RSP: 0018:ffffc900072e7cd0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff88044997b858 RCX: 0000000000000001
RDX: 0000000029ebae12 RSI: 0000000000000000 RDI: ffffffffa066b587
RBP: ffffc900072e7d88 R08: ffffffff82590e30 R09: 0000000000000000
R10: ffffc900072e7cc0 R11: ffffffffa066b587 R12: ffff88046c2221e8
R13: ffff88038d56bf40 R14: ffff88038dd21bf8 R15: ffff88038d56bf80
FS:  00007f12eb5bc700(0000) GS:ffff88046ef40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f12eb0fa700 CR3: 000000036e283000 CR4: 00000000001406e0
Call Trace:
 target_shutdown_sessions+0xc8/0x100 [target_core_mod]
 core_tpg_del_initiator_node_acl+0x97/0x140 [target_core_mod]
 target_fabric_nacl_base_release+0x25/0x30 [target_core_mod]
 config_item_put+0x73/0xb0 [configfs]
 configfs_rmdir+0x1ab/0x300 [configfs]
 vfs_rmdir+0x73/0x130
 do_rmdir+0x1a8/0x1f0
 SyS_rmdir+0x16/0x20
 entry_SYSCALL_64_fastpath+0x18/0xad
RIP: 0033:0x7f12eb105f67
RSP: 002b:00007ffe1c8cff88 EFLAGS: 00000246 ORIG_RAX: 0000000000000054
RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007f12eb105f67
RDX: 00007f12eb3c6e80 RSI: 00007ffe1c8d00b8 RDI: 00007ffe1c8d13a8
RBP: 0000000000000011 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000374 R11: 0000000000000246 R12: 00005557020e6ca0
R13: 00007ffe1c8d00b0 R14: 0000000000000000 R15: 0000000000000000
Code: 8d 93 f0 00 00 00 48 c7 c6 0f e1 66 a0 48 c7 c7 c8 14 67 a0 44 8b 83 b0 00 00 00 8b 88 d8 00 00 00 e8 34 8f d2 e0 e9 55 ff ff ff <0f> 0b 48 8b 43 08 4c 89 e2 48 c7 c6 10 fb 66 a0 48 c7 c7 08 ea 
RIP: srpt_close_session+0x161/0x190 [ib_srpt] RSP: ffffc900072e7cd0
---[ end trace 1eb31543ac57f02a ]---
ib_srpt Received CM TimeWait exit for ch 0x0000000000000000e41d2d03000a6d52-1121.--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c
index 3691373..1b2b60e 100644
--- a/drivers/target/target_core_tpg.c
+++ b/drivers/target/target_core_tpg.c
@@ -336,14 +336,14 @@  struct se_node_acl *core_tpg_add_initiator_node_acl(
 	return acl;
 }
 
-static void target_shutdown_sessions(struct se_node_acl *acl)
+static void target_shutdown_sessions(struct se_node_acl *acl, bool do_restart)
 {
-	struct se_session *sess;
+	struct se_session *sess, *sess_tmp;
 	unsigned long flags;
 
 restart:
 	spin_lock_irqsave(&acl->nacl_sess_lock, flags);
-	list_for_each_entry(sess, &acl->acl_sess_list, sess_acl_list) {
+	list_for_each_entry_safe(sess, sess_tmp, &acl->acl_sess_list, sess_acl_list) {
 		if (sess->sess_tearing_down)
 			continue;
 
@@ -352,7 +352,11 @@  static void target_shutdown_sessions(struct se_node_acl *acl)
 
 		if (acl->se_tpg->se_tpg_tfo->close_session)
 			acl->se_tpg->se_tpg_tfo->close_session(sess);
-		goto restart;
+
+		if (do_restart)
+			goto restart;
+
+		spin_lock_irqsave(&acl->nacl_sess_lock, flags);
 	}
 	spin_unlock_irqrestore(&acl->nacl_sess_lock, flags);
 }
@@ -367,7 +371,7 @@  void core_tpg_del_initiator_node_acl(struct se_node_acl *acl)
 	list_del(&acl->acl_list);
 	mutex_unlock(&tpg->acl_node_mutex);
 
-	target_shutdown_sessions(acl);
+	target_shutdown_sessions(acl, true);
 
 	target_put_nacl(acl);
 	/*
@@ -414,7 +418,7 @@  int core_tpg_set_initiator_node_queue_depth(
 	/*
 	 * Shutdown all pending sessions to force session reinstatement.
 	 */
-	target_shutdown_sessions(acl);
+	target_shutdown_sessions(acl, false);
 
 	pr_debug("Successfully changed queue depth to: %d for Initiator"
 		" Node: %s on %s Target Portal Group: %u\n", acl->queue_depth,