diff mbox series

[04/12] qla2xxx: Fix hang during NVME session tear down

Message ID 20210817051315.2477-5-njavali@marvell.com (mailing list archive)
State Superseded
Headers show
Series qla2xxx driver bug fixes | expand

Commit Message

Nilesh Javali Aug. 17, 2021, 5:13 a.m. UTC
From: Arun Easi <aeasi@marvell.com>

The following hung task call trace was seen:

    [ 1230.183294] INFO: task qla2xxx_wq:523 blocked for more than 120 seconds.
    [ 1230.197749] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 1230.205585] qla2xxx_wq      D    0   523      2 0x80004000
    [ 1230.205636] Workqueue: qla2xxx_wq qlt_free_session_done [qla2xxx]
    [ 1230.205639] Call Trace:
    [ 1230.208100]  __schedule+0x2c4/0x700
    [ 1230.211607]  schedule+0x38/0xa0
    [ 1230.214769]  schedule_timeout+0x246/0x2f0
    [ 1230.222651]  wait_for_completion+0x97/0x100
    [ 1230.226921]  qlt_free_session_done+0x6a0/0x6f0 [qla2xxx]
    [ 1230.232254]  process_one_work+0x1a7/0x360

..when device side port resets were done.

Abort threads were getting out without processing due to the "deleted"
flag check. The delete thread, meanwhile, could not proceed with a
logout (that would have cleared out pending requests) as the logout iocb
work was not progressing. It appears like the hung qlt_free_session_done()
thread is causing the ha->wq works on hold. The qlt_free_session_done()
was hung waiting for nvme_fc_unregister_remoteport() + localport_delete cb
to be complete, which would only happen when all IOs are released.

Fix this by allowing abort to progress until device delete is completely
done. This should make the qlt_free_session_done proceed without hang
and thus clear up the deadlock.

Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
---
 drivers/scsi/qla2xxx/qla_nvme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff mbox series

Patch

diff --git a/drivers/scsi/qla2xxx/qla_nvme.c b/drivers/scsi/qla2xxx/qla_nvme.c
index 05cad06ff165..d294b590581e 100644
--- a/drivers/scsi/qla2xxx/qla_nvme.c
+++ b/drivers/scsi/qla2xxx/qla_nvme.c
@@ -233,7 +233,7 @@  static void qla_nvme_abort_work(struct work_struct *work)
 	       "%s called for sp=%p, hndl=%x on fcport=%p deleted=%d\n",
 	       __func__, sp, sp->handle, fcport, fcport->deleted);
 
-	if (!ha->flags.fw_started || fcport->deleted)
+	if (!ha->flags.fw_started || fcport->deleted == QLA_SESS_DELETED)
 		goto out;
 
 	if (ha->flags.host_shutting_down) {