diff mbox series

scsi: qla4xxx: fix potential deadlock on frwd/back lock

Message ID 20230928145401.32918-1-dg573847474@gmail.com (mailing list archive)
State Rejected
Headers show
Series scsi: qla4xxx: fix potential deadlock on frwd/back lock | expand

Commit Message

Chengfeng Ye Sept. 28, 2023, 2:54 p.m. UTC
iscsi_complete_pdu() is called by several rx callback, under
softirq context. Thus the callsite of it inside qla4xxx_task_work()
should better disable bottom half, as work queue is executed under
process context, it needs to protect race with softirq context
locking to avoid deadlock.

<potential deadlock #1>
qla4xxx_task_work()
--> iscsi_complete_pdu()
--> spin_lock(&conn->session->back_lock);
<interrupt>
   --> iscsi_tcp_data_recv_done()
   --> iscsi_complete_pdu()
   --> spin_lock(&conn->session->back_lock) (deadlock)

<potential deadlock #2>
qla4xxx_task_work()
--> iscsi_complete_pdu()
--> __iscsi_complete_pdu()
--> spin_lock(&conn->session->frwd_lock)
<interrupt>
   --> iscsi_tcp_data_recv_done()
   --> iscsi_complete_pdu()
   --> __iscsi_complete_pdu()
   --> spin_lock(&conn->session->frwd_lock) (deadlock)

This flaw was found by an experimental static analysis tool I am
developing for irq-related deadlock.

To avoid the problem, disable bh inside qla4xxx_task_work() before
calling iscsi_complete_pdu().

Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
---
 drivers/scsi/qla4xxx/ql4_os.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Mike Christie Sept. 29, 2023, 4:03 p.m. UTC | #1
On 9/28/23 9:54 AM, Chengfeng Ye wrote:
> iscsi_complete_pdu() is called by several rx callback, under
> softirq context. Thus the callsite of it inside qla4xxx_task_work()
> should better disable bottom half, as work queue is executed under
> process context, it needs to protect race with softirq context
> locking to avoid deadlock.
> 
> <potential deadlock #1>
> qla4xxx_task_work()
> --> iscsi_complete_pdu()
> --> spin_lock(&conn->session->back_lock);
> <interrupt>
>    --> iscsi_tcp_data_recv_done()
>    --> iscsi_complete_pdu()
>    --> spin_lock(&conn->session->back_lock) (deadlock)
> 

This will not happen, because qla4xxx does not use the libiscsi_tcp
code. If you had a iscsi session with qla4xxx and another
iscsi session with iscsi_tcp, then you could get an interrupt like
this, but the "conn->session->back_lock" that's taken above would
be for the iscsi_tcp session.

So you would need to make sure that the above can't happen for
iscsi_tcp an iscsi_tcp session. Same with the below analysis.


> <potential deadlock #2>
> qla4xxx_task_work()
> --> iscsi_complete_pdu()
> --> __iscsi_complete_pdu()
> --> spin_lock(&conn->session->frwd_lock)
> <interrupt>
>    --> iscsi_tcp_data_recv_done()
>    --> iscsi_complete_pdu()
>    --> __iscsi_complete_pdu()
>    --> spin_lock(&conn->session->frwd_lock) (deadlock)
> 
> This flaw was found by an experimental static analysis tool I am
> developing for irq-related deadlock.
> 
> To avoid the problem, disable bh inside qla4xxx_task_work() before
> calling iscsi_complete_pdu().
> 
> Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
> ---
>  drivers/scsi/qla4xxx/ql4_os.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/scsi/qla4xxx/ql4_os.c b/drivers/scsi/qla4xxx/ql4_os.c
> index 675332e49a7b..c60781148e6c 100644
> --- a/drivers/scsi/qla4xxx/ql4_os.c
> +++ b/drivers/scsi/qla4xxx/ql4_os.c
> @@ -3382,7 +3382,9 @@ static void qla4xxx_task_work(struct work_struct *wdata)
>  		hdr->itt = itt;
>  		data = task_data->resp_buffer + hdr_len;
>  		data_len = task_data->resp_len - hdr_len;
> +		local_bh_disable();
>  		iscsi_complete_pdu(conn, hdr, data, data_len);
> +		local_bh_enable();
>  		break;
>  	default:
>  		ql4_printk(KERN_ERR, ha, "Passthru failed status = 0x%x\n",
Chengfeng Ye Sept. 29, 2023, 4:11 p.m. UTC | #2
Thanks for the explanation!

I thought the two locks are the same so made this bogus patch,
sorry for the confusion and thanks much for your time.

Regards,
Chengfeng
diff mbox series

Patch

diff --git a/drivers/scsi/qla4xxx/ql4_os.c b/drivers/scsi/qla4xxx/ql4_os.c
index 675332e49a7b..c60781148e6c 100644
--- a/drivers/scsi/qla4xxx/ql4_os.c
+++ b/drivers/scsi/qla4xxx/ql4_os.c
@@ -3382,7 +3382,9 @@  static void qla4xxx_task_work(struct work_struct *wdata)
 		hdr->itt = itt;
 		data = task_data->resp_buffer + hdr_len;
 		data_len = task_data->resp_len - hdr_len;
+		local_bh_disable();
 		iscsi_complete_pdu(conn, hdr, data, data_len);
+		local_bh_enable();
 		break;
 	default:
 		ql4_printk(KERN_ERR, ha, "Passthru failed status = 0x%x\n",