diff mbox series

[1/2] qla2xxx: perform lockless command completion in abort path

Message ID 20230313043711.13500-2-njavali@marvell.com (mailing list archive)
State Accepted
Commit 0367076b0817d5c75dfb83001ce7ce5c64d803a9
Headers show
Series qla2xxx driver fixes | expand

Commit Message

Nilesh Javali March 13, 2023, 4:37 a.m. UTC
While adding and removing the controller, call trace was observed,

WARNING: CPU: 3 PID: 623596 at kernel/dma/mapping.c:532 dma_free_attrs+0x33/0x50
CPU: 3 PID: 623596 Comm: sh Kdump: loaded Not tainted 5.14.0-96.el9.x86_64 #1
RIP: 0010:dma_free_attrs+0x33/0x50

Call Trace:
   qla2x00_async_sns_sp_done+0x107/0x1b0 [qla2xxx]
   qla2x00_abort_srb+0x8e/0x250 [qla2xxx]
   ? ql_dbg+0x70/0x100 [qla2xxx]
   __qla2x00_abort_all_cmds+0x108/0x190 [qla2xxx]
   qla2x00_abort_all_cmds+0x24/0x70 [qla2xxx]
   qla2x00_abort_isp_cleanup+0x305/0x3e0 [qla2xxx]
   qla2x00_remove_one+0x364/0x400 [qla2xxx]
   pci_device_remove+0x36/0xa0
   __device_release_driver+0x17a/0x230
   device_release_driver+0x24/0x30
   pci_stop_bus_device+0x68/0x90
   pci_stop_and_remove_bus_device_locked+0x16/0x30
   remove_store+0x75/0x90
   kernfs_fop_write_iter+0x11c/0x1b0
   new_sync_write+0x11f/0x1b0
   vfs_write+0x1eb/0x280
   ksys_write+0x5f/0xe0
   do_syscall_64+0x5c/0x80
   ? do_user_addr_fault+0x1d8/0x680
   ? do_syscall_64+0x69/0x80
   ? exc_page_fault+0x62/0x140
   ? asm_exc_page_fault+0x8/0x30
   entry_SYSCALL_64_after_hwframe+0x44/0xae

The command was completed in abort path during driver unload
with a lock held causing the warning in abort path.
Hence complete the command without any lock held.

Reported-by: Lin Li <lilin@redhat.com>
Tested-by: Lin Li <lilin@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Nilesh Javali <njavali@marvell.com>
---
 drivers/scsi/qla2xxx/qla_os.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Himanshu Madhani March 14, 2023, 1 a.m. UTC | #1
> On Mar 12, 2023, at 9:37 PM, Nilesh Javali <njavali@marvell.com> wrote:
> 
> While adding and removing the controller, call trace was observed,
> 
> WARNING: CPU: 3 PID: 623596 at kernel/dma/mapping.c:532 dma_free_attrs+0x33/0x50
> CPU: 3 PID: 623596 Comm: sh Kdump: loaded Not tainted 5.14.0-96.el9.x86_64 #1
> RIP: 0010:dma_free_attrs+0x33/0x50
> 
> Call Trace:
>   qla2x00_async_sns_sp_done+0x107/0x1b0 [qla2xxx]
>   qla2x00_abort_srb+0x8e/0x250 [qla2xxx]
>   ? ql_dbg+0x70/0x100 [qla2xxx]
>   __qla2x00_abort_all_cmds+0x108/0x190 [qla2xxx]
>   qla2x00_abort_all_cmds+0x24/0x70 [qla2xxx]
>   qla2x00_abort_isp_cleanup+0x305/0x3e0 [qla2xxx]
>   qla2x00_remove_one+0x364/0x400 [qla2xxx]
>   pci_device_remove+0x36/0xa0
>   __device_release_driver+0x17a/0x230
>   device_release_driver+0x24/0x30
>   pci_stop_bus_device+0x68/0x90
>   pci_stop_and_remove_bus_device_locked+0x16/0x30
>   remove_store+0x75/0x90
>   kernfs_fop_write_iter+0x11c/0x1b0
>   new_sync_write+0x11f/0x1b0
>   vfs_write+0x1eb/0x280
>   ksys_write+0x5f/0xe0
>   do_syscall_64+0x5c/0x80
>   ? do_user_addr_fault+0x1d8/0x680
>   ? do_syscall_64+0x69/0x80
>   ? exc_page_fault+0x62/0x140
>   ? asm_exc_page_fault+0x8/0x30
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> The command was completed in abort path during driver unload
> with a lock held causing the warning in abort path.
> Hence complete the command without any lock held.
> 
> Reported-by: Lin Li <lilin@redhat.com>
> Tested-by: Lin Li <lilin@redhat.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Nilesh Javali <njavali@marvell.com>
> ---
> drivers/scsi/qla2xxx/qla_os.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index 545167627e48..d2f6c0546587 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -1858,6 +1858,17 @@ __qla2x00_abort_all_cmds(struct qla_qpair *qp, int res)
> for (cnt = 1; cnt < req->num_outstanding_cmds; cnt++) {
> sp = req->outstanding_cmds[cnt];
> if (sp) {
> + /*
> + * perform lockless completion while driver unload
> + */
> + if (qla2x00_chip_is_down(vha)) {
> + req->outstanding_cmds[cnt] = NULL;
> + spin_unlock_irqrestore(qp->qp_lock_ptr, flags);
> + sp->done(sp, res);
> + spin_lock_irqsave(qp->qp_lock_ptr, flags);
> + continue;
> + }
> +
> switch (sp->cmd_type) {
> case TYPE_SRB:
> qla2x00_abort_srb(qp, sp, res, &flags);
> -- 
> 2.19.0.rc0
> 

Looks Good. 

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com <mailto:himanshu.madhani@oracle.com>>
John Meneghini March 14, 2023, 2:16 a.m. UTC | #2
This patch has been tested at Red Hat and found to fix the reported problem.

Test-by: Lin Li <lilin@redhat.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>


On 3/13/23 00:37, Nilesh Javali wrote:
> While adding and removing the controller, call trace was observed,
> 
> WARNING: CPU: 3 PID: 623596 at kernel/dma/mapping.c:532 dma_free_attrs+0x33/0x50
> CPU: 3 PID: 623596 Comm: sh Kdump: loaded Not tainted 5.14.0-96.el9.x86_64 #1
> RIP: 0010:dma_free_attrs+0x33/0x50
> 
> Call Trace:
>     qla2x00_async_sns_sp_done+0x107/0x1b0 [qla2xxx]
>     qla2x00_abort_srb+0x8e/0x250 [qla2xxx]
>     ? ql_dbg+0x70/0x100 [qla2xxx]
>     __qla2x00_abort_all_cmds+0x108/0x190 [qla2xxx]
>     qla2x00_abort_all_cmds+0x24/0x70 [qla2xxx]
>     qla2x00_abort_isp_cleanup+0x305/0x3e0 [qla2xxx]
>     qla2x00_remove_one+0x364/0x400 [qla2xxx]
>     pci_device_remove+0x36/0xa0
>     __device_release_driver+0x17a/0x230
>     device_release_driver+0x24/0x30
>     pci_stop_bus_device+0x68/0x90
>     pci_stop_and_remove_bus_device_locked+0x16/0x30
>     remove_store+0x75/0x90
>     kernfs_fop_write_iter+0x11c/0x1b0
>     new_sync_write+0x11f/0x1b0
>     vfs_write+0x1eb/0x280
>     ksys_write+0x5f/0xe0
>     do_syscall_64+0x5c/0x80
>     ? do_user_addr_fault+0x1d8/0x680
>     ? do_syscall_64+0x69/0x80
>     ? exc_page_fault+0x62/0x140
>     ? asm_exc_page_fault+0x8/0x30
>     entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> The command was completed in abort path during driver unload
> with a lock held causing the warning in abort path.
> Hence complete the command without any lock held.
> 
> Reported-by: Lin Li <lilin@redhat.com>
> Tested-by: Lin Li <lilin@redhat.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Nilesh Javali <njavali@marvell.com>
> ---
>   drivers/scsi/qla2xxx/qla_os.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index 545167627e48..d2f6c0546587 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -1858,6 +1858,17 @@ __qla2x00_abort_all_cmds(struct qla_qpair *qp, int res)
>   	for (cnt = 1; cnt < req->num_outstanding_cmds; cnt++) {
>   		sp = req->outstanding_cmds[cnt];
>   		if (sp) {
> +			/*
> +			 * perform lockless completion while driver unload
> +			 */
> +			if (qla2x00_chip_is_down(vha)) {
> +				req->outstanding_cmds[cnt] = NULL;
> +				spin_unlock_irqrestore(qp->qp_lock_ptr, flags);
> +				sp->done(sp, res);
> +				spin_lock_irqsave(qp->qp_lock_ptr, flags);
> +				continue;
> +			}
> +
>   			switch (sp->cmd_type) {
>   			case TYPE_SRB:
>   				qla2x00_abort_srb(qp, sp, res, &flags);
diff mbox series

Patch

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 545167627e48..d2f6c0546587 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -1858,6 +1858,17 @@  __qla2x00_abort_all_cmds(struct qla_qpair *qp, int res)
 	for (cnt = 1; cnt < req->num_outstanding_cmds; cnt++) {
 		sp = req->outstanding_cmds[cnt];
 		if (sp) {
+			/*
+			 * perform lockless completion while driver unload
+			 */
+			if (qla2x00_chip_is_down(vha)) {
+				req->outstanding_cmds[cnt] = NULL;
+				spin_unlock_irqrestore(qp->qp_lock_ptr, flags);
+				sp->done(sp, res);
+				spin_lock_irqsave(qp->qp_lock_ptr, flags);
+				continue;
+			}
+
 			switch (sp->cmd_type) {
 			case TYPE_SRB:
 				qla2x00_abort_srb(qp, sp, res, &flags);