
aacraid: Insure command thread is not recursively stopped

Message ID 20180403215042.12824-1-david.carroll@microsemi.com (mailing list archive)
State Accepted

Commit Message

Dave Carroll April 3, 2018, 9:50 p.m. UTC
If a recursive IOP_RESET is invoked, usually because the eh_thread is handling
errors after the first reset, be sure we flag that the command thread has
been stopped, to avoid an Oops of the form:

 [ 336.620256] CPU: 28 PID: 1193 Comm: scsi_eh_0 Kdump: loaded Not tainted 4.14.0-49.el7a.ppc64le #1
 [ 336.620297] task: c000003fd630b800 task.stack: c000003fd61a4000
 [ 336.620326] NIP: c000000000176794 LR: c00000000013038c CTR: c00000000024bc10
 [ 336.620361] REGS: c000003fd61a7720 TRAP: 0300 Not tainted (4.14.0-49.el7a.ppc64le)
 [ 336.620395] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 22084022 XER: 20040000
 [ 336.620435] CFAR: c000000000130388 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1
 [ 336.620435] GPR00: c00000000013038c c000003fd61a79a0 c0000000014c7e00 0000000000000000
 [ 336.620435] GPR04: 000000000000000c 000000000000000c 9000000000009033 0000000000000477
 [ 336.620435] GPR08: 0000000000000477 0000000000000000 0000000000000000 c008000010f7d940
 [ 336.620435] GPR12: c00000000024bc10 c000000007a33400 c0000000001708a8 c000003fe3b881d8
 [ 336.620435] GPR16: c000003fe3b88060 c000003fd61a7d10 fffffffffffff000 000000000000001e
 [ 336.620435] GPR20: 0000000000000001 c000000000ebf1a0 0000000000000001 c000003fe3b88000
 [ 336.620435] GPR24: 0000000000000003 0000000000000002 c000003fe3b88840 c000003fe3b887e8
 [ 336.620435] GPR28: c000003fe3b88000 c000003fc8181788 0000000000000000 c000003fc8181700
 [ 336.620750] NIP [c000000000176794] exit_creds+0x34/0x160
 [ 336.620775] LR [c00000000013038c] __put_task_struct+0x8c/0x1f0
 [ 336.620804] Call Trace:
 [ 336.620817] [c000003fd61a79a0] [c000003fe3b88000] 0xc000003fe3b88000 (unreliable)
 [ 336.620853] [c000003fd61a79d0] [c00000000013038c] __put_task_struct+0x8c/0x1f0
 [ 336.620889] [c000003fd61a7a00] [c000000000171418] kthread_stop+0x1e8/0x1f0
 [ 336.620922] [c000003fd61a7a40] [c008000010f7448c] aac_reset_adapter+0x14c/0x8d0 [aacraid]
 [ 336.620959] [c000003fd61a7b00] [c008000010f60174] aac_eh_host_reset+0x84/0x100 [aacraid]
 [ 336.621010] [c000003fd61a7b30] [c000000000864f24] scsi_try_host_reset+0x74/0x180
 [ 336.621046] [c000003fd61a7bb0] [c000000000867ac0] scsi_eh_ready_devs+0xc00/0x14d0
 [ 336.625165] [c000003fd61a7ca0] [c0000000008699e0] scsi_error_handler+0x550/0x730
 [ 336.632101] [c000003fd61a7dc0] [c000000000170a08] kthread+0x168/0x1b0
 [ 336.639031] [c000003fd61a7e30] [c00000000000b528] ret_from_kernel_thread+0x5c/0xb4
 [ 336.645971] Instruction dump:
 [ 336.648743] 384216a0 7c0802a6 fbe1fff8 f8010010 f821ffd1 7c7f1b78 60000000 60000000
 [ 336.657056] 39400000 e87f0838 f95f0838 7c0004ac <7d401828> 314affff 7d40192d 40c2fff4
 [ 336.663997] ---[ end trace 4640cf8d4945ad95 ]---

So flag when the thread is stopped by setting the thread pointer to NULL.

Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
---
 drivers/scsi/aacraid/commsup.c | 4 +++-
 drivers/scsi/aacraid/linit.c   | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

Comments

Raghava Aditya Renukunta April 4, 2018, 12:37 p.m. UTC | #1
> -----Original Message-----
> From: Dave Carroll [mailto:david.carroll@microsemi.com]
> Sent: Wednesday, April 4, 2018 3:21 AM
> To: Martin K . Petersen <martin.petersen@oracle.com>; James Bottomley
> <jejb@linux.vnet.ibm.com>
> Cc: Dave Carroll <david.carroll@microsemi.com>; linux-scsi <linux-
> scsi@vger.kernel.org>; dl-esc-Aacraid Linux Driver
> <aacraid@microsemi.com>; Scott Benesh <scott.benesh@microsemi.com>
> Subject: [PATCH] aacraid: Insure command thread is not recursively stopped
> 
> If a recursive IOP_RESET is invoked, usually due to the eh_thread handling
> errors after the first reset, be sure we flag that the command thread has
> been stopped to avoid an Oops of the form;
> 
> So flag when the thread is stopped by setting the thread pointer to NULL.
> 
> Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
Reviewed-by: Raghava Aditya Renukunta <raghavaaditya.renukunta@microsemi.com>
Martin K. Petersen April 10, 2018, 1:10 a.m. UTC | #2
Dave,

> If a recursive IOP_RESET is invoked, usually due to the eh_thread
> handling errors after the first reset, be sure we flag that the
> command thread has been stopped to avoid an Oops of the form;

Applied to 4.17/scsi-fixes. Thanks!

Patch

diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index 84858d5..0156c96 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -1502,9 +1502,10 @@  static int _aac_reset_adapter(struct aac_dev *aac, int forced, u8 reset_type)
 	host = aac->scsi_host_ptr;
 	scsi_block_requests(host);
 	aac_adapter_disable_int(aac);
-	if (aac->thread->pid != current->pid) {
+	if (aac->thread && aac->thread->pid != current->pid) {
 		spin_unlock_irq(host->host_lock);
 		kthread_stop(aac->thread);
+		aac->thread = NULL;
 		jafo = 1;
 	}
 
@@ -1591,6 +1592,7 @@  static int _aac_reset_adapter(struct aac_dev *aac, int forced, u8 reset_type)
 					  aac->name);
 		if (IS_ERR(aac->thread)) {
 			retval = PTR_ERR(aac->thread);
+			aac->thread = NULL;
 			goto out;
 		}
 	}
diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
index 2664ea0..f24fb94 100644
--- a/drivers/scsi/aacraid/linit.c
+++ b/drivers/scsi/aacraid/linit.c
@@ -1562,6 +1562,7 @@  static void __aac_shutdown(struct aac_dev * aac)
 				up(&fib->event_wait);
 		}
 		kthread_stop(aac->thread);
+		aac->thread = NULL;
 	}
 
 	aac_send_shutdown(aac);
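
The guard added above can be illustrated outside the kernel with a small user-space model. This is only a sketch under stated assumptions: `fake_task`, `fake_dev`, `fake_thread_run()`, `fake_thread_stop()`, `reset_adapter()` and `demo_recursive_reset()` are hypothetical stand-ins invented for illustration and are not part of aacraid or the kthread API. The model assumes that stopping the thread releases the task object (the trace shows `kthread_stop` reaching `__put_task_struct`), so a second stop through the stale pointer would touch freed memory unless the pointer is cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel task; not the real task_struct. */
struct fake_task {
	int pid;
};

/* Hypothetical stand-in for struct aac_dev, reduced to the relevant field. */
struct fake_dev {
	struct fake_task *thread;	/* NULL means "command thread not running" */
	int stops;			/* how many times the thread was really stopped */
};

/* Assumed model of starting the command thread: allocate a task object. */
static struct fake_task *fake_thread_run(int pid)
{
	struct fake_task *t = malloc(sizeof(*t));

	if (t)
		t->pid = pid;
	return t;
}

/* Assumed model of stopping it: the task object is released here. */
static void fake_thread_stop(struct fake_task *t)
{
	free(t);
}

/*
 * Mirrors the shape of the patched check: stop the thread only if it
 * exists and is not the caller, then clear the pointer so that a later
 * (recursive) reset sees NULL and skips the stop.
 * Returns 1 if the thread was stopped by this call, 0 otherwise.
 */
static int reset_adapter(struct fake_dev *dev, int current_pid)
{
	if (dev->thread && dev->thread->pid != current_pid) {
		fake_thread_stop(dev->thread);
		dev->thread = NULL;
		dev->stops++;
		return 1;
	}
	return 0;
}

/* Two back-to-back resets from the same caller; returns the stop count. */
static int demo_recursive_reset(void)
{
	struct fake_dev dev = { fake_thread_run(42), 0 };

	if (!dev.thread)
		return -1;		/* allocation failed; nothing to show */

	reset_adapter(&dev, 1193);	/* first reset stops the thread */
	reset_adapter(&dev, 1193);	/* second reset finds NULL, does nothing */
	assert(dev.thread == NULL);
	return dev.stops;
}
```

Without the `dev->thread = NULL` assignment, the second `reset_adapter()` call in this model would dereference and free an already freed object, which is the user-space analogue of the Oops reported above.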