Message ID | 20180403215042.12824-1-david.carroll@microsemi.com (mailing list archive) |
---|---|
State | Accepted |
> -----Original Message-----
> From: Dave Carroll [mailto:david.carroll@microsemi.com]
> Sent: Wednesday, April 4, 2018 3:21 AM
> To: Martin K . Petersen <martin.petersen@oracle.com>; James Bottomley <jejb@linux.vnet.ibm.com>
> Cc: Dave Carroll <david.carroll@microsemi.com>; linux-scsi <linux-scsi@vger.kernel.org>; dl-esc-Aacraid Linux Driver <aacraid@microsemi.com>; Scott Benesh <scott.benesh@microsemi.com>
> Subject: [PATCH] aacraid: Insure command thread is not recursively stopped
>
> If a recursive IOP_RESET is invoked, usually due to the eh_thread handling
> errors after the first reset, be sure we flag that the command thread has
> been stopped to avoid an Oops of the form;
>
> [ 336.620256] CPU: 28 PID: 1193 Comm: scsi_eh_0 Kdump: loaded Not tainted 4.14.0-49.el7a.ppc64le #1
> [ 336.620297] task: c000003fd630b800 task.stack: c000003fd61a4000
> [ 336.620326] NIP: c000000000176794 LR: c00000000013038c CTR: c00000000024bc10
> [ 336.620361] REGS: c000003fd61a7720 TRAP: 0300 Not tainted (4.14.0-49.el7a.ppc64le)
> [ 336.620395] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 22084022 XER: 20040000
> [ 336.620435] CFAR: c000000000130388 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1
> [ 336.620435] GPR00: c00000000013038c c000003fd61a79a0 c0000000014c7e00 0000000000000000
> [ 336.620435] GPR04: 000000000000000c 000000000000000c 9000000000009033 0000000000000477
> [ 336.620435] GPR08: 0000000000000477 0000000000000000 0000000000000000 c008000010f7d940
> [ 336.620435] GPR12: c00000000024bc10 c000000007a33400 c0000000001708a8 c000003fe3b881d8
> [ 336.620435] GPR16: c000003fe3b88060 c000003fd61a7d10 fffffffffffff000 000000000000001e
> [ 336.620435] GPR20: 0000000000000001 c000000000ebf1a0 0000000000000001 c000003fe3b88000
> [ 336.620435] GPR24: 0000000000000003 0000000000000002 c000003fe3b88840 c000003fe3b887e8
> [ 336.620435] GPR28: c000003fe3b88000 c000003fc8181788 0000000000000000 c000003fc8181700
> [ 336.620750] NIP [c000000000176794] exit_creds+0x34/0x160
> [ 336.620775] LR [c00000000013038c] __put_task_struct+0x8c/0x1f0
> [ 336.620804] Call Trace:
> [ 336.620817] [c000003fd61a79a0] [c000003fe3b88000] 0xc000003fe3b88000 (unreliable)
> [ 336.620853] [c000003fd61a79d0] [c00000000013038c] __put_task_struct+0x8c/0x1f0
> [ 336.620889] [c000003fd61a7a00] [c000000000171418] kthread_stop+0x1e8/0x1f0
> [ 336.620922] [c000003fd61a7a40] [c008000010f7448c] aac_reset_adapter+0x14c/0x8d0 [aacraid]
> [ 336.620959] [c000003fd61a7b00] [c008000010f60174] aac_eh_host_reset+0x84/0x100 [aacraid]
> [ 336.621010] [c000003fd61a7b30] [c000000000864f24] scsi_try_host_reset+0x74/0x180
> [ 336.621046] [c000003fd61a7bb0] [c000000000867ac0] scsi_eh_ready_devs+0xc00/0x14d0
> [ 336.625165] [c000003fd61a7ca0] [c0000000008699e0] scsi_error_handler+0x550/0x730
> [ 336.632101] [c000003fd61a7dc0] [c000000000170a08] kthread+0x168/0x1b0
> [ 336.639031] [c000003fd61a7e30] [c00000000000b528] ret_from_kernel_thread+0x5c/0xb4
> [ 336.645971] Instruction dump:
> [ 336.648743] 384216a0 7c0802a6 fbe1fff8 f8010010 f821ffd1 7c7f1b78 60000000 60000000
> [ 336.657056] 39400000 e87f0838 f95f0838 7c0004ac <7d401828> 314affff 7d40192d 40c2fff4
> [ 336.663997] ---[ end trace 4640cf8d4945ad95 ]---
>
> So flag when the thread is stopped by setting the thread pointer to NULL.
>
> Signed-off-by: Dave Carroll <david.carroll@microsemi.com>

Reviewed-by: Raghava Aditya Renukunta <raghavaaditya.renukunta@microsemi.com>

Dave,

> If a recursive IOP_RESET is invoked, usually due to the eh_thread
> handling errors after the first reset, be sure we flag that the
> command thread has been stopped to avoid an Oops of the form;

Applied to 4.17/scsi-fixes. Thanks!

diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index 84858d5..0156c96 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -1502,9 +1502,10 @@ static int _aac_reset_adapter(struct aac_dev *aac, int forced, u8 reset_type)
 	host = aac->scsi_host_ptr;
 	scsi_block_requests(host);
 	aac_adapter_disable_int(aac);
-	if (aac->thread->pid != current->pid) {
+	if (aac->thread && aac->thread->pid != current->pid) {
 		spin_unlock_irq(host->host_lock);
 		kthread_stop(aac->thread);
+		aac->thread = NULL;
 		jafo = 1;
 	}
 
@@ -1591,6 +1592,7 @@ static int _aac_reset_adapter(struct aac_dev *aac, int forced, u8 reset_type)
 					  aac->name);
 		if (IS_ERR(aac->thread)) {
 			retval = PTR_ERR(aac->thread);
+			aac->thread = NULL;
 			goto out;
 		}
 	}
diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
index 2664ea0..f24fb94 100644
--- a/drivers/scsi/aacraid/linit.c
+++ b/drivers/scsi/aacraid/linit.c
@@ -1562,6 +1562,7 @@ static void __aac_shutdown(struct aac_dev * aac)
 			up(&fib->event_wait);
 		}
 		kthread_stop(aac->thread);
+		aac->thread = NULL;
 	}
 	aac_send_shutdown(aac);
 

If a recursive IOP_RESET is invoked, usually due to the eh_thread handling
errors after the first reset, be sure we flag that the command thread has
been stopped to avoid an Oops of the form;

[ 336.620256] CPU: 28 PID: 1193 Comm: scsi_eh_0 Kdump: loaded Not tainted 4.14.0-49.el7a.ppc64le #1
[ 336.620297] task: c000003fd630b800 task.stack: c000003fd61a4000
[ 336.620326] NIP: c000000000176794 LR: c00000000013038c CTR: c00000000024bc10
[ 336.620361] REGS: c000003fd61a7720 TRAP: 0300 Not tainted (4.14.0-49.el7a.ppc64le)
[ 336.620395] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 22084022 XER: 20040000
[ 336.620435] CFAR: c000000000130388 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1
[ 336.620435] GPR00: c00000000013038c c000003fd61a79a0 c0000000014c7e00 0000000000000000
[ 336.620435] GPR04: 000000000000000c 000000000000000c 9000000000009033 0000000000000477
[ 336.620435] GPR08: 0000000000000477 0000000000000000 0000000000000000 c008000010f7d940
[ 336.620435] GPR12: c00000000024bc10 c000000007a33400 c0000000001708a8 c000003fe3b881d8
[ 336.620435] GPR16: c000003fe3b88060 c000003fd61a7d10 fffffffffffff000 000000000000001e
[ 336.620435] GPR20: 0000000000000001 c000000000ebf1a0 0000000000000001 c000003fe3b88000
[ 336.620435] GPR24: 0000000000000003 0000000000000002 c000003fe3b88840 c000003fe3b887e8
[ 336.620435] GPR28: c000003fe3b88000 c000003fc8181788 0000000000000000 c000003fc8181700
[ 336.620750] NIP [c000000000176794] exit_creds+0x34/0x160
[ 336.620775] LR [c00000000013038c] __put_task_struct+0x8c/0x1f0
[ 336.620804] Call Trace:
[ 336.620817] [c000003fd61a79a0] [c000003fe3b88000] 0xc000003fe3b88000 (unreliable)
[ 336.620853] [c000003fd61a79d0] [c00000000013038c] __put_task_struct+0x8c/0x1f0
[ 336.620889] [c000003fd61a7a00] [c000000000171418] kthread_stop+0x1e8/0x1f0
[ 336.620922] [c000003fd61a7a40] [c008000010f7448c] aac_reset_adapter+0x14c/0x8d0 [aacraid]
[ 336.620959] [c000003fd61a7b00] [c008000010f60174] aac_eh_host_reset+0x84/0x100 [aacraid]
[ 336.621010] [c000003fd61a7b30] [c000000000864f24] scsi_try_host_reset+0x74/0x180
[ 336.621046] [c000003fd61a7bb0] [c000000000867ac0] scsi_eh_ready_devs+0xc00/0x14d0
[ 336.625165] [c000003fd61a7ca0] [c0000000008699e0] scsi_error_handler+0x550/0x730
[ 336.632101] [c000003fd61a7dc0] [c000000000170a08] kthread+0x168/0x1b0
[ 336.639031] [c000003fd61a7e30] [c00000000000b528] ret_from_kernel_thread+0x5c/0xb4
[ 336.645971] Instruction dump:
[ 336.648743] 384216a0 7c0802a6 fbe1fff8 f8010010 f821ffd1 7c7f1b78 60000000 60000000
[ 336.657056] 39400000 e87f0838 f95f0838 7c0004ac <7d401828> 314affff 7d40192d 40c2fff4
[ 336.663997] ---[ end trace 4640cf8d4945ad95 ]---

So flag when the thread is stopped by setting the thread pointer to NULL.

Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
---
 drivers/scsi/aacraid/commsup.c | 4 +++-
 drivers/scsi/aacraid/linit.c   | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)
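
For readers who want to try the idea outside the kernel, the following is a minimal userspace sketch of the "stop once, then clear the pointer" pattern the patch applies to aac->thread. It is not driver code: struct fake_task, struct fake_dev, fake_kthread_stop(), fake_start(), fake_reset() and demo_recursive_reset() are all hypothetical stand-ins invented for illustration, and free() only approximates the fact that the task may be released once kthread_stop() has run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel task; not the real task_struct. */
struct fake_task {
	int pid;
};

/* Hypothetical stand-in for struct aac_dev, reduced to the relevant field. */
struct fake_dev {
	struct fake_task *thread;
	int stops;		/* number of times the worker was really stopped */
};

/*
 * Stand-in for kthread_stop(): it releases the task, so calling it
 * again with the same stale pointer would touch freed memory.
 */
static void fake_kthread_stop(struct fake_task *t)
{
	free(t);
}

/* Create the fake worker; returns 0 on success, -1 on allocation failure. */
static int fake_start(struct fake_dev *dev, int pid)
{
	dev->thread = malloc(sizeof(*dev->thread));
	if (!dev->thread)
		return -1;
	dev->thread->pid = pid;
	return 0;
}

/* Guarded stop, shaped like the patched check in the reset path. */
static void fake_reset(struct fake_dev *dev, int current_pid)
{
	if (dev->thread && dev->thread->pid != current_pid) {
		fake_kthread_stop(dev->thread);
		dev->thread = NULL;	/* flag that the worker is gone */
		dev->stops++;
	}
}

/* Runs two back-to-back resets and reports how many real stops happened. */
int demo_recursive_reset(void)
{
	struct fake_dev dev = { NULL, 0 };

	if (fake_start(&dev, 42) != 0)
		return -1;

	fake_reset(&dev, 1193);	/* first reset stops the worker */
	fake_reset(&dev, 1193);	/* recursive reset: pointer is NULL, no-op */

	assert(dev.thread == NULL);
	return dev.stops;
}
```

In this sketch the second fake_reset() call does nothing because the pointer was cleared by the first one; without the NULL assignment and the NULL check, the second call would dereference and free a stale pointer, which is the userspace analogue of the Oops in the commit message.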