Message ID | 20200515112647.49260-1-hare@suse.de (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | fnic: to not call 'scsi_done()' for unhandled commands | expand |
On Fri, 2020-05-15 at 13:26 +0200, Hannes Reinecke wrote: > The fnic drivers assigns an ioreq structure to each command, and > severs this assignment once scsi_done() has been called and the > command has been completed. > So when traversing commands to terminate outstanding I/O we should > not call scsi_done() on commands which do not have a corresponding > ioreq structure; these commands have either never entered the driver > or have already been completed. > > Signed-off-by: Hannes Reinecke <hare@suse.com> > --- > drivers/scsi/fnic/fnic_scsi.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/scsi/fnic/fnic_scsi.c > b/drivers/scsi/fnic/fnic_scsi.c > index 27535c90b248..8d2798cbd30f 100644 > --- a/drivers/scsi/fnic/fnic_scsi.c > +++ b/drivers/scsi/fnic/fnic_scsi.c > @@ -1401,7 +1401,7 @@ static void fnic_cleanup_io(struct fnic *fnic, > int exclude_id) > } > if (!io_req) { > spin_unlock_irqrestore(io_lock, flags); > - goto cleanup_scsi_cmd; > + continue; > } > > CMD_SP(sc) = NULL; Hi Hannes, Thanks for this patch, but can you share what the impact was of this issue. What diod you see in logs/behavior Regards Laurence
On 5/15/20 3:06 PM, Laurence Oberman wrote: > On Fri, 2020-05-15 at 13:26 +0200, Hannes Reinecke wrote: >> The fnic drivers assigns an ioreq structure to each command, and >> severs this assignment once scsi_done() has been called and the >> command has been completed. >> So when traversing commands to terminate outstanding I/O we should >> not call scsi_done() on commands which do not have a corresponding >> ioreq structure; these commands have either never entered the driver >> or have already been completed. >> >> Signed-off-by: Hannes Reinecke <hare@suse.com> >> --- >> drivers/scsi/fnic/fnic_scsi.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/scsi/fnic/fnic_scsi.c >> b/drivers/scsi/fnic/fnic_scsi.c >> index 27535c90b248..8d2798cbd30f 100644 >> --- a/drivers/scsi/fnic/fnic_scsi.c >> +++ b/drivers/scsi/fnic/fnic_scsi.c >> @@ -1401,7 +1401,7 @@ static void fnic_cleanup_io(struct fnic *fnic, >> int exclude_id) >> } >> if (!io_req) { >> spin_unlock_irqrestore(io_lock, flags); >> - goto cleanup_scsi_cmd; >> + continue; >> } >> >> CMD_SP(sc) = NULL; > > Hi Hannes, > Thanks for this patch, but can you share what the impact was of this > issue. > What diod you see in logs/behavior > Unmap the LUNs from the array, and reboot the machine. Causing a nice kernel oops in fnic_terminate_rport_io: [ 41.904013] rport-3:0-2: blocked FC remote port time out: removing rport [ 41.911625] BUG: kernel NULL pointer dereference, address: 0000000000000040 [ 41.919408] #PF: supervisor read access in kernel mode [ 41.919409] #PF: error_code(0x0000) - not-present page [ 41.919411] PGD 0 P4D 0 [ 41.919416] Oops: 0000 [#1] SMP PTI [ 41.919420] CPU: 1 PID: 219 Comm: kworker/1:1 Kdump: loaded Tainted: G X 5.3.18-16-default #1 SLE15-SP2 (unreleased) [ 41.919421] Hardware name: Cisco Systems Inc UCSC-C220-M3S/UCSC-C220-M3S, BIOS C220M3.3.0.4e.0.1106191007 11/06/2019 [ 41.919433] Workqueue: fc_wq_3 fc_rport_final_delete [scsi_transport_fc] [ 41.919443] RIP: 0010:fnic_terminate_rport_io+0x2db/0x6c0 [fnic] [ 41.919446] Code: 3c c2 e8 48 00 95 f5 48 85 c0 49 89 c5 74 2c 48 05 20 01 00 00 48 89 44 24 10 74 1f 49 8b 85 58 01 00 00 48 8b 80 c0 01 00 00 <48> 8b 78 40 e8 1c 0f e4 ff 85 c0 0f 85 b2 fd ff ff 4c 89 e6 48 89 [ 41.919448] RSP: 0018:ffffa521c164bde0 EFLAGS: 00010082 [ 41.919450] RAX: 0000000000000000 RBX: ffff8c33633587c8 RCX: ffff8c3363358bc0 [ 41.919452] RDX: ffff8c336347bc80 RSI: 0000000000000080 RDI: ffff8c33632dd8c0 [ 41.919453] RBP: ffff8c3363359208 R08: 00335f71775f6366 R09: 8080808080808080 [ 41.919455] R10: ffffa521c0087dc8 R11: fefefefefefefeff R12: 0000000000000246 [ 41.919456] R13: ffff8c33633e8100 R14: ffff8c24470a4000 R15: 0000000000000080 [ 41.919459] FS: 0000000000000000(0000) GS:ffff8c33bfa40000(0000) knlGS:0000000000000000 [ 41.919461] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 41.919466] CR2: 0000000000000040 CR3: 000000011340a003 CR4: 00000000000606e0 [ 42.066910] Call Trace: [ 42.066929] fc_terminate_rport_io+0x51/0x70 [scsi_transport_fc] [ 42.066935] fc_rport_final_delete+0x53/0x1e0 [scsi_transport_fc] [ 42.066943] process_one_work+0x1f4/0x3e0 [ 42.066947] worker_thread+0x2d/0x3e0 [ 42.066951] ? process_one_work+0x3e0/0x3e0 [ 42.066954] kthread+0x10d/0x130 [ 42.066957] ? kthread_park+0xa0/0xa0 [ 42.066961] ret_from_fork+0x35/0x40 Cheers, Hannes
On Fri, 2020-05-15 at 15:52 +0200, Hannes Reinecke wrote: > On 5/15/20 3:06 PM, Laurence Oberman wrote: > > On Fri, 2020-05-15 at 13:26 +0200, Hannes Reinecke wrote: > > > The fnic drivers assigns an ioreq structure to each command, and > > > severs this assignment once scsi_done() has been called and the > > > command has been completed. > > > So when traversing commands to terminate outstanding I/O we > > > should > > > not call scsi_done() on commands which do not have a > > > corresponding > > > ioreq structure; these commands have either never entered the > > > driver > > > or have already been completed. > > > > > > Signed-off-by: Hannes Reinecke <hare@suse.com> > > > --- > > > drivers/scsi/fnic/fnic_scsi.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/drivers/scsi/fnic/fnic_scsi.c > > > b/drivers/scsi/fnic/fnic_scsi.c > > > index 27535c90b248..8d2798cbd30f 100644 > > > --- a/drivers/scsi/fnic/fnic_scsi.c > > > +++ b/drivers/scsi/fnic/fnic_scsi.c > > > @@ -1401,7 +1401,7 @@ static void fnic_cleanup_io(struct fnic > > > *fnic, > > > int exclude_id) > > > } > > > if (!io_req) { > > > spin_unlock_irqrestore(io_lock, flags); > > > - goto cleanup_scsi_cmd; > > > + continue; > > > } > > > > > > CMD_SP(sc) = NULL; > > > > Hi Hannes, > > Thanks for this patch, but can you share what the impact was of > > this > > issue. > > What diod you see in logs/behavior > > > > Unmap the LUNs from the array, and reboot the machine. > Causing a nice kernel oops in fnic_terminate_rport_io: > > [ 41.904013] rport-3:0-2: blocked FC remote port time out: > removing rport > [ 41.911625] BUG: kernel NULL pointer dereference, address: > 0000000000000040 > [ 41.919408] #PF: supervisor read access in kernel mode > [ 41.919409] #PF: error_code(0x0000) - not-present page > [ 41.919411] PGD 0 P4D 0 > [ 41.919416] Oops: 0000 [#1] SMP PTI > [ 41.919420] CPU: 1 PID: 219 Comm: kworker/1:1 Kdump: loaded > Tainted: > G X 5.3.18-16-default #1 SLE15-SP2 (unreleased) > [ 41.919421] Hardware name: Cisco Systems Inc > UCSC-C220-M3S/UCSC-C220-M3S, BIOS C220M3.3.0.4e.0.1106191007 > 11/06/2019 > [ 41.919433] Workqueue: fc_wq_3 fc_rport_final_delete > [scsi_transport_fc] > [ 41.919443] RIP: 0010:fnic_terminate_rport_io+0x2db/0x6c0 [fnic] > [ 41.919446] Code: 3c c2 e8 48 00 95 f5 48 85 c0 49 89 c5 74 2c 48 > 05 > 20 01 00 00 48 89 44 24 10 74 1f 49 8b 85 58 01 00 00 48 8b 80 c0 01 > 00 > 00 <48> 8b 78 40 e8 1c 0f e4 ff 85 c0 0f 85 b2 fd ff ff 4c 89 e6 48 > 89 > [ 41.919448] RSP: 0018:ffffa521c164bde0 EFLAGS: 00010082 > [ 41.919450] RAX: 0000000000000000 RBX: ffff8c33633587c8 RCX: > ffff8c3363358bc0 > [ 41.919452] RDX: ffff8c336347bc80 RSI: 0000000000000080 RDI: > ffff8c33632dd8c0 > [ 41.919453] RBP: ffff8c3363359208 R08: 00335f71775f6366 R09: > 8080808080808080 > [ 41.919455] R10: ffffa521c0087dc8 R11: fefefefefefefeff R12: > 0000000000000246 > [ 41.919456] R13: ffff8c33633e8100 R14: ffff8c24470a4000 R15: > 0000000000000080 > [ 41.919459] FS: 0000000000000000(0000) GS:ffff8c33bfa40000(0000) > knlGS:0000000000000000 > [ 41.919461] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 41.919466] CR2: 0000000000000040 CR3: 000000011340a003 CR4: > 00000000000606e0 > [ 42.066910] Call Trace: > [ 42.066929] fc_terminate_rport_io+0x51/0x70 [scsi_transport_fc] > [ 42.066935] fc_rport_final_delete+0x53/0x1e0 [scsi_transport_fc] > [ 42.066943] process_one_work+0x1f4/0x3e0 > [ 42.066947] worker_thread+0x2d/0x3e0 > [ 42.066951] ? process_one_work+0x3e0/0x3e0 > [ 42.066954] kthread+0x10d/0x130 > [ 42.066957] ? kthread_park+0xa0/0xa0 > [ 42.066961] ret_from_fork+0x35/0x40 > > Cheers, > > Hannes Awesome, Ok Thank you! I looked at the patch and it looks correct to me at least. I will add a Review but prob best to have the Cisco fnic folks also review. Thank you for catching this. Reviewed-by: Laurence Oberman <loberman@redhat.com>
Hi Hannes,
I love your patch! Perhaps something to improve:
[auto build test WARNING on scsi/for-next]
[also build test WARNING on mkp-scsi/for-next v5.7-rc5 next-20200515]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
url: https://github.com/0day-ci/linux/commits/Hannes-Reinecke/fnic-to-not-call-scsi_done-for-unhandled-commands/20200515-192812
base: https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
config: x86_64-allyesconfig (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 9d4cf5bd421fb6467ff5f00e26a37527246dd4d6)
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>
All warnings (new ones prefixed by >>, old ones prefixed by <<):
>> drivers/scsi/fnic/fnic_scsi.c:1419:1: warning: unused label 'cleanup_scsi_cmd' [-Wunused-label]
cleanup_scsi_cmd:
^~~~~~~~~~~~~~~~~
1 warning generated.
vim +/cleanup_scsi_cmd +1419 drivers/scsi/fnic/fnic_scsi.c
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1361
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1362 static void fnic_cleanup_io(struct fnic *fnic, int exclude_id)
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1363 {
1259c5dc752474 Sesidhar Beddel 2013-09-09 1364 int i;
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1365 struct fnic_io_req *io_req;
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1366 unsigned long flags = 0;
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1367 struct scsi_cmnd *sc;
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1368 spinlock_t *io_lock;
14eb5d905d16ec Hiral Patel 2013-02-12 1369 unsigned long start_time = 0;
67125b0287a9e6 Hiral Patel 2013-09-12 1370 struct fnic_stats *fnic_stats = &fnic->fnic_stats;
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1371
fc85799ee362e3 Hiral Patel 2013-09-09 1372 for (i = 0; i < fnic->fnic_max_tag_id; i++) {
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1373 if (i == exclude_id)
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1374 continue;
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1375
1259c5dc752474 Sesidhar Beddel 2013-09-09 1376 io_lock = fnic_io_lock_tag(fnic, i);
1259c5dc752474 Sesidhar Beddel 2013-09-09 1377 spin_lock_irqsave(io_lock, flags);
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1378 sc = scsi_host_find_tag(fnic->lport->host, i);
1259c5dc752474 Sesidhar Beddel 2013-09-09 1379 if (!sc) {
1259c5dc752474 Sesidhar Beddel 2013-09-09 1380 spin_unlock_irqrestore(io_lock, flags);
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1381 continue;
1259c5dc752474 Sesidhar Beddel 2013-09-09 1382 }
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1383
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1384 io_req = (struct fnic_io_req *)CMD_SP(sc);
03298552cba38f Hiral Patel 2013-02-12 1385 if ((CMD_FLAGS(sc) & FNIC_DEVICE_RESET) &&
03298552cba38f Hiral Patel 2013-02-12 1386 !(CMD_FLAGS(sc) & FNIC_DEV_RST_DONE)) {
03298552cba38f Hiral Patel 2013-02-12 1387 /*
03298552cba38f Hiral Patel 2013-02-12 1388 * We will be here only when FW completes reset
03298552cba38f Hiral Patel 2013-02-12 1389 * without sending completions for outstanding ios.
03298552cba38f Hiral Patel 2013-02-12 1390 */
03298552cba38f Hiral Patel 2013-02-12 1391 CMD_FLAGS(sc) |= FNIC_DEV_RST_DONE;
03298552cba38f Hiral Patel 2013-02-12 1392 if (io_req && io_req->dr_done)
03298552cba38f Hiral Patel 2013-02-12 1393 complete(io_req->dr_done);
03298552cba38f Hiral Patel 2013-02-12 1394 else if (io_req && io_req->abts_done)
03298552cba38f Hiral Patel 2013-02-12 1395 complete(io_req->abts_done);
03298552cba38f Hiral Patel 2013-02-12 1396 spin_unlock_irqrestore(io_lock, flags);
03298552cba38f Hiral Patel 2013-02-12 1397 continue;
03298552cba38f Hiral Patel 2013-02-12 1398 } else if (CMD_FLAGS(sc) & FNIC_DEVICE_RESET) {
03298552cba38f Hiral Patel 2013-02-12 1399 spin_unlock_irqrestore(io_lock, flags);
03298552cba38f Hiral Patel 2013-02-12 1400 continue;
03298552cba38f Hiral Patel 2013-02-12 1401 }
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1402 if (!io_req) {
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1403 spin_unlock_irqrestore(io_lock, flags);
cb830d88a62fde Hannes Reinecke 2020-05-15 1404 continue;
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1405 }
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1406
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1407 CMD_SP(sc) = NULL;
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1408
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1409 spin_unlock_irqrestore(io_lock, flags);
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1410
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1411 /*
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1412 * If there is a scsi_cmnd associated with this io_req, then
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1413 * free the corresponding state
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1414 */
14eb5d905d16ec Hiral Patel 2013-02-12 1415 start_time = io_req->start_time;
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1416 fnic_release_ioreq_buf(fnic, io_req, sc);
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1417 mempool_free(io_req, fnic->io_req_pool);
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1418
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 @1419 cleanup_scsi_cmd:
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1420 sc->result = DID_TRANSPORT_DISRUPTED << 16;
668186637e013f Hiral Shah 2014-04-18 1421 FNIC_SCSI_DBG(KERN_DEBUG, fnic->lport->host,
e8bfe3e7ffc380 Satish Kharat 2019-01-14 1422 "%s: tag:0x%x : sc:0x%p duration = %lu DID_TRANSPORT_DISRUPTED\n",
e8bfe3e7ffc380 Satish Kharat 2019-01-14 1423 __func__, sc->request->tag, sc,
e8bfe3e7ffc380 Satish Kharat 2019-01-14 1424 (jiffies - start_time));
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1425
67125b0287a9e6 Hiral Patel 2013-09-12 1426 if (atomic64_read(&fnic->io_cmpl_skip))
67125b0287a9e6 Hiral Patel 2013-09-12 1427 atomic64_dec(&fnic->io_cmpl_skip);
67125b0287a9e6 Hiral Patel 2013-09-12 1428 else
67125b0287a9e6 Hiral Patel 2013-09-12 1429 atomic64_inc(&fnic_stats->io_stats.io_completions);
67125b0287a9e6 Hiral Patel 2013-09-12 1430
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1431 /* Complete the command to SCSI */
4d7007b49d523d Hiral Patel 2013-02-12 1432 if (sc->scsi_done) {
e8bfe3e7ffc380 Satish Kharat 2019-01-14 1433 if (!(CMD_FLAGS(sc) & FNIC_IO_ISSUED))
e8bfe3e7ffc380 Satish Kharat 2019-01-14 1434 shost_printk(KERN_ERR, fnic->lport->host,
e8bfe3e7ffc380 Satish Kharat 2019-01-14 1435 "Calling done for IO not issued to fw: tag:0x%x sc:0x%p\n",
e8bfe3e7ffc380 Satish Kharat 2019-01-14 1436 sc->request->tag, sc);
e8bfe3e7ffc380 Satish Kharat 2019-01-14 1437
4d7007b49d523d Hiral Patel 2013-02-12 1438 FNIC_TRACE(fnic_cleanup_io,
4d7007b49d523d Hiral Patel 2013-02-12 1439 sc->device->host->host_no, i, sc,
4d7007b49d523d Hiral Patel 2013-02-12 1440 jiffies_to_msecs(jiffies - start_time),
4d7007b49d523d Hiral Patel 2013-02-12 1441 0, ((u64)sc->cmnd[0] << 32 |
4d7007b49d523d Hiral Patel 2013-02-12 1442 (u64)sc->cmnd[2] << 24 |
4d7007b49d523d Hiral Patel 2013-02-12 1443 (u64)sc->cmnd[3] << 16 |
4d7007b49d523d Hiral Patel 2013-02-12 1444 (u64)sc->cmnd[4] << 8 | sc->cmnd[5]),
4d7007b49d523d Hiral Patel 2013-02-12 1445 (((u64)CMD_FLAGS(sc) << 32) | CMD_STATE(sc)));
4d7007b49d523d Hiral Patel 2013-02-12 1446
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1447 sc->scsi_done(sc);
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1448 }
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1449 }
4d7007b49d523d Hiral Patel 2013-02-12 1450 }
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 1451
:::::: The code at line 1419 was first introduced by commit
:::::: 5df6d737dd4b0fe9eccf943abb3677cfea05a6c4 [SCSI] fnic: Add new Cisco PCI-Express FCoE HBA
:::::: TO: Abhijeet Joglekar <abjoglek@cisco.com>
:::::: CC: James Bottomley <James.Bottomley@HansenPartnership.com>
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
On Fri, 2020-05-15 at 10:04 -0400, Laurence Oberman wrote: > On Fri, 2020-05-15 at 15:52 +0200, Hannes Reinecke wrote: > > On 5/15/20 3:06 PM, Laurence Oberman wrote: > > > On Fri, 2020-05-15 at 13:26 +0200, Hannes Reinecke wrote: > > > > The fnic drivers assigns an ioreq structure to each command, > > > > and > > > > severs this assignment once scsi_done() has been called and the > > > > command has been completed. > > > > So when traversing commands to terminate outstanding I/O we > > > > should > > > > not call scsi_done() on commands which do not have a > > > > corresponding > > > > ioreq structure; these commands have either never entered the > > > > driver > > > > or have already been completed. > > > > > > > > Signed-off-by: Hannes Reinecke <hare@suse.com> > > > > --- > > > > drivers/scsi/fnic/fnic_scsi.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/scsi/fnic/fnic_scsi.c > > > > b/drivers/scsi/fnic/fnic_scsi.c > > > > index 27535c90b248..8d2798cbd30f 100644 > > > > --- a/drivers/scsi/fnic/fnic_scsi.c > > > > +++ b/drivers/scsi/fnic/fnic_scsi.c > > > > @@ -1401,7 +1401,7 @@ static void fnic_cleanup_io(struct fnic > > > > *fnic, > > > > int exclude_id) > > > > } > > > > if (!io_req) { > > > > spin_unlock_irqrestore(io_lock, flags); > > > > - goto cleanup_scsi_cmd; > > > > + continue; > > > > } > > > > > > > > CMD_SP(sc) = NULL; > > > > > > Hi Hannes, > > > Thanks for this patch, but can you share what the impact was of > > > this > > > issue. > > > What diod you see in logs/behavior > > > > > > > Unmap the LUNs from the array, and reboot the machine. > > Causing a nice kernel oops in fnic_terminate_rport_io: > > > > [ 41.904013] rport-3:0-2: blocked FC remote port time out: > > removing rport > > [ 41.911625] BUG: kernel NULL pointer dereference, address: > > 0000000000000040 > > [ 41.919408] #PF: supervisor read access in kernel mode > > [ 41.919409] #PF: error_code(0x0000) - not-present page > > [ 41.919411] PGD 0 P4D 0 > > [ 41.919416] Oops: 0000 [#1] SMP PTI > > [ 41.919420] CPU: 1 PID: 219 Comm: kworker/1:1 Kdump: loaded > > Tainted: > > G X 5.3.18-16-default #1 SLE15-SP2 (unreleased) > > [ 41.919421] Hardware name: Cisco Systems Inc > > UCSC-C220-M3S/UCSC-C220-M3S, BIOS C220M3.3.0.4e.0.1106191007 > > 11/06/2019 > > [ 41.919433] Workqueue: fc_wq_3 fc_rport_final_delete > > [scsi_transport_fc] > > [ 41.919443] RIP: 0010:fnic_terminate_rport_io+0x2db/0x6c0 [fnic] > > [ 41.919446] Code: 3c c2 e8 48 00 95 f5 48 85 c0 49 89 c5 74 2c > > 48 > > 05 > > 20 01 00 00 48 89 44 24 10 74 1f 49 8b 85 58 01 00 00 48 8b 80 c0 > > 01 > > 00 > > 00 <48> 8b 78 40 e8 1c 0f e4 ff 85 c0 0f 85 b2 fd ff ff 4c 89 e6 48 > > 89 > > [ 41.919448] RSP: 0018:ffffa521c164bde0 EFLAGS: 00010082 > > [ 41.919450] RAX: 0000000000000000 RBX: ffff8c33633587c8 RCX: > > ffff8c3363358bc0 > > [ 41.919452] RDX: ffff8c336347bc80 RSI: 0000000000000080 RDI: > > ffff8c33632dd8c0 > > [ 41.919453] RBP: ffff8c3363359208 R08: 00335f71775f6366 R09: > > 8080808080808080 > > [ 41.919455] R10: ffffa521c0087dc8 R11: fefefefefefefeff R12: > > 0000000000000246 > > [ 41.919456] R13: ffff8c33633e8100 R14: ffff8c24470a4000 R15: > > 0000000000000080 > > [ 41.919459] FS: 0000000000000000(0000) > > GS:ffff8c33bfa40000(0000) > > knlGS:0000000000000000 > > [ 41.919461] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 41.919466] CR2: 0000000000000040 CR3: 000000011340a003 CR4: > > 00000000000606e0 > > [ 42.066910] Call Trace: > > [ 42.066929] fc_terminate_rport_io+0x51/0x70 [scsi_transport_fc] > > [ 42.066935] fc_rport_final_delete+0x53/0x1e0 > > [scsi_transport_fc] > > [ 42.066943] process_one_work+0x1f4/0x3e0 > > [ 42.066947] worker_thread+0x2d/0x3e0 > > [ 42.066951] ? process_one_work+0x3e0/0x3e0 > > [ 42.066954] kthread+0x10d/0x130 > > [ 42.066957] ? kthread_park+0xa0/0xa0 > > [ 42.066961] ret_from_fork+0x35/0x40 > > > > Cheers, > > > > Hannes > > Awesome, Ok Thank you! > I looked at the patch and it looks correct to me at least. > I will add a Review but prob best to have the Cisco fnic folks also > review. > Thank you for catching this. > > Reviewed-by: Laurence Oberman <loberman@redhat.com> It seems that the Cisco people have missed this patch. Added them to the recpient list now. Regards Martin
Karan,
> This fix was tested by Cisco QA and the fix looks good.
Applied to 5.10/scsi-staging with a warning fix. Thanks!
On Fri, 15 May 2020 13:26:47 +0200, Hannes Reinecke wrote: > The fnic drivers assigns an ioreq structure to each command, and > severs this assignment once scsi_done() has been called and the > command has been completed. > So when traversing commands to terminate outstanding I/O we should > not call scsi_done() on commands which do not have a corresponding > ioreq structure; these commands have either never entered the driver > or have already been completed. Applied to 5.10/scsi-queue, thanks! [1/1] scsi: fnic: Do not call 'scsi_done()' for unhandled commands https://git.kernel.org/mkp/scsi/c/712582e60f28
diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c index 27535c90b248..8d2798cbd30f 100644 --- a/drivers/scsi/fnic/fnic_scsi.c +++ b/drivers/scsi/fnic/fnic_scsi.c @@ -1401,7 +1401,7 @@ static void fnic_cleanup_io(struct fnic *fnic, int exclude_id) } if (!io_req) { spin_unlock_irqrestore(io_lock, flags); - goto cleanup_scsi_cmd; + continue; } CMD_SP(sc) = NULL;
The fnic drivers assigns an ioreq structure to each command, and severs this assignment once scsi_done() has been called and the command has been completed. So when traversing commands to terminate outstanding I/O we should not call scsi_done() on commands which do not have a corresponding ioreq structure; these commands have either never entered the driver or have already been completed. Signed-off-by: Hannes Reinecke <hare@suse.com> --- drivers/scsi/fnic/fnic_scsi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)