diff mbox series

fnic: to not call 'scsi_done()' for unhandled commands

Message ID 20200515112647.49260-1-hare@suse.de (mailing list archive)
State Changes Requested
Headers show
Series fnic: to not call 'scsi_done()' for unhandled commands | expand

Commit Message

Hannes Reinecke May 15, 2020, 11:26 a.m. UTC
The fnic drivers assigns an ioreq structure to each command, and
severs this assignment once scsi_done() has been called and the
command has been completed.
So when traversing commands to terminate outstanding I/O we should
not call scsi_done() on commands which do not have a corresponding
ioreq structure; these commands have either never entered the driver
or have already been completed.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/scsi/fnic/fnic_scsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Laurence Oberman May 15, 2020, 1:06 p.m. UTC | #1
On Fri, 2020-05-15 at 13:26 +0200, Hannes Reinecke wrote:
> The fnic drivers assigns an ioreq structure to each command, and
> severs this assignment once scsi_done() has been called and the
> command has been completed.
> So when traversing commands to terminate outstanding I/O we should
> not call scsi_done() on commands which do not have a corresponding
> ioreq structure; these commands have either never entered the driver
> or have already been completed.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> ---
>  drivers/scsi/fnic/fnic_scsi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/fnic/fnic_scsi.c
> b/drivers/scsi/fnic/fnic_scsi.c
> index 27535c90b248..8d2798cbd30f 100644
> --- a/drivers/scsi/fnic/fnic_scsi.c
> +++ b/drivers/scsi/fnic/fnic_scsi.c
> @@ -1401,7 +1401,7 @@ static void fnic_cleanup_io(struct fnic *fnic,
> int exclude_id)
>  		}
>  		if (!io_req) {
>  			spin_unlock_irqrestore(io_lock, flags);
> -			goto cleanup_scsi_cmd;
> +			continue;
>  		}
>  
>  		CMD_SP(sc) = NULL;

Hi Hannes,
Thanks for this patch, but can you share what the impact was of this
issue.
What diod you see in logs/behavior

Regards
Laurence
Hannes Reinecke May 15, 2020, 1:52 p.m. UTC | #2
On 5/15/20 3:06 PM, Laurence Oberman wrote:
> On Fri, 2020-05-15 at 13:26 +0200, Hannes Reinecke wrote:
>> The fnic drivers assigns an ioreq structure to each command, and
>> severs this assignment once scsi_done() has been called and the
>> command has been completed.
>> So when traversing commands to terminate outstanding I/O we should
>> not call scsi_done() on commands which do not have a corresponding
>> ioreq structure; these commands have either never entered the driver
>> or have already been completed.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.com>
>> ---
>>   drivers/scsi/fnic/fnic_scsi.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/fnic/fnic_scsi.c
>> b/drivers/scsi/fnic/fnic_scsi.c
>> index 27535c90b248..8d2798cbd30f 100644
>> --- a/drivers/scsi/fnic/fnic_scsi.c
>> +++ b/drivers/scsi/fnic/fnic_scsi.c
>> @@ -1401,7 +1401,7 @@ static void fnic_cleanup_io(struct fnic *fnic,
>> int exclude_id)
>>   		}
>>   		if (!io_req) {
>>   			spin_unlock_irqrestore(io_lock, flags);
>> -			goto cleanup_scsi_cmd;
>> +			continue;
>>   		}
>>   
>>   		CMD_SP(sc) = NULL;
> 
> Hi Hannes,
> Thanks for this patch, but can you share what the impact was of this
> issue.
> What diod you see in logs/behavior
> 
Unmap the LUNs from the array, and reboot the machine.
Causing a nice kernel oops in fnic_terminate_rport_io:

[   41.904013]  rport-3:0-2: blocked FC remote port time out: removing rport
[   41.911625] BUG: kernel NULL pointer dereference, address: 
0000000000000040
[   41.919408] #PF: supervisor read access in kernel mode
[   41.919409] #PF: error_code(0x0000) - not-present page
[   41.919411] PGD 0 P4D 0
[   41.919416] Oops: 0000 [#1] SMP PTI
[   41.919420] CPU: 1 PID: 219 Comm: kworker/1:1 Kdump: loaded Tainted: 
G               X   5.3.18-16-default #1 SLE15-SP2 (unreleased)
[   41.919421] Hardware name: Cisco Systems Inc 
UCSC-C220-M3S/UCSC-C220-M3S, BIOS C220M3.3.0.4e.0.1106191007 11/06/2019
[   41.919433] Workqueue: fc_wq_3 fc_rport_final_delete [scsi_transport_fc]
[   41.919443] RIP: 0010:fnic_terminate_rport_io+0x2db/0x6c0 [fnic]
[   41.919446] Code: 3c c2 e8 48 00 95 f5 48 85 c0 49 89 c5 74 2c 48 05 
20 01 00 00 48 89 44 24 10 74 1f 49 8b 85 58 01 00 00 48 8b 80 c0 01 00 
00 <48> 8b 78 40 e8 1c 0f e4 ff 85 c0 0f 85 b2 fd ff ff 4c 89 e6 48 89
[   41.919448] RSP: 0018:ffffa521c164bde0 EFLAGS: 00010082
[   41.919450] RAX: 0000000000000000 RBX: ffff8c33633587c8 RCX: 
ffff8c3363358bc0
[   41.919452] RDX: ffff8c336347bc80 RSI: 0000000000000080 RDI: 
ffff8c33632dd8c0
[   41.919453] RBP: ffff8c3363359208 R08: 00335f71775f6366 R09: 
8080808080808080
[   41.919455] R10: ffffa521c0087dc8 R11: fefefefefefefeff R12: 
0000000000000246
[   41.919456] R13: ffff8c33633e8100 R14: ffff8c24470a4000 R15: 
0000000000000080
[   41.919459] FS:  0000000000000000(0000) GS:ffff8c33bfa40000(0000) 
knlGS:0000000000000000
[   41.919461] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   41.919466] CR2: 0000000000000040 CR3: 000000011340a003 CR4: 
00000000000606e0
[   42.066910] Call Trace:
[   42.066929]  fc_terminate_rport_io+0x51/0x70 [scsi_transport_fc]
[   42.066935]  fc_rport_final_delete+0x53/0x1e0 [scsi_transport_fc]
[   42.066943]  process_one_work+0x1f4/0x3e0
[   42.066947]  worker_thread+0x2d/0x3e0
[   42.066951]  ? process_one_work+0x3e0/0x3e0
[   42.066954]  kthread+0x10d/0x130
[   42.066957]  ? kthread_park+0xa0/0xa0
[   42.066961]  ret_from_fork+0x35/0x40

Cheers,

Hannes
Laurence Oberman May 15, 2020, 2:04 p.m. UTC | #3
On Fri, 2020-05-15 at 15:52 +0200, Hannes Reinecke wrote:
> On 5/15/20 3:06 PM, Laurence Oberman wrote:
> > On Fri, 2020-05-15 at 13:26 +0200, Hannes Reinecke wrote:
> > > The fnic drivers assigns an ioreq structure to each command, and
> > > severs this assignment once scsi_done() has been called and the
> > > command has been completed.
> > > So when traversing commands to terminate outstanding I/O we
> > > should
> > > not call scsi_done() on commands which do not have a
> > > corresponding
> > > ioreq structure; these commands have either never entered the
> > > driver
> > > or have already been completed.
> > > 
> > > Signed-off-by: Hannes Reinecke <hare@suse.com>
> > > ---
> > >   drivers/scsi/fnic/fnic_scsi.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/scsi/fnic/fnic_scsi.c
> > > b/drivers/scsi/fnic/fnic_scsi.c
> > > index 27535c90b248..8d2798cbd30f 100644
> > > --- a/drivers/scsi/fnic/fnic_scsi.c
> > > +++ b/drivers/scsi/fnic/fnic_scsi.c
> > > @@ -1401,7 +1401,7 @@ static void fnic_cleanup_io(struct fnic
> > > *fnic,
> > > int exclude_id)
> > >   		}
> > >   		if (!io_req) {
> > >   			spin_unlock_irqrestore(io_lock, flags);
> > > -			goto cleanup_scsi_cmd;
> > > +			continue;
> > >   		}
> > >   
> > >   		CMD_SP(sc) = NULL;
> > 
> > Hi Hannes,
> > Thanks for this patch, but can you share what the impact was of
> > this
> > issue.
> > What diod you see in logs/behavior
> > 
> 
> Unmap the LUNs from the array, and reboot the machine.
> Causing a nice kernel oops in fnic_terminate_rport_io:
> 
> [   41.904013]  rport-3:0-2: blocked FC remote port time out:
> removing rport
> [   41.911625] BUG: kernel NULL pointer dereference, address: 
> 0000000000000040
> [   41.919408] #PF: supervisor read access in kernel mode
> [   41.919409] #PF: error_code(0x0000) - not-present page
> [   41.919411] PGD 0 P4D 0
> [   41.919416] Oops: 0000 [#1] SMP PTI
> [   41.919420] CPU: 1 PID: 219 Comm: kworker/1:1 Kdump: loaded
> Tainted: 
> G               X   5.3.18-16-default #1 SLE15-SP2 (unreleased)
> [   41.919421] Hardware name: Cisco Systems Inc 
> UCSC-C220-M3S/UCSC-C220-M3S, BIOS C220M3.3.0.4e.0.1106191007
> 11/06/2019
> [   41.919433] Workqueue: fc_wq_3 fc_rport_final_delete
> [scsi_transport_fc]
> [   41.919443] RIP: 0010:fnic_terminate_rport_io+0x2db/0x6c0 [fnic]
> [   41.919446] Code: 3c c2 e8 48 00 95 f5 48 85 c0 49 89 c5 74 2c 48
> 05 
> 20 01 00 00 48 89 44 24 10 74 1f 49 8b 85 58 01 00 00 48 8b 80 c0 01
> 00 
> 00 <48> 8b 78 40 e8 1c 0f e4 ff 85 c0 0f 85 b2 fd ff ff 4c 89 e6 48
> 89
> [   41.919448] RSP: 0018:ffffa521c164bde0 EFLAGS: 00010082
> [   41.919450] RAX: 0000000000000000 RBX: ffff8c33633587c8 RCX: 
> ffff8c3363358bc0
> [   41.919452] RDX: ffff8c336347bc80 RSI: 0000000000000080 RDI: 
> ffff8c33632dd8c0
> [   41.919453] RBP: ffff8c3363359208 R08: 00335f71775f6366 R09: 
> 8080808080808080
> [   41.919455] R10: ffffa521c0087dc8 R11: fefefefefefefeff R12: 
> 0000000000000246
> [   41.919456] R13: ffff8c33633e8100 R14: ffff8c24470a4000 R15: 
> 0000000000000080
> [   41.919459] FS:  0000000000000000(0000) GS:ffff8c33bfa40000(0000) 
> knlGS:0000000000000000
> [   41.919461] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   41.919466] CR2: 0000000000000040 CR3: 000000011340a003 CR4: 
> 00000000000606e0
> [   42.066910] Call Trace:
> [   42.066929]  fc_terminate_rport_io+0x51/0x70 [scsi_transport_fc]
> [   42.066935]  fc_rport_final_delete+0x53/0x1e0 [scsi_transport_fc]
> [   42.066943]  process_one_work+0x1f4/0x3e0
> [   42.066947]  worker_thread+0x2d/0x3e0
> [   42.066951]  ? process_one_work+0x3e0/0x3e0
> [   42.066954]  kthread+0x10d/0x130
> [   42.066957]  ? kthread_park+0xa0/0xa0
> [   42.066961]  ret_from_fork+0x35/0x40
> 
> Cheers,
> 
> Hannes

Awesome, Ok Thank you!
I looked at the patch and it looks correct to me at least.
I will add a Review but prob best to have the Cisco fnic folks also
review.
Thank you for catching this.

Reviewed-by: Laurence Oberman <loberman@redhat.com>
kernel test robot May 15, 2020, 7:07 p.m. UTC | #4
Hi Hannes,

I love your patch! Perhaps something to improve:

[auto build test WARNING on scsi/for-next]
[also build test WARNING on mkp-scsi/for-next v5.7-rc5 next-20200515]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Hannes-Reinecke/fnic-to-not-call-scsi_done-for-unhandled-commands/20200515-192812
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
config: x86_64-allyesconfig (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 9d4cf5bd421fb6467ff5f00e26a37527246dd4d6)
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>, old ones prefixed by <<):

>> drivers/scsi/fnic/fnic_scsi.c:1419:1: warning: unused label 'cleanup_scsi_cmd' [-Wunused-label]
cleanup_scsi_cmd:
^~~~~~~~~~~~~~~~~
1 warning generated.

vim +/cleanup_scsi_cmd +1419 drivers/scsi/fnic/fnic_scsi.c

5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1361  
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1362  static void fnic_cleanup_io(struct fnic *fnic, int exclude_id)
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1363  {
1259c5dc752474 Sesidhar Beddel   2013-09-09  1364  	int i;
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1365  	struct fnic_io_req *io_req;
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1366  	unsigned long flags = 0;
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1367  	struct scsi_cmnd *sc;
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1368  	spinlock_t *io_lock;
14eb5d905d16ec Hiral Patel       2013-02-12  1369  	unsigned long start_time = 0;
67125b0287a9e6 Hiral Patel       2013-09-12  1370  	struct fnic_stats *fnic_stats = &fnic->fnic_stats;
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1371  
fc85799ee362e3 Hiral Patel       2013-09-09  1372  	for (i = 0; i < fnic->fnic_max_tag_id; i++) {
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1373  		if (i == exclude_id)
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1374  			continue;
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1375  
1259c5dc752474 Sesidhar Beddel   2013-09-09  1376  		io_lock = fnic_io_lock_tag(fnic, i);
1259c5dc752474 Sesidhar Beddel   2013-09-09  1377  		spin_lock_irqsave(io_lock, flags);
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1378  		sc = scsi_host_find_tag(fnic->lport->host, i);
1259c5dc752474 Sesidhar Beddel   2013-09-09  1379  		if (!sc) {
1259c5dc752474 Sesidhar Beddel   2013-09-09  1380  			spin_unlock_irqrestore(io_lock, flags);
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1381  			continue;
1259c5dc752474 Sesidhar Beddel   2013-09-09  1382  		}
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1383  
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1384  		io_req = (struct fnic_io_req *)CMD_SP(sc);
03298552cba38f Hiral Patel       2013-02-12  1385  		if ((CMD_FLAGS(sc) & FNIC_DEVICE_RESET) &&
03298552cba38f Hiral Patel       2013-02-12  1386  			!(CMD_FLAGS(sc) & FNIC_DEV_RST_DONE)) {
03298552cba38f Hiral Patel       2013-02-12  1387  			/*
03298552cba38f Hiral Patel       2013-02-12  1388  			 * We will be here only when FW completes reset
03298552cba38f Hiral Patel       2013-02-12  1389  			 * without sending completions for outstanding ios.
03298552cba38f Hiral Patel       2013-02-12  1390  			 */
03298552cba38f Hiral Patel       2013-02-12  1391  			CMD_FLAGS(sc) |= FNIC_DEV_RST_DONE;
03298552cba38f Hiral Patel       2013-02-12  1392  			if (io_req && io_req->dr_done)
03298552cba38f Hiral Patel       2013-02-12  1393  				complete(io_req->dr_done);
03298552cba38f Hiral Patel       2013-02-12  1394  			else if (io_req && io_req->abts_done)
03298552cba38f Hiral Patel       2013-02-12  1395  				complete(io_req->abts_done);
03298552cba38f Hiral Patel       2013-02-12  1396  			spin_unlock_irqrestore(io_lock, flags);
03298552cba38f Hiral Patel       2013-02-12  1397  			continue;
03298552cba38f Hiral Patel       2013-02-12  1398  		} else if (CMD_FLAGS(sc) & FNIC_DEVICE_RESET) {
03298552cba38f Hiral Patel       2013-02-12  1399  			spin_unlock_irqrestore(io_lock, flags);
03298552cba38f Hiral Patel       2013-02-12  1400  			continue;
03298552cba38f Hiral Patel       2013-02-12  1401  		}
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1402  		if (!io_req) {
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1403  			spin_unlock_irqrestore(io_lock, flags);
cb830d88a62fde Hannes Reinecke   2020-05-15  1404  			continue;
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1405  		}
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1406  
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1407  		CMD_SP(sc) = NULL;
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1408  
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1409  		spin_unlock_irqrestore(io_lock, flags);
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1410  
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1411  		/*
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1412  		 * If there is a scsi_cmnd associated with this io_req, then
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1413  		 * free the corresponding state
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1414  		 */
14eb5d905d16ec Hiral Patel       2013-02-12  1415  		start_time = io_req->start_time;
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1416  		fnic_release_ioreq_buf(fnic, io_req, sc);
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1417  		mempool_free(io_req, fnic->io_req_pool);
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1418  
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17 @1419  cleanup_scsi_cmd:
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1420  		sc->result = DID_TRANSPORT_DISRUPTED << 16;
668186637e013f Hiral Shah        2014-04-18  1421  		FNIC_SCSI_DBG(KERN_DEBUG, fnic->lport->host,
e8bfe3e7ffc380 Satish Kharat     2019-01-14  1422  			      "%s: tag:0x%x : sc:0x%p duration = %lu DID_TRANSPORT_DISRUPTED\n",
e8bfe3e7ffc380 Satish Kharat     2019-01-14  1423  			      __func__, sc->request->tag, sc,
e8bfe3e7ffc380 Satish Kharat     2019-01-14  1424  			      (jiffies - start_time));
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1425  
67125b0287a9e6 Hiral Patel       2013-09-12  1426  		if (atomic64_read(&fnic->io_cmpl_skip))
67125b0287a9e6 Hiral Patel       2013-09-12  1427  			atomic64_dec(&fnic->io_cmpl_skip);
67125b0287a9e6 Hiral Patel       2013-09-12  1428  		else
67125b0287a9e6 Hiral Patel       2013-09-12  1429  			atomic64_inc(&fnic_stats->io_stats.io_completions);
67125b0287a9e6 Hiral Patel       2013-09-12  1430  
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1431  		/* Complete the command to SCSI */
4d7007b49d523d Hiral Patel       2013-02-12  1432  		if (sc->scsi_done) {
e8bfe3e7ffc380 Satish Kharat     2019-01-14  1433  			if (!(CMD_FLAGS(sc) & FNIC_IO_ISSUED))
e8bfe3e7ffc380 Satish Kharat     2019-01-14  1434  				shost_printk(KERN_ERR, fnic->lport->host,
e8bfe3e7ffc380 Satish Kharat     2019-01-14  1435  				"Calling done for IO not issued to fw: tag:0x%x sc:0x%p\n",
e8bfe3e7ffc380 Satish Kharat     2019-01-14  1436  				 sc->request->tag, sc);
e8bfe3e7ffc380 Satish Kharat     2019-01-14  1437  
4d7007b49d523d Hiral Patel       2013-02-12  1438  			FNIC_TRACE(fnic_cleanup_io,
4d7007b49d523d Hiral Patel       2013-02-12  1439  				  sc->device->host->host_no, i, sc,
4d7007b49d523d Hiral Patel       2013-02-12  1440  				  jiffies_to_msecs(jiffies - start_time),
4d7007b49d523d Hiral Patel       2013-02-12  1441  				  0, ((u64)sc->cmnd[0] << 32 |
4d7007b49d523d Hiral Patel       2013-02-12  1442  				  (u64)sc->cmnd[2] << 24 |
4d7007b49d523d Hiral Patel       2013-02-12  1443  				  (u64)sc->cmnd[3] << 16 |
4d7007b49d523d Hiral Patel       2013-02-12  1444  				  (u64)sc->cmnd[4] << 8 | sc->cmnd[5]),
4d7007b49d523d Hiral Patel       2013-02-12  1445  				  (((u64)CMD_FLAGS(sc) << 32) | CMD_STATE(sc)));
4d7007b49d523d Hiral Patel       2013-02-12  1446  
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1447  			sc->scsi_done(sc);
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1448  		}
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1449  	}
4d7007b49d523d Hiral Patel       2013-02-12  1450  }
5df6d737dd4b0f Abhijeet Joglekar 2009-04-17  1451  

:::::: The code at line 1419 was first introduced by commit
:::::: 5df6d737dd4b0fe9eccf943abb3677cfea05a6c4 [SCSI] fnic: Add new Cisco PCI-Express FCoE HBA

:::::: TO: Abhijeet Joglekar <abjoglek@cisco.com>
:::::: CC: James Bottomley <James.Bottomley@HansenPartnership.com>

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Martin Wilck June 23, 2020, 1:53 p.m. UTC | #5
On Fri, 2020-05-15 at 10:04 -0400, Laurence Oberman wrote:
> On Fri, 2020-05-15 at 15:52 +0200, Hannes Reinecke wrote:
> > On 5/15/20 3:06 PM, Laurence Oberman wrote:
> > > On Fri, 2020-05-15 at 13:26 +0200, Hannes Reinecke wrote:
> > > > The fnic drivers assigns an ioreq structure to each command,
> > > > and
> > > > severs this assignment once scsi_done() has been called and the
> > > > command has been completed.
> > > > So when traversing commands to terminate outstanding I/O we
> > > > should
> > > > not call scsi_done() on commands which do not have a
> > > > corresponding
> > > > ioreq structure; these commands have either never entered the
> > > > driver
> > > > or have already been completed.
> > > > 
> > > > Signed-off-by: Hannes Reinecke <hare@suse.com>
> > > > ---
> > > >   drivers/scsi/fnic/fnic_scsi.c | 2 +-
> > > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/scsi/fnic/fnic_scsi.c
> > > > b/drivers/scsi/fnic/fnic_scsi.c
> > > > index 27535c90b248..8d2798cbd30f 100644
> > > > --- a/drivers/scsi/fnic/fnic_scsi.c
> > > > +++ b/drivers/scsi/fnic/fnic_scsi.c
> > > > @@ -1401,7 +1401,7 @@ static void fnic_cleanup_io(struct fnic
> > > > *fnic,
> > > > int exclude_id)
> > > >   		}
> > > >   		if (!io_req) {
> > > >   			spin_unlock_irqrestore(io_lock, flags);
> > > > -			goto cleanup_scsi_cmd;
> > > > +			continue;
> > > >   		}
> > > >   
> > > >   		CMD_SP(sc) = NULL;
> > > 
> > > Hi Hannes,
> > > Thanks for this patch, but can you share what the impact was of
> > > this
> > > issue.
> > > What diod you see in logs/behavior
> > > 
> > 
> > Unmap the LUNs from the array, and reboot the machine.
> > Causing a nice kernel oops in fnic_terminate_rport_io:
> > 
> > [   41.904013]  rport-3:0-2: blocked FC remote port time out:
> > removing rport
> > [   41.911625] BUG: kernel NULL pointer dereference, address: 
> > 0000000000000040
> > [   41.919408] #PF: supervisor read access in kernel mode
> > [   41.919409] #PF: error_code(0x0000) - not-present page
> > [   41.919411] PGD 0 P4D 0
> > [   41.919416] Oops: 0000 [#1] SMP PTI
> > [   41.919420] CPU: 1 PID: 219 Comm: kworker/1:1 Kdump: loaded
> > Tainted: 
> > G               X   5.3.18-16-default #1 SLE15-SP2 (unreleased)
> > [   41.919421] Hardware name: Cisco Systems Inc 
> > UCSC-C220-M3S/UCSC-C220-M3S, BIOS C220M3.3.0.4e.0.1106191007
> > 11/06/2019
> > [   41.919433] Workqueue: fc_wq_3 fc_rport_final_delete
> > [scsi_transport_fc]
> > [   41.919443] RIP: 0010:fnic_terminate_rport_io+0x2db/0x6c0 [fnic]
> > [   41.919446] Code: 3c c2 e8 48 00 95 f5 48 85 c0 49 89 c5 74 2c
> > 48
> > 05 
> > 20 01 00 00 48 89 44 24 10 74 1f 49 8b 85 58 01 00 00 48 8b 80 c0
> > 01
> > 00 
> > 00 <48> 8b 78 40 e8 1c 0f e4 ff 85 c0 0f 85 b2 fd ff ff 4c 89 e6 48
> > 89
> > [   41.919448] RSP: 0018:ffffa521c164bde0 EFLAGS: 00010082
> > [   41.919450] RAX: 0000000000000000 RBX: ffff8c33633587c8 RCX: 
> > ffff8c3363358bc0
> > [   41.919452] RDX: ffff8c336347bc80 RSI: 0000000000000080 RDI: 
> > ffff8c33632dd8c0
> > [   41.919453] RBP: ffff8c3363359208 R08: 00335f71775f6366 R09: 
> > 8080808080808080
> > [   41.919455] R10: ffffa521c0087dc8 R11: fefefefefefefeff R12: 
> > 0000000000000246
> > [   41.919456] R13: ffff8c33633e8100 R14: ffff8c24470a4000 R15: 
> > 0000000000000080
> > [   41.919459] FS:  0000000000000000(0000)
> > GS:ffff8c33bfa40000(0000) 
> > knlGS:0000000000000000
> > [   41.919461] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   41.919466] CR2: 0000000000000040 CR3: 000000011340a003 CR4: 
> > 00000000000606e0
> > [   42.066910] Call Trace:
> > [   42.066929]  fc_terminate_rport_io+0x51/0x70 [scsi_transport_fc]
> > [   42.066935]  fc_rport_final_delete+0x53/0x1e0
> > [scsi_transport_fc]
> > [   42.066943]  process_one_work+0x1f4/0x3e0
> > [   42.066947]  worker_thread+0x2d/0x3e0
> > [   42.066951]  ? process_one_work+0x3e0/0x3e0
> > [   42.066954]  kthread+0x10d/0x130
> > [   42.066957]  ? kthread_park+0xa0/0xa0
> > [   42.066961]  ret_from_fork+0x35/0x40
> > 
> > Cheers,
> > 
> > Hannes
> 
> Awesome, Ok Thank you!
> I looked at the patch and it looks correct to me at least.
> I will add a Review but prob best to have the Cisco fnic folks also
> review.
> Thank you for catching this.
> 
> Reviewed-by: Laurence Oberman <loberman@redhat.com>

It seems that the Cisco people have missed this patch. Added them to
the recpient list now.

Regards
Martin
Martin K. Petersen Oct. 3, 2020, 1:11 a.m. UTC | #6
Karan,

> This fix was tested by Cisco QA and the fix looks good.

Applied to 5.10/scsi-staging with a warning fix. Thanks!
Martin K. Petersen Oct. 7, 2020, 3:47 a.m. UTC | #7
On Fri, 15 May 2020 13:26:47 +0200, Hannes Reinecke wrote:

> The fnic drivers assigns an ioreq structure to each command, and
> severs this assignment once scsi_done() has been called and the
> command has been completed.
> So when traversing commands to terminate outstanding I/O we should
> not call scsi_done() on commands which do not have a corresponding
> ioreq structure; these commands have either never entered the driver
> or have already been completed.

Applied to 5.10/scsi-queue, thanks!

[1/1] scsi: fnic: Do not call 'scsi_done()' for unhandled commands
      https://git.kernel.org/mkp/scsi/c/712582e60f28
diff mbox series

Patch

diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
index 27535c90b248..8d2798cbd30f 100644
--- a/drivers/scsi/fnic/fnic_scsi.c
+++ b/drivers/scsi/fnic/fnic_scsi.c
@@ -1401,7 +1401,7 @@  static void fnic_cleanup_io(struct fnic *fnic, int exclude_id)
 		}
 		if (!io_req) {
 			spin_unlock_irqrestore(io_lock, flags);
-			goto cleanup_scsi_cmd;
+			continue;
 		}
 
 		CMD_SP(sc) = NULL;