diff mbox series

[2/2] scsi: qla2xxx: don't shut down firmware before closing sessions

Message ID 20191129202627.19624-2-martin.wilck@suse.com (mailing list archive)
State Changes Requested
Headers show
Series [1/2] scsi: qla2xxx: avoid sending mailbox commands if firmware is stopped | expand

Commit Message

Martin Wilck Nov. 29, 2019, 8:26 p.m. UTC
From: Martin Wilck <mwilck@suse.com>

Since 45235022da99, the firmware is shut down early in the controller
shutdown process. This causes commands sent to the firmware (such as LOGO)
to hang forever. Eventually one or more timeouts will be triggered.
Move the stopping of the firmware until after sessions have terminated.

Fixes: 45235022da99 ("scsi: qla2xxx: Fix driver unload by shutting down chip")
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 drivers/scsi/qla2xxx/qla_os.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

Comments

Roman Bolshakov Nov. 30, 2019, 10:23 a.m. UTC | #1
On Fri, Nov 29, 2019 at 08:26:36PM +0000, Martin Wilck wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> Since 45235022da99, the firmware is shut down early in the controller
> shutdown process. This causes commands sent to the firmware (such as LOGO)
> to hang forever. Eventually one or more timeouts will be triggered.
> Move the stopping of the firmware until after sessions have terminated.
> 
> Fixes: 45235022da99 ("scsi: qla2xxx: Fix driver unload by shutting down chip")
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  drivers/scsi/qla2xxx/qla_os.c | 21 ++++++++++-----------
>  1 file changed, 10 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index 43d0aa0..0cc127d 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -3710,6 +3710,16 @@ qla2x00_remove_one(struct pci_dev *pdev)
>  	}
>  	qla2x00_wait_for_hba_ready(base_vha);
>  
> +	qla2x00_wait_for_sess_deletion(base_vha);
> +
> +	/*
> +	 * if UNLOAD flag is already set, then continue unload,
> +	 * where it was set first.
> +	 */
> +	if (test_bit(UNLOADING, &base_vha->dpc_flags))
> +		return;
> +
> +	set_bit(UNLOADING, &base_vha->dpc_flags);

Hi Martin,

Moving qla2x00_wait_for_sess_deletion up ensures hw->wq is flushed
before shutdown is done.

But I think we need to set UNLOADING bit earlier to break-up async
discovery chain. The comment just above qla2x00_wait_for_sess_deletion
definition mentions that.

Thanks,
Roman
Hannes Reinecke Dec. 3, 2019, 7:52 a.m. UTC | #2
On 11/29/19 9:26 PM, Martin Wilck wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> Since 45235022da99, the firmware is shut down early in the controller
> shutdown process. This causes commands sent to the firmware (such as LOGO)
> to hang forever. Eventually one or more timeouts will be triggered.
> Move the stopping of the firmware until after sessions have terminated.
> 
> Fixes: 45235022da99 ("scsi: qla2xxx: Fix driver unload by shutting down chip")
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  drivers/scsi/qla2xxx/qla_os.c | 21 ++++++++++-----------
>  1 file changed, 10 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index 43d0aa0..0cc127d 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -3710,6 +3710,16 @@ qla2x00_remove_one(struct pci_dev *pdev)
>  	}
>  	qla2x00_wait_for_hba_ready(base_vha);
>  
> +	qla2x00_wait_for_sess_deletion(base_vha);
> +
> +	/*
> +	 * if UNLOAD flag is already set, then continue unload,
> +	 * where it was set first.
> +	 */
> +	if (test_bit(UNLOADING, &base_vha->dpc_flags))
> +		return;
> +
> +	set_bit(UNLOADING, &base_vha->dpc_flags);
>  	if (IS_QLA25XX(ha) || IS_QLA2031(ha) || IS_QLA27XX(ha) ||
>  	    IS_QLA28XX(ha)) {
>  		if (ha->flags.fw_started)

test_and_set_bit(), maybe?

Cheers,

Hannes
Bart Van Assche Dec. 11, 2019, 12:06 p.m. UTC | #3
On 11/29/19 3:26 PM, Martin Wilck wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> Since 45235022da99, the firmware is shut down early in the controller
> shutdown process. This causes commands sent to the firmware (such as LOGO)
> to hang forever. Eventually one or more timeouts will be triggered.
> Move the stopping of the firmware until after sessions have terminated.
> 
> Fixes: 45235022da99 ("scsi: qla2xxx: Fix driver unload by shutting down chip")
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>   drivers/scsi/qla2xxx/qla_os.c | 21 ++++++++++-----------
>   1 file changed, 10 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index 43d0aa0..0cc127d 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -3710,6 +3710,16 @@ qla2x00_remove_one(struct pci_dev *pdev)
>   	}
>   	qla2x00_wait_for_hba_ready(base_vha);
>   
> +	qla2x00_wait_for_sess_deletion(base_vha);
> +
> +	/*
> +	 * if UNLOAD flag is already set, then continue unload,
> +	 * where it was set first.
> +	 */
> +	if (test_bit(UNLOADING, &base_vha->dpc_flags))
> +		return;
> +
> +	set_bit(UNLOADING, &base_vha->dpc_flags);
>   	if (IS_QLA25XX(ha) || IS_QLA2031(ha) || IS_QLA27XX(ha) ||
>   	    IS_QLA28XX(ha)) {
>   		if (ha->flags.fw_started)
> @@ -3726,17 +3736,6 @@ qla2x00_remove_one(struct pci_dev *pdev)
>   		qla2x00_try_to_stop_firmware(base_vha);
>   	}
>   
> -	qla2x00_wait_for_sess_deletion(base_vha);
> -
> -	/*
> -	 * if UNLOAD flag is already set, then continue unload,
> -	 * where it was set first.
> -	 */
> -	if (test_bit(UNLOADING, &base_vha->dpc_flags))
> -		return;
> -
> -	set_bit(UNLOADING, &base_vha->dpc_flags);
> -
>   	qla_nvme_delete(base_vha);
>   
>   	dma_free_coherent(&ha->pdev->dev,

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Martin Wilck Feb. 5, 2020, 9:10 a.m. UTC | #4
Hello Roman,

sorry for the late reply. I had temporary issues with email from linux-
scsi, and then was busy with other stuff.

On Sat, 2019-11-30 at 13:23 +0300, Roman Bolshakov wrote:
> On Fri, Nov 29, 2019 at 08:26:36PM +0000, Martin Wilck wrote:
> > From: Martin Wilck <mwilck@suse.com>
> > 
> > Since 45235022da99, the firmware is shut down early in the
> > controller
> > shutdown process. This causes commands sent to the firmware (such
> > as LOGO)
> > to hang forever. Eventually one or more timeouts will be triggered.
> > Move the stopping of the firmware until after sessions have
> > terminated.
> > 
> > Fixes: 45235022da99 ("scsi: qla2xxx: Fix driver unload by shutting
> > down chip")
> > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > ---
> >  drivers/scsi/qla2xxx/qla_os.c | 21 ++++++++++-----------
> >  1 file changed, 10 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/scsi/qla2xxx/qla_os.c
> > b/drivers/scsi/qla2xxx/qla_os.c
> > index 43d0aa0..0cc127d 100644
> > --- a/drivers/scsi/qla2xxx/qla_os.c
> > +++ b/drivers/scsi/qla2xxx/qla_os.c
> > @@ -3710,6 +3710,16 @@ qla2x00_remove_one(struct pci_dev *pdev)
> >  	}
> >  	qla2x00_wait_for_hba_ready(base_vha);
> >  
> > +	qla2x00_wait_for_sess_deletion(base_vha);
> > +
> > +	/*
> > +	 * if UNLOAD flag is already set, then continue unload,
> > +	 * where it was set first.
> > +	 */
> > +	if (test_bit(UNLOADING, &base_vha->dpc_flags))
> > +		return;
> > +
> > +	set_bit(UNLOADING, &base_vha->dpc_flags);
> 
> Hi Martin,
> 
> Moving qla2x00_wait_for_sess_deletion up ensures hw->wq is flushed
> before shutdown is done.
> 
> But I think we need to set UNLOADING bit earlier to break-up async
> discovery chain. The comment just above
> qla2x00_wait_for_sess_deletion
> definition mentions that.

I was unsure about this, because fc_terminate_rport_io() will not send
LOGO any more if UNLOADING is set, which I thought might cause issues
in the SAN. But I suppose it's ok, because after setting flag UNLOADING
we will roughly proceed as follows, sending LOGO if required:

qla2x00_wait_for_sess_deletion()
  qla2x00_mark_all_devices_lost() 
    qlt_schedule_sess_for_deletion() -> set off sess->del_work
qla24xx_delete_sess_fn() (from sess->del_work)
  qlt_unreg_sess -> set off sess->free_work
qlt_free_session_done (from sess->free_work)
  This will send LOGO if sess->logout_on_delete is set.

So, I'll add this to the series, plus Hannes' suggestion to use
test_and_set_bit().

Thanks,
Martin

PS: the qla2xxx driver has multiple flags and tests for "can I access
the controller now?" with (at least for my mind) blurry semantics and
unclear mutual dependencies:

 - dpc_flags: UNLOADING
 - hw->flags.fw_started
 - vha->pci_flags.PFLG_DRIVER_REMOVING
 - qla2x00_chip_is_down() (tests fw_started and the dpc_flags
ISP_ABORT_NEEDED, ABORT_ISP_ACTIVE, ISP_ABORT_RETRY)

I've tried to review how these different flags are used - I don't see
obvious rules as for which flag is tested in what situation. Anyway, as
argued above, I think for the issue at hand using UNLOADING the way you
suggested is correct.
diff mbox series

Patch

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 43d0aa0..0cc127d 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -3710,6 +3710,16 @@  qla2x00_remove_one(struct pci_dev *pdev)
 	}
 	qla2x00_wait_for_hba_ready(base_vha);
 
+	qla2x00_wait_for_sess_deletion(base_vha);
+
+	/*
+	 * if UNLOAD flag is already set, then continue unload,
+	 * where it was set first.
+	 */
+	if (test_bit(UNLOADING, &base_vha->dpc_flags))
+		return;
+
+	set_bit(UNLOADING, &base_vha->dpc_flags);
 	if (IS_QLA25XX(ha) || IS_QLA2031(ha) || IS_QLA27XX(ha) ||
 	    IS_QLA28XX(ha)) {
 		if (ha->flags.fw_started)
@@ -3726,17 +3736,6 @@  qla2x00_remove_one(struct pci_dev *pdev)
 		qla2x00_try_to_stop_firmware(base_vha);
 	}
 
-	qla2x00_wait_for_sess_deletion(base_vha);
-
-	/*
-	 * if UNLOAD flag is already set, then continue unload,
-	 * where it was set first.
-	 */
-	if (test_bit(UNLOADING, &base_vha->dpc_flags))
-		return;
-
-	set_bit(UNLOADING, &base_vha->dpc_flags);
-
 	qla_nvme_delete(base_vha);
 
 	dma_free_coherent(&ha->pdev->dev,