diff mbox

[03/29] scsi: aacraid: Fix hang in kdump

Message ID 20171221173420.8213-4-RaghavaAditya.Renukunta@microsemi.com (mailing list archive)
State Superseded
Headers show

Commit Message

Raghava Aditya Renukunta Dec. 21, 2017, 5:33 p.m. UTC
Driver attempts to perform a device scan and device add after coming out
of reset. At times when the kdump kernel loads and it tries to perform
eh recovery, the device scan hangs since its commands are blocked because
of the eh recovery. This should have shown up in normal eh recovery path
(Should have been obvious)

Remove the code that performs scanning.I can live without the rescanning
support in the stable kernels but a hanging kdump/eh recovery needs to be
fixed.

Fixes: a2d0321dd532901e (scsi: aacraid: Reload offlined drives after controller reset)
Cc: <stable@vger.kernel.org>
Reported-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Fixes: a2d0321dd532901e (scsi: aacraid: Reload offlined drives after controller reset)
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
---
 drivers/scsi/aacraid/commsup.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

Comments

Guilherme G. Piccoli Dec. 21, 2017, 7:15 p.m. UTC | #1
On 12/21/2017 03:33 PM, Raghava Aditya Renukunta wrote:
> Driver attempts to perform a device scan and device add after coming out
> of reset. At times when the kdump kernel loads and it tries to perform
> eh recovery, the device scan hangs since its commands are blocked because
> of the eh recovery. This should have shown up in normal eh recovery path
> (Should have been obvious)
> 
> Remove the code that performs scanning.I can live without the rescanning
> support in the stable kernels but a hanging kdump/eh recovery needs to be
> fixed.
> 
> Fixes: a2d0321dd532901e (scsi: aacraid: Reload offlined drives after controller reset)
> Cc: <stable@vger.kernel.org>
> Reported-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
> Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
> Fixes: a2d0321dd532901e (scsi: aacraid: Reload offlined drives after controller reset)
> Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>

Thanks a lot Raghava =)

> ---
>  drivers/scsi/aacraid/commsup.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
> index 525a652..ffbfd04 100644
> --- a/drivers/scsi/aacraid/commsup.c
> +++ b/drivers/scsi/aacraid/commsup.c
> @@ -1672,14 +1672,7 @@ static int _aac_reset_adapter(struct aac_dev *aac, int forced, u8 reset_type)
>  out:
>  	aac->in_reset = 0;
>  	scsi_unblock_requests(host);
> -	/*
> -	 * Issue bus rescan to catch any configuration that might have
> -	 * occurred
> -	 */
> -	if (!retval) {
> -		dev_info(&aac->pdev->dev, "Issuing bus rescan\n");
> -		scsi_scan_host(host);
> -	}
> +
>  	if (jafo) {
>  		spin_lock_irq(host->host_lock);
>  	}
>
Guilherme G. Piccoli Dec. 22, 2017, 3:13 p.m. UTC | #2
On 12/21/2017 03:33 PM, Raghava Aditya Renukunta wrote:
> Driver attempts to perform a device scan and device add after coming out
> of reset. At times when the kdump kernel loads and it tries to perform
> eh recovery, the device scan hangs since its commands are blocked because
> of the eh recovery. This should have shown up in normal eh recovery path
> (Should have been obvious)
> 
> Remove the code that performs scanning.I can live without the rescanning
> support in the stable kernels but a hanging kdump/eh recovery needs to be
> fixed.
> 
> Fixes: a2d0321dd532901e (scsi: aacraid: Reload offlined drives after controller reset)
> Cc: <stable@vger.kernel.org>
> Reported-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>

(Sorry in advance for flooding the thread heheh)
I guess it'd be more appropriate to:

Reported-by: Douglas Miller <dougmill@linux.vnet.ibm.com>

Although I've tested it, Doug isolated the race condition based on code
analysis...

Thanks,


Guilherme

> Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
> Fixes: a2d0321dd532901e (scsi: aacraid: Reload offlined drives after controller reset)
> Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
> ---
>  drivers/scsi/aacraid/commsup.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
> index 525a652..ffbfd04 100644
> --- a/drivers/scsi/aacraid/commsup.c
> +++ b/drivers/scsi/aacraid/commsup.c
> @@ -1672,14 +1672,7 @@ static int _aac_reset_adapter(struct aac_dev *aac, int forced, u8 reset_type)
>  out:
>  	aac->in_reset = 0;
>  	scsi_unblock_requests(host);
> -	/*
> -	 * Issue bus rescan to catch any configuration that might have
> -	 * occurred
> -	 */
> -	if (!retval) {
> -		dev_info(&aac->pdev->dev, "Issuing bus rescan\n");
> -		scsi_scan_host(host);
> -	}
> +
>  	if (jafo) {
>  		spin_lock_irq(host->host_lock);
>  	}
>
Raghava Aditya Renukunta Dec. 27, 2017, 1:28 a.m. UTC | #3
> -----Original Message-----

> From: Guilherme G. Piccoli [mailto:gpiccoli@linux.vnet.ibm.com]

> Sent: Friday, December 22, 2017 7:14 AM

> To: Raghava Aditya Renukunta

> <RaghavaAditya.Renukunta@microsemi.com>; jejb@linux.vnet.ibm.com;

> martin.petersen@oracle.com; linux-scsi@vger.kernel.org

> Cc: Scott Benesh <scott.benesh@microsemi.com>; dl-esc-Aacraid Linux

> Driver <aacraid@microsemi.com>; Tom White

> <tom.white@microsemi.com>; dougmill@linux.vnet.ibm.com

> Subject: Re: [PATCH 03/29] scsi: aacraid: Fix hang in kdump

> 

> EXTERNAL EMAIL

> 

> 

> On 12/21/2017 03:33 PM, Raghava Aditya Renukunta wrote:

> > Driver attempts to perform a device scan and device add after coming out

> > of reset. At times when the kdump kernel loads and it tries to perform

> > eh recovery, the device scan hangs since its commands are blocked

> because

> > of the eh recovery. This should have shown up in normal eh recovery path

> > (Should have been obvious)

> >

> > Remove the code that performs scanning.I can live without the rescanning

> > support in the stable kernels but a hanging kdump/eh recovery needs to

> be

> > fixed.

> >

> > Fixes: a2d0321dd532901e (scsi: aacraid: Reload offlined drives after

> controller reset)

> > Cc: <stable@vger.kernel.org>

> > Reported-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>

> 

> (Sorry in advance for flooding the thread heheh)

> I guess it'd be more appropriate to:

> 

> Reported-by: Douglas Miller <dougmill@linux.vnet.ibm.com>

> 

> Although I've tested it, Doug isolated the race condition based on code

> analysis...


Thank you pointing that out, I will fix it in the next iteration.

Regards,
Raghava Aditya

> Thanks,

> 

> 

> Guilherme

> 

> > Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>

> > Fixes: a2d0321dd532901e (scsi: aacraid: Reload offlined drives after

> controller reset)

> > Signed-off-by: Raghava Aditya Renukunta

> <RaghavaAditya.Renukunta@microsemi.com>

> > ---

> >  drivers/scsi/aacraid/commsup.c | 9 +--------

> >  1 file changed, 1 insertion(+), 8 deletions(-)

> >

> > diff --git a/drivers/scsi/aacraid/commsup.c

> b/drivers/scsi/aacraid/commsup.c

> > index 525a652..ffbfd04 100644

> > --- a/drivers/scsi/aacraid/commsup.c

> > +++ b/drivers/scsi/aacraid/commsup.c

> > @@ -1672,14 +1672,7 @@ static int _aac_reset_adapter(struct aac_dev

> *aac, int forced, u8 reset_type)

> >  out:

> >       aac->in_reset = 0;

> >       scsi_unblock_requests(host);

> > -     /*

> > -      * Issue bus rescan to catch any configuration that might have

> > -      * occurred

> > -      */

> > -     if (!retval) {

> > -             dev_info(&aac->pdev->dev, "Issuing bus rescan\n");

> > -             scsi_scan_host(host);

> > -     }

> > +

> >       if (jafo) {

> >               spin_lock_irq(host->host_lock);

> >       }

> >
diff mbox

Patch

diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index 525a652..ffbfd04 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -1672,14 +1672,7 @@  static int _aac_reset_adapter(struct aac_dev *aac, int forced, u8 reset_type)
 out:
 	aac->in_reset = 0;
 	scsi_unblock_requests(host);
-	/*
-	 * Issue bus rescan to catch any configuration that might have
-	 * occurred
-	 */
-	if (!retval) {
-		dev_info(&aac->pdev->dev, "Issuing bus rescan\n");
-		scsi_scan_host(host);
-	}
+
 	if (jafo) {
 		spin_lock_irq(host->host_lock);
 	}