diff mbox

scsi: restart list search after unlock in scsi_remove_target

Message ID 20151019143546.GA7486@lst.de (mailing list archive)
State New, archived
Headers show

Commit Message

Christoph Hellwig Oct. 19, 2015, 2:35 p.m. UTC
When dropping a lock while iterating a list we must restart the search
as other threads could have manipulated the list under us.  Without this
we can get stuck in an endless loop.

Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Bart Van Assche Oct. 19, 2015, 3:36 p.m. UTC | #1
On 10/19/2015 07:35 AM, Christoph Hellwig wrote:
> When dropping a lock while iterating a list we must restart the search
> as other threads could have manipulated the list under us.  Without this
> we can get stuck in an endless loop.

Hello Christoph,

Thanks for looking into this. However, I think we need a motivation in 
the patch description why this patch does not reintroduce the soft 
lockup documented in patch "scsi_remove_target: fix softlockup 
regression on hot remove" (commit bc3f02a795d3).

Thanks,

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Oct. 19, 2015, 3:56 p.m. UTC | #2
On Mon, Oct 19, 2015 at 08:36:23AM -0700, Bart Van Assche wrote:
> Thanks for looking into this. However, I think we need a motivation in the 
> patch description why this patch does not reintroduce the soft lockup 
> documented in patch "scsi_remove_target: fix softlockup regression on hot 
> remove" (commit bc3f02a795d3).

Interesting.  I tried to find the original report and what state
changes would cause an endless loop here.  Dan, do you remember any
details about this bug report?
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley Oct. 19, 2015, 4:01 p.m. UTC | #3
On Mon, 2015-10-19 at 17:56 +0200, Christoph Hellwig wrote:
> On Mon, Oct 19, 2015 at 08:36:23AM -0700, Bart Van Assche wrote:
> > Thanks for looking into this. However, I think we need a motivation in the 
> > patch description why this patch does not reintroduce the soft lockup 
> > documented in patch "scsi_remove_target: fix softlockup regression on hot 
> > remove" (commit bc3f02a795d3).
> 
> Interesting.  I tried to find the original report and what state
> changes would cause an endless loop here.

The original report is this one:

http://thread.gmane.org/gmane.linux.kernel/1348679

James

>   Dan, do you remember any
> details about this bug report?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Williams Oct. 19, 2015, 5:21 p.m. UTC | #4
On Mon, Oct 19, 2015 at 8:56 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Mon, Oct 19, 2015 at 08:36:23AM -0700, Bart Van Assche wrote:
>> Thanks for looking into this. However, I think we need a motivation in the
>> patch description why this patch does not reintroduce the soft lockup
>> documented in patch "scsi_remove_target: fix softlockup regression on hot
>> remove" (commit bc3f02a795d3).
>
> Interesting.  I tried to find the original report and what state
> changes would cause an endless loop here.  Dan, do you remember any
> details about this bug report?

I believe the issue I was seeing back then might have been fixed or at
least modulated by "f2495e228fce [SCSI] dual scan thread bug fix"
which came a few years later.  The original problem was hot-remove
racing hot-add and that scsi_target_reap() was not guaranteed to
advance the state of the target if it was in the process of being
scanned when a removal event arrived.  However the comment in that
change:

+       /*
+        * if we get here and the target is still in the CREATED state that
+        * means it was allocated but never made visible (because a scan
+        * turned up no LUNs), so don't call device_del() on it.
+        */

...is not what I was seeing.  The target was in the CREATED state
because it had not yet completed the initial scan before tear down was
initiated.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Johannes Thumshirn Oct. 26, 2015, 8:35 a.m. UTC | #5
On Mon, 2015-10-19 at 16:35 +0200, Christoph Hellwig wrote:
> When dropping a lock while iterating a list we must restart the
> search
> as other threads could have manipulated the list under us.  Without
> this
> we can get stuck in an endless loop.
> 
> Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index b333389f..d3b34d8 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1158,31 +1158,23 @@ static void __scsi_remove_target(struct
> scsi_target *starget)
>  void scsi_remove_target(struct device *dev)
>  {
>  	struct Scsi_Host *shost = dev_to_shost(dev->parent);
> -	struct scsi_target *starget, *last = NULL;
> +	struct scsi_target *starget;
>  	unsigned long flags;
>  
> -	/* remove targets being careful to lookup next entry before
> -	 * deleting the last
> -	 */
> +restart:
>  	spin_lock_irqsave(shost->host_lock, flags);
>  	list_for_each_entry(starget, &shost->__targets, siblings) {
>  		if (starget->state == STARGET_DEL)
>  			continue;
>  		if (starget->dev.parent == dev || &starget->dev ==
> dev) {
> -			/* assuming new targets arrive at the end */
>  			kref_get(&starget->reap_ref);
>  			spin_unlock_irqrestore(shost->host_lock,
> flags);
> -			if (last)
> -				scsi_target_reap(last);
> -			last = starget;
>  			__scsi_remove_target(starget);
> -			spin_lock_irqsave(shost->host_lock, flags);
> +			scsi_target_reap(starget);
> +			goto restart;
>  		}
>  	}
>  	spin_unlock_irqrestore(shost->host_lock, flags);
> -
> -	if (last)
> -		scsi_target_reap(last);
>  }
>  EXPORT_SYMBOL(scsi_remove_target);

Hi Christoph,

I haven't heard anything from the original reporter of the lockup but
my test's went all O.K., so 

Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

>  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" 
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Johannes Thumshirn Oct. 27, 2015, 4:33 p.m. UTC | #6
On Mon, 2015-10-19 at 16:35 +0200, Christoph Hellwig wrote:
> When dropping a lock while iterating a list we must restart the
> search
> as other threads could have manipulated the list under us.  Without
> this
> we can get stuck in an endless loop.
> 
> Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

I have an official acknowledge from the original author that a 48h long
term test on their side didn't reproduce the lockup as well.

James can we please get the patch included in 4.4 and stable?

Thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index b333389f..d3b34d8 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1158,31 +1158,23 @@  static void __scsi_remove_target(struct scsi_target *starget)
 void scsi_remove_target(struct device *dev)
 {
 	struct Scsi_Host *shost = dev_to_shost(dev->parent);
-	struct scsi_target *starget, *last = NULL;
+	struct scsi_target *starget;
 	unsigned long flags;
 
-	/* remove targets being careful to lookup next entry before
-	 * deleting the last
-	 */
+restart:
 	spin_lock_irqsave(shost->host_lock, flags);
 	list_for_each_entry(starget, &shost->__targets, siblings) {
 		if (starget->state == STARGET_DEL)
 			continue;
 		if (starget->dev.parent == dev || &starget->dev == dev) {
-			/* assuming new targets arrive at the end */
 			kref_get(&starget->reap_ref);
 			spin_unlock_irqrestore(shost->host_lock, flags);
-			if (last)
-				scsi_target_reap(last);
-			last = starget;
 			__scsi_remove_target(starget);
-			spin_lock_irqsave(shost->host_lock, flags);
+			scsi_target_reap(starget);
+			goto restart;
 		}
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
-
-	if (last)
-		scsi_target_reap(last);
 }
 EXPORT_SYMBOL(scsi_remove_target);