diff mbox

[2/2] Restart list search after unlock in scsi_remove_target

Message ID 5633EA98.8050604@sandisk.com (mailing list archive)
State New, archived
Headers show

Commit Message

Bart Van Assche Oct. 30, 2015, 10:09 p.m. UTC
When dropping a lock while iterating a list we must restart the search
as other threads could have manipulated the list under us. Without this
we can get stuck in an endless loop.

This is a slightly modified version of a patch from Christoph Hellwig
(see also https://www.spinics.net/lists/linux-scsi/msg89416.html).

Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: stable <stable@vger.kernel.org>
---
 drivers/scsi/scsi_sysfs.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

Comments

Bart Van Assche Nov. 4, 2015, 10:35 p.m. UTC | #1
On 10/30/2015 03:09 PM, Bart Van Assche wrote:
> When dropping a lock while iterating a list we must restart the search
> as other threads could have manipulated the list under us. Without this
> we can get stuck in an endless loop.
>
> This is a slightly modified version of a patch from Christoph Hellwig
> (see also https://www.spinics.net/lists/linux-scsi/msg89416.html).
>
> Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: stable <stable@vger.kernel.org>
> ---
>   drivers/scsi/scsi_sysfs.c | 16 ++++------------
>   1 file changed, 4 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index b9fb61a..5a183d1 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1158,32 +1158,24 @@ static void __scsi_remove_target(struct scsi_target *starget)
>   void scsi_remove_target(struct device *dev)
>   {
>   	struct Scsi_Host *shost = dev_to_shost(dev->parent);
> -	struct scsi_target *starget, *last = NULL;
> +	struct scsi_target *starget;
>   	unsigned long flags;
>
> -	/* remove targets being careful to lookup next entry before
> -	 * deleting the last
> -	 */
> +restart:
>   	spin_lock_irqsave(shost->host_lock, flags);
>   	list_for_each_entry(starget, &shost->__targets, siblings) {
>   		if (starget->reaped)
>   			continue;
>   		if (starget->dev.parent == dev || &starget->dev == dev) {
> -			/* assuming new targets arrive at the end */
>   			kref_get(&starget->reap_ref);
>   			starget->reaped = true;
>   			spin_unlock_irqrestore(shost->host_lock, flags);
> -			if (last)
> -				scsi_target_reap(last);
> -			last = starget;
>   			__scsi_remove_target(starget);
> -			spin_lock_irqsave(shost->host_lock, flags);
> +			scsi_target_reap(starget);
> +			goto restart;
>   		}
>   	}
>   	spin_unlock_irqrestore(shost->host_lock, flags);
> -
> -	if (last)
> -		scsi_target_reap(last);
>   }
>   EXPORT_SYMBOL(scsi_remove_target);

(replying to my own e-mail)

Hello Christoph,

Is it OK for you if I mention you as author of this e-mail ?

Thanks,

Bart.


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley Nov. 4, 2015, 10:44 p.m. UTC | #2
On Wed, 2015-11-04 at 14:35 -0800, Bart Van Assche wrote:
> On 10/30/2015 03:09 PM, Bart Van Assche wrote:
> > When dropping a lock while iterating a list we must restart the search
> > as other threads could have manipulated the list under us. Without this
> > we can get stuck in an endless loop.
> >
> > This is a slightly modified version of a patch from Christoph Hellwig
> > (see also https://www.spinics.net/lists/linux-scsi/msg89416.html).
> >
> > Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
> > Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> > Cc: Johannes Thumshirn <jthumshirn@suse.de>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: stable <stable@vger.kernel.org>
> > ---
> >   drivers/scsi/scsi_sysfs.c | 16 ++++------------
> >   1 file changed, 4 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> > index b9fb61a..5a183d1 100644
> > --- a/drivers/scsi/scsi_sysfs.c
> > +++ b/drivers/scsi/scsi_sysfs.c
> > @@ -1158,32 +1158,24 @@ static void __scsi_remove_target(struct scsi_target *starget)
> >   void scsi_remove_target(struct device *dev)
> >   {
> >   	struct Scsi_Host *shost = dev_to_shost(dev->parent);
> > -	struct scsi_target *starget, *last = NULL;
> > +	struct scsi_target *starget;
> >   	unsigned long flags;
> >
> > -	/* remove targets being careful to lookup next entry before
> > -	 * deleting the last
> > -	 */
> > +restart:
> >   	spin_lock_irqsave(shost->host_lock, flags);
> >   	list_for_each_entry(starget, &shost->__targets, siblings) {
> >   		if (starget->reaped)
> >   			continue;
> >   		if (starget->dev.parent == dev || &starget->dev == dev) {
> > -			/* assuming new targets arrive at the end */
> >   			kref_get(&starget->reap_ref);
> >   			starget->reaped = true;
> >   			spin_unlock_irqrestore(shost->host_lock, flags);
> > -			if (last)
> > -				scsi_target_reap(last);
> > -			last = starget;
> >   			__scsi_remove_target(starget);
> > -			spin_lock_irqsave(shost->host_lock, flags);
> > +			scsi_target_reap(starget);
> > +			goto restart;
> >   		}
> >   	}
> >   	spin_unlock_irqrestore(shost->host_lock, flags);
> > -
> > -	if (last)
> > -		scsi_target_reap(last);
> >   }
> >   EXPORT_SYMBOL(scsi_remove_target);
> 
> (replying to my own e-mail)
> 
> Hello Christoph,
> 
> Is it OK for you if I mention you as author of this e-mail ?

Could you just both co-operate, especially since there's not much
difference between the patches.

The fundamental problem with this is how have the conditions that caused
us to move away from list restart:

commit bc3f02a795d3b4faa99d37390174be2a75d091bd
Author: Dan Williams <djbw@fb.com>
Date:   Tue Aug 28 22:12:10 2012 -0700

    [SCSI] scsi_remove_target: fix softlockup regression on hot remove

Which was triggered by this bug report

http://thread.gmane.org/gmane.linux.kernel/1348679

been mitigated?

James


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche Nov. 4, 2015, 11:20 p.m. UTC | #3
On 11/04/2015 02:44 PM, James Bottomley wrote:
> On Wed, 2015-11-04 at 14:35 -0800, Bart Van Assche wrote:
>> (replying to my own e-mail)
>>
>> Hello Christoph,
>>
>> Is it OK for you if I mention you as author of this e-mail ?
>
> Could you just both co-operate, especially since there's not much
> difference between the patches.

Hello James and Christoph,

Patch [2/2] is identical to the patch Christoph had posted except that 
it has been rebased on top of patch [1/2] of this series. I will be 
happy to step back if Christoph or someone else wants to take the lead 
here. The only reason I started looking into this is that there was no 
further progress on the original e-mail thread about this subject 
(http://thread.gmane.org/gmane.linux.kernel/2052359).

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Nov. 5, 2015, 8:51 a.m. UTC | #4
On Wed, Nov 04, 2015 at 02:35:40PM -0800, Bart Van Assche wrote:
> (replying to my own e-mail)
>
> Hello Christoph,
>
> Is it OK for you if I mention you as author of this e-mail ?

I don't care whom this fix is attributed to, but let's get it in:

Reviewed-by: Christoph Hellwig <hch@lst.de>

or

Signed-off-by: Christoph Hellwig <hch@lst.de>

at your preference :)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Williams Nov. 5, 2015, 4:55 p.m. UTC | #5
On Wed, Nov 4, 2015 at 2:44 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
[..]
> The fundamental problem with this is how have the conditions that caused
> us to move away from list restart:
>
> commit bc3f02a795d3b4faa99d37390174be2a75d091bd
> Author: Dan Williams <djbw@fb.com>
> Date:   Tue Aug 28 22:12:10 2012 -0700
>
>     [SCSI] scsi_remove_target: fix softlockup regression on hot remove
>
> Which was triggered by this bug report
>
> http://thread.gmane.org/gmane.linux.kernel/1348679
>
> been mitigated?
>

I think it has because the problem that led to that report was the
fact that scsi_target_destroy() did not advance the target state, but
we changed that in f2495e228fce.

http://marc.info/?l=linux-scsi&m=144527527308725&w=2
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley Nov. 5, 2015, 5:05 p.m. UTC | #6
On Thu, 2015-11-05 at 08:55 -0800, Dan Williams wrote:
> On Wed, Nov 4, 2015 at 2:44 PM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> [..]
> > The fundamental problem with this is how have the conditions that caused
> > us to move away from list restart:
> >
> > commit bc3f02a795d3b4faa99d37390174be2a75d091bd
> > Author: Dan Williams <djbw@fb.com>
> > Date:   Tue Aug 28 22:12:10 2012 -0700
> >
> >     [SCSI] scsi_remove_target: fix softlockup regression on hot remove
> >
> > Which was triggered by this bug report
> >
> > http://thread.gmane.org/gmane.linux.kernel/1348679
> >
> > been mitigated?
> >
> 
> I think it has because the problem that led to that report was the
> fact that scsi_target_destroy() did not advance the target state, but
> we changed that in f2495e228fce.
> 
> http://marc.info/?l=linux-scsi&m=144527527308725&w=2

Great, thanks, we'll put Christoph's version in then, because it can be
cc'd to stable without problems.  Bart, you can redo your state updates
on top of this because that's inessential to the current problem.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Nov. 16, 2015, 5:57 p.m. UTC | #7
Bart, can you resend your patch on top of 4.4-rc1?  I think we really
need it so we should get it into 4.4 and backport it to -stable.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index b9fb61a..5a183d1 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1158,32 +1158,24 @@  static void __scsi_remove_target(struct scsi_target *starget)
 void scsi_remove_target(struct device *dev)
 {
 	struct Scsi_Host *shost = dev_to_shost(dev->parent);
-	struct scsi_target *starget, *last = NULL;
+	struct scsi_target *starget;
 	unsigned long flags;
 
-	/* remove targets being careful to lookup next entry before
-	 * deleting the last
-	 */
+restart:
 	spin_lock_irqsave(shost->host_lock, flags);
 	list_for_each_entry(starget, &shost->__targets, siblings) {
 		if (starget->reaped)
 			continue;
 		if (starget->dev.parent == dev || &starget->dev == dev) {
-			/* assuming new targets arrive at the end */
 			kref_get(&starget->reap_ref);
 			starget->reaped = true;
 			spin_unlock_irqrestore(shost->host_lock, flags);
-			if (last)
-				scsi_target_reap(last);
-			last = starget;
 			__scsi_remove_target(starget);
-			spin_lock_irqsave(shost->host_lock, flags);
+			scsi_target_reap(starget);
+			goto restart;
 		}
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
-
-	if (last)
-		scsi_target_reap(last);
 }
 EXPORT_SYMBOL(scsi_remove_target);