Message ID | 1494059467.386778.1470762767417.JavaMail.zimbra@redhat.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
On 08/09/2016 10:12 AM, Laurence Oberman wrote: > I was talking about this patch > > --- a/drivers/scsi/scsi_scan.c > +++ b/drivers/scsi/scsi_scan.c > @@ -1890,10 +1890,11 @@ void scsi_forget_host(struct Scsi_Host *shost) > restart: > spin_lock_irqsave(shost->host_lock, flags); > list_for_each_entry(sdev, &shost->__devices, siblings) { > - if (sdev->sdev_state == SDEV_DEL) > + if (sdev->sdev_state == SDEV_DEL || scsi_device_get(sdev) < 0) > continue; > spin_unlock_irqrestore(shost->host_lock, flags); > __scsi_remove_device(sdev); > + scsi_device_put(sdev); > goto restart; > } > spin_unlock_irqrestore(shost->host_lock, flags); Hello Laurence, Did you run your tests with that patch applied? If so, it would help if you could rerun your tests without that patch. If the above patch makes a difference it means that it can happen that __scsi_remove_device() does not change the device state into SDEV_DEL. That's a bug and we need to know whether or not __scsi_remove_device() behaves correctly. Thanks, Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
----- Original Message ----- > From: "Bart Van Assche" <bart.vanassche@sandisk.com> > To: "Laurence Oberman" <loberman@redhat.com> > Cc: dm-devel@redhat.com, linux-scsi@vger.kernel.org, "Mike Snitzer" <snitzer@redhat.com>, "Johannes Thumshirn" > <jthumshirn@suse.de> > Sent: Tuesday, August 9, 2016 1:16:52 PM > Subject: Re: [dm-devel] dm-mq and end_clone_request() > > On 08/09/2016 10:12 AM, Laurence Oberman wrote: > > I was talking about this patch > > > > --- a/drivers/scsi/scsi_scan.c > > +++ b/drivers/scsi/scsi_scan.c > > @@ -1890,10 +1890,11 @@ void scsi_forget_host(struct Scsi_Host *shost) > > restart: > > spin_lock_irqsave(shost->host_lock, flags); > > list_for_each_entry(sdev, &shost->__devices, siblings) { > > - if (sdev->sdev_state == SDEV_DEL) > > + if (sdev->sdev_state == SDEV_DEL || scsi_device_get(sdev) > > < 0) > > continue; > > spin_unlock_irqrestore(shost->host_lock, flags); > > __scsi_remove_device(sdev); > > + scsi_device_put(sdev); > > goto restart; > > } > > spin_unlock_irqrestore(shost->host_lock, flags); > > Hello Laurence, > > Did you run your tests with that patch applied? If so, it would help if > you could rerun your tests without that patch. If the above patch makes > a difference it means that it can happen that __scsi_remove_device() > does not change the device state into SDEV_DEL. That's a bug and we need > to know whether or not __scsi_remove_device() behaves correctly. > > Thanks, > > Bart. > Yes Sir, I ran all yesterdays tests on your kernel with that patch applied. Of course it may well just be luck/coincidence that the host delete race is no longer happening so I agree we need to re-run the tests so I will revert and re-run. I will probably only get back to you tomorrow with the results. Thanks Laurence -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
----- Original Message ----- > From: "Laurence Oberman" <loberman@redhat.com> > To: "Bart Van Assche" <bart.vanassche@sandisk.com> > Cc: dm-devel@redhat.com, linux-scsi@vger.kernel.org, "Mike Snitzer" <snitzer@redhat.com>, "Johannes Thumshirn" > <jthumshirn@suse.de> > Sent: Tuesday, August 9, 2016 1:21:15 PM > Subject: Re: [dm-devel] dm-mq and end_clone_request() > > > > ----- Original Message ----- > > From: "Bart Van Assche" <bart.vanassche@sandisk.com> > > To: "Laurence Oberman" <loberman@redhat.com> > > Cc: dm-devel@redhat.com, linux-scsi@vger.kernel.org, "Mike Snitzer" > > <snitzer@redhat.com>, "Johannes Thumshirn" > > <jthumshirn@suse.de> > > Sent: Tuesday, August 9, 2016 1:16:52 PM > > Subject: Re: [dm-devel] dm-mq and end_clone_request() > > > > On 08/09/2016 10:12 AM, Laurence Oberman wrote: > > > I was talking about this patch > > > > > > --- a/drivers/scsi/scsi_scan.c > > > +++ b/drivers/scsi/scsi_scan.c > > > @@ -1890,10 +1890,11 @@ void scsi_forget_host(struct Scsi_Host *shost) > > > restart: > > > spin_lock_irqsave(shost->host_lock, flags); > > > list_for_each_entry(sdev, &shost->__devices, siblings) { > > > - if (sdev->sdev_state == SDEV_DEL) > > > + if (sdev->sdev_state == SDEV_DEL || > > > scsi_device_get(sdev) > > > < 0) > > > continue; > > > spin_unlock_irqrestore(shost->host_lock, flags); > > > __scsi_remove_device(sdev); > > > + scsi_device_put(sdev); > > > goto restart; > > > } > > > spin_unlock_irqrestore(shost->host_lock, flags); > > > > Hello Laurence, > > > > Did you run your tests with that patch applied? If so, it would help if > > you could rerun your tests without that patch. If the above patch makes > > a difference it means that it can happen that __scsi_remove_device() > > does not change the device state into SDEV_DEL. That's a bug and we need > > to know whether or not __scsi_remove_device() behaves correctly. > > > > Thanks, > > > > Bart. > > > Yes Sir, I ran all yesterdays tests on your kernel with that patch applied. > Of course it may well just be luck/coincidence that the host delete race is > no longer happening > so I agree we need to re-run the tests so I will revert and re-run. > I will probably only get back to you tomorrow with the results. > > Thanks > Laurence > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Hello Bart I only just got time now to revert that patch and build a kernel. Will test this tonight and let you know if I am back to seeing panics sporadically without the patch. As already mentioned, this is a different configuration to what I had when I was able to reproduce the panic. This means the lack of hitting this stack trace and panic may turn out to have nothing to do with the patch I applied. Thanks Laurence -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
----- Original Message ----- > From: "Laurence Oberman" <loberman@redhat.com> > To: "Bart Van Assche" <bart.vanassche@sandisk.com> > Cc: dm-devel@redhat.com, linux-scsi@vger.kernel.org, "Mike Snitzer" <snitzer@redhat.com>, "Johannes Thumshirn" > <jthumshirn@suse.de> > Sent: Wednesday, August 10, 2016 5:38:16 PM > Subject: Re: [dm-devel] dm-mq and end_clone_request() > > > > ----- Original Message ----- > > From: "Laurence Oberman" <loberman@redhat.com> > > To: "Bart Van Assche" <bart.vanassche@sandisk.com> > > Cc: dm-devel@redhat.com, linux-scsi@vger.kernel.org, "Mike Snitzer" > > <snitzer@redhat.com>, "Johannes Thumshirn" > > <jthumshirn@suse.de> > > Sent: Tuesday, August 9, 2016 1:21:15 PM > > Subject: Re: [dm-devel] dm-mq and end_clone_request() > > > > > > > > ----- Original Message ----- > > > From: "Bart Van Assche" <bart.vanassche@sandisk.com> > > > To: "Laurence Oberman" <loberman@redhat.com> > > > Cc: dm-devel@redhat.com, linux-scsi@vger.kernel.org, "Mike Snitzer" > > > <snitzer@redhat.com>, "Johannes Thumshirn" > > > <jthumshirn@suse.de> > > > Sent: Tuesday, August 9, 2016 1:16:52 PM > > > Subject: Re: [dm-devel] dm-mq and end_clone_request() > > > > > > On 08/09/2016 10:12 AM, Laurence Oberman wrote: > > > > I was talking about this patch > > > > > > > > --- a/drivers/scsi/scsi_scan.c > > > > +++ b/drivers/scsi/scsi_scan.c > > > > @@ -1890,10 +1890,11 @@ void scsi_forget_host(struct Scsi_Host *shost) > > > > restart: > > > > spin_lock_irqsave(shost->host_lock, flags); > > > > list_for_each_entry(sdev, &shost->__devices, siblings) { > > > > - if (sdev->sdev_state == SDEV_DEL) > > > > + if (sdev->sdev_state == SDEV_DEL || > > > > scsi_device_get(sdev) > > > > < 0) > > > > continue; > > > > spin_unlock_irqrestore(shost->host_lock, flags); > > > > __scsi_remove_device(sdev); > > > > + scsi_device_put(sdev); > > > > goto restart; > > > > } > > > > spin_unlock_irqrestore(shost->host_lock, flags); > > > > > > Hello Laurence, > > > > > > Did you run your tests with that patch applied? If so, it would help if > > > you could rerun your tests without that patch. If the above patch makes > > > a difference it means that it can happen that __scsi_remove_device() > > > does not change the device state into SDEV_DEL. That's a bug and we need > > > to know whether or not __scsi_remove_device() behaves correctly. > > > > > > Thanks, > > > > > > Bart. > > > > > Yes Sir, I ran all yesterdays tests on your kernel with that patch applied. > > Of course it may well just be luck/coincidence that the host delete race is > > no longer happening > > so I agree we need to re-run the tests so I will revert and re-run. > > I will probably only get back to you tomorrow with the results. > > > > Thanks > > Laurence > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > Hello Bart > > I only just got time now to revert that patch and build a kernel. > Will test this tonight and let you know if I am back to seeing panics > sporadically without the patch. > As already mentioned, this is a different configuration to what I had when I > was able to reproduce the panic. > This means the lack of hitting this stack trace and panic may turn out to > have nothing to do with the patch I applied. > > Thanks > Laurence > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Hello Bart I can no longer reproduce the stack even with the patch reverted so its behaving as you expected and the patch is as you already said, not valid. I ran about 30 fio tests with your kernel and multiple host deletions and and did experience only one hard fio error. My tests now produce the same results as you are seeing. The single fio errors was with many more executions of the test so its not easy to get these fio errors. Away from tomorrow on vacation for 10 days Thanks Laurence -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -1890,10 +1890,11 @@ void scsi_forget_host(struct Scsi_Host *shost) restart: spin_lock_irqsave(shost->host_lock, flags); list_for_each_entry(sdev, &shost->__devices, siblings) { - if (sdev->sdev_state == SDEV_DEL) + if (sdev->sdev_state == SDEV_DEL || scsi_device_get(sdev) < 0) continue; spin_unlock_irqrestore(shost->host_lock, flags); __scsi_remove_device(sdev); + scsi_device_put(sdev); goto restart; } spin_unlock_irqrestore(shost->host_lock, flags);