Message ID | 20200113033035.50875-1-ejh@nvidia.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | usb: uas: fix a plug & unplug racing | expand |
Am Sonntag, den 12.01.2020, 19:30 -0800 schrieb EJ Hsu: Hi, first, thank you for diagnosing this unusual bug. > When a uas disk is plugged into an external hub, uas_probe() > will be called by the hub thread to do the probe. It will > first create a SCSI host and then do the scan for this host. > During the scan, it will probe the LUN using SCSI INQUERY command > which will be packed in the URB and submitted to uas disk. > > There might be a chance that this external hub with uas disk > attached is unplugged during the scan. In this case, uas driver > will fail to submit the URB (due to the NOTATTACHED state of uas > device) and try to put this SCSI command back to request queue > waiting for next chance to run. Isn't that the bug? A command to a detached device should fail. Could you please elaborate? This issue would not be limited to uas. > --- > drivers/usb/storage/uas.c | 31 ++++++++++++++++++++++++++++++- > 1 file changed, 30 insertions(+), 1 deletion(-) > > diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c > index 95bba3ba6ac6..d367718fef45 100644 > --- a/drivers/usb/storage/uas.c > +++ b/drivers/usb/storage/uas.c > @@ -42,9 +42,11 @@ struct uas_dev_info { > unsigned cmd_pipe, status_pipe, data_in_pipe, data_out_pipe; > unsigned use_streams:1; > unsigned shutdown:1; > + unsigned scan_pending:1; > struct scsi_cmnd *cmnd[MAX_CMNDS]; > spinlock_t lock; > struct work_struct work; > + struct work_struct scan_work; /* for async scanning */ > }; > > enum { > @@ -114,6 +116,20 @@ static void uas_do_work(struct work_struct *work) > spin_unlock_irqrestore(&devinfo->lock, flags); > } > > +static void uas_scan_work(struct work_struct *work) > +{ > + struct uas_dev_info *devinfo = > + container_of(work, struct uas_dev_info, scan_work); > + struct Scsi_Host *shost = usb_get_intfdata(devinfo->intf); > + > + dev_dbg(&devinfo->intf->dev, "starting scan\n"); > + scsi_scan_host(shost); > + dev_dbg(&devinfo->intf->dev, "scan complete\n"); > + > + usb_autopm_put_interface(devinfo->intf); scsi_scan_host() does runtime PM on the SCSI level. There is no need for us to duplicate that. > + devinfo->scan_pending = 0; > +} > + > static void uas_add_work(struct uas_cmd_info *cmdinfo) > { > struct scsi_pointer *scp = (void *)cmdinfo; > @@ -982,6 +998,7 @@ static int uas_probe(struct usb_interface *intf, const struct usb_device_id *id) > init_usb_anchor(&devinfo->data_urbs); > spin_lock_init(&devinfo->lock); > INIT_WORK(&devinfo->work, uas_do_work); > + INIT_WORK(&devinfo->scan_work, uas_scan_work); > > result = uas_configure_endpoints(devinfo); > if (result) > @@ -998,7 +1015,11 @@ static int uas_probe(struct usb_interface *intf, const struct usb_device_id *id) > if (result) > goto free_streams; > > - scsi_scan_host(shost); > + /* Submit the delayed_work for SCSI-device scanning */ > + usb_autopm_get_interface_no_resume(intf); > + devinfo->scan_pending = 1; > + schedule_work(&devinfo->scan_work); > + > return result; > Is this approach really necessary > free_streams: > @@ -1166,6 +1187,14 @@ static void uas_disconnect(struct usb_interface *intf) > usb_kill_anchored_urbs(&devinfo->data_urbs); > uas_zap_pending(devinfo, DID_NO_CONNECT); > > + /* > + * Prevent SCSI scanning (if it hasn't started yet) > + * or wait for the SCSI-scanning routine to stop. > + */ > + cancel_work_sync(&devinfo->scan_work); > + if (devinfo->scan_pending) > + usb_autopm_put_interface_no_suspend(intf); It is not enough to do this in disconnect() We are guarded against runtime PM, but not against system sleep. You'd need to handle this in suspend() and resume(), too. And, unfortunately, the device could be reset from another interface. Is this approach really necessary? It solves the problem, but if you want to get it right, the corner cases are ugly. Regards Oliver
On Mon, 13 Jan 2020, Oliver Neukum wrote: > Am Sonntag, den 12.01.2020, 19:30 -0800 schrieb EJ Hsu: > > Hi, > > first, thank you for diagnosing this unusual bug. > > > When a uas disk is plugged into an external hub, uas_probe() > > will be called by the hub thread to do the probe. It will > > first create a SCSI host and then do the scan for this host. > > During the scan, it will probe the LUN using SCSI INQUERY command > > which will be packed in the URB and submitted to uas disk. > > > > There might be a chance that this external hub with uas disk > > attached is unplugged during the scan. In this case, uas driver > > will fail to submit the URB (due to the NOTATTACHED state of uas > > device) and try to put this SCSI command back to request queue > > waiting for next chance to run. > > Isn't that the bug? A command to a detached device should fail. > Could you please elaborate? This issue would not be limited to uas. > It is not enough to do this in disconnect() > We are guarded against runtime PM, but not against system sleep. > You'd need to handle this in suspend() and resume(), too. > And, unfortunately, the device could be reset from another interface. This is more or less a copy of the way usb-storage works. That driver doesn't have any protection here against suspend/resume or reset, and it's not clear that any such protection is needed. > Is this approach really necessary? It solves the problem, but > if you want to get it right, the corner cases are ugly. Minimizing the amount of time spent running in the context of the hub thread is generally a good thing to do. Alan Stern
Oliver Neukum wrote: > Am Sonntag, den 12.01.2020, 19:30 -0800 schrieb EJ Hsu: > > Isn't that the bug? A command to a detached device should fail. > Could you please elaborate? This issue would not be limited to uas. > In the case I mentioned, the hub thread of external hub running uas_probe() will get stuck waiting for the completion of scsi scan. The scsi scan will try to probe a single LUN using a SCSI INQUIRY. If the external hub has been unplugged before LUN probe, the device state of uas device will be set to USB_STATE_NOTATTACHED by the root hub thread. So, all the following calls to usb_submit_urb() in uas driver will return -NODEV, and accordingly uas_queuecommand_lck() will return SCSI_MLQUEUE_DEVICE_BUSY to scsi_request_fn(). scsi_request_fn() then puts this scsi command back into request queue. Because this scsi device is just created and during LUN probe process, this scsi command is the only one in the request queue. So, it will be picked up soon and dispatched to uas driver again. This cycle will continue until uas_disconnect() is called and its "resetting" flag is set. However, the hub thread of external hub still got stuck waiting for the completion of this scsi command, and may not be able to run uas_disconnect(). A deadlock happened. > > +static void uas_scan_work(struct work_struct *work) { > > + struct uas_dev_info *devinfo = > > + container_of(work, struct uas_dev_info, scan_work); > > + struct Scsi_Host *shost = usb_get_intfdata(devinfo->intf); > > + > > + dev_dbg(&devinfo->intf->dev, "starting scan\n"); > > + scsi_scan_host(shost); > > + dev_dbg(&devinfo->intf->dev, "scan complete\n"); > > + > > + usb_autopm_put_interface(devinfo->intf); > > scsi_scan_host() does runtime PM on the SCSI level. There is no need for us to > duplicate that. > In my opinion, if scsi_scan_host() will be run asynchronously, this interface needs to be guarded against runtime PM between uas_probe() & uas_scan_work(). > > It is not enough to do this in disconnect() We are guarded against runtime PM, > but not against system sleep. > You'd need to handle this in suspend() and resume(), too. > And, unfortunately, the device could be reset from another interface. > As Allen said, this is a copy of the way usb-storage works and I did not see any related protection in usb-storage. But I will do more check on it. Thanks, EJ --nvpublic
Am Dienstag, den 14.01.2020, 03:28 +0000 schrieb EJ Hsu: > Oliver Neukum wrote: > > > Am Sonntag, den 12.01.2020, 19:30 -0800 schrieb EJ Hsu: > > > > Isn't that the bug? A command to a detached device should fail. > > Could you please elaborate? This issue would not be limited to uas. > > > > In the case I mentioned, the hub thread of external hub running > uas_probe() will get stuck waiting for the completion of scsi scan. > > The scsi scan will try to probe a single LUN using a SCSI INQUIRY. > If the external hub has been unplugged before LUN probe, the device > state of uas device will be set to USB_STATE_NOTATTACHED by the > root hub thread. So, all the following calls to usb_submit_urb() in > uas driver will return -NODEV, and accordingly uas_queuecommand_lck() > will return SCSI_MLQUEUE_DEVICE_BUSY to scsi_request_fn(). And that looks like the root cause. The queue isn't busy. It is dead. > scsi_request_fn() then puts this scsi command back into request queue. > Because this scsi device is just created and during LUN probe process, > this scsi command is the only one in the request queue. So, it will be picked > up soon and dispatched to uas driver again. This cycle will continue until > uas_disconnect() is called and its "resetting" flag is set. However, the > hub thread of external hub still got stuck waiting for the completion of > this scsi command, and may not be able to run uas_disconnect(). > A deadlock happened. I see. But we are working around insufficient error reporting in the SCSI midlayer. > > > +static void uas_scan_work(struct work_struct *work) { > > > + struct uas_dev_info *devinfo = > > > + container_of(work, struct uas_dev_info, scan_work); > > > + struct Scsi_Host *shost = usb_get_intfdata(devinfo->intf); > > > + > > > + dev_dbg(&devinfo->intf->dev, "starting scan\n"); > > > + scsi_scan_host(shost); > > > + dev_dbg(&devinfo->intf->dev, "scan complete\n"); > > > + > > > + usb_autopm_put_interface(devinfo->intf); > > > > scsi_scan_host() does runtime PM on the SCSI level. There is no need for us to > > duplicate that. > > > > In my opinion, if scsi_scan_host() will be run asynchronously, this interface > needs to be guarded against runtime PM between uas_probe() & uas_scan_work(). Yes it does. But it has a child, the SCSI host, which has an elevated count. It is already guarded. Regards Oliver
On Tue, 14 Jan 2020, Oliver Neukum wrote: > Am Dienstag, den 14.01.2020, 03:28 +0000 schrieb EJ Hsu: > > Oliver Neukum wrote: > > > > > Am Sonntag, den 12.01.2020, 19:30 -0800 schrieb EJ Hsu: > > > > > > Isn't that the bug? A command to a detached device should fail. > > > Could you please elaborate? This issue would not be limited to uas. > > > > > > > In the case I mentioned, the hub thread of external hub running > > uas_probe() will get stuck waiting for the completion of scsi scan. > > > > The scsi scan will try to probe a single LUN using a SCSI INQUIRY. > > If the external hub has been unplugged before LUN probe, the device > > state of uas device will be set to USB_STATE_NOTATTACHED by the > > root hub thread. So, all the following calls to usb_submit_urb() in > > uas driver will return -NODEV, and accordingly uas_queuecommand_lck() > > will return SCSI_MLQUEUE_DEVICE_BUSY to scsi_request_fn(). > > And that looks like the root cause. The queue isn't busy. > It is dead. No. The discussion has gotten a little confused. EJ's point is that if SCSI scanning takes place in the context of the hub worker thread, then that thread won't be available to process a disconnect notification. The device will be unplugged, but the kernel won't realize it until the SCSI scanning is finished. > > scsi_request_fn() then puts this scsi command back into request queue. > > Because this scsi device is just created and during LUN probe process, > > this scsi command is the only one in the request queue. So, it will be picked > > up soon and dispatched to uas driver again. This cycle will continue until > > uas_disconnect() is called and its "resetting" flag is set. However, the > > hub thread of external hub still got stuck waiting for the completion of > > this scsi command, and may not be able to run uas_disconnect(). > > A deadlock happened. > > I see. But we are working around insufficient error reporting in the > SCSI midlayer. No, the error reporting there is correct. URBs will complete with errors like -EPROTO but no other indication that the device is gone, so the midlayer believes that a retry is appropriate. Perhaps uas should treat -EPROTO, -EILSEQ, and -ETIME as fatal errors. Alan Stern
Am Dienstag, den 14.01.2020, 10:04 -0500 schrieb Alan Stern: > On Tue, 14 Jan 2020, Oliver Neukum wrote: > > > Am Dienstag, den 14.01.2020, 03:28 +0000 schrieb EJ Hsu: > > > Oliver Neukum wrote: > > > > > > > Am Sonntag, den 12.01.2020, 19:30 -0800 schrieb EJ Hsu: > > > > > > > > Isn't that the bug? A command to a detached device should fail. > > > > Could you please elaborate? This issue would not be limited to uas. > > > > > > > > > > In the case I mentioned, the hub thread of external hub running > > > uas_probe() will get stuck waiting for the completion of scsi scan. > > > > > > The scsi scan will try to probe a single LUN using a SCSI INQUIRY. > > > If the external hub has been unplugged before LUN probe, the device > > > state of uas device will be set to USB_STATE_NOTATTACHED by the > > > root hub thread. So, all the following calls to usb_submit_urb() in > > > uas driver will return -NODEV, and accordingly uas_queuecommand_lck() > > > will return SCSI_MLQUEUE_DEVICE_BUSY to scsi_request_fn(). > > > > And that looks like the root cause. The queue isn't busy. > > It is dead. > > No. The discussion has gotten a little confused. EJ's point is that > if SCSI scanning takes place in the context of the hub worker thread, > then that thread won't be available to process a disconnect > notification. The device will be unplugged, but the kernel won't > realize it until the SCSI scanning is finished. OK, I think we have two possible code paths at least First: uas_queuecommand_lck() -> uas_submit_urbs() -> -ENODEV -> return SCSI_MLQUEUE_DEVICE_BUSY That looks wrong to me. Second: The submission actually works and we eventually terminate the URB with an error. In that case nothing happens and eventually SCSI core retries, times out and does error handling. > > > scsi_request_fn() then puts this scsi command back into request queue. > > > Because this scsi device is just created and during LUN probe process, > > > this scsi command is the only one in the request queue. So, it will be picked > > > up soon and dispatched to uas driver again. This cycle will continue until > > > uas_disconnect() is called and its "resetting" flag is set. However, the > > > hub thread of external hub still got stuck waiting for the completion of > > > this scsi command, and may not be able to run uas_disconnect(). > > > A deadlock happened. > > > > I see. But we are working around insufficient error reporting in the > > SCSI midlayer. > > No, the error reporting there is correct. URBs will complete with > errors like -EPROTO but no other indication that the device is gone, so > the midlayer believes that a retry is appropriate. > > Perhaps uas should treat -EPROTO, -EILSEQ, and -ETIME as fatal errors. They could happen due to bad cables. Now, another work queue would solve the second error case, but I think the first error case still exists and we would paper over it. Regards Oliver
On Wed, 15 Jan 2020, Oliver Neukum wrote: > Am Dienstag, den 14.01.2020, 10:04 -0500 schrieb Alan Stern: > > On Tue, 14 Jan 2020, Oliver Neukum wrote: > > > > > Am Dienstag, den 14.01.2020, 03:28 +0000 schrieb EJ Hsu: > > > > Oliver Neukum wrote: > > > > > > > > > Am Sonntag, den 12.01.2020, 19:30 -0800 schrieb EJ Hsu: > > > > > > > > > > Isn't that the bug? A command to a detached device should fail. > > > > > Could you please elaborate? This issue would not be limited to uas. > > > > > > > > > > > > > In the case I mentioned, the hub thread of external hub running > > > > uas_probe() will get stuck waiting for the completion of scsi scan. > > > > > > > > The scsi scan will try to probe a single LUN using a SCSI INQUIRY. > > > > If the external hub has been unplugged before LUN probe, the device > > > > state of uas device will be set to USB_STATE_NOTATTACHED by the > > > > root hub thread. So, all the following calls to usb_submit_urb() in > > > > uas driver will return -NODEV, and accordingly uas_queuecommand_lck() > > > > will return SCSI_MLQUEUE_DEVICE_BUSY to scsi_request_fn(). > > > > > > And that looks like the root cause. The queue isn't busy. > > > It is dead. > > > > No. The discussion has gotten a little confused. EJ's point is that > > if SCSI scanning takes place in the context of the hub worker thread, > > then that thread won't be available to process a disconnect > > notification. The device will be unplugged, but the kernel won't > > realize it until the SCSI scanning is finished. > > OK, I think we have two possible code paths at least > > First: > uas_queuecommand_lck() -> uas_submit_urbs() -> -ENODEV -> return > SCSI_MLQUEUE_DEVICE_BUSY > > That looks wrong to me. Agreed. Although it isn't the subject of this patch. > Second: > The submission actually works and we eventually terminate the URB > with an error. In that case nothing happens and eventually SCSI core > retries, times out and does error handling. > > > > > scsi_request_fn() then puts this scsi command back into request queue. > > > > Because this scsi device is just created and during LUN probe process, > > > > this scsi command is the only one in the request queue. So, it will be picked > > > > up soon and dispatched to uas driver again. This cycle will continue until > > > > uas_disconnect() is called and its "resetting" flag is set. However, the > > > > hub thread of external hub still got stuck waiting for the completion of > > > > this scsi command, and may not be able to run uas_disconnect(). > > > > A deadlock happened. > > > > > > I see. But we are working around insufficient error reporting in the > > > SCSI midlayer. > > > > No, the error reporting there is correct. URBs will complete with > > errors like -EPROTO but no other indication that the device is gone, so > > the midlayer believes that a retry is appropriate. > > > > Perhaps uas should treat -EPROTO, -EILSEQ, and -ETIME as fatal errors. > > They could happen due to bad cables. > > Now, another work queue would solve the second error case, but I think > the first error case still exists and we would paper over it. Then you do agree with this patch, and you or EJ ought to create another patch to handle the first error case. Alan Stern
Oliver Neukum wrote: > Am Dienstag, den 14.01.2020, 10:04 -0500 schrieb Alan Stern: > > > > No. The discussion has gotten a little confused. EJ's point is that > > if SCSI scanning takes place in the context of the hub worker thread, > > then that thread won't be available to process a disconnect > > notification. The device will be unplugged, but the kernel won't > > realize it until the SCSI scanning is finished. > > OK, I think we have two possible code paths at least > > First: > uas_queuecommand_lck() -> uas_submit_urbs() -> -ENODEV -> return > SCSI_MLQUEUE_DEVICE_BUSY > > That looks wrong to me. > > Second: > The submission actually works and we eventually terminate the URB with an > error. In that case nothing happens and eventually SCSI core retries, times out > and does error handling. > Actually, the second case might happen when we connect uas disk directly to root hub. In this case, the root hub thread got stuck waiting for the completion of the scsi command. It might not be able to handle the disconnection event and set uas disk state to USB_STATE_NOTATTACHED accordingly. So, the URB submission in uas driver works but nothing happens, and scsi layer will time out this operation and retry it several times. Eventually the root hub thread will continue to run when it got the result from scsi layer. But the whole process might take about 60 seconds. That's another reason why I would like to run scsi_scan_host() asynchronously and keep the hub thread free. > > In my opinion, if scsi_scan_host() will be run asynchronously, this > > interface needs to be guarded against runtime PM between uas_probe() & uas_scan_work(). > Yes it does. But it has a child, the SCSI host, which has an elevated count. It is already guarded. I just checked the code, and the reference count will be incremented in scsi_scan_host(), precisely speaking, in scsi_autopm_get_host(). So, I still think we need to manually add reference count of interface here. Please correct me if there is anything wrong. Thanks, EJ ----------------------------------------------------------------------------------- This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. -----------------------------------------------------------------------------------
Am Mittwoch, den 15.01.2020, 15:54 +0000 schrieb EJ Hsu: Hi, > That's another reason why I would like to run scsi_scan_host() asynchronously > and keep the hub thread free. Very well. I must say I don't like it, but I have no good alternative. > > > In my opinion, if scsi_scan_host() will be run asynchronously, this > > > interface needs to be guarded against runtime PM between uas_probe() & uas_scan_work(). > > > > Yes it does. But it has a child, the SCSI host, which has an elevated count. It is already guarded. > > I just checked the code, and the reference count will be incremented in scsi_scan_host(), > precisely speaking, in scsi_autopm_get_host(). > So, I still think we need to manually add reference count of interface here. > Please correct me if there is anything wrong. AFAICT the generic power model will not suspend a parent while a child is active. The includes SCSI children of USB parents. So I think we are safe. Could you resubmit your patch without the PM counter manipulation and we can fix any issue that may exist in theory later? Regards Oliver
Oliver Neukum wrote: > > I just checked the code, and the reference count will be incremented > > in scsi_scan_host(), precisely speaking, in scsi_autopm_get_host(). > > So, I still think we need to manually add reference count of interface here. > > Please correct me if there is anything wrong. > > AFAICT the generic power model will not suspend a parent while a child is > active. The includes SCSI children of USB parents. So I think we are safe. Could > you resubmit your patch without the PM counter manipulation and we can fix > any issue that may exist in theory later? > The child_count of usb device should be incremented in the scsi_scan_host(). I can simulate this situation by forcibly enable the autosuspend of uas driver and add a delay before scsi_scan_host() is called in asynchronous way. But for now, as uas driver does not support autosuspend, it is indeed safe to remove the PM counter manipulation. If you have concern about it, it's OK to me to re-submit it. Thanks, EJ --nvpublic
EJ Hsu wrote: > Oliver Neukum wrote: > > > > I just checked the code, and the reference count will be incremented > > > in scsi_scan_host(), precisely speaking, in scsi_autopm_get_host(). > > > So, I still think we need to manually add reference count of interface here. > > > Please correct me if there is anything wrong. > > > > AFAICT the generic power model will not suspend a parent while a child > > is active. The includes SCSI children of USB parents. So I think we > > are safe. Could you resubmit your patch without the PM counter > > manipulation and we can fix any issue that may exist in theory later? > > > > The child_count of usb device should be incremented in the scsi_scan_host(). > I can simulate this situation by forcibly enable the autosuspend of uas driver > and add a delay before scsi_scan_host() is called in asynchronous way. > > But for now, as uas driver does not support autosuspend, it is indeed safe to > remove the PM counter manipulation. If you have concern about it, it's OK to > me to re-submit it. > By the way, if autosuspend will be enabled in uas driver in the future, the PM counter manipulation should be added before scsi_add_host(), just as MSC driver did. Otherwise, the child_count of usb device might be accidentally decremented. (the rpm "idle" message will be propagated from scsi host to usb interface and usb device) Thanks, EJ --nvpublic
diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c index 95bba3ba6ac6..d367718fef45 100644 --- a/drivers/usb/storage/uas.c +++ b/drivers/usb/storage/uas.c @@ -42,9 +42,11 @@ struct uas_dev_info { unsigned cmd_pipe, status_pipe, data_in_pipe, data_out_pipe; unsigned use_streams:1; unsigned shutdown:1; + unsigned scan_pending:1; struct scsi_cmnd *cmnd[MAX_CMNDS]; spinlock_t lock; struct work_struct work; + struct work_struct scan_work; /* for async scanning */ }; enum { @@ -114,6 +116,20 @@ static void uas_do_work(struct work_struct *work) spin_unlock_irqrestore(&devinfo->lock, flags); } +static void uas_scan_work(struct work_struct *work) +{ + struct uas_dev_info *devinfo = + container_of(work, struct uas_dev_info, scan_work); + struct Scsi_Host *shost = usb_get_intfdata(devinfo->intf); + + dev_dbg(&devinfo->intf->dev, "starting scan\n"); + scsi_scan_host(shost); + dev_dbg(&devinfo->intf->dev, "scan complete\n"); + + usb_autopm_put_interface(devinfo->intf); + devinfo->scan_pending = 0; +} + static void uas_add_work(struct uas_cmd_info *cmdinfo) { struct scsi_pointer *scp = (void *)cmdinfo; @@ -982,6 +998,7 @@ static int uas_probe(struct usb_interface *intf, const struct usb_device_id *id) init_usb_anchor(&devinfo->data_urbs); spin_lock_init(&devinfo->lock); INIT_WORK(&devinfo->work, uas_do_work); + INIT_WORK(&devinfo->scan_work, uas_scan_work); result = uas_configure_endpoints(devinfo); if (result) @@ -998,7 +1015,11 @@ static int uas_probe(struct usb_interface *intf, const struct usb_device_id *id) if (result) goto free_streams; - scsi_scan_host(shost); + /* Submit the delayed_work for SCSI-device scanning */ + usb_autopm_get_interface_no_resume(intf); + devinfo->scan_pending = 1; + schedule_work(&devinfo->scan_work); + return result; free_streams: @@ -1166,6 +1187,14 @@ static void uas_disconnect(struct usb_interface *intf) usb_kill_anchored_urbs(&devinfo->data_urbs); uas_zap_pending(devinfo, DID_NO_CONNECT); + /* + * Prevent SCSI scanning (if it hasn't started yet) + * or wait for the SCSI-scanning routine to stop. + */ + cancel_work_sync(&devinfo->scan_work); + if (devinfo->scan_pending) + usb_autopm_put_interface_no_suspend(intf); + scsi_remove_host(shost); uas_free_streams(devinfo); scsi_host_put(shost);
When a uas disk is plugged into an external hub, uas_probe() will be called by the hub thread to do the probe. It will first create a SCSI host and then do the scan for this host. During the scan, it will probe the LUN using SCSI INQUERY command which will be packed in the URB and submitted to uas disk. There might be a chance that this external hub with uas disk attached is unplugged during the scan. In this case, uas driver will fail to submit the URB (due to the NOTATTACHED state of uas device) and try to put this SCSI command back to request queue waiting for next chance to run. In normal case, this cycle will terminate when hub thread gets disconnection event and calls into uas_disconnect() accordingly. But in this case, uas_disconnect() will not be called because hub thread of external hub gets stuck waiting for the completion of this SCSI command. A deadlock happened. In this fix, uas will call scsi_scan_host() asynchronously to avoid the blocking of hub thread. Signed-off-by: EJ Hsu <ejh@nvidia.com> --- drivers/usb/storage/uas.c | 31 ++++++++++++++++++++++++++++++- 1 file changed, 30 insertions(+), 1 deletion(-)