diff mbox series

[v6] scsi: core: Wait until device is fully resumed before doing rescan

Message ID 20230602084956.1299001-1-kai.heng.feng@canonical.com (mailing list archive)
State Changes Requested
Headers show
Series [v6] scsi: core: Wait until device is fully resumed before doing rescan | expand

Commit Message

Kai-Heng Feng June 2, 2023, 8:49 a.m. UTC
During system resuming process, the resuming order is from top to down.
Namely, the ATA host is resumed before disks connected to it.

When an EH is scheduled while ATA host is resumed and disk device is
still suspended, the device_lock hold by scsi_rescan_device() is never
released so the dpm_resume() of the disk is blocked forerver, therefore
the system can never be resumed back.

That's because scsi_attach_vpd() is expecting the disk device is in
operational state, as it doesn't work on suspended device.

To avoid such deadlock, wait until the scsi device is fully resumed,
before continuing the rescan process.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
v6:
 - Fix !CONFIG_PM_SLEEP compilation error.

v5:
 - Use a different approach. Wait until the disk device is resumed.

v4: 
 - No change.

v3:
 - New patch to resolve undefined pm_suspend_target_state.

v2:
 - Schedule rescan task at the end of system resume phase.
 - Wording.

 drivers/scsi/scsi_scan.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Bart Van Assche June 2, 2023, 3:30 p.m. UTC | #1
On 6/2/23 01:49, Kai-Heng Feng wrote:
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index d217be323cc6..092f37464101 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -1621,6 +1621,11 @@ void scsi_rescan_device(struct device *dev)
>   {
>   	struct scsi_device *sdev = to_scsi_device(dev);
>   
> +#ifdef CONFIG_PM_SLEEP
> +	if (dev->power.is_suspended)
> +		wait_for_completion(&dev->power.completion);
> +#endif
> +
>   	device_lock(dev);
>   
>   	scsi_attach_vpd(sdev);

Directly accessing dev->power.completion from the SCSI core seems like a 
layering violation to me. Isn't this an object that should only be 
accessed directly by the device driver power management core? 
Additionally, what guarantees that the desired power state has been 
reached after wait_for_completion(&dev->power.completion) has finished?

I think we need another solution. The device_lock() and device_unlock() 
calls have been introduced by commit e27829dc92e5 ("scsi: serialize 
->rescan against ->remove"). I think there are other ways to serialize
scsi_rescan_device() against scsi_remove_device(), e.g. via 
host->scan_mutex. Is this something that has been considered?

Thanks,

Bart.
diff mbox series

Patch

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index d217be323cc6..092f37464101 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1621,6 +1621,11 @@  void scsi_rescan_device(struct device *dev)
 {
 	struct scsi_device *sdev = to_scsi_device(dev);
 
+#ifdef CONFIG_PM_SLEEP
+	if (dev->power.is_suspended)
+		wait_for_completion(&dev->power.completion);
+#endif
+
 	device_lock(dev);
 
 	scsi_attach_vpd(sdev);