diff mbox series

[v5] scsi: core: Wait until device is fully resumed before doing rescan

Message ID 20230601155652.1157611-1-kai.heng.feng@canonical.com (mailing list archive)
State Superseded, archived
Headers show
Series [v5] scsi: core: Wait until device is fully resumed before doing rescan | expand

Commit Message

Kai-Heng Feng June 1, 2023, 3:56 p.m. UTC
During system resuming process, the resuming order is from top to down.
Namely, the ATA host is resumed before disks connected to it.

When an EH is scheduled while ATA host is resumed and disk device is
still suspended, the device_lock hold by scsi_rescan_device() is never
released so the dpm_resume() of the disk is blocked forerver, therefore
the system can never be resumed back.

That's because scsi_attach_vpd() is expecting the disk device is in
operational state, as it doesn't work on suspended device.

To avoid such deadlock, wait until the scsi device is fully resumed,
before continuing the rescan process.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
v5:
 - Use a different approach. Wait until the disk device is resumed.

v4: 
 - No change.

v3:
 - New patch to resolve undefined pm_suspend_target_state.

v2:
 - Schedule rescan task at the end of system resume phase.
 - Wording.

 drivers/scsi/scsi_scan.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Bart Van Assche June 1, 2023, 7:48 p.m. UTC | #1
On 6/1/23 08:56, Kai-Heng Feng wrote:
> During system resuming process, the resuming order is from top to down.
> Namely, the ATA host is resumed before disks connected to it.
> 
> When an EH is scheduled while ATA host is resumed and disk device is
> still suspended, the device_lock hold by scsi_rescan_device() is never
> released so the dpm_resume() of the disk is blocked forerver, therefore
> the system can never be resumed back.
> 
> That's because scsi_attach_vpd() is expecting the disk device is in
> operational state, as it doesn't work on suspended device.
> 
> To avoid such deadlock, wait until the scsi device is fully resumed,
> before continuing the rescan process.

Why doesn't scsi_attach_vpd() support runtime power management? Calling 
scsi_attach_vpd() should result in a call of sdev_runtime_resume(), 
isn't it?

Thanks,

Bart.
Kai-Heng Feng June 2, 2023, 1 a.m. UTC | #2
On Fri, Jun 2, 2023 at 3:48 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 6/1/23 08:56, Kai-Heng Feng wrote:
> > During system resuming process, the resuming order is from top to down.
> > Namely, the ATA host is resumed before disks connected to it.
> >
> > When an EH is scheduled while ATA host is resumed and disk device is
> > still suspended, the device_lock hold by scsi_rescan_device() is never
> > released so the dpm_resume() of the disk is blocked forerver, therefore
> > the system can never be resumed back.
> >
> > That's because scsi_attach_vpd() is expecting the disk device is in
> > operational state, as it doesn't work on suspended device.
> >
> > To avoid such deadlock, wait until the scsi device is fully resumed,
> > before continuing the rescan process.
>
> Why doesn't scsi_attach_vpd() support runtime power management? Calling
> scsi_attach_vpd() should result in a call of sdev_runtime_resume(),

It's system-wide resume in this context, so it's dpm_resume() waiting
for the lock to be released by scsi_rescan_device().

Kai-Heng

> isn't it?
>
> Thanks,
>
> Bart.
>
kernel test robot June 2, 2023, 4:25 a.m. UTC | #3
Hi Kai-Heng,

kernel test robot noticed the following build errors:

[auto build test ERROR on mkp-scsi/for-next]
[also build test ERROR on jejb-scsi/for-next linus/master v6.4-rc4 next-20230601]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Kai-Heng-Feng/scsi-core-Wait-until-device-is-fully-resumed-before-doing-rescan/20230601-235821
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next
patch link:    https://lore.kernel.org/r/20230601155652.1157611-1-kai.heng.feng%40canonical.com
patch subject: [PATCH v5] scsi: core: Wait until device is fully resumed before doing rescan
config: parisc-defconfig (https://download.01.org/0day-ci/archive/20230602/202306021251.amkU7A6P-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 12.3.0
reproduce (this is a W=1 build):
        mkdir -p ~/bin
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/fe2b65dd8204442bd73db8a6e40d8307e11fcd04
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Kai-Heng-Feng/scsi-core-Wait-until-device-is-fully-resumed-before-doing-rescan/20230601-235821
        git checkout fe2b65dd8204442bd73db8a6e40d8307e11fcd04
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.3.0 ~/bin/make.cross W=1 O=build_dir ARCH=parisc olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.3.0 ~/bin/make.cross W=1 O=build_dir ARCH=parisc SHELL=/bin/bash drivers/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202306021251.amkU7A6P-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/scsi/scsi_scan.c: In function 'scsi_rescan_device':
>> drivers/scsi/scsi_scan.c:1627:48: error: 'struct dev_pm_info' has no member named 'completion'
    1627 |                 wait_for_completion(&dev->power.completion);
         |                                                ^


vim +1627 drivers/scsi/scsi_scan.c

  1621	
  1622	void scsi_rescan_device(struct device *dev)
  1623	{
  1624		struct scsi_device *sdev = to_scsi_device(dev);
  1625	
  1626		if (dev->power.is_suspended)
> 1627			wait_for_completion(&dev->power.completion);
  1628	
  1629		device_lock(dev);
  1630	
  1631		scsi_attach_vpd(sdev);
  1632		scsi_cdl_check(sdev);
  1633	
  1634		if (sdev->handler && sdev->handler->rescan)
  1635			sdev->handler->rescan(sdev);
  1636	
  1637		if (dev->driver && try_module_get(dev->driver->owner)) {
  1638			struct scsi_driver *drv = to_scsi_driver(dev->driver);
  1639	
  1640			if (drv->rescan)
  1641				drv->rescan(dev);
  1642			module_put(dev->driver->owner);
  1643		}
  1644		device_unlock(dev);
  1645	}
  1646	EXPORT_SYMBOL(scsi_rescan_device);
  1647
diff mbox series

Patch

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index d217be323cc6..a59aada98ac5 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1621,6 +1621,9 @@  void scsi_rescan_device(struct device *dev)
 {
 	struct scsi_device *sdev = to_scsi_device(dev);
 
+	if (dev->power.is_suspended)
+		wait_for_completion(&dev->power.completion);
+
 	device_lock(dev);
 
 	scsi_attach_vpd(sdev);