Message ID | 20230923002932.1082348-10-dlemoal@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Fix libata suspend/resume handling and code cleanup | expand |
On 9/22/23 17:29, Damien Le Moal wrote: > diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h > index 5eea762f84d1..4d42392fae07 100644 > --- a/drivers/scsi/sd.h > +++ b/drivers/scsi/sd.h > @@ -150,6 +150,7 @@ struct scsi_disk { > unsigned urswrz : 1; > unsigned security : 1; > unsigned ignore_medium_access_errors : 1; > + unsigned suspended : 1; > }; > #define to_scsi_disk(obj) container_of(obj, struct scsi_disk, disk_dev) If the 'suspended' member is retained, please do not use a bitfield for the but use type 'bool' instead. Updates of instances of type 'bool' are atomic while there is no guarantee in the C standard that bitfield updates will be atomic. Bitfield updates are typically translated into a combination of &, | and ~ operations. Additionally, I'm not convinced that we need the new 'suspended' member. The Linux kernel runtime PM subsystem serializes I/O and system-wide power operations. No I/O happens during system-wide suspend or resume operations and no system-wide suspend or resume callbacks are invoked while I/O is ongoing. The only exception is I/O that is initiated as the result of error handling by suspend or resume callbacks, e.g. the SCSI commands submitted by sd_shutdown(). Even if sd_shutdown() is called indirectly by a suspend or resume callback, I don't think that it can happen that a suspend or resume operation is ongoing for the device sd_shutdown() operates on. If scsi_remove_host() is called from inside a resume callback, resuming of the devices affected by sd_shutdown() will only be attempted after the host adapter resume callback has finished. Thanks, Bart.
On 2023/09/25 22:22, Bart Van Assche wrote: > On 9/22/23 17:29, Damien Le Moal wrote: >> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h >> index 5eea762f84d1..4d42392fae07 100644 >> --- a/drivers/scsi/sd.h >> +++ b/drivers/scsi/sd.h >> @@ -150,6 +150,7 @@ struct scsi_disk { >> unsigned urswrz : 1; >> unsigned security : 1; >> unsigned ignore_medium_access_errors : 1; >> + unsigned suspended : 1; >> }; >> #define to_scsi_disk(obj) container_of(obj, struct scsi_disk, disk_dev) > > If the 'suspended' member is retained, please do not use a bitfield for the > but use type 'bool' instead. Updates of instances of type 'bool' are atomic > while there is no guarantee in the C standard that bitfield updates will be > atomic. Bitfield updates are typically translated into a combination of &, > | and ~ operations. Sure, I can make it a bool. > Additionally, I'm not convinced that we need the new 'suspended' member. > The Linux kernel runtime PM subsystem serializes I/O and system-wide power > operations. No I/O happens during system-wide suspend or resume operations > and no system-wide suspend or resume callbacks are invoked while I/O is > ongoing. The only exception is I/O that is initiated as the result of error > handling by suspend or resume callbacks, e.g. the SCSI commands submitted > by sd_shutdown(). Even if sd_shutdown() is called indirectly by a suspend > or resume callback, I don't think that it can happen that a suspend or > resume operation is ongoing for the device sd_shutdown() operates on. If Yes, but that is not what this patch addresses. > scsi_remove_host() is called from inside a resume callback, resuming of the > devices affected by sd_shutdown() will only be attempted after the host > adapter resume callback has finished. No it will not because the commands issued in sd_shutdown() are synchronous, so the adapter resume will wait for these to complete. But they will never complete as the adapter itself is not fully resumed, AND the disk may not be in a state that allows commands to be executed. Deadlock. It is easy to recreate this issue if you have a pm8001 adapter: remove that fix patch I sent to correctly re-allocate IRQs on resume and do a suspend-resume cycle: on resume, the adapter fails to allocate IRQs and gives up, calling scsi_remove_host(). The system end being stuck in resume context with no forward progress ever made. It seems that you are suggesting that we should use some information from the scsi_device->power structure to detect the suspended state... But as mentioned before, these are PM internal and should not be touched without the device lock held. So the little "suspended" falg simplifies things a lot. > > Thanks, > > Bart.
On 9/25/23 23:00, Damien Le Moal wrote: > It seems that you are suggesting that we should use some information > from the scsi_device->power structure to detect the suspended > state... Yes, that's indeed what I'm suggesting. > But as mentioned before, these are PM internal and should not be > touched without the device lock held. So the little "suspended" flag > simplifies things a lot. Hmm ... I think there is plenty of code in the Linux kernel that reads variables that can be modified by another thread without using locking. Hasn't the READ_ONCE() macro been introduced for this purpose? Anyway, I don't have a strong opinion about whether to read directly from the scsi_device->power data structure or whether to introduce the new 'suspended' member. Thanks, Bart.
On 9/26/23 07:51, Bart Van Assche wrote: > On 9/25/23 23:00, Damien Le Moal wrote: >> But as mentioned before, these are PM internal and should not be >> touched without the device lock held. So the little "suspended" flag >> simplifies things a lot. > > Hmm ... I think there is plenty of code in the Linux kernel that reads > variables that can be modified by another thread without using locking. > Hasn't the READ_ONCE() macro been introduced for this purpose? Anyway, I > don't have a strong opinion about whether to read directly from the > scsi_device->power data structure or whether to introduce the new > 'suspended' member. (replying to my own email) I think we need the new 'suspended' flag. device_resume(), a function executed during system-wide resume, executes the following code whether or not resuming succeeds: dev->power.is_suspended = false; Bart.
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index a1ef4eef904f..bff8663be7e0 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -3741,7 +3741,8 @@ static int sd_remove(struct device *dev) device_del(&sdkp->disk_dev); del_gendisk(sdkp->disk); - sd_shutdown(dev); + if (!sdkp->suspended) + sd_shutdown(dev); put_disk(sdkp->disk); return 0; @@ -3872,6 +3873,9 @@ static int sd_suspend_common(struct device *dev, bool runtime) ret = 0; } + if (!ret) + sdkp->suspended = 1; + return ret; } @@ -3891,21 +3895,26 @@ static int sd_suspend_runtime(struct device *dev) static int sd_resume(struct device *dev, bool runtime) { struct scsi_disk *sdkp = dev_get_drvdata(dev); - int ret; + int ret = 0; if (!sdkp) /* E.g.: runtime resume at the start of sd_probe() */ return 0; - if (!sd_do_start_stop(sdkp->device, runtime)) + if (!sd_do_start_stop(sdkp->device, runtime)) { + sdkp->suspended = 0; return 0; + } if (!sdkp->device->no_start_on_resume) { sd_printk(KERN_NOTICE, sdkp, "Starting disk\n"); ret = sd_start_stop_device(sdkp, 1); } - if (!ret) + if (!ret) { opal_unlock_from_suspend(sdkp->opal_dev); + sdkp->suspended = 0; + } + return ret; } diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h index 5eea762f84d1..4d42392fae07 100644 --- a/drivers/scsi/sd.h +++ b/drivers/scsi/sd.h @@ -150,6 +150,7 @@ struct scsi_disk { unsigned urswrz : 1; unsigned security : 1; unsigned ignore_medium_access_errors : 1; + unsigned suspended : 1; }; #define to_scsi_disk(obj) container_of(obj, struct scsi_disk, disk_dev)
If an error occurs when resuming a host adapter before the devices attached to the adapter are resumed, the adapter low level driver may remove the scsi host, resulting in a call to sd_remove() for the disks of the host. This in turn results in a call to sd_shutdown() which will issue a synchronize cache command and a start stop unit command to spindown the disk. sd_shutdown() issues the commands only if the device is not already runtime suspended but does not check the power state for system-wide suspend/resume. That is, the commands may be issued with the device in a suspended state, which causes PM resume to hang, forcing a reset of the machine to recover. Fix this by tracking the suspended state of a disk using the sicsi_disk suspended flag and by not calling sd_shutdown() in sd_remove() if the disk is not running. Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> --- drivers/scsi/sd.c | 17 +++++++++++++---- drivers/scsi/sd.h | 1 + 2 files changed, 14 insertions(+), 4 deletions(-)