Message ID | 1347522049-1836-2-git-send-email-aaron.lu@intel.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
On Thu, 2012-09-13 at 15:40 +0800, Aaron Lu wrote:
> The ready_to_power_off flag is used to give indication to ATA layer
> if this device's power can be removed when runtime suspended.
>
> This flag is determined by individual SCSI driver like sr, sd.
>
> This flag is introduced to support zero power ODD. When ODD
> is runtime suspended, it may not be OK to remove its power.
>
> But for disk, it is always OK to be powered off, so set this flag.

It is? I may have missed this, but where do you flush the cache of write
back cache devices you're about to power off?

James
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On 09/13/2012 04:14 PM, James Bottomley wrote:
> On Thu, 2012-09-13 at 15:40 +0800, Aaron Lu wrote:
>> The ready_to_power_off flag is used to give indication to ATA layer
>> if this device's power can be removed when runtime suspended.
>>
>> This flag is determined by individual SCSI driver like sr, sd.
>>
>> This flag is introduced to support zero power ODD. When ODD
>> is runtime suspended, it may not be OK to remove its power.
>>
>> But for disk, it is always OK to be powered off, so set this flag.
>
> It is? I may have missed this, but where do you flush the cache of write
> back cache devices you're about to power off?

I suppose that is handled in sd_suspend callback, the power off happens
after a device is runtime suspended.

Thanks,
Aaron

On Thu, 2012-09-13 at 16:23 +0800, Aaron Lu wrote:
> On 09/13/2012 04:14 PM, James Bottomley wrote:
> > On Thu, 2012-09-13 at 15:40 +0800, Aaron Lu wrote:
> >> The ready_to_power_off flag is used to give indication to ATA layer
> >> if this device's power can be removed when runtime suspended.
> >>
> >> This flag is determined by individual SCSI driver like sr, sd.
> >>
> >> This flag is introduced to support zero power ODD. When ODD
> >> is runtime suspended, it may not be OK to remove its power.
> >>
> >> But for disk, it is always OK to be powered off, so set this flag.
> >
> > It is? I may have missed this, but where do you flush the cache of write
> > back cache devices you're about to power off?
>
> I suppose that is handled in sd_suspend callback, the power off happens
> after a device is runtime suspended.

Well that would mean something is wrong somewhere: For runtime power
management using idle timers and forced standby, there's no need to
flush the cache (if the drive goes into standby on its own as a result
of an idle timeout, the cache will never flush). The cache needs to
flush before we power off the device: that's before the system goes into
S3, or now before you power it off at runtime. Flushing the cache on
runtime transitions to standby will likely cause performance problems
since that happens quite often.

James

On 09/13/2012 04:37 PM, James Bottomley wrote:
> On Thu, 2012-09-13 at 16:23 +0800, Aaron Lu wrote:
>> On 09/13/2012 04:14 PM, James Bottomley wrote:
>>> On Thu, 2012-09-13 at 15:40 +0800, Aaron Lu wrote:
>>>> The ready_to_power_off flag is used to give indication to ATA layer
>>>> if this device's power can be removed when runtime suspended.
>>>>
>>>> This flag is determined by individual SCSI driver like sr, sd.
>>>>
>>>> This flag is introduced to support zero power ODD. When ODD
>>>> is runtime suspended, it may not be OK to remove its power.
>>>>
>>>> But for disk, it is always OK to be powered off, so set this flag.
>>>
>>> It is? I may have missed this, but where do you flush the cache of write
>>> back cache devices you're about to power off?
>>
>> I suppose that is handled in sd_suspend callback, the power off happens
>> after a device is runtime suspended.
>
> Well that would mean something is wrong somewhere: For runtime power
> management using idle timers and forced standby, there's no need to

The current mechanism for scsi disk runtime pm is based on open/close.
If there is some process opened this block device, it will be in active
state; only when all opened session exited, it will enter runtime
suspend state.

> flush the cache (if the drive goes into standby on its own as a result
> of an idle timeout, the cache will never flush). The cache needs to
> flush before we power off the device: that's before the system goes into
> S3, or now before you power it off at runtime. Flushing the cache on
> runtime transitions to standby will likely cause performance problems
> since that happens quite often.

As explained above, it didn't happen that often, especially for user who
has only one disk, the disk will be mounted, which makes it never be
able to enter runtime suspend state.

Thanks,
Aaron

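The open/close behaviour Aaron describes can be pictured with a toy model. This is only a sketch under stated assumptions: the class and method names (`DiskRuntimePM`, `open`, `close`) are hypothetical and are not the sd driver's API. It models one counter of openers and nothing else.

```python
class DiskRuntimePM:
    """Toy model of open/close-driven runtime PM (hypothetical, not kernel code)."""

    def __init__(self):
        self.open_count = 0          # number of current openers (mounts count as openers)
        self.state = "suspended"     # nobody has opened the device yet

    def open(self):
        # Any opener keeps the disk in the active state.
        self.open_count += 1
        self.state = "active"

    def close(self):
        if self.open_count == 0:
            raise RuntimeError("close without matching open")
        self.open_count -= 1
        # Only when the last opener goes away may the disk runtime suspend.
        if self.open_count == 0:
            self.state = "suspended"


disk = DiskRuntimePM()
disk.open()        # e.g. the filesystem is mounted
disk.open()        # a second opener
disk.close()
print(disk.state)  # still "active": the mount holds the device open
```

In this model a disk that stays mounted never reaches the suspended state, which is the limitation discussed in the replies that follow.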
On Thu, 2012-09-13 at 16:49 +0800, Aaron Lu wrote:
> On 09/13/2012 04:37 PM, James Bottomley wrote:
> > On Thu, 2012-09-13 at 16:23 +0800, Aaron Lu wrote:
> >> On 09/13/2012 04:14 PM, James Bottomley wrote:
> >>> On Thu, 2012-09-13 at 15:40 +0800, Aaron Lu wrote:
> >>>> The ready_to_power_off flag is used to give indication to ATA layer
> >>>> if this device's power can be removed when runtime suspended.
> >>>>
> >>>> This flag is determined by individual SCSI driver like sr, sd.
> >>>>
> >>>> This flag is introduced to support zero power ODD. When ODD
> >>>> is runtime suspended, it may not be OK to remove its power.
> >>>>
> >>>> But for disk, it is always OK to be powered off, so set this flag.
> >>>
> >>> It is? I may have missed this, but where do you flush the cache of write
> >>> back cache devices you're about to power off?
> >>
> >> I suppose that is handled in sd_suspend callback, the power off happens
> >> after a device is runtime suspended.
> >
> > Well that would mean something is wrong somewhere: For runtime power
> > management using idle timers and forced standby, there's no need to
>
> The current mechanism for scsi disk runtime pm is based on open/close.
> If there is some process opened this block device, it will be in active
> state; only when all opened session exited, it will enter runtime
> suspend state.

A mounted disk is open for the period of the mount. I thought the use
case for runtime PM was the laptop one but most laptops have a single
device to use as root, so if you never use runtime PM on an open device,
you never use it on 99% of our target systems ... doesn't that make the
feature a bit useless?

> > flush the cache (if the drive goes into standby on its own as a result
> > of an idle timeout, the cache will never flush). The cache needs to
> > flush before we power off the device: that's before the system goes into
> > S3, or now before you power it off at runtime. Flushing the cache on
> > runtime transitions to standby will likely cause performance problems
> > since that happens quite often.
>
> As explained above, it didn't happen that often, especially for user who
> has only one disk, the disk will be mounted, which makes it never be
> able to enter runtime suspend state.

So what's the target audience for the feature. If it isn't laptops or
standard desktops, is it the enterprise?

James

On 09/13/2012 04:56 PM, James Bottomley wrote:
> On Thu, 2012-09-13 at 16:49 +0800, Aaron Lu wrote:
>> On 09/13/2012 04:37 PM, James Bottomley wrote:
>>> On Thu, 2012-09-13 at 16:23 +0800, Aaron Lu wrote:
>>>> On 09/13/2012 04:14 PM, James Bottomley wrote:
>>>>> On Thu, 2012-09-13 at 15:40 +0800, Aaron Lu wrote:
>>>>>> The ready_to_power_off flag is used to give indication to ATA layer
>>>>>> if this device's power can be removed when runtime suspended.
>>>>>>
>>>>>> This flag is determined by individual SCSI driver like sr, sd.
>>>>>>
>>>>>> This flag is introduced to support zero power ODD. When ODD
>>>>>> is runtime suspended, it may not be OK to remove its power.
>>>>>>
>>>>>> But for disk, it is always OK to be powered off, so set this flag.
>>>>>
>>>>> It is? I may have missed this, but where do you flush the cache of write
>>>>> back cache devices you're about to power off?
>>>>
>>>> I suppose that is handled in sd_suspend callback, the power off happens
>>>> after a device is runtime suspended.
>>>
>>> Well that would mean something is wrong somewhere: For runtime power
>>> management using idle timers and forced standby, there's no need to
>>
>> The current mechanism for scsi disk runtime pm is based on open/close.
>> If there is some process opened this block device, it will be in active
>> state; only when all opened session exited, it will enter runtime
>> suspend state.
>
> A mounted disk is open for the period of the mount. I thought the use
> case for runtime PM was the laptop one but most laptops have a single
> device to use as root, so if you never use runtime PM on an open device,
> you never use it on 99% of our target systems ... doesn't that make the
> feature a bit useless?

I agree, but it may be helpful in some cases.

>
>>> flush the cache (if the drive goes into standby on its own as a result
>>> of an idle timeout, the cache will never flush). The cache needs to
>>> flush before we power off the device: that's before the system goes into
>>> S3, or now before you power it off at runtime. Flushing the cache on
>>> runtime transitions to standby will likely cause performance problems
>>> since that happens quite often.
>>
>> As explained above, it didn't happen that often, especially for user who
>> has only one disk, the disk will be mounted, which makes it never be
>> able to enter runtime suspend state.
>
> So what's the target audience for the feature. If it isn't laptops or
> standard desktops, is it the enterprise?

To make this feature useful for normal laptop user, a better mechanism
for scsi disk runtime pm is needed. Alan Stern and Lin Ming have been
working on this, and I'll see if I can make that patch work later.

So I think this is basically 2 things, one is the runtime suspend of the
disk, another is when it is runtime suspended, how to remove its power.
I'm currently doing the latter one, which is simpler, so I want to do it
first :-)

And there may exist some cases this can be helpful, if user has 2 or
more disks attached and he is only using one of them or some other
corner cases that I don't know. Considering the effort to implement this
feature pretty small, and it shouldn't cause trouble for existing
system, I think this may be worth it.

Thanks,
Aaron

On Thu, 2012-09-13 at 17:07 +0800, Aaron Lu wrote:
> On 09/13/2012 04:56 PM, James Bottomley wrote:
> > So what's the target audience for the feature. If it isn't laptops or
> > standard desktops, is it the enterprise?
>
> To make this feature useful for normal laptop user, a better mechanism
> for scsi disk runtime pm is needed. Alan Stern and Lin Ming have been
> working on this, and I'll see if I can make that patch work later.
>
> So I think this is basically 2 things, one is the runtime suspend of the
> disk, another is when it is runtime suspended, how to remove its power.
> I'm currently doing the latter one, which is simpler, so I want to do it
> first :-)

Well, I don't like the way the interaction of the patches is going.
You're the one proposing powering down the device outside of the
standards defined transitions, so you need to be responsible for the
actions that necessitates, including synchronizing the cache. The specs
(SPC-4) say that cache management is explicitly unnecessary for the
standard SCSI power states (Active, Idle, Standby and Stopped), so
someone at some point is going to read that and remove the unnecessary
cache sync in the code. When that happens, you'll start getting data
loss.

James

On Thursday 13 September 2012 10:26:44 James Bottomley wrote:
> On Thu, 2012-09-13 at 17:07 +0800, Aaron Lu wrote:
> > So I think this is basically 2 things, one is the runtime suspend of the
> > disk, another is when it is runtime suspended, how to remove its power.
> > I'm currently doing the latter one, which is simpler, so I want to do it
> > first :-)
>
> Well, I don't like the way the interaction of the patches is going.
> You're the one proposing powering down the device outside of the
> standards defined transitions, so you need to be responsible for the
> actions that necessitates, including synchronizing the cache. The specs
> (SPC-4) say that cache management is explicitly unnecessary for the
> standard SCSI power states (Active, Idle, Standby and Stopped), so
> someone at some point is going to read that and remove the unnecessary
> cache sync in the code. When that happens, you'll start getting data
> loss.

The cache is handled identically in sd_suspend() and sd_shutdown().
In fact sd_shutdown() will skip handling it if the device has already been
suspended, so the assumption is built into the code and has been so
for a long time.

Though it wouldn't hurt to add a comment that says that the system going
to S3 or S4 will cut power to a lot of disks so that the cache needs to be
synced even if the spec says we need not. Runtime PM doesn't much alter the
situation.

Regards
Oliver

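The coupling Oliver points out, and the failure James is worried about, can be illustrated with a small model. This is a sketch under assumptions, not the sd driver: `ToyDisk` and its `flush_on_suspend` switch are hypothetical names, and the only behaviour taken from the thread is that shutdown skips cache handling when the device is already suspended.

```python
class ToyDisk:
    """Hypothetical model of a write-back-cache disk; not kernel code."""

    def __init__(self, flush_on_suspend=True):
        self.dirty = False                    # unwritten data sitting in the cache
        self.suspended = False
        self.flush_on_suspend = flush_on_suspend
        self.flushes = 0

    def write(self):
        self.dirty = True

    def _sync_cache(self):
        # Stands in for a SYNCHRONIZE CACHE command.
        self.flushes += 1
        self.dirty = False

    def suspend(self):
        if self.flush_on_suspend:
            self._sync_cache()
        self.suspended = True

    def shutdown(self):
        # Mirrors the behaviour described above: an already suspended
        # device is assumed to have been flushed by suspend().
        if self.suspended:
            return
        self._sync_cache()


def data_lost_at_power_off(disk):
    """Dirty cache contents are lost when power is cut."""
    return disk.dirty


safe = ToyDisk(flush_on_suspend=True)
safe.write(); safe.suspend(); safe.shutdown()

# Someone later "optimises away" the flush in suspend():
unsafe = ToyDisk(flush_on_suspend=False)
unsafe.write(); unsafe.suspend(); unsafe.shutdown()

print(data_lost_at_power_off(safe), data_lost_at_power_off(unsafe))
```

The second case shows why the early return in shutdown is only safe as long as the suspend path keeps its flush.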
On Thu, 2012-09-13 at 12:16 +0200, Oliver Neukum wrote:
> On Thursday 13 September 2012 10:26:44 James Bottomley wrote:
> > On Thu, 2012-09-13 at 17:07 +0800, Aaron Lu wrote:
>
> > > So I think this is basically 2 things, one is the runtime suspend of the
> > > disk, another is when it is runtime suspended, how to remove its power.
> > > I'm currently doing the latter one, which is simpler, so I want to do it
> > > first :-)
> >
> > Well, I don't like the way the interaction of the patches is going.
> > You're the one proposing powering down the device outside of the
> > standards defined transitions, so you need to be responsible for the
> > actions that necessitates, including synchronizing the cache. The specs
> > (SPC-4) say that cache management is explicitly unnecessary for the
> > standard SCSI power states (Active, Idle, Standby and Stopped), so
> > someone at some point is going to read that and remove the unnecessary
> > cache sync in the code. When that happens, you'll start getting data
> > loss.
>
> The cache is handled identically in sd_suspend() and sd_shutdown().
> In fact sd_shutdown() will skip handling it if the device has already been
> suspended, so the assumption is built into the code and has been so
> for a long time.
>
> Though it wouldn't hurt to add a comment that says that the system going
> to S3 or S4 will cut power to a lot of disk so that the cache needs to be synced
> even if the spec says we need not. Runtime PM doesn't much alter the
> situation.

I think you're confusing two things. Sleep states (S3 and S4) aren't
spec'd in SCSI, so we have to take care of everything (including the
cache before power off) because they're done invisibly to the disk. The
same tends to go for link power management, which was previously our
only form of runtime PM, but which doesn't actually affect the disk at
all and, of course, ACPI power off of devices (ZPDD).

Disk runtime power states are defined in the standard and so we rely on
the standard taking care of the cache. I suspect the most efficient use
may be via the power management mode page, which does everything
automatically on timers (you just get to set the timer interval, plus
some transports *may* require an initialising command which we already
have some provision for) than doing it all ourselves from block.

James

On Thursday 13 September 2012 11:51:07 James Bottomley wrote:
> On Thu, 2012-09-13 at 12:16 +0200, Oliver Neukum wrote:
> > On Thursday 13 September 2012 10:26:44 James Bottomley wrote:
> > > On Thu, 2012-09-13 at 17:07 +0800, Aaron Lu wrote:
> >
> > > > So I think this is basically 2 things, one is the runtime suspend of the
> > > > disk, another is when it is runtime suspended, how to remove its power.
> > > > I'm currently doing the latter one, which is simpler, so I want to do it
> > > > first :-)
> > >
> > > Well, I don't like the way the interaction of the patches is going.
> > > You're the one proposing powering down the device outside of the
> > > standards defined transitions, so you need to be responsible for the
> > > actions that necessitates, including synchronizing the cache. The specs
> > > (SPC-4) say that cache management is explicitly unnecessary for the
> > > standard SCSI power states (Active, Idle, Standby and Stopped), so
> > > someone at some point is going to read that and remove the unnecessary
> > > cache sync in the code. When that happens, you'll start getting data
> > > loss.
> >
> > The cache is handled identically in sd_suspend() and sd_shutdown().
> > In fact sd_shutdown() will skip handling it if the device has already been
> > suspended, so the assumption is built into the code and has been so
> > for a long time.
> >
> > Though it wouldn't hurt to add a comment that says that the system going
> > to S3 or S4 will cut power to a lot of disk so that the cache needs to be synced
> > even if the spec says we need not. Runtime PM doesn't much alter the
> > situation.
>
> I think you're confusing two things. Sleep states (S3 and S4) aren't
> spec'd in SCSI, so we have to take care of everything (including the
> cache before power off) because they're done invisibly to the disk. The

Yes, but this confusion is necessary. The driver core is supposed to
be generic and knows strictly speaking only suspended and active.
It is a driver's job to do what needs to be done and translate this
into the appropriate device states.

> same tends to go for link power management, which was previously our
> only form of runtime PM, but which doesn't actually affect the disk at
> all and, of course, ACPI power off of devices (ZPDD).

The latter however does cut power to the drive. So the driver should do
what it does when other operations that affect power are done.

> Disk runtime power states are defined in the standard and so we rely on
> the standard taking care of the cache. I suspect the most efficient use
> may be via the power management mode page, which does everything
> automatically on timers (you just get to set the timer interval, plus
> some transports *may* require an initialising command which we already
> have some provision for) than doing it all ourselves from block.

Well, yes, but we need to support modes of power management that cut off
power to the disk in any case, so what does it matter if we also do it for
runtime PM?

Are you concerned about layering?

Regards
Oliver

On Thu, 13 Sep 2012, Oliver Neukum wrote:

> > > > Well, I don't like the way the interaction of the patches is going.
> > > > You're the one proposing powering down the device outside of the
> > > > standards defined transitions, so you need to be responsible for the
> > > > actions that necessitates, including synchronizing the cache. The specs
> > > > (SPC-4) say that cache management is explicitly unnecessary for the
> > > > standard SCSI power states (Active, Idle, Standby and Stopped), so
> > > > someone at some point is going to read that and remove the unnecessary
> > > > cache sync in the code. When that happens, you'll start getting data
> > > > loss.
> > >
> > > The cache is handled identically in sd_suspend() and sd_shutdown().
> > > In fact sd_shutdown() will skip handling it if the device has already been
> > > suspended, so the assumption is built into the code and has been so
> > > for a long time.
> > >
> > > Though it wouldn't hurt to add a comment that says that the system going
> > > to S3 or S4 will cut power to a lot of disk so that the cache needs to be synced
> > > even if the spec says we need not. Runtime PM doesn't much alter the
> > > situation.
> >
> > I think you're confusing two things. Sleep states (S3 and S4) aren't
> > spec'd in SCSI, so we have to take care of everything (including the
> > cache before power off) because they're done invisibly to the disk. The
>
> Yes, but this confusion is necessary. The driver core is supposed to
> be generic and knows strictly speaking only suspended and active.
> It is a driver's job to do what needs to be done and translate this
> into the appropriate device states.

Currently the sd driver's suspend routine is not very sophisticated.
It needs to become smarter about the differences between system
suspend, runtime suspend, and power off.

> > same tends to go for link power management, which was previously our
> > only form of runtime PM, but which doesn't actually affect the disk at
> > all and, of course, ACPI power off of devices (ZPDD).
>
> The latter however does cut power to the drive. So the driver should do
> what it does when other operations that affect power are done.
>
> > Disk runtime power states are defined in the standard and so we rely on
> > the standard taking care of the cache. I suspect the most efficient use
> > may be via the power management mode page, which does everything
> > automatically on timers (you just get to set the timer interval, plus
> > some transports *may* require an initialising command which we already
> > have some provision for) than doing it all ourselves from block.
>
> Well, yes, but we need support modes of power management that cut off
> power to the disk in any case, so what does it matter if we also do it for
> runtime PM?
>
> Are you concerned about layering?

It sounds like James is partly concerned about efficiency. If Lin
Ming's patches are merged then we will be doing runtime suspend
relatively often, not just when the device file is closed. The
sd_suspend routine should know when SYNCHRONIZE CACHE is needed and
when it can be skipped.

From what I gather of this discussion, we can avoid flushing the cache
during (1) a runtime suspend provided (2) the drive isn't going to be
powered down. If either (1) or (2) doesn't hold then the cache needs
to be synchronized.

The problem with relying on the internal timers and the power
management mode page is that the transitions take place automatically
and the host system doesn't know about them. We _want_ to know about
them so that the higher layers of the device tree can go to low power
when the disk does.

On the other hand, perhaps sd_suspend/sd_resume could use the mode page
by telling it to go into or out of Stopped mode immediately.

Alan Stern

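The rule Alan states (skip the flush only for a runtime suspend in which the drive keeps its power) can be written out as a small predicate. This is a minimal sketch, assuming hypothetical names (`SuspendKind`, `needs_sync_cache`); the `write_back_cache` parameter is an added assumption reflecting that the question only arises for write-back cache devices. It is not the logic of the actual sd_suspend routine.

```python
from enum import Enum


class SuspendKind(Enum):
    RUNTIME = "runtime"      # runtime PM suspend
    SYSTEM = "system"        # system sleep such as S3/S4
    SHUTDOWN = "shutdown"    # power off


def needs_sync_cache(kind, will_power_off, write_back_cache=True):
    """Return True when a SYNCHRONIZE CACHE should be issued before suspending.

    The flush may be skipped only if both conditions hold:
      (1) this is a runtime suspend, and
      (2) the drive is not going to be powered down.
    """
    if not write_back_cache:
        return False  # assumption: nothing is cached, so nothing to flush
    can_skip = kind is SuspendKind.RUNTIME and not will_power_off
    return not can_skip


for kind in SuspendKind:
    for power_off in (False, True):
        print(kind.value, power_off, needs_sync_cache(kind, power_off))
```

Only the combination of a runtime suspend with power retained yields False; every other combination requires the flush, including a runtime suspend followed by the zero-power transition this patch set introduces.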
On Thursday 13 September 2012 12:24:46 Alan Stern wrote:
> On Thu, 13 Sep 2012, Oliver Neukum wrote:

> > Yes, but this confusion is necessary. The driver core is supposed to
> > be generic and knows strictly speaking only suspended and active.
> > It is a driver's job to do what needs to be done and translate this
> > into the appropriate device states.
>
> Currently the sd driver's suspend routine is not very sophisticated.
> It needs to become smarter about the differences between system
> suspend, runtime suspend, and power off.

In what way?

> > Well, yes, but we need support modes of power management that cut off
> > power to the disk in any case, so what does it matter if we also do it for
> > runtime PM?
> >
> > Are you concerned about layering?
>
> It sounds like James is partly concerned about efficiency. If Lin
> Ming's patches are merged then we will be doing runtime suspend
> relatively often, not just when the device file is closed. The
> sd_suspend routine should know when SYNCHRONIZE CACHE is needed and
> when it can be skipped.

How? This depends on the hardware?

> From what I gather of this discussion, we can avoid flushing the cache
> during (1) a runtime suspend provided (2) the drive isn't going to be
> powered down. If either (1) or (2) doesn't hold then the cache needs
> to be synchronized.

This is true, but how is it relevant?

> The problem with relying on the internal timers and the power
> management mode page is that the transitions take place automatically
> and the host system doesn't know about them. We _want_ to know about
> them so that the higher layers of the device tree can go to low power
> when the disk does.

Why would you want that to correlate? The operation of the controller
and the driver is independent of the state. And what would it tell us,
as the driver knows about all IO anyway?

Regards
Oliver

On Thu, 13 Sep 2012, Oliver Neukum wrote:

> On Thursday 13 September 2012 12:24:46 Alan Stern wrote:
> > On Thu, 13 Sep 2012, Oliver Neukum wrote:
>
> > > Yes, but this confusion is necessary. The driver core is supposed to
> > > be generic and knows strictly speaking only suspended and active.
> > > It is a driver's job to do what needs to be done and translate this
> > > into the appropriate device states.
> >
> > Currently the sd driver's suspend routine is not very sophisticated.
> > It needs to become smarter about the differences between system
> > suspend, runtime suspend, and power off.
>
> In what way?

sd_suspend should know whether or not to issue the SYNCHRONIZE CACHE
command.

> > It sounds like James is partly concerned about efficiency. If Lin
> > Ming's patches are merged then we will be doing runtime suspend
> > relatively often, not just when the device file is closed. The
> > sd_suspend routine should know when SYNCHRONIZE CACHE is needed and
> > when it can be skipped.
>
> How? This depends on the hardware?

It depends partly on the hardware, partly on the type of suspend, and
partly on the flag settings in sysfs.

> > From what I gather of this discussion, we can avoid flushing the cache
> > during (1) a runtime suspend provided (2) the drive isn't going to be
> > powered down. If either (1) or (2) doesn't hold then the cache needs
> > to be synchronized.
>
> This is true, but how is it relevant?

This, or something like it, is the algorithm sd_suspend should use for
determining whether or not to issue SYNCHRONIZE CACHE.

> > The problem with relying on the internal timers and the power
> > management mode page is that the transitions take place automatically
> > and the host system doesn't know about them. We _want_ to know about
> > them so that the higher layers of the device tree can go to low power
> > when the disk does.
>
> Why would you want that to correlate? The operation of the controller
> and the driver is independent of the state.

That's the problem -- I would like them not to be so independent. The
reason stated above: If we know when the controller puts the drive in a
low-power state then we can tell the higher layers of the device tree
to go to low power at those times.

> And what would it tell us, as the driver knows about all IO anyway?

But the driver doesn't know when the controller has spun down the disk.
That's something else sd_suspend has to worry about.

Alan Stern

On Thu, Sep 13, 2012 at 10:26:44AM +0100, James Bottomley wrote:
> On Thu, 2012-09-13 at 17:07 +0800, Aaron Lu wrote:
> > So I think this is basically 2 things, one is the runtime suspend of the
> > disk, another is when it is runtime suspended, how to remove its power.
> > I'm currently doing the latter one, which is simpler, so I want to do it
> > first :-)
>
> Well, I don't like the way the interaction of the patches is going.
> You're the one proposing powering down the device outside of the
> standards defined transitions, so you need to be responsible for the
> actions that necessitates, including synchronizing the cache. The specs

OK, I'll update the code.

> (SPC-4) say that cache management is explicitly unnecessary for the
> standard SCSI power states (Active, Idle, Standby and Stopped), so

Just read the SPC-4 spec, in section 5.12.3, it has words like this:

  Logical units that contain cache memory shall write all cached data to
  the medium for the logical unit (e.g., as a logical unit would do in
  response to a SYNCHRONIZE CACHE command as described in SBC-3) prior to
  entering into any power condition that prevents accessing the media
  (e.g., before a hard drive stops its spindle motor during a change to
  the standby power condition).

So this looks like the cache needs to be synced before the device enters
the standby/stopped power condition. Or do I miss something?

> someone at some point is going to read that and remove the unnecessary
> cache sync in the code. When that happens, you'll start getting data
> loss.

Indeed, I'll make sure the cache gets synced when we are to power off the
device. Thanks for the reminder.

-Aaron

On Thu, Sep 13, 2012 at 12:24:46PM -0400, Alan Stern wrote:
> > > Disk runtime power states are defined in the standard and so we rely on
> > > the standard taking care of the cache. I suspect the most efficient use
> > > may be via the power management mode page, which does everything
> > > automatically on timers (you just get to set the timer interval, plus
> > > some transports *may* require an initialising command which we already
> > > have some provision for) than doing it all ourselves from block.
> >
> > Well, yes, but we need support modes of power management that cut off
> > power to the disk in any case, so what does it matter if we also do it for
> > runtime PM?
> >
> > Are you concerned about layering?
>
> It sounds like James is partly concerned about efficiency. If Lin
> Ming's patches are merged then we will be doing runtime suspend
> relatively often, not just when the device file is closed. The
> sd_suspend routine should know when SYNCHRONIZE CACHE is needed and
> when it can be skipped.
>
> From what I gather of this discussion, we can avoid flushing the cache
> during (1) a runtime suspend provided (2) the drive isn't going to be
> powered down. If either (1) or (2) doesn't hold then the cache needs
> to be synchronized.

Agree.

>
> The problem with relying on the internal timers and the power
> management mode page is that the transitions take place automatically
> and the host system doesn't know about them. We _want_ to know about
> them so that the higher layers of the device tree can go to low power
> when the disk does.

Looks like it's not easy to know when the device entered a low power
state. Constantly polling with request sense doesn't seem to be a good
idea. This will make upper layer devices not able to enter runtime
suspend state and device's power can't be cut.

>
> On the other hand, perhaps sd_suspend/sd_resume could use the mode page
> by telling it to go into or out of Stopped mode immediately.

BTW, is it necessary to issue the stop command before we cut its power
either due to runtime power off or system entering S3/S4/S5?

Thanks,
Aaron

On Thu, 2012-09-13 at 12:24 -0400, Alan Stern wrote:
> On Thu, 13 Sep 2012, Oliver Neukum wrote:
>
> > > Disk runtime power states are defined in the standard and so we rely on
> > > the standard taking care of the cache. I suspect the most efficient use
> > > may be via the power management mode page, which does everything
> > > automatically on timers (you just get to set the timer interval, plus
> > > some transports *may* require an initialising command which we already
> > > have some provision for) than doing it all ourselves from block.
> >
> > Well, yes, but we need support modes of power management that cut off
> > power to the disk in any case, so what does it matter if we also do it for
> > runtime PM?
> >
> > Are you concerned about layering?
>
> It sounds like James is partly concerned about efficiency.

Sort of, but my main worry is correctness: I don't want a path in
runtime suspend that requires a cache flush to depend on the flush
living in a path that doesn't need it, because efficiency dictates that
at some time or other the unnecessary flush will get removed (and then
we'll start corrupting data).

> If Lin
> Ming's patches are merged then we will be doing runtime suspend
> relatively often, not just when the device file is closed. The
> sd_suspend routine should know when SYNCHRONIZE CACHE is needed and
> when it can be skipped.

Keeping the flush in sd_suspend and making sure we know when to use it
would be fine by me as well ... I just need all the independent runtime
suspend patch authors to agree on this scheme.

> From what I gather of this discussion, we can avoid flushing the cache
> during (1) a runtime suspend provided (2) the drive isn't going to be
> powered down. If either (1) or (2) doesn't hold then the cache needs
> to be synchronized.
>
> The problem with relying on the internal timers and the power
> management mode page is that the transitions take place automatically
> and the host system doesn't know about them. We _want_ to know about
> them so that the higher layers of the device tree can go to low power
> when the disk does.

Sigh ... the standards guys didn't help there then, since SPC-4
specifically says there will be no notifications.

> On the other hand, perhaps sd_suspend/sd_resume could use the mode page
> by telling it to go into or out of Stopped mode immediately.

That's perfectly legal. Even if you use the timer-based power state
management afforded by the mode page, you can still preempt the timer
with an explicit "go into this power state" command.

James
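The rule Alan states above, and which James says he could accept inside
sd_suspend, reduces to a single predicate. What follows is only a
minimal sketch of that rule, not code from sd.c: the names suspend_kind
and sync_cache_needed are hypothetical, and how the driver would learn
that power is about to be removed is left open, as it is in the thread.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; nothing here is taken from sd.c. */
enum suspend_kind {
	SUSPEND_RUNTIME,	/* runtime PM transition */
	SUSPEND_SYSTEM		/* system sleep, e.g. S3 */
};

/*
 * The flush may be skipped only when BOTH conditions hold:
 *   (1) this is a runtime suspend, and
 *   (2) the drive is not going to be powered down.
 * If either fails, the cache must be synchronized first.
 */
bool sync_cache_needed(enum suspend_kind kind, bool power_will_be_cut)
{
	return kind != SUSPEND_RUNTIME || power_will_be_cut;
}
```

Keeping the decision in one place addresses the correctness worry raised
in this message: a path that removes power cannot lose its flush merely
because a neighbouring path no longer needs one.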
On Fri, 2012-09-14 at 13:20 +0800, Aaron Lu wrote:
> On Thu, Sep 13, 2012 at 10:26:44AM +0100, James Bottomley wrote:
> > On Thu, 2012-09-13 at 17:07 +0800, Aaron Lu wrote:
> > > So I think this is basically 2 things, one is the runtime suspend of the
> > > disk, another is when it is runtime suspended, how to remove its power.
> > > I'm currently doing the latter one, which is simpler, so I want to do it
> > > first :-)
> >
> > Well, I don't like the way the interaction of the patches is going.
> > You're the one proposing powering down the device outside of the
> > standards defined transitions, so you need to be responsible for the
> > actions that necessitates, including synchronizing the cache. The specs
>
> OK, I'll update the code.
>
> > (SPC-4) say that cache management is explicitly unnecessary for the
> > standard SCSI power states (Active, Idle, Standby and Stopped), so
>
> Just read the SPC-4 spec, in section 5.12.3, it has words like this:
>
> Logical units that contain cache memory shall write all cached data to
> the medium for the logical unit(e.g., as a logical unit would do in
> response to a SYNCHRONIZE CACHE command as described SBC-3) prior to
> entering into any power condition that prevents accessing the
> media(e.g., before a hard drive stops its spindle motor during a change
> to the standby power condition).
>
> So this looks like cache needs to be synced before the device enters
> standby/stopped power condition. Or do I miss something?

Um, no, it says the device shall do the sync on its own (as though it
had received a sync cache). That section says the device shall be
responsible for cache management in the power states.

> > someone at some point is going to read that and remove the unnecessary
> > cache sync in the code. When that happens, you'll start getting data
> > loss.
>
> Indeed, I'll make sure cache gets synced when we are to power off the
> device. Thanks for the reminder.

Great, thanks.

James
On 09/14/2012 04:17 PM, James Bottomley wrote:
>> Just read the SPC-4 spec, in section 5.12.3, it has words like this:
>>
>> Logical units that contain cache memory shall write all cached data to
>> the medium for the logical unit(e.g., as a logical unit would do in
>> response to a SYNCHRONIZE CACHE command as described SBC-3) prior to
>> entering into any power condition that prevents accessing the
>> media(e.g., before a hard drive stops its spindle motor during a change
>> to the standby power condition).
>>
>> So this looks like cache needs to be synced before the device enters
>> standby/stopped power condition. Or do I miss something?
>
> Um, no it says the device shall do the sync on its own (as though it
> received a sync cache). That section says the device shall be
> responsible for cache management in the power states.

Oh, I thought it was the host software's responsibility, thanks for the
explanation.

So if we program the device to enter the standby/stopped power
condition with the start_stop_unit command, do we need to sync the
cache?

Thanks,
Aaron
On Fri, 2012-09-14 at 16:48 +0800, Aaron Lu wrote:
> On 09/14/2012 04:17 PM, James Bottomley wrote:
> >> Just read the SPC-4 spec, in section 5.12.3, it has words like this:
> >>
> >> Logical units that contain cache memory shall write all cached data to
> >> the medium for the logical unit(e.g., as a logical unit would do in
> >> response to a SYNCHRONIZE CACHE command as described SBC-3) prior to
> >> entering into any power condition that prevents accessing the
> >> media(e.g., before a hard drive stops its spindle motor during a change
> >> to the standby power condition).
> >>
> >> So this looks like cache needs to be synced before the device enters
> >> standby/stopped power condition. Or do I miss something?
> >
> > Um, no it says the device shall do the sync on its own (as though it
> > received a sync cache). That section says the device shall be
> > responsible for cache management in the power states.
>
> Oh, I thought it was the host software's responsibility, thanks for the
> explanation.
>
> So if we program the device to let it enter standby/stopped power
> condition with the start_stop_unit command, do we need to sync the
> cache?

No, that's what the spec says. The device must manage the cache in both
the forced (start stop unit) and timed (power control mode page) cases.

The reason is that the spec doesn't define what idle and standby
actually mean (just that they're "lower" power states). So the device
implementers get to choose whether they stop the platter or power off
the motor. The spec just means that if they do anything that endangers
data in the cache, they have to deal with it themselves.

James
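For readers following along, the two commands being contrasted here
differ only in a handful of CDB bytes. The sketch below builds both,
assuming the START STOP UNIT layout as it is commonly documented for
SBC-3 (opcode 0x1b, POWER CONDITION in the upper four bits of byte 4)
and a SYNCHRONIZE CACHE(10) with an all-zero LBA and block count. The
function and enum names are made up for illustration, and the field
positions should be checked against the standard before relying on
them.

```c
#include <stdint.h>
#include <string.h>

#define START_STOP_UNIT_OPCODE		0x1b
#define SYNCHRONIZE_CACHE_10_OPCODE	0x35

/* POWER CONDITION field values (assumed from SBC-3; partial list). */
enum ssu_power_condition {
	SSU_PC_START_VALID = 0x0,	/* honour the START bit instead */
	SSU_PC_ACTIVE      = 0x1,
	SSU_PC_IDLE        = 0x2,
	SSU_PC_STANDBY     = 0x3
};

/* Fill a 6-byte CDB that forces the unit into the given power condition. */
void build_start_stop_unit(uint8_t cdb[6], enum ssu_power_condition pc)
{
	memset(cdb, 0, 6);
	cdb[0] = START_STOP_UNIT_OPCODE;
	/* POWER CONDITION occupies bits 7..4 of byte 4. */
	cdb[4] = (uint8_t)(((unsigned int)pc & 0x0f) << 4);
}

/* Fill a 10-byte CDB asking the device to flush its whole cache. */
void build_synchronize_cache(uint8_t cdb[10])
{
	memset(cdb, 0, 10);
	cdb[0] = SYNCHRONIZE_CACHE_10_OPCODE;
	/* LBA = 0 and NUMBER OF BLOCKS = 0 are left zeroed. */
}
```

Per the explanation in this message, a host that sends the first
command does not need to send the second beforehand; the explicit flush
only becomes the host's job when power is removed outside the standard
power conditions.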
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 4df73e5..de786cf 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2638,6 +2638,7 @@ static void sd_probe_async(void *data, async_cookie_t cookie)
 
 	sd_printk(KERN_NOTICE, sdkp, "Attached SCSI %sdisk\n",
 		  sdp->removable ? "removable " : "");
+	sdp->ready_to_power_off = 1;
 	scsi_autopm_put_device(sdp);
 	put_device(&sdkp->dev);
 }
The ready_to_power_off flag is used to give indication to ATA layer
if this device's power can be removed when runtime suspended.

This flag is determined by individual SCSI driver like sr, sd.

This flag is introduced to support zero power ODD. When ODD
is runtime suspended, it may not be OK to remove its power.

But for disk, it is always OK to be powered off, so set this flag.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 drivers/scsi/sd.c | 1 +
 1 file changed, 1 insertion(+)
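As an illustration of how a lower layer might consult the flag this
patch sets, here is a minimal sketch. It assumes the flag is a one-bit
field; struct demo_scsi_device is a stand-in for the single member
involved, and demo_can_cut_power is a hypothetical helper, not the ATA
layer's actual code.

```c
#include <stdbool.h>

/* Stand-in for the one member of the SCSI device that matters here. */
struct demo_scsi_device {
	unsigned int ready_to_power_off:1;	/* set by sd, decided per case by sr */
};

/*
 * Power may be removed only once the device is runtime suspended AND
 * its upper-level driver has declared that removing power is safe.
 */
bool demo_can_cut_power(const struct demo_scsi_device *sdev,
			bool runtime_suspended)
{
	return runtime_suspended && sdev->ready_to_power_off;
}
```

With this shape, a disk that always sets the flag becomes eligible as
soon as it is runtime suspended, while an ODD stays powered until its
driver sets the flag.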