Message ID | 2314745.iZASKD2KPV@rjwysocki.net (mailing list archive) |
---|---|
Headers | show |
Series | PM: Make the core and pm_runtime_force_suspend/resume() agree more | expand |
On Tue, 11 Feb 2025 at 22:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > Hi Everyone, > > This series is a result of the discussion on a recently reported issue > with device runtime PM status propagation during system resume and > the resulting patches: > > https://lore.kernel.org/linux-pm/12619233.O9o76ZdvQC@rjwysocki.net/ > https://lore.kernel.org/linux-pm/6137505.lOV4Wx5bFT@rjwysocki.net/ > > Overall, due to restrictions related to pm_runtime_force_suspend() and > pm_runtime_force_resume(), it was necessary to limit the RPM_ACTIVE > setting propagation to the parent of the first device in a dependency > chain that turned out to have to be resumed during system resume even > though it was runtime-suspended before system suspend. > > Those restrictions are that (1) pm_runtime_force_suspend() attempts to > suspend devices that have never had runtime PM enabled if their runtime > PM status is currently RPM_ACTIVE and (2) pm_runtime_force_resume() > will skip device whose runtime PM status is currently RPM_ACTIVE. > > The purpose of this series is to eliminate the above restrictions and > get pm_runtime_force_suspend() and pm_runtime_force_resume() to agree > more with what the core does. For my understanding, would you mind elaborating a bit more around the end-goal with this? Are you trying to make adaptations for pm_runtime_force_suspend|resume() and the PM core, such that drivers that uses pm_runtime_force_suspend|resume() should be able to cope with other drivers for child-devices that make use of DPM_FLAG_SMART_SUSPEND? If we can make this work, it would enable the propagation of RPM_ACTIVE in the PM core for more devices, but still not for all, right? The point is, the other bigger issue that I pointed out in our earlier discussions; all those devices where their drivers/buses don't cope with the behaviour of the DPM_FLAG_SMART_SUSPEND flag, will prevent the PM core from *unconditionally* propagating the RPM_ACTIVE to parents. I guess this is the best we can do then? > > First off, it turns out that detecting devices that have never had > runtime PM enabled is not really hard - it is sufficient to check > their power.last_status data when runtime PM is disabled. If > power.last_status is RPM_INVALID at that point, runtime PM has never > been enabled for the given device, so patch [01/10] adds a helper > function for checking that. > > Patch [02/10] makes the PM core use the new function to avoid setting > power.set_active for devices with no runtime PM support which really > is a fixup on top of > > https://lore.kernel.org/linux-pm/6137505.lOV4Wx5bFT@rjwysocki.net/ > > Patch [03/10] modifies pm_runtime_force_suspend() to skip devices > with no runtime PM support with the help of the new function. > > Next, patch [04/10] uses the observation that the runtime PM status > check in pm_runtime_force_resume() is redundant and drops that check. > > Patch [05/10] removes the wakeirq enabling from the pm_runtime_force_resume() > error path because it is not really a good idea to enable wakeirqs during > system resume. > > Patch [06/10] makes the PM core somewhat more consistent with > pm_runtime_force_suspend() and patch [07/10] prepares it for the subsequent > changes. > > Patch [08/10] changes pm_runtime_force_resume() to handle the case in > which the runtime PM status of the device has been updated by the core to > RPM_ACTIVE after pm_runtime_force_suspend() left it in RPM_SUSPENDED. > > Patch [09/10] restores the RPM_ACTIVE setting propagation to parents > and suppliers, but it takes exceptions into account (for example, devices > with no runtime PM support). > > Finally, patch [10/10] adds a mechanism to discover cases in which runtime PM > is disabled for a device permanently even though it has been enabled for that > device at one point. > > Please have a look and let me know if you see any problems. > > Thanks! Kind regards Uffe
On Wed, Feb 12, 2025 at 10:12 AM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > On Tue, 11 Feb 2025 at 22:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > > > Hi Everyone, > > > > This series is a result of the discussion on a recently reported issue > > with device runtime PM status propagation during system resume and > > the resulting patches: > > > > https://lore.kernel.org/linux-pm/12619233.O9o76ZdvQC@rjwysocki.net/ > > https://lore.kernel.org/linux-pm/6137505.lOV4Wx5bFT@rjwysocki.net/ > > > > Overall, due to restrictions related to pm_runtime_force_suspend() and > > pm_runtime_force_resume(), it was necessary to limit the RPM_ACTIVE > > setting propagation to the parent of the first device in a dependency > > chain that turned out to have to be resumed during system resume even > > though it was runtime-suspended before system suspend. > > > > Those restrictions are that (1) pm_runtime_force_suspend() attempts to > > suspend devices that have never had runtime PM enabled if their runtime > > PM status is currently RPM_ACTIVE and (2) pm_runtime_force_resume() > > will skip device whose runtime PM status is currently RPM_ACTIVE. > > > > The purpose of this series is to eliminate the above restrictions and > > get pm_runtime_force_suspend() and pm_runtime_force_resume() to agree > > more with what the core does. > > For my understanding, would you mind elaborating a bit more around the > end-goal with this? The end goal is, of course, full integration of runtime PM with system sleep for all devices. Eventually. And it is necessary to make the core and pm_runtime_force_suspend|resume() work together better for this purpose. > Are you trying to make adaptations for > pm_runtime_force_suspend|resume() and the PM core, such that drivers > that uses pm_runtime_force_suspend|resume() should be able to cope > with other drivers for child-devices that make use of > DPM_FLAG_SMART_SUSPEND? Yes. This is a more general case, though, when a device that was runtime-suspended before system suspend and is left in suspend during it, needs to be resumed during the system resume that follows. Currently, DPM_FLAG_SMART_SUSPEND can lead to this sometimes and it cannot happen otherwise, but I think that it is a generally valid case. > If we can make this work, it would enable the propagation of > RPM_ACTIVE in the PM core for more devices, but still not for all, > right? It is all until there is a known case in which it isn't. So either you know a specific case in which it doesn't work, or I don't see a reason for avoiding it. ATM the only specific case in which it doesn't work is when pm_runtime_force_suspend|resume() are used. > The point is, the other bigger issue that I pointed out in our earlier > discussions; all those devices where their drivers/buses don't cope > with the behaviour of the DPM_FLAG_SMART_SUSPEND flag, will prevent > the PM core from *unconditionally* propagating the RPM_ACTIVE to > parents. I guess this is the best we can do then? OK, what are they? You keep saying that they exist without giving any examples.
On Wed, Feb 12, 2025 at 11:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Wed, Feb 12, 2025 at 10:12 AM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > On Tue, 11 Feb 2025 at 22:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > > > > > Hi Everyone, > > > > > > This series is a result of the discussion on a recently reported issue > > > with device runtime PM status propagation during system resume and > > > the resulting patches: > > > > > > https://lore.kernel.org/linux-pm/12619233.O9o76ZdvQC@rjwysocki.net/ > > > https://lore.kernel.org/linux-pm/6137505.lOV4Wx5bFT@rjwysocki.net/ > > > > > > Overall, due to restrictions related to pm_runtime_force_suspend() and > > > pm_runtime_force_resume(), it was necessary to limit the RPM_ACTIVE > > > setting propagation to the parent of the first device in a dependency > > > chain that turned out to have to be resumed during system resume even > > > though it was runtime-suspended before system suspend. > > > > > > Those restrictions are that (1) pm_runtime_force_suspend() attempts to > > > suspend devices that have never had runtime PM enabled if their runtime > > > PM status is currently RPM_ACTIVE and (2) pm_runtime_force_resume() > > > will skip device whose runtime PM status is currently RPM_ACTIVE. > > > > > > The purpose of this series is to eliminate the above restrictions and > > > get pm_runtime_force_suspend() and pm_runtime_force_resume() to agree > > > more with what the core does. > > > > For my understanding, would you mind elaborating a bit more around the > > end-goal with this? > > The end goal is, of course, full integration of runtime PM with system > sleep for all devices. Eventually. > > And it is necessary to make the core and > pm_runtime_force_suspend|resume() work together better for this > purpose. > > > Are you trying to make adaptations for > > pm_runtime_force_suspend|resume() and the PM core, such that drivers > > that uses pm_runtime_force_suspend|resume() should be able to cope > > with other drivers for child-devices that make use of > > DPM_FLAG_SMART_SUSPEND? > > Yes. > > This is a more general case, though, when a device that was > runtime-suspended before system suspend and is left in suspend during > it, needs to be resumed during the system resume that follows. > > Currently, DPM_FLAG_SMART_SUSPEND can lead to this sometimes and it > cannot happen otherwise, but I think that it is a generally valid > case. > > > If we can make this work, it would enable the propagation of > > RPM_ACTIVE in the PM core for more devices, but still not for all, > > right? > > It is all until there is a known case in which it isn't. So either > you know a specific case in which it doesn't work, or I don't see a > reason for avoiding it. > > ATM the only specific case in which it doesn't work is when > pm_runtime_force_suspend|resume() are used. > > > The point is, the other bigger issue that I pointed out in our earlier > > discussions; all those devices where their drivers/buses don't cope > > with the behaviour of the DPM_FLAG_SMART_SUSPEND flag, will prevent > > the PM core from *unconditionally* propagating the RPM_ACTIVE to > > parents. I guess this is the best we can do then? > > OK, what are they? > > You keep saying that they exist without giving any examples. To put it more bluntly, I'm not aware of any place other than pm_runtime_force_suspend|resume() that can be confused by changing the runtime PM status of a device with runtime PM disabled (either permanently, or transiently during a system suspend-resume cycle) to RPM_ACTIVE, so if there are any such places, I would appreciate letting me know what they are. Note that rpm_active() could start producing confusing results if the runtime PM status of a device with runtime PM disabled is changed from RPM_ACTIVE to RPM_SUSPENDED because it will then start to return -EACCES instead of 1, but changing the status to RPM_ACTIVE will not confuse it the same way.
On Wed, Feb 12, 2025 at 12:33 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Wed, Feb 12, 2025 at 11:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > On Wed, Feb 12, 2025 at 10:12 AM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > > On Tue, 11 Feb 2025 at 22:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > > > > > > > Hi Everyone, > > > > > > > > This series is a result of the discussion on a recently reported issue > > > > with device runtime PM status propagation during system resume and > > > > the resulting patches: > > > > > > > > https://lore.kernel.org/linux-pm/12619233.O9o76ZdvQC@rjwysocki.net/ > > > > https://lore.kernel.org/linux-pm/6137505.lOV4Wx5bFT@rjwysocki.net/ > > > > > > > > Overall, due to restrictions related to pm_runtime_force_suspend() and > > > > pm_runtime_force_resume(), it was necessary to limit the RPM_ACTIVE > > > > setting propagation to the parent of the first device in a dependency > > > > chain that turned out to have to be resumed during system resume even > > > > though it was runtime-suspended before system suspend. > > > > > > > > Those restrictions are that (1) pm_runtime_force_suspend() attempts to > > > > suspend devices that have never had runtime PM enabled if their runtime > > > > PM status is currently RPM_ACTIVE and (2) pm_runtime_force_resume() > > > > will skip device whose runtime PM status is currently RPM_ACTIVE. > > > > > > > > The purpose of this series is to eliminate the above restrictions and > > > > get pm_runtime_force_suspend() and pm_runtime_force_resume() to agree > > > > more with what the core does. > > > > > > For my understanding, would you mind elaborating a bit more around the > > > end-goal with this? > > > > The end goal is, of course, full integration of runtime PM with system > > sleep for all devices. Eventually. > > > > And it is necessary to make the core and > > pm_runtime_force_suspend|resume() work together better for this > > purpose. > > > > > Are you trying to make adaptations for > > > pm_runtime_force_suspend|resume() and the PM core, such that drivers > > > that uses pm_runtime_force_suspend|resume() should be able to cope > > > with other drivers for child-devices that make use of > > > DPM_FLAG_SMART_SUSPEND? > > > > Yes. > > > > This is a more general case, though, when a device that was > > runtime-suspended before system suspend and is left in suspend during > > it, needs to be resumed during the system resume that follows. > > > > Currently, DPM_FLAG_SMART_SUSPEND can lead to this sometimes and it > > cannot happen otherwise, but I think that it is a generally valid > > case. > > > > > If we can make this work, it would enable the propagation of > > > RPM_ACTIVE in the PM core for more devices, but still not for all, > > > right? > > > > It is all until there is a known case in which it isn't. So either > > you know a specific case in which it doesn't work, or I don't see a > > reason for avoiding it. > > > > ATM the only specific case in which it doesn't work is when > > pm_runtime_force_suspend|resume() are used. > > > > > The point is, the other bigger issue that I pointed out in our earlier > > > discussions; all those devices where their drivers/buses don't cope > > > with the behaviour of the DPM_FLAG_SMART_SUSPEND flag, will prevent > > > the PM core from *unconditionally* propagating the RPM_ACTIVE to > > > parents. I guess this is the best we can do then? > > > > OK, what are they? > > > > You keep saying that they exist without giving any examples. > > To put it more bluntly, I'm not aware of any place other than > pm_runtime_force_suspend|resume() that can be confused by changing the > runtime PM status of a device with runtime PM disabled (either > permanently, or transiently during a system suspend-resume cycle) to > RPM_ACTIVE, so if there are any such places, I would appreciate > letting me know what they are. > > Note that rpm_active() could start producing confusing results if the rpm_resume() rather, sorry. > runtime PM status of a device with runtime PM disabled is changed from > RPM_ACTIVE to RPM_SUSPENDED because it will then start to return > -EACCES instead of 1, but changing the status to RPM_ACTIVE will not > confuse it the same way.
On Wed, 12 Feb 2025 at 12:33, Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Wed, Feb 12, 2025 at 11:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > On Wed, Feb 12, 2025 at 10:12 AM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > > On Tue, 11 Feb 2025 at 22:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > > > > > > > Hi Everyone, > > > > > > > > This series is a result of the discussion on a recently reported issue > > > > with device runtime PM status propagation during system resume and > > > > the resulting patches: > > > > > > > > https://lore.kernel.org/linux-pm/12619233.O9o76ZdvQC@rjwysocki.net/ > > > > https://lore.kernel.org/linux-pm/6137505.lOV4Wx5bFT@rjwysocki.net/ > > > > > > > > Overall, due to restrictions related to pm_runtime_force_suspend() and > > > > pm_runtime_force_resume(), it was necessary to limit the RPM_ACTIVE > > > > setting propagation to the parent of the first device in a dependency > > > > chain that turned out to have to be resumed during system resume even > > > > though it was runtime-suspended before system suspend. > > > > > > > > Those restrictions are that (1) pm_runtime_force_suspend() attempts to > > > > suspend devices that have never had runtime PM enabled if their runtime > > > > PM status is currently RPM_ACTIVE and (2) pm_runtime_force_resume() > > > > will skip device whose runtime PM status is currently RPM_ACTIVE. > > > > > > > > The purpose of this series is to eliminate the above restrictions and > > > > get pm_runtime_force_suspend() and pm_runtime_force_resume() to agree > > > > more with what the core does. > > > > > > For my understanding, would you mind elaborating a bit more around the > > > end-goal with this? > > > > The end goal is, of course, full integration of runtime PM with system > > sleep for all devices. Eventually. > > > > And it is necessary to make the core and > > pm_runtime_force_suspend|resume() work together better for this > > purpose. > > > > > Are you trying to make adaptations for > > > pm_runtime_force_suspend|resume() and the PM core, such that drivers > > > that uses pm_runtime_force_suspend|resume() should be able to cope > > > with other drivers for child-devices that make use of > > > DPM_FLAG_SMART_SUSPEND? > > > > Yes. > > > > This is a more general case, though, when a device that was > > runtime-suspended before system suspend and is left in suspend during > > it, needs to be resumed during the system resume that follows. > > > > Currently, DPM_FLAG_SMART_SUSPEND can lead to this sometimes and it > > cannot happen otherwise, but I think that it is a generally valid > > case. > > > > > If we can make this work, it would enable the propagation of > > > RPM_ACTIVE in the PM core for more devices, but still not for all, > > > right? > > > > It is all until there is a known case in which it isn't. So either > > you know a specific case in which it doesn't work, or I don't see a > > reason for avoiding it. > > > > ATM the only specific case in which it doesn't work is when > > pm_runtime_force_suspend|resume() are used. > > > > > The point is, the other bigger issue that I pointed out in our earlier > > > discussions; all those devices where their drivers/buses don't cope > > > with the behaviour of the DPM_FLAG_SMART_SUSPEND flag, will prevent > > > the PM core from *unconditionally* propagating the RPM_ACTIVE to > > > parents. I guess this is the best we can do then? > > > > OK, what are they? > > > > You keep saying that they exist without giving any examples. > > To put it more bluntly, I'm not aware of any place other than > pm_runtime_force_suspend|resume() that can be confused by changing the > runtime PM status of a device with runtime PM disabled (either > permanently, or transiently during a system suspend-resume cycle) to > RPM_ACTIVE, so if there are any such places, I would appreciate > letting me know what they are. Well, sorry I thought you were aware. Anyway, I believe you need to do your own investigation as it's simply too time consuming for me to find them all. The problem is that it's not just a simple pattern to search for, so we would need some clever scripting to move forward to find them. To start with, we should look for drivers that enable runtime PM, by calling pm_runtime_enable(). Additionally, in their system suspend callback they should typically *not* use pm_runtime_suspended(), pm_runtime_status_suspended() or pm_runtime_active() as that is usually (but no always) indicating that they got it right. Then there are those that don't have system suspend/resume callbacks assigned at all (or uses some other subsystem specific callback for this), but only uses runtime PM. On top of that, drivers that makes use of pm_runtime_force_suspend|resume() should be disregarded, which has reached beyond 300 as this point. Anyway, here is just one example that I found from a quick search. drivers/i2c/busses/i2c-qcom-geni.c > > Note that rpm_active() could start producing confusing results if the > runtime PM status of a device with runtime PM disabled is changed from > RPM_ACTIVE to RPM_SUSPENDED because it will then start to return > -EACCES instead of 1, but changing the status to RPM_ACTIVE will not > confuse it the same way. Trust me, it will cause problems. Drivers may call pm_runtime_get_sync() to turn on the resources for the device after the system has resumed, when runtime PM has been re-enabled for the device by the PM core. Those calls to pm_runtime_get_sync() will then not end up invoking any if ->runtime_resume() callbacks for the device since its state is already RPM_ACTIVE. Hence, the device will remain in a low power state even if the driver believes it has been powered on. In many cases, accessing the device (like reading/writing a register) will often just just hang in these cases, but in worst cases we could end up getting even more difficult bugs to debug. Kind regards Uffe
On Wed, Feb 12, 2025 at 4:15 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > On Wed, 12 Feb 2025 at 12:33, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > On Wed, Feb 12, 2025 at 11:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > On Wed, Feb 12, 2025 at 10:12 AM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > > > > On Tue, 11 Feb 2025 at 22:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > > > > > > > > > Hi Everyone, > > > > > > > > > > This series is a result of the discussion on a recently reported issue > > > > > with device runtime PM status propagation during system resume and > > > > > the resulting patches: > > > > > > > > > > https://lore.kernel.org/linux-pm/12619233.O9o76ZdvQC@rjwysocki.net/ > > > > > https://lore.kernel.org/linux-pm/6137505.lOV4Wx5bFT@rjwysocki.net/ > > > > > > > > > > Overall, due to restrictions related to pm_runtime_force_suspend() and > > > > > pm_runtime_force_resume(), it was necessary to limit the RPM_ACTIVE > > > > > setting propagation to the parent of the first device in a dependency > > > > > chain that turned out to have to be resumed during system resume even > > > > > though it was runtime-suspended before system suspend. > > > > > > > > > > Those restrictions are that (1) pm_runtime_force_suspend() attempts to > > > > > suspend devices that have never had runtime PM enabled if their runtime > > > > > PM status is currently RPM_ACTIVE and (2) pm_runtime_force_resume() > > > > > will skip device whose runtime PM status is currently RPM_ACTIVE. > > > > > > > > > > The purpose of this series is to eliminate the above restrictions and > > > > > get pm_runtime_force_suspend() and pm_runtime_force_resume() to agree > > > > > more with what the core does. > > > > > > > > For my understanding, would you mind elaborating a bit more around the > > > > end-goal with this? > > > > > > The end goal is, of course, full integration of runtime PM with system > > > sleep for all devices. Eventually. > > > > > > And it is necessary to make the core and > > > pm_runtime_force_suspend|resume() work together better for this > > > purpose. > > > > > > > Are you trying to make adaptations for > > > > pm_runtime_force_suspend|resume() and the PM core, such that drivers > > > > that uses pm_runtime_force_suspend|resume() should be able to cope > > > > with other drivers for child-devices that make use of > > > > DPM_FLAG_SMART_SUSPEND? > > > > > > Yes. > > > > > > This is a more general case, though, when a device that was > > > runtime-suspended before system suspend and is left in suspend during > > > it, needs to be resumed during the system resume that follows. > > > > > > Currently, DPM_FLAG_SMART_SUSPEND can lead to this sometimes and it > > > cannot happen otherwise, but I think that it is a generally valid > > > case. > > > > > > > If we can make this work, it would enable the propagation of > > > > RPM_ACTIVE in the PM core for more devices, but still not for all, > > > > right? > > > > > > It is all until there is a known case in which it isn't. So either > > > you know a specific case in which it doesn't work, or I don't see a > > > reason for avoiding it. > > > > > > ATM the only specific case in which it doesn't work is when > > > pm_runtime_force_suspend|resume() are used. > > > > > > > The point is, the other bigger issue that I pointed out in our earlier > > > > discussions; all those devices where their drivers/buses don't cope > > > > with the behaviour of the DPM_FLAG_SMART_SUSPEND flag, will prevent > > > > the PM core from *unconditionally* propagating the RPM_ACTIVE to > > > > parents. I guess this is the best we can do then? > > > > > > OK, what are they? > > > > > > You keep saying that they exist without giving any examples. > > > > To put it more bluntly, I'm not aware of any place other than > > pm_runtime_force_suspend|resume() that can be confused by changing the > > runtime PM status of a device with runtime PM disabled (either > > permanently, or transiently during a system suspend-resume cycle) to > > RPM_ACTIVE, so if there are any such places, I would appreciate > > letting me know what they are. > > Well, sorry I thought you were aware. Anyway, I believe you need to do > your own investigation as it's simply too time consuming for me to > find them all. The problem is that it's not just a simple pattern to > search for, so we would need some clever scripting to move forward to > find them. > > To start with, we should look for drivers that enable runtime PM, by > calling pm_runtime_enable(). > > Additionally, in their system suspend callback they should typically > *not* use pm_runtime_suspended(), pm_runtime_status_suspended() or > pm_runtime_active() as that is usually (but no always) indicating that > they got it right. Then there are those that don't have system > suspend/resume callbacks assigned at all (or uses some other subsystem > specific callback for this), but only uses runtime PM. > > On top of that, drivers that makes use of > pm_runtime_force_suspend|resume() should be disregarded, which has > reached beyond 300 as this point. > > Anyway, here is just one example that I found from a quick search. > drivers/i2c/busses/i2c-qcom-geni.c OK, I see. It sets the status to RPM_SUSPENDED in geni_i2c_suspend_noirq(), if not suspended already, and assumes it to stay this way across geni_i2c_resume_noirq() until someone resumes it via runtime PM. Fair enough, but somebody should tell them that they don't need to use pm_runtime_disable/enable() in _noirq. > > > > Note that rpm_active() could start producing confusing results if the > > runtime PM status of a device with runtime PM disabled is changed from > > RPM_ACTIVE to RPM_SUSPENDED because it will then start to return > > -EACCES instead of 1, but changing the status to RPM_ACTIVE will not > > confuse it the same way. > > Trust me, it will cause problems. > > Drivers may call pm_runtime_get_sync() to turn on the resources for > the device after the system has resumed, when runtime PM has been > re-enabled for the device by the PM core. > > Those calls to pm_runtime_get_sync() will then not end up invoking any > if ->runtime_resume() callbacks for the device since its state is > already RPM_ACTIVE. Hence, the device will remain in a low power state > even if the driver believes it has been powered on. In many cases, > accessing the device (like reading/writing a register) will often just > just hang in these cases, but in worst cases we could end up getting > even more difficult bugs to debug. Sure, that would be a problem. I think I need to find a different way to address the problem I'm seeing, that is to resume devices that were runtime-suspended before system suspend along with their superiors. One way to do it would be to just defer their resume to the point when the core has enabled runtime PM for them, which means that it has also enabled runtime PM for all of their superiors, and then call pm_runtime_resume(). This should work unless one of the superiors has runtime PM disabled at that point, of course.
On Wed, Feb 12, 2025 at 6:05 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Wed, Feb 12, 2025 at 4:15 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > On Wed, 12 Feb 2025 at 12:33, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > On Wed, Feb 12, 2025 at 11:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > > > On Wed, Feb 12, 2025 at 10:12 AM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > > > > > > On Tue, 11 Feb 2025 at 22:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > > > > > > > > > > > Hi Everyone, > > > > > > > > > > > > This series is a result of the discussion on a recently reported issue > > > > > > with device runtime PM status propagation during system resume and > > > > > > the resulting patches: > > > > > > > > > > > > https://lore.kernel.org/linux-pm/12619233.O9o76ZdvQC@rjwysocki.net/ > > > > > > https://lore.kernel.org/linux-pm/6137505.lOV4Wx5bFT@rjwysocki.net/ > > > > > > > > > > > > Overall, due to restrictions related to pm_runtime_force_suspend() and > > > > > > pm_runtime_force_resume(), it was necessary to limit the RPM_ACTIVE > > > > > > setting propagation to the parent of the first device in a dependency > > > > > > chain that turned out to have to be resumed during system resume even > > > > > > though it was runtime-suspended before system suspend. > > > > > > > > > > > > Those restrictions are that (1) pm_runtime_force_suspend() attempts to > > > > > > suspend devices that have never had runtime PM enabled if their runtime > > > > > > PM status is currently RPM_ACTIVE and (2) pm_runtime_force_resume() > > > > > > will skip device whose runtime PM status is currently RPM_ACTIVE. > > > > > > > > > > > > The purpose of this series is to eliminate the above restrictions and > > > > > > get pm_runtime_force_suspend() and pm_runtime_force_resume() to agree > > > > > > more with what the core does. > > > > > > > > > > For my understanding, would you mind elaborating a bit more around the > > > > > end-goal with this? > > > > > > > > The end goal is, of course, full integration of runtime PM with system > > > > sleep for all devices. Eventually. > > > > > > > > And it is necessary to make the core and > > > > pm_runtime_force_suspend|resume() work together better for this > > > > purpose. > > > > > > > > > Are you trying to make adaptations for > > > > > pm_runtime_force_suspend|resume() and the PM core, such that drivers > > > > > that uses pm_runtime_force_suspend|resume() should be able to cope > > > > > with other drivers for child-devices that make use of > > > > > DPM_FLAG_SMART_SUSPEND? > > > > > > > > Yes. > > > > > > > > This is a more general case, though, when a device that was > > > > runtime-suspended before system suspend and is left in suspend during > > > > it, needs to be resumed during the system resume that follows. > > > > > > > > Currently, DPM_FLAG_SMART_SUSPEND can lead to this sometimes and it > > > > cannot happen otherwise, but I think that it is a generally valid > > > > case. > > > > > > > > > If we can make this work, it would enable the propagation of > > > > > RPM_ACTIVE in the PM core for more devices, but still not for all, > > > > > right? > > > > > > > > It is all until there is a known case in which it isn't. So either > > > > you know a specific case in which it doesn't work, or I don't see a > > > > reason for avoiding it. > > > > > > > > ATM the only specific case in which it doesn't work is when > > > > pm_runtime_force_suspend|resume() are used. > > > > > > > > > The point is, the other bigger issue that I pointed out in our earlier > > > > > discussions; all those devices where their drivers/buses don't cope > > > > > with the behaviour of the DPM_FLAG_SMART_SUSPEND flag, will prevent > > > > > the PM core from *unconditionally* propagating the RPM_ACTIVE to > > > > > parents. I guess this is the best we can do then? > > > > > > > > OK, what are they? > > > > > > > > You keep saying that they exist without giving any examples. > > > > > > To put it more bluntly, I'm not aware of any place other than > > > pm_runtime_force_suspend|resume() that can be confused by changing the > > > runtime PM status of a device with runtime PM disabled (either > > > permanently, or transiently during a system suspend-resume cycle) to > > > RPM_ACTIVE, so if there are any such places, I would appreciate > > > letting me know what they are. > > > > Well, sorry I thought you were aware. Anyway, I believe you need to do > > your own investigation as it's simply too time consuming for me to > > find them all. The problem is that it's not just a simple pattern to > > search for, so we would need some clever scripting to move forward to > > find them. > > > > To start with, we should look for drivers that enable runtime PM, by > > calling pm_runtime_enable(). > > > > Additionally, in their system suspend callback they should typically > > *not* use pm_runtime_suspended(), pm_runtime_status_suspended() or > > pm_runtime_active() as that is usually (but no always) indicating that > > they got it right. Then there are those that don't have system > > suspend/resume callbacks assigned at all (or uses some other subsystem > > specific callback for this), but only uses runtime PM. > > > > On top of that, drivers that makes use of > > pm_runtime_force_suspend|resume() should be disregarded, which has > > reached beyond 300 as this point. > > > > Anyway, here is just one example that I found from a quick search. > > drivers/i2c/busses/i2c-qcom-geni.c > > OK, I see. > > It sets the status to RPM_SUSPENDED in geni_i2c_suspend_noirq(), if > not suspended already, and assumes it to stay this way across > geni_i2c_resume_noirq() until someone resumes it via runtime PM. > > Fair enough, but somebody should tell them that they don't need to use > pm_runtime_disable/enable() in _noirq. > > > > > > > Note that rpm_active() could start producing confusing results if the > > > runtime PM status of a device with runtime PM disabled is changed from > > > RPM_ACTIVE to RPM_SUSPENDED because it will then start to return > > > -EACCES instead of 1, but changing the status to RPM_ACTIVE will not > > > confuse it the same way. > > > > Trust me, it will cause problems. > > > > Drivers may call pm_runtime_get_sync() to turn on the resources for > > the device after the system has resumed, when runtime PM has been > > re-enabled for the device by the PM core. > > > > Those calls to pm_runtime_get_sync() will then not end up invoking any > > if ->runtime_resume() callbacks for the device since its state is > > already RPM_ACTIVE. Hence, the device will remain in a low power state > > even if the driver believes it has been powered on. In many cases, > > accessing the device (like reading/writing a register) will often just > > just hang in these cases, but in worst cases we could end up getting > > even more difficult bugs to debug. > > Sure, that would be a problem. We may be making a logical mistake here by assuming that any of these devices will end up in dependency chains starting at the devices that I'm concerned about. > I think I need to find a different way to address the problem I'm > seeing, that is to resume devices that were runtime-suspended before > system suspend along with their superiors. Well, maybe not. So the "smart suspend" thing can be done only if all of the devices above the given one in the dependency graph (including the parent, suppliers etc) either opt in for "smart suspend" or are devices without PM callbacks and the dependency graphs can be cut at devices that don't support runtime PM. This would require another flag in struct dev_pm_info for the tracking of dependencies, but it should be entirely doable because all of them are known at the device_prepare() time. That should avoid involving anyone who can be surprised.
On Wed, 12 Feb 2025 at 18:05, Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Wed, Feb 12, 2025 at 4:15 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > On Wed, 12 Feb 2025 at 12:33, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > On Wed, Feb 12, 2025 at 11:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > > > On Wed, Feb 12, 2025 at 10:12 AM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > > > > > > On Tue, 11 Feb 2025 at 22:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > > > > > > > > > > > Hi Everyone, > > > > > > > > > > > > This series is a result of the discussion on a recently reported issue > > > > > > with device runtime PM status propagation during system resume and > > > > > > the resulting patches: > > > > > > > > > > > > https://lore.kernel.org/linux-pm/12619233.O9o76ZdvQC@rjwysocki.net/ > > > > > > https://lore.kernel.org/linux-pm/6137505.lOV4Wx5bFT@rjwysocki.net/ > > > > > > > > > > > > Overall, due to restrictions related to pm_runtime_force_suspend() and > > > > > > pm_runtime_force_resume(), it was necessary to limit the RPM_ACTIVE > > > > > > setting propagation to the parent of the first device in a dependency > > > > > > chain that turned out to have to be resumed during system resume even > > > > > > though it was runtime-suspended before system suspend. > > > > > > > > > > > > Those restrictions are that (1) pm_runtime_force_suspend() attempts to > > > > > > suspend devices that have never had runtime PM enabled if their runtime > > > > > > PM status is currently RPM_ACTIVE and (2) pm_runtime_force_resume() > > > > > > will skip device whose runtime PM status is currently RPM_ACTIVE. > > > > > > > > > > > > The purpose of this series is to eliminate the above restrictions and > > > > > > get pm_runtime_force_suspend() and pm_runtime_force_resume() to agree > > > > > > more with what the core does. > > > > > > > > > > For my understanding, would you mind elaborating a bit more around the > > > > > end-goal with this? > > > > > > > > The end goal is, of course, full integration of runtime PM with system > > > > sleep for all devices. Eventually. > > > > > > > > And it is necessary to make the core and > > > > pm_runtime_force_suspend|resume() work together better for this > > > > purpose. > > > > > > > > > Are you trying to make adaptations for > > > > > pm_runtime_force_suspend|resume() and the PM core, such that drivers > > > > > that uses pm_runtime_force_suspend|resume() should be able to cope > > > > > with other drivers for child-devices that make use of > > > > > DPM_FLAG_SMART_SUSPEND? > > > > > > > > Yes. > > > > > > > > This is a more general case, though, when a device that was > > > > runtime-suspended before system suspend and is left in suspend during > > > > it, needs to be resumed during the system resume that follows. > > > > > > > > Currently, DPM_FLAG_SMART_SUSPEND can lead to this sometimes and it > > > > cannot happen otherwise, but I think that it is a generally valid > > > > case. > > > > > > > > > If we can make this work, it would enable the propagation of > > > > > RPM_ACTIVE in the PM core for more devices, but still not for all, > > > > > right? > > > > > > > > It is all until there is a known case in which it isn't. So either > > > > you know a specific case in which it doesn't work, or I don't see a > > > > reason for avoiding it. > > > > > > > > ATM the only specific case in which it doesn't work is when > > > > pm_runtime_force_suspend|resume() are used. > > > > > > > > > The point is, the other bigger issue that I pointed out in our earlier > > > > > discussions; all those devices where their drivers/buses don't cope > > > > > with the behaviour of the DPM_FLAG_SMART_SUSPEND flag, will prevent > > > > > the PM core from *unconditionally* propagating the RPM_ACTIVE to > > > > > parents. I guess this is the best we can do then? > > > > > > > > OK, what are they? > > > > > > > > You keep saying that they exist without giving any examples. > > > > > > To put it more bluntly, I'm not aware of any place other than > > > pm_runtime_force_suspend|resume() that can be confused by changing the > > > runtime PM status of a device with runtime PM disabled (either > > > permanently, or transiently during a system suspend-resume cycle) to > > > RPM_ACTIVE, so if there are any such places, I would appreciate > > > letting me know what they are. > > > > Well, sorry I thought you were aware. Anyway, I believe you need to do > > your own investigation as it's simply too time consuming for me to > > find them all. The problem is that it's not just a simple pattern to > > search for, so we would need some clever scripting to move forward to > > find them. > > > > To start with, we should look for drivers that enable runtime PM, by > > calling pm_runtime_enable(). > > > > Additionally, in their system suspend callback they should typically > > *not* use pm_runtime_suspended(), pm_runtime_status_suspended() or > > pm_runtime_active() as that is usually (but no always) indicating that > > they got it right. Then there are those that don't have system > > suspend/resume callbacks assigned at all (or uses some other subsystem > > specific callback for this), but only uses runtime PM. > > > > On top of that, drivers that makes use of > > pm_runtime_force_suspend|resume() should be disregarded, which has > > reached beyond 300 as this point. > > > > Anyway, here is just one example that I found from a quick search. > > drivers/i2c/busses/i2c-qcom-geni.c > > OK, I see. > > It sets the status to RPM_SUSPENDED in geni_i2c_suspend_noirq(), if > not suspended already, and assumes it to stay this way across > geni_i2c_resume_noirq() until someone resumes it via runtime PM. > > Fair enough, but somebody should tell them that they don't need to use > pm_runtime_disable/enable() in _noirq. > > > > > > > Note that rpm_active() could start producing confusing results if the > > > runtime PM status of a device with runtime PM disabled is changed from > > > RPM_ACTIVE to RPM_SUSPENDED because it will then start to return > > > -EACCES instead of 1, but changing the status to RPM_ACTIVE will not > > > confuse it the same way. > > > > Trust me, it will cause problems. > > > > Drivers may call pm_runtime_get_sync() to turn on the resources for > > the device after the system has resumed, when runtime PM has been > > re-enabled for the device by the PM core. > > > > Those calls to pm_runtime_get_sync() will then not end up invoking any > > if ->runtime_resume() callbacks for the device since its state is > > already RPM_ACTIVE. Hence, the device will remain in a low power state > > even if the driver believes it has been powered on. In many cases, > > accessing the device (like reading/writing a register) will often just > > just hang in these cases, but in worst cases we could end up getting > > even more difficult bugs to debug. > > Sure, that would be a problem. > > I think I need to find a different way to address the problem I'm > seeing, that is to resume devices that were runtime-suspended before > system suspend along with their superiors. > > One way to do it would be to just defer their resume to the point when > the core has enabled runtime PM for them, which means that it has also > enabled runtime PM for all of their superiors, and then call > pm_runtime_resume(). > > This should work unless one of the superiors has runtime PM disabled > at that point, of course. Right, so typically users of pm_runtime_force_suspend|resume() from the regular ->suspend|resume() callbacks would not work, because they disable/enable runtime PM. Although, we could probably fix this quite easily by making some adaptations in pm_runtime_force_suspend|resume(). I am not sure if this approach would have any other issue though, but it seems like it could make sense to explore this approach. In general drivers should cope with their devices being runtime resumed if runtime PM is enabled, right!? If this works, it seems like a generic and a good solution to me. Kind regards Uffe
On Wed, 12 Feb 2025 at 20:04, Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Wed, Feb 12, 2025 at 6:05 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > On Wed, Feb 12, 2025 at 4:15 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > > On Wed, 12 Feb 2025 at 12:33, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > > > On Wed, Feb 12, 2025 at 11:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > > > > > On Wed, Feb 12, 2025 at 10:12 AM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > > > > > > > > On Tue, 11 Feb 2025 at 22:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > > > > > > > > > > > > > Hi Everyone, > > > > > > > > > > > > > > This series is a result of the discussion on a recently reported issue > > > > > > > with device runtime PM status propagation during system resume and > > > > > > > the resulting patches: > > > > > > > > > > > > > > https://lore.kernel.org/linux-pm/12619233.O9o76ZdvQC@rjwysocki.net/ > > > > > > > https://lore.kernel.org/linux-pm/6137505.lOV4Wx5bFT@rjwysocki.net/ > > > > > > > > > > > > > > Overall, due to restrictions related to pm_runtime_force_suspend() and > > > > > > > pm_runtime_force_resume(), it was necessary to limit the RPM_ACTIVE > > > > > > > setting propagation to the parent of the first device in a dependency > > > > > > > chain that turned out to have to be resumed during system resume even > > > > > > > though it was runtime-suspended before system suspend. > > > > > > > > > > > > > > Those restrictions are that (1) pm_runtime_force_suspend() attempts to > > > > > > > suspend devices that have never had runtime PM enabled if their runtime > > > > > > > PM status is currently RPM_ACTIVE and (2) pm_runtime_force_resume() > > > > > > > will skip device whose runtime PM status is currently RPM_ACTIVE. > > > > > > > > > > > > > > The purpose of this series is to eliminate the above restrictions and > > > > > > > get pm_runtime_force_suspend() and pm_runtime_force_resume() to agree > > > > > > > more with what the core does. > > > > > > > > > > > > For my understanding, would you mind elaborating a bit more around the > > > > > > end-goal with this? > > > > > > > > > > The end goal is, of course, full integration of runtime PM with system > > > > > sleep for all devices. Eventually. > > > > > > > > > > And it is necessary to make the core and > > > > > pm_runtime_force_suspend|resume() work together better for this > > > > > purpose. > > > > > > > > > > > Are you trying to make adaptations for > > > > > > pm_runtime_force_suspend|resume() and the PM core, such that drivers > > > > > > that uses pm_runtime_force_suspend|resume() should be able to cope > > > > > > with other drivers for child-devices that make use of > > > > > > DPM_FLAG_SMART_SUSPEND? > > > > > > > > > > Yes. > > > > > > > > > > This is a more general case, though, when a device that was > > > > > runtime-suspended before system suspend and is left in suspend during > > > > > it, needs to be resumed during the system resume that follows. > > > > > > > > > > Currently, DPM_FLAG_SMART_SUSPEND can lead to this sometimes and it > > > > > cannot happen otherwise, but I think that it is a generally valid > > > > > case. > > > > > > > > > > > If we can make this work, it would enable the propagation of > > > > > > RPM_ACTIVE in the PM core for more devices, but still not for all, > > > > > > right? > > > > > > > > > > It is all until there is a known case in which it isn't. So either > > > > > you know a specific case in which it doesn't work, or I don't see a > > > > > reason for avoiding it. > > > > > > > > > > ATM the only specific case in which it doesn't work is when > > > > > pm_runtime_force_suspend|resume() are used. > > > > > > > > > > > The point is, the other bigger issue that I pointed out in our earlier > > > > > > discussions; all those devices where their drivers/buses don't cope > > > > > > with the behaviour of the DPM_FLAG_SMART_SUSPEND flag, will prevent > > > > > > the PM core from *unconditionally* propagating the RPM_ACTIVE to > > > > > > parents. I guess this is the best we can do then? > > > > > > > > > > OK, what are they? > > > > > > > > > > You keep saying that they exist without giving any examples. > > > > > > > > To put it more bluntly, I'm not aware of any place other than > > > > pm_runtime_force_suspend|resume() that can be confused by changing the > > > > runtime PM status of a device with runtime PM disabled (either > > > > permanently, or transiently during a system suspend-resume cycle) to > > > > RPM_ACTIVE, so if there are any such places, I would appreciate > > > > letting me know what they are. > > > > > > Well, sorry I thought you were aware. Anyway, I believe you need to do > > > your own investigation as it's simply too time consuming for me to > > > find them all. The problem is that it's not just a simple pattern to > > > search for, so we would need some clever scripting to move forward to > > > find them. > > > > > > To start with, we should look for drivers that enable runtime PM, by > > > calling pm_runtime_enable(). > > > > > > Additionally, in their system suspend callback they should typically > > > *not* use pm_runtime_suspended(), pm_runtime_status_suspended() or > > > pm_runtime_active() as that is usually (but no always) indicating that > > > they got it right. Then there are those that don't have system > > > suspend/resume callbacks assigned at all (or uses some other subsystem > > > specific callback for this), but only uses runtime PM. > > > > > > On top of that, drivers that makes use of > > > pm_runtime_force_suspend|resume() should be disregarded, which has > > > reached beyond 300 as this point. > > > > > > Anyway, here is just one example that I found from a quick search. > > > drivers/i2c/busses/i2c-qcom-geni.c > > > > OK, I see. > > > > It sets the status to RPM_SUSPENDED in geni_i2c_suspend_noirq(), if > > not suspended already, and assumes it to stay this way across > > geni_i2c_resume_noirq() until someone resumes it via runtime PM. > > > > Fair enough, but somebody should tell them that they don't need to use > > pm_runtime_disable/enable() in _noirq. > > > > > > > > > > Note that rpm_active() could start producing confusing results if the > > > > runtime PM status of a device with runtime PM disabled is changed from > > > > RPM_ACTIVE to RPM_SUSPENDED because it will then start to return > > > > -EACCES instead of 1, but changing the status to RPM_ACTIVE will not > > > > confuse it the same way. > > > > > > Trust me, it will cause problems. > > > > > > Drivers may call pm_runtime_get_sync() to turn on the resources for > > > the device after the system has resumed, when runtime PM has been > > > re-enabled for the device by the PM core. > > > > > > Those calls to pm_runtime_get_sync() will then not end up invoking any > > > if ->runtime_resume() callbacks for the device since its state is > > > already RPM_ACTIVE. Hence, the device will remain in a low power state > > > even if the driver believes it has been powered on. In many cases, > > > accessing the device (like reading/writing a register) will often just > > > just hang in these cases, but in worst cases we could end up getting > > > even more difficult bugs to debug. > > > > Sure, that would be a problem. > > We may be making a logical mistake here by assuming that any of these > devices will end up in dependency chains starting at the devices that > I'm concerned about. > > > I think I need to find a different way to address the problem I'm > > seeing, that is to resume devices that were runtime-suspended before > > system suspend along with their superiors. > > Well, maybe not. > > So the "smart suspend" thing can be done only if all of the devices > above the given one in the dependency graph (including the parent, > suppliers etc) either opt in for "smart suspend" or are devices > without PM callbacks and the dependency graphs can be cut at devices > that don't support runtime PM. This would require another flag in > struct dev_pm_info for the tracking of dependencies, but it should be > entirely doable because all of them are known at the device_prepare() > time. > > That should avoid involving anyone who can be surprised. This would be an improvement, while it is questionable for how valuable this would be in the end. As soon as there is a parent/supplier that has PM callbacks, we would need to prevent the propagation. I am also a bit worried about adding more corner cases to the PM core, as it's not entirely easy to understand them, I think. That said, maybe it's the best we can do. Kind regards Uffe
On Thu, Feb 13, 2025 at 2:11 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > On Wed, 12 Feb 2025 at 18:05, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > On Wed, Feb 12, 2025 at 4:15 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > > On Wed, 12 Feb 2025 at 12:33, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > > > On Wed, Feb 12, 2025 at 11:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > > > > > On Wed, Feb 12, 2025 at 10:12 AM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > > > > > > > > On Tue, 11 Feb 2025 at 22:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > > > > > > > > > > > > > Hi Everyone, > > > > > > > > > > > > > > This series is a result of the discussion on a recently reported issue > > > > > > > with device runtime PM status propagation during system resume and > > > > > > > the resulting patches: > > > > > > > > > > > > > > https://lore.kernel.org/linux-pm/12619233.O9o76ZdvQC@rjwysocki.net/ > > > > > > > https://lore.kernel.org/linux-pm/6137505.lOV4Wx5bFT@rjwysocki.net/ > > > > > > > > > > > > > > Overall, due to restrictions related to pm_runtime_force_suspend() and > > > > > > > pm_runtime_force_resume(), it was necessary to limit the RPM_ACTIVE > > > > > > > setting propagation to the parent of the first device in a dependency > > > > > > > chain that turned out to have to be resumed during system resume even > > > > > > > though it was runtime-suspended before system suspend. > > > > > > > > > > > > > > Those restrictions are that (1) pm_runtime_force_suspend() attempts to > > > > > > > suspend devices that have never had runtime PM enabled if their runtime > > > > > > > PM status is currently RPM_ACTIVE and (2) pm_runtime_force_resume() > > > > > > > will skip device whose runtime PM status is currently RPM_ACTIVE. > > > > > > > > > > > > > > The purpose of this series is to eliminate the above restrictions and > > > > > > > get pm_runtime_force_suspend() and pm_runtime_force_resume() to agree > > > > > > > more with what the core does. > > > > > > > > > > > > For my understanding, would you mind elaborating a bit more around the > > > > > > end-goal with this? > > > > > > > > > > The end goal is, of course, full integration of runtime PM with system > > > > > sleep for all devices. Eventually. > > > > > > > > > > And it is necessary to make the core and > > > > > pm_runtime_force_suspend|resume() work together better for this > > > > > purpose. > > > > > > > > > > > Are you trying to make adaptations for > > > > > > pm_runtime_force_suspend|resume() and the PM core, such that drivers > > > > > > that uses pm_runtime_force_suspend|resume() should be able to cope > > > > > > with other drivers for child-devices that make use of > > > > > > DPM_FLAG_SMART_SUSPEND? > > > > > > > > > > Yes. > > > > > > > > > > This is a more general case, though, when a device that was > > > > > runtime-suspended before system suspend and is left in suspend during > > > > > it, needs to be resumed during the system resume that follows. > > > > > > > > > > Currently, DPM_FLAG_SMART_SUSPEND can lead to this sometimes and it > > > > > cannot happen otherwise, but I think that it is a generally valid > > > > > case. > > > > > > > > > > > If we can make this work, it would enable the propagation of > > > > > > RPM_ACTIVE in the PM core for more devices, but still not for all, > > > > > > right? > > > > > > > > > > It is all until there is a known case in which it isn't. So either > > > > > you know a specific case in which it doesn't work, or I don't see a > > > > > reason for avoiding it. > > > > > > > > > > ATM the only specific case in which it doesn't work is when > > > > > pm_runtime_force_suspend|resume() are used. > > > > > > > > > > > The point is, the other bigger issue that I pointed out in our earlier > > > > > > discussions; all those devices where their drivers/buses don't cope > > > > > > with the behaviour of the DPM_FLAG_SMART_SUSPEND flag, will prevent > > > > > > the PM core from *unconditionally* propagating the RPM_ACTIVE to > > > > > > parents. I guess this is the best we can do then? > > > > > > > > > > OK, what are they? > > > > > > > > > > You keep saying that they exist without giving any examples. > > > > > > > > To put it more bluntly, I'm not aware of any place other than > > > > pm_runtime_force_suspend|resume() that can be confused by changing the > > > > runtime PM status of a device with runtime PM disabled (either > > > > permanently, or transiently during a system suspend-resume cycle) to > > > > RPM_ACTIVE, so if there are any such places, I would appreciate > > > > letting me know what they are. > > > > > > Well, sorry I thought you were aware. Anyway, I believe you need to do > > > your own investigation as it's simply too time consuming for me to > > > find them all. The problem is that it's not just a simple pattern to > > > search for, so we would need some clever scripting to move forward to > > > find them. > > > > > > To start with, we should look for drivers that enable runtime PM, by > > > calling pm_runtime_enable(). > > > > > > Additionally, in their system suspend callback they should typically > > > *not* use pm_runtime_suspended(), pm_runtime_status_suspended() or > > > pm_runtime_active() as that is usually (but no always) indicating that > > > they got it right. Then there are those that don't have system > > > suspend/resume callbacks assigned at all (or uses some other subsystem > > > specific callback for this), but only uses runtime PM. > > > > > > On top of that, drivers that makes use of > > > pm_runtime_force_suspend|resume() should be disregarded, which has > > > reached beyond 300 as this point. > > > > > > Anyway, here is just one example that I found from a quick search. > > > drivers/i2c/busses/i2c-qcom-geni.c > > > > OK, I see. > > > > It sets the status to RPM_SUSPENDED in geni_i2c_suspend_noirq(), if > > not suspended already, and assumes it to stay this way across > > geni_i2c_resume_noirq() until someone resumes it via runtime PM. > > > > Fair enough, but somebody should tell them that they don't need to use > > pm_runtime_disable/enable() in _noirq. > > > > > > > > > > Note that rpm_active() could start producing confusing results if the > > > > runtime PM status of a device with runtime PM disabled is changed from > > > > RPM_ACTIVE to RPM_SUSPENDED because it will then start to return > > > > -EACCES instead of 1, but changing the status to RPM_ACTIVE will not > > > > confuse it the same way. > > > > > > Trust me, it will cause problems. > > > > > > Drivers may call pm_runtime_get_sync() to turn on the resources for > > > the device after the system has resumed, when runtime PM has been > > > re-enabled for the device by the PM core. > > > > > > Those calls to pm_runtime_get_sync() will then not end up invoking any > > > if ->runtime_resume() callbacks for the device since its state is > > > already RPM_ACTIVE. Hence, the device will remain in a low power state > > > even if the driver believes it has been powered on. In many cases, > > > accessing the device (like reading/writing a register) will often just > > > just hang in these cases, but in worst cases we could end up getting > > > even more difficult bugs to debug. > > > > Sure, that would be a problem. > > > > I think I need to find a different way to address the problem I'm > > seeing, that is to resume devices that were runtime-suspended before > > system suspend along with their superiors. > > > > One way to do it would be to just defer their resume to the point when > > the core has enabled runtime PM for them, which means that it has also > > enabled runtime PM for all of their superiors, and then call > > pm_runtime_resume(). > > > > This should work unless one of the superiors has runtime PM disabled > > at that point, of course. > > Right, so typically users of pm_runtime_force_suspend|resume() from > the regular ->suspend|resume() callbacks would not work, because they > disable/enable runtime PM. Although, we could probably fix this quite > easily by making some adaptations in > pm_runtime_force_suspend|resume(). > > I am not sure if this approach would have any other issue though, but > it seems like it could make sense to explore this approach. In general > drivers should cope with their devices being runtime resumed if > runtime PM is enabled, right!? > > If this works, it seems like a generic and a good solution to me. For PCI ports, though, it would require some tuning of ->runtime_resume(), so it is not as simple as it would seem to be in the end.
On Thu, Feb 13, 2025 at 2:38 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > On Wed, 12 Feb 2025 at 20:04, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > On Wed, Feb 12, 2025 at 6:05 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > On Wed, Feb 12, 2025 at 4:15 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > > > > On Wed, 12 Feb 2025 at 12:33, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > > > > > On Wed, Feb 12, 2025 at 11:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > > > > > > > On Wed, Feb 12, 2025 at 10:12 AM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > > > > > > > > > > On Tue, 11 Feb 2025 at 22:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > > > > > > > > > > > > > > > Hi Everyone, > > > > > > > > > > > > > > > > This series is a result of the discussion on a recently reported issue > > > > > > > > with device runtime PM status propagation during system resume and > > > > > > > > the resulting patches: > > > > > > > > > > > > > > > > https://lore.kernel.org/linux-pm/12619233.O9o76ZdvQC@rjwysocki.net/ > > > > > > > > https://lore.kernel.org/linux-pm/6137505.lOV4Wx5bFT@rjwysocki.net/ > > > > > > > > > > > > > > > > Overall, due to restrictions related to pm_runtime_force_suspend() and > > > > > > > > pm_runtime_force_resume(), it was necessary to limit the RPM_ACTIVE > > > > > > > > setting propagation to the parent of the first device in a dependency > > > > > > > > chain that turned out to have to be resumed during system resume even > > > > > > > > though it was runtime-suspended before system suspend. > > > > > > > > > > > > > > > > Those restrictions are that (1) pm_runtime_force_suspend() attempts to > > > > > > > > suspend devices that have never had runtime PM enabled if their runtime > > > > > > > > PM status is currently RPM_ACTIVE and (2) pm_runtime_force_resume() > > > > > > > > will skip device whose runtime PM status is currently RPM_ACTIVE. > > > > > > > > > > > > > > > > The purpose of this series is to eliminate the above restrictions and > > > > > > > > get pm_runtime_force_suspend() and pm_runtime_force_resume() to agree > > > > > > > > more with what the core does. > > > > > > > > > > > > > > For my understanding, would you mind elaborating a bit more around the > > > > > > > end-goal with this? > > > > > > > > > > > > The end goal is, of course, full integration of runtime PM with system > > > > > > sleep for all devices. Eventually. > > > > > > > > > > > > And it is necessary to make the core and > > > > > > pm_runtime_force_suspend|resume() work together better for this > > > > > > purpose. > > > > > > > > > > > > > Are you trying to make adaptations for > > > > > > > pm_runtime_force_suspend|resume() and the PM core, such that drivers > > > > > > > that uses pm_runtime_force_suspend|resume() should be able to cope > > > > > > > with other drivers for child-devices that make use of > > > > > > > DPM_FLAG_SMART_SUSPEND? > > > > > > > > > > > > Yes. > > > > > > > > > > > > This is a more general case, though, when a device that was > > > > > > runtime-suspended before system suspend and is left in suspend during > > > > > > it, needs to be resumed during the system resume that follows. > > > > > > > > > > > > Currently, DPM_FLAG_SMART_SUSPEND can lead to this sometimes and it > > > > > > cannot happen otherwise, but I think that it is a generally valid > > > > > > case. > > > > > > > > > > > > > If we can make this work, it would enable the propagation of > > > > > > > RPM_ACTIVE in the PM core for more devices, but still not for all, > > > > > > > right? > > > > > > > > > > > > It is all until there is a known case in which it isn't. So either > > > > > > you know a specific case in which it doesn't work, or I don't see a > > > > > > reason for avoiding it. > > > > > > > > > > > > ATM the only specific case in which it doesn't work is when > > > > > > pm_runtime_force_suspend|resume() are used. > > > > > > > > > > > > > The point is, the other bigger issue that I pointed out in our earlier > > > > > > > discussions; all those devices where their drivers/buses don't cope > > > > > > > with the behaviour of the DPM_FLAG_SMART_SUSPEND flag, will prevent > > > > > > > the PM core from *unconditionally* propagating the RPM_ACTIVE to > > > > > > > parents. I guess this is the best we can do then? > > > > > > > > > > > > OK, what are they? > > > > > > > > > > > > You keep saying that they exist without giving any examples. > > > > > > > > > > To put it more bluntly, I'm not aware of any place other than > > > > > pm_runtime_force_suspend|resume() that can be confused by changing the > > > > > runtime PM status of a device with runtime PM disabled (either > > > > > permanently, or transiently during a system suspend-resume cycle) to > > > > > RPM_ACTIVE, so if there are any such places, I would appreciate > > > > > letting me know what they are. > > > > > > > > Well, sorry I thought you were aware. Anyway, I believe you need to do > > > > your own investigation as it's simply too time consuming for me to > > > > find them all. The problem is that it's not just a simple pattern to > > > > search for, so we would need some clever scripting to move forward to > > > > find them. > > > > > > > > To start with, we should look for drivers that enable runtime PM, by > > > > calling pm_runtime_enable(). > > > > > > > > Additionally, in their system suspend callback they should typically > > > > *not* use pm_runtime_suspended(), pm_runtime_status_suspended() or > > > > pm_runtime_active() as that is usually (but no always) indicating that > > > > they got it right. Then there are those that don't have system > > > > suspend/resume callbacks assigned at all (or uses some other subsystem > > > > specific callback for this), but only uses runtime PM. > > > > > > > > On top of that, drivers that makes use of > > > > pm_runtime_force_suspend|resume() should be disregarded, which has > > > > reached beyond 300 as this point. > > > > > > > > Anyway, here is just one example that I found from a quick search. > > > > drivers/i2c/busses/i2c-qcom-geni.c > > > > > > OK, I see. > > > > > > It sets the status to RPM_SUSPENDED in geni_i2c_suspend_noirq(), if > > > not suspended already, and assumes it to stay this way across > > > geni_i2c_resume_noirq() until someone resumes it via runtime PM. > > > > > > Fair enough, but somebody should tell them that they don't need to use > > > pm_runtime_disable/enable() in _noirq. BTW, I'm wondering how this driver handles the case in which someone runs pm_runtime_get*() on its device and doesn't release the reference throughout a system suspend-resume cycle, which is a valid thing to do albeit unusual. > > > > > > > > > > Note that rpm_active() could start producing confusing results if the > > > > > runtime PM status of a device with runtime PM disabled is changed from > > > > > RPM_ACTIVE to RPM_SUSPENDED because it will then start to return > > > > > -EACCES instead of 1, but changing the status to RPM_ACTIVE will not > > > > > confuse it the same way. > > > > > > > > Trust me, it will cause problems. > > > > > > > > Drivers may call pm_runtime_get_sync() to turn on the resources for > > > > the device after the system has resumed, when runtime PM has been > > > > re-enabled for the device by the PM core. > > > > > > > > Those calls to pm_runtime_get_sync() will then not end up invoking any > > > > if ->runtime_resume() callbacks for the device since its state is > > > > already RPM_ACTIVE. Hence, the device will remain in a low power state > > > > even if the driver believes it has been powered on. In many cases, > > > > accessing the device (like reading/writing a register) will often just > > > > just hang in these cases, but in worst cases we could end up getting > > > > even more difficult bugs to debug. > > > > > > Sure, that would be a problem. > > > > We may be making a logical mistake here by assuming that any of these > > devices will end up in dependency chains starting at the devices that > > I'm concerned about. > > > > > I think I need to find a different way to address the problem I'm > > > seeing, that is to resume devices that were runtime-suspended before > > > system suspend along with their superiors. > > > > Well, maybe not. > > > > So the "smart suspend" thing can be done only if all of the devices > > above the given one in the dependency graph (including the parent, > > suppliers etc) either opt in for "smart suspend" or are devices > > without PM callbacks and the dependency graphs can be cut at devices > > that don't support runtime PM. This would require another flag in > > struct dev_pm_info for the tracking of dependencies, but it should be > > entirely doable because all of them are known at the device_prepare() > > time. > > > > That should avoid involving anyone who can be surprised. > > This would be an improvement, while it is questionable for how > valuable this would be in the end. As soon as there is a > parent/supplier that has PM callbacks, we would need to prevent the > propagation. One without "smart suspend" that is. Yes. For the current users of DPM_FLAG_SMART_SUSPEND this shouldn't actually matter because it is the case already AFAICS, so I'm not worried about this. > I am also a bit worried about adding more corner cases to > the PM core, as it's not entirely easy to understand them, I think. That is always a valid concern, but sometimes there is no other way to make progress. > That said, maybe it's the best we can do.
On Thu, 13 Feb 2025 at 21:17, Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Thu, Feb 13, 2025 at 2:11 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > On Wed, 12 Feb 2025 at 18:05, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > On Wed, Feb 12, 2025 at 4:15 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > > > > On Wed, 12 Feb 2025 at 12:33, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > > > > > On Wed, Feb 12, 2025 at 11:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > > > > > > > On Wed, Feb 12, 2025 at 10:12 AM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > > > > > > > > > > On Tue, 11 Feb 2025 at 22:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > > > > > > > > > > > > > > > Hi Everyone, > > > > > > > > > > > > > > > > This series is a result of the discussion on a recently reported issue > > > > > > > > with device runtime PM status propagation during system resume and > > > > > > > > the resulting patches: > > > > > > > > > > > > > > > > https://lore.kernel.org/linux-pm/12619233.O9o76ZdvQC@rjwysocki.net/ > > > > > > > > https://lore.kernel.org/linux-pm/6137505.lOV4Wx5bFT@rjwysocki.net/ > > > > > > > > > > > > > > > > Overall, due to restrictions related to pm_runtime_force_suspend() and > > > > > > > > pm_runtime_force_resume(), it was necessary to limit the RPM_ACTIVE > > > > > > > > setting propagation to the parent of the first device in a dependency > > > > > > > > chain that turned out to have to be resumed during system resume even > > > > > > > > though it was runtime-suspended before system suspend. > > > > > > > > > > > > > > > > Those restrictions are that (1) pm_runtime_force_suspend() attempts to > > > > > > > > suspend devices that have never had runtime PM enabled if their runtime > > > > > > > > PM status is currently RPM_ACTIVE and (2) pm_runtime_force_resume() > > > > > > > > will skip device whose runtime PM status is currently RPM_ACTIVE. > > > > > > > > > > > > > > > > The purpose of this series is to eliminate the above restrictions and > > > > > > > > get pm_runtime_force_suspend() and pm_runtime_force_resume() to agree > > > > > > > > more with what the core does. > > > > > > > > > > > > > > For my understanding, would you mind elaborating a bit more around the > > > > > > > end-goal with this? > > > > > > > > > > > > The end goal is, of course, full integration of runtime PM with system > > > > > > sleep for all devices. Eventually. > > > > > > > > > > > > And it is necessary to make the core and > > > > > > pm_runtime_force_suspend|resume() work together better for this > > > > > > purpose. > > > > > > > > > > > > > Are you trying to make adaptations for > > > > > > > pm_runtime_force_suspend|resume() and the PM core, such that drivers > > > > > > > that uses pm_runtime_force_suspend|resume() should be able to cope > > > > > > > with other drivers for child-devices that make use of > > > > > > > DPM_FLAG_SMART_SUSPEND? > > > > > > > > > > > > Yes. > > > > > > > > > > > > This is a more general case, though, when a device that was > > > > > > runtime-suspended before system suspend and is left in suspend during > > > > > > it, needs to be resumed during the system resume that follows. > > > > > > > > > > > > Currently, DPM_FLAG_SMART_SUSPEND can lead to this sometimes and it > > > > > > cannot happen otherwise, but I think that it is a generally valid > > > > > > case. > > > > > > > > > > > > > If we can make this work, it would enable the propagation of > > > > > > > RPM_ACTIVE in the PM core for more devices, but still not for all, > > > > > > > right? > > > > > > > > > > > > It is all until there is a known case in which it isn't. So either > > > > > > you know a specific case in which it doesn't work, or I don't see a > > > > > > reason for avoiding it. > > > > > > > > > > > > ATM the only specific case in which it doesn't work is when > > > > > > pm_runtime_force_suspend|resume() are used. > > > > > > > > > > > > > The point is, the other bigger issue that I pointed out in our earlier > > > > > > > discussions; all those devices where their drivers/buses don't cope > > > > > > > with the behaviour of the DPM_FLAG_SMART_SUSPEND flag, will prevent > > > > > > > the PM core from *unconditionally* propagating the RPM_ACTIVE to > > > > > > > parents. I guess this is the best we can do then? > > > > > > > > > > > > OK, what are they? > > > > > > > > > > > > You keep saying that they exist without giving any examples. > > > > > > > > > > To put it more bluntly, I'm not aware of any place other than > > > > > pm_runtime_force_suspend|resume() that can be confused by changing the > > > > > runtime PM status of a device with runtime PM disabled (either > > > > > permanently, or transiently during a system suspend-resume cycle) to > > > > > RPM_ACTIVE, so if there are any such places, I would appreciate > > > > > letting me know what they are. > > > > > > > > Well, sorry I thought you were aware. Anyway, I believe you need to do > > > > your own investigation as it's simply too time consuming for me to > > > > find them all. The problem is that it's not just a simple pattern to > > > > search for, so we would need some clever scripting to move forward to > > > > find them. > > > > > > > > To start with, we should look for drivers that enable runtime PM, by > > > > calling pm_runtime_enable(). > > > > > > > > Additionally, in their system suspend callback they should typically > > > > *not* use pm_runtime_suspended(), pm_runtime_status_suspended() or > > > > pm_runtime_active() as that is usually (but no always) indicating that > > > > they got it right. Then there are those that don't have system > > > > suspend/resume callbacks assigned at all (or uses some other subsystem > > > > specific callback for this), but only uses runtime PM. > > > > > > > > On top of that, drivers that makes use of > > > > pm_runtime_force_suspend|resume() should be disregarded, which has > > > > reached beyond 300 as this point. > > > > > > > > Anyway, here is just one example that I found from a quick search. > > > > drivers/i2c/busses/i2c-qcom-geni.c > > > > > > OK, I see. > > > > > > It sets the status to RPM_SUSPENDED in geni_i2c_suspend_noirq(), if > > > not suspended already, and assumes it to stay this way across > > > geni_i2c_resume_noirq() until someone resumes it via runtime PM. > > > > > > Fair enough, but somebody should tell them that they don't need to use > > > pm_runtime_disable/enable() in _noirq. > > > > > > > > > > > > > Note that rpm_active() could start producing confusing results if the > > > > > runtime PM status of a device with runtime PM disabled is changed from > > > > > RPM_ACTIVE to RPM_SUSPENDED because it will then start to return > > > > > -EACCES instead of 1, but changing the status to RPM_ACTIVE will not > > > > > confuse it the same way. > > > > > > > > Trust me, it will cause problems. > > > > > > > > Drivers may call pm_runtime_get_sync() to turn on the resources for > > > > the device after the system has resumed, when runtime PM has been > > > > re-enabled for the device by the PM core. > > > > > > > > Those calls to pm_runtime_get_sync() will then not end up invoking any > > > > if ->runtime_resume() callbacks for the device since its state is > > > > already RPM_ACTIVE. Hence, the device will remain in a low power state > > > > even if the driver believes it has been powered on. In many cases, > > > > accessing the device (like reading/writing a register) will often just > > > > just hang in these cases, but in worst cases we could end up getting > > > > even more difficult bugs to debug. > > > > > > Sure, that would be a problem. > > > > > > I think I need to find a different way to address the problem I'm > > > seeing, that is to resume devices that were runtime-suspended before > > > system suspend along with their superiors. > > > > > > One way to do it would be to just defer their resume to the point when > > > the core has enabled runtime PM for them, which means that it has also > > > enabled runtime PM for all of their superiors, and then call > > > pm_runtime_resume(). > > > > > > This should work unless one of the superiors has runtime PM disabled > > > at that point, of course. > > > > Right, so typically users of pm_runtime_force_suspend|resume() from > > the regular ->suspend|resume() callbacks would not work, because they > > disable/enable runtime PM. Although, we could probably fix this quite > > easily by making some adaptations in > > pm_runtime_force_suspend|resume(). > > > > I am not sure if this approach would have any other issue though, but > > it seems like it could make sense to explore this approach. In general > > drivers should cope with their devices being runtime resumed if > > runtime PM is enabled, right!? > > > > If this works, it seems like a generic and a good solution to me. > > For PCI ports, though, it would require some tuning of > ->runtime_resume(), so it is not as simple as it would seem to be in > the end. Okay. Perhaps it would be worth it to try this out anyway, as it would allow us to keep the PM core as simple as possible? Kind regards Uffe
On Fri, Feb 14, 2025 at 10:55 AM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > On Thu, 13 Feb 2025 at 21:17, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > On Thu, Feb 13, 2025 at 2:11 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > > On Wed, 12 Feb 2025 at 18:05, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > > > On Wed, Feb 12, 2025 at 4:15 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > > > > > > On Wed, 12 Feb 2025 at 12:33, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > > > > > > > On Wed, Feb 12, 2025 at 11:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > > > > > > > > > On Wed, Feb 12, 2025 at 10:12 AM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > > > > > > > > > > > > On Tue, 11 Feb 2025 at 22:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > > > > > > > > > > > > > > > > > Hi Everyone, > > > > > > > > > > > > > > > > > > This series is a result of the discussion on a recently reported issue > > > > > > > > > with device runtime PM status propagation during system resume and > > > > > > > > > the resulting patches: > > > > > > > > > > > > > > > > > > https://lore.kernel.org/linux-pm/12619233.O9o76ZdvQC@rjwysocki.net/ > > > > > > > > > https://lore.kernel.org/linux-pm/6137505.lOV4Wx5bFT@rjwysocki.net/ > > > > > > > > > > > > > > > > > > Overall, due to restrictions related to pm_runtime_force_suspend() and > > > > > > > > > pm_runtime_force_resume(), it was necessary to limit the RPM_ACTIVE > > > > > > > > > setting propagation to the parent of the first device in a dependency > > > > > > > > > chain that turned out to have to be resumed during system resume even > > > > > > > > > though it was runtime-suspended before system suspend. > > > > > > > > > > > > > > > > > > Those restrictions are that (1) pm_runtime_force_suspend() attempts to > > > > > > > > > suspend devices that have never had runtime PM enabled if their runtime > > > > > > > > > PM status is currently RPM_ACTIVE and (2) pm_runtime_force_resume() > > > > > > > > > will skip device whose runtime PM status is currently RPM_ACTIVE. > > > > > > > > > > > > > > > > > > The purpose of this series is to eliminate the above restrictions and > > > > > > > > > get pm_runtime_force_suspend() and pm_runtime_force_resume() to agree > > > > > > > > > more with what the core does. > > > > > > > > > > > > > > > > For my understanding, would you mind elaborating a bit more around the > > > > > > > > end-goal with this? > > > > > > > > > > > > > > The end goal is, of course, full integration of runtime PM with system > > > > > > > sleep for all devices. Eventually. > > > > > > > > > > > > > > And it is necessary to make the core and > > > > > > > pm_runtime_force_suspend|resume() work together better for this > > > > > > > purpose. > > > > > > > > > > > > > > > Are you trying to make adaptations for > > > > > > > > pm_runtime_force_suspend|resume() and the PM core, such that drivers > > > > > > > > that uses pm_runtime_force_suspend|resume() should be able to cope > > > > > > > > with other drivers for child-devices that make use of > > > > > > > > DPM_FLAG_SMART_SUSPEND? > > > > > > > > > > > > > > Yes. > > > > > > > > > > > > > > This is a more general case, though, when a device that was > > > > > > > runtime-suspended before system suspend and is left in suspend during > > > > > > > it, needs to be resumed during the system resume that follows. > > > > > > > > > > > > > > Currently, DPM_FLAG_SMART_SUSPEND can lead to this sometimes and it > > > > > > > cannot happen otherwise, but I think that it is a generally valid > > > > > > > case. > > > > > > > > > > > > > > > If we can make this work, it would enable the propagation of > > > > > > > > RPM_ACTIVE in the PM core for more devices, but still not for all, > > > > > > > > right? > > > > > > > > > > > > > > It is all until there is a known case in which it isn't. So either > > > > > > > you know a specific case in which it doesn't work, or I don't see a > > > > > > > reason for avoiding it. > > > > > > > > > > > > > > ATM the only specific case in which it doesn't work is when > > > > > > > pm_runtime_force_suspend|resume() are used. > > > > > > > > > > > > > > > The point is, the other bigger issue that I pointed out in our earlier > > > > > > > > discussions; all those devices where their drivers/buses don't cope > > > > > > > > with the behaviour of the DPM_FLAG_SMART_SUSPEND flag, will prevent > > > > > > > > the PM core from *unconditionally* propagating the RPM_ACTIVE to > > > > > > > > parents. I guess this is the best we can do then? > > > > > > > > > > > > > > OK, what are they? > > > > > > > > > > > > > > You keep saying that they exist without giving any examples. > > > > > > > > > > > > To put it more bluntly, I'm not aware of any place other than > > > > > > pm_runtime_force_suspend|resume() that can be confused by changing the > > > > > > runtime PM status of a device with runtime PM disabled (either > > > > > > permanently, or transiently during a system suspend-resume cycle) to > > > > > > RPM_ACTIVE, so if there are any such places, I would appreciate > > > > > > letting me know what they are. > > > > > > > > > > Well, sorry I thought you were aware. Anyway, I believe you need to do > > > > > your own investigation as it's simply too time consuming for me to > > > > > find them all. The problem is that it's not just a simple pattern to > > > > > search for, so we would need some clever scripting to move forward to > > > > > find them. > > > > > > > > > > To start with, we should look for drivers that enable runtime PM, by > > > > > calling pm_runtime_enable(). > > > > > > > > > > Additionally, in their system suspend callback they should typically > > > > > *not* use pm_runtime_suspended(), pm_runtime_status_suspended() or > > > > > pm_runtime_active() as that is usually (but no always) indicating that > > > > > they got it right. Then there are those that don't have system > > > > > suspend/resume callbacks assigned at all (or uses some other subsystem > > > > > specific callback for this), but only uses runtime PM. > > > > > > > > > > On top of that, drivers that makes use of > > > > > pm_runtime_force_suspend|resume() should be disregarded, which has > > > > > reached beyond 300 as this point. > > > > > > > > > > Anyway, here is just one example that I found from a quick search. > > > > > drivers/i2c/busses/i2c-qcom-geni.c > > > > > > > > OK, I see. > > > > > > > > It sets the status to RPM_SUSPENDED in geni_i2c_suspend_noirq(), if > > > > not suspended already, and assumes it to stay this way across > > > > geni_i2c_resume_noirq() until someone resumes it via runtime PM. > > > > > > > > Fair enough, but somebody should tell them that they don't need to use > > > > pm_runtime_disable/enable() in _noirq. > > > > > > > > > > > > > > > > Note that rpm_active() could start producing confusing results if the > > > > > > runtime PM status of a device with runtime PM disabled is changed from > > > > > > RPM_ACTIVE to RPM_SUSPENDED because it will then start to return > > > > > > -EACCES instead of 1, but changing the status to RPM_ACTIVE will not > > > > > > confuse it the same way. > > > > > > > > > > Trust me, it will cause problems. > > > > > > > > > > Drivers may call pm_runtime_get_sync() to turn on the resources for > > > > > the device after the system has resumed, when runtime PM has been > > > > > re-enabled for the device by the PM core. > > > > > > > > > > Those calls to pm_runtime_get_sync() will then not end up invoking any > > > > > if ->runtime_resume() callbacks for the device since its state is > > > > > already RPM_ACTIVE. Hence, the device will remain in a low power state > > > > > even if the driver believes it has been powered on. In many cases, > > > > > accessing the device (like reading/writing a register) will often just > > > > > just hang in these cases, but in worst cases we could end up getting > > > > > even more difficult bugs to debug. > > > > > > > > Sure, that would be a problem. > > > > > > > > I think I need to find a different way to address the problem I'm > > > > seeing, that is to resume devices that were runtime-suspended before > > > > system suspend along with their superiors. > > > > > > > > One way to do it would be to just defer their resume to the point when > > > > the core has enabled runtime PM for them, which means that it has also > > > > enabled runtime PM for all of their superiors, and then call > > > > pm_runtime_resume(). > > > > > > > > This should work unless one of the superiors has runtime PM disabled > > > > at that point, of course. > > > > > > Right, so typically users of pm_runtime_force_suspend|resume() from > > > the regular ->suspend|resume() callbacks would not work, because they > > > disable/enable runtime PM. Although, we could probably fix this quite > > > easily by making some adaptations in > > > pm_runtime_force_suspend|resume(). > > > > > > I am not sure if this approach would have any other issue though, but > > > it seems like it could make sense to explore this approach. In general > > > drivers should cope with their devices being runtime resumed if > > > runtime PM is enabled, right!? In theory. In practice, though, to start with, this can only be done for devices whose drivers opt-in (or don't care) and starting with "leaf" devices (no children or consumers). > > > If this works, it seems like a generic and a good solution to me. > > > > For PCI ports, though, it would require some tuning of > > ->runtime_resume(), so it is not as simple as it would seem to be in > > the end. > > Okay. Perhaps it would be worth it to try this out anyway, as it would > allow us to keep the PM core as simple as possible? Except that initially it will need the same opt-in from everybody involved as the set active propagation, at least initially, so IMV it is better to start enforcing the opt-in requirement first and then convert stuff to the new approach.