diff mbox

PM / runtime: Use synchronous runtime PM call in rpm_put_suppliers()

Message ID 1490174570-18345-1-git-send-email-m.szyprowski@samsung.com (mailing list archive)
State Deferred
Headers show

Commit Message

Marek Szyprowski March 22, 2017, 9:22 a.m. UTC
rpm_put_suppliers() might be called as a result of pm_runtime_put_sync()
call, where the called explicitly wants to perform the operation in
synchronous way to avoid some deadlocks related to concurrent code
execution in the PM workers. However the current code for handling device
links always use asynchronous calls. This patch changes it to always use
the synchronous calls.

This patch fixes the potential deadlock, which appears during adding
runtime PM support to clock framework, which uses global mutex, which is
reentrant only for the current process, so trying to execute runtime PM
callback from the worker results in deadlock (the mentioned mutex is
already taken by the current process, which waits for execution of PM
worker).

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
Hi Rafael & Ulf,

This was the simplest way of fixing this issue. Other way would be to
propagate rpmflags to rpm_get/put_suppliers() to ensure that synchronous
calls will be also done on the linked devices. Which way is better in
your opinion?

Best regards
Marek Szyprowski
Samsung R&D Institute Poland
---
 drivers/base/power/runtime.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Ulf Hansson March 22, 2017, 11:34 a.m. UTC | #1
On 22 March 2017 at 10:22, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> rpm_put_suppliers() might be called as a result of pm_runtime_put_sync()
> call, where the called explicitly wants to perform the operation in
> synchronous way to avoid some deadlocks related to concurrent code
> execution in the PM workers. However the current code for handling device
> links always use asynchronous calls. This patch changes it to always use
> the synchronous calls.

This is a good idea! It is better to leave the decision to the "leaf"
node of whether to use asynchronous mode or not.

>
> This patch fixes the potential deadlock, which appears during adding
> runtime PM support to clock framework, which uses global mutex, which is
> reentrant only for the current process, so trying to execute runtime PM
> callback from the worker results in deadlock (the mentioned mutex is
> already taken by the current process, which waits for execution of PM
> worker).

This just tells that the locking mechanism for clocks are fragile.
Whether it is a good motivation for this change, I am not sure.

In either case, I like the change!

Kind regards
Uffe

>
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> ---
> Hi Rafael & Ulf,
>
> This was the simplest way of fixing this issue. Other way would be to
> propagate rpmflags to rpm_get/put_suppliers() to ensure that synchronous
> calls will be also done on the linked devices. Which way is better in
> your opinion?
>
> Best regards
> Marek Szyprowski
> Samsung R&D Institute Poland
> ---
>  drivers/base/power/runtime.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 7bcf80fa9ada..b7df589ae579 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -292,7 +292,7 @@ static void rpm_put_suppliers(struct device *dev)
>         list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
>                 if (link->rpm_active &&
>                     READ_ONCE(link->status) != DL_STATE_SUPPLIER_UNBIND) {
> -                       pm_runtime_put(link->supplier);
> +                       pm_runtime_put_sync(link->supplier);
>                         link->rpm_active = false;
>                 }
>  }
> --
> 1.9.1
>
Rafael J. Wysocki March 22, 2017, 8:56 p.m. UTC | #2
On Wed, Mar 22, 2017 at 12:34 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 22 March 2017 at 10:22, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>> rpm_put_suppliers() might be called as a result of pm_runtime_put_sync()
>> call, where the called explicitly wants to perform the operation in
>> synchronous way to avoid some deadlocks related to concurrent code
>> execution in the PM workers. However the current code for handling device
>> links always use asynchronous calls. This patch changes it to always use
>> the synchronous calls.
>
> This is a good idea! It is better to leave the decision to the "leaf"
> node of whether to use asynchronous mode or not.

One immediate concern that I have is that this happens in an SRCU
read-side critical section and sync runtime suspend can take arbitrary
amount of time, especially if those devices have suppliers etc.  You
may very well wait for an entire subtree to suspend synchronously
here.

Moreover, if the suppliers have parents, those parents won't be
suspended synchronously anyway, so this only addresses the first level
of hierarchy in that case.

Also the behavior for suppliers follows the behavior for parents and I
don't quite see the reason for these two cases to behave differently
in principle.

>>
>> This patch fixes the potential deadlock, which appears during adding
>> runtime PM support to clock framework, which uses global mutex, which is
>> reentrant only for the current process, so trying to execute runtime PM
>> callback from the worker results in deadlock (the mentioned mutex is
>> already taken by the current process, which waits for execution of PM
>> worker).
>
> This just tells that the locking mechanism for clocks are fragile.

Right.

We seem to be working around problems in there with this patch.

> Whether it is a good motivation for this change, I am not sure.
>
> In either case, I like the change!

Why exactly do you like it?

Thanks,
Rafael
Marek Szyprowski March 23, 2017, 10:31 a.m. UTC | #3
Hi Rafael,

On 2017-03-22 21:56, Rafael J. Wysocki wrote:
> On Wed, Mar 22, 2017 at 12:34 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> On 22 March 2017 at 10:22, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>> rpm_put_suppliers() might be called as a result of pm_runtime_put_sync()
>>> call, where the called explicitly wants to perform the operation in
>>> synchronous way to avoid some deadlocks related to concurrent code
>>> execution in the PM workers. However the current code for handling device
>>> links always use asynchronous calls. This patch changes it to always use
>>> the synchronous calls.
>> This is a good idea! It is better to leave the decision to the "leaf"
>> node of whether to use asynchronous mode or not.
> One immediate concern that I have is that this happens in an SRCU
> read-side critical section and sync runtime suspend can take arbitrary
> amount of time, especially if those devices have suppliers etc.  You
> may very well wait for an entire subtree to suspend synchronously
> here.
>
> Moreover, if the suppliers have parents, those parents won't be
> suspended synchronously anyway, so this only addresses the first level
> of hierarchy in that case.
>
> Also the behavior for suppliers follows the behavior for parents and I
> don't quite see the reason for these two cases to behave differently
> in principle.

Well, then I don't get why do we really have pm_runtime_put_sync() if it
doesn't guarantee synchronous suspend operations. This might be really 
tricky
to debug locking issues if one assumes that suspend operation has been
finished, but it is still being performed in parallel by the worker.

>>> This patch fixes the potential deadlock, which appears during adding
>>> runtime PM support to clock framework, which uses global mutex, which is
>>> reentrant only for the current process, so trying to execute runtime PM
>>> callback from the worker results in deadlock (the mentioned mutex is
>>> already taken by the current process, which waits for execution of PM
>>> worker).
>> This just tells that the locking mechanism for clocks are fragile.
> Right.
>
> We seem to be working around problems in there with this patch.

Not really. This problem will reappear always if re-entrant locks are used.
Clocks are just one of such frameworks, which use such locking mechanism.

Maybe propagating flags from the caller will be a better approach in this
case? So if caller wants synchronous call, it will be properly propagated,
otherwise, the current behavior will take place. Using pm_runtime_put_sync
is not that common, so probably in most cases callers have a good reason
for that. It would make sense to apply this also to operations on parents.

>> Whether it is a good motivation for this change, I am not sure.
>>
>> In either case, I like the change!
> Why exactly do you like it?

Best regards
Ulf Hansson March 23, 2017, 12:47 p.m. UTC | #4
On 22 March 2017 at 21:56, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Wed, Mar 22, 2017 at 12:34 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> On 22 March 2017 at 10:22, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>> rpm_put_suppliers() might be called as a result of pm_runtime_put_sync()
>>> call, where the called explicitly wants to perform the operation in
>>> synchronous way to avoid some deadlocks related to concurrent code
>>> execution in the PM workers. However the current code for handling device
>>> links always use asynchronous calls. This patch changes it to always use
>>> the synchronous calls.
>>
>> This is a good idea! It is better to leave the decision to the "leaf"
>> node of whether to use asynchronous mode or not.
>
> One immediate concern that I have is that this happens in an SRCU
> read-side critical section and sync runtime suspend can take arbitrary
> amount of time, especially if those devices have suppliers etc.  You
> may very well wait for an entire subtree to suspend synchronously
> here.

Yes that's a problem. Can we address that somehow?

>
> Moreover, if the suppliers have parents, those parents won't be
> suspended synchronously anyway, so this only addresses the first level
> of hierarchy in that case.
>
> Also the behavior for suppliers follows the behavior for parents and I
> don't quite see the reason for these two cases to behave differently
> in principle.

Yes, you are right!

We shouldn't change this for suppliers unless we also change this for parents.

>
>>>
>>> This patch fixes the potential deadlock, which appears during adding
>>> runtime PM support to clock framework, which uses global mutex, which is
>>> reentrant only for the current process, so trying to execute runtime PM
>>> callback from the worker results in deadlock (the mentioned mutex is
>>> already taken by the current process, which waits for execution of PM
>>> worker).
>>
>> This just tells that the locking mechanism for clocks are fragile.
>
> Right.
>
> We seem to be working around problems in there with this patch.
>
>> Whether it is a good motivation for this change, I am not sure.
>>
>> In either case, I like the change!
>
> Why exactly do you like it?

Okay, let me try to elaborate.

For the async runtime PM suspend/idle case (including runtime PM autosuspend):
When a driver for a children/consumer decides to use the async runtime
PM APIs, I assume it's because the driver expects the device to be
used in some kind of a "bursty" manner. Meaning it doesn't make sense
to save power in-between every request, but instead leave the device
powered for little while after a request has been served. The purpose
is often to avoid the wakeup-latency for *each* request, when these
comes in bursts.

For these cases the driver has already traded some power savings for
improved performance of requests in bursts. When this trade is done
for the children/consumer, I think it seems reasonable to not further
defer the parent/supplier to be be runtime suspended/idled, as that
would further trade power for performance, which is what happens when
rpm_idle|suspend() decides to punt this to a work.

Moreover, scheduling a work for each parent/supplier seems suboptimal
to me, as in the async case we are already executing from within a
work. In cases of a higher level of device hierarchy, several works
becomes scheduled.

For the sync runtime PM suspend/idle case:
When a driver for a children/consumers decides to use the sync runtime
PM APIs, it's most likely because it wants the device to enter its low
power state as soon as possible. In other words, there is no need to
trade power for performance, either because power is more important
than performance or because the wakeup latency is negligible. To me, I
think it seems reasonable to respect this policy for parents/suppliers
as well instead of always punting to a work.

Kind regards
Uffe
Lukas Wunner March 24, 2017, 11:37 a.m. UTC | #5
On Thu, Mar 23, 2017 at 01:47:46PM +0100, Ulf Hansson wrote:
> For the sync runtime PM suspend/idle case:
> When a driver for a children/consumers decides to use the sync runtime
> PM APIs, it's most likely because it wants the device to enter its low
> power state as soon as possible.

I think a driver is doing it wrong if it uses sync vs. async to achieve
a certain policy.  (As in the example you've given above.)  That should
be left to the PM core, and the PM core should try to suspend as early
and as often as possible in order to save power.  If the driver wants to
achieve a policy such as "suspend this device ASAP but let this other
device idle a bit before suspend because its usage pattern is bursty"
then the correct way to implement this is via a sane default value for
the autosuspend delay.

Generelly, async should be used.  One example where sync is warranted
is when the ->runtime_suspend hook needs to be finished before power
is cut to the device.  That's a hard constraint.  E.g. vga_switcheroo
does this when runtime suspending the GPU.  In this case, "runtime
suspend" means that the driver puts the device in a state from which
power can be cut.  The actual power switch is toggled afterwards.

Regards,

Lukas
Marek Szyprowski March 27, 2017, 10:14 a.m. UTC | #6
Hi Ulf,

On 2017-03-23 13:47, Ulf Hansson wrote:
> On 22 March 2017 at 21:56, Rafael J. Wysocki <rafael@kernel.org> wrote:
>> On Wed, Mar 22, 2017 at 12:34 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>>> On 22 March 2017 at 10:22, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>>> rpm_put_suppliers() might be called as a result of pm_runtime_put_sync()
>>>> call, where the called explicitly wants to perform the operation in
>>>> synchronous way to avoid some deadlocks related to concurrent code
>>>> execution in the PM workers. However the current code for handling device
>>>> links always use asynchronous calls. This patch changes it to always use
>>>> the synchronous calls.
>>> This is a good idea! It is better to leave the decision to the "leaf"
>>> node of whether to use asynchronous mode or not.
>> One immediate concern that I have is that this happens in an SRCU
>> read-side critical section and sync runtime suspend can take arbitrary
>> amount of time, especially if those devices have suppliers etc.  You
>> may very well wait for an entire subtree to suspend synchronously
>> here.
> Yes that's a problem. Can we address that somehow?

Just a stupid question: why this is issue in case of rpm_put_suppliers()
and call to pm_runtime_put_sync(), but it is okay in rpm_get_suppliers()
and pm_runtime_get_sync(), which can also take a long time to complete?

>> Moreover, if the suppliers have parents, those parents won't be
>> suspended synchronously anyway, so this only addresses the first level
>> of hierarchy in that case.
>>
>> Also the behavior for suppliers follows the behavior for parents and I
>> don't quite see the reason for these two cases to behave differently
>> in principle.
> Yes, you are right!
>
> We shouldn't change this for suppliers unless we also change this for parents.

Right, this can also lead to the deadlock similar to the one observed with
links. Once again, it looks that if caller wants a synchronous operation,
it should be propagated to parents also.

>>>> This patch fixes the potential deadlock, which appears during adding
>>>> runtime PM support to clock framework, which uses global mutex, which is
>>>> reentrant only for the current process, so trying to execute runtime PM
>>>> callback from the worker results in deadlock (the mentioned mutex is
>>>> already taken by the current process, which waits for execution of PM
>>>> worker).
>>> This just tells that the locking mechanism for clocks are fragile.
>> Right.
>>
>> We seem to be working around problems in there with this patch.
>>
>>> Whether it is a good motivation for this change, I am not sure.
>>>
>>> In either case, I like the change!
>> Why exactly do you like it?
> Okay, let me try to elaborate.
>
> For the async runtime PM suspend/idle case (including runtime PM autosuspend):
> When a driver for a children/consumer decides to use the async runtime
> PM APIs, I assume it's because the driver expects the device to be
> used in some kind of a "bursty" manner. Meaning it doesn't make sense
> to save power in-between every request, but instead leave the device
> powered for little while after a request has been served. The purpose
> is often to avoid the wakeup-latency for *each* request, when these
> comes in bursts.
>
> For these cases the driver has already traded some power savings for
> improved performance of requests in bursts. When this trade is done
> for the children/consumer, I think it seems reasonable to not further
> defer the parent/supplier to be be runtime suspended/idled, as that
> would further trade power for performance, which is what happens when
> rpm_idle|suspend() decides to punt this to a work.
>
> Moreover, scheduling a work for each parent/supplier seems suboptimal
> to me, as in the async case we are already executing from within a
> work. In cases of a higher level of device hierarchy, several works
> becomes scheduled.
>
> For the sync runtime PM suspend/idle case:
> When a driver for a children/consumers decides to use the sync runtime
> PM APIs, it's most likely because it wants the device to enter its low
> power state as soon as possible. In other words, there is no need to
> trade power for performance, either because power is more important
> than performance or because the wakeup latency is negligible. To me, I
> think it seems reasonable to respect this policy for parents/suppliers
> as well instead of always punting to a work.

Thanks for the detailed explanation. It looks that is should be safe to
use sync for both links and parents in all cases...

Best regards
Ulf Hansson March 27, 2017, 12:05 p.m. UTC | #7
On 24 March 2017 at 12:37, Lukas Wunner <lukas@wunner.de> wrote:
> On Thu, Mar 23, 2017 at 01:47:46PM +0100, Ulf Hansson wrote:
>> For the sync runtime PM suspend/idle case:
>> When a driver for a children/consumers decides to use the sync runtime
>> PM APIs, it's most likely because it wants the device to enter its low
>> power state as soon as possible.
>
> I think a driver is doing it wrong if it uses sync vs. async to achieve
> a certain policy.  (As in the example you've given above.)  That should
> be left to the PM core, and the PM core should try to suspend as early
> and as often as possible in order to save power.  If the driver wants to

I am not really sure what you mean by this.

The runtime PM APIs is what we have today and those are of course used
a bit differently depending on the client. I was trying to summarize
my impressions from those that using the asynchronous interface.

> achieve a policy such as "suspend this device ASAP but let this other
> device idle a bit before suspend because its usage pattern is bursty"
> then the correct way to implement this is via a sane default value for
> the autosuspend delay.

The runtime PM autosuspend feature is per definition asynchronous. And
since user space anyway can change the value of the timeout, I don't
think it make sense to consider this as separate case within this
context.

>
> Generelly, async should be used.  One example where sync is warranted
> is when the ->runtime_suspend hook needs to be finished before power
> is cut to the device.  That's a hard constraint.  E.g. vga_switcheroo
> does this when runtime suspending the GPU.  In this case, "runtime
> suspend" means that the driver puts the device in a state from which
> power can be cut.  The actual power switch is toggled afterwards.

No, this isn't a valid case. If this how vga_switcheroo deploys
runtime PM supports this surely needs to be fixed!

A call to pm_runtime_put_sync() may not trigger the runtime suspend
callback to be invoked, because userspace via sysfs can prevent that.

Kind regards
Uffe
diff mbox

Patch

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 7bcf80fa9ada..b7df589ae579 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -292,7 +292,7 @@  static void rpm_put_suppliers(struct device *dev)
 	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
 		if (link->rpm_active &&
 		    READ_ONCE(link->status) != DL_STATE_SUPPLIER_UNBIND) {
-			pm_runtime_put(link->supplier);
+			pm_runtime_put_sync(link->supplier);
 			link->rpm_active = false;
 		}
 }