diff mbox series

[RFC,01/10] PM: runtime: Add helpers to resume consumers

Message ID 20250326-cross-lock-dep-v1-1-3199e49e8652@bootlin.com (mailing list archive)
State Under Review
Headers show
Series Fix the ABBA locking situation between clk and runtime PM | expand

Commit Message

Miquel Raynal March 26, 2025, 6:26 p.m. UTC
The runtime PM core currently allows to runtime resume/suspend a device,
or its suppliers.

Let's make it also possible to runtime resume/suspend consumers.

Consumers and suppliers are seen here through the description made by
device_links.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 drivers/base/power/runtime.c | 54 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm_runtime.h   |  2 ++
 2 files changed, 56 insertions(+)

Comments

Rafael J. Wysocki March 26, 2025, 7:18 p.m. UTC | #1
On Wed, Mar 26, 2025 at 7:26 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> The runtime PM core currently allows to runtime resume/suspend a device,
> or its suppliers.
>
> Let's make it also possible to runtime resume/suspend consumers.
>
> Consumers and suppliers are seen here through the description made by
> device_links.

It would be good to explain why all of this is needed.

I gather that it is used for resolving some synchronization issues in
the clk framework, but neither the cover letter nor this changelog
explains how it is used.

> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> ---
>  drivers/base/power/runtime.c | 54 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/pm_runtime.h   |  2 ++
>  2 files changed, 56 insertions(+)
>
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 2ee45841486bc73225b3e971164466647b3ce6d3..04bb66c18e3e4a45751fb3f9a6a1267d73757310 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -1841,6 +1841,60 @@ void pm_runtime_put_suppliers(struct device *dev)
>         device_links_read_unlock(idx);
>  }
>
> +static void __pm_runtime_get_consumers(struct device *dev)
> +{
> +       struct device_link *link;
> +
> +       list_for_each_entry_rcu(link, &dev->links.consumers, s_node,
> +                               device_links_read_lock_held())
> +               if (link->flags & DL_FLAG_PM_RUNTIME) {
> +                       pm_runtime_get_sync(link->consumer);
> +                       __pm_runtime_get_consumers(link->consumer);
> +               }
> +}
> +
> +/**
> + * pm_runtime_get_consumers - Resume and reference-count consumer devices.
> + * @dev: Supplier device.
> + */
> +void pm_runtime_get_consumers(struct device *dev)
> +{
> +       int idx;
> +
> +       idx = device_links_read_lock();
> +
> +       __pm_runtime_get_consumers(dev);
> +
> +       device_links_read_unlock(idx);
> +}
> +
> +static void __pm_runtime_put_consumers(struct device *dev)
> +{
> +       struct device_link *link;
> +
> +       list_for_each_entry_rcu(link, &dev->links.consumers, s_node,
> +                               device_links_read_lock_held())
> +               if (link->flags & DL_FLAG_PM_RUNTIME) {
> +                       pm_runtime_put(link->consumer);
> +                       __pm_runtime_put_consumers(link->consumer);
> +               }
> +}
> +
> +/**
> + * pm_runtime_put_consumers - Drop references to consumer devices.
> + * @dev: Supplier device.
> + */
> +void pm_runtime_put_consumers(struct device *dev)
> +{
> +       int idx;
> +
> +       idx = device_links_read_lock();
> +
> +       __pm_runtime_put_consumers(dev);
> +
> +       device_links_read_unlock(idx);
> +}
> +
>  void pm_runtime_new_link(struct device *dev)
>  {
>         spin_lock_irq(&dev->power.lock);
> diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
> index d39dc863f612fe18dc34182117f87908d63c8e6d..151c885a3f421f09509232f144618da62297d61d 100644
> --- a/include/linux/pm_runtime.h
> +++ b/include/linux/pm_runtime.h
> @@ -89,6 +89,8 @@ extern u64 pm_runtime_autosuspend_expiration(struct device *dev);
>  extern void pm_runtime_set_memalloc_noio(struct device *dev, bool enable);
>  extern void pm_runtime_get_suppliers(struct device *dev);
>  extern void pm_runtime_put_suppliers(struct device *dev);
> +extern void pm_runtime_get_consumers(struct device *dev);
> +extern void pm_runtime_put_consumers(struct device *dev);
>  extern void pm_runtime_new_link(struct device *dev);
>  extern void pm_runtime_drop_link(struct device_link *link);
>  extern void pm_runtime_release_supplier(struct device_link *link);
>
> --
> 2.48.1
>
Miquel Raynal March 28, 2025, 9:59 a.m. UTC | #2
Hello Rafael,

>> The runtime PM core currently allows to runtime resume/suspend a device,
>> or its suppliers.
>>
>> Let's make it also possible to runtime resume/suspend consumers.
>>
>> Consumers and suppliers are seen here through the description made by
>> device_links.
>
> It would be good to explain why all of this is needed.
>
> I gather that it is used for resolving some synchronization issues in
> the clk framework, but neither the cover letter nor this changelog
> explains how it is used.

The explanation is quite long, there have been already 3 full threads
from people attempting to fix a problem that resides in the clock
subsystem (but that may also be probably problematic in others, just
uncovered so far). I don't know if you took the time to read the cover
letter:
https://lore.kernel.org/linux-clk/20250326-cross-lock-dep-v1-0-3199e49e8652@bootlin.com/
It tries to explain the problem and the approach to fix this problem,
but let me try to give a runtime PM focused view of it here.

[Problem]

We do have an ABBA locking situation between clk and any other subsystem
that might be in use during runtime_resume() operations, provided that
these subsystems also make clk calls at some point. The usual suspect
here are power domains.

There are different approaches that can be taken but the one that felt
the most promising when we discussed it during last LPC (and also the
one that was partially implemented in the clk subsystem already for a
tiny portion of it) is the rule that "subsystem locks should not be kept
acquired while calling in some other subsystems".

Typically in the clk subsystem the logic is:

func() {
        mutex_lock(clk);
        runtime_resume(clk);
        ...
}

Whereas what would definitely work without locking issues is the
opposite:

func() {
        runtime_resume(clk);
        mutex_lock(clk);
        ...
}

Of course life is not so simple, and the clock core is highly
recursive, which means inverting the two calls like I hinted above
simply does not work as we go deeper in the subcalls. As a result, we
need to runtime resume *all* the relevant clocks in advance, before
calling functions recursively (the lock itself is allowed to re-enter
and is not blocking in this case).

I followed all possible paths in the clock subsystem and identified 3
main categories. The list of clocks we need to runtime resume in advance
can either be:
1- the parent clocks
2- the child clocks
3- the parent and child clocks
4- all the clocks (typically for debugfs/sysfs purposes).

[Solution 1: discarded]

The first approach to do that was do to some guessing based on the clock
tree topology. Unfortunately this approach does not stand because it is
virtually unbounded. In order to know the clock topology we must acquire
the clock main lock. In order to runtime resume we must release it. As a
result, this logic is virtually unbounded (even though in practice we
would converge at some point). So this approach was discarded by Steven.

[Solution 2: this proposal]

After the LPC discussion with Steven, I also discussed with Saravana
about this and he pointed that since we were using fw_devlink=rpm by
default now, all providers -including clock controllers of course- would
already be runtime resumed the first time we would make a
runtime_resume(clk), and thus all the nested calls were no longer
needed. This native solution was already addressing point #1 above (and
partially point #3) and all I had to do was to make a similar function
for point #2.

And here we are, trying to resume all consumers (from a device link
perspective) which include, but is not limited to, consumer clocks.

I hope this explanation will help understanding this patch and why it is
needed for this series. As stated in the cover letter, I've tried to
keep the changes here to their minimum. Maybe there are other/better
ways to do that and we can discuss them. My priority is however to get
this possible ABBA deadlock situation sorted out.

I can further expand the commit log with these details if you want.

Thanks,
Miquèl
diff mbox series

Patch

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 2ee45841486bc73225b3e971164466647b3ce6d3..04bb66c18e3e4a45751fb3f9a6a1267d73757310 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -1841,6 +1841,60 @@  void pm_runtime_put_suppliers(struct device *dev)
 	device_links_read_unlock(idx);
 }
 
+static void __pm_runtime_get_consumers(struct device *dev)
+{
+	struct device_link *link;
+
+	list_for_each_entry_rcu(link, &dev->links.consumers, s_node,
+				device_links_read_lock_held())
+		if (link->flags & DL_FLAG_PM_RUNTIME) {
+			pm_runtime_get_sync(link->consumer);
+			__pm_runtime_get_consumers(link->consumer);
+		}
+}
+
+/**
+ * pm_runtime_get_consumers - Resume and reference-count consumer devices.
+ * @dev: Supplier device.
+ */
+void pm_runtime_get_consumers(struct device *dev)
+{
+	int idx;
+
+	idx = device_links_read_lock();
+
+	__pm_runtime_get_consumers(dev);
+
+	device_links_read_unlock(idx);
+}
+
+static void __pm_runtime_put_consumers(struct device *dev)
+{
+	struct device_link *link;
+
+	list_for_each_entry_rcu(link, &dev->links.consumers, s_node,
+				device_links_read_lock_held())
+		if (link->flags & DL_FLAG_PM_RUNTIME) {
+			pm_runtime_put(link->consumer);
+			__pm_runtime_put_consumers(link->consumer);
+		}
+}
+
+/**
+ * pm_runtime_put_consumers - Drop references to consumer devices.
+ * @dev: Supplier device.
+ */
+void pm_runtime_put_consumers(struct device *dev)
+{
+	int idx;
+
+	idx = device_links_read_lock();
+
+	__pm_runtime_put_consumers(dev);
+
+	device_links_read_unlock(idx);
+}
+
 void pm_runtime_new_link(struct device *dev)
 {
 	spin_lock_irq(&dev->power.lock);
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index d39dc863f612fe18dc34182117f87908d63c8e6d..151c885a3f421f09509232f144618da62297d61d 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -89,6 +89,8 @@  extern u64 pm_runtime_autosuspend_expiration(struct device *dev);
 extern void pm_runtime_set_memalloc_noio(struct device *dev, bool enable);
 extern void pm_runtime_get_suppliers(struct device *dev);
 extern void pm_runtime_put_suppliers(struct device *dev);
+extern void pm_runtime_get_consumers(struct device *dev);
+extern void pm_runtime_put_consumers(struct device *dev);
 extern void pm_runtime_new_link(struct device *dev);
 extern void pm_runtime_drop_link(struct device_link *link);
 extern void pm_runtime_release_supplier(struct device_link *link);