mbox series

[RFC,00/10] Fix the ABBA locking situation between clk and runtime PM

Message ID 20250326-cross-lock-dep-v1-0-3199e49e8652@bootlin.com (mailing list archive)
Headers show
Series Fix the ABBA locking situation between clk and runtime PM | expand

Message

Miquel Raynal March 26, 2025, 6:26 p.m. UTC
As explained in the following thread, there is a known ABBA locking
dependency between clk and runtime PM.
Link: https://lore.kernel.org/linux-clk/20240527181928.4fc6b5f0@xps-13/

The problem is that the clk subsystem uses a mutex to protect concurrent
accesses to its tree structure, and so do other subsystems such as
generic power domains. While it holds its own mutex, the clk subsystem
performs runtime PM calls which end up executing callbacks from other
subsystems (again, gen PD is in the loop). But typically power domains
may also need to perform clock related operations, and thus the
following two situations may happen:

mutex_lock(clk);
mutex_lock(genpd);

or

mutex_lock(genpd);
mutex_lock(clk);

As of today I know that at least NXP i.MX8MP and MediaTek MT8183 SoCs
are complex enough to face this kind of issues.

There's been a first workaround to "silence" lockdep with the most
obvious case triggering the warning: making sure all clocks are RPM
enabled before running the clk_disable_unused() work, but this is just
addressing one situation among many other potentially problematic
situations. In the past, both Laurent Pinchart and Marek Vasut have
experienced these issues when enabling HDMI and audio support,
respectively.

Following a discussion we had at last Plumbers with Steven, I am
proposing to decouple both locks by changing a bit the clk approach:
let's always runtime resume all clocks that we *might* need before
taking the clock lock. But how do we know the list? Well, depending on
the situation we may either need to wake up:
- the upper part of the tree during prepare/unprepare operations.
- the lower part of the tree during (read) rate operations.
- the upper part and the lower part of the tree otherwise (especially
  during rate changes which may involve reparenting).

Luckily, we do not need to do that by hand, are more importantly we do
not need to use the clock tree for that because thanks to the work from
Saravana, we already have device links describing exhaustively the
consumer/supplier relationships. The clock topology (from a runtime PM
perspective) is reflected in these links. In practice, we do not care
about all consumers, but the few clock operations that will actually
trigger runtime PM operations are probably not impacting enough to
justify something more complex.

So here it is: every patch in this series decouples the two locks in
various places of the clock subsystem, until we reach a point where all
needed clocks will always be resumed before acquiring the core lock. It
obviously requires a few new helpers in the RPM core which may probably
be enhanced, I've tried to keep them as simple and straightforward as
possible.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
Miquel Raynal (10):
      PM: runtime: Add helpers to resume consumers
      clk: Improve comments with usual punctuation
      clk: Avoid non needed runtime PM calls
      clk: Avoid open coded-logic when a there is a helper available
      clk: Move runtime PM calls out of the prepare_lock in clk_init()
      clk: Move runtime PM calls out of the prepare_lock in clk_prepare()
      clk: Ensure all RPM enabled clocks are enabled before reparenting orphans
      clk: Move runtime PM calls out of the prepare_lock in clk_unregister()
      clk: Make sure clock parents and children are resumed when necessary
      clk: Fix the ABBA locking issue with runtime PM subcalls

 drivers/base/power/runtime.c |  54 ++++++++++++
 drivers/clk/clk.c            | 204 +++++++++++++++++++++++++++++++------------
 include/linux/pm_runtime.h   |   2 +
 3 files changed, 204 insertions(+), 56 deletions(-)
---
base-commit: ab6df33805e6e6e4ac1a519cfcade3f7f19f6ff1
change-id: 20250307-cross-lock-dep-e65f285ce8e1

Best regards,