
[v2,0/5] PM: Fixes for Realtime systems

Message ID 20221219151503.385816-1-krzysztof.kozlowski@linaro.org
Series PM: Fixes for Realtime systems

Message

Krzysztof Kozlowski Dec. 19, 2022, 3:14 p.m. UTC
Hi,

The goal is to make the Linux kernel PM / PM domains / cpuidle code friendlier
for Realtime systems (PREEMPT_RT). PREEMPT_RT turns regular spinlocks into
sleeping primitives, so other parts of the code must be ready for that.
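The rule this series works around can be summarised as a small lookup. This is a minimal user-space sketch under a simplified model with only three lock kinds; the enum and function names are hypothetical, not kernel APIs, and it ignores lockdep's finer wait-type rules and other PREEMPT_RT details.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy classification of kernel lock types; user-space model, not kernel code. */
enum lock_kind { LOCK_MUTEX, LOCK_SPINLOCK, LOCK_RAW_SPINLOCK };

/* Returns true if acquiring a lock of this kind may put the caller to sleep. */
static bool lock_may_sleep(enum lock_kind kind, bool preempt_rt)
{
	switch (kind) {
	case LOCK_MUTEX:
		return true;            /* may sleep with or without PREEMPT_RT */
	case LOCK_SPINLOCK:
		return preempt_rt;      /* spinlock_t becomes a sleeping lock on RT */
	case LOCK_RAW_SPINLOCK:
		return false;           /* raw_spinlock_t always busy-waits */
	}
	assert(!"unknown lock kind");
	return true;
}

/*
 * Simplified rule: code running with preemption disabled (for example while
 * holding a raw_spinlock_t) may only take locks that never sleep.
 */
static bool allowed_in_atomic(enum lock_kind kind, bool preempt_rt)
{
	return !lock_may_sleep(kind, preempt_rt);
}
```

The interesting row is LOCK_SPINLOCK: the same spinlock_t that is fine under a raw spinlock on a regular kernel becomes forbidden there once PREEMPT_RT is enabled.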

Changes since v1
================
1. Patch #1: Add missing WARN for parent domain
2. New patches 3-5 for other issues encountered with PREEMPT_RT.

Best regards,
Krzysztof

---

Cc: Adrien Thierry <athierry@redhat.com>
Cc: Brian Masney <bmasney@redhat.com>
Cc: linux-rt-users@vger.kernel.org

Krzysztof Kozlowski (5):
  PM: domains: Add GENPD_FLAG_RT_SAFE for PREEMPT_RT
  cpuidle: psci: Mark as PREEMPT_RT safe
  cpuidle: psci: Do not suspend topology CPUs on PREEMPT_RT
  PM: Allow calling dev_pm_domain_set() with raw spinlock
  PM: domains: Do not call device_pm_check_callbacks() when holding
    genpd_lock()

 drivers/base/power/common.c           | 27 ++++++++++-
 drivers/base/power/domain.c           | 65 +++++++++++++++++++++++++--
 drivers/cpuidle/cpuidle-psci-domain.c |  3 +-
 drivers/cpuidle/cpuidle-psci.c        |  4 +-
 include/linux/pm_domain.h             | 16 +++++++
 5 files changed, 107 insertions(+), 8 deletions(-)
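The idea behind patch #1 (a flag that makes a PM domain use a lock that never sleeps, plus the v2 WARN for the parent domain) can be illustrated with a hypothetical user-space sketch. The SKETCH_* flags, the struct and both functions are made up for illustration; the flag-to-lock mapping and the parent rule are assumptions modelled on the existing IRQ-safe subdomain check in genpd, not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch only, not the real GENPD_FLAG_* values. */
#define SKETCH_FLAG_IRQ_SAFE (1u << 0)
#define SKETCH_FLAG_RT_SAFE  (1u << 1)

/* Lock primitive backing a domain, ordered from "may sleep" to "never sleeps". */
enum sketch_lock_type {
	SKETCH_MUTEX = 0,
	SKETCH_SPINLOCK = 1,      /* sleeps on PREEMPT_RT */
	SKETCH_RAW_SPINLOCK = 2,  /* never sleeps, even on PREEMPT_RT */
};

struct sketch_genpd {
	const char *name;
	unsigned int flags;
	enum sketch_lock_type lock_type;
};

/* Pick the lock primitive from the flags, most restrictive first. */
static void sketch_genpd_init(struct sketch_genpd *pd, const char *name,
			      unsigned int flags)
{
	assert(pd && name);
	pd->name = name;
	pd->flags = flags;
	if (flags & SKETCH_FLAG_RT_SAFE)
		pd->lock_type = SKETCH_RAW_SPINLOCK;
	else if (flags & SKETCH_FLAG_IRQ_SAFE)
		pd->lock_type = SKETCH_SPINLOCK;
	else
		pd->lock_type = SKETCH_MUTEX;
}

/*
 * Powering off a subdomain takes the parent's lock while the child's lock is
 * still held, so the parent's lock must be at least as non-sleeping as the
 * child's. Returns false where, in this model, the kernel would WARN.
 */
static bool sketch_add_subdomain(const struct sketch_genpd *parent,
				 const struct sketch_genpd *child)
{
	return parent->lock_type >= child->lock_type;
}
```

Encoding the lock types in increasing order lets one comparison cover both the IRQ-safe and the RT-safe parent rule.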

Comments

Adrien Thierry Dec. 20, 2022, 9:36 p.m. UTC | #1
Hi Krzysztof,
Thanks for looking into this!

I tested your patchset on the QDrive3 on a CentOS Stream 9 RT kernel (I
couldn't test it on mainline because the latest RT patchset only supports
6.1, which is missing some bits needed to boot QDrive3).

It fixes the PSCI cpuidle issue I was encountering in [1]. However, I may
have found another code path that triggers a similar issue:

BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 113, name: kworker/4:2
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
4 locks held by kworker/4:2/113:
 #0: ffff09b0c2376928 ((wq_completion)pm){+.+.}-{0:0}, at: process_one_work+0x1f4/0x7c0
 #1: ffff800008bf3dd0 ((work_completion)(&genpd->power_off_work)){+.+.}-{0:0}, at: process_one_work+0x1f4/0x7c0
 #2: ffff09b0c2e44860 (&genpd->rslock){....}-{2:2}, at: genpd_lock_rawspin+0x20/0x30
 #3: ffff09b0c6696a20 (&dev->power.lock){+.+.}-{2:2}, at: dev_pm_qos_flags+0x2c/0x60
irq event stamp: 170
hardirqs last  enabled at (169): [<ffffa1be822f8a78>] _raw_spin_unlock_irq+0x48/0xc4
hardirqs last disabled at (170): [<ffffa1be822f8df4>] _raw_spin_lock_irqsave+0xb0/0xfc
softirqs last  enabled at (0): [<ffffa1be814cfff0>] copy_process+0x68c/0x1500
softirqs last disabled at (0): [<0000000000000000>] 0x0
Preemption disabled at:
[<ffffa1be81d7e620>] genpd_lock_rawspin+0x20/0x30
CPU: 4 PID: 113 Comm: kworker/4:2 Tainted: G               X --------- ---  5.14.0-rt14+ #2
Hardware name: Qualcomm SA8540 ADP (DT)
Workqueue: pm genpd_power_off_work_fn
Call trace:
 dump_backtrace+0xb4/0x12c
 show_stack+0x1c/0x70
 dump_stack_lvl+0x98/0xd0
 dump_stack+0x14/0x2c
 __might_resched+0x180/0x220
 rt_spin_lock+0x74/0x11c
 dev_pm_qos_flags+0x2c/0x60
 genpd_power_off.part.0.isra.0+0xac/0x2d0
 genpd_power_off_work_fn+0x68/0x8c
 process_one_work+0x2b8/0x7c0
 worker_thread+0x15c/0x44c
 kthread+0xf8/0x104
 ret_from_fork+0x10/0x20

This happens consistently during boot. But on the mainline kernel, this
code path has changed: genpd_power_off no longer calls dev_pm_qos_flags.
So it might not happen on mainline. I hope to be able to test your
patchset again soon on mainline with the next version of the RT patchset
(which should be able to boot the QDrive3).
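The failing path in the trace can be reduced to a toy model: the domain's raw spinlock (rslock) disables preemption, and dev_pm_qos_flags() then takes dev->power.lock, which is a sleeping lock on PREEMPT_RT. This is a hypothetical user-space simulation (every name in it is made up for illustration) showing why dropping the dev_pm_qos_flags() call from the power-off path, as mainline did, removes the violation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model of the PREEMPT_RT might-sleep check; not kernel code. */
static int preempt_count;   /* > 0 while a raw spinlock is held */
static int invalid_sleeps;  /* violations seen so far */

static void raw_lock(void)   { preempt_count++; }
static void raw_unlock(void) { assert(preempt_count > 0); preempt_count--; }

/* spinlock_t is a sleeping lock on PREEMPT_RT, so it must not be taken
 * while preemption is disabled. */
static void rt_sleeping_lock(const char *name)
{
	if (preempt_count > 0) {
		invalid_sleeps++;
		printf("BUG: sleeping lock %s taken with preempt_count=%d\n",
		       name, preempt_count);
	}
}

/* Stand-in for dev_pm_qos_flags(), which takes dev->power.lock. */
static void qos_flags_model(void)
{
	rt_sleeping_lock("dev->power.lock");   /* unlock omitted in the model */
}

/* Stand-in for genpd_power_off() on a domain whose lock is a raw spinlock. */
static void power_off_model(bool checks_qos_flags)
{
	raw_lock();                  /* like genpd_lock_rawspin() in the trace */
	if (checks_qos_flags)
		qos_flags_model();   /* older kernels, e.g. the 5.14-rt one above */
	raw_unlock();
}
```

Calling power_off_model(true) records one violation, mirroring the splat; power_off_model(false) records none.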

Best,
Adrien

[1] https://lore.kernel.org/all/20220615203605.1068453-1-athierry@redhat.com/

Ulf Hansson Jan. 4, 2023, 3:15 p.m. UTC | #2
On Tue, 20 Dec 2022 at 22:36, Adrien Thierry <athierry@redhat.com> wrote:
> This happens consistently during boot. But on the mainline kernel, this
> code path has changed: genpd_power_off no longer calls dev_pm_qos_flags.
> So it might not happen on mainline. I hope to be able to test your
> patchset again soon on mainline with the next version of the RT patchset
> (which should be able to boot the QDrive3).

You are right: since commit 3f9ee7da724a ("PM: domains: Don't check
PM_QOS_FLAG_NO_POWER_OFF in genpd"), dev_pm_qos_flags() is no longer
called from genpd_power_off(). That patch was introduced in v5.19.

Kind regards
Uffe