Message ID | 20230112194314.845371875@infradead.org (mailing list archive) |
---|---|
Headers | show |
Series | cpuidle,rcu: Clean up the mess | expand |
On Thu, Jan 12, 2023 at 08:43:14PM +0100, Peter Zijlstra wrote: > Hi All! > > The (hopefully) final respin of cpuidle vs rcu cleanup patches. Barring any > objections I'll be queueing these patches in tip/sched/core in the next few > days. > > v2: https://lkml.kernel.org/r/20220919095939.761690562@infradead.org > > These here patches clean up the mess that is cpuidle vs rcuidle. > > At the end of the ride there's only on RCU_NONIDLE user left: > > arch/arm64/kernel/suspend.c: RCU_NONIDLE(__cpu_suspend_exit()); > > And I know Mark has been prodding that with something sharp. > > The last version was tested by a number of people and I'm hoping to not have > broken anything in the meantime ;-) > > > Changes since v2: 150 rcutorture hours on each of the default scenarios passed. This is qemu/KVM on x86: Tested-by: Paul E. McKenney <paulmck@kernel.org> > - rebased to v6.2-rc3; as available at: > git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/idle > > - folded: https://lkml.kernel.org/r/Y3UBwYNY15ETUKy9@hirez.programming.kicks-ass.net > which makes the ARM cpuidle index 0 consistently not use > CPUIDLE_FLAG_RCU_IDLE, as requested by Ulf. > > - added a few more __always_inline to empty stub functions as found by the > robot. > > - Used _RET_IP_ instead of _THIS_IP_ in a few placed because of: > https://github.com/ClangBuiltLinux/linux/issues/263 > > - Added new patches to address various robot reports: > > #35: trace,hardirq: No moar _rcuidle() tracing > #47: cpuidle: Ensure ct_cpuidle_enter() is always called from noinstr/__cpuidle > #48: cpuidle,arch: Mark all ct_cpuidle_enter() callers __cpuidle > #49: cpuidle,arch: Mark all regular cpuidle_state::enter methods __cpuidle > #50: cpuidle: Comments about noinstr/__cpuidle > #51: context_tracking: Fix noinstr vs KASAN > > > --- > arch/alpha/kernel/process.c | 1 - > arch/alpha/kernel/vmlinux.lds.S | 1 - > arch/arc/kernel/process.c | 3 ++ > arch/arc/kernel/vmlinux.lds.S | 1 - > arch/arm/include/asm/vmlinux.lds.h | 1 - > arch/arm/kernel/cpuidle.c | 4 +- > arch/arm/kernel/process.c | 1 - > arch/arm/kernel/smp.c | 6 +-- > arch/arm/mach-davinci/cpuidle.c | 4 +- > arch/arm/mach-gemini/board-dt.c | 3 +- > arch/arm/mach-imx/cpuidle-imx5.c | 4 +- > arch/arm/mach-imx/cpuidle-imx6q.c | 8 ++-- > arch/arm/mach-imx/cpuidle-imx6sl.c | 4 +- > arch/arm/mach-imx/cpuidle-imx6sx.c | 9 ++-- > arch/arm/mach-imx/cpuidle-imx7ulp.c | 4 +- > arch/arm/mach-omap2/common.h | 6 ++- > arch/arm/mach-omap2/cpuidle34xx.c | 16 ++++++- > arch/arm/mach-omap2/cpuidle44xx.c | 29 +++++++------ > arch/arm/mach-omap2/omap-mpuss-lowpower.c | 12 +++++- > arch/arm/mach-omap2/pm.h | 2 +- > arch/arm/mach-omap2/pm24xx.c | 51 +--------------------- > arch/arm/mach-omap2/pm34xx.c | 14 +++++-- > arch/arm/mach-omap2/pm44xx.c | 2 +- > arch/arm/mach-omap2/powerdomain.c | 10 ++--- > arch/arm/mach-s3c/cpuidle-s3c64xx.c | 5 +-- > arch/arm64/kernel/cpuidle.c | 2 +- > arch/arm64/kernel/idle.c | 1 - > arch/arm64/kernel/smp.c | 4 +- > arch/arm64/kernel/vmlinux.lds.S | 1 - > arch/csky/kernel/process.c | 1 - > arch/csky/kernel/smp.c | 2 +- > arch/csky/kernel/vmlinux.lds.S | 1 - > arch/hexagon/kernel/process.c | 1 - > arch/hexagon/kernel/vmlinux.lds.S | 1 - > arch/ia64/kernel/process.c | 1 + > arch/ia64/kernel/vmlinux.lds.S | 1 - > arch/loongarch/kernel/idle.c | 1 + > arch/loongarch/kernel/vmlinux.lds.S | 1 - > arch/m68k/kernel/vmlinux-nommu.lds | 1 - > arch/m68k/kernel/vmlinux-std.lds | 1 - > arch/m68k/kernel/vmlinux-sun3.lds | 1 - > arch/microblaze/kernel/process.c | 1 - > arch/microblaze/kernel/vmlinux.lds.S | 1 - > arch/mips/kernel/idle.c | 14 +++---- > arch/mips/kernel/vmlinux.lds.S | 1 - > arch/nios2/kernel/process.c | 1 - > arch/nios2/kernel/vmlinux.lds.S | 1 - > arch/openrisc/kernel/process.c | 1 + > arch/openrisc/kernel/vmlinux.lds.S | 1 - > arch/parisc/kernel/process.c | 2 - > arch/parisc/kernel/vmlinux.lds.S | 1 - > arch/powerpc/kernel/idle.c | 5 +-- > arch/powerpc/kernel/vmlinux.lds.S | 1 - > arch/riscv/kernel/process.c | 1 - > arch/riscv/kernel/vmlinux-xip.lds.S | 1 - > arch/riscv/kernel/vmlinux.lds.S | 1 - > arch/s390/kernel/idle.c | 1 - > arch/s390/kernel/vmlinux.lds.S | 1 - > arch/sh/kernel/idle.c | 1 + > arch/sh/kernel/vmlinux.lds.S | 1 - > arch/sparc/kernel/leon_pmc.c | 4 ++ > arch/sparc/kernel/process_32.c | 1 - > arch/sparc/kernel/process_64.c | 3 +- > arch/sparc/kernel/vmlinux.lds.S | 1 - > arch/um/kernel/dyn.lds.S | 1 - > arch/um/kernel/process.c | 1 - > arch/um/kernel/uml.lds.S | 1 - > arch/x86/boot/compressed/vmlinux.lds.S | 1 + > arch/x86/coco/tdx/tdcall.S | 15 +------ > arch/x86/coco/tdx/tdx.c | 25 ++++------- > arch/x86/events/amd/brs.c | 13 +++--- > arch/x86/include/asm/fpu/xcr.h | 4 +- > arch/x86/include/asm/irqflags.h | 11 ++--- > arch/x86/include/asm/mwait.h | 14 +++---- > arch/x86/include/asm/nospec-branch.h | 2 +- > arch/x86/include/asm/paravirt.h | 6 ++- > arch/x86/include/asm/perf_event.h | 2 +- > arch/x86/include/asm/shared/io.h | 4 +- > arch/x86/include/asm/shared/tdx.h | 1 - > arch/x86/include/asm/special_insns.h | 8 ++-- > arch/x86/include/asm/xen/hypercall.h | 2 +- > arch/x86/kernel/cpu/bugs.c | 2 +- > arch/x86/kernel/fpu/core.c | 4 +- > arch/x86/kernel/paravirt.c | 14 ++++++- > arch/x86/kernel/process.c | 65 ++++++++++++++-------------- > arch/x86/kernel/vmlinux.lds.S | 1 - > arch/x86/lib/memcpy_64.S | 5 +-- > arch/x86/lib/memmove_64.S | 4 +- > arch/x86/lib/memset_64.S | 4 +- > arch/x86/xen/enlighten_pv.c | 2 +- > arch/x86/xen/irq.c | 2 +- > arch/xtensa/kernel/process.c | 1 + > arch/xtensa/kernel/vmlinux.lds.S | 1 - > drivers/acpi/processor_idle.c | 28 ++++++++----- > drivers/base/power/runtime.c | 24 +++++------ > drivers/clk/clk.c | 8 ++-- > drivers/cpuidle/cpuidle-arm.c | 4 +- > drivers/cpuidle/cpuidle-big_little.c | 12 ++++-- > drivers/cpuidle/cpuidle-mvebu-v7.c | 13 ++++-- > drivers/cpuidle/cpuidle-psci.c | 26 +++++------- > drivers/cpuidle/cpuidle-qcom-spm.c | 4 +- > drivers/cpuidle/cpuidle-riscv-sbi.c | 19 +++++---- > drivers/cpuidle/cpuidle-tegra.c | 31 +++++++++----- > drivers/cpuidle/cpuidle.c | 70 ++++++++++++++++++++++--------- > drivers/cpuidle/dt_idle_states.c | 2 +- > drivers/cpuidle/poll_state.c | 10 ++++- > drivers/idle/intel_idle.c | 19 ++++----- > drivers/perf/arm_pmu.c | 11 +---- > drivers/perf/riscv_pmu_sbi.c | 8 +--- > include/asm-generic/vmlinux.lds.h | 9 ++-- > include/linux/clockchips.h | 4 +- > include/linux/compiler_types.h | 18 +++++++- > include/linux/cpu.h | 3 -- > include/linux/cpuidle.h | 32 ++++++++++++++ > include/linux/cpumask.h | 4 +- > include/linux/percpu-defs.h | 2 +- > include/linux/sched/idle.h | 40 +++++++++++++----- > include/linux/thread_info.h | 18 +++++++- > include/linux/tracepoint.h | 15 ++++++- > kernel/context_tracking.c | 12 +++--- > kernel/cpu_pm.c | 9 ---- > kernel/printk/printk.c | 2 +- > kernel/sched/idle.c | 47 ++++++--------------- > kernel/time/tick-broadcast-hrtimer.c | 29 ++++++------- > kernel/time/tick-broadcast.c | 6 ++- > kernel/trace/trace.c | 3 ++ > kernel/trace/trace_preemptirq.c | 50 ++++++---------------- > lib/ubsan.c | 5 ++- > mm/kasan/kasan.h | 4 ++ > mm/kasan/shadow.c | 38 +++++++++++++++++ > tools/objtool/check.c | 17 ++++++++ > 131 files changed, 617 insertions(+), 523 deletions(-) >
On Thu, Jan 12, 2023 at 08:43:14PM +0100, Peter Zijlstra wrote: > Hi All! Hi Peter, > The (hopefully) final respin of cpuidle vs rcu cleanup patches. Barring any > objections I'll be queueing these patches in tip/sched/core in the next few > days. I'm sorry to have to bear some bad news on that front. :( I just had a go at testing this on a Juno dev board, using your queue.git sched/idle branch and defconfig + CONFIG_PROVE_LOCKING=y + CONFIG_DEBUG_LOCKDEP=y + CONFIG_DEBUG_ATOMIC_SLEEP=y. With that I consistently see RCU at boot time (log below). | ============================= | WARNING: suspicious RCU usage | 6.2.0-rc3-00051-gced9b6eecb31 #1 Not tainted | ----------------------------- | include/trace/events/ipi.h:19 suspicious rcu_dereference_check() usage! | | other info that might help us debug this: | | | rcu_scheduler_active = 2, debug_locks = 1 | RCU used illegally from extended quiescent state! | no locks held by swapper/0/0. | | stack backtrace: | CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.0-rc3-00051-gced9b6eecb31 #1 | Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II May 16 2021 | Call trace: | dump_backtrace.part.0+0xe4/0xf0 | show_stack+0x18/0x30 | dump_stack_lvl+0x98/0xd4 | dump_stack+0x18/0x34 | lockdep_rcu_suspicious+0xf8/0x10c | trace_ipi_raise+0x1a8/0x1b0 | arch_irq_work_raise+0x4c/0x70 | __irq_work_queue_local+0x48/0x80 | irq_work_queue+0x50/0x80 | __wake_up_klogd.part.0+0x98/0xe0 | defer_console_output+0x20/0x30 | vprintk+0x98/0xf0 | _printk+0x5c/0x84 | lockdep_rcu_suspicious+0x34/0x10c | trace_lock_acquire+0x174/0x180 | lock_acquire+0x3c/0x8c | _raw_spin_lock_irqsave+0x70/0x150 | down_trylock+0x18/0x50 | __down_trylock_console_sem+0x3c/0xd0 | console_trylock+0x28/0x90 | vprintk_emit+0x11c/0x354 | vprintk_default+0x38/0x4c | vprintk+0xd4/0xf0 | _printk+0x5c/0x84 | lockdep_rcu_suspicious+0x34/0x10c | printk_sprint+0x238/0x240 | vprintk_store+0x32c/0x4b0 | vprintk_emit+0x104/0x354 | vprintk_default+0x38/0x4c | vprintk+0xd4/0xf0 | _printk+0x5c/0x84 | lockdep_rcu_suspicious+0x34/0x10c | trace_irq_disable+0x1ac/0x1b0 | trace_hardirqs_off+0xe8/0x110 | cpu_suspend+0x4c/0xfc | psci_cpu_suspend_enter+0x58/0x6c | psci_enter_idle_state+0x70/0x170 | cpuidle_enter_state+0xc4/0x464 | cpuidle_enter+0x38/0x50 | do_idle+0x230/0x2c0 | cpu_startup_entry+0x24/0x30 | rest_init+0x110/0x190 | arch_post_acpi_subsys_init+0x0/0x18 | start_kernel+0x6f8/0x738 | __primary_switched+0xbc/0xc4 IIUC what's happenign here is the PSCI cpuidle driver has entered idle and RCU is no longer watching when arm64's cpu_suspend() manipulates DAIF. Our local_daif_*() helpers poke lockdep and tracing, hence the call to trace_hardirqs_off() and the RCU usage. I think we need RCU to be watching all the way down to cpu_suspend(), and it's cpu_suspend() that should actually enter/exit idle context. That and we need to make cpu_suspend() and the low-level PSCI invocation noinstr. I'm not sure whether 32-bit will have a similar issue or not. I'm surprised no-one else who has tested has seen this; I suspect people haven't enabled lockdep and friends. :/ Thanks, Mark.
On Mon, Jan 16, 2023 at 04:59:04PM +0000, Mark Rutland wrote: > I'm sorry to have to bear some bad news on that front. :( Moo, something had to give.. > IIUC what's happenign here is the PSCI cpuidle driver has entered idle and RCU > is no longer watching when arm64's cpu_suspend() manipulates DAIF. Our > local_daif_*() helpers poke lockdep and tracing, hence the call to > trace_hardirqs_off() and the RCU usage. Right, strictly speaking not needed at this point, IRQs should have been traced off a long time ago. > I think we need RCU to be watching all the way down to cpu_suspend(), and it's > cpu_suspend() that should actually enter/exit idle context. That and we need to > make cpu_suspend() and the low-level PSCI invocation noinstr. > > I'm not sure whether 32-bit will have a similar issue or not. I'm not seeing 32bit or Risc-V have similar issues here, but who knows, maybe I missed somsething. In any case, the below ought to cure the ARM64 case and remove that last known RCU_NONIDLE() user as a bonus. --- diff --git a/arch/arm64/kernel/cpuidle.c b/arch/arm64/kernel/cpuidle.c index 41974a1a229a..42e19fff40ee 100644 --- a/arch/arm64/kernel/cpuidle.c +++ b/arch/arm64/kernel/cpuidle.c @@ -67,10 +67,10 @@ __cpuidle int acpi_processor_ffh_lpi_enter(struct acpi_lpi_state *lpi) u32 state = lpi->address; if (ARM64_LPI_IS_RETENTION_STATE(lpi->arch_flags)) - return CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM(psci_cpu_suspend_enter, + return CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM_RCU(psci_cpu_suspend_enter, lpi->index, state); else - return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter, + return CPU_PM_CPU_IDLE_ENTER_PARAM_RCU(psci_cpu_suspend_enter, lpi->index, state); } #endif diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c index e7163f31f716..0fbdf5fe64d8 100644 --- a/arch/arm64/kernel/suspend.c +++ b/arch/arm64/kernel/suspend.c @@ -4,6 +4,7 @@ #include <linux/slab.h> #include <linux/uaccess.h> #include <linux/pgtable.h> +#include <linux/cpuidle.h> #include <asm/alternative.h> #include <asm/cacheflush.h> #include <asm/cpufeature.h> @@ -104,6 +105,10 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long)) * From this point debug exceptions are disabled to prevent * updates to mdscr register (saved and restored along with * general purpose registers) from kernel debuggers. + * + * Strictly speaking the trace_hardirqs_off() here is superfluous, + * hardirqs should be firmly off by now. This really ought to use + * something like raw_local_daif_save(). */ flags = local_daif_save(); @@ -120,6 +125,8 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long)) */ arm_cpuidle_save_irq_context(&context); + ct_cpuidle_enter(); + if (__cpu_suspend_enter(&state)) { /* Call the suspend finisher */ ret = fn(arg); @@ -133,8 +140,11 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long)) */ if (!ret) ret = -EOPNOTSUPP; + + ct_cpuidle_exit(); } else { - RCU_NONIDLE(__cpu_suspend_exit()); + ct_cpuidle_exit(); + __cpu_suspend_exit(); } arm_cpuidle_restore_irq_context(&context); diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c index 4fc4e0381944..312a34ef28dc 100644 --- a/drivers/cpuidle/cpuidle-psci.c +++ b/drivers/cpuidle/cpuidle-psci.c @@ -69,16 +69,12 @@ static __cpuidle int __psci_enter_domain_idle_state(struct cpuidle_device *dev, else pm_runtime_put_sync_suspend(pd_dev); - ct_cpuidle_enter(); - state = psci_get_domain_state(); if (!state) state = states[idx]; ret = psci_cpu_suspend_enter(state) ? -1 : idx; - ct_cpuidle_exit(); - if (s2idle) dev_pm_genpd_resume(pd_dev); else @@ -192,7 +188,7 @@ static __cpuidle int psci_enter_idle_state(struct cpuidle_device *dev, { u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states); - return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter, idx, state[idx]); + return CPU_PM_CPU_IDLE_ENTER_PARAM_RCU(psci_cpu_suspend_enter, idx, state[idx]); } static const struct of_device_id psci_idle_state_match[] = { diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c index e7bcfca4159f..f3a044fa4652 100644 --- a/drivers/firmware/psci/psci.c +++ b/drivers/firmware/psci/psci.c @@ -462,11 +462,22 @@ int psci_cpu_suspend_enter(u32 state) if (!psci_power_state_loses_context(state)) { struct arm_cpuidle_irq_context context; + ct_cpuidle_enter(); arm_cpuidle_save_irq_context(&context); ret = psci_ops.cpu_suspend(state, 0); arm_cpuidle_restore_irq_context(&context); + ct_cpuidle_exit(); } else { + /* + * ARM64 cpu_suspend() wants to do ct_cpuidle_*() itself. + */ + if (!IS_ENABLED(CONFIG_ARM64)) + ct_cpuidle_enter(); + ret = cpu_suspend(state, psci_suspend_finisher); + + if (!IS_ENABLED(CONFIG_ARM64)) + ct_cpuidle_exit(); } return ret; diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h index 630c879143c7..3183aeb7f5b4 100644 --- a/include/linux/cpuidle.h +++ b/include/linux/cpuidle.h @@ -307,7 +307,7 @@ extern s64 cpuidle_governor_latency_req(unsigned int cpu); #define __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, \ idx, \ state, \ - is_retention) \ + is_retention, is_rcu) \ ({ \ int __ret = 0; \ \ @@ -319,9 +319,11 @@ extern s64 cpuidle_governor_latency_req(unsigned int cpu); if (!is_retention) \ __ret = cpu_pm_enter(); \ if (!__ret) { \ - ct_cpuidle_enter(); \ + if (!is_rcu) \ + ct_cpuidle_enter(); \ __ret = low_level_idle_enter(state); \ - ct_cpuidle_exit(); \ + if (!is_rcu) \ + ct_cpuidle_exit(); \ if (!is_retention) \ cpu_pm_exit(); \ } \ @@ -330,15 +332,21 @@ extern s64 cpuidle_governor_latency_req(unsigned int cpu); }) #define CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx) \ - __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 0) + __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 0, 0) #define CPU_PM_CPU_IDLE_ENTER_RETENTION(low_level_idle_enter, idx) \ - __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 1) + __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 1, 0) #define CPU_PM_CPU_IDLE_ENTER_PARAM(low_level_idle_enter, idx, state) \ - __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 0) + __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 0, 0) + +#define CPU_PM_CPU_IDLE_ENTER_PARAM_RCU(low_level_idle_enter, idx, state) \ + __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 0, 1) #define CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM(low_level_idle_enter, idx, state) \ - __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 1) + __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 1, 0) + +#define CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM_RCU(low_level_idle_enter, idx, state) \ + __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 1, 1) #endif /* _LINUX_CPUIDLE_H */
On Tue, Jan 17, 2023 at 11:26:29AM +0100, Peter Zijlstra wrote: > On Mon, Jan 16, 2023 at 04:59:04PM +0000, Mark Rutland wrote: > > > I'm sorry to have to bear some bad news on that front. :( > > Moo, something had to give.. > > > > IIUC what's happenign here is the PSCI cpuidle driver has entered idle and RCU > > is no longer watching when arm64's cpu_suspend() manipulates DAIF. Our > > local_daif_*() helpers poke lockdep and tracing, hence the call to > > trace_hardirqs_off() and the RCU usage. > > Right, strictly speaking not needed at this point, IRQs should have been > traced off a long time ago. > > > I think we need RCU to be watching all the way down to cpu_suspend(), and it's > > cpu_suspend() that should actually enter/exit idle context. That and we need to > > make cpu_suspend() and the low-level PSCI invocation noinstr. > > > > I'm not sure whether 32-bit will have a similar issue or not. > > I'm not seeing 32bit or Risc-V have similar issues here, but who knows, > maybe I missed somsething. > > In any case, the below ought to cure the ARM64 case and remove that last > known RCU_NONIDLE() user as a bonus. > Thanks for the fix. I tested the series and did observe the same splat with both DT and ACPI boot(they enter idle in different code paths). Thanks to Mark for reminding me about ACPI. With this fix, I see the splat is gone in both DT(cpuidle-psci.c) and ACPI(acpi_processor_idle.c). You can add: Tested-by: Sudeep Holla <sudeep.holla@arm.com> -- Regards, Sudeep
On Tue, Jan 17, 2023 at 11:26:29AM +0100, Peter Zijlstra wrote: > On Mon, Jan 16, 2023 at 04:59:04PM +0000, Mark Rutland wrote: > > > I'm sorry to have to bear some bad news on that front. :( > > Moo, something had to give.. > > > > IIUC what's happenign here is the PSCI cpuidle driver has entered idle and RCU > > is no longer watching when arm64's cpu_suspend() manipulates DAIF. Our > > local_daif_*() helpers poke lockdep and tracing, hence the call to > > trace_hardirqs_off() and the RCU usage. > > Right, strictly speaking not needed at this point, IRQs should have been > traced off a long time ago. True, but there are some other calls around here that *might* end up invoking RCU stuff (e.g. the MTE code). That all needs a noinstr cleanup too, which I'll sort out as a follow-up. > > I think we need RCU to be watching all the way down to cpu_suspend(), and it's > > cpu_suspend() that should actually enter/exit idle context. That and we need to > > make cpu_suspend() and the low-level PSCI invocation noinstr. > > > > I'm not sure whether 32-bit will have a similar issue or not. > > I'm not seeing 32bit or Risc-V have similar issues here, but who knows, > maybe I missed somsething. I reckon if they do, the core changes here give us the infrastructure to fix them if/when we get reports. > In any case, the below ought to cure the ARM64 case and remove that last > known RCU_NONIDLE() user as a bonus. The below works for me testing on a Juno R1 board with PSCI, using defconfig + CONFIG_PROVE_LOCKING=y + CONFIG_DEBUG_LOCKDEP=y + CONFIG_DEBUG_ATOMIC_SLEEP=y. I'm not sure how to test the LPI / FFH part, but it looks good to me. FWIW: Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Sudeep, would you be able to give the LPI/FFH side a spin with the kconfig options above? Thanks, Mark. > > --- > diff --git a/arch/arm64/kernel/cpuidle.c b/arch/arm64/kernel/cpuidle.c > index 41974a1a229a..42e19fff40ee 100644 > --- a/arch/arm64/kernel/cpuidle.c > +++ b/arch/arm64/kernel/cpuidle.c > @@ -67,10 +67,10 @@ __cpuidle int acpi_processor_ffh_lpi_enter(struct acpi_lpi_state *lpi) > u32 state = lpi->address; > > if (ARM64_LPI_IS_RETENTION_STATE(lpi->arch_flags)) > - return CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM(psci_cpu_suspend_enter, > + return CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM_RCU(psci_cpu_suspend_enter, > lpi->index, state); > else > - return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter, > + return CPU_PM_CPU_IDLE_ENTER_PARAM_RCU(psci_cpu_suspend_enter, > lpi->index, state); > } > #endif > diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c > index e7163f31f716..0fbdf5fe64d8 100644 > --- a/arch/arm64/kernel/suspend.c > +++ b/arch/arm64/kernel/suspend.c > @@ -4,6 +4,7 @@ > #include <linux/slab.h> > #include <linux/uaccess.h> > #include <linux/pgtable.h> > +#include <linux/cpuidle.h> > #include <asm/alternative.h> > #include <asm/cacheflush.h> > #include <asm/cpufeature.h> > @@ -104,6 +105,10 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long)) > * From this point debug exceptions are disabled to prevent > * updates to mdscr register (saved and restored along with > * general purpose registers) from kernel debuggers. > + * > + * Strictly speaking the trace_hardirqs_off() here is superfluous, > + * hardirqs should be firmly off by now. This really ought to use > + * something like raw_local_daif_save(). > */ > flags = local_daif_save(); > > @@ -120,6 +125,8 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long)) > */ > arm_cpuidle_save_irq_context(&context); > > + ct_cpuidle_enter(); > + > if (__cpu_suspend_enter(&state)) { > /* Call the suspend finisher */ > ret = fn(arg); > @@ -133,8 +140,11 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long)) > */ > if (!ret) > ret = -EOPNOTSUPP; > + > + ct_cpuidle_exit(); > } else { > - RCU_NONIDLE(__cpu_suspend_exit()); > + ct_cpuidle_exit(); > + __cpu_suspend_exit(); > } > > arm_cpuidle_restore_irq_context(&context); > diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c > index 4fc4e0381944..312a34ef28dc 100644 > --- a/drivers/cpuidle/cpuidle-psci.c > +++ b/drivers/cpuidle/cpuidle-psci.c > @@ -69,16 +69,12 @@ static __cpuidle int __psci_enter_domain_idle_state(struct cpuidle_device *dev, > else > pm_runtime_put_sync_suspend(pd_dev); > > - ct_cpuidle_enter(); > - > state = psci_get_domain_state(); > if (!state) > state = states[idx]; > > ret = psci_cpu_suspend_enter(state) ? -1 : idx; > > - ct_cpuidle_exit(); > - > if (s2idle) > dev_pm_genpd_resume(pd_dev); > else > @@ -192,7 +188,7 @@ static __cpuidle int psci_enter_idle_state(struct cpuidle_device *dev, > { > u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states); > > - return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter, idx, state[idx]); > + return CPU_PM_CPU_IDLE_ENTER_PARAM_RCU(psci_cpu_suspend_enter, idx, state[idx]); > } > > static const struct of_device_id psci_idle_state_match[] = { > diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c > index e7bcfca4159f..f3a044fa4652 100644 > --- a/drivers/firmware/psci/psci.c > +++ b/drivers/firmware/psci/psci.c > @@ -462,11 +462,22 @@ int psci_cpu_suspend_enter(u32 state) > if (!psci_power_state_loses_context(state)) { > struct arm_cpuidle_irq_context context; > > + ct_cpuidle_enter(); > arm_cpuidle_save_irq_context(&context); > ret = psci_ops.cpu_suspend(state, 0); > arm_cpuidle_restore_irq_context(&context); > + ct_cpuidle_exit(); > } else { > + /* > + * ARM64 cpu_suspend() wants to do ct_cpuidle_*() itself. > + */ > + if (!IS_ENABLED(CONFIG_ARM64)) > + ct_cpuidle_enter(); > + > ret = cpu_suspend(state, psci_suspend_finisher); > + > + if (!IS_ENABLED(CONFIG_ARM64)) > + ct_cpuidle_exit(); > } > > return ret; > diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h > index 630c879143c7..3183aeb7f5b4 100644 > --- a/include/linux/cpuidle.h > +++ b/include/linux/cpuidle.h > @@ -307,7 +307,7 @@ extern s64 cpuidle_governor_latency_req(unsigned int cpu); > #define __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, \ > idx, \ > state, \ > - is_retention) \ > + is_retention, is_rcu) \ > ({ \ > int __ret = 0; \ > \ > @@ -319,9 +319,11 @@ extern s64 cpuidle_governor_latency_req(unsigned int cpu); > if (!is_retention) \ > __ret = cpu_pm_enter(); \ > if (!__ret) { \ > - ct_cpuidle_enter(); \ > + if (!is_rcu) \ > + ct_cpuidle_enter(); \ > __ret = low_level_idle_enter(state); \ > - ct_cpuidle_exit(); \ > + if (!is_rcu) \ > + ct_cpuidle_exit(); \ > if (!is_retention) \ > cpu_pm_exit(); \ > } \ > @@ -330,15 +332,21 @@ extern s64 cpuidle_governor_latency_req(unsigned int cpu); > }) > > #define CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx) \ > - __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 0) > + __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 0, 0) > > #define CPU_PM_CPU_IDLE_ENTER_RETENTION(low_level_idle_enter, idx) \ > - __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 1) > + __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 1, 0) > > #define CPU_PM_CPU_IDLE_ENTER_PARAM(low_level_idle_enter, idx, state) \ > - __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 0) > + __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 0, 0) > + > +#define CPU_PM_CPU_IDLE_ENTER_PARAM_RCU(low_level_idle_enter, idx, state) \ > + __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 0, 1) > > #define CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM(low_level_idle_enter, idx, state) \ > - __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 1) > + __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 1, 0) > + > +#define CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM_RCU(low_level_idle_enter, idx, state) \ > + __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 1, 1) > > #endif /* _LINUX_CPUIDLE_H */
On Tue, Jan 17, 2023 at 01:16:21PM +0000, Mark Rutland wrote: > On Tue, Jan 17, 2023 at 11:26:29AM +0100, Peter Zijlstra wrote: > > On Mon, Jan 16, 2023 at 04:59:04PM +0000, Mark Rutland wrote: > > > > > I'm sorry to have to bear some bad news on that front. :( > > > > Moo, something had to give.. > > > > > > > IIUC what's happenign here is the PSCI cpuidle driver has entered idle and RCU > > > is no longer watching when arm64's cpu_suspend() manipulates DAIF. Our > > > local_daif_*() helpers poke lockdep and tracing, hence the call to > > > trace_hardirqs_off() and the RCU usage. > > > > Right, strictly speaking not needed at this point, IRQs should have been > > traced off a long time ago. > > True, but there are some other calls around here that *might* end up invoking > RCU stuff (e.g. the MTE code). > > That all needs a noinstr cleanup too, which I'll sort out as a follow-up. > > > > I think we need RCU to be watching all the way down to cpu_suspend(), and it's > > > cpu_suspend() that should actually enter/exit idle context. That and we need to > > > make cpu_suspend() and the low-level PSCI invocation noinstr. > > > > > > I'm not sure whether 32-bit will have a similar issue or not. > > > > I'm not seeing 32bit or Risc-V have similar issues here, but who knows, > > maybe I missed somsething. > > I reckon if they do, the core changes here give us the infrastructure to fix > them if/when we get reports. > > > In any case, the below ought to cure the ARM64 case and remove that last > > known RCU_NONIDLE() user as a bonus. > > The below works for me testing on a Juno R1 board with PSCI, using defconfig + > CONFIG_PROVE_LOCKING=y + CONFIG_DEBUG_LOCKDEP=y + CONFIG_DEBUG_ATOMIC_SLEEP=y. > I'm not sure how to test the LPI / FFH part, but it looks good to me. > > FWIW: > > Reviewed-by: Mark Rutland <mark.rutland@arm.com> > Tested-by: Mark Rutland <mark.rutland@arm.com> > > Sudeep, would you be able to give the LPI/FFH side a spin with the kconfig > options above? > Not sure if I have messed up something in my mail setup, but I did reply earlier. I did test both DT/cpuidle-psci driver and ACPI/LPI+FFH driver with the fix Peter sent. I was seeing same splat as you in both DT and ACPI boot which the patch fixed it. I used the same config as described by you above.
On Tue, Jan 17, 2023 at 02:21:40PM +0000, Sudeep Holla wrote: > On Tue, Jan 17, 2023 at 01:16:21PM +0000, Mark Rutland wrote: > > On Tue, Jan 17, 2023 at 11:26:29AM +0100, Peter Zijlstra wrote: > > > On Mon, Jan 16, 2023 at 04:59:04PM +0000, Mark Rutland wrote: > > > > > > > I'm sorry to have to bear some bad news on that front. :( > > > > > > Moo, something had to give.. > > > > > > > > > > IIUC what's happenign here is the PSCI cpuidle driver has entered idle and RCU > > > > is no longer watching when arm64's cpu_suspend() manipulates DAIF. Our > > > > local_daif_*() helpers poke lockdep and tracing, hence the call to > > > > trace_hardirqs_off() and the RCU usage. > > > > > > Right, strictly speaking not needed at this point, IRQs should have been > > > traced off a long time ago. > > > > True, but there are some other calls around here that *might* end up invoking > > RCU stuff (e.g. the MTE code). > > > > That all needs a noinstr cleanup too, which I'll sort out as a follow-up. > > > > > > I think we need RCU to be watching all the way down to cpu_suspend(), and it's > > > > cpu_suspend() that should actually enter/exit idle context. That and we need to > > > > make cpu_suspend() and the low-level PSCI invocation noinstr. > > > > > > > > I'm not sure whether 32-bit will have a similar issue or not. > > > > > > I'm not seeing 32bit or Risc-V have similar issues here, but who knows, > > > maybe I missed somsething. > > > > I reckon if they do, the core changes here give us the infrastructure to fix > > them if/when we get reports. > > > > > In any case, the below ought to cure the ARM64 case and remove that last > > > known RCU_NONIDLE() user as a bonus. > > > > The below works for me testing on a Juno R1 board with PSCI, using defconfig + > > CONFIG_PROVE_LOCKING=y + CONFIG_DEBUG_LOCKDEP=y + CONFIG_DEBUG_ATOMIC_SLEEP=y. > > I'm not sure how to test the LPI / FFH part, but it looks good to me. > > > > FWIW: > > > > Reviewed-by: Mark Rutland <mark.rutland@arm.com> > > Tested-by: Mark Rutland <mark.rutland@arm.com> > > > > Sudeep, would you be able to give the LPI/FFH side a spin with the kconfig > > options above? > > > > Not sure if I have messed up something in my mail setup, but I did reply > earlier. Sorry, that was my bad; I had been drafting my reply for a while and forgot to re-check prior to sending. > I did test both DT/cpuidle-psci driver and ACPI/LPI+FFH driver > with the fix Peter sent. I was seeing same splat as you in both DT and > ACPI boot which the patch fixed it. I used the same config as described by > you above. Perfect; thanks! Mark.