Message ID | 20201221162249.3119-1-lecopzer.chen@mediatek.com (mailing list archive)
---|---
State | New, archived
Series | [v2] arm64: perf: Fix access percpu variables in preemptible context
On Mon, 21 Dec 2020 at 21:53, Lecopzer Chen <lecopzer.chen@mediatek.com> wrote:
>
> commit 367c820ef08082 ("arm64: Enable perf events based hard lockup detector")
> reinitializes the lockup detector after the arm64 PMU is initialized and opens
> a window for accessing smp_processor_id() in preemptible context:
> hardlockup_detector_perf_init() is normally called in the init stage on a
> single CPU, but here the lockup detector is initialized after the init task
> has become migratable.
>
> Fix this by utilizing lockup detector reconfiguration, which calls
> softlockup_start_all() on each CPU and calls watchdog_nmi_enable() later.
> Because softlockup_start_all() uses an IPI call function, it makes sure
> watchdog_nmi_enable() runs bound to each CPU, which fixes this issue.

IMO, this just creates an unnecessary dependency for hardlockup detector
init via the softlockup detector (see the alternative definition of
lockup_detector_reconfigure()).

>
> BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1

How about just the below fix in order to make CONFIG_DEBUG_PREEMPT happy?

diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 247bf0b1582c..db06ee28f48e 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -165,7 +165,7 @@ static void watchdog_overflow_callback(struct perf_event *event,

 static int hardlockup_detector_event_create(void)
 {
-	unsigned int cpu = smp_processor_id();
+	unsigned int cpu = raw_smp_processor_id();
 	struct perf_event_attr *wd_attr;
 	struct perf_event *evt;

-Sumit

> caller is debug_smp_processor_id+0x20/0x2c
> CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.10.0+ #276
> Hardware name: linux,dummy-virt (DT)
> Call trace:
>  dump_backtrace+0x0/0x3c0
>  show_stack+0x20/0x6c
>  dump_stack+0x2f0/0x42c
>  check_preemption_disabled+0x1cc/0x1dc
>  debug_smp_processor_id+0x20/0x2c
>  hardlockup_detector_event_create+0x34/0x18c
>  hardlockup_detector_perf_init+0x2c/0x134
>  watchdog_nmi_probe+0x18/0x24
>  lockup_detector_init+0x44/0xa8
>  armv8_pmu_driver_init+0x54/0x78
>  do_one_initcall+0x184/0x43c
>  kernel_init_freeable+0x368/0x380
>  kernel_init+0x1c/0x1cc
>  ret_from_fork+0x10/0x30
>
> Fixes: 367c820ef08082 ("arm64: Enable perf events based hard lockup detector")
> Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Cc: Sumit Garg <sumit.garg@linaro.org>
> ---
>
> Changelog v1 -> v2:
> * https://lore.kernel.org/lkml/20201217130617.32202-1-lecopzer.chen@mediatek.com/
> * Move the solution from kernel/watchdog_hld.c to the arm64 perf_event code
> * Avoid sleeping kmalloc() inside preempt_disable()
>
>  arch/arm64/kernel/perf_event.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index 38bb07eff872..c03e21210bbb 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -1345,4 +1345,20 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh)
>
>  	return (u64)max_cpu_freq * watchdog_thresh;
>  }
> +
> +/*
> + * hardlockup_detector_perf_init() is normally called in the init stage on a
> + * single CPU. On arm64 we re-initialize the lockup detector after the PMU
> + * driver has been initialized. The lockup detector init path uses a lot of
> + * percpu variables, which makes CONFIG_DEBUG_PREEMPT unhappy because we are
> + * now in preemptible context.
> + * Return 0 if the NMI is ready; the NMI hardlockup detector is then
> + * registered via lockup detector reconfiguration.
> + */
> +int __init watchdog_nmi_probe(void)
> +{
> +	if (arm_pmu_irq_is_nmi())
> +		return 0;
> +	return -ENODEV;
> +}
>  #endif
> --
> 2.25.1
>
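For context on the suggestion above: under CONFIG_DEBUG_PREEMPT, smp_processor_id() goes
through a debug wrapper (lib/smp_processor_id.c) that warns whenever the calling task could
still be migrated to another CPU, while raw_smp_processor_id() skips that check entirely.
The following is only a rough sketch of that idea, not the actual kernel implementation
(the real check has more exceptions):

#include <linux/preempt.h>
#include <linux/smp.h>
#include <linux/bug.h>

/* Illustration only: roughly what the CONFIG_DEBUG_PREEMPT wrapper checks. */
unsigned int checked_smp_processor_id(void)
{
	if (preemptible())	/* task can be preempted and migrated right now */
		WARN_ONCE(1, "using smp_processor_id() in preemptible context\n");

	return raw_smp_processor_id();	/* the raw variant never warns */
}

Switching to raw_smp_processor_id() silences the warning, but as the rest of the thread
points out, it does not make the surrounding per-CPU accesses safe if the task can still
migrate between CPUs.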
Hi Sumit,

Thanks for your reply.

> On Mon, 21 Dec 2020 at 21:53, Lecopzer Chen <lecopzer.chen@mediatek.com> wrote:
> >
> > commit 367c820ef08082 ("arm64: Enable perf events based hard lockup detector")
> > reinitializes the lockup detector after the arm64 PMU is initialized and opens
> > a window for accessing smp_processor_id() in preemptible context:
> > hardlockup_detector_perf_init() is normally called in the init stage on a
> > single CPU, but here the lockup detector is initialized after the init task
> > has become migratable.
> >
> > Fix this by utilizing lockup detector reconfiguration, which calls
> > softlockup_start_all() on each CPU and calls watchdog_nmi_enable() later.
> > Because softlockup_start_all() uses an IPI call function, it makes sure
> > watchdog_nmi_enable() runs bound to each CPU, which fixes this issue.
>
> IMO, this just creates an unnecessary dependency for hardlockup detector
> init via the softlockup detector (see the alternative definition of
> lockup_detector_reconfigure()).

arm64/Kconfig selects HAVE_HARDLOCKUP_DETECTOR_PERF when we have NMI support:

  select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && HAVE_PERF_EVENTS_NMI

And in lib/Kconfig.debug, HARDLOCKUP_DETECTOR_PERF selects SOFTLOCKUP_DETECTOR
automatically:

  config HARDLOCKUP_DETECTOR_PERF
          bool
          select SOFTLOCKUP_DETECTOR

So we don't need to select the softlockup detector explicitly.
And as you said, this patch is not a perfect solution (hardlockup depends on
softlockup), but the key point is that lockup_detector_init() seems to be
designed only for the early init stage, not for being called from a deferred
initialization path later on.

> >
> > BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
>
> How about just the below fix in order to make CONFIG_DEBUG_PREEMPT happy?
>
> diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
> index 247bf0b1582c..db06ee28f48e 100644
> --- a/kernel/watchdog_hld.c
> +++ b/kernel/watchdog_hld.c
> @@ -165,7 +165,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
>
>  static int hardlockup_detector_event_create(void)
>  {
> -	unsigned int cpu = smp_processor_id();
> +	unsigned int cpu = raw_smp_processor_id();
>  	struct perf_event_attr *wd_attr;
>  	struct perf_event *evt;
>

This won't solve the issue that arm64 calls this in preemptible context.
I was trying to find a balance between passing CONFIG_DEBUG_PREEMPT and
calling lockup_detector_init() in a non-preemptible context:
watchdog_nmi_probe() and the hardlockup_detector_event_create() it calls
use this_cpu_read/write, so the real question is better framed as
'how to call lockup_detector_init() safely from preemptible context'.

We can't just wrap lockup_detector_init() in preempt_disable()/preempt_enable(),
because the call tree inside it uses kmalloc() with GFP_KERNEL, which would trip
the might_sleep() check.

The v2 patch is the smallest change I could find that solves this, but the
drawback, again, is that hardlockup depends on softlockup. Another solution
might be to execute lockup_detector_init() in a thread bound to a single CPU.

BRs,
Lecopzer

> -Sumit
>
> > caller is debug_smp_processor_id+0x20/0x2c
> > CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.10.0+ #276
> > Hardware name: linux,dummy-virt (DT)
> > Call trace:
> >  dump_backtrace+0x0/0x3c0
> >  show_stack+0x20/0x6c
> >  dump_stack+0x2f0/0x42c
> >  check_preemption_disabled+0x1cc/0x1dc
> >  debug_smp_processor_id+0x20/0x2c
> >  hardlockup_detector_event_create+0x34/0x18c
> >  hardlockup_detector_perf_init+0x2c/0x134
> >  watchdog_nmi_probe+0x18/0x24
> >  lockup_detector_init+0x44/0xa8
> >  armv8_pmu_driver_init+0x54/0x78
> >  do_one_initcall+0x184/0x43c
> >  kernel_init_freeable+0x368/0x380
> >  kernel_init+0x1c/0x1cc
> >  ret_from_fork+0x10/0x30
>
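The thread-bound alternative Lecopzer mentions above could look roughly like the sketch
below. It is not part of the posted patch: the helper names are hypothetical, and the
sketch glosses over __init section annotations and error handling. The point is that
smp_call_on_cpu() runs a function in process context pinned to one CPU, so GFP_KERNEL
allocations in the probe path may still sleep while smp_processor_id() and the per-CPU
accesses stay stable:

#include <linux/smp.h>
#include <linux/nmi.h>

/* Hypothetical helper, not from the posted patch. */
static int hardlockup_probe_on_cpu0(void *unused)
{
	/*
	 * Runs in a kworker pinned to CPU 0, in process context: per-CPU
	 * accesses are stable and sleeping allocations are allowed here.
	 */
	return hardlockup_detector_perf_init();
}

static int __init deferred_hardlockup_probe(void)
{
	/* 'false': 0 is a logical CPU number, not a physical one. */
	return smp_call_on_cpu(0, hardlockup_probe_on_cpu0, NULL, false);
}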
On Fri, Jan 08, 2021 at 08:55:27PM +0800, Lecopzer Chen wrote:
> > On Mon, 21 Dec 2020 at 21:53, Lecopzer Chen <lecopzer.chen@mediatek.com> wrote:
> > >
> > > commit 367c820ef08082 ("arm64: Enable perf events based hard lockup detector")
> > > reinitializes the lockup detector after the arm64 PMU is initialized and opens
> > > a window for accessing smp_processor_id() in preemptible context:
> > > hardlockup_detector_perf_init() is normally called in the init stage on a
> > > single CPU, but here the lockup detector is initialized after the init task
> > > has become migratable.
> > >
> > > Fix this by utilizing lockup detector reconfiguration, which calls
> > > softlockup_start_all() on each CPU and calls watchdog_nmi_enable() later.
> > > Because softlockup_start_all() uses an IPI call function, it makes sure
> > > watchdog_nmi_enable() runs bound to each CPU, which fixes this issue.
> >
> > IMO, this just creates an unnecessary dependency for hardlockup detector
> > init via the softlockup detector (see the alternative definition of
> > lockup_detector_reconfigure()).
>
> arm64/Kconfig selects HAVE_HARDLOCKUP_DETECTOR_PERF when we have NMI support:
>
>   select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && HAVE_PERF_EVENTS_NMI
>
> And in lib/Kconfig.debug, HARDLOCKUP_DETECTOR_PERF selects SOFTLOCKUP_DETECTOR
> automatically:
>
>   config HARDLOCKUP_DETECTOR_PERF
>           bool
>           select SOFTLOCKUP_DETECTOR
>
> So we don't need to select the softlockup detector explicitly.
> And as you said, this patch is not a perfect solution (hardlockup depends on
> softlockup), but the key point is that lockup_detector_init() seems to be
> designed only for the early init stage, not for being called from a deferred
> initialization path later on.

I agree; the current usage in armv8_pmu_driver_init() looks very broken to
me, and bodging it with raw_smp_processor_id() isn't the right solution.

Maybe we should just revert 367c820ef08082, as this looks like a design
issue rather than something with a simple fix?

Will
On Tue, Jan 12, 2021 at 03:07:36PM +0000, Will Deacon wrote:
> On Fri, Jan 08, 2021 at 08:55:27PM +0800, Lecopzer Chen wrote:
> > > On Mon, 21 Dec 2020 at 21:53, Lecopzer Chen <lecopzer.chen@mediatek.com> wrote:
> > > >
> > > > commit 367c820ef08082 ("arm64: Enable perf events based hard lockup detector")
> > > > reinitializes the lockup detector after the arm64 PMU is initialized and opens
> > > > a window for accessing smp_processor_id() in preemptible context:
> > > > hardlockup_detector_perf_init() is normally called in the init stage on a
> > > > single CPU, but here the lockup detector is initialized after the init task
> > > > has become migratable.
> > > >
> > > > Fix this by utilizing lockup detector reconfiguration, which calls
> > > > softlockup_start_all() on each CPU and calls watchdog_nmi_enable() later.
> > > > Because softlockup_start_all() uses an IPI call function, it makes sure
> > > > watchdog_nmi_enable() runs bound to each CPU, which fixes this issue.
> > >
> > > IMO, this just creates an unnecessary dependency for hardlockup detector
> > > init via the softlockup detector (see the alternative definition of
> > > lockup_detector_reconfigure()).
> >
> > arm64/Kconfig selects HAVE_HARDLOCKUP_DETECTOR_PERF when we have NMI support:
> >
> >   select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && HAVE_PERF_EVENTS_NMI
> >
> > And in lib/Kconfig.debug, HARDLOCKUP_DETECTOR_PERF selects SOFTLOCKUP_DETECTOR
> > automatically:
> >
> >   config HARDLOCKUP_DETECTOR_PERF
> >           bool
> >           select SOFTLOCKUP_DETECTOR
> >
> > So we don't need to select the softlockup detector explicitly.
> > And as you said, this patch is not a perfect solution (hardlockup depends on
> > softlockup), but the key point is that lockup_detector_init() seems to be
> > designed only for the early init stage, not for being called from a deferred
> > initialization path later on.
>
> I agree; the current usage in armv8_pmu_driver_init() looks very broken to
> me, and bodging it with raw_smp_processor_id() isn't the right solution.
>
> Maybe we should just revert 367c820ef08082, as this looks like a design
> issue rather than something with a simple fix?

I think that would make sense for now, then we can reconsider the whole thing
rather than looking for a point-fix.

Thanks,
Mark.
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 38bb07eff872..c03e21210bbb 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -1345,4 +1345,20 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh)
 
 	return (u64)max_cpu_freq * watchdog_thresh;
 }
+
+/*
+ * hardlockup_detector_perf_init() is normally called in the init stage on a
+ * single CPU. On arm64 we re-initialize the lockup detector after the PMU
+ * driver has been initialized. The lockup detector init path uses a lot of
+ * percpu variables, which makes CONFIG_DEBUG_PREEMPT unhappy because we are
+ * now in preemptible context.
+ * Return 0 if the NMI is ready; the NMI hardlockup detector is then
+ * registered via lockup detector reconfiguration.
+ */
+int __init watchdog_nmi_probe(void)
+{
+	if (arm_pmu_irq_is_nmi())
+		return 0;
+	return -ENODEV;
+}
 #endif
commit 367c820ef08082 ("arm64: Enable perf events based hard lockup detector")
reinitializes the lockup detector after the arm64 PMU is initialized and opens
a window for accessing smp_processor_id() in preemptible context:
hardlockup_detector_perf_init() is normally called in the init stage on a
single CPU, but here the lockup detector is initialized after the init task
has become migratable.

Fix this by utilizing lockup detector reconfiguration, which calls
softlockup_start_all() on each CPU and calls watchdog_nmi_enable() later.
Because softlockup_start_all() uses an IPI call function, it makes sure
watchdog_nmi_enable() runs bound to each CPU, which fixes this issue.

BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
caller is debug_smp_processor_id+0x20/0x2c
CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.10.0+ #276
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x3c0
 show_stack+0x20/0x6c
 dump_stack+0x2f0/0x42c
 check_preemption_disabled+0x1cc/0x1dc
 debug_smp_processor_id+0x20/0x2c
 hardlockup_detector_event_create+0x34/0x18c
 hardlockup_detector_perf_init+0x2c/0x134
 watchdog_nmi_probe+0x18/0x24
 lockup_detector_init+0x44/0xa8
 armv8_pmu_driver_init+0x54/0x78
 do_one_initcall+0x184/0x43c
 kernel_init_freeable+0x368/0x380
 kernel_init+0x1c/0x1cc
 ret_from_fork+0x10/0x30

Fixes: 367c820ef08082 ("arm64: Enable perf events based hard lockup detector")
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Cc: Sumit Garg <sumit.garg@linaro.org>
---

Changelog v1 -> v2:
* https://lore.kernel.org/lkml/20201217130617.32202-1-lecopzer.chen@mediatek.com/
* Move the solution from kernel/watchdog_hld.c to the arm64 perf_event code
* Avoid sleeping kmalloc() inside preempt_disable()

 arch/arm64/kernel/perf_event.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
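For context on the mechanism the commit message relies on: lockup detector reconfiguration
restarts the watchdog on each CPU through a per-CPU call that runs in process context
pinned to that CPU, which is what makes the later watchdog_nmi_enable() / perf event
creation safe there. Below is a heavily simplified sketch of that start path; the function
names come from kernel/watchdog.c, but the bodies are reduced to the relevant steps, and
strictly speaking it is a pinned workqueue call via smp_call_on_cpu() rather than an IPI:

/* Simplified sketch of the kernel/watchdog.c start path; not verbatim kernel code. */
static int softlockup_start_fn(void *data)
{
	/*
	 * Pinned to one CPU, process context: smp_processor_id() is stable
	 * here, and the perf event creation further down may sleep.
	 */
	watchdog_enable(smp_processor_id());	/* eventually calls watchdog_nmi_enable() */
	return 0;
}

static void softlockup_start_all(void)
{
	int cpu;

	for_each_cpu(cpu, &watchdog_allowed_mask)
		smp_call_on_cpu(cpu, softlockup_start_fn, NULL, false);
}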