Message ID | 20220324141405.10835-6-lecopzer.chen@mediatek.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Support hld delayed init based on Pseudo-NMI for arm64 | expand |
On Thu 2022-03-24 22:14:05, Lecopzer Chen wrote: > With the recent feature added to enable perf events to use pseudo NMIs > as interrupts on platforms which support GICv3 or later, its now been > possible to enable hard lockup detector (or NMI watchdog) on arm64 > platforms. So enable corresponding support. > > One thing to note here is that normally lockup detector is initialized > just after the early initcalls but PMU on arm64 comes up much later as > device_initcall(). To cope with that, overriding watchdog_nmi_probe() to > let the watchdog framework know PMU not ready, and inform the framework > to re-initialize lockup detection once PMU has been initialized. > > [1]: http://lore.kernel.org/linux-arm-kernel/1610712101-14929-1-git-send-email-sumit.garg@linaro.org > > --- /dev/null > +++ b/arch/arm64/kernel/watchdog_hld.c > @@ -0,0 +1,37 @@ > +// SPDX-License-Identifier: GPL-2.0 > +#include <linux/nmi.h> > +#include <linux/cpufreq.h> > +#include <linux/perf/arm_pmu.h> > + > +/* > + * Safe maximum CPU frequency in case a particular platform doesn't implement > + * cpufreq driver. Although, architecture doesn't put any restrictions on > + * maximum frequency but 5 GHz seems to be safe maximum given the available > + * Arm CPUs in the market which are clocked much less than 5 GHz. On the other > + * hand, we can't make it much higher as it would lead to a large hard-lockup > + * detection timeout on parts which are running slower (eg. 1GHz on > + * Developerbox) and doesn't possess a cpufreq driver. > + */ > +#define SAFE_MAX_CPU_FREQ 5000000000UL // 5 GHz > +u64 hw_nmi_get_sample_period(int watchdog_thresh) > +{ > + unsigned int cpu = smp_processor_id(); > + unsigned long max_cpu_freq; > + > + max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL; > + if (!max_cpu_freq) > + max_cpu_freq = SAFE_MAX_CPU_FREQ; > + > + return (u64)max_cpu_freq * watchdog_thresh; > +} This change is not mentioned in the commit message. Please, put it into a separate patch. > +int __init watchdog_nmi_probe(void) > +{ > + if (!allow_lockup_detector_init_retry) > + return -EBUSY; How do you know that you should return -EBUSY when retry in not enabled? I guess that it is an optimization to make it fast during the first call. But the logic is far from obvious. > + > + if (!arm_pmu_irq_is_nmi()) > + return -ENODEV; > + > + return hardlockup_detector_perf_init(); > +} Is this just an optimization or is it really needed? Why this was not needed in v2 patchset? If it is just an optimization then I would remove it. IMHO, it just adds confusion and it is not worth it. Best Regards, Petr
> On Thu 2022-03-24 22:14:05, Lecopzer Chen wrote: > > With the recent feature added to enable perf events to use pseudo NMIs > > as interrupts on platforms which support GICv3 or later, its now been > > possible to enable hard lockup detector (or NMI watchdog) on arm64 > > platforms. So enable corresponding support. > > > > One thing to note here is that normally lockup detector is initialized > > just after the early initcalls but PMU on arm64 comes up much later as > > device_initcall(). To cope with that, overriding watchdog_nmi_probe() to > > let the watchdog framework know PMU not ready, and inform the framework > > to re-initialize lockup detection once PMU has been initialized. > > > > [1]: http://lore.kernel.org/linux-arm-kernel/1610712101-14929-1-git-send-email-sumit.garg@linaro.org > > > > --- /dev/null > > +++ b/arch/arm64/kernel/watchdog_hld.c > > @@ -0,0 +1,37 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > +#include <linux/nmi.h> > > +#include <linux/cpufreq.h> > > +#include <linux/perf/arm_pmu.h> > > + > > +/* > > + * Safe maximum CPU frequency in case a particular platform doesn't implement > > + * cpufreq driver. Although, architecture doesn't put any restrictions on > > + * maximum frequency but 5 GHz seems to be safe maximum given the available > > + * Arm CPUs in the market which are clocked much less than 5 GHz. On the other > > + * hand, we can't make it much higher as it would lead to a large hard-lockup > > + * detection timeout on parts which are running slower (eg. 1GHz on > > + * Developerbox) and doesn't possess a cpufreq driver. > > + */ > > +#define SAFE_MAX_CPU_FREQ 5000000000UL // 5 GHz > > +u64 hw_nmi_get_sample_period(int watchdog_thresh) > > +{ > > + unsigned int cpu = smp_processor_id(); > > + unsigned long max_cpu_freq; > > + > > + max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL; > > + if (!max_cpu_freq) > > + max_cpu_freq = SAFE_MAX_CPU_FREQ; > > + > > + return (u64)max_cpu_freq * watchdog_thresh; > > +} > > This change is not mentioned in the commit message. > Please, put it into a separate patch. Actully, This cames from [1]: http://lore.kernel.org/linux-arm-kernel/1610712101-14929-1-git-send-email-sumit.garg@linaro.org And I didn't touch the commit message from the origin patch. But of course, I could imporve it with proper description if anyone thinks it's not good enough. Would you mean put this function hw_nmi_get_sample_period() in patch 6th? In the view of "arm64 uses delayed init with all the functionality it need to set up", IMO, this make sense for me to put into a single patch. But if you still think this should put into a separate patch, I'll do it:) > > > +int __init watchdog_nmi_probe(void) > > +{ > > + if (!allow_lockup_detector_init_retry) > > + return -EBUSY; > > How do you know that you should return -EBUSY > when retry in not enabled? > > I guess that it is an optimization to make it fast > during the first call. But the logic is far from > obvious. > Yes, you can see this as an optimization, because arm64 PMU is not ready during lockup_detector_init(), so the watchdog_nmi_probe() must fail. Thus we only want to do watchdog_nmi_probe() in delayed init, so if not in the state (allow_lockup_detector_init_retry=true), just tell if it's unclear, maybe a brief comment can be add like this: + /* arm64 is only able to initialize lockup detecor during delayed init */ + if (!allow_lockup_detector_init_retry) + return -EBUSY; > > + > > + if (!arm_pmu_irq_is_nmi()) > > + return -ENODEV; > > + > > + return hardlockup_detector_perf_init(); > > +} > > Is this just an optimization or is it really needed? > Why this was not needed in v2 patchset? > > If it is just an optimization then I would remove it. > IMHO, it just adds confusion and it is not worth it. > It was a mistake when I rebased v2, This should be included in v2 but I missed it. For arm_pmu_irq_is_nmi() checking, we do need it, becasue arm64 needs explictly turns on Pseudo-NMI to support base function for NMI. hardlockup_detector_perf_init() will success even if we haven't had Pseudo-NMI turns on, however, the pmu interrupts will act like a normal interrupt instead of NMI and the hardlockup detector would be broken. thanks for all the comment BRs, Lecopzer
On Tue 2022-04-05 20:53:04, Lecopzer Chen wrote: > > > On Thu 2022-03-24 22:14:05, Lecopzer Chen wrote: > > > With the recent feature added to enable perf events to use pseudo NMIs > > > as interrupts on platforms which support GICv3 or later, its now been > > > possible to enable hard lockup detector (or NMI watchdog) on arm64 > > > platforms. So enable corresponding support. > > > > > > One thing to note here is that normally lockup detector is initialized > > > just after the early initcalls but PMU on arm64 comes up much later as > > > device_initcall(). To cope with that, overriding watchdog_nmi_probe() to > > > let the watchdog framework know PMU not ready, and inform the framework > > > to re-initialize lockup detection once PMU has been initialized. > > > > > > [1]: http://lore.kernel.org/linux-arm-kernel/1610712101-14929-1-git-send-email-sumit.garg@linaro.org > > > > > > --- /dev/null > > > +++ b/arch/arm64/kernel/watchdog_hld.c > > > @@ -0,0 +1,37 @@ > > > +// SPDX-License-Identifier: GPL-2.0 > > > +#include <linux/nmi.h> > > > +#include <linux/cpufreq.h> > > > +#include <linux/perf/arm_pmu.h> > > > + > > > +/* > > > + * Safe maximum CPU frequency in case a particular platform doesn't implement > > > + * cpufreq driver. Although, architecture doesn't put any restrictions on > > > + * maximum frequency but 5 GHz seems to be safe maximum given the available > > > + * Arm CPUs in the market which are clocked much less than 5 GHz. On the other > > > + * hand, we can't make it much higher as it would lead to a large hard-lockup > > > + * detection timeout on parts which are running slower (eg. 1GHz on > > > + * Developerbox) and doesn't possess a cpufreq driver. > > > + */ > > > +#define SAFE_MAX_CPU_FREQ 5000000000UL // 5 GHz > > > +u64 hw_nmi_get_sample_period(int watchdog_thresh) > > > +{ > > > + unsigned int cpu = smp_processor_id(); > > > + unsigned long max_cpu_freq; > > > + > > > + max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL; > > > + if (!max_cpu_freq) > > > + max_cpu_freq = SAFE_MAX_CPU_FREQ; > > > + > > > + return (u64)max_cpu_freq * watchdog_thresh; > > > +} > > > > This change is not mentioned in the commit message. > > Please, put it into a separate patch. > > > Actully, This cames from > [1]: http://lore.kernel.org/linux-arm-kernel/1610712101-14929-1-git-send-email-sumit.garg@linaro.org > And I didn't touch the commit message from the origin patch. > But of course, I could imporve it with proper description if > anyone thinks it's not good enough. I see. > Would you mean put this function hw_nmi_get_sample_period() in patch > 6th? > In the view of "arm64 uses delayed init with all the functionality it need to set up", > IMO, this make sense for me to put into a single patch. Or you could split it in two patches and add hw_nmi_get_sample_period() in the earlier patch. > But if you still think this should put into a separate patch, I'll do it:) It is always better to split the changes whenever possible. It makes the review easier. And it also helps to find the real culprit of a regression using bisection. > > > +int __init watchdog_nmi_probe(void) > > > +{ > > > + if (!allow_lockup_detector_init_retry) > > > + return -EBUSY; > > > > How do you know that you should return -EBUSY > > when retry in not enabled? > > > > I guess that it is an optimization to make it fast > > during the first call. But the logic is far from > > obvious. > > > > Yes, you can see this as an optimization, because arm64 PMU is not ready > during lockup_detector_init(), so the watchdog_nmi_probe() must fail. > > Thus we only want to do watchdog_nmi_probe() in delayed init, > so if not in the state (allow_lockup_detector_init_retry=true), just tell > > if it's unclear Yes, it is far from obvious. > maybe a brief comment can be add like this: > > + /* arm64 is only able to initialize lockup detecor during delayed init */ > + if (!allow_lockup_detector_init_retry) > + return -EBUSY; No, please, remove this optimization. It just makes problems: + it requires a comment here because the logic is far from obvious. + it is the reason why we need another variable to avoid the race in lockup_detector_check(), see the discussion about the 4th patch. > > > + > > > + if (!arm_pmu_irq_is_nmi()) > > > + return -ENODEV; > > > + > > > + return hardlockup_detector_perf_init(); > > > +} > > > For arm_pmu_irq_is_nmi() checking, we do need it, becasue arm64 needs > explictly turns on Pseudo-NMI to support base function for NMI. > > hardlockup_detector_perf_init() will success even if we haven't had > Pseudo-NMI turns on, however, the pmu interrupts will act like a > normal interrupt instead of NMI and the hardlockup detector would be broken. I see. Please, explain this in a comment. It is another thing that is far from obvious. Best Regards, Petr
> On Tue 2022-04-05 20:53:04, Lecopzer Chen wrote: > > > > > On Thu 2022-03-24 22:14:05, Lecopzer Chen wrote: > > > > With the recent feature added to enable perf events to use pseudo NMIs > > > > as interrupts on platforms which support GICv3 or later, its now been > > > > possible to enable hard lockup detector (or NMI watchdog) on arm64 > > > > platforms. So enable corresponding support. > > > > > > > > One thing to note here is that normally lockup detector is initialized > > > > just after the early initcalls but PMU on arm64 comes up much later as > > > > device_initcall(). To cope with that, overriding watchdog_nmi_probe() to > > > > let the watchdog framework know PMU not ready, and inform the framework > > > > to re-initialize lockup detection once PMU has been initialized. > > > > > > > > [1]: http://lore.kernel.org/linux-arm-kernel/1610712101-14929-1-git-send-email-sumit.garg@linaro.org > > > > > > > > --- /dev/null > > > > +++ b/arch/arm64/kernel/watchdog_hld.c > > > > @@ -0,0 +1,37 @@ > > > > +// SPDX-License-Identifier: GPL-2.0 > > > > +#include <linux/nmi.h> > > > > +#include <linux/cpufreq.h> > > > > +#include <linux/perf/arm_pmu.h> > > > > + > > > > +/* > > > > + * Safe maximum CPU frequency in case a particular platform doesn't implement > > > > + * cpufreq driver. Although, architecture doesn't put any restrictions on > > > > + * maximum frequency but 5 GHz seems to be safe maximum given the available > > > > + * Arm CPUs in the market which are clocked much less than 5 GHz. On the other > > > > + * hand, we can't make it much higher as it would lead to a large hard-lockup > > > > + * detection timeout on parts which are running slower (eg. 1GHz on > > > > + * Developerbox) and doesn't possess a cpufreq driver. > > > > + */ > > > > +#define SAFE_MAX_CPU_FREQ 5000000000UL // 5 GHz > > > > +u64 hw_nmi_get_sample_period(int watchdog_thresh) > > > > +{ > > > > + unsigned int cpu = smp_processor_id(); > > > > + unsigned long max_cpu_freq; > > > > + > > > > + max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL; > > > > + if (!max_cpu_freq) > > > > + max_cpu_freq = SAFE_MAX_CPU_FREQ; > > > > + > > > > + return (u64)max_cpu_freq * watchdog_thresh; > > > > +} > > > > > > This change is not mentioned in the commit message. > > > Please, put it into a separate patch. > > > > > > Actully, This cames from > > [1]: http://lore.kernel.org/linux-arm-kernel/1610712101-14929-1-git-send-email-sumit.garg@linaro.org > > And I didn't touch the commit message from the origin patch. > > But of course, I could imporve it with proper description if > > anyone thinks it's not good enough. > > I see. > > > Would you mean put this function hw_nmi_get_sample_period() in patch > > 6th? > > In the view of "arm64 uses delayed init with all the functionality it need to set up", > > IMO, this make sense for me to put into a single patch. > > Or you could split it in two patches and add > hw_nmi_get_sample_period() in the earlier patch. > > > > But if you still think this should put into a separate patch, I'll do it:) > > It is always better to split the changes whenever possible. It makes > the review easier. And it also helps to find the real culprit of > a regression using bisection. Okay, I'll split this part into another change, thanks. > > > > +int __init watchdog_nmi_probe(void) > > > > +{ > > > > + if (!allow_lockup_detector_init_retry) > > > > + return -EBUSY; > > > > > > How do you know that you should return -EBUSY > > > when retry in not enabled? > > > > > > I guess that it is an optimization to make it fast > > > during the first call. But the logic is far from > > > obvious. > > > > > > > Yes, you can see this as an optimization, because arm64 PMU is not ready > > during lockup_detector_init(), so the watchdog_nmi_probe() must fail. > > > > Thus we only want to do watchdog_nmi_probe() in delayed init, > > so if not in the state (allow_lockup_detector_init_retry=true), just tell > > > > if it's unclear > > Yes, it is far from obvious. > > > maybe a brief comment can be add like this: > > > > + /* arm64 is only able to initialize lockup detecor during delayed init */ > > + if (!allow_lockup_detector_init_retry) > > + return -EBUSY; > > No, please, remove this optimization. It just makes problems: > > + it requires a comment here because the logic is far from obvious. > > + it is the reason why we need another variable to avoid the race in > lockup_detector_check(), see the discussion about the 4th patch. After some days studying, if I remove this if-condition which means the following hardlockup_detector_perf_init() needs to return -EBUSY. However, the default return value that if pmu is not ready is -ENOENT. The call path for hardlockup_detector_perf_init() is really complicated, I have some approach about this: 1. abstract second variable with Kconfig. a. Add a ARCH_SUPPORTS_HARDLOCKUP_DETECTOR_DLAYED_INIT (the naming is a little bit long, may have better naming) in "lib/Kconfig.debug" if ARCH knew they do need delayed init for lockup detector. + select ARCH_SUPPORTS_HARDLOCKUP_DETECTOR_DLAYED_INIT if HAVE_HARDLOCKUP_DETECTOR_PERF b. and the watchdog_nmi_probe would look like. +int __init watchdog_nmi_probe(void) +{ + int ret; + + /* comment here... */ + if (!arm_pmu_irq_is_nmi()) + return -ENODEV; + + ret = hardlockup_detector_perf_init(); + if (ret && + IS_ENABLED(ARCH_SUPPORTS_HARDLOCKUP_DETECTOR_DLAYED_INIT)) + return -EBUSY; + + return ret; +} and than we can have only one variable (allow_lockup_detector_init_retry) in 4th patch. 2. base on ARCH_SUPPORTS_HARDLOCKUP_DETECTOR_DLAYED_INIT, change inside hardlockup_detector_perf_init(). int __init hardlockup_detector_perf_init(void) { int ret = hardlockup_detector_event_create(); if (ret) { pr_info("Perf NMI watchdog permanently disabled\n"); + + /* comment here... */ + if (IS_ENABLED(ARCH_SUPPORTS_HARDLOCKUP_DETECTOR_DLAYED_INIT)) + ret = -EBUSY; } else { perf_event_release_kernel(this_cpu_read(watchdog_ev)); this_cpu_write(watchdog_ev, NULL); } return ret; } 3. Don't add any other config, try to find a proper location to return -EBUSY in hardlockup_detector_event_create(). IMHO, this may involve the PMU subsys and should be the hardest approach. > > > > + > > > > + if (!arm_pmu_irq_is_nmi()) > > > > + return -ENODEV; > > > > + > > > > + return hardlockup_detector_perf_init(); > > > > +} > > > > > For arm_pmu_irq_is_nmi() checking, we do need it, becasue arm64 needs > > explictly turns on Pseudo-NMI to support base function for NMI. > > > > hardlockup_detector_perf_init() will success even if we haven't had > > Pseudo-NMI turns on, however, the pmu interrupts will act like a > > normal interrupt instead of NMI and the hardlockup detector would be broken. > > I see. Please, explain this in a comment. It is another thing > that is far from obvious. > thank you, I'll just add the comment above like this. /* * hardlockup_detector_perf_init() will success even if we haven't had * Pseudo-NMI turns on, however, the pmu interrupts will act like a * normal interrupt instead of NMI and the hardlockup detector would be broken. */ if (!arm_pmu_irq_is_nmi()) return -ENODEV; thanks BRs, Lecopzer
On Fri 2022-04-08 00:59:49, Lecopzer Chen wrote: > > > On Tue 2022-04-05 20:53:04, Lecopzer Chen wrote: > > > > > > > On Thu 2022-03-24 22:14:05, Lecopzer Chen wrote: > > > > > With the recent feature added to enable perf events to use pseudo NMIs > > > > > as interrupts on platforms which support GICv3 or later, its now been > > > > > possible to enable hard lockup detector (or NMI watchdog) on arm64 > > > > > platforms. So enable corresponding support. > > > > > > > > > > One thing to note here is that normally lockup detector is initialized > > > > > just after the early initcalls but PMU on arm64 comes up much later as > > > > > device_initcall(). To cope with that, overriding watchdog_nmi_probe() to > > > > > let the watchdog framework know PMU not ready, and inform the framework > > > > > to re-initialize lockup detection once PMU has been initialized. > > > > > > > > > > [1]: http://lore.kernel.org/linux-arm-kernel/1610712101-14929-1-git-send-email-sumit.garg@linaro.org > > > > > > > > > > --- /dev/null > > > > > +++ b/arch/arm64/kernel/watchdog_hld.c > > > > > @@ -0,0 +1,37 @@ > > > > > +// SPDX-License-Identifier: GPL-2.0 > > > > > +#include <linux/nmi.h> > > > > > +#include <linux/cpufreq.h> > > > > > +#include <linux/perf/arm_pmu.h> > > > > > + > > > > > +/* > > > > > + * Safe maximum CPU frequency in case a particular platform doesn't implement > > > > > + * cpufreq driver. Although, architecture doesn't put any restrictions on > > > > > + * maximum frequency but 5 GHz seems to be safe maximum given the available > > > > > + * Arm CPUs in the market which are clocked much less than 5 GHz. On the other > > > > > + * hand, we can't make it much higher as it would lead to a large hard-lockup > > > > > + * detection timeout on parts which are running slower (eg. 1GHz on > > > > > + * Developerbox) and doesn't possess a cpufreq driver. > > > > > + */ > > > > > +#define SAFE_MAX_CPU_FREQ 5000000000UL // 5 GHz > > > > > +u64 hw_nmi_get_sample_period(int watchdog_thresh) > > > > > +{ > > > > > + unsigned int cpu = smp_processor_id(); > > > > > + unsigned long max_cpu_freq; > > > > > + > > > > > + max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL; > > > > > + if (!max_cpu_freq) > > > > > + max_cpu_freq = SAFE_MAX_CPU_FREQ; > > > > > + > > > > > + return (u64)max_cpu_freq * watchdog_thresh; > > > > > +} > > > > > > > > This change is not mentioned in the commit message. > > > > Please, put it into a separate patch. > > > > > > > > > Actully, This cames from > > > [1]: http://lore.kernel.org/linux-arm-kernel/1610712101-14929-1-git-send-email-sumit.garg@linaro.org > > > And I didn't touch the commit message from the origin patch. > > > But of course, I could imporve it with proper description if > > > anyone thinks it's not good enough. > > > > I see. > > > > > Would you mean put this function hw_nmi_get_sample_period() in patch > > > 6th? > > > In the view of "arm64 uses delayed init with all the functionality it need to set up", > > > IMO, this make sense for me to put into a single patch. > > > > Or you could split it in two patches and add > > hw_nmi_get_sample_period() in the earlier patch. > > > > > > > But if you still think this should put into a separate patch, I'll do it:) > > > > It is always better to split the changes whenever possible. It makes > > the review easier. And it also helps to find the real culprit of > > a regression using bisection. > > Okay, I'll split this part into another change, thanks. > > > > > > > +int __init watchdog_nmi_probe(void) > > > > > +{ > > > > > + if (!allow_lockup_detector_init_retry) > > > > > + return -EBUSY; > > > > > > > > How do you know that you should return -EBUSY > > > > when retry in not enabled? > > > > > > > > I guess that it is an optimization to make it fast > > > > during the first call. But the logic is far from > > > > obvious. > > > > > > > > > > Yes, you can see this as an optimization, because arm64 PMU is not ready > > > during lockup_detector_init(), so the watchdog_nmi_probe() must fail. > > > > > > Thus we only want to do watchdog_nmi_probe() in delayed init, > > > so if not in the state (allow_lockup_detector_init_retry=true), just tell > > > > > > if it's unclear > > > > Yes, it is far from obvious. > > > > > maybe a brief comment can be add like this: > > > > > > + /* arm64 is only able to initialize lockup detecor during delayed init */ > > > + if (!allow_lockup_detector_init_retry) > > > + return -EBUSY; > > > > No, please, remove this optimization. It just makes problems: > > > > + it requires a comment here because the logic is far from obvious. > > > > + it is the reason why we need another variable to avoid the race in > > lockup_detector_check(), see the discussion about the 4th patch. > > After some days studying, if I remove this if-condition which means the > following hardlockup_detector_perf_init() needs to return -EBUSY. > However, the default return value that if pmu is not ready is -ENOENT. I see. > The call path for hardlockup_detector_perf_init() is really complicated, > > I have some approach about this: > 1. abstract second variable with Kconfig. > a. Add a ARCH_SUPPORTS_HARDLOCKUP_DETECTOR_DLAYED_INIT > (the naming is a little bit long, may have better naming) > in "lib/Kconfig.debug" if ARCH knew they do need delayed init for > lockup detector. > > + select ARCH_SUPPORTS_HARDLOCKUP_DETECTOR_DLAYED_INIT if HAVE_HARDLOCKUP_DETECTOR_PERF > > b. and the watchdog_nmi_probe would look like. > > +int __init watchdog_nmi_probe(void) > +{ > + int ret; > + > + /* comment here... */ > + if (!arm_pmu_irq_is_nmi()) > + return -ENODEV; > + > + ret = hardlockup_detector_perf_init(); > + if (ret && > + IS_ENABLED(ARCH_SUPPORTS_HARDLOCKUP_DETECTOR_DLAYED_INIT)) > + return -EBUSY; > + > + return ret; > +} > > and than we can have only one variable (allow_lockup_detector_init_retry) > in 4th patch. > > > 2. base on ARCH_SUPPORTS_HARDLOCKUP_DETECTOR_DLAYED_INIT, change > inside hardlockup_detector_perf_init(). > > int __init hardlockup_detector_perf_init(void) > { > int ret = hardlockup_detector_event_create(); > > if (ret) { > pr_info("Perf NMI watchdog permanently disabled\n"); > + > + /* comment here... */ > + if (IS_ENABLED(ARCH_SUPPORTS_HARDLOCKUP_DETECTOR_DLAYED_INIT)) > + ret = -EBUSY; > } else { > perf_event_release_kernel(this_cpu_read(watchdog_ev)); > this_cpu_write(watchdog_ev, NULL); > } > return ret; > } > > 3. Don't add any other config, try to find a proper location > to return -EBUSY in hardlockup_detector_event_create(). > IMHO, this may involve the PMU subsys and should be > the hardest approach. Honestly, everything looks a bit ugly and complicated to me. OKAY, is the return value actually important? What about just introducing the API that will allow to try to initialize the hardlockup detector later: /* * Retry hardlockup detector init. It is useful when it requires some * functionality that has to be initialized later on a particular * platform. */ void __init retry_lockup_detector_init(void) { /* Must be called before late init calls. */ if (!allow_lockup_detector_init_retry) return 0; queue_work_on(__smp_processor_id(), system_wq, &detector_work); } /* * Ensure that optional delayed hardlockup init is proceed before * the init code and memory is freed. */ static int __init lockup_detector_check(void) { /* Prevent any later retry. */ allow_lockup_detector_init_retry = false; /* Make sure no work is pending. */ flush_work(&detector_work); } late_initcall_sync(lockup_detector_check); You could leave lockup_detector_init() as it is. It does not really matter what was the exact error value returned by watchdog_nmi_probe(). Then you could call retry_lockup_detector_init() in armv8_pmu_driver_init() and be done with it. It will be universal API that might be used on any architecture for any reason. If nobody calls retry_lockup_detector_init() then nohing will happen and the code will work as before. It might make sense to provide the API only on architectures that really need it. We could hide it under ARCH_NEED_DELAYED_HARDLOCKUP_DETECTOR_INIT , similar to ARCH_NEEDS_CPU_IDLE_COUPLE. Best Regards, Petr
> On Fri 2022-04-08 00:59:49, Lecopzer Chen wrote: > > > > > On Tue 2022-04-05 20:53:04, Lecopzer Chen wrote: > > > > > > > > > On Thu 2022-03-24 22:14:05, Lecopzer Chen wrote: > > > > > > With the recent feature added to enable perf events to use pseudo NMIs > > > > > > as interrupts on platforms which support GICv3 or later, its now been > > > > > > possible to enable hard lockup detector (or NMI watchdog) on arm64 > > > > > > platforms. So enable corresponding support. > > > > > > > > > > > > One thing to note here is that normally lockup detector is initialized > > > > > > just after the early initcalls but PMU on arm64 comes up much later as > > > > > > device_initcall(). To cope with that, overriding watchdog_nmi_probe() to > > > > > > let the watchdog framework know PMU not ready, and inform the framework > > > > > > to re-initialize lockup detection once PMU has been initialized. > > > > > > > > > > > > [1]: http://lore.kernel.org/linux-arm-kernel/1610712101-14929-1-git-send-email-sumit.garg@linaro.org > > > > > > > > > > > > --- /dev/null > > > > > > +++ b/arch/arm64/kernel/watchdog_hld.c > > > > > > @@ -0,0 +1,37 @@ > > > > > > +// SPDX-License-Identifier: GPL-2.0 > > > > > > +#include <linux/nmi.h> > > > > > > +#include <linux/cpufreq.h> > > > > > > +#include <linux/perf/arm_pmu.h> > > > > > > + > > > > > > +/* > > > > > > + * Safe maximum CPU frequency in case a particular platform doesn't implement > > > > > > + * cpufreq driver. Although, architecture doesn't put any restrictions on > > > > > > + * maximum frequency but 5 GHz seems to be safe maximum given the available > > > > > > + * Arm CPUs in the market which are clocked much less than 5 GHz. On the other > > > > > > + * hand, we can't make it much higher as it would lead to a large hard-lockup > > > > > > + * detection timeout on parts which are running slower (eg. 1GHz on > > > > > > + * Developerbox) and doesn't possess a cpufreq driver. > > > > > > + */ > > > > > > +#define SAFE_MAX_CPU_FREQ 5000000000UL // 5 GHz > > > > > > +u64 hw_nmi_get_sample_period(int watchdog_thresh) > > > > > > +{ > > > > > > + unsigned int cpu = smp_processor_id(); > > > > > > + unsigned long max_cpu_freq; > > > > > > + > > > > > > + max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL; > > > > > > + if (!max_cpu_freq) > > > > > > + max_cpu_freq = SAFE_MAX_CPU_FREQ; > > > > > > + > > > > > > + return (u64)max_cpu_freq * watchdog_thresh; > > > > > > +} > > > > > > > > > > This change is not mentioned in the commit message. > > > > > Please, put it into a separate patch. > > > > > > > > > > > > Actully, This cames from > > > > [1]: http://lore.kernel.org/linux-arm-kernel/1610712101-14929-1-git-send-email-sumit.garg@linaro.org > > > > And I didn't touch the commit message from the origin patch. > > > > But of course, I could imporve it with proper description if > > > > anyone thinks it's not good enough. > > > > > > I see. > > > > > > > Would you mean put this function hw_nmi_get_sample_period() in patch > > > > 6th? > > > > In the view of "arm64 uses delayed init with all the functionality it need to set up", > > > > IMO, this make sense for me to put into a single patch. > > > > > > Or you could split it in two patches and add > > > hw_nmi_get_sample_period() in the earlier patch. > > > > > > > > > > But if you still think this should put into a separate patch, I'll do it:) > > > > > > It is always better to split the changes whenever possible. It makes > > > the review easier. And it also helps to find the real culprit of > > > a regression using bisection. > > > > Okay, I'll split this part into another change, thanks. > > > > > > > > > > +int __init watchdog_nmi_probe(void) > > > > > > +{ > > > > > > + if (!allow_lockup_detector_init_retry) > > > > > > + return -EBUSY; > > > > > > > > > > How do you know that you should return -EBUSY > > > > > when retry in not enabled? > > > > > > > > > > I guess that it is an optimization to make it fast > > > > > during the first call. But the logic is far from > > > > > obvious. > > > > > > > > > > > > > Yes, you can see this as an optimization, because arm64 PMU is not ready > > > > during lockup_detector_init(), so the watchdog_nmi_probe() must fail. > > > > > > > > Thus we only want to do watchdog_nmi_probe() in delayed init, > > > > so if not in the state (allow_lockup_detector_init_retry=true), just tell > > > > > > > > if it's unclear > > > > > > Yes, it is far from obvious. > > > > > > > maybe a brief comment can be add like this: > > > > > > > > + /* arm64 is only able to initialize lockup detecor during delayed init */ > > > > + if (!allow_lockup_detector_init_retry) > > > > + return -EBUSY; > > > > > > No, please, remove this optimization. It just makes problems: > > > > > > + it requires a comment here because the logic is far from obvious. > > > > > > + it is the reason why we need another variable to avoid the race in > > > lockup_detector_check(), see the discussion about the 4th patch. > > > > After some days studying, if I remove this if-condition which means the > > following hardlockup_detector_perf_init() needs to return -EBUSY. > > However, the default return value that if pmu is not ready is -ENOENT. > > I see. > > > The call path for hardlockup_detector_perf_init() is really complicated, > > > > I have some approach about this: > > 1. abstract second variable with Kconfig. > > a. Add a ARCH_SUPPORTS_HARDLOCKUP_DETECTOR_DLAYED_INIT > > (the naming is a little bit long, may have better naming) > > in "lib/Kconfig.debug" if ARCH knew they do need delayed init for > > lockup detector. > > > > + select ARCH_SUPPORTS_HARDLOCKUP_DETECTOR_DLAYED_INIT if HAVE_HARDLOCKUP_DETECTOR_PERF > > > > b. and the watchdog_nmi_probe would look like. > > > > +int __init watchdog_nmi_probe(void) > > +{ > > + int ret; > > + > > + /* comment here... */ > > + if (!arm_pmu_irq_is_nmi()) > > + return -ENODEV; > > + > > + ret = hardlockup_detector_perf_init(); > > + if (ret && > > + IS_ENABLED(ARCH_SUPPORTS_HARDLOCKUP_DETECTOR_DLAYED_INIT)) > > + return -EBUSY; > > + > > + return ret; > > +} > > > > and than we can have only one variable (allow_lockup_detector_init_retry) > > in 4th patch. > > > > > > 2. base on ARCH_SUPPORTS_HARDLOCKUP_DETECTOR_DLAYED_INIT, change > > inside hardlockup_detector_perf_init(). > > > > int __init hardlockup_detector_perf_init(void) > > { > > int ret = hardlockup_detector_event_create(); > > > > if (ret) { > > pr_info("Perf NMI watchdog permanently disabled\n"); > > + > > + /* comment here... */ > > + if (IS_ENABLED(ARCH_SUPPORTS_HARDLOCKUP_DETECTOR_DLAYED_INIT)) > > + ret = -EBUSY; > > } else { > > perf_event_release_kernel(this_cpu_read(watchdog_ev)); > > this_cpu_write(watchdog_ev, NULL); > > } > > return ret; > > } > > > > 3. Don't add any other config, try to find a proper location > > to return -EBUSY in hardlockup_detector_event_create(). > > IMHO, this may involve the PMU subsys and should be > > the hardest approach. > > Honestly, everything looks a bit ugly and complicated to me. > > OKAY, is the return value actually important? > > What about just introducing the API that will allow to try to > initialize the hardlockup detector later: > > /* > * Retry hardlockup detector init. It is useful when it requires some > * functionality that has to be initialized later on a particular > * platform. > */ > void __init retry_lockup_detector_init(void) > { > /* Must be called before late init calls. */ > if (!allow_lockup_detector_init_retry) > return 0; > > queue_work_on(__smp_processor_id(), system_wq, &detector_work); > } > > /* > * Ensure that optional delayed hardlockup init is proceed before > * the init code and memory is freed. > */ > static int __init lockup_detector_check(void) > { > /* Prevent any later retry. */ > allow_lockup_detector_init_retry = false; > > /* Make sure no work is pending. */ > flush_work(&detector_work); > } > late_initcall_sync(lockup_detector_check); > > You could leave lockup_detector_init() as it is. It does not really > matter what was the exact error value returned by watchdog_nmi_probe(). > > Then you could call retry_lockup_detector_init() in > armv8_pmu_driver_init() and be done with it. > > It will be universal API that might be used on any architecture > for any reason. If nobody calls retry_lockup_detector_init() > then nohing will happen and the code will work as before. > > It might make sense to provide the API only on architectures that > really need it. We could hide it under > > ARCH_NEED_DELAYED_HARDLOCKUP_DETECTOR_INIT > > , similar to ARCH_NEEDS_CPU_IDLE_COUPLE. > Sorry for late reply. It's really a good idea. Since I have already had lots things to revise in v3, I'm now preparing the V4. I'll send it in these few days. Thanks a lots for your great idea. BRs, Lecopzer
> > The call path for hardlockup_detector_perf_init() is really complicated, > > > > I have some approach about this: > > 1. abstract second variable with Kconfig. > > a. Add a ARCH_SUPPORTS_HARDLOCKUP_DETECTOR_DLAYED_INIT > > (the naming is a little bit long, may have better naming) > > in "lib/Kconfig.debug" if ARCH knew they do need delayed init for > > lockup detector. > > > > + select ARCH_SUPPORTS_HARDLOCKUP_DETECTOR_DLAYED_INIT if HAVE_HARDLOCKUP_DETECTOR_PERF > > > > b. and the watchdog_nmi_probe would look like. > > > > +int __init watchdog_nmi_probe(void) > > +{ > > + int ret; > > + > > + /* comment here... */ > > + if (!arm_pmu_irq_is_nmi()) > > + return -ENODEV; > > + > > + ret = hardlockup_detector_perf_init(); > > + if (ret && > > + IS_ENABLED(ARCH_SUPPORTS_HARDLOCKUP_DETECTOR_DLAYED_INIT)) > > + return -EBUSY; > > + > > + return ret; > > +} > > > > and than we can have only one variable (allow_lockup_detector_init_retry) > > in 4th patch. > > > > > > 2. base on ARCH_SUPPORTS_HARDLOCKUP_DETECTOR_DLAYED_INIT, change > > inside hardlockup_detector_perf_init(). > > > > int __init hardlockup_detector_perf_init(void) > > { > > int ret = hardlockup_detector_event_create(); > > > > if (ret) { > > pr_info("Perf NMI watchdog permanently disabled\n"); > > + > > + /* comment here... */ > > + if (IS_ENABLED(ARCH_SUPPORTS_HARDLOCKUP_DETECTOR_DLAYED_INIT)) > > + ret = -EBUSY; > > } else { > > perf_event_release_kernel(this_cpu_read(watchdog_ev)); > > this_cpu_write(watchdog_ev, NULL); > > } > > return ret; > > } > > > > 3. Don't add any other config, try to find a proper location > > to return -EBUSY in hardlockup_detector_event_create(). > > IMHO, this may involve the PMU subsys and should be > > the hardest approach. > > Honestly, everything looks a bit ugly and complicated to me. > > OKAY, is the return value actually important? > > What about just introducing the API that will allow to try to > initialize the hardlockup detector later: > > /* > * Retry hardlockup detector init. It is useful when it requires some > * functionality that has to be initialized later on a particular > * platform. > */ > void __init retry_lockup_detector_init(void) > { > /* Must be called before late init calls. */ > if (!allow_lockup_detector_init_retry) > return 0; > > queue_work_on(__smp_processor_id(), system_wq, &detector_work); > } > > /* > * Ensure that optional delayed hardlockup init is proceed before > * the init code and memory is freed. > */ > static int __init lockup_detector_check(void) > { > /* Prevent any later retry. */ > allow_lockup_detector_init_retry = false; > > /* Make sure no work is pending. */ > flush_work(&detector_work); > } > late_initcall_sync(lockup_detector_check); > > You could leave lockup_detector_init() as it is. It does not really > matter what was the exact error value returned by watchdog_nmi_probe(). > > Then you could call retry_lockup_detector_init() in > armv8_pmu_driver_init() and be done with it. > > It will be universal API that might be used on any architecture > for any reason. If nobody calls retry_lockup_detector_init() > then nohing will happen and the code will work as before. > > It might make sense to provide the API only on architectures that > really need it. We could hide it under > > ARCH_NEED_DELAYED_HARDLOCKUP_DETECTOR_INIT > > , similar to ARCH_NEEDS_CPU_IDLE_COUPLE. > During implementation, if I add ARCH_NEED_DELAYED_..., there will contain many enclosed ifdef-endif and is a little bit ugly. Also, I didn't find a must-have reason to this Kconfig after I rebase from your suggestion. The one calls retry_lockup_detector_init() must fail at lockup_detector_init, so I think anyone who has aleady failed at lockup_detector_init() has right to retry no matter HW, SW or Arch reason. Thus I might not introduce a new Kconfig in v4, and I would be glad to see if any further commet on this.
On Wed 2022-04-27 00:38:40, Lecopzer Chen wrote: > > What about just introducing the API that will allow to try to > > initialize the hardlockup detector later: > > > > /* > > * Retry hardlockup detector init. It is useful when it requires some > > * functionality that has to be initialized later on a particular > > * platform. > > */ > > void __init retry_lockup_detector_init(void) > > { > > /* Must be called before late init calls. */ > > if (!allow_lockup_detector_init_retry) > > return 0; > > > > queue_work_on(__smp_processor_id(), system_wq, &detector_work); > > } > > > > /* > > * Ensure that optional delayed hardlockup init is proceed before > > * the init code and memory is freed. > > */ > > static int __init lockup_detector_check(void) > > { > > /* Prevent any later retry. */ > > allow_lockup_detector_init_retry = false; > > > > /* Make sure no work is pending. */ > > flush_work(&detector_work); > > } > > late_initcall_sync(lockup_detector_check); > > > > You could leave lockup_detector_init() as it is. It does not really > > matter what was the exact error value returned by watchdog_nmi_probe(). > > > > Then you could call retry_lockup_detector_init() in > > armv8_pmu_driver_init() and be done with it. > > > > It will be universal API that might be used on any architecture > > for any reason. If nobody calls retry_lockup_detector_init() > > then nohing will happen and the code will work as before. > > > > It might make sense to provide the API only on architectures that > > really need it. We could hide it under > > > > ARCH_NEED_DELAYED_HARDLOCKUP_DETECTOR_INIT > > > > , similar to ARCH_NEEDS_CPU_IDLE_COUPLE. > > > > During implementation, if I add ARCH_NEED_DELAYED_..., there will contain > many enclosed ifdef-endif and is a little bit ugly. > Also, I didn't find a must-have reason to this Kconfig after I rebase from > your suggestion. > > The one calls retry_lockup_detector_init() must fail at lockup_detector_init, > so I think anyone who has aleady failed at lockup_detector_init() has right > to retry no matter HW, SW or Arch reason. > Thus I might not introduce a new Kconfig in v4, and I would be glad to see > if any further commet on this. Sounds reasonable from my POV. There is not much code. It will be removed after the init phase. It will be most likely removed even during compilation when linked with gcc LTE optimization and the symbols are not used. Best Regards, Petr PS: It might take few days until I do the review of v4. I am a bit overloaded at the moment.
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index c842878f8133..9c24052c5360 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -190,6 +190,8 @@ config ARM64 select HAVE_NMI select HAVE_PATA_PLATFORM select HAVE_PERF_EVENTS + select HAVE_PERF_EVENTS_NMI if ARM64_PSEUDO_NMI + select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && HAVE_PERF_EVENTS_NMI select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP select HAVE_REGS_AND_STACK_ACCESS_API diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index 88b3e2a21408..3e62a8877ed7 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -46,6 +46,7 @@ obj-$(CONFIG_MODULES) += module.o obj-$(CONFIG_ARM64_MODULE_PLTS) += module-plts.o obj-$(CONFIG_PERF_EVENTS) += perf_regs.o perf_callchain.o obj-$(CONFIG_HW_PERF_EVENTS) += perf_event.o +obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o obj-$(CONFIG_CPU_PM) += sleep.o suspend.o obj-$(CONFIG_CPU_IDLE) += cpuidle.o diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index cab678ed6618..f44ad669c18f 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -23,6 +23,7 @@ #include <linux/platform_device.h> #include <linux/sched_clock.h> #include <linux/smp.h> +#include <linux/nmi.h> /* ARMv8 Cortex-A53 specific event types. */ #define ARMV8_A53_PERFCTR_PREF_LINEFILL 0xC2 @@ -1380,10 +1381,15 @@ static struct platform_driver armv8_pmu_driver = { static int __init armv8_pmu_driver_init(void) { + int ret; + if (acpi_disabled) - return platform_driver_register(&armv8_pmu_driver); + ret = platform_driver_register(&armv8_pmu_driver); else - return arm_pmu_acpi_probe(armv8_pmuv3_pmu_init); + ret = arm_pmu_acpi_probe(armv8_pmuv3_pmu_init); + + retry_lockup_detector_init(); + return ret; } device_initcall(armv8_pmu_driver_init) diff --git a/arch/arm64/kernel/watchdog_hld.c b/arch/arm64/kernel/watchdog_hld.c new file mode 100644 index 000000000000..c085e99b3cd2 --- /dev/null +++ b/arch/arm64/kernel/watchdog_hld.c @@ -0,0 +1,37 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/nmi.h> +#include <linux/cpufreq.h> +#include <linux/perf/arm_pmu.h> + +/* + * Safe maximum CPU frequency in case a particular platform doesn't implement + * cpufreq driver. Although, architecture doesn't put any restrictions on + * maximum frequency but 5 GHz seems to be safe maximum given the available + * Arm CPUs in the market which are clocked much less than 5 GHz. On the other + * hand, we can't make it much higher as it would lead to a large hard-lockup + * detection timeout on parts which are running slower (eg. 1GHz on + * Developerbox) and doesn't possess a cpufreq driver. + */ +#define SAFE_MAX_CPU_FREQ 5000000000UL // 5 GHz +u64 hw_nmi_get_sample_period(int watchdog_thresh) +{ + unsigned int cpu = smp_processor_id(); + unsigned long max_cpu_freq; + + max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL; + if (!max_cpu_freq) + max_cpu_freq = SAFE_MAX_CPU_FREQ; + + return (u64)max_cpu_freq * watchdog_thresh; +} + +int __init watchdog_nmi_probe(void) +{ + if (!allow_lockup_detector_init_retry) + return -EBUSY; + + if (!arm_pmu_irq_is_nmi()) + return -ENODEV; + + return hardlockup_detector_perf_init(); +} diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c index 295cc7952d0e..e77f4897fca2 100644 --- a/drivers/perf/arm_pmu.c +++ b/drivers/perf/arm_pmu.c @@ -697,6 +697,11 @@ static int armpmu_get_cpu_irq(struct arm_pmu *pmu, int cpu) return per_cpu(hw_events->irq, cpu); } +bool arm_pmu_irq_is_nmi(void) +{ + return has_nmi; +} + /* * PMU hardware loses all context when a CPU goes offline. * When a CPU is hotplugged back in, since some hardware registers are diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h index 2512e2f9cd4e..9325d01adc3e 100644 --- a/include/linux/perf/arm_pmu.h +++ b/include/linux/perf/arm_pmu.h @@ -169,6 +169,8 @@ void kvm_host_pmu_init(struct arm_pmu *pmu); #define kvm_host_pmu_init(x) do { } while(0) #endif +bool arm_pmu_irq_is_nmi(void); + /* Internal functions only for core arm_pmu code */ struct arm_pmu *armpmu_alloc(void); struct arm_pmu *armpmu_alloc_atomic(void);