Message ID | 20220307154729.13477-5-lecopzer.chen@mediatek.com (mailing list archive) |
---|---|
State | New, archived |
Series | Support hld based on Pseudo-NMI for arm64 |
On Mon 2022-03-07 23:47:28, Lecopzer Chen wrote:
> When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> yet. E.g. on arm64, PMU is not ready until
> device_initcall(armv8_pmu_driver_init). And it is deeply integrated
> with the driver model and cpuhp. Hence it is hard to push this
> initialization before smp_init().

The above is clear.

> But it is easy to take an opposite approach by enabling watchdog_hld to
> get the capability of PMU async.
>
> The async model is achieved by expanding watchdog_nmi_probe() with
> -EBUSY, and a re-initializing work_struct which waits on a wait_queue_head.

These two paragraphs are a bit confusing to me. It might be just a
problem with translation. I am not a native speaker. Anyway, I wonder
if the following is more clear:

<proposal>
But it is easy to take an opposite approach and try to initialize
the watchdog once again later.

The delayed probe is called using workqueues. It needs to allocate
memory and must be processed in a normal context.

The delayed probe is queued only when the early one returns -EBUSY.
It is the return code returned when PMU is not ready yet.
</proposal>

> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -103,7 +103,11 @@ void __weak watchdog_nmi_disable(unsigned int cpu)
>  	hardlockup_detector_perf_disable();
>  }
>
> -/* Return 0, if a NMI watchdog is available. Error code otherwise */
> +/*
> + * Arch specific API. Return 0, if a NMI watchdog is available. -EBUSY if not
> + * ready, and arch code should wake up hld_detector_wait when ready. Other
> + * negative value if not support.
> + */

I wonder if the following is slightly more clear:

/*
 * Arch specific API.
 *
 * Return 0 when NMI watchdog is available, negative value otherwise.
 * The error code -EBUSY is special. It means that a deferred probe
 * might succeed later.
 */

>  int __weak __init watchdog_nmi_probe(void)
>  {
>  	return hardlockup_detector_perf_init();
> @@ -839,16 +843,70 @@ static void __init watchdog_sysctl_init(void)
>  #define watchdog_sysctl_init() do { } while (0)
>  #endif /* CONFIG_SYSCTL */
>
> +static void lockup_detector_delay_init(struct work_struct *work);
> +bool lockup_detector_pending_init __initdata;
> +
> +struct wait_queue_head hld_detector_wait __initdata =
> +		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
> +
> +static struct work_struct detector_work __initdata =
> +		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
> +
> +static void __init lockup_detector_delay_init(struct work_struct *work)
> +{
> +	int ret;
> +
> +	wait_event(hld_detector_wait,
> +			lockup_detector_pending_init == false);
> +
> +	/*
> +	 * Here, we know the PMU should be ready, so set pending to true to
> +	 * inform watchdog_nmi_probe() that it shouldn't return -EBUSY again.
> +	 */
> +	lockup_detector_pending_init = true;

This does not make sense to me. We are here only when:

   1. lockup_detector_init() queued this work.

   2. Someone cleared @lockup_detector_pending_init and woke the
      worker via wait_queue. It might be either PMU init code
      or the late lockup_detector_check().

watchdog_nmi_probe() might still return -EBUSY when PMU init failed.

If you wanted to try the delayed probe once again (3rd attempt) from
lockup_detector_check(), you would need to queue the work once again.
But you need to be sure that lockup_detector_check() was not called
yet. Otherwise, the 2nd work might wait forever.

IMHO, it is not worth the complexity.

> +	ret = watchdog_nmi_probe();
> +	if (ret) {
> +		pr_info("Delayed init of the lockup detector failed: %d\n", ret);
> +		pr_info("Perf NMI watchdog permanently disabled\n");
> +		return;
> +	}
> +
> +	nmi_watchdog_available = true;
> +	lockup_detector_setup();
> +	lockup_detector_pending_init = false;
> +}

Otherwise, it looks good to me.

Best Regards,
Petr
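
The return-value contract proposed in the review comment above can be sketched in plain C. This is a minimal userspace illustration, not kernel code: `enum probe_outcome` and `classify_probe()` are hypothetical names that do not appear in the patch, and the sketch only assumes the three cases the comment describes (0, -EBUSY, any other negative value).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names, not from the patch: what a caller can do next. */
enum probe_outcome {
	PROBE_AVAILABLE,	/* 0: the NMI watchdog can be used now */
	PROBE_RETRY_LATER,	/* -EBUSY: a deferred probe might succeed */
	PROBE_UNSUPPORTED,	/* any other negative value: give up */
};

/* Map a watchdog_nmi_probe()-style return value to an outcome. */
static enum probe_outcome classify_probe(int ret)
{
	/* The contract only allows 0 or a negative error code. */
	assert(ret <= 0);

	if (ret == 0)
		return PROBE_AVAILABLE;
	if (ret == -EBUSY)
		return PROBE_RETRY_LATER;
	return PROBE_UNSUPPORTED;
}
```

Treating -EBUSY as the only retryable code keeps the early init path simple: every other error disables the hard lockup detector for good.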

> On Mon 2022-03-07 23:47:28, Lecopzer Chen wrote:
> > When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> > yet. E.g. on arm64, PMU is not ready until
> > device_initcall(armv8_pmu_driver_init). And it is deeply integrated
> > with the driver model and cpuhp. Hence it is hard to push this
> > initialization before smp_init().
>
> The above is clear.
>
> > But it is easy to take an opposite approach by enabling watchdog_hld to
> > get the capability of PMU async.
> >
> > The async model is achieved by expanding watchdog_nmi_probe() with
> > -EBUSY, and a re-initializing work_struct which waits on a wait_queue_head.
>
> These two paragraphs are a bit confusing to me. It might be just a
> problem with translation. I am not a native speaker. Anyway, I wonder
> if the following is more clear:
>
> <proposal>
> But it is easy to take an opposite approach and try to initialize
> the watchdog once again later.
>
> The delayed probe is called using workqueues. It needs to allocate
> memory and must be processed in a normal context.
>
> The delayed probe is queued only when the early one returns -EBUSY.
> It is the return code returned when PMU is not ready yet.
> </proposal>

Of course, the original description only briefly told us the functionality.
So I think it makes sense to explain it if anyone thinks it's unclear.
Also, I'm not a native speaker either, but I don't feel anything weird in
the description.

I'll use the description you provided, thanks.

>
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -103,7 +103,11 @@ void __weak watchdog_nmi_disable(unsigned int cpu)
> >  	hardlockup_detector_perf_disable();
> >  }
> >
> > -/* Return 0, if a NMI watchdog is available. Error code otherwise */
> > +/*
> > + * Arch specific API. Return 0, if a NMI watchdog is available. -EBUSY if not
> > + * ready, and arch code should wake up hld_detector_wait when ready. Other
> > + * negative value if not support.
> > + */
>
> I wonder if the following is slightly more clear:
>
> /*
>  * Arch specific API.
>  *
>  * Return 0 when NMI watchdog is available, negative value otherwise.
>  * The error code -EBUSY is special. It means that a deferred probe
>  * might succeed later.
>  */
>

Yes, this should be more clear.
Abstracting `hld_detector_wait` as a `deferred probe` is a good idea.
Thanks, I'll take this.

> >  int __weak __init watchdog_nmi_probe(void)
> >  {
> >  	return hardlockup_detector_perf_init();
> > @@ -839,16 +843,70 @@ static void __init watchdog_sysctl_init(void)
> >  #define watchdog_sysctl_init() do { } while (0)
> >  #endif /* CONFIG_SYSCTL */
> >
> > +static void lockup_detector_delay_init(struct work_struct *work);
> > +bool lockup_detector_pending_init __initdata;
> > +
> > +struct wait_queue_head hld_detector_wait __initdata =
> > +		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
> > +
> > +static struct work_struct detector_work __initdata =
> > +		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
> > +
> > +static void __init lockup_detector_delay_init(struct work_struct *work)
> > +{
> > +	int ret;
> > +
> > +	wait_event(hld_detector_wait,
> > +			lockup_detector_pending_init == false);
> > +
> > +	/*
> > +	 * Here, we know the PMU should be ready, so set pending to true to
> > +	 * inform watchdog_nmi_probe() that it shouldn't return -EBUSY again.
> > +	 */
> > +	lockup_detector_pending_init = true;
>
> This does not make sense to me. We are here only when:
>
>    1. lockup_detector_init() queued this work.
>
>    2. Someone cleared @lockup_detector_pending_init and woke the
>       worker via wait_queue. It might be either PMU init code
>       or the late lockup_detector_check().
>
> watchdog_nmi_probe() might still return -EBUSY when PMU init failed.
>
> If you wanted to try the delayed probe once again (3rd attempt) from
> lockup_detector_check(), you would need to queue the work once again.
> But you need to be sure that lockup_detector_check() was not called
> yet. Otherwise, the 2nd work might wait forever.
>
> IMHO, it is not worth the complexity.

The original assumption is: nobody should use delayed probe after
lockup_detector_check() (which has __init attribute).

That is, everything including PMU and delayed probe of lock detector must
finish before do_initcalls() which means delayed probe can't support with
external PMU module init.

Also,
  1. lockup_detector_check is registered with late_initcall_sync(), so it'd
     be called in the last order of do_initcalls().

  2. watchdog_nmi_probe() and all the delayed relative functions and variables
     have __init attribute, no one should ever use it after __init section
     is released.

The only case is PMU probe function is also late_initcall_sync().

How about this one:
  1. Wrap the wake_up code to reduce the complexity for user side.

  2. Remove wait queue.
     Instead of queueing work in lockup_detector_init(), queue the delayed
     probe work when arch PMU code finishes probe.

and the flow turns to

  1. lockup_detector_init() gets -EBUSY, sets lockup_detector_pending_init=true

  2. PMU arch code init done, call lockup_detector_queue_work().

  3. lockup_detector_queue_work() queues the work only when
     lockup_detector_pending_init=true, which means nobody should call
     this before lockup_detector_init().

  4. the work lockup_detector_delay_init() runs without wait event.
     If the probe succeeds, set lockup_detector_pending_init=false.

  5. at late_initcall_sync(), lockup_detector_check() calls flush_work() first
     in case a previous lockup_detector_queue_work() has not been scheduled yet.
     Then it tests whether lockup_detector_pending_init is false. If it's
     true, we have a pending init unfinished, so forcibly queue the work
     again and flush_work() to make sure the __init section won't be freed
     before the work is done.

This removes the complexity of the wait event that we discussed.
The draft diff (against this series) is shown below.

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 77eaefee13ea..c776618fbfa8 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -1388,9 +1388,7 @@ static int __init armv8_pmu_driver_init(void)
 	else
 		ret = arm_pmu_acpi_probe(armv8_pmuv3_pmu_init);
 
-	/* Inform watchdog core we are ready to probe hld by delayed init. */
-	lockup_detector_pending_init = false;
-	wake_up(&hld_detector_wait);
+	lockup_detector_queue_work();
 	return ret;
 }
 device_initcall(armv8_pmu_driver_init)
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index cc7df31be9db..98060a86fac6 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -120,7 +120,7 @@ void watchdog_nmi_stop(void);
 void watchdog_nmi_start(void);
 
 extern bool lockup_detector_pending_init;
-extern struct wait_queue_head hld_detector_wait;
+void lockup_detector_queue_work(void);
 int watchdog_nmi_probe(void);
 void watchdog_nmi_enable(unsigned int cpu);
 void watchdog_nmi_disable(unsigned int cpu);
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 49bdcaf5bd8f..acaa9f3ac162 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -846,9 +846,6 @@ static void __init watchdog_sysctl_init(void)
 static void lockup_detector_delay_init(struct work_struct *work);
 bool lockup_detector_pending_init __initdata;
 
-struct wait_queue_head hld_detector_wait __initdata =
-		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
-
 static struct work_struct detector_work __initdata =
 		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
 
@@ -856,14 +853,6 @@ static void __init lockup_detector_delay_init(struct work_struct *work)
 {
 	int ret;
 
-	wait_event(hld_detector_wait,
-			lockup_detector_pending_init == false);
-
-	/*
-	 * Here, we know the PMU should be ready, so set pending to true to
-	 * inform watchdog_nmi_probe() that it shouldn't return -EBUSY again.
-	 */
-	lockup_detector_pending_init = true;
 	ret = watchdog_nmi_probe();
 	if (ret) {
 		pr_info("Delayed init of the lockup detector failed: %d\n", ret);
@@ -876,15 +865,27 @@ static void __init lockup_detector_delay_init(struct work_struct *work)
 	lockup_detector_pending_init = false;
 }
 
+/* Must call after lockup_detector_init() that we do need delayed probe */
+void __init lockup_detector_queue_work(void)
+{
+	if (!lockup_detector_pending_init)
+		return;
+
+	queue_work_on(__smp_processor_id(), system_wq, &detector_work);
+}
+
 /* Ensure the check is called after the initialization of PMU driver */
 static int __init lockup_detector_check(void)
 {
+	/* Make sure no work is pending. */
+	flush_work(&detector_work);
+
 	if (!lockup_detector_pending_init)
 		return 0;
 
 	pr_info("Delayed init checking failed, retry for once.\n");
-	lockup_detector_pending_init = false;
-	wake_up(&hld_detector_wait);
+	lockup_detector_queue_work();
+	flush_work(&detector_work);
 	return 0;
 }
 late_initcall_sync(lockup_detector_check);
@@ -902,10 +903,8 @@ void __init lockup_detector_init(void)
 	ret = watchdog_nmi_probe();
 	if (!ret)
 		nmi_watchdog_available = true;
-	else if (ret == -EBUSY) {
+	else if (ret == -EBUSY)
 		lockup_detector_pending_init = true;
-		queue_work_on(smp_processor_id(), system_wq, &detector_work);
-	}
 
 	lockup_detector_setup();
 	watchdog_sysctl_init();
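
The five-step flow described before the draft diff can be traced with a small userspace model. This is only a sketch under stated assumptions: `struct hld_model` and every `model_*` function are hypothetical stand-ins, the workqueue is reduced to a single `work_queued` flag that runs synchronously when flushed, and the probe simply returns -EBUSY until a simulated PMU is marked ready.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; every name here is hypothetical. */
struct hld_model {
	bool pmu_ready;		/* set by the simulated PMU driver init */
	bool pending_init;	/* mirrors lockup_detector_pending_init */
	bool work_queued;	/* stands in for a queued work_struct */
	bool available;		/* mirrors nmi_watchdog_available */
	int probe_calls;	/* how many times the probe ran */
};

/* Stand-in for watchdog_nmi_probe(): -EBUSY until the PMU is ready. */
static int model_probe(struct hld_model *m)
{
	assert(m != NULL);
	m->probe_calls++;
	return m->pmu_ready ? 0 : -EBUSY;
}

/* Step 1: early init only records that a retry is needed. */
static void model_init(struct hld_model *m)
{
	int ret = model_probe(m);

	if (!ret)
		m->available = true;
	else if (ret == -EBUSY)
		m->pending_init = true;
}

/* Steps 2-3: queue the retry only if the early probe asked for it. */
static void model_queue_work(struct hld_model *m)
{
	if (m->pending_init)
		m->work_queued = true;
}

/* Step 4, run at flush time: the delayed probe, with no wait event. */
static void model_flush_work(struct hld_model *m)
{
	if (!m->work_queued)
		return;
	m->work_queued = false;
	if (model_probe(m) == 0) {
		m->available = true;
		m->pending_init = false;
	}
}

/* Step 5: flush first, then retry once more if still pending. */
static void model_check(struct hld_model *m)
{
	model_flush_work(m);
	if (!m->pending_init)
		return;
	model_queue_work(m);
	model_flush_work(m);
}
```

Calling `model_init()`, then setting `pmu_ready` and calling `model_queue_work()` followed by `model_check()` walks the normal path with two probe attempts. Skipping `model_queue_work()` exercises the forced retry in step 5.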

On Sat 2022-03-19 16:18:22, Lecopzer Chen wrote:
> > On Mon 2022-03-07 23:47:28, Lecopzer Chen wrote:
> > > When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> > > yet. E.g. on arm64, PMU is not ready until
> > > device_initcall(armv8_pmu_driver_init). And it is deeply integrated
> > > with the driver model and cpuhp. Hence it is hard to push this
> > > initialization before smp_init().
> >
> > > --- a/kernel/watchdog.c
> > > +++ b/kernel/watchdog.c
> > > @@ -839,16 +843,70 @@ static void __init watchdog_sysctl_init(void)
> > >  #define watchdog_sysctl_init() do { } while (0)
> > >  #endif /* CONFIG_SYSCTL */
> > >
> > > +static void lockup_detector_delay_init(struct work_struct *work);
> > > +bool lockup_detector_pending_init __initdata;
> > > +
> > > +struct wait_queue_head hld_detector_wait __initdata =
> > > +		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
> > > +
> > > +static struct work_struct detector_work __initdata =
> > > +		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
> > > +
> > > +static void __init lockup_detector_delay_init(struct work_struct *work)
> > > +{
> > > +	int ret;
> > > +
> > > +	wait_event(hld_detector_wait,
> > > +			lockup_detector_pending_init == false);
> > > +
> > > +	/*
> > > +	 * Here, we know the PMU should be ready, so set pending to true to
> > > +	 * inform watchdog_nmi_probe() that it shouldn't return -EBUSY again.
> > > +	 */
> > > +	lockup_detector_pending_init = true;
> >
> > This does not make sense to me. We are here only when:
> >
> >    1. lockup_detector_init() queued this work.
> >
> >    2. Someone cleared @lockup_detector_pending_init and woke the
> >       worker via wait_queue. It might be either PMU init code
> >       or the late lockup_detector_check().
> >
> > watchdog_nmi_probe() might still return -EBUSY when PMU init failed.
> >
> > If you wanted to try the delayed probe once again (3rd attempt) from
> > lockup_detector_check(), you would need to queue the work once again.
> > But you need to be sure that lockup_detector_check() was not called
> > yet. Otherwise, the 2nd work might wait forever.
> >
> > IMHO, it is not worth the complexity.
>
> The original assumption is: nobody should use delayed probe after
> lockup_detector_check() (which has __init attribute).

Good point. It makes perfect sense.

But it was not mentioned anywhere. And the code did not work this way.

>
> That is, everything including PMU and delayed probe of lock detector must
> finish before do_initcalls() which means delayed probe can't support with
> external PMU module init.
>
> Also,
>   1. lockup_detector_check is registered with late_initcall_sync(), so it'd
>      be called in the last order of do_initcalls().
>
>   2. watchdog_nmi_probe() and all the delayed relative functions and variables
>      have __init attribute, no one should ever use it after __init section
>      is released.
>
> The only case is PMU probe function is also late_initcall_sync().

This is the case for PMU. The API for delayed init is generic and should
be safe even for other users.

> How about this one:
>   1. Wrap the wake_up code to reduce the complexity for user side.
>
>   2. Remove wait queue.
>      Instead of queueing work in lockup_detector_init(), queue the delayed
>      probe work when arch PMU code finishes probe.
>
> and the flow turns to
>
>   1. lockup_detector_init() gets -EBUSY, sets lockup_detector_pending_init=true
>
>   2. PMU arch code init done, call lockup_detector_queue_work().
>
>   3. lockup_detector_queue_work() queues the work only when
>      lockup_detector_pending_init=true, which means nobody should call
>      this before lockup_detector_init().
>
>   4. the work lockup_detector_delay_init() runs without wait event.
>      If the probe succeeds, set lockup_detector_pending_init=false.
>
>   5. at late_initcall_sync(), lockup_detector_check() calls flush_work() first
>      in case a previous lockup_detector_queue_work() has not been scheduled yet.
>      Then it tests whether lockup_detector_pending_init is false. If it's
>      true, we have a pending init unfinished, so forcibly queue the work
>      again and flush_work() to make sure the __init section won't be freed
>      before the work is done.

Nice, I like it.

> This removes the complexity of the wait event that we discussed.
> The draft diff (against this series) is shown below.
>
>
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index 77eaefee13ea..c776618fbfa8 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -1388,9 +1388,7 @@ static int __init armv8_pmu_driver_init(void)
>  	else
>  		ret = arm_pmu_acpi_probe(armv8_pmuv3_pmu_init);
>
> -	/* Inform watchdog core we are ready to probe hld by delayed init. */
> -	lockup_detector_pending_init = false;
> -	wake_up(&hld_detector_wait);
> +	lockup_detector_queue_work();

The name is strange. The fact that it uses workqueues is an
implementation detail. I would call it
retry_lockup_detector_init() so that it is more obvious what it does.

>  	return ret;
>  }
>  device_initcall(armv8_pmu_driver_init)
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -876,15 +865,27 @@ static void __init lockup_detector_delay_init(struct work_struct *work)
>  	lockup_detector_pending_init = false;
>  }
>
> +/* Must call after lockup_detector_init() that we do need delayed probe */
> +void __init lockup_detector_queue_work(void)
> +{
> +	if (!lockup_detector_pending_init)
> +		return;
> +
> +	queue_work_on(__smp_processor_id(), system_wq, &detector_work);
> +}
> +
>  /* Ensure the check is called after the initialization of PMU driver */
>  static int __init lockup_detector_check(void)
>  {
> +	/* Make sure no work is pending. */
> +	flush_work(&detector_work);
> +
>  	if (!lockup_detector_pending_init)
>  		return 0;
>
>  	pr_info("Delayed init checking failed, retry for once.\n");
> -	lockup_detector_pending_init = false;
> -	wake_up(&hld_detector_wait);
> +	lockup_detector_queue_work();

I would do here

	lockup_detector_pending_init = false;

to make sure that lockup_detector_queue_work() will no longer
queue the work after the final flush.

Maybe, we could rename the variable to allow_lockup_detector_init_retry.

> +	flush_work(&detector_work);
>
>  	return 0;
>  }
>  late_initcall_sync(lockup_detector_check);

Best Regards,
Petr
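
The effect of clearing the flag in the late check, as suggested in the reply above, can be illustrated with a short userspace model. It borrows the proposed names `retry_lockup_detector_init()` and `allow_lockup_detector_init_retry`, but the bodies are assumptions made for illustration: queueing the work is reduced to incrementing a hypothetical `retries_run` counter, and no real workqueue or flush is involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model; the bodies below are illustrative assumptions. */
static bool allow_lockup_detector_init_retry;
static int retries_run;	/* hypothetical counter standing in for queued work */

/* Returns true if a retry was accepted, false once retries are closed. */
static bool retry_lockup_detector_init(void)
{
	if (!allow_lockup_detector_init_retry)
		return false;
	retries_run++;	/* stands in for queueing the delayed probe work */
	return true;
}

/* Late check: allow one last retry, then close the gate for good. */
static void lockup_detector_check(void)
{
	if (!allow_lockup_detector_init_retry)
		return;
	retry_lockup_detector_init();
	allow_lockup_detector_init_retry = false;

	/* From here on, no caller can queue the __init work again. */
	assert(!allow_lockup_detector_init_retry);
}
```

With the gate closed inside the late check, a caller that arrives after `late_initcall_sync()` gets a harmless no-op instead of queueing work whose code and data may already have been freed with the `__init` section.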

> On Sat 2022-03-19 16:18:22, Lecopzer Chen wrote:
> > > On Mon 2022-03-07 23:47:28, Lecopzer Chen wrote:
> > > > When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> > > > yet. E.g. on arm64, PMU is not ready until
> > > > device_initcall(armv8_pmu_driver_init). And it is deeply integrated
> > > > with the driver model and cpuhp. Hence it is hard to push this
> > > > initialization before smp_init().
> > >
> > > > --- a/kernel/watchdog.c
> > > > +++ b/kernel/watchdog.c
> > > > @@ -839,16 +843,70 @@ static void __init watchdog_sysctl_init(void)
> > > >  #define watchdog_sysctl_init() do { } while (0)
> > > >  #endif /* CONFIG_SYSCTL */
> > > >
> > > > +static void lockup_detector_delay_init(struct work_struct *work);
> > > > +bool lockup_detector_pending_init __initdata;
> > > > +
> > > > +struct wait_queue_head hld_detector_wait __initdata =
> > > > +		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
> > > > +
> > > > +static struct work_struct detector_work __initdata =
> > > > +		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
> > > > +
> > > > +static void __init lockup_detector_delay_init(struct work_struct *work)
> > > > +{
> > > > +	int ret;
> > > > +
> > > > +	wait_event(hld_detector_wait,
> > > > +			lockup_detector_pending_init == false);
> > > > +
> > > > +	/*
> > > > +	 * Here, we know the PMU should be ready, so set pending to true to
> > > > +	 * inform watchdog_nmi_probe() that it shouldn't return -EBUSY again.
> > > > +	 */
> > > > +	lockup_detector_pending_init = true;
> > >
> > > This does not make sense to me. We are here only when:
> > >
> > >    1. lockup_detector_init() queued this work.
> > >
> > >    2. Someone cleared @lockup_detector_pending_init and woke the
> > >       worker via wait_queue. It might be either PMU init code
> > >       or the late lockup_detector_check().
> > >
> > > watchdog_nmi_probe() might still return -EBUSY when PMU init failed.
> > >
> > > If you wanted to try the delayed probe once again (3rd attempt) from
> > > lockup_detector_check(), you would need to queue the work once again.
> > > But you need to be sure that lockup_detector_check() was not called
> > > yet. Otherwise, the 2nd work might wait forever.
> > >
> > > IMHO, it is not worth the complexity.
> >
> > The original assumption is: nobody should use delayed probe after
> > lockup_detector_check() (which has __init attribute).
>
> Good point. It makes perfect sense.
>
> But it was not mentioned anywhere. And the code did not work this way.
>
> >
> > That is, everything including PMU and delayed probe of lock detector must
> > finish before do_initcalls() which means delayed probe can't support with
> > external PMU module init.
> >
> > Also,
> >   1. lockup_detector_check is registered with late_initcall_sync(), so it'd
> >      be called in the last order of do_initcalls().
> >
> >   2. watchdog_nmi_probe() and all the delayed relative functions and variables
> >      have __init attribute, no one should ever use it after __init section
> >      is released.
> >
> > The only case is PMU probe function is also late_initcall_sync().
>
> This is the case for PMU. The API for delayed init is generic and should
> be safe even for other users.
>

I think this can be fixed with the suggestion you provide below:
set lockup_detector_pending_init=false at the end of lockup_detector_check(),
so nobody after lockup_detector_check() can ever queue the work again.

>
> > How about this one:
> >   1. Wrap the wake_up code to reduce the complexity for user side.
> >
> >   2. Remove wait queue.
> >      Instead of queueing work in lockup_detector_init(), queue the delayed
> >      probe work when arch PMU code finishes probe.
> >
> > and the flow turns to
> >
> >   1. lockup_detector_init() gets -EBUSY, sets lockup_detector_pending_init=true
> >
> >   2. PMU arch code init done, call lockup_detector_queue_work().
> >
> >   3. lockup_detector_queue_work() queues the work only when
> >      lockup_detector_pending_init=true, which means nobody should call
> >      this before lockup_detector_init().
> >
> >   4. the work lockup_detector_delay_init() runs without wait event.
> >      If the probe succeeds, set lockup_detector_pending_init=false.
> >
> >   5. at late_initcall_sync(), lockup_detector_check() calls flush_work() first
> >      in case a previous lockup_detector_queue_work() has not been scheduled yet.
> >      Then it tests whether lockup_detector_pending_init is false. If it's
> >      true, we have a pending init unfinished, so forcibly queue the work
> >      again and flush_work() to make sure the __init section won't be freed
> >      before the work is done.
>
> Nice, I like it.
>
> > This removes the complexity of the wait event that we discussed.
> > The draft diff (against this series) is shown below.
> >
> >
> > diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> > index 77eaefee13ea..c776618fbfa8 100644
> > --- a/arch/arm64/kernel/perf_event.c
> > +++ b/arch/arm64/kernel/perf_event.c
> > @@ -1388,9 +1388,7 @@ static int __init armv8_pmu_driver_init(void)
> >  	else
> >  		ret = arm_pmu_acpi_probe(armv8_pmuv3_pmu_init);
> >
> > -	/* Inform watchdog core we are ready to probe hld by delayed init. */
> > -	lockup_detector_pending_init = false;
> > -	wake_up(&hld_detector_wait);
> > +	lockup_detector_queue_work();
>
> The name is strange. The fact that it uses workqueues is an
> implementation detail. I would call it
> retry_lockup_detector_init() so that it is more obvious what it does.

Okay, I don't have good taste in naming.
I'll provide the next version of the patches including this.

>
> >  	return ret;
> >  }
> >  device_initcall(armv8_pmu_driver_init)
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -876,15 +865,27 @@ static void __init lockup_detector_delay_init(struct work_struct *work)
> >  	lockup_detector_pending_init = false;
> >  }
> >
> > +/* Must call after lockup_detector_init() that we do need delayed probe */
> > +void __init lockup_detector_queue_work(void)
> > +{
> > +	if (!lockup_detector_pending_init)
> > +		return;
> > +
> > +	queue_work_on(__smp_processor_id(), system_wq, &detector_work);
> > +}
> > +
> >  /* Ensure the check is called after the initialization of PMU driver */
> >  static int __init lockup_detector_check(void)
> >  {
> > +	/* Make sure no work is pending. */
> > +	flush_work(&detector_work);
> > +
> >  	if (!lockup_detector_pending_init)
> >  		return 0;
> >
> >  	pr_info("Delayed init checking failed, retry for once.\n");
> > -	lockup_detector_pending_init = false;
> > -	wake_up(&hld_detector_wait);
> > +	lockup_detector_queue_work();
>
> I would do here
>
> 	lockup_detector_pending_init = false;
>
> to make sure that lockup_detector_queue_work() will no longer
> queue the work after the final flush.
>
> Maybe, we could rename the variable to allow_lockup_detector_init_retry.

Okay, I'm preparing the next version of the patches and I'll include this.

Thanks a lot for all of the suggestions.

BRs,
Lecopzer

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index b7bcd63c36b4..cc7df31be9db 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -118,6 +118,9 @@ static inline int hardlockup_detector_perf_init(void) { return 0; }
 
 void watchdog_nmi_stop(void);
 void watchdog_nmi_start(void);
+
+extern bool lockup_detector_pending_init;
+extern struct wait_queue_head hld_detector_wait;
 int watchdog_nmi_probe(void);
 void watchdog_nmi_enable(unsigned int cpu);
 void watchdog_nmi_disable(unsigned int cpu);
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index b71d434cf648..49bdcaf5bd8f 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -103,7 +103,11 @@ void __weak watchdog_nmi_disable(unsigned int cpu)
 	hardlockup_detector_perf_disable();
 }
 
-/* Return 0, if a NMI watchdog is available. Error code otherwise */
+/*
+ * Arch specific API. Return 0, if a NMI watchdog is available. -EBUSY if not
+ * ready, and arch code should wake up hld_detector_wait when ready. Other
+ * negative value if not support.
+ */
 int __weak __init watchdog_nmi_probe(void)
 {
 	return hardlockup_detector_perf_init();
@@ -839,16 +843,70 @@ static void __init watchdog_sysctl_init(void)
 #define watchdog_sysctl_init() do { } while (0)
 #endif /* CONFIG_SYSCTL */
 
+static void lockup_detector_delay_init(struct work_struct *work);
+bool lockup_detector_pending_init __initdata;
+
+struct wait_queue_head hld_detector_wait __initdata =
+		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
+
+static struct work_struct detector_work __initdata =
+		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
+
+static void __init lockup_detector_delay_init(struct work_struct *work)
+{
+	int ret;
+
+	wait_event(hld_detector_wait,
+			lockup_detector_pending_init == false);
+
+	/*
+	 * Here, we know the PMU should be ready, so set pending to true to
+	 * inform watchdog_nmi_probe() that it shouldn't return -EBUSY again.
+	 */
+	lockup_detector_pending_init = true;
+	ret = watchdog_nmi_probe();
+	if (ret) {
+		pr_info("Delayed init of the lockup detector failed: %d\n", ret);
+		pr_info("Perf NMI watchdog permanently disabled\n");
+		return;
+	}
+
+	nmi_watchdog_available = true;
+	lockup_detector_setup();
+	lockup_detector_pending_init = false;
+}
+
+/* Ensure the check is called after the initialization of PMU driver */
+static int __init lockup_detector_check(void)
+{
+	if (!lockup_detector_pending_init)
+		return 0;
+
+	pr_info("Delayed init checking failed, retry for once.\n");
+	lockup_detector_pending_init = false;
+	wake_up(&hld_detector_wait);
+	return 0;
+}
+late_initcall_sync(lockup_detector_check);
+
 void __init lockup_detector_init(void)
 {
+	int ret;
+
 	if (tick_nohz_full_enabled())
 		pr_info("Disabling watchdog on nohz_full cores by default\n");
 
 	cpumask_copy(&watchdog_cpumask,
 		     housekeeping_cpumask(HK_FLAG_TIMER));
 
-	if (!watchdog_nmi_probe())
+	ret = watchdog_nmi_probe();
+	if (!ret)
 		nmi_watchdog_available = true;
+	else if (ret == -EBUSY) {
+		lockup_detector_pending_init = true;
+		queue_work_on(smp_processor_id(), system_wq, &detector_work);
+	}
+
 	lockup_detector_setup();
 	watchdog_sysctl_init();
 }