[v2,4/5] kernel/watchdog: Adapt the watchdog_hld interface for async model

Message ID 20220307154729.13477-5-lecopzer.chen@mediatek.com (mailing list archive)
State New, archived
Series Support hld based on Pseudo-NMI for arm64

Commit Message

Lecopzer Chen March 7, 2022, 3:47 p.m. UTC
When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
yet. E.g. on arm64, PMU is not ready until
device_initcall(armv8_pmu_driver_init).  And it is deeply integrated
with the driver model and cpuhp. Hence it is hard to push this
initialization before smp_init().

But it is easy to take an opposite approach by enabling watchdog_hld to
get the capability of PMU async.

The async model is achieved by expanding watchdog_nmi_probe() with
-EBUSY, and a re-initializing work_struct which waits on a wait_queue_head.

Co-developed-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Suggested-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/nmi.h |  3 +++
 kernel/watchdog.c   | 62 +++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 63 insertions(+), 2 deletions(-)

Comments

Petr Mladek March 18, 2022, 10:40 a.m. UTC | #1
On Mon 2022-03-07 23:47:28, Lecopzer Chen wrote:
> When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> yet. E.g. on arm64, PMU is not ready until
> device_initcall(armv8_pmu_driver_init).  And it is deeply integrated
> with the driver model and cpuhp. Hence it is hard to push this
> initialization before smp_init().

The above is clear.

> But it is easy to take an opposite approach by enabling watchdog_hld to
> get the capability of PMU async.
> 
> The async model is achieved by expanding watchdog_nmi_probe() with
> -EBUSY, and a re-initializing work_struct which waits on a wait_queue_head.

These two paragraphs are a bit confusing to me. It might be just a
problem with translation. I am not a native speaker. Anyway, I wonder
if the following is more clear:

<proposal>
But it is easy to take an opposite approach and try to initialize
the watchdog once again later.

The delayed probe is called using workqueues. It needs to allocate
memory and must be processed in a normal context.

The delayed probe is queued only when the early one returns -EBUSY.
It is the return code returned when PMU is not ready yet.
</proposal>

> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -103,7 +103,11 @@ void __weak watchdog_nmi_disable(unsigned int cpu)
>  	hardlockup_detector_perf_disable();
>  }
>  
> -/* Return 0, if a NMI watchdog is available. Error code otherwise */
> +/*
> + * Arch specific API. Return 0, if a NMI watchdog is available. -EBUSY if not
> + * ready, and arch code should wake up hld_detector_wait when ready. Other
> + * negative value if not support.
> + */

I wonder if the following is slightly more clear:

 /*
 * Arch specific API.
 *
 * Return 0 when NMI watchdog is available, negative value otherwise.
 * The error code -EBUSY is special. It means that a deferred probe
 * might succeed later.
 */
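
As a rough illustration of the return-code contract described in the
comment above, a caller could sort the probe result into three cases.
This is a minimal userspace sketch, not kernel code: enum probe_outcome
and classify_probe_result() are hypothetical names introduced only for
this example.

```c
#include <errno.h>

/* Hypothetical classification of a probe result; not a kernel type. */
enum probe_outcome {
	PROBE_AVAILABLE,	/* 0: the NMI watchdog can be used now */
	PROBE_RETRY_LATER,	/* -EBUSY: a deferred probe might succeed */
	PROBE_UNSUPPORTED,	/* any other error: give up for good */
};

/* Map the return value of a watchdog_nmi_probe()-style call to an outcome. */
static enum probe_outcome classify_probe_result(int ret)
{
	if (ret == 0)
		return PROBE_AVAILABLE;
	if (ret == -EBUSY)
		return PROBE_RETRY_LATER;
	return PROBE_UNSUPPORTED;
}
```

Only the -EBUSY case would lead to a retry; every other error keeps the
detector disabled.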

>  int __weak __init watchdog_nmi_probe(void)
>  {
>  	return hardlockup_detector_perf_init();
> @@ -839,16 +843,70 @@ static void __init watchdog_sysctl_init(void)
>  #define watchdog_sysctl_init() do { } while (0)
>  #endif /* CONFIG_SYSCTL */
>  
> +static void lockup_detector_delay_init(struct work_struct *work);
> +bool lockup_detector_pending_init __initdata;
> +
> +struct wait_queue_head hld_detector_wait __initdata =
> +		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
> +
> +static struct work_struct detector_work __initdata =
> +		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
> +
> +static void __init lockup_detector_delay_init(struct work_struct *work)
> +{
> +	int ret;
> +
> +	wait_event(hld_detector_wait,
> +			lockup_detector_pending_init == false);
> +
> +	/*
> +	 * Here, we know the PMU should be ready, so set pending to true to
> +	 * inform watchdog_nmi_probe() that it shouldn't return -EBUSY again.
> +	 */
> +	lockup_detector_pending_init = true;

This does not make sense to me. We are here only when:

   1. lockup_detector_init() queued this work.

   2. Someone cleared @lockup_detector_pending_init and woke the
      worker via wait_queue. It might be either the PMU init code
      or the late lockup_detector_check().

watchdog_nmi_probe() might still return -EBUSY when PMU init failed.

If you wanted to try the delayed probe once again (3rd attempt) from
lockup_detector_check(), you would need to queue the work once again.
But you need to be sure that lockup_detector_check() was not called
yet. Otherwise, the 2nd work might wait forever.

IMHO, it is not worth the complexity.

> +	ret = watchdog_nmi_probe();
> +	if (ret) {
> +		pr_info("Delayed init of the lockup detector failed: %d\n", ret);
> +		pr_info("Perf NMI watchdog permanently disabled\n");
> +		return;
> +	}
> +
> +	nmi_watchdog_available = true;
> +	lockup_detector_setup();
> +	lockup_detector_pending_init = false;
> +}

Otherwise, it looks good to me.

Best Regards,
Petr
Lecopzer Chen March 19, 2022, 8:18 a.m. UTC | #2
> On Mon 2022-03-07 23:47:28, Lecopzer Chen wrote:
> > When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> > yet. E.g. on arm64, PMU is not ready until
> > device_initcall(armv8_pmu_driver_init).  And it is deeply integrated
> > with the driver model and cpuhp. Hence it is hard to push this
> > initialization before smp_init().
> 
> The above is clear.
> 
> > But it is easy to take an opposite approach by enabling watchdog_hld to
> > get the capability of PMU async.
> > 
> > The async model is achieved by expanding watchdog_nmi_probe() with
> > -EBUSY, and a re-initializing work_struct which waits on a wait_queue_head.
> 
> These two paragraphs are a bit confusing to me. It might be just a
> problem with translation. I am not a native speaker. Anyway, I wonder
> if the following is more clear:
> 
> <proposal>
> But it is easy to take an opposite approach and try to initialize
> the watchdog once again later.
> 
> The delayed probe is called using workqueues. It need to allocate
> memory and must be proceed in a normal context.
> 
> The delayed probe is queued only when the early one returns -EBUSY.
> It is the return code returned when PMU is not ready yet.
> </proposal>

Of course, the original description only briefly described the functionality.
So I think it makes sense to explain it if anyone thinks it's unclear.
I'm not a native speaker either, but I don't feel anything weird in the
description.

I'll use the description you provided, thanks.


> 
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -103,7 +103,11 @@ void __weak watchdog_nmi_disable(unsigned int cpu)
> >  	hardlockup_detector_perf_disable();
> >  }
> >  
> > -/* Return 0, if a NMI watchdog is available. Error code otherwise */
> > +/*
> > + * Arch specific API. Return 0, if a NMI watchdog is available. -EBUSY if not
> > + * ready, and arch code should wake up hld_detector_wait when ready. Other
> > + * negative value if not support.
> > + */
> 
> I wonder if the following is slightly more clear:
> 
>  /*
>  * Arch specific API.
>  *
>  * Return 0 when NMI watchdog is available, negative value otherwise.
>  * The error code -EBUSY is special. It means that a deferred probe
>  * might succeed later.
>  */
> 

Yes, this is clearer.
Abstracting `hld_detector_wait` as a `deferred probe` is a good idea.

Thanks, I'll take this.

> >  int __weak __init watchdog_nmi_probe(void)
> >  {
> >  	return hardlockup_detector_perf_init();
> > @@ -839,16 +843,70 @@ static void __init watchdog_sysctl_init(void)
> >  #define watchdog_sysctl_init() do { } while (0)
> >  #endif /* CONFIG_SYSCTL */
> >  
> > +static void lockup_detector_delay_init(struct work_struct *work);
> > +bool lockup_detector_pending_init __initdata;
> > +
> > +struct wait_queue_head hld_detector_wait __initdata =
> > +		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
> > +
> > +static struct work_struct detector_work __initdata =
> > +		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
> > +
> > +static void __init lockup_detector_delay_init(struct work_struct *work)
> > +{
> > +	int ret;
> > +
> > +	wait_event(hld_detector_wait,
> > +			lockup_detector_pending_init == false);
> > +
> > +	/*
> > +	 * Here, we know the PMU should be ready, so set pending to true to
> > +	 * inform watchdog_nmi_probe() that it shouldn't return -EBUSY again.
> > +	 */
> > +	lockup_detector_pending_init = true;
> 
> This does not make sense to me. We are here only when:
> 
>    1. lockup_detector_init() queued this work.
> 
>    2. Someone cleared @lockup_detector_pending_init and woke the
>       worker via wait_queue. IT might be either PMU init code
>       or the late lockup_detector_check().
> 
> watchdog_nmi_probe() might still return -EBUSY when PMU init failed.
> 
> If you wanted to try the delayed probe once again (3rd attempt) from
> lockup_detector_check(), you would need to queue the work once again.
> But you need to be sure that lockup_detector_check() was not called
> yet. Otherwise, the 2nd work might wait forewer.
> 
> IMHO, it is not worth the complexity.

The original assumption is that nobody should use the delayed probe after
lockup_detector_check() (which has the __init attribute).

That is, everything including the PMU init and the delayed probe of the
lockup detector must finish before do_initcalls() finishes, which means the
delayed probe cannot support a PMU that is initialized from an external
module.

Also,
  1. lockup_detector_check() is registered with late_initcall_sync(), so it
     is called last in do_initcalls().

  2. watchdog_nmi_probe() and all the functions and variables related to the
     delayed probe have the __init attribute, so no one should ever use them
     after the __init section is released.

The only problematic case is when the PMU probe function is also registered
with late_initcall_sync().


How about this one:
  1. Wrap the wake_up code to reduce the complexity for the user side.

  2. Remove the wait queue.
     Instead of queueing the work in lockup_detector_init(), queue the
     delayed probe work when the arch PMU code finishes its probe.

and the flow becomes

  1. lockup_detector_init() gets -EBUSY and sets
     lockup_detector_pending_init=true.

  2. When the arch PMU init is done, it calls lockup_detector_queue_work().

  3. lockup_detector_queue_work() queues the work only when
     lockup_detector_pending_init=true, which means nobody should call
     it before lockup_detector_init().

  4. The work lockup_detector_delay_init() runs without a wait event.
     If the probe succeeds, it sets lockup_detector_pending_init=false.

  5. At late_initcall_sync(), lockup_detector_check() calls flush_work()
     first, in case the work queued by an earlier
     lockup_detector_queue_work() has not run yet. It then tests
     lockup_detector_pending_init. If it is still true, the init is
     unfinished, so it queues the work again and calls flush_work() to make
     sure the __init section is not freed before the work is done.
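
The five steps above can be sketched as a small single-threaded userspace
simulation. It is only a model under stated assumptions: every sim_* name
is hypothetical, a boolean stands in for the queued detector_work, and
sim_flush_work() simply runs the work synchronously, which ignores real
workqueue concurrency.

```c
#include <errno.h>
#include <stdbool.h>

/* Userspace stand-ins for the kernel state; all names are hypothetical. */
static bool pmu_ready;		/* set by the simulated PMU driver init */
static bool pending_init;	/* models lockup_detector_pending_init */
static bool work_queued;	/* models a queued detector_work */
static bool watchdog_available;	/* models nmi_watchdog_available */
static int probe_calls;		/* how many times the probe was attempted */

/* Models watchdog_nmi_probe(): -EBUSY until the PMU is ready. */
static int sim_probe(void)
{
	probe_calls++;
	return pmu_ready ? 0 : -EBUSY;
}

/* Step 4: the work retries the probe without waiting on anything. */
static void sim_delay_init(void)
{
	if (sim_probe())
		return;		/* failure leaves pending_init set */
	watchdog_available = true;
	pending_init = false;
}

/* Models flush_work(): run the work now if it is queued. */
static void sim_flush_work(void)
{
	if (!work_queued)
		return;
	work_queued = false;
	sim_delay_init();
}

/* Step 3: queue the work only while an init is pending. */
static void sim_queue_work(void)
{
	if (pending_init)
		work_queued = true;
}

/* Step 1: the early probe records that a retry is needed. */
static void sim_lockup_detector_init(void)
{
	int ret = sim_probe();

	if (!ret)
		watchdog_available = true;
	else if (ret == -EBUSY)
		pending_init = true;
}

/* Step 2: the PMU driver finishes its init and requests a retry. */
static void sim_pmu_driver_init(bool success)
{
	pmu_ready = success;
	sim_queue_work();
}

/* Step 5: flush, then retry one last time if still pending. */
static void sim_lockup_detector_check(void)
{
	sim_flush_work();
	if (!pending_init)
		return;
	sim_queue_work();
	sim_flush_work();
}
```

Calling sim_lockup_detector_init(), sim_pmu_driver_init(true) and
sim_lockup_detector_check() in that order mirrors the boot sequence: the
second probe attempt succeeds and the pending flag is cleared.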
 
This removes the complexity of the wait event that we discussed.
A draft of the diff (against this series) is shown below.


diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 77eaefee13ea..c776618fbfa8 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -1388,9 +1388,7 @@ static int __init armv8_pmu_driver_init(void)
 	else
 		ret = arm_pmu_acpi_probe(armv8_pmuv3_pmu_init);
 
-	/* Inform watchdog core we are ready to probe hld by delayed init. */
-	lockup_detector_pending_init = false;
-	wake_up(&hld_detector_wait);
+	lockup_detector_queue_work();
 	return ret;
 }
 device_initcall(armv8_pmu_driver_init)
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index cc7df31be9db..98060a86fac6 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -120,7 +120,7 @@ void watchdog_nmi_stop(void);
 void watchdog_nmi_start(void);
 
 extern bool lockup_detector_pending_init;
-extern struct wait_queue_head hld_detector_wait;
+void lockup_detector_queue_work(void);
 int watchdog_nmi_probe(void);
 void watchdog_nmi_enable(unsigned int cpu);
 void watchdog_nmi_disable(unsigned int cpu);
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 49bdcaf5bd8f..acaa9f3ac162 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -846,9 +846,6 @@ static void __init watchdog_sysctl_init(void)
 static void lockup_detector_delay_init(struct work_struct *work);
 bool lockup_detector_pending_init __initdata;
 
-struct wait_queue_head hld_detector_wait __initdata =
-		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
-
 static struct work_struct detector_work __initdata =
 		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
 
@@ -856,14 +853,6 @@ static void __init lockup_detector_delay_init(struct work_struct *work)
 {
 	int ret;
 
-	wait_event(hld_detector_wait,
-			lockup_detector_pending_init == false);
-
-	/*
-	 * Here, we know the PMU should be ready, so set pending to true to
-	 * inform watchdog_nmi_probe() that it shouldn't return -EBUSY again.
-	 */
-	lockup_detector_pending_init = true;
 	ret = watchdog_nmi_probe();
 	if (ret) {
 		pr_info("Delayed init of the lockup detector failed: %d\n", ret);
@@ -876,15 +865,27 @@ static void __init lockup_detector_delay_init(struct work_struct *work)
 	lockup_detector_pending_init = false;
 }
 
+/* Must call after lockup_detector_init() that we do need delayed probe */
+void __init lockup_detector_queue_work(void)
+{
+	if (!lockup_detector_pending_init)
+		return;
+
+	queue_work_on(__smp_processor_id(), system_wq, &detector_work);
+}
+
 /* Ensure the check is called after the initialization of PMU driver */
 static int __init lockup_detector_check(void)
 {
+	/* Make sure no work is pending. */
+	flush_work(&detector_work);
+
 	if (!lockup_detector_pending_init)
 		return 0;
 
 	pr_info("Delayed init checking failed, retry for once.\n");
-	lockup_detector_pending_init = false;
-	wake_up(&hld_detector_wait);
+	lockup_detector_queue_work();
+	flush_work(&detector_work);
 	return 0;
 }
 late_initcall_sync(lockup_detector_check);
@@ -902,10 +903,8 @@ void __init lockup_detector_init(void)
 	ret = watchdog_nmi_probe();
 	if (!ret)
 		nmi_watchdog_available = true;
-	else if (ret == -EBUSY) {
+	else if (ret == -EBUSY)
 		lockup_detector_pending_init = true;
-		queue_work_on(smp_processor_id(), system_wq, &detector_work);
-	}
 
 	lockup_detector_setup();
 	watchdog_sysctl_init();
Petr Mladek March 21, 2022, 5:37 p.m. UTC | #3
On Sat 2022-03-19 16:18:22, Lecopzer Chen wrote:
> > On Mon 2022-03-07 23:47:28, Lecopzer Chen wrote:
> > > When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> > > yet. E.g. on arm64, PMU is not ready until
> > > device_initcall(armv8_pmu_driver_init).  And it is deeply integrated
> > > with the driver model and cpuhp. Hence it is hard to push this
> > > initialization before smp_init().
> > 
> > > --- a/kernel/watchdog.c
> > > +++ b/kernel/watchdog.c
> > > @@ -839,16 +843,70 @@ static void __init watchdog_sysctl_init(void)
> > >  #define watchdog_sysctl_init() do { } while (0)
> > >  #endif /* CONFIG_SYSCTL */
> > >  
> > > +static void lockup_detector_delay_init(struct work_struct *work);
> > > +bool lockup_detector_pending_init __initdata;
> > > +
> > > +struct wait_queue_head hld_detector_wait __initdata =
> > > +		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
> > > +
> > > +static struct work_struct detector_work __initdata =
> > > +		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
> > > +
> > > +static void __init lockup_detector_delay_init(struct work_struct *work)
> > > +{
> > > +	int ret;
> > > +
> > > +	wait_event(hld_detector_wait,
> > > +			lockup_detector_pending_init == false);
> > > +
> > > +	/*
> > > +	 * Here, we know the PMU should be ready, so set pending to true to
> > > +	 * inform watchdog_nmi_probe() that it shouldn't return -EBUSY again.
> > > +	 */
> > > +	lockup_detector_pending_init = true;
> > 
> > This does not make sense to me. We are here only when:
> > 
> >    1. lockup_detector_init() queued this work.
> > 
> >    2. Someone cleared @lockup_detector_pending_init and woke the
> >       worker via wait_queue. IT might be either PMU init code
> >       or the late lockup_detector_check().
> > 
> > watchdog_nmi_probe() might still return -EBUSY when PMU init failed.
> > 
> > If you wanted to try the delayed probe once again (3rd attempt) from
> > lockup_detector_check(), you would need to queue the work once again.
> > But you need to be sure that lockup_detector_check() was not called
> > yet. Otherwise, the 2nd work might wait forewer.
> > 
> > IMHO, it is not worth the complexity.
> 
> The original assumption is: nobody should use delayed probe after
> lockup_detector_check() (which has __init attribute).

Good point. It makes perfect sense.

But it was not mentioned anywhere. And the code did not work this way.

> 
> That is, everything including PMU and delayed probe of lock detector must
> finsh before do_initcalls() which means delayed probe can't support with
> external PMU module init.
> 
> Also,
>   1. lockup_detector_check is registered with late_initcall_sync(), so it'd
>      be called in the last order of do_initcalls()).
> 
>   2. watchdog_nmi_probe() and all the delayed relative functions and variables
>      have __init attribute, no one should ever use it after __init section
>      is released.
> 
> The only case is PMU probe function is also late_initcall_sync().

This is the case for PMU. The API for delayed init is generic and should
be safe even for other users.


> How about this one:
>   1. Wrap the wake_up code to reduce the complexity for user side.
> 
>   2. Remove wait queue.
>      Instead queue work when lockup_detector_init(), queue the delayed
>      probe work when arch PMU code finish probe.
> 
> and the flow turns to
> 
>   1. lockup_detector_init() get -EBUSY, set lockup_detector_pending_init=true
> 
>   2. PMU arch code init done, call lockup_detector_queue_work().
> 
>   3. lockup_detector_queue_work() queue the work only when
>      lockup_detector_pending_init=true which means nobody should call
>      this before lockup_detector_init().
> 
>   4. the work lockup_detector_delay_init() is doing without wait event.
>      if probe success, set lockup_detector_pending_init=false.
> 
>   5. at late_initcall_sync(), lockup_detector_check() call flush_work() first
>      to avoid previous lockup_detector_queue_work() is not scheduled.
>      And then test whether lockup_detector_pending_init is false, if it's
>      true, means we have pending init un-finished, than forcely queue work
>      again and flush_work to make sure the __init section won't be freed
>      before the work done.

Nice, I like it.

> This remove the complexity of wait event which we were disscussed.
> The draft of the diff code(diff with this series) shows below.
> 
> 
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index 77eaefee13ea..c776618fbfa8 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -1388,9 +1388,7 @@ static int __init armv8_pmu_driver_init(void)
>  	else
>  		ret = arm_pmu_acpi_probe(armv8_pmuv3_pmu_init);
>  
> -	/* Inform watchdog core we are ready to probe hld by delayed init. */
> -	lockup_detector_pending_init = false;
> -	wake_up(&hld_detector_wait);
> +	lockup_detector_queue_work();

The name is strange. The fact that it uses workqueues is an
implementation detail. I would call it
retry_lockup_detector_init() so that it is more obvious what it does.

>  	return ret;
>  }
>  device_initcall(armv8_pmu_driver_init)
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -876,15 +865,27 @@ static void __init lockup_detector_delay_init(struct work_struct *work)
>  	lockup_detector_pending_init = false;
>  }
>  
> +/* Must call after lockup_detector_init() that we do need delayed probe */
> +void __init lockup_detector_queue_work(void)
> +{
> +	if (!lockup_detector_pending_init)
> +		return;
> +
> +	queue_work_on(__smp_processor_id(), system_wq, &detector_work);
> +}
> +
>  /* Ensure the check is called after the initialization of PMU driver */
>  static int __init lockup_detector_check(void)
>  {
> +	/* Make sure no work is pending. */
> +	flush_work(&detector_work);
> +
>  	if (!lockup_detector_pending_init)
>  		return 0;
>  
>  	pr_info("Delayed init checking failed, retry for once.\n");
> -	lockup_detector_pending_init = false;
> -	wake_up(&hld_detector_wait);
> +	lockup_detector_queue_work();

I would do here

	lockup_detector_pending_init = false;

to make sure that lockup_detector_queue_work() will no longer
queue the work after the final flush.

Maybe, we could rename the variable to allow_lockup_detector_init_retry.
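
A minimal userspace sketch of this idea follows, assuming the flag is
cleared right after the final retry is queued. The function and variable
names follow the renames suggested in this mail, but the bodies are
hypothetical stand-ins: a counter replaces queue_work_on() and
sim_flush_work() is an empty placeholder for flush_work().

```c
#include <stdbool.h>

/* Stand-in for the __initdata flag; true while a retry is still allowed. */
static bool allow_lockup_detector_init_retry = true;
static int queued_count;	/* counts simulated queue_work_on() calls */

/* Placeholder for flush_work(); the simulated work has nothing to run. */
static void sim_flush_work(void)
{
}

/* Queue a retry only while retries are still allowed. */
static void retry_lockup_detector_init(void)
{
	if (!allow_lockup_detector_init_retry)
		return;
	queued_count++;
}

/* Sketch of the late check: one final retry, then close the window. */
static void lockup_detector_check_sketch(void)
{
	sim_flush_work();
	if (!allow_lockup_detector_init_retry)
		return;
	retry_lockup_detector_init();
	/* From here on, nobody can queue the work again. */
	allow_lockup_detector_init_retry = false;
	sim_flush_work();
}
```

Once the check has run, any later call to retry_lockup_detector_init() is
a no-op, so nothing can touch the work item after the __init section is
freed.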

> +	flush_work(&detector_work);
>
>	return 0;
>  }
>  late_initcall_sync(lockup_detector_check);

Best Regards,
Petr
Lecopzer Chen March 24, 2022, 12:55 p.m. UTC | #4
> On Sat 2022-03-19 16:18:22, Lecopzer Chen wrote:
> > > On Mon 2022-03-07 23:47:28, Lecopzer Chen wrote:
> > > > When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> > > > yet. E.g. on arm64, PMU is not ready until
> > > > device_initcall(armv8_pmu_driver_init).  And it is deeply integrated
> > > > with the driver model and cpuhp. Hence it is hard to push this
> > > > initialization before smp_init().
> > > 
> > > > --- a/kernel/watchdog.c
> > > > +++ b/kernel/watchdog.c
> > > > @@ -839,16 +843,70 @@ static void __init watchdog_sysctl_init(void)
> > > >  #define watchdog_sysctl_init() do { } while (0)
> > > >  #endif /* CONFIG_SYSCTL */
> > > >  
> > > > +static void lockup_detector_delay_init(struct work_struct *work);
> > > > +bool lockup_detector_pending_init __initdata;
> > > > +
> > > > +struct wait_queue_head hld_detector_wait __initdata =
> > > > +		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
> > > > +
> > > > +static struct work_struct detector_work __initdata =
> > > > +		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
> > > > +
> > > > +static void __init lockup_detector_delay_init(struct work_struct *work)
> > > > +{
> > > > +	int ret;
> > > > +
> > > > +	wait_event(hld_detector_wait,
> > > > +			lockup_detector_pending_init == false);
> > > > +
> > > > +	/*
> > > > +	 * Here, we know the PMU should be ready, so set pending to true to
> > > > +	 * inform watchdog_nmi_probe() that it shouldn't return -EBUSY again.
> > > > +	 */
> > > > +	lockup_detector_pending_init = true;
> > > 
> > > This does not make sense to me. We are here only when:
> > > 
> > >    1. lockup_detector_init() queued this work.
> > > 
> > >    2. Someone cleared @lockup_detector_pending_init and woke the
> > >       worker via wait_queue. IT might be either PMU init code
> > >       or the late lockup_detector_check().
> > > 
> > > watchdog_nmi_probe() might still return -EBUSY when PMU init failed.
> > > 
> > > If you wanted to try the delayed probe once again (3rd attempt) from
> > > lockup_detector_check(), you would need to queue the work once again.
> > > But you need to be sure that lockup_detector_check() was not called
> > > yet. Otherwise, the 2nd work might wait forewer.
> > > 
> > > IMHO, it is not worth the complexity.
> > 
> > The original assumption is: nobody should use delayed probe after
> > lockup_detector_check() (which has __init attribute).
> 
> Good point. It makes perfect sense.
> 
> But it was not mentioned anywhere. And the code did not work this way.
> 
> > 
> > That is, everything including PMU and delayed probe of lock detector must
> > finsh before do_initcalls() which means delayed probe can't support with
> > external PMU module init.
> > 
> > Also,
> >   1. lockup_detector_check is registered with late_initcall_sync(), so it'd
> >      be called in the last order of do_initcalls()).
> > 
> >   2. watchdog_nmi_probe() and all the delayed relative functions and variables
> >      have __init attribute, no one should ever use it after __init section
> >      is released.
> > 
> > The only case is PMU probe function is also late_initcall_sync().
> 
> This is the case for PMU. The API for delayed init is generic a should
> be safe even for other users.
> 

I think this can be fixed with the suggestion you provided below:
set lockup_detector_pending_init=false at the end of lockup_detector_check(),
so that nobody can queue another work after lockup_detector_check().

> 
> > How about this one:
> >   1. Wrap the wake_up code to reduce the complexity for user side.
> > 
> >   2. Remove wait queue.
> >      Instead queue work when lockup_detector_init(), queue the delayed
> >      probe work when arch PMU code finish probe.
> > 
> > and the flow turns to
> > 
> >   1. lockup_detector_init() get -EBUSY, set lockup_detector_pending_init=true
> > 
> >   2. PMU arch code init done, call lockup_detector_queue_work().
> > 
> >   3. lockup_detector_queue_work() queue the work only when
> >      lockup_detector_pending_init=true which means nobody should call
> >      this before lockup_detector_init().
> > 
> >   4. the work lockup_detector_delay_init() is doing without wait event.
> >      if probe success, set lockup_detector_pending_init=false.
> > 
> >   5. at late_initcall_sync(), lockup_detector_check() call flush_work() first
> >      to avoid previous lockup_detector_queue_work() is not scheduled.
> >      And then test whether lockup_detector_pending_init is false, if it's
> >      true, means we have pending init un-finished, than forcely queue work
> >      again and flush_work to make sure the __init section won't be freed
> >      before the work done.
> 
> Nice, I like it.
> 
> > This remove the complexity of wait event which we were disscussed.
> > The draft of the diff code(diff with this series) shows below.
> > 
> > 
> > diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> > index 77eaefee13ea..c776618fbfa8 100644
> > --- a/arch/arm64/kernel/perf_event.c
> > +++ b/arch/arm64/kernel/perf_event.c
> > @@ -1388,9 +1388,7 @@ static int __init armv8_pmu_driver_init(void)
> >  	else
> >  		ret = arm_pmu_acpi_probe(armv8_pmuv3_pmu_init);
> >  
> > -	/* Inform watchdog core we are ready to probe hld by delayed init. */
> > -	lockup_detector_pending_init = false;
> > -	wake_up(&hld_detector_wait);
> > +	lockup_detector_queue_work();
> 
> The name is strange. The fact that it uses workqueues is an
> implementation detail. I would call it
> retry_lockup_detector_init() so that it is more obvious what it does.

Okay, I don't have good taste in naming.
I'll include this in the next version of the patches.


> 
> >  	return ret;
> >  }
> >  device_initcall(armv8_pmu_driver_init)
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -876,15 +865,27 @@ static void __init lockup_detector_delay_init(struct work_struct *work)
> >  	lockup_detector_pending_init = false;
> >  }
> >  
> > +/* Must call after lockup_detector_init() that we do need delayed probe */
> > +void __init lockup_detector_queue_work(void)
> > +{
> > +	if (!lockup_detector_pending_init)
> > +		return;
> > +
> > +	queue_work_on(__smp_processor_id(), system_wq, &detector_work);
> > +}
> > +
> >  /* Ensure the check is called after the initialization of PMU driver */
> >  static int __init lockup_detector_check(void)
> >  {
> > +	/* Make sure no work is pending. */
> > +	flush_work(&detector_work);
> > +
> >  	if (!lockup_detector_pending_init)
> >  		return 0;
> >  
> >  	pr_info("Delayed init checking failed, retry for once.\n");
> > -	lockup_detector_pending_init = false;
> > -	wake_up(&hld_detector_wait);
> > +	lockup_detector_queue_work();
> 
> I would do here
> 
> 	lockup_detector_pending_init = false;
> 
> to make sure that lockup_detector_queue_work() will not longer
> queue the work after the final flush.
> 
> Maybe, we could rename the variable to allow_lockup_detector_init_retry.

Okay, I'm preparing the next version of the patches and I'll include this
in it.

Thanks a lot for all of the suggestions.


BRs,
Lecopzer
Patch

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index b7bcd63c36b4..cc7df31be9db 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -118,6 +118,9 @@  static inline int hardlockup_detector_perf_init(void) { return 0; }
 
 void watchdog_nmi_stop(void);
 void watchdog_nmi_start(void);
+
+extern bool lockup_detector_pending_init;
+extern struct wait_queue_head hld_detector_wait;
 int watchdog_nmi_probe(void);
 void watchdog_nmi_enable(unsigned int cpu);
 void watchdog_nmi_disable(unsigned int cpu);
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index b71d434cf648..49bdcaf5bd8f 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -103,7 +103,11 @@  void __weak watchdog_nmi_disable(unsigned int cpu)
 	hardlockup_detector_perf_disable();
 }
 
-/* Return 0, if a NMI watchdog is available. Error code otherwise */
+/*
+ * Arch specific API. Return 0, if a NMI watchdog is available. -EBUSY if not
+ * ready, and arch code should wake up hld_detector_wait when ready. Other
+ * negative value if not support.
+ */
 int __weak __init watchdog_nmi_probe(void)
 {
 	return hardlockup_detector_perf_init();
@@ -839,16 +843,70 @@  static void __init watchdog_sysctl_init(void)
 #define watchdog_sysctl_init() do { } while (0)
 #endif /* CONFIG_SYSCTL */
 
+static void lockup_detector_delay_init(struct work_struct *work);
+bool lockup_detector_pending_init __initdata;
+
+struct wait_queue_head hld_detector_wait __initdata =
+		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
+
+static struct work_struct detector_work __initdata =
+		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
+
+static void __init lockup_detector_delay_init(struct work_struct *work)
+{
+	int ret;
+
+	wait_event(hld_detector_wait,
+			lockup_detector_pending_init == false);
+
+	/*
+	 * Here, we know the PMU should be ready, so set pending to true to
+	 * inform watchdog_nmi_probe() that it shouldn't return -EBUSY again.
+	 */
+	lockup_detector_pending_init = true;
+	ret = watchdog_nmi_probe();
+	if (ret) {
+		pr_info("Delayed init of the lockup detector failed: %d\n", ret);
+		pr_info("Perf NMI watchdog permanently disabled\n");
+		return;
+	}
+
+	nmi_watchdog_available = true;
+	lockup_detector_setup();
+	lockup_detector_pending_init = false;
+}
+
+/* Ensure the check is called after the initialization of PMU driver */
+static int __init lockup_detector_check(void)
+{
+	if (!lockup_detector_pending_init)
+		return 0;
+
+	pr_info("Delayed init checking failed, retry for once.\n");
+	lockup_detector_pending_init = false;
+	wake_up(&hld_detector_wait);
+	return 0;
+}
+late_initcall_sync(lockup_detector_check);
+
 void __init lockup_detector_init(void)
 {
+	int ret;
+
 	if (tick_nohz_full_enabled())
 		pr_info("Disabling watchdog on nohz_full cores by default\n");
 
 	cpumask_copy(&watchdog_cpumask,
 		     housekeeping_cpumask(HK_FLAG_TIMER));
 
-	if (!watchdog_nmi_probe())
+	ret = watchdog_nmi_probe();
+	if (!ret)
 		nmi_watchdog_available = true;
+	else if (ret == -EBUSY) {
+		lockup_detector_pending_init = true;
+		queue_work_on(smp_processor_id(), system_wq, &detector_work);
+	}
+
 	lockup_detector_setup();
 	watchdog_sysctl_init();
 }