[4/5] kernel/watchdog: Adapt the watchdog_hld interface for async model

Message ID 20220212104349.14266-5-lecopzer.chen@mediatek.com (mailing list archive)
State New, archived
Series: Support hld based on Pseudo-NMI for arm64

Commit Message

Lecopzer Chen Feb. 12, 2022, 10:43 a.m. UTC
From: Pingfan Liu <kernelfans@gmail.com>

When lockup_detector_init() calls watchdog_nmi_probe(), the PMU may not
be ready yet. E.g. on arm64, the PMU is not ready until
device_initcall(armv8_pmu_driver_init). It is also deeply integrated
with the driver model and cpuhp, so it is hard to move this
initialization before smp_init().

It is easier to take the opposite approach and let watchdog_hld
pick up the PMU capability asynchronously.

The async model is achieved by extending watchdog_nmi_probe() to return
-EBUSY, and by adding a re-initializing work_struct which waits on a
wait_queue_head.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Co-developed-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
---
 kernel/watchdog.c | 56 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 54 insertions(+), 2 deletions(-)
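
As an illustration of the contract described in the commit message (this is
not code from the series), here is a minimal userspace C sketch of the -EBUSY
deferral. Everything ending in _sketch, plus the pmu_ready and init_deferred
flags, is hypothetical; the patch itself defers through a work_struct and a
wait queue rather than a direct function call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; these are not kernel symbols. */
static bool pmu_ready;		/* set once the "PMU driver" has initialized */
static bool watchdog_available;	/* plays the role of nmi_watchdog_available */
static bool init_deferred;	/* true while a retry is still outstanding */

/* 0 if the watchdog can be used, -EBUSY if the PMU is not ready yet. */
static int probe_sketch(void)
{
	return pmu_ready ? 0 : -EBUSY;
}

/* Early init: enable the watchdog now, or remember that a retry is needed. */
static void early_init_sketch(void)
{
	int ret = probe_sketch();

	if (!ret)
		watchdog_available = true;
	else if (ret == -EBUSY)
		init_deferred = true;
}

/* Deferred retry: meant to run after the PMU driver had its chance to load. */
static void delayed_init_sketch(void)
{
	if (!init_deferred)
		return;

	init_deferred = false;
	if (!probe_sketch())
		watchdog_available = true;
}

/* Walks the deferred path once: probe too early, PMU shows up, retry works. */
static void demo_deferred_path(void)
{
	early_init_sketch();
	assert(!watchdog_available && init_deferred);

	pmu_ready = true;
	delayed_init_sketch();
	assert(watchdog_available && !init_deferred);
}
```

Calling demo_deferred_path() from a main() exercises the path where the first
probe is too early and the retry succeeds.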

Comments

Petr Mladek Feb. 25, 2022, 3:20 p.m. UTC | #1
On Sat 2022-02-12 18:43:48, Lecopzer Chen wrote:
> From: Pingfan Liu <kernelfans@gmail.com>
> 
> from: Pingfan Liu <kernelfans@gmail.com>
> 
> When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> yet. E.g. on arm64, PMU is not ready until
> device_initcall(armv8_pmu_driver_init).  And it is deeply integrated
> with the driver model and cpuhp. Hence it is hard to push this
> initialization before smp_init().
> 
> But it is easy to take an opposite approach by enabling watchdog_hld to
> get the capability of PMU async.
> 
> The async model is achieved by expanding watchdog_nmi_probe() with
> -EBUSY, and a re-initializing work_struct which waits on a wait_queue_head.
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Co-developed-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
> Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
> ---
>  kernel/watchdog.c | 56 +++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 54 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index b71d434cf648..fa8490cfeef8 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -839,16 +843,64 @@ static void __init watchdog_sysctl_init(void)
>  #define watchdog_sysctl_init() do { } while (0)
>  #endif /* CONFIG_SYSCTL */
>  
> +static void lockup_detector_delay_init(struct work_struct *work);
> +enum hld_detector_state detector_delay_init_state __initdata;

I would call this "lockup_detector_init_state" to use the same
naming scheme everywhere.

> +
> +struct wait_queue_head hld_detector_wait __initdata =
> +		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
> +
> +static struct work_struct detector_work __initdata =

I would call this "lockup_detector_work" to use the same naming scheme
everywhere.

> +		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
> +
> +static void __init lockup_detector_delay_init(struct work_struct *work)
> +{
> +	int ret;
> +
> +	wait_event(hld_detector_wait,
> +			detector_delay_init_state == DELAY_INIT_READY);

DELAY_INIT_READY is defined in the 5th patch.

There are many other build errors because this patch uses something
that is defined in the 5th patch.

> +	ret = watchdog_nmi_probe();
> +	if (!ret) {
> +		nmi_watchdog_available = true;
> +		lockup_detector_setup();
> +	} else {
> +		WARN_ON(ret == -EBUSY);

Why WARN_ON(), please?

Note that it might cause panic() when "panic_on_warn" command line
parameter is used.

Also the backtrace will not help much. The context is well known.
This code is called from a workqueue worker.


> +		pr_info("Perf NMI watchdog permanently disabled\n");
> +	}
> +}
> +
> +/* Ensure the check is called after the initialization of PMU driver */
> +static int __init lockup_detector_check(void)
> +{
> +	if (detector_delay_init_state < DELAY_INIT_WAIT)
> +		return 0;
> +
> +	if (WARN_ON(detector_delay_init_state == DELAY_INIT_WAIT)) {

Again. Is WARN_ON() needed?

Also the condition looks wrong. IMHO, this is the expected state.

> +		detector_delay_init_state = DELAY_INIT_READY;
> +		wake_up(&hld_detector_wait);
> +	}
> +	flush_work(&detector_work);
> +	return 0;
> +}
> +late_initcall_sync(lockup_detector_check);

Otherwise, it makes sense.

Best Regards,
Petr

PS: I am not going to review the last patch because I am not familiar
    with arm. I reviewed just the changes in the generic watchdog
    code.
Lecopzer Chen Feb. 26, 2022, 10:52 a.m. UTC | #2
> On Sat 2022-02-12 18:43:48, Lecopzer Chen wrote:
> > From: Pingfan Liu <kernelfans@gmail.com>
> > 
> > from: Pingfan Liu <kernelfans@gmail.com>
> > 
> > When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> > yet. E.g. on arm64, PMU is not ready until
> > device_initcall(armv8_pmu_driver_init).  And it is deeply integrated
> > with the driver model and cpuhp. Hence it is hard to push this
> > initialization before smp_init().
> > 
> > But it is easy to take an opposite approach by enabling watchdog_hld to
> > get the capability of PMU async.
> > 
> > The async model is achieved by expanding watchdog_nmi_probe() with
> > -EBUSY, and a re-initializing work_struct which waits on a wait_queue_head.
> > 
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Co-developed-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
> > Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
> > ---
> >  kernel/watchdog.c | 56 +++++++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 54 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > index b71d434cf648..fa8490cfeef8 100644
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -839,16 +843,64 @@ static void __init watchdog_sysctl_init(void)
> >  #define watchdog_sysctl_init() do { } while (0)
> >  #endif /* CONFIG_SYSCTL */
> >  
> > +static void lockup_detector_delay_init(struct work_struct *work);
> > +enum hld_detector_state detector_delay_init_state __initdata;
> 
> I would call this "lockup_detector_init_state" to use the same
> naming scheme everywhere.
> 
> > +
> > +struct wait_queue_head hld_detector_wait __initdata =
> > +		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
> > +
> > +static struct work_struct detector_work __initdata =
> 
> I would call this "lockup_detector_work" to use the same naming scheme
> everywhere.

For the naming part, I'll revise both of them in next patch.

> 
> > +		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
> > +
> > +static void __init lockup_detector_delay_init(struct work_struct *work)
> > +{
> > +	int ret;
> > +
> > +	wait_event(hld_detector_wait,
> > +			detector_delay_init_state == DELAY_INIT_READY);
> 
> DELAY_INIT_READY is defined in the 5th patch.
> 
> There are many other build errors because this patch uses something
> that is defined in the 5th patch.

Thanks for pointing this out. I'll fix the 4th and 5th patches to correct the order.

> 
> > +	ret = watchdog_nmi_probe();
> > +	if (!ret) {
> > +		nmi_watchdog_available = true;
> > +		lockup_detector_setup();
> > +	} else {
> > +		WARN_ON(ret == -EBUSY);
> 
> Why WARN_ON(), please?
> 
> Note that it might cause panic() when "panic_on_warn" command line
> parameter is used.
> 
> Also the backtrace will not help much. The context is well known.
> This code is called from a workqueue worker.
 
The motivation for the WARN was:

lockup_detector_init
-> watchdog_nmi_probe returns -EBUSY
-> lockup_detector_delay_init checks (detector_delay_init_state == DELAY_INIT_READY)
-> watchdog_nmi_probe checks
+	if (detector_delay_init_state != DELAY_INIT_READY)
+		return -EBUSY;

We first check that detector_delay_init_state equals DELAY_INIT_READY, then
enter watchdog_nmi_probe(), which checks detector_delay_init_state again
because we have moved from the common code to the arch code.
In this situation there should not be any race on detector_delay_init_state.
If an unknown race does happen, a warning is shown.

I think it makes sense to remove the WARN now because it looks too verbose...
However, I would rather change the following printk to
"Delayed init for lockup detector failed."

Is this fine with you?



> 
> > +		pr_info("Perf NMI watchdog permanently disabled\n");
> > +	}
> > +}
> > +
> > +/* Ensure the check is called after the initialization of PMU driver */
> > +static int __init lockup_detector_check(void)
> > +{
> > +	if (detector_delay_init_state < DELAY_INIT_WAIT)
> > +		return 0;
> > +
> > +	if (WARN_ON(detector_delay_init_state == DELAY_INIT_WAIT)) {
> 
> Again. Is WARN_ON() needed?
> 
> Also the condition looks wrong. IMHO, this is the expected state.
> 

DELAY_INIT_READY is the expected state here. That is, everyone who reaches
this check should already be READY, so we WARN if the state is still WAIT,
which means the previous lockup_detector_delay_init() failed.

IMO, either keeping or removing the WARN is fine with me.

I think I'll remove the WARN and add
pr_info("Delayed init checking for lockup detector failed, retry for once.");
inside the `if (detector_delay_init_state == DELAY_INIT_WAIT)`

Or do you have any other suggestion? Thanks.

> > +		detector_delay_init_state = DELAY_INIT_READY;
> > +		wake_up(&hld_detector_wait);
> > +	}
> > +	flush_work(&detector_work);
> > +	return 0;
> > +}
> > +late_initcall_sync(lockup_detector_check);
> 
> Otherwise, it make sense.
> 
> Best Regards,
> Petr
> 
> PS: I am not going to review the last patch because I am no familiar
>     with arm. I reviewed just the changes in the generic watchdog
>     code.

Thanks again for your review.


BRs,
Lecopzer
Petr Mladek Feb. 28, 2022, 10:14 a.m. UTC | #3
On Sat 2022-02-26 18:52:29, Lecopzer Chen wrote:
> > On Sat 2022-02-12 18:43:48, Lecopzer Chen wrote:
> > > From: Pingfan Liu <kernelfans@gmail.com>
> > > 
> > > from: Pingfan Liu <kernelfans@gmail.com>
> > > 
> > > When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> > > yet. E.g. on arm64, PMU is not ready until
> > > device_initcall(armv8_pmu_driver_init).  And it is deeply integrated
> > > with the driver model and cpuhp. Hence it is hard to push this
> > > initialization before smp_init().
> > > 
> > > But it is easy to take an opposite approach by enabling watchdog_hld to
> > > get the capability of PMU async.
> > > 
> > > The async model is achieved by expanding watchdog_nmi_probe() with
> > > -EBUSY, and a re-initializing work_struct which waits on a wait_queue_head.
> > > 
> > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > > index b71d434cf648..fa8490cfeef8 100644
> > > --- a/kernel/watchdog.c
> > > +++ b/kernel/watchdog.c
> > > @@ -839,16 +843,64 @@ static void __init watchdog_sysctl_init(void)
> > >  #define watchdog_sysctl_init() do { } while (0)
> > >  #endif /* CONFIG_SYSCTL */
> > >  
> > > +static void lockup_detector_delay_init(struct work_struct *work);
> > > +enum hld_detector_state detector_delay_init_state __initdata;
> > 
> > I would call this "lockup_detector_init_state" to use the same
> > naming scheme everywhere.
> > 
> > > +
> > > +struct wait_queue_head hld_detector_wait __initdata =
> > > +		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
> > > +
> > > +static struct work_struct detector_work __initdata =
> > 
> > I would call this "lockup_detector_work" to use the same naming scheme
> > everywhere.
> 
> For the naming part, I'll revise both of them in next patch.
> 
> > 
> > > +		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
> > > +
> > > +static void __init lockup_detector_delay_init(struct work_struct *work)
> > > +{
> > > +	int ret;
> > > +
> > > +	wait_event(hld_detector_wait,
> > > +			detector_delay_init_state == DELAY_INIT_READY);
> > 
> > DELAY_INIT_READY is defined in the 5th patch.
> > 
> > There are many other build errors because this patch uses something
> > that is defined in the 5th patch.
> 
> Thanks for pointing this out, the I'll fix 4th and 5th patches to correct the order.
> 
> > 
> > > +	ret = watchdog_nmi_probe();
> > > +	if (!ret) {
> > > +		nmi_watchdog_available = true;
> > > +		lockup_detector_setup();
> > > +	} else {
> > > +		WARN_ON(ret == -EBUSY);
> > 
> > Why WARN_ON(), please?
> > 
> > Note that it might cause panic() when "panic_on_warn" command line
> > parameter is used.
> > 
> > Also the backtrace will not help much. The context is well known.
> > This code is called from a workqueue worker.
>  
> The motivation to WARN should be:
> 
> lockup_detector_init
> -> watchdog_nmi_probe return -EBUSY
> -> lockup_detector_delay_init checks (detector_delay_init_state == DELAY_INIT_READY)
> -> watchdog_nmi_probe checks
> +	if (detector_delay_init_state != DELAY_INIT_READY)
> +		return -EBUSY;
> 
> Since we first check detector_delay_init_state equals to DELAY_INIT_READY
> and goes into watchdog_nmi_probe() and checks detector_delay_init_state again
> becasue now we move from common part to arch part code.
> In this condition, there shouldn't have any racing to detector_delay_init_state.
> If it does happend an unknown racing, then shows a warning to it.

There should not be any race.

     wait_event(hld_detector_wait,
		detector_delay_init_state == DELAY_INIT_READY);

waits until it is woken by lockup_detector_check(). Well, it could
wait forever when lockup_detector_check() is called earlier, see below.


> I think it make sense to remove WARN now becasue it looks verbosely...
> However, I would rather change the following printk to
> "Delayed init for lockup detector failed."

I would print both messages. The above message says what failed.


> > > +		pr_info("Perf NMI watchdog permanently disabled\n");

And this message explains what is the result of the above failure.
It is not obvious.

> > > +	}
> > > +}
> > > +
> > > +/* Ensure the check is called after the initialization of PMU driver */
> > > +static int __init lockup_detector_check(void)
> > > +{
> > > +	if (detector_delay_init_state < DELAY_INIT_WAIT)
> > > +		return 0;
> > > +
> > > +	if (WARN_ON(detector_delay_init_state == DELAY_INIT_WAIT)) {
> > 
> > Again. Is WARN_ON() needed?
> > 
> > Also the condition looks wrong. IMHO, this is the expected state.
> > 
> 
> This does expected DELAY_INIT_READY here, which means,
> every one who comes here to be checked should be READY and WARN if you're
> still in WAIT state, and which means the previous lockup_detector_delay_init()
> failed.

No, DELAY_INIT_READY is set below. DELAY_INIT_WAIT is a valid value here.
It means that the lockup_detector_delay_init() work is queued.


> IMO, either keeping or removing WARN is fine with me.
> 
> I think I'll remove WARN and add
> pr_info("Delayed init checking for lockup detector failed, retry for once.");
> inside the `if (detector_delay_init_state == DELAY_INIT_WAIT)`
> 
> Or would you have any other suggestion? thanks.
> 
> > > +		detector_delay_init_state = DELAY_INIT_READY;
> > > +		wake_up(&hld_detector_wait);

I see another problem now. We should always call the wake up here
when the work was queued. Otherwise, the worker will stay blocked
forever.

The worker will also get blocked when the late_initcall is called
before the work is processed by a worker.

> > > +	}
> > > +	flush_work(&detector_work);
> > > +	return 0;
> > > +}
> > > +late_initcall_sync(lockup_detector_check);


OK, I think that the three states are too complicated. I suggest using
only a single bool. Something like:

static bool lockup_detector_pending_init __initdata;

struct wait_queue_head lockup_detector_wait __initdata =
		__WAIT_QUEUE_HEAD_INITIALIZER(lockup_detector_wait);

static void lockup_detector_delay_init(struct work_struct *work);

static struct work_struct lockup_detector_work __initdata =
		__WORK_INITIALIZER(lockup_detector_work,
				   lockup_detector_delay_init);

static void __init lockup_detector_delay_init(struct work_struct *work)
{
	int ret;

	wait_event(lockup_detector_wait, lockup_detector_pending_init == false);

	ret = watchdog_nmi_probe();
	if (ret) {
		pr_info("Delayed init of the lockup detector failed: %d\n", ret);
		pr_info("Perf NMI watchdog permanently disabled\n");
		return;
	}

	nmi_watchdog_available = true;
	lockup_detector_setup();
}

/* Trigger the delayed init after the initialization of the PMU driver */
static int __init lockup_detector_check(void)
{
	if (!lockup_detector_pending_init)
		return 0;

	lockup_detector_pending_init = false;
	wake_up(&lockup_detector_wait);
	return 0;
}
late_initcall_sync(lockup_detector_check);

void __init lockup_detector_init(void)
{
	int ret;

	if (tick_nohz_full_enabled())
		pr_info("Disabling watchdog on nohz_full cores by default\n");

	cpumask_copy(&watchdog_cpumask,
		     housekeeping_cpumask(HK_FLAG_TIMER));

	ret = watchdog_nmi_probe();
	if (!ret)
		nmi_watchdog_available = true;
	else if (ret == -EBUSY) {
		lockup_detector_pending_init = true;
		/* Init must be done in a process context on a bound CPU. */
		queue_work_on(smp_processor_id(), system_wq, 
				  &lockup_detector_work);
	}

	lockup_detector_setup();
	watchdog_sysctl_init();
}

The result is that the lockup_detector_work worker will never stay blocked
forever. There are two possibilities:

1. lockup_detector_work runs before lockup_detector_check().
   In this case, wait_event() will wait until lockup_detector_check()
   clears lockup_detector_pending_init and calls wake_up().

2. lockup_detector_check() is called before lockup_detector_work runs.
   In this case, wait_event() will continue immediately because
   it will see the cleared lockup_detector_pending_init.
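
The two orderings can be tried out in userspace. The following is only an
analogy: it assumes a pthread mutex plus condition variable as a stand-in for
the kernel wait queue, and worker_sketch(), check_sketch(), run_handshake()
and struct delayed_init are made-up names that are not part of the proposal.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-ins; the mutex/condvar pair approximates a wait queue. */
struct delayed_init {
	pthread_mutex_t lock;
	pthread_cond_t wait;	/* plays the role of lockup_detector_wait */
	bool pending_init;	/* plays the role of lockup_detector_pending_init */
	bool probed;		/* set once the worker got past the wait */
};

/* Analogue of the work function: block while the init is still pending. */
static void *worker_sketch(void *arg)
{
	struct delayed_init *d = arg;

	pthread_mutex_lock(&d->lock);
	while (d->pending_init)
		pthread_cond_wait(&d->wait, &d->lock);
	d->probed = true;	/* the re-probe would happen here */
	pthread_mutex_unlock(&d->lock);
	return NULL;
}

/* Analogue of the late initcall: clear the flag, then wake any waiter. */
static void check_sketch(struct delayed_init *d)
{
	pthread_mutex_lock(&d->lock);
	if (d->pending_init) {
		d->pending_init = false;
		pthread_cond_broadcast(&d->wait);
	}
	pthread_mutex_unlock(&d->lock);
}

/*
 * Runs one handshake and reports whether the worker finished.
 * check_first == true forces ordering 2 (check before the worker starts).
 * Otherwise the worker is started first, so it may or may not already be
 * waiting when the check runs. Both interleavings must terminate.
 */
static bool run_handshake(bool check_first)
{
	struct delayed_init d = { .pending_init = true, .probed = false };
	pthread_t tid;
	int err;

	pthread_mutex_init(&d.lock, NULL);
	pthread_cond_init(&d.wait, NULL);

	if (check_first)
		check_sketch(&d);
	err = pthread_create(&tid, NULL, worker_sketch, &d);
	assert(err == 0);
	(void)err;
	if (!check_first)
		check_sketch(&d);

	pthread_join(tid, NULL);
	pthread_cond_destroy(&d.wait);
	pthread_mutex_destroy(&d.lock);
	return d.probed;
}
```

Both run_handshake(true) and run_handshake(false) return true, because the
worker re-checks the flag under the lock before it goes to sleep.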


Best Regards,
Petr
Lecopzer Chen Feb. 28, 2022, 4:32 p.m. UTC | #4
Yes, there is no race now; the condition is more like a redundant check of
the state. I'll remove it.


> > I think it make sense to remove WARN now becasue it looks verbosely...
> > However, I would rather change the following printk to
> > "Delayed init for lockup detector failed."
> 
> I would print both messages. The above message says what failed.
> 
> 
> > > > +		pr_info("Perf NMI watchdog permanently disabled\n");
> 
> And this message explains what is the result of the above failure.
> It is not obvious.

Yes, makes sense, let's print both.


> 
> > > > +	}
> > > > +}
> > > > +
> > > > +/* Ensure the check is called after the initialization of PMU driver */
> > > > +static int __init lockup_detector_check(void)
> > > > +{
> > > > +	if (detector_delay_init_state < DELAY_INIT_WAIT)
> > > > +		return 0;
> > > > +
> > > > +	if (WARN_ON(detector_delay_init_state == DELAY_INIT_WAIT)) {
> > > 
> > > Again. Is WARN_ON() needed?
> > > 
> > > Also the condition looks wrong. IMHO, this is the expected state.
> > > 
> > 
> > This does expected DELAY_INIT_READY here, which means,
> > every one who comes here to be checked should be READY and WARN if you're
> > still in WAIT state, and which means the previous lockup_detector_delay_init()
> > failed.
> 
> No, DELAY_INIT_READY is set below. DELAY_INIT_WAIT is valid value here.
> It means that lockup_detector_delay_init() work is queued.
> 

Sorry, I didn't describe it clearly.

For the call flow:

kernel_init_freeable()
-> lockup_detector_init()
--> queue work(lockup_detector_delay_init) with the state set
    to DELAY_INIT_WAIT.
---> lockup_detector_delay_init waits for DELAY_INIT_READY, which is set
     by armv8_pmu_driver_init().
----> device_initcall(armv8_pmu_driver_init),
      set state to READY and wake_up the work. (in 5th patch)
-----> lockup_detector_delay_init receives READY and calls
       watchdog_nmi_probe() again.
------> late_initcall_sync(lockup_detector_check);
        check if the state is READY? In other words, did the arch driver
        finish probing watchdog between "queue work" and "late_initcall_sync()"?
        If not, we forcibly set state to READY and wake_up again.
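
To make the role of the late check in the three-state model concrete, here is
a small standalone C sketch. It assumes the enum has a first value below
DELAY_INIT_WAIT; the name DELAY_INIT_NODELAY is a guess, since the enum is
defined in the 5th patch and is not shown in this mail. late_check_sketch()
is a hypothetical helper that only models the state transition, with
wake_up() and flush_work() left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum hld_detector_state {
	DELAY_INIT_NODELAY,	/* assumed name: no delayed init was requested */
	DELAY_INIT_WAIT,	/* work queued, arch code has not signalled yet */
	DELAY_INIT_READY,	/* PMU is ready, the work may re-probe */
};

/*
 * Models lockup_detector_check() in the three-state scheme. Returns true
 * when the check had to force the state to READY itself, i.e. the arch
 * code never signalled readiness before the late initcall.
 */
static bool late_check_sketch(enum hld_detector_state *state)
{
	assert(state != NULL);

	if (*state < DELAY_INIT_WAIT)
		return false;	/* nothing was deferred */

	if (*state == DELAY_INIT_WAIT) {
		*state = DELAY_INIT_READY;	/* wake_up() would follow here */
		return true;
	}

	return false;	/* arch code already set READY */
}
```

Seen this way, WAIT at the time of the late check is the case where the arch
driver did not signal in time, and the check acts as a last-chance wake-up.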


> 
> > IMO, either keeping or removing WARN is fine with me.
> > 
> > I think I'll remove WARN and add
> > pr_info("Delayed init checking for lockup detector failed, retry for once.");
> > inside the `if (detector_delay_init_state == DELAY_INIT_WAIT)`
> > 
> > Or would you have any other suggestion? thanks.
> > 
> > > > +		detector_delay_init_state = DELAY_INIT_READY;
> > > > +		wake_up(&hld_detector_wait);
> 
> I see another problem now. We should always call the wake up here
> when the work was queued. Otherwise, the worker will stay blocked
> forewer.
> 
> The worker will also get blocked when the late_initcall is called
> before the work is proceed by a worker.

lockup_detector_check() is used to resolve the blocking state.
As described above, if the state is WAIT when lockup_detector_check() runs,
we forcibly set the state to READY and wake up the work once.
After lockup_detector_check(), nobody cares about the state and the worker
has also finished its work.

> 
> > > > +	}
> > > > +	flush_work(&detector_work);
> > > > +	return 0;
> > > > +}
> > > > +late_initcall_sync(lockup_detector_check);
> 
> 
> OK, I think that the three states are too complicated. I suggest to
> use only a single bool. Something like:
> 
> static bool lockup_detector_pending_init __initdata;
> 
> struct wait_queue_head lockup_detector_wait __initdata =
> 		__WAIT_QUEUE_HEAD_INITIALIZER(lockup_detector_wait);
> 
> static void lockup_detector_delay_init(struct work_struct *work);
> 
> static struct work_struct lockup_detector_work __initdata =
> 		__WORK_INITIALIZER(lockup_detector_work,
> 				   lockup_detector_delay_init);
> 
> static void __init lockup_detector_delay_init(struct work_struct *work)
> {
> 	int ret;
> 
> 	wait_event(lockup_detector_wait, lockup_detector_pending_init == false);
> 
> 	ret = watchdog_nmi_probe();
> 	if (ret) {
> 		pr_info("Delayed init of the lockup detector failed: %d\n", ret);
> 		pr_info("Perf NMI watchdog permanently disabled\n");
> 		return;
> 	}
> 
> 	nmi_watchdog_available = true;
> 	lockup_detector_setup();
> }
> 
> /* Trigger the delayed init after the initialization of the PMU driver */
> static int __init lockup_detector_check(void)
> {
> 	if (!lockup_detector_pending_init)
> 		return 0;
> 
> 	lockup_detector_pending_init = false;
> 	wake_up(&lockup_detector_wait);
> 	return 0;
> }
> late_initcall_sync(lockup_detector_check);
> 
> void __init lockup_detector_init(void)
> {
> 	int ret;
> 
> 	if (tick_nohz_full_enabled())
> 		pr_info("Disabling watchdog on nohz_full cores by default\n");
> 
> 	cpumask_copy(&watchdog_cpumask,
> 		     housekeeping_cpumask(HK_FLAG_TIMER));
> 
> 	ret = watchdog_nmi_probe();
> 	if (!ret)
> 		nmi_watchdog_available = true;
> 	else if (ret == -EBUSY) {
> 		lockup_detector_pending_init = true;
> 		/* Init must be done in a process context on a bound CPU. */
> 		queue_work_on(smp_processor_id(), system_wq, 
> 				  &lockup_detector_work);
> 	}
> 
> 	lockup_detector_setup();
> 	watchdog_sysctl_init();
> }
> 
> The result is that the lockup_detector_work worker will never stay blocked
> forever. There are two possibilities:
> 
> 1. lockup_detector_work runs before lockup_detector_check().
>    In this case, wait_event() will wait until lockup_detector_check()
>    clears lockup_detector_pending_init and calls wake_up().
> 
> 2. lockup_detector_check() is called before lockup_detector_work runs.
>    In this case, wait_event() will continue immediately because
>    it will see the cleared lockup_detector_pending_init.
> 

Thanks, I think this logic is much simpler than three states for our use case now.
It also fits the call flow described above. I will revise it based on this
code.


Thanks a lot for your code and review!

BRs,
Lecopzer
Patch

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index b71d434cf648..fa8490cfeef8 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -103,7 +103,11 @@  void __weak watchdog_nmi_disable(unsigned int cpu)
 	hardlockup_detector_perf_disable();
 }
 
-/* Return 0, if a NMI watchdog is available. Error code otherwise */
+/*
+ * Arch specific API. Return 0, if a NMI watchdog is available. -EBUSY if not
+ * ready, and arch code should wake up hld_detector_wait when ready. Other
+ * negative value if not support.
+ */
 int __weak __init watchdog_nmi_probe(void)
 {
 	return hardlockup_detector_perf_init();
@@ -839,16 +843,64 @@  static void __init watchdog_sysctl_init(void)
 #define watchdog_sysctl_init() do { } while (0)
 #endif /* CONFIG_SYSCTL */
 
+static void lockup_detector_delay_init(struct work_struct *work);
+enum hld_detector_state detector_delay_init_state __initdata;
+
+struct wait_queue_head hld_detector_wait __initdata =
+		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
+
+static struct work_struct detector_work __initdata =
+		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
+
+static void __init lockup_detector_delay_init(struct work_struct *work)
+{
+	int ret;
+
+	wait_event(hld_detector_wait,
+			detector_delay_init_state == DELAY_INIT_READY);
+	ret = watchdog_nmi_probe();
+	if (!ret) {
+		nmi_watchdog_available = true;
+		lockup_detector_setup();
+	} else {
+		WARN_ON(ret == -EBUSY);
+		pr_info("Perf NMI watchdog permanently disabled\n");
+	}
+}
+
+/* Ensure the check is called after the initialization of PMU driver */
+static int __init lockup_detector_check(void)
+{
+	if (detector_delay_init_state < DELAY_INIT_WAIT)
+		return 0;
+
+	if (WARN_ON(detector_delay_init_state == DELAY_INIT_WAIT)) {
+		detector_delay_init_state = DELAY_INIT_READY;
+		wake_up(&hld_detector_wait);
+	}
+	flush_work(&detector_work);
+	return 0;
+}
+late_initcall_sync(lockup_detector_check);
+
 void __init lockup_detector_init(void)
 {
+	int ret;
+
 	if (tick_nohz_full_enabled())
 		pr_info("Disabling watchdog on nohz_full cores by default\n");
 
 	cpumask_copy(&watchdog_cpumask,
 		     housekeeping_cpumask(HK_FLAG_TIMER));
 
-	if (!watchdog_nmi_probe())
+	ret = watchdog_nmi_probe();
+	if (!ret)
 		nmi_watchdog_available = true;
+	else if (ret == -EBUSY) {
+		detector_delay_init_state = DELAY_INIT_WAIT;
+		queue_work_on(smp_processor_id(), system_wq, &detector_work);
+	}
+
 	lockup_detector_setup();
 	watchdog_sysctl_init();
 }