diff mbox series

[RESEND,v8,09/10] watchdog: rzg2l_wdt: Power on the PM domain in rzg2l_wdt_restart()

Message ID 20240410134044.2138310-10-claudiu.beznea.uj@bp.renesas.com (mailing list archive)
State New
Headers show
Series watchdog: rzg2l_wdt: Add support for RZ/G3S | expand

Commit Message

Claudiu Beznea April 10, 2024, 1:40 p.m. UTC
From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>

The rzg2l_wdt_restart() is called from atomic context. Calling
pm_runtime_{get_sync, resume_and_get}() or any other runtime PM resume
APIs is not an option as it may lead to issues as described in commit
e4cf89596c1f ("watchdog: rzg2l_wdt: Fix 'BUG: Invalid wait context'")
that removed the pm_runtime_get_sync() and used directly the
clk_prepare_enable() APIs.

Starting with RZ/G3S the watchdog could be part of its own software
controlled power domain (see the initial implementation in Link section).
In case the watchdog is not used the power domain is off and accessing
watchdog registers leads to aborts.

To solve this the patch powers on the power domain using
dev_pm_genpd_resume() API before enabling its clock. This is not
sleeping or taking any other locks as the power domain will not be
registered with GENPD_FLAG_IRQ_SAFE flags.

Link: https://lore.kernel.org/all/20240208124300.2740313-1-claudiu.beznea.uj@bp.renesas.com
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
---

Changes in v8:
- none, this patch is new

 drivers/watchdog/rzg2l_wdt.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Guenter Roeck April 10, 2024, 4:43 p.m. UTC | #1
On Wed, Apr 10, 2024 at 04:40:43PM +0300, Claudiu wrote:
> From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
> 
> The rzg2l_wdt_restart() is called from atomic context. Calling
> pm_runtime_{get_sync, resume_and_get}() or any other runtime PM resume
> APIs is not an option as it may lead to issues as described in commit
> e4cf89596c1f ("watchdog: rzg2l_wdt: Fix 'BUG: Invalid wait context'")
> that removed the pm_runtime_get_sync() and used directly the
> clk_prepare_enable() APIs.
> 
> Starting with RZ/G3S the watchdog could be part of its own software
> controlled power domain (see the initial implementation in Link section).
> In case the watchdog is not used the power domain is off and accessing
> watchdog registers leads to aborts.
> 
> To solve this the patch powers on the power domain using
> dev_pm_genpd_resume() API before enabling its clock. This is not
> sleeping or taking any other locks as the power domain will not be
> registered with GENPD_FLAG_IRQ_SAFE flags.
> 
> Link: https://lore.kernel.org/all/20240208124300.2740313-1-claudiu.beznea.uj@bp.renesas.com
> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>

Reviewed-by: Guenter Roeck <linux@roeck-us.net>

> ---
> 
> Changes in v8:
> - none, this patch is new
> 
>  drivers/watchdog/rzg2l_wdt.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/watchdog/rzg2l_wdt.c b/drivers/watchdog/rzg2l_wdt.c
> index c8c20cfb97a3..98e5e9914a5d 100644
> --- a/drivers/watchdog/rzg2l_wdt.c
> +++ b/drivers/watchdog/rzg2l_wdt.c
> @@ -12,6 +12,7 @@
>  #include <linux/module.h>
>  #include <linux/of.h>
>  #include <linux/platform_device.h>
> +#include <linux/pm_domain.h>
>  #include <linux/pm_runtime.h>
>  #include <linux/reset.h>
>  #include <linux/units.h>
> @@ -164,6 +165,17 @@ static int rzg2l_wdt_restart(struct watchdog_device *wdev,
>  	struct rzg2l_wdt_priv *priv = watchdog_get_drvdata(wdev);
>  	int ret;
>  
> +	/*
> +	 * The device may be part of a power domain that is currently
> +	 * powered off. We need to power it up before accessing registers.
> +	 * We don't undo the dev_pm_genpd_resume() as the device need to
> +	 * be up for the reboot to happen. Also, as we are in atomic context
> +	 * here there is no need to increment PM runtime usage counter
> +	 * (to make sure pm_runtime_active() doesn't return wrong code).
> +	 */
> +	if (!pm_runtime_active(wdev->parent))
> +		dev_pm_genpd_resume(wdev->parent);
> +
>  	clk_prepare_enable(priv->pclk);
>  	clk_prepare_enable(priv->osc_clk);
>  
> -- 
> 2.39.2
>
Geert Uytterhoeven April 11, 2024, 3:20 p.m. UTC | #2
Hi Claudiu,

CC pmdomain

On Thu, Apr 11, 2024 at 1:11 PM Claudiu <claudiu.beznea@tuxon.dev> wrote:
> From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
>
> The rzg2l_wdt_restart() is called from atomic context. Calling
> pm_runtime_{get_sync, resume_and_get}() or any other runtime PM resume
> APIs is not an option as it may lead to issues as described in commit
> e4cf89596c1f ("watchdog: rzg2l_wdt: Fix 'BUG: Invalid wait context'")
> that removed the pm_runtime_get_sync() and used directly the
> clk_prepare_enable() APIs.
>
> Starting with RZ/G3S the watchdog could be part of its own software
> controlled power domain (see the initial implementation in Link section).
> In case the watchdog is not used the power domain is off and accessing
> watchdog registers leads to aborts.
>
> To solve this the patch powers on the power domain using
> dev_pm_genpd_resume() API before enabling its clock. This is not
> sleeping or taking any other locks as the power domain will not be
> registered with GENPD_FLAG_IRQ_SAFE flags.
>
> Link: https://lore.kernel.org/all/20240208124300.2740313-1-claudiu.beznea.uj@bp.renesas.com
> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>

Thanks for your patch!

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>

> --- a/drivers/watchdog/rzg2l_wdt.c
> +++ b/drivers/watchdog/rzg2l_wdt.c
> @@ -12,6 +12,7 @@
>  #include <linux/module.h>
>  #include <linux/of.h>
>  #include <linux/platform_device.h>
> +#include <linux/pm_domain.h>
>  #include <linux/pm_runtime.h>
>  #include <linux/reset.h>
>  #include <linux/units.h>
> @@ -164,6 +165,17 @@ static int rzg2l_wdt_restart(struct watchdog_device *wdev,
>         struct rzg2l_wdt_priv *priv = watchdog_get_drvdata(wdev);
>         int ret;
>
> +       /*
> +        * The device may be part of a power domain that is currently
> +        * powered off. We need to power it up before accessing registers.
> +        * We don't undo the dev_pm_genpd_resume() as the device need to
> +        * be up for the reboot to happen. Also, as we are in atomic context
> +        * here there is no need to increment PM runtime usage counter
> +        * (to make sure pm_runtime_active() doesn't return wrong code).
> +        */
> +       if (!pm_runtime_active(wdev->parent))
> +               dev_pm_genpd_resume(wdev->parent);
> +
>         clk_prepare_enable(priv->pclk);
>         clk_prepare_enable(priv->osc_clk);
>

Gr{oetje,eeting}s,

                        Geert
Ulf Hansson April 12, 2024, 11:14 a.m. UTC | #3
On Wed, 10 Apr 2024 at 16:19, Claudiu <claudiu.beznea@tuxon.dev> wrote:
>
> From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
>
> The rzg2l_wdt_restart() is called from atomic context. Calling
> pm_runtime_{get_sync, resume_and_get}() or any other runtime PM resume
> APIs is not an option as it may lead to issues as described in commit
> e4cf89596c1f ("watchdog: rzg2l_wdt: Fix 'BUG: Invalid wait context'")
> that removed the pm_runtime_get_sync() and used directly the
> clk_prepare_enable() APIs.
>
> Starting with RZ/G3S the watchdog could be part of its own software
> controlled power domain (see the initial implementation in Link section).
> In case the watchdog is not used the power domain is off and accessing
> watchdog registers leads to aborts.
>
> To solve this the patch powers on the power domain using
> dev_pm_genpd_resume() API before enabling its clock. This is not
> sleeping or taking any other locks as the power domain will not be
> registered with GENPD_FLAG_IRQ_SAFE flags.
>
> Link: https://lore.kernel.org/all/20240208124300.2740313-1-claudiu.beznea.uj@bp.renesas.com
> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
> ---
>
> Changes in v8:
> - none, this patch is new
>
>  drivers/watchdog/rzg2l_wdt.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/watchdog/rzg2l_wdt.c b/drivers/watchdog/rzg2l_wdt.c
> index c8c20cfb97a3..98e5e9914a5d 100644
> --- a/drivers/watchdog/rzg2l_wdt.c
> +++ b/drivers/watchdog/rzg2l_wdt.c
> @@ -12,6 +12,7 @@
>  #include <linux/module.h>
>  #include <linux/of.h>
>  #include <linux/platform_device.h>
> +#include <linux/pm_domain.h>
>  #include <linux/pm_runtime.h>
>  #include <linux/reset.h>
>  #include <linux/units.h>
> @@ -164,6 +165,17 @@ static int rzg2l_wdt_restart(struct watchdog_device *wdev,
>         struct rzg2l_wdt_priv *priv = watchdog_get_drvdata(wdev);
>         int ret;
>
> +       /*
> +        * The device may be part of a power domain that is currently
> +        * powered off. We need to power it up before accessing registers.
> +        * We don't undo the dev_pm_genpd_resume() as the device need to
> +        * be up for the reboot to happen. Also, as we are in atomic context
> +        * here there is no need to increment PM runtime usage counter
> +        * (to make sure pm_runtime_active() doesn't return wrong code).
> +        */
> +       if (!pm_runtime_active(wdev->parent))
> +               dev_pm_genpd_resume(wdev->parent);
> +

I doubt this is the correct solution, but I may be wrong. Unless this
is invoked at the syscore stage?

>         clk_prepare_enable(priv->pclk);
>         clk_prepare_enable(priv->osc_clk);
>
> --
> 2.39.2
>
>

Can you redirectly me to the complete series, so I can have a better
overview of the problem?

Kind regards
Uffe
Ulf Hansson April 12, 2024, 11:18 a.m. UTC | #4
On Wed, 10 Apr 2024 at 16:19, Claudiu <claudiu.beznea@tuxon.dev> wrote:
>
> From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
>
> The rzg2l_wdt_restart() is called from atomic context. Calling
> pm_runtime_{get_sync, resume_and_get}() or any other runtime PM resume
> APIs is not an option as it may lead to issues as described in commit
> e4cf89596c1f ("watchdog: rzg2l_wdt: Fix 'BUG: Invalid wait context'")
> that removed the pm_runtime_get_sync() and used directly the
> clk_prepare_enable() APIs.

Calling clk_prepare_enable() doesn't work from an atomic context
either as it may sleep on the clk prepare mutex.

As I said in the other reply too, it looks like we need a different
solution. I am not sure what, but I am happy to help discuss it.

>
> Starting with RZ/G3S the watchdog could be part of its own software
> controlled power domain (see the initial implementation in Link section).
> In case the watchdog is not used the power domain is off and accessing
> watchdog registers leads to aborts.
>
> To solve this the patch powers on the power domain using
> dev_pm_genpd_resume() API before enabling its clock. This is not
> sleeping or taking any other locks as the power domain will not be
> registered with GENPD_FLAG_IRQ_SAFE flags.
>
> Link: https://lore.kernel.org/all/20240208124300.2740313-1-claudiu.beznea.uj@bp.renesas.com
> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>

Kind regards
Uffe

> ---
>
> Changes in v8:
> - none, this patch is new
>
>  drivers/watchdog/rzg2l_wdt.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/watchdog/rzg2l_wdt.c b/drivers/watchdog/rzg2l_wdt.c
> index c8c20cfb97a3..98e5e9914a5d 100644
> --- a/drivers/watchdog/rzg2l_wdt.c
> +++ b/drivers/watchdog/rzg2l_wdt.c
> @@ -12,6 +12,7 @@
>  #include <linux/module.h>
>  #include <linux/of.h>
>  #include <linux/platform_device.h>
> +#include <linux/pm_domain.h>
>  #include <linux/pm_runtime.h>
>  #include <linux/reset.h>
>  #include <linux/units.h>
> @@ -164,6 +165,17 @@ static int rzg2l_wdt_restart(struct watchdog_device *wdev,
>         struct rzg2l_wdt_priv *priv = watchdog_get_drvdata(wdev);
>         int ret;
>
> +       /*
> +        * The device may be part of a power domain that is currently
> +        * powered off. We need to power it up before accessing registers.
> +        * We don't undo the dev_pm_genpd_resume() as the device need to
> +        * be up for the reboot to happen. Also, as we are in atomic context
> +        * here there is no need to increment PM runtime usage counter
> +        * (to make sure pm_runtime_active() doesn't return wrong code).
> +        */
> +       if (!pm_runtime_active(wdev->parent))
> +               dev_pm_genpd_resume(wdev->parent);
> +
>         clk_prepare_enable(priv->pclk);
>         clk_prepare_enable(priv->osc_clk);
>
> --
> 2.39.2
>
>
Claudiu Beznea April 12, 2024, 2:02 p.m. UTC | #5
Hi, Ulf,

On 12.04.2024 14:14, Ulf Hansson wrote:
> On Wed, 10 Apr 2024 at 16:19, Claudiu <claudiu.beznea@tuxon.dev> wrote:
>>
>> From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
>>
>> The rzg2l_wdt_restart() is called from atomic context. Calling
>> pm_runtime_{get_sync, resume_and_get}() or any other runtime PM resume
>> APIs is not an option as it may lead to issues as described in commit
>> e4cf89596c1f ("watchdog: rzg2l_wdt: Fix 'BUG: Invalid wait context'")
>> that removed the pm_runtime_get_sync() and used directly the
>> clk_prepare_enable() APIs.
>>
>> Starting with RZ/G3S the watchdog could be part of its own software
>> controlled power domain (see the initial implementation in Link section).
>> In case the watchdog is not used the power domain is off and accessing
>> watchdog registers leads to aborts.
>>
>> To solve this the patch powers on the power domain using
>> dev_pm_genpd_resume() API before enabling its clock. This is not
>> sleeping or taking any other locks as the power domain will not be
>> registered with GENPD_FLAG_IRQ_SAFE flags.
>>
>> Link: https://lore.kernel.org/all/20240208124300.2740313-1-claudiu.beznea.uj@bp.renesas.com
>> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
>> ---
>>
>> Changes in v8:
>> - none, this patch is new
>>
>>  drivers/watchdog/rzg2l_wdt.c | 12 ++++++++++++
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/watchdog/rzg2l_wdt.c b/drivers/watchdog/rzg2l_wdt.c
>> index c8c20cfb97a3..98e5e9914a5d 100644
>> --- a/drivers/watchdog/rzg2l_wdt.c
>> +++ b/drivers/watchdog/rzg2l_wdt.c
>> @@ -12,6 +12,7 @@
>>  #include <linux/module.h>
>>  #include <linux/of.h>
>>  #include <linux/platform_device.h>
>> +#include <linux/pm_domain.h>
>>  #include <linux/pm_runtime.h>
>>  #include <linux/reset.h>
>>  #include <linux/units.h>
>> @@ -164,6 +165,17 @@ static int rzg2l_wdt_restart(struct watchdog_device *wdev,
>>         struct rzg2l_wdt_priv *priv = watchdog_get_drvdata(wdev);
>>         int ret;
>>
>> +       /*
>> +        * The device may be part of a power domain that is currently
>> +        * powered off. We need to power it up before accessing registers.
>> +        * We don't undo the dev_pm_genpd_resume() as the device need to
>> +        * be up for the reboot to happen. Also, as we are in atomic context
>> +        * here there is no need to increment PM runtime usage counter
>> +        * (to make sure pm_runtime_active() doesn't return wrong code).
>> +        */
>> +       if (!pm_runtime_active(wdev->parent))
>> +               dev_pm_genpd_resume(wdev->parent);
>> +
> 
> I doubt this is the correct solution, but I may be wrong. Unless this
> is invoked at the syscore stage?

On my case I see it invoked from kernel_restart(). As of my code reading,
at that point only one CPU is active with IRQs disabled (done in
machine_restart()). Below is the stack trace decoded on next-20240410 with
this series
(https://lore.kernel.org/all/20240410134044.2138310-1-claudiu.beznea.uj@bp.renesas.com/)
on top and the one from here (adding power domain support):
https://lore.kernel.org/all/20240410122657.2051132-1-claudiu.beznea.uj@bp.renesas.com/

Hardware name: Renesas SMARC EVK version 2 based on r9a08g045s33 (DT)
Call trace:
dump_backtrace (arch/arm64/kernel/stacktrace.c:319)
show_stack (arch/arm64/kernel/stacktrace.c:326)
dump_stack_lvl (lib/dump_stack.c:117)
dump_stack (lib/dump_stack.c:124)
rzg2l_wdt_restart (drivers/watchdog/rzg2l_wdt.c:180)
watchdog_restart_notifier (drivers/watchdog/watchdog_core.c:188)
atomic_notifier_call_chain (kernel/notifier.c:98 kernel/notifier.c:231)
do_kernel_restart (kernel/reboot.c:236)
machine_restart (arch/arm64/kernel/process.c:145)
kernel_restart (kernel/reboot.c:287)
__do_sys_reboot (kernel/reboot.c:755)
__arm64_sys_reboot (kernel/reboot.c:715)
invoke_syscall (arch/arm64/include/asm/current.h:19
arch/arm64/kernel/syscall.c:53)
el0_svc_common.constprop.0 (include/linux/thread_info.h:127
arch/arm64/kernel/syscall.c:141)
do_el0_svc (arch/arm64/kernel/syscall.c:153)
el0_svc (arch/arm64/include/asm/irqflags.h:56
arch/arm64/include/asm/irqflags.h:77 arch/arm64/kernel/entry-common.c:165
arch/arm64/kernel/entry-common.c:178 arch/arm64/kernel/entry-common.c:713)
el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:731)
el0t_64_sync (arch/arm64/kernel/entry.S:598)

The watchdog restart handler is added in restart_handler_list and this list
is invoked though do_kernel_restart(). As of my code investigation the
restart_handler_list is invoked only though do_kernel_restart() and only
though the stack trace above.

Thank you,
Claudiu Beznea

> 
>>         clk_prepare_enable(priv->pclk);
>>         clk_prepare_enable(priv->osc_clk);
>>
>> --
>> 2.39.2
>>
>>
> 
> Can you redirectly me to the complete series, so I can have a better
> overview of the problem?

This is the series that adds power domain support for RZ/G3S SoC:
https://lore.kernel.org/all/20240410122657.2051132-1-claudiu.beznea.uj@bp.renesas.com/

This is the series that adds watchdog support for RZ/G3S SoC:
https://lore.kernel.org/all/20240410134044.2138310-1-claudiu.beznea.uj@bp.renesas.com/

Thank you for your review,
Claudiu Beznea

> 
> Kind regards
> Uffe
Claudiu Beznea April 24, 2024, 11:14 a.m. UTC | #6
Hi, Ulf,

On 12.04.2024 17:02, claudiu beznea wrote:
> Hi, Ulf,
> 
> On 12.04.2024 14:14, Ulf Hansson wrote:
>> On Wed, 10 Apr 2024 at 16:19, Claudiu <claudiu.beznea@tuxon.dev> wrote:
>>>
>>> From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
>>>
>>> The rzg2l_wdt_restart() is called from atomic context. Calling
>>> pm_runtime_{get_sync, resume_and_get}() or any other runtime PM resume
>>> APIs is not an option as it may lead to issues as described in commit
>>> e4cf89596c1f ("watchdog: rzg2l_wdt: Fix 'BUG: Invalid wait context'")
>>> that removed the pm_runtime_get_sync() and used directly the
>>> clk_prepare_enable() APIs.
>>>
>>> Starting with RZ/G3S the watchdog could be part of its own software
>>> controlled power domain (see the initial implementation in Link section).
>>> In case the watchdog is not used the power domain is off and accessing
>>> watchdog registers leads to aborts.
>>>
>>> To solve this the patch powers on the power domain using
>>> dev_pm_genpd_resume() API before enabling its clock. This is not
>>> sleeping or taking any other locks as the power domain will not be
>>> registered with GENPD_FLAG_IRQ_SAFE flags.
>>>
>>> Link: https://lore.kernel.org/all/20240208124300.2740313-1-claudiu.beznea.uj@bp.renesas.com
>>> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
>>> ---
>>>
>>> Changes in v8:
>>> - none, this patch is new
>>>
>>>  drivers/watchdog/rzg2l_wdt.c | 12 ++++++++++++
>>>  1 file changed, 12 insertions(+)
>>>
>>> diff --git a/drivers/watchdog/rzg2l_wdt.c b/drivers/watchdog/rzg2l_wdt.c
>>> index c8c20cfb97a3..98e5e9914a5d 100644
>>> --- a/drivers/watchdog/rzg2l_wdt.c
>>> +++ b/drivers/watchdog/rzg2l_wdt.c
>>> @@ -12,6 +12,7 @@
>>>  #include <linux/module.h>
>>>  #include <linux/of.h>
>>>  #include <linux/platform_device.h>
>>> +#include <linux/pm_domain.h>
>>>  #include <linux/pm_runtime.h>
>>>  #include <linux/reset.h>
>>>  #include <linux/units.h>
>>> @@ -164,6 +165,17 @@ static int rzg2l_wdt_restart(struct watchdog_device *wdev,
>>>         struct rzg2l_wdt_priv *priv = watchdog_get_drvdata(wdev);
>>>         int ret;
>>>
>>> +       /*
>>> +        * The device may be part of a power domain that is currently
>>> +        * powered off. We need to power it up before accessing registers.
>>> +        * We don't undo the dev_pm_genpd_resume() as the device need to
>>> +        * be up for the reboot to happen. Also, as we are in atomic context
>>> +        * here there is no need to increment PM runtime usage counter
>>> +        * (to make sure pm_runtime_active() doesn't return wrong code).
>>> +        */
>>> +       if (!pm_runtime_active(wdev->parent))
>>> +               dev_pm_genpd_resume(wdev->parent);
>>> +
>>
>> I doubt this is the correct solution, but I may be wrong. Unless this
>> is invoked at the syscore stage?
> 
> On my case I see it invoked from kernel_restart(). As of my code reading,

With the above explanations, do you consider calling dev_pm_genpd_resume()
here is still wrong?

Do you have any suggestions I could try?

Thank you,
Claudiu Beznea

> at that point only one CPU is active with IRQs disabled (done in
> machine_restart()). Below is the stack trace decoded on next-20240410 with
> this series
> (https://lore.kernel.org/all/20240410134044.2138310-1-claudiu.beznea.uj@bp.renesas.com/)
> on top and the one from here (adding power domain support):
> https://lore.kernel.org/all/20240410122657.2051132-1-claudiu.beznea.uj@bp.renesas.com/
> 
> Hardware name: Renesas SMARC EVK version 2 based on r9a08g045s33 (DT)
> Call trace:
> dump_backtrace (arch/arm64/kernel/stacktrace.c:319)
> show_stack (arch/arm64/kernel/stacktrace.c:326)
> dump_stack_lvl (lib/dump_stack.c:117)
> dump_stack (lib/dump_stack.c:124)
> rzg2l_wdt_restart (drivers/watchdog/rzg2l_wdt.c:180)
> watchdog_restart_notifier (drivers/watchdog/watchdog_core.c:188)
> atomic_notifier_call_chain (kernel/notifier.c:98 kernel/notifier.c:231)
> do_kernel_restart (kernel/reboot.c:236)
> machine_restart (arch/arm64/kernel/process.c:145)
> kernel_restart (kernel/reboot.c:287)
> __do_sys_reboot (kernel/reboot.c:755)
> __arm64_sys_reboot (kernel/reboot.c:715)
> invoke_syscall (arch/arm64/include/asm/current.h:19
> arch/arm64/kernel/syscall.c:53)
> el0_svc_common.constprop.0 (include/linux/thread_info.h:127
> arch/arm64/kernel/syscall.c:141)
> do_el0_svc (arch/arm64/kernel/syscall.c:153)
> el0_svc (arch/arm64/include/asm/irqflags.h:56
> arch/arm64/include/asm/irqflags.h:77 arch/arm64/kernel/entry-common.c:165
> arch/arm64/kernel/entry-common.c:178 arch/arm64/kernel/entry-common.c:713)
> el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:731)
> el0t_64_sync (arch/arm64/kernel/entry.S:598)
> 
> The watchdog restart handler is added in restart_handler_list and this list
> is invoked though do_kernel_restart(). As of my code investigation the
> restart_handler_list is invoked only though do_kernel_restart() and only
> though the stack trace above.
> 
> Thank you,
> Claudiu Beznea
> 
>>
>>>         clk_prepare_enable(priv->pclk);
>>>         clk_prepare_enable(priv->osc_clk);
>>>
>>> --
>>> 2.39.2
>>>
>>>
>>
>> Can you redirectly me to the complete series, so I can have a better
>> overview of the problem?
> 
> This is the series that adds power domain support for RZ/G3S SoC:
> https://lore.kernel.org/all/20240410122657.2051132-1-claudiu.beznea.uj@bp.renesas.com/
> 
> This is the series that adds watchdog support for RZ/G3S SoC:
> https://lore.kernel.org/all/20240410134044.2138310-1-claudiu.beznea.uj@bp.renesas.com/
> 
> Thank you for your review,
> Claudiu Beznea
> 
>>
>> Kind regards
>> Uffe
Ulf Hansson April 29, 2024, 2:19 p.m. UTC | #7
On Wed, 24 Apr 2024 at 13:14, claudiu beznea <claudiu.beznea@tuxon.dev> wrote:
>
> Hi, Ulf,
>
> On 12.04.2024 17:02, claudiu beznea wrote:
> > Hi, Ulf,
> >
> > On 12.04.2024 14:14, Ulf Hansson wrote:
> >> On Wed, 10 Apr 2024 at 16:19, Claudiu <claudiu.beznea@tuxon.dev> wrote:
> >>>
> >>> From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
> >>>
> >>> The rzg2l_wdt_restart() is called from atomic context. Calling
> >>> pm_runtime_{get_sync, resume_and_get}() or any other runtime PM resume
> >>> APIs is not an option as it may lead to issues as described in commit
> >>> e4cf89596c1f ("watchdog: rzg2l_wdt: Fix 'BUG: Invalid wait context'")
> >>> that removed the pm_runtime_get_sync() and used directly the
> >>> clk_prepare_enable() APIs.
> >>>
> >>> Starting with RZ/G3S the watchdog could be part of its own software
> >>> controlled power domain (see the initial implementation in Link section).
> >>> In case the watchdog is not used the power domain is off and accessing
> >>> watchdog registers leads to aborts.
> >>>
> >>> To solve this the patch powers on the power domain using
> >>> dev_pm_genpd_resume() API before enabling its clock. This is not
> >>> sleeping or taking any other locks as the power domain will not be
> >>> registered with GENPD_FLAG_IRQ_SAFE flags.
> >>>
> >>> Link: https://lore.kernel.org/all/20240208124300.2740313-1-claudiu.beznea.uj@bp.renesas.com
> >>> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
> >>> ---
> >>>
> >>> Changes in v8:
> >>> - none, this patch is new
> >>>
> >>>  drivers/watchdog/rzg2l_wdt.c | 12 ++++++++++++
> >>>  1 file changed, 12 insertions(+)
> >>>
> >>> diff --git a/drivers/watchdog/rzg2l_wdt.c b/drivers/watchdog/rzg2l_wdt.c
> >>> index c8c20cfb97a3..98e5e9914a5d 100644
> >>> --- a/drivers/watchdog/rzg2l_wdt.c
> >>> +++ b/drivers/watchdog/rzg2l_wdt.c
> >>> @@ -12,6 +12,7 @@
> >>>  #include <linux/module.h>
> >>>  #include <linux/of.h>
> >>>  #include <linux/platform_device.h>
> >>> +#include <linux/pm_domain.h>
> >>>  #include <linux/pm_runtime.h>
> >>>  #include <linux/reset.h>
> >>>  #include <linux/units.h>
> >>> @@ -164,6 +165,17 @@ static int rzg2l_wdt_restart(struct watchdog_device *wdev,
> >>>         struct rzg2l_wdt_priv *priv = watchdog_get_drvdata(wdev);
> >>>         int ret;
> >>>
> >>> +       /*
> >>> +        * The device may be part of a power domain that is currently
> >>> +        * powered off. We need to power it up before accessing registers.
> >>> +        * We don't undo the dev_pm_genpd_resume() as the device need to
> >>> +        * be up for the reboot to happen. Also, as we are in atomic context
> >>> +        * here there is no need to increment PM runtime usage counter
> >>> +        * (to make sure pm_runtime_active() doesn't return wrong code).
> >>> +        */
> >>> +       if (!pm_runtime_active(wdev->parent))
> >>> +               dev_pm_genpd_resume(wdev->parent);
> >>> +
> >>
> >> I doubt this is the correct solution, but I may be wrong. Unless this
> >> is invoked at the syscore stage?
> >
> > On my case I see it invoked from kernel_restart(). As of my code reading,
>
> With the above explanations, do you consider calling dev_pm_genpd_resume()
> here is still wrong?

Yes. At least, those genpd functions were not added to cope for cases like this.

Moreover, you still need to find another solution as
clk_prepare_enable() can't be called in this path.

>
> Do you have any suggestions I could try?

Not at the moment, but I will try to circle back to this topic more
thinking next week, when I have some more time.

>
> Thank you,
> Claudiu Beznea

Kind regards
Uffe

>
> > at that point only one CPU is active with IRQs disabled (done in
> > machine_restart()). Below is the stack trace decoded on next-20240410 with
> > this series
> > (https://lore.kernel.org/all/20240410134044.2138310-1-claudiu.beznea.uj@bp.renesas.com/)
> > on top and the one from here (adding power domain support):
> > https://lore.kernel.org/all/20240410122657.2051132-1-claudiu.beznea.uj@bp.renesas.com/
> >
> > Hardware name: Renesas SMARC EVK version 2 based on r9a08g045s33 (DT)
> > Call trace:
> > dump_backtrace (arch/arm64/kernel/stacktrace.c:319)
> > show_stack (arch/arm64/kernel/stacktrace.c:326)
> > dump_stack_lvl (lib/dump_stack.c:117)
> > dump_stack (lib/dump_stack.c:124)
> > rzg2l_wdt_restart (drivers/watchdog/rzg2l_wdt.c:180)
> > watchdog_restart_notifier (drivers/watchdog/watchdog_core.c:188)
> > atomic_notifier_call_chain (kernel/notifier.c:98 kernel/notifier.c:231)
> > do_kernel_restart (kernel/reboot.c:236)
> > machine_restart (arch/arm64/kernel/process.c:145)
> > kernel_restart (kernel/reboot.c:287)
> > __do_sys_reboot (kernel/reboot.c:755)
> > __arm64_sys_reboot (kernel/reboot.c:715)
> > invoke_syscall (arch/arm64/include/asm/current.h:19
> > arch/arm64/kernel/syscall.c:53)
> > el0_svc_common.constprop.0 (include/linux/thread_info.h:127
> > arch/arm64/kernel/syscall.c:141)
> > do_el0_svc (arch/arm64/kernel/syscall.c:153)
> > el0_svc (arch/arm64/include/asm/irqflags.h:56
> > arch/arm64/include/asm/irqflags.h:77 arch/arm64/kernel/entry-common.c:165
> > arch/arm64/kernel/entry-common.c:178 arch/arm64/kernel/entry-common.c:713)
> > el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:731)
> > el0t_64_sync (arch/arm64/kernel/entry.S:598)
> >
> > The watchdog restart handler is added in restart_handler_list and this list
> > is invoked though do_kernel_restart(). As of my code investigation the
> > restart_handler_list is invoked only though do_kernel_restart() and only
> > though the stack trace above.
> >
> > Thank you,
> > Claudiu Beznea
> >
> >>
> >>>         clk_prepare_enable(priv->pclk);
> >>>         clk_prepare_enable(priv->osc_clk);
> >>>
> >>> --
> >>> 2.39.2
> >>>
> >>>
> >>
> >> Can you redirectly me to the complete series, so I can have a better
> >> overview of the problem?
> >
> > This is the series that adds power domain support for RZ/G3S SoC:
> > https://lore.kernel.org/all/20240410122657.2051132-1-claudiu.beznea.uj@bp.renesas.com/
> >
> > This is the series that adds watchdog support for RZ/G3S SoC:
> > https://lore.kernel.org/all/20240410134044.2138310-1-claudiu.beznea.uj@bp.renesas.com/
> >
> > Thank you for your review,
> > Claudiu Beznea
> >
> >>
> >> Kind regards
> >> Uffe
Claudiu Beznea May 20, 2024, 11:55 a.m. UTC | #8
Hi, Ulf,

On 29.04.2024 17:19, Ulf Hansson wrote:
> On Wed, 24 Apr 2024 at 13:14, claudiu beznea <claudiu.beznea@tuxon.dev> wrote:
>>
>> Hi, Ulf,
>>
>> On 12.04.2024 17:02, claudiu beznea wrote:
>>> Hi, Ulf,
>>>
>>> On 12.04.2024 14:14, Ulf Hansson wrote:
>>>> On Wed, 10 Apr 2024 at 16:19, Claudiu <claudiu.beznea@tuxon.dev> wrote:
>>>>>
>>>>> From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
>>>>>
>>>>> The rzg2l_wdt_restart() is called from atomic context. Calling
>>>>> pm_runtime_{get_sync, resume_and_get}() or any other runtime PM resume
>>>>> APIs is not an option as it may lead to issues as described in commit
>>>>> e4cf89596c1f ("watchdog: rzg2l_wdt: Fix 'BUG: Invalid wait context'")
>>>>> that removed the pm_runtime_get_sync() and used directly the
>>>>> clk_prepare_enable() APIs.
>>>>>
>>>>> Starting with RZ/G3S the watchdog could be part of its own software
>>>>> controlled power domain (see the initial implementation in Link section).
>>>>> In case the watchdog is not used the power domain is off and accessing
>>>>> watchdog registers leads to aborts.
>>>>>
>>>>> To solve this the patch powers on the power domain using
>>>>> dev_pm_genpd_resume() API before enabling its clock. This is not
>>>>> sleeping or taking any other locks as the power domain will not be
>>>>> registered with GENPD_FLAG_IRQ_SAFE flags.
>>>>>
>>>>> Link: https://lore.kernel.org/all/20240208124300.2740313-1-claudiu.beznea.uj@bp.renesas.com
>>>>> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
>>>>> ---
>>>>>
>>>>> Changes in v8:
>>>>> - none, this patch is new
>>>>>
>>>>>  drivers/watchdog/rzg2l_wdt.c | 12 ++++++++++++
>>>>>  1 file changed, 12 insertions(+)
>>>>>
>>>>> diff --git a/drivers/watchdog/rzg2l_wdt.c b/drivers/watchdog/rzg2l_wdt.c
>>>>> index c8c20cfb97a3..98e5e9914a5d 100644
>>>>> --- a/drivers/watchdog/rzg2l_wdt.c
>>>>> +++ b/drivers/watchdog/rzg2l_wdt.c
>>>>> @@ -12,6 +12,7 @@
>>>>>  #include <linux/module.h>
>>>>>  #include <linux/of.h>
>>>>>  #include <linux/platform_device.h>
>>>>> +#include <linux/pm_domain.h>
>>>>>  #include <linux/pm_runtime.h>
>>>>>  #include <linux/reset.h>
>>>>>  #include <linux/units.h>
>>>>> @@ -164,6 +165,17 @@ static int rzg2l_wdt_restart(struct watchdog_device *wdev,
>>>>>         struct rzg2l_wdt_priv *priv = watchdog_get_drvdata(wdev);
>>>>>         int ret;
>>>>>
>>>>> +       /*
>>>>> +        * The device may be part of a power domain that is currently
>>>>> +        * powered off. We need to power it up before accessing registers.
>>>>> +        * We don't undo the dev_pm_genpd_resume() as the device need to
>>>>> +        * be up for the reboot to happen. Also, as we are in atomic context
>>>>> +        * here there is no need to increment PM runtime usage counter
>>>>> +        * (to make sure pm_runtime_active() doesn't return wrong code).
>>>>> +        */
>>>>> +       if (!pm_runtime_active(wdev->parent))
>>>>> +               dev_pm_genpd_resume(wdev->parent);
>>>>> +
>>>>
>>>> I doubt this is the correct solution, but I may be wrong. Unless this
>>>> is invoked at the syscore stage?
>>>
>>> On my case I see it invoked from kernel_restart(). As of my code reading,
>>
>> With the above explanations, do you consider calling dev_pm_genpd_resume()
>> here is still wrong?
> 
> Yes. At least, those genpd functions were not added to cope for cases like this.

Sorry to bother you, do you have some suggestions on this topic?

On my side I did some investigation to see how else it could be implemented
but I don't have much clue how to go forward.

Would you prefer to have a separate API to deal with domain power on in
this scenario? Maybe one that should run only in the reboot context?

Would you consider only updating the description of  dev_pm_genpd_resume()
and genpd_sync_power_on() to specify that it could run in a reboot context?

Would you consider updating the genpd_switch_state() to take into system
reboot state and do locking based on that, too?


> 
> Moreover, you still need to find another solution as
> clk_prepare_enable() can't be called in this path.

The clock driver doesn't implement clk_ops::prepare in all
micro-architectures that this watchdog driver is used. This may be the
reason the clk_prepare_enable() was used on this path from the beginning.

Even though, a simple solution I have in mind for this is to keep the clk
prepared all the time.

Thank you,
Claudiu Beznea

> 
>>
>> Do you have any suggestions I could try?
> 
> Not at the moment, but I will try to circle back to this topic more
> thinking next week, when I have some more time.
> 
>>
>> Thank you,
>> Claudiu Beznea
> 
> Kind regards
> Uffe
> 
>>
>>> at that point only one CPU is active with IRQs disabled (done in
>>> machine_restart()). Below is the stack trace decoded on next-20240410 with
>>> this series
>>> (https://lore.kernel.org/all/20240410134044.2138310-1-claudiu.beznea.uj@bp.renesas.com/)
>>> on top and the one from here (adding power domain support):
>>> https://lore.kernel.org/all/20240410122657.2051132-1-claudiu.beznea.uj@bp.renesas.com/
>>>
>>> Hardware name: Renesas SMARC EVK version 2 based on r9a08g045s33 (DT)
>>> Call trace:
>>> dump_backtrace (arch/arm64/kernel/stacktrace.c:319)
>>> show_stack (arch/arm64/kernel/stacktrace.c:326)
>>> dump_stack_lvl (lib/dump_stack.c:117)
>>> dump_stack (lib/dump_stack.c:124)
>>> rzg2l_wdt_restart (drivers/watchdog/rzg2l_wdt.c:180)
>>> watchdog_restart_notifier (drivers/watchdog/watchdog_core.c:188)
>>> atomic_notifier_call_chain (kernel/notifier.c:98 kernel/notifier.c:231)
>>> do_kernel_restart (kernel/reboot.c:236)
>>> machine_restart (arch/arm64/kernel/process.c:145)
>>> kernel_restart (kernel/reboot.c:287)
>>> __do_sys_reboot (kernel/reboot.c:755)
>>> __arm64_sys_reboot (kernel/reboot.c:715)
>>> invoke_syscall (arch/arm64/include/asm/current.h:19
>>> arch/arm64/kernel/syscall.c:53)
>>> el0_svc_common.constprop.0 (include/linux/thread_info.h:127
>>> arch/arm64/kernel/syscall.c:141)
>>> do_el0_svc (arch/arm64/kernel/syscall.c:153)
>>> el0_svc (arch/arm64/include/asm/irqflags.h:56
>>> arch/arm64/include/asm/irqflags.h:77 arch/arm64/kernel/entry-common.c:165
>>> arch/arm64/kernel/entry-common.c:178 arch/arm64/kernel/entry-common.c:713)
>>> el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:731)
>>> el0t_64_sync (arch/arm64/kernel/entry.S:598)
>>>
>>> The watchdog restart handler is added in restart_handler_list and this list
>>> is invoked though do_kernel_restart(). As of my code investigation the
>>> restart_handler_list is invoked only though do_kernel_restart() and only
>>> though the stack trace above.
>>>
>>> Thank you,
>>> Claudiu Beznea
>>>
>>>>
>>>>>         clk_prepare_enable(priv->pclk);
>>>>>         clk_prepare_enable(priv->osc_clk);
>>>>>
>>>>> --
>>>>> 2.39.2
>>>>>
>>>>>
>>>>
>>>> Can you redirectly me to the complete series, so I can have a better
>>>> overview of the problem?
>>>
>>> This is the series that adds power domain support for RZ/G3S SoC:
>>> https://lore.kernel.org/all/20240410122657.2051132-1-claudiu.beznea.uj@bp.renesas.com/
>>>
>>> This is the series that adds watchdog support for RZ/G3S SoC:
>>> https://lore.kernel.org/all/20240410134044.2138310-1-claudiu.beznea.uj@bp.renesas.com/
>>>
>>> Thank you for your review,
>>> Claudiu Beznea
>>>
>>>>
>>>> Kind regards
>>>> Uffe
diff mbox series

Patch

diff --git a/drivers/watchdog/rzg2l_wdt.c b/drivers/watchdog/rzg2l_wdt.c
index c8c20cfb97a3..98e5e9914a5d 100644
--- a/drivers/watchdog/rzg2l_wdt.c
+++ b/drivers/watchdog/rzg2l_wdt.c
@@ -12,6 +12,7 @@ 
 #include <linux/module.h>
 #include <linux/of.h>
 #include <linux/platform_device.h>
+#include <linux/pm_domain.h>
 #include <linux/pm_runtime.h>
 #include <linux/reset.h>
 #include <linux/units.h>
@@ -164,6 +165,17 @@  static int rzg2l_wdt_restart(struct watchdog_device *wdev,
 	struct rzg2l_wdt_priv *priv = watchdog_get_drvdata(wdev);
 	int ret;
 
+	/*
+	 * The device may be part of a power domain that is currently
+	 * powered off. We need to power it up before accessing registers.
+	 * We don't undo the dev_pm_genpd_resume() as the device need to
+	 * be up for the reboot to happen. Also, as we are in atomic context
+	 * here there is no need to increment PM runtime usage counter
+	 * (to make sure pm_runtime_active() doesn't return wrong code).
+	 */
+	if (!pm_runtime_active(wdev->parent))
+		dev_pm_genpd_resume(wdev->parent);
+
 	clk_prepare_enable(priv->pclk);
 	clk_prepare_enable(priv->osc_clk);