diff mbox series

coresight: cti: Fix hang in cti_disable_hw()

Message ID 20221005131452.1506328-1-james.clark@arm.com (mailing list archive)
State New, archived
Headers show
Series coresight: cti: Fix hang in cti_disable_hw() | expand

Commit Message

James Clark Oct. 5, 2022, 1:14 p.m. UTC
cti_enable_hw() and cti_disable_hw() are called from an atomic context
so shouldn't use runtime PM because it can result in a sleep when
communicating with firmware.

Since commit 3c6656337852 ("Revert "firmware: arm_scmi: Add clock
management to the SCMI power domain""), this causes a hang on Juno when
running the Perf Coresight tests or running this command:

  perf record -e cs_etm//u -- ls

This was also missed until the revert commit because pm_runtime_put()
was called with the wrong device until commit 692c9a499b28 ("coresight:
cti: Correct the parameter for pm_runtime_put")

With lock and scheduler debugging enabled the following is output:

   coresight cti_sys0: cti_enable_hw -- dev:cti_sys0  parent: 20020000.cti
   BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:1151
   in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 330, name: perf-exec
   preempt_count: 2, expected: 0
   RCU nest depth: 0, expected: 0
   INFO: lockdep is turned off.
   irq event stamp: 0
   hardirqs last  enabled at (0): [<0000000000000000>] 0x0
   hardirqs last disabled at (0): [<ffff80000822b394>] copy_process+0xa0c/0x1948
   softirqs last  enabled at (0): [<ffff80000822b394>] copy_process+0xa0c/0x1948
   softirqs last disabled at (0): [<0000000000000000>] 0x0
   CPU: 3 PID: 330 Comm: perf-exec Not tainted 6.0.0-00053-g042116d99298 #7
   Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Sep 13 2022
   Call trace:
    dump_backtrace+0x134/0x140
    show_stack+0x20/0x58
    dump_stack_lvl+0x8c/0xb8
    dump_stack+0x18/0x34
    __might_resched+0x180/0x228
    __might_sleep+0x50/0x88
    __pm_runtime_resume+0xac/0xb0
    cti_enable+0x44/0x120
    coresight_control_assoc_ectdev+0xc0/0x150
    coresight_enable_path+0xb4/0x288
    etm_event_start+0x138/0x170
    etm_event_add+0x48/0x70
    event_sched_in.isra.122+0xb4/0x280
    merge_sched_in+0x1fc/0x3d0
    visit_groups_merge.constprop.137+0x16c/0x4b0
    ctx_sched_in+0x114/0x1f0
    perf_event_sched_in+0x60/0x90
    ctx_resched+0x68/0xb0
    perf_event_exec+0x138/0x508
    begin_new_exec+0x52c/0xd40
    load_elf_binary+0x6b8/0x17d0
    bprm_execve+0x360/0x7f8
    do_execveat_common.isra.47+0x218/0x238
    __arm64_sys_execve+0x48/0x60
    invoke_syscall+0x4c/0x110
    el0_svc_common.constprop.4+0xfc/0x120
    do_el0_svc+0x34/0xc0
    el0_svc+0x40/0x98
    el0t_64_sync_handler+0x98/0xc0
    el0t_64_sync+0x170/0x174

Fix the issue by removing the runtime PM calls completely. They are not
needed here because it must have already been done when building the
path for a trace.

Fixes: 835d722ba10a ("coresight: cti: Initial CoreSight CTI Driver")
Reported-by: Aishwarya TCV <Aishwarya.TCV@arm.com>
Reported-by: Cristian Marussi <Cristian.Marussi@arm.com>
Signed-off-by: Suzuki Poulose <Suzuki.Poulose@arm.com>
Signed-off-by: James Clark <james.clark@arm.com>
---
 drivers/hwtracing/coresight/coresight-cti-core.c | 3 ---
 1 file changed, 3 deletions(-)

Comments

James Clark Oct. 5, 2022, 1:24 p.m. UTC | #1
On 05/10/2022 14:14, James Clark wrote:
> cti_enable_hw() and cti_disable_hw() are called from an atomic context
> so shouldn't use runtime PM because it can result in a sleep when
> communicating with firmware.

Hi Mike,

Are you able to test this on a Dragonboard please? I've only tested on
Juno so far.

Thanks
James

> 
> Since commit 3c6656337852 ("Revert "firmware: arm_scmi: Add clock
> management to the SCMI power domain""), this causes a hang on Juno when
> running the Perf Coresight tests or running this command:
> 
>   perf record -e cs_etm//u -- ls
> 
> This was also missed until the revert commit because pm_runtime_put()
> was called with the wrong device until commit 692c9a499b28 ("coresight:
> cti: Correct the parameter for pm_runtime_put")
> 
> With lock and scheduler debugging enabled the following is output:
> 
>    coresight cti_sys0: cti_enable_hw -- dev:cti_sys0  parent: 20020000.cti
>    BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:1151
>    in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 330, name: perf-exec
>    preempt_count: 2, expected: 0
>    RCU nest depth: 0, expected: 0
>    INFO: lockdep is turned off.
>    irq event stamp: 0
>    hardirqs last  enabled at (0): [<0000000000000000>] 0x0
>    hardirqs last disabled at (0): [<ffff80000822b394>] copy_process+0xa0c/0x1948
>    softirqs last  enabled at (0): [<ffff80000822b394>] copy_process+0xa0c/0x1948
>    softirqs last disabled at (0): [<0000000000000000>] 0x0
>    CPU: 3 PID: 330 Comm: perf-exec Not tainted 6.0.0-00053-g042116d99298 #7
>    Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Sep 13 2022
>    Call trace:
>     dump_backtrace+0x134/0x140
>     show_stack+0x20/0x58
>     dump_stack_lvl+0x8c/0xb8
>     dump_stack+0x18/0x34
>     __might_resched+0x180/0x228
>     __might_sleep+0x50/0x88
>     __pm_runtime_resume+0xac/0xb0
>     cti_enable+0x44/0x120
>     coresight_control_assoc_ectdev+0xc0/0x150
>     coresight_enable_path+0xb4/0x288
>     etm_event_start+0x138/0x170
>     etm_event_add+0x48/0x70
>     event_sched_in.isra.122+0xb4/0x280
>     merge_sched_in+0x1fc/0x3d0
>     visit_groups_merge.constprop.137+0x16c/0x4b0
>     ctx_sched_in+0x114/0x1f0
>     perf_event_sched_in+0x60/0x90
>     ctx_resched+0x68/0xb0
>     perf_event_exec+0x138/0x508
>     begin_new_exec+0x52c/0xd40
>     load_elf_binary+0x6b8/0x17d0
>     bprm_execve+0x360/0x7f8
>     do_execveat_common.isra.47+0x218/0x238
>     __arm64_sys_execve+0x48/0x60
>     invoke_syscall+0x4c/0x110
>     el0_svc_common.constprop.4+0xfc/0x120
>     do_el0_svc+0x34/0xc0
>     el0_svc+0x40/0x98
>     el0t_64_sync_handler+0x98/0xc0
>     el0t_64_sync+0x170/0x174
> 
> Fix the issue by removing the runtime PM calls completely. They are not
> needed here because it must have already been done when building the
> path for a trace.
> 
> Fixes: 835d722ba10a ("coresight: cti: Initial CoreSight CTI Driver")
> Reported-by: Aishwarya TCV <Aishwarya.TCV@arm.com>
> Reported-by: Cristian Marussi <Cristian.Marussi@arm.com>
> Signed-off-by: Suzuki Poulose <Suzuki.Poulose@arm.com>
> Signed-off-by: James Clark <james.clark@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-cti-core.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-cti-core.c b/drivers/hwtracing/coresight/coresight-cti-core.c
> index 8988b2ed2ea6..c5f7fc4e2552 100644
> --- a/drivers/hwtracing/coresight/coresight-cti-core.c
> +++ b/drivers/hwtracing/coresight/coresight-cti-core.c
> @@ -94,7 +94,6 @@ static int cti_enable_hw(struct cti_drvdata *drvdata)
>  	unsigned long flags;
>  	int rc = 0;
>  
> -	pm_runtime_get_sync(dev->parent);
>  	spin_lock_irqsave(&drvdata->spinlock, flags);
>  
>  	/* no need to do anything if enabled or unpowered*/
> @@ -119,7 +118,6 @@ static int cti_enable_hw(struct cti_drvdata *drvdata)
>  	/* cannot enable due to error */
>  cti_err_not_enabled:
>  	spin_unlock_irqrestore(&drvdata->spinlock, flags);
> -	pm_runtime_put(dev->parent);
>  	return rc;
>  }
>  
> @@ -175,7 +173,6 @@ static int cti_disable_hw(struct cti_drvdata *drvdata)
>  	coresight_disclaim_device_unlocked(csdev);
>  	CS_LOCK(drvdata->base);
>  	spin_unlock(&drvdata->spinlock);
> -	pm_runtime_put(dev->parent);
>  	return 0;
>  
>  	/* not disabled this call */
Mike Leach Oct. 19, 2022, 1:53 p.m. UTC | #2
Tested on DB410c + kernel 6.0.0

Reviewed-by: Mike Leach <mike.leach@linaro.org>
Tested-by: Mike Leach <mike.leach@linaro.org>

On Wed, 5 Oct 2022 at 14:16, James Clark <james.clark@arm.com> wrote:
>
> cti_enable_hw() and cti_disable_hw() are called from an atomic context
> so shouldn't use runtime PM because it can result in a sleep when
> communicating with firmware.
>
> Since commit 3c6656337852 ("Revert "firmware: arm_scmi: Add clock
> management to the SCMI power domain""), this causes a hang on Juno when
> running the Perf Coresight tests or running this command:
>
>   perf record -e cs_etm//u -- ls
>
> This was also missed until the revert commit because pm_runtime_put()
> was called with the wrong device until commit 692c9a499b28 ("coresight:
> cti: Correct the parameter for pm_runtime_put")
>
> With lock and scheduler debugging enabled the following is output:
>
>    coresight cti_sys0: cti_enable_hw -- dev:cti_sys0  parent: 20020000.cti
>    BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:1151
>    in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 330, name: perf-exec
>    preempt_count: 2, expected: 0
>    RCU nest depth: 0, expected: 0
>    INFO: lockdep is turned off.
>    irq event stamp: 0
>    hardirqs last  enabled at (0): [<0000000000000000>] 0x0
>    hardirqs last disabled at (0): [<ffff80000822b394>] copy_process+0xa0c/0x1948
>    softirqs last  enabled at (0): [<ffff80000822b394>] copy_process+0xa0c/0x1948
>    softirqs last disabled at (0): [<0000000000000000>] 0x0
>    CPU: 3 PID: 330 Comm: perf-exec Not tainted 6.0.0-00053-g042116d99298 #7
>    Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Sep 13 2022
>    Call trace:
>     dump_backtrace+0x134/0x140
>     show_stack+0x20/0x58
>     dump_stack_lvl+0x8c/0xb8
>     dump_stack+0x18/0x34
>     __might_resched+0x180/0x228
>     __might_sleep+0x50/0x88
>     __pm_runtime_resume+0xac/0xb0
>     cti_enable+0x44/0x120
>     coresight_control_assoc_ectdev+0xc0/0x150
>     coresight_enable_path+0xb4/0x288
>     etm_event_start+0x138/0x170
>     etm_event_add+0x48/0x70
>     event_sched_in.isra.122+0xb4/0x280
>     merge_sched_in+0x1fc/0x3d0
>     visit_groups_merge.constprop.137+0x16c/0x4b0
>     ctx_sched_in+0x114/0x1f0
>     perf_event_sched_in+0x60/0x90
>     ctx_resched+0x68/0xb0
>     perf_event_exec+0x138/0x508
>     begin_new_exec+0x52c/0xd40
>     load_elf_binary+0x6b8/0x17d0
>     bprm_execve+0x360/0x7f8
>     do_execveat_common.isra.47+0x218/0x238
>     __arm64_sys_execve+0x48/0x60
>     invoke_syscall+0x4c/0x110
>     el0_svc_common.constprop.4+0xfc/0x120
>     do_el0_svc+0x34/0xc0
>     el0_svc+0x40/0x98
>     el0t_64_sync_handler+0x98/0xc0
>     el0t_64_sync+0x170/0x174
>
> Fix the issue by removing the runtime PM calls completely. They are not
> needed here because it must have already been done when building the
> path for a trace.
>
> Fixes: 835d722ba10a ("coresight: cti: Initial CoreSight CTI Driver")
> Reported-by: Aishwarya TCV <Aishwarya.TCV@arm.com>
> Reported-by: Cristian Marussi <Cristian.Marussi@arm.com>
> Signed-off-by: Suzuki Poulose <Suzuki.Poulose@arm.com>
> Signed-off-by: James Clark <james.clark@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-cti-core.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-cti-core.c b/drivers/hwtracing/coresight/coresight-cti-core.c
> index 8988b2ed2ea6..c5f7fc4e2552 100644
> --- a/drivers/hwtracing/coresight/coresight-cti-core.c
> +++ b/drivers/hwtracing/coresight/coresight-cti-core.c
> @@ -94,7 +94,6 @@ static int cti_enable_hw(struct cti_drvdata *drvdata)
>         unsigned long flags;
>         int rc = 0;
>
> -       pm_runtime_get_sync(dev->parent);
>         spin_lock_irqsave(&drvdata->spinlock, flags);
>
>         /* no need to do anything if enabled or unpowered*/
> @@ -119,7 +118,6 @@ static int cti_enable_hw(struct cti_drvdata *drvdata)
>         /* cannot enable due to error */
>  cti_err_not_enabled:
>         spin_unlock_irqrestore(&drvdata->spinlock, flags);
> -       pm_runtime_put(dev->parent);
>         return rc;
>  }
>
> @@ -175,7 +173,6 @@ static int cti_disable_hw(struct cti_drvdata *drvdata)
>         coresight_disclaim_device_unlocked(csdev);
>         CS_LOCK(drvdata->base);
>         spin_unlock(&drvdata->spinlock);
> -       pm_runtime_put(dev->parent);
>         return 0;
>
>         /* not disabled this call */
> --
> 2.28.0
>
Suzuki K Poulose Oct. 19, 2022, 2:51 p.m. UTC | #3
Hi James

On 05/10/2022 14:14, James Clark wrote:
> cti_enable_hw() and cti_disable_hw() are called from an atomic context
> so shouldn't use runtime PM because it can result in a sleep when
> communicating with firmware.
> 
> Since commit 3c6656337852 ("Revert "firmware: arm_scmi: Add clock
> management to the SCMI power domain""), this causes a hang on Juno when
> running the Perf Coresight tests or running this command:
> 
>    perf record -e cs_etm//u -- ls
> 
> This was also missed until the revert commit because pm_runtime_put()
> was called with the wrong device until commit 692c9a499b28 ("coresight:
> cti: Correct the parameter for pm_runtime_put")
> 
> With lock and scheduler debugging enabled the following is output:
> 

...

> 
> Fix the issue by removing the runtime PM calls completely. They are not
> needed here because it must have already been done when building the
> path for a trace.
> 
> Fixes: 835d722ba10a ("coresight: cti: Initial CoreSight CTI Driver")
> Reported-by: Aishwarya TCV <Aishwarya.TCV@arm.com>
> Reported-by: Cristian Marussi <Cristian.Marussi@arm.com>
> Signed-off-by: Suzuki Poulose <Suzuki.Poulose@arm.com>

I have applied this patch, changing the above to:

Suggested-by: ...

Suzuki
diff mbox series

Patch

diff --git a/drivers/hwtracing/coresight/coresight-cti-core.c b/drivers/hwtracing/coresight/coresight-cti-core.c
index 8988b2ed2ea6..c5f7fc4e2552 100644
--- a/drivers/hwtracing/coresight/coresight-cti-core.c
+++ b/drivers/hwtracing/coresight/coresight-cti-core.c
@@ -94,7 +94,6 @@  static int cti_enable_hw(struct cti_drvdata *drvdata)
 	unsigned long flags;
 	int rc = 0;
 
-	pm_runtime_get_sync(dev->parent);
 	spin_lock_irqsave(&drvdata->spinlock, flags);
 
 	/* no need to do anything if enabled or unpowered*/
@@ -119,7 +118,6 @@  static int cti_enable_hw(struct cti_drvdata *drvdata)
 	/* cannot enable due to error */
 cti_err_not_enabled:
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
-	pm_runtime_put(dev->parent);
 	return rc;
 }
 
@@ -175,7 +173,6 @@  static int cti_disable_hw(struct cti_drvdata *drvdata)
 	coresight_disclaim_device_unlocked(csdev);
 	CS_LOCK(drvdata->base);
 	spin_unlock(&drvdata->spinlock);
-	pm_runtime_put(dev->parent);
 	return 0;
 
 	/* not disabled this call */