diff mbox

ARM: hw_breakpoint: Enable debug powerdown only if system supports 'has_ossr'

Message ID 5146B972.1010002@ti.com (mailing list archive)
State New, archived
Headers show

Commit Message

Santosh Shilimkar March 18, 2013, 6:51 a.m. UTC
On Friday 15 March 2013 10:30 AM, Will Deacon wrote:
> On Thu, Mar 14, 2013 at 01:08:00PM +0530, Santosh Shilimkar wrote:
>> Will,
> 
> Hi guys,
> 
> I'm out of the office at the moment and have really terrible connectivity,
> so I can't do too much until next week. However, I don't think adding the
> has_ossr check is the right fix for this problem.
> 
>> On Wednesday 13 March 2013 05:59 PM, Lokesh Vutla wrote:
>>> Hi Dietmar,
>>> On Wednesday 13 March 2013 05:35 PM, Dietmar Eggemann wrote:
>>>> On 13/03/13 06:52, Lokesh Vutla wrote:
>>>>> Commit {9a6eb31 ARM: hw_breakpoint: Debug powerdown support for
>>>>> self-hosted
>>>>> debug} introduces debug powerdown support for self-hosted debug.
>>>>> While merging the patch 'has_ossr' check was removed which
>>>>> was needed for hardwares which doesn't support self-hosted debug.
>>>>> Pandaboard (A9) is one such hardware and Dietmar's orginial
>>>>> patch did mention this issue.
>>>>> Without that check on Panda with CPUIDLE enabled, a flood of
>>>>> below messages thrown.
>>>>>
>>>>> [ 3.597930] hw-breakpoint: CPU 0 failed to disable vector catch
>>>>> [ 3.597991] hw-breakpoint: CPU 1 failed to disable vector catch
> 
> Ok, so this means that we've taken an undefined instruction exception while
> trying to reset the debug registers on the PM_EXIT path. Now, the code there
> deals with CPUs that don't have the save/restore registers just fine, so
> that shouldn't have anything to do with this problem, particularly if the
> bit that is tripping us up is related to clearing vector catch.
>
Agree.
 
> Furthermore, I was under the impression that hw_breakpoint did actually
> work on panda, which implies that a cold boot *does* manage to reset the
> registers (can you please confirm this by looking in your dmesg during
> boot?). In that case, it seems as though a PM cycle is powering down a
> bunch of debug logic that was powered up during boot, and then we trip over
> because we can't access the register bank.
> 
Actually it seems to be without PM. Thanks to analysis from Lokesh, the issue
can be seen even with just suspend or cpu hotplug. So cold boot as such is
fine.

> The proper solution to this problem requires us to establish exactly what is
> turning off the debug registers, and then having an OMAP PM notifier to
> enable it again. Assuming this has always been the case, I expect hardware
> debug across PM fails silently with older kernels.
> 
This has been always the case it seems with CPU power cycle.
After the CPU is power cycled, 'DBGAUTHSTATUS' reads '0xaa' rather
than '0xaf' which means 'DBGEN = 0' and hence code fails to enable
monitor mode. This happens on both secure and GP devices and it can not
be patched since the secure code is ROM'ed. We didn't notice so far
because hw_breakpoint support was not default enabled on OMAP till the
multi-platform build.

>> I was also wondering whether we should just warn once rather
>> than continuous warnings in the notifier. Patch is end of the
>> email.
> 
> Could do, but I'd like to see a fix for the real issue before we simply hide
> the warnings :)
> 
Agree here too. As evident above, the feature won't work on OMAP4
devices with PM and hence some solution is needed.

What you think of below ?


From d74b4264f6a5967b0f7ada96aad77ab0ac30dbed Mon Sep 17 00:00:00 2001
From: Santosh Shilimkar <santosh.shilimkar@ti.com>
Date: Mon, 18 Mar 2013 11:59:04 +0530
Subject: [PATCH] ARM: hw_breakpoints: Check for CPU debug availability before
 enabling it

CPU debug features like hardware break, watchpoints can be used only when
the debug mode is enabled and available for non-secure mode.

Hence check 'DBGAUTHSTATUS.DBGEN' before proceeding to enable the
features.

Thanks to Will for pointers and Lokesh for the analysis of the issue on
OMAP4 where after a CPU power cycle, debug mode gets disabled.

Cc: Will Deacon <Will.Deacon@arm.com>

Tested-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
 arch/arm/kernel/hw_breakpoint.c |    8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Will Deacon March 18, 2013, 3:07 p.m. UTC | #1
Hi Santosh,

On Mon, Mar 18, 2013 at 06:51:30AM +0000, Santosh Shilimkar wrote:
> On Friday 15 March 2013 10:30 AM, Will Deacon wrote:
> > Furthermore, I was under the impression that hw_breakpoint did actually
> > work on panda, which implies that a cold boot *does* manage to reset the
> > registers (can you please confirm this by looking in your dmesg during
> > boot?). In that case, it seems as though a PM cycle is powering down a
> > bunch of debug logic that was powered up during boot, and then we trip over
> > because we can't access the register bank.
> > 
> Actually it seems to be without PM. Thanks to analysis from Lokesh, the issue
> can be seen even with just suspend or cpu hotplug. So cold boot as such is
> fine.

Right, so what you're saying is that monitor-mode hardware debug only works
until the first pm/suspend/hotplug operation, then it's dead in the water?

> > The proper solution to this problem requires us to establish exactly what is
> > turning off the debug registers, and then having an OMAP PM notifier to
> > enable it again. Assuming this has always been the case, I expect hardware
> > debug across PM fails silently with older kernels.
> > 
> This has been always the case it seems with CPU power cycle.
> After the CPU is power cycled, 'DBGAUTHSTATUS' reads '0xaa' rather
> than '0xaf' which means 'DBGEN = 0' and hence code fails to enable
> monitor mode. This happens on both secure and GP devices and it can not
> be patched since the secure code is ROM'ed. We didn't notice so far
> because hw_breakpoint support was not default enabled on OMAP till the
> multi-platform build.

That really sucks :( Does this affect all OMAP-based boards?

> >> I was also wondering whether we should just warn once rather
> >> than continuous warnings in the notifier. Patch is end of the
> >> email.
> > 
> > Could do, but I'd like to see a fix for the real issue before we simply hide
> > the warnings :)
> > 
> Agree here too. As evident above, the feature won't work on OMAP4
> devices with PM and hence some solution is needed.
> 
> What you think of below ?

Comments inline...

> 
> From d74b4264f6a5967b0f7ada96aad77ab0ac30dbed Mon Sep 17 00:00:00 2001
> From: Santosh Shilimkar <santosh.shilimkar@ti.com>
> Date: Mon, 18 Mar 2013 11:59:04 +0530
> Subject: [PATCH] ARM: hw_breakpoints: Check for CPU debug availability before
>  enabling it
> 
> CPU debug features like hardware break, watchpoints can be used only when
> the debug mode is enabled and available for non-secure mode.
> 
> Hence check 'DBGAUTHSTATUS.DBGEN' before proceeding to enable the
> features.
> 
> Thanks to Will for pointers and Lokesh for the analysis of the issue on
> OMAP4 where after a CPU power cycle, debug mode gets disabled.
> 
> Cc: Will Deacon <Will.Deacon@arm.com>
> 
> Tested-by: Lokesh Vutla <lokeshvutla@ti.com>
> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> ---
>  arch/arm/kernel/hw_breakpoint.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c
> index 96093b7..683a7cf 100644
> --- a/arch/arm/kernel/hw_breakpoint.c
> +++ b/arch/arm/kernel/hw_breakpoint.c
> @@ -930,6 +930,14 @@ static void reset_ctrl_regs(void *unused)
>  	int i, raw_num_brps, err = 0, cpu = smp_processor_id();
>  	u32 val;
>  
> +	/* Check if we have access to CPU debug features */
> +	ARM_DBG_READ(c7, c14, 6, val);
> +	if ((val & 0x1) == 0) {
> +		pr_warn_once("CPU %d debug is unavailable\n", cpu);
> +		cpumask_or(&debug_err_mask, &debug_err_mask, cpumask_of(cpu));
> +		return;
> +	}

There are a few of problems here:

	1. That is only checking for non-secure access, which precludes
	   running Linux in secure mode.

	2. DBGAUTHSTATUS accesses are UNPREDICTABLE when the double-lock is
	   set for v7.1 processors.

	3. DBGAUTHSTATUS doesn't exist in ARMv6.

	4. CPUs without the security extensions have DBGAUTHSTATUS.NSE == 0

	5. Accessing DBGAUTHSTATUS requires DBGEN to be driven high.
	   Assuming that your pr_warn_once is emitted, this implies that
	   DBGEN is driven high from cold boot, yet the NSE bit is low,
	   implying that DBGEN is also low. That's contradictory, so I have
	   no idea what's going on...

Apart from that, it's fine!

Will
Santosh Shilimkar March 18, 2013, 3:46 p.m. UTC | #2
On Monday 18 March 2013 08:37 PM, Will Deacon wrote:
> Hi Santosh,
> 
> On Mon, Mar 18, 2013 at 06:51:30AM +0000, Santosh Shilimkar wrote:
>> On Friday 15 March 2013 10:30 AM, Will Deacon wrote:
>>> Furthermore, I was under the impression that hw_breakpoint did actually
>>> work on panda, which implies that a cold boot *does* manage to reset the
>>> registers (can you please confirm this by looking in your dmesg during
>>> boot?). In that case, it seems as though a PM cycle is powering down a
>>> bunch of debug logic that was powered up during boot, and then we trip over
>>> because we can't access the register bank.
>>>
>> Actually it seems to be without PM. Thanks to analysis from Lokesh, the issue
>> can be seen even with just suspend or cpu hotplug. So cold boot as such is
>> fine.
> 
> Right, so what you're saying is that monitor-mode hardware debug only works
> until the first pm/suspend/hotplug operation, then it's dead in the water?
> 
>>> The proper solution to this problem requires us to establish exactly what is
>>> turning off the debug registers, and then having an OMAP PM notifier to
>>> enable it again. Assuming this has always been the case, I expect hardware
>>> debug across PM fails silently with older kernels.
>>>
>> This has been always the case it seems with CPU power cycle.
>> After the CPU is power cycled, 'DBGAUTHSTATUS' reads '0xaa' rather
>> than '0xaf' which means 'DBGEN = 0' and hence code fails to enable
>> monitor mode. This happens on both secure and GP devices and it can not
>> be patched since the secure code is ROM'ed. We didn't notice so far
>> because hw_breakpoint support was not default enabled on OMAP till the
>> multi-platform build.
> 
> That really sucks :( Does this affect all OMAP-based boards?
> 
All OMAP4 based boards..

>>>> I was also wondering whether we should just warn once rather
>>>> than continuous warnings in the notifier. Patch is end of the
>>>> email.
>>>
>>> Could do, but I'd like to see a fix for the real issue before we simply hide
>>> the warnings :)
>>>
>> Agree here too. As evident above, the feature won't work on OMAP4
>> devices with PM and hence some solution is needed.
>>
>> What you think of below ?
> 
> Comments inline...
> 
>>
>> From d74b4264f6a5967b0f7ada96aad77ab0ac30dbed Mon Sep 17 00:00:00 2001
>> From: Santosh Shilimkar <santosh.shilimkar@ti.com>
>> Date: Mon, 18 Mar 2013 11:59:04 +0530
>> Subject: [PATCH] ARM: hw_breakpoints: Check for CPU debug availability before
>>  enabling it
>>
>> CPU debug features like hardware break, watchpoints can be used only when
>> the debug mode is enabled and available for non-secure mode.
>>
>> Hence check 'DBGAUTHSTATUS.DBGEN' before proceeding to enable the
>> features.
>>
>> Thanks to Will for pointers and Lokesh for the analysis of the issue on
>> OMAP4 where after a CPU power cycle, debug mode gets disabled.
>>
>> Cc: Will Deacon <Will.Deacon@arm.com>
>>
>> Tested-by: Lokesh Vutla <lokeshvutla@ti.com>
>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
>> ---
>>  arch/arm/kernel/hw_breakpoint.c |    8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c
>> index 96093b7..683a7cf 100644
>> --- a/arch/arm/kernel/hw_breakpoint.c
>> +++ b/arch/arm/kernel/hw_breakpoint.c
>> @@ -930,6 +930,14 @@ static void reset_ctrl_regs(void *unused)
>>  	int i, raw_num_brps, err = 0, cpu = smp_processor_id();
>>  	u32 val;
>>  
>> +	/* Check if we have access to CPU debug features */
>> +	ARM_DBG_READ(c7, c14, 6, val);
>> +	if ((val & 0x1) == 0) {
>> +		pr_warn_once("CPU %d debug is unavailable\n", cpu);
>> +		cpumask_or(&debug_err_mask, &debug_err_mask, cpumask_of(cpu));
>> +		return;
>> +	}
> 
> There are a few of problems here:
> 
> 	1. That is only checking for non-secure access, which precludes
> 	   running Linux in secure mode.
> 
We can check bit 4 as well to take care linux running in secure mode.

> 	2. DBGAUTHSTATUS accesses are UNPREDICTABLE when the double-lock is
> 	   set for v7.1 processors.
>
> 	3. DBGAUTHSTATUS doesn't exist in ARMv6.
> 
We cans skip the code for these versions like...
	if (!ARM_DEBUG_ARCH_V7_ECP14 == get_debug_arch())
		goto skip_dbgsts_read;

> 	4. CPUs without the security extensions have DBGAUTHSTATUS.NSE == 0
>
Which bit is that ? I don't find this bit in Cortex-A9 docs. With all
these checks, may be a separate function like 'is_debug_available()'
would be better.
 
> 	5. Accessing DBGAUTHSTATUS requires DBGEN to be driven high.
> 	   Assuming that your pr_warn_once is emitted, this implies that
> 	   DBGEN is driven high from cold boot, yet the NSE bit is low,
> 	   implying that DBGEN is also low. That's contradictory, so I have
> 	   no idea what's going on...
>
Me neither. The warning is emitted at least.
 
> Apart from that, it's fine!
> 
If you agree, I can update patch accordingly.
BTW, do you have any better trick to take care of
above scenraio ?

Regards,
Santosh
Will Deacon March 18, 2013, 5:06 p.m. UTC | #3
On Mon, Mar 18, 2013 at 03:46:28PM +0000, Santosh Shilimkar wrote:
> On Monday 18 March 2013 08:37 PM, Will Deacon wrote:
> > That really sucks :( Does this affect all OMAP-based boards?
> > 
> All OMAP4 based boards..

Brilliant. Is there any way that the secure code can be fixed in future
products?

> >> +	/* Check if we have access to CPU debug features */
> >> +	ARM_DBG_READ(c7, c14, 6, val);
> >> +	if ((val & 0x1) == 0) {
> >> +		pr_warn_once("CPU %d debug is unavailable\n", cpu);
> >> +		cpumask_or(&debug_err_mask, &debug_err_mask, cpumask_of(cpu));
> >> +		return;
> >> +	}
> > 
> > There are a few of problems here:
> > 
> > 	1. That is only checking for non-secure access, which precludes
> > 	   running Linux in secure mode.
> > 
> We can check bit 4 as well to take care linux running in secure mode.

It still doesn't help unless we know whether Linux is running secure or
non-secure.

> > 	2. DBGAUTHSTATUS accesses are UNPREDICTABLE when the double-lock is
> > 	   set for v7.1 processors.
> >
> > 	3. DBGAUTHSTATUS doesn't exist in ARMv6.
> > 
> We cans skip the code for these versions like...
> 	if (!ARM_DEBUG_ARCH_V7_ECP14 == get_debug_arch())
> 		goto skip_dbgsts_read;

Sure, I was just pointing out that the code needs fixing for this.

> > 	4. CPUs without the security extensions have DBGAUTHSTATUS.NSE == 0
> >
> Which bit is that ? I don't find this bit in Cortex-A9 docs. With all
> these checks, may be a separate function like 'is_debug_available()'
> would be better.

NSE is bit 0 (the one you're checking).

>  
> > 	5. Accessing DBGAUTHSTATUS requires DBGEN to be driven high.
> > 	   Assuming that your pr_warn_once is emitted, this implies that
> > 	   DBGEN is driven high from cold boot, yet the NSE bit is low,
> > 	   implying that DBGEN is also low. That's contradictory, so I have
> > 	   no idea what's going on...
> >
> Me neither. The warning is emitted at least.

Any chance you could follow up with your firmware/hardware guys about this
please? I'd really like to understand how we end up in this state in case we
can do something in the architecture to stop it from happening in future.

> > Apart from that, it's fine!
> > 
> If you agree, I can update patch accordingly.
> BTW, do you have any better trick to take care of
> above scenraio ?

Well, we could just add the warn_once prints but that doesn't stop debug
from breaking after the first pm/suspend/hotplug operation.

Will
diff mbox

Patch

diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c
index 96093b7..683a7cf 100644
--- a/arch/arm/kernel/hw_breakpoint.c
+++ b/arch/arm/kernel/hw_breakpoint.c
@@ -930,6 +930,14 @@  static void reset_ctrl_regs(void *unused)
 	int i, raw_num_brps, err = 0, cpu = smp_processor_id();
 	u32 val;
 
+	/* Check if we have access to CPU debug features */
+	ARM_DBG_READ(c7, c14, 6, val);
+	if ((val & 0x1) == 0) {
+		pr_warn_once("CPU %d debug is unavailable\n", cpu);
+		cpumask_or(&debug_err_mask, &debug_err_mask, cpumask_of(cpu));
+		return;
+	}
+
 	/*
 	 * v7 debug contains save and restore registers so that debug state
 	 * can be maintained across low-power modes without leaving the debug