diff mbox

[v9,1/6] arm64: Add macros to manage processor debug state

Message ID CALicx6t7RSS_Zo9RGr76LMnNth1gQ7pukUtAmDY4gWRN5KGVKQ@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Vijay Kilari Feb. 20, 2014, 6:58 a.m. UTC
Hi Will,

On Wed, Feb 19, 2014 at 9:42 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Wed, Feb 19, 2014 at 04:03:59PM +0000, Catalin Marinas wrote:
>> On Wed, Feb 19, 2014 at 11:31:57AM +0000, Catalin Marinas wrote:
>> > BUG: scheduling while atomic: kworker/u8:0/6/0x00000002
>> > Modules linked in:
>> > CPU: 1 PID: 6 Comm: kworker/u8:0 Not tainted 3.14.0-rc3+ #306
>> > Workqueue: khelper __call_usermodehelper
>> > Call trace:
>> > [<ffffffc000087dbc>] dump_backtrace+0x0/0x12c
>> > [<ffffffc000087efc>] show_stack+0x14/0x1c
>> > [<ffffffc00043c224>] dump_stack+0x78/0xc4
>> > [<ffffffc000439b48>] __schedule_bug+0x40/0x54
>> > [<ffffffc00043d67c>] __schedule+0x514/0x604
>> > [<ffffffc00043d794>] schedule+0x28/0x78
>> > [<ffffffc00043cc90>] schedule_timeout+0x170/0x1bc
>> > [<ffffffc00043e16c>] wait_for_common+0xc0/0x14c
>> > [<ffffffc00043e280>] wait_for_completion_killable+0x14/0x28
>> > [<ffffffc0000942f8>] do_fork+0x158/0x2a8
>> > [<ffffffc000094478>] kernel_thread+0x30/0x38
>> > [<ffffffc0000a842c>] __call_usermodehelper+0x34/0xa8
>> > [<ffffffc0000ab300>] process_one_work+0x118/0x354
>> > [<ffffffc0000abfcc>] worker_thread+0x13c/0x3c0
>> > [<ffffffc0000b1e84>] kthread+0xd4/0xe8
>> >
>> >
>> > It gets much worse if I run with two CPUs and CONFIG_KGDB_KDB enabled
>> > (but fine with a single CPU).
>> >
>> > So no need to post another series for now but please check the multi-CPU
>> > case as well and send a separate fix. I'll dig a bit on my side as well.
>>
>> So far I'm done with the investigation. It looks to me like one of the
>> kgdb tests, kgdb core or the arm64 back-end (or maybe more than one) is
>> not SMP safe. The errors either appear or disappear based on the printks
>> I put through the kgdb test or other config options which I enable.
>>
>> Could you please look into making the kgdb back-end SMP-safe?
>
> There are certainly potential SMP problems in the back-end, which I asked
> about in the initial series:
>
>   http://lkml.kernel.org/r/CALicx6v1eGHRwWPrjzihzBZxCu8t1vpMoq-YfutSm4mRmP6gEQ@mail.gmail.com
>
> The reply from Vijay suggested that everything is confined to a single CPU.
>

  This issue is seen only when kgdb test is triggered on secondary cpu's.
when executed on cpu0 issue is not seen. Though cpu0 triggers kgdb tests
the do_fork test might trigger debug exception on other cpu's as well,
which fails.

For the patch 1 you suggested me to call local_dbg_enable() from
clear_os_mask() function
as below

static void clear_os_lock(void *unused)
{
        asm volatile("msr oslar_el1, %0" : : "r" (0));
        isb();
        local_dbg_enable();
}

This is an SMP call. So when more than one core is enabled, this
local_dbg_enable() is called
from SMP call context, which is el1_irq context for secondary cpu's.
So PSTATE.D flag is unmasked
only in irq context but not in normal context.

For CPU0 this clear_os_lock() is not called from smp call. it is
directly called in debug_monitors_init()
call. So PSTATE.D is unmasked for CPU0 and hence kgdb tests passed
when triggered only on
cpu0.

static int debug_monitors_init(void)
{
        /* Clear the OS lock. */
        smp_call_function(clear_os_lock, NULL, 1);
        clear_os_lock(NULL);

        /* Register hotplug handler. */
        register_cpu_notifier(&os_lock_nb);
        return 0;
}

When I made below patch it works. I have tested with 4 cores on foundation model

kgdbts:RUN plant and detach test
kgdbts:RUN sw breakpoint test
kgdbts:RUN bad memory access test
kgdbts:RUN singlestep test 1000 iterations
kgdbts:RUN singlestep [0/1000]
kgdbts:RUN singlestep [100/1000]
kgdbts:RUN singlestep [200/1000]
kgdbts:RUN singlestep [300/1000]
kgdbts:RUN singlestep [400/1000]
kgdbts:RUN singlestep [500/1000]
kgdbts:RUN singlestep [600/1000]
kgdbts:RUN singlestep [700/1000]
kgdbts:RUN singlestep [800/1000]
kgdbts:RUN singlestep [900/1000]
kgdbts:RUN do_fork for 100 breakpoints
smc91x: not found (-19).
mousedev: PS/2 mouse device common for all mice
TCP: cubic registered
NET: Registered protocol family 17
drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
EXT4-fs (vda2): recovery complete
EXT4-fs (vda2): mounted filesystem with ordered data mode. Opts: (null)
VFS: Mounted root (ext4 filesystem) on device 254:2.
Freeing unused kernel memory: 256K (ffffffc0005d1000 - ffffffc000611000)
udevd[476]: starting version 182
EXT4-fs (vda2): re-mounted. Opts: data=ordered
random: dd urandom read with 11 bits of entropy available
rpcbind: cannot create socket for udp6
rpcbind: cannot create socket for tcp6
rpcbind: cannot get uid of '': Success

Last login: Mon Jan 27 08:00:01 UTC 2014 on tty1
root@genericarmv8:~# kgdb: Unregistered I/O driver kgdbts, debugger disabled.

root@genericarmv8:~# uname -a
Linux genericarmv8 3.14.0-rc3+ #18 SMP PREEMPT Thu Feb 20 11:53:17 IST
2014 aarch64 GNU/Linux
root@genericarmv8:~# cat /proc/cpuinfo
Processor       : AArch64 Processor rev 0 (aarch64)
processor       : 0
processor       : 1
processor       : 2
processor       : 3
Features        : fp asimd evtstrm
CPU implementer : 0x41
CPU architecture: AArch64
CPU variant     : 0x0
CPU part        : 0xd00
CPU revision    : 0

Hardware        : Foundation-v8A
root@genericarmv8:~#

- Vijay

> Will

Comments

Will Deacon Feb. 20, 2014, 11:42 a.m. UTC | #1
On Thu, Feb 20, 2014 at 06:58:25AM +0000, Vijay Kilari wrote:
> Hi Will,

Hi Vijay,

> On Wed, Feb 19, 2014 at 9:42 PM, Will Deacon <will.deacon@arm.com> wrote:
> > The reply from Vijay suggested that everything is confined to a single CPU.
> 
>   This issue is seen only when kgdb test is triggered on secondary cpu's.
> when executed on cpu0 issue is not seen. Though cpu0 triggers kgdb tests
> the do_fork test might trigger debug exception on other cpu's as well,
> which fails.
> 
> For the patch 1 you suggested me to call local_dbg_enable() from
> clear_os_mask() function
> as below
> 
> static void clear_os_lock(void *unused)
> {
>         asm volatile("msr oslar_el1, %0" : : "r" (0));
>         isb();
>         local_dbg_enable();
> }
> 
> This is an SMP call. So when more than one core is enabled, this
> local_dbg_enable() is called
> from SMP call context, which is el1_irq context for secondary cpu's.
> So PSTATE.D flag is unmasked
> only in irq context but not in normal context.

Aha, well spotted!

> For CPU0 this clear_os_lock() is not called from smp call. it is
> directly called in debug_monitors_init()
> call. So PSTATE.D is unmasked for CPU0 and hence kgdb tests passed
> when triggered only on
> cpu0.
> 
> static int debug_monitors_init(void)
> {
>         /* Clear the OS lock. */
>         smp_call_function(clear_os_lock, NULL, 1);
>         clear_os_lock(NULL);
> 
>         /* Register hotplug handler. */
>         register_cpu_notifier(&os_lock_nb);
>         return 0;
> }
> 
> When I made below patch it works. I have tested with 4 cores on foundation model

The main change here is that we enable debug exceptions before we unlock the
os lock. That should be fine, since everything apart from software
breakpoint exceptions are masked when the lock is locked.

> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -160,6 +160,8 @@ asmlinkage void secondary_start_kernel(void)
>         set_cpu_online(cpu, true);
>         complete(&cpu_running);
> 
> +       local_dbg_enable();
>         local_irq_enable();
>         local_async_enable();

The only thing to add then moving the isb(); local_dbg_enable() out of
clear_os_lock and into debug_monitors_init. You can probably make the
smp_call_function an on_each_cpu too.

Does that make sense?

Will
Catalin Marinas Feb. 20, 2014, 11:52 a.m. UTC | #2
On Thu, Feb 20, 2014 at 06:58:25AM +0000, Vijay Kilari wrote:
> When I made below patch it works. I have tested with 4 cores on foundation model
> 
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -160,6 +160,8 @@ asmlinkage void secondary_start_kernel(void)
>         set_cpu_online(cpu, true);
>         complete(&cpu_running);
> 
> +       local_dbg_enable();
>         local_irq_enable();
>         local_async_enable();

I tested this as well and seems to work fine. I'll let Will comment on
whether this fix is enough and if yes, I'll fold it into one of your
patches (or push it on top of them).
diff mbox

Patch

--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -160,6 +160,8 @@  asmlinkage void secondary_start_kernel(void)
        set_cpu_online(cpu, true);
        complete(&cpu_running);

+       local_dbg_enable();
        local_irq_enable();
        local_async_enable();

kgdb: Registered I/O driver kgdbts.