diff mbox series

[v4,9/9] riscv: Fixup lockdep_assert_held(&text_mutex) in patch_insn_write

Message ID 1602918377-23573-10-git-send-email-guoren@kernel.org (mailing list archive)
State New, archived
Headers show
Series Add k/uprobe & fentry & error_injection supported | expand

Commit Message

Guo Ren Oct. 17, 2020, 7:06 a.m. UTC
From: Guo Ren <guoren@linux.alibaba.com>

It will cause warning messages:
echo function_graph > /sys/kernel/debug/tracing/current_tracer
[   47.691397] ------------[ cut here ]------------
[   47.692899] WARNING: CPU: 0 PID: 11 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x182/0x19a
[   47.694483] Modules linked in:
[   47.695754] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.9.0-11367-g1054335 #132
[   47.698228] epc: ffffffe000204530 ra : ffffffe00020452c sp : ffffffe0023ffc20 gp : ffffffe0013e1fe0 tp : ffffffe0023e4e00 t0 : 0000000000000000
[   47.701872]  t1 : 000000000000000e t2 : 000000000000001b s0 : ffffffe0023ffc70 s1 : ffffffe000206850 a0 : 0000000000000000 a1 : ffffffffffffffff
[   47.705550]  a2 : 0000000000000000 a3 : ffffffe03af7c5e8 a4 : 0000000000000000 a5 : 0000000000000000 a6 : 0000000000000000 a7 : 00000000150b02d8
[   47.709159]  s2 : 0000000000000008 s3 : 0000000000000858 s4 : ffffffe0023ffc98 s5 : 0000000000000850 s6 : 0000000000000003 s7 : 0000000000000002
[   47.714135]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000001 s11: 0000000000000003 t3 : ffffffffffffe000 t4 : 0000000000d86254
[   47.716574]  t5 : 0000000000000005 t6 : 0000000000040000status: 0000000000000100 badaddr: 0000000000000000 cause: 0000000000000003
[   47.720019] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.9.0-11367-g1054335 #132
[   47.722074] Call Trace:
[   47.722561] [<ffffffe000203c88>] walk_stackframe+0x0/0xc2
[   47.724608] [<ffffffe000b2020a>] show_stack+0x46/0x52
[   47.726246] [<ffffffe000b23c24>] dump_stack+0x90/0xb6
[   47.727672] [<ffffffe00020c9c4>] __warn+0x98/0xfa
[   47.729131] [<ffffffe0005f4be0>] report_bug+0xaa/0x11e
[   47.730624] [<ffffffe000203b74>] do_trap_break+0x96/0xfe
[   47.732448] [<ffffffe000201daa>] ret_from_exception+0x0/0x14
[   47.734341] [<ffffffe00020452c>] patch_insn_write+0x17e/0x19a
[   47.737235] irq event stamp: 39
[   47.738521] hardirqs last  enabled at (39): [<ffffffe000201d0a>] _save_context+0xa2/0xe6
[   47.741055] hardirqs last disabled at (38): [<ffffffe0002c2404>] multi_cpu_stop+0x130/0x166
[   47.743551] softirqs last  enabled at (0): [<ffffffe00020a4d2>] copy_process+0x430/0x1316
[   47.746031] softirqs last disabled at (0): [<0000000000000000>] 0x0
[   47.748617] ---[ end trace 88a1054faa6524ef ]---

Because the path of stop_machine(__ftrace_modify_code)->
ftrace_modify_all_code->...->patch_insn_write has no pair of
lock&unlock text_mutex, so we shouldn't put assert here.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
---
 arch/riscv/kernel/patch.c | 7 -------
 1 file changed, 7 deletions(-)

Comments

Masami Hiramatsu (Google) Oct. 19, 2020, 8:36 a.m. UTC | #1
On Sat, 17 Oct 2020 07:06:17 +0000
guoren@kernel.org wrote:

> From: Guo Ren <guoren@linux.alibaba.com>
> 
> It will cause warning messages:
> echo function_graph > /sys/kernel/debug/tracing/current_tracer
> [   47.691397] ------------[ cut here ]------------
> [   47.692899] WARNING: CPU: 0 PID: 11 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x182/0x19a
> [   47.694483] Modules linked in:
> [   47.695754] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.9.0-11367-g1054335 #132
> [   47.698228] epc: ffffffe000204530 ra : ffffffe00020452c sp : ffffffe0023ffc20 gp : ffffffe0013e1fe0 tp : ffffffe0023e4e00 t0 : 0000000000000000
> [   47.701872]  t1 : 000000000000000e t2 : 000000000000001b s0 : ffffffe0023ffc70 s1 : ffffffe000206850 a0 : 0000000000000000 a1 : ffffffffffffffff
> [   47.705550]  a2 : 0000000000000000 a3 : ffffffe03af7c5e8 a4 : 0000000000000000 a5 : 0000000000000000 a6 : 0000000000000000 a7 : 00000000150b02d8
> [   47.709159]  s2 : 0000000000000008 s3 : 0000000000000858 s4 : ffffffe0023ffc98 s5 : 0000000000000850 s6 : 0000000000000003 s7 : 0000000000000002
> [   47.714135]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000001 s11: 0000000000000003 t3 : ffffffffffffe000 t4 : 0000000000d86254
> [   47.716574]  t5 : 0000000000000005 t6 : 0000000000040000status: 0000000000000100 badaddr: 0000000000000000 cause: 0000000000000003
> [   47.720019] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.9.0-11367-g1054335 #132
> [   47.722074] Call Trace:
> [   47.722561] [<ffffffe000203c88>] walk_stackframe+0x0/0xc2
> [   47.724608] [<ffffffe000b2020a>] show_stack+0x46/0x52
> [   47.726246] [<ffffffe000b23c24>] dump_stack+0x90/0xb6
> [   47.727672] [<ffffffe00020c9c4>] __warn+0x98/0xfa
> [   47.729131] [<ffffffe0005f4be0>] report_bug+0xaa/0x11e
> [   47.730624] [<ffffffe000203b74>] do_trap_break+0x96/0xfe
> [   47.732448] [<ffffffe000201daa>] ret_from_exception+0x0/0x14
> [   47.734341] [<ffffffe00020452c>] patch_insn_write+0x17e/0x19a
> [   47.737235] irq event stamp: 39
> [   47.738521] hardirqs last  enabled at (39): [<ffffffe000201d0a>] _save_context+0xa2/0xe6
> [   47.741055] hardirqs last disabled at (38): [<ffffffe0002c2404>] multi_cpu_stop+0x130/0x166
> [   47.743551] softirqs last  enabled at (0): [<ffffffe00020a4d2>] copy_process+0x430/0x1316
> [   47.746031] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [   47.748617] ---[ end trace 88a1054faa6524ef ]---
> 
> Because the path of stop_machine(__ftrace_modify_code)->
> ftrace_modify_all_code->...->patch_insn_write has no pair of
> lock&unlock text_mutex, so we shouldn't put assert here.

Hm, are there any chance to get the text_mutex in that path?
There are ftrace_arch_code_modify_prepare/post_process() in arch/riscv/kernel/ftrace.c,
aren't those used?

Thank you,

> 
> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> ---
>  arch/riscv/kernel/patch.c | 7 -------
>  1 file changed, 7 deletions(-)
> 
> diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
> index 0b55287..2ee5063 100644
> --- a/arch/riscv/kernel/patch.c
> +++ b/arch/riscv/kernel/patch.c
> @@ -55,13 +55,6 @@ static int patch_insn_write(void *addr, const void *insn, size_t len)
>  	bool across_pages = (((uintptr_t) addr & ~PAGE_MASK) + len) > PAGE_SIZE;
>  	int ret;
>  
> -	/*
> -	 * Before reaching here, it was expected to lock the text_mutex
> -	 * already, so we don't need to give another lock here and could
> -	 * ensure that it was safe between each cores.
> -	 */
> -	lockdep_assert_held(&text_mutex);
> -
>  	if (across_pages)
>  		patch_map(addr + len, FIX_TEXT_POKE1);
>  
> -- 
> 2.7.4
>
Guo Ren Oct. 19, 2020, 11:03 a.m. UTC | #2
Good Idea! I'll try :P

On Mon, Oct 19, 2020 at 4:36 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Sat, 17 Oct 2020 07:06:17 +0000
> guoren@kernel.org wrote:
>
> > From: Guo Ren <guoren@linux.alibaba.com>
> >
> > It will cause warning messages:
> > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > [   47.691397] ------------[ cut here ]------------
> > [   47.692899] WARNING: CPU: 0 PID: 11 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x182/0x19a
> > [   47.694483] Modules linked in:
> > [   47.695754] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.9.0-11367-g1054335 #132
> > [   47.698228] epc: ffffffe000204530 ra : ffffffe00020452c sp : ffffffe0023ffc20 gp : ffffffe0013e1fe0 tp : ffffffe0023e4e00 t0 : 0000000000000000
> > [   47.701872]  t1 : 000000000000000e t2 : 000000000000001b s0 : ffffffe0023ffc70 s1 : ffffffe000206850 a0 : 0000000000000000 a1 : ffffffffffffffff
> > [   47.705550]  a2 : 0000000000000000 a3 : ffffffe03af7c5e8 a4 : 0000000000000000 a5 : 0000000000000000 a6 : 0000000000000000 a7 : 00000000150b02d8
> > [   47.709159]  s2 : 0000000000000008 s3 : 0000000000000858 s4 : ffffffe0023ffc98 s5 : 0000000000000850 s6 : 0000000000000003 s7 : 0000000000000002
> > [   47.714135]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000001 s11: 0000000000000003 t3 : ffffffffffffe000 t4 : 0000000000d86254
> > [   47.716574]  t5 : 0000000000000005 t6 : 0000000000040000status: 0000000000000100 badaddr: 0000000000000000 cause: 0000000000000003
> > [   47.720019] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.9.0-11367-g1054335 #132
> > [   47.722074] Call Trace:
> > [   47.722561] [<ffffffe000203c88>] walk_stackframe+0x0/0xc2
> > [   47.724608] [<ffffffe000b2020a>] show_stack+0x46/0x52
> > [   47.726246] [<ffffffe000b23c24>] dump_stack+0x90/0xb6
> > [   47.727672] [<ffffffe00020c9c4>] __warn+0x98/0xfa
> > [   47.729131] [<ffffffe0005f4be0>] report_bug+0xaa/0x11e
> > [   47.730624] [<ffffffe000203b74>] do_trap_break+0x96/0xfe
> > [   47.732448] [<ffffffe000201daa>] ret_from_exception+0x0/0x14
> > [   47.734341] [<ffffffe00020452c>] patch_insn_write+0x17e/0x19a
> > [   47.737235] irq event stamp: 39
> > [   47.738521] hardirqs last  enabled at (39): [<ffffffe000201d0a>] _save_context+0xa2/0xe6
> > [   47.741055] hardirqs last disabled at (38): [<ffffffe0002c2404>] multi_cpu_stop+0x130/0x166
> > [   47.743551] softirqs last  enabled at (0): [<ffffffe00020a4d2>] copy_process+0x430/0x1316
> > [   47.746031] softirqs last disabled at (0): [<0000000000000000>] 0x0
> > [   47.748617] ---[ end trace 88a1054faa6524ef ]---
> >
> > Because the path of stop_machine(__ftrace_modify_code)->
> > ftrace_modify_all_code->...->patch_insn_write has no pair of
> > lock&unlock text_mutex, so we shouldn't put assert here.
>
> Hm, are there any chance to get the text_mutex in that path?
> There are ftrace_arch_code_modify_prepare/post_process() in arch/riscv/kernel/ftrace.c,
> aren't those used?
>
> Thank you,
>
> >
> > Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> > ---
> >  arch/riscv/kernel/patch.c | 7 -------
> >  1 file changed, 7 deletions(-)
> >
> > diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
> > index 0b55287..2ee5063 100644
> > --- a/arch/riscv/kernel/patch.c
> > +++ b/arch/riscv/kernel/patch.c
> > @@ -55,13 +55,6 @@ static int patch_insn_write(void *addr, const void *insn, size_t len)
> >       bool across_pages = (((uintptr_t) addr & ~PAGE_MASK) + len) > PAGE_SIZE;
> >       int ret;
> >
> > -     /*
> > -      * Before reaching here, it was expected to lock the text_mutex
> > -      * already, so we don't need to give another lock here and could
> > -      * ensure that it was safe between each cores.
> > -      */
> > -     lockdep_assert_held(&text_mutex);
> > -
> >       if (across_pages)
> >               patch_map(addr + len, FIX_TEXT_POKE1);
> >
> > --
> > 2.7.4
> >
>
>
> --
> Masami Hiramatsu <mhiramat@kernel.org>
Atish Patra Oct. 19, 2020, 6:54 p.m. UTC | #3
On Sat, Oct 17, 2020 at 12:07 AM <guoren@kernel.org> wrote:
>
> From: Guo Ren <guoren@linux.alibaba.com>
>
> It will cause warning messages:
> echo function_graph > /sys/kernel/debug/tracing/current_tracer
> [   47.691397] ------------[ cut here ]------------
> [   47.692899] WARNING: CPU: 0 PID: 11 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x182/0x19a
> [   47.694483] Modules linked in:
> [   47.695754] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.9.0-11367-g1054335 #132
> [   47.698228] epc: ffffffe000204530 ra : ffffffe00020452c sp : ffffffe0023ffc20 gp : ffffffe0013e1fe0 tp : ffffffe0023e4e00 t0 : 0000000000000000
> [   47.701872]  t1 : 000000000000000e t2 : 000000000000001b s0 : ffffffe0023ffc70 s1 : ffffffe000206850 a0 : 0000000000000000 a1 : ffffffffffffffff
> [   47.705550]  a2 : 0000000000000000 a3 : ffffffe03af7c5e8 a4 : 0000000000000000 a5 : 0000000000000000 a6 : 0000000000000000 a7 : 00000000150b02d8
> [   47.709159]  s2 : 0000000000000008 s3 : 0000000000000858 s4 : ffffffe0023ffc98 s5 : 0000000000000850 s6 : 0000000000000003 s7 : 0000000000000002
> [   47.714135]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000001 s11: 0000000000000003 t3 : ffffffffffffe000 t4 : 0000000000d86254
> [   47.716574]  t5 : 0000000000000005 t6 : 0000000000040000status: 0000000000000100 badaddr: 0000000000000000 cause: 0000000000000003
> [   47.720019] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.9.0-11367-g1054335 #132
> [   47.722074] Call Trace:
> [   47.722561] [<ffffffe000203c88>] walk_stackframe+0x0/0xc2
> [   47.724608] [<ffffffe000b2020a>] show_stack+0x46/0x52
> [   47.726246] [<ffffffe000b23c24>] dump_stack+0x90/0xb6
> [   47.727672] [<ffffffe00020c9c4>] __warn+0x98/0xfa
> [   47.729131] [<ffffffe0005f4be0>] report_bug+0xaa/0x11e
> [   47.730624] [<ffffffe000203b74>] do_trap_break+0x96/0xfe
> [   47.732448] [<ffffffe000201daa>] ret_from_exception+0x0/0x14
> [   47.734341] [<ffffffe00020452c>] patch_insn_write+0x17e/0x19a
> [   47.737235] irq event stamp: 39
> [   47.738521] hardirqs last  enabled at (39): [<ffffffe000201d0a>] _save_context+0xa2/0xe6
> [   47.741055] hardirqs last disabled at (38): [<ffffffe0002c2404>] multi_cpu_stop+0x130/0x166
> [   47.743551] softirqs last  enabled at (0): [<ffffffe00020a4d2>] copy_process+0x430/0x1316
> [   47.746031] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [   47.748617] ---[ end trace 88a1054faa6524ef ]---
>
> Because the path of stop_machine(__ftrace_modify_code)->
> ftrace_modify_all_code->...->patch_insn_write has no pair of
> lock&unlock text_mutex, so we shouldn't put assert here.
>

What platform are you testing ? We are seeing a crash while enabling
any of the tracers multiple times
on Qemu/HiFive Unleashed.

http://lists.infradead.org/pipermail/linux-riscv/2020-October/002354.html

As per the thread, Zong agreed to take a look into that.
@Zong Li : Do you have any update regarding the ftrace issue.

> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> ---
>  arch/riscv/kernel/patch.c | 7 -------
>  1 file changed, 7 deletions(-)
>
> diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
> index 0b55287..2ee5063 100644
> --- a/arch/riscv/kernel/patch.c
> +++ b/arch/riscv/kernel/patch.c
> @@ -55,13 +55,6 @@ static int patch_insn_write(void *addr, const void *insn, size_t len)
>         bool across_pages = (((uintptr_t) addr & ~PAGE_MASK) + len) > PAGE_SIZE;
>         int ret;
>
> -       /*
> -        * Before reaching here, it was expected to lock the text_mutex
> -        * already, so we don't need to give another lock here and could
> -        * ensure that it was safe between each cores.
> -        */
> -       lockdep_assert_held(&text_mutex);
> -
>         if (across_pages)
>                 patch_map(addr + len, FIX_TEXT_POKE1);
>
> --
> 2.7.4
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Guo Ren Oct. 20, 2020, 11:14 a.m. UTC | #4
On Mon, Oct 19, 2020 at 4:36 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Sat, 17 Oct 2020 07:06:17 +0000
> guoren@kernel.org wrote:
>
> > From: Guo Ren <guoren@linux.alibaba.com>
> >
> > It will cause warning messages:
> > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > [   47.691397] ------------[ cut here ]------------
> > [   47.692899] WARNING: CPU: 0 PID: 11 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x182/0x19a
> > [   47.694483] Modules linked in:
> > [   47.695754] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.9.0-11367-g1054335 #132
> > [   47.698228] epc: ffffffe000204530 ra : ffffffe00020452c sp : ffffffe0023ffc20 gp : ffffffe0013e1fe0 tp : ffffffe0023e4e00 t0 : 0000000000000000
> > [   47.701872]  t1 : 000000000000000e t2 : 000000000000001b s0 : ffffffe0023ffc70 s1 : ffffffe000206850 a0 : 0000000000000000 a1 : ffffffffffffffff
> > [   47.705550]  a2 : 0000000000000000 a3 : ffffffe03af7c5e8 a4 : 0000000000000000 a5 : 0000000000000000 a6 : 0000000000000000 a7 : 00000000150b02d8
> > [   47.709159]  s2 : 0000000000000008 s3 : 0000000000000858 s4 : ffffffe0023ffc98 s5 : 0000000000000850 s6 : 0000000000000003 s7 : 0000000000000002
> > [   47.714135]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000001 s11: 0000000000000003 t3 : ffffffffffffe000 t4 : 0000000000d86254
> > [   47.716574]  t5 : 0000000000000005 t6 : 0000000000040000status: 0000000000000100 badaddr: 0000000000000000 cause: 0000000000000003
> > [   47.720019] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.9.0-11367-g1054335 #132
> > [   47.722074] Call Trace:
> > [   47.722561] [<ffffffe000203c88>] walk_stackframe+0x0/0xc2
> > [   47.724608] [<ffffffe000b2020a>] show_stack+0x46/0x52
> > [   47.726246] [<ffffffe000b23c24>] dump_stack+0x90/0xb6
> > [   47.727672] [<ffffffe00020c9c4>] __warn+0x98/0xfa
> > [   47.729131] [<ffffffe0005f4be0>] report_bug+0xaa/0x11e
> > [   47.730624] [<ffffffe000203b74>] do_trap_break+0x96/0xfe
> > [   47.732448] [<ffffffe000201daa>] ret_from_exception+0x0/0x14
> > [   47.734341] [<ffffffe00020452c>] patch_insn_write+0x17e/0x19a
> > [   47.737235] irq event stamp: 39
> > [   47.738521] hardirqs last  enabled at (39): [<ffffffe000201d0a>] _save_context+0xa2/0xe6
> > [   47.741055] hardirqs last disabled at (38): [<ffffffe0002c2404>] multi_cpu_stop+0x130/0x166
> > [   47.743551] softirqs last  enabled at (0): [<ffffffe00020a4d2>] copy_process+0x430/0x1316
> > [   47.746031] softirqs last disabled at (0): [<0000000000000000>] 0x0
> > [   47.748617] ---[ end trace 88a1054faa6524ef ]---
> >
> > Because the path of stop_machine(__ftrace_modify_code)->
> > ftrace_modify_all_code->...->patch_insn_write has no pair of
> > lock&unlock text_mutex, so we shouldn't put assert here.
>
> Hm, are there any chance to get the text_mutex in that path?
> There are ftrace_arch_code_modify_prepare/post_process() in arch/riscv/kernel/ftrace.c,
> aren't those used?
I found ftrace_run_update_code:
 - ftrace_arch_code_modify_prepare()-> mutex_lock(&text_mutex)
 - stop_machine(__ftrace_modify_code, &command, NULL); - >
__ftrace_modify_code -> patch_insn_write -> lockdep_assert_held ->
lockdep_is_held
 - ftrace_arch_code_modify_post_process()-> mutex_unlock(&text_mutex)

Because of stop_machine, __ftrace_modify_code and
ftrace_run_update_code are not the same task. This will cause
lockdep_is_held return 0.

ref to __lock_is_held:
volatile int __lock_is_held(const struct lockdep_map *lock, int read)
{
        struct task_struct *curr = current; // current is from
stop_machine, not ftrace_run_update_code. So never match!
        int i;

        for (i = 0; i < curr->lockdep_depth; i++) {
                struct held_lock *hlock = curr->held_locks + i;

                if (match_held_lock(hlock, lock)) {
                        if (read == -1 || hlock->read == read)
                                return 1;

                        return 0;
                }
        }

        return 0;
}

Does it seem a comm issue? I'm not so expert on lockdep mechanism.

>
> Thank you,
>
> >
> > Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> > ---
> >  arch/riscv/kernel/patch.c | 7 -------
> >  1 file changed, 7 deletions(-)
> >
> > diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
> > index 0b55287..2ee5063 100644
> > --- a/arch/riscv/kernel/patch.c
> > +++ b/arch/riscv/kernel/patch.c
> > @@ -55,13 +55,6 @@ static int patch_insn_write(void *addr, const void *insn, size_t len)
> >       bool across_pages = (((uintptr_t) addr & ~PAGE_MASK) + len) > PAGE_SIZE;
> >       int ret;
> >
> > -     /*
> > -      * Before reaching here, it was expected to lock the text_mutex
> > -      * already, so we don't need to give another lock here and could
> > -      * ensure that it was safe between each cores.
> > -      */
> > -     lockdep_assert_held(&text_mutex);
> > -
> >       if (across_pages)
> >               patch_map(addr + len, FIX_TEXT_POKE1);
> >
> > --
> > 2.7.4
> >
>
>
> --
> Masami Hiramatsu <mhiramat@kernel.org>
Guo Ren Oct. 20, 2020, 11:18 a.m. UTC | #5
On Tue, Oct 20, 2020 at 2:54 AM Atish Patra <atishp@atishpatra.org> wrote:
>
> On Sat, Oct 17, 2020 at 12:07 AM <guoren@kernel.org> wrote:
> >
> > From: Guo Ren <guoren@linux.alibaba.com>
> >
> > It will cause warning messages:
> > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > [   47.691397] ------------[ cut here ]------------
> > [   47.692899] WARNING: CPU: 0 PID: 11 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x182/0x19a
> > [   47.694483] Modules linked in:
> > [   47.695754] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.9.0-11367-g1054335 #132
> > [   47.698228] epc: ffffffe000204530 ra : ffffffe00020452c sp : ffffffe0023ffc20 gp : ffffffe0013e1fe0 tp : ffffffe0023e4e00 t0 : 0000000000000000
> > [   47.701872]  t1 : 000000000000000e t2 : 000000000000001b s0 : ffffffe0023ffc70 s1 : ffffffe000206850 a0 : 0000000000000000 a1 : ffffffffffffffff
> > [   47.705550]  a2 : 0000000000000000 a3 : ffffffe03af7c5e8 a4 : 0000000000000000 a5 : 0000000000000000 a6 : 0000000000000000 a7 : 00000000150b02d8
> > [   47.709159]  s2 : 0000000000000008 s3 : 0000000000000858 s4 : ffffffe0023ffc98 s5 : 0000000000000850 s6 : 0000000000000003 s7 : 0000000000000002
> > [   47.714135]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000001 s11: 0000000000000003 t3 : ffffffffffffe000 t4 : 0000000000d86254
> > [   47.716574]  t5 : 0000000000000005 t6 : 0000000000040000status: 0000000000000100 badaddr: 0000000000000000 cause: 0000000000000003
> > [   47.720019] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.9.0-11367-g1054335 #132
> > [   47.722074] Call Trace:
> > [   47.722561] [<ffffffe000203c88>] walk_stackframe+0x0/0xc2
> > [   47.724608] [<ffffffe000b2020a>] show_stack+0x46/0x52
> > [   47.726246] [<ffffffe000b23c24>] dump_stack+0x90/0xb6
> > [   47.727672] [<ffffffe00020c9c4>] __warn+0x98/0xfa
> > [   47.729131] [<ffffffe0005f4be0>] report_bug+0xaa/0x11e
> > [   47.730624] [<ffffffe000203b74>] do_trap_break+0x96/0xfe
> > [   47.732448] [<ffffffe000201daa>] ret_from_exception+0x0/0x14
> > [   47.734341] [<ffffffe00020452c>] patch_insn_write+0x17e/0x19a
> > [   47.737235] irq event stamp: 39
> > [   47.738521] hardirqs last  enabled at (39): [<ffffffe000201d0a>] _save_context+0xa2/0xe6
> > [   47.741055] hardirqs last disabled at (38): [<ffffffe0002c2404>] multi_cpu_stop+0x130/0x166
> > [   47.743551] softirqs last  enabled at (0): [<ffffffe00020a4d2>] copy_process+0x430/0x1316
> > [   47.746031] softirqs last disabled at (0): [<0000000000000000>] 0x0
> > [   47.748617] ---[ end trace 88a1054faa6524ef ]---
> >
> > Because the path of stop_machine(__ftrace_modify_code)->
> > ftrace_modify_all_code->...->patch_insn_write has no pair of
> > lock&unlock text_mutex, so we shouldn't put assert here.
> >
>
> What platform are you testing ? We are seeing a crash while enabling
> any of the tracers multiple times
> on Qemu/HiFive Unleashed.
I use qemu for testing. I've changed dynamic ftrace mechanism from
mcount to -fpatchable-entry.

The problem is made by the lockdep checking of text_mutex.

>
> http://lists.infradead.org/pipermail/linux-riscv/2020-October/002354.html
>
> As per the thread, Zong agreed to take a look into that.
> @Zong Li : Do you have any update regarding the ftrace issue.
>
> > Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> > ---
> >  arch/riscv/kernel/patch.c | 7 -------
> >  1 file changed, 7 deletions(-)
> >
> > diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
> > index 0b55287..2ee5063 100644
> > --- a/arch/riscv/kernel/patch.c
> > +++ b/arch/riscv/kernel/patch.c
> > @@ -55,13 +55,6 @@ static int patch_insn_write(void *addr, const void *insn, size_t len)
> >         bool across_pages = (((uintptr_t) addr & ~PAGE_MASK) + len) > PAGE_SIZE;
> >         int ret;
> >
> > -       /*
> > -        * Before reaching here, it was expected to lock the text_mutex
> > -        * already, so we don't need to give another lock here and could
> > -        * ensure that it was safe between each cores.
> > -        */
> > -       lockdep_assert_held(&text_mutex);
> > -
> >         if (across_pages)
> >                 patch_map(addr + len, FIX_TEXT_POKE1);
> >
> > --
> > 2.7.4
> >
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
>
>
>
> --
> Regards,
> Atish
Steven Rostedt Oct. 20, 2020, 8:41 p.m. UTC | #6
On Tue, 20 Oct 2020 19:18:01 +0800
Guo Ren <guoren@kernel.org> wrote:

> > What platform are you testing ? We are seeing a crash while enabling
> > any of the tracers multiple times
> > on Qemu/HiFive Unleashed.  
> I use qemu for testing. I've changed dynamic ftrace mechanism from
> mcount to -fpatchable-entry.
> 
> The problem is made by the lockdep checking of text_mutex.

If you are switching to "patchable-entry" you shouldn't need to use
stop_machine for the updates. No?

-- Steve
Zong Li Oct. 21, 2020, 7:10 a.m. UTC | #7
On Wed, Oct 21, 2020 at 4:41 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 20 Oct 2020 19:18:01 +0800
> Guo Ren <guoren@kernel.org> wrote:
>
> > > What platform are you testing ? We are seeing a crash while enabling
> > > any of the tracers multiple times
> > > on Qemu/HiFive Unleashed.
> > I use qemu for testing. I've changed dynamic ftrace mechanism from
> > mcount to -fpatchable-entry.
> >
> > The problem is made by the lockdep checking of text_mutex.
>
> If you are switching to "patchable-entry" you shouldn't need to use
> stop_machine for the updates. No?
>
> -- Steve

Hi all,

I'm going to send the patch to fix the problem. Ftrace was broken from
v5.3 kernel version, and only happen on SMP. The problem is caused by
the following two patches:

Commit 4ecf0a43e729a7e641d800c294faabe87378fc05 ("processor: get rid
of cpu_relax_yield")
and
Commit 366237e7b0833faa2d8da7a8d7d7da8c3ca802e5 ("stop_machine:
Provide RCU quiescent state in multi_cpu_stop()")

We have to mark these two functions as notrace. The stopped CPUs
cannot make function calls to traceable functions on RISC-V, the
function call instruction pattern needs two instructions (auipc,
jalr), so there is a change to execute the (auipc + nop) or (nop,
jalr) when patching code.

There is a similar fix as follow:
Commit cb9d7fd51d9fbb329d182423bd7b92d0f8cb0e01 ("watchdog: Mark
watchdog touch functions as notrace")

I have verified my patches, and I'm going to send it to the mailing
list these few days.
Guo Ren Oct. 21, 2020, 8:27 a.m. UTC | #8
Why?

funca:
nop                   addi  sp,sp, -8
nop                   sd ra, 0(sp)
nop, nop           auipc ra, 0x?
nop, nop  ->      jalr -?(ra)
nop                   ld ra,0(sp)
nop                   addi sp,sp, 8

When CPU0 has only changed half of them, CPU1 call funca then CPU1 broken.

On Wed, Oct 21, 2020 at 4:41 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 20 Oct 2020 19:18:01 +0800
> Guo Ren <guoren@kernel.org> wrote:
>
> > > What platform are you testing ? We are seeing a crash while enabling
> > > any of the tracers multiple times
> > > on Qemu/HiFive Unleashed.
> > I use qemu for testing. I've changed dynamic ftrace mechanism from
> > mcount to -fpatchable-entry.
> >
> > The problem is made by the lockdep checking of text_mutex.
>
> If you are switching to "patchable-entry" you shouldn't need to use
> stop_machine for the updates. No?
>
> -- Steve



--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/
diff mbox series

Patch

diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
index 0b55287..2ee5063 100644
--- a/arch/riscv/kernel/patch.c
+++ b/arch/riscv/kernel/patch.c
@@ -55,13 +55,6 @@  static int patch_insn_write(void *addr, const void *insn, size_t len)
 	bool across_pages = (((uintptr_t) addr & ~PAGE_MASK) + len) > PAGE_SIZE;
 	int ret;
 
-	/*
-	 * Before reaching here, it was expected to lock the text_mutex
-	 * already, so we don't need to give another lock here and could
-	 * ensure that it was safe between each cores.
-	 */
-	lockdep_assert_held(&text_mutex);
-
 	if (across_pages)
 		patch_map(addr + len, FIX_TEXT_POKE1);