Message ID | 56554740.4070904@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Nov 25, 2015, at 2:29 PM, AKASHI Takahiro wrote: > On 11/24/2015 10:37 PM, Jungseok Lee wrote: >> On Nov 18, 2015, at 3:43 PM, AKASHI Takahiro wrote: >> >> Hi Akashi, >> >>> Function graph tracer modifies a return address (LR) in a stack frame >>> to hook a function return. This will result in many useless entries >>> (return_to_handler) showing up in >>> a) a stack tracer's output >>> b) perf call graph (with perf record -g) >>> c) dump_backtrace (at panic et al.) >>> >>> For example, in case of a), >>> $ echo function_graph > /sys/kernel/debug/tracing/current_tracer >>> $ echo 1 > /proc/sys/kernel/stack_trace_enabled >>> $ cat /sys/kernel/debug/tracing/stack_trace >>> Depth Size Location (54 entries) >>> ----- ---- -------- >>> 0) 4504 16 gic_raise_softirq+0x28/0x150 >>> 1) 4488 80 smp_cross_call+0x38/0xb8 >>> 2) 4408 48 return_to_handler+0x0/0x40 >>> 3) 4360 32 return_to_handler+0x0/0x40 >>> ... >>> >>> In case of b), >>> $ echo function_graph > /sys/kernel/debug/tracing/current_tracer >>> $ perf record -e mem:XXX:x -ag -- sleep 10 >>> $ perf report >>> ... >>> | | |--0.22%-- 0x550f8 >>> | | | 0x10888 >>> | | | el0_svc_naked >>> | | | sys_openat >>> | | | return_to_handler >>> | | | return_to_handler >>> ... >>> >>> In case of c), >>> $ echo function_graph > /sys/kernel/debug/tracing/current_tracer >>> $ echo c > /proc/sysrq-trigger >>> ... >>> Call trace: >>> [<ffffffc00044d3ac>] sysrq_handle_crash+0x24/0x30 >>> [<ffffffc000092250>] return_to_handler+0x0/0x40 >>> [<ffffffc000092250>] return_to_handler+0x0/0x40 >>> ... >>> >>> This patch replaces such entries with real addresses preserved in >>> current->ret_stack[] at unwind_frame(). This way, we can cover all >>> the cases. >> >> I've observed a strange behavior when playing with case c). Call trace >> is as follows when function_graph is not used. >> >> Call trace: >> [<fffffe00003dc738>] sysrq_handle_crash+0x24/0x30 <- (1) >> [<fffffe00003dd2ac>] __handle_sysrq+0x128/0x19c <- (2) >> [<fffffe00003dd730>] write_sysrq_trigger+0x60/0x74 >> [<fffffe0000249fc4>] proc_reg_write+0x84/0xc0 >> [<fffffe00001f2638>] __vfs_write+0x44/0x104 <- (3) >> [<fffffe00001f2e60>] vfs_write+0x98/0x1a8 >> [<fffffe00001f3730>] SyS_write+0x50/0xb0 >> [<fffffe00000939ec>] el0_svc_naked+0x20/0x28 <- (4) >> >> When function_graph is set, some entries, such as do_mem_abort, are added >> between (1) and (2). In addition, entries from (3) to (4) are not printed >> out. As tracking down elements of ret_stack[], I realize dump_backtrace() >> has been terminated before reaching to ret_stack[0]. Have you seen this >> kind of behavior? I believe push & pop operations work correctly. >> >> Please correct me if I'm wrong. > > Oops, I mis-interpreted the result output. > You are right. > This can happen because the original dump_backtrace() wants to trace a stack > from a function where an exception has taken place, sysrq_handle_crash(), while > ret_stack[curr_ret_stack] doesn't point to that function, but the very top of > traced functions in callchains, that is, __do_kernel_fault in your case, probably. Yes, __do_kernel_fault. > So it results in replacing entries of return_to_handler to wrong function addresses. > > A fixup! patch attached below (informative only) fixes this issue by once tracing > all the functions on a stack, but preventing a top few ones from being printed. > But there is a tricky thing here: we have to use 'regs->pc' instead of frame.pc > as a trapped function because, as I've already mentioned before, we always miss > the function when walking through a stack from an exception handler to functions > in a thread context. > (Please note that we will introduce a dummy stack frame at interrupt, but not > at exception.) Thanks for clear explanation! Since I'm not the only person to experience the above case, I report one more and the last observation ;) PC and LR information is printed out when function_graph is set. PC is at sysrq_handle_crash+0x24/0x30 LR is at sysrq_handle_crash+0x10/0x30 The logs are as follows when function_graph turns off. PC is at sysrq_handle_crash+0x24/0x30 LR is at __handle_sysrq+0x128/0x19c I think __show_regs() might have a similar problem when retrieving LR according to the below informative patch. Thoughts? >>> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> >>> --- >>> arch/arm64/include/asm/ftrace.h | 2 ++ >>> arch/arm64/include/asm/stacktrace.h | 3 +++ >>> arch/arm64/kernel/perf_callchain.c | 3 +++ >>> arch/arm64/kernel/process.c | 3 +++ >>> arch/arm64/kernel/return_address.c | 3 +++ >>> arch/arm64/kernel/stacktrace.c | 17 +++++++++++++++++ >>> arch/arm64/kernel/time.c | 3 +++ >>> arch/arm64/kernel/traps.c | 3 +++ >>> 8 files changed, 37 insertions(+) >>> >>> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h >>> index c5534fa..3c60f37 100644 >>> --- a/arch/arm64/include/asm/ftrace.h >>> +++ b/arch/arm64/include/asm/ftrace.h >>> @@ -28,6 +28,8 @@ struct dyn_arch_ftrace { >>> >>> extern unsigned long ftrace_graph_call; >>> >>> +extern void return_to_handler(void); >>> + >>> static inline unsigned long ftrace_call_adjust(unsigned long addr) >>> { >>> /* >>> diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h >>> index 6fb61c5..801a16db 100644 >>> --- a/arch/arm64/include/asm/stacktrace.h >>> +++ b/arch/arm64/include/asm/stacktrace.h >>> @@ -22,6 +22,9 @@ struct stackframe { >>> unsigned long fp; >>> unsigned long sp; >>> unsigned long pc; >>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER >>> + unsigned int graph; >>> +#endif >>> }; >> >> How about using int instead of unsigned int to align with cure_ret_stack >> of struct task_struct? >> >>> extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame); >>> diff --git a/arch/arm64/kernel/perf_callchain.c b/arch/arm64/kernel/perf_callchain.c >>> index 797220d..ff46654 100644 >>> --- a/arch/arm64/kernel/perf_callchain.c >>> +++ b/arch/arm64/kernel/perf_callchain.c >>> @@ -164,6 +164,9 @@ void perf_callchain_kernel(struct perf_callchain_entry *entry, >>> frame.fp = regs->regs[29]; >>> frame.sp = regs->sp; >>> frame.pc = regs->pc; >>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER >>> + frame.graph = current->curr_ret_stack; >>> +#endif >>> >>> walk_stackframe(current, &frame, callchain_trace, entry); >>> } >>> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c >>> index 98bf546..88d742b 100644 >>> --- a/arch/arm64/kernel/process.c >>> +++ b/arch/arm64/kernel/process.c >>> @@ -344,6 +344,9 @@ unsigned long get_wchan(struct task_struct *p) >>> frame.fp = thread_saved_fp(p); >>> frame.sp = thread_saved_sp(p); >>> frame.pc = thread_saved_pc(p); >>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER >>> + frame.graph = p->curr_ret_stack; >>> +#endif >>> stack_page = (unsigned long)task_stack_page(p); >>> do { >>> if (frame.sp < stack_page || >>> diff --git a/arch/arm64/kernel/return_address.c b/arch/arm64/kernel/return_address.c >>> index 07b37ac..1718706 100644 >>> --- a/arch/arm64/kernel/return_address.c >>> +++ b/arch/arm64/kernel/return_address.c >>> @@ -43,6 +43,9 @@ void *return_address(unsigned int level) >>> frame.fp = (unsigned long)__builtin_frame_address(0); >>> frame.sp = current_stack_pointer; >>> frame.pc = (unsigned long)return_address; /* dummy */ >>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER >>> + frame.graph = current->curr_ret_stack; >>> +#endif >>> >>> walk_stackframe(current, &frame, save_return_addr, &data); >>> >>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c >>> index 9c7acf8..0a39049 100644 >>> --- a/arch/arm64/kernel/stacktrace.c >>> +++ b/arch/arm64/kernel/stacktrace.c >>> @@ -17,6 +17,7 @@ >>> */ >>> #include <linux/kernel.h> >>> #include <linux/export.h> >>> +#include <linux/ftrace.h> >>> #include <linux/sched.h> >>> #include <linux/stacktrace.h> >>> >>> @@ -66,6 +67,19 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame) >>> frame->fp = *(unsigned long *)(fp); >>> frame->pc = *(unsigned long *)(fp + 8); >>> >>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER >>> + if (tsk && tsk->ret_stack && >>> + (frame->pc == (unsigned long)return_to_handler)) { >>> + /* >>> + * This is a case where function graph tracer has >>> + * modified a return address (LR) in a stack frame >>> + * to hook a function return. >>> + * So replace it to an original value. >>> + */ >>> + frame->pc = tsk->ret_stack[frame->graph--].ret; >>> + } >>> +#endif /* CONFIG_FUNCTION_GRAPH_TRACER */ >> >> There is an index check of ret_stack[] in case of x86 [1]. Even though >> graph is unsigned int, I think we need to check the value of frame->graph >> before accessing ret_stack[]. > > I'm not sure that the checking is very useful because, if it happens, > it is a bug. It might make sense to avoid a possible panic though. > >>> + >>> /* >>> * Check whether we are going to walk through from interrupt stack >>> * to task stack. >>> @@ -137,6 +151,9 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace) >>> frame.sp = current_stack_pointer; >>> frame.pc = (unsigned long)save_stack_trace_tsk; >>> } >>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER >>> + frame.graph = tsk->curr_ret_stack; >>> +#endif >>> >>> walk_stackframe(tsk, &frame, save_trace, &data); >>> if (trace->nr_entries < trace->max_entries) >>> diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c >>> index 6e5c521..5977969 100644 >>> --- a/arch/arm64/kernel/time.c >>> +++ b/arch/arm64/kernel/time.c >>> @@ -52,6 +52,9 @@ unsigned long profile_pc(struct pt_regs *regs) >>> frame.fp = regs->regs[29]; >>> frame.sp = regs->sp; >>> frame.pc = regs->pc; >>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER >>> + frame.graph = -1; /* no task info */ >>> +#endif >> >> graph is unsigned int type. Is this intentional? > > No. This initialization is, I believe, redundant as it is not checked anywhere, > but I will re-think of it along with the checking above. > > Thanks, > -Takahiro AKASHI > >> Best Regards >> Jungseok Lee >> >> [1] arch/x86/kernel/dumpstack.c >> > ----8<---- > From 9ebba7167f7838daf68d8231f04141d2f4d4b7b5 Mon Sep 17 00:00:00 2001 > From: AKASHI Takahiro <takahiro.akashi@linaro.org> > Date: Wed, 25 Nov 2015 13:29:54 +0900 > Subject: [PATCH] fixup! arm64: ftrace: fix a stack tracer's output under > function graph tracer > > > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> > --- > arch/arm64/kernel/traps.c | 23 +++++++++++++++++------ > 1 file changed, 17 insertions(+), 6 deletions(-) > > diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c > index 46053c2..f140029 100644 > --- a/arch/arm64/kernel/traps.c > +++ b/arch/arm64/kernel/traps.c > @@ -147,17 +147,14 @@ static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk) > { > struct stackframe frame; > unsigned long _irq_stack_ptr = per_cpu(irq_stack_ptr, smp_processor_id()); > + int skip; > > pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk); > > if (!tsk) > tsk = current; > > - if (regs) { > - frame.fp = regs->regs[29]; > - frame.sp = regs->sp; > - frame.pc = regs->pc; > - } else if (tsk == current) { > + if (tsk == current) { > frame.fp = (unsigned long)__builtin_frame_address(0); > frame.sp = current_stack_pointer; > frame.pc = (unsigned long)dump_backtrace; > @@ -173,13 +170,27 @@ static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk) > frame.graph = tsk->curr_ret_stack; > #endif > > + skip = (regs ? 1 : 0); > pr_emerg("Call trace:\n"); > while (1) { > unsigned long where = frame.pc; > unsigned long stack; > int ret; > > - dump_backtrace_entry(where); > + /* skip until specified stack frame */ > + if (!skip) > + dump_backtrace_entry(where); > + else if (frame.fp == regs->regs[29]) { > + skip = 0; > + /* > + * Mostly, this is the case where this function is > + * called in panic/abort. As exception handler's > + * stack frame does not contain the corresponding pc > + * at which an exception has taken place, use regs->pc > + * instead. > + */ > + dump_backtrace_entry(regs->pc); > + } > ret = unwind_frame(tsk, &frame); > if (ret < 0) > break; > -- > 1.7.9.5 This fixes up the issue I mentioned. Best Regards Jungseok Lee
Jungseok, On 11/25/2015 08:48 PM, Jungseok Lee wrote: > On Nov 25, 2015, at 2:29 PM, AKASHI Takahiro wrote: >> On 11/24/2015 10:37 PM, Jungseok Lee wrote: >>> On Nov 18, 2015, at 3:43 PM, AKASHI Takahiro wrote: >>> >>> Hi Akashi, >>> >>>> Function graph tracer modifies a return address (LR) in a stack frame >>>> to hook a function return. This will result in many useless entries >>>> (return_to_handler) showing up in >>>> a) a stack tracer's output >>>> b) perf call graph (with perf record -g) >>>> c) dump_backtrace (at panic et al.) >>>> >>>> For example, in case of a), >>>> $ echo function_graph > /sys/kernel/debug/tracing/current_tracer >>>> $ echo 1 > /proc/sys/kernel/stack_trace_enabled >>>> $ cat /sys/kernel/debug/tracing/stack_trace >>>> Depth Size Location (54 entries) >>>> ----- ---- -------- >>>> 0) 4504 16 gic_raise_softirq+0x28/0x150 >>>> 1) 4488 80 smp_cross_call+0x38/0xb8 >>>> 2) 4408 48 return_to_handler+0x0/0x40 >>>> 3) 4360 32 return_to_handler+0x0/0x40 >>>> ... >>>> >>>> In case of b), >>>> $ echo function_graph > /sys/kernel/debug/tracing/current_tracer >>>> $ perf record -e mem:XXX:x -ag -- sleep 10 >>>> $ perf report >>>> ... >>>> | | |--0.22%-- 0x550f8 >>>> | | | 0x10888 >>>> | | | el0_svc_naked >>>> | | | sys_openat >>>> | | | return_to_handler >>>> | | | return_to_handler >>>> ... >>>> >>>> In case of c), >>>> $ echo function_graph > /sys/kernel/debug/tracing/current_tracer >>>> $ echo c > /proc/sysrq-trigger >>>> ... >>>> Call trace: >>>> [<ffffffc00044d3ac>] sysrq_handle_crash+0x24/0x30 >>>> [<ffffffc000092250>] return_to_handler+0x0/0x40 >>>> [<ffffffc000092250>] return_to_handler+0x0/0x40 >>>> ... >>>> >>>> This patch replaces such entries with real addresses preserved in >>>> current->ret_stack[] at unwind_frame(). This way, we can cover all >>>> the cases. >>> >>> I've observed a strange behavior when playing with case c). Call trace >>> is as follows when function_graph is not used. >>> >>> Call trace: >>> [<fffffe00003dc738>] sysrq_handle_crash+0x24/0x30 <- (1) >>> [<fffffe00003dd2ac>] __handle_sysrq+0x128/0x19c <- (2) >>> [<fffffe00003dd730>] write_sysrq_trigger+0x60/0x74 >>> [<fffffe0000249fc4>] proc_reg_write+0x84/0xc0 >>> [<fffffe00001f2638>] __vfs_write+0x44/0x104 <- (3) >>> [<fffffe00001f2e60>] vfs_write+0x98/0x1a8 >>> [<fffffe00001f3730>] SyS_write+0x50/0xb0 >>> [<fffffe00000939ec>] el0_svc_naked+0x20/0x28 <- (4) >>> >>> When function_graph is set, some entries, such as do_mem_abort, are added >>> between (1) and (2). In addition, entries from (3) to (4) are not printed >>> out. As tracking down elements of ret_stack[], I realize dump_backtrace() >>> has been terminated before reaching to ret_stack[0]. Have you seen this >>> kind of behavior? I believe push & pop operations work correctly. >>> >>> Please correct me if I'm wrong. >> >> Oops, I mis-interpreted the result output. >> You are right. >> This can happen because the original dump_backtrace() wants to trace a stack >> from a function where an exception has taken place, sysrq_handle_crash(), while >> ret_stack[curr_ret_stack] doesn't point to that function, but the very top of >> traced functions in callchains, that is, __do_kernel_fault in your case, probably. > > Yes, __do_kernel_fault. > >> So it results in replacing entries of return_to_handler to wrong function addresses. >> >> A fixup! patch attached below (informative only) fixes this issue by once tracing >> all the functions on a stack, but preventing a top few ones from being printed. >> But there is a tricky thing here: we have to use 'regs->pc' instead of frame.pc >> as a trapped function because, as I've already mentioned before, we always miss >> the function when walking through a stack from an exception handler to functions >> in a thread context. >> (Please note that we will introduce a dummy stack frame at interrupt, but not >> at exception.) > > Thanks for clear explanation! > > Since I'm not the only person to experience the above case, I report one more and > the last observation ;) > > PC and LR information is printed out when function_graph is set. > > PC is at sysrq_handle_crash+0x24/0x30 > LR is at sysrq_handle_crash+0x10/0x30 > > The logs are as follows when function_graph turns off. > > PC is at sysrq_handle_crash+0x24/0x30 > LR is at __handle_sysrq+0x128/0x19c > > I think __show_regs() might have a similar problem when retrieving LR according to > the below informative patch. Thoughts? Well, I believe that it is normal. Sysrq_handle_crash() is basically a leaf function, but once function or function graph tracer is turned on, ftrace_caller() is called before its function body and so LR has been modified when panic/show_regs(). Thanks, -Takahiro AKASHI >>>> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> >>>> --- >>>> arch/arm64/include/asm/ftrace.h | 2 ++ >>>> arch/arm64/include/asm/stacktrace.h | 3 +++ >>>> arch/arm64/kernel/perf_callchain.c | 3 +++ >>>> arch/arm64/kernel/process.c | 3 +++ >>>> arch/arm64/kernel/return_address.c | 3 +++ >>>> arch/arm64/kernel/stacktrace.c | 17 +++++++++++++++++ >>>> arch/arm64/kernel/time.c | 3 +++ >>>> arch/arm64/kernel/traps.c | 3 +++ >>>> 8 files changed, 37 insertions(+) >>>> >>>> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h >>>> index c5534fa..3c60f37 100644 >>>> --- a/arch/arm64/include/asm/ftrace.h >>>> +++ b/arch/arm64/include/asm/ftrace.h >>>> @@ -28,6 +28,8 @@ struct dyn_arch_ftrace { >>>> >>>> extern unsigned long ftrace_graph_call; >>>> >>>> +extern void return_to_handler(void); >>>> + >>>> static inline unsigned long ftrace_call_adjust(unsigned long addr) >>>> { >>>> /* >>>> diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h >>>> index 6fb61c5..801a16db 100644 >>>> --- a/arch/arm64/include/asm/stacktrace.h >>>> +++ b/arch/arm64/include/asm/stacktrace.h >>>> @@ -22,6 +22,9 @@ struct stackframe { >>>> unsigned long fp; >>>> unsigned long sp; >>>> unsigned long pc; >>>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER >>>> + unsigned int graph; >>>> +#endif >>>> }; >>> >>> How about using int instead of unsigned int to align with cure_ret_stack >>> of struct task_struct? >>> >>>> extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame); >>>> diff --git a/arch/arm64/kernel/perf_callchain.c b/arch/arm64/kernel/perf_callchain.c >>>> index 797220d..ff46654 100644 >>>> --- a/arch/arm64/kernel/perf_callchain.c >>>> +++ b/arch/arm64/kernel/perf_callchain.c >>>> @@ -164,6 +164,9 @@ void perf_callchain_kernel(struct perf_callchain_entry *entry, >>>> frame.fp = regs->regs[29]; >>>> frame.sp = regs->sp; >>>> frame.pc = regs->pc; >>>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER >>>> + frame.graph = current->curr_ret_stack; >>>> +#endif >>>> >>>> walk_stackframe(current, &frame, callchain_trace, entry); >>>> } >>>> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c >>>> index 98bf546..88d742b 100644 >>>> --- a/arch/arm64/kernel/process.c >>>> +++ b/arch/arm64/kernel/process.c >>>> @@ -344,6 +344,9 @@ unsigned long get_wchan(struct task_struct *p) >>>> frame.fp = thread_saved_fp(p); >>>> frame.sp = thread_saved_sp(p); >>>> frame.pc = thread_saved_pc(p); >>>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER >>>> + frame.graph = p->curr_ret_stack; >>>> +#endif >>>> stack_page = (unsigned long)task_stack_page(p); >>>> do { >>>> if (frame.sp < stack_page || >>>> diff --git a/arch/arm64/kernel/return_address.c b/arch/arm64/kernel/return_address.c >>>> index 07b37ac..1718706 100644 >>>> --- a/arch/arm64/kernel/return_address.c >>>> +++ b/arch/arm64/kernel/return_address.c >>>> @@ -43,6 +43,9 @@ void *return_address(unsigned int level) >>>> frame.fp = (unsigned long)__builtin_frame_address(0); >>>> frame.sp = current_stack_pointer; >>>> frame.pc = (unsigned long)return_address; /* dummy */ >>>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER >>>> + frame.graph = current->curr_ret_stack; >>>> +#endif >>>> >>>> walk_stackframe(current, &frame, save_return_addr, &data); >>>> >>>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c >>>> index 9c7acf8..0a39049 100644 >>>> --- a/arch/arm64/kernel/stacktrace.c >>>> +++ b/arch/arm64/kernel/stacktrace.c >>>> @@ -17,6 +17,7 @@ >>>> */ >>>> #include <linux/kernel.h> >>>> #include <linux/export.h> >>>> +#include <linux/ftrace.h> >>>> #include <linux/sched.h> >>>> #include <linux/stacktrace.h> >>>> >>>> @@ -66,6 +67,19 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame) >>>> frame->fp = *(unsigned long *)(fp); >>>> frame->pc = *(unsigned long *)(fp + 8); >>>> >>>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER >>>> + if (tsk && tsk->ret_stack && >>>> + (frame->pc == (unsigned long)return_to_handler)) { >>>> + /* >>>> + * This is a case where function graph tracer has >>>> + * modified a return address (LR) in a stack frame >>>> + * to hook a function return. >>>> + * So replace it to an original value. >>>> + */ >>>> + frame->pc = tsk->ret_stack[frame->graph--].ret; >>>> + } >>>> +#endif /* CONFIG_FUNCTION_GRAPH_TRACER */ >>> >>> There is an index check of ret_stack[] in case of x86 [1]. Even though >>> graph is unsigned int, I think we need to check the value of frame->graph >>> before accessing ret_stack[]. >> >> I'm not sure that the checking is very useful because, if it happens, >> it is a bug. It might make sense to avoid a possible panic though. >> >>>> + >>>> /* >>>> * Check whether we are going to walk through from interrupt stack >>>> * to task stack. >>>> @@ -137,6 +151,9 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace) >>>> frame.sp = current_stack_pointer; >>>> frame.pc = (unsigned long)save_stack_trace_tsk; >>>> } >>>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER >>>> + frame.graph = tsk->curr_ret_stack; >>>> +#endif >>>> >>>> walk_stackframe(tsk, &frame, save_trace, &data); >>>> if (trace->nr_entries < trace->max_entries) >>>> diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c >>>> index 6e5c521..5977969 100644 >>>> --- a/arch/arm64/kernel/time.c >>>> +++ b/arch/arm64/kernel/time.c >>>> @@ -52,6 +52,9 @@ unsigned long profile_pc(struct pt_regs *regs) >>>> frame.fp = regs->regs[29]; >>>> frame.sp = regs->sp; >>>> frame.pc = regs->pc; >>>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER >>>> + frame.graph = -1; /* no task info */ >>>> +#endif >>> >>> graph is unsigned int type. Is this intentional? >> >> No. This initialization is, I believe, redundant as it is not checked anywhere, >> but I will re-think of it along with the checking above. >> >> Thanks, >> -Takahiro AKASHI >> >>> Best Regards >>> Jungseok Lee >>> >>> [1] arch/x86/kernel/dumpstack.c >>> >> ----8<---- >> From 9ebba7167f7838daf68d8231f04141d2f4d4b7b5 Mon Sep 17 00:00:00 2001 >> From: AKASHI Takahiro <takahiro.akashi@linaro.org> >> Date: Wed, 25 Nov 2015 13:29:54 +0900 >> Subject: [PATCH] fixup! arm64: ftrace: fix a stack tracer's output under >> function graph tracer >> >> >> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> >> --- >> arch/arm64/kernel/traps.c | 23 +++++++++++++++++------ >> 1 file changed, 17 insertions(+), 6 deletions(-) >> >> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c >> index 46053c2..f140029 100644 >> --- a/arch/arm64/kernel/traps.c >> +++ b/arch/arm64/kernel/traps.c >> @@ -147,17 +147,14 @@ static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk) >> { >> struct stackframe frame; >> unsigned long _irq_stack_ptr = per_cpu(irq_stack_ptr, smp_processor_id()); >> + int skip; >> >> pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk); >> >> if (!tsk) >> tsk = current; >> >> - if (regs) { >> - frame.fp = regs->regs[29]; >> - frame.sp = regs->sp; >> - frame.pc = regs->pc; >> - } else if (tsk == current) { >> + if (tsk == current) { >> frame.fp = (unsigned long)__builtin_frame_address(0); >> frame.sp = current_stack_pointer; >> frame.pc = (unsigned long)dump_backtrace; >> @@ -173,13 +170,27 @@ static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk) >> frame.graph = tsk->curr_ret_stack; >> #endif >> >> + skip = (regs ? 1 : 0); >> pr_emerg("Call trace:\n"); >> while (1) { >> unsigned long where = frame.pc; >> unsigned long stack; >> int ret; >> >> - dump_backtrace_entry(where); >> + /* skip until specified stack frame */ >> + if (!skip) >> + dump_backtrace_entry(where); >> + else if (frame.fp == regs->regs[29]) { >> + skip = 0; >> + /* >> + * Mostly, this is the case where this function is >> + * called in panic/abort. As exception handler's >> + * stack frame does not contain the corresponding pc >> + * at which an exception has taken place, use regs->pc >> + * instead. >> + */ >> + dump_backtrace_entry(regs->pc); >> + } >> ret = unwind_frame(tsk, &frame); >> if (ret < 0) >> break; >> -- >> 1.7.9.5 > > This fixes up the issue I mentioned. > > Best Regards > Jungseok Lee >
On Nov 26, 2015, at 12:05 PM, AKASHI Takahiro wrote: > Jungseok, Hi Akashi, > On 11/25/2015 08:48 PM, Jungseok Lee wrote: >> On Nov 25, 2015, at 2:29 PM, AKASHI Takahiro wrote: >>> On 11/24/2015 10:37 PM, Jungseok Lee wrote: >>>> On Nov 18, 2015, at 3:43 PM, AKASHI Takahiro wrote: >>>> >>>> Hi Akashi, >>>> >>>>> Function graph tracer modifies a return address (LR) in a stack frame >>>>> to hook a function return. This will result in many useless entries >>>>> (return_to_handler) showing up in >>>>> a) a stack tracer's output >>>>> b) perf call graph (with perf record -g) >>>>> c) dump_backtrace (at panic et al.) >>>>> >>>>> For example, in case of a), >>>>> $ echo function_graph > /sys/kernel/debug/tracing/current_tracer >>>>> $ echo 1 > /proc/sys/kernel/stack_trace_enabled >>>>> $ cat /sys/kernel/debug/tracing/stack_trace >>>>> Depth Size Location (54 entries) >>>>> ----- ---- -------- >>>>> 0) 4504 16 gic_raise_softirq+0x28/0x150 >>>>> 1) 4488 80 smp_cross_call+0x38/0xb8 >>>>> 2) 4408 48 return_to_handler+0x0/0x40 >>>>> 3) 4360 32 return_to_handler+0x0/0x40 >>>>> ... >>>>> >>>>> In case of b), >>>>> $ echo function_graph > /sys/kernel/debug/tracing/current_tracer >>>>> $ perf record -e mem:XXX:x -ag -- sleep 10 >>>>> $ perf report >>>>> ... >>>>> | | |--0.22%-- 0x550f8 >>>>> | | | 0x10888 >>>>> | | | el0_svc_naked >>>>> | | | sys_openat >>>>> | | | return_to_handler >>>>> | | | return_to_handler >>>>> ... >>>>> >>>>> In case of c), >>>>> $ echo function_graph > /sys/kernel/debug/tracing/current_tracer >>>>> $ echo c > /proc/sysrq-trigger >>>>> ... >>>>> Call trace: >>>>> [<ffffffc00044d3ac>] sysrq_handle_crash+0x24/0x30 >>>>> [<ffffffc000092250>] return_to_handler+0x0/0x40 >>>>> [<ffffffc000092250>] return_to_handler+0x0/0x40 >>>>> ... >>>>> >>>>> This patch replaces such entries with real addresses preserved in >>>>> current->ret_stack[] at unwind_frame(). This way, we can cover all >>>>> the cases. >>>> >>>> I've observed a strange behavior when playing with case c). Call trace >>>> is as follows when function_graph is not used. >>>> >>>> Call trace: >>>> [<fffffe00003dc738>] sysrq_handle_crash+0x24/0x30 <- (1) >>>> [<fffffe00003dd2ac>] __handle_sysrq+0x128/0x19c <- (2) >>>> [<fffffe00003dd730>] write_sysrq_trigger+0x60/0x74 >>>> [<fffffe0000249fc4>] proc_reg_write+0x84/0xc0 >>>> [<fffffe00001f2638>] __vfs_write+0x44/0x104 <- (3) >>>> [<fffffe00001f2e60>] vfs_write+0x98/0x1a8 >>>> [<fffffe00001f3730>] SyS_write+0x50/0xb0 >>>> [<fffffe00000939ec>] el0_svc_naked+0x20/0x28 <- (4) >>>> >>>> When function_graph is set, some entries, such as do_mem_abort, are added >>>> between (1) and (2). In addition, entries from (3) to (4) are not printed >>>> out. As tracking down elements of ret_stack[], I realize dump_backtrace() >>>> has been terminated before reaching to ret_stack[0]. Have you seen this >>>> kind of behavior? I believe push & pop operations work correctly. >>>> >>>> Please correct me if I'm wrong. >>> >>> Oops, I mis-interpreted the result output. >>> You are right. >>> This can happen because the original dump_backtrace() wants to trace a stack >>> from a function where an exception has taken place, sysrq_handle_crash(), while >>> ret_stack[curr_ret_stack] doesn't point to that function, but the very top of >>> traced functions in callchains, that is, __do_kernel_fault in your case, probably. >> >> Yes, __do_kernel_fault. >> >>> So it results in replacing entries of return_to_handler to wrong function addresses. >>> >>> A fixup! patch attached below (informative only) fixes this issue by once tracing >>> all the functions on a stack, but preventing a top few ones from being printed. >>> But there is a tricky thing here: we have to use 'regs->pc' instead of frame.pc >>> as a trapped function because, as I've already mentioned before, we always miss >>> the function when walking through a stack from an exception handler to functions >>> in a thread context. >>> (Please note that we will introduce a dummy stack frame at interrupt, but not >>> at exception.) >> >> Thanks for clear explanation! >> >> Since I'm not the only person to experience the above case, I report one more and >> the last observation ;) >> >> PC and LR information is printed out when function_graph is set. >> >> PC is at sysrq_handle_crash+0x24/0x30 >> LR is at sysrq_handle_crash+0x10/0x30 >> >> The logs are as follows when function_graph turns off. >> >> PC is at sysrq_handle_crash+0x24/0x30 >> LR is at __handle_sysrq+0x128/0x19c >> >> I think __show_regs() might have a similar problem when retrieving LR according to >> the below informative patch. Thoughts? > > Well, I believe that it is normal. > Sysrq_handle_crash() is basically a leaf function, but once function or function graph > tracer is turned on, ftrace_caller() is called before its function body and so LR has > been modified when panic/show_regs(). Got it. All aspects of this patch is clear now. Thanks! Best Regards Jungseok Lee
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c index 46053c2..f140029 100644 --- a/arch/arm64/kernel/traps.c +++ b/arch/arm64/kernel/traps.c @@ -147,17 +147,14 @@ static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk) { struct stackframe frame; unsigned long _irq_stack_ptr = per_cpu(irq_stack_ptr, smp_processor_id()); + int skip; pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk); if (!tsk) tsk = current; - if (regs) { - frame.fp = regs->regs[29]; - frame.sp = regs->sp; - frame.pc = regs->pc; - } else if (tsk == current) { + if (tsk == current) { frame.fp = (unsigned long)__builtin_frame_address(0); frame.sp = current_stack_pointer; frame.pc = (unsigned long)dump_backtrace; @@ -173,13 +170,27 @@ static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk) frame.graph = tsk->curr_ret_stack; #endif + skip = (regs ? 1 : 0); pr_emerg("Call trace:\n"); while (1) { unsigned long where = frame.pc; unsigned long stack; int ret; - dump_backtrace_entry(where); + /* skip until specified stack frame */ + if (!skip) + dump_backtrace_entry(where); + else if (frame.fp == regs->regs[29]) { + skip = 0; + /* + * Mostly, this is the case where this function is + * called in panic/abort. As exception handler's + * stack frame does not contain the corresponding pc + * at which an exception has taken place, use regs->pc + * instead. + */ + dump_backtrace_entry(regs->pc); + } ret = unwind_frame(tsk, &frame); if (ret < 0) break;