diff mbox

[v4,2/2] arm64: Expand the stack trace feature to support IRQ stack

Message ID EB44D363-ACBA-48DB-B4EA-EDE1B87B99C8@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jungseok Lee Oct. 14, 2015, 12:24 p.m. UTC
On Oct 14, 2015, at 4:13 PM, AKASHI Takahiro wrote:
> On 10/09/2015 11:24 PM, James Morse wrote:
>> Hi Jungseok,
>> 
>> On 07/10/15 16:28, Jungseok Lee wrote:
>>> Currently, a call trace drops a process stack walk when a separate IRQ
>>> stack is used. It makes a call trace information much less useful when
>>> a system gets paniked in interrupt context.
>> 
>> panicked
>> 
>>> This patch addresses the issue with the following schemes:
>>> 
>>>   - Store aborted stack frame data
>>>   - Decide whether another stack walk is needed or not via current sp
>>>   - Loosen the frame pointer upper bound condition
>> 
>> It may be worth merging this patch with its predecessor - anyone trying to
>> bisect a problem could land between these two patches, and spend time
>> debugging the truncated call traces.
>> 
>> 
>>> diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
>>> index 6ea82e8..e5904a1 100644
>>> --- a/arch/arm64/include/asm/irq.h
>>> +++ b/arch/arm64/include/asm/irq.h
>>> @@ -2,13 +2,25 @@
>>>  #define __ASM_IRQ_H
>>> 
>>>  #include <linux/irqchip/arm-gic-acpi.h>
>>> +#include <asm/stacktrace.h>
>>> 
>>>  #include <asm-generic/irq.h>
>>> 
>>>  struct irq_stack {
>>>  	void *stack;
>>> +	struct stackframe frame;
>>>  };
>>> 
>>> +DECLARE_PER_CPU(struct irq_stack, irq_stacks);
>> 
>> Good idea, storing this in the per-cpu data makes it immune to stack
>> corruption.
> 
> Is this the only reason that you have a dummy stack frame in per-cpu data?
> By placing this frame in an interrupt stack, I think, we will be able to eliminate
> changes in dump_stace(). and
> 
>> 
>>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
>>> index 407991b..5124649 100644
>>> --- a/arch/arm64/kernel/stacktrace.c
>>> +++ b/arch/arm64/kernel/stacktrace.c
>>> @@ -43,7 +43,27 @@ int notrace unwind_frame(struct stackframe *frame)
>>>  	low  = frame->sp;
>>>  	high = ALIGN(low, THREAD_SIZE);
>>> 
>>> -	if (fp < low || fp > high - 0x18 || fp & 0xf)
>>> +	/*
>>> +	 * A frame pointer would reach an upper bound if a prologue of the
>>> +	 * first function of call trace looks as follows:
>>> +	 *
>>> +	 *	stp     x29, x30, [sp,#-16]!
>>> +	 *	mov     x29, sp
>>> +	 *
>>> +	 * Thus, the upper bound is (top of stack - 0x20) with consideration
>> 
>> The terms 'top' and 'bottom' of the stack are confusing, your 'top' appears
>> to be the highest address, which is used first, making it the bottom of the
>> stack.
>> 
>> I would try to use the terms low/est and high/est, in keeping with the
>> variable names in use here.
>> 
>> 
>>> +	 * of a 16-byte empty space in THREAD_START_SP.
>>> +	 *
>>> +	 * The value, 0x20, however, does not cover all cases as interrupts
>>> +	 * are handled using a separate stack. That is, a call trace can start
>>> +	 * from elx_irq exception vectors. The symbols could not be promoted
>>> +	 * to candidates for a stack trace under the restriction, 0x20.
>>> +	 *
>>> +	 * The scenario is handled without complexity as 1) considering
>>> +	 * (bottom of stack + THREAD_START_SP) as a dummy frame pointer, the
>>> +	 * content of which is 0, and 2) allowing the case, which changes
>>> +	 * the value to 0x10 from 0x20.
>> 
>> Where has 0x20 come from? The old value was 0x18.
>> 
>> My understanding is the highest part of the stack looks like this:
>> high        [ off-stack ]
>> high - 0x08 [ left free by THREAD_START_SP ]
>> high - 0x10 [ left free by THREAD_START_SP ]
>> high - 0x18 [#1 x30 ]
>> high - 0x20 [#1 x29 ]
>> 
>> So the condition 'fp > high - 0x18' prevents returning either 'left free'
>> address, or off-stack-value as a frame. Changing it to 'fp > high - 0x10'
>> allows the first half of that reserved area to be a valid stack frame.
>> 
>> This change is breaking perf using incantations [0] and [1]:
>> 
>> Before, with just patch 1/2:
>>                   ---__do_softirq
>>                      |
>>                      |--92.95%-- __handle_domain_irq
>>                      |          __irqentry_text_start
>>                      |          el1_irq
>>                      |
>> 
>> After, with both patches:
>>                  ---__do_softirq
>>                     |
>>                     |--83.83%-- __handle_domain_irq
>>                     |          __irqentry_text_start
>>                     |          el1_irq
>>                     |          |
>>                     |          |--99.39%-- 0x400008040d00000c
>>                     |           --0.61%-- [...]
>>                     |
> 
> This also shows that walk_stackframe() doesn't walk through a process stack.
> Now I'm trying the following hack on top of Jungseok's patch.
> (It doesn't traverse from an irq stack to an process stack yet. I need modify
> unwind_frame().)

I've got a difference between perf and dump_backtrace() as reviewing perf call
chain operation. Perf relies on walk_stackframe(), but dump_backtrace() does not.
That is, a symbol is printed out *before* unwind_frame() call in case of perf.
By contrast, dump_backtrace() records a symbol *after* unwind_frame(). I think
perf behavior is correct since frame.pc is retrieved from a valid stack frame.

So, the following diff is a prerequisite. It looks reasonable to remove dump_mem()
call since frame.sp is calculated incorrectly now. If accepted, dump_backtrace()
could utilize walk_stackframe(), which simplifies the code.

----8<----
> Thanks,
> -Takahiro AKASHI
> ----8<----
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index 650cc05..5fbd1ea 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -185,14 +185,12 @@ alternative_endif
> 	mov	x23, sp
> 	and	x23, x23, #~(THREAD_SIZE - 1)
> 	cmp	x20, x23			// check irq re-enterance
> +	mov	x19, sp
> 	beq	1f
> -	str	x29, [x19, #IRQ_FRAME_FP]
> -	str	x21, [x19, #IRQ_FRAME_SP]
> -	str	x22, [x19, #IRQ_FRAME_PC]
> -	mov	x29, x24
> -1:	mov	x19, sp
> -	csel	x23, x19, x24, eq		// x24 = top of irq stack
> -	mov	sp, x23
> +	mov	sp, x24				// x24 = top of irq stack
> +	stp	x29, x22, [sp, #-32]!
> +	mov	x29, sp
> +1:
> 	.endm
> 
> 	/*

Is it possible to decide which stack is used without aborted SP information?
In addition, I'm curious about an origin of #-32.

Thanks!

Best Regards
Jungseok Lee

Comments

Jungseok Lee Oct. 14, 2015, 12:55 p.m. UTC | #1
On Oct 14, 2015, at 9:24 PM, Jungseok Lee wrote:
> On Oct 14, 2015, at 4:13 PM, AKASHI Takahiro wrote:
>> On 10/09/2015 11:24 PM, James Morse wrote:
>>> Hi Jungseok,
>>> 
>>> On 07/10/15 16:28, Jungseok Lee wrote:
>>>> Currently, a call trace drops a process stack walk when a separate IRQ
>>>> stack is used. It makes a call trace information much less useful when
>>>> a system gets paniked in interrupt context.
>>> 
>>> panicked
>>> 
>>>> This patch addresses the issue with the following schemes:
>>>> 
>>>>  - Store aborted stack frame data
>>>>  - Decide whether another stack walk is needed or not via current sp
>>>>  - Loosen the frame pointer upper bound condition
>>> 
>>> It may be worth merging this patch with its predecessor - anyone trying to
>>> bisect a problem could land between these two patches, and spend time
>>> debugging the truncated call traces.
>>> 
>>> 
>>>> diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
>>>> index 6ea82e8..e5904a1 100644
>>>> --- a/arch/arm64/include/asm/irq.h
>>>> +++ b/arch/arm64/include/asm/irq.h
>>>> @@ -2,13 +2,25 @@
>>>> #define __ASM_IRQ_H
>>>> 
>>>> #include <linux/irqchip/arm-gic-acpi.h>
>>>> +#include <asm/stacktrace.h>
>>>> 
>>>> #include <asm-generic/irq.h>
>>>> 
>>>> struct irq_stack {
>>>> 	void *stack;
>>>> +	struct stackframe frame;
>>>> };
>>>> 
>>>> +DECLARE_PER_CPU(struct irq_stack, irq_stacks);
>>> 
>>> Good idea, storing this in the per-cpu data makes it immune to stack
>>> corruption.
>> 
>> Is this the only reason that you have a dummy stack frame in per-cpu data?
>> By placing this frame in an interrupt stack, I think, we will be able to eliminate
>> changes in dump_stace(). and
>> 
>>> 
>>>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
>>>> index 407991b..5124649 100644
>>>> --- a/arch/arm64/kernel/stacktrace.c
>>>> +++ b/arch/arm64/kernel/stacktrace.c
>>>> @@ -43,7 +43,27 @@ int notrace unwind_frame(struct stackframe *frame)
>>>> 	low  = frame->sp;
>>>> 	high = ALIGN(low, THREAD_SIZE);
>>>> 
>>>> -	if (fp < low || fp > high - 0x18 || fp & 0xf)
>>>> +	/*
>>>> +	 * A frame pointer would reach an upper bound if a prologue of the
>>>> +	 * first function of call trace looks as follows:
>>>> +	 *
>>>> +	 *	stp     x29, x30, [sp,#-16]!
>>>> +	 *	mov     x29, sp
>>>> +	 *
>>>> +	 * Thus, the upper bound is (top of stack - 0x20) with consideration
>>> 
>>> The terms 'top' and 'bottom' of the stack are confusing, your 'top' appears
>>> to be the highest address, which is used first, making it the bottom of the
>>> stack.
>>> 
>>> I would try to use the terms low/est and high/est, in keeping with the
>>> variable names in use here.
>>> 
>>> 
>>>> +	 * of a 16-byte empty space in THREAD_START_SP.
>>>> +	 *
>>>> +	 * The value, 0x20, however, does not cover all cases as interrupts
>>>> +	 * are handled using a separate stack. That is, a call trace can start
>>>> +	 * from elx_irq exception vectors. The symbols could not be promoted
>>>> +	 * to candidates for a stack trace under the restriction, 0x20.
>>>> +	 *
>>>> +	 * The scenario is handled without complexity as 1) considering
>>>> +	 * (bottom of stack + THREAD_START_SP) as a dummy frame pointer, the
>>>> +	 * content of which is 0, and 2) allowing the case, which changes
>>>> +	 * the value to 0x10 from 0x20.
>>> 
>>> Where has 0x20 come from? The old value was 0x18.
>>> 
>>> My understanding is the highest part of the stack looks like this:
>>> high        [ off-stack ]
>>> high - 0x08 [ left free by THREAD_START_SP ]
>>> high - 0x10 [ left free by THREAD_START_SP ]
>>> high - 0x18 [#1 x30 ]
>>> high - 0x20 [#1 x29 ]
>>> 
>>> So the condition 'fp > high - 0x18' prevents returning either 'left free'
>>> address, or off-stack-value as a frame. Changing it to 'fp > high - 0x10'
>>> allows the first half of that reserved area to be a valid stack frame.
>>> 
>>> This change is breaking perf using incantations [0] and [1]:
>>> 
>>> Before, with just patch 1/2:
>>>                  ---__do_softirq
>>>                     |
>>>                     |--92.95%-- __handle_domain_irq
>>>                     |          __irqentry_text_start
>>>                     |          el1_irq
>>>                     |
>>> 
>>> After, with both patches:
>>>                 ---__do_softirq
>>>                    |
>>>                    |--83.83%-- __handle_domain_irq
>>>                    |          __irqentry_text_start
>>>                    |          el1_irq
>>>                    |          |
>>>                    |          |--99.39%-- 0x400008040d00000c
>>>                    |           --0.61%-- [...]
>>>                    |
>> 
>> This also shows that walk_stackframe() doesn't walk through a process stack.
>> Now I'm trying the following hack on top of Jungseok's patch.
>> (It doesn't traverse from an irq stack to an process stack yet. I need modify
>> unwind_frame().)
> 
> I've got a difference between perf and dump_backtrace() as reviewing perf call
> chain operation. Perf relies on walk_stackframe(), but dump_backtrace() does not.
> That is, a symbol is printed out *before* unwind_frame() call in case of perf.
> By contrast, dump_backtrace() records a symbol *after* unwind_frame(). I think
> perf behavior is correct since frame.pc is retrieved from a valid stack frame.
> 
> So, the following diff is a prerequisite. It looks reasonable to remove dump_mem()
> call since frame.sp is calculated incorrectly now. If accepted, dump_backtrace()
> could utilize walk_stackframe(), which simplifies the code.
> 
> ----8<----
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index f93aae5..e18be43 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -103,12 +103,15 @@ static void dump_mem(const char *lvl, const char *str, unsigned long bottom,
>        set_fs(fs);
> }
> 
> -static void dump_backtrace_entry(unsigned long where, unsigned long stack)
> +static void dump_backtrace_entry(unsigned long where)
> {
> +       /*
> +        * PC has a physical address when MMU is disabled.
> +        */
> +       if (!kernel_text_address(where))
> +               where = (unsigned long)phys_to_virt(where);
> +
>        print_ip_sym(where);
> -       if (in_exception_text(where))
> -               dump_mem("", "Exception stack", stack,
> -                        stack + sizeof(struct pt_regs), false);
> }
> 
> static void dump_instr(const char *lvl, struct pt_regs *regs)
> @@ -172,12 +175,17 @@ static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
>        pr_emerg("Call trace:\n");
>        while (1) {
>                unsigned long where = frame.pc;
> +               unsigned long stack;
>                int ret;
> 
> +               dump_backtrace_entry(where);
>                ret = unwind_frame(&frame);
>                if (ret < 0)
>                        break;
> -               dump_backtrace_entry(where, frame.sp);
> +               stack = frame.sp;
> +               if (in_exception_text(where))
> +                       dump_mem("", "Exception stack", stack,
> +                                stack + sizeof(struct pt_regs), false);
>        }
> }
> ----8<----
> 
>> Thanks,
>> -Takahiro AKASHI
>> ----8<----
>> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
>> index 650cc05..5fbd1ea 100644
>> --- a/arch/arm64/kernel/entry.S
>> +++ b/arch/arm64/kernel/entry.S
>> @@ -185,14 +185,12 @@ alternative_endif
>> 	mov	x23, sp
>> 	and	x23, x23, #~(THREAD_SIZE - 1)
>> 	cmp	x20, x23			// check irq re-enterance
>> +	mov	x19, sp
>> 	beq	1f
>> -	str	x29, [x19, #IRQ_FRAME_FP]
>> -	str	x21, [x19, #IRQ_FRAME_SP]
>> -	str	x22, [x19, #IRQ_FRAME_PC]
>> -	mov	x29, x24
>> -1:	mov	x19, sp
>> -	csel	x23, x19, x24, eq		// x24 = top of irq stack
>> -	mov	sp, x23
>> +	mov	sp, x24				// x24 = top of irq stack
>> +	stp	x29, x22, [sp, #-32]!
>> +	mov	x29, sp
>> +1:
>> 	.endm
>> 
>> 	/*
> 
> Is it possible to decide which stack is used without aborted SP information?

We could know which stack is used via current SP, but how could we decide
a variable 'low' in unwind_frame() when walking a process stack?

Sorry for confusion.

Best Regards
Jungseok Lee
diff mbox

Patch

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index f93aae5..e18be43 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -103,12 +103,15 @@  static void dump_mem(const char *lvl, const char *str, unsigned long bottom,
        set_fs(fs);
 }
 
-static void dump_backtrace_entry(unsigned long where, unsigned long stack)
+static void dump_backtrace_entry(unsigned long where)
 {
+       /*
+        * PC has a physical address when MMU is disabled.
+        */
+       if (!kernel_text_address(where))
+               where = (unsigned long)phys_to_virt(where);
+
        print_ip_sym(where);
-       if (in_exception_text(where))
-               dump_mem("", "Exception stack", stack,
-                        stack + sizeof(struct pt_regs), false);
 }
 
 static void dump_instr(const char *lvl, struct pt_regs *regs)
@@ -172,12 +175,17 @@  static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
        pr_emerg("Call trace:\n");
        while (1) {
                unsigned long where = frame.pc;
+               unsigned long stack;
                int ret;
 
+               dump_backtrace_entry(where);
                ret = unwind_frame(&frame);
                if (ret < 0)
                        break;
-               dump_backtrace_entry(where, frame.sp);
+               stack = frame.sp;
+               if (in_exception_text(where))
+                       dump_mem("", "Exception stack", stack,
+                                stack + sizeof(struct pt_regs), false);
        }
 }
----8<----