Message ID | 20210812190603.25326-4-madvenka@linux.microsoft.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm64: Reorganize the unwinder and implement stack trace reliability checks | expand |
Hi Madhavan, > @@ -245,7 +271,36 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry, > fp = thread_saved_fp(task); > pc = thread_saved_pc(task); > } > - unwind(consume_entry, cookie, task, fp, pc); > + unwind(consume_entry, cookie, task, fp, pc, false); > +} > + > +/* > + * arch_stack_walk_reliable() may not be used for livepatch until all of > + * the reliability checks are in place in unwind_consume(). However, > + * debug and test code can choose to use it even if all the checks are not > + * in place. > + */ I'm glad to see the long-awaited function :) Does the above comment mean that this comment will be removed by another patch series that about live patch enablement, instead of [PATCH 4/4]? It seems to take time... But I start thinking about test code. Thanks, Keiya > +noinline int notrace arch_stack_walk_reliable(stack_trace_consume_fn consume_fn, > + void *cookie, > + struct task_struct *task) > +{ > + unsigned long fp, pc; > + > + if (!task) > + task = current; > + > + if (task == current) { > + /* Skip arch_stack_walk_reliable() in the stack trace. */ > + fp = (unsigned long)__builtin_frame_address(1); > + pc = (unsigned long)__builtin_return_address(0); > + } else { > + /* Caller guarantees that the task is not running. */ > + fp = thread_saved_fp(task); > + pc = thread_saved_pc(task); > + } > + if (unwind(consume_fn, cookie, task, fp, pc, true)) > + return 0; > + return -EINVAL; > } > > #endif > -- > 2.25.1
On 8/24/21 12:55 AM, nobuta.keiya@fujitsu.com wrote: > Hi Madhavan, > >> @@ -245,7 +271,36 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry, >> fp = thread_saved_fp(task); >> pc = thread_saved_pc(task); >> } >> - unwind(consume_entry, cookie, task, fp, pc); >> + unwind(consume_entry, cookie, task, fp, pc, false); >> +} >> + >> +/* >> + * arch_stack_walk_reliable() may not be used for livepatch until all of >> + * the reliability checks are in place in unwind_consume(). However, >> + * debug and test code can choose to use it even if all the checks are not >> + * in place. >> + */ > > I'm glad to see the long-awaited function :) > > Does the above comment mean that this comment will be removed by > another patch series that about live patch enablement, instead of [PATCH 4/4]? > > It seems to take time... But I start thinking about test code. > Yes. This comment will be removed when livepatch will be enabled eventually. So, AFAICT, there are 4 pieces that are needed: - Reliable stack trace in the kernel. I am trying to address that with my patch series. - Mark Rutland's work for making patching safe on ARM64. - Objtool (or alternative method) for stack validation. - Suraj Jitindar Singh's patch for miscellaneous things needed to enable live patch. Once all of these pieces are in place, livepatch can be enabled. That said, arch_stack_walk_reliable() can be used for test and debug purposes anytime once this patch series gets accepted. Thanks. Madhavan
> > Hi Madhavan, > > > >> @@ -245,7 +271,36 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry, > >> fp = thread_saved_fp(task); > >> pc = thread_saved_pc(task); > >> } > >> - unwind(consume_entry, cookie, task, fp, pc); > >> + unwind(consume_entry, cookie, task, fp, pc, false); } > >> + > >> +/* > >> + * arch_stack_walk_reliable() may not be used for livepatch until > >> +all of > >> + * the reliability checks are in place in unwind_consume(). However, > >> + * debug and test code can choose to use it even if all the checks > >> +are not > >> + * in place. > >> + */ > > > > I'm glad to see the long-awaited function :) > > > > Does the above comment mean that this comment will be removed by > > another patch series that about live patch enablement, instead of [PATCH 4/4]? > > > > It seems to take time... But I start thinking about test code. > > > > Yes. This comment will be removed when livepatch will be enabled eventually. > So, AFAICT, there are 4 pieces that are needed: > > - Reliable stack trace in the kernel. I am trying to address that with my patch > series. > > - Mark Rutland's work for making patching safe on ARM64. > > - Objtool (or alternative method) for stack validation. > > - Suraj Jitindar Singh's patch for miscellaneous things needed to enable live patch. > > Once all of these pieces are in place, livepatch can be enabled. > > That said, arch_stack_walk_reliable() can be used for test and debug purposes anytime once this patch series gets accepted. > > Thanks. > > Madhavan Thank you for the information. Keiya
On Thu, Aug 12, 2021 at 02:06:02PM -0500, madvenka@linux.microsoft.com wrote: > + if (frame->need_reliable && !unwind_is_reliable(frame)) { > + /* Cannot unwind to the next frame reliably. */ > + frame->failed = true; > + return false; > + } This means we only collect reliability information in the case where we're specifically doing a reliable stacktrace. For example when printing stack traces on the console it might be useful to print a ? or something if the frame is unreliable as a hint to the reader that the information might be misleading. Could we therefore change the flag here to a reliability one and our need_reliable check so that we always run unwind_is_reliable()? I'm not sure if we need to abandon the trace on first error when doing a reliable trace but I can see it's a bit safer so perhaps better to do so. If we don't abandon then we don't require the need_reliable check at all.
On 8/26/21 10:57 AM, Mark Brown wrote: > On Thu, Aug 12, 2021 at 02:06:02PM -0500, madvenka@linux.microsoft.com wrote: > >> + if (frame->need_reliable && !unwind_is_reliable(frame)) { >> + /* Cannot unwind to the next frame reliably. */ >> + frame->failed = true; >> + return false; >> + } > > This means we only collect reliability information in the case > where we're specifically doing a reliable stacktrace. For > example when printing stack traces on the console it might be > useful to print a ? or something if the frame is unreliable as a > hint to the reader that the information might be misleading. > Could we therefore change the flag here to a reliability one and > our need_reliable check so that we always run > unwind_is_reliable()? > > I'm not sure if we need to abandon the trace on first error when > doing a reliable trace but I can see it's a bit safer so perhaps > better to do so. If we don't abandon then we don't require the > need_reliable check at all. > I think that the caller should be able to specify that the stack trace should be abandoned. Like Livepatch. So, we could always do the reliability check. But keep need_reliable. Thanks. Madhavan
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h index 407007376e97..65ea151da5da 100644 --- a/arch/arm64/include/asm/stacktrace.h +++ b/arch/arm64/include/asm/stacktrace.h @@ -53,6 +53,9 @@ struct stack_info { * replacement lr value in the ftrace graph stack. * * @failed: Unwind failed. + * + * @need_reliable The caller needs a reliable stack trace. Treat any + * unreliability as a fatal error. */ struct stackframe { struct task_struct *task; @@ -65,6 +68,7 @@ struct stackframe { int graph; #endif bool failed; + bool need_reliable; }; extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk, diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c index ec8f5163c4d0..b60f8a20ba64 100644 --- a/arch/arm64/kernel/stacktrace.c +++ b/arch/arm64/kernel/stacktrace.c @@ -34,7 +34,8 @@ static void notrace unwind_start(struct stackframe *frame, struct task_struct *task, - unsigned long fp, unsigned long pc) + unsigned long fp, unsigned long pc, + bool need_reliable) { frame->task = task; frame->fp = fp; @@ -56,6 +57,7 @@ static void notrace unwind_start(struct stackframe *frame, frame->prev_fp = 0; frame->prev_type = STACK_TYPE_UNKNOWN; frame->failed = false; + frame->need_reliable = need_reliable; } NOKPROBE_SYMBOL(unwind_start); @@ -178,6 +180,23 @@ void show_stack(struct task_struct *tsk, unsigned long *sp, const char *loglvl) barrier(); } +/* + * Check the stack frame for conditions that make further unwinding unreliable. + */ +static bool notrace unwind_is_reliable(struct stackframe *frame) +{ + /* + * If the PC is not a known kernel text address, then we cannot + * be sure that a subsequent unwind will be reliable, as we + * don't know that the code follows our unwind requirements. + */ + if (!__kernel_text_address(frame->pc)) + return false; + return true; +} + +NOKPROBE_SYMBOL(unwind_is_reliable); + static bool notrace unwind_consume(struct stackframe *frame, stack_trace_consume_fn consume_entry, void *cookie) @@ -197,6 +216,12 @@ static bool notrace unwind_consume(struct stackframe *frame, /* Final frame; nothing to unwind */ return false; } + + if (frame->need_reliable && !unwind_is_reliable(frame)) { + /* Cannot unwind to the next frame reliably. */ + frame->failed = true; + return false; + } return true; } @@ -210,11 +235,12 @@ static inline bool unwind_failed(struct stackframe *frame) /* Core unwind function */ static bool notrace unwind(stack_trace_consume_fn consume_entry, void *cookie, struct task_struct *task, - unsigned long fp, unsigned long pc) + unsigned long fp, unsigned long pc, + bool need_reliable) { struct stackframe frame; - unwind_start(&frame, task, fp, pc); + unwind_start(&frame, task, fp, pc, need_reliable); while (unwind_consume(&frame, consume_entry, cookie)) unwind_next(&frame); return !unwind_failed(&frame); @@ -245,7 +271,36 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry, fp = thread_saved_fp(task); pc = thread_saved_pc(task); } - unwind(consume_entry, cookie, task, fp, pc); + unwind(consume_entry, cookie, task, fp, pc, false); +} + +/* + * arch_stack_walk_reliable() may not be used for livepatch until all of + * the reliability checks are in place in unwind_consume(). However, + * debug and test code can choose to use it even if all the checks are not + * in place. + */ +noinline int notrace arch_stack_walk_reliable(stack_trace_consume_fn consume_fn, + void *cookie, + struct task_struct *task) +{ + unsigned long fp, pc; + + if (!task) + task = current; + + if (task == current) { + /* Skip arch_stack_walk_reliable() in the stack trace. */ + fp = (unsigned long)__builtin_frame_address(1); + pc = (unsigned long)__builtin_return_address(0); + } else { + /* Caller guarantees that the task is not running. */ + fp = thread_saved_fp(task); + pc = thread_saved_pc(task); + } + if (unwind(consume_fn, cookie, task, fp, pc, true)) + return 0; + return -EINVAL; } #endif