Message ID | 1709516385-7778-1-git-send-email-xiaojiangfeng@huawei.com (mailing list archive) |
---|---|
State | Rejected |
Headers | show |
Series | usercopy: delete __noreturn from usercopy_abort | expand |
On Mon, Mar 4, 2024 at 3:02 AM Jiangfeng Xiao <xiaojiangfeng@huawei.com> wrote: > When the last instruction of a noreturn function is a call > to another function, the return address falls outside > of the function boundary. This seems to cause kernel > to interrupt the backtrace. [...] > Delete __noreturn from usercopy_abort, This sounds like the actual bug is in the backtracing logic? I don't think removing __noreturn annotations from an individual function is a good fix, since the same thing can happen with other __noreturn functions depending on what choices the compiler makes.
On Mon, Mar 04, 2024 at 04:15:07PM +0100, Jann Horn wrote: > On Mon, Mar 4, 2024 at 3:02 AM Jiangfeng Xiao <xiaojiangfeng@huawei.com> wrote: > > When the last instruction of a noreturn function is a call > > to another function, the return address falls outside > > of the function boundary. This seems to cause kernel > > to interrupt the backtrace. FWIW, all email from huawei.com continues to get eaten by anti-spam checking. I've reported this a few times -- it'd be really nice if the domain configuration could get fixed. > [...] > > Delete __noreturn from usercopy_abort, > > This sounds like the actual bug is in the backtracing logic? I don't > think removing __noreturn annotations from an individual function is a > good fix, since the same thing can happen with other __noreturn > functions depending on what choices the compiler makes. Yeah, NAK. usercopy_abort() doesn't return. It ends with BUG().
On 2024/3/4 23:15, Jann Horn wrote: > On Mon, Mar 4, 2024 at 3:02 AM Jiangfeng Xiao <xiaojiangfeng@huawei.com> wrote: >> When the last instruction of a noreturn function is a call >> to another function, the return address falls outside >> of the function boundary. This seems to cause kernel >> to interrupt the backtrace. > [...] >> Delete __noreturn from usercopy_abort, > > This sounds like the actual bug is in the backtracing logic? I don't > think removing __noreturn annotations from an individual function is a > good fix, since the same thing can happen with other __noreturn > functions depending on what choices the compiler makes. > . > Yes, you make a point. This may be a bug is in the backtracing logic, but the kernel backtracing always parses symbols using (lr) instead of (lr-4). This may be due to historical reasons or more comprehensive considerations. In addition, modifying the implementation logic of the kernel backtracing has a great impact. Therefore, I do not dare to modify the implementation logic of the kernel backtracing. Not all noreturn functions are ended with calling other functions. Therefore, only a few individual functions may have the same problem. In addition, deleting '__noreturn' from usercopy_abort does not change the internal behavior of usercopy_abort function. Therefore, there is no risk. Deleting '__noreturn' from usercopy_abort is the solution that I can think of with minimal impact and minimum risk. If you will submit a better patch to solve this problem, I would like to learn from you. Thank you.
On 2024/3/5 10:54, Jiangfeng Xiao wrote: > > > On 2024/3/4 23:15, Jann Horn wrote: >> On Mon, Mar 4, 2024 at 3:02 AM Jiangfeng Xiao <xiaojiangfeng@huawei.com> wrote: >>> When the last instruction of a noreturn function is a call >>> to another function, the return address falls outside >>> of the function boundary. This seems to cause kernel >>> to interrupt the backtrace. >> [...] >>> Delete __noreturn from usercopy_abort, >> >> This sounds like the actual bug is in the backtracing logic? I don't >> think removing __noreturn annotations from an individual function is a >> good fix, since the same thing can happen with other __noreturn >> functions depending on what choices the compiler makes. >> . >> > Yes, you make a point. This may be a bug is in the backtracing logic, but > the kernel backtracing always parses symbols using (lr) instead of (lr-4). > This may be due to historical reasons or more comprehensive considerations. > In addition, modifying the implementation logic of the kernel backtracing > has a great impact. Therefore, I do not dare to modify the implementation > logic of the kernel backtracing. > > Not all noreturn functions are ended with calling other functions. > Therefore, only a few individual functions may have the same problem. > In addition, deleting '__noreturn' from usercopy_abort does not > change the internal behavior of usercopy_abort function. > Therefore, there is no risk. Deleting '__noreturn' from usercopy_abort > is the solution that I can think of with minimal impact and minimum risk. > > If you will submit a better patch to solve this problem, > I would like to learn from you. Thank you. > What are your suggestions on modifying the kernel backtracing logic or deleting '__noretrun'? If you insist on modifying the kernel backtracing logic and persuade me with good reasons, I can also try to submit the patch for modifying the kernel backtracing logic to the community.
On 2024/3/5 1:40, Kees Cook wrote: > On Mon, Mar 04, 2024 at 04:15:07PM +0100, Jann Horn wrote: >> On Mon, Mar 4, 2024 at 3:02 AM Jiangfeng Xiao <xiaojiangfeng@huawei.com> wrote: >>> When the last instruction of a noreturn function is a call >>> to another function, the return address falls outside >>> of the function boundary. This seems to cause kernel >>> to interrupt the backtrace. > > FWIW, all email from huawei.com continues to get eaten by anti-spam > checking. I've reported this a few times -- it'd be really nice if the > domain configuration could get fixed. > >> [...] >>> Delete __noreturn from usercopy_abort, >> >> This sounds like the actual bug is in the backtracing logic? I don't >> think removing __noreturn annotations from an individual function is a >> good fix, since the same thing can happen with other __noreturn >> functions depending on what choices the compiler makes. > > Yeah, NAK. usercopy_abort() doesn't return. It ends with BUG(). > When the user directly or indirectly calls usercopy_abort, the final call stack is incorrect, and the code where the problem occurs cannot be located. In this case, the user will be frustrated. For the usercopy_abort function, whether '__noreturn' is added does not affect the internal behavior of the usercopy_abort function. Therefore, it is recommended that '__noreturn' be deleted so that backtrace can work properly.
On Tue, Mar 05, 2024 at 11:31:06AM +0800, Jiangfeng Xiao wrote: > > > On 2024/3/5 1:40, Kees Cook wrote: > > On Mon, Mar 04, 2024 at 04:15:07PM +0100, Jann Horn wrote: > >> On Mon, Mar 4, 2024 at 3:02 AM Jiangfeng Xiao <xiaojiangfeng@huawei.com> wrote: > >>> When the last instruction of a noreturn function is a call > >>> to another function, the return address falls outside > >>> of the function boundary. This seems to cause kernel > >>> to interrupt the backtrace. > > > > FWIW, all email from huawei.com continues to get eaten by anti-spam > > checking. I've reported this a few times -- it'd be really nice if the > > domain configuration could get fixed. > > > >> [...] > >>> Delete __noreturn from usercopy_abort, > >> > >> This sounds like the actual bug is in the backtracing logic? I don't > >> think removing __noreturn annotations from an individual function is a > >> good fix, since the same thing can happen with other __noreturn > >> functions depending on what choices the compiler makes. > > > > Yeah, NAK. usercopy_abort() doesn't return. It ends with BUG(). > > > When the user directly or indirectly calls usercopy_abort, > the final call stack is incorrect, and the > code where the problem occurs cannot be located. > In this case, the user will be frustrated. Can you please give an example of this? > For the usercopy_abort function, whether '__noreturn' is added > does not affect the internal behavior of the usercopy_abort function. > Therefore, it is recommended that '__noreturn' be deleted > so that backtrace can work properly. This isn't acceptable. Removing __noreturn this will break objtool's processing of execution flow for livepatching, IBT, and KCFI instrumentation. These all depend on an accurate control flow descriptions, and usercopy_abort is correctly marked __noreturn.
On 2024/3/5 17:32, Kees Cook wrote: > On Tue, Mar 05, 2024 at 11:31:06AM +0800, Jiangfeng Xiao wrote: >> >> >> On 2024/3/5 1:40, Kees Cook wrote: >>> On Mon, Mar 04, 2024 at 04:15:07PM +0100, Jann Horn wrote: >>>> On Mon, Mar 4, 2024 at 3:02 AM Jiangfeng Xiao <xiaojiangfeng@huawei.com> wrote: >>>>> When the last instruction of a noreturn function is a call >>>>> to another function, the return address falls outside >>>>> of the function boundary. This seems to cause kernel >>>>> to interrupt the backtrace. >>> >>> FWIW, all email from huawei.com continues to get eaten by anti-spam >>> checking. I've reported this a few times -- it'd be really nice if the >>> domain configuration could get fixed. >>> >>>> [...] >>>>> Delete __noreturn from usercopy_abort, >>>> >>>> This sounds like the actual bug is in the backtracing logic? I don't >>>> think removing __noreturn annotations from an individual function is a >>>> good fix, since the same thing can happen with other __noreturn >>>> functions depending on what choices the compiler makes. >>> >>> Yeah, NAK. usercopy_abort() doesn't return. It ends with BUG(). >>> >> When the user directly or indirectly calls usercopy_abort, >> the final call stack is incorrect, and the >> code where the problem occurs cannot be located. >> In this case, the user will be frustrated. > > Can you please give an example of this? The main configurations of my kernel are as follows: CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE is enabled. (This config uses the compilation parameter -O2.) CONFIG_RELOCATABLE is disabled. (This config uses the compilation option -fpic.) You can use the following kernel module testcase to reproduce this problem on the ARM32 platform. ``` #include <linux/module.h> #include <linux/sched.h> static volatile size_t unconst = 0; /* check_object_size __check_object_size check_kernel_text_object usercopy_abort("kernel text", ...) */ void test_usercopy_kernel(void) { check_object_size(schedule, unconst + PAGE_SIZE, 1); } static int __init test_usercopy_init(void) { test_usercopy_kernel(); return 0; } static void __exit test_usercopy_exit(void) { } module_init(test_usercopy_init); module_exit(test_usercopy_exit); MODULE_LICENSE("GPL"); ``` > >> For the usercopy_abort function, whether '__noreturn' is added >> does not affect the internal behavior of the usercopy_abort function. >> Therefore, it is recommended that '__noreturn' be deleted >> so that backtrace can work properly. > > This isn't acceptable. Removing __noreturn this will break > objtool's processing of execution flow for livepatching, IBT, and > KCFI instrumentation. These all depend on an accurate control flow > descriptions, and usercopy_abort is correctly marked __noreturn. > Thank you for providing this information. I'll go back to further understand how __noreturn is used in features such as KCFI and livepatching.
> >> For the usercopy_abort function, whether '__noreturn' is added > >> does not affect the internal behavior of the usercopy_abort function. > >> Therefore, it is recommended that '__noreturn' be deleted > >> so that backtrace can work properly. > > > > This isn't acceptable. Removing __noreturn this will break > > objtool's processing of execution flow for livepatching, IBT, and > > KCFI instrumentation. These all depend on an accurate control flow > > descriptions, and usercopy_abort is correctly marked __noreturn. __noreturn also has the benefit of enabling the compiler to produce more compact code for callees. > Thank you for providing this information. > I'll go back to further understand how __noreturn is used > in features such as KCFI and livepatching. Adding ARM folks -- see https://lkml.kernel.org/lkml/1709516385-7778-1-git-send-email-xiaojiangfeng@huawei.com for the original bug report. This is an off-by-one bug which is common in unwinders, due to the fact that the address on the stack points to the return address rather than the call address. So, for example, when the last instruction of a function is a function call (e.g., to a noreturn function), it can cause the unwinder to incorrectly try to unwind from the function *after* the callee. For ORC (x86), we fixed this by decrementing the PC for call frames (but not exception frames). I've seen user space unwinders do similar, for non-signal frames. Something like the following might fix your issue (completely untested): diff --git a/arch/arm/include/asm/stacktrace.h b/arch/arm/include/asm/stacktrace.h index 360f0d2406bf..4891e38cdc1f 100644 --- a/arch/arm/include/asm/stacktrace.h +++ b/arch/arm/include/asm/stacktrace.h @@ -21,9 +21,7 @@ struct stackframe { struct llist_node *kr_cur; struct task_struct *tsk; #endif -#ifdef CONFIG_UNWINDER_FRAME_POINTER bool ex_frame; -#endif }; static __always_inline @@ -37,9 +35,8 @@ void arm_get_current_stackframe(struct pt_regs *regs, struct stackframe *frame) frame->kr_cur = NULL; frame->tsk = current; #endif -#ifdef CONFIG_UNWINDER_FRAME_POINTER - frame->ex_frame = in_entry_text(frame->pc); -#endif + frame->ex_frame = !!regs; + } extern int unwind_frame(struct stackframe *frame); diff --git a/arch/arm/kernel/stacktrace.c b/arch/arm/kernel/stacktrace.c index 620aa82e3bdd..caed7436da09 100644 --- a/arch/arm/kernel/stacktrace.c +++ b/arch/arm/kernel/stacktrace.c @@ -154,9 +154,6 @@ static void start_stack_trace(struct stackframe *frame, struct task_struct *task frame->kr_cur = NULL; frame->tsk = task; #endif -#ifdef CONFIG_UNWINDER_FRAME_POINTER - frame->ex_frame = in_entry_text(frame->pc); -#endif } void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie, @@ -167,6 +164,7 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie, if (regs) { start_stack_trace(&frame, NULL, regs->ARM_fp, regs->ARM_sp, regs->ARM_lr, regs->ARM_pc); + frame.ex_frame = true; } else if (task != current) { #ifdef CONFIG_SMP /* @@ -180,6 +178,7 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie, thread_saved_sp(task), 0, thread_saved_pc(task)); #endif + frame.ex_frame = false; } else { here: start_stack_trace(&frame, task, @@ -187,6 +186,7 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie, current_stack_pointer, (unsigned long)__builtin_return_address(0), (unsigned long)&&here); + frame.ex_frame = false; /* skip this function */ if (unwind_frame(&frame)) return; diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c index 3bad79db5d6e..46a5b1eb3f0a 100644 --- a/arch/arm/kernel/traps.c +++ b/arch/arm/kernel/traps.c @@ -84,10 +84,10 @@ void dump_backtrace_entry(unsigned long where, unsigned long from, printk("%sFunction entered at [<%08lx>] from [<%08lx>]\n", loglvl, where, from); #elif defined CONFIG_BACKTRACE_VERBOSE - printk("%s[<%08lx>] (%ps) from [<%08lx>] (%pS)\n", + printk("%s[<%08lx>] (%ps) from [<%08lx>] (%pB)\n", loglvl, where, (void *)where, from, (void *)from); #else - printk("%s %ps from %pS\n", loglvl, (void *)where, (void *)from); + printk("%s %ps from %pB\n", loglvl, (void *)where, (void *)from); #endif if (in_entry_text(from) && end <= ALIGN(frame, THREAD_SIZE)) diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c index 9d2192156087..99ded32196af 100644 --- a/arch/arm/kernel/unwind.c +++ b/arch/arm/kernel/unwind.c @@ -407,7 +407,7 @@ int unwind_frame(struct stackframe *frame) { const struct unwind_idx *idx; struct unwind_ctrl_block ctrl; - unsigned long sp_low; + unsigned long sp_low, pc; /* store the highest address on the stack to avoid crossing it*/ sp_low = frame->sp; @@ -417,19 +417,22 @@ int unwind_frame(struct stackframe *frame) pr_debug("%s(pc = %08lx lr = %08lx sp = %08lx)\n", __func__, frame->pc, frame->lr, frame->sp); - idx = unwind_find_idx(frame->pc); + pc = frame->ex_frame ? frame->pc : frame->pc - 4; + + idx = unwind_find_idx(pc); if (!idx) { - if (frame->pc && kernel_text_address(frame->pc)) { - if (in_module_plt(frame->pc) && frame->pc != frame->lr) { + if (kernel_text_address(pc)) { + if (in_module_plt(pc) && frame->pc != frame->lr) { /* * Quoting Ard: Veneers only set PC using a * PC+immediate LDR, and so they don't affect * the state of the stack or the register file */ frame->pc = frame->lr; + frame->ex_frame = false; return URC_OK; } - pr_warn("unwind: Index not found %08lx\n", frame->pc); + pr_warn("unwind: Index not found %08lx\n", pc); } return -URC_FAILURE; } @@ -442,7 +445,7 @@ int unwind_frame(struct stackframe *frame) if (idx->insn == 1) /* can't unwind */ return -URC_FAILURE; - else if (frame->pc == prel31_to_addr(&idx->addr_offset)) { + else if (frame->ex_frame && pc == prel31_to_addr(&idx->addr_offset)) { /* * Unwinding is tricky when we're halfway through the prologue, * since the stack frame that the unwinder expects may not be @@ -451,9 +454,10 @@ int unwind_frame(struct stackframe *frame) * a function, we are still effectively in the stack frame of * the caller, and the unwind info has no relevance yet. */ - if (frame->pc == frame->lr) + if (pc == frame->lr) return -URC_FAILURE; frame->pc = frame->lr; + frame->ex_frame = false; return URC_OK; } else if ((idx->insn & 0x80000000) == 0) /* prel31 to the unwind table */ @@ -515,6 +519,7 @@ int unwind_frame(struct stackframe *frame) frame->lr = ctrl.vrs[LR]; frame->pc = ctrl.vrs[PC]; frame->lr_addr = ctrl.lr_addr; + frame->ex_frame = false; return URC_OK; } @@ -544,6 +549,7 @@ void unwind_backtrace(struct pt_regs *regs, struct task_struct *tsk, */ here: frame.pc = (unsigned long)&&here; + frame.ex_frame = false; } else { /* task blocked in __switch_to */ frame.fp = thread_saved_fp(tsk); @@ -554,11 +560,12 @@ void unwind_backtrace(struct pt_regs *regs, struct task_struct *tsk, */ frame.lr = 0; frame.pc = thread_saved_pc(tsk); + frame.ex_frame = false; } while (1) { int urc; - unsigned long where = frame.pc; + unsigned long where = frame.ex_frame ? frame.pc : frame.pc - 4; urc = unwind_frame(&frame); if (urc < 0)
On 2024/3/6 1:58, Josh Poimboeuf wrote: >>>> For the usercopy_abort function, whether '__noreturn' is added >>>> does not affect the internal behavior of the usercopy_abort function. >>>> Therefore, it is recommended that '__noreturn' be deleted >>>> so that backtrace can work properly. >>> >>> This isn't acceptable. Removing __noreturn this will break >>> objtool's processing of execution flow for livepatching, IBT, and >>> KCFI instrumentation. These all depend on an accurate control flow >>> descriptions, and usercopy_abort is correctly marked __noreturn. > > __noreturn also has the benefit of enabling the compiler to produce more > compact code for callees. > >> Thank you for providing this information. >> I'll go back to further understand how __noreturn is used >> in features such as KCFI and livepatching. > > Adding ARM folks -- see > https://lkml.kernel.org/lkml/1709516385-7778-1-git-send-email-xiaojiangfeng@huawei.com > for the original bug report. > > This is an off-by-one bug which is common in unwinders, due to the fact > that the address on the stack points to the return address rather than > the call address. > Thanks for your advice. I think I understand. To solve this problem, I need to fix the off-by-one bug which is common in unwinders. I'll try to fix it later by referring to your patch. > So, for example, when the last instruction of a function is a function > call (e.g., to a noreturn function), it can cause the unwinder to > incorrectly try to unwind from the function *after* the callee. > > For ORC (x86), we fixed this by decrementing the PC for call frames (but > not exception frames). I've seen user space unwinders do similar, for > non-signal frames. > > Something like the following might fix your issue (completely untested): > > diff --git a/arch/arm/include/asm/stacktrace.h b/arch/arm/include/asm/stacktrace.h > index 360f0d2406bf..4891e38cdc1f 100644 > --- a/arch/arm/include/asm/stacktrace.h > +++ b/arch/arm/include/asm/stacktrace.h > @@ -21,9 +21,7 @@ struct stackframe { > struct llist_node *kr_cur; > struct task_struct *tsk; > #endif > -#ifdef CONFIG_UNWINDER_FRAME_POINTER > bool ex_frame; > -#endif > }; > > static __always_inline > @@ -37,9 +35,8 @@ void arm_get_current_stackframe(struct pt_regs *regs, struct stackframe *frame) > frame->kr_cur = NULL; > frame->tsk = current; > #endif > -#ifdef CONFIG_UNWINDER_FRAME_POINTER > - frame->ex_frame = in_entry_text(frame->pc); > -#endif > + frame->ex_frame = !!regs; > + > } > > extern int unwind_frame(struct stackframe *frame); > diff --git a/arch/arm/kernel/stacktrace.c b/arch/arm/kernel/stacktrace.c > index 620aa82e3bdd..caed7436da09 100644 > --- a/arch/arm/kernel/stacktrace.c > +++ b/arch/arm/kernel/stacktrace.c > @@ -154,9 +154,6 @@ static void start_stack_trace(struct stackframe *frame, struct task_struct *task > frame->kr_cur = NULL; > frame->tsk = task; > #endif > -#ifdef CONFIG_UNWINDER_FRAME_POINTER > - frame->ex_frame = in_entry_text(frame->pc); > -#endif > } > > void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie, > @@ -167,6 +164,7 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie, > if (regs) { > start_stack_trace(&frame, NULL, regs->ARM_fp, regs->ARM_sp, > regs->ARM_lr, regs->ARM_pc); > + frame.ex_frame = true; > } else if (task != current) { > #ifdef CONFIG_SMP > /* > @@ -180,6 +178,7 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie, > thread_saved_sp(task), 0, > thread_saved_pc(task)); > #endif > + frame.ex_frame = false; > } else { > here: > start_stack_trace(&frame, task, > @@ -187,6 +186,7 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie, > current_stack_pointer, > (unsigned long)__builtin_return_address(0), > (unsigned long)&&here); > + frame.ex_frame = false; > /* skip this function */ > if (unwind_frame(&frame)) > return; > diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c > index 3bad79db5d6e..46a5b1eb3f0a 100644 > --- a/arch/arm/kernel/traps.c > +++ b/arch/arm/kernel/traps.c > @@ -84,10 +84,10 @@ void dump_backtrace_entry(unsigned long where, unsigned long from, > printk("%sFunction entered at [<%08lx>] from [<%08lx>]\n", > loglvl, where, from); > #elif defined CONFIG_BACKTRACE_VERBOSE > - printk("%s[<%08lx>] (%ps) from [<%08lx>] (%pS)\n", > + printk("%s[<%08lx>] (%ps) from [<%08lx>] (%pB)\n", > loglvl, where, (void *)where, from, (void *)from); > #else > - printk("%s %ps from %pS\n", loglvl, (void *)where, (void *)from); > + printk("%s %ps from %pB\n", loglvl, (void *)where, (void *)from); > #endif > > if (in_entry_text(from) && end <= ALIGN(frame, THREAD_SIZE)) > diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c > index 9d2192156087..99ded32196af 100644 > --- a/arch/arm/kernel/unwind.c > +++ b/arch/arm/kernel/unwind.c > @@ -407,7 +407,7 @@ int unwind_frame(struct stackframe *frame) > { > const struct unwind_idx *idx; > struct unwind_ctrl_block ctrl; > - unsigned long sp_low; > + unsigned long sp_low, pc; > > /* store the highest address on the stack to avoid crossing it*/ > sp_low = frame->sp; > @@ -417,19 +417,22 @@ int unwind_frame(struct stackframe *frame) > pr_debug("%s(pc = %08lx lr = %08lx sp = %08lx)\n", __func__, > frame->pc, frame->lr, frame->sp); > > - idx = unwind_find_idx(frame->pc); > + pc = frame->ex_frame ? frame->pc : frame->pc - 4; > + > + idx = unwind_find_idx(pc); > if (!idx) { > - if (frame->pc && kernel_text_address(frame->pc)) { > - if (in_module_plt(frame->pc) && frame->pc != frame->lr) { > + if (kernel_text_address(pc)) { > + if (in_module_plt(pc) && frame->pc != frame->lr) { > /* > * Quoting Ard: Veneers only set PC using a > * PC+immediate LDR, and so they don't affect > * the state of the stack or the register file > */ > frame->pc = frame->lr; > + frame->ex_frame = false; > return URC_OK; > } > - pr_warn("unwind: Index not found %08lx\n", frame->pc); > + pr_warn("unwind: Index not found %08lx\n", pc); > } > return -URC_FAILURE; > } > @@ -442,7 +445,7 @@ int unwind_frame(struct stackframe *frame) > if (idx->insn == 1) > /* can't unwind */ > return -URC_FAILURE; > - else if (frame->pc == prel31_to_addr(&idx->addr_offset)) { > + else if (frame->ex_frame && pc == prel31_to_addr(&idx->addr_offset)) { > /* > * Unwinding is tricky when we're halfway through the prologue, > * since the stack frame that the unwinder expects may not be > @@ -451,9 +454,10 @@ int unwind_frame(struct stackframe *frame) > * a function, we are still effectively in the stack frame of > * the caller, and the unwind info has no relevance yet. > */ > - if (frame->pc == frame->lr) > + if (pc == frame->lr) > return -URC_FAILURE; > frame->pc = frame->lr; > + frame->ex_frame = false; > return URC_OK; > } else if ((idx->insn & 0x80000000) == 0) > /* prel31 to the unwind table */ > @@ -515,6 +519,7 @@ int unwind_frame(struct stackframe *frame) > frame->lr = ctrl.vrs[LR]; > frame->pc = ctrl.vrs[PC]; > frame->lr_addr = ctrl.lr_addr; > + frame->ex_frame = false; > > return URC_OK; > } > @@ -544,6 +549,7 @@ void unwind_backtrace(struct pt_regs *regs, struct task_struct *tsk, > */ > here: > frame.pc = (unsigned long)&&here; > + frame.ex_frame = false; > } else { > /* task blocked in __switch_to */ > frame.fp = thread_saved_fp(tsk); > @@ -554,11 +560,12 @@ void unwind_backtrace(struct pt_regs *regs, struct task_struct *tsk, > */ > frame.lr = 0; > frame.pc = thread_saved_pc(tsk); > + frame.ex_frame = false; > } > > while (1) { > int urc; > - unsigned long where = frame.pc; > + unsigned long where = frame.ex_frame ? frame.pc : frame.pc - 4; > > urc = unwind_frame(&frame); > if (urc < 0) > . >
On Tue, Mar 05, 2024 at 09:58:46AM -0800, Josh Poimboeuf wrote: > This is an off-by-one bug which is common in unwinders, due to the fact > that the address on the stack points to the return address rather than > the call address. > > So, for example, when the last instruction of a function is a function > call (e.g., to a noreturn function), it can cause the unwinder to > incorrectly try to unwind from the function *after* the callee. I suppose this can only happen in __noreturn functions because that can be: foo: ... bl bar ... end of function and thus next function ... which results in LR pointing into the next function. Would it make better sense to lookup the LR value winding it back by one instruction like ORC on x86 does (as you mention) rather than the patch you proposed which looks rather large and complicated?
On Wed, Mar 06, 2024 at 09:52:01AM +0000, Russell King (Oracle) wrote: > On Tue, Mar 05, 2024 at 09:58:46AM -0800, Josh Poimboeuf wrote: > > This is an off-by-one bug which is common in unwinders, due to the fact > > that the address on the stack points to the return address rather than > > the call address. > > > > So, for example, when the last instruction of a function is a function > > call (e.g., to a noreturn function), it can cause the unwinder to > > incorrectly try to unwind from the function *after* the callee. > > I suppose this can only happen in __noreturn functions because that > can be: > > foo: > ... > bl bar > ... end of function and thus next function ... > > which results in LR pointing into the next function. > > Would it make better sense to lookup the LR value winding it back by > one instruction like ORC on x86 does (as you mention) rather than > the patch you proposed which looks rather large and complicated? That patch *is* an attempt to make it match ORC's behavior. What specifically looks complicated about it?
From: Russell King > Sent: 06 March 2024 09:52 > > On Tue, Mar 05, 2024 at 09:58:46AM -0800, Josh Poimboeuf wrote: > > This is an off-by-one bug which is common in unwinders, due to the fact > > that the address on the stack points to the return address rather than > > the call address. > > > > So, for example, when the last instruction of a function is a function > > call (e.g., to a noreturn function), it can cause the unwinder to > > incorrectly try to unwind from the function *after* the callee. > > I suppose this can only happen in __noreturn functions because that > can be: > > foo: > .. > bl bar > .. end of function and thus next function ... > > which results in LR pointing into the next function. > > Would it make better sense to lookup the LR value winding it back by > one instruction like ORC on x86 does (as you mention) rather than > the patch you proposed which looks rather large and complicated? Is it even possible to always reliably get a stack trace from a no-return function on a cpu that uses a 'lr'? If the function doesn't return then the compiler need not save 'lr' on stack and can still use it as a temporary register. Without a valid 'lr' I think all you can do is search the stack for a likely code address? Am I missing something? David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On 2024/3/6 1:58, Josh Poimboeuf wrote: > Adding ARM folks -- see > https://lkml.kernel.org/lkml/1709516385-7778-1-git-send-email-xiaojiangfeng@huawei.com > for the original bug report. > > This is an off-by-one bug which is common in unwinders, due to the fact > that the address on the stack points to the return address rather than > the call address. > > So, for example, when the last instruction of a function is a function > call (e.g., to a noreturn function), it can cause the unwinder to > incorrectly try to unwind from the function *after* the callee. > > For ORC (x86), we fixed this by decrementing the PC for call frames (but > not exception frames). I've seen user space unwinders do similar, for > non-signal frames. > > Something like the following might fix your issue (completely untested): > Thank you very much. I have verified that your patch can fix my issue. But I have some little questions. > diff --git a/arch/arm/include/asm/stacktrace.h b/arch/arm/include/asm/stacktrace.h > index 360f0d2406bf..4891e38cdc1f 100644 > --- a/arch/arm/include/asm/stacktrace.h > +++ b/arch/arm/include/asm/stacktrace.h > @@ -21,9 +21,7 @@ struct stackframe { > struct llist_node *kr_cur; > struct task_struct *tsk; > #endif > -#ifdef CONFIG_UNWINDER_FRAME_POINTER > bool ex_frame; > -#endif > }; > > static __always_inline > @@ -37,9 +35,8 @@ void arm_get_current_stackframe(struct pt_regs *regs, struct stackframe *frame) > frame->kr_cur = NULL; > frame->tsk = current; > #endif > -#ifdef CONFIG_UNWINDER_FRAME_POINTER > - frame->ex_frame = in_entry_text(frame->pc); > -#endif > + frame->ex_frame = !!regs; > + 'regs' must not be NULL, frame->ex_frame will always be TRUE. So I think we just need to remove CONFIG_UNWINDER_FRAME_POINTER here. We don't need to change the frame->ex_frame assignment statement. > diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c > index 9d2192156087..99ded32196af 100644 > --- a/arch/arm/kernel/unwind.c > +++ b/arch/arm/kernel/unwind.c > @@ -407,7 +407,7 @@ int unwind_frame(struct stackframe *frame) > { > const struct unwind_idx *idx; > struct unwind_ctrl_block ctrl; > - unsigned long sp_low; > + unsigned long sp_low, pc; > > /* store the highest address on the stack to avoid crossing it*/ > sp_low = frame->sp; > @@ -417,19 +417,22 @@ int unwind_frame(struct stackframe *frame) > pr_debug("%s(pc = %08lx lr = %08lx sp = %08lx)\n", __func__, > frame->pc, frame->lr, frame->sp); > > - idx = unwind_find_idx(frame->pc); > + pc = frame->ex_frame ? frame->pc : frame->pc - 4; For details, see the unwind_next_frame function in the unwind_orc.c. Why subtract 4 here instead of 1? `pc = frame->ex_frame ? frame->pc : frame->pc - 1` Is it more appropriate? > + > + idx = unwind_find_idx(pc); > if (!idx) { > - if (frame->pc && kernel_text_address(frame->pc)) { > - if (in_module_plt(frame->pc) && frame->pc != frame->lr) { > + if (kernel_text_address(pc)) { > + if (in_module_plt(pc) && frame->pc != frame->lr) { > /* > * Quoting Ard: Veneers only set PC using a > * PC+immediate LDR, and so they don't affect > * the state of the stack or the register file > */ > frame->pc = frame->lr; > + frame->ex_frame = false; > return URC_OK; > } > - pr_warn("unwind: Index not found %08lx\n", frame->pc); > + pr_warn("unwind: Index not found %08lx\n", pc); > } > return -URC_FAILURE; > } > @@ -442,7 +445,7 @@ int unwind_frame(struct stackframe *frame) > if (idx->insn == 1) > /* can't unwind */ > return -URC_FAILURE; > - else if (frame->pc == prel31_to_addr(&idx->addr_offset)) { > + else if (frame->ex_frame && pc == prel31_to_addr(&idx->addr_offset)) { > /* > * Unwinding is tricky when we're halfway through the prologue, > * since the stack frame that the unwinder expects may not be > @@ -451,9 +454,10 @@ int unwind_frame(struct stackframe *frame) > * a function, we are still effectively in the stack frame of > * the caller, and the unwind info has no relevance yet. > */ > - if (frame->pc == frame->lr) > + if (pc == frame->lr) > return -URC_FAILURE; > frame->pc = frame->lr; > + frame->ex_frame = false; > return URC_OK; > } else if ((idx->insn & 0x80000000) == 0) > /* prel31 to the unwind table */ > @@ -515,6 +519,7 @@ int unwind_frame(struct stackframe *frame) > frame->lr = ctrl.vrs[LR]; > frame->pc = ctrl.vrs[PC]; > frame->lr_addr = ctrl.lr_addr; > + frame->ex_frame = false; Why is the value of `frame->ex_frame` directly set to false? Why is the value not determined based on `frame->pc`? That is, `frame->ex_frame = in_entry_text(frame->pc)` > > return URC_OK; > } > @@ -544,6 +549,7 @@ void unwind_backtrace(struct pt_regs *regs, struct task_struct *tsk, > */ > here: > frame.pc = (unsigned long)&&here; > + frame.ex_frame = false; > } else { > /* task blocked in __switch_to */ > frame.fp = thread_saved_fp(tsk); > @@ -554,11 +560,12 @@ void unwind_backtrace(struct pt_regs *regs, struct task_struct *tsk, > */ > frame.lr = 0; > frame.pc = thread_saved_pc(tsk); > + frame.ex_frame = false; > } > > while (1) { > int urc; > - unsigned long where = frame.pc; > + unsigned long where = frame.ex_frame ? frame.pc : frame.pc - 4; > > urc = unwind_frame(&frame); > if (urc < 0) > . > If I refer to your demo patch and submit a new bugfix patch, can I mark you as "Co-developed-by" in this new bugfix patch?
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h index 3064314..c37af70 100644 --- a/include/linux/uaccess.h +++ b/include/linux/uaccess.h @@ -437,7 +437,7 @@ static inline void user_access_restore(unsigned long flags) { } #endif #ifdef CONFIG_HARDENED_USERCOPY -void __noreturn usercopy_abort(const char *name, const char *detail, +void usercopy_abort(const char *name, const char *detail, bool to_user, unsigned long offset, unsigned long len); #endif diff --git a/mm/usercopy.c b/mm/usercopy.c index 83c164a..ca1b22e 100644 --- a/mm/usercopy.c +++ b/mm/usercopy.c @@ -83,7 +83,7 @@ static noinline int check_stack_object(const void *obj, unsigned long len) * kmem_cache_create_usercopy() function to create the cache (and * carefully audit the whitelist range). */ -void __noreturn usercopy_abort(const char *name, const char *detail, +void usercopy_abort(const char *name, const char *detail, bool to_user, unsigned long offset, unsigned long len) { diff --git a/tools/objtool/noreturns.h b/tools/objtool/noreturns.h index 1685d7e..c7e341c 100644 --- a/tools/objtool/noreturns.h +++ b/tools/objtool/noreturns.h @@ -40,7 +40,6 @@ NORETURN(snp_abort) NORETURN(start_kernel) NORETURN(stop_this_cpu) -NORETURN(usercopy_abort) NORETURN(x86_64_start_kernel) NORETURN(x86_64_start_reservations) NORETURN(xen_cpu_bringup_again)
When the last instruction of a noreturn function is a call to another function, the return address falls outside of the function boundary. This seems to cause kernel to interrupt the backtrace. My testcase is as follow: ``` static volatile size_t unconst = 0; /* check_object_size __check_object_size check_kernel_text_object usercopy_abort("kernel text", ...) */ void test_usercopy_kernel(void) { check_object_size(schedule, unconst + PAGE_SIZE, 1); } static int __init test_usercopy_init(void) { test_usercopy_kernel(); return 0; } static void __exit test_usercopy_exit(void) { } module_init(test_usercopy_init); module_exit(test_usercopy_exit); MODULE_LICENSE("GPL"); ``` Running the testcase cause kernel oops, and then the call stack is incorrect: ``` usercopy: Kernel memory exposure attempt detected from kernel text Kernel BUG at usercopy_abort+0x98/0x9c Internal error: Oops - undefined instruction: 0 [#1] SMP ARM Modules linked in: test_usercopy(O+) usbcore usb_common CPU: 0 PID: 609 Comm: insmod Tainted: G O 5.10.0 #11 Hardware name: Hisilicon A9 PC is at usercopy_abort+0x98/0x9c LR is at usercopy_abort+0x98/0x9c [...] (usercopy_abort) from (memfd_fcntl+0x0/0x654) (memfd_fcntl) from (0xef7368f8) Code: e1a01004 e58dc004 e58de000 eb108378 (e7f001f2) ---[ end trace e5fdc684259b0883 ]--- Kernel panic - not syncing: Fatal exception ``` Why Does the kernel backtrace cause errors? Because usercopy_abort is marked as __noreturn. You can see the related disassembling: ``` c02f24ac <__check_object_size>: static_key_count(): [...] check_kernel_text_object(): linux/mm/usercopy.c:125 usercopy_abort("kernel text", NULL, to_user, ptr - textlow, n); c02f26c4: e3040110 movw r0, #16656 ; 0x4110 c02f26c8: e58d5000 str r5, [sp] c02f26cc: e0443003 sub r3, r4, r3 c02f26d0: e1a02007 mov r2, r7 c02f26d4: e34c0096 movt r0, #49302 ; 0xc096 c02f26d8: ebffff4c bl c02f2410 <usercopy_abort> c02f26dc <memfd_fcntl>: memfd_fcntl(): [...] ``` The last instruction of __check_object_size is a call to usercopy_abort, which is a noreturn function, the return address falls outside of the __check_object_size function boundary, the return address falls into the next function memfd_fcntl, therefore, an error occurs when the kernel backtrace. Delete __noreturn from usercopy_abort, the correct call stack is as follow: ``` (usercopy_abort) from (__check_object_size+0x170/0x234) (__check_object_size) from (test_usercopy_init+0x8/0xc0 [test_usercopy]) (test_usercopy_init [test_usercopy]) from (do_one_initcall+0xac/0x204) (do_one_initcall) from (do_init_module+0x44/0x1c8) (do_init_module) from (load_module+0x1d48/0x2434) (load_module) from (sys_finit_module+0xc0/0xf4) (sys_finit_module) from (ret_fast_syscall+0x0/0x50) ``` Fixes: b394d468e7d7 ("usercopy: Enhance and rename report_usercopy()") Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com> --- include/linux/uaccess.h | 2 +- mm/usercopy.c | 2 +- tools/objtool/noreturns.h | 1 - 3 files changed, 2 insertions(+), 3 deletions(-)