diff mbox series

usercopy: delete __noreturn from usercopy_abort

Message ID 1709516385-7778-1-git-send-email-xiaojiangfeng@huawei.com (mailing list archive)
State Rejected
Headers show
Series usercopy: delete __noreturn from usercopy_abort | expand

Commit Message

Jiangfeng Xiao March 4, 2024, 1:39 a.m. UTC
When the last instruction of a noreturn function is a call
to another function, the return address falls outside
of the function boundary. This seems to cause kernel
to interrupt the backtrace.

My testcase is as follow:
```

static volatile size_t unconst = 0;
/*
check_object_size
    __check_object_size
        check_kernel_text_object
            usercopy_abort("kernel text", ...)
*/
void test_usercopy_kernel(void)
{
	check_object_size(schedule, unconst + PAGE_SIZE, 1);
}

static int __init test_usercopy_init(void)
{
        test_usercopy_kernel();
	return 0;
}

static void __exit test_usercopy_exit(void)
{
}

module_init(test_usercopy_init);
module_exit(test_usercopy_exit);
MODULE_LICENSE("GPL");
```

Running the testcase cause kernel oops,
and then the call stack is incorrect:
```
usercopy: Kernel memory exposure attempt detected from kernel text
Kernel BUG at usercopy_abort+0x98/0x9c
Internal error: Oops - undefined instruction: 0 [#1] SMP ARM
Modules linked in: test_usercopy(O+) usbcore usb_common
CPU: 0 PID: 609 Comm: insmod Tainted: G           O      5.10.0 #11
Hardware name: Hisilicon A9
PC is at usercopy_abort+0x98/0x9c
LR is at usercopy_abort+0x98/0x9c
[...]
 (usercopy_abort) from (memfd_fcntl+0x0/0x654)
 (memfd_fcntl) from (0xef7368f8)
Code: e1a01004 e58dc004 e58de000 eb108378 (e7f001f2)
---[ end trace e5fdc684259b0883 ]---
Kernel panic - not syncing: Fatal exception
```

Why Does the kernel backtrace cause errors?
Because usercopy_abort is marked as __noreturn.

You can see the related disassembling:
```
c02f24ac <__check_object_size>:
static_key_count():
[...]
check_kernel_text_object():
linux/mm/usercopy.c:125
        usercopy_abort("kernel text", NULL, to_user, ptr - textlow, n);
c02f26c4:       e3040110        movw    r0, #16656      ; 0x4110
c02f26c8:       e58d5000        str     r5, [sp]
c02f26cc:       e0443003        sub     r3, r4, r3
c02f26d0:       e1a02007        mov     r2, r7
c02f26d4:       e34c0096        movt    r0, #49302      ; 0xc096
c02f26d8:       ebffff4c        bl      c02f2410 <usercopy_abort>

c02f26dc <memfd_fcntl>:
memfd_fcntl():
[...]
```

The last instruction of __check_object_size is a call to usercopy_abort,
which is a noreturn function, the return address falls outside of the
__check_object_size function boundary, the return address falls into
the next function memfd_fcntl, therefore,
an error occurs when the kernel backtrace.

Delete __noreturn from usercopy_abort,
the correct call stack is as follow:
```
 (usercopy_abort) from (__check_object_size+0x170/0x234)
 (__check_object_size) from (test_usercopy_init+0x8/0xc0 [test_usercopy])
 (test_usercopy_init [test_usercopy]) from (do_one_initcall+0xac/0x204)
 (do_one_initcall) from (do_init_module+0x44/0x1c8)
 (do_init_module) from (load_module+0x1d48/0x2434)
 (load_module) from (sys_finit_module+0xc0/0xf4)
 (sys_finit_module) from (ret_fast_syscall+0x0/0x50)
```

Fixes: b394d468e7d7 ("usercopy: Enhance and rename report_usercopy()")

Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
---
 include/linux/uaccess.h   | 2 +-
 mm/usercopy.c             | 2 +-
 tools/objtool/noreturns.h | 1 -
 3 files changed, 2 insertions(+), 3 deletions(-)

Comments

Jann Horn March 4, 2024, 3:15 p.m. UTC | #1
On Mon, Mar 4, 2024 at 3:02 AM Jiangfeng Xiao <xiaojiangfeng@huawei.com> wrote:
> When the last instruction of a noreturn function is a call
> to another function, the return address falls outside
> of the function boundary. This seems to cause kernel
> to interrupt the backtrace.
[...]
> Delete __noreturn from usercopy_abort,

This sounds like the actual bug is in the backtracing logic? I don't
think removing __noreturn annotations from an individual function is a
good fix, since the same thing can happen with other __noreturn
functions depending on what choices the compiler makes.
Kees Cook March 4, 2024, 5:40 p.m. UTC | #2
On Mon, Mar 04, 2024 at 04:15:07PM +0100, Jann Horn wrote:
> On Mon, Mar 4, 2024 at 3:02 AM Jiangfeng Xiao <xiaojiangfeng@huawei.com> wrote:
> > When the last instruction of a noreturn function is a call
> > to another function, the return address falls outside
> > of the function boundary. This seems to cause kernel
> > to interrupt the backtrace.

FWIW, all email from huawei.com continues to get eaten by anti-spam
checking. I've reported this a few times -- it'd be really nice if the
domain configuration could get fixed.

> [...]
> > Delete __noreturn from usercopy_abort,
> 
> This sounds like the actual bug is in the backtracing logic? I don't
> think removing __noreturn annotations from an individual function is a
> good fix, since the same thing can happen with other __noreturn
> functions depending on what choices the compiler makes.

Yeah, NAK. usercopy_abort() doesn't return. It ends with BUG().
Jiangfeng Xiao March 5, 2024, 2:54 a.m. UTC | #3
On 2024/3/4 23:15, Jann Horn wrote:
> On Mon, Mar 4, 2024 at 3:02 AM Jiangfeng Xiao <xiaojiangfeng@huawei.com> wrote:
>> When the last instruction of a noreturn function is a call
>> to another function, the return address falls outside
>> of the function boundary. This seems to cause kernel
>> to interrupt the backtrace.
> [...]
>> Delete __noreturn from usercopy_abort,
> 
> This sounds like the actual bug is in the backtracing logic? I don't
> think removing __noreturn annotations from an individual function is a
> good fix, since the same thing can happen with other __noreturn
> functions depending on what choices the compiler makes.
> .
> 
Yes, you make a point. This may be a bug is in the backtracing logic, but
the kernel backtracing always parses symbols using (lr) instead of (lr-4).
This may be due to historical reasons or more comprehensive considerations.
In addition, modifying the implementation logic of the kernel backtracing
has a great impact. Therefore, I do not dare to modify the implementation
logic of the kernel backtracing.

Not all noreturn functions are ended with calling other functions.
Therefore, only a few individual functions may have the same problem.
In addition, deleting '__noreturn' from usercopy_abort does not
change the internal behavior of usercopy_abort function.
Therefore, there is no risk. Deleting '__noreturn' from usercopy_abort
is the solution that I can think of with minimal impact and minimum risk.

If you will submit a better patch to solve this problem,
I would like to learn from you. Thank you.
Jiangfeng Xiao March 5, 2024, 3:12 a.m. UTC | #4
On 2024/3/5 10:54, Jiangfeng Xiao wrote:
> 
> 
> On 2024/3/4 23:15, Jann Horn wrote:
>> On Mon, Mar 4, 2024 at 3:02 AM Jiangfeng Xiao <xiaojiangfeng@huawei.com> wrote:
>>> When the last instruction of a noreturn function is a call
>>> to another function, the return address falls outside
>>> of the function boundary. This seems to cause kernel
>>> to interrupt the backtrace.
>> [...]
>>> Delete __noreturn from usercopy_abort,
>>
>> This sounds like the actual bug is in the backtracing logic? I don't
>> think removing __noreturn annotations from an individual function is a
>> good fix, since the same thing can happen with other __noreturn
>> functions depending on what choices the compiler makes.
>> .
>>
> Yes, you make a point. This may be a bug is in the backtracing logic, but
> the kernel backtracing always parses symbols using (lr) instead of (lr-4).
> This may be due to historical reasons or more comprehensive considerations.
> In addition, modifying the implementation logic of the kernel backtracing
> has a great impact. Therefore, I do not dare to modify the implementation
> logic of the kernel backtracing.
> 
> Not all noreturn functions are ended with calling other functions.
> Therefore, only a few individual functions may have the same problem.
> In addition, deleting '__noreturn' from usercopy_abort does not
> change the internal behavior of usercopy_abort function.
> Therefore, there is no risk. Deleting '__noreturn' from usercopy_abort
> is the solution that I can think of with minimal impact and minimum risk.
> 
> If you will submit a better patch to solve this problem,
> I would like to learn from you. Thank you.
> 
What are your suggestions on modifying the kernel backtracing logic
or deleting '__noretrun'? If you insist on modifying the kernel
backtracing logic and persuade me with good reasons, I can also try
to submit the patch for modifying the kernel backtracing logic
to the community.
Jiangfeng Xiao March 5, 2024, 3:31 a.m. UTC | #5
On 2024/3/5 1:40, Kees Cook wrote:
> On Mon, Mar 04, 2024 at 04:15:07PM +0100, Jann Horn wrote:
>> On Mon, Mar 4, 2024 at 3:02 AM Jiangfeng Xiao <xiaojiangfeng@huawei.com> wrote:
>>> When the last instruction of a noreturn function is a call
>>> to another function, the return address falls outside
>>> of the function boundary. This seems to cause kernel
>>> to interrupt the backtrace.
> 
> FWIW, all email from huawei.com continues to get eaten by anti-spam
> checking. I've reported this a few times -- it'd be really nice if the
> domain configuration could get fixed.
> 
>> [...]
>>> Delete __noreturn from usercopy_abort,
>>
>> This sounds like the actual bug is in the backtracing logic? I don't
>> think removing __noreturn annotations from an individual function is a
>> good fix, since the same thing can happen with other __noreturn
>> functions depending on what choices the compiler makes.
> 
> Yeah, NAK. usercopy_abort() doesn't return. It ends with BUG().
> 
When the user directly or indirectly calls usercopy_abort,
the final call stack is incorrect, and the
code where the problem occurs cannot be located.
In this case, the user will be frustrated.

For the usercopy_abort function, whether '__noreturn' is added
does not affect the internal behavior of the usercopy_abort function.
Therefore, it is recommended that '__noreturn' be deleted
so that backtrace can work properly.
Kees Cook March 5, 2024, 9:32 a.m. UTC | #6
On Tue, Mar 05, 2024 at 11:31:06AM +0800, Jiangfeng Xiao wrote:
> 
> 
> On 2024/3/5 1:40, Kees Cook wrote:
> > On Mon, Mar 04, 2024 at 04:15:07PM +0100, Jann Horn wrote:
> >> On Mon, Mar 4, 2024 at 3:02 AM Jiangfeng Xiao <xiaojiangfeng@huawei.com> wrote:
> >>> When the last instruction of a noreturn function is a call
> >>> to another function, the return address falls outside
> >>> of the function boundary. This seems to cause kernel
> >>> to interrupt the backtrace.
> > 
> > FWIW, all email from huawei.com continues to get eaten by anti-spam
> > checking. I've reported this a few times -- it'd be really nice if the
> > domain configuration could get fixed.
> > 
> >> [...]
> >>> Delete __noreturn from usercopy_abort,
> >>
> >> This sounds like the actual bug is in the backtracing logic? I don't
> >> think removing __noreturn annotations from an individual function is a
> >> good fix, since the same thing can happen with other __noreturn
> >> functions depending on what choices the compiler makes.
> > 
> > Yeah, NAK. usercopy_abort() doesn't return. It ends with BUG().
> > 
> When the user directly or indirectly calls usercopy_abort,
> the final call stack is incorrect, and the
> code where the problem occurs cannot be located.
> In this case, the user will be frustrated.

Can you please give an example of this?

> For the usercopy_abort function, whether '__noreturn' is added
> does not affect the internal behavior of the usercopy_abort function.
> Therefore, it is recommended that '__noreturn' be deleted
> so that backtrace can work properly.

This isn't acceptable. Removing __noreturn this will break
objtool's processing of execution flow for livepatching, IBT, and
KCFI instrumentation. These all depend on an accurate control flow
descriptions, and usercopy_abort is correctly marked __noreturn.
Jiangfeng Xiao March 5, 2024, 11:38 a.m. UTC | #7
On 2024/3/5 17:32, Kees Cook wrote:
> On Tue, Mar 05, 2024 at 11:31:06AM +0800, Jiangfeng Xiao wrote:
>>
>>
>> On 2024/3/5 1:40, Kees Cook wrote:
>>> On Mon, Mar 04, 2024 at 04:15:07PM +0100, Jann Horn wrote:
>>>> On Mon, Mar 4, 2024 at 3:02 AM Jiangfeng Xiao <xiaojiangfeng@huawei.com> wrote:
>>>>> When the last instruction of a noreturn function is a call
>>>>> to another function, the return address falls outside
>>>>> of the function boundary. This seems to cause kernel
>>>>> to interrupt the backtrace.
>>>
>>> FWIW, all email from huawei.com continues to get eaten by anti-spam
>>> checking. I've reported this a few times -- it'd be really nice if the
>>> domain configuration could get fixed.
>>>
>>>> [...]
>>>>> Delete __noreturn from usercopy_abort,
>>>>
>>>> This sounds like the actual bug is in the backtracing logic? I don't
>>>> think removing __noreturn annotations from an individual function is a
>>>> good fix, since the same thing can happen with other __noreturn
>>>> functions depending on what choices the compiler makes.
>>>
>>> Yeah, NAK. usercopy_abort() doesn't return. It ends with BUG().
>>>
>> When the user directly or indirectly calls usercopy_abort,
>> the final call stack is incorrect, and the
>> code where the problem occurs cannot be located.
>> In this case, the user will be frustrated.
> 
> Can you please give an example of this?

The main configurations of my kernel are as follows:

CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE is enabled.
(This config uses the compilation parameter -O2.)

CONFIG_RELOCATABLE is disabled.
(This config uses the compilation option -fpic.)

You can use the following kernel module testcase
to reproduce this problem on the ARM32 platform.

```
#include <linux/module.h>
#include <linux/sched.h>

static volatile size_t unconst = 0;
/*
check_object_size
    __check_object_size
        check_kernel_text_object
            usercopy_abort("kernel text", ...)
*/
void test_usercopy_kernel(void)
{
        check_object_size(schedule, unconst + PAGE_SIZE, 1);
}

static int __init test_usercopy_init(void)
{
        test_usercopy_kernel();
        return 0;
}

static void __exit test_usercopy_exit(void)
{
}

module_init(test_usercopy_init);
module_exit(test_usercopy_exit);
MODULE_LICENSE("GPL");
```

> 
>> For the usercopy_abort function, whether '__noreturn' is added
>> does not affect the internal behavior of the usercopy_abort function.
>> Therefore, it is recommended that '__noreturn' be deleted
>> so that backtrace can work properly.
> 
> This isn't acceptable. Removing __noreturn this will break
> objtool's processing of execution flow for livepatching, IBT, and
> KCFI instrumentation. These all depend on an accurate control flow
> descriptions, and usercopy_abort is correctly marked __noreturn.
> 

Thank you for providing this information.
I'll go back to further understand how __noreturn is used
in features such as KCFI and livepatching.
Josh Poimboeuf March 5, 2024, 5:58 p.m. UTC | #8
> >> For the usercopy_abort function, whether '__noreturn' is added
> >> does not affect the internal behavior of the usercopy_abort function.
> >> Therefore, it is recommended that '__noreturn' be deleted
> >> so that backtrace can work properly.
> > 
> > This isn't acceptable. Removing __noreturn this will break
> > objtool's processing of execution flow for livepatching, IBT, and
> > KCFI instrumentation. These all depend on an accurate control flow
> > descriptions, and usercopy_abort is correctly marked __noreturn.

__noreturn also has the benefit of enabling the compiler to produce more
compact code for callees.

> Thank you for providing this information.
> I'll go back to further understand how __noreturn is used
> in features such as KCFI and livepatching.

Adding ARM folks -- see
https://lkml.kernel.org/lkml/1709516385-7778-1-git-send-email-xiaojiangfeng@huawei.com
for the original bug report.

This is an off-by-one bug which is common in unwinders, due to the fact
that the address on the stack points to the return address rather than
the call address.

So, for example, when the last instruction of a function is a function
call (e.g., to a noreturn function), it can cause the unwinder to
incorrectly try to unwind from the function *after* the callee.

For ORC (x86), we fixed this by decrementing the PC for call frames (but
not exception frames).  I've seen user space unwinders do similar, for
non-signal frames.

Something like the following might fix your issue (completely untested):

diff --git a/arch/arm/include/asm/stacktrace.h b/arch/arm/include/asm/stacktrace.h
index 360f0d2406bf..4891e38cdc1f 100644
--- a/arch/arm/include/asm/stacktrace.h
+++ b/arch/arm/include/asm/stacktrace.h
@@ -21,9 +21,7 @@ struct stackframe {
 	struct llist_node *kr_cur;
 	struct task_struct *tsk;
 #endif
-#ifdef CONFIG_UNWINDER_FRAME_POINTER
 	bool ex_frame;
-#endif
 };
 
 static __always_inline
@@ -37,9 +35,8 @@ void arm_get_current_stackframe(struct pt_regs *regs, struct stackframe *frame)
 		frame->kr_cur = NULL;
 		frame->tsk = current;
 #endif
-#ifdef CONFIG_UNWINDER_FRAME_POINTER
-		frame->ex_frame = in_entry_text(frame->pc);
-#endif
+		frame->ex_frame = !!regs;
+
 }
 
 extern int unwind_frame(struct stackframe *frame);
diff --git a/arch/arm/kernel/stacktrace.c b/arch/arm/kernel/stacktrace.c
index 620aa82e3bdd..caed7436da09 100644
--- a/arch/arm/kernel/stacktrace.c
+++ b/arch/arm/kernel/stacktrace.c
@@ -154,9 +154,6 @@ static void start_stack_trace(struct stackframe *frame, struct task_struct *task
 	frame->kr_cur = NULL;
 	frame->tsk = task;
 #endif
-#ifdef CONFIG_UNWINDER_FRAME_POINTER
-	frame->ex_frame = in_entry_text(frame->pc);
-#endif
 }
 
 void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
@@ -167,6 +164,7 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
 	if (regs) {
 		start_stack_trace(&frame, NULL, regs->ARM_fp, regs->ARM_sp,
 				  regs->ARM_lr, regs->ARM_pc);
+		frame.ex_frame = true;
 	} else if (task != current) {
 #ifdef CONFIG_SMP
 		/*
@@ -180,6 +178,7 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
 				  thread_saved_sp(task), 0,
 				  thread_saved_pc(task));
 #endif
+		frame.ex_frame = false;
 	} else {
 here:
 		start_stack_trace(&frame, task,
@@ -187,6 +186,7 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
 				  current_stack_pointer,
 				  (unsigned long)__builtin_return_address(0),
 				  (unsigned long)&&here);
+		frame.ex_frame = false;
 		/* skip this function */
 		if (unwind_frame(&frame))
 			return;
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 3bad79db5d6e..46a5b1eb3f0a 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -84,10 +84,10 @@ void dump_backtrace_entry(unsigned long where, unsigned long from,
 	printk("%sFunction entered at [<%08lx>] from [<%08lx>]\n",
 		loglvl, where, from);
 #elif defined CONFIG_BACKTRACE_VERBOSE
-	printk("%s[<%08lx>] (%ps) from [<%08lx>] (%pS)\n",
+	printk("%s[<%08lx>] (%ps) from [<%08lx>] (%pB)\n",
 		loglvl, where, (void *)where, from, (void *)from);
 #else
-	printk("%s %ps from %pS\n", loglvl, (void *)where, (void *)from);
+	printk("%s %ps from %pB\n", loglvl, (void *)where, (void *)from);
 #endif
 
 	if (in_entry_text(from) && end <= ALIGN(frame, THREAD_SIZE))
diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c
index 9d2192156087..99ded32196af 100644
--- a/arch/arm/kernel/unwind.c
+++ b/arch/arm/kernel/unwind.c
@@ -407,7 +407,7 @@ int unwind_frame(struct stackframe *frame)
 {
 	const struct unwind_idx *idx;
 	struct unwind_ctrl_block ctrl;
-	unsigned long sp_low;
+	unsigned long sp_low, pc;
 
 	/* store the highest address on the stack to avoid crossing it*/
 	sp_low = frame->sp;
@@ -417,19 +417,22 @@ int unwind_frame(struct stackframe *frame)
 	pr_debug("%s(pc = %08lx lr = %08lx sp = %08lx)\n", __func__,
 		 frame->pc, frame->lr, frame->sp);
 
-	idx = unwind_find_idx(frame->pc);
+	pc = frame->ex_frame ? frame->pc : frame->pc - 4;
+
+	idx = unwind_find_idx(pc);
 	if (!idx) {
-		if (frame->pc && kernel_text_address(frame->pc)) {
-			if (in_module_plt(frame->pc) && frame->pc != frame->lr) {
+		if (kernel_text_address(pc)) {
+			if (in_module_plt(pc) && frame->pc != frame->lr) {
 				/*
 				 * Quoting Ard: Veneers only set PC using a
 				 * PC+immediate LDR, and so they don't affect
 				 * the state of the stack or the register file
 				 */
 				frame->pc = frame->lr;
+				frame->ex_frame = false;
 				return URC_OK;
 			}
-			pr_warn("unwind: Index not found %08lx\n", frame->pc);
+			pr_warn("unwind: Index not found %08lx\n", pc);
 		}
 		return -URC_FAILURE;
 	}
@@ -442,7 +445,7 @@ int unwind_frame(struct stackframe *frame)
 	if (idx->insn == 1)
 		/* can't unwind */
 		return -URC_FAILURE;
-	else if (frame->pc == prel31_to_addr(&idx->addr_offset)) {
+	else if (frame->ex_frame && pc == prel31_to_addr(&idx->addr_offset)) {
 		/*
 		 * Unwinding is tricky when we're halfway through the prologue,
 		 * since the stack frame that the unwinder expects may not be
@@ -451,9 +454,10 @@ int unwind_frame(struct stackframe *frame)
 		 * a function, we are still effectively in the stack frame of
 		 * the caller, and the unwind info has no relevance yet.
 		 */
-		if (frame->pc == frame->lr)
+		if (pc == frame->lr)
 			return -URC_FAILURE;
 		frame->pc = frame->lr;
+		frame->ex_frame = false;
 		return URC_OK;
 	} else if ((idx->insn & 0x80000000) == 0)
 		/* prel31 to the unwind table */
@@ -515,6 +519,7 @@ int unwind_frame(struct stackframe *frame)
 	frame->lr = ctrl.vrs[LR];
 	frame->pc = ctrl.vrs[PC];
 	frame->lr_addr = ctrl.lr_addr;
+	frame->ex_frame = false;
 
 	return URC_OK;
 }
@@ -544,6 +549,7 @@ void unwind_backtrace(struct pt_regs *regs, struct task_struct *tsk,
 		 */
 here:
 		frame.pc = (unsigned long)&&here;
+		frame.ex_frame = false;
 	} else {
 		/* task blocked in __switch_to */
 		frame.fp = thread_saved_fp(tsk);
@@ -554,11 +560,12 @@ void unwind_backtrace(struct pt_regs *regs, struct task_struct *tsk,
 		 */
 		frame.lr = 0;
 		frame.pc = thread_saved_pc(tsk);
+		frame.ex_frame = false;
 	}
 
 	while (1) {
 		int urc;
-		unsigned long where = frame.pc;
+		unsigned long where = frame.ex_frame ? frame.pc : frame.pc - 4;
 
 		urc = unwind_frame(&frame);
 		if (urc < 0)
Jiangfeng Xiao March 6, 2024, 4 a.m. UTC | #9
On 2024/3/6 1:58, Josh Poimboeuf wrote:
>>>> For the usercopy_abort function, whether '__noreturn' is added
>>>> does not affect the internal behavior of the usercopy_abort function.
>>>> Therefore, it is recommended that '__noreturn' be deleted
>>>> so that backtrace can work properly.
>>>
>>> This isn't acceptable. Removing __noreturn this will break
>>> objtool's processing of execution flow for livepatching, IBT, and
>>> KCFI instrumentation. These all depend on an accurate control flow
>>> descriptions, and usercopy_abort is correctly marked __noreturn.
> 
> __noreturn also has the benefit of enabling the compiler to produce more
> compact code for callees.
> 
>> Thank you for providing this information.
>> I'll go back to further understand how __noreturn is used
>> in features such as KCFI and livepatching.
> 
> Adding ARM folks -- see
> https://lkml.kernel.org/lkml/1709516385-7778-1-git-send-email-xiaojiangfeng@huawei.com
> for the original bug report.
> 
> This is an off-by-one bug which is common in unwinders, due to the fact
> that the address on the stack points to the return address rather than
> the call address.
> 

Thanks for your advice. I think I understand. To solve this problem, I need to
fix the off-by-one bug which is common in unwinders. I'll try to fix it later
by referring to your patch.

> So, for example, when the last instruction of a function is a function
> call (e.g., to a noreturn function), it can cause the unwinder to
> incorrectly try to unwind from the function *after* the callee.
> 
> For ORC (x86), we fixed this by decrementing the PC for call frames (but
> not exception frames).  I've seen user space unwinders do similar, for
> non-signal frames.
> 
> Something like the following might fix your issue (completely untested):
> 
> diff --git a/arch/arm/include/asm/stacktrace.h b/arch/arm/include/asm/stacktrace.h
> index 360f0d2406bf..4891e38cdc1f 100644
> --- a/arch/arm/include/asm/stacktrace.h
> +++ b/arch/arm/include/asm/stacktrace.h
> @@ -21,9 +21,7 @@ struct stackframe {
>  	struct llist_node *kr_cur;
>  	struct task_struct *tsk;
>  #endif
> -#ifdef CONFIG_UNWINDER_FRAME_POINTER
>  	bool ex_frame;
> -#endif
>  };
>  
>  static __always_inline
> @@ -37,9 +35,8 @@ void arm_get_current_stackframe(struct pt_regs *regs, struct stackframe *frame)
>  		frame->kr_cur = NULL;
>  		frame->tsk = current;
>  #endif
> -#ifdef CONFIG_UNWINDER_FRAME_POINTER
> -		frame->ex_frame = in_entry_text(frame->pc);
> -#endif
> +		frame->ex_frame = !!regs;
> +
>  }
>  
>  extern int unwind_frame(struct stackframe *frame);
> diff --git a/arch/arm/kernel/stacktrace.c b/arch/arm/kernel/stacktrace.c
> index 620aa82e3bdd..caed7436da09 100644
> --- a/arch/arm/kernel/stacktrace.c
> +++ b/arch/arm/kernel/stacktrace.c
> @@ -154,9 +154,6 @@ static void start_stack_trace(struct stackframe *frame, struct task_struct *task
>  	frame->kr_cur = NULL;
>  	frame->tsk = task;
>  #endif
> -#ifdef CONFIG_UNWINDER_FRAME_POINTER
> -	frame->ex_frame = in_entry_text(frame->pc);
> -#endif
>  }
>  
>  void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
> @@ -167,6 +164,7 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
>  	if (regs) {
>  		start_stack_trace(&frame, NULL, regs->ARM_fp, regs->ARM_sp,
>  				  regs->ARM_lr, regs->ARM_pc);
> +		frame.ex_frame = true;
>  	} else if (task != current) {
>  #ifdef CONFIG_SMP
>  		/*
> @@ -180,6 +178,7 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
>  				  thread_saved_sp(task), 0,
>  				  thread_saved_pc(task));
>  #endif
> +		frame.ex_frame = false;
>  	} else {
>  here:
>  		start_stack_trace(&frame, task,
> @@ -187,6 +186,7 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
>  				  current_stack_pointer,
>  				  (unsigned long)__builtin_return_address(0),
>  				  (unsigned long)&&here);
> +		frame.ex_frame = false;
>  		/* skip this function */
>  		if (unwind_frame(&frame))
>  			return;
> diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
> index 3bad79db5d6e..46a5b1eb3f0a 100644
> --- a/arch/arm/kernel/traps.c
> +++ b/arch/arm/kernel/traps.c
> @@ -84,10 +84,10 @@ void dump_backtrace_entry(unsigned long where, unsigned long from,
>  	printk("%sFunction entered at [<%08lx>] from [<%08lx>]\n",
>  		loglvl, where, from);
>  #elif defined CONFIG_BACKTRACE_VERBOSE
> -	printk("%s[<%08lx>] (%ps) from [<%08lx>] (%pS)\n",
> +	printk("%s[<%08lx>] (%ps) from [<%08lx>] (%pB)\n",
>  		loglvl, where, (void *)where, from, (void *)from);
>  #else
> -	printk("%s %ps from %pS\n", loglvl, (void *)where, (void *)from);
> +	printk("%s %ps from %pB\n", loglvl, (void *)where, (void *)from);
>  #endif
>  
>  	if (in_entry_text(from) && end <= ALIGN(frame, THREAD_SIZE))
> diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c
> index 9d2192156087..99ded32196af 100644
> --- a/arch/arm/kernel/unwind.c
> +++ b/arch/arm/kernel/unwind.c
> @@ -407,7 +407,7 @@ int unwind_frame(struct stackframe *frame)
>  {
>  	const struct unwind_idx *idx;
>  	struct unwind_ctrl_block ctrl;
> -	unsigned long sp_low;
> +	unsigned long sp_low, pc;
>  
>  	/* store the highest address on the stack to avoid crossing it*/
>  	sp_low = frame->sp;
> @@ -417,19 +417,22 @@ int unwind_frame(struct stackframe *frame)
>  	pr_debug("%s(pc = %08lx lr = %08lx sp = %08lx)\n", __func__,
>  		 frame->pc, frame->lr, frame->sp);
>  
> -	idx = unwind_find_idx(frame->pc);
> +	pc = frame->ex_frame ? frame->pc : frame->pc - 4;
> +
> +	idx = unwind_find_idx(pc);
>  	if (!idx) {
> -		if (frame->pc && kernel_text_address(frame->pc)) {
> -			if (in_module_plt(frame->pc) && frame->pc != frame->lr) {
> +		if (kernel_text_address(pc)) {
> +			if (in_module_plt(pc) && frame->pc != frame->lr) {
>  				/*
>  				 * Quoting Ard: Veneers only set PC using a
>  				 * PC+immediate LDR, and so they don't affect
>  				 * the state of the stack or the register file
>  				 */
>  				frame->pc = frame->lr;
> +				frame->ex_frame = false;
>  				return URC_OK;
>  			}
> -			pr_warn("unwind: Index not found %08lx\n", frame->pc);
> +			pr_warn("unwind: Index not found %08lx\n", pc);
>  		}
>  		return -URC_FAILURE;
>  	}
> @@ -442,7 +445,7 @@ int unwind_frame(struct stackframe *frame)
>  	if (idx->insn == 1)
>  		/* can't unwind */
>  		return -URC_FAILURE;
> -	else if (frame->pc == prel31_to_addr(&idx->addr_offset)) {
> +	else if (frame->ex_frame && pc == prel31_to_addr(&idx->addr_offset)) {
>  		/*
>  		 * Unwinding is tricky when we're halfway through the prologue,
>  		 * since the stack frame that the unwinder expects may not be
> @@ -451,9 +454,10 @@ int unwind_frame(struct stackframe *frame)
>  		 * a function, we are still effectively in the stack frame of
>  		 * the caller, and the unwind info has no relevance yet.
>  		 */
> -		if (frame->pc == frame->lr)
> +		if (pc == frame->lr)
>  			return -URC_FAILURE;
>  		frame->pc = frame->lr;
> +		frame->ex_frame = false;
>  		return URC_OK;
>  	} else if ((idx->insn & 0x80000000) == 0)
>  		/* prel31 to the unwind table */
> @@ -515,6 +519,7 @@ int unwind_frame(struct stackframe *frame)
>  	frame->lr = ctrl.vrs[LR];
>  	frame->pc = ctrl.vrs[PC];
>  	frame->lr_addr = ctrl.lr_addr;
> +	frame->ex_frame = false;
>  
>  	return URC_OK;
>  }
> @@ -544,6 +549,7 @@ void unwind_backtrace(struct pt_regs *regs, struct task_struct *tsk,
>  		 */
>  here:
>  		frame.pc = (unsigned long)&&here;
> +		frame.ex_frame = false;
>  	} else {
>  		/* task blocked in __switch_to */
>  		frame.fp = thread_saved_fp(tsk);
> @@ -554,11 +560,12 @@ void unwind_backtrace(struct pt_regs *regs, struct task_struct *tsk,
>  		 */
>  		frame.lr = 0;
>  		frame.pc = thread_saved_pc(tsk);
> +		frame.ex_frame = false;
>  	}
>  
>  	while (1) {
>  		int urc;
> -		unsigned long where = frame.pc;
> +		unsigned long where = frame.ex_frame ? frame.pc : frame.pc - 4;
>  
>  		urc = unwind_frame(&frame);
>  		if (urc < 0)
> .
>
Russell King (Oracle) March 6, 2024, 9:52 a.m. UTC | #10
On Tue, Mar 05, 2024 at 09:58:46AM -0800, Josh Poimboeuf wrote:
> This is an off-by-one bug which is common in unwinders, due to the fact
> that the address on the stack points to the return address rather than
> the call address.
> 
> So, for example, when the last instruction of a function is a function
> call (e.g., to a noreturn function), it can cause the unwinder to
> incorrectly try to unwind from the function *after* the callee.

I suppose this can only happen in __noreturn functions because that
can be:

foo:
...
	bl	bar
... end of function and thus next function ...

which results in LR pointing into the next function.

Would it make better sense to lookup the LR value winding it back by
one instruction like ORC on x86 does (as you mention) rather than
the patch you proposed which looks rather large and complicated?
Josh Poimboeuf March 6, 2024, 4:02 p.m. UTC | #11
On Wed, Mar 06, 2024 at 09:52:01AM +0000, Russell King (Oracle) wrote:
> On Tue, Mar 05, 2024 at 09:58:46AM -0800, Josh Poimboeuf wrote:
> > This is an off-by-one bug which is common in unwinders, due to the fact
> > that the address on the stack points to the return address rather than
> > the call address.
> > 
> > So, for example, when the last instruction of a function is a function
> > call (e.g., to a noreturn function), it can cause the unwinder to
> > incorrectly try to unwind from the function *after* the callee.
> 
> I suppose this can only happen in __noreturn functions because that
> can be:
> 
> foo:
> ...
> 	bl	bar
> ... end of function and thus next function ...
> 
> which results in LR pointing into the next function.
> 
> Would it make better sense to lookup the LR value winding it back by
> one instruction like ORC on x86 does (as you mention) rather than
> the patch you proposed which looks rather large and complicated?

That patch *is* an attempt to make it match ORC's behavior.  What
specifically looks complicated about it?
David Laight March 9, 2024, 2:58 p.m. UTC | #12
From: Russell King
> Sent: 06 March 2024 09:52
> 
> On Tue, Mar 05, 2024 at 09:58:46AM -0800, Josh Poimboeuf wrote:
> > This is an off-by-one bug which is common in unwinders, due to the fact
> > that the address on the stack points to the return address rather than
> > the call address.
> >
> > So, for example, when the last instruction of a function is a function
> > call (e.g., to a noreturn function), it can cause the unwinder to
> > incorrectly try to unwind from the function *after* the callee.
> 
> I suppose this can only happen in __noreturn functions because that
> can be:
> 
> foo:
> ..
> 	bl	bar
> .. end of function and thus next function ...
> 
> which results in LR pointing into the next function.
> 
> Would it make better sense to lookup the LR value winding it back by
> one instruction like ORC on x86 does (as you mention) rather than
> the patch you proposed which looks rather large and complicated?

Is it even possible to always reliably get a stack trace from
a no-return function on a cpu that uses a 'lr'?

If the function doesn't return then the compiler need not save
'lr' on stack and can still use it as a temporary register.
Without a valid 'lr' I think all you can do is search the stack
for a likely code address?

Am I missing something?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Jiangfeng Xiao March 18, 2024, 4:01 a.m. UTC | #13
On 2024/3/6 1:58, Josh Poimboeuf wrote:

> Adding ARM folks -- see
> https://lkml.kernel.org/lkml/1709516385-7778-1-git-send-email-xiaojiangfeng@huawei.com
> for the original bug report.
> 
> This is an off-by-one bug which is common in unwinders, due to the fact
> that the address on the stack points to the return address rather than
> the call address.
> 
> So, for example, when the last instruction of a function is a function
> call (e.g., to a noreturn function), it can cause the unwinder to
> incorrectly try to unwind from the function *after* the callee.
> 
> For ORC (x86), we fixed this by decrementing the PC for call frames (but
> not exception frames).  I've seen user space unwinders do similar, for
> non-signal frames.
> 
> Something like the following might fix your issue (completely untested):
> 

Thank you very much. I have verified that your patch can fix my issue.
But I have some little questions.

> diff --git a/arch/arm/include/asm/stacktrace.h b/arch/arm/include/asm/stacktrace.h
> index 360f0d2406bf..4891e38cdc1f 100644
> --- a/arch/arm/include/asm/stacktrace.h
> +++ b/arch/arm/include/asm/stacktrace.h
> @@ -21,9 +21,7 @@ struct stackframe {
>  	struct llist_node *kr_cur;
>  	struct task_struct *tsk;
>  #endif
> -#ifdef CONFIG_UNWINDER_FRAME_POINTER
>  	bool ex_frame;
> -#endif
>  };
>  
>  static __always_inline
> @@ -37,9 +35,8 @@ void arm_get_current_stackframe(struct pt_regs *regs, struct stackframe *frame)
>  		frame->kr_cur = NULL;
>  		frame->tsk = current;
>  #endif
> -#ifdef CONFIG_UNWINDER_FRAME_POINTER
> -		frame->ex_frame = in_entry_text(frame->pc);
> -#endif
> +		frame->ex_frame = !!regs;
> +

'regs' must not be NULL, frame->ex_frame will always be TRUE.
So I think we just need to remove CONFIG_UNWINDER_FRAME_POINTER here.
We don't need to change the frame->ex_frame assignment statement.


> diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c
> index 9d2192156087..99ded32196af 100644
> --- a/arch/arm/kernel/unwind.c
> +++ b/arch/arm/kernel/unwind.c
> @@ -407,7 +407,7 @@ int unwind_frame(struct stackframe *frame)
>  {
>  	const struct unwind_idx *idx;
>  	struct unwind_ctrl_block ctrl;
> -	unsigned long sp_low;
> +	unsigned long sp_low, pc;
>  
>  	/* store the highest address on the stack to avoid crossing it*/
>  	sp_low = frame->sp;
> @@ -417,19 +417,22 @@ int unwind_frame(struct stackframe *frame)
>  	pr_debug("%s(pc = %08lx lr = %08lx sp = %08lx)\n", __func__,
>  		 frame->pc, frame->lr, frame->sp);
>  
> -	idx = unwind_find_idx(frame->pc);
> +	pc = frame->ex_frame ? frame->pc : frame->pc - 4;

For details, see the unwind_next_frame function in the unwind_orc.c.
Why subtract 4 here instead of 1?
`pc = frame->ex_frame ? frame->pc : frame->pc - 1`
Is it more appropriate?

> +
> +	idx = unwind_find_idx(pc);
>  	if (!idx) {
> -		if (frame->pc && kernel_text_address(frame->pc)) {
> -			if (in_module_plt(frame->pc) && frame->pc != frame->lr) {
> +		if (kernel_text_address(pc)) {
> +			if (in_module_plt(pc) && frame->pc != frame->lr) {
>  				/*
>  				 * Quoting Ard: Veneers only set PC using a
>  				 * PC+immediate LDR, and so they don't affect
>  				 * the state of the stack or the register file
>  				 */
>  				frame->pc = frame->lr;
> +				frame->ex_frame = false;
>  				return URC_OK;
>  			}
> -			pr_warn("unwind: Index not found %08lx\n", frame->pc);
> +			pr_warn("unwind: Index not found %08lx\n", pc);
>  		}
>  		return -URC_FAILURE;
>  	}
> @@ -442,7 +445,7 @@ int unwind_frame(struct stackframe *frame)
>  	if (idx->insn == 1)
>  		/* can't unwind */
>  		return -URC_FAILURE;
> -	else if (frame->pc == prel31_to_addr(&idx->addr_offset)) {
> +	else if (frame->ex_frame && pc == prel31_to_addr(&idx->addr_offset)) {
>  		/*
>  		 * Unwinding is tricky when we're halfway through the prologue,
>  		 * since the stack frame that the unwinder expects may not be
> @@ -451,9 +454,10 @@ int unwind_frame(struct stackframe *frame)
>  		 * a function, we are still effectively in the stack frame of
>  		 * the caller, and the unwind info has no relevance yet.
>  		 */
> -		if (frame->pc == frame->lr)
> +		if (pc == frame->lr)
>  			return -URC_FAILURE;
>  		frame->pc = frame->lr;
> +		frame->ex_frame = false;
>  		return URC_OK;
>  	} else if ((idx->insn & 0x80000000) == 0)
>  		/* prel31 to the unwind table */
> @@ -515,6 +519,7 @@ int unwind_frame(struct stackframe *frame)
>  	frame->lr = ctrl.vrs[LR];
>  	frame->pc = ctrl.vrs[PC];
>  	frame->lr_addr = ctrl.lr_addr;
> +	frame->ex_frame = false;

Why is the value of `frame->ex_frame` directly set to false?
Why is the value not determined based on `frame->pc`?
That is, `frame->ex_frame = in_entry_text(frame->pc)`

>  
>  	return URC_OK;
>  }
> @@ -544,6 +549,7 @@ void unwind_backtrace(struct pt_regs *regs, struct task_struct *tsk,
>  		 */
>  here:
>  		frame.pc = (unsigned long)&&here;
> +		frame.ex_frame = false;
>  	} else {
>  		/* task blocked in __switch_to */
>  		frame.fp = thread_saved_fp(tsk);
> @@ -554,11 +560,12 @@ void unwind_backtrace(struct pt_regs *regs, struct task_struct *tsk,
>  		 */
>  		frame.lr = 0;
>  		frame.pc = thread_saved_pc(tsk);
> +		frame.ex_frame = false;
>  	}
>  
>  	while (1) {
>  		int urc;
> -		unsigned long where = frame.pc;
> +		unsigned long where = frame.ex_frame ? frame.pc : frame.pc - 4;
>  
>  		urc = unwind_frame(&frame);
>  		if (urc < 0)
> .
> 

If I refer to your demo patch and submit a new bugfix patch,
can I mark you as "Co-developed-by" in this new bugfix patch?
diff mbox series

Patch

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 3064314..c37af70 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -437,7 +437,7 @@  static inline void user_access_restore(unsigned long flags) { }
 #endif
 
 #ifdef CONFIG_HARDENED_USERCOPY
-void __noreturn usercopy_abort(const char *name, const char *detail,
+void usercopy_abort(const char *name, const char *detail,
 			       bool to_user, unsigned long offset,
 			       unsigned long len);
 #endif
diff --git a/mm/usercopy.c b/mm/usercopy.c
index 83c164a..ca1b22e 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -83,7 +83,7 @@  static noinline int check_stack_object(const void *obj, unsigned long len)
  * kmem_cache_create_usercopy() function to create the cache (and
  * carefully audit the whitelist range).
  */
-void __noreturn usercopy_abort(const char *name, const char *detail,
+void usercopy_abort(const char *name, const char *detail,
 			       bool to_user, unsigned long offset,
 			       unsigned long len)
 {
diff --git a/tools/objtool/noreturns.h b/tools/objtool/noreturns.h
index 1685d7e..c7e341c 100644
--- a/tools/objtool/noreturns.h
+++ b/tools/objtool/noreturns.h
@@ -40,7 +40,6 @@ 
 NORETURN(snp_abort)
 NORETURN(start_kernel)
 NORETURN(stop_this_cpu)
-NORETURN(usercopy_abort)
 NORETURN(x86_64_start_kernel)
 NORETURN(x86_64_start_reservations)
 NORETURN(xen_cpu_bringup_again)