ARM: check __ex_table in do_bad()

Message ID 1573112713-10115-1-git-send-email-Lvqiang.Huang@unisoc.com (mailing list archive)
State New, archived

Commit Message

黄吕强 (Lvqiang Huang) Nov. 7, 2019, 7:45 a.m. UTC
We got many crashes at for_each_frame+0x18 in arch/arm/lib/backtrace.S
    1003: ldr r2, [sv_pc, #-4]

The backtrace is
    dump_backtrace
    show_stack
    sched_show_task
    show_state_filter
    sysrq_handle_showstate_blocked
    __handle_sysrq
    write_sysrq_trigger
    proc_reg_write
    __vfs_write
    vfs_write
    sys_write

Related Kernel config
    CONFIG_CPU_SW_DOMAIN_PAN=y
    # CONFIG_ARM_UNWIND is not set
    CONFIG_FRAME_POINTER=y

Task A was dumping the stack of a task B in the UN (TASK_UNINTERRUPTIBLE)
state. However, task B was scheduled to run on another CPU, which caused
its stack content to change. Task A may then hit a page domain fault and
die().
    [520.661314] Unhandled fault: page domain fault (0x01b) at 0x32848c02

The addr 0x32848c02 is a valid user-space address.
    PAGE DIRECTORY: d1854000
      PGD: d1854ca0 => bb21e835
      PMD: d1854ca0 => bb21e835
      PTE: bb21e120 => afffa79f
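
The page-table walk above can be cross-checked by hand. The sketch below is only an illustration, assuming the classic (non-LPAE) two-level layout as Linux sees it: 2 MiB per pgd entry, 8-byte pgd entries, 512 Linux PTEs of 4 bytes each per table page. The helper names are made up for this example. It also extracts the domain field from the first-level descriptor (bits [8:5]) and from the fault status value (bits [7:4]).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants: classic (non-LPAE) ARM two-level tables as Linux lays them out. */
#define PGDIR_SHIFT   21    /* one Linux pgd entry covers 2 MiB */
#define PGD_ENTRY_SZ  8u    /* a Linux pgd entry is a pair of 32-bit hardware entries */
#define PAGE_SHIFT    12
#define PTRS_PER_PTE  512u
#define PTE_ENTRY_SZ  4u

/* Address of the pgd entry that maps vaddr (hypothetical helper). */
static uint32_t pgd_entry_addr(uint32_t pgd_base, uint32_t vaddr)
{
    return pgd_base + (vaddr >> PGDIR_SHIFT) * PGD_ENTRY_SZ;
}

/* Address of the Linux PTE for vaddr, given the pgd entry value (hypothetical helper). */
static uint32_t pte_entry_addr(uint32_t pgd_val, uint32_t vaddr)
{
    uint32_t table = pgd_val & ~0xfffu;  /* page that holds the Linux PTE table */
    uint32_t index = (vaddr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);

    return table + index * PTE_ENTRY_SZ;
}

/* Domain field of a first-level page-table descriptor: bits [8:5]. */
static unsigned int pgd_domain(uint32_t pgd_val)
{
    return (pgd_val >> 5) & 0xf;
}

/* Domain field of a short-descriptor fault status value: bits [7:4]. */
static unsigned int fsr_domain(uint32_t fsr)
{
    return (fsr >> 4) & 0xf;
}
```

With the values from the dump, pgd_entry_addr(0xd1854000, 0x32848c02) gives 0xd1854ca0 and pte_entry_addr(0xbb21e835, 0x32848c02) gives 0xbb21e120, matching the PGD and PTE lines. Both domain helpers return 1 for 0xbb21e835 and 0x01b, so the faulting mapping and the reported fault refer to the same domain.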

With CONFIG_CPU_SW_DOMAIN_PAN=y, a page domain fault occurred.
    { do_bad, SIGSEGV, SEGV_ACCERR, "page domain fault"},

Without checking the __ex_table entry, do_bad() just returns "fault" and the kernel die()s.
    .pushsection __ex_table,"a"
    .long	1003b, 1006b

This patch tries the __ex_table fixup in do_bad(), the same as in __do_kernel_fault().

Signed-off-by: Lvqiang <Lvqiang.Huang@unisoc.com>
---
 arch/arm/mm/fault.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
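
For readers unfamiliar with the mechanism, here is a minimal user-space model of what an exception-table fixup does. It is a sketch under stated assumptions, not the kernel code: the struct and function names ending in _model are invented for this example, the table is searched linearly, and addresses are plain integers. Each entry pairs the address of an instruction that may fault with the address to resume at, like the "1003b, 1006b" pair in backtrace.S.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model of one exception-table entry (names are hypothetical). */
struct ex_entry_model {
    unsigned long insn;   /* address of the instruction that may fault */
    unsigned long fixup;  /* address to resume at if it does fault */
};

/* Only the program counter matters for this model. */
struct regs_model {
    unsigned long pc;
};

/* Returns 1 and redirects pc if the faulting pc has a table entry, else 0. */
static int fixup_exception_model(struct regs_model *regs,
                                 const struct ex_entry_model *tbl, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].insn == regs->pc) {
            regs->pc = tbl[i].fixup;
            return 1;
        }
    }
    return 0;
}

/* Model of the patched handler: 0 means "handled", 1 means "still a fault". */
static int do_bad_model(struct regs_model *regs,
                        const struct ex_entry_model *tbl, size_t n)
{
    if (fixup_exception_model(regs, tbl, n))
        return 0;

    return 1;
}
```

In this model a fault at an address listed in the table resumes at the fixup address and the handler reports success; a fault anywhere else is still reported as unhandled, which is the behaviour do_bad() has today for every fault.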

Comments

Russell King (Oracle) Nov. 7, 2019, 9:24 a.m. UTC | #1
On Thu, Nov 07, 2019 at 03:45:13PM +0800, Lvqiang wrote:
> 
> We got many crashs in for_each_frame+0x18 arch/arm/lib/backtrace.S
>     1003: ldr r2, [sv_pc, #-4]
> 
> The backtrace is
>     dump_backtrace
>     show_stack
>     sched_show_task
>     show_state_filter
>     sysrq_handle_showstate_blocked
>     __handle_sysrq
>     write_sysrq_trigger
>     proc_reg_write
>     __vfs_write
>     vfs_write
>     sys_write
> 
> Related Kernel config
>     CONFIG_CPU_SW_DOMAIN_PAN=y
>     # CONFIG_ARM_UNWIND is not set
>     CONFIG_FRAME_POINTER=y
> 
> The task A was dumping the stack of an UN task B. However, the task B

What is "an UN task B"?

> scheduled to run on another CPU, which cause it stack content changed.
> Then, task A may hit a page domain fault and die().
>     [520.661314] Unhandled fault: page domain fault (0x01b) at 0x32848c02

So, the backtrace code is trying to access userspace.  It isn't supposed
to be accessing userspace - there are no guarantees that userspace will
be using frame pointers.  That is the bug.
黄吕强 (Lvqiang Huang) Nov. 7, 2019, 3:30 p.m. UTC | #2
Hi Russell,
Thanks a lot for the reply!

UN means TASK_INTERRUPTIBLE. 

Task A found Task B in TASK_INTERRUPTIBLE.
But while Task A was trying to get the backtrace of Task B, Task B changed to TASK_RUNNING.

Task B pushes to and pops from its stack while executing, so the stack contents of Task B changed a lot.
But Task A computed an address and loaded a value from it as the sv_fp of Task B.
1002:		ldr	sv_fp, [frame, #-12]	@ get saved fp

But since Task B was already TASK_RUNNING, the sv_fp that Task A gets can be any value, changed by the execution of Task B.
It can be an accessible user-space address in Task A's address space.

If we enable CONFIG_ARM_UNWIND, the crash is gone.
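
To make the failure mode concrete, here is a small sketch of the kind of sanity check a frame-pointer walker could apply before dereferencing values it read from another task's stack. This is an assumption-laden illustration, not what backtrace.S does: the walk_limits struct, both helpers, and the idea of passing explicit ranges are all invented for the example. It does not remove the race; it only shows how a stale frame pointer or saved pc could be rejected instead of being dereferenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the memory a walker is allowed to read. */
struct walk_limits {
    uintptr_t stack_lo, stack_hi;  /* [lo, hi) of the target task's kernel stack */
    uintptr_t text_lo, text_hi;    /* [lo, hi) of kernel text */
};

/*
 * The walker reads words between fp - 12 and fp + 3, so the whole
 * frame record has to lie inside the stack and be word aligned.
 */
static bool frame_ok(const struct walk_limits *l, uintptr_t fp)
{
    return (fp & 3) == 0 &&
           fp >= l->stack_lo + 12 &&
           fp <= l->stack_hi - 4;
}

/* The walker reads the word at pc - 4, so that word has to be kernel text. */
static bool pc_ok(const struct walk_limits *l, uintptr_t pc)
{
    return pc >= l->text_lo + 4 && pc <= l->text_hi;
}
```

With limits that cover only the kernel stack and kernel text, a value such as 0x32848c02 fails pc_ok() and the walk could stop there rather than touch a user-space address.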

============================================================================
This email (including its attachments) is intended only for the person or entity to which it is addressed and may contain information that is privileged, confidential or otherwise protected from disclosure. Unauthorized use, dissemination, distribution or copying of this email or the information herein or taking any action in reliance on the contents of this email or the information herein, by anyone other than the intended recipient, or an employee or agent responsible for delivering the message to the intended recipient, is strictly prohibited. If you are not the intended recipient, please do not read, copy, use or disclose any part of this e-mail to others. Please notify the sender immediately and permanently delete this e-mail and any attachments if you received it in error. Internet communications cannot be guaranteed to be timely, secure, error-free or virus-free. The sender does not accept liability for any errors or omissions. 
本邮件及其附件具有保密性质,受法律保护不得泄露,仅发送给本邮件所指特定收件人。严禁非经授权使用、宣传、发布或复制本邮件或其内容。若非该特定收件人,请勿阅读、复制、 使用或披露本邮件的任何内容。若误收本邮件,请从系统中永久性删除本邮件及所有附件,并以回复邮件的方式即刻告知发件人。无法保证互联网通信及时、安全、无误或防毒。发件人对任何错漏均不承担责任。
黄吕强 (Lvqiang Huang) Nov. 7, 2019, 5:22 p.m. UTC | #3
> On Nov 7, 2019, at 17:24, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote:
> 
>> On Thu, Nov 07, 2019 at 03:45:13PM +0800, Lvqiang wrote:
>> 
>> We got many crashs in for_each_frame+0x18 arch/arm/lib/backtrace.S
>>    1003: ldr r2, [sv_pc, #-4]
>> 
>> The backtrace is
>>    dump_backtrace
>>    show_stack
>>    sched_show_task
>>    show_state_filter
>>    sysrq_handle_showstate_blocked
>>    __handle_sysrq
>>    write_sysrq_trigger
>>    proc_reg_write
>>    __vfs_write
>>    vfs_write
>>    sys_write
>> 
>> Related Kernel config
>>    CONFIG_CPU_SW_DOMAIN_PAN=y
>>    # CONFIG_ARM_UNWIND is not set
>>    CONFIG_FRAME_POINTER=y
>> 
>> The task A was dumping the stack of an UN task B. However, the task B
> 
> What is "an UN task B"?

UN means TASK_UNINTERRUPTIBLE. 
(Sorry for the typo in the last reply)

>> scheduled to run on another CPU, which cause it stack content changed.
>> Then, task A may hit a page domain fault and die().
>>    [520.661314] Unhandled fault: page domain fault (0x01b) at 0x32848c02
> 
> So, the backtrace code is trying to access userspace.  It isn't supposed
> to be accessing userspace - there are no guarantees that userspace will
> be using frame pointers.  That is the bug.
> 

There is a race condition when trying to get the backtrace of another task, whose frames may change completely while it executes.

> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> According to speedtest.net: 11.9Mbps down 500kbps up


黄吕强 (Lvqiang Huang) Nov. 8, 2019, 2:16 a.m. UTC | #4
Sorry for not having described it clearly; please let me add some more information.

The kernel log for the scenario
[20461.271374] sysrq: SysRq : Show Blocked State
[20461.271405]   task                PC stack   pid father
[20461.271436] mbox-send-threa D c08cfad8     0    38      2 0x00000000
/* some log lines about the backtrace dumps of other TASK_UNINTERRUPTIBLE tasks are omitted here */
[20461.273387] fsck.exfat      D c08cfad8     0  6221   2276 0x00000000
[20461.273408] Backtrace:
[20461.273430] [<c08cf5d0>] (__schedule) from [<c08cff84>] (schedule+0x90/0xa8)
[20461.273442]  r10:ce009ef0 r9:ce009df4 r8:c0d0790c r7:00000082 r6:7fffffff r5:00000000
[20461.273477]  r4:ce008000
[20461.273497] [<c08cfef4>] (schedule) from [<c08d2b90>] (schedule_timeout+0x2c/0x26c)
[20461.273509]  r4:7fffffff r3:dc8ba693
[20461.273561] Unhandled fault: page domain fault (0x01b) at 0x32848c02
[20461.273576] pgd = d1854000
[20461.273587] [32848c02] *pgd=bb21e835
[20461.273607] Internal error: : 1b [#1] PREEMPT SMP ARM
[20461.278903] CPU: 2 PID: 5917 Comm: watchdog Tainted: G        W  O    4.4.147+ #1
[20461.278929] task: e9beecc0 task.stack: e30a4000
[20461.278949] PC is at for_each_frame+0x18/0x88
[20461.278965] LR is at vprintk_emit+0x470/0x4ec

Task A: the task that finally crashed (PID: 5917, Comm: watchdog), running on CPU 2 and dumping the backtrace of all UN tasks.
Task B: fsck.exfat (PID 6221), which changed from TASK_UNINTERRUPTIBLE to TASK_RUNNING while Task A was trying to dump its backtrace.

The first 2 frames dumped for Task B are OK:
[20461.273430] [<c08cf5d0>] (__schedule) from [<c08cff84>] (schedule+0x90/0xa8)
[20461.273497] [<c08cfef4>] (schedule) from [<c08d2b90>] (schedule_timeout+0x2c/0x26c)

Then task A crashed:
[20461.273561] Unhandled fault: page domain fault (0x01b) at 0x32848c02

From the RAM dump taken after the kernel crash, we can see that Task B had been scheduled to run on CPU 0.
crash_arm> ps 6221
   PID    PPID  CPU   TASK    ST  %MEM     VSZ    RSS  COMM
>  6221   2276   0  cde04880  RU   0.4   17784  13596  fsck.exfat

And its backtrace had changed, which caused the crash of Task A.
crash_arm> bt 6221
PID: 6221   TASK: cde04880  CPU: 0   COMMAND: "fsck.exfat"
 #0 [<c0117a5c>] (__kunmap_atomic) from [<c0413ae8>]
 #1 [<c0413894>] (copy_page_to_iter) from [<c01f4788>]
 #2 [<c01f439c>] (generic_file_read_iter) from [<c02725e8>]
 #3 [<c027257c>] (blkdev_read_iter) from [<c023b5b0>]
 #4 [<c023b4f8>] (__vfs_read) from [<c023bd04>]
 #5 [<c023bc78>] (vfs_read) from [<c023c7e0>]
 #6 [<c023c76c>] (sys_pread64) from [<c01079a0>]

This is the race condition: trying to backtrace another task is not safe. We can't assume the task won't be scheduled to run during the backtrace dump. Its stack frames can change completely once it executes again.

The __ex_table entry in for_each_frame is presumably there for this scenario. But with CONFIG_CPU_SW_DOMAIN_PAN=y, a page domain fault may be hit, and it goes to do_bad() instead of do_page_fault().

The patch may not be an optimal solution; I just want to point out the problem. Is there any concern if we check __ex_table in do_bad()?

Our project has now enabled CONFIG_ARM_UNWIND=y. With it, the unwinder fails to get an unwind_idx when it reads a wrong sv_pc, and the unwind then aborts without a kernel crash.
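
The difference in behaviour can be sketched with a simplified model of a table-driven unwinder. This is only an illustration with invented names and an invented table layout (explicit start and end addresses, sorted by start); the real unwind index is encoded differently. The point is that a bogus pc is looked up in a table rather than dereferenced, so a failed lookup just ends the walk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical index entry: one per function, sorted by start address. */
struct idx_entry_model {
    unsigned long start;  /* first address of the function */
    unsigned long end;    /* one past the last address */
};

/* Binary search for the entry covering pc; NULL if no function covers it. */
static const struct idx_entry_model *
find_idx_model(const struct idx_entry_model *tbl, size_t n, unsigned long pc)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (pc < tbl[mid].start)
            hi = mid;
        else if (pc >= tbl[mid].end)
            lo = mid + 1;
        else
            return &tbl[mid];
    }
    return NULL;
}

/* Returns 0 if the walk may continue from pc, -1 if it has to stop. */
static int unwind_step_model(const struct idx_entry_model *tbl, size_t n,
                             unsigned long pc)
{
    return find_idx_model(tbl, n, pc) ? 0 : -1;
}
```

In this model a pc inside a known function lets the walk continue, while a value like 0x32848c02 finds no entry and the walk stops without reading memory at that address.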


Patch

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index bd0f482..22f45df 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -487,11 +487,14 @@  static inline bool access_error(unsigned int fsr, struct vm_area_struct *vma)
 #endif /* CONFIG_ARM_LPAE */
 
 /*
- * This abort handler always returns "fault".
+ * Checks __ex_table before returning "fault".
  */
 static int
 do_bad(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 {
+	if (fixup_exception(regs))
+		return 0;
+
 	return 1;
 }
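
For context, the following is a rough user-space model of how a data abort reaches a handler such as do_bad() on the non-LPAE configuration, and why its return value matters. All names ending in _model are invented for this sketch. It assumes the fault status index is built from bits [3:0] and bit 10 of the FSR, and that a handler returning zero means the fault was dealt with.

```c
#include <assert.h>
#include <stddef.h>

#define FSR_FS3_0  0xfu        /* fault status bits [3:0] */
#define FSR_FS4    (1u << 10)  /* fault status bit 4 lives in FSR bit 10 */

/* Table index for a short-descriptor fault status value. */
static unsigned int fsr_fs_model(unsigned int fsr)
{
    return (fsr & FSR_FS3_0) | ((fsr & FSR_FS4) >> 6);
}

/* A handler returns 0 when it handled the fault, non-zero otherwise. */
typedef int (*fault_fn_model)(unsigned long addr, unsigned int fsr);

struct fsr_info_model {
    fault_fn_model fn;
    const char *name;
};

/* Stand-in for a handler that always reports "fault", like do_bad() today. */
static int always_fault_model(unsigned long addr, unsigned int fsr)
{
    (void)addr;
    (void)fsr;
    return 1;
}

/* Stand-in for a handler that found a fixup and recovered. */
static int recovered_model(unsigned long addr, unsigned int fsr)
{
    (void)addr;
    (void)fsr;
    return 0;
}

/*
 * Returns NULL if the selected handler dealt with the fault, otherwise the
 * name that would appear in an "Unhandled fault: ..." message.
 */
static const char *dispatch_model(const struct fsr_info_model *tbl,
                                  unsigned long addr, unsigned int fsr)
{
    const struct fsr_info_model *inf = &tbl[fsr_fs_model(fsr)];

    if (!inf->fn(addr, fsr))
        return NULL;

    return inf->name;
}
```

For the value 0x01b from the log, fsr_fs_model() yields index 11, the "page domain fault" slot. With a handler that always returns non-zero in that slot, the fault is reported as unhandled; once the handler can return zero after a successful fixup, execution resumes instead.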