
[v3,8/8] arm64: exception: check shared writable page in SEI handler

Message ID 1490869877-118713-9-git-send-email-xiexiuqi@huawei.com
State New, archived

Commit Message

Xie XiuQi March 30, 2017, 10:31 a.m. UTC
From: Wang Xiongfeng <wangxiongfeng2@huawei.com>

Since SEI is asynchronous, the erroneous data has already been consumed,
so we must assume that all memory the current process can write to is
contaminated. If the process has no shared writable pages, it is killed
and the system continues running normally. Otherwise, the system must be
terminated, because the error has propagated to other processes running
on other cores, and may recursively propagate to further processes.

Signed-off-by: Wang Xiongfeng <wangxiongfengi2@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
 arch/arm64/kernel/traps.c | 149 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 144 insertions(+), 5 deletions(-)
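The policy in the commit message reduces to one question: does the task own any page that is writable without COW and mapped by at least two processes? The following is a user-space model of that decision, not kernel code. `struct page_model`, `sei_policy()` and the enum are hypothetical names. The `mapcount >= 2` threshold is the one the patch uses.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one page in the task's address space. */
struct page_model {
	bool writable;	/* writable without COW, e.g. a MAP_SHARED mapping */
	int mapcount;	/* number of processes mapping the page */
};

enum sei_action { SEI_KILL_TASK, SEI_PANIC };

/* Same threshold as the patch: writable and mapped at least twice. */
static bool has_shared_writable(const struct page_model *pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (pages[i].writable && pages[i].mapcount >= 2)
			return true;
	return false;
}

/* Kill only when the corruption cannot have escaped via shared memory. */
static enum sei_action sei_policy(const struct page_model *pages, size_t n)
{
	return has_shared_writable(pages, n) ? SEI_PANIC : SEI_KILL_TASK;
}
```

In the kernel, this scan runs over live page tables, so its result is only a snapshot.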

Comments

James Morse April 7, 2017, 3:56 p.m. UTC | #1
Hi Xie XiuQi,

On 30/03/17 11:31, Xie XiuQi wrote:
> From: Wang Xiongfeng <wangxiongfeng2@huawei.com>
> 
> Since SEI is asynchronous, the erroneous data has already been consumed,
> so we must assume that all memory the current process can write to is
> contaminated. If the process has no shared writable pages, it is killed
> and the system continues running normally. Otherwise, the system must be
> terminated, because the error has propagated to other processes running
> on other cores, and may recursively propagate to further processes.

This is pretty complicated. We can't guarantee that another CPU hasn't modified
the page tables while we do this (so it's racy). We can't guarantee that the
corrupt data hasn't been sent over the network or written to disk in the
meantime (so it's not enough).

The scenario you have is a write of corrupt data to memory where another CPU
reading it doesn't know the value is corrupt.

The hardware gives us quite a lot of help containing errors. The RAS
specification (DDI 0587A) describes your scenario as error propagation in '2.1.2
Architectural error propagation', and then classifies it in '2.1.3
Architecturally infected, containable and uncontainable' as uncontained because
the value is no longer in the general-purpose registers. For uncontained errors
we should panic().

We shouldn't need to try to track errors after we get a notification as the
hardware has done this for us.


Firmware-first does complicate this if events like this are not delivered using
a synchronous external abort, as Linux may have PSTATE.A masked, preventing
SError interrupts from being taken. It looks like PSTATE.A is masked much more
often than is necessary. I will look into cleaning this up.


Thanks,

James
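The following sketches the classification described above, assuming the ESR has already been identified as a RAS SError (IDS clear, DFSC reporting an asynchronous SError) and that the AET encodings match the kernel's `ESR_ELx_AET_*` definitions. Those encodings are redefined locally here under hypothetical names. Only an uncontainable (UC) error, where the value has escaped the general-purpose registers, leads to panic(). Reserved encodings are also treated as uncontained, which is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* AET is ESR_ELx bits [12:10]; values mirror ESR_ELx_AET_* (local copies). */
#define AET_SHIFT	10
#define AET_MASK	(UINT32_C(0x7) << AET_SHIFT)
#define AET_UC		(UINT32_C(0) << AET_SHIFT)	/* uncontainable */
#define AET_UEU		(UINT32_C(1) << AET_SHIFT)	/* unrecoverable */
#define AET_UEO		(UINT32_C(2) << AET_SHIFT)	/* restartable */
#define AET_UER		(UINT32_C(3) << AET_SHIFT)	/* recoverable */
#define AET_CE		(UINT32_C(6) << AET_SHIFT)	/* corrected */

static uint32_t esr_aet(uint32_t esr)
{
	return esr & AET_MASK;
}

/*
 * True when the error must be treated as uncontained, i.e. panic().
 * Reserved encodings are treated conservatively as uncontained too.
 */
static bool aet_is_uncontained(uint32_t esr)
{
	switch (esr_aet(esr)) {
	case AET_UEU:
	case AET_UEO:
	case AET_UER:
	case AET_CE:
		return false;
	default:	/* AET_UC or a reserved value */
		return true;
	}
}
```

Because the hardware records the containment state in the syndrome, this check needs no page-table walk.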
Xiongfeng Wang April 12, 2017, 8:35 a.m. UTC | #2
Hi James,


On 2017/4/7 23:56, James Morse wrote:
> Hi Xie XiuQi,
> 
> On 30/03/17 11:31, Xie XiuQi wrote:
>> From: Wang Xiongfeng <wangxiongfeng2@huawei.com>
>>
>> Since SEI is asynchronous, the erroneous data has already been consumed,
>> so we must assume that all memory the current process can write to is
>> contaminated. If the process has no shared writable pages, it is killed
>> and the system continues running normally. Otherwise, the system must be
>> terminated, because the error has propagated to other processes running
>> on other cores, and may recursively propagate to further processes.
> 
> This is pretty complicated. We can't guarantee that another CPU hasn't modified
> the page tables while we do this (so it's racy). We can't guarantee that the
> corrupt data hasn't been sent over the network or written to disk in the
> meantime (so it's not enough).
> 
> The scenario you have is a write of corrupt data to memory where another CPU
> reading it doesn't know the value is corrupt.
> 
> The hardware gives us quite a lot of help containing errors. The RAS
> specification (DDI 0587A) describes your scenario as error propagation in '2.1.2
> Architectural error propagation', and then classifies it in '2.1.3
> Architecturally infected, containable and uncontainable' as uncontained because
> the value is no longer in the general-purpose registers. For uncontained errors
> we should panic().
> 
> We shouldn't need to try to track errors after we get a notification as the
> hardware has done this for us.
> 
Thanks for your comments. I think what you said is reasonable. We will drop this
patch and instead use the AET field of ESR_ELx to determine whether we should
kill the current process or panic.
> 
> Firmware-first does complicate this if events like this are not delivered using
> a synchronous external abort, as Linux may have PSTATE.A masked, preventing
> SError interrupts from being taken. It looks like PSTATE.A is masked much more
> often than is necessary. I will look into cleaning this up.
> 
> 
> Thanks,
> 
> James
Thanks,
Wang Xiongfeng
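The replacement approach, where the AET field alone decides between killing the task and panicking, could look roughly like this. This is a hypothetical sketch, and `sei_action_for()` and its enums are illustrative names. It follows the EL0 branch of `do_sei()` in this patch: UEO and above continue, and UEU kills the task. It departs from that branch in two places, both assumptions. First, UEU taken from EL1 panics. Second, reserved encodings (4, 5, 7) panic, whereas the patch's `aet >= ESR_ELx_AET_UEO` test would let them continue.

```c
#include <stdint.h>

enum sei_action { SEI_CONTINUE, SEI_KILL_TASK, SEI_PANIC };

/* Unshifted AET encodings (ESR_ELx bits [12:10]); hypothetical local names. */
enum { AET_UC = 0, AET_UEU = 1, AET_UEO = 2, AET_UER = 3, AET_CE = 6 };

/* Decide how to handle an SError taken from exception level @el. */
static enum sei_action sei_action_for(uint32_t esr, int el)
{
	unsigned int aet = (esr >> 10) & 0x7;

	switch (aet) {
	case AET_UEO:	/* restartable */
	case AET_UER:	/* recoverable */
	case AET_CE:	/* corrected */
		return SEI_CONTINUE;
	case AET_UEU:	/* unrecoverable, but not silently propagated */
		return el == 0 ? SEI_KILL_TASK : SEI_PANIC;
	default:	/* AET_UC or reserved: treat as uncontained */
		return SEI_PANIC;
	}
}
```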

Patch

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 99be6d8..b222589 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -34,6 +34,8 @@ 
 #include <linux/sched/task_stack.h>
 #include <linux/syscalls.h>
 #include <linux/mm_types.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
 
 #include <asm/atomic.h>
 #include <asm/bug.h>
@@ -662,7 +664,144 @@  asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
 	[ESR_ELx_AET_CE]	=	"Corrected",
 };
 
+static void shared_writable_pte_entry(pte_t *pte, unsigned long addr,
+				struct mm_walk *walk)
+{
+	int *is_shared_writable = walk->private;
+	struct vm_area_struct *vma = walk->vma;
+	struct page *page = NULL;
+	int mapcount = -1;
+
+	if (!pte_write(__pte(pgprot_val(vma->vm_page_prot))))
+		return;
+
+	if (pte_present(*pte)) {
+		page = vm_normal_page(vma, addr, *pte);
+	} else if (is_swap_pte(*pte)) {
+		swp_entry_t swpent = pte_to_swp_entry(*pte);
+
+		if (!non_swap_entry(swpent))
+			mapcount = swp_swapcount(swpent);
+		else if (is_migration_entry(swpent))
+			page = migration_entry_to_page(swpent);
+	}
+
+	if (mapcount == -1 && page)
+		mapcount = page_mapcount(page);
+	if (mapcount >= 2)
+		*is_shared_writable = 1;
+}
+
+static void shared_writable_pmd_entry(pmd_t *pmd, unsigned long addr,
+				struct mm_walk *walk)
+{
+	struct page *page;
+	int mapcount;
+	int *is_shared_writable = walk->private;
+
+	if (!pmd_write(*pmd))
+		return;
+
+	page = pmd_page(*pmd);
+	if (page) {
+		mapcount = page_mapcount(page);
+		if (mapcount >= 2)
+			*is_shared_writable = 1;
+	}
+}
+
+static int shared_writable_pte_range(pmd_t *pmd, unsigned long addr,
+				unsigned long end, struct mm_walk *walk)
+{
+	pte_t *pte;
+
+	if (pmd_trans_huge(*pmd)) {
+		shared_writable_pmd_entry(pmd, addr, walk);
+		return 0;
+	}
+
+	if (pmd_trans_unstable(pmd))
+		return 0;
+
+	pte = pte_offset_map(pmd, addr);
+	for (; addr != end; pte++, addr += PAGE_SIZE)
+		shared_writable_pte_entry(pte, addr, walk);
+	return 0;
+}
+
+#ifdef CONFIG_HUGETLB_PAGE
+static int shared_writable_hugetlb_range(pte_t *pte, unsigned long hmask,
+					unsigned long addr, unsigned long end,
+					struct mm_walk *walk)
+{
+	struct vm_area_struct *vma = walk->vma;
+	int *is_shared_writable = walk->private;
+	struct page *page = NULL;
+	int mapcount;
+
+	if (!pte_write(*pte))
+		return 0;
+
+	if (pte_present(*pte)) {
+		page = vm_normal_page(vma, addr, *pte);
+	} else if (is_swap_pte(*pte)) {
+		swp_entry_t swpent = pte_to_swp_entry(*pte);
+
+		if (is_migration_entry(swpent))
+			page = migration_entry_to_page(swpent);
+	}
+
+	if (page) {
+		mapcount = page_mapcount(page);
+
+		if (mapcount >= 2)
+			*is_shared_writable = 1;
+	}
+	return 0;
+}
+#endif
+
+/*
+ * Check whether @mm has a writable (not COW) page shared with another
+ * process. Returns 1 if so, 0 if not, and -EPERM if @mm is NULL.
+ */
+int mm_shared_writable(struct mm_struct *mm)
+{
+	struct vm_area_struct *vma;
+	int is_shared_writable = 0;
+	struct mm_walk shared_writable_walk = {
+		.pmd_entry = shared_writable_pte_range,
+#ifdef CONFIG_HUGETLB_PAGE
+		.hugetlb_entry = shared_writable_hugetlb_range,
+#endif
+		.mm = mm,
+		.private = &is_shared_writable,
+	};
+
+	if (!mm)
+		return -EPERM;
+
+	vma = mm->mmap;
+	while (vma) {
+		walk_page_vma(vma, &shared_writable_walk);
+		if (is_shared_writable)
+			return 1;
+		vma = vma->vm_next;
+	}
+	return 0;
+}
+
 DEFINE_PER_CPU(int, sei_in_process);
+
+/*
+ * Since SEI is asynchronous, the erroneous data has already been
+ * consumed, so we must assume that all memory the current process can
+ * write to is contaminated. If the process has no shared writable pages,
+ * it is killed and the system continues running normally. Otherwise, the
+ * system must be terminated, because the error has propagated to other
+ * processes running on other cores, and may recursively propagate to
+ * further processes.
+ */
 asmlinkage void do_sei(struct pt_regs *regs, unsigned int esr, int el)
 {
 	int aet = ESR_ELx_AET(esr);
@@ -684,16 +823,16 @@  asmlinkage void do_sei(struct pt_regs *regs, unsigned int esr, int el)
 	if (el == 0 && IS_ENABLED(CONFIG_ARM64_ESB) &&
 	    cpus_have_cap(ARM64_HAS_RAS_EXTN)) {
 		siginfo_t info;
-		void __user *pc = (void __user *)instruction_pointer(regs);
 
 		if (aet >= ESR_ELx_AET_UEO)
 			return;
 
-		if (aet == ESR_ELx_AET_UEU) {
-			info.si_signo = SIGILL;
+		if (aet == ESR_ELx_AET_UEU &&
+		    !mm_shared_writable(current->mm)) {
+			info.si_signo = SIGKILL;
 			info.si_errno = 0;
-			info.si_code  = ILL_ILLOPC;
-			info.si_addr  = pc;
+			info.si_code = 0;
+			info.si_addr = 0;
 
 			current->thread.fault_address = 0;
 			current->thread.fault_code = 0;
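The per-entry decision in `shared_writable_pte_entry()` can be modeled in user space as follows. The entry kinds and counts are hypothetical stand-ins for a present PTE, a swap entry and a migration entry. The model also leaves out three things the real code has to deal with:

- the case where `vm_normal_page()` returns NULL,
- `pte_unmap()`,
- concurrent page-table updates, which the review identifies as making the walk racy.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of what a PTE can hold. */
enum entry_kind { ENTRY_NONE, ENTRY_PRESENT, ENTRY_SWAP, ENTRY_MIGRATION };

struct entry_model {
	enum entry_kind kind;
	int page_mapcount;	/* used for PRESENT and MIGRATION entries */
	int swap_count;		/* used for SWAP entries */
};

/*
 * Same order of checks as shared_writable_pte_entry(): skip VMAs that
 * are not writable, use the swap count for swapped-out pages, otherwise
 * the page's mapcount; two or more users means the page is shared.
 */
static bool entry_is_shared_writable(bool vma_writable,
				     const struct entry_model *e)
{
	int count = -1;

	if (!vma_writable)
		return false;

	switch (e->kind) {
	case ENTRY_PRESENT:
	case ENTRY_MIGRATION:
		count = e->page_mapcount;
		break;
	case ENTRY_SWAP:
		count = e->swap_count;
		break;
	case ENTRY_NONE:
		break;
	}
	return count >= 2;
}
```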