From patchwork Thu Oct 5 19:18:04 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Morse
X-Patchwork-Id: 9987893
From: James Morse
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v3 12/20] arm64: kernel: Survive corrected RAS errors notified by SError
Date: Thu, 5 Oct 2017 20:18:04 +0100
Message-Id: <20171005191812.5678-13-james.morse@arm.com>
X-Mailer: git-send-email 2.13.3
In-Reply-To: <20171005191812.5678-1-james.morse@arm.com>
References: <20171005191812.5678-1-james.morse@arm.com>
Cc: Jonathan.Zhang@cavium.com, Xie XiuQi, Marc Zyngier, Catalin Marinas,
 Will Deacon, wangxiongfeng2@huawei.com, James Morse,
 kvmarm@lists.cs.columbia.edu, Christoffer Dall

Prior to v8.2, SError is an uncontainable fatal exception. The v8.2
RAS extensions use SError to notify software about RAS errors; these
can be contained by the ESB instruction.

An ACPI system with firmware-first may use SError as its 'SEI'
notification. Future patches may add code to 'claim' this SError as a
notification.

Other systems can distinguish these RAS errors from the SError ESR and
use the AET bits and additional data from RAS-Error registers to handle
the error. Future patches may add this kernel-first handling.

Without support for either of these we will panic(), even if we received
a corrected error. Add code to decode the severity of RAS errors. We can
safely ignore contained errors where the CPU can continue to make
progress. For all other errors we continue to panic().

Signed-off-by: James Morse

==
I couldn't come up with a concise way to capture 'can continue to make
progress', so opted for 'blocking' instead.
---
 arch/arm64/include/asm/esr.h   | 10 ++++++++
 arch/arm64/include/asm/traps.h | 36 ++++++++++++++++++++++++++
 arch/arm64/kernel/traps.c      | 58 ++++++++++++++++++++++++++++++++++++++----
 3 files changed, 99 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 66ed8b6b9976..8ea52f15bf1c 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -85,6 +85,15 @@
 #define ESR_ELx_WNR_SHIFT	(6)
 #define ESR_ELx_WNR		(UL(1) << ESR_ELx_WNR_SHIFT)
 
+/* Asynchronous Error Type */
+#define ESR_ELx_AET		(UL(0x7) << 10)
+
+#define ESR_ELx_AET_UC		(UL(0) << 10)	/* Uncontainable */
+#define ESR_ELx_AET_UEU		(UL(1) << 10)	/* Uncorrected Unrecoverable */
+#define ESR_ELx_AET_UEO		(UL(2) << 10)	/* Uncorrected Restartable */
+#define ESR_ELx_AET_UER		(UL(3) << 10)	/* Uncorrected Recoverable */
+#define ESR_ELx_AET_CE		(UL(6) << 10)	/* Corrected */
+
 /* Shared ISS field definitions for Data/Instruction aborts */
 #define ESR_ELx_SET_SHIFT	(11)
 #define ESR_ELx_SET_MASK	(UL(3) << ESR_ELx_SET_SHIFT)
@@ -99,6 +108,7 @@
 #define ESR_ELx_FSC		(0x3F)
 #define ESR_ELx_FSC_TYPE	(0x3C)
 #define ESR_ELx_FSC_EXTABT	(0x10)
+#define ESR_ELx_FSC_SERROR	(0x11)
 #define ESR_ELx_FSC_ACCESS	(0x08)
 #define ESR_ELx_FSC_FAULT	(0x04)
 #define ESR_ELx_FSC_PERM	(0x0C)
diff --git a/arch/arm64/include/asm/traps.h b/arch/arm64/include/asm/traps.h
index d131501c6222..8d2a1fff5c6b 100644
--- a/arch/arm64/include/asm/traps.h
+++ b/arch/arm64/include/asm/traps.h
@@ -19,6 +19,7 @@
 #define __ASM_TRAP_H
 
 #include
+#include
 #include
 
 struct pt_regs;
@@ -58,4 +59,39 @@ static inline int in_entry_text(unsigned long ptr)
 	return ptr >= (unsigned long)&__entry_text_start &&
 	       ptr < (unsigned long)&__entry_text_end;
 }
+
+static inline bool arm64_is_ras_serror(u32 esr)
+{
+	bool impdef = esr & ESR_ELx_ISV; /* aka IDS */
+
+	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
+		return !impdef;
+
+	return false;
+}
+
+/* Return the AET bits of an SError ESR, or 0/uncontainable/uncategorized */
+static inline u32 arm64_ras_serror_get_severity(u32 esr)
+{
+	u32 aet = esr & ESR_ELx_AET;
+
+	if (!arm64_is_ras_serror(esr)) {
+		/* Not a RAS error, we can't interpret the ESR */
+		return 0;
+	}
+
+	/*
+	 * AET is RES0 if 'the value returned in the DFSC field is not
+	 * [ESR_ELx_FSC_SERROR]'
+	 */
+	if ((esr & ESR_ELx_FSC) != ESR_ELx_FSC_SERROR) {
+		/* No severity information */
+		return 0;
+	}
+
+	return aet;
+}
+
+bool arm64_blocking_ras_serror(struct pt_regs *regs, unsigned int esr);
+void __noreturn arm64_serror_panic(struct pt_regs *regs, u32 esr);
 #endif
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 210647d33bea..ab0f61d98fd6 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -709,17 +709,65 @@ asmlinkage void handle_bad_stack(struct pt_regs *regs)
 }
 #endif
 
-asmlinkage void do_serror(struct pt_regs *regs, unsigned int esr)
+void __noreturn arm64_serror_panic(struct pt_regs *regs, u32 esr)
 {
-	nmi_enter();
-
 	console_verbose();
 
 	pr_crit("SError Interrupt on CPU%d, code 0x%08x -- %s\n",
 		smp_processor_id(), esr, esr_get_class_string(esr));
-	__show_regs(regs);
+	if (regs)
+		__show_regs(regs);
+
+	/* KVM may call this from a preemptible context */
+	preempt_disable();
+
+	/*
+	 * panic() unmasks interrupts, which unmasks SError. Use nmi_panic()
+	 * to avoid re-entering panic.
+	 */
+	nmi_panic(regs, "Asynchronous SError Interrupt");
+
+	cpu_park_loop();
+	unreachable();
+}
+
+bool arm64_blocking_ras_serror(struct pt_regs *regs, unsigned int esr)
+{
+	u32 aet = arm64_ras_serror_get_severity(esr);
+
+	switch (aet) {
+	case ESR_ELx_AET_CE:	/* corrected error */
+	case ESR_ELx_AET_UEO:	/* restartable, not yet consumed */
+		/*
+		 * The CPU can make progress. We may take UEO again as
+		 * a more severe error.
+		 */
+		return false;
+
+	case ESR_ELx_AET_UEU:	/* Uncorrected Unrecoverable */
+	case ESR_ELx_AET_UER:	/* Uncorrected Recoverable */
+		/*
+		 * The CPU can't make progress. The exception may have
+		 * been imprecise.
+		 */
+		return true;
+
+	case ESR_ELx_AET_UC:	/* Uncontainable error */
+	default:
+		/* Error has been silently propagated */
+		arm64_serror_panic(regs, esr);
+	}
+}
+
+asmlinkage void do_serror(struct pt_regs *regs, unsigned int esr)
+{
+	nmi_enter();
+
+	/* non-RAS errors are not containable */
+	if (!arm64_is_ras_serror(esr) || arm64_blocking_ras_serror(regs, esr))
+		arm64_serror_panic(regs, esr);
 
-	panic("Asynchronous SError Interrupt");
+	nmi_exit();
 }
 
 void __pte_error(const char *file, int line, unsigned long val)