From: Liguang Zhang
To: "Rafael J. Wysocki", Len Brown, James Morse, Tony Luck, Borislav Petkov
Cc: linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, Liguang Zhang
Subject: [PATCH V2] ACPI / APEI: restore interrupt before panic in sdei flow
Date: Tue, 12 Oct 2021 22:29:10 +0800
Message-Id: <20211012142910.9688-1-zhangliguang@linux.alibaba.com>
X-Patchwork-Submitter: luanshi
X-Patchwork-Id: 12552829

When the HEST ACPI table configures the Hardware Error Notification type as
Software Delegated Exception (0x0B) for RAS events, the OS RAS code interacts
with ATF through the SDEI mechanism. On a firmware-first system, ATF notifies
the OS with an SDEI call.

The call flow when a fatal RAS error happens is as follows:

ATF notifies the OS:
sdei_dispatch_event()
    ehf_activate_priority()
    call sdei callback   // callback registered by OS
    ehf_deactivate_priority()

OS sdei callback:
sdei_asm_handler()
    __sdei_handler()
        _sdei_handler()
            sdei_event_handler()
                ghes_sdei_critical_callback()
                    ghes_in_nmi_queue_one_entry()
                        /* if RAS error is fatal */
                        __ghes_panic()
                            panic()

If a fatal RAS error occurs, panic() is called from within sdei_asm_handler()
before ehf_deactivate_priority() has run, which leaves interrupts masked.
With interrupts masked, the system hangs in the kdump flow like this:

arm-smmu-v3 arm-smmu-v3.3.auto: allocated 65536 entries for cmdq
arm-smmu-v3 arm-smmu-v3.3.auto: allocated 32768 entries for evtq
arm-smmu-v3 arm-smmu-v3.3.auto: allocated 65536 entries for priq
arm-smmu-v3 arm-smmu-v3.3.auto: SMMU currently enabled! Resetting...

After debugging, we found that the exact position of the hang is:

arm_smmu_device_probe()
    arm_smmu_device_reset()
        arm_smmu_device_disable()
            arm_smmu_write_reg_sync()
                readl_relaxed_poll_timeout()
                    readx_poll_timeout()
                        read_poll_timeout()
                            usleep_range()   // the hrtimer never wakes up

So interrupts should be restored before panic; otherwise kdump hangs.
In the SDEI flow, SDEI_EVENT_COMPLETE_AND_RESUME should be called before
panic so that ehf_deactivate_priority() runs to completion.

Signed-off-by: Liguang Zhang
---
 drivers/acpi/apei/ghes.c    | 25 +++++++++++++++++++++----
 drivers/firmware/arm_sdei.c | 14 ++++++++++++++
 include/linux/arm_sdei.h    |  2 ++
 3 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0c8330ed1ffd..4f734c60987c 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -141,6 +141,7 @@ static unsigned long ghes_estatus_pool_size_request;
 static struct ghes_estatus_cache *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
 static atomic_t ghes_estatus_cache_alloced;
 
+static bool ghes_sdei_callback;
 static int ghes_panic_timeout __read_mostly = 30;
 
 static void __iomem *ghes_map(u64 pfn, enum fixed_addresses fixmap_idx)
@@ -837,18 +838,30 @@ static void ghes_estatus_cache_add(
 	rcu_read_unlock();
 }
 
+static void sdei_api_restore_ras(void)
+{
+	/* reboot to log the error! */
+	if (!panic_timeout)
+		panic_timeout = ghes_panic_timeout;
+	panic("Fatal hardware error!");
+}
+
 static void __ghes_panic(struct ghes *ghes,
 			 struct acpi_hest_generic_status *estatus,
 			 u64 buf_paddr, enum fixed_addresses fixmap_idx)
 {
+	int err;
+
 	__ghes_print_estatus(KERN_EMERG, ghes->generic, estatus);
 
 	ghes_clear_estatus(ghes, estatus, buf_paddr, fixmap_idx);
 
-	/* reboot to log the error! */
-	if (!panic_timeout)
-		panic_timeout = ghes_panic_timeout;
-	panic("Fatal hardware error!");
+	if (ghes_sdei_callback) {
+		err = sdei_api_event_complete_and_resume((unsigned long)sdei_api_restore_ras);
+		if (err)
+			sdei_api_restore_ras();
+	} else
+		sdei_api_restore_ras();
 }
 
 static int ghes_proc(struct ghes *ghes)
@@ -1224,7 +1237,9 @@ static int ghes_sdei_normal_callback(u32 event_num, struct pt_regs *regs,
 	int err;
 
 	raw_spin_lock(&ghes_notify_lock_sdei_normal);
+	ghes_sdei_callback = true;
 	err = __ghes_sdei_callback(ghes, FIX_APEI_GHES_SDEI_NORMAL);
+	ghes_sdei_callback = false;
 	raw_spin_unlock(&ghes_notify_lock_sdei_normal);
 
 	return err;
@@ -1238,7 +1253,9 @@ static int ghes_sdei_critical_callback(u32 event_num, struct pt_regs *regs,
 	int err;
 
 	raw_spin_lock(&ghes_notify_lock_sdei_critical);
+	ghes_sdei_callback = true;
 	err = __ghes_sdei_callback(ghes, FIX_APEI_GHES_SDEI_CRITICAL);
+	ghes_sdei_callback = false;
 	raw_spin_unlock(&ghes_notify_lock_sdei_critical);
 
 	return err;
diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c
index a7e762c352f9..1af6b6b55c57 100644
--- a/drivers/firmware/arm_sdei.c
+++ b/drivers/firmware/arm_sdei.c
@@ -473,6 +473,20 @@ static int sdei_api_event_unregister(u32 event_num)
 			      0, 0, 0, NULL);
 }
 
+int sdei_api_event_complete_and_resume(u64 addr)
+{
+	int err;
+
+	err = invoke_sdei_fn(SDEI_1_0_FN_SDEI_EVENT_COMPLETE_AND_RESUME, addr,
+			0, 0, 0, 0, NULL);
+	if (err && err != -EIO) {
+		pr_warn_once("failed to restore CPU[%u]: %d\n", smp_processor_id(), err);
+		return err;
+	}
+
+	return 0;
+}
+
 /* Called directly by the hotplug callbacks */
 static void _local_event_unregister(void *data)
 {
diff --git a/include/linux/arm_sdei.h b/include/linux/arm_sdei.h
index 0a241c5c911d..b6d347085834 100644
--- a/include/linux/arm_sdei.h
+++ b/include/linux/arm_sdei.h
@@ -46,9 +46,11 @@ int sdei_unregister_ghes(struct ghes *ghes);
 /* For use by arch code when CPU hotplug notifiers are not appropriate. */
 int sdei_mask_local_cpu(void);
 int sdei_unmask_local_cpu(void);
+int sdei_api_event_complete_and_resume(u64 addr);
 #else
 static inline int sdei_mask_local_cpu(void) { return 0; }
 static inline int sdei_unmask_local_cpu(void) { return 0; }
+static inline int sdei_api_event_complete_and_resume(u64 addr) { return 0; }
 #endif /* CONFIG_ARM_SDE_INTERFACE */
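
Note: the branch structure that this patch adds to __ghes_panic() can be tried out in user space with the rough sketch below. It is only an illustration under stated assumptions. Every name in it (mock_ghes_panic, mock_complete_and_resume, mock_restore_ras, in_sdei_callback, firmware_call_fails) is a hypothetical stand-in, not a kernel or firmware API. The mock firmware call simply invokes the resume function and returns 0, whereas a real SDEI_EVENT_COMPLETE_AND_RESUME would not return to the caller on success.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical user-space stand-ins, not kernel APIs. */

typedef void (*resume_fn)(void);

enum panic_path { PATH_NONE, PATH_RESUMED, PATH_DIRECT };

static bool in_sdei_callback;     /* plays the role of ghes_sdei_callback */
static bool firmware_call_fails;  /* lets a caller simulate a failed firmware call */
static bool resuming;             /* true while the mock firmware "resumes" */
static enum panic_path last_path = PATH_NONE;

/* Stand-in for the panic routine: only records which path reached it. */
static void mock_restore_ras(void)
{
	last_path = resuming ? PATH_RESUMED : PATH_DIRECT;
}

/*
 * Stand-in for SDEI_EVENT_COMPLETE_AND_RESUME. Real firmware would complete
 * the event and resume execution at 'fn'; this mock calls 'fn' directly and
 * then returns 0, or returns an error without calling it.
 */
static int mock_complete_and_resume(resume_fn fn)
{
	assert(fn != NULL);
	if (firmware_call_fails)
		return -EINVAL;
	resuming = true;
	fn();
	resuming = false;
	return 0;
}

/* Mirrors the if/else structure the patch adds to __ghes_panic(). */
static enum panic_path mock_ghes_panic(void)
{
	last_path = PATH_NONE;
	if (in_sdei_callback) {
		/* In an SDEI handler: ask firmware to complete the event first. */
		if (mock_complete_and_resume(mock_restore_ras))
			mock_restore_ras();	/* firmware call failed: panic directly */
	} else {
		mock_restore_ras();		/* not in an SDEI handler: panic directly */
	}
	return last_path;
}
```

Setting in_sdei_callback and firmware_call_fails before calling mock_ghes_panic() exercises the three cases: outside an SDEI handler, inside one with a successful firmware call, and inside one where the call fails and the code falls back to a direct panic.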