From patchwork Mon Dec 5 16:00:40 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Xie XiuQi
X-Patchwork-Id: 13064714
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.lore.kernel.org (Postfix) with ESMTPS id 1A316C4332F;
 Mon, 5 Dec 2022 15:46:02 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
 d=lists.infradead.org; s=bombadil.20210309; h=Sender:
 Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post:
 List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To:
 Message-ID:Date:Subject:CC:To:From:Reply-To:Content-ID:Content-Description:
 Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:
 List-Owner; bh=GZ7rRanvyMf8l3dk5/BmymadS1G++g6AmxhcSxNm34w=; b=jyXuQ1h46L7jcY
 vqM5eCBGvtNVm9dc3rcNr0aFwOsenlPrRjJnZjEvZg34GulGRU2TNQoyrUHX51/wFnXDFPopDHSLU
 HAZwDCG4nY3GrAMqUwPQ7hx1m49MJ27offqOIrWm18xiruNyE0MGkZ243YoiD4ws3W1o0SIM/+nbu
 gnlvgiRAjn7KkT9+X+ojLY99+6qoVMVLsRuM6akf29F/ZHQ8OwlZAHPFFKMSTbkB8Bctmw3za3TSH
 O7CMVuzZ1giuaTeFiLVXt0d7qg2qXAbUpeep6ljLvLeWWZA2Tnt/Ve1aHKsqMfNyVrtFDieVpd5Sf
 Vi7iEJLPDIBqWoFq/Y0w==;
Received: from localhost ([::1] helo=bombadil.infradead.org)
 by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux))
 id 1p2DeN-005Bo1-Se; Mon, 05 Dec 2022 15:45:00 +0000
Received: from szxga03-in.huawei.com ([45.249.212.189])
 by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux))
 id 1p2Dd6-00593w-V8 for linux-arm-kernel@lists.infradead.org;
 Mon, 05 Dec 2022 15:43:42 +0000
Received: from canpemm500001.china.huawei.com (unknown [172.30.72.56])
 by szxga03-in.huawei.com (SkyGuard) with ESMTP id 4NQnmb48l0zJnQc;
 Mon, 5 Dec 2022 23:39:59 +0800 (CST)
Received: from localhost.localdomain.localdomain (10.175.113.25)
 by canpemm500001.china.huawei.com (7.192.104.163)
 with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31;
 Mon, 5 Dec 2022 23:43:27 +0800
From: Xie XiuQi
Subject: [PATCH v3 1/4] ACPI: APEI: include missing acpi/apei.h
Date: Tue, 6 Dec 2022 00:00:40 +0800
Message-ID: <20221205160043.57465-2-xiexiuqi@huawei.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20221205160043.57465-1-xiexiuqi@huawei.com>
References: <20221205160043.57465-1-xiexiuqi@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.175.113.25]
X-ClientProxiedBy: dggems705-chm.china.huawei.com (10.3.19.182)
 To canpemm500001.china.huawei.com (7.192.104.163)
X-CFilter-Loop: Reflected
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3
X-CRM114-CacheID: sfid-20221205_074341_232333_B700721D
X-CRM114-Status: UNSURE ( 6.26 )
X-CRM114-Notice: Please train this message.
X-BeenThere: linux-arm-kernel@lists.infradead.org
X-Mailman-Version: 2.1.34
Precedence: list
Sender: "linux-arm-kernel"
Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org

kernel test robot reported this warning with 'make W=1':

drivers/acpi/apei/apei-base.c:763:12: warning: no previous prototype for 'arch_apei_enable_cmcff' [-Wmissing-prototypes]
  763 | int __weak arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr,
      |            ^~~~~~~~~~~~~~~~~~~~~~
drivers/acpi/apei/apei-base.c:770:13: warning: no previous prototype for 'arch_apei_report_mem_error' [-Wmissing-prototypes]
  770 | void __weak arch_apei_report_mem_error(int sev,
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~

Include missing acpi/apei.h to avoid this warning.
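
For readers unfamiliar with this warning class, the following is a minimal standalone sketch, not kernel code. It assumes GCC or Clang. sketch_enable_hook() is a hypothetical stand-in for a weak arch hook, and the prototype that a shared header would normally provide is written inline so that the example fits in one file.

```c
#include <assert.h>

/*
 * Standalone sketch, not kernel code. sketch_enable_hook() is a
 * hypothetical name chosen for this illustration.
 */

/* In a real tree this declaration would come from a shared header. */
int sketch_enable_hook(int id);

/*
 * Weak default definition. Because the prototype above is visible,
 * -Wmissing-prototypes has nothing to report. Remove the declaration
 * and GCC reports "no previous prototype" for this function.
 */
__attribute__((weak)) int sketch_enable_hook(int id)
{
	(void)id;
	return 1;	/* arbitrary default value used by this sketch */
}

/* Small self-check: the weak default is used when nothing overrides it. */
static void sketch_hook_selftest(void)
{
	assert(sketch_enable_hook(0) == 1);
}
```

Building this with -Wmissing-prototypes should be silent as written; the declaration plays the role that the added include plays in the patch.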

Reported-by: kernel test robot
Fixes: 9dae3d0d9e64 ("apei, mce: Factor out APEI architecture specific MCE calls")
Signed-off-by: Xie XiuQi
---
 drivers/acpi/apei/apei-base.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/acpi/apei/apei-base.c b/drivers/acpi/apei/apei-base.c
index 9b52482b4ed5..02196a312dc5 100644
--- a/drivers/acpi/apei/apei-base.c
+++ b/drivers/acpi/apei/apei-base.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include <acpi/apei.h>
 #include
 
 #include "apei-internal.h"

From patchwork Mon Dec 5 16:00:41 2022
X-Patchwork-Submitter: Xie XiuQi
X-Patchwork-Id: 13064715
From: Xie XiuQi
Subject: [PATCH v3 2/4] arm64: ghes: fix error unhandling in synchronous External Data Abort
Date: Tue, 6 Dec 2022 00:00:41 +0800
Message-ID: <20221205160043.57465-3-xiexiuqi@huawei.com>
In-Reply-To: <20221205160043.57465-1-xiexiuqi@huawei.com>
References: <20221205160043.57465-1-xiexiuqi@huawei.com>

According to the RAS documentation, if we cannot determine the impact of
the error based on the details of the error when an SEA occurs, the
process cannot safely continue to run.
Therefore, for an unhandled error, we should raise a signal and terminate
the process immediately.

2.2 Generating error exceptions:

"An error exception is generated when a detected error is signaled to the
PE as an in-band error response to an architecturally-executed memory
access or cache maintenance operation. This includes any explicit data
access, instruction fetch, translation table walk, or hardware update to
the translation tables made by an architecturally-executed instruction." [1]

2.3 Taking error exceptions:

Software is only able to successfully recover execution and make progress
from a restart address for the exception by executing an Exception Return
instruction to branch to the instruction at this restart address if all of
the following are true: [2]

- The error has not been silently propagated by the PE.
- At the point when the Exception Return instruction is executed, the PE
  state and memory system state are consistent with the PE having executed
  all of the instructions up to but not including the instruction at the
  restart address, and none afterwards. That is, at least one of the
  following restart conditions is true:
  - The error has not been architecturally consumed by the PE and infected
    the PE state.
  - Executing the instruction at the restart address will not consume the
    error and will correct any corrupt state by overwriting it with the
    correct value or values.

After commit 8fcc4ae6faf8 ("arm64: acpi: Make apei_claim_sea() synchronise
with APEI's irq work"), SEA processing is deferred to irq_work. As a
result, an error such as a memory read error without a valid physical
address does not terminate the process, which is not safe.

Commit ccb5ecdc2dd ("ACPI: APEI: fix synchronous external aborts in
user-mode") fixed this for cache errors, but TLB and micro-architectural
errors still have the same problem. This patch forces a SIGBUS in that
case.
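
The control flow described above can be sketched in plain C as follows. This is a simplified userspace model under stated assumptions, not the kernel implementation: every sketch_* name is hypothetical, the "recoverable" test is reduced to two booleans, and the hook only increments a counter where an architecture would deliver a signal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical notification sources; only the SEA case is synchronous. */
enum sketch_notify { SKETCH_NOTIFY_POLLED, SKETCH_NOTIFY_SEA };

/* Hypothetical per-error record: can this error be queued for recovery? */
struct sketch_err {
	bool is_cache;
	bool has_pa;
};

/* Counts hook invocations; a real arch hook would signal the task instead. */
static int sketch_recovery_failed_calls;

static void sketch_recovery_failed(void)
{
	sketch_recovery_failed_calls++;
}

/* Returns true if at least one error was queued for recovery. */
static bool sketch_handle_errors(const struct sketch_err *errs, size_t n,
				 enum sketch_notify type)
{
	bool queued = false;
	int unhandled = 0;

	for (size_t i = 0; i < n; i++) {
		if (errs[i].is_cache && errs[i].has_pa) {
			queued = true;	/* recoverable: would be queued */
			continue;
		}
		unhandled++;		/* nothing can recover this one */
	}

	/* Only a synchronous abort must not resume the faulting context. */
	if (unhandled && type == SKETCH_NOTIFY_SEA)
		sketch_recovery_failed();

	return queued;
}

/* Self-check: the hook fires for SEA only, and only when errors remain. */
static void sketch_handle_selftest(void)
{
	const struct sketch_err errs[] = { { true, true }, { false, false } };

	sketch_recovery_failed_calls = 0;
	assert(sketch_handle_errors(errs, 2, SKETCH_NOTIFY_SEA));
	assert(sketch_recovery_failed_calls == 1);
	assert(sketch_handle_errors(errs, 2, SKETCH_NOTIFY_POLLED));
	assert(sketch_recovery_failed_calls == 1);
}
```

The point of the design is that the failure hook is keyed on the notification type rather than on the error type, so asynchronous notifications keep their existing behaviour.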

Note: RAS documentation: https://developer.arm.com/documentation/ddi0587/latest

Fixes: 8fcc4ae6faf8 ("arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work")
Fixes: ccb5ecdc2dde ("ACPI: APEI: fix synchronous external aborts in user-mode")
Signed-off-by: Xie XiuQi
---
 arch/arm64/kernel/acpi.c      |  6 ++++++
 drivers/acpi/apei/apei-base.c |  4 ++++
 drivers/acpi/apei/ghes.c      | 14 +++++++++++---
 include/acpi/apei.h           |  1 +
 4 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index a5a256e3f9fe..75fc16a68dc3 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -32,6 +32,7 @@
 #include
 #include
 #include
+#include
 
 int acpi_noirq = 1;		/* skip ACPI IRQ initialization */
 int acpi_disabled = 1;
@@ -407,6 +408,11 @@ int apei_claim_sea(struct pt_regs *regs)
 	return err;
 }
 
+void arch_apei_do_recovery_failed(void)
+{
+	arm64_force_sig_mceerr(BUS_MCEERR_AR, 0, 0, "Unhandled processor error");
+}
+
 void arch_reserve_mem_area(acpi_physical_address addr, size_t size)
 {
 	memblock_mark_nomap(addr, size);
diff --git a/drivers/acpi/apei/apei-base.c b/drivers/acpi/apei/apei-base.c
index 02196a312dc5..784fe75258d9 100644
--- a/drivers/acpi/apei/apei-base.c
+++ b/drivers/acpi/apei/apei-base.c
@@ -774,6 +774,10 @@ void __weak arch_apei_report_mem_error(int sev,
 }
 EXPORT_SYMBOL_GPL(arch_apei_report_mem_error);
 
+void __weak arch_apei_do_recovery_failed(void)
+{
+}
+
 int apei_osc_setup(void)
 {
 	static u8 whea_uuid_str[] = "ed855e0c-6c90-47bf-a62a-26de0fc5ad5c";
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 9952f3a792ba..ba0631c54c52 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -48,6 +48,7 @@
 #include
 #include
 #include
+#include
 
 #include "apei-internal.h"
 
@@ -483,11 +484,12 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
 	return false;
 }
 
-static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, int sev)
+static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
+				     int sev, int notify_type)
 {
 	struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
 	bool queued = false;
-	int sec_sev, i;
+	int sec_sev, i, unhandled_errs = 0;
 	char *p;
 
 	log_arm_hw_error(err);
@@ -521,9 +523,14 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, int s
 		pr_warn_ratelimited(FW_WARN GHES_PFX
 				    "Unhandled processor error type: %s\n",
 				    error_type);
+		unhandled_errs++;
+
 		p += err_info->length;
 	}
 
+	if (unhandled_errs && notify_type == ACPI_HEST_NOTIFY_SEA)
+		arch_apei_do_recovery_failed();
+
 	return queued;
 }
 
@@ -631,6 +638,7 @@ static bool ghes_do_proc(struct ghes *ghes,
 	const guid_t *fru_id = &guid_null;
 	char *fru_text = "";
 	bool queued = false;
+	int notify_type = ghes->generic->notify.type;
 
 	sev = ghes_severity(estatus->error_severity);
 	apei_estatus_for_each_section(estatus, gdata) {
@@ -654,7 +662,7 @@ static bool ghes_do_proc(struct ghes *ghes,
 			ghes_handle_aer(gdata);
 		}
 		else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
-			queued = ghes_handle_arm_hw_error(gdata, sev);
+			queued = ghes_handle_arm_hw_error(gdata, sev, notify_type);
 		} else {
 			void *err = acpi_hest_get_payload(gdata);
 
diff --git a/include/acpi/apei.h b/include/acpi/apei.h
index dc60f7db5524..136be5534581 100644
--- a/include/acpi/apei.h
+++ b/include/acpi/apei.h
@@ -52,6 +52,7 @@ int erst_clear(u64 record_id);
 
 int arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr, void *data);
 void arch_apei_report_mem_error(int sev, struct cper_sec_mem_err *mem_err);
+void arch_apei_do_recovery_failed(void);
 
 #endif
 #endif

From patchwork Mon Dec 5 16:00:42 2022
X-Patchwork-Submitter: Xie XiuQi
X-Patchwork-Id: 13064712
From: Xie XiuQi
Subject: [PATCH v3 3/4] arm64: ghes: handle the case when memory_failure recovery failed
Date: Tue, 6 Dec 2022 00:00:42 +0800
Message-ID: <20221205160043.57465-4-xiexiuqi@huawei.com>
In-Reply-To: <20221205160043.57465-1-xiexiuqi@huawei.com>
References: <20221205160043.57465-1-xiexiuqi@huawei.com>

memory_failure() may not always recover successfully. For a synchronous
external data abort, a failed memory_failure() recovery must be handled.

In that case, the common helper function arch_apei_do_recovery_failed()
is invoked. On arm64, it simply sends a SIGBUS.
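
One way to report such a failure from a loop that drains several queued entries is to keep draining and remember only the first nonzero result. The sketch below shows that pattern in isolation; it is a userspace illustration under assumptions, with sketch_drain() and the input array being hypothetical stand-ins for a work queue and the per-entry recovery calls.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns 0 if every entry succeeded, else the first nonzero result.
 * Each array element stands in for the return value of one recovery
 * attempt; all entries are still visited after a failure.
 */
static int sketch_drain(const int *results, size_t n)
{
	int ret = 0;

	for (size_t i = 0; i < n; i++) {
		int result = results[i];

		if (ret == 0 && result != 0)
			ret = result;	/* remember the first failure only */
	}
	return ret;
}

/* Self-check with one failing run and one clean run. */
static void sketch_drain_selftest(void)
{
	const int mixed[] = { 0, -16, -5 };
	const int clean[] = { 0, 0 };

	assert(sketch_drain(mixed, 3) == -16);
	assert(sketch_drain(clean, 2) == 0);
}
```

Returning the first failure rather than the last keeps the earliest cause visible to the caller, which only needs to know whether anything failed.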

Signed-off-by: Xie XiuQi
---
 drivers/acpi/apei/ghes.c |  3 ++-
 include/linux/mm.h       |  2 +-
 mm/memory-failure.c      | 24 +++++++++++++++++-------
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index ba0631c54c52..ddc4da603215 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -435,7 +435,8 @@ static void ghes_kick_task_work(struct callback_head *head)
 
 	estatus_node = container_of(head, struct ghes_estatus_node, task_work);
 	if (IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))
-		memory_failure_queue_kick(estatus_node->task_work_cpu);
+		if (memory_failure_queue_kick(estatus_node->task_work_cpu))
+			arch_apei_do_recovery_failed();
 
 	estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
 	node_len = GHES_ESTATUS_NODE_LEN(cper_estatus_len(estatus));
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 974ccca609d2..126d1395c208 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3290,7 +3290,7 @@ int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
 		      unsigned long count, int mf_flags);
 extern int memory_failure(unsigned long pfn, int flags);
 extern void memory_failure_queue(unsigned long pfn, int flags);
-extern void memory_failure_queue_kick(int cpu);
+extern int memory_failure_queue_kick(int cpu);
 extern int unpoison_memory(unsigned long pfn);
 extern int sysctl_memory_failure_early_kill;
 extern int sysctl_memory_failure_recovery;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index bead6bccc7f2..b9398f67264a 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2240,12 +2240,12 @@ void memory_failure_queue(unsigned long pfn, int flags)
 }
 EXPORT_SYMBOL_GPL(memory_failure_queue);
 
-static void memory_failure_work_func(struct work_struct *work)
+static int __memory_failure_work_func(struct work_struct *work)
 {
 	struct memory_failure_cpu *mf_cpu;
 	struct memory_failure_entry entry = { 0, };
 	unsigned long proc_flags;
-	int gotten;
+	int gotten, ret = 0, result;
 
 	mf_cpu = container_of(work, struct memory_failure_cpu, work);
 	for (;;) {
@@ -2254,24 +2254,34 @@ static void memory_failure_work_func(struct work_struct *work)
 		spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
 		if (!gotten)
 			break;
-		if (entry.flags & MF_SOFT_OFFLINE)
+		if (entry.flags & MF_SOFT_OFFLINE) {
 			soft_offline_page(entry.pfn, entry.flags);
-		else
-			memory_failure(entry.pfn, entry.flags);
+		} else {
+			result = memory_failure(entry.pfn, entry.flags);
+			if (ret == 0 && result != 0)
+				ret = result;
+		}
 	}
+
+	return ret;
+}
+
+static void memory_failure_work_func(struct work_struct *work)
+{
+	__memory_failure_work_func(work);
 }
 
 /*
  * Process memory_failure work queued on the specified CPU.
  * Used to avoid return-to-userspace racing with the memory_failure workqueue.
  */
-void memory_failure_queue_kick(int cpu)
+int memory_failure_queue_kick(int cpu)
 {
 	struct memory_failure_cpu *mf_cpu;
 
 	mf_cpu = &per_cpu(memory_failure_cpu, cpu);
 	cancel_work_sync(&mf_cpu->work);
-	memory_failure_work_func(&mf_cpu->work);
+	return __memory_failure_work_func(&mf_cpu->work);
 }
 
 static int __init memory_failure_init(void)

From patchwork Mon Dec 5 16:00:43 2022
X-Patchwork-Submitter: Xie XiuQi
X-Patchwork-Id: 13064713
From: Xie XiuQi
Subject: [PATCH v3 4/4] arm64: ghes: pass MF_ACTION_REQUIRED to memory_failure when sea
Date: Tue, 6 Dec 2022 00:00:43 +0800
Message-ID: <20221205160043.57465-5-xiexiuqi@huawei.com>
In-Reply-To: <20221205160043.57465-1-xiexiuqi@huawei.com>
References: <20221205160043.57465-1-xiexiuqi@huawei.com>

For a synchronous external data abort, pass MF_ACTION_REQUIRED to
memory_failure() to ensure that error recovery is performed before
returning to user space.

A synchronous external data abort happens in the current execution
context, so it matches the description of 'action required' below and
the MF_ACTION_REQUIRED flag is needed:

  ``action optional'' if they are not immediately affected by the error
  ``action required'' if error happened in current execution context

Signed-off-by: Xie XiuQi
---
 drivers/acpi/apei/ghes.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index ddc4da603215..043a91a7dd17 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -463,7 +463,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
 }
 
 static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
-				       int sev)
+				       int sev, int notify_type)
 {
 	int flags = -1;
 	int sec_sev = ghes_severity(gdata->error_severity);
@@ -472,6 +472,9 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
 	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
 		return false;
 
+	if (notify_type == ACPI_HEST_NOTIFY_SEA)
+		flags |= MF_ACTION_REQUIRED;
+
 	/* iff following two events can be handled properly by now */
 	if (sec_sev == GHES_SEV_CORRECTED &&
 	    (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED))
@@ -513,7 +516,12 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
 		 * and don't filter out 'corrected' error here.
 		 */
 		if (is_cache && has_pa) {
-			queued = ghes_do_memory_failure(err_info->physical_fault_addr, 0);
+			int flags = 0;
+
+			if (notify_type == ACPI_HEST_NOTIFY_SEA)
+				flags |= MF_ACTION_REQUIRED;
+
+			queued = ghes_do_memory_failure(err_info->physical_fault_addr, flags);
 			p += err_info->length;
 			continue;
 		}
@@ -657,7 +665,7 @@ static bool ghes_do_proc(struct ghes *ghes,
 			ghes_edac_report_mem_error(sev, mem_err);
 
 			arch_apei_report_mem_error(sev, mem_err);
-			queued = ghes_handle_memory_failure(gdata, sev);
+			queued = ghes_handle_memory_failure(gdata, sev, notify_type);
 		}
 		else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
 			ghes_handle_aer(gdata);
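
The flag selection used in this patch can be modelled outside the kernel as follows. This is a sketch under assumptions: SKETCH_ACTION_REQUIRED is a placeholder bit rather than the kernel's MF_ACTION_REQUIRED value, and the enum and function names are hypothetical.

```c
#include <assert.h>

/* Placeholder bit for this sketch; not taken from any kernel header. */
#define SKETCH_ACTION_REQUIRED (1 << 1)

/* Hypothetical notification sources; only the SEA case is synchronous. */
enum sketch_src { SKETCH_SRC_OTHER, SKETCH_SRC_SEA };

/* OR the action-required bit into base only for a synchronous abort. */
static int sketch_flags(int base, enum sketch_src src)
{
	int flags = base;

	if (src == SKETCH_SRC_SEA)
		flags |= SKETCH_ACTION_REQUIRED;
	return flags;
}

/* Self-check covering a zero base and a -1 "no decision yet" sentinel. */
static void sketch_flags_selftest(void)
{
	assert(sketch_flags(0, SKETCH_SRC_SEA) == SKETCH_ACTION_REQUIRED);
	assert(sketch_flags(0, SKETCH_SRC_OTHER) == 0);
	/* On two's-complement targets -1 has every bit set, so OR is a no-op. */
	assert(sketch_flags(-1, SKETCH_SRC_SEA) == -1);
}
```

Starting from 0, as in the processor-error hunk, the SEA case yields exactly the action-required bit. Starting from a -1 sentinel, as in the memory-error hunk, the OR leaves the value unchanged, so the bit only survives if it is applied after the sentinel has been replaced by a real flag value.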