From patchwork Mon Apr 17 01:14:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shuai Xue X-Patchwork-Id: 13213116 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B59E3C77B77 for ; Mon, 17 Apr 2023 01:14:21 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2A6EF8E0002; Sun, 16 Apr 2023 21:14:21 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2565F8E0001; Sun, 16 Apr 2023 21:14:21 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0F7A38E0002; Sun, 16 Apr 2023 21:14:21 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id EFE258E0001 for ; Sun, 16 Apr 2023 21:14:20 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id BA0F612018D for ; Mon, 17 Apr 2023 01:14:20 +0000 (UTC) X-FDA: 80689112280.20.7242667 Received: from out30-101.freemail.mail.aliyun.com (out30-101.freemail.mail.aliyun.com [115.124.30.101]) by imf03.hostedemail.com (Postfix) with ESMTP id A907920003 for ; Mon, 17 Apr 2023 01:14:18 +0000 (UTC) Authentication-Results: imf03.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=alibaba.com; spf=pass (imf03.hostedemail.com: domain of xueshuai@linux.alibaba.com designates 115.124.30.101 as permitted sender) smtp.mailfrom=xueshuai@linux.alibaba.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1681694059; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: 
content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ATLTk4tIRKY2ik9mvk6iClmg5kdM0zMa6DywilfNnt8=; b=Ugs0Tw6x816gSZ/x541FckKYOUyy59VFgCSETkNvksZPhez4M1gqayH/WJlYu6zd/vKIpI Qtskk6tZpKuMjdGvoObNZy3zx4GVxxvFzTYJzWRO9wHQKbtAFtNtWhJahcqgzKxZU5xHCF I2YPCyKLf/LOOxk7fCxKeq5deflWewE= ARC-Authentication-Results: i=1; imf03.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=alibaba.com; spf=pass (imf03.hostedemail.com: domain of xueshuai@linux.alibaba.com designates 115.124.30.101 as permitted sender) smtp.mailfrom=xueshuai@linux.alibaba.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1681694059; a=rsa-sha256; cv=none; b=VhQefT6N92GnAwlnZVQwSFJjd+vjo5b7SiMpfg+4wujc8Lwt4pzOk0gM9OdZ//aes/sKCL SEdBEK/mBxxxVE42H8ovMyFcJ2uiiPAgxW+glHgg8/QYZ31f+rCepR8PfxnsOGwXrj/2GI HHCY6g0GmglwQbG6QqhD7I6byHzSmPM= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R271e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046051;MF=xueshuai@linux.alibaba.com;NM=1;PH=DS;RN=25;SR=0;TI=SMTPD_---0VgBQJ7w_1681694052; Received: from localhost.localdomain(mailfrom:xueshuai@linux.alibaba.com fp:SMTPD_---0VgBQJ7w_1681694052) by smtp.aliyun-inc.com; Mon, 17 Apr 2023 09:14:13 +0800 From: Shuai Xue To: rafael@kernel.org, wangkefeng.wang@huawei.com, tanxiaofei@huawei.com, mawupeng1@huawei.com, tony.luck@intel.com, naoya.horiguchi@nec.com Cc: linux-acpi@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, xueshuai@linux.alibaba.com, justin.he@arm.com, akpm@linux-foundation.org, ardb@kernel.org, ashish.kalra@amd.com, baolin.wang@linux.alibaba.com, bp@alien8.de, cuibixuan@linux.alibaba.com, dave.hansen@linux.intel.com, james.morse@arm.com, jarkko@kernel.org, lenb@kernel.org, linmiaohe@huawei.com, lvying6@huawei.com, xiexiuqi@huawei.com, zhuo.song@linux.alibaba.com Subject: [PATCH v7 1/2] ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events Date: Mon, 17 Apr 2023 09:14:06 +0800 
Message-Id: <20230417011407.58319-2-xueshuai@linux.alibaba.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221027042445.60108-1-xueshuai@linux.alibaba.com> References: <20221027042445.60108-1-xueshuai@linux.alibaba.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: A907920003 X-Rspamd-Server: rspam09 X-Rspam-User: X-Stat-Signature: egb3szppco1eij5b6u5qp11yfi5ad4za X-HE-Tag: 1681694058-960000 X-HE-Meta: U2FsdGVkX1804DFsl2geiSdfIffmBfdRBzlDP7vTLj64TNXH4N3nTgPdLSTyWKRfDbeeBNQM+xOYcKniOVWr1KleQs8RYyQEji9b6YS2X8Ofg4TZUCUi65442qi7/afptnOn08cYpWG8Ghwh/Rj86LY3fSMkIgOeFH8o9MkjOsE8xG/OCgmXCPibEAPjfjt4axI0aEbqUENdt+ewhsfIxGns30IjR0dRwkgwnrktq1RnrRRpILWI8JZWNvShnqnOidOFSa2tWeS29pIwQAdG0EZ96BO3++CgQlrbRjxZ3wYG9RL36rf+t7KO/mn88JW/3j6aXB8kfdhSKGduha9RYNH1TW+nx9FyhtIWIbwFBy5rUbUvFWLDldV7/sW5jDJCEQ6QGYWLZBhIBVZ3lfEryEqxbi85OaS3iOhf2ZbGYtNzoD52G6DigICMi4emQES/SwQKDY7i90lZnchiIUFsfspiHoZwBk3J8u7T5qWWrfYQMkC6mp7AanAPl/R7Vndnpu+YYs60qtoa4oTPvYsDBiB7jYoZFFyKVZcLpCo1QtqQ4KpnUMuRCfdCCTY3JaFFjGeLdJzz6oEZkep2UHbquUMzVUYgta6u/abc/cEaVC5F5S0a/5MCiqKdFwgMM2mOni/Gvbaaq5J7bmQq8tEflFzxhR1oDzcuk1XgwIgda4PEl6NJ40n4ljP7Szld6k+tfUZr4IArjg+5kDPaM+kfV5csdEcNkzDvRehl2tWrEl1qHi/Nk5b5QXKh7aUu738ebnciwKdjcRmqVuzuxr/x5Uxl/rvCQxQ60tsbZEjJcIQ/ZjGFviie7xkDUjiLR1NbdGV98Awzk0x6W14oaTWT1PnSuLp152d57T5Z1ScLVZIcwpBk1pTuHKrMFWHMrBy9d7eD0UFG1KeXMU1AsTUkP3hj0pjsfqK/CxDNk+efGASln7YPg4Ve52/qBN4qtACya32ANsmD0exAx2fOW4a EJc28cUI J8gVL/0nDZO2sRlbFc+wKnyPdEGetN2gcCMQXXEVr+vLV1lgzp1f9DyYcrEnmZtEO+qUT8KjrjwDWfdYlgQ5CBuZioX98cmt95hP8ugfNo5+sLOPBm0lzZ+6R2VzsjMTvmd3oVBppyIDtfoyn/yzC+EzHmd4rX4WEnf3azQwYIvh13431CxBl3NzyUR07+lShtE1b X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: There are two major types of uncorrected recoverable (UCR) errors : - Action Required (AR): The error is detected and the processor already consumes the memory. 
The OS must take action (for example, offline the failed page or kill the failing thread) to recover from this uncorrectable error.
- Action Optional (AO): The error is detected outside of processor execution context. Some data in memory is corrupted, but it has not been consumed. The OS may optionally take action to recover from this uncorrectable error.

The essential difference between AR and AO errors is that AR is a synchronous event, while AO is an asynchronous event. The hardware signals a synchronous exception (Machine Check Exception on x86 and Synchronous External Abort on Arm64) when an error is detected and the memory access has been architecturally executed.

When APEI firmware-first mode is enabled, a platform may describe one error source for the handling of synchronous errors (e.g. MCE or SEA notification), or for handling asynchronous errors (e.g. SCI or External Interrupt notification). In other words, synchronous errors can be distinguished by the APEI notification type.

For AR errors, the kernel kills the current process accessing the poisoned page by sending SIGBUS with BUS_MCEERR_AR. For AO errors, the kernel notifies the process that owns the poisoned page by sending SIGBUS with BUS_MCEERR_AO in early kill mode. However, the GHES driver always sets mf_flags to 0, so all UCR errors are handled as AO errors in memory failure. To fix this, set the memory failure flags to MF_ACTION_REQUIRED on synchronous events.
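The flag selection described above can be sketched as a small user-space C model. This is an illustration under stated assumptions, not the kernel code: the enum members, the `DEMO_*` macros and both `demo_*` functions are hypothetical stand-ins for ACPI_HEST_NOTIFY_*, GHES_SEV_* and MF_*, and only the recoverable-severity branch is modelled. Following the code comment in the diff, only SEA counts as synchronous here; MCE-notified sources are treated as asynchronous.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Local stand-ins for illustration only. The authoritative definitions
 * (ACPI_HEST_NOTIFY_*, GHES_SEV_*, MF_*) live in the kernel headers.
 */
enum notify_type { NOTIFY_SCI, NOTIFY_MCE, NOTIFY_SEA };
enum severity { SEV_NO, SEV_CORRECTED, SEV_RECOVERABLE, SEV_PANIC };

#define DEMO_MF_ACTION_REQUIRED 0x2	/* hypothetical value */
#define DEMO_MF_SKIP (-1)		/* memory_failure() is not called */

/* Only SEA is treated as a synchronous notification in this sketch. */
bool demo_is_sync_notify(enum notify_type type)
{
	return type == NOTIFY_SEA;
}

/* Pick the flags for a memory error section, or DEMO_MF_SKIP. */
int demo_pick_mf_flags(enum notify_type type, enum severity sev,
		       enum severity sec_sev)
{
	assert(sev <= SEV_PANIC && sec_sev <= SEV_PANIC);

	/* Recoverable error: action required only if it was consumed. */
	if (sev == SEV_RECOVERABLE && sec_sev == SEV_RECOVERABLE)
		return demo_is_sync_notify(type) ? DEMO_MF_ACTION_REQUIRED : 0;

	return DEMO_MF_SKIP;
}
```

With this model, a recoverable error reported through SEA yields the action-required flag, while the same error reported through SCI keeps the flags at 0, which is the pre-patch behaviour for every source.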
Fixes: ba61ca4aab47 ("ACPI, APEI, GHES: Add hardware memory error recovery support") Signed-off-by: Shuai Xue Tested-by: Ma Wupeng Reviewed-by: Kefeng Wang Reviewed-by: Xiaofei Tan Reviewed-by: Baolin Wang --- drivers/acpi/apei/ghes.c | 29 +++++++++++++++++++++++------ 1 file changed, 23 insertions(+), 6 deletions(-) diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 34ad071a64e9..c479b85899f5 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -101,6 +101,20 @@ static inline bool is_hest_type_generic_v2(struct ghes *ghes) return ghes->generic->header.type == ACPI_HEST_TYPE_GENERIC_ERROR_V2; } +/* + * A platform may describe one error source for the handling of synchronous + * errors (e.g. MCE or SEA), or for handling asynchronous errors (e.g. SCI + * or External Interrupt). On x86, the HEST notifications are always + * asynchronous, so only SEA on ARM is delivered as a synchronous + * notification. + */ +static inline bool is_hest_sync_notify(struct ghes *ghes) +{ + u8 notify_type = ghes->generic->notify.type; + + return notify_type == ACPI_HEST_NOTIFY_SEA; +} + /* * This driver isn't really modular, however for the time being, * continuing to use module_param is the easiest way to remain @@ -477,7 +491,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags) } static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, - int sev) + int sev, bool sync) { int flags = -1; int sec_sev = ghes_severity(gdata->error_severity); @@ -491,7 +505,7 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED)) flags = MF_SOFT_OFFLINE; if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE) - flags = 0; + flags = sync ? 
MF_ACTION_REQUIRED : 0; if (flags != -1) return ghes_do_memory_failure(mem_err->physical_addr, flags); @@ -499,9 +513,11 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, return false; } -static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, int sev) +static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, + int sev, bool sync) { struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata); + int flags = sync ? MF_ACTION_REQUIRED : 0; bool queued = false; int sec_sev, i; char *p; @@ -526,7 +542,7 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, int s * and don't filter out 'corrected' error here. */ if (is_cache && has_pa) { - queued = ghes_do_memory_failure(err_info->physical_fault_addr, 0); + queued = ghes_do_memory_failure(err_info->physical_fault_addr, flags); p += err_info->length; continue; } @@ -647,6 +663,7 @@ static bool ghes_do_proc(struct ghes *ghes, const guid_t *fru_id = &guid_null; char *fru_text = ""; bool queued = false; + bool sync = is_hest_sync_notify(ghes); sev = ghes_severity(estatus->error_severity); apei_estatus_for_each_section(estatus, gdata) { @@ -664,13 +681,13 @@ static bool ghes_do_proc(struct ghes *ghes, atomic_notifier_call_chain(&ghes_report_chain, sev, mem_err); arch_apei_report_mem_error(sev, mem_err); - queued = ghes_handle_memory_failure(gdata, sev); + queued = ghes_handle_memory_failure(gdata, sev, sync); } else if (guid_equal(sec_type, &CPER_SEC_PCIE)) { ghes_handle_aer(gdata); } else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) { - queued = ghes_handle_arm_hw_error(gdata, sev); + queued = ghes_handle_arm_hw_error(gdata, sev, sync); } else { void *err = acpi_hest_get_payload(gdata); From patchwork Mon Apr 17 01:14:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shuai Xue X-Patchwork-Id: 13213117 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 
(2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9637BC77B61 for ; Mon, 17 Apr 2023 01:14:22 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 82506900002; Sun, 16 Apr 2023 21:14:21 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 7D4818E0001; Sun, 16 Apr 2023 21:14:21 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 67586900002; Sun, 16 Apr 2023 21:14:21 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 552AF8E0001 for ; Sun, 16 Apr 2023 21:14:21 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 1D37880168 for ; Mon, 17 Apr 2023 01:14:21 +0000 (UTC) X-FDA: 80689112322.20.FCEDD25 Received: from out30-130.freemail.mail.aliyun.com (out30-130.freemail.mail.aliyun.com [115.124.30.130]) by imf17.hostedemail.com (Postfix) with ESMTP id DE9A64001A for ; Mon, 17 Apr 2023 01:14:18 +0000 (UTC) Authentication-Results: imf17.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=alibaba.com; spf=pass (imf17.hostedemail.com: domain of xueshuai@linux.alibaba.com designates 115.124.30.130 as permitted sender) smtp.mailfrom=xueshuai@linux.alibaba.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1681694059; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=n17p9QgfLFZeY7KigPCAyUKhyl4VHEvtuO9jvsURamc=; b=zACRwZtJ6sZYSnk10KQbqdujok8dJTY6pFY3H/5JSw5QVDpCIb8a0z6KmIbOsXC3FkjpEU 
Sp+54mciO1ym3rbBJhy9zwgivl4Eu9WbWhBeDRL5fRjISI2r3Hke21Vb6S9gg739qUwqSp qhr648KJGhZKXmNBM0IcneGshoh//Pg= ARC-Authentication-Results: i=1; imf17.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=alibaba.com; spf=pass (imf17.hostedemail.com: domain of xueshuai@linux.alibaba.com designates 115.124.30.130 as permitted sender) smtp.mailfrom=xueshuai@linux.alibaba.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1681694059; a=rsa-sha256; cv=none; b=7XLrgD7dXJHQQZNSHmBT825/PIqw8cV1Ad/IIfd5lxf3Ec6zKuGYMc7bS5LgddCifYvIWL 2atXZsa5rd6bwUodM1ooOBIHFok7EElCnNp55EeM0Ym7y4cX93pWuClfSk90xuBbHbbDer QCqEe/l8Nuu/KwpipSv/XRb1P7b/08Q= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R531e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046049;MF=xueshuai@linux.alibaba.com;NM=1;PH=DS;RN=25;SR=0;TI=SMTPD_---0VgBQJ8G_1681694053; Received: from localhost.localdomain(mailfrom:xueshuai@linux.alibaba.com fp:SMTPD_---0VgBQJ8G_1681694053) by smtp.aliyun-inc.com; Mon, 17 Apr 2023 09:14:14 +0800 From: Shuai Xue To: rafael@kernel.org, wangkefeng.wang@huawei.com, tanxiaofei@huawei.com, mawupeng1@huawei.com, tony.luck@intel.com, naoya.horiguchi@nec.com Cc: linux-acpi@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, xueshuai@linux.alibaba.com, justin.he@arm.com, akpm@linux-foundation.org, ardb@kernel.org, ashish.kalra@amd.com, baolin.wang@linux.alibaba.com, bp@alien8.de, cuibixuan@linux.alibaba.com, dave.hansen@linux.intel.com, james.morse@arm.com, jarkko@kernel.org, lenb@kernel.org, linmiaohe@huawei.com, lvying6@huawei.com, xiexiuqi@huawei.com, zhuo.song@linux.alibaba.com Subject: [PATCH v7 2/2] ACPI: APEI: handle synchronous exceptions in task work Date: Mon, 17 Apr 2023 09:14:07 +0800 Message-Id: <20230417011407.58319-3-xueshuai@linux.alibaba.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221027042445.60108-1-xueshuai@linux.alibaba.com> References: <20221027042445.60108-1-xueshuai@linux.alibaba.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 
DE9A64001A X-Rspamd-Server: rspam09 X-Rspam-User: X-Stat-Signature: 558ekp88reyf7r1p6a5zu3eo7dhzqez4 X-HE-Tag: 1681694058-656569 X-HE-Meta: U2FsdGVkX1/zv58W05thF7GEumGmDL43+dmlB8Tjh4OjdPyX6XJndysaKyXZLu+4EtqqaXAwadey79AEc7Oxk+AixouUrtAcmtabSwSHdYpHF0sJ4hPt8QEb4kab2MRvHdgpUYH4KDf/S6q1/TPbqL6/G1ELfH8oxsztoRl6HA8GZwrXxD05CxogPMitMjOvYkHRljR3k1ks7TyeKsiwMtexqRtqHXCqAJaGkApI/6uNRzYwNKeIVxcfbKAYYHqtmeE9c7t8gZQQdIqkbYK569BMfWvrQaHjfFlUFw+V9ecquMLhKdSzWo+LLQwQSJxRx2eC9mC1lBklbcb08fd04m7oPZ++1t9BFu6/kP68sgilkd5LMZRUrefxpCmp1e7roM65SF8RswPLSW6bSTgTncDhTNRpvHEy66OB8sdTCXJGfZQaADm0qvnJkfxCFyDLDfDxIqBLOSZteqBF4QQQZdgDLuiOVhXAR7ErDN8+ozicPv3RnBGZC+6qtHW2p0MoK0O+Gq5ISOEGuBCHSF+H7UJc+TjWe9KoFLkwEINrY+4w/WM1Jyu9wiQO92cMOF2Z5hwUxDwy5l4fcDpm0gWZFzCeSjGg/4QCiuQdnyepWu3/W1G6wNjtE4jYZclnKWma8LId2qAsNr95RA4K8DYe6XEGoP4NfDyxwjWGvNpEqPPcdjsPqRPlqyuFEmMH1Zhc2tSP01539t+bPQDs+H2G6doSLS7tUtydjh06tdYWt9ZTuWYuhnihRte4QZJEJlDh+AzPJJYwcePbxb0Nx/kwh4bS/VKppTX7/s0c/s284IXfrmCzqgiZZ08sOgYcWCazzBdo4sMmmL/ItP5SurCCmfdG+pntCAwHOTfxcc5DktWH+j64X+FxEZ5lpXkygtzEgtBmAx78HwkFGTLk/Kvngj7JxP6H0dB3RS3PbgZJyY0jFrIcPy/tyMhzfSnQdUbaCNXfrie0SHsVcedvF3U gUtqLL9j 7O+iI5ve2xZaiasKwKwS4FCmvooleQCr+RdD+C08NIbljjAgjwJc/icQLgqh/EzbVTDlpj4XPa23aP/Rv0OaY4GLxowcd3al+L4tWgzbcONESuDZ67wCEerbFoMCzf70bEfb40ln3gL54dw4= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hardware errors can be signaled by an asynchronous interrupt, e.g. when an error is detected by a background scrubber, or by a synchronous exception, e.g. when an uncorrected error is consumed. Both synchronous and asynchronous errors are queued and handled by a dedicated kthread in a workqueue.
Commit 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for synchronous errors") keeps track of whether memory_failure() work was queued, and makes task_work pending to flush out the workqueue so that the work for a synchronous error is processed before returning to user-space. The trick ensures that the corrupted page is unmapped and poisoned. After returning to user-space, the task restarts at the current instruction, which triggers a page fault in which the kernel sends SIGBUS to the current process due to VM_FAULT_HWPOISON.

However, memory failure recovery for hwpoison-aware mechanisms does not work as expected. For example, hwpoison-aware user-space processes like QEMU register their own SIGBUS handler and enable early kill mode by setting PF_MCE_EARLY at initialization. The kernel then directly notifies the process by sending a SIGBUS signal in memory failure, but with the wrong si_code: the user-space process is the one actually accessing the corrupt memory location, yet its memory failure work is handled in a kthread context, so kill_proc() sends SIGBUS with the BUS_MCEERR_AO si_code to that process instead of BUS_MCEERR_AR.

To fix this, separate synchronous and asynchronous error handling into different paths, as the x86 platform does:

- valid synchronous errors: queue a task_work to synchronously send SIGBUS before ret_to_user.
- valid asynchronous errors: queue work into the workqueue to handle memory failure asynchronously.
- abnormal branches such as invalid PA, unexpected severity, no memory failure config support, invalid GUID section, OOM, etc.: for synchronous notifications, force-kill the current task with SIGBUS.

Then, for valid synchronous errors, the current context in memory failure belongs exactly to the task consuming the poisoned data, and it will send SIGBUS with the proper si_code.
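To show why the si_code matters to a hwpoison-aware process, the user-space side can be sketched roughly as follows. This is a minimal illustration, not QEMU's implementation: `classify_sigbus()`, `install_sigbus_handler()`, `enable_early_kill()` and the `poison_kind` enum are hypothetical names. The sketch assumes a Linux system where `prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0)` is the user-space way to request early kill mode; the numeric fallbacks for BUS_MCEERR_AR and BUS_MCEERR_AO are assumptions used only if the C library does not define them.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/prctl.h>

/* Fallbacks (assumed values) for C libraries that lack these macros. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

enum poison_kind { POISON_NONE, POISON_CONSUMED, POISON_LATENT };

/* Map a SIGBUS si_code to what the process has to do about it. */
enum poison_kind classify_sigbus(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:	/* this thread consumed poisoned data */
		return POISON_CONSUMED;
	case BUS_MCEERR_AO:	/* poison found, but not consumed yet */
		return POISON_LATENT;
	default:		/* ordinary SIGBUS, not a memory error */
		return POISON_NONE;
	}
}

static volatile sig_atomic_t last_si_code;

/* Async-signal-safe handler: only record the code for later handling. */
static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;
	last_si_code = info->si_code;
}

int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}

/* Ask for SIGBUS as soon as a page owned by this process is poisoned. */
int enable_early_kill(void)
{
	return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
}
```

A process built this way takes different recovery paths for POISON_CONSUMED and POISON_LATENT, so an AR error reported with BUS_MCEERR_AO would be handled as if the poisoned data had not been consumed.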
Fixes: 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for synchronous errors") Signed-off-by: Shuai Xue Tested-by: Ma Wupeng Reviewed-by: Kefeng Wang Reviewed-by: Xiaofei Tan Reviewed-by: Baolin Wang --- arch/x86/kernel/cpu/mce/core.c | 9 +--- drivers/acpi/apei/ghes.c | 84 +++++++++++++++++++++------------- include/acpi/ghes.h | 3 -- mm/memory-failure.c | 17 ++----- 4 files changed, 56 insertions(+), 57 deletions(-) diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 2eec60f50057..2ebaaa494ac4 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1311,17 +1311,10 @@ static void kill_me_maybe(struct callback_head *cb) return; } - /* - * -EHWPOISON from memory_failure() means that it already sent SIGBUS - * to the current process with the proper error info, - * -EOPNOTSUPP means hwpoison_filter() filtered the error event, - * - * In both cases, no further processing is required. - */ if (ret == -EHWPOISON || ret == -EOPNOTSUPP) return; - pr_err("Memory error not recovered"); + pr_err("Sending SIGBUS to current task due to memory error not recovered"); kill_me_now(cb); } diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index c479b85899f5..b41d4e462b36 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -452,28 +452,41 @@ static void ghes_clear_estatus(struct ghes *ghes, } /* - * Called as task_work before returning to user-space. - * Ensure any queued work has been done before we return to the context that - * triggered the notification. + * struct sync_task_work - for synchronous RAS event + * + * @twork: callback_head for task work + * @pfn: page frame number of corrupted page + * @flags: fine tune action taken + * + * Structure to pass task work to be handled before + * ret_to_user via task_work_add(). 
*/ -static void ghes_kick_task_work(struct callback_head *head) +struct sync_task_work { + struct callback_head twork; + u64 pfn; + int flags; +}; + +static void memory_failure_cb(struct callback_head *twork) { - struct acpi_hest_generic_status *estatus; - struct ghes_estatus_node *estatus_node; - u32 node_len; + int ret; + struct sync_task_work *twcb = + container_of(twork, struct sync_task_work, twork); - estatus_node = container_of(head, struct ghes_estatus_node, task_work); - if (IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE)) - memory_failure_queue_kick(estatus_node->task_work_cpu); + ret = memory_failure(twcb->pfn, twcb->flags); + kfree(twcb); - estatus = GHES_ESTATUS_FROM_NODE(estatus_node); - node_len = GHES_ESTATUS_NODE_LEN(cper_estatus_len(estatus)); - gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node, node_len); + if (!ret || ret == -EHWPOISON || ret == -EOPNOTSUPP) + return; + + pr_err("Sending SIGBUS to current task due to memory error not recovered"); + force_sig(SIGBUS); } static bool ghes_do_memory_failure(u64 physical_addr, int flags) { unsigned long pfn; + struct sync_task_work *twcb; if (!IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE)) return false; @@ -486,6 +499,18 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags) return false; } + if (flags == MF_ACTION_REQUIRED && current->mm) { + twcb = kmalloc(sizeof(*twcb), GFP_ATOMIC); + if (!twcb) + return false; + + twcb->pfn = pfn; + twcb->flags = flags; + init_task_work(&twcb->twork, memory_failure_cb); + task_work_add(current, &twcb->twork, TWA_RESUME); + return true; + } + memory_failure_queue(pfn, flags); return true; } @@ -654,7 +679,7 @@ static void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata, schedule_work(&entry->work); } -static bool ghes_do_proc(struct ghes *ghes, +static void ghes_do_proc(struct ghes *ghes, const struct acpi_hest_generic_status *estatus) { int sev, sec_sev; @@ -698,7 +723,14 @@ static bool ghes_do_proc(struct ghes *ghes, } } - 
return queued; + /* + * If no memory failure work is queued for abnormal synchronous + * errors, do a force kill. + */ + if (sync && !queued) { + pr_err("Sending SIGBUS to current task due to memory error not recovered"); + force_sig(SIGBUS); + } } static void __ghes_print_estatus(const char *pfx, @@ -1000,9 +1032,7 @@ static void ghes_proc_in_irq(struct irq_work *irq_work) struct ghes_estatus_node *estatus_node; struct acpi_hest_generic *generic; struct acpi_hest_generic_status *estatus; - bool task_work_pending; u32 len, node_len; - int ret; llnode = llist_del_all(&ghes_estatus_llist); /* @@ -1017,25 +1047,16 @@ static void ghes_proc_in_irq(struct irq_work *irq_work) estatus = GHES_ESTATUS_FROM_NODE(estatus_node); len = cper_estatus_len(estatus); node_len = GHES_ESTATUS_NODE_LEN(len); - task_work_pending = ghes_do_proc(estatus_node->ghes, estatus); + + ghes_do_proc(estatus_node->ghes, estatus); + if (!ghes_estatus_cached(estatus)) { generic = estatus_node->generic; if (ghes_print_estatus(NULL, generic, estatus)) ghes_estatus_cache_add(generic, estatus); } - - if (task_work_pending && current->mm) { - estatus_node->task_work.func = ghes_kick_task_work; - estatus_node->task_work_cpu = smp_processor_id(); - ret = task_work_add(current, &estatus_node->task_work, - TWA_RESUME); - if (ret) - estatus_node->task_work.func = NULL; - } - - if (!estatus_node->task_work.func) - gen_pool_free(ghes_estatus_pool, - (unsigned long)estatus_node, node_len); + gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node, + node_len); llnode = next; } @@ -1096,7 +1117,6 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes, estatus_node->ghes = ghes; estatus_node->generic = ghes->generic; - estatus_node->task_work.func = NULL; estatus = GHES_ESTATUS_FROM_NODE(estatus_node); if (__ghes_read_estatus(estatus, buf_paddr, fixmap_idx, len)) { diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h index 3c8bba9f1114..e5e0c308d27f 100644 --- a/include/acpi/ghes.h +++ 
b/include/acpi/ghes.h @@ -35,9 +35,6 @@ struct ghes_estatus_node { struct llist_node llnode; struct acpi_hest_generic *generic; struct ghes *ghes; - - int task_work_cpu; - struct callback_head task_work; }; struct ghes_estatus_cache { diff --git a/mm/memory-failure.c b/mm/memory-failure.c index fae9baf3be16..3aef483ca3c6 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -2073,7 +2073,9 @@ static DEFINE_MUTEX(mf_mutex); * * Return: 0 for successfully handled the memory error, * -EOPNOTSUPP for hwpoison_filter() filtered the error event, - * < 0(except -EOPNOTSUPP) on failure. + * -EHWPOISON for already sent SIGBUS to the current process with + * the proper error info, + * other negative error code on failure. */ int memory_failure(unsigned long pfn, int flags) { @@ -2355,19 +2357,6 @@ static void memory_failure_work_func(struct work_struct *work) } } -/* - * Process memory_failure work queued on the specified CPU. - * Used to avoid return-to-userspace racing with the memory_failure workqueue. 
- */ -void memory_failure_queue_kick(int cpu) -{ - struct memory_failure_cpu *mf_cpu; - - mf_cpu = &per_cpu(memory_failure_cpu, cpu); - cancel_work_sync(&mf_cpu->work); - memory_failure_work_func(&mf_cpu->work); -} - static int __init memory_failure_init(void) { struct memory_failure_cpu *mf_cpu; From patchwork Mon Dec 18 06:45:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shuai Xue X-Patchwork-Id: 13496264 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C385DC35274 for ; Mon, 18 Dec 2023 06:45:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5B4F98D0006; Mon, 18 Dec 2023 01:45:42 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 516C68D0001; Mon, 18 Dec 2023 01:45:42 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3B93B8D0006; Mon, 18 Dec 2023 01:45:42 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 2E9B68D0001 for ; Mon, 18 Dec 2023 01:45:42 -0500 (EST) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 0E978A08A8 for ; Mon, 18 Dec 2023 06:45:42 +0000 (UTC) X-FDA: 81579003324.22.85B2673 Received: from out30-130.freemail.mail.aliyun.com (out30-130.freemail.mail.aliyun.com [115.124.30.130]) by imf05.hostedemail.com (Postfix) with ESMTP id EDB7F100015 for ; Mon, 18 Dec 2023 06:45:38 +0000 (UTC) Authentication-Results: imf05.hostedemail.com; dkim=none; spf=pass (imf05.hostedemail.com: domain of xueshuai@linux.alibaba.com designates 115.124.30.130 as permitted sender) smtp.mailfrom=xueshuai@linux.alibaba.com; 
dmarc=pass (policy=none) header.from=alibaba.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1702881939; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=S2f1TDEW63cD3jPt1NzhP/57iKUVALLGIHmiAutG68U=; b=C01zfCKlIBqHHByLRkx/zVAAlfBK/hiwCXDsDDVUEUbOPe2SoFeUbjvxMk97t8VrwjKT9e Vaz8OcRSHZF1liywh7mKCUQ4Dg6i6uUGNHPbUK9yEndiI1X1Bx+vnw8AJjjsjrujPYcwaS eVsoGGFczNINX+OdR3yPEJbWOHYTW9s= ARC-Authentication-Results: i=1; imf05.hostedemail.com; dkim=none; spf=pass (imf05.hostedemail.com: domain of xueshuai@linux.alibaba.com designates 115.124.30.130 as permitted sender) smtp.mailfrom=xueshuai@linux.alibaba.com; dmarc=pass (policy=none) header.from=alibaba.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1702881939; a=rsa-sha256; cv=none; b=akaJsZ/hM6YsAyS7QpDZ5Ne3T7yMbFF3d1DZPvNUxfgid+kxl2/5JYHVLTXw8WOe8lZMEH RQnpf7hnnn0Cv/I+MchpldF+pZxRN2KL7urat0WGFChiFcEHcz/wXXyOX70BpqYb243XiK Ck7iq1GMXEMViVvkFRI6shzG3NNHjx8= X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R101e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018045192;MF=xueshuai@linux.alibaba.com;NM=1;PH=DS;RN=35;SR=0;TI=SMTPD_---0VygHb.J_1702881931; Received: from localhost.localdomain(mailfrom:xueshuai@linux.alibaba.com fp:SMTPD_---0VygHb.J_1702881931) by smtp.aliyun-inc.com; Mon, 18 Dec 2023 14:45:33 +0800 From: Shuai Xue To: bp@alien8.de, rafael@kernel.org, wangkefeng.wang@huawei.com, tanxiaofei@huawei.com, mawupeng1@huawei.com, tony.luck@intel.com, linmiaohe@huawei.com, naoya.horiguchi@nec.com, james.morse@arm.com, gregkh@linuxfoundation.org, will@kernel.org, jarkko@kernel.org Cc: linux-acpi@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-edac@vger.kernel.org, acpica-devel@lists.linuxfoundation.org, 
stable@vger.kernel.org, x86@kernel.org, xueshuai@linux.alibaba.com, justin.he@arm.com, ardb@kernel.org, ying.huang@intel.com, ashish.kalra@amd.com, baolin.wang@linux.alibaba.com, tglx@linutronix.de, mingo@redhat.com, dave.hansen@linux.intel.com, lenb@kernel.org, hpa@zytor.com, robert.moore@intel.com, lvying6@huawei.com, xiexiuqi@huawei.com, zhuo.song@linux.alibaba.com Subject: [PATCH v10 3/4] mm: memory-failure: move memory_failure() return value documentation to function declaration Date: Mon, 18 Dec 2023 14:45:20 +0800 Message-Id: <20231218064521.37324-4-xueshuai@linux.alibaba.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221027042445.60108-1-xueshuai@linux.alibaba.com> References: <20221027042445.60108-1-xueshuai@linux.alibaba.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: EDB7F100015 X-Rspam-User: X-Stat-Signature: awz7bnwn4rm538t8emr1k9f5ser3umg3 X-Rspamd-Server: rspam01 X-HE-Tag: 1702881938-626746 X-HE-Meta: U2FsdGVkX19al9v3JoaRIURfkjOrRp1pllJ5d24+kCdOOB75b078N0RsJEQob/IVwu9fYZBEowHE6CPsGyhs8goa9RhjBQsNRXLZtfOwZdar/HYjvYktvpFd+o6AmZWGJtaOJTv5yRQYpWnzI9z7B1x+4ZEze7IhzAqSTRl4ZUjwUOvghSGicGv+I1WLtCz3ALaiB/PjUJIC1J9sQ5tuuZNVLDt/TJ7ja4MWlu83JpZ7kqu+jC3yR3R8QlndY2PdCw9CqhNrnbLEhYKz5qbsxnsZcDpK+DEzr/vsCVeu5wMl9lH1SUkbWvGDN1b/KBNyvpPAE6PvTxgrrhDTg9rjHKZRkr6trwyxnh4RdYM72frgjHk/Tj63Ujfm6zXWfsMbvIAUp4C9ymrHZjWPq4a3v6C9lmyHgZoRbe5mNALueshNkdxUm9iYYQpFs9tFu5hkoRnv2c0SaIo4qhIhz9M3WpuKyKUlSSMt8OZOnWgMIwf6CZC65CBkCp6dCjbaW5ZUUNeOsOuTlEIjfawjk4fXTKGsO+wWZr6y+/Fv5h9Cp2zfyJL/gj37d7Lq2H2MoGGf4LY3REj2kBQWNzEdoESc9fQ10s/kPKK5BZNuMulueZWq7wQSe+Q0GyVPnLGwY7GdUvvhY2RsiK5Xhx6i/ItKGC8oVw0ZdjmENVZMafUyzVoLvp4FjgJGpXsBG56wDXHEH+sUTuVNuAiTxL6/LfpFa6cbO/9NOC7shyI0AjtjUtcUXjGQ4p4TAcQYlLY464LnhdNU5omiFU/OHCsUTxnkYIu0ddVXhp3J5srqTGRPAcvkEDU9+VL50ruZoAIzzqK+d4CEzyu5BAkS2TWZbwOY6qJLz24hBcO6UpXaJMP6wlE= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: 
List-Unsubscribe: Part of return value comments for memory_failure() were originally documented at the call site. Move those comments to the function declaration to improve code readability and to provide developers with immediate access to function usage and return information. Signed-off-by: Shuai Xue --- arch/x86/kernel/cpu/mce/core.c | 9 +-------- mm/memory-failure.c | 9 ++++++--- 2 files changed, 7 insertions(+), 11 deletions(-) diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 7b397370b4d6..43e542f06ad5 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1324,17 +1324,10 @@ static void kill_me_maybe(struct callback_head *cb) return; } - /* - * -EHWPOISON from memory_failure() means that it already sent SIGBUS - * to the current process with the proper error info, - * -EOPNOTSUPP means hwpoison_filter() filtered the error event, - * - * In both cases, no further processing is required. - */ if (ret == -EHWPOISON || ret == -EOPNOTSUPP) return; - pr_err("Memory error not recovered"); + pr_err("Sending SIGBUS to current task due to memory error not recovered"); kill_me_now(cb); } diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 660c21859118..bd3dcafdfa4a 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -2164,9 +2164,12 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags, * Must run in process context (e.g. a work queue) with interrupts * enabled and no spinlocks held. * - * Return: 0 for successfully handled the memory error, - * -EOPNOTSUPP for hwpoison_filter() filtered the error event, - * < 0(except -EOPNOTSUPP) on failure. + * Return values: + * 0 - success + * -EOPNOTSUPP - hwpoison_filter() filtered the error event. + * -EHWPOISON - sent SIGBUS to the current process with the proper + * error info by kill_accessing_process(). 
+ * other negative values - failure */ int memory_failure(unsigned long pfn, int flags) { From patchwork Mon Dec 18 06:45:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shuai Xue X-Patchwork-Id: 13496265 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CFBEFC46CD2 for ; Mon, 18 Dec 2023 06:45:44 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6773D8D0007; Mon, 18 Dec 2023 01:45:43 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 629338D0001; Mon, 18 Dec 2023 01:45:43 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4F07A8D0007; Mon, 18 Dec 2023 01:45:43 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 3EB1A8D0001 for ; Mon, 18 Dec 2023 01:45:43 -0500 (EST) Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 160321608CF for ; Mon, 18 Dec 2023 06:45:43 +0000 (UTC) X-FDA: 81579003366.18.86F8D78 Received: from out30-130.freemail.mail.aliyun.com (out30-130.freemail.mail.aliyun.com [115.124.30.130]) by imf22.hostedemail.com (Postfix) with ESMTP id 11212C000F for ; Mon, 18 Dec 2023 06:45:40 +0000 (UTC) Authentication-Results: imf22.hostedemail.com; dkim=none; spf=pass (imf22.hostedemail.com: domain of xueshuai@linux.alibaba.com designates 115.124.30.130 as permitted sender) smtp.mailfrom=xueshuai@linux.alibaba.com; dmarc=pass (policy=none) header.from=alibaba.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1702881941; h=from:from:sender:reply-to:subject:subject:date:date: 
message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1zS9Da9CAyaj/s3gA9c/mCNMBK7GU5h51xos1juqGpY=; b=VUeEmjQp4tTWAxIY+D6+zpT3wUYWpGBF5bqfwkiIEpBvqr3Xk7CYiH8QzlnuzHor5A2K9P Z5RU84Veg0VW4uNRVdifhQDFJEsB2slYnKLsvPpH6NQGCUhomYyZCwwIdKDChImsOTZ2Y3 5xubpuzlKuKUjkw+GIfZwKGt2UmjzRs= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1702881941; a=rsa-sha256; cv=none; b=bSIsOqDL9UTxmRg6A88ggpEwmAfqpnC4B/5m8pVPuBPvWMabqmoTN5LEA9gjYP74FWDx8r oESYz7tWLTlOSX7a3c9QxmNUF7yVCZqlFfGjwdOSgVCquXE/K0FIugAsFaCoq0pON8vUAT OnMqPGrLaSw4szKuWuGgPyfqNqdLuc8= ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=none; spf=pass (imf22.hostedemail.com: domain of xueshuai@linux.alibaba.com designates 115.124.30.130 as permitted sender) smtp.mailfrom=xueshuai@linux.alibaba.com; dmarc=pass (policy=none) header.from=alibaba.com X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R211e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046059;MF=xueshuai@linux.alibaba.com;NM=1;PH=DS;RN=35;SR=0;TI=SMTPD_---0VygHb04_1702881934; Received: from localhost.localdomain(mailfrom:xueshuai@linux.alibaba.com fp:SMTPD_---0VygHb04_1702881934) by smtp.aliyun-inc.com; Mon, 18 Dec 2023 14:45:35 +0800 From: Shuai Xue To: bp@alien8.de, rafael@kernel.org, wangkefeng.wang@huawei.com, tanxiaofei@huawei.com, mawupeng1@huawei.com, tony.luck@intel.com, linmiaohe@huawei.com, naoya.horiguchi@nec.com, james.morse@arm.com, gregkh@linuxfoundation.org, will@kernel.org, jarkko@kernel.org Cc: linux-acpi@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-edac@vger.kernel.org, acpica-devel@lists.linuxfoundation.org, stable@vger.kernel.org, x86@kernel.org, xueshuai@linux.alibaba.com, justin.he@arm.com, ardb@kernel.org, ying.huang@intel.com, ashish.kalra@amd.com, baolin.wang@linux.alibaba.com, tglx@linutronix.de, mingo@redhat.com, 
dave.hansen@linux.intel.com, lenb@kernel.org, hpa@zytor.com, robert.moore@intel.com, lvying6@huawei.com, xiexiuqi@huawei.com, zhuo.song@linux.alibaba.com Subject: [PATCH v10 4/4] ACPI: APEI: handle synchronous exceptions in task work Date: Mon, 18 Dec 2023 14:45:21 +0800 Message-Id: <20231218064521.37324-5-xueshuai@linux.alibaba.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221027042445.60108-1-xueshuai@linux.alibaba.com> References: <20221027042445.60108-1-xueshuai@linux.alibaba.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 11212C000F X-Rspam-User: X-Stat-Signature: tnetruum6eqsn4b6uw8b79jtz7r4knqu X-Rspamd-Server: rspam03 X-HE-Tag: 1702881940-973109 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: 
List-Unsubscribe: Hardware errors can be signaled by an asynchronous interrupt, e.g. when an error is detected by a background scrubber, or by a synchronous exception, e.g. when a CPU tries to access a poisoned cache line. Both synchronous and asynchronous errors are queued as memory_failure() work and handled by a dedicated kthread in a workqueue. However, memory failure recovery sends SIGBUS with the wrong si_code, BUS_MCEERR_AO, for synchronous errors in early kill mode, even when MF_ACTION_REQUIRED is set. The main problem is that the memory failure work is handled in kthread context rather than in the context of the user-space process that is accessing the corrupt memory location, so kill_proc() sends SIGBUS with the BUS_MCEERR_AO si_code to the user-space process instead of BUS_MCEERR_AR. To fix this, queue memory_failure() as a task_work so that the current context in memory_failure() belongs to the process consuming the poisoned data, and SIGBUS is sent with the proper si_code. Signed-off-by: Shuai Xue Tested-by: Ma Wupeng Reviewed-by: Kefeng Wang Reviewed-by: Xiaofei Tan Reviewed-by: Baolin Wang --- drivers/acpi/apei/ghes.c | 77 +++++++++++++++++++++++----------------- include/acpi/ghes.h | 3 -- mm/memory-failure.c | 13 ------- 3 files changed, 44 insertions(+), 49 deletions(-) diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index f832ffc5a88d..a6b4907cfe47 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -464,28 +464,41 @@ static void ghes_clear_estatus(struct ghes *ghes, } /* - * Called as task_work before returning to user-space. - * Ensure any queued work has been done before we return to the context that - * triggered the notification. + * struct sync_task_work - for synchronous RAS event + * + * @twork: callback_head for task work + * @pfn: page frame number of corrupted page + * @flags: fine tune action taken + * + * Structure to pass task work to be handled before + * ret_to_user via task_work_add(). 
*/ -static void ghes_kick_task_work(struct callback_head *head) +struct sync_task_work { + struct callback_head twork; + u64 pfn; + int flags; +}; + +static void memory_failure_cb(struct callback_head *twork) { - struct acpi_hest_generic_status *estatus; - struct ghes_estatus_node *estatus_node; - u32 node_len; + int ret; + struct sync_task_work *twcb = + container_of(twork, struct sync_task_work, twork); - estatus_node = container_of(head, struct ghes_estatus_node, task_work); - if (IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE)) - memory_failure_queue_kick(estatus_node->task_work_cpu); + ret = memory_failure(twcb->pfn, twcb->flags); + gen_pool_free(ghes_estatus_pool, (unsigned long)twcb, sizeof(*twcb)); - estatus = GHES_ESTATUS_FROM_NODE(estatus_node); - node_len = GHES_ESTATUS_NODE_LEN(cper_estatus_len(estatus)); - gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node, node_len); + if (!ret || ret == -EHWPOISON || ret == -EOPNOTSUPP) + return; + + pr_err("Sending SIGBUS to current task due to memory error not recovered"); + force_sig(SIGBUS); } static bool ghes_do_memory_failure(u64 physical_addr, int flags) { unsigned long pfn; + struct sync_task_work *twcb; if (!IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE)) return false; @@ -498,6 +511,18 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags) return false; } + if (flags == MF_ACTION_REQUIRED && current->mm) { + twcb = (void *)gen_pool_alloc(ghes_estatus_pool, sizeof(*twcb)); + if (!twcb) + return false; + + twcb->pfn = pfn; + twcb->flags = flags; + init_task_work(&twcb->twork, memory_failure_cb); + task_work_add(current, &twcb->twork, TWA_RESUME); + return true; + } + memory_failure_queue(pfn, flags); return true; } @@ -673,7 +698,7 @@ static void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata, schedule_work(&entry->work); } -static bool ghes_do_proc(struct ghes *ghes, +static void ghes_do_proc(struct ghes *ghes, const struct acpi_hest_generic_status *estatus) { int sev, 
sec_sev; @@ -725,8 +750,6 @@ static bool ghes_do_proc(struct ghes *ghes, pr_err("Sending SIGBUS to current task due to memory error not recovered"); force_sig(SIGBUS); } - - return queued; } static void __ghes_print_estatus(const char *pfx, @@ -1028,9 +1051,7 @@ static void ghes_proc_in_irq(struct irq_work *irq_work) struct ghes_estatus_node *estatus_node; struct acpi_hest_generic *generic; struct acpi_hest_generic_status *estatus; - bool task_work_pending; u32 len, node_len; - int ret; llnode = llist_del_all(&ghes_estatus_llist); /* @@ -1045,25 +1066,16 @@ static void ghes_proc_in_irq(struct irq_work *irq_work) estatus = GHES_ESTATUS_FROM_NODE(estatus_node); len = cper_estatus_len(estatus); node_len = GHES_ESTATUS_NODE_LEN(len); - task_work_pending = ghes_do_proc(estatus_node->ghes, estatus); + + ghes_do_proc(estatus_node->ghes, estatus); + if (!ghes_estatus_cached(estatus)) { generic = estatus_node->generic; if (ghes_print_estatus(NULL, generic, estatus)) ghes_estatus_cache_add(generic, estatus); } - - if (task_work_pending && current->mm) { - estatus_node->task_work.func = ghes_kick_task_work; - estatus_node->task_work_cpu = smp_processor_id(); - ret = task_work_add(current, &estatus_node->task_work, - TWA_RESUME); - if (ret) - estatus_node->task_work.func = NULL; - } - - if (!estatus_node->task_work.func) - gen_pool_free(ghes_estatus_pool, - (unsigned long)estatus_node, node_len); + gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node, + node_len); llnode = next; } @@ -1124,7 +1136,6 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes, estatus_node->ghes = ghes; estatus_node->generic = ghes->generic; - estatus_node->task_work.func = NULL; estatus = GHES_ESTATUS_FROM_NODE(estatus_node); if (__ghes_read_estatus(estatus, buf_paddr, fixmap_idx, len)) { diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h index be1dd4c1a917..ebd21b05fe6e 100644 --- a/include/acpi/ghes.h +++ b/include/acpi/ghes.h @@ -35,9 +35,6 @@ struct ghes_estatus_node { struct 
llist_node llnode; struct acpi_hest_generic *generic; struct ghes *ghes; - - int task_work_cpu; - struct callback_head task_work; }; struct ghes_estatus_cache { diff --git a/mm/memory-failure.c b/mm/memory-failure.c index bd3dcafdfa4a..6bff57444928 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -2451,19 +2451,6 @@ static void memory_failure_work_func(struct work_struct *work) } } -/* - * Process memory_failure work queued on the specified CPU. - * Used to avoid return-to-userspace racing with the memory_failure workqueue. - */ -void memory_failure_queue_kick(int cpu) -{ - struct memory_failure_cpu *mf_cpu; - - mf_cpu = &per_cpu(memory_failure_cpu, cpu); - cancel_work_sync(&mf_cpu->work); - memory_failure_work_func(&mf_cpu->work); -} - static int __init memory_failure_init(void) { struct memory_failure_cpu *mf_cpu;
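
For readers who want to experiment with the callback pattern in patch 4/4 outside the kernel, here is a rough user-space sketch. It is not part of the series and is not kernel code. It assumes a stripped-down struct callback_head, a local container_of() macro, and a hypothetical memory_failure_stub() whose return value is set through the stub_ret variable; malloc()/free() stand in for the ghes_estatus_pool gen_pool calls, and a counter (sigbus_count) stands in for force_sig(SIGBUS). Only the control flow of memory_failure_cb() is meant to mirror the patch: recover the enclosing sync_task_work from the embedded callback_head, run the handler, free the node, and treat 0, -EHWPOISON and -EOPNOTSUPP as "nothing more to do".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* Linux value; defined here only if libc lacks it */
#endif

/* User-space stand-in for the kernel's container_of() (assumption). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for the kernel's struct callback_head (assumption). */
struct callback_head {
	void (*func)(struct callback_head *head);
};

/* Same fields as struct sync_task_work in the patch. */
struct sync_task_work {
	struct callback_head twork;
	unsigned long long pfn;
	int flags;
};

static int stub_ret;		/* value the hypothetical stub will return */
static int sigbus_count;	/* counts where force_sig(SIGBUS) would run */

/* Hypothetical stub standing in for memory_failure(). */
static int memory_failure_stub(unsigned long long pfn, int flags)
{
	(void)pfn;
	(void)flags;
	return stub_ret;
}

/* Mirrors the control flow of memory_failure_cb() in the patch. */
static void memory_failure_cb(struct callback_head *twork)
{
	struct sync_task_work *twcb;
	int ret;

	assert(twork != NULL);
	twcb = container_of(twork, struct sync_task_work, twork);

	ret = memory_failure_stub(twcb->pfn, twcb->flags);
	free(twcb);	/* the patch uses gen_pool_free() here */

	/* Handled, already signaled, or filtered: nothing more to do. */
	if (!ret || ret == -EHWPOISON || ret == -EOPNOTSUPP)
		return;

	sigbus_count++;	/* the patch calls force_sig(SIGBUS) here */
}

/* Hypothetical helper: allocate and initialize the work item. */
static struct callback_head *queue_sync_work(unsigned long long pfn, int flags)
{
	struct sync_task_work *twcb = malloc(sizeof(*twcb));

	if (!twcb)
		return NULL;

	twcb->pfn = pfn;
	twcb->flags = flags;
	twcb->twork.func = memory_failure_cb;
	return &twcb->twork;
}
```

A caller would set stub_ret, obtain a callback_head from queue_sync_work(), and invoke its func pointer, which is roughly what the kernel does on return to user space once task_work_add() has queued the item. Only return values other than 0, -EHWPOISON and -EOPNOTSUPP bump sigbus_count.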