From: Shuai Xue
To: tony.luck@intel.com, bp@alien8.de, peterz@infradead.org,
    catalin.marinas@arm.com, yazen.ghannam@amd.com,
    akpm@linux-foundation.org, linmiaohe@huawei.com,
    nao.horiguchi@gmail.com
Cc: tglx@linutronix.de, mingo@redhat.com, dave.hansen@linux.intel.com,
    x86@kernel.org, hpa@zytor.com, jpoimboe@kernel.org,
    linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, baolin.wang@linux.alibaba.com,
    tianruidong@linux.alibaba.com
Subject: [PATCH v3 2/3] mm/hwpoison: Do not send SIGBUS to processes with
    recovered clean pages
Date: Thu, 6 Mar 2025 10:10:30 +0800
Message-ID: <20250306021031.5538-3-xueshuai@linux.alibaba.com>
In-Reply-To: <20250306021031.5538-1-xueshuai@linux.alibaba.com>
References: <20250306021031.5538-1-xueshuai@linux.alibaba.com>

When an uncorrected memory error is consumed, there is a race between the
CMCI from the memory controller reporting an uncorrected error with a UCNA
signature, and the core reporting an SRAR signature machine check when the
data is about to be consumed.

- Background: why *UN*corrected errors are tied to *C*MCI on Intel
  platforms [1]

Prior to Icelake, memory controllers reported patrol scrub events that
detected a previously unseen uncorrected error in memory by signaling a
broadcast machine check with an SRAO (Software Recoverable Action Optional)
signature in the machine check bank. This was overkill: the error is not
urgent, because no core is on the verge of consuming the bad data. It was
also found that multiple SRAO UCEs may cause nested MCE interrupts and
eventually escalate to an IERR.

Hence, Intel downgraded the machine check bank signature of patrol scrub
errors from SRAO to UCNA (Uncorrected, No Action required) and changed the
signal to #CMCI. Just to add to the confusion, Linux does take an action
(in uc_decode_notifier()) to try to offline the page despite the UC*NA*
signature name.

- Background: why #CMCI and #MCE race when poison is being consumed on
  Intel platforms [1]

Having decided that CMCI/UCNA is the best action for patrol scrub errors,
the memory controller uses it for reads too. But the memory controller
executes asynchronously from the core and can't tell the difference between
a "real" read and a speculative read. So it will do CMCI/UCNA if an error
is found in any read.

Thus:

1) Core is clever and thinks address A is needed soon, issues a
   speculative read.
2) Core finds it is going to use address A soon after sending the read
   request.
3) The CMCI from the memory controller is in a race with the MCE from the
   core that will soon try to retire the load from address A.

Quite often (because speculation has got better) the CMCI from the memory
controller is delivered before the core is committed to the instruction
reading address A, so the interrupt is taken, and Linux offlines the page
(marking it as poison).

- Why the user process is killed in the instr case

Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported
"not recovered"") tries to fix the noisy "Memory error not recovered"
message and to skip duplicate SIGBUSs caused by the race. But it also
introduced a bug: kill_accessing_process() returns -EHWPOISON for the instr
case and, as a result, kill_me_maybe() sends a SIGBUS to the user process.

If the CMCI wins that race, the page is marked poisoned when
uc_decode_notifier() calls memory_failure(). For dirty pages,
memory_failure() invokes try_to_unmap() with the TTU_HWPOISON flag,
converting the PTE to a hwpoison entry. As a result,
kill_accessing_process() will:

- call walk_page_range(), which returns 1 regardless of whether
  try_to_unmap() succeeded or failed,
- call kill_proc() to make sure a SIGBUS is sent,
- return -EHWPOISON to indicate that the SIGBUS has already been sent to
  the process and kill_me_maybe() doesn't have to send it again.

However, for clean pages, the TTU_HWPOISON flag is cleared, leaving the PTE
unchanged and not converted to a hwpoison entry. Consequently, for clean
pages whose PTEs are not marked as hwpoison, kill_accessing_process()
returns -EFAULT, causing kill_me_maybe() to send a SIGBUS.

The console log looks like this:

    Memory failure: 0x827ca68: corrupted page was clean: dropped without side effects
    Memory failure: 0x827ca68: recovery action for clean LRU page: Recovered
    Memory failure: 0x827ca68: already hardware poisoned
    mce: Memory error not recovered

To fix it, return 0 for the "corrupted page was clean" case, preventing an
unnecessary SIGBUS to the user process.
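
For illustration only, the return-code handling described above can be
modeled in userspace C. This is a simplified sketch under stated
assumptions, not the kernel implementation: the *_old/*_new function names,
the walk_ret and have_addr parameters, caller_sends_sigbus(),
model_self_check() and the fallback EHWPOISON value are all hypothetical
names introduced for the example. walk_ret stands in for the result of
walk_page_range(), and have_addr for priv.tk.addr being non-zero. The
caller model only covers what this message states: 0 and -EHWPOISON lead to
no further SIGBUS, while -EFAULT does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Fallback for non-Linux builds; the value is used only by this sketch. */
#ifndef EHWPOISON
#define EHWPOISON 133
#endif

/* Model of the return value before the patch (hypothetical helper). */
static inline int kill_accessing_process_old(int walk_ret, bool have_addr)
{
	int ret = walk_ret;

	if (!(ret == 1 && have_addr))
		ret = 0;	/* the "else ret = 0;" branch the patch removes */
	return ret > 0 ? -EHWPOISON : -EFAULT;
}

/* Model of the return value after the patch (hypothetical helper). */
static inline int kill_accessing_process_new(int walk_ret, bool have_addr)
{
	(void)have_addr;	/* only decides whether kill_proc() runs */
	return walk_ret > 0 ? -EHWPOISON : 0;
}

/*
 * Assumed caller behavior, as described in the text: no further SIGBUS
 * for 0 (recovered) or -EHWPOISON (already signaled), SIGBUS otherwise.
 */
static inline bool caller_sends_sigbus(int ret)
{
	return ret != 0 && ret != -EHWPOISON;
}

static inline void model_self_check(void)
{
	/* Clean page: the walk finds no hwpoison entry (walk_ret == 0). */
	assert(caller_sends_sigbus(kill_accessing_process_old(0, false)));
	assert(!caller_sends_sigbus(kill_accessing_process_new(0, false)));

	/* Dirty page: hwpoison entry found, SIGBUS already sent by kill_proc(). */
	assert(!caller_sends_sigbus(kill_accessing_process_old(1, true)));
	assert(!caller_sends_sigbus(kill_accessing_process_new(1, true)));
}
```

In this model only the clean-page case changes: the old mapping yields
-EFAULT and an extra SIGBUS from the caller, while the new mapping yields 0
and no signal. The dirty-page case is the same before and after.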

[1] https://lore.kernel.org/lkml/20250217063335.22257-1-xueshuai@linux.alibaba.com/T/#mba94f1305b3009dd340ce4114d3221fe810d1871

Fixes: 046545a661af ("mm/hwpoison: fix error page recovered but reported "not recovered"")
Signed-off-by: Shuai Xue
Cc: stable@vger.kernel.org
---
 mm/memory-failure.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 995a15eb67e2..b037952565be 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -881,12 +881,17 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
 	mmap_read_lock(p->mm);
 	ret = walk_page_range(p->mm, 0, TASK_SIZE, &hwpoison_walk_ops,
 			      (void *)&priv);
+	/*
+	 * ret = 1 when CMCI wins, regardless of whether try_to_unmap()
+	 * succeeds or fails, then kill the process with SIGBUS.
+	 * ret = 0 when poison page is a clean page and it's dropped, no
+	 * SIGBUS is needed.
+	 */
 	if (ret == 1 && priv.tk.addr)
 		kill_proc(&priv.tk, pfn, flags);
-	else
-		ret = 0;
 	mmap_read_unlock(p->mm);
-	return ret > 0 ? -EHWPOISON : -EFAULT;
+
+	return ret > 0 ? -EHWPOISON : 0;
 }
 
 /*