From patchwork Fri Jun 25 01:39:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12343477 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CE6A0C49EA7 for ; Fri, 25 Jun 2021 01:40:01 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 7BD8A610A7 for ; Fri, 25 Jun 2021 01:40:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7BD8A610A7 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 8D2BE8D000F; Thu, 24 Jun 2021 21:40:00 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8833E6B0073; Thu, 24 Jun 2021 21:40:00 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6FC7E8D000F; Thu, 24 Jun 2021 21:40:00 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0167.hostedemail.com [216.40.44.167]) by kanga.kvack.org (Postfix) with ESMTP id 3C6246B0072 for ; Thu, 24 Jun 2021 21:40:00 -0400 (EDT) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 6DA511809CD6A for ; Fri, 25 Jun 2021 01:40:00 +0000 (UTC) X-FDA: 78290540160.30.F76AE60 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf09.hostedemail.com (Postfix) with ESMTP id 2C1626000148 for ; Fri, 25 Jun 2021 01:40:00 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPSA id 0039F61220; Fri, 25 Jun 2021 01:39:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1624585199; bh=8c8w0TGUkRuQHCjjh8touhZDdnrgaWxcNkMAqxZi29I=; h=Date:From:To:Subject:In-Reply-To:From; b=QicPOIfMDIlfK8sNkiJ5APQGcnj8Jm1vHuhzIt/1zUka1EieFp7LNvfKPWl9UbIot LJqpMDhjPwFpr2QseeAu+KQ6wFDPc2VaNbcbFm6vWDP6RuLwqdJM8E/9zWd/KQ/b20 KQNpd3mJGAEB4i0LrWoHfuPyKcU/nDtjZlgsFtBI= Date: Thu, 24 Jun 2021 18:39:58 -0700 From: Andrew Morton To: akpm@linux-foundation.org, bp@alien8.de, bp@suse.de, david@redhat.com, juew@google.com, linux-mm@kvack.org, luto@kernel.org, mm-commits@vger.kernel.org, naoya.horiguchi@nec.com, osalvador@suse.de, stable@vger.kernel.org, tony.luck@intel.com, torvalds@linux-foundation.org, yaoaili@kingsoft.com Subject: [patch 19/24] mm,hwpoison: return -EHWPOISON to denote that the page has already been poisoned Message-ID: <20210625013958.xJoXf1EqN%akpm@linux-foundation.org> In-Reply-To: <20210624183838.ac3161ca4a43989665ac8b2f@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 2C1626000148 Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=QicPOIfM; spf=pass (imf09.hostedemail.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Stat-Signature: r846nzidrp477n1ongajtna9nnyui9up X-HE-Tag: 1624585200-692668 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Aili Yao Subject: mm,hwpoison: return -EHWPOISON to denote that the page has already been poisoned When memory_failure() is called with MF_ACTION_REQUIRED on the page that has already been hwpoisoned, memory_failure() could fail to send SIGBUS to the affected process, which results in infinite loop of MCEs. Currently memory_failure() returns 0 if it's called for already hwpoisoned page, then the caller, kill_me_maybe(), could return without sending SIGBUS to current process. An action required MCE is raised when the current process accesses to the broken memory, so no SIGBUS means that the current process continues to run and access to the error page again soon, so running into MCE loop. This issue can arise for example in the following scenarios: - Two or more threads access to the poisoned page concurrently. If local MCE is enabled, MCE handler independently handles the MCE events. So there's a race among MCE events, and the second or latter threads fall into the situation in question. - If there was a precedent memory error event and memory_failure() for the event failed to unmap the error page for some reason, the subsequent memory access to the error page triggers the MCE loop situation. To fix the issue, make memory_failure() return an error code when the error page has already been hwpoisoned. This allows memory error handler to control how it sends signals to userspace. And make sure that any process touching a hwpoisoned page should get a SIGBUS even in "already hwpoisoned" path of memory_failure() as is done in page fault path. Link: https://lkml.kernel.org/r/20210521030156.2612074-3-nao.horiguchi@gmail.com Signed-off-by: Aili Yao Signed-off-by: Naoya Horiguchi Reviewed-by: Oscar Salvador Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Borislav Petkov Cc: David Hildenbrand Cc: Jue Wang Cc: Tony Luck Cc: Signed-off-by: Andrew Morton --- mm/memory-failure.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) --- a/mm/memory-failure.c~mmhwpoison-return-ehwpoison-to-denote-that-the-page-has-already-been-poisoned +++ a/mm/memory-failure.c @@ -1253,7 +1253,7 @@ static int memory_failure_hugetlb(unsign if (TestSetPageHWPoison(head)) { pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn); - return 0; + return -EHWPOISON; } num_poisoned_pages_inc(); @@ -1461,6 +1461,7 @@ try_again: if (TestSetPageHWPoison(p)) { pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn); + res = -EHWPOISON; goto unlock_mutex; }