From patchwork Fri May 21 03:01:54 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naoya Horiguchi
X-Patchwork-Id: 12271791
From: Naoya Horiguchi
To: linux-mm@kvack.org, Tony Luck, Aili Yao
Cc: Andrew Morton, Oscar Salvador, David Hildenbrand, Borislav Petkov,
 Andy Lutomirski, Naoya Horiguchi, Jue Wang, linux-kernel@vger.kernel.org
Subject: [PATCH v5 1/3] mm/memory-failure: Use a mutex to avoid memory_failure() races
Date: Fri, 21 May 2021 12:01:54 +0900
Message-Id: <20210521030156.2612074-2-nao.horiguchi@gmail.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210521030156.2612074-1-nao.horiguchi@gmail.com>
References: <20210521030156.2612074-1-nao.horiguchi@gmail.com>

From: Tony Luck

There can be races when multiple CPUs consume poison from the same
page. The first into memory_failure() atomically sets the HWPoison
page flag and begins hunting for tasks that map this page. Eventually
it invalidates those mappings and may send a SIGBUS to the affected
tasks.

But while all that work is going on, other CPUs see a "success"
return code from memory_failure() and so they believe the error
has been handled and continue executing.

Fix by wrapping most of the internal parts of memory_failure() in
a mutex.
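
For illustration only, the locking shape described above (take one mutex
up front, then turn every early return into a jump to a single unlock
label) can be sketched in user space. This is a hedged sketch under
stated assumptions, not the kernel code: fake_page, mf_sketch() and
sketch_mutex are hypothetical names, and a pthread mutex stands in for
the kernel mutex.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; not a kernel type. */
struct fake_page {
	bool hwpoison;	/* models the HWPoison page flag */
	int handled;	/* counts how often the slow recovery path ran */
};

/* User-space stand-in for a kernel mutex (assumption of this sketch). */
static pthread_mutex_t sketch_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Control-flow sketch only: every exit taken after the lock is acquired
 * goes through the unlock_mutex label, so a second caller cannot observe
 * the "already poisoned" state before the first caller has finished.
 */
int mf_sketch(struct fake_page *p)
{
	int res = 0;

	pthread_mutex_lock(&sketch_mutex);

	if (p->hwpoison) {
		/* Already poisoned: the first caller is done by now. */
		goto unlock_mutex;
	}
	p->hwpoison = true;

	/* Placeholder for the real work (unmapping, signalling). */
	p->handled++;

unlock_mutex:
	pthread_mutex_unlock(&sketch_mutex);
	return res;
}
```

Calling mf_sketch() twice on the same fake_page runs the placeholder
recovery step once; the second call only returns after the first has
dropped the mutex.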
Signed-off-by: Tony Luck
Signed-off-by: Naoya Horiguchi
Reviewed-by: Borislav Petkov
Reviewed-by: Oscar Salvador
---
 mm/memory-failure.c | 37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git v5.13-rc2/mm/memory-failure.c v5.13-rc2_patched/mm/memory-failure.c
index 9a7c12ace9e2..0f0b932ccbca 100644
--- v5.13-rc2/mm/memory-failure.c
+++ v5.13-rc2_patched/mm/memory-failure.c
@@ -1400,6 +1400,8 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	return rc;
 }
 
+static DEFINE_MUTEX(mf_mutex);
+
 /**
  * memory_failure - Handle memory failure of a page.
  * @pfn: Page Number of the corrupted page
@@ -1423,7 +1425,7 @@ int memory_failure(unsigned long pfn, int flags)
 	struct page *hpage;
 	struct page *orig_head;
 	struct dev_pagemap *pgmap;
-	int res;
+	int res = 0;
 	unsigned long page_flags;
 	bool retry = true;
 
@@ -1443,13 +1445,18 @@ int memory_failure(unsigned long pfn, int flags)
 		return -ENXIO;
 	}
 
+	mutex_lock(&mf_mutex);
+
 try_again:
-	if (PageHuge(p))
-		return memory_failure_hugetlb(pfn, flags);
+	if (PageHuge(p)) {
+		res = memory_failure_hugetlb(pfn, flags);
+		goto unlock_mutex;
+	}
+
 	if (TestSetPageHWPoison(p)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 			pfn);
-		return 0;
+		goto unlock_mutex;
 	}
 
 	orig_head = hpage = compound_head(p);
@@ -1482,17 +1489,19 @@ int memory_failure(unsigned long pfn, int flags)
 				res = MF_FAILED;
 			}
 			action_result(pfn, MF_MSG_BUDDY, res);
-			return res == MF_RECOVERED ? 0 : -EBUSY;
+			res = res == MF_RECOVERED ? 0 : -EBUSY;
 		} else {
 			action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
-			return -EBUSY;
+			res = -EBUSY;
 		}
+		goto unlock_mutex;
 	}
 
 	if (PageTransHuge(hpage)) {
 		if (try_to_split_thp_page(p, "Memory Failure") < 0) {
 			action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
-			return -EBUSY;
+			res = -EBUSY;
+			goto unlock_mutex;
 		}
 		VM_BUG_ON_PAGE(!page_count(p), p);
 	}
@@ -1516,7 +1525,7 @@ int memory_failure(unsigned long pfn, int flags)
 	if (PageCompound(p) && compound_head(p) != orig_head) {
 		action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
 		res = -EBUSY;
-		goto out;
+		goto unlock_page;
 	}
 
 	/*
@@ -1536,14 +1545,14 @@ int memory_failure(unsigned long pfn, int flags)
 			num_poisoned_pages_dec();
 		unlock_page(p);
 		put_page(p);
-		return 0;
+		goto unlock_mutex;
 	}
 	if (hwpoison_filter(p)) {
 		if (TestClearPageHWPoison(p))
 			num_poisoned_pages_dec();
 		unlock_page(p);
 		put_page(p);
-		return 0;
+		goto unlock_mutex;
 	}
 
 	if (!PageTransTail(p) && !PageLRU(p))
@@ -1562,7 +1571,7 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!hwpoison_user_mappings(p, pfn, flags, &p)) {
 		action_result(pfn, MF_MSG_UNMAP_FAILED, MF_IGNORED);
 		res = -EBUSY;
-		goto out;
+		goto unlock_page;
 	}
 
 	/*
@@ -1571,13 +1580,15 @@ int memory_failure(unsigned long pfn, int flags)
 	if (PageLRU(p) && !PageSwapCache(p) && p->mapping == NULL) {
 		action_result(pfn, MF_MSG_TRUNCATED_LRU, MF_IGNORED);
 		res = -EBUSY;
-		goto out;
+		goto unlock_page;
 	}
 
 identify_page_state:
 	res = identify_page_state(pfn, p, page_flags);
-out:
+unlock_page:
 	unlock_page(p);
+unlock_mutex:
+	mutex_unlock(&mf_mutex);
 	return res;
 }
 EXPORT_SYMBOL_GPL(memory_failure);

From patchwork Fri May 21 03:01:55 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naoya Horiguchi
X-Patchwork-Id: 12271793
From: Naoya Horiguchi
To: linux-mm@kvack.org, Tony Luck, Aili Yao
Cc: Andrew Morton, Oscar Salvador, David Hildenbrand, Borislav Petkov,
 Andy Lutomirski, Naoya Horiguchi, Jue Wang, linux-kernel@vger.kernel.org
Subject: [PATCH v5 2/3] mm,hwpoison: Return -EHWPOISON to denote that the page has already been poisoned
Date: Fri, 21 May 2021 12:01:55 +0900
Message-Id: <20210521030156.2612074-3-nao.horiguchi@gmail.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210521030156.2612074-1-nao.horiguchi@gmail.com>
References: <20210521030156.2612074-1-nao.horiguchi@gmail.com>

From: Aili Yao

When memory_failure() is called with MF_ACTION_REQUIRED on a page that
has already been hwpoisoned, memory_failure() could fail to send SIGBUS
to the affected process, which results in an infinite loop of MCEs.

Currently memory_failure() returns 0 if it's called for an already
hwpoisoned page, so the caller, kill_me_maybe(), could return without
sending SIGBUS to the current process. An action required MCE is raised
when the current process accesses the broken memory, so no SIGBUS means
that the current process continues to run and soon accesses the error
page again, running into an MCE loop.

This issue can arise, for example, in the following scenarios:

- Two or more threads access the poisoned page concurrently. If local
  MCE is enabled, the MCE handler handles each MCE event independently.
  So there's a race among MCE events, and the second or later threads
  fall into the situation in question.

- If there was a preceding memory error event and memory_failure() for
  the event failed to unmap the error page for some reason, the
  subsequent memory access to the error page triggers the MCE loop
  situation.

To fix the issue, make memory_failure() return an error code when the
error page has already been hwpoisoned. This allows the memory error
handler to control how it sends signals to userspace. And make sure
that any process touching a hwpoisoned page gets a SIGBUS even in the
"already hwpoisoned" path of memory_failure(), as is done in the page
fault path.

Signed-off-by: Aili Yao
Signed-off-by: Naoya Horiguchi
Reviewed-by: Oscar Salvador
---
ChangeLog v5:
- update patch description.
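
As a rough illustration of the return convention described in this
patch, the following user-space sketch models a handler that reports 0
on the first poisoning and -EHWPOISON on a repeat, and a caller that
treats any non-zero result as "a signal still has to be sent". The names
fake_page, mf_return_sketch() and caller_must_signal() are hypothetical,
and the fallback value for EHWPOISON is an assumption for C libraries
that do not define it.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* assumed fallback; Linux asm-generic value */
#endif

/* Hypothetical stand-in for struct page; not a kernel type. */
struct fake_page {
	bool hwpoison;	/* models the HWPoison page flag */
};

/* Models only the return value: 0 the first time, -EHWPOISON afterwards. */
int mf_return_sketch(struct fake_page *p)
{
	if (p->hwpoison)
		return -EHWPOISON;	/* the "already poisoned" case used to return 0 */
	p->hwpoison = true;
	return 0;
}

/*
 * Caller-side decision in this sketch: a non-zero result means the
 * handler did not report success, so the caller must deliver SIGBUS itself.
 */
bool caller_must_signal(int ret)
{
	return ret != 0;
}
```

With the old "return 0" behaviour the second call would look like a
success, and caller_must_signal() would wrongly report false for it.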
---
 mm/memory-failure.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git v5.13-rc2/mm/memory-failure.c v5.13-rc2_patched/mm/memory-failure.c
index 0f0b932ccbca..8add7cafad5e 100644
--- v5.13-rc2/mm/memory-failure.c
+++ v5.13-rc2_patched/mm/memory-failure.c
@@ -1247,7 +1247,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(head)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 		       pfn);
-		return 0;
+		return -EHWPOISON;
 	}
 
 	num_poisoned_pages_inc();
@@ -1456,6 +1456,7 @@ int memory_failure(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(p)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 			pfn);
+		res = -EHWPOISON;
 		goto unlock_mutex;
 	}
 

From patchwork Fri May 21 03:01:56 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naoya Horiguchi
X-Patchwork-Id: 12271795
From: Naoya Horiguchi
To: linux-mm@kvack.org, Tony Luck, Aili Yao
Cc: Andrew Morton, Oscar Salvador, David Hildenbrand, Borislav Petkov,
 Andy Lutomirski, Naoya Horiguchi, Jue Wang, linux-kernel@vger.kernel.org
Subject: [PATCH v5 3/3] mm,hwpoison: Send SIGBUS with error virtual address
Date: Fri, 21 May 2021 12:01:56 +0900
Message-Id: <20210521030156.2612074-4-nao.horiguchi@gmail.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210521030156.2612074-1-nao.horiguchi@gmail.com>
References: <20210521030156.2612074-1-nao.horiguchi@gmail.com>

From: Naoya Horiguchi

Now an action required MCE on an already hwpoisoned address reliably
sends a SIGBUS to the current process, but the SIGBUS doesn't convey
the error virtual address. That's not optimal for hwpoison-aware
applications.

To fix the issue, make memory_failure() call kill_accessing_process(),
which does a page table walk to find the error virtual address. It
could find multiple virtual addresses for the same error page, and it
seems hard to tell which virtual address is the correct one. But that's
rare, and sending an incorrect virtual address could be better than no
address. So let's report the first found virtual address for now.

Signed-off-by: Naoya Horiguchi
---
change log v4 -> v5:
- switched to first found approach,
- introduced check_hwpoisoned_pmd_entry() to fix build failure on arch
  without thp support.

change log v3 -> v4:
- refactored hwpoison_pte_range to save indentation,
- updated patch description

change log v1 -> v2:
- initialize local variables in check_hwpoisoned_entry() and
  hwpoison_pte_range()
- fix and improve logic to calculate error address offset.
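
The "first found" lookup and the offset arithmetic for huge mappings
can be illustrated with a small user-space model. This is a sketch
under assumptions, not the page table walker itself: fake_mapping,
first_vaddr_for_pfn() and SKETCH_PAGE_SHIFT are hypothetical names, the
page tables are flattened into an array, and 4 KiB base pages are
assumed.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB base pages */

/* Hypothetical flattened mapping: nr_pages contiguous pfns at vaddr. */
struct fake_mapping {
	unsigned long vaddr;	/* virtual address of the first page */
	unsigned long pfn;	/* page frame number of the first page */
	unsigned long nr_pages;	/* 1 for a base page, 512 for a 2 MiB huge page */
};

/*
 * Return true and store the virtual address of the first mapping that
 * covers poisoned_pfn. For a multi-page mapping the address is the
 * mapping base plus the page offset of the poisoned pfn inside it.
 */
bool first_vaddr_for_pfn(const struct fake_mapping *maps, size_t n,
			 unsigned long poisoned_pfn, unsigned long *vaddr)
{
	for (size_t i = 0; i < n; i++) {
		const struct fake_mapping *m = &maps[i];

		if (m->pfn <= poisoned_pfn &&
		    poisoned_pfn < m->pfn + m->nr_pages) {
			*vaddr = m->vaddr +
				((poisoned_pfn - m->pfn) << SKETCH_PAGE_SHIFT);
			return true;	/* first match wins, later aliases are ignored */
		}
	}
	return false;
}
```

If the same pfn is mapped both inside a huge mapping and again as a
single page at a later address, only the earlier address is reported,
which mirrors the "first found" trade-off described above.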
---
 arch/x86/kernel/cpu/mce/core.c |  13 ++-
 include/linux/swapops.h        |   5 ++
 mm/memory-failure.c            | 150 ++++++++++++++++++++++++++++++++-
 3 files changed, 165 insertions(+), 3 deletions(-)

diff --git v5.13-rc2/arch/x86/kernel/cpu/mce/core.c v5.13-rc2_patched/arch/x86/kernel/cpu/mce/core.c
index bf7fe87a7e88..22791aadc085 100644
--- v5.13-rc2/arch/x86/kernel/cpu/mce/core.c
+++ v5.13-rc2_patched/arch/x86/kernel/cpu/mce/core.c
@@ -1257,19 +1257,28 @@ static void kill_me_maybe(struct callback_head *cb)
 {
 	struct task_struct *p = container_of(cb, struct task_struct, mce_kill_me);
 	int flags = MF_ACTION_REQUIRED;
+	int ret;
 
 	pr_err("Uncorrected hardware memory error in user-access at %llx", p->mce_addr);
 
 	if (!p->mce_ripv)
 		flags |= MF_MUST_KILL;
 
-	if (!memory_failure(p->mce_addr >> PAGE_SHIFT, flags) &&
-	    !(p->mce_kflags & MCE_IN_KERNEL_COPYIN)) {
+	ret = memory_failure(p->mce_addr >> PAGE_SHIFT, flags);
+	if (!ret && !(p->mce_kflags & MCE_IN_KERNEL_COPYIN)) {
 		set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page);
 		sync_core();
 		return;
 	}
 
+	/*
+	 * -EHWPOISON from memory_failure() means that it already sent SIGBUS
+	 * to the current process with the proper error info, so no need to
+	 * send SIGBUS here again.
+	 */
+	if (ret == -EHWPOISON)
+		return;
+
 	if (p->mce_vaddr != (void __user *)-1l) {
 		force_sig_mceerr(BUS_MCEERR_AR, p->mce_vaddr, PAGE_SHIFT);
 	} else {
diff --git v5.13-rc2/include/linux/swapops.h v5.13-rc2_patched/include/linux/swapops.h
index d9b7c9132c2f..98ea67fcf360 100644
--- v5.13-rc2/include/linux/swapops.h
+++ v5.13-rc2_patched/include/linux/swapops.h
@@ -323,6 +323,11 @@ static inline int is_hwpoison_entry(swp_entry_t entry)
 	return swp_type(entry) == SWP_HWPOISON;
 }
 
+static inline unsigned long hwpoison_entry_to_pfn(swp_entry_t entry)
+{
+	return swp_offset(entry);
+}
+
 static inline void num_poisoned_pages_inc(void)
 {
 	atomic_long_inc(&num_poisoned_pages);
diff --git v5.13-rc2/mm/memory-failure.c v5.13-rc2_patched/mm/memory-failure.c
index 8add7cafad5e..137cd0f61af3 100644
--- v5.13-rc2/mm/memory-failure.c
+++ v5.13-rc2_patched/mm/memory-failure.c
@@ -56,6 +56,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 #include "ras/ras_event.h"
 
@@ -554,6 +555,148 @@ static void collect_procs(struct page *page, struct list_head *tokill,
 		collect_procs_file(page, tokill, force_early);
 }
 
+struct hwp_walk {
+	struct to_kill tk;
+	unsigned long pfn;
+	int flags;
+};
+
+static void set_to_kill(struct to_kill *tk, unsigned long addr, short shift)
+{
+	tk->addr = addr;
+	tk->size_shift = shift;
+}
+
+static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift,
+				unsigned long poisoned_pfn, struct to_kill *tk)
+{
+	unsigned long pfn = 0;
+
+	if (pte_present(pte)) {
+		pfn = pte_pfn(pte);
+	} else {
+		swp_entry_t swp = pte_to_swp_entry(pte);
+
+		if (is_hwpoison_entry(swp))
+			pfn = hwpoison_entry_to_pfn(swp);
+	}
+
+	if (!pfn || pfn != poisoned_pfn)
+		return 0;
+
+	set_to_kill(tk, addr, shift);
+	return 1;
+}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static int check_hwpoisoned_pmd_entry(pmd_t *pmdp, unsigned long addr,
+				      struct hwp_walk *hwp)
+{
+	pmd_t pmd = *pmdp;
+	unsigned long pfn;
+	unsigned long hwpoison_vaddr;
+
+	if (!pmd_present(pmd))
+		return 0;
+	pfn = pmd_pfn(pmd);
+	if (pfn <= hwp->pfn && hwp->pfn < pfn + HPAGE_PMD_NR) {
+		hwpoison_vaddr = addr + ((hwp->pfn - pfn) << PAGE_SHIFT);
+		set_to_kill(&hwp->tk, hwpoison_vaddr, PAGE_SHIFT);
+		return 1;
+	}
+	return 0;
+}
+#else
+static int check_hwpoisoned_pmd_entry(pmd_t *pmdp, unsigned long addr,
+				      struct hwp_walk *hwp)
+{
+	return 0;
+}
+#endif
+
+static int hwpoison_pte_range(pmd_t *pmdp, unsigned long addr,
+			      unsigned long end, struct mm_walk *walk)
+{
+	struct hwp_walk *hwp = (struct hwp_walk *)walk->private;
+	int ret = 0;
+	pte_t *ptep;
+	spinlock_t *ptl;
+
+	ptl = pmd_trans_huge_lock(pmdp, walk->vma);
+	if (ptl) {
+		ret = check_hwpoisoned_pmd_entry(pmdp, addr, hwp);
+		spin_unlock(ptl);
+		goto out;
+	}
+
+	if (pmd_trans_unstable(pmdp))
+		goto out;
+
+	ptep = pte_offset_map_lock(walk->vma->vm_mm, pmdp, addr, &ptl);
+	for (; addr != end; ptep++, addr += PAGE_SIZE) {
+		ret = check_hwpoisoned_entry(*ptep, addr, PAGE_SHIFT,
+					     hwp->pfn, &hwp->tk);
+		if (ret == 1)
+			break;
+	}
+	pte_unmap_unlock(ptep - 1, ptl);
+out:
+	cond_resched();
+	return ret;
+}
+
+#ifdef CONFIG_HUGETLB_PAGE
+static int hwpoison_hugetlb_range(pte_t *ptep, unsigned long hmask,
+			    unsigned long addr, unsigned long end,
+			    struct mm_walk *walk)
+{
+	struct hwp_walk *hwp = (struct hwp_walk *)walk->private;
+	pte_t pte = huge_ptep_get(ptep);
+	struct hstate *h = hstate_vma(walk->vma);
+
+	return check_hwpoisoned_entry(pte, addr, huge_page_shift(h),
+				      hwp->pfn, &hwp->tk);
+}
+#else
+#define hwpoison_hugetlb_range	NULL
+#endif
+
+static struct mm_walk_ops hwp_walk_ops = {
+	.pmd_entry = hwpoison_pte_range,
+	.hugetlb_entry = hwpoison_hugetlb_range,
+};
+
+/*
+ * Sends SIGBUS to the current process with error info.
+ *
+ * This function is intended to handle "Action Required" MCEs on already
+ * hardware poisoned pages. They could happen, for example, when
+ * memory_failure() failed to unmap the error page at the first call, or
+ * when multiple local machine checks happened on different CPUs.
+ *
+ * MCE handler currently has no easy access to the error virtual address,
+ * so this function walks page table to find it. The returned virtual address
+ * is proper in most cases, but it could be wrong when the application
+ * process has multiple entries mapping the error page.
+ */
+static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
+				  int flags)
+{
+	int ret;
+	struct hwp_walk priv = {
+		.pfn = pfn,
+	};
+	priv.tk.tsk = p;
+
+	mmap_read_lock(p->mm);
+	ret = walk_page_range(p->mm, 0, TASK_SIZE, &hwp_walk_ops,
+			      (void *)&priv);
+	if (!ret && priv.tk.addr)
+		kill_proc(&priv.tk, pfn, flags);
+	mmap_read_unlock(p->mm);
+	return ret ? -EFAULT : -EHWPOISON;
+}
+
 static const char *action_name[] = {
 	[MF_IGNORED] = "Ignored",
 	[MF_FAILED] = "Failed",
@@ -1247,7 +1390,10 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(head)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 		       pfn);
-		return -EHWPOISON;
+		res = -EHWPOISON;
+		if (flags & MF_ACTION_REQUIRED)
+			res = kill_accessing_process(current, page_to_pfn(head), flags);
+		return res;
 	}
 
 	num_poisoned_pages_inc();
@@ -1457,6 +1603,8 @@ int memory_failure(unsigned long pfn, int flags)
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 			pfn);
 		res = -EHWPOISON;
+		if (flags & MF_ACTION_REQUIRED)
+			res = kill_accessing_process(current, pfn, flags);
 		goto unlock_mutex;
 	}
 