From patchwork Wed Apr 21 00:57:26 2021
X-Patchwork-Submitter: Naoya Horiguchi
X-Patchwork-Id: 12215265
From: Naoya Horiguchi
To: linux-mm@kvack.org, Tony Luck, Aili Yao
Cc: Andrew Morton, Oscar Salvador, David Hildenbrand, Borislav Petkov,
 Andy Lutomirski, Naoya Horiguchi, Jue Wang, linux-kernel@vger.kernel.org
Subject: [PATCH v3 1/3] mm/memory-failure: Use a mutex to avoid memory_failure() races
Date: Wed, 21 Apr 2021 09:57:26 +0900
Message-Id: <20210421005728.1994268-2-nao.horiguchi@gmail.com>
In-Reply-To: <20210421005728.1994268-1-nao.horiguchi@gmail.com>
References: <20210421005728.1994268-1-nao.horiguchi@gmail.com>
X-Mailer: git-send-email 2.25.1

From: Tony Luck

There can be races when multiple CPUs consume poison from the same
page. The first into memory_failure() atomically sets the HWPoison
page flag and begins hunting for tasks that map this page. Eventually
it invalidates those mappings and may send a SIGBUS to the affected
tasks.

But while all that work is going on, other CPUs see a "success"
return code from memory_failure() and so they believe the error
has been handled and continue executing.

Fix by wrapping most of the internal parts of memory_failure() in
a mutex.
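Illustration only, not part of the patch: the control flow described
above can be modelled in userspace roughly as follows. This is a minimal
sketch under stated assumptions. A toy spinlock built on C11 atomic_flag
stands in for the kernel's sleeping mf_mutex, struct toy_page and every
toy_* name are hypothetical, and the real recovery work is reduced to a
counter. It only shows that every exit path funnels through a single
unlock label, so a later caller cannot test the poison flag until the
earlier caller has finished.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; "poisoned" models the HWPoison flag. */
struct toy_page {
	bool poisoned;
	bool huge;
	int handled;	/* how many times the slow recovery path ran */
};

/* Toy spinlock standing in for the kernel's sleeping mf_mutex. */
static atomic_flag toy_mf_lock = ATOMIC_FLAG_INIT;

static void toy_lock(void)
{
	while (atomic_flag_test_and_set(&toy_mf_lock))
		;	/* spin until the current holder releases the lock */
}

static void toy_unlock(void)
{
	atomic_flag_clear(&toy_mf_lock);
}

/* Returns true if the lock was free; the lock is left free afterwards. */
static bool toy_lock_is_free(void)
{
	if (atomic_flag_test_and_set(&toy_mf_lock))
		return false;
	atomic_flag_clear(&toy_mf_lock);
	return true;
}

/* Models the control flow only: every exit goes through unlock_mutex. */
static int toy_memory_failure(struct toy_page *p)
{
	int res = 0;

	assert(p != NULL);
	toy_lock();

	if (p->huge) {
		res = -1;	/* placeholder for the hugetlb path's result */
		goto unlock_mutex;
	}

	if (p->poisoned)	/* an earlier caller already handled this page */
		goto unlock_mutex;
	p->poisoned = true;

	p->handled++;		/* placeholder for the unmap and SIGBUS work */

unlock_mutex:
	toy_unlock();
	return res;
}
```

Calling toy_memory_failure() twice on the same toy_page runs the
recovery placeholder once, and toy_lock_is_free() holds after each call
regardless of which branch was taken.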
Signed-off-by: Tony Luck
Signed-off-by: Naoya Horiguchi
Reviewed-by: Borislav Petkov
---
 mm/memory-failure.c | 37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git v5.12-rc8/mm/memory-failure.c v5.12-rc8_patched/mm/memory-failure.c
index 24210c9bd843..4087308e4b32 100644
--- v5.12-rc8/mm/memory-failure.c
+++ v5.12-rc8_patched/mm/memory-failure.c
@@ -1381,6 +1381,8 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	return rc;
 }
 
+static DEFINE_MUTEX(mf_mutex);
+
 /**
  * memory_failure - Handle memory failure of a page.
  * @pfn: Page Number of the corrupted page
@@ -1404,7 +1406,7 @@ int memory_failure(unsigned long pfn, int flags)
 	struct page *hpage;
 	struct page *orig_head;
 	struct dev_pagemap *pgmap;
-	int res;
+	int res = 0;
 	unsigned long page_flags;
 	bool retry = true;
 
@@ -1424,13 +1426,18 @@ int memory_failure(unsigned long pfn, int flags)
 		return -ENXIO;
 	}
 
+	mutex_lock(&mf_mutex);
+
 try_again:
-	if (PageHuge(p))
-		return memory_failure_hugetlb(pfn, flags);
+	if (PageHuge(p)) {
+		res = memory_failure_hugetlb(pfn, flags);
+		goto unlock_mutex;
+	}
+
 	if (TestSetPageHWPoison(p)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 			pfn);
-		return 0;
+		goto unlock_mutex;
 	}
 
 	orig_head = hpage = compound_head(p);
@@ -1463,17 +1470,19 @@ int memory_failure(unsigned long pfn, int flags)
 				res = MF_FAILED;
 			}
 			action_result(pfn, MF_MSG_BUDDY, res);
-			return res == MF_RECOVERED ? 0 : -EBUSY;
+			res = res == MF_RECOVERED ? 0 : -EBUSY;
 		} else {
 			action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
-			return -EBUSY;
+			res = -EBUSY;
 		}
+		goto unlock_mutex;
 	}
 
 	if (PageTransHuge(hpage)) {
 		if (try_to_split_thp_page(p, "Memory Failure") < 0) {
 			action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
-			return -EBUSY;
+			res = -EBUSY;
+			goto unlock_mutex;
 		}
 		VM_BUG_ON_PAGE(!page_count(p), p);
 	}
@@ -1497,7 +1506,7 @@ int memory_failure(unsigned long pfn, int flags)
 	if (PageCompound(p) && compound_head(p) != orig_head) {
 		action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
 		res = -EBUSY;
-		goto out;
+		goto unlock_page;
 	}
 
 	/*
@@ -1517,14 +1526,14 @@ int memory_failure(unsigned long pfn, int flags)
 		num_poisoned_pages_dec();
 		unlock_page(p);
 		put_page(p);
-		return 0;
+		goto unlock_mutex;
 	}
 	if (hwpoison_filter(p)) {
 		if (TestClearPageHWPoison(p))
 			num_poisoned_pages_dec();
 		unlock_page(p);
 		put_page(p);
-		return 0;
+		goto unlock_mutex;
 	}
 
 	if (!PageTransTail(p) && !PageLRU(p))
@@ -1543,7 +1552,7 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!hwpoison_user_mappings(p, pfn, flags, &p)) {
 		action_result(pfn, MF_MSG_UNMAP_FAILED, MF_IGNORED);
 		res = -EBUSY;
-		goto out;
+		goto unlock_page;
 	}
 
 	/*
@@ -1552,13 +1561,15 @@ int memory_failure(unsigned long pfn, int flags)
 	if (PageLRU(p) && !PageSwapCache(p) && p->mapping == NULL) {
 		action_result(pfn, MF_MSG_TRUNCATED_LRU, MF_IGNORED);
 		res = -EBUSY;
-		goto out;
+		goto unlock_page;
 	}
 
 identify_page_state:
 	res = identify_page_state(pfn, p, page_flags);
-out:
+unlock_page:
 	unlock_page(p);
+unlock_mutex:
+	mutex_unlock(&mf_mutex);
 	return res;
 }
 EXPORT_SYMBOL_GPL(memory_failure);

From patchwork Wed Apr 21 00:57:27 2021
X-Patchwork-Submitter: Naoya Horiguchi
X-Patchwork-Id: 12215267
From: Naoya Horiguchi
To: linux-mm@kvack.org, Tony Luck, Aili Yao
Cc: Andrew Morton, Oscar Salvador, David Hildenbrand, Borislav Petkov,
 Andy Lutomirski, Naoya Horiguchi, Jue Wang, linux-kernel@vger.kernel.org
Subject: [PATCH v3 2/3] mm,hwpoison: return -EHWPOISON when page already
Date: Wed, 21 Apr 2021 09:57:27 +0900
Message-Id: <20210421005728.1994268-3-nao.horiguchi@gmail.com>
In-Reply-To: <20210421005728.1994268-1-nao.horiguchi@gmail.com>
References: <20210421005728.1994268-1-nao.horiguchi@gmail.com>
X-Mailer: git-send-email 2.25.1

From: Aili Yao

When the page is already poisoned, another memory_failure() call on the
same page currently returns 0, meaning OK. For nested memory MCE
handling, this behavior may lead to an MCE loop. Example:

1. LMCE is enabled, and two processes A and B, running on different
   cores X and Y respectively, access the same page. The page is found
   corrupted when process A accesses it, so an MCE is raised on core X
   and error handling gets underway.

2. Then B accesses the page and triggers another MCE on core Y. Its
   error handling sees TestSetPageHWPoison() return true, and 0 is
   returned.

3. kill_me_maybe() checks the return value:

    static void kill_me_maybe(struct callback_head *cb)
    {
        ...
        if (!memory_failure(p->mce_addr >> PAGE_SHIFT, flags) &&
            !(p->mce_kflags & MCE_IN_KERNEL_COPYIN)) {
                set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page);
                sync_core();
                return;
        }
        ...
    }

4. Error handling for B ends, and nothing may happen if kill-early is
   not set. Process B re-executes the instruction, takes the MCE again,
   and the loop repeats. The set_mce_nospec() call here is also not
   appropriate; see commit fd0e786d9d09 ("x86/mm, mm/hwpoison: Don't
   unconditionally unmap kernel 1:1 pages").

Other callers that care about the return value of memory_failure()
should check why they want to process a memory error that has already
been processed. This behavior seems reasonable.

Signed-off-by: Aili Yao
Signed-off-by: Naoya Horiguchi
---
 mm/memory-failure.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git v5.12-rc8/mm/memory-failure.c v5.12-rc8_patched/mm/memory-failure.c
index 4087308e4b32..39d0ff0339b9 100644
--- v5.12-rc8/mm/memory-failure.c
+++ v5.12-rc8_patched/mm/memory-failure.c
@@ -1228,7 +1228,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(head)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 		       pfn);
-		return 0;
+		return -EHWPOISON;
 	}
 
 	num_poisoned_pages_inc();
@@ -1437,6 +1437,7 @@ int memory_failure(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(p)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 			pfn);
+		res = -EHWPOISON;
 		goto unlock_mutex;
 	}
 

From patchwork Wed Apr 21 00:57:28 2021
X-Patchwork-Submitter: Naoya Horiguchi
X-Patchwork-Id: 12215269
From: Naoya Horiguchi
To: linux-mm@kvack.org, Tony Luck, Aili Yao
Cc: Andrew Morton, Oscar Salvador, David Hildenbrand, Borislav Petkov,
 Andy Lutomirski, Naoya Horiguchi, Jue Wang, linux-kernel@vger.kernel.org
Subject: [PATCH v3 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address
Date: Wed, 21 Apr 2021 09:57:28 +0900
Message-Id: <20210421005728.1994268-4-nao.horiguchi@gmail.com>
In-Reply-To: <20210421005728.1994268-1-nao.horiguchi@gmail.com>
References: <20210421005728.1994268-1-nao.horiguchi@gmail.com>
X-Mailer: git-send-email 2.25.1

From: Naoya Horiguchi

The previous patch solves the infinite MCE loop issue when multiple
MCE events race. The remaining issue is to make sure that all threads
processing Action Required MCEs send SIGBUS to the current process
with the proper virtual address and error size.

This patch proposes doing a page table walk to find the error virtual
address. If the walk finds multiple virtual addresses, we currently
cannot determine which one is correct, so we fall back to sending
SIGBUS in kill_me_maybe() without error info, as we do now. This
corner case needs to be solved in the future.
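Illustration only, not part of the patch: the "find exactly one virtual
address or give up" rule described above can be sketched in plain C as
follows. This is a simplified model under stated assumptions. A flat
array of (vaddr, pfn) pairs stands in for the page tables, all toy_*
names and TOY_PAGE_SHIFT are hypothetical, and an address of 0 means
"nothing recorded yet", mirroring how the patch tests tk->addr.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical flat stand-in for one page-table entry. */
struct toy_mapping {
	unsigned long vaddr;
	unsigned long pfn;
};

/* Reduced version of the fields the walk fills in. */
struct toy_to_kill {
	unsigned long addr;
	short size_shift;
};

/* Record the first hit; report 1 if a hit was already recorded. */
static int toy_set_to_kill(struct toy_to_kill *tk, unsigned long addr,
			   short shift)
{
	if (tk->addr)
		return 1;	/* second mapping found: abort the walk */
	tk->addr = addr;
	tk->size_shift = shift;
	return 0;
}

/*
 * Scan all mappings for poisoned_pfn.
 * Returns 0 if zero or one mapping matched, 1 if more than one matched.
 */
static int toy_walk(const struct toy_mapping *maps, size_t n,
		    unsigned long poisoned_pfn, struct toy_to_kill *tk)
{
	assert(tk != NULL);

	for (size_t i = 0; i < n; i++) {
		if (maps[i].pfn != poisoned_pfn)
			continue;
		if (toy_set_to_kill(tk, maps[i].vaddr, TOY_PAGE_SHIFT))
			return 1;
	}
	return 0;
}
```

A caller would send the signal with tk.addr only when toy_walk()
returns 0 and tk.addr is non-zero; a return of 1 corresponds to the
fallback case where no error address is reported.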
Signed-off-by: Naoya Horiguchi
Tested-by: Aili Yao
---
change log v1 -> v2:
- initialize local variables in check_hwpoisoned_entry() and
  hwpoison_pte_range()
- fix and improve logic to calculate error address offset.
---
 arch/x86/kernel/cpu/mce/core.c |  13 ++-
 include/linux/swapops.h        |   5 ++
 mm/memory-failure.c            | 143 ++++++++++++++++++++++++++++++++-
 3 files changed, 158 insertions(+), 3 deletions(-)

diff --git v5.12-rc8/arch/x86/kernel/cpu/mce/core.c v5.12-rc8_patched/arch/x86/kernel/cpu/mce/core.c
index 7962355436da..3ce23445a48c 100644
--- v5.12-rc8/arch/x86/kernel/cpu/mce/core.c
+++ v5.12-rc8_patched/arch/x86/kernel/cpu/mce/core.c
@@ -1257,19 +1257,28 @@ static void kill_me_maybe(struct callback_head *cb)
 {
 	struct task_struct *p = container_of(cb, struct task_struct, mce_kill_me);
 	int flags = MF_ACTION_REQUIRED;
+	int ret;
 
 	pr_err("Uncorrected hardware memory error in user-access at %llx", p->mce_addr);
 
 	if (!p->mce_ripv)
 		flags |= MF_MUST_KILL;
 
-	if (!memory_failure(p->mce_addr >> PAGE_SHIFT, flags) &&
-	    !(p->mce_kflags & MCE_IN_KERNEL_COPYIN)) {
+	ret = memory_failure(p->mce_addr >> PAGE_SHIFT, flags);
+	if (!ret && !(p->mce_kflags & MCE_IN_KERNEL_COPYIN)) {
 		set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page);
 		sync_core();
 		return;
 	}
 
+	/*
+	 * -EHWPOISON from memory_failure() means that it already sent SIGBUS
+	 * to the current process with the proper error info, so no need to
+	 * send it here again.
+	 */
+	if (ret == -EHWPOISON)
+		return;
+
 	if (p->mce_vaddr != (void __user *)-1l) {
 		force_sig_mceerr(BUS_MCEERR_AR, p->mce_vaddr, PAGE_SHIFT);
 	} else {
diff --git v5.12-rc8/include/linux/swapops.h v5.12-rc8_patched/include/linux/swapops.h
index d9b7c9132c2f..98ea67fcf360 100644
--- v5.12-rc8/include/linux/swapops.h
+++ v5.12-rc8_patched/include/linux/swapops.h
@@ -323,6 +323,11 @@ static inline int is_hwpoison_entry(swp_entry_t entry)
 	return swp_type(entry) == SWP_HWPOISON;
 }
 
+static inline unsigned long hwpoison_entry_to_pfn(swp_entry_t entry)
+{
+	return swp_offset(entry);
+}
+
 static inline void num_poisoned_pages_inc(void)
 {
 	atomic_long_inc(&num_poisoned_pages);
diff --git v5.12-rc8/mm/memory-failure.c v5.12-rc8_patched/mm/memory-failure.c
index 39d0ff0339b9..7cc563e1770a 100644
--- v5.12-rc8/mm/memory-failure.c
+++ v5.12-rc8_patched/mm/memory-failure.c
@@ -56,6 +56,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 #include "ras/ras_event.h"
 
@@ -554,6 +555,141 @@ static void collect_procs(struct page *page, struct list_head *tokill,
 		collect_procs_file(page, tokill, force_early);
 }
 
+struct hwp_walk {
+	struct to_kill tk;
+	unsigned long pfn;
+	int flags;
+};
+
+static int set_to_kill(struct to_kill *tk, unsigned long addr, short shift)
+{
+	/* Abort pagewalk when finding multiple mappings to the error page. */
+	if (tk->addr)
+		return 1;
+	tk->addr = addr;
+	tk->size_shift = shift;
+	return 0;
+}
+
+static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift,
+				unsigned long poisoned_pfn, struct to_kill *tk)
+{
+	unsigned long pfn = 0;
+
+	if (pte_present(pte)) {
+		pfn = pte_pfn(pte);
+	} else {
+		swp_entry_t swp = pte_to_swp_entry(pte);
+
+		if (is_hwpoison_entry(swp))
+			pfn = hwpoison_entry_to_pfn(swp);
+	}
+
+	if (!pfn || pfn != poisoned_pfn)
+		return 0;
+
+	return set_to_kill(tk, addr, shift);
+}
+
+static int hwpoison_pte_range(pmd_t *pmdp, unsigned long addr,
+			      unsigned long end, struct mm_walk *walk)
+{
+	struct hwp_walk *hwp = (struct hwp_walk *)walk->private;
+	int ret = 0;
+	pte_t *ptep;
+	spinlock_t *ptl;
+
+	ptl = pmd_trans_huge_lock(pmdp, walk->vma);
+	if (ptl) {
+		pmd_t pmd = *pmdp;
+
+		if (pmd_present(pmd)) {
+			unsigned long pfn = pmd_pfn(pmd);
+
+			if (pfn <= hwp->pfn && hwp->pfn < pfn + HPAGE_PMD_NR) {
+				unsigned long hwpoison_vaddr = addr +
+					((hwp->pfn - pfn) << PAGE_SHIFT);
+
+				ret = set_to_kill(&hwp->tk, hwpoison_vaddr,
+						  PAGE_SHIFT);
+			}
+		}
+		spin_unlock(ptl);
+		goto out;
+	}
+
+	if (pmd_trans_unstable(pmdp))
+		goto out;
+
+	ptep = pte_offset_map_lock(walk->vma->vm_mm, pmdp, addr, &ptl);
+	for (; addr != end; ptep++, addr += PAGE_SIZE) {
+		ret = check_hwpoisoned_entry(*ptep, addr, PAGE_SHIFT,
+					     hwp->pfn, &hwp->tk);
+		if (ret == 1)
+			break;
+	}
+	pte_unmap_unlock(ptep - 1, ptl);
+out:
+	cond_resched();
+	return ret;
+}
+
+#ifdef CONFIG_HUGETLB_PAGE
+static int hwpoison_hugetlb_range(pte_t *ptep, unsigned long hmask,
+			    unsigned long addr, unsigned long end,
+			    struct mm_walk *walk)
+{
+	struct hwp_walk *hwp = (struct hwp_walk *)walk->private;
+	pte_t pte = huge_ptep_get(ptep);
+	struct hstate *h = hstate_vma(walk->vma);
+
+	return check_hwpoisoned_entry(pte, addr, huge_page_shift(h),
+				      hwp->pfn, &hwp->tk);
+}
+#else
+#define hwpoison_hugetlb_range	NULL
+#endif
+
+static struct mm_walk_ops hwp_walk_ops = {
+	.pmd_entry = hwpoison_pte_range,
+	.hugetlb_entry = hwpoison_hugetlb_range,
+};
+
+/*
+ * Sends SIGBUS to the current process with the error info.
+ *
+ * This function is intended to handle "Action Required" MCEs on already
+ * hardware poisoned pages. They could happen, for example, when
+ * memory_failure() failed to unmap the error page at the first call, or
+ * when multiple local machine checks happened on different CPUs.
+ *
+ * MCE handler currently has no easy access to the error virtual address,
+ * so this function walks page table to find it. One challenge on this is
+ * to reliably get the proper virtual address of the error to report to
+ * applications via SIGBUS. A process could map a page multiple times to
+ * different virtual addresses, then we now have no way to tell which virtual
+ * address was accessed when the Action Required MCE was generated.
+ * So in such a corner case, we now give up and fall back to sending SIGBUS
+ * with no error info.
+ */
+static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
+				  int flags)
+{
+	int ret;
+	struct hwp_walk priv = {
+		.pfn = pfn,
+	};
+	priv.tk.tsk = p;
+
+	mmap_read_lock(p->mm);
+	ret = walk_page_range(p->mm, 0, TASK_SIZE_MAX, &hwp_walk_ops,
+			      (void *)&priv);
+	if (!ret && priv.tk.addr)
+		kill_proc(&priv.tk, pfn, flags);
+	mmap_read_unlock(p->mm);
+	return ret ? -EFAULT : -EHWPOISON;
+}
+
 static const char *action_name[] = {
 	[MF_IGNORED] = "Ignored",
 	[MF_FAILED] = "Failed",
@@ -1228,7 +1364,10 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(head)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 		       pfn);
-		return -EHWPOISON;
+		res = -EHWPOISON;
+		if (flags & MF_ACTION_REQUIRED)
+			res = kill_accessing_process(current, page_to_pfn(head), flags);
+		return res;
 	}
 
 	num_poisoned_pages_inc();
@@ -1438,6 +1577,8 @@ int memory_failure(unsigned long pfn, int flags)
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 			pfn);
 		res = -EHWPOISON;
+		if (flags & MF_ACTION_REQUIRED)
+			res = kill_accessing_process(current, pfn, flags);
 		goto unlock_mutex;
 	}
 
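
Illustration only, not part of the patch: the address arithmetic in the
huge-PMD branch of hwpoison_pte_range() can be checked in isolation with
a small userspace sketch. It assumes 4 KiB base pages and 512 base pages
per huge page (the usual x86-64 values). TOY_PAGE_SHIFT,
TOY_HPAGE_PMD_NR and toy_hwpoison_vaddr() are hypothetical names, and
the return value 0 is used here to mean "pfn not covered by this
mapping".

```c
#include <assert.h>

#define TOY_PAGE_SHIFT   12UL	/* assumption: 4 KiB base pages */
#define TOY_HPAGE_PMD_NR 512UL	/* assumption: 2 MiB huge page */

/*
 * addr:         virtual address where the huge mapping starts
 * head_pfn:     first pfn covered by the huge mapping
 * poisoned_pfn: pfn reported by the machine check
 *
 * Returns the virtual address of the poisoned base page, or 0 when
 * poisoned_pfn lies outside [head_pfn, head_pfn + TOY_HPAGE_PMD_NR).
 */
static unsigned long toy_hwpoison_vaddr(unsigned long addr,
					unsigned long head_pfn,
					unsigned long poisoned_pfn)
{
	/* The sketch expects a huge-page-aligned start address. */
	assert((addr & ((TOY_HPAGE_PMD_NR << TOY_PAGE_SHIFT) - 1)) == 0);

	if (poisoned_pfn < head_pfn ||
	    poisoned_pfn >= head_pfn + TOY_HPAGE_PMD_NR)
		return 0;

	/* Offset in base pages, scaled to bytes, added to the start. */
	return addr + ((poisoned_pfn - head_pfn) << TOY_PAGE_SHIFT);
}
```

For example, with a mapping that starts at 0x200000 and covers pfns
0x1000 through 0x11ff, pfn 0x1003 is three base pages in, so the
function yields 0x203000.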