From patchwork Thu Sep 23 03:28:26 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yang Shi
X-Patchwork-Id: 12511893
From: Yang Shi
To: naoya.horiguchi@nec.com, hughd@google.com,
 kirill.shutemov@linux.intel.com, willy@infradead.org, peterx@redhat.com,
 osalvador@suse.de, akpm@linux-foundation.org
Cc: shy828301@gmail.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org
Subject: [v2 PATCH 1/5] mm: filemap: check if THP has hwpoisoned subpage for
 PMD page fault
Date: Wed, 22 Sep 2021 20:28:26 -0700
Message-Id: <20210923032830.314328-2-shy828301@gmail.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20210923032830.314328-1-shy828301@gmail.com>
References: <20210923032830.314328-1-shy828301@gmail.com>

When handling a shmem page fault, a THP with a corrupted subpage could be
PMD mapped if certain conditions are satisfied.  But the kernel is supposed
to send SIGBUS when trying to map a hwpoisoned page.

There are two paths which may do a PMD map: fault around and regular fault.

Before commit f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault()
codepaths") the situation was even worse in the fault around path.  The THP
could be PMD mapped as long as the VMA fits, regardless of which subpage is
accessed and corrupted.
After this commit, the THP could be PMD mapped as long as the head page is
not corrupted.

In the regular fault path the THP could be PMD mapped as long as the
corrupted page is not accessed and the VMA fits.

This loophole could be fixed by iterating over every subpage to check whether
any of them is hwpoisoned, but that is somewhat costly in the page fault path.

So introduce a new page flag called HasHWPoisoned on the first tail page.  It
indicates the THP has hwpoisoned subpage(s).  It is set if any subpage of the
THP is found hwpoisoned by memory failure and cleared when the THP is freed or
split.

Cc:
Suggested-by: Kirill A. Shutemov
Signed-off-by: Yang Shi
---
 include/linux/page-flags.h | 19 +++++++++++++++++++
 mm/filemap.c               | 15 +++++++++------
 mm/huge_memory.c           |  2 ++
 mm/memory-failure.c        |  4 ++++
 mm/memory.c                |  9 +++++++++
 mm/page_alloc.c            |  4 +++-
 6 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index a558d67ee86f..a357b41b3057 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -171,6 +171,11 @@ enum pageflags {
 	/* Compound pages. Stored in first tail page's flags */
 	PG_double_map = PG_workingset,
 
+#ifdef CONFIG_MEMORY_FAILURE
+	/* Compound pages. Stored in first tail page's flags */
+	PG_has_hwpoisoned = PG_mappedtodisk,
+#endif
+
 	/* non-lru isolated movable page */
 	PG_isolated = PG_reclaim,
 
@@ -668,6 +673,20 @@ PAGEFLAG_FALSE(DoubleMap)
 	TESTSCFLAG_FALSE(DoubleMap)
 #endif
 
+#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
+/*
+ * PageHasHWPoisoned indicates that at least one subpage is hwpoisoned in the
+ * compound page.
+ *
+ * This flag is set by the hwpoison handler.  Cleared by THP split or free page.
+ */
+PAGEFLAG(HasHWPoisoned, has_hwpoisoned, PF_SECOND)
+	TESTSCFLAG(HasHWPoisoned, has_hwpoisoned, PF_SECOND)
+#else
+PAGEFLAG_FALSE(HasHWPoisoned)
+	TESTSCFLAG_FALSE(HasHWPoisoned)
+#endif
+
 /*
  * Check if a page is currently marked HWPoisoned. Note that this check is
  * best effort only and inherently racy: there is no way to synchronize with
diff --git a/mm/filemap.c b/mm/filemap.c
index dae481293b5d..740b7afe159a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3195,12 +3195,14 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page)
 	}
 
 	if (pmd_none(*vmf->pmd) && PageTransHuge(page)) {
-	    vm_fault_t ret = do_set_pmd(vmf, page);
-	    if (!ret) {
-		/* The page is mapped successfully, reference consumed. */
-		unlock_page(page);
-		return true;
-	    }
+		vm_fault_t ret = do_set_pmd(vmf, page);
+		if (ret == VM_FAULT_FALLBACK)
+			goto out;
+		if (!ret) {
+			/* The page is mapped successfully, reference consumed. */
+			unlock_page(page);
+			return true;
+		}
 	}
 
 	if (pmd_none(*vmf->pmd)) {
@@ -3220,6 +3222,7 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page)
 		return true;
 	}
 
+out:
 	return false;
 }
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5e9ef0fc261e..0574b1613714 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2426,6 +2426,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	/* lock lru list/PageCompound, ref frozen by page_ref_freeze */
 	lruvec = lock_page_lruvec(head);
 
+	ClearPageHasHWPoisoned(head);
+
 	for (i = nr - 1; i >= 1; i--) {
 		__split_huge_page_tail(head, i, lruvec, list);
 		/* Some pages can be beyond EOF: drop them from page cache */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 54879c339024..93ae0ce90ab8 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1663,6 +1663,10 @@ int memory_failure(unsigned long pfn, int flags)
 	}
 
 	orig_head = hpage = compound_head(p);
+
+	if (PageTransHuge(hpage))
+		SetPageHasHWPoisoned(orig_head);
+
 	num_poisoned_pages_inc();
 
 	/*
diff --git a/mm/memory.c b/mm/memory.c
index 25fc46e87214..738f4e1df81e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3905,6 +3905,15 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 	if (compound_order(page) != HPAGE_PMD_ORDER)
 		return ret;
 
+	/*
+	 * Just back off if any subpage of a THP is corrupted, otherwise
+	 * the corrupted page may be mapped by PMD silently to escape the
+	 * check.  This kind of THP can only be PTE mapped.  Access to
+	 * the corrupted subpage should trigger SIGBUS as expected.
+	 */
+	if (unlikely(PageHasHWPoisoned(page)))
+		return ret;
+
 	/*
 	 * Archs like ppc64 need additional space to store information
 	 * related to pte entry. Use the preallocated table for that.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b37435c274cf..7f37652f0287 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1312,8 +1312,10 @@ static __always_inline bool free_pages_prepare(struct page *page,
 
 		VM_BUG_ON_PAGE(compound && compound_order(page) != order, page);
 
-		if (compound)
+		if (compound) {
 			ClearPageDoubleMap(page);
+			ClearPageHasHWPoisoned(page);
+		}
 		for (i = 1; i < (1 << order); i++) {
 			if (compound)
 				bad += free_tail_pages_check(page, page + i);
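
Illustration only, not part of the patch: the flag's life cycle described in
the changelog can be modelled in plain userspace C.  The sketch below is a toy
under stated assumptions and does not use any kernel API.  Every name with a
toy_ prefix is hypothetical, and the subpage count of 512 assumes a 2 MiB THP
built from 4 KiB base pages.  It contrasts the O(n) scan that the changelog
calls costly with the O(1) test of a single flag kept on the first tail page,
and shows the PMD map attempt falling back once any subpage is poisoned.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 2 MiB THP made of 4 KiB base pages. */
#define TOY_SUBPAGES 512

/* Toy stand-in for struct page; only the two bits we care about. */
struct toy_page {
	bool hwpoison;		/* models the per-subpage hwpoison bit */
	bool has_hwpoisoned;	/* models PG_has_hwpoisoned; used on page[1] only */
};

struct toy_thp {
	struct toy_page page[TOY_SUBPAGES];	/* page[0] is the head page */
};

enum toy_map_result { TOY_MAP_PMD, TOY_MAP_FALLBACK };

/* The costly alternative: walk every subpage on each fault. */
static bool toy_scan_for_poison(const struct toy_thp *thp)
{
	for (int i = 0; i < TOY_SUBPAGES; i++)
		if (thp->page[i].hwpoison)
			return true;
	return false;
}

/* Models the memory failure handler: poison one subpage, flag the THP. */
static void toy_memory_failure(struct toy_thp *thp, int idx)
{
	thp->page[idx].hwpoison = true;
	thp->page[1].has_hwpoisoned = true;	/* first tail page holds the flag */
}

/* O(1) test used on the fault path instead of the scan. */
static bool toy_has_hwpoisoned(const struct toy_thp *thp)
{
	return thp->page[1].has_hwpoisoned;
}

/* Models the PMD map attempt: back off to PTE mapping if the flag is set. */
static enum toy_map_result toy_do_set_pmd(const struct toy_thp *thp)
{
	if (toy_has_hwpoisoned(thp))
		return TOY_MAP_FALLBACK;
	return TOY_MAP_PMD;
}

/* Models the clear done when the THP is split or freed. */
static void toy_clear_on_split_or_free(struct toy_thp *thp)
{
	thp->page[1].has_hwpoisoned = false;
}

/* Walks the life cycle once; returns 0 if every step behaves as described. */
int toy_demo(void)
{
	static struct toy_thp thp;	/* zero-initialised: clean, flag clear */

	assert(toy_do_set_pmd(&thp) == TOY_MAP_PMD);

	toy_memory_failure(&thp, 37);	/* any subpage, head or tail */
	assert(toy_has_hwpoisoned(&thp) == toy_scan_for_poison(&thp));
	assert(toy_do_set_pmd(&thp) == TOY_MAP_FALLBACK);

	toy_clear_on_split_or_free(&thp);
	assert(!toy_has_hwpoisoned(&thp));
	return 0;
}
```

In the toy, the per-subpage bit deliberately survives the clear: only the
summary flag on the first tail page is reset, which mirrors the changelog's
statement that the flag is cleared when the THP is freed or split.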