From patchwork Fri Apr 9 20:27:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12195023 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E4DF7C43460 for ; Fri, 9 Apr 2021 20:27:22 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 949E161108 for ; Fri, 9 Apr 2021 20:27:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 949E161108 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 361A16B0075; Fri, 9 Apr 2021 16:27:22 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 337996B0078; Fri, 9 Apr 2021 16:27:22 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 201CE6B007B; Fri, 9 Apr 2021 16:27:22 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0247.hostedemail.com [216.40.44.247]) by kanga.kvack.org (Postfix) with ESMTP id 029FD6B0075 for ; Fri, 9 Apr 2021 16:27:21 -0400 (EDT) Received: from smtpin39.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id B81A3AF70 for ; Fri, 9 Apr 2021 20:27:21 +0000 (UTC) X-FDA: 78013963482.39.5D52C1D Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf21.hostedemail.com (Postfix) with ESMTP id 1FD33E000115 for ; Fri, 9 Apr 2021 20:27:19 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPSA id 3CED161007; Fri, 9 Apr 2021 20:27:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1618000040; bh=SakCH8C8OtsgETv5kZZnG3bXMi7xsKfTvcCZqYP0FHM=; h=Date:From:To:Subject:In-Reply-To:From; b=vkYkBYll71ckwxgrQnMrqZxPkwqiidOhWNO6wK51c1bY3HNNKYdJqAHL+EHOHWPQr wMTYgM0I4d4JNLVlRZMJxymajkkIxFaO9yVRl/ihq6oYIuP2PBlrCaiUnmI6bH0foA KTOJF8TqfJbgjRMN+nGh8VcrSl3nhLswr1m5o1Ck= Date: Fri, 09 Apr 2021 13:27:19 -0700 From: Andrew Morton To: akpm@linux-foundation.org, david@redhat.com, linux-mm@kvack.org, mike.kravetz@oracle.com, mm-commits@vger.kernel.org, naoya.horiguchi@nec.com, osalvador@suse.de, torvalds@linux-foundation.org, willy@infradead.org, yaoaili@kingsoft.com Subject: [patch 07/16] mm/gup: check page posion status for coredump. Message-ID: <20210409202719.tI0cmqT6l%akpm@linux-foundation.org> In-Reply-To: <20210409132633.6855fc8fea1b3905ea1bb4be@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 1FD33E000115 X-Stat-Signature: jfp11f49gwgy9gwatx1fg7mja79x5fu6 Received-SPF: none (linux-foundation.org>: No applicable sender policy available) receiver=imf21; identity=mailfrom; envelope-from=""; helo=mail.kernel.org; client-ip=198.145.29.99 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1618000039-29104 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Aili Yao Subject: mm/gup: check page posion status for coredump. When we do coredump for user process signal, this may be an SIGBUS signal with BUS_MCEERR_AR or BUS_MCEERR_AO code, which means this signal is resulted from ECC memory fail like SRAR or SRAO, we expect the memory recovery work is finished correctly, then the get_dump_page() will not return the error page as its process pte is set invalid by memory_failure(). But memory_failure() may fail, and the process's related pte may not be correctly set invalid, for current code, we will return the poison page, get it dumped, and then lead to system panic as its in kernel code. So check the poison status in get_dump_page(), and if TRUE, return NULL. There maybe other scenario that is also better to check the posion status and not to panic, so make a wrapper for this check, Thanks to David's suggestion(). [akpm@linux-foundation.org: s/0/false/] [yaoaili@kingsoft.com: is_page_poisoned() arg cannot be null, per Matthew] Link: https://lkml.kernel.org/r/20210319104437.6f30e80d@alex-virtual-machine Link: https://lkml.kernel.org/r/20210322115233.05e4e82a@alex-virtual-machine Link: https://lkml.kernel.org/r/20210319104437.6f30e80d@alex-virtual-machine Signed-off-by: Aili Yao Cc: David Hildenbrand Cc: Matthew Wilcox Cc: Naoya Horiguchi Cc: Oscar Salvador Cc: Mike Kravetz Cc: Aili Yao Signed-off-by: Andrew Morton --- mm/gup.c | 4 ++++ mm/internal.h | 20 ++++++++++++++++++++ 2 files changed, 24 insertions(+) --- a/mm/gup.c~mm-gup-check-page-posion-status-for-coredump +++ a/mm/gup.c @@ -1535,6 +1535,10 @@ struct page *get_dump_page(unsigned long FOLL_FORCE | FOLL_DUMP | FOLL_GET); if (locked) mmap_read_unlock(mm); + + if (ret == 1 && is_page_poisoned(page)) + return NULL; + return (ret == 1) ? page : NULL; } #endif /* CONFIG_ELF_CORE */ --- a/mm/internal.h~mm-gup-check-page-posion-status-for-coredump +++ a/mm/internal.h @@ -97,6 +97,26 @@ static inline void set_page_refcounted(s set_page_count(page, 1); } +/* + * When kernel touch the user page, the user page may be have been marked + * poison but still mapped in user space, if without this page, the kernel + * can guarantee the data integrity and operation success, the kernel is + * better to check the posion status and avoid touching it, be good not to + * panic, coredump for process fatal signal is a sample case matching this + * scenario. Or if kernel can't guarantee the data integrity, it's better + * not to call this function, let kernel touch the poison page and get to + * panic. + */ +static inline bool is_page_poisoned(struct page *page) +{ + if (PageHWPoison(page)) + return true; + else if (PageHuge(page) && PageHWPoison(compound_head(page))) + return true; + + return false; +} + extern unsigned long highest_memmap_pfn; /*