From patchwork Thu Jan 14 00:43:16 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12018037 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C0F45C433E0 for ; Thu, 14 Jan 2021 00:43:23 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 36808233EA for ; Thu, 14 Jan 2021 00:43:23 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 36808233EA Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id B368E8D00A8; Wed, 13 Jan 2021 19:43:22 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id AE53F8D008E; Wed, 13 Jan 2021 19:43:22 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9D4C78D00A8; Wed, 13 Jan 2021 19:43:22 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0086.hostedemail.com [216.40.44.86]) by kanga.kvack.org (Postfix) with ESMTP id 805C08D008E for ; Wed, 13 Jan 2021 19:43:22 -0500 (EST) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 4D0BE180AD81A for ; Thu, 14 Jan 2021 00:43:22 +0000 (UTC) X-FDA: 77702531844.14.feet60_4307e0227522 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin14.hostedemail.com (Postfix) with ESMTP id 313601822987B for ; Thu, 14 Jan 2021 00:43:22 +0000 (UTC) X-HE-Tag: feet60_4307e0227522 X-Filterd-Recvd-Size: 4281 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by imf43.hostedemail.com (Postfix) with ESMTP for ; Thu, 14 Jan 2021 00:43:19 +0000 (UTC) IronPort-SDR: vhUHb66b8n5yC+2Lrk78WiAdU5Kkja6gVLN+anr4rsoEVE1k7M6Qy5XRxtE7793eeRi4jJ1+vJ YK5QM1Z2USWg== X-IronPort-AV: E=McAfee;i="6000,8403,9863"; a="239831028" X-IronPort-AV: E=Sophos;i="5.79,345,1602572400"; d="scan'208";a="239831028" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Jan 2021 16:43:16 -0800 IronPort-SDR: 115jzHosqSjA608xwh/hG5WnzVebIk5BCHEyB+oFTxzD9/L652pWRDfYgK/3swoa2h56CJIW3w Swad5KOTXWEg== X-IronPort-AV: E=Sophos;i="5.79,345,1602572400"; d="scan'208";a="364059362" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.25]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Jan 2021 16:43:16 -0800 Subject: [PATCH v4 1/5] mm: Move pfn_to_online_page() out of line From: Dan Williams To: akpm@linux-foundation.org Cc: Michal Hocko , David Hildenbrand , Oscar Salvador , linux-mm@kvack.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org Date: Wed, 13 Jan 2021 16:43:16 -0800 Message-ID: <161058499608.1840162.10165648147615238793.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <161058499000.1840162.702316708443239771.stgit@dwillia2-desk3.amr.corp.intel.com> References: <161058499000.1840162.702316708443239771.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: pfn_to_online_page() is already too large to be a macro or an inline function. In anticipation of further logic changes / growth, move it out of line. No functional change, just code movement. Reported-by: Michal Hocko Reviewed-by: David Hildenbrand Reviewed-by: Oscar Salvador Signed-off-by: Dan Williams Acked-by: Michal Hocko --- include/linux/memory_hotplug.h | 17 +---------------- mm/memory_hotplug.c | 16 ++++++++++++++++ 2 files changed, 17 insertions(+), 16 deletions(-) diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 15acce5ab106..3d99de0db2dd 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -16,22 +16,7 @@ struct resource; struct vmem_altmap; #ifdef CONFIG_MEMORY_HOTPLUG -/* - * Return page for the valid pfn only if the page is online. All pfn - * walkers which rely on the fully initialized page->flags and others - * should use this rather than pfn_valid && pfn_to_page - */ -#define pfn_to_online_page(pfn) \ -({ \ - struct page *___page = NULL; \ - unsigned long ___pfn = pfn; \ - unsigned long ___nr = pfn_to_section_nr(___pfn); \ - \ - if (___nr < NR_MEM_SECTIONS && online_section_nr(___nr) && \ - pfn_valid_within(___pfn)) \ - ___page = pfn_to_page(___pfn); \ - ___page; \ -}) +struct page *pfn_to_online_page(unsigned long pfn); /* * Types for free bootmem stored in page->lru.next. These have to be in diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index f9d57b9be8c7..55a69d4396e7 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -300,6 +300,22 @@ static int check_hotplug_memory_addressable(unsigned long pfn, return 0; } +/* + * Return page for the valid pfn only if the page is online. All pfn + * walkers which rely on the fully initialized page->flags and others + * should use this rather than pfn_valid && pfn_to_page + */ +struct page *pfn_to_online_page(unsigned long pfn) +{ + unsigned long nr = pfn_to_section_nr(pfn); + + if (nr < NR_MEM_SECTIONS && online_section_nr(nr) && + pfn_valid_within(pfn)) + return pfn_to_page(pfn); + return NULL; +} +EXPORT_SYMBOL_GPL(pfn_to_online_page); + /* * Reasonably generic function for adding memory. It is * expected that archs that support memory hotplug will From patchwork Thu Jan 14 00:43:21 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12018039 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EB349C433DB for ; Thu, 14 Jan 2021 00:43:26 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 83DFE233F6 for ; Thu, 14 Jan 2021 00:43:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 83DFE233F6 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 149AC8D00A9; Wed, 13 Jan 2021 19:43:26 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0D32C8D008E; Wed, 13 Jan 2021 19:43:26 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id F03A28D00A9; Wed, 13 Jan 2021 19:43:25 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0156.hostedemail.com [216.40.44.156]) by kanga.kvack.org (Postfix) with ESMTP id D5D3E8D008E for ; Wed, 13 Jan 2021 19:43:25 -0500 (EST) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id A100E1F0A for ; Thu, 14 Jan 2021 00:43:25 +0000 (UTC) X-FDA: 77702531970.03.boot46_5113d3727522 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin03.hostedemail.com (Postfix) with ESMTP id 7EA5028A4E8 for ; Thu, 14 Jan 2021 00:43:25 +0000 (UTC) X-HE-Tag: boot46_5113d3727522 X-Filterd-Recvd-Size: 3572 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by imf09.hostedemail.com (Postfix) with ESMTP for ; Thu, 14 Jan 2021 00:43:24 +0000 (UTC) IronPort-SDR: rxfuplzpajBQ66YYU1rZlzy4Y8PNLI0aAszH6bcgMkZNNOLbUKS3JYf5+Hu54N14ot7M/WVGiD FVb/o9WCShKQ== X-IronPort-AV: E=McAfee;i="6000,8403,9863"; a="165377730" X-IronPort-AV: E=Sophos;i="5.79,345,1602572400"; d="scan'208";a="165377730" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Jan 2021 16:43:21 -0800 IronPort-SDR: 9Fkr+vtQiYNC6KHDg9P1BPiU9cebLEwQpH+qEGsboHEOeqGHMwn1rVIknyF06J25kU7XVFeJZt t66Bu++NFWuw== X-IronPort-AV: E=Sophos;i="5.79,345,1602572400"; d="scan'208";a="348995326" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.25]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Jan 2021 16:43:21 -0800 Subject: [PATCH v4 2/5] mm: Teach pfn_to_online_page() to consider subsection validity From: Dan Williams To: akpm@linux-foundation.org Cc: Qian Cai , Michal Hocko , Oscar Salvador , David Hildenbrand , David Hildenbrand , Oscar Salvador , linux-mm@kvack.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org Date: Wed, 13 Jan 2021 16:43:21 -0800 Message-ID: <161058500148.1840162.4365921007820501696.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <161058499000.1840162.702316708443239771.stgit@dwillia2-desk3.amr.corp.intel.com> References: <161058499000.1840162.702316708443239771.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: pfn_section_valid() determines pfn validity on subsection granularity where pfn_valid() may be limited to coarse section granularity. Explicitly validate subsections after pfn_valid() succeeds. Fixes: b13bc35193d9 ("mm/hotplug: invalid PFNs from pfn_to_online_page()") Cc: Qian Cai Cc: Michal Hocko Cc: Oscar Salvador Reported-by: David Hildenbrand Reviewed-by: David Hildenbrand Reviewed-by: Oscar Salvador Signed-off-by: Dan Williams Acked-by: Michal Hocko --- mm/memory_hotplug.c | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 55a69d4396e7..d0c81f7a3347 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -308,11 +308,26 @@ static int check_hotplug_memory_addressable(unsigned long pfn, struct page *pfn_to_online_page(unsigned long pfn) { unsigned long nr = pfn_to_section_nr(pfn); + struct mem_section *ms; + + if (nr >= NR_MEM_SECTIONS) + return NULL; + + ms = __nr_to_section(nr); + if (!online_section(ms)) + return NULL; + + /* + * Save some code text when online_section() + + * pfn_section_valid() are sufficient. + */ + if (IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID) && !pfn_valid(pfn)) + return NULL; + + if (!pfn_section_valid(ms, pfn)) + return NULL; - if (nr < NR_MEM_SECTIONS && online_section_nr(nr) && - pfn_valid_within(pfn)) - return pfn_to_page(pfn); - return NULL; + return pfn_to_page(pfn); } EXPORT_SYMBOL_GPL(pfn_to_online_page); From patchwork Thu Jan 14 00:43:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12018041 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4E283C433E6 for ; Thu, 14 Jan 2021 00:43:31 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id CD03A233F6 for ; Thu, 14 Jan 2021 00:43:30 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CD03A233F6 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 604698D00AA; Wed, 13 Jan 2021 19:43:30 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 58D258D008E; Wed, 13 Jan 2021 19:43:30 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 47BFA8D00AA; Wed, 13 Jan 2021 19:43:30 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0148.hostedemail.com [216.40.44.148]) by kanga.kvack.org (Postfix) with ESMTP id 2C2AA8D008E for ; Wed, 13 Jan 2021 19:43:30 -0500 (EST) Received: from smtpin01.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id EE145181AC9CB for ; Thu, 14 Jan 2021 00:43:29 +0000 (UTC) X-FDA: 77702532138.01.north98_3b0f53a27522 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin01.hostedemail.com (Postfix) with ESMTP id C8CAE10045EF3 for ; Thu, 14 Jan 2021 00:43:29 +0000 (UTC) X-HE-Tag: north98_3b0f53a27522 X-Filterd-Recvd-Size: 8408 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by imf44.hostedemail.com (Postfix) with ESMTP for ; Thu, 14 Jan 2021 00:43:29 +0000 (UTC) IronPort-SDR: Vg4dB9KangToFOOTG4GmSslAhqNQWhOSjr+NmX0L7fwg2+Vv7AE/2qQrCcXQ+tAKxeRN73jgBs GPD3IUos+hvA== X-IronPort-AV: E=McAfee;i="6000,8403,9863"; a="158064388" X-IronPort-AV: E=Sophos;i="5.79,345,1602572400"; d="scan'208";a="158064388" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Jan 2021 16:43:27 -0800 IronPort-SDR: 6B/1pjhIgbnqO6Bgy3EsWlDuY2b4N6/woJ2GKIlQqTCIBxfslsAHwCwYpy+CfsPJwL91i82Dvw MxBwIJHDBg6Q== X-IronPort-AV: E=Sophos;i="5.79,345,1602572400"; d="scan'208";a="499434942" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.25]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Jan 2021 16:43:27 -0800 Subject: [PATCH v4 3/5] mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions From: Dan Williams To: akpm@linux-foundation.org Cc: Michal Hocko , David Hildenbrand , David Hildenbrand , Oscar Salvador , linux-mm@kvack.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org Date: Wed, 13 Jan 2021 16:43:26 -0800 Message-ID: <161058500675.1840162.7887862152161279354.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <161058499000.1840162.702316708443239771.stgit@dwillia2-desk3.amr.corp.intel.com> References: <161058499000.1840162.702316708443239771.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: While pfn_to_online_page() is able to determine pfn_valid() at subsection granularity it is not able to reliably determine if a given pfn is also online if the section is mixes ZONE_{NORMAL,MOVABLE} with ZONE_DEVICE. This means that pfn_to_online_page() may return invalid @page objects. For example with a memory map like: 100000000-1fbffffff : System RAM 142000000-143002e16 : Kernel code 143200000-143713fff : Kernel rodata 143800000-143b15b7f : Kernel data 144227000-144ffffff : Kernel bss 1fc000000-2fbffffff : Persistent Memory (legacy) 1fc000000-2fbffffff : namespace0.0 This command: echo 0x1fc000000 > /sys/devices/system/memory/soft_offline_page ...succeeds when it should fail. When it succeeds it touches an uninitialized page and may crash or cause other damage (see dissolve_free_huge_page()). While the memory map above is contrived via the memmap=ss!nn kernel command line option, the collision happens in practice on shipping platforms. The memory controller resources that decode spans of physical address space are a limited resource. One technique platform-firmware uses to conserve those resources is to share a decoder across 2 devices to keep the address range contiguous. Unfortunately the unit of operation of a decoder is 64MiB while the Linux section size is 128MiB. This results in situations where, without subsection hotplug memory mappings with different lifetimes collide into one object that can only express one lifetime. Update move_pfn_range_to_zone() to flag (SECTION_TAINT_ZONE_DEVICE) a section that mixes ZONE_DEVICE pfns with other online pfns. With SECTION_TAINT_ZONE_DEVICE to delineate, pfn_to_online_page() can fall back to a slow-path check for ZONE_DEVICE pfns in an online section. In the fast path online_section() for a full ZONE_DEVICE section returns false. Because the collision case is rare, and for simplicity, the SECTION_TAINT_ZONE_DEVICE flag is never cleared once set. Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug") Cc: Andrew Morton Reported-by: Michal Hocko Reported-by: David Hildenbrand Reviewed-by: David Hildenbrand Reviewed-by: Oscar Salvador Signed-off-by: Dan Williams Acked-by: Michal Hocko --- include/linux/mmzone.h | 22 +++++++++++++++------- mm/memory_hotplug.c | 38 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 53 insertions(+), 7 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index b593316bff3d..0b5c44f730b4 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1273,13 +1273,14 @@ extern size_t mem_section_usage_size(void); * which results in PFN_SECTION_SHIFT equal 6. * To sum it up, at least 6 bits are available. */ -#define SECTION_MARKED_PRESENT (1UL<<0) -#define SECTION_HAS_MEM_MAP (1UL<<1) -#define SECTION_IS_ONLINE (1UL<<2) -#define SECTION_IS_EARLY (1UL<<3) -#define SECTION_MAP_LAST_BIT (1UL<<4) -#define SECTION_MAP_MASK (~(SECTION_MAP_LAST_BIT-1)) -#define SECTION_NID_SHIFT 3 +#define SECTION_MARKED_PRESENT (1UL<<0) +#define SECTION_HAS_MEM_MAP (1UL<<1) +#define SECTION_IS_ONLINE (1UL<<2) +#define SECTION_IS_EARLY (1UL<<3) +#define SECTION_TAINT_ZONE_DEVICE (1UL<<4) +#define SECTION_MAP_LAST_BIT (1UL<<5) +#define SECTION_MAP_MASK (~(SECTION_MAP_LAST_BIT-1)) +#define SECTION_NID_SHIFT 3 static inline struct page *__section_mem_map_addr(struct mem_section *section) { @@ -1318,6 +1319,13 @@ static inline int online_section(struct mem_section *section) return (section && (section->section_mem_map & SECTION_IS_ONLINE)); } +static inline int online_device_section(struct mem_section *section) +{ + unsigned long flags = SECTION_IS_ONLINE | SECTION_TAINT_ZONE_DEVICE; + + return section && ((section->section_mem_map & flags) == flags); +} + static inline int online_section_nr(unsigned long nr) { return online_section(__nr_to_section(nr)); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index d0c81f7a3347..c78a1bef561b 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -308,6 +308,7 @@ static int check_hotplug_memory_addressable(unsigned long pfn, struct page *pfn_to_online_page(unsigned long pfn) { unsigned long nr = pfn_to_section_nr(pfn); + struct dev_pagemap *pgmap; struct mem_section *ms; if (nr >= NR_MEM_SECTIONS) @@ -327,6 +328,22 @@ struct page *pfn_to_online_page(unsigned long pfn) if (!pfn_section_valid(ms, pfn)) return NULL; + if (!online_device_section(ms)) + return pfn_to_page(pfn); + + /* + * Slowpath: when ZONE_DEVICE collides with + * ZONE_{NORMAL,MOVABLE} within the same section some pfns in + * the section may be 'offline' but 'valid'. Only + * get_dev_pagemap() can determine sub-section online status. + */ + pgmap = get_dev_pagemap(pfn, NULL); + put_dev_pagemap(pgmap); + + /* The presence of a pgmap indicates ZONE_DEVICE offline pfn */ + if (pgmap) + return NULL; + return pfn_to_page(pfn); } EXPORT_SYMBOL_GPL(pfn_to_online_page); @@ -709,6 +726,14 @@ static void __meminit resize_pgdat_range(struct pglist_data *pgdat, unsigned lon pgdat->node_spanned_pages = max(start_pfn + nr_pages, old_end_pfn) - pgdat->node_start_pfn; } + +static void section_taint_zone_device(unsigned long pfn) +{ + struct mem_section *ms = __pfn_to_section(pfn); + + ms->section_mem_map |= SECTION_TAINT_ZONE_DEVICE; +} + /* * Associate the pfn range with the given zone, initializing the memmaps * and resizing the pgdat/zone data to span the added pages. After this @@ -738,6 +763,19 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, resize_pgdat_range(pgdat, start_pfn, nr_pages); pgdat_resize_unlock(pgdat, &flags); + /* + * Subsection population requires care in pfn_to_online_page(). + * Set the taint to enable the slow path detection of + * ZONE_DEVICE pages in an otherwise ZONE_{NORMAL,MOVABLE} + * section. + */ + if (zone_idx(zone) == ZONE_DEVICE) { + if (!IS_ALIGNED(start_pfn, PAGES_PER_SECTION)) + section_taint_zone_device(start_pfn); + if (!IS_ALIGNED(start_pfn + nr_pages, PAGES_PER_SECTION)) + section_taint_zone_device(start_pfn + nr_pages); + } + /* * TODO now we have a visible range of pages which are not associated * with their zone properly. Not nice but set_pfnblock_flags_mask From patchwork Thu Jan 14 00:43:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12018043 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 48611C433E6 for ; Thu, 14 Jan 2021 00:43:36 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id F1930233F6 for ; Thu, 14 Jan 2021 00:43:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org F1930233F6 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 939AF8D00AB; Wed, 13 Jan 2021 19:43:35 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 8C1108D008E; Wed, 13 Jan 2021 19:43:35 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7B1338D00AB; Wed, 13 Jan 2021 19:43:35 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0106.hostedemail.com [216.40.44.106]) by kanga.kvack.org (Postfix) with ESMTP id 613788D008E for ; Wed, 13 Jan 2021 19:43:35 -0500 (EST) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 2CE8C180AD81A for ; Thu, 14 Jan 2021 00:43:35 +0000 (UTC) X-FDA: 77702532390.11.card22_430dfe727522 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin11.hostedemail.com (Postfix) with ESMTP id 0F432180F8B81 for ; Thu, 14 Jan 2021 00:43:35 +0000 (UTC) X-HE-Tag: card22_430dfe727522 X-Filterd-Recvd-Size: 4062 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by imf12.hostedemail.com (Postfix) with ESMTP for ; Thu, 14 Jan 2021 00:43:34 +0000 (UTC) IronPort-SDR: hJsI4s0xamMqIF/frKrmOwabFDwKoXkqSnGrii6Pif2cUTzSBbQLX1h7DPId1i98l7SqKXE1Pu tnSc8I/90SCw== X-IronPort-AV: E=McAfee;i="6000,8403,9863"; a="174781372" X-IronPort-AV: E=Sophos;i="5.79,345,1602572400"; d="scan'208";a="174781372" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Jan 2021 16:43:32 -0800 IronPort-SDR: JKUz1UfliHzulnNuEcOPcLkaxSEvNZWPjdOW53CuaeLmvtVpWrng+EQUTdyhmgbeO7NW7Ev8+G fR/oy8U7L/rA== X-IronPort-AV: E=Sophos;i="5.79,345,1602572400"; d="scan'208";a="405000358" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.25]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Jan 2021 16:43:32 -0800 Subject: [PATCH v4 4/5] mm: Fix page reference leak in soft_offline_page() From: Dan Williams To: akpm@linux-foundation.org Cc: Naoya Horiguchi , Michal Hocko , David Hildenbrand , Oscar Salvador , stable@vger.kernel.org, linux-mm@kvack.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org Date: Wed, 13 Jan 2021 16:43:32 -0800 Message-ID: <161058501210.1840162.8108917599181157327.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <161058499000.1840162.702316708443239771.stgit@dwillia2-desk3.amr.corp.intel.com> References: <161058499000.1840162.702316708443239771.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The conversion to move pfn_to_online_page() internal to soft_offline_page() missed that the get_user_pages() reference taken by the madvise() path needs to be dropped when pfn_to_online_page() fails. Note the direct sysfs-path to soft_offline_page() does not perform a get_user_pages() lookup. When soft_offline_page() is handed a pfn_valid() && !pfn_to_online_page() pfn the kernel hangs at dax-device shutdown due to a leaked reference. Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn") Cc: Andrew Morton Cc: Naoya Horiguchi Cc: Michal Hocko Reviewed-by: David Hildenbrand Reviewed-by: Oscar Salvador Cc: Signed-off-by: Dan Williams Reviewed-by: Naoya Horiguchi --- mm/memory-failure.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 5a38e9eade94..78b173c7190c 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1885,6 +1885,12 @@ static int soft_offline_free_page(struct page *page) return rc; } +static void put_ref_page(struct page *page) +{ + if (page) + put_page(page); +} + /** * soft_offline_page - Soft offline a page. * @pfn: pfn to soft-offline @@ -1910,20 +1916,26 @@ static int soft_offline_free_page(struct page *page) int soft_offline_page(unsigned long pfn, int flags) { int ret; - struct page *page; bool try_again = true; + struct page *page, *ref_page = NULL; + + WARN_ON_ONCE(!pfn_valid(pfn) && (flags & MF_COUNT_INCREASED)); if (!pfn_valid(pfn)) return -ENXIO; + if (flags & MF_COUNT_INCREASED) + ref_page = pfn_to_page(pfn); + /* Only online pages can be soft-offlined (esp., not ZONE_DEVICE). */ page = pfn_to_online_page(pfn); - if (!page) + if (!page) { + put_ref_page(ref_page); return -EIO; + } if (PageHWPoison(page)) { pr_info("%s: %#lx page already poisoned\n", __func__, pfn); - if (flags & MF_COUNT_INCREASED) - put_page(page); + put_ref_page(ref_page); return 0; } From patchwork Thu Jan 14 00:43:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12018045 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3D49DC433E0 for ; Thu, 14 Jan 2021 00:43:42 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id ED1EE23406 for ; Thu, 14 Jan 2021 00:43:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org ED1EE23406 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 8C3108D00AC; Wed, 13 Jan 2021 19:43:41 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 84B958D008E; Wed, 13 Jan 2021 19:43:41 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 73AFE8D00AC; Wed, 13 Jan 2021 19:43:41 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0065.hostedemail.com [216.40.44.65]) by kanga.kvack.org (Postfix) with ESMTP id 5A9348D008E for ; Wed, 13 Jan 2021 19:43:41 -0500 (EST) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 232A7824556B for ; Thu, 14 Jan 2021 00:43:41 +0000 (UTC) X-FDA: 77702532642.11.cough74_520cdaf27522 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin11.hostedemail.com (Postfix) with ESMTP id 04434180F8B81 for ; Thu, 14 Jan 2021 00:43:40 +0000 (UTC) X-HE-Tag: cough74_520cdaf27522 X-Filterd-Recvd-Size: 5068 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by imf37.hostedemail.com (Postfix) with ESMTP for ; Thu, 14 Jan 2021 00:43:39 +0000 (UTC) IronPort-SDR: 0hzx7k+nQMFaS9zj4Yplm2pzRpcd/Q0W5ZdItvLi8M0wV24GNMEQGcKc4IHYCI3glx2ayMmj/r Qj/V6I5pE2bQ== X-IronPort-AV: E=McAfee;i="6000,8403,9863"; a="165963418" X-IronPort-AV: E=Sophos;i="5.79,345,1602572400"; d="scan'208";a="165963418" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Jan 2021 16:43:38 -0800 IronPort-SDR: 5HKoXBKoyGb3n9NZ5CSoSgpslTMJGeu3Q/HP1qdDNEM0W8PUXDj/VZ5I9mv6uURdMVoGUcknp1 WxWFxf/T5a8g== X-IronPort-AV: E=Sophos;i="5.79,345,1602572400"; d="scan'208";a="568019431" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.25]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Jan 2021 16:43:38 -0800 Subject: [PATCH v4 5/5] mm: Fix memory_failure() handling of dax-namespace metadata From: Dan Williams To: akpm@linux-foundation.org Cc: Naoya Horiguchi , David Hildenbrand , David Hildenbrand , linux-mm@kvack.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org Date: Wed, 13 Jan 2021 16:43:37 -0800 Message-ID: <161058501758.1840162.4239831989762604527.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <161058499000.1840162.702316708443239771.stgit@dwillia2-desk3.amr.corp.intel.com> References: <161058499000.1840162.702316708443239771.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Given 'struct dev_pagemap' spans both data pages and metadata pages be careful to consult the altmap if present to delineate metadata. In fact the pfn_first() helper already identifies the first valid data pfn, so export that helper for other code paths via pgmap_pfn_valid(). Other usage of get_dev_pagemap() are not a concern because those are operating on known data pfns having been looked up by get_user_pages(). I.e. metadata pfns are never user mapped. Fixes: 6100e34b2526 ("mm, memory_failure: Teach memory_failure() about dev_pagemap pages") Cc: Naoya Horiguchi Cc: Andrew Morton Reported-by: David Hildenbrand Reviewed-by: David Hildenbrand Signed-off-by: Dan Williams Reviewed-by: Naoya Horiguchi --- include/linux/memremap.h | 6 ++++++ mm/memory-failure.c | 6 ++++++ mm/memremap.c | 15 +++++++++++++++ 3 files changed, 27 insertions(+) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 79c49e7f5c30..f5b464daeeca 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -137,6 +137,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); void devm_memunmap_pages(struct device *dev, struct dev_pagemap *pgmap); struct dev_pagemap *get_dev_pagemap(unsigned long pfn, struct dev_pagemap *pgmap); +bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn); unsigned long vmem_altmap_offset(struct vmem_altmap *altmap); void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns); @@ -165,6 +166,11 @@ static inline struct dev_pagemap *get_dev_pagemap(unsigned long pfn, return NULL; } +static inline bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn) +{ + return false; +} + static inline unsigned long vmem_altmap_offset(struct vmem_altmap *altmap) { return 0; diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 78b173c7190c..541569cb4a99 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1308,6 +1308,12 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags, */ put_page(page); + /* device metadata space is not recoverable */ + if (!pgmap_pfn_valid(pgmap, pfn)) { + rc = -ENXIO; + goto out; + } + /* * Prevent the inode from being freed while we are interrogating * the address_space, typically this would be handled by diff --git a/mm/memremap.c b/mm/memremap.c index 16b2fb482da1..2455bac89506 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -80,6 +80,21 @@ static unsigned long pfn_first(struct dev_pagemap *pgmap, int range_id) return pfn + vmem_altmap_offset(pgmap_altmap(pgmap)); } +bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn) +{ + int i; + + for (i = 0; i < pgmap->nr_range; i++) { + struct range *range = &pgmap->ranges[i]; + + if (pfn >= PHYS_PFN(range->start) && + pfn <= PHYS_PFN(range->end)) + return pfn >= pfn_first(pgmap, i); + } + + return false; +} + static unsigned long pfn_end(struct dev_pagemap *pgmap, int range_id) { const struct range *range = &pgmap->ranges[range_id];