From patchwork Mon Dec 9 17:48:34 2019
From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, stable@vger.kernel.org,
 Naoya Horiguchi, Pavel Tatashin, Andrew Morton, Steven Sistare,
 Michal Hocko, Daniel Jordan, Bob Picco, Oscar Salvador
Subject: [PATCH v1 1/3] mm: fix uninitialized memmaps on a partially populated last section
Date: Mon, 9 Dec 2019 18:48:34 +0100
Message-Id: <20191209174836.11063-2-david@redhat.com>
In-Reply-To: <20191209174836.11063-1-david@redhat.com>
References: <20191209174836.11063-1-david@redhat.com>

If max_pfn is not aligned to a section boundary, we can easily run into
BUGs. This can, e.g., be triggered on x86-64 under QEMU by specifying a
memory size that is not a multiple of 128MB (e.g., 4097MB, but also
4160MB). I was told that on real HW, we can easily have this scenario
(esp., one of the main reasons sub-section hotadd of devmem was added).

The issue is that we have a valid memmap (pfn_valid()) for the whole
section, and the whole section will be marked "online".
pfn_to_online_page() will succeed, but the memmap contains garbage.

E.g., doing a "cat /proc/kpageflags > /dev/null" results in

[  303.218313] BUG: unable to handle page fault for address: fffffffffffffffe
[  303.218899] #PF: supervisor read access in kernel mode
[  303.219344] #PF: error_code(0x0000) - not-present page
[  303.219787] PGD 12614067 P4D 12614067 PUD 12616067 PMD 0
[  303.220266] Oops: 0000 [#1] SMP NOPTI
[  303.220587] CPU: 0 PID: 424 Comm: cat Not tainted 5.4.0-next-20191128+ #17
[  303.221169] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4
[  303.222140] RIP: 0010:stable_page_flags+0x4d/0x410
[  303.222554] Code: f3 ff 41 89 c0 48 b8 00 00 00 00 01 00 00 00 45 84 c0 0f 85 cd 02 00 00 48 8b 53 08 48 8b 2b 48f
[  303.224135] RSP: 0018:ffff9f5980187e58 EFLAGS: 00010202
[  303.224576] RAX: fffffffffffffffe RBX: ffffda1285004000 RCX: ffff9f5980187dd4
[  303.225178] RDX: 0000000000000001 RSI: ffffffff92662420 RDI: 0000000000000246
[  303.225789] RBP: ffffffffffffffff R08: 0000000000000000 R09: 0000000000000000
[  303.226405] R10: 0000000000000000 R11: 0000000000000000 R12: 00007f31d070e000
[  303.227012] R13: 0000000000140100 R14: 00007f31d070e800 R15: ffffda1285004000
[  303.227629] FS:  00007f31d08f6580(0000) GS:ffff90a6bba00000(0000) knlGS:0000000000000000
[  303.228329] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  303.228820] CR2: fffffffffffffffe CR3: 00000001332a2000 CR4: 00000000000006f0
[  303.229438] Call Trace:
[  303.229654]  kpageflags_read.cold+0x57/0xf0
[  303.230016]  proc_reg_read+0x3c/0x60
[  303.230332]  vfs_read+0xc2/0x170
[  303.230614]  ksys_read+0x65/0xe0
[  303.230898]  do_syscall_64+0x5c/0xa0
[  303.231216]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

This patch fixes that by at least zeroing out that memmap (so, e.g.,
page_to_pfn() will not crash).
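
A minimal user-space sketch of the arithmetic involved, assuming x86-64
defaults (4 KiB pages, 128 MiB sections, so PAGES_PER_SECTION is 32768);
max_pfn here is taken from the 4160MB example used later in this series:

#include <stdio.h>

/* Assumed x86-64 defaults: 4 KiB pages, 128 MiB sections. */
#define PAGES_PER_SECTION 32768UL

/* round_up() as in the kernel, for a power-of-two alignment */
static unsigned long round_up_pow2(unsigned long x, unsigned long align)
{
	return (x + align - 1) & ~(align - 1);
}

int main(void)
{
	/* max_pfn of the 4160MB QEMU example used later in this series */
	unsigned long max_pfn = 0x144000;
	unsigned long end = round_up_pow2(max_pfn, PAGES_PER_SECTION);

	/*
	 * 0x144000 is mid-section, so 16384 struct pages (covering 64 MiB)
	 * of the last, partially populated section would otherwise remain
	 * uninitialized garbage.
	 */
	printf("max_pfn=%#lx section_end=%#lx stale_pages=%lu\n",
	       max_pfn, end, end - max_pfn);
	return 0;
}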
Commit 907ec5fca3dc ("mm: zero remaining unavailable struct pages")
tried to fix a similar issue, but forgot to consider this special case.

After this patch, there are still problems to solve: e.g., not all of
these pages falling into a memory hole will actually get initialized
later and have PageReserved set - they are only zeroed out - but at
least the immediate crashes are gone. A follow-up patch will take care
of this.

Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap")
Cc: stable@vger.kernel.org # v4.15+
Cc: Naoya Horiguchi
Cc: Pavel Tatashin
Cc: Andrew Morton
Cc: Steven Sistare
Cc: Michal Hocko
Cc: Daniel Jordan
Cc: Bob Picco
Cc: Oscar Salvador
Signed-off-by: David Hildenbrand
Tested-by: Daniel Jordan
---
 mm/page_alloc.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 62dcd6b76c80..1eb2ce7c79e4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6932,7 +6932,8 @@ static u64 zero_pfn_range(unsigned long spfn, unsigned long epfn)
  * This function also addresses a similar issue where struct pages are left
  * uninitialized because the physical address range is not covered by
  * memblock.memory or memblock.reserved. That could happen when memblock
- * layout is manually configured via memmap=.
+ * layout is manually configured via memmap=, or when the highest physical
+ * address (max_pfn) does not end on a section boundary.
  */
 void __init zero_resv_unavail(void)
 {
@@ -6950,7 +6951,16 @@ void __init zero_resv_unavail(void)
 		pgcnt += zero_pfn_range(PFN_DOWN(next), PFN_UP(start));
 		next = end;
 	}
-	pgcnt += zero_pfn_range(PFN_DOWN(next), max_pfn);
+
+	/*
+	 * Early sections always have a fully populated memmap for the whole
+	 * section - see pfn_valid(). If the last section has holes at the
+	 * end and that section is marked "online", the memmap will be
+	 * considered initialized. Make sure that memmap has a well defined
+	 * state.
+	 */
+	pgcnt += zero_pfn_range(PFN_DOWN(next),
+				round_up(max_pfn, PAGES_PER_SECTION));
 
 	/*
 	 * Struct pages that do not have backing memory. This could be because
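
To see which PFN ranges end up covered, the hole-walking logic of
zero_resv_unavail() can be sketched in user space; the memory ranges
below are the ones from the 4160MB QEMU guest quoted in patch 3/3, and
the helper names are illustrative, not the kernel's:

#include <stdio.h>

#define PAGE_SHIFT	12			/* assumed: 4 KiB pages */
#define SECTION_PAGES	32768ULL		/* assumed x86-64 default */
#define PFN_DOWN(x)	((x) >> PAGE_SHIFT)
#define PFN_UP(x)	(((x) + (1ULL << PAGE_SHIFT) - 1) >> PAGE_SHIFT)

/* memblock.memory ranges of the 4160MB QEMU guest quoted in patch 3/3 */
static const struct { unsigned long long start, end; } memory[] = {
	{ 0x0000000000001000ULL, 0x000000000009f000ULL },
	{ 0x0000000000100000ULL, 0x00000000bffdf000ULL },
	{ 0x0000000100000000ULL, 0x0000000144000000ULL },
};

int main(void)
{
	unsigned long long max_pfn = 0x144000, next = 0, i;

	/* Walk the gaps between memory ranges, as zero_resv_unavail() does. */
	for (i = 0; i < sizeof(memory) / sizeof(memory[0]); i++) {
		if (next < PFN_DOWN(memory[i].start))
			printf("hole: pfn %#llx-%#llx\n",
			       next, PFN_UP(memory[i].start));
		next = PFN_DOWN(memory[i].end);
	}
	/* The fix: zero up to the end of the last section, not just max_pfn. */
	printf("tail: pfn %#llx-%#llx\n", next,
	       (max_pfn + SECTION_PAGES - 1) & ~(SECTION_PAGES - 1));
	return 0;
}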
From patchwork Mon Dec 9 17:48:35 2019
From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Alexey Dobriyan,
 Andrew Morton, Oscar Salvador, Michal Hocko, Stephen Rothwell,
 linux-fsdevel@vger.kernel.org
Subject: [PATCH v1 2/3] fs/proc/page.c: allow inspection of last section and fix end detection
Date: Mon, 9 Dec 2019 18:48:35 +0100
Message-Id: <20191209174836.11063-3-david@redhat.com>
In-Reply-To: <20191209174836.11063-1-david@redhat.com>
References: <20191209174836.11063-1-david@redhat.com>

If max_pfn does not fall onto a section boundary, it is possible to
inspect PFNs up to max_pfn and PFNs above max_pfn; however, max_pfn
itself can't be inspected. We can have a valid (and online) memmap at
and above max_pfn if max_pfn is not aligned to a section boundary: the
whole early section has a memmap and is marked online. Being able to
inspect the state of these PFNs is valuable for debugging, especially
because max_pfn can change on memory hotplug and expose these memmaps.

Also, querying page flags via "./page-types -r -a 0x144001,"
(tools/vm/page-types.c) inside an x86-64 guest with 4160MB under QEMU
results in an (almost) endless loop in user space, because the end is
not detected properly when starting after max_pfn.

Instead, let's allow inspecting all pages in the highest section and
return 0 directly if we try to access pages above that section. While
at it, check the count before adjusting it, to avoid masking user
errors.
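
The new bounds handling shared by the three readers can be condensed
into a small user-space sketch (not the kernel code itself); KPMSIZE is
sizeof(u64) - one 8-byte entry per PFN - KPMMASK is KPMSIZE - 1, and
PAGES_PER_SECTION again assumes x86-64 defaults:

#include <stdint.h>
#include <stdio.h>

#define KPMSIZE sizeof(uint64_t)	/* one 8-byte entry per PFN */
#define KPMMASK (KPMSIZE - 1)
#define PAGES_PER_SECTION 32768UL	/* assumed x86-64 default */

static unsigned long max_pfn = 0x144000;	/* example from this series */

/* Returns: <0 error, 0 EOF, >0 number of bytes the read may cover. */
static long bounds(unsigned long src, unsigned long count)
{
	unsigned long max_dump_pfn =
		(max_pfn + PAGES_PER_SECTION - 1) & ~(PAGES_PER_SECTION - 1);

	/*
	 * Reject misaligned offsets/sizes first, so clamping the count
	 * cannot mask a user error (-EINVAL before the EOF check).
	 */
	if (src & KPMMASK || count & KPMMASK)
		return -22;	/* -EINVAL */
	/* Reads at/after the end of the last section hit EOF ... */
	if (src >= max_dump_pfn * KPMSIZE)
		return 0;
	/* ... everything else is clamped to the dumpable range. */
	if (count > max_dump_pfn * KPMSIZE - src)
		count = max_dump_pfn * KPMSIZE - src;
	return (long)count;
}

int main(void)
{
	printf("%ld\n", bounds(0x144000 * KPMSIZE, 4096)); /* 4096: in last section */
	printf("%ld\n", bounds(0x148000 * KPMSIZE, 4096)); /* 0: EOF */
	printf("%ld\n", bounds(3, 8));                     /* -22: misaligned */
	return 0;
}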
Cc: Alexey Dobriyan
Cc: Andrew Morton
Cc: Oscar Salvador
Cc: Michal Hocko
Cc: Stephen Rothwell
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: David Hildenbrand
Reported-by: kbuild test robot
---
 fs/proc/page.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/proc/page.c b/fs/proc/page.c
index e40dbfe1168e..da01d3d9999a 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -29,6 +29,7 @@
 static ssize_t kpagecount_read(struct file *file, char __user *buf,
 			       size_t count, loff_t *ppos)
 {
+	const unsigned long max_dump_pfn = round_up(max_pfn, PAGES_PER_SECTION);
 	u64 __user *out = (u64 __user *)buf;
 	struct page *ppage;
 	unsigned long src = *ppos;
@@ -37,9 +38,11 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf,
 	u64 pcount;
 
 	pfn = src / KPMSIZE;
-	count = min_t(size_t, count, (max_pfn * KPMSIZE) - src);
 	if (src & KPMMASK || count & KPMMASK)
 		return -EINVAL;
+	if (src >= max_dump_pfn * KPMSIZE)
+		return 0;
+	count = min_t(unsigned long, count, (max_dump_pfn * KPMSIZE) - src);
 
 	while (count > 0) {
 		/*
@@ -208,6 +211,7 @@ u64 stable_page_flags(struct page *page)
 static ssize_t kpageflags_read(struct file *file, char __user *buf,
 			       size_t count, loff_t *ppos)
 {
+	const unsigned long max_dump_pfn = round_up(max_pfn, PAGES_PER_SECTION);
 	u64 __user *out = (u64 __user *)buf;
 	struct page *ppage;
 	unsigned long src = *ppos;
@@ -215,9 +219,11 @@ static ssize_t kpageflags_read(struct file *file, char __user *buf,
 	ssize_t ret = 0;
 
 	pfn = src / KPMSIZE;
-	count = min_t(unsigned long, count, (max_pfn * KPMSIZE) - src);
 	if (src & KPMMASK || count & KPMMASK)
 		return -EINVAL;
+	if (src >= max_dump_pfn * KPMSIZE)
+		return 0;
+	count = min_t(unsigned long, count, (max_dump_pfn * KPMSIZE) - src);
 
 	while (count > 0) {
 		/*
@@ -253,6 +259,7 @@ static const struct file_operations proc_kpageflags_operations = {
 static ssize_t kpagecgroup_read(struct file *file, char __user *buf,
 				size_t count, loff_t *ppos)
 {
+	const unsigned long max_dump_pfn = round_up(max_pfn, PAGES_PER_SECTION);
 	u64 __user *out = (u64 __user *)buf;
 	struct page *ppage;
 	unsigned long src = *ppos;
@@ -261,9 +268,11 @@ static ssize_t kpagecgroup_read(struct file *file, char __user *buf,
 	u64 ino;
 
 	pfn = src / KPMSIZE;
-	count = min_t(unsigned long, count, (max_pfn * KPMSIZE) - src);
 	if (src & KPMMASK || count & KPMMASK)
 		return -EINVAL;
+	if (src >= max_dump_pfn * KPMSIZE)
+		return 0;
+	count = min_t(unsigned long, count, (max_dump_pfn * KPMSIZE) - src);
 
 	while (count > 0) {
 		/*
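
With reads past the last section now returning 0, a user-space consumer
can rely on plain EOF for end detection. A minimal sketch (not the
actual page-types code) reading /proc/kpageflags from a start PFN until
EOF:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	uint64_t flags[512];
	unsigned long pfn = 0x144000;	/* start PFN, example from this series */
	int fd = open("/proc/kpageflags", O_RDONLY);
	ssize_t n;

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Each entry is 8 bytes; seek to the first PFN of interest. */
	if (lseek(fd, pfn * sizeof(uint64_t), SEEK_SET) < 0) {
		perror("lseek");
		return 1;
	}
	/*
	 * With the fix, a read past the last section returns 0 (EOF), so
	 * this loop terminates instead of spinning forever.
	 */
	while ((n = read(fd, flags, sizeof(flags))) > 0)
		pfn += n / sizeof(uint64_t);
	printf("end detected at pfn %#lx\n", pfn);
	close(fd);
	return 0;
}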
From patchwork Mon Dec 9 17:48:36 2019
From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton,
 Oscar Salvador, Michal Hocko, Dan Williams, Naoya Horiguchi
Subject: [PATCH v1 3/3] mm: initialize memmap of unavailable memory directly
Date: Mon, 9 Dec 2019 18:48:36 +0100
Message-Id: <20191209174836.11063-4-david@redhat.com>
In-Reply-To: <20191209174836.11063-1-david@redhat.com>
References: <20191209174836.11063-1-david@redhat.com>
Let's make sure that all memory holes are actually marked
PageReserved(), that page_to_pfn() produces reliable results, and that
these pages are not detected as "mmap" pages due to the mapcount.

E.g., booting an x86-64 QEMU guest with 4160 MB:

[    0.010585] Early memory node ranges
[    0.010586]   node   0: [mem 0x0000000000001000-0x000000000009efff]
[    0.010588]   node   0: [mem 0x0000000000100000-0x00000000bffdefff]
[    0.010589]   node   0: [mem 0x0000000100000000-0x0000000143ffffff]

max_pfn is 0x144000.

Before this change:

[root@localhost ~]# ./page-types -r -a 0x144000,
             flags      page-count       MB  symbolic-flags                     long-symbolic-flags
0x0000000000000800           16384       64  ___________M_______________________________        mmap
             total           16384       64

After this change:

[root@localhost ~]# ./page-types -r -a 0x144000,
             flags      page-count       MB  symbolic-flags                     long-symbolic-flags
0x0000000100000000           16384       64  ___________________________r_______________        reserved
             total           16384       64

IOW, especially the unavailable physical memory ("memory hole") in the
last section would not get properly marked PageReserved() and was
indicated to be "mmap" memory.

Drop the declaration of that function from include/linux/mm.h - nobody
else needs it - and rename the function accordingly.

Note: The fake zone/node might not be covered by the zone/node span.
This is not an urgent issue (for now, we had the same node/zone due to
the zeroing). We'll need a clean way to mark memory holes (e.g., using
a page type PageHole() if possible, or a fake ZONE_INVALID) and
eventually stop marking these memory holes PageReserved().

Cc: Andrew Morton
Cc: Oscar Salvador
Cc: Michal Hocko
Cc: Dan Williams
Cc: Naoya Horiguchi
Signed-off-by: David Hildenbrand
---
 include/linux/mm.h |  6 ------
 mm/page_alloc.c    | 33 ++++++++++++++++++++++-----------
 2 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5dfbc0e56e67..93ee776c2a1e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2176,12 +2176,6 @@ extern int __meminit __early_pfn_to_nid(unsigned long pfn,
 					struct mminit_pfnnid_cache *state);
 #endif
 
-#if !defined(CONFIG_FLAT_NODE_MEM_MAP)
-void zero_resv_unavail(void);
-#else
-static inline void zero_resv_unavail(void) {}
-#endif
-
 extern void set_dma_reserve(unsigned long new_dma_reserve);
 extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long,
 		enum memmap_context, struct vmem_altmap *);

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1eb2ce7c79e4..85064abafcc3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6901,10 +6901,10 @@ void __init free_area_init_node(int nid, unsigned long *zones_size,
 
 #if !defined(CONFIG_FLAT_NODE_MEM_MAP)
 /*
- * Zero all valid struct pages in range [spfn, epfn), return number of struct
- * pages zeroed
+ * Initialize all valid struct pages in the range [spfn, epfn) and mark them
+ * PageReserved(). Return the number of struct pages that were initialized.
  */
-static u64 zero_pfn_range(unsigned long spfn, unsigned long epfn)
+static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn)
 {
 	unsigned long pfn;
 	u64 pgcnt = 0;
@@ -6915,7 +6915,13 @@ static u64 zero_pfn_range(unsigned long spfn, unsigned long epfn)
 				+ pageblock_nr_pages - 1;
 			continue;
 		}
-		mm_zero_struct_page(pfn_to_page(pfn));
+		/*
+		 * Use a fake node/zone (0) for now. Some of these pages
+		 * (in memblock.reserved but not in memblock.memory) will
+		 * get re-initialized via reserve_bootmem_region() later.
+		 */
+		__init_single_page(pfn_to_page(pfn), pfn, 0, 0);
+		__SetPageReserved(pfn_to_page(pfn));
 		pgcnt++;
 	}
 
@@ -6927,7 +6933,7 @@ static u64 zero_pfn_range(unsigned long spfn, unsigned long epfn)
  * initialized by going through __init_single_page(). But, there are some
  * struct pages which are reserved in memblock allocator and their fields
  * may be accessed (for example page_to_pfn() on some configuration accesses
- * flags). We must explicitly zero those struct pages.
+ * flags). We must explicitly initialize those struct pages.
  *
  * This function also addresses a similar issue where struct pages are left
  * uninitialized because the physical address range is not covered by
@@ -6935,7 +6941,7 @@ static u64 zero_pfn_range(unsigned long spfn, unsigned long epfn)
  * layout is manually configured via memmap=, or when the highest physical
  * address (max_pfn) does not end on a section boundary.
  */
-void __init zero_resv_unavail(void)
+static void __init init_unavailable_mem(void)
 {
 	phys_addr_t start, end;
 	u64 i, pgcnt;
@@ -6948,7 +6954,8 @@ void __init zero_resv_unavail(void)
 	for_each_mem_range(i, &memblock.memory, NULL,
 			NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL) {
 		if (next < start)
-			pgcnt += zero_pfn_range(PFN_DOWN(next), PFN_UP(start));
+			pgcnt += init_unavailable_range(PFN_DOWN(next),
+							PFN_UP(start));
 		next = end;
 	}
 
@@ -6959,8 +6966,8 @@ void __init zero_resv_unavail(void)
 	 * considered initialized. Make sure that memmap has a well defined
 	 * state.
 	 */
-	pgcnt += zero_pfn_range(PFN_DOWN(next),
-				round_up(max_pfn, PAGES_PER_SECTION));
+	pgcnt += init_unavailable_range(PFN_DOWN(next),
+					round_up(max_pfn, PAGES_PER_SECTION));
 
 	/*
 	 * Struct pages that do not have backing memory. This could be because
@@ -6969,6 +6976,10 @@ void __init zero_resv_unavail(void)
 	if (pgcnt)
 		pr_info("Zeroed struct page in unavailable ranges: %lld pages", pgcnt);
 }
+#else
+static inline void __init init_unavailable_mem(void)
+{
+}
 #endif /* !CONFIG_FLAT_NODE_MEM_MAP */
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
@@ -7398,7 +7409,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 	/* Initialise every node */
 	mminit_verify_pageflags_layout();
 	setup_nr_node_ids();
-	zero_resv_unavail();
+	init_unavailable_mem();
 	for_each_online_node(nid) {
 		pg_data_t *pgdat = NODE_DATA(nid);
 		free_area_init_node(nid, NULL,
@@ -7593,7 +7604,7 @@ void __init set_dma_reserve(unsigned long new_dma_reserve)
 
 void __init free_area_init(unsigned long *zones_size)
 {
-	zero_resv_unavail();
+	init_unavailable_mem();
 	free_area_init_node(0, zones_size,
 			__pa(PAGE_OFFSET) >> PAGE_SHIFT, NULL);
 }
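
The before/after page-types output above can be double-checked with a
few lines of user space: in the /proc/kpageflags encoding, KPF_MMAP is
bit 11 (0x800, the "before" flags) and KPF_RESERVED is bit 32
(0x100000000, the "after" flags). A small sketch:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define KPF_MMAP	11	/* 0x0000000000000800 in the "before" output */
#define KPF_RESERVED	32	/* 0x0000000100000000 in the "after" output */

int main(void)
{
	unsigned long pfn = 0x144000;	/* first PFN past max_pfn in the example */
	uint64_t flags;
	int fd = open("/proc/kpageflags", O_RDONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* One 8-byte flags word per PFN; read the entry for this PFN. */
	if (pread(fd, &flags, sizeof(flags), pfn * sizeof(flags)) !=
	    sizeof(flags)) {
		perror("pread");
		close(fd);
		return 1;
	}
	printf("pfn %#lx: flags=%#llx mmap=%d reserved=%d\n", pfn,
	       (unsigned long long)flags,
	       !!(flags & (1ULL << KPF_MMAP)),
	       !!(flags & (1ULL << KPF_RESERVED)));
	close(fd);
	return 0;
}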