From patchwork Tue Oct  2 14:38:19 2018
X-Patchwork-Submitter: Masayoshi Mizuma
X-Patchwork-Id: 10623779
From: Masayoshi Mizuma
To: linux-mm@kvack.org, Naoya Horiguchi, Pavel Tatashin, Michal Hocko,
 Thomas Gleixner, Ingo Molnar
Cc: Masayoshi Mizuma, linux-kernel@vger.kernel.org, x86@kernel.org
Subject: [PATCH v3 1/3] mm: zero remaining unavailable struct pages
Date: Tue, 2 Oct 2018 10:38:19 -0400
Message-Id: <20181002143821.5112-2-msys.mizuma@gmail.com>
In-Reply-To: <20181002143821.5112-1-msys.mizuma@gmail.com>
References: <20181002143821.5112-1-msys.mizuma@gmail.com>
X-Mailer: git-send-email 2.17.1

From: Naoya Horiguchi

There is a kernel panic that is triggered when reading /proc/kpageflags
on a kernel booted with the kernel parameter 'memmap=nn[KMG]!ss[KMG]'
(a minimal reproducer sketch is appended after the patch):

 BUG: unable to handle kernel paging request at fffffffffffffffe
 PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 2 PID: 1728 Comm: page-types Not tainted 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014
 RIP: 0010:stable_page_flags+0x27/0x3c0
 Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
 RSP: 0018:ffffbbd44111fde0 EFLAGS: 00010202
 RAX: fffffffffffffffe RBX: 00007fffffffeff9 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffffed1182fff5c0
 RBP: ffffffffffffffff R08: 0000000000000001 R09: 0000000000000001
 R10: ffffbbd44111fed8 R11: 0000000000000000 R12: ffffed1182fff5c0
 R13: 00000000000bffd7 R14: 0000000002fff5c0 R15: ffffbbd44111ff10
 FS:  00007efc4335a500(0000) GS:ffff93a5bfc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: fffffffffffffffe CR3: 00000000b2a58000 CR4: 00000000001406e0
 Call Trace:
  kpageflags_read+0xc7/0x120
  proc_reg_read+0x3c/0x60
  __vfs_read+0x36/0x170
  vfs_read+0x89/0x130
  ksys_pread64+0x71/0x90
  do_syscall_64+0x5b/0x160
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7efc42e75e23
 Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24

According to kernel bisection, this problem became visible with commit
f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap"),
which changed how struct pages are initialized.

Memblock layout affects the pfn ranges covered by node/zone. Consider a
VM with 2 NUMA nodes, each with 4GB of memory; the default (no memmap=
given) memblock layout is like below:

 MEMBLOCK configuration:
  memory size = 0x00000001fff75c00 reserved size = 0x000000000300c000
  memory.cnt  = 0x4
  memory[0x0]	[0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
  memory[0x1]	[0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
  memory[0x2]	[0x0000000100000000-0x000000013fffffff], 0x0000000040000000 bytes on node 0 flags: 0x0
  memory[0x3]	[0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
  ...
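The memmap=nn[KMG]!ss[KMG] parameter marks nn bytes of memory starting
at physical address ss as protected. For the setting used below, the
arithmetic works out as:

  ss      = 4G      = 0x100000000
  ss + nn = 4G + 1G = 0x140000000

so memmap=1G!4G protects [0x100000000-0x13fffffff], which is exactly
memory[0x2] in the dump above.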
If you give memmap=1G!4G (so it just covers memory[0x2]), the range
[0x100000000-0x13fffffff] is gone:

 MEMBLOCK configuration:
  memory size = 0x00000001bff75c00 reserved size = 0x000000000300c000
  memory.cnt  = 0x3
  memory[0x0]	[0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
  memory[0x1]	[0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
  memory[0x2]	[0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
  ...

This shrinks node 0's pfn range, because that range is calculated from
the address span of memblock.memory, and so some of the struct pages in
the gap are left uninitialized.

We have a function zero_resv_unavail() which zeroes the struct pages
outside memblock.memory, but currently it covers only the reserved
unavailable range (i.e. memblock.reserved && !memblock.memory). This
patch extends it to cover all unavailable ranges, which fixes the
reported issue.

Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Naoya Horiguchi
Tested-by: Oscar Salvador
Tested-by: Masayoshi Mizuma
Reviewed-by: Pavel Tatashin
---
 include/linux/memblock.h | 15 ---------------
 mm/page_alloc.c          | 36 +++++++++++++++++++++++++-----------
 2 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 5169205..2acdd04 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -265,21 +265,6 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
 	for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved,	\
 			       nid, flags, p_start, p_end, p_nid)
 
-/**
- * for_each_resv_unavail_range - iterate through reserved and unavailable memory
- * @i: u64 used as loop variable
- * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
- * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
- *
- * Walks over unavailable but reserved (reserved && !memory) areas of memblock.
- * Available as soon as memblock is initialized.
- * Note: because this memory does not belong to any physical node, flags and
- * nid arguments do not make sense and thus not exported as arguments.
- */
-#define for_each_resv_unavail_range(i, p_start, p_end)		\
-	for_each_mem_range(i, &memblock.reserved, &memblock.memory,	\
-			   NUMA_NO_NODE, MEMBLOCK_NONE, p_start, p_end, NULL)
-
 static inline void memblock_set_region_flags(struct memblock_region *r,
 					     enum memblock_flags flags)
 {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 89d2a2a..3b9d89e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6446,29 +6446,42 @@ void __init free_area_init_node(int nid, unsigned long *zones_size,
  * struct pages which are reserved in memblock allocator and their fields
  * may be accessed (for example page_to_pfn() on some configuration accesses
  * flags). We must explicitly zero those struct pages.
+ *
+ * This function also addresses a similar issue where struct pages are left
+ * uninitialized because the physical address range is not covered by
+ * memblock.memory or memblock.reserved. That could happen when memblock
+ * layout is manually configured via memmap=.
  */
 void __init zero_resv_unavail(void)
 {
 	phys_addr_t start, end;
 	unsigned long pfn;
 	u64 i, pgcnt;
+	phys_addr_t next = 0;
 
 	/*
-	 * Loop through ranges that are reserved, but do not have reported
-	 * physical memory backing.
+	 * Loop through unavailable ranges not covered by memblock.memory.
 	 */
 	pgcnt = 0;
-	for_each_resv_unavail_range(i, &start, &end) {
-		for (pfn = PFN_DOWN(start); pfn < PFN_UP(end); pfn++) {
-			if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
-				pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
-					+ pageblock_nr_pages - 1;
-				continue;
+	for_each_mem_range(i, &memblock.memory, NULL,
+			   NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL) {
+		if (next < start) {
+			for (pfn = PFN_DOWN(next); pfn < PFN_UP(start); pfn++) {
+				if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages)))
+					continue;
+				mm_zero_struct_page(pfn_to_page(pfn));
+				pgcnt++;
 			}
-			mm_zero_struct_page(pfn_to_page(pfn));
-			pgcnt++;
 		}
+		next = end;
 	}
+	for (pfn = PFN_DOWN(next); pfn < max_pfn; pfn++) {
+		if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages)))
+			continue;
+		mm_zero_struct_page(pfn_to_page(pfn));
+		pgcnt++;
+	}
+
 	/*
 	 * Struct pages that do not have backing memory. This could be because
@@ -6478,7 +6491,8 @@ void __init zero_resv_unavail(void)
 	 * this code can be removed.
 	 */
 	if (pgcnt)
-		pr_info("Reserved but unavailable: %lld pages", pgcnt);
+		pr_info("Zeroed struct page in unavailable ranges: %lld pages", pgcnt);
+
 }
 #endif /* CONFIG_HAVE_MEMBLOCK && !CONFIG_FLAT_NODE_MEM_MAP */
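
For reference, below is a minimal user-space sketch of the failing
access, in the spirit of the page-types run shown in the oops above.
It is an illustration, not part of the patch: it assumes a kernel
booted with memmap=1G!4G as in the example, where the first pfn whose
struct page is left uninitialized works out to 0xbffd7 (the first pfn
past memory[0x1], which appears to match R13 in the oops), and reading
/proc/kpageflags generally requires root.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	/* Default pfn: first pfn past memory[0x1] ((0xbffd6fff >> 12) + 1). */
	uint64_t pfn = argc > 1 ? strtoull(argv[1], NULL, 0) : 0xbffd7;
	uint64_t flags;
	int fd = open("/proc/kpageflags", O_RDONLY);

	if (fd < 0) {
		perror("open /proc/kpageflags");
		return 1;
	}
	/* Each kpageflags entry is a 64-bit flags word indexed by pfn. */
	if (pread(fd, &flags, sizeof(flags), pfn * sizeof(flags)) !=
	    (ssize_t)sizeof(flags)) {
		perror("pread");
		close(fd);
		return 1;
	}
	printf("pfn 0x%jx flags 0x%jx\n", (uintmax_t)pfn, (uintmax_t)flags);
	close(fd);
	return 0;
}

On an unpatched kernel the pread() goes down the same path as the call
trace above (ksys_pread64 -> kpageflags_read -> stable_page_flags) and
oopses; with the patch applied the struct page has been zeroed and the
read completes.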