From patchwork Thu Dec 20 06:03:03 2018
X-Patchwork-Submitter: Qian Cai <cai@lca.pw>
X-Patchwork-Id: 10738547
From: Qian Cai <cai@lca.pw>
To: akpm@linux-foundation.org
Cc: Pavel.Tatashin@microsoft.com, mingo@kernel.org, mhocko@suse.com,
    hpa@zytor.com, mgorman@techsingularity.net, tglx@linutronix.de,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Qian Cai <cai@lca.pw>
Subject: [PATCH] mm/page_owner: fix for deferred struct page init
Date: Thu, 20 Dec 2018 01:03:03 -0500
Message-Id: <20181220060303.38686-1-cai@lca.pw>
X-Mailer: git-send-email 2.17.2 (Apple Git-113)

When booting a system with "page_owner=on",

start_kernel
  page_ext_init
    invoke_init_callbacks
      init_section_page_ext
        init_page_owner
          init_early_allocated_pages
            init_zones_in_node
              init_pages_in_zone
                lookup_page_ext
                  page_to_nid

The problem is that page_to_nid() does not work yet at this point: with
DEFERRED_STRUCT_PAGE_INIT, some page flags carry no node information until
later, in page_alloc_init_late(). Hence, it can trigger an out-of-bounds
access with an invalid nid.

[ 8.666047] UBSAN: Undefined behaviour in ./include/linux/mm.h:1104:50
[ 8.672603] index 7 is out of range for type 'zone [5]'

In addition, with

CONFIG_DEBUG_VM_PGFLAGS=y
CONFIG_NODE_NOT_IN_PAGE_FLAGS=n

the kernel panics, because the flags were poisoned earlier in

start_kernel
  setup_arch
    pagetable_init
      paging_init
        sparse_init
          sparse_init_nid
            memblock_alloc_try_nid_raw

Although the page flags of pages in reserved bootmem regions are set later in

mm_init
  mem_init
    memblock_free_all
      free_low_memory_core_early
        reserve_bootmem_region

there can still be some pages freed to the page allocator that are not yet
initialized due to DEFERRED_STRUCT_PAGE_INIT. This has already been dealt
with to some extent in page_ext_init():

	/*
	 * Take into account DEFERRED_STRUCT_PAGE_INIT.
	 */
	if (early_pfn_to_nid(pfn) != nid)
		continue;

However, it is not handled in init_pages_in_zone(), which ends up calling
page_to_nid():

[ 11.917212] page:ffffea0004200000 is uninitialized and poisoned
[ 11.917220] raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[ 11.921745] raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[ 11.924523] page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
[ 11.926498] page_owner info is not active (free page?)
[ 12.329560] kernel BUG at include/linux/mm.h:990!
[ 12.337632] RIP: 0010:init_page_owner+0x486/0x520
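To make the failure mode concrete: both splats boil down to zone/node ids
being decoded out of page->flags that still hold the poison pattern. A
simplified, standalone sketch of that decode (this is not the kernel code;
fake_page and fake_page_zonenum are made-up names for the illustration, and
the constants are only the values implied by the UBSAN report above):

#include <stdio.h>

#define MAX_NR_ZONES 5		/* "zone [5]" in the UBSAN report */
#define ZONES_MASK 0x7UL	/* three flag bits cover five zones */

struct fake_page {
	unsigned long flags;
};

/* decode the zone index from the flags, roughly what page_zonenum() does */
static unsigned long fake_page_zonenum(const struct fake_page *page)
{
	/* the real helper also shifts the flags; omitted for brevity */
	return page->flags & ZONES_MASK;
}

int main(void)
{
	/* a deferred struct page still holds the all-0xff poison pattern */
	struct fake_page poisoned = { .flags = ~0UL };

	/* prints "zone index 7, but the zone array only has 5 entries" */
	printf("zone index %lu, but the zone array only has %d entries\n",
	       fake_page_zonenum(&poisoned), MAX_NR_ZONES);
	return 0;
}

With CONFIG_DEBUG_VM_PGFLAGS=y the decode is not even reached; the poison
check in the page flag helpers fires first, which is the
VM_BUG_ON_PAGE(PagePoisoned(p)) splat quoted above.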
Since init_pages_in_zone() already has the node information, there is no
need to call page_to_nid() at all during the page_ext lookup; also replace
the calls that could incorrectly check poisoned page structs. This ends up
wasting some memory by allocating page_ext for pages that have already been
freed, but there is no sane way to tell those freed pages apart from
uninitialized valid pages due to DEFERRED_STRUCT_PAGE_INIT.

The memory overhead looks quite reasonable on an arm64 server, though:

allocated 83230720 bytes of page_ext
Node 0, zone DMA32: page owner found early allocated 0 pages
Node 0, zone Normal: page owner found early allocated 2048214 pages
Node 1, zone Normal: page owner found early allocated 2080641 pages

It used more memory on an x86_64 server:

allocated 334233600 bytes of page_ext
Node 0, zone DMA: page owner found early allocated 2 pages
Node 0, zone DMA32: page owner found early allocated 24303 pages
Node 0, zone Normal: page owner found early allocated 7545357 pages
Node 1, zone Normal: page owner found early allocated 8331279 pages

Finally, rename get_entry() to get_ext_entry(), so it can be exported
without a naming collision.

Signed-off-by: Qian Cai <cai@lca.pw>
---
 include/linux/page_ext.h |  6 ++++++
 mm/page_ext.c            |  8 ++++----
 mm/page_owner.c          | 39 ++++++++++++++++++++++++++++++++-------
 3 files changed, 42 insertions(+), 11 deletions(-)

diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index f84f167ec04c..e95cb6198014 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -51,6 +51,7 @@ static inline void page_ext_init(void)
 #endif
 
 struct page_ext *lookup_page_ext(const struct page *page);
+struct page_ext *get_ext_entry(void *base, unsigned long index);
 
 #else /* !CONFIG_PAGE_EXTENSION */
 struct page_ext;
@@ -64,6 +65,11 @@ static inline struct page_ext *lookup_page_ext(const struct page *page)
 	return NULL;
 }
 
+static inline struct page_ext *get_ext_entry(void *base, unsigned long index)
+{
+	return NULL;
+}
+
 static inline void page_ext_init(void)
 {
 }
diff --git a/mm/page_ext.c b/mm/page_ext.c
index ae44f7adbe07..3cd8f0c13057 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -107,7 +107,7 @@ static unsigned long get_entry_size(void)
 	return sizeof(struct page_ext) + extra_mem;
 }
 
-static inline struct page_ext *get_entry(void *base, unsigned long index)
+struct page_ext *get_ext_entry(void *base, unsigned long index)
 {
 	return base + get_entry_size() * index;
 }
@@ -137,7 +137,7 @@ struct page_ext *lookup_page_ext(const struct page *page)
 		return NULL;
 	index = pfn - round_down(node_start_pfn(page_to_nid(page)),
 					MAX_ORDER_NR_PAGES);
-	return get_entry(base, index);
+	return get_ext_entry(base, index);
 }
 
 static int __init alloc_node_page_ext(int nid)
@@ -207,7 +207,7 @@ struct page_ext *lookup_page_ext(const struct page *page)
 	 */
 	if (!section->page_ext)
 		return NULL;
-	return get_entry(section->page_ext, pfn);
+	return get_ext_entry(section->page_ext, pfn);
 }
 
 static void *__meminit alloc_page_ext(size_t size, int nid)
@@ -285,7 +285,7 @@ static void __free_page_ext(unsigned long pfn)
 	ms = __pfn_to_section(pfn);
 	if (!ms || !ms->page_ext)
 		return;
-	base = get_entry(ms->page_ext, pfn);
+	base = get_ext_entry(ms->page_ext, pfn);
 	free_page_ext(base);
 	ms->page_ext = NULL;
 }
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 87bc0dfdb52b..c27712c9a764 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -531,6 +531,7 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
 	unsigned long pfn = zone->zone_start_pfn;
 	unsigned long end_pfn = zone_end_pfn(zone);
 	unsigned long count = 0;
+	struct page_ext *base;
 
 	/*
 	 * Walk the zone in pageblock_nr_pages steps. If a page block spans
@@ -555,11 +556,11 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
 			if (!pfn_valid_within(pfn))
 				continue;
 
-			page = pfn_to_page(pfn);
-
-			if (page_zone(page) != zone)
+			if (pfn < zone->zone_start_pfn || pfn >= end_pfn)
 				continue;
 
+			page = pfn_to_page(pfn);
+
 			/*
 			 * To avoid having to grab zone->lock, be a little
 			 * careful when reading buddy page order. The only
@@ -575,13 +576,37 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
 				continue;
 			}
 
-			if (PageReserved(page))
+#ifdef CONFIG_SPARSEMEM
+			base = __pfn_to_section(pfn)->page_ext;
+#else
+			base = pgdat->node_page_ext;
+#endif
+			/*
+			 * The sanity checks the page allocator does upon
+			 * freeing a page can reach here before the page_ext
+			 * arrays are allocated when feeding a range of pages to
+			 * the allocator for the first time during bootup or
+			 * memory hotplug.
+			 */
+			if (unlikely(!base))
 				continue;
 
-			page_ext = lookup_page_ext(page);
-			if (unlikely(!page_ext))
+			/*
+			 * Those pages reached here might have already been freed
+			 * due to the deferred struct page init.
+			 */
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+			if (pfn < pgdat->first_deferred_pfn)
+#endif
+			if (PageReserved(page))
 				continue;
-
+#ifdef CONFIG_SPARSEMEM
+			page_ext = get_ext_entry(base, pfn);
+#else
+			page_ext = get_ext_entry(base, pfn -
+						 round_down(pgdat->node_start_pfn,
+							    MAX_ORDER_NR_PAGES));
+#endif
 			/* Maybe overlapping zone */
 			if (test_bit(PAGE_EXT_OWNER, &page_ext->flags))
 				continue;