From patchwork Fri Jan 31 06:13:57 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 11359225
Date: Thu, 30 Jan 2020 22:13:57 -0800
From: Andrew Morton
To: akpm@linux-foundation.org, bhe@redhat.com, dan.j.williams@intel.com,
 david@redhat.com,
 kirill.shutemov@linux.intel.com, kirill@shutemov.name, linux-mm@kvack.org,
 mgorman@suse.de, mhocko@kernel.org, mhocko@suse.com,
 mm-commits@vger.kernel.org, osalvador@suse.de, pasha.tatashin@oracle.com,
 torvalds@linux-foundation.org, vbabka@suse.cz, zhi.jin@intel.com
Subject: [patch 052/118] mm/page_alloc: skip non present sections on zone initialization
Message-ID: <20200131061357.6KI75t6mY%akpm@linux-foundation.org>
In-Reply-To: <20200130221021.5f0211c56346d5485af07923@linux-foundation.org>
User-Agent: s-nail v14.8.16

From: "Kirill A. Shutemov"
Subject: mm/page_alloc: skip non present sections on zone initialization

memmap_init_zone() can be called on ranges with holes during boot. It
skips any non-valid PFNs one by one. This works fine as long as the holes
are not too big.

But huge holes in the memory map cause a problem. It takes over 20
seconds to walk a 32TiB hole. x86-64 with 5-level paging allows for much
larger holes in the memory map, which would practically hang the system.

Deferred struct page init doesn't help here. It only works on the present
ranges.

Skipping non-present sections would fix the issue.

Link: http://lkml.kernel.org/r/20191230093828.24613-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov
Reviewed-by: Baoquan He
Acked-by: Michal Hocko
Cc: Dan Williams
Cc: Vlastimil Babka
Cc: Mel Gorman
Cc: "Jin, Zhi"
Cc: David Hildenbrand
Cc: Michal Hocko
Cc: Oscar Salvador
Cc: Pavel Tatashin
Signed-off-by: Andrew Morton
---

 mm/page_alloc.c |   28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

--- a/mm/page_alloc.c~mm-page_alloc-skip-non-present-sections-on-zone-initialization
+++ a/mm/page_alloc.c
@@ -5848,6 +5848,30 @@ overlap_memmap_init(unsigned long zone,
 	return false;
 }
 
+#ifdef CONFIG_SPARSEMEM
+/* Skip PFNs that belong to non-present sections */
+static inline __meminit unsigned long next_pfn(unsigned long pfn)
+{
+	unsigned long section_nr;
+
+	section_nr = pfn_to_section_nr(++pfn);
+	if (present_section_nr(section_nr))
+		return pfn;
+
+	while (++section_nr <= __highest_present_section_nr) {
+		if (present_section_nr(section_nr))
+			return section_nr_to_pfn(section_nr);
+	}
+
+	return -1;
+}
+#else
+static inline __meminit unsigned long next_pfn(unsigned long pfn)
+{
+	return pfn++;
+}
+#endif
+
 /*
  * Initially all pages are reserved - free ones are freed
  * up by memblock_free_all() once the early boot process is
@@ -5887,8 +5911,10 @@ void __meminit memmap_init_zone(unsigned
 		 * function. They do not exist on hotplugged memory.
 		 */
 		if (context == MEMMAP_EARLY) {
-			if (!early_pfn_valid(pfn))
+			if (!early_pfn_valid(pfn)) {
+				pfn = next_pfn(pfn) - 1;
 				continue;
+			}
 			if (!early_pfn_in_nid(pfn, nid))
 				continue;
 			if (overlap_memmap_init(zone, &pfn))
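
For readers who want to see the effect of the skip outside the kernel, here is a rough userspace sketch. It is not kernel code. Every toy_* name, the section size of 8 PFNs, and the presence map are assumptions made up for illustration, and validity of a PFN is simplified to "its section is present", which is coarser than early_pfn_valid(). The control flow of toy_next_pfn() mirrors the CONFIG_SPARSEMEM next_pfn() in the patch, including the "- 1" at the call site that compensates for the loop's own pfn++.

```c
#include <stdbool.h>

/* Hypothetical toy geometry: 8 PFNs per section, 16 sections in total. */
#define TOY_PFN_SECTION_SHIFT	3UL
#define TOY_NR_SECTIONS		16UL
#define TOY_NO_PFN		(~0UL)	/* stands in for the patch's "return -1" */

/* Hypothetical presence map: only sections 0, 1 and 12 are present. */
static const bool toy_present[TOY_NR_SECTIONS] = {
	[0] = true, [1] = true, [12] = true,
};
static const unsigned long toy_highest_present = 12;

static unsigned long toy_pfn_to_section_nr(unsigned long pfn)
{
	return pfn >> TOY_PFN_SECTION_SHIFT;
}

static unsigned long toy_section_nr_to_pfn(unsigned long nr)
{
	return nr << TOY_PFN_SECTION_SHIFT;
}

static bool toy_present_section_nr(unsigned long nr)
{
	return nr < TOY_NR_SECTIONS && toy_present[nr];
}

/* Same control flow as the SPARSEMEM next_pfn() in the patch. */
unsigned long toy_next_pfn(unsigned long pfn)
{
	unsigned long nr = toy_pfn_to_section_nr(++pfn);

	/* The very next PFN sits in a present section: nothing to skip. */
	if (toy_present_section_nr(nr))
		return pfn;

	/* Otherwise jump to the first PFN of the next present section. */
	while (++nr <= toy_highest_present) {
		if (toy_present_section_nr(nr))
			return toy_section_nr_to_pfn(nr);
	}

	return TOY_NO_PFN;
}

/*
 * Walk [start, end) the way the patched loop does and return the number
 * of loop iterations. *valid receives the number of PFNs that were found
 * in present sections.
 */
unsigned long toy_walk(unsigned long start, unsigned long end, bool skip,
		       unsigned long *valid)
{
	unsigned long iterations = 0;

	*valid = 0;
	for (unsigned long pfn = start; pfn < end; pfn++) {
		iterations++;
		if (!toy_present_section_nr(toy_pfn_to_section_nr(pfn))) {
			if (skip)
				pfn = toy_next_pfn(pfn) - 1; /* loop adds 1 back */
			continue;
		}
		(*valid)++;
	}
	return iterations;
}
```

With this toy layout, walking PFNs [0, 128) takes 128 iterations without skipping and 26 with it, and both walks find the same 24 valid PFNs. When no present section remains, toy_next_pfn() returns the all-ones value, so after the "- 1" and the loop increment the PFN is far past the end and the loop terminates. One thing worth noting about the !CONFIG_SPARSEMEM fallback in the patch: in C, "return pfn++" returns the value before the increment, so if that branch were ever reached, next_pfn(pfn) - 1 followed by the loop's pfn++ would leave pfn unchanged; "return pfn + 1" is what the call site appears to expect.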