From patchwork Fri Nov 30 21:52:53 2018
Subject: [mm PATCH v6 1/7] mm: Use mm_zero_struct_page from SPARC on all 64b architectures
From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: sparclinux@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-nvdimm@lists.01.org, davem@davemloft.net,
    pavel.tatashin@microsoft.com, mhocko@suse.com, mingo@kernel.org,
    kirill.shutemov@linux.intel.com, dan.j.williams@intel.com,
    dave.jiang@intel.com, rppt@linux.vnet.ibm.com, willy@infradead.org,
    vbabka@suse.cz, khalid.aziz@oracle.com, ldufour@linux.vnet.ibm.com,
    mgorman@techsingularity.net, yi.z.zhang@linux.intel.com
Date: Fri, 30 Nov 2018 13:52:53 -0800
Message-ID: <154361477318.7497.13432441396440493352.stgit@ahduyck-desk1.amr.corp.intel.com>
In-Reply-To: <154361452447.7497.1348692079883153517.stgit@ahduyck-desk1.amr.corp.intel.com>
References: <154361452447.7497.1348692079883153517.stgit@ahduyck-desk1.amr.corp.intel.com>
Use the same approach that was already in use on SPARC on all the
architectures that support a 64b long. This is mostly motivated by the
fact that 7 to 10 store/move instructions are likely always going to be
faster than having to call into a function that is not specialized for
handling page init.

An added advantage is that the compiler can combine writes in the
__init_single_page call. As a result the memset call is reduced to only
about 4 write operations, or at least that is what I am seeing with
GCC 6.2, as the flags, LRU pointers, and count/mapcount seem to cancel
out at least 4 of the 8 assignments on my system.

One change I had to make to the function was to reduce the minimum
supported struct page size to 56 bytes in order to support some
powerpc64 configurations. This patch should introduce no functional
change on SPARC, since it already had this code.

In the case of x86_64 I saw a reduction from 3.75s to 2.80s when
initializing 384GB of RAM per node. Pavel Tatashin tested on a system
with Broadcom's Stingray CPU and 48GB of RAM and found that
__init_single_page() took 19.30ns per 64-byte struct page before this
patch and 17.33ns per 64-byte struct page with it. Mike Rapoport ran a
similar test on an OpenPower machine (S812LC 8348-21C) with a Power8
processor and 128GB of RAM. His results per 64-byte struct page were
4.68ns before and 4.59ns after this patch.

Reviewed-by: Pavel Tatashin
Acked-by: Michal Hocko
Signed-off-by: Alexander Duyck
---
 arch/sparc/include/asm/pgtable_64.h |   30 --------------------------
 include/linux/mm.h                  |   41 ++++++++++++++++++++++++++++++++---
 2 files changed, 38 insertions(+), 33 deletions(-)

diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 1393a8ac596b..22500c3be7a9 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -231,36 +231,6 @@ extern unsigned long _PAGE_ALL_SZ_BITS;
 extern struct page *mem_map_zero;
 #define ZERO_PAGE(vaddr)        (mem_map_zero)
 
-/* This macro must be updated when the size of struct page grows above 80
- * or reduces below 64.
- * The idea that compiler optimizes out switch() statement, and only
- * leaves clrx instructions
- */
-#define mm_zero_struct_page(pp) do {                                    \
-        unsigned long *_pp = (void *)(pp);                              \
-                                                                        \
-        /* Check that struct page is either 64, 72, or 80 bytes */     \
-        BUILD_BUG_ON(sizeof(struct page) & 7);                          \
-        BUILD_BUG_ON(sizeof(struct page) < 64);                         \
-        BUILD_BUG_ON(sizeof(struct page) > 80);                         \
-                                                                        \
-        switch (sizeof(struct page)) {                                  \
-        case 80:                                                        \
-                _pp[9] = 0;     /* fallthrough */                       \
-        case 72:                                                        \
-                _pp[8] = 0;     /* fallthrough */                       \
-        default:                                                        \
-                _pp[7] = 0;                                             \
-                _pp[6] = 0;                                             \
-                _pp[5] = 0;                                             \
-                _pp[4] = 0;                                             \
-                _pp[3] = 0;                                             \
-                _pp[2] = 0;                                             \
-                _pp[1] = 0;                                             \
-                _pp[0] = 0;                                             \
-        }                                                               \
-} while (0)
-
 /* PFNs are real physical page numbers. However, mem_map only begins to record
  * per-page information starting at pfn_base. This is to handle systems where
  * the first physical page in the machine is at some huge physical address,

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 692158d6c619..eb6e52b66bc2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -123,10 +123,45 @@ extern int mmap_rnd_compat_bits __read_mostly;
 
 /*
  * On some architectures it is expensive to call memset() for small sizes.
- * Those architectures should provide their own implementation of "struct page"
- * zeroing by defining this macro in <asm/pgtable.h>.
+ * If an architecture decides to implement their own version of
+ * mm_zero_struct_page they should wrap the defines below in a #ifndef and
+ * define their own version of this macro in <asm/pgtable.h>
  */
-#ifndef mm_zero_struct_page
+#if BITS_PER_LONG == 64
+/* This function must be updated when the size of struct page grows above 80
+ * or reduces below 56. The idea that compiler optimizes out switch()
+ * statement, and only leaves move/store instructions. Also the compiler can
+ * combine write statments if they are both assignments and can be reordered,
+ * this can result in several of the writes here being dropped.
+ */
+#define mm_zero_struct_page(pp) __mm_zero_struct_page(pp)
+static inline void __mm_zero_struct_page(struct page *page)
+{
+        unsigned long *_pp = (void *)page;
+
+        /* Check that struct page is either 56, 64, 72, or 80 bytes */
+        BUILD_BUG_ON(sizeof(struct page) & 7);
+        BUILD_BUG_ON(sizeof(struct page) < 56);
+        BUILD_BUG_ON(sizeof(struct page) > 80);
+
+        switch (sizeof(struct page)) {
+        case 80:
+                _pp[9] = 0;     /* fallthrough */
+        case 72:
+                _pp[8] = 0;     /* fallthrough */
+        case 64:
+                _pp[7] = 0;     /* fallthrough */
+        case 56:
+                _pp[6] = 0;
+                _pp[5] = 0;
+                _pp[4] = 0;
+                _pp[3] = 0;
+                _pp[2] = 0;
+                _pp[1] = 0;
+                _pp[0] = 0;
+        }
+}
+#else
 #define mm_zero_struct_page(pp)  ((void)memset((pp), 0, sizeof(struct page)))
 #endif
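To make the trick above concrete, here is a minimal stand-alone
user-space sketch of the same technique; it is not part of the patch.
struct mock_page is an invented 80-byte stand-in for struct page, and
_Static_assert stands in for BUILD_BUG_ON. Because sizeof() is a
compile-time constant, only the stores reachable from the matching case
label survive, and the compiler is then free to combine them with later
initialization writes:

#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Invented 80-byte stand-in for struct page; not the kernel's type. */
struct mock_page { uint64_t words[10]; };

static inline void mock_zero_struct_page(struct mock_page *page)
{
        uint64_t *_pp = (uint64_t *)page;

        /* Stand-ins for the patch's BUILD_BUG_ON() bounds checks. */
        _Static_assert(sizeof(struct mock_page) % 8 == 0, "8-byte multiple");
        _Static_assert(sizeof(struct mock_page) >= 56, "at least 56 bytes");
        _Static_assert(sizeof(struct mock_page) <= 80, "at most 80 bytes");

        /* sizeof() is constant, so the switch collapses at compile time
         * into a straight run of 8-byte stores. */
        switch (sizeof(struct mock_page)) {
        case 80: _pp[9] = 0; /* fallthrough */
        case 72: _pp[8] = 0; /* fallthrough */
        case 64: _pp[7] = 0; /* fallthrough */
        case 56: _pp[6] = 0;
                 _pp[5] = 0;
                 _pp[4] = 0;
                 _pp[3] = 0;
                 _pp[2] = 0;
                 _pp[1] = 0;
                 _pp[0] = 0;
        }
}

int main(void)
{
        struct mock_page p;
        size_t i;

        memset(&p, 0xff, sizeof(p));
        mock_zero_struct_page(&p);
        for (i = 0; i < sizeof(p.words) / sizeof(p.words[0]); i++)
                assert(p.words[i] == 0);
        return 0;
}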
From patchwork Fri Nov 30 21:52:58 2018
Subject: [mm PATCH v6 2/7] mm: Drop meminit_pfn_in_nid as it is redundant
From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Date: Fri, 30 Nov 2018 13:52:58 -0800
Message-ID: <154361477830.7497.18073959471440151885.stgit@ahduyck-desk1.amr.corp.intel.com>
In-Reply-To: <154361452447.7497.1348692079883153517.stgit@ahduyck-desk1.amr.corp.intel.com>
References: <154361452447.7497.1348692079883153517.stgit@ahduyck-desk1.amr.corp.intel.com>

As best as I can tell, the meminit_pfn_in_nid call is completely
redundant. The deferred memory initialization is already making use of
for_each_free_mem_range, which in turn calls into __next_mem_range,
which will only return a memory range if it matches the node ID
provided, assuming it is not NUMA_NO_NODE.

I am operating on the assumption that there are no zones or pg_data_t
structures that have a NUMA node of NUMA_NO_NODE associated with them.
If that is the case, then __next_mem_range will never return a memory
range that doesn't match the zone's node ID, and as such the check is
redundant.

One piece I would like to verify is whether this works for ia64.
Technically it was using a different approach to get the node ID, but
it seems to have the node ID also encoded into the memblock, so I am
assuming this is okay but would like to get confirmation on that.

On my x86_64 test system with 384GB of memory per node I saw a
reduction in initialization time from 2.80s to 1.85s as a result of
this patch.
Reviewed-by: Pavel Tatashin
Acked-by: Michal Hocko
Signed-off-by: Alexander Duyck
---
 mm/page_alloc.c |   51 ++++++++++++++------------------------------------
 1 file changed, 14 insertions(+), 37 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 20e1a8a2e98c..09969619ab48 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1308,36 +1308,22 @@ int __meminit early_pfn_to_nid(unsigned long pfn)
 #endif
 
 #ifdef CONFIG_NODES_SPAN_OTHER_NODES
-static inline bool __meminit __maybe_unused
-meminit_pfn_in_nid(unsigned long pfn, int node,
-                   struct mminit_pfnnid_cache *state)
+/* Only safe to use early in boot when initialisation is single-threaded */
+static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
 {
         int nid;
 
-        nid = __early_pfn_to_nid(pfn, state);
+        nid = __early_pfn_to_nid(pfn, &early_pfnnid_cache);
         if (nid >= 0 && nid != node)
                 return false;
         return true;
 }
 
-/* Only safe to use early in boot when initialisation is single-threaded */
-static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
-{
-        return meminit_pfn_in_nid(pfn, node, &early_pfnnid_cache);
-}
-
 #else
-
 static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
 {
         return true;
 }
-static inline bool __meminit __maybe_unused
-meminit_pfn_in_nid(unsigned long pfn, int node,
-                   struct mminit_pfnnid_cache *state)
-{
-        return true;
-}
 #endif
 
 
@@ -1466,21 +1452,13 @@ static inline void __init pgdat_init_report_one_done(void)
  *
  * Then, we check if a current large page is valid by only checking the validity
  * of the head pfn.
- *
- * Finally, meminit_pfn_in_nid is checked on systems where pfns can interleave
- * within a node: a pfn is between start and end of a node, but does not belong
- * to this memory node.
  */
-static inline bool __init
-deferred_pfn_valid(int nid, unsigned long pfn,
-                   struct mminit_pfnnid_cache *nid_init_state)
+static inline bool __init deferred_pfn_valid(unsigned long pfn)
 {
         if (!pfn_valid_within(pfn))
                 return false;
         if (!(pfn & (pageblock_nr_pages - 1)) && !pfn_valid(pfn))
                 return false;
-        if (!meminit_pfn_in_nid(pfn, nid, nid_init_state))
-                return false;
         return true;
 }
 
@@ -1488,15 +1466,14 @@ deferred_pfn_valid(int nid, unsigned long pfn,
  * Free pages to buddy allocator. Try to free aligned pages in
  * pageblock_nr_pages sizes.
  */
-static void __init deferred_free_pages(int nid, int zid, unsigned long pfn,
+static void __init deferred_free_pages(unsigned long pfn,
                                        unsigned long end_pfn)
 {
-        struct mminit_pfnnid_cache nid_init_state = { };
         unsigned long nr_pgmask = pageblock_nr_pages - 1;
         unsigned long nr_free = 0;
 
         for (; pfn < end_pfn; pfn++) {
-                if (!deferred_pfn_valid(nid, pfn, &nid_init_state)) {
+                if (!deferred_pfn_valid(pfn)) {
                         deferred_free_range(pfn - nr_free, nr_free);
                         nr_free = 0;
                 } else if (!(pfn & nr_pgmask)) {
@@ -1516,17 +1493,18 @@ static void __init deferred_free_pages(int nid, int zid, unsigned long pfn,
  * by performing it only once every pageblock_nr_pages.
  * Return number of pages initialized.
  */
-static unsigned long __init deferred_init_pages(int nid, int zid,
+static unsigned long __init deferred_init_pages(struct zone *zone,
                                                 unsigned long pfn,
                                                 unsigned long end_pfn)
 {
-        struct mminit_pfnnid_cache nid_init_state = { };
         unsigned long nr_pgmask = pageblock_nr_pages - 1;
+        int nid = zone_to_nid(zone);
         unsigned long nr_pages = 0;
+        int zid = zone_idx(zone);
         struct page *page = NULL;
 
         for (; pfn < end_pfn; pfn++) {
-                if (!deferred_pfn_valid(nid, pfn, &nid_init_state)) {
+                if (!deferred_pfn_valid(pfn)) {
                         page = NULL;
                         continue;
                 } else if (!page || !(pfn & nr_pgmask)) {
@@ -1589,12 +1567,12 @@ static int __init deferred_init_memmap(void *data)
         for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
                 spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
                 epfn = min_t(unsigned long, zone_end_pfn(zone), PFN_DOWN(epa));
-                nr_pages += deferred_init_pages(nid, zid, spfn, epfn);
+                nr_pages += deferred_init_pages(zone, spfn, epfn);
         }
         for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
                 spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
                 epfn = min_t(unsigned long, zone_end_pfn(zone), PFN_DOWN(epa));
-                deferred_free_pages(nid, zid, spfn, epfn);
+                deferred_free_pages(spfn, epfn);
         }
         pgdat_resize_unlock(pgdat, &flags);
 
@@ -1633,7 +1611,6 @@ static DEFINE_STATIC_KEY_TRUE(deferred_pages);
 static noinline bool __init
 deferred_grow_zone(struct zone *zone, unsigned int order)
 {
-        int zid = zone_idx(zone);
         int nid = zone_to_nid(zone);
         pg_data_t *pgdat = NODE_DATA(nid);
         unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION);
@@ -1683,7 +1660,7 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
         while (spfn < epfn && nr_pages < nr_pages_needed) {
                 t = ALIGN(spfn + PAGES_PER_SECTION, PAGES_PER_SECTION);
                 first_deferred_pfn = min(t, epfn);
-                nr_pages += deferred_init_pages(nid, zid, spfn,
+                nr_pages += deferred_init_pages(zone, spfn,
                                                 first_deferred_pfn);
                 spfn = first_deferred_pfn;
         }
@@ -1695,7 +1672,7 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
         for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
                 spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
                 epfn = min_t(unsigned long, first_deferred_pfn, PFN_DOWN(epa));
-                deferred_free_pages(nid, zid, spfn, epfn);
+                deferred_free_pages(spfn, epfn);
 
                 if (first_deferred_pfn == epfn)
                         break;
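The redundancy argument can be seen in a toy model. The following is an
invented stand-alone sketch, not the memblock API: once the range walk
itself filters on the node ID, a per-pfn node check inside the walk can
never fail, which is why dropping meminit_pfn_in_nid is safe under the
stated assumption about NUMA_NO_NODE:

#include <assert.h>
#include <stddef.h>

/* Invented toy type; the real code walks memblock regions. */
struct toy_range { unsigned long spfn, epfn; int nid; };

static const struct toy_range ranges[] = {
        { 0x0000, 0x1000, 0 },
        { 0x1000, 0x2000, 1 },
        { 0x2000, 0x3000, 0 },
};

int main(void)
{
        const int node = 0;
        size_t i;

        for (i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++) {
                /* the iterator itself filters on the node ID ... */
                if (ranges[i].nid != node)
                        continue;
                /* ... so a per-pfn node check inside the walk cannot fail */
                for (unsigned long pfn = ranges[i].spfn; pfn < ranges[i].epfn; pfn++)
                        assert(ranges[i].nid == node);
        }
        return 0;
}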
From patchwork Fri Nov 30 21:53:03 2018
Subject: [mm PATCH v6 3/7] mm: Implement new zone specific memblock iterator
From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Date: Fri, 30 Nov 2018 13:53:03 -0800
Message-ID: <154361478343.7497.6591693538181082582.stgit@ahduyck-desk1.amr.corp.intel.com>
In-Reply-To: <154361452447.7497.1348692079883153517.stgit@ahduyck-desk1.amr.corp.intel.com>
References: <154361452447.7497.1348692079883153517.stgit@ahduyck-desk1.amr.corp.intel.com>

Introduce a new iterator, for_each_free_mem_pfn_range_in_zone. This
iterator makes sure that a given memory range is in fact contained
within a zone. It takes care of all the bounds checking we were doing
in deferred_grow_zone and deferred_init_memmap. In addition, it should
help to speed up the search a bit by iterating until the end of a range
is greater than the start of the zone's pfn range, and it will exit
completely if a range's start is beyond the end of the zone.
Reviewed-by: Pavel Tatashin
Signed-off-by: Alexander Duyck
---
 include/linux/memblock.h |   25 ++++++++++++++++++
 mm/memblock.c            |   64 ++++++++++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c          |   31 +++++++++-------------
 3 files changed, 101 insertions(+), 19 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 64c41cf45590..95d1aaa3f412 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -247,6 +247,31 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
              i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+void __next_mem_pfn_range_in_zone(u64 *idx, struct zone *zone,
+                                  unsigned long *out_spfn,
+                                  unsigned long *out_epfn);
+/**
+ * for_each_free_mem_range_in_zone - iterate through zone specific free
+ * memblock areas
+ * @i: u64 used as loop variable
+ * @zone: zone in which all of the memory blocks reside
+ * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
+ * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
+ *
+ * Walks over free (memory && !reserved) areas of memblock in a specific
+ * zone. Available once memblock and an empty zone is initialized. The main
+ * assumption is that the zone start, end, and pgdat have been associated.
+ * This way we can use the zone to determine NUMA node, and if a given part
+ * of the memblock is valid for the zone.
+ */
+#define for_each_free_mem_pfn_range_in_zone(i, zone, p_start, p_end)   \
+        for (i = 0,                                                     \
+             __next_mem_pfn_range_in_zone(&i, zone, p_start, p_end);   \
+             i != U64_MAX;                                              \
+             __next_mem_pfn_range_in_zone(&i, zone, p_start, p_end))
+#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
+
 /**
  * for_each_free_mem_range - iterate through free memblock areas
  * @i: u64 used as loop variable

diff --git a/mm/memblock.c b/mm/memblock.c
index 57298abc7d98..0e49382033dd 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1247,6 +1247,70 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
         return 0;
 }
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+/**
+ * __next_mem_pfn_range_in_zone - iterator for for_each_*_range_in_zone()
+ *
+ * @idx: pointer to u64 loop variable
+ * @zone: zone in which all of the memory blocks reside
+ * @out_spfn: ptr to ulong for start pfn of the range, can be %NULL
+ * @out_epfn: ptr to ulong for end pfn of the range, can be %NULL
+ *
+ * This function is meant to be a zone/pfn specific wrapper for the
+ * for_each_mem_range type iterators. Specifically they are used in the
+ * deferred memory init routines and as such we were duplicating much of
+ * this logic throughout the code. So instead of having it in multiple
+ * locations it seemed like it would make more sense to centralize this to
+ * one new iterator that does everything they need.
+ */
+void __init_memblock
+__next_mem_pfn_range_in_zone(u64 *idx, struct zone *zone,
+                             unsigned long *out_spfn, unsigned long *out_epfn)
+{
+        int zone_nid = zone_to_nid(zone);
+        phys_addr_t spa, epa;
+        int nid;
+
+        __next_mem_range(idx, zone_nid, MEMBLOCK_NONE,
+                         &memblock.memory, &memblock.reserved,
+                         &spa, &epa, &nid);
+
+        while (*idx != U64_MAX) {
+                unsigned long epfn = PFN_DOWN(epa);
+                unsigned long spfn = PFN_UP(spa);
+
+                /*
+                 * Verify the end is at least past the start of the zone and
+                 * that we have at least one PFN to initialize.
+                 */
+                if (zone->zone_start_pfn < epfn && spfn < epfn) {
+                        /* if we went too far just stop searching */
+                        if (zone_end_pfn(zone) <= spfn) {
+                                *idx = U64_MAX;
+                                break;
+                        }
+
+                        if (out_spfn)
+                                *out_spfn = max(zone->zone_start_pfn, spfn);
+                        if (out_epfn)
+                                *out_epfn = min(zone_end_pfn(zone), epfn);
+
+                        return;
+                }
+
+                __next_mem_range(idx, zone_nid, MEMBLOCK_NONE,
+                                 &memblock.memory, &memblock.reserved,
+                                 &spa, &epa, &nid);
+        }
+
+        /* signal end of iteration */
+        if (out_spfn)
+                *out_spfn = ULONG_MAX;
+        if (out_epfn)
+                *out_epfn = 0;
+}
+
+#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
 
 static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
                                         phys_addr_t align, phys_addr_t start,

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 09969619ab48..72f9889e3866 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1523,11 +1523,9 @@ static unsigned long __init deferred_init_pages(struct zone *zone,
 static int __init deferred_init_memmap(void *data)
 {
         pg_data_t *pgdat = data;
-        int nid = pgdat->node_id;
         unsigned long start = jiffies;
         unsigned long nr_pages = 0;
         unsigned long spfn, epfn, first_init_pfn, flags;
-        phys_addr_t spa, epa;
         int zid;
         struct zone *zone;
         const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
@@ -1564,14 +1562,12 @@ static int __init deferred_init_memmap(void *data)
          * freeing pages we can access pages that are ahead (computing buddy
          * page in __free_one_page()).
          */
-        for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
-                spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
-                epfn = min_t(unsigned long, zone_end_pfn(zone), PFN_DOWN(epa));
+        for_each_free_mem_pfn_range_in_zone(i, zone, &spfn, &epfn) {
+                spfn = max_t(unsigned long, first_init_pfn, spfn);
                 nr_pages += deferred_init_pages(zone, spfn, epfn);
         }
-        for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
-                spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
-                epfn = min_t(unsigned long, zone_end_pfn(zone), PFN_DOWN(epa));
+        for_each_free_mem_pfn_range_in_zone(i, zone, &spfn, &epfn) {
+                spfn = max_t(unsigned long, first_init_pfn, spfn);
                 deferred_free_pages(spfn, epfn);
         }
         pgdat_resize_unlock(pgdat, &flags);
@@ -1579,8 +1575,8 @@ static int __init deferred_init_memmap(void *data)
         /* Sanity check that the next zone really is unpopulated */
         WARN_ON(++zid < MAX_NR_ZONES && populated_zone(++zone));
 
-        pr_info("node %d initialised, %lu pages in %ums\n", nid, nr_pages,
-                jiffies_to_msecs(jiffies - start));
+        pr_info("node %d initialised, %lu pages in %ums\n",
+                pgdat->node_id, nr_pages, jiffies_to_msecs(jiffies - start));
 
         pgdat_init_report_one_done();
         return 0;
@@ -1611,13 +1607,11 @@ static DEFINE_STATIC_KEY_TRUE(deferred_pages);
 static noinline bool __init
 deferred_grow_zone(struct zone *zone, unsigned int order)
 {
-        int nid = zone_to_nid(zone);
-        pg_data_t *pgdat = NODE_DATA(nid);
         unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION);
+        pg_data_t *pgdat = zone->zone_pgdat;
         unsigned long nr_pages = 0;
         unsigned long first_init_pfn, spfn, epfn, t, flags;
         unsigned long first_deferred_pfn = pgdat->first_deferred_pfn;
-        phys_addr_t spa, epa;
         u64 i;
 
         /* Only the last zone may have deferred pages */
@@ -1653,9 +1647,8 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
                 return false;
         }
 
-        for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
-                spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
-                epfn = min_t(unsigned long, zone_end_pfn(zone), PFN_DOWN(epa));
+        for_each_free_mem_pfn_range_in_zone(i, zone, &spfn, &epfn) {
+                spfn = max_t(unsigned long, first_init_pfn, spfn);
 
                 while (spfn < epfn && nr_pages < nr_pages_needed) {
                         t = ALIGN(spfn + PAGES_PER_SECTION, PAGES_PER_SECTION);
@@ -1669,9 +1662,9 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
                         break;
         }
 
-        for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
-                spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
-                epfn = min_t(unsigned long, first_deferred_pfn, PFN_DOWN(epa));
+        for_each_free_mem_pfn_range_in_zone(i, zone, &spfn, &epfn) {
+                spfn = max_t(unsigned long, first_init_pfn, spfn);
+                epfn = min_t(unsigned long, first_deferred_pfn, epfn);
                 deferred_free_pages(spfn, epfn);
 
                 if (first_deferred_pfn == epfn)
                         break;
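A toy model of the clamping behaviour the new iterator centralizes
(invented simplified types, not the real memblock/zone structures):
each free range is skipped if it ends before the zone, clipped to the
zone's pfn bounds if it overlaps, and the walk stops outright once a
range starts past the zone end:

#include <stdio.h>

struct zone_span { unsigned long start, end; };   /* stand-in for struct zone */
struct pfn_range { unsigned long spfn, epfn; };

static void walk_zone_ranges(const struct zone_span *z,
                             const struct pfn_range *r, int n)
{
        for (int i = 0; i < n; i++) {
                if (r[i].epfn <= z->start)      /* entirely before the zone */
                        continue;
                if (z->end <= r[i].spfn)        /* past the zone: stop searching */
                        break;
                /* clip the overlapping range to the zone's pfn bounds */
                unsigned long spfn = r[i].spfn > z->start ? r[i].spfn : z->start;
                unsigned long epfn = r[i].epfn < z->end ? r[i].epfn : z->end;
                printf("init pfns [%lu, %lu)\n", spfn, epfn);
        }
}

int main(void)
{
        struct zone_span zone = { 100, 500 };
        struct pfn_range free_ranges[] = {
                { 0, 50 }, { 80, 200 }, { 300, 600 }, { 700, 800 },
        };

        walk_zone_ranges(&zone, free_ranges, 4);  /* prints [100,200) and [300,500) */
        return 0;
}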
From patchwork Fri Nov 30 21:53:08 2018
Subject: [mm PATCH v6 4/7] mm: Initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections
From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Date: Fri, 30 Nov 2018 13:53:08 -0800
Message-ID: <154361478854.7497.15456929701404283744.stgit@ahduyck-desk1.amr.corp.intel.com>
In-Reply-To: <154361452447.7497.1348692079883153517.stgit@ahduyck-desk1.amr.corp.intel.com>
References: <154361452447.7497.1348692079883153517.stgit@ahduyck-desk1.amr.corp.intel.com>

Add yet another iterator, for_each_free_mem_pfn_range_in_zone_from, and
then use it to support initializing and freeing pages in groups no
larger than MAX_ORDER_NR_PAGES. By doing this we can greatly improve
the cache locality of the pages while we do several loops over them in
the init and freeing process.

We are able to tighten the loops further because the "from" iterator
lets us perform the initial checks for first_init_pfn in our first call
and then continue from that point without repeating those checks. This
priming is done in a new function, deferred_init_mem_pfn_range_in_zone,
which exits early if no valid range is found.

On my x86_64 test system with 384GB of memory per node I saw a
reduction in initialization time from 1.85s to 1.38s as a result of
this patch.
Reviewed-by: Pavel Tatashin
Signed-off-by: Alexander Duyck
---
 include/linux/memblock.h |   16 +++++
 mm/page_alloc.c          |  160 +++++++++++++++++++++++++++++++++-------------
 2 files changed, 132 insertions(+), 44 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 95d1aaa3f412..60e100fe5922 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -270,6 +270,22 @@ void __next_mem_pfn_range_in_zone(u64 *idx, struct zone *zone,
              __next_mem_pfn_range_in_zone(&i, zone, p_start, p_end);   \
              i != U64_MAX;                                              \
              __next_mem_pfn_range_in_zone(&i, zone, p_start, p_end))
+
+/**
+ * for_each_free_mem_range_in_zone_from - iterate through zone specific
+ * free memblock areas from a given point
+ * @i: u64 used as loop variable
+ * @zone: zone in which all of the memory blocks reside
+ * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
+ * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
+ *
+ * Walks over free (memory && !reserved) areas of memblock in a specific
+ * zone, continuing from current position. Available as soon as memblock is
+ * initialized.
+ */
+#define for_each_free_mem_pfn_range_in_zone_from(i, zone, p_start, p_end) \
+        for (; i != U64_MAX;                                            \
+             __next_mem_pfn_range_in_zone(&i, zone, p_start, p_end))
 #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
 
 /**

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 72f9889e3866..fbd9bd2bc262 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1519,16 +1519,102 @@ static unsigned long __init deferred_init_pages(struct zone *zone,
         return (nr_pages);
 }
 
+/*
+ * This function is meant to pre-load the iterator for the zone init.
+ * Specifically it walks through the ranges until we are caught up to the
+ * first_init_pfn value and exits there. If we never encounter the value we
+ * return false indicating there are no valid ranges left.
+ */
+static bool __init
+deferred_init_mem_pfn_range_in_zone(u64 *i, struct zone *zone,
+                                    unsigned long *spfn, unsigned long *epfn,
+                                    unsigned long first_init_pfn)
+{
+        u64 j;
+
+        /*
+         * Start out by walking through the ranges in this zone that have
+         * already been initialized. We don't need to do anything with them
+         * so we just need to flush them out of the system.
+         */
+        for_each_free_mem_pfn_range_in_zone(j, zone, spfn, epfn) {
+                if (*epfn <= first_init_pfn)
+                        continue;
+                if (*spfn < first_init_pfn)
+                        *spfn = first_init_pfn;
+                *i = j;
+                return true;
+        }
+
+        return false;
+}
+
+/*
+ * Initialize and free pages. We do it in two loops: first we initialize
+ * struct page, than free to buddy allocator, because while we are
+ * freeing pages we can access pages that are ahead (computing buddy
+ * page in __free_one_page()).
+ *
+ * In order to try and keep some memory in the cache we have the loop
+ * broken along max page order boundaries. This way we will not cause
+ * any issues with the buddy page computation.
+ */
+static unsigned long __init
+deferred_init_maxorder(u64 *i, struct zone *zone, unsigned long *start_pfn,
+                       unsigned long *end_pfn)
+{
+        unsigned long mo_pfn = ALIGN(*start_pfn + 1, MAX_ORDER_NR_PAGES);
+        unsigned long spfn = *start_pfn, epfn = *end_pfn;
+        unsigned long nr_pages = 0;
+        u64 j = *i;
+
+        /* First we loop through and initialize the page values */
+        for_each_free_mem_pfn_range_in_zone_from(j, zone, &spfn, &epfn) {
+                unsigned long t;
+
+                if (mo_pfn <= spfn)
+                        break;
+
+                t = min(mo_pfn, epfn);
+                nr_pages += deferred_init_pages(zone, spfn, t);
+
+                if (mo_pfn <= epfn)
+                        break;
+        }
+
+        /* Reset values and now loop through freeing pages as needed */
+        j = *i;
+
+        for_each_free_mem_pfn_range_in_zone_from(j, zone, start_pfn, end_pfn) {
+                unsigned long t;
+
+                if (mo_pfn <= *start_pfn)
+                        break;
+
+                t = min(mo_pfn, *end_pfn);
+                deferred_free_pages(*start_pfn, t);
+                *start_pfn = t;
+
+                if (mo_pfn < *end_pfn)
+                        break;
+        }
+
+        /* Store our current values to be reused on the next iteration */
+        *i = j;
+
+        return nr_pages;
+}
+
 /* Initialise remaining memory on a node */
 static int __init deferred_init_memmap(void *data)
 {
         pg_data_t *pgdat = data;
+        const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
+        unsigned long spfn = 0, epfn = 0, nr_pages = 0;
+        unsigned long first_init_pfn, flags;
         unsigned long start = jiffies;
-        unsigned long nr_pages = 0;
-        unsigned long spfn, epfn, first_init_pfn, flags;
-        int zid;
         struct zone *zone;
-        const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
+        int zid;
         u64 i;
 
         /* Bind memory initialisation thread to a local node if possible */
@@ -1554,22 +1640,20 @@ static int __init deferred_init_memmap(void *data)
                 if (first_init_pfn < zone_end_pfn(zone))
                         break;
         }
-        first_init_pfn = max(zone->zone_start_pfn, first_init_pfn);
+
+        /* If the zone is empty somebody else may have cleared out the zone */
+        if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn,
+                                                 first_init_pfn))
+                goto zone_empty;
 
         /*
-         * Initialize and free pages. We do it in two loops: first we initialize
-         * struct page, than free to buddy allocator, because while we are
-         * freeing pages we can access pages that are ahead (computing buddy
-         * page in __free_one_page()).
+         * Initialize and free pages in MAX_ORDER sized increments so
+         * that we can avoid introducing any issues with the buddy
+         * allocator.
          */
-        for_each_free_mem_pfn_range_in_zone(i, zone, &spfn, &epfn) {
-                spfn = max_t(unsigned long, first_init_pfn, spfn);
-                nr_pages += deferred_init_pages(zone, spfn, epfn);
-        }
-        for_each_free_mem_pfn_range_in_zone(i, zone, &spfn, &epfn) {
-                spfn = max_t(unsigned long, first_init_pfn, spfn);
-                deferred_free_pages(spfn, epfn);
-        }
+        while (spfn < epfn)
+                nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
+zone_empty:
         pgdat_resize_unlock(pgdat, &flags);
 
         /* Sanity check that the next zone really is unpopulated */
         WARN_ON(++zid < MAX_NR_ZONES && populated_zone(++zone));
@@ -1609,9 +1693,9 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
 {
         unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION);
         pg_data_t *pgdat = zone->zone_pgdat;
-        unsigned long nr_pages = 0;
-        unsigned long first_init_pfn, spfn, epfn, t, flags;
         unsigned long first_deferred_pfn = pgdat->first_deferred_pfn;
+        unsigned long spfn, epfn, flags;
+        unsigned long nr_pages = 0;
         u64 i;
 
         /* Only the last zone may have deferred pages */
@@ -1640,36 +1724,24 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
                 return true;
         }
 
-        first_init_pfn = max(zone->zone_start_pfn, first_deferred_pfn);
-
-        if (first_init_pfn >= pgdat_end_pfn(pgdat)) {
+        /* If the zone is empty somebody else may have cleared out the zone */
+        if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn,
+                                                 first_deferred_pfn)) {
+                pgdat->first_deferred_pfn = ULONG_MAX;
                 pgdat_resize_unlock(pgdat, &flags);
-                return false;
+                return true;
         }
 
-        for_each_free_mem_pfn_range_in_zone(i, zone, &spfn, &epfn) {
-                spfn = max_t(unsigned long, first_init_pfn, spfn);
-
-                while (spfn < epfn && nr_pages < nr_pages_needed) {
-                        t = ALIGN(spfn + PAGES_PER_SECTION, PAGES_PER_SECTION);
-                        first_deferred_pfn = min(t, epfn);
-                        nr_pages += deferred_init_pages(zone, spfn,
-                                                        first_deferred_pfn);
-                        spfn = first_deferred_pfn;
-                }
-
-                if (nr_pages >= nr_pages_needed)
-                        break;
+        /*
+         * Initialize and free pages in MAX_ORDER sized increments so
+         * that we can avoid introducing any issues with the buddy
+         * allocator.
+         */
+        while (spfn < epfn && nr_pages < nr_pages_needed) {
+                nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
+                first_deferred_pfn = spfn;
         }
 
-        for_each_free_mem_pfn_range_in_zone(i, zone, &spfn, &epfn) {
-                spfn = max_t(unsigned long, first_init_pfn, spfn);
-                epfn = min_t(unsigned long, first_deferred_pfn, epfn);
-                deferred_free_pages(spfn, epfn);
-
-                if (first_deferred_pfn == epfn)
-                        break;
-        }
         pgdat->first_deferred_pfn = first_deferred_pfn;
         pgdat_resize_unlock(pgdat, &flags);
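A stand-alone sketch of the chunking idea (CHUNK and the two stub
functions are invented stand-ins for MAX_ORDER_NR_PAGES,
deferred_init_pages, and deferred_free_pages): both passes stop at the
next MAX_ORDER-aligned boundary, so each block of struct pages is
handed to the buddy allocator while it is still cache-hot:

#include <stdio.h>

#define CHUNK 512UL     /* stand-in for MAX_ORDER_NR_PAGES */

static unsigned long align_up(unsigned long x, unsigned long a)
{
        return (x + a - 1) & ~(a - 1);
}

static void init_pages(unsigned long s, unsigned long e)
{ printf("init [%lu, %lu)\n", s, e); }

static void free_pages_range(unsigned long s, unsigned long e)
{ printf("free [%lu, %lu)\n", s, e); }

int main(void)
{
        unsigned long spfn = 100, epfn = 2000;

        while (spfn < epfn) {
                /* both passes stop at the next chunk-aligned boundary */
                unsigned long mo_pfn = align_up(spfn + 1, CHUNK);
                unsigned long t = mo_pfn < epfn ? mo_pfn : epfn;

                init_pages(spfn, t);            /* first pass: init struct pages */
                free_pages_range(spfn, t);      /* second pass: hand to buddy */
                spfn = t;
        }
        return 0;
}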
From patchwork Fri Nov 30 21:53:13 2018
Subject: [mm PATCH v6 5/7] mm: Move hot-plug specific memory init into separate functions and optimize
From: Alexander Duyck
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: sparclinux@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, davem@davemloft.net, pavel.tatashin@microsoft.com, mhocko@suse.com, mingo@kernel.org, kirill.shutemov@linux.intel.com, dan.j.williams@intel.com, dave.jiang@intel.com, alexander.h.duyck@linux.intel.com, rppt@linux.vnet.ibm.com, willy@infradead.org, vbabka@suse.cz, khalid.aziz@oracle.com, ldufour@linux.vnet.ibm.com, mgorman@techsingularity.net, yi.z.zhang@linux.intel.com, alexander.h.duyck@linux.intel.com
Date: Fri, 30 Nov 2018 13:53:13 -0800
Message-ID: <154361479366.7497.13916678539146224699.stgit@ahduyck-desk1.amr.corp.intel.com>
In-Reply-To: <154361452447.7497.1348692079883153517.stgit@ahduyck-desk1.amr.corp.intel.com>
References: <154361452447.7497.1348692079883153517.stgit@ahduyck-desk1.amr.corp.intel.com>
User-Agent: StGit/unknown-version

Combine the bits in memmap_init_zone and memmap_init_zone_device that are
related to hotplug into a single function called __memmap_init_hotplug.
Also take the opportunity to integrate the functionality of
__init_single_page into this function. In doing so we can eliminate some
of the redundancy, such as the duplicated handling of the LRU pointers
versus the pgmap pointer.
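As a rough illustration of what the new helper does (a standalone sketch
with a toy pageblock size of 8 pages, not the kernel code itself),
__memmap_init_hotplug walks the range from the top down in strides clipped
to pageblock boundaries, so no chunk handed to __init_pageblock ever
crosses a pageblock:

#include <stdio.h>

#define PAGEBLOCK 8UL /* toy value; the kernel uses pageblock_nr_pages */
#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))

int main(void)
{
	unsigned long start_pfn = 3, size = 20;
	unsigned long pfn = start_pfn + size;

	while (pfn != start_pfn) {
		unsigned long stride = pfn;

		/* step down to the previous pageblock boundary, but never
		 * below the start of the range (the kernel uses max()) */
		pfn = ALIGN_DOWN(pfn - 1, PAGEBLOCK);
		if (pfn < start_pfn)
			pfn = start_pfn;
		stride -= pfn;

		/* each (pfn, stride) chunk lies within one pageblock */
		printf("init pfns %lu..%lu (%lu pages)\n",
		       pfn, pfn + stride - 1, stride);
	}
	return 0;
}

For start_pfn = 3 and size = 20 this prints the chunks 16..22, 8..15, and
3..7, each confined to a single pageblock, which is exactly the invariant
the VM_BUG_ON in __init_pageblock (in the diff below) enforces.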
Reviewed-by: Pavel Tatashin Signed-off-by: Alexander Duyck --- mm/page_alloc.c | 208 ++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 135 insertions(+), 73 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index fbd9bd2bc262..416bbb6f05ab 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1181,8 +1181,9 @@ static void free_one_page(struct zone *zone, spin_unlock(&zone->lock); } -static void __meminit __init_single_page(struct page *page, unsigned long pfn, - unsigned long zone, int nid) +static void __meminit __init_struct_page_nolru(struct page *page, + unsigned long pfn, + unsigned long zone, int nid) { mm_zero_struct_page(page); set_page_links(page, zone, nid, pfn); @@ -1191,7 +1192,6 @@ static void __meminit __init_single_page(struct page *page, unsigned long pfn, page_cpupid_reset_last(page); page_kasan_tag_reset(page); - INIT_LIST_HEAD(&page->lru); #ifdef WANT_PAGE_VIRTUAL /* The shift won't overflow because ZONE_NORMAL is below 4G. */ if (!is_highmem_idx(zone)) @@ -1199,6 +1199,80 @@ static void __meminit __init_single_page(struct page *page, unsigned long pfn, #endif } +static void __meminit __init_single_page(struct page *page, unsigned long pfn, + unsigned long zone, int nid) +{ + __init_struct_page_nolru(page, pfn, zone, nid); + INIT_LIST_HEAD(&page->lru); +} + +static void __meminit __init_pageblock(unsigned long start_pfn, + unsigned long nr_pages, + unsigned long zone, int nid, + struct dev_pagemap *pgmap) +{ + unsigned long nr_pgmask = pageblock_nr_pages - 1; + struct page *start_page = pfn_to_page(start_pfn); + unsigned long pfn = start_pfn + nr_pages - 1; + struct page *page; + + /* + * Enforce the following requirements: + * size > 0 + * size < pageblock_nr_pages + * start_pfn -> pfn does not cross pageblock_nr_pages boundary + */ + VM_BUG_ON(((start_pfn ^ pfn) | (nr_pages - 1)) > nr_pgmask); + + /* + * Work from highest page to lowest, this way we will still be + * warm in the cache when we call set_pageblock_migratetype + * below. + * + * The loop is based around the page pointer as the main index + * instead of the pfn because pfn is not used inside the loop if + * the section number is not in page flags and WANT_PAGE_VIRTUAL + * is not defined. + */ + for (page = start_page + nr_pages; page-- != start_page; pfn--) { + __init_struct_page_nolru(page, pfn, zone, nid); + /* + * Mark page reserved as it will need to wait for onlining + * phase for it to be fully associated with a zone. + * + * We can use the non-atomic __set_bit operation for setting + * the flag as we are still initializing the pages. + */ + __SetPageReserved(page); + /* + * ZONE_DEVICE pages union ->lru with a ->pgmap back + * pointer and hmm_data. It is a bug if a ZONE_DEVICE + * page is ever freed or placed on a driver-private list. + */ + page->pgmap = pgmap; + if (!pgmap) + INIT_LIST_HEAD(&page->lru); + } + + /* + * Mark the block movable so that blocks are reserved for + * movable at startup. This will force kernel allocations + * to reserve their blocks rather than leaking throughout + * the address space during boot when many long-lived + * kernel allocations are made. + * + * bitmap is created for zone's valid pfn range. but memmap + * can be created for invalid pages (for alignment) + * check here not to call set_pageblock_migratetype() against + * pfn out of zone. 
+ * + * Please note that MEMMAP_HOTPLUG path doesn't clear memmap + * because this is done early in sparse_add_one_section + */ + if (!(start_pfn & nr_pgmask)) + set_pageblock_migratetype(start_page, MIGRATE_MOVABLE); +} + #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT static void __meminit init_reserved_page(unsigned long pfn) { @@ -5693,6 +5767,25 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn) return false; } +static void __meminit __memmap_init_hotplug(unsigned long size, int nid, + unsigned long zone, + unsigned long start_pfn, + struct dev_pagemap *pgmap) +{ + unsigned long pfn = start_pfn + size; + + while (pfn != start_pfn) { + unsigned long stride = pfn; + + pfn = max(ALIGN_DOWN(pfn - 1, pageblock_nr_pages), start_pfn); + stride -= pfn; + + __init_pageblock(pfn, stride, zone, nid, pgmap); + + cond_resched(); + } +} + /* * Initially all pages are reserved - free ones are freed * up by memblock_free_all() once the early boot process is @@ -5703,49 +5796,59 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, struct vmem_altmap *altmap) { unsigned long pfn, end_pfn = start_pfn + size; - struct page *page; if (highest_memmap_pfn < end_pfn - 1) highest_memmap_pfn = end_pfn - 1; + if (context == MEMMAP_HOTPLUG) { #ifdef CONFIG_ZONE_DEVICE - /* - * Honor reservation requested by the driver for this ZONE_DEVICE - * memory. We limit the total number of pages to initialize to just - * those that might contain the memory mapping. We will defer the - * ZONE_DEVICE page initialization until after we have released - * the hotplug lock. - */ - if (zone == ZONE_DEVICE) { - if (!altmap) - return; + /* + * Honor reservation requested by the driver for this + * ZONE_DEVICE memory. We limit the total number of pages to + * initialize to just those that might contain the memory + * mapping. We will defer the ZONE_DEVICE page initialization + * until after we have released the hotplug lock. + */ + if (zone == ZONE_DEVICE) { + if (!altmap) + return; + + if (start_pfn == altmap->base_pfn) + start_pfn += altmap->reserve; + end_pfn = altmap->base_pfn + + vmem_altmap_offset(altmap); + } +#endif + /* + * For these ZONE_DEVICE pages we don't need to record the + * pgmap as they should represent only those pages used to + * store the memory map. The actual ZONE_DEVICE pages will + * be initialized later. + */ + __memmap_init_hotplug(end_pfn - start_pfn, nid, zone, + start_pfn, NULL); - if (start_pfn == altmap->base_pfn) - start_pfn += altmap->reserve; - end_pfn = altmap->base_pfn + vmem_altmap_offset(altmap); + return; } -#endif for (pfn = start_pfn; pfn < end_pfn; pfn++) { + struct page *page; + /* * There can be holes in boot-time mem_map[]s handed to this * function. They do not exist on hotplugged memory. 
*/ - if (context == MEMMAP_EARLY) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) - continue; - if (overlap_memmap_init(zone, &pfn)) - continue; - if (defer_init(nid, pfn, end_pfn)) - break; - } + if (!early_pfn_valid(pfn)) + continue; + if (!early_pfn_in_nid(pfn, nid)) + continue; + if (overlap_memmap_init(zone, &pfn)) + continue; + if (defer_init(nid, pfn, end_pfn)) + break; page = pfn_to_page(pfn); __init_single_page(page, pfn, zone, nid); - if (context == MEMMAP_HOTPLUG) - __SetPageReserved(page); /* * Mark the block movable so that blocks are reserved for @@ -5772,7 +5875,6 @@ void __ref memmap_init_zone_device(struct zone *zone, unsigned long size, struct dev_pagemap *pgmap) { - unsigned long pfn, end_pfn = start_pfn + size; struct pglist_data *pgdat = zone->zone_pgdat; unsigned long zone_idx = zone_idx(zone); unsigned long start = jiffies; @@ -5788,53 +5890,13 @@ void __ref memmap_init_zone_device(struct zone *zone, */ if (pgmap->altmap_valid) { struct vmem_altmap *altmap = &pgmap->altmap; + unsigned long end_pfn = start_pfn + size; start_pfn = altmap->base_pfn + vmem_altmap_offset(altmap); size = end_pfn - start_pfn; } - for (pfn = start_pfn; pfn < end_pfn; pfn++) { - struct page *page = pfn_to_page(pfn); - - __init_single_page(page, pfn, zone_idx, nid); - - /* - * Mark page reserved as it will need to wait for onlining - * phase for it to be fully associated with a zone. - * - * We can use the non-atomic __set_bit operation for setting - * the flag as we are still initializing the pages. - */ - __SetPageReserved(page); - - /* - * ZONE_DEVICE pages union ->lru with a ->pgmap back - * pointer and hmm_data. It is a bug if a ZONE_DEVICE - * page is ever freed or placed on a driver-private list. - */ - page->pgmap = pgmap; - page->hmm_data = 0; - - /* - * Mark the block movable so that blocks are reserved for - * movable at startup. This will force kernel allocations - * to reserve their blocks rather than leaking throughout - * the address space during boot when many long-lived - * kernel allocations are made. - * - * bitmap is created for zone's valid pfn range. but memmap - * can be created for invalid pages (for alignment) - * check here not to call set_pageblock_migratetype() against - * pfn out of zone. 
-		 *
-		 * Please note that MEMMAP_HOTPLUG path doesn't clear memmap
-		 * because this is done early in sparse_add_one_section
-		 */
-		if (!(pfn & (pageblock_nr_pages - 1))) {
-			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-			cond_resched();
-		}
-	}
+	__memmap_init_hotplug(size, nid, zone_idx, start_pfn, pgmap);

 	pr_info("%s initialised, %lu pages in %ums\n", dev_name(pgmap->dev),
 		size, jiffies_to_msecs(jiffies - start));
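For a concrete sense of the altmap handling in memmap_init_zone_device
above (the numbers here are purely illustrative): if altmap->base_pfn is
0x100000 and vmem_altmap_offset() returns 0x800, then pfns 0x100000 through
0x1007ff are set aside for the driver reservation plus the space backing
the memmap itself, so start_pfn is advanced to 0x100800 and size shrinks by
0x800 pages before __memmap_init_hotplug is called.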
From patchwork Fri Nov 30 21:53:18 2018
Subject: [mm PATCH v6 6/7] mm: Add reserved flag setting to set_page_links
From: Alexander Duyck
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: sparclinux@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, davem@davemloft.net, pavel.tatashin@microsoft.com, mhocko@suse.com, mingo@kernel.org, kirill.shutemov@linux.intel.com, dan.j.williams@intel.com, dave.jiang@intel.com, alexander.h.duyck@linux.intel.com, rppt@linux.vnet.ibm.com, willy@infradead.org, vbabka@suse.cz, khalid.aziz@oracle.com, ldufour@linux.vnet.ibm.com, mgorman@techsingularity.net, yi.z.zhang@linux.intel.com, alexander.h.duyck@linux.intel.com
Date: Fri, 30 Nov 2018 13:53:18 -0800
Message-ID: <154361479877.7497.2824031260670152276.stgit@ahduyck-desk1.amr.corp.intel.com>
In-Reply-To: <154361452447.7497.1348692079883153517.stgit@ahduyck-desk1.amr.corp.intel.com>
References: <154361452447.7497.1348692079883153517.stgit@ahduyck-desk1.amr.corp.intel.com>
User-Agent: StGit/unknown-version

Modify the set_page_links function to include the setting of the reserved
flag via a simple AND and OR operation. The motivation for this is that
the existing __set_bit call still appears to affect performance, as
replacing it with the AND and OR reduces initialization time.

Looking over the assembly code before and after the change, the main
difference is that the reserved bit is folded into a value generated
outside of the main initialization loop and is then written to page->flags
together with the other flags field values in a single store. Previously
the generated value was written first and then a btsq instruction was
issued.

On my x86_64 test system with 3TB of persistent memory per node I saw the
persistent memory initialization time on average drop from 23.49s to
19.12s per node.

Reviewed-by: Pavel Tatashin
Signed-off-by: Alexander Duyck
---
 include/linux/mm.h |    9 ++++++++-
 mm/page_alloc.c    |   39 +++++++++++++++++++++++++--------------
 2 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index eb6e52b66bc2..5faf66dd4559 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1238,11 +1238,18 @@ static inline void set_page_node(struct page *page, unsigned long node)
 	page->flags |= (node & NODES_MASK) << NODES_PGSHIFT;
 }
 
+static inline void set_page_reserved(struct page *page, bool reserved)
+{
+	page->flags &= ~(1ul << PG_reserved);
+	page->flags |= (unsigned long)(!!reserved) << PG_reserved;
+}
+
 static inline void set_page_links(struct page *page, enum zone_type zone,
-				  unsigned long node, unsigned long pfn)
+				  unsigned long node, unsigned long pfn, bool reserved)
 {
 	set_page_zone(page, zone);
 	set_page_node(page, node);
+	set_page_reserved(page, reserved);
 #ifdef SECTION_IN_PAGE_FLAGS
 	set_page_section(page, pfn_to_section_nr(pfn));
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 416bbb6f05ab..61eb9945d805 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1183,10 +1183,16 @@ static void free_one_page(struct zone *zone,
 
 static void __meminit __init_struct_page_nolru(struct page *page,
 					       unsigned long pfn,
-					       unsigned long zone, int nid)
+					       unsigned long zone, int nid,
+					       bool is_reserved)
 {
 	mm_zero_struct_page(page);
-	set_page_links(page, zone, nid, pfn);
+
+	/*
+	 * We can use a non-atomic operation for setting the
+	 * PG_reserved flag as we are still initializing the pages.
+ */ + set_page_links(page, zone, nid, pfn, is_reserved); init_page_count(page); page_mapcount_reset(page); page_cpupid_reset_last(page); @@ -1202,14 +1208,15 @@ static void __meminit __init_struct_page_nolru(struct page *page, static void __meminit __init_single_page(struct page *page, unsigned long pfn, unsigned long zone, int nid) { - __init_struct_page_nolru(page, pfn, zone, nid); + __init_struct_page_nolru(page, pfn, zone, nid, false); INIT_LIST_HEAD(&page->lru); } static void __meminit __init_pageblock(unsigned long start_pfn, unsigned long nr_pages, unsigned long zone, int nid, - struct dev_pagemap *pgmap) + struct dev_pagemap *pgmap, + bool is_reserved) { unsigned long nr_pgmask = pageblock_nr_pages - 1; struct page *start_page = pfn_to_page(start_pfn); @@ -1235,15 +1242,8 @@ static void __meminit __init_pageblock(unsigned long start_pfn, * is not defined. */ for (page = start_page + nr_pages; page-- != start_page; pfn--) { - __init_struct_page_nolru(page, pfn, zone, nid); - /* - * Mark page reserved as it will need to wait for onlining - * phase for it to be fully associated with a zone. - * - * We can use the non-atomic __set_bit operation for setting - * the flag as we are still initializing the pages. - */ - __SetPageReserved(page); + __init_struct_page_nolru(page, pfn, zone, nid, is_reserved); + /* * ZONE_DEVICE pages union ->lru with a ->pgmap back * pointer and hmm_data. It is a bug if a ZONE_DEVICE @@ -5780,7 +5780,18 @@ static void __meminit __memmap_init_hotplug(unsigned long size, int nid, pfn = max(ALIGN_DOWN(pfn - 1, pageblock_nr_pages), start_pfn); stride -= pfn; - __init_pageblock(pfn, stride, zone, nid, pgmap); + /* + * The last argument of __init_pageblock is a boolean + * value indicating if the page will be marked as reserved. + * + * Mark page reserved as it will need to wait for onlining + * phase for it to be fully associated with a zone. + * + * Under certain circumstances ZONE_DEVICE pages may not + * need to be marked as reserved, however there is still + * code that is depending on this being set for now. 
+		 */
+		__init_pageblock(pfn, stride, zone, nid, pgmap, true);
 
 		cond_resched();
 	}
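Before the next patch, a standalone sketch of the flag-update technique
this patch relies on; PG_RESERVED here is an illustrative local constant,
not the kernel's enum pageflags value. Clearing with AND and merging with
OR produces a branchless update the compiler can fold into the single
store described in the commit message, rather than a store followed by a
btsq instruction.

#include <stdio.h>
#include <stdbool.h>

#define PG_RESERVED 10UL /* illustrative bit position */

/* Branchless set-or-clear of one flag bit, mirroring the AND + OR
 * pattern used by set_page_reserved(). */
static unsigned long set_reserved(unsigned long flags, bool reserved)
{
	flags &= ~(1UL << PG_RESERVED);
	flags |= (unsigned long)reserved << PG_RESERVED;
	return flags;
}

int main(void)
{
	unsigned long flags = 0;

	flags = set_reserved(flags, true);
	printf("flags = %#lx\n", flags);	/* prints 0x400 */
	flags = set_reserved(flags, false);
	printf("flags = %#lx\n", flags);	/* prints 0 */
	return 0;
}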
From patchwork Fri Nov 30 21:53:23 2018
Subject: [mm PATCH v6 7/7] mm: Use common iterator for deferred_init_pages and deferred_free_pages
From: Alexander Duyck
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: sparclinux@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, davem@davemloft.net, pavel.tatashin@microsoft.com, mhocko@suse.com, mingo@kernel.org, kirill.shutemov@linux.intel.com, dan.j.williams@intel.com, dave.jiang@intel.com, alexander.h.duyck@linux.intel.com, rppt@linux.vnet.ibm.com, willy@infradead.org, vbabka@suse.cz, khalid.aziz@oracle.com, ldufour@linux.vnet.ibm.com, mgorman@techsingularity.net, yi.z.zhang@linux.intel.com, alexander.h.duyck@linux.intel.com
Date: Fri, 30 Nov 2018 13:53:23 -0800
Message-ID: <154361480390.7497.9730184349746888133.stgit@ahduyck-desk1.amr.corp.intel.com>
In-Reply-To: <154361452447.7497.1348692079883153517.stgit@ahduyck-desk1.amr.corp.intel.com>
References: <154361452447.7497.1348692079883153517.stgit@ahduyck-desk1.amr.corp.intel.com>
User-Agent: StGit/unknown-version
Create a common iterator to be used by both deferred_init_pages and
deferred_free_pages. By doing this we can cut down a bit on code overhead,
as they will likely both be inlined into the same function anyway.

This new approach also allows deferred_init_pages to make use of
__init_pageblock, reducing code size by sharing code between the hotplug
and deferred memory init code paths.

An additional benefit is improved cache locality of the memory init: we
can focus on the memory areas related to identifying whether a given PFN
is valid and keep that data warm in the cache until we transition to a
region of a different type. So we will stream through a chunk of valid
blocks before we turn to initializing page structs.

On my x86_64 test system with 384GB of memory per node I saw a reduction
in initialization time from 1.38s to 1.06s as a result of this patch.

Reviewed-by: Pavel Tatashin
Signed-off-by: Alexander Duyck
---
 mm/page_alloc.c |  146 +++++++++++++++++++++++++++++--------------------------
 1 file changed, 77 insertions(+), 69 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 61eb9945d805..48c6fc73a70d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1481,32 +1481,6 @@ void clear_zone_contiguous(struct zone *zone)
 }
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
-static void __init deferred_free_range(unsigned long pfn,
-				       unsigned long nr_pages)
-{
-	struct page *page;
-	unsigned long i;
-
-	if (!nr_pages)
-		return;
-
-	page = pfn_to_page(pfn);
-
-	/* Free a large naturally-aligned chunk if possible */
-	if (nr_pages == pageblock_nr_pages &&
-	    (pfn & (pageblock_nr_pages - 1)) == 0) {
-		set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-		__free_pages_core(page, pageblock_order);
-		return;
-	}
-
-	for (i = 0; i < nr_pages; i++, page++, pfn++) {
-		if ((pfn & (pageblock_nr_pages - 1)) == 0)
-			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-		__free_pages_core(page, 0);
-	}
-}
-
 /* Completion tracking for deferred_init_memmap() threads */
 static atomic_t pgdat_init_n_undone __initdata;
 static __initdata DECLARE_COMPLETION(pgdat_init_all_done_comp);
@@ -1518,48 +1492,89 @@ static inline void __init pgdat_init_report_one_done(void)
 }
 
 /*
- * Returns true if page needs to be initialized or freed to buddy allocator.
+ * Returns count if page range needs to be initialized or freed
  *
- * First we check if pfn is valid on architectures where it is possible to have
- * holes within pageblock_nr_pages. On systems where it is not possible, this
- * function is optimized out.
+ * First we check if the contiguous pfns are valid on architectures where it
+ * is possible to have holes within pageblock_nr_pages. On systems where it
+ * is not possible, this function is optimized out.
+ *
+ * Then, we check if a current large page is valid by only checking the
+ * validity of the head pfn.
  *
- * Then, we check if a current large page is valid by only checking the validity
- * of the head pfn.
*/ -static inline bool __init deferred_pfn_valid(unsigned long pfn) +static unsigned long __next_pfn_valid_range(unsigned long *pfn, + unsigned long *i, + unsigned long end_pfn) { - if (!pfn_valid_within(pfn)) - return false; - if (!(pfn & (pageblock_nr_pages - 1)) && !pfn_valid(pfn)) - return false; - return true; + unsigned long start_pfn = *i; + + while (start_pfn < end_pfn) { + unsigned long t = ALIGN(start_pfn + 1, pageblock_nr_pages); + unsigned long pageblock_pfn = min(t, end_pfn); + unsigned long count = 0; + +#ifndef CONFIG_HOLES_IN_ZONE + if (pfn_valid(start_pfn)) + count = pageblock_pfn - start_pfn; + start_pfn = pageblock_pfn; +#else + while (start_pfn < pageblock_pfn) { + if (pfn_valid(start_pfn++)) { + count++; + continue; + } + + if (!count) + continue; + + /* + * The last PFN was invalid, report the block of + * PFNs we currently have available and skip over + * the invalid one. + */ + *pfn = start_pfn - (count + 1); + *i = start_pfn; + return count; + } +#endif + if (!count) + continue; + + *pfn = start_pfn - count; + *i = start_pfn; + return count; + } + + return 0; } +#define for_each_deferred_pfn_valid_range(pfn, count, i, start_pfn, end_pfn) \ + for (i = (start_pfn), \ + count = __next_pfn_valid_range(&pfn, &i, (end_pfn)); \ + count; \ + count = __next_pfn_valid_range(&pfn, &i, (end_pfn))) + /* * Free pages to buddy allocator. Try to free aligned pages in * pageblock_nr_pages sizes. */ -static void __init deferred_free_pages(unsigned long pfn, +static void __init deferred_free_pages(unsigned long start_pfn, unsigned long end_pfn) { - unsigned long nr_pgmask = pageblock_nr_pages - 1; - unsigned long nr_free = 0; - - for (; pfn < end_pfn; pfn++) { - if (!deferred_pfn_valid(pfn)) { - deferred_free_range(pfn - nr_free, nr_free); - nr_free = 0; - } else if (!(pfn & nr_pgmask)) { - deferred_free_range(pfn - nr_free, nr_free); - nr_free = 1; - touch_nmi_watchdog(); + unsigned long i, pfn, count; + + for_each_deferred_pfn_valid_range(pfn, count, i, start_pfn, end_pfn) { + struct page *page = pfn_to_page(pfn); + + if (count == pageblock_nr_pages) { + __free_pages_core(page, pageblock_order); } else { - nr_free++; + while (count--) + __free_pages_core(page++, 0); } + + touch_nmi_watchdog(); } - /* Free the last block of pages to allocator */ - deferred_free_range(pfn - nr_free, nr_free); } /* @@ -1568,29 +1583,22 @@ static void __init deferred_free_pages(unsigned long pfn, * Return number of pages initialized. */ static unsigned long __init deferred_init_pages(struct zone *zone, - unsigned long pfn, + unsigned long start_pfn, unsigned long end_pfn) { - unsigned long nr_pgmask = pageblock_nr_pages - 1; int nid = zone_to_nid(zone); + unsigned long i, pfn, count; unsigned long nr_pages = 0; int zid = zone_idx(zone); - struct page *page = NULL; - for (; pfn < end_pfn; pfn++) { - if (!deferred_pfn_valid(pfn)) { - page = NULL; - continue; - } else if (!page || !(pfn & nr_pgmask)) { - page = pfn_to_page(pfn); - touch_nmi_watchdog(); - } else { - page++; - } - __init_single_page(page, pfn, zid, nid); - nr_pages++; + for_each_deferred_pfn_valid_range(pfn, count, i, start_pfn, end_pfn) { + nr_pages += count; + __init_pageblock(pfn, count, zid, nid, NULL, false); + + touch_nmi_watchdog(); } - return (nr_pages); + + return nr_pages; } /*
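To close, here is a self-contained sketch of the iterator pattern this
patch introduces. It is simplified rather than the kernel implementation:
a toy is_valid() stands in for pfn_valid(), the pageblock size is 8, and
it mirrors the shape of __next_pfn_valid_range in the configuration that
checks only the head pfn of each block.

#include <stdio.h>
#include <stdbool.h>

#define BLOCK 8UL /* toy pageblock_nr_pages */

/* toy validity map: pfns 0..15 valid, 16..23 invalid, 24 and up valid */
static bool is_valid(unsigned long pfn)
{
	return pfn < 16 || pfn >= 24;
}

/*
 * Returns the length of the next run of valid pfns, storing the run's
 * start in *pfn and the resume cursor in *i.
 */
static unsigned long next_valid_range(unsigned long *pfn, unsigned long *i,
				      unsigned long end_pfn)
{
	unsigned long start = *i;

	while (start < end_pfn) {
		/* end of the current pageblock, clipped to the range */
		unsigned long block_end = (start + BLOCK) & ~(BLOCK - 1);
		unsigned long count = 0;

		if (block_end > end_pfn)
			block_end = end_pfn;
		if (is_valid(start))
			count = block_end - start;
		start = block_end;

		if (!count)
			continue;

		*pfn = start - count;
		*i = start;
		return count;
	}
	return 0;
}

#define for_each_valid_range(pfn, count, i, start_pfn, end_pfn)	\
	for (i = (start_pfn),						\
	     count = next_valid_range(&pfn, &i, (end_pfn));		\
	     count;							\
	     count = next_valid_range(&pfn, &i, (end_pfn)))

int main(void)
{
	unsigned long i, pfn = 0, count;

	for_each_valid_range(pfn, count, i, 2, 32)
		printf("valid run: %lu..%lu (%lu pfns)\n",
		       pfn, pfn + count - 1, count);
	return 0;
}

For the range 2..32 this reports the runs 2..7, 8..15, and 24..31: whole
invalid pageblocks are skipped in a single step while valid pfns come back
as block-sized runs, which is the streaming behavior the commit message
describes.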