From patchwork Wed Oct 17 23:54:08 2018
X-Patchwork-Submitter: Alexander Duyck <alexander.h.duyck@linux.intel.com>
X-Patchwork-Id: 10646613
Subject: [mm PATCH v4 1/6] mm: Use mm_zero_struct_page from SPARC on all 64b architectures
From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: pavel.tatashin@microsoft.com, mhocko@suse.com, dave.jiang@intel.com, alexander.h.duyck@linux.intel.com, linux-kernel@vger.kernel.org, willy@infradead.org, davem@davemloft.net, yi.z.zhang@linux.intel.com, khalid.aziz@oracle.com, rppt@linux.vnet.ibm.com, vbabka@suse.cz, sparclinux@vger.kernel.org, dan.j.williams@intel.com, ldufour@linux.vnet.ibm.com, mgorman@techsingularity.net, mingo@kernel.org, kirill.shutemov@linux.intel.com
Date: Wed, 17 Oct 2018 16:54:08 -0700
Message-ID: <20181017235408.17213.38641.stgit@localhost.localdomain>
In-Reply-To: <20181017235043.17213.92459.stgit@localhost.localdomain>
References: <20181017235043.17213.92459.stgit@localhost.localdomain>
User-Agent: StGit/0.17.1-dirty

This change makes it so that we use the same approach that was already in
use on SPARC on all the architectures that support a 64b long.

This is mostly motivated by the fact that 7 to 10 store/move instructions
are likely always going to be faster than having to call into a function
that is not specialized for handling page init.

An added advantage to doing it this way is that the compiler can get away
with combining writes in the __init_single_page call. As a result the
memset call is replaced by what amounts to only about 4 write operations,
or at least that is what I am seeing with GCC 6.2, as the flags, LRU
pointers, and count/mapcount seem to be cancelling out at least 4 of the 8
assignments on my system.

One change I had to make to the function was to reduce the minimum page
size to 56 to support some powerpc64 configurations.

This should introduce no functional change on SPARC since it already had
this code.

In the case of x86_64 I saw a reduction from 3.75s to 2.80s when
initializing 384GB of RAM per node. Pavel Tatashin tested on a system with
Broadcom's Stingray CPU and 48GB of RAM and found that
__init_single_page() takes 19.30ns / 64-byte struct page before this patch
and 17.33ns / 64-byte struct page with it. Mike Rapoport ran a similar
test on an OpenPower (S812LC 8348-21C) with a Power8 processor and 128GB
of RAM. His results per 64-byte struct page were 4.68ns before, and 4.59ns
after this patch.

Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Acked-by: Michal Hocko <mhocko@suse.com>
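For readers who want to study the trick in isolation, here is a minimal
user-space sketch of the same switch-of-stores pattern (not the kernel
code: mock_page, its field names, and the 64-byte layout are invented
stand-ins for struct page). Compiling it with GCC at -O2 shows the switch
collapse into a handful of 8-byte stores that the optimizer is free to
merge, instead of a memset() call:

/* Minimal user-space sketch of the switch-of-stores zeroing trick.
 * mock_page is a hypothetical 64-byte stand-in for struct page.
 */
#include <stdio.h>
#include <string.h>

struct mock_page {              /* 64 bytes on an LP64 target */
        unsigned long flags;
        unsigned long lru[2];   /* stand-in for the LRU pointers */
        unsigned long mapping;
        unsigned long index;
        unsigned long private_data;
        int mapcount;           /* stand-in for count/mapcount */
        int refcount;
        unsigned long pad;      /* hypothetical padding to reach 64 bytes */
};

static inline void mock_zero_page(struct mock_page *page)
{
        unsigned long *_pp = (void *)page;

        /* compile-time guard, similar in spirit to BUILD_BUG_ON() */
        _Static_assert(sizeof(struct mock_page) == 64, "expect 64 bytes");

        switch (sizeof(struct mock_page)) {
        case 64:        /* eight 8-byte stores; candidates for combining */
                _pp[7] = 0;
                _pp[6] = 0;
                _pp[5] = 0;
                _pp[4] = 0;
                _pp[3] = 0;
                _pp[2] = 0;
                _pp[1] = 0;
                _pp[0] = 0;
        }
}

int main(void)
{
        struct mock_page p;

        memset(&p, 0xff, sizeof(p));
        mock_zero_page(&p);
        printf("flags after zeroing: %lu\n", p.flags);  /* prints 0 */
        return 0;
}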
---
 arch/sparc/include/asm/pgtable_64.h |   30 --------------------------
 include/linux/mm.h                  |   41 ++++++++++++++++++++++++++++++++---
 2 files changed, 38 insertions(+), 33 deletions(-)

diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 1393a8ac596b..22500c3be7a9 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -231,36 +231,6 @@ extern struct page *mem_map_zero;
 #define ZERO_PAGE(vaddr)        (mem_map_zero)
 
-/* This macro must be updated when the size of struct page grows above 80
- * or reduces below 64.
- * The idea that compiler optimizes out switch() statement, and only
- * leaves clrx instructions
- */
-#define mm_zero_struct_page(pp) do {                                    \
-        unsigned long *_pp = (void *)(pp);                              \
-                                                                        \
-        /* Check that struct page is either 64, 72, or 80 bytes */     \
-        BUILD_BUG_ON(sizeof(struct page) & 7);                          \
-        BUILD_BUG_ON(sizeof(struct page) < 64);                         \
-        BUILD_BUG_ON(sizeof(struct page) > 80);                         \
-                                                                        \
-        switch (sizeof(struct page)) {                                  \
-        case 80:                                                        \
-                _pp[9] = 0;     /* fallthrough */                       \
-        case 72:                                                        \
-                _pp[8] = 0;     /* fallthrough */                       \
-        default:                                                        \
-                _pp[7] = 0;                                             \
-                _pp[6] = 0;                                             \
-                _pp[5] = 0;                                             \
-                _pp[4] = 0;                                             \
-                _pp[3] = 0;                                             \
-                _pp[2] = 0;                                             \
-                _pp[1] = 0;                                             \
-                _pp[0] = 0;                                             \
-        }                                                               \
-} while (0)
-
 /* PFNs are real physical page numbers. However, mem_map only begins to record
  * per-page information starting at pfn_base. This is to handle systems where
  * the first physical page in the machine is at some huge physical address,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index fcf9cc9d535f..6e2c9631af05 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -98,10 +98,45 @@ static inline void set_max_mapnr(unsigned long limit) { }
 
 /*
  * On some architectures it is expensive to call memset() for small sizes.
- * Those architectures should provide their own implementation of "struct page"
- * zeroing by defining this macro in <asm/pgtable.h>.
+ * If an architecture decides to implement their own version of
+ * mm_zero_struct_page they should wrap the defines below in a #ifndef and
+ * define their own version of this macro in <asm/pgtable.h>
  */
-#ifndef mm_zero_struct_page
+#if BITS_PER_LONG == 64
+/* This function must be updated when the size of struct page grows above 80
+ * or reduces below 56. The idea that compiler optimizes out switch()
+ * statement, and only leaves move/store instructions. Also the compiler can
+ * combine write statments if they are both assignments and can be reordered,
+ * this can result in several of the writes here being dropped.
+ */
+#define mm_zero_struct_page(pp) __mm_zero_struct_page(pp)
+static inline void __mm_zero_struct_page(struct page *page)
+{
+        unsigned long *_pp = (void *)page;
+
+        /* Check that struct page is either 56, 64, 72, or 80 bytes */
+        BUILD_BUG_ON(sizeof(struct page) & 7);
+        BUILD_BUG_ON(sizeof(struct page) < 56);
+        BUILD_BUG_ON(sizeof(struct page) > 80);
+
+        switch (sizeof(struct page)) {
+        case 80:
+                _pp[9] = 0;     /* fallthrough */
+        case 72:
+                _pp[8] = 0;     /* fallthrough */
+        case 64:
+                _pp[7] = 0;     /* fallthrough */
+        case 56:
+                _pp[6] = 0;
+                _pp[5] = 0;
+                _pp[4] = 0;
+                _pp[3] = 0;
+                _pp[2] = 0;
+                _pp[1] = 0;
+                _pp[0] = 0;
+        }
+}
+#else
 #define mm_zero_struct_page(pp)  ((void)memset((pp), 0, sizeof(struct page)))
 #endif

From patchwork Wed Oct 17 23:54:13 2018
X-Patchwork-Submitter: Alexander Duyck <alexander.h.duyck@linux.intel.com>
X-Patchwork-Id: 10646615
Subject: [mm PATCH v4 2/6] mm: Drop meminit_pfn_in_nid as it is redundant
From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: pavel.tatashin@microsoft.com, mhocko@suse.com, dave.jiang@intel.com, alexander.h.duyck@linux.intel.com, linux-kernel@vger.kernel.org, willy@infradead.org, davem@davemloft.net, yi.z.zhang@linux.intel.com, khalid.aziz@oracle.com, rppt@linux.vnet.ibm.com, vbabka@suse.cz, sparclinux@vger.kernel.org, dan.j.williams@intel.com, ldufour@linux.vnet.ibm.com, mgorman@techsingularity.net, mingo@kernel.org, kirill.shutemov@linux.intel.com
Date: Wed, 17 Oct 2018 16:54:13 -0700
Message-ID: <20181017235413.17213.73254.stgit@localhost.localdomain>
In-Reply-To: <20181017235043.17213.92459.stgit@localhost.localdomain>
References: <20181017235043.17213.92459.stgit@localhost.localdomain>
User-Agent: StGit/0.17.1-dirty

As best as I can tell the meminit_pfn_in_nid call is completely redundant.
The deferred memory initialization is already making use of
for_each_free_mem_range, which in turn calls into __next_mem_range, which
will only return a memory range if it matches the node ID provided,
assuming it is not NUMA_NO_NODE.

I am operating on the assumption that there are no zones or pg_data_t
structures that have a NUMA node of NUMA_NO_NODE associated with them. If
that is the case then __next_mem_range will never return a memory range
that doesn't match the zone's node ID, and as such the check is redundant.

One piece I would like to verify is whether this works for ia64.
Technically it was using a different approach to get the node ID, but it
seems to have the node ID also encoded into the memblock. So I am assuming
this is okay, but would like to get confirmation on that.

On my x86_64 test system with 384GB of memory per node I saw a reduction
in initialization time from 2.80s to 1.85s as a result of this patch.
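To illustrate the reasoning, here is a small user-space model (everything
in it is invented for the example: the range table, the -1 stand-in for
NUMA_NO_NODE, and the function names). If the walker only ever yields
ranges whose recorded node matches the requested one, the way
__next_mem_range does, a per-pfn node recheck inside the loop can never
fail:

#include <stdbool.h>
#include <stdio.h>

struct mem_range {
        unsigned long start_pfn;
        unsigned long end_pfn;
        int nid;
};

/* Invented example data: two ranges on two nodes. */
static const struct mem_range ranges[] = {
        { 0x0000, 0x8000, 0 },
        { 0x8000, 0x10000, 1 },
};
#define NR_RANGES (sizeof(ranges) / sizeof(ranges[0]))

/* Mirrors the relevant behaviour of __next_mem_range(): skip ranges whose
 * node id does not match, unless the caller passed -1 (our stand-in for
 * NUMA_NO_NODE). */
static bool next_range_for_nid(unsigned int *idx, int nid,
                               unsigned long *spfn, unsigned long *epfn)
{
        for (; *idx < NR_RANGES; (*idx)++) {
                if (nid != -1 && ranges[*idx].nid != nid)
                        continue;
                *spfn = ranges[*idx].start_pfn;
                *epfn = ranges[*idx].end_pfn;
                (*idx)++;
                return true;
        }
        return false;
}

int main(void)
{
        unsigned long spfn, epfn;
        unsigned int idx = 0;

        /* Every pfn in every returned range already belongs to node 1,
         * so a meminit_pfn_in_nid()-style recheck would always pass. */
        while (next_range_for_nid(&idx, 1, &spfn, &epfn))
                printf("node 1 range: %lx-%lx\n", spfn, epfn);
        return 0;
}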
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
 mm/page_alloc.c |   50 ++++++++++++++------------------------------------
 1 file changed, 14 insertions(+), 36 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4bd858d1c3ba..a766a15fad81 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1301,36 +1301,22 @@ int __meminit early_pfn_to_nid(unsigned long pfn)
 #endif
 
 #ifdef CONFIG_NODES_SPAN_OTHER_NODES
-static inline bool __meminit __maybe_unused
-meminit_pfn_in_nid(unsigned long pfn, int node,
-                   struct mminit_pfnnid_cache *state)
+/* Only safe to use early in boot when initialisation is single-threaded */
+static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
 {
         int nid;
 
-        nid = __early_pfn_to_nid(pfn, state);
+        nid = __early_pfn_to_nid(pfn, &early_pfnnid_cache);
         if (nid >= 0 && nid != node)
                 return false;
         return true;
 }
 
-/* Only safe to use early in boot when initialisation is single-threaded */
-static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
-{
-        return meminit_pfn_in_nid(pfn, node, &early_pfnnid_cache);
-}
-
 #else
-
 static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
 {
         return true;
 }
-static inline bool __meminit __maybe_unused
-meminit_pfn_in_nid(unsigned long pfn, int node,
-                   struct mminit_pfnnid_cache *state)
-{
-        return true;
-}
 #endif
 
@@ -1459,21 +1445,13 @@ static inline void __init pgdat_init_report_one_done(void)
  *
  * Then, we check if a current large page is valid by only checking the validity
  * of the head pfn.
- *
- * Finally, meminit_pfn_in_nid is checked on systems where pfns can interleave
- * within a node: a pfn is between start and end of a node, but does not belong
- * to this memory node.
  */
-static inline bool __init
-deferred_pfn_valid(int nid, unsigned long pfn,
-                   struct mminit_pfnnid_cache *nid_init_state)
+static inline bool __init deferred_pfn_valid(unsigned long pfn)
 {
         if (!pfn_valid_within(pfn))
                 return false;
         if (!(pfn & (pageblock_nr_pages - 1)) && !pfn_valid(pfn))
                 return false;
-        if (!meminit_pfn_in_nid(pfn, nid, nid_init_state))
-                return false;
         return true;
 }
 
@@ -1481,15 +1459,14 @@ static inline void __init pgdat_init_report_one_done(void)
  * Free pages to buddy allocator. Try to free aligned pages in
  * pageblock_nr_pages sizes.
  */
-static void __init deferred_free_pages(int nid, int zid, unsigned long pfn,
+static void __init deferred_free_pages(unsigned long pfn,
                                        unsigned long end_pfn)
 {
-        struct mminit_pfnnid_cache nid_init_state = { };
         unsigned long nr_pgmask = pageblock_nr_pages - 1;
         unsigned long nr_free = 0;
 
         for (; pfn < end_pfn; pfn++) {
-                if (!deferred_pfn_valid(nid, pfn, &nid_init_state)) {
+                if (!deferred_pfn_valid(pfn)) {
                         deferred_free_range(pfn - nr_free, nr_free);
                         nr_free = 0;
                 } else if (!(pfn & nr_pgmask)) {
@@ -1509,17 +1486,18 @@ static void __init deferred_free_pages(int nid, int zid, unsigned long pfn,
  * by performing it only once every pageblock_nr_pages.
  * Return number of pages initialized.
  */
-static unsigned long __init deferred_init_pages(int nid, int zid,
+static unsigned long __init deferred_init_pages(struct zone *zone,
                                                 unsigned long pfn,
                                                 unsigned long end_pfn)
 {
-        struct mminit_pfnnid_cache nid_init_state = { };
         unsigned long nr_pgmask = pageblock_nr_pages - 1;
+        int nid = zone_to_nid(zone);
         unsigned long nr_pages = 0;
+        int zid = zone_idx(zone);
         struct page *page = NULL;
 
         for (; pfn < end_pfn; pfn++) {
-                if (!deferred_pfn_valid(nid, pfn, &nid_init_state)) {
+                if (!deferred_pfn_valid(pfn)) {
                         page = NULL;
                         continue;
                 } else if (!page || !(pfn & nr_pgmask)) {
@@ -1582,12 +1560,12 @@ static int __init deferred_init_memmap(void *data)
         for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
                 spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
                 epfn = min_t(unsigned long, zone_end_pfn(zone), PFN_DOWN(epa));
-                nr_pages += deferred_init_pages(nid, zid, spfn, epfn);
+                nr_pages += deferred_init_pages(zone, spfn, epfn);
         }
         for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
                 spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
                 epfn = min_t(unsigned long, zone_end_pfn(zone), PFN_DOWN(epa));
-                deferred_free_pages(nid, zid, spfn, epfn);
+                deferred_free_pages(spfn, epfn);
         }
         pgdat_resize_unlock(pgdat, &flags);
@@ -1676,7 +1654,7 @@ static int __init deferred_init_memmap(void *data)
                 while (spfn < epfn && nr_pages < nr_pages_needed) {
                         t = ALIGN(spfn + PAGES_PER_SECTION, PAGES_PER_SECTION);
                         first_deferred_pfn = min(t, epfn);
-                        nr_pages += deferred_init_pages(nid, zid, spfn,
+                        nr_pages += deferred_init_pages(zone, spfn,
                                                         first_deferred_pfn);
                         spfn = first_deferred_pfn;
                 }
@@ -1688,7 +1666,7 @@ static int __init deferred_init_memmap(void *data)
         for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
                 spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
                 epfn = min_t(unsigned long, first_deferred_pfn, PFN_DOWN(epa));
-                deferred_free_pages(nid, zid, spfn, epfn);
+                deferred_free_pages(spfn, epfn);
 
                 if (first_deferred_pfn == epfn)
                         break;

From patchwork Wed Oct 17 23:54:19 2018
X-Patchwork-Submitter: Alexander Duyck <alexander.h.duyck@linux.intel.com>
X-Patchwork-Id: 10646617
Subject: [mm PATCH v4 3/6] mm: Use memblock/zone specific iterator for handling deferred page init
From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: pavel.tatashin@microsoft.com, mhocko@suse.com, dave.jiang@intel.com, alexander.h.duyck@linux.intel.com, linux-kernel@vger.kernel.org, willy@infradead.org, davem@davemloft.net, yi.z.zhang@linux.intel.com, khalid.aziz@oracle.com, rppt@linux.vnet.ibm.com, vbabka@suse.cz, sparclinux@vger.kernel.org, dan.j.williams@intel.com, ldufour@linux.vnet.ibm.com, mgorman@techsingularity.net, mingo@kernel.org, kirill.shutemov@linux.intel.com
Date: Wed, 17 Oct 2018 16:54:19 -0700
Message-ID: <20181017235419.17213.68425.stgit@localhost.localdomain>
In-Reply-To: <20181017235043.17213.92459.stgit@localhost.localdomain>
References: <20181017235043.17213.92459.stgit@localhost.localdomain>
User-Agent: StGit/0.17.1-dirty

This patch introduces a new iterator, for_each_free_mem_pfn_range_in_zone.
This iterator takes care of making sure that a given memory range is in
fact contained within a zone. It handles all of the bounds checking we
were doing in deferred_grow_zone and deferred_init_memmap. In addition it
should help to speed up the search a bit by iterating until the end of a
range is greater than the start of the zone pfn range, and will exit
completely if the start is beyond the end of the zone.

This patch adds yet another iterator,
for_each_free_mem_pfn_range_in_zone_from, and then uses it to support
initializing and freeing pages in groups no larger than
MAX_ORDER_NR_PAGES. By doing this we can greatly improve the cache
locality of the pages while we do several loops over them in the init and
freeing process.

We are able to tighten the loops as a result since we only really need the
checks for first_init_pfn in our first iteration; after that we can assume
that all future values will be greater than it. So I have added a function
called deferred_init_mem_pfn_range_in_zone that primes the iterator and,
if it fails, we can just exit. A sketch of that priming step follows the
diff below.

On my x86_64 test system with 384GB of memory per node I saw a reduction
in initialization time from 1.85s to 1.38s as a result of this patch.
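As a rough user-space model of the chunked two-pass walk this enables (the
range table, chunk size, and helper names are invented for the example;
the real code walks memblock ranges through the new iterators and touches
real struct pages):

#include <stdio.h>

#define MAX_ORDER_NR_PAGES 1024UL       /* invented chunk size for the demo */

struct range { unsigned long spfn, epfn; };

static const struct range free_ranges[] = {    /* invented data */
        { 100, 1500 },
        { 2000, 5000 },
};
#define NR_RANGES (sizeof(free_ranges) / sizeof(free_ranges[0]))

/* Stand-ins for deferred_init_pages()/deferred_free_pages(). */
static unsigned long init_pages(unsigned long s, unsigned long e)
{
        return e - s;
}

static void free_pages_range(unsigned long s, unsigned long e)
{
        (void)s;
        (void)e;
}

int main(void)
{
        unsigned long nr_pages = 0;

        for (unsigned long i = 0; i < NR_RANGES; i++) {
                unsigned long spfn = free_ranges[i].spfn;
                unsigned long epfn = free_ranges[i].epfn;

                while (spfn < epfn) {
                        /* ALIGN(spfn + 1, MAX_ORDER_NR_PAGES): advance one
                         * MAX_ORDER-aligned chunk at a time. */
                        unsigned long mo_pfn = (spfn + MAX_ORDER_NR_PAGES) &
                                               ~(MAX_ORDER_NR_PAGES - 1);
                        unsigned long t = mo_pfn < epfn ? mo_pfn : epfn;

                        /* Pass 1 inits the chunk, pass 2 frees it while the
                         * same struct pages are still warm in the cache. */
                        nr_pages += init_pages(spfn, t);
                        free_pages_range(spfn, t);
                        spfn = t;
                }
        }
        printf("initialised %lu pages\n", nr_pages);
        return 0;
}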
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
 include/linux/memblock.h |   58 +++++++++++++++
 mm/memblock.c            |   63 ++++++++++++++++
 mm/page_alloc.c          |  176 ++++++++++++++++++++++++++++++++--------------
 3 files changed, 242 insertions(+), 55 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index aee299a6aa76..2ddd1bafdd03 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -178,6 +178,25 @@ void __next_reserved_mem_region(u64 *idx, phys_addr_t *out_start,
                                p_start, p_end, p_nid))
 
 /**
+ * for_each_mem_range_from - iterate through memblock areas from type_a and not
+ * included in type_b. Or just type_a if type_b is NULL.
+ * @i: u64 used as loop variable
+ * @type_a: ptr to memblock_type to iterate
+ * @type_b: ptr to memblock_type which excludes from the iteration
+ * @nid: node selector, %NUMA_NO_NODE for all nodes
+ * @flags: pick from blocks based on memory attributes
+ * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
+ * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
+ * @p_nid: ptr to int for nid of the range, can be %NULL
+ */
+#define for_each_mem_range_from(i, type_a, type_b, nid, flags,         \
+                                p_start, p_end, p_nid)                 \
+        for (i = 0, __next_mem_range(&i, nid, flags, type_a, type_b,   \
+                                     p_start, p_end, p_nid);           \
+             i != (u64)ULLONG_MAX;                                     \
+             __next_mem_range(&i, nid, flags, type_a, type_b,          \
+                              p_start, p_end, p_nid))
+/**
  * for_each_mem_range_rev - reverse iterate through memblock areas from
  * type_a and not included in type_b. Or just type_a if type_b is NULL.
  * @i: u64 used as loop variable
@@ -248,6 +267,45 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
              i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+void __next_mem_pfn_range_in_zone(u64 *idx, struct zone *zone,
+                                  unsigned long *out_spfn,
+                                  unsigned long *out_epfn);
+/**
+ * for_each_free_mem_range_in_zone - iterate through zone specific free
+ * memblock areas
+ * @i: u64 used as loop variable
+ * @zone: zone in which all of the memory blocks reside
+ * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
+ * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
+ *
+ * Walks over free (memory && !reserved) areas of memblock in a specific
+ * zone. Available as soon as memblock is initialized.
+ */
+#define for_each_free_mem_pfn_range_in_zone(i, zone, p_start, p_end)   \
+        for (i = 0,                                                    \
+             __next_mem_pfn_range_in_zone(&i, zone, p_start, p_end);   \
+             i != (u64)ULLONG_MAX;                                     \
+             __next_mem_pfn_range_in_zone(&i, zone, p_start, p_end))
+
+/**
+ * for_each_free_mem_range_in_zone_from - iterate through zone specific
+ * free memblock areas from a given point
+ * @i: u64 used as loop variable
+ * @zone: zone in which all of the memory blocks reside
+ * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
+ * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
+ *
+ * Walks over free (memory && !reserved) areas of memblock in a specific
+ * zone, continuing from current position. Available as soon as memblock is
+ * initialized.
+ */
+#define for_each_free_mem_pfn_range_in_zone_from(i, zone, p_start, p_end) \
+        for (; i != (u64)ULLONG_MAX;                                      \
+             __next_mem_pfn_range_in_zone(&i, zone, p_start, p_end))
+
+#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
+
 /**
  * for_each_free_mem_range - iterate through free memblock areas
  * @i: u64 used as loop variable
diff --git a/mm/memblock.c b/mm/memblock.c
index f2ef3915a356..ab3545e356b7 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1239,6 +1239,69 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
         return 0;
 }
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+/**
+ * __next_mem_pfn_range_in_zone - iterator for for_each_*_range_in_zone()
+ *
+ * @idx: pointer to u64 loop variable
+ * @zone: zone in which all of the memory blocks reside
+ * @out_start: ptr to ulong for start pfn of the range, can be %NULL
+ * @out_end: ptr to ulong for end pfn of the range, can be %NULL
+ *
+ * This function is meant to be a zone/pfn specific wrapper for the
+ * for_each_mem_range type iterators. Specifically they are used in the
+ * deferred memory init routines and as such we were duplicating much of
+ * this logic throughout the code. So instead of having it in multiple
+ * locations it seemed like it would make more sense to centralize this to
+ * one new iterator that does everything they need.
+ */
+void __init_memblock
+__next_mem_pfn_range_in_zone(u64 *idx, struct zone *zone,
+                             unsigned long *out_spfn, unsigned long *out_epfn)
+{
+        int zone_nid = zone_to_nid(zone);
+        phys_addr_t spa, epa;
+        int nid;
+
+        __next_mem_range(idx, zone_nid, MEMBLOCK_NONE,
+                         &memblock.memory, &memblock.reserved,
+                         &spa, &epa, &nid);
+
+        while (*idx != ULLONG_MAX) {
+                unsigned long epfn = PFN_DOWN(epa);
+                unsigned long spfn = PFN_UP(spa);
+
+                /*
+                 * Verify the end is at least past the start of the zone and
+                 * that we have at least one PFN to initialize.
+                 */
+                if (zone->zone_start_pfn < epfn && spfn < epfn) {
+                        /* if we went too far just stop searching */
+                        if (zone_end_pfn(zone) <= spfn)
+                                break;
+
+                        if (out_spfn)
+                                *out_spfn = max(zone->zone_start_pfn, spfn);
+                        if (out_epfn)
+                                *out_epfn = min(zone_end_pfn(zone), epfn);
+
+                        return;
+                }
+
+                __next_mem_range(idx, zone_nid, MEMBLOCK_NONE,
+                                 &memblock.memory, &memblock.reserved,
+                                 &spa, &epa, &nid);
+        }
+
+        /* signal end of iteration */
+        *idx = ULLONG_MAX;
+        if (out_spfn)
+                *out_spfn = ULONG_MAX;
+        if (out_epfn)
+                *out_epfn = 0;
+}
+
+#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
 
 #ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
 unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a766a15fad81..20e9eb35d75d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1512,19 +1512,103 @@ static unsigned long __init deferred_init_pages(struct zone *zone,
         return (nr_pages);
 }
 
+/*
+ * This function is meant to pre-load the iterator for the zone init.
+ * Specifically it walks through the ranges until we are caught up to the
+ * first_init_pfn value and exits there. If we never encounter the value we
+ * return false indicating there are no valid ranges left.
+ */
+static bool __init
+deferred_init_mem_pfn_range_in_zone(u64 *i, struct zone *zone,
+                                    unsigned long *spfn, unsigned long *epfn,
+                                    unsigned long first_init_pfn)
+{
+        u64 j;
+
+        /*
+         * Start out by walking through the ranges in this zone that have
+         * already been initialized. We don't need to do anything with them
+         * so we just need to flush them out of the system.
+         */
+        for_each_free_mem_pfn_range_in_zone(j, zone, spfn, epfn) {
+                if (*epfn <= first_init_pfn)
+                        continue;
+                if (*spfn < first_init_pfn)
+                        *spfn = first_init_pfn;
+                *i = j;
+                return true;
+        }
+
+        return false;
+}
+
+/*
+ * Initialize and free pages. We do it in two loops: first we initialize
+ * struct page, than free to buddy allocator, because while we are
+ * freeing pages we can access pages that are ahead (computing buddy
+ * page in __free_one_page()).
+ *
+ * In order to try and keep some memory in the cache we have the loop
+ * broken along max page order boundaries. This way we will not cause
+ * any issues with the buddy page computation.
+ */
+static unsigned long __init
+deferred_init_maxorder(u64 *i, struct zone *zone, unsigned long *start_pfn,
+                       unsigned long *end_pfn)
+{
+        unsigned long mo_pfn = ALIGN(*start_pfn + 1, MAX_ORDER_NR_PAGES);
+        unsigned long spfn = *start_pfn, epfn = *end_pfn;
+        unsigned long nr_pages = 0;
+        u64 j = *i;
+
+        /* First we loop through and initialize the page values */
+        for_each_free_mem_pfn_range_in_zone_from(j, zone, &spfn, &epfn) {
+                unsigned long t;
+
+                if (mo_pfn <= spfn)
+                        break;
+
+                t = min(mo_pfn, epfn);
+                nr_pages += deferred_init_pages(zone, spfn, t);
+
+                if (mo_pfn <= epfn)
+                        break;
+        }
+
+        /* Reset values and now loop through freeing pages as needed */
+        j = *i;
+
+        for_each_free_mem_pfn_range_in_zone_from(j, zone, start_pfn, end_pfn) {
+                unsigned long t;
+
+                if (mo_pfn <= *start_pfn)
+                        break;
+
+                t = min(mo_pfn, *end_pfn);
+                deferred_free_pages(*start_pfn, t);
+                *start_pfn = t;
+
+                if (mo_pfn < *end_pfn)
+                        break;
+        }
+
+        /* Store our current values to be reused on the next iteration */
+        *i = j;
+
+        return nr_pages;
+}
+
 /* Initialise remaining memory on a node */
 static int __init deferred_init_memmap(void *data)
 {
         pg_data_t *pgdat = data;
-        int nid = pgdat->node_id;
+        const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
+        unsigned long spfn = 0, epfn = 0, nr_pages = 0;
+        unsigned long first_init_pfn, flags;
         unsigned long start = jiffies;
-        unsigned long nr_pages = 0;
-        unsigned long spfn, epfn, first_init_pfn, flags;
-        phys_addr_t spa, epa;
-        int zid;
         struct zone *zone;
-        const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
         u64 i;
+        int zid;
 
         /* Bind memory initialisation thread to a local node if possible */
         if (!cpumask_empty(cpumask))
@@ -1549,31 +1633,30 @@ static int __init deferred_init_memmap(void *data)
                 if (first_init_pfn < zone_end_pfn(zone))
                         break;
         }
-        first_init_pfn = max(zone->zone_start_pfn, first_init_pfn);
+
+        /* If the zone is empty somebody else may have cleared out the zone */
+        if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn,
+                                                 first_init_pfn)) {
+                pgdat_resize_unlock(pgdat, &flags);
+                pgdat_init_report_one_done();
+                return 0;
+        }
 
         /*
-         * Initialize and free pages. We do it in two loops: first we initialize
-         * struct page, than free to buddy allocator, because while we are
-         * freeing pages we can access pages that are ahead (computing buddy
-         * page in __free_one_page()).
+         * Initialize and free pages in MAX_ORDER sized increments so
+         * that we can avoid introducing any issues with the buddy
+         * allocator.
          */
-        for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
-                spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
-                epfn = min_t(unsigned long, zone_end_pfn(zone), PFN_DOWN(epa));
-                nr_pages += deferred_init_pages(zone, spfn, epfn);
-        }
-        for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
-                spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
-                epfn = min_t(unsigned long, zone_end_pfn(zone), PFN_DOWN(epa));
-                deferred_free_pages(spfn, epfn);
-        }
+        while (spfn < epfn)
+                nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
+
         pgdat_resize_unlock(pgdat, &flags);
 
         /* Sanity check that the next zone really is unpopulated */
         WARN_ON(++zid < MAX_NR_ZONES && populated_zone(++zone));
 
-        pr_info("node %d initialised, %lu pages in %ums\n", nid, nr_pages,
-                jiffies_to_msecs(jiffies - start));
+        pr_info("node %d initialised, %lu pages in %ums\n",
+                pgdat->node_id, nr_pages, jiffies_to_msecs(jiffies - start));
 
         pgdat_init_report_one_done();
         return 0;
@@ -1604,14 +1687,11 @@ static int __init deferred_init_memmap(void *data)
 static noinline bool __init
 deferred_grow_zone(struct zone *zone, unsigned int order)
 {
-        int zid = zone_idx(zone);
-        int nid = zone_to_nid(zone);
-        pg_data_t *pgdat = NODE_DATA(nid);
         unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION);
-        unsigned long nr_pages = 0;
-        unsigned long first_init_pfn, spfn, epfn, t, flags;
+        pg_data_t *pgdat = zone->zone_pgdat;
         unsigned long first_deferred_pfn = pgdat->first_deferred_pfn;
-        phys_addr_t spa, epa;
+        unsigned long spfn, epfn, flags;
+        unsigned long nr_pages = 0;
         u64 i;
 
         /* Only the last zone may have deferred pages */
@@ -1640,37 +1720,23 @@ static int __init deferred_init_memmap(void *data)
                 return true;
         }
 
-        first_init_pfn = max(zone->zone_start_pfn, first_deferred_pfn);
-
-        if (first_init_pfn >= pgdat_end_pfn(pgdat)) {
+        /* If the zone is empty somebody else may have cleared out the zone */
+        if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn,
+                                                 first_deferred_pfn)) {
                 pgdat_resize_unlock(pgdat, &flags);
-                return false;
+                return true;
         }
 
-        for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
-                spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
-                epfn = min_t(unsigned long, zone_end_pfn(zone), PFN_DOWN(epa));
-
-                while (spfn < epfn && nr_pages < nr_pages_needed) {
-                        t = ALIGN(spfn + PAGES_PER_SECTION, PAGES_PER_SECTION);
-                        first_deferred_pfn = min(t, epfn);
-                        nr_pages += deferred_init_pages(zone, spfn,
-                                                        first_deferred_pfn);
-                        spfn = first_deferred_pfn;
-                }
-
-                if (nr_pages >= nr_pages_needed)
-                        break;
+        /*
+         * Initialize and free pages in MAX_ORDER sized increments so
+         * that we can avoid introducing any issues with the buddy
+         * allocator.
+         */
+        while (spfn < epfn && nr_pages < nr_pages_needed) {
+                nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
+                first_deferred_pfn = spfn;
+        }
-
-        for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
-                spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
-                epfn = min_t(unsigned long, first_deferred_pfn, PFN_DOWN(epa));
-                deferred_free_pages(spfn, epfn);
-
-                if (first_deferred_pfn == epfn)
-                        break;
-        }
 
         pgdat->first_deferred_pfn = first_deferred_pfn;
         pgdat_resize_unlock(pgdat, &flags);
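The priming step mentioned above can be sketched on its own (user-space,
invented data and names; the kernel version additionally clamps against
the zone bounds and hands back a memblock iterator cursor):

#include <stdbool.h>
#include <stdio.h>

struct range { unsigned long spfn, epfn; };

static const struct range zone_ranges[] = {    /* invented data */
        { 100, 200 },
        { 300, 400 },
};
#define NR (sizeof(zone_ranges) / sizeof(zone_ranges[0]))

/* Skip whole ranges that end at or before first_init_pfn, clamp the
 * first usable range, and hand back a cursor the caller resumes from. */
static bool prime_iterator(unsigned long *i, unsigned long *spfn,
                           unsigned long *epfn, unsigned long first_init_pfn)
{
        for (unsigned long j = 0; j < NR; j++) {
                if (zone_ranges[j].epfn <= first_init_pfn)
                        continue;       /* fully initialised already */
                *spfn = zone_ranges[j].spfn < first_init_pfn ?
                        first_init_pfn : zone_ranges[j].spfn;
                *epfn = zone_ranges[j].epfn;
                *i = j;                 /* resume point for later walks */
                return true;
        }
        return false;                   /* nothing left to initialise */
}

int main(void)
{
        unsigned long i, spfn, epfn;

        if (prime_iterator(&i, &spfn, &epfn, 350))
                printf("resume at range %lu: %lu-%lu\n", i, spfn, epfn);
        return 0;
}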
From patchwork Wed Oct 17 23:54:25 2018
X-Patchwork-Submitter: Alexander Duyck <alexander.h.duyck@linux.intel.com>
X-Patchwork-Id: 10646619
Subject: [mm PATCH v4 4/6] mm: Move hot-plug specific memory init into separate functions and optimize
From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: pavel.tatashin@microsoft.com, mhocko@suse.com, dave.jiang@intel.com, alexander.h.duyck@linux.intel.com, linux-kernel@vger.kernel.org, willy@infradead.org, davem@davemloft.net, yi.z.zhang@linux.intel.com, khalid.aziz@oracle.com, rppt@linux.vnet.ibm.com, vbabka@suse.cz, sparclinux@vger.kernel.org, dan.j.williams@intel.com, ldufour@linux.vnet.ibm.com, mgorman@techsingularity.net, mingo@kernel.org, kirill.shutemov@linux.intel.com
Date: Wed, 17 Oct 2018 16:54:25 -0700
Message-ID: <20181017235425.17213.28360.stgit@localhost.localdomain>
In-Reply-To: <20181017235043.17213.92459.stgit@localhost.localdomain>
References: <20181017235043.17213.92459.stgit@localhost.localdomain>
User-Agent: StGit/0.17.1-dirty

This patch combines the bits in memmap_init_zone and
memmap_init_zone_device that are related to hotplug into a single function
called __memmap_init_hotplug. I also took the opportunity to integrate
__init_single_page's functionality into this function. In doing so I can
get rid of some of the redundancy, such as the LRU pointers versus the
pgmap.
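A user-space sketch of the pageblock-at-a-time walk that the new
__memmap_init_hotplug performs (the pageblock size and pfn values here are
invented; the real function also threads zone/nid/pgmap through to
__init_pageblock and calls cond_resched() between blocks):

#include <stdio.h>

#define PAGEBLOCK_NR_PAGES 512UL        /* invented; a power of two */

/* Stand-in for __init_pageblock(). */
static void init_pageblock(unsigned long start_pfn, unsigned long nr_pages)
{
        printf("init %lu pages at pfn %lu\n", nr_pages, start_pfn);
}

static void memmap_init_hotplug(unsigned long size, unsigned long start_pfn)
{
        unsigned long pfn = start_pfn + size;

        while (pfn != start_pfn) {
                unsigned long stride = pfn;
                unsigned long down;

                /* Round (pfn - 1) down to a pageblock boundary, but never
                 * below start_pfn, then initialise the tail chunk. */
                down = (pfn - 1) & ~(PAGEBLOCK_NR_PAGES - 1);
                pfn = down > start_pfn ? down : start_pfn;
                stride -= pfn;

                init_pageblock(pfn, stride);
        }
}

int main(void)
{
        memmap_init_hotplug(1200, 100); /* pfns 100..1299, unaligned ends */
        return 0;
}

Working backwards from the end means each pageblock (plus the two
unaligned ends) is handed to the init helper exactly once, regardless of
alignment of the starting pfn.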
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
 mm/page_alloc.c |  216 +++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 145 insertions(+), 71 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 20e9eb35d75d..a0b81e0bef03 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1192,6 +1192,92 @@ static void __meminit __init_single_page(struct page *page, unsigned long pfn,
 #endif
 }
 
+static void __meminit __init_pageblock(unsigned long start_pfn,
+                                       unsigned long nr_pages,
+                                       unsigned long zone, int nid,
+                                       struct dev_pagemap *pgmap)
+{
+        unsigned long nr_pgmask = pageblock_nr_pages - 1;
+        struct page *start_page = pfn_to_page(start_pfn);
+        unsigned long pfn = start_pfn + nr_pages - 1;
+#ifdef WANT_PAGE_VIRTUAL
+        bool is_highmem = is_highmem_idx(zone);
+#endif
+        struct page *page;
+
+        /*
+         * Enforce the following requirements:
+         * size > 0
+         * size < pageblock_nr_pages
+         * start_pfn -> pfn does not cross pageblock_nr_pages boundary
+         */
+        VM_BUG_ON(((start_pfn ^ pfn) | (nr_pages - 1)) > nr_pgmask);
+
+        /*
+         * Work from highest page to lowest, this way we will still be
+         * warm in the cache when we call set_pageblock_migratetype
+         * below.
+         *
+         * The loop is based around the page pointer as the main index
+         * instead of the pfn because pfn is not used inside the loop if
+         * the section number is not in page flags and WANT_PAGE_VIRTUAL
+         * is not defined.
+         */
+        for (page = start_page + nr_pages; page-- != start_page; pfn--) {
+                mm_zero_struct_page(page);
+
+                /*
+                 * We use the start_pfn instead of pfn in the set_page_links
+                 * call because of the fact that the pfn number is used to
+                 * get the section_nr and this function should not be
+                 * spanning more than a single section.
+                 */
+                set_page_links(page, zone, nid, start_pfn);
+                init_page_count(page);
+                page_mapcount_reset(page);
+                page_cpupid_reset_last(page);
+
+                /*
+                 * We can use the non-atomic __set_bit operation for setting
+                 * the flag as we are still initializing the pages.
+                 */
+                __SetPageReserved(page);
+
+                /*
+                 * ZONE_DEVICE pages union ->lru with a ->pgmap back
+                 * pointer and hmm_data. It is a bug if a ZONE_DEVICE
+                 * page is ever freed or placed on a driver-private list.
+                 */
+                page->pgmap = pgmap;
+                if (!pgmap)
+                        INIT_LIST_HEAD(&page->lru);
+
+#ifdef WANT_PAGE_VIRTUAL
+                /* The shift won't overflow because ZONE_NORMAL is below 4G. */
+                if (!is_highmem)
+                        set_page_address(page, __va(pfn << PAGE_SHIFT));
+#endif
+        }
+
+        /*
+         * Mark the block movable so that blocks are reserved for
+         * movable at startup. This will force kernel allocations
+         * to reserve their blocks rather than leaking throughout
+         * the address space during boot when many long-lived
+         * kernel allocations are made.
+         *
+         * bitmap is created for zone's valid pfn range. but memmap
+         * can be created for invalid pages (for alignment)
+         * check here not to call set_pageblock_migratetype() against
+         * pfn out of zone.
+         *
+         * Please note that MEMMAP_HOTPLUG path doesn't clear memmap
+         * because this is done early in sparse_add_one_section
+         */
+        if (!(start_pfn & nr_pgmask))
+                set_pageblock_migratetype(start_page, MIGRATE_MOVABLE);
+}
+
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 static void __meminit init_reserved_page(unsigned long pfn)
 {
@@ -5513,6 +5599,25 @@ void __ref build_all_zonelists(pg_data_t *pgdat)
         return false;
 }
 
+static void __meminit __memmap_init_hotplug(unsigned long size, int nid,
+                                            unsigned long zone,
+                                            unsigned long start_pfn,
+                                            struct dev_pagemap *pgmap)
+{
+        unsigned long pfn = start_pfn + size;
+
+        while (pfn != start_pfn) {
+                unsigned long stride = pfn;
+
+                pfn = max(ALIGN_DOWN(pfn - 1, pageblock_nr_pages), start_pfn);
+                stride -= pfn;
+
+                __init_pageblock(pfn, stride, zone, nid, pgmap);
+
+                cond_resched();
+        }
+}
+
 /*
  * Initially all pages are reserved - free ones are freed
  * up by memblock_free_all() once the early boot process is
@@ -5523,51 +5628,61 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
                 struct vmem_altmap *altmap)
 {
         unsigned long pfn, end_pfn = start_pfn + size;
-        struct page *page;
 
         if (highest_memmap_pfn < end_pfn - 1)
                 highest_memmap_pfn = end_pfn - 1;
 
+        if (context == MEMMAP_HOTPLUG) {
 #ifdef CONFIG_ZONE_DEVICE
-        /*
-         * Honor reservation requested by the driver for this ZONE_DEVICE
-         * memory. We limit the total number of pages to initialize to just
-         * those that might contain the memory mapping. We will defer the
-         * ZONE_DEVICE page initialization until after we have released
-         * the hotplug lock.
-         */
-        if (zone == ZONE_DEVICE) {
-                if (!altmap)
-                        return;
+                /*
+                 * Honor reservation requested by the driver for this
+                 * ZONE_DEVICE memory. We limit the total number of pages to
+                 * initialize to just those that might contain the memory
+                 * mapping. We will defer the ZONE_DEVICE page initialization
+                 * until after we have released the hotplug lock.
+                 */
+                if (zone == ZONE_DEVICE) {
+                        if (!altmap)
+                                return;
+
+                        if (start_pfn == altmap->base_pfn)
+                                start_pfn += altmap->reserve;
+                        end_pfn = altmap->base_pfn +
+                                  vmem_altmap_offset(altmap);
+                }
+#endif
+                /*
+                 * For these ZONE_DEVICE pages we don't need to record the
+                 * pgmap as they should represent only those pages used to
+                 * store the memory map. The actual ZONE_DEVICE pages will
+                 * be initialized later.
+                 */
+                __memmap_init_hotplug(end_pfn - start_pfn, nid, zone,
+                                      start_pfn, NULL);
 
-                if (start_pfn == altmap->base_pfn)
-                        start_pfn += altmap->reserve;
-                end_pfn = altmap->base_pfn + vmem_altmap_offset(altmap);
+                return;
         }
-#endif
 
         for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+                struct page *page;
+
                 /*
                  * There can be holes in boot-time mem_map[]s handed to this
                  * function. They do not exist on hotplugged memory.
 		 */
-		if (context == MEMMAP_EARLY) {
-			if (!early_pfn_valid(pfn)) {
-				pfn = next_valid_pfn(pfn) - 1;
-				continue;
-			}
-			if (!early_pfn_in_nid(pfn, nid))
-				continue;
-			if (overlap_memmap_init(zone, &pfn))
-				continue;
-			if (defer_init(nid, pfn, end_pfn))
-				break;
+		if (!early_pfn_valid(pfn)) {
+			pfn = next_valid_pfn(pfn) - 1;
+			continue;
 		}
+		if (!early_pfn_in_nid(pfn, nid))
+			continue;
+		if (overlap_memmap_init(zone, &pfn))
+			continue;
+		if (defer_init(nid, pfn, end_pfn))
+			break;

 		page = pfn_to_page(pfn);
 		__init_single_page(page, pfn, zone, nid);
-		if (context == MEMMAP_HOTPLUG)
-			__SetPageReserved(page);

 		/*
 		 * Mark the block movable so that blocks are reserved for
@@ -5594,7 +5709,6 @@ void __ref memmap_init_zone_device(struct zone *zone,
 				   unsigned long size,
 				   struct dev_pagemap *pgmap)
 {
-	unsigned long pfn, end_pfn = start_pfn + size;
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	unsigned long zone_idx = zone_idx(zone);
 	unsigned long start = jiffies;
@@ -5610,53 +5724,13 @@ void __ref memmap_init_zone_device(struct zone *zone,
 	 */
 	if (pgmap->altmap_valid) {
 		struct vmem_altmap *altmap = &pgmap->altmap;
+		unsigned long end_pfn = start_pfn + size;

 		start_pfn = altmap->base_pfn + vmem_altmap_offset(altmap);
 		size = end_pfn - start_pfn;
 	}

-	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
-		struct page *page = pfn_to_page(pfn);
-
-		__init_single_page(page, pfn, zone_idx, nid);
-
-		/*
-		 * Mark page reserved as it will need to wait for onlining
-		 * phase for it to be fully associated with a zone.
-		 *
-		 * We can use the non-atomic __set_bit operation for setting
-		 * the flag as we are still initializing the pages.
-		 */
-		__SetPageReserved(page);
-
-		/*
-		 * ZONE_DEVICE pages union ->lru with a ->pgmap back
-		 * pointer and hmm_data. It is a bug if a ZONE_DEVICE
-		 * page is ever freed or placed on a driver-private list.
-		 */
-		page->pgmap = pgmap;
-		page->hmm_data = 0;
-
-		/*
-		 * Mark the block movable so that blocks are reserved for
-		 * movable at startup. This will force kernel allocations
-		 * to reserve their blocks rather than leaking throughout
-		 * the address space during boot when many long-lived
-		 * kernel allocations are made.
-		 *
-		 * bitmap is created for zone's valid pfn range. but memmap
-		 * can be created for invalid pages (for alignment)
-		 * check here not to call set_pageblock_migratetype() against
-		 * pfn out of zone.
-		 *
-		 * Please note that MEMMAP_HOTPLUG path doesn't clear memmap
-		 * because this is done early in sparse_add_one_section
-		 */
-		if (!(pfn & (pageblock_nr_pages - 1))) {
-			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-			cond_resched();
-		}
-	}
+	__memmap_init_hotplug(size, nid, zone_idx, start_pfn, pgmap);

 	pr_info("%s initialised, %lu pages in %ums\n", dev_name(pgmap->dev),
 		size, jiffies_to_msecs(jiffies - start));
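A quick illustration of the new walk may help here. The following stand-alone
sketch is not from the patch; PAGEBLOCK_NR_PAGES and init_pageblock() are
stand-ins for the kernel's pageblock_nr_pages and __init_pageblock(). It shows
how __memmap_init_hotplug peels pageblock-bounded strides off the top of a PFN
range, so the first and last strides may be partial while everything in
between is a full pageblock:

#include <stdio.h>

#define PAGEBLOCK_NR_PAGES 512UL	/* stand-in: 2MB blocks of 4K pages */
#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))

/* stub standing in for __init_pageblock() */
static void init_pageblock(unsigned long pfn, unsigned long stride)
{
	printf("init pfns %lu-%lu (%lu pages)\n",
	       pfn, pfn + stride - 1, stride);
}

static void memmap_init_walk(unsigned long start_pfn, unsigned long size)
{
	unsigned long pfn = start_pfn + size;

	/* walk downward, one pageblock-bounded stride per iteration */
	while (pfn != start_pfn) {
		unsigned long stride = pfn;

		pfn = ALIGN_DOWN(pfn - 1, PAGEBLOCK_NR_PAGES);
		if (pfn < start_pfn)
			pfn = start_pfn;
		stride -= pfn;

		init_pageblock(pfn, stride);
	}
}

int main(void)
{
	memmap_init_walk(100, 1300);	/* unaligned range: pfns 100..1399 */
	return 0;
}

Walking from the top of the range down matches the top-down loop inside
__init_pageblock, so the struct pages at the bottom of each stride are still
warm in the cache when set_pageblock_migratetype() is called on the block
head.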
From patchwork Wed Oct 17 23:54:31 2018
X-Patchwork-Submitter: Alexander Duyck
X-Patchwork-Id: 10646621
Subject: [mm PATCH v4 5/6] mm: Add reserved flag setting to set_page_links
From: Alexander Duyck
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: pavel.tatashin@microsoft.com, mhocko@suse.com, dave.jiang@intel.com,
 alexander.h.duyck@linux.intel.com, linux-kernel@vger.kernel.org,
 willy@infradead.org, davem@davemloft.net, yi.z.zhang@linux.intel.com,
 khalid.aziz@oracle.com, rppt@linux.vnet.ibm.com, vbabka@suse.cz,
 sparclinux@vger.kernel.org, dan.j.williams@intel.com,
 ldufour@linux.vnet.ibm.com, mgorman@techsingularity.net, mingo@kernel.org,
 kirill.shutemov@linux.intel.com
Date: Wed, 17 Oct 2018 16:54:31 -0700
Message-ID: <20181017235431.17213.11512.stgit@localhost.localdomain>
In-Reply-To: <20181017235043.17213.92459.stgit@localhost.localdomain>
References: <20181017235043.17213.92459.stgit@localhost.localdomain>
User-Agent: StGit/0.17.1-dirty

This patch modifies the set_page_links function to include the setting of
the reserved flag via a simple AND and OR operation. The motivation for
this is that the existing __set_bit call still seems to cost us some
performance, and replacing it with the AND and OR reduces initialization
time.

Looking over the assembly code before and after the change, the main
difference is that the reserved bit is now stored in a value that is
generated outside of the main initialization loop and is written together
with the other flags field values in a single store to page->flags.
Previously the generated value was written and then a btsq instruction was
issued.

On my x86_64 test system with 3TB of persistent memory per node I saw the
persistent memory initialization time on average drop from 23.49s to
19.12s per node.

Signed-off-by: Alexander Duyck
---
 include/linux/mm.h | 9 ++++++++-
 mm/page_alloc.c | 29 +++++++++++++++++++----------
 2 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6e2c9631af05..14d06d7d2986 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1171,11 +1171,18 @@ static inline void set_page_node(struct page *page, unsigned long node)
 	page->flags |= (node & NODES_MASK) << NODES_PGSHIFT;
 }

+static inline void set_page_reserved(struct page *page, bool reserved)
+{
+	page->flags &= ~(1ul << PG_reserved);
+	page->flags |= (unsigned long)(!!reserved) << PG_reserved;
+}
+
 static inline void set_page_links(struct page *page, enum zone_type zone,
-	unsigned long node, unsigned long pfn)
+	unsigned long node, unsigned long pfn, bool reserved)
 {
 	set_page_zone(page, zone);
 	set_page_node(page, node);
+	set_page_reserved(page, reserved);
 #ifdef SECTION_IN_PAGE_FLAGS
 	set_page_section(page, pfn_to_section_nr(pfn));
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a0b81e0bef03..e7fee7a5f8a3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1179,7 +1179,7 @@ static void __meminit __init_single_page(struct page *page, unsigned long pfn,
 				unsigned long zone, int nid)
 {
 	mm_zero_struct_page(page);
-	set_page_links(page, zone, nid, pfn);
+	set_page_links(page, zone, nid, pfn, false);
 	init_page_count(page);
 	page_mapcount_reset(page);
 	page_cpupid_reset_last(page);
@@ -1195,7 +1195,8 @@ static void __meminit __init_single_page(struct page *page, unsigned long pfn,
 static void __meminit __init_pageblock(unsigned long start_pfn,
 				       unsigned long nr_pages,
 				       unsigned long zone, int nid,
-				       struct dev_pagemap *pgmap)
+				       struct dev_pagemap *pgmap,
+				       bool is_reserved)
 {
 	unsigned long nr_pgmask = pageblock_nr_pages - 1;
 	struct page *start_page = pfn_to_page(start_pfn);
@@ -1231,19 +1232,16 @@ static void __meminit __init_pageblock(unsigned long start_pfn,
 		 * call because of the fact that the pfn number is used to
 		 * get the section_nr and this function should not be
 		 * spanning more than a single section.
+		 *
+		 * We can use a non-atomic operation for setting the
+		 * PG_reserved flag as we are still initializing the pages.
 		 */
-		set_page_links(page, zone, nid, start_pfn);
+		set_page_links(page, zone, nid, start_pfn, is_reserved);
 		init_page_count(page);
 		page_mapcount_reset(page);
 		page_cpupid_reset_last(page);

 		/*
-		 * We can use the non-atomic __set_bit operation for setting
-		 * the flag as we are still initializing the pages.
-		 */
-		__SetPageReserved(page);
-
-		/*
 		 * ZONE_DEVICE pages union ->lru with a ->pgmap back
 		 * pointer and hmm_data. It is a bug if a ZONE_DEVICE
 		 * page is ever freed or placed on a driver-private list.
@@ -5612,7 +5610,18 @@ static void __meminit __memmap_init_hotplug(unsigned long size, int nid,
 		pfn = max(ALIGN_DOWN(pfn - 1, pageblock_nr_pages), start_pfn);
 		stride -= pfn;

-		__init_pageblock(pfn, stride, zone, nid, pgmap);
+		/*
+		 * The last argument of __init_pageblock is a boolean
+		 * value indicating if the page will be marked as reserved.
+		 *
+		 * Mark page reserved as it will need to wait for onlining
+		 * phase for it to be fully associated with a zone.
+		 *
+		 * Under certain circumstances ZONE_DEVICE pages may not
+		 * need to be marked as reserved, however there is still
+		 * code that is depending on this being set for now.
+		 */
+		__init_pageblock(pfn, stride, zone, nid, pgmap, true);

 		cond_resched();
 	}
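The effect of the patch above is easiest to see in isolation. This
user-space sketch mirrors the AND/OR technique of set_page_reserved();
PG_RESERVED here is a made-up bit position, not the kernel's enum value.
The bit is cleared unconditionally and then a 0 or 1 is OR'ed in, which the
compiler can fold into the same store that writes the zone and node bits
rather than emitting a separate bit-set instruction:

#include <stdbool.h>
#include <stdio.h>

#define PG_RESERVED 10	/* stand-in bit position */

static inline void set_flags_reserved(unsigned long *flags, bool reserved)
{
	/* clear the bit unconditionally... */
	*flags &= ~(1UL << PG_RESERVED);
	/* ...then OR in 0 or 1 shifted into place */
	*flags |= (unsigned long)(!!reserved) << PG_RESERVED;
}

int main(void)
{
	unsigned long flags = 0;

	set_flags_reserved(&flags, true);
	printf("flags = %#lx\n", flags);	/* prints 0x400 */
	set_flags_reserved(&flags, false);
	printf("flags = %#lx\n", flags);	/* prints 0 */
	return 0;
}

In the kernel loop the reserved bit becomes part of a flags value that is
computed once outside the loop and written to page->flags in one store,
instead of a store followed by a btsq, which is where the commit message's
savings come from.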
From patchwork Wed Oct 17 23:54:36 2018
X-Patchwork-Submitter: Alexander Duyck
X-Patchwork-Id: 10646623
Subject: [mm PATCH v4 6/6] mm: Use common iterator for deferred_init_pages
 and deferred_free_pages
From: Alexander Duyck
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: pavel.tatashin@microsoft.com, mhocko@suse.com, dave.jiang@intel.com,
 alexander.h.duyck@linux.intel.com, linux-kernel@vger.kernel.org,
 willy@infradead.org, davem@davemloft.net, yi.z.zhang@linux.intel.com,
 khalid.aziz@oracle.com, rppt@linux.vnet.ibm.com, vbabka@suse.cz,
 sparclinux@vger.kernel.org, dan.j.williams@intel.com,
 ldufour@linux.vnet.ibm.com, mgorman@techsingularity.net, mingo@kernel.org,
 kirill.shutemov@linux.intel.com
Date: Wed, 17 Oct 2018 16:54:36 -0700
Message-ID: <20181017235436.17213.15091.stgit@localhost.localdomain>
In-Reply-To: <20181017235043.17213.92459.stgit@localhost.localdomain>
References: <20181017235043.17213.92459.stgit@localhost.localdomain>
User-Agent: StGit/0.17.1-dirty

This patch creates a common iterator to be used by both deferred_init_pages
and deferred_free_pages. By doing this we can cut down a bit on code
overhead as they will likely both be inlined into the same function anyway.

This new approach also allows deferred_init_pages to make use of
__init_pageblock, reducing code size by sharing code between the hotplug
and deferred memory init code paths.

An additional benefit is improved cache locality during memory init: we can
stay focused on the memory areas used to determine whether a given PFN is
valid and keep that data warm in the cache until we transition to a region
of a different type. So we will stream through a chunk of valid blocks
before we turn to initializing page structs.

On my x86_64 test system with 384GB of memory per node I saw a reduction in
initialization time from 1.38s to 1.06s as a result of this patch.
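Before the diff itself, a compact user-space model of the iterator may be
useful. Everything here is a simplified stand-in (a toy pfn_valid(), a tiny
block size, and gcc statement expressions as in the kernel macro); it is not
the kernel code, but it shows the shape of the pattern: a helper returns the
length of the next run of valid PFNs and advances a cursor, and one for_each
macro drives both the init and free loops:

#include <stdbool.h>
#include <stdio.h>

#define PAGEBLOCK_NR_PAGES 8UL	/* toy block size so the output stays short */
#define ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* toy validity check: pretend block [16, 24) is a hole in the memmap */
static bool pfn_valid(unsigned long pfn)
{
	return pfn < 16 || pfn >= 24;
}

/* return length of the next valid run, advancing the cursor *i past it */
static unsigned long next_valid_range(unsigned long *i, unsigned long end_pfn)
{
	unsigned long pfn = *i;

	while (pfn < end_pfn) {
		unsigned long head = pfn;
		unsigned long count;

		/* advance to the next pageblock boundary (or range end) */
		pfn = MIN(ALIGN(pfn + 1, PAGEBLOCK_NR_PAGES), end_pfn);
		count = pfn - head;

		/* like the !CONFIG_HOLES_IN_ZONE case: the head pfn
		 * decides validity for the whole block */
		if (!pfn_valid(head))
			continue;

		*i = pfn;
		return count;
	}

	return 0;
}

#define for_each_valid_range(i, start, end, pfn, count)			\
	for (i = (start), count = next_valid_range(&i, (end));		\
	     count && ({ pfn = i - count; 1; });			\
	     count = next_valid_range(&i, (end)))

int main(void)
{
	unsigned long i, pfn, count;

	/* prints runs starting at pfns 0, 8 and 24; the hole is skipped */
	for_each_valid_range(i, 0, 32, pfn, count)
		printf("run: pfn %lu, %lu pages\n", pfn, count);
	return 0;
}

A consumer such as deferred_free_pages then only has to ask whether a run
covers a whole pageblock to choose between one high-order free and a loop of
order-0 frees.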
Signed-off-by: Alexander Duyck
---
 mm/page_alloc.c | 134 +++++++++++++++++++++++++++----------------------
 1 file changed, 65 insertions(+), 69 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e7fee7a5f8a3..f47d02e42cf7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1484,32 +1484,6 @@ void clear_zone_contiguous(struct zone *zone)
 }

 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
-static void __init deferred_free_range(unsigned long pfn,
-				       unsigned long nr_pages)
-{
-	struct page *page;
-	unsigned long i;
-
-	if (!nr_pages)
-		return;
-
-	page = pfn_to_page(pfn);
-
-	/* Free a large naturally-aligned chunk if possible */
-	if (nr_pages == pageblock_nr_pages &&
-	    (pfn & (pageblock_nr_pages - 1)) == 0) {
-		set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-		__free_pages_core(page, pageblock_order);
-		return;
-	}
-
-	for (i = 0; i < nr_pages; i++, page++, pfn++) {
-		if ((pfn & (pageblock_nr_pages - 1)) == 0)
-			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-		__free_pages_core(page, 0);
-	}
-}
-
 /* Completion tracking for deferred_init_memmap() threads */
 static atomic_t pgdat_init_n_undone __initdata;
 static __initdata DECLARE_COMPLETION(pgdat_init_all_done_comp);
@@ -1521,48 +1495,77 @@ static inline void __init pgdat_init_report_one_done(void)
 }

 /*
- * Returns true if page needs to be initialized or freed to buddy allocator.
+ * Returns count if page range needs to be initialized or freed
  *
- * First we check if pfn is valid on architectures where it is possible to have
- * holes within pageblock_nr_pages. On systems where it is not possible, this
- * function is optimized out.
+ * First, we check if a current large page is valid by only checking the
+ * validity of the head pfn.
  *
- * Then, we check if a current large page is valid by only checking the validity
- * of the head pfn.
+ * Then we check if the contiguous pfns are valid on architectures where it
+ * is possible to have holes within pageblock_nr_pages. On systems where it
+ * is not possible, this function is optimized out.
  */
-static inline bool __init deferred_pfn_valid(unsigned long pfn)
+static unsigned long __next_pfn_valid_range(unsigned long *i,
+					    unsigned long end_pfn)
 {
-	if (!pfn_valid_within(pfn))
-		return false;
-	if (!(pfn & (pageblock_nr_pages - 1)) && !pfn_valid(pfn))
-		return false;
-	return true;
+	unsigned long pfn = *i;
+	unsigned long count;
+
+	while (pfn < end_pfn) {
+		unsigned long t = ALIGN(pfn + 1, pageblock_nr_pages);
+		unsigned long pageblock_pfn = min(t, end_pfn);
+
+#ifndef CONFIG_HOLES_IN_ZONE
+		count = pageblock_pfn - pfn;
+		pfn = pageblock_pfn;
+		if (!pfn_valid(pfn))
+			continue;
+#else
+		for (count = 0; pfn < pageblock_pfn; pfn++) {
+			if (pfn_valid_within(pfn)) {
+				count++;
+				continue;
+			}
+
+			if (count)
+				break;
+		}
+
+		if (!count)
+			continue;
+#endif
+		*i = pfn;
+		return count;
+	}
+
+	return 0;
 }

+#define for_each_deferred_pfn_valid_range(i, start_pfn, end_pfn, pfn, count) \
+	for (i = (start_pfn),						      \
+	     count = __next_pfn_valid_range(&i, (end_pfn));		      \
+	     count && ({ pfn = i - count; 1; });			      \
+	     count = __next_pfn_valid_range(&i, (end_pfn)))
+
 /*
  * Free pages to buddy allocator. Try to free aligned pages in
  * pageblock_nr_pages sizes.
  */
-static void __init deferred_free_pages(unsigned long pfn,
+static void __init deferred_free_pages(unsigned long start_pfn,
 				       unsigned long end_pfn)
 {
-	unsigned long nr_pgmask = pageblock_nr_pages - 1;
-	unsigned long nr_free = 0;
-
-	for (; pfn < end_pfn; pfn++) {
-		if (!deferred_pfn_valid(pfn)) {
-			deferred_free_range(pfn - nr_free, nr_free);
-			nr_free = 0;
-		} else if (!(pfn & nr_pgmask)) {
-			deferred_free_range(pfn - nr_free, nr_free);
-			nr_free = 1;
-			touch_nmi_watchdog();
+	unsigned long i, pfn, count;
+
+	for_each_deferred_pfn_valid_range(i, start_pfn, end_pfn, pfn, count) {
+		struct page *page = pfn_to_page(pfn);
+
+		if (count == pageblock_nr_pages) {
+			__free_pages_core(page, pageblock_order);
 		} else {
-			nr_free++;
+			while (count--)
+				__free_pages_core(page++, 0);
 		}
+
+		touch_nmi_watchdog();
 	}
-	/* Free the last block of pages to allocator */
-	deferred_free_range(pfn - nr_free, nr_free);
 }

 /*
@@ -1571,29 +1574,22 @@ static void __init deferred_free_pages(unsigned long pfn,
  * Return number of pages initialized.
  */
 static unsigned long __init deferred_init_pages(struct zone *zone,
-						unsigned long pfn,
+						unsigned long start_pfn,
 						unsigned long end_pfn)
 {
-	unsigned long nr_pgmask = pageblock_nr_pages - 1;
+	unsigned long i, pfn, count;
 	int nid = zone_to_nid(zone);
 	unsigned long nr_pages = 0;
 	int zid = zone_idx(zone);
-	struct page *page = NULL;

-	for (; pfn < end_pfn; pfn++) {
-		if (!deferred_pfn_valid(pfn)) {
-			page = NULL;
-			continue;
-		} else if (!page || !(pfn & nr_pgmask)) {
-			page = pfn_to_page(pfn);
-			touch_nmi_watchdog();
-		} else {
-			page++;
-		}
-		__init_single_page(page, pfn, zid, nid);
-		nr_pages++;
+	for_each_deferred_pfn_valid_range(i, start_pfn, end_pfn, pfn, count) {
+		nr_pages += count;
+		__init_pageblock(pfn, count, zid, nid, NULL, false);
+
+		touch_nmi_watchdog();
 	}
-	return (nr_pages);
+
+	return nr_pages;
 }

 /*