From patchwork Mon Nov  5 21:19:45 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Duyck <alexander.h.duyck@linux.intel.com>
X-Patchwork-Id: 10669191
Return-Path: <owner-linux-mm@kvack.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
 [172.30.200.125])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C492815A6
	for <patchwork-linux-mm@patchwork.kernel.org>;
 Mon,  5 Nov 2018 21:19:50 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B799029C30
	for <patchwork-linux-mm@patchwork.kernel.org>;
 Mon,  5 Nov 2018 21:19:50 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id AB4DF29C5E; Mon,  5 Nov 2018 21:19:50 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI,
	RCVD_IN_DNSWL_NONE autolearn=unavailable version=3.3.1
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F062429C30
	for <patchwork-linux-mm@patchwork.kernel.org>;
 Mon,  5 Nov 2018 21:19:49 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id 86EE56B026F; Mon,  5 Nov 2018 16:19:48 -0500 (EST)
Delivered-To: linux-mm-outgoing@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 81C3C6B0270; Mon,  5 Nov 2018 16:19:48 -0500 (EST)
X-Original-To: int-list-linux-mm@kvack.org
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 70BC96B0271; Mon,  5 Nov 2018 16:19:48 -0500 (EST)
X-Original-To: linux-mm@kvack.org
X-Delivered-To: linux-mm@kvack.org
Received: from mail-pf1-f197.google.com (mail-pf1-f197.google.com
 [209.85.210.197])
	by kanga.kvack.org (Postfix) with ESMTP id 321566B026F
	for <linux-mm@kvack.org>; Mon,  5 Nov 2018 16:19:48 -0500 (EST)
Received: by mail-pf1-f197.google.com with SMTP id 190-v6so2110505pfd.7
        for <linux-mm@kvack.org>; Mon, 05 Nov 2018 13:19:48 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-original-authentication-results:x-gm-message-state:subject:from
         :to:cc:date:message-id:in-reply-to:references:user-agent
         :mime-version:content-transfer-encoding;
        bh=Zj0BlE/hbr4hYtdiBJ5b3uY9MCxMDAoEBu1Qa0fCIWk=;
        b=QHbEoYRP4cqXLWaEbEAIVRacwHuKM+tOxvX/eQ8cZVdinTH23KDuYTS7fSnHOZzmfz
         nx7O3Wk4pCvq8PDTXCZDF+nzqKSrZV9yKsUEYGvJtJ0hjF7pEZFDDPKXUCdjW7cggl5Y
         Doa8m9tpejQeky4iJVYcJEDKfYJfh96xEZy2P2iLK05LwT8wt82NXrZRZ4YUtAnHGxwq
         PlruLzJ/NT2O2vK5jWTn5RaZu59Gn2vIFOqFDdEI4Ih4M2QCqjZvLohPwJG5I8VtITdn
         Gi21khytcV9pEIQXJTCPuLydImA382WcbueVs+Jo1sy6F53Pn82k/35B/Bdbn5Xeox3s
         leuA==
X-Original-Authentication-Results: mx.google.com;
       spf=pass (google.com: best guess record for domain of
 alexander.h.duyck@linux.intel.com designates 192.55.52.136 as permitted
 sender) smtp.mailfrom=alexander.h.duyck@linux.intel.com;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com
X-Gm-Message-State: AGRZ1gIvtmNwl7hiMIl2JlsBES5FMUyRC/1p/iocY32W6ui7ZeTSeI/3
	K6LGC2j+U2/pAaJ+oioA4jaXiCeBVnEsweA8AkRvB8lIPJAn630/EChknFNpzN4NKKPOXL34yPz
	7EYLgxPHct3Mx4EObKylGi/3jYK2YDGeFfOijDefbz/vh3JmX6BKy9qLnEQU8xhYyDg==
X-Received: by 2002:a17:902:2f84:: with SMTP id
 t4-v6mr23426754plb.239.1541452787836;
        Mon, 05 Nov 2018 13:19:47 -0800 (PST)
X-Google-Smtp-Source: 
 AJdET5cW77XnLSrHN5eAvpsm7DnQ0N78HAx9HD1nTZ38OcBWa72FVnxU5DPmUIL3ukN5c2SCQqup
X-Received: by 2002:a17:902:2f84:: with SMTP id
 t4-v6mr23426689plb.239.1541452786705;
        Mon, 05 Nov 2018 13:19:46 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1541452786; cv=none;
        d=google.com; s=arc-20160816;
        b=A9TSeGXqEMoMqEoBDky6UvSo3CTwEzf/kKR8gXUX8vwN6tSdNWljM7Kw6cN+7qcLVF
         TbhrvqtIjD8WnXS1bULAMlywgDNOp8pjMX949vUPZ3OrA3Wc37Et0DwdDbx4JP2nFTJu
         krSCoHYfw1C+QdhKyoRHPF0rf77POlAks3dvk7GVgtaqk/8YfN0jHAqPC65jUqVdK1oB
         QDAxtHveotQ1HLiMB0c16x7UCTWbDBLH3pDs46FUfhVF31rH0ikSzVSdRbKkYVeNep7l
         poQiK7+jeKgqgbYH5wtHPZvf5y7LM/oU9+2WlFU7CDIplqWs7Dr9+QDRRC0y8LAD8Y2h
         T49Q==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=content-transfer-encoding:mime-version:user-agent:references
         :in-reply-to:message-id:date:cc:to:from:subject;
        bh=Zj0BlE/hbr4hYtdiBJ5b3uY9MCxMDAoEBu1Qa0fCIWk=;
        b=OcjhkvHpHLqGIjdJuFhqZ5gNDPGPQ1pjXI0G/6Z6A+wgurUdD2JJiNVkw7wtfvQ5VF
         88VwciO/gHmb9jctaMIxTgUj5Qaqcxd/7bc8cdOdDvZ+fPdaKyMNYYwvTmgz+KyjLuTG
         ghO49Vog1iqzMjueStpKk8T/Z+57myoeVWLwmuPCpvUcwOOqkX4JOVR+RI1TVD5HKXr8
         jM5j4GBX9RWsJYe33TA2ej9obxnBmfnxEJXuRXTFsNb33o0Ofw0MvfKCRVx23EHDGquG
         BENJVIcojm1uYbuaQOzIcrfi25ws6gESY0cNL6JbaW1K/A40GUfGSC7r4RcOHbJZdU+H
         kCFQ==
ARC-Authentication-Results: i=1; mx.google.com;
       spf=pass (google.com: best guess record for domain of
 alexander.h.duyck@linux.intel.com designates 192.55.52.136 as permitted
 sender) smtp.mailfrom=alexander.h.duyck@linux.intel.com;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: from mga12.intel.com (mga12.intel.com. [192.55.52.136])
        by mx.google.com with ESMTPS id
 l8si14685916pgr.345.2018.11.05.13.19.46
        for <linux-mm@kvack.org>
        (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
        Mon, 05 Nov 2018 13:19:46 -0800 (PST)
Received-SPF: pass (google.com: best guess record for domain of
 alexander.h.duyck@linux.intel.com designates 192.55.52.136 as permitted
 sender) client-ip=192.55.52.136;
Authentication-Results: mx.google.com;
       spf=pass (google.com: best guess record for domain of
 alexander.h.duyck@linux.intel.com designates 192.55.52.136 as permitted
 sender) smtp.mailfrom=alexander.h.duyck@linux.intel.com;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga001.jf.intel.com ([10.7.209.18])
  by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 05 Nov 2018 13:19:46 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.54,469,1534834800";
   d="scan'208";a="105640341"
Received: from ahduyck-desk1.jf.intel.com ([10.7.198.76])
  by orsmga001.jf.intel.com with ESMTP; 05 Nov 2018 13:19:45 -0800
Subject: [mm PATCH v5 4/7] mm: Initialize MAX_ORDER_NR_PAGES at a time
 instead of doing larger sections
From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: sparclinux@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-nvdimm@lists.01.org, davem@davemloft.net, pavel.tatashin@microsoft.com,
 mhocko@suse.com, mingo@kernel.org, kirill.shutemov@linux.intel.com,
 dan.j.williams@intel.com, dave.jiang@intel.com,
 alexander.h.duyck@linux.intel.com, rppt@linux.vnet.ibm.com,
 willy@infradead.org, vbabka@suse.cz, khalid.aziz@oracle.com,
 ldufour@linux.vnet.ibm.com, mgorman@techsingularity.net,
 yi.z.zhang@linux.intel.com, alexander.h.duyck@linux.intel.com
Date: Mon, 05 Nov 2018 13:19:45 -0800
Message-ID: 
 <154145278583.30046.4918131143612801028.stgit@ahduyck-desk1.jf.intel.com>
In-Reply-To: 
 <154145268025.30046.11742652345962594283.stgit@ahduyck-desk1.jf.intel.com>
References: 
 <154145268025.30046.11742652345962594283.stgit@ahduyck-desk1.jf.intel.com>
User-Agent: StGit/unknown-version
MIME-Version: 1.0
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
X-Virus-Scanned: ClamAV using ClamSMTP

This patch adds yet another iterator, for_each_free_mem_range_in_zone_from.
It and then uses it to support initializing and freeing pages in groups no
larger than MAX_ORDER_NR_PAGES. By doing this we can greatly improve the
cache locality of the pages while we do several loops over them in the init
and freeing process.

We are able to tighten the loops as a result since we only really need the
checks for first_init_pfn in our first iteration and after that we can
assume that all future values will be greater than this. So I have added a
function called deferred_init_mem_pfn_range_in_zone that primes the
iterators and if it fails we can just exit.

On my x86_64 test system with 384GB of memory per node I saw a reduction in
initialization time from 1.85s to 1.38s as a result of this patch.

Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 include/linux/memblock.h |   16 +++++
 mm/page_alloc.c          |  162 ++++++++++++++++++++++++++++++++++------------
 2 files changed, 134 insertions(+), 44 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 413623dc96a3..5ba52a7878a0 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -268,6 +268,22 @@ void __next_mem_pfn_range_in_zone(u64 *idx, struct zone *zone,
 	     __next_mem_pfn_range_in_zone(&i, zone, p_start, p_end);	\
 	     i != (u64)ULLONG_MAX;					\
 	     __next_mem_pfn_range_in_zone(&i, zone, p_start, p_end))
+
+/**
+ * for_each_free_mem_range_in_zone_from - iterate through zone specific
+ * free memblock areas from a given point
+ * @i: u64 used as loop variable
+ * @zone: zone in which all of the memory blocks reside
+ * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
+ * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
+ *
+ * Walks over free (memory && !reserved) areas of memblock in a specific
+ * zone, continuing from current position. Available as soon as memblock is
+ * initialized.
+ */
+#define for_each_free_mem_pfn_range_in_zone_from(i, zone, p_start, p_end) \
+	for (; i != (u64)ULLONG_MAX;					  \
+	     __next_mem_pfn_range_in_zone(&i, zone, p_start, p_end))
 #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
 
 /**
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5cfd3ebe10d1..3466a01ed90a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1512,16 +1512,102 @@ static unsigned long  __init deferred_init_pages(struct zone *zone,
 	return (nr_pages);
 }
 
+/*
+ * This function is meant to pre-load the iterator for the zone init.
+ * Specifically it walks through the ranges until we are caught up to the
+ * first_init_pfn value and exits there. If we never encounter the value we
+ * return false indicating there are no valid ranges left.
+ */
+static bool __init
+deferred_init_mem_pfn_range_in_zone(u64 *i, struct zone *zone,
+				    unsigned long *spfn, unsigned long *epfn,
+				    unsigned long first_init_pfn)
+{
+	u64 j;
+
+	/*
+	 * Start out by walking through the ranges in this zone that have
+	 * already been initialized. We don't need to do anything with them
+	 * so we just need to flush them out of the system.
+	 */
+	for_each_free_mem_pfn_range_in_zone(j, zone, spfn, epfn) {
+		if (*epfn <= first_init_pfn)
+			continue;
+		if (*spfn < first_init_pfn)
+			*spfn = first_init_pfn;
+		*i = j;
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ * Initialize and free pages. We do it in two loops: first we initialize
+ * struct page, than free to buddy allocator, because while we are
+ * freeing pages we can access pages that are ahead (computing buddy
+ * page in __free_one_page()).
+ *
+ * In order to try and keep some memory in the cache we have the loop
+ * broken along max page order boundaries. This way we will not cause
+ * any issues with the buddy page computation.
+ */
+static unsigned long __init
+deferred_init_maxorder(u64 *i, struct zone *zone, unsigned long *start_pfn,
+		       unsigned long *end_pfn)
+{
+	unsigned long mo_pfn = ALIGN(*start_pfn + 1, MAX_ORDER_NR_PAGES);
+	unsigned long spfn = *start_pfn, epfn = *end_pfn;
+	unsigned long nr_pages = 0;
+	u64 j = *i;
+
+	/* First we loop through and initialize the page values */
+	for_each_free_mem_pfn_range_in_zone_from(j, zone, &spfn, &epfn) {
+		unsigned long t;
+
+		if (mo_pfn <= spfn)
+			break;
+
+		t = min(mo_pfn, epfn);
+		nr_pages += deferred_init_pages(zone, spfn, t);
+
+		if (mo_pfn <= epfn)
+			break;
+	}
+
+	/* Reset values and now loop through freeing pages as needed */
+	j = *i;
+
+	for_each_free_mem_pfn_range_in_zone_from(j, zone, start_pfn, end_pfn) {
+		unsigned long t;
+
+		if (mo_pfn <= *start_pfn)
+			break;
+
+		t = min(mo_pfn, *end_pfn);
+		deferred_free_pages(*start_pfn, t);
+		*start_pfn = t;
+
+		if (mo_pfn < *end_pfn)
+			break;
+	}
+
+	/* Store our current values to be reused on the next iteration */
+	*i = j;
+
+	return nr_pages;
+}
+
 /* Initialise remaining memory on a node */
 static int __init deferred_init_memmap(void *data)
 {
 	pg_data_t *pgdat = data;
+	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
+	unsigned long spfn = 0, epfn = 0, nr_pages = 0;
+	unsigned long first_init_pfn, flags;
 	unsigned long start = jiffies;
-	unsigned long nr_pages = 0;
-	unsigned long spfn, epfn, first_init_pfn, flags;
-	int zid;
 	struct zone *zone;
-	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
+	int zid;
 	u64 i;
 
 	/* Bind memory initialisation thread to a local node if possible */
@@ -1547,22 +1633,23 @@ static int __init deferred_init_memmap(void *data)
 		if (first_init_pfn < zone_end_pfn(zone))
 			break;
 	}
-	first_init_pfn = max(zone->zone_start_pfn, first_init_pfn);
+
+	/* If the zone is empty somebody else may have cleared out the zone */
+	if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn,
+						 first_init_pfn)) {
+		pgdat_resize_unlock(pgdat, &flags);
+		pgdat_init_report_one_done();
+		return 0;
+	}
 
 	/*
-	 * Initialize and free pages. We do it in two loops: first we initialize
-	 * struct page, than free to buddy allocator, because while we are
-	 * freeing pages we can access pages that are ahead (computing buddy
-	 * page in __free_one_page()).
+	 * Initialize and free pages in MAX_ORDER sized increments so
+	 * that we can avoid introducing any issues with the buddy
+	 * allocator.
 	 */
-	for_each_free_mem_pfn_range_in_zone(i, zone, &spfn, &epfn) {
-		spfn = max_t(unsigned long, first_init_pfn, spfn);
-		nr_pages += deferred_init_pages(zone, spfn, epfn);
-	}
-	for_each_free_mem_pfn_range_in_zone(i, zone, &spfn, &epfn) {
-		spfn = max_t(unsigned long, first_init_pfn, spfn);
-		deferred_free_pages(spfn, epfn);
-	}
+	while (spfn < epfn)
+		nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
+
 	pgdat_resize_unlock(pgdat, &flags);
 
 	/* Sanity check that the next zone really is unpopulated */
@@ -1602,9 +1689,9 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
 {
 	unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION);
 	pg_data_t *pgdat = zone->zone_pgdat;
-	unsigned long nr_pages = 0;
-	unsigned long first_init_pfn, spfn, epfn, t, flags;
 	unsigned long first_deferred_pfn = pgdat->first_deferred_pfn;
+	unsigned long spfn, epfn, flags;
+	unsigned long nr_pages = 0;
 	u64 i;
 
 	/* Only the last zone may have deferred pages */
@@ -1633,36 +1720,23 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
 		return true;
 	}
 
-	first_init_pfn = max(zone->zone_start_pfn, first_deferred_pfn);
-
-	if (first_init_pfn >= pgdat_end_pfn(pgdat)) {
+	/* If the zone is empty somebody else may have cleared out the zone */
+	if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn,
+						 first_deferred_pfn)) {
 		pgdat_resize_unlock(pgdat, &flags);
-		return false;
+		return true;
 	}
 
-	for_each_free_mem_pfn_range_in_zone(i, zone, &spfn, &epfn) {
-		spfn = max_t(unsigned long, first_init_pfn, spfn);
-
-		while (spfn < epfn && nr_pages < nr_pages_needed) {
-			t = ALIGN(spfn + PAGES_PER_SECTION, PAGES_PER_SECTION);
-			first_deferred_pfn = min(t, epfn);
-			nr_pages += deferred_init_pages(zone, spfn,
-							first_deferred_pfn);
-			spfn = first_deferred_pfn;
-		}
-
-		if (nr_pages >= nr_pages_needed)
-			break;
+	/*
+	 * Initialize and free pages in MAX_ORDER sized increments so
+	 * that we can avoid introducing any issues with the buddy
+	 * allocator.
+	 */
+	while (spfn < epfn && nr_pages < nr_pages_needed) {
+		nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
+		first_deferred_pfn = spfn;
 	}
 
-	for_each_free_mem_pfn_range_in_zone(i, zone, &spfn, &epfn) {
-		spfn = max_t(unsigned long, first_init_pfn, spfn);
-		epfn = min_t(unsigned long, first_deferred_pfn, epfn);
-		deferred_free_pages(spfn, epfn);
-
-		if (first_deferred_pfn == epfn)
-			break;
-	}
 	pgdat->first_deferred_pfn = first_deferred_pfn;
 	pgdat_resize_unlock(pgdat, &flags);