[RFC] mm: update highest_memmap_pfn based on exact pfn

Message ID 20181128083634.18515-1-richard.weiyang@gmail.com (mailing list archive)
State New, archived
Series [RFC] mm: update highest_memmap_pfn based on exact pfn

Commit Message

Wei Yang Nov. 28, 2018, 8:36 a.m. UTC
When DEFERRED_STRUCT_PAGE_INIT is set, not all page structs are
initialized at boot; initialization of some of them is postponed to the
deferred stage. Yet the global variable highest_memmap_pfn is still set
to the highest pfn at boot, even though some of those pages are not yet
initialized.

This patch adjusts this behavior by updating highest_memmap_pfn with the
exact pfn during each iteration. Since each node has a deferred-init
thread, introduce a spinlock to protect the update.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
 mm/internal.h   | 8 ++++++++
 mm/memory.c     | 1 +
 mm/nommu.c      | 1 +
 mm/page_alloc.c | 9 ++++++---
 4 files changed, 16 insertions(+), 3 deletions(-)

Comments

Andrew Morton Nov. 28, 2018, 11 p.m. UTC | #1
On Wed, 28 Nov 2018 16:36:34 +0800 Wei Yang <richard.weiyang@gmail.com> wrote:

> When DEFERRED_STRUCT_PAGE_INIT is set, not all page structs are
> initialized at boot; initialization of some of them is postponed to the
> deferred stage. Yet the global variable highest_memmap_pfn is still set
> to the highest pfn at boot, even though some of those pages are not yet
> initialized.
> 
> This patch adjusts this behavior by updating highest_memmap_pfn with the
> exact pfn during each iteration. Since each node has a deferred-init
> thread, introduce a spinlock to protect the update.
> 

Does this solve any known problems?  If so then I'm suspecting that
those problems go deeper than this.

Why use a spinlock rather than an atomic_long_t?

Perhaps this check should instead be built into pfn_valid()?
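
One way to read this suggestion, sketched against the cheap flat
definition that comes up later in the thread (hypothetical, not from any
tree; the real CONFIG_SPARSEMEM variants are more involved):

	/*
	 * Hypothetical: accept a pfn only once its struct page has
	 * actually been initialized, not merely because it lies below
	 * max_pfn.
	 */
	#define pfn_valid(pfn) \
		((pfn) < max_pfn && (pfn) <= highest_memmap_pfn)
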
Wei Yang Nov. 29, 2018, 2:08 a.m. UTC | #2
On Wed, Nov 28, 2018 at 03:00:52PM -0800, Andrew Morton wrote:
>On Wed, 28 Nov 2018 16:36:34 +0800 Wei Yang <richard.weiyang@gmail.com> wrote:
>
>> When DEFERRED_STRUCT_PAGE_INIT is set, not all page structs are
>> initialized at boot; initialization of some of them is postponed to the
>> deferred stage. Yet the global variable highest_memmap_pfn is still set
>> to the highest pfn at boot, even though some of those pages are not yet
>> initialized.
>> 
>> This patch adjusts this behavior by updating highest_memmap_pfn with the
>> exact pfn during each iteration. Since each node has a deferred-init
>> thread, introduce a spinlock to protect the update.
>> 
>
>Does this solve any known problems?  If so then I'm suspecting that
>those problems go deeper than this.

Currently I don't see any problem.

>
>Why use a spinlock rather than an atomic_long_t?

Sorry for my lack of knowledge. I am not sure how to compare and update
a value atomically; cmpxchg can only compare against one exact expected
value.
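
For what it is worth, an atomic maximum can be built from a cmpxchg
retry loop, which is presumably the kind of thing Andrew had in mind. A
minimal sketch, assuming highest_memmap_pfn were converted to an
atomic_long_t (illustrative only, not from any tree):

	static inline void update_highest_memmap_pfn(atomic_long_t *hi,
						     unsigned long pfn)
	{
		long old = atomic_long_read(hi);

		/*
		 * atomic_long_cmpxchg() returns the value it found. If
		 * that is the value we expected, our pfn is now stored;
		 * otherwise another thread raced us, so retry against
		 * what it installed until *hi is already >= pfn.
		 */
		while (old < (long)pfn) {
			long seen = atomic_long_cmpxchg(hi, old, pfn);

			if (seen == old)
				break;	/* our value went in */
			old = seen;	/* lost the race; retry */
		}
	}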

>
>Perhaps this check should instead be built into pfn_valid()?

I think the original commit 22b31eec63e5 ('badpage: vm_normal_page use
print_bad_pte') introduced highest_memmap_pfn as a cheaper check than a
full pfn_valid().

One definition of pfn_valid() is:

#define pfn_valid(pfn)          ((pfn) < max_pfn)

which doesn't care whether the page is actually present or has an
initialized memmap.

I am not sure every pfn_valid() implementation could leverage this. One
thing that is certain is that there are only two users of
highest_memmap_pfn:

   * vm_normal_page_pmd
   * _vm_normal_page
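
Both perform the same bounds check on a pfn read out of a page table
entry; the one in _vm_normal_page() looks roughly like this:

	check_pfn:
		if (unlikely(pfn > highest_memmap_pfn)) {
			print_bad_pte(vma, addr, pte, NULL);
			return NULL;
		}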

Patch

diff --git a/mm/internal.h b/mm/internal.h
index 6a57811ae47d..f9e19c7d9b0a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -79,6 +79,14 @@  static inline void set_page_refcounted(struct page *page)
 }
 
 extern unsigned long highest_memmap_pfn;
+extern spinlock_t highest_memmap_pfn_lock;
+static inline void update_highest_memmap_pfn(unsigned long end_pfn)
+{
+	spin_lock(&highest_memmap_pfn_lock);
+	if (highest_memmap_pfn < end_pfn - 1)
+		highest_memmap_pfn = end_pfn - 1;
+	spin_unlock(&highest_memmap_pfn_lock);
+}
 
 /*
  * Maximum number of reclaim retries without progress before the OOM
diff --git a/mm/memory.c b/mm/memory.c
index 4ad2d293ddc2..0cf9b88b51b7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -127,6 +127,7 @@  unsigned long zero_pfn __read_mostly;
 EXPORT_SYMBOL(zero_pfn);
 
 unsigned long highest_memmap_pfn __read_mostly;
+DEFINE_SPINLOCK(highest_memmap_pfn_lock);
 
 /*
  * CONFIG_MMU architectures set up ZERO_PAGE in their paging_init()
diff --git a/mm/nommu.c b/mm/nommu.c
index 749276beb109..610dc6e17ce5 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -48,6 +48,7 @@  struct page *mem_map;
 unsigned long max_mapnr;
 EXPORT_SYMBOL(max_mapnr);
 unsigned long highest_memmap_pfn;
+DEFINE_SPINLOCK(highest_memmap_pfn_lock);
 int sysctl_nr_trim_pages = CONFIG_NOMMU_INITIAL_TRIM_EXCESS;
 int heap_stack_gap = 0;
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ccc86df24f28..33bb339aaef8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1216,6 +1216,7 @@  static void __meminit init_reserved_page(unsigned long pfn)
 			break;
 	}
 	__init_single_page(pfn_to_page(pfn), pfn, zid, nid);
+	update_highest_memmap_pfn(pfn + 1);
 }
 #else
 static inline void init_reserved_page(unsigned long pfn)
@@ -1540,6 +1541,7 @@  static unsigned long  __init deferred_init_pages(int nid, int zid,
 		__init_single_page(page, pfn, zid, nid);
 		nr_pages++;
 	}
+	update_highest_memmap_pfn(pfn);
 	return (nr_pages);
 }
 
@@ -5491,9 +5493,6 @@  void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 	unsigned long pfn, end_pfn = start_pfn + size;
 	struct page *page;
 
-	if (highest_memmap_pfn < end_pfn - 1)
-		highest_memmap_pfn = end_pfn - 1;
-
 #ifdef CONFIG_ZONE_DEVICE
 	/*
 	 * Honor reservation requested by the driver for this ZONE_DEVICE
@@ -5550,6 +5549,8 @@  void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			cond_resched();
 		}
 	}
+
+	update_highest_memmap_pfn(pfn);
 }
 
 #ifdef CONFIG_ZONE_DEVICE
@@ -5622,6 +5623,8 @@  void __ref memmap_init_zone_device(struct zone *zone,
 		}
 	}
 
+	update_highest_memmap_pfn(pfn);
+
 	pr_info("%s initialised, %lu pages in %ums\n", dev_name(pgmap->dev),
 		size, jiffies_to_msecs(jiffies - start));
 }