
[RESEND,v10,2/6] mm: page_alloc: remain memblock_next_valid_pfn() on arm/arm64

Message ID 1530867675-9018-3-git-send-email-hejianet@gmail.com (mailing list archive)
State New, archived

Commit Message

Jia He July 6, 2018, 9:01 a.m. UTC
From: Jia He <jia.he@hxt-semitech.com>

Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But it causes
possible panic bug. So Daniel Vacek reverted it later.

But as suggested by Daniel Vacek, it is fine to use memblock to skip
gaps and find the next valid frame when CONFIG_HAVE_ARCH_PFN_VALID is set.
Daniel said:
"On arm and arm64, memblock is used by default. But generic version of
pfn_valid() is based on mem sections and memblock_next_valid_pfn() does
not always return the next valid one but skips more resulting in some
valid frames to be skipped (as if they were invalid). And that's why
kernel was eventually crashing on some !arm machines."

About the performance consideration,
as said by James in b92df1de5d28:
"I have tested this patch on a virtual model of a Samurai CPU
with a sparse memory map.  The kernel boot time drops from 109 to
62 seconds."

Thus it is better to retain memblock_next_valid_pfn() on arm/arm64.

Suggested-by: Daniel Vacek <neelx@redhat.com>
Signed-off-by: Jia He <jia.he@hxt-semitech.com>
---
 include/linux/mmzone.h | 11 +++++++++++
 mm/memblock.c          | 30 ++++++++++++++++++++++++++++++
 mm/page_alloc.c        |  5 ++++-
 3 files changed, 45 insertions(+), 1 deletion(-)

Comments

Andrew Morton July 6, 2018, 10:37 p.m. UTC | #1
On Fri,  6 Jul 2018 17:01:11 +0800 Jia He <hejianet@gmail.com> wrote:

> From: Jia He <jia.he@hxt-semitech.com>
> 
> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
> where possible") optimized the loop in memmap_init_zone(). But it causes
> possible panic bug. So Daniel Vacek reverted it later.
> 
> But as suggested by Daniel Vacek, it is fine to use memblock to skip
> gaps and find the next valid frame when CONFIG_HAVE_ARCH_PFN_VALID is set.
> Daniel said:
> "On arm and arm64, memblock is used by default. But generic version of
> pfn_valid() is based on mem sections and memblock_next_valid_pfn() does
> not always return the next valid one but skips more resulting in some
> valid frames to be skipped (as if they were invalid). And that's why
> kernel was eventually crashing on some !arm machines."
> 
> About the performance consideration,
> as said by James in b92df1de5d28:
> "I have tested this patch on a virtual model of a Samurai CPU
> with a sparse memory map.  The kernel boot time drops from 109 to
> 62 seconds."
> 
> Thus it is better to retain memblock_next_valid_pfn() on arm/arm64.
> 

We're making a bit of a mess here.  mmzone.h:

...
#ifndef CONFIG_HAVE_ARCH_PFN_VALID
...
#define next_valid_pfn(pfn)	(pfn + 1)
#endif
...
#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
#define next_valid_pfn(pfn)	memblock_next_valid_pfn(pfn)
...
#else
...
#ifndef next_valid_pfn
#define next_valid_pfn(pfn)	(pfn + 1)
#endif

I guess it works OK, since CONFIG_HAVE_MEMBLOCK_PFN_VALID depends on
CONFIG_HAVE_ARCH_PFN_VALID.  But it could all do with some cleanup and
modernization.

- Perhaps memblock_next_valid_pfn() should just be called
  pfn_valid().  So the header file's responsibility is to provide
  pfn_valid() and next_valid_pfn().

- CONFIG_HAVE_ARCH_PFN_VALID should go away.  The current way of
  doing such things is for the arch (or some Kconfig combination) to
  define pfn_valid() and next_valid_pfn() in some fashion and to then
  ensure that one of them is #defined to something, to indicate that
  both of these have been set up.  Or something like that.
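
For instance, roughly (just a sketch of the idea; the arch_* and
generic_* names here are hypothetical):

	/* arch header, when the arch has its own implementations */
	extern int arch_pfn_valid(unsigned long pfn);
	extern unsigned long arch_next_valid_pfn(unsigned long pfn);
	#define pfn_valid(pfn)		arch_pfn_valid(pfn)
	#define next_valid_pfn(pfn)	arch_next_valid_pfn(pfn)

	/* generic header: fall back when the arch defined nothing */
	#ifndef pfn_valid
	#define pfn_valid(pfn)		generic_pfn_valid(pfn)
	#define next_valid_pfn(pfn)	((pfn) + 1)
	#endif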


Secondly, in memmap_init_zone()

> -		if (!early_pfn_valid(pfn))
> +		if (!early_pfn_valid(pfn)) {
> +			pfn = next_valid_pfn(pfn) - 1;
> 			continue;
> +		}
> +

This is weird-looking.  next_valid_pfn(pfn) is usually (pfn+1) so it's
a no-op.  Sometimes we're calling memblock_next_valid_pfn() and then
backing up one, presumably because the `for' loop ends in `pfn++'.  Or
something.  Can this please be fully commented or cleaned up?
Jia He July 9, 2018, 3:30 a.m. UTC | #2
Hi Andrew,
Thanks for the comments.

On 7/7/2018 6:37 AM, Andrew Morton Wrote:
> On Fri,  6 Jul 2018 17:01:11 +0800 Jia He <hejianet@gmail.com> wrote:
> 
>> From: Jia He <jia.he@hxt-semitech.com>
>>
>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> where possible") optimized the loop in memmap_init_zone(). But it causes
>> possible panic bug. So Daniel Vacek reverted it later.
>>
>> But as suggested by Daniel Vacek, it is fine to use memblock to skip
>> gaps and find the next valid frame when CONFIG_HAVE_ARCH_PFN_VALID is set.
>> Daniel said:
>> "On arm and arm64, memblock is used by default. But generic version of
>> pfn_valid() is based on mem sections and memblock_next_valid_pfn() does
>> not always return the next valid one but skips more resulting in some
>> valid frames to be skipped (as if they were invalid). And that's why
>> kernel was eventually crashing on some !arm machines."
>>
>> About the performance consideration,
>> as said by James in b92df1de5d28:
>> "I have tested this patch on a virtual model of a Samurai CPU
>> with a sparse memory map.  The kernel boot time drops from 109 to
>> 62 seconds."
>>
>> Thus it is better to retain memblock_next_valid_pfn() on arm/arm64.
>>
> 
> We're making a bit of a mess here.  mmzone.h:
> 
> ...
> #ifndef CONFIG_HAVE_ARCH_PFN_VALID
> ...
> #define next_valid_pfn(pfn)	(pfn + 1)

Yes, ^ this line can be removed.

> #endif
> ...
> #ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
> #define next_valid_pfn(pfn)	memblock_next_valid_pfn(pfn)
> ...
> #else
> ...
> #ifndef next_valid_pfn
> #define next_valid_pfn(pfn)	(pfn + 1)
> #endif
> 
> I guess it works OK, since CONFIG_HAVE_MEMBLOCK_PFN_VALID depends on
> CONFIG_HAVE_ARCH_PFN_VALID.  But it could all do with some cleanup and
> modernization.
> 
> - Perhaps memblock_next_valid_pfn() should just be called
>   pfn_valid().  So the header file's responsibility is to provide
>   pfn_valid() and next_valid_pfn().
> 
> - CONFIG_HAVE_ARCH_PFN_VALID should go away.  The current way of
>   doing such things is for the arch (or some Kconfig combination) to
>   define pfn_valid() and next_valid_pfn() in some fashion and to then
>   ensure that one of them is #defined to something, to indicate that
>   both of these have been set up.  Or something like that.

This is what I did in patch v2, please see [1], but Daniel opposed it [2].

As he said:
Now, if any other architecture defines CONFIG_HAVE_ARCH_PFN_VALID and
implements its own version of pfn_valid(), there is no guarantee that
it will be based on memblock data or somehow equivalent to the arm
implementation, right?
I think it makes sense, so I introduced the new config
CONFIG_HAVE_MEMBLOCK_PFN_VALID instead of using CONFIG_HAVE_ARCH_PFN_VALID.
What do you think? :-)

[1] https://lkml.org/lkml/2018/3/24/71
[2] https://lkml.org/lkml/2018/3/28/231
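
For reference, the idea is that an arch selects it only when its
pfn_valid() is really memblock-based. A sketch (the actual Kconfig hunks
live in other patches of this series, so the placement here is
hypothetical):

	config HAVE_MEMBLOCK_PFN_VALID
		bool

	# in the arch Kconfig, e.g. for arm64:
	config ARM64
		...
		select HAVE_MEMBLOCK_PFN_VALID if HAVE_ARCH_PFN_VALID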

> 
> 
> Secondly, in memmap_init_zone()
> 
>> -		if (!early_pfn_valid(pfn))
>> +		if (!early_pfn_valid(pfn)) {
>> +			pfn = next_valid_pfn(pfn) - 1;
>> 			continue;
>> +		}
>> +
> 
> This is weird-looking.  next_valid_pfn(pfn) is usually (pfn+1) so it's
> a no-op.  Sometimes we're calling memblock_next_valid_pfn() and then
> backing up one, presumably because the `for' loop ends in `pfn++'.  Or
> something.  Can this please be fully commented or cleaned up?
To clean it up, would something like the below be acceptable to you and
other experts?
		if (!early_pfn_valid(pfn)) {
#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
			pfn = next_valid_pfn(pfn) - 1;
#endif
			continue;
		}

Another way, suggested by Ard Biesheuvel, would be something like:
	for (pfn = start_pfn; pfn < end_pfn; pfn = next_valid_pfn(pfn))
	...
But it might have an impact on the memmap_init_zone() loop.

E.g. when context != MEMMAP_EARLY, pfn would not be checked by
early_pfn_valid(), thus it would change the mem hotplug logic.

Sure, as you suggested, I can add comments for this line covering all the
cases of the different configs/arches.
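
E.g. something like this (a sketch; exact comment wording to be refined):

		if (!early_pfn_valid(pfn)) {
			/*
			 * With CONFIG_HAVE_MEMBLOCK_PFN_VALID,
			 * next_valid_pfn(pfn) jumps over the whole memblock
			 * gap; otherwise it is just (pfn + 1) and this is a
			 * plain continue.  The "- 1" compensates for the
			 * pfn++ at the end of the for loop.
			 */
			pfn = next_valid_pfn(pfn) - 1;
			continue;
		}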
Pasha Tatashin Aug. 16, 2018, 10:54 p.m. UTC | #3
On 18-07-06 17:01:11, Jia He wrote:
> From: Jia He <jia.he@hxt-semitech.com>
> 
> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
> where possible") optimized the loop in memmap_init_zone(). But it causes
> possible panic bug. So Daniel Vacek reverted it later.
> 
> But as suggested by Daniel Vacek, it is fine to use memblock to skip
> gaps and find the next valid frame when CONFIG_HAVE_ARCH_PFN_VALID is set.
> Daniel said:
> "On arm and arm64, memblock is used by default. But generic version of
> pfn_valid() is based on mem sections and memblock_next_valid_pfn() does
> not always return the next valid one but skips more resulting in some
> valid frames to be skipped (as if they were invalid). And that's why
> kernel was eventually crashing on some !arm machines."
> 
> About the performance consideration,
> as said by James in b92df1de5d28:
> "I have tested this patch on a virtual model of a Samurai CPU
> with a sparse memory map.  The kernel boot time drops from 109 to
> 62 seconds."
> 
> Thus it is better to retain memblock_next_valid_pfn() on arm/arm64.
> 
> Suggested-by: Daniel Vacek <neelx@redhat.com>
> Signed-off-by: Jia He <jia.he@hxt-semitech.com>

The version of this patch in linux-next has a few fixes; I reviewed that
one and it looks good to me.

Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>

Patch

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32699b2..57cdc42 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1241,6 +1241,8 @@  static inline int pfn_valid(unsigned long pfn)
 		return 0;
 	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
 }
+
+#define next_valid_pfn(pfn)	(pfn + 1)
 #endif
 
 static inline int pfn_present(unsigned long pfn)
@@ -1266,6 +1268,10 @@  static inline int pfn_present(unsigned long pfn)
 #endif
 
 #define early_pfn_valid(pfn)	pfn_valid(pfn)
+#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
+extern ulong memblock_next_valid_pfn(ulong pfn);
+#define next_valid_pfn(pfn)	memblock_next_valid_pfn(pfn)
+#endif
 void sparse_init(void);
 #else
 #define sparse_init()	do {} while (0)
@@ -1287,6 +1293,11 @@  struct mminit_pfnnid_cache {
 #define early_pfn_valid(pfn)	(1)
 #endif
 
+/* fallback to default definitions*/
+#ifndef next_valid_pfn
+#define next_valid_pfn(pfn)	(pfn + 1)
+#endif
+
 void memory_present(int nid, unsigned long start, unsigned long end);
 
 /*
diff --git a/mm/memblock.c b/mm/memblock.c
index b9cdfa0..ccad225 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1139,6 +1139,36 @@  int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
 }
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
+#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
+ulong __init_memblock memblock_next_valid_pfn(ulong pfn)
+{
+	struct memblock_type *type = &memblock.memory;
+	unsigned int right = type->cnt;
+	unsigned int mid, left = 0;
+	phys_addr_t addr = PFN_PHYS(++pfn);
+
+	do {
+		mid = (right + left) / 2;
+
+		if (addr < type->regions[mid].base)
+			right = mid;
+		else if (addr >= (type->regions[mid].base +
+				  type->regions[mid].size))
+			left = mid + 1;
+		else {
+			/* addr is within the region, so pfn is valid */
+			return pfn;
+		}
+	} while (left < right);
+
+	if (right == type->cnt)
+		return -1UL;
+	else
+		return PHYS_PFN(type->regions[right].base);
+}
+EXPORT_SYMBOL(memblock_next_valid_pfn);
+#endif /*CONFIG_HAVE_MEMBLOCK_PFN_VALID*/
+
 static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
 					phys_addr_t align, phys_addr_t start,
 					phys_addr_t end, int nid, ulong flags)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cd3c7b9..607deff 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5485,8 +5485,11 @@  void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		if (context != MEMMAP_EARLY)
 			goto not_early;
 
-		if (!early_pfn_valid(pfn))
+		if (!early_pfn_valid(pfn)) {
+			pfn = next_valid_pfn(pfn) - 1;
 			continue;
+		}
+
 		if (!early_pfn_in_nid(pfn, nid))
 			continue;
 		if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))