
[RFC,v2,3/4] mm,memory_hotplug: Add mhp_supports_memmap_on_memory

Message ID 20201125112048.8211-4-osalvador@suse.de (mailing list archive)
State New, archived
Series Allocate memmap from hotadded memory (per device)

Commit Message

Oscar Salvador Nov. 25, 2020, 11:20 a.m. UTC
mhp_supports_memmap_on_memory is meant to be used by the caller prior
to hot-adding memory in order to figure out whether it can enable
MHP_MEMMAP_ON_MEMORY or not.

Enabling MHP_MEMMAP_ON_MEMORY requires:

 - CONFIG_SPARSEMEM_VMEMMAP
 - architecture support for altmap
 - hot-added range spans a single memory block

At the moment, only three architectures support passing an altmap when
building the page tables: x86, powerpc and arm64.
Define an arch_support_memmap_on_memory() helper that returns true on
those architectures, and a __weak variant that returns false to be used
by all the others.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 arch/arm64/mm/mmu.c   |  5 +++++
 arch/powerpc/mm/mem.c |  5 +++++
 arch/x86/mm/init_64.c |  5 +++++
 mm/memory_hotplug.c   | 24 ++++++++++++++++++++++++
 4 files changed, 39 insertions(+)
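
For context, a caller is expected to probe the helper and only request the
flag when it is supported. A minimal sketch, assuming the prototype is made
visible to the caller (the function name below is made up for illustration;
the actual caller wiring is not part of this patch):

	#include <linux/memory_hotplug.h>

	/*
	 * Sketch only: ask for a self-hosted memmap when all prerequisites
	 * are met, fall back to a plain hot-add otherwise.
	 */
	static int example_add_range(int nid, u64 start, u64 size)
	{
		mhp_t mhp_flags = MHP_NONE;

		if (mhp_supports_memmap_on_memory(size))
			mhp_flags |= MHP_MEMMAP_ON_MEMORY;

		return add_memory(nid, start, size, mhp_flags);
	}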

Comments

Michal Hocko Nov. 27, 2020, 3:02 p.m. UTC | #1
On Wed 25-11-20 12:20:47, Oscar Salvador wrote:
> mhp_supports_memmap_on_memory is meant to be used by the caller prior
> to hot-adding memory in order to figure out whether it can enable
> MHP_MEMMAP_ON_MEMORY or not.
> 
> Enabling MHP_MEMMAP_ON_MEMORY requires:
> 
>  - CONFIG_SPARSEMEM_VMEMMAP
>  - architecture support for altmap
>  - hot-added range spans a single memory block

It should also require a tunable (a kernel parameter for now, but maybe we
will need a more fine-grained control later) to enable this explicitly.
Earlier discussions have pointed out that allocating the vmemmap from each
section can lead to a sparse memory layout unsuitable for very large pages.
So I believe this should be an opt-in.
 
Also is there any reason why this cannot be a preparatory patch for the
actual implementation? It would look more natural that way to me.
Oscar Salvador Nov. 30, 2020, 8:50 a.m. UTC | #2
On Fri, Nov 27, 2020 at 04:02:53PM +0100, Michal Hocko wrote:
> It should also require a tunable (a kernel parameter for now, but maybe we
> will need a more fine-grained control later) to enable this explicitly.
> Earlier discussions have pointed out that allocating the vmemmap from each
> section can lead to a sparse memory layout unsuitable for very large pages.
> So I believe this should be an opt-in.

Yeah, I already had that in mind; I just did not get to implement it in this RFC
as I was more focused on the implementation per se.
I thought about a tunable in /sys/devices/system/memory/[file], but a kernel
command line boot option would also work, and for a first implementation it
would be less of a hassle.

> Also is there any reason why this cannot be a preparatory patch for the
> actual implementation? It would look more natural that way to me.

I guess you are right, will re-order it in a future submission.
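
As an aside on the opt-in discussed above: one possible shape for such a
boot-time switch would be a module parameter in mm/memory_hotplug.c. This is
purely a sketch of the idea, not part of this series, and the variable name
is made up. Since memory_hotplug.c is built in, it would be set as
memory_hotplug.memmap_on_memory=1 on the kernel command line:

	#include <linux/moduleparam.h>

	/*
	 * Sketch only: explicit opt-in for a self-hosted memmap.
	 * mhp_supports_memmap_on_memory() would also check this flag
	 * before allowing MHP_MEMMAP_ON_MEMORY.
	 */
	static bool memmap_on_memory __ro_after_init;
	module_param(memmap_on_memory, bool, 0444);
	MODULE_PARM_DESC(memmap_on_memory,
			 "Enable memmap to be self-hosted on hot-added memory");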

Patch

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index ca692a815731..0da4e4f8794f 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1456,6 +1456,11 @@  static bool inside_linear_region(u64 start, u64 size)
 	       (start + size - 1) <= __pa(PAGE_END - 1);
 }
 
+bool arch_support_memmap_on_memory(void)
+{
+	return true;
+}
+
 int arch_add_memory(int nid, u64 start, u64 size,
 		    struct mhp_params *params)
 {
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 3fc325bebe4d..18e7e28fe713 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -121,6 +121,11 @@  static void flush_dcache_range_chunked(unsigned long start, unsigned long stop,
 	}
 }
 
+bool arch_support_memmap_on_memory(void)
+{
+	return true;
+}
+
 int __ref arch_add_memory(int nid, u64 start, u64 size,
 			  struct mhp_params *params)
 {
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index b5a3fa4033d3..ffb9d87c77e8 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -860,6 +860,11 @@  int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
 	return ret;
 }
 
+bool arch_support_memmap_on_memory(void)
+{
+	return true;
+}
+
 int arch_add_memory(int nid, u64 start, u64 size,
 		    struct mhp_params *params)
 {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 87fbc2cc0d90..10255606ff85 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1028,6 +1028,20 @@  static int online_memory_block(struct memory_block *mem, void *arg)
 	return device_online(&mem->dev);
 }
 
+bool __weak arch_support_memmap_on_memory(void)
+{
+	return false;
+}
+
+bool mhp_supports_memmap_on_memory(unsigned long size)
+{
+	if (!arch_support_memmap_on_memory() ||
+	    !IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) ||
+	    size > memory_block_size_bytes())
+		return false;
+	return true;
+}
+
 /*
  * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
  * and online/offline operations (triggered e.g. by sysfs).
@@ -1064,6 +1078,16 @@  int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 		goto error;
 	new_node = ret;
 
+	/*
+	 * Return -EINVAL if caller specified MHP_MEMMAP_ON_MEMORY and we do
+	 * not support it.
+	 */
+	if ((mhp_flags & MHP_MEMMAP_ON_MEMORY) &&
+	    !mhp_supports_memmap_on_memory(size)) {
+		ret = -EINVAL;
+		goto error;
+	}
+
 	/*
 	 * Self hosted memmap array
 	 */