diff mbox series

[RFC] mm/hotplug: Make get_nid_for_pfn() work with HAVE_ARCH_PFN_VALID

Message ID 1553155700-3414-1-git-send-email-anshuman.khandual@arm.com (mailing list archive)
State New, archived
Headers show
Series [RFC] mm/hotplug: Make get_nid_for_pfn() work with HAVE_ARCH_PFN_VALID | expand

Commit Message

Anshuman Khandual March 21, 2019, 8:08 a.m. UTC
Memory hot remove uses get_nid_for_pfn() while tearing down linked sysfs
entries between memory block and node. It first checks pfn validity with
pfn_valid_within() before fetching nid. With CONFIG_HOLES_IN_ZONE config
(arm64 has this enabled) pfn_valid_within() calls pfn_valid().

pfn_valid() is an arch implementation on arm64 (CONFIG_HAVE_ARCH_PFN_VALID)
which scans all mapped memblock regions with memblock_is_map_memory(). This
creates a problem in memory hot remove path which has already removed given
memory range from memory block with memblock_[remove|free] before arriving
at unregister_mem_sect_under_nodes().

During runtime memory hot remove get_nid_for_pfn() needs to validate that
given pfn has a struct page mapping so that it can fetch required nid. This
can be achieved just by looking into it's section mapping information. This
adds a new helper pfn_section_valid() for this purpose. Its same as generic
pfn_valid().

This maintains existing behaviour for deferred struct page init case.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
This is a preparatory patch for memory hot-remove enablement on arm64. I
will appreciate some early feedback on this approach.

 drivers/base/node.c    | 15 ++++++++++++---
 include/linux/mmzone.h |  9 +++++++--
 2 files changed, 19 insertions(+), 5 deletions(-)

Comments

Michal Hocko March 21, 2019, 8:36 a.m. UTC | #1
On Thu 21-03-19 13:38:20, Anshuman Khandual wrote:
> Memory hot remove uses get_nid_for_pfn() while tearing down linked sysfs
> entries between memory block and node. It first checks pfn validity with
> pfn_valid_within() before fetching nid. With CONFIG_HOLES_IN_ZONE config
> (arm64 has this enabled) pfn_valid_within() calls pfn_valid().
> 
> pfn_valid() is an arch implementation on arm64 (CONFIG_HAVE_ARCH_PFN_VALID)
> which scans all mapped memblock regions with memblock_is_map_memory(). This
> creates a problem in memory hot remove path which has already removed given
> memory range from memory block with memblock_[remove|free] before arriving
> at unregister_mem_sect_under_nodes().

Could you be more specific on what is the actual problem please? It
would be also helpful to mention when is the memblock[remove|free]
called actually.

> During runtime memory hot remove get_nid_for_pfn() needs to validate that
> given pfn has a struct page mapping so that it can fetch required nid. This
> can be achieved just by looking into it's section mapping information. This
> adds a new helper pfn_section_valid() for this purpose. Its same as generic
> pfn_valid().

I have to say I do not like this. Having pfn_section_valid != pfn_valid_within
is just confusing as hell. pfn_valid_within should return true whenever
a struct page exists and it is sensible (same like pfn_valid). So it
seems that this is something to be solved on that arch specific side of
pfn_valid.
Oscar Salvador March 21, 2019, 10:37 a.m. UTC | #2
On Thu, Mar 21, 2019 at 01:38:20PM +0530, Anshuman Khandual wrote:
> Memory hot remove uses get_nid_for_pfn() while tearing down linked sysfs
> entries between memory block and node. It first checks pfn validity with
> pfn_valid_within() before fetching nid. With CONFIG_HOLES_IN_ZONE config
> (arm64 has this enabled) pfn_valid_within() calls pfn_valid().
> 
> pfn_valid() is an arch implementation on arm64 (CONFIG_HAVE_ARCH_PFN_VALID)
> which scans all mapped memblock regions with memblock_is_map_memory(). This
> creates a problem in memory hot remove path which has already removed given
> memory range from memory block with memblock_[remove|free] before arriving
> at unregister_mem_sect_under_nodes().
> 
> During runtime memory hot remove get_nid_for_pfn() needs to validate that
> given pfn has a struct page mapping so that it can fetch required nid. This
> can be achieved just by looking into it's section mapping information. This
> adds a new helper pfn_section_valid() for this purpose. Its same as generic
> pfn_valid().
> 
> This maintains existing behaviour for deferred struct page init case.
> 
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>

I did not look really close to the patch, but I was dealing with
unregister_mem_sect_under_nodes() some time ago [1].

The thing is, I think we can just make it less complex.
Jonathan tried it out that patch on arm64 back then, and it worked correctly
for him, and it did for me too on x86_64.

I am not sure if I overlooked a corner case during the creation of the patch,
that could lead to problems.
But if not, we can get away with that, and we would not need to worry
about get_nid_for_pfn on hot-remove path.

I plan to revisit the patch in some days, but first I wanted to sort out
the vmemmap stuff, which I am preparing a new version of it.

[1] https://patchwork.kernel.org/patch/10700795/
Anshuman Khandual March 22, 2019, 6:19 a.m. UTC | #3
On 03/21/2019 02:06 PM, Michal Hocko wrote:
> On Thu 21-03-19 13:38:20, Anshuman Khandual wrote:
>> Memory hot remove uses get_nid_for_pfn() while tearing down linked sysfs
>> entries between memory block and node. It first checks pfn validity with
>> pfn_valid_within() before fetching nid. With CONFIG_HOLES_IN_ZONE config
>> (arm64 has this enabled) pfn_valid_within() calls pfn_valid().
>>
>> pfn_valid() is an arch implementation on arm64 (CONFIG_HAVE_ARCH_PFN_VALID)
>> which scans all mapped memblock regions with memblock_is_map_memory(). This
>> creates a problem in memory hot remove path which has already removed given
>> memory range from memory block with memblock_[remove|free] before arriving
>> at unregister_mem_sect_under_nodes().
> 
> Could you be more specific on what is the actual problem please? It
> would be also helpful to mention when is the memblock[remove|free]
> called actually.

The problem is in unregister_mem_sect_under_nodes() as it skips calling into both
instances of sysfs_remove_link() which removes node-memory block sysfs symlinks.
The node enumeration of the memory block still remains in sysfs even if the memory
block itself has been removed.

This happens because get_nid_for_pfn() returns -1 for a given pfn even if it has
a valid associated struct page to fetch the node ID from.

On arm64 (with CONFIG_HOLES_IN_ZONE)

get_nid_for_pfn() -> pfn_valid_within() -> pfn_valid -> memblock_is_map_memory()

At this point memblock for the range has been removed.

__remove_memory()
	memblock_free()
	memblock_remove()	--------> memblock has already been removed
	arch_remove_memory()
		__remove_pages()
			__remove_section()
				unregister_memory_section()
 					remove_memory_section()
						unregister_mem_sect_under_nodes()

There is a dependency on memblock (after it has been removed) through pfn_valid().
  			
> 
>> During runtime memory hot remove get_nid_for_pfn() needs to validate that
>> given pfn has a struct page mapping so that it can fetch required nid. This
>> can be achieved just by looking into it's section mapping information. This
>> adds a new helper pfn_section_valid() for this purpose. Its same as generic
>> pfn_valid().
> 
> I have to say I do not like this. Having pfn_section_valid != pfn_valid_within
> is just confusing as hell. pfn_valid_within should return true whenever
> a struct page exists and it is sensible (same like pfn_valid). So it
> seems that this is something to be solved on that arch specific side of
> pfn_valid.

At present arm64's pfn_valid() implementation validates the pfn inside sparse
memory section mapping as well memblock. The memblock search excludes memory
with MEMBLOCK_NOMAP attribute. But in this particular instance during hotplug
only section mapping validation for the pfn is good enough.

IIUC the current arm64 pfn_valid() already extends the definition beyond the
availability of a valid struct page to operate on.
Anshuman Khandual March 22, 2019, 6:45 a.m. UTC | #4
On 03/21/2019 04:07 PM, Oscar Salvador wrote:
> On Thu, Mar 21, 2019 at 01:38:20PM +0530, Anshuman Khandual wrote:
>> Memory hot remove uses get_nid_for_pfn() while tearing down linked sysfs
>> entries between memory block and node. It first checks pfn validity with
>> pfn_valid_within() before fetching nid. With CONFIG_HOLES_IN_ZONE config
>> (arm64 has this enabled) pfn_valid_within() calls pfn_valid().
>>
>> pfn_valid() is an arch implementation on arm64 (CONFIG_HAVE_ARCH_PFN_VALID)
>> which scans all mapped memblock regions with memblock_is_map_memory(). This
>> creates a problem in memory hot remove path which has already removed given
>> memory range from memory block with memblock_[remove|free] before arriving
>> at unregister_mem_sect_under_nodes().
>>
>> During runtime memory hot remove get_nid_for_pfn() needs to validate that
>> given pfn has a struct page mapping so that it can fetch required nid. This
>> can be achieved just by looking into it's section mapping information. This
>> adds a new helper pfn_section_valid() for this purpose. Its same as generic
>> pfn_valid().
>>
>> This maintains existing behaviour for deferred struct page init case.
>>
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> 
> I did not look really close to the patch, but I was dealing with
> unregister_mem_sect_under_nodes() some time ago [1].
> 
> The thing is, I think we can just make it less complex.
> Jonathan tried it out that patch on arm64 back then, and it worked correctly
> for him, and it did for me too on x86_64.
> 
> I am not sure if I overlooked a corner case during the creation of the patch,
> that could lead to problems.

Is there any known corner cases ?

> But if not, we can get away with that, and we would not need to worry
> about get_nid_for_pfn on hot-remove path.

The approach of passing down node ID looks good and will also avoid proposed
changes here to get_nid_for_pfn() during memory hot-remove.

> 
> I plan to revisit the patch in some days, but first I wanted to sort out
> the vmemmap stuff, which I am preparing a new version of it.
> 
> [1] https://patchwork.kernel.org/patch/10700795/
> 

Sure. Please keep me copied when you repost this patch. Thank you.
Michal Hocko March 22, 2019, 12:02 p.m. UTC | #5
On Fri 22-03-19 11:49:30, Anshuman Khandual wrote:
> 
> 
> On 03/21/2019 02:06 PM, Michal Hocko wrote:
> > On Thu 21-03-19 13:38:20, Anshuman Khandual wrote:
> >> Memory hot remove uses get_nid_for_pfn() while tearing down linked sysfs
> >> entries between memory block and node. It first checks pfn validity with
> >> pfn_valid_within() before fetching nid. With CONFIG_HOLES_IN_ZONE config
> >> (arm64 has this enabled) pfn_valid_within() calls pfn_valid().
> >>
> >> pfn_valid() is an arch implementation on arm64 (CONFIG_HAVE_ARCH_PFN_VALID)
> >> which scans all mapped memblock regions with memblock_is_map_memory(). This
> >> creates a problem in memory hot remove path which has already removed given
> >> memory range from memory block with memblock_[remove|free] before arriving
> >> at unregister_mem_sect_under_nodes().
> > 
> > Could you be more specific on what is the actual problem please? It
> > would be also helpful to mention when is the memblock[remove|free]
> > called actually.
> 
> The problem is in unregister_mem_sect_under_nodes() as it skips calling into both
> instances of sysfs_remove_link() which removes node-memory block sysfs symlinks.
> The node enumeration of the memory block still remains in sysfs even if the memory
> block itself has been removed.
> 
> This happens because get_nid_for_pfn() returns -1 for a given pfn even if it has
> a valid associated struct page to fetch the node ID from.
> 
> On arm64 (with CONFIG_HOLES_IN_ZONE)
> 
> get_nid_for_pfn() -> pfn_valid_within() -> pfn_valid -> memblock_is_map_memory()
> 
> At this point memblock for the range has been removed.
> 
> __remove_memory()
> 	memblock_free()
> 	memblock_remove()	--------> memblock has already been removed
> 	arch_remove_memory()
> 		__remove_pages()
> 			__remove_section()
> 				unregister_memory_section()
>  					remove_memory_section()
> 						unregister_mem_sect_under_nodes()
> 
> There is a dependency on memblock (after it has been removed) through pfn_valid().

Can we reorganize or rework the code that the memblock is removed later?
I guess this is what Oscar was suggesting.

Or ...

> >> During runtime memory hot remove get_nid_for_pfn() needs to validate that
> >> given pfn has a struct page mapping so that it can fetch required nid. This
> >> can be achieved just by looking into it's section mapping information. This
> >> adds a new helper pfn_section_valid() for this purpose. Its same as generic
> >> pfn_valid().
> > 
> > I have to say I do not like this. Having pfn_section_valid != pfn_valid_within
> > is just confusing as hell. pfn_valid_within should return true whenever
> > a struct page exists and it is sensible (same like pfn_valid). So it
> > seems that this is something to be solved on that arch specific side of
> > pfn_valid.
> 
> At present arm64's pfn_valid() implementation validates the pfn inside sparse
> memory section mapping as well memblock. The memblock search excludes memory
> with MEMBLOCK_NOMAP attribute. But in this particular instance during hotplug
> only section mapping validation for the pfn is good enough.
> 
> IIUC the current arm64 pfn_valid() already extends the definition beyond the
> availability of a valid struct page to operate on.

is there any way to record that nomap information into the section
instead. It looks rather weird that this information is spread into two
data structures and it makes pfn_valid more expensive at the same time.
Anshuman Khandual March 26, 2019, 12:03 p.m. UTC | #6
On 03/22/2019 05:32 PM, Michal Hocko wrote:
> On Fri 22-03-19 11:49:30, Anshuman Khandual wrote:
>>
>> On 03/21/2019 02:06 PM, Michal Hocko wrote:
>>> On Thu 21-03-19 13:38:20, Anshuman Khandual wrote:
>>>> Memory hot remove uses get_nid_for_pfn() while tearing down linked sysfs
>>>> entries between memory block and node. It first checks pfn validity with
>>>> pfn_valid_within() before fetching nid. With CONFIG_HOLES_IN_ZONE config
>>>> (arm64 has this enabled) pfn_valid_within() calls pfn_valid().
>>>>
>>>> pfn_valid() is an arch implementation on arm64 (CONFIG_HAVE_ARCH_PFN_VALID)
>>>> which scans all mapped memblock regions with memblock_is_map_memory(). This
>>>> creates a problem in memory hot remove path which has already removed given
>>>> memory range from memory block with memblock_[remove|free] before arriving
>>>> at unregister_mem_sect_under_nodes().
>>> Could you be more specific on what is the actual problem please? It
>>> would be also helpful to mention when is the memblock[remove|free]
>>> called actually.
>> The problem is in unregister_mem_sect_under_nodes() as it skips calling into both
>> instances of sysfs_remove_link() which removes node-memory block sysfs symlinks.
>> The node enumeration of the memory block still remains in sysfs even if the memory
>> block itself has been removed.
>>
>> This happens because get_nid_for_pfn() returns -1 for a given pfn even if it has
>> a valid associated struct page to fetch the node ID from.
>>
>> On arm64 (with CONFIG_HOLES_IN_ZONE)
>>
>> get_nid_for_pfn() -> pfn_valid_within() -> pfn_valid -> memblock_is_map_memory()
>>
>> At this point memblock for the range has been removed.
>>
>> __remove_memory()
>> 	memblock_free()
>> 	memblock_remove()	--------> memblock has already been removed
>> 	arch_remove_memory()
>> 		__remove_pages()
>> 			__remove_section()
>> 				unregister_memory_section()
>>  					remove_memory_section()
>> 						unregister_mem_sect_under_nodes()
>>
>> There is a dependency on memblock (after it has been removed) through pfn_valid().
> Can we reorganize or rework the code that the memblock is removed later?
> I guess this is what Oscar was suggesting.

I could get it working with the following re-order of memblock_[free|remove] and
arch_remove_memory(). I did not observe any other adverse side affect because of
this change. Does it look okay ?

--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1863,11 +1863,11 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
 
        /* remove memmap entry */
        firmware_map_remove(start, start + size, "System RAM");
+       arch_remove_memory(nid, start, size, NULL);
+
        memblock_free(start, size);
        memblock_remove(start, size);
 
-       arch_remove_memory(nid, start, size, NULL);
-
        try_offline_node(nid);
 
        mem_hotplug_done();
Michal Hocko March 26, 2019, 12:25 p.m. UTC | #7
On Tue 26-03-19 17:33:19, Anshuman Khandual wrote:
[...]
> I could get it working with the following re-order of memblock_[free|remove] and
> arch_remove_memory(). I did not observe any other adverse side affect because of
> this change. Does it look okay ?

Memblock should only work with physical memory ranges without touching
struct pages so this should be safe. But you should double check of
course.

> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1863,11 +1863,11 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
>  
>         /* remove memmap entry */
>         firmware_map_remove(start, start + size, "System RAM");
> +       arch_remove_memory(nid, start, size, NULL);
> +
>         memblock_free(start, size);
>         memblock_remove(start, size);
>  
> -       arch_remove_memory(nid, start, size, NULL);
> -
>         try_offline_node(nid);
>  
>         mem_hotplug_done();
diff mbox series

Patch

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 86d6cd92ce3d..9e944b71e352 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -394,11 +394,20 @@  int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
 static int __ref get_nid_for_pfn(unsigned long pfn)
 {
-	if (!pfn_valid_within(pfn))
-		return -1;
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
-	if (system_state < SYSTEM_RUNNING)
+	if (system_state < SYSTEM_RUNNING) {
+		if (!pfn_valid_within(pfn))
+			return -1;
 		return early_pfn_to_nid(pfn);
+	}
+#endif
+
+#if defined(CONFIG_HAVE_ARCH_PFN_VALID) && defined(CONFIG_HOLES_IN_ZONE)
+	if (!pfn_section_valid(pfn))
+		return -1;
+#else
+	if (!pfn_valid_within(pfn))
+		return -1;
 #endif
 	return pfn_to_nid(pfn);
 }
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 842f9189537b..9cf4c1111b95 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1242,13 +1242,18 @@  static inline struct mem_section *__pfn_to_section(unsigned long pfn)
 
 extern int __highest_present_section_nr;
 
-#ifndef CONFIG_HAVE_ARCH_PFN_VALID
-static inline int pfn_valid(unsigned long pfn)
+static inline int pfn_section_valid(unsigned long pfn)
 {
 	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
 		return 0;
 	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
 }
+
+#ifndef CONFIG_HAVE_ARCH_PFN_VALID
+static inline int pfn_valid(unsigned long pfn)
+{
+	return pfn_section_valid(pfn);
+}
 #endif
 
 static inline int pfn_present(unsigned long pfn)