Message ID | 1553155700-3414-1-git-send-email-anshuman.khandual@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] mm/hotplug: Make get_nid_for_pfn() work with HAVE_ARCH_PFN_VALID | expand |
On Thu 21-03-19 13:38:20, Anshuman Khandual wrote: > Memory hot remove uses get_nid_for_pfn() while tearing down linked sysfs > entries between memory block and node. It first checks pfn validity with > pfn_valid_within() before fetching nid. With CONFIG_HOLES_IN_ZONE config > (arm64 has this enabled) pfn_valid_within() calls pfn_valid(). > > pfn_valid() is an arch implementation on arm64 (CONFIG_HAVE_ARCH_PFN_VALID) > which scans all mapped memblock regions with memblock_is_map_memory(). This > creates a problem in memory hot remove path which has already removed given > memory range from memory block with memblock_[remove|free] before arriving > at unregister_mem_sect_under_nodes(). Could you be more specific on what is the actual problem please? It would be also helpful to mention when is the memblock[remove|free] called actually. > During runtime memory hot remove get_nid_for_pfn() needs to validate that > given pfn has a struct page mapping so that it can fetch required nid. This > can be achieved just by looking into it's section mapping information. This > adds a new helper pfn_section_valid() for this purpose. Its same as generic > pfn_valid(). I have to say I do not like this. Having pfn_section_valid != pfn_valid_within is just confusing as hell. pfn_valid_within should return true whenever a struct page exists and it is sensible (same like pfn_valid). So it seems that this is something to be solved on that arch specific side of pfn_valid.
On Thu, Mar 21, 2019 at 01:38:20PM +0530, Anshuman Khandual wrote: > Memory hot remove uses get_nid_for_pfn() while tearing down linked sysfs > entries between memory block and node. It first checks pfn validity with > pfn_valid_within() before fetching nid. With CONFIG_HOLES_IN_ZONE config > (arm64 has this enabled) pfn_valid_within() calls pfn_valid(). > > pfn_valid() is an arch implementation on arm64 (CONFIG_HAVE_ARCH_PFN_VALID) > which scans all mapped memblock regions with memblock_is_map_memory(). This > creates a problem in memory hot remove path which has already removed given > memory range from memory block with memblock_[remove|free] before arriving > at unregister_mem_sect_under_nodes(). > > During runtime memory hot remove get_nid_for_pfn() needs to validate that > given pfn has a struct page mapping so that it can fetch required nid. This > can be achieved just by looking into it's section mapping information. This > adds a new helper pfn_section_valid() for this purpose. Its same as generic > pfn_valid(). > > This maintains existing behaviour for deferred struct page init case. > > Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> I did not look really close to the patch, but I was dealing with unregister_mem_sect_under_nodes() some time ago [1]. The thing is, I think we can just make it less complex. Jonathan tried it out that patch on arm64 back then, and it worked correctly for him, and it did for me too on x86_64. I am not sure if I overlooked a corner case during the creation of the patch, that could lead to problems. But if not, we can get away with that, and we would not need to worry about get_nid_for_pfn on hot-remove path. I plan to revisit the patch in some days, but first I wanted to sort out the vmemmap stuff, which I am preparing a new version of it. [1] https://patchwork.kernel.org/patch/10700795/
On 03/21/2019 02:06 PM, Michal Hocko wrote: > On Thu 21-03-19 13:38:20, Anshuman Khandual wrote: >> Memory hot remove uses get_nid_for_pfn() while tearing down linked sysfs >> entries between memory block and node. It first checks pfn validity with >> pfn_valid_within() before fetching nid. With CONFIG_HOLES_IN_ZONE config >> (arm64 has this enabled) pfn_valid_within() calls pfn_valid(). >> >> pfn_valid() is an arch implementation on arm64 (CONFIG_HAVE_ARCH_PFN_VALID) >> which scans all mapped memblock regions with memblock_is_map_memory(). This >> creates a problem in memory hot remove path which has already removed given >> memory range from memory block with memblock_[remove|free] before arriving >> at unregister_mem_sect_under_nodes(). > > Could you be more specific on what is the actual problem please? It > would be also helpful to mention when is the memblock[remove|free] > called actually. The problem is in unregister_mem_sect_under_nodes() as it skips calling into both instances of sysfs_remove_link() which removes node-memory block sysfs symlinks. The node enumeration of the memory block still remains in sysfs even if the memory block itself has been removed. This happens because get_nid_for_pfn() returns -1 for a given pfn even if it has a valid associated struct page to fetch the node ID from. On arm64 (with CONFIG_HOLES_IN_ZONE) get_nid_for_pfn() -> pfn_valid_within() -> pfn_valid -> memblock_is_map_memory() At this point memblock for the range has been removed. __remove_memory() memblock_free() memblock_remove() --------> memblock has already been removed arch_remove_memory() __remove_pages() __remove_section() unregister_memory_section() remove_memory_section() unregister_mem_sect_under_nodes() There is a dependency on memblock (after it has been removed) through pfn_valid(). > >> During runtime memory hot remove get_nid_for_pfn() needs to validate that >> given pfn has a struct page mapping so that it can fetch required nid. This >> can be achieved just by looking into it's section mapping information. This >> adds a new helper pfn_section_valid() for this purpose. Its same as generic >> pfn_valid(). > > I have to say I do not like this. Having pfn_section_valid != pfn_valid_within > is just confusing as hell. pfn_valid_within should return true whenever > a struct page exists and it is sensible (same like pfn_valid). So it > seems that this is something to be solved on that arch specific side of > pfn_valid. At present arm64's pfn_valid() implementation validates the pfn inside sparse memory section mapping as well memblock. The memblock search excludes memory with MEMBLOCK_NOMAP attribute. But in this particular instance during hotplug only section mapping validation for the pfn is good enough. IIUC the current arm64 pfn_valid() already extends the definition beyond the availability of a valid struct page to operate on.
On 03/21/2019 04:07 PM, Oscar Salvador wrote: > On Thu, Mar 21, 2019 at 01:38:20PM +0530, Anshuman Khandual wrote: >> Memory hot remove uses get_nid_for_pfn() while tearing down linked sysfs >> entries between memory block and node. It first checks pfn validity with >> pfn_valid_within() before fetching nid. With CONFIG_HOLES_IN_ZONE config >> (arm64 has this enabled) pfn_valid_within() calls pfn_valid(). >> >> pfn_valid() is an arch implementation on arm64 (CONFIG_HAVE_ARCH_PFN_VALID) >> which scans all mapped memblock regions with memblock_is_map_memory(). This >> creates a problem in memory hot remove path which has already removed given >> memory range from memory block with memblock_[remove|free] before arriving >> at unregister_mem_sect_under_nodes(). >> >> During runtime memory hot remove get_nid_for_pfn() needs to validate that >> given pfn has a struct page mapping so that it can fetch required nid. This >> can be achieved just by looking into it's section mapping information. This >> adds a new helper pfn_section_valid() for this purpose. Its same as generic >> pfn_valid(). >> >> This maintains existing behaviour for deferred struct page init case. >> >> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> > > I did not look really close to the patch, but I was dealing with > unregister_mem_sect_under_nodes() some time ago [1]. > > The thing is, I think we can just make it less complex. > Jonathan tried it out that patch on arm64 back then, and it worked correctly > for him, and it did for me too on x86_64. > > I am not sure if I overlooked a corner case during the creation of the patch, > that could lead to problems. Is there any known corner cases ? > But if not, we can get away with that, and we would not need to worry > about get_nid_for_pfn on hot-remove path. The approach of passing down node ID looks good and will also avoid proposed changes here to get_nid_for_pfn() during memory hot-remove. > > I plan to revisit the patch in some days, but first I wanted to sort out > the vmemmap stuff, which I am preparing a new version of it. > > [1] https://patchwork.kernel.org/patch/10700795/ > Sure. Please keep me copied when you repost this patch. Thank you.
On Fri 22-03-19 11:49:30, Anshuman Khandual wrote: > > > On 03/21/2019 02:06 PM, Michal Hocko wrote: > > On Thu 21-03-19 13:38:20, Anshuman Khandual wrote: > >> Memory hot remove uses get_nid_for_pfn() while tearing down linked sysfs > >> entries between memory block and node. It first checks pfn validity with > >> pfn_valid_within() before fetching nid. With CONFIG_HOLES_IN_ZONE config > >> (arm64 has this enabled) pfn_valid_within() calls pfn_valid(). > >> > >> pfn_valid() is an arch implementation on arm64 (CONFIG_HAVE_ARCH_PFN_VALID) > >> which scans all mapped memblock regions with memblock_is_map_memory(). This > >> creates a problem in memory hot remove path which has already removed given > >> memory range from memory block with memblock_[remove|free] before arriving > >> at unregister_mem_sect_under_nodes(). > > > > Could you be more specific on what is the actual problem please? It > > would be also helpful to mention when is the memblock[remove|free] > > called actually. > > The problem is in unregister_mem_sect_under_nodes() as it skips calling into both > instances of sysfs_remove_link() which removes node-memory block sysfs symlinks. > The node enumeration of the memory block still remains in sysfs even if the memory > block itself has been removed. > > This happens because get_nid_for_pfn() returns -1 for a given pfn even if it has > a valid associated struct page to fetch the node ID from. > > On arm64 (with CONFIG_HOLES_IN_ZONE) > > get_nid_for_pfn() -> pfn_valid_within() -> pfn_valid -> memblock_is_map_memory() > > At this point memblock for the range has been removed. > > __remove_memory() > memblock_free() > memblock_remove() --------> memblock has already been removed > arch_remove_memory() > __remove_pages() > __remove_section() > unregister_memory_section() > remove_memory_section() > unregister_mem_sect_under_nodes() > > There is a dependency on memblock (after it has been removed) through pfn_valid(). Can we reorganize or rework the code that the memblock is removed later? I guess this is what Oscar was suggesting. Or ... > >> During runtime memory hot remove get_nid_for_pfn() needs to validate that > >> given pfn has a struct page mapping so that it can fetch required nid. This > >> can be achieved just by looking into it's section mapping information. This > >> adds a new helper pfn_section_valid() for this purpose. Its same as generic > >> pfn_valid(). > > > > I have to say I do not like this. Having pfn_section_valid != pfn_valid_within > > is just confusing as hell. pfn_valid_within should return true whenever > > a struct page exists and it is sensible (same like pfn_valid). So it > > seems that this is something to be solved on that arch specific side of > > pfn_valid. > > At present arm64's pfn_valid() implementation validates the pfn inside sparse > memory section mapping as well memblock. The memblock search excludes memory > with MEMBLOCK_NOMAP attribute. But in this particular instance during hotplug > only section mapping validation for the pfn is good enough. > > IIUC the current arm64 pfn_valid() already extends the definition beyond the > availability of a valid struct page to operate on. is there any way to record that nomap information into the section instead. It looks rather weird that this information is spread into two data structures and it makes pfn_valid more expensive at the same time.
On 03/22/2019 05:32 PM, Michal Hocko wrote: > On Fri 22-03-19 11:49:30, Anshuman Khandual wrote: >> >> On 03/21/2019 02:06 PM, Michal Hocko wrote: >>> On Thu 21-03-19 13:38:20, Anshuman Khandual wrote: >>>> Memory hot remove uses get_nid_for_pfn() while tearing down linked sysfs >>>> entries between memory block and node. It first checks pfn validity with >>>> pfn_valid_within() before fetching nid. With CONFIG_HOLES_IN_ZONE config >>>> (arm64 has this enabled) pfn_valid_within() calls pfn_valid(). >>>> >>>> pfn_valid() is an arch implementation on arm64 (CONFIG_HAVE_ARCH_PFN_VALID) >>>> which scans all mapped memblock regions with memblock_is_map_memory(). This >>>> creates a problem in memory hot remove path which has already removed given >>>> memory range from memory block with memblock_[remove|free] before arriving >>>> at unregister_mem_sect_under_nodes(). >>> Could you be more specific on what is the actual problem please? It >>> would be also helpful to mention when is the memblock[remove|free] >>> called actually. >> The problem is in unregister_mem_sect_under_nodes() as it skips calling into both >> instances of sysfs_remove_link() which removes node-memory block sysfs symlinks. >> The node enumeration of the memory block still remains in sysfs even if the memory >> block itself has been removed. >> >> This happens because get_nid_for_pfn() returns -1 for a given pfn even if it has >> a valid associated struct page to fetch the node ID from. >> >> On arm64 (with CONFIG_HOLES_IN_ZONE) >> >> get_nid_for_pfn() -> pfn_valid_within() -> pfn_valid -> memblock_is_map_memory() >> >> At this point memblock for the range has been removed. >> >> __remove_memory() >> memblock_free() >> memblock_remove() --------> memblock has already been removed >> arch_remove_memory() >> __remove_pages() >> __remove_section() >> unregister_memory_section() >> remove_memory_section() >> unregister_mem_sect_under_nodes() >> >> There is a dependency on memblock (after it has been removed) through pfn_valid(). > Can we reorganize or rework the code that the memblock is removed later? > I guess this is what Oscar was suggesting. I could get it working with the following re-order of memblock_[free|remove] and arch_remove_memory(). I did not observe any other adverse side affect because of this change. Does it look okay ? --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1863,11 +1863,11 @@ void __ref __remove_memory(int nid, u64 start, u64 size) /* remove memmap entry */ firmware_map_remove(start, start + size, "System RAM"); + arch_remove_memory(nid, start, size, NULL); + memblock_free(start, size); memblock_remove(start, size); - arch_remove_memory(nid, start, size, NULL); - try_offline_node(nid); mem_hotplug_done();
On Tue 26-03-19 17:33:19, Anshuman Khandual wrote: [...] > I could get it working with the following re-order of memblock_[free|remove] and > arch_remove_memory(). I did not observe any other adverse side affect because of > this change. Does it look okay ? Memblock should only work with physical memory ranges without touching struct pages so this should be safe. But you should double check of course. > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -1863,11 +1863,11 @@ void __ref __remove_memory(int nid, u64 start, u64 size) > > /* remove memmap entry */ > firmware_map_remove(start, start + size, "System RAM"); > + arch_remove_memory(nid, start, size, NULL); > + > memblock_free(start, size); > memblock_remove(start, size); > > - arch_remove_memory(nid, start, size, NULL); > - > try_offline_node(nid); > > mem_hotplug_done();
diff --git a/drivers/base/node.c b/drivers/base/node.c index 86d6cd92ce3d..9e944b71e352 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -394,11 +394,20 @@ int unregister_cpu_under_node(unsigned int cpu, unsigned int nid) #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE static int __ref get_nid_for_pfn(unsigned long pfn) { - if (!pfn_valid_within(pfn)) - return -1; #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT - if (system_state < SYSTEM_RUNNING) + if (system_state < SYSTEM_RUNNING) { + if (!pfn_valid_within(pfn)) + return -1; return early_pfn_to_nid(pfn); + } +#endif + +#if defined(CONFIG_HAVE_ARCH_PFN_VALID) && defined(CONFIG_HOLES_IN_ZONE) + if (!pfn_section_valid(pfn)) + return -1; +#else + if (!pfn_valid_within(pfn)) + return -1; #endif return pfn_to_nid(pfn); } diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 842f9189537b..9cf4c1111b95 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1242,13 +1242,18 @@ static inline struct mem_section *__pfn_to_section(unsigned long pfn) extern int __highest_present_section_nr; -#ifndef CONFIG_HAVE_ARCH_PFN_VALID -static inline int pfn_valid(unsigned long pfn) +static inline int pfn_section_valid(unsigned long pfn) { if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS) return 0; return valid_section(__nr_to_section(pfn_to_section_nr(pfn))); } + +#ifndef CONFIG_HAVE_ARCH_PFN_VALID +static inline int pfn_valid(unsigned long pfn) +{ + return pfn_section_valid(pfn); +} #endif static inline int pfn_present(unsigned long pfn)
Memory hot remove uses get_nid_for_pfn() while tearing down linked sysfs entries between memory block and node. It first checks pfn validity with pfn_valid_within() before fetching nid. With CONFIG_HOLES_IN_ZONE config (arm64 has this enabled) pfn_valid_within() calls pfn_valid(). pfn_valid() is an arch implementation on arm64 (CONFIG_HAVE_ARCH_PFN_VALID) which scans all mapped memblock regions with memblock_is_map_memory(). This creates a problem in memory hot remove path which has already removed given memory range from memory block with memblock_[remove|free] before arriving at unregister_mem_sect_under_nodes(). During runtime memory hot remove get_nid_for_pfn() needs to validate that given pfn has a struct page mapping so that it can fetch required nid. This can be achieved just by looking into it's section mapping information. This adds a new helper pfn_section_valid() for this purpose. Its same as generic pfn_valid(). This maintains existing behaviour for deferred struct page init case. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> --- This is a preparatory patch for memory hot-remove enablement on arm64. I will appreciate some early feedback on this approach. drivers/base/node.c | 15 ++++++++++++--- include/linux/mmzone.h | 9 +++++++-- 2 files changed, 19 insertions(+), 5 deletions(-)