Message ID | 20181101091055.GA15166@MiWiFi-R3L-srv |
---|---|
State | New, archived |
Series | Memory hotplug failed to offline on bare metal system of multiple nodes |
On Thu 01-11-18 17:10:55, Baoquan He wrote:
> Hi,
>
> A hot-removal failure occurred on a bare metal system with 8 nodes, where
> node1~7 are all hotpluggable and 'movable_node' is set. When checking the
> value of /sys/devices/system/node/node1/memory*/removable, some of the
> blocks read 0, i.e. not removable, and a back trace is always seen.
> Bisecting points at the culprit commit:
>
> 15c30bc09085 ("mm, memory_hotplug: make has_unmovable_pages more robust")
>
> Reverting it fixes the failure, and node1~7 can be hot removed and hot
> added again. According to its commit log, 15c30bc09085 fixes a
> movablecore setting issue where node_data is allocated first in
> initmem_init() and then marked as movable in mm_init(). We may need to
> think about it further and fix that case without breaking bare metal
> systems.
>
> I haven't figured out why the above commit made those memory blocks in
> the MOVABLE zone non-removable. Still checking. The tested reverting
> patch is attached to this mail.

Could you check which of the tests inside has_unmovable_pages() claimed the
failure? Going back to marking movable_zone as guaranteed to offline is
just too fragile.
On 11/01/18 at 10:22am, Michal Hocko wrote:
> > I haven't figured out why the above commit made those memory blocks in
> > the MOVABLE zone non-removable. Still checking. The tested reverting
> > patch is attached to this mail.
>
> Could you check which of the tests inside has_unmovable_pages() claimed the
> failure? Going back to marking movable_zone as guaranteed to offline is
> just too fragile.

Sure, I will add debugging code and check. I will update if anything is found.
On Thu 01-11-18 17:42:43, Baoquan He wrote:
> On 11/01/18 at 10:22am, Michal Hocko wrote:
> > > I haven't figured out why the above commit made those memory blocks in
> > > the MOVABLE zone non-removable. Still checking. The tested reverting
> > > patch is attached to this mail.
> >
> > Could you check which of the tests inside has_unmovable_pages() claimed the
> > failure? Going back to marking movable_zone as guaranteed to offline is
> > just too fragile.
>
> Sure, I will add debugging code and check. I will update if anything is found.

Please dump the whole struct page state for the failing pfn.
```diff
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a919ba5..b48b5eb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7760,12 +7760,11 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 	unsigned long pfn, iter, found;
 
 	/*
-	 * TODO we could make this much more efficient by not checking every
-	 * page in the range if we know all of them are in MOVABLE_ZONE and
-	 * that the movable zone guarantees that pages are migratable but
-	 * the later is not the case right now unfortunatelly. E.g. movablecore
-	 * can still lead to having bootmem allocations in zone_movable.
+	 * For avoiding noise data, lru_add_drain_all() should be called
+	 * If ZONE_MOVABLE, the zone never contains unmovable pages
 	 */
+	if (zone_idx(zone) == ZONE_MOVABLE)
+		return false;
 
 	/*
 	 * CMA allocations (alloc_contig_range) really need to mark isolate
@@ -7786,7 +7785,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 		page = pfn_to_page(check);
 
 		if (PageReserved(page))
-			goto unmovable;
+			return true;
 
 		/*
 		 * Hugepages are not in LRU lists, but they're movable.
@@ -7796,7 +7795,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 		if (PageHuge(page)) {
 
 			if (!hugepage_migration_supported(page_hstate(page)))
-				goto unmovable;
+				return true;
 
 			iter = round_up(iter + 1, 1<<compound_order(page)) - 1;
 			continue;
@@ -7840,12 +7839,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 		 * page at boot.
 		 */
 		if (found > count)
-			goto unmovable;
+			return true;
 	}
 	return false;
-unmovable:
-	WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
-	return true;
 }
 
 #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
```