diff mbox series

[v3,3/5] mm: memory_hotplug: check hwpoisoned page firstly in do_migrate_range()

Message ID 20240827114728.3212578-4-wangkefeng.wang@huawei.com (mailing list archive)
State New
Series mm: memory_hotplug: improve do_migrate_range()

Commit Message

Kefeng Wang Aug. 27, 2024, 11:47 a.m. UTC
Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned
pages to be offlined") doesn't handle hugetlb pages, so an endless loop
could still occur when offlining a hwpoisoned hugetlb page. Luckily, since
commit e591ef7d96d6 ("mm, hwpoison,hugetlb,memory_hotplug: hotremove
memory section with hwpoisoned hugepage"), HPageMigratable is cleared
for a hwpoisoned hugetlb page and such a page is skipped in
scan_movable_pages(), so the endless loop issue is fixed.

However, since the HPageMigratable() check is done without a reference
and without the lock, the hugetlb page may become hwpoisoned after the
check passes. This doesn't cause a problem, as the hwpoisoned page will
be handled correctly in the next movable-pages scan loop, but it is
isolated in do_migrate_range() only to fail to migrate. To avoid this
unnecessary isolation and to unify the handling of all hwpoisoned pages,
check for hwpoison unconditionally first, and if the page is a
hwpoisoned hugetlb page, try to unmap it as the catch-all safety net,
just as is done for a normal page.

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/memory_hotplug.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

Comments

Miaohe Lin Aug. 31, 2024, 8:36 a.m. UTC | #1
On 2024/8/27 19:47, Kefeng Wang wrote:
> Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned
> pages to be offlined") doesn't handle hugetlb pages, so an endless loop
> could still occur when offlining a hwpoisoned hugetlb page. Luckily, since
> commit e591ef7d96d6 ("mm, hwpoison,hugetlb,memory_hotplug: hotremove
> memory section with hwpoisoned hugepage"), HPageMigratable is cleared
> for a hwpoisoned hugetlb page and such a page is skipped in
> scan_movable_pages(), so the endless loop issue is fixed.
> 
> However, since the HPageMigratable() check is done without a reference
> and without the lock, the hugetlb page may become hwpoisoned after the
> check passes. This doesn't cause a problem, as the hwpoisoned page will
> be handled correctly in the next movable-pages scan loop, but it is
> isolated in do_migrate_range() only to fail to migrate. To avoid this
> unnecessary isolation and to unify the handling of all hwpoisoned pages,
> check for hwpoison unconditionally first, and if the page is a
> hwpoisoned hugetlb page, try to unmap it as the catch-all safety net,
> just as is done for a normal page.
> 
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  mm/memory_hotplug.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 9ef776a25b9d..1335fb6ef7fa 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1793,26 +1793,26 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  		 * folio_nr_pages() may read garbage.  This is fine as the outer
>  		 * loop will revisit the split folio later.
>  		 */
> -		if (folio_test_large(folio)) {
> +		if (folio_test_large(folio))
>  			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
> -			if (folio_test_hugetlb(folio)) {
> -				isolate_hugetlb(folio, &source);
> -				continue;
> -			}
> -		}
>  
>  		/*
>  		 * HWPoison pages have elevated reference counts so the migration would
>  		 * fail on them. It also doesn't make any sense to migrate them in the
>  		 * first place. Still try to unmap such a page in case it is still mapped
> -		 * (e.g. current hwpoison implementation doesn't unmap KSM pages but keep
> -		 * the unmap as the catch all safety net).
> +		 * (keep the unmap as the catch all safety net).

I'd tend to remove "keep the unmap as the catch all safety net" as well. I think it was
simply used to describe the KSM case.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Thanks.
.

Patch

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9ef776a25b9d..1335fb6ef7fa 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1793,26 +1793,26 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		 * folio_nr_pages() may read garbage.  This is fine as the outer
 		 * loop will revisit the split folio later.
 		 */
-		if (folio_test_large(folio)) {
+		if (folio_test_large(folio))
 			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
-			if (folio_test_hugetlb(folio)) {
-				isolate_hugetlb(folio, &source);
-				continue;
-			}
-		}
 
 		/*
 		 * HWPoison pages have elevated reference counts so the migration would
 		 * fail on them. It also doesn't make any sense to migrate them in the
 		 * first place. Still try to unmap such a page in case it is still mapped
-		 * (e.g. current hwpoison implementation doesn't unmap KSM pages but keep
-		 * the unmap as the catch all safety net).
+		 * (keep the unmap as the catch all safety net).
 		 */
-		if (PageHWPoison(page)) {
+		if (folio_test_hwpoison(folio) ||
+		    (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
 			if (WARN_ON(folio_test_lru(folio)))
 				folio_isolate_lru(folio);
 			if (folio_mapped(folio))
-				try_to_unmap(folio, TTU_IGNORE_MLOCK);
+				unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK);
+			continue;
+		}
+
+		if (folio_test_hugetlb(folio)) {
+			isolate_hugetlb(folio, &source);
 			continue;
 		}