Message ID: 20240817084941.2375713-4-wangkefeng.wang@huawei.com (mailing list archive)
State: New
Series: mm: memory_hotplug: improve do_migrate_range()
On 2024/8/17 16:49, Kefeng Wang wrote:
> The commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned
> pages to be offlined") don't handle the hugetlb pages, the endless
> loop still occur if offline a hwpoison hugetlb, luckly, with the
> commit e591ef7d96d6 ("mm,hwpoison,hugetlb,memory_hotplug: hotremove
> memory section with hwpoisoned hugepage") section with hwpoisoned
> hugepage"), the HPageMigratable of hugetlb page will be clear, and

It should be commit e591ef7d96d6 ("mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section
with hwpoisoned hugepage")? Above "section with hwpoisoned")" is duplicated.

Also s/be clear/be cleared/ ?

> the hwpoison hugetlb page will be skipped in scan_movable_pages(),
> so the endless loop issue is fixed.
>
> However if the HPageMigratable() check passed(without reference and
> lock), the hugetlb page may be hwpoisoned, it won't cause issue since
> the hwpoisoned page will be handled correctly in the next movable
> pages scan loop, and it will be isolated in do_migrate_range() but
> fails to migrate. In order to avoid the unnecessary isolation and
> unify all hwpoisoned page handling, let's unconditionally check hwpoison
> firstly, and if it is a hwpoisoned hugetlb page, try to unmap it as
> the catch all safety net like normal page does.
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  mm/memory_hotplug.c | 17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index dc19b0e28fbc..02a0d4fbc3fe 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1793,13 +1793,8 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>           * but out loop could handle that as it revisits the split
>           * folio later.
>           */
> -        if (folio_test_large(folio)) {
> +        if (folio_test_large(folio))
>              pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
> -            if (folio_test_hugetlb(folio)) {
> -                isolate_hugetlb(folio, &source);
> -                continue;
> -            }
> -        }
>
>          /*
>           * HWPoison pages have elevated reference counts so the migration would
> @@ -1808,11 +1803,17 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>           * (e.g. current hwpoison implementation doesn't unmap KSM pages but keep
>           * the unmap as the catch all safety net).
>           */
> -        if (PageHWPoison(page)) {
> +        if (folio_test_hwpoison(folio) ||
> +            (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
>              if (WARN_ON(folio_test_lru(folio)))
>                  folio_isolate_lru(folio);
>              if (folio_mapped(folio))
> -                try_to_unmap(folio, TTU_IGNORE_MLOCK);
> +                unmap_posioned_folio(folio, TTU_IGNORE_MLOCK);
> +            continue;
> +        }
> +
> +        if (folio_test_hugetlb(folio)) {
> +            isolate_hugetlb(folio, &source);

While you're here, should we pr_warn "failed to isolate pfn xx" for hugetlb folios too as
we already done for raw pages and thp folios?

Thanks.
.
On 2024/8/22 14:52, Miaohe Lin wrote:
> On 2024/8/17 16:49, Kefeng Wang wrote:
>> The commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned
>> pages to be offlined") don't handle the hugetlb pages, the endless
>> loop still occur if offline a hwpoison hugetlb, luckly, with the
>> commit e591ef7d96d6 ("mm,hwpoison,hugetlb,memory_hotplug: hotremove
>> memory section with hwpoisoned hugepage") section with hwpoisoned
>> hugepage"), the HPageMigratable of hugetlb page will be clear, and
>
> It should be commit e591ef7d96d6 ("mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section
> with hwpoisoned hugepage")? Above "section with hwpoisoned")" is duplicated.
>
> Also s/be clear/be cleared/ ?

Acked, thanks for carefully review.

>
>> the hwpoison hugetlb page will be skipped in scan_movable_pages(),
>> so the endless loop issue is fixed.
>>
>> However if the HPageMigratable() check passed(without reference and
>> lock), the hugetlb page may be hwpoisoned, it won't cause issue since
>> the hwpoisoned page will be handled correctly in the next movable
>> pages scan loop, and it will be isolated in do_migrate_range() but
>> fails to migrate. In order to avoid the unnecessary isolation and
>> unify all hwpoisoned page handling, let's unconditionally check hwpoison
>> firstly, and if it is a hwpoisoned hugetlb page, try to unmap it as
>> the catch all safety net like normal page does.
>>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>>  mm/memory_hotplug.c | 17 +++++++++--------
>>  1 file changed, 9 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index dc19b0e28fbc..02a0d4fbc3fe 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1793,13 +1793,8 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>           * but out loop could handle that as it revisits the split
>>           * folio later.
>>           */
>> -        if (folio_test_large(folio)) {
>> +        if (folio_test_large(folio))
>>              pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>> -            if (folio_test_hugetlb(folio)) {
>> -                isolate_hugetlb(folio, &source);
>> -                continue;
>> -            }
>> -        }
>>
>>          /*
>>           * HWPoison pages have elevated reference counts so the migration would
>> @@ -1808,11 +1803,17 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>           * (e.g. current hwpoison implementation doesn't unmap KSM pages but keep
>>           * the unmap as the catch all safety net).
>>           */
>> -        if (PageHWPoison(page)) {
>> +        if (folio_test_hwpoison(folio) ||
>> +            (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
>>              if (WARN_ON(folio_test_lru(folio)))
>>                  folio_isolate_lru(folio);
>>              if (folio_mapped(folio))
>> -                try_to_unmap(folio, TTU_IGNORE_MLOCK);
>> +                unmap_posioned_folio(folio, TTU_IGNORE_MLOCK);
>> +            continue;
>> +        }
>> +
>> +        if (folio_test_hugetlb(folio)) {
>> +            isolate_hugetlb(folio, &source);
>
> While you're here, should we pr_warn "failed to isolate pfn xx" for hugetlb folios too as
> we already done for raw pages and thp folios?

We will unify folio isolation in final patch, which will print warn for hugetlb folio when
failed to isolate, so no need to add here.

>
> Thanks.
> .
On 17.08.24 10:49, Kefeng Wang wrote:
> The commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned
> pages to be offlined") don't handle the hugetlb pages, the endless
> loop still occur if offline a hwpoison hugetlb, luckly, with the
> commit e591ef7d96d6 ("mm,hwpoison,hugetlb,memory_hotplug: hotremove
> memory section with hwpoisoned hugepage") section with hwpoisoned
> hugepage"), the HPageMigratable of hugetlb page will be clear, and
> the hwpoison hugetlb page will be skipped in scan_movable_pages(),
> so the endless loop issue is fixed.
>
> However if the HPageMigratable() check passed(without reference and
> lock), the hugetlb page may be hwpoisoned, it won't cause issue since
> the hwpoisoned page will be handled correctly in the next movable
> pages scan loop, and it will be isolated in do_migrate_range() but
> fails to migrate. In order to avoid the unnecessary isolation and
> unify all hwpoisoned page handling, let's unconditionally check hwpoison
> firstly, and if it is a hwpoisoned hugetlb page, try to unmap it as
> the catch all safety net like normal page does.
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  mm/memory_hotplug.c | 17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index dc19b0e28fbc..02a0d4fbc3fe 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1793,13 +1793,8 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>           * but out loop could handle that as it revisits the split
>           * folio later.
>           */
> -        if (folio_test_large(folio)) {
> +        if (folio_test_large(folio))
>              pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
> -            if (folio_test_hugetlb(folio)) {
> -                isolate_hugetlb(folio, &source);
> -                continue;
> -            }
> -        }
>
>          /*
>           * HWPoison pages have elevated reference counts so the migration would
> @@ -1808,11 +1803,17 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>           * (e.g. current hwpoison implementation doesn't unmap KSM pages but keep
>           * the unmap as the catch all safety net).
>           */
> -        if (PageHWPoison(page)) {
> +        if (folio_test_hwpoison(folio) ||
> +            (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {

We have the exact same check already in mm/shmem.c now.

Likely this should be factored out ... but no idea what function name we
should use that won't add even more confusion :D

>              if (WARN_ON(folio_test_lru(folio)))
>                  folio_isolate_lru(folio);
>              if (folio_mapped(folio))
> -                try_to_unmap(folio, TTU_IGNORE_MLOCK);
> +                unmap_posioned_folio(folio, TTU_IGNORE_MLOCK);
> +            continue;
> +        }
> +
> +        if (folio_test_hugetlb(folio)) {
> +            isolate_hugetlb(folio, &source);
>              continue;
>          }
>

Acked-by: David Hildenbrand <david@redhat.com>
On 2024/8/26 22:46, David Hildenbrand wrote:
> On 17.08.24 10:49, Kefeng Wang wrote:
>> The commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned
>> pages to be offlined") don't handle the hugetlb pages, the endless
>> loop still occur if offline a hwpoison hugetlb, luckly, with the
>> commit e591ef7d96d6 ("mm,hwpoison,hugetlb,memory_hotplug: hotremove
>> memory section with hwpoisoned hugepage") section with hwpoisoned
>> hugepage"), the HPageMigratable of hugetlb page will be clear, and
>> the hwpoison hugetlb page will be skipped in scan_movable_pages(),
>> so the endless loop issue is fixed.
>>
>> However if the HPageMigratable() check passed(without reference and
>> lock), the hugetlb page may be hwpoisoned, it won't cause issue since
>> the hwpoisoned page will be handled correctly in the next movable
>> pages scan loop, and it will be isolated in do_migrate_range() but
>> fails to migrate. In order to avoid the unnecessary isolation and
>> unify all hwpoisoned page handling, let's unconditionally check hwpoison
>> firstly, and if it is a hwpoisoned hugetlb page, try to unmap it as
>> the catch all safety net like normal page does.
>>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>>  mm/memory_hotplug.c | 17 +++++++++--------
>>  1 file changed, 9 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index dc19b0e28fbc..02a0d4fbc3fe 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1793,13 +1793,8 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>           * but out loop could handle that as it revisits the split
>>           * folio later.
>>           */
>> -        if (folio_test_large(folio)) {
>> +        if (folio_test_large(folio))
>>              pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>> -            if (folio_test_hugetlb(folio)) {
>> -                isolate_hugetlb(folio, &source);
>> -                continue;
>> -            }
>> -        }
>>
>>          /*
>>           * HWPoison pages have elevated reference counts so the migration would
>> @@ -1808,11 +1803,17 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>           * (e.g. current hwpoison implementation doesn't unmap KSM pages but keep
>>           * the unmap as the catch all safety net).
>>           */
>> -        if (PageHWPoison(page)) {
>> +        if (folio_test_hwpoison(folio) ||
>> +            (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
>
> We have the exact same check already in mm/shmem.c now.
>
> Likely this should be factored out ... but no idea what function name we
> should use that won't add even more confusion :D

Maybe folio_has_hwpoison(), and Miaohe may have some suggestion,
but leave it for later.

>
>>              if (WARN_ON(folio_test_lru(folio)))
>>                  folio_isolate_lru(folio);
>>              if (folio_mapped(folio))
>> -                try_to_unmap(folio, TTU_IGNORE_MLOCK);
>> +                unmap_posioned_folio(folio, TTU_IGNORE_MLOCK);
>> +            continue;
>> +        }
>> +
>> +        if (folio_test_hugetlb(folio)) {
>> +            isolate_hugetlb(folio, &source);
>>              continue;
>>          }
>
> Acked-by: David Hildenbrand <david@redhat.com>
>

Thanks.
On 2024/8/27 9:13, Kefeng Wang wrote:
>
>
> On 2024/8/26 22:46, David Hildenbrand wrote:
>> On 17.08.24 10:49, Kefeng Wang wrote:
>>> The commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned
>>> pages to be offlined") don't handle the hugetlb pages, the endless
>>> loop still occur if offline a hwpoison hugetlb, luckly, with the
>>> commit e591ef7d96d6 ("mm,hwpoison,hugetlb,memory_hotplug: hotremove
>>> memory section with hwpoisoned hugepage") section with hwpoisoned
>>> hugepage"), the HPageMigratable of hugetlb page will be clear, and
>>> the hwpoison hugetlb page will be skipped in scan_movable_pages(),
>>> so the endless loop issue is fixed.
>>>
>>> However if the HPageMigratable() check passed(without reference and
>>> lock), the hugetlb page may be hwpoisoned, it won't cause issue since
>>> the hwpoisoned page will be handled correctly in the next movable
>>> pages scan loop, and it will be isolated in do_migrate_range() but
>>> fails to migrate. In order to avoid the unnecessary isolation and
>>> unify all hwpoisoned page handling, let's unconditionally check hwpoison
>>> firstly, and if it is a hwpoisoned hugetlb page, try to unmap it as
>>> the catch all safety net like normal page does.
>>>
>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>> ---
>>>  mm/memory_hotplug.c | 17 +++++++++--------
>>>  1 file changed, 9 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index dc19b0e28fbc..02a0d4fbc3fe 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -1793,13 +1793,8 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>           * but out loop could handle that as it revisits the split
>>>           * folio later.
>>>           */
>>> -        if (folio_test_large(folio)) {
>>> +        if (folio_test_large(folio))
>>>              pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>>> -            if (folio_test_hugetlb(folio)) {
>>> -                isolate_hugetlb(folio, &source);
>>> -                continue;
>>> -            }
>>> -        }
>>>
>>>          /*
>>>           * HWPoison pages have elevated reference counts so the migration would
>>> @@ -1808,11 +1803,17 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>           * (e.g. current hwpoison implementation doesn't unmap KSM pages but keep
>>>           * the unmap as the catch all safety net).
>>>           */
>>> -        if (PageHWPoison(page)) {
>>> +        if (folio_test_hwpoison(folio) ||
>>> +            (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
>>
>> We have the exact same check already in mm/shmem.c now.
>>
>> Likely this should be factored out ... but no idea what function name we
>> should use that won't add even more confusion :D
>
> Maybe folio_has_hwpoison(), and Miaohe may have some suggestion,
> but leave it for later.

Will it be suitable to be named as something like folio_contain_hwpoisoned_page?

Thanks.
.
On 27.08.24 04:12, Miaohe Lin wrote:
> On 2024/8/27 9:13, Kefeng Wang wrote:
>>
>>
>> On 2024/8/26 22:46, David Hildenbrand wrote:
>>> On 17.08.24 10:49, Kefeng Wang wrote:
>>>> The commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned
>>>> pages to be offlined") don't handle the hugetlb pages, the endless
>>>> loop still occur if offline a hwpoison hugetlb, luckly, with the
>>>> commit e591ef7d96d6 ("mm,hwpoison,hugetlb,memory_hotplug: hotremove
>>>> memory section with hwpoisoned hugepage") section with hwpoisoned
>>>> hugepage"), the HPageMigratable of hugetlb page will be clear, and
>>>> the hwpoison hugetlb page will be skipped in scan_movable_pages(),
>>>> so the endless loop issue is fixed.
>>>>
>>>> However if the HPageMigratable() check passed(without reference and
>>>> lock), the hugetlb page may be hwpoisoned, it won't cause issue since
>>>> the hwpoisoned page will be handled correctly in the next movable
>>>> pages scan loop, and it will be isolated in do_migrate_range() but
>>>> fails to migrate. In order to avoid the unnecessary isolation and
>>>> unify all hwpoisoned page handling, let's unconditionally check hwpoison
>>>> firstly, and if it is a hwpoisoned hugetlb page, try to unmap it as
>>>> the catch all safety net like normal page does.
>>>>
>>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>> ---
>>>>  mm/memory_hotplug.c | 17 +++++++++--------
>>>>  1 file changed, 9 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>> index dc19b0e28fbc..02a0d4fbc3fe 100644
>>>> --- a/mm/memory_hotplug.c
>>>> +++ b/mm/memory_hotplug.c
>>>> @@ -1793,13 +1793,8 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>>           * but out loop could handle that as it revisits the split
>>>>           * folio later.
>>>>           */
>>>> -        if (folio_test_large(folio)) {
>>>> +        if (folio_test_large(folio))
>>>>              pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>>>> -            if (folio_test_hugetlb(folio)) {
>>>> -                isolate_hugetlb(folio, &source);
>>>> -                continue;
>>>> -            }
>>>> -        }
>>>>
>>>>          /*
>>>>           * HWPoison pages have elevated reference counts so the migration would
>>>> @@ -1808,11 +1803,17 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>>           * (e.g. current hwpoison implementation doesn't unmap KSM pages but keep
>>>>           * the unmap as the catch all safety net).
>>>>           */
>>>> -        if (PageHWPoison(page)) {
>>>> +        if (folio_test_hwpoison(folio) ||
>>>> +            (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
>>>
>>> We have the exact same check already in mm/shmem.c now.
>>>
>>> Likely this should be factored out ... but no idea what function name we
>>> should use that won't add even more confusion :D
>>
>> Maybe folio_has_hwpoison(), and Miaohe may have some suggestion,
>> but leave it for later.
>
> Will it be suitable to be named as something like folio_contain_hwpoisoned_page?

That sounds much better. "folio_has_hwpoison" is way too similar to
"folio_test_has_hwpoisoned".
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index dc19b0e28fbc..02a0d4fbc3fe 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1793,13 +1793,8 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
          * but out loop could handle that as it revisits the split
          * folio later.
          */
-        if (folio_test_large(folio)) {
+        if (folio_test_large(folio))
             pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
-            if (folio_test_hugetlb(folio)) {
-                isolate_hugetlb(folio, &source);
-                continue;
-            }
-        }

         /*
          * HWPoison pages have elevated reference counts so the migration would
@@ -1808,11 +1803,17 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
          * (e.g. current hwpoison implementation doesn't unmap KSM pages but keep
          * the unmap as the catch all safety net).
          */
-        if (PageHWPoison(page)) {
+        if (folio_test_hwpoison(folio) ||
+            (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
             if (WARN_ON(folio_test_lru(folio)))
                 folio_isolate_lru(folio);
             if (folio_mapped(folio))
-                try_to_unmap(folio, TTU_IGNORE_MLOCK);
+                unmap_posioned_folio(folio, TTU_IGNORE_MLOCK);
+            continue;
+        }
+
+        if (folio_test_hugetlb(folio)) {
+            isolate_hugetlb(folio, &source);
             continue;
         }
The commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned
pages to be offlined") don't handle the hugetlb pages, the endless
loop still occur if offline a hwpoison hugetlb, luckly, with the
commit e591ef7d96d6 ("mm,hwpoison,hugetlb,memory_hotplug: hotremove
memory section with hwpoisoned hugepage") section with hwpoisoned
hugepage"), the HPageMigratable of hugetlb page will be clear, and
the hwpoison hugetlb page will be skipped in scan_movable_pages(),
so the endless loop issue is fixed.

However if the HPageMigratable() check passed(without reference and
lock), the hugetlb page may be hwpoisoned, it won't cause issue since
the hwpoisoned page will be handled correctly in the next movable
pages scan loop, and it will be isolated in do_migrate_range() but
fails to migrate. In order to avoid the unnecessary isolation and
unify all hwpoisoned page handling, let's unconditionally check hwpoison
firstly, and if it is a hwpoisoned hugetlb page, try to unmap it as
the catch all safety net like normal page does.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/memory_hotplug.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)