diff mbox series

[v2,1/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage

Message ID 20220905062137.1455537-2-naoya.horiguchi@linux.dev (mailing list archive)
State New
Headers show
Series mm, hwpoison: improve handling workload related to hugetlb and memory_hotplug | expand

Commit Message

Naoya Horiguchi Sept. 5, 2022, 6:21 a.m. UTC
From: Naoya Horiguchi <naoya.horiguchi@nec.com>

HWPoisoned page is not supposed to be accessed once marked, but currently
such accesses can happen during memory hotremove because do_migrate_range()
can be called before dissolve_free_huge_pages() is called.

Move dissolve_free_huge_pages() before scan_movable_pages(). Recently
delayed dissolve has been implemented, so the dissolving can turn
a hwpoisoned hugepage into 4kB hwpoison page, which memory hotplug can
handle safely.

Reported-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 mm/memory_hotplug.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

Comments

Miaohe Lin Sept. 6, 2022, 2:59 a.m. UTC | #1
On 2022/9/5 14:21, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> HWPoisoned page is not supposed to be accessed once marked, but currently
> such accesses can happen during memory hotremove because do_migrate_range()
> can be called before dissolve_free_huge_pages() is called.
> 
> Move dissolve_free_huge_pages() before scan_movable_pages(). Recently
> delayed dissolve has been implemented, so the dissolving can turn
> a hwpoisoned hugepage into 4kB hwpoison page, which memory hotplug can
> handle safely.

Yes, thanks for your work, Naoya. ;)

> 
> Reported-by: Miaohe Lin <linmiaohe@huawei.com>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
>  mm/memory_hotplug.c | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index fad6d1f2262a..c24735d63b25 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1880,6 +1880,17 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>  
>  			cond_resched();
>  
> +			/*
> +			 * Dissolve free hugepages in the memory block before doing
> +			 * offlining actually in order to make hugetlbfs's object
> +			 * counting consistent.
> +			 */
> +			ret = dissolve_free_huge_pages(start_pfn, end_pfn);
> +			if (ret) {
> +				reason = "failure to dissolve huge pages";
> +				goto failed_removal_isolated;
> +			}

This change has a side-effect. If hugetlb pages are in-use, dissolve_free_huge_pages() will always return -EBUSY
even if those pages can be migrated. So we fail to hotremove the memory even if they could be offlined.
Or am I miss something?

Thanks,
Miaohe Lin

> +
>  			ret = scan_movable_pages(pfn, end_pfn, &pfn);
>  			if (!ret) {
>  				/*
> @@ -1895,17 +1906,6 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>  			goto failed_removal_isolated;
>  		}
>  
> -		/*
> -		 * Dissolve free hugepages in the memory block before doing
> -		 * offlining actually in order to make hugetlbfs's object
> -		 * counting consistent.
> -		 */
> -		ret = dissolve_free_huge_pages(start_pfn, end_pfn);
> -		if (ret) {
> -			reason = "failure to dissolve huge pages";
> -			goto failed_removal_isolated;
> -		}
> -
>  		ret = test_pages_isolated(start_pfn, end_pfn, MEMORY_OFFLINE);
>  
>  	} while (ret);
>
HORIGUCHI NAOYA(堀口 直也) Sept. 6, 2022, 6:14 a.m. UTC | #2
On Tue, Sep 06, 2022 at 10:59:58AM +0800, Miaohe Lin wrote:
> On 2022/9/5 14:21, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > HWPoisoned page is not supposed to be accessed once marked, but currently
> > such accesses can happen during memory hotremove because do_migrate_range()
> > can be called before dissolve_free_huge_pages() is called.
> > 
> > Move dissolve_free_huge_pages() before scan_movable_pages(). Recently
> > delayed dissolve has been implemented, so the dissolving can turn
> > a hwpoisoned hugepage into 4kB hwpoison page, which memory hotplug can
> > handle safely.
> 
> Yes, thanks for your work, Naoya. ;)
> 
> > 
> > Reported-by: Miaohe Lin <linmiaohe@huawei.com>
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > ---
> >  mm/memory_hotplug.c | 22 +++++++++++-----------
> >  1 file changed, 11 insertions(+), 11 deletions(-)
> > 
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index fad6d1f2262a..c24735d63b25 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -1880,6 +1880,17 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
> >  
> >  			cond_resched();
> >  
> > +			/*
> > +			 * Dissolve free hugepages in the memory block before doing
> > +			 * offlining actually in order to make hugetlbfs's object
> > +			 * counting consistent.
> > +			 */
> > +			ret = dissolve_free_huge_pages(start_pfn, end_pfn);
> > +			if (ret) {
> > +				reason = "failure to dissolve huge pages";
> > +				goto failed_removal_isolated;
> > +			}
> 
> This change has a side-effect. If hugetlb pages are in-use, dissolve_free_huge_pages() will always return -EBUSY
> even if those pages can be migrated. So we fail to hotremove the memory even if they could be offlined.
> Or am I miss something?

Thank you for the comment, you're right.  (Taking a look over my test result
carefully, it showed failures for the related cases, I somehow overlooked
them, really sorry.)  So my second thought is that we keep offline_pages()
as is, and insert a few line in do_migrate_range() to handle the case of
hwpoisoned hugepage like below:

@@ -1642,6 +1642,8 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)

                if (PageHuge(page)) {
                        pfn = page_to_pfn(head) + compound_nr(head) - 1;
+                       if (PageHWPoison(head))
+                               continue;
                        isolate_hugetlb(head, &source);
                        continue;
                } else if (PageTransHuge(page))

This is slightly different from your original suggestion
https://lore.kernel.org/linux-mm/20220421135129.19767-1-linmiaohe@huawei.com/T
, as discussed in the thread existing "if (PageHWPoison(page))" branch in
this function can't be used for hugetlb.  We could adjust them to handle
hugetlb, but maybe separating code for hugetlb first from the others looks
less compicated to me.

If you have any suggestion on this, please let me know.

Thanks,
Naoya Horiguchi
Miaohe Lin Sept. 6, 2022, 8:14 a.m. UTC | #3
On 2022/9/6 14:14, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Tue, Sep 06, 2022 at 10:59:58AM +0800, Miaohe Lin wrote:
>> On 2022/9/5 14:21, Naoya Horiguchi wrote:
>>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>
>>> HWPoisoned page is not supposed to be accessed once marked, but currently
>>> such accesses can happen during memory hotremove because do_migrate_range()
>>> can be called before dissolve_free_huge_pages() is called.
>>>
>>> Move dissolve_free_huge_pages() before scan_movable_pages(). Recently
>>> delayed dissolve has been implemented, so the dissolving can turn
>>> a hwpoisoned hugepage into 4kB hwpoison page, which memory hotplug can
>>> handle safely.
>>
>> Yes, thanks for your work, Naoya. ;)
>>
>>>
>>> Reported-by: Miaohe Lin <linmiaohe@huawei.com>
>>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>> ---
>>>  mm/memory_hotplug.c | 22 +++++++++++-----------
>>>  1 file changed, 11 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index fad6d1f2262a..c24735d63b25 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -1880,6 +1880,17 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>>>  
>>>  			cond_resched();
>>>  
>>> +			/*
>>> +			 * Dissolve free hugepages in the memory block before doing
>>> +			 * offlining actually in order to make hugetlbfs's object
>>> +			 * counting consistent.
>>> +			 */
>>> +			ret = dissolve_free_huge_pages(start_pfn, end_pfn);
>>> +			if (ret) {
>>> +				reason = "failure to dissolve huge pages";
>>> +				goto failed_removal_isolated;
>>> +			}
>>
>> This change has a side-effect. If hugetlb pages are in-use, dissolve_free_huge_pages() will always return -EBUSY
>> even if those pages can be migrated. So we fail to hotremove the memory even if they could be offlined.
>> Or am I miss something?
> 
> Thank you for the comment, you're right.  (Taking a look over my test result
> carefully, it showed failures for the related cases, I somehow overlooked
> them, really sorry.)  So my second thought is that we keep offline_pages()
> as is, and insert a few line in do_migrate_range() to handle the case of
> hwpoisoned hugepage like below:
> 
> @@ -1642,6 +1642,8 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> 
>                 if (PageHuge(page)) {
>                         pfn = page_to_pfn(head) + compound_nr(head) - 1;
> +                       if (PageHWPoison(head))
> +                               continue;

Thanks for your update. But it seems this is not enough. With the above code change, HWPoisoned
hugetlb pages will always be ignored in do_migrate_range(). And if these pages are HPageMigratable,
they will be returned in scan_movable_pages() then passed into the do_migrate_range() again. Thus
a possible deadloop will occur until these pages become un-movable?

>                         isolate_hugetlb(head, &source);
>                         continue;
>                 } else if (PageTransHuge(page))
> 
> This is slightly different from your original suggestion
> https://lore.kernel.org/linux-mm/20220421135129.19767-1-linmiaohe@huawei.com/T
> , as discussed in the thread existing "if (PageHWPoison(page))" branch in
> this function can't be used for hugetlb.  We could adjust them to handle
> hugetlb, but maybe separating code for hugetlb first from the others looks
> less compicated to me.

It might be better to do something, e.g. unmap the hugetlb pages to prevent accessing from process if mapped,
even try truncating the error page from pagecache. But I have no strong opinion as handling memory failure
would always be a best try. ;)

Thanks,
Miaohe Lin


> 
> If you have any suggestion on this, please let me know.
> 
> Thanks,
> Naoya Horiguchi
>
HORIGUCHI NAOYA(堀口 直也) Sept. 7, 2022, 4:12 a.m. UTC | #4
On Tue, Sep 06, 2022 at 04:14:40PM +0800, Miaohe Lin wrote:
> On 2022/9/6 14:14, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Tue, Sep 06, 2022 at 10:59:58AM +0800, Miaohe Lin wrote:
> >> On 2022/9/5 14:21, Naoya Horiguchi wrote:
> >>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >>>
> >>> HWPoisoned page is not supposed to be accessed once marked, but currently
> >>> such accesses can happen during memory hotremove because do_migrate_range()
> >>> can be called before dissolve_free_huge_pages() is called.
> >>>
> >>> Move dissolve_free_huge_pages() before scan_movable_pages(). Recently
> >>> delayed dissolve has been implemented, so the dissolving can turn
> >>> a hwpoisoned hugepage into 4kB hwpoison page, which memory hotplug can
> >>> handle safely.
> >>
> >> Yes, thanks for your work, Naoya. ;)
> >>
> >>>
> >>> Reported-by: Miaohe Lin <linmiaohe@huawei.com>
> >>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >>> ---
> >>>  mm/memory_hotplug.c | 22 +++++++++++-----------
> >>>  1 file changed, 11 insertions(+), 11 deletions(-)
> >>>
> >>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> >>> index fad6d1f2262a..c24735d63b25 100644
> >>> --- a/mm/memory_hotplug.c
> >>> +++ b/mm/memory_hotplug.c
> >>> @@ -1880,6 +1880,17 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
> >>>  
> >>>  			cond_resched();
> >>>  
> >>> +			/*
> >>> +			 * Dissolve free hugepages in the memory block before doing
> >>> +			 * offlining actually in order to make hugetlbfs's object
> >>> +			 * counting consistent.
> >>> +			 */
> >>> +			ret = dissolve_free_huge_pages(start_pfn, end_pfn);
> >>> +			if (ret) {
> >>> +				reason = "failure to dissolve huge pages";
> >>> +				goto failed_removal_isolated;
> >>> +			}
> >>
> >> This change has a side-effect. If hugetlb pages are in-use, dissolve_free_huge_pages() will always return -EBUSY
> >> even if those pages can be migrated. So we fail to hotremove the memory even if they could be offlined.
> >> Or am I miss something?
> > 
> > Thank you for the comment, you're right.  (Taking a look over my test result
> > carefully, it showed failures for the related cases, I somehow overlooked
> > them, really sorry.)  So my second thought is that we keep offline_pages()
> > as is, and insert a few line in do_migrate_range() to handle the case of
> > hwpoisoned hugepage like below:
> > 
> > @@ -1642,6 +1642,8 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> > 
> >                 if (PageHuge(page)) {
> >                         pfn = page_to_pfn(head) + compound_nr(head) - 1;
> > +                       if (PageHWPoison(head))
> > +                               continue;
> 
> Thanks for your update. But it seems this is not enough. With the above code change, HWPoisoned
> hugetlb pages will always be ignored in do_migrate_range(). And if these pages are HPageMigratable,
> they will be returned in scan_movable_pages() then passed into the do_migrate_range() again. Thus
> a possible deadloop will occur until these pages become un-movable?

Yeah, so scan_movable_pages() can have an additional check for hwpoisoned hugepages, or
making hwpoisoned hugepage to be !HPageMigratable (somehow) might be another option.
I like the latter one for now, but I need look into how I can update the patch more.

> 
> >                         isolate_hugetlb(head, &source);
> >                         continue;
> >                 } else if (PageTransHuge(page))
> > 
> > This is slightly different from your original suggestion
> > https://lore.kernel.org/linux-mm/20220421135129.19767-1-linmiaohe@huawei.com/T
> > , as discussed in the thread existing "if (PageHWPoison(page))" branch in
> > this function can't be used for hugetlb.  We could adjust them to handle
> > hugetlb, but maybe separating code for hugetlb first from the others looks
> > less compicated to me.
> 
> It might be better to do something, e.g. unmap the hugetlb pages to prevent accessing from process if mapped,
> even try truncating the error page from pagecache. But I have no strong opinion as handling memory failure
> would always be a best try. ;)

This could be helpful, I'll try some.
Thank you for valuable comments.

- Naoya Horiguchi

> 
> Thanks,
> Miaohe Lin
> 
> 
> > 
> > If you have any suggestion on this, please let me know.
> > 
> > Thanks,
> > Naoya Horiguchi
> >
diff mbox series

Patch

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index fad6d1f2262a..c24735d63b25 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1880,6 +1880,17 @@  int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 
 			cond_resched();
 
+			/*
+			 * Dissolve free hugepages in the memory block before doing
+			 * offlining actually in order to make hugetlbfs's object
+			 * counting consistent.
+			 */
+			ret = dissolve_free_huge_pages(start_pfn, end_pfn);
+			if (ret) {
+				reason = "failure to dissolve huge pages";
+				goto failed_removal_isolated;
+			}
+
 			ret = scan_movable_pages(pfn, end_pfn, &pfn);
 			if (!ret) {
 				/*
@@ -1895,17 +1906,6 @@  int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 			goto failed_removal_isolated;
 		}
 
-		/*
-		 * Dissolve free hugepages in the memory block before doing
-		 * offlining actually in order to make hugetlbfs's object
-		 * counting consistent.
-		 */
-		ret = dissolve_free_huge_pages(start_pfn, end_pfn);
-		if (ret) {
-			reason = "failure to dissolve huge pages";
-			goto failed_removal_isolated;
-		}
-
 		ret = test_pages_isolated(start_pfn, end_pfn, MEMORY_OFFLINE);
 
 	} while (ret);