diff mbox series

[RFC,v1,2/4] mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage

Message ID 20220427042841.678351-3-naoya.horiguchi@linux.dev (mailing list archive)
State New
Headers show
Series mm, hwpoison: improve handling workload related to hugetlb and memory_hotplug | expand

Commit Message

Naoya Horiguchi April 27, 2022, 4:28 a.m. UTC
From: Naoya Horiguchi <naoya.horiguchi@nec.com>

HWPoisoned page is not supposed to prevent memory hotremove, but
currently this does not properly work for hwpoisoned hugepages and the
kernel tries to migrate them, which could cause consuming corrupted
data.

Move dissolve_free_huge_pages() before scan_movable_pages(). This is
because the result of the movable check depends on the result of the
dissolve.  Now delayed dissolve is available, so hwpoisoned hugepages
can be turned into 4kB hwpoison page which memory hotplug can handle.

And clear HPageMigratable pseudo flag for hwpoisoned hugepages. This is
also important because dissolve_free_huge_page() can fail.  So it's
still necessary to prevent do_migrate_pages() from trying to migrate
hwpoison hugepages.

Reported-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 mm/hugetlb.c        | 11 +++++++++++
 mm/memory-failure.c |  2 ++
 mm/memory_hotplug.c | 23 +++++++++++------------
 3 files changed, 24 insertions(+), 12 deletions(-)

Comments

Miaohe Lin April 29, 2022, 8:49 a.m. UTC | #1
On 2022/4/27 12:28, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> HWPoisoned page is not supposed to prevent memory hotremove, but
> currently this does not properly work for hwpoisoned hugepages and the
> kernel tries to migrate them, which could cause consuming corrupted
> data.
> 
> Move dissolve_free_huge_pages() before scan_movable_pages(). This is
> because the result of the movable check depends on the result of the
> dissolve.  Now delayed dissolve is available, so hwpoisoned hugepages
> can be turned into 4kB hwpoison page which memory hotplug can handle.
> 
> And clear HPageMigratable pseudo flag for hwpoisoned hugepages. This is
> also important because dissolve_free_huge_page() can fail.  So it's
> still necessary to prevent do_migrate_pages() from trying to migrate
> hwpoison hugepages.
> 
> Reported-by: Miaohe Lin <linmiaohe@huawei.com>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
>  mm/hugetlb.c        | 11 +++++++++++
>  mm/memory-failure.c |  2 ++
>  mm/memory_hotplug.c | 23 +++++++++++------------
>  3 files changed, 24 insertions(+), 12 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 6867ea8345d1..95b1db852ca9 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2159,6 +2159,17 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
>  
>  	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
>  		page = pfn_to_page(pfn);
> +
> +		if (PageHuge(page) && PageHWPoison(page)) {
> +			/*
> +			 * Release the last refcount from hwpoison to turn into
> +			 * a free hugepage.
> +			 */
> +			if (page_count(page) == 1)
> +				put_page(page);
> +			page = hugetlb_page_hwpoison(page);
> +		}
> +

This patch looks good to me. Thanks!

One question: Can this hugepage be put into buddy system? In free_huge_page,
if h->surplus_huge_pages_node[nid] > 0, hugepage might be put into buddy via
update_and_free_page. So it's not PageHuge anymore and won't be dissolved. If
this happens, the "raw error page" is still missed and might be accessed later.
Or am I miss something?

Thanks!

>  		rc = dissolve_free_huge_page(page);
>  		if (rc)
>  			break;
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 73948a00ad4a..4a2e22bf0983 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1607,6 +1607,8 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
>  		return res == MF_RECOVERED ? 0 : -EBUSY;
>  	}
>  
> +	ClearHPageMigratable(head);
> +
>  	page_flags = head->flags;
>  
>  	/*
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 416b38ca8def..4bc0590f4334 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1864,6 +1864,17 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>  
>  			cond_resched();
>  
> +			/*
> +			 * Dissolve free hugepages in the memory block before doing
> +			 * offlining actually in order to make hugetlbfs's object
> +			 * counting consistent.
> +			 */
> +			ret = dissolve_free_huge_pages(start_pfn, end_pfn);
> +			if (ret) {
> +				reason = "failure to dissolve huge pages";
> +				goto failed_removal_isolated;
> +			}
> +
>  			ret = scan_movable_pages(pfn, end_pfn, &pfn);
>  			if (!ret) {
>  				/*
> @@ -1879,19 +1890,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>  			goto failed_removal_isolated;
>  		}
>  
> -		/*
> -		 * Dissolve free hugepages in the memory block before doing
> -		 * offlining actually in order to make hugetlbfs's object
> -		 * counting consistent.
> -		 */
> -		ret = dissolve_free_huge_pages(start_pfn, end_pfn);
> -		if (ret) {
> -			reason = "failure to dissolve huge pages";
> -			goto failed_removal_isolated;
> -		}
> -
>  		ret = test_pages_isolated(start_pfn, end_pfn, MEMORY_OFFLINE);
> -
>  	} while (ret);
>  
>  	/* Mark all sections offline and remove free pages from the buddy. */
>
HORIGUCHI NAOYA(堀口 直也) May 9, 2022, 7:55 a.m. UTC | #2
On Fri, Apr 29, 2022 at 04:49:16PM +0800, Miaohe Lin wrote:
> On 2022/4/27 12:28, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > HWPoisoned page is not supposed to prevent memory hotremove, but
> > currently this does not properly work for hwpoisoned hugepages and the
> > kernel tries to migrate them, which could cause consuming corrupted
> > data.
> > 
> > Move dissolve_free_huge_pages() before scan_movable_pages(). This is
> > because the result of the movable check depends on the result of the
> > dissolve.  Now delayed dissolve is available, so hwpoisoned hugepages
> > can be turned into 4kB hwpoison page which memory hotplug can handle.
> > 
> > And clear HPageMigratable pseudo flag for hwpoisoned hugepages. This is
> > also important because dissolve_free_huge_page() can fail.  So it's
> > still necessary to prevent do_migrate_pages() from trying to migrate
> > hwpoison hugepages.
> > 
> > Reported-by: Miaohe Lin <linmiaohe@huawei.com>
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > ---
> >  mm/hugetlb.c        | 11 +++++++++++
> >  mm/memory-failure.c |  2 ++
> >  mm/memory_hotplug.c | 23 +++++++++++------------
> >  3 files changed, 24 insertions(+), 12 deletions(-)
> > 
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 6867ea8345d1..95b1db852ca9 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -2159,6 +2159,17 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> >  
> >  	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
> >  		page = pfn_to_page(pfn);
> > +
> > +		if (PageHuge(page) && PageHWPoison(page)) {
> > +			/*
> > +			 * Release the last refcount from hwpoison to turn into
> > +			 * a free hugepage.
> > +			 */
> > +			if (page_count(page) == 1)
> > +				put_page(page);
> > +			page = hugetlb_page_hwpoison(page);
> > +		}
> > +
> 
> This patch looks good to me. Thanks!
> 
> One question: Can this hugepage be put into buddy system? In free_huge_page,
> if h->surplus_huge_pages_node[nid] > 0, hugepage might be put into buddy via
> update_and_free_page. So it's not PageHuge anymore and won't be dissolved. If
> this happens, the "raw error page" is still missed and might be accessed later.

Yes, this put_page() could free pages directly into buddy.  In such case, I
expect __update_and_free_page() to move the PageHWpoison flag to the raw error
page, so I think the final result should be the same.

Thanks,
Naoya Horiguchi
Miaohe Lin May 9, 2022, 8:57 a.m. UTC | #3
On 2022/5/9 15:55, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Fri, Apr 29, 2022 at 04:49:16PM +0800, Miaohe Lin wrote:
>> On 2022/4/27 12:28, Naoya Horiguchi wrote:
>>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>
>>> HWPoisoned page is not supposed to prevent memory hotremove, but
>>> currently this does not properly work for hwpoisoned hugepages and the
>>> kernel tries to migrate them, which could cause consuming corrupted
>>> data.
>>>
>>> Move dissolve_free_huge_pages() before scan_movable_pages(). This is
>>> because the result of the movable check depends on the result of the
>>> dissolve.  Now delayed dissolve is available, so hwpoisoned hugepages
>>> can be turned into 4kB hwpoison page which memory hotplug can handle.
>>>
>>> And clear HPageMigratable pseudo flag for hwpoisoned hugepages. This is
>>> also important because dissolve_free_huge_page() can fail.  So it's
>>> still necessary to prevent do_migrate_pages() from trying to migrate
>>> hwpoison hugepages.
>>>
>>> Reported-by: Miaohe Lin <linmiaohe@huawei.com>
>>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>> ---
>>>  mm/hugetlb.c        | 11 +++++++++++
>>>  mm/memory-failure.c |  2 ++
>>>  mm/memory_hotplug.c | 23 +++++++++++------------
>>>  3 files changed, 24 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index 6867ea8345d1..95b1db852ca9 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -2159,6 +2159,17 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
>>>  
>>>  	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
>>>  		page = pfn_to_page(pfn);
>>> +
>>> +		if (PageHuge(page) && PageHWPoison(page)) {
>>> +			/*
>>> +			 * Release the last refcount from hwpoison to turn into
>>> +			 * a free hugepage.
>>> +			 */
>>> +			if (page_count(page) == 1)
>>> +				put_page(page);
>>> +			page = hugetlb_page_hwpoison(page);
>>> +		}
>>> +
>>
>> This patch looks good to me. Thanks!
>>
>> One question: Can this hugepage be put into buddy system? In free_huge_page,
>> if h->surplus_huge_pages_node[nid] > 0, hugepage might be put into buddy via
>> update_and_free_page. So it's not PageHuge anymore and won't be dissolved. If
>> this happens, the "raw error page" is still missed and might be accessed later.
> 
> Yes, this put_page() could free pages directly into buddy.  In such case, I
> expect __update_and_free_page() to move the PageHWpoison flag to the raw error
> page, so I think the final result should be the same.

Agree, __update_and_free_page could help move the PageHWpoison flag to the raw error page.

Thanks!

> 
> Thanks,
> Naoya Horiguchi
>
diff mbox series

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6867ea8345d1..95b1db852ca9 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2159,6 +2159,17 @@  int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
 		page = pfn_to_page(pfn);
+
+		if (PageHuge(page) && PageHWPoison(page)) {
+			/*
+			 * Release the last refcount from hwpoison to turn into
+			 * a free hugepage.
+			 */
+			if (page_count(page) == 1)
+				put_page(page);
+			page = hugetlb_page_hwpoison(page);
+		}
+
 		rc = dissolve_free_huge_page(page);
 		if (rc)
 			break;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 73948a00ad4a..4a2e22bf0983 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1607,6 +1607,8 @@  static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 		return res == MF_RECOVERED ? 0 : -EBUSY;
 	}
 
+	ClearHPageMigratable(head);
+
 	page_flags = head->flags;
 
 	/*
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 416b38ca8def..4bc0590f4334 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1864,6 +1864,17 @@  int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 
 			cond_resched();
 
+			/*
+			 * Dissolve free hugepages in the memory block before doing
+			 * offlining actually in order to make hugetlbfs's object
+			 * counting consistent.
+			 */
+			ret = dissolve_free_huge_pages(start_pfn, end_pfn);
+			if (ret) {
+				reason = "failure to dissolve huge pages";
+				goto failed_removal_isolated;
+			}
+
 			ret = scan_movable_pages(pfn, end_pfn, &pfn);
 			if (!ret) {
 				/*
@@ -1879,19 +1890,7 @@  int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 			goto failed_removal_isolated;
 		}
 
-		/*
-		 * Dissolve free hugepages in the memory block before doing
-		 * offlining actually in order to make hugetlbfs's object
-		 * counting consistent.
-		 */
-		ret = dissolve_free_huge_pages(start_pfn, end_pfn);
-		if (ret) {
-			reason = "failure to dissolve huge pages";
-			goto failed_removal_isolated;
-		}
-
 		ret = test_pages_isolated(start_pfn, end_pfn, MEMORY_OFFLINE);
-
 	} while (ret);
 
 	/* Mark all sections offline and remove free pages from the buddy. */