
[mmotm,v1] mm/hwpoison: disable pcp for page_handle_poison()

Message ID 20210617092626.291006-1-nao.horiguchi@gmail.com (mailing list archive)
State New, archived
Series [mmotm,v1] mm/hwpoison: disable pcp for page_handle_poison()

Commit Message

Naoya Horiguchi June 17, 2021, 9:26 a.m. UTC
From: Naoya Horiguchi <naoya.horiguchi@nec.com>

Recent changes in the patch "mm/page_alloc: allow high-order pages to
be stored on the per-cpu lists" make the kernel decide whether to use
the pcp lists via pcp_allowed_order(), which breaks soft-offline for
hugetlb pages.

Soft-offline dissolves a migration source page and then removes it from
the buddy free list, so it assumes that every subpage of the
soft-offlined hugepage is recognized as a buddy page just after
returning from dissolve_free_huge_page().  Since pcp_allowed_order()
returns true for the hugetlb order, this assumption no longer holds.

So disable the pcp lists during dissolve_free_huge_page() and
take_page_off_buddy() to prevent soft-offlined hugepages from being
linked onto pcp lists.  Soft-offline events should not be common, so
the impact on performance should be minimal.  The optimization in
Mel's patch could also benefit hugetlb, so zone_pcp_disable() is called
only in the hwpoison context.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 mm/memory-failure.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)
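
For reference, the order check that Mel's patch adds looks roughly like
the sketch below (paraphrased; exact details may differ in the mmotm
tree).  On common configurations pageblock_order equals the hugetlb
order, so freed subpages of a dissolved hugepage now qualify for the
pcp lists:

static inline bool pcp_allowed_order(unsigned int order)
{
	/* Small orders have always been candidates for the pcp lists. */
	if (order <= PAGE_ALLOC_COSTLY_ORDER)
		return true;
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
	/* High-order pages of pageblock_order may now be cached as well. */
	if (order == pageblock_order)
		return true;
#endif
	return false;
}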

Comments

David Hildenbrand June 17, 2021, 9:28 a.m. UTC | #1
On 17.06.21 11:26, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> Recent changes by patch "mm/page_alloc: allow high-order pages to be
> stored on the per-cpu lists" makes kernels determine whether to use pcp
> by pcp_allowed_order(), which breaks soft-offline for hugetlb pages.
> 
> Soft-offline dissolves a migration source page, then removes it from
> buddy free list, so it's assumed that any subpage of the soft-offlined
> hugepage are recognized as a buddy page just after returning from
> dissolve_free_huge_page().  pcp_allowed_order() returns true for
> hugetlb, so this assumption is no longer true.
> 
> So disable pcp during dissolve_free_huge_page() and
> take_page_off_buddy() to prevent soft-offlined hugepages from linking to
> pcp lists.  Soft-offline should not be common events so the impact on
> performance should be minimal.  And I think that the optimization of
> Mel's patch could benefit to hugetlb so zone_pcp_disable() is called
> only in hwpoison context.

Mel, Oscar, does alloc_contig_range() now have similar issues or is it 
avoided because the pageblock(s) are set MIGRATE_ISOLATE?
Mel Gorman June 17, 2021, 11:15 a.m. UTC | #2
On Thu, Jun 17, 2021 at 06:26:26PM +0900, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> Recent changes by patch "mm/page_alloc: allow high-order pages to be
> stored on the per-cpu lists" makes kernels determine whether to use pcp
> by pcp_allowed_order(), which breaks soft-offline for hugetlb pages.
> 
> Soft-offline dissolves a migration source page, then removes it from
> buddy free list, so it's assumed that any subpage of the soft-offlined
> hugepage are recognized as a buddy page just after returning from
> dissolve_free_huge_page().  pcp_allowed_order() returns true for
> hugetlb, so this assumption is no longer true.
> 
> So disable pcp during dissolve_free_huge_page() and
> take_page_off_buddy() to prevent soft-offlined hugepages from linking to
> pcp lists.  Soft-offline should not be common events so the impact on
> performance should be minimal.  And I think that the optimization of
> Mel's patch could benefit to hugetlb so zone_pcp_disable() is called
> only in hwpoison context.
> 
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

I think this is ok: it'll remove a page that is to be poisoned from the
PCP lists and put it back in the page allocator. It's a heavy but rare
operation, and identifying which PCP list a free page is on would be
tricky, so

Acked-by: Mel Gorman <mgorman@techsingularity.net>

The alternative, I guess, would be special-casing update_and_free_page to
bypass the PCP, but it'd be clumsy from an API point of view and I don't
think it's worth the effort.
Mel Gorman June 17, 2021, 11:16 a.m. UTC | #3
On Thu, Jun 17, 2021 at 11:28:49AM +0200, David Hildenbrand wrote:
> On 17.06.21 11:26, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > Recent changes by patch "mm/page_alloc: allow high-order pages to be
> > stored on the per-cpu lists" makes kernels determine whether to use pcp
> > by pcp_allowed_order(), which breaks soft-offline for hugetlb pages.
> > 
> > Soft-offline dissolves a migration source page, then removes it from
> > buddy free list, so it's assumed that any subpage of the soft-offlined
> > hugepage are recognized as a buddy page just after returning from
> > dissolve_free_huge_page().  pcp_allowed_order() returns true for
> > hugetlb, so this assumption is no longer true.
> > 
> > So disable pcp during dissolve_free_huge_page() and
> > take_page_off_buddy() to prevent soft-offlined hugepages from linking to
> > pcp lists.  Soft-offline should not be common events so the impact on
> > performance should be minimal.  And I think that the optimization of
> > Mel's patch could benefit to hugetlb so zone_pcp_disable() is called
> > only in hwpoison context.
> 
> Mel, Oscar, does alloc_contig_range() now have similar issues or is it
> avoided because the pageblock(s) are set MIGRATE_ISOLATE?
> 

I'd expect MIGRATE_ISOLATE to be sufficient because isolated pages should
bypass the PCP lists in free_unref_page.
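
For reference, that bypass in free_unref_page() looks roughly like the
excerpt below (paraphrased from the same mmotm tree; details may
differ):

	migratetype = get_pcppage_migratetype(page);
	if (unlikely(migratetype >= MIGRATE_PCPTYPES)) {
		/* Isolated pageblocks go straight back to the buddy allocator. */
		if (unlikely(is_migrate_isolate(migratetype))) {
			free_one_page(page_zone(page), page, pfn, order,
				      migratetype, FPI_NONE);
			return;
		}
		/* Other non-pcp types (e.g. HIGHATOMIC) are treated as movable. */
		migratetype = MIGRATE_MOVABLE;
	}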
David Hildenbrand June 17, 2021, 12:21 p.m. UTC | #4
On 17.06.21 11:26, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> Recent changes by patch "mm/page_alloc: allow high-order pages to be
> stored on the per-cpu lists" makes kernels determine whether to use pcp
> by pcp_allowed_order(), which breaks soft-offline for hugetlb pages.
> 
> Soft-offline dissolves a migration source page, then removes it from
> buddy free list, so it's assumed that any subpage of the soft-offlined
> hugepage are recognized as a buddy page just after returning from
> dissolve_free_huge_page().  pcp_allowed_order() returns true for
> hugetlb, so this assumption is no longer true.
> 
> So disable pcp during dissolve_free_huge_page() and
> take_page_off_buddy() to prevent soft-offlined hugepages from linking to
> pcp lists.  Soft-offline should not be common events so the impact on
> performance should be minimal.  And I think that the optimization of
> Mel's patch could benefit to hugetlb so zone_pcp_disable() is called
> only in hwpoison context.
> 
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
>   mm/memory-failure.c | 19 ++++++++++++++++---
>   1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git v5.13-rc6-mmotm-2021-06-15-20-24/mm/memory-failure.c v5.13-rc6-mmotm-2021-06-15-20-24_patched/mm/memory-failure.c
> index 1842822a10da..593079766655 100644
> --- v5.13-rc6-mmotm-2021-06-15-20-24/mm/memory-failure.c
> +++ v5.13-rc6-mmotm-2021-06-15-20-24_patched/mm/memory-failure.c
> @@ -66,6 +66,19 @@ int sysctl_memory_failure_recovery __read_mostly = 1;
>   
>   atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
>   
> +static bool __page_handle_poison(struct page *page)
> +{
> +	bool ret;
> +
> +	zone_pcp_disable(page_zone(page));
> +	ret = dissolve_free_huge_page(page);
> +	if (!ret)
> +		ret = take_page_off_buddy(page);
> +	zone_pcp_enable(page_zone(page));
> +
> +	return ret;
> +}
> +
>   static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
>   {
>   	if (hugepage_or_freepage) {
> @@ -73,7 +86,7 @@ static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, boo
>   		 * Doing this check for free pages is also fine since dissolve_free_huge_page
>   		 * returns 0 for non-hugetlb pages as well.
>   		 */
> -		if (dissolve_free_huge_page(page) || !take_page_off_buddy(page))
> +		if (!__page_handle_poison(page))
>   			/*
>   			 * We could fail to take off the target page from buddy
>   			 * for example due to racy page allocation, but that's
> @@ -986,7 +999,7 @@ static int me_huge_page(struct page *p, unsigned long pfn)
>   		 */
>   		if (PageAnon(hpage))
>   			put_page(hpage);
> -		if (!dissolve_free_huge_page(p) && take_page_off_buddy(p)) {
> +		if (__page_handle_poison(p)) {
>   			page_ref_inc(p);
>   			res = MF_RECOVERED;
>   		}
> @@ -1441,7 +1454,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>   		res = get_hwpoison_page(p, flags);
>   		if (!res) {
>   			res = MF_FAILED;
> -			if (!dissolve_free_huge_page(p) && take_page_off_buddy(p)) {
> +			if (__page_handle_poison(p)) {
>   				page_ref_inc(p);
>   				res = MF_RECOVERED;
>   			}
> 

Just to make sure: all call paths are fine with us taking a mutex here,
right?

I can see that get_hwpoison_page() already disables the PCP.

LGTM
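
For context, zone_pcp_disable() and zone_pcp_enable() serialize on the
pcp_batch_high_lock mutex and drain the pcp lists, so
__page_handle_poison() may only be called from sleepable context.
Roughly, from mm/page_alloc.c of that era (details may differ):

void zone_pcp_disable(struct zone *zone)
{
	mutex_lock(&pcp_batch_high_lock);
	__zone_set_pageset_high_and_batch(zone, 0, 1);
	__drain_all_pages(zone, true);
}

void zone_pcp_enable(struct zone *zone)
{
	__zone_set_pageset_high_and_batch(zone, zone->pageset_high,
					  zone->pageset_batch);
	mutex_unlock(&pcp_batch_high_lock);
}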
Mike Kravetz June 18, 2021, 4:47 a.m. UTC | #5
On 6/17/21 5:21 AM, David Hildenbrand wrote:
> On 17.06.21 11:26, Naoya Horiguchi wrote:
>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>
>> Recent changes by patch "mm/page_alloc: allow high-order pages to be
>> stored on the per-cpu lists" makes kernels determine whether to use pcp
>> by pcp_allowed_order(), which breaks soft-offline for hugetlb pages.
>>
>> Soft-offline dissolves a migration source page, then removes it from
>> buddy free list, so it's assumed that any subpage of the soft-offlined
>> hugepage are recognized as a buddy page just after returning from
>> dissolve_free_huge_page().  pcp_allowed_order() returns true for
>> hugetlb, so this assumption is no longer true.
>>
>> So disable pcp during dissolve_free_huge_page() and
>> take_page_off_buddy() to prevent soft-offlined hugepages from linking to
>> pcp lists.  Soft-offline should not be common events so the impact on
>> performance should be minimal.  And I think that the optimization of
>> Mel's patch could benefit to hugetlb so zone_pcp_disable() is called
>> only in hwpoison context.
>>
>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>> ---
>>   mm/memory-failure.c | 19 ++++++++++++++++---
>>   1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git v5.13-rc6-mmotm-2021-06-15-20-24/mm/memory-failure.c v5.13-rc6-mmotm-2021-06-15-20-24_patched/mm/memory-failure.c
>> index 1842822a10da..593079766655 100644
>> --- v5.13-rc6-mmotm-2021-06-15-20-24/mm/memory-failure.c
>> +++ v5.13-rc6-mmotm-2021-06-15-20-24_patched/mm/memory-failure.c
>> @@ -66,6 +66,19 @@ int sysctl_memory_failure_recovery __read_mostly = 1;
>>     atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
>>   +static bool __page_handle_poison(struct page *page)
>> +{
>> +    bool ret;
>> +
>> +    zone_pcp_disable(page_zone(page));
>> +    ret = dissolve_free_huge_page(page);
>> +    if (!ret)
>> +        ret = take_page_off_buddy(page);
>> +    zone_pcp_enable(page_zone(page));
>> +
>> +    return ret;
>> +}
>> +
>>   static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
>>   {
>>       if (hugepage_or_freepage) {
>> @@ -73,7 +86,7 @@ static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, boo
>>            * Doing this check for free pages is also fine since dissolve_free_huge_page
>>            * returns 0 for non-hugetlb pages as well.
>>            */
>> -        if (dissolve_free_huge_page(page) || !take_page_off_buddy(page))
>> +        if (!__page_handle_poison(page))
>>               /*
>>                * We could fail to take off the target page from buddy
>>                * for example due to racy page allocation, but that's
>> @@ -986,7 +999,7 @@ static int me_huge_page(struct page *p, unsigned long pfn)
>>            */
>>           if (PageAnon(hpage))
>>               put_page(hpage);
>> -        if (!dissolve_free_huge_page(p) && take_page_off_buddy(p)) {
>> +        if (__page_handle_poison(p)) {
>>               page_ref_inc(p);
>>               res = MF_RECOVERED;
>>           }
>> @@ -1441,7 +1454,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>>           res = get_hwpoison_page(p, flags);
>>           if (!res) {
>>               res = MF_FAILED;
>> -            if (!dissolve_free_huge_page(p) && take_page_off_buddy(p)) {
>> +            if (__page_handle_poison(p)) {
>>                   page_ref_inc(p);
>>                   res = MF_RECOVERED;
>>               }
>>
> 
> Just to make sure: all call paths are fine that we are taking a mutex, right?
> 

That should be the case.  dissolve_free_huge_page can sleep, so if not
we are already broken.

Patch

diff --git v5.13-rc6-mmotm-2021-06-15-20-24/mm/memory-failure.c v5.13-rc6-mmotm-2021-06-15-20-24_patched/mm/memory-failure.c
index 1842822a10da..593079766655 100644
--- v5.13-rc6-mmotm-2021-06-15-20-24/mm/memory-failure.c
+++ v5.13-rc6-mmotm-2021-06-15-20-24_patched/mm/memory-failure.c
@@ -66,6 +66,19 @@  int sysctl_memory_failure_recovery __read_mostly = 1;
 
 atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
+static bool __page_handle_poison(struct page *page)
+{
+	bool ret;
+
+	zone_pcp_disable(page_zone(page));
+	ret = dissolve_free_huge_page(page);
+	if (!ret)
+		ret = take_page_off_buddy(page);
+	zone_pcp_enable(page_zone(page));
+
+	return ret;
+}
+
 static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
 {
 	if (hugepage_or_freepage) {
@@ -73,7 +86,7 @@  static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, boo
 		 * Doing this check for free pages is also fine since dissolve_free_huge_page
 		 * returns 0 for non-hugetlb pages as well.
 		 */
-		if (dissolve_free_huge_page(page) || !take_page_off_buddy(page))
+		if (!__page_handle_poison(page))
 			/*
 			 * We could fail to take off the target page from buddy
 			 * for example due to racy page allocation, but that's
@@ -986,7 +999,7 @@  static int me_huge_page(struct page *p, unsigned long pfn)
 		 */
 		if (PageAnon(hpage))
 			put_page(hpage);
-		if (!dissolve_free_huge_page(p) && take_page_off_buddy(p)) {
+		if (__page_handle_poison(p)) {
 			page_ref_inc(p);
 			res = MF_RECOVERED;
 		}
@@ -1441,7 +1454,7 @@  static int memory_failure_hugetlb(unsigned long pfn, int flags)
 		res = get_hwpoison_page(p, flags);
 		if (!res) {
 			res = MF_FAILED;
-			if (!dissolve_free_huge_page(p) && take_page_off_buddy(p)) {
+			if (__page_handle_poison(p)) {
 				page_ref_inc(p);
 				res = MF_RECOVERED;
 			}