diff mbox series

[4/4] mm/memory-failure.c: fix potential VM_BUG_ON_PAGE in split_huge_page_to_list

Message ID 20220228140245.24552-5-linmiaohe@huawei.com (mailing list archive)
State New
Headers show
Series A few fixup patches for memory failure | expand

Commit Message

Miaohe Lin Feb. 28, 2022, 2:02 p.m. UTC
The huge zero page could reach here and if we ever try to split it, the
VM_BUG_ON_PAGE will be triggered in split_huge_page_to_list(). Also the
non-lru compound movable pages could be taken for transhuge pages. Skip
these pages by checking PageLRU because huge zero page isn't lru page as
non-lru compound movable pages.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/memory-failure.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

HORIGUCHI NAOYA(堀口 直也) March 4, 2022, 8:28 a.m. UTC | #1
On Mon, Feb 28, 2022 at 10:02:45PM +0800, Miaohe Lin wrote:
> The huge zero page could reach here and if we ever try to split it, the
> VM_BUG_ON_PAGE will be triggered in split_huge_page_to_list(). Also the
> non-lru compound movable pages could be taken for transhuge pages. Skip
> these pages by checking PageLRU because huge zero page isn't lru page as
> non-lru compound movable pages.

It seems that memory_failure() also fails at get_any_page() with "hwpoison:
unhandlable page" message.

  [16478.203474] page:00000000b6acdbd1 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1810b4
  [16478.206612] flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
  [16478.209411] raw: 0057ffffc0801000 fffff11bc6042d08 fffff11bc6042d08 0000000000000000
  [16478.211921] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
  [16478.214473] page dumped because: hwpoison: unhandlable page
  [16478.216386] Memory failure: 0x1810b4: recovery action for unknown page: Ignored

We can't handle errors on huge (or normal) zero page, so the current
behavior seems to me more suitable than "unsplit thp".

Or if you have some producer to reach the following path with huge zero
page, could you share it?

Thanks,
Naoya Horiguchi

> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/memory-failure.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 23bfd809dc8c..ac6492e36978 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1792,6 +1792,20 @@ int memory_failure(unsigned long pfn, int flags)
>  	}
>  
>  	if (PageTransHuge(hpage)) {
> +		/*
> +		 * The non-lru compound movable pages could be taken for
> +		 * transhuge pages. Also huge zero page could reach here
> +		 * and if we ever try to split it, the VM_BUG_ON_PAGE will
> +		 * be triggered in split_huge_page_to_list(). Skip these
> +		 * pages by checking PageLRU because huge zero page isn't
> +		 * lru page as non-lru compound movable pages.
> +		 */
> +		if (!PageLRU(hpage)) {
> +			put_page(p);
> +			action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
> +			res = -EBUSY;
> +			goto unlock_mutex;
> +		}
>  		/*
>  		 * The flag must be set after the refcount is bumped
>  		 * otherwise it may race with THP split.
> -- 
> 2.23.0
Miaohe Lin March 7, 2022, 7:07 a.m. UTC | #2
On 2022/3/4 16:28, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Mon, Feb 28, 2022 at 10:02:45PM +0800, Miaohe Lin wrote:
>> The huge zero page could reach here and if we ever try to split it, the
>> VM_BUG_ON_PAGE will be triggered in split_huge_page_to_list(). Also the
>> non-lru compound movable pages could be taken for transhuge pages. Skip
>> these pages by checking PageLRU because huge zero page isn't lru page as
>> non-lru compound movable pages.
> 
> It seems that memory_failure() also fails at get_any_page() with "hwpoison:
> unhandlable page" message.
> 
>   [16478.203474] page:00000000b6acdbd1 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1810b4
>   [16478.206612] flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
>   [16478.209411] raw: 0057ffffc0801000 fffff11bc6042d08 fffff11bc6042d08 0000000000000000
>   [16478.211921] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
>   [16478.214473] page dumped because: hwpoison: unhandlable page
>   [16478.216386] Memory failure: 0x1810b4: recovery action for unknown page: Ignored
> 
> We can't handle errors on huge (or normal) zero page, so the current

Sorry for confusing commit log again. I should have a coffee before I make this patch.
Huge or normal zero page will fail at get_any_page because they're neither HWPoisonHandlable
nor PageHuge.

> behavior seems to me more suitable than "unsplit thp".
> 
> Or if you have some producer to reach the following path with huge zero
> page, could you share it?
> 

What I mean is that non-lru movable compound page can reach here unexpected because __PageMovable(page)
is handleable now. So get_any_page could succeed to grab the page refcnt. And since it's compound page,
it will go through the split_huge_page_to_list because PageTransHuge checks PageHead(page) which can also
be true for compound page. But this type of pages is unexpected for split_huge_page_to_list.

Does this make sense for you? Thanks Naoya.

> Thanks,
> Naoya Horiguchi
> 
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/memory-failure.c | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 23bfd809dc8c..ac6492e36978 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1792,6 +1792,20 @@ int memory_failure(unsigned long pfn, int flags)
>>  	}
>>  
>>  	if (PageTransHuge(hpage)) {
>> +		/*
>> +		 * The non-lru compound movable pages could be taken for
>> +		 * transhuge pages. Also huge zero page could reach here
>> +		 * and if we ever try to split it, the VM_BUG_ON_PAGE will
>> +		 * be triggered in split_huge_page_to_list(). Skip these
>> +		 * pages by checking PageLRU because huge zero page isn't
>> +		 * lru page as non-lru compound movable pages.
>> +		 */
>> +		if (!PageLRU(hpage)) {
>> +			put_page(p);
>> +			action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
>> +			res = -EBUSY;
>> +			goto unlock_mutex;
>> +		}
>>  		/*
>>  		 * The flag must be set after the refcount is bumped
>>  		 * otherwise it may race with THP split.
>> -- 
>> 2.23.0
Yang Shi March 7, 2022, 7:53 p.m. UTC | #3
On Sun, Mar 6, 2022 at 11:07 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> On 2022/3/4 16:28, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Mon, Feb 28, 2022 at 10:02:45PM +0800, Miaohe Lin wrote:
> >> The huge zero page could reach here and if we ever try to split it, the
> >> VM_BUG_ON_PAGE will be triggered in split_huge_page_to_list(). Also the
> >> non-lru compound movable pages could be taken for transhuge pages. Skip
> >> these pages by checking PageLRU because huge zero page isn't lru page as
> >> non-lru compound movable pages.
> >
> > It seems that memory_failure() also fails at get_any_page() with "hwpoison:
> > unhandlable page" message.
> >
> >   [16478.203474] page:00000000b6acdbd1 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1810b4
> >   [16478.206612] flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
> >   [16478.209411] raw: 0057ffffc0801000 fffff11bc6042d08 fffff11bc6042d08 0000000000000000
> >   [16478.211921] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> >   [16478.214473] page dumped because: hwpoison: unhandlable page
> >   [16478.216386] Memory failure: 0x1810b4: recovery action for unknown page: Ignored
> >
> > We can't handle errors on huge (or normal) zero page, so the current
>
> Sorry for confusing commit log again. I should have a coffee before I make this patch.
> Huge or normal zero page will fail at get_any_page because they're neither HWPoisonHandlable
> nor PageHuge.
>
> > behavior seems to me more suitable than "unsplit thp".
> >
> > Or if you have some producer to reach the following path with huge zero
> > page, could you share it?
> >
>
> What I mean is that non-lru movable compound page can reach here unexpected because __PageMovable(page)
> is handleable now. So get_any_page could succeed to grab the page refcnt. And since it's compound page,
> it will go through the split_huge_page_to_list because PageTransHuge checks PageHead(page) which can also
> be true for compound page. But this type of pages is unexpected for split_huge_page_to_list.

Can we really handle non-LRU movable pages in memory failure
(uncorrectable errors)? Typically they are balloon, zsmalloc, etc.
Assuming we run into a base (4K) non-LRU movable page, we could reach
as far as identify_page_state(), it should not fall into any category
except me_unknown. So it seems we could just simply make it
unhandlable.

But it should be handlable for soft-offline since it could be migrated.


> Does this make sense for you? Thanks Naoya.
>
> > Thanks,
> > Naoya Horiguchi
> >
> >>
> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> >> ---
> >>  mm/memory-failure.c | 14 ++++++++++++++
> >>  1 file changed, 14 insertions(+)
> >>
> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >> index 23bfd809dc8c..ac6492e36978 100644
> >> --- a/mm/memory-failure.c
> >> +++ b/mm/memory-failure.c
> >> @@ -1792,6 +1792,20 @@ int memory_failure(unsigned long pfn, int flags)
> >>      }
> >>
> >>      if (PageTransHuge(hpage)) {
> >> +            /*
> >> +             * The non-lru compound movable pages could be taken for
> >> +             * transhuge pages. Also huge zero page could reach here
> >> +             * and if we ever try to split it, the VM_BUG_ON_PAGE will
> >> +             * be triggered in split_huge_page_to_list(). Skip these
> >> +             * pages by checking PageLRU because huge zero page isn't
> >> +             * lru page as non-lru compound movable pages.
> >> +             */
> >> +            if (!PageLRU(hpage)) {
> >> +                    put_page(p);
> >> +                    action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
> >> +                    res = -EBUSY;
> >> +                    goto unlock_mutex;
> >> +            }
> >>              /*
> >>               * The flag must be set after the refcount is bumped
> >>               * otherwise it may race with THP split.
> >> --
> >> 2.23.0
>
>
Miaohe Lin March 8, 2022, 12:36 p.m. UTC | #4
On 2022/3/8 3:53, Yang Shi wrote:
> On Sun, Mar 6, 2022 at 11:07 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>> On 2022/3/4 16:28, HORIGUCHI NAOYA(堀口 直也) wrote:
>>> On Mon, Feb 28, 2022 at 10:02:45PM +0800, Miaohe Lin wrote:
>>>> The huge zero page could reach here and if we ever try to split it, the
>>>> VM_BUG_ON_PAGE will be triggered in split_huge_page_to_list(). Also the
>>>> non-lru compound movable pages could be taken for transhuge pages. Skip
>>>> these pages by checking PageLRU because huge zero page isn't lru page as
>>>> non-lru compound movable pages.
>>>
>>> It seems that memory_failure() also fails at get_any_page() with "hwpoison:
>>> unhandlable page" message.
>>>
>>>   [16478.203474] page:00000000b6acdbd1 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1810b4
>>>   [16478.206612] flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
>>>   [16478.209411] raw: 0057ffffc0801000 fffff11bc6042d08 fffff11bc6042d08 0000000000000000
>>>   [16478.211921] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
>>>   [16478.214473] page dumped because: hwpoison: unhandlable page
>>>   [16478.216386] Memory failure: 0x1810b4: recovery action for unknown page: Ignored
>>>
>>> We can't handle errors on huge (or normal) zero page, so the current
>>
>> Sorry for confusing commit log again. I should have a coffee before I make this patch.
>> Huge or normal zero page will fail at get_any_page because they're neither HWPoisonHandlable
>> nor PageHuge.
>>
>>> behavior seems to me more suitable than "unsplit thp".
>>>
>>> Or if you have some producer to reach the following path with huge zero
>>> page, could you share it?
>>>
>>
>> What I mean is that non-lru movable compound page can reach here unexpected because __PageMovable(page)
>> is handleable now. So get_any_page could succeed to grab the page refcnt. And since it's compound page,
>> it will go through the split_huge_page_to_list because PageTransHuge checks PageHead(page) which can also
>> be true for compound page. But this type of pages is unexpected for split_huge_page_to_list.
> 
> Can we really handle non-LRU movable pages in memory failure
> (uncorrectable errors)? Typically they are balloon, zsmalloc, etc.
> Assuming we run into a base (4K) non-LRU movable page, we could reach
> as far as identify_page_state(), it should not fall into any category
> except me_unknown. So it seems we could just simply make it
> unhandlable.

There is the comment from memory_failure:
	/*
	 * We ignore non-LRU pages for good reasons.
	 * - PG_locked is only well defined for LRU pages and a few others
	 * - to avoid races with __SetPageLocked()
	 * - to avoid races with __SetPageSlab*() (and more non-atomic ops)
	 * The check (unnecessarily) ignores LRU pages being isolated and
	 * walked by the page reclaim code, however that's not a big loss.
	 */

So we could not handle non-LRU movable pages.

What do you mean is something like below?

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 5444a8ef4867..d80dbe0f20b6 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1784,6 +1784,13 @@ int memory_failure(unsigned long pfn, int flags)
                }
        }

+       if (__PageMovable(hpage)) {
+               put_page(p);
+               action_result(pfn, MF_MSG_MOVALBE_PAGE, MF_IGNORED);
+               res = -EBUSY;
+               goto unlock_mutex;
+       }
+
        if (PageTransHuge(hpage)) {
                /*
                 * The flag must be set after the refcount is bumped


i.e. Simply make non-LRU movable pages unhandlable ?

> 
> But it should be handlable for soft-offline since it could be migrated.
> 

Yes, non-LRU movable pages can be simply migrated.

Many thanks.

> 
>> Does this make sense for you? Thanks Naoya.
>>
>>> Thanks,
>>> Naoya Horiguchi
>>>
>>>>
>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>> ---
>>>>  mm/memory-failure.c | 14 ++++++++++++++
>>>>  1 file changed, 14 insertions(+)
>>>>
>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>> index 23bfd809dc8c..ac6492e36978 100644
>>>> --- a/mm/memory-failure.c
>>>> +++ b/mm/memory-failure.c
>>>> @@ -1792,6 +1792,20 @@ int memory_failure(unsigned long pfn, int flags)
>>>>      }
>>>>
>>>>      if (PageTransHuge(hpage)) {
>>>> +            /*
>>>> +             * The non-lru compound movable pages could be taken for
>>>> +             * transhuge pages. Also huge zero page could reach here
>>>> +             * and if we ever try to split it, the VM_BUG_ON_PAGE will
>>>> +             * be triggered in split_huge_page_to_list(). Skip these
>>>> +             * pages by checking PageLRU because huge zero page isn't
>>>> +             * lru page as non-lru compound movable pages.
>>>> +             */
>>>> +            if (!PageLRU(hpage)) {
>>>> +                    put_page(p);
>>>> +                    action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
>>>> +                    res = -EBUSY;
>>>> +                    goto unlock_mutex;
>>>> +            }
>>>>              /*
>>>>               * The flag must be set after the refcount is bumped
>>>>               * otherwise it may race with THP split.
>>>> --
>>>> 2.23.0
>>
>>
> .
>
Yang Shi March 8, 2022, 6:47 p.m. UTC | #5
On Tue, Mar 8, 2022 at 4:36 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> On 2022/3/8 3:53, Yang Shi wrote:
> > On Sun, Mar 6, 2022 at 11:07 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
> >>
> >> On 2022/3/4 16:28, HORIGUCHI NAOYA(堀口 直也) wrote:
> >>> On Mon, Feb 28, 2022 at 10:02:45PM +0800, Miaohe Lin wrote:
> >>>> The huge zero page could reach here and if we ever try to split it, the
> >>>> VM_BUG_ON_PAGE will be triggered in split_huge_page_to_list(). Also the
> >>>> non-lru compound movable pages could be taken for transhuge pages. Skip
> >>>> these pages by checking PageLRU because huge zero page isn't lru page as
> >>>> non-lru compound movable pages.
> >>>
> >>> It seems that memory_failure() also fails at get_any_page() with "hwpoison:
> >>> unhandlable page" message.
> >>>
> >>>   [16478.203474] page:00000000b6acdbd1 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1810b4
> >>>   [16478.206612] flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
> >>>   [16478.209411] raw: 0057ffffc0801000 fffff11bc6042d08 fffff11bc6042d08 0000000000000000
> >>>   [16478.211921] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> >>>   [16478.214473] page dumped because: hwpoison: unhandlable page
> >>>   [16478.216386] Memory failure: 0x1810b4: recovery action for unknown page: Ignored
> >>>
> >>> We can't handle errors on huge (or normal) zero page, so the current
> >>
> >> Sorry for confusing commit log again. I should have a coffee before I make this patch.
> >> Huge or normal zero page will fail at get_any_page because they're neither HWPoisonHandlable
> >> nor PageHuge.
> >>
> >>> behavior seems to me more suitable than "unsplit thp".
> >>>
> >>> Or if you have some producer to reach the following path with huge zero
> >>> page, could you share it?
> >>>
> >>
> >> What I mean is that non-lru movable compound page can reach here unexpected because __PageMovable(page)
> >> is handleable now. So get_any_page could succeed to grab the page refcnt. And since it's compound page,
> >> it will go through the split_huge_page_to_list because PageTransHuge checks PageHead(page) which can also
> >> be true for compound page. But this type of pages is unexpected for split_huge_page_to_list.
> >
> > Can we really handle non-LRU movable pages in memory failure
> > (uncorrectable errors)? Typically they are balloon, zsmalloc, etc.
> > Assuming we run into a base (4K) non-LRU movable page, we could reach
> > as far as identify_page_state(), it should not fall into any category
> > except me_unknown. So it seems we could just simply make it
> > unhandlable.
>
> There is the comment from memory_failure:
>         /*
>          * We ignore non-LRU pages for good reasons.
>          * - PG_locked is only well defined for LRU pages and a few others
>          * - to avoid races with __SetPageLocked()
>          * - to avoid races with __SetPageSlab*() (and more non-atomic ops)
>          * The check (unnecessarily) ignores LRU pages being isolated and
>          * walked by the page reclaim code, however that's not a big loss.
>          */
>
> So we could not handle non-LRU movable pages.
>
> What do you mean is something like below?
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 5444a8ef4867..d80dbe0f20b6 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1784,6 +1784,13 @@ int memory_failure(unsigned long pfn, int flags)
>                 }
>         }
>
> +       if (__PageMovable(hpage)) {
> +               put_page(p);
> +               action_result(pfn, MF_MSG_MOVALBE_PAGE, MF_IGNORED);
> +               res = -EBUSY;
> +               goto unlock_mutex;
> +       }
> +
>         if (PageTransHuge(hpage)) {
>                 /*
>                  * The flag must be set after the refcount is bumped
>
>
> i.e. Simply make non-LRU movable pages unhandlable ?

I'd prefer this personally. Something like the below (compile test only):

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 5444a8ef4867..789e40909ade 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1176,12 +1176,18 @@ void ClearPageHWPoisonTakenOff(struct page *page)
  * does not return true for hugetlb or device memory pages, so it's assumed
  * to be called only in the context where we never have such pages.
  */
-static inline bool HWPoisonHandlable(struct page *page)
+static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
 {
- return PageLRU(page) || __PageMovable(page) || is_free_buddy_page(page);
+ bool movable = false;
+
+ /* Soft offline could mirgate non-LRU movable pages */
+ if ((flags & MF_SOFT_OFFLINE) && __PageMovable(page))
+ movable = true;
+
+ return movable || PageLRU(page) || is_free_buddy_page(page);
 }

-static int __get_hwpoison_page(struct page *page)
+static int __get_hwpoison_page(struct page *page, unsigned long flags)
 {
  struct page *head = compound_head(page);
  int ret = 0;
@@ -1196,7 +1202,7 @@ static int __get_hwpoison_page(struct page *page)
  * for any unsupported type of page in order to reduce the risk of
  * unexpected races caused by taking a page refcount.
  */
- if (!HWPoisonHandlable(head))
+ if (!HWPoisonHandlable(head, flags))
  return -EBUSY;

  if (get_page_unless_zero(head)) {
@@ -1221,7 +1227,7 @@ static int get_any_page(struct page *p, unsigned
long flags)

 try_again:
  if (!count_increased) {
- ret = __get_hwpoison_page(p);
+ ret = __get_hwpoison_page(p, flags);
  if (!ret) {
  if (page_count(p)) {
  /* We raced with an allocation, retry. */
@@ -1249,7 +1255,7 @@ static int get_any_page(struct page *p, unsigned
long flags)
  }
  }

- if (PageHuge(p) || HWPoisonHandlable(p)) {
+ if (PageHuge(p) || HWPoisonHandlable(p, flags)) {
  ret = 1;
  } else {
  /*

>
> >
> > But it should be handlable for soft-offline since it could be migrated.
> >
>
> Yes, non-LRU movable pages can be simply migrated.
>
> Many thanks.
>
> >
> >> Does this make sense for you? Thanks Naoya.
> >>
> >>> Thanks,
> >>> Naoya Horiguchi
> >>>
> >>>>
> >>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> >>>> ---
> >>>>  mm/memory-failure.c | 14 ++++++++++++++
> >>>>  1 file changed, 14 insertions(+)
> >>>>
> >>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >>>> index 23bfd809dc8c..ac6492e36978 100644
> >>>> --- a/mm/memory-failure.c
> >>>> +++ b/mm/memory-failure.c
> >>>> @@ -1792,6 +1792,20 @@ int memory_failure(unsigned long pfn, int flags)
> >>>>      }
> >>>>
> >>>>      if (PageTransHuge(hpage)) {
> >>>> +            /*
> >>>> +             * The non-lru compound movable pages could be taken for
> >>>> +             * transhuge pages. Also huge zero page could reach here
> >>>> +             * and if we ever try to split it, the VM_BUG_ON_PAGE will
> >>>> +             * be triggered in split_huge_page_to_list(). Skip these
> >>>> +             * pages by checking PageLRU because huge zero page isn't
> >>>> +             * lru page as non-lru compound movable pages.
> >>>> +             */
> >>>> +            if (!PageLRU(hpage)) {
> >>>> +                    put_page(p);
> >>>> +                    action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
> >>>> +                    res = -EBUSY;
> >>>> +                    goto unlock_mutex;
> >>>> +            }
> >>>>              /*
> >>>>               * The flag must be set after the refcount is bumped
> >>>>               * otherwise it may race with THP split.
> >>>> --
> >>>> 2.23.0
> >>
> >>
> > .
> >
>
Miaohe Lin March 9, 2022, 8:45 a.m. UTC | #6
On 2022/3/9 2:47, Yang Shi wrote:
> On Tue, Mar 8, 2022 at 4:36 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>> On 2022/3/8 3:53, Yang Shi wrote:
>>> On Sun, Mar 6, 2022 at 11:07 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>>
>>>> On 2022/3/4 16:28, HORIGUCHI NAOYA(堀口 直也) wrote:
>>>>> On Mon, Feb 28, 2022 at 10:02:45PM +0800, Miaohe Lin wrote:
>>>>>> The huge zero page could reach here and if we ever try to split it, the
>>>>>> VM_BUG_ON_PAGE will be triggered in split_huge_page_to_list(). Also the
>>>>>> non-lru compound movable pages could be taken for transhuge pages. Skip
>>>>>> these pages by checking PageLRU because huge zero page isn't lru page as
>>>>>> non-lru compound movable pages.
>>>>>
>>>>> It seems that memory_failure() also fails at get_any_page() with "hwpoison:
>>>>> unhandlable page" message.
>>>>>
>>>>>   [16478.203474] page:00000000b6acdbd1 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1810b4
>>>>>   [16478.206612] flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
>>>>>   [16478.209411] raw: 0057ffffc0801000 fffff11bc6042d08 fffff11bc6042d08 0000000000000000
>>>>>   [16478.211921] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
>>>>>   [16478.214473] page dumped because: hwpoison: unhandlable page
>>>>>   [16478.216386] Memory failure: 0x1810b4: recovery action for unknown page: Ignored
>>>>>
>>>>> We can't handle errors on huge (or normal) zero page, so the current
>>>>
>>>> Sorry for confusing commit log again. I should have a coffee before I make this patch.
>>>> Huge or normal zero page will fail at get_any_page because they're neither HWPoisonHandlable
>>>> nor PageHuge.
>>>>
>>>>> behavior seems to me more suitable than "unsplit thp".
>>>>>
>>>>> Or if you have some producer to reach the following path with huge zero
>>>>> page, could you share it?
>>>>>
>>>>
>>>> What I mean is that non-lru movable compound page can reach here unexpected because __PageMovable(page)
>>>> is handleable now. So get_any_page could succeed to grab the page refcnt. And since it's compound page,
>>>> it will go through the split_huge_page_to_list because PageTransHuge checks PageHead(page) which can also
>>>> be true for compound page. But this type of pages is unexpected for split_huge_page_to_list.
>>>
>>> Can we really handle non-LRU movable pages in memory failure
>>> (uncorrectable errors)? Typically they are balloon, zsmalloc, etc.
>>> Assuming we run into a base (4K) non-LRU movable page, we could reach
>>> as far as identify_page_state(), it should not fall into any category
>>> except me_unknown. So it seems we could just simply make it
>>> unhandlable.
>>
>> There is the comment from memory_failure:
>>         /*
>>          * We ignore non-LRU pages for good reasons.
>>          * - PG_locked is only well defined for LRU pages and a few others
>>          * - to avoid races with __SetPageLocked()
>>          * - to avoid races with __SetPageSlab*() (and more non-atomic ops)
>>          * The check (unnecessarily) ignores LRU pages being isolated and
>>          * walked by the page reclaim code, however that's not a big loss.
>>          */
>>
>> So we could not handle non-LRU movable pages.
>>
>> What do you mean is something like below?
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 5444a8ef4867..d80dbe0f20b6 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1784,6 +1784,13 @@ int memory_failure(unsigned long pfn, int flags)
>>                 }
>>         }
>>
>> +       if (__PageMovable(hpage)) {
>> +               put_page(p);
>> +               action_result(pfn, MF_MSG_MOVALBE_PAGE, MF_IGNORED);
>> +               res = -EBUSY;
>> +               goto unlock_mutex;
>> +       }
>> +
>>         if (PageTransHuge(hpage)) {
>>                 /*
>>                  * The flag must be set after the refcount is bumped
>>
>>
>> i.e. Simply make non-LRU movable pages unhandlable ?
> 

The below change looks good to me except we need to ensure the caller passing
in the MF_SOFT_OFFLINE flag while doing soft-offline now. This is false for
madvise_inject_error but it's trivial to change this.

Will try to do this in V2.
Many thanks.

> I'd prefer this personally. Something like the below (compile test only):
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 5444a8ef4867..789e40909ade 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1176,12 +1176,18 @@ void ClearPageHWPoisonTakenOff(struct page *page)
>   * does not return true for hugetlb or device memory pages, so it's assumed
>   * to be called only in the context where we never have such pages.
>   */
> -static inline bool HWPoisonHandlable(struct page *page)
> +static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
>  {
> - return PageLRU(page) || __PageMovable(page) || is_free_buddy_page(page);
> + bool movable = false;
> +
> + /* Soft offline could mirgate non-LRU movable pages */
> + if ((flags & MF_SOFT_OFFLINE) && __PageMovable(page))
> + movable = true;
> +
> + return movable || PageLRU(page) || is_free_buddy_page(page);
>  }
> 
> -static int __get_hwpoison_page(struct page *page)
> +static int __get_hwpoison_page(struct page *page, unsigned long flags)
>  {
>   struct page *head = compound_head(page);
>   int ret = 0;
> @@ -1196,7 +1202,7 @@ static int __get_hwpoison_page(struct page *page)
>   * for any unsupported type of page in order to reduce the risk of
>   * unexpected races caused by taking a page refcount.
>   */
> - if (!HWPoisonHandlable(head))
> + if (!HWPoisonHandlable(head, flags))
>   return -EBUSY;
> 
>   if (get_page_unless_zero(head)) {
> @@ -1221,7 +1227,7 @@ static int get_any_page(struct page *p, unsigned
> long flags)
> 
>  try_again:
>   if (!count_increased) {
> - ret = __get_hwpoison_page(p);
> + ret = __get_hwpoison_page(p, flags);
>   if (!ret) {
>   if (page_count(p)) {
>   /* We raced with an allocation, retry. */
> @@ -1249,7 +1255,7 @@ static int get_any_page(struct page *p, unsigned
> long flags)
>   }
>   }
> 
> - if (PageHuge(p) || HWPoisonHandlable(p)) {
> + if (PageHuge(p) || HWPoisonHandlable(p, flags)) {
>   ret = 1;
>   } else {
>   /*
> 
>>
>>>
>>> But it should be handlable for soft-offline since it could be migrated.
>>>
>>
>> Yes, non-LRU movable pages can be simply migrated.
>>
>> Many thanks.
>>
>>>
>>>> Does this make sense for you? Thanks Naoya.
>>>>
>>>>> Thanks,
>>>>> Naoya Horiguchi
>>>>>
>>>>>>
>>>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>>>> ---
>>>>>>  mm/memory-failure.c | 14 ++++++++++++++
>>>>>>  1 file changed, 14 insertions(+)
>>>>>>
>>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>>>> index 23bfd809dc8c..ac6492e36978 100644
>>>>>> --- a/mm/memory-failure.c
>>>>>> +++ b/mm/memory-failure.c
>>>>>> @@ -1792,6 +1792,20 @@ int memory_failure(unsigned long pfn, int flags)
>>>>>>      }
>>>>>>
>>>>>>      if (PageTransHuge(hpage)) {
>>>>>> +            /*
>>>>>> +             * The non-lru compound movable pages could be taken for
>>>>>> +             * transhuge pages. Also huge zero page could reach here
>>>>>> +             * and if we ever try to split it, the VM_BUG_ON_PAGE will
>>>>>> +             * be triggered in split_huge_page_to_list(). Skip these
>>>>>> +             * pages by checking PageLRU because huge zero page isn't
>>>>>> +             * lru page as non-lru compound movable pages.
>>>>>> +             */
>>>>>> +            if (!PageLRU(hpage)) {
>>>>>> +                    put_page(p);
>>>>>> +                    action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
>>>>>> +                    res = -EBUSY;
>>>>>> +                    goto unlock_mutex;
>>>>>> +            }
>>>>>>              /*
>>>>>>               * The flag must be set after the refcount is bumped
>>>>>>               * otherwise it may race with THP split.
>>>>>> --
>>>>>> 2.23.0
>>>>
>>>>
>>> .
>>>
>>
> .
>
Miaohe Lin March 10, 2022, 11:46 a.m. UTC | #7
On 2022/3/9 2:47, Yang Shi wrote:
> On Tue, Mar 8, 2022 at 4:36 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>> On 2022/3/8 3:53, Yang Shi wrote:
>>> On Sun, Mar 6, 2022 at 11:07 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>>
>>>> On 2022/3/4 16:28, HORIGUCHI NAOYA(堀口 直也) wrote:
>>>>> On Mon, Feb 28, 2022 at 10:02:45PM +0800, Miaohe Lin wrote:
>>>>>> The huge zero page could reach here and if we ever try to split it, the
>>>>>> VM_BUG_ON_PAGE will be triggered in split_huge_page_to_list(). Also the
>>>>>> non-lru compound movable pages could be taken for transhuge pages. Skip
>>>>>> these pages by checking PageLRU because huge zero page isn't lru page as
>>>>>> non-lru compound movable pages.
>>>>>
>>>>> It seems that memory_failure() also fails at get_any_page() with "hwpoison:
>>>>> unhandlable page" message.
>>>>>
>>>>>   [16478.203474] page:00000000b6acdbd1 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1810b4
>>>>>   [16478.206612] flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
>>>>>   [16478.209411] raw: 0057ffffc0801000 fffff11bc6042d08 fffff11bc6042d08 0000000000000000
>>>>>   [16478.211921] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
>>>>>   [16478.214473] page dumped because: hwpoison: unhandlable page
>>>>>   [16478.216386] Memory failure: 0x1810b4: recovery action for unknown page: Ignored
>>>>>
>>>>> We can't handle errors on huge (or normal) zero page, so the current
>>>>
>>>> Sorry for confusing commit log again. I should have a coffee before I make this patch.
>>>> Huge or normal zero page will fail at get_any_page because they're neither HWPoisonHandlable
>>>> nor PageHuge.
>>>>
>>>>> behavior seems to me more suitable than "unsplit thp".
>>>>>
>>>>> Or if you have some producer to reach the following path with huge zero
>>>>> page, could you share it?
>>>>>
>>>>
>>>> What I mean is that non-lru movable compound page can reach here unexpected because __PageMovable(page)
>>>> is handleable now. So get_any_page could succeed to grab the page refcnt. And since it's compound page,
>>>> it will go through the split_huge_page_to_list because PageTransHuge checks PageHead(page) which can also
>>>> be true for compound page. But this type of pages is unexpected for split_huge_page_to_list.
>>>
>>> Can we really handle non-LRU movable pages in memory failure
>>> (uncorrectable errors)? Typically they are balloon, zsmalloc, etc.
>>> Assuming we run into a base (4K) non-LRU movable page, we could reach
>>> as far as identify_page_state(), it should not fall into any category
>>> except me_unknown. So it seems we could just simply make it
>>> unhandlable.
>>
>> There is the comment from memory_failure:
>>         /*
>>          * We ignore non-LRU pages for good reasons.
>>          * - PG_locked is only well defined for LRU pages and a few others
>>          * - to avoid races with __SetPageLocked()
>>          * - to avoid races with __SetPageSlab*() (and more non-atomic ops)
>>          * The check (unnecessarily) ignores LRU pages being isolated and
>>          * walked by the page reclaim code, however that's not a big loss.
>>          */
>>
>> So we could not handle non-LRU movable pages.
>>
>> What do you mean is something like below?
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 5444a8ef4867..d80dbe0f20b6 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1784,6 +1784,13 @@ int memory_failure(unsigned long pfn, int flags)
>>                 }
>>         }
>>
>> +       if (__PageMovable(hpage)) {
>> +               put_page(p);
>> +               action_result(pfn, MF_MSG_MOVALBE_PAGE, MF_IGNORED);
>> +               res = -EBUSY;
>> +               goto unlock_mutex;
>> +       }
>> +
>>         if (PageTransHuge(hpage)) {
>>                 /*
>>                  * The flag must be set after the refcount is bumped
>>
>>
>> i.e. Simply make non-LRU movable pages unhandlable ?
> 

I think about the below code more carefully and I found that this will make
hwpoison_filter can't handle the non-LRU movable pages now. Because non-LRU
movable pages return early now and thus can't reach the hwpoison_filter. This
results in a inconsistent behavior with previous one. So I think the origin
fixup of this patch is more suitable. What do you think?

Thanks.

> I'd prefer this personally. Something like the below (compile test only):
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 5444a8ef4867..789e40909ade 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1176,12 +1176,18 @@ void ClearPageHWPoisonTakenOff(struct page *page)
>   * does not return true for hugetlb or device memory pages, so it's assumed
>   * to be called only in the context where we never have such pages.
>   */
> -static inline bool HWPoisonHandlable(struct page *page)
> +static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
>  {
> - return PageLRU(page) || __PageMovable(page) || is_free_buddy_page(page);
> + bool movable = false;
> +
> + /* Soft offline could mirgate non-LRU movable pages */
> + if ((flags & MF_SOFT_OFFLINE) && __PageMovable(page))
> + movable = true;
> +
> + return movable || PageLRU(page) || is_free_buddy_page(page);
>  }
> 
> -static int __get_hwpoison_page(struct page *page)
> +static int __get_hwpoison_page(struct page *page, unsigned long flags)
>  {
>   struct page *head = compound_head(page);
>   int ret = 0;
> @@ -1196,7 +1202,7 @@ static int __get_hwpoison_page(struct page *page)
>   * for any unsupported type of page in order to reduce the risk of
>   * unexpected races caused by taking a page refcount.
>   */
> - if (!HWPoisonHandlable(head))
> + if (!HWPoisonHandlable(head, flags))
>   return -EBUSY;
> 
>   if (get_page_unless_zero(head)) {
> @@ -1221,7 +1227,7 @@ static int get_any_page(struct page *p, unsigned
> long flags)
> 
>  try_again:
>   if (!count_increased) {
> - ret = __get_hwpoison_page(p);
> + ret = __get_hwpoison_page(p, flags);
>   if (!ret) {
>   if (page_count(p)) {
>   /* We raced with an allocation, retry. */
> @@ -1249,7 +1255,7 @@ static int get_any_page(struct page *p, unsigned
> long flags)
>   }
>   }
> 
> - if (PageHuge(p) || HWPoisonHandlable(p)) {
> + if (PageHuge(p) || HWPoisonHandlable(p, flags)) {
>   ret = 1;
>   } else {
>   /*
> 
>>
>>>
>>> But it should be handlable for soft-offline since it could be migrated.
>>>
>>
>> Yes, non-LRU movable pages can be simply migrated.
>>
>> Many thanks.
>>
>>>
>>>> Does this make sense for you? Thanks Naoya.
>>>>
>>>>> Thanks,
>>>>> Naoya Horiguchi
>>>>>
>>>>>>
>>>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>>>> ---
>>>>>>  mm/memory-failure.c | 14 ++++++++++++++
>>>>>>  1 file changed, 14 insertions(+)
>>>>>>
>>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>>>> index 23bfd809dc8c..ac6492e36978 100644
>>>>>> --- a/mm/memory-failure.c
>>>>>> +++ b/mm/memory-failure.c
>>>>>> @@ -1792,6 +1792,20 @@ int memory_failure(unsigned long pfn, int flags)
>>>>>>      }
>>>>>>
>>>>>>      if (PageTransHuge(hpage)) {
>>>>>> +            /*
>>>>>> +             * The non-lru compound movable pages could be taken for
>>>>>> +             * transhuge pages. Also huge zero page could reach here
>>>>>> +             * and if we ever try to split it, the VM_BUG_ON_PAGE will
>>>>>> +             * be triggered in split_huge_page_to_list(). Skip these
>>>>>> +             * pages by checking PageLRU because huge zero page isn't
>>>>>> +             * lru page as non-lru compound movable pages.
>>>>>> +             */
>>>>>> +            if (!PageLRU(hpage)) {
>>>>>> +                    put_page(p);
>>>>>> +                    action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
>>>>>> +                    res = -EBUSY;
>>>>>> +                    goto unlock_mutex;
>>>>>> +            }
>>>>>>              /*
>>>>>>               * The flag must be set after the refcount is bumped
>>>>>>               * otherwise it may race with THP split.
>>>>>> --
>>>>>> 2.23.0
>>>>
>>>>
>>> .
>>>
>>
> .
>
Yang Shi March 10, 2022, 7:32 p.m. UTC | #8
On Thu, Mar 10, 2022 at 3:46 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> On 2022/3/9 2:47, Yang Shi wrote:
> > On Tue, Mar 8, 2022 at 4:36 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
> >>
> >> On 2022/3/8 3:53, Yang Shi wrote:
> >>> On Sun, Mar 6, 2022 at 11:07 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
> >>>>
> >>>> On 2022/3/4 16:28, HORIGUCHI NAOYA(堀口 直也) wrote:
> >>>>> On Mon, Feb 28, 2022 at 10:02:45PM +0800, Miaohe Lin wrote:
> >>>>>> The huge zero page could reach here and if we ever try to split it, the
> >>>>>> VM_BUG_ON_PAGE will be triggered in split_huge_page_to_list(). Also the
> >>>>>> non-lru compound movable pages could be taken for transhuge pages. Skip
> >>>>>> these pages by checking PageLRU because huge zero page isn't lru page as
> >>>>>> non-lru compound movable pages.
> >>>>>
> >>>>> It seems that memory_failure() also fails at get_any_page() with "hwpoison:
> >>>>> unhandlable page" message.
> >>>>>
> >>>>>   [16478.203474] page:00000000b6acdbd1 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1810b4
> >>>>>   [16478.206612] flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
> >>>>>   [16478.209411] raw: 0057ffffc0801000 fffff11bc6042d08 fffff11bc6042d08 0000000000000000
> >>>>>   [16478.211921] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> >>>>>   [16478.214473] page dumped because: hwpoison: unhandlable page
> >>>>>   [16478.216386] Memory failure: 0x1810b4: recovery action for unknown page: Ignored
> >>>>>
> >>>>> We can't handle errors on huge (or normal) zero page, so the current
> >>>>
> >>>> Sorry for confusing commit log again. I should have a coffee before I make this patch.
> >>>> Huge or normal zero page will fail at get_any_page because they're neither HWPoisonHandlable
> >>>> nor PageHuge.
> >>>>
> >>>>> behavior seems to me more suitable than "unsplit thp".
> >>>>>
> >>>>> Or if you have some producer to reach the following path with huge zero
> >>>>> page, could you share it?
> >>>>>
> >>>>
> >>>> What I mean is that non-lru movable compound page can reach here unexpected because __PageMovable(page)
> >>>> is handleable now. So get_any_page could succeed to grab the page refcnt. And since it's compound page,
> >>>> it will go through the split_huge_page_to_list because PageTransHuge checks PageHead(page) which can also
> >>>> be true for compound page. But this type of pages is unexpected for split_huge_page_to_list.
> >>>
> >>> Can we really handle non-LRU movable pages in memory failure
> >>> (uncorrectable errors)? Typically they are balloon, zsmalloc, etc.
> >>> Assuming we run into a base (4K) non-LRU movable page, we could reach
> >>> as far as identify_page_state(), it should not fall into any category
> >>> except me_unknown. So it seems we could just simply make it
> >>> unhandlable.
> >>
> >> There is the comment from memory_failure:
> >>         /*
> >>          * We ignore non-LRU pages for good reasons.
> >>          * - PG_locked is only well defined for LRU pages and a few others
> >>          * - to avoid races with __SetPageLocked()
> >>          * - to avoid races with __SetPageSlab*() (and more non-atomic ops)
> >>          * The check (unnecessarily) ignores LRU pages being isolated and
> >>          * walked by the page reclaim code, however that's not a big loss.
> >>          */
> >>
> >> So we could not handle non-LRU movable pages.
> >>
> >> What do you mean is something like below?
> >>
> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >> index 5444a8ef4867..d80dbe0f20b6 100644
> >> --- a/mm/memory-failure.c
> >> +++ b/mm/memory-failure.c
> >> @@ -1784,6 +1784,13 @@ int memory_failure(unsigned long pfn, int flags)
> >>                 }
> >>         }
> >>
> >> +       if (__PageMovable(hpage)) {
> >> +               put_page(p);
> >> +               action_result(pfn, MF_MSG_MOVALBE_PAGE, MF_IGNORED);
> >> +               res = -EBUSY;
> >> +               goto unlock_mutex;
> >> +       }
> >> +
> >>         if (PageTransHuge(hpage)) {
> >>                 /*
> >>                  * The flag must be set after the refcount is bumped
> >>
> >>
> >> i.e. Simply make non-LRU movable pages unhandlable ?
> >
>
> I think about the below code more carefully and I found that this will make
> hwpoison_filter can't handle the non-LRU movable pages now. Because non-LRU
> movable pages return early now and thus can't reach the hwpoison_filter. This
> results in a inconsistent behavior with previous one. So I think the origin
> fixup of this patch is more suitable. What do you think?

I'm not familiar with hwpoison_filter(), it seems like a test helper
for error injection. I don't think hwpoison_filter() is used to filter
unhandlable page, for example, slab page, IIUC. So the non-LRU movable
pages should be treated the same. If so, the old behavior was simply
wrong.

>
> Thanks.
>
> > I'd prefer this personally. Something like the below (compile test only):
> >
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index 5444a8ef4867..789e40909ade 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1176,12 +1176,18 @@ void ClearPageHWPoisonTakenOff(struct page *page)
> >   * does not return true for hugetlb or device memory pages, so it's assumed
> >   * to be called only in the context where we never have such pages.
> >   */
> > -static inline bool HWPoisonHandlable(struct page *page)
> > +static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
> >  {
> > - return PageLRU(page) || __PageMovable(page) || is_free_buddy_page(page);
> > + bool movable = false;
> > +
> > + /* Soft offline could mirgate non-LRU movable pages */
> > + if ((flags & MF_SOFT_OFFLINE) && __PageMovable(page))
> > + movable = true;
> > +
> > + return movable || PageLRU(page) || is_free_buddy_page(page);
> >  }
> >
> > -static int __get_hwpoison_page(struct page *page)
> > +static int __get_hwpoison_page(struct page *page, unsigned long flags)
> >  {
> >   struct page *head = compound_head(page);
> >   int ret = 0;
> > @@ -1196,7 +1202,7 @@ static int __get_hwpoison_page(struct page *page)
> >   * for any unsupported type of page in order to reduce the risk of
> >   * unexpected races caused by taking a page refcount.
> >   */
> > - if (!HWPoisonHandlable(head))
> > + if (!HWPoisonHandlable(head, flags))
> >   return -EBUSY;
> >
> >   if (get_page_unless_zero(head)) {
> > @@ -1221,7 +1227,7 @@ static int get_any_page(struct page *p, unsigned
> > long flags)
> >
> >  try_again:
> >   if (!count_increased) {
> > - ret = __get_hwpoison_page(p);
> > + ret = __get_hwpoison_page(p, flags);
> >   if (!ret) {
> >   if (page_count(p)) {
> >   /* We raced with an allocation, retry. */
> > @@ -1249,7 +1255,7 @@ static int get_any_page(struct page *p, unsigned
> > long flags)
> >   }
> >   }
> >
> > - if (PageHuge(p) || HWPoisonHandlable(p)) {
> > + if (PageHuge(p) || HWPoisonHandlable(p, flags)) {
> >   ret = 1;
> >   } else {
> >   /*
> >
> >>
> >>>
> >>> But it should be handlable for soft-offline since it could be migrated.
> >>>
> >>
> >> Yes, non-LRU movable pages can be simply migrated.
> >>
> >> Many thanks.
> >>
> >>>
> >>>> Does this make sense for you? Thanks Naoya.
> >>>>
> >>>>> Thanks,
> >>>>> Naoya Horiguchi
> >>>>>
> >>>>>>
> >>>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> >>>>>> ---
> >>>>>>  mm/memory-failure.c | 14 ++++++++++++++
> >>>>>>  1 file changed, 14 insertions(+)
> >>>>>>
> >>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >>>>>> index 23bfd809dc8c..ac6492e36978 100644
> >>>>>> --- a/mm/memory-failure.c
> >>>>>> +++ b/mm/memory-failure.c
> >>>>>> @@ -1792,6 +1792,20 @@ int memory_failure(unsigned long pfn, int flags)
> >>>>>>      }
> >>>>>>
> >>>>>>      if (PageTransHuge(hpage)) {
> >>>>>> +            /*
> >>>>>> +             * The non-lru compound movable pages could be taken for
> >>>>>> +             * transhuge pages. Also huge zero page could reach here
> >>>>>> +             * and if we ever try to split it, the VM_BUG_ON_PAGE will
> >>>>>> +             * be triggered in split_huge_page_to_list(). Skip these
> >>>>>> +             * pages by checking PageLRU because huge zero page isn't
> >>>>>> +             * lru page as non-lru compound movable pages.
> >>>>>> +             */
> >>>>>> +            if (!PageLRU(hpage)) {
> >>>>>> +                    put_page(p);
> >>>>>> +                    action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
> >>>>>> +                    res = -EBUSY;
> >>>>>> +                    goto unlock_mutex;
> >>>>>> +            }
> >>>>>>              /*
> >>>>>>               * The flag must be set after the refcount is bumped
> >>>>>>               * otherwise it may race with THP split.
> >>>>>> --
> >>>>>> 2.23.0
> >>>>
> >>>>
> >>> .
> >>>
> >>
> > .
> >
>
>
Miaohe Lin March 11, 2022, 1:54 a.m. UTC | #9
On 2022/3/11 3:32, Yang Shi wrote:
> On Thu, Mar 10, 2022 at 3:46 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>> On 2022/3/9 2:47, Yang Shi wrote:
>>> On Tue, Mar 8, 2022 at 4:36 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>>
>>>> On 2022/3/8 3:53, Yang Shi wrote:
>>>>> On Sun, Mar 6, 2022 at 11:07 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>>>>
>>>>>> On 2022/3/4 16:28, HORIGUCHI NAOYA(堀口 直也) wrote:
>>>>>>> On Mon, Feb 28, 2022 at 10:02:45PM +0800, Miaohe Lin wrote:
>>>>>>>> The huge zero page could reach here and if we ever try to split it, the
>>>>>>>> VM_BUG_ON_PAGE will be triggered in split_huge_page_to_list(). Also the
>>>>>>>> non-lru compound movable pages could be taken for transhuge pages. Skip
>>>>>>>> these pages by checking PageLRU because huge zero page isn't lru page as
>>>>>>>> non-lru compound movable pages.
>>>>>>>
>>>>>>> It seems that memory_failure() also fails at get_any_page() with "hwpoison:
>>>>>>> unhandlable page" message.
>>>>>>>
>>>>>>>   [16478.203474] page:00000000b6acdbd1 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1810b4
>>>>>>>   [16478.206612] flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
>>>>>>>   [16478.209411] raw: 0057ffffc0801000 fffff11bc6042d08 fffff11bc6042d08 0000000000000000
>>>>>>>   [16478.211921] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
>>>>>>>   [16478.214473] page dumped because: hwpoison: unhandlable page
>>>>>>>   [16478.216386] Memory failure: 0x1810b4: recovery action for unknown page: Ignored
>>>>>>>
>>>>>>> We can't handle errors on huge (or normal) zero page, so the current
>>>>>>
>>>>>> Sorry for confusing commit log again. I should have a coffee before I make this patch.
>>>>>> Huge or normal zero page will fail at get_any_page because they're neither HWPoisonHandlable
>>>>>> nor PageHuge.
>>>>>>
>>>>>>> behavior seems to me more suitable than "unsplit thp".
>>>>>>>
>>>>>>> Or if you have some producer to reach the following path with huge zero
>>>>>>> page, could you share it?
>>>>>>>
>>>>>>
>>>>>> What I mean is that non-lru movable compound page can reach here unexpected because __PageMovable(page)
>>>>>> is handleable now. So get_any_page could succeed to grab the page refcnt. And since it's compound page,
>>>>>> it will go through the split_huge_page_to_list because PageTransHuge checks PageHead(page) which can also
>>>>>> be true for compound page. But this type of pages is unexpected for split_huge_page_to_list.
>>>>>
>>>>> Can we really handle non-LRU movable pages in memory failure
>>>>> (uncorrectable errors)? Typically they are balloon, zsmalloc, etc.
>>>>> Assuming we run into a base (4K) non-LRU movable page, we could reach
>>>>> as far as identify_page_state(), it should not fall into any category
>>>>> except me_unknown. So it seems we could just simply make it
>>>>> unhandlable.
>>>>
>>>> There is the comment from memory_failure:
>>>>         /*
>>>>          * We ignore non-LRU pages for good reasons.
>>>>          * - PG_locked is only well defined for LRU pages and a few others
>>>>          * - to avoid races with __SetPageLocked()
>>>>          * - to avoid races with __SetPageSlab*() (and more non-atomic ops)
>>>>          * The check (unnecessarily) ignores LRU pages being isolated and
>>>>          * walked by the page reclaim code, however that's not a big loss.
>>>>          */
>>>>
>>>> So we could not handle non-LRU movable pages.
>>>>
>>>> What do you mean is something like below?
>>>>
>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>> index 5444a8ef4867..d80dbe0f20b6 100644
>>>> --- a/mm/memory-failure.c
>>>> +++ b/mm/memory-failure.c
>>>> @@ -1784,6 +1784,13 @@ int memory_failure(unsigned long pfn, int flags)
>>>>                 }
>>>>         }
>>>>
>>>> +       if (__PageMovable(hpage)) {
>>>> +               put_page(p);
>>>> +               action_result(pfn, MF_MSG_MOVALBE_PAGE, MF_IGNORED);
>>>> +               res = -EBUSY;
>>>> +               goto unlock_mutex;
>>>> +       }
>>>> +
>>>>         if (PageTransHuge(hpage)) {
>>>>                 /*
>>>>                  * The flag must be set after the refcount is bumped
>>>>
>>>>
>>>> i.e. Simply make non-LRU movable pages unhandlable ?
>>>
>>
>> I think about the below code more carefully and I found that this will make
>> hwpoison_filter can't handle the non-LRU movable pages now. Because non-LRU
>> movable pages return early now and thus can't reach the hwpoison_filter. This
>> results in a inconsistent behavior with previous one. So I think the origin
>> fixup of this patch is more suitable. What do you think?
> 
> I'm not familiar with hwpoison_filter(), it seems like a test helper
> for error injection. I don't think hwpoison_filter() is used to filter
> unhandlable page, for example, slab page, IIUC. So the non-LRU movable
> pages should be treated the same. If so, the old behavior was simply
> wrong.

I think you're right. hwpoison_filter should filter the handleable error.

Thanks.

> 
>>
>> Thanks.
>>
>>> I'd prefer this personally. Something like the below (compile test only):
>>>
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index 5444a8ef4867..789e40909ade 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -1176,12 +1176,18 @@ void ClearPageHWPoisonTakenOff(struct page *page)
>>>   * does not return true for hugetlb or device memory pages, so it's assumed
>>>   * to be called only in the context where we never have such pages.
>>>   */
>>> -static inline bool HWPoisonHandlable(struct page *page)
>>> +static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
>>>  {
>>> - return PageLRU(page) || __PageMovable(page) || is_free_buddy_page(page);
>>> + bool movable = false;
>>> +
>>> + /* Soft offline could mirgate non-LRU movable pages */
>>> + if ((flags & MF_SOFT_OFFLINE) && __PageMovable(page))
>>> + movable = true;
>>> +
>>> + return movable || PageLRU(page) || is_free_buddy_page(page);
>>>  }
>>>
>>> -static int __get_hwpoison_page(struct page *page)
>>> +static int __get_hwpoison_page(struct page *page, unsigned long flags)
>>>  {
>>>   struct page *head = compound_head(page);
>>>   int ret = 0;
>>> @@ -1196,7 +1202,7 @@ static int __get_hwpoison_page(struct page *page)
>>>   * for any unsupported type of page in order to reduce the risk of
>>>   * unexpected races caused by taking a page refcount.
>>>   */
>>> - if (!HWPoisonHandlable(head))
>>> + if (!HWPoisonHandlable(head, flags))
>>>   return -EBUSY;
>>>
>>>   if (get_page_unless_zero(head)) {
>>> @@ -1221,7 +1227,7 @@ static int get_any_page(struct page *p, unsigned
>>> long flags)
>>>
>>>  try_again:
>>>   if (!count_increased) {
>>> - ret = __get_hwpoison_page(p);
>>> + ret = __get_hwpoison_page(p, flags);
>>>   if (!ret) {
>>>   if (page_count(p)) {
>>>   /* We raced with an allocation, retry. */
>>> @@ -1249,7 +1255,7 @@ static int get_any_page(struct page *p, unsigned
>>> long flags)
>>>   }
>>>   }
>>>
>>> - if (PageHuge(p) || HWPoisonHandlable(p)) {
>>> + if (PageHuge(p) || HWPoisonHandlable(p, flags)) {
>>>   ret = 1;
>>>   } else {
>>>   /*
>>>
>>>>
>>>>>
>>>>> But it should be handlable for soft-offline since it could be migrated.
>>>>>
>>>>
>>>> Yes, non-LRU movable pages can be simply migrated.
>>>>
>>>> Many thanks.
>>>>
>>>>>
>>>>>> Does this make sense for you? Thanks Naoya.
>>>>>>
>>>>>>> Thanks,
>>>>>>> Naoya Horiguchi
>>>>>>>
>>>>>>>>
>>>>>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>>>>>> ---
>>>>>>>>  mm/memory-failure.c | 14 ++++++++++++++
>>>>>>>>  1 file changed, 14 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>>>>>> index 23bfd809dc8c..ac6492e36978 100644
>>>>>>>> --- a/mm/memory-failure.c
>>>>>>>> +++ b/mm/memory-failure.c
>>>>>>>> @@ -1792,6 +1792,20 @@ int memory_failure(unsigned long pfn, int flags)
>>>>>>>>      }
>>>>>>>>
>>>>>>>>      if (PageTransHuge(hpage)) {
>>>>>>>> +            /*
>>>>>>>> +             * The non-lru compound movable pages could be taken for
>>>>>>>> +             * transhuge pages. Also huge zero page could reach here
>>>>>>>> +             * and if we ever try to split it, the VM_BUG_ON_PAGE will
>>>>>>>> +             * be triggered in split_huge_page_to_list(). Skip these
>>>>>>>> +             * pages by checking PageLRU because huge zero page isn't
>>>>>>>> +             * lru page as non-lru compound movable pages.
>>>>>>>> +             */
>>>>>>>> +            if (!PageLRU(hpage)) {
>>>>>>>> +                    put_page(p);
>>>>>>>> +                    action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
>>>>>>>> +                    res = -EBUSY;
>>>>>>>> +                    goto unlock_mutex;
>>>>>>>> +            }
>>>>>>>>              /*
>>>>>>>>               * The flag must be set after the refcount is bumped
>>>>>>>>               * otherwise it may race with THP split.
>>>>>>>> --
>>>>>>>> 2.23.0
>>>>>>
>>>>>>
>>>>> .
>>>>>
>>>>
>>> .
>>>
>>
>>
> .
>
diff mbox series

Patch

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 23bfd809dc8c..ac6492e36978 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1792,6 +1792,20 @@  int memory_failure(unsigned long pfn, int flags)
 	}
 
 	if (PageTransHuge(hpage)) {
+		/*
+		 * The non-lru compound movable pages could be taken for
+		 * transhuge pages. Also huge zero page could reach here
+		 * and if we ever try to split it, the VM_BUG_ON_PAGE will
+		 * be triggered in split_huge_page_to_list(). Skip these
+		 * pages by checking PageLRU because huge zero page isn't
+		 * lru page as non-lru compound movable pages.
+		 */
+		if (!PageLRU(hpage)) {
+			put_page(p);
+			action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
+			res = -EBUSY;
+			goto unlock_mutex;
+		}
 		/*
 		 * The flag must be set after the refcount is bumped
 		 * otherwise it may race with THP split.