diff mbox series

Fix incorrect compound page flags store

Message ID 20200908034441.16359-1-yaoaili126@163.com
State New
Headers show
Series Fix incorrect compound page flags store | expand

Commit Message

yaoaili126@163.com Sept. 8, 2020, 3:44 a.m. UTC
From: Aili Yao <yaoaili@kingsoft.com>

PageHuge(p) branch will never be true,but for compound page we need to set page_flags to correct value.

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Yang Feng < yangfeng1@kingsoft.com>
Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
---
 mm/memory-failure.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

HORIGUCHI NAOYA(堀口 直也) Sept. 8, 2020, 7:02 a.m. UTC | #1
On Mon, Sep 07, 2020 at 08:44:42PM -0700, yaoaili126@163.com wrote:
> From: Aili Yao <yaoaili@kingsoft.com>
> 
> PageHuge(p) branch will never be true,but for compound page we need to set page_flags to correct value.
> 
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Signed-off-by: Yang Feng < yangfeng1@kingsoft.com>
> Signed-off-by: Aili Yao <yaoaili@kingsoft.com>

I found that this PageHuge() check is removed and no long exists
in the latest mmotm, so we don't have worry about it.
Sorry for missing it in my previous review.

Thanks,
Naoya Horiguchi

> ---
>  mm/memory-failure.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index f1aa6433f404..e6995976b11d 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1382,7 +1382,7 @@ int memory_failure(unsigned long pfn, int flags)
>  	 * page_remove_rmap() in try_to_unmap_one(). So to determine page status
>  	 * correctly, we save a copy of the page flags at this time.
>  	 */
> -	if (PageHuge(p))
> +	if (PageCompound(p))
>  		page_flags = hpage->flags;
>  	else
>  		page_flags = p->flags;
> -- 
> 2.18.4
> 
>
yaoaili [么爱利] Sept. 8, 2020, 7:18 a.m. UTC | #2
Thanks all the same!
And where can I get latest mmotm?

-----Original Message-----
From: HORIGUCHI NAOYA(堀口 直也) [mailto:naoya.horiguchi@nec.com] 
Sent: Tuesday, September 8, 2020 3:02 PM
To: yaoaili126@163.com
Cc: linux-mm@kvack.org; YANGFENG1 [杨峰] <YANGFENG1@kingsoft.com>; yaoaili [么爱利] <yaoaili@kingsoft.com>; willy@infradead.org
Subject: Re: [PATCH] Fix incorrect compound page flags store

On Mon, Sep 07, 2020 at 08:44:42PM -0700, yaoaili126@163.com wrote:
> From: Aili Yao <yaoaili@kingsoft.com>
> 
> PageHuge(p) branch will never be true,but for compound page we need to set page_flags to correct value.
> 
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Signed-off-by: Yang Feng < yangfeng1@kingsoft.com>
> Signed-off-by: Aili Yao <yaoaili@kingsoft.com>

I found that this PageHuge() check is removed and no long exists in the latest mmotm, so we don't have worry about it.
Sorry for missing it in my previous review.

Thanks,
Naoya Horiguchi

> ---
>  mm/memory-failure.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 
> f1aa6433f404..e6995976b11d 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1382,7 +1382,7 @@ int memory_failure(unsigned long pfn, int flags)
>  	 * page_remove_rmap() in try_to_unmap_one(). So to determine page status
>  	 * correctly, we save a copy of the page flags at this time.
>  	 */
> -	if (PageHuge(p))
> +	if (PageCompound(p))
>  		page_flags = hpage->flags;
>  	else
>  		page_flags = p->flags;
> --
> 2.18.4
> 
>
Oscar Salvador Sept. 8, 2020, 7:26 a.m. UTC | #3
On Tue, Sep 08, 2020 at 07:02:12AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Mon, Sep 07, 2020 at 08:44:42PM -0700, yaoaili126@163.com wrote:
> > From: Aili Yao <yaoaili@kingsoft.com>
> > 
> > PageHuge(p) branch will never be true,but for compound page we need to set page_flags to correct value.
> > 
> > Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Signed-off-by: Yang Feng < yangfeng1@kingsoft.com>
> > Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
> 
> I found that this PageHuge() check is removed and no long exists
> in the latest mmotm, so we don't have worry about it.

I might be missing something, so bear with me.

It is true that the PageHuge check is gone, but we are storing the page's
flags in page_flags, even if the page is a tail (e.g: part of a compound page
).
Should not we store heads' flags instead?

AFAICS, hpage contains either the head of the compound page,
or the page itself in case it is a normal page.
yaoaili [么爱利] Sept. 8, 2020, 8:02 a.m. UTC | #4
Hi
I think if the poision page is from patrol scrub, the page may be a just compound page in driver(not huge and transhuge), which it's ok for the following hwpoison_user_mappings. For  PageTransHuge, we have split it, the the flags will be good.
so I think the modification in the latest mmotm  is OK
-----Original Message-----
From: Oscar Salvador [mailto:osalvador@suse.de] 
Sent: Tuesday, September 8, 2020 3:26 PM
To: HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com>
Cc: yaoaili126@163.com; linux-mm@kvack.org; YANGFENG1 [杨峰] <YANGFENG1@kingsoft.com>; yaoaili [么爱利] <yaoaili@kingsoft.com>; willy@infradead.org
Subject: Re: [PATCH] Fix incorrect compound page flags store

On Tue, Sep 08, 2020 at 07:02:12AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Mon, Sep 07, 2020 at 08:44:42PM -0700, yaoaili126@163.com wrote:
> > From: Aili Yao <yaoaili@kingsoft.com>
> > 
> > PageHuge(p) branch will never be true,but for compound page we need to set page_flags to correct value.
> > 
> > Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Signed-off-by: Yang Feng < yangfeng1@kingsoft.com>
> > Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
> 
> I found that this PageHuge() check is removed and no long exists in 
> the latest mmotm, so we don't have worry about it.

I might be missing something, so bear with me.

It is true that the PageHuge check is gone, but we are storing the page's flags in page_flags, even if the page is a tail (e.g: part of a compound page ).
Should not we store heads' flags instead?

AFAICS, hpage contains either the head of the compound page, or the page itself in case it is a normal page.

--
Oscar Salvador
SUSE L3
Oscar Salvador Sept. 8, 2020, 8:09 a.m. UTC | #5
On Tue, Sep 08, 2020 at 08:02:12AM +0000, yaoaili [么爱利] wrote:
> Hi
> I think if the poision page is from patrol scrub, the page may be a just compound page in driver(not huge and transhuge), which it's ok for the following hwpoison_user_mappings. For  PageTransHuge, we have split it, the the flags will be good.
> so I think the modification in the latest mmotm  is OK

Yeah, I see, I managed to confused myself.

Thanks
HORIGUCHI NAOYA(堀口 直也) Sept. 8, 2020, 8:36 a.m. UTC | #6
On Tue, Sep 08, 2020 at 09:26:13AM +0200, Oscar Salvador wrote:
> On Tue, Sep 08, 2020 at 07:02:12AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Mon, Sep 07, 2020 at 08:44:42PM -0700, yaoaili126@163.com wrote:
> > > From: Aili Yao <yaoaili@kingsoft.com>
> > >
> > > PageHuge(p) branch will never be true,but for compound page we need to set page_flags to correct value.
> > >
> > > Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > Signed-off-by: Yang Feng < yangfeng1@kingsoft.com>
> > > Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
> >
> > I found that this PageHuge() check is removed and no long exists
> > in the latest mmotm, so we don't have worry about it.
>
> I might be missing something, so bear with me.
>
> It is true that the PageHuge check is gone, but we are storing the page's
> flags in page_flags, even if the page is a tail (e.g: part of a compound page
> ).
> Should not we store heads' flags instead?

I think that saving ->flags of the raw error page is OK, because even in thp
case we focus on the raw page instead of its head page from memory error
perspective.  It's not assumed that we can contain the error thp in the thp
unit and instead we always rely on thp split. (For hugetlb, we now contain
the errored hugetlb in hugetlb unit, so saving its head page might be OK.)

Theoretically, it could happen that a error could be collapsed into a new
thp just after passing over the following block:

  1408          if (PageTransHuge(hpage)) {
  1409                  if (try_to_split_thp_page(p, "Memory Failure") < 0) {
  1410                          action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
  1411                          return -EBUSY;
  1412                  }
  1413                  VM_BUG_ON_PAGE(!page_count(p), p);
  1414          }

So I feel that some check might be added after holding page lock to avoid
that case. Or acutally, it might better that moving the above block into
page lock is more better for simpler code.

Thanks,
Naoya Horiguchi

> AFAICS, hpage contains either the head of the compound page,
> or the page itself in case it is a normal page.
>
> --
> Oscar Salvador
> SUSE L3
>
yaoaili [么爱利] Sept. 8, 2020, 8:48 a.m. UTC | #7
Does the if (PageCompound(p) && compound_head(p) != orig_head) check cover the case you mentioned?
Thanks

-----Original Message-----
From: HORIGUCHI NAOYA(堀口 直也) [mailto:naoya.horiguchi@nec.com] 
Sent: Tuesday, September 8, 2020 4:37 PM
To: Oscar Salvador <osalvador@suse.de>
Cc: yaoaili126@163.com; linux-mm@kvack.org; YANGFENG1 [杨峰] <YANGFENG1@kingsoft.com>; yaoaili [么爱利] <yaoaili@kingsoft.com>; willy@infradead.org
Subject: Re: [PATCH] Fix incorrect compound page flags store

On Tue, Sep 08, 2020 at 09:26:13AM +0200, Oscar Salvador wrote:
> On Tue, Sep 08, 2020 at 07:02:12AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Mon, Sep 07, 2020 at 08:44:42PM -0700, yaoaili126@163.com wrote:
> > > From: Aili Yao <yaoaili@kingsoft.com>
> > >
> > > PageHuge(p) branch will never be true,but for compound page we need to set page_flags to correct value.
> > >
> > > Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > Signed-off-by: Yang Feng < yangfeng1@kingsoft.com>
> > > Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
> >
> > I found that this PageHuge() check is removed and no long exists in 
> > the latest mmotm, so we don't have worry about it.
>
> I might be missing something, so bear with me.
>
> It is true that the PageHuge check is gone, but we are storing the 
> page's flags in page_flags, even if the page is a tail (e.g: part of a 
> compound page ).
> Should not we store heads' flags instead?

I think that saving ->flags of the raw error page is OK, because even in thp case we focus on the raw page instead of its head page from memory error perspective.  It's not assumed that we can contain the error thp in the thp unit and instead we always rely on thp split. (For hugetlb, we now contain the errored hugetlb in hugetlb unit, so saving its head page might be OK.)

Theoretically, it could happen that a error could be collapsed into a new thp just after passing over the following block:

  1408          if (PageTransHuge(hpage)) {
  1409                  if (try_to_split_thp_page(p, "Memory Failure") < 0) {
  1410                          action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
  1411                          return -EBUSY;
  1412                  }
  1413                  VM_BUG_ON_PAGE(!page_count(p), p);
  1414          }

So I feel that some check might be added after holding page lock to avoid that case. Or acutally, it might better that moving the above block into page lock is more better for simpler code.

Thanks,
Naoya Horiguchi

> AFAICS, hpage contains either the head of the compound page, or the 
> page itself in case it is a normal page.
>
> --
> Oscar Salvador
> SUSE L3
>
Oscar Salvador Sept. 8, 2020, 8:51 a.m. UTC | #8
On Tue, Sep 08, 2020 at 08:36:32AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> 
> Theoretically, it could happen that a error could be collapsed into a new
I guess you meant page here?            ^^^^
> thp just after passing over the following block:



> 
>   1408          if (PageTransHuge(hpage)) {
>   1409                  if (try_to_split_thp_page(p, "Memory Failure") < 0) {
>   1410                          action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
>   1411                          return -EBUSY;
>   1412                  }
>   1413                  VM_BUG_ON_PAGE(!page_count(p), p);
>   1414          }
> 
> So I feel that some check might be added after holding page lock to avoid
> that case. Or acutally, it might better that moving the above block into
> page lock is more better for simpler code.

I will have a look at this.

Thanks
HORIGUCHI NAOYA(堀口 直也) Sept. 8, 2020, 9 a.m. UTC | #9
On Tue, Sep 08, 2020 at 08:48:21AM +0000, yaoaili [么爱利] wrote:
> Does the if (PageCompound(p) && compound_head(p) != orig_head) check cover the case you mentioned?

This check covers when the page p is included into the compound
page with *different* size from thp (for example, included into
slab pages or hugepages in other size).

If p was originally in a 2MB thp before split and then is now
included in another 2MB thp after the page lock, compound_head(p)
is equal to orig_head, so it passes over this check.

Thanks,
Naoya Horiguchi
HORIGUCHI NAOYA(堀口 直也) Sept. 8, 2020, 9:05 a.m. UTC | #10
On Tue, Sep 08, 2020 at 10:51:57AM +0200, Oscar Salvador wrote:
> On Tue, Sep 08, 2020 at 08:36:32AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> > 
> > Theoretically, it could happen that a error could be collapsed into a new
> I guess you meant page here?            ^^^^

Right, sorry for my typo ...

> > thp just after passing over the following block:
> 
> 
> 
> > 
> >   1408          if (PageTransHuge(hpage)) {
> >   1409                  if (try_to_split_thp_page(p, "Memory Failure") < 0) {
> >   1410                          action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
> >   1411                          return -EBUSY;
> >   1412                  }
> >   1413                  VM_BUG_ON_PAGE(!page_count(p), p);
> >   1414          }
> > 
> > So I feel that some check might be added after holding page lock to avoid
> > that case. Or acutally, it might better that moving the above block into
> > page lock is more better for simpler code.
> 
> I will have a look at this.

Thank you!
yaoaili [么爱利] Sept. 8, 2020, 9:10 a.m. UTC | #11
Got it, Thank you! I think I need to learn more about thp.

Thanks
Aili Yao

-----Original Message-----
From: HORIGUCHI NAOYA(堀口 直也) [mailto:naoya.horiguchi@nec.com] 
Sent: Tuesday, September 8, 2020 5:01 PM
To: yaoaili [么爱利] <yaoaili@kingsoft.com>
Cc: Oscar Salvador <osalvador@suse.de>; yaoaili126@163.com; linux-mm@kvack.org; YANGFENG1 [杨峰] <YANGFENG1@kingsoft.com>; willy@infradead.org
Subject: Re: [PATCH] Fix incorrect compound page flags store

On Tue, Sep 08, 2020 at 08:48:21AM +0000, yaoaili [么爱利] wrote:
> Does the if (PageCompound(p) && compound_head(p) != orig_head) check cover the case you mentioned?

This check covers when the page p is included into the compound page with *different* size from thp (for example, included into slab pages or hugepages in other size).

If p was originally in a 2MB thp before split and then is now included in another 2MB thp after the page lock, compound_head(p) is equal to orig_head, so it passes over this check.

Thanks,
Naoya Horiguchi
Oscar Salvador Sept. 18, 2020, 9:35 a.m. UTC | #12
On Tue, Sep 08, 2020 at 09:05:56AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> > > thp just after passing over the following block:
> > 
> > 
> > 
> > > 
> > >   1408          if (PageTransHuge(hpage)) {
> > >   1409                  if (try_to_split_thp_page(p, "Memory Failure") < 0) {
> > >   1410                          action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
> > >   1411                          return -EBUSY;
> > >   1412                  }
> > >   1413                  VM_BUG_ON_PAGE(!page_count(p), p);
> > >   1414          }
> > > 
> > > So I feel that some check might be added after holding page lock to avoid
> > > that case. Or acutally, it might better that moving the above block into
> > > page lock is more better for simpler code.
> > 
> > I will have a look at this.
> 
> Thank you!

Hi Naoya,

I have been taking a look at this, and unless I am missing something obvious I
do not think that a new THP (containing the page) can be collapsed under us:

We do take a refcount on the page by means of get_hwpoison_page.
We could only have done that if the page was mapped, so its refcount was already
above 0.

Then we split the THP, and the refcount/mapcount go to the page we are trying to
poison.
At this point the page should add least have refcount > 1 mapcount >= 1.

After that, let us assume that a new THP is trying to be collapsed by means of
khugepaged thread or madvise MADV_HUGEPAGE.

khugepaged_scan_pmd() scans all ptes from [pte#0..pte#511] to see if they can
be collapsed, and one of the things it does is checking the page's refcount/
mappcount by calling is_refcount_suitable().

	expected_refcount = total_mapcount(page);
	return page_count(page) == expected_refcount;

We do have an extra pin from memory_failure, so this is going to fail because

page: refcount = 2 , mapcount = 1

Beware that the page must sitll be mapped somehow, otherwise the PageLRU check
from above should have failed with the same result:

	if (!PageLRU(page)) {
		result = SCAN_PAGE_LRU;
		goto out_unmap;
	}

So, I do not think the page can be collapsed into a new THP after we have split
it here, but as I said, I might be missing something.

Thoughts?
HORIGUCHI NAOYA(堀口 直也) Sept. 25, 2020, 1:18 a.m. UTC | #13
On Fri, Sep 18, 2020 at 11:35:02AM +0200, Oscar Salvador wrote:
> On Tue, Sep 08, 2020 at 09:05:56AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> > > > thp just after passing over the following block:
> > >
> > >
> > >
> > > >
> > > >   1408          if (PageTransHuge(hpage)) {
> > > >   1409                  if (try_to_split_thp_page(p, "Memory Failure") < 0) {
> > > >   1410                          action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
> > > >   1411                          return -EBUSY;
> > > >   1412                  }
> > > >   1413                  VM_BUG_ON_PAGE(!page_count(p), p);
> > > >   1414          }
> > > >
> > > > So I feel that some check might be added after holding page lock to avoid
> > > > that case. Or acutally, it might better that moving the above block into
> > > > page lock is more better for simpler code.
> > >
> > > I will have a look at this.
> >
> > Thank you!
>
> Hi Naoya,
>
> I have been taking a look at this, and unless I am missing something obvious I
> do not think that a new THP (containing the page) can be collapsed under us:
>
> We do take a refcount on the page by means of get_hwpoison_page.
> We could only have done that if the page was mapped, so its refcount was already
> above 0.
>
> Then we split the THP, and the refcount/mapcount go to the page we are trying to
> poison.
> At this point the page should add least have refcount > 1 mapcount >= 1.
>
> After that, let us assume that a new THP is trying to be collapsed by means of
> khugepaged thread or madvise MADV_HUGEPAGE.
>
> khugepaged_scan_pmd() scans all ptes from [pte#0..pte#511] to see if they can
> be collapsed, and one of the things it does is checking the page's refcount/
> mappcount by calling is_refcount_suitable().
>
> 	expected_refcount = total_mapcount(page);
> 	return page_count(page) == expected_refcount;
>
> We do have an extra pin from memory_failure, so this is going to fail because
>
> page: refcount = 2 , mapcount = 1
>
> Beware that the page must sitll be mapped somehow, otherwise the PageLRU check
> from above should have failed with the same result:
>
> 	if (!PageLRU(page)) {
> 		result = SCAN_PAGE_LRU;
> 		goto out_unmap;
> 	}
>
> So, I do not think the page can be collapsed into a new THP after we have split
> it here, but as I said, I might be missing something.

This logic sounds convincing to me, or another possibility like conversion to
other types of compound_page (like slab) is also prevented due to the refcount.

The MF_MSG_DIFFERENT_COMPOUND path was originally introduced heuristically
based on error report in stress testing, and the mechanism of the problem
was unclear.

Thanks,
Naoya Horiguchi
yaoaili [么爱利] Sept. 25, 2020, 1:55 a.m. UTC | #14
>On Fri, Sep 18, 2020 at 11:35:02AM +0200, Oscar Salvador wrote:
> On Tue, Sep 08, 2020 at 09:05:56AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> > > > thp just after passing over the following block:
> > >
> > >
> > >
> > > >
> > > >   1408          if (PageTransHuge(hpage)) {
> > > >   1409                  if (try_to_split_thp_page(p, "Memory Failure") < 0) {
> > > >   1410                          action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
> > > >   1411                          return -EBUSY;
> > > >   1412                  }
> > > >   1413                  VM_BUG_ON_PAGE(!page_count(p), p);
> > > >   1414          }
> > > >
> > > > So I feel that some check might be added after holding page lock 
> > > > to avoid that case. Or acutally, it might better that moving the 
> > > > above block into page lock is more better for simpler code.
> > >
> > > I will have a look at this.
> >
> > Thank you!
>
> Hi Naoya,
>
> I have been taking a look at this, and unless I am missing something 
> obvious I do not think that a new THP (containing the page) can be collapsed under us:
>
> We do take a refcount on the page by means of get_hwpoison_page.
> We could only have done that if the page was mapped, so its refcount 
> was already above 0.
>
> Then we split the THP, and the refcount/mapcount go to the page we are 
> trying to poison.
> At this point the page should add least have refcount > 1 mapcount >= 1.
>
> After that, let us assume that a new THP is trying to be collapsed by 
> means of khugepaged thread or madvise MADV_HUGEPAGE.
>
> khugepaged_scan_pmd() scans all ptes from [pte#0..pte#511] to see if 
> they can be collapsed, and one of the things it does is checking the 
> page's refcount/ mappcount by calling is_refcount_suitable().
>
> 	expected_refcount = total_mapcount(page);
> 	return page_count(page) == expected_refcount;
>
> We do have an extra pin from memory_failure, so this is going to fail 
> because
>
> page: refcount = 2 , mapcount = 1
>
> Beware that the page must sitll be mapped somehow, otherwise the 
> PageLRU check from above should have failed with the same result:
>
> 	if (!PageLRU(page)) {
> 		result = SCAN_PAGE_LRU;
> 		goto out_unmap;
> 	}
>
> So, I do not think the page can be collapsed into a new THP after we 
> have split it here, but as I said, I might be missing something.
>
>This logic sounds convincing to me, or another possibility like conversion
>to other types of compound_page (like   slab) is also prevented due to the refcount.
>
>The MF_MSG_DIFFERENT_COMPOUND path was originally introduced heuristically
>based on error report in stress testing, and the mechanism of the problem was unclear.
>
>Thanks,
>Naoya Horiguchi

There is another check for hwpoision status in this function:
	/*
	 * unpoison always clear PG_hwpoison inside page lock
	 */
	if (!PageHWPoison(p)) {
		pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
		num_poisoned_pages_dec();
		unlock_page(p);
		put_hwpoison_page(p);
		return 0;
	}
I think the check here and the check for MF_MSG_DIFFERENT_COMPOUND are both for stress test purpose, In stress test scenario, the page may be unpoisioned and be reallocted meanwhile, so the code here and previously talked check really does some meaningful ckecking for test,  which will not happen in real cases,As the poision page will not be unposioned for normal memory.

Thanks 
Aili Yao
HORIGUCHI NAOYA(堀口 直也) Sept. 25, 2020, 4:35 a.m. UTC | #15
On Fri, Sep 25, 2020 at 01:55:31AM +0000, yaoaili [么爱利] wrote:
> >On Fri, Sep 18, 2020 at 11:35:02AM +0200, Oscar Salvador wrote:
> > On Tue, Sep 08, 2020 at 09:05:56AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> > > > > thp just after passing over the following block:
> > > >
> > > >
> > > >
> > > > >
> > > > >   1408          if (PageTransHuge(hpage)) {
> > > > >   1409                  if (try_to_split_thp_page(p, "Memory Failure") < 0) {
> > > > >   1410                          action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
> > > > >   1411                          return -EBUSY;
> > > > >   1412                  }
> > > > >   1413                  VM_BUG_ON_PAGE(!page_count(p), p);
> > > > >   1414          }
> > > > >
> > > > > So I feel that some check might be added after holding page lock 
> > > > > to avoid that case. Or acutally, it might better that moving the 
> > > > > above block into page lock is more better for simpler code.
> > > >
> > > > I will have a look at this.
> > >
> > > Thank you!
> >
> > Hi Naoya,
> >
> > I have been taking a look at this, and unless I am missing something 
> > obvious I do not think that a new THP (containing the page) can be collapsed under us:
> >
> > We do take a refcount on the page by means of get_hwpoison_page.
> > We could only have done that if the page was mapped, so its refcount 
> > was already above 0.
> >
> > Then we split the THP, and the refcount/mapcount go to the page we are 
> > trying to poison.
> > At this point the page should add least have refcount > 1 mapcount >= 1.
> >
> > After that, let us assume that a new THP is trying to be collapsed by 
> > means of khugepaged thread or madvise MADV_HUGEPAGE.
> >
> > khugepaged_scan_pmd() scans all ptes from [pte#0..pte#511] to see if 
> > they can be collapsed, and one of the things it does is checking the 
> > page's refcount/ mappcount by calling is_refcount_suitable().
> >
> > 	expected_refcount = total_mapcount(page);
> > 	return page_count(page) == expected_refcount;
> >
> > We do have an extra pin from memory_failure, so this is going to fail 
> > because
> >
> > page: refcount = 2 , mapcount = 1
> >
> > Beware that the page must sitll be mapped somehow, otherwise the 
> > PageLRU check from above should have failed with the same result:
> >
> > 	if (!PageLRU(page)) {
> > 		result = SCAN_PAGE_LRU;
> > 		goto out_unmap;
> > 	}
> >
> > So, I do not think the page can be collapsed into a new THP after we 
> > have split it here, but as I said, I might be missing something.
> >
> >This logic sounds convincing to me, or another possibility like conversion
> >to other types of compound_page (like   slab) is also prevented due to the refcount.
> >
> >The MF_MSG_DIFFERENT_COMPOUND path was originally introduced heuristically
> >based on error report in stress testing, and the mechanism of the problem was unclear.
> >
> >Thanks,
> >Naoya Horiguchi
> 
> There is another check for hwpoision status in this function:
> 	/*
> 	 * unpoison always clear PG_hwpoison inside page lock
> 	 */
> 	if (!PageHWPoison(p)) {
> 		pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
> 		num_poisoned_pages_dec();
> 		unlock_page(p);
> 		put_hwpoison_page(p);
> 		return 0;
> 	}
> I think the check here and the check for MF_MSG_DIFFERENT_COMPOUND are both for stress test purpose, In stress test scenario, the page may be unpoisioned and be reallocted meanwhile, so the code here and previously talked check really does some meaningful ckecking for test,  which will not happen in real cases,As the poision page will not be unposioned for normal memory.

Thanks for commenting, I overlooked unpoison. Maybe two threads could run as follows:

  CPU 0                                    CPU 1

  memory_failure
    TestSetPageHWPoison
    get_hwpoison_page
    try_to_split_thp_page
                                           unpoison
                                           (reallocate to construct new compound_page)
    lock_page
    if (...) // checking MF_MSG_DIFFERENT_COMPOUND case
      ...

But in this case unpoison fails because the target page is refcount > 1 and/or mapcount > 0
(which is true between try_to_split_thp_page and lock_page), so unpoison seems not contribute
to the MF_MSG_DIFFERENT_COMPOUND case.

But I'm still not sure of any other possibility and it's ok to keep this check
for potential problem.

Thanks,
Naoya Horiguchi
yaoaili [么爱利] Sept. 25, 2020, 5:52 a.m. UTC | #16
On Fri, Sep 25, 2020 at 01:55:31AM +0000, yaoaili [么爱利] wrote:
> >On Fri, Sep 18, 2020 at 11:35:02AM +0200, Oscar Salvador wrote:
> > On Tue, Sep 08, 2020 at 09:05:56AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> > > > > thp just after passing over the following block:
> > > >
> > > >
> > > >
> > > > >
> > > > >   1408          if (PageTransHuge(hpage)) {
> > > > >   1409                  if (try_to_split_thp_page(p, "Memory Failure") < 0) {
> > > > >   1410                          action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
> > > > >   1411                          return -EBUSY;
> > > > >   1412                  }
> > > > >   1413                  VM_BUG_ON_PAGE(!page_count(p), p);
> > > > >   1414          }
> > > > >
> > > > > So I feel that some check might be added after holding page 
> > > > > lock to avoid that case. Or acutally, it might better that 
> > > > > moving the above block into page lock is more better for simpler code.
> > > >
> > > > I will have a look at this.
> > >
> > > Thank you!
> >
> > Hi Naoya,
> >
> > I have been taking a look at this, and unless I am missing something 
> > obvious I do not think that a new THP (containing the page) can be collapsed under us:
> >
> > We do take a refcount on the page by means of get_hwpoison_page.
> > We could only have done that if the page was mapped, so its refcount 
> > was already above 0.
> >
> > Then we split the THP, and the refcount/mapcount go to the page we 
> > are trying to poison.
> > At this point the page should add least have refcount > 1 mapcount >= 1.
> >
> > After that, let us assume that a new THP is trying to be collapsed 
> > by means of khugepaged thread or madvise MADV_HUGEPAGE.
> >
> > khugepaged_scan_pmd() scans all ptes from [pte#0..pte#511] to see if 
> > they can be collapsed, and one of the things it does is checking the 
> > page's refcount/ mappcount by calling is_refcount_suitable().
> >
> > 	expected_refcount = total_mapcount(page);
> > 	return page_count(page) == expected_refcount;
> >
> > We do have an extra pin from memory_failure, so this is going to 
> > fail because
> >
> > page: refcount = 2 , mapcount = 1
> >
> > Beware that the page must sitll be mapped somehow, otherwise the 
> > PageLRU check from above should have failed with the same result:
> >
> > 	if (!PageLRU(page)) {
> > 		result = SCAN_PAGE_LRU;
> > 		goto out_unmap;
> > 	}
> >
> > So, I do not think the page can be collapsed into a new THP after we 
> > have split it here, but as I said, I might be missing something.
> >
> >This logic sounds convincing to me, or another possibility like conversion
> >to other types of compound_page (like   slab) is also prevented due to the refcount.
> >
> >The MF_MSG_DIFFERENT_COMPOUND path was originally introduced 
> >heuristically based on error report in stress testing, and the mechanism of the problem was unclear.
> >
> >Thanks,
> >Naoya Horiguchi
> 
> There is another check for hwpoision status in this function:
> 	/*
> 	 * unpoison always clear PG_hwpoison inside page lock
> 	 */
> 	if (!PageHWPoison(p)) {
> 		pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
> 		num_poisoned_pages_dec();
> 		unlock_page(p);
> 		put_hwpoison_page(p);
> 		return 0;
> 	}
> I think the check here and the check for MF_MSG_DIFFERENT_COMPOUND are both for stress test purpose, In stress test scenario, the page may be unpoisioned and be reallocted meanwhile, so the code here and previously talked check really does some meaningful ckecking for test,  which will not happen in real cases,As the poision page will not be unposioned for normal memory.

>Thanks for commenting, I overlooked unpoison. Maybe two threads could run as follows:
>
>  CPU 0                                    CPU 1
>
>  memory_failure
>   TestSetPageHWPoison
>    get_hwpoison_page
>    try_to_split_thp_page
>                                           unpoison
>                                           (reallocate to construct new compound_page)
>    lock_page
>    if (...) // checking MF_MSG_DIFFERENT_COMPOUND case
>      ...
>
>But in this case unpoison fails because the target page is refcount > 1 and/or mapcount > 0 
> (which is true between try_to_split_thp_page and lock_page), so unpoison seems not contribute 
>to the MF_MSG_DIFFERENT_COMPOUND case.
>
>But I'm still not sure of any other possibility and it's ok to keep this check for potential problem.
>
>Thanks,
>Naoya Horiguchi

Yes,I think you are right. this abnormal case may rarely happen.
but I do think of one possibility which may happen, if the unpoision is called just after TestSetPageHWPoison ,so the two thread may work simultaneously and may hit the check case mentioned before.
there may be other cases which we don't know, but I think In production enviroment we may remove it.

Thanks
Aili Yao
diff mbox series

Patch

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index f1aa6433f404..e6995976b11d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1382,7 +1382,7 @@  int memory_failure(unsigned long pfn, int flags)
 	 * page_remove_rmap() in try_to_unmap_one(). So to determine page status
 	 * correctly, we save a copy of the page flags at this time.
 	 */
-	if (PageHuge(p))
+	if (PageCompound(p))
 		page_flags = hpage->flags;
 	else
 		page_flags = p->flags;