diff mbox series

[v1] hugetlb: pass head page to remove_hugetlb_page()

Message ID 20210526235257.2769473-1-nao.horiguchi@gmail.com (mailing list archive)
State New, archived
Headers show
Series [v1] hugetlb: pass head page to remove_hugetlb_page() | expand

Commit Message

Naoya Horiguchi May 26, 2021, 11:52 p.m. UTC
From: Naoya Horiguchi <naoya.horiguchi@nec.com>

When memory_failure() or soft_offline_page() is called on a tail page of
some hugetlb page, "BUG: unable to handle page fault" error can be
triggered.

remove_hugetlb_page() dereferences page->lru, so it's assumed that the
page points to a head page, but one of the caller,
dissolve_free_huge_page(), provides remove_hugetlb_page() with 'page'
which could be a tail page.  So pass 'head' to it, instead.

Fixes: 6eb4e88a6d27 ("hugetlb: create remove_hugetlb_page() to separate functionality")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 mm/hugetlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Matthew Wilcox May 27, 2021, 12:31 a.m. UTC | #1
On Thu, May 27, 2021 at 08:52:57AM +0900, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> When memory_failure() or soft_offline_page() is called on a tail page of
> some hugetlb page, "BUG: unable to handle page fault" error can be
> triggered.
> 
> remove_hugetlb_page() dereferences page->lru, so it's assumed that the
> page points to a head page, but one of the caller,
> dissolve_free_huge_page(), provides remove_hugetlb_page() with 'page'
> which could be a tail page.  So pass 'head' to it, instead.

I'd like to point out that with folios, this is a compile-time error,
not a run-time error.
Mike Kravetz May 27, 2021, 12:55 a.m. UTC | #2
On 5/26/21 4:52 PM, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> When memory_failure() or soft_offline_page() is called on a tail page of
> some hugetlb page, "BUG: unable to handle page fault" error can be
> triggered.
> 
> remove_hugetlb_page() dereferences page->lru, so it's assumed that the
> page points to a head page, but one of the caller,
> dissolve_free_huge_page(), provides remove_hugetlb_page() with 'page'
> which could be a tail page.  So pass 'head' to it, instead.
> 
> Fixes: 6eb4e88a6d27 ("hugetlb: create remove_hugetlb_page() to separate functionality")
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
>  mm/hugetlb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks Naoya!

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Muchun Song May 27, 2021, 2:43 a.m. UTC | #3
On Thu, May 27, 2021 at 7:53 AM Naoya Horiguchi <nao.horiguchi@gmail.com> wrote:
>
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>
> When memory_failure() or soft_offline_page() is called on a tail page of
> some hugetlb page, "BUG: unable to handle page fault" error can be
> triggered.
>
> remove_hugetlb_page() dereferences page->lru, so it's assumed that the
> page points to a head page, but one of the caller,
> dissolve_free_huge_page(), provides remove_hugetlb_page() with 'page'
> which could be a tail page.  So pass 'head' to it, instead.
>
> Fixes: 6eb4e88a6d27 ("hugetlb: create remove_hugetlb_page() to separate functionality")
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Thanks Naoya!

Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Michal Hocko May 27, 2021, 6:58 a.m. UTC | #4
On Thu 27-05-21 08:52:57, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> When memory_failure() or soft_offline_page() is called on a tail page of
> some hugetlb page, "BUG: unable to handle page fault" error can be
> triggered.
> 
> remove_hugetlb_page() dereferences page->lru, so it's assumed that the
> page points to a head page, but one of the caller,
> dissolve_free_huge_page(), provides remove_hugetlb_page() with 'page'
> which could be a tail page.  So pass 'head' to it, instead.
> 
> Fixes: 6eb4e88a6d27 ("hugetlb: create remove_hugetlb_page() to separate functionality")
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

This is really nasty and easy to overlook. I have completely missed that
when reviewing and I do remember checking for head vs page as there is
quite some non trivial handling of both here.

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  mm/hugetlb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git v5.13-rc3/mm/hugetlb.c v5.13-rc3_patched/mm/hugetlb.c
> index 95918f410c0f..470f7b5b437e 100644
> --- v5.13-rc3/mm/hugetlb.c
> +++ v5.13-rc3_patched/mm/hugetlb.c
> @@ -1793,7 +1793,7 @@ int dissolve_free_huge_page(struct page *page)
>  			SetPageHWPoison(page);
>  			ClearPageHWPoison(head);
>  		}
> -		remove_hugetlb_page(h, page, false);
> +		remove_hugetlb_page(h, head, false);
>  		h->max_huge_pages--;
>  		spin_unlock_irq(&hugetlb_lock);
>  		update_and_free_page(h, head);
> -- 
> 2.25.1
Oscar Salvador May 27, 2021, 7:47 a.m. UTC | #5
On Thu, May 27, 2021 at 08:52:57AM +0900, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> When memory_failure() or soft_offline_page() is called on a tail page of
> some hugetlb page, "BUG: unable to handle page fault" error can be
> triggered.
> 
> remove_hugetlb_page() dereferences page->lru, so it's assumed that the
> page points to a head page, but one of the caller,
> dissolve_free_huge_page(), provides remove_hugetlb_page() with 'page'
> which could be a tail page.  So pass 'head' to it, instead.
> 
> Fixes: 6eb4e88a6d27 ("hugetlb: create remove_hugetlb_page() to separate functionality")
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

It is probably worth adding a comment in remove_hugetlb_page() noting
that we need a head page, so future users do not repeat the same
mistake.

Thanks
Michal Hocko May 27, 2021, 8:22 a.m. UTC | #6
On Thu 27-05-21 09:47:44, Oscar Salvador wrote:
> On Thu, May 27, 2021 at 08:52:57AM +0900, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > When memory_failure() or soft_offline_page() is called on a tail page of
> > some hugetlb page, "BUG: unable to handle page fault" error can be
> > triggered.
> > 
> > remove_hugetlb_page() dereferences page->lru, so it's assumed that the
> > page points to a head page, but one of the caller,
> > dissolve_free_huge_page(), provides remove_hugetlb_page() with 'page'
> > which could be a tail page.  So pass 'head' to it, instead.
> > 
> > Fixes: 6eb4e88a6d27 ("hugetlb: create remove_hugetlb_page() to separate functionality")
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
> 
> It is probably worth adding a comment in remove_hugetlb_page() noting
> that we need a head page, so future users do not repeat the same
> mistake.

Ideally this will turn into page folio concept and no comments are
really needed.
Mike Kravetz May 27, 2021, 4:28 p.m. UTC | #7
On 5/26/21 4:52 PM, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> When memory_failure() or soft_offline_page() is called on a tail page of
> some hugetlb page, "BUG: unable to handle page fault" error can be
> triggered.
> 
> remove_hugetlb_page() dereferences page->lru, so it's assumed that the
> page points to a head page, but one of the caller,
> dissolve_free_huge_page(), provides remove_hugetlb_page() with 'page'
> which could be a tail page.  So pass 'head' to it, instead.
> 
> Fixes: 6eb4e88a6d27 ("hugetlb: create remove_hugetlb_page() to separate functionality")
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
>  mm/hugetlb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git v5.13-rc3/mm/hugetlb.c v5.13-rc3_patched/mm/hugetlb.c
> index 95918f410c0f..470f7b5b437e 100644
> --- v5.13-rc3/mm/hugetlb.c
> +++ v5.13-rc3_patched/mm/hugetlb.c
> @@ -1793,7 +1793,7 @@ int dissolve_free_huge_page(struct page *page)
>  			SetPageHWPoison(page);
>  			ClearPageHWPoison(head);
>  		}
> -		remove_hugetlb_page(h, page, false);
> +		remove_hugetlb_page(h, head, false);
>  		h->max_huge_pages--;
>  		spin_unlock_irq(&hugetlb_lock);
>  		update_and_free_page(h, head);
> 

I believe we have the same problem later in the routine when calling
add_hugetlb_page()?

If so, should we combine the changes?  Or, do we need two patches as
the bugs were introduced with different commits?
Michal Hocko May 27, 2021, 7:54 p.m. UTC | #8
On Thu 27-05-21 09:28:51, Mike Kravetz wrote:
> On 5/26/21 4:52 PM, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > When memory_failure() or soft_offline_page() is called on a tail page of
> > some hugetlb page, "BUG: unable to handle page fault" error can be
> > triggered.
> > 
> > remove_hugetlb_page() dereferences page->lru, so it's assumed that the
> > page points to a head page, but one of the caller,
> > dissolve_free_huge_page(), provides remove_hugetlb_page() with 'page'
> > which could be a tail page.  So pass 'head' to it, instead.
> > 
> > Fixes: 6eb4e88a6d27 ("hugetlb: create remove_hugetlb_page() to separate functionality")
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > ---
> >  mm/hugetlb.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git v5.13-rc3/mm/hugetlb.c v5.13-rc3_patched/mm/hugetlb.c
> > index 95918f410c0f..470f7b5b437e 100644
> > --- v5.13-rc3/mm/hugetlb.c
> > +++ v5.13-rc3_patched/mm/hugetlb.c
> > @@ -1793,7 +1793,7 @@ int dissolve_free_huge_page(struct page *page)
> >  			SetPageHWPoison(page);
> >  			ClearPageHWPoison(head);
> >  		}
> > -		remove_hugetlb_page(h, page, false);
> > +		remove_hugetlb_page(h, head, false);
> >  		h->max_huge_pages--;
> >  		spin_unlock_irq(&hugetlb_lock);
> >  		update_and_free_page(h, head);
> > 
> 
> I believe we have the same problem later in the routine when calling
> add_hugetlb_page()?

Can we ever get a tail page there?

> If so, should we combine the changes?  Or, do we need two patches as
> the bugs were introduced with different commits?

If there is an issue then I would go with a separate patch.

Thanks!
Mike Kravetz May 27, 2021, 9:12 p.m. UTC | #9
On 5/27/21 12:54 PM, Michal Hocko wrote:
> On Thu 27-05-21 09:28:51, Mike Kravetz wrote:
>> On 5/26/21 4:52 PM, Naoya Horiguchi wrote:
>>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>
>>> remove_hugetlb_page() dereferences page->lru, so it's assumed that the
>>> page points to a head page, but one of the caller,
>>> dissolve_free_huge_page(), provides remove_hugetlb_page() with 'page'
>>> which could be a tail page.  So pass 'head' to it, instead.
>>>
>>
>> I believe we have the same problem later in the routine when calling
>> add_hugetlb_page()?
> 
> Can we ever get a tail page there?
> 

Yes.  Actually alloc_huge_page_vmemmap() and add_hugetlb_page() calls
later in the same block of code expect head page but we pass 'page' which
could be a tail page.

>> If so, should we combine the changes?  Or, do we need two patches as
>> the bugs were introduced with different commits?
> 
> If there is an issue then I would go with a separate patch.
> 

The calls to alloc_huge_page_vmemmap() and add_hugetlb_page() are only
in Andrew's tree (and next).  So, they can be fixed there.

I'll send a patch shortly.
diff mbox series

Patch

diff --git v5.13-rc3/mm/hugetlb.c v5.13-rc3_patched/mm/hugetlb.c
index 95918f410c0f..470f7b5b437e 100644
--- v5.13-rc3/mm/hugetlb.c
+++ v5.13-rc3_patched/mm/hugetlb.c
@@ -1793,7 +1793,7 @@  int dissolve_free_huge_page(struct page *page)
 			SetPageHWPoison(page);
 			ClearPageHWPoison(head);
 		}
-		remove_hugetlb_page(h, page, false);
+		remove_hugetlb_page(h, head, false);
 		h->max_huge_pages--;
 		spin_unlock_irq(&hugetlb_lock);
 		update_and_free_page(h, head);