diff mbox series

[v4,1/2] mm,hwpoison: fix race with compound page allocation

Message ID 20210517045401.2506032-2-nao.horiguchi@gmail.com (mailing list archive)
State New, archived
Series hwpoison: fix race with compound page allocation

Commit Message

Naoya Horiguchi May 17, 2021, 4:54 a.m. UTC
From: Naoya Horiguchi <naoya.horiguchi@nec.com>

When a hugetlb page fault (under an overcommitting situation) races with
memory_failure(), VM_BUG_ON_PAGE() is triggered as follows:

    CPU0:                           CPU1:

                                    gather_surplus_pages()
                                      page = alloc_surplus_huge_page()
    memory_failure_hugetlb()
      get_hwpoison_page(page)
        __get_hwpoison_page(page)
          get_page_unless_zero(page)
                                      zero = put_page_testzero(page)
                                      VM_BUG_ON_PAGE(!zero, page)
                                      enqueue_huge_page(h, page)
      put_page(page)

__get_hwpoison_page() only checks the page refcount before taking an
additional one for memory error handling, which is wrong because there's a
time window where compound pages have a non-zero refcount during
initialization.

So make __get_hwpoison_page() check the page status a bit more for a few
types of compound pages. A PageSlab() check is added because otherwise the
"non anonymous thp" path is wrongly chosen.

Fixes: ead07f6a867b ("mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: Muchun Song <songmuchun@bytedance.com>
Cc: stable@vger.kernel.org # 5.12+
---
ChangeLog v4:
- do all hugetlb related checks under hugetlb_lock,
- fix build error with #ifdef CONFIG_HUGETLB_PAGE
---
 mm/memory-failure.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Oscar Salvador May 17, 2021, 10:12 a.m. UTC | #1
On Mon, May 17, 2021 at 01:54:00PM +0900, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> When hugetlb page fault (under overcommitting situation) and
> memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race:
> 
>     CPU0:                           CPU1:
> 
>                                     gather_surplus_pages()
>                                       page = alloc_surplus_huge_page()
>     memory_failure_hugetlb()
>       get_hwpoison_page(page)
>         __get_hwpoison_page(page)
>           get_page_unless_zero(page)
>                                       zero = put_page_testzero(page)
>                                       VM_BUG_ON_PAGE(!zero, page)
>                                       enqueue_huge_page(h, page)
>       put_page(page)
> 
> __get_hwpoison_page() only checks page refcount before taking additional
                                  ^^ the?                      ^^ an
> one for memory error handling, which is wrong because there's a time
> window where compound pages have non-zero refcount during initialization.
> 
> So makes __get_hwpoison_page() check page status a bit more for a few
     ^^ make
> types of compound pages. PageSlab() check is added because otherwise
> "non anonymous thp" path is wrongly chosen.

This is no longer true with this patch, is it? What happened here?

>  static int __get_hwpoison_page(struct page *page)
>  {
>  	struct page *head = compound_head(page);
> +	int ret = 0;
> +
> +#ifdef CONFIG_HUGETLB_PAGE
> +	spin_lock(&hugetlb_lock);
> +	if (PageHuge(head) && (HPageFreed(head) || HPageMigratable(head)))
> +		ret = get_page_unless_zero(head);
> +	spin_unlock(&hugetlb_lock);
> +	if (ret > 0)
> +		return ret;
> +#endif

I am kind of fine with this, but I wonder whether it makes sense to hide
these details in a helper (with an empty stub for the non-hugetlb case)?

>  	if (!PageHuge(head) && PageTransHuge(head)) {
This !PageHuge could go?
Naoya Horiguchi May 17, 2021, 11:34 a.m. UTC | #2
On Mon, May 17, 2021 at 12:12:50PM +0200, Oscar Salvador wrote:
> On Mon, May 17, 2021 at 01:54:00PM +0900, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > When hugetlb page fault (under overcommitting situation) and
> > memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race:
> > 
> >     CPU0:                           CPU1:
> > 
> >                                     gather_surplus_pages()
> >                                       page = alloc_surplus_huge_page()
> >     memory_failure_hugetlb()
> >       get_hwpoison_page(page)
> >         __get_hwpoison_page(page)
> >           get_page_unless_zero(page)
> >                                       zero = put_page_testzero(page)
> >                                       VM_BUG_ON_PAGE(!zero, page)
> >                                       enqueue_huge_page(h, page)
> >       put_page(page)
> > 
> > __get_hwpoison_page() only checks page refcount before taking additional
>                                   ^^ the?                      ^^ an
> > one for memory error handling, which is wrong because there's a time
> > window where compound pages have non-zero refcount during initialization.
> > 
> > So makes __get_hwpoison_page() check page status a bit more for a few
>      ^^ make
> > types of compound pages. PageSlab() check is added because otherwise
> > "non anonymous thp" path is wrongly chosen.
> 
> This is no longer true with this patch, is it? What happened here?

Sorry, we need to remove this. I dropped the changes for PageSlab
because I'm still not sure about what to do (I'd like to address it
in separate work).

> 
> >  static int __get_hwpoison_page(struct page *page)
> >  {
> >  	struct page *head = compound_head(page);
> > +	int ret = 0;
> > +
> > +#ifdef CONFIG_HUGETLB_PAGE
> > +	spin_lock(&hugetlb_lock);
> > +	if (PageHuge(head) && (HPageFreed(head) || HPageMigratable(head)))
> > +		ret = get_page_unless_zero(head);
> > +	spin_unlock(&hugetlb_lock);
> > +	if (ret > 0)
> > +		return ret;
> > +#endif
> 
> I am kind of fine with this, but I wonder whether it makes sense to hide
> these details in a helper (with an empty stub for the non-hugetlb case)?

OK, I will do this.

> 
> >  	if (!PageHuge(head) && PageTransHuge(head)) {
> This !PageHuge could go?

I'll update this, too.

Thanks,
Naoya Horiguchi
Mike Kravetz May 17, 2021, 8:11 p.m. UTC | #3
On 5/16/21 9:54 PM, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> When hugetlb page fault (under overcommitting situation) and
> memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race:
> 
>     CPU0:                           CPU1:
> 
>                                     gather_surplus_pages()
>                                       page = alloc_surplus_huge_page()
>     memory_failure_hugetlb()
>       get_hwpoison_page(page)
>         __get_hwpoison_page(page)
>           get_page_unless_zero(page)
>                                       zero = put_page_testzero(page)
>                                       VM_BUG_ON_PAGE(!zero, page)
>                                       enqueue_huge_page(h, page)
>       put_page(page)
> 
> __get_hwpoison_page() only checks page refcount before taking additional
> one for memory error handling, which is wrong because there's a time
> window where compound pages have non-zero refcount during initialization.
> 
> So makes __get_hwpoison_page() check page status a bit more for a few
> types of compound pages. PageSlab() check is added because otherwise
> "non anonymous thp" path is wrongly chosen.
> 
> Fixes: ead07f6a867b ("mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling")
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Reported-by: Muchun Song <songmuchun@bytedance.com>
> Cc: stable@vger.kernel.org # 5.12+
> ---
> ChangeLog v4:
> - all hugetlb related check in hugetlb_lock,
> - fix build error with #ifdef CONFIG_HUGETLB_PAGE
> ---
>  mm/memory-failure.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git v5.12/mm/memory-failure.c v5.12_patched/mm/memory-failure.c
> index a3659619d293..761f982b6d7b 100644
> --- v5.12/mm/memory-failure.c
> +++ v5.12_patched/mm/memory-failure.c
> @@ -1094,6 +1094,16 @@ static int page_action(struct page_state *ps, struct page *p,
>  static int __get_hwpoison_page(struct page *page)
>  {
>  	struct page *head = compound_head(page);
> +	int ret = 0;
> +
> +#ifdef CONFIG_HUGETLB_PAGE
> +	spin_lock(&hugetlb_lock);
> +	if (PageHuge(head) && (HPageFreed(head) || HPageMigratable(head)))
> +		ret = get_page_unless_zero(head);
> +	spin_unlock(&hugetlb_lock);
> +	if (ret > 0)
> +		return ret;
> +#endif

I must be missing something.

The above code makes sure the page is not in one of these transient
hugetlb states as mentioned in the commit message.  It only attempts
to take a reference on the page if it is not in one of these states.

However, if it is in such a transient state (!HPageFreed(head) &&
!HPageMigratable(head)) we will fall through and execute the code:

	if (get_page_unless_zero(head)) {
		if (head == compound_head(page))
			return 1;

So, it seems like we will always do a get_page_unless_zero() for
PageHuge() pages?

Also, due to commit db71ef79b59b ("hugetlb: make free_huge_page irq
safe") you need to make sure interrupts are disabled when taking
hugetlb_lock.
HORIGUCHI NAOYA(堀口 直也) May 18, 2021, 5:24 a.m. UTC | #4
On Mon, May 17, 2021 at 01:11:25PM -0700, Mike Kravetz wrote:
> On 5/16/21 9:54 PM, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > When hugetlb page fault (under overcommitting situation) and
> > memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race:
> > 
> >     CPU0:                           CPU1:
> > 
> >                                     gather_surplus_pages()
> >                                       page = alloc_surplus_huge_page()
> >     memory_failure_hugetlb()
> >       get_hwpoison_page(page)
> >         __get_hwpoison_page(page)
> >           get_page_unless_zero(page)
> >                                       zero = put_page_testzero(page)
> >                                       VM_BUG_ON_PAGE(!zero, page)
> >                                       enqueue_huge_page(h, page)
> >       put_page(page)
> > 
> > __get_hwpoison_page() only checks page refcount before taking additional
> > one for memory error handling, which is wrong because there's a time
> > window where compound pages have non-zero refcount during initialization.
> > 
> > So makes __get_hwpoison_page() check page status a bit more for a few
> > types of compound pages. PageSlab() check is added because otherwise
> > "non anonymous thp" path is wrongly chosen.
> > 
> > Fixes: ead07f6a867b ("mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling")
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > Reported-by: Muchun Song <songmuchun@bytedance.com>
> > Cc: stable@vger.kernel.org # 5.12+
> > ---
> > ChangeLog v4:
> > - all hugetlb related check in hugetlb_lock,
> > - fix build error with #ifdef CONFIG_HUGETLB_PAGE
> > ---
> >  mm/memory-failure.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git v5.12/mm/memory-failure.c v5.12_patched/mm/memory-failure.c
> > index a3659619d293..761f982b6d7b 100644
> > --- v5.12/mm/memory-failure.c
> > +++ v5.12_patched/mm/memory-failure.c
> > @@ -1094,6 +1094,16 @@ static int page_action(struct page_state *ps, struct page *p,
> >  static int __get_hwpoison_page(struct page *page)
> >  {
> >  	struct page *head = compound_head(page);
> > +	int ret = 0;
> > +
> > +#ifdef CONFIG_HUGETLB_PAGE
> > +	spin_lock(&hugetlb_lock);
> > +	if (PageHuge(head) && (HPageFreed(head) || HPageMigratable(head)))
> > +		ret = get_page_unless_zero(head);
> > +	spin_unlock(&hugetlb_lock);
> > +	if (ret > 0)
> > +		return ret;
> > +#endif
> 
> I must be missing something.
> 
> The above code makes sure the page is not in one of these transitive
> hugetlb states as mentioned in the commit message.  It only attempts
> to take a reference on the page if it is not in one of these states.
> 
> However, if it is in such a transitive state (!HPageFreed(head) &&
> !HPageMigratable(head)) we will fall through and execute the code:
> 
> 	if (get_page_unless_zero(head)) {
> 		if (head == compound_head(page))
> 			return 1;
> 
> So, it seems like we will always do a get_page_unless_zero() for
> PageHuge() pages?

Right, no need to fall through in such a case.

> 
> Also, due to commit db71ef79b59b ("hugetlb: make free_huge_page irq
> safe") you need to make sure interrupts are disabled when taking
> hugetlb_lock.

Thanks, I'll rebase to latest mainline and comply with new semantics.

- Naoya

Patch

diff --git v5.12/mm/memory-failure.c v5.12_patched/mm/memory-failure.c
index a3659619d293..761f982b6d7b 100644
--- v5.12/mm/memory-failure.c
+++ v5.12_patched/mm/memory-failure.c
@@ -1094,6 +1094,16 @@  static int page_action(struct page_state *ps, struct page *p,
 static int __get_hwpoison_page(struct page *page)
 {
 	struct page *head = compound_head(page);
+	int ret = 0;
+
+#ifdef CONFIG_HUGETLB_PAGE
+	spin_lock(&hugetlb_lock);
+	if (PageHuge(head) && (HPageFreed(head) || HPageMigratable(head)))
+		ret = get_page_unless_zero(head);
+	spin_unlock(&hugetlb_lock);
+	if (ret > 0)
+		return ret;
+#endif
 
 	if (!PageHuge(head) && PageTransHuge(head)) {
 		/*