Message ID | 20210428074654.GA2093897@u2004 (mailing list archive) |
---|---|
State | New, archived |
Series | mm,hwpoison: fix race with compound page allocation |
On Wed, Apr 28, 2021 at 04:46:54PM +0900, Naoya Horiguchi wrote:
> ---
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Date: Wed, 28 Apr 2021 15:55:47 +0900
> Subject: [PATCH] mm,hwpoison: fix race with compound page allocation
>
> When hugetlb page fault (under overcommiting situation) and memory_failure()
> race, VM_BUG_ON_PAGE() is triggered by the following race:
>
>     CPU0:                           CPU1:
>
>                                     gather_surplus_pages()
>                                       page = alloc_surplus_huge_page()
>     memory_failure_hugetlb()
>       get_hwpoison_page(page)
>         __get_hwpoison_page(page)
>           get_page_unless_zero(page)
>                                       zero = put_page_testzero(page)
>                                       VM_BUG_ON_PAGE(!zero, page)
>                                       enqueue_huge_page(h, page)
>       put_page(page)
>
> __get_hwpoison_page() only checks page refcount before taking additional
> one for memory error handling, which is wrong because there's time
> windows where compound pages have non-zero refcount during initialization.
>
> So makes __get_hwpoison_page() check more page status for a few types
> of compound pages. PageSlab() check is added because otherwise
> "non anonymous thp" path is wrongly chosen for slab pages.

Was it wrongly chosen even before? If so, maybe a Fix tag is warranted.

>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Reported-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/memory-failure.c | 48 +++++++++++++++++++++++++--------------------
>  1 file changed, 27 insertions(+), 21 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index a3659619d293..61988e332712 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1095,30 +1095,36 @@ static int __get_hwpoison_page(struct page *page)
> +	if (PageCompound(page)) {
> +		if (PageSlab(page)) {
> +			return get_page_unless_zero(page);
> +		} else if (PageHuge(head)) {
> +			if (HPageFreed(head) || HPageMigratable(head))
> +				return get_page_unless_zero(head);

There were concerns raised wrt. memory-failure should not be fiddling with page's
refcount without holding a hugetlb lock.
So, if we really want to make this more stable, we might want to hold the lock
here.

The clearing and setting of HPageFreed happens under the lock, and for HPageMigratable
that is also true for the clearing part, so I think it would be more sane to do
this under the lock to close any possible race.

Does it make sense?
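The interleaving in the quoted commit message can be replayed in a small userspace model. This is only a sketch under stated assumptions: `struct model_page`, `model_get_page_unless_zero()`, `model_put_page_testzero()` and `replay_race()` are hypothetical names, C11 atomics stand in for the kernel's page refcount helpers, and the two CPUs are replayed sequentially in one thread rather than run concurrently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page: only a refcount. */
struct model_page {
	atomic_int refcount;
};

/* Take a reference only if the count is currently non-zero. */
static bool model_get_page_unless_zero(struct model_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference and report whether the count reached zero. */
static bool model_put_page_testzero(struct model_page *p)
{
	return atomic_fetch_sub(&p->refcount, 1) == 1;
}

/*
 * Replay the interleaving in program order and return the value the
 * allocation path would see in "zero". A false result corresponds to
 * the VM_BUG_ON_PAGE(!zero, page) case.
 */
static bool replay_race(bool hwpoison_takes_ref)
{
	struct model_page page;

	/* CPU1: freshly allocated surplus page holds one reference. */
	atomic_init(&page.refcount, 1);

	/* CPU0: memory error handling takes a speculative reference. */
	if (hwpoison_takes_ref)
		(void)model_get_page_unless_zero(&page);

	/* CPU1: expects to drop the only reference. */
	return model_put_page_testzero(&page);
}
```

In this model, `replay_race(false)` returns true because the allocation path drops the last reference, while `replay_race(true)` returns false because the extra reference keeps the count above zero.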
On Wed, Apr 28, 2021 at 10:23:49AM +0200, Oscar Salvador wrote:
> On Wed, Apr 28, 2021 at 04:46:54PM +0900, Naoya Horiguchi wrote:
> > ---
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > Date: Wed, 28 Apr 2021 15:55:47 +0900
> > Subject: [PATCH] mm,hwpoison: fix race with compound page allocation
> >
> > When hugetlb page fault (under overcommiting situation) and memory_failure()
> > race, VM_BUG_ON_PAGE() is triggered by the following race:
> >
> >     CPU0:                           CPU1:
> >
> >                                     gather_surplus_pages()
> >                                       page = alloc_surplus_huge_page()
> >     memory_failure_hugetlb()
> >       get_hwpoison_page(page)
> >         __get_hwpoison_page(page)
> >           get_page_unless_zero(page)
> >                                       zero = put_page_testzero(page)
> >                                       VM_BUG_ON_PAGE(!zero, page)
> >                                       enqueue_huge_page(h, page)
> >       put_page(page)
> >
> > __get_hwpoison_page() only checks page refcount before taking additional
> > one for memory error handling, which is wrong because there's time
> > windows where compound pages have non-zero refcount during initialization.
> >
> > So makes __get_hwpoison_page() check more page status for a few types
> > of compound pages. PageSlab() check is added because otherwise
> > "non anonymous thp" path is wrongly chosen for slab pages.
>
> Was it wrongly chosen even before? If so, maybe a Fix tag is warranted.

OK, I'll check when this was introduced.

>
> >
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > Reported-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  mm/memory-failure.c | 48 +++++++++++++++++++++++++--------------------
> >  1 file changed, 27 insertions(+), 21 deletions(-)
> >
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index a3659619d293..61988e332712 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1095,30 +1095,36 @@ static int __get_hwpoison_page(struct page *page)
>
> > +	if (PageCompound(page)) {
> > +		if (PageSlab(page)) {
> > +			return get_page_unless_zero(page);
> > +		} else if (PageHuge(head)) {
> > +			if (HPageFreed(head) || HPageMigratable(head))
> > +				return get_page_unless_zero(head);
>
> There were concerns raised wrt. memory-failure should not be fiddling with page's
> refcount without holding a hugetlb lock.
> So, if we really want to make this more stable, we might want to hold the lock
> here.
>
> The clearing and setting of HPageFreed happens under the lock, and for HPageMigratable
> that is also true for the clearing part, so I think it would be more sane to do
> this under the lock to close any possible race.
>
> Does it make sense?

Thanks, I'll update to do the check under hugetlb_lock.

- Naoya Horiguchi
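What "do the check under hugetlb_lock" might look like can be sketched in userspace. This is not the updated patch, only a model under stated assumptions: an `atomic_flag` spinlock stands in for `hugetlb_lock`, two booleans stand in for `HPageFreed()` and `HPageMigratable()`, and every `model_*` name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a hugetlb head page. */
struct model_hpage {
	atomic_int refcount;
	bool freed;		/* models HPageFreed(); changed only under the lock */
	bool migratable;	/* models HPageMigratable() */
};

/* Hypothetical spinlock standing in for hugetlb_lock. */
static atomic_flag model_hugetlb_lock = ATOMIC_FLAG_INIT;

static void model_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&model_hugetlb_lock,
						 memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void model_unlock(void)
{
	atomic_flag_clear_explicit(&model_hugetlb_lock, memory_order_release);
}

static void model_hpage_init(struct model_hpage *head, int refcount,
			     bool freed, bool migratable)
{
	atomic_init(&head->refcount, refcount);
	head->freed = freed;
	head->migratable = migratable;
}

/* Take a reference only if the count is currently non-zero. */
static bool model_get_page_unless_zero(struct model_hpage *head)
{
	int old = atomic_load(&head->refcount);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&head->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Test the state flags and take the reference inside one critical
 * section, so a flag change made under the same lock cannot slip in
 * between the test and the refcount increment.
 */
static bool model_get_hwpoison_huge_page(struct model_hpage *head)
{
	bool got = false;

	model_lock();
	if (head->freed || head->migratable)
		got = model_get_page_unless_zero(head);
	model_unlock();
	return got;
}
```

In this model, a page that is still being initialized (neither flag set) is left alone even though its refcount is non-zero, which is the window the commit message describes.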
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a3659619d293..61988e332712 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1095,30 +1095,36 @@ static int __get_hwpoison_page(struct page *page)
 {
 	struct page *head = compound_head(page);
 
-	if (!PageHuge(head) && PageTransHuge(head)) {
-		/*
-		 * Non anonymous thp exists only in allocation/free time. We
-		 * can't handle such a case correctly, so let's give it up.
-		 * This should be better than triggering BUG_ON when kernel
-		 * tries to touch the "partially handled" page.
-		 */
-		if (!PageAnon(head)) {
-			pr_err("Memory failure: %#lx: non anonymous thp\n",
-				page_to_pfn(page));
-			return 0;
+	if (PageCompound(page)) {
+		if (PageSlab(page)) {
+			return get_page_unless_zero(page);
+		} else if (PageHuge(head)) {
+			if (HPageFreed(head) || HPageMigratable(head))
+				return get_page_unless_zero(head);
+		} else if (PageTransHuge(head)) {
+			/*
+			 * Non anonymous thp exists only in allocation/free time. We
+			 * can't handle such a case correctly, so let's give it up.
+			 * This should be better than triggering BUG_ON when kernel
+			 * tries to touch the "partially handled" page.
+			 */
+			if (!PageAnon(head)) {
+				pr_err("Memory failure: %#lx: non anonymous thp\n",
+					page_to_pfn(page));
+				return 0;
+			}
+			if (get_page_unless_zero(head)) {
+				if (head == compound_head(page))
+					return 1;
+				pr_info("Memory failure: %#lx cannot catch tail\n",
+					page_to_pfn(page));
+				put_page(head);
+			}
 		}
+		return 0;
 	}
 
-	if (get_page_unless_zero(head)) {
-		if (head == compound_head(page))
-			return 1;
-
-		pr_info("Memory failure: %#lx cannot catch tail\n",
-			page_to_pfn(page));
-		put_page(head);
-	}
-
-	return 0;
+	return get_page_unless_zero(page);
 }
 
 /*