diff mbox series

[v1] mm: hugetlb: soft-offline: fix wrong return value of soft offline

Message ID 1558937200-18544-1-git-send-email-n-horiguchi@ah.jp.nec.com (mailing list archive)
State New, archived
Headers show
Series [v1] mm: hugetlb: soft-offline: fix wrong return value of soft offline | expand

Commit Message

Naoya Horiguchi May 27, 2019, 6:06 a.m. UTC
Soft offline events for hugetlb pages return -EBUSY when page migration
succeeded and dissolve_free_huge_page() failed, which can happen when
there're surplus hugepages. We should judge pass/fail of soft offline by
checking whether the raw error page was finally contained or not (i.e.
the result of set_hwpoison_free_buddy_page()), so this behavior is wrong.

This problem was introduced by the following change of commit 6bc9b56433b76
("mm: fix race on soft-offlining"):

                    if (ret > 0)
                            ret = -EIO;
            } else {
    -               if (PageHuge(page))
    -                       dissolve_free_huge_page(page);
    +               /*
    +                * We set PG_hwpoison only when the migration source hugepage
    +                * was successfully dissolved, because otherwise hwpoisoned
    +                * hugepage remains on free hugepage list, then userspace will
    +                * find it as SIGBUS by allocation failure. That's not expected
    +                * in soft-offlining.
    +                */
    +               ret = dissolve_free_huge_page(page);
    +               if (!ret) {
    +                       if (set_hwpoison_free_buddy_page(page))
    +                               num_poisoned_pages_inc();
    +               }
            }
            return ret;
     }

, so a simple fix is to restore the PageHuge precheck, but my code
reading shows that we already have PageHuge check in
dissolve_free_huge_page() with hugetlb_lock, which is better place to
check it.  And currently dissolve_free_huge_page() returns -EBUSY for
!PageHuge but that's simply wrong because that that case should be
considered as success (meaning that "the given hugetlb was already
dissolved.")

This change affects other callers of dissolve_free_huge_page(),
which are also cleaned up by this patch.

Reported-by: Chen, Jerry T <jerry.t.chen@intel.com>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
Cc: <stable@vger.kernel.org> # v4.19+
---
 mm/hugetlb.c        | 15 +++++++++------
 mm/memory-failure.c |  7 +++----
 2 files changed, 12 insertions(+), 10 deletions(-)

Comments

Mike Kravetz May 29, 2019, 6:44 p.m. UTC | #1
On 5/26/19 11:06 PM, Naoya Horiguchi wrote:
> Soft offline events for hugetlb pages return -EBUSY when page migration
> succeeded and dissolve_free_huge_page() failed, which can happen when
> there're surplus hugepages. We should judge pass/fail of soft offline by
> checking whether the raw error page was finally contained or not (i.e.
> the result of set_hwpoison_free_buddy_page()), so this behavior is wrong.
> 
> This problem was introduced by the following change of commit 6bc9b56433b76
> ("mm: fix race on soft-offlining"):
> 
>                     if (ret > 0)
>                             ret = -EIO;
>             } else {
>     -               if (PageHuge(page))
>     -                       dissolve_free_huge_page(page);
>     +               /*
>     +                * We set PG_hwpoison only when the migration source hugepage
>     +                * was successfully dissolved, because otherwise hwpoisoned
>     +                * hugepage remains on free hugepage list, then userspace will
>     +                * find it as SIGBUS by allocation failure. That's not expected
>     +                * in soft-offlining.
>     +                */
>     +               ret = dissolve_free_huge_page(page);
>     +               if (!ret) {
>     +                       if (set_hwpoison_free_buddy_page(page))
>     +                               num_poisoned_pages_inc();
>     +               }
>             }
>             return ret;
>      }
> 
> , so a simple fix is to restore the PageHuge precheck, but my code
> reading shows that we already have PageHuge check in
> dissolve_free_huge_page() with hugetlb_lock, which is better place to
> check it.  And currently dissolve_free_huge_page() returns -EBUSY for
> !PageHuge but that's simply wrong because that that case should be
> considered as success (meaning that "the given hugetlb was already
> dissolved.")

Hello Naoya,

I am having a little trouble understanding the situation.  The code above is
in the routine soft_offline_huge_page, and occurs immediately after a call to
migrate_pages() with 'page' being the only on the list of pages to be migrated.
In addition, since we are in soft_offline_huge_page, we know that page is
a huge page (PageHuge) before the call to migrate_pages.

IIUC, the issue is that the migrate_pages call results in 'page' being
dissolved into regular base pages.  Therefore, the call to
dissolve_free_huge_page returns -EBUSY and we never end up setting PageHWPoison
on the (base) page which had the error.

It seems that for the original page to be dissolved, it must go through the
free_huge_page routine.  Once that happens, it is possible for the (dissolved)
pages to be allocated again.  Is that just a known race, or am I missing
something?

> This change affects other callers of dissolve_free_huge_page(),
> which are also cleaned up by this patch.

It may just be me, but I am having a hard time separating the fix for this
issue from the change to the dissolve_free_huge_page routine.  Would it be
more clear or possible to create separate patches for these?
Naoya Horiguchi May 30, 2019, 1:35 a.m. UTC | #2
Hi Mike,

On Wed, May 29, 2019 at 11:44:50AM -0700, Mike Kravetz wrote:
> On 5/26/19 11:06 PM, Naoya Horiguchi wrote:
> > Soft offline events for hugetlb pages return -EBUSY when page migration
> > succeeded and dissolve_free_huge_page() failed, which can happen when
> > there're surplus hugepages. We should judge pass/fail of soft offline by
> > checking whether the raw error page was finally contained or not (i.e.
> > the result of set_hwpoison_free_buddy_page()), so this behavior is wrong.
> > 
> > This problem was introduced by the following change of commit 6bc9b56433b76
> > ("mm: fix race on soft-offlining"):
> > 
> >                     if (ret > 0)
> >                             ret = -EIO;
> >             } else {
> >     -               if (PageHuge(page))
> >     -                       dissolve_free_huge_page(page);
> >     +               /*
> >     +                * We set PG_hwpoison only when the migration source hugepage
> >     +                * was successfully dissolved, because otherwise hwpoisoned
> >     +                * hugepage remains on free hugepage list, then userspace will
> >     +                * find it as SIGBUS by allocation failure. That's not expected
> >     +                * in soft-offlining.
> >     +                */
> >     +               ret = dissolve_free_huge_page(page);
> >     +               if (!ret) {
> >     +                       if (set_hwpoison_free_buddy_page(page))
> >     +                               num_poisoned_pages_inc();
> >     +               }
> >             }
> >             return ret;
> >      }
> > 
> > , so a simple fix is to restore the PageHuge precheck, but my code
> > reading shows that we already have PageHuge check in
> > dissolve_free_huge_page() with hugetlb_lock, which is better place to
> > check it.  And currently dissolve_free_huge_page() returns -EBUSY for
> > !PageHuge but that's simply wrong because that that case should be
> > considered as success (meaning that "the given hugetlb was already
> > dissolved.")
> 
> Hello Naoya,
> 
> I am having a little trouble understanding the situation.  The code above is
> in the routine soft_offline_huge_page, and occurs immediately after a call to
> migrate_pages() with 'page' being the only on the list of pages to be migrated.
> In addition, since we are in soft_offline_huge_page, we know that page is
> a huge page (PageHuge) before the call to migrate_pages.
> 
> IIUC, the issue is that the migrate_pages call results in 'page' being
> dissolved into regular base pages.  Therefore, the call to
> dissolve_free_huge_page returns -EBUSY and we never end up setting PageHWPoison
> on the (base) page which had the error.
> 
> It seems that for the original page to be dissolved, it must go through the
> free_huge_page routine.  Once that happens, it is possible for the (dissolved)
> pages to be allocated again.  Is that just a known race, or am I missing
> something?

No, your understanding is right.  I found that the last (and most important)
part of patch description ("this behavior is wrong") might be wrong.
Sorry about that and let me correct myself:

  - before commit 6bc9b56433b76, the return value of soft offline is the
    return of migrate_page(). dissolve_free_huge_page()'s return value is
    ignored.

  - after commit 6bc9b56433b76 soft_offline_huge_page() returns success
    only dissolve_free_huge_page() returns success.

This change is *mainly OK* (meaning nothing is broken), but there still
remains the room of improvement, that is, even in "dissolved from
free_huge_page()" case, we can try to call set_hwpoison_free_buddy_page() to
contain the 4kB error page, but we don't try it now because
dissolve_free_huge_page() return -EBUSY for !PageHuge case.

> 
> > This change affects other callers of dissolve_free_huge_page(),
> > which are also cleaned up by this patch.
> 
> It may just be me, but I am having a hard time separating the fix for this
> issue from the change to the dissolve_free_huge_page routine.  Would it be
> more clear or possible to create separate patches for these?

Yes, the change is actually an 'improvement' purely related to hugetlb,
and seems not a 'bug fix'. So I'll update the description.
Maybe no need to separate to patches.

Thanks,
Naoya Horiguchi
diff mbox series

Patch

diff --git v5.1-rc6-mmotm-2019-04-25-16-30/mm/hugetlb.c v5.1-rc6-mmotm-2019-04-25-16-30_patched/mm/hugetlb.c
index bf58cee..385899f 100644
--- v5.1-rc6-mmotm-2019-04-25-16-30/mm/hugetlb.c
+++ v5.1-rc6-mmotm-2019-04-25-16-30_patched/mm/hugetlb.c
@@ -1518,7 +1518,12 @@  int dissolve_free_huge_page(struct page *page)
 	int rc = -EBUSY;
 
 	spin_lock(&hugetlb_lock);
-	if (PageHuge(page) && !page_count(page)) {
+	if (!PageHuge(page)) {
+		rc = 0;
+		goto out;
+	}
+
+	if (!page_count(page)) {
 		struct page *head = compound_head(page);
 		struct hstate *h = page_hstate(head);
 		int nid = page_to_nid(head);
@@ -1563,11 +1568,9 @@  int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
 		page = pfn_to_page(pfn);
-		if (PageHuge(page) && !page_count(page)) {
-			rc = dissolve_free_huge_page(page);
-			if (rc)
-				break;
-		}
+		rc = dissolve_free_huge_page(page);
+		if (rc)
+			break;
 	}
 
 	return rc;
diff --git v5.1-rc6-mmotm-2019-04-25-16-30/mm/memory-failure.c v5.1-rc6-mmotm-2019-04-25-16-30_patched/mm/memory-failure.c
index fc8b517..3a83e27 100644
--- v5.1-rc6-mmotm-2019-04-25-16-30/mm/memory-failure.c
+++ v5.1-rc6-mmotm-2019-04-25-16-30_patched/mm/memory-failure.c
@@ -1733,6 +1733,8 @@  static int soft_offline_huge_page(struct page *page, int flags)
 		if (!ret) {
 			if (set_hwpoison_free_buddy_page(page))
 				num_poisoned_pages_inc();
+			else
+				ret = -EBUSY;
 		}
 	}
 	return ret;
@@ -1857,11 +1859,8 @@  static int soft_offline_in_use_page(struct page *page, int flags)
 
 static int soft_offline_free_page(struct page *page)
 {
-	int rc = 0;
-	struct page *head = compound_head(page);
+	int rc = dissolve_free_huge_page(page);
 
-	if (PageHuge(head))
-		rc = dissolve_free_huge_page(page);
 	if (!rc) {
 		if (set_hwpoison_free_buddy_page(page))
 			num_poisoned_pages_inc();