
Re: [RFC] a question about reuse hwpoison page in soft_offline_page()

Message ID 518e6b02-47ef-4ba8-ab98-8d807e2de7d5.xishi.qiuxishi@alibaba-inc.com (mailing list archive)
State New, archived

Commit Message

Xishi Qiu (裘稀石) July 9, 2018, 5:43 a.m. UTC
Hi Naoya,

Shall we fix this path too? It also sets the hwpoison flag before dissolve_free_huge_page():

soft_offline_huge_page
    migrate_pages
        unmap_and_move_huge_page
            if (reason == MR_MEMORY_FAILURE && !test_set_page_hwpoison(hpage))
    dissolve_free_huge_page

Thanks,
Xishi Qiu

On Mon, Jul 09, 2018 at 10:31:25AM +0800, Xishi Qiu (裘稀石) wrote:
> Hi Naoya,
> 
> I think the double check cannot fix the problem, as I said in another email.
> If someone calls mmap() before the soft offline, page_count(head) can still
> be zero when soft offline runs, so the hwpoison flag gets set. The page then
> can no longer be allocated by dequeue_huge_page_node_exact() during the page
> fault, the fault returns no-mem, and the process is killed (not an MCE kill).
> 
> How about setting the hwpoison flag only after soft_offline_free_page ->
> dissolve_free_huge_page succeeds? That would fix both problems (the MCE
> kill and the no-mem kill).

Thank you for elaborating, you're right.
So do you like a fix like this?

---

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d34225c1cb5b..3c9ce4c05f1b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1479,22 +1479,20 @@  static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
 /*
  * Dissolve a given free hugepage into free buddy pages. This function does
  * nothing for in-use (including surplus) hugepages. Returns -EBUSY if the
- * number of free hugepages would be reduced below the number of reserved
- * hugepages.
+ * dissolution fails because a given page is not a free hugepage, or because
+ * free hugepages are fully reserved.
  */
 int dissolve_free_huge_page(struct page *page)
 {
- int rc = 0;
+ int rc = -EBUSY;

  spin_lock(&hugetlb_lock);
  if (PageHuge(page) && !page_count(page)) {
   struct page *head = compound_head(page);
   struct hstate *h = page_hstate(head);
   int nid = page_to_nid(head);
-  if (h->free_huge_pages - h->resv_huge_pages == 0) {
-   rc = -EBUSY;
+  if (h->free_huge_pages - h->resv_huge_pages == 0)
    goto out;
-  }
   /*
    * Move PageHWPoison flag from head page to the raw error page,
    * which makes any subpages rather than the error page reusable.
@@ -1508,6 +1506,7 @@  int dissolve_free_huge_page(struct page *page)
   h->free_huge_pages_node[nid]--;
   h->max_huge_pages--;
   update_and_free_page(h, head);
+  rc = 0;
  }
 out:
  spin_unlock(&hugetlb_lock);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 9d142b9b86dc..e4c7e3ec7b10 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1715,13 +1715,13 @@  static int soft_offline_in_use_page(struct page *page, int flags)

 static void soft_offline_free_page(struct page *page)
 {
+ int rc = 0;
  struct page *head = compound_head(page);

- if (!TestSetPageHWPoison(head)) {
+ if (PageHuge(head))
+  rc = dissolve_free_huge_page(page);
+ if (!rc && !TestSetPageHWPoison(head))
   num_poisoned_pages_inc();
-  if (PageHuge(head))
-   dissolve_free_huge_page(page);
- }
 }

 /**