Message ID | 20220804122819.2917249-1-luofei@unicloud.com (mailing list archive) |
---|---|
State | New |
Series | [v1] mm, hwpoison, hugetlb: Check hugetlb head page hwpoison flag when unpoison page |
On Thu, Aug 04, 2022 at 08:28:19AM -0400, luofei wrote:
> When software-poison a huge page, if dissolve_free_huge_page() failed,
> the huge page will be added to hugepage_freelists. In this case, the
> head page will hold the hwpoison flag, but the real poisoned tail page
> hwpoison flag is not set, this will cause unpoison_memory() fail to
> unpoison the previously poisoned page.

Hi luofei,

When you try to unpoison a hwpoisoned hugepage, you just have to pass the
pfn of the head page, not the pfn of raw poisoned subpage.

Note that the position of raw error page is not exposed to userspace
(dmesg shows it, but saving and parsing it for unpoison is not that useful)
and the related utilities like page-types only checks PageHWpoison flag to
find error pages, so it seems to me that you're introducing an inconsistent
assumption.

Thanks,
Naoya Horiguchi

>
> So add a check on hugetlb head page, and also need to ensure the
> previously poisoned tail page in huge page raw_hwp_list.
>
> Signed-off-by: luofei <luofei@unicloud.com>
> ---
>  mm/memory-failure.c | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 14439806b5ef..92dbeaa24afb 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2293,6 +2293,28 @@ core_initcall(memory_failure_init);
>  		pr_info(fmt, pfn);			\
>  })
>
> +static bool hugetlb_page_head_poison(struct page *hpage, struct page *page)
> +{
> +	struct llist_head *head;
> +	struct llist_node *t, *tnode;
> +	struct raw_hwp_page *p;
> +
> +	if (!PageHuge(page) || !PageHWPoison(hpage) || !HPageFreed(hpage))
> +		return false;
> +
> +	if (HPageRawHwpUnreliable(hpage))
> +		return false;
> +
> +	head = raw_hwp_list_head(hpage);
> +	llist_for_each_safe(tnode, t, head->first) {
> +		p = container_of(tnode, struct raw_hwp_page, node);
> +		if (p->page == page)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
>  /**
>   * unpoison_memory - Unpoison a previously poisoned page
>   * @pfn: Page number of the to be unpoisoned page
> @@ -2330,7 +2352,7 @@ int unpoison_memory(unsigned long pfn)
>  		goto unlock_mutex;
>  	}
>
> -	if (!PageHWPoison(p)) {
> +	if (!PageHWPoison(p) && !hugetlb_page_head_poison(page, p)) {
>  		unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
>  				 pfn, &unpoison_rs);
>  		goto unlock_mutex;
> --
> 2.27.0
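
The reply above says that unpoisoning a hugepage takes the pfn of the head page rather than the raw subpage. As a rough illustration only, the sketch below shows how a userspace tool could derive that head pfn from a raw subpage pfn. The helper name `hugepage_head_pfn` is hypothetical (it is not a kernel or tools API), and the sketch assumes the hugepage is naturally aligned and that both sizes are powers of two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the kernel or any existing tool:
 * round a raw subpage pfn down to the pfn of the hugepage's head page.
 *
 * Assumptions: the hugepage is naturally aligned in physical memory and
 * the number of base pages per hugepage is a power of two.
 */
static uint64_t hugepage_head_pfn(uint64_t raw_pfn,
				  uint64_t hugepage_bytes,
				  uint64_t base_page_bytes)
{
	uint64_t pages_per_hugepage;

	assert(base_page_bytes != 0 && hugepage_bytes >= base_page_bytes);
	pages_per_hugepage = hugepage_bytes / base_page_bytes;

	/* The mask below is only valid for a power-of-two page count. */
	assert((pages_per_hugepage & (pages_per_hugepage - 1)) == 0);

	return raw_pfn & ~(pages_per_hugepage - 1);
}
```

For example, with 2 MiB hugepages and 4 KiB base pages (512 base pages per hugepage), a raw pfn of 0x12345 rounds down to 0x12200. That head pfn is the value one would write to the unpoison interface, for instance the `unpoison-pfn` debugfs file, assuming the hwpoison-inject module is loaded.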
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 14439806b5ef..92dbeaa24afb 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2293,6 +2293,28 @@ core_initcall(memory_failure_init);
 		pr_info(fmt, pfn);			\
 })
 
+static bool hugetlb_page_head_poison(struct page *hpage, struct page *page)
+{
+	struct llist_head *head;
+	struct llist_node *t, *tnode;
+	struct raw_hwp_page *p;
+
+	if (!PageHuge(page) || !PageHWPoison(hpage) || !HPageFreed(hpage))
+		return false;
+
+	if (HPageRawHwpUnreliable(hpage))
+		return false;
+
+	head = raw_hwp_list_head(hpage);
+	llist_for_each_safe(tnode, t, head->first) {
+		p = container_of(tnode, struct raw_hwp_page, node);
+		if (p->page == page)
+			return true;
+	}
+
+	return false;
+}
+
 /**
  * unpoison_memory - Unpoison a previously poisoned page
  * @pfn: Page number of the to be unpoisoned page
@@ -2330,7 +2352,7 @@ int unpoison_memory(unsigned long pfn)
 		goto unlock_mutex;
 	}
 
-	if (!PageHWPoison(p)) {
+	if (!PageHWPoison(p) && !hugetlb_page_head_poison(page, p)) {
 		unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
 				 pfn, &unpoison_rs);
 		goto unlock_mutex;
When software-poison a huge page, if dissolve_free_huge_page() failed,
the huge page will be added to hugepage_freelists. In this case, the
head page will hold the hwpoison flag, but the real poisoned tail page
hwpoison flag is not set, this will cause unpoison_memory() fail to
unpoison the previously poisoned page.

So add a check on hugetlb head page, and also need to ensure the
previously poisoned tail page in huge page raw_hwp_list.

Signed-off-by: luofei <luofei@unicloud.com>
---
 mm/memory-failure.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)
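
To make the state described in the commit message easier to follow, here is a small userspace model of the proposed check: the head page carries the hwpoison flag, and the raw poisoned subpage is found only through a per-hugepage list. This is a sketch, not kernel code. Every type and function name in it (`model_raw_hwp`, `model_hugepage`, `model_subpage_poisoned_via_head`) is hypothetical, and a plain singly linked list stands in for the kernel's llist-based raw_hwp_list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for illustration; these are not the kernel's types. */
struct model_raw_hwp {
	unsigned long pfn;		/* pfn of a raw poisoned subpage */
	struct model_raw_hwp *next;
};

struct model_hugepage {
	unsigned long head_pfn;
	bool head_hwpoison;		/* models PageHWPoison(head) */
	bool freed;			/* models HPageFreed(head) */
	bool raw_unreliable;		/* models HPageRawHwpUnreliable(head) */
	struct model_raw_hwp *raw_list;	/* models the raw_hwp_list */
};

/*
 * Mirrors the shape of the proposed check: report a subpage as poisoned
 * only when the head page is flagged, the hugepage sits on the free list,
 * the raw list is trustworthy, and the subpage appears in that list.
 */
static bool model_subpage_poisoned_via_head(const struct model_hugepage *hp,
					    unsigned long pfn)
{
	const struct model_raw_hwp *p;

	assert(hp != NULL);

	if (!hp->head_hwpoison || !hp->freed)
		return false;

	if (hp->raw_unreliable)
		return false;

	for (p = hp->raw_list; p != NULL; p = p->next) {
		if (p->pfn == pfn)
			return true;
	}

	return false;
}
```

In this model, a hugepage with `head_hwpoison` and `freed` set and a single list entry for pfn 0x12345 reports that pfn as poisoned, while any other subpage pfn, or the same pfn once `raw_unreliable` is set, is reported as not poisoned.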