Message ID | 20210617092626.291006-1-nao.horiguchi@gmail.com (mailing list archive)
---|---
State | New, archived
Series | [mmotm,v1] mm/hwpoison: disable pcp for page_handle_poison()
On 17.06.21 11:26, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>
> Recent changes by patch "mm/page_alloc: allow high-order pages to be
> stored on the per-cpu lists" makes kernels determine whether to use pcp
> by pcp_allowed_order(), which breaks soft-offline for hugetlb pages.
>
> Soft-offline dissolves a migration source page, then removes it from
> buddy free list, so it's assumed that any subpage of the soft-offlined
> hugepage are recognized as a buddy page just after returning from
> dissolve_free_huge_page(). pcp_allowed_order() returns true for
> hugetlb, so this assumption is no longer true.
>
> So disable pcp during dissolve_free_huge_page() and
> take_page_off_buddy() to prevent soft-offlined hugepages from linking to
> pcp lists. Soft-offline should not be common events so the impact on
> performance should be minimal. And I think that the optimization of
> Mel's patch could benefit to hugetlb so zone_pcp_disable() is called
> only in hwpoison context.

Mel, Oscar, does alloc_contig_range() now have similar issues or is it
avoided because the pageblock(s) are set MIGRATE_ISOLATE?
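For context, the reason the hugetlb case is affected is the pageblock_order check that Mel's series adds in pcp_allowed_order(). The sketch below is a simplified rendering of that check, not necessarily the exact upstream code; on a typical x86-64 configuration a 2MB hugetlb page has order == pageblock_order, so its constituent pages now qualify for the per-cpu lists instead of landing directly on the buddy free lists.

```c
/*
 * Simplified sketch of the check added by "mm/page_alloc: allow high-order
 * pages to be stored on the per-cpu lists" (not verbatim upstream code).
 * Orders up to PAGE_ALLOC_COSTLY_ORDER use the pcp lists, and so does
 * pageblock_order when THP is enabled -- which matches a 2MB hugetlb page
 * on a typical x86-64 configuration.
 */
static inline bool pcp_allowed_order(unsigned int order)
{
	if (order <= PAGE_ALLOC_COSTLY_ORDER)
		return true;
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
	if (order == pageblock_order)
		return true;
#endif
	return false;
}
```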
On Thu, Jun 17, 2021 at 06:26:26PM +0900, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>
> Recent changes by patch "mm/page_alloc: allow high-order pages to be
> stored on the per-cpu lists" makes kernels determine whether to use pcp
> by pcp_allowed_order(), which breaks soft-offline for hugetlb pages.
>
> Soft-offline dissolves a migration source page, then removes it from
> buddy free list, so it's assumed that any subpage of the soft-offlined
> hugepage are recognized as a buddy page just after returning from
> dissolve_free_huge_page(). pcp_allowed_order() returns true for
> hugetlb, so this assumption is no longer true.
>
> So disable pcp during dissolve_free_huge_page() and
> take_page_off_buddy() to prevent soft-offlined hugepages from linking to
> pcp lists. Soft-offline should not be common events so the impact on
> performance should be minimal. And I think that the optimization of
> Mel's patch could benefit to hugetlb so zone_pcp_disable() is called
> only in hwpoison context.
>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

I think this is ok: it'll remove a page that is to be poisoned from the
PCP lists and put it back in the page allocator. It's a heavy but rare
operation, and identifying which PCP list a free page is on would be
tricky, so

Acked-by: Mel Gorman <mgorman@techsingularity.net>

The alternative, I guess, would be special-casing update_and_free_page
to bypass the PCP, but it'd be clumsy from an API point of view and I
don't think it's worth the effort.
On Thu, Jun 17, 2021 at 11:28:49AM +0200, David Hildenbrand wrote:
> On 17.06.21 11:26, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >
> > Recent changes by patch "mm/page_alloc: allow high-order pages to be
> > stored on the per-cpu lists" makes kernels determine whether to use pcp
> > by pcp_allowed_order(), which breaks soft-offline for hugetlb pages.
> >
> > Soft-offline dissolves a migration source page, then removes it from
> > buddy free list, so it's assumed that any subpage of the soft-offlined
> > hugepage are recognized as a buddy page just after returning from
> > dissolve_free_huge_page(). pcp_allowed_order() returns true for
> > hugetlb, so this assumption is no longer true.
> >
> > So disable pcp during dissolve_free_huge_page() and
> > take_page_off_buddy() to prevent soft-offlined hugepages from linking to
> > pcp lists. Soft-offline should not be common events so the impact on
> > performance should be minimal. And I think that the optimization of
> > Mel's patch could benefit to hugetlb so zone_pcp_disable() is called
> > only in hwpoison context.
>
> Mel, Oscar, does alloc_contig_range() now have similar issues or is it
> avoided because the pageblock(s) are set MIGRATE_ISOLATE?
>

I'd expect MIGRATE_ISOLATE to be sufficient because they should bypass
the PCP list in free_unref_page.
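The bypass Mel refers to is the migratetype check in free_unref_page(): pages belonging to an isolated pageblock are handed straight to the buddy allocator rather than being queued on a per-cpu list. A simplified sketch of that path follows; it omits the prepare step and pcp locking, so treat the structure and helper usage as illustrative rather than the exact upstream code.

```c
/*
 * Simplified sketch of the MIGRATE_ISOLATE bypass in free_unref_page()
 * (mm/page_alloc.c). Details such as free_unref_page_prepare() and the
 * per-cpu list locking are omitted, so this is illustrative only.
 */
void free_unref_page(struct page *page, unsigned int order)
{
	unsigned long pfn = page_to_pfn(page);
	int migratetype = get_pfnblock_migratetype(page, pfn);

	if (unlikely(migratetype >= MIGRATE_PCPTYPES)) {
		if (is_migrate_isolate(migratetype)) {
			/* Isolated pageblock: free directly to the buddy lists. */
			free_one_page(page_zone(page), page, pfn, order,
				      migratetype, FPI_NONE);
			return;
		}
		migratetype = MIGRATE_MOVABLE;
	}

	/* ... otherwise the page is queued on a per-cpu list ... */
}
```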
On 17.06.21 11:26, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>
> Recent changes by patch "mm/page_alloc: allow high-order pages to be
> stored on the per-cpu lists" makes kernels determine whether to use pcp
> by pcp_allowed_order(), which breaks soft-offline for hugetlb pages.
>
> Soft-offline dissolves a migration source page, then removes it from
> buddy free list, so it's assumed that any subpage of the soft-offlined
> hugepage are recognized as a buddy page just after returning from
> dissolve_free_huge_page(). pcp_allowed_order() returns true for
> hugetlb, so this assumption is no longer true.
>
> So disable pcp during dissolve_free_huge_page() and
> take_page_off_buddy() to prevent soft-offlined hugepages from linking to
> pcp lists. Soft-offline should not be common events so the impact on
> performance should be minimal. And I think that the optimization of
> Mel's patch could benefit to hugetlb so zone_pcp_disable() is called
> only in hwpoison context.
>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
>  mm/memory-failure.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git v5.13-rc6-mmotm-2021-06-15-20-24/mm/memory-failure.c v5.13-rc6-mmotm-2021-06-15-20-24_patched/mm/memory-failure.c
> index 1842822a10da..593079766655 100644
> --- v5.13-rc6-mmotm-2021-06-15-20-24/mm/memory-failure.c
> +++ v5.13-rc6-mmotm-2021-06-15-20-24_patched/mm/memory-failure.c
> @@ -66,6 +66,19 @@ int sysctl_memory_failure_recovery __read_mostly = 1;
>
>  atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
>
> +static bool __page_handle_poison(struct page *page)
> +{
> +	bool ret;
> +
> +	zone_pcp_disable(page_zone(page));
> +	ret = dissolve_free_huge_page(page);
> +	if (!ret)
> +		ret = take_page_off_buddy(page);
> +	zone_pcp_enable(page_zone(page));
> +
> +	return ret;
> +}
> +
>  static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
>  {
>  	if (hugepage_or_freepage) {
> @@ -73,7 +86,7 @@ static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, boo
>  		 * Doing this check for free pages is also fine since dissolve_free_huge_page
>  		 * returns 0 for non-hugetlb pages as well.
>  		 */
> -		if (dissolve_free_huge_page(page) || !take_page_off_buddy(page))
> +		if (!__page_handle_poison(page))
>  			/*
>  			 * We could fail to take off the target page from buddy
>  			 * for example due to racy page allocation, but that's
> @@ -986,7 +999,7 @@ static int me_huge_page(struct page *p, unsigned long pfn)
>  		 */
>  		if (PageAnon(hpage))
>  			put_page(hpage);
> -		if (!dissolve_free_huge_page(p) && take_page_off_buddy(p)) {
> +		if (__page_handle_poison(p)) {
>  			page_ref_inc(p);
>  			res = MF_RECOVERED;
>  		}
> @@ -1441,7 +1454,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>  	res = get_hwpoison_page(p, flags);
>  	if (!res) {
>  		res = MF_FAILED;
> -		if (!dissolve_free_huge_page(p) && take_page_off_buddy(p)) {
> +		if (__page_handle_poison(p)) {
>  			page_ref_inc(p);
>  			res = MF_RECOVERED;
>  		}
>

Just to make sure: all call paths are fine that we are taking a mutex,
right? I can see that get_hwpoison_page() already disables the PCP.

LGTM
On 6/17/21 5:21 AM, David Hildenbrand wrote:
> On 17.06.21 11:26, Naoya Horiguchi wrote:
>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>
>> Recent changes by patch "mm/page_alloc: allow high-order pages to be
>> stored on the per-cpu lists" makes kernels determine whether to use pcp
>> by pcp_allowed_order(), which breaks soft-offline for hugetlb pages.
>>
>> Soft-offline dissolves a migration source page, then removes it from
>> buddy free list, so it's assumed that any subpage of the soft-offlined
>> hugepage are recognized as a buddy page just after returning from
>> dissolve_free_huge_page(). pcp_allowed_order() returns true for
>> hugetlb, so this assumption is no longer true.
>>
>> So disable pcp during dissolve_free_huge_page() and
>> take_page_off_buddy() to prevent soft-offlined hugepages from linking to
>> pcp lists. Soft-offline should not be common events so the impact on
>> performance should be minimal. And I think that the optimization of
>> Mel's patch could benefit to hugetlb so zone_pcp_disable() is called
>> only in hwpoison context.
>>
>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>> ---
>>  mm/memory-failure.c | 19 ++++++++++++++++---
>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git v5.13-rc6-mmotm-2021-06-15-20-24/mm/memory-failure.c v5.13-rc6-mmotm-2021-06-15-20-24_patched/mm/memory-failure.c
>> index 1842822a10da..593079766655 100644
>> --- v5.13-rc6-mmotm-2021-06-15-20-24/mm/memory-failure.c
>> +++ v5.13-rc6-mmotm-2021-06-15-20-24_patched/mm/memory-failure.c
>> @@ -66,6 +66,19 @@ int sysctl_memory_failure_recovery __read_mostly = 1;
>>
>>  atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
>>
>> +static bool __page_handle_poison(struct page *page)
>> +{
>> +	bool ret;
>> +
>> +	zone_pcp_disable(page_zone(page));
>> +	ret = dissolve_free_huge_page(page);
>> +	if (!ret)
>> +		ret = take_page_off_buddy(page);
>> +	zone_pcp_enable(page_zone(page));
>> +
>> +	return ret;
>> +}
>> +
>>  static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
>>  {
>>  	if (hugepage_or_freepage) {
>> @@ -73,7 +86,7 @@ static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, boo
>>  		 * Doing this check for free pages is also fine since dissolve_free_huge_page
>>  		 * returns 0 for non-hugetlb pages as well.
>>  		 */
>> -		if (dissolve_free_huge_page(page) || !take_page_off_buddy(page))
>> +		if (!__page_handle_poison(page))
>>  			/*
>>  			 * We could fail to take off the target page from buddy
>>  			 * for example due to racy page allocation, but that's
>> @@ -986,7 +999,7 @@ static int me_huge_page(struct page *p, unsigned long pfn)
>>  		 */
>>  		if (PageAnon(hpage))
>>  			put_page(hpage);
>> -		if (!dissolve_free_huge_page(p) && take_page_off_buddy(p)) {
>> +		if (__page_handle_poison(p)) {
>>  			page_ref_inc(p);
>>  			res = MF_RECOVERED;
>>  		}
>> @@ -1441,7 +1454,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>>  	res = get_hwpoison_page(p, flags);
>>  	if (!res) {
>>  		res = MF_FAILED;
>> -		if (!dissolve_free_huge_page(p) && take_page_off_buddy(p)) {
>> +		if (__page_handle_poison(p)) {
>>  			page_ref_inc(p);
>>  			res = MF_RECOVERED;
>>  		}
>>
>
> Just to make sure: all call paths are fine that we are taking a mutex, right?
>

That should be the case. dissolve_free_huge_page can sleep, so if not we
are already broken.
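For reference, the mutex being discussed is pcp_batch_high_lock, which zone_pcp_disable() takes and zone_pcp_enable() releases, so __page_handle_poison() may only be called from sleepable context. Roughly, the pair looks like the sketch below (simplified, not verbatim upstream code):

```c
/*
 * Rough sketch of zone_pcp_disable()/zone_pcp_enable() in mm/page_alloc.c
 * (simplified, not verbatim). Disabling shrinks the pcp high/batch limits
 * and drains the per-cpu lists, so pages freed while disabled go straight
 * to the buddy lists until the zone is re-enabled.
 */
void zone_pcp_disable(struct zone *zone)
{
	mutex_lock(&pcp_batch_high_lock);	/* may sleep */
	__zone_set_pageset_high_and_batch(zone, 0, 1);
	__drain_all_pages(zone, true);
}

void zone_pcp_enable(struct zone *zone)
{
	__zone_set_pageset_high_and_batch(zone, zone->pageset_high,
					  zone->pageset_batch);
	mutex_unlock(&pcp_batch_high_lock);
}
```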
diff --git v5.13-rc6-mmotm-2021-06-15-20-24/mm/memory-failure.c v5.13-rc6-mmotm-2021-06-15-20-24_patched/mm/memory-failure.c
index 1842822a10da..593079766655 100644
--- v5.13-rc6-mmotm-2021-06-15-20-24/mm/memory-failure.c
+++ v5.13-rc6-mmotm-2021-06-15-20-24_patched/mm/memory-failure.c
@@ -66,6 +66,19 @@ int sysctl_memory_failure_recovery __read_mostly = 1;
 
 atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
+static bool __page_handle_poison(struct page *page)
+{
+	bool ret;
+
+	zone_pcp_disable(page_zone(page));
+	ret = dissolve_free_huge_page(page);
+	if (!ret)
+		ret = take_page_off_buddy(page);
+	zone_pcp_enable(page_zone(page));
+
+	return ret;
+}
+
 static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
 {
 	if (hugepage_or_freepage) {
@@ -73,7 +86,7 @@ static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, boo
 		 * Doing this check for free pages is also fine since dissolve_free_huge_page
 		 * returns 0 for non-hugetlb pages as well.
 		 */
-		if (dissolve_free_huge_page(page) || !take_page_off_buddy(page))
+		if (!__page_handle_poison(page))
 			/*
 			 * We could fail to take off the target page from buddy
 			 * for example due to racy page allocation, but that's
@@ -986,7 +999,7 @@ static int me_huge_page(struct page *p, unsigned long pfn)
 		 */
 		if (PageAnon(hpage))
 			put_page(hpage);
-		if (!dissolve_free_huge_page(p) && take_page_off_buddy(p)) {
+		if (__page_handle_poison(p)) {
 			page_ref_inc(p);
 			res = MF_RECOVERED;
 		}
@@ -1441,7 +1454,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	res = get_hwpoison_page(p, flags);
 	if (!res) {
 		res = MF_FAILED;
-		if (!dissolve_free_huge_page(p) && take_page_off_buddy(p)) {
+		if (__page_handle_poison(p)) {
 			page_ref_inc(p);
 			res = MF_RECOVERED;
 		}