Message ID: 20200806184923.7007-9-nao.horiguchi@gmail.com
State: New, archived
Series: HWPOISON: soft offline rework
On 2020-08-06 20:49, nao.horiguchi@gmail.com wrote:
> From: Oscar Salvador <osalvador@suse.de>
>
> This patch changes the way we set and handle in-use poisoned pages.
> Until now, poisoned pages were released to the buddy allocator,
> trusting that the checks that take place prior to handing out the
> page would act as a safety net and would skip that page.
>
> This has proved to be wrong, as there are pfn walkers out there, like
> compaction, that only care about the page being PageBuddy and sitting
> in a freelist.
> Although this might not be the only user, having poisoned pages
> in the buddy allocator seems a bad idea, as we should only have
> free pages that are ready and meant to be used as such.
>
> Before explaining the approach taken, let us break down the kinds
> of pages we can soft offline.
>
> - Anonymous THP (after the split, they end up being 4K pages)
> - Hugetlb
> - Order-0 pages (that can be either migrated or invalidated)
>
> * Normal pages (order-0 and anon-THP)
>
>   - If they are clean and unmapped page cache pages, we invalidate
>     them by means of invalidate_inode_page().
>   - If they are mapped/dirty, we do the isolate-and-migrate dance.
>
> Either way, we do not call put_page directly from those paths.
> Instead, we keep the page and send it to page_handle_poison to
> perform the right handling.
>
> page_handle_poison sets the HWPoison flag and does the last put_page.
> This call to put_page is mainly to be able to call
> __page_cache_release, since this function is not exported.
>
> Down the chain, we placed a check for HWPoison pages in
> free_pages_prepare that just skips any poisoned page, so those pages
> do not end up in any pcplist/freelist.
>
> After that, we set the refcount on the page to 1 and we increment
> the poisoned pages counter.
>
> We could do as we do for free pages:
> 1) wait until the page hits buddy's freelists
> 2) take it off
> 3) flag it
>
> The problem is that we could race with an allocation, so by the time
> we want to take the page off the buddy, the page is already
> allocated, and we cannot soft-offline it.
> This is not fatal, of course, but it is better if we can close the
> race, and doing so does not require a lot of code.
>
> * Hugetlb pages
>
>   - We isolate-and-migrate them.
>
> After the migration has been successful, we call
> dissolve_free_huge_page, and we set HWPoison on the page if we
> succeed.
> Hugetlb has a slightly different handling, though:
>
> While for non-hugetlb pages we cared about closing the race with an
> allocation, doing so for hugetlb pages requires quite some additional
> code (we would need to hook into free_huge_page and some other
> places).
> So I decided not to make the code overly complicated and to just fail
> normally if the page was allocated in the meantime.
>
> Because of the way we now handle in-use pages, we no longer need the
> put-as-isolation-migratetype dance that was guarding against poisoned
> pages ending up in pcplists.
>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Hi Andrew,

I just found out yesterday that the patchset Naoya sent has diverged
from mine in some aspects that lead to some bugs [1].
This was due to a misunderstanding, so no blame here.
So, patch#8 and patch#9 need a little tweak [2].

I was wondering what you prefer?
1) I paste the chunks that need to be changed in the offending patches
   (this one and patch#9)
2) I just send them as standalone patches and you apply them on top

I am asking this because although the patches have hit -next, we might
still have time to change them.
If not, do not worry, I will send them as standalone patches.

[1] https://patchwork.kernel.org/comment/23622881/
[2] https://patchwork.kernel.org/comment/23622985/

I will go ahead and paste the chunks here, in case you lean towards
option #1:

Patch#8:

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index f68cb5e3b320..4ffaaa5c2603 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -67,11 +67,6 @@ atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
 static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
 {
-	if (release) {
-		put_page(page);
-		drain_all_pages(page_zone(page));
-	}
-
 	if (hugepage_or_freepage) {
 		/*
 		 * Doing this check for free pages is also fine since dissolve_free_huge_page
@@ -89,6 +84,12 @@ static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, boo
 	}
 
 	SetPageHWPoison(page);
+
+	if (release) {
+		put_page(page);
+		drain_all_pages(page_zone(page));
+	}
+
 	page_ref_inc(page);
 	num_poisoned_pages_inc();
 	return true;

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0d9f9bd0e06c..8a6ab05f074c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1173,6 +1173,16 @@ static __always_inline bool free_pages_prepare(struct page *page,
 
 	trace_mm_page_free(page, order);
 
+	if (unlikely(PageHWPoison(page)) && !order) {
+		/*
+		 * Untie memcg state and reset page's owner
+		 */
+		if (memcg_kmem_enabled() && PageKmemcg(page))
+			__memcg_kmem_uncharge_page(page, order);
+		reset_page_owner(page, order);
+		return false;
+	}
+
 	/*
 	 * Check tail pages before head page information is cleared to
 	 * avoid checking PageCompound for order-0 pages.

Patch#9:

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index c3b96ca5c86d..a1bc573d91d5 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1835,7 +1835,7 @@ static int __soft_offline_page(struct page *page)
 	if (!ret) {
 		bool release = !huge;
 
-		if (!page_handle_poison(page, true, release))
+		if (!page_handle_poison(page, huge, release))
 			ret = -EBUSY;
 	} else {
 		if (!list_empty(&pagelist))

Thanks and sorry for the inconvenience.
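[Editorial note: for reference, this is roughly how page_handle_poison()
reads once the patch#8 chunk above is applied. It is a sketch assembled
from the hunks, not a verbatim copy: the body of the hugepage_or_freepage
branch is partly elided by the diff context, so its exact shape here is an
assumption based on the visible comment and on the soft_offline_free_page()
caller shown in the full patch below. The point is the ordering: the page
is flagged before the final put_page(), so the PageHWPoison check added to
free_pages_prepare() fires and the page never reaches a pcplist/freelist.]

static bool page_handle_poison(struct page *page, bool hugepage_or_freepage,
			       bool release)
{
	if (hugepage_or_freepage) {
		/*
		 * Doing this check for free pages is also fine since
		 * dissolve_free_huge_page returns 0 for non-hugetlb pages.
		 * (Branch body assumed from the visible diff context.)
		 */
		if (dissolve_free_huge_page(page) || !take_page_off_buddy(page))
			return false;
	}

	/* Flag the page first... */
	SetPageHWPoison(page);

	/*
	 * ...then drop the last reference: free_pages_prepare() now sees
	 * PageHWPoison and backs off instead of freeing the page.
	 */
	if (release) {
		put_page(page);
		drain_all_pages(page_zone(page));
	}

	page_ref_inc(page);
	num_poisoned_pages_inc();
	return true;
}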
On Fri, 18 Sep 2020 09:58:22 +0200 osalvador@suse.de wrote:

> I just found out yesterday that the patchset Naoya sent has diverged
> from mine in some aspects that lead to some bugs [1].
> This was due to a misunderstanding, so no blame here.
> So, patch#8 and patch#9 need a little tweak [2].
>
> I was wondering what you prefer?

Well.  I (and I suspect the rest of the world) have lost track of
what's going on here.

So please let's have a full resend of the whole series?
On 2020-09-19 02:23, Andrew Morton wrote:
> On Fri, 18 Sep 2020 09:58:22 +0200 osalvador@suse.de wrote:
>
>> I just found out yesterday that the patchset Naoya sent has diverged
>> from mine in some aspects that lead to some bugs [1].
>> This was due to a misunderstanding, so no blame here.
>> So, patch#8 and patch#9 need a little tweak [2].
>>
>> I was wondering what you prefer?
>
> Well.  I (and I suspect the rest of the world) have lost track of
> what's going on here.
>
> So please let's have a full resend of the whole series?

Sure, I will resend a new version squeezing all the changes into it on
Monday.
diff --git v5.8-rc7-mmotm-2020-07-27-18-18/include/linux/page-flags.h v5.8-rc7-mmotm-2020-07-27-18-18_patched/include/linux/page-flags.h
index 9fa5d4e2d69a..d1df51ed6eeb 100644
--- v5.8-rc7-mmotm-2020-07-27-18-18/include/linux/page-flags.h
+++ v5.8-rc7-mmotm-2020-07-27-18-18_patched/include/linux/page-flags.h
@@ -422,14 +422,9 @@ PAGEFLAG_FALSE(Uncached)
 PAGEFLAG(HWPoison, hwpoison, PF_ANY)
 TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
 #define __PG_HWPOISON (1UL << PG_hwpoison)
-extern bool set_hwpoison_free_buddy_page(struct page *page);
 extern bool take_page_off_buddy(struct page *page);
 #else
 PAGEFLAG_FALSE(HWPoison)
-static inline bool set_hwpoison_free_buddy_page(struct page *page)
-{
-	return 0;
-}
 #define __PG_HWPOISON 0
 #endif

diff --git v5.8-rc7-mmotm-2020-07-27-18-18/mm/memory-failure.c v5.8-rc7-mmotm-2020-07-27-18-18_patched/mm/memory-failure.c
index 0e619012e050..95bf8aa44a9a 100644
--- v5.8-rc7-mmotm-2020-07-27-18-18/mm/memory-failure.c
+++ v5.8-rc7-mmotm-2020-07-27-18-18_patched/mm/memory-failure.c
@@ -65,8 +65,12 @@ int sysctl_memory_failure_recovery __read_mostly = 1;
 
 atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
-static void page_handle_poison(struct page *page)
+static void page_handle_poison(struct page *page, bool release)
 {
+	if (release) {
+		put_page(page);
+		drain_all_pages(page_zone(page));
+	}
 	SetPageHWPoison(page);
 	page_ref_inc(page);
 	num_poisoned_pages_inc();
@@ -1756,19 +1760,13 @@ static int soft_offline_huge_page(struct page *page, int flags)
 		ret = -EIO;
 	} else {
 		/*
-		 * We set PG_hwpoison only when the migration source hugepage
-		 * was successfully dissolved, because otherwise hwpoisoned
-		 * hugepage remains on free hugepage list, then userspace will
-		 * find it as SIGBUS by allocation failure. That's not expected
-		 * in soft-offlining.
+		 * We set PG_hwpoison only when we were able to take the page
+		 * off the buddy.
 		 */
-		ret = dissolve_free_huge_page(page);
-		if (!ret) {
-			if (set_hwpoison_free_buddy_page(page))
-				num_poisoned_pages_inc();
-			else
-				ret = -EBUSY;
-		}
+		if (!dissolve_free_huge_page(page) && take_page_off_buddy(page))
+			page_handle_poison(page, false);
+		else
+			ret = -EBUSY;
 	}
 	return ret;
 }
@@ -1807,10 +1805,8 @@ static int __soft_offline_page(struct page *page, int flags)
 	 * would need to fix isolation locking first.
 	 */
 	if (ret == 1) {
-		put_page(page);
 		pr_info("soft_offline: %#lx: invalidated\n", pfn);
-		SetPageHWPoison(page);
-		num_poisoned_pages_inc();
+		page_handle_poison(page, true);
 		return 0;
 	}
 
@@ -1841,7 +1837,9 @@ static int __soft_offline_page(struct page *page, int flags)
 		list_add(&page->lru, &pagelist);
 		ret = migrate_pages(&pagelist, alloc_migration_target, NULL,
 			(unsigned long)&mtc, MIGRATE_SYNC, MR_MEMORY_FAILURE);
-		if (ret) {
+		if (!ret) {
+			page_handle_poison(page, true);
+		} else {
 			if (!list_empty(&pagelist))
 				putback_movable_pages(&pagelist);
 
@@ -1860,27 +1858,16 @@ static int __soft_offline_page(struct page *page, int flags)
 static int soft_offline_in_use_page(struct page *page, int flags)
 {
 	int ret;
-	int mt;
 	struct page *hpage = compound_head(page);
 
 	if (!PageHuge(page) && PageTransHuge(hpage))
 		if (try_to_split_thp_page(page, "soft offline") < 0)
 			return -EBUSY;
 
-	/*
-	 * Setting MIGRATE_ISOLATE here ensures that the page will be linked
-	 * to free list immediately (not via pcplist) when released after
-	 * successful page migration. Otherwise we can't guarantee that the
-	 * page is really free after put_page() returns, so
-	 * set_hwpoison_free_buddy_page() highly likely fails.
-	 */
-	mt = get_pageblock_migratetype(page);
-	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
 	if (PageHuge(page))
 		ret = soft_offline_huge_page(page, flags);
 	else
 		ret = __soft_offline_page(page, flags);
-	set_pageblock_migratetype(page, mt);
 	return ret;
 }
 
@@ -1889,7 +1876,7 @@ static int soft_offline_free_page(struct page *page)
 	int rc = -EBUSY;
 
 	if (!dissolve_free_huge_page(page) && take_page_off_buddy(page)) {
-		page_handle_poison(page);
+		page_handle_poison(page, false);
 		rc = 0;
 	}

diff --git v5.8-rc7-mmotm-2020-07-27-18-18/mm/migrate.c v5.8-rc7-mmotm-2020-07-27-18-18_patched/mm/migrate.c
index 2c809ffcf0e1..d7a9379c343b 100644
--- v5.8-rc7-mmotm-2020-07-27-18-18/mm/migrate.c
+++ v5.8-rc7-mmotm-2020-07-27-18-18_patched/mm/migrate.c
@@ -1222,16 +1222,11 @@ static int unmap_and_move(new_page_t get_new_page,
 	 * we want to retry.
 	 */
 	if (rc == MIGRATEPAGE_SUCCESS) {
-		put_page(page);
-		if (reason == MR_MEMORY_FAILURE) {
+		if (reason != MR_MEMORY_FAILURE)
 			/*
-			 * Set PG_HWPoison on just freed page
-			 * intentionally. Although it's rather weird,
-			 * it's how HWPoison flag works at the moment.
+			 * We release the page in page_handle_poison.
 			 */
-			if (set_hwpoison_free_buddy_page(page))
-				num_poisoned_pages_inc();
-		}
+			put_page(page);
 	} else {
 		if (rc != -EAGAIN) {
 			if (likely(!__PageMovable(page))) {

diff --git v5.8-rc7-mmotm-2020-07-27-18-18/mm/page_alloc.c v5.8-rc7-mmotm-2020-07-27-18-18_patched/mm/page_alloc.c
index aab89f7db4ac..e4896e674594 100644
--- v5.8-rc7-mmotm-2020-07-27-18-18/mm/page_alloc.c
+++ v5.8-rc7-mmotm-2020-07-27-18-18_patched/mm/page_alloc.c
@@ -8843,32 +8843,4 @@ bool take_page_off_buddy(struct page *page)
 	spin_unlock_irqrestore(&zone->lock, flags);
 	return ret;
 }
-
-/*
- * Set PG_hwpoison flag if a given page is confirmed to be a free page. This
- * test is performed under the zone lock to prevent a race against page
- * allocation.
- */
-bool set_hwpoison_free_buddy_page(struct page *page)
-{
-	struct zone *zone = page_zone(page);
-	unsigned long pfn = page_to_pfn(page);
-	unsigned long flags;
-	unsigned int order;
-	bool hwpoisoned = false;
-
-	spin_lock_irqsave(&zone->lock, flags);
-	for (order = 0; order < MAX_ORDER; order++) {
-		struct page *page_head = page - (pfn & ((1 << order) - 1));
-
-		if (PageBuddy(page_head) && page_order(page_head) >= order) {
-			if (!TestSetPageHWPoison(page))
-				hwpoisoned = true;
-			break;
-		}
-	}
-	spin_unlock_irqrestore(&zone->lock, flags);
-
-	return hwpoisoned;
-}
 #endif
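[Editorial note: for anyone wanting to exercise the reworked path, the
soft-offline code is reachable from userspace via
madvise(MADV_SOFT_OFFLINE), which requires a kernel built with
CONFIG_MEMORY_FAILURE and CAP_SYS_ADMIN. A minimal sketch, not part of
the series:]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_SOFT_OFFLINE
#define MADV_SOFT_OFFLINE 101	/* from <asm-generic/mman-common.h> */
#endif

int main(void)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	char *buf;

	/* Back the mapping with a real page so it is "in use". */
	buf = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return EXIT_FAILURE;
	}
	memset(buf, 'x', pagesize);

	/*
	 * Ask the kernel to soft-offline the backing page: with this
	 * series applied, the contents are migrated and the old page is
	 * handed to page_handle_poison() instead of being returned to
	 * the buddy allocator.
	 */
	if (madvise(buf, pagesize, MADV_SOFT_OFFLINE) != 0) {
		perror("madvise(MADV_SOFT_OFFLINE)");
		return EXIT_FAILURE;
	}

	/* The mapping is still usable; the data was migrated, not lost. */
	printf("soft-offlined one page, buf[0]=%c\n", buf[0]);
	munmap(buf, pagesize);
	return 0;
}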