Message ID | 20200917081049.27428-1-osalvador@suse.de (mailing list archive) |
---|---|
Headers | show |
Series | HWpoison: further fixes and cleanups | expand |
On Thu, Sep 17, 2020 at 10:10:42AM +0200, Oscar Salvador wrote: > This patchset includes some fixups (patch#1,patch#2 and patch#3) > and some cleanups (patch#4-7). > > Patch#1 is a fix to take off HWPoison pages off a buddy freelist since > it can lead us to having HWPoison pages back in the game without no one > noticing it. > So fix it (we did that already for soft_offline_page [1]). > > Patch#2 is fixing a rebasing problem that made the call > to page_handle_poison from _soft_offline_page set the > wrong value for hugepage_or_freepage. [2] > > Patch#3 is not really a fixup, but tries to re-handle a page > in case it was allocated under us. Thanks for the update. This patchset triggers the following BUG_ON() with Aristeu's workload: [ 1010.400900] Soft offlining pfn 0xbff8c at process virtual address 0x7fe6c99c8000 [ 1010.402931] page:00000000f5670686 refcount:1 mapcount:-128 mapping:0000000000000000 index:0x1 pfn:0xbff89 [ 1010.405604] flags: 0xfffe000800000(hwpoison) [ 1010.406755] raw: 000fffe000800000 ffffcddf029ab848 ffffcddf02ff9448 0000000000000000 [ 1010.408824] raw: 0000000000000001 0000000000000000 00000001ffffff7f 0000000000000000 [ 1010.410877] page dumped because: VM_BUG_ON_PAGE(page_count(buddy) != 0) [ 1010.412673] ------------[ cut here ]------------ [ 1010.413930] kernel BUG at mm/page_alloc.c:800! [ 1010.415143] invalid opcode: 0000 [#1] SMP PTI [ 1010.416320] CPU: 3 PID: 1340 Comm: kworker/3:0 Not tainted 5.9.0-rc2-mm1-v5.9-rc2-200917-1952-00212-gf1a0765b04cb+ #33 [ 1010.419101] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 [ 1010.422645] Workqueue: mm_percpu_wq drain_local_pages_wq [ 1010.424075] RIP: 0010:__free_one_page+0x552/0x580 [ 1010.425344] Code: 48 c7 c6 90 6c 0f 84 4c 89 e7 e8 69 7e fd ff 0f 0b 0f 1f 44 00 00 e9 e5 fc ff ff 48 c7 c6 c8 f3 11 84 4c 89 f7 e8 4e 7e fd ff <0f> 0b 83 fb 08 0f 86 cb fc ff ff 48 83 c4 20 5b 5d 41 5c 41 5d 41 [ 1010.430231] RSP: 0018:ffffaa96c171fda0 EFLAGS: 00010082 [ 1010.431651] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027 [ 1010.433598] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8dc8bbd18d08 [ 1010.435627] RBP: 00000000000bff88 R08: ffff8dc8bbd18d00 R09: 6573756163656220 [ 1010.437544] R10: 6163656220646570 R11: 6d75642065676170 R12: ffffcddf02ffe200 [ 1010.439376] R13: 00000000000bff89 R14: ffffcddf02ffe240 R15: ffff8dc7bffd5680 [ 1010.441271] FS: 0000000000000000(0000) GS:ffff8dc8bbd00000(0000) knlGS:0000000000000000 [ 1010.443349] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1010.444892] CR2: 00007f6b69f92000 CR3: 0000000139c4a000 CR4: 00000000001506e0 [ 1010.446746] Call Trace: [ 1010.447424] free_pcppages_bulk+0x1d4/0x2c0 [ 1010.448553] drain_pages_zone+0x42/0x50 [ 1010.449585] drain_local_pages_wq+0xe/0x10 [ 1010.450702] process_one_work+0x1b0/0x360 [ 1010.451769] worker_thread+0x50/0x3a0 [ 1010.452940] ? process_one_work+0x360/0x360 [ 1010.454072] kthread+0xfe/0x140 [ 1010.454989] ? kthread_park+0x90/0x90 [ 1010.455970] ret_from_fork+0x22/0x30 This message seems to show that the pages to be moved to buddy have refcount. Could you review how changes in v3 -> v4 make it? Here's my reproducer. [build1:~]$ cat test_ksm_madv_soft.c #include <stdio.h> #include <string.h> #include <sys/mman.h> #include <unistd.h> #include <sys/types.h> #include <errno.h> #include <stdlib.h> #define MADV_SOFT_OFFLINE 101 #define err(x) perror(x),exit(EXIT_FAILURE) int main() { int ret; int size = 100000*0x1000; char *p1 = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); printf("p1 %p\n", p1); char *p2 = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); printf("p2 %p\n", p2); ret = madvise(p1, size, MADV_MERGEABLE); printf("madvise(p1) %d\n", ret); ret = madvise(p2, size, MADV_MERGEABLE); printf("madvise(p2) %d\n", ret); printf("writing p1 ... "); memset(p1, 'a', size); printf("done\n"); printf("writing p2 ... "); memset(p2, 'a', size); printf("done\n"); usleep(10000000); printf("soft offline\n"); ret = madvise(p1, size, MADV_SOFT_OFFLINE); printf("soft offline returns %d\n", ret); if (ret) err("madvise"); madvise(p1, size, MADV_UNMERGEABLE); madvise(p2, size, MADV_UNMERGEABLE); printf("OK\n"); } [build1:~/upstream/mm_regression/lib]$ cat tmp_run_ksm_madv.sh rm test_ksm_madv_soft 2> /dev/null gcc -o test_ksm_madv_soft test_ksm_madv_soft.c || exit 1 echo 0 > /sys/kernel/mm/ksm/sleep_millisecs echo 100000 > /sys/kernel/mm/ksm/pages_to_scan echo 100000 > /sys/kernel/mm/ksm/max_page_sharing echo 2 > /sys/kernel/mm/ksm/run echo 1 > /sys/kernel/mm/ksm/run ./test_ksm_madv_soft Thanks, Naoya Horiguchi
On Thu, Sep 17, 2020 at 11:39:21AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote: > Thanks for the update. > This patchset triggers the following BUG_ON() with Aristeu's workload: I just took a look, but I found more oddities. The patchset you sent seems to be a bit buggy and it is missing some things my patchset contains, e.g: static __always_inline bool free_pages_prepare(struct page *page, unsigned int order, bool check_free) { ... if (unlikely(PageHWPoison(page)) && !order) { ... return false; } Moreover, in page_handle_poison, you managed it wrong because: static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release) { if (release) { put_page(page); drain_all_pages(page_zone(page)); } ... SetPageHWPoison(page); page_ref_inc(page); 1) You are freeing the page first, which means it goes to buddy 2) Then you set it as poisoned and you update its refcount. Now we have a page sitting in Buddy with a refcount = 1 and poisoned, and that is quite wrong. Honestly, I do not know how your patchset diverged so much from mine, but is not right. I will go over my patchset and yours and compare/fix things.
On Thu, Sep 17, 2020 at 03:09:52PM +0200, Oscar Salvador wrote: > static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release) > { > if (release) { > put_page(page); > drain_all_pages(page_zone(page)); > } > > ... > SetPageHWPoison(page); > page_ref_inc(page); > > 1) You are freeing the page first, which means it goes to buddy > 2) Then you set it as poisoned and you update its refcount. > > Now we have a page sitting in Buddy with a refcount = 1 and poisoned, and that is quite wrong. Hi Naoya, Ok, I tested it and with the following changes on top I cannot reproduce the issue: diff --git a/mm/memory-failure.c b/mm/memory-failure.c index f68cb5e3b320..4ffaaa5c2603 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -67,11 +67,6 @@ atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0); static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release) { - if (release) { - put_page(page); - drain_all_pages(page_zone(page)); - } - if (hugepage_or_freepage) { /* * Doing this check for free pages is also fine since dissolve_free_huge_page @@ -89,6 +84,12 @@ static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, boo } SetPageHWPoison(page); + + if (release) { + put_page(page); + drain_all_pages(page_zone(page)); + } + page_ref_inc(page); num_poisoned_pages_inc(); return true; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0d9f9bd0e06c..8a6ab05f074c 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1173,6 +1173,16 @@ static __always_inline bool free_pages_prepare(struct page *page, trace_mm_page_free(page, order); + if (unlikely(PageHWPoison(page)) && !order) { + /* + * Untie memcg state and reset page's owner + */ + if (memcg_kmem_enabled() && PageKmemcg(page)) + __memcg_kmem_uncharge_page(page, order); + reset_page_owner(page, order); + return false; + } + /* * Check tail pages before head page information is cleared to * avoid checking PageCompound for order-0 pages. # sh tmp_run_ksm_madv.sh p1 0x7f6b6b08e000 p2 0x7f6b529ee000 madvise(p1) 0 madvise(p2) 0 writing p1 ... done writing p2 ... done soft offline soft offline returns 0 OK Can you try to re-run your tests and see if they come clean? If they do, I will try to see if Andrew can squezee above changes into [1], where they belong to. Otherwise I will craft a v5 containing all fixups (pretty unfortunate). [1] https://patchwork.kernel.org/patch/11704099/
On Thu, Sep 17, 2020 at 03:40:06PM +0200, Oscar Salvador wrote: > On Thu, Sep 17, 2020 at 03:09:52PM +0200, Oscar Salvador wrote: > > static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release) > > { > > if (release) { > > put_page(page); > > drain_all_pages(page_zone(page)); > > } > > > > ... > > SetPageHWPoison(page); > > page_ref_inc(page); > > > > 1) You are freeing the page first, which means it goes to buddy > > 2) Then you set it as poisoned and you update its refcount. > > > > Now we have a page sitting in Buddy with a refcount = 1 and poisoned, and that is quite wrong. > > Hi Naoya, > > Ok, I tested it and with the following changes on top I cannot reproduce the issue: > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index f68cb5e3b320..4ffaaa5c2603 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -67,11 +67,6 @@ atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0); > > static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release) > { > - if (release) { > - put_page(page); > - drain_all_pages(page_zone(page)); > - } > - > if (hugepage_or_freepage) { > /* > * Doing this check for free pages is also fine since dissolve_free_huge_page > @@ -89,6 +84,12 @@ static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, boo > } > > SetPageHWPoison(page); > + > + if (release) { > + put_page(page); > + drain_all_pages(page_zone(page)); > + } > + > page_ref_inc(page); > num_poisoned_pages_inc(); > return true; > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 0d9f9bd0e06c..8a6ab05f074c 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1173,6 +1173,16 @@ static __always_inline bool free_pages_prepare(struct page *page, > > trace_mm_page_free(page, order); > > + if (unlikely(PageHWPoison(page)) && !order) { > + /* > + * Untie memcg state and reset page's owner > + */ > + if (memcg_kmem_enabled() && PageKmemcg(page)) > + __memcg_kmem_uncharge_page(page, order); > + reset_page_owner(page, order); > + return false; > + } > + Sorry, I modified the patches based on the different assumption from yours. I firstly thought of taking page off after confirming the error page is freed back to buddy. This approach leaves the possibility of reusing the error page (which is acceptable), but simpler and less invasive one. Your approach removes the error page from page allocator's control in freeing time. It has no possibility of reusing the error page but changes are tightly coupled with page free code. This is a tradeoff between complexity and completeness of soft offline, Now I'm not sure I could persist on my own opinion without providing working code, and it's OK for me to take your one. > /* > * Check tail pages before head page information is cleared to > * avoid checking PageCompound for order-0 pages. > > ... (let me reply to older email here...) > static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release) > { > if (release) { > put_page(page); > drain_all_pages(page_zone(page)); > } > > ... > SetPageHWPoison(page); > page_ref_inc(page); > > 1) You are freeing the page first, which means it goes to buddy > 2) Then you set it as poisoned and you update its refcount. > > Now we have a page sitting in Buddy with a refcount = 1 and poisoned, and that is quite wrong. This order was correct in my approach. Isolation operation is done after confirming it's in the free list. This prevents PageHWPoison pages from being in pcpclists. But the page_ref_inc() may not be necessary in my approach. > # sh tmp_run_ksm_madv.sh > p1 0x7f6b6b08e000 > p2 0x7f6b529ee000 > madvise(p1) 0 > madvise(p2) 0 > writing p1 ... done > writing p2 ... done > soft offline > soft offline returns 0 > OK > > > Can you try to re-run your tests and see if they come clean? The test passed in my environment, so this is fine. > If they do, I will try to see if Andrew can squezee above changes into [1], > where they belong to. Yes, proposing the fix for mmhwpoison-rework-soft-offline-for-in-use-pages.patch seems fine to me. Again, sorry for modifying code without asking. Naoya Horiguchi
On 2020-09-17 17:27, HORIGUCHI NAOYA wrote: > Sorry, I modified the patches based on the different assumption from > yours. > I firstly thought of taking page off after confirming the error page > is freed back to buddy. This approach leaves the possibility of reusing > the error page (which is acceptable), but simpler and less invasive > one. > > Your approach removes the error page from page allocator's control in > freeing time. It has no possibility of reusing the error page but > changes > are tightly coupled with page free code. > > This is a tradeoff between complexity and completeness of soft offline, > Now I'm not sure I could persist on my own opinion without providing > working code, and it's OK for me to take your one. Yeah, you are right it is a trade off. I would suggest taking this path now, and if it proofs to be problematic in some way, we can always do the: free_page take_it_off_buddy OK: mark it as hwpoison and increment refcount NOT_OK (raced with allocation): oops, sorry > The test passed in my environment, so this is fine. Thanks for trying it out. > >> If they do, I will try to see if Andrew can squezee above changes into >> [1], >> where they belong to. > > Yes, proposing the fix for > mmhwpoison-rework-soft-offline-for-in-use-pages.patch > seems fine to me. > > Again, sorry for modifying code without asking. No worries, I wil do a couple of tests on my own and then I will talk to Andrew to see if we can squeeze the changes in there.
On Thu, Sep 17, 2020 at 03:27:18PM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> The test passed in my environment, so this is fine.
With everything applied, it works for me as well.
(apologies for the delay, the machine I was using stopped working and took a
while to get another one)