Message ID | 20250113082718.1872494-3-mawupeng1@huawei.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm: memory_failure: unmap poisoned filio during migrate properly | expand |
On 13.01.25 09:27, Wupeng Ma wrote: > From: Ma Wupeng <mawupeng1@huawei.com> > > Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to > be offlined) add page poison checks in do_migrate_range in order to make > offline hwpoisoned page possible by introducing isolate_lru_page and > try_to_unmap for hwpoisoned page. However folio lock must be held before > calling try_to_unmap. Add it to fix this problem. > > Waring will be produced if filio is not locked during unmap: > > ------------[ cut here ]------------ > kernel BUG at ./include/linux/swapops.h:400! > Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP > Modules linked in: > CPU: 4 UID: 0 PID: 411 Comm: bash Tainted: G W 6.13.0-rc1-00016-g3c434c7ee82a-dirty #41 > Tainted: [W]=WARN > Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 > pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > pc : try_to_unmap_one+0xb08/0xd3c > lr : try_to_unmap_one+0x3dc/0xd3c > Call trace: > try_to_unmap_one+0xb08/0xd3c (P) > try_to_unmap_one+0x3dc/0xd3c (L) > rmap_walk_anon+0xdc/0x1f8 > rmap_walk+0x3c/0x58 > try_to_unmap+0x88/0x90 > unmap_poisoned_folio+0x30/0xa8 > do_migrate_range+0x4a0/0x568 > offline_pages+0x5a4/0x670 > memory_block_action+0x17c/0x374 > memory_subsys_offline+0x3c/0x78 > device_offline+0xa4/0xd0 > state_store+0x8c/0xf0 > dev_attr_store+0x18/0x2c > sysfs_kf_write+0x44/0x54 > kernfs_fop_write_iter+0x118/0x1a8 > vfs_write+0x3a8/0x4bc > ksys_write+0x6c/0xf8 > __arm64_sys_write+0x1c/0x28 > invoke_syscall+0x44/0x100 > el0_svc_common.constprop.0+0x40/0xe0 > do_el0_svc+0x1c/0x28 > el0_svc+0x30/0xd0 > el0t_64_sync_handler+0xc8/0xcc > el0t_64_sync+0x198/0x19c > Code: f9407be0 b5fff320 d4210000 17ffff97 (d4210000) > ---[ end trace 0000000000000000 ]--- > > Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined") > Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> > --- > mm/memory_hotplug.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 330668d37e44..9bedecfc3577 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -1805,8 +1805,12 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) > (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) { > if (WARN_ON(folio_test_lru(folio))) > folio_isolate_lru(folio); > - if (folio_mapped(folio)) > + if (folio_mapped(folio)) { > + folio_lock(folio); > unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK | TTU_HWPOISON); > + folio_unlock(folio); > + } The comment above says "have elevated reference counts", but I I wonder if this code could race with un-poisoning (although probably a rare event). If there is an elevated reference already, why not move that chunk after the folio_try_get() and just drop the comment that describes the implied magic? I mean, migration of hwpoison is dangerous either way, so that's not the biggest problem I guess :)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 330668d37e44..9bedecfc3577 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1805,8 +1805,12 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) { if (WARN_ON(folio_test_lru(folio))) folio_isolate_lru(folio); - if (folio_mapped(folio)) + if (folio_mapped(folio)) { + folio_lock(folio); unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK | TTU_HWPOISON); + folio_unlock(folio); + } + continue; }