Message ID | d4fab301a5debd792527696add16132f53a80cc9.1651039624.git.xuyu@linux.alibaba.com |
---|---|
State | New |
Series | mm/memory-failure: rework fix on huge_zero_page splitting |
On Wed, Apr 27, 2022 at 02:10:17PM +0800, Xu Yu wrote:
> Kernel panic when injecting memory_failure for the global
> huge_zero_page, when CONFIG_DEBUG_VM is enabled, as follows.
>
> Injecting memory failure for pfn 0x109ff9 at process virtual address 0x20ff9000
> [...]
>
> We think that raising BUG is overkilling for splitting huge_zero_page,
> the huge_zero_page can't be met from normal paths other than memory
> failure, but memory failure is a valid caller. So we tend to replace the
> BUG to WARN + returning -EBUSY, and thus the panic above won't happen
> again.
>
> Suggested-by: Yang Shi <shy828301@gmail.com>
> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>

Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

What to do on -stable?
The older version was backported to 5.15.z and 5.17.z, so if you choose
to send this to stable, 1/2 should be also sent to stable.

Thanks,
Naoya Horiguchi
On 4/27/22 3:12 PM, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Wed, Apr 27, 2022 at 02:10:17PM +0800, Xu Yu wrote:
>> Kernel panic when injecting memory_failure for the global
>> huge_zero_page, when CONFIG_DEBUG_VM is enabled, as follows.
>> [...]
>> Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
>
> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>
> What to do on -stable?
> The older version was backported to 5.15.z and 5.17.z, so if you choose
> to send this to stable, 1/2 should be also sent to stable.

IMHO, I would view v3 as an optimization of v2, since the older version
also fixes this bug correctly.

Since the Fixes tag has already been added to the older version, let's
keep the older version for -stable.

Anyway, the older version is not bad. :)

>
> Thanks,
> Naoya Horiguchi
Hi Xu,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on hnaz-mm/master]
url: https://github.com/intel-lab-lkp/linux/commits/Xu-Yu/mm-memory-failure-rework-fix-on-huge_zero_page-splitting/20220427-141253
base: https://github.com/hnaz/linux-mm master
config: s390-randconfig-r044-20220425 (https://download.01.org/0day-ci/archive/20220427/202204271636.UqHlxRwk-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/988ec6e274e00e5706be7590a4a39427fbe856b1
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Xu-Yu/mm-memory-failure-rework-fix-on-huge_zero_page-splitting/20220427-141253
git checkout 988ec6e274e00e5706be7590a4a39427fbe856b1
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=s390 SHELL=/bin/bash
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
In file included from include/linux/bits.h:22,
from include/linux/ratelimit_types.h:5,
from include/linux/printk.h:10,
from include/asm-generic/bug.h:22,
from arch/s390/include/asm/bug.h:68,
from include/linux/bug.h:5,
from include/linux/mmdebug.h:5,
from include/linux/mm.h:6,
from mm/huge_memory.c:8:
mm/huge_memory.c: In function 'split_huge_page_to_list':
>> include/linux/build_bug.h:30:33: error: void value not ignored as it ought to be
30 | #define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((__force long)(e))))
| ^
include/linux/mmdebug.h:81:43: note: in expansion of macro 'BUILD_BUG_ON_INVALID'
81 | #define VM_WARN_ON_ONCE_PAGE(cond, page) BUILD_BUG_ON_INVALID(cond)
| ^~~~~~~~~~~~~~~~~~~~
mm/huge_memory.c:2553:13: note: in expansion of macro 'VM_WARN_ON_ONCE_PAGE'
2553 | if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
| ^~~~~~~~~~~~~~~~~~~~
vim +30 include/linux/build_bug.h
527edbc18a70e74 Masahiro Yamada 2019-01-03 18
527edbc18a70e74 Masahiro Yamada 2019-01-03 19 /* Force a compilation error if a constant expression is not a power of 2 */
527edbc18a70e74 Masahiro Yamada 2019-01-03 20 #define __BUILD_BUG_ON_NOT_POWER_OF_2(n) \
527edbc18a70e74 Masahiro Yamada 2019-01-03 21 BUILD_BUG_ON(((n) & ((n) - 1)) != 0)
527edbc18a70e74 Masahiro Yamada 2019-01-03 22 #define BUILD_BUG_ON_NOT_POWER_OF_2(n) \
527edbc18a70e74 Masahiro Yamada 2019-01-03 23 BUILD_BUG_ON((n) == 0 || (((n) & ((n) - 1)) != 0))
bc6245e5efd70c4 Ian Abbott 2017-07-10 24
bc6245e5efd70c4 Ian Abbott 2017-07-10 25 /*
bc6245e5efd70c4 Ian Abbott 2017-07-10 26 * BUILD_BUG_ON_INVALID() permits the compiler to check the validity of the
bc6245e5efd70c4 Ian Abbott 2017-07-10 27 * expression but avoids the generation of any code, even if that expression
bc6245e5efd70c4 Ian Abbott 2017-07-10 28 * has side-effects.
bc6245e5efd70c4 Ian Abbott 2017-07-10 29 */
bc6245e5efd70c4 Ian Abbott 2017-07-10 @30 #define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((__force long)(e))))
bc6245e5efd70c4 Ian Abbott 2017-07-10 31
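The s390 error above comes from the !CONFIG_DEBUG_VM expansion. There, `VM_WARN_ON_ONCE_PAGE(cond, page)` becomes `BUILD_BUG_ON_INVALID(cond)`, i.e. `((void)(sizeof((__force long)(cond))))`. That expression has type `void`, so it cannot serve as the condition of an `if`.

The userspace sketch below mimics both expansions with hypothetical stand-in macros (`WARN_ONCE_DEBUG`, `WARN_ONCE_NODEBUG`; these are not kernel APIs). It shows one pattern that compiles under either expansion: evaluate the condition into a local, use the warning macro only as a statement, then branch on the local. This is an illustration under those assumptions, not necessarily what the resent PATCH 2/2 does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins, not kernel APIs. With CONFIG_DEBUG_VM the kernel's
 * VM_WARN_ON_ONCE_PAGE() yields the value of its condition; without it, the
 * macro collapses to BUILD_BUG_ON_INVALID(cond), whose type is void. */
static int warnings_emitted;

static bool warn_once_debug(bool cond)
{
	static bool warned; /* one-shot, like the *_ONCE kernel macros */

	if (cond && !warned) {
		warned = true;
		warnings_emitted++;
		fprintf(stderr, "WARNING: huge zero page passed to split\n");
	}
	return cond;
}

#define WARN_ONCE_DEBUG(cond)   warn_once_debug(!!(cond))
#define WARN_ONCE_NODEBUG(cond) ((void)(sizeof((long)(cond))))

/* Does not compile: the controlling expression of the if has type void.
 *	if (WARN_ONCE_NODEBUG(is_zero))
 *		return -EBUSY;
 */

/* Compiles with either expansion: the macro is only used as a statement. */
static int split_check_debug(bool is_zero)
{
	bool is_hzp = is_zero;

	WARN_ONCE_DEBUG(is_hzp);
	if (is_hzp)
		return -EBUSY;
	return 0;
}

static int split_check_nodebug(bool is_zero)
{
	bool is_hzp = is_zero;

	WARN_ONCE_NODEBUG(is_hzp); /* type-checks is_hzp, generates no code */
	if (is_hzp)
		return -EBUSY;
	return 0;
}
```

Both variants return -EBUSY for the zero page. Only the debug variant emits a warning, and it does so once.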
Hi Xu,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on hnaz-mm/master]

url: https://github.com/intel-lab-lkp/linux/commits/Xu-Yu/mm-memory-failure-rework-fix-on-huge_zero_page-splitting/20220427-141253
base: https://github.com/hnaz/linux-mm master
config: i386-randconfig-a003-20220425 (https://download.01.org/0day-ci/archive/20220427/202204271706.mGX6CwrT-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 1cddcfdc3c683b393df1a5c9063252eb60e52818)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/988ec6e274e00e5706be7590a4a39427fbe856b1
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Xu-Yu/mm-memory-failure-rework-fix-on-huge_zero_page-splitting/20220427-141253
git checkout 988ec6e274e00e5706be7590a4a39427fbe856b1
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> mm/huge_memory.c:2553:2: error: statement requires expression of scalar type ('void' invalid)
if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
^   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

vim +2553 mm/huge_memory.c

  2519	
  2520	/*
  2521	 * This function splits huge page into normal pages. @page can point to any
  2522	 * subpage of huge page to split. Split doesn't change the position of @page.
  2523	 *
  2524	 * Only caller must hold pin on the @page, otherwise split fails with -EBUSY.
  2525	 * The huge page must be locked.
  2526	 *
  2527	 * If @list is null, tail pages will be added to LRU list, otherwise, to @list.
  2528	 *
  2529	 * Both head page and tail pages will inherit mapping, flags, and so on from
  2530	 * the hugepage.
  2531	 *
  2532	 * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
  2533	 * they are not mapped.
  2534	 *
  2535	 * Returns 0 if the hugepage is split successfully.
  2536	 * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
  2537	 * us.
  2538	 */
  2539	int split_huge_page_to_list(struct page *page, struct list_head *list)
  2540	{
  2541		struct folio *folio = page_folio(page);
  2542		struct page *head = &folio->page;
  2543		struct deferred_split *ds_queue = get_deferred_split_queue(head);
  2544		XA_STATE(xas, &head->mapping->i_pages, head->index);
  2545		struct anon_vma *anon_vma = NULL;
  2546		struct address_space *mapping = NULL;
  2547		int extra_pins, ret;
  2548		pgoff_t end;
  2549	
  2550		VM_BUG_ON_PAGE(!PageLocked(head), head);
  2551		VM_BUG_ON_PAGE(!PageCompound(head), head);
  2552	
> 2553		if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
  2554			return -EBUSY;
  2555	
  2556		if (PageWriteback(head))
  2557			return -EBUSY;
  2558	
  2559		if (PageAnon(head)) {
  2560			/*
  2561			 * The caller does not necessarily hold an mmap_lock that would
  2562			 * prevent the anon_vma disappearing so we first we take a
  2563			 * reference to it and then lock the anon_vma for write. This
  2564			 * is similar to folio_lock_anon_vma_read except the write lock
  2565			 * is taken to serialise against parallel split or collapse
  2566			 * operations.
  2567			 */
  2568			anon_vma = page_get_anon_vma(head);
  2569			if (!anon_vma) {
  2570				ret = -EBUSY;
  2571				goto out;
  2572			}
  2573			end = -1;
  2574			mapping = NULL;
  2575			anon_vma_lock_write(anon_vma);
  2576		} else {
  2577			mapping = head->mapping;
  2578	
  2579			/* Truncated ? */
  2580			if (!mapping) {
  2581				ret = -EBUSY;
  2582				goto out;
  2583			}
  2584	
  2585			xas_split_alloc(&xas, head, compound_order(head),
  2586					mapping_gfp_mask(mapping) & GFP_RECLAIM_MASK);
  2587			if (xas_error(&xas)) {
  2588				ret = xas_error(&xas);
  2589				goto out;
  2590			}
  2591	
  2592			anon_vma = NULL;
  2593			i_mmap_lock_read(mapping);
  2594	
  2595			/*
  2596			 *__split_huge_page() may need to trim off pages beyond EOF:
  2597			 * but on 32-bit, i_size_read() takes an irq-unsafe seqlock,
  2598			 * which cannot be nested inside the page tree lock. So note
  2599			 * end now: i_size itself may be changed at any moment, but
  2600			 * head page lock is good enough to serialize the trimming.
  2601			 */
  2602			end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE);
  2603			if (shmem_mapping(mapping))
  2604				end = shmem_fallocend(mapping->host, end);
  2605		}
  2606	
  2607		/*
  2608		 * Racy check if we can split the page, before unmap_page() will
  2609		 * split PMDs
  2610		 */
  2611		if (!can_split_folio(folio, &extra_pins)) {
  2612			ret = -EBUSY;
  2613			goto out_unlock;
  2614		}
  2615	
  2616		unmap_page(head);
  2617	
  2618		/* block interrupt reentry in xa_lock and spinlock */
  2619		local_irq_disable();
  2620		if (mapping) {
  2621			/*
  2622			 * Check if the head page is present in page cache.
  2623			 * We assume all tail are present too, if head is there.
  2624			 */
  2625			xas_lock(&xas);
  2626			xas_reset(&xas);
  2627			if (xas_load(&xas) != head)
  2628				goto fail;
  2629		}
  2630	
  2631		/* Prevent deferred_split_scan() touching ->_refcount */
  2632		spin_lock(&ds_queue->split_queue_lock);
  2633		if (page_ref_freeze(head, 1 + extra_pins)) {
  2634			if (!list_empty(page_deferred_list(head))) {
  2635				ds_queue->split_queue_len--;
  2636				list_del(page_deferred_list(head));
  2637			}
  2638			spin_unlock(&ds_queue->split_queue_lock);
  2639			if (mapping) {
  2640				int nr = thp_nr_pages(head);
  2641	
  2642				xas_split(&xas, head, thp_order(head));
  2643				if (PageSwapBacked(head)) {
  2644					__mod_lruvec_page_state(head, NR_SHMEM_THPS,
  2645								-nr);
  2646				} else {
  2647					__mod_lruvec_page_state(head, NR_FILE_THPS,
  2648								-nr);
  2649					filemap_nr_thps_dec(mapping);
  2650				}
  2651			}
  2652	
  2653			__split_huge_page(page, list, end);
  2654			ret = 0;
  2655		} else {
  2656			spin_unlock(&ds_queue->split_queue_lock);
  2657	fail:
  2658			if (mapping)
  2659				xas_unlock(&xas);
  2660			local_irq_enable();
  2661			remap_page(folio, folio_nr_pages(folio));
  2662			ret = -EBUSY;
  2663		}
  2664	
  2665	out_unlock:
  2666		if (anon_vma) {
  2667			anon_vma_unlock_write(anon_vma);
  2668			put_anon_vma(anon_vma);
  2669		}
  2670		if (mapping)
  2671			i_mmap_unlock_read(mapping);
  2672	out:
  2673		/* Free any memory we didn't use */
  2674		xas_nomem(&xas, 0);
  2675		count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
  2676		return ret;
  2677	}
  2678	
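In the listing, the split only proceeds if `page_ref_freeze(head, 1 + extra_pins)` succeeds. Otherwise the function unwinds and returns -EBUSY, which matches the comment that only the caller may hold a pin. As far as I know, the kernel implements the freeze as an atomic compare-and-exchange of `_refcount` from the expected value to 0.

The sketch below models that with C11 atomics. `struct fake_page`, `ref_freeze()`, `ref_unfreeze()` and `try_split()` are hypothetical names. Everything else the real function does (locking, unmapping, the page-cache split) is omitted, and restoring the count afterwards is a simplification.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a compound page: only the reference count matters. */
struct fake_page {
	atomic_int refcount;
};

/* Succeeds only if the count is exactly `expected`: it is swapped to 0 in
 * one atomic step, so a concurrent reference holder makes the freeze fail. */
static bool ref_freeze(struct fake_page *p, int expected)
{
	int old = expected;

	return atomic_compare_exchange_strong(&p->refcount, &old, 0);
}

/* Simplification: put the count back once the (omitted) work is done. */
static void ref_unfreeze(struct fake_page *p, int count)
{
	atomic_store(&p->refcount, count);
}

/* Models the decision in split_huge_page_to_list(): one reference for the
 * caller's pin plus `extra_pins` expected ones (page-cache or swap-cache
 * references, as computed by can_split_folio() in the real code). */
static int try_split(struct fake_page *p, int extra_pins)
{
	if (!ref_freeze(p, 1 + extra_pins))
		return -EBUSY; /* an unexpected reference is held */
	/* ... the real code splits the page here ... */
	ref_unfreeze(p, 1 + extra_pins);
	return 0;
}
```

A failed freeze leaves the count untouched, so the caller simply sees -EBUSY and can retry later.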
Thanks! Sorry, I tested only with CONFIG_DEBUG_VM enabled; this issue is
triggered when CONFIG_DEBUG_VM is disabled. PATCH 2/2 has been resent.

On 4/27/22 5:01 PM, kernel test robot wrote:
> Hi Xu,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on hnaz-mm/master]
>
> [...]
>
> mm/huge_memory.c: In function 'split_huge_page_to_list':
>>> include/linux/build_bug.h:30:33: error: void value not ignored as it ought to be
> 30 | #define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((__force long)(e))))
> | ^
> include/linux/mmdebug.h:81:43: note: in expansion of macro 'BUILD_BUG_ON_INVALID'
> 81 | #define VM_WARN_ON_ONCE_PAGE(cond, page) BUILD_BUG_ON_INVALID(cond)
> | ^~~~~~~~~~~~~~~~~~~~
> mm/huge_memory.c:2553:13: note: in expansion of macro 'VM_WARN_ON_ONCE_PAGE'
> 2553 | if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
> | ^~~~~~~~~~~~~~~~~~~~
>
> [...]
On Wed, 27 Apr 2022 07:12:32 +0000 HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote:

> What to do on -stable?
> The older version was backported to 5.15.z and 5.17.z, so if you choose
> to send this to stable, 1/2 should be also sent to stable.

I added

Fixes: d173d5417fb ("mm/memory-failure.c: skip huge_zero_page in memory_failure()")
Fixes: 6a46079cf57a ("HWPOISON: The high level memory error handler in the VM v7")
Cc: <stable@vger.kernel.org>

to both patches. I think -stable people will be able to sort that out.
Hi Xu,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on hnaz-mm/master]

url: https://github.com/intel-lab-lkp/linux/commits/Xu-Yu/mm-memory-failure-rework-fix-on-huge_zero_page-splitting/20220427-141253
base: https://github.com/hnaz/linux-mm master
config: arc-randconfig-r005-20220425 (https://download.01.org/0day-ci/archive/20220428/202204280339.5Akc9USp-lkp@intel.com/config)
compiler: arc-elf-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/988ec6e274e00e5706be7590a4a39427fbe856b1
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Xu-Yu/mm-memory-failure-rework-fix-on-huge_zero_page-splitting/20220427-141253
git checkout 988ec6e274e00e5706be7590a4a39427fbe856b1
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=arc SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

In file included from include/asm-generic/bug.h:5,
from arch/arc/include/asm/bug.h:30,
from include/linux/bug.h:5,
from include/linux/mmdebug.h:5,
from include/linux/mm.h:6,
from mm/huge_memory.c:8:
mm/huge_memory.c: In function 'split_huge_page_to_list':
>> include/linux/compiler.h:56:45: error: invalid use of void expression
56 | #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
| ^
include/linux/compiler.h:58:52: note: in definition of macro '__trace_if_var'
58 | #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
| ^~~~
mm/huge_memory.c:2553:9: note: in expansion of macro 'if'
2553 | if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
| ^~
>> include/linux/compiler.h:56:45: error: invalid use of void expression
56 | #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
| ^
include/linux/compiler.h:58:61: note: in definition of macro '__trace_if_var'
58 | #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
| ^~~~
mm/huge_memory.c:2553:9: note: in expansion of macro 'if'
2553 | if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
| ^~
>> include/linux/compiler.h:56:45: error: invalid use of void expression
56 | #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
| ^
include/linux/compiler.h:69:10: note: in definition of macro '__trace_if_value'
69 | (cond) ? \
| ^~~~
include/linux/compiler.h:56:28: note: in expansion of macro '__trace_if_var'
56 | #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
| ^~~~~~~~~~~~~~
mm/huge_memory.c:2553:9: note: in expansion of macro 'if'
2553 | if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
| ^~

vim +56 include/linux/compiler.h

2bcd521a684cc94 Steven Rostedt 2008-11-21 50
2bcd521a684cc94 Steven Rostedt 2008-11-21 51 #ifdef CONFIG_PROFILE_ALL_BRANCHES
2bcd521a684cc94 Steven Rostedt 2008-11-21 52 /*
2bcd521a684cc94 Steven Rostedt 2008-11-21 53 * "Define 'is'", Bill Clinton
2bcd521a684cc94 Steven Rostedt 2008-11-21 54 * "Define 'if'", Steven Rostedt
2bcd521a684cc94 Steven Rostedt 2008-11-21 55 */
a15fd609ad53a63 Linus Torvalds 2019-03-20 @56 #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
a15fd609ad53a63 Linus Torvalds 2019-03-20 57
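The arc failure is the same void-condition problem, surfaced through CONFIG_PROFILE_ALL_BRANCHES. That option redefines `if` so that every condition passes through `__trace_if_var(!!(cond))`. For non-constant conditions, `__trace_if_value()` then bumps a per-call-site hit/miss counter. Applying `!!` to a void expression is what gcc rejects, once per use of `cond` in the macro chain.

Below is a rough userspace approximation of the counting. It uses a hypothetical `PROFILED_COND()` macro and a hand-declared stats object instead of redefining the keyword. It also leaves out the kernel's `_ftrace_branch` section and the `__builtin_constant_p` shortcut.

```c
#include <assert.h>

/* Hypothetical per-site statistics, loosely modelled on the kernel's
 * branch-profiling data: miss_hit[0] = not taken, miss_hit[1] = taken. */
struct branch_stats {
	unsigned long miss_hit[2];
};

static int record_branch(struct branch_stats *s, int cond)
{
	s->miss_hit[cond ? 1 : 0]++;
	return cond;
}

/* The !! forces a scalar 0/1. Applying it to a void expression is exactly
 * the "invalid use of void expression" error in the report above. */
#define PROFILED_COND(stats, cond) record_branch(&(stats), !!(cond))

/* One stats object per profiled site, declared by hand here; the kernel
 * creates a static object inside each expansion instead. */
static struct branch_stats even_site;

static int count_even(const int *v, int n)
{
	int even = 0;

	for (int i = 0; i < n; i++)
		if (PROFILED_COND(even_site, v[i] % 2 == 0))
			even++;
	return even;
}
```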
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c468fee595ff..3bb464509518 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2496,10 +2496,12 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	int extra_pins, ret;
 	pgoff_t end;
 
-	VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
 	VM_BUG_ON_PAGE(!PageLocked(head), head);
 	VM_BUG_ON_PAGE(!PageCompound(head), head);
 
+	if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
+		return -EBUSY;
+
 	if (PageWriteback(head))
 		return -EBUSY;
 
The kernel panics when memory_failure is injected for the global
huge_zero_page with CONFIG_DEBUG_VM enabled, as follows:

Injecting memory failure for pfn 0x109ff9 at process virtual address 0x20ff9000
page:00000000fb053fc3 refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x109e00
head:00000000fb053fc3 order:9 compound_mapcount:0 compound_pincount:0
flags: 0x17fffc000010001(locked|head|node=0|zone=2|lastcpupid=0x1ffff)
raw: 017fffc000010001 0000000000000000 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000000000 00000002ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(is_huge_zero_page(head))
------------[ cut here ]------------
kernel BUG at mm/huge_memory.c:2499!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 6 PID: 553 Comm: split_bug Not tainted 5.18.0-rc1+ #11
Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
RIP: 0010:split_huge_page_to_list+0x66a/0x880
Code: 84 9b fb ff ff 48 8b 7c 24 08 31 f6 e8 9f 5d 2a 00 b8 b8 02 00 00 e9 e8 fb ff ff 48 c7 c6 e8 47 3c 82 4c b
RSP: 0018:ffffc90000dcbdf8 EFLAGS: 00010246
RAX: 000000000000003c RBX: 0000000000000001 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff823e4c4f RDI: 00000000ffffffff
RBP: ffff88843fffdb40 R08: 0000000000000000 R09: 00000000fffeffff
R10: ffffc90000dcbc48 R11: ffffffff82d68448 R12: ffffea0004278000
R13: ffffffff823c6203 R14: 0000000000109ff9 R15: ffffea000427fe40
FS: 00007fc375a26740(0000) GS:ffff88842fd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc3757c9290 CR3: 0000000102174006 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 try_to_split_thp_page+0x3a/0x130
 memory_failure+0x128/0x800
 madvise_inject_error.cold+0x8b/0xa1
 __x64_sys_madvise+0x54/0x60
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fc3754f8bf9
Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8
RSP: 002b:00007ffeda93a1d8 EFLAGS: 00000217 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3754f8bf9
RDX: 0000000000000064 RSI: 0000000000003000 RDI: 0000000020ff9000
RBP: 00007ffeda93a200 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000217 R12: 0000000000400490
R13: 00007ffeda93a2e0 R14: 0000000000000000 R15: 0000000000000000

We think that raising BUG is overkill for splitting huge_zero_page. The
huge_zero_page can't be reached from normal paths other than memory
failure, but memory failure is a valid caller. So replace the BUG with
WARN + returning -EBUSY, so that the panic above won't happen again.

Suggested-by: Yang Shi <shy828301@gmail.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
---
 mm/huge_memory.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
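The commit message argues that a WARN plus -EBUSY is enough, because memory failure is a legitimate caller. That control flow can be modelled in userspace as below.

`struct thp`, `model_split()` and `zero_page_attempts` are hypothetical. `assert()` (with NDEBUG undefined) stands in for a fatal `VM_BUG_ON_PAGE()`, and the warn-once branch approximates `VM_WARN_ON_ONCE_PAGE()` with CONFIG_DEBUG_VM enabled. Only the control flow is meant to match the diff; none of the real splitting work is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the state the patch inspects. */
struct thp {
	bool locked;
	bool is_huge_zero;
};

static unsigned int zero_page_attempts;

/* Before the patch, the zero-page check was fatal. After it, the check warns
 * once and fails the split with -EBUSY. The lock precondition stays fatal,
 * as in the diff (modelled with assert()). */
static int model_split(struct thp *p)
{
	assert(p->locked); /* models VM_BUG_ON_PAGE(!PageLocked(head), head) */

	if (p->is_huge_zero) {
		if (zero_page_attempts++ == 0) /* warn only on the first hit */
			fprintf(stderr, "WARNING: refusing to split huge zero page\n");
		return -EBUSY;
	}
	return 0; /* the real split work is omitted */
}
```

Under this model, a caller such as try_to_split_thp_page() from the trace above would get an ordinary error return rather than a BUG.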