[2/2] mm/huge_memory: do not overkill when splitting huge_zero_page

Message ID d4fab301a5debd792527696add16132f53a80cc9.1651039624.git.xuyu@linux.alibaba.com (mailing list archive)
State New
Series mm/memory-failure: rework fix on huge_zero_page splitting

Commit Message

Xu Yu April 27, 2022, 6:10 a.m. UTC
The kernel panics when a memory failure is injected into the global
huge_zero_page with CONFIG_DEBUG_VM enabled, as follows.

  Injecting memory failure for pfn 0x109ff9 at process virtual address 0x20ff9000
  page:00000000fb053fc3 refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x109e00
  head:00000000fb053fc3 order:9 compound_mapcount:0 compound_pincount:0
  flags: 0x17fffc000010001(locked|head|node=0|zone=2|lastcpupid=0x1ffff)
  raw: 017fffc000010001 0000000000000000 dead000000000122 0000000000000000
  raw: 0000000000000000 0000000000000000 00000002ffffffff 0000000000000000
  page dumped because: VM_BUG_ON_PAGE(is_huge_zero_page(head))
  ------------[ cut here ]------------
  kernel BUG at mm/huge_memory.c:2499!
  invalid opcode: 0000 [#1] PREEMPT SMP PTI
  CPU: 6 PID: 553 Comm: split_bug Not tainted 5.18.0-rc1+ #11
  Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
  RIP: 0010:split_huge_page_to_list+0x66a/0x880
  Code: 84 9b fb ff ff 48 8b 7c 24 08 31 f6 e8 9f 5d 2a 00 b8 b8 02 00 00 e9 e8 fb ff ff 48 c7 c6 e8 47 3c 82 4c b
  RSP: 0018:ffffc90000dcbdf8 EFLAGS: 00010246
  RAX: 000000000000003c RBX: 0000000000000001 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: ffffffff823e4c4f RDI: 00000000ffffffff
  RBP: ffff88843fffdb40 R08: 0000000000000000 R09: 00000000fffeffff
  R10: ffffc90000dcbc48 R11: ffffffff82d68448 R12: ffffea0004278000
  R13: ffffffff823c6203 R14: 0000000000109ff9 R15: ffffea000427fe40
  FS:  00007fc375a26740(0000) GS:ffff88842fd80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fc3757c9290 CR3: 0000000102174006 CR4: 00000000003706e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
  try_to_split_thp_page+0x3a/0x130
  memory_failure+0x128/0x800
  madvise_inject_error.cold+0x8b/0xa1
  __x64_sys_madvise+0x54/0x60
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7fc3754f8bf9
  Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8
  RSP: 002b:00007ffeda93a1d8 EFLAGS: 00000217 ORIG_RAX: 000000000000001c
  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3754f8bf9
  RDX: 0000000000000064 RSI: 0000000000003000 RDI: 0000000020ff9000
  RBP: 00007ffeda93a200 R08: 0000000000000000 R09: 0000000000000000
  R10: 00000000ffffffff R11: 0000000000000217 R12: 0000000000400490
  R13: 00007ffeda93a2e0 R14: 0000000000000000 R15: 0000000000000000

We think that raising a BUG is overkill for splitting the huge_zero_page.
The huge_zero_page cannot be reached from normal paths other than memory
failure, but memory failure is a valid caller. So replace the BUG with a
WARN plus returning -EBUSY, so that the panic above no longer happens.

Suggested-by: Yang Shi <shy828301@gmail.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
---
 mm/huge_memory.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

HORIGUCHI NAOYA(堀口 直也) April 27, 2022, 7:12 a.m. UTC | #1
On Wed, Apr 27, 2022 at 02:10:17PM +0800, Xu Yu wrote:
> We think that raising a BUG is overkill for splitting the huge_zero_page.
> The huge_zero_page cannot be reached from normal paths other than memory
> failure, but memory failure is a valid caller. So replace the BUG with a
> WARN plus returning -EBUSY, so that the panic above no longer happens.
> 
> Suggested-by: Yang Shi <shy828301@gmail.com>
> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>

Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

What to do on -stable?
The older version was backported to 5.15.z and 5.17.z, so if you choose
to send this to stable, 1/2 should be also sent to stable.

Thanks,
Naoya Horiguchi
Xu Yu April 27, 2022, 7:37 a.m. UTC | #2
On 4/27/22 3:12 PM, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Wed, Apr 27, 2022 at 02:10:17PM +0800, Xu Yu wrote:
>> We think that raising a BUG is overkill for splitting the huge_zero_page.
>> The huge_zero_page cannot be reached from normal paths other than memory
>> failure, but memory failure is a valid caller. So replace the BUG with a
>> WARN plus returning -EBUSY, so that the panic above no longer happens.
>>
>> Suggested-by: Yang Shi <shy828301@gmail.com>
>> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
>> Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
> 
> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> What to do on -stable?
> The older version was backported to 5.15.z and 5.17.z, so if you choose
> to send this to stable, 1/2 should be also sent to stable.

IMHO, v3 can be viewed as an optimization of v2, since the older version
also fixes this bug correctly. Since the Fixes tag has already been added
to the older version, let's keep -stable on the older version.

Anyway, the older version is not bad. :)


> 
> Thanks,
> Naoya Horiguchi
kernel test robot April 27, 2022, 9:01 a.m. UTC | #3
Hi Xu,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on hnaz-mm/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Xu-Yu/mm-memory-failure-rework-fix-on-huge_zero_page-splitting/20220427-141253
base:   https://github.com/hnaz/linux-mm master
config: s390-randconfig-r044-20220425 (https://download.01.org/0day-ci/archive/20220427/202204271636.UqHlxRwk-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/988ec6e274e00e5706be7590a4a39427fbe856b1
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Xu-Yu/mm-memory-failure-rework-fix-on-huge_zero_page-splitting/20220427-141253
        git checkout 988ec6e274e00e5706be7590a4a39427fbe856b1
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=s390 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/linux/bits.h:22,
                    from include/linux/ratelimit_types.h:5,
                    from include/linux/printk.h:10,
                    from include/asm-generic/bug.h:22,
                    from arch/s390/include/asm/bug.h:68,
                    from include/linux/bug.h:5,
                    from include/linux/mmdebug.h:5,
                    from include/linux/mm.h:6,
                    from mm/huge_memory.c:8:
   mm/huge_memory.c: In function 'split_huge_page_to_list':
>> include/linux/build_bug.h:30:33: error: void value not ignored as it ought to be
      30 | #define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((__force long)(e))))
         |                                 ^
   include/linux/mmdebug.h:81:43: note: in expansion of macro 'BUILD_BUG_ON_INVALID'
      81 | #define VM_WARN_ON_ONCE_PAGE(cond, page)  BUILD_BUG_ON_INVALID(cond)
         |                                           ^~~~~~~~~~~~~~~~~~~~
   mm/huge_memory.c:2553:13: note: in expansion of macro 'VM_WARN_ON_ONCE_PAGE'
    2553 |         if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
         |             ^~~~~~~~~~~~~~~~~~~~


vim +30 include/linux/build_bug.h

527edbc18a70e74 Masahiro Yamada 2019-01-03  18  
527edbc18a70e74 Masahiro Yamada 2019-01-03  19  /* Force a compilation error if a constant expression is not a power of 2 */
527edbc18a70e74 Masahiro Yamada 2019-01-03  20  #define __BUILD_BUG_ON_NOT_POWER_OF_2(n)	\
527edbc18a70e74 Masahiro Yamada 2019-01-03  21  	BUILD_BUG_ON(((n) & ((n) - 1)) != 0)
527edbc18a70e74 Masahiro Yamada 2019-01-03  22  #define BUILD_BUG_ON_NOT_POWER_OF_2(n)			\
527edbc18a70e74 Masahiro Yamada 2019-01-03  23  	BUILD_BUG_ON((n) == 0 || (((n) & ((n) - 1)) != 0))
bc6245e5efd70c4 Ian Abbott      2017-07-10  24  
bc6245e5efd70c4 Ian Abbott      2017-07-10  25  /*
bc6245e5efd70c4 Ian Abbott      2017-07-10  26   * BUILD_BUG_ON_INVALID() permits the compiler to check the validity of the
bc6245e5efd70c4 Ian Abbott      2017-07-10  27   * expression but avoids the generation of any code, even if that expression
bc6245e5efd70c4 Ian Abbott      2017-07-10  28   * has side-effects.
bc6245e5efd70c4 Ian Abbott      2017-07-10  29   */
bc6245e5efd70c4 Ian Abbott      2017-07-10 @30  #define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((__force long)(e))))
bc6245e5efd70c4 Ian Abbott      2017-07-10  31
kernel test robot April 27, 2022, 9:36 a.m. UTC | #4
Hi Xu,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on hnaz-mm/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Xu-Yu/mm-memory-failure-rework-fix-on-huge_zero_page-splitting/20220427-141253
base:   https://github.com/hnaz/linux-mm master
config: i386-randconfig-a003-20220425 (https://download.01.org/0day-ci/archive/20220427/202204271706.mGX6CwrT-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 1cddcfdc3c683b393df1a5c9063252eb60e52818)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/988ec6e274e00e5706be7590a4a39427fbe856b1
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Xu-Yu/mm-memory-failure-rework-fix-on-huge_zero_page-splitting/20220427-141253
        git checkout 988ec6e274e00e5706be7590a4a39427fbe856b1
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> mm/huge_memory.c:2553:2: error: statement requires expression of scalar type ('void' invalid)
           if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
           ^   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   1 error generated.


vim +2553 mm/huge_memory.c

  2519	
  2520	/*
  2521	 * This function splits huge page into normal pages. @page can point to any
  2522	 * subpage of huge page to split. Split doesn't change the position of @page.
  2523	 *
  2524	 * Only caller must hold pin on the @page, otherwise split fails with -EBUSY.
  2525	 * The huge page must be locked.
  2526	 *
  2527	 * If @list is null, tail pages will be added to LRU list, otherwise, to @list.
  2528	 *
  2529	 * Both head page and tail pages will inherit mapping, flags, and so on from
  2530	 * the hugepage.
  2531	 *
  2532	 * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
  2533	 * they are not mapped.
  2534	 *
  2535	 * Returns 0 if the hugepage is split successfully.
  2536	 * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
  2537	 * us.
  2538	 */
  2539	int split_huge_page_to_list(struct page *page, struct list_head *list)
  2540	{
  2541		struct folio *folio = page_folio(page);
  2542		struct page *head = &folio->page;
  2543		struct deferred_split *ds_queue = get_deferred_split_queue(head);
  2544		XA_STATE(xas, &head->mapping->i_pages, head->index);
  2545		struct anon_vma *anon_vma = NULL;
  2546		struct address_space *mapping = NULL;
  2547		int extra_pins, ret;
  2548		pgoff_t end;
  2549	
  2550		VM_BUG_ON_PAGE(!PageLocked(head), head);
  2551		VM_BUG_ON_PAGE(!PageCompound(head), head);
  2552	
> 2553		if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
  2554			return -EBUSY;
  2555	
  2556		if (PageWriteback(head))
  2557			return -EBUSY;
  2558	
  2559		if (PageAnon(head)) {
  2560			/*
  2561			 * The caller does not necessarily hold an mmap_lock that would
  2562			 * prevent the anon_vma disappearing so we first we take a
  2563			 * reference to it and then lock the anon_vma for write. This
  2564			 * is similar to folio_lock_anon_vma_read except the write lock
  2565			 * is taken to serialise against parallel split or collapse
  2566			 * operations.
  2567			 */
  2568			anon_vma = page_get_anon_vma(head);
  2569			if (!anon_vma) {
  2570				ret = -EBUSY;
  2571				goto out;
  2572			}
  2573			end = -1;
  2574			mapping = NULL;
  2575			anon_vma_lock_write(anon_vma);
  2576		} else {
  2577			mapping = head->mapping;
  2578	
  2579			/* Truncated ? */
  2580			if (!mapping) {
  2581				ret = -EBUSY;
  2582				goto out;
  2583			}
  2584	
  2585			xas_split_alloc(&xas, head, compound_order(head),
  2586					mapping_gfp_mask(mapping) & GFP_RECLAIM_MASK);
  2587			if (xas_error(&xas)) {
  2588				ret = xas_error(&xas);
  2589				goto out;
  2590			}
  2591	
  2592			anon_vma = NULL;
  2593			i_mmap_lock_read(mapping);
  2594	
  2595			/*
  2596			 *__split_huge_page() may need to trim off pages beyond EOF:
  2597			 * but on 32-bit, i_size_read() takes an irq-unsafe seqlock,
  2598			 * which cannot be nested inside the page tree lock. So note
  2599			 * end now: i_size itself may be changed at any moment, but
  2600			 * head page lock is good enough to serialize the trimming.
  2601			 */
  2602			end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE);
  2603			if (shmem_mapping(mapping))
  2604				end = shmem_fallocend(mapping->host, end);
  2605		}
  2606	
  2607		/*
  2608		 * Racy check if we can split the page, before unmap_page() will
  2609		 * split PMDs
  2610		 */
  2611		if (!can_split_folio(folio, &extra_pins)) {
  2612			ret = -EBUSY;
  2613			goto out_unlock;
  2614		}
  2615	
  2616		unmap_page(head);
  2617	
  2618		/* block interrupt reentry in xa_lock and spinlock */
  2619		local_irq_disable();
  2620		if (mapping) {
  2621			/*
  2622			 * Check if the head page is present in page cache.
  2623			 * We assume all tail are present too, if head is there.
  2624			 */
  2625			xas_lock(&xas);
  2626			xas_reset(&xas);
  2627			if (xas_load(&xas) != head)
  2628				goto fail;
  2629		}
  2630	
  2631		/* Prevent deferred_split_scan() touching ->_refcount */
  2632		spin_lock(&ds_queue->split_queue_lock);
  2633		if (page_ref_freeze(head, 1 + extra_pins)) {
  2634			if (!list_empty(page_deferred_list(head))) {
  2635				ds_queue->split_queue_len--;
  2636				list_del(page_deferred_list(head));
  2637			}
  2638			spin_unlock(&ds_queue->split_queue_lock);
  2639			if (mapping) {
  2640				int nr = thp_nr_pages(head);
  2641	
  2642				xas_split(&xas, head, thp_order(head));
  2643				if (PageSwapBacked(head)) {
  2644					__mod_lruvec_page_state(head, NR_SHMEM_THPS,
  2645								-nr);
  2646				} else {
  2647					__mod_lruvec_page_state(head, NR_FILE_THPS,
  2648								-nr);
  2649					filemap_nr_thps_dec(mapping);
  2650				}
  2651			}
  2652	
  2653			__split_huge_page(page, list, end);
  2654			ret = 0;
  2655		} else {
  2656			spin_unlock(&ds_queue->split_queue_lock);
  2657	fail:
  2658			if (mapping)
  2659				xas_unlock(&xas);
  2660			local_irq_enable();
  2661			remap_page(folio, folio_nr_pages(folio));
  2662			ret = -EBUSY;
  2663		}
  2664	
  2665	out_unlock:
  2666		if (anon_vma) {
  2667			anon_vma_unlock_write(anon_vma);
  2668			put_anon_vma(anon_vma);
  2669		}
  2670		if (mapping)
  2671			i_mmap_unlock_read(mapping);
  2672	out:
  2673		/* Free any memory we didn't use */
  2674		xas_nomem(&xas, 0);
  2675		count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
  2676		return ret;
  2677	}
  2678
Xu Yu April 27, 2022, 9:48 a.m. UTC | #5
Thanks!

Sorry, I tested only with CONFIG_DEBUG_VM enabled. This build error is
triggered only when CONFIG_DEBUG_VM is disabled.

PATCH 2/2 has been resent.
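
The dependence on CONFIG_DEBUG_VM can be modelled in user space. What follows
is a minimal sketch, not the kernel's code: the MODEL_-prefixed macros,
model_vm_warn_debug(), probe() and the check_*() helpers are hypothetical
names introduced only for illustration, and the debug flavour is reduced to a
plain function that warns and returns the condition. The non-debug flavour
mirrors the BUILD_BUG_ON_INVALID() expansion quoted in the build report
(without the __force annotation): it type-checks its argument inside sizeof,
evaluates nothing, and has type void, which is why it cannot serve as an
`if` condition.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Simplified stand-ins for the two flavours of the warning macro.
 * Hypothetical names; the real definitions live in
 * include/linux/mmdebug.h and include/linux/build_bug.h.
 */

/* Flavour without CONFIG_DEBUG_VM: type-check only, void result. */
#define MODEL_BUILD_BUG_ON_INVALID(e)	((void)(sizeof((long)(e))))
#define MODEL_VM_WARN_NODEBUG(cond)	MODEL_BUILD_BUG_ON_INVALID(cond)

/* Flavour with CONFIG_DEBUG_VM, reduced to "warn and yield the condition". */
static int model_vm_warn_debug(int cond)
{
	if (cond)
		fprintf(stderr, "WARNING: unexpected page state\n");
	return !!cond;
}

static int probe_calls;

/* Hypothetical predicate with a visible side effect. */
static int probe(void)
{
	probe_calls++;
	return 1;
}

/* Debug flavour: valid as an `if` condition because it yields an int. */
static int check_debug(void)
{
	if (model_vm_warn_debug(probe()))
		return -EBUSY;
	return 0;
}

/*
 * Non-debug flavour: only valid as a statement. Writing
 * `if (MODEL_VM_WARN_NODEBUG(probe()))` is rejected by the compiler
 * because the expansion has type void.
 */
static int check_nodebug(void)
{
	int before = probe_calls;

	MODEL_VM_WARN_NODEBUG(probe());
	assert(probe_calls == before);	/* sizeof did not evaluate probe() */
	return 0;
}
```

In this model, check_debug() calls probe() once and returns -EBUSY, while
check_nodebug() leaves probe_calls unchanged, matching the "avoids the
generation of any code" comment above BUILD_BUG_ON_INVALID().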

Andrew Morton April 27, 2022, 7 p.m. UTC | #6
On Wed, 27 Apr 2022 07:12:32 +0000 HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote:

> What to do on -stable?
> The older version was backported to 5.15.z and 5.17.z, so if you choose
> to send this to stable, 1/2 should be also sent to stable.

I added

Fixes: d173d5417fb ("mm/memory-failure.c: skip huge_zero_page in memory_failure()")
Fixes: 6a46079cf57a ("HWPOISON: The high level memory error handler in the VM v7")
Cc: <stable@vger.kernel.org>

to both patches.  I think -stable people will be able to sort that out.
kernel test robot April 28, 2022, 1:59 a.m. UTC | #7
Hi Xu,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on hnaz-mm/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Xu-Yu/mm-memory-failure-rework-fix-on-huge_zero_page-splitting/20220427-141253
base:   https://github.com/hnaz/linux-mm master
config: arc-randconfig-r005-20220425 (https://download.01.org/0day-ci/archive/20220428/202204280339.5Akc9USp-lkp@intel.com/config)
compiler: arc-elf-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/988ec6e274e00e5706be7590a4a39427fbe856b1
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Xu-Yu/mm-memory-failure-rework-fix-on-huge_zero_page-splitting/20220427-141253
        git checkout 988ec6e274e00e5706be7590a4a39427fbe856b1
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=arc SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/asm-generic/bug.h:5,
                    from arch/arc/include/asm/bug.h:30,
                    from include/linux/bug.h:5,
                    from include/linux/mmdebug.h:5,
                    from include/linux/mm.h:6,
                    from mm/huge_memory.c:8:
   mm/huge_memory.c: In function 'split_huge_page_to_list':
>> include/linux/compiler.h:56:45: error: invalid use of void expression
      56 | #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
         |                                             ^
   include/linux/compiler.h:58:52: note: in definition of macro '__trace_if_var'
      58 | #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
         |                                                    ^~~~
   mm/huge_memory.c:2553:9: note: in expansion of macro 'if'
    2553 |         if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
         |         ^~
>> include/linux/compiler.h:56:45: error: invalid use of void expression
      56 | #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
         |                                             ^
   include/linux/compiler.h:58:61: note: in definition of macro '__trace_if_var'
      58 | #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
         |                                                             ^~~~
   mm/huge_memory.c:2553:9: note: in expansion of macro 'if'
    2553 |         if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
         |         ^~
>> include/linux/compiler.h:56:45: error: invalid use of void expression
      56 | #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
         |                                             ^
   include/linux/compiler.h:69:10: note: in definition of macro '__trace_if_value'
      69 |         (cond) ?                                        \
         |          ^~~~
   include/linux/compiler.h:56:28: note: in expansion of macro '__trace_if_var'
      56 | #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
         |                            ^~~~~~~~~~~~~~
   mm/huge_memory.c:2553:9: note: in expansion of macro 'if'
    2553 |         if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
         |         ^~


vim +56 include/linux/compiler.h

2bcd521a684cc94 Steven Rostedt 2008-11-21  50  
2bcd521a684cc94 Steven Rostedt 2008-11-21  51  #ifdef CONFIG_PROFILE_ALL_BRANCHES
2bcd521a684cc94 Steven Rostedt 2008-11-21  52  /*
2bcd521a684cc94 Steven Rostedt 2008-11-21  53   * "Define 'is'", Bill Clinton
2bcd521a684cc94 Steven Rostedt 2008-11-21  54   * "Define 'if'", Steven Rostedt
2bcd521a684cc94 Steven Rostedt 2008-11-21  55   */
a15fd609ad53a63 Linus Torvalds 2019-03-20 @56  #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
a15fd609ad53a63 Linus Torvalds 2019-03-20  57

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c468fee595ff..3bb464509518 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2496,10 +2496,12 @@  int split_huge_page_to_list(struct page *page, struct list_head *list)
 	int extra_pins, ret;
 	pgoff_t end;
 
-	VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
 	VM_BUG_ON_PAGE(!PageLocked(head), head);
 	VM_BUG_ON_PAGE(!PageCompound(head), head);
 
+	if (VM_WARN_ON_ONCE_PAGE(is_huge_zero_page(head), head))
+		return -EBUSY;
+
 	if (PageWriteback(head))
 		return -EBUSY;
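
As the build reports in the comments show, the `if (VM_WARN_ON_ONCE_PAGE(...))`
form above does not compile when CONFIG_DEBUG_VM is disabled. One shape that
would compile in both configurations is to evaluate the predicate once into a
local variable, use the warning macro as a plain statement, and branch on the
variable. The sketch below models that shape in user space. It is an
assumption about a workable form, not a statement of what the resent patch
contains; struct model_page, MODEL_DEBUG_VM, MODEL_VM_WARN_ON_ONCE_PAGE() and
model_split_huge_page() are hypothetical names, and the macros are simplified
stand-ins for the kernel ones.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct page; only the field this sketch needs. */
struct model_page {
	bool huge_zero;
};

#ifdef MODEL_DEBUG_VM
/* Debug flavour: warn the first time the condition holds. */
#define MODEL_VM_WARN_ON_ONCE_PAGE(cond, page)				\
	do {								\
		static bool warned;					\
		if ((cond) && !warned) {				\
			warned = true;					\
			fprintf(stderr, "WARNING: page %p\n",		\
				(void *)(page));			\
		}							\
	} while (0)
#else
/* Non-debug flavour: type-check only, no code generated, void result. */
#define MODEL_VM_WARN_ON_ONCE_PAGE(cond, page)	((void)(sizeof((long)(cond))))
#endif

/* Models only the early-exit check, not the split itself. */
static int model_split_huge_page(struct model_page *head)
{
	bool is_hzp;

	assert(head != NULL);

	is_hzp = head->huge_zero;
	MODEL_VM_WARN_ON_ONCE_PAGE(is_hzp, head);	/* statement, never a condition */
	if (is_hzp)
		return -EBUSY;

	return 0;
}
```

Building the sketch with and without -DMODEL_DEBUG_VM gives the same return
values; only the warning output differs, so callers such as the memory failure
path would see -EBUSY in either configuration.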