Message ID | 20220504011345.662299-13-Liam.Howlett@oracle.com (mailing list archive)
---|---
State | New
Series | Prepare for maple tree
On Wed, May 04, 2022 at 01:13:53AM +0000, Liam Howlett wrote:
> From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
>
> Remove __do_munmap() in favour of do_munmap(), do_mas_munmap(), and
> do_mas_align_munmap().
>
> do_munmap() is a wrapper to create a maple state for any callers that have
> not been converted to the maple tree.
>
> do_mas_munmap() takes a maple state to munmap a range. This is just a
> small function which checks for error conditions and aligns the end of the
> range.
>
> do_mas_align_munmap() uses the aligned range to munmap a range. It
> starts with the first VMA in the range, then finds the last VMA in the
> range. Both start and end are split if necessary. Then the VMAs are
> removed from the linked list and the mm mlock count is updated at the
> same time, followed by a single tree operation of overwriting the area
> with a NULL. Finally, the detached list is unmapped and freed.
>
> By reorganizing the munmap calls as outlined, it is now possible to
> avoid the extra work of aligning pre-aligned callers which are known to
> be safe, and to avoid extra VMA lookups or tree walks for modifications.
>
> detach_vmas_to_be_unmapped() is no longer used, so drop this code.
>
> vm_brk_flags() can just call do_mas_munmap(), as it checks for
> intersecting VMAs directly.
>
> Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

...

> +/*
> + * do_mas_align_munmap() - munmap the aligned region from @start to @end.
> + * @mas: The maple_state, ideally set up to alter the correct tree location.
> + * @vma: The starting vm_area_struct
> + * @mm: The mm_struct
> + * @start: The aligned start address to munmap.
> + * @end: The aligned end address to munmap.
> + * @uf: The userfaultfd list_head
> + * @downgrade: Set to true to attempt a write downgrade of the mmap_sem
> + *
> + * If @downgrade is true, check return code for potential release of the lock.
> + */
> +static int
> +do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
> +		    struct mm_struct *mm, unsigned long start,
> +		    unsigned long end, struct list_head *uf, bool downgrade)
> +{
> +	struct vm_area_struct *prev, *last;
> +	int error = -ENOMEM;
> +	/* we have start < vma->vm_end */
>
> -	if (mas_preallocate(&mas, vma, GFP_KERNEL))
> +	if (mas_preallocate(mas, vma, GFP_KERNEL))
>  		return -ENOMEM;
> -	prev = vma->vm_prev;
> -	/* we have start < vma->vm_end */
>
> +	mas->last = end - 1;
>  	/*
>  	 * If we need to split any vma, do it now to save pain later.
>  	 *

...

> +/*
> + * do_mas_munmap() - munmap a given range.
> + * @mas: The maple state
> + * @mm: The mm_struct
> + * @start: The start address to munmap
> + * @len: The length of the range to munmap
> + * @uf: The userfaultfd list_head
> + * @downgrade: set to true if the user wants to attempt to write_downgrade the
> + *		mmap_sem
> + *
> + * This function takes a @mas that is either pointing to the previous VMA or set
> + * to MA_START and sets it up to remove the mapping(s). The @len will be
> + * aligned and any arch_unmap work will be performed.
> + *
> + * Returns: -EINVAL on failure, 1 on success and unlock, 0 otherwise.
> + */
> +int do_mas_munmap(struct ma_state *mas, struct mm_struct *mm,
> +		  unsigned long start, size_t len, struct list_head *uf,
> +		  bool downgrade)
> +{
> +	unsigned long end;
> +	struct vm_area_struct *vma;
> +
> +	if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE-start)
> +		return -EINVAL;
> +
> +	end = start + PAGE_ALIGN(len);
> +	if (end == start)
> +		return -EINVAL;
> +
> +	/* arch_unmap() might do unmaps itself. */
> +	arch_unmap(mm, start, end);
> +
> +	/* Find the first overlapping VMA */
> +	vma = mas_find(mas, end - 1);
> +	if (!vma)
> +		return 0;
> +
> +	return do_mas_align_munmap(mas, vma, mm, start, end, uf, downgrade);
> +}
> +

...

> @@ -2845,11 +2908,12 @@ static int __vm_munmap(unsigned long start, size_t len, bool downgrade)
>  	int ret;
>  	struct mm_struct *mm = current->mm;
>  	LIST_HEAD(uf);
> +	MA_STATE(mas, &mm->mm_mt, start, start);
>
>  	if (mmap_write_lock_killable(mm))
>  		return -EINTR;
>
> -	ret = __do_munmap(mm, start, len, &uf, downgrade);
> +	ret = do_mas_munmap(&mas, mm, start, len, &uf, downgrade);
>  	/*
>  	 * Returning 1 indicates mmap_lock is downgraded.
>  	 * But 1 is not legal return value of vm_munmap() and munmap(), reset

Running a syscall fuzzer for a while could trigger those.

WARNING: CPU: 95 PID: 1329067 at mm/slub.c:3643 kmem_cache_free_bulk
CPU: 95 PID: 1329067 Comm: trinity-c32 Not tainted 5.18.0-next-20220603 #137
pstate: 10400009 (nzcV daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : kmem_cache_free_bulk
lr : mt_destroy_walk
sp : ffff80005ed66bf0
x29: ffff80005ed66bf0 x28: ffff401d6c82f050 x27: 0000000000000000
x26: dfff800000000000 x25: 0000000000000003 x24: 1ffffa97cc5fb120
x23: ffffd4be62fd8760 x22: ffff401d6c82f050 x21: 0000000000000003
x20: 0000000000000000 x19: ffff401d6c82f000 x18: ffffd4be66407d1c
x17: ffff40297ac21f0c x16: 1fffe8016136146b x15: 1fffe806c7d1ad38
x14: 1fffe8016136145e x13: 0000000000000004 x12: ffff70000bdacd8d
x11: 1ffff0000bdacd8c x10: ffff70000bdacd8c x9 : ffffd4be60d633c4
x8 : ffff80005ed66c63 x7 : 0000000000000001 x6 : 0000000000000003
x5 : ffff80005ed66c60 x4 : 0000000000000000 x3 : ffff400b09b09a80
x2 : ffff401d6c82f050 x1 : 0000000000000000 x0 : ffff07ff80014a80
Call trace:
 kmem_cache_free_bulk
 mt_destroy_walk
 mas_wmb_replace
 mas_spanning_rebalance.isra.0
 mas_wr_spanning_store.isra.0
 mas_wr_store_entry.isra.0
 mas_store_prealloc
 do_mas_align_munmap.constprop.0
 do_mas_munmap
 __vm_munmap
 __arm64_sys_munmap
 invoke_syscall
 el0_svc_common.constprop.0
 do_el0_svc
 el0_svc
 el0t_64_sync_handler
 el0t_64_sync
irq event stamp: 665580
hardirqs last enabled at (665579): kasan_quarantine_put
hardirqs last disabled at (665580): el1_dbg
softirqs last enabled at (664048): __do_softirq
softirqs last disabled at (663831): __irq_exit_rcu

BUG: KASAN: double-free or invalid-free in kmem_cache_free_bulk
CPU: 95 PID: 1329067 Comm: trinity-c32 Tainted: G W 5.18.0-next-20220603 #137
Call trace:
 dump_backtrace
 show_stack
 dump_stack_lvl
 print_address_description.constprop.0
 print_report
 kasan_report_invalid_free
 ____kasan_slab_free
 __kasan_slab_free
 slab_free_freelist_hook
 kmem_cache_free_bulk
 mas_destroy
 mas_store_prealloc
 do_mas_align_munmap.constprop.0
 do_mas_munmap
 __vm_munmap
 __arm64_sys_munmap
 invoke_syscall
 el0_svc_common.constprop.0
 do_el0_svc
 el0_svc
 el0t_64_sync_handler
 el0t_64_sync

Allocated by task 1329067:
 kasan_save_stack
 __kasan_slab_alloc
 slab_post_alloc_hook
 kmem_cache_alloc_bulk
 mas_alloc_nodes
 mas_preallocate
 __vma_adjust
 vma_merge
 mprotect_fixup
 do_mprotect_pkey.constprop.0
 __arm64_sys_mprotect
 invoke_syscall
 el0_svc_common.constprop.0
 do_el0_svc
 el0_svc
 el0t_64_sync_handler
 el0t_64_sync

Freed by task 1329067:
 kasan_save_stack
 kasan_set_track
 kasan_set_free_info
 ____kasan_slab_free
 __kasan_slab_free
 slab_free_freelist_hook
 kmem_cache_free
 mt_destroy_walk
 mas_wmb_replace
 mas_spanning_rebalance.isra.0
 mas_wr_spanning_store.isra.0
 mas_wr_store_entry.isra.0
 mas_store_prealloc
 do_mas_align_munmap.constprop.0
 do_mas_munmap
 __vm_munmap
 __arm64_sys_munmap
 invoke_syscall
 el0_svc_common.constprop.0
 do_el0_svc
 el0_svc
 el0t_64_sync_handler
 el0t_64_sync

The buggy address belongs to the object at ffff401d6c82f000
 which belongs to the cache maple_node of size 256
The buggy address is located 0 bytes inside of
 256-byte region [ffff401d6c82f000, ffff401d6c82f100)

The buggy address belongs to the physical page:
page:fffffd0075b20a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x401dec828
head:fffffd0075b20a00 order:3 compound_mapcount:0 compound_pincount:0
flags: 0x1bfffc0000010200(slab|head|node=1|zone=2|lastcpupid=0xffff)
raw: 1bfffc0000010200 fffffd00065b2a08 fffffd0006474408 ffff07ff80014a80
raw: 0000000000000000 00000000002a002a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0x1d20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL), pid 185514, tgid 185514 (trinity-c15), ts 9791681605400, free_ts 9785882037080
 post_alloc_hook
 get_page_from_freelist
 __alloc_pages
 alloc_pages
 allocate_slab
 new_slab
 ___slab_alloc
 __slab_alloc.constprop.0
 kmem_cache_alloc
 mas_alloc_nodes
 mas_preallocate
 __vma_adjust
 vma_merge
 mlock_fixup
 apply_mlockall_flags
 __arm64_sys_munlockall
page last free stack trace:
 free_pcp_prepare
 free_unref_page
 __free_pages
 __free_slab
 discard_slab
 __slab_free
 ___cache_free
 qlist_free_all
 kasan_quarantine_reduce
 __kasan_slab_alloc
 __kmalloc_node
 kvmalloc_node
 __slab_free
 ___cache_free
 qlist_free_all
 kasan_quarantine_reduce
 __kasan_slab_alloc
 __kmalloc_node
 kvmalloc_node
 proc_sys_call_handler
 proc_sys_read
 new_sync_read
 vfs_read

Memory state around the buggy address:
 ffff401d6c82ef00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff401d6c82ef80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff401d6c82f000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                   ^
 ffff401d6c82f080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff401d6c82f100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
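For readers mapping the call chains above back to the changelog: do_munmap() is described there as a wrapper that creates a maple state for unconverted callers. Going by the MA_STATE() usage in the quoted __vm_munmap() hunk, the wrapper plausibly reduces to the sketch below; this is a reconstruction for orientation, not the applied hunk.

	int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
		      struct list_head *uf)
	{
		/* Seed a maple state over this mm's tree at @start. */
		MA_STATE(mas, &mm->mm_mt, start, start);

		/* Unconverted callers never ask for the mmap_lock write downgrade. */
		return do_mas_munmap(&mas, mm, start, len, uf, false);
	}

Converted callers skip this shim and pass in a maple state they already hold, which is where the changelog's savings on re-alignment and repeated tree walks come from.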
* Qian Cai <quic_qiancai@quicinc.com> [220606 08:09]:
> On Wed, May 04, 2022 at 01:13:53AM +0000, Liam Howlett wrote:
> > From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
> >
> > Remove __do_munmap() in favour of do_munmap(), do_mas_munmap(), and
> > do_mas_align_munmap().
...
> > Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
...
> Running a syscall fuzzer for a while could trigger those.

Thanks.

> WARNING: CPU: 95 PID: 1329067 at mm/slub.c:3643 kmem_cache_free_bulk
> CPU: 95 PID: 1329067 Comm: trinity-c32 Not tainted 5.18.0-next-20220603 #137
> pstate: 10400009 (nzcV daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : kmem_cache_free_bulk
> lr : mt_destroy_walk
> sp : ffff80005ed66bf0

Does your syscall fuzzer create a reproducer? This looks like arm64
and says 5.18.0-next-20220603 again. Was this bisected to the patch
above?

Regards,
Liam
On Mon, Jun 06, 2022 at 04:19:52PM +0000, Liam Howlett wrote:
> Does your syscall fuzzer create a reproducer? This looks like arm64
> and says 5.18.0-next-20220603 again. Was this bisected to the patch
> above?

This was triggered by running the fuzzer over the weekend.

$ trinity -C 160

No bisection was done. It was only brought up here because the trace
pointed to do_mas_munmap(), which was introduced here.
On Mon, Jun 6, 2022 at 10:40 AM Qian Cai <quic_qiancai@quicinc.com> wrote:
>
> On Mon, Jun 06, 2022 at 04:19:52PM +0000, Liam Howlett wrote:
> > Does your syscall fuzzer create a reproducer? This looks like arm64
> > and says 5.18.0-next-20220603 again. Was this bisected to the patch
> > above?
>
> This was triggered by running the fuzzer over the weekend.
>
> $ trinity -C 160
>
> No bisection was done. It was only brought up here because the trace
> pointed to do_mas_munmap() which was introduced here.

Liam,

I'm getting a similar crash on arm64 -- the allocator is madvise(),
not mprotect(). Please take a look. Thanks.

==================================================================
BUG: KASAN: double-free or invalid-free in kmem_cache_free_bulk+0x230/0x3b0
Pointer tag: [0c], memory tag: [fe]

CPU: 2 PID: 8320 Comm: stress-ng Tainted: G B W 5.19.0-rc1-lockdep+ #3
Call trace:
 dump_backtrace+0x1a0/0x200
 show_stack+0x24/0x30
 dump_stack_lvl+0x7c/0xa0
 print_report+0x15c/0x524
 kasan_report_invalid_free+0x64/0x84
 ____kasan_slab_free+0x150/0x184
 __kasan_slab_free+0x14/0x24
 slab_free_freelist_hook+0x100/0x1ac
 kmem_cache_free_bulk+0x230/0x3b0
 mas_destroy+0x10d8/0x1270
 mas_store_prealloc+0xb8/0xec
 do_mas_align_munmap+0x398/0x694
 do_mas_munmap+0xf8/0x118
 __vm_munmap+0x154/0x1e0
 __arm64_sys_munmap+0x44/0x54
 el0_svc_common+0xfc/0x1cc
 do_el0_svc_compat+0x38/0x5c
 el0_svc_compat+0x68/0xf4
 el0t_32_sync_handler+0xc0/0xf0
 el0t_32_sync+0x190/0x194

Allocated by task 8437:
 kasan_set_track+0x4c/0x7c
 __kasan_slab_alloc+0x84/0xa8
 kmem_cache_alloc_bulk+0x300/0x408
 mas_alloc_nodes+0x198/0x294
 mas_preallocate+0x8c/0x110
 __vma_adjust+0x174/0xc88
 vma_merge+0x2e4/0x300
 do_madvise+0x504/0xd20
 __arm64_sys_madvise+0x54/0x64
 el0_svc_common+0xfc/0x1cc
 do_el0_svc_compat+0x38/0x5c
 el0_svc_compat+0x68/0xf4
 el0t_32_sync_handler+0xc0/0xf0
 el0t_32_sync+0x190/0x194

Freed by task 8320:
 kasan_set_track+0x4c/0x7c
 kasan_set_free_info+0x2c/0x38
 ____kasan_slab_free+0x13c/0x184
 __kasan_slab_free+0x14/0x24
 slab_free_freelist_hook+0x100/0x1ac
 kmem_cache_free+0x11c/0x264
 mt_destroy_walk+0x6d8/0x714
 mas_wmb_replace+0x9d4/0xa68
 mas_spanning_rebalance+0x1af0/0x1d2c
 mas_wr_spanning_store+0x908/0x964
 mas_wr_store_entry+0x53c/0x5c0
 mas_store_prealloc+0x88/0xec
 do_mas_align_munmap+0x398/0x694
 do_mas_munmap+0xf8/0x118
 __vm_munmap+0x154/0x1e0
 __arm64_sys_munmap+0x44/0x54
 el0_svc_common+0xfc/0x1cc
 do_el0_svc_compat+0x38/0x5c
 el0_svc_compat+0x68/0xf4
 el0t_32_sync_handler+0xc0/0xf0
 el0t_32_sync+0x190/0x194

The buggy address belongs to the object at ffffff808b5f0a00
 which belongs to the cache maple_node of size 256
The buggy address is located 0 bytes inside of
 256-byte region [ffffff808b5f0a00, ffffff808b5f0b00)

The buggy address belongs to the physical page:
page:fffffffe022d7c00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xcffff808b5f0a00 pfn:0x10b5f0
head:fffffffe022d7c00 order:2 compound_mapcount:0 compound_pincount:0
flags: 0x8000000000010200(slab|head|zone=2|kasantag=0x0)
raw: 8000000000010200 fffffffe031a8608 fffffffe021a3608 caffff808002c800
raw: 0cffff808b5f0a00 0000000000150013 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffffff808b5f0800: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
 ffffff808b5f0900: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
>ffffff808b5f0a00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
                   ^
 ffffff808b5f0b00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
 ffffff808b5f0c00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
==================================================================
On Sat, Jun 11, 2022 at 2:11 PM Yu Zhao <yuzhao@google.com> wrote:
>
> On Mon, Jun 6, 2022 at 10:40 AM Qian Cai <quic_qiancai@quicinc.com> wrote:
...
> Liam,
>
> I'm getting a similar crash on arm64 -- the allocator is madvise(),
> not mprotect(). Please take a look.

Another crash on x86_64, which seems different:

==================================================================
BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
Write of size 136 at addr ffff88c5a2319c80 by task stress-ng/18461

CPU: 66 PID: 18461 Comm: stress-ng Tainted: G S I 5.19.0-smp-DEV #1
Call Trace:
 <TASK>
 dump_stack_lvl+0xc5/0xf4
 print_address_description+0x7f/0x460
 print_report+0x10b/0x240
 ? mab_mas_cp+0x2d9/0x6c0
 kasan_report+0xe6/0x110
 ? mab_mas_cp+0x2d9/0x6c0
 kasan_check_range+0x2ef/0x310
 ? mab_mas_cp+0x2d9/0x6c0
 memcpy+0x44/0x70
 mab_mas_cp+0x2d9/0x6c0
 mas_spanning_rebalance+0x1a45/0x4d70
 ? stack_trace_save+0xca/0x160
 ? stack_trace_save+0xca/0x160
 mas_wr_spanning_store+0x16a4/0x1ad0
 mas_wr_spanning_store+0x16a4/0x1ad0
 mas_wr_store_entry+0xbf9/0x12e0
 mas_store_prealloc+0x205/0x3c0
 do_mas_align_munmap+0x6cf/0xd10
 do_mas_munmap+0x1bb/0x210
 ? down_write_killable+0xa6/0x110
 __vm_munmap+0x1c4/0x270
 __x64_sys_munmap+0x60/0x70
 do_syscall_64+0x44/0xa0
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x589827
Code: 00 00 00 48 c7 c2 98 ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb 85 66 2e 0f 1f 84 00 00 00 00 00 90 b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 98 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff9276c518 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
RAX: ffffffffffffffda RBX: 0000400000000000 RCX: 0000000000589827
RDX: 0000000000000000 RSI: 00007ffffffff000 RDI: 0000000000000000
RBP: 00000000004cf000 R08: 00007fff9276c550 R09: 0000000000923bf0
R10: 0000000000000008 R11: 0000000000000206 R12: 0000000000001000
R13: 00000000004cf040 R14: 0000000000000004 R15: 00007fff9276c668
 </TASK>

Allocated by task 18461:
 __kasan_slab_alloc+0xaf/0xe0
 kmem_cache_alloc_bulk+0x261/0x360
 mas_alloc_nodes+0x2d7/0x4d0
 mas_preallocate+0xe0/0x220
 do_mas_align_munmap+0x1ce/0xd10
 do_mas_munmap+0x1bb/0x210
 __vm_munmap+0x1c4/0x270
 __x64_sys_munmap+0x60/0x70
 do_syscall_64+0x44/0xa0
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

The buggy address belongs to the object at ffff88c5a2319c00
 which belongs to the cache maple_node of size 256
The buggy address is located 128 bytes inside of
 256-byte region [ffff88c5a2319c00, ffff88c5a2319d00)

The buggy address belongs to the physical page:
page:000000000a5cfe8b refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x45a2319
flags: 0x1400000000000200(slab|node=1|zone=1)
raw: 1400000000000200 ffffea01168dea88 ffffea0116951f48 ffff88810004ff00
raw: 0000000000000000 ffff88c5a2319000 0000000100000008 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88c5a2319c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88c5a2319c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88c5a2319d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                   ^
 ffff88c5a2319d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88c5a2319e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
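One pattern is common to both reports above: the nodes were bulk-allocated by mas_preallocate() (the "Allocated by" stacks) and consumed later by mas_store_prealloc() while do_mas_align_munmap() overwrites the unmapped range with NULL. A minimal sketch of that pairing, using names from the quoted hunk and traces (mas_store_prealloc()'s placement is inferred from the traces, not from quoted code):

	if (mas_preallocate(mas, vma, GFP_KERNEL))	/* bulk-allocate worst-case nodes */
		return -ENOMEM;

	mas->last = end - 1;
	/* ... split boundary VMAs, detach the list, adjust mlock counts ... */
	mas_store_prealloc(mas, NULL);	/* one store of NULL over [start, end);
					 * leftover spare nodes are returned in
					 * bulk via mas_destroy() */

The double-free reports are consistent with a node being freed once by mt_destroy_walk() during the spanning store (the "Freed by" stacks) and then again when mas_destroy() releases the unused preallocations.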
* Yu Zhao <yuzhao@google.com> [220611 17:50]:
> On Sat, Jun 11, 2022 at 2:11 PM Yu Zhao <yuzhao@google.com> wrote:
...
> > I'm getting a similar crash on arm64 -- the allocator is madvise(),
> > not mprotect(). Please take a look.
>
> Another crash on x86_64, which seems different:

Thanks, yes. This one may be different. The others are the same source
and I'm working on that.

> ==================================================================
> BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
> Write of size 136 at addr ffff88c5a2319c80 by task stress-ng/18461
...
* Yu Zhao <yuzhao@google.com> [220611 17:50]:
> On Sat, Jun 11, 2022 at 2:11 PM Yu Zhao <yuzhao@google.com> wrote:
...
> > I'm getting a similar crash on arm64 -- the allocator is madvise(),
> > not mprotect(). Please take a look.
>
> Another crash on x86_64, which seems different:

Thanks for this. I was able to reproduce the other crashes that you and
Qian reported. I've sent out a patch set to Andrew to apply to the
branch which includes the fix for them and an unrelated issue discovered
when I wrote the testcases to cover what was going on here.

> ==================================================================
> BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
> Write of size 136 at addr ffff88c5a2319c80 by task stress-ng/18461
...

As for this crash, I was unable to reproduce and the code I just sent
out changes this code a lot. Was this running with "trinity -c madvise"
or another use case/fuzzer?

Thanks,
Liam
On Wed, Jun 15, 2022 at 8:25 AM Liam Howlett <liam.howlett@oracle.com> wrote:
>
> * Yu Zhao <yuzhao@google.com> [220611 17:50]:
...
> Thanks for this. I was able to reproduce the other crashes that you and
> Qian reported. I've sent out a patch set to Andrew to apply to the
> branch which includes the fix for them and an unrelated issue discovered
> when I wrote the testcases to cover what was going on here.

Thanks. I'm restarting the test and will report the results in a few hours.

> > BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
> > Write of size 136 at addr ffff88c5a2319c80 by task stress-ng/18461
                                                       ^^^^^^^^^
> As for this crash, I was unable to reproduce and the code I just sent
> out changes this code a lot. Was this running with "trinity -c madvise"
> or another use case/fuzzer?

This is also stress-ng (same as the one on arm64). The test stopped
before it could try syzkaller (fuzzer).
* Yu Zhao <yuzhao@google.com> [220615 14:08]:
> On Wed, Jun 15, 2022 at 8:25 AM Liam Howlett <liam.howlett@oracle.com> wrote:
...
> > As for this crash, I was unable to reproduce and the code I just sent
> > out changes this code a lot. Was this running with "trinity -c madvise"
> > or another use case/fuzzer?
>
> This is also stress-ng (same as the one on arm64). The test stopped
> before it could try syzkaller (fuzzer).

Thanks. What are the arguments to stress-ng you use? I've run
"stress-ng --class vm -a 20 -t 600s --temp-path /tmp" until it OOMs on
my vm, but it only has 8GB of ram.

Regards,
Liam
On Wed, Jun 15, 2022 at 12:55 PM Liam Howlett <liam.howlett@oracle.com> wrote:
>
> * Yu Zhao <yuzhao@google.com> [220615 14:08]:
...
> Thanks. What are the arguments to stress-ng you use? I've run
> "stress-ng --class vm -a 20 -t 600s --temp-path /tmp" until it OOMs on
> my vm, but it only has 8GB of ram.

Yes, I used the same parameters with 512GB of RAM, and the kernel with
KASAN and other debug options.
On Wed, Jun 15, 2022 at 1:05 PM Yu Zhao <yuzhao@google.com> wrote:
>
> On Wed, Jun 15, 2022 at 12:55 PM Liam Howlett <liam.howlett@oracle.com> wrote:
...
> > Thanks. What are the arguments to stress-ng you use? I've run
> > "stress-ng --class vm -a 20 -t 600s --temp-path /tmp" until it OOMs on
> > my vm, but it only has 8GB of ram.
>
> Yes, I used the same parameters with 512GB of RAM, and the kernel with
> KASAN and other debug options.

Sorry, Liam. I got the same crash :(

9d27f2f1487a (tag: mm-everything-2022-06-14-19-05, akpm/mm-everything)
00d4d7b519d6 fs/userfaultfd: Fix vma iteration in mas_for_each() loop
55140693394d maple_tree: Make mas_prealloc() error checking more generic
2d7e7c2fcf16 maple_tree: Fix mt_destroy_walk() on full non-leaf non-alloc nodes
4d4472148ccd maple_tree: Change spanning store to work on larger trees
ea36bcc14c00 test_maple_tree: Add tests for preallocations and large spanning writes
0d2aa86ead4f mm/mlock: Drop dead code in count_mm_mlocked_page_nr()

==================================================================
BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
Write of size 136 at addr ffff88c35a3b9e80 by task stress-ng/19303

CPU: 66 PID: 19303 Comm: stress-ng Tainted: G S I 5.19.0-smp-DEV #1
Call Trace:
 <TASK>
 dump_stack_lvl+0xc5/0xf4
 print_address_description+0x7f/0x460
 print_report+0x10b/0x240
 ? mab_mas_cp+0x2d9/0x6c0
 kasan_report+0xe6/0x110
 ? mast_spanning_rebalance+0x2634/0x29b0
 ? mab_mas_cp+0x2d9/0x6c0
 kasan_check_range+0x2ef/0x310
 ? mab_mas_cp+0x2d9/0x6c0
 ? mab_mas_cp+0x2d9/0x6c0
 memcpy+0x44/0x70
 mab_mas_cp+0x2d9/0x6c0
 mas_spanning_rebalance+0x1a3e/0x4f90
 ? stack_trace_save+0xca/0x160
 ? stack_trace_save+0xca/0x160
 mas_wr_spanning_store+0x16c5/0x1b80
 mas_wr_store_entry+0xbf9/0x12e0
 mas_store_prealloc+0x205/0x3c0
 do_mas_align_munmap+0x6cf/0xd10
 do_mas_munmap+0x1bb/0x210
 ? down_write_killable+0xa6/0x110
 __vm_munmap+0x1c4/0x270
 __x64_sys_munmap+0x60/0x70
 do_syscall_64+0x44/0xa0
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x589827
Code: 00 00 00 48 c7 c2 98 ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb 85 66 2e 0f 1f 84 00 00 00 00 00 90 b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 98 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffee601ec08 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
RAX: ffffffffffffffda RBX: 0000400000000000 RCX: 0000000000589827
RDX: 0000000000000000 RSI: 00007ffffffff000 RDI: 0000000000000000
RBP: 00000000004cf000 R08: 00007ffee601ec40 R09: 0000000000923bf0
R10: 0000000000000008 R11: 0000000000000206 R12: 0000000000001000
R13: 00000000004cf040 R14: 0000000000000002 R15: 00007ffee601ed58
 </TASK>

Allocated by task 19303:
 __kasan_slab_alloc+0xaf/0xe0
 kmem_cache_alloc_bulk+0x261/0x360
 mas_alloc_nodes+0x2d7/0x4d0
 mas_preallocate+0xe2/0x230
 do_mas_align_munmap+0x1ce/0xd10
 do_mas_munmap+0x1bb/0x210
 __vm_munmap+0x1c4/0x270
 __x64_sys_munmap+0x60/0x70
 do_syscall_64+0x44/0xa0
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

The buggy address belongs to the object at ffff88c35a3b9e00
 which belongs to the cache maple_node of size 256
The buggy address is located 128 bytes inside of
 256-byte region [ffff88c35a3b9e00, ffff88c35a3b9f00)

The buggy address belongs to the physical page:
page:00000000325428b6 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x435a3b9
flags: 0x1400000000000200(slab|node=1|zone=1)
raw: 1400000000000200 ffffea010d71a5c8 ffffea010d71dec8 ffff88810004ff00
raw: 0000000000000000 ffff88c35a3b9000 0000000100000008 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88c35a3b9e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88c35a3b9e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88c35a3b9f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                   ^
 ffff88c35a3b9f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88c35a3ba000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
* Yu Zhao <yuzhao@google.com> [220615 17:17]:
...
> > Yes, I used the same parameters with 512GB of RAM, and the kernel with
> > KASAN and other debug options.
>
> Sorry, Liam. I got the same crash :(

Thanks for running this promptly. I am trying to get my own server
setup now.

> 9d27f2f1487a (tag: mm-everything-2022-06-14-19-05, akpm/mm-everything)
> 00d4d7b519d6 fs/userfaultfd: Fix vma iteration in mas_for_each() loop
> 55140693394d maple_tree: Make mas_prealloc() error checking more generic
> 2d7e7c2fcf16 maple_tree: Fix mt_destroy_walk() on full non-leaf non-alloc nodes
> 4d4472148ccd maple_tree: Change spanning store to work on larger trees
> ea36bcc14c00 test_maple_tree: Add tests for preallocations and large
> spanning writes
> 0d2aa86ead4f mm/mlock: Drop dead code in count_mm_mlocked_page_nr()
>
> ==================================================================
> BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
> Write of size 136 at addr ffff88c35a3b9e80 by task stress-ng/19303
...
> memcpy+0x44/0x70
> mab_mas_cp+0x2d9/0x6c0
> mas_spanning_rebalance+0x1a3e/0x4f90

Does this translate to an inline around line 2997?
And then probably around 2808?

> ? stack_trace_save+0xca/0x160
> ? stack_trace_save+0xca/0x160
> mas_wr_spanning_store+0x16c5/0x1b80
> mas_wr_store_entry+0xbf9/0x12e0
> mas_store_prealloc+0x205/0x3c0
> do_mas_align_munmap+0x6cf/0xd10
> do_mas_munmap+0x1bb/0x210
> ? down_write_killable+0xa6/0x110
> __vm_munmap+0x1c4/0x270

Looks like a NULL entry being written.

> __x64_sys_munmap+0x60/0x70
> do_syscall_64+0x44/0xa0
> entry_SYSCALL_64_after_hwframe+0x46/0xb0
> RIP: 0033:0x589827

Thanks,
Liam
On Wed, Jun 15, 2022 at 7:50 PM Liam Howlett <liam.howlett@oracle.com> wrote:
>
> * Yu Zhao <yuzhao@google.com> [220615 17:17]:
...
> > memcpy+0x44/0x70
> > mab_mas_cp+0x2d9/0x6c0
> > mas_spanning_rebalance+0x1a3e/0x4f90
>
> Does this translate to an inline around line 2997?
> And then probably around 2808?

$ ./scripts/faddr2line vmlinux mab_mas_cp+0x2d9
mab_mas_cp+0x2d9/0x6c0:
mab_mas_cp at lib/maple_tree.c:1988

$ ./scripts/faddr2line vmlinux mas_spanning_rebalance+0x1a3e
mas_spanning_rebalance+0x1a3e/0x4f90:
mast_cp_to_nodes at lib/maple_tree.c:?
(inlined by) mas_spanning_rebalance at lib/maple_tree.c:2997

$ ./scripts/faddr2line vmlinux mas_wr_spanning_store+0x16c5
mas_wr_spanning_store+0x16c5/0x1b80:
mas_wr_spanning_store at lib/maple_tree.c:?

No idea why faddr2line didn't work for the last two addresses. GDB
seems more reliable.

(gdb) li *(mab_mas_cp+0x2d9)
0xffffffff8226b049 is in mab_mas_cp (lib/maple_tree.c:1988).
(gdb) li *(mas_spanning_rebalance+0x1a3e)
0xffffffff822633ce is in mas_spanning_rebalance (lib/maple_tree.c:2801).
(gdb) li *(mas_wr_spanning_store+0x16c5)
0xffffffff8225cfb5 is in mas_wr_spanning_store (lib/maple_tree.c:4030).
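If faddr2line keeps answering with '?', one cross-check (assuming the same vmlinux with debug info is at hand) is binutils addr2line on the absolute addresses GDB resolved; it should agree with the GDB output above, e.g. for the mab_mas_cp hit:

	$ addr2line -f -i -e vmlinux 0xffffffff8226b049
	mab_mas_cp
	lib/maple_tree.c:1988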
* Yu Zhao <yuzhao@google.com> [220615 21:59]:
> On Wed, Jun 15, 2022 at 7:50 PM Liam Howlett <liam.howlett@oracle.com> wrote:
...
> (gdb) li *(mab_mas_cp+0x2d9)
> 0xffffffff8226b049 is in mab_mas_cp (lib/maple_tree.c:1988).
> (gdb) li *(mas_spanning_rebalance+0x1a3e)
> 0xffffffff822633ce is in mas_spanning_rebalance (lib/maple_tree.c:2801).
> (gdb) li *(mas_wr_spanning_store+0x16c5)
> 0xffffffff8225cfb5 is in mas_wr_spanning_store (lib/maple_tree.c:4030).

Thanks. I am not having luck recreating it. I am hitting what looks
like an unrelated issue in the unstable mm, "scheduling while atomic".
I will try the git commit you indicate above.
On Wed, Jun 15, 2022 at 8:56 PM Liam Howlett <liam.howlett@oracle.com> wrote:
>
> * Yu Zhao <yuzhao@google.com> [220615 21:59]:
...
> Thanks. I am not having luck recreating it. I am hitting what looks
> like an unrelated issue in the unstable mm, "scheduling while atomic".
> I will try the git commit you indicate above.

Fix here:
https://lore.kernel.org/linux-mm/20220615160446.be1f75fd256d67e57b27a9fc@linux-foundation.org/
On Wed, Jun 15, 2022 at 9:02 PM Yu Zhao <yuzhao@google.com> wrote:
>
> On Wed, Jun 15, 2022 at 8:56 PM Liam Howlett <liam.howlett@oracle.com> wrote:
...
> > Thanks. I am not having luck recreating it. I am hitting what looks
> > like an unrelated issue in the unstable mm, "scheduling while atomic".
> > I will try the git commit you indicate above.
>
> Fix here:
> https://lore.kernel.org/linux-mm/20220615160446.be1f75fd256d67e57b27a9fc@linux-foundation.org/

A seemingly new crash on arm64:

KASAN: null-ptr-deref in range [0x0000000000000000-0x000000000000000f]
pc : __hwasan_check_x2_67043363+0x4/0x34
lr : mas_wr_walk_descend+0xe0/0x2c0
sp : ffffffc0164378d0
x29: ffffffc0164378f0 x28: 13ffff8028ee7328 x27: ffffffc016437a68
x26: 0dffff807aa63710 x25: ffffffc016437a60 x24: 51ffff8028ee1928
x23: ffffffc016437a78 x22: ffffffc0164379e0 x21: ffffffc016437998
x20: efffffc000000000 x19: ffffffc016437998 x18: 07ffff8077718180
x17: 45ffff800b366010 x16: 0000000000000000 x15: 9cffff8092bfcdf0
x14: ffffffefef411b8c x13: 0000000000000001 x12: 0000000000000002
x11: ffffffffffffff00 x10: 0000000000000000 x9 : efffffc000000000
x8 : ffffffc016437a60 x7 : 0000000000000000 x6 : ffffffefef8246cc
x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffffffeff0bf48ee
x2 : 0000000000000008 x1 : ffffffc0164379b8 x0 : ffffffc016437998
Call trace:
 __hwasan_check_x2_67043363+0x4/0x34
 mas_wr_store_entry+0x178/0x5c0
 mas_store+0x88/0xc8
 dup_mmap+0x4bc/0x6d8
 dup_mm+0x8c/0x17c
 copy_mm+0xb0/0x12c
 copy_process+0xa44/0x17d4
 kernel_clone+0x100/0x2cc
 __arm64_sys_clone+0xf4/0x120
 el0_svc_common+0xfc/0x1cc
 do_el0_svc_compat+0x38/0x5c
 el0_svc_compat+0x68/0xf4
 el0t_32_sync_handler+0xc0/0xf0
 el0t_32_sync+0x190/0x194
Code: aa0203e0 d2800441 141e931d 9344dc50 (38706930)
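For orientation, this trace enters the tree from dup_mmap(), which in this series stores each duplicated VMA into the child's tree while walking the parent's VMA list. A rough sketch of the loop's shape (names as in the trace; the exact hunk in the series differs):

	MA_STATE(mas, &mm->mm_mt, 0, 0);
	struct vm_area_struct *mpnt, *tmp;

	for (mpnt = oldmm->mmap; mpnt; mpnt = mpnt->vm_next) {
		tmp = vm_area_dup(mpnt);	/* usual dup_mmap fixups elided */
		...
		mas.index = tmp->vm_start;
		mas.last = tmp->vm_end - 1;
		mas_store(&mas, tmp);		/* the call faulting in the trace above */
	}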
On Wed, Jun 15, 2022 at 11:45 PM Yu Zhao <yuzhao@google.com> wrote:
>
> On Wed, Jun 15, 2022 at 9:02 PM Yu Zhao <yuzhao@google.com> wrote:
...
> > Fix here:
> > https://lore.kernel.org/linux-mm/20220615160446.be1f75fd256d67e57b27a9fc@linux-foundation.org/
>
> A seemingly new crash on arm64:
>
> KASAN: null-ptr-deref in range [0x0000000000000000-0x000000000000000f]
> Call trace:
>  __hwasan_check_x2_67043363+0x4/0x34
>  mas_wr_store_entry+0x178/0x5c0
>  mas_store+0x88/0xc8
>  dup_mmap+0x4bc/0x6d8
>  dup_mm+0x8c/0x17c
>  copy_mm+0xb0/0x12c
>  copy_process+0xa44/0x17d4
>  kernel_clone+0x100/0x2cc
>  __arm64_sys_clone+0xf4/0x120
>  el0_svc_common+0xfc/0x1cc
>  do_el0_svc_compat+0x38/0x5c
>  el0_svc_compat+0x68/0xf4
>  el0t_32_sync_handler+0xc0/0xf0
>  el0t_32_sync+0x190/0x194
> Code: aa0203e0 d2800441 141e931d 9344dc50 (38706930)

And bad rss counters from another arm64 machine:

BUG: Bad rss-counter state mm:a6ffff80895ff840 type:MM_ANONPAGES val:4
Call trace:
 __mmdrop+0x1f0/0x208
 __mmput+0x194/0x198
 mmput+0x5c/0x80
 exit_mm+0x108/0x190
 do_exit+0x244/0xc98
 __arm64_sys_exit_group+0x0/0x30
 __wake_up_parent+0x0/0x48
 el0_svc_common+0xfc/0x1cc
 do_el0_svc_compat+0x38/0x5c
 el0_svc_compat+0x68/0xf4
 el0t_32_sync_handler+0xc0/0xf0
 el0t_32_sync+0x190/0x194
Code: b000b520 91259c00 aa1303e1 94482015 (d4210000)
* Yu Zhao <yuzhao@google.com> [220616 01:56]:

...

> > A seemingly new crash on arm64:
> >
> > KASAN: null-ptr-deref in range [0x0000000000000000-0x000000000000000f]
> > Call trace:
> >  __hwasan_check_x2_67043363+0x4/0x34
> >  mas_wr_store_entry+0x178/0x5c0
> >  mas_store+0x88/0xc8
> >  dup_mmap+0x4bc/0x6d8
> >  ...
> > Code: aa0203e0 d2800441 141e931d 9344dc50 (38706930)
>
> And bad rss counters from another arm64 machine:
>
> BUG: Bad rss-counter state mm:a6ffff80895ff840 type:MM_ANONPAGES val:4
> Call trace:
>  __mmdrop+0x1f0/0x208
>  ...
> Code: b000b520 91259c00 aa1303e1 94482015 (d4210000)

What was the setup for these two? I'm running trinity, but I suspect
you are using stress-ng? If so, what are the arguments? My arm64 vm is
even lower memory than my x86_64 vm so I will probably have to adjust
accordingly.

Thanks,
Liam
On Thu, Jun 16, 2022 at 12:27 PM Liam Howlett <liam.howlett@oracle.com> wrote:

...

> What was the setup for these two? I'm running trinity, but I suspect
> you are using stress-ng?

That's correct.

> If so, what are the arguments? My arm64 vm is
> even lower memory than my x86_64 vm so I will probably have to adjust
> accordingly.

I usually lower the N for `-a N`.
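(For anyone reproducing this: a concrete invocation along those lines could be, say,

	stress-ng -a 16 -t 300

where `-a N` starts N instances of all stressors and `-t` bounds the run time in seconds; the count and timeout here are illustrative guesses, not Yu's actual command line. Lowering N scales the load down to fit a smaller VM.)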
* Yu Zhao <yuzhao@google.com> [220616 14:35]:

...

> > What was the setup for these two? I'm running trinity, but I suspect
> > you are using stress-ng?
>
> That's correct.
>
> > If so, what are the arguments? My arm64 vm is
> > even lower memory than my x86_64 vm so I will probably have to adjust
> > accordingly.
>
> I usually lower the N for `-a N`.

I'm still trying to reproduce any of these bugs you are seeing. I sent
out two fixes that I cc'ed you on that may help at least the last one
here. My thinking is there isn't enough pre-allocation happening, so I
am missing some of the munmap events. I fixed this by not pre-allocating
the side tree and returning -ENOMEM instead. This is safe since munmap
can allocate anyway for splits.
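To make the fix Liam describes concrete, here is a minimal sketch of the error-path pattern, reusing the mas_preallocate()/mas_store_prealloc() calls visible in the diff below; this is an illustration of the approach, not the actual posted fix:

	/* Reserve the nodes for the NULL overwrite before touching any VMA. */
	if (mas_preallocate(mas, vma, GFP_KERNEL))
		return -ENOMEM;	/* bail out before anything is detached */

	/* ... splits and linked-list detachment, which may allocate ... */

	/* Consumes the reserved nodes, so this store cannot fail. */
	mas_store_prealloc(mas, NULL);

Returning -ENOMEM up front means an allocation failure can no longer surface halfway through a munmap that has already detached VMAs.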
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f6d633f04a64..0cc2cb692a78 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2712,8 +2712,9 @@ extern unsigned long mmap_region(struct file *file, unsigned long addr,
 extern unsigned long do_mmap(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot, unsigned long flags,
 	unsigned long pgoff, unsigned long *populate, struct list_head *uf);
-extern int __do_munmap(struct mm_struct *, unsigned long, size_t,
-		       struct list_head *uf, bool downgrade);
+extern int do_mas_munmap(struct ma_state *mas, struct mm_struct *mm,
+			 unsigned long start, size_t len, struct list_head *uf,
+			 bool downgrade);
 extern int do_munmap(struct mm_struct *, unsigned long, size_t,
 		     struct list_head *uf);
 extern int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior);
diff --git a/mm/mmap.c b/mm/mmap.c
index d49dca8fecd5..dd21f0a3f236 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2372,47 +2372,6 @@ static void unmap_region(struct mm_struct *mm,
 	tlb_finish_mmu(&tlb);
 }
 
-/*
- * Create a list of vma's touched by the unmap, removing them from the mm's
- * vma list as we go..
- */
-static bool
-detach_vmas_to_be_unmapped(struct mm_struct *mm, struct ma_state *mas,
-	struct vm_area_struct *vma, struct vm_area_struct *prev,
-	unsigned long end)
-{
-	struct vm_area_struct **insertion_point;
-	struct vm_area_struct *tail_vma = NULL;
-
-	insertion_point = (prev ? &prev->vm_next : &mm->mmap);
-	vma->vm_prev = NULL;
-	vma_mas_szero(mas, vma->vm_start, end);
-	do {
-		if (vma->vm_flags & VM_LOCKED)
-			mm->locked_vm -= vma_pages(vma);
-		mm->map_count--;
-		tail_vma = vma;
-		vma = vma->vm_next;
-	} while (vma && vma->vm_start < end);
-	*insertion_point = vma;
-	if (vma)
-		vma->vm_prev = prev;
-	else
-		mm->highest_vm_end = prev ? vm_end_gap(prev) : 0;
-	tail_vma->vm_next = NULL;
-
-	/*
-	 * Do not downgrade mmap_lock if we are next to VM_GROWSDOWN or
-	 * VM_GROWSUP VMA. Such VMAs can change their size under
-	 * down_read(mmap_lock) and collide with the VMA we are about to unmap.
-	 */
-	if (vma && (vma->vm_flags & VM_GROWSDOWN))
-		return false;
-	if (prev && (prev->vm_flags & VM_GROWSUP))
-		return false;
-	return true;
-}
-
 /*
  * __split_vma() bypasses sysctl_max_map_count checking.  We use this where it
  * has already been checked or doesn't make sense to fail.
@@ -2492,40 +2451,51 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
 	return __split_vma(mm, vma, addr, new_below);
 }
 
-/* Munmap is split into 2 main parts -- this part which finds
- * what needs doing, and the areas themselves, which do the
- * work.  This now handles partial unmappings.
- * Jeremy Fitzhardinge <jeremy@goop.org>
- */
-int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
-		struct list_head *uf, bool downgrade)
+static inline int
+unlock_range(struct vm_area_struct *start, struct vm_area_struct **tail,
+	     unsigned long limit)
 {
-	unsigned long end;
-	struct vm_area_struct *vma, *prev, *last;
-	int error = -ENOMEM;
-	MA_STATE(mas, &mm->mm_mt, 0, 0);
+	struct mm_struct *mm = start->vm_mm;
+	struct vm_area_struct *tmp = start;
+	int count = 0;
 
-	if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE-start)
-		return -EINVAL;
+	while (tmp && tmp->vm_start < limit) {
+		*tail = tmp;
+		count++;
+		if (tmp->vm_flags & VM_LOCKED)
+			mm->locked_vm -= vma_pages(tmp);
 
-	len = PAGE_ALIGN(len);
-	end = start + len;
-	if (len == 0)
-		return -EINVAL;
+		tmp = tmp->vm_next;
+	}
 
-	/* arch_unmap() might do unmaps itself.  */
-	arch_unmap(mm, start, end);
+	return count;
+}
 
-	/* Find the first overlapping VMA where start < vma->vm_end */
-	vma = find_vma_intersection(mm, start, end);
-	if (!vma)
-		return 0;
+/*
+ * do_mas_align_munmap() - munmap the aligned region from @start to @end.
+ * @mas: The maple_state, ideally set up to alter the correct tree location.
+ * @vma: The starting vm_area_struct
+ * @mm: The mm_struct
+ * @start: The aligned start address to munmap.
+ * @end: The aligned end address to munmap.
+ * @uf: The userfaultfd list_head
+ * @downgrade: Set to true to attempt a write downgrade of the mmap_sem
+ *
+ * If @downgrade is true, check return code for potential release of the lock.
+ */
+static int
+do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
+		    struct mm_struct *mm, unsigned long start,
+		    unsigned long end, struct list_head *uf, bool downgrade)
+{
+	struct vm_area_struct *prev, *last;
+	int error = -ENOMEM;
+	/* we have start < vma->vm_end  */
 
-	if (mas_preallocate(&mas, vma, GFP_KERNEL))
+	if (mas_preallocate(mas, vma, GFP_KERNEL))
 		return -ENOMEM;
-	prev = vma->vm_prev;
-	/* we have start < vma->vm_end  */
 
+	mas->last = end - 1;
 	/*
 	 * If we need to split any vma, do it now to save pain later.
 	 *
@@ -2546,17 +2516,31 @@ int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
 		error = __split_vma(mm, vma, start, 0);
 		if (error)
 			goto split_failed;
+		prev = vma;
+		vma = __vma_next(mm, prev);
+		mas->index = start;
+		mas_reset(mas);
+	} else {
+		prev = vma->vm_prev;
 	}
 
+	if (vma->vm_end >= end)
+		last = vma;
+	else
+		last = find_vma_intersection(mm, end - 1, end);
+
 	/* Does it split the last one? */
-	last = find_vma(mm, end);
-	if (last && end > last->vm_start) {
+	if (last && end < last->vm_end) {
 		error = __split_vma(mm, last, end, 1);
+
 		if (error)
 			goto split_failed;
+
+		if (vma == last)
+			vma = __vma_next(mm, prev);
+		mas_reset(mas);
 	}
-	vma = __vma_next(mm, prev);
 
 	if (unlikely(uf)) {
 		/*
@@ -2569,16 +2553,46 @@ int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
 		 * failure that it's not worth optimizing it for.
 		 */
 		error = userfaultfd_unmap_prep(vma, start, end, uf);
+
 		if (error)
 			goto userfaultfd_error;
 	}
 
-	/* Detach vmas from rbtree */
-	if (!detach_vmas_to_be_unmapped(mm, &mas, vma, prev, end))
-		downgrade = false;
+	/*
+	 * unlock any mlock()ed ranges before detaching vmas, count the number
+	 * of VMAs to be dropped, and return the tail entry of the affected
+	 * area.
+	 */
+	mm->map_count -= unlock_range(vma, &last, end);
+	/* Drop removed area from the tree */
+	mas_store_prealloc(mas, NULL);
 
-	if (downgrade)
-		mmap_write_downgrade(mm);
+	/* Detach vmas from the MM linked list */
+	vma->vm_prev = NULL;
+	if (prev)
+		prev->vm_next = last->vm_next;
+	else
+		mm->mmap = last->vm_next;
+
+	if (last->vm_next) {
+		last->vm_next->vm_prev = prev;
+		last->vm_next = NULL;
+	} else
+		mm->highest_vm_end = prev ? vm_end_gap(prev) : 0;
+
+	/*
+	 * Do not downgrade mmap_lock if we are next to VM_GROWSDOWN or
+	 * VM_GROWSUP VMA. Such VMAs can change their size under
+	 * down_read(mmap_lock) and collide with the VMA we are about to unmap.
+	 */
+	if (downgrade) {
+		if (last && (last->vm_flags & VM_GROWSDOWN))
+			downgrade = false;
+		else if (prev && (prev->vm_flags & VM_GROWSUP))
+			downgrade = false;
+		else
+			mmap_write_downgrade(mm);
+	}
 
 	unmap_region(mm, vma, prev, start, end);
 
@@ -2592,14 +2606,63 @@ int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
 map_count_exceeded:
 split_failed:
 userfaultfd_error:
-	mas_destroy(&mas);
+	mas_destroy(mas);
 	return error;
 }
 
+/*
+ * do_mas_munmap() - munmap a given range.
+ * @mas: The maple state
+ * @mm: The mm_struct
+ * @start: The start address to munmap
+ * @len: The length of the range to munmap
+ * @uf: The userfaultfd list_head
+ * @downgrade: set to true if the user wants to attempt to write_downgrade the
+ * mmap_sem
+ *
+ * This function takes a @mas that is either pointing to the previous VMA or set
+ * to MA_START and sets it up to remove the mapping(s).  The @len will be
+ * aligned and any arch_unmap work will be performed.
+ *
+ * Returns: -EINVAL on failure, 1 on success and unlock, 0 otherwise.
+ */
+int do_mas_munmap(struct ma_state *mas, struct mm_struct *mm,
+		  unsigned long start, size_t len, struct list_head *uf,
+		  bool downgrade)
+{
+	unsigned long end;
+	struct vm_area_struct *vma;
+
+	if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE-start)
+		return -EINVAL;
+
+	end = start + PAGE_ALIGN(len);
+	if (end == start)
+		return -EINVAL;
+
+	/* arch_unmap() might do unmaps itself.  */
+	arch_unmap(mm, start, end);
+
+	/* Find the first overlapping VMA */
+	vma = mas_find(mas, end - 1);
+	if (!vma)
+		return 0;
+
+	return do_mas_align_munmap(mas, vma, mm, start, end, uf, downgrade);
+}
+
+/* do_munmap() - Wrapper function for non-maple-tree-aware callers of do_munmap().
+ * @mm: The mm_struct
+ * @start: The start address to munmap
+ * @len: The length to be munmapped.
+ * @uf: The userfaultfd list_head
+ */
 int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
 	      struct list_head *uf)
 {
-	return __do_munmap(mm, start, len, uf, false);
+	MA_STATE(mas, &mm->mm_mt, start, start);
+
+	return do_mas_munmap(&mas, mm, start, len, uf, false);
 }
 
 unsigned long mmap_region(struct file *file, unsigned long addr,
@@ -2633,7 +2696,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	}
 
 	/* Unmap any existing mapping in the area */
-	if (do_munmap(mm, addr, len, uf))
+	if (do_mas_munmap(&mas, mm, addr, len, uf, false))
 		return -ENOMEM;
 
 	/*
@@ -2845,11 +2908,12 @@ static int __vm_munmap(unsigned long start, size_t len, bool downgrade)
 	int ret;
 	struct mm_struct *mm = current->mm;
 	LIST_HEAD(uf);
+	MA_STATE(mas, &mm->mm_mt, start, start);
 
 	if (mmap_write_lock_killable(mm))
 		return -EINTR;
 
-	ret = __do_munmap(mm, start, len, &uf, downgrade);
+	ret = do_mas_munmap(&mas, mm, start, len, &uf, downgrade);
 	/*
 	 * Returning 1 indicates mmap_lock is downgraded.
 	 * But 1 is not legal return value of vm_munmap() and munmap(), reset
@@ -2984,10 +3048,7 @@ static int do_brk_munmap(struct ma_state *mas, struct vm_area_struct *vma,
 	if (likely((vma->vm_end < oldbrk) ||
 		   ((vma->vm_start == newbrk) && (vma->vm_end == oldbrk)))) {
 		/* remove entire mapping(s) */
-		mas_set(mas, newbrk);
-		if (vma->vm_start != newbrk)
-			mas_reset(mas);	/* cause a re-walk for the first overlap.  */
-		ret = __do_munmap(mm, newbrk, oldbrk - newbrk, uf, true);
+		ret = do_mas_munmap(mas, mm, newbrk, oldbrk-newbrk, uf, true);
 		goto munmap_full_vma;
 	}
 
@@ -3168,9 +3229,7 @@ int vm_brk_flags(unsigned long addr, unsigned long request, unsigned long flags)
 	if (ret)
 		goto limits_failed;
 
-	if (find_vma_intersection(mm, addr, addr + len))
-		ret = do_munmap(mm, addr, len, &uf);
-
+	ret = do_mas_munmap(&mas, mm, addr, len, &uf, 0);
 	if (ret)
 		goto munmap_failed;
 
diff --git a/mm/mremap.c b/mm/mremap.c
index 98f50e633009..4495f69eccbe 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -975,20 +975,23 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 	/*
 	 * Always allow a shrinking remap: that just unmaps
 	 * the unnecessary pages..
-	 * __do_munmap does all the needed commit accounting, and
+	 * do_mas_munmap does all the needed commit accounting, and
 	 * downgrades mmap_lock to read if so directed.
 	 */
 	if (old_len >= new_len) {
 		int retval;
+		MA_STATE(mas, &mm->mm_mt, addr + new_len, addr + new_len);
 
-		retval = __do_munmap(mm, addr+new_len, old_len - new_len,
-				     &uf_unmap, true);
-		if (retval < 0 && old_len != new_len) {
-			ret = retval;
-			goto out;
+		retval = do_mas_munmap(&mas, mm, addr + new_len,
+				       old_len - new_len, &uf_unmap, true);
 		/* Returning 1 indicates mmap_lock is downgraded to read. */
-		} else if (retval == 1)
+		if (retval == 1) {
 			downgraded = true;
+		} else if (retval < 0 && old_len != new_len) {
+			ret = retval;
+			goto out;
+		}
+
 		ret = addr;
 		goto out;
 	}
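To make the caller-side contract of do_mas_munmap() concrete, here is a minimal sketch assembled from the hunks above; it mirrors the __vm_munmap() pattern in this patch and is an illustration, not an additional hunk:

	struct mm_struct *mm = current->mm;
	LIST_HEAD(uf);
	MA_STATE(mas, &mm->mm_mt, start, start);
	int ret;

	if (mmap_write_lock_killable(mm))
		return -EINTR;

	ret = do_mas_munmap(&mas, mm, start, len, &uf, true);
	if (ret == 1) {
		/* 1 == success, and mmap_lock was downgraded to read. */
		mmap_read_unlock(mm);
		ret = 0;
	} else {
		/* 0 == success, <0 == error; the write lock is still held. */
		mmap_write_unlock(mm);
	}
	userfaultfd_unmap_complete(mm, &uf);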