Message ID | 20250312221521.1255690-1-yang@os.amperecomputing.com (mailing list archive) |
---|---|
State | New |
Series | [v2] mm: vma: skip anonymous vma when inserting vma to file rmap tree |
On Wed, Mar 12, 2025 at 03:15:21PM -0700, Yang Shi wrote:
> LKP reported 800% performance improvement for small-allocs benchmark
> from vm-scalability [1] with patch ("/dev/zero: make private mapping
> full anonymous mapping") [2], but the patch was nack'ed since it changes
> the output of smaps somewhat.
...
> ---
> v2:
>    * Added the comments in code suggested by Lorenzo
>    * Collected R-b from Lorenze
>
>  mm/vma.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)

Hi Yang,

Replying to v2, as the code is the same as v1 in linux-next:

The LTP test "mmap10" consistently triggers a kernel NULL pointer
dereference with this change, at least on x86 and s390. Reverting just
this single patch from linux-next fixes the issue.

LTP: starting mmap10
BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 800000010d22a067 P4D 800000010d22a067 PUD 11ff09067 PMD 0
Oops: Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 5 UID: 0 PID: 1719 Comm: mmap10 Not tainted 6.14.0-rc6-next-20250312 #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
RIP: 0010:__rb_insert_augmented+0x2b/0x1d0
Code: 0f 1e fa 48 89 f8 48 8b 3f 48 85 ff 0f 84 a4 01 00 00 41 55 49 89 f5 41 54 49 89 d4 55 53 48 8b 1f f6 c3 01 0f 85 e1 00 00 00 <48> 8b 53 08 48 39 fa 74 67 48 85 d2 74 09 f6 02 01 0f 84 a0 00 00
RSP: 0018:ffffc90002b47cc8 EFLAGS: 00010246
RAX: ffff8881143ab788 RBX: 0000000000000000 RCX: 00000000000009ff
RDX: ffffffff814ad5d0 RSI: ffff888100bb5060 RDI: ffff8881143ab088
RBP: ffff8881053af8c0 R08: ffff8881143ab700 R09: 00007ff6433f2000
R10: 00007ff6433f2000 R11: ffff8881143ab000 R12: ffffffff814ad5d0
R13: ffff888100bb5060 R14: ffff8881143ab700 R15: ffff8881143ab000
FS: 00007ff643df1740(0000) GS:ffff8882b45bf000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000011b042000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 ? __die_body.cold+0x19/0x2b
 ? page_fault_oops+0xc4/0x1f0
 ? search_extable+0x26/0x30
 ? search_module_extables+0x3f/0x60
 ? exc_page_fault+0x6b/0x150
 ? asm_exc_page_fault+0x26/0x30
 ? __pfx_vma_interval_tree_augment_rotate+0x10/0x10
 ? __pfx_vma_interval_tree_augment_rotate+0x10/0x10
 ? __rb_insert_augmented+0x2b/0x1d0
 copy_mm+0x48a/0x8c0
 copy_process+0xf98/0x1930
 kernel_clone+0xb7/0x3b0
 __do_sys_clone+0x65/0x90
 do_syscall_64+0x9e/0x1a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff643eb2b00
Code: 31 c0 31 d2 31 f6 bf 11 00 20 01 48 89 e5 53 48 83 ec 08 64 48 8b 04 25 10 00 00 00 4c 8d 90 d0 02 00 00 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 89 c3 85 c0 75 31 64 48 8b 04 25 10 00 00
RSP: 002b:00007ffdac219010 EFLAGS: 00000202 ORIG_RAX: 0000000000000038
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007ff643eb2b00
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
RBP: 00007ffdac219020 R08: 0000000000000000 R09: 0000000000000000
R10: 00007ff643df1a10 R11: 0000000000000202 R12: 0000000000000001
R13: 0000000000000000 R14: 00007ff644036000 R15: 0000000000000000
 </TASK>
Modules linked in:
CR2: 0000000000000008
---[ end trace 0000000000000000 ]---
RIP: 0010:__rb_insert_augmented+0x2b/0x1d0
Code: 0f 1e fa 48 89 f8 48 8b 3f 48 85 ff 0f 84 a4 01 00 00 41 55 49 89 f5 41 54 49 89 d4 55 53 48 8b 1f f6 c3 01 0f 85 e1 00 00 00 <48> 8b 53 08 48 39 fa 74 67 48 85 d2 74 09 f6 02 01 0f 84 a0 00 00
RSP: 0018:ffffc90002b47cc8 EFLAGS: 00010246
RAX: ffff8881143ab788 RBX: 0000000000000000 RCX: 00000000000009ff
RDX: ffffffff814ad5d0 RSI: ffff888100bb5060 RDI: ffff8881143ab088
RBP: ffff8881053af8c0 R08: ffff8881143ab700 R09: 00007ff6433f2000
R10: 00007ff6433f2000 R11: ffff8881143ab000 R12: ffffffff814ad5d0
R13: ffff888100bb5060 R14: ffff8881143ab700 R15: ffff8881143ab000
FS: 00007ff643df1740(0000) GS:ffff8882b45bf000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000011b042000 CR4: 00000000000006f0

LTP: starting mmap10
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0000000000000000 TEID: 0000000000000483
Fault in home space mode while using kernel ASCE.
AS:000000000247c007 R3:00000001ffffc007 S:00000001ffffb801 P:000000000000013d
Oops: 0004 ilc:3 [#1] SMP
Modules linked in:
CPU: 0 UID: 0 PID: 665 Comm: mmap10 Not tainted 6.14.0-rc6-next-20250312 #16
Hardware name: IBM 3931 A01 704 (KVM/Linux)
Krnl PSW : 0704c00180000000 000003ffe0ee0440 (__rb_insert_augmented+0x60/0x210)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
Krnl GPRS: 00000000009ff000 0000000000000000 000000008e5f7508 0000000084a7ed08
           00000000000009fe 0000000000000000 0000000000000000 0000037fe06c7b68
           00000000801d0e90 000003ffe04158d0 0000000084a7ed08 0000000000000000
           000003ffbb700000 00000000801d0e48 000003ffe0ee057c 0000037fe06c7a40
Krnl Code: 000003ffe0ee0430: e31030080004 lg %r1,8(%r3)
           000003ffe0ee0436: ec1200888064 cgrj %r1,%r2,8,000003ffe0ee0546
          #000003ffe0ee043c: b90400a3 lgr %r10,%r3
          >000003ffe0ee0440: e310b0100024 stg %r1,16(%r11)
           000003ffe0ee0446: e3b030080024 stg %r11,8(%r3)
           000003ffe0ee044c: ec180009007c cgij %r1,0,8,000003ffe0ee045e
           000003ffe0ee0452: ec2b000100d9 aghik %r2,%r11,1
           000003ffe0ee0458: e32010000024 stg %r2,0(%r1)
Call Trace:
 [<000003ffe0ee0440>] __rb_insert_augmented+0x60/0x210
 [<000003ffe016d6c4>] dup_mmap+0x424/0x8c0
 [<000003ffe016dc62>] copy_mm+0x102/0x1c0
 [<000003ffe016e8ae>] copy_process+0x7ce/0x12b0
 [<000003ffe016f458>] kernel_clone+0x68/0x380
 [<000003ffe016f84a>] __do_sys_clone+0x5a/0x70
 [<000003ffe016faa0>] __s390x_sys_clone+0x40/0x50
 [<000003ffe011c9b6>] do_syscall.constprop.0+0x116/0x140
 [<000003ffe0ef1d64>] __do_syscall+0xd4/0x1c0
 [<000003ffe0efd044>] system_call+0x74/0x98
Last Breaking-Event-Address:
 [<000003ffe0ee058a>] __rb_insert_augmented+0x1aa/0x210
Kernel panic - not syncing: Fatal exception: panic_on_oops
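
Both traces reach __rb_insert_augmented from the clone path (copy_mm, and dup_mmap on s390), and the patch comments name /dev/zero as the case where vm_file is set on an anonymous VMA. A minimal userspace sketch of that sequence might look like the following. This is not the LTP mmap10 source; the helper name and the exact steps are assumptions, and the report does not establish whether this reduced sequence alone is enough to hit the oops. On an affected kernel it may panic the machine, so it should only be tried in a disposable VM.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from LTP): create a MAP_PRIVATE mapping
 * of /dev/zero, touch it, then fork() so the kernel has to duplicate the
 * mapping into the child. Returns 0 if the child ran and exited cleanly,
 * -1 on any failure.
 */
static int fork_with_private_dev_zero_mapping(size_t len)
{
	int fd = open("/dev/zero", O_RDWR);
	if (fd < 0)
		return -1;

	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping keeps its own reference to the file */
	if (p == MAP_FAILED)
		return -1;

	p[0] = 1;	/* fault in the first page */

	pid_t pid = fork();	/* enters the kernel's copy_mm() path */
	if (pid < 0) {
		munmap(p, len);
		return -1;
	}
	if (pid == 0)
		_exit(p[0] == 1 ? 0 : 1);	/* child sees the parent's data */

	int status = 0;
	pid_t waited = waitpid(pid, &status, 0);
	munmap(p, len);
	if (waited != pid)
		return -1;
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling fork_with_private_dev_zero_mapping(16 * 4096) from a small main() and checking for a return value of 0 is enough to exercise the path on a kernel without the regression.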
diff --git a/mm/vma.c b/mm/vma.c
index c7abef5177cc..2fe99d181cfd 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -1648,6 +1648,10 @@ static void unlink_file_vma_batch_process(struct unlink_vma_file_batch *vb)
 void unlink_file_vma_batch_add(struct unlink_vma_file_batch *vb,
 			       struct vm_area_struct *vma)
 {
+	/* Rare, but e.g. /dev/zero sets vma->vm_file on an anon VMA */
+	if (vma_is_anonymous(vma))
+		return;
+
 	if (vma->vm_file == NULL)
 		return;
 
@@ -1671,8 +1675,13 @@ void unlink_file_vma_batch_final(struct unlink_vma_file_batch *vb)
  */
 void unlink_file_vma(struct vm_area_struct *vma)
 {
-	struct file *file = vma->vm_file;
+	struct file *file;
+
+	/* Rare, but e.g. /dev/zero sets vma->vm_file on an anon VMA */
+	if (vma_is_anonymous(vma))
+		return;
 
+	file = vma->vm_file;
 	if (file) {
 		struct address_space *mapping = file->f_mapping;
 
@@ -1684,9 +1693,14 @@ void unlink_file_vma(struct vm_area_struct *vma)
 
 void vma_link_file(struct vm_area_struct *vma)
 {
-	struct file *file = vma->vm_file;
+	struct file *file;
 	struct address_space *mapping;
 
+	/* Rare, but e.g. /dev/zero sets vma->vm_file on an anon VMA */
+	if (vma_is_anonymous(vma))
+		return;
+
+	file = vma->vm_file;
 	if (file) {
 		mapping = file->f_mapping;
 		i_mmap_lock_write(mapping);
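
The effect of the new guard can be illustrated with a small userspace model. This is only a sketch: the toy_ types and functions are hypothetical stand-ins, not kernel code. The one kernel fact relied on is that vma_is_anonymous() tests for a NULL vm_ops. The last helper models, as an assumption, some other code path that still decides based on vm_file alone. The report above does not identify which path that is, although the traces are compatible with one in the fork code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few vm_area_struct fields that matter here. */
struct toy_vma {
	const void *vm_ops;	/* NULL means "anonymous", as in the kernel */
	const void *vm_file;	/* may be non-NULL even for an anon VMA */
	bool in_file_tree;	/* models membership in the file rmap tree */
};

/* Mirrors the kernel's vma_is_anonymous(): true when vm_ops is NULL. */
static bool toy_vma_is_anonymous(const struct toy_vma *vma)
{
	return vma->vm_ops == NULL;
}

/* Models vma_link_file() after the patch: anon VMAs are skipped first. */
static void toy_vma_link_file(struct toy_vma *vma)
{
	if (toy_vma_is_anonymous(vma))
		return;
	if (vma->vm_file)
		vma->in_file_tree = true;
}

/*
 * Assumption for illustration: a caller that looks only at vm_file will
 * expect every VMA with a file to already be in the tree.
 */
static bool toy_consistent_for_file_only_check(const struct toy_vma *vma)
{
	return vma->vm_file == NULL || vma->in_file_tree;
}
```

With vm_ops == NULL and vm_file set (the /dev/zero private mapping case named in the patch comments), toy_vma_link_file() leaves in_file_tree false, so toy_consistent_for_file_only_check() returns false. Any remaining caller that tests vm_file without also testing vma_is_anonymous() would then operate on a VMA that was never inserted.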