| Message ID | 20250214115547.0d7287d3@gandalf.local.home |
|---|---|
| State | Accepted |
| Commit | 129fe718819cc5e24ea2f489db9ccd4371f0c6f6 |
| Series | [v2] tracing: Do not allow mmap() of persistent ring buffer |
On Fri, 14 Feb 2025 11:55:47 -0500 Steven Rostedt <rostedt@goodmis.org> wrote:

> From: Steven Rostedt <rostedt@goodmis.org>
....
> The reason was that the code that maps the ring buffer pages to user space
> has:
>
> 	page = virt_to_page((void *)cpu_buffer->subbuf_ids[s]);
...
> But virt_to_page() does not work with vmap()'d memory which is what the
> persistent ring buffer has. It is rather trivial to allow this, but for
> now just disable mmap() of instances that have their ring buffer from the
> reserve_mem option.

I've recently fallen foul of the same issue elsewhere [1].

Perhaps there ought to be a variant of virt_to_page() that returns an error
for addresses outside the kernel map. Or even a fast version that doesn't
check, for places where the cost of the additional conditional would matter.
Even a kernel panic from an unchecked NULL pointer would be easier to
diagnose than the current situation.

[1] In my case it was dma_alloc_coherent() using vmalloc() when an iommu is
enabled, and then the wrong things happening when I try to mmap() the buffer
into userspace (offset in both the dma buffer and the user file). I do need
to check that the iommu is honouring the buffer alignment.

	David
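[Editorial note: as a purely illustrative aside, here is a minimal sketch of the kind of checked helper being suggested. The wrapper name virt_to_page_checked() is hypothetical and does not exist in the kernel; virt_addr_valid(), is_vmalloc_addr() and vmalloc_to_page() are existing helpers.]

```c
/* Hypothetical sketch only: a virt_to_page() variant that reports an
 * error for addresses outside the linear kernel map instead of
 * silently returning a bogus struct page.
 */
#include <linux/err.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

static inline struct page *virt_to_page_checked(const void *addr)
{
	/* Linear-mapped (lowmem) address: virt_to_page() is safe. */
	if (virt_addr_valid(addr))
		return virt_to_page(addr);

	/* vmalloc()/vmap() range: look the page up via the page tables. */
	if (is_vmalloc_addr(addr))
		return vmalloc_to_page(addr);

	/* Anything else (module space, bad pointer, ...): tell the caller. */
	return ERR_PTR(-EINVAL);
}
```

The "fast" unchecked variant mentioned above would simply be today's virt_to_page(), so the extra conditionals are only paid by callers that opt into the checked path.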
On Fri, 14 Feb 2025 11:55:47 -0500 Steven Rostedt <rostedt@goodmis.org> wrote:

> From: Steven Rostedt <rostedt@goodmis.org>
>
> When trying to mmap a trace instance buffer that is attached to
> reserve_mem, it would crash:
>
> BUG: unable to handle page fault for address: ffffe97bd00025c8
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 2862f3067 P4D 2862f3067 PUD 0
> Oops: Oops: 0000 [#1] PREEMPT_RT SMP PTI
> CPU: 4 UID: 0 PID: 981 Comm: mmap-rb Not tainted 6.14.0-rc2-test-00003-g7f1a5e3fbf9e-dirty #233
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> RIP: 0010:validate_page_before_insert+0x5/0xb0
> Code: e2 01 89 d0 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 <48> 8b 46 08 a8 01 75 67 66 90 48 89 f0 8b 50 34 85 d2 74 76 48 89
> RSP: 0018:ffffb148c2f3f968 EFLAGS: 00010246
> RAX: ffff9fa5d3322000 RBX: ffff9fa5ccff9c08 RCX: 00000000b879ed29
> RDX: ffffe97bd00025c0 RSI: ffffe97bd00025c0 RDI: ffff9fa5ccff9c08
> RBP: ffffb148c2f3f9f0 R08: 0000000000000004 R09: 0000000000000004
> R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
> R13: 00007f16a18d5000 R14: ffff9fa5c48db6a8 R15: 0000000000000000
> FS: 00007f16a1b54740(0000) GS:ffff9fa73df00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffe97bd00025c8 CR3: 00000001048c6006 CR4: 0000000000172ef0
> Call Trace:
>  <TASK>
>  ? __die_body.cold+0x19/0x1f
>  ? __die+0x2e/0x40
>  ? page_fault_oops+0x157/0x2b0
>  ? search_module_extables+0x53/0x80
>  ? validate_page_before_insert+0x5/0xb0
>  ? kernelmode_fixup_or_oops.isra.0+0x5f/0x70
>  ? __bad_area_nosemaphore+0x16e/0x1b0
>  ? bad_area_nosemaphore+0x16/0x20
>  ? do_kern_addr_fault+0x77/0x90
>  ? exc_page_fault+0x22b/0x230
>  ? asm_exc_page_fault+0x2b/0x30
>  ? validate_page_before_insert+0x5/0xb0
>  ? vm_insert_pages+0x151/0x400
>  __rb_map_vma+0x21f/0x3f0
>  ring_buffer_map+0x21b/0x2f0
>  tracing_buffers_mmap+0x70/0xd0
>  __mmap_region+0x6f0/0xbd0
>  mmap_region+0x7f/0x130
>  do_mmap+0x475/0x610
>  vm_mmap_pgoff+0xf2/0x1d0
>  ksys_mmap_pgoff+0x166/0x200
>  __x64_sys_mmap+0x37/0x50
>  x64_sys_call+0x1670/0x1d70
>  do_syscall_64+0xbb/0x1d0
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> The reason was that the code that maps the ring buffer pages to user space
> has:
>
> 	page = virt_to_page((void *)cpu_buffer->subbuf_ids[s]);
>
> And uses that in:
>
> 	vm_insert_pages(vma, vma->vm_start, pages, &nr_pages);
>
> But virt_to_page() does not work with vmap()'d memory which is what the
> persistent ring buffer has. It is rather trivial to allow this, but for
> now just disable mmap() of instances that have their ring buffer from the
> reserve_mem option.
>
> If an mmap() is performed on a persistent buffer it will return -ENODEV
> just like it would if the .mmap field wasn't defined in the
> file_operations structure.
>

Looks good to me.

Tested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Thanks,

> Cc: stable@vger.kernel.org
> Fixes: e645535a954ad ("tracing: Add option to use memmapped memory for trace boot instance")
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> Changes since v1: https://lore.kernel.org/20250213180737.061871ae@gandalf.local.home
>
> - Return -ENODEV instead of -EINVAL as that is what is returned when the .mmap
>   field is missing from the file_operations structure.
>
>  kernel/trace/trace.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 25ff37aab00f..0e6d517e74e0 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -8279,6 +8279,10 @@ static int tracing_buffers_mmap(struct file *filp, struct vm_area_struct *vma)
>  	struct trace_iterator *iter = &info->iter;
>  	int ret = 0;
>  
> +	/* Currently the boot mapped buffer is not supported for mmap */
> +	if (iter->tr->flags & TRACE_ARRAY_FL_BOOT)
> +		return -ENODEV;
> +
>  	ret = get_snapshot_map(iter->tr);
>  	if (ret)
>  		return ret;
> --
> 2.47.2
>
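[Editorial note: as a side note on the changelog's "rather trivial to allow this" remark, a natural way to support this later is sketched below. This is an assumption, not code taken from any actual kernel commit: pick the page-lookup helper based on the address range, using vmalloc_to_page() for the vmap()'d persistent buffer and virt_to_page() otherwise. The helper name rb_subbuf_to_page() is made up for illustration.]

```c
/* Illustrative sketch only: resolve a ring-buffer sub-buffer address to
 * its struct page, handling both page-allocated and vmap()'d buffers.
 */
#include <linux/mm.h>
#include <linux/vmalloc.h>

static struct page *rb_subbuf_to_page(void *subbuf)
{
	/* reserve_mem buffers are vmap()'d, so walk the page tables. */
	if (is_vmalloc_addr(subbuf))
		return vmalloc_to_page(subbuf);

	/* Normally allocated buffers live in the linear map. */
	return virt_to_page(subbuf);
}
```

Either way, the resulting struct page could then be handed to vm_insert_pages() as the existing __rb_map_vma() code does.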
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 25ff37aab00f..0e6d517e74e0 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8279,6 +8279,10 @@ static int tracing_buffers_mmap(struct file *filp, struct vm_area_struct *vma)
 	struct trace_iterator *iter = &info->iter;
 	int ret = 0;
 
+	/* Currently the boot mapped buffer is not supported for mmap */
+	if (iter->tr->flags & TRACE_ARRAY_FL_BOOT)
+		return -ENODEV;
+
 	ret = get_snapshot_map(iter->tr);
 	if (ret)
 		return ret;
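[Editorial note: for completeness, a small userspace sketch of the behaviour the patch gives: mmap() on the trace_pipe_raw file of a reserve_mem ("boot mapped") instance now fails with ENODEV, the same error userspace sees when a file has no .mmap handler at all. The instance name "boot_mapped", the tracefs path, and the 4 KiB mapping length are assumptions for illustration.]

```c
/* Demonstration only: the path and mapping length are assumed values. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	const char *path =
		"/sys/kernel/tracing/instances/boot_mapped/per_cpu/cpu0/trace_pipe_raw";
	int fd = open(path, O_RDONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* With the patch applied, this is expected to fail with ENODEV. */
	void *map = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		printf("mmap failed: %s\n", strerror(errno));
	else
		munmap(map, 4096);

	close(fd);
	return 0;
}
```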