Message ID: 20231219184556.1552951-1-vdonnefort@google.com (mailing list archive)
Series: Introducing trace buffer mapping by user-space
On Tue, 19 Dec 2023 18:45:54 +0000
Vincent Donnefort <vdonnefort@google.com> wrote:

> The tracing ring-buffers can be stored on disk or sent to network
> without any copy via splice. However the latter doesn't allow real time
> processing of the traces. A solution is to give userspace direct access
> to the ring-buffer pages via a mapping. An application can now become a
> consumer of the ring-buffer, in a similar fashion to what trace_pipe
> offers.
>
> Attached to this cover letter an example of consuming read for a
> ring-buffer, using libtracefs.

I'm still testing this, but I needed to add this patch to fix two bugs. One
is that you are calling rb_wake_up_waiters() for both the buffer and the
cpu_buffer, and it needs to know which one to use in the container_of()
macro.

The other is a "goto unlock" that unlocks two locks where only one was
taken.

-- Steve

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 35f3736f660b..987ad7bd1e8b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -389,6 +389,7 @@ struct rb_irq_work {
 	bool				waiters_pending;
 	bool				full_waiters_pending;
 	bool				wakeup_full;
+	bool				is_cpu_buffer;
 };
 
 /*
@@ -771,10 +772,20 @@ static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
 static void rb_wake_up_waiters(struct irq_work *work)
 {
 	struct rb_irq_work *rbwork = container_of(work, struct rb_irq_work, work);
-	struct ring_buffer_per_cpu *cpu_buffer =
-		container_of(rbwork, struct ring_buffer_per_cpu, irq_work);
+	struct ring_buffer_per_cpu *cpu_buffer;
+	struct trace_buffer *buffer;
+	int cpu;
 
-	rb_update_meta_page(cpu_buffer);
+	if (rbwork->is_cpu_buffer) {
+		cpu_buffer = container_of(rbwork, struct ring_buffer_per_cpu, irq_work);
+		rb_update_meta_page(cpu_buffer);
+	} else {
+		buffer = container_of(rbwork, struct trace_buffer, irq_work);
+		for_each_buffer_cpu(buffer, cpu) {
+			cpu_buffer = buffer->buffers[cpu];
+			rb_update_meta_page(cpu_buffer);
+		}
+	}
 
 	wake_up_all(&rbwork->waiters);
 	if (rbwork->full_waiters_pending || rbwork->wakeup_full) {
@@ -1569,6 +1580,7 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
 	init_waitqueue_head(&cpu_buffer->irq_work.waiters);
 	init_waitqueue_head(&cpu_buffer->irq_work.full_waiters);
 	mutex_init(&cpu_buffer->mapping_lock);
+	cpu_buffer->irq_work.is_cpu_buffer = true;
 
 	bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
 			     GFP_KERNEL, cpu_to_node(cpu));
@@ -6209,7 +6221,8 @@ int ring_buffer_map(struct trace_buffer *buffer, int cpu)
 
 	if (cpu_buffer->mapped) {
 		WRITE_ONCE(cpu_buffer->mapped, cpu_buffer->mapped + 1);
-		goto unlock;
+		mutex_unlock(&cpu_buffer->mapping_lock);
+		return 0;
 	}
 
 	/* prevent another thread from changing buffer sizes */
On Tue, Dec 19, 2023 at 03:39:24PM -0500, Steven Rostedt wrote:
> On Tue, 19 Dec 2023 18:45:54 +0000
> Vincent Donnefort <vdonnefort@google.com> wrote:
> 
> > The tracing ring-buffers can be stored on disk or sent to network
> > without any copy via splice. However the latter doesn't allow real time
> > processing of the traces. A solution is to give userspace direct access
> > to the ring-buffer pages via a mapping. An application can now become a
> > consumer of the ring-buffer, in a similar fashion to what trace_pipe
> > offers.
> > 
> > Attached to this cover letter an example of consuming read for a
> > ring-buffer, using libtracefs.
> 
> I'm still testing this, but I needed to add this patch to fix two bugs. One
> is that you are calling rb_wake_up_waiters() for both the buffer and the
> cpu_buffer, and it needs to know which one to use in the container_of() macro.
> 
> The other is a "goto unlock" that unlocks two locks where only one was taken.
> 
> -- Steve
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 35f3736f660b..987ad7bd1e8b 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -389,6 +389,7 @@ struct rb_irq_work {
>  	bool				waiters_pending;
>  	bool				full_waiters_pending;
>  	bool				wakeup_full;
> +	bool				is_cpu_buffer;
>  };
>  
>  /*
> @@ -771,10 +772,20 @@ static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
>  static void rb_wake_up_waiters(struct irq_work *work)
>  {
>  	struct rb_irq_work *rbwork = container_of(work, struct rb_irq_work, work);
> -	struct ring_buffer_per_cpu *cpu_buffer =
> -		container_of(rbwork, struct ring_buffer_per_cpu, irq_work);
> +	struct ring_buffer_per_cpu *cpu_buffer;
> +	struct trace_buffer *buffer;
> +	int cpu;
>  
> -	rb_update_meta_page(cpu_buffer);
> +	if (rbwork->is_cpu_buffer) {
> +		cpu_buffer = container_of(rbwork, struct ring_buffer_per_cpu, irq_work);
> +		rb_update_meta_page(cpu_buffer);
> +	} else {
> +		buffer = container_of(rbwork, struct trace_buffer, irq_work);
> +		for_each_buffer_cpu(buffer, cpu) {
> +			cpu_buffer = buffer->buffers[cpu];
> +			rb_update_meta_page(cpu_buffer);
> +		}
> +	}

Arg, somehow never reproduced the problem :-\. I suppose you need to cat
trace/trace_pipe and mmap(trace/cpuX/trace_pipe) at the same time?

Updating the meta-page is only useful if the reader we are waking up is a
user-space one, which would only happen with the cpu_buffer version of this
function. We could limit the update of the meta_page only to this case?

>  
>  	wake_up_all(&rbwork->waiters);
>  	if (rbwork->full_waiters_pending || rbwork->wakeup_full) {
> @@ -1569,6 +1580,7 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
>  	init_waitqueue_head(&cpu_buffer->irq_work.waiters);
>  	init_waitqueue_head(&cpu_buffer->irq_work.full_waiters);
>  	mutex_init(&cpu_buffer->mapping_lock);
> +	cpu_buffer->irq_work.is_cpu_buffer = true;
>  
>  	bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
>  			     GFP_KERNEL, cpu_to_node(cpu));
> @@ -6209,7 +6221,8 @@ int ring_buffer_map(struct trace_buffer *buffer, int cpu)
>  
>  	if (cpu_buffer->mapped) {
>  		WRITE_ONCE(cpu_buffer->mapped, cpu_buffer->mapped + 1);
> -		goto unlock;
> +		mutex_unlock(&cpu_buffer->mapping_lock);
> +		return 0;
>  	}
>  
>  	/* prevent another thread from changing buffer sizes */
On Wed, 20 Dec 2023 13:06:06 +0000
Vincent Donnefort <vdonnefort@google.com> wrote:

> > @@ -771,10 +772,20 @@ static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
> >  static void rb_wake_up_waiters(struct irq_work *work)
> >  {
> >  	struct rb_irq_work *rbwork = container_of(work, struct rb_irq_work, work);
> > -	struct ring_buffer_per_cpu *cpu_buffer =
> > -		container_of(rbwork, struct ring_buffer_per_cpu, irq_work);
> > +	struct ring_buffer_per_cpu *cpu_buffer;
> > +	struct trace_buffer *buffer;
> > +	int cpu;
> >  
> > -	rb_update_meta_page(cpu_buffer);
> > +	if (rbwork->is_cpu_buffer) {
> > +		cpu_buffer = container_of(rbwork, struct ring_buffer_per_cpu, irq_work);
> > +		rb_update_meta_page(cpu_buffer);
> > +	} else {
> > +		buffer = container_of(rbwork, struct trace_buffer, irq_work);
> > +		for_each_buffer_cpu(buffer, cpu) {
> > +			cpu_buffer = buffer->buffers[cpu];
> > +			rb_update_meta_page(cpu_buffer);
> > +		}
> > +	}
> 
> Arg, somehow never reproduced the problem :-\. I suppose you need to cat
> trace/trace_pipe and mmap(trace/cpuX/trace_pipe) at the same time?

It triggered as soon as I ran "trace-cmd start -e sched_switch".

In other words, it broke the non-mmap case. This function gets called for
both the buffer and cpu_buffer irq_work entries. You added the
container_of() to get access to cpu_buffer, when the rbwork could also be
for the main buffer too. The main buffer has no meta page, and it triggered
a NULL pointer dereference, as "cpu_buffer->mapped" returned true (because
it was on something of the buffer structure that wasn't zero), and then here:

	if (cpu_buffer->mapped) {
		WRITE_ONCE(cpu_buffer->meta_page->reader.read, 0);

It dereferenced cpu_buffer->meta_page->reader, which is only God knows what!

> Updating the meta-page is only useful if the reader we are waking up is a
> user-space one, which would only happen with the cpu_buffer version of this
> function. We could limit the update of the meta_page only to this case?

I'd rather not add another irq_work entry. This workaround should be good
enough.

Thanks,

-- Steve
On Wed, Dec 20, 2023 at 08:29:32AM -0500, Steven Rostedt wrote:
> On Wed, 20 Dec 2023 13:06:06 +0000
> Vincent Donnefort <vdonnefort@google.com> wrote:
> 
> > > @@ -771,10 +772,20 @@ static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
> > >  static void rb_wake_up_waiters(struct irq_work *work)
> > >  {
> > >  	struct rb_irq_work *rbwork = container_of(work, struct rb_irq_work, work);
> > > -	struct ring_buffer_per_cpu *cpu_buffer =
> > > -		container_of(rbwork, struct ring_buffer_per_cpu, irq_work);
> > > +	struct ring_buffer_per_cpu *cpu_buffer;
> > > +	struct trace_buffer *buffer;
> > > +	int cpu;
> > >  
> > > -	rb_update_meta_page(cpu_buffer);
> > > +	if (rbwork->is_cpu_buffer) {
> > > +		cpu_buffer = container_of(rbwork, struct ring_buffer_per_cpu, irq_work);
> > > +		rb_update_meta_page(cpu_buffer);
> > > +	} else {
> > > +		buffer = container_of(rbwork, struct trace_buffer, irq_work);
> > > +		for_each_buffer_cpu(buffer, cpu) {
> > > +			cpu_buffer = buffer->buffers[cpu];
> > > +			rb_update_meta_page(cpu_buffer);
> > > +		}
> > > +	}
> > 
> > Arg, somehow never reproduced the problem :-\. I suppose you need to cat
> > trace/trace_pipe and mmap(trace/cpuX/trace_pipe) at the same time?
> 
> It triggered as soon as I ran "trace-cmd start -e sched_switch".
> 
> In other words, it broke the non-mmap case. This function gets called for
> both the buffer and cpu_buffer irq_work entries. You added the
> container_of() to get access to cpu_buffer, when the rbwork could also be
> for the main buffer too. The main buffer has no meta page, and it triggered
> a NULL pointer dereference, as "cpu_buffer->mapped" returned true (because
> it was on something of the buffer structure that wasn't zero), and then here:
> 
> 	if (cpu_buffer->mapped) {
> 		WRITE_ONCE(cpu_buffer->meta_page->reader.read, 0);
> 
> It dereferenced cpu_buffer->meta_page->reader, which is only God knows what!
> 
> > Updating the meta-page is only useful if the reader we are waking up is a
> > user-space one, which would only happen with the cpu_buffer version of this
> > function. We could limit the update of the meta_page only to this case?
> 
> I'd rather not add another irq_work entry. This workaround should be good
> enough.

I meant, to only do in rb_wake_up_waiters()

	if (rbwork->is_cpu_buffer)
		rb_update_meta_page(cpu_buffer)

And skip the meta-page update for the !is_cpu_buffer case?

> 
> Thanks,
> 
> -- Steve
On Wed, 20 Dec 2023 13:49:30 +0000
Vincent Donnefort <vdonnefort@google.com> wrote:

> I meant, to only do in rb_wake_up_waiters()
> 
> 	if (rbwork->is_cpu_buffer)
> 		rb_update_meta_page(cpu_buffer)
> 
> And skip the meta-page update for the !is_cpu_buffer case?

Ah yeah, that works.

-- Steve