Message ID | 20241010195849.2f77cc3f@gandalf.local.home (mailing list archive) |
---|---|
State | Rejected |
Series | ring-buffer: Have the buffer update counter be atomic |
On 10/11/24 01:58, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
>
> In order to prevent any subtle races with the buffer update counter,
> change it to an atomic_t. Also, since atomic_t is 32 bits, move its
> location in the ring_buffer_per_cpu structure next to "current_context" as
> that too is only 32 bits (making 64 bit alignment).
>
> The counter is only used to detect that the buffer has been updated when
> the buffer verifier check is being done. It's not really that important
> that it's atomic or not. But since the updates to the counter are never in
> the fast path, having it be consistent isn't a bad thing.
>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> Note, this is based on top of:
>
> https://lore.kernel.org/linux-trace-kernel/20240715145141.5528-1-petr.pavlu@suse.com/

Sorry for not replying to your last comment on my patch, I was ill.

The member ring_buffer_per_cpu.cnt is intended to be accessed under the
reader_lock, same as the pages pointer which it is tied to, so this
change shouldn't be strictly needed.
On Fri, 11 Oct 2024 10:05:51 +0200
Petr Pavlu <petr.pavlu@suse.com> wrote:

> On 10/11/24 01:58, Steven Rostedt wrote:
> > From: Steven Rostedt <rostedt@goodmis.org>
> >
> > In order to prevent any subtle races with the buffer update counter,
> > change it to an atomic_t. Also, since atomic_t is 32 bits, move its
> > location in the ring_buffer_per_cpu structure next to "current_context" as
> > that too is only 32 bits (making 64 bit alignment).
> >
> > The counter is only used to detect that the buffer has been updated when
> > the buffer verifier check is being done. It's not really that important
> > that it's atomic or not. But since the updates to the counter are never in
> > the fast path, having it be consistent isn't a bad thing.
> >
> > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> > ---
> > Note, this is based on top of:
> >
> > https://lore.kernel.org/linux-trace-kernel/20240715145141.5528-1-petr.pavlu@suse.com/
>
> Sorry for not replying to your last comment on my patch, I was ill.
>
> The member ring_buffer_per_cpu.cnt is intended to be accessed under the
> reader_lock, same as the pages pointer which it is tied to, so this
> change shouldn't be strictly needed.
>

Right, but there was one location that the cnt was updated outside the
lock. The one I commented on. But instead of adding a lock around it, I
decided to just make it an atomic. Then there would be no need for the
lock. Hmm, it still needs a memory barrier though. At least a
smp_mb__after_atomic().

-- Steve
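The pattern being debated in these two messages is a generation counter: the verifier snapshots cnt under reader_lock, drops the lock for the expensive list walk, then re-takes the lock and compares the counter to detect that the list changed in the meantime. Below is a minimal userspace model of that idea, with a pthread mutex standing in for reader_lock and a C11 atomic standing in for atomic_t; all names here (pages_lock, list_generation, verify_list) are invented for the sketch and are not the kernel's.

```c
/* Userspace model of the generation-counter check discussed above.
 * Not kernel code; compile with: cc -std=c11 -pthread gen_check.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t pages_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_uint list_generation;	/* bumped on every list change */
static int list_items = 8;		/* stand-in for the page list   */

/* Writer side: modify the list and bump the generation counter while
 * holding the lock, as the ring buffer does for cpu_buffer->cnt.
 */
static void change_list(void)
{
	pthread_mutex_lock(&pages_lock);
	list_items++;
	atomic_fetch_add(&list_generation, 1);
	pthread_mutex_unlock(&pages_lock);
}

/* Verifier side: snapshot the generation, drop the lock for the slow
 * walk, then re-take the lock and compare.  A mismatch means the list
 * changed underneath us and the walk has to be restarted.
 */
static bool verify_list(void)
{
	unsigned int snap;
	bool unchanged;

	pthread_mutex_lock(&pages_lock);
	snap = atomic_load(&list_generation);
	pthread_mutex_unlock(&pages_lock);

	/* ... a long walk of the list would go here ... */

	pthread_mutex_lock(&pages_lock);
	unchanged = (snap == atomic_load(&list_generation));
	pthread_mutex_unlock(&pages_lock);
	return unchanged;
}

int main(void)
{
	change_list();
	printf("list unchanged during walk: %s\n",
	       verify_list() ? "yes" : "no");
	return 0;
}
```

Whether the kernel counter itself needs to be an atomic_t (plus a barrier such as smp_mb__after_atomic()) or can simply always be updated under reader_lock is the question the rest of the thread works through.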
On Fri, 11 Oct 2024 10:01:32 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> > Sorry for not replying to your last comment on my patch, I was ill.
> >
> > The member ring_buffer_per_cpu.cnt is intended to be accessed under the
> > reader_lock, same as the pages pointer which it is tied to, so this
> > change shouldn't be strictly needed.
> >
>
> Right, but there was one location that the cnt was updated outside the
> lock. The one I commented on. But instead of adding a lock around it, I
> decided to just make it an atomic. Then there would be no need for the
> lock. Hmm, it still needs a memory barrier though. At least a
> smp_mb__after_atomic().

Hmm, I don't like how the update in ring_buffer_subbuf_order_set() is
unprotected. I think this is the proper patch:

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 696d422d5b35..0672df07b599 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -6774,6 +6774,7 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
 	}
 
 	for_each_buffer_cpu(buffer, cpu) {
+		unsigned long flags;
 
 		if (!cpumask_test_cpu(cpu, buffer->cpumask))
 			continue;
@@ -6800,11 +6801,15 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
 						struct buffer_page, list);
 		list_del_init(&cpu_buffer->reader_page->list);
 
+		/* Synchronize with rb_check_pages() */
+		raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+
 		/* The cpu_buffer pages are a link list with no head */
 		cpu_buffer->pages = cpu_buffer->new_pages.next;
 		cpu_buffer->new_pages.next->prev = cpu_buffer->new_pages.prev;
 		cpu_buffer->new_pages.prev->next = cpu_buffer->new_pages.next;
 		cpu_buffer->cnt++;
+		raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
 
 		/* Clear the new_pages list */
 		INIT_LIST_HEAD(&cpu_buffer->new_pages);

Even though it's also protected by the buffer->mutex, I feel more
comfortable with this.

-- Steve
On 10/11/24 16:48, Steven Rostedt wrote:
> On Fri, 11 Oct 2024 10:01:32 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
>>> Sorry for not replying to your last comment on my patch, I was ill.
>>>
>>> The member ring_buffer_per_cpu.cnt is intended to be accessed under the
>>> reader_lock, same as the pages pointer which it is tied to, so this
>>> change shouldn't be strictly needed.
>>>
>>
>> Right, but there was one location that the cnt was updated outside the
>> lock. The one I commented on. But instead of adding a lock around it, I
>> decided to just make it an atomic. Then there would be no need for the
>> lock. Hmm, it still needs a memory barrier though. At least a
>> smp_mb__after_atomic().
>
> Hmm, I don't like how the update in ring_buffer_subbuf_order_set() is
> unprotected. I think this is the proper patch:
>
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 696d422d5b35..0672df07b599 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -6774,6 +6774,7 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
>  	}
>  
>  	for_each_buffer_cpu(buffer, cpu) {
> +		unsigned long flags;
>  
>  		if (!cpumask_test_cpu(cpu, buffer->cpumask))
>  			continue;
> @@ -6800,11 +6801,15 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
>  						struct buffer_page, list);
>  		list_del_init(&cpu_buffer->reader_page->list);
>  
> +		/* Synchronize with rb_check_pages() */
> +		raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
> +
>  		/* The cpu_buffer pages are a link list with no head */
>  		cpu_buffer->pages = cpu_buffer->new_pages.next;
>  		cpu_buffer->new_pages.next->prev = cpu_buffer->new_pages.prev;
>  		cpu_buffer->new_pages.prev->next = cpu_buffer->new_pages.next;
>  		cpu_buffer->cnt++;
> +		raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
>  
>  		/* Clear the new_pages list */
>  		INIT_LIST_HEAD(&cpu_buffer->new_pages);
>
> Even though it's also protected by the buffer->mutex, I feel more
> comfortable with this.

Sorry, I missed context of your comment and that it is about
ring_buffer_subbuf_order_set().

I agree, I also noticed the missing locking in this function and it
looked to me as something that should be fixed. I happen to have
a somewhat more complex patch for it from a few months ago (pasted
below). I think I didn't send it to the list because I then noticed
other potential locking problems with the subbuf code, which I wanted to
examine more closely first.
On Fri, 11 Oct 2024 17:20:12 +0200
Petr Pavlu <petr.pavlu@suse.com> wrote:

> I agree, I also noticed the missing locking in this function and it
> looked to me as something that should be fixed. I happen to have
> a somewhat more complex patch for it from a few months ago (pasted
> below). I think I didn't send it to the list because I then noticed
> other potential locking problems with the subbuf code, which I wanted to
> examine more closely first.
>

Hmm, I think you are correct that the buffer->mutex isn't enough for the
sub buffer page and it requires a bigger window. I'll look at your patch
and also the logic to see if it can be squeezed down a little.

-- Steve
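The approach Petr describes, and which his patch below implements, is to only unlink the old sub buffers while holding reader_lock and to free them after the lock has been dropped. A rough userspace sketch of that splice-then-free-outside-the-lock shape follows; a pthread mutex stands in for the raw spinlock, and every name here (replace_pages, page_node) is invented for illustration.

```c
/* Sketch only: swap the page list under the lock, free the old nodes
 * after the lock is released.  Compile with: cc -std=c11 -pthread x.c
 */
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>

struct page_node {
	struct page_node *next;
	int id;
};

static pthread_mutex_t reader_lock = PTHREAD_MUTEX_INITIALIZER;
static struct page_node *pages;		/* protected by reader_lock */

static void replace_pages(struct page_node *new_pages)
{
	struct page_node *old_pages;

	/* Critical section: just unlink/swap pointers, no freeing here. */
	pthread_mutex_lock(&reader_lock);
	old_pages = pages;
	pages = new_pages;
	pthread_mutex_unlock(&reader_lock);

	/* Free the old nodes outside the lock. */
	while (old_pages) {
		struct page_node *next = old_pages->next;

		free(old_pages);
		old_pages = next;
	}
}

int main(void)
{
	/* Build a small "old" list, then replace it with an empty one. */
	for (int i = 0; i < 3; i++) {
		struct page_node *n = malloc(sizeof(*n));

		n->id = i;
		n->next = pages;
		pages = n;
	}
	replace_pages(NULL);
	printf("old pages freed outside the lock\n");
	return 0;
}
```

One benefit of this shape is that the work done while the lock is held (with a raw spinlock, with interrupts disabled) no longer depends on how many pages have to be freed.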
On Fri, 11 Oct 2024 17:20:12 +0200
Petr Pavlu <petr.pavlu@suse.com> wrote:

> >From 359f5aa523bc27ca8600b4efae471eefc0514ce0 Mon Sep 17 00:00:00 2001
> From: Petr Pavlu <petr.pavlu@suse.com>
> Date: Tue, 16 Jul 2024 11:35:00 +0200
> Subject: [PATCH] ring-buffer: Fix reader locking when changing the sub buffer
>  order
>
> The function ring_buffer_subbuf_order_set() updates each
> ring_buffer_per_cpu and installs new sub buffers that match the requested
> page order. This operation may be invoked concurrently with readers that
> rely on some of the modified data, such as the head bit (RB_PAGE_HEAD), or
> the ring_buffer_per_cpu.pages and reader_page pointers. However, no
> exclusive access is acquired by ring_buffer_subbuf_order_set(). Modifying
> the mentioned data while a reader also operates on them can then result in
> incorrect memory access and various crashes.
>
> Fix the problem by taking the reader_lock when updating a specific
> ring_buffer_per_cpu in ring_buffer_subbuf_order_set().
>
> Fixes: 8e7b58c27b3c ("ring-buffer: Just update the subbuffers when changing their allocation order")
> Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
> ---
>  kernel/trace/ring_buffer.c | 41 ++++++++++++++++++++++++++---------------
>  1 file changed, 26 insertions(+), 15 deletions(-)
>
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 4c24191fa47d..d324f4f9ab9d 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -6773,36 +6773,38 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
>  	}
>  
>  	for_each_buffer_cpu(buffer, cpu) {
> +		struct buffer_data_page *old_free_data_page;
> +		struct list_head old_pages;
> +		unsigned long flags;
>  
>  		if (!cpumask_test_cpu(cpu, buffer->cpumask))
>  			continue;
>  
>  		cpu_buffer = buffer->buffers[cpu];
>  
> +		raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
> +
>  		/* Clear the head bit to make the link list normal to read */
>  		rb_head_page_deactivate(cpu_buffer);
>  
> -		/* Now walk the list and free all the old sub buffers */
> -		list_for_each_entry_safe(bpage, tmp, cpu_buffer->pages, list) {
> -			list_del_init(&bpage->list);
> -			free_buffer_page(bpage);
> -		}
> -		/* The above loop stopped an the last page needing to be freed */
> -		bpage = list_entry(cpu_buffer->pages, struct buffer_page, list);
> -		free_buffer_page(bpage);
> -
> -		/* Free the current reader page */
> -		free_buffer_page(cpu_buffer->reader_page);
> +		/*
> +		 * Collect buffers from the cpu_buffer pages list and the
> +		 * reader_page on old_pages, so they can be freed later when not
> +		 * under a spinlock. The pages list is a linked list with no
> +		 * head, adding old_pages turns it into a regular list with
> +		 * old_pages being the head.
> +		 */
> +		list_add(&old_pages, cpu_buffer->pages);
> +		list_add(&cpu_buffer->reader_page->list, &old_pages);
>  
>  		/* One page was allocated for the reader page */
>  		cpu_buffer->reader_page = list_entry(cpu_buffer->new_pages.next,
>  						struct buffer_page, list);
>  		list_del_init(&cpu_buffer->reader_page->list);
>  
> -		/* The cpu_buffer pages are a link list with no head */
> +		/* Install the new pages, remove the head from the list */
>  		cpu_buffer->pages = cpu_buffer->new_pages.next;
> -		cpu_buffer->new_pages.next->prev = cpu_buffer->new_pages.prev;
> -		cpu_buffer->new_pages.prev->next = cpu_buffer->new_pages.next;
> +		list_del(&cpu_buffer->new_pages);

Heh, not sure why I didn't just use list_del() on the head itself
instead of open coding it.

But anyway, it should be list_del_init(), as I will likely remove the
initialization of new_pages and have it always be initialized. I rather
have it clean than pointing to something unknown.

Care to send this as a real patch? We can look at any other locking
issues here separately, but this should be fixed now.

-- Steve

>  		cpu_buffer->cnt++;
>  
>  		/* Clear the new_pages list */
> @@ -6815,11 +6817,20 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
>  		cpu_buffer->nr_pages = cpu_buffer->nr_pages_to_update;
>  		cpu_buffer->nr_pages_to_update = 0;
>  
> -		free_pages((unsigned long)cpu_buffer->free_page, old_order);
> +		old_free_data_page = cpu_buffer->free_page;
>  		cpu_buffer->free_page = NULL;
>  
>  		rb_head_page_activate(cpu_buffer);
>  
> +		raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
> +
> +		/* Free old sub buffers */
> +		list_for_each_entry_safe(bpage, tmp, &old_pages, list) {
> +			list_del_init(&bpage->list);
> +			free_buffer_page(bpage);
> +		}
> +		free_pages((unsigned long)old_free_data_page, old_order);
> +
>  		rb_check_pages(cpu_buffer);
>  	}
>
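For context on the list_del()/list_del_init() remark: in the kernel's list_head idiom, the two open-coded pointer assignments being replaced are what list_del() does, and list_del_init() additionally re-initializes the removed head to point at itself. The snippet below is a small self-contained imitation of that idiom (modeled on include/linux/list.h, without its poisoning and debug checks), not the kernel code itself.

```c
/* Minimal userspace re-implementation of the list_head idiom, only to
 * illustrate the point above.
 */
#include <stdio.h>

struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

/* Equivalent of the open-coded lines in the patch:
 *	head->next->prev = head->prev;
 *	head->prev->next = head->next;
 * plus re-initializing the removed head so it never dangles.
 */
static void list_del_init(struct list_head *head)
{
	head->next->prev = head->prev;
	head->prev->next = head->next;
	INIT_LIST_HEAD(head);
}

int main(void)
{
	struct list_head new_pages, a, b;

	INIT_LIST_HEAD(&new_pages);
	list_add(&a, &new_pages);
	list_add(&b, &new_pages);

	/* Detach the head; a and b stay linked to each other. */
	list_del_init(&new_pages);
	printf("head self-linked: %d\n", new_pages.next == &new_pages);
	printf("a and b still linked: %d\n", a.prev == &b && b.next == &a);
	return 0;
}
```

After list_del_init(&new_pages) the remaining entries still form a ring among themselves, which matches the "list with no head" layout the ring buffer uses for cpu_buffer->pages.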
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index a6a1c26ea2e3..bbf7f68f9af2 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -481,9 +481,9 @@ struct ring_buffer_per_cpu {
 	struct buffer_data_page	*free_page;
 	unsigned long			nr_pages;
 	unsigned int			current_context;
-	struct list_head		*pages;
 	/* pages generation counter, incremented when the list changes */
-	unsigned long			cnt;
+	atomic_t			cnt;
+	struct list_head		*pages;
 	struct buffer_page		*head_page;	/* read from head */
 	struct buffer_page		*tail_page;	/* write to tail */
 	struct buffer_page		*commit_page;	/* committed pages */
@@ -1532,7 +1532,7 @@ static void rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
 	head = rb_list_head(cpu_buffer->pages);
 	if (!rb_check_links(cpu_buffer, head))
 		goto out_locked;
-	buffer_cnt = cpu_buffer->cnt;
+	buffer_cnt = atomic_read(&cpu_buffer->cnt);
 	tmp = head;
 	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
 	return;
@@ -1540,7 +1540,7 @@
 	while (true) {
 		raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
 
-		if (buffer_cnt != cpu_buffer->cnt) {
+		if (buffer_cnt != atomic_read(&cpu_buffer->cnt)) {
 			/* The list was updated, try again. */
 			raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
 			goto again;
@@ -2585,7 +2585,7 @@ rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned long nr_pages)
 
 		/* make sure pages points to a valid page in the ring buffer */
 		cpu_buffer->pages = next_page;
-		cpu_buffer->cnt++;
+		atomic_inc(&cpu_buffer->cnt);
 
 		/* update head page */
 		if (head_bit)
@@ -2692,7 +2692,7 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer)
 			 * pointer to point to end of list
 			 */
 			head_page->prev = last_page;
-			cpu_buffer->cnt++;
+			atomic_inc(&cpu_buffer->cnt);
 			success = true;
 			break;
 		}
@@ -5347,7 +5347,7 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
 	rb_list_head(reader->list.next)->prev = &cpu_buffer->reader_page->list;
 	rb_inc_page(&cpu_buffer->head_page);
-	cpu_buffer->cnt++;
+	atomic_inc(&cpu_buffer->cnt);
 
 	local_inc(&cpu_buffer->pages_read);
 
 	/* Finally update the reader page to the new head */
@@ -6804,7 +6804,7 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
 		cpu_buffer->pages = cpu_buffer->new_pages.next;
 		cpu_buffer->new_pages.next->prev = cpu_buffer->new_pages.prev;
 		cpu_buffer->new_pages.prev->next = cpu_buffer->new_pages.next;
-		cpu_buffer->cnt++;
+		atomic_inc(&cpu_buffer->cnt);
 
 		/* Clear the new_pages list */
 		INIT_LIST_HEAD(&cpu_buffer->new_pages);