Message ID | 20230208090814.869242-1-zhengyejian1@huawei.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | tracing/ring-buffer: Remove integrity check at end of iter read | expand |
On Wed, 8 Feb 2023 17:08:14 +0800 Zheng Yejian <zhengyejian1@huawei.com> wrote: > Concurrently closing "trace" file and writing into ring buffer [1] can > cause WARNINGs [2]. It has been reported in > Link: https://lore.kernel.org/all/20230203035608.2336906-1-zhengyejian1@huawei.com/ > > It seems a data race between ring_buffer writing and integrity check. > That is, RB_FLAG of head_page is been updating, while at same time RB_FLAG > was cleared when doing integrity check: > rb_check_pages() rb_handle_head_page(): > -------- -------- > rb_head_page_deactivate() > rb_head_page_set_normal() > rb_head_page_activate() > Good catch! > Integrity check at end of iter read was added since commit 659f451ff213 > ("ring-buffer: Add integrity check at end of iter read"). As it's commit > message said: > > As reading via an iterator requires disabling the ring buffer, it > > is a perfect place to have it. > However, since commit 1039221cc278 ("ring-buffer: Do not disable recording > when there is an iterator"), ring buffer was not disabled at that place, > so that integrity check should be removed. > > 1: > ``` read_trace.sh > while true; > do > # the "trace" file is closed after read > head -1 /sys/kernel/tracing/trace > /dev/null > done > ``` > ``` repro.sh > sysctl -w kernel.panic_on_warn=1 > # function tracer will writing enough data into ring_buffer > echo function > /sys/kernel/tracing/current_tracer > ./read_trace.sh & > ./read_trace.sh & > ./read_trace.sh & > ./read_trace.sh & > ./read_trace.sh & > ./read_trace.sh & > ./read_trace.sh & > ./read_trace.sh & > ``` > > Fixes: 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator") > Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> > --- > kernel/trace/ring_buffer.c | 11 ----------- > 1 file changed, 11 deletions(-) > > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c > index c366a0a9ddba..34e955bd1e59 100644 > --- a/kernel/trace/ring_buffer.c > +++ b/kernel/trace/ring_buffer.c > @@ -5203,17 +5203,6 @@ void > ring_buffer_read_finish(struct ring_buffer_iter *iter) > { > struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer; > - unsigned long flags; > - > - /* > - * Ring buffer is disabled from recording, here's a good place > - * to check the integrity of the ring buffer. > - * Must prevent readers from trying to read, as the check > - * clears the HEAD page and readers require it. > - */ > - raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); > - rb_check_pages(cpu_buffer); > - raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); I would rather find a way to make this still work than just removing it. Perhaps there's no reason to clear the flags, and change rb_check_pages() to mask them out before testing. Something like: static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer) { struct list_head *head = cpu_buffer->pages; struct buffer_page *bpage, *tmp; if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->next)->prev) != head)) return -1; if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->prev)->next) != head)) return -1; if (rb_check_list(cpu_buffer, head)) return -1; list_for_each_entry_safe(bpage, tmp, head, list) { if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(bpage->list.next)->prev) != &bpage->list)) return -1; if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(bpage->list.prev)->next) != &bpage->list)) return -1; if (rb_check_list(cpu_buffer, &bpage->list)) return -1; } return 0; } I haven't tested the above. ? -- Steve > > atomic_dec(&cpu_buffer->resize_disabled); > kfree(iter->event);
On 2023/2/9 06:36, Steven Rostedt wrote: > On Wed, 8 Feb 2023 17:08:14 +0800 > Zheng Yejian <zhengyejian1@huawei.com> wrote: > >> Concurrently closing "trace" file and writing into ring buffer [1] can >> cause WARNINGs [2]. It has been reported in >> Link: https://lore.kernel.org/all/20230203035608.2336906-1-zhengyejian1@huawei.com/ >> >> It seems a data race between ring_buffer writing and integrity check. >> That is, RB_FLAG of head_page is been updating, while at same time RB_FLAG >> was cleared when doing integrity check: >> rb_check_pages() rb_handle_head_page(): >> -------- -------- >> rb_head_page_deactivate() >> rb_head_page_set_normal() >> rb_head_page_activate() >> > > Good catch! Thanks! > >> Integrity check at end of iter read was added since commit 659f451ff213 >> ("ring-buffer: Add integrity check at end of iter read"). As it's commit >> message said: >> > As reading via an iterator requires disabling the ring buffer, it >> > is a perfect place to have it. >> However, since commit 1039221cc278 ("ring-buffer: Do not disable recording >> when there is an iterator"), ring buffer was not disabled at that place, >> so that integrity check should be removed. >> >> 1: >> ``` read_trace.sh >> while true; >> do >> # the "trace" file is closed after read >> head -1 /sys/kernel/tracing/trace > /dev/null >> done >> ``` >> ``` repro.sh >> sysctl -w kernel.panic_on_warn=1 >> # function tracer will writing enough data into ring_buffer >> echo function > /sys/kernel/tracing/current_tracer >> ./read_trace.sh & >> ./read_trace.sh & >> ./read_trace.sh & >> ./read_trace.sh & >> ./read_trace.sh & >> ./read_trace.sh & >> ./read_trace.sh & >> ./read_trace.sh & >> ``` >> > > >> Fixes: 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator") >> Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> >> --- >> kernel/trace/ring_buffer.c | 11 ----------- >> 1 file changed, 11 deletions(-) >> >> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c >> index c366a0a9ddba..34e955bd1e59 100644 >> --- a/kernel/trace/ring_buffer.c >> +++ b/kernel/trace/ring_buffer.c >> @@ -5203,17 +5203,6 @@ void >> ring_buffer_read_finish(struct ring_buffer_iter *iter) >> { >> struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer; >> - unsigned long flags; >> - >> - /* >> - * Ring buffer is disabled from recording, here's a good place >> - * to check the integrity of the ring buffer. >> - * Must prevent readers from trying to read, as the check >> - * clears the HEAD page and readers require it. >> - */ >> - raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); >> - rb_check_pages(cpu_buffer); >> - raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); > > I would rather find a way to make this still work than just removing it. Yes, we can try to find the way. > > Perhaps there's no reason to clear the flags, and change rb_check_pages() > to mask them out before testing. Something like: > > static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer) > { > struct list_head *head = cpu_buffer->pages; > struct buffer_page *bpage, *tmp; > > if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->next)->prev) != head)) > return -1; > if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->prev)->next) != head)) > return -1; > > if (rb_check_list(cpu_buffer, head)) rb_check_list() expect to check a page with RB_FLAG being cleared, but in this solution, rb_head_page_deactivate() is not called before, so we may not call it directly? The same problem with below check for "bpage->list". > return -1; > > list_for_each_entry_safe(bpage, tmp, head, list) { I'd like to know if there is a case that "head" happens to be a "reader_page", and the ring buffer is not exactly being traversed? > if (RB_WARN_ON(cpu_buffer, > rb_list_head(rb_list_head(bpage->list.next)->prev) != &bpage->list)) > return -1; > if (RB_WARN_ON(cpu_buffer, > rb_list_head(rb_list_head(bpage->list.prev)->next) != &bpage->list)) > return -1; > if (rb_check_list(cpu_buffer, &bpage->list)) > return -1; > } > > return 0; > } > > I haven't tested the above. > > ? > > -- Steve > > >> >> atomic_dec(&cpu_buffer->resize_disabled); >> kfree(iter->event); >
On 2023/2/9 06:36, Steven Rostedt wrote: >> On Wed, 8 Feb 2023 17:08:14 +0800 >> Zheng Yejian <zhengyejian1@huawei.com> wrote: >> >>> Concurrently closing "trace" file and writing into ring buffer [1] can >>> cause WARNINGs [2]. It has been reported in >>> Link: https://lore.kernel.org/all/20230203035608.2336906-1-zhengyejian1@huawei.com/ >>> >>> It seems a data race between ring_buffer writing and integrity check. >>> That is, RB_FLAG of head_page is been updating, while at same time RB_FLAG >>> was cleared when doing integrity check: >>> rb_check_pages() rb_handle_head_page(): >>> -------- -------- >>> rb_head_page_deactivate() >>> rb_head_page_set_normal() >>> rb_head_page_activate() >>> >> >> Good catch! > >Thanks! > >> >>> Integrity check at end of iter read was added since commit 659f451ff213 >>> ("ring-buffer: Add integrity check at end of iter read"). As it's commit >>> message said: >>> > As reading via an iterator requires disabling the ring buffer, it >>> > is a perfect place to have it. >>> However, since commit 1039221cc278 ("ring-buffer: Do not disable recording >>> when there is an iterator"), ring buffer was not disabled at that place, >>> so that integrity check should be removed. >>> >>> 1: >>> ``` read_trace.sh >>> while true; >>> do >>> # the "trace" file is closed after read >>> head -1 /sys/kernel/tracing/trace > /dev/null >>> done >>> ``` >>> ``` repro.sh >>> sysctl -w kernel.panic_on_warn=1 >>> # function tracer will writing enough data into ring_buffer >>> echo function > /sys/kernel/tracing/current_tracer >>> ./read_trace.sh & >>> ./read_trace.sh & >>> ./read_trace.sh & >>> ./read_trace.sh & >>> ./read_trace.sh & >>> ./read_trace.sh & >>> ./read_trace.sh & >>> ./read_trace.sh & >>> ``` >>> >> >> >>> Fixes: 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator") >>> Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> >>> --- >>> kernel/trace/ring_buffer.c | 11 ----------- >>> 1 file changed, 11 deletions(-) >>> >>> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c >>> index c366a0a9ddba..34e955bd1e59 100644 >>> --- a/kernel/trace/ring_buffer.c >>> +++ b/kernel/trace/ring_buffer.c >>> @@ -5203,17 +5203,6 @@ void >>> ring_buffer_read_finish(struct ring_buffer_iter *iter) >>> { >>> struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer; >>> - unsigned long flags; >>> - >>> - /* >>> - * Ring buffer is disabled from recording, here's a good place >>> - * to check the integrity of the ring buffer. >>> - * Must prevent readers from trying to read, as the check >>> - * clears the HEAD page and readers require it. >>> - */ >>> - raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); >>> - rb_check_pages(cpu_buffer); >>> - raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); >> >> I would rather find a way to make this still work than just removing it. > >Yes, we can try to find the way. > >> >> Perhaps there's no reason to clear the flags, and change rb_check_pages() >> to mask them out before testing. Something like: >> >> static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer) >> { >> struct list_head *head = cpu_buffer->pages; >> struct buffer_page *bpage, *tmp; >> >> if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->next)->prev) != head)) >> return -1; >> if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->prev)->next) != head)) >> return -1; >> >> if (rb_check_list(cpu_buffer, head)) > >rb_check_list() expect to check a page with RB_FLAG being cleared, >but in this solution, rb_head_page_deactivate() is not called before, >so we may not call it directly? The same problem with below check for >"bpage->list". Correct. >> return -1; >> >> list_for_each_entry_safe(bpage, tmp, head, list) { > >I'd like to know if there is a case that "head" happens to be a >"reader_page", and the ring buffer is not exactly being traversed? In my issue, i see below callstack and it seem to be spinning inside rb_list_head_clear() as cpu_buffer->pages has duplicate entry in the list. -00 |rb_list_head_clear(inline) -00 |rb_head_page_deactivate(inline) -00 |rb_check_pages(cpu_buffer = 0xFFFFFF89E0C3B200) -01 |atomic_try_cmpxchg_acquire(inline) -01 |queued_spin_lock(inline) -01 |do_raw_spin_lock_flags(inline) -01 |__raw_spin_lock_irqsave(inline) -01 |_raw_spin_lock_irqsave(inline) -01 |ring_buffer_read_finish(iter = 0xFFFFFF8006FE3780) -02 |cpumask_next(inline) -02 |tracing_release(inode = ?, file = 0xFFFFFF8A53A63F00) -03 |__fput(file = 0xFFFFFF8A53A63F00) -04 |____fput(work = ?) -05 |_raw_spin_unlock_irq(inline) -05 |task_work_run() -06 |tracehook_notify_resume(inline) -06 |do_notify_resume(regs = 0xFFFFFFC06ADC8EB0, thread_flags = 67108868) -07 |prepare_exit_to_user_mode(inline) -07 |exit_to_user_mode(inline) -07 |el0_svc(regs = 0xFFFFFFC06ADC8EB0) -08 |el0t_64_sync_handler(regs = ?) -09 |el0t_64_sync(asm) ... .. ffffff80359eeb00 --> Duplicate entry ffffff80359ee300 ffffff80359ee180 ffffff80359eeec0 ffffff80359eec00 ffffff80359ee800 -- Tail page ffffff80359eedc0 -- Head page ffffff80359ee640 ffffff80359ee080 ffffff80359ee700 ffffff80359ee7c0 ffffff80359eed80 ffffff80359ee900 ffffff80359ee9c0 ffffff80359eea00 ffffff80359eea80 ffffff80359eec80 ffffff80359ee240 ffffff80359ee6c0 ffffff80359ee0c0 ffffff80359ee8c0 ffffff80359ee940 ffffff80359eee00 ffffff80359ee000 ffffff80359eeb00 ---> Duplicate entry -Mukesh > >> if (RB_WARN_ON(cpu_buffer, >> rb_list_head(rb_list_head(bpage->list.next)->prev) != &bpage->list)) >> return -1; >> if (RB_WARN_ON(cpu_buffer, >> rb_list_head(rb_list_head(bpage->list.prev)->next) != &bpage->list)) >> return -1; >> if (rb_check_list(cpu_buffer, &bpage->list)) >> return -1; >> } >> >> return 0; >> } >> >> I haven't tested the above. >> >> ? >> >> -- Steve >> >> >>> >>> atomic_dec(&cpu_buffer->resize_disabled); >>> kfree(iter->event);
On Fri, 10 Feb 2023 20:52:36 +0530 Mukesh Ojha <quic_mojha@quicinc.com> wrote: > >> return -1; > >> > >> list_for_each_entry_safe(bpage, tmp, head, list) { > > > >I'd like to know if there is a case that "head" happens to be a > >"reader_page", and the ring buffer is not exactly being traversed? No, the way it works is that the reader page is found by searching for the head pointer, and then it is set when swapped. Basically, the pseudo code is: reader->next = head_page | HEAD_FLAG val = head_page->prev->next val &= ~FLAGS val |= HEAD_FLAG cmpxchg(head_page->prev->next, val, reader) The HEAD_FLAG is always on the pointer that points to the head page that gets swapped. This will never point to the reader page, as that would mean the writer has access to it. > > In my issue, i see below callstack and it seem to be spinning inside rb_list_head_clear() as > cpu_buffer->pages has duplicate entry in the list. > > -00 |rb_list_head_clear(inline) > -00 |rb_head_page_deactivate(inline) > -00 |rb_check_pages(cpu_buffer = 0xFFFFFF89E0C3B200) > -01 |atomic_try_cmpxchg_acquire(inline) > -01 |queued_spin_lock(inline) > -01 |do_raw_spin_lock_flags(inline) > -01 |__raw_spin_lock_irqsave(inline) > -01 |_raw_spin_lock_irqsave(inline) > -01 |ring_buffer_read_finish(iter = 0xFFFFFF8006FE3780) > -02 |cpumask_next(inline) > -02 |tracing_release(inode = ?, file = 0xFFFFFF8A53A63F00) > -03 |__fput(file = 0xFFFFFF8A53A63F00) > -04 |____fput(work = ?) > -05 |_raw_spin_unlock_irq(inline) > -05 |task_work_run() > -06 |tracehook_notify_resume(inline) > -06 |do_notify_resume(regs = 0xFFFFFFC06ADC8EB0, thread_flags = 67108868) > -07 |prepare_exit_to_user_mode(inline) > -07 |exit_to_user_mode(inline) > -07 |el0_svc(regs = 0xFFFFFFC06ADC8EB0) > -08 |el0t_64_sync_handler(regs = ?) > -09 |el0t_64_sync(asm) > > ... > .. > ffffff80359eeb00 --> Duplicate entry > ffffff80359ee300 > ffffff80359ee180 > ffffff80359eeec0 > ffffff80359eec00 > ffffff80359ee800 -- Tail page > ffffff80359eedc0 -- Head page > ffffff80359ee640 > ffffff80359ee080 > ffffff80359ee700 > ffffff80359ee7c0 > ffffff80359eed80 > ffffff80359ee900 > ffffff80359ee9c0 > ffffff80359eea00 > ffffff80359eea80 > ffffff80359eec80 > ffffff80359ee240 > ffffff80359ee6c0 > ffffff80359ee0c0 > ffffff80359ee8c0 > ffffff80359ee940 > ffffff80359eee00 > ffffff80359ee000 > ffffff80359eeb00 ---> Duplicate entry So this is a separate issue where the ring buffer is corrupted? -- Steve
On 2/11/2023 10:07 PM, Steven Rostedt wrote: > On Fri, 10 Feb 2023 20:52:36 +0530 > Mukesh Ojha <quic_mojha@quicinc.com> wrote: > >>>> return -1; >>>> >>>> list_for_each_entry_safe(bpage, tmp, head, list) { >>> >>> I'd like to know if there is a case that "head" happens to be a >>> "reader_page", and the ring buffer is not exactly being traversed? > > No, the way it works is that the reader page is found by searching for the > head pointer, and then it is set when swapped. Basically, the pseudo code > is: > > reader->next = head_page | HEAD_FLAG > val = head_page->prev->next > val &= ~FLAGS > val |= HEAD_FLAG > cmpxchg(head_page->prev->next, val, reader) > > The HEAD_FLAG is always on the pointer that points to the head page that > gets swapped. This will never point to the reader page, as that would mean > the writer has access to it. > >> >> In my issue, i see below callstack and it seem to be spinning inside rb_list_head_clear() as >> cpu_buffer->pages has duplicate entry in the list. >> >> -00 |rb_list_head_clear(inline) >> -00 |rb_head_page_deactivate(inline) >> -00 |rb_check_pages(cpu_buffer = 0xFFFFFF89E0C3B200) >> -01 |atomic_try_cmpxchg_acquire(inline) >> -01 |queued_spin_lock(inline) >> -01 |do_raw_spin_lock_flags(inline) >> -01 |__raw_spin_lock_irqsave(inline) >> -01 |_raw_spin_lock_irqsave(inline) >> -01 |ring_buffer_read_finish(iter = 0xFFFFFF8006FE3780) >> -02 |cpumask_next(inline) >> -02 |tracing_release(inode = ?, file = 0xFFFFFF8A53A63F00) >> -03 |__fput(file = 0xFFFFFF8A53A63F00) >> -04 |____fput(work = ?) >> -05 |_raw_spin_unlock_irq(inline) >> -05 |task_work_run() >> -06 |tracehook_notify_resume(inline) >> -06 |do_notify_resume(regs = 0xFFFFFFC06ADC8EB0, thread_flags = 67108868) >> -07 |prepare_exit_to_user_mode(inline) >> -07 |exit_to_user_mode(inline) >> -07 |el0_svc(regs = 0xFFFFFFC06ADC8EB0) >> -08 |el0t_64_sync_handler(regs = ?) >> -09 |el0t_64_sync(asm) >> >> ... >> .. >> ffffff80359eeb00 --> Duplicate entry >> ffffff80359ee300 >> ffffff80359ee180 >> ffffff80359eeec0 >> ffffff80359eec00 >> ffffff80359ee800 -- Tail page >> ffffff80359eedc0 -- Head page >> ffffff80359ee640 >> ffffff80359ee080 >> ffffff80359ee700 >> ffffff80359ee7c0 >> ffffff80359eed80 >> ffffff80359ee900 >> ffffff80359ee9c0 >> ffffff80359eea00 >> ffffff80359eea80 >> ffffff80359eec80 >> ffffff80359ee240 >> ffffff80359ee6c0 >> ffffff80359ee0c0 >> ffffff80359ee8c0 >> ffffff80359ee940 >> ffffff80359eee00 >> ffffff80359ee000 >> ffffff80359eeb00 ---> Duplicate entry > > So this is a separate issue where the ring buffer is corrupted? It looks to be different issue and there also i see similar call stack of tracing_release() but in that issue it is looping forever in deactivate call due to list corruption. I am not yet able to root cause a place of corruption as it is reproduced only once, will need to check more on this. For this issue, i have posted at https://lore.kernel.org/lkml/1676376403-16462-1-git-send-email-quic_mojha@quicinc.com/ -Mukesh > > -- Steve
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index c366a0a9ddba..34e955bd1e59 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -5203,17 +5203,6 @@ void ring_buffer_read_finish(struct ring_buffer_iter *iter) { struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer; - unsigned long flags; - - /* - * Ring buffer is disabled from recording, here's a good place - * to check the integrity of the ring buffer. - * Must prevent readers from trying to read, as the check - * clears the HEAD page and readers require it. - */ - raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); - rb_check_pages(cpu_buffer); - raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); atomic_dec(&cpu_buffer->resize_disabled); kfree(iter->event);