Message ID | 20240824030343.3218618-1-zhengyejian@huaweicloud.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | tracing: Mitigate possible softlockup in __tracing_open() | expand |
On Sat, 24 Aug 2024 11:03:43 +0800 Zheng Yejian <zhengyejian@huaweicloud.com> wrote: > In __tracing_open(), when max latency tracers took place on the cpu, > the time start of its buffer would be updated, then event entries with > timestamps being earlier than start of the buffer would be skipped > (see tracing_iter_reset()). > > Softlockup will occur if the kernel is non-preemptible and too many > entries were skipped in the loop that reset every cpu buffer, so add > cond_resched() to avoid it. > > Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com> > --- > kernel/trace/trace.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c > index ebe7ce2f5f4a..88faa95b457b 100644 > --- a/kernel/trace/trace.c > +++ b/kernel/trace/trace.c > @@ -4706,6 +4706,15 @@ __tracing_open(struct inode *inode, struct file *file, bool snapshot) > for_each_tracing_cpu(cpu) { > ring_buffer_read_start(iter->buffer_iter[cpu]); > tracing_iter_reset(iter, cpu); > + /* > + * When max latency tracers took place on the cpu, the time start > + * of its buffer would be updated, then event entries with timestamps > + * being earlier than start of the buffer would be skipped > + * (see tracing_iter_reset()). Softlockup will occur if the kernel > + * is non-preemptible and too many entries were skipped in the loop, > + * so add cond_resched() to mitigate it. > + */ > + cond_resched(); OK, but why we cond_resched() only here? Another tracing_iter_reset() in s_start() does not cause the soft lockups in the same situation? Thank you, > } > } else { > cpu = iter->cpu_file; > -- > 2.25.1 > >
On 2024/8/25 23:05, Masami Hiramatsu (Google) wrote: > On Sat, 24 Aug 2024 11:03:43 +0800 > Zheng Yejian <zhengyejian@huaweicloud.com> wrote: > >> In __tracing_open(), when max latency tracers took place on the cpu, >> the time start of its buffer would be updated, then event entries with >> timestamps being earlier than start of the buffer would be skipped >> (see tracing_iter_reset()). >> >> Softlockup will occur if the kernel is non-preemptible and too many >> entries were skipped in the loop that reset every cpu buffer, so add >> cond_resched() to avoid it. >> >> Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com> >> --- >> kernel/trace/trace.c | 9 +++++++++ >> 1 file changed, 9 insertions(+) >> >> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c >> index ebe7ce2f5f4a..88faa95b457b 100644 >> --- a/kernel/trace/trace.c >> +++ b/kernel/trace/trace.c >> @@ -4706,6 +4706,15 @@ __tracing_open(struct inode *inode, struct file *file, bool snapshot) >> for_each_tracing_cpu(cpu) { >> ring_buffer_read_start(iter->buffer_iter[cpu]); >> tracing_iter_reset(iter, cpu); >> + /* >> + * When max latency tracers took place on the cpu, the time start >> + * of its buffer would be updated, then event entries with timestamps >> + * being earlier than start of the buffer would be skipped >> + * (see tracing_iter_reset()). Softlockup will occur if the kernel >> + * is non-preemptible and too many entries were skipped in the loop, >> + * so add cond_resched() to mitigate it. >> + */ >> + cond_resched(); > > OK, but why we cond_resched() only here? Another tracing_iter_reset() in > s_start() does not cause the soft lockups in the same situation? > Sorry for the late reply. __tracing_open() is called when trace file is being opened, s_start() is called when trace file is being read. Normally, we read immediately after open at which tracing_iter_reset() was just called, softlockup may be not that easily triggered in s_start(), do we need cond_resched() there? > Thank you, > > >> } >> } else { >> cpu = iter->cpu_file; >> -- >> 2.25.1 >> >> > >
On Sat, 24 Aug 2024 11:03:43 +0800 Zheng Yejian <zhengyejian@huaweicloud.com> wrote: > In __tracing_open(), when max latency tracers took place on the cpu, > the time start of its buffer would be updated, then event entries with > timestamps being earlier than start of the buffer would be skipped > (see tracing_iter_reset()). > > Softlockup will occur if the kernel is non-preemptible and too many > entries were skipped in the loop that reset every cpu buffer, so add > cond_resched() to avoid it. > > Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com> > --- > kernel/trace/trace.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c > index ebe7ce2f5f4a..88faa95b457b 100644 > --- a/kernel/trace/trace.c > +++ b/kernel/trace/trace.c > @@ -4706,6 +4706,15 @@ __tracing_open(struct inode *inode, struct file *file, bool snapshot) > for_each_tracing_cpu(cpu) { > ring_buffer_read_start(iter->buffer_iter[cpu]); > tracing_iter_reset(iter, cpu); > + /* > + * When max latency tracers took place on the cpu, the time start > + * of its buffer would be updated, then event entries with timestamps > + * being earlier than start of the buffer would be skipped > + * (see tracing_iter_reset()). Softlockup will occur if the kernel > + * is non-preemptible and too many entries were skipped in the loop, > + * so add cond_resched() to mitigate it. > + */ > + cond_resched(); This is the wrong place to put this. If the problem is with tracing_iter_reset(), then add it there. while (ring_buffer_iter_peek(buf_iter, &ts)) { if (ts >= iter->array_buffer->time_start) break; entries++; ring_buffer_iter_advance(buf_iter); /* This could be a big loop */ cond_resched(); } -- Steve > } > } else { > cpu = iter->cpu_file;
On 2024/8/26 22:35, Steven Rostedt wrote: > On Sat, 24 Aug 2024 11:03:43 +0800 > Zheng Yejian <zhengyejian@huaweicloud.com> wrote: > >> In __tracing_open(), when max latency tracers took place on the cpu, >> the time start of its buffer would be updated, then event entries with >> timestamps being earlier than start of the buffer would be skipped >> (see tracing_iter_reset()). >> >> Softlockup will occur if the kernel is non-preemptible and too many >> entries were skipped in the loop that reset every cpu buffer, so add >> cond_resched() to avoid it. >> >> Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com> >> --- >> kernel/trace/trace.c | 9 +++++++++ >> 1 file changed, 9 insertions(+) >> >> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c >> index ebe7ce2f5f4a..88faa95b457b 100644 >> --- a/kernel/trace/trace.c >> +++ b/kernel/trace/trace.c >> @@ -4706,6 +4706,15 @@ __tracing_open(struct inode *inode, struct file *file, bool snapshot) >> for_each_tracing_cpu(cpu) { >> ring_buffer_read_start(iter->buffer_iter[cpu]); >> tracing_iter_reset(iter, cpu); >> + /* >> + * When max latency tracers took place on the cpu, the time start >> + * of its buffer would be updated, then event entries with timestamps >> + * being earlier than start of the buffer would be skipped >> + * (see tracing_iter_reset()). Softlockup will occur if the kernel >> + * is non-preemptible and too many entries were skipped in the loop, >> + * so add cond_resched() to mitigate it. >> + */ >> + cond_resched(); > > This is the wrong place to put this. If the problem is with > tracing_iter_reset(), then add it there. > > while (ring_buffer_iter_peek(buf_iter, &ts)) { > if (ts >= iter->array_buffer->time_start) > break; > entries++; > ring_buffer_iter_advance(buf_iter); > /* This could be a big loop */ > cond_resched(); Do resched check every event peek, I'm afraid there will be too many 're-schedule'. > } > > -- Steve > > > >> } >> } else { >> cpu = iter->cpu_file; >
On Tue, 27 Aug 2024 09:29:55 +0800 Zheng Yejian <zhengyejian@huaweicloud.com> wrote: > > This is the wrong place to put this. If the problem is with > > tracing_iter_reset(), then add it there. > > > > while (ring_buffer_iter_peek(buf_iter, &ts)) { > > if (ts >= iter->array_buffer->time_start) > > break; > > entries++; > > ring_buffer_iter_advance(buf_iter); > > /* This could be a big loop */ > > cond_resched(); > > Do resched check every event peek, I'm afraid there will > be too many 're-schedule'. cond_resched() as the name suggests only schedules conditionally if NEED_RESCHED is set. This isn't a fast path (obviously, as it requires a cond_resched). -- Steve
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index ebe7ce2f5f4a..88faa95b457b 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4706,6 +4706,15 @@ __tracing_open(struct inode *inode, struct file *file, bool snapshot) for_each_tracing_cpu(cpu) { ring_buffer_read_start(iter->buffer_iter[cpu]); tracing_iter_reset(iter, cpu); + /* + * When max latency tracers took place on the cpu, the time start + * of its buffer would be updated, then event entries with timestamps + * being earlier than start of the buffer would be skipped + * (see tracing_iter_reset()). Softlockup will occur if the kernel + * is non-preemptible and too many entries were skipped in the loop, + * so add cond_resched() to mitigate it. + */ + cond_resched(); } } else { cpu = iter->cpu_file;
In __tracing_open(), when max latency tracers took place on the cpu, the time start of its buffer would be updated, then event entries with timestamps being earlier than start of the buffer would be skipped (see tracing_iter_reset()). Softlockup will occur if the kernel is non-preemptible and too many entries were skipped in the loop that reset every cpu buffer, so add cond_resched() to avoid it. Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com> --- kernel/trace/trace.c | 9 +++++++++ 1 file changed, 9 insertions(+)