ring-buffer: Add barrire in rb_move_tail

Message ID	20220830120854.7545-1-jiazi.li@transsion.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <linux-trace-devel-owner@kernel.org> From: "Jiazi.Li" <jqqlijiazi@gmail.com> To: Steven Rostedt <rostedt@goodmis.org>, Ingo Molnar <mingo@redhat.com> Cc: "Jiazi.Li" <jiazi.li@transsion.com>, linux-trace-devel@vger.kernel.org Subject: [PATCH] ring-buffer: Add barrire in rb_move_tail Date: Tue, 30 Aug 2022 20:08:54 +0800 Message-Id: <20220830120854.7545-1-jiazi.li@transsion.com> Precedence: bulk
Series	ring-buffer: Add barrire in rb_move_tail \| expand ring-buffer: Add barrire in rb_move_tail

Jiazi Li Aug. 30, 2022, 12:08 p.m. UTC

The following crash issue was encountered several times when read
trace_pipe:

1. in ring_buffer_event_data, found event's type is bigger than
RINGBUF_TYPE_DATA_TYPE_LEN_MAX, print WARNING:
[249033.677658]WARNING: CPU: 1 PID: 9975 at ring_buffer_event_data+0x38/0x48

2. trace_read_pipe print this event resulting in crash issue:
[249033.806200] die+0xac/0x37c
[249033.806891] die_kernel_fault+0x60/0x70
[249033.807710] die_kernel_fault+0x0/0x70
[249033.808518] do_bad_area+0x30/0x130
[249033.809296] do_translation_fault+0x34/0x40
[249033.810158] do_mem_abort+0x4c/0xf8
[249033.810935] el1_da+0x1c/0xac
[249033.811647] string+0x38/0x74
[249033.812357] vsnprintf+0x430/0x65c
[249033.813122] seq_buf_vprintf+0x64/0xcc
[249033.813934] trace_seq_printf+0x70/0xd8
[249033.814755] trace_raw_output_ipi_handler+0x40/0x54
[249033.815705] print_trace_line+0x3ec/0x424
[249033.816547] tracing_read_pipe+0x28c/0x3ec
[249033.817402] __vfs_read+0x44/0x13c
[249033.818166] vfs_read+0xb8/0x150
[249033.818908] ksys_read+0x6c/0xcc
[249033.819651] __arm64_sys_read+0x18/0x20
[249033.820470] el0_svc_common+0x98/0x160
[249033.821278] el0_svc_handler+0x60/0x78
[249033.822086] el0_svc+0x8/0x300

ipi event is not enable in my device, the event tracing_read_pipe
want to print is a PADDING event generated in rb_move_tail.
array[0] of this event is 0x18, ipi entry trace event's type value
is 0x18 too in my device.
so trace_read_pipe call trace_raw_output_ipi_handler to print this
event, eventually, system crash because illegal pointers.

In ring_buffer_peek, PADDING type event will not return to caller.
So at least when ring_buffer_peek is called, this event has not
been set to PADDING.

My reader task poll on trace_pipe and other files, wakeup
by other files and found that trace ring_buffer is not empty
because of read != commit.
But maybe because of cpu out-of-order execution, new event has not
been set to PADDING type, and leads to above issue.

Can we fix this issue by add a barrier between rb_reset_tail
and rb_end_commit?
Or are there other bugs that cause this issue?

Signed-off-by: Jiazi.Li <jiazi.li@transsion.com>
---
 kernel/trace/ring_buffer.c | 5 +++++
 1 file changed, 5 insertions(+)

Steven Rostedt Sept. 1, 2022, 3:13 p.m. UTC | #1

On Tue, 30 Aug 2022 20:08:54 +0800
"Jiazi.Li" <jqqlijiazi@gmail.com> wrote:

> The following crash issue was encountered several times when read
> trace_pipe:
> 
> 1. in ring_buffer_event_data, found event's type is bigger than
> RINGBUF_TYPE_DATA_TYPE_LEN_MAX, print WARNING:
> [249033.677658]WARNING: CPU: 1 PID: 9975 at ring_buffer_event_data+0x38/0x48
> 
> 2. trace_read_pipe print this event resulting in crash issue:
> [249033.806200] die+0xac/0x37c
> [249033.806891] die_kernel_fault+0x60/0x70
> [249033.807710] die_kernel_fault+0x0/0x70
> [249033.808518] do_bad_area+0x30/0x130
> [249033.809296] do_translation_fault+0x34/0x40
> [249033.810158] do_mem_abort+0x4c/0xf8
> [249033.810935] el1_da+0x1c/0xac
> [249033.811647] string+0x38/0x74
> [249033.812357] vsnprintf+0x430/0x65c
> [249033.813122] seq_buf_vprintf+0x64/0xcc
> [249033.813934] trace_seq_printf+0x70/0xd8
> [249033.814755] trace_raw_output_ipi_handler+0x40/0x54
> [249033.815705] print_trace_line+0x3ec/0x424
> [249033.816547] tracing_read_pipe+0x28c/0x3ec
> [249033.817402] __vfs_read+0x44/0x13c
> [249033.818166] vfs_read+0xb8/0x150
> [249033.818908] ksys_read+0x6c/0xcc
> [249033.819651] __arm64_sys_read+0x18/0x20
> [249033.820470] el0_svc_common+0x98/0x160
> [249033.821278] el0_svc_handler+0x60/0x78
> [249033.822086] el0_svc+0x8/0x300
> 
> ipi event is not enable in my device, the event tracing_read_pipe

The ipi event is *not* enabled? So you are saying that it translated the
extra data at the end that was suppose to be PADDING?

> want to print is a PADDING event generated in rb_move_tail.
> array[0] of this event is 0x18, ipi entry trace event's type value
> is 0x18 too in my device.
> so trace_read_pipe call trace_raw_output_ipi_handler to print this
> event, eventually, system crash because illegal pointers.
> 
> In ring_buffer_peek, PADDING type event will not return to caller.
> So at least when ring_buffer_peek is called, this event has not
> been set to PADDING.
> 
> My reader task poll on trace_pipe and other files, wakeup
> by other files and found that trace ring_buffer is not empty
> because of read != commit.
> But maybe because of cpu out-of-order execution, new event has not
> been set to PADDING type, and leads to above issue.
> 
> Can we fix this issue by add a barrier between rb_reset_tail
> and rb_end_commit?
> Or are there other bugs that cause this issue?

I rather not add a barrier in the writer because that will slow down the
tracing.

> 
> Signed-off-by: Jiazi.Li <jiazi.li@transsion.com>
> ---
>  kernel/trace/ring_buffer.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index d59b6a328b7f..20f3ef839467 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -2717,6 +2717,11 @@ rb_move_tail(struct ring_buffer_per_cpu *cpu_buffer,
>  
>  	rb_reset_tail(cpu_buffer, tail, info);
>  
> +	/*
> +	 * Add barrier to make sure tail event be set to
> +	 * PADDING type before update commit.
> +	 */
> +	smp_mb();

Hmm, I'm thinking this may be a bigger race than a memory barrier. I'm
thinking that the reader has full access to this information at this
moment, as soon as the tail page moved forward. I unfortunately do not have
the time right now to do a full investigation, but thank you for reporting
this.

I think your mb() solution is just making the race window smaller and not
actually fixing the race.

-- Steve

>  	/* Commit what we have for now. */
>  	rb_end_commit(cpu_buffer);
>  	/* rb_end_commit() decs committing */

Jiazi Li Sept. 2, 2022, 3:47 a.m. UTC | #2

On Thu, Sep 01, 2022 at 11:13:20AM -0400, Steven Rostedt wrote:
> On Tue, 30 Aug 2022 20:08:54 +0800
> "Jiazi.Li" <jqqlijiazi@gmail.com> wrote:
> 
> > The following crash issue was encountered several times when read
> > trace_pipe:
> > 
> > 1. in ring_buffer_event_data, found event's type is bigger than
> > RINGBUF_TYPE_DATA_TYPE_LEN_MAX, print WARNING:
> > [249033.677658]WARNING: CPU: 1 PID: 9975 at ring_buffer_event_data+0x38/0x48
> > 
> > 2. trace_read_pipe print this event resulting in crash issue:
> > [249033.806200] die+0xac/0x37c
> > [249033.806891] die_kernel_fault+0x60/0x70
> > [249033.807710] die_kernel_fault+0x0/0x70
> > [249033.808518] do_bad_area+0x30/0x130
> > [249033.809296] do_translation_fault+0x34/0x40
> > [249033.810158] do_mem_abort+0x4c/0xf8
> > [249033.810935] el1_da+0x1c/0xac
> > [249033.811647] string+0x38/0x74
> > [249033.812357] vsnprintf+0x430/0x65c
> > [249033.813122] seq_buf_vprintf+0x64/0xcc
> > [249033.813934] trace_seq_printf+0x70/0xd8
> > [249033.814755] trace_raw_output_ipi_handler+0x40/0x54
> > [249033.815705] print_trace_line+0x3ec/0x424
> > [249033.816547] tracing_read_pipe+0x28c/0x3ec
> > [249033.817402] __vfs_read+0x44/0x13c
> > [249033.818166] vfs_read+0xb8/0x150
> > [249033.818908] ksys_read+0x6c/0xcc
> > [249033.819651] __arm64_sys_read+0x18/0x20
> > [249033.820470] el0_svc_common+0x98/0x160
> > [249033.821278] el0_svc_handler+0x60/0x78
> > [249033.822086] el0_svc+0x8/0x300
> > 
> > ipi event is not enable in my device, the event tracing_read_pipe
> 
> The ipi event is *not* enabled? So you are saying that it translated the
> extra data at the end that was suppose to be PADDING?
> 

Yes, in my tracing instance, I only enable four events:
signal_generate, signal_deliver, sched_process_fork and
sched_process_exit.
I also not enable any filter on these events.
Since these four events not trigger frequently, so reader comsume
speed is bigger than event write, and there is no overwrite.

Reader mistook this extra data to ipi_entry event.

> > want to print is a PADDING event generated in rb_move_tail.
> > array[0] of this event is 0x18, ipi entry trace event's type value
> > is 0x18 too in my device.
> > so trace_read_pipe call trace_raw_output_ipi_handler to print this
> > event, eventually, system crash because illegal pointers.
> > 
> > In ring_buffer_peek, PADDING type event will not return to caller.
> > So at least when ring_buffer_peek is called, this event has not
> > been set to PADDING.
> > 
> > My reader task poll on trace_pipe and other files, wakeup
> > by other files and found that trace ring_buffer is not empty
> > because of read != commit.
> > But maybe because of cpu out-of-order execution, new event has not
> > been set to PADDING type, and leads to above issue.
> > 
> > Can we fix this issue by add a barrier between rb_reset_tail
> > and rb_end_commit?
> > Or are there other bugs that cause this issue?
> 
> I rather not add a barrier in the writer because that will slow down the
> tracing.
> 
> > 
> > Signed-off-by: Jiazi.Li <jiazi.li@transsion.com>
> > ---
> >  kernel/trace/ring_buffer.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > index d59b6a328b7f..20f3ef839467 100644
> > --- a/kernel/trace/ring_buffer.c
> > +++ b/kernel/trace/ring_buffer.c
> > @@ -2717,6 +2717,11 @@ rb_move_tail(struct ring_buffer_per_cpu *cpu_buffer,
> >  
> >  	rb_reset_tail(cpu_buffer, tail, info);
> >  
> > +	/*
> > +	 * Add barrier to make sure tail event be set to
> > +	 * PADDING type before update commit.
> > +	 */
> > +	smp_mb();
> 
> Hmm, I'm thinking this may be a bigger race than a memory barrier. I'm
> thinking that the reader has full access to this information at this
> moment, as soon as the tail page moved forward. I unfortunately do not have
> the time right now to do a full investigation, but thank you for reporting
> this.
> 
> I think your mb() solution is just making the race window smaller and not
> actually fixing the race.
> 
> -- Steve
> 

Yes, I'm thinking this may be a race issue too.
But I didn't find race point through code review.
I have tried the following two methods to investigation this issue:
1. add log in rb_reset_tail, ring_buffer_peek, rb_per_cpu_empty
2. do not add log in these functions, just get timestamp and store to
ring_buffer struct for debug.
But All these changes make issue no longer reproduce, maybe because
these changes also make race window smaller.
Do you have any suggestions?

We now have found more than 20 similar issues, all are reader peek a
PADDING event.
Compared with ordinary data event, the time between write
RING_BUF_PADDING to memory and update commit is very short, because
rb_reset_tail and rb_end_commit are inline functions.
We do not found reader peek a ordinary data event with expired data,
So I think maybe add a barrier could fix this issue.

> >  	/* Commit what we have for now. */
> >  	rb_end_commit(cpu_buffer);
> >  	/* rb_end_commit() decs committing */
>

Steven Rostedt Sept. 2, 2022, 12:56 p.m. UTC | #3

On Fri, 2 Sep 2022 11:47:58 +0800
lijiazi <jqqlijiazi@gmail.com> wrote:

> Yes, I'm thinking this may be a race issue too.
> But I didn't find race point through code review.

I think I may have found it. Can you test this patch?

[ Disclaimer, I did not even try to compile it ]

-- Steve

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index d59b6a328b7f..0f47e68fee9b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4580,6 +4580,10 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
 	goto again;
 
  out:
+	/* If the commit is past the end of page, a writer is still updating it */
+	if (reader && rb_page_size(reader) > BUF_PAGE_SIZE)
+		reader = NULL;
+
 	/* Update the read_stamp on the first event */
 	if (reader && reader->read == 0)
 		cpu_buffer->read_stamp = reader->page->time_stamp;

Jiazi Li Sept. 5, 2022, 3:23 a.m. UTC | #4

On Fri, Sep 02, 2022 at 08:56:41AM -0400, Steven Rostedt wrote:
> On Fri, 2 Sep 2022 11:47:58 +0800
> lijiazi <jqqlijiazi@gmail.com> wrote:
> 
> > Yes, I'm thinking this may be a race issue too.
> > But I didn't find race point through code review.
> 
> I think I may have found it. Can you test this patch?
> 
Sure, I will test this patch.
Thanks!
> [ Disclaimer, I did not even try to compile it ]
> 
> -- Steve
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index d59b6a328b7f..0f47e68fee9b 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -4580,6 +4580,10 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
>  	goto again;
>  
>   out:
> +	/* If the commit is past the end of page, a writer is still updating it */
> +	if (reader && rb_page_size(reader) > BUF_PAGE_SIZE)
> +		reader = NULL;
> +
From ramdump, current reader page's commit is 0xff0, not bigger than
BUF_PAGE_SIZE:
crash> struct buffer_page 0xffffffd10b599580 -x
struct buffer_page {
  list = {
    next = 0xffffffd10b599500,
    prev = 0xffffffd10b599680
  },
  write = {
    a = {
      counter = 0x100ff0
    }
  },
  read = 0xfd4,
  entries = {
    a = {
      counter = 0x100053
    }
  },
  real_end = 0xfd4,
  page = 0xffffffd10b553000
}
crash> struct buffer_data_page 0xffffffd10b553000 -x
struct buffer_data_page {
  time_stamp = 0xe2679ca0cd3d,
  commit = {
    a = {
      counter = 0xff0
    }
  },
  data = 0xffffffd10b553010 "\b"
}
I also can extrace trace log from ramdump by crash-trace extension tool:
bsp: <...>-32191 [006] 249032.606401: signal_generate:      sig=17 errno=0 code=1 comm=WifiDiagnostics pid=1535 grp=1 res=1
bsp: <...>-32183 [006] 249032.625192: sched_process_exit:   comm=osi_bin pid=32183 prio=120
bsp: <...>-32196 [006] 249033.677333: sched_process_exit:   comm=ip pid=32196 prio=120
bsp: <...>-32196 [006] 249033.677562: signal_generate:      sig=17 errno=0 code=1 comm=sh pid=32195 grp=1 res=0
Above logs is on reader page, reader task try to read PADDING event
after this event and lead to crash.
below logs on tail page:
bsp: <...>-32195 [006] 249033.677596: signal_deliver:       sig=17 errno=0 code=1 sa_handler=6385258d68 sa_flags=0
bsp: <...>-32195 [006] 249033.678561: signal_deliver:       sig=17 errno=0 code=1 sa_handler=6385258d68 sa_flags=0
bsp: <...>-32195 [006] 249033.680041: sched_process_exit:   comm=sh pid=32195 prio=120
bsp: <...>-32195 [006] 249033.680096: signal_generate:      sig=17 errno=0 code=1 comm=osi_bin pid=4505 grp=1 res=2
>  	/* Update the read_stamp on the first event */
>  	if (reader && reader->read == 0)
>  		cpu_buffer->read_stamp = reader->page->time_stamp;

Steven Rostedt Sept. 6, 2022, 4:25 p.m. UTC | #5

On Mon, 5 Sep 2022 11:23:14 +0800
lijiazi <jqqlijiazi@gmail.com> wrote:

> >From ramdump, current reader page's commit is 0xff0, not bigger than  
> BUF_PAGE_SIZE:
> crash> struct buffer_page 0xffffffd10b599580 -x  
> struct buffer_page {
>   list = {
>     next = 0xffffffd10b599500,
>     prev = 0xffffffd10b599680
>   },
>   write = {
>     a = {
>       counter = 0x100ff0
>     }
>   },
>   read = 0xfd4,
>   entries = {
>     a = {
>       counter = 0x100053
>     }
>   },
>   real_end = 0xfd4,
>   page = 0xffffffd10b553000
> }
> crash> struct buffer_data_page 0xffffffd10b553000 -x  
> struct buffer_data_page {
>   time_stamp = 0xe2679ca0cd3d,
>   commit = {
>     a = {
>       counter = 0xff0
>     }
>   },
>   data = 0xffffffd10b553010 "\b"
> }
> I also can extrace trace log from ramdump by crash-trace extension tool:
> bsp: <...>-32191 [006] 249032.606401: signal_generate:      sig=17 errno=0 code=1 comm=WifiDiagnostics pid=1535 grp=1 res=1
> bsp: <...>-32183 [006] 249032.625192: sched_process_exit:   comm=osi_bin pid=32183 prio=120
> bsp: <...>-32196 [006] 249033.677333: sched_process_exit:   comm=ip pid=32196 prio=120
> bsp: <...>-32196 [006] 249033.677562: signal_generate:      sig=17 errno=0 code=1 comm=sh pid=32195 grp=1 res=0
> Above logs is on reader page, reader task try to read PADDING event
> after this event and lead to crash.

Ah, it's not an issue with the commit value but the write value.

Can you test this patch.

-- Steve

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index d59b6a328b7f..6bf7706bb33b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2608,6 +2608,9 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
 		/* Mark the rest of the page with padding */
 		rb_event_set_padding(event);
 
+		/* Make sure the padding is visible before the write update */
+		smp_wmb();
+
 		/* Set the write back to the previous setting */
 		local_sub(length, &tail_page->write);
 		return;
@@ -4580,6 +4583,13 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
 	goto again;
 
  out:
+	/* If the write is past the end of page, a writer is still updating it */
+	if (reader && reader->write > BUF_PAGE_SIZE)
+		reader = NULL;
+
+	/* Make sure we see any padding after the write update */
+	smp_rmb();
+
 	/* Update the read_stamp on the first event */
 	if (reader && reader->read == 0)
 		cpu_buffer->read_stamp = reader->page->time_stamp;

Jiazi Li Sept. 7, 2022, 3:03 a.m. UTC | #6

On Tue, Sep 06, 2022 at 12:25:42PM -0400, Steven Rostedt wrote:
> On Mon, 5 Sep 2022 11:23:14 +0800
> lijiazi <jqqlijiazi@gmail.com> wrote:
> 
> > >From ramdump, current reader page's commit is 0xff0, not bigger than  
> > BUF_PAGE_SIZE:
> > crash> struct buffer_page 0xffffffd10b599580 -x  
> > struct buffer_page {
> >   list = {
> >     next = 0xffffffd10b599500,
> >     prev = 0xffffffd10b599680
> >   },
> >   write = {
> >     a = {
> >       counter = 0x100ff0
> >     }
> >   },
> >   read = 0xfd4,
> >   entries = {
> >     a = {
> >       counter = 0x100053
> >     }
> >   },
> >   real_end = 0xfd4,
> >   page = 0xffffffd10b553000
> > }
> > crash> struct buffer_data_page 0xffffffd10b553000 -x  
> > struct buffer_data_page {
> >   time_stamp = 0xe2679ca0cd3d,
> >   commit = {
> >     a = {
> >       counter = 0xff0
> >     }
> >   },
> >   data = 0xffffffd10b553010 "\b"
> > }
> > I also can extrace trace log from ramdump by crash-trace extension tool:
> > bsp: <...>-32191 [006] 249032.606401: signal_generate:      sig=17 errno=0 code=1 comm=WifiDiagnostics pid=1535 grp=1 res=1
> > bsp: <...>-32183 [006] 249032.625192: sched_process_exit:   comm=osi_bin pid=32183 prio=120
> > bsp: <...>-32196 [006] 249033.677333: sched_process_exit:   comm=ip pid=32196 prio=120
> > bsp: <...>-32196 [006] 249033.677562: signal_generate:      sig=17 errno=0 code=1 comm=sh pid=32195 grp=1 res=0
> > Above logs is on reader page, reader task try to read PADDING event
> > after this event and lead to crash.
> 
> Ah, it's not an issue with the commit value but the write value.
> 
> Can you test this patch.
> 
Sure, I will test this patch.
Thanks!
> -- Steve
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index d59b6a328b7f..6bf7706bb33b 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -2608,6 +2608,9 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
>  		/* Mark the rest of the page with padding */
>  		rb_event_set_padding(event);
>  
> +		/* Make sure the padding is visible before the write update */
> +		smp_wmb();
> +
>  		/* Set the write back to the previous setting */
>  		local_sub(length, &tail_page->write);
>  		return;
> @@ -4580,6 +4583,13 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
>  	goto again;
>  
>   out:
> +	/* If the write is past the end of page, a writer is still updating it */
> +	if (reader && reader->write > BUF_PAGE_SIZE)
> +		reader = NULL;
> +
> +	/* Make sure we see any padding after the write update */
> +	smp_rmb();
> +
>  	/* Update the read_stamp on the first event */
>  	if (reader && reader->read == 0)
>  		cpu_buffer->read_stamp = reader->page->time_stamp;

Jiazi Li Sept. 7, 2022, 8:13 a.m. UTC | #7

On Tue, Sep 06, 2022 at 12:25:42PM -0400, Steven Rostedt wrote:
> On Mon, 5 Sep 2022 11:23:14 +0800
> lijiazi <jqqlijiazi@gmail.com> wrote:
> 
> > >From ramdump, current reader page's commit is 0xff0, not bigger than  
> > BUF_PAGE_SIZE:
> > crash> struct buffer_page 0xffffffd10b599580 -x  
> > struct buffer_page {
> >   list = {
> >     next = 0xffffffd10b599500,
> >     prev = 0xffffffd10b599680
> >   },
> >   write = {
> >     a = {
> >       counter = 0x100ff0
> >     }
> >   },
> >   read = 0xfd4,
> >   entries = {
> >     a = {
> >       counter = 0x100053
> >     }
> >   },
> >   real_end = 0xfd4,
> >   page = 0xffffffd10b553000
> > }
> > crash> struct buffer_data_page 0xffffffd10b553000 -x  
> > struct buffer_data_page {
> >   time_stamp = 0xe2679ca0cd3d,
> >   commit = {
> >     a = {
> >       counter = 0xff0
> >     }
> >   },
> >   data = 0xffffffd10b553010 "\b"
> > }
> > I also can extrace trace log from ramdump by crash-trace extension tool:
> > bsp: <...>-32191 [006] 249032.606401: signal_generate:      sig=17 errno=0 code=1 comm=WifiDiagnostics pid=1535 grp=1 res=1
> > bsp: <...>-32183 [006] 249032.625192: sched_process_exit:   comm=osi_bin pid=32183 prio=120
> > bsp: <...>-32196 [006] 249033.677333: sched_process_exit:   comm=ip pid=32196 prio=120
> > bsp: <...>-32196 [006] 249033.677562: signal_generate:      sig=17 errno=0 code=1 comm=sh pid=32195 grp=1 res=0
> > Above logs is on reader page, reader task try to read PADDING event
> > after this event and lead to crash.
> 
> Ah, it's not an issue with the commit value but the write value.
> 
> Can you test this patch.
> 
> -- Steve
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index d59b6a328b7f..6bf7706bb33b 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -2608,6 +2608,9 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
>  		/* Mark the rest of the page with padding */
>  		rb_event_set_padding(event);
>  
> +		/* Make sure the padding is visible before the write update */
> +		smp_wmb();
> +

I checked several ramdumps, most of the padding event that caused crash not be
written here. Because its corresponding ring_buffer_event->time_delta is
not 0, take one of them as an example:
crash> struct ring_buffer_event ffffffd10b553fe4
struct ring_buffer_event {
  type_len = 29,
  time_delta = 1,
  array = 0xffffffd10b553fe8
}
and array[0] is 0x18:
crash> rd ffffffd10b553fe8
ffffffd10b553fe8:  0000000500000018                    ........
0x18 = 0xff0 - 0xfd4 - 0x4
0xfd4 is end address of last data event in reader page.

The available space size is 0x18 bytes, and next event want to write is
signal_deliver, this event need 0x28 bytes:
crash> trace_event_raw_signal_deliver -x
struct trace_event_raw_signal_deliver {
    struct trace_entry ent;
    int sig;
    int errno;
    int code;
    unsigned long sa_handler;
    unsigned long sa_flags;
    char __data[0];
}
SIZE: 0x28
So trigger rb move tail, tail value in rb_reset_tail is 0xfd4, the
following if condition is false:
if (tail > (BUF_PAGE_SIZE - RB_EVNT_MIN_SIZE))
>  		/* Set the write back to the previous setting */
>  		local_sub(length, &tail_page->write);
>  		return;
> @@ -4580,6 +4583,13 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
>  	goto again;
>  
>   out:
> +	/* If the write is past the end of page, a writer is still updating it */
> +	if (reader && reader->write > BUF_PAGE_SIZE)
> +		reader = NULL;
> +
> +	/* Make sure we see any padding after the write update */
> +	smp_rmb();
> +
>  	/* Update the read_stamp on the first event */
>  	if (reader && reader->read == 0)
>  		cpu_buffer->read_stamp = reader->page->time_stamp;

Jiazi Li Sept. 15, 2022, 8:25 a.m. UTC | #8

On Wed, Sep 07, 2022 at 04:13:58PM +0800, lijiazi wrote:
> > 
> > Ah, it's not an issue with the commit value but the write value.
> > 
> > Can you test this patch.
> > 
> > -- Steve
> > 
> > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > index d59b6a328b7f..6bf7706bb33b 100644
> > --- a/kernel/trace/ring_buffer.c
> > +++ b/kernel/trace/ring_buffer.c
> > @@ -2608,6 +2608,9 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
> >  		/* Mark the rest of the page with padding */
> >  		rb_event_set_padding(event);
> >  
> > +		/* Make sure the padding is visible before the write update */
> > +		smp_wmb();
> > +
> 
> I checked several ramdumps, most of the padding event that caused crash not be
> written here. Because its corresponding ring_buffer_event->time_delta is
> not 0, take one of them as an example:
> crash> struct ring_buffer_event ffffffd10b553fe4
> struct ring_buffer_event {
>   type_len = 29,
>   time_delta = 1,
>   array = 0xffffffd10b553fe8
> }
> and array[0] is 0x18:
> crash> rd ffffffd10b553fe8
> ffffffd10b553fe8:  0000000500000018                    ........
> 0x18 = 0xff0 - 0xfd4 - 0x4
> 0xfd4 is end address of last data event in reader page.
> 
> The available space size is 0x18 bytes, and next event want to write is
> signal_deliver, this event need 0x28 bytes:
> crash> trace_event_raw_signal_deliver -x
> struct trace_event_raw_signal_deliver {
>     struct trace_entry ent;
>     int sig;
>     int errno;
>     int code;
>     unsigned long sa_handler;
>     unsigned long sa_flags;
>     char __data[0];
> }
> SIZE: 0x28
> So trigger rb move tail, tail value in rb_reset_tail is 0xfd4, the
> following if condition is false:
> if (tail > (BUF_PAGE_SIZE - RB_EVNT_MIN_SIZE))
> >  		/* Set the write back to the previous setting */
> >  		local_sub(length, &tail_page->write);
> >  		return;
> > @@ -4580,6 +4583,13 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
> >  	goto again;
> >  
> >   out:
> > +	/* If the write is past the end of page, a writer is still updating it */
> > +	if (reader && reader->write > BUF_PAGE_SIZE)

Maybe should use rb_page_write(reeader) to filter out RB_WRITE_INTCNT?
Thanks!
> > +		reader = NULL;
> > +
> > +	/* Make sure we see any padding after the write update */
> > +	smp_rmb();
> > +
> >  	/* Update the read_stamp on the first event */
> >  	if (reader && reader->read == 0)
> >  		cpu_buffer->read_stamp = reader->page->time_stamp;

Steven Rostedt Sept. 28, 2022, 2:54 p.m. UTC | #9

Finally got around to looking at this again!

On Thu, 15 Sep 2022 16:25:50 +0800
lijiazi <jqqlijiazi@gmail.com> wrote:

> On Wed, Sep 07, 2022 at 04:13:58PM +0800, lijiazi wrote:
> > > 
> > > Ah, it's not an issue with the commit value but the write value.
> > > 
> > > Can you test this patch.
> > > 
> > > -- Steve
> > > 
> > > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > > index d59b6a328b7f..6bf7706bb33b 100644
> > > --- a/kernel/trace/ring_buffer.c
> > > +++ b/kernel/trace/ring_buffer.c
> > > @@ -2608,6 +2608,9 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
> > >  		/* Mark the rest of the page with padding */
> > >  		rb_event_set_padding(event);
> > >  
> > > +		/* Make sure the padding is visible before the write update */
> > > +		smp_wmb();
> > > +  
> > 
> > I checked several ramdumps, most of the padding event that caused crash not be
> > written here. Because its corresponding ring_buffer_event->time_delta is
> > not 0, take one of them as an example:  
> > crash> struct ring_buffer_event ffffffd10b553fe4  
> > struct ring_buffer_event {
> >   type_len = 29,
> >   time_delta = 1,
> >   array = 0xffffffd10b553fe8
> > }
> > and array[0] is 0x18:  
> > crash> rd ffffffd10b553fe8  
> > ffffffd10b553fe8:  0000000500000018                    ........
> > 0x18 = 0xff0 - 0xfd4 - 0x4
> > 0xfd4 is end address of last data event in reader page.
> > 
> > The available space size is 0x18 bytes, and next event want to write is
> > signal_deliver, this event need 0x28 bytes:  
> > crash> trace_event_raw_signal_deliver -x  
> > struct trace_event_raw_signal_deliver {
> >     struct trace_entry ent;
> >     int sig;
> >     int errno;
> >     int code;
> >     unsigned long sa_handler;
> >     unsigned long sa_flags;
> >     char __data[0];
> > }
> > SIZE: 0x28


> > So trigger rb move tail, tail value in rb_reset_tail is 0xfd4, the
> > following if condition is false:
> > if (tail > (BUF_PAGE_SIZE - RB_EVNT_MIN_SIZE))  
> > >  		/* Set the write back to the previous setting */
> > >  		local_sub(length, &tail_page->write);
> > >  		return;

OK. But that memory barrier should still be in place there, as the error is
still at that point as well. But we are missing a memory barrier here:

@@ -2654,6 +2657,9 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
        /* time delta must be non zero */
        event->time_delta = 1;
 
+       /* Make sure the padding is visible before the tail_page->write update */
+       smp_wmb();
+
        /* Set write to end of buffer */
        length = (tail + length) - BUF_PAGE_SIZE;
        local_sub(length, &tail_page->write);


> > > @@ -4580,6 +4583,13 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
> > >  	goto again;
> > >  
> > >   out:
> > > +	/* If the write is past the end of page, a writer is still updating it */
> > > +	if (reader && reader->write > BUF_PAGE_SIZE)  
> 
> Maybe should use rb_page_write(reeader) to filter out RB_WRITE_INTCNT?

Sure, I'll update that. But missing the filtering out, it would just cause
it to return NULL more than it needs to, and still be able to prevent the
race from happening.

Will reply with an updated patch.

Thanks!

-- Steve

> > > +		reader = NULL;
> > > +
> > > +	/* Make sure we see any padding after the write update */
> > > +	smp_rmb();
> > > +
> > >  	/* Update the read_stamp on the first event */
> > >  	if (reader && reader->read == 0)
> > >  		cpu_buffer->read_stamp = reader->page->time_stamp;

Steven Rostedt Sept. 28, 2022, 2:59 p.m. UTC | #10

On Wed, 28 Sep 2022 10:54:43 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> Will reply with an updated patch.

Can you try this patch?

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 677812b8fae0..7b548a26009e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2643,6 +2643,9 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
 		/* Mark the rest of the page with padding */
 		rb_event_set_padding(event);
 
+		/* Make sure the padding is visible before the write update */
+		smp_wmb();
+
 		/* Set the write back to the previous setting */
 		local_sub(length, &tail_page->write);
 		return;
@@ -2654,6 +2657,9 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
 	/* time delta must be non zero */
 	event->time_delta = 1;
 
+	/* Make sure the padding is visible before the tail_page->write update */
+	smp_wmb();
+
 	/* Set write to end of buffer */
 	length = (tail + length) - BUF_PAGE_SIZE;
 	local_sub(length, &tail_page->write);
@@ -4615,6 +4621,13 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
 	goto again;
 
  out:
+	/* If the write is past the end of page, a writer is still updating it */
+	if (reader && rb_page_write(reader) > BUF_PAGE_SIZE)
+		reader = NULL;
+
+	/* Make sure we see any padding after the write update (see rb_reset_tail()) */
+	smp_rmb();
+
 	/* Update the read_stamp on the first event */
 	if (reader && reader->read == 0)
 		cpu_buffer->read_stamp = reader->page->time_stamp;

Steven Rostedt Sept. 28, 2022, 8:29 p.m. UTC | #11

On Wed, 28 Sep 2022 10:59:35 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> > Will reply with an updated patch.  
> 
> Can you try this patch?

Actually, that failed my testing (for other reasons), can you try this one?

-- Steve

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 3046deacf7b3..46b6f4107428 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2648,6 +2648,9 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
 		/* Mark the rest of the page with padding */
 		rb_event_set_padding(event);
 
+		/* Make sure the padding is visible before the write update */
+		smp_wmb();
+
 		/* Set the write back to the previous setting */
 		local_sub(length, &tail_page->write);
 		return;
@@ -2659,6 +2662,9 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
 	/* time delta must be non zero */
 	event->time_delta = 1;
 
+	/* Make sure the padding is visible before the tail_page->write update */
+	smp_wmb();
+
 	/* Set write to end of buffer */
 	length = (tail + length) - BUF_PAGE_SIZE;
 	local_sub(length, &tail_page->write);
@@ -4627,6 +4633,29 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
 	arch_spin_unlock(&cpu_buffer->lock);
 	local_irq_restore(flags);
 
+	/* The writer has preempt disable, wait for it. But not forever */
+        for (nr_loops = 0; nr_loops < 1000; nr_loops++) {
+		/* If the write is past the end of page, a writer is still updating it */
+		if (likely(!reader || rb_page_write(reader) <= BUF_PAGE_SIZE))
+			break;
+
+		udelay(1);
+
+		/* Get the latest version of the reader write value */
+		smp_rmb();
+	}
+
+	/* The writer is not moving forward? Something is wrong */
+	if (RB_WARN_ON(cpu_buffer, nr_loops == 1000))
+		reader = NULL;
+
+	/*
+	 * Make sure we see any padding after the write update
+	 * (see rb_reset_tail())
+	 */
+	smp_rmb();
+
+
 	return reader;
 }

Jiazi Li Sept. 29, 2022, 9:12 a.m. UTC | #12

On Wed, Sep 28, 2022 at 04:29:08PM -0400, Steven Rostedt wrote:
> On Wed, 28 Sep 2022 10:59:35 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > > Will reply with an updated patch.  
> > 
> > Can you try this patch?
> 
> Actually, that failed my testing (for other reasons), can you try this one?
> 
Sure, thank you.
> -- Steve
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 3046deacf7b3..46b6f4107428 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -2648,6 +2648,9 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
>  		/* Mark the rest of the page with padding */
>  		rb_event_set_padding(event);
>  
> +		/* Make sure the padding is visible before the write update */
> +		smp_wmb();
> +
>  		/* Set the write back to the previous setting */
>  		local_sub(length, &tail_page->write);
>  		return;
> @@ -2659,6 +2662,9 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
>  	/* time delta must be non zero */
>  	event->time_delta = 1;
>  
> +	/* Make sure the padding is visible before the tail_page->write update */
> +	smp_wmb();
> +
>  	/* Set write to end of buffer */
>  	length = (tail + length) - BUF_PAGE_SIZE;
>  	local_sub(length, &tail_page->write);
> @@ -4627,6 +4633,29 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
>  	arch_spin_unlock(&cpu_buffer->lock);
>  	local_irq_restore(flags);
>  
> +	/* The writer has preempt disable, wait for it. But not forever */
> +        for (nr_loops = 0; nr_loops < 1000; nr_loops++) {
> +		/* If the write is past the end of page, a writer is still updating it */
> +		if (likely(!reader || rb_page_write(reader) <= BUF_PAGE_SIZE))
> +			break;
> +
> +		udelay(1);
> +
> +		/* Get the latest version of the reader write value */
> +		smp_rmb();
> +	}
> +
> +	/* The writer is not moving forward? Something is wrong */
> +	if (RB_WARN_ON(cpu_buffer, nr_loops == 1000))
> +		reader = NULL;
> +
> +	/*
> +	 * Make sure we see any padding after the write update
> +	 * (see rb_reset_tail())
> +	 */
> +	smp_rmb();
> +
> +
>  	return reader;
>  }
>

Steven Rostedt Sept. 30, 2022, 9:49 p.m. UTC | #13

On Thu, 29 Sep 2022 17:12:01 +0800
lijiazi <jqqlijiazi@gmail.com> wrote:

> > Actually, that failed my testing (for other reasons), can you try this one?
> >   
> Sure, thank you.

Have you had a chance to test this yet?

The merge window is about to open, and I have it already tested through all
my tests. But I want to hold off pushing it to next until I have a
tested-by from you.

Thanks!

-- Steve

Jiazi Li Oct. 8, 2022, 1:21 a.m. UTC | #14

On Fri, Sep 30, 2022 at 05:49:34PM -0400, Steven Rostedt wrote:
> On Thu, 29 Sep 2022 17:12:01 +0800
> lijiazi <jqqlijiazi@gmail.com> wrote:
> 
> > > Actually, that failed my testing (for other reasons), can you try this one?
> > >   
> > Sure, thank you.
> 
> Have you had a chance to test this yet?
> 
> The merge window is about to open, and I have it already tested through all
> my tests. But I want to hold off pushing it to next until I have a
> tested-by from you.
> 
> Thanks!
> 
> -- Steve
Sorry for the late reply, I just finished my National Day holiday.
Up to now, the crash issue has not been reproduced with this patch.
Thanks!

Steven Rostedt Oct. 8, 2022, 12:50 p.m. UTC | #15

On Sat, 8 Oct 2022 09:21:29 +0800
lijiazi <jqqlijiazi@gmail.com> wrote:

> On Fri, Sep 30, 2022 at 05:49:34PM -0400, Steven Rostedt wrote:
> > On Thu, 29 Sep 2022 17:12:01 +0800
> > lijiazi <jqqlijiazi@gmail.com> wrote:
> >   
> > > > Actually, that failed my testing (for other reasons), can you try this one?
> > > >     
> > > Sure, thank you.  
> > 
> > Have you had a chance to test this yet?
> > 
> > The merge window is about to open, and I have it already tested through all
> > my tests. But I want to hold off pushing it to next until I have a
> > tested-by from you.
> > 
> > Thanks!
> > 
> > -- Steve  
> Sorry for the late reply, I just finished my National Day holiday.
> Up to now, the crash issue has not been reproduced with this patch.
> Thanks!

No problem. Thanks for testing it. And thank you even more for
reporting it. This was a real bug that's been around pretty much from
the beginning and nobody (at least I know of) reported it.

-- Steve

ring-buffer: Add barrire in rb_move_tail

Commit Message

Comments

Patch