diff mbox series

[1/1] tracing: fix bug in rb_per_cpu_empty() that might cause deadloop.

Message ID YPaNxsIlb2yjSi5Y@aegistudio (mailing list archive)
State Superseded
Headers show
Series [1/1] tracing: fix bug in rb_per_cpu_empty() that might cause deadloop. | expand

Commit Message

Haoran Luo July 20, 2021, 8:48 a.m. UTC
The "rb_per_cpu_empty()" misinterpret the condition (as not-empty) when
"head_page" and "commit_page" of "struct ring_buffer_per_cpu" points to
the same buffer page, whose "buffer_data_page" is empty and "read" field
is non-zero.

An error scenario could be constructed as followed (kernel perspective):

1. All pages in the buffer has been accessed by reader(s) so that all of
them will have non-zero "read" field.

2. Read and clear all buffer pages so that "rb_num_of_entries()" will
return 0 rendering there's no more data to read. It is also required
that the "read_page", "commit_page" and "tail_page" points to the same
page, while "head_page" is the next page of them.

3. Invoke "ring_buffer_lock_reserve()" with large enough "length"
so that it shot pass the end of current tail buffer page. Now the
"head_page", "commit_page" and "tail_page" points to the same page.

4. Discard current event with "ring_buffer_discard_commit()", so that
"head_page", "commit_page" and "tail_page" points to a page whose buffer
data page is now empty.

When the error scenario has been constructed, "tracing_read_pipe" will
be trapped inside a deadloop: "trace_empty()" returns 0 since
"rb_per_cpu_empty()" returns 0 when it hits the CPU containing such
constructed ring buffer. Then "trace_find_next_entry_inc()" always
return NULL since "rb_num_of_entries()" reports there's no more entry
to read. Finally "trace_seq_to_user()" returns "-EBUSY" spanking
"tracing_read_pipe" back to the start of the "waitagain" loop.

I've also written a proof-of-concept script to construct the scenario
and trigger the bug automatically, you can use it to trace and validate
my reasoning above:

  https://github.com/aegistudio/RingBufferDetonator.git

Tests has been carried out on linux kernel 5.14-rc2
(2734d6c1b1a089fb593ef6a23d4b70903526fe0c), my fixed version
of kernel (for testing whether my update fixes the bug) and
some older kernels (for range of affected kernels). Test result is
also attached to the proof-of-concept repository.

Signed-off-by: Haoran Luo <www@aegistudio.net>
---
 kernel/trace/ring_buffer.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Linus Torvalds July 20, 2021, 6:35 p.m. UTC | #1
On Tue, Jul 20, 2021 at 2:04 AM Haoran Luo <www@aegistudio.net> wrote:
>
>         return reader->read == rb_page_commit(reader) &&
>                 (commit == reader ||
>                  (commit == head &&
> -                 head->read == rb_page_commit(commit)));
> +                 (rb_page_commit(commit) == 0 ||
> +                  head->read == rb_page_commit(commit))));
>  }

Can we please re-write that whole thing to be legible?

That conditional was complicated before. It's worse now.

For example, just splitting it up would help:

        if (reader->read != rb_page_commit(reader))
                return false;

        if (commit == reader)
                return true;

        if (commit != head)
                return false;

        return !rb_page_commit(commit) ||
                commit->read == rb_page_commit(commit);

but adding comments to each conditional case would also be good.

Note: I checked "head->read" to "commit->read" in that last
conditional, because we had _just_ verified that "commit" and "head"
are the same, and it seems to be more logical to check "commit->read"
with "rb_page_commit(commit)".

(And somebody should check that I split it right - I'm just doing this
in the MUA without knowing the code: I hate conditionals that are
cross multiple lines like this, and I hate them particularly when they
turn out to be buggy, so when you fix a complex conditional, that's a
BIG sign to me that the problem was that the conditional was too
complex to begin with).

Btw, that "rb_page_commit()" forces a new atomic read each time, so
the "rb_page_commit(commit)" value should probably be cached as a
variable before it's used twice.

Steven?

                  Linus
Haoran Luo July 20, 2021, 10:36 p.m. UTC | #2
On Wed, 21 Jul 2021 02:35:56 +0800 Linus Torvalds <torvalds@linuxfoundation.org> wrote:
 > On Tue, Jul 20, 2021 at 2:04 AM Haoran Luo <www@aegistudio.net> wrote: 
 > > 
 > >         return reader->read == rb_page_commit(reader) && 
 > >                 (commit == reader || 
 > >                  (commit == head && 
 > > -                 head->read == rb_page_commit(commit))); 
 > > +                 (rb_page_commit(commit) == 0 || 
 > > +                  head->read == rb_page_commit(commit)))); 
 > >  } 
 >  
 > Can we please re-write that whole thing to be legible? 
 >  
 > That conditional was complicated before. It's worse now. 
 >  
 > For example, just splitting it up would help: 
 >  
 >  if (reader->read != rb_page_commit(reader)) 
 >  return false; 
 >  
 >  if (commit == reader) 
 >  return true; 
 >  
 >  if (commit != head) 
 >  return false; 
 >  
 >  return !rb_page_commit(commit) || 
 >  commit->read == rb_page_commit(commit); 
 >  
 > but adding comments to each conditional case would also be good. 
 >  
 > Note: I checked "head->read" to "commit->read" in that last 
 > conditional, because we had _just_ verified that "commit" and "head" 
 > are the same, and it seems to be more logical to check "commit->read" 
 > with "rb_page_commit(commit)". 
 >  
 > (And somebody should check that I split it right - I'm just doing this 
 > in the MUA without knowing the code: I hate conditionals that are 
 > cross multiple lines like this, and I hate them particularly when they 
 > turn out to be buggy, so when you fix a complex conditional, that's a 
 > BIG sign to me that the problem was that the conditional was too 
 > complex to begin with). 
 >  
 > Btw, that "rb_page_commit()" forces a new atomic read each time, so 
 > the "rb_page_commit(commit)" value should probably be cached as a 
 > variable before it's used twice. 
 >  
 > Steven? 
 >  
 >  Linus 
 > 

Agreed, this piece of code would be less likely to be buggy if they
are commented by what they are judging. I will do it and send the next
version of patch.
Steven Rostedt July 21, 2021, 1:08 a.m. UTC | #3
On Tue, 20 Jul 2021 11:35:56 -0700
Linus Torvalds <torvalds@linuxfoundation.org> wrote:

> On Tue, Jul 20, 2021 at 2:04 AM Haoran Luo <www@aegistudio.net> wrote:
> >
> >         return reader->read == rb_page_commit(reader) &&
> >                 (commit == reader ||
> >                  (commit == head &&
> > -                 head->read == rb_page_commit(commit)));
> > +                 (rb_page_commit(commit) == 0 ||
> > +                  head->read == rb_page_commit(commit))));
> >  }  
> 
> Can we please re-write that whole thing to be legible?
> 
> That conditional was complicated before. It's worse now.

I agree. I always cringe when I write code like this. For some reason,
I write code I dislike over adding multiple returns. Not sure why I do
that. Probably some childhood trauma.

I'll add the comments below, as it looks correct.

> 
> For example, just splitting it up would help:
> 

	/*
	 * If what has been read on the reader page is not equal
	 * to the content on the reader page, then there's more
	 * data (not empty).
	 */

>         if (reader->read != rb_page_commit(reader))
>                 return false;
> 

	/*
	 * The last thing written is on the reader page, and the above
	 * statement already shows everything written was read.
	 * (empty buffer)
	 */

>         if (commit == reader)
>                 return true;
> 

	/*
	 * The last page written does not equal the last page read,
	 * thus there's more data (not empty).
	 */

>         if (commit != head)
>                 return false;
> 

So, looking at this, I see that the bug is not here. head->read should
never be non-zero. The issue is that we missed resetting "read" to zero
before adding it back to the ring buffer.


>         return !rb_page_commit(commit) ||
>                 commit->read == rb_page_commit(commit);

Really, the original code worked because head->read is expected to be
zero. And looking at the code more, it should even work with:

	/*
	 * With the commit == head page (tail == head), and nothing on
	 * the commit page there's nothing in the buffer.
	 */
	if (!rb_page_commit(commit))
		return true;

> 
> but adding comments to each conditional case would also be good.
> 
> Note: I checked "head->read" to "commit->read" in that last
> conditional, because we had _just_ verified that "commit" and "head"
> are the same, and it seems to be more logical to check "commit->read"
> with "rb_page_commit(commit)".

I believe I kept it as "head" to re-enforce that the two are the same.


> 
> (And somebody should check that I split it right - I'm just doing this
> in the MUA without knowing the code: I hate conditionals that are
> cross multiple lines like this, and I hate them particularly when they
> turn out to be buggy, so when you fix a complex conditional, that's a
> BIG sign to me that the problem was that the conditional was too
> complex to begin with).
> 
> Btw, that "rb_page_commit()" forces a new atomic read each time, so
> the "rb_page_commit(commit)" value should probably be cached as a
> variable before it's used twice.
> 

Well, the double read of rb_page_commit() is not needed as the bug is
that read is non zero. And I'm not sure this patch even fixes the bug,
because read is not updated when commit is, thus read is some random
number that could cause this to return a false positive. "read" needed
to be reset when the page was added back to the ring buffer, and it
appears that it was missed (ever though all the other fields were reset).

Although this patch is wrong, I'm not against a clean up patch to make
the code easier to read.

I think the real fix is below (not even compiled tested).

-- Steve

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index d1463eac11a3..0b9ce927e95f 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4445,6 +4445,7 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
 	local_set(&cpu_buffer->reader_page->write, 0);
 	local_set(&cpu_buffer->reader_page->entries, 0);
 	local_set(&cpu_buffer->reader_page->page->commit, 0);
+	cpu_buffer->reader_page->read = 0;
 	cpu_buffer->reader_page->real_end = 0;
 
  spin:
Steven Rostedt July 21, 2021, 1:10 a.m. UTC | #4
On Wed, 21 Jul 2021 06:36:43 +0800
contact me <www@aegistudio.net> wrote:

> Agreed, this piece of code would be less likely to be buggy if they
> are commented by what they are judging. I will do it and send the next
> version of patch.

Please send me the clean up patch, and I'll add it to the next merge
window. You can use the comments I gave Linus. But can you test the
patch I replied to Linus with. And I'll add that one for Linus and
stable, and add you as the "Reported-by" on it.

Thanks!

-- Steve
Haoran Luo July 21, 2021, 2:13 a.m. UTC | #5
On Wed, 21 Jul 2021 09:08:55 +0800 Steven Rostedt <rostedt@goodmis.org> wrote:
 > On Tue, 20 Jul 2021 11:35:56 -0700 
 > Linus Torvalds <torvalds@linuxfoundation.org> wrote: 
 >  
 > > On Tue, Jul 20, 2021 at 2:04 AM Haoran Luo <www@aegistudio.net> wrote: 
 > > > 
 > > >         return reader->read == rb_page_commit(reader) && 
 > > >                 (commit == reader || 
 > > >                  (commit == head && 
 > > > -                 head->read == rb_page_commit(commit))); 
 > > > +                 (rb_page_commit(commit) == 0 || 
 > > > +                  head->read == rb_page_commit(commit)))); 
 > > >  } 
 > > 
 > > Can we please re-write that whole thing to be legible? 
 > > 
 > > That conditional was complicated before. It's worse now. 
 >  
 > I agree. I always cringe when I write code like this. For some reason, 
 > I write code I dislike over adding multiple returns. Not sure why I do 
 > that. Probably some childhood trauma. 
 >  
 > I'll add the comments below, as it looks correct. 
 >  
 > > 
 > > For example, just splitting it up would help: 
 > > 
 >  
 >     /* 
 >      * If what has been read on the reader page is not equal 
 >      * to the content on the reader page, then there's more 
 >      * data (not empty). 
 >      */ 
 >  
 > >         if (reader->read != rb_page_commit(reader)) 
 > >                 return false; 
 > > 
 >  
 >     /* 
 >      * The last thing written is on the reader page, and the above 
 >      * statement already shows everything written was read. 
 >      * (empty buffer) 
 >      */ 
 >  
 > >         if (commit == reader) 
 > >                 return true; 
 > > 
 >  
 >     /* 
 >      * The last page written does not equal the last page read, 
 >      * thus there's more data (not empty). 
 >      */ 
 >  
 > >         if (commit != head) 
 > >                 return false; 
 > > 
 >  
 > So, looking at this, I see that the bug is not here. head->read should 
 > never be non-zero. The issue is that we missed resetting "read" to zero 
 > before adding it back to the ring buffer. 
 >  
 >  
 > >         return !rb_page_commit(commit) || 
 > >                 commit->read == rb_page_commit(commit); 
 >  
 > Really, the original code worked because head->read is expected to be 
 > zero. And looking at the code more, it should even work with: 
 >  
 >     /* 
 >      * With the commit == head page (tail == head), and nothing on 
 >      * the commit page there's nothing in the buffer. 
 >      */ 
 >     if (!rb_page_commit(commit)) 
 >         return true; 
 >  
 > > 
 > > but adding comments to each conditional case would also be good. 
 > > 
 > > Note: I checked "head->read" to "commit->read" in that last 
 > > conditional, because we had _just_ verified that "commit" and "head" 
 > > are the same, and it seems to be more logical to check "commit->read" 
 > > with "rb_page_commit(commit)". 
 >  
 > I believe I kept it as "head" to re-enforce that the two are the same. 
 >  
 >  
 > > 
 > > (And somebody should check that I split it right - I'm just doing this 
 > > in the MUA without knowing the code: I hate conditionals that are 
 > > cross multiple lines like this, and I hate them particularly when they 
 > > turn out to be buggy, so when you fix a complex conditional, that's a 
 > > BIG sign to me that the problem was that the conditional was too 
 > > complex to begin with). 
 > > 
 > > Btw, that "rb_page_commit()" forces a new atomic read each time, so 
 > > the "rb_page_commit(commit)" value should probably be cached as a 
 > > variable before it's used twice. 
 > > 
 >  
 > Well, the double read of rb_page_commit() is not needed as the bug is 
 > that read is non zero. And I'm not sure this patch even fixes the bug, 
 > because read is not updated when commit is, thus read is some random 
 > number that could cause this to return a false positive. "read" needed 
 > to be reset when the page was added back to the ring buffer, and it 
 > appears that it was missed (ever though all the other fields were reset). 
 >  
 > Although this patch is wrong, I'm not against a clean up patch to make 
 > the code easier to read. 
 >  
 > I think the real fix is below (not even compiled tested). 
 >  
 > -- Steve 
 >  
 > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c 
 > index d1463eac11a3..0b9ce927e95f 100644 
 > --- a/kernel/trace/ring_buffer.c 
 > +++ b/kernel/trace/ring_buffer.c 
 > @@ -4445,6 +4445,7 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer) 
 >      local_set(&cpu_buffer->reader_page->write, 0); 
 >      local_set(&cpu_buffer->reader_page->entries, 0); 
 >      local_set(&cpu_buffer->reader_page->page->commit, 0); 
 > +    cpu_buffer->reader_page->read = 0; 
 >      cpu_buffer->reader_page->real_end = 0; 
 >  
 >  spin: 
 > 

I'll have a try and tell you the result. But I've seen the code written on

https://github.com/torvalds/linux/blob/2734d6c1b1a089fb593ef6a23d4b70903526fe0c/kernel/trace/ring_buffer.c#L4513
	cpu_buffer->reader_page->read = 0;

I though the algorithm is to reset the reader_page->read when it swaps with the
head page. And following the original algorithm the read field of a buffer page 
should be either 0 or its original value when the content to read exhaust. So I
added an extra conditional to ensure that.
Steven Rostedt July 21, 2021, 2:48 a.m. UTC | #6
On Wed, 21 Jul 2021 10:13:01 +0800
contact me <www@aegistudio.net> wrote:

> I'll have a try and tell you the result. But I've seen the code written on

Thanks.

> 
> https://github.com/torvalds/linux/blob/2734d6c1b1a089fb593ef6a23d4b70903526fe0c/kernel/trace/ring_buffer.c#L4513
> 	cpu_buffer->reader_page->read = 0;

Yeah, that clears the "read" for the reader page after it gets swapped
out of the writers buffer, and before the reader gets to it.

> 
> I though the algorithm is to reset the reader_page->read when it swaps with the
> head page. And following the original algorithm the read field of a buffer page 
> should be either 0 or its original value when the content to read exhaust. So I
> added an extra conditional to ensure that.

Actually, we could just change it to ignore read as well, and that
could be the fix.

Basically, the bug is that we use that "read" field that's in the
writer's buffer (the main ring buffer outside the reader page), and it
has stale data.

So we either need to clear the "read" before placing it back into the
writer's buffer (which my patch does). Or we can simply ignore it,
which would change your patch to:

 	return reader->read == rb_page_commit(reader) &&
 		(commit == reader ||
 		 (commit == head &&
-		  head->read == rb_page_commit(commit)));
+		  (rb_page_commit(commit) == 0)));
 }

In which case, you can send me that patch with the cleanup that Linus
suggested, and my commenting.

-- Steve
diff mbox series

Patch

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index d1463eac11a3..00446a501742 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3883,7 +3883,8 @@  static bool rb_per_cpu_empty(struct ring_buffer_per_cpu *cpu_buffer)
 	return reader->read == rb_page_commit(reader) &&
 		(commit == reader ||
 		 (commit == head &&
-		  head->read == rb_page_commit(commit)));
+		  (rb_page_commit(commit) == 0 ||
+		   head->read == rb_page_commit(commit))));
 }
 
 /**