Message ID | 20231226125902.4a057f1d@gandalf.local.home (mailing list archive) |
---|---|
State | Accepted |
Commit | 623b1f896fa8a669a277ee5a258307a16c7377a3 |
Headers | show |
Series | ring-buffer: Fix wake ups when buffer_percent is set to 100 | expand |
On Tue, 26 Dec 2023 12:59:02 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > From: "Steven Rostedt (Google)" <rostedt@goodmis.org> > > The tracefs file "buffer_percent" is to allow user space to set a > water-mark on how much of the tracing ring buffer needs to be filled in > order to wake up a blocked reader. > > 0 - is to wait until any data is in the buffer > 1 - is to wait for 1% of the sub buffers to be filled > 50 - would be half of the sub buffers are filled with data > 100 - is not to wake the waiter until the ring buffer is completely full > > Unfortunately the test for being full was: > > dirty = ring_buffer_nr_dirty_pages(buffer, cpu); > return (dirty * 100) > (full * nr_pages); > > Where "full" is the value for "buffer_percent". > > There is two issues with the above when full == 100. > > 1. dirty * 100 > 100 * nr_pages will never be true > That is, the above is basically saying that if the user sets > buffer_percent to 100, more pages need to be dirty than exist in the > ring buffer! > > 2. The page that the writer is on is never considered dirty, as dirty > pages are only those that are full. When the writer goes to a new > sub-buffer, it clears the contents of that sub-buffer. > > That is, even if the check was ">=" it would still not be equal as the > most pages that can be considered "dirty" is nr_pages - 1. > > To fix this, add one to dirty and use ">=" in the compare. > > Cc: stable@vger.kernel.org > Fixes: 03329f9939781 ("tracing: Add tracefs file buffer_percentage") > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> > --- > kernel/trace/ring_buffer.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c > index 83eab547f1d1..32c0dd2fd1c3 100644 > --- a/kernel/trace/ring_buffer.c > +++ b/kernel/trace/ring_buffer.c > @@ -881,9 +881,14 @@ static __always_inline bool full_hit(struct trace_buffer *buffer, int cpu, int f > if (!nr_pages || !full) > return true; > > - dirty = ring_buffer_nr_dirty_pages(buffer, cpu); > + /* > + * Add one as dirty will never equal nr_pages, as the sub-buffer > + * that the writer is on is not counted as dirty. > + * This is needed if "buffer_percent" is set to 100. > + */ > + dirty = ring_buffer_nr_dirty_pages(buffer, cpu) + 1; Is this "+ 1" required? If we have 200 pages and 1 buffer is dirty, it is 0.5% dirty. Consider @full = 1%. @dirty = 1 + 1 = 2 and @dirty * 100 == 200. but @full * @nr_pages = 1 * 200 = 200. Thus it hits (200 >= 200 is true) even if dirty pages are 0.5%. > > - return (dirty * 100) > (full * nr_pages); > + return (dirty * 100) >= (full * nr_pages); Thank you, > } > > /* > -- > 2.42.0 >
On Wed, 27 Dec 2023 07:57:08 +0900 Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote: > On Tue, 26 Dec 2023 12:59:02 -0500 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > From: "Steven Rostedt (Google)" <rostedt@goodmis.org> > > > > The tracefs file "buffer_percent" is to allow user space to set a > > water-mark on how much of the tracing ring buffer needs to be filled in > > order to wake up a blocked reader. > > > > 0 - is to wait until any data is in the buffer > > 1 - is to wait for 1% of the sub buffers to be filled > > 50 - would be half of the sub buffers are filled with data > > 100 - is not to wake the waiter until the ring buffer is completely full > > > > Unfortunately the test for being full was: > > > > dirty = ring_buffer_nr_dirty_pages(buffer, cpu); > > return (dirty * 100) > (full * nr_pages); > > > > Where "full" is the value for "buffer_percent". > > > > There is two issues with the above when full == 100. > > > > 1. dirty * 100 > 100 * nr_pages will never be true > > That is, the above is basically saying that if the user sets > > buffer_percent to 100, more pages need to be dirty than exist in the > > ring buffer! > > > > 2. The page that the writer is on is never considered dirty, as dirty > > pages are only those that are full. When the writer goes to a new > > sub-buffer, it clears the contents of that sub-buffer. > > > > That is, even if the check was ">=" it would still not be equal as the > > most pages that can be considered "dirty" is nr_pages - 1. > > > > To fix this, add one to dirty and use ">=" in the compare. > > > > Cc: stable@vger.kernel.org > > Fixes: 03329f9939781 ("tracing: Add tracefs file buffer_percentage") > > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> > > --- > > kernel/trace/ring_buffer.c | 9 +++++++-- > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c > > index 83eab547f1d1..32c0dd2fd1c3 100644 > > --- a/kernel/trace/ring_buffer.c > > +++ b/kernel/trace/ring_buffer.c > > @@ -881,9 +881,14 @@ static __always_inline bool full_hit(struct trace_buffer *buffer, int cpu, int f > > if (!nr_pages || !full) > > return true; > > > > - dirty = ring_buffer_nr_dirty_pages(buffer, cpu); > > + /* > > + * Add one as dirty will never equal nr_pages, as the sub-buffer > > + * that the writer is on is not counted as dirty. > > + * This is needed if "buffer_percent" is set to 100. > > + */ > > + dirty = ring_buffer_nr_dirty_pages(buffer, cpu) + 1; > > Is this "+ 1" required? If we have 200 pages and 1 buffer is dirty, > it is 0.5% dirty. Consider @full = 1%. Yes it is required, as the comment above it states. dirty will never equal nr_pages. Without it, buffer_percent == 100 will never wake up. The +1 is to add the page the writer is on, which is never considered "dirty". > > @dirty = 1 + 1 = 2 and @dirty * 100 == 200. but > @full * @nr_pages = 1 * 200 = 200. > Thus it hits (200 >= 200 is true) even if dirty pages are 0.5%. Do we care? What's the difference if it wakes up on 2 dirty pages or 1? It would be very hard to measure the difference. But if you say 100, which means "I want to wake up when full" it will never wake up. Because it will always be nr_pages - 1. We could also say the +1 is the reader page too, because that's not counted as well. In other words, we can bike shed this to make 1% accurate (which honestly, I have no idea what the use case for that would be) or we can fix the bug that has 100% which just means, wake me up if the buffer is full, and when the writer is on the last page, it is considered full. -- Steve
On Thu, 28 Dec 2023 23:05:21 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > On Wed, 27 Dec 2023 07:57:08 +0900 > Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote: > > > On Tue, 26 Dec 2023 12:59:02 -0500 > > Steven Rostedt <rostedt@goodmis.org> wrote: > > > > > From: "Steven Rostedt (Google)" <rostedt@goodmis.org> > > > > > > The tracefs file "buffer_percent" is to allow user space to set a > > > water-mark on how much of the tracing ring buffer needs to be filled in > > > order to wake up a blocked reader. > > > > > > 0 - is to wait until any data is in the buffer > > > 1 - is to wait for 1% of the sub buffers to be filled > > > 50 - would be half of the sub buffers are filled with data > > > 100 - is not to wake the waiter until the ring buffer is completely full > > > > > > Unfortunately the test for being full was: > > > > > > dirty = ring_buffer_nr_dirty_pages(buffer, cpu); > > > return (dirty * 100) > (full * nr_pages); > > > > > > Where "full" is the value for "buffer_percent". > > > > > > There is two issues with the above when full == 100. > > > > > > 1. dirty * 100 > 100 * nr_pages will never be true > > > That is, the above is basically saying that if the user sets > > > buffer_percent to 100, more pages need to be dirty than exist in the > > > ring buffer! > > > > > > 2. The page that the writer is on is never considered dirty, as dirty > > > pages are only those that are full. When the writer goes to a new > > > sub-buffer, it clears the contents of that sub-buffer. > > > > > > That is, even if the check was ">=" it would still not be equal as the > > > most pages that can be considered "dirty" is nr_pages - 1. > > > > > > To fix this, add one to dirty and use ">=" in the compare. > > > > > > Cc: stable@vger.kernel.org > > > Fixes: 03329f9939781 ("tracing: Add tracefs file buffer_percentage") > > > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> > > > --- > > > kernel/trace/ring_buffer.c | 9 +++++++-- > > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > > > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c > > > index 83eab547f1d1..32c0dd2fd1c3 100644 > > > --- a/kernel/trace/ring_buffer.c > > > +++ b/kernel/trace/ring_buffer.c > > > @@ -881,9 +881,14 @@ static __always_inline bool full_hit(struct trace_buffer *buffer, int cpu, int f > > > if (!nr_pages || !full) > > > return true; > > > > > > - dirty = ring_buffer_nr_dirty_pages(buffer, cpu); > > > + /* > > > + * Add one as dirty will never equal nr_pages, as the sub-buffer > > > + * that the writer is on is not counted as dirty. > > > + * This is needed if "buffer_percent" is set to 100. > > > + */ > > > + dirty = ring_buffer_nr_dirty_pages(buffer, cpu) + 1; > > > > Is this "+ 1" required? If we have 200 pages and 1 buffer is dirty, > > it is 0.5% dirty. Consider @full = 1%. > > Yes it is required, as the comment above it states. dirty will never > equal nr_pages. Without it, buffer_percent == 100 will never wake up. > > The +1 is to add the page the writer is on, which is never considered > "dirty". Ah, got it. I thought the page under writing is dirty too (similar to the memory page) That's my misunderstanding. > > > > > @dirty = 1 + 1 = 2 and @dirty * 100 == 200. but > > @full * @nr_pages = 1 * 200 = 200. > > Thus it hits (200 >= 200 is true) even if dirty pages are 0.5%. > > Do we care? > > What's the difference if it wakes up on 2 dirty pages or 1? It would be > very hard to measure the difference. > > But if you say 100, which means "I want to wake up when full" it will > never wake up. Because it will always be nr_pages - 1. OK, because we count only the complete page as dirty. In that case, the number of readable page should be dirty + 1. > > We could also say the +1 is the reader page too, because that's not > counted as well. > > In other words, we can bike shed this to make 1% accurate (which > honestly, I have no idea what the use case for that would be) or we can > fix the bug that has 100% which just means, wake me up if the buffer is > full, and when the writer is on the last page, it is considered full. Got it. Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Thank you, > > -- Steve >
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 83eab547f1d1..32c0dd2fd1c3 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -881,9 +881,14 @@ static __always_inline bool full_hit(struct trace_buffer *buffer, int cpu, int f if (!nr_pages || !full) return true; - dirty = ring_buffer_nr_dirty_pages(buffer, cpu); + /* + * Add one as dirty will never equal nr_pages, as the sub-buffer + * that the writer is on is not counted as dirty. + * This is needed if "buffer_percent" is set to 100. + */ + dirty = ring_buffer_nr_dirty_pages(buffer, cpu) + 1; - return (dirty * 100) > (full * nr_pages); + return (dirty * 100) >= (full * nr_pages); } /*