
tracing: Have trace_marker writes be just half of TRACE_SEQ_SIZE

Message ID 20240304192710.4c99677c@gandalf.local.home (mailing list archive)
State Superseded
Series tracing: Have trace_marker writes be just half of TRACE_SEQ_SIZE

Commit Message

Steven Rostedt March 5, 2024, 12:27 a.m. UTC
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Since the size of trace_seq's buffer is the max an event can output, have
trace_marker writes be limited to half of the entire TRACE_SEQ_SIZE, which is
4K. That will keep writes from being dropped (though still reported) when the
meta data added to them makes the total output of the print event greater
than what the trace_seq can hold.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

Comments

Randy Dunlap March 5, 2024, 12:43 a.m. UTC | #1
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 8198bfc54b58..d68544aef65f 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -7320,6 +7320,17 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
>  	if ((ssize_t)cnt < 0)
>  		return -EINVAL;
>  
> +	/*
> +	 * TRACE_SEQ_SIZE is the total size of trace_seq buffer used
> +	 * for output. As the print event outputs more than just
> +	 * the string written, keep it smaller than the trace_seq
> +	 * as it could drop the event if the extra data makes it bigger
> +	 * than what the trace_seq can hold. Half he TRACE_SEQ_SIZE

	                                          the

> +	 * is more than enough.
> +	 */
> +	if (cnt > TRACE_SEQ_SIZE / 2)
> +		cnt = TRACE_SEQ_SIZE / 2;
> +
>  	meta_size = sizeof(*entry) + 2;  /* add '\0' and possible '\n' */
>   again:
>  	size = cnt + meta_size;
Steven Rostedt March 5, 2024, 12:49 a.m. UTC | #2
On Mon, 4 Mar 2024 16:43:46 -0800
Randy Dunlap <rdunlap@infradead.org> wrote:

> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index 8198bfc54b58..d68544aef65f 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -7320,6 +7320,17 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
> >  	if ((ssize_t)cnt < 0)
> >  		return -EINVAL;
> >  
> > +	/*
> > +	 * TRACE_SEQ_SIZE is the total size of trace_seq buffer used
> > +	 * for output. As the print event outputs more than just
> > +	 * the string written, keep it smaller than the trace_seq
> > +	 * as it could drop the event if the extra data makes it bigger
> > +	 * than what the trace_seq can hold. Half he TRACE_SEQ_SIZE  
> 
> 	                                          the

I honestly think my 't' key isn't triggering as much, at least when it comes
right before 'h', as I noticed I've been writing "he", "hey" and "here" a lot,
and spell check (obviously) isn't catching it ;-)

-- Steve


> 
> > +	 * is more than enough.
> > +	 */
> > +	if (cnt > TRACE_SEQ_SIZE / 2)
> > +		cnt = TRACE_SEQ_SIZE / 2;
> > +
> >  	meta_size = sizeof(*entry) + 2;  /* add '\0' and possible '\n' */
> >   again:
> >  	size = cnt + meta_size;  
> 
>
Mathieu Desnoyers March 5, 2024, 1:15 a.m. UTC | #3
On 2024-03-04 19:27, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> 
> Since the size of trace_seq's buffer is the max an event can output, have
> the trace_marker be half of the entire TRACE_SEQ_SIZE, which is 4K. That
> will keep writes that has meta data written from being dropped (but
> reported), because the total output of the print event is greater than
> what the trace_seq can hold.

Defining the trace_mark limit in terms of "TRACE_SEQ_SIZE / 2"
seems backwards. It's basically using a define of the maximum
buffer size for the pretty-printing output and defining the maximum
input size of a system call to half of that.

I'd rather see, in a header file shared between tracing mark
write implementation and output implementation:

#define TRACING_MARK_MAX_SIZE	4096

and then a static validation that this input fits within your
pretty printing output in the output implementation file:

BUILD_BUG_ON(TRACING_MARK_MAX_SIZE + sizeof(meta data stuff...) > TRACE_SEQ_SIZE);

This way we clearly document that the tracing mark write
input limit is 4kB, rather than something derived from
the size of an output buffer.
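
Concretely, the arrangement described above might look roughly like the
sketch below (the header name, the init-function name, and the 128-byte
figure are placeholders for illustration only, not anything that exists in
the tree):

/* hypothetical shared header, e.g. kernel/trace/trace_marker.h */
#define TRACING_MARK_MAX_SIZE	4096

/* In the output implementation.  BUILD_BUG_ON() needs function scope, so
 * the check would sit in some init function; the 128 stands in for an
 * agreed worst-case size of the pretty-printed meta data.
 */
static int __init trace_output_check_sizes(void)
{
	BUILD_BUG_ON(TRACING_MARK_MAX_SIZE + 128 > TRACE_SEQ_SIZE);
	return 0;
}

/* and tracing_mark_write() clamps against the documented limit
 * instead of deriving it from the output buffer size
 */
if (cnt > TRACING_MARK_MAX_SIZE)
	cnt = TRACING_MARK_MAX_SIZE;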

Thanks,

Mathieu

> 
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
>   kernel/trace/trace.c | 16 +++++++++++-----
>   1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 8198bfc54b58..d68544aef65f 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -7320,6 +7320,17 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
>   	if ((ssize_t)cnt < 0)
>   		return -EINVAL;
>   
> +	/*
> +	 * TRACE_SEQ_SIZE is the total size of trace_seq buffer used
> +	 * for output. As the print event outputs more than just
> +	 * the string written, keep it smaller than the trace_seq
> +	 * as it could drop the event if the extra data makes it bigger
> +	 * than what the trace_seq can hold. Half he TRACE_SEQ_SIZE
> +	 * is more than enough.
> +	 */
> +	if (cnt > TRACE_SEQ_SIZE / 2)
> +		cnt = TRACE_SEQ_SIZE / 2;
> +
>   	meta_size = sizeof(*entry) + 2;  /* add '\0' and possible '\n' */
>    again:
>   	size = cnt + meta_size;
> @@ -7328,11 +7339,6 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
>   	if (cnt < FAULTED_SIZE)
>   		size += FAULTED_SIZE - cnt;
>   
> -	if (size > TRACE_SEQ_BUFFER_SIZE) {
> -		cnt -= size - TRACE_SEQ_BUFFER_SIZE;
> -		goto again;
> -	}
> -
>   	buffer = tr->array_buffer.buffer;
>   	event = __trace_buffer_lock_reserve(buffer, TRACE_PRINT, size,
>   					    tracing_gen_ctx());
Steven Rostedt March 5, 2024, 1:35 a.m. UTC | #4
On Mon, 4 Mar 2024 20:15:57 -0500
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> On 2024-03-04 19:27, Steven Rostedt wrote:
> > From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> > 
> > Since the size of trace_seq's buffer is the max an event can output, have
> > the trace_marker be half of the entire TRACE_SEQ_SIZE, which is 4K. That
> > will keep writes that has meta data written from being dropped (but
> > reported), because the total output of the print event is greater than
> > what the trace_seq can hold.  
> 
> Defining the trace_mark limit in terms of "TRACE_SEQ_SIZE / 2"
> seems backwards. It's basically using a define of the maximum
> buffer size for the pretty-printing output and defining the maximum
> input size of a system call to half of that.
> 
> I'd rather see, in a header file shared between tracing mark
> write implementation and output implementation:
> 
> #define TRACING_MARK_MAX_SIZE	4096
> 
> and then a static validation that this input fits within your
> pretty printing output in the output implementation file:
> 
> BUILD_BUG_ON(TRACING_MARK_MAX_SIZE + sizeof(meta data stuff...) > TRACE_SEQ_SIZE);

That's not the meta size I'm worried about. The sizeof(meta data) is the
raw event binary data, which is not related to the size of the event output.

 # cd /sys/kernel/tracing
 # echo hello > trace_marker
 # cat trace
[..]
           <...>-999     [001] .....  2296.140373: tracing_mark_write: hello
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   This is the meta data that is added to trace_seq
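
For reference, the binary meta data that sizeof(*entry) accounts for is only
the print event record written into the ring buffer, roughly the layout
sketched below (simplified; the real definition is generated from
kernel/trace/trace_entries.h). The task/CPU/flags/timestamp prefix shown
above is produced later, at read time, directly into the trace_seq:

/* simplified sketch of the TRACE_PRINT ring buffer record */
struct print_entry {
	struct trace_entry	ent;	/* common fields: type, flags, pid, ... */
	unsigned long		ip;	/* address of the caller */
	char			buf[];	/* the text written to trace_marker */
};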

-- Steve



> 
> This way we clearly document that the tracing mark write
> input limit is 4kB, rather than something derived from
> the size of an output buffer.
> 
> Thanks,
> 
> Mathieu
> 
> > 
> > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> > ---
> >   kernel/trace/trace.c | 16 +++++++++++-----
> >   1 file changed, 11 insertions(+), 5 deletions(-)
> > 
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index 8198bfc54b58..d68544aef65f 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -7320,6 +7320,17 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
> >   	if ((ssize_t)cnt < 0)
> >   		return -EINVAL;
> >   
> > +	/*
> > +	 * TRACE_SEQ_SIZE is the total size of trace_seq buffer used
> > +	 * for output. As the print event outputs more than just
> > +	 * the string written, keep it smaller than the trace_seq
> > +	 * as it could drop the event if the extra data makes it bigger
> > +	 * than what the trace_seq can hold. Half he TRACE_SEQ_SIZE
> > +	 * is more than enough.
> > +	 */
> > +	if (cnt > TRACE_SEQ_SIZE / 2)
> > +		cnt = TRACE_SEQ_SIZE / 2;
> > +
> >   	meta_size = sizeof(*entry) + 2;  /* add '\0' and possible '\n' */
> >    again:
> >   	size = cnt + meta_size;
> > @@ -7328,11 +7339,6 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
> >   	if (cnt < FAULTED_SIZE)
> >   		size += FAULTED_SIZE - cnt;
> >   
> > -	if (size > TRACE_SEQ_BUFFER_SIZE) {
> > -		cnt -= size - TRACE_SEQ_BUFFER_SIZE;
> > -		goto again;
> > -	}
> > -
> >   	buffer = tr->array_buffer.buffer;
> >   	event = __trace_buffer_lock_reserve(buffer, TRACE_PRINT, size,
> >   					    tracing_gen_ctx());  
>
Mathieu Desnoyers March 5, 2024, 1:36 a.m. UTC | #5
On 2024-03-04 20:35, Steven Rostedt wrote:
> On Mon, 4 Mar 2024 20:15:57 -0500
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> On 2024-03-04 19:27, Steven Rostedt wrote:
>>> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
>>>
>>> Since the size of trace_seq's buffer is the max an event can output, have
>>> the trace_marker be half of the entire TRACE_SEQ_SIZE, which is 4K. That
>>> will keep writes that has meta data written from being dropped (but
>>> reported), because the total output of the print event is greater than
>>> what the trace_seq can hold.
>>
>> Defining the trace_mark limit in terms of "TRACE_SEQ_SIZE / 2"
>> seems backwards. It's basically using a define of the maximum
>> buffer size for the pretty-printing output and defining the maximum
>> input size of a system call to half of that.
>>
>> I'd rather see, in a header file shared between tracing mark
>> write implementation and output implementation:
>>
>> #define TRACING_MARK_MAX_SIZE	4096
>>
>> and then a static validation that this input fits within your
>> pretty printing output in the output implementation file:
>>
>> BUILD_BUG_ON(TRACING_MARK_MAX_SIZE + sizeof(meta data stuff...) > TRACE_SEQ_SIZE);
> 
> That's not the meta size I'm worried about. The sizeof(meta data) is the
> raw event binary data, which is not related to the size of the event output.
> 
>   # cd /sys/kernel/tracing
>   # echo hello > trace_marker
>   # cat trace
> [..]
>             <...>-999     [001] .....  2296.140373: tracing_mark_write: hello
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>     This is the meta data that is added to trace_seq

If this header has a known well-defined upper-limit length, then use
that in the BUILD_BUG_ON().

Thanks,

Mathieu

> 
> -- Steve
> -- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Steven Rostedt March 5, 2024, 1:41 a.m. UTC | #6
On Mon, 4 Mar 2024 20:35:16 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> > BUILD_BUG_ON(TRACING_MARK_MAX_SIZE + sizeof(meta data stuff...) > TRACE_SEQ_SIZE);  
> 
> That's not the meta size I'm worried about. The sizeof(meta data) is the
> raw event binary data, which is not related to the size of the event output.
> 
>  # cd /sys/kernel/tracing
>  # echo hello > trace_marker
>  # cat trace
> [..]
>            <...>-999     [001] .....  2296.140373: tracing_mark_write: hello
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>    This is the meta data that is added to trace_seq

That said, the meta data is most likely not going to be more than 128 bytes
(it shouldn't be more than 80).

I could do as you suggest and create a separate TRACE_MARKER_MAX_SIZE and just
make sure that it's at least 128 bytes less than TRACE_SEQ_BUFFER_SIZE (as
that's the size of the content).

/* Added meta data should not be more than 128 bytes */
BUILD_BUG_ON((TRACE_MARKER_MAX_SIZE + 128) > TRACE_SEQ_BUFFER_SIZE);
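
Put together, that variant might look roughly like this (the 4096 value and
the 128-byte margin are just the numbers floated above, nothing settled):

/* e.g. in kernel/trace/trace.h */
#define TRACE_MARKER_MAX_SIZE	4096

/* in tracing_mark_write(), replacing the TRACE_SEQ_SIZE / 2 clamp */
if (cnt > TRACE_MARKER_MAX_SIZE)
	cnt = TRACE_MARKER_MAX_SIZE;

/* in the output code, at function scope: added meta data should not
 * be more than 128 bytes
 */
BUILD_BUG_ON((TRACE_MARKER_MAX_SIZE + 128) > TRACE_SEQ_BUFFER_SIZE);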

-- Steve
Mathieu Desnoyers March 5, 2024, 1:42 a.m. UTC | #7
On 2024-03-04 20:41, Steven Rostedt wrote:
> On Mon, 4 Mar 2024 20:35:16 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
>>> BUILD_BUG_ON(TRACING_MARK_MAX_SIZE + sizeof(meta data stuff...) > TRACE_SEQ_SIZE);
>>
>> That's not the meta size I'm worried about. The sizeof(meta data) is the
>> raw event binary data, which is not related to the size of the event output.
>>
>>   # cd /sys/kernel/tracing
>>   # echo hello > trace_marker
>>   # cat trace
>> [..]
>>             <...>-999     [001] .....  2296.140373: tracing_mark_write: hello
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>     This is the meta data that is added to trace_seq
> 
> That said, the meta data is most likely not going to be more than 128 bytes
> (it shouldn't be more than 80).
> 
> I could do as you suggest and create a separate TRACE_MARKER_SIZE and just
> make sure that it's less than TRACE_SEQ_BUFFER_SIZE (as that's the size of
> the content) by 128 bytes.
> 
> /* Added meta data should not be more than 128 bytes */
> BUILD_BUG_ON((TRACE_MARKER_MAX_SIZE + 128) > TRACE_SEQ_BUFFER_SIZE);

Bonus points if you add

#define TRACE_OUTPUT_META_DATA_MAX_LEN		80

and a runtime check in the code generating this header.

This would avoid adding an unchecked upper limit.

Thanks,

Mathieu

> 
> -- Steve
Steven Rostedt March 5, 2024, 1:57 a.m. UTC | #8
On Mon, 4 Mar 2024 20:36:28 -0500
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> >             <...>-999     [001] .....  2296.140373: tracing_mark_write: hello
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >     This is the meta data that is added to trace_seq  
> 
> If this header has a known well-defined upper-limit length, then use
> that in the BUILD_BUG_ON().

Unfortunately there's no set limit. It's built up by different callbacks
and such. The output can be changed by options set by the user and even by
tracers, like the function graph tracer:

# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 1)               |  /* hello */


But the worst that will happen if it overflows is that the event is
replaced with:

             <...>-999     [001] .....  2296.140373: [LINE TOO BIG]

But this has never happened outside of development. I guess you could
trigger it if you add a trace_printk() that has a string bigger than
TRACE_SEQ_BUFFER_SIZE. But as Linus says, "Don't do stupid things" ;-)

But in reality, with all the options and everything, I've never seen the
appended text more than 80 bytes (and probably much less).

-- Steve
Steven Rostedt March 5, 2024, 1:59 a.m. UTC | #9
On Mon, 4 Mar 2024 20:42:39 -0500
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> #define TRACE_OUTPUT_META_DATA_MAX_LEN		80
> 
> and a runtime check in the code generating this header.
> 
> This would avoid adding an unchecked upper limit.

That would be a lot of complex code for debugging something that
has never happened in the past 16 years, and Linus has already reprimanded
me for doing such things.

-- Steve
Mathieu Desnoyers March 5, 2024, 2:18 a.m. UTC | #10
On 2024-03-04 20:59, Steven Rostedt wrote:
> On Mon, 4 Mar 2024 20:42:39 -0500
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> #define TRACE_OUTPUT_META_DATA_MAX_LEN		80
>>
>> and a runtime check in the code generating this header.
>>
>> This would avoid adding an unchecked upper limit.
> 
> That would be a lot of complex code that is for debugging something that
> has never happened in the past 16 years and Linus has already reprimanded
> me on doing such things.

Is it more complex than remembering the iterator position in
print_trace_fmt() right before:

         if (tr->trace_flags & TRACE_ITER_CONTEXT_INFO) {
                 if (iter->iter_flags & TRACE_FILE_LAT_FMT)
                         trace_print_lat_context(iter);
                 else
                         trace_print_context(iter);
         }

and then checking just after that the offset is not beyond 128
bytes? Perhaps there is even something internal to "iter"
that already knows about this offset (?).
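
One minimal way to express that runtime check, assuming trace_seq_used() as
the way to read the current output offset (a sketch only; the
TRACE_OUTPUT_META_DATA_MAX_LEN define is the hypothetical limit from earlier
in the thread):

	int before = trace_seq_used(&iter->seq);

	if (tr->trace_flags & TRACE_ITER_CONTEXT_INFO) {
		if (iter->iter_flags & TRACE_FILE_LAT_FMT)
			trace_print_lat_context(iter);
		else
			trace_print_context(iter);
	}

	/* catch any context prefix that grows past its assumed bound */
	WARN_ONCE(trace_seq_used(&iter->seq) - before >
		  TRACE_OUTPUT_META_DATA_MAX_LEN,
		  "trace output meta data exceeded its assumed upper bound");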

This would catch any context going beyond its planned upper
limit early. Failing early and not just in rare corner cases
is good.

And it's not for debugging, it's for validation of assumptions
made about an upper bound limit defined for a compile-time
check, so as the code evolves issues are caught early.

Thanks,

Mathieu
Steven Rostedt March 5, 2024, 2:35 a.m. UTC | #11
On Mon, 4 Mar 2024 21:18:13 -0500
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> On 2024-03-04 20:59, Steven Rostedt wrote:
> > On Mon, 4 Mar 2024 20:42:39 -0500
> > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> >   
> >> #define TRACE_OUTPUT_META_DATA_MAX_LEN		80
> >>
> >> and a runtime check in the code generating this header.
> >>
> >> This would avoid adding an unchecked upper limit.  
> > 
> > That would be a lot of complex code that is for debugging something that
> > has never happened in the past 16 years and Linus has already reprimanded
> > me on doing such things.  
> 
> Is it more complex than remembering the iterator position in
> print_trace_fmt() right before:
> 
>          if (tr->trace_flags & TRACE_ITER_CONTEXT_INFO) {
>                  if (iter->iter_flags & TRACE_FILE_LAT_FMT)
>                          trace_print_lat_context(iter);
>                  else
>                          trace_print_context(iter);
>          }

You forgot all of these:

	if (iter->ent->type == TRACE_BPUTS &&
			trace_flags & TRACE_ITER_PRINTK &&
			trace_flags & TRACE_ITER_PRINTK_MSGONLY)
		return trace_print_bputs_msg_only(iter);

	if (iter->ent->type == TRACE_BPRINT &&
			trace_flags & TRACE_ITER_PRINTK &&
			trace_flags & TRACE_ITER_PRINTK_MSGONLY)
		return trace_print_bprintk_msg_only(iter);

	if (iter->ent->type == TRACE_PRINT &&
			trace_flags & TRACE_ITER_PRINTK &&
			trace_flags & TRACE_ITER_PRINTK_MSGONLY)
		return trace_print_printk_msg_only(iter);

	if (trace_flags & TRACE_ITER_BIN)
		return print_bin_fmt(iter);

	if (trace_flags & TRACE_ITER_HEX)
		return print_hex_fmt(iter);

	if (trace_flags & TRACE_ITER_RAW)
		return print_raw_fmt(iter);

	return print_trace_fmt(iter);

> 
> and then checking just after that the offset is not beyond 128
> bytes ? Perhaps there is even something internal to "iter"
> that already knows about this offset (?).
> 
> This would catch any context going beyond its planned upper
> limit early. Failing early and not just in rare corner cases
> is good.
> 
> And it's not for debugging, it's for validation of assumptions
> made about an upper bound limit defined for a compile-time
> check, so as the code evolves issues are caught early.

validating is debugging.

Seriously Mathieu. Why do we need that? The BUILD_BUG_ON() itself is
probably not even needed. Why do all this just to prevent an unlikely
situation of an event being dropped from printing?

It's not even lost (unless they are using trace_pipe, which very few people
use). If you see a "LINE TOO BIG" you can run:

  # trace-cmd extract
  # trace-cmd report

Which will pull out the raw data, where the kernel trace_seq doesn't matter,
and trace-cmd will handle it just fine (its trace_seq is dynamically
allocated and can increase in size when needed).

-- Steve
Steven Rostedt March 5, 2024, 2:37 a.m. UTC | #12
On Mon, 4 Mar 2024 21:35:38 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> > And it's not for debugging, it's for validation of assumptions
> > made about an upper bound limit defined for a compile-time
> > check, so as the code evolves issues are caught early.  
> 
> validating is debugging.

Did Linus put you up to this? To test me to see if I'm learning how to say "No" ;-)

-- Steve
Mathieu Desnoyers March 5, 2024, 2:48 a.m. UTC | #13
On 2024-03-04 21:37, Steven Rostedt wrote:
> On Mon, 4 Mar 2024 21:35:38 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
>>> And it's not for debugging, it's for validation of assumptions
>>> made about an upper bound limit defined for a compile-time
>>> check, so as the code evolves issues are caught early.
>>
>> validating is debugging.
> 
> Did Linus put you up to this? To test me to see if I'm learning how to say "No" ;-)

No, he did not. I genuinely think that validating size limits like
this either at compile time or, when they can vary at runtime like
in this case, with a dynamic check, decreases the cognitive
load on the reviewers. We can then assume that whatever limit
was put in place is actually enforced and not just wishful
thinking.

If the "header" size upper bound is not validated at runtime, there
is not much point in adding the BUILD_BUG_ON() based on that value
in the first place, and you should then just add a runtime check that
you don't overflow the output buffer before writing the output to it.

Thanks,

Mathieu
Steven Rostedt March 5, 2024, 3:07 a.m. UTC | #14
On Mon, 4 Mar 2024 21:48:44 -0500
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> On 2024-03-04 21:37, Steven Rostedt wrote:
> > On Mon, 4 Mar 2024 21:35:38 -0500
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> >   
> >>> And it's not for debugging, it's for validation of assumptions
> >>> made about an upper bound limit defined for a compile-time
> >>> check, so as the code evolves issues are caught early.  
> >>
> >> validating is debugging.  
> > 
> > Did Linus put you up to this? To test me to see if I'm learning how to say "No" ;-)  
> 
> No, he did not. 

I was being facetious.

> I genuinely think that validating size limits like
> this either at compile time or, when they can vary at runtime like
> in this case, with a dynamic check, decreases the cognitive
> load on the reviewers. We can then assume that whatever limit
> was put in place is actually enforced and not just wishful
> thinking.

I'm for that too. But the purpose of trace_seq was to be able to safely
append strings on top of each other. It was written specifically for the
trace file output. It should be long enough to print the entire string. If
the trace_seq overflows, it will not add any more content. But this is
checked at the end to see if everything did fit.
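
As a rough illustration of that pattern (using the helpers from
include/linux/trace_seq.h; the real call sites in the output code are more
involved than this):

	/* each helper quietly stops appending once the buffer is full,
	 * so the caller only checks one overflow flag at the end
	 */
	static bool example_render(struct trace_seq *s)
	{
		trace_seq_printf(s, "%16s-%-7d [%03d] ", "<...>", 999, 1);
		trace_seq_puts(s, "tracing_mark_write: hello\n");

		/* false means the line did not fit; the read side then
		 * shows "[LINE TOO BIG]" instead of a truncated event
		 */
		return !trace_seq_has_overflowed(s);
	}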

I had the "half" size logic because it was still way more than enough to
hold the write and the header, where the header should never be that big.

It's not much different than vsnprintf() having a 32K precision field that
warns if you go over it. If a header is 128 bytes then it is really a
failure in output, as it will cause each line to overflow a normal 80
character terminal screen before it even gets to the content of the event.

Such a header is stupid, and this is where I'm starting to understand Linus:
we don't need to write a bunch of debug code to make sure we don't
have some huge header just because it may keep the content of the event
from being visible.

Oh! and yes, there are events that can be large (stack traces and such). If
a header is created to be that big, you will likely drop actual real
events that have nothing to do with trace_marker.

> 
> If the "header" size upper bound is not validated at runtime, there
> is not much point in adding the BUILD_BUG_ON() based on that value
> in the first place, and you should then just add a runtime check that
> you don't overflow the output buffer before writing the output to it.


OK, then I guess we don't need the checks. 4K for a trace_marker limit and
8K for the trace_seq that will eventually hold its content is a pretty
simple concept. Do we need a bunch of logic to keep it from breaking?
Especially since trace_marker is used a lot in testing the tracing code
itself.

-- Steve

Patch

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 8198bfc54b58..d68544aef65f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7320,6 +7320,17 @@  tracing_mark_write(struct file *filp, const char __user *ubuf,
 	if ((ssize_t)cnt < 0)
 		return -EINVAL;
 
+	/*
+	 * TRACE_SEQ_SIZE is the total size of trace_seq buffer used
+	 * for output. As the print event outputs more than just
+	 * the string written, keep it smaller than the trace_seq
+	 * as it could drop the event if the extra data makes it bigger
+	 * than what the trace_seq can hold. Half he TRACE_SEQ_SIZE
+	 * is more than enough.
+	 */
+	if (cnt > TRACE_SEQ_SIZE / 2)
+		cnt = TRACE_SEQ_SIZE / 2;
+
 	meta_size = sizeof(*entry) + 2;  /* add '\0' and possible '\n' */
  again:
 	size = cnt + meta_size;
@@ -7328,11 +7339,6 @@  tracing_mark_write(struct file *filp, const char __user *ubuf,
 	if (cnt < FAULTED_SIZE)
 		size += FAULTED_SIZE - cnt;
 
-	if (size > TRACE_SEQ_BUFFER_SIZE) {
-		cnt -= size - TRACE_SEQ_BUFFER_SIZE;
-		goto again;
-	}
-
 	buffer = tr->array_buffer.buffer;
 	event = __trace_buffer_lock_reserve(buffer, TRACE_PRINT, size,
 					    tracing_gen_ctx());