Message ID | 20240304192710.4c99677c@gandalf.local.home (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | tracing: Have trace_marker writes be just half of TRACE_SEQ_SIZE | expand |
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c > index 8198bfc54b58..d68544aef65f 100644 > --- a/kernel/trace/trace.c > +++ b/kernel/trace/trace.c > @@ -7320,6 +7320,17 @@ tracing_mark_write(struct file *filp, const char __user *ubuf, > if ((ssize_t)cnt < 0) > return -EINVAL; > > + /* > + * TRACE_SEQ_SIZE is the total size of trace_seq buffer used > + * for output. As the print event outputs more than just > + * the string written, keep it smaller than the trace_seq > + * as it could drop the event if the extra data makes it bigger > + * than what the trace_seq can hold. Half he TRACE_SEQ_SIZE the > + * is more than enough. > + */ > + if (cnt > TRACE_SEQ_SIZE / 2) > + cnt = TRACE_SEQ_SIZE / 2; > + > meta_size = sizeof(*entry) + 2; /* add '\0' and possible '\n' */ > again: > size = cnt + meta_size;
On Mon, 4 Mar 2024 16:43:46 -0800 Randy Dunlap <rdunlap@infradead.org> wrote: > > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c > > index 8198bfc54b58..d68544aef65f 100644 > > --- a/kernel/trace/trace.c > > +++ b/kernel/trace/trace.c > > @@ -7320,6 +7320,17 @@ tracing_mark_write(struct file *filp, const char __user *ubuf, > > if ((ssize_t)cnt < 0) > > return -EINVAL; > > > > + /* > > + * TRACE_SEQ_SIZE is the total size of trace_seq buffer used > > + * for output. As the print event outputs more than just > > + * the string written, keep it smaller than the trace_seq > > + * as it could drop the event if the extra data makes it bigger > > + * than what the trace_seq can hold. Half he TRACE_SEQ_SIZE > > the I honestly think my 't' key isn't triggering as much. At least when before hitting 'h', as I noticed I've been writing "he", "hey" and "here" a lot, and spell check isn't (obviously) catching it ;-) -- Steve > > > + * is more than enough. > > + */ > > + if (cnt > TRACE_SEQ_SIZE / 2) > > + cnt = TRACE_SEQ_SIZE / 2; > > + > > meta_size = sizeof(*entry) + 2; /* add '\0' and possible '\n' */ > > again: > > size = cnt + meta_size; > >
On 2024-03-04 19:27, Steven Rostedt wrote: > From: "Steven Rostedt (Google)" <rostedt@goodmis.org> > > Since the size of trace_seq's buffer is the max an event can output, have > the trace_marker be half of the entire TRACE_SEQ_SIZE, which is 4K. That > will keep writes that has meta data written from being dropped (but > reported), because the total output of the print event is greater than > what the trace_seq can hold. Defining the trace_mark limit in terms of "TRACE_SEQ_SIZE / 2" seems backwards. It's basically using a define of the maximum buffer size for the pretty-printing output and defining the maximum input size of a system call to half of that. I'd rather see, in a header file shared between tracing mark write implementation and output implementation: #define TRACING_MARK_MAX_SIZE 4096 and then a static validation that this input fits within your pretty printing output in the output implementation file: BUILD_BUG_ON(TRACING_MARK_MAX_SIZE + sizeof(meta data stuff...) > TRACE_SEQ_SIZE); This way we clearly document that the tracing mark write input limit is 4kB, rather than something derived from the size of an output buffer. Thanks, Mathieu > > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> > --- > kernel/trace/trace.c | 16 +++++++++++----- > 1 file changed, 11 insertions(+), 5 deletions(-) > > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c > index 8198bfc54b58..d68544aef65f 100644 > --- a/kernel/trace/trace.c > +++ b/kernel/trace/trace.c > @@ -7320,6 +7320,17 @@ tracing_mark_write(struct file *filp, const char __user *ubuf, > if ((ssize_t)cnt < 0) > return -EINVAL; > > + /* > + * TRACE_SEQ_SIZE is the total size of trace_seq buffer used > + * for output. As the print event outputs more than just > + * the string written, keep it smaller than the trace_seq > + * as it could drop the event if the extra data makes it bigger > + * than what the trace_seq can hold. Half he TRACE_SEQ_SIZE > + * is more than enough. > + */ > + if (cnt > TRACE_SEQ_SIZE / 2) > + cnt = TRACE_SEQ_SIZE / 2; > + > meta_size = sizeof(*entry) + 2; /* add '\0' and possible '\n' */ > again: > size = cnt + meta_size; > @@ -7328,11 +7339,6 @@ tracing_mark_write(struct file *filp, const char __user *ubuf, > if (cnt < FAULTED_SIZE) > size += FAULTED_SIZE - cnt; > > - if (size > TRACE_SEQ_BUFFER_SIZE) { > - cnt -= size - TRACE_SEQ_BUFFER_SIZE; > - goto again; > - } > - > buffer = tr->array_buffer.buffer; > event = __trace_buffer_lock_reserve(buffer, TRACE_PRINT, size, > tracing_gen_ctx());
On Mon, 4 Mar 2024 20:15:57 -0500 Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > On 2024-03-04 19:27, Steven Rostedt wrote: > > From: "Steven Rostedt (Google)" <rostedt@goodmis.org> > > > > Since the size of trace_seq's buffer is the max an event can output, have > > the trace_marker be half of the entire TRACE_SEQ_SIZE, which is 4K. That > > will keep writes that has meta data written from being dropped (but > > reported), because the total output of the print event is greater than > > what the trace_seq can hold. > > Defining the trace_mark limit in terms of "TRACE_SEQ_SIZE / 2" > seems backwards. It's basically using a define of the maximum > buffer size for the pretty-printing output and defining the maximum > input size of a system call to half of that. > > I'd rather see, in a header file shared between tracing mark > write implementation and output implementation: > > #define TRACING_MARK_MAX_SIZE 4096 > > and then a static validation that this input fits within your > pretty printing output in the output implementation file: > > BUILD_BUG_ON(TRACING_MARK_MAX_SIZE + sizeof(meta data stuff...) > TRACE_SEQ_SIZE); That's not the meta size I'm worried about. The sizeof(meta data) is the raw event binary data, which is not related to the size of the event output. # cd /sys/kernel/tracing # echo hello > trace_marker # cat trace [..] <...>-999 [001] ..... 2296.140373: tracing_mark_write: hello ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This is the meta data that is added to trace_seq -- Steve > > This way we clearly document that the tracing mark write > input limit is 4kB, rather than something derived from > the size of an output buffer. > > Thanks, > > Mathieu > > > > > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> > > --- > > kernel/trace/trace.c | 16 +++++++++++----- > > 1 file changed, 11 insertions(+), 5 deletions(-) > > > > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c > > index 8198bfc54b58..d68544aef65f 100644 > > --- a/kernel/trace/trace.c > > +++ b/kernel/trace/trace.c > > @@ -7320,6 +7320,17 @@ tracing_mark_write(struct file *filp, const char __user *ubuf, > > if ((ssize_t)cnt < 0) > > return -EINVAL; > > > > + /* > > + * TRACE_SEQ_SIZE is the total size of trace_seq buffer used > > + * for output. As the print event outputs more than just > > + * the string written, keep it smaller than the trace_seq > > + * as it could drop the event if the extra data makes it bigger > > + * than what the trace_seq can hold. Half he TRACE_SEQ_SIZE > > + * is more than enough. > > + */ > > + if (cnt > TRACE_SEQ_SIZE / 2) > > + cnt = TRACE_SEQ_SIZE / 2; > > + > > meta_size = sizeof(*entry) + 2; /* add '\0' and possible '\n' */ > > again: > > size = cnt + meta_size; > > @@ -7328,11 +7339,6 @@ tracing_mark_write(struct file *filp, const char __user *ubuf, > > if (cnt < FAULTED_SIZE) > > size += FAULTED_SIZE - cnt; > > > > - if (size > TRACE_SEQ_BUFFER_SIZE) { > > - cnt -= size - TRACE_SEQ_BUFFER_SIZE; > > - goto again; > > - } > > - > > buffer = tr->array_buffer.buffer; > > event = __trace_buffer_lock_reserve(buffer, TRACE_PRINT, size, > > tracing_gen_ctx()); >
On 2024-03-04 20:35, Steven Rostedt wrote: > On Mon, 4 Mar 2024 20:15:57 -0500 > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > >> On 2024-03-04 19:27, Steven Rostedt wrote: >>> From: "Steven Rostedt (Google)" <rostedt@goodmis.org> >>> >>> Since the size of trace_seq's buffer is the max an event can output, have >>> the trace_marker be half of the entire TRACE_SEQ_SIZE, which is 4K. That >>> will keep writes that has meta data written from being dropped (but >>> reported), because the total output of the print event is greater than >>> what the trace_seq can hold. >> >> Defining the trace_mark limit in terms of "TRACE_SEQ_SIZE / 2" >> seems backwards. It's basically using a define of the maximum >> buffer size for the pretty-printing output and defining the maximum >> input size of a system call to half of that. >> >> I'd rather see, in a header file shared between tracing mark >> write implementation and output implementation: >> >> #define TRACING_MARK_MAX_SIZE 4096 >> >> and then a static validation that this input fits within your >> pretty printing output in the output implementation file: >> >> BUILD_BUG_ON(TRACING_MARK_MAX_SIZE + sizeof(meta data stuff...) > TRACE_SEQ_SIZE); > > That's not the meta size I'm worried about. The sizeof(meta data) is the > raw event binary data, which is not related to the size of the event output. > > # cd /sys/kernel/tracing > # echo hello > trace_marker > # cat trace > [..] > <...>-999 [001] ..... 2296.140373: tracing_mark_write: hello > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > This is the meta data that is added to trace_seq If this header has a known well-defined upper-limit length, then use that in the BUILD_BUG_ON(). Thanks, Mathieu > > -- Steve > -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
On Mon, 4 Mar 2024 20:35:16 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > > BUILD_BUG_ON(TRACING_MARK_MAX_SIZE + sizeof(meta data stuff...) > TRACE_SEQ_SIZE); > > That's not the meta size I'm worried about. The sizeof(meta data) is the > raw event binary data, which is not related to the size of the event output. > > # cd /sys/kernel/tracing > # echo hello > trace_marker > # cat trace > [..] > <...>-999 [001] ..... 2296.140373: tracing_mark_write: hello > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > This is the meta data that is added to trace_seq That said, the meta data is most likely not going to be more than 128 bytes (it shouldn't be more than 80). I could do as you suggest and create a separate TRACE_MARKER_SIZE and just make sure that it's less than TRACE_SEQ_BUFFER_SIZE (as that's the size of the content) by 128 bytes. /* Added meta data should not be more than 128 bytes */ BUILD_BUG_ON((TRACE_MARKER_MAX_SIZE + 128) > TRACE_SEQ_BUFFER_SIZE); -- Steve
On 2024-03-04 20:41, Steven Rostedt wrote: > On Mon, 4 Mar 2024 20:35:16 -0500 > Steven Rostedt <rostedt@goodmis.org> wrote: > >>> BUILD_BUG_ON(TRACING_MARK_MAX_SIZE + sizeof(meta data stuff...) > TRACE_SEQ_SIZE); >> >> That's not the meta size I'm worried about. The sizeof(meta data) is the >> raw event binary data, which is not related to the size of the event output. >> >> # cd /sys/kernel/tracing >> # echo hello > trace_marker >> # cat trace >> [..] >> <...>-999 [001] ..... 2296.140373: tracing_mark_write: hello >> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ >> This is the meta data that is added to trace_seq > > That said, the meta data is most likely not going to be more than 128 bytes > (it shouldn't be more than 80). > > I could do as you suggest and create a separate TRACE_MARKER_SIZE and just > make sure that it's less than TRACE_SEQ_BUFFER_SIZE (as that's the size of > the content) by 128 bytes. > > /* Added meta data should not be more than 128 bytes */ > BUILD_BUG_ON((TRACE_MARKER_MAX_SIZE + 128) > TRACE_SEQ_BUFFER_SIZE); Bonus points if you add #define TRACE_OUTPUT_META_DATA_MAX_LEN 80 and a runtime check in the code generating this header. This would avoid adding an unchecked upper limit. Thanks, Mathieu > > -- Steve
On Mon, 4 Mar 2024 20:36:28 -0500 Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > <...>-999 [001] ..... 2296.140373: tracing_mark_write: hello > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > This is the meta data that is added to trace_seq > > If this header has a known well-defined upper-limit length, then use > that in the BUILD_BUG_ON(). Unfortunately there's no set limit. It's built up by different callbacks and such. The output can be changed by options set by the user and even by tracers, like the function graph tracer: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) | /* hello */ But the worse that will happen if it overflows is that the event is replaced with: <...>-999 [001] ..... 2296.140373: [LINE TOO BIG] But this has never happened outside of development. I guess you could trigger it if you add a trace_printk() that has a string bigger than TRACE_SEQ_BUFFER_SIZE. But as Linus says, "Don't do stupid things" ;-) But in reality, with all the options and everything, I've never seen the appended text more than 80 bytes (and probably much less). -- Steve
On Mon, 4 Mar 2024 20:42:39 -0500 Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > #define TRACE_OUTPUT_META_DATA_MAX_LEN 80 > > and a runtime check in the code generating this header. > > This would avoid adding an unchecked upper limit. That would be a lot of complex code that is for debugging something that has never happened in the past 16 years and Linus has already reprimanded me on doing such things. -- Steve
On 2024-03-04 20:59, Steven Rostedt wrote: > On Mon, 4 Mar 2024 20:42:39 -0500 > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > >> #define TRACE_OUTPUT_META_DATA_MAX_LEN 80 >> >> and a runtime check in the code generating this header. >> >> This would avoid adding an unchecked upper limit. > > That would be a lot of complex code that is for debugging something that > has never happened in the past 16 years and Linus has already reprimanded > me on doing such things. Is it more complex than remembering the iterator position in print_trace_fmt() right before: if (tr->trace_flags & TRACE_ITER_CONTEXT_INFO) { if (iter->iter_flags & TRACE_FILE_LAT_FMT) trace_print_lat_context(iter); else trace_print_context(iter); } and then checking just after that the offset is not beyond 128 bytes ? Perhaps there is even something internal to "iter" that already knows about this offset (?). This would catch any context going beyond its planned upper limit early. Failing early and not just in rare corner cases is good. And it's not for debugging, it's for validation of assumptions made about an upper bound limit defined for a compile-time check, so as the code evolves issues are caught early. Thanks, Mathieu
On Mon, 4 Mar 2024 21:18:13 -0500 Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > On 2024-03-04 20:59, Steven Rostedt wrote: > > On Mon, 4 Mar 2024 20:42:39 -0500 > > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > > >> #define TRACE_OUTPUT_META_DATA_MAX_LEN 80 > >> > >> and a runtime check in the code generating this header. > >> > >> This would avoid adding an unchecked upper limit. > > > > That would be a lot of complex code that is for debugging something that > > has never happened in the past 16 years and Linus has already reprimanded > > me on doing such things. > > Is it more complex than remembering the iterator position in > print_trace_fmt() right before: > > if (tr->trace_flags & TRACE_ITER_CONTEXT_INFO) { > if (iter->iter_flags & TRACE_FILE_LAT_FMT) > trace_print_lat_context(iter); > else > trace_print_context(iter); > } You forgot all of theses: if (iter->ent->type == TRACE_BPUTS && trace_flags & TRACE_ITER_PRINTK && trace_flags & TRACE_ITER_PRINTK_MSGONLY) return trace_print_bputs_msg_only(iter); if (iter->ent->type == TRACE_BPRINT && trace_flags & TRACE_ITER_PRINTK && trace_flags & TRACE_ITER_PRINTK_MSGONLY) return trace_print_bprintk_msg_only(iter); if (iter->ent->type == TRACE_PRINT && trace_flags & TRACE_ITER_PRINTK && trace_flags & TRACE_ITER_PRINTK_MSGONLY) return trace_print_printk_msg_only(iter); if (trace_flags & TRACE_ITER_BIN) return print_bin_fmt(iter); if (trace_flags & TRACE_ITER_HEX) return print_hex_fmt(iter); if (trace_flags & TRACE_ITER_RAW) return print_raw_fmt(iter); return print_trace_fmt(iter); > > and then checking just after that the offset is not beyond 128 > bytes ? Perhaps there is even something internal to "iter" > that already knows about this offset (?). > > This would catch any context going beyond its planned upper > limit early. Failing early and not just in rare corner cases > is good. > > And it's not for debugging, it's for validation of assumptions > made about an upper bound limit defined for a compile-time > check, so as the code evolves issues are caught early. validating is debugging. Seriously Mathieu. Why do we need that? The BUILD_BUG_ON() itself is probably not even needed. Why do all this just to prevent an unlikely situation of an event being dropped from printing? It's not even lost (unless they are using trace_pipe, which very few people use). If you see a "LINE TOO BIG" you can run: # trace-cmd extract # trace-cmd report Which will pull out the raw data where the kernel trace_seq doesn't matter and trace_cmd will handle it just fine (its trace_seq is dynamically allocated and can increase in size when needed). -- Steve
On Mon, 4 Mar 2024 21:35:38 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > > And it's not for debugging, it's for validation of assumptions > > made about an upper bound limit defined for a compile-time > > check, so as the code evolves issues are caught early. > > validating is debugging. Did Linus put you up to this? To test me to see if I'm learning how to say "No" ;-) -- Steve
On 2024-03-04 21:37, Steven Rostedt wrote: > On Mon, 4 Mar 2024 21:35:38 -0500 > Steven Rostedt <rostedt@goodmis.org> wrote: > >>> And it's not for debugging, it's for validation of assumptions >>> made about an upper bound limit defined for a compile-time >>> check, so as the code evolves issues are caught early. >> >> validating is debugging. > > Did Linus put you up to this? To test me to see if I'm learning how to say "No" ;-) No, he did not. I genuinely think that validating size limits like this either at compile time or, when they can vary at runtime like in this case, with a dynamic check, decreases the cognitive load on the reviewers. We can then assume that whatever limit was put in place is actually enforced and not just wishful thinking. If the "header" size upper bound is not validated at runtime, there is not much point in adding the BUILD_BUG_ON() based on that value in the first place, and you should then just add a runtime check that you don't overflow the output buffer before writing the output to it. Thanks, Mathieu
On Mon, 4 Mar 2024 21:48:44 -0500 Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > On 2024-03-04 21:37, Steven Rostedt wrote: > > On Mon, 4 Mar 2024 21:35:38 -0500 > > Steven Rostedt <rostedt@goodmis.org> wrote: > > > >>> And it's not for debugging, it's for validation of assumptions > >>> made about an upper bound limit defined for a compile-time > >>> check, so as the code evolves issues are caught early. > >> > >> validating is debugging. > > > > Did Linus put you up to this? To test me to see if I'm learning how to say "No" ;-) > > No, he did not. I was being facetious. > I genuinely think that validating size limits like > this either at compile time or, when they can vary at runtime like > in this case, with a dynamic check, decreases the cognitive > load on the reviewers. We can then assume that whatever limit > was put in place is actually enforced and not just wishful > thinking. I'm for that too. But the purpose of trace_seq was to be able to safely append strings on top of each other. It was written specifically for the trace file output. It should be long enough to print the entire string. If the trace_seq overflows, it will not add any more content. But this is checked at the end to see if everything did fit. I had the "half" size logic because it was still way more than enough to hold the write and the header, where the header should never be that big. It's not much different than vsnprintf() having a 32K precision field that warns if you go over it. If a header is 128 bytes then it is really a failure in output, as it will cause each line to overflow a normal 80 character terminal screen before it even gets to the content of the event. Such a header is stupid and this is where I'm starting to understand Linus, where we don't need to write a bunch of debug code to make sure we don't have some huge header just because it may cause the content of the event from being visible. Oh! and yes, there are events that can be large (stack traces and such). If a header is created to be that big, you will likely drop actual real events that have nothing to do with trace_marker. > > If the "header" size upper bound is not validated at runtime, there > is not much point in adding the BUILD_BUG_ON() based on that value > in the first place, and you should then just add a runtime check that > you don't overflow the output buffer before writing the output to it. OK, then I guess we don't need the checks. 4K for a trace_marker limit and 8K for the trace_seq that will eventually hold its content, is a pretty simple concept. Do we need a bunch of logic to keep it from breaking? Especially since trace_marker is used a lot in testing the tracing code itself. -- Steve
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 8198bfc54b58..d68544aef65f 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -7320,6 +7320,17 @@ tracing_mark_write(struct file *filp, const char __user *ubuf, if ((ssize_t)cnt < 0) return -EINVAL; + /* + * TRACE_SEQ_SIZE is the total size of trace_seq buffer used + * for output. As the print event outputs more than just + * the string written, keep it smaller than the trace_seq + * as it could drop the event if the extra data makes it bigger + * than what the trace_seq can hold. Half he TRACE_SEQ_SIZE + * is more than enough. + */ + if (cnt > TRACE_SEQ_SIZE / 2) + cnt = TRACE_SEQ_SIZE / 2; + meta_size = sizeof(*entry) + 2; /* add '\0' and possible '\n' */ again: size = cnt + meta_size; @@ -7328,11 +7339,6 @@ tracing_mark_write(struct file *filp, const char __user *ubuf, if (cnt < FAULTED_SIZE) size += FAULTED_SIZE - cnt; - if (size > TRACE_SEQ_BUFFER_SIZE) { - cnt -= size - TRACE_SEQ_BUFFER_SIZE; - goto again; - } - buffer = tr->array_buffer.buffer; event = __trace_buffer_lock_reserve(buffer, TRACE_PRINT, size, tracing_gen_ctx());