Message ID | 20230801090124.8050-5-zegao@tencent.com (mailing list archive) |
---|---|
State | Handled Elsewhere |
Headers | show |
Series | add to report task state in symbolic chars from sched tracepoint | expand |
On Tue, Aug 01, 2023 at 05:01:22PM +0800, Ze Gao wrote: > Report priorities in 'short' and prev_state in 'int' to save > some buffer space. And also reorder the fields so that we take > struct alignment into consideration to make the record compact. > > Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org> I don't see a single line describing the effort you've done to audit consumers of this tracepoint. *IF* you're wanting to break this tracepoint ABI, because seriously that's what it is, then you get to invest the time and effort to audit the users.
Oops, I thought sending this series for RFC is the "effort" you mean to audit the users :/ Correct me if I'm making stupid moves here and enlighten me what I should do furthermore to audit the users. Thanks, Ze On Tue, Aug 1, 2023 at 7:47 PM Peter Zijlstra <peterz@infradead.org> wrote: > > On Tue, Aug 01, 2023 at 05:01:22PM +0800, Ze Gao wrote: > > Report priorities in 'short' and prev_state in 'int' to save > > some buffer space. And also reorder the fields so that we take > > struct alignment into consideration to make the record compact. > > > > Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org> > > I don't see a single line describing the effort you've done to audit > consumers of this tracepoint. > > *IF* you're wanting to break this tracepoint ABI, because seriously > that's what it is, then you get to invest the time and effort to audit > the users.
On Tue, 1 Aug 2023 17:01:22 +0800 Ze Gao <zegao2021@gmail.com> wrote: > Report priorities in 'short' and prev_state in 'int' to save > some buffer space. And also reorder the fields so that we take > struct alignment into consideration to make the record compact. > > Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org> > Signed-off-by: Ze Gao <zegao@tencent.com> I'd swap this patch with patch 3. That is, make the field changes first. I'd like this to get in regardless of if the state_char is accepted. We may want to get this in first to see if there's any regressions before we add a state_char. -- Steve > --- > include/trace/events/sched.h | 22 +++++++++++----------- > 1 file changed, 11 insertions(+), 11 deletions(-) > > diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h > index e507901bcab8..36863ffb00c6 100644 > --- a/include/trace/events/sched.h > +++ b/include/trace/events/sched.h > @@ -188,7 +188,7 @@ DEFINE_EVENT(sched_wakeup_template, sched_wakeup_new, > TP_ARGS(p)); > > #ifdef CREATE_TRACE_POINTS > -static inline long __trace_sched_switch_state(bool preempt, > +static inline int __trace_sched_switch_state(bool preempt, > unsigned int prev_state, > struct task_struct *p) > { > @@ -251,25 +251,25 @@ TRACE_EVENT(sched_switch, > TP_ARGS(preempt, prev, next, prev_state), > > TP_STRUCT__entry( > - __array( char, prev_comm, TASK_COMM_LEN ) > __field( pid_t, prev_pid ) > - __field( int, prev_prio ) > - __field( long, prev_state ) > - __field( char, prev_state_char ) > - __array( char, next_comm, TASK_COMM_LEN ) > __field( pid_t, next_pid ) > - __field( int, next_prio ) > + __field( short, prev_prio ) > + __field( short, next_prio ) > + __field( int, prev_state ) > + __array( char, prev_comm, TASK_COMM_LEN ) > + __array( char, next_comm, TASK_COMM_LEN ) > + __field( char, prev_state_char ) > ), > > TP_fast_assign( > - memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN); > __entry->prev_pid = prev->pid; > - __entry->prev_prio = prev->prio; > + __entry->next_pid = next->pid; > + __entry->prev_prio = (short) prev->prio; > + __entry->next_prio = (short) next->prio; > __entry->prev_state = __trace_sched_switch_state(preempt, prev_state, prev); > __entry->prev_state_char = __trace_sched_switch_state_char(preempt, prev_state, prev); > memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN); > - __entry->next_pid = next->pid; > - __entry->next_prio = next->prio; > + memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN); > /* XXX SCHED_DEADLINE */ > ), >
On Tue, 1 Aug 2023 13:46:50 +0200 Peter Zijlstra <peterz@infradead.org> wrote: > On Tue, Aug 01, 2023 at 05:01:22PM +0800, Ze Gao wrote: > > Report priorities in 'short' and prev_state in 'int' to save > > some buffer space. And also reorder the fields so that we take > > struct alignment into consideration to make the record compact. > > > > Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org> > > I don't see a single line describing the effort you've done to audit > consumers of this tracepoint. > > *IF* you're wanting to break this tracepoint ABI, because seriously > that's what it is, then you get to invest the time and effort to audit > the users. The known major users that I am aware of is raesdaemon, powertop/latencytop, perf, trace-cmd and some bpf tools. The bpf tooling is known to update per kernel. The others all use libtraceevent that can handle this change. What other tools are there? There's Perfetto, but it also looks at tracefs to examine where the values are. There's LTTng, but I believe it uses the raw tracepoint directly and doesn't look at the layout of the ftrace/perf buffers. All other tooling I am slightly aware of uses libtracefs and libtraceveent, as I've been giving many talks on how to use those libraries. -- Steve
Thanks for clarifying this ! Steven. This is really helpful. Regards, Ze On Tue, Aug 1, 2023 at 10:33 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Tue, 1 Aug 2023 13:46:50 +0200 > Peter Zijlstra <peterz@infradead.org> wrote: > > > On Tue, Aug 01, 2023 at 05:01:22PM +0800, Ze Gao wrote: > > > Report priorities in 'short' and prev_state in 'int' to save > > > some buffer space. And also reorder the fields so that we take > > > struct alignment into consideration to make the record compact. > > > > > > Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org> > > > > I don't see a single line describing the effort you've done to audit > > consumers of this tracepoint. > > > > *IF* you're wanting to break this tracepoint ABI, because seriously > > that's what it is, then you get to invest the time and effort to audit > > the users. > > The known major users that I am aware of is raesdaemon, > powertop/latencytop, perf, trace-cmd and some bpf tools. The bpf tooling is > known to update per kernel. The others all use libtraceevent that can > handle this change. > > What other tools are there? There's Perfetto, but it also looks at tracefs > to examine where the values are. There's LTTng, but I believe it uses the > raw tracepoint directly and doesn't look at the layout of the ftrace/perf > buffers. > > All other tooling I am slightly aware of uses libtracefs and libtraceveent, > as I've been giving many talks on how to use those libraries. > > -- Steve
Fair point, will do it in v4 as well. Thanks, Ze On Tue, Aug 1, 2023 at 10:16 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Tue, 1 Aug 2023 17:01:22 +0800 > Ze Gao <zegao2021@gmail.com> wrote: > > > Report priorities in 'short' and prev_state in 'int' to save > > some buffer space. And also reorder the fields so that we take > > struct alignment into consideration to make the record compact. > > > > Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org> > > Signed-off-by: Ze Gao <zegao@tencent.com> > > I'd swap this patch with patch 3. That is, make the field changes first. > I'd like this to get in regardless of if the state_char is accepted. We may > want to get this in first to see if there's any regressions before we add a > state_char. > > -- Steve > > > > --- > > include/trace/events/sched.h | 22 +++++++++++----------- > > 1 file changed, 11 insertions(+), 11 deletions(-) > > > > diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h > > index e507901bcab8..36863ffb00c6 100644 > > --- a/include/trace/events/sched.h > > +++ b/include/trace/events/sched.h > > @@ -188,7 +188,7 @@ DEFINE_EVENT(sched_wakeup_template, sched_wakeup_new, > > TP_ARGS(p)); > > > > #ifdef CREATE_TRACE_POINTS > > -static inline long __trace_sched_switch_state(bool preempt, > > +static inline int __trace_sched_switch_state(bool preempt, > > unsigned int prev_state, > > struct task_struct *p) > > { > > @@ -251,25 +251,25 @@ TRACE_EVENT(sched_switch, > > TP_ARGS(preempt, prev, next, prev_state), > > > > TP_STRUCT__entry( > > - __array( char, prev_comm, TASK_COMM_LEN ) > > __field( pid_t, prev_pid ) > > - __field( int, prev_prio ) > > - __field( long, prev_state ) > > - __field( char, prev_state_char ) > > - __array( char, next_comm, TASK_COMM_LEN ) > > __field( pid_t, next_pid ) > > - __field( int, next_prio ) > > + __field( short, prev_prio ) > > + __field( short, next_prio ) > > + __field( int, prev_state ) > > + __array( char, prev_comm, TASK_COMM_LEN ) > > + __array( char, next_comm, TASK_COMM_LEN ) > > + __field( char, prev_state_char ) > > ), > > > > TP_fast_assign( > > - memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN); > > __entry->prev_pid = prev->pid; > > - __entry->prev_prio = prev->prio; > > + __entry->next_pid = next->pid; > > + __entry->prev_prio = (short) prev->prio; > > + __entry->next_prio = (short) next->prio; > > __entry->prev_state = __trace_sched_switch_state(preempt, prev_state, prev); > > __entry->prev_state_char = __trace_sched_switch_state_char(preempt, prev_state, prev); > > memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN); > > - __entry->next_pid = next->pid; > > - __entry->next_prio = next->prio; > > + memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN); > > /* XXX SCHED_DEADLINE */ > > ), > > >
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h index e507901bcab8..36863ffb00c6 100644 --- a/include/trace/events/sched.h +++ b/include/trace/events/sched.h @@ -188,7 +188,7 @@ DEFINE_EVENT(sched_wakeup_template, sched_wakeup_new, TP_ARGS(p)); #ifdef CREATE_TRACE_POINTS -static inline long __trace_sched_switch_state(bool preempt, +static inline int __trace_sched_switch_state(bool preempt, unsigned int prev_state, struct task_struct *p) { @@ -251,25 +251,25 @@ TRACE_EVENT(sched_switch, TP_ARGS(preempt, prev, next, prev_state), TP_STRUCT__entry( - __array( char, prev_comm, TASK_COMM_LEN ) __field( pid_t, prev_pid ) - __field( int, prev_prio ) - __field( long, prev_state ) - __field( char, prev_state_char ) - __array( char, next_comm, TASK_COMM_LEN ) __field( pid_t, next_pid ) - __field( int, next_prio ) + __field( short, prev_prio ) + __field( short, next_prio ) + __field( int, prev_state ) + __array( char, prev_comm, TASK_COMM_LEN ) + __array( char, next_comm, TASK_COMM_LEN ) + __field( char, prev_state_char ) ), TP_fast_assign( - memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN); __entry->prev_pid = prev->pid; - __entry->prev_prio = prev->prio; + __entry->next_pid = next->pid; + __entry->prev_prio = (short) prev->prio; + __entry->next_prio = (short) next->prio; __entry->prev_state = __trace_sched_switch_state(preempt, prev_state, prev); __entry->prev_state_char = __trace_sched_switch_state_char(preempt, prev_state, prev); memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN); - __entry->next_pid = next->pid; - __entry->next_prio = next->prio; + memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN); /* XXX SCHED_DEADLINE */ ),
Report priorities in 'short' and prev_state in 'int' to save some buffer space. And also reorder the fields so that we take struct alignment into consideration to make the record compact. Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Ze Gao <zegao@tencent.com> --- include/trace/events/sched.h | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-)