Message ID | 20230726121618.19198-1-zegao@tencent.com (mailing list archive) |
---|---|
Headers | show |
Series | report task state in symbolic chars from sched tracepoint | expand |
On Wed, Jul 26, 2023 at 5:16 AM Ze Gao <zegao2021@gmail.com> wrote: > > > This is the 2nd attempt to fix the report task state issue in sched > tracepint, here is the first version: > > https://lore.kernel.org/linux-trace-kernel/20230725072254.32045-1-zegao@tencent.com > > Against v1, add a new var to report task state in symbolic char instead > of replacing the old one and to not to break anything. > > -- > > In the status quo, we should see three different outcomes of the reported > sched-out task state from perf-script, perf-sched-timehist, and Tp_printk > of tracepoint sched_switch. And it's not hard to figure out that the > former two are built upon the third one, and the reason why we see this > inconsistency is that the former two does not catch up with the internal > change of reported task state definitions as the kernel evolves. > > IMHO, exporting internal representations of task state in the tracepoint > sched_switch is not a good practice and not encouraged at all, which can > easily break userspace tools that relies on it. Especially when tracepoints > are massively used in many observability tools nowadays due to its stable > nature, which makes them no longer used for debug only purpose and we > should be careful to decide what ought to be reported to userspace and what > ought not. > > Therefore, to fix the issues mentioned above for good, instead of choosing > I proposed to add a new variable to report task state in sched_switch with > a symbolic character along with the old hardcoded value, and save the > further processing of userspace tools and spare them from knowing > implementation details in the kernel. > > After this patch seires, we report 'RSDTtXZPI' the same as in procfs, plus > a 'p' which denotes PREEMP_ACTIVE and is used for sched_switch tracepoint only. > > Reviews welcome! Thanks Ze, I think this is worthwhile cleanup and makes the code overall simpler. I don't know if others have strong opinions, I don't often work in this code, but I think the patches are worth landing this. Acked-by: Ian Rogers <irogers@google.com> Thanks, Ian > Regards, > > Ze > > Ze Gao (2): > sched, tracing: add to report task state in symbolic chars > perf sched: use the new prev_state_char instead in tracepoint > sched_switch > > include/trace/events/sched.h | 60 +++++++++++++++++++++--------------- > tools/perf/builtin-sched.c | 57 ++++++---------------------------- > 2 files changed, 45 insertions(+), 72 deletions(-) > > Ze Gao (1): > libtraceevent: use the new prev_state_char instead in tracepoint > sched_switch > > plugins/plugin_sched_switch.c | 29 ++++------------------------- > 1 file changed, 4 insertions(+), 25 deletions(-) > > -- > 2.40.1 >
Thanks Ian, In regard to ABI, symbolic chars are much more stable and I think we can benefit from this in the long run. Regards, Ze On Sat, Jul 29, 2023 at 1:29 AM Ian Rogers <irogers@google.com> wrote: > > On Wed, Jul 26, 2023 at 5:16 AM Ze Gao <zegao2021@gmail.com> wrote: > > > > > > This is the 2nd attempt to fix the report task state issue in sched > > tracepint, here is the first version: > > > > https://lore.kernel.org/linux-trace-kernel/20230725072254.32045-1-zegao@tencent.com > > > > Against v1, add a new var to report task state in symbolic char instead > > of replacing the old one and to not to break anything. > > > > -- > > > > In the status quo, we should see three different outcomes of the reported > > sched-out task state from perf-script, perf-sched-timehist, and Tp_printk > > of tracepoint sched_switch. And it's not hard to figure out that the > > former two are built upon the third one, and the reason why we see this > > inconsistency is that the former two does not catch up with the internal > > change of reported task state definitions as the kernel evolves. > > > > IMHO, exporting internal representations of task state in the tracepoint > > sched_switch is not a good practice and not encouraged at all, which can > > easily break userspace tools that relies on it. Especially when tracepoints > > are massively used in many observability tools nowadays due to its stable > > nature, which makes them no longer used for debug only purpose and we > > should be careful to decide what ought to be reported to userspace and what > > ought not. > > > > Therefore, to fix the issues mentioned above for good, instead of choosing > > I proposed to add a new variable to report task state in sched_switch with > > a symbolic character along with the old hardcoded value, and save the > > further processing of userspace tools and spare them from knowing > > implementation details in the kernel. > > > > After this patch seires, we report 'RSDTtXZPI' the same as in procfs, plus > > a 'p' which denotes PREEMP_ACTIVE and is used for sched_switch tracepoint only. > > > > Reviews welcome! > > Thanks Ze, > > I think this is worthwhile cleanup and makes the code overall simpler. > I don't know if others have strong opinions, I don't often work in > this code, but I think the patches are worth landing this. > > Acked-by: Ian Rogers <irogers@google.com> > > Thanks, > Ian > > > Regards, > > > > Ze > > > > Ze Gao (2): > > sched, tracing: add to report task state in symbolic chars > > perf sched: use the new prev_state_char instead in tracepoint > > sched_switch > > > > include/trace/events/sched.h | 60 +++++++++++++++++++++--------------- > > tools/perf/builtin-sched.c | 57 ++++++---------------------------- > > 2 files changed, 45 insertions(+), 72 deletions(-) > > > > Ze Gao (1): > > libtraceevent: use the new prev_state_char instead in tracepoint > > sched_switch > > > > plugins/plugin_sched_switch.c | 29 ++++------------------------- > > 1 file changed, 4 insertions(+), 25 deletions(-) > > > > -- > > 2.40.1 > >
On Wed, 26 Jul 2023 20:16:15 +0800 Ze Gao <zegao2021@gmail.com> wrote: > > This is the 2nd attempt to fix the report task state issue in sched > tracepint, here is the first version: > > https://lore.kernel.org/linux-trace-kernel/20230725072254.32045-1-zegao@tencent.com > > Against v1, add a new var to report task state in symbolic char instead > of replacing the old one and to not to break anything. > > -- > > In the status quo, we should see three different outcomes of the reported > sched-out task state from perf-script, perf-sched-timehist, and Tp_printk > of tracepoint sched_switch. And it's not hard to figure out that the > former two are built upon the third one, and the reason why we see this > inconsistency is that the former two does not catch up with the internal > change of reported task state definitions as the kernel evolves. > > IMHO, exporting internal representations of task state in the tracepoint > sched_switch is not a good practice and not encouraged at all, which can > easily break userspace tools that relies on it. Especially when tracepoints > are massively used in many observability tools nowadays due to its stable > nature, which makes them no longer used for debug only purpose and we > should be careful to decide what ought to be reported to userspace and what > ought not. > > Therefore, to fix the issues mentioned above for good, instead of choosing > I proposed to add a new variable to report task state in sched_switch with > a symbolic character along with the old hardcoded value, and save the > further processing of userspace tools and spare them from knowing > implementation details in the kernel. > > After this patch seires, we report 'RSDTtXZPI' the same as in procfs, plus > a 'p' which denotes PREEMP_ACTIVE and is used for sched_switch tracepoint only. This series looks good to me. Putting the flag in the trace record is a good idea :) Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> for this series. Thank you, > > Reviews welcome! > > Regards, > > Ze > > Ze Gao (2): > sched, tracing: add to report task state in symbolic chars > perf sched: use the new prev_state_char instead in tracepoint > sched_switch > > include/trace/events/sched.h | 60 +++++++++++++++++++++--------------- > tools/perf/builtin-sched.c | 57 ++++++---------------------------- > 2 files changed, 45 insertions(+), 72 deletions(-) > > Ze Gao (1): > libtraceevent: use the new prev_state_char instead in tracepoint > sched_switch > > plugins/plugin_sched_switch.c | 29 ++++------------------------- > 1 file changed, 4 insertions(+), 25 deletions(-) > > -- > 2.40.1 >
Hi Masami, Thanks for your review! Sincerely, Ze On Mon, Jul 31, 2023 at 11:53 AM Masami Hiramatsu <mhiramat@kernel.org> wrote: > > On Wed, 26 Jul 2023 20:16:15 +0800 > Ze Gao <zegao2021@gmail.com> wrote: > > > > > This is the 2nd attempt to fix the report task state issue in sched > > tracepint, here is the first version: > > > > https://lore.kernel.org/linux-trace-kernel/20230725072254.32045-1-zegao@tencent.com > > > > Against v1, add a new var to report task state in symbolic char instead > > of replacing the old one and to not to break anything. > > > > -- > > > > In the status quo, we should see three different outcomes of the reported > > sched-out task state from perf-script, perf-sched-timehist, and Tp_printk > > of tracepoint sched_switch. And it's not hard to figure out that the > > former two are built upon the third one, and the reason why we see this > > inconsistency is that the former two does not catch up with the internal > > change of reported task state definitions as the kernel evolves. > > > > IMHO, exporting internal representations of task state in the tracepoint > > sched_switch is not a good practice and not encouraged at all, which can > > easily break userspace tools that relies on it. Especially when tracepoints > > are massively used in many observability tools nowadays due to its stable > > nature, which makes them no longer used for debug only purpose and we > > should be careful to decide what ought to be reported to userspace and what > > ought not. > > > > Therefore, to fix the issues mentioned above for good, instead of choosing > > I proposed to add a new variable to report task state in sched_switch with > > a symbolic character along with the old hardcoded value, and save the > > further processing of userspace tools and spare them from knowing > > implementation details in the kernel. > > > > After this patch seires, we report 'RSDTtXZPI' the same as in procfs, plus > > a 'p' which denotes PREEMP_ACTIVE and is used for sched_switch tracepoint only. > > This series looks good to me. Putting the flag in the trace record is > a good idea :) > > Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> > > for this series. > > Thank you, > > > > > Reviews welcome! > > > > Regards, > > > > Ze > > > > Ze Gao (2): > > sched, tracing: add to report task state in symbolic chars > > perf sched: use the new prev_state_char instead in tracepoint > > sched_switch > > > > include/trace/events/sched.h | 60 +++++++++++++++++++++--------------- > > tools/perf/builtin-sched.c | 57 ++++++---------------------------- > > 2 files changed, 45 insertions(+), 72 deletions(-) > > > > Ze Gao (1): > > libtraceevent: use the new prev_state_char instead in tracepoint > > sched_switch > > > > plugins/plugin_sched_switch.c | 29 ++++------------------------- > > 1 file changed, 4 insertions(+), 25 deletions(-) > > > > -- > > 2.40.1 > > > > > -- > Masami Hiramatsu (Google) <mhiramat@kernel.org>